Advances in Electronics and Electron Physics. Vol. 75


ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS

VOLUME 75

EDITOR-IN-CHIEF

PETER W. HAWKES Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France

ASSOCIATE EDITOR

BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California

Advances in Electronics and Electron Physics

EDITED BY

PETER W. HAWKES Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France

VOLUME 75

ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers Boston San Diego New York Berkeley London Sydney Tokyo Toronto

COPYRIGHT © 1989 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101

United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. 24-28 Oval Road, London NW1 7DX

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504
ISBN 0-12-014675-4
PRINTED IN THE UNITED STATES OF AMERICA

89 90 91 92    9 8 7 6 5 4 3 2 1

CONTENTS

CONTRIBUTORS TO VOLUME 75
PREFACE

Linear Inverse and Ill-Posed Problems
M. BERTERO
    I. Introduction
    II. Linear Inverse Problems
    III. Linear Inverse Problems with Discrete Data
    IV. Generalized Solutions
    V. Regularization Theory for Ill-Posed Problems
    VI. Inverse Problems and Information Theory
    References

Recent Developments in Energy-Loss Spectroscopy
JÖRG FINK
    I. Introduction
    II. Principal Features
    III. Instrumentation
    IV. Sample Preparation
    V. Nearly-Free-Electron Metals
    VI. Rare Gas Bubbles in Metals
    VII. Amorphous Carbon
    VIII. Conducting Polymers
    IX. Superconductors
    Acknowledgements
    References

Methods of Calculating the Properties of Electron Lenses
E. HAHN
    I. Introduction
    II. Equation of Motion
    III. Differential Equation of Trajectories
    IV. Methods of Solution
    V. Coupling Between Field and Basis
    VI. Representation of the Trajectory
    VII. Iteration Cycle
    VIII. Electron Mirrors
    IX. Conventional Lenses
    X. Theory of Micro-Lenses
    XI. Quadrupole Optics
    XII. Concluding Remarks
    References

Derivation of a Focusing Criterion by a System-Theoretic Approach
MICHAEL KAISER
    I. List of Symbols
    II. Introduction
    III. System-Theoretic Approach to Scalar Radiation Problems in Homogeneous Space
    IV. Focusing by Plane Radiators
    V. Focusing in Stratified Media
    VI. Conclusions
    Appendix
    References

Lightwave Receivers
GARETH F. WILLIAMS
    I. Introduction
    II. Receiver and Device Requirements of Lightwave Systems
    III. Receiver System and Noise Considerations
    IV. First- and Second-Generation Lightwave Receivers
    V. Active-Feedback Lightwave Receiver Circuits
    References

INDEX

CONTRIBUTORS

The numbers in parentheses indicate the pages on which the authors’ contributions begin.

M. BERTERO (1), Dipartimento di Fisica dell’Università and Istituto Nazionale di Fisica Nucleare, Via Dodecaneso 33, I-16146 Genova, Italy
JÖRG FINK (121), Kernforschungszentrum Karlsruhe, Institut für Nukleare Festkörperphysik, P.O.B. 3640, D-7500 Karlsruhe, Federal Republic of Germany
E. HAHN (233), Pestalozzi-str. 9, 6900 Jena, German Democratic Republic
MICHAEL KAISER (329), Dornier System GmbH, P.O. Box 1360, D-7990 Friedrichshafen, Federal Republic of Germany
GARETH F. WILLIAMS (389), NYNEX Science and Technology, 500 Westchester Avenue, White Plains, New York 10604, USA


PREFACE

The five contributions to this volume reflect many of the themes traditionally covered in this series. The opening chapter by M. Bertero is concerned with a topic that is a widespread preoccupation, namely, the search for satisfactory methods of handling inverse problems and especially ill-posed problems. The author is well known for his many original ideas concerning these questions and the present survey is therefore all the more welcome.

The next two chapters, by J. Fink and E. Hahn, are of more specialized interest, although the information obtained by the technique of energy-loss spectroscopy, discussed by J. Fink, is used in many fields. The approach to particle optics developed over the years by E. Hahn is not very well known, probably owing to the relative inaccessibility of the Jenaer Jahrbuch, in which much of his work was published. The present account is devoted mainly to the more recent stages of this research, and I hope that this account in English in these Advances will disseminate Dr. Hahn’s work more widely.

Systems theory is a well-developed discipline, but there are no doubt problems to which it could be usefully applied but where it is little known. M. Kaiser explains how problems of wave propagation can be clarified with its help.

We conclude with a device-orientated account by G. F. Williams of new developments in the field of lightwave receivers. This is a subject in rapid evolution, and the present account should be found very helpful by anyone trying to keep up with the changes.

In conclusion, I thank all the contributors most warmly for their efforts and list forthcoming articles in the series.

Peter W. Hawkes

FORTHCOMING ARTICLES

Parallel Image Processing Methodologies (J. K. Aggarwal)
Image Processing with Signal-Dependent Noise (H. H. Arsenault)
Pattern Recognition and Line Drawings (H. Bley)
Bodo von Borries, Pioneer of Electron Microscopy (H. von Borries)
Electron Microscopy of Very Fast Processes (O. Bostanjoglo)
Signal Analysis in Seismic Studies (J. F. Boyce and L. R. Murray)
Magnetic Reconnection (A. Bratenahl and P. J. Baum)
Sampling Theory (J. L. Brown)
Finite Algebraic Systems and Trellis Codes (H. J. Chizeck and M. Trott)
Electrons in a Periodic Lattice Potential (J. M. Churchill and F. E. Holmstrom)
The Artificial Visual System Concept (J. M. Coggins)
Corrected Lenses for Charged Particles (R. L. Dalglish)
A Gaseous Detector Device for ESEM (G. D. Danilatos)
The Development of Electron Microscopy in Italy (G. Donelli)
The Study of Dynamic Phenomena in Solids Using Field Emission (M. Drechsler)
Amorphous Semiconductors (W. Fuhs)
Resonators, Detectors and Piezoelectrics (J. J. Gagnepain)
Median Filters (N. C. Gallagher and E. Coyle)
Bayesian Image Analysis (S. and D. Geman)
SEM and the Petroleum Industry (J. Huggett)
Emission Electron Optical System Design (V. P. Il’in)
Statistical Coulomb Interactions in Particle Beams (G. H. Jansen)
Number Theoretic Transforms (G. A. Jullien)
Systems Theory and Electromagnetic Waves (M. Kaiser)
Phosphor Materials for CRTs (K. Kano et al.)
Tomography of Solid Surfaces Modified by Fast Ions (S. B. Karmohapatro and D. Ghose)
The Scanning Tunnelling Microscope (H. Van Kempen)
Scanning Capacitance Microscopy (P. J. King)
Applications of Speech Recognition Technology (H. R. Kirby)
Multi-Colour AC Electroluminescent Thin-Film Devices (H. Kobayashi and S. Tanaka)
Spin-Polarized SEM (K. Koike)
HREM and Geology (M. Mellini)
The Rectangular Patch Microstrip Radiator (H. Matzner and E. Levine)
Active-Matrix TFT Liquid Crystal Displays (S. Morozumi)
Electronic Tools in Parapsychology (R. L. Morris)
Image Formation in STEM (C. Mory and C. Colliex)
Low-Voltage SEM (J. Pawley)
Languages for Vector Computers (R. H. Perrott)
Electron Scattering and Nuclear Structure (G. A. Peterson)
Electrostatic Lenses (F. H. Read and I. W. Drummond)
CAD in Electromagnetics (K. R. Richter and O. Biro)
Scientific Work of Reinhold Rüdenberg (H. G. Rudenberg)
Atom-Probe FIM (T. Sakurai)
Metaplectic Methods and Image Processing (W. Schempp)
X-Ray Microscopy (G. Schmahl)
Applications of Mathematical Morphology (J. Serra)
Focus-Deflection Systems and Their Applications (T. Soma et al.)
Electron Gun Optics (Y. Uchikawa)
Thin-Film Cathodoluminescent Phosphors (A. M. Wittenberg)
Electron Microscopy and Helmut Ruska (C. Wolpers)


Linear Inverse and Ill-Posed Problems

M. BERTERO

Dipartimento di Fisica dell’Università and Istituto Nazionale di Fisica Nucleare, Genova, Italy

I. Introduction
II. Linear Inverse Problems
    A. General Properties
    B. Inverse Source Problems
    C. Inverse Diffraction Problems
    D. Linear Inverse Scattering Problems
    E. Radon Transform Inversion and Tomography
    F. Fourier Transform Inversion with Limited Data
    G. Laplace Transform Inversion
    H. Generalized Moment Problems
III. Linear Inverse Problems with Discrete Data
    A. General Formulation
    B. Fourier Transform Inversion with Discrete Data
    C. Interpolation and Numerical Derivation
    D. Finite Hausdorff Moment Problem
    E. Moment-Discretization of Fredholm Integral Equations of the First Kind
IV. Generalized Solutions
    A. Moore-Penrose Generalized Inverse
    B. C-Generalized Inverses
    C. The Backus-Gilbert Method for Problems with Discrete Data
V. Regularization Theory for Ill-Posed Problems
    A. Ivanov-Phillips-Tikhonov Regularization Method
    B. General Formulation of Regularization Methods
    C. Spectral Windows
    D. Iterative Methods
    E. Choice of the Regularization Parameter
VI. Inverse Problems and Information Theory
    A. Modulus of Continuity and Uncertainty of the Solution
    B. Evaluation of Linear Functionals and Resolution Limits
    C. Number of Degrees of Freedom
    D. Impulse Response Function: Another Approach to Resolution Limits
References

Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014675-4


I. INTRODUCTION

In a paper published in 1902 on boundary-value problems for partial differential equations and their physical interpretation, Jacques Hadamard introduced the basic concept of a well-posed problem (Hadamard, 1902). In this first formulation, a problem is called well-posed when its solution is unique and exists for arbitrary (non-analytic) boundary values. From an investigation of several examples related to elliptic and hyperbolic equations, Hadamard concluded that only problems that are motivated by physical reality are well-posed. In particular, he demonstrated that the Dirichlet problem of the Laplace equation and the Cauchy problem of the wave equation with boundary values at t = 0 are well-posed, while the Cauchy problem of the Laplace equation and the Cauchy problem of the 3D wave equation with boundary values at x = 0 are not well-posed.

In subsequent work Hadamard emphasized the requirement of continuous dependence of the solution on the data (Hadamard, 1923), claiming that a solution that varies considerably for a small variation of the data is not really a solution in the physical sense. Physical data are never known exactly but only with a certain degree of accuracy, and this should imply that the solution is not known at all. He provided a striking example of this fact using the Cauchy problem of the Laplace equation in two variables,

$$\Delta u = u_{xx} + u_{yy} = 0. \tag{1}$$

If we consider the following Cauchy data at $y = 0$,

$$u(x, 0) = 0, \qquad u_y(x, 0) = n^{-1} \sin(nx), \tag{2}$$

then there exists a unique solution of Eq. (1) satisfying these conditions, given by

$$u(x, y) = n^{-2} \sin(nx) \sinh(ny). \tag{3}$$

The factor $\sin(nx)$ produces a fluting of the surface that represents the solution of the problem. This fluting, however imperceptible near $y = 0$, becomes enormous at any finite distance from the x-axis, provided the fluting is made sufficiently small by taking $n$ sufficiently large. Notice that when $n \to \infty$, the amplitude of the oscillating data tends to zero while the frequency tends to infinity. This is now a classical example illustrating the effects that arise when the dependence of the solution on the data is not continuous. In honour of his contribution, a problem is now called well-posed in the sense of Hadamard if it has the property of continuous dependence of the solution on the data, though the complete formulation, including the three requirements of uniqueness, existence and continuity, was stated only much later, by Courant and Hilbert (1962, p. 227; cf. Hadamard, 1964, p. 28).
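Hadamard's example can be made concrete with a few numbers. The short Python sketch below is only an illustration and not part of the original text: it evaluates the largest value of the Cauchy data in Eq. (2) and of the solution in Eq. (3) for increasing n. The function names and the distance y = 0.1 are assumptions chosen for the illustration.

```python
import math


def data_amplitude(n: int) -> float:
    """Largest value of |u_y(x, 0)| = |sin(n x)| / n in Eq. (2)."""
    return 1.0 / n


def solution_amplitude(n: int, y: float) -> float:
    """Largest value of |u(x, y)| = |sin(n x)| sinh(n y) / n**2 in Eq. (3)."""
    return math.sinh(n * y) / n**2


y = 0.1  # hypothetical fixed distance from the x-axis
for n in (1, 10, 100, 1000):
    print(f"n = {n:5d}   data: {data_amplitude(n):.1e}   "
          f"solution at y = {y}: {solution_amplitude(n, y):.1e}")
```

As n grows, the data amplitude decreases like 1/n while the solution amplitude at the fixed distance y grows roughly like exp(ny)/(2n²), which is the loss of continuous dependence described above.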


These ideas recall remarks from a biographical sketch of Hadamard. “He liked to work with the rigor of a mathematician and the practical sense of a physicist, and liked to repeat Poincaré’s words: ‘La Physique ne nous donne pas seulement l’occasion de résoudre des problèmes, ..., elle nous fait pressentir la solution’” (Mandelbrot and Schwartz, 1965). The physics of Hadamard was, however, the physics of the nineteenth century. In fact, the requirements of existence, uniqueness and continuity of the solution were “deeply inherent in the ideal of a unique, complete, and stable determination of physical events ... Laplace’s vision of the possibility of calculating the whole future of the physical world from complete data of the present state is an extreme expression of this attitude” (Courant and Hilbert, 1962, p. 230).

The idea of well-posedness has been extremely useful in the theory of partial differential equations and in functional analysis. A negative consequence was that for many years problems that are not well-posed, now called ill-posed or incorrectly posed problems, were considered mathematical anomalies and were not seriously investigated. Recent developments in physics, and especially applied physics, have shown that ill-posed problems may, however, relate to extremely important physical situations. For example, the solution of the Cauchy problem of elliptic equations has interesting applications in electrocardiography (Colli Franzone et al., 1977), specifically in the reconstruction of the epicardial potential from body surface maps.

We would like to emphasize, however, another ill-posed problem, one that has revolutionized diagnostic radiology. In 1971, the first clinical machine for the detection of head tumors, based on a new X-ray technique called computer-assisted or computerized tomography, was installed at the Atkinson Morley’s Hospital, Wimbledon, Great Britain. (In 1979, Allan M. Cormack and Godfrey N. Hounsfield were awarded the Nobel Prize in Medicine for the invention of the technique.) In computerized tomography, images of a cross-section of the human body are created from the attenuation of X-rays along a large number of lines through this cross-section. The processing of the data requires the reconstruction of a function of two variables from knowledge of its line integrals. The solution of this mathematical problem was given many years ago in a paper by Johann Radon (1917). The result of Radon was even more general, since he proved formulas for the reconstruction of a function defined on $R^n$ from knowledge of its integrals over all the hyperplanes of $R^n$. In honour of his contribution, a mapping that transforms a function into the set of its integrals over hyperplanes is now called the Radon transform. Therefore, tomography is just a special case of Radon transform inversion. Moreover, Radon transform inversion is an elegant example of an ill-posed problem, since the solution does not exist for arbitrary data and the dependence of the solution on the data, in general, is not continuous. As a consequence, the effect of noise on the solution is amplified in a way similar to what happens in the Cauchy problem of the Laplace equation.

Even if Cormack and Hounsfield had not been aware of the work of Radon (Hounsfield has been quoted as saying “I find I’ve got other tools of thinking than math” (Di Chiro and Brooks, 1979)), there is no doubt that mathematics has been the source of important contributions to the development and refinement of computerized tomography. On the other hand, computerized tomography has stimulated the rapid growth of the theory of ill-posed problems in the past twenty years.

Radon transform inversion is an example of an inverse problem. A precise mathematical definition of inverse problems without reference to physics is quite difficult and perhaps impossible. For a mathematician, the distinction between direct and inverse problems is quite arbitrary. “We call two problems inverses of one another if the formulation of each involves all or part of the solution of the other. Often, for historical reasons, one of the two problems has been studied extensively for some time, while the other has never been studied and is not so well understood. In such cases, the former is called the direct problem, while the latter the inverse problem” (Keller, 1976). Another quote from the same paper shows a route for a characterization of inverse problems based on physics rather than on mathematics. “The main sources of inverse problems are science and engineering. Often, these problems concern the determination of the properties of some inaccessible region from observations on the boundary of the region” (Keller, 1976). In fact, inverse problems are related to indirect measurements, remote sensing and so on. For every domain of physics, it is necessary to specify a definition of the direct problem peculiar to this domain and the definition of its corresponding inverse problem.
For example, in classical mechanics, the direct problem is the determination of the trajectories of particles from knowledge of forces. Then the inverse problem is the determination of forces from knowledge of the trajectories. (In this sense, Isaac Newton solved the first inverse problem.) In potential theory, the direct problem is the determination of the potential generated by a known mass or charge or current distribution, while the inverse problem is the determination of the mass or charge or current distribution from the measured values of the potential. In the theory of wave propagation, the direct problem is the determination of field distributions in time and space from given constitutions of sources or scatterers. Then the inverse problem is the determination of the characteristics of the sources or scatterers from observations of the fields. (Computerized tomography is an inverse problem in this sense, since it consists in the determination of the space distribution of the X-ray absorption coefficient from the observation of the attenuation of the X-rays passing through the probe.) In instrumental physics, the direct problem is the computation of the output of a given instrument with known impulse response function from knowledge of the input. Then the inverse problem is the determination of the input from knowledge of the output, and so on.

These examples motivate the following definition of direct and inverse problems. Direct problems are problems oriented along a cause-effect sequence or, in other words, problems that consist in providing the consequences of given causes, while inverse problems are those associated with the reversal of the chain of causally related effects, and therefore consist in finding the unknown causes of known consequences (Turchin et al., 1971). This definition will never be misleading if it is kept in mind that the formulation of any specific problem must be based on well-established physical laws and that physics must specify what is a cause and what is an effect as well as provide the equations relating the effects to the causes. Moreover, a merit of this definition is that it attempts to justify the fact that direct problems are, in general, well-posed while the corresponding inverse problems are, in general, ill-posed.

Today, inverse problems are fundamental in several domains of applied science: medical diagnostics, atmospheric sounding, radar and sonar target estimation, seismology, radio astronomy, microscopy and so on. There are routine applications not only in X-ray or MR tomography, but also in seismic data processing for geophysical exploration and in radiometric data processing for meteorological forecasts and monitoring. In recent years, several books have been published on inverse problems in various areas of applied physics, including optics (Baltes, 1978; 1980), astronomy (Craig and Brown, 1986), electromagnetics (Boerner et al., 1983), atmospheric sounding (Zuev and Naats, 1983), and computerized tomography (Herman, 1979; 1980; Herman and Natterer, 1981; Natterer, 1986a). Miscellaneous examples and mathematical results may be found in Colin, 1972; Sabatier, 1978; 1987a,b; Talenti, 1986; Cannon and Hornung, 1986. Ill-posed problems for partial differential equations have been investigated by Payne (1975) and Carasso and Stone (1975).

Inverse problems, in general, are nonlinear. Two nonlinear problems have been the object of elegant mathematical investigations, the inverse Sturm-Liouville problem and the inverse scattering problem. The first one is due most likely to Lord Rayleigh (1877). In describing the vibrations of strings of variable density, Lord Rayleigh briefly discussed the possibility of deriving the density distribution from the frequencies of vibration. A modern version and generalizations are given by Kac (1966). Roughly speaking, the mathematical problem involves the determination of the coefficients of a differential operator from knowledge of its eigenvalue spectrum. Important contributions to this problem have been made by outstanding mathematicians, such as Levinson, Marchenko, and Krein, among others. A short survey is given by Barcilon (1986).


The inverse scattering problem was originally investigated in connection with the Schroedinger equation and later extended to other equations, such as the Helmholtz equation, impedance equation, and so on. Briefly, the problem is to determine the potential (or refraction index, impedance, etc.) from knowledge of quantities related to the scattering amplitude, e.g. phase shifts, reflection and transmission coefficients, etc. A complete account of the main results is given by Chadan and Sabatier (1977).

The above inverse problems are nonlinear. For example, in the case of the vibrating string, the direct problem consists in solving the linear eigenvalue problem

$$-u_k''(x) = \omega_k^2 \rho(x) u_k(x); \qquad k = 0, 1, \ldots \tag{4}$$

on the interval $(0, L)$, for a given density function $\rho(x)$ and suitable boundary conditions at the end points $0$ and $L$. But since the eigenvalues $\omega_k^2$ are nonlinear functionals of the density function $\rho(x)$, the inverse problem, consisting in the determination of $\rho(x)$ from given values of the $\omega_k^2$, implies the inversion of a nonlinear mapping. However, when an estimate of $\rho(x)$, say $\rho_0(x)$, is known and the difference $f(x) = \rho(x) - \rho_0(x)$ is small, it is possible to linearize the functionals $\omega_k^2$ using, for instance, perturbation theory. In this way, the original nonlinear inverse problem is approximated by means of a linear one for the unknown function $f(x)$. Similar results can be obtained in inverse scattering using the Born or Rytov approximation, the geometrical optics approximation, and so on. Moreover, some inverse problems related to inverse diffraction are rigorously linear problems.

In this paper, we consider only linear inverse problems, which have the following general structure. The first step is the definition of the direct problem, which must be linear. Then the solution of the direct problem defines a linear mapping $L$ from the space $X$ of all functions characterizing the properties of the physical sample (such as the density function in the case of a vibrating string or the refraction index in the case of a semi-transparent object, etc.) into the space $Y$ of all corresponding measurable quantities (such as sequences of eigenvalues, scattering amplitudes, and so on). Of course, in the direct problem the data are elements of $X$, while the solutions are elements of $Y$. In the corresponding inverse problem, the data and solutions are interchanged. If we assume that the operator $L$ is known (and this implies the need of solving the direct problem), then the inverse problem can be formulated as follows: Given $g \in Y$ and a linear operator $L: X \to Y$, find $f \in X$ such that

$$g = Lf. \tag{5}$$
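The linearization of the vibrating-string problem mentioned above can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the author's procedure: it takes a uniform reference density ρ₀ = 1 on (0, π) with fixed ends, a small hypothetical Gaussian perturbation f, and a finite-difference discretization of Eq. (4). It compares the eigenvalues of the perturbed string with the standard first-order perturbation estimate δ(ω_k²) = −ω_k² ∫ f u_k² dx / ∫ ρ₀ u_k² dx, a textbook formula that is not quoted in the text. The function name `string_eigenvalues` is likewise an assumption.

```python
import numpy as np


def string_eigenvalues(rho, length, num_modes):
    """Lowest eigenvalues omega_k^2 of -u'' = omega^2 rho(x) u with u(0) = u(L) = 0,
    computed from a second-order finite-difference discretization."""
    m = rho.size                      # number of interior grid points
    h = length / (m + 1)
    # Tridiagonal matrix representing -d^2/dx^2 with fixed ends.
    a = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    # Turn A u = lam * diag(rho) u into a symmetric standard eigenproblem.
    s = 1.0 / np.sqrt(rho)
    return np.linalg.eigvalsh(s[:, None] * a * s[None, :])[:num_modes]


length, m, num_modes = np.pi, 400, 3
h = length / (m + 1)
x = h * np.arange(1, m + 1)

rho0 = np.ones(m)                        # assumed reference density
f = 1e-3 * np.exp(-(x - 1.0) ** 2)       # hypothetical small perturbation

lam0 = string_eigenvalues(rho0, length, num_modes)
lam_exact = string_eigenvalues(rho0 + f, length, num_modes)

# First-order estimate: the modes of the uniform string are sin(k x).
k = np.arange(1, num_modes + 1)
modes_sq = np.sin(np.outer(k, x)) ** 2
lam_linear = lam0 - lam0 * (modes_sq @ f) / (modes_sq @ rho0)

for row in zip(k, lam0, lam_exact, lam_linear):
    print("k = %d   unperturbed: %.6f   exact: %.6f   linearized: %.6f" % row)
```

In this setting the map from f to the eigenvalue shifts is linear, so recovering f from measured shifts becomes a problem of the linear type considered in the rest of the paper.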

An element of $X$ will be called an object (or a solution) while an element of $Y$ will be called an image (or data function, data vector, etc.). Accordingly, $X$ will be called the object space (or solution space) and $Y$ will be called the image space (or data space). In this paper, as in most mathematical papers on such problems, it is assumed that both $X$ and $Y$ are Hilbert spaces and that the linear operator $L: X \to Y$ is continuous. Continuity means that, given any sequence $\{f_n\}$ converging to zero in the sense of the norm of $X$, the corresponding sequence $\{Lf_n\}$ converges to zero in the sense of the norm of $Y$. In other words, it is assumed that the direct problem is well-posed in the sense of Hadamard.

In order to illustrate the difficulties of solving linear inverse problems, we consider a particular example of Eq. (5), the case of a Fredholm integral equation of the first kind of the type

$$g(x) = \int_a^b K(x, y) f(y)\, dy, \qquad c \le x \le d. \tag{6}$$

This can be written in the general form (5) if we introduce the integral operator

$$(Lf)(x) = \int_a^b K(x, y) f(y)\, dy, \qquad c \le x \le d, \tag{7}$$

which transforms functions on $[a, b]$ into functions on $[c, d]$. Moreover, we consider the simple case where the solution and data space are spaces of square-integrable functions, i.e., $X = L^2(a, b)$ and $Y = L^2(c, d)$. Then the operator $L$ is continuous when the integral kernel $K(x, y)$ is square-integrable. Notice that this is only a sufficient condition for continuity. In the case of a convolution operator, i.e., $K(x, y) = K(x - y)$ and also $a = c = -\infty$, $b = d = +\infty$, the operator $L$ is continuous in $L^2(R)$ when the impulse response function $K(x)$ is integrable, even though the kernel is then neither integrable nor square-integrable as a function of the two variables $x, y$.

Assume now, for example, that the integral kernel $K(x, y)$ is an analytic function of $x$ for any $y \in [a, b]$. Then, given an arbitrary object $f \in L^2(a, b)$, the corresponding image $g(x)$ computed by means of Eq. (6) is also analytic. It follows that the inverse problem does not have a solution for an arbitrary $g \in L^2(c, d)$, but only for functions $g$ in a subset of analytic functions. The problem is ill-posed in the original sense of Hadamard. Also, continuous dependence of the solution on the data does not hold true. If the interval $[a, b]$ is bounded and the kernel is square-integrable, then the kernel is also integrable and, from the Riemann-Lebesgue theorem (Titchmarsh, 1948), it follows that

$$\int_a^b K(x, y) \cos(ny)\, dy \to 0, \qquad n \to \infty. \tag{8}$$

Thus, we have found a sequence of functions that does not tend to zero in $L^2(a, b)$, while the sequence of the corresponding images tends to zero. This example is similar to the example of Hadamard for the Cauchy problem of the Laplace equation.

The previous example indicates that it is necessary to reconsider the validity of the mathematical model provided by Eq. (5) for the description of a physical experiment. When we apply the operator $L$ to all the functions of $X$, we obtain a set of images that may be called the set of exact or noise-free images. In the example discussed above, these functions are more regular (analytic) than the corresponding functions $f$, and this situation is rather common in most inverse problems, i.e., the operator $L$ that solves the direct problem has rather strong smoothing properties. But measurement errors or noise can destroy the smoothness of the exact image. The measured image is not related to $f$ by Eq. (5), but by the equation

$$g = Lf + h, \tag{9}$$

where h is a stochastic function that represents random noise. Such a function is, in general, far more irregular than any exact image Lf. For example, Lf is band-limited, while h is not band-limited or has a band much broader than the band of Lf. The remarks above indicate that the space Y must be broad enough to contain both the exact and measured images and that, in general, the set of exact images is a subset of Y. Moreover, if we assume that Y is a Hilbert space, then we must also assume it is equipped with a scalar product such that the norm of the experimental error h is small with respect to the norm of the exact image Lf. (It is for this reason that a space of square-integrable functions is the most convenient one in a number of practical problems.) As a consequence of these properties of the image space Y, the solution of Eq. (5) does not exist for arbitrary g and the problem is ill-posed.

In the case of inverse problems with discrete data, however, the solution may exist for arbitrary g (even if it is not unique; see Section III). It also depends continuously on the data since the space Y is a finite-dimensional Euclidean space and any linear operator is continuous in a finite-dimensional space. One must never forget, however, that continuous dependence of the solution on the data is a necessary but not a sufficient condition for the numerical stability (robustness) of a solution. Finite-dimensional problems obtained by discretizing ill-posed problems are usually ill-conditioned, even extremely ill-conditioned, so that error propagation from the data to the solution can deprive the solution of any physical meaning.

If we return now to discussing linear inverse problems formulated in Hilbert spaces, we find we have a puzzling situation: either we use Eq. (5), but then the solution may not exist, or we use Eq. (9), but then we have only one equation and two unknown functions, f and h. And the search for
approximate solutions of Eq. (5), i.e., objects f such that Lf is approximately equal to g, will not succeed due to the noncontinuous dependence of the solution on the data. In fact, as clearly illustrated by Hadamard's example, the set of approximate solutions can contain wildly oscillating and completely physically meaningless functions. More precisely, the set of all functions f such that the distance between Lf and g is not greater than some prescribed small quantity $\varepsilon$ is an unbounded set of the object space X.

The basic idea in the treatment of ill-posed problems is the use of a priori information about the unknown object to restrict the class of approximate solutions. This means that we need additional information, information that cannot be deduced from Eqs. (5) or (9), about the properties of the unknown object f. Moreover, this information must be incorporated into the algorithm to produce a physically meaningful approximate solution. The additional information can consist of upper bounds on the solution and/or its derivatives, regularity properties of the solution (existence of derivatives up to a certain order, analyticity, etc.), localization properties of the solution (restrictions on its support, behaviour at infinity, etc.), lower bounds on the solution and/or its derivative (positivity of the solution and/or its first derivative), and so on.

The idea of using prescribed bounds to produce approximate stable solutions was introduced by Pucci in the case of the Cauchy problem for the Laplace equation (Pucci, 1955), while the constraint of positivity was used by John in the solution of the heat equation for preceding times (John, 1955), another classical example of an ill-posed problem. A general version of similar ideas was formulated independently by Ivanov for Eq. (5) (Ivanov, 1962). This method and the method of Phillips for Fredholm integral equations of the first kind (Phillips, 1962) were the first examples of the regularization theory for ill-posed problems, formulated and developed by Tikhonov a few years later (Tikhonov, 1963a; 1963b; 1964). Expositions of this theory can now be found in a number of monographs (Tikhonov and Arsenine, 1977; Groetsch, 1984; Bertero, 1982; Morozov, 1984). A summary of this theory will be given in Section V. This will be preceded by a brief section on the theory of generalized solutions, the basis of the general formulation of regularization theory.

Regularization theory is essentially deterministic in that it does not make use of statistical properties of the noise or objects. Probabilistic methods for the solution of ill-posed problems have also been developed, but they will not be considered in this paper. Some of them are ad hoc methods and have not yet been formulated in a sound mathematical framework. We need only mention here the method of Wiener filters, developed as a general method of solving ill-posed problems (Strand and Westwater, 1968; Franklin, 1970; Bertero and Viano, 1978) and which can be considered a probabilistic version of a particular regularization algorithm (Bertero et al., 1980a). This method, of

course, is very well established and can be used whenever knowledge of the statistical properties, not only of the noise but also of the object, is available. In a sense, the latter type of information plays the same role as prescribed bounds on a solution in the case of regularization theory and therefore, in this formulation, constitutes the required a priori information (Turchin et al., 1971). The difficulty in the use of Wiener filters is that, in several practical inverse problems, the required correlation functions are not known. When these can be determined, the method can be very useful, as is demonstrated by applications to atmospheric remote sensing (Askne and Westwater, 1986).
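The remarks of this Introduction on noisy data, g = Lf + h, and on the extreme ill-conditioning of discretized problems can be made concrete with a small numerical sketch. Everything specific in it is an assumption chosen for illustration and is not taken from the text: the 6 x 6 Hilbert matrix stands in for a discretized smoothing operator L, the object is a vector of ones, the noise h has alternating signs and size 1e-4, and the stabilized solution uses a simple Tikhonov-type penalty with parameter mu = 1e-6 (one particular way of imposing a bound on the solution; regularization theory proper is the subject of Section V).

```python
import math

def hilbert(n):
    # Hypothetical stand-in for a discretized smoothing operator L.
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def tikhonov(a, g, mu):
    """Solve (A^T A + mu I) f = A^T g, a penalized least-squares problem."""
    rows, n = len(a), len(a[0])
    ata = [[sum(a[k][i] * a[k][j] for k in range(rows)) + (mu if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    atg = [sum(a[k][i] * g[k] for k in range(rows)) for i in range(n)]
    return solve(ata, atg)

n = 6
L = hilbert(n)
f_true = [1.0] * n                           # assumed object
g_exact = matvec(L, f_true)                  # exact image Lf
h = [1e-4 * (-1) ** i for i in range(n)]     # assumed small noise
g_noisy = [gi + hi for gi, hi in zip(g_exact, h)]

f_naive = solve(L, g_noisy)                  # direct inversion of noisy data
f_reg = tikhonov(L, g_noisy, mu=1e-6)        # stabilized approximate solution

err_naive = norm([a - b for a, b in zip(f_naive, f_true)])
err_reg = norm([a - b for a, b in zip(f_reg, f_true)])
print("error of naive inversion:      ", err_naive)
print("error of regularized solution: ", err_reg)
```

The solution exists and depends continuously on the data in this finite-dimensional setting, yet the naive inversion is useless because the noise is amplified by many orders of magnitude, while the penalized solution stays close to the object. The value of mu was picked by hand; how such a parameter should be chosen is not addressed by this sketch.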

II. LINEAR INVERSE PROBLEMS

According to the definition introduced by Courant and Hilbert, the problem of solving the functional equation (5) is well-posed in the sense of Hadamard if the following conditions are satisfied:

i) the solution f is unique in X;
ii) the solution $f \in X$ exists for any $g \in Y$;
iii) the inverse mapping $g \to f$ is continuous.

Condition i implies that the operator $L: X \to Y$ admits an inverse operator $L^{-1}: Y \to X$, while Condition ii means that $L^{-1}$ is defined everywhere on Y. Then, since the continuous operator L is linear, from a corollary of the Banach open mapping theorem (Yosida, 1966, p. 77) it follows that $L^{-1}$ is also continuous and therefore, in the linear case, Conditions i and ii imply Condition iii. We emphasize that the requirement of continuous dependence of the solution on the data is a necessary, but not a sufficient, condition for the stability (robustness) of the solution against noise. In the case of a well-posed problem, the propagation of relative errors from the data to the solution is controlled by the condition number. If $\delta g$ is a small variation of g and $\delta f$ the corresponding variation of $f = L^{-1}g$, then

$$\frac{\|\delta f\|_X}{\|f\|_X} \le \mathrm{cond}(L)\, \frac{\|\delta g\|_Y}{\|g\|_Y}, \tag{10}$$

where cond(L) is the condition number given by

$$\mathrm{cond}(L) = \|L\|\, \|L^{-1}\| \ge 1. \tag{11}$$

Here $\|L\|$ and $\|L^{-1}\|$ denote the norms of the continuous operators L and $L^{-1}$, respectively. When cond(L) is not too large, the problem (5) is said to be well-conditioned and the solution stable with respect to small variations of the data.
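The bound (10) can be checked on the smallest possible example. The sketch below is only an illustration under stated assumptions: L is a real symmetric 2 x 2 matrix [[a, b], [b, d]] (so that the operator norm is the largest eigenvalue modulus and cond(L) is the ratio of the extreme eigenvalue moduli), and the two matrices, the data g and the perturbation dg are hypothetical values chosen so that the bound is attained.

```python
import math

def sym2x2_eigs(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    rad = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean + rad, mean - rad

def cond_sym2x2(a, b, d):
    """cond(L) = ||L|| ||L^{-1}|| for a symmetric, invertible 2 x 2 matrix."""
    lam1, lam2 = sym2x2_eigs(a, b, d)
    return max(abs(lam1), abs(lam2)) / min(abs(lam1), abs(lam2))

def solve_sym2x2(a, b, d, g):
    """f = L^{-1} g by Cramer's rule."""
    det = a * d - b * b
    return ((d * g[0] - b * g[1]) / det, (a * g[1] - b * g[0]) / det)

def norm(v):
    return math.hypot(v[0], v[1])

def amplification(a, b, d, g, dg):
    """Ratio of the relative error in f to the relative error in g."""
    f = solve_sym2x2(a, b, d, g)
    df = solve_sym2x2(a, b, d, dg)   # by linearity, delta f = L^{-1} delta g
    return (norm(df) / norm(f)) / (norm(dg) / norm(g))

g = (1.0, 1.0)        # data along the eigenvector of the largest eigenvalue
dg = (1e-6, -1e-6)    # perturbation along the eigenvector of the smallest one

cond_well = cond_sym2x2(2.0, 1.0, 2.0)
amp_well = amplification(2.0, 1.0, 2.0, g, dg)
cond_ill = cond_sym2x2(1.0, 0.999, 1.0)
amp_ill = amplification(1.0, 0.999, 1.0, g, dg)

print("well-conditioned: cond =", cond_well, " amplification =", amp_well)
print("ill-conditioned:  cond =", cond_ill, " amplification =", amp_ill)
```

With this choice of g and dg the amplification factor equals the condition number, which shows that inequality (10) cannot be improved in general; for other data and perturbations the amplification is smaller.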


On the other hand, when cond(L) is very large the problem is said to be ill-conditioned and a small variation of the data can produce a completely different solution. It is clear that the distinction between well-conditioned and ill-conditioned problems is not very sharp and that the concept of well-conditioned problem is more vague than the concept of well-posed problem.

As we stated in the Introduction, one of the Conditions i-iii may not be satisfied when Eq. (5) is the mathematical model of the given linear inverse problem. In such cases, the problem is said to be ill-posed. According to this definition, since uniqueness never holds, inverse problems with discrete data are always ill-posed (see Section III). In the recent mathematical literature, however, the term ill-posed is used in a more restrictive sense related to the theory of generalized solutions. Since the generalized solution, when it exists, is always unique (see Section IV), the problem of solving Eq. (5) is called ill-posed only when the generalized solution does not exist for arbitrary g or, equivalently, does not depend continuously on the data. It follows that only problems formulated in infinite-dimensional spaces can be ill-posed in this sense, while problems with discrete data are always well-posed (but, possibly, ill-conditioned).

In succeeding sections we first investigate some general properties of Eq. (5) in the ill-posed case and subsequently present some of the more significant examples of linear inverse problems.

A. General Properties

The ill-posedness of a problem is a property of the triple {L, X, Y}; a problem is ill-posed because, for instance, the space Y is too broad. As already discussed in the Introduction, however, the space Y cannot be modified since it must be large enough to contain both exact and measured (noisy) data. Therefore, if we know that in an inverse problem the set of measured data does not coincide with the set of exact data, Condition ii is not satisfied.

In many practical circumstances it is quite natural to assume that the object space X is a space of square-integrable functions. It may be convenient, however, to adapt the structure of X to the available a priori information about the properties of the solution, such as regularity properties (existence of derivatives of f up to some order, analyticity, etc.) and/or localization properties (approximate size of the support of f, asymptotic behaviour at infinity, etc.). This means that, for functions f defined on some interval [a, b], the appropriate Hilbert space can be a weighted Sobolev space whose scalar product is defined by

Here the star denotes complex conjugation, the $p_i$ are given functions which are continuous and positive, and $f^{(i)}$ denotes $d^i f/dx^i$. A more general description of the process of restricting the object space by the use of a priori knowledge is as follows. Let X be the original Hilbert space for which the problem (5) has been formulated (for example, a space of square-integrable functions) and let us assume that the a priori information about the specific solutions we are considering asserts that the solutions of the problem belong to the domain of an operator C from the Hilbert space X into another Hilbert space Z (the operator C is called the constraint operator and the space Z the constraint space) with the following properties:

A) the operator C has domain D(C) dense in X and is closed (the importance of this property is based on the fact that, as a rule, all differential operators are closed; for a definition see Balakrishnan, 1976);
B) the operator C has a continuous (bounded) inverse $C^{-1}$.

Then it is possible to introduce a new space $X_C \subset X$, the domain of C equipped with the scalar product

$$(f, \varphi)_C = (Cf, C\varphi)_Z. \tag{13}$$

It is easy to prove that, when Conditions A and B are satisfied, $X_C$ is also a Hilbert space. Moreover, the operator $L: X_C \to Y$ is also continuous since the topology of $X_C$ is stronger than the topology of X. It follows that the inverse problem can be reformulated by taking $X_C$ as the new object space. We need only note that the scalar product (12) is a particular case of the scalar product (13).

As stated in the Introduction, we will consider Eq. (5) only for linear and continuous operators L. Therefore, for the sake of completeness, we now summarize a few general properties of these operators which we will frequently use below. We also indicate their physical interpretation. The null space of L, denoted N(L), is the set of all functions f that are annihilated by L, i.e.,

$$N(L) = \{ f \in X \mid Lf = 0 \}. \tag{14}$$

When the linear operator L is continuous, N(L) is a closed linear subspace of X. Moreover, N(L) is not trivial if and only if the inverse operator does not exist. The null space of the operator L will also be called the subspace of the invisible objects (Rust and Burrus, 1972), since the image of any element of N(L) is exactly zero. This also means that the experiment, or the instrument, described by L is unable to detect the objects that belong to N(L). On the other hand, the orthogonal complement of N(L), $N(L)^\perp$, will be called the subspace of visible objects, since the objects that belong to this subspace can be recovered, in principle, from exact data. By the orthogonal projection theorem (Yosida, 1966; p. 82), an arbitrary object f can be uniquely represented as the sum of a visible plus an invisible component. It is evident that, at most, only the visible component can be recovered from the image in the absence of further a priori information about the object.

The range of the operator L, denoted by R(L), is the set into which L maps X,

$$R(L) = \{ g \in Y \mid g = Lf,\ f \in X \}, \tag{15}$$

and therefore R(L) is the linear subspace of the exact or noise-free images (data). The distinction, discussed in the Introduction, between exact and measured images makes clear that R(L) in general does not coincide with Y. Moreover, in the case of inverse problems formulated in infinite-dimensional spaces, R(L) may not be a closed subspace of Y. In some cases (for example, if L is the Laplace transformation) R(L) is dense in the image space Y.

The adjoint operator L* is uniquely defined by the following relation

$$(Lf, g)_Y = (f, L^*g)_X, \tag{16}$$

which holds for any $f \in X$ and $g \in Y$. The operator L* is also linear and continuous and has the same norm as L: $\|L^*\| = \|L\|$. The null space and the range of L* will be denoted by N(L*) and R(L*), respectively. For example, in the case of the integral operator (7), if we set $X = L^2(a,b)$ and $Y = L^2(c,d)$, the adjoint operator is given by the equation

$$(L^*g)(y) = \int_c^d K(x,y)^* g(x)\, dx, \qquad a \le y \le b. \tag{17}$$

Finally, we recall two important relations between null spaces and the ranges of the operators L and L*:

$$\overline{R(L)} = N(L^*)^\perp, \qquad \overline{R(L^*)} = N(L)^\perp, \tag{18}$$

where $\overline{R(L)}$ denotes the closure of R(L). In other words, by investigating R(L*) we are led to a determination of the subspace of visible objects, while the investigation of N(L*) provides information about the subspace of exact images. These properties will be used in some of the examples of inverse problems discussed in succeeding sections. The relations (18) imply the following decompositions of the spaces X and Y:

$$X = N(L) \oplus \overline{R(L^*)}, \qquad Y = N(L^*) \oplus \overline{R(L)}.$$
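The notions of visible and invisible components, and the adjoint relation (16), can be tried out in a finite-dimensional sketch. The 2 x 3 matrix L below, the object f and the vector g are hypothetical choices made only for illustration; in this setting L* is simply the transpose and all ranges are closed, so none of the infinite-dimensional subtleties discussed in the text appear.

```python
# Hypothetical instrument: each datum is the sum of two neighbouring samples.
L = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]

def apply(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

L_adj = transpose(L)          # adjoint of a real matrix

# N(L) is spanned by n = (1, -1, 1): both sums of neighbours vanish.
n = [1.0, -1.0, 1.0]

f = [3.0, 1.0, 2.0]           # an arbitrary object
coeff = dot(f, n) / dot(n, n)
invisible = [coeff * ni for ni in n]                   # projection onto N(L)
visible = [fi - vi for fi, vi in zip(f, invisible)]    # component in N(L)-perp

image_f = apply(L, f)
image_visible = apply(L, visible)
image_invisible = apply(L, invisible)

# Adjoint relation (16): (Lf, g)_Y = (f, L*g)_X for any g in Y.
g = [1.0, -2.0]
lhs = dot(image_f, g)
rhs = dot(f, apply(L_adj, g))

print("image of f:                  ", image_f)
print("image of visible component:  ", image_visible)
print("image of invisible component:", image_invisible)
print("(Lf, g) =", lhs, "  (f, L*g) =", rhs)
```

The object f and its visible component produce the same image, so the data cannot distinguish between them; the visible component itself lies in the range of L*, in agreement with the second of the relations (18).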

We discuss now in brief two important examples of linear operators whose range is not closed. The first is that of a compact operator (Balakrishnan, 1976) whose range is not finite-dimensional (i.e., we exclude the case of finite rank operators). An example of a compact operator is provided by the integral operator (7) when the kernel K(x, y) is a square-integrable function of the two variables x, y. More precisely, in this case L is a Hilbert-Schmidt integral operator from $L^2(a,b)$ into $L^2(c,d)$.

We consider the general case where the solution and data spaces do not coincide (for example, in Eq. (7) the intervals [a, b] and [c, d] do not coincide) and introduce the singular value decomposition of L. We then indicate the relationship with the well-known spectral representation of a compact, self-adjoint operator. The operators L*L and LL* are compact, nonnegative operators in X and Y, respectively. Moreover, N(L*L) = N(L) and N(LL*) = N(L*). It follows (Balakrishnan, 1976) that both operators admit a countably infinite set of positive eigenvalues. It is also easy to prove that L*L and LL* have exactly the same positive eigenvalues with the same finite multiplicity (Lanczos, 1961; Kato, 1966). If we denote these eigenvalues by $\sigma_k^2$ and count each eigenvalue as many times as required by its multiplicity, the $\sigma_k^2$ can be ordered in such a way as to form a non-increasing sequence, $\sigma_k^2 \ge \sigma_{k+1}^2$ for every k. The compactness of L implies that

$$\lim_{k \to +\infty} \sigma_k^2 = 0.$$

Now let $u_k$ and $v_k$ be the eigenfunctions of L*L and LL*, respectively, associated with the same eigenvalue $\sigma_k^2$:

$$L^*L u_k = \sigma_k^2 u_k, \qquad LL^* v_k = \sigma_k^2 v_k. \tag{21}$$

Then the $u_k$ form an orthonormal basis in $N(L)^\perp$, i.e., the subspace of visible objects, while the $v_k$ form an orthonormal basis in $N(L^*)^\perp$, i.e., the closure of the subspace of the exact images. As is easily verified, it is always possible to choose the pair $\{u_k, v_k\}$ in such a way that it is a solution of the following shifted eigenvalue problem

$$L u_k = \sigma_k v_k, \qquad L^* v_k = \sigma_k u_k. \tag{22}$$
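Equations (21) and (22) can be verified by hand on a very small matrix. The sketch below is an illustration only: the 2 x 3 matrix L is a hypothetical example, LL* is then a symmetric 2 x 2 matrix whose eigenvalues are available in closed form, and each $u_k$ is built from $v_k$ through the second relation in (22).

```python
import math

L = [[1.0, 1.0, 0.0],      # hypothetical 2 x 3 operator
     [0.0, 1.0, 1.0]]

def apply(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def normalise(x):
    size = math.sqrt(sum(xi * xi for xi in x))
    return [xi / size for xi in x]

L_adj = transpose(L)

# Entries of the symmetric 2 x 2 matrix LL* = [[a, b], [b, d]].
a = sum(x * x for x in L[0])
b = sum(x * y for x, y in zip(L[0], L[1]))
d = sum(y * y for y in L[1])

# Closed-form eigenvalues of LL*, largest first.
mean = 0.5 * (a + d)
rad = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
eigenvalues = [mean + rad, mean - rad]

singular_system = []
for lam in eigenvalues:
    v = normalise([b, lam - a])        # eigenvector of LL* (valid since b != 0)
    sigma = math.sqrt(lam)
    u = [x / sigma for x in apply(L_adj, v)]   # second relation in (22)
    singular_system.append((sigma, u, v))

residuals = []
for sigma, u, v in singular_system:
    Lu = apply(L, u)
    LstarLu = apply(L_adj, Lu)
    # First relation in (22): L u = sigma v.
    r22 = max(abs(p - sigma * q) for p, q in zip(Lu, v))
    # First relation in (21): L*L u = sigma^2 u.
    r21 = max(abs(p - sigma ** 2 * q) for p, q in zip(LstarLu, u))
    residuals.append((r22, r21))
    print("sigma =", sigma, " u =", u, " v =", v)
```

For this matrix the eigenvalues of LL* are 3 and 1, and the same two values are the positive eigenvalues of the 3 x 3 matrix L*L, whose third eigenvalue is zero and corresponds to the null space of L.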

The positive numbers $\sigma_k$ are called the singular values of L and the functions $u_k$, $v_k$ the corresponding singular functions. The set of triples $\{\sigma_k; u_k, v_k\}$ is the singular system of L. Then one can prove the following representation (Kato, 1966)

$$Lf = \sum_{k} \sigma_k (f, u_k)_X\, v_k, \tag{23}$$

the singular value decomposition of the compact operator L. A similar representation holds true for the operator L*, obtained from Eq. (23) by interchanging the role of the singular functions $u_k$ and $v_k$.

The singular value decomposition (23) implies that the visible components of the object corresponding to small singular values make a small contribution to the corresponding components of the exact image. Therefore, in the case of a noisy image, these components can also be invisible in practice. Moreover, Eq. (23) implies that an image g is an exact image, i.e., $g \in R(L)$, if and only if the following conditions are satisfied

These conditions are called Picard's conditions (Nashed, 1976) for the existence of the solution of Eq. (5), since they were first derived by Picard for Fredholm integral equations of the first kind (Picard, 1910). They show that R(L) is not closed, since an arbitrary function satisfying the first of the conditions (24) does not necessarily satisfy the second one. It is evident, however, that R(L) is dense in $N(L^*)^\perp$. If L is self-adjoint and non-negative (the integral operator (7) is self-adjoint when [a, b] = [c, d] and also K(x, y)* = K(y, x)), we denote by $\lambda_k$ its positive eigenvalues and by $u_k$ the corresponding eigenfunctions. Then we have $\sigma_k = \lambda_k$, $v_k = u_k$, and Eq. (23) becomes the spectral representation of the operator L.

The second example is that of a convolution operator in $L^2(R^n)$,

$$(Lf)(x) = \int_{R^n} K(x - y) f(y)\, dy, \qquad x \in R^n, \tag{25}$$

where $x = \{x_1, x_2, \ldots, x_n\}$ and $dy = dy_1\, dy_2 \cdots dy_n$. If the impulse response function K(x) is integrable, then its Fourier transform $\hat K(\xi)$ is bounded and continuous and from the Riemann-Lebesgue theorem (Titchmarsh, 1948), it also follows that $|\hat K(\xi)| \to 0$ as $|\xi| \to +\infty$. As a consequence, if $\Omega$ is the set of points such that $|\hat K(\xi)| \ne 0$ (the closure of $\Omega$ is the support of $\hat K(\xi)$), we have

$$m = \inf\{ |\hat K(\xi)| \mid \xi \in \Omega \} = 0. \tag{26}$$

When $\Omega$ is a bounded subset of $R^n$, the function K(x) is said to be band-limited. In this case, the set of invisible objects is the set of functions whose Fourier transform is zero over $\Omega$. Moreover, the set of exact images is contained in the subspace of functions whose Fourier transform is zero outside $\Omega$. More precisely, if g is an exact image, by the well-known convolution theorem of the Fourier transform,

$$\hat g(\xi) = \hat K(\xi) \hat f(\xi), \tag{27}$$

and therefore $g \in R(L)$ if and only if

These conditions, which are analogous to the conditions (24), combined with the property (26) of the kernel, imply that R(L) is not closed. Notice that this is not true when the quantity m given by Eq. (26) is strictly positive. An example is provided by the band-limiting operator to be discussed in the following section. In this case, K(x) is not integrable.

B. Inverse Source Problems

An inverse source problem can be defined, in general, as the problem of determining the constitution of a source from measured values of the emitted radiation. A specific example is the determination of the current distribution of an antenna from knowledge of its radiation pattern. Here we illustrate the main features of these problems by discussing the simplest case, a scalar source generating a scalar field. The more general problem of determining a charge-current distribution is investigated by Devaney and Wolf (1973), Bleistein and Cohen (1977), and Hoenders (1978) with particular attention to the question of uniqueness. The basic equation is the inhomogeneous Helmholtz equation:

$$\Delta u + k^2 u = -4\pi f, \tag{29}$$

where $k = 2\pi/\lambda$ is the wave number of the emitted radiation. The function f(r), the source density function, can be assumed to be zero outside some bounded region D. Without loss of generality, we can also assume that D is a sphere whose radius a is known. Then the direct problem is the determination of a field amplitude u(r) in the entire three-dimensional space satisfying Eq. (29) and also the Sommerfeld radiation condition

$$\lim_{r \to \infty} r\left( \frac{\partial u}{\partial r} - iku \right) = 0, \qquad r = |\mathbf r|, \tag{30}$$

i.e., u(r) has to represent at infinity an outgoing spherical wave. As is well known, if $f \in L^2(D)$, then there exists a unique continuous solution of this problem given by

$$u(\mathbf r) = \int_D G(\mathbf r - \mathbf r') f(\mathbf r')\, d\mathbf r',$$

where

$$G(\mathbf r) = e^{ikr}/r. \tag{32}$$

Moreover, the behaviour of this solution at infinity is given as

$$u(\mathbf r) = g(\mathbf s)\, G(\mathbf r)\, [1 + O(r^{-1})]. \tag{33}$$

Here $\mathbf s = \mathbf r/r = \{\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta\}$ and the function

is the radiation pattern. The inverse problem is now the determination of the source distribution f(r), given the radiation pattern g(s), and this implies the solution of a Fredholm integral equation of the first kind. Equation (34) has the same structure as Eq. (5) if we introduce the integral operator

$$(Lf)(\mathbf s) = \int_D e^{-ik(\mathbf s \cdot \mathbf r')} f(\mathbf r')\, d\mathbf r'. \tag{35}$$

The operator L transforms a function $f \in L^2(D)$ into a function defined on the unit sphere $S^2 \subset R^3$. Let us assume that the data space Y is the space $L^2(S^2)$ of functions which are square-integrable with respect to Lebesgue measure on $S^2$, $d\mathbf s = \sin\theta\, d\theta\, d\phi$. Then it is easy to show that the operator $L: L^2(D) \to L^2(S^2)$ is compact and that its adjoint is given by

$$(L^*g)(\mathbf r) = \int_{S^2} e^{ik(\mathbf r \cdot \mathbf s)} g(\mathbf s)\, d\mathbf s. \tag{36}$$

We first notice that $R(L) \subset L^2(S^2)$ is a subspace of analytic functions since, as follows from Eq. (34), any exact image g(s) is the restriction to the surface of a sphere of radius k (Ewald sphere) of the Fourier transform of a function with bounded support. Moreover, the null space N(L) can also be easily characterized. The radiation pattern is identically zero if and only if the Fourier transform of f is zero on the surface of the Ewald sphere. Since it is possible to prove (Devaney and Wolf, 1974; Hoenders, 1978) that the emitted field outside D is uniquely determined by the radiation pattern (this problem will be considered in the next section), it follows that any $f \in N(L)$ produces a radiation field which is identically zero outside the source region D. In other words, N(L), the subspace of invisible objects, is also the subspace of the nonradiating sources.

Given an arbitrary source distribution f, this can be uniquely decomposed into a component in N(L) (the invisible component) and a component orthogonal to N(L) (the visible component). Using the second of the relations (18), the component orthogonal to N(L), i.e., the component which contributes to the radiation pattern, can be characterized by investigating the range of the operator L*, as given by Eq. (36) (Bertero and De Mol, 1981a). If we use the well-known expansion of a plane wave into spherical

harmonics

a,= 4n(

(39)

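from Eq. (36) we obtain

$$(L^*g)(\mathbf r) = \sum_{l=0}^{+\infty} \sum_{m=-l}^{l} \sigma_l \left( \int_{S^2} g(\mathbf s')\, Y_{lm}(\mathbf s')^*\, d\mathbf s' \right) u_{lm}(\mathbf r). \tag{40}$$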

In conclusion, the subspace of visible sources is the closed subspace spanned by the functions $u_{lm}(\mathbf r)$ of Eq. (38). Moreover, the set of triples $\{\sigma_l; u_{lm}, v_{lm}\}$ is just the singular system of the compact operator L, as follows from the relations

$$L u_{lm} = \sigma_l v_{lm}, \qquad L^* v_{lm} = \sigma_l u_{lm}, \tag{41}$$

which are easily obtained from Eqs. (35), (36), and (37) using the orthogonality properties of the spherical harmonics. We finally remark that the singular values $\sigma_l$ tend to zero exponentially fast when l > ka, indicating strong ill-posedness of the problem of determining the visible component of the source distribution.

C. Inverse Diffraction Problems

The problem of inverse diffraction can be defined as the problem of determining the field distribution on a boundary surface from the knowledge of the field distribution on a surface situated within the domain where the wave propagates (Shewell and Wolf, 1968). More precisely, consider a scalar field u in a region exterior to a surface $\Sigma_0$ which is the boundary of the region where the sources or the scatterers are situated. Then u is a solution, in the exterior region, of the homogeneous Helmholtz equation

$$\Delta u + k^2 u = 0 \tag{42}$$

and it also satisfies the Sommerfeld radiation condition (30) at infinity (in the case of a scattering problem, this condition is satisfied by the scattered wave). The problem of inverse diffraction is now the problem of determining the field distribution on $\Sigma_0$ from knowledge of the field distribution on another surface $\Sigma_1$ belonging to the exterior region. In particular, when the surface $\Sigma_1$ is a sphere of large radius surrounding $\Sigma_0$, the data consist of the radiation pattern or the scattering amplitude, and the problem is called inverse diffraction from far-field data.

A general discussion of the uniqueness of the solution of the problem of inverse diffraction is given by Hoenders (1978) both in the scalar and in the vector case. The uniqueness of the problem of inverse diffraction from far-field data was proved by Devaney and Wolf (1974) for electromagnetic fields. The scalar problem formulated above can be easily solved when the surfaces $\Sigma_0$ and $\Sigma_1$ are circular cylinders with the same centre (Cabayan et al., 1973), spheres with the same centre (Hoenders, 1978) or parallel planes (Shewell and Wolf, 1968). Here we sketch the treatment of inverse diffraction from far-field data and the treatment of inverse plane-to-plane diffraction.

1. Inverse Diffraction from Far-Field Data

In the problem of inverse diffraction from far-field data, we can assume, without loss of generality, that $\Sigma_0$ is the surface of a sphere of radius a (for example, the sphere containing the source distribution introduced in the discussion of the problem of Section II,B). Then we denote by f(s) the field distribution on $\Sigma_0$. The direct problem is the determination of the solution u(r, s) of Eq. (42) in the region r > a satisfying the Sommerfeld condition (30) at infinity and also the boundary condition u(a, s) = f(s). This is a typical exterior problem for the homogeneous Helmholtz equation. If we represent f(s) in terms of spherical harmonics

where

$$c_{lm} = \int_{S^2} f(\mathbf s)\, Y_{lm}(\mathbf s)^*\, d\mathbf s, \tag{44}$$

then the solution u(r,s) of the direct problem is given by

where the $h_l^{(1)}(kr)$ are spherical Hankel functions of the first kind. Notice that $|h_l^{(1)}(x)|$ is never zero for real values of x, and therefore the solution of the direct problem (for fixed r) always exists, is unique, and depends continuously on f in the norm of $L^2(S^2)$. Now using the well-known asymptotic behaviour of the spherical Hankel functions

$$h_l^{(1)}(kr) = (-i)^{l+1}\, \frac{e^{ikr}}{kr}\, [1 + O(r^{-1})] \tag{46}$$

from Eq. (33), which defines the radiation (diffraction) pattern g(s), we derive the following integral relationship between g(s) and f(s),

where

$$K(\mathbf s, \mathbf s') = k^{-1} \sum_{l=0}^{\infty} \sum_{m=-l}^{l} (-i)^{l+1}\, [h_l^{(1)}(ka)]^{-1}\, Y_{lm}(\mathbf s)\, Y_{lm}(\mathbf s')^*. \tag{48}$$

Thus, the solution of the inverse diffraction problem from far-field data has been reduced to the solution of a Fredholm integral equation of the first kind. It is also easy to recognize that Eq. (48) is the spectral representation of the kernel. As a consequence the corresponding integral operator L is compact in $L^2(S^2)$, its eigenvalues are given by $\lambda_l = k^{-1}(-i)^{l+1}/h_l^{(1)}(ka)$, and the corresponding eigenfunctions are just the spherical harmonics. Since the latter form a basis in $L^2(S^2)$ and all the eigenvalues are nonzero, it follows that N(L) = {0}, and therefore the solution of Eq. (47) is unique. This is the uniqueness result already mentioned in Section II,B. However, since $|\lambda_l| \to 0$ as $l \to +\infty$, the solution does not depend continuously on the data. As a consequence, even if uniqueness holds true, the information content of the far field is poorer than the information content of the near field. This result is obvious on the basis of elementary physical considerations.

We notice also that, when the source or the scatterer is interior to a sphere of radius a, it is possible to determine the field up to the surface of the source or scatterer by means of suitable Bessel function expansions that provide a sort of analytic continuation of the field. This technique is described by Bertero et al. (1980a) and has been applied to the problem of determining the shape of perfectly conducting bodies (Imbriale and Mittra, 1970).

2. Inverse Plane-to-Plane Diffraction

We consider now inverse plane-to-plane diffraction (Shewell and Wolf, 1968). Stability results for this problem are given by Bertero and De Mol (1981b) and Magnanini and Papi (1985). An application of this problem was investigated by Sondhi (1969). The direct problem is the following. Find in the half-space z > 0 a solution u(r) = u(x, y, z) of Eq. (42) satisfying the Sommerfeld radiation condition (30) at infinity and taking given boundary values on the plane z = 0. If we denote by $f(\rho)$, $\rho = \{x, y\}$, the boundary field distribution, and by $g_a(\rho) = u(x, y, a)$ the field distribution on the plane z = a, then the solution of the direct problem is given by

$$g_a(\rho) = \int G_1(\rho - \rho', a)\, f(\rho')\, d\rho', \tag{49}$$


where

and G(r) is defined in Eq. (32) (Luneburg, 1964). The inverse diffraction problem is now the determination of the boundary distribution $f(\rho)$ from knowledge of the field distribution $g_a(\rho)$ on the plane z = a > 0. According to Eq. (49), this corresponds to the inversion of a convolution operator, such as the operator (25), a problem which has been treated by means of the Fourier transform. Notice that the representation of the field on the plane z = a in terms of its Fourier transform is also called its representation in the form of an angular spectrum of plane waves (Shewell and Wolf, 1968). The 2D Fourier transform of the kernel (50) is given by

$$\hat G_1(\xi, z) = e^{izm(\xi)}, \tag{51}$$

where

$$m(\xi) = (k^2 - |\xi|^2)^{1/2}, \quad |\xi| \le k; \qquad m(\xi) = i(|\xi|^2 - k^2)^{1/2}, \quad |\xi| > k, \tag{52}$$

and therefore in this case $\Omega = R^2$. The part of the spectrum $|\xi| \le k$ corresponds to homogeneous waves, while the part $|\xi| > k$ corresponds to inhomogeneous or evanescent waves. Since $|\hat G_1(\xi, z)| \ne 0$ for any finite value of $|\xi|$, we conclude that the solution of Eq. (49) is unique in $L^2(R^2)$. The solution, however, does not exist for arbitrary data and does not depend continuously on the data due to the fast decay of $|\hat G_1(\xi, z)|$ as $|\xi| \to +\infty$, in other words, due to the existence of evanescent waves. We also notice that, when z is very large, $|\hat G_1(\xi, z)|$ is practically zero for $|\xi| > k$. In this case, one can extract from the data only values of $\hat f(\xi)$ in the disc $|\xi| \le k$. Thus, this inverse diffraction problem provides our first example of the general problem of Fourier transform inversion with limited data (see Section II,F).

D. Linear Inverse Scattering Problems

Inverse scattering problems are, in general, nonlinear problems. They have stimulated elegant mathematical studies, some of which were briefly mentioned in the Introduction, though they fall outside the scope of this paper. Under some circumstances, however, it is possible to introduce physical approximations which allow a linearization of the nonlinear problem.


A well-known case is that of a weak scatterer; here the Born approximation may be used. Another kind of approximation that also leads to a linear problem is the Rytov approximation (Chernov, 1967), which is valid when the scale at which the properties of the scatterer fluctuate is large compared to the wavelength $\lambda$ of the incident radiation. A third example is the investigation of dispersed systems by light scattering experiments. The typical case is that of a suspension of spherical particles all of which have the same physical properties (for instance, refraction index) but are different in size. The aim of the experiment is the determination of the distribution function of the particle sizes. In the case of dilute systems, one can neglect multiple scattering and the problem becomes linear.

1. Semi-Transparent Objects

The use of the Born approximation for reconstructing the refraction index of weakly scattering semi-transparent objects has been widely investigated (Wolf, 1969; Devaney, 1978; Hoenders, 1978), and also applied to experimental data processing (Carter, 1970; Carter and Ho, 1974; Fercher et al., 1979). Let n(r) be the refraction index of the object and let $f(\mathbf r) = 1 - n^2(\mathbf r)$. Since we consider a bounded object, situated, for instance, inside the sphere of radius a, f(r) is zero for r > a. Then the total field u (incident plus scattered) is a solution of the wave equation

$$\Delta u + k^2 u = k^2 f(\mathbf r)\, u, \tag{53}$$

where $k = 2\pi/\lambda$ is the wave number (in free space) of the incident radiation. When the incident radiation is a plane wave, $u_0(\mathbf r) = \exp[ik(\mathbf s_0, \mathbf r)]$, the first Born approximation to $u(\mathbf r) = u_0(\mathbf r) + u_s(\mathbf r)$ (incident + scattered radiation) is given by

$$u^{(B)}(\mathbf r) = e^{ik(\mathbf s_0, \mathbf r)} - \frac{k^2}{4\pi} \int G(\mathbf r - \mathbf r')\, e^{ik(\mathbf s_0, \mathbf r')} f(\mathbf r')\, d\mathbf r'. \tag{54}$$

Now, let $u_s^{(B)}(\mathbf r)$ be the second term on the r.h.s. of Eq. (54) (Born approximation of the scattered wave). The 2D Fourier transform of $u_s^{(B)}(\rho, z)$ ($\rho = \{x, y\}$) over any plane of constant z that does not intersect the region containing the object is given by (Wolf, 1969)

$$\hat u_s^{(B)}(\xi, z) = (ik/s_z)\, \exp[iks_z z]\, \hat f[k(\mathbf s - \mathbf s_0)], \tag{55}$$

where on the r.h.s. $\hat f$ denotes the 3D Fourier transform of f and $k\mathbf s = \{\xi_x, \xi_y, (k^2 - \xi_x^2 - \xi_y^2)^{1/2}\}$. It follows from Eq. (55) that knowledge of $\hat u_s^{(B)}(\xi, z)$ is equivalent to knowledge of $\hat f$ on the surface of the Ewald sphere with centre $k\mathbf s_0$ and radius k and, therefore, from the mathematical point of view, this problem is analogous to the inverse source problem discussed in Section II,B.


Moreover, by varying the direction of incidence $\mathbf{s}_0$, a (theoretically infinite) number of experiments would allow one to determine $\hat{f}$ within the sphere with centre at the origin and radius $2k$ (limiting Ewald sphere). In this case, the uniqueness of the reconstruction of $f$ is assured, since $\hat{f}$ is an analytic function thanks to the bounded support of $f$. We notice that we are again dealing with the problem of reconstructing a function from limited knowledge of its Fourier transform. The same mathematical problem must be solved if one applies the Rytov approximation (Devaney, 1981). The basic point is the introduction of the complex phase function

$$\psi(\mathbf{r}) = \ln u(\mathbf{r}) - ik(\mathbf{s}_0, \mathbf{r}). \tag{56}$$

Then the Rytov approximation for the complex phase function is given by the equation (Chernov, 1967)

$$\psi^{(R)}(\mathbf{r}) = e^{-ik(\mathbf{s}_0,\mathbf{r})}\,u_s^{(B)}(\mathbf{r}). \tag{57}$$

It is obvious now that if one takes the 2D Fourier transform of $\exp[ik(\mathbf{s}_0,\mathbf{r})]\,\psi^{(R)}(\mathbf{r})$ over any plane of constant $z$ that does not intersect the scatterer, one again gets values of $\hat{f}$ on the surface of the Ewald sphere. A short discussion of the limits of validity of the two approximations has been given by Devaney (1981).

2. Perfectly Conducting Bodies

In the case of scattering of an electromagnetic wave by a perfectly conducting body, the Born approximation (also known as the Kirchhoff or physical optics approximation) consists in using the values of the incoming field as the magnetic field at the surface of the body (Bojarski, 1966; Lewis, 1969; Hoenders, 1978). This approximation is valid when the wavelength $\lambda$ of the incoming field is small relative to the smallest geometrical details of the surface of the body. Consider an incident plane wave with electric field $\mathbf{E}_i(\mathbf{r}) = \exp[ik(\mathbf{s}_0, \mathbf{r})]\,\mathbf{E}_0$ and assume that the target is a smooth, convex, and bounded body which occupies the domain $D$. The back-scattered field $\mathbf{E}_b(\mathbf{r})$, i.e., the field observed in the direction $\mathbf{s} = -\mathbf{s}_0$, at great distances from the scatterer is given by

$$\mathbf{E}_b(\mathbf{r}) = \rho(\mathbf{k})\,\frac{e^{ikr}}{2\sqrt{\pi}\,r}\,\mathbf{E}_0 + O(r^{-2}), \tag{58}$$

where $\mathbf{k} = k\mathbf{s}_0$. Then in the studies mentioned above it is proved by the Born approximation that

$$\Gamma(\mathbf{k}) \equiv 2\sqrt{\pi}\,k^{-2}\,[\rho(\mathbf{k}) + \rho^*(-\mathbf{k})] = \int \chi(\mathbf{r})\,e^{i(\mathbf{k},\mathbf{r})}\,d\mathbf{r}, \tag{59}$$


where $\chi(\mathbf{r})$ is the characteristic function of the target, i.e., a function which is 1 when $\mathbf{r}$ is in $D$ and zero otherwise. Since, in practice, $\Gamma(\mathbf{k})$ can be measured only for values of $\mathbf{k}$ in a restricted domain (in particular, in radar applications, only for points interior to the annular region $m \le |\mathbf{k}| \le M$, where $m$ and $M$ are related to the minimum and maximum value of the usable frequency band), we again have a problem consisting in inverting a Fourier transform with limited data. A special feature of this problem is the lack of information at low frequencies. For this reason, it has been suggested that the directional derivative of the characteristic function be reconstructed rather than the characteristic function itself (Mager and Bleistein, 1978). In this way, one attenuates low-frequency data while enhancing the effect of high-frequency data.

3. Dispersed Systems

Consider a dilute suspension or a dilute aerosol consisting of spherical particles having the same physical properties (refraction index, etc.) but different sizes. We denote by $f(r)$ the probability density of the particle size distribution. Moreover, we denote by $Q(p, r)$ the measured scattering pattern in the case where all the particles have the same radius $r$ (monodisperse system). This pattern, in general, is a function of the radius $r$ and the measured scattering variable $p$. For example, in photon correlation spectroscopy (Cummins and Pike, 1974) $Q(p, r)$ is the correlation function of the scattered field, where $p$ is the correlation time; in Fraunhofer diffraction (Van de Hulst, 1981) it is the small-angle differential cross-section, where $p$ is the scattering angle; or it may be the absorption coefficient as a function of the wavelength of the incident radiation (Van de Hulst, 1981). Then, by neglecting multiple scattering, for a polydispersed system with particle size probability $f(r)$ the scattering pattern is, according to Shifrin and Perelman (1965),

$$g(p) = \int_0^{+\infty} Q(p, r)\,f(r)\,dr. \tag{60}$$

In several important cases, $Q(p, r)$ depends only on the product of the two variables, i.e., $Q(p, r) = K(pr)$, and the Fredholm integral equation of the first kind (60) becomes

$$g(p) = \int_0^{+\infty} K(pr)\,f(r)\,dr. \tag{61}$$

For example, in polydispersity analysis by photon correlation spectroscopy (Cummins and Pike, 1974), $K(x) = \exp(-x)$, and the solution of Eq. (61)


involves the inversion of the Laplace transform, a problem considered in Section II,G. (In this case, however, the variable $r$ is not the radius of the particles but their translational diffusion coefficient, which is proportional to the inverse of the radius of the particles.) The integral equation (61) is also a satisfactory approximation for Fraunhofer diffraction and in extinction experiments when using the anomalous diffraction approximation (Shifrin and Perelman, 1965). The integral equation (61) has the form (5) with

$$(Lf)(p) = \int_0^{+\infty} K(pr)\,f(r)\,dr. \tag{62}$$

We assume, for simplicity, that the data $g(p)$ are measured for all values of $p$ in $(0, +\infty)$, and consider $L$ as a linear operator in $L^2(0, +\infty)$. Then the integral equation (61) can be investigated using the spectral representation of $L$ in terms of generalized eigenvalues and eigenfunctions (McWhirter and Pike, 1978). An equivalent approach is based on the use of the Mellin transform which, for a square-integrable function, is defined by (Titchmarsh, 1948)

$$(\mathcal{M}f)(\xi) = \int_0^{+\infty} r^{-1/2 + i\xi}\,f(r)\,dr. \tag{63}$$

The transform $\mathcal{M}$ is an isometry from $L^2(0, +\infty)$ into $L^2(-\infty, +\infty)$,

$$\int_0^{+\infty} |f(r)|^2\,dr = \frac{1}{2\pi}\int_{-\infty}^{+\infty} |(\mathcal{M}f)(\xi)|^2\,d\xi, \tag{64}$$

and the following inversion formula holds true:

$$f(r) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} (\mathcal{M}f)(\xi)\,r^{-1/2 - i\xi}\,d\xi. \tag{65}$$

Now, taking the Mellin transform of both sides of Eq. (61) yields

$$(\mathcal{M}g)(\xi) = (\mathcal{M}K)(\xi)\,(\mathcal{M}f)(-\xi). \tag{66}$$

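We find that the operator $L$ is a continuous self-adjoint operator in $L^2(0, +\infty)$ if $K(x)$ is real and satisfies the condition

$$\int_0^{+\infty} x^{-1/2}\,|K(x)|\,dx < +\infty. \tag{67}$$

Moreover, the operator $L$ is invertible if and only if the support of $(\mathcal{M}K)(\xi)$ coincides with $(-\infty, +\infty)$. Then from Eqs. (65) and (66), we derive that

$$f(r) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \frac{(\mathcal{M}g)(-\xi)}{(\mathcal{M}K)(-\xi)}\,r^{-1/2 - i\xi}\,d\xi. \tag{68}$$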


Condition (67), however, implies that $|(\mathcal{M}K)(\xi)| \to 0$ as $|\xi| \to +\infty$ (this is again a consequence of the Riemann-Lebesgue theorem), and therefore $L^{-1}$ is not continuous in $L^2(0, +\infty)$.

E. Radon Transform Inversion and Tomography

A paper devoted to inverse problems could hardly neglect the subject of Radon transform inversion, a subject basic in several fundamental areas, such as diagnostic radiology (computerized tomography), radio astronomy, and electron microscopy. On the other hand, it is impossible to give even a short account of the explosive development of this field, a field which has seen the most spectacular applications of the theory of inverse problems. Therefore, we limit ourselves to showing the connection between Radon transform inversion and the general formulation given above. The interested reader is referred to books on mathematical aspects (Herman and Natterer, 1981; Natterer, 1986a) and on applications as well as computational methods (Herman, 1979; Herman, 1980; Sabatier, 1987b).

The problem of Radon transform inversion, also called object reconstruction from projections, and the related problem of Abel transform inversion, are two examples of linear inverse scattering problems that arise when the variations of dynamical functions over a given wavelength are so small that diffraction can be neglected. A geometrical optics description of the process is then possible and, in several instances, it is also possible to assume straight line ray propagation. These approximations are adequate especially at X-ray wavelengths.

The $n$-dimensional Radon transform $R$ maps a function $f(\mathbf{x})$ defined on $\mathbb{R}^n$ into the set of its integrals over the hyperplanes of $\mathbb{R}^n$. Therefore, if $S^{n-1}$ is the surface of the unit sphere in $\mathbb{R}^n$ and $\theta \in S^{n-1}$, $s \in \mathbb{R}^1$, the integrals of $f$ over the hyperplanes perpendicular to $\theta$ with signed distance $s$ from the origin are given by

$$(Rf)(\theta, s) = \int_{(\mathbf{x},\theta) = s} f(\mathbf{x})\,d\mathbf{x}, \tag{69}$$

where $(\mathbf{x}, \theta)$ is the scalar product in $\mathbb{R}^n$. The operator $R$ defines the Radon transformation and the inverse problem is the solution of the equation $g = Rf$ for given $g$. A condensed discussion of related transforms (e.g., X-ray transform, divergent beam transform, and attenuated Radon transform) is given by Louis and Natterer (1983). Another related problem is diffraction tomography, a field created from the solution of the scattering problem of the wave equation through the use of the Rytov approximation (Mueller et al., 1979; Devaney, 1982, 1984; Natterer, 1986a).


We focus now on the 2D case, in which the Radon transform and the X-ray transform coincide, since integrals over hyperplanes are simply integrals over straight lines. Equation (69) can also be written in the following form, where $\theta^{\perp}$ is the unit vector orthogonal to $\theta$:

$$(Rf)(\theta, s) = \int_{-\infty}^{+\infty} f(s\theta + t\theta^{\perp})\,dt. \tag{70}$$

Another notation which is often used is the following:

$$(Rf)(\theta, s) = (R_\theta f)(s), \tag{71}$$

and the function $R_\theta f$, with fixed $\theta$, is called the projection of $f$. This is the origin of the name, object reconstruction from projections, often used as a synonym of Radon transform inversion. One of the basic properties of the Radon transform, which also clarifies the information it provides about the function $f$, is the projection slice theorem (or projection theorem or Fourier slice theorem; for a proof see Natterer, 1986a), i.e., the following relation between Fourier transforms:

$$(R_\theta f)^{\wedge}(\xi) = \hat{f}(\xi\theta), \tag{72}$$

where $(R_\theta f)^{\wedge}$ is the 1D Fourier transform of $R_\theta f$ while $\hat{f}$ is the 2D Fourier transform of $f$. The meaning of this result is clear: knowledge of $R_\theta f$ is equivalent to knowledge of $\hat{f}$ along the straight line parallel to $\theta$ and passing through the origin. Several results concerning the uniqueness of the solution can be immediately deduced from Eq. (72). If $R_\theta f$ is known for all values of $\theta$, then $\hat{f}$ is known everywhere and therefore $f$ is uniquely determined. On the other hand, if $R_\theta f$ is known only for values of $\theta$ in a subset of the half-circle (problem of limited angle tomography), the solution of the problem is, in general, not unique. But when the function $f$ has a bounded support, its Fourier transform is an analytic function and therefore can be uniquely recovered from its values in a finite sector. We have again the problem of restoring a function $f$ from limited values of its Fourier transform.

An explicit inversion formula for the transform (69) was already obtained by Radon (1917), as we recalled in the Introduction. Here we sketch an approach which is the basis of the algorithms currently used in practical applications. If we introduce the formal adjoint $R^{\#}$ of the Radon transform, also called the back projection operator,

$$(R^{\#}g)(\mathbf{x}) = \int_{S^1} g(\theta, (\mathbf{x}, \theta))\,d\theta, \tag{73}$$

then the equation $g = Rf$ can be replaced by $R^{\#}g = R^{\#}Rf$. By taking the 2D


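Fourier transform of both sides of this equation, we have (Natterer, 1986)

$$(R^{\#}Rf)^{\wedge}(\boldsymbol{\xi}) = 4\pi\,|\boldsymbol{\xi}|^{-1}\,\hat{f}(\boldsymbol{\xi}). \tag{74}$$

Then, if we introduce the operator $\Lambda$ defined by

$$(\Lambda f)^{\wedge}(\boldsymbol{\xi}) = |\boldsymbol{\xi}|\,\hat{f}(\boldsymbol{\xi}), \tag{75}$$

we obtain $(\Lambda R^{\#}Rf)^{\wedge} = 4\pi\hat{f}$, and this equation, combined with the original equation $Rf = g$, yields the inversion formula

$$f = (4\pi)^{-1}\,\Lambda R^{\#}g. \tag{76}$$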
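The inversion formula (76) can be checked numerically on a simple test object. The sketch below is an illustration under stated assumptions, not the implementation used in any particular scanner: it takes $f(\mathbf{x}) = \exp(-|\mathbf{x}|^2)$, whose projections are $\sqrt{\pi}\exp(-s^2)$ for every $\theta$, and it applies the $|\sigma|$ filter to each projection before back projection (the usual ordering in practice, which for $n = 2$ is equivalent to applying $\Lambda$ after $R^{\#}$). The function names, grid sizes, and test points are hypothetical choices.

```python
import numpy as np

def ramp_filter(g, ds):
    """Multiply the 1D Fourier transform of a projection g (step ds) by |sigma|."""
    sigma = 2.0 * np.pi * np.fft.fftfreq(g.size, d=ds)
    return np.fft.ifft(np.abs(sigma) * np.fft.fft(g)).real

def fbp_point(x, s, q, n_angles=720):
    """Evaluate (4*pi)^{-1} (R^# q)(x), integrating over theta in S^1 by the rectangle rule."""
    phi = 2.0 * np.pi * np.arange(n_angles) / n_angles
    s_vals = x[0] * np.cos(phi) + x[1] * np.sin(phi)      # scalar product (x, theta)
    back_projection = np.interp(s_vals, s, q).sum() * (2.0 * np.pi / n_angles)
    return back_projection / (4.0 * np.pi)

# Projections of the Gaussian test object; they do not depend on theta.
s = np.linspace(-40.0, 40.0, 8192, endpoint=False)
ds = s[1] - s[0]
g = np.sqrt(np.pi) * np.exp(-s**2)
q = ramp_filter(g, ds)                                    # filtered projection

points = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.3, -0.8])]
recon = np.array([fbp_point(x, s, q) for x in points])
exact = np.array([np.exp(-(x @ x)) for x in points])
print(np.abs(recon - exact).max())
```

The wide interval in $s$ is deliberate: the filtered projection decays only like $s^{-2}$, so a short periodic grid would introduce a visible constant offset in the reconstruction.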

This formula is the basis of the filtered back projection method, the algorithm which is most frequently used in practice. In the previous computations we have not specified the object space $X$ or image space $Y$. It is always convenient, and also reasonable in practice, to assume that the unknown function $f$ has a bounded support. Several papers have also considered the case where both $X$ and $Y$ are Sobolev spaces. Then it is possible to prove that with an appropriate choice of these Sobolev spaces, the operator $R$, Eq. (69), is continuous and has a continuous inverse (Natterer, 1980; Louis and Natterer, 1983). The continuity of $R^{-1}$, however, is only obtained when the functions of the data space are smoother than the functions of the solution space. As we have already discussed in the Introduction, this choice may not be reasonable in practice due to the effect of noise on the data. Therefore, Radon transform inversion is an ill-posed problem when the data space is, for example, a space of square-integrable functions. The ill-posedness is essentially related to the fact that the operator $\Lambda$ in Eq. (75) is not continuous in $L^2$. We also mention that for functions $f$ with bounded support, the Radon transform $R$ defines a compact operator if $X$ and $Y$ are suitable weighted $L^2$ spaces. Then the singular system of $R$ has been explicitly determined for arbitrary values of $n$ (Davison, 1981; Louis, 1984), while for the limited angle problem the singular system has been determined only in the case $n = 2$ (Louis, 1986).

F. Fourier Transform Inversion with Limited Data

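We have seen in the previous sections that several inverse problems can be reduced to the same basic problem of determining a function $f(\mathbf{x})$ defined on $\mathbb{R}^n$ from knowledge of its Fourier transform $\hat{f}(\boldsymbol{\xi})$ on a bounded domain $\Omega \subset \mathbb{R}^n$. Therefore, if we denote by $g(\boldsymbol{\xi})$ the known (in general, noisy) values of $\hat{f}$, we must solve the problem

$$\hat{f}(\boldsymbol{\xi}) = g(\boldsymbol{\xi}), \quad \boldsymbol{\xi} \in \Omega. \tag{77}$$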


A quite natural framework for dealing with this problem is to let $X = L^2(\mathbb{R}^n)$ and $Y = L^2(\Omega)$. Then it is obvious that the solution of the problem is not unique. The null space of the operator is the set of all functions $f$ whose Fourier transform is zero over $\Omega$. Uniqueness, however, holds true when the function $f$ has a given bounded support $D \subset \mathbb{R}^n$. In this case, one can take $X = L^2(D)$. Then the Fourier transform of any $f \in L^2(D)$ is analytic, and therefore can be uniquely determined from its values on a suitable infinite set of points. The problem, however, is ill-posed because the existence and continuous dependence of the solution on the data do not, in general, hold true.

In the present section we treat essentially the one-dimensional case using the basic results of Slepian and coworkers (Slepian and Pollak, 1961; Landau and Pollak, 1961; 1962). The extension to several dimensions may be accomplished along the lines indicated by Slepian (1964). We assume that the values of the Fourier transform of $f \in L^2(-\infty, +\infty)$ are given in the interval $[-c, c]$ and note that Eq. (77) then has the form (5) if the operator $L$ is defined as follows:

$$(Lf)(\xi) = \int_{-\infty}^{+\infty} e^{-i\xi x}\,f(x)\,dx, \quad |\xi| \le c. \tag{78}$$

We assume also that the data space $Y$ is $L^2(-c, c)$, and define the norm of $g$ as its $L^2$-norm divided by $\sqrt{2\pi}$, i.e., $\|g\|_Y^2 = (2\pi)^{-1}\int_{-c}^{c}|g(\xi)|^2\,d\xi$. Then the adjoint operator is given by

$$(L^*g)(x) = \frac{1}{2\pi}\int_{-c}^{c} e^{ix\xi}\,g(\xi)\,d\xi. \tag{79}$$

In this particular case, the null space of $L$ is the set of all $f \in L^2(-\infty, +\infty)$ whose Fourier transform is zero on the interval $[-c, c]$, and therefore, as already pointed out, the solution of the problem is not unique. We also notice that the operator $LL^*$ is just the identity operator in $L^2(-c, c)$, while the operator $L^*L$ is the band-limiting operator $B_c$ given by

$$(B_c f)(x) = \int_{-\infty}^{+\infty} (c/\pi)\,\mathrm{sinc}[c(x - y)/\pi]\,f(y)\,dy, \tag{80}$$

where the following standard notation has been used:

$$\mathrm{sinc}(x) = \sin(\pi x)/(\pi x). \tag{81}$$

In fact, the operator $B_c$ transforms a function $f \in L^2(-\infty, +\infty)$ into a function whose Fourier transform coincides with the Fourier transform of $f$ over the interval $[-c, c]$ and which is zero elsewhere. It follows that $B_c$ is the projection operator onto the subspace of band-limited functions with bandwidth $c$. Such a subspace of entire functions, which is the subspace of visible objects,


is closed with respect to the $L^2$-norm and therefore is itself a Hilbert space, called the Paley-Wiener space, and denoted $PW_c$. We recall that for any function $f \in PW_c$ the following Whittaker-Shannon expansion theorem (sampling theorem) holds true:

$$f(x) = \sum_{n=-\infty}^{+\infty} f(x_n)\,\mathrm{sinc}[c(x - x_n)/\pi], \tag{82}$$

where $x_n = n\pi/c$, $n = 0, \pm 1, \ldots$ The distance $\pi/c$ between adjacent sampling points is usually called the Nyquist sampling distance. As we have already remarked, uniqueness holds true when the function $f$ has bounded support, say interior to the interval $[-1, 1]$. Then, if we take $X = L^2(-1, 1)$, the operator (78) is replaced by the operator

$$(Lf)(\xi) = \int_{-1}^{1} e^{-i\xi x}\,f(x)\,dx, \quad |\xi| \le c. \tag{83}$$

It is easy to recognize that this is a compact operator from $L^2(-1, 1)$ into $L^2(-c, c)$. The determination of its singular system can be reduced to the solution of the eigenvalue problem of the operator $L^*L$ given by

$$(L^*Lf)(x) = \int_{-1}^{1} (c/\pi)\,\mathrm{sinc}[c(x - y)/\pi]\,f(y)\,dy, \quad |x| \le 1. \tag{84}$$

This is the integral operator investigated by Slepian and coworkers. Its eigenfunctions, denoted $\psi_k(c, x)$, $k = 0, 1, 2, \ldots$, are called prolate spheroidal wavefunctions (PSWF). The corresponding eigenvalues $\lambda_k$, ordered to form a decreasing sequence, have a typical stepwise behaviour: they are approximately equal to one for values of the index less than $2c/\pi$ and then fall off to zero exponentially. Notice that $2c/\pi$ is just the number of sampling points in Eq. (82) within the closed interval $[-1, 1]$. The main properties of PSWF, usually normalized to one with respect to the norm of $L^2(-\infty, +\infty)$, are the following:

a) the norm of $\psi_k(c, x)$ in $L^2(-1, 1)$ is $\lambda_k^{1/2}$;
b) $\psi_k(c, x)$ and $\psi_j(c, x)$, with $k \ne j$, are doubly orthogonal, i.e., orthogonal with respect to the scalar product of $L^2(-1, 1)$ and with respect to the scalar product of $L^2(-\infty, +\infty)$;
c) $\psi_k(c, x)$ has exactly $k$ zeros within the closed interval $[-1, 1]$;
d) the set of $\psi_k(c, x)$ forms an orthonormal basis in $PW_c$, while the set of $\lambda_k^{-1/2}\psi_k(c, x)$ forms an orthonormal basis in $L^2(-1, 1)$;
e) $\psi_k(c, x)$ are also eigenfunctions of the differential operator

$$(Df)(x) = -[(1 - x^2)\,f'(x)]' + c^2x^2 f(x), \tag{85}$$

which is a self-adjoint operator in $L^2(-1, 1)$ with boundary conditions defined by the requirement that the eigenfunctions are bounded at the points $\pm 1$.
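The stepwise behaviour of the eigenvalues can be observed with a few lines of numerical linear algebra. The following sketch discretizes the integral operator (84) by Gauss-Legendre quadrature (a Nyström discretization, chosen here for illustration; it is not the method of Slepian and coworkers, who exploit the commuting differential operator). The value $c = 8$ and the number of nodes are arbitrary assumptions.

```python
import numpy as np

def slepian_eigenvalues(c, n_nodes=200):
    """Eigenvalues of the operator (84), approximated on Gauss-Legendre nodes in [-1, 1]."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    # Kernel (c/pi) sinc[c(x - y)/pi]; np.sinc(t) = sin(pi t)/(pi t), as in Eq. (81).
    kernel = (c / np.pi) * np.sinc(c * (x[:, None] - x[None, :]) / np.pi)
    # Symmetrize with the square roots of the weights so the matrix stays symmetric.
    sym = np.sqrt(w)[:, None] * kernel * np.sqrt(w)[None, :]
    return np.sort(np.linalg.eigvalsh(sym))[::-1]

c = 8.0
lam = slepian_eigenvalues(c)
shannon_number = 2.0 * c / np.pi          # number of sampling points in [-1, 1]
n_large = int(np.sum(lam > 0.5))          # eigenvalues on the "plateau"
print(shannon_number, n_large)
print(lam[:10])
```

The sum of the eigenvalues equals the trace of the kernel, $2c/\pi$, which offers a quick consistency check on the discretization.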


The last property is quite important in practice for the computation of PSWF and is also important from a theoretical point of view, since it provides one of the few examples of a differential operator commuting with an integral operator with analytic kernel (Grünbaum, 1986). A second example will be given in Section II,G. Finally, by investigating the operator $LL^*$ it is easy to conclude that the singular system of the operator (83) is given by

$$\sigma_k = \lambda_k^{1/2}, \qquad u_k(x) = \lambda_k^{-1/2}\,\psi_k(c, x). \tag{86}$$

As a concluding remark, we point out that a related problem is the inversion of the integral operator

$$(Lf)(x) = \int_{-1}^{1} (c/\pi)\,\mathrm{sinc}[c(x - y)/\pi]\,f(y)\,dy, \quad -\infty < x < +\infty, \tag{87}$$

which arises in the investigation of optical systems (Bertero and Pike, 1982). The difference from the Slepian operator (84) is obvious: while in that case the range of the variable $x$ is restricted to the interval $[-1, 1]$, here it is the entire real line. As a consequence the operator (87) is not self-adjoint, though it is still possible to show that it is a compact operator from $L^2(-1, 1)$ into $L^2(-\infty, +\infty)$. Moreover, its singular system can be given again in terms of PSWF. More precisely, the singular values $\sigma_k$ and the singular functions $u_k$ are given again by Eq. (86), while the singular functions $v_k$ are now given by $v_k(x) = \psi_k(c, x)$. The operator (87) is also basic in the solution of the problem of bandwidth extrapolation (Viano, 1976; Bertero et al., 1980a).

G. Laplace Transform Inversion

In several domains of experimental science, the experimenter is concerned with the problem of recovering and resolving exponential relaxation rates. Many examples come to mind: nuclear magnetic resonance in chemistry and, more recently, in medical imaging, photon correlation spectroscopy, fluorescence, sedimentation equilibrium and, in general, relaxation kinetics. In all such cases, the basic problem is the inversion of the Laplace transform

$$g(p) = (Lf)(p) = \int_0^{+\infty} e^{-pt}\,f(t)\,dt. \tag{88}$$

It is well known that, in general, $g(p)$ is analytic in the half-plane $\mathrm{Re}(p) > p_0$ and that an inversion formula can be given using contour integration in the complex plane. This formula, however, is useless in practice because the available data will be noisy values of $g(p)$ for a finite number of


values of $p$. Here we assume that the values of $g(p)$ are known for all values of $p$ in $(0, +\infty)$ and defer the case of discrete data to Section II,H. The linear mapping $L$ defined by Eq. (88) is continuous and injective in $L^2(0, +\infty)$. In fact, from Eq. (66) and recalling the fact that the Mellin transform of the exponential is the Gamma function, we have

$$(\mathcal{M}Lf)(\xi) = \Gamma(\tfrac{1}{2} + i\xi)\,(\mathcal{M}f)(-\xi). \tag{89}$$

Then, using the bound

$$|\Gamma(\tfrac{1}{2} + i\xi)|^2 = \frac{\pi}{\cosh(\pi\xi)} \le \pi \tag{90}$$

and Eq. (64) with $f$ replaced by $Lf$, we obtain

$$\int_0^{+\infty} |(Lf)(p)|^2\,dp \le \pi\int_0^{+\infty} |f(t)|^2\,dt. \tag{91}$$
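The decay of $|\Gamma(\tfrac{1}{2} + i\xi)|$ is what makes the inverse unbounded, and it can be checked without special-function libraries. The sketch below assumes the Mellin transform is taken in the form $(\mathcal{M}f)(\xi) = \int_0^{+\infty} r^{-1/2 + i\xi} f(r)\,dr$ and evaluates it for $f(r) = e^{-r}$ after the substitution $r = e^u$, using the trapezoidal rule; the integration limits and step are illustrative choices.

```python
import numpy as np

def mellin_of_exp(xi, u_min=-40.0, u_max=5.0, du=0.005):
    """(M f)(xi) for f(r) = exp(-r): integral of exp(u/2 - e^u) * exp(i xi u) du."""
    u = np.arange(u_min, u_max + du, du)
    integrand = np.exp(0.5 * u - np.exp(u)) * np.exp(1j * xi * u)
    # Composite trapezoidal rule written out explicitly.
    return du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

xi = np.linspace(-3.0, 3.0, 13)
numeric = np.array([abs(mellin_of_exp(v)) ** 2 for v in xi])
closed_form = np.pi / np.cosh(np.pi * xi)        # |Gamma(1/2 + i xi)|^2

for v, a, b in zip(xi, numeric, closed_form):
    print(f"xi = {v:+.1f}   numeric = {a:.6e}   pi/cosh(pi xi) = {b:.6e}")

# Factor by which a data component at "frequency" xi is amplified on inversion.
amplification = 1.0 / np.sqrt(closed_form)
```

Since $\cosh(\pi\xi)$ grows exponentially, the amplification factor grows like $e^{\pi|\xi|/2}$, so noise in the rapidly oscillating Mellin components of the data is magnified without bound.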

Moreover, from Eqs. (89) and (68) we derive the following inversion formula for the Laplace transform:

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \frac{(\mathcal{M}g)(-\xi)}{\Gamma(\tfrac{1}{2} - i\xi)}\,t^{-1/2 - i\xi}\,d\xi. \tag{92}$$

We conclude that $L$ is a continuous self-adjoint operator in $L^2(0, +\infty)$, with $\|L\| = \sqrt{\pi}$, and that $L^{-1}$ exists but is not continuous. An important case which has been investigated only recently (Bertero et al., 1982) is the inversion of the finite Laplace transform, i.e., the inversion of the Laplace transform of a function with bounded support within a given interval, say $[a, b]$, $0 < a < b < \infty$. The dual problem is the inversion of the Laplace transform with limited data, i.e., the case of a Laplace transform known only on the interval $[a, b]$ where no restriction is introduced on the support of the unknown function. Because of the scaling properties of the Laplace transformation, it is not overly restrictive to assume that the support of $f$ is in $[1, \gamma]$, so that the finite Laplace transformation is defined as follows:

$$(Lf)(p) = \int_1^{\gamma} e^{-pt}\,f(t)\,dt, \quad 0 \le p < +\infty. \tag{93}$$

It is easy to recognize that $L$ is a compact operator from $L^2(1, \gamma)$ into $L^2(0, +\infty)$ and that its inverse operator $L^{-1}$ exists. Analogously, its adjoint operator

$$(L^*g)(t) = \int_0^{+\infty} e^{-tp}\,g(p)\,dp, \quad 1 \le t \le \gamma, \tag{94}$$

is also compact and invertible, so that the range of $L$ is dense in $L^2(0, +\infty)$. It


follows that if we introduce the singular system of $L$, i.e., $\{\sigma_k; u_k, v_k\}$, then the singular functions $u_k$ form an orthonormal basis in $L^2(1, \gamma)$ while the singular functions $v_k$ form an orthonormal basis in $L^2(0, +\infty)$. From the general results on compact operators presented in Section II,A, we know that the $u_k$ are eigenfunctions of the operator $L^*L$ given by

$$(L^*Lf)(t) = \int_1^{\gamma} \frac{f(s)}{t + s}\,ds, \quad 1 \le t \le \gamma, \tag{95}$$

which can be called the finite Stieltjes transformation. Analogously, the singular functions $v_k$ are the eigenfunctions of the operator $LL^*$ given by

$$(LL^*g)(p) = \int_0^{+\infty} \frac{e^{-(p+q)} - e^{-\gamma(p+q)}}{p + q}\,g(q)\,dq. \tag{96}$$

The singular values $\sigma_k$ have been computed numerically by Bertero et al. (1982) for several values of $\gamma$. If they are ordered to form a decreasing sequence, then it is possible to prove that, for fixed $k$, $\sigma_k$ is an increasing function of $\gamma$. As concerns the dependence on $k$ for fixed $\gamma$, the $\sigma_k$ tend to zero very rapidly so that only a few singular values are significantly large. For example, in the case $\gamma = 5$ only five singular values are greater than $10^{-3}$: $\sigma_0 = 0.8751$, $\sigma_1 = 0.1935$, $\sigma_2 = 0.03827$, $\sigma_3 = 0.007434$, and $\sigma_4 = 0.001435$. A remarkable property, similar to the basic property of PSWF, is that it is possible to find differential operators commuting with the integral operators $L^*L$ and $LL^*$ (Bertero and Grünbaum, 1985). This result also provides a tool for the computation of the singular functions (Bertero et al., 1986a). More precisely, the singular functions $u_k$ are the eigenfunctions of the second-order differential operator

$$(D_\gamma f)(t) = -[(t^2 - 1)(\gamma^2 - t^2)\,f'(t)]' + 2(t^2 - 1)\,f(t), \tag{97}$$

which is a self-adjoint operator in $L^2(1, \gamma)$ with boundary conditions defined by the requirement that the eigenfunctions be bounded at the two points 1 and $\gamma$. On the other hand, the singular functions $v_k$ are the eigenfunctions of the fourth-order differential operator

which is also self-adjoint in $L^2(0, +\infty)$ if one looks for eigenfunctions that are bounded at the origin and square-integrable at infinity. Using these results it is possible to prove (Bertero and Grünbaum, 1985) that:

a) all the singular values have multiplicity 1;
b) the singular function $u_k$ has exactly $k$ zeros interior to $[1, \gamma]$; the points 1 and $\gamma$ can never be zeros of $u_k$.


There is also an interesting relationship between the singular functions $u_k$ and the Legendre polynomials in the limit $\gamma \to 1$ (Bertero et al., 1986a).
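The singular values quoted above for $\gamma = 5$ can be reproduced approximately from the finite Stieltjes kernel $1/(t + s)$ on $[1, \gamma]$, whose eigenvalues are the $\sigma_k^2$. The sketch below uses a Gauss-Legendre (Nyström) discretization, which is an assumption made here for brevity and not the procedure of Bertero et al. (1982); the function name and the number of nodes are hypothetical.

```python
import numpy as np

def finite_laplace_singular_values(gamma, n_nodes=80):
    """Singular values of the finite Laplace transform on [1, gamma] via the kernel 1/(t+s)."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * (gamma - 1.0) * x + 0.5 * (gamma + 1.0)    # map [-1, 1] onto [1, gamma]
    w = 0.5 * (gamma - 1.0) * w
    kernel = 1.0 / (t[:, None] + t[None, :])
    sym = np.sqrt(w)[:, None] * kernel * np.sqrt(w)[None, :]
    eig = np.sort(np.linalg.eigvalsh(sym))[::-1]         # eigenvalues sigma_k^2
    return np.sqrt(np.clip(eig, 0.0, None))              # clip round-off negatives

sigma = finite_laplace_singular_values(5.0)
quoted = np.array([0.8751, 0.1935, 0.03827, 0.007434, 0.001435])
for k in range(6):
    print(k, sigma[k])
```

Each singular value is roughly five times smaller than the previous one, so with data accurate to a few percent only two or three components of the solution can be recovered.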

H. Generalized Moment Problems

A generalized moment problem can be defined as follows. Let $\phi_1, \phi_2, \phi_3, \ldots$ be a sequence of linearly independent functions in the Hilbert space $X$; then find a function $f \in X$ such that

$$(f, \phi_n)_X = g_n; \quad n = 1, 2, 3, \ldots, \tag{99}$$

where $g_1, g_2, g_3, \ldots$ is a sequence of given numbers. This problem has the general form (5) if we introduce a linear operator $L$ that transforms a function $f \in X$ into a sequence of numbers according to Eq. (99). Moreover, we will assume that the data space $Y$ is $l^2$, i.e., the space of square-summable sequences with norm defined by

$$\|g\|_Y = \Big(\sum_{n=1}^{\infty} |g_n|^2\Big)^{1/2}, \tag{100}$$

where we have denoted by $g$ the sequence $g_1, g_2, g_3, \ldots$ If the $\phi_n$ form an orthonormal basis in $X$, then the problem is trivially well-posed and there exists a unique solution for any $g \in Y$, given by the expansion

$$f = \sum_{n=1}^{\infty} g_n\,\phi_n. \tag{101}$$

The problem is also well-posed when the set of functions $\phi_n$ is nearly an orthonormal basis in the following sense. There exists an orthonormal basis $\{\psi_n\}$ and a positive number $\delta < 1$ such that, for any sequence $g \in Y$,

$$\Big\|\sum_{n} g_n(\phi_n - \psi_n)\Big\|_X \le \delta\,\|g\|_Y. \tag{102}$$

In this case, one can prove (Riesz and Sz. Nagy, 1972) that there exists a dual basis $\{\phi^n\}$, i.e., a set of functions satisfying the conditions

$$(\phi_n, \phi^m)_X = \delta_{nm}; \quad n, m = 1, 2, \ldots, \tag{103}$$

and that the sets $\{\phi_n\}$ and $\{\phi^n\}$ form a biorthogonal basis in $X$. This result implies that there exists a unique solution of the problem (99) given by

$$f = \sum_{n=1}^{\infty} g_n\,\phi^n. \tag{104}$$

Moreover the mapping $g \to f$ is continuous since, as follows from inequalities proved by Riesz and Sz. Nagy (1972),

$$\|f\|_X \le (1 - \delta)^{-1}\,\|g\|_Y. \tag{105}$$
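A finite-dimensional analogue makes the role of the dual basis concrete. In the sketch below, which is only an illustration, the Hilbert space is replaced by $\mathbb{R}^6$, the orthonormal basis by the canonical basis, and the nearly orthonormal set by a random perturbation of it with operator norm $\delta = 0.3$; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Rows of E are the differences phi_k - psi_k; rescale so the operator norm is 0.3.
E = rng.standard_normal((n, n))
E *= 0.3 / np.linalg.norm(E, 2)
delta = np.linalg.norm(E, 2)

Phi = np.eye(n) + E                 # row k holds phi_k (psi_k is the k-th unit vector)
Dual = np.linalg.inv(Phi).T         # row m holds the dual vector phi^m

g = rng.standard_normal(n)          # data g_k = (f, phi_k)
f = Dual.T @ g                      # f = sum_k g_k phi^k

print(np.abs(Phi @ Dual.T - np.eye(n)).max())    # biorthogonality defect
print(np.abs(Phi @ f - g).max())                 # the moments of f reproduce the data
print(np.linalg.norm(f), np.linalg.norm(g) / (1.0 - delta))
```

The last line compares $\|f\|$ with $\|g\|/(1 - \delta)$: the closer $\delta$ is to 1, the weaker the stability guarantee becomes.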


Applications of this result to the theory of non-harmonic Fourier series have been summarized by Riesz and Sz. Nagy (1972). When the functions $\phi_n$ do not satisfy the previous conditions and, in particular, the angle between $\phi_n$ and $\phi_{n+1}$ tends to zero as $n \to \infty$, the problem can be ill-posed. It is easy, however, to give a general condition for the uniqueness of the solution or, in other words, for the existence of the inverse of the operator $L$: the solution is unique if and only if the span of the $\phi_n$ is dense in $X$. Otherwise the null space of $L$ is just the orthogonal complement of the span of the functions $\phi_n$. A classical example that satisfies the requirement of uniqueness is the Hausdorff moment problem: find a function $f$ defined on $(0, 1)$ from the values of its moments

$$g_n = \int_0^1 x^{n-1}\,f(x)\,dx; \quad n = 1, 2, 3, \ldots \tag{106}$$
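The numerical difficulty of the moments (106) can be seen from the Gram matrix of the monomials $x^{n-1}$ in $L^2(0, 1)$, which is the Hilbert matrix with entries $1/(n + m - 1)$. The sketch below, with illustrative truncation sizes, prints its largest eigenvalue and its condition number for a few finite sections.

```python
import numpy as np

def hilbert_gram(n):
    """Gram matrix of 1, x, ..., x^(n-1) in L^2(0, 1): entries 1 / (j + k + 1), j, k = 0..n-1."""
    idx = np.arange(n)
    return 1.0 / (idx[:, None] + idx[None, :] + 1.0)

results = {}
for n in (4, 8, 12):
    eig = np.linalg.eigvalsh(hilbert_gram(n))       # ascending order
    largest = eig[-1]
    condition = largest / abs(eig[0])               # abs guards against round-off sign flips
    results[n] = (largest, condition)
    print(n, largest, condition)
```

The largest eigenvalue stays below $\pi$ for every truncation, while the condition number grows by several orders of magnitude each time a few moments are added, so higher moments carry less and less usable information about $f$.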

This problem has the form (99) if $X = L^2(0, 1)$, in which case the span of the functions $\phi_n(x) = x^{n-1}$ is dense in $X$. The operator $L: L^2(0, 1) \to l^2$ defined by Eq. (106) is continuous and $\|L\| = \sqrt{\pi}$. The range of $L$, however, is only dense in $l^2$ and therefore $L^{-1}$ is not continuous. In fact, the characterization of necessary and sufficient conditions for the numbers $g_1, g_2, g_3, \ldots$ to be the moments of a function $f \in L^p(0, 1)$, $p > 1$, has been the subject of several elegant mathematical investigations (Widder, 1946). A condensed discussion of the main results has been given by Talenti (1987). It is important to point out that the Hausdorff moment problem is related to Laplace transform inversion when the Laplace transform is given at the points $p_n = n - \tfrac{1}{2}$, $n = 1, 2, \ldots$ In fact, if we consider the generalized moment problem

$$g_n = \int_0^{+\infty} e^{-(n - 1/2)t}\,u(t)\,dt; \quad n = 1, 2, \ldots, \tag{107}$$

then, using the change of variables $x = e^{-t}$ and introducing the function $f(x) = x^{-1/2}u(-\ln x)$, we transform this problem into the problem (106). Moreover, if the function $u(t)$ belongs to $L^2(0, +\infty)$, then the function $f(x)$ belongs to $L^2(0, 1)$. Other examples are the Stieltjes moment problem, i.e., the problem of determining a function $f(x)$ defined on $(0, +\infty)$ from knowledge of its moments

$$g_n = \int_0^{+\infty} x^{n-1}\,f(x)\,dx; \quad n = 1, 2, 3, \ldots, \tag{108}$$

or the Hamburger moment problem, which is the problem of determining a function defined on $(-\infty, +\infty)$, again from knowledge of its moments. It is obvious that the functionals (108) are not continuous in $L^2(0, +\infty)$ and,


therefore, the problem must be considered in some suitable weighted $L^2$-space or, roughly speaking, one must consider a Hilbert space of functions that tend to zero at infinity more rapidly than any inverse power of $x$. The uniqueness of the solution, however, is not assured in general (it depends on the choice of the space) since, as was shown by Stieltjes, all the moments of the function

$$f(x) = \exp(-x^{1/4})\,\sin(x^{1/4}) \tag{109}$$

are zero (Widder, 1946). Another problem is Poisson transform inversion (Saleh, 1978; Bertero and Pike, 1986), a problem related to inversion of photon counting distributions for the purpose of obtaining distributions of classical light intensity fluctuations:

$$g_n = \frac{1}{n!}\int_0^{+\infty} x^n e^{-x}\,f(x)\,dx; \quad n = 0, 1, 2, \ldots \tag{110}$$

The solution of this problem is unique in $L^2(0, +\infty)$, as easily follows from the completeness of Laguerre polynomials. Note that the inversion of the Poisson transform is equivalent to the solution of the Stieltjes moment problem in a suitable weighted space with exponential weight. Moreover, if the problem (110) is also formulated in a weighted space with exponential weight, i.e., if we define the norm of $f(x)$ as follows:

with $\beta > 0$, then, as shown by Bertero and Pike (1986), the operator $L: X \to l^2$ defined by Eq. (110) is compact and one can use its singular system for the investigation of Poisson transform inversion.

III. LINEAR INVERSE PROBLEMS WITH DISCRETE DATA

In most of the examples of Section II, it is assumed that the data are known everywhere in some domain of the measured variable. For example, it is assumed that the scattering amplitude or the diffraction pattern is known for all values of the scattering angle in a given interval. Analogously, it is assumed that the Fourier transform of the unknown function is known for all frequencies in a given interval, and so on. Such an assumption, however, does not provide a satisfactory model of real experimental situations. In practice, one has only a finite number of detectors that can only measure the data function at a finite number of points. Therefore, the output of an experiment is a set of (real or complex) numbers $g_1, g_2, \ldots, g_N$. These numbers can be viewed as the components of a vector which we will call the data vector, denoted $\mathbf{g}$.


A. General Formulation

In the case of linear problems, we can assume that the $g_n$ are the values of prescribed linear functionals of the unknown solution (Bertero et al., 1985a). Consider, for example, the case where the physical quantity $g(x)$ measured by the detectors is related by an integral operator to the unknown function $f(y)$, so that the inverse problem is the solution of a Fredholm integral equation of the first kind, i.e., Eq. (6). Then, if the responses of the detectors are linear, their outputs $g_n$ will be proportional to the values of $g$ at some points $x_n$. By neglecting the constant related to the efficiency of the instrument, we may write

$$g_n = g(x_n) = \int K(x_n, y)\,f(y)\,dy. \tag{112}$$
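A finite set of samples of the form (112) cannot distinguish between two objects that differ by a function orthogonal to all the functions $K(x_n, \cdot)$. The sketch below illustrates this under assumptions chosen only for the example: the kernel $K(x, y) = \exp(-xy)$ on $0 \le y \le 1$, five detector positions, and Gauss-Legendre quadrature for the integrals. The orthogonal component is built with a QR factorization.

```python
import numpy as np

# Quadrature nodes and weights on [0, 1] (the object domain is an assumption).
y, w = np.polynomial.legendre.leggauss(200)
y = 0.5 * (y + 1.0)
w = 0.5 * w

x_samples = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical detector positions x_n
phi = np.exp(-np.outer(x_samples, y))              # phi_n(y) = K(x_n, y), assumed kernel

def data(f_vals):
    """g_n = integral over [0, 1] of K(x_n, y) f(y) dy, by quadrature."""
    return phi @ (w * f_vals)

# Orthonormalize the phi_n in the weighted (quadrature) scalar product.
A = (phi * np.sqrt(w)).T                           # columns are sqrt(w) * phi_n
Q, _ = np.linalg.qr(A)

f1 = 1.0 + y                                       # a first object
h0 = np.cos(20.0 * y)                              # an oscillating test function
h0w = np.sqrt(w) * h0
h = (h0w - Q @ (Q.T @ h0w)) / np.sqrt(w)           # part of h0 orthogonal to every phi_n
f2 = f1 + h                                        # a second, clearly different object

print(np.abs(data(f1) - data(f2)).max())           # the two data vectors coincide
print(np.sqrt(w @ h**2))                           # yet the L^2 distance between objects is not small
```

Both objects reproduce the same five numbers $g_n$ to round-off, although they differ by a function of norm of order one.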

Equation (112) does not take into account the fact that any detector integrates over some region in the domain of the physical variable $x$. When this effect cannot be neglected, Eq. (112) must be replaced by the equation

$$g_n = \int P_n(x)\,g(x)\,dx = \int\Big[\int P_n(x)\,K(x, y)\,dx\Big]\,f(y)\,dy. \tag{113}$$

Here $P_n(x)$ is an averaging function that describes the integration effect of the $n$-th detector. Generally, it has a peak centred near the experimental point $x = x_n$. The r.h.s. of Eq. (112) or (113) is a scalar product in $L^2$ and therefore defines a linear and continuous functional in this space. More generally, we can assume that the data $g_n$ depend continuously on the object $f$ and that the object space $X$ is a Hilbert space (regarding the choice of this space, recall the considerations developed in Section II,A). Since, by the Riesz representation theorem (Balakrishnan, 1976), any linear continuous functional on a Hilbert space $X$ can be represented as a scalar product, we can summarize the remarks above as follows:

A) The design of an experiment for the indirect determination of a physical quantity $f$ consists in specifying a finite set of linear continuous functionals $F_n$, $n = 1, 2, \ldots, N$. The output of the experiment is the set of values $g_n$ of these functionals $F_n$.

B) Given the object space $X$, if the functionals $F_n$ are continuous on $X$, then with any functional $F_n$ we may associate a function $\phi_n$ such that

$$F_n(f) = (f, \phi_n)_X; \quad n = 1, \ldots, N. \tag{114}$$


C) In the case where experimental errors or noise are neglected, the linear inverse problem with discrete data corresponding to the experiment specified above consists in determining a function $f \in X$ satisfying the equations

$$g_n = (f, \phi_n)_X; \qquad n = 1, \ldots, N. \tag{115}$$

The previous scheme applies also to problems where the data are intrinsically discrete, in the sense that, even in the ideal case, they do not depend on a continuous variable. A very simple example is provided by the Hausdorff moment problem already discussed in Section II,H. Another important example is the determination of the physical properties of a vibrating system (for instance, the density of a vibrating string) from knowledge of its eigenfrequencies. Such a problem, discussed briefly in the Introduction, also has interesting applications in the investigation of the large-scale structure of the Earth. In this case, a successful approach (Backus and Gilbert, 1968; 1970) consists in postulating an Earth model about which the nonlinear inverse problem is linearized. The resulting linear problem is consistent with the general definition given above. If we remember the definition of a generalized moment problem given in Section II,H, we conclude that a linear inverse problem with discrete data is always a finite section of a generalized moment problem.

It is obvious that the information about the physical quantity $f$ that can be extracted from Eq. (115) is incomplete. If we denote by $X_N$ the linear finite-dimensional subspace spanned by the functions $\phi_n$, then the data vector $g$ depends only on the orthogonal projection of $f$ onto $X_N$. Any function $f$ which is orthogonal to $X_N$ produces a zero data vector and therefore cannot be recovered by means of the experiment specified by the functions $\phi_n$. According to the general definitions given in Section II,A, the component of $f$ orthogonal to $X_N$ can be called the invisible component of $f$ (Rust and Burrus, 1972; Bertero et al., 1985a), since this component cannot be detected by the experiment. In the language of mathematics, this means that the solution of Eq. (115) is never unique.

When the $\phi_n$ are linearly independent, so that the dimension of the subspace $X_N$ is exactly $N$, the projection of $f$ onto $X_N$, i.e., the visible component of $f$, which we denote $f^{+}$, must be a linear combination of the $\phi_n$ and therefore can be written in the form

$$f^{+} = \sum_{m=1}^{N} a_m \phi_m. \tag{116}$$

Substitution into Eq. (115) shows that the coefficients $a_m$ must solve the linear system

$$\sum_{m=1}^{N} G_{mn}\, a_m = g_n; \qquad n = 1, \ldots, N, \tag{117}$$

where the quantities

$$G_{mn} = (\phi_m, \phi_n)_X = G_{nm}^{*} \tag{118}$$

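are the elements of the Gram matrix. If we denote by $G^{nm}$ the elements of the inverse of the Gram matrix (recall that it is invertible if and only if the functions $\phi_n$ are linearly independent), satisfying the relations

$$\sum_{m=1}^{N} G^{nm}\, G_{mk} = \delta_{nk}, \tag{119}$$

and we also introduce in $X_N$ the dual basis given by

$$\phi^{n} = \sum_{m=1}^{N} G^{nm}\, \phi_m, \tag{120}$$

so that the sets $\{\phi^{n}\}$ and $\{\phi_n\}$ form a biorthogonal basis in $X_N$, then we have

$$f^{+} = \sum_{n=1}^{N} g_n\, \phi^{n}, \tag{121}$$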

which is just the finite-dimensional version of the solution (104) of a well-posed generalized moment problem. This representation clearly shows that $f^{+}$ depends continuously on the data: if $\delta g$ is a small variation of $g$ and $\delta f^{+}$ the corresponding variation of $f^{+}$, then $\|\delta f^{+}\|_X$ tends to zero as $\delta g$ tends to zero. Continuous dependence of the solution on the data, however, does not imply numerical stability, which is related to deeper properties of the Gram matrix of the functions $\phi_n$. Even if the functions $\phi_n$ are linearly independent, some (or many) of them may be nearly parallel. In this case, the problem of determining the projection of $f$ onto $X_N$ exhibits numerical instability. In fact, this problem is equivalent to inversion of the Gram matrix of the functions $\phi_n$ and, as is known, the Gram matrix indicates how much the functions (vectors) $\phi_n$ depart from an orthogonal system: it becomes ill-conditioned when the vectors are close to a linearly dependent system. We recall that, in the finite-dimensional case, ill-conditioning means essentially that the smallest eigenvalues cluster near zero while the others spread elsewhere.

In order to quantify the stability of the problem of determining the visible component of $f$, it is necessary to introduce a measure of the errors on the data or, in other words, to introduce a metric in the data space $Y$. We assume that $Y$ is a Euclidean space with scalar product defined by

$$(g, h)_Y = \sum_{n,m=1}^{N} W_{nm}\, g_m\, h_n^{*}, \tag{122}$$

the weights $W_{nm}$ being the matrix elements of a given positive matrix $W$. The simplest choice of the weights is obviously $W_{nm} = \delta_{nm}$. On the other hand, in the case of least-squares problems it is quite natural to relate $W$ to the covariance matrix $C$ of the errors on the components of the data vector. In linear regression theory, the relation is

$$W = C^{-1}, \tag{123}$$

and in this case the choice $W_{nm} = \delta_{nm}$ corresponds to white noise. Another possible choice will be discussed in Section III,E in connection with the problem of moment discretization.

We can now define a linear operator $L$ from $X$ into $Y$ that transforms a function of $X$ into a vector of $Y$ according to the rule

$$(Lf)_n = (f, \phi_n)_X; \qquad n = 1, \ldots, N. \tag{124}$$
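As a numerical illustration of Eqs. (115)-(121), the following minimal sketch (not part of the original text) replaces $X = L^2(0,1)$ by a uniform grid with trapezoidal weights, so that scalar products become weighted sums. The Gaussian detector responses, the grid size, the test object and the function name `visible_component` are assumptions chosen only for illustration; everything is real-valued, so the Gram matrix is symmetric.

```python
import numpy as np

def visible_component(phi, weights, g):
    """Return f+ = sum_m a_m phi_m, where the a_m solve sum_m G_mn a_m = g_n.

    phi     : (N, M) array, row n holds phi_n sampled on the grid
    weights : (M,) quadrature weights defining the discrete scalar product
    g       : (N,) data vector
    """
    gram = (phi * weights) @ phi.T          # G_mn = (phi_m, phi_n), real case
    coeffs = np.linalg.solve(gram.T, g)     # the linear system for the a_m
    return coeffs @ phi, gram

# Hypothetical experiment: five detectors with Gaussian response on [0, 1].
y = np.linspace(0.0, 1.0, 401)
weights = np.full(y.size, y[1] - y[0])
weights[[0, -1]] *= 0.5                     # trapezoidal rule
centres = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
phi = np.exp(-((y - centres[:, None]) / 0.05) ** 2)

f_true = 1.0 + np.sin(2.0 * np.pi * y)      # assumed test object
g = (phi * weights) @ f_true                # noise-free data, g_n = (f, phi_n)

f_plus, gram = visible_component(phi, weights, g)
invisible = f_true - f_plus                 # component orthogonal to X_N

print("data reproduced by f+ :", np.allclose((phi * weights) @ f_plus, g))
print("data of invisible part:", np.abs((phi * weights) @ invisible).max())
print("cond(G)               :", np.linalg.cond(gram))
```

In this sketch the detector responses barely overlap, so the Gram matrix is close to diagonal. Moving the assumed `centres` closer together makes the $\phi_n$ nearly parallel and lets one watch `cond(G)` grow.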

The mapping is onto when the $\phi_n$ are linearly independent; otherwise the range of $L$ is a subspace of dimension $N' < N$, where $N'$ is the number of linearly independent $\phi_n$. Moreover, in terms of $L$, Eq. (115) can be written in the form

$$g = Lf, \tag{125}$$

which has precisely the general form (5). We consider, for the sake of simplicity, the case where the $\phi_n$ are linearly independent. Since $L$ is a finite rank operator, we can always introduce its singular system $\{\sigma_k; u_k, v_k\}$, which is the set of solutions of the shifted eigenvalue problem

$$L u_k = \sigma_k v_k, \qquad L^{*} v_k = \sigma_k u_k; \qquad k = 0, 1, \ldots, N - 1. \tag{126}$$

The adjoint operator defined by Eq. (16) transforms a vector of $Y$ into a function of $X$. Its explicit expression is as follows:

$$L^{*} g = \sum_{n,m=1}^{N} W_{nm}\, g_m\, \phi_n. \tag{127}$$

A few remarks about the computation of the singular system of $L$ are in order. The singular vectors $v_k$ are the eigenvectors associated with the eigenvalues $\sigma_k^2$ of the operator $LL^{*}$. This is an operator in $Y$ and therefore can be characterized by means of a matrix which we will denote $\hat{G}$. Combining Eq. (124) with Eq. (127), it follows that

$$\hat{G} = G^{T} W, \tag{128}$$

where $G^{T}$ denotes the transpose of the Gram matrix (notice that the rank of $\hat{G}$ coincides with the rank of $G$ since $W$ is positive-definite). It follows that the computation of the singular values and singular vectors is a standard eigenvalue problem. When this has been solved, the corresponding singular functions $u_k$ can be obtained by means of the second of the equations in (126) which, using Eq. (127), can be written explicitly as follows:

$$u_k = \sigma_k^{-1} \sum_{n,m=1}^{N} W_{nm}\, (v_k)_m\, \phi_n, \tag{129}$$

where $(v_k)_m$ is the $m$-th component of the vector $v_k$. Since the singular vectors $v_k$ form an orthonormal basis in $Y$ while the singular functions $u_k$ form an orthonormal basis in $X_N$, it is easy to obtain the following representation of the visible component of $f$:

$$f^{+} = \sum_{k=0}^{N-1} \sigma_k^{-1}\, (g, v_k)_Y\, u_k. \tag{130}$$

As we have already remarked in the Introduction, the determination of $f^{+}$ is a well-posed problem and therefore the propagation of relative errors from the data to the solution is controlled by the condition number. If $\delta g$ is a small variation of $g$ and $\delta f^{+}$ the corresponding variation of $f^{+}$, then Eq. (10) holds true with

$$\mathrm{cond}(L) = \sigma_0 / \sigma_{N-1}. \tag{131}$$

This inequality is precise in the sense that equality can hold true. In spite of this fact, however, the condition number may be a rather pessimistic estimate of error propagation. Equality holds only in a very special case which, in general, does not occur in practice. For this reason, the condition number can be called the "worst magnification" of relative errors (Twomey, 1974). A more realistic estimate is given by the "average magnification" of relative errors, which we will denote $\langle \mathrm{cond}(L) \rangle$. Its expression is the following (Twomey, 1974):

$$\langle \mathrm{cond}(L) \rangle = N^{-1} \left( \sum_{k=0}^{N-1} \sigma_k^{2} \right)^{1/2} \left( \sum_{k=0}^{N-1} \sigma_k^{-2} \right)^{1/2}. \tag{132}$$

It is easily verified that this quantity is always smaller than $\mathrm{cond}(L)$ but greater than $N^{-1}\, \mathrm{cond}(L)$.

It is important to point out that the ill-conditioning of an inverse problem with discrete data usually derives from the fact that it is the discrete version of an ill-posed inverse problem. A few relevant examples will be discussed in succeeding sections. From these examples it is clear that when the number of data points increases, the ill-conditioning of the problem also increases. The reason is that by increasing the number of data points, one obtains a sequence of discrete problems that provide better and better approximations of the ill-posed infinite-dimensional inverse problem, whose condition number is infinite. As a consequence, an increase in the number of data points produces an increase of the instability in the computation of the visible component of $f$ without a significant increase of the information content of the data, because


the new functions $\phi_n$ are nearly parallel to the subspace spanned by the previous ones. On the other hand, when the number of data points is sufficiently small, an inverse problem with discrete data can be well-conditioned. Since the number of data points cannot be too small (the information content of the data is then too poor), it follows that there exists an optimum number of data points, which corresponds to a compromise between stability and information content. In other words, there exists an optimum experiment for the determination of the desired physical quantity. To our knowledge, however, such a problem has not yet been solved either from the theoretical or from the practical point of view. Some interesting results in this direction have been obtained in the case of the finite Hausdorff moment problem (Talenti, 1987).

B. Fourier Transform Inversion with Discrete Data

As a first example of an inverse problem with discrete data, we consider the problem of determining a function of bounded support when its Fourier transform is given at a finite number of points. As in Section II,F, let us assume that the support of the function $f$ is interior to the interval $[-c, c]$. If the points where the Fourier transform of $f$ is known are $x_1, x_2, \ldots, x_N$, we have the problem

$$g_n = \int_{-c}^{c} \exp(-i x_n y)\, f(y)\, dy; \qquad n = 1, \ldots, N. \tag{133}$$

If we assume that $f \in L^2(-c, c)$ and that the data space $Y$ is the usual Euclidean space, i.e., the scalar product is defined by Eq. (122) with $W_{nm} = \delta_{nm}$, then the operator given by Eq. (128) coincides with the Gram matrix (which is symmetric) and the latter is given by

$$G_{nm} = 2c\, \mathrm{sinc}[c(x_n - x_m)/\pi]. \tag{134}$$

The singular values $\sigma_k$ of the problem are the square roots of the eigenvalues of $G$ and the singular vectors $v_k$ are the corresponding eigenvectors of $G$. Then the singular functions $u_k$ are given by

$$u_k(y) = \sigma_k^{-1} \sum_{n=1}^{N} (v_k)_n \exp(-i x_n y). \tag{135}$$

Consider now the important case of uniformly spaced sampling points

$$x_n = x_1 + d(n - 1); \qquad n = 1, \ldots, N, \tag{136}$$

where $d$ is the sampling distance. When $d = \pi/c$, the Fourier transform is sampled at the Nyquist rate. In this case, the functions $\phi_n(y) = \exp(-i x_n y)$ are orthogonal and the Gram matrix is a multiple of the unit matrix, $G_{nm} = 2c\, \delta_{nm}$. The problem is well-conditioned and the solution (121) is a truncated Fourier series (Eq. (137)).

When $d < \pi/c$, the Fourier transform is over-sampled and the problem becomes ill-conditioned. A qualitative argument for explaining this fact is the following. If we fix the interval where the data are given, the optimum number of data points is obtained when the data are sampled at the Nyquist rate, as shown by the orthogonality of the functions $\phi_n$ in this case. If we increase the sampling rate, we add more and more points which are less and less linearly independent of the previous ones and which therefore do not contain significant new information about the function $f$. A quantitative analysis of the problem follows from the properties of the matrix

$$S_{nm} = 2W\, \mathrm{sinc}[2W(n - m)], \qquad W < 1/2, \tag{138}$$

which can be obtained from the Gram matrix (134) (with $x_n$ given by Eq. (136)) if we put $W = cd/2\pi$ and multiply $G_{nm}$ by $d/2\pi$. The matrix $S$ has been studied by Slepian (1978), who denotes its eigenvalues by $\lambda_0(N, W) > \lambda_1(N, W) > \cdots > \lambda_{N-1}(N, W)$, and defines the discrete prolate spheroidal sequences (DPSS) as the real solutions, for $k = 0, 1, \ldots, N - 1$, of the system of equations

$$\sum_{m=1}^{N} S_{nm}\, u_m^{(k)}(N, W) = \lambda_k(N, W)\, u_n^{(k)}(N, W); \qquad n = 0, \pm 1, \pm 2, \ldots \tag{139}$$

Therefore, the eigenvectors of $S$ are obtained by index-limiting the DPSS to $[1, N]$. It follows that the singular values of the problem (133) are proportional to the square roots of the eigenvalues of $S$,

$$\sigma_k = [2\pi \lambda_k(N, W)/d]^{1/2}; \qquad k = 0, \ldots, N - 1, \tag{140}$$

while the singular vectors are obtained by simply index-limiting the DPSS,

$$(v_k)_n = u_n^{(k)}(N, W); \qquad k = 0, \ldots, N - 1; \quad n = 1, \ldots, N, \tag{141}$$

since, according to the definition given by Slepian, the Euclidean norm of these vectors is one. Finally, the singular functions $u_k(y)$ are related to the discrete prolate spheroidal wave functions (DPSWF) defined by Slepian as follows:

where $\varepsilon_k = 1$ when $k$ is even, and $\varepsilon_k = i$ when $k$ is odd. In fact, by comparing


Eq. (142) with Eq. (135) ($x_n$ defined in Eq. (136)) one finds that

$$u_k(y) = (\sigma_k \varepsilon_k)^{-1} \exp\{i[x_1 + (N - 1)d/2]\,y\}\; U_k(N, W; dy/2\pi). \tag{143}$$

Slepian (1978) proved that the DPSWF are simultaneously eigenfunctions of an integral and a differential operator (a property analogous to a basic property of PSWF); they are doubly orthogonal in the sense that they are orthogonal both with respect to the scalar product of $L^2(-W, W)$ and with respect to the scalar product of $L^2(-\tfrac{1}{2}, \tfrac{1}{2})$; moreover, $U_k(N, W; x)$ is an even or odd function of $x$ according to the parity of $k$, has exactly $k$ zeros in the open interval $(-W, W)$, and exactly $N - 1$ zeros in the interval $(-\tfrac{1}{2}, \tfrac{1}{2}]$. In Fig. 1 we give a plot of DPSWF in the case $N = 5$ and $W = 0.2$. The corresponding singular values (square roots of $\lambda_k(N, W)$) are $\sigma_0 = 0.993$, $\sigma_1 = 0.875$, $\sigma_2 = 0.481$, $\sigma_3 = 0.128$, and $\sigma_4 = 0.0165$. Slepian (1978) gives asymptotics of the eigenvalues and eigenfunctions, proving, in particular, that as $W \to 0$ and $N \to \infty$, so that $\pi N W \to c$, we have

$$\lambda_k(N, W) \to \lambda_k, \qquad \sqrt{W}\, U_k(N, W; Wx) \to \psi_k(c, x), \tag{144}$$

where $\psi_k(c, x)$ is the PSWF of order $k$ and $\lambda_k$ the corresponding eigenvalue. If we recall the behaviour of the eigenvalues of PSWF, this property shows that when the number of sampling points in a fixed interval is large, the number of singular values nearly equal to one is approximately equal to the number of sampling points corresponding to the Nyquist rate.

Finally, when $d > \pi/c$, the Fourier transform is sampled below the Nyquist rate and the problem becomes well-conditioned, since each sampling point then contains an independent piece of information. For example, if $N = 5$ and $W = 0.8$, the square roots of the eigenvalues of the matrix (138) are $\sigma_0 = 1.414$, $\sigma_1 = 1.408$, $\sigma_2 = 1.330$, $\sigma_3 = 1.111$, and $\sigma_4 = 1.007$, with condition number equal to 1.404.
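The numerical values quoted in this section can be checked with a few lines of code. The sketch below is an illustration added here, not Slepian's algorithm: it builds the matrix (138) directly and uses a general-purpose symmetric eigenvalue routine. The function name `fourier_singular_values` is hypothetical. The value $W = 0.5$ corresponds to sampling at the Nyquist rate, $d = \pi/c$.

```python
import numpy as np

def fourier_singular_values(n_points, w):
    """Square roots of the eigenvalues of S_nm = 2W sinc[2W(n - m)], largest first."""
    idx = np.arange(n_points)
    # np.sinc(x) = sin(pi x) / (pi x), the convention of Eqs. (134) and (138)
    s = 2.0 * w * np.sinc(2.0 * w * (idx[:, None] - idx[None, :]))
    eigenvalues = np.linalg.eigvalsh(s)[::-1]     # eigvalsh returns ascending order
    return np.sqrt(eigenvalues)

for w in (0.2, 0.5, 0.8):                         # over-sampled, Nyquist, under-sampled
    sigma = fourier_singular_values(5, w)
    print(f"W = {w}: sigma = {np.round(sigma, 4)}, cond = {sigma[0] / sigma[-1]:.4g}")
```

For $N = 5$ the three cases show the behaviour described above: a condition number of about 60 for $W = 0.2$, exactly 1 at the Nyquist rate, and about 1.4 for $W = 0.8$.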

C. Interpolation and Numerical Derivation

Interpolation is the problem of determining a function $f$ when its values are given at a finite number of points. Analogously, numerical derivation is the problem of determining $f'$ from the same data. These are classical problems of numerical analysis and have recently found interesting applications to inverse problems in computational vision (Grimson, 1982; Torre and Poggio, 1986; Bertero et al., 1986b).

We show the relationship between these problems and the general formulation of an inverse problem with discrete data in the simple case of functions depending on one variable. Then the problem is one of determining a function $f(x)$ defined on the interval $[a, b]$ that takes prescribed values $g_n$ at the $N$ points $x_1, x_2, \ldots, x_N$, $a \le x_1 < x_2 < \cdots < x_N \le b$:

$$g_n = f(x_n); \qquad n = 1, \ldots, N. \tag{145}$$

FIG. 1. Plot of the discrete prolate spheroidal wave functions (DPSWF) in the case N = 5 and W = 0.2.

When we have a differentiable solution $f(x)$ of this problem, then a solution of the corresponding problem of numerical derivation is just $f'(x)$. It must be pointed out, however, that the problem of numerical derivation can be formulated independently of the corresponding problem of interpolation. If we put $f'(x) = h(x)$ and we assume that $x_1 = a$, then the problem is

$$g_n - g_1 = \int_{a}^{x_n} h(x)\, dx; \qquad n = 2, \ldots, N, \tag{146}$$

which is already in the form (115), at least when $X = L^2(a, b)$. The interpolation problem (145) can be formulated in the form (115) if the space $X$ of the solutions is a reproducing kernel Hilbert space (RKHS), i.e., a Hilbert space of continuous functions such that all the evaluation functionals are continuous. In fact, if $f \in X$ then, from the Riesz representation theorem (Balakrishnan, 1976), it follows that for given $x \in [a, b]$ there exists a function $Q_x \in X$ such that

$$f(x) = (f, Q_x)_X, \tag{147}$$

which shows that the problem (145) has the form (115), the functions $\phi_n$ being the functions $Q_{x_n}$ associated with the points $x_n$. The symmetric kernel

$$Q(x, x') = (Q_x, Q_{x'})_X = Q_x(x') = Q_{x'}(x) \tag{148}$$

is called the reproducing kernel of $X$ (Aronszajn, 1950). Once the RKHS has been chosen, the Gram matrix of the interpolation problem is given by

$$G_{nm} = Q(x_n, x_m). \tag{149}$$

We now consider two examples.

1. Interpolation of Band-Limited Functions

The first example is the interpolation of a band-limited function. Let $X$ be the Paley-Wiener space $PW_c$ already discussed in Section II,F. Then, if $B_c$ is the band-limiting operator (80), for any $f \in X$ we have $B_c f = f$, showing that $PW_c$ is a RKHS with reproducing kernel

$$Q(x, x') = (\pi/c)\, \mathrm{sinc}[c(x - x')]. \tag{150}$$

This interpolation problem is strictly related to the problem (133) of Fourier transform inversion with discrete data, since one can solve the latter just by determining a band-limited function that interpolates the data values $g_n$ and then taking the inverse Fourier transform of the result. As a consequence, the two problems have the same Gram matrix, except for a multiplicative factor, as follows from Eqs. (149) and (150). Moreover, the singular values of this interpolation problem are the singular values of the Fourier inversion problem multiplied by $(2\pi)^{-1/2}$, the singular vectors are the same, and the singular functions are given by

$$u_k(x) = \sigma_k^{-1} \sum_{n=1}^{N} (v_k)_n\, (\pi/c)\, \mathrm{sinc}[c(x - x_n)]. \tag{151}$$


Notice that these singular functions are just the inverse Fourier transforms of the singular functions (135) (multiplied by a suitable normalization factor). In the case of equidistant points, Eq. (136), the analysis runs parallel to that of the Fourier transform inversion. In particular, if $d = \pi/c$, the solution (121) is the interpolating function obtained by truncating the Whittaker-Shannon expansion (82) and is exactly the inverse Fourier transform of the solution (137). When $d < \pi/c$, the singular system can be expressed again in terms of the eigenvalues and eigenvectors of the Slepian matrix (138). As follows from Eqs. (151) and (141), the singular functions are the interpolating functions of the index-limited DPSS and also the inverse Fourier transforms of the singular functions (143). Notice that the singular functions of the interpolation problem are even or odd according to the parity of $k$ only when the set of the data points is symmetric, i.e., $x_1 = -d(N - 1)/2$. In Fig. 2 we give a plot of these singular functions in the case $N = 5$, $W = 0.2$, where $W = cd/2\pi$. Notice the similarity of these functions and the DPSWF of Fig. 1. This is explained by the limiting property (144) if one recalls that the Fourier transform of a PSWF is still a PSWF.

Like the problem of Fourier transform inversion, the interpolation problem is ill-conditioned when $d < \pi/c$, due to the fact that the sampling process becomes maximally efficient when the data are sampled at the Nyquist rate. It must be observed, however, that efficient reconstructions of oversampled functions can be obtained using generalized sinc-series (Campbell, 1968; Natterer, 1986b). By means of these series, $f(x)$ may be evaluated using a small number of sampling points in the neighborhood of $x$ and this provides a nearly local method. It may be that these generalized sinc-series yield a sort of regularization of the ill-conditioned interpolation problem, though they have not yet been investigated from this point of view. Finally, the case $d > \pi/c$ has the same properties as the corresponding case in the problem of Fourier transform inversion.

2. Interpolation by Spline Functions

The second example is the interpolation of a function defined on the interval $[0, 1]$ which has square-integrable derivatives up to order $k$. For the sake of simplicity, we consider only the cases $k = 1$ and $k = 2$. The case of arbitrary $k$ is discussed by Bertero et al. (1985a).

Let $X$ be the space of the continuous functions with square-integrable first derivative (this space is usually denoted $H^1(0, 1)$) and let us introduce in $X$ the scalar product

$$(f, h)_X = f(0)\, h^{*}(0) + \int_0^1 f'(x)\, h'^{*}(x)\, dx. \tag{152}$$

FIG. 2. Plot of the singular functions for the interpolation of a band-limited function with bandwidth $c = 2\pi W$, $W = 0.2$, given at the points $0, \pm 1, \pm 2$.

Then, using the Taylor formula, it is easy to show that the reproducing kernel is given by

$$Q(x, x') = 1 + x - (x - x')_{+} = 1 + \min\{x, x'\}, \tag{153}$$

where $x_{+}$ is a function which is zero when $x < 0$ and equal to $x$ when $x > 0$.


Since $\phi_n(x) = Q(x, x_n)$, it is obvious that in this case the solution (121) is just the linear interpolation of the data values $g_n$.

Let $X$ now be the space of continuous functions with square-integrable second derivative (this space is usually denoted $H^2(0, 1)$) and let us introduce in $X$ the scalar product

$$(f, h)_X = f(0)\, h^{*}(0) + f'(0)\, h'^{*}(0) + \int_0^1 f''(x)\, h''^{*}(x)\, dx. \tag{154}$$

Then $X$ is a RKHS and, again using the Taylor formula for a function $f \in X$, it is easy to show that the reproducing kernel is

$$Q(x, x') = 1 + xx' + x^2 x'/2 - x^3/6 + (x - x')_{+}^{3}/6. \tag{155}$$

If we recall that the class $S_m(x_1, x_2, \ldots, x_N)$ of spline functions of degree $m$ having the knots $x_1, x_2, \ldots, x_N$ is the set of all functions $s(x)$ which have the representation

$$s(x) = p(x) + \sum_{n=1}^{N} c_n\, (x - x_n)_{+}^{m}, \tag{156}$$

where $p(x)$ is an arbitrary polynomial of degree $\le m$ (Greville, 1969), we see that the functions $\phi_n(x) = Q(x, x_n)$ are spline functions of degree 3 (cubic splines) and that the subspace $X_N$ spanned by the functions $\phi_n(x)$ is a linear subspace of $S_3(x_1, x_2, \ldots, x_N)$. Therefore, the interpolation provided by the solution (121) or (130) is just an interpolation in terms of cubic splines. Interpolation in terms of natural cubic splines (Greville, 1969) is obtained by minimizing the $L^2$-norm of the second derivative of $f$. Since this is a seminorm, this kind of problem will be discussed in Section III,B. Finally, if we consider the problem (146) in the space (152), the corresponding generalized solution is obtained by interpolating the data values $g_n$ in terms of cubic splines and then by differentiating the result (Bertero et al., 1986a). Interpolation of functions of two variables in RKHS has been studied by Duchon (1976) and Wahba and Wendelberger (1980).
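The two spline examples can be tried numerically by solving the Gram system built from the reproducing kernel, Eq. (149). The following is a minimal sketch under stated assumptions: the knots and data values are invented, the function names are hypothetical, and the knots are chosen to include both end points of $[0, 1]$ so that the $k = 1$ interpolant can be compared with piecewise-linear interpolation on the whole interval.

```python
import numpy as np

def rkhs_interpolant(knots, values, kernel):
    """Minimal-norm interpolant f+(x) = sum_m a_m Q(x, x_m), with G a = g."""
    gram = kernel(knots[:, None], knots[None, :])          # G_nm = Q(x_n, x_m)
    coeffs = np.linalg.solve(gram, values)                 # G is real and symmetric
    return lambda x: kernel(np.asarray(x, dtype=float)[:, None], knots[None, :]) @ coeffs

def kernel_h1(x, xp):
    """Reproducing kernel 1 + min(x, x') of the k = 1 example."""
    return 1.0 + np.minimum(x, xp)

def kernel_h2(x, xp):
    """Reproducing kernel of the k = 2 example (cubic splines)."""
    return (1.0 + x * xp + x**2 * xp / 2.0 - x**3 / 6.0
            + np.maximum(x - xp, 0.0) ** 3 / 6.0)

knots = np.array([0.0, 0.2, 0.5, 0.7, 1.0])                # assumed data points
values = np.array([1.0, -0.5, 2.0, 0.3, 1.5])              # assumed data values

f_linear = rkhs_interpolant(knots, values, kernel_h1)
f_cubic = rkhs_interpolant(knots, values, kernel_h2)

x = np.linspace(0.0, 1.0, 101)
max_dev = np.max(np.abs(f_linear(x) - np.interp(x, knots, values)))
print("k = 1: max deviation from piecewise-linear interpolation:", max_dev)
print("k = 2: data reproduced at the knots:", np.allclose(f_cubic(knots), values))
```

The same routine serves both cases because only the kernel changes; this mirrors the way the general solution (121) depends on the choice of $X$ only through the functions $\phi_n$.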

D. Finite Hausdorff Moment Problem

We consider a finite section of the Hausdorff moment problem and therefore wish to determine a function $f(x)$ defined on the interval $[0, 1]$ from knowledge of its first $N$ moments:

$$g_n = \int_0^1 x^{n-1} f(x)\, dx; \qquad n = 1, \ldots, N. \tag{157}$$


Let $X$ be $L^2(0, 1)$ and $Y$ the usual $N$-dimensional Euclidean space. Then the problem has the form (115) with $\phi_n(x) = x^{n-1}$. It follows that $X_N$ is the subspace of polynomials of degree $N - 1$ and that the solution (121) is also a polynomial of degree $N - 1$. A well-known procedure that can be used for the computation of this solution consists in representing $f^{+}(x)$ in terms of shifted Legendre polynomials $L_n(x)$ (Papoulis, 1956):

$$f^{+}(x) = \sum_{m=1}^{N} c_m\, L_{m-1}(x). \tag{158}$$

We recall that the shifted Legendre polynomials are uniquely defined (except for sign) by the following properties: a) the degree of $L_n(x)$ is exactly $n$; b) $(L_n, L_m) = \delta_{nm}$ ($n, m = 0, 1, 2, \ldots$), the scalar product being that of $L^2(0, 1)$. The relation between the shifted Legendre polynomial $L_n(x)$ and the usual Legendre polynomial $P_n(x)$ is as follows: $L_n(x) = (2n + 1)^{1/2} P_n(2x - 1)$. Now by substituting the representation (158) in Eq. (157) and taking into account the properties of $L_n(x)$, one finds that

$$g_n = \sum_{m=1}^{n} B_{nm}\, c_m; \qquad n = 1, \ldots, N, \tag{159}$$

where

$$B_{nm} = (2m - 1)^{1/2} (-1)^{m-1} \frac{[(n - 1)!]^2}{(n - m)!\, (n + m - 1)!}. \tag{160}$$

Since in Eq. (159) the summation index extends from 1 to $n$, it is possible to obtain recursively the coefficients $c_n$ from the moments $g_n$, and the coefficient $c_n$ depends only on moments of order $\le n$. Therefore, it is not necessary to change the algorithm when the number $N$ of the given moments is changed.

The previous algorithm is very simple but also very ill-conditioned even when $N$ is moderately large (for instance, $N = 10$). This can be shown by introducing the singular system of the problem. The Gram matrix is now given by

$$G_{nm} = (n + m - 1)^{-1}; \qquad n, m = 1, \ldots, N, \tag{161}$$

and this is a well-known example of an ill-conditioned matrix, called the Hilbert matrix and denoted $H_N(-1)$. The Hilbert matrix is often used for testing numerical algorithms (Gregory and Karney, 1969). A remarkable property of this matrix is that it commutes with a tridiagonal matrix (Grünbaum, 1982), so that its eigenvectors can be easily computed.


It follows that the singular values of the finite moment problem are just the square roots of the eigenvalues of $H_N(-1)$, the singular vectors are the corresponding eigenvectors of $H_N(-1)$, and the singular functions are orthogonal polynomials of degree $N - 1$ obtained by means of Eq. (129). In order to give a numerical example of the ill-conditioning of the moment problem, we give the singular values in the case $N = 6$: $\sigma_0 = 1.2724$, $\sigma_1 = 0.4923$, $\sigma_2 = 0.1277$, $\sigma_3 = 0.2481 \times 10^{-1}$, $\sigma_4 = 0.3545 \times 10^{-2}$, $\sigma_5 = 0.3291 \times 10^{-3}$, with a condition number $\mathrm{cond}(L) = 3866$. The plot of the corresponding singular functions is given in Fig. 3.

FIG. 3. Plot of the singular functions of the finite Hausdorff moment problem in the case N = 6.

For large $N$, the asymptotic estimate of the condition number (Gregory and Karney, 1969) is given by Eq. (162)


and therefore it increases very rapidly for increasing $N$. We have here a very clear example of the relation between ill-conditioning and the number of data points, as discussed at the end of Section III,A.

Another problem related to the moment problem is the inversion of the Laplace transform when this is given at the points $p_n = n - \tfrac{1}{2}$ ($n = 1, 2, \ldots, N$). This relation has already been discussed in Section II,H. In Fig. 4 we give the singular functions of this problem, again in the case $N = 6$. These singular functions can also be obtained from the singular functions of Fig. 3 by means of a change of variables which transforms Eq. (107) into Eq. (106). For a more general discussion of the Laplace and of the finite Laplace transform inversion in the case of equidistant points and geometrically distributed points, see Bertero et al. (1985b).
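The singular values quoted for $N = 6$ are easy to reproduce, since they are the square roots of the eigenvalues of the Hilbert matrix (161). The sketch below is an added illustration using a general-purpose eigenvalue routine, not the tridiagonal-commutation property mentioned above; the function names are hypothetical. It also evaluates the worst and average magnification of relative errors discussed in Section III,A.

```python
import numpy as np

def hausdorff_singular_values(n_moments):
    """Singular values of the finite Hausdorff moment problem, largest first."""
    n = np.arange(1, n_moments + 1)
    gram = 1.0 / (n[:, None] + n[None, :] - 1.0)      # Hilbert matrix, G_nm = 1/(n+m-1)
    return np.sqrt(np.linalg.eigvalsh(gram)[::-1])

def average_condition_number(sigma):
    """Average magnification of relative errors (Twomey, 1974)."""
    return np.sqrt(np.sum(sigma**2) * np.sum(sigma**-2.0)) / sigma.size

sigma = hausdorff_singular_values(6)
cond = sigma[0] / sigma[-1]                           # worst magnification
avg_cond = average_condition_number(sigma)
print("sigma  =", sigma)
print("cond   =", cond)
print("<cond> =", avg_cond)
```

Double precision is adequate here because the smallest eigenvalue of the $6 \times 6$ Hilbert matrix is still about $10^{-7}$. For much larger $N$ the computed small eigenvalues are dominated by rounding errors, which is itself a symptom of the ill-conditioning described in the text.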

E. Moment-Discretization of Fredholm Integral Equations of the First Kind

Consider a Fredholm integral equation of the first kind, such as that given in Eq. (6), with a continuous kernel $K(x, y)$ and continuous data function $g(x)$. Assume also that the intervals $[a, b]$ and $[c, d]$ are bounded. Then we denote by $L_0$ the integral operator associated with the kernel $K(x, y)$ of Eq. (7). As we know, this is a compact operator from $L^2(a, b)$ into $L^2(c, d)$. We denote its singular system by $\{\sigma_{0,k}; u_{0,k}, v_{0,k}\}$.

The method of moment discretization, which provides a natural way of approximating a solution of Eq. (6), is as follows (Nashed, 1976b; Nashed and Wahba, 1974; Groetsch, 1984): Given a finite set of points $a \le x_1 < x_2 < \cdots < x_N \le b$, find a function $f(y)$ that satisfies the equations

$$g(x_n) = \int_a^b K(x_n, y)\, f(y)\, dy; \qquad n = 1, \ldots, N. \tag{163}$$

This is a problem of the form (115) with $\phi_n(y) = K(x_n, y)$ if $X = L^2(a, b)$. In order to investigate the relationship between the singular system of the integral operator $L_0$ and the singular system of the operator $L$ associated with the problem (163) as defined by Eq. (124), we assume that the points $x_n$ are the knots of some quadrature formula. We denote by $w_1, w_2, \ldots, w_N$ the corresponding weights and introduce in the data space $Y$ a scalar product defined as in Eq. (122) with $W_{nm} = w_n \delta_{nm}$. We recall now that the singular values of $L_0$ are the square roots of the eigenvalues of the integral operator $L_0^{*} L_0$ whose kernel is given by

$$T_0(y, y') = \int K(x, y)\, K(x, y')\, dx, \tag{164}$$

FIG. 4. Plot of the singular functions of the inversion of the Laplace transform given at the points $p_n = n - 1/2$, with $n = 1, 2, \ldots, 6$.

and that the singular functions $u_{0,k}$ are the corresponding eigenfunctions. Analogously, the singular values of $L$ are the square roots of the eigenvalues of the finite rank integral operator $L^{*}L$ whose kernel is

$$T(y, y') = \sum_{n=1}^{N} w_n\, K(x_n, y)\, K(x_n, y'), \tag{165}$$

as follows from Eqs. (124) and (127). Again, the singular functions $u_k$ are the corresponding eigenfunctions. Clearly, $T(y, y')$ is just the approximation of $T_0(y, y')$ provided by the quadrature formula corresponding to the knots $x_n$ and the weights $w_n$. Since $K(x, y)$ is continuous, the kernel $T(y, y')$ converges to the kernel $T_0(y, y')$ as $N \to \infty$ and the maximum distance between adjacent knots tends to zero. Using this result and well-known perturbation lemmas (more precisely, the Weyl-Courant lemma; Riesz and Sz. Nagy, 1972), it is possible to prove (Bertero, 1986) that the singular values $\sigma_k$ of the problem (163) converge to the singular values $\sigma_{0,k}$ of the integral operator $L_0$. Analogously, the singular functions $u_k$ converge, in the norm of $L^2(a, b)$, to the corresponding singular functions $u_{0,k}$. Since the singular values $\sigma_{0,k}$ tend to zero as $k \to \infty$, the previous result implies that the ill-conditioning of the problem (163) increases when the number of data points increases. Here we have another example of the property discussed at the end of Section III,A.

A special kind of moment discretization can also be used for the approximate solution of integral equations of the first kind associated with operators of the following type:

$$(L_0 f)(x) = \int_{-\infty}^{+\infty} K(x - y)\, P(y)\, f(y)\, dy, \qquad -\infty < x < +\infty, \tag{166}$$

where $K(x)$ is a band-limited function with bandwidth $c$. This operator is compact whenever the functions $K(x)$ and $P(y)$ are square-integrable. An example is provided by the operator (87), in which case $K(x) = (\pi/c)\, \mathrm{sinc}(cx/\pi)$ and $P(y)$ is the characteristic function of the interval $[-1, 1]$. Another important example is an operator whose inversion is related to the problem of data processing in confocal scanning microscopy. It may be obtained from Eq. (166) by setting $K(x) = P(x) = \mathrm{sinc}(x)$ (Bertero et al., 1984). Since the functions in the range of $L_0$ are band-limited, it is quite natural to sample the data at the Nyquist rate, so that we can consider the problem

$$g(x_n) = \int_{-\infty}^{+\infty} K(x_n - y)\, P(y)\, f(y)\, dy; \qquad x_n = n\pi/c, \quad n = 0, \pm 1, \ldots, \pm N, \tag{167}$$

and take $w_n = \pi/c$. Then the kernels $T_0(y, y')$ and $T(y, y')$ are given by

$$T_0(y, y') = P(y) \left( \int_{-\infty}^{+\infty} K(x - y)\, K(x - y')\, dx \right) P(y'), \tag{168}$$

$$T(y, y') = P(y) \left( \sum_{n=-N}^{N} (\pi/c)\, K(x_n - y)\, K(x_n - y') \right) P(y'). \tag{169}$$

Using the fact that for $N = \infty$ the trapezoidal rule is exact for band-limited functions, it is possible to prove that $T(y, y')$ converges to $T_0(y, y')$ in the $L^2$-norm as $N \to \infty$ (with fixed sampling distance). It follows again that the singular values of the problem with discrete data converge to the singular values of the integral operator.

It is important to point out that in some cases it is possible to obtain very good approximations of the first singular values using a very small number of data points. As an example, we give a numerical result obtained in the case of the integral operator mentioned above related to confocal scanning microscopy. In this case, it is possible to derive analytic expressions of the singular values and singular functions (Gori and Guattari, 1985). From this result, it follows that the first five singular values are given by $\sigma_{0,0} = 0.821898$, $\sigma_{0,1} = 0.450158$, $\sigma_{0,2} = 0.206417$, $\sigma_{0,3} = 0.150053$, $\sigma_{0,4} = 0.109845$. On the other hand, Bertero et al. (1987) have shown that with five sampling points, the corresponding five singular values are $\sigma_0 = 0.821807$, $\sigma_1 = 0.413815$, $\sigma_2 = 0.206327$, $\sigma_3 = 0.136873$, and $\sigma_4 = 0.109751$. The error in the even singular values is at most one unit on the fourth digit!
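The convergence of the singular values under moment discretization can be explored on a toy problem. The sketch below is not one of the operators treated in this section: it assumes the kernel $K(x, y) = \exp(-xy)$ on $[0, 1] \times [0, 1]$, for which the Gram matrix is known in closed form, and uses Gauss-Legendre knots and weights as the quadrature formula. The function name is hypothetical.

```python
import numpy as np

def discretized_singular_values(n_points):
    """Singular values of the moment-discretized problem for K(x, y) = exp(-x y).

    The knots and weights are those of the Gauss-Legendre rule on [0, 1]; the
    Gram matrix G_nm = int_0^1 K(x_n, y) K(x_m, y) dy is known in closed form.
    """
    nodes, w = np.polynomial.legendre.leggauss(n_points)
    x = 0.5 * (nodes + 1.0)                 # map knots from [-1, 1] to [0, 1]
    w = 0.5 * w
    s = x[:, None] + x[None, :]
    gram = -np.expm1(-s) / s                # (1 - exp(-s)) / s
    # G^T W is similar to the symmetric matrix W^(1/2) G W^(1/2).
    sym = np.sqrt(w)[:, None] * gram * np.sqrt(w)[None, :]
    eigenvalues = np.linalg.eigvalsh(sym)[::-1]
    return np.sqrt(np.clip(eigenvalues, 0.0, None))

for n in (3, 4, 5):
    sigma = discretized_singular_values(n)
    print(n, sigma[:3], "cond =", sigma[0] / sigma[-1])
```

With this assumed kernel the leading singular values settle after a handful of knots, while the condition number grows by orders of magnitude each time a knot is added, in line with the general statement above. The number of knots is kept small on purpose: for larger $N$ the smallest eigenvalues fall below double-precision rounding and the computed condition number is no longer meaningful.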

IV. GENERALIZED SOLUTIONS

As is well known, the study of Eq. (5) presents rather complex problems even in the case where this equation is an $n \times m$ linear system, i.e., where the operator $L$ is a matrix with $n$ rows and $m$ columns. The existence and uniqueness of the solution depend on the number $n$ of rows, the number $m$ of columns, and the rank $p$ of the matrix (Lanczos, 1961). A great simplification is obtained by introducing the Moore-Penrose generalized inverse of a matrix, which is closely related to the problem of looking for a least-squares solution of minimal norm of the original linear system. Such a solution, which is also called the generalized solution of the linear system, always exists and is unique, independent of the number of rows, number of columns, and rank of the matrix.

A very important fact for the treatment of linear inverse problems is that the concept of generalized (Moore-Penrose) inversion can be extended to the case of linear continuous operators in Hilbert spaces (Nashed, 1976a; Groetsch, 1977). In this case, however, the generalized inverse is not always continuous and therefore the problem of determining the generalized solution may be ill-posed. The essential result is that the generalized solution is unique, exists for arbitrary data $g$, and depends continuously on $g$ when the range of the operator $L$ is closed, even if the requirements of existence and uniqueness of the solution are not satisfied by the original equation (5). On the other hand, when the range of $L$ is not closed (examples are provided by the compact operators and by the convolution operators discussed in Section II,A), the generalized solution is unique though it does not exist for arbitrary data nor does it depend continuously on the data. In some mathematical literature, the term "ill-posed" is reserved for this case. According to this definition, inverse problems with discrete data are always well-posed (but, of course, they can be very ill-conditioned) and, therefore, the concept of ill-posedness is restricted to the case of problems formulated in infinite-dimensional spaces.

In this section, we sketch a procedure that leads to the introduction of the Moore-Penrose generalized inverse for a linear continuous operator and also provide a physical interpretation of this procedure in terms of the concepts of visible and invisible components of the object, introduced in Section II,A. We also introduce an extension of the Moore-Penrose generalized inverse obtained when the criterion for selecting a least-squares solution is not the minimization of a norm, but rather the minimization of a seminorm. Finally, we sketch the Backus-Gilbert method for linear inverse problems with discrete data in order to clarify the analogies and differences between this method and the method of generalized solutions.

A. Moore-Penrose Generalized Inverse

The solution of Eq. (5) may not exist for arbitrary data $g$ because $R(L)$ is a proper subspace of $Y$. In the case of the band-limiting operator given by Eq. (80), for example, $Y$ is the space of square-integrable functions and $R(L)$ the closed subspace of band-limited functions with fixed bandwidth. Similarly, for a problem with discrete data, $R(L)$ is a proper subspace of $Y$ if the functions $\varphi_n$ are not linearly independent. When $R(L)$ is a proper subspace of $Y$, the measured data may have a component orthogonal to $R(L)$ as an effect of the noise contribution $h$ in Eq. (9). For example, in the case of the band-limiting operator, the noise may have Fourier components outside the band of the ideal low-pass filter described by this operator. Under these circumstances, the solution of Eq. (5), with $g$ given by Eq. (9), does not exist. Then a quite natural procedure is to search for a function (or functions) $u$ such that $Lu$ is as close as possible to $g$.


The projection operator onto the closure $\overline{R(L)}$ of the range of $L$ will be denoted by $P$, so that, recalling the relations (18), we are led to conclude that $Q = I - P$ is the projection operator onto $N(L^*)$. Moreover, the following relations are obvious:

$$PL = L, \qquad QL = 0. \tag{170}$$

We now give the following definition: a function $u \in X$ is said to be a least-squares solution (or pseudosolution) of Eq. (5) if it minimizes the distance between $Lf$ and $g$:

$$\|Lu - g\|_Y = \inf\{\|Lf - g\|_Y \mid f \in X\}. \tag{171}$$

By considering the first variation of the functional $\|Lf - g\|_Y$, it is quite easy to see that any least-squares solution must satisfy the orthogonality condition

$$(Lu - g, L\varphi)_Y = 0 \tag{172}$$

for any $\varphi \in X$. It follows that $u$ must be a solution of the Euler equation

$$L^*Lu = L^*g. \tag{173}$$

On the other hand, if we use the relations (170), we can write

$$\|Lf - g\|_Y^2 = \|Lf - Pg\|_Y^2 + \|Qg\|_Y^2. \tag{174}$$

Therefore, it is clear that a least-squares solution exists if and only if there exists a function that annihilates the first term of the r.h.s. of this equation. We conclude that Eq. (173) has a solution if and only if the equation

$$Lu = Pg \tag{175}$$

also has a solution. In other words, the process of replacing the usual solutions with the least-squares solutions is equivalent to the projection of the measured data onto $\overline{R(L)}$ followed by the solution of Eq. (5) with $g$ replaced by the projected data $Pg$. The advantage of Eq. (173) over Eq. (175) is that, in practice, it may be difficult to determine the projection operator $P$ while, in general, it is quite easy to determine the adjoint operator $L^*$. From the previous remarks, it follows that a least-squares solution of Eq. (5) exists if and only if $Pg \in R(L)$. As a corollary we have that when $R(L)$ is closed, a least-squares solution exists for any measured data $g$. This result applies, for example, to the case of inverse problems with discrete data and also to the case of the band-limiting operator (80). The solution of Eq. (175) is unique if and only if $N(L) = \{0\}$. If $N(L)$ is not trivial, we denote by $S(g)$ the set of all least-squares solutions associated with $g$. If $u^{(0)}$ is one of these least-squares solutions, then $S(g)$ is the closed affine subspace given by

$$S(g) = \{u \in X \mid u = u^{(0)} + \varphi,\ L\varphi = 0\}, \tag{176}$$

i.e., $S(g)$ is a translation of $N(L)$, $S(g) = u^{(0)} + N(L)$.

For example, in the case of the band-limiting operator (80), $S(g)$ is the set of all square-integrable functions whose Fourier transform coincides with the Fourier transform of $g$ over the band $[-c, c]$ and is arbitrary elsewhere. In a similar way, in the case of an inverse problem with discrete data, $S(g)$ is the set of all functions whose component in $X_N$ is given by Eq. (121), while the component orthogonal to $X_N$ is arbitrary. Now, for any $g$ such that $Pg \in R(L)$, $S(g)$ is not empty and is a closed and convex set of $X$. Then from a general theorem of functional analysis (Balakrishnan, 1976), it follows that there exists a unique solution of minimal norm. This is called the generalized solution (or normal pseudosolution), denoted $f^+$:

$$\|f^+\|_X = \inf\{\|u\|_X \mid u \in S(g)\}. \tag{177}$$
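The definitions (171), (176), and (177) can be made concrete with a minimal Python sketch, assuming a hypothetical rank-one operator on $\mathbb{R}^2$ whose two outputs both equal $u_1 + u_2$. For this toy operator $R(L)$ is spanned by $(1, 1)$, $N(L)$ by $(1, -1)$, and the projection $P$ and the generalized solution can be written by hand; none of these choices come from the text.

```python
import math

def apply_L(u):
    """Hypothetical rank-one operator on R^2: both outputs equal u1 + u2."""
    s = u[0] + u[1]
    return (s, s)

def project_onto_range(g):
    """P g: orthogonal projection onto R(L) = span{(1, 1)}."""
    m = 0.5 * (g[0] + g[1])
    return (m, m)

def generalized_solution(g):
    """Least-squares solution orthogonal to N(L) = span{(1, -1)}."""
    m = 0.25 * (g[0] + g[1])
    return (m, m)

def norm(u):
    return math.hypot(u[0], u[1])

def residual(u, g):
    Lu = apply_L(u)
    return norm((Lu[0] - g[0], Lu[1] - g[1]))

g = (3.0, 1.0)                      # data with a component orthogonal to R(L)
Pg = project_onto_range(g)          # (2.0, 2.0)
f_plus = generalized_solution(g)    # (1.0, 1.0)

# Every u = f_plus + t * (1, -1) belongs to S(g): the residual is unchanged,
# but the norm is larger than that of f_plus.
others = [(f_plus[0] + t, f_plus[1] - t) for t in (-2.0, -0.5, 0.5, 2.0)]
```

In this sketch `apply_L(f_plus)` reproduces `Pg`, which is the content of Eq. (175), and the elements of `others` trace out the straight line $S(g)$ parallel to $N(L)$.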

It is easy to see that $f^+$ is the unique least-squares solution which is orthogonal to $N(L)$ (see Fig. 5), and therefore this procedure is equivalent to the restriction of the operator $L$ to the orthogonal complement of $N(L)$. In other words, the generalized solution is the unique least-squares solution whose invisible component, i.e., the component not transmitted by the instrument described by $L$, is exactly zero. We point out that, when $R(L)$ is closed, there exists a unique generalized solution for any $g \in Y$. Since it is also possible to prove that the mapping $g \to f^+$ is continuous (Groetsch, 1977), it follows that when $R(L)$ is closed, the problem of determining $f^+$ is well-posed. Again, this result applies both to the case of the band-limiting operator (80) and to the case of inverse problems with discrete data. In the latter case, when the functions $\varphi_n$ are linearly independent, the generalized solution $f^+$ just coincides with the visible component of $f$ as given by Eq. (121) or (130). When not all the $\varphi_n$ are linearly independent, then $f^+$ is given again by Eq. (130), where the number of terms is no longer $N$ but the number $N'$ of linearly independent $\varphi_n$.

FIG. 5. Two-dimensional geometric representation of the generalized solution. The null space $N(L)$ is a straight line passing through the origin and the set of least-squares solutions $S(g)$ is a straight line parallel to $N(L)$. Then $f^+$ is the element of $S(g)$ orthogonal to $N(L)$.

We consider now the two examples of linear operators with nonclosed range already discussed in Section II,A. In the case of a compact operator, by the representation (23) we have

[Eq. (178)]

and therefore the generalized solution exists if and only if $Pg$ satisfies the second of the conditions (24) (the first condition just means $Qg = 0$). Then the generalized solution is given by

[Eq. (179)]

In a similar way, when $L$ is a convolution operator, we find by the Parseval equality that

[Eq. (180)]

and it follows again that the generalized solution exists if and only if the second of the conditions (28) is satisfied, in which case it is given by

[Eq. (181)]

We return now to the general case and notice that both in the case of an operator with a closed range and in the case of an operator with nonclosed range, the mapping $g \to f^+$ defines a linear operator $L^+: Y \to X$ as follows:

$$f^+ = L^+ g. \tag{182}$$

$L^+$ is called the generalized inverse or Moore-Penrose generalized inverse of the linear operator $L$. The latter name comes from the fact that, as we have already remarked, $L^+$ is the natural extension to the case of a linear continuous operator of the Moore-Penrose inverse of a matrix. It is easy to prove (Groetsch, 1977) that when $L$ is linear and continuous, $L^+$ is closed. Now, when $R(L)$ is closed, $f^+$ exists and is unique for any $g \in Y$ and therefore, if we denote by $D(L^+)$ the domain of $L^+$, we have $D(L^+) = Y$. From the closed graph theorem (Balakrishnan, 1976), it follows that $L^+$ is continuous or, as we have already pointed out, the problem of determining $f^+$ is well-posed. Its stability is controlled by the condition number

$$\mathrm{cond}(L) = \|L\|\,\|L^+\|, \tag{183}$$

which is just a generalization of the condition number (11). For a problem with discrete data, Eq. (183) coincides with Eq. (131). A generalization of this expression to arbitrary linear continuous operators is the following: if $\lambda_{\min}$ and $\lambda_{\max}$ denote, respectively, the lower and upper bound of the positive part of the spectrum of the operator $LL^*$, then $\|L\| = \lambda_{\max}^{1/2}$ and $\|L^+\| = \lambda_{\min}^{-1/2}$, so that

$$\mathrm{cond}(L) = (\lambda_{\max}/\lambda_{\min})^{1/2}. \tag{184}$$

The expression for the norm of $L^+$ comes from the relationship

$$L^+ = (L^*L)^+L^* = L^*(LL^*)^+, \tag{185}$$

which can be easily proved using Eq. (173), and from the spectral representation of $LL^*$. When $R(L)$ is not closed, then $L^+$ is not defined everywhere on $Y$, but $D(L^+) = R(L) \oplus R(L)^\perp$. As a consequence, the determination of the generalized solution is an ill-posed problem. In fact, the generalized solution does not exist for arbitrary data nor does it depend continuously on the data. In this case, as well as in the case of well-posed but ill-conditioned problems, one must use the regularization techniques to be discussed in Section V.

B. C-Generalized Inverses

In some problems one is looking for a least-squares solution minimizing not the norm of $X$ but rather some suitable seminorm defined on a subset of $X$. Examples come from the theory of interpolation by means of natural splines (Greville, 1969), from numerical methods for the solution of Fredholm integral equations of the first kind (Phillips, 1962), and from certain problems of computational vision (Hildreth, 1984; Bertero et al., 1986b). We will consider a norm or seminorm of the type

$$p(f) = \|Cf\|_Z, \tag{186}$$

where $C: X \to Z$ is a constraint operator as defined in Section II,A, and search for a generalized solution $f_C^+$ by solving the variational problem

$$\|Cf_C^+\|_Z = \inf\{\|Cu\|_Z \mid u \in S(g) \cap D(C)\}. \tag{187}$$

When this problem has a unique solution, it will be called the C-generalized solution of Eq. (5). This is called by Morozov the solution of the basic problem (Morozov, 1984). We first consider the case where the constraint operator $C$ satisfies Conditions A and B of Section II,A. In this case, Eq. (186) defines a norm and not a seminorm, and the solution of problem (187) can be reduced to the solution of problem (177) just by redefining the space $X$, as explained in Section II. It follows that there exists a unique solution of problem (187) if and only if $Pg \in LD(C)$ (the image of the domain of $C$ under the action of $L$). The mapping $g \to f_C^+$ defines a linear operator $L_C^+$:

$$f_C^+ = L_C^+ g, \tag{188}$$

which will be called the C-generalized inverse of $L$. It may be of interest in this simple case to give an explicit representation of $L_C^+$ in terms of a suitable Moore-Penrose generalized inverse. If we put $f = C^{-1}\varphi$, then, by means of simple computations, we find that (Bertero, 1986)

$$L_C^+ = C^{-1}(LC^{-1})^+. \tag{189}$$
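The representation (189) can be illustrated with a minimal Python sketch under simple assumptions that are not taken from the text: $X = \mathbb{R}^2$, the operator $L$ is the row vector $(1, 1)$, so that $Lu = u_1 + u_2$, and $C$ is an invertible diagonal matrix. For a row operator $a$ the Moore-Penrose inverse is $a^T/(a a^T)$, which is all the sketch needs.

```python
def c_generalized_solution(g, c):
    """Minimize ||C u|| subject to u1 + u2 = g, with C = diag(c[0], c[1]).

    Uses L_C^+ = C^{-1} (L C^{-1})^+ for the hypothetical row operator L = (1, 1).
    """
    a = (1.0 / c[0], 1.0 / c[1])            # L C^{-1} as a row vector
    aa = a[0] ** 2 + a[1] ** 2              # (L C^{-1})(L C^{-1})^*
    phi = (a[0] * g / aa, a[1] * g / aa)    # phi = (L C^{-1})^+ g
    return (phi[0] / c[0], phi[1] / c[1])   # f = C^{-1} phi

def c_norm_sq(u, c):
    """||C u||^2 for C = diag(c[0], c[1])."""
    return (c[0] * u[0]) ** 2 + (c[1] * u[1]) ** 2

weights = (1.0, 2.0)                               # penalize the second component more
f_c = c_generalized_solution(1.0, weights)         # C-generalized solution
f_mp = c_generalized_solution(1.0, (1.0, 1.0))     # C = I: Moore-Penrose solution
```

Both `f_c` and `f_mp` satisfy $u_1 + u_2 = 1$, but `f_c` has the smaller value of $\|Cu\|$ for the chosen weights, while `f_mp` has the smaller Euclidean norm.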

Notice that $L_C^+$ may not be continuous even when $R(L)$ is closed, because $LD(C)$ may not be closed. Examples are given by Hildreth (1984) and Bertero et al. (1986b). In these cases, a well-posed problem is transformed into an ill-posed one, but this transformation can be forced by the introduction of physical constraints. In other words, the Moore-Penrose generalized solution is not meaningful from the physical point of view and must be replaced by other generalized solutions. We consider now the case where the constraint operator $C$ does not satisfy Condition B of Section II,A. In other words, $C$ is a closed operator with a closed range but may have a nontrivial null space $N(C)$. A very simple example is provided by

[Eq. (190)]

where $f^{(k)}$ denotes the derivative of order $k$. Then $N(C)$ is the subspace of the polynomials of degree $\le k - 1$. We assume now that Condition B of Section II,A is replaced by the following Condition B'.

B') The range of $C$ is closed, more precisely $R(C) = Z$, and the unique solution of the set of equations

$$Lf = 0, \qquad Cf = 0 \tag{191}$$

is $f = 0$, i.e., $N(L) \cap N(C) = \{0\}$.

When this condition is satisfied, it is possible to prove (Groetsch, 1986) that there exists a constant $m > 0$ such that

$$\|Lf\|_Y^2 + \|Cf\|_Z^2 \ge m\|f\|_X^2 \tag{192}$$

for any $f \in D(C)$. Such a condition is called by Morozov the completion condition (Morozov, 1984) and is taken as the basic assumption for the solution of problem (187). Inequality (192) was also proved by Bertero (1986) using not only Conditions A and B' (cf. Groetsch, 1986), but also the assumption that $N(C)$ is finite-dimensional. By means of inequality (192), it is easy to show that $D(C)$, endowed with the scalar product

$$(f, \varphi)_C = (Lf, L\varphi)_Y + (Cf, C\varphi)_Z, \tag{193}$$

is a Hilbert space, which we will denote $X_C$. Analogously, the restriction of $L$ to $X_C$ will be denoted $L_C$. Assume now that for given $g$, there exist least-squares solutions $u \in D(C)$. It is evident that this is true when $Pg \in LD(C)$. Then Eq. (175) implies that for those least-squares solutions, we have $\|u\|_C^2 = \|Pg\|_Y^2 + \|Cu\|_Z^2$. Since, for fixed $g$, $\|Pg\|_Y^2$ is a constant, the solution of problem (187) is equivalent to the solution of the problem

$$\|f_C^+\|_C = \inf\{\|u\|_C \mid u \in S(g) \cap D(C)\}. \tag{194}$$

It follows that the C-generalized inverse of the operator $L$ is just the Moore-Penrose generalized inverse of the operator $L_C$. As an example, we consider the case where $L: X \to Y$ is compact. If we notice that, as follows from inequality (192), any bounded set in $X_C$ is also a bounded set in $X$, we conclude that $L_C$ is also compact. We can then introduce its singular system $\{\sigma_{C,k}; u_{C,k}, v_{C,k}\}$, the set of solutions of the shifted eigenvalue problem

$$L_C u_{C,k} = \sigma_{C,k} v_{C,k}, \qquad L_C^* v_{C,k} = \sigma_{C,k} u_{C,k}. \tag{195}$$

When this problem has been solved, $f_C^+$ is given by Eq. (179), with $\{\sigma_k; u_k, v_k\}$ replaced by $\{\sigma_{C,k}; u_{C,k}, v_{C,k}\}$.


We want to show now that the solution of the shifted eigenvalue problem (195) can be reduced to the determination of the set of solutions $\{\omega_k^2; \psi_k\}$ of the generalized eigenvalue problem

$$L^*L\psi_k = \omega_k^2\,C^*C\psi_k. \tag{196}$$

We notice that this problem is analogous to the problem encountered in the investigation of small oscillations of a mechanical system. In that case, $L^*L$ is related to the potential energy of the mechanical system, and $C^*C$ is related to the system's kinetic energy. The first step is the determination of $L_C^*$ in terms of $L^*$. From the relation

$$(f, L_C^* g)_C = (Lf, LL_C^* g)_Y + (Cf, CL_C^* g)_Z = (f, (L^*L + C^*C)L_C^* g)_X = (f, L^*g)_X, \tag{197}$$

which holds true for any $f \in D(C)$, which is dense in $X$, and any $g \in Y$, we obtain

$$L_C^* = (L^*L + C^*C)^{-1}L^*. \tag{198}$$

Therefore, the second of the equations in (195) can be written as follows:

$$L^* v_{C,k} = \sigma_{C,k}(L^*L + C^*C)u_{C,k}. \tag{199}$$

Finally, if we apply $L^*$ to both sides of the first of the equations in (195) and use Eq. (199) to eliminate $v_{C,k}$, we obtain

$$(1 - \sigma_{C,k}^2)L^*Lu_{C,k} = \sigma_{C,k}^2\,C^*Cu_{C,k}, \tag{200}$$

and this equation can be equated to Eq. (196) by setting

$$\sigma_{C,k} = \omega_k(1 + \omega_k^2)^{-1/2}, \qquad u_{C,k} = \psi_k. \tag{201}$$
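Comparing Eq. (200) with Eq. (196) gives $\omega_k^2 = \sigma_{C,k}^2/(1 - \sigma_{C,k}^2)$. A minimal Python sketch can check this relation in a case where everything is explicit, assuming hypothetical diagonal operators $L$ and $C$ on $\mathbb{R}^3$ (the diagonal entries below are arbitrary illustrative values). For diagonal operators the unit vectors solve the generalized eigenvalue problem, and the singular values of $L_C$ are the ratios $\|Le_k\|_Y / \|e_k\|_C$ computed with the norm induced by Eq. (193).

```python
import math

l_diag = [2.0, 1.0, 0.1]   # diagonal of L (hypothetical values)
c_diag = [1.0, 1.0, 3.0]   # diagonal of C (hypothetical values)

# Generalized eigenvalues: L*L e_k = omega_k^2 C*C e_k for the unit vectors e_k.
omega = [lk / ck for lk, ck in zip(l_diag, c_diag)]

# Singular values of L_C: ||L e_k||_Y / ||e_k||_C, where
# ||e_k||_C^2 = l_k^2 + c_k^2 by the scalar product (193).
sigma_c = [lk / math.sqrt(lk**2 + ck**2) for lk, ck in zip(l_diag, c_diag)]

# The same numbers expressed through omega_k.
sigma_from_omega = [w / math.sqrt(1.0 + w**2) for w in omega]
```

The two lists `sigma_c` and `sigma_from_omega` agree to rounding error, and every entry is smaller than one.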

Notice that as a byproduct of this procedure, we have found that all the singular values $\sigma_{C,k}$ are less than one. Similar results apply to inverse problems with discrete data. Since in this case $L$ is a finite-rank operator, the C-generalized inverse $L_C^+$ is always continuous, but may be ill-conditioned, the condition number being given again by Eq. (131), with $\sigma_0$ and $\sigma_{N-1}$ replaced by $\sigma_{C,0}$ and $\sigma_{C,N-1}$, respectively. As a concluding remark, we point out that, in the case of the interpolation problem in RKHS, as discussed in Section III,C, if we look for C-generalized solutions associated with the functional (190), then the result is an interpolation in terms of natural splines (Greville, 1969). We merely note that Condition B' is satisfied whenever $k \le N$, where $k$ is the order of the derivative in the functional (190) and $N$ is the number of points. For a discussion of the interpolation and derivation problems in terms of C-generalized solutions, see Bertero et al. (1985a).


C. The Backus-Gilbert Method for Problems with Discrete Data

The Backus-Gilbert method (Backus and Gilbert, 1968; 1970) was proposed for the solution of the inverse problem consisting in the determination of the structure of the Earth, using data related to properties of the Earth as a whole such as mass, moment of inertia, and frequencies of elastic-gravitational normal modes. It has also been applied to Fourier transform inversion (Oldenburg, 1976), the inverse scattering problem (Colton, 1984), and Laplace transform inversion (Haario and Somersalo, 1985). A relationship between the Backus-Gilbert method and the Fejer theory of Fourier series expansions has been discussed by Bertero et al. (1988a). The Backus-Gilbert method can only be used in the case of an inverse problem with discrete data, when the object space $X$ is a space of square-integrable functions. In this case, the problem (115) takes the form

$$g_n = \int f(x)\,\varphi_n(x)^*\,dx; \qquad n = 1, \dots, N. \tag{202}$$

It must also be pointed out that the method does not provide an exact but only an approximate solution of these equations. It is discussed in this section because it shows some analogies with the method of generalized solutions. To introduce the basic idea of the method, let us reconsider for a moment the Moore-Penrose generalized solution, or C-generalized solution, of problem (202). When the $\varphi_n$ are linearly independent and the data values $g_n$ exact, by combining Eq. (202) with Eq. (121) we obtain

$$f^+(x) = \int A^+(x, x')\,f(x')\,dx', \tag{203}$$

where

[Eq. (204)]

and therefore, at any point $x$, the generalized solution $f^+(x)$ is an average of the true solution $f(x)$. Moreover, this averaged function lies in the subspace spanned by the functions $\varphi_n$. A similar result holds true for any C-generalized solution $f_C^+$ if we bear in mind that $f_C^+$ belongs to the subspace spanned by the functions $\psi_n$ satisfying the relations

$$(f, \varphi_n)_X = (f, \psi_n)_C; \qquad n = 1, \dots, N \tag{205}$$

for any $f \in D(C)$. When the C-scalar product is defined as in Eq. (13), the functions $\psi_n$ are obtained by solving the equations

$$C^*C\psi_n = \varphi_n; \qquad n = 1, \dots, N, \tag{206}$$

and, when the scalar product is defined as in Eq. (193), the functions $\psi_n$ are obtained by solving the equations

$$(L^*L + C^*C)\psi_n = \varphi_n; \qquad n = 1, \dots, N. \tag{207}$$

Then, by introducing the Gram matrix of the functions $\psi_n$ and the dual basis $\psi^n$, one finds for $f_C^+$ an expression similar to Eq. (121) with $\varphi^n$ replaced by $\psi^n$. By combining this equation with Eq. (202), we again find

[Eq. (208)]

where

[Eq. (209)]

The functions $f^+$ and $f_C^+$ are solutions of Eq. (202). The Backus-Gilbert method consists in searching for an approximate solution of these equations. Let us denote it by $f_{BG}(x)$, which is also an average of the true solution $f(x)$,

$$f_{BG}(x) = \int A(x, x')\,f(x')\,dx', \tag{210}$$

and which depends linearly on the exact values of the functionals (202). This condition implies that the kernel $A(x, x')$, called the averaging kernel by Backus and Gilbert, must have the expression

$$A(x, x') = \sum_{n=1}^{N} a_n(x)\,\varphi_n(x')^*, \tag{211}$$

with functions $a_n(x)$ to be determined. By combining Eqs. (210), (211), and (202) we have

$$f_{BG}(x) = \sum_{n=1}^{N} g_n\,a_n(x), \tag{212}$$

and therefore $f_{BG}(x)$ is a function in the subspace spanned by the functions $a_n(x)$. It is obvious that the generalized solutions discussed in the previous sections have a similar structure. In that case, the functions $a_n(x)$ are determined by requiring that the solution satisfy Eq. (202) and by adding a variational principle for the solution (smallest norm, etc.) in order to ensure uniqueness. Backus and Gilbert follow a different approach and, in particular, do not require that the function (212) satisfy Eq. (202). In fact, they introduce a variational principle for the averaging kernel itself, since they require that it must be the sharpest in a sense to be specified. For this purpose, let $J(x, x')$ be a function that vanishes when $x = x'$ and increases monotonically with increasing distance between $x$ and $x'$. An example of such a function is

$$J(x, x') = (x - x')^2. \tag{213}$$

Then the unknown functions $a_n(x)$ in Eq. (211) are determined by solving the minimization problem

$$S^2(x) = \int J(x, x')\,|A(x, x')|^2\,dx' = \text{minimum} \tag{214}$$

with the constraint

$$\int A(x, x')\,dx' = 1. \tag{215}$$

In other words, one looks for a kernel of the form (211) which gives a good approximation of the delta distribution $\delta(x - x')$. If we introduce now the quantities

$$b_n = \int \varphi_n(x)\,dx, \qquad S_{nm}(x) = \int J(x, x')\,\varphi_n(x')\,\varphi_m(x')^*\,dx', \tag{216}$$

we find that the solution of problem (214), (215) implies the minimization of the quadratic functional

$$S^2(x) = \sum_{n,m=1}^{N} S_{nm}(x)\,a_m(x)\,a_n(x)^* = \text{minimum}, \tag{217}$$

with the linear constraint

$$\sum_{n=1}^{N} a_n(x)\,b_n^* = 1. \tag{218}$$

This problem can be solved in a standard way using the method of Lagrange multipliers. If $S(x)$ is a non-singular matrix with elements $S_{nm}(x)$ and if we denote by $S^{nm}(x)$ the elements of $[S(x)]^{-1}$, then the solution of problem (217), (218) is

$$a_n(x) = \lambda(x)\sum_{m=1}^{N} S^{nm}(x)\,b_m; \qquad n = 1, \dots, N, \tag{219}$$

where the Lagrange multiplier $\lambda(x)$ is given by

$$\lambda(x) = \left(\sum_{n,m=1}^{N} S^{nm}(x)\,b_m\,b_n^*\right)^{-1}. \tag{220}$$

The functions $a_n(x)$ depend on the choice of the function $J(x, x')$.
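The Lagrange-multiplier solution can be sketched numerically. The minimal Python example below assumes a hypothetical moment problem on $X = L^2(0, 1)$ with real functions $\varphi_n(t) = t^n$, $n = 0, \dots, N - 1$ (indices shifted to start at zero), and the choice $J(x, x') = (x - x')^2$, so that $b_n$ and $S_{nm}(x)$ have closed forms. The function names and the small linear solver are illustrative and are not part of the method as published.

```python
def spread_matrix(x, n_funcs):
    """S_nm(x) = integral over [0, 1] of (x - t)^2 t^(n + m) dt for phi_n(t) = t^n."""
    def moment(p):  # integral of (x - t)^2 t^p over [0, 1]
        return x * x / (p + 1) - 2.0 * x / (p + 2) + 1.0 / (p + 3)
    return [[moment(n + m) for m in range(n_funcs)] for n in range(n_funcs)]

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = sum(a[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (a[r][n] - s) / a[r][r]
    return sol

def backus_gilbert_coefficients(x, n_funcs):
    """Return (a, b): a_n(x) = lambda(x) * ([S(x)]^{-1} b)_n, with b_n = 1 / (n + 1)."""
    b = [1.0 / (n + 1) for n in range(n_funcs)]       # integral of t^n over [0, 1]
    s_inv_b = solve(spread_matrix(x, n_funcs), b)     # [S(x)]^{-1} b
    lam = 1.0 / sum(bn * y for bn, y in zip(b, s_inv_b))
    return [lam * y for y in s_inv_b], b

a, b = backus_gilbert_coefficients(0.5, 3)
```

The coefficients returned for $x = 0.5$ satisfy the linear constraint, and the resulting spread is not larger than that of the constant kernel $A(x, x') = 1$, which is also admissible in this example.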

It should be obvious that for those problems where the generalized solution is extremely ill-conditioned, the solution $f_{BG}$ for the Backus-Gilbert method must also be unstable. Numerical results obtained in the case of Fourier series summation (Backus and Gilbert, 1968) indicate however that $f_{BG}(x)$ is more stable than $f^+(x)$. This follows from the fact that the kernel (204) is narrower than the kernel (211) and therefore, using $f^+(x)$, one requires higher resolution. The connection between resolution and stability will be discussed in Section VI. As a concluding remark, we point out that a convergence result has been recently proved for the Backus-Gilbert method (Schomburg and Berendt, 1987). In this paper, it is assumed that problem (202) is a finite section of a generalized moment problem (Section II,H) satisfying the requirement of uniqueness, i.e., the span of the functions $\varphi_n$, with $n = 1, 2, 3, \dots, \infty$, is dense in $X$. Moreover, it is assumed that for a certain set of values $g_n$ of the generalized moments, there exists a solution $f$ of the problem which is real-valued and Lipschitz-continuous. Then let $f_N$ be the approximate solution provided by the Backus-Gilbert method, using the exact values of the first $N$ generalized moments of $f$, the function $J(x, x')$ being given by Eq. (213). The result is that $f_N$ converges everywhere to $f$ in the case of functions depending on one or two variables, while $f_N$ does not, in general, converge to $f$ in the case of functions depending on more than two variables. It is evident that this convergence result can only be proved by assuming that the values $g_n$ of the generalized moments are not affected by experimental errors. It is interesting, however, to know that in this case the method can provide an approximation of the exact solution.
It would also be important to investigate the convergence (or non-convergence) of the Backus-Gilbert method when applied to problems such as the moment discretization of a Fredholm integral equation of the first kind or the Fourier transform inversion with limited data.

V. REGULARIZATION THEORY FOR ILL-POSED PROBLEMS

The method of generalized solutions, discussed in the previous section, provides a satisfactory answer to questions of existence and uniqueness for Eq. (5) only when the generalized inverse is continuous and well-conditioned. As we know, this means that the range of the operator $L$ is closed and the condition number (183) is not much greater than one. The method is not adequate when the generalized inverse is not continuous or when, although it is continuous, the condition number (183) is too large. In the first case, the generalized solution may not exist because the data are contaminated by experimental errors; in the second case, the generalized solution always exists but it may be deprived of any physical meaning as a consequence of dramatic error propagation from the data to the solution. In both cases, one must introduce methods for obtaining physically meaningful approximations of the generalized solutions. As already discussed in the Introduction, the basic idea is to constrain the solution in some way in order to avoid the wild oscillations generated by noise propagation. Several methods related to various kinds of constraints have been introduced, most of which have been unified in a general theory now known as regularization theory or Tikhonov regularization theory. In this section we sketch the basic ideas of the theory and provide the main references to the mathematical literature, a literature which has grown very fast in recent years. Obviously, our presentation will be strongly biased by our personal experience in this domain.

A. The Ivanov-Phillips-Tikhonov Regularization Method

The Tikhonov regularization method was introduced independently by several authors at the beginning of the sixties. The first versions of the method were published in 1962 (Ivanov, 1962; Phillips, 1962) and a more general, unifying formulation (restricted, however, to the case of Fredholm integral equations of the first kind) was later proposed by Tikhonov (1963a; 1963b). Other important contributions are due to Morozov (1966; 1968) and Miller (1970). In our presentation, we first sketch the methods of Ivanov, Phillips, and Miller, and subsequently show their relation to Tikhonov regularization theory. For the purpose of providing a concise outline of these methods, we start by giving a few results concerning the minimization of the functional

$$\Phi_\alpha[f] = \|Lf - g\|_Y^2 + \alpha\|f\|_X^2, \tag{221}$$

where $\alpha$ can be any positive number. Let $f_\alpha$ denote the function which minimizes $\Phi_\alpha[f]$. Then, by annihilating the first variation of $\Phi_\alpha[f]$, it is easy to show that $f_\alpha$ must satisfy the orthogonality condition

$$(Lf_\alpha - g, L\varphi)_Y + \alpha(f_\alpha, \varphi)_X = 0 \tag{222}$$

for any $\varphi \in X$. It follows that $f_\alpha$ is a solution of the Euler equation

$$(L^*L + \alpha I)f_\alpha = L^*g. \tag{223}$$

Then the inequality

$$\|(L^*L + \alpha I)f\|_X \ge \alpha\|f\|_X, \tag{224}$$

which holds true for any $\alpha > 0$, implies that there exists a unique solution of Eq. (223), which can be written in the form

$$f_\alpha = R_\alpha g, \tag{225}$$

where

$$R_\alpha = (L^*L + \alpha I)^{-1}L^*. \tag{226}$$

By simple algebraic manipulation, one can also show that

$$R_\alpha = L^*(LL^* + \alpha I)^{-1}. \tag{227}$$

This representation of $R_\alpha$ implies that $f_\alpha$ belongs to the range of $L^*$, and therefore is orthogonal to the null space of $L$, as follows from Eq. (18). An important consequence of this property is that $f_\alpha$ converges in the limit $\alpha \to 0$ to the generalized solution $f^+$ associated with $g$, provided that $Pg \in R(L)$. Moreover, using the spectral representation of the self-adjoint positive semidefinite operator $LL^*$, it is easy to show that the function

$$\varepsilon_\alpha^2 = \|Lf_\alpha - g\|_Y^2 = \alpha^2\|(LL^* + \alpha I)^{-1}Pg\|_Y^2 + \|Qg\|_Y^2 \tag{228}$$

is a strictly increasing function of $\alpha$ whose values at $\alpha = 0$ and $\alpha = \infty$ are $\|Qg\|_Y^2$ and $\|g\|_Y^2$, respectively. In a similar way, one can show that the function

$$E_\alpha^2 = \|f_\alpha\|_X^2 = \|L^*(LL^* + \alpha I)^{-1}g\|_X^2 \tag{229}$$

is a strictly decreasing function of $\alpha$, whose values at $\alpha = 0$ and $\alpha = \infty$ are $\|f^+\|_X^2$ ($= \infty$ when the generalized solution does not exist) and zero, respectively. For the sake of clarity, we give a more explicit representation of $f_\alpha$ and the functions $\varepsilon_\alpha^2$ and $E_\alpha^2$ in the two particular cases already discussed in Section II,A, namely compact operators and convolution operators. When $L$ is compact, from Eq. (225) and the singular value decomposition of $L$ and $L^*$, one easily derives that $f_\alpha$ admits the expansion

[Eq. (230)]

which clearly shows that $f_\alpha \to f^+$ (if $f^+$ exists, as given by Eq. (179)) as $\alpha \to 0$. Moreover, we get

[Eq. (231)]

and also

[Eq. (232)]

and the properties stated above of the functions $\varepsilon_\alpha$, $E_\alpha$ are self-evident.
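The behaviour of the regularized solution for a compact operator can be sketched in singular-value coordinates. The minimal Python example below assumes the convention $Lu_k = \sigma_k v_k$, $L^*v_k = \sigma_k u_k$ (the same form as the shifted eigenvalue problem used earlier), under which Eq. (226) acts on each coefficient through the factor $\sigma_k/(\sigma_k^2 + \alpha)$. The singular values, the data, and the noise are hypothetical numbers chosen only for illustration, and the data are assumed to have no component outside the range.

```python
sigma = [1.0, 0.1, 0.001]         # singular values of L (hypothetical)
g_exact = [1.0, 0.1, 0.001]       # coefficients (g, v_k) giving f+ = (1, 1, 1)
noise = [1e-2, -1e-2, 1e-2]       # hypothetical noise coefficients
g_noisy = [gk + hk for gk, hk in zip(g_exact, noise)]

def tikhonov(alpha, data):
    """Coefficients (f_alpha, u_k) = sigma_k / (sigma_k^2 + alpha) * (g, v_k)."""
    return [s / (s * s + alpha) * d for s, d in zip(sigma, data)]

def residual_sq(alpha, data):
    """||L f_alpha - g||^2, using sigma_k * f_k - g_k = -alpha / (sigma_k^2 + alpha) * g_k."""
    return sum((alpha / (s * s + alpha) * d) ** 2 for s, d in zip(sigma, data))

def norm_sq(alpha, data):
    """||f_alpha||^2."""
    return sum(c * c for c in tikhonov(alpha, data))

alphas = [1e-8, 1e-6, 1e-4, 1e-2, 1.0]
residuals = [residual_sq(a, g_noisy) for a in alphas]   # increases with alpha
norms = [norm_sq(a, g_noisy) for a in alphas]           # decreases with alpha

f_unregularized = [d / s for s, d in zip(sigma, g_noisy)]   # generalized solution
f_regularized = tikhonov(1e-4, g_noisy)
```

In this example the generalized solution computed from the noisy data is dominated by the amplified noise on the smallest singular value, whereas the regularized solution with $\alpha = 10^{-4}$ stays much closer to the coefficients $(1, 1, 1)$ of the noise-free case.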

Analogously, in the case of a convolution operator, the function $f_\alpha$ assumes the form

[Eq. (233)]

where $\Omega$ is the support of $\hat{K}(\xi)$. Then from the Parseval equality,

[Eq. (234)]

and also

[Eq. (235)]

and the properties of the functions $\varepsilon_\alpha$, $E_\alpha$ are self-evident again. Similar formulas apply to problems that may be diagonalized by means of the Mellin transform, such as Abel or Laplace transform inversion (cf. Sections II,D.3 and II,G).

1. Ivanov Method (Constrained Least-Squares Solutions)

The basic idea of the Ivanov method (1962) consists in restricting the approximate solutions to some suitable subset defined by physical constraints. Here we consider only the case where the subset is a sphere of radius $E$ in $X$,

$$S_E = \{f \in X \mid \|f\|_X \le E\}. \tag{236}$$

This choice has a precise physical motivation when $|f|^2$ is an energy density and $X$ a space of square-integrable functions. Then knowledge of an upper bound $E^2$ on the total energy of the signal implies that $f \in S_E$. When the constant $E$ is given, it is quite natural to look for the function (or functions) $\tilde{f}^{(E)} \in S_E$ such that $L\tilde{f}^{(E)}$ has minimal distance from $g$. This is equivalent to solving the constrained least-squares problem

$$\|L\tilde{f}^{(E)} - g\|_Y = \inf\{\|Lf - g\|_Y \mid \|f\|_X \le E\}. \tag{237}$$

Any solution of this problem will be called, for obvious reasons, a constrained least-squares solution. We must consider two cases separately.

A) The generalized solution exists and satisfies the constraint $\|f^+\|_X \le E$. In this case, the solution of problem (237) is not unique (except when $\|f^+\|_X = E$). The set of constrained least-squares solutions is the intersection of the set $S(g)$ of the unconstrained least-squares solutions, Eq. (176), with the sphere $S_E$ (see Fig. 6). Then there exists a unique constrained least-squares solution of minimal norm and this obviously coincides with $f^+$.

B) The generalized solution does not exist or, if it exists, it does not satisfy the constraint, i.e., $\|f^+\|_X > E$. This case is the most likely when the data are noisy. Then the intersection of $S_E$ with the set of unconstrained least-squares solutions is empty (see Fig. 6). Under these circumstances, it is obvious that the constrained minimum points of the functional $\|Lf - g\|_Y$ cannot be interior to $S_E$ but must lie on the surface of this sphere.

FIG. 6. Two-dimensional geometric representation of the constrained least-squares solution. In the case of data $g$ such that $\|f^+\| < E$, $S(g)$ intersects $S_E$ and therefore $\tilde f^{(E)}$ coincides with $f^+$ (see Fig. 5). In the case of data $g$ such that $\|f^+\| > E$, $S(g)$ does not intersect $S_E$ and therefore $\tilde f^{(E)}$ lies on a circle of radius $E$ and is orthogonal to $N(L)$.

Since the constrained minimum points in case B satisfy the condition $\|f\|_X = E$, one can use the method of Lagrange multipliers for determining the solution of problem (237). This method consists of the following steps: 1) for any $\alpha > 0$, minimize the functional (221); 2) since for any $\alpha$ there exists a unique minimum point $f_\alpha$ of this functional, search for a value of $\alpha$ such that

$$\|f_\alpha\|_X = E. \qquad (238)$$
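The two steps above can be sketched numerically for a compact operator in the coordinates of its singular system, where the minimizer of the functional (221) has coefficients $\sigma_k g_k/(\sigma_k^2+\alpha)$. The following is only a minimal illustration under stated assumptions: the singular values, the data coefficients, the bound `E`, and the function names are all hypothetical, and a simple bisection on a logarithmic scale is used to solve Eq. (238), relying on the fact that $\|f_\alpha\|_X$ decreases as $\alpha$ increases.

```python
import math

def tikhonov_coeffs(sigma, g, alpha):
    """Coefficients of f_alpha on the singular functions: sigma_k g_k / (sigma_k^2 + alpha)."""
    return [s * gk / (s * s + alpha) for s, gk in zip(sigma, g)]

def solution_norm(sigma, g, alpha):
    """E_alpha = ||f_alpha||_X, a decreasing function of alpha."""
    return math.sqrt(sum(c * c for c in tikhonov_coeffs(sigma, g, alpha)))

def ivanov_alpha(sigma, g, E, lo=1e-16, hi=1e16, iters=200):
    """Bisection on a logarithmic scale for the alpha solving ||f_alpha||_X = E (case B)."""
    if solution_norm(sigma, g, lo) <= E:
        raise ValueError("constraint inactive (case A): no multiplier is needed")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if solution_norm(sigma, g, mid) > E:
            lo = mid   # norm still too large: alpha must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical singular values and data coefficients g_k = (g, u_k)_Y.
sigma = [1.0, 0.5, 0.1, 0.01]
g = [1.0, 0.4, 0.05, 0.02]
E = 1.5                        # prescribed bound, smaller than ||f^+|| for these data
alpha_E = ivanov_alpha(sigma, g, E)
f_E = tikhonov_coeffs(sigma, g, alpha_E)   # constrained least-squares solution
```

For these illustrative data the unconstrained solution has norm of about 2.4, so case B applies and the returned multiplier places the solution on the surface of the sphere.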

From the properties of the function $E_\alpha$, Eq. (229), stated above and illustrated in Fig. 7, it follows that there exists a unique value of $\alpha$, say $\alpha^{(E)}$, which solves Eq. (238). The corresponding solution $f_\alpha$ is just $\tilde f^{(E)}$ and this is the unique solution of problem (237).

FIG. 7. Graph of the function $E_\alpha$ in the case $E < \|f^+\|_X < +\infty$, illustrating the determination of $\alpha^{(E)}$.

We conclude with a few results about the convergence properties of the constrained least-squares solution $\tilde f^{(E)}$ in the ideal case of experimental errors tending to zero. For this purpose, we assume that a family $\{g_\varepsilon\}_{\varepsilon>0}$ of noisy data functions is given and that, as $\varepsilon \to 0$, $g_\varepsilon$ converges to a noise-free data function $g$, namely a function in the range of $L$. Let $\tilde f_\varepsilon^{(E)}$ be the constrained least-squares solution associated with $g_\varepsilon$ and let $f^+ = L^+ g$ be the generalized solution associated with $g$. Moreover, let us assume that for any $\varepsilon$, case B applies. Then the following results hold true (Bertero, 1986):

i) if $\|L^+ g\|_X < E$, i.e., the prescribed constant is overestimated, then $\tilde f_\varepsilon^{(E)}$ weakly converges to $f^+$ as $\varepsilon \to 0$ (for the definition of weak convergence, see Balakrishnan, 1976);
ii) if $\|L^+ g\|_X = E$, i.e., the prescribed constant is precise, then $\tilde f_\varepsilon^{(E)}$ strongly converges to $f^+$ as $\varepsilon \to 0$;
iii) if $\|L^+ g\|_X > E$, i.e., the prescribed constant is underestimated, then $\tilde f_\varepsilon^{(E)}$ strongly converges to $\tilde f^{(E)}$, the constrained least-squares solution associated with the noise-free data $g$.

For problems with discrete data, case i must be modified since, for sufficiently small $\varepsilon$, there is necessarily a transition from case B to case A, and therefore the constrained least-squares solution corresponding to noisy data is not unique. In this case, one can identify, by definition, the constrained least-squares solution with the generalized solution and therefore strong convergence applies also to this case. This result is also self-evident since weak convergence and strong convergence coincide in finite-dimensional spaces.

2. Phillips Method

This method was first proposed for the approximate solution of the Fredholm integral equation of the first kind (Phillips, 1962). A more general formulation was given by Ivanov (1966) and Morozov (1966, 1968), while Reinsch (1967) independently applied the same method to the smoothing problem, a problem which replaces strict interpolation when the values of the function are only approximately given. The starting point is the assumption that an upper bound $\varepsilon$ on the error is known. We denote by $J_\varepsilon(g)$ the set of all elements of $X$ which are compatible with the data $g$ within error $\varepsilon$:

$$J_\varepsilon(g) = \{ f \in X \mid \|Lf - g\|_Y \le \varepsilon \}. \qquad (239)$$

This set is always unbounded when the problem is ill-posed. In the case of a problem with discrete data, for example, it is a cylinder whose basis is an ellipsoid in $X_N$ (see Section III,A). This cylinder is not bounded in the directions orthogonal to $X_N$ because the solution of the problem is not unique. On the other hand, in the case of an operator whose inverse is not continuous, $J_\varepsilon(g)$ is unbounded since it is always possible to find a sequence $\{f_n\}$ such that $\|Lf_n\|_Y \to 0$ and $\|f_n\|_X \to \infty$. Notice that the example of Hadamard discussed in the Introduction is just a particular case of this general result. It is easy to prove, however, using the continuity and linearity of $L$, that $J_\varepsilon(g)$ is always a closed and convex set.

Since $J_\varepsilon(g)$ contains wildly oscillating and completely unphysical approximate solutions, it is quite natural to look for the smoothest element of $J_\varepsilon(g)$, i.e., the element of minimal norm, which will be denoted by $\tilde f^{(\varepsilon)}$. This leads to the problem

$$\|\tilde f^{(\varepsilon)}\|_X = \inf\{\|f\|_X \mid \|Lf - g\|_Y \le \varepsilon\}. \qquad (240)$$

As remarked by Ivanov (1966), this problem is just the dual of problem (237). Since the set $J_\varepsilon(g)$ is closed and convex, from the general theorem of functional analysis (Balakrishnan, 1976) already used for proving the existence and uniqueness of the generalized solution, it follows that there exists a unique solution of problem (240). This solution is not the null element of $X$ provided that the data $g$ satisfy the inequality $\|g\|_Y > \varepsilon$. This inequality is quite reasonable since it implies that the norm of the data is greater than the norm of the noise. If it is not satisfied, it means that the data function (vector) consists only of noise and that it does not contain any information about the unknown object $f$. Since we exclude this case and since we also assume that if a nonzero component of $g$ orthogonal to the range of $L$ exists, then this component can only be an effect of the noise, we are led to conclude that the following inequalities must hold true:

$$\|Qg\|_Y < \varepsilon < \|g\|_Y. \qquad (241)$$

A representation of the solution of problem (240) analogous to the representation of the constrained least-squares solution can be obtained if we notice that $\tilde f^{(\varepsilon)}$ must satisfy the condition $\|L\tilde f^{(\varepsilon)} - g\|_Y = \varepsilon$. Then we can again use the method of Lagrange multipliers in order to determine $\tilde f^{(\varepsilon)}$. We must minimize the functional (221) for any $\alpha > 0$ and then search for a minimum point $f_\alpha$ such that

$$\|Lf_\alpha - g\|_Y = \varepsilon. \qquad (242)$$
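The search for the multiplier in Eq. (242) can be sketched in the same singular-system coordinates, where the residual of $f_\alpha$ has components $\alpha g_k/(\sigma_k^2+\alpha)$ plus the part $Qg$ of the data orthogonal to the range of $L$. This is a minimal illustration under assumptions: the data, the error bound `eps`, the value `q_norm` standing for $\|Qg\|_Y$, and the function names are hypothetical, and bisection is only one possible root-finding choice.

```python
import math

def residual_norm(sigma, g, alpha, q_norm=0.0):
    """eps_alpha = ||L f_alpha - g||_Y, an increasing function of alpha."""
    in_range = sum((alpha * gk / (s * s + alpha)) ** 2 for s, gk in zip(sigma, g))
    return math.sqrt(in_range + q_norm ** 2)

def phillips_alpha(sigma, g, eps, q_norm=0.0, lo=1e-16, hi=1e16, iters=200):
    """Bisection on a logarithmic scale for the alpha solving ||L f_alpha - g||_Y = eps."""
    g_norm = math.sqrt(sum(gk * gk for gk in g) + q_norm ** 2)
    if not (q_norm < eps < g_norm):
        raise ValueError("conditions (241) are not satisfied")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if residual_norm(sigma, g, mid, q_norm) < eps:
            lo = mid   # residual still below the error bound: alpha can increase
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical singular values and data coefficients g_k = (g, u_k)_Y.
sigma = [1.0, 0.5, 0.1, 0.01]
g = [1.0, 0.4, 0.05, 0.02]
eps = 0.05                     # prescribed bound on the data error
alpha_eps = phillips_alpha(sigma, g, eps)
f_eps = [s * gk / (s * s + alpha_eps) for s, gk in zip(sigma, g)]
```

The only difference from the constrained least-squares case is the equation used to fix the multiplier: the residual norm is matched to `eps` instead of the solution norm being matched to a bound on the solution.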

From the properties of the function $\varepsilon_\alpha$, Eq. (228), and conditions (241) it follows that there exists a unique value of $\alpha$, say $\alpha^{(\varepsilon)}$, which solves Eq. (242) (cf. Fig. 8). The corresponding solution $f_\alpha$ is just $\tilde f^{(\varepsilon)}$, i.e., the unique solution of problem (240). We see that the difference between the solution of problem (237) and the solution of problem (240) consists only in a different choice of the Lagrange multiplier.

FIG. 8. Graph of the function $\varepsilon_\alpha$ in the case $\|Qg\|_Y < \varepsilon < \|g\|_Y$, illustrating the determination of $\alpha^{(\varepsilon)}$.

In the case where the experimental errors tend to zero, the situation as regards the convergence of $\tilde f^{(\varepsilon)}$ is far simpler than in the case of the constrained least-squares solutions. If $\{g_\varepsilon\}_{\varepsilon>0}$ is the family of noisy data functions introduced in Section V,A.1, $\tilde f_\varepsilon^{(\varepsilon)}$ the solution of problem (240) with $g$ replaced by $g_\varepsilon$, and $f^+ \in J_\varepsilon(g_\varepsilon)$ for any $\varepsilon$, then it is possible to prove (Ivanov, 1966; Groetsch, 1984; Bertero, 1986) that $\tilde f_\varepsilon^{(\varepsilon)}$ strongly converges to $f^+$ as $\varepsilon \to 0$.

3. Miller Method

The method of Section V,A.1 requires a prescribed bound on the solution, while the method of Section V,A.2 requires a prescribed bound on the error. A paper of Miller (1970) considers the case where both bounds are known. Results similar to those of Miller were also obtained by Franklin (1974). Let us assume that two constants $\varepsilon$, $E$ are given and that one wants to find functions $f$ such that

$$\|Lf - g\|_Y \le \varepsilon, \qquad \|f\|_X \le E. \qquad (243)$$

The set $K$ of all functions satisfying these conditions is just the intersection of the set $S_E$ given by Eq. (236) and the set $J_\varepsilon(g)$ given by Eq. (239): $K = S_E \cap J_\varepsilon(g)$. Any function $f \in K$ may be called an admissible approximate solution. In such an approach, we must consider two problems: first, how to ensure that $K$ is not empty, in which case we say that the pair $\{\varepsilon, E\}$ is permissible; second, how to extract an element of $K$ in order to produce one specific approximate solution.

It is easy to characterize the set of permissible pairs. We first notice that $K$ is not empty if and only if both $\tilde f^{(E)}$ and $\tilde f^{(\varepsilon)}$ belong to $K$. The "if" part is trivial. The "only if" part follows from the variational properties of $\tilde f^{(E)}$ and $\tilde f^{(\varepsilon)}$. For example, the condition that $K$ is not empty implies that there exist elements of $J_\varepsilon(g)$ whose norm is less than $E$. Since $\tilde f^{(\varepsilon)}$ is the element of minimal norm of $J_\varepsilon(g)$, it follows that the norm of $\tilde f^{(\varepsilon)}$ must also be less than $E$. Moreover, $\tilde f^{(\varepsilon)}$ satisfies, by definition, the first of the conditions in (243), and therefore $\tilde f^{(\varepsilon)} \in K$. Similar arguments apply to $\tilde f^{(E)}$. The remarks used for proving the previous result also imply that when $K$ is not empty,

$$\|\tilde f^{(\varepsilon)}\|_X \le \|\tilde f^{(E)}\|_X \qquad (244)$$

and

$$\|L\tilde f^{(E)} - g\|_Y \le \|L\tilde f^{(\varepsilon)} - g\|_Y. \qquad (245)$$

Moreover, if we recall that $E_\alpha$ is a decreasing function of $\alpha$ (or that $\varepsilon_\alpha$ is an increasing function of $\alpha$), we also have

$$\alpha^{(E)} \le \alpha^{(\varepsilon)}. \qquad (246)$$

Finally, since the previous results imply that when $K$ is not empty, then $\|\tilde f^{(\varepsilon)}\|_X \le E$ and $\|L\tilde f^{(E)} - g\|_Y \le \varepsilon$, it follows that the set of permissible pairs in the plane $\{\varepsilon, E\}$ is just the set of all pairs to the right and above the curve described by $\{\varepsilon_\alpha, E_\alpha\}$ (see Fig. 9). We also conclude that all functions $f_\alpha$, with $\alpha$ between $\alpha^{(E)}$ and $\alpha^{(\varepsilon)}$, belong to $K$.

Both $\tilde f^{(E)}$ and $\tilde f^{(\varepsilon)}$ may be used to determine an element of $K$. Another possibility, however, is to take the function $f_\alpha$ with $\alpha = (\varepsilon/E)^2$. This function will be denoted by $\tilde f^{(0)}$. In fact, it has been proved by Miller (1970) that if $K$ is not empty, $\tilde f^{(0)}$ satisfies inequalities (243) with $\{\varepsilon, E\}$ replaced by $\{\sqrt{2}\,\varepsilon, \sqrt{2}\,E\}$. It is easy to find, however, a sufficient condition that ensures that $\tilde f^{(0)} \in K$. Let us denote by $\Phi^{(0)}[f]$ the functional (221) with $\alpha = (\varepsilon/E)^2$ and by $K^{(0)}$ the subset of $X$

$$K^{(0)} = \{ f \in X \mid \Phi^{(0)}[f] \le \varepsilon^2 \}. \qquad (247)$$

Then since $\tilde f^{(0)}$ minimizes the functional $\Phi^{(0)}[f]$, it is obvious that $K^{(0)}$ is not empty if and only if the following condition is satisfied:

$$\Phi^{(0)}[\tilde f^{(0)}] \le \varepsilon^2. \qquad (248)$$

(Notice that this condition can be easily verified in a numerical application of the method.) Finally, using the inclusion

$$K^{(0)} \subset K, \qquad (249)$$

which follows from the remark that any element of $K^{(0)}$ satisfies conditions (243), we conclude that when condition (248) is satisfied, the set $K$ is not empty and $\tilde f^{(0)} \in K$. Moreover, the following inequalities are a trivial consequence of inequalities (244)-(246) and the fact that $f_\alpha \in K$ if and only if $\alpha$ belongs to the interval $[\alpha^{(E)}, \alpha^{(\varepsilon)}]$:

$$\|\tilde f^{(\varepsilon)}\|_X \le \|\tilde f^{(0)}\|_X \le \|\tilde f^{(E)}\|_X, \qquad (250)$$

$$\|L\tilde f^{(E)} - g\|_Y \le \|L\tilde f^{(0)} - g\|_Y \le \|L\tilde f^{(\varepsilon)} - g\|_Y, \qquad (251)$$

$$\alpha^{(E)} \le (\varepsilon/E)^2 \le \alpha^{(\varepsilon)}. \qquad (252)$$

FIG. 9. Representation of the set of the permissible pairs $\{\varepsilon, E\}$ in the case where the generalized solution $f^+$ exists. When $f^+$ does not exist, the line $\varepsilon = \|Qg\|_Y$ is an asymptote of the boundary curve $\{\varepsilon_\alpha, E_\alpha\}$.

We see therefore that the Miller solution $\tilde f^{(0)}$ has a degree of smoothness intermediate between that of $\tilde f^{(E)}$ and $\tilde f^{(\varepsilon)}$. Finally, as concerns the convergence of $\tilde f^{(0)}$ to the true generalized solution $f^+$, this approximate solution has properties similar to those of the constrained least-squares solution $\tilde f^{(E)}$ (Bertero, 1986) when the error of the data tends to zero.
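The remark that condition (248) is easy to verify numerically can be illustrated with a short sketch in singular-system coordinates. The data, the pair `eps`, `E`, and the function name `miller_solution` are hypothetical choices made only for this example; the code fixes $\alpha = (\varepsilon/E)^2$, evaluates the functional (221) at its minimizer, and compares it with $\varepsilon^2$.

```python
import math

def miller_solution(sigma, g, eps, E, q_norm=0.0):
    """Miller solution with alpha = (eps/E)^2 and the quantities entering condition (248)."""
    alpha = (eps / E) ** 2
    coeffs = [s * gk / (s * s + alpha) for s, gk in zip(sigma, g)]
    norm = math.sqrt(sum(c * c for c in coeffs))
    in_range = sum((alpha * gk / (s * s + alpha)) ** 2 for s, gk in zip(sigma, g))
    residual = math.sqrt(in_range + q_norm ** 2)
    # Value of the functional (221) at its minimizer: ||L f - g||^2 + alpha ||f||^2.
    phi0 = residual ** 2 + alpha * norm ** 2
    return alpha, coeffs, residual, norm, phi0

# Hypothetical singular values, data coefficients, and prescribed bounds.
sigma = [1.0, 0.5, 0.1, 0.01]
g = [1.0, 0.4, 0.05, 0.02]
eps, E = 0.1, 3.0

alpha0, f0, residual0, norm0, phi0 = miller_solution(sigma, g, eps, E)
condition_248 = phi0 <= eps ** 2   # sufficient for the Miller solution to belong to K
```

When `condition_248` holds, both inequalities in (243) follow at once, since each of the two non-negative terms of the functional is then bounded by $\varepsilon^2$.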

4. Extensions and Comments

The method of constrained least-squares solutions can be considered a generalization of pioneering works on ill-posed problems for partial differential equations (Pucci, 1955; John, 1955; Fox and Pucci, 1958; John, 1960). In these papers, which we referred to in the Introduction, one looks for approximate solutions satisfying a prescribed bound. In the method of Section V,A.1, this condition is replaced by a search for approximate solutions in the sphere (236). An extension of this condition is provided in the same paper of Ivanov (1962) and is based on a topological lemma due to Tikhonov (Lavrentiev, 1967; Tikhonov and Arsenine, 1977), which in our context can be formulated as follows: Let $H$ be a compact subset of the Hilbert space $X$ and let us assume that the linear, continuous operator $L: X \to Y$, restricted to $H$, has an inverse. Then the inverse operator is continuous. The result of Ivanov can now be formulated as follows: If $H$ is a compact and convex set of $X$ and if the restriction of $L$ to $H$ admits an inverse operator, then for any $g \in Y$ there exists in $H$ a unique least-squares solution $f$ of Eq. (5). Moreover, the mapping $g \to f$ is continuous. It is obvious that in this way one is not obliged to restrict solutions to a sphere (in fact, a sphere is not a compact set). An important example of an application of the theorem is the case where the function to be restored is the distribution function of a random variable, and therefore is an increasing function with values in the interval $[0,1]$. Then, according to the Helly theorem (Titchmarsh, 1958, p. 342), a set of increasing and uniformly bounded functions defined on a bounded interval $[a, b]$ is compact in $L^2(a,b)$ and therefore the Ivanov theorem can be used whenever the inverse operator $L^{-1}$ exists.

As a second comment, we point out that the methods outlined in the previous sections provide approximations of the Moore-Penrose generalized solution. Then one can also look for approximations of the C-generalized solutions introduced in Section IV,B. In fact, the method of Phillips (1962) applies to this case, since it provides an approximate solution of a Fredholm integral equation of the first kind such that the $L^2$-norm of its second derivative is as small as possible. The smoothing method of Reinsch (1967) also corresponds to this case.

Approximations of the C-generalized solutions can be obtained if we replace the functional (221) with the functional

$$\Phi_{C,\alpha}[f] = \|Lf - g\|_Y^2 + \alpha\|Cf\|_Z^2, \qquad (253)$$

where the constraint operator $C$ satisfies the properties assumed in Section IV,B and, in particular, property B'. Then one can prove (Groetsch, 1984; Morozov, 1984; Bertero, 1986) that for any $\alpha > 0$, there exists a unique function $f_{C,\alpha} \in X$ which minimizes the functional (253) and which can be obtained by solving the functional equation

$$(L^*L + \alpha C^*C) f_{C,\alpha} = L^*g. \qquad (254)$$

Moreover, the mapping $g \to f_{C,\alpha}$ given by

$$f_{C,\alpha} = R_{C,\alpha}\, g \qquad (255)$$

with

$$R_{C,\alpha} = (L^*L + \alpha C^*C)^{-1}L^* \qquad (256)$$

is continuous. This operator can be written in the standard form (226) by introducing the adjoint of the operator $L_C$ with respect to the scalar product (193) (Groetsch, 1986). Then, using Eq. (198), by simple algebra, from Eq. (256) we find that

$$R_{C,\alpha} = \beta (L_C^* L_C + \beta\alpha I)^{-1} L_C^*, \qquad (257)$$

where $\beta = (1 - \alpha)^{-1}$. In particular, when $L$ is a compact operator (or a finite rank operator as in the case of inverse problems with discrete data), it follows that $f_{C,\alpha}$ has a representation in terms of the singular values $\sigma_{C,k}$ given by Eq. (195) analogous to the representation of $f_\alpha$ in (230). We also notice that when $L$ is an integral operator and $C$ a differential operator such that the scalar product (13) is given by Eq. (12), then the solution of Eq. (254) implies the solution of a boundary-value problem for an integrodifferential equation (Tikhonov, 1963a). The latter is equivalent to the solution of the functional equation (223) in the Sobolev space defined by the scalar product (12) (Groetsch, 1984). If one introduces now the functions

$$\varepsilon_{C,\alpha} = \|Lf_{C,\alpha} - g\|_Y, \qquad E_{C,\alpha} = \|Cf_{C,\alpha}\|_Z, \qquad (258)$$

one can easily prove that $\varepsilon_{C,\alpha}$ is an increasing function of $\alpha$ while $E_{C,\alpha}$ is a decreasing function of $\alpha$. As a consequence, all the results proved in the previous sections concerning the operator (226) can be extended to the present case. More precisely, there exists a unique value of $\alpha$ minimizing $\varepsilon_{C,\alpha}$ under the constraint $E_{C,\alpha} \le E$, and conversely there exists a unique value of $\alpha$ minimizing $E_{C,\alpha}$ under the constraint $\varepsilon_{C,\alpha} \le \varepsilon$.


A final comment about the Backus-Gilbert method, which can also be affected by numerical instability, is in order. No regularization method has been developed for this algorithm. Backus and Gilbert, however, have introduced two methods in order to improve stability (Backus and Gilbert, 1970), though rigorous results have not yet been proved for these methods. If the covariance matrix C of the noise is known, then from Eq. (212) one easily derives that at any given point x , the variance of the error induced by the noise on L G ( x )is

Then the two methods introduced by Backus and Gilbert are the following:

1) Minimize the functional $\sigma^2(x)$ with the constraint (218) and also the constraint $\delta^2(x) \le E^2$ ($\delta^2(x)$ is defined in Eq. (217)). The latter constraint prescribes an upper limit on the desired resolution.
2) Minimize the function $\delta^2(x)$ defined in Eq. (217) with the constraint (218) and also the constraint $\sigma^2(x) \le \varepsilon^2$. The latter constraint prescribes an upper limit on the desired error affecting the reconstructed solution.

Both problems can also be solved by means of the method of Lagrange multipliers and we do not give the details here. We merely wish to remark that each of these two problems is the dual of the other and that they are similar, respectively, to the Ivanov and to the Phillips method for the regularization of the Moore-Penrose generalized solution.

B. General Formulation of Regularization Methods

The common feature of the methods presented in Section V,A is that they provide different criteria for selecting a specific element from the same family of approximate solutions, namely $f_\alpha = R_\alpha g$, defined by Eqs. (225) and (226) or Eq. (227). This family describes a trajectory in the Hilbert space $X$. To get a clear picture of this trajectory, we need certain further properties of the operators $R_\alpha$. More precisely, we want to show that:

i) for any $\alpha > 0$, $R_\alpha: Y \to X$ is a linear continuous operator whose norm is bounded by

$$\|R_\alpha\| \le 1/\sqrt{\alpha}; \qquad (260)$$

ii) if $g$ belongs to the range of $L$ and if $f^+$ is the generalized solution associated with $g$, then

$$\lim_{\alpha \downarrow 0} \|R_\alpha g - f^+\|_X = 0. \qquad (261)$$

The proof of i follows from Eqs. (227) and (224). In fact, Eq. (227) implies that

$$\|R_\alpha g\|_X^2 = \left(LL^*(LL^* + \alpha I)^{-1}g,\ (LL^* + \alpha I)^{-1}g\right)_Y. \qquad (262)$$

Then, if we notice that the norm of the operator $LL^*(LL^* + \alpha I)^{-1}$ is smaller than 1 and that, by inequality (224), the norm of the operator $(LL^* + \alpha I)^{-1}$ is smaller than $\alpha^{-1}$, we obtain inequality (260). As for ii, if $g \in R(L)$, then $g = Lf^+$ and from Eq. (226) we get

$$\|R_\alpha g - f^+\|_X = \|R_\alpha L f^+ - f^+\|_X = \alpha\|(L^*L + \alpha I)^{-1}f^+\|_X. \qquad (263)$$

Then Property ii follows from the spectral representation of the self-adjoint positive semi-definite operator $L^*L$ and from the dominated convergence theorem (Groetsch, 1984; Bertero, 1986). Properties i and ii can be verified in an elementary way for a compact operator or convolution operator by means of Eq. (230) or Eq. (233), respectively.

In the case of noisy data, say $g_\varepsilon$, where $\varepsilon$ is an estimate of the norm of the error, i.e., of the distance between $g_\varepsilon$ and the exact data $g \in R(L)$,

$$\|g_\varepsilon - g\|_Y \le \varepsilon, \qquad (264)$$

the approximate solution $R_\alpha g_\varepsilon$ may have no limit as $\alpha \to 0$ or, as in the case of problems with discrete data, the limit is the generalized solution $f_\varepsilon^+$ and the distance between $f_\varepsilon^+$ and $f^+ = L^+g$, i.e., $\|f_\varepsilon^+ - f^+\|_X$, may be extremely large. There exists, however, a value of $\alpha$, say $\alpha^{(\mathrm{opt})}$, such that the distance between $R_\alpha g_\varepsilon$ and $f^+$ is minimum. If we write

$$R_\alpha g_\varepsilon - f^+ = (R_\alpha g - f^+) + R_\alpha(g_\varepsilon - g), \qquad (265)$$

then from Eq. (260) and the triangular inequality, we have

$$\|R_\alpha g_\varepsilon - f^+\|_X \le \|R_\alpha g - f^+\|_X + \varepsilon/\sqrt{\alpha}. \qquad (266)$$
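The behaviour of the bound (266) as a function of $\alpha$ can be explored with a small numerical sketch. It assumes a compact operator in singular-system coordinates, where the first term of the bound equals the norm of the vector with components $\alpha f_k^+/(\sigma_k^2+\alpha)$; the singular values, the coefficients of $f^+$, the noise level, and the grid of values of $\alpha$ are all hypothetical.

```python
import math

def approximation_error(sigma, f_plus, alpha):
    """||R_alpha g - f^+||_X for noise-free data g = L f^+ (increasing in alpha)."""
    return math.sqrt(sum((alpha * fk / (s * s + alpha)) ** 2
                         for s, fk in zip(sigma, f_plus)))

def error_bound(sigma, f_plus, alpha, eps):
    """Right-hand side of the bound: approximation error plus eps / sqrt(alpha)."""
    return approximation_error(sigma, f_plus, alpha) + eps / math.sqrt(alpha)

# Hypothetical singular values, true solution coefficients, and noise level.
sigma = [1.0, 0.5, 0.1, 0.01]
f_plus = [1.0, 0.8, 0.5, 0.2]
eps = 1e-3

# Logarithmic grid of regularization parameters from 1e-10 to 1e2.
alphas = [10.0 ** (e / 4.0) for e in range(-40, 9)]
bounds = [error_bound(sigma, f_plus, a, eps) for a in alphas]
best = min(range(len(alphas)), key=bounds.__getitem__)
alpha_best = alphas[best]      # grid point minimizing the bound
```

In this toy setting the minimum of the bound falls strictly inside the grid: for very small $\alpha$ the noise term dominates, while for large $\alpha$ the approximation error dominates. In a real problem $f^+$ is unknown, so such a scan is only possible in simulations.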

Then since the first term in the r.h.s. is an increasing function of $\alpha$, as follows from Eq. (263), while the second term is a decreasing one, there exists a unique value of $\alpha$ that minimizes the r.h.s. of Eq. (266). It is obvious that this optimum value of $\alpha$, $\alpha^{(\mathrm{opt})}$, cannot be determined in practice because its determination requires knowledge of the true solution $f^+$. It is important, however, to know that such an optimum value certainly exists.

We can now describe the trajectory of the approximate solutions $f_{\alpha,\varepsilon} = R_\alpha g_\varepsilon$ in the case of a non-empty set $K$ defined by inequalities (243) (see Fig. 10). The trajectory starts at the origin (null element) of $X$ when $\alpha = \infty$ and for large values of $\alpha$ lies inside the sphere $S_E$ defined by Eq. (236). Then for $\alpha = \alpha^{(\varepsilon)}$, the trajectory crosses the surface of the ellipsoid $J_\varepsilon(g)$ defined by Eq. (239), and for values of $\alpha$ between $\alpha^{(\varepsilon)}$ and $\alpha^{(E)}$ passes through the set $K$. In this part of the trajectory is found the point corresponding to the optimum value of $\alpha$ discussed above and also the point corresponding to $\alpha = (\varepsilon/E)^2$ (at least when condition (248) is satisfied). Finally, for all values of $\alpha$ smaller than $\alpha^{(E)}$ the trajectory always lies inside the ellipsoid $J_\varepsilon(g)$ and when $\alpha = 0$ its endpoint will be the center of $J_\varepsilon(g)$, i.e., $f_\varepsilon^+$, when the generalized solution associated with $g_\varepsilon$ exists; otherwise, the trajectory becomes infinite.

FIG. 10. Two-dimensional representation of the trajectory described by $R_\alpha g_\varepsilon$ as $\alpha$ increases from 0 to $\infty$. It is assumed that there exists a generalized solution associated with noisy data. Otherwise the trajectory tends to infinity as $\alpha \to 0$.

These comments on the methods of Section V,A justify the general definition of a regularizing algorithm (in the sense of Tikhonov) which will now be given and discussed. We say that a one-parameter family of operators $\{R_\alpha\}_{\alpha>0}$ defines a regularizing algorithm for the approximate determination of the generalized inverse $L^+$ of the linear operator $L$ if:

i') for any $\alpha > 0$, $R_\alpha: Y \to X$ is a continuous operator;
ii') for any $g \in R(L)$,

$$\lim_{\alpha \downarrow 0} \|R_\alpha g - f^+\|_X = 0. \qquad (267)$$

When the operators $R_\alpha$ are linear, we have a linear regularizing algorithm. It is possible, however, to introduce nonlinear regularizing algorithms for the solution of linear problems. We will give a few examples of such algorithms in Section V,D. The parameter $\alpha$ is usually called the regularization parameter and, in general, is a positive real number. In some cases, however, it may be convenient to introduce a discrete variable that takes only integer values. In this case, we have a sequence $\{R^{(n)}\}$ of regularizing operators and the limit $\alpha \to 0$ is replaced by the limit $n \to \infty$. If we want a unified notation we may set $R_\alpha = R^{(n)}$, where $n = [\alpha^{-1}]$ and $[\alpha^{-1}]$ denotes the integer part of $\alpha^{-1}$.

The meaning of Condition ii' is obvious. It implies that when $g \in R(L)$, it is possible to obtain arbitrarily accurate approximations of $f^+$ by means of continuous operators. Moreover, in the case of noisy data $g_\varepsilon$ satisfying condition (264), we still have an inequality analogous to (266), that is

$$\|R_\alpha g_\varepsilon - f^+\|_X \le \|R_\alpha g - f^+\|_X + \varepsilon\|R_\alpha\|. \qquad (268)$$

The first term in the r.h.s. can be called the "approximation error," introduced when the noncontinuous (or ill-conditioned) operator $L^+$, acting on exact data, is replaced by the continuous (or well-conditioned) operator $R_\alpha$. This "approximation error" tends to zero as $\alpha$ tends to zero. The second term in the r.h.s. represents "error propagation" from the data to the solution and becomes exceedingly large as $\alpha$ tends to zero. Therefore, it is clear that the choice of $\alpha$ will be based, in general, on a compromise between the approximation error and noise propagation. As in the case of the specific example provided by Eq. (226), given a noisy data function $g_\varepsilon$ and a regularizing algorithm $\{R_\alpha\}_{\alpha>0}$, the family of approximate solutions $f_{\alpha,\varepsilon} = R_\alpha g_\varepsilon$ will describe a trajectory in the Hilbert space $X$ and, in general, there will be a point on this trajectory at a minimum distance from the true solution $f^+$.

A similar definition can be introduced for the regularization of C-generalized inverses. Condition i is not modified, while Condition ii is modified as follows:

ii'') for any $g \in L\,D(C)$,

$$\lim_{\alpha \downarrow 0} \|R_\alpha g - f_C^+\|_X = 0, \qquad (269)$$

where $f_C^+$ is the C-generalized solution associated with $g$.

It is easy to show that the family of linear continuous operators $\{R_{C,\alpha}\}_{\alpha>0}$ defined by Eq. (256) is a regularization algorithm for $L_C^+$. In fact, the representation (257) of these operators coincides with Eq. (226) except for the factor $\beta$. Then, since $\beta \to 1$ as $\alpha \to 0$, Eq. (269) can be proved in the same way as Eq. (261).

In the case of linear inverse problems with discrete data, the definition of a regularization algorithm needs some modifications (Bertero et al., 1988b). For an ill-posed problem, indeed, a regularization algorithm constitutes a family of continuous (bounded) operators that approximate an unbounded operator. But, for a problem with discrete data, the generalized inverse is always continuous since it is a linear operator on a finite-dimensional space. The problem must be regularized when the norm of $L^+$ is much greater than $1/\|L\|$ (ill-conditioning), and therefore a regularization algorithm must provide an approximation of $L^+$ with norm smaller than the norm of $L^+$. For these reasons, we say that a one-parameter family of operators $\{R_\alpha\}_{\alpha>0}$ is a regularization algorithm for an inverse problem with discrete data when:

i) for any $\alpha > 0$, the range of $R_\alpha$ is contained in $X_N$, the subspace spanned by the functions $\phi_n$;
ii) for any $\alpha > 0$, the norm of $R_\alpha$ is smaller than the norm of $L^+$, i.e.,

$$\|R_\alpha\| \le \|L^+\| = 1/\sigma_{N-1}; \qquad (270)$$

iii) the following limit holds true in the sense of the norm of bounded operators:

$$\lim_{\alpha \downarrow 0} R_\alpha = L^+. \qquad (271)$$

A similar definition can be given of a regularization algorithm for a C-generalized inverse (Bertero et al., 1988b). In Condition i, the subspace $X_N$ is replaced by $R(L_C^*)$, the orthogonal complement of the null space of $L_C$, which coincides with the subspace spanned by the functions $\psi_n$ given by Eq. (205), while in Condition ii the norm of $L^+$ is obviously replaced by the norm of $L_C^+$.

C. Spectral Windows

For fixed $\alpha > 0$, consider the function of $\lambda$ given by $F_\alpha(\lambda) = (\lambda + \alpha)^{-1}$ and defined on $(0, +\infty)$. Then the Tikhonov regularizer (226) can also be written in the form $R_\alpha = F_\alpha(L^*L)L^*$, where the operator $F_\alpha(L^*L)$ is obtained from $F_\alpha(\lambda)$ using the spectral representation (Yosida, 1966) of the self-adjoint, nonnegative operator $L^*L$. Moreover, if the operator $R_\alpha$ is applied to a noise-free image $g = Lf^+$, one obtains

$$R_\alpha g = W_\alpha(L^*L) f^+, \qquad (272)$$

where $W_\alpha(\lambda) = \lambda(\lambda + \alpha)^{-1}$. This function is small in the neighborhood of the spectral point $\lambda = 0$ (notice that this point belongs to the spectrum of $L^*L$ if and only if the problem is ill-posed), and, therefore, the effect of the regularizing algorithm is a windowing (or filtering) of the spectral components of $f^+$ related to the ill-posedness of the problem. For example, in the case of a compact operator, using Eq. (230) and the relation $(g, u_k)_Y = (Lf^+, u_k)_Y = (f^+, L^*u_k)_X = \sigma_k (f^+, v_k)_X$, we obtain

$$R_\alpha g = \sum_k W_\alpha(\lambda_k)\,(f^+, v_k)_X\, v_k, \qquad (273)$$

where we have introduced the notation $\lambda_k = \sigma_k^2$. Analogously, in the case of a convolution operator, we obtain from Eq. (233), in terms of Fourier transforms,

$$(R_\alpha g)^{\wedge}(\xi) = W_\alpha(\lambda(\xi))\,\hat f^+(\xi). \qquad (274)$$

In this case, the spectrum of $L^*L$ is the set of values of the function

$$\lambda(\xi) = |\hat K(\xi)|^2, \qquad \xi \in R^n.$$

The previous remark suggests a search for regularization algorithms of the form

$$R_\alpha = F_\alpha(L^*L)L^*, \qquad (275)$$

where now $\{F_\alpha(\lambda)\}_{\alpha>0}$ denotes a suitable family of functions defined on $(0, +\infty)$ and again $F_\alpha(L^*L)$ is given in terms of $F_\alpha(\lambda)$, using the spectral representation of $L^*L$ (Bakushinskii, 1965; Groetsch, 1980; 1984). The problem is now to find sufficient conditions on $\{F_\alpha(\lambda)\}_{\alpha>0}$ that would guarantee that $\{R_\alpha\}_{\alpha>0}$ is a regularization algorithm. These conditions can be given on the window functions $W_\alpha(\lambda) = \lambda F_\alpha(\lambda)$. For example, it is not difficult to prove (Groetsch, 1980; Bertero, 1986) that if $\{W_\alpha(\lambda)\}_{\alpha>0}$ is a family of real-valued, piecewise-continuous functions defined on $(0, +\infty)$ satisfying the conditions:

i) for any $\alpha > 0$, $0 \le W_\alpha(\lambda) \le 1$;
ii) for any $\lambda > 0$, $\lim_{\alpha \downarrow 0} W_\alpha(\lambda) = 1$;
iii) for any $\alpha > 0$, there exists a constant $c_\alpha$ such that

$$W_\alpha(\lambda) \le c_\alpha \lambda; \qquad (277)$$

then the family of operators defined by Eq. (275), with $F_\alpha(\lambda) = \lambda^{-1}W_\alpha(\lambda)$, is a regularization algorithm. Conditions i-iii are satisfied by the Tikhonov window $W_\alpha(\lambda) = \lambda(\lambda + \alpha)^{-1}$.
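As a quick numerical sanity check of the last statement, the sketch below evaluates the Tikhonov window on a grid of spectral values and tests the three conditions, taking $c_\alpha = 1/\alpha$ in Condition iii. The grid, the sample values of $\alpha$, and the function names are illustrative assumptions; a finite grid can of course only illustrate the conditions, not prove them.

```python
def tikhonov_window(lam, alpha):
    """Tikhonov window W_alpha(lambda) = lambda / (lambda + alpha)."""
    return lam / (lam + alpha)

def check_window(window, alpha, c_alpha, lams):
    """Check conditions i and iii on a finite grid of spectral values."""
    cond_i = all(0.0 <= window(lam, alpha) <= 1.0 for lam in lams)
    cond_iii = all(window(lam, alpha) <= c_alpha * lam for lam in lams)
    return cond_i, cond_iii

# Hypothetical grid of spectral values from 1e-6 to 1e3.
lams = [10.0 ** (e / 2.0) for e in range(-12, 7)]

alpha = 0.1
cond_i, cond_iii = check_window(tikhonov_window, alpha, 1.0 / alpha, lams)

# Condition ii: for each fixed lambda the window approaches 1 as alpha decreases.
gaps = [max(1.0 - tikhonov_window(lam, a) for lam in lams)
        for a in (1e-2, 1e-6, 1e-12)]
```

The list `gaps` records the largest deviation from 1 over the grid for each value of $\alpha$; it shrinks as $\alpha$ decreases, although more slowly for the smallest spectral values, which is exactly where the ill-posedness is concentrated.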

Another important example is the following:

$$W_\alpha(\lambda) = 0, \quad 0 \le \lambda < \alpha; \qquad W_\alpha(\lambda) = 1, \quad \lambda \ge \alpha, \qquad (278)$$

which corresponds to a truncation of the spectral representation of $L^*L$. This regularization algorithm is very important for both compact operators and convolution operators. In the case of compact operators, however, it is more convenient to define spectral windows in terms of singular function expansions as follows:

$$R_\alpha g = \sum_{k=0}^{\infty} \frac{W_{\alpha,k}}{\sigma_k}\,(g, u_k)_Y\, v_k, \qquad (279)$$

and therefore one must introduce a family of window sequences rather than a family of window functions. Conditions i and ii above are unchanged (the variable $\lambda$ is replaced by the index $k$), while Condition iii must be replaced by the requirement that for any $\alpha > 0$, there exists a constant $c_\alpha$ such that $W_{\alpha,k} \le c_\alpha \sigma_k$. Then the use of the window function (278) is equivalent to the use of the window sequence

$$W_{\alpha,k} = 1, \quad k \le [\alpha^{-1}]; \qquad W_{\alpha,k} = 0, \quad k > [\alpha^{-1}]. \qquad (280)$$

The corresponding regularization algorithm is the well-known method of truncated singular function expansions (Twomey, 1965; Miller, 1970; Groetsch, 1984), also known as numerical filtering. Analogously, in the case of a convolution operator it is convenient to define spectral windows in terms of the Fourier transform as follows:

$$(R_\alpha g)^{\wedge}(\xi) = \frac{W_\alpha(\xi)}{\hat K(\xi)}\,\hat g(\xi). \qquad (281)$$

Again, Conditions i and ii above are unchanged (the variable $\lambda$ is replaced by the variable $\xi$), while Condition iii is replaced by the requirement that for any $\alpha > 0$, there exists a constant $c_\alpha$ such that $W_\alpha(\xi) \le c_\alpha|\hat K(\xi)|$. Then the use of the window function (278) is equivalent to the use of the window function

$$W_\alpha(\xi) = 1, \quad |\xi| \le \alpha^{-1}; \qquad W_\alpha(\xi) = 0, \quad |\xi| > \alpha^{-1} \qquad (282)$$

or, in other words, equivalent to the use of a cut-off in the Fourier integral. We point out that in the case of a regularization algorithm defined as in Eq. (281), the operator $W_\alpha(L^*L)$ is a convolution operator given by

$$[W_\alpha(L^*L)f](x) = \int_{R^n} A_\alpha(x - x')f(x')\,dx', \qquad (283)$$

where $A_\alpha(x)$ is the inverse Fourier transform of $W_\alpha(\xi)$. For example, in the case of functions defined on $(-\infty, +\infty)$ and the window functions (282) we have

$$A_\alpha(x) = (\pi\alpha)^{-1}\,\mathrm{sinc}(x/\pi\alpha). \qquad (284)$$

It follows that, in the case of noise-free data, the regularized solution is an average of the true solution over a distance of the order of $\alpha$. We now present other interesting window functions for the inversion of convolution operators in one dimension.

a. The Triangular Window

$$W_\alpha(\xi) = 1 - \alpha|\xi|, \quad |\xi| \le \alpha^{-1}; \qquad W_\alpha(\xi) = 0, \quad |\xi| > \alpha^{-1}. \qquad (285)$$

The triangular window is related to the approximation of Fourier integrals in the sense of $(C, 1)$-summability (Titchmarsh, 1948). In this case, the averaging function $A_\alpha(x)$ is given by

$$A_\alpha(x) = (2\pi\alpha)^{-1}\,\mathrm{sinc}^2(x/2\pi\alpha). \qquad (286)$$

Notice that this averaging function is positive and therefore, in the absence of noise, the corresponding regularization algorithm provides positive approximations of positive functions (Bertero et al., 1988a).

b. The Hanning Window

$$W_\alpha(\xi) = \tfrac{1}{2}[1 + \cos(\pi\alpha\xi)], \quad |\xi| \le \alpha^{-1}; \qquad W_\alpha(\xi) = 0, \quad |\xi| > \alpha^{-1}. \qquad (287)$$

The Hanning window is well known in the theory of signal processing (Kunt, 1986, p. 136). The corresponding averaging function is

$$A_\alpha(x) = (4\pi\alpha)^{-1}\{\mathrm{sinc}[(x - \pi\alpha)/\pi\alpha] + 2\,\mathrm{sinc}(x/\pi\alpha) + \mathrm{sinc}[(x + \pi\alpha)/\pi\alpha]\}, \qquad (288)$$

which is not positive. The negative parts, however, are quite small and the side-lobes are smaller than the side-lobes of the function (286), so that this window can be very convenient for practical use.

c. The Gaussian Window

$$W_\alpha(\xi) = \exp(-\alpha\xi^2/2). \qquad (289)$$

The corresponding averaging kernel is

$$A_\alpha(x) = (2\pi\alpha)^{-1/2}\exp(-x^2/2\alpha). \qquad (290)$$

Notice that this kernel is also positive and that side-lobes are absent. The disadvantage is that it can be used only if $\hat K(\xi)$ tends to zero at infinity less rapidly than any Gaussian, whereas the band-limited windows introduced above can be used for regularizing the inversion of an arbitrary convolution operator.

The method of spectral windows, which is based on the use of Fourier integrals, includes the method of filtered back projection (Natterer, 1986a), presently the most important reconstruction algorithm in computerized tomography. As a final remark, we point out that the methods outlined above can also be used for the inversion of the Laplace transform, and more generally for the inversion of integral operators of the type (62), as well as for the solution of the Abel equation (Bertero, 1986; Bertero et al., 1988a). In these cases, the Fourier transform is replaced by the Mellin transform. Then the analysis runs parallel to that performed in the case of convolution operators.

D. Iterative Methods

Iterative methods are frequently used for the solution of $n \times n$ linear systems. The Jacobi method and the Gauss-Seidel method, for example, have been well known for some time now. A complete account of iterative algorithms can be found in any textbook on numerical analysis (Ralston, 1965; Marchuk, 1975). The simplest iterative process can be obtained by writing the linear system $Ax = y$ in the form $x = (I - A)x + y$, which suggests the iteration $x_{n+1} = (I - A)x_n + y$. The latter can also be written in the form $x_{n+1} = x_n - (Ax_n - y)$, which must, in general, be modified as follows: $x_{n+1} = x_n - \tau(Ax_n - y)$, in order to obtain convergence. The arbitrary parameter $\tau$ is called a relaxation parameter and the vector $r_n = Ax_n - y$ the residual of the iterative process (Marchuk, 1975). Moreover, the iterative process is said to be stationary if the parameter $\tau$ does not depend on the particular iteration, and non-stationary if $\tau = \tau_n$ changes from one iteration to the next. The method of steepest descent and the method of conjugate gradient are examples of non-stationary iterative algorithms. One interesting feature of some of these methods is that they can be extended to functional equations such as Eq. (5) and also have regularizing properties in the sense specified in Section V,B. In other words, the approximate solution provided by a finite number of iterations is a stable approximate solution, and the number of iterations (or more precisely, the inverse of the number of iterations) plays the role of a regularization parameter. The extension of the simple stationary iteration method mentioned above to the solution of Fredholm integral equations of the first kind is due to Landweber (1951), who also proved the convergence of the algorithm in the

case of noise-free data, using, however, overly strong restrictions on the kernel (or, equivalently, the relaxation parameter). An extension of this method, obtained by replacing the relaxation parameter with a fixed linear operator, has been proposed by Strand (1974). The results of Landweber and Strand apply essentially to Eq. (5) or, more precisely, Eq. (173) for the case where L is a compact operator. The extension to the general case of a linear continuous operator is given by Bialy (1959), who also proved convergence for the correct range of values of the relaxation parameter. A survey of these results with applications to least-squares linear signal restoration is given by Sanz and Huang (1983). Finally, note also that the method of steepest descent and method of conjugate gradient have been extended to Eq. (5) and Eq. (173) (Kammerer and Nashed, 1971; 1972), and that in this case convergence of the algorithm has been proved for certain classes of noise-free data. Iterative reconstruction of distorted signals has also received much attention in the engineering literature (Schafer et al., 1981). Examples are the recovery of the input to a linear shift-invariant system from its output (deconvolution), restoration of a multidimensional signal from its projections and the extrapolation of a signal from a finite segment of that signal. These problems can be classified as linear inverse problems and, in fact, their mathematical representation is given by Eq. (5). In particular, a very attractive algorithm was proposed by Gerchberg (1974) and Papoulis (1975) for the problem of extrapolating a band-limited signal, a problem equivalent to the problem of Fourier transform inversion with limited data. The convergence of this method in the case of noise-free data was proved by De Santis and Gori (1975) using expansions in terms of prolate spheroidal wave functions. 
The main interest of the method is that it can be easily implemented on a computer and that it achieves rather good super-resolution in the case of noise-free data. It was later recognized (Maitre, 1981; Sanz and Huang, 1983) that this algorithm is just a special case of Landweber-Bialy iteration with $\tau = 1$. In this section we give the main results concerning Landweber-Bialy iteration, steepest descent, and conjugate gradient and indicate why they may be considered regularization algorithms for the approximate determination of the generalized solution. Therefore, the basic equation is not Eq. (5), but Eq. (173). The approximation of $f^+$ given by the n-th iteration will be denoted $f_n$, and the corresponding residual $r_n$ will be defined by

$r_n = L^*L f_n - L^*g$. (291)

It is evident that $f_n$ and $r_n$ belong to the Hilbert space X. This is not convenient for inverse problems with discrete data, since in this case one must essentially compute N-dimensional vectors. A very simple modification of the

equations is possible, however (Bertero et al., 1988b), by setting

$f_n = L^*\mathbf{f}_n$, $r_n = L^*\mathbf{r}_n$, (292)

so that (assuming the $\phi_n$ are linearly independent), we have

$\mathbf{r}_n = \hat{L}\mathbf{f}_n - g$, (293)

where $\hat{L}$ is the matrix associated with the operator $LL^*$ and related to the Gram matrix of the functions $\phi_n$ by Eq. (128). Then all the algorithms can be formulated in terms of the vectors $\mathbf{f}_n$ and $\mathbf{r}_n$ and the matrix $\hat{L}$.

1. Landweber-Bialy Iteration

The sequence of approximations is given by

$f_0 = 0$, $f_{n+1} = f_n - \tau r_n$, (294)

where $\tau$ is a fixed value of the relaxation parameter. Then, using Eq. (291), it is easy to show that

$f_n = \tau \sum_{k=0}^{n-1} (I - \tau L^*L)^k L^*g$, (295)

and therefore this algorithm has the general structure (275) with

$W^{(n)}(\lambda) = \lambda F^{(n)}(\lambda) = \tau\lambda \sum_{k=0}^{n-1} (1 - \tau\lambda)^k = 1 - (1 - \tau\lambda)^n$. (296)

This window function satisfies Conditions i-iii of Section V,C for values of $\lambda$ in the spectrum of $L^*L$ (which is in the interval $[0, \|L\|^2]$) when the relaxation parameter $\tau$ satisfies the conditions

$0 < \tau < 2\|L\|^{-2}$. (297)

These are precisely the conditions that guarantee that if $Pg \in R(L)$, then the sequence $\{f_n\}$ converges to the generalized solution $f^+$ of Eq. (5) (Bialy, 1959). It follows that the sequence $\{R^{(n)}\}$ defines a regularization algorithm. Then the problem of choosing an "optimum value" of the regularization parameter is equivalent to the problem of choosing an "optimum number" of iterations. In fact, in the case of noisy data the first iterations improve the accuracy of the solution but, after a certain critical value, the noise induces instability and the quality of the solution degrades rapidly. Finally, it is not too difficult to prove that the following inequality holds true,

$\|r_{n+1}\|_X \le \|r_n\|_X$, (298)

and also

$\|f_{n+1}\|_X \ge \|f_n\|_X$. (299)
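The iteration (294) and the monotonicity properties (298) and (299) can be checked numerically on a small discrete problem. The following is a minimal sketch in plain Python, not the author's implementation: the matrix `L`, the data `g`, and the helper names are assumptions made for illustration. The relaxation parameter is taken as $\tau = 1/\|L\|_F^2$ (Frobenius norm), an assumed choice that guarantees $\tau\|L\|^2 \le 1$ and therefore lies inside the range (297).

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def landweber(L, g, tau, n_iter):
    """Return the iterates f_0..f_n of Eq. (294) and the residuals of Eq. (291)."""
    Lt = transpose(L)
    Ltg = matvec(Lt, g)
    f = [0.0] * len(L[0])                      # f_0 = 0
    iterates, residuals = [], []
    for _ in range(n_iter + 1):
        r = [a - b for a, b in zip(matvec(Lt, matvec(L, f)), Ltg)]  # r_n = L*L f_n - L*g
        iterates.append(f)
        residuals.append(r)
        f = [fi - tau * ri for fi, ri in zip(f, r)]                 # f_{n+1} = f_n - tau r_n
    return iterates, residuals

# Hypothetical 3 x 2 example (three data values, two unknowns).
L = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1]]
g = [1.0, 2.0, 0.3]
tau = 1.0 / sum(x * x for row in L for x in row)   # 1 / ||L||_F^2 <= 1 / ||L||^2
iterates, residuals = landweber(L, g, tau, 200)
```

In this sketch the residual norms do not increase and the norms of the iterates do not decrease as n grows, in agreement with (298) and (299).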

These properties are analogous to properties proved for the regularization algorithm (226), and precisely to the fact that $\epsilon_\alpha$, as given by Eq. (228), is an increasing function of $\alpha$, whereas $E_\alpha$, as given by Eq. (229), is a decreasing function of $\alpha$.

2. Steepest Descent

In this case, the iteration scheme is given by

$f_0 = 0$, $f_{n+1} = f_n - \tau_n r_n$, (300)

where

$\tau_n = \|r_n\|_X^2 / \|L r_n\|_Y^2$. (301)
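The scheme (300)-(301) can be sketched for a small matrix problem as follows. This is an illustrative sketch only: the matrix `L`, the data `g`, and all function names are assumptions, and the residual is recomputed from Eq. (291) at every step for clarity rather than efficiency.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent(L, g, n_iter):
    """Iterate f_{n+1} = f_n - tau_n r_n with tau_n = ||r_n||^2 / ||L r_n||^2."""
    Lt = transpose(L)
    Ltg = matvec(Lt, g)
    f = [0.0] * len(L[0])                      # f_0 = 0
    taus = []
    for _ in range(n_iter):
        r = [a - b for a, b in zip(matvec(Lt, matvec(L, f)), Ltg)]  # Eq. (291)
        Lr = matvec(L, r)
        denom = dot(Lr, Lr)
        if denom == 0.0:                       # residual vanished: nothing left to do
            break
        tau = dot(r, r) / denom                # Eq. (301)
        taus.append(tau)
        f = [fi - tau * ri for fi, ri in zip(f, r)]                 # Eq. (300)
    return f, taus

# Hypothetical 3 x 2 example.
L = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.1]]
g = [1.0, 2.0, 0.3]
f, taus = steepest_descent(L, g, 200)
```

Because $\tau_n$ depends on the data through $r_n$, the map from g to $f_n$ in this sketch is not linear, unlike the fixed-$\tau$ iteration (294).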

It has been proved that when $Pg \in R(LL^*)$ (this condition is obviously stronger than the condition $Pg \in R(L)$), then $f_n$ converges to $f^+$ (Kammerer and Nashed, 1971). Therefore, if we set $f_n = R^{(n)}g$, the family of operators $\{R^{(n)}\}$ defines a regularization algorithm. Notice that the operator $R^{(n)}$ is continuous but is not linear. Moreover, inequalities (298) and (299) hold true in this case. We have mentioned this algorithm for completeness, though we are not aware of any applications of the algorithm to the solution of inverse problems.

3. Conjugate Gradient

In this case, the iteration scheme is given by

$f_0 = 0$, $f_{n+1} = f_n - \tau_n p_n$, (302)

where

$p_0 = r_0 = -L^*g$, $p_n = r_n + \sigma_{n-1} p_{n-1}$, (303)

and also

$\tau_n = (r_n, p_n)_X / \|L p_n\|_Y^2$, $\sigma_{n-1} = -(L r_n, L p_{n-1})_Y / \|L p_{n-1}\|_Y^2$. (304)
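A direct transcription of Eqs. (302)-(304) for a small matrix problem may help to fix the roles of $r_n$, $p_n$, $\tau_n$ and $\sigma_{n-1}$. This is a sketch under stated assumptions, not a production solver: the 4 x 3 matrix `L`, the data `g`, and the function names are hypothetical, and the residual is recomputed from Eq. (291) rather than updated recursively.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(L, g, n_iter):
    """Apply n_iter steps of the scheme (302)-(304), starting from f_0 = 0."""
    Lt = transpose(L)
    Ltg = matvec(Lt, g)
    f = [0.0] * len(L[0])
    r = [-b for b in Ltg]                      # r_0 = -L*g because f_0 = 0
    p = list(r)                                # p_0 = r_0
    for _ in range(n_iter):
        Lp = matvec(L, p)
        denom = dot(Lp, Lp)                    # ||L p_n||^2
        if denom == 0.0:
            break
        tau = dot(r, p) / denom                # tau_n, Eq. (304)
        f = [fi - tau * pk for fi, pk in zip(f, p)]                 # Eq. (302)
        r = [a - b for a, b in zip(matvec(Lt, matvec(L, f)), Ltg)]  # Eq. (291)
        sigma = -dot(matvec(L, r), Lp) / denom # sigma_n, Eq. (304)
        p = [ri + sigma * pk for ri, pk in zip(r, p)]               # Eq. (303)
    return f

# Hypothetical example with N = 3 unknowns: three steps are used.
L = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0], [0.1, 0.1, 0.1]]
g = [1.0, 2.0, 0.5, 0.3]
f3 = conjugate_gradient(L, g, 3)
```

For this well-conditioned example with three unknowns, three steps already reduce $L^*Lf - L^*g$ to rounding level; for ill-conditioned matrices roundoff can spoil this behaviour.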

It is known that for an N-dimensional problem, this method is a finite iterative method, in the sense that a theoretical convergence in N steps is guaranteed. This theoretical result holds true, for example, in the case of inverse problems with discrete data. In practice, roundoff errors prevent the achievement of this theoretical convergence. Moreover, in the case of a functional equation in an infinite-dimensional space, the number of iterations required for convergence is infinite. The convergence of $f_n$, given by Eqs. (302)-(304), to $f^+$ has been proved in the case $Pg \in R(LL^*L)$ (Kammerer and Nashed, 1972). This condition is stronger than the condition required for the convergence of the steepest

descent method. Again, if we set $f_n = R^{(n)}g$, the sequence $\{R^{(n)}\}$ defines a regularization algorithm. The operators $R^{(n)}$ are continuous and nonlinear. Maitre (1981) has compared the Gerchberg-Papoulis (or Landweber-Bialy) method and the conjugate gradient method for extrapolation of a signal of finite extent. The result, obtained by numerical simulations, is that the conjugate gradient method produces the same accuracy as the Gerchberg-Papoulis algorithm, though it requires far fewer iterations. In some cases, the conjugate gradient method made possible a reduction in the number of iterations by a factor of 5000 from the Gerchberg-Papoulis algorithm without any reduction in accuracy. In fact, both Landweber-Bialy iteration and the conjugate gradient method compute first those parts of the solution that belong to the large singular values. Conjugate gradient, however, seems to be more efficient in this procedure, as indicated by arguments developed by Natterer (1986c). An impressive example has been found in the case of Laplace inversion in a weighted space (Bertero et al., 1986c); the approximate solution given by the n-th iteration of the conjugate gradient method practically coincides here with the approximate solution obtained using the first n terms in the singular function expansion, at least for small values of n.

E. Choice of the Regularization Parameter

A regularization algorithm provides a one-parameter family of approximations of the unknown generalized solution $f^+$. This family describes a trajectory in the Hilbert space X and, as follows from inequality (268), there exists a unique point of this trajectory which has minimum distance from $f^+$. This implies the existence of an optimum value of the regularization parameter for a given noisy image $g_\epsilon$. The determination of this optimum value, however, requires knowledge of the unknown generalized solution and, therefore, it cannot be performed in practice. It follows that the solution of a practical problem involves two essential steps: first, the choice of the regularization algorithm and, second, the choice of a criterion for selecting the regularization parameter. From this point of view, the methods presented in Section V,A can be considered methods for selecting the regularization parameter in the regularization algorithm (226). The typical feature of these criteria is that some additional information about the solution and/or the error is required. An extension of some of them to more general regularization algorithms can be performed as follows. We introduce in the general case the two functions of the regularization parameter already introduced in (226), that is, the norm of the regularized

solution

$E_\alpha = \|R_\alpha g\|_X$ (305)

and the discrepancy function

$\epsilon_\alpha = \|L R_\alpha g - g\|_Y$. (306)

The latter is the distance between the data computed using the approximation $f_\alpha = R_\alpha g$ and the real data. Then we assume that these functions have the properties proved in the case of the algorithm (226), that is:

(a) $E_\alpha$ is a strictly decreasing function of $\alpha$, whose values at $\alpha = 0$ and $\alpha = \infty$ are $\|f^+\|_X$ (i.e., $\infty$ when the generalized solution does not exist) and zero, respectively.
(b) $\epsilon_\alpha$ is a strictly increasing function of $\alpha$, whose values at $\alpha = 0$ and $\alpha = \infty$ are $\|Qg\|_Y$ and $\|g\|_Y$, respectively.

One can easily check that these conditions are also satisfied by the examples of spectral windows given in Section V,C and by the iterative algorithms of Section V,D. We consider now two criteria for selecting the regularization parameter.

The first criterion is based upon the assumption that a bound $E$ for the norm of $f$ is known, i.e., $f \in S_E$ (Eq. (236)). If the prescribed constant $E$ is smaller than the norm of the generalized solution $f^+$, then property (a) implies that there exists a unique value of $\alpha$, say $\alpha^{(E)}$, which solves the equation $E_\alpha = E$. For $\alpha > \alpha^{(E)}$, we have $E_\alpha < E$, and therefore all the corresponding regularized solutions belong to the sphere $S_E$. Moreover, from condition (b) it follows that if $\alpha > \alpha^{(E)}$, the discrepancy $\epsilon_\alpha$ is greater than the discrepancy corresponding to $\alpha = \alpha^{(E)}$. We conclude that $\alpha = \alpha^{(E)}$ is the value of the regularization parameter providing a regularized solution that is compatible with the prescribed constraint and that minimizes the discrepancy between the computed and the measured data. Then it is obvious that the method of constrained least-squares solutions of Section V,A.1 gives a value of the discrepancy function which is smaller than the value provided by any other regularization algorithm, for a given value of the prescribed constant $E$.

The second criterion is based upon the assumption that a bound $\epsilon$ on the error is known. Then if $\epsilon$ satisfies inequalities (241), property (b) implies that there exists a unique value of $\alpha$, say $\alpha = \alpha^{(\epsilon)}$, which solves the equation $\epsilon_\alpha = \epsilon$. For $\alpha < \alpha^{(\epsilon)}$, we have $\epsilon_\alpha < \epsilon$, and therefore all the corresponding regularized solutions are compatible with the data with accuracy $\epsilon$. On the other hand, property (a) implies that if $\alpha < \alpha^{(\epsilon)}$, the norm of $f_\alpha$ increases. We conclude that $\alpha = \alpha^{(\epsilon)}$ is the value of the regularization parameter providing a regularized solution which is compatible with the measured data and which has minimal norm.
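Since the discrepancy is a strictly increasing function of the regularization parameter, the second criterion reduces to a one-dimensional root search. The sketch below illustrates this with bisection. It assumes, purely for illustration, a Tikhonov-type regularizer written in a singular-value basis, so that the regularized solution has components $s_k g_k/(s_k^2 + \alpha)$; the singular values `sigma`, the noise vector, and all function names are hypothetical and are not taken from the text.

```python
import math

def tikhonov(sigma, g, alpha):
    """Components of the regularized solution in the singular-value basis."""
    return [s * gk / (s * s + alpha) for s, gk in zip(sigma, g)]

def solution_norm(sigma, g, alpha):
    """Norm of the regularized solution, cf. Eq. (305)."""
    return math.sqrt(sum(x * x for x in tikhonov(sigma, g, alpha)))

def discrepancy(sigma, g, alpha):
    """Distance between computed and measured data, cf. Eq. (306)."""
    return math.sqrt(sum((alpha * gk / (s * s + alpha)) ** 2
                         for s, gk in zip(sigma, g)))

def alpha_from_discrepancy(sigma, g, eps, lo=1e-16, hi=1e8, n_steps=200):
    """Bisection on a logarithmic scale for discrepancy(alpha) = eps."""
    for _ in range(n_steps):
        mid = math.sqrt(lo * hi)
        if discrepancy(sigma, g, mid) < eps:
            lo = mid          # discrepancy too small: the root lies above mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical diagonal problem: exact solution (1,...,1) plus a fixed perturbation.
sigma = [1.0, 0.5, 0.1, 0.01, 0.001]
noise = [1e-3, -1e-3, 1e-3, -1e-3, 1e-3]
g = [s * 1.0 + n for s, n in zip(sigma, noise)]
eps = math.sqrt(sum(n * n for n in noise))      # assumed known error bound
alpha_eps = alpha_from_discrepancy(sigma, g, eps)
```

The first criterion can be treated in the same way by bisecting on `solution_norm` instead, which is strictly decreasing in the regularization parameter.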

Notice that the method of Section V,A.2 gives a solution whose norm is smaller than the norm of any other regularized solution, for a given value $\epsilon$ of the error estimate. This second method is also known as the discrepancy principle (Morozov, 1966; 1968). From the results given in Section V,A.2, it follows that for the regularization algorithm (226), this method always provides a regularized solution that converges strongly to the true generalized solution $f^+$ as the error of the data tends to zero. Then the question arises as to whether the same property is true for other regularization algorithms, in particular, spectral windows and iterative methods. The answer is, in general, negative if the discrepancy principle is formulated as above. But if the discrepancy principle is slightly modified, so that it is necessary to find a value of $\alpha$ such that

$\|L R_\alpha g - g\|_Y = b\epsilon$, (307)

where $b > 1$ is a given (but arbitrary) constant, then it is possible to prove the convergence result for a large class of regularization algorithms, including, for example, the method of truncated singular function expansions and the Landweber-Bialy iterative method (Vainikko, 1982; Defrise and De Mol, 1987).

In the case of truncated singular function expansions, it is possible to introduce a method for the selection of the regularization parameter, or equivalently, the number of terms in the expansion, which is analogous to the method of Section V,A.3. If we assume that the solution satisfies the constraints (243) and if we retain in the expansion (179) only terms corresponding to singular values $\alpha_k$ fulfilling the condition

$\alpha_k \ge \epsilon/E$, (308)

then the resulting truncated singular function expansion satisfies the constraints (243) except for a factor of $\sqrt{2}$ (Miller, 1970). Notice that the quantity controlling the truncation of the expansion is a kind of signal-to-noise ratio, so that we have an extension of the method of numerical filtering (Twomey, 1965). The criterion given by Eq. (308) applies, of course, to the case of compact operators, but it can also be extended to the general case of a continuous operator, when the regularization algorithm is defined by the spectral window (278) (Miller, 1970).

In the case of an ill-conditioned problem with discrete data, the regularized solution always converges to the true generalized solution as the error of the data tends to zero. For example, it is obvious that, as $\epsilon \to 0$, both $\alpha^{(\epsilon)}$ and $\alpha^{(E)}$, as defined in the present section, tend to zero. Moreover, one can always use the method of truncated singular function expansions and the criterion (308) for the choice of the optimum number of terms.
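The truncation rule (308) is simple enough to state in a few lines of code. The sketch below works in a singular-value basis, where the expansion coefficients are the data components divided by the singular values. The singular values, the data, the bounds `eps` and `E`, and the function names are all assumptions chosen for illustration.

```python
import math

def truncated_expansion(singular_values, g, eps, E):
    """Keep only the terms whose singular value is at least eps / E, cf. Eq. (308)."""
    threshold = eps / E
    return [gk / s if s >= threshold else 0.0
            for s, gk in zip(singular_values, g)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Hypothetical diagonal problem: exact solution (1,...,1) plus a small perturbation.
singular_values = [1.0, 0.5, 0.1, 0.01, 0.001]
noise = [1e-3, -1e-3, 1e-3, -1e-3, 1e-3]
g = [s * 1.0 + n for s, n in zip(singular_values, noise)]

eps = 0.005   # assumed bound on the data error (norm of the noise is about 0.0022)
E = 2.5       # assumed bound on the solution norm (exact norm is about 2.24)

f_trunc = truncated_expansion(singular_values, g, eps, E)
n_terms = sum(1 for x in f_trunc if x != 0.0)

# Discrepancy and norm of the truncated solution, for comparison with eps and E.
residual = norm([s * fk - gk for s, fk, gk in zip(singular_values, f_trunc, g)])
size = norm(f_trunc)
```

In this example the threshold is 0.002, so the term with singular value 0.001 is discarded, and both the discrepancy and the norm of the truncated solution stay within $\sqrt{2}$ times the prescribed bounds.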

Finally, there is an important method that has been proposed for the regularization algorithm (226) but which can be used only in problems with discrete data. This is the method of cross-validation, essentially suggested in the context of smoothing spline functions (Wahba and Wold, 1975a; 1975b), and later extended to more general problems (Wahba, 1977). This method does not require any upper bound on the solution and/or data error, and is based on the idea of letting the data themselves choose the value of the regularization parameter. More precisely, it is required that a good value of the regularization parameter should predict missing data values. If we consider an inverse problem with discrete data, formulated as in Section III,A, we may then denote by $f_{\alpha,k}$ the minimizer of the functional

$\Phi_{\alpha,k}[f] = N^{-1} \sum_{n \ne k} |(Lf)_n - g_n|^2 + \alpha\|f\|_X^2$, (309)

where $(Lf)_n$ is defined by Eq. (124). This functional is just the functional (221) where the data space is the usual Euclidean space, but with the k-th data value missing. The extension to the case of a weighted norm in Y is easy. Then the cross-validation function $V_0(\alpha)$ is defined by

$V_0(\alpha) = N^{-1} \sum_{k=1}^{N} |(Lf_{\alpha,k})_k - g_k|^2$, (310)

and the cross-validation method consists in determining the unique value of $\alpha$ which minimizes $V_0(\alpha)$. The computation of the minimum is based on the relation (Golub et al., 1979; Craven and Wahba, 1979)

$V_0(\alpha) = N^{-1} \sum_{k=1}^{N} \frac{|(Lf_\alpha)_k - g_k|^2}{[1 - A_{kk}(\alpha)]^2}$, (311)

where $f_\alpha$ is the minimizer of the functional

$\Phi_\alpha[f] = N^{-1} \sum_{n=1}^{N} |(Lf)_n - g_n|^2 + \alpha\|f\|_X^2$, (312)

which is a special case of the functional (221), and where $A_{kk}(\alpha)$ is the kk-entry of the $N \times N$ matrix

$A(\alpha) = LL^*(LL^* + \alpha I)^{-1}$. (313)

Notice that $LL^*$ is essentially the Gram matrix of the functions $\phi_n$, since, in this formulation, $W = N^{-1}I$. It has been shown (Golub et al., 1979; Craven and Wahba, 1979) that from the point of view of minimizing the predictive mean-square error, minimization of $V_0(\alpha)$ must be replaced by minimization of the generalized cross-validation function, defined by

$V(\alpha) = (N^{-1}\,\mathrm{Tr}[I - A(\alpha)])^{-2}\,(N^{-1}\|[I - A(\alpha)]g\|^2)$, (314)

where the norm is the usual Euclidean norm. An important property of $V(\alpha)$ is its invariance with respect to permutations and, more generally, with respect to rotations of data values. The choice of the regularization parameter has been the subject of several papers. It is impossible to give in this review a complete account of all the criteria that have been suggested or their main mathematical and computational properties. The criteria we have outlined are the most general and most significant in our opinion. Our feeling, however, is that it is not possible to find a criterion that could work for any ill-posed problem. Therefore, given an ill-posed problem one must investigate the various algorithms and criteria that have been proposed and perhaps invent a new one in order to take full account of the specific characteristics of the problem.
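As an illustration of the generalized cross-validation function (314), the sketch below evaluates it in a basis where the matrix $LL^*$ is diagonal, which the rotation invariance noted above allows. In that basis $I - A(\alpha)$ is diagonal with entries $\alpha/(\lambda_k + \alpha)$, where $\lambda_k$ are the eigenvalues of $LL^*$. The eigenvalues, the data, the search grid, and the function names are hypothetical choices made for the example.

```python
def gcv(eigs, g, alpha):
    """Generalized cross-validation function of Eq. (314) for diagonal LL*."""
    n = len(eigs)
    w = [alpha / (lam + alpha) for lam in eigs]      # diagonal of I - A(alpha)
    numerator = sum((wk * gk) ** 2 for wk, gk in zip(w, g)) / n
    denominator = (sum(w) / n) ** 2
    return numerator / denominator

# Hypothetical diagonal problem: exact solution (1,...,1) plus a small perturbation.
sigma = [1.0, 0.5, 0.1, 0.01, 0.001]
noise = [1e-3, -1e-3, 1e-3, -1e-3, 1e-3]
g = [s * 1.0 + n for s, n in zip(sigma, noise)]
eigs = [s * s for s in sigma]                        # eigenvalues of LL*

# Crude minimization over a logarithmic grid of candidate parameters.
grid = [10.0 ** (k / 4.0) for k in range(-48, 9)]
alpha_gcv = min(grid, key=lambda a: gcv(eigs, g, a))
```

A grid search is used here only to keep the sketch short; any one-dimensional minimizer could be substituted.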

VI. INVERSE PROBLEMS AND INFORMATION THEORY

The results presented in Section V indicate that there are many algorithms for solving ill-posed or ill-conditioned inverse problems. The situation can be confusing but this is unavoidable. After years of theoretical investigations and computational work, a rather generally accepted point of view is that no general method exists and that, even in solving a specific problem, it is convenient to use different algorithms for different classes of solutions. The basic reason for this difficulty is that very often, the data contain rather poor information about the solution. For example, in the case of a compact operator, the singular values tend to zero so that data components corresponding to small singular values are completely contaminated by noise. In more general cases, this situation is a consequence of the fact that the exact image can be far smoother than the corresponding object. The problems discussed in Section II provide several striking examples of this situation. It follows that two completely different objects can produce very similar smooth images. Then the noise contribution h to the real image given in Eq. (9) hides the smoothness of the exact image, and it becomes impossible to distinguish between the two different objects. A first consequence is that in designing an algorithm for solving an inverse problem, one must never forget the general principle formulated by Lanczos (1961, p. 132): "... a lack of information cannot be remedied by any mathematical trickery." In other words, clever algorithms cannot produce miracles. The basic point is to understand the information content of the data and the role of any available a priori information about the solution. A number of concepts introduced in the theory of inverse and ill-posed problems, such as stability estimates, resolution limits, number of degrees of

freedom, and so on, follow this general approach. In the present section, we attempt a presentation of these ideas with the aim of showing the various relationships between them. We do not think that the result of this effort is already a completely satisfactory theory. We hope, however, that the main features of a future, complete theory can emerge from these ideas.

A. Modulus of Continuity and Uncertainty of the Solution

Given a regularization algorithm and given a criterion for the choice of the regularization parameter, one has a recipe for computing an approximate solution of an inverse problem. Then one can try to determine the following: (a) the stability, or robustness, of the algorithm and (b) the convergence of the approximate solution to the true solution as the noise tends to zero. The first problem is a typical problem of numerical analysis which can be solved by looking at the condition number or some other estimate of numerical stability. The second problem would appear to be a purely mathematical question since, in practice, the noise is never zero. However, when the convergence result holds true, one knows that by reducing the noise, one can get a better solution. For this reason, a proof of the convergence of an algorithm is also interesting from the practical point of view. For example, the method of Section V,A.2, which corresponds to the choice of the regularization parameter given by the discrepancy principle, provides a stable approximate solution that converges to the true generalized solution as the error of the data tends to zero. We must point out, however, that, even if convergence is guaranteed, convergence can be arbitrarily slow since no rate of convergence can be found for arbitrary solutions. In fact, if we want to have a rate of convergence, we must restrict the class of admissible solutions by means of some kind of a priori information. Then one can introduce a modulus of continuity, which, as we will show, is essentially a measure of the uncertainty of the solution. An upper bound for the modulus of continuity is also called a stability estimate (John, 1960; Miller, 1964; 1970). We will assume, for the sake of simplicity, that the inverse operator $L^{-1}$ exists and that, in general, it is not continuous.
Several results, however, can be easily extended to the case of the generalized inverse $L^+$ just by restricting $L$ to $N(L)^\perp$, i.e., by taking $N(L)^\perp$ as the new object space. We first define the convergence rate of the regularization algorithm $\{R_\alpha\}_{\alpha>0}$ in the case of exact data associated with functions $f$ of a prescribed set H:

$\omega_H(\alpha) = \sup\{\|R_\alpha L f - f\|_X \mid f \in H\}$. (315)

An estimate of $\omega_H(\alpha)$, combined with inequality (268), can be used for obtaining a value of the regularization parameter that is optimum in H (Groetsch, 1984). In fact, one can look for the value of $\alpha$ which minimizes the function $\omega_H(\alpha) + \epsilon\|R_\alpha\|$. Moreover, for noisy data $g_\epsilon$ corresponding, with error $\epsilon$, to solutions in H, we define a modulus of convergence of the regularization algorithm $\{R_\alpha\}_{\alpha>0}$ as follows (Franklin, 1974):

$\omega_H(\epsilon, \alpha) = \sup\{\|R_\alpha g_\epsilon - f\|_X \mid f \in H, \|Lf - g_\epsilon\|_Y \le \epsilon\}$. (316)

Then, inequality (268) implies that

$\omega_H(\epsilon, \alpha) \le \omega_H(\alpha) + \epsilon\|R_\alpha\|$. (317)

Finally we introduce the modulus of continuity of the operator $L^{-1}$ restricted to $LH$:

$\mu_H(\epsilon) = \sup\{\|f\|_X \mid f \in H, \|Lf\|_Y \le \epsilon\}$. (318)

If H contains a neighborhood of 0, then $\mu_H(\epsilon)$ is a continuous increasing function of $\epsilon$. Moreover, if H is compact, then the topological lemma of Tikhonov referred to in Section V,A.4 implies that $\mu_H(\epsilon) \to 0$ as $\epsilon \to 0$. We point out, however, that the compactness of H is only a sufficient condition for this result: it is easy to find examples of bounded sets H that are not compact such that $\mu_H(\epsilon) \to 0$ as $\epsilon \to 0$. Examples are given by Bertero (1982; 1986). The relationship between the modulus of convergence and the modulus of continuity is clarified by the following result (Franklin, 1974): if the set H contains a neighborhood of 0, then for any linear regularization algorithm $\{R_\alpha\}_{\alpha>0}$ and any $\alpha$

$\mu_H(\epsilon) \le \omega_H(\epsilon, \alpha)$. (319)

The relevance of this result is obvious. Given a set H of solutions, no regularization algorithm and no choice of the regularization parameter can provide a modulus of convergence that tends to zero more rapidly than the modulus of continuity. Therefore, when the data are noisy, the latter is the best possible convergence rate for the approximation of elements of H. We point out that this optimum convergence rate is obtained in the case of constrained least-squares solutions as defined in Section V,A.4. In fact, it is proved by Ivanov (1962) that if the set H is closed, convex and symmetric with respect to 0 and if $\mu_H(\epsilon) \to 0$ as $\epsilon \to 0$, then, for sufficiently small $\epsilon$,

where $\tilde{f}$ and $\tilde{f}'$ are the constrained least-squares solutions associated respectively with data $g$ and $g'$ satisfying the condition $\|g - g'\|_Y \le \epsilon$. Very often, the set H can be characterized as a level set of a functional of the

form (1 3), i.e.,

$H = \{f \in D(C) \mid \|Cf\|_Z \le E\}$. (321)

Then if the constraint operator C satisfies the conditions of Section II,A the set H is closed, convex, and symmetric with respect to 0. Moreover, H is also compact when C has a compact inverse $C^{-1}$, and in this case all the assumptions of the Ivanov theorem on the convergence rate of constrained least-squares solutions are satisfied. Another approximate solution which has the optimum convergence rate is provided by Eq. (255) with $\alpha = (\epsilon/E)^2$. If we denote this solution by $f_0$, then it has been proved (Miller, 1970) that

$\|f_0 - f\|_X \le \sqrt{2}\,\mu_H(\epsilon)$ (322)

for any arbitrary $f \in K$, the set K being defined by

$K = \{f \in X \mid f \in H, \|Lf - g\|_Y \le \epsilon\}$. (323)

We stress now another interpretation of the modulus of continuity (318). The set K defined in Eq. (323) is a generalization of the set introduced in Section V,A.3. In that case, H is the sphere of radius $E$ in X. It is obvious that K is the set of all the admissible approximate solutions compatible with the data function $g$ with error $\epsilon$. Any element of K is an acceptable approximate solution of the problem and, therefore, the diameter of K is a measure of the uncertainty of the solution for given a priori information (the set H) and given noise level (the value of $\epsilon$). When the set H is convex and symmetric with respect to zero (the conditions satisfied by the set (321)) the diameter of K can be easily estimated in terms of $\mu_H(\epsilon)$, and therefore the modulus of continuity is also an estimate of the uncertainty of the solution. In fact, if $f, f' \in K$, then $f_1 = (f - f')/2$ belongs to H. From the inequality $\|Lf_1\|_Y \le \tfrac{1}{2}(\|Lf - g\|_Y + \|Lf' - g\|_Y) \le \epsilon$, it follows that $\|f_1\|_X \le \mu_H(\epsilon)$, and therefore

$\mathrm{diam}(K) = \sup\{\|f - f'\|_X \mid f, f' \in K\} \le 2\mu_H(\epsilon)$. (324)

In other words, when $\mu_H(\epsilon) \to 0$ as $\epsilon \to 0$, the uncertainty of the solution tends to zero in the case of vanishing noise. We also notice that the modulus of continuity provides an estimate of the uncertainty that is independent of the data function $g$. Finally, we consider the problem of estimating the modulus of continuity $\mu_H(\epsilon)$, i.e., the problem of determining stability estimates. We restrict ourselves to the case of a set H defined as in Eq. (321), since only then is it possible to give rather general methods. A first remark is the following. If we introduce the quantity

$\mu_C(\epsilon) = \sup\{\|f\|_X \mid \|Cf\|_Z \le 1, \|Lf\|_Y \le \epsilon\}$, (325)

then we have

$\mu_H(\epsilon) = E\,\mu_C(\epsilon/E)$, (326)

and therefore we can restrict our attention to the estimate of $\mu_C(\epsilon)$. If we introduce now the quantity

$\mu_C^{(0)}(\epsilon) = \sup\{\|f\|_X \mid \|Lf\|_Y^2 + \epsilon^2\|Cf\|_Z^2 \le \epsilon^2\}$ (327)

and the sets

$K^{(0)} = \{f \in X \mid \|Lf\|_Y^2 + \epsilon^2\|Cf\|_Z^2 \le \epsilon^2\}$, (328)

$K^{(1)} = \{f \in X \mid \|Lf\|_Y^2 + \epsilon^2\|Cf\|_Z^2 \le 2\epsilon^2\}$, (329)

from the inclusions

$K^{(0)} \subset K \subset K^{(1)}$ (330)

(here K is defined by Eqs. (323) and (321)), we derive the inequalities

$\mu_C^{(0)}(\epsilon) \le \mu_C(\epsilon) \le \sqrt{2}\,\mu_C^{(0)}(\epsilon)$. (331)

We find that $\mu_C^{(0)}(\epsilon)$ is a good stability estimate of the modulus of continuity. Moreover, $\mu_C^{(0)}(\epsilon)$ can be easily computed. In the case of a finite-dimensional problem, $K^{(0)}$ is an ellipsoid and $\mu_C^{(0)}(\epsilon)$ is the maximum length of the semi-axes of $K^{(0)}$. Therefore, it can be determined by solving an eigenvalue problem. This result holds true in general, i.e., the computation of $\mu_C^{(0)}(\epsilon)$ can always be reduced to the solution of a spectral problem for the operator $L^*L + \epsilon^2 C^*C$. This operator is positive-definite and therefore its spectrum has a positive lower bound $\gamma^2(\epsilon)$ which can be determined by solving the variational problem

$\gamma^2(\epsilon) = \inf\{\|Lf\|_Y^2 + \epsilon^2\|Cf\|_Z^2 \mid \|f\|_X = 1\}$. (332)

Then, by comparing this equation with Eq. (327), we have

$\mu_C^{(0)}(\epsilon) = \epsilon/\gamma(\epsilon)$. (333)

For more details see Bertero et al. (1980a). The relation (333) can be used for computing stability estimates in several important cases (Bertero et al., 1980; Bertero, 1982; 1986). Finally, we note an important remark due to John (1960). We say that we have Hölder continuity in the dependence of the solution on the data when there exist constants $A$, $q$ with $0 < q \le 1$ such that

$\mu_C^{(0)}(\epsilon) \le A\epsilon^q$, (334)

while we say that we have logarithmic continuity when

$\mu_C^{(0)}(\epsilon) \le A|\ln \epsilon|^{-q}$ (335)

with $q > 0$ arbitrary.
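The difference between the two kinds of continuity can be made concrete with the relation between the stability estimate and the lowest eigenvalue of $L^*L + \epsilon^2 C^*C$. The sketch below assumes, for illustration only, that L and C are diagonal in the same basis with entries `sigma[k]` and `c[k]`, so that the lowest eigenvalue is simply the smallest value of `sigma[k]**2 + (eps*c[k])**2`. The two decay laws for the singular values (algebraic and exponential), the truncation at `K` terms, and all names are hypothetical.

```python
import math

def stability_estimate(sigma, c, eps):
    """Return eps / gamma(eps) for operators L and C diagonal in a common basis."""
    gamma_sq = min(s * s + (eps * ck) ** 2 for s, ck in zip(sigma, c))
    return eps / math.sqrt(gamma_sq)

K = 5000                                         # number of retained components
c = [float(k) for k in range(1, K + 1)]          # constraint weights growing like k
mild = [1.0 / k for k in range(1, K + 1)]        # algebraically decaying singular values
severe = [math.exp(-k) for k in range(1, K + 1)] # exponentially decaying singular values

# Effect of reducing the noise level by a factor of 100 on the stability estimate.
mild_ratio = stability_estimate(mild, c, 1e-4) / stability_estimate(mild, c, 1e-6)
severe_ratio = stability_estimate(severe, c, 1e-4) / stability_estimate(severe, c, 1e-6)
```

In this toy setting, reducing the noise by a factor of 100 improves the estimate by a factor of about 10 for the algebraic decay, which corresponds to a power law with exponent 1/2, but by less than a factor of 2 for the exponential decay, which is the behaviour expected of a logarithmic law.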

When Hölder continuity holds true, John calls the ill-posed problem well-behaved. In this case, only a fixed percentage of the significant digits is lost in determining $f$ from $g$, and therefore the uncertainty of the solution is not very severe. In contrast, in the case of logarithmic continuity, even an improvement of several orders of magnitude in the noise level does not induce a significant reduction of the uncertainty of the solution. In other words, the information content of the data is practically noise-independent. It is important to realize that the type of continuity does not depend only on the problem, i.e., the operator L, but also on H. For one and the same problem, we can have Hölder continuity for certain sets H and logarithmic continuity for others. For certain ill-posed problems, one can have Hölder continuity when one prescribes bounds on a finite number of derivatives of the unknown function $f$. In this case, the problem is said to be mildly ill-posed. Examples are tomography, Abel transform inversion, and numerical differentiation (Bertero et al., 1980; Louis and Natterer, 1983). On the other hand, when prescribed bounds on a finite number of derivatives imply only logarithmic continuity, the problem is said to be severely ill-posed. Examples are Laplace transform inversion (Bertero et al., 1982), the problem of bandwidth extrapolation and, in general, the solution of a Fredholm integral equation of the first kind with analytic kernel (Bertero et al., 1980a). As already pointed out, however, one can have Hölder continuity even in the case of a severely ill-posed problem simply by choosing an appropriate set H of admissible solutions. For example, in Laplace transform inversion, we have Hölder continuity if H is a bounded set of functions having suitable analyticity properties (Bertero et al., 1982).

B. Evaluation of Linear Functionals and Resolution Limits

In several applications one is not directly interested in estimating the solution of a problem, but rather the value of some suitable functionals of the solution. These functionals can be, for example, a moment of given order (the average radius or average occupied volume in problems of particle sizing described in Section II,C.3) or, in general, a linear continuous functional, i.e., a generalized moment. Several examples related to the applications of the Abel equation are described by Anderssen (1986). Stated in a rigorous form, the problem is the following: Given an element $\phi \in X$, estimate the value of the functional

$$F_\phi(f) = (f, \phi)_X \qquad (336)$$

where f is a solution or generalized solution of Eq. (5). The important feature of these problems is that some of them are far more


stable than the problem of determining the solution f itself. In fact, for a certain class of these functionals the evaluation problem is well-posed. Since this class can be characterized for any given linear inverse problem, we have here a precise answer to the question succinctly expressed by Sabatier (1984): the need to identify and ask, within the framework of indirect measurements, well-posed questions about a phenomenon of interest. In general, however, the problem of estimating the functional (336) is not well-posed, and therefore it is necessary to use regularization theory. A modulus of continuity can also be defined for this problem, and by considering special classes of functionals in this approach one can find a rather precise definition of the resolution limits achievable in a given inverse problem.

1. Evaluation of Well-Posed Functionals

Consider a functional of the form (336) with $\phi \in R(L^*)$. Then there exists a function $\psi \in Y$ such that $\phi = L^*\psi$. By substituting for $\phi$ in Eq. (336), we find

$$F_\phi(f) = (f, L^*\psi)_X = (Lf, \psi)_Y . \qquad (337)$$

In this case, $F_\phi$ is a continuous linear functional of $Lf$. Therefore, it is obvious that given the data function g, the estimation of the value of the functional is

$$a_\phi = (g, \psi)_Y . \qquad (338)$$

In other words, these functionals can be estimated directly from the data without any need to solve for the unknown function f. It is also obvious that the dependence of the value of $F_\phi$ on g is continuous. If $\delta g$ is a variation of g and $\delta a_\phi$ the corresponding variation of $a_\phi$, then, from Eq. (338), using the Schwarz inequality we obtain

$$|\delta a_\phi| \le \|\psi\|_Y\,\|\delta g\|_Y \le \epsilon\,\|\psi\|_Y . \qquad (339)$$
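A small finite-dimensional sketch may help to fix ideas. Here L is a hypothetical 3 × 4 matrix, $\psi$ is an arbitrary vector of the data space, and the scalar products are ordinary Euclidean ones; all numerical values and helper names are assumptions chosen only for illustration.

```python
import math


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


# Hypothetical discretized operator L: 4 unknowns, 3 data values.
L = [[1.0, 0.5, 0.25, 0.125],
     [0.5, 1.0, 0.5, 0.25],
     [0.25, 0.5, 1.0, 0.5]]

psi = [1.0, -2.0, 1.0]             # any element of the data space Y
phi = matvec(transpose(L), psi)    # phi = L* psi, hence phi is in R(L*)

f_true = [0.3, -1.2, 0.7, 2.0]     # "unknown" object, used only as a check
g = matvec(L, f_true)              # noise-free data g = L f

a_from_data = dot(g, psi)          # Eq. (338): no inversion of L is needed
a_from_object = dot(f_true, phi)   # Eq. (336): would require knowing f

# Perturb the data and compare the change of the estimate with bound (339).
dg = [1e-3, -2e-3, 5e-4]
g_noisy = [gi + di for gi, di in zip(g, dg)]
da = dot(g_noisy, psi) - a_from_data
bound = norm(psi) * norm(dg)

if __name__ == "__main__":
    print("value from data  :", a_from_data)
    print("value from object:", a_from_object)
    print("|da| =", abs(da), " bound =", bound)
```

The two values of the functional agree up to rounding, and the change produced by the perturbation stays below $\|\psi\|_Y\,\|\delta g\|_Y$, as required by inequality (339).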

It is obvious, however, that the problem can be ill-conditioned because the error $\delta a_\phi$ can be exceedingly large when $\|\psi\|_Y$ is too large. It is also important to notice that Eq. (337) characterizes all the functionals that depend continuously on $Lf$ and therefore also characterizes all the functionals that can be directly estimated in terms of g. Assuming again, for the sake of simplicity, that the inverse operator $L^{-1}$ exists, this result can be proved as follows. If the functional $F_\phi$ has the property

$$|F_\phi(f)| \le c\,\|Lf\|_Y , \qquad (340)$$

where c is a constant independent of f, then, given $f \in X$, it is always possible to find $g \in R(L)$ such that $f = L^{-1}g$. By inequality (340), it follows that

$$|F_\phi(f)| = |(f, \phi)_X| = |(L^{-1}g, \phi)_X| \le c\,\|g\|_Y . \qquad (341)$$

Therefore $(L^{-1}g, \phi)_X$ is a linear and continuous functional on $R(L)$ and can


be extended, by continuity, to a linear and continuous functional on $\overline{R(L)}$. By the Riesz representation theorem, this implies that there exists an element $\psi \in \overline{R(L)}$ such that

$$(L^{-1}g, \phi)_X = (g, \psi)_Y . \qquad (342)$$

If we now set $g = Lf$, we have

$$(f, \phi)_X = (Lf, \psi)_Y = (f, L^*\psi)_X , \qquad (343)$$

and therefore $\phi = L^*\psi$. As an example, consider the case of the integral operator (87) whose adjoint is just the band-limiting operator (80) when the values of x are restricted to the interval [-1, 1]. It follows that in the inversion of the integral operator (87), a linear and continuous functional can be estimated directly from the data whenever the function $\phi$ is the restriction of a band-limited function to the interval [-1, 1].

2. Evaluation of Ill-Posed Functionals

When $\phi \notin R(L^*)$, the functional (336) cannot be estimated directly in terms of the data g. Therefore, the use of regularization theory is required in this case. It is obvious that if $\tilde f$ is some regularized solution of the problem, then the corresponding regularized value of the functional is

$$\tilde a_\phi = (\tilde f, \phi)_X . \qquad (344)$$

Also for this problem one can define a modulus of continuity as follows:

$$\mu_H(\epsilon; \phi) = \sup\{\, |(f, \phi)_X| \;\mid\; f \in H,\ \|Lf\|_Y \le \epsilon \,\} \qquad (345)$$

and the analysis runs parallel to that outlined in Section V,A. In particular, in the case where the set H is of the type (321), one can introduce the stability estimate

$$\mu_H^{(0)}(\epsilon; \phi) = \sup\{\, |(f, \phi)_X| \;\mid\; \|Lf\|_Y^2 + \epsilon^2 \|Cf\|_Z^2 \le \epsilon^2 \,\} \qquad (346)$$

and inequalities analogous to inequalities (331) hold true in this case. Moreover, it is possible to compute $\mu_H^{(0)}(\epsilon; \phi)$. The result is (Miller, 1970; Bertero et al., 1980a)

$$\mu_H^{(0)}(\epsilon; \phi) = \epsilon\,\big([L^*L + \epsilon^2 C^*C]^{-1}\phi, \phi\big)_X^{1/2} . \qquad (347)$$

From this equation, it is not difficult to prove that $\mu_H^{(0)}(\epsilon; \phi) \to 0$ as $\epsilon \to 0$ with $\phi$ arbitrary whenever the constraint operator C has a bounded inverse (Bertero, 1982). In conclusion, regularized solutions also provide stable estimations of linear and continuous functionals, and the corresponding stability estimates can be easily computed.
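As a rough numerical illustration of Eq. (347), suppose that L and C are both diagonal in the same orthonormal basis. This is an assumption made here only to keep the sketch short; the singular values, the components of $\phi$ and the function name below are likewise hypothetical. In that case the operator inverse in Eq. (347) reduces to a sum over components.

```python
import math


def stability_estimate(eps, sigma, phi, c=None):
    """Evaluate eps * ([L*L + eps^2 C*C]^{-1} phi, phi)^{1/2} for diagonal L and C.

    sigma[k]: assumed singular values of L
    phi[k]  : components of phi on the corresponding singular functions
    c[k]    : diagonal entries of the constraint operator C (C = I if None)
    """
    if c is None:
        c = [1.0] * len(sigma)
    total = sum(p * p / (s * s + (eps * ck) ** 2)
                for s, p, ck in zip(sigma, phi, c))
    return eps * math.sqrt(total)


K = 60
sigma = [2.0 ** (-k) for k in range(K)]    # assumed exponentially decaying singular values
phi = [1.0 / (k + 1) for k in range(K)]    # components decaying much more slowly
phi_norm = math.sqrt(sum(p * p for p in phi))

estimates = {eps: stability_estimate(eps, sigma, phi)
             for eps in (1e-1, 1e-3, 1e-6)}

if __name__ == "__main__":
    for eps, mu in estimates.items():
        print(f"eps = {eps:.0e}   estimate = {mu:.4f}   (||phi|| = {phi_norm:.4f})")
```

In this toy model the estimate decreases as the noise level decreases and never exceeds $\|\phi\|_X$ when $C = I$, but the decrease is slow because the components of $\phi$ decay much more slowly than the singular values.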


3. Resolution Limits

We apply now the analysis of the previous sections to the investigation of resolution limits. We restrict ourselves to the case where X is a space of square-integrable functions. Then the functional (336) takes the form

$$(f, \phi)_X = \int f(x)\,\phi(x)\,dx . \qquad (348)$$

(For the sake of simplicity, we consider only the case of real functions of one variable.) Moreover, let us assume that the function $\phi$ is positive and peaked at the point $x_0$ (and, for example, symmetric with respect to $x_0$), that its integral is 1, and that its second central moment is $\sigma^2$, i.e.,

$$\int \phi(x)\,dx = 1, \qquad \int (x - x_0)^2\,\phi(x)\,dx = \sigma^2 . \qquad (349)$$

Then the value of the functional (348) can be considered a blurred value of f at the point $x = x_0$. In other words, we want to estimate a local average of f over a resolving length $\sigma$. It is quite natural to predict that the estimation error (for a fixed noise level $\epsilon$) will grow for decreasing values of $\sigma$, i.e., for increasing resolution, and therefore the achievable resolution will be obtained by fixing the acceptable value of the estimation error. Consider the case $C = I$. Then, as follows from Eq. (347), the absolute error in the estimation of the functional (348) is bounded by

$$\mu_H^{(0)}(\epsilon; \phi) = \epsilon\,\big([L^*L + \epsilon^2 I]^{-1}\phi, \phi\big)_X^{1/2} . \qquad (350)$$

It is more interesting, however, to introduce relative errors. This approach is quite natural in the case of stochastic regularization (Wiener filters). Here we can proceed along the lines indicated by Bertero et al. (1980a). The absolute error (350) is the maximum value of $|(f, \phi)_X|$ under the constraint $\|Lf\|_Y^2 + \epsilon^2\|f\|_X^2 \le \epsilon^2$. This a posteriori constraint is compatible with the a priori constraint $\|f\|_X \le 1$. Then it is easy to see that the maximum value of $|(f, \phi)_X|$ under this a priori constraint is just $\|\phi\|_X$. Therefore, we can define the ratio between the a posteriori and a priori maximum value of $|(f, \phi)_X|$ as an estimate of the relative error, i.e.,

$$\epsilon_{\mathrm{rel}}(\epsilon; \phi) = \frac{\mu_H^{(0)}(\epsilon; \phi)}{\|\phi\|_X} . \qquad (351)$$

It is an immediate result that $\epsilon_{\mathrm{rel}}(\epsilon; \phi) \le 1$. We can consider now a family $\{\phi_\sigma\}$ of functions having different values of the variance $\sigma^2$ (for instance, Gaussians of variance $\sigma^2$). Then the relative error (351) is a function of $\epsilon$ and $\sigma$, let us say $\epsilon_{\mathrm{rel}}(\epsilon, \sigma)$. Since $\phi_\sigma$ is an approximation of the Dirac delta function, the $L^2$-norm of $\phi_\sigma$ tends to infinity as $\sigma \to 0$. Without loss of generality, we may assume that this norm is a decreasing function of $\sigma$. Then from Eq. (351) one can derive the following properties of the relative error: (a) for fixed $\sigma$, $\epsilon_{\mathrm{rel}}(\epsilon, \sigma)$ is an increasing function of $\epsilon$; (b) for fixed $\epsilon$, $\epsilon_{\mathrm{rel}}(\epsilon, \sigma)$ is a decreasing function of $\sigma$ and tends to one (100% error) as $\sigma \to 0$. The typical behavior of $\epsilon_{\mathrm{rel}}(\epsilon, \sigma)$ as a function of $\sigma$ is indicated in Fig. 11. Numerical computations of these curves for various inverse problems are given by Bertero et al. (1980a; 1980b) and by Abbiss et al. (1983).

Let us make some remarks regarding these results. The plot of $\epsilon_{\mathrm{rel}}(\epsilon, \sigma)$ as a function of $\sigma$ represents a trade-off between resolution and error. If we fix the

FIG. 11. Illustration of the trade-off between relative error and resolution. It is assumed that the unit of resolution is some typical length related to the problem (for instance, the Rayleigh resolution distance in the case of an imaging system).

acceptable error on the averaged solution, for instance 10%, then from the curve we can deduce the corresponding resolution and vice versa. As indicated in Fig. 11, if we reduce the error on the data, say $\epsilon' < \epsilon$, then we have an improvement in resolution. However, if the inverse problem we are considering is affected by logarithmic continuity, the change in the plot of $\epsilon_{\mathrm{rel}}(\epsilon, \sigma)$ is imperceptible even when there is a change in $\epsilon$ of several orders of magnitude. This effect is clearly shown by the computations presented by Bertero et al. (1980b) on the inversion of the Slepian operator (84). In all the problems that exhibit this behaviour, there is a resolution limit that is practically noise-independent, representing a fundamental limitation on the possibility of recovering details of the unknown object. In the problem considered by Bertero et al. (1980b), for example, this limit is simply the classical Rayleigh resolution distance. A possibility of going beyond this limit is indicated by Bertero and Pike (1982) for the case when: 1) the full image is measured, so that the data contain more information about the solution; 2) the value of c is small, essentially a limitation on the size of the unknown object (which is important a priori information).
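The trade-off can be made concrete in a deliberately crude model, which is an assumption of this sketch and not one of the problems treated in the references: L is a convolution on the real line with an ideal low-pass transfer function (equal to 1 for $|\omega| \le \Omega$ and 0 elsewhere), $C = I$, and $\phi_\sigma$ is a Gaussian of unit integral and variance $\sigma^2$. Evaluating the relative error (351) in the Fourier domain then gives the closed form $\epsilon_{\mathrm{rel}}^2 = \frac{\epsilon^2}{1+\epsilon^2}\,\mathrm{erf}(\sigma\Omega) + \mathrm{erfc}(\sigma\Omega)$. The function names are hypothetical.

```python
import math


def relative_error(eps, sigma, band=1.0):
    """Relative error (351) for an ideal low-pass convolution operator.

    Assumptions (not from the text): transfer function 1 for |omega| <= band
    and 0 elsewhere, C = I, Gaussian phi of variance sigma**2.
    """
    in_band = math.erf(sigma * band)     # fraction of ||phi||^2 inside the band
    out_band = math.erfc(sigma * band)   # fraction outside the band
    return math.sqrt(eps ** 2 / (1.0 + eps ** 2) * in_band + out_band)


def resolution_for_error(eps, target, band=1.0, lo=1e-6, hi=1e3):
    """Smallest sigma with relative error <= target, by bisection in log(sigma).

    Returns None when the target cannot be reached at this noise level.
    """
    if relative_error(eps, hi, band) > target:
        return None
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if relative_error(eps, mid, band) > target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        sigma_10 = resolution_for_error(eps, 0.10)
        print(f"eps = {eps:.0e}   sigma for 10% error = {sigma_10:.4f} / band")
```

In this idealized model the part of $\phi_\sigma$ outside the band is not transmitted at all, so the resolving length for a 10% error stays close to $1.82/\Omega$ however small the noise is; the limit is exactly noise-independent here, rather than only practically so as in the problems discussed above.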

In general, the resolution limit as defined by the previous approach depends on the point $x_0$ (see Eq. (349)), and therefore there is no uniform resolution over the domain of the variable x. In the inversion of a convolution operator, however, the resolution does not depend on x due to the translational invariance of the problem. The simplest example of a problem with nonuniform resolution is provided by the inversion of an integral operator of the form (62). Since such an operator can be transformed into a convolution operator by taking as new variables the logarithm of the old ones, it is obvious that in this case we have a uniform resolution in log-variables. This implies that the resolution distance increases for increasing values of the variable r in Eq. (62). In fact it is more appropriate to introduce a resolution ratio $\delta_0$, rather than a resolution distance (McWhirter and Pike, 1978). The meaning of this resolution ratio is the following: given two delta pulses at positions $r_1$ and $r_2$, it is impossible to resolve these pulses unless $r_2 \ge \delta_0 r_1$. A general method for the estimation of $\delta_0$ is discussed by McWhirter and Pike (1978).

C. Number of Degrees of Freedom

The concept of number of degrees of freedom was first introduced in optics in terms of the sampling expansion and subsequently clarified in terms of the basic properties of the PSWF (Toraldo di Francia, 1969a). The typical stepwise behaviour of the eigenvalues of prolate spheroidal functions indicates that while the object can have an arbitrary number of degrees of freedom, i.e., an arbitrary number of large components with respect to PSWF, the image always has a finite number of components. This number, called by Toraldo di Francia the Shannon number, is given by $S = c/\pi$ and is proportional to the space-bandwidth product. In fact the Shannon number is approximately equal to the number of sampling points interior to the geometric image and is essentially a characteristic parameter of the optical instrument, giving a measure of the information transmitted by the instrument itself. In subsequent work (Toraldo di Francia, 1969b), it was recognized that this number is, in fact, noise-dependent. The dependence is so weak, however, that the original conclusion remained valid for all practical purposes. An interpretation of the result was later given in terms of the theory of ill-posed problems by showing that the problem of inverting the Slepian operator is affected by logarithmic continuity (Bertero et al., 1980a).

The concept of number of degrees of freedom or, more precisely, the concept of a noise-dependent number of degrees of freedom, can be generalized to all inverse problems that can be treated in terms of singular systems, i.e., problems corresponding to the inversion of a compact operator and inverse problems with discrete data (Twomey, 1974; Bertero and De Mol, 1981a; Pike et al., 1984). As discussed in Section V, an acceptable regularized solution of these problems can be provided by a truncated singular function expansion. When the noise level $\epsilon$ is known and a priori information on the solution is represented by a prescribed bound E on the norm of the solution, the truncation rule is given by Eq. (308). This condition has also a nice statistical interpretation. Assume that the data are represented by Eq. (9) and that both f and h are representatives of zero-mean, uncorrelated random processes. Moreover, assume that the signal f is from a white noise process with power spectrum $E^2$ and that h is also from a white noise process with power spectrum $\epsilon^2$. Then the variance of any given component of f with respect to the basis $\{u_k\}$ is $\langle |(f, u_k)_X|^2 \rangle = E^2$. Analogously, the variance of any given component of h with respect to the basis $\{v_k\}$ is $\langle |(h, v_k)_Y|^2 \rangle = \epsilon^2$. From the relation $(g, v_k)_Y = (Lf, v_k)_Y + (h, v_k)_Y = \sigma_k (f, u_k)_X + (h, v_k)_Y$ we get

$$\langle |(g, v_k)_Y|^2 \rangle = \sigma_k^2 E^2 + \epsilon^2 . \qquad (352)$$

This equation shows that the variance of a given component of the data consists of two terms, the first being the contribution of the object and the second the contribution of the noise. The first term tends to zero as $k \to \infty$, or becomes very small for large k in the case of ill-conditioned inverse problems with discrete data; the second term is constant. Moreover, the first term is a


decreasing function of k. Therefore, for k greater than some critical value, the variance of the noise is greater than the variance of the object contribution and the corresponding data components do not contain any information about the object. Equation (308) simply states the requirement that the variance of the object contribution must be greater than the variance of the noise. In the case of the Slepian operator, the quantity

$$N(E/\epsilon) = \max\{\, k + 1 \;\mid\; \sigma_k \ge \epsilon/E \,\} \qquad (353)$$

is approximately equal to the Shannon number. Therefore, it is quite natural to call it the number of degrees of freedom (NDF) in the case of the more general problem. The NDF is a function of the signal-to-noise ratio $E/\epsilon$ and gives a measure of the information content of the data. It is a very useful parameter for estimating, in general, how many distinct elements one can resolve with the available data. From the examples discussed in Section III, it follows that for some important inverse problems with discrete data, such as the moment problem or Laplace transform inversion, the NDF can be quite small (on the order of 3 or 5, and in any case less than 10). For other problems, such as tomography, the NDF can be very large.

Here is another way of introducing a distinction between mildly and severely ill-posed problems. In general, in the case of a mildly ill-posed problem where it is possible to restore Hölder continuity by means of prescribed bounds on a small number of derivatives of the solution, the NDF depends rather strongly on the signal-to-noise ratio. It is possible to obtain a significant increase in the NDF by increasing $E/\epsilon$. In contrast, in the case of a severely ill-posed problem affected by logarithmic continuity, the NDF is nearly independent of the signal-to-noise ratio, at least in the case of reasonable values of this quantity. This usually happens if the singular values $\sigma_k$ tend to zero exponentially fast as $k \to \infty$ (this is the case, for example, of the Slepian operator). Then if $N_0$ is the value of $N(E/\epsilon)$ corresponding to some preassigned value of the signal-to-noise ratio, say $E_0/\epsilon_0$, one can easily deduce that

$$N(E/\epsilon) = (N_0 - 1) + c \log_{10}(E\epsilon_0/\epsilon E_0) \qquad (354)$$

where c is a constant (Bertero and De Mol, 1981a). Therefore, if $N_0$ is very large, even when improving the signal-to-noise ratio by many orders of magnitude there is no significant improvement in the NDF. This is what happens in the problem discussed by Toraldo di Francia mentioned at the beginning of this section, where the NDF can be considered as practically noise-independent and equal to $N_0$. It is important to point out that the NDF can always be introduced (and computed) in the case of an inverse problem with discrete data, since in this


case one can always use singular function expansions, as indicated in Section III,A. Then the NDF can depend not only on the signal-to-noise ratio $E/\epsilon$, but also on the number N of data points. This is true, for example, in the case of the finite Hausdorff moment problem (Section III,D). If we define the NDF as the number of singular values greater than (recall that the singular values are just the square roots of the eigenvalues of the Hilbert matrix), then we have NDF = 4 for N = 4, NDF = 6 for N = 10, NDF = 8 for N = 50, and NDF = 9 for N = 100. Clearly, the NDF depends on the number N of given moments, even if the dependence is rather weak. This dependence, however, is related to the fact that the (infinite-dimensional) Hausdorff moment problem does not correspond to the inversion of a compact operator, and therefore the singular values and singular functions of the finite Hausdorff moment problem do not have a limit as $N \to \infty$.

The situation is different when a problem with discrete data is a finite version of an infinite-dimensional problem corresponding to the inversion of a compact operator. In this case, the NDF can be defined for the infinite-dimensional problem and the NDF of any finite version of it cannot be greater than the limiting NDF. But when the number of points of the finite version is sufficiently large, its first singular values are good approximations of the corresponding singular values of the compact operator. It follows that for N greater than a suitable number $N_0$ of data points, the NDF is independent of N and equal to the NDF of the compact operator. An illustration of this behavior for Poisson transform inversion is given by Bertero and Pike (1986).
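Returning to the finite Hausdorff moment problem mentioned above, the NDF counts can be checked with a short computation. The threshold on the singular values does not appear in the text as reproduced here; the value $10^{-3}$ used below is an assumption of this sketch. The eigenvalues of the Hilbert matrix are computed with a plain cyclic Jacobi iteration so that no external library is needed, and all function names are hypothetical.

```python
import math


def hilbert(n):
    """n x n Hilbert matrix, H[i][j] = 1 / (i + j + 1)."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]


def symmetric_eigenvalues(a, max_sweeps=50, tol=1e-24):
    """Eigenvalues of a symmetric matrix by the cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that annihilates the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def ndf(n, threshold=1e-3):
    """Number of singular values (square roots of Hilbert eigenvalues) above threshold."""
    eigenvalues = symmetric_eigenvalues(hilbert(n))
    return sum(1 for lam in eigenvalues
               if lam > 0.0 and math.sqrt(lam) > threshold)


if __name__ == "__main__":
    for n in (4, 10):
        print(f"N = {n:3d}   NDF = {ndf(n)}")
```

With the assumed threshold, this sketch gives NDF = 4 for N = 4 and NDF = 6 for N = 10. Larger values of N can be tried in the same way, at the price of a longer running time in pure Python.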
We have also some indications that in the inversion of a compact operator, satisfactory results can be obtained using a finite version of the problem with a number of (suitably placed) data points, which just coincides with the NDF of the infinite-dimensional problem. We give an example taken from the inversion of the Laplace transformation in a weighted space, using as a weight the gamma distribution (Bertero et al., 1985c). In this case the Laplace transformation is compact and the NDF, defined as the number of singular values greater than $10^{-2}$, is 4. In Fig. 12 we give the reconstruction of two delta pulses having the same mass, i.e., 0.5. Clearly, using four geometrically spaced data points, it is possible to obtain a reconstruction that practically coincides with the reconstruction obtained using 32 uniformly spaced data points. Of course, the position of the four points must be optimized (in this case, the criterion is minimization of the condition number). As a conclusion, we find that this example provides a strong indication of the fact that the NDF can coincide with the optimum number of data points. In other words, one needs as many data points as pieces of information that can be extracted from the complete (infinite-dimensional) noisy data (i.e., the maximum number of these pieces of information).

FIG. 12. Reconstruction of two delta pulses in the case of Laplace transform inversion in a weighted space using 32 equidistant points (dotted line) and 4 geometrically spaced points (full line).

D. Impulse Response Function: Another Approach to Resolution Limits

Given a regularizing operator $R_\alpha$ with $\alpha$ fixed and exact (noise-free) data $g = Lf$, the corresponding regularized solution is $f_\alpha = R_\alpha L f$. Since $f_\alpha$ converges to f when the operator L has an inverse $L^{-1}$, and to $f^+$ when L has a generalized inverse $L^+$, the operator

$$T_\alpha = R_\alpha L \qquad (355)$$

is an approximation of the identity operator in the first case and of the projection operator over $N(L)^\perp$ in the second case. This is the mathematical interpretation of the operator $T_\alpha$, emphasizing the fact that, in the noise-free case, the regularized solution $f_\alpha$ is just a (stable) approximation of the true solution f (or generalized solution $f^+$). There exists, however, an interesting physical interpretation of $T_\alpha$. Since L describes the transmission of the signal by the instrument in the absence of noise, while $R_\alpha$ describes the processing of the data by the computer in the absence of round-off errors, $T_\alpha$ describes the total effect of both the transmitting instrument and the processing computer (in the absence of any kind of error).


If $T_\alpha$ is an integral operator,

$$f_\alpha(x) = (T_\alpha f)(x) = \int A_\alpha(x, x')\,f(x')\,dx' , \qquad (356)$$

the averaging kernel $A_\alpha(x, x')$ is the impulse response function of the system consisting of instrument plus computer. Moreover, if for fixed x, $A_\alpha(x, x')$ as a function of $x'$ has the form of a central lobe flanked by decreasing side-lobes, the width of the central lobe may be used as a measure of the resolution achievable at the point x by means of the algorithm $R_\alpha$. Notice that the form (356) for the approximate solution is just the starting point of the Backus-Gilbert method. Moreover, in the case of inverse problems with discrete data, if we assume that X is a space of square-integrable functions and that the regularized solutions are defined by means of windowed singular function expansions, the kernel $A_\alpha(x, x')$ is given by

$$A_\alpha(x, x') = \sum_{k=0}^{N-1} W_{\alpha,k}\,u_k(x)\,u_k(x') .$$
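The effect of the window on the averaging kernel can be sketched with a toy singular system. The singular functions below (a constant and cosines on [0, 1]), the number of terms, and the three window formulas are assumptions chosen for illustration; they are not the singular functions of any of the problems discussed in this article.

```python
import math


def singular_function(k, x):
    """Assumed orthonormal functions on [0, 1]: 1 and sqrt(2) cos(k pi x)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * x)


def window(name, k, n):
    """Weight applied to the k-th term of an n-term expansion."""
    if name == "rectangular":   # plain truncation
        return 1.0
    if name == "triangular":
        return 1.0 - k / n
    if name == "hanning":
        return 0.5 * (1.0 + math.cos(math.pi * k / n))
    raise ValueError(f"unknown window: {name}")


def averaging_kernel(name, x, x0, n=18):
    """Windowed sum over k of W_k u_k(x) u_k(x0): response at x to a delta pulse at x0."""
    return sum(window(name, k, n)
               * singular_function(k, x) * singular_function(k, x0)
               for k in range(n))


def profile(name, x0=0.5, n=18, points=401):
    """Kernel sampled on a uniform grid of [0, 1] for a pulse located at x0."""
    return [averaging_kernel(name, i / (points - 1), x0, n)
            for i in range(points)]


if __name__ == "__main__":
    for name in ("rectangular", "triangular", "hanning"):
        values = profile(name)
        print(f"{name:12s} peak = {max(values):6.2f}   "
              f"deepest side-lobe = {min(values):6.2f}")
```

In this toy model, plain truncation gives the highest and narrowest central lobe but also pronounced negative side-lobes, whereas the triangular window yields a lower, broader central lobe and a kernel that never becomes negative.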

In this approach, it is obvious that the resolution is determined by the choice of the regularization parameter. This choice, however, can be an important point in the case of mildly ill-posed problems, while in the case of severely ill-posed problems the choice of the regularization parameter depends very weakly on the noise and we again find a resolution limit that is practically noise-independent. It is obvious that the form and the width of the central lobe depend, in general, on the regularization algorithm. We think, however, that the width does not depend strongly on the algorithm, since it is a measure of the achievable resolution and, therefore, a measure of the information that can be extracted from the data. We must never forget the principle formulated by Lanczos and emphasized at the beginning of Section V. Though this question must still be investigated carefully, it is useful to examine an example in support of this claim.

Let us consider inversion of Fraunhofer diffraction data. This corresponds to inversion of an integral operator of the form (62) with $K(x) = J_1(x)^2/x$. The corresponding Fredholm integral equation of the first kind can be approximated conveniently by means of the exponential sampling method (Ostrowsky et al., 1981). When the problem has been discretized in this way, one can compute singular values and singular functions. For example, using 32 geometrically spaced data points and assuming that the support of the unknown function is in the interval [7, 130] (the wavelength of the incident radiation is taken as a unit of the radius of the particles), one finds 18 singular values greater than


In Fig. 13 we give various reconstructions of a delta pulse of unit mass obtained using various regularization algorithms. The radius of the particles is represented on a logarithmic scale in units of the wavelength of the incident radiation. In all the reconstructions the maximum number of singular functions is 18. In the figure we have superimposed 10 different reconstructions corresponding to 10 different contaminations of the exact data by means of random errors of the order of 1%. In this way a clear picture of the robustness of the solution is obtained. In Fig. 13(a) we plot the reconstruction obtained by means of truncated singular function expansion. This provides the maximum resolution

FIG. 13. Reconstruction of a delta pulse in the case of the inversion of Fraunhofer diffraction data. (a) Truncated singular function (TSF) expansion. (b) Tikhonov regularization with $\alpha$ = (c) TSF expansion with triangular window. (d) TSF expansion with Hanning window.

achievable by means of 18 singular functions, but this resolution is obtained at the cost of very large side-lobes. In the other parts of the figure, we plot the reconstructions obtained by means of various filterings of the previous truncated singular function expansion. In particular, Fig. 13(b) corresponds to the Tikhonov window, Fig. 13(c) corresponds to the triangular window, and Fig. 13(d) corresponds to the Hanning window. In all these cases, we have a loss in resolution of approximately a factor of 2 with respect to the reconstruction of Fig. 13(a), but the side-lobes are always much smaller than in Fig. 13(a) and the reconstructions are nearly positive. The previous example seems to indicate that we can obtain positivity at the cost of a loss in resolution, a result which is in conflict with a rather commonly held opinion about the beneficial effect of the constraint of positivity in the solution of inverse problems. This question, however, is beyond the scope of


this paper, which is essentially devoted to linear methods for linear problems and is mentioned only as an example of the many questions that are still open.

REFERENCES

Abbiss, J. B., Defrise, M., De Mol, C. and Dhadwal, H. S. (1983). “Regularized iterative and noniterative procedures for object restoration in the presence of noise: an error analysis,” J. Opt. Soc. Am. 73, 1470.
Anderssen, R. S. (1986). “The linear functional strategy for improperly posed problems.” In Inverse Problems (J. R. Cannon and U. Hornung, eds.). Birkhäuser, Basel.
Aronszajn, N. (1950). “Theory of reproducing kernels,” Trans. Amer. Math. Soc. 68, 337.
Askne, J. I. H. and Westwater, E. R. (1986). “A review of ground-based remote sensing of temperature and moisture by passive microwave radiometers,” IEEE Trans. Geosci. Remote Sensing GE-24, 340.
Backus, G. and Gilbert, F. (1968). “The resolving power of gross Earth data,” Geophys. J. R. Astron. Soc. 16, 169.
Backus, G. and Gilbert, F. (1970). “Uniqueness in the inversion of inaccurate gross Earth data,” Phil. Trans. R. Soc. 266, 123.
Bakushinskii, A. B. (1965). “A numerical method for solving Fredholm integral equations of the first kind,” USSR Comp. Math. Math. Phys. 5 (No. 4), 226.
Balakrishnan, A. V. (1976). Applied Functional Analysis. Springer, Berlin.
Baltes, H. P., ed. (1978). Inverse Source Problems in Optics. Topics in Current Physics, Vol. 9. Springer, Berlin.
Baltes, H. P., ed. (1980). Inverse Scattering Problems in Optics. Topics in Current Physics, Vol. 20. Springer, Berlin.
Barcilon, V. (1986). “Inverse eigenvalue problems.” In Inverse Problems (G. Talenti, ed.). Lect. Notes in Math., Vol. 1225. Springer, Berlin.
Bertero, M. (1982). Problemi Lineari non ben posti e Metodi di Regolarizzazione (Non-well-posed Problems and Methods of Regularization). Pubbl. Istituto di Analisi Globale e Applicazioni, No. 4, Firenze.
Bertero, M. (1986). “Regularization methods for linear inverse problems.” In Inverse Problems (G. Talenti, ed.). Lect. Notes in Math., Vol. 1225. Springer, Berlin.
Bertero, M. and De Mol, C. (1981a). “Ill-posedness, regularization and number of degrees of freedom,” Atti Fond. G. Ronchi 36, 619.
Bertero, M. and De Mol, C. (1981b). “Stability problems in inverse diffraction,” IEEE Trans. Antennas Propagat. AP-29, 368.
Bertero, M. and Grünbaum, A. F. (1985). “Commuting differential operators for the finite Laplace transform,” Inverse Problems 1, 181.
Bertero, M. and Pike, E. R. (1982). “Resolution in diffraction-limited imaging, a singular value analysis-I. The case of coherent illumination,” Optica Acta 29, 727.
Bertero, M. and Pike, E. R. (1986). “Intensity fluctuation distributions from photon counting distributions: a singular-system analysis of Poisson transform inversion,” Inverse Problems 2, 259.
Bertero, M. and Viano, G. A. (1978). “On probabilistic methods for the solution of improperly posed problems,” Bollettino U.M.I. 15-B, 483.
Bertero, M., De Mol, C. and Viano, G. A. (1980a). “The stability of inverse problems.” In Inverse Scattering Problems in Optics (H. P. Baltes, ed.). Topics in Current Physics, Vol. 20, pg. 161. Springer, Berlin.


Bertero, M., Viano, G. A. and De Mol, C. (1980b). “Resolution beyond the diffraction limit for regularized object restoration,” Optica Acta 27, 307.
Bertero, M., Boccacci, P. and Pike, E. R. (1982). “On the recovery and resolution of exponential relaxation rates from experimental data: a singular value analysis of the Laplace transform inversion in the presence of noise,” Proc. R. Soc. Lond. A383, 15.
Bertero, M., De Mol, C., Pike, E. R. and Walker, J. G. (1984). “Resolution in diffraction limited imaging: IV-The case of uncertain localization or non-uniform illumination of the object,” Opt. Acta 31, 923.
Bertero, M., De Mol, C. and Pike, E. R. (1985a). “Linear inverse problems with discrete data. I: General formulation and singular system analysis,” Inverse Problems 1, 301.
Bertero, M., Brianzi, P. and Pike, E. R. (1985b). “On the recovery and resolution of exponential relaxation rates from experimental data. III. The effect of sampling and truncation of data on the Laplace transform inversion,” Proc. R. Soc. Lond. A398, 23.
Bertero, M., Brianzi, P. and Pike, E. R. (1985c). “On the recovery and resolution of exponential relaxation rates from experimental data. III. The effect of sampling and truncation of data on the Laplace transform inversion,” Proc. R. Soc. Lond. A398, 23.
Bertero, M., Grünbaum, A. F. and Rebolia, L. (1986a). “Spectral properties of a differential operator related to the inversion of the finite Laplace transform,” Inverse Problems 2, 131.
Bertero, M., Poggio, T. and Torre, V. (1986b). “Ill-posed problems in early vision,” M.I.T. Memo 924.
Bertero, M., Brianzi, P., Defrise, M. and De Mol, C. (1986c). “Iterative inversion of experimental data in weighted spaces.” In Proc. U.R.S.I. Int. Symp. Electr. Theory, Budapest, August 25-29, Part A, 315. Akademiai Kiado, Budapest.
Bertero, M., Brianzi, P. and Pike, E. R. (1987). “Super-resolution in confocal scanning microscopy,” Inverse Problems 3, 195.
Bertero, M., Brianzi, P., Pike, E. R. and Rebolia, L. (1988a). “Linear regularizing algorithms for positive solutions of linear inverse problems,” Proc. R. Soc. Lond. A415, 257.
Bertero, M., De Mol, C. and Pike, E. R. (1988b). “Linear inverse problems with discrete data. II: Stability and regularization,” Inverse Problems 4, 573.
Bialy, H. (1959). “Iterative Behandlung linearer Funktionalgleichungen,” Arch. Rat. Mech. Anal. 4, 166.
Bleistein, N. and Cohen, J. (1977). “Nonuniqueness in the inverse source problem in acoustics and electromagnetics,” J. Math. Phys. 18, 194.
Boerner, W. M., Brand, H., Cram, L. A., Gjessing, D. T., Jordan, A. K., Keydel, W., Schwierz, G. and Vogel, M., eds. (1983). Inverse Methods in Electromagnetic Imaging. Part I and Part II. Reidel, Dordrecht.
Bojarski, N. N. (1966). A Survey of Electromagnetic Inverse Scattering. Syracuse Univ. Res. Corp., Special Projects Lab. Rpt., DDC AD-813-851.
Cabayan, H. S., Murphy, R. C. and Pavlasek, T. J. F. (1973). “Numerical stability and near-field reconstruction,” IEEE Trans. Antennas Propagat. AP-21, 346.
Campbell, L. L. (1968). “Sampling theorem for the Fourier transform of a distribution with bounded support,” SIAM J. Appl. Math. 16, 626.
Cannon, J. R. and Hornung, U., eds. (1986). Inverse Problems. ISNM 77. Birkhäuser, Basel.
Carasso, A. and Stone, A. P., eds. (1975). Improperly-Posed Boundary Value Problems. Pitman, London.
Carter, W. H. (1970). “Computational reconstruction of scattering objects from holograms,” J. Opt. Soc. Am. 60, 306.
Carter, W. H. and Ho, P. C. (1974). “Reconstruction of inhomogeneous scattering objects from holograms,” Appl. Opt. 13, 162.
Chadan, K. and Sabatier, P. C. (1977). Inverse Problems in Quantum Scattering Theory. Springer, Berlin.

116

M. BERTERO

Chernov, L. A. (1967). Wave Propagation in a Random Medium. Dover, New York. Colin, L., ed. (1972). Mathematics of Profile Inversion. NASA Technical Memorandum, NASA TMX-62-150. Colli Franzone, P., Taccardi, B. and Viganotti, C. (1977). “An approach to inverse calculation of epicardial potentials from body surface maps,” Ado. Cardiol. 21, 167. Colton, D. (1984).“The inverse scattering problem for time-harmonic acoustic waves,” SIAM Rev. 26 (3), 323. Courant R. and Hilbert, D. (1962). Methods of Mathematical Physics, Vol. 11. Interscience. New York. Craig, 1. J. D. and Brown, J. C. (1986). Inverse Problems in Astronomy. Adam Hilger, Bristol. Craven, P. and Wahba, G. (1979). “Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation,” Numer. Math. 31, 377. Cummins, H. 2. and Pike, E. R., eds. (1974).Photon Correlation and Light Beating Spectroscopy. Plenum Press, New York. Davison, M. E. (1981).“A singular value decomposition for the Radon transform in n-dimensional Euclidean space,” Numer. Funct. Anal. Optimiz. 3, 321. Defrise, M. and De Mol, C. (1987).“A note on stopping rules for iterative regularization methods and filtered SVD,” In Inverse Problems: An Interdisciplinary Study (P. C. Sabatier, ed.). Advances in Electronic and Electron. Physics, Supplement 19, pg. 261. Academic Press, New York. De Santis, P. and Gori, F. (1975). “On an iterative method for superresolution,” Opticu Acta 22, 691. Devaney, A. J. (1978).“Nonuniqueness in the inverse scattering problem,” 1.Math. Phys. 19,1526. Devaney, A. J. (1981). “Inverse-scattering theory within the Rytov approximation,” Opt. Lett. 6, 374. Devaney, A. J. (1982). “A filtered back propagation algorithm for diffraction tomography,” Ultrasonic Imaging 4, 336. Devaney, A. J. (1984). “Geophysical diffraction tomography,” IEEE Trans. Geoscience Remote Sensing GE22, 3. Devaney, A. J. and Wolf, E. (1973). 
“Radiating and nonradiating classical current distributions and the fields they generate,” Phys. Rev. D8, 1044. Devaney, A. J. and Wolf, E. (1974).“Multipole expansions and plane wave representations of the electromagnetic field,” J . Math. Phys. 15, 234. Di Chiro, G. and Brooks, R. A. (1979). “The 1979 Nobel prize in Physiology or Medicine,” Science 206, 1060. Duchon, J. (1 976). “Interpolation des fonctions de deux variables suivant le principe de la flexion des plaques minces,” Analyse Numerique 10, 5. Fercher, A. F., Bartelt, H., Becker, H. and Wiltschko, E. (1979).“Image formation by inversion of scattered field data: experiments and computational simulation,” Appl. Opt. 18,2427. Fox, D. and Pucci, C. (1958).“The Dirichlet problem for the wave equation,” Ann. Mat. Pura Appl. 46,155. Franklin, J. N. (1970). “Well-posed stochastic extensions of ill-posed linear problems,” J. Math. Anal. Appl. 31,682. Franklin, J. N. (1974). “On Tikhonov’s method for ill-posed problems,” Math. Comp. 28, 889. Gerchberg, R. W. (1974). “Super-resolution through error energy reduction,” Optica Acta 21, 709. Golub, G . H., Heath, M. and Wahba, G. (1979). “Generalized cross-validation as a method for choosing a good ridge parameter,” Technometrics 21,215. Gori, F. and Guattari, G. (1985). “Signal restoration for linear systems with weighted impulse. Singular value analysis for two cases of low-pass filtering,” Inverse Problems 1,67.

LINEAR INVERSE AND ILL-POSED PROBLEMS

117

Gregory, R. T. and Karney, D. L. (1969) A Collection of Matricesfor Testing Computational Algorithms. Wiley-Interscience, New York. Greville, T. N. E. ed. (1969). Theory and Applications of Spline Functions. Academic Press, New York. Crimson, W. E. L. (1982).“A computational theory of visual surface interpolation,” Phil. Trans. R. Soc. Lond. B298,395. Groetsch, C. W. (1977). Generalized Inuerses of Linear Operators. Dekker, New York. Groetsch, C. W. (1980). “On a class of regularization methods,” Boll. Un. Mat. Ital. 17-B, 1411. Groetsch, C. W. (1984). The theory of Tikhonou Regularizationfor Fredholm Equations of the First Kind. Research Notes in Math., Vol. 105. Pitman, Boston. Groetsch, C. W. (1986). “Regularization with linear equality constraints,” In Inverse Problems, (G. Talenti, ed.) Lect. Notes in Math., Vol. 1225. Springer, Berlin. Griinbaum, F. A. (1982). “A remark on Hilbert’s matrix,” Linear Algebra Appl. 43, 119. Griinbaum, F. A. (1986). “Some mathematical problems motivated by medical imaging,” In lnuerse Problems, (G. Talenti, ed.). Lect. Notes in Math., Vol. 1225. Springer, Berlin. Haario, H. and Somersalo, E. (1985). “On the numerical implementation of the Backus-Gilbert method.” Cahiers Math. de Montpellier 32, 107. Hadamard, J. (1902). “Sur les problemes aux derivees partielles et leur signification physique,” Bull. Univ. Princeton 13,49. Hadamard, J. (1923).Lectures on Cauchy’s Problem in Linear Partial Differential Equations. Yale Univ. Press, New Haven. Hadamard, J. (1964). L a Theorie des Equations aux Dbriudes Partielles. (Theory of Partial Differential Equations). Editions Scientifiques, Peking. Herman, G. T., ed. (1979). Image Reconstruction from Projections-Implementation and Applications, Topics in Applied Phys., Vol. 32. Springer, Berlin. Herman, G. T. (1980). Image Reconstruction from Projections- The Fundamentals of Computerized Tomography. Academic Press, New York. Herman, G . T. and Natterer, F., eds. (1981). 
Mathematical Aspects of Computerized Tomography. Lect. Notes in Med. Inf., Vol. 8. Springer, Berlin. Hildreth, E. C. (1984). “Computation of the velocity field,” Proc. R. Soc. Lond. B221,189. Hoenders, B. J. (1978). “The uniqueness of inverse problems,” In Inverse Source Problems in Optics (H. P. Baltes, ed.). Topics in Current Physics, Vol. 9. Springer, Berlin. Imbriale, W. A. and Mittra, R. (1970). “The two-dimensional inverse scattering problem,” I E E E Trans. Antennas Propagat. AP-l8,633. Ivanov, V. K. (1962). “On linear problems which are not well-posed,’’ Souiet Math. Dokl. 3, 981. Ivanov, V. K. (1966). “The approximate solution of operator equations of the first kind,” U.S.S.R. Comp. Math. Math. Phys. 6 (No. 6), 197. John, F. (1955).“Numerical solution of the equation of heat conduction for preceding times,” Ann. Mat. Pura Appl. 40, 129. John, F. (1960). “Continuous dependence on data for solutions of partial differential equations with a prescribed bound,” Comm. Pure Appl. Math. 13, 551. Kac, M. (1966). “Can one hear the shape of a drum?,” Am. Math. Monthly 73, (4). Part 11, 1 . Kammerer, W. J. and Nashed, M. 2. (1971). “Steepest descent for singular linear operators with nonclosed range,” Appl. Anal. I, 143. Kammerer, W. J. and Nashed, M. Z. (1972). “On the convergence of the conjugate gradient method for singular linear operator equations,” SIAM J . Numer. Anal. 9, 165. Kato, T. (1966). Perturbation Theoryfor Linear Operators. Springer, Berlin. Keller, J. B. (1976). “Inverse problems,” Am. Math. Monthly 83, 107. Kunt, M. (1986). Digital Signal Processing. Artech House, Norwood. Lanczos, C. (1961).Linear Differential Operators. Van Nostrand, London.

118

M. BERTERO

Landau, H. J. and Pollak, H. 0.(1961).“Prolate spheroidal wave functions, Fourier analysis and uncertainty. 11,” Bell System Tech. J . 40,65. Landau, H. J. and Pollak, H. 0.(1962). “Prolate spheroidal wave functions, Fourier analysis and uncertainty. 111: The dimension of the space of essentially time- and band-limited signals,” Bell System Tech. J . 41, 1295. Landweber, L. (1951). “An iteration formula for Fredholm integral equations of the first kind,” Amer. J . Math. 73,615. Lavrentiev, M. M. (1967). Some Improperly Posed Problems of Mathematical Physics. Springer, Berlin. Lewis, R. M. (1969).“Physical optics inverse diffraction,” IEEE Trans. Antennas Propagat. AP-17, 308. Lord Rayleigh, J. W. S. (1877). The Theory of Sound. Dover, New York 1945. Louis, A. K. (1984). “Orthogonal function series expansions and the null space of the Radon transform,” S I A M J . Math. Anal. 15,621. Louis, A. K. (1986). “Incomplete data problems in X-ray computerized tomography. I. Singular value decomposition of the limited angle transform,” Numer. Math. 48,251. Louis, A. K. and Natterer, F. (1983). “Mathematical problems of computerized tomography,” Proc. IEEE. 71,379. Luneburg, R. K. (1964). Mathematical Theory of Optics. Univ. of California Press, Berkeley. Mager, R. D. and Bleinstein, N. (1978). “An examination of the limited aperture problem of physical optics inverse scattering,” IEEE Trans. Antennas Propagat. AP-26,695. Magnanini, R. and Papi, G. (1985). “An inverse problem for the Helmholtz equation,” Inverse Problems 1, 357. Maitre, H. (1981). “Iterative superresolution. Some new fast methods,” Optica Acta 28,973. Mandelbrot, S. and Schwartz, L. (1965).“Jacques Hadamard (1865-1963),”Bull. Amer. Math. SOC. 71, 107. Marchuk, G . 1. (1975). Methods of Numerical Mathematics. Springer, Berlin. McWhirter, J. G .and Pike, E. R. (1978). “On the numerical inversion of the Laplace transform and similar Fredholm integral equations of the first kind,” J. Phys. A l l , 1729. 
Miller, K. (1964). “Three circle theorems in partial differential equations and applications to improperly posed problems,” Arch. Rat. Mech. Anal. 16, 126. Miller, K. (1970).“Least squares methods for ill-posed problems with a prescribed bound,”SIAM J . Math. Anal. 1, 52. Morozov, V. A. (1966).“On the solution of functional equations by the method of regularization,” Souiet Math. Dokl. 7,414. Morozov, V. A. (1968). “The error principle in the solution of operational equations by the regularization method,” U S S R Comp. Math. Math. Phys. 8,63. Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems. Springer, Berlin. Mueller, R. K., Kaveh, M. and Wade, G. (1979).“Reconstructive tomography and applications to ultrasonic,’’ Proc. IEEE 67,567. Nashed, M. 2. ed. (1976a). Generalized Inverses and Applications. Academic Press, New York. Nashed, M. Z . (1976b).“On moment-discretization and least-squares solutions of linear integral equations of the first kind,” J. Math. Anal. Appl. 53,359. Nashed, M. Z . and Wahba, G. (1974). “Convergence rates of approximate least squares solutions of linear integral and operator equations of the first kind,” Math. Comp. 28,69. Natterer, F. (1980).“A Sobolev space analysis of picture reconstruction,” SIAM J . Appl. Math. 39, 402. Natterer, F. (1986a). The Mathematics of Computerized Tomography. Teubner, Stuttgart. Natterer, F. (1986b). “Efficient evaluation of oversampled functions,” J. Comp. Appl. Math. 14, 303. Natterer, F. (1986~).“Numerical treatment of ill-posed problems,” In Inverse Problems (G. Talenti, ed.). Lect. Notes in Math., Vol. 1225. Springer, Berlin.

LINEAR INVERSE AND ILL-POSED PROBLEMS

119

Oldenburg, D. W. (1976).“Calculation of Fourier transforms by the Backus-Gilbert method,” Geophys. J . R . Astron. SOC.44,413. Ostrowsky, N., Sornette, D., Parker, P. and Pike, E. R. (1981).“Exponential sampling method for light scattering polydispersity analysis,” Optica Acta 28, 1059. Papoulis, A. (1956).“A new method of inversion of the Laplace transform,” Q. Appl. Math. 14,405. Papoulis, A. (1975).“A new algorithm in spectral analysis and band-limited extrapolation,” IEEE Trans. Circuits Syst. CAS-22,735. Payne, L. E. (1975). Improperly Posed Problems in Partial Diferential Equations. SIAM Regional Conf. Series in Appl. Math. SIAM, Philadelphia. Phillips, D. L. (1962).“A technique for the numerical solution of certain integral equations of the first kind,” J . Assoc. Comput. Mach. 9, 84. Picard, E. (1910).“Sur un theoreme general relatif aux equations integrales de premiere espece et sur quelques problemes de physique mathematique,” R . C. Mat. Palermo. 29,615. Pike, E. R., McWhirter, J. G., Bertero, M. and De Mol, C. (1984).“Generalized information theory for inverse problems in signal processing,” I E E Proc. 131,660. Pucci, C . (1955). “Sui problemi di Cauchy non ben posti,” (“On a non-well-posed problem of Cauchy”), Atti Acc. Naz. Lincei 18,473. Radon, J. (1917). “Uber die bestimmung von funktionen durch ihre integralwerte langs gervisser mannigfaltigkeiten,” Berichte Sachsische Akademie der Wissenschajien, Leipzig, Math.-Phys. Kl. 69, 262. Ralston, A. (1965). A First Course in Numerical Analysis. Mc-Craw Hill, New York. Reinsch, C. H. (1967). “Smoothing by spline functions,” Numer. Math. 10, 177. Riesz, F. and Sz. Nagy, B. (1972). Lecons d‘dnalyse Fonctionnelle. (“Lectures on Functional Analysis”).Gauthier-Villars, Paris. Rust, B. W. and Burrus, W. R. (1972). Mathematical Programming and the Numerical Solution of Linear Equations. American Elsevier, New York. Sabatier, P. C., ed. (1978). Applied Inverse Problems. Lecture Notes in Phys., Vol. 85. 
Springer, Berlin. Sabatier, P. C. (1984).“Well-posed questions and exploration of the space of parameters in linear and non-linear inversion,” In Proc. Con[. Inverse Problems Acoustical and Elastic Waves. (F. Santora, Y.H. Pao and W. W. Symes, eds.). SIAM, Philadelphia. Sabatier, P. C. ed. (1987a). Inverse Problems: An Interdisciplinary Study. Advances in Electronics and Electron Physics, Supplement 19. Academic Press, New York. Sabatier, P. C. ed. (l987b). Tomography and Inverse Problems. Adam Hilger, Bristol. Saleh, B. (1978).Photoelectron Statistics. Optical Sciences. Vol. 6. Springer, Berlin. Sanz, J. L. C. and Huang, T. S. (1983).“Unified Hilbert space approach to iterative least-squares linear signal restoration,” J. Opt. SOC.Am. 73, 1455. Schafer, R. W., Mersereau, R. M. and Richards, M. A. (1981). “Constrained iterative restoration algorithms,” Proc. I E E E 69, 432. Schomburg, B. and Berendt, G. (1987). “On the convergence of the Backus-Gilbert algorithm,” Inverse Problems 3, 341. Shewell,J. R. and Wolf, E. (1968).“Inverse diffraction and a new reciprocity theorem,”J. Opt. Soc. Am. 58, 1596. Shifrin, K. S. and Perelman, A. Y . (1965).“Inversion of light scattering data for the determination of spherical particle spectrum,” In Proc. Second Interdisciplinary Conf. Electromagnetic Scattering. Massachusetts, June 1965. (R. L. Rowell and R. S. Stein, eds.). Gordon and Breach, New York. Slepian, D. (1964). “Prolate spheroidal wave functions, Fourier analysis and uncertainty-IV: Extensions to many dimensions; generalized prolate spheroidal functions.” Bell Syst. Tech. J . 43, 3009. Slepian, D. (1978).“Prolate spheroidal wave functions, Fourier analysis and uncertainty-V: the discrete case,” Bell Syst. Tech. J . 57, 1371.

120

M. BERTERO

Slepian, D. and Pollak, H. 0.(1961). “Prolate spheroidal wave functions, Fourier analysis and uncertainty-I,” Bell System Tech. J . 40,43. Sondhi, M. M. (1969). “Reconstruction of objects from their sound-diffraction patterns,” J . Accoust. SOC.Am. 46, 1158. Strand, 0. N. (1974). “Theory and methods related to the singular-function expansion and Landweber’s iteration for integral equations of the first kind,” SIAM J. Numer. Anal. 11,798. Strand, 0.N. and Westwater, E. R.(1968).“Minimum-RMS estimation of the numerical solution of a Fredholm integral equation of the first kind,” SIAM J . Numer. Anal. 5,287. Talenti, G.,ed. (1986).Inverse Problems. Lect. Notes in Math., Vol. 1225. Springer, Berlin. Talenti, G., (1987).“Recovering a function from a finite number of moments,” Inverse Problems 3, 501.

Tikhonov, A. N. (1963a). “Solution of incorrectly formulated problems and the regularization method,” Sooiet Math. Doki. 4, 1035. Tikhonov, A. N. (1963b). “Regularization of incorrectly posed problems,” Sooiet Math. Dokl. 4, 1624. Tikhonov, A. N. (1964).“Solution of nonlinear integral equations of the first kind,” Soviet Math. Dokl. 5, 835. Tikhonov, A. N. and Arsenin, V. Y. (1977). Solutions of ill-Posed Problems. Winston/Wiley, Washington. Titchmarsh, E. C. (1948).lntroduction to the Theory of Fourier Integrals. Clarendon Press, Oxford. Titchmarsh, E. C. (1958). Eigenfunction Expansions Associated with Second-Order Differential Equations. Vol. 11. Clarendon, Oxford. Toraldo di Francia, G. (1969a). “Degrees of freedom of an image,” J. Opt. SOC.Am. 59,799. Toraldo di Francia, G. (1969b). “Some recent progress in classical optics,” Rio. Nuovo Cimento 1, 460. Torre, V. and Poggio, T. (1986). “On edge detection,” IEEE Trans. Pattern Anal. Mach. Intelligence PAMI-8, 147. Turchin, V. F., Kozlov, V. P. and Malkevich, M. S. (1971). “The use of mathematical statistics methods in the solution of incorrectly posed problems,” Sooiet Phys. Usp. 13,681. Twomey, S. (1965). “The application of numerical filtering to the solution of integral equations encountered in indirect sensing measurements,” J . Franklin lnst. 279,95. Twomey, S . (1974). “Information content in remote sensing,” Appl. Opt. 13,942. Vainikko, G . M. (1982).“The discrepancy principle for a class of regularization methods,” USSR Comp. Math. Math. Phys. 22 (No. 3), 1. Van de Hulst, H. C. (1981). Light Scattering by Small Particles. Dover, New York. Viano, G. A. (1976). “On the extrapolation of optical image data,” J . Math. Phys. 17, 1160. Wahba, G. (1977). “Practical approximate solutions to linear operator equations when the data are noisy,” SIAM J. Numer. Anal. 14, 651. Wahba, G. and Wendelberger, J. (1980). 
“Some new mathematical methods for variational objective analysis using splines and cross-validation,” Monthly Weather Review 108, 1122. Wahba, G. and Wold, S. (1975a). “A completely automatic French curve: Fitting spline functions by cross-validation,’’ Comm. Stat. 4, 1. Wahba, G. and Wold, S. (1975b). “Periodic splines for spectral density estimztion: The use of cross-validation for determining the degree of smoothing,” Comm. Stat. 4, 125. Widder, D. V. (1946). The Laplace Transform. Princeton Univ. Press, Princeton. Wolf, E. (1969). “Three-dimensional structure determination of semi-transparent objects from holographic data,” Opt. Commun. 1, 153. Yosida, K. (1966). Functional Analysis. Springer, Berlin. Zuev, V. E. and Naats, I. E. (1983). lnoerse Problems of Lidar Sensing of the Atmosphere. Optical Sciences, Vol. 129. Springer, Berlin.

.

ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 75

Recent Developments in Energy-Loss Spectroscopy*

JÖRG FINK

Kernforschungszentrum Karlsruhe
Institut für Nukleare Festkörperphysik
Karlsruhe, Federal Republic of Germany

I. Introduction
II. Principal Features
    A. Fundamentals of Electron Scattering
    B. Models of Dielectric Functions
    C. Interface Effects
    D. Core-Level Excitations
    E. Data Evaluation
III. Instrumentation
    A. Principle of Operation
    B. Electron Source, Monochromator, Analyzer, and Detector
    C. Zoom Lenses, Accelerator, and Decelerator
    D. Scattering, Characterization and Preparation Chamber
    E. Vacuum System and Electronics
    F. Spectrometer Performance
IV. Sample Preparation
    A. Thin Film Deposition
    B. Preparation from Macroscopic Solids
    C. Ion Implantation and Doping
V. Nearly-Free-Electron Metals
VI. Rare Gas Bubbles in Metals
    A. Pressure and Density in Bubbles
    B. Surface Plasmons on Bubbles
VII. Amorphous Carbon
VIII. Conducting Polymers
IX. Superconductors
    A. Transition Metal Carbides and Nitrides
    B. A-15 Compounds
    C. Ceramic Superconductors
Acknowledgements
References

* This article has been accepted as a Habilitation Thesis by the Fakultät für Physik, Universität Karlsruhe.

Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

ISBN 0-12-014675-4


I. INTRODUCTION

Scattering experiments with fast particles have become an important tool in various fields of physics. Compared to absorption experiments, they make it possible to investigate not only the frequency but also the wavelength of possible excitations of matter, by means of angle-resolved measurements. In solid-state physics, most of our experimental knowledge about the structure of solids has been obtained from elastic scattering experiments with X-rays, electrons, and neutrons. Inelastic scattering experiments with neutrons have provided considerable information on the dynamic properties of solids in the low-energy range (E < 100 meV), i.e., on phonons and magnons. Inelastic scattering of light (Raman scattering) has been a powerful means of probing the vibrational motions of atoms and molecules in bulk material. Recently, methods of scattering thermal neutral atoms (mainly He) on the surface of solids have been developed to gather information on surface phonons (Toennies, 1984). The advent of powerful synchrotron radiation sources has led to the first pioneering inelastic X-ray experiments for the study of phonons (Burkel et al., 1987) and collective excitations of electrons (Schülke et al., 1984, 1986). Inelastic scattering of electrons has provided important information on the vibrational modes of adsorbates (Ibach and Mills, 1982) and on surface phonons of metals (Rocca et al., 1986) in the low-energy range (E < 100 meV). At higher energies, inelastic electron scattering or electron energy-loss spectroscopy (EELS) in transmission has been the classical method for investigating the collective excitations of electrons, i.e., plasmons. The method has been further improved over the years to provide information on interband transitions between valence bands and conduction bands and on core-level excitations, in order to learn about the unoccupied density of states (near-edge fine structure) and the lattice structure (extended-edge fine structure).

EELS at medium energy (E < 20 eV) is in competition with optical spectroscopy and at higher energies (20 < E < 2500 eV) with X-ray absorption spectroscopy using synchrotron radiation. In general, the resolution offered by optical spectroscopy is superior to that encountered in EELS. However, this does not hold for soft X-rays in the energy range of 250 to 800 eV, where it is still not an easy matter to build high-resolution monochromators. Moreover, as mentioned above, the scattering methods have the great advantage that an additional degree of freedom, the momentum transfer, can be varied, which is not possible in optical spectroscopy. Furthermore, the ease with which one can cover the excitation spectrum from infrared to soft X-ray (2500 eV) is still a tremendous advantage even with powerful synchrotron radiation sources. Inelastic X-ray scattering is developing into a tool that


extends EELS to higher momentum transfer. However, at lower momentum transfer the cross-section for inelastic X-ray scattering becomes very small; as yet, EELS is without competition in this region. In comparing EELS with other methods, we should not forget that, experimentally, EELS suffers from a serious problem that is not found in neutron and X-ray scattering: the interaction of the electrons with the solid is far stronger than one would like, so that only samples with a thickness of about 1000 Å can be probed. This is, in some cases, an advantage, but in many cases it entails a considerable effort in sample preparation. This hurdle and the complicated instrumentation may be why EELS has not become as widespread a method as, for example, photoelectron spectroscopy.

The Franck-Hertz phenomenon in gases, which has been the source of information on electron energy levels in gaseous atoms and molecules, was first applied to solids by Rudberg (1929). In his doctoral thesis, he reported measurements of the kinetic energy of electrons that had been reflected from the surface of metals (Cu and Ag). The first electron energy-loss measurements in transmission with primary energies from 2 to 8 keV were performed by Ruthemann (1941) and showed discrete excitations in Be and Al that were later explained by free-electron plasmons of the metals (Pines and Bohm, 1952). The same author reported (1942) for the first time the excitation of core-level electrons in collodion by EELS. A first review of these early EELS experiments was given by Marton et al. (1955). The first systematic applications of EELS to solid-state physics (mainly collective excitations and interband transitions) were made by Raether's group at Hamburg (Raether, 1965; Daniels et al., 1970; Raether, 1977; Raether, 1980). This work was continued and extended to core-level spectroscopy by Schnatterly's group at Princeton (Schnatterly, 1979) and other groups.

Alongside these investigations, electron microscopists became interested in EELS as an extremely powerful tool for microanalysis using core-level excitations. Excellent reviews of the field were given by Colliex (1984), Colliex and Mory (1984), and Egerton (1986). With time, the energy resolution of these spectrometers, which use a transmission electron microscope (TEM) together with an energy analyser below the microscope column, was improved and interesting near-edge structure investigations were performed (Colliex et al., 1985; Leapman et al., 1982; Grunes et al., 1982; Lindner et al., 1986). However, most of these modified TEMs had excellent spatial resolution but very poor momentum-transfer resolution. Therefore, TEMs were used in only a few investigations of valence-band excitations and, in particular, of momentum-transfer-dependent measurements.

This work reviews EELS studies by my colleagues and myself during the last five years using a dedicated electron energy-loss spectrometer. Following a theoretical introduction and a description of the Karlsruhe spectrometer, it describes work on nearly-free-electron metals, rare gas bubbles in metals,


amorphous carbon, conducting polymers, and superconductors. A short review of these activities has been given previously (Fink, 1985a). This contribution should not be regarded as a systematic discussion of the method of EELS. Rather, it is an illustration of the present state of the technique applied to solid-state physics and to materials science.

II. PRINCIPAL FEATURES

A. Fundamentals of Electron Scattering

The geometry of the electron scattering experiment is shown in Fig. 1. An electron specified by a momentum $\hbar\mathbf{k}_0$ is scattered into a state with momentum $\hbar\mathbf{k}_1$. The energy loss and momentum transfer (the momentum transfer being a function of the energy loss and the scattering angle $\theta$) are given by

$$\hbar\omega = E_0 - E_1 = \hbar^2 (k_0^2 - k_1^2)/2m$$

and

$$\hbar\mathbf{q} = \hbar\mathbf{k}_0 - \hbar\mathbf{k}_1 .$$

The basic quantity that is measured is the partial differential cross-section $d^2\sigma/d\Omega\,dE$. It is the fraction of electrons of incident energy $E_0$ scattered at an angle $\theta$ into an element of solid angle $d\Omega$ with energy between $E_1$ and $E_1 + dE$. The cross-section has the dimension (area/energy). In the transmission electron energy-loss experiments, the energy and momentum of the incident electrons are very large compared to the energy and momentum of the electrons in the solids. In the Karlsruhe spectrometer, the primary energy is at present

FIG. 1. Definition of energy loss and momentum transfer in a transmission inelastic electron-scattering experiment.


170 keV, which corresponds to a wave number $k_0$ of 228.4 Å$^{-1}$, or a wavelength $\lambda$ of 0.0275 Å. In solid-state physics, interesting wave vectors are around 1 Å$^{-1}$ and, therefore, the scattering angles are extremely small. For the values given above, the scattering angle $\theta$ is about 4 mrad or 0.25°. The wave vector $\mathbf{q}$ can be decomposed into two vectors, one parallel to and one perpendicular to the incoming electron beam. For $|\mathbf{q}| \ll |\mathbf{k}_0|$, one has

$$q_\perp \approx k_0 \sin\theta \approx k_0\theta \qquad (1a)$$

$$q_\parallel = k_0(\hbar\omega/2E_0) \qquad (1b)$$

$$q^2 = q_\perp^2 + q_\parallel^2 . \qquad (1c)$$
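The numbers quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original article: it assumes the relativistic momentum-energy relation for the incident electron and takes the values of $\hbar c$ and $mc^2$ as inputs; the function names and the 15 eV example energy loss are illustrative choices. It evaluates $k_0$, the wavelength, and Eqs. (1a) to (1c) for a 170 keV beam.

```python
import math

# Physical constants (assumed inputs, not taken from the article)
HBAR_C_EV_ANGSTROM = 1973.269804      # hbar * c in eV * Angstrom
ELECTRON_REST_ENERGY_EV = 510998.95   # m * c^2 in eV


def wave_number(kinetic_energy_ev):
    """Relativistic electron wave number k0 in 1/Angstrom."""
    pc = math.sqrt(kinetic_energy_ev
                   * (kinetic_energy_ev + 2.0 * ELECTRON_REST_ENERGY_EV))
    return pc / HBAR_C_EV_ANGSTROM


def scattering_angle(q_perp, k0):
    """Small-angle form of Eq. (1a): theta ~ q_perp / k0 (radians)."""
    return q_perp / k0


def q_parallel(energy_loss_ev, kinetic_energy_ev, k0):
    """Eq. (1b) as printed: q_par = k0 * (hbar*omega / 2 E0), in 1/Angstrom."""
    return k0 * energy_loss_ev / (2.0 * kinetic_energy_ev)


def q_total(q_perp, q_par):
    """Eq. (1c): magnitude of the total momentum transfer."""
    return math.hypot(q_perp, q_par)


if __name__ == "__main__":
    e0 = 170e3                      # primary energy in eV
    k0 = wave_number(e0)
    theta = scattering_angle(1.0, k0)   # q_perp = 1 / Angstrom
    q_par = q_parallel(15.0, e0, k0)    # illustrative 15 eV energy loss
    print(f"k0         = {k0:.1f} 1/Angstrom")
    print(f"wavelength = {2.0 * math.pi / k0:.4f} Angstrom")
    print(f"theta      = {1e3 * theta:.2f} mrad = {math.degrees(theta):.3f} deg")
    print(f"q_parallel = {q_par:.4f} 1/Angstrom")
    print(f"|q|        = {q_total(1.0, q_par):.5f} 1/Angstrom")
```

With these assumptions the wave number and wavelength agree with the quoted values to within rounding, and for a typical valence-band energy loss $q_\parallel$ comes out about two orders of magnitude smaller than a momentum transfer of 1 Å$^{-1}$.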

For valence band excitations and finite scattering angles, $\hbar\omega/2E_0$ is small compared to $\theta$ and $q_\parallel$ can then be neglected; therefore $|\mathbf{q}| \approx q_\perp \approx k_0\theta$. Quantum mechanically, the scattering process can be described by a transition from an initial state $\Phi_0$ (incident electron plus electrons in the solid in the ground state) to a final state $\Phi_1$ (outgoing electron plus excited electrons in the solid). The transition is caused by Coulomb interaction between the scattered electron and charges in the solid which, in the nonrelativistic case, is given by

Due to their high kinetic energy, the incident electrons are distinguishable from electrons in the solid and exchange effects can be neglected. In this case, we can separate the wave functions into a plane wave for the incoming and outgoing electron, $e^{i\mathbf{k}_{0,1}\cdot\mathbf{r}}$, and the eigenfunctions of the unperturbed solid, $\psi_{0,n}(\mathbf{r}_1 \cdots \mathbf{r}_N)$,

where $V$ is a normalization volume. Then the differential cross-section can be written in the Born approximation (Van Hove, 1954)

where $E_{0,n}$ are the energies corresponding to the eigenfunctions $|\,\rangle_{0,n}$ of the unperturbed system. The factor $1/N$ has been included because the cross-section is defined per electron. Following Platzmann and Wolf (1973), the $\mathbf{r}$ integration over the Coulomb potential and the plane waves can be performed, leading to the Fourier transform $4\pi e^2/q^2$ of the Coulomb potential. Then the operator in the matrix element is the Fourier transform of the electron density


operator $n_{\mathbf{q}} = \sum_n \exp(i\mathbf{q}\cdot\mathbf{r}_n)$.
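To make the meaning of $n_{\mathbf{q}}$ concrete, the sum can be evaluated for a fixed set of classical point positions. This is only an illustrative sketch: in the text $n_{\mathbf{q}}$ is a quantum-mechanical operator acting on the electron coordinates, whereas here the positions are plain numbers on a line, and the function name, the chain length, and the 2 Å spacing are hypothetical choices.

```python
import cmath
import math


def density_fourier_component(q, positions):
    """n_q = sum_n exp(i q r_n) for point particles on a line (1D sketch)."""
    return sum(cmath.exp(1j * q * r) for r in positions)


if __name__ == "__main__":
    spacing = 2.0                                  # hypothetical spacing in Angstrom
    chain = [n * spacing for n in range(50)]       # 50 equally spaced particles
    g = 2.0 * math.pi / spacing                    # reciprocal-lattice wave number
    for q in (0.0, 0.5 * g, g):
        amplitude = abs(density_fourier_component(q, chain))
        print(f"q = {q:.4f} 1/Angstrom, |n_q| = {amplitude:.3f}")
```

For a periodic arrangement the phases add coherently when $q$ equals zero or a reciprocal-lattice wave number and cancel in between, which is the classical picture behind the statement that the scattering probes the Fourier components of the charge density.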

To eliminate the $\delta$ function, a time-dependent operator

$$n_{\mathbf{q}}(t) = \exp(iHt)\,n_{\mathbf{q}}(0)\,\exp(-iHt)$$

is introduced, where $H$ is the Hamiltonian of the system. Assuming $|\mathbf{k}_0| \approx |\mathbf{k}_1|$ leads to a new equation for the differential cross-section:

where $(d\sigma/d\Omega)_{\mathrm{Ruth}} = 4/(a_0^2 q^4)$ is the elastic Rutherford scattering cross-section with $a_0$ being the Bohr radius. The dynamic structure factor $S(\mathbf{q},\omega)$ is defined by

This is the Fourier transform of the density-density correlation function; brackets indicate the quantum mechanical and statistical average. This important result was first derived by Van Hove (1954). It shows that the differential cross-section can be factored into an amplitude term and a dynamic structure term. The former describes the interaction of the particle with the scatterer, which in this case is just the elastic Rutherford scattering of two charged particles. The latter term depends only on the dynamic structure of the solid, and, therefore, very general information on the many-body system can be obtained by a scattering experiment. In particular, $S(\mathbf{q},\omega)$ gives information on the density fluctuations of the electron system. It should be mentioned that elastic electron diffraction that leaves the quantum state of the solid unchanged gives information on the static density of all charges in the solid. In this case, $S(\mathbf{q})$ is $(Z - F(\mathbf{q}))^2$, where $F(\mathbf{q})$ is the form factor of the electron charge, i.e., the form factor for elastic X-ray scattering, and $Z$ is the form factor of the nuclear charge.

Equations (2) and (3) describe the real transitions in the many-particle system produced by an external probe. Such transitions correspond to irreversible dissipative processes. These processes are expressed in Eqs. (2) and (3) by fluctuations and this relation is therefore an example of the fluctuation-dissipation theorem. In addition to real processes, a probe will produce virtual transitions that represent reversible deformation of the system corresponding to a polarization. The total response of the system to a weak external perturbation is described in the linear response theory. Let us describe the Hamiltonian of the system by $H = H_0 + H_1$, where $H_0$ is the Hamiltonian

RECENT DEVELOPMENTS IN ENERGY-LOSS SPECTROSCOPY

127

of the isolated system and H , is the coupling to the perturbation. According to Kubo (1957), the induced value (to first order in H , ) of a dynamic variable 0 is given by

( o ( w ) ) = (l/i)J:

dtei("+ib)'(CW,H,(w)l>,

(4)

where $O(t) = \exp(iH_0 t)\,O\,\exp(-iH_0 t)$, $H_1(\omega)$ is the Fourier transform of $H_1(t)$, and $\delta$ is a very small positive real number. In connection with electron-scattering experiments, it is interesting to look at a particular response function, namely the response of the charge density $\rho(r, t) = e\,n(r, t)$ to an external scalar electrostatic potential $\Phi_{\mathrm{ext}}$, i.e., a longitudinal electrostatic perturbation. The perturbing Hamilton operator can be given by

and the density-density response function $\chi$ is then defined by

$$\langle\rho_{\mathrm{ind}}(r, \omega)\rangle = \frac{1}{2\pi}\int dt\, e^{i\omega t}\,\langle\rho_{\mathrm{ind}}(r, t)\rangle = \int d^3r'\, \chi(r, r', \omega)\,\Phi_{\mathrm{ext}}(r', \omega). \tag{6}$$

Using Eq. (4), the response function $\chi$ is then given by

For a homogeneous system, such as the jellium model for free electrons, where the electron charge is compensated by a homogeneous positive background due to the ions, the response function depends only on $r - r'$, and the Fourier transforms of Eqs. (6) and (7) are given by

There is a close similarity between $\chi(q, \omega)$ and the dynamic structure factor $S(q, \omega)$ given in Eq. (3). A detailed comparison shows that the structure factor can be expressed in terms of the imaginary part of the density response function:


where $\exp(-\hbar\omega/k_B T) - 1 \approx -1$ for electron excitations with $\hbar\omega \gg k_B T$. Thus the calculation of the differential cross-section reduces to a calculation of the imaginary part of the density-density response function. It is interesting to connect this result to another response function, the dielectric function. The dielectric function may serve as a simple unifying concept for the theories of the electron gas. In its most general form it is defined by

$$E(r, t) = \int\!\!\int d^3r'\, dt'\, \varepsilon^{-1}(r, r', t - t')\, D(r', t'). \tag{11}$$

For a homogeneous system, the dielectric tensor depends only on $r - r'$. The same holds for the macroscopic viewpoint, in which an average is taken over the system, so that effectively a homogeneous system is studied. The macroscopic dielectric function $\varepsilon_M$ is measured by an external probe that transfers a momentum small compared to the dimensions of the Brillouin zone, i.e., one corresponding to a wavelength that is large compared to the lattice spacing. Then the Fourier transform of Eq. (11) is given by

$$E(q, \omega) = \varepsilon_M^{-1}(q, \omega)\, D(q, \omega). \tag{12}$$

Now, with $D = -\nabla\Phi_{\mathrm{ext}}$ and $E = -\nabla\Phi_{\mathrm{tot}}$, we may write

$$1/\varepsilon_M(q, \omega) = E(q, \omega)/D(q, \omega) = \Phi_{\mathrm{tot}}(q, \omega)/\Phi_{\mathrm{ext}}(q, \omega). \tag{13}$$

Using $\Phi_{\mathrm{tot}}(q, \omega) = \Phi_{\mathrm{ext}}(q, \omega) + \Phi_{\mathrm{ind}}(q, \omega)$, Eq. (8), and the Fourier transform of the Poisson equation, $\Phi_{\mathrm{ind}}(q, \omega) = v_q\,\rho_{\mathrm{ind}}(q, \omega)$ (where $v_q = 4\pi/q^2$ is the Fourier transform of the Coulomb potential), we obtain

By combining Eqs. (2), (6), and (11), we obtain an important result for the relation between the differential cross-section and the macroscopic dielectric function:

where $\mathrm{Im}[-1/\varepsilon_M(q, \omega)]$ is the macroscopic loss function. In many cases, the loss function is almost independent of $q$, in which case the differential cross-section for inelastic electron scattering decreases approximately as $1/q^2$. According to Eq. (1c), $q^2 = q_\parallel^2 + q_\perp^2$. For a given energy loss, $q_\parallel$ can be calculated from Eq. (1b); the $q_\perp$ (or angular) distribution of the inelastically scattered electrons is then a Lorentzian with a full width of $2q_\parallel$. For incident electrons of 170 keV kinetic energy, $q_\parallel = 0.0067$ Å$^{-1}$ for a 10-eV loss and $q_\parallel = 0.67$ Å$^{-1}$ for a 1000-eV energy loss. Therefore, valence band excitations show a rather narrow $q_\perp$ (angular) distribution, while high-energy core excitations tend to have broader ones.

In the following, we summarize some useful relations for $\varepsilon(q, \omega)$. First, from the fact that the response of the system is causal, one obtains the Kramers-Kronig relations

For the calibration of the loss functions and also for a consistency check of the data, sum rules are useful, the most important of which are as follows:

$$\int_0^{\infty} d\omega\, \omega\, \mathrm{Im}\bigl(-1/\varepsilon(q, \omega)\bigr) = (\pi/2)\,\omega_p^2,$$

$$\int_0^{\infty} d\omega\, \omega\, \mathrm{Im}\,\varepsilon(q, \omega) = (\pi/2)\,\omega_p^2,$$

where the plasmon frequency is given by

$$\omega_p = (4\pi n e^2/m)^{1/2} \tag{20}$$

and $n$ is the density of the valence electrons. Sum rules are frequently used to define an effective number of electrons, $n_{\mathrm{eff}}$, contributing to excitations up to a finite frequency $\omega_c$:

$$\int_0^{\omega_c} d\omega\, \omega\, \mathrm{Im}\,\varepsilon(q, \omega) = (\pi/2)\,\omega_p^2\,(n_{\mathrm{eff}}/n).$$
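The sum rules above lend themselves to a quick numerical check. The following Python sketch is an illustration only and not part of the original treatment: it assumes a single damped Drude term, $\varepsilon(\omega) = 1 - \omega_p^2/(\omega^2 + i\gamma\omega)$, with the hypothetical values $\hbar\omega_p = 15$ eV and $\hbar\gamma = 0.5$ eV, integrates $\omega\,\mathrm{Im}(-1/\varepsilon)$ with the trapezoid rule, and expresses the result as a fraction $n_{\mathrm{eff}}/n$ in the spirit of the effective electron number defined above. The function names are ours.

```python
import math

def drude_loss(omega, omega_p, gamma):
    """Loss function Im(-1/eps) for the assumed Drude form
    eps = 1 - omega_p^2 / (omega^2 + i*gamma*omega); energies in eV."""
    eps = 1.0 - omega_p**2 / complex(omega**2, gamma * omega)
    return (-1.0 / eps).imag

def n_eff_ratio(omega_c, omega_p, gamma, step=0.01):
    """Fraction n_eff/n from the loss-function sum rule, integrated from
    0 to omega_c with the trapezoid rule."""
    n_steps = int(round(omega_c / step))
    total = 0.0
    prev = 0.0  # the integrand omega * loss vanishes at omega = 0
    for k in range(1, n_steps + 1):
        w = k * step
        cur = w * drude_loss(w, omega_p, gamma)
        total += 0.5 * (prev + cur) * step
        prev = cur
    # Normalize to the full sum-rule value (pi/2) * omega_p^2.
    return total / (0.5 * math.pi * omega_p**2)

if __name__ == "__main__":
    # Hypothetical Al-like parameters (eV), chosen for illustration only.
    print(n_eff_ratio(2000.0, 15.0, 0.5))  # cutoff far above the plasmon
    print(n_eff_ratio(10.0, 15.0, 0.5))    # cutoff below the plasmon
```

With the cutoff far above the plasmon energy the ratio approaches one, while a cutoff below the plasmon captures only a small fraction of the total oscillator strength.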

B. Models of Dielectric Functions

1. Drude-Lorentz Model

The Drude-Lorentz model is based on a phenomenological view. Nevertheless, it is still extremely helpful for a rough understanding of the dielectric functions and, in particular, the loss function for zero momentum transfer. It is assumed that excitations of the electrons can be expressed by a sum of oscillators satisfying the equation of motion

$$m_i\,(d^2r/dt^2 + \gamma_i\, dr/dt + \omega_i^2 r) = q_i E(t),$$

where $m_i$, $\gamma_i$, $\omega_i$, and $q_i$ are the masses, damping constants, eigenfrequencies, and charges, respectively, of the oscillators. By solving the differential equation, the polarization $q_i r$ and thus the dielectric function can be derived to be

$$\varepsilon(\omega) = 1 + 4\pi \sum_i \frac{n_i q_i^2}{m_i(\omega_i^2 - \omega^2 - i\gamma_i\omega)}, \tag{22}$$
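As a minimal sketch of how the Drude-Lorentz dielectric function and the corresponding loss function can be evaluated, the Python fragment below sums damped oscillators of the form given above. It rests on assumptions that are ours: all quantities are expressed in eV, the prefactor $4\pi n_i q_i^2/m_i$ of each oscillator is written as a fraction of the squared free-electron plasmon energy, and the two parameter sets are hypothetical illustrations rather than the values of Table I.

```python
def drude_lorentz_eps(energy, plasmon_energy, oscillators):
    """Sum of damped oscillators; all quantities in eV.

    oscillators: list of (fraction, eigen_energy, damping) tuples. The
    prefactor 4*pi*n_i*q_i^2/m_i is written as fraction * plasmon_energy**2,
    which is an assumption of this sketch.
    """
    eps = complex(1.0, 0.0)
    for fraction, e_i, gamma_i in oscillators:
        denominator = complex(e_i**2 - energy**2, -gamma_i * energy)
        eps += fraction * plasmon_energy**2 / denominator
    return eps

def loss_function(eps):
    """Im(-1/eps) written as eps2 / (eps1^2 + eps2^2)."""
    return eps.imag / (eps.real**2 + eps.imag**2)

def loss_peak(plasmon_energy, oscillators, e_min=0.5, e_max=30.0, step=0.01):
    """Energy on a uniform grid at which the loss function is largest."""
    best_e, best_loss = e_min, -1.0
    n = int(round((e_max - e_min) / step))
    for k in range(n + 1):
        e = e_min + k * step
        value = loss_function(drude_lorentz_eps(e, plasmon_energy, oscillators))
        if value > best_loss:
            best_e, best_loss = e, value
    return best_e

# Hypothetical parameter sets (eV), for illustration only.
free_metal = [(1.0, 0.0, 0.5)]                       # single Drude term
with_interband = [(0.8, 0.0, 0.5), (0.2, 4.0, 0.5)]  # extra oscillator at 4 eV

if __name__ == "__main__":
    print(loss_peak(15.0, free_metal))
    print(loss_peak(15.0, with_interband))
```

For the single Drude term the loss maximum sits close to the free-electron plasmon energy; adding an oscillator well below it moves the main maximum slightly upward, in line with the shift of the intraband plasmon by interband transitions discussed in this section.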

Here $n_i$ are the oscillator strengths of the oscillators. In order to illustrate typical dielectric properties of solids, we show in Fig. 2 the loss function, $\varepsilon_1 = \mathrm{Re}\,\varepsilon$, and $\varepsilon_2 = \mathrm{Im}\,\varepsilon$ as calculated by Eq. (22) for various combinations of oscillators, the parameters of which are listed in Table I. For simple sp metals, the charges $q_i = e$ are almost free, i.e., the energy of the one oscillator that is needed is zero and the damping $\gamma$ is given by the scattering of the charges by phonons or impurities. This "Drude model" leads to the dielectric functions shown in Fig. 2a. Close to the zero-crossing of $\varepsilon_1$, where $\varepsilon_2$ is small, the loss function $\mathrm{Im}(-1/\varepsilon) = \varepsilon_2/(\varepsilon_1^2 + \varepsilon_2^2)$ has a strong maximum due to collective excitations of the electrons, i.e., the plasmon. The plasmon energy is given by Eq. (20) and is proportional to the square root of the density of the electrons.

In the noble metals, an additional oscillator appears due to transitions from the filled d bands to states just above the Fermi energy. The influence of these interband transitions on the dielectric functions is shown in Fig. 2b. If the oscillator is strong enough, it causes a second zero-crossing of $\varepsilon_1$. This leads to a second maximum in the loss function below the free-electron plasmon, called an interband plasmon in contrast to the free-electron intraband plasmon. Negative polarization due to interband transitions near the free-electron plasma energy leads to a shift of the zero-crossing of $\varepsilon_1$ and therefore shifts the intraband plasmon to higher energies. Typical examples of this situation are Ag and TiN. In transition metals, various additional oscillators appear that strongly distort the free-electron loss function (see, e.g., Wehenkel and Gauthe (1974)).

Excitations from inner shells are simulated in Fig. 2c by an additional oscillator far above the plasmon energy. In this case, $\varepsilon_2$ of the oscillator is rather small ($\varepsilon_2 \ll 1$) and $\varepsilon_1$ is close to one. Therefore, far above the plasmon energy, EELS and X-ray absorption spectroscopy measure the same function, since $\mathrm{Im}(-1/\varepsilon) \approx \varepsilon_2$. The positive polarization that the core excitation produces at lower energies shifts the intraband plasmon to lower energies.

TABLE I
PARAMETERS (IN eV) FOR THE DRUDE-LORENTZ DIELECTRIC FUNCTIONS SHOWN IN FIG. 2

            Drude                                  Lorentz
      n0     ħω0    γ0     n1     ħω1    γ1     n2     ħω2    γ2     n3     ħω3    γ3
(a)   1.0    0      4      -      -      -      -      -      -      -      -      -
(b)   0.83   0      4      0.17   4      0.5    -      -      -      -      -      -
(c)   0.95   0      4      0.05   35     0.5    -      -      -      -      -      -
(d)   -      -      -      1.0    8      4      -      -      -      -      -      -
(e)   -      -      -      0.83   8      4      0.17   3      0.5    -      -      -
(f)   -      -      -      0.80   8      4      0.16   3      0.5    0.4    1.5    0.5

FIG. 2. Calculations of the loss functions and the dielectric functions $\varepsilon_1$ and $\varepsilon_2$ of solids using the Drude-Lorentz model. The parameters are given in Table I. (a) Free-electron metal. (b) Free-electron metal with a strong interband transition at low energy leading to an interband plasmon. (c) Free-electron metal with a core-level excitation. (d) Insulator. (e) Semiconductor with a strong interband transition across the fundamental gap. (f) Semiconductor with transitions related to defect states in the gap.

The experimental loss function of the nearly-free-electron (NFE) metal Al is shown in Fig. 3a. The dominant feature is the volume plasmon at 15 eV and multiple losses of this plasmon at 30, 45, and 60 eV. In addition, we see core excitations from the 2p states (L shell) and 1s states (K shell). For comparison, we show in Fig. 3b the band structure of Al in the X-W direction and transitions between occupied states and unoccupied states. The interband transition between parallel bands at the zone boundary along the X-W direction appears in the spectrum at 1.5 eV. For small $q$, a surface plasmon (see below) of the Al$_2$O$_3$-Al metal interface is observed near 7 eV.

FIG. 3. (a) Typical loss spectrum of a 600 Å thick Al foil, showing an interband transition (IB), surface plasmon (SP), volume plasmon (P), multiple volume plasmons (2P, 3P, and 4P), and core-level excitations with near-edge structure and extended edge structure (EXAFS) from the Al 2p and 1s states. (b) Schematic band structure of Al along the XW direction explaining interband and core-level transitions shown in (a).

In Fig. 2d, the dielectric function of a typical insulator such as polyethylene is simulated by an oscillator having an energy equal to the gap energy $E_g = 8$ eV. Then the plasmon energy is

$$E_p = (E_g^2 + 4\pi n e^2/m)^{1/2}. \tag{23}$$

It is not surprising that collective excitations of electrons occur in insulators, since the energy of the plasmon is far higher than the gap energy, and therefore the electrons are quasi-free at this energy.

In Fig. 2e, we have simulated the dielectric functions of a semiconductor such as a conjugated polymer by two oscillators, one corresponding to excitations of weakly bonded $\pi$ electrons at 4 eV and one corresponding to excitations of more strongly bonded $\sigma$ electrons at 8 eV. The interband plasmon due to the $\pi$ electrons appears at higher energy than that of the $\pi$ oscillator, since the zero-crossing of $\varepsilon_1$ is above the $\pi$ oscillator energy. It also depends on the background dielectric function due to the $\sigma$ electrons. Nevertheless, the energy shift of the $\pi$ oscillator due to a change in momentum transfer or due to doping can be seen directly in the loss function. In Fig. 4, we show various $\pi$ valence bands and conduction bands and interband transitions between them corresponding to the $\pi$ oscillators. When there are two flat bands, the energy of the transition is not changed upon changing the momentum transfer. Therefore, the $\pi$ plasmon also shows zero dispersion in momentum transfer. The same holds when either the occupied or the unoccupied band is flat. If both are curved, a non-zero (positive or negative) dispersion must appear, indicating that information on the dispersion of bands can be derived from momentum-dependent measurements of interband plasmons.

FIG. 4. Momentum dependence of interband transitions between valence bands and conduction bands. Depending on the curvature of the bands, the transitions show zero, positive, or negative dispersion in momentum transfer.

Low doping of conjugated polymers leads to additional transitions into unoccupied levels created in the gap. Their influence on the dielectric functions is shown in Fig. 2f. When the oscillator strength of these transitions is small, $\mathrm{Im}(-1/\varepsilon) \sim \varepsilon_2$, since at low energies $\varepsilon_1 \approx \mathrm{const} \approx n^2$, where $n$ is the optical refraction index at zero energy. Therefore, for small momentum transfer the loss function for these transitions can be directly compared with optical data.

2. Microscopic Models

The simplest microscopic model for an electron gas in a metal is the model of non-interacting free electrons, i.e., the Sommerfeld model. In this model, the states are plane-wave functions and the excitations can be described by taking an electron from some state of momentum $\hbar k$ lying within the Fermi sphere to a state of momentum $\hbar(k + q)$ lying outside of it (see Fig. 5a). Using $n_q = \sum_k c^{\dagger}_{k+q} c_k$, $E_k = \hbar^2 k^2/2m$, and Eq. (9), the density response function of the non-interacting electron gas is given by

The cross-section is then proportional to $\mathrm{Im}\,\chi^0$. The range of possible excitations is restricted, as shown in Fig. 5c, by the two parabolas starting at zero and at $2k_F$. These boundaries can be derived from excitations of electrons at the Fermi level by an energy $\hbar\omega$ and momentum transfers $q_{\min}$ and $q_{\max}$. Because we have neglected interactions between electrons, no collective excitations are described by $\mathrm{Im}\,\chi^0$. Collective excitations can be described by introducing the interaction of the electrons within the framework of the self-consistent field method or random phase approximation (RPA). In this approximation, we consider the response of the electron gas to the total field, i.e., the sum of the external field and the field induced by the external field. The response function to the total field is defined by

$$\langle\rho_{\mathrm{ind}}(q, \omega)\rangle = \alpha(q, \omega)\,\Phi_{\mathrm{tot}}(q, \omega). \tag{25}$$

Using the Poisson equation $\Phi_{\mathrm{ind}}(q, \omega) = v_q\langle\rho_{\mathrm{ind}}(q, \omega)\rangle$ and Eqs. (13) and (25), the dielectric function is given by

$$\varepsilon(q, \omega) = 1 - v_q\,\alpha(q, \omega). \tag{26}$$

FIG. 5. (a) Intraband transitions for a free-electron metal. (b) Intraband and interband transitions for a nearly-free-electron metal. (c) Plasmon and range of intraband transitions for a free-electron metal. (d) Plasmon and range of intraband transitions and interband transitions for a quasi-one-dimensional free-electron metal with a band structure shown in the insert. For comparison, the range of intraband transitions in a free-electron metal is shown by broken lines. Zone-boundary collective states in the gap between intraband and interband transitions are indicated by the chain line.

To perform the approximation, we now replace the response function $\alpha(q, \omega)$ of the many-body system to the total field by the response function of the non-interacting system, $\chi^0$. We then obtain the Lindhard dielectric function (Lindhard, 1954)

$$\varepsilon_{\mathrm{RPA}}(q, \omega) = \varepsilon_L(q, \omega) = 1 - v_q\,\chi^0(q, \omega). \tag{27}$$

In the RPA, the electrons respond to the total field, equal to the sum of the external field and the average potential of the electron gas; fluctuations about this average are ignored. The RPA is valid in the weak coupling regime, i.e., when the kinetic energy of the electrons is much larger than the average potential energy. In other words, the RPA is valid for high densities, $r_s = r_0/a_0 \le 1$, where $2r_0$ is the average distance between electrons and $a_0$ is the Bohr radius. In real metals, the parameter $r_s$ is not small compared to unity and is in the range $2 < r_s < 6$. Therefore, one cannot expect that the RPA jellium theory for a dense, weakly coupled plasma will predict quantitatively the behaviour of real metals, although various properties are astonishingly well described. According to Eq. (27), the loss function in the RPA is given by

$$\mathrm{Im}(-1/\varepsilon_L) = -v_q\,\mathrm{Im}(\chi^0)/|\varepsilon_L|^2.$$

Thus one obtains the same single-particle excitations as in the non-interacting case, though the intensity is reduced by the square of the absolute value of the Lindhard dielectric function. In addition, the loss function now describes the collective excitations, i.e., the plasmons. The energy-momentum relation is given by $\varepsilon_L(q, \omega) = 0$. This leads to an equation for the momentum dependence of the plasmon energy,

$$E_p(q) = E_p(0) + \alpha\,(\hbar^2/m)\,q^2, \tag{28}$$

where $E_p(0)$ is the plasmon energy for zero momentum transfer. The low-$q$ dispersion constant $\alpha$ (not to be confused with the response function $\alpha(q, \omega)$) is given by

$$\alpha = \tfrac{3}{5}\,E_F/E_p(0). \tag{29}$$
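As a rough numerical illustration of the quadratic plasmon dispersion and of the dispersion constant in Eq. (29), the Python sketch below estimates where the plasmon line meets the upper edge of the free-electron particle-hole continuum. Several ingredients are assumptions of this sketch and not taken from the text: the dispersion is written as $E_p(q) = E_p(0) + \alpha(\hbar^2/m)q^2$, the continuum edge as $(\hbar^2/2m)(q^2 + 2qk_F)$, and the sodium-like inputs combine $E_p(0) = 5.9$ eV with a textbook Fermi wave vector of about 0.92 Å$^{-1}$. The function names are ours.

```python
import math

HBAR2_OVER_M = 7.62  # hbar^2/m in eV * Angstrom^2 for the free-electron mass

def fermi_energy(k_f):
    """Free-electron Fermi energy in eV for k_f in 1/Angstrom."""
    return 0.5 * HBAR2_OVER_M * k_f**2

def rpa_dispersion_coefficient(e_p0, e_f):
    """Low-q dispersion constant alpha = (3/5) * E_F / E_p(0)."""
    return 0.6 * e_f / e_p0

def plasmon_energy(q, e_p0, alpha):
    """Assumed quadratic dispersion E_p(q) = E_p(0) + alpha*(hbar^2/m)*q^2."""
    return e_p0 + alpha * HBAR2_OVER_M * q**2

def continuum_upper_edge(q, k_f):
    """Upper edge of the free-electron particle-hole continuum."""
    return 0.5 * HBAR2_OVER_M * (q**2 + 2.0 * q * k_f)

def critical_wave_vector(e_p0, k_f):
    """Positive root of plasmon_energy(q) = continuum_upper_edge(q).

    Assumes alpha < 1/2, which holds for the inputs used here.
    """
    alpha = rpa_dispersion_coefficient(e_p0, fermi_energy(k_f))
    a = HBAR2_OVER_M * (0.5 - alpha)
    b = HBAR2_OVER_M * k_f
    c = -e_p0
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

if __name__ == "__main__":
    k_f = 0.92   # 1/Angstrom, assumed textbook value for Na
    e_p0 = 5.9   # eV, plasmon energy at q = 0
    print(rpa_dispersion_coefficient(e_p0, fermi_energy(k_f)))
    print(critical_wave_vector(e_p0, k_f))
```

For these inputs the crossing lies between 0.7 and 0.8 Å$^{-1}$, i.e., in the range where the plasmon is expected to merge into the single-particle continuum.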


In the $(\omega, q)$ diagram in Fig. 5c, the plasmon line with quadratic dispersion in $q$ is shown for $q < q_c = m\omega_p/\hbar k_F$, the point where the plasmon line merges into the particle-hole continuum. Below $q_c$, the plasmon is completely undamped in the RPA; above $q_c$, it decays into single-particle excitations. Since it is well known that plasmons have a finite width, Mermin (1970) introduced a phenomenological plasmon width $\Delta E_{1/2} = \hbar/\tau$, which leads to the dielectric function

The loss function, calculated with the help of Eq. (30) and taking parameters close to those of sodium ($\hbar/\tau = 0.2$ eV, $\hbar\omega_p = 5.9$ eV), is shown as a function of the momentum transfer in Figs. 6a and 6b. In Fig. 6a, the abscissa is scaled in such a way as to show the dispersion of the plasmon and its sudden decay near $q_c$. In Fig. 6b, the loss function is multiplied by $1/q^2$, and, therefore, the dispersion of the intraband excitations can be seen. It is interesting to note that close to $q_c$, the maximum of the plasmon and that of the intraband transitions appear almost at the same energy. Above $q_c$, the plasmon has decayed and only the intraband transitions still exist.

The RPA is a mean-field theory and does not take into account short-range exchange and correlations between electrons. In the RPA, the induced charge is given by $\langle\rho_{\mathrm{ind}}(q, \omega)\rangle = \chi^0(q, \omega)\,\Phi_{\mathrm{tot}}(q, \omega)$. However, near an electron the local density is strongly reduced due to correlations and exchange. Therefore, $\Phi_{\mathrm{ind}}(q, \omega)$ is no longer given by $v_q\langle\rho_{\mathrm{ind}}(q, \omega)\rangle$ but can be approximated by

$$\Phi'_{\mathrm{ind}}(q, \omega) = [1 - G(q, \omega)]\,v_q\,\langle\rho_{\mathrm{ind}}(q, \omega)\rangle,$$

where $G(q, \omega)$ is the local field correction function. Then we obtain

$$\Phi'_{\mathrm{tot}}(q, \omega) = \Phi_{\mathrm{ext}}(q, \omega) + \Phi'_{\mathrm{ind}}(q, \omega) = \Phi_{\mathrm{tot}}(q, \omega) - G(q, \omega)\,v_q\,\langle\rho_{\mathrm{ind}}(q, \omega)\rangle.$$

Taking Eq. (25) and using the approximation $\langle\rho_{\mathrm{ind}}(q, \omega)\rangle = \chi^0(q, \omega)\,\Phi'_{\mathrm{tot}}$, we obtain $\alpha(q, \omega) = \chi^0/(1 + G(q, \omega)\,v_q\,\chi^0)$. Then the dielectric function is given by

According to Mahan (1981), the local field correction function $G(q, \omega)$ is a time-dependent function in the sense that it improves over the years (Hubbard, 1955; Nozieres and Pines, 1958; Singwi et al., 1968; Vashishta and Singwi, 1972; Kugler, 1975; Brosens et al., 1976, 1977; Ichimaru, 1982; Dabrowski, 1986; Holas and Rahman, 1987). Using a static local field correction function $G(q)$, the dispersion coefficient is reduced at higher $r_s$ values compared to the RPA value by

$$\alpha/\alpha_{\mathrm{RPA}} = 1 - (5/12)\,\xi(r_s)\,(E_p/E_F)^2, \tag{32}$$

where $\xi(r_s)$ depends on the special form of $G(q)$. For $G(q) = q^2/2(q^2 + k_F^2)$, which was proposed by Hubbard (1955) and takes into account exchange effects only, $\xi = 1/2$. More refined theories, including correlation effects for antiparallel spin, yield

$$G(q) = A\{1 - \exp[-B(q/k_F)^2]\}, \tag{33}$$

where $A$ and $B$ depend on $r_s$. These local field corrections give slightly $r_s$-dependent $\xi$ values between 1/4 and 1/3 (Vashishta and Singwi, 1972; Pathak and Vashishta, 1973).

FIG. 6. Calculated momentum dependence of the Mermin loss function with intraband transitions and plasmons. Parameters were chosen close to those of Na, having a critical wave vector $q_c = 0.75$ Å$^{-1}$. (a) $\mathrm{Im}(-1/\varepsilon)$, showing dominantly plasmons. (b) $(1/q^2)\,\mathrm{Im}(-1/\varepsilon)$, showing dominantly intraband transitions.

Up to now we have only considered a homogeneous electron gas. Now we turn to real crystals, i.e., the interaction of electrons with the crystal lattice. A microscopic theory is necessary to consider variations of the field on lattice distances. For a crystalline solid, the dielectric tensor is periodic in the lattice, i.e., $\varepsilon(r, r', t) = \varepsilon(r + R, r' + R, t)$. This leads to a more complicated Fourier transformation of Eq. (11):

$$E(k + G, \omega) = \sum_{G'} \varepsilon^{-1}(k + G, k + G', \omega)\, D(k + G', \omega), \tag{34}$$

where $k$ are wave vectors within the first Brillouin zone and $G$ and $G'$ are reciprocal lattice vectors. Equation (34) shows that in an inhomogeneous periodic system, a longitudinal plane-wave perturbation $D(k + G')$ leads to a response with the same frequency but not with the same wavelength. Besides $k + G'$, Bragg-diffracted components with wave vectors $k + G$ appear as well. This also implies that a longitudinal perturbation produces not only a longitudinal response but a transverse response as well. The contributions with $G \neq G'$ have been termed local field effects. These should not be confused with the local field contributions due to exchange and correlation. The former disappear when the matrix

$$\varepsilon(k + G, k + G', \omega) = \varepsilon_{GG'}(k, k, \omega)$$

is diagonal. For crystalline systems Eq. (12) is still valid; however, $1/\varepsilon_M(q, \omega)$ is no longer given by the inverse microscopic dielectric function $1/\varepsilon_{GG'}(k, k, \omega)$. In this case, the result for the macroscopic dielectric function in terms of the microscopic one is given by

$$\mathrm{Im}[-1/\varepsilon_M(q, \omega)] = -(1/q^2)\,\mathrm{Im}[(k + G)\cdot\varepsilon^{-1}_{GG}(k, k, \omega)\cdot(k + G)] = -(1/q^2)\,\mathrm{Im}[q\cdot\varepsilon^{-1}_{GG}(k, k, \omega)\cdot q] \tag{35}$$

with $q = k + G$. This means that we have to calculate the matrix elements $\varepsilon_{GG'}(k, k, \omega)$, invert the matrix, and take the element $q = k + G$. Neglecting off-diagonal elements leads to $1/\varepsilon_M = 1/\varepsilon_{GG}(k, k, \omega)$. Then the inverse macroscopic dielectric function is equal to the inverse microscopic one. The off-diagonal elements correspond just to the local field corrections. The maxima of the loss function, i.e., the plasmons, are given by the relation $\mathrm{Det}(\varepsilon) = 0$. The calculation of the inverse microscopic dielectric function is a major computational task, and the number of such calculations in the literature is still rather small. In this situation, approximate methods are very valuable, as they provide insight into what is expected from more extensive calculations. In many cases, e.g., weakly inhomogeneous NFE metals, coupling to transverse modes may be neglected (Sturm, 1982). Then, according to Adler (1962) and Wiser (1963), the dielectric function is given in the RPA by

$$\ldots\;\langle\sigma|e^{-i(q + G)\cdot r}|\sigma'\rangle\,\langle\sigma'|e^{i(q + G')\cdot r}|\sigma\rangle. \tag{36}$$

Here $|\sigma\rangle = |l, k + q\rangle$ and $|\sigma'\rangle = |l', k\rangle$ are Bloch states, $l$, $l'$ are band indices, and $k$ is a wave vector inside the first Brillouin zone. For most NFE metals, the critical plasmon wave vector satisfies $q_c < G_{\min}/2$, i.e., $G = 0$, $G' = 0$, and $q = k$. Then Eq. (36) can be transformed into a form first given by Ehrenreich and Cohen (1959):

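For plane-wave states with $E = \hbar^2 k^2/2m$, Eq. (37) transforms again into Eq. (27) for free electrons. In the above approximations (no transverse excitations and $G = 0$), the inverse dielectric matrix element $1/\varepsilon^{-1}_{00}(q, q; \omega)$ can be expressed to second order by the effective dielectric function

$$\varepsilon_{\mathrm{eff}}(q, \omega) = (\varepsilon^{-1}_{00})^{-1} = \varepsilon_{00}(q, q; \omega) - \sum_{G, G'} \varepsilon_{0G}(q, q; \omega)\, M^{-1}_{GG'}(q, \omega)\, \varepsilon_{G'0}(q, q; \omega), \tag{38}$$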

where $M^{-1}_{GG'}$ is the inverse of the submatrix $M_{GG'}$ containing all elements $\varepsilon_{GG'}$ with $G \neq 0$ and $G' \neq 0$. The second term describes the local field corrections. The loss function is then given by $\mathrm{Im}[-1/\varepsilon_M(q, \omega)] = \mathrm{Im}[-1/\varepsilon_{\mathrm{eff}}(q, \omega)]$. Using the relation $1/(x + i\delta) = P(1/x) - i\pi\delta(x)$, the imaginary part of the dielectric function given in Eq. (37) can be transformed into

$$\mathrm{Im}\,\varepsilon(q, \omega) = \ldots \times |\langle\sigma|e^{-iq\cdot r}|\sigma'\rangle|^2. \tag{39}$$

As long as the matrix elements do not vary strongly with energy and band index, $\mathrm{Im}\,\varepsilon$ is determined by the momentum-dependent joint density of states, which is given by the first two terms of the sum of Eq. (39). For $q = 0$, only interband transitions are allowed. For $q \neq 0$, intraband and interband transitions are allowed. In the case of a high joint density of states between occupied and unoccupied bands, i.e., in the case of parallel bands and large matrix elements, the interband transitions form a strong oscillator. This is illustrated in Fig. 2 and for the real sp metal Al in Fig. 3. In Fig. 5b we have drawn the schematic band structure of Na in the [110] direction; intraband and interband transitions are also shown here. In Fig. 5d, we illustrate the ranges of possible intra- and interband transitions in an $(\omega, q)$ diagram for a one-dimensional band structure as shown in the insert. For $q \parallel G$, there is a gap between the two ranges due to band splitting at the zone boundary. For comparison we have included by dashed curves the range of intraband transitions in the free-electron case. According to Foo and Hopfield (1968), there are strong transitions from the occupied band to states near the zone boundary. These transitions may cause maxima in the loss function that appear in the gap between intra- and interband transitions, as indicated by the chain curve. They have been termed zone-boundary collective states.

In the jellium model, there is rigorously no zero-order damping term for $q = 0$, as shown by Hasegawa and Watabe (1969). This can be seen directly in Fig. 5c, which shows that for zero momentum transfer there are no intraband transitions, and therefore the plasmon cannot decay into particle-hole excitations. At higher $q$ (but below $q_c$), the damping mechanism is still too weak to account for the measured plasmon line width. In real crystals, however, the plasmon can decay into interband transitions. As shown in Fig. 5d, for $q = 0$ the plasmon lies in the range of interband transitions, which leads to a coupling to them and therefore to a finite plasmon width. The same mechanism also describes plasmon damping at higher momentum transfer for nearly-free-electron metals almost quantitatively (Sturm, 1982). Normally, a quadratic dispersion of the line width,

$$\Delta E_{1/2}(q) = \Delta E_{1/2}(0) + Bq^2, \tag{40}$$

is observed. However, depending on the Fermi energy and the momentum transfer of the lowest Bragg reflections, interband decay channels may close or open, leading to strong deviations from Eq. (40). A typical example is Li (Kloos, 1973; Gibbons et al., 1976), where a decreasing line width is observed at low $q$. According to Sturm (1982), the finite line width of the plasmon in NFE metals can be calculated by

$$\Delta E_{1/2}(q) = 2\hbar\,\mathrm{Im}\,\varepsilon_{IB}(q, \omega_p(q)) \times \left|\frac{\partial}{\partial\omega}\mathrm{Re}[\varepsilon(q, \omega)]\right|^{-1}_{\omega = \omega_p(q)}. \tag{41}$$

The first term in Eq. (41) determines the decay mechanism, while the second gives the excitation strength of the plasmon. According to the Kramers-Kronig relations, a broadening is always related to an energy shift, given by

$$\Delta E_p = -\hbar\,\mathrm{Re}\,\varepsilon_{IB}(q, \omega_p(q)) \times \left|\frac{\partial}{\partial\omega}\mathrm{Re}[\varepsilon(q, \omega)]\right|^{-1}_{\omega = \omega_p(q)}. \tag{42}$$

Here $\varepsilon_{IB}$, the interband part of $\varepsilon_{00}(q, q, \omega)$, causes additional polarization due to the finite lattice potential.

C. Interface Effects

It is well known that besides volume plasmons in the bulk, there also exist collective surface modes, i.e., surface plasmons. For electrons transmitted perpendicularly through a planar dielectric-metal interface, the cross-section for surface plasmon excitations is given by Ritchie (1957) as

$$\frac{d^2\sigma}{dE\,d\Omega} \propto \frac{k_\parallel}{q^4}\,\mathrm{Im}\left[\frac{(\varepsilon - \varepsilon_0)^2}{\varepsilon\,\varepsilon_0\,(\varepsilon + \varepsilon_0)}\right], \tag{43}$$

where $k_\parallel = k_0\theta$ is the wave vector parallel to the surface, and $\varepsilon$ and $\varepsilon_0$ are the dielectric functions of the metal and the dielectric, respectively. Equation (43) is valid for $k_\parallel > \omega/c$ ($c$ is the velocity of light), and the cross-section has a maximum when $\varepsilon = -\varepsilon_0$. This condition determines the frequency of the surface plasmon. For $\varepsilon_0 = 1$ (vacuum) and the free-electron dielectric function, the frequency of the surface plasmon is $\omega_s = \omega_p/\sqrt{2}$. For an oxide-covered Al foil, $\varepsilon_0 \approx 3.7$ and the surface plasmon appears not at 10.6 eV ($= 15\ \mathrm{eV}/\sqrt{2}$), but near 7 eV, as shown in Fig. 3a. It is interesting that the cross-section decreases with the third power of $q$, while that of volume losses decreases with the second power. Therefore, it is possible to separate surface losses and volume losses by means of measurements at different momentum transfers. In order to obtain dominantly volume losses, it is necessary to take measurements at finite momentum transfer, where surface losses are negligible.

Besides the surface plasmons on planar interfaces, there also exist surface plasmons on voids and gas-filled bubbles. Typical excitations on such bubbles are shown in Fig. 7. The frequencies of the various surface-plasmon modes as calculated by Natta (1969) are given by

$$\omega_l^2 = \omega_p^2\,\frac{l + 1}{l(\varepsilon + 1) + 1}, \tag{44}$$
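The surface-mode energies quoted in this section can be reproduced with a few lines of Python. The sketch below is illustrative only: it assumes an undamped free-electron dielectric function $\varepsilon = 1 - (E_p/E)^2$ for the metal, so that the planar condition $\varepsilon = -\varepsilon_0$ gives $E_s = E_p/\sqrt{1 + \varepsilon_0}$, and it evaluates the bubble modes from the relation $\omega_l^2 = \omega_p^2 (l + 1)/[l(\varepsilon + 1) + 1]$. The function names are ours.

```python
import math

def planar_surface_plasmon(e_p, eps_dielectric=1.0):
    """Surface plasmon energy from eps = -eps_dielectric, assuming an
    undamped Drude metal with eps = 1 - (e_p / E)^2."""
    return e_p / math.sqrt(1.0 + eps_dielectric)

def bubble_mode(e_p, l, eps_inside=1.0):
    """Energy of the l-th surface mode of a bubble, following Eq. (44)."""
    return e_p * math.sqrt((l + 1.0) / (l * (eps_inside + 1.0) + 1.0))

if __name__ == "__main__":
    e_p = 15.0  # eV, volume plasmon of Al as quoted in the text
    print(planar_surface_plasmon(e_p))        # vacuum interface
    print(planar_surface_plasmon(e_p, 3.7))   # oxide-covered foil
    for l in (0, 1, 2):
        print(l, bubble_mode(e_p, l))         # empty bubble (eps = 1)
```

For $E_p = 15$ eV this gives roughly 10.6 eV for the vacuum interface and about 7 eV for $\varepsilon_0 = 3.7$, while the bubble modes run from the volume plasmon energy at $l = 0$ down toward the planar value for large $l$.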

Here $\varepsilon$ is the dielectric constant inside the bubble. The $l = 0$ or breathing mode is independent of $\varepsilon$, and its frequency is the same as that of the volume plasmon. The frequencies of higher modes, e.g., the dipole and the quadrupole modes, are given in Fig. 7. The $l = \infty$ mode naturally has the same frequency as the surface plasmon of a planar surface. The excitation probability for the various modes as a function of momentum transfer was calculated by Ashley and Ferrell (1976) as follows:

where $j_l$ is the spherical Bessel function, $r$ the radius of the bubble, and $f$ the filling factor, i.e., the ratio of the volume of the bubbles to the total volume of the sample. From Eq. (45), we can see that for small bubbles and for small momentum transfer, dominantly dipole modes are excited. With increasing momentum transfer, higher modes come into play. With a spectrometer having a large beam diameter, we cannot measure EELS spectra on isolated voids or bubbles. Because of intensity problems, we need a rather high concentration of bubbles, so that the interaction between the surface plasmons on the individual bubbles has to be taken into account. For a simple cubic lattice of bubbles, according to Garnett (1904), an effective dielectric function is given by

$$\frac{\varepsilon_{\mathrm{eff}}(\omega) - \varepsilon_M(\omega)}{\varepsilon_{\mathrm{eff}}(\omega) + 2\varepsilon_M(\omega)} = f\,\frac{\varepsilon_B(\omega) - \varepsilon_M(\omega)}{\varepsilon_B(\omega) + 2\varepsilon_M(\omega)}, \tag{46}$$

FIG. 7. Surface plasmon modes on the interface of bubbles (panels for $l = 0$, $l = 1$, $l = 2$, and $l \gg 1$).

Here $f$ is the filling factor and $\varepsilon_M(\omega)$ and $\varepsilon_B(\omega)$ are the dielectric functions of the matrix and of the material in the bubble, respectively. Equation (46) is valid for zero momentum transfer. To our knowledge, there is no $\omega$- and $q$-dependent effective medium theory yet available. From the effective dielectric function, the effective loss function $\mathrm{Im}(-1/\varepsilon_{\mathrm{eff}}(\omega))$ can be calculated. It has poles given by the frequency condition

$$\varepsilon_B(\omega) + 2\varepsilon_M(\omega) - 2f[\varepsilon_M(\omega) - \varepsilon_B(\omega)] = 0. \tag{47}$$
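To make the effective-medium relation concrete, the following Python sketch solves Eq. (46) for $\varepsilon_{\mathrm{eff}}$ and locates the zero of $\varepsilon_{\mathrm{eff}}$ that corresponds to the dipole surface mode of the bubbles. The assumptions are ours: the matrix is described by a Drude dielectric function, the bubbles are empty ($\varepsilon_B = 1$), and the zero is obtained from the numerator of the solved expression, $(1 + 2f)\varepsilon_B + 2(1 - f)\varepsilon_M = 0$. Parameter values and function names are hypothetical.

```python
def drude_eps(energy, e_p, gamma):
    """Drude dielectric function of the matrix; energies in eV."""
    return 1.0 - e_p**2 / complex(energy**2, gamma * energy)

def maxwell_garnett(eps_matrix, eps_bubble, f):
    """Effective dielectric function obtained by solving Eq. (46)."""
    beta = (eps_bubble - eps_matrix) / (eps_bubble + 2.0 * eps_matrix)
    return eps_matrix * (1.0 + 2.0 * f * beta) / (1.0 - f * beta)

def dipole_mode_energy(e_p, f):
    """Energy at which eps_eff vanishes for empty bubbles (eps_B = 1) in an
    undamped Drude matrix: (1 + 2f) + 2(1 - f) * eps_M = 0."""
    eps_m = -(1.0 + 2.0 * f) / (2.0 * (1.0 - f))
    # Invert eps_M = 1 - (e_p / E)^2 for E.
    return e_p / (1.0 - eps_m) ** 0.5

if __name__ == "__main__":
    e_p = 15.0  # eV, hypothetical matrix plasmon energy
    for f in (0.0, 0.05, 0.1):
        print(f, dipole_mode_energy(e_p, f))
```

In the dilute limit the mode energy reduces to $\sqrt{2/3}$ times the plasmon energy of the matrix, i.e., the dipole mode of an isolated void; a finite filling factor shifts it to lower energy in this model.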

Equation (47) describes oscillators of the dielectric in the bubble, plasmons of the matrix, and surface plasmons on the bubbles in the long-wavelength limit.

D. Core-Level Excitations

The simplest description of core electron spectra near threshold (XANES = X-ray absorption near-edge structure) is given by the Ehrenreich-Cohen dielectric function in Eq. (39), which we rewrite for small $\varepsilon_2$, $\varepsilon_1 \approx 1$, and $T = 0$ as follows:

$$\mathrm{Im}(-1/\varepsilon) \approx \varepsilon_2 = v_q\,(e^2\pi/V)\sum_{\sigma} |\langle\sigma|e^{-iq\cdot r}|c\rangle|^2\,\delta(\hbar\omega - E_\sigma - E_c), \tag{48}$$

where Ic) and E, are the core level wave function and energy, respectively. For small q, exp( - iqr) can be expanded into 1 - iqr - (qr)2/2. Since) . 1 and ic) are orthogonal, the first term of the expansion is zero. For q << l/(Rc) (R, is the mean radius of the core wave function), the second term, which is just a dipole operator, is dominant. Therefore, for small q, we obtain the same results as in X-ray absorption spectroscopy (XAS). EELS offers the interesting possibility to increase q and, therefore, the selection rules may be changed. For q > l/(R,), the third term (qr)’ is dominant, which leads to monopole and quadrupole transitions. This possibility has been used to study the effect of changes of edge singularities on variation of selection rules (Schnatterly, 1979). Due to the flatness of the core “bands,” the 6 function in Eq. (48) just describes the density of unoccupied states. Furthermore, due to the extreme localization of the core wave function, the matrix element in Eq. (48) is nonzero only very close to the nucleus of the excited atom. Therefore, the local density of the unoccupied states is sampled. The above model of core excitations is a very oversimplified one, since it takes into account only singleparticle excitations. In reality, core excitations are a strong perturbation for the solid and therefore many-body effects appear. In particular, the interaction with the core-hole should be taken into account. Before discussing the effects due to the core-hole, we come to the problem of changes of the binding energy Es of the core level due to different chemical bonding of the excited atom. From XPS investigations (Siegbahn et al., 1969),

it is well known that the so-called chemical shift of the binding energy depends on the charge of the valence electrons, $q_i$, and the charges of the neighboring atoms, $q_j$, at distances $R_{ij}$:

$$\Delta E_B = K q_i + e^2 \sum_j q_j / R_{ij}. \tag{49}$$
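As a numerical illustration of Eq. (49), the following minimal sketch evaluates the two terms for a small cluster of point charges. The coupling constant `k_ev`, the charges, and the distances are hypothetical values chosen only for illustration, and the short neighbor list is not a converged Madelung sum. Charges are in units of the elementary charge and distances in Å, so that $e^2 \approx 14.4$ eV Å.

```python
# Point-charge sketch of Eq. (49): Delta E_B = K*q_i + e^2 * sum_j q_j / R_ij.
# All numerical inputs below are hypothetical illustration values.

E2_EV_ANGSTROM = 14.3996  # e^2 in eV*Angstrom (Gaussian units)


def chemical_shift(k_ev, q_i, neighbors):
    """Return the chemical shift in eV.

    k_ev      : coupling constant K in eV per unit charge (assumed value)
    q_i       : charge on the excited atom, in units of e
    neighbors : iterable of (q_j, r_ij) pairs, charge in e and distance in Angstrom
    """
    madelung = E2_EV_ANGSTROM * sum(q_j / r_ij for q_j, r_ij in neighbors)
    return k_ev * q_i + madelung


# Excited atom has given up half an electron to four neighbors at 2.0 Angstrom.
neighbors = [(-0.125, 2.0)] * 4
shift = chemical_shift(k_ev=14.0, q_i=0.5, neighbors=neighbors)
print(f"on-site term  : {14.0 * 0.5:+.2f} eV")
print(f"Madelung term : {shift - 14.0 * 0.5:+.2f} eV")
print(f"net shift     : {shift:+.2f} eV")
```

In this example the on-site term is positive and the neighbor term is negative, so the two partially cancel.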

The first term leads to a positive shift when valence electrons are transferred to neighboring atoms, since the nuclear charge is then less well screened and the binding energy is therefore increased. The second term, the Madelung term, leads in this case to a negative shift, which may partially compensate or even overcompensate the positive shift. When the structure of the investigated system is well known and the Madelung term can be calculated, important information on the charges on the atoms can be derived from measurements of $E_B$ using Eq. (49). In metals, the binding energy relative to the Fermi energy can be measured not only by XPS but also by EELS. In semiconductors and insulators, the situation for EELS is more complicated, since the gap must be known and excitonic effects must be taken into account. We now return to the possible effect of the core-hole potential on absorption spectra. In nearly-free-electron metals, the effects most often observed are the enhancement or the reduction of intensity at the threshold due to many-body effects (Citrin et al., 1979). At higher energies above threshold, it is generally recognized for simple metals that the core-hole potential has relatively little effect on the observed spectra. For this reason, the partial density of states picture has been strikingly successful in many cases. At the other extreme, in narrow-band systems such as dilute alloys or rare earths, where strongly localized wave functions contribute to the final states, it is quite clear that atomic-like spectra are observed for 2p-3d transitions (Thole et al., 1985) and for 4d-4f transitions (Fuggle et al., 1983). In the latter case, the interaction between the 4d core-hole and the 4f electrons is very strong and lowers the energy of these transitions by about 10-20 eV with respect to the 5d conduction-band edge. Strong excitonic effects have also been observed in large-gap ionic crystals, where the core-hole is poorly screened. In these cases, the Z + 1 approximation is valid, i.e., the central atom with the core-hole should be replaced by the next atom in the periodic table. The intermediate cases between nearly-free-electron metals and atomic-like excitations are extremely difficult to describe theoretically. Atomic Coulomb and exchange interactions, as well as solid-state band-structure effects, have to be taken into account. This was demonstrated for the 2p edges of the 3d metals by Fink et al. (1985a) and Zaanen et al. (1985). As a general rule, it was shown by Kotani and Toyozawa (1974) that with increasing strength of the core-hole potential, more and more spectral weight is shifted from higher energies to the threshold. This was also shown in a

calculation on the 1s edge of graphite by Mele and Ritsko (1979). A further increase of the core-hole interaction then leads to the separation of a core-exciton line below the continuum of unoccupied states. With these preliminary and very brief remarks on near-edge structures in mind, we shall now discuss the structures in the core excitation spectra about 50 eV above threshold (EXAFS = extended X-ray absorption fine structure). This modulation of the intensity is attributed to a change of the absorption cross section due to excited electrons backscattered from neighboring atoms. The backscattered wave interferes with the outgoing wave, and the interference may be constructive or destructive, depending on the wave vector and the distance. Therefore, information on the local structure around an excited atom can be obtained from this spectroscopy (Lee et al., 1981). The oscillatory part of the spectrum can be described for an s-state core level by

where the wave vector $k$ of the outgoing electron is defined by $\hbar^2 k^2 / 2m = E - E_B$, $N_i$ is the number of atoms in the neighboring shell $i$ separated from the central (absorbing) atom by a distance $R_i$, $\phi_i$ is the sum of the phase shifts caused by the absorbing and the backscattering atom, and $f_i(k)$ is the backscattering amplitude of a neighbor. The exponential terms contain the Debye-Waller-like vibrational attenuation and the damping due to the finite coherent path length of the outgoing electrons. When the phase shifts and the scattering amplitude are known, a Fourier transformation of the oscillations yields information on $N_i$ and $R_i$. While EXAFS on high-Z elements is a domain of synchrotron radiation, EXAFS on low-Z elements can be readily investigated by EELS. A systematic study of EXAFS spectra of low-Z elements, such as C in graphite and diamond and Al metal, has been performed recently by Hott (1986). This is, however, not included in this review. In the EXAFS range, the kinetic energy of the excited electrons is high enough for the plane-wave approximation to be good. The electrons are weakly backscattered by single neighboring atoms. In the XANES range just above threshold, the excited electrons interact strongly with many atoms. On the basis of an EXAFS-like theory that includes multiple scattering processes, XANES spectra can be analyzed to give information complementary to EXAFS, for example, bonding angles and higher-order atomic correlation effects (Dehmer, 1975; Natoli et al., 1980; Durham et al., 1981). This kind of spectroscopy is particularly interesting for molecules, for it is in this area that the concept of shape resonances above the ionization potential was developed. These resonances are most easily pictured as arising from a scattering process in which the excited electron is resonantly scattered to and fro along the internuclear axis between the absorbing atom and its neighbor.
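The statement that a Fourier transformation of the EXAFS oscillations yields the shell distance can be illustrated with a short sketch. It uses the standard single-scattering expression from the EXAFS literature, written with the symbols defined above; this is not necessarily the exact form of the author's equation. The parameter names `sigma` (Debye-Waller width) and `mean_free_path`, the constant amplitude, the zero phase shift, and all numerical values are assumptions made for illustration only.

```python
import numpy as np


def exafs_single_shell(k, n_atoms, radius, sigma, mean_free_path,
                       amplitude=1.0, phase=0.0):
    """Standard single-scattering EXAFS term for one shell (k in 1/Angstrom):

    chi(k) = -N |f(k)| / (k R^2) * sin(2 k R + phi)
             * exp(-2 sigma^2 k^2) * exp(-2 R / lambda)

    A constant amplitude and phase are simplifying assumptions.
    """
    return (-n_atoms * amplitude / (k * radius**2)
            * np.sin(2.0 * k * radius + phase)
            * np.exp(-2.0 * sigma**2 * k**2)
            * np.exp(-2.0 * radius / mean_free_path))


def radial_transform(k, chi, r):
    """Magnitude of the Fourier transform of k*chi(k) with kernel exp(-2ikr)."""
    dk = k[1] - k[0]
    kernel = np.exp(-2j * np.outer(r, k))
    return np.abs(kernel @ (k * chi)) * dk


# Hypothetical shell: 4 neighbors at 2.5 Angstrom.
k = np.linspace(3.0, 14.0, 1101)   # usable k range above threshold
r = np.linspace(0.5, 5.0, 451)     # trial distances in Angstrom
chi = exafs_single_shell(k, n_atoms=4, radius=2.5, sigma=0.05,
                         mean_free_path=8.0)
magnitude = radial_transform(k, chi, r)
r_peak = r[np.argmax(magnitude)]
print(f"peak of the radial transform at r = {r_peak:.2f} Angstrom")
```

With zero phase shift the transform peaks close to the input distance. A k-dependent phase shift would displace the peak, which is why the phase shifts must be known before distances can be extracted.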

This simple scattering picture suggests that the energy position of the shape resonances should be sensitive to intramolecular distances. Empirical relationships between the energy of the resonances with respect to the ionization threshold and the bond length have been worked out for a variety of molecules (Hitchcock et al., 1984; Sette et al., 1984). These rules were also used for the analysis of the bond lengths of molecules adsorbed on surfaces (Stöhr et al., 1984), and there are indications that they may also be used in the structure analysis of simple polymers (see Section VIII). It should be borne in mind, however, that the simple relationship between shape-resonance energy and bond length has been seriously called into question (Piancastelli et al., 1987) and that more experimental and theoretical work is needed in this field.

E. Data Evaluation

The intensity in a scattering experiment is given by the folding of the cross section with the instrumental resolution $R$ and integration over the scattering volume $V$:

Normally, the resolution function can be separated into an energy resolution function $f(\omega)$ and a momentum resolution function $g(q_\perp)$. With a high energy resolution, we can neglect the folding with $f(\omega)$. Remembering the slow variation of the volume loss function with momentum transfer, we obtain for the $R$ integration the result

which, using a measured $g(q_\perp)$, can be calculated numerically. Then the scattered intensity is given by

where $A$ and $D$ are the area and the thickness of the sample, respectively. Equations (51) and (52) are valid for very thin samples, for which contributions from multiple scattering processes can be neglected. In most cases, the samples are not thin enough, and corrections for multiple scattering contributions have to be applied. The procedure for these corrections is described by Daniels et al. (1970). The essential point of this correction for double losses is that a function that is the spectrum convoluted with itself is subtracted from the original data. In most experiments, it is extremely difficult to obtain a reliable value for the sample thickness. Therefore, the loss measurements only provide a

function that is proportional to $\mathrm{Im}(-1/\varepsilon)$. For metals and low momentum transfer, $\mathrm{Re}[-1/\varepsilon(0,0)]$ is zero. By means of the Kramers-Kronig relation given in Eq. (16), it is then possible to determine the absolute value of the loss function. A similar procedure can be used for semiconductors and insulators, since $\mathrm{Re}[1/\varepsilon(0,0)] = 1/\varepsilon_1(0,0) = 1/n^2$, where $n$ is the optical refractive index determined at low energies. A Kramers-Kronig analysis of spectra taken at higher momentum transfer is more problematic, since approximations or models (Resta, 1977) of the $q$ dependence of the refractive index have to be used. In principle, when the conditions of the spectrometer are extremely stable, the thickness of the sample can be determined by a $q = 0$ measurement, and this value can then be used for the evaluation of data taken at higher $q$. Once the absolute value of the loss function is known, $\mathrm{Re}[-1/\varepsilon(q,\omega)]$ can be derived by the Kramers-Kronig transformation. Then $\varepsilon_1(q,\omega)$, $\varepsilon_2(q,\omega)$, reflectivity, optical conductivity, and absorption can be calculated. For $q$ small compared to the dimensions of the Brillouin zone, the latter functions can be compared directly with the optical data.
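The normalization and Kramers-Kronig steps described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the evaluation program used by the author: the function names are hypothetical, the spectrum is assumed to be sampled on a uniform energy grid that excludes zero, multiple scattering is assumed to be already removed, and the relations used are the standard ones, $1 - \mathrm{Re}[1/\varepsilon(0)] = (2/\pi)\int \mathrm{Im}(-1/\varepsilon)\,d\omega/\omega$ and $\mathrm{Re}[1/\varepsilon(\omega)] = 1 - (2/\pi)\,P\!\int \mathrm{Im}(-1/\varepsilon(\omega'))\,\omega'\,d\omega'/(\omega'^2 - \omega^2)$.

```python
import numpy as np


def normalize_loss(omega, spectrum, n_static=np.inf):
    """Scale a spectrum proportional to Im(-1/eps) to absolute units.

    Uses (2/pi) * int Im(-1/eps)/w dw = 1 - Re[1/eps(0)], with
    Re[1/eps(0)] = 0 for metals (n_static = inf) and 1/n^2 otherwise.
    """
    d_omega = omega[1] - omega[0]
    integral = (2.0 / np.pi) * np.sum(spectrum / omega) * d_omega
    target = 1.0 - 1.0 / n_static**2
    return spectrum * target / integral


def kk_real_part(omega, loss):
    """Return Re[1/eps(w)] from the absolute loss function Im(-1/eps)."""
    d_omega = omega[1] - omega[0]
    w_src = omega[np.newaxis, :]   # integration variable w'
    w_obs = omega[:, np.newaxis]   # energy at which Re[1/eps] is evaluated
    denom = w_src**2 - w_obs**2
    np.fill_diagonal(denom, 1.0)   # placeholder, singular point handled below
    kernel = loss[np.newaxis, :] * w_src / denom
    np.fill_diagonal(kernel, 0.0)
    off_diag = kernel.sum(axis=1) * d_omega
    # Principal-value contribution of the grid cell containing the singularity.
    diag = d_omega * (0.5 * np.gradient(loss, omega) + loss / (4.0 * omega))
    return 1.0 - (2.0 / np.pi) * (off_diag + diag)


def dielectric_function(re_inv, loss):
    """Invert 1/eps = Re[1/eps] - i*Im(-1/eps) to obtain (eps1, eps2)."""
    eps = 1.0 / (re_inv - 1j * loss)
    return eps.real, eps.imag


# Demonstration on a Drude metal (plasmon energy 10 eV, damping 2 eV).
omega = (np.arange(2000) + 0.5) * 0.05          # eV, grid avoids omega = 0
eps_model = 1.0 - 10.0**2 / (omega**2 + 2.0j * omega)
true_loss = -(1.0 / eps_model).imag
measured = 3.7 * true_loss                       # arbitrary unknown scale
loss = normalize_loss(omega, measured)
re_inv = kk_real_part(omega, loss)
eps1, eps2 = dielectric_function(re_inv, loss)
```

For an insulator, the same call with `n_static` set to the optical refractive index applies the $1/n^2$ condition quoted in the text.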

III. INSTRUMENTATION

A. Principle of Operation

As outlined in Section II, a versatile spectrometer should fulfill several requirements:

1. A variable energy resolution, which entails a variable electron current. For valence band excitations, a high energy resolution $\Delta E \le 0.2$ eV is desirable and only low currents are needed, while for EXAFS a high current is needed and a poor energy resolution $\Delta E \ge 1$ eV can be tolerated.
2. A variable momentum transfer resolution is highly desirable. In most cases of valence band excitations, a high momentum transfer resolution $\Delta q_{1/2} \le 0.1$ Å$^{-1}$ is needed, while for core-level excitations the angle within which most of the scattered intensity is located increases with increasing energy loss [see Eqs. (15) and (1c)]. Therefore, for the latter excitations a poor momentum transfer resolution $\Delta q_{1/2} \ge 0.1$ Å$^{-1}$ is needed for intensity reasons.
3. The sample should be on ground potential so that sample changing can be performed very quickly without shutting down the high voltage. Moreover, sample cooling and heating, in situ sample preparation, and transfer from other vacuum chambers can then be performed easily.
4. The spectrometer should be ultra-high vacuum compatible in order to allow measurements on reactive samples.

5. The primary energy of the incoming electrons should be high to allow measurements on thick samples without serious problems from multiple scattering.

In the Karlsruhe spectrometer, we tried to fulfill these requirements as far as possible. The layout is close to that of the Princeton spectrometer (Gibbons et al., 1975). In addition, we included certain new elements and elements used in previously constructed spectrometers (Boersch and Miessner, 1962; Kunz, 1966; Lohff, 1963). A sketch of the spectrometer is shown in Fig. 8. Electrons emitted from an electron source are monochromatized in an electrostatic deflection monochromator. The output of the monochromator is matched to the accelerator by a zoom lens system. At the sample, the electrons have a primary energy of 170 keV in the present configuration. Originally, the system was designed to operate up to 300 keV. However, serious accidents due to arcing between the potential rings of the accelerator occurred, and the voltage had to be reduced. After the scattering process, the electrons are decelerated by the same voltage, pass a second zoom lens system, and are analyzed in a spherical deflection analyzer and then in a detector system. The ripple on the high-voltage power supply has no influence on the energy resolution as long as the time of flight of the electrons between monochromator and analyzer (about 20 nsec) is small compared to the time in which changes of the order of the energy resolution occur on the high-voltage supply. Inelastically scattered electrons are analyzed by raising the monochromator potential above the zero-energy-loss potential, thus compensating the energy loss in the sample. Momentum transfer is selected by two horizontal and two vertical pairs of

FIG. 8. Schematic drawing of the Karlsruhe electron energy-loss spectrometer.

deflection plates after the sample. Thus, scattering angles $\theta$ and $\phi$ may be compensated by bending the electron beam back to the optical axis through the electrostatic field of the deflection plates. We used 180° spherical electrostatic deflectors as the monochromator and analyzer, since considerable calculations and experimental information are available for this system. A comparison with other monochromators can be found in the monographs by Ballu (1980) and Ibach and Mills (1982). The energy resolution of the hemispherical monochromator (and analyzer) is given by $\Delta E / E_0 = W/2R + \alpha^2$, where $W$ is the width of the entrance and exit slits, $R$ the radius of the electron path, $E_0$ the energy of the electrons that travel on a circle of radius $R$ (pass energy), and $\alpha$ the angular spread of the incident electrons. Following Kuyatt and Simpson (1967), we chose $\alpha^2 = W/4R$ in order to reduce the tailing of the energy distribution. Once the mechanical values $W$, $R$, and $\alpha$ had been fixed, the only parameter that could be changed to vary the energy resolution was the pass energy $E_0$. Therefore, to fulfill criterion 1 for variable energy resolution, we provided the spectrometer with a variable pass energy for monochromator and analyzer. To focus the full electron beam emitted from the monochromator onto the sample, it is necessary to apply Liouville's invariance theorem for the number of electrons in the phase space spanned by position and momentum coordinates. For paraxial rays, no energy-dispersing elements, and small angles, this theorem reduces to the Abbe-Helmholtz law $\alpha_1 r_1 \sqrt{E_1} = \alpha_2 r_2 \sqrt{E_2}$ for electron rays originating at a point in plane 1 (at $r_1$ and with $\alpha_1$) and imaged to a point in plane 2 (at $r_2$ and with $\alpha_2$). Since $\alpha_M$, $r_M$, and $E_M$ at the monochromator and $E_s$ at the sample position are fixed, the only variable that can be changed to vary the momentum-transfer resolution $\alpha_s$ at the sample is $r_s$.
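The two relations above lend themselves to a short numerical sketch: the energy resolution of the hemispherical deflector and the Abbe-Helmholtz scaling of the beam angle between monochromator and sample. The function names are hypothetical. Treating the 0.5 mm virtual slit image and the 50 mm radius quoted for the monochromator as $W$ and $R$, and the beam radius, angle, and energy used at the monochromator exit, are assumptions made here for illustration. The conversion from angle to momentum transfer uses the relativistic electron wave number, which is standard kinematics rather than a formula taken from the text.

```python
import math

HBAR_C_EV_ANGSTROM = 1973.2698        # hbar*c in eV*Angstrom
ELECTRON_REST_ENERGY_EV = 510998.95   # m*c^2 in eV


def energy_resolution_ev(pass_energy_ev, slit_width_mm, radius_mm,
                         alpha_rad=None):
    """Delta E = E0 * (W / 2R + alpha^2); alpha^2 defaults to W / 4R."""
    if alpha_rad is None:
        alpha_sq = slit_width_mm / (4.0 * radius_mm)
    else:
        alpha_sq = alpha_rad**2
    return pass_energy_ev * (slit_width_mm / (2.0 * radius_mm) + alpha_sq)


def sample_half_angle(alpha_m, r_m, e_m, r_s, e_s):
    """Abbe-Helmholtz law: alpha_m * r_m * sqrt(E_m) = alpha_s * r_s * sqrt(E_s)."""
    return alpha_m * r_m * math.sqrt(e_m) / (r_s * math.sqrt(e_s))


def wave_number_inv_angstrom(kinetic_energy_ev):
    """Relativistic electron wave number k = p / hbar in 1/Angstrom."""
    pc = math.sqrt(kinetic_energy_ev**2
                   + 2.0 * kinetic_energy_ev * ELECTRON_REST_ENERGY_EV)
    return pc / HBAR_C_EV_ANGSTROM


# Illustrative numbers only (assumed, not a documented spectrometer setting).
delta_e = energy_resolution_ev(pass_energy_ev=5.0, slit_width_mm=0.5,
                               radius_mm=50.0)
alpha_s = sample_half_angle(alpha_m=0.026, r_m=0.15, e_m=50.0,
                            r_s=1.3, e_s=170e3)
delta_q = wave_number_inv_angstrom(170e3) * alpha_s
print(f"Delta E = {1e3 * delta_e:.1f} meV")
print(f"alpha_s = {alpha_s:.2e} rad, Delta q = {delta_q:.3f} 1/Angstrom")
```

Because the product of angle, radius, and the square root of the energy is conserved, enlarging the beam radius at the sample is the only way to reduce the angular spread there, which is the design argument made in the text.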
Since the optical properties of the accelerator (and the decelerator) are also fixed, the magnification $M = r_s / r_M$ can be changed with the aid of the zoom lens system between monochromator (analyzer) and accelerator (decelerator). A rather sophisticated design of the zoom lens system is needed to achieve both the variable magnification, yielding a variable momentum transfer resolution at the sample, and a variable pass energy $E_0$, yielding the variable energy resolution. Still, it has not been possible to realize all the calculated spectrometer settings experimentally.

B. Electron Source, Monochromator, Analyzer, and Detector

The monochromator including the electron gun and the zoom lenses is shown in Fig. 9. The design of the electron gun and the monochromator is close to that of Kuyatt and Simpson (1967). The gun is a planar diode with a small hole in the anode, and is always operated at full space charge-limited

FIG. 9. Monochromator-zoom lens system. C: cathode; S: electron source; D1-D6: deflection plates; ML: matching lenses; EA: entrance aperture; DL: deceleration lens; M: monochromator; AL: acceleration lens; ES: exit slit; Z1-Z4: zoom lenses; EL: einzel lens.

conditions. The cathode is an indirectly heated tungsten dispenser cathode 3 mm in diameter. A three-element electrostatic tube lens focuses the gun beam onto the entrance apertures of the monochromator, setting the beam diameter and the beam angle to 0.3 mm and 0.026 rad, respectively. The 10:1 deceleration lens forms a virtual 0.5 mm wide image of the aperture at the entrance plane of a monochromator with a radius of 50 mm. After the monochromator, the beam is transferred by a 1:10 acceleration lens to a 0.3 mm wide exit slit. The use of virtual entrance and exit slits in the monochromator serves two main purposes. First, any secondary electrons produced at the entrance apertures are prevented from reaching the electrostatic deflector; and second, electrons of very low energy do not have to pass through the actual slits. According to Kuyatt and Simpson (1967), the latter avoids space-charge problems in the monochromator, since at higher energy more electrons can be transferred through the entrance slits of the monochromator than at low energy. Taking into account the space-charge limitation at the entrance slit and the anomalous energy spread due to the interaction of electrons at high current densities (Boersch, 1954), the maximum current $I_{\max}$ (in μA) that can be achieved in this configuration is

$$I_{\max} = 31.2\, R\, (\Delta E)^{5/2} / (\Delta E_k)^{1/2}, \tag{53}$$

where $\Delta E_k$ is the effective energy width of the cathode (in eV) and $R$ the mean radius of the monochromator (in cm). This relation shows that the current is strongly correlated with the energy resolution. The gun-monochromator

assembly should deliver a beam into the zoom lens with a current given by $I_{\max}$ and a beam diameter and angle of 0.3 mm and 0.026 rad, respectively. The design of the analyzer is almost identical to that of the monochromator. Instead of the exit slit of the monochromator, there are entrance apertures limiting the radial and the angular acceptance of the analyzer to 0.3 mm and 0.026 rad, respectively. The matching lens and the electron gun are replaced by one electrostatic lens that focuses the beam onto the entrance anode of an open multiplier. One of these lens elements is also used as a Faraday cup when high currents have to be recorded. All parts of the monochromator, analyzer, and zoom lenses are fabricated from Al metal coated with a thin layer of carbon. Insulating pieces are machined in most cases from Al$_2$O$_3$. Both monochromator and analyzer are surrounded by a μ-metal shield to reduce the earth's magnetic field and the stray field of the ion pumps to below 0.2 μT.

C. Zoom Lenses, Accelerator, and Decelerator

As outlined above, the purpose of the zoom lenses is to produce an image of the monochromator output with variable magnification at the image point of the accelerator in order to vary the momentum transfer resolution at the sample. As shown in Fig. 9, the zoom lenses are composed of four electrostatic tube lenses. The potential of the monochromator exit tube is given by the energy resolution required. The potential of the tube at the accelerator input is fixed at 3500 V in order to have constant focal properties of the accelerator. The potentials of the second, third, and fourth tubes are varied in such a way as to produce an image in front of the accelerator with the desired magnification. This was accomplished, first, by a program that adjusts the voltages by Newton's method with the aid of matrix arithmetic based on the focal properties of two-element lenses. These values were then passed to a second program that calculates and optimizes electron trajectories in this lens system. The second, more complicated program has a higher accuracy, and details such as the filling factors of the lenses could be studied. At the image point in front of the accelerator, an einzel lens was installed in order to set the pupil of that image to the focal point of the accelerator. Thus, the pupil at the sample is transferred to infinity. The einzel lens has only a small effect on the image because the elements of the lens are close to the image position. Details of the calculations and the programs used are described by Fink and Kisker (1980). Both accelerator and decelerator are composed of three 90-keV commercial acceleration tubes from National Electrostatics Corp., Middletown, Wisc. The elements of the resistor chain near the zoom lenses were slightly reduced in order to weaken the strong focusing at the low-energy ends. The

calculation of the focal properties of the accelerator was done with the same programs already used for the zoom lenses. With potentials of 3.5 kV at the entrance and 170 kV at the exit of the accelerator, a magnification of 1.6 was derived. The earth's magnetic field in the acceleration tubes was reduced to below 2 μT by replacing the standard Al tube rings by rings fabricated from μ-metal.
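The idea of adjusting lens settings by Newton's method with matrix arithmetic, as used for the zoom lenses, can be sketched with a deliberately simplified model. The sketch below is not the program of Fink and Kisker (1980): it replaces the electrostatic tube lenses by two hypothetical thin lenses separated by drift spaces and solves for the two focal powers, rather than voltages, that give an image at a fixed plane with a prescribed magnification. All names, distances, and starting values are assumptions for illustration.

```python
import numpy as np


def drift(d):
    """Paraxial transfer matrix of a field-free drift of length d."""
    return np.array([[1.0, d], [0.0, 1.0]])


def thin_lens(power):
    """Paraxial transfer matrix of a thin lens with focal power 1/f."""
    return np.array([[1.0, 0.0], [-power, 1.0]])


def system_matrix(powers, gaps):
    """Object plane -> lens 1 -> lens 2 -> image plane."""
    p1, p2 = powers
    d1, d2, d3 = gaps
    return drift(d3) @ thin_lens(p2) @ drift(d2) @ thin_lens(p1) @ drift(d1)


def residual(powers, gaps, magnification):
    """Imaging requires B = 0; the magnification is then the A element."""
    m = system_matrix(powers, gaps)
    return np.array([m[0, 0] - magnification, m[0, 1] / gaps[0]])


def solve_powers(gaps, magnification, start, tol=1e-12, max_iter=50):
    """Newton iteration with a finite-difference Jacobian."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        f = residual(x, gaps, magnification)
        if np.max(np.abs(f)) < tol:
            break
        jac = np.empty((2, 2))
        for j in range(2):
            step = np.zeros(2)
            step[j] = 1e-7 * max(abs(x[j]), 1e-3)
            jac[:, j] = (residual(x + step, gaps, magnification) - f) / step[j]
        x = x - np.linalg.solve(jac, f)
    return x


# Hypothetical geometry: three 100 mm gaps, target magnification -1.5.
gaps = (100.0, 100.0, 100.0)
powers = solve_powers(gaps, magnification=-1.5, start=(0.01, 0.01))
print("focal lengths (mm):", 1.0 / powers)
```

Changing the target magnification and re-running the solver mimics, in this toy model, how a zoom lens is retuned for a different momentum transfer resolution while the object and image planes stay fixed.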

D. Scattering, Characterization, and Preparation Chamber

The sample holder is mounted on a commercial manipulator. The temperature of the sample can be varied between 77 K and 700 K. A sample holder for liquid He cooling is currently in preparation. Between sample holder and decelerator, two horizontal and two vertical pairs of deflection plates are installed for momentum transfer selection. They are separated by Herzog plates to reduce the fringing fields between the pairs. Voltages of up to 2 kV on the deflection plates are necessary to compensate momentum transfers up to 10 Å$^{-1}$. By means of computer-driven electronics, it is possible to measure scattering angles in all four quadrants. Before and after the scattering chamber, four deflection coils are installed for tuning the beam. Magnetic shielding is again achieved by a 1 mm μ-metal housing inside the chamber. Attached to the scattering chamber is a chamber for sample characterization (see Fig. 8). By means of a Leybold-Heraeus spectrometer, X-ray induced photoelectron spectroscopy (XPS) without monochromator, Auger-electron spectroscopy (AES), and ion backscattering spectroscopy (IBS) can be performed in this commercial chamber. The ion gun for the latter measurements can also be used to thin down samples by ion beam milling. A furnace is also installed in the chamber to anneal samples at up to 800°C. The spectrometer is equipped with a preparation chamber in which samples can be produced by evaporation from three different ports supplied with electron guns or resistively heated boats. The evaporation rate can be monitored by two quartz oscillators. Therefore, it is also possible to prepare alloys. All the chambers are connected by magnetically coupled transfer mechanisms. A fast lock-in is installed between the preparation chamber and the characterization chamber. Thus, samples can be introduced into the various chambers of the spectrometer within half an hour while the ultra-high vacuum is maintained.

E. Vacuum System and Electronics

The monochromator-zoom-lens chamber, analyzer-zoom-lens chamber, accelerator, decelerator, scattering chamber, and characterization chamber are each pumped by their own magnetic ion pump. In addition, the scattering

chamber and the characterization chamber possess supporting Ti sublimation pumps. The preparation chamber and the lock-in chamber are pumped by turbomolecular pumps. Each chamber is separated from the neighboring chambers by valves. In all the chambers, pressures below $10^{-9}$ Torr are routinely achieved. In the monochromator chamber, the scattering chamber, and the characterization chamber, pressures down to $8 \times 10^{-11}$ Torr can be reached using liquid-nitrogen-cooled Ti sublimation pumps. No differential pumping techniques are required to achieve these low pressures.

The choice to have monochromator and analyzer on high potential and the sample on ground potential, which is highly advantageous for routine measurements, entails rather complicated remote control of the spectrometer, as almost all the power supplies for the electron gun, monochromator, analyzer, lenses, and deflection plates have to be on high potential. A block diagram of the electronics is shown in Fig. 10. The heart of the spectrometer control is a NOVA 4 computer connected to a CAMAC crate. Serial information to and from the high potential of monochromator and analyzer is transmitted by fiber optic cables. It is then transferred via a serial-to-parallel converter to the various power supplies. In all, there are 21 power supplies for cathode heating, monochromator, lenses, and deflection plates on the monochromator terminal. Almost no resistor dividers could be used for the lens voltages, as nearly all the potentials had to be variable in order to achieve variable energy and momentum resolution. In addition, the programmable $\Delta E$ supply (1 kV, Systron Donner, Model M107) is controlled by an output register connected to the serial-to-parallel converter. On the analyzer terminal there are 16 power supplies. A second "$\Delta E$+" supply (10 kV, Fluke, Model 410B) is set to the analyzer potential in order to add a fixed voltage (normally 1 kV) to the variable 1-kV $\Delta E$ supply in case losses above 1 keV are to be measured. The current transmitted through the analyzer is measured either at the entrance lens of the multiplier or, after amplification, at the anode of the multiplier, by programmable ammeters (Keithley, Model 18000-20) followed by voltage-to-frequency converters. At low currents, the electrons are counted directly by an amplifier/discriminator. The frequency corresponding to the current or counting rate is transmitted to the scaler in the CAMAC crate via a fiber optic cable. Both $\Delta E$ supplies are connected to a 300-kV high-voltage power supply consisting of a cascade transformer supply (Haefely, Typ 1.5-150M) and a resistor chain voltage divider (Haefely, Typ M 150-D) that provides feedback for regulating the output to 0.01% with a peak-to-peak ripple of 30 V. The 220-V ac power on high potential that is necessary for the electronics and the vacuum pumps is provided by a 300-kV insulation transformer below the analyzer terminal. From there the 220-V ac is transferred to the monochromator terminal via a small 10-kV insulation transformer. On ground potential, a multi-DAC in the CAMAC crate drives the high-voltage power supplies (2 kV, Wenzel Electronics) for the momentum transfer selection and four 200-mA current sources for the deflection coils at the scattering chamber.

FIG. 10. Schematic drawing of the spectrometer electronics. Dot-and-dash lines: fiber optic cables. Dashed lines indicate the monochromator and analyzer terminals.

F. Spectrometer Performance

Five different spectrometer settings have now been developed. The energy and momentum resolution and the current transmitted through the spectrometer are listed in Table II. For valence band excitations and elastic electron diffraction patterns, mostly file 2 and sometimes file 1 are used. Files 3-5 were installed for core-level excitations. The spectrometer settings are loaded via the computer to the power supplies. Normally, after a new setting of the spectrometer, a current is measured that is slightly lower than the highest value possible. After automatically tuning the various deflection plates and coils as well as the pass energy of the analyzer in such a way as to optimize the beam current, the optimum beam current as listed in Table II can be reached within several minutes. Then the sample is moved into the beam, and the sample position is optimized in terms of maximum current in the plasmon loss setting. A further computer tuning of the deflection plates guarantees that the scattering intensity is centered at $q = 0$. A change to another spectrometer setting, i.e., another energy and momentum resolution, can be achieved in several minutes. Energy loss or momentum transfer (scattering angle $\theta$ or $\phi$) is scanned up and down in order to avoid transient errors due to flyback at the end of the scan. This also serves to cancel out slope effects in the data, which may result from long-term drift in the beam current. Normally, the technique of multiscanning is used in order to reduce beam fluctuation noise (about 1%). For lengthy measurements, tuning of the spectrometer with respect to the plasmon intensity can be performed automatically between scans. The data are stored in the computer and can be displayed on a TV display or plotted on a digital plotter.

TABLE II
CURRENTLY USED SPECTROMETER SETTINGS AND COMPARISON WITH CALCULATED VALUES. I_M AND I_F ARE THE CURRENTS AFTER THE MONOCHROMATOR AND AFTER TRANSMITTING THE ANALYZER, RESPECTIVELY

                  Calculated values                    Actual values
File   E_0    r_s    ΔE      Δq_1/2   I_M       r_s    ΔE      Δq_1/2   I_M
No.    (V)    (mm)   (meV)   (Å^-1)   (nA)      (mm)   (meV)   (Å^-1)   (nA)
1      5      1.3    40      0.01     17        0.25   80      0.04     15
2      11     2.0    80      0.01     170       0.5    170     0.04     120
3      11     0.2    80      0.1      170       -      200     0.1      120
4      32     0.18   150     0.1      300       0.15   400     0.2      500
5      41     0.13   300     0.2      5500             700     0.3      2000

For the design of similar electron energy-loss spectrometers, it is instructive to compare the calculated current, energy resolution, and momentum transfer resolution data (also given in Table II) with those actually achieved in the spectrometer. The currents after the monochromator are slightly lower than but close to the values calculated from Eq. (53). The full current of the monochromator can be transferred to the entrance apertures of the analyzer. After the analyzer, the current is reduced by a factor of 10 to 30. Most of this loss occurs in the two entrance apertures of the analyzer, probably due to incorrect matching of the monochromator output to the analyzer input via zoom lenses and einzel lenses. Similar strong current losses were reported for the Princeton spectrometer (Gibbons et al., 1975).
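The benefit of scanning up and down, mentioned above as a way to cancel slope effects from long-term drift, can be illustrated with a small sketch. It assumes a purely linear drift of the beam current during the measurement and one time step per data point; the function name, the drift rate, and the test spectrum are hypothetical.

```python
import numpy as np


def up_down_average(spectrum, drift_per_point):
    """Simulate an up scan followed by a down scan under linear current drift.

    Returns the drifted up scan and the average of the up and down scans.
    """
    n = spectrum.size
    t_up = np.arange(n)                  # point k is measured at time k
    t_down = 2 * n - 1 - np.arange(n)    # and revisited on the way back
    up = spectrum * (1.0 + drift_per_point * t_up)
    down = spectrum * (1.0 + drift_per_point * t_down)
    return up, 0.5 * (up + down)


# Hypothetical loss feature on a flat background, 0.1% drift per point.
channels = np.arange(200)
spectrum = np.exp(-0.5 * ((channels - 80) / 10.0) ** 2) + 0.1
up, averaged = up_down_average(spectrum, drift_per_point=1e-3)

# The single scan acquires a spurious slope; the average is only rescaled.
print("up scan distortion  :", (up / spectrum).min(), (up / spectrum).max())
print("averaged distortion :", (averaged / spectrum).min(),
      (averaged / spectrum).max())
```

For a linear drift, each data point is measured at two times whose mean is the same for all points, so the averaged spectrum differs from the true one only by a constant factor.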
The energy resolution is poorer than the calculated values by almost a factor of two. The reason for this loss in energy resolution and, in particular, why it is not possible to improve the resolution below $\Delta E = 80$ meV, is not clear. The momentum transfer resolutions achieved are close to the calculated ones. For low $\Delta q_{1/2}$, a considerable loss in resolution is observed, and $\Delta q_{1/2} \le 0.03$ Å$^{-1}$ could not be reached. The discrepancies between the calculated and experimental values may be due to lens aberrations, ripple on the power supplies, incomplete magnetic shielding, and fringing fields due to charging of the insulators. There are many parameters, e.g., the 35 power supplies, that influence the performance of the spectrometer. Currently, we are using the present configuration of the spectrometer without serious attempts to improve its performance.

IV. SAMPLE PREPARATION

One of the serious problems in the application of EELS to solid-state physics and materials science is the preparation of the samples. While in some cases preparation can be done easily, in other cases it is extremely difficult or not feasible at all. Samples with a thickness of 500 to 4000 Å and a diameter of at least several tenths of a millimeter are needed. At high momentum resolution, the diameter should be even larger. Thus, the situation is completely different from EELS using an electron microscope, where sample diameters of several micrometers, which can be prepared by chemical etching, are sufficient. Various preparation methods that depend on the actual solid have been developed in the past several years. They have been described in several reviews, but in view of the importance of this field in EELS we present here a brief outline of the methods we have used during the past five years.

A. Thin Film Deposition

Large thin films with controlled thickness can be prepared by vacuum evaporation of metals and oxides. Samples that are reactive, e.g., alkali metals, are evaporated onto thin substrates of amorphous carbon or formvar in a preparation chamber attached to the spectrometer. In these cases, the substrate cannot be removed from the sample. Using single-crystalline MgO as substrate, single-crystalline samples of Na and K can also be grown. For core-level spectroscopy, where the substrate does not disturb the spectra, samples of all 3d elements and dilute AgMn and CuMn are prepared in the preparation chamber. For non-reactive samples, a separate vacuum chamber is used. Large-grain single-crystalline Al films are epitaxially grown on heated

158

JORG FINK

NaCl crystals. A15 compounds, such as Nb,Ge, are evaporated from two different controlled sources onto heated Mo ribbons to be removed later on by etching in a FeCl, solution. Almost perfect single-crystalline MgO films may be prepared by evaporation onto LiF-covered NaCl heated to 200°C. Various intermetallic compounds, such as refractory materials and A15 compounds, are prepared by sputtering. In all these cases, heated Mo ribbons are used as substrate. Refractory compounds, e.g., 3d and 4d metal carbides and nitrides, are produced by reactive sputtering of the metals in argon/ methane or argon/nitrogen atmosphere. The stoichiometry may be varied over a large range by changing the methane or nitrogen partial pressure. A15 compounds, such as Nb,Ge, Nb,Sn, and Nb,Al are prepared by cosputtering the metals from two different cathodes placed side by side containing the elemental metals. The stoichiometry of the films may be varied by changing the current on the individual magnetrons. Thin hydrocarbon plasma-generated amorphous carbon films (a-C: H) are deposited on NaCl substrates in a rf plasma sustained by benzene vapour (Fink et al., 1984a; Bubenzer et al., 1983). By varying the voltage across the discharge space and/or the benzene pressure, the optical properties of these films can be varied. Thin films of conducting polymers, such as polypyrrole (Fink et al., 1986), polythiophene (Fink et al., 1987a), or polyaniline, can be produced by electropolymerization in an electrochemical cell. In particular, Sn-In-oxide electrodes are suitable as a substrate, since samples can be easily floated off in water off their smooth surface. The thickness of the films can be monitored by measuring the charge flow in the electrochemical cell. Partially oriented polyparaphenylene films with a thickness I 1000 A are prepared by the method of Kovacic (1966) using a stirring system for shearflow polymerization in a thin slit zone (Tieke, 1982). 
Thin films from soluble polymers such as polyparaphenylenesulfide or polyparaphenyleneoxide can be produced by dispersing the dilute solutions onto either NaCl or glass substrates. After evaporation of the solvent, the film can be floated off in distilled water or alcohol. B. Preparation from Macroscopic Solids

Thin films of AuMn alloys can be prepared by rolling the alloys into foils having a thickness of several micrometers and subsequent thinning by ion beam milling to below 1000 Å. For core-level spectroscopy, where the loss energy is far from the zero-loss peak, non-homogeneous samples with numerous pinholes or even fine powder fixed on a high-mesh electron microscope grid may be used. In this way, samples of AuMn alloys and various high-Tc superconductors (La2-xSrxCuO4 and YBa2Cu3O7) may be produced. Samples with thicknesses as little as 500 Å can be cut by an ultramicrotome fitted with a glass or diamond knife. To prevent folding, the slices are usually floated from the cutting edge onto water and subsequently onto the grids. Samples of various polymers and charge-transfer salts have been produced by this method. This method has recently been applied for sample preparation of MgO and high-Tc ceramics using a diamond knife. Highly oriented films of polyacetylene and polyparaphenylenevinylene with a thickness of 2000-4000 Å may be prepared by casting a solution of a precursor polymer. During the conversion of the prepolymer, highly oriented polymers with chain orientation of approximately several degrees may be obtained by stress alignment (Leising, 1984; Fink et al., 1987b). Layer compounds, such as graphite or transition metal chalcogenides (Manzke et al., 1981), can be mechanically cleaved to a thickness below 1000 Å. Normally, adhesive tape is attached to the opposite surfaces and the material is pulled apart by hand. If the process is repeated a sufficient number of times, thin samples can be obtained on tapes that can then be floated off in organic solvents.

C. Ion Implantation and Doping

Ion implantation is a powerful method of preparation of thin films that are not in thermal equilibrium. To study rare-gas bubbles in metals, all the rare gases from He to Xe are implanted into various metals and alloys (Al, Ni, Mo, Rh, NiP, and NiB). Normally, different implantation energies are chosen in order to achieve a homogeneous implantation profile. The concentration and concentration profile are studied by proton-enhanced scattering by a 12C(p,p) nuclear reaction method (for He) or by Rutherford backscattering (for Z ≥ 14) (Mayer and Rimini, 1971). Amorphous samples of NiB and NiP alloys for EXAFS studies and studies of He bubbles in amorphous alloys are prepared by ion implantation of B or P into Ni foils (Thome et al., 1983). To investigate the tribological properties of implanted surfaces, Ni foils may be implanted with Kr and Rb. These examples show how important the modern techniques of sample preparation are as tools for EELS.

As will be shown in more detail in Section VIII, various conjugated polymers can be doped by AsF5, I2, ... (p-type doping) or Li, Na, ... (n-type doping) to achieve high conductivities. p-type doping from the gas phase is performed outside the spectrometer. The reactive samples are then transferred from the high-purity Ar glove-box via a lock-in transfer to the spectrometer. p-type doping with ClO4⁻, SO4²⁻, or BF4⁻ is also achieved in an electrochemical cell inside the glove-box. Thin, highly reactive, n-type doped films cannot be handled in the glove-box. Therefore, doping is performed by evaporating definite quantities of alkali metals onto the conjugated polymers in the preparation chamber attached to the spectrometer. Subsequent annealing causes the alkali metals to diffuse into the polymers, and homogeneous doping is achieved. The doping concentration may be varied by changing the amount of alkali metals evaporated onto the polymers. In some cases, the doping concentration may also be varied by heating the fully n-type doped sample to several hundred degrees centigrade. The alkali metals are then evaporated from the sample, and the doping concentration is reduced. By performing this procedure in the beam during the measurements, it is possible to make in situ measurements as a function of doping concentration.

V. NEARLY-FREE-ELECTRON METALS

The understanding of the electronic structure of nearly-free-electron (NFE) metals (sp metals) has been a challenge since the beginning of solid-state physics and is the basis for the understanding of more complicated metals such as transition metals or rare earth metals. The Sommerfeld free-electron model is a simple model for the sp metals in which the electrons are assumed to move independently of each other and are neutralized by a homogeneous positive background. The next improvement is to switch on long-range interactions between electrons, i.e., electrons are now assumed to move in the average field of the other electrons. This so-called random phase approximation (RPA) for the jellium model leads to the Lindhard dielectric functions given by Eqs. (24) and (27), which describe single-particle excitations (intraband transitions) and collective excitations (plasmons). The latter excitations show a dispersion in momentum transfer that is given in RPA for the low-q limit by Eqs. (28) and (29). As outlined in Section II,B,2, the RPA is valid in the weak coupling regime, i.e., when the kinetic energy of the electrons is much larger than the average potential energy of interaction between the electrons. For the jellium model, this means that RPA is valid for high densities, r_s = r_0/a_0 << 1, where 2r_0 is the average distance between electrons and a_0 is the Bohr radius. In real metals, r_s is not small compared to unity and is in the range 2 < r_s < 6. Therefore, one cannot expect the RPA jellium theory for a dense, weakly coupled plasma to predict quantitatively the behaviour of real metals, although various properties are astonishingly well reproduced. An improved theory should take into account short-range exchange and correlation effects between electrons. Experimentally, the deviation of the plasmon dispersion from the RPA value α_RPA may be viewed, therefore, as a measure of exchange and correlation.
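As a rough numerical illustration of the jellium quantities discussed above, the following sketch evaluates the free-electron Fermi energy, the free-electron plasmon energy, and the low-q RPA dispersion coefficient for a given r_s. It assumes the standard textbook expressions E_F = (9π/4)^(2/3)/r_s² Ry, E_p = (12/r_s³)^(1/2) Ry, and α_RPA = (3/5) E_F/E_p in units of ħ²/m, which are presumably what Eqs. (28) and (29) express but are not copied from them. The function name is hypothetical; r_s = 5.8 is the value quoted for Cs in this section, and the other two r_s values merely bracket the metallic range.

```python
import math

RYDBERG_EV = 13.6057      # Rydberg energy in eV
HBAR2_OVER_M = 7.62       # hbar^2/m in eV * Angstrom^2


def jellium_parameters(r_s):
    """Return (E_F, E_p, alpha_RPA) for a free-electron gas with density parameter r_s.

    Assumption: textbook jellium formulas, not taken from the chapter's equations.
    Energies are in eV; alpha_RPA is dimensionless (units of hbar^2/m).
    """
    e_fermi = (9.0 * math.pi / 4.0) ** (2.0 / 3.0) / r_s**2 * RYDBERG_EV
    e_plasmon = math.sqrt(12.0 / r_s**3) * RYDBERG_EV
    alpha_rpa = 0.6 * e_fermi / e_plasmon
    return e_fermi, e_plasmon, alpha_rpa


# Illustrative r_s values; only 5.8 (Cs) is quoted in the text.
for r_s in (2.0, 4.0, 5.8):
    e_f, e_p, alpha = jellium_parameters(r_s)
    print(f"r_s={r_s:.1f}  E_F={e_f:.2f} eV  E_p={e_p:.2f} eV  "
          f"alpha_RPA={alpha:.3f}  a_RPA={alpha * HBAR2_OVER_M:.2f} eV A^2")
```

In this picture α_RPA scales as r_s^(-1/2), so the RPA itself predicts only a slowly decreasing, always positive dispersion coefficient across the alkali series; deviations from it are what the measurements probe.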
To take these effects into account, RPA was extended by the introduction of the local field correction function G(q,ω) [see Eq. (31)]. The dispersion coefficient α is then given by Eq. (32). Finally, we note that not only the plasmon dispersion but also the plasmon width is a measure of deviations from the RPA jellium model, as outlined in Section II,B,2. In order to investigate exchange and correlation effects in NFE metals, it is necessary to study the highest r_s values, which are reached in nature only by the heavy alkali metals. Therefore, excitations in Na, K, Rb, and Cs have recently been studied by vom Felde et al. (1987a, 1987b). In Fig. 11 the dispersion of the plasmon is shown for Na, K, Rb, and Cs. The comparison of the plasmon dispersion with theoretical calculations is not straightforward. In studies on Al and In (Kloos, 1973; Petri and Otto, 1975; Krane, 1978; Moller and Otto, 1981a, 1981b), it was postulated that there are two different dispersion constants, one for low q and one for high q. There is considerable debate in the literature as to whether the high-q value or the low-q value should be compared with the RPA jellium theory. Moller and Otto (1981a, 1981b) argue that the low-q value is disturbed by low-lying interband transitions and that the high-q value should be used. On the other hand, Sturm (1978, 1981, 1982) has noted that the quadratic dispersion is the first term of a small-q expansion, whereas a q⁴ term has to be considered for q² > 0.1 q_c². Moreover, it was argued that band structure effects are important only at higher q values. At present, this point is still controversial. For comparison with experimental data, we also show theoretical calculations of the maximum of the loss function in Fig. 11 as a function of q. The chain lines are obtained using the Lindhard-Mermin function given by Eq. (30). Adding a static local field correction according to Vashishta and Singwi, given by Eq. (33), to the Lindhard function in Eq. (30) leads to the dashed lines in Fig. 11. Finally, further addition of q-independent core polarization results in the thick solid lines.
Values of the core polarization are taken from Tessman et al. (1953), though they are probably too high, as they were derived from ionic alkali atoms where some valence states are unoccupied and, therefore, the oscillator strength of core excitations is higher than in the metallic case. Core polarization values of alkali metals are not available. The thin solid lines in Fig. 11 are least-squares fits to the data below q_c according to the function

E = E_0 + a q^2 + b q^4    (54)
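Because Eq. (54) is linear in its coefficients, the fit reduces to an ordinary polynomial least-squares problem in the variable x = q². The sketch below shows one way to do this with `numpy.polyfit`; the function name and the synthetic, noise-free data are hypothetical and serve only to show the mechanics, not to reproduce the measured dispersion.

```python
import numpy as np


def fit_plasmon_dispersion(q, energy):
    """Least-squares fit of E(q) = E0 + a*q**2 + b*q**4.

    The model is a quadratic polynomial in x = q**2, so a degree-2
    polynomial fit in x returns the coefficients directly.
    """
    x = np.asarray(q, dtype=float) ** 2
    y = np.asarray(energy, dtype=float)
    # np.polyfit returns the highest power first: [b, a, E0]
    b, a, e0 = np.polyfit(x, y, deg=2)
    return e0, a, b


# Synthetic data with made-up coefficients (NOT measured values).
q = np.linspace(0.1, 0.8, 15)                 # momentum transfer, 1/Angstrom
energy = 5.7 + 1.2 * q**2 + 0.4 * q**4        # energy, eV
e0, a, b = fit_plasmon_dispersion(q, energy)
print(f"E0 = {e0:.3f} eV, a = {a:.3f} eV A^2, b = {b:.3f} eV A^4")
```

With real data one would restrict the input to points below q_c, as the text describes, and the coefficient a would then be compared with the RPA value.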

The experimental dispersion coefficients α (divided by α_RPA) shown in Fig. 12 were derived from the coefficient a in Eq. (54). The calculated dispersion coefficients, including static local field corrections in the above framework, are shown in Fig. 12 by a solid line. A calculation using a dynamic local field correction G(q,ω) according to Dabrowski (1986), with G(q) taken from Vashishta and Singwi, is shown by a dashed line. Finally, the dispersion coefficients calculated numerically for the case when core polarizations are included are indicated by open circles. It is remarkable that in this case the α values are again strongly shifted back in the direction of the uncorrected α_RPA values. The experimental data for Na below q_c may also be fitted by a straight line for 0 < q < 0.3 Å⁻¹ defining a low-q α and by a second straight line for 0.3 Å⁻¹ < q < q_c defining a high-q α. For K, the situation is very similar. However, all three theoretical curves for q < q_c show a higher slope at higher q due to a q⁴ term and due to the inclusion of the local field effects. Moreover, within error bars, our data for q < q_c are well fitted by Eq. (54). Therefore, it is believed that an evaluation in terms of a low-q and a high-q α is not justified.

All the experimental values of α, as derived from the least-squares fits using the function given by Eq. (54), are well below the RPA values (see Fig. 12). For Na, α_exp is close to the calculated values if local field effects and core polarization are included. For K and Rb (lower electron densities, higher r_s), α_exp is well below the corrected RPA values, particularly when core polarization is included. For Rb, α_exp is almost zero. For Cs, the metal with the highest r_s value, a negative dispersion was observed for the first time; this cannot be explained by present-day theories. These theories also predict a change in sign in the plasmon dispersion, though the r_s values at which this zero-crossing should occur (r_s ~ 8) are well above those for the lowest electron densities realized in real NFE metals (r_s = 5.8 for Cs). From this comparison, it was concluded that present-day many-body theories underestimate exchange and correlations at r_s values larger than about 5. Finally, note that there exist theoretical calculations of the plasmon dispersion for higher momentum transfer, including static local field contributions (Singwi et al., 1968). These theories predict a plasmon dispersion for higher r_s values (r_s ~ 12) similar to that observed for Cs (r_s = 5.8). This again indicates that present-day theories underestimate exchange and correlations at these densities. The negative plasmon dispersion in Cs may indicate incipient Wigner crystallization of the electron gas.

In view of other possible causes for the anomalous plasmon dispersion of Rb and Cs, we are led to conclude that band-structure effects change the plasmon dispersion in NFE metals only slightly (Sturm, 1982). In addition, low-lying d states in the heavier alkali metals may have an influence on the plasmon dispersion, but the plasmon energy for zero momentum transfer is so close to the free-electron value that it is believed that interband transitions into the d states just above the Fermi level do not greatly influence the plasmon dispersion. The unusual plasmon dispersion in Rb and Cs is thus most probably caused by exchange and correlation effects.

The reduction of the dispersion of the maximum of the loss function above q_c in the particle-hole continuum for Na and K is clearly observed in Fig. 11. Similar results have been obtained for Al in various experiments (Zacharias, 1975; Hohberger et al., 1975; Gibbons et al., 1976; Batson and Silcox, 1983). However, due to the problems of the 22-eV plasmon of the Al2O3 coating, there is considerable scatter in the data.

FIG. 11. Volume plasmon energies of Na, K, Rb, and Cs as a function of the squared momentum transfer. Dot-and-dash line: calculation according to the Lindhard-Mermin dielectric function; dashed line: local field corrections according to Vashishta and Singwi (1972) included; thick solid line: core polarization included; thin solid lines: least-squares fit to the experimental data up to q_c by the function E = E_0 + a q^2 + b q^4.

FIG. 12. Plasmon dispersion coefficient α relative to α_RPA for the alkali metals Na, K, Rb, and Cs (horizontal axis: r_s). The solid line represents a prediction using a static local field contribution after Vashishta and Singwi (1972). The dashed line represents a prediction of a model using a dynamical local field correction after Dabrowski (1986). Closed circles: experimental values. Open circles: calculations including local field effects and core polarization.
Though in the pure RPA the plasmon should disappear when it merges into the particle-hole continuum, more refined theories that include short-range exchange and correlations (Vashishta and Singwi, 1972; Devreese et al., 1979) predict a continuing maximum of S(ω, q) in the particle-hole continuum (see also Fig. 6). According to these theories, the dispersion of this maximum should be much lower than the plasmon dispersion extrapolated from q < q_c. This is in agreement with recent experimental results on Na and K. A quantitative comparison with theories has not yet been performed.

The plasmon line widths for q = 0, which are predicted to be zero in the RPA jellium model, were measured to be 0.28, 0.24, 0.39, and 0.77 eV for Na, K, Rb, and Cs, respectively. On the basis of a damping due to interband transitions, Paasch (1970) calculated the corresponding values to be 0.1, 0.15, 0.64, and 0.96 eV, which are close to the experimental values. This indicates that the dominant damping mechanism for q = 0 is indeed the decay into interband transitions. However, at larger momentum transfer, according to the calculations of Sturm and Oliveira (1981), a negative quadratic dispersion of the plasmon width should be observed. The same authors have calculated the influence of particle-hole excitations that give a small positive contribution to the dispersion of the plasmon width. Adding the two contributions, a small positive dispersion results for Na and K that is much lower than the experimental findings. For Rb and Cs, the two contributions should almost cancel each other out, while experimentally for Rb a positive dispersion proportional to q'.' and for Cs a linear dispersion of the plasmon width has been detected. To our knowledge, only one mechanism (decay of the plasmon into another plasmon by simultaneous absorption or emission of a phonon) has been proposed to give a linear dispersion of the plasmon width (Hasegawa, 1971). According to these calculations, this mechanism leads to a much smaller dispersion than that observed in Cs. Other mechanisms that may have an influence on the plasmon width of Rb and Cs are interband transitions into unoccupied d states close to the Fermi level. As for the plasmon-energy dispersion in these metals, exchange and correlations may also have some unusual influence on the plasmon width. At present there is no theory to explain the anomalous dispersion of the plasmon width in Rb and Cs. The strong increase in the dispersion of the plasmon width in Na and K above q_c has also been observed in other NFE metals and indicates the opening of a new decay channel in the particle-hole continuum.

Recently, intraband transitions and interband transitions in single-crystalline Na have also been studied (vom Felde et al., 1987b). In Fig. 13, intraband transitions in Na for q ∥ [100] are shown. These data are close to the curves shown in Fig. 6b calculated by the Lindhard-Mermin dielectric function. For small q, the data shown in Fig. 13 are affected adversely by background due to the unscattered direct beam, since the cross-section for these intraband excitations is extremely small for low q. The solid line in Fig. 13 is the limiting line for intraband transitions calculated on the basis of a free-electron model (see Fig. 5c). It is in rather good agreement with experiment. A reduction of the Fermi energy from 3.2 eV to 2.5 eV due to many-body interactions, as postulated by Jensen and Plummer (1985), is not compatible with the present experimental data. For q ∥ [110], besides intraband transitions, intraband and interband transitions into states close to the zone boundary are also possible, which lead to zone-boundary collective states in the loss function (see Section II,B,2). These excitations could be observed on top of the free-electron intraband excitations for q ∥ [110] (see Fig. 14). According to calculations of Sturm (1987), they should appear for q > 0.21 Å⁻¹ with increasing oscillator strength. In the experiment, they were observed for q > 0.25 Å⁻¹. For q > 0.7 Å⁻¹, it is difficult to detect these excitations, since there is considerable broadening. The maxima of these excitations are shown in the inset of Fig. 14 and are close to the upper border of the gap between intraband and interband transitions, in agreement with the calculations of Foo and Hopfield (1968). This indicates that these zone-boundary collective states in Na for small q are dominantly caused by interband transitions. Similar zone-boundary collective states have been observed recently in Li and Be with much higher intensity, since in these metals the pseudo-potential is much stronger than in Na metal. The latter experiments were performed by inelastic X-ray scattering using synchrotron radiation (Schulke et al., 1984, 1986, 1987).

In summary, the recent experimental studies on heavy alkali metals, which form a more strongly coupled plasma than, e.g., Al or the light alkali metals, have raised a number of new questions. First of all, is the extension of the RPA dielectric function by a dynamic local field correction sufficient, or are more refined theories necessary? What is the influence of the low-lying unoccupied d states close to the Fermi energy on the dielectric functions of the heavy alkali metals? Is it possible to prepare dilute alkali metals to study even more strongly coupled plasmas? There are many new experimental and theoretical problems.

FIG. 13. Momentum dependence of the intraband transitions of single-crystalline Na for q ∥ [100] (horizontal axis: energy in eV).

FIG. 14. Momentum dependence of single-particle excitations in single-crystalline Na for q ∥ [110] (solid line) and q ∥ [100] (dashed line). The difference between the two directions is due to zone-boundary collective excitations. In the inset, the dispersion of the maximum of these excitations is shown.

VI. RARE GAS BUBBLES IN METALS

A. Pressure and Density in Bubbles

Rare gases (RG) are insoluble in metals. For example, the heat of interstitial solution for He in Al is negative and as large as -2.7 eV per atom, thus making the equilibrium He concentration negligibly small at room temperature. The microscopic origin of this fact is that the net pseudo-potential from the RG atoms that the conduction electrons of the metal experience is repulsive, as it entails orthogonality of the wave functions to the RG closed-shell system. When RG atoms are somehow introduced, the formation of clusters is observed, and at still higher concentrations RG atoms precipitate into bubbles. The growth mechanism of the bubbles, which is strongly correlated with the pressure inside the bubbles, is still a matter of debate. EELS has provided a method of measuring the density and pressure in RG bubbles in metals, and during the last five years various studies have been performed in this field.

While RG bubbles are interesting per se, they are also of enormous technical importance. In fission reactors, such fission products as Kr and Xe are produced in the fuel elements, and He by (n,α) reactions. Still higher doses of He will be produced by (n,α) reactions in the first wall of future fusion reactors. The RG bubbles lead to strong embrittlement of materials due to bubbles in grain boundaries. At the surface, RG implantation produces blistering and exfoliation. In addition, RG bubbles are important for ion beam milling of layer structures as well as in microelectronics in which metallic films are produced by sputter deposition. Ion-beam mixing is used to glue protective layers on metal surfaces to improve the tribological properties of metals. EELS has led to significant results in many of these fields. In this section we describe some typical investigations related to the fundamental properties of RG bubbles in metals.

1. He in Metals

In an early theoretical work, Ohtaka and Lucas (1978) had already suggested studying the thermodynamic state of He in bubbles by measurements of the He 1¹S₀-2¹P₁ transition. The earliest experimental studies were realized by measuring this transition by means of ultraviolet absorption using synchrotron radiation (Rife et al., 1981). In that work, the first EELS measurements were also reported. In this section we review similar systematic studies of RG bubbles in metals. Figure 15 shows a representative loss spectrum of a 1000-Å thick single-crystalline Al film in which 13 at.% He was homogeneously implanted. Besides the single- and double-volume plasmon at 15 and 30 eV, respectively, and a surface plasmon of the Al2O3-coated, flat surface of the Al metal film at 6.7 eV, two additional peaks are observed that are related to the He bubbles. At 24 eV, there is a strongly broadened He 1S-2P transition that shows a tremendous pressure shift compared to the free-atom value of 21.23 eV. The peak at 10.6 eV can be assigned to a surface plasmon on the He bubble interface. These surface plasmons will be discussed in more detail in Section VI,B.

The origin of the strong blue shift of the He 1S-2P transition is the strong interaction of the excited-state 2P wave function, which has a mean diameter roughly four times greater than the 1S ground state, with the wave functions of the neighbouring He atoms in the ground state. The Pauli exclusion principle, which excludes the occupation of the 1s² closed shell by a third electron, leads to a strong blue shift with increasing He density. Other mechanisms that may lead to a shift of the He line have been discussed (Ohtaka and Lucas, 1978; Lucas et al., 1983). The effect of a resonant transfer of the optical excitation to neighboring atoms via dipole-dipole interaction can be calculated by the Clausius-Mosotti formula for the dielectric function and leads to a small red shift. The calculation of the effective dielectric functions within the framework of the Maxwell-Garnett theory leads to a small blue shift. The latter two effects are important at low densities such as that of liquid He, whose number density is n = 0.022 Å⁻³. At higher densities (n ~ 0.1 Å⁻³), these effects are negligible compared to the effect of Pauli repulsion.

FIG. 15. Electron energy-loss spectrum of He-implanted Al film (c_He = 13 at.%). SP: surface plasmon; BSP: bubble surface plasmon; VP: volume plasmon; 2xVP: double volume plasmon. The He 1¹S₀-2¹P₁ transition is strongly shifted compared to the free-atom value of 21.23 eV. (Horizontal axis: energy in eV.)

In order to obtain the pressure and density inside He bubbles, it is necessary to correlate the energy shift with He density. While there is agreement that the energy shift is proportional to density in a given energy range, there is still considerable disagreement as to the proportionality constant. One approach assumes pairwise additivity of the interactions, which are taken from ab initio quantum mechanical calculations of the 2p excimer potential energy curves (Lucas et al., 1983); many-body perturbations are neglected. The radial pair distribution function required for these calculations is computed from theoretical models of dense fluids. The blue shift of the resonance line is found to depend nearly linearly on the density: ΔE = Cn, with C = 31 Å³ eV. Another approach uses a multiconfigurational self-consistent field method for calculations of the ground and excited states of a He atom surrounded by a nearest-neighbor shell of 12 He atoms (Taylor, 1985). The He13 cluster geometry was taken to be fcc. These calculations predict a linear relationship between energy shift and density for number densities 0.01 ≤ n ≤ 0.05 Å⁻³ with a proportionality factor C = 30 Å³ eV. However, above n = 0.06 Å⁻³ a red shift is predicted. The third approach (Manzke et al., 1982; Jager et al., 1983) uses ab initio self-consistent electron band-structure calculations for solid (fcc) He for densities n in the range 0.10-0.25 Å⁻³.
While the use of density functional theory for ground-state properties is well established, the calculation of excitation energies poses serious problems. Therefore, the energy shift was obtained by a calculation of the difference of two total energies, that of the ground state and that of the excited state, the latter described by He atoms with one electron in the 2p shell located on the Au sites in a Cu3Au structure, where each excited He is surrounded by 12 non-excited ones on the Cu sites. The calculations employ the augmented spherical-wave method. The procedure, called ΔSCF (self-consistent field), was applied with great success to determine the excitation energies in LiF (Zunger and Freeman, 1977). In the density range mentioned above, the calculations yield good proportionality between energy shift and density, with a proportionality factor C = 22 Å³ eV.

The three approaches result in relationships between line shift and density that differ considerably and, therefore, due to the non-linear relationship between density and pressure, lead to highly different pressure values. The first approach of Lucas et al. (1983) starts from the low-density limit of an excited He2 molecule. At higher densities, pairwise addition of the effects of all surrounding He atoms is assumed. This assumption, which does not take into account many-body perturbations, may be the reason for the high C value. The second approach predicts a maximum line shift of 1.3 eV, in clear contrast to experimentally observed line shifts of up to 4 eV. Therefore, the result for lower densities also seems to be questionable. These problems are probably due to the fact that the cluster chosen was too small. The third approach describes the high-density, many-electron system quite well, but assumes that the excited atoms do not greatly influence the next-nearest-neighbor sites. However, estimates of this effect indicate that even smaller values for C would be expected if the concentration of excited He atoms is reduced. Therefore, according to the third approach, the value C = 22 Å³ eV is a maximum value. Since this value of C is based on reasonable assumptions and is supported by various experimental findings as outlined below, it serves as the basis for the analysis of EELS data. It should, however, not be forgotten that more sophisticated calculations are necessary to improve the accuracy of the proportionality factor in order to obtain reliable values for the pressure in He bubbles.

Once the density is calculated by the ΔE(n) relationship, the pressure inside the bubbles can be derived from an equation of state (EOS). A discussion of the EOS of He was given recently in a review on density and pressure of He bubbles by Donnelly (1985). In this field, the data are less controversial with respect to a pressure evaluation from densities. The analysis of EELS data was based on an EOS for He at extremely high densities, including solid fcc He, worked out by Trinkaus (1983). Using these tools based on theoretical calculations, it was possible to derive the density and pressure in bubbles as a function of implanted He concentration or as a function of annealing temperature.
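The first step of this analysis, converting a measured line shift into a He number density through ΔE = Cn, is simple enough to sketch directly. The snippet below uses the three proportionality constants quoted above and the free-atom energy of 21.23 eV from the text; the function names are hypothetical, and the linear relation is assumed to hold over the whole range, which the text itself flags as uncertain. The subsequent density-to-pressure step requires the He equation of state and is not attempted here.

```python
FREE_ATOM_ENERGY_EV = 21.23   # free-atom He 1S-2P energy quoted in the text

# Proportionality constants C in Angstrom^3 * eV, as quoted in the text.
C_VALUES = {
    "pairwise additivity (Lucas et al., 1983)": 31.0,
    "He cluster (Taylor, 1985)": 30.0,
    "band structure (adopted for the EELS analysis)": 22.0,
}


def shift_from_peak(peak_ev):
    """Blue shift (eV) of the measured He 1S-2P peak relative to the free atom."""
    return peak_ev - FREE_ATOM_ENERGY_EV


def density_from_shift(delta_e_ev, c=22.0):
    """He number density (Angstrom^-3), assuming the linear relation delta_E = C * n."""
    if c <= 0:
        raise ValueError("C must be positive")
    return delta_e_ev / c


# Example: a shift of 3.32 eV evaluated with each of the three constants.
for label, c in C_VALUES.items():
    n = density_from_shift(3.32, c)
    print(f"{label}: n = {n:.3f} A^-3")
```

With C = 22 Å³ eV, shifts of 3.32 eV and 1.0 eV give about 0.151 and 0.0455 Å⁻³, matching the density values listed with those shifts in Table III; the larger constants would give densities roughly 30% lower for the same shift.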
These results gave important information on the growth mechanism of He bubbles. The pressure shift decreases with increasing He concentration, as shown in Fig. 16 for Al implanted with 3, 13, and 26 at.% He (see also the values of n_ΔE and p_ΔE in Table III). A transmission electron microscopy (TEM) study of the mean bubble radius R as a function of implanted He concentration showed that the mean radius increased with increasing c_He. Figure 17 shows the He pressure inside the bubble plotted against the inverse mean radius. Despite the fact that the pressure (due to uncertainties in C) and the radius may be subject to errors of up to 50% and 30%, respectively, the data show a linear relationship between the pressure and the inverse radius with a proportionality factor of about 900 kbar Å. For temperatures above one-half the melting temperature (T_m), the bubbles are expected to grow by absorption of thermal vacancies, and the pressure relaxes to the thermal equilibrium pressure p = 2γ/R, where γ is the surface free energy (γ = 1 N/m for Al). These pressures are indicated by a broken line in Fig. 17. The experimental findings of a much higher pressure clearly demonstrate that the bubbles are overpressurized and not in thermal equilibrium. Even a 50% error in the pressure due to uncertainties in C would

RECENT DEVELOPMENTS IN ENERGY-LOSS SPECTROSCOPY


FIG. 16. He 1¹S₀-2¹P₁ lines after subtraction of the background due to the matrix for Al, Ni, and amorphous NiP samples implanted with various He concentrations. The labels h and l indicate high- and low-dose implantation. T_A indicates the annealing temperature over a time period of 2 h.

TABLE III
DENSITY AND PRESSURES IN RG BUBBLES WITH MEAN RADIUS R FROM ENERGY SHIFT (ΔE) AND BRAGG REFLECTION SHIFT (Δk). p_E AND p_L ARE THE CALCULATED EQUILIBRIUM PRESSURE AND THE THRESHOLD PRESSURE FOR LOOP PUNCHING, RESPECTIVELY
RG Matrix He He He He He He He Ne Ar

Kr Xe

Al A1

Al Al Ni Ni

NIP Al A1 Al AI

CRG TA (at%) ("C)

3.3 13 26 7.7 low high 2

-

475 -

2

-

3 3

-

(A) 6.5 10 20 25 -

15

13 15 17.5 13

AE (eV) 3.32 2.74 1.94 1.0 -4.7 3.7 2.75 1.4 1.0

-

-

~ A E

(A-3) 0.151 0.125 0.088 0.0455

0.214 0.168 0.125 0.073 0.04 -

PIE (Kbar)

nAk

Pd*

-

-

32 20

-

10

(k')(Kbar)

PE PL (Kbar) (Kbar)

150 85 31 5

-

-

7.5

500

-

-

-

250 80 40 60

-

-

-

-

31 15

-

-

15

0.035 0.026 0.023

30 20 30

13 11 15

-

-

123 80 40 32 280 -

62 53 46 61


JORG FINK

FIG. 17. Pressure in He bubbles in an Al matrix as a function of the inverse mean bubble radius 1/R. Closed circles: Not annealed samples. Open circle: Annealed sample.

show these discrepancies. Some 20 years ago, Greenwood et al. (1959) predicted that for T < T_m/2, growth of bubbles, and thus pressure, is controlled by the emission of dislocation loops. For the bubble radii investigated in this study, the threshold pressure for loop punching is given [according to Trinkaus (1983)] by p = 2γ/R + μb/R, where μ is the shear modulus (260 kbar for Al) and b is the length of the Burgers vector of the loops (for Al, b = 2.3 Å ∥ [111]). This gives a proportionality factor between pressure and inverse mean radius of 800 kbar Å (chain line in Fig. 17), which is very close to the experimental value of 900 kbar Å. Therefore, the pressures derived by EELS in combination with TEM studies clearly support the model of an athermal growth mechanism via loop punching at low temperatures. Annealing the samples increases the diameter of the bubbles, and the pressure shift decreases. This is shown for a 13 at.% AlHe sample in Fig. 16. As mentioned above, for T > T_m/2 the pressure is released to the thermal equilibrium pressure p = 2γ/R by the absorption of thermal vacancies. For a sample containing 7 at.% He and annealed at 475°C, a pressure shift ΔE = 1 eV and a bubble radius of 25 Å were observed. From the energy shift, the pressure is 5 kbar, which is not far from the equilibrium pressure of 7.5 kbar (see open circle in Fig. 17). This again supports the present understanding of bubble growth at high temperature. It also gives some confidence to the


theoretically derived proportionality factor between energy shift and density. When annealing the sample at even higher temperatures, most of the bubbles move to the surface of the Al film and He emerges from the sample. There probably remains some adsorbed He on the surface or some bubbles with almost zero pressure. This small amount of He gives a spectrum very close to that of atomic He with 1'S,-2"P1 transitions with n = 1, 2, and 3. This is shown in Fig. 16 for a 13 at.% AlHe sample annealed at T_A = 490°C. Even higher pressures for He bubbles were observed in Ni, which has a higher shear modulus. As shown in Fig. 16, a high-dose sample showed a line shift ΔE = 3.7 eV, while for a low-dose sample ΔE ≈ 4.7 eV could be extracted from the spectrum. For the low-dose sample, the data evaluation is somewhat uncertain because of features due to Ni interband transitions in the vicinity of the He line. Extremely high densities and pressures (250 kbar and 500 kbar; see Table III) could be derived, which are again close to the non-equilibrium pressure p_L, which for small bubble radii is close to the threshold pressure for Ni self-interstitial atom emission. We also note that the He density in these bubbles is about ten times as large as that of liquid He at normal conditions and exceeds by far the density of the fluid-solid phase transition at room temperature (n = 0.14 Å⁻³). It is interesting to note that recently evidence for a solid-fluid transition near 250 K for ³He bubbles in palladium tritide has also been obtained (Abell and Attalla, 1987). Recently, He implanted in amorphous NiP samples has been investigated by Fink and Jager (1987b). In these amorphous materials, large pressure shifts were also observed (see Fig. 16) and a pressure of 80 kbar is derived for He bubbles having a mean radius of about 15 Å. As shown in Table III, these values indicate overpressurized bubbles also for this amorphous material.
Thus, a new mechanism for bubble growth in amorphous systems must be developed, based on something other than the loop punching mechanism, which is adequate only for crystalline materials.
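The two reference pressures used in this subsection, the thermal equilibrium pressure p = 2γ/R and the loop-punching threshold p = 2γ/R + μb/R, are simple enough to evaluate directly. The sketch below uses the Al parameters quoted in the text (γ = 1 N/m, μ = 260 kbar, b = 2.3 Å); the function names and the unit-conversion constant are illustrative, and the only input beyond the text is that 1 N/m divided by 1 Å equals 10¹⁰ Pa, i.e., 100 kbar.

```python
# 1 N/m divided by 1 Å is 1e10 Pa, i.e. 100 kbar.
KBAR_PER_N_PER_M_PER_ANGSTROM = 100.0


def equilibrium_pressure_kbar(radius_a: float, gamma_n_per_m: float = 1.0) -> float:
    """Thermal equilibrium pressure p = 2*gamma/R in kbar (R in Å)."""
    return 2.0 * gamma_n_per_m * KBAR_PER_N_PER_M_PER_ANGSTROM / radius_a


def loop_punching_pressure_kbar(
    radius_a: float,
    gamma_n_per_m: float = 1.0,   # surface free energy of Al (N/m)
    mu_kbar: float = 260.0,       # shear modulus of Al (kbar)
    burgers_a: float = 2.3,       # Burgers vector length for Al (Å)
) -> float:
    """Loop-punching threshold p = 2*gamma/R + mu*b/R in kbar (R in Å)."""
    return equilibrium_pressure_kbar(radius_a, gamma_n_per_m) + mu_kbar * burgers_a / radius_a


if __name__ == "__main__":
    # Mean bubble radii (Å) of the He-implanted Al samples.
    for r in (6.5, 10.0, 20.0, 25.0):
        print(f"R = {r:4.1f} Å: p_E = {equilibrium_pressure_kbar(r):5.1f} kbar, "
              f"p_L = {loop_punching_pressure_kbar(r):6.1f} kbar")
```

With these inputs the product p_L·R comes out at 798 kbar Å, consistent with the factor of about 800 kbar Å quoted for the chain line in Fig. 17.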


2. Heavier Rare Gas Bubbles

Because of controversial results on the relationship between line shift and density for the He 1S-2P transition, the existence of high pressures is still called into question (Donnelly, 1985). Therefore, we extended our studies to the heavier RG bubbles in metals (vom Felde et al., 1984). The great advantage of these systems is that the proportionality factor can be directly extracted from experiments, since EELS and optical measurements on solid RG are available. Then the line shift between atomic and solid RG can be easily extrapolated to the densities in the bubbles, which are only slightly larger than the densities of solid RG at normal pressure. The disadvantage with respect to pressure determinations by line shift measurements is the fact that these line shifts become smaller with increasing Z due to the smaller mean radius of the wave functions of the excited state. Moreover, for heavier RG such as Kr


and Xe, the bubble surface plasmons and the 1S-2P transitions appear at the same energy and lead to a coupling between the two excitations. Therefore, the line shift cannot be extracted from the spectra. This is illustrated in Fig. 18, where typical loss spectra of homogeneously RG-implanted Al films are shown. For Ne and Ar, the 1S-2P transitions can be clearly recognized at 18.3 and 12.8 eV, respectively. For Kr and Xe, the spectra are more complicated and will be discussed in the next section. For Ne and Ar, the pressure shift and the density (n_ΔE) and pressure (p_ΔE) as derived by the method discussed above are given in Table III. The pressures are again considerably higher than the equilibrium pressures and close to the threshold pressure for loop punching. Moreover, by comparing these results with those evaluated for He bubbles, we clearly see that for similar bubble radii almost the same pressure values were detected for the AlHe, AlNe, and AlAr systems. These findings support the evaluation method used for the He bubbles and the observed high pressures for the very small He bubbles. Pressure evaluation from the energy shift of excitations is not restricted to the valence electrons. For example, the L2,3 edge of Ar in Al was recently measured by vom Felde (1986). This spectrum is compared in Fig. 19 with those of atomic and solid bulk Ar under normal pressure measured by synchrotron radiation (Haensel et al., 1971). With increasing density, a considerable broadening of all lines is observed. In addition, the lowest 2p-4s

FIG. 18. Loss spectra (q = 0) of Al films implanted with 3 at.% Ne, Ar, Kr, and Xe. SP: Al surface plasmon; BSP: bubble surface plasmon; VP: volume plasmon of Al; arrows indicate valence electron transitions (Ne, Ar) or coupled oscillations of valence excitations and bubble surface plasmons (Kr, Xe).


FIG. 19. Ar 2p edges. Gas and solid Ar were measured by X-ray absorption spectroscopy using synchrotron radiation (Haensel et al., 1971). Upper curve: Ar bubbles in Al measured by EELS (vom Felde et al., 1986).

transition, which is probably excitonic in nature, shows a blue shift with increasing density. Correlating the energy shift between atomic Ar and solid Ar under normal pressure with the density of solid Ar and assuming a linear relationship between energy shift and density, it is possible to derive the density of Ar inside the bubbles, and the pressure can be calculated by an EOS. This yields a pressure of ~40 kbar, which is close to the value derived from the 1S-2P transition. Recently, similar measurements on core-level shifts of the Ne K shell by soft X-ray emission techniques were performed by Lefebvre and Deconninck (1986), presenting a further tool for the study of pressure in RG bubbles in metals. Another aim of the EELS core-level measurements was to get information on the density via evaluation of the EXAFS oscillations. While these oscillations can be clearly recognized in solid bulk Ar, they are almost completely washed out in the AlAr spectrum. This is probably caused by strong contributions of those electrons backscattered from the Al host. The mean free path of the electrons (~20 Å) is comparable to the diameter of the bubbles. According to the above EELS studies of RG bubbles in metals, the measured pressures indicate that Ne should be liquid and Ar solid at room temperature. Calculating the pressure with the values of the mean radii derived by TEM, the Kr and Xe bubbles should also be solid. Therefore, during this study it was realized for the first time that solid RG bubbles at room temperature should be detectable by electron diffraction (vom Felde et al., 1984). Independently, Templier et al. (1984) obtained similar results by


TEM investigations. Typical elastic electron diffraction patterns on RG-implanted Al films are shown in Fig. 20. These spectra were taken with the EELS spectrometer, setting the energy loss to zero. Besides the (111) and (200) reflections of Al, the (111) (and for Kr, the (200)) reflections are observed for Ar, Kr, and Xe, while for Ne only a diffuse intensity appears at about 2.4 Å⁻¹, as is typical of a liquid. The diffraction peaks of the solid RG systems are slightly broadened due to the finite bubble volume. In addition, a correlation of the intensity of the (111) Al reflection with that of the (111) RG reflection upon rotating the scattering plane around the electron beam is observed. This finding is explained by an epitaxial growth of the RG bubbles in the matrix. From the diffraction peaks, the lattice constants and thus the densities could be derived and, using an EOS, the pressures listed in Table III as well. These pressures are again above the equilibrium pressures but fall below the threshold pressures for loop punching by a factor of two. However, in view of the large errors expected for these values, the results of the two methods (energy shift and diffraction shift) are still compatible. Finally, we mention that electron diffraction on solid RG bubbles has now become a domain of electron microscopy, since sample preparation of single-crystalline films is much easier thanks to the small diameter of the electron beam in TEM. However, for He, the latter method cannot be applied because

FIG. 20. Electron diffraction pattern of Al films implanted with 3 at.% Ne, Ar, Kr, and Xe. Besides the Al (111) and (200) reflections, the Ar, Kr, and Xe (111) and for Kr the (220) reflections can be recognized.


of the small scattering amplitude of the He atoms. Therefore, there is as yet no observation of crystalline He in bubbles by diffraction methods. Thus, EELS (and for some metals, VUV spectroscopy) is a powerful method of obtaining information on the thermodynamic properties of He in bubbles. In summary, EELS investigations on RG bubbles yield a consistent picture of overpressurized bubbles for T < T_m/2 and equilibrium pressures for T > T_m/2. In this review, we have not mentioned similar studies as a function of annealing temperature (Manzke et al., 1983a), which give information on bubble growth via loop punching, vacancy absorption, and coalescence at higher temperatures. Open problems at present are the exact determination of the proportionality factor between energy shift and density for He, the energy shift due to the interaction with the bubble wall, and the excitation spectra of RG dissolved in metals (if there exist any at higher implantation concentrations).
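The diffraction route to the density described in this subsection can be illustrated with a short sketch. It assumes that the solid RG in the bubbles is fcc (consistent with the (111), (200), and (220) indexing used here, but an assumption nonetheless), so that a reflection (hkl) at momentum transfer q gives the lattice constant a = 2π√(h² + k² + l²)/q and the density n = 4/a³. The peak position used in the example is hypothetical and is not a measured value from this work.

```python
import math


def fcc_lattice_constant(q_inv_angstrom: float, hkl: tuple[int, int, int]) -> float:
    """Cubic lattice constant a (Å) from the position q (Å^-1) of reflection (hkl)."""
    h, k, l = hkl
    return 2.0 * math.pi * math.sqrt(h * h + k * k + l * l) / q_inv_angstrom


def fcc_number_density(a_angstrom: float) -> float:
    """Number density (Å^-3) of an fcc crystal: four atoms per cubic cell."""
    return 4.0 / a_angstrom ** 3


if __name__ == "__main__":
    q_111 = 2.24  # hypothetical (111) peak position in Å^-1, for illustration only
    a = fcc_lattice_constant(q_111, (1, 1, 1))
    print(f"a = {a:.3f} Å, n = {fcc_number_density(a):.4f} Å^-3")
```

The pressure would then follow from an EOS for the solid rare gas, which is not reproduced here.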

3. Surface Plasmons on Bubbles

There are numerous studies of collective electronic excitations in the case of planar surfaces like those of films and layers (Raether, 1977, 1980), whereas surface plasmons on inner surfaces in metal cavities, i.e., in voids or RG-filled bubbles, have rarely been investigated. The pioneering experimental work was done by Henoc and Henry (1970), who observed the images of He bubbles in an Al matrix in an energy-selective electron microscope. There are also theoretical studies on bubble-surface plasmons (BSP) (Natta, 1969; Ashley and Ferrell, 1976; Lucas, 1973). A short introduction to this theoretical work was given in Section II,B. In the following we review recent experimental studies on surface plasmons in RG-filled voids (Manzke et al., 1983b; vom Felde and Fink, 1985). In Fig. 15, the BSP at 10.7 eV for a 13 at.% AlHe sample can be recognized between the surface plasmon of the planar Al/Al₂O₃ interface at 7.0 eV and the volume plasmon of bulk Al at 15 eV. The energy of the bubble surface plasmon and the width of the volume plasmon as a function of the concentration of implanted He are shown in Fig. 21. According to Eq. (44), the monopole, dipole, quadrupole, and l = ∞ modes should appear at ω_p = 15 eV, √(2/3) ω_p = 12.25 eV, √(3/5) ω_p = 11.62 eV, and √(1/2) ω_p = 10.61 eV for ε = 1 (unfilled void). The monopole mode, which is actually a breathing mode (see Fig. 7), superimposes on the volume plasmon mode. The increasing width of the line at 15 eV (see Fig. 21) with increasing c_He may be caused by an increasing coupling of the monopole modes of the individual bubbles with increasing filling factor. This phenomenon, which leads in a regular array of bubbles to a band structure of breathing plasmons, was described by Lucas (1973). For l > 0, according to the results of Ashley and Ferrell (1976), only the

FIG. 21. Bubble surface plasmons measured in He-implanted Al films. Upper part: Line width ΔE₁/₂ of the plasmon at 15 eV, which is composed of the volume plasmon and the l = 0 BSP mode. Lower part: Energy of the l = 1 BSP dipole mode. Calculation for a single bubble: dashed line. Effective medium theory using a constant He polarizability: chain line. Effective medium theory using density-dependent He polarizability: solid line.

dipole mode should have considerable intensity for small momentum transfer [see Eq. (45)]. As the energies shown in Fig. 21 were measured at zero momentum transfer, the results should be compared with theoretical models for the dipole mode. For these calculations, the dielectric constant of He is evaluated in terms of the Clausius-Mossotti formula for atoms in cubic coordination

$$\varepsilon(\omega) = 1 + 4\pi n\,\alpha(\omega)\left[1 - \frac{4\pi}{3}\,n\,\alpha(\omega)\right]^{-1},$$

where α(ω) is the He polarizability and n the number density. α(ω) is simulated by an oscillator ω₀ at the 1S-2P transition using Eq. (22). Neglecting γ₀, the oscillator strength should be n₀ = (m/e²)α(0)ω₀², where α(0) = 0.205 Å³ is the quasistatic polarizability of He as measured in the visible for the high-density gas. Using the He densities as derived in the last section, which decrease with increasing c_He, we obtain the broken line in Fig. 21, which increases with c_He because of the weaker coupling of the BSP to the dielectric medium and is thus in contradiction to the decreasing experimental results. Thus Natta's equation for a single bubble is not adequate for the description of the experimental results. An improved theoretical description has to take into account long-range dipole interactions between the bubbles. This is realized in the theory of Garnett (1904) for a long-range effective dielectric function ε_eff(ω) for small


particles in cubic coordination given by Eq. (46). The filling factor f is the volume fraction of the spherical particles, ΔV/V = c_He/(nΩ_Al) (Ω_Al = Al atomic volume). ε_M = ε_Al is well described by the Drude model. For ε_B = ε_He, the above Clausius-Mossotti formula was taken. From the effective dielectric function ε_eff(ω), the energy-loss function Im(−1/ε_eff(ω)) was calculated. It has poles given by Eq. (47) that describe the excitation energies of the Al volume plasmon, the He 1S-2P transition, and the BSP dipole mode. Using the density-independent atomic polarizability of He, α(0) = 0.205 Å³, the calculated BSP energy now decreases with increasing c_He (see chain line in Fig. 21). But the slope still deviates from that of the experimental points. The following effects may explain the remaining discrepancy. The BSP excitation energy may depend on the bubble size; this was discussed by Aers et al. (1979), who found that for small bubble radii the BSP dipole mode is in fact shifted to higher energies. Recently, density-functional calculations on surface plasmon energies in voids by Wu and Beck (1987) have also predicted a strong blue shift for small radii due to the fact that electrons diffuse into the void and the density within the void increases as the void radius decreases. Surface plasmon energies up to about 14 eV, i.e., far above the experimental values, were calculated for radii close to the experimental situation. However, the comparison should be regarded as very ambitious, since there are many important differences between the theoretical model and the experimental system. First of all, in the experiment the voids were filled with high-density He which, due to the high pseudopotential, probably strongly pushes the electrons back into the metal. Besides these microscopic effects, there is some evidence from optical measurements on superdense He (Vidal and Lallemand, 1976; Loubeyre et al., 1982) that the atomic polarizability decreases with increasing density.
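The opposite trends of the single-bubble and effective-medium results can be made plausible with a simplified sketch. It is not the calculation behind Fig. 21: it assumes an undamped Drude metal, ε_M = 1 − (E_p/E)², a static (frequency-independent) He dielectric constant from the Clausius-Mossotti formula in Gaussian units, the electrostatic boundary condition lε_B + (l + 1)ε_M = 0 for a single filled void, and the textbook Maxwell-Garnett expression, whose zero gives the dipole pole of the loss function. These forms may differ in detail from Eqs. (44), (46), and (47), which are not reproduced here, and all function names are illustrative.

```python
import math

E_P_AL = 15.0     # Al volume plasmon energy (eV), from the text
ALPHA_HE = 0.205  # quasistatic He polarizability (Å^3), from the text


def clausius_mossotti(n: float, alpha: float = ALPHA_HE) -> float:
    """Static dielectric constant for number density n (Å^-3), Gaussian units."""
    x = 4.0 * math.pi * n * alpha
    return 1.0 + x / (1.0 - x / 3.0)


def single_bubble_energy(l: int, eps_b: float = 1.0, e_p: float = E_P_AL) -> float:
    """Energy (eV) of mode l for one void filled with eps_b in a Drude metal.

    Solves l*eps_b + (l + 1)*eps_M = 0 with eps_M = 1 - (e_p/E)**2.
    """
    return e_p * math.sqrt((l + 1) / (l * eps_b + l + 1))


def maxwell_garnett_dipole_energy(f: float, eps_b: float, e_p: float = E_P_AL) -> float:
    """Dipole pole (eV) of Im(-1/eps_eff) for filling factor f.

    Zero of the Maxwell-Garnett eps_eff: eps_b*(1 + 2f) + 2*eps_M*(1 - f) = 0.
    """
    return e_p / math.sqrt(1.0 + eps_b * (1.0 + 2.0 * f) / (2.0 * (1.0 - f)))


if __name__ == "__main__":
    for l in (0, 1, 2):
        print(f"empty void, l = {l}: {single_bubble_energy(l):.2f} eV")
    eps_he = clausius_mossotti(0.125)  # density taken from Table III for 13 at.% He
    print(f"eps_He = {eps_he:.3f}")
    print(f"single bubble dipole: {single_bubble_energy(1, eps_he):.2f} eV")
    for f in (0.02, 0.05, 0.08):
        print(f"f = {f:.2f}: {maxwell_garnett_dipole_energy(f, eps_he):.2f} eV")
```

In this simplified picture a lower He density moves the single-bubble dipole energy up toward the empty-void value, whereas a larger filling factor moves the effective-medium pole down, which is the qualitative behavior described for the broken and chain lines.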
Using a linear dependence α = 0.286 − 1.348n (n in Å⁻³), the solid line in Fig. 21 was calculated and is now close to the experimental points. However, it should be mentioned that a linear relationship between α and n is speculative and other mechanisms may be responsible for the discrepancies between the chain curve in Fig. 21 and the experimental points. As shown in Fig. 18, for Ne and Ar bubbles in Al, BSP are observed at 11.7 and 10.2 eV, respectively. The shift, going from Ne to Ar, is explained by the higher atomic polarizability of Ar compared to that of Ne. For Kr and Xe, the spectra shown in Fig. 18 are much more complicated in the energy range where BSP should appear. The peaks cannot be explained by the interband transitions of bulk RG that were measured by Keil (1966). Obviously, for Kr and Xe, the 1S-2P transitions appear at the same energy as the BSP, and therefore there is a coupling of the two excitations. As in the case of He bubbles, the effective dielectric constant ε_eff and the loss function Im(−1/ε_eff(ω)) were calculated. The filling factors (f ≈ 3-8%) were derived from the RG densities discussed in the last section or from measurements of


the RG content by Rutherford backscattering experiments and from TEM-determined bubble radii. The RG dielectric functions were reproduced by means of several Lorentzian oscillators. For AlNe, the BSP appeared at 11.4 eV, which is close to the experimental value of 11.7 eV. A similar calculation for AlAr yielded 10.0 eV, whereas 10.2 eV was measured. The calculations also reproduced the RG excitations and the Al volume plasmon quite well. Of course, the effective medium theory could not reproduce the large pressure shift of the 1S-2P transitions, since the dielectric function for RG was adjusted to the atomic values. The observed half-width of the BSP considerably exceeded the calculated ones, which are nearly insensitive to any reasonable variation of the parameters in the Lorentz model. A random (not cubic) distribution of RG bubbles was given as an explanation for the discrepancy. According to the calculations of Persson and Liebsch (1982), considerable broadening of the BSP should be expected in this case. For AlXe, the agreement between theory and experiment is less satisfactory, as is shown in Fig. 22. The calculated peaks are close to the experimental ones, but the intensities are not well reproduced. The differences were explained by an imperfect simulation of the Xe oscillators. Figure 22 also shows the momentum dependence of the AlXe excitations in this energy range. For low momentum transfer, the measurements provide information on the long-wavelength properties of the solid and, therefore, the effective medium theory is adequate. At higher momentum transfer (shorter wavelength) the spatial variation of the electronic structure is detected and, for wavelengths small compared to the dimensions of the bubbles, the individual excitations of the RG in the bubble and that of the Al host should appear. This decoupling of the excitations at higher momentum transfer is indeed observed in the experiment, as shown in Fig. 22.
There is a strong variation of the spectra with increasing q, and at q = 0.2 Å⁻¹ the spectrum shows maxima at energy positions where maxima were observed for bulk solid Xe (Keil, 1966). The decoupled BSP is not observed at higher q since it is strongly damped at higher momentum transfer. No energy shift was observed for Xe excitations, as these spectra were taken on annealed samples containing only large bubbles. Moreover, as mentioned above, the pressure shift decreases with increasing atomic number. The spectra in the intermediate momentum transfer range are not understood at present. There is no effective medium theory available to account for frequency- and momentum-dependent dielectric functions. Together with the experimental results, such theories could give valuable results on the microstructure of precipitations in metals. It is remarkable that all the above experiments on surface plasmons could be rather well explained by classical theories. Even at a bubble radius R = 6 Å, which is close to the Thomas-Fermi shielding length of electrons (~2 Å), there are at least no strong deviations from classical macroscopic theories for


FIG. 22. Loss spectra of a Xe-implanted Al sample taken at q = 0, 0.1, 0.15, and 0.2 Å⁻¹. Arrows on the uppermost curve mark Xe valence electron transitions. The background is subtracted. The dashed curves show the loss function calculated on the basis of Maxwell-Garnett theory.

the dielectric constant. To see quantum effects, even smaller voids or empty voids should probably be studied. Another interesting field is that of the “inverse bubbles,” i.e., metallic spheres in dielectrics. Currently, such studies on alkaline spheres in MgO are being performed by vom Felde et al. (1987). These studies indicate deviations from classical models due to finite size effects.

VII. AMORPHOUS CARBON

There exist various modifications of semiconducting amorphous carbon. Evaporated amorphous carbon (a-C) has a gap of about 0.4-0.7 eV. In amorphous carbon produced by the ion beam technique (i-C) the gap varies from 0.3 to 3 eV. Recently, hydrogenated amorphous carbon (a-C:H) produced by plasma deposition from hydrocarbons has attracted considerable interest. These films have an attractive combination of properties: extreme hardness (between sapphire and diamond), high electrical resistivity, optical transparency in the infrared, variable high refractive index, and resistance to chemical attack. They are, therefore, of particular interest, for example, for coatings for optical components. Although amorphous carbon modifications have been investigated for many years, the microstructure is a matter of continuing debate. The problem is rather complicated since carbon has three different bonding configurations while other four-valent amorphous


materials, such as a-Si, have only the tetrahedral bonding configuration. The microscopic origin for this difference between C and Si is that C has no 2p core levels and, therefore, the 2p level in C is lowered to the 2s level, thus allowing three different hybridizations of the four valence electrons. These are shown in Fig. 23a. The sp³ configuration, which appears, for example, in diamond, has four tetrahedrally directed σ bonds. In the sp² configuration typical of graphite, there are three trigonally directed σ bonds. The fourth electron is in a π orbital normal to the σ bonding plane. Finally, in the sp¹ hybridization there are two σ bonds and two π bonds. The π electrons are weakly bound and therefore lie closer to the Fermi level than the σ states. The density of electronic states for the amorphous carbon modifications is schematically shown in Fig. 23b. The size of the gap is determined by the separation of the filled π band (valence band) and empty π* band (conduction band), both originating from the sp¹ carbons. One of the most interesting questions related to the short-range order of amorphous carbon modifications is the relative concentration of sp³ carbons (tetrahedral bonding, without π electrons), sp² carbons (trigonal bonding, with one π electron), and sp¹ carbons (linear bonding with two π electrons). Another important question is whether the π electrons are localized in small islands of sp² carbon or delocalized into broad π bands. In this section, we review EELS studies (Fink et al., 1983a; Fink et al., 1984a; Fink et al., 1987c) on these questions.

FIG. 23. (a) Three different hybridizations of the four valence electrons of a carbon atom. (b) Typical electronic density of states of amorphous carbon materials.


The near-edge structures of the carbon 1s level (see Fig. 24) give direct evidence on the existence or non-existence of π electrons. Neglecting core-hole effects, these structures reflect the density of unoccupied states, i.e., the π* and σ* states. Thus, graphite shows a transition at 285 eV into the π* states and above 291 eV into the σ* states. In diamond, there are no π electrons and therefore the edge starts at 290 eV by transitions into σ* states. The sharp structure at 289 eV has been demonstrated to be a core exciton (Morar et al., 1985). The spectra for a-C and a-C:H show 1s-π* transitions at 285 eV, indicating the presence of π electrons. Therefore, these spectra clearly demonstrate that the name "diamond-like" carbon for a-C:H is misleading. There must be a considerable amount of sp² carbon atoms in the a-C:H films. A quantitative analysis of the concentration of sp² carbons, however, was not possible as the cross-sections of the 1s-π* and the 1s-σ* transitions are dependent on the detailed band structure of these materials. On heating the a-C:H samples, the features in the spectra resemble more and more those in the graphite spectrum, indicating the transition into a graphite-like structure due to hydrogen release. The measurements of the extended structure of the edges (not shown) revealed strongly damped oscillations for both a-C:H and a-C. From this, an amorphous structure with variations of the radius of the first neighbour shell of ~0.1-0.2 Å could be derived using Eq. (50). Quantitative information on the concentration of sp² carbons could be derived from an analysis of the loss spectra in the energy range 0-40 eV. The

FIG. 24. Near-edge structures of the carbon 1s level excitations for graphite, a-C:H (annealed and as-grown), evaporated carbon (a-C), and diamond.


loss functions and the real and imaginary parts of the dielectric functions as calculated by a Kramers-Kronig analysis are shown in Fig. 25. The dominant peak in the loss function is the collective excitation of all valence electrons, i.e., the π + σ plasmon, which appears at 27, 23, 21, 22, 25, and 34 eV for graphite, as-grown a-C:H, two annealed a-C:H samples (T = 600 and 1000°C), a-C, and diamond, respectively. As the energy of the plasmon is proportional to the square root of the density of the total number of valence electrons, it is, if we neglect the H atoms, a measure of the density, since each carbon atom contributes four valence electrons. Thus, the density increases when going from a-C:H via a-C and graphite to diamond. Upon annealing a-C:H up to 600°C, the number of valence electrons decreases due to partial hydrogen release and, therefore, the plasmon energy decreases by about 2 eV. At higher temperatures, the density increases due to a transition from an amorphous to a dense graphite-like structure and, therefore, the plasmon energy increases again. This demonstrates nicely the usefulness of measurements of the π + σ plasmon for density determinations, a technique proposed by Leder and Suddeth as long ago as 1960. In the loss functions shown in Fig. 25, a second feature is seen near 6 eV (except for diamond) originating from the collective excitation of π electrons, i.e., a π plasmon, which we found to be more pronounced in the spectra of graphite and annealed a-C:H than in those of as-grown a-C:H and a-C. This π plasmon is related to a π-π* transition showing up in ε₂ near 4 eV. The π-π* oscillator causes a zero-crossing of ε₁ near 6 eV, where ε₂ is small and, therefore, the loss function Im(−1/ε) = ε₂/(ε₁² + ε₂²) shows a maximum, i.e., the π plasmon (see also Section II,B,1). The second maximum in ε₂ near 12 eV is related to σ-σ* transitions. There is a clear separation of π-π* transitions and σ-σ* transitions in graphite and in annealed a-C:H.
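The use of the π + σ plasmon as a density gauge can be sketched numerically. The text states only that the plasmon energy scales with the square root of the valence electron density; the sketch below additionally assumes the free-electron expression E_p = ħ√(ne²/ε₀m), which neglects band-structure and interband corrections, so the absolute densities it returns are rough estimates and the ratios between materials are the more meaningful output. The constants are standard SI values and the function names are illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16    # reduced Planck constant (eV s)
E_CHARGE = 1.602176634e-19     # elementary charge (C)
EPS0 = 8.8541878128e-12        # vacuum permittivity (F/m)
M_E = 9.1093837015e-31         # electron mass (kg)


def plasmon_energy_ev(n_per_a3: float) -> float:
    """Free-electron plasmon energy (eV) for electron density n (Å^-3)."""
    n_si = n_per_a3 * 1e30  # convert Å^-3 to m^-3
    return HBAR_EV_S * math.sqrt(n_si * E_CHARGE ** 2 / (EPS0 * M_E))


def valence_density_from_plasmon(e_p_ev: float) -> float:
    """Valence electron density (Å^-3) from a plasmon energy, free-electron model."""
    return (e_p_ev / plasmon_energy_ev(1.0)) ** 2


if __name__ == "__main__":
    # Plasmon energies (eV) quoted in the text.
    energies = {"graphite": 27.0, "a-C:H (as-grown)": 23.0, "a-C": 25.0, "diamond": 34.0}
    ref = valence_density_from_plasmon(energies["graphite"])
    for name, e_p in energies.items():
        n = valence_density_from_plasmon(e_p)
        print(f"{name:18s} n = {n:.3f} Å^-3, relative to graphite: {n / ref:.2f}")
```

Dividing the electron density by four would give a carbon atom density only if the H contribution is neglected, as the text notes.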
The minimum at ~8 eV between the two transitions is less pronounced in the other two systems. The fact that strong π-π* transitions are found also in as-grown a-C:H implies the presence of sp²-hybridized carbon atoms. The concentration of sp² carbons was derived using the sum rule on ωε₂ [see Eq. (21)]. Integration up to ω = 8 eV gives the number of π electrons, and integrating to infinity yields the number of all the valence electrons, i.e., the π + σ electrons of carbon and the electrons of the H atoms forming the σ bond to the C atoms. The ratio between those two numbers combined with the known H concentration allows the sp² carbon concentration to be calculated. For as-grown a-C:H, this analysis showed that about one-third of the C atoms are in the trigonal sp² configuration (graphite-like carbon) while two-thirds are in the tetrahedral sp³ configuration (“diamond-like” carbon). This result is almost independent of rather large variations of preparation conditions such as pressure or bias voltage on the plasma. The same result was recently obtained for a-C:H, F, where the H atoms were partially or totally replaced by F


FIG. 25. Loss function and dielectric functions ε₁ and ε₂ for graphite, a-C:H (as-grown and annealed), evaporated carbon (a-C), and diamond.


atoms. Annealing the a-C:H samples caused a transformation of sp³ carbons into sp² carbons due to a release of H atoms. At an annealing temperature of 600°C, the concentration of sp² carbon atoms reached 50%. For the amorphous evaporated carbon a-C, an sp² carbon concentration of almost 100% was derived. Information on the localization of the π electrons could be obtained by measuring the dispersion in momentum transfer of the π plasmon. For localized π electrons, the π and the π* bands should be narrow and, therefore, the π plasmon should show no dispersion. In contrast, for delocalized π electrons, dispersion of the π plasmon should be observed. Analyzing the π plasmon dispersion data in terms of the quadratic momentum dependence E = E₀ + α(ħ²/m)q², no dispersion, i.e., α = 0, was found for as-grown a-C:H and a-C. Upon annealing a-C:H, a non-zero value of α was observed above 500°C. For T_A = 600°C and T_A = 1000°C, the dispersion coefficients α = 0.12 and α = 0.25, respectively, were derived. This indicates that in as-grown a-C:H and in a-C, the π electrons are localized, i.e., what is known as the π band is composed of narrow non-dispersive levels like, e.g., the band gap tails in amorphous Si or molecular levels in organic molecules. One possible model for a-C:H in which this band structure might occur would assume the existence of larger non-percolating π-bonded clusters in a matrix of sp³ carbons. The assumption of larger sp² molecules instead of statistically distributed sp² carbons is necessary to obtain the low gap of about 1.8 eV. The important fact that the gap of a-C:H is dependent on the size of the sp² clusters was first pointed out by Bredas and Street (1985) and by Robertson (1986). The reason for localization of π electrons in a-C may be that clusters with π bonds align in a plane and are separated from each other by non-overlapping π orbitals at a dihedral angle along the bond of 90°.
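The electron counting behind the sum-rule estimate of the sp² fraction can be made explicit with a small sketch. It rests on assumptions that go beyond what is stated here: only sp² and sp³ carbons are present (sp¹ is neglected), each sp² carbon contributes exactly one π electron, each C atom contributes four valence electrons, and each H atom contributes one. With x the sp² fraction and h the H/C ratio, the ratio of the two sum-rule integrals is then r = x/(4 + h). The numerical inputs in the example are hypothetical, not measured values.

```python
def sp2_fraction(pi_to_total_ratio: float, h_per_c: float) -> float:
    """sp2 carbon fraction x from r = x / (4 + h).

    pi_to_total_ratio: ratio r of the sum-rule integral up to the pi cutoff
                       to the integral over all valence electrons.
    h_per_c:           H atoms per C atom (h).
    """
    if pi_to_total_ratio < 0 or h_per_c < 0:
        raise ValueError("inputs must be non-negative")
    x = pi_to_total_ratio * (4.0 + h_per_c)
    if x > 1.0 + 1e-9:
        raise ValueError("ratio implies more than one pi electron per C atom")
    return x


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(f"r = 0.074, h = 0.5 -> x = {sp2_fraction(0.074, 0.5):.2f}")
    print(f"r = 0.250, h = 0.0 -> x = {sp2_fraction(0.250, 0.0):.2f}")
```

In this counting, a hydrogen-free film with one π electron per carbon atom corresponds to r = 0.25.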
For a-C:H, the model of sp² clusters in an sp³ matrix is also supported by recent measurements of the carbon 1s near-edge spectra of a-C:H,F, as shown in Fig. 26. For fully fluorinated a-C:F, the F/C concentration was 0.6, while for the 1s-π* transitions the ratio of the intensity of the line for C bonded to one F atom at 287 eV to that of the line of C bonded to no F atom at 285 eV is only ~0.3. From this fact, a non-equilibrium distribution of F atoms was derived with ~70% of F atoms bound to sp³ C atoms and only 30% of the F atoms bound to sp² C atoms. This result indicates again the presence of sp² clusters, since it is only the border of the sp² clusters that can be decorated with F (or H) atoms. The delocalization of the π electrons in a-C:H upon annealing can be explained by percolation of the growing sp² carbon islands (sp³ carbons are transformed to sp² carbons by H release). This causes a closing of the gap, as observed by EELS or optical spectroscopy, due to broadening of the π bands. At the annealing temperature of 600°C, where the dispersion coefficient α starts to deviate from zero, conductivity also strongly increases.

RECENT DEVELOPMENTS IN ENERGY-LOSS SPECTROSCOPY

FIG. 26. Near-edge structures of the carbon 1s level excitations for a-C:H,F produced from fluorinated benzenes. (Abscissa: energy in eV, 280 to 300.)

Investigations of a-C:H,F by EELS have provided detailed information on the microstructure of these systems. In particular, the concentration of carbon atoms in the sp² (graphite-like) and in the sp³ (diamond-like) configuration could be determined. Moreover, measurements of the dispersion of the π-π* transitions gave strong evidence for sp² carbon clusters in a matrix of sp³ carbon atoms. This microstructure is probably the reason why up to now these systems could not be transformed into highly conducting materials by doping with acceptors or donors, as is possible, e.g., for the conjugated polymers. Finally, changes of the microstructure as a function of preparation conditions and annealing temperature have been studied.

VIII. CONDUCTING POLYMERS

Since the pioneering work of Chiang et al. (1977), conducting polymers have emerged as a new class of materials. It was realized that certain polymers can be “doped” with electron acceptors (oxidation, with AsF₆⁻, I₃⁻, ClO₄⁻, ... counterions) or electron donors (reduction, with Li⁺, Na⁺, NH₄⁺, ... counterions) to conductivity levels approaching those of metals. The highest conductivity reported up to now (Naarmann, 1987) is σ = 1.5 × 10⁵ S/cm, which is not far from the room temperature conductivity σ = 6 × 10⁵ S/cm of Cu. The conducting polymers have interesting technological applications, such as replacement of conventional metals in electronic shielding and antistatic equipment, rechargeable batteries, photovoltaic cells, and perhaps in the next century electronic circuits on the basis of conducting polymers and molecular crystals. From the more fundamental point of view, conducting polymers are


of great interest as these systems are quasi-one-dimensional semiconductors and, after doping, quasi-one-dimensional metals. The polymers that can be doped up to metallic conductivities are the conjugated polymers, i.e., the carbon atoms are in the sp² configuration (see Section VII). The most important conjugated polymers in this field are shown in Fig. 27. Besides the strong bonding by σ electrons to the next carbon atoms and to the H atom, there is a weaker bonding due to π electrons, indicated by a double bond. The density of the electronic states is similar to that of the hydrogenated amorphous carbon systems illustrated in Fig. 23b. The prototype of the conducting polymers is polyacetylene (PA), the trans modification of which is shown in Fig. 27. A schematic one-dimensional band structure of trans-PA is shown in Fig. 28. The detailed curvature of the bands depends on the geometry and on interchain interactions (André and Leroy, 1971; Grant and Batra, 1979; Mintmire and White, 1983; Ashkenazi et al., 1985; Springborg, 1986). Due to the strong σ bonds, the splitting into bonding σ and antibonding σ* bands is about 7 eV. As there are 6 C σ electrons and 2 H σ electrons per unit cell (2 H and 2 C atoms per unit cell), the 4 σ bands are completely filled. As to the π electrons, there are 2 π electrons per unit cell and, therefore, the π band is completely filled, too. In a single-particle band-structure calculation for trans-PA with equal C-C distances, there is no gap between the π and π* bands and, therefore, PA should be a metal. There are two effects in one-dimensional systems (e.g., platinum chain salts, organic charge-transfer salts, or conducting polymers) that lead to a breakdown of single-particle band-structure calculations.

FIG. 27. Chemical structures of conjugated polymers that play an important role in the field of conducting polymers: trans-polyacetylene (PA), polyparaphenylene (PPP), polyphenylenevinylene (PPV), polypyrrole (PPY), polythiophene (PT), and polyaniline (PANI). Carbon and hydrogen atoms are not shown explicitly.

FIG. 28. Schematic band structure of trans-polyacetylene. (Ordinate: energy; abscissa: k from 0 to π/a.)

First, the electron-electron Coulomb interaction U can be sufficiently strong compared to the total band width W, resulting in a Mott-Hubbard insulator with finite gap. Second, the electron-phonon coupling may lead to a Peierls-Fröhlich transition with a distortion of the lattice structure with wave vector 2k_F. We then obtain a gap at k = 2k_F as shown in Fig. 28 with partial localization of the π electron density on every second bond, as indicated by a double bond in Fig. 27. This non-uniform π electron distribution is termed a bond-order wave. It leads to a difference in length between single and double bonds of about 0.06 Å. The theory for the electron-phonon model for PA has been worked out by Su, Schrieffer, and Heeger (SSH) (1980), Rice (1979), and Brazovskii (1978, 1980). The model, which is essentially a tight-binding model with an electron-phonon term, has been extremely useful in describing many of the experimental observations. On the other hand, Horsch (1981) has shown that correlation effects cannot be ignored in these systems. Recently, Baeriswyl and Maki (1985) and Baeriswyl (1987) have postulated that electron-electron correlations may be the dominant factor for dimerization in PA and that the size of the electron-phonon coupling constant should be much smaller than previously assumed. At present, there is a lively debate on the size of the two effects, and experimental results are extremely helpful in clarifying the situation. Upon doping at low levels, electrons are not removed from the π band (acceptors) or added to the π* band (donors) as would be expected in a rigid-band model. Rather, the dopants produce defects on the polymer chain that lead to defect levels in the gap. The most important defects that appear on doping with donors (n-type doping) are illustrated in Fig. 29 for PA and polyparaphenylene (PPP). In PA, the most important defect is a conjugational

FIG. 29. Defects induced by n-type doping in trans-polyacetylene and in polyparaphenylene. On the right side, the corresponding defect levels in the gap and their occupation for the case of n-type doping are illustrated. (Recoverable panel label: soliton, Q = −e, S = 0.)

defect in which the double bonds in the zig-zag structure switch from the right-hand slopes to the left-hand slopes. This defect has, in the positive, neutral, and negative form, zero, one, or two dangling bonds, respectively. In Fig. 29 we show the negative defect with two dangling π bonds caused by the fact that one electron has been transferred from the donor to the chain. The defect may also be viewed as a domain wall or as a phase slip of 180° in the order parameter of the bond-order wave. In the SSH model, the defect can be described by a solitary wave, and the defects are therefore often termed solitons. The formation of a soliton implies local suppression of the Peierls gap with a defect level in the gap, the so-called mid-gap state. This state is empty, half filled, or doubly filled for positive, neutral, or negative solitons, respectively. The existence of the mid-gap state was first proved by Suzuki et al. (1980) using optical spectroscopy. Charged solitons that have no spin may lead to the spinless conductivity observed at low doping concentrations. There exist many theoretical and experimental review articles concerning solitons in PA (Schrieffer, 1985; Baeriswyl, 1985; Etemad et al., 1982; Streitwolf, 1985; Roth and Bleier, 1987; Heeger et al., 1987). While PA is the prototype of the conducting polymers, it has the rather exceptional property that the energy of the right-hand slope double-bond structure is the same as that of the left-hand slope double-bond structure. Thus, PA has a degenerate ground state. The majority of the other


conducting polymers have the common feature of a non-degenerate ground state; that is, when the positions of the double and single bonds are reversed, an inequivalent structure with different energy results. For PPP, the ground-state structure is the aromatic form shown in Fig. 27. The other structure in PPP is the “quinoid structure,” where π electron charge is transferred from the rings into the bonds between the rings. In non-degenerate ground-state polymers, single defects such as solitons cannot be stable. Upon oxidation or reduction, the quinoid structure is locally more stable. At low doping concentration, singly charged defects appear, i.e., a charge with a lattice distortion, which in solid-state physics are called polarons (see Fig. 29). At higher concentrations, doubly charged bipolarons are more favourable. Polarons and bipolarons are related to two defect levels in the gap. The occupation of these levels is illustrated for n-type doping in Fig. 29. Theoretically, polarons and bipolarons were predicted by Brazovskii and Kirova (1981), Bredas et al. (1982), and Fesser et al. (1983). Experimentally, these defects were first observed in PPP by Crecelius et al. (1983a) using EELS and in polypyrrole (PPY) by Scott et al. (1983) using ESR. Polarons and bipolarons may, in principle, also appear in PA as a combination of two solitons, a soliton and an antisoliton. However, this is energetically less favourable than the formation of a single soliton. With increasing doping concentration, there is an overlap of the bipolarons, which in reality extend over about four rings. This leads to a broadening of the bipolaron levels in the gap. As shown in Fig. 30 for n-type doping, this may lead to an overlap with the conduction band and the valence band at the highest doping concentrations. Then a partially filled conduction band, and thus the metallic state, is formed.
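The mid-gap state that accompanies a soliton in the SSH picture can be illustrated with a small tight-binding calculation. The sketch below is an illustration under stated assumptions, not a reproduction of any calculation cited here: it uses a fixed tanh-shaped domain wall instead of a self-consistently relaxed lattice, and the hopping t0, the alternation amplitude delta, the wall width xi, and the chain length are hypothetical values, chosen so that 4·t0 and 4·delta roughly correspond to a π band width of 11 eV and a gap of 1.8 eV.

```python
import numpy as np


def ssh_chain_with_soliton(n_sites=101, t0=2.75, delta=0.45, xi=6.0):
    """Tight-binding Hamiltonian of an open chain containing one domain wall.

    Bond n connects sites n and n+1. Its hopping alternates between
    t0 + delta and t0 - delta, and the alternation pattern changes sign
    across a tanh-shaped wall centred in the middle of the chain.
    All parameter values are illustrative assumptions (energies in eV).
    """
    bonds = np.arange(n_sites - 1)
    centre = n_sites // 2
    profile = np.tanh((bonds - centre) / xi)
    hopping = t0 - ((-1.0) ** bonds) * delta * profile
    return -(np.diag(hopping, 1) + np.diag(hopping, -1))


hamiltonian = ssh_chain_with_soliton()
energies, states = np.linalg.eigh(hamiltonian)

# Sort the levels by their distance from the gap centre (E = 0).
order = np.argsort(np.abs(energies))
midgap_energy = energies[order[0]]
next_level = abs(energies[order[1]])

# Site on which the mid-gap wave function has its largest weight.
midgap_site = int(np.argmax(states[:, order[0]] ** 2))

print(f"mid-gap level: {midgap_energy:.2e} eV")
print(f"nearest other level: {next_level:.2f} eV")
print(f"mid-gap state centred on site {midgap_site}")
```

With an odd number of sites and strong bonds at both chain ends, the only level inside the gap is the one bound to the domain wall, which is why a single chain is enough to show the effect.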
Similarly, for PA, with increasing doping concentration the mid-gap state is broadened by the formation of a soliton glass, and at sufficiently high concentrations the metallic state is reached due to a disorder-induced quenching of the Peierls distortion (Mele and Rice, 1981). Recently, a first-order transition from a soliton lattice to a

FIG. 30. Evolution of the band structure of n-type doped non-degenerate ground-state polymers (e.g., polyparaphenylene) as a function of dopant concentration.


polaron lattice has been discussed (Kivelson and Heeger, 1985). The detailed electronic structure of the metallic state and the transition from the semiconducting to the metallic state are still under debate. In the last five years, we have studied the electronic structure of undoped and doped conjugated polymers by EELS. Previous reviews on these studies have been given by Fink (1985b), Crecelius (1986), Fink (1987), and by Fink et al. (1987d). Microscopic investigations may lead to a better understanding of the interesting macroscopic transport properties such as the high conductivities. We begin with a review of some EELS investigations on undoped systems, in particular highly oriented PA (Fink and Leising, 1986; Fink et al., 1987e). In Fig. 31, electron energy-loss spectra are shown as a function of the angle θ between the chain axis c and the momentum transfer q. The spectra were taken at q = 0.1 Å⁻¹, which is small compared to the extension of the Brillouin zone. For uniaxial crystals, the loss function is given by $\mathrm{Im}[-1/(\varepsilon_\parallel \cos^2\theta + \varepsilon_\perp \sin^2\theta)]$ [see Eq. (35)], where ε∥ and ε⊥ are the principal components of the dielectric tensor. Therefore, for highly oriented samples, the two components can be determined separately. For q ∥ c (θ = 0)

FIG. 31. Electron energy-loss spectra of highly oriented trans-polyacetylene as a function of the angle between the chain axis and the momentum transfer. The spectra were taken at q = 0.1 Å⁻¹, which is small in comparison with the dimensions of the Brillouin zone (after Fink and Leising, 1986). (Abscissa: energy in eV.)

there is a strong π + σ plasmon at 22.5 eV due to all valence electrons and a strong π plasmon at 4.9 eV due to a π-π* interband transition. In addition, there are various shoulders above 8 eV that indicate less intense oscillators due to σ-σ* transitions. Turning the momentum transfer perpendicular to the chain axis causes the π plasmon to become strongly reduced and also changes the shoulders in energy and intensity. In Fig. 32, the loss function, ε₁, and ε₂ are shown for q = 0.1 Å⁻¹ parallel and perpendicular to the chain axis. These data were derived by a Kramers-Kronig analysis from the loss spectra as described in Section II,E. For the calibration of the loss function, the optical refractive indices n∥ = 3.25 and n⊥ = 1.33 were used. For q ∥ c, the loss function and the dielectric functions are very close to those shown in Fig. 2e, which were formed from two oscillators at 3 and 8 eV. In the one-dimensional system PA, the lower oscillator, i.e., the π-π* transition across the fundamental gap at the zone boundary, is very pronounced. The joint density-of-states of the π and π* band shows in this case a square-root singularity at the gap energy 1.8 eV. The maximum of ε₂ is, however, broadened due to interchain interaction and local field effects. The π-π* oscillator leads to a zero-crossing in ε₁ at 4.9 eV and, therefore, the loss function shows a strong maximum at this energy. As this maximum is closely related to the π-π* transition, the π plasmon is not a free-electron plasmon but an interband plasmon. For q ⊥ c, the π oscillator is strongly reduced in ε₂ by a factor of 170, indicating a very small polarizability of the π electrons perpendicular to the chain axis. The observed anisotropy is in qualitative agreement with calculations in the framework of the SSH model by Baeriswyl et al. (1983). At higher energies, there are several oscillators due to σ-σ* transitions at 12, 14.6, and 19 eV (for q ∥ c) and at 16.5 and ~23 eV (for q ⊥ c).
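The link between an interband oscillator, the zero-crossing of ε₁, and the maximum of the loss function can be made concrete with a two-oscillator Lorentz model. This is a hedged sketch only: the resonance energies of 3 and 8 eV follow the two-oscillator picture mentioned above, but the oscillator strengths and dampings are made-up values, the function names are hypothetical, and the model is not the Kramers-Kronig analysis applied to the measured spectra, so the resulting peak position is not expected to match the measured 4.9 eV. The helper for the uniaxial case implements the expression Im[−1/(ε∥ cos²θ + ε⊥ sin²θ)].

```python
import numpy as np


def lorentz_epsilon(energy, oscillators):
    """Complex dielectric function from a sum of Lorentz oscillators.

    Each oscillator is (E_j, f_j, gamma_j): resonance energy (eV),
    strength (eV^2), and damping (eV).
    """
    energy = np.asarray(energy, dtype=float)
    eps = np.ones_like(energy, dtype=complex)
    for e_j, f_j, gamma_j in oscillators:
        eps += f_j / (e_j**2 - energy**2 - 1j * gamma_j * energy)
    return eps


def loss_function(eps):
    """Energy-loss function Im(-1/eps) = eps2 / (eps1^2 + eps2^2)."""
    return np.imag(-1.0 / eps)


def uniaxial_loss(eps_par, eps_perp, theta):
    """Loss function of a uniaxial crystal; theta is the angle between q and the axis."""
    eps_eff = eps_par * np.cos(theta) ** 2 + eps_perp * np.sin(theta) ** 2
    return loss_function(eps_eff)


# Two oscillators at 3 and 8 eV; strengths and dampings are made-up values.
OSCILLATORS = [(3.0, 40.0, 0.5), (8.0, 200.0, 2.0)]

energy = np.linspace(0.05, 7.0, 1391)  # grid step 0.005 eV
eps = lorentz_epsilon(energy, OSCILLATORS)
loss = loss_function(eps)

# Restrict to energies above the lower oscillator and locate
# (i) the loss maximum and (ii) the point where eps1 turns positive again.
window = energy > 3.0
peak_energy = energy[window][np.argmax(loss[window])]
eps1 = eps.real[window]
rising = np.where(np.diff(np.sign(eps1)) > 0)[0][0]
zero_crossing = energy[window][rising + 1]

print(f"eps1 zero-crossing near {zero_crossing:.2f} eV")
print(f"loss maximum near {peak_energy:.2f} eV")
```

In this toy model the loss maximum sits close to the zero-crossing of ε₁ and well above the 3 eV resonance, which is the behaviour the text describes as an interband plasmon.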
The transitions can be partially assigned to transitions in the zone centre and at the zone boundary in the band structure shown in Fig. 28. For q ⊥ c, the strong shoulder near 10 eV should be assigned to a π-σ* transition. According to the calculations of Mintmire and White (1983), a large joint density-of-states should appear for this normally forbidden transition, because the π and the lowest σ* band are effectively parallel in a large portion of the Brillouin zone (see Fig. 28). In the following, we discuss the momentum dependence of the dielectric functions of trans-PA. The π plasmon shows a linear dispersion in momentum transfer in the range q = 0.07 to 1 Å⁻¹ with a slope of 3.3 eV Å, as can be seen in Fig. 33. This was already recognized by Ritsko (1982) on non-oriented samples. A linear dispersion of the π plasmon was also observed by Ritsko et al. (1983a) for polydiacetylenes. The π plasmon in PA decays close to the zone boundary at q = π/a = 1.28 Å⁻¹. The linear dispersion of the π plasmon is, of course, related to a linear dispersion of the π-π* oscillator and should not be confused with the dispersion relation of a free-electron plasmon. The

FIG. 32. Loss functions and dielectric functions of undoped conjugated polymers. PA: highly oriented trans-polyacetylene; PPP: partially oriented polyparaphenylene; PPV: highly oriented polyphenylenevinylene; PPY: non-oriented polypyrrole; PT: non-oriented polythiophene. For oriented samples, the solid (dashed) line gives data for q parallel (perpendicular) to the chain axis.

FIG. 33. Momentum-dependent loss functions in the range of π electron excitations. PA: highly oriented polyacetylene; PPP: partially oriented polyparaphenylene; PPV: highly oriented polyphenylenevinylene; PPY: non-oriented polypyrrole; PT: non-oriented polythiophene. For the oriented samples, q is parallel to the chain axis. For comparison, calculations of the π plasmon in polyacetylene in the framework of the SSH model, including local field corrections, are shown. (Abscissa: energy in eV.)

FIG. 34. Momentum dependence of the π-π* interband transition in polyacetylene. (a) ε₂(q, ω) as determined by a Kramers-Kronig analysis from the experimental loss function. (b) Calculation of the joint density-of-states of the π and π* band. (c) Calculation of ε₂(q, ω) without local field corrections. (d) Calculation of ε₂(q, ω) with local field corrections for ρ∥ ≪ 1 Å and ρ⊥ ≫ 1 Å. (e) Calculation of ε₂(q, ω) with local field corrections for ρ∥ = 0.3 Å and ρ⊥ = 1 Å. (Axes: energy in eV and q in Å⁻¹.)

q dependence of the π-π* oscillator is shown in Fig. 34a in a momentum-dependent plot of ε₂ as derived by a Kramers-Kronig analysis of the loss spectra. For comparison, we show in Fig. 34b the q-dependent joint density-of-states of the π and π* band as calculated in a tight-binding model using a simple sampling method. We used E_g = 1.8 eV for the gap and W = 11 eV for the total width of the π electron system. In Fig. 34c, the imaginary part of the dielectric function calculated in the SSH model using Eq. (36) is shown with off-diagonal elements omitted (Drechsler and Bobeth, 1985). Finally, in Figs. 34d and 34e we show ε₂, calculated again from Eq. (36) but now with the inclusion of local field corrections (Neumann, 1986; Neumann and von Baltz, 1987). In Fig. 34d the extension of the Wannier function was chosen as ρ∥ ≪ 1 Å and ρ⊥ ≫ 1 Å, while in Fig. 34e the calculation is based on the more realistic values ρ∥ = 0.3 Å and ρ⊥ = 1 Å. Both in the joint density-of-states and in ε₂ without local field corrections, Van Hove singularities, which are square-root singularities in the one-dimensional system, are seen. In the (ω, q) diagram given in Fig. 35, these singularities are given by full lines. Due to the small matrix element of the π-π* transitions in the zone center, the high-energy singularity is strongly reduced in the calculations for ε₂. When local field corrections are taken into account, the singularities are rounded and the maximum of ε₂ is shifted to higher energies. This effect increases

FIG. 35. Momentum-dependent experimental data on the π plasmon (D) and the maximum of the π-π* interband transition in ε₂ (U) in trans-polyacetylene. Full curves: calculated Van Hove singularities of the π-π* transitions that limit the range of possible interband transitions. Dashed and dot-and-dash curve: calculations of the π plasmon and the maximum of ε₂ by Neumann and von Baltz (1987).

particularly when more extended Wannier functions are used for the calculation. At higher momentum transfer, the splitting of the lower singularity is strongly washed out. Figure 35 gives a comparison of the experimental data for the plasmon energy and the maximum of ε₂ with theoretical data obtained by means of the model also used for Fig. 34e. The best agreement could be achieved using the above parameters for the more extended Wannier functions and E_g = 1.8 eV and W = 11 eV (Neumann and von Baltz, 1987). A comparison with the simple calculation of the joint density-of-states yielded a slightly higher value W = 12.5 eV (Fink and Leising, 1986). The derived width W = 11 eV is in excellent agreement with estimates of W in the range 10-12 eV from band-structure calculations, while the gap energy is close to that determined by optical spectroscopy. At this point, it should be noted that an approximation without local field corrections, but with the introduction of a next-nearest-neighbor hopping integral, also gave good agreement between theory and experiment (Drechsler et al., 1986). As can be seen in Fig. 35, rather good agreement between theory and experiment can be achieved for the energy position of the collective excitations and the interband transitions in the π electron system. However, the shape of the ε₂ curves in Figs. 34a and 34e is rather different at higher momentum transfer. In particular, the onset in ε₂ remains in the experiment at the gap energy for zero momentum transfer, while the theoretical calculations predict an increasing gap at higher q. It is interesting to note that this q-independent onset also appears in the loss function shown in Fig. 33 and is therefore not an artifact of the Kramers-Kronig analysis. For comparison, in Fig. 33 the calculated loss function is plotted and clearly shows a momentum-dependent onset.
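The momentum-dependent onset predicted by a single-particle picture can be reproduced with a simple sampling of a one-dimensional tight-binding band pair. The sketch below is an assumption-laden illustration rather than the calculation behind Figs. 34 and 35: it uses a dimerized-chain band shape with the quoted E_g = 1.8 eV and W = 11 eV, takes the unit-cell length from the zone boundary value π/a = 1.28 Å⁻¹ quoted in the text, ignores matrix elements and local field corrections, and the function names are hypothetical.

```python
import numpy as np

GAP = 1.8                 # E_g in eV (value quoted in the text)
WIDTH = 11.0              # total pi band width W in eV (value quoted in the text)
CELL = np.pi / 1.28       # unit-cell length a in Angstrom, from pi/a = 1.28 1/Angstrom


def band_energy(k):
    """Upper (pi*) band of a dimerized chain; the pi band is its mirror image.

    E(k) runs from W/2 at the zone centre to E_g/2 at the zone boundary k = pi/a.
    """
    phase = 0.5 * np.asarray(k, dtype=float) * CELL
    return np.sqrt((0.5 * WIDTH * np.cos(phase)) ** 2
                   + (0.5 * GAP * np.sin(phase)) ** 2)


def transition_limits(q, n_samples=4001):
    """Smallest and largest pi -> pi* transition energy for momentum transfer q.

    The transition energy E(k + q) + E(k) is sampled on a uniform k grid
    covering one Brillouin zone.
    """
    k = np.linspace(-np.pi / CELL, np.pi / CELL, n_samples)
    transition = band_energy(k + q) + band_energy(k)
    return transition.min(), transition.max()


for q in (0.0, 0.3, 0.6):
    onset, upper = transition_limits(q)
    print(f"q = {q:.1f} 1/A: onset = {onset:.2f} eV, upper limit = {upper:.2f} eV")
```

At q = 0 the two limits reduce to the gap and the total band width. The onset then rises with q, which is the theoretical behaviour that the measured, q-independent onset contradicts.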
The tail below the single-particle gap has been explained either in terms of excitonic transitions due to high electron-electron correlation energy or by a direct generation of charged soliton-antisoliton pairs assisted by quantum fluctuations in the ground state (Fink, 1987). The latter excitations were also observed in measurements of the photoconductivity (Etemad et al., 1981) and in measurements of the photoinduced absorption (Blanchet et al., 1983). Above q ≈ 0.7 Å⁻¹, there is considerable intensity in the single-particle gap due to multiple scattering processes. On the other hand, the shape of the calculated ε₂ is rather narrow, while in the experiment there is a splitting that is significant in the calculations only when local field effects are neglected. The origin of this difference is not yet clear. The calculations also predict a strong narrowing of the plasmon at higher q and, contrary to the postulation of Neumann and von Baltz (1987), the plasmon should be completely undamped when leaving the region of interband transitions at q > 1.1 Å⁻¹. In the experiment, strong damping of the π plasmon is observed near that momentum transfer. This probably happens because ε₂ is strongly increased above ~8 eV due to σ-σ* transitions not taken into account in


the calculations (see Figs. 32 and 34a), which leads to a damping of the π plasmon. In discussing the momentum-dependent dielectric functions of PA, we should also mention that the π + σ plasmon shows a linear dispersion for 0.1 ≤ q ≤ 1 Å⁻¹ with a slope of 3.5 eV Å. For q > 1 Å⁻¹ there is a positive deviation from the linear dispersion. An explanation for this linear dispersion may be the fact that the energy of the σ-σ* transition near 12 eV (probably at the Γ point) increases linearly with increasing momentum transfer, similar to the case of the π-π* transition. This may then cause a linear dispersion of the σ + π plasmon when using Eq. (23). Let us now return to the π electron system. There are deviations between theory and experiment regarding the exact shape of the dielectric functions at higher q, yet the overall agreement shown in Fig. 35 may lead to the conclusion that for PA the Peierls-Fröhlich model is adequate and that electron-electron correlations seem to be less important. On the other hand, momentum-dependent calculations of the dielectric functions for non-zero correlation energies would be highly desirable. There is an indication from the calculation of n_eff for the π electron system for q ∥ c up to 6 eV using Eq. (21) that correlations are important. From the experimental data, we obtain for the optical effective mass m*/m = 1.7, while a calculation on the basis of the SSH model, including local field corrections, yields m*/m = 1.44 (Neumann and von Baltz, 1987). The difference between these values may be explained by electron-electron correlations not taken into account in the SSH model. According to calculations of the sum rule in the framework of the one-dimensional Hubbard model (Baeriswyl et al., 1986), the experimental value can be explained by a high on-site correlation energy U ≈ 7 eV or U/W ≈ 0.6.
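As a quick arithmetic check of the numbers quoted above, the two ratios can be written out explicitly. Pairing U with the band width W = 11 eV obtained from the comparison in Fig. 35 is an assumption made here for illustration only; the text does not state which value of W enters the quoted ratio.

```latex
\frac{(m^*/m)_{\mathrm{exp}}}{(m^*/m)_{\mathrm{SSH}}} = \frac{1.7}{1.44} \approx 1.18,
\qquad
\frac{U}{W} \approx \frac{7\ \mathrm{eV}}{11\ \mathrm{eV}} \approx 0.64 .
```

The first ratio corresponds to a difference in the optical mass of a little under 20%, and the second is consistent, within rounding, with the quoted U/W ≈ 0.6.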
This high value would indicate that PA is an intermediate case in which neither electron-phonon coupling nor electron-electron correlations can be neglected, a situation that is very difficult to treat theoretically. It has been claimed that the small difference in the optical mass of only about 20% may also be caused by other effects such as next-nearest-neighbor hopping, etc. However, it should be noted that very similar values for U have been derived from photoinduced absorption measurements (Vardeny et al., 1985) and from ENDOR experiments (Grupp et al., 1987). Moreover, theoretical estimates of U by Baeriswyl and Maki (1985) are in excellent agreement with the above experimental values. The discussion of the results for PA has illustrated the possibility of obtaining information on the electronic structure of conjugated polymers. Although the dispersion of the π plasmon is not exactly parallel to that of the π-π* transition, some semi-quantitative information on the width of π bands can be obtained directly from plasmon dispersion without performing the Kramers-Kronig analysis. Various undoped conjugated polymers such as


PPP (Crecelius et al., 1983b; Fink et al., 1983b), polypyrrole (PPY) (Ritsko et al., 1983b; Fink et al., 1986; Fink et al., 1987f), polythiophene (PT) (Fink et al., 1987a), and polyphenylenevinylene (PPV) (Fink et al., 1987b) have been studied. The loss functions of these polymers are shown in Fig. 32. For highly oriented samples, the data are given for q parallel to and perpendicular to the chain axis. The PPP sample was partially oriented, and, therefore, data taken for q perpendicular to the chain axis are obscured by the projected, much stronger contributions of transitions for q ∥ c. Therefore, only data for q ∥ c are given. For PPY and PT, up to now no oriented samples could be obtained. In all the conjugated polymers, a π + σ plasmon is observed at ~22.5 eV, indicating the same concentration of valence electrons and the same energy of the σ-σ* oscillator [see Eq. (23)]. In PPP and PPV there are now two π plasmons related to two π-π* transitions. Strong σ-σ* transitions are always observed between 10 and 20 eV. In Fig. 33, the momentum dependence of the π plasmons is shown. While some of these show a strong dispersion, others are almost without any dispersion. The lowest π plasmon in PPP at 4 eV shows a strong linear dispersion for 0.07 ≤ q ≤ 0.3 Å⁻¹ with a slope of 2.9 eV Å, which is slightly less than that of PA (3.5 eV Å). This result suggests a band structure for the π and π* band forming the fundamental gap of PPP very close to that of PA. On the other hand, the π plasmon at 7 eV shows no dispersion, typical of a transition in which at least the π or the π* band or both are flat (see Fig. 4). Since from symmetry arguments the π* bands look like the π bands with the Fermi level acting as a mirror plane, both the π and the π* band related to the 7-eV π plasmon must be flat, as shown in Fig. 4a.
These flat bands, with a gap of about 6 eV, are related to the π electrons that are localized in the benzene rings, the wave functions of which have nodes at the carbon atoms connecting the rings. The wide π bands with a gap of about 3.2 eV related to the 4-eV plasmon then belong to π electrons that are delocalized over the polymer chain. Thus, from the loss data a π band structure combining Figs. 4a and 4c with gap energies of 6 and 3.2 eV, respectively, can be derived. This is in excellent agreement with band-structure calculations for the π electron system in PPP by Bredas et al. (1984). The situation in PPV is very similar, but there is an indication of a third transition between the two dominant π plasmons that may actually be a “mixed” transition between the flat bands and the wide bands, apparently not allowed in PPP. In both PPY and PT, the lowest π plasmons show strong dispersion, indicating that the transition across the fundamental gap is related to a wide π band and a wide π* band. The maxima in the loss functions at higher energies show almost no dispersion, meaning that at least the π band or the π* band or both related to the transition must be flat. The investigation of the π band structure may give important information on the formation of conjugated polymers. This is illustrated here for the


example of PPV (Fink et al., 1987b). This polymer is formed by thermal transformation from the precursor polymer poly(p-xylene-α-dimethylsulphoniumchloride), in which the benzene rings are connected by a saturated -CH[S⁺(CH₃)₂]-CH₂- bridge. By thermal elimination of dimethylsulphide, the saturated bridge is transformed into the non-saturated -CH=CH- bridge. The ratio of the non-conjugated to the conjugated segments and, therefore, the average conjugation length depend on the annealing temperature and time. In Fig. 36a we show the loss function of the precursor polymer at various stages of thermal conversion to PPV, and in Fig. 36b the dispersion of the lowest maximum of the loss function near 4 eV. For the as-cast precursor, the 4 eV transition is small and shows almost no dispersion, typical of a molecular transition in short conjugated systems. With increasing temperature, the length of the unsaturated segments increases, which leads to an increasing π-π* transition at 4 eV with increasing dispersion in q. Therefore, these measurements provide information on the transformation mechanism when passing from the precursor polymer to PPV. In the following, we turn to the doped conjugated polymers and review investigations of the evolution of the π electron structure as a function of doping concentration. For highly oriented PA, the experiments at higher momentum transfer are rather difficult to perform because the effective thickness of the samples strongly increases upon doping with heavy counterions and samples below 2000 Å could not be prepared. Nevertheless, loss

FIG. 36. Formation of polyphenylenevinylene from the prepolymer poly(p-xylene-α-dimethylsulfoniumchloride) upon heat treatment. (a) Loss spectra. (b) Dispersion of the first maximum in the loss spectra. (Recoverable curve labels: R.T.; 2 h at 130°C; 2 h at 200°C; 24 h at 300°C. Abscissae: energy in eV for (a), momentum transfer for (b).)

measurements on doped samples in a limited q range are also possible (Fink et al., 19878).In Fig. 37a the change of the n plasmon of AsF; doped PA as a function of the conductivity of the sample is shown for q 11 c. The highest conductivity corresponds probably to an AsF; concentration of 15% per C atom. The transition into a metallic state with finite Pauli susceptibility is reached near 1000 S/cm. At low doping concentration, intensity appears in the gap of undoped PA due to the fqrmation of a soliton or midgap level. With increasing dopant concentration, the n plasmon is broadened and shifted to lower energy. At the highest dopant concentration, the onset of the loss function is close to zero and the shift of the plasmon is 2.1 eV. For q h , the dielectric function due to the n electrons is probably small compared to the background dielectric function cB because of the cr electrons and therefore the loss function is given in this case by Im( - 1 / ~ )= ct/(.$ E : ) = &&/:.; However, contributions from cannot be excluded, in particular at the highest dopant concentrations where the samples become amorphous, while there still remains some electronic anisotropy. The data in Fig. 37a and 37b clearly show a transformation from a semiconductor via a system with a defect level in the gap to a system where the gap is closed or at least less than some tenth of an eV. In Fig. 38, the evolution of the plasmon dispersion upon

FIG. 37. Evolution of the loss function in the π electron range of highly oriented PA upon doping with AsF₆⁻ for various conductivities (in S/cm) parallel to the chain axis. (a) q parallel to the chain axis. (b) q perpendicular to the chain axis. Horizontal axes: energy (eV).

RECENT DEVELOPMENTS IN ENERGY-LOSS SPECTROSCOPY

FIG. 38. Dispersion of the π plasmon in AsF₆⁻-doped, highly oriented polyacetylene for various conductivities parallel to the chain axis. Horizontal axis: momentum transfer (Å⁻¹).

doping with AsF₅ is shown. Up to σ∥ = 300 S/cm, the dispersion of the plasmon is unchanged compared to that of undoped PA. This means that the overall π band structure is not changed and that the conductivity must be connected with defect levels in the gap. Between σ∥ = 300 and 1800 S/cm there is a rapid transition to a new dispersion curve with a steeper slope, indicating a stronger dispersion of the π and π* bands near the Fermi level, which is to be expected for a closed gap. In addition, for σ∥ ≥ 1800 S/cm a step of ~0.3 eV appears in the dispersion curves, which moves to q = 0.25 Å⁻¹ for the highest dopant concentration. The step is explained in the following way. At low q there are only transitions from the valence band into a broad soliton band that has filled the gap almost completely. At higher momentum transfer, there are more and more transitions from the valence band to the conduction band. The step in the dispersion would then indicate a gap between the soliton band and the conduction band of about 0.3 eV. The explanation is supported by the fact that the step appears at lower q for a lower dopant concentration, at which the unit cell, and thus the length of the soliton band in q space, is smaller due to the larger distance of the counterions in real space. A gap of the same size between valence band and soliton band is to be expected but could not be detected because of the limited energy


resolution. The results support a model of a semiconductor-metal transition via a soliton band. No indication of a narrow half-filled polaron band is observed. Therefore, the above model of a polaron metal is not supported. Finally, we show in Fig. 39 a first attempt to perform a Kramers-Kronig analysis of the loss function of AsF₆⁻-doped PA. For the calibration of the loss function, ε∥ = 40 and ε⊥ = 8 at an energy of ~0.25 eV (Leising et al., 1987) were used. The dielectric functions reveal the shift of the π-π* transition to very low energy (E_g ≤ 0.2 eV) and the remaining strong anisotropy of the π-π* oscillator. This indicates a highly anisotropic one-dimensional metal for the p-doped PA, which was also derived from conductivity measurements and optical studies (Leising et al., 1987).

Similar studies on n-doped, highly oriented PA have been performed recently (Fink et al., 1987e; Fink et al., 1987g; Fink and Leising, 1987a). In Fig. 40, the loss function and the dielectric functions of K-doped, highly oriented PA are shown. A smaller shift to lower energy is observed in all the n-type doped systems. For K, the shift is about 1 eV. For the more electropositive metals Rb and Cs, the shifts are slightly larger. As in the p-type doped PA, there are changes in the σ-σ* and π-σ* transitions. In addition, a strong K 3p core excitation near 20 eV is observed that is very similar to that observed in K-intercalated graphite (Grunes and Ritsko, 1983). Since the effective thickness of the samples was

FIG. 39. Loss function and dielectric functions of fully AsF₆⁻-doped, highly oriented polyacetylene, PA(AsF₆⁻). Solid lines: q parallel to the chain axis; dashed lines: q perpendicular to the chain axis.

FIG. 40. Loss function and dielectric functions of fully K+-doped, highly oriented polyacetylene, PA(K+). Solid lines: q parallel to the chain axis; dashed lines: q perpendicular to the chain axis.

smaller than that of the p-type doped ones, the Kramers-Kronig analysis could also be performed for higher momentum transfer. In Fig. 41, ε₂(q, ω) for K-doped PA for q ∥ c is shown. Compared to the undoped case, the π-π* oscillator is shifted to lower energies. Within error bars it shows a linear dispersion. The extrapolation to zero q gives almost zero energy, indicating the closing of the gap at the highest dopant concentrations. This indicates an excitation spectrum similar to that predicted by Williams and Bloch (1974) for a one-dimensional metal. However, on closer inspection of the data, there are deviations. For example, there are again steplike deviations from a linear plasmon dispersion that at present are not understood in detail.

The situation is slightly different for the non-degenerate ground state polymers. In Fig. 42, we show a typical loss function of K-doped PPP at low dopant concentration. There are two maxima near 1 and 2 eV in the gap that can be assigned to transitions from the filled bipolaron levels to the conduction band (B*-π* and B-π* transitions). According to the calculation of Fesser et al. (1983), the lower maximum should be much stronger in intensity than the higher maximum, with a ratio of about 20, in rough agreement with the data shown in Fig. 42. It is interesting to note that the observed ratios are not always 20 but are in some cases considerably higher, depending on the method of doping, the counterion, etc. No systematic variation, however, is

FIG. 41. Momentum-dependent imaginary part of the dielectric function for q parallel to the chain axis for fully K+-doped, highly oriented polyacetylene. Horizontal axis: energy (eV).

observed. On the other hand, various symmetry-breaking mechanisms have been predicted, such as high correlation energy or interchain interaction, that change the oscillator strength of the two transitions (Sum et al., 1987). The shapes of the two transitions are also in good agreement with theoretical predictions. They indicate a transition from a narrow defect level into a

FIG. 42. Transitions from occupied defect levels in the gap into the π* band for K+-doped polyparaphenylene at low dopant concentration. Horizontal axis: energy (eV).


continuum of π* states. Attempts to measure the form factor of these excitations, i.e., the momentum dependence of the intensity, have so far not been successful due to the low cross-section and the increasing background at higher momentum transfer. Such measurements could, in principle, provide information on the size of the defects. The evolution of the loss function upon doping is illustrated in Fig. 43 for the case of Cs-doped PPP. At low concentrations, two bipolaron transitions are again observed. With increasing dopant concentration, the B*-π* transition increases at the expense of the lowest π-π* transition. At about 25 at.% per monomer, the π-π* transition at 4 eV has disappeared and a new shoulder appears on the high-energy side of the B*-π* transition; this shoulder disappears at the highest dopant concentrations (~50 at.% per monomer). The π-π* plasmon at 7 eV remains almost unchanged, indicating that the benzene rings are not destroyed upon doping. This finding is supported by the fact that the doping is completely reversible, i.e., upon heating the sample, Cs can be evaporated and the undoped loss spectrum is obtained again. In the

FIG. 43. Evolution of the loss function of polyparaphenylene (PPP) upon doping with Cs+. Uppermost curve: fully Cs+-doped PPP (~50 at.% per monomer). Horizontal axis: energy (eV).


fully doped sample, the strong plasmon at 2.5 eV may be explained by a π plasmon related to a π band that is closed upon doping, i.e., a free-carrier plasmon. However, the momentum dependence of this excitation (see Fig. 44) shows that the situation here is completely different from that of PA. The maximum shows no dispersion, indicating a relation to a still existing, narrow bipolaron band. At higher momentum transfers, a second maximum showing a strong dispersion appears, which may be assigned to a π-π* transition. At the highest dopant concentrations, the lowest π* band is probably already partially filled, a fact that allows π-π* transitions only at higher momentum transfers or at lower dopant concentrations, in agreement with the experiment. The data further suggest that there is a narrowing of the π-π* gap near ~25 at.% by about 1 eV, since the energy of the π-π* transition is reduced by this amount. In summary, the present data support an evolution of the π electron system as shown in Fig. 30. However, complete overlap with the valence and conduction band could not be observed. Similar data on PPP have been reported for AsF₆⁻ doping (Crecelius et al., 1983a), Li doping (Fink et al., 1984b, 1985b), and Na doping (Fark et al., 1987). The existence of a narrow bipolaron band at even the highest dopant concentrations has also been observed in EELS studies on other non-degenerate ground state polymers, such as PPY (Fink et al., 1986; 1987f), PT (Fink et al., 1987a), and PPV (Fink et al., 1987b). It is probably a general law that these non-

FIG. 44. Momentum dependence of the loss function of fully Cs+-doped polyparaphenylene. Horizontal axis: energy (eV).


degenerate ground state polymers cannot be doped to the full overlap of the bipolaron bands with the conduction bands and that narrow bipolaron bands remain in the gap up to the highest dopant concentrations.

At the end of this section on conducting polymers, we briefly report on recent results on core-level excitations in undoped and doped conjugated polymers. Only some of these results have been published in the references given above. In Fig. 45, the C 1s absorption edges of conjugated carbon-hydrogen polymers are shown. The spectrum of PPP is almost identical to that of gaseous benzene (Horsley et al., 1987). For benzene, the peaks at 285.2 and 288.9 eV were ascribed to the 1s-π*(e₂u) and 1s-π*(b₂g) transitions, since benzene has two unoccupied π* states. The peaks at 293.5 and 300.2 eV, the latter being less pronounced in PPP, were assigned to two 1s-σ* shape resonances. The shoulder at 287.2 eV was assigned to a 3p Rydberg state, although, since it remains in the solid, the assignment is probably not correct. It is remarkable that the spectra of the molecule and of the polymer are almost the same. This indicates that in these cases the core-level excitations only sample a very local part of the density of the unoccupied states. The

FIG. 45. Carbon 1s edges of hydrogen-carbon conjugated polymers such as polyacetylene (PA), polyparaphenylene (PPP), polyphenylenevinylene (PPV), and polynaphthalenevinylene (PNV). Horizontal axis: energy (eV).


delocalized π electrons have no influence on the spectrum. In PPV, the spectrum is almost the same; only the first shoulder above the first 1s-π* resonance is shifted slightly to higher energies. For PA, the first two peaks, which are probably again 1s-π* transitions, also appear in butene molecules. For q⊥ = 0, the momentum transfer q is given by q = q∥ = 0.19 Å⁻¹ for an energy loss of 280 eV [see Eq. (1b)]. When the polymer chain is perpendicular to the beam, there is strong overlap of the π* lobes with the momentum transfer, and therefore the matrix element in Eq. (48) is large, yielding strong 1s-π* transitions. Upon increasing q⊥, the total momentum transfer turns more and more parallel to the chain axis. At q⊥ = 0.7 Å⁻¹, the matrix element in Eq. (48) for the 1s-π* transition is strongly reduced, leading to a much weaker peak in the C 1s edge, as shown in Fig. 45. For q ⊥ c and, less pronounced, for q ∥ c, there are two 1s-σ* shape resonances. As described in Section II,D, the energy positions of these resonances can be used to determine bond lengths. Using the empirical relation ΔE = 54 Δr (ΔE in eV, Δr in Å), an energy difference of about 4 eV leads to Δr = 0.075 Å for the difference between single and double bond. This is in excellent agreement with X-ray (Fincher et al., 1982) and electron (Chien et al., 1982) scattering data, yielding Δr = 0.06 Å, and with NMR data by Yannoni and Clark (1983), yielding 0.08 Å. However, it should be kept in mind, as outlined in Section II,D, that the model we have used is based on empirical data and that various exceptions to this model exist. In principle, such a splitting of the 1s-σ* resonances should also appear in PPP, PPV, and polynaphthalenevinylene (PNV), but in these cases the ratio of the number of double bonds to that of single bonds is much less than one. Finally, we note that the origin of the splitting of the 1s-π* transition in PNV is not clear at present. In Fig. 46, we show C 1s edges for other conjugated undoped polymers.
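The bond-length estimate in the preceding paragraph is simple arithmetic, and a short sketch may make it easier to check. The Python snippet below applies the empirical relation ΔE = 54 Δr quoted in the text to the observed splitting of about 4 eV, and also computes the angle between the total momentum transfer and the beam for the two geometries mentioned (q⊥ = 0 and q⊥ = 0.7 Å⁻¹ at q∥ = 0.19 Å⁻¹). The function names, and the reading of the relation in units of eV and Å, are illustrative assumptions and are not taken from the source.

```python
import math

# Empirical slope quoted in the text: Delta E = 54 * Delta r.
# Assumption: Delta E is in eV and Delta r in Angstrom.
SLOPE_EV_PER_ANGSTROM = 54.0


def bond_length_difference(delta_e_ev, slope=SLOPE_EV_PER_ANGSTROM):
    """Return Delta r (Angstrom) for a sigma* resonance splitting Delta E (eV)."""
    return delta_e_ev / slope


def momentum_transfer(q_perp, q_par):
    """Return (|q|, angle between q and the beam in degrees).

    q_perp and q_par are the components perpendicular and parallel to the
    beam, in inverse Angstrom.
    """
    magnitude = math.hypot(q_perp, q_par)
    angle_deg = math.degrees(math.atan2(q_perp, q_par))
    return magnitude, angle_deg


if __name__ == "__main__":
    print("Delta r for a 4 eV splitting:", bond_length_difference(4.0), "Angstrom")
    for q_perp in (0.0, 0.7):
        q, angle = momentum_transfer(q_perp, 0.19)
        print("q_perp =", q_perp, "-> |q| =", q, "1/Angstrom, angle =", angle, "deg")
```

With these inputs the sketch gives roughly 0.074 Å, consistent with the rounded value of 0.075 Å in the text, and shows that at q⊥ = 0.7 Å⁻¹ the momentum transfer is tilted by about 75° away from the beam direction, i.e. mostly along the chain axis when the chain is perpendicular to the beam.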
Polymethineimine (PMI), which is similar to trans-PA but with every second C-H group replaced by an N atom, has a strong 1s-π* resonance at 286.6 eV, which is about 2 eV above that of PA. This is probably caused by a chemical shift due to the more electronegative N atoms. From XPS measurements, it is well known that one N atom bonded to a C atom causes a chemical shift of the C 1s level of about 1 eV. In polyacrylonitrile (PAN) at low temperatures, which is a polyethylene chain where every second H atom is replaced by a C≡N group, this group causes the 1s-π* resonance at 286.8 eV. Upon heating at 300°C, PAN transforms into polypyridinopyridine, which is composed of trans-PA bonded on each second C atom by a σ bond to a PMI chain. Therefore, two 1s-π* transitions appear at ~285 eV and 287 eV, corresponding to those of trans-PA and PMI (Ritsko et al., 1983c; Fink et al., 1983c). In PPY, there are two π* resonances separated by 1 eV because the so-called β-carbons (two per monomer) are bonded to C and H atoms only, while the α-carbons (two per monomer) are bonded to one N neighbor each. In PT, the 1s-π* transition at about 285 eV also shows a

FIG. 46. Carbon 1s edges of N and S containing conjugated polymers such as polymethineimine (PMI), polyacrylonitrile (PAN), polypyrrole (PPY), and polythiophene (PT). Horizontal axis: energy (eV).

splitting, which is, however, rather small because S and C have almost the same electronegativity. In the context of an EELS investigation of the monomer, the peak at 287.2 eV was ascribed to a C-S 1s-σ* resonance (Hitchcock et al., 1986). In summary, the 1s-π* transitions in the conjugated undoped polymers are very close to those of short molecules. Therefore, the spectra are not related to delocalized π electrons but to π electrons localized in small molecules. The first 1s-π* transition, which appears close to 285 eV for all π-bonded carbon compounds, including graphite, indicates that this transition reaches even more localized, probably atomic-like states. The N 1s spectra of these materials have the same character (see Fig. 47). There is a 1s-π* transition near 401 eV for N bonded to an H atom. In PMI, radiation-damaged PPY, and in a polyaniline (PANI) sample, 1s-π* transitions appear at lower energy due to N atoms not bonded to an H atom, which therefore have more charge.

In n-type doped metallic polymers, the core holes are more strongly shielded and the core-level edges are characterized by an extremely sharp rise at threshold, as in metals. For Na- and K-doped PA, the molecular C 1s-π* resonance is strongly reduced, as shown in Fig. 48. A similar C 1s edge was found by Ritsko (1981) for AsF₅-doped PA. The very near-edge structure of PA(K+) is close to

FIG. 47. Nitrogen 1s edges of polymethineimine (PMI), ClO₄⁻-doped polypyrrole (PPY-ClO₄), radiation-damaged butanesulfonate-doped polypyrrole (PPY-BS-RD), and ClO₄⁻-doped polyaniline (PANI). Horizontal axis: energy (eV).

that of K-doped graphite (Grunes and Ritsko, 1983). The two maxima may be explained by two C sites, one of them close to a counterion, the other not. However, the intensity of the second peak is much too small to support this explanation. A splitting may also occur due to metal-carbon hybridization. Furthermore, the second peak may be caused by metal s states above the Fermi level. It is very interesting to note that in the fully doped PA, there is now only one 1s-σ* resonance, indicating almost equal C-C bond lengths, as predicted for a Peierls system with a closed gap. The 1s-π* transitions of Na+-, K+-, Rb+-, and Cs+-doped PPP are strongly reduced compared to those in undoped PPP. There is a sharp edge with a tail at higher energies. It is as yet unclear why Li+-doped PPP is an exception. Perhaps there is a partial ionic binding of Li to PPP. It should be noted that there was no evidence of such a bonding in the valence excitations.

In Fig. 49, C 1s edges of p-type doped conjugated polymers are shown. In all cases, a shoulder or peak below the threshold appears that may be explained by an excitation into empty defect levels in the gap or by a negative chemical shift. From the charge extraction out of the carbon atoms, a shift to higher energy should be expected [see Eq. (49) and Section II,D]. However, since the extracted charge per counterion is distributed over several carbon atoms in the polymer chain while the charge on the counterions is concentrated on one site, the Madelung term may

FIG. 48. Carbon 1s edges of n-type doped conjugated polymers such as Na+- and K+-doped polyacetylene (PA) and Li+-, Na+-, K+-, Rb+-, and Cs+-doped polyparaphenylene (PPP). Horizontal axis: energy (eV).

overcompensate the first term in Eq. (49). The carbon atoms close to the counterions may then cause spectral weight at lower energy. Finally, we note that for PANI, the peak at 287 eV can be assigned to a 1s-π* transition of those carbon atoms that are bonded to an N atom.

It is also interesting to investigate the core-level excitations on the counterions in the conducting polymers. As an example, in Fig. 50 we show core edges of Na+ in PPP. For comparison, the same edges are shown for Na metal. Since charge is transferred from the Na atom to the polymer, the core levels are shifted to lower energies and, therefore, the threshold is shifted to higher energies. This is observed in both the Na 1s and Na 2p excitations. In NaOH, the Na 2p peak appears at 33.3 eV (Kunz, 1966), which is 0.5 eV above that of PPP(Na+) shown in Fig. 50. This indicates a slightly weaker oxidation of Na in PPP than in NaOH. Since in both spectra of Na+ in PPP there are no pronounced thresholds, it is believed that most of the 3s and 3p states are well above the Fermi level, which indicates a rather complete charge transfer to the polymer. In Fig. 48 the 2p edges of K+ in PA and in PPP can be recognized. The difference in spectral shape is not understood at present. K in PA shows a shape very similar to that of K metal, while K in PPP shows a shape that is closer to that of oxidized K.

FIG. 49. Carbon 1s edges of p-type doped polymers such as polypyrrole (PPY), polythiophene (PT), and polyaniline (PANI), each doped with ClO₄⁻. Horizontal axis: energy (eV).

FIG. 50. Sodium 1s and 2p edges of Na metal and of Na as counterion in polyparaphenylene (PPP). Horizontal axis: energy (eV).


In summary, important information on conducting polymers can be obtained from core-level spectroscopy by using the spectra as fingerprints, similar to the use of XPS in chemical analysis. It should be emphasized that the energy resolution in present-day EELS spectra is better by a factor of about 3-10 than in conventional XPS and, therefore, smaller shifts can be detected. On the other hand, it should not be forgotten that in EELS, changes of both the core level and the conduction bands have to be considered. More theoretical calculations on core-level excitations in these systems are highly desirable, since only then will detailed information on the electronic structure be obtained.
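As a rough illustration of what using the spectra as fingerprints means in practice, the following Python sketch matches a measured C 1s-π* peak position against the resonance energies quoted earlier in this section (285.2 eV for benzene and PPP, 286.6 eV for PMI, 286.8 eV for PAN). The measured value of 286.55 eV, the matching windows, and the function name `candidates` are hypothetical and chosen only for illustration; the window of 0.2 eV corresponds to the EELS energy resolution mentioned in this chapter, and 1.0 eV stands for a resolution five times coarser, within the factor of 3-10 quoted above.

```python
# C 1s-pi* resonance energies (eV) quoted in the text for undoped polymers.
REFERENCE_PEAKS_EV = {
    "PAN (nitrile carbon)": 286.8,
    "PMI (carbon bonded to N)": 286.6,
    "PPP (ring carbon, benzene-like)": 285.2,
}


def candidates(measured_ev, window_ev, references=REFERENCE_PEAKS_EV):
    """Return the labels whose reference energy lies within +/- window_ev
    of the measured peak position, sorted alphabetically."""
    return sorted(
        label
        for label, reference_ev in references.items()
        if abs(measured_ev - reference_ev) <= window_ev
    )


if __name__ == "__main__":
    measured = 286.55  # hypothetical measured peak position in eV
    print("window 0.2 eV:", candidates(measured, 0.2))
    print("window 1.0 eV:", candidates(measured, 1.0))
```

With the narrow window only the PMI reference matches, while with the coarse window the PMI and PAN references can no longer be told apart. This is only a toy lookup; as the text stresses, a real assignment also has to account for changes in the conduction bands.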

IX. SUPERCONDUCTORS

The recent discovery of high-T_c superconductivity in certain metallic ceramics (Bednorz and Müller, 1986) has strongly revived interest in the field of superconductivity. Generally, any microscopic understanding of superconductivity requires detailed knowledge of the normal-conducting properties, of which electronic properties are central. Within the conventional Eliashberg theory, which is believed to describe essentially all known “conventional” superconductors (e.g., Pb, Nb, TiN, Nb₃Sn, ...), the electronic properties enter through the “Eliashberg function” α²F(ω), which is the main ingredient in this theory. It is defined by

$$\alpha^2 F(\omega) = \frac{1}{N(E_F)} \sum_{k,k'} |g_{kk'}|^2 \, \delta(E_k - E_F)\, \delta(E_{k'} - E_F)\, \delta(\omega - \omega_{k-k'}) \qquad (55)$$

Here E_k is the energy of an electron of momentum k, ω_{k-k'} a phonon frequency belonging to momentum k - k', g_{kk'} the electron-phonon coupling constant, and N(E_F) the electronic DOS at E_F (Scalapino, 1969). For simplicity, band and branch indices have been omitted. Equation (55) represents a Fermi-surface average over electron-phonon coupling. A crucial point for the validity of Eliashberg’s theory is the applicability of Migdal’s theorem, which implies a slowly varying electronic DOS around E_F on the scale of typical phonon energies ħω_Debye. Thus, spectroscopic investigation of electronic states around E_F may not only serve as an experimental check of the electronic data in α²F(ω), but also tell us whether N(E) is indeed slowly varying around E_F, and thus whether Eliashberg’s theory is applicable at all. For the conventional refractory compound superconductors and the A15 superconductors, this will be illustrated below. Similar studies on the organic superconductor β-(BEDT-TTF)₂I₃ have been performed recently by Nücker et al. (1986). Within the context of unconventional superconductors (heavy fermions, new high-T_c ceramics, etc.), spectroscopic data may still provide valuable help


for discriminating between different coupling mechanisms. For example, if instead of phonons, low-lying excitations of electronic origin (excitons, plasmons) play the role of virtually exchanged bosons, those excitations should also be seen in the real spectral DOS in high-resolution energy-loss spectroscopy. Moreover, these superconductors are expected to be highly correlated systems, which implies that the local density-functional theory may not provide a good description of the electronic structure. This breakdown can be recognized by electron spectroscopy studies and, furthermore, important parameters can be determined at least roughly to guide theory in the choice of a model.

A. Transition Metal Carbides and Nitrides

First, we review investigations of the electronic structure of 3d and 4d transition metal carbides and nitrides (Pflüger et al., 1982; Pflüger et al., 1984; Pflüger et al., 1985a; Pflüger et al., 1985b). Some of these refractory materials have a rather high T_c on a pre-1986 scale (e.g., for NbN, T_c = 16 K). The band structure in the vicinity of the Fermi level is composed of bonding nonmetal p orbitals and of antibonding metal d orbitals. In the carbides, there is strong hybridization of the p states with the d states, which decreases as we approach the nitrides and oxides. The position of E_F is determined by the number of available valence electrons. In the case of TiC, E_F is situated in the very minimum between the p and the d states, while for TiN, E_F is shifted upward into the d bands and N(E_F) increases. In Fig. 51, we show the loss

FIG. 51. Loss functions and dielectric functions of TiC and TiN.


function and the dielectric functions of TiC and TiN, as derived from the loss spectra. In Fig. 52 we show a comparison of the reflectivity derived from EELS measurements for TiC with optical data obtained using synchrotron radiation (Lynch et al., 1980). In the lower part, the loss function calculated from the optical data is compared with the measured loss function. The overall agreement between the results obtained by the two techniques is rather good. In Fig. 53, the optical joint density-of-states for TiC and TiN as derived from the loss spectra is compared with calculations of the joint density-of-states using the self-consistent Gaussian-LCAO method of Applebaum and Hamann (1978). Matrix elements and energy-dependent life-time effects were not taken into account. In both compounds, there is, besides a Drude part, a strong interband transition at about 5-7 eV from the bonding p bands into the

FIG. 52. Comparison of EELS data (solid line) and optical data (dashed line) for TiC (after Pflüger et al., 1984). Upper part: measured reflectivity (after Lynch et al., 1980) and reflectivity derived from EELS data. Lower part: measured loss function and calculated loss function from reflectivity data. Horizontal axis: energy (eV).


FIG. 53. Optical joint density-of-states (solid line) of TiC and TiN as derived by a Kramers-Kronig analysis of the loss functions. For comparison, the calculated joint density-of-states is shown (dashed line). Note that in the calculations, matrix elements and energy-dependent life-time effects are not included.

antibonding d bands. The oscillator causes a zero-crossing of ε₁ below the oscillator energy (see also Section II,B and Fig. 2b). In TiC, E_F is in the very minimum between the p and d bands and there are strong interband transitions also at lower energy, which strongly damp the plasmon. For TiN, where E_F is within the d bands, intraband transitions between pure d states localized at the metal atoms are dipole-forbidden, and ε₂ and the optical joint density-of-states have a minimum close to the energy at which ε₁ has a zero-crossing. Therefore, the interband plasmon is highly pronounced. A similar highly pronounced plasmon was also observed in ZrN, while in VC, VN, NbC, and NbN it is more or less broadened due to intraband transitions. Structures in the loss function at higher energies can be attributed to transitions from non-metal 2s states into the unoccupied d states. For the carbides and nitrides they appear at 12 eV and 19 eV, respectively. The volume plasmons of all valence electrons are close to the expected energies of the free-electron plasmons. In Fig. 54, the non-metal 1s absorption edges of TiC, TiN, VC, and VN are shown, the binding energies of the 1s level being 281.5, 397.0, 282.6, and

FIG. 54. Nonmetal 1s core-level edges of TiC, TiN, VC, and VN (solid lines). For comparison, the total unoccupied DOS (dashed line) and the calculated transition probability (dotted line) are shown. In the calculations an energy-dependent life-time broadening is included.

396.8 eV, respectively. For comparison, we also show in Fig. 54 the broadened total DOS calculated by a self-consistent LCAO program. For VN, a single-particle calculation of the broadened transition probability using a cluster program based on the multiscattering formalism is also shown (Winter, 1984). The measured peak positions are well reproduced by the DOS calculation. This indicates that the transitions from the non-metal 1s states into the rather delocalized 2p states can be well described by single-particle calculations, and that in this case interaction with the core-hole is rather weak. The situation is completely different in the case of metal 2p excitations into the 3d states, where due to the large overlap between the 3d states and the 2p states there is strong interaction with the core-hole. In that case the edges can no longer be explained by a simple DOS calculation (Pflüger, 1983). In the non-metal 1s edges, the first two peaks can be assigned to metal 3d-t₂g and 3d-e_g states hybridized with non-metal 2p states. Thus, the crystal-field splitting of the 3d states can be


clearly recognized in all four compounds. The strong peak about 12 eV above the Fermi level was ascribed essentially to a transition into a non-metal 2p state. These core-level measurements indicate that the non-superconducting materials TiC and VC have a lower DOS at E_F compared to that of the superconducting materials VN and TiN. With an energy resolution of 0.2 eV, no strongly varying DOS close to E_F could be detected. In general, alongside investigations with other spectroscopic techniques, the above EELS investigations have served as a check of our present understanding of the electronic structure of these materials.

B. A-15 Compounds

Up to 1986 the A15 compounds were the class of materials with the highest superconducting transition temperatures. Moreover, these alloys are also characterized by other unusual physical properties, such as phonon anomalies, temperature-dependent Knight shifts, and others. This has been attributed to a high and rapidly varying DOS with peaks having a width of about 50 meV close to E_F. Band structure calculations indeed predict such features in the DOS. A direct spectroscopic verification of the existence of these peaks has been achieved by core-level spectroscopy using EELS (Müller-Heinzerling et al., 1985a). In Fig. 55, we show the Nb 3d₅/₂ edges of Nb₃Sn, Nb₃Ge, and Nb₃Al, which appear at about 202.5 eV. Spectra on near-stoichiometric, high-T_c, and on off-stoichiometric, low-T_c samples are shown. For comparison, the partial p-symmetry DOS and the unoccupied part broadened by the energy resolution (0.2 eV) obtained from a tight-binding fit to self-consistent augmented-plane wave calculations (Mattheiss and Weber, 1982) are also shown. It is believed that the measured edge is close to the unoccupied DOS because the s and p final states in this case have only a small overlap with the core-hole. Indeed, the measured high-T_c edges are close to the calculated ones, indicating the existence of a peaked DOS close to E_F. Moreover, the high-T_c samples exhibit much more pronounced peaks than the corresponding low-T_c samples. This is in agreement with specific-heat data. Finally, there is a clear trend of increasing peak heights from Nb₃Sn to Nb₃Ge and, in particular, to Nb₃Al, which originates from the shift of E_F from the top to the bottom part of the high-DOS region, leaving more and more of the high-DOS region unoccupied and therefore contributing to the near-edge peak.
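To see why an energy resolution of 0.2 eV matters when the DOS is expected to contain peaks only about 50 meV wide, the following Python sketch broadens a toy unoccupied DOS with a Gaussian of 0.2 eV full width at half maximum. The model DOS (a flat background with one Lorentzian peak just above E_F, cut off below E_F), the peak height, the energy grid, and the choice of a Gaussian resolution function are all assumptions made for illustration; they are neither the calculated DOS of Mattheiss and Weber (1982) nor the actual instrument response.

```python
import math

# Conversion from full width at half maximum to Gaussian standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel(step, fwhm):
    """Discrete Gaussian of the given FWHM on a grid of spacing `step`,
    truncated at 4 sigma and normalized to unit sum."""
    sigma = fwhm * FWHM_TO_SIGMA
    half = int(math.ceil(4.0 * sigma / step))
    weights = [math.exp(-0.5 * (i * step / sigma) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def broaden(values, step, fwhm):
    """Convolve `values` with the Gaussian kernel.
    Points outside the grid are treated as zero."""
    kernel = gaussian_kernel(step, fwhm)
    half = len(kernel) // 2
    n = len(values)
    result = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < n:
                acc += w * values[k]
        result.append(acc)
    return result


def toy_unoccupied_dos(energy, center=0.05, fwhm=0.05, height=5.0,
                       background=1.0):
    """Hypothetical DOS (arbitrary units): zero below E_F = 0, and above E_F
    a flat background plus a 50 meV wide Lorentzian peak."""
    if energy < 0.0:
        return 0.0
    gamma = 0.5 * fwhm
    return background + height * gamma ** 2 / ((energy - center) ** 2 + gamma ** 2)


STEP = 0.005  # grid spacing in eV
energies = [-1.0 + i * STEP for i in range(601)]  # -1 eV to 2 eV
raw = [toy_unoccupied_dos(e) for e in energies]
smeared = broaden(raw, STEP, 0.2)

if __name__ == "__main__":
    print("peak height before broadening:", max(raw))
    print("peak height after broadening: ", max(smeared))
```

In this toy model the narrow peak survives the broadening only as a much lower and wider near-edge maximum, which is the qualitative reason for comparing the measured edges with the broadened rather than the unbroadened calculated DOS.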
These core-level measurements have directly revealed the existence of the high and strongly peaked DOS at E_F in the A15 compounds, which is responsible for many anomalies, in particular for the high superconducting transition temperatures. Further information on the electronic structure has been obtained from valence band excitations, producing dielectric functions

FIG. 55. Nb 3d₅/₂ core-level edges of Nb₃Sn, Nb₃Ge, and Nb₃Al on off-stoichiometric, low-T_c and stoichiometric, high-T_c samples. The thin line through the data is a guide to the eye. All curves are normalized to the same height at 1 eV above E_F. For comparison, the calculated broadened unoccupied density-of-states and the unbroadened density-of-states for the three compounds are shown (after Müller-Heinzerling et al., 1985a). Horizontal axis: energy above E_F (eV).

over a broad energy range that can be compared with band-structure calculations (Müller-Heinzerling et al., 1985b; Müller-Heinzerling, 1985).

C. Ceramic Superconductors

The last chapter of this section on superconductors is devoted to new ceramic systems that possess superior superconducting properties such as high in the 30-95 K range, high critical fields, and high critical currents. Since this is presently an extremely rapidly developing field, the presentation of EELS and other electron spectroscopy results can reflect only a current viewpoint whose correctness remains to be proven. At present, it is still far from clear what type of interaction leads to the high superconducting transition temperatures. Many theoretical models have been proposed so far based on a pairing by phonons, excitons, plasmons, or on electron-electron Coulomb interaction (Rice, 1987; Fulde, 1988). In particular, the problem of providing the relevant parameters for the latter interaction poses a challenge to theorists and experimentalists. At present (October 1987), there are basically two systems under consideration, La,-,M,CuO, with M = Sr, Ca,Ba . .. and MBazCu,O,-,


with M = Y and most of the rare earths. It is generally believed that in these compounds, two-dimensional Cu-O planes and, in YBa2Cu3O7-y, one-dimensional Cu-O chains as well, form the electronic states close to the Fermi level. These low-dimensional structures are therefore important for the transport properties of these systems. For La2CuO4, band-structure calculations predict a splitting of the Cu 3d e_g states due to a non-cubic crystal field, and therefore the antibonding Cu 3d_{x^2-y^2} states hybridize with the O 2p states, forming a half-filled wide band (Mattheiss, 1987; Yu et al., 1987). The Fermi surface of this half-filled two-dimensional band has pronounced nesting properties, and therefore a charge-density wave or a spin-density wave may occur, depending on the size of the electron-phonon or the electron-electron Coulomb interaction. Electron spectroscopy studies may help to clarify whether the single-particle calculations are still valid, or whether the above mechanisms lead to a breakdown of these calculations. Preliminary information on the electronic structure can be obtained from the loss function of these materials (Nücker et al., 1987a) shown in Fig. 56. The free-electron plasmon of all valence electrons should appear in both compounds close to 26 eV, in excellent agreement with the experimental result of 25.5 eV for YBa2Cu3O7. In La2CuO4, the plasmon is shifted to 29.5 eV due to a very strong oscillator at 13.7 eV (see Section II,B and Fig. 2). According to the XPS and BIS spectra for these materials (Nücker et al.,

FIG. 56. Loss spectra of La2-xSrxCuO4 and YBa2Cu3O7 (horizontal axis: energy in eV, 0 to 50).

1987b), this oscillator should be assigned to a transition from the occupied La 5d states to the narrow unoccupied La 4f band. There is an additional shoulder near 21.5 eV that can probably be attributed to a Sr 4p excitation. In YBa2Cu3O7, shoulders at about 17, 19, and 29 eV can be assigned to Ba 5p3/2, Ba 5p1/2, and Y 4p core-level excitations. The origin of the shoulders at 13.6 and 35 eV is not clear at present. Unfortunately, the samples available at present, which were prepared by cutting with an ultra-microtome (see Section IV), are small and have numerous pin-holes. Therefore, strong contributions from the direct beam produce a large background at low energies. Thus, the data below about 2 eV are still subject to some uncertainty, and a Kramers-Kronig transformation could not be performed. However, momentum-dependent data in this energy range would be extremely interesting and could decide whether there exist low-energy excitations that may lead to a pairing of electrons. Up to now, no pronounced plasmon could be detected that would be expected from optical data on these materials. In Fig. 57, the O 1s edges of pure La2CuO4 and La1.85Sr0.15CuO4 are shown (Nücker et al., 1987b, 1987c, 1988). The binding energy of the O 1s level as determined by XPS is indicated by the dashed line. If we interpret the edges as the local DOS of the unoccupied states at the oxygen atoms, this line will denote the Fermi level. In the Sr-doped system, there is a peak close to E_F, the intensity of which increases roughly in proportion to x. Beyond 5 eV above E_F, strong spectral density is observed due to O 2p states hybridized with La 5d and 4f states. This is in agreement with band-structure calculations within the local density-functional approximation using the LMTO-ASA method (Temmerman et al.,

FIG. 57. Oxygen 1s edges of La2-xSrxCuO4 for x = 0 and 0.15 (horizontal axis: energy in eV).

1987). It is very interesting to note that in pure La2CuO4, there is within error bars no DOS at E_F, although the band-structure calculation and calculations that include the matrix elements predict almost the same local O DOS or the same spectral weight near E_F, respectively, for the pure and for the doped material. Moreover, a peak or a shoulder at about 2 eV above E_F is not predicted by the calculations. This indicates clearly a breakdown of the local density-functional band-structure calculations for these materials. Taking into account the energy resolution of 0.4 eV chosen for recording these spectra, a gap of about 1 eV can be derived for La2CuO4. An interpretation of the gap in terms of a Peierls-Fröhlich model due to electron-phonon coupling can be excluded because no charge-density wave, i.e., no distortion of the Cu-O distances in the planes, has been observed experimentally. On the other hand, the finding of an antiferromagnetic transition in La2CuO4 with a Néel temperature of T_N = 230 K strongly suggests that the observed gap is due to strong correlations of d electrons on the Cu sites, i.e., a Mott-Hubbard gap. The next important question is: How strong are the correlations? A rather large U would lead to a 3d count close to 9. This would then lead to a closed O 2p band between the lower and the upper Hubbard band, and thus an insulating state with a gap would be formed (Emery, 1987). The peak at the Fermi level in La1.85Sr0.15CuO4 could then be explained as holes on the oxygen sites. This peak has never been found in photoelectron spectra, where the Cu 3d states have a much larger cross-section and the O 2p states have a smaller one. Thus we conclude that the holes created upon doping have predominantly O 2p character. The hole-like character of the charge carriers was also derived from Hall-effect studies (Hundley et al., 1987; Ong et al., 1987) and by optical spectroscopy (Geserich et al., 1987).
The indication of a large correlation among d electrons is supported by XPS measurements on the density-of-states of the valence electrons (Fujimori et al., 1987; Fuggle et al., 1988), where a peaking of the spectra dominated by Cu 3d states should occur at 2 eV below E_F, while in the experiment it is observed at about 4 eV. Furthermore, strong correlations would prevent a Cu 3d8 (Cu3+) configuration, in agreement with the experimental results of XPS studies on Cu 2p core-level excitations (Nücker et al., 1987b). Finally, from Auger electron spectroscopy it is clear that U for the Cu 3d electrons is not less than 4-5 eV (Fuggle et al., 1988). The situation in the YBa2Cu3O7-y system is very similar, as shown in Fig. 58. There may be at present some ambiguities on the threshold in YBa2Cu3O7-y, since it is not clear whether the O 1s binding energies for the four O sites have the same value. The density-functional band-structure calculations again predict a similar density-of-states at E_F for y = 0 (superconducting, T_c = 92 K) and for y = 1 (insulating), while in the experiment, for y = 1, at which the one-dimensional Cu-O chains are destroyed, the DOS is

FIG. 58. Oxygen 1s edges of YBa2Cu3O7-y for y about 0.2, 0.3, 0.5, and 0.8 (horizontal axis: energy in eV).

close to zero. This indicates again a gap due to the strong correlations among d electrons. For lower y, the DOS at E_F increases and the peak at 3 eV, which appears in the band-structure calculations at 2.5 eV above E_F, disappears. It is interesting to note again that the states at E_F are not seen by those spectroscopies in which the Cu 3d cross-section is high (Fujimori et al., 1987; Bianconi et al., 1987). As with La2-xSrxCuO4, we conclude that the states near E_F and also the holes have a predominantly O 2p character. The recent studies by electron spectroscopy have revealed many details of the very complicated electronic structure of copper-oxide-based superconductors. While important information on the parameters U and Δ (the charge-transfer energy for Cu-ligand hopping) could be derived from UPS, XPS, and Auger electron spectroscopy, the O 1s core-level excitation by EELS directly shows the breakdown of single-particle band-structure calculations in the insulating materials and possibly also in the superconducting systems. It should also be noted that this spectroscopy is at present the only method capable of detecting pronounced changes of the electronic states close to E_F upon variation of x and y. The DOS of the holes on O sites, which are created upon doping or upon changing the stoichiometry, could be measured directly by this local probe. Finally, it should be emphasized that, contrary to almost all the other electron spectroscopies, EELS is not a surface-sensitive method and thus avoids all the ambiguities entailed in surface preparation, surface contamination, and oxygen diffusion.


ACKNOWLEDGEMENTS

The author gratefully acknowledges the initiation of the construction of the electron energy-loss spectrometer by W. Schmatz and M. Campagna. My colleagues G. Crecelius, W. Czerwinski, H. Fark, A. vom Felde, R. Hott, P. Johnen, A. Litzelmann, R. Manzke, Th. Müller-Heinzerling, N. Nücker, J. Pflüger, B. Scheerer, and J. Sprosser have contributed significantly to the work presented in this review. The author is grateful to D. Baeriswyl, R. von Baltz, H.-J. Freund, J. Fuggle, J. Heinze, D. Kaletta, P. Koidl, H. Kuzmany, W. Jager, G. Leising, H. Lindenberger, N. Neugebauer, H. Rietschel, J. J. Ritsko, S. Roth, G. A. Sawatzky, M. Stamm, D. Schweitzer, K. Sturm, H. Trinkaus, W. Weber, W. Wernet, G. Wegner, J. Zaanen, and R. Zeller for fruitful collaboration and stimulating discussions.

REFERENCES

Abell, G. C. and Attalla, A. (1987). Phys. Rev. Lett. 59, 995.
Adler, S. L. (1962). Phys. Rev. 126, 413.
Aers, A. C., Paranjape, B. V., and Boardman, A. D. (1979). J. Phys. Chem. 40, 319.
Andre, J.-M. and Leroy, G. (1971). Int. J. Quantum Chem. 5, 557.
Appelbaum, J. A. and Hamann, D. R. (1978). In “Proceedings of the International Conference on the Physics of Transition Metals, Toronto, 1977” (M. J. G. Lee, J. M. Perz, and E. Fawcett, eds.), p. 111. IOP, London.
Ashkenazi, J., Ehrenfreund, E., Vardeny, Z., and Brafman, O. (1985). Mol. Cryst. Liq. Cryst. 117, 198.
Ashley, J. C. and Ferrell, T. L. (1976). Phys. Rev. B 14, 3277.
Baeriswyl, D. (1985). In “Theoretical Aspects of Band Structures and Electronic Properties of Pseudo-One-Dimensional Solids” (H. Kamimura, ed.), p. 1. Reidel, Dordrecht.
Baeriswyl, D. (1987). In “Electronic Properties of Polymers and Related Compounds, Kirchberg II” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 76, p. 198. Springer-Verlag, Heidelberg.
Baeriswyl, D. and Maki, K. (1985). Phys. Rev. B 31, 6633.
Baeriswyl, D., Harbeke, G., Kiess, H., Maier, E., and Meyer, W. (1983). Physica (Utrecht) 117B, 617.
Baeriswyl, D., Camelo, J., and Luther, A. (1986). Phys. Rev. B 33, 7247.
Ballu, Y. (1980). Adv. Electr. Electron Phys. Suppl. 138, 257.
Batson, P. E. and Silcox, J. (1983). Phys. Rev. B 27, 5224.
Bednorz, J. G. and Müller, K. A. (1986). Z. Phys. B 64, 189.
Bianconi, A., Congiu Castellano, A., De Santis, M., Rudolf, P., Lagarde, P., and Flank, A. M. (1987). Solid State Commun. 63, 1009.
Blanchet, G. B., Fincher, C. R., and Heeger, A. J. (1983). Phys. Rev. Lett. 51, 2132.
Boersch, H. (1954). Z. Phys. 139, 115.
Boersch, H. and Miessner, H. (1962). Z. Phys. 168, 298.
Brazovskii, S. A. (1978). JETP Lett. 28, 606.
Brazovskii, S. A. (1980). Sov. Phys. JETP 51, 342.
Brazovskii, S. A. and Kirova, N. N. (1981). JETP Lett. 33, 4.
Bredas, J. L. and Street, G. B. (1985). J. Phys. C 18, L651.
Bredas, J. L., Chance, R. R., and Silbey, R. (1982). Phys. Rev. B 26, 5843.


Bredas, J. L., Themans, B., Fripiat, J. G., Andre, J. M., and Chance, R. R. (1984). Phys. Rev. B 29, 6761.
Brosens, F., Lemmens, L. F., and Devreese, J. T. (1976). Phys. Status Solidi B 74, 45.
Brosens, F., Lemmens, L. F., and Devreese, J. T. (1977). Phys. Status Solidi B 80, 99.
Bubenzer, A., Dischler, B., Brandt, G., and Koidl, P. (1983). J. Appl. Phys. 54, 4590.
Burkel, E., Peisl, J., and Dorner, E. (1987). Europhys. Lett. 3, 957.
Chiang, C. K., Fincher, C. R., Park, Y. W., Heeger, A. J., Shirakawa, H., Louis, E. J., Gau, S. C., and MacDiarmid, A. G. (1977). Phys. Rev. Lett. 39, 1098.
Chien, J. C. W., Karasz, F. E., and Shimamura, K. (1982). Macromol. Chem., Rapid Comm. 3, 655.
Chin, P. H., Wertheim, G. K., and Schlüter, M. (1979). Phys. Rev. B 20, 3067.
Colliex, C. (1984). In “Advances in Optical and Electron Microscopy” (V. E. Cosslett and R. Barr, eds.), Vol. 9, p. 65. Academic Press, London.
Colliex, C. and Mory, C. (1984). In “Quantitative Electron Microscopy” (J. N. Chapman and A. J. Craven, eds.), p. 149. SUSSP Publications, Edinburgh.
Colliex, C., Manoubi, T., Gasguier, M., and Brown, L. M. (1985). In “Scanning Electron Microscopy” (Om Johari, ed.), Part 2, p. 489. SEM Inc., A. M. F. O’Hare, Illinois.
Crecelius, G. (1986). In “Handbook of Conducting Polymers” (T. A. Skotheim, ed.), Vol. 2, p. 1233. Marcel Dekker, New York.
Crecelius, G., Stamm, M., Fink, J., and Ritsko, J. J. (1983a). Phys. Rev. Lett. 50, 1498.
Crecelius, G., Fink, J., Ritsko, J. J., Stamm, M., Freund, H.-J., and Gonska, H. (1983b). Phys. Rev. B 28, 1802.
Dabrowski, B. (1986). Phys. Rev. B 34, 4989.
Daniels, J., von Festenberg, C., Raether, H., and Zeppenfeld, K. (1970). In “Springer Tracts in Modern Physics,” Vol. 54, p. 77. Springer-Verlag, Berlin.
Dehmer, J. L. (1975). Phys. Rev. Lett. 35, 213.
Devreese, J. T., Brosens, F., and Lemmens, L. F. (1979). Phys. Status Solidi B 91, 349.
Donnelly, S. E. (1985). Radiat. Eff. 90, 1.
Drechsler, S. L. and Bobeth, M. (1985). Phys. Status Solidi B 131, 267.
Drechsler, S. L., Heiner, E., and Osipov, V. A. (1986). Solid State Commun. 60, 415.
Durham, P. J., Pendry, J. B., and Hodges, C. H. (1981). Solid State Commun. 38, 159.
Egerton, R. F. (1986). “Electron Energy-Loss Spectroscopy in the Electron Microscope.” Plenum Press, New York.
Ehrenreich, H. and Cohen, M. H. (1959). Phys. Rev. 115, 786.
Emery, V. J. (1987). Phys. Rev. Lett. 58, 2794.
Etemad, S., Mitani, T., Ozaki, M., Chung, T. C., Heeger, A. J., and MacDiarmid, A. G. (1981). Solid State Commun. 40, 75.
Etemad, S., Heeger, A. J., and MacDiarmid, A. G. (1982). Ann. Rev. Phys. Chem. 33, 443.
Fark, H., Fink, J., Scheerer, B., Stamm, M., and Tieke, B. (1987). Synth. Metals 17, 583.
Fesser, K., Bishop, A. R., and Campbell, D. K. (1983). Phys. Rev. B 27, 4804.
Fincher, C. R., Chen, C.-E., Heeger, A. J., MacDiarmid, A. G., and Hastings, J. B. (1982). Phys. Rev. Lett. 48, 100.
Fink, J. (1985a). Z. Phys. 61, 468.
Fink, J. (1985b). In “Festkörperprobleme (Advances in Solid State Physics)” (P. Grosse, ed.), Vol. XXV, p. 157. Vieweg, Braunschweig.
Fink, J. (1987). Synth. Metals 21, 87.
Fink, J. and Kisker, E. (1980). Rev. Sci. Instr. 51, 918.
Fink, J. and Leising, G. (1986). Phys. Rev. B 34, 5320.
Fink, J. and Leising, G. (1987a) (unpublished results).
Fink, J. and Jager, W. (1987b) (unpublished results).
Fink, J., Müller-Heinzerling, Th., Pflüger, J., Bubenzer, A., Koidl, P., and Crecelius, G. (1983a). Solid State Commun. 47, 687.


Fink, J., Crecelius, G., Ritsko, J. J., Stamm, M., Freund, H.-J., and Gonska, H. (1983b). J. Phys. (Paris) 44, C3-741.
Fink, J., Ritsko, J. J., and Crecelius, G. (1983c). J. Phys. (Paris) 44, C3-683.
Fink, J., Müller-Heinzerling, Th., Pflüger, J., Scheerer, B., Dischler, B., Koidl, P., Bubenzer, A., and Sah, R. E. (1984a). Phys. Rev. B 30, 4713.
Fink, J., Scheerer, B., Stamm, M., Tieke, B., Kanellakopulos, B., and Dornberger, E. (1984b). Phys. Rev. B 30, 4867.
Fink, J., Müller-Heinzerling, Th., Scheerer, B., Speier, W., Hillebrecht, F. U., Fuggle, J. C., Zaanen, J., and Sawatzky, G. A. (1985a). Phys. Rev. B 32, 4899.
Fink, J., Scheerer, B., Stamm, M., and Tieke, B. (1985b). Mol. Cryst. Liq. Cryst. 118, 287.
Fink, J., Scheerer, B., Wernet, W., Monkenbusch, M., Wegner, G., Freund, H.-J., and Gonska, H. (1986). Phys. Rev. B 34, 1101.
Fink, J., Nücker, N., Scheerer, B., and Neugebauer, H. (1987a). Synth. Metals 18, 163.
Fink, J., Nücker, N., Scheerer, B., vom Felde, A., Lindenberger, H., and Roth, S. (1987b). In “Electronic Properties of Polymers and Related Compounds, Kirchberg II” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 76, p. 79. Springer-Verlag, Heidelberg.
Fink, J., Nücker, N., Sah, R. E., Koidl, P., Baumann, H., and Bethge, K. (1987c). Proc. E-MRS Meeting, Strassbourg 1977, Vol. XVII, p. 475. Les Editions de Physique, Paris.
Fink, J., Nücker, N., Scheerer, B., Czerwinski, W., Litzelmann, A., and vom Felde, A. (1987d). In “Electronic Properties of Polymers and Related Compounds, Kirchberg II” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 76, p. 70. Springer-Verlag, Heidelberg.
Fink, J., Fark, H., Nücker, N., Scheerer, B., Leising, G., and Weizenhofer, R. (1987e). Synth. Metals 17, 377.
Fink, J., Scheerer, B., Wernet, W., Monkenbusch, M., Wegner, G., Freund, H.-J., and Gonska, H. (1987f). Synth. Metals 18, 71.
Fink, J., Nücker, N., Scheerer, B., vom Felde, A., and Leising, G. (1987g). In “Electronic Properties of Polymers and Related Compounds, Kirchberg II” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 76, p. 84. Springer-Verlag, Heidelberg.
Foo, E-Ni and Hopfield, J. J. (1968). Phys. Rev. 173, 635.
Fuggle, J. C., Hillebrecht, F. U., Esteva, J.-M., Kartantak, R. C., Gunnarsson, O., and Schonhammer, K. (1983). Phys. Rev. B 27, 4637.
Fuggle, J. C., Weijs, P. J. W., Schoorl, R., Sawatzky, G. A., Fink, J., Nücker, N., Durham, P. J., and Temmerman, W. M. (1988). Phys. Rev. B 37, 123.
Fujimori, A., Takayama-Muromachi, E., Uchida, Y., and Okai, B. (1987). Phys. Rev. B 35, 8814.
Fulde, P. (1988). Physica Scripta T23, 101.
Garnett, J. C. M. (1904). Philos. Trans. R. Soc. London 203, 385.
Geserich, H. P., Schreiber, G., and Renker, B. (1987). Solid State Commun. 63, 657.
Gibbons, P. C., Ritsko, J. J., and Schnatterly, S. E. (1975). Rev. Sci. Instrum. 46, 1546.
Gibbons, P. C., Schnatterly, S. E., Ritsko, J. J., and Fields, J. R. (1976). Phys. Rev. B 13, 2451.
Grant, P. M. and Batra, I. P. (1979). Solid State Commun. 29, 225.
Greenwood, G. W., Foreman, A. J. E., and Rimmer, D. E. (1959). J. Nucl. Matter. 4, 305.
Grunes, L. A. and Ritsko, J. J. (1983). Phys. Rev. B 28, 3439.
Grunes, L. A., Leapman, R. D., Wilker, C. N., Hoffman, R., and Kunz, A. B. (1982). Phys. Rev. B 25, 7157.
Grupp, A., Hofer, P., Kass, H., Mehring, M., Weizenhofer, R., and Wegner, G. (1987). In “Electronic Properties of Polymers and Related Compounds, Kirchberg II” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 76, p. 156. Springer-Verlag, Heidelberg.


Haensel, R., Keitel, G., Kosuch, N., Nielsen, U., and Schreiber, P. (1971). J. Phys. (Paris) 32, C4-236.
Hasegawa, M. (1971). J. Phys. Soc. Jpn. 31, 649.
Hasegawa, M. and Watabe, M. (1969). J. Phys. Soc. Jpn. 27, 1393.
Heeger, A. J., Kivelson, S., Schrieffer, J. R., and Su, W. P. (1987). Preprint.
Henoc, P. and Henry, L. (1970). J. Phys. (Paris) 4, C1.
Hitchcock, A. P., Beaulieu, S., Steel, T., Stohr, J., and Sette, F. (1984). J. Chem. Phys. 80, 3927.
Hitchcock, A. P., Horsley, J. A., and Stohr, J. (1987). J. Chem. Phys. (in press).
Hohberger, H. J., Otto, A., and Petri, E. (1975). Solid State Commun. 16, 175.
Holas, A. and Rahman, S. (1987). Phys. Rev. B 35, 2720.
Horsch, P. (1981). Phys. Rev. B 24, 7351.
Horsley, J. A., Stohr, J., Hitchcock, A. P., Newbury, D. C., Johnson, A. L., and Sette, F. (1987). J. Chem. Phys. (in press).
Hott, R. (1986). Diplomarbeit, Universität Karlsruhe.
Hubbard, J. (1955). Proc. Phys. Soc. London 68, 967.
Hundley, M. F., Zettl, A., Stacy, A., and Cohen, M. L. (1987). Phys. Rev. B 35, 8800.
Ibach, H. and Mills, D. L. (1982). “Electron Energy-Loss Spectroscopy and Surface Vibrations.” Academic Press, New York.
Ichimaru, S. (1982). Rev. Mod. Phys. 54, 1017.
Jager, W., Manzke, R., Trinkaus, H., Zeller, R., Fink, J., and Crecelius, G. (1983). Radiat. Eff. 78, 315.
Jensen, E. and Plummer, E. W. (1985). Phys. Rev. Lett. 55, 1912.
Keil, P. (1966). Z. Naturforsch. Teil A 21, 503.
Kivelson, S. and Heeger, A. J. (1985). Phys. Rev. Lett. 55, 308.
Kloos, T. (1973). Z. Phys. 265, 225.
Kotani, A. and Toyozawa, Y. (1974). J. Phys. Soc. Jpn. 37, 912.
Kovacic, P. and Oziomeck, J. (1966). Macromol. Synth. 2, 23.
Krane, K. J. (1978). J. Phys. F: Metal Phys. 8, 2133.
Kubo, R. (1957). J. Phys. Soc. Jpn. 12, 570.
Kugler, A. A. (1975). J. Stat. Phys. 12, 35.
Kunz, C. (1966). Z. Phys. 196, 311.
Kuyatt, C. E. and Simpson, J. A. (1967). Rev. Sci. Instr. 38, 103.
Leapman, R. D., Grunes, L. A., and Fejes, P. L. (1982). Phys. Rev. B 26, 614.
Leder, L. B. and Suddeth, J. A. (1960). J. Appl. Phys. 31, 1422.
Lee, P. A., Citrin, P. H., Eisenberger, P., and Kincaid, B. M. (1981). Rev. Mod. Phys. 53, 769.
Lefebvre, A. and Deconninck, G. (1986). Nucl. Instr. Meth. B 15, 616.
Leising, G. (1984). Polymer Bulletin 11, 401.
Leising, G., Filzmoser, M., and Kahlert, H. (1987). Synth. Metals 21, 267.
Lindhard, J. (1954). Kgl. Dans. Vidensk. Selsk. Mat. Fys. Medd. 28, 3.
Lindner, Th., Sauer, H., Engel, W., and Kambe, K. (1986). Phys. Rev. B 33, 22.
Lohff, J. (1963). Z. Phys. 171, 442.
Loubeyre, P., Berson, J. M., Pinceaux, J. P., and Hansen, J. P. (1982). Phys. Rev. Lett. 49, 1172.
Lucas, A. A. (1973). Phys. Rev. B 7, 3527.
Lucas, A. A., Vigneron, J. P., Donnelly, S. E., and Rife, J. C. (1983). Phys. Rev. B 28, 2485.
Lynch, D. W., Olson, C. G., Peterman, D. J., and Weaver, J. H. (1980). Phys. Rev. B 22, 3991.
Mahan, G. D. (1981). “Many Particle Physics.” Plenum Press, New York.
Manzke, R., Crecelius, G., Fink, J., and Schollhorn, R. (1981). Solid State Commun. 40, 103.
Manzke, R., Jager, W., Trinkaus, H., Crecelius, G., Zeller, R., and Fink, J. (1982). Solid State Commun. 44, 481.
Manzke, R., Crecelius, G., Jager, W., Trinkaus, H., Zeller, R., and Fink, J. (1983a). Radiat. Eff. 78, 327.


Manzke, R., Crecelius, G., and Fink, J. (1983b). Phys. Rev. Lett. 51, 1095.
Marton, L., Leder, L. B., and Mendlowitz, H. (1955). Adv. Electr. Electron Phys. 7, 183.
Mattheiss, L. F. (1987). Phys. Rev. Lett. 58, 1028.
Mattheiss, L. F. and Weber, W. (1982). Phys. Rev. B 25, 2248.
Mayer, J. W. and Rimini, E. (1971). “Ion Beam Handbook for Material Analysis.” Academic Press, New York.
Mele, E. J. and Ritsko, J. J. (1979). Phys. Rev. Lett. 43, 68.
Mele, E. J. and Rice, M. J. (1981). Phys. Rev. B 23, 5397.
Mermin, N. D. (1970). Phys. Rev. B 1, 2362.
Mintmire, J. W. and White, C. T. (1983). Phys. Rev. B 28, 3283.
Moller, H. and Otto, A. (1981a). Phys. Rev. Lett. 45, 2140.
Moller, H. and Otto, A. (1981b). Phys. Rev. Lett. 46, 1706.
Morar, J. F., Himpsel, F. J., Hollinger, G., Hughes, G., and Jordan, J. J. (1985). Phys. Rev. Lett. 54, 1960.
Müller-Heinzerling, Th. (1985). KfK-Report 3970.
Müller-Heinzerling, Th., Fink, J., and Weber, W. (1985a). Phys. Rev. B 32, 1850.
Müller-Heinzerling, Th., Fink, J., and Weber, W. (1985b). Physica 135B cl, 347.
Naarmann, H. (1987). In “Electronic Properties of Conjugated Polymers (Kirchberg II)” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 76, p. 12. Springer-Verlag, Heidelberg.
Natoli, C. R., Misemer, D. K., Doniach, S., and Kutzler, F. W. (1980). Phys. Rev. A 22, 1104.
Natta, M. (1969). Solid State Commun. 7, 823.
Neumann, C.-S. (1986). Diplomarbeit, Universität Karlsruhe.
Neumann, C.-S. and von Baltz, R. (1987). Phys. Rev. B 35, 9708.
Nozieres, P. and Pines, D. (1958). Phys. Rev. 111, 442.
Nücker, N., Fink, J., Schweitzer, D., and Keller, H. J. (1986). Physica 143B cl, 482.
Nücker, N., Fink, J., and Scheerer, B. (1987a) (submitted).
Nücker, N., Fink, J., Renker, B., Ewert, D., Politis, C., Weijs, P. J. W., and Fuggle, J. C. (1987b). Z. Phys. B 67, 9.
Nücker, N., Fink, J., Renker, B., Ewert, D., Weijs, P. J. W., and Fuggle, J. C. (1987c). Jap. J. Appl. Phys. 26, BJ20.
Nücker, N., Fink, J., Fuggle, J. C., Durham, P. J., and Temmerman, W. M. (1988). Phys. Rev. B 37, 5158.
Ohtaka, K. and Lucas, A. A. (1978). Phys. Rev. B 18, 4643.
Ong, N. P., Wang, Z. Z., Clayhold, J., Tarascon, J. M., Greene, L. H., and McKinnon, W. R. (1987). Phys. Rev. B 35, 8807.
Paasch, G. (1970). Phys. Status Solidi 38, K123.
Pathak, K. N. and Vashishta, P. (1973). Phys. Rev. B 7, 3649.
Person, B. N. J. and Liebsch, A. (1982). Solid State Commun. 44, 1637.
Petri, E. and Otto, A. (1975). Phys. Rev. Lett. 34, 1283.
Pflüger, J. (1983). KfK-Report 3585.
Pflüger, J., Fink, J., Crecelius, G., Bohnen, K. P., and Winter, H. (1982). Solid State Commun. 44, 489.
Pflüger, J., Fink, J., Weber, W., Bohnen, K.-P., and Crecelius, G. (1984). Phys. Rev. B 30, 1155.
Pflüger, J., Fink, J., Weber, W., Bohnen, K.-P., and Crecelius, G. (1985a). Phys. Rev. B 31, 1244.
Pflüger, J., Fink, J., and Schwarz, K. (1985b). Solid State Commun. 55, 675.
Piancastelli, N. N., Lindle, D. W., Ferret, T. A., and Shirley, D. A. (1987). J. Chem. Phys. 86, 2765.
Pines, D. and Bohm, D. (1952). Phys. Rev. 85, 338.
Platzman, P. M. and Wolf, P. A. (1973). Solid State Phys. Suppl. 13.
Raether, H. (1965). In “Springer Tracts in Modern Physics,” Vol. 38, p. 85. Springer-Verlag, Berlin.


Raether, H. (1977). In “Physics of Thin Films,” Vol. 9, p. 145. Academic Press, New York.
Raether, H. (1980). In “Springer Tracts in Modern Physics,” Vol. 88, p. 1. Springer-Verlag, Berlin.
Resta, R. (1977). Phys. Rev. B 16, 2717.
Rice, M. J. (1979). Phys. Lett. 71, 152.
Rice, T. M. (1987). Z. Phys. B 67, 141.
Rife, J. C., Donnelly, S. E., Lucas, A. A., Gilles, J. M., and Ritsko, J. J. (1981). Phys. Rev. Lett. 46, 1220.
Ritchie, R. H. (1957). Phys. Rev. 106, 874.
Ritsko, J. J. (1981). Phys. Rev. Lett. 46, 849.
Ritsko, J. J. (1982). Phys. Rev. B 26, 2192.
Ritsko, J. J., Crecelius, G., and Fink, J. (1983a). Phys. Rev. B 27, 4902.
Ritsko, J. J., Fink, J., and Crecelius, G. (1983b). Solid State Commun. 46, 477.
Ritsko, J. J., Crecelius, G., and Fink, J. (1983c). Phys. Rev. B 27, 2612.
Robertson, J. (1986). Adv. Phys. 35, 317.
Rocca, M., Ibach, H., Lehwald, S., and Raman, T. S. (1986). In “Structure and Dynamics of Surfaces 1,” Topics in Current Physics (W. Schommers and P. von Blankenhagen, eds.), Vol. 41, p. 245. Springer-Verlag, Berlin.
Roth, S. and Bleier, H. (1987). Adv. Phys. 36, 385.
Rudberg, E. (1929). K. Svenska Vet. Akad. Handl. 7, 1.
Ruthemann, G. (1941). Naturwiss. 29, 648.
Ruthemann, G. (1942). Naturwiss. 30, 145.
Scalapino, D. J. (1969). In “Superconductivity” (R. D. Parks, ed.), Vol. 1, p. 449. Marcel Dekker, New York.
Schnatterly, S. E. (1979). Solid State Phys. 34, 275.
Schrieffer, J. R. (1985). In “Proceedings of the International School of Physics «Enrico Fermi», Course LXXXIX” (B. Bassani, F. Fumi, and M. T. Tosi, eds.), p. 300. North Holland, Amsterdam.
Schülke, W., Nagasawa, H., and Mourikis, S. (1984). Phys. Rev. Lett. 52, 2065.
Schülke, W., Nagasawa, N., Mourikis, S., and Lanzki, P. (1986). Phys. Rev. B 33, 6744.
Schülke, W., Bonse, U., Nagasawa, H., Mourikis, S., and Kaprolat, A. (1987). Phys. Rev. Lett. 59, 1361.
Scott, J. C., Pfluger, P., Krounbi, M. T., and Street, G. B. (1983). Phys. Rev. B 28, 2140.
Sette, F., Stohr, J., and Hitchcock, A. P. (1984). J. Chem. Phys. 81, 4806.
Siegbahn, K., Nordling, C., Johansson, G., Hodman, J., Heden, P. F., Hamrin, K., Gelius, U., Bergmark, T., Werme, L. O., Manne, R., and Baer, Y. (1969). In “ESCA Applied to Free Molecules,” p. 104. North-Holland, Amsterdam.
Singwi, K. S., Tosi, M. P., Land, R. H., and Sjolander, A. (1968). Phys. Rev. 176, 589.
Springborg, M. (1986). Phys. Rev. B 33, 8475.
Stohr, J., Sette, F., and Johnson, A. L. (1984). Phys. Rev. Lett. 53, 1684.
Streitwolf, H. W. (1985). Phys. Status Solidi B 127, 11.
Sturm, K. (1978). Solid State Commun. 27, 645.
Sturm, K. (1981). Phys. Rev. Lett. 46, 1706.
Sturm, K. (1982). Adv. Phys. 31, 1.
Sturm, K. (1987). Private communication.
Sturm, K. and Oliveira, L. E. (1981). Phys. Rev. B 24, 3054.
Su, W. P., Schrieffer, J. R., and Heeger, A. J. (1980). Phys. Rev. B 22, 2099.
Sum, U., Fesser, K., and Buttner, H. (1987). J. Phys. C: Solid State Phys. 20, L71.
Suzuki, N., Ozaki, M., Etemad, S., Heeger, A. J., and MacDiarmid, A. G. (1980). Phys. Rev. Lett. 45, 1209.
Taylor, P. R. (1985). Chem. Phys. Lett. 121, 205.


Temmerman, W. M., Stocks, G. M., Durham, P. J., and Sterne, P. A. (1987). J. Phys. F: Met. Phys. 17, L135.
Templier, C., Jaonen, C., Riviere, J.-P., Delafond, J., and Grilhe, J. (1984). C.R. Acad. Sci. 299, 613.
Tessman, J. R., Kahn, A. H., and Shockley, W. (1953). Phys. Rev. 92, 890.
Thole, B. T., Cowan, R. D., Sawatzky, G. A., Fink, J., and Fuggle, J. C. (1985). Phys. Rev. B 31, 6856.
Thomi, L., Traverse, A., and Bernas, H. (1983). Phys. Rev. B 28, 6523.
Tieke, B., Bubeck, C., and Lieser, G. (1982). Makromol. Chem. 3, 261.
Toennies, J. P. (1984). J. Vac. Sci. Technol. A 2, 1055.
Trinkaus, H. (1983). Radiat. Eff. 78, 189.
Van Hove, L. (1954). Phys. Rev. 95, 249.
Vardeny, Z., Ehrenfreund, E., and Brafmann, O. (1985). In “Electronic Properties of Polymers and Related Compounds, Kirchberg I” (H. Kuzmany, M. Mehring, and S. Roth, eds.), Springer Series in Solid State Science, Vol. 63, p. 91. Springer-Verlag, Heidelberg.
Vashishta, P. and Singwi, K. S. (1972). Phys. Rev. B 6, 875.
Vidal, D. and Lallemand, M. (1976). J. Chem. Phys. 64, 4293.
vom Felde, A. and Fink, J. (1985). Phys. Rev. B 31, 6917.
vom Felde, A. (1986). Unpublished results.
vom Felde, A., Fink, J., Müller-Heinzerling, Th., Pflüger, J., Scheerer, B., Linker, G., and Kaletta, D. (1984). Phys. Rev. Lett. 53, 922.
vom Felde, A., Fink, J., Buche, Th., Scheerer, B., and Nücker, N. (1987a). Europhys. Lett. 4, 1037.
vom Felde, A., Fink, J., and Ekardt, W. (1987b). Phys. Rev. Lett. (submitted).
Wehenkel, C. and Gauthé, B. (1974). Phys. Status Solidi B 64, 515.
Williams, P. F. and Bloch, A. N. (1974). Phys. Rev. B 10, 1097.
Winter, H. (1984). Unpublished results.
Wiser, N. (1963). Phys. Rev. 129, 62.
Wu, K.-S. D. and Beck, D. E. (1987). Phys. Rev. B 36, 998.
Yannoni, C. S. and Clarke, T. C. (1983). Phys. Rev. Lett. 51, 1191.
Yu, J., Freeman, A. J., and Xu, J.-H. (1987). Phys. Rev. Lett. 58, 1035.
Zaanen, J., Sawatzky, G. A., Fink, J., Speier, W., and Fuggle, J. C. (1985). Phys. Rev. B 32, 4905.
Zacharias, P. (1975). J. Phys. F 5, 645.
Zunger, A. and Freeman, A. J. (1977). Phys. Rev. B 16, 2901.

ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 75

Methods of Calculating the Properties of Electron Lenses

E. HAHN
Jena, German Democratic Republic

I. Introduction
II. Equation of Motion
III. Differential Equation of Trajectories
IV. Methods of Solution
V. Coupling between Field and Basis
VI. Representation of the Trajectory
VII. Iteration Cycle
VIII. Electron Mirrors
IX. Conventional Lenses
X. Theory of Micro-Lenses
XI. Quadrupole Optics
XII. Concluding Remarks
References

I. INTRODUCTION

The concept of electron optics first brings to mind the imaging properties of electrostatic and magnetic fields, in which a ray bundle can be deflected and focused. Provided that wave-optical properties play no role, the Lorentz equations of motion form a starting point for the calculation of trajectories as constituent parts of a bundle. If one considers the integration of trajectories, i.e., the solution of the corresponding differential equation, for which there are standard procedures (for example, the Runge-Kutta method), the optical point of view can be introduced only afterwards in terms of the choice of initial values. The interaction between field, trajectory, and optics, which finds expression in the eikonal theory, based on Fermat’s principle, is lost in the calculation of individual trajectories. We consider here the integration of the equation of motion as an optical problem: The optics is coupled with the field, and this coupling is realized in the form of an iteration process by which a linearization of the equation of

234

E. HAHN

motion is attained. The solutions of this linearized equation of motion are represented as a linear combination of a fundamental system forming the “basis” that incorporates the optics. Three differential expressions, related to field and trajectory, which are associated with the round-lens, quadrupole-lens, and deflection-system components of the field, become field functions related by an (n - l)lh order approximation to a particular trajectory. Its optical equivalent arises through a local field-basis coupling in the form of step matrices (lens matrices) and a deflection vector. The four-vector of the trajectory is thus represented in an nth order approximation as a function of its initial values. The field-basiscoupling consists of a homogeneous first-order differential equation for the step matrices, whose solution is given explicitly, so that the introduction of “extraneous” integration procedures is not necessary. This concept not only renders a very precise trajectory calculation possible, but also has a number of advantages in the treatment of problems connected with optical conjugation arising from the cross-sectional properties of ray bundles, especially with quadrupole optical conjugation. In the equation of motion or, alternatively, the differential equation of the trajectories, the time t or the abscissa z along the optic axis is usually chosen as the independent variable. The solution, accordingly, appears as a sequence of points (motion) or as a line of points (trajectory). Regarded mathematically, threre is no essential difference between the two representations, since motion can also be regarded as a representation of the trajectory in parametric form. We use here as independent variable, the basis variable w, which is defined in terms of the fundamental system associated with the round-lens component of the field. From an optical point of view, motion can be regarded as a process of successive mappings. 
To each process step (integration step) corresponds an operator (step matrix) that produces the mapping. The mapping can be spacelike or timelike. The objects of the mapping are firstly the straight lines of a ray space, which touch a trajectory in the vertical planes of a “partial-lens” corresponding to the integration step, and, secondly, certain points or lines that are defined as intersection points or lines of a ray bundle and that characterize a state of the ray space.

The spacelike mapping is a collineation determining which straight line of the object-side ray space of the partial lens is identical with which straight line of its image-side ray space, i.e., belongs to the same trajectory as a tangent. The optical action of the process step can be understood by comparing identical objects in two different ray spaces. The two ray spaces exist simultaneously and are spatially separated by the integration step along the t axis or the z axis.

The timelike mapping is produced by a “micro-lens.” The “micro-lens” appears only in the form of its optical action (lens matrix) and may be compared to an “event” separating two states. The ray space is always present in only one state. The identity of its rays cannot be determined, since a comparison analogous to the “partial-lens” concept is not possible. In the “micro-lens” concept, the intersection configurations of rays are determined, especially vertices of incident and exiting rays, whose identity is secured by means of optical conjugation.

The basis variable has a double function: as a substitute for t or z, it serves to describe the progress of the ray, and as the argument of periodic solution functions, it serves to describe a state and the conjugation. In this double function resides an important advantage of the theory set out below.

II. EQUATION OF MOTION

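The relevant equation of motion (Lorentz equation) for the geometrical optics of an electron with negative charge $-e$, rest mass $m_0$, and velocity $\mathbf{v}$ in a time-invariant electromagnetic field $(\varphi, \mathbf{A})$ may be written

$$\frac{d}{dt}\,\frac{m_0\mathbf{v}}{\sqrt{1 - v^2/c^2}} = e\,\operatorname{grad}\varphi + e\,\operatorname{curl}\mathbf{A}\times\mathbf{v}. \tag{1}$$

Here $\varphi$ denotes the electrostatic potential, with $-\operatorname{grad}\varphi = \mathbf{E}$ as the electrostatic field strength; $\mathbf{A}$ is the magnetic vector potential, with $\operatorname{curl}\mathbf{A} = \mathbf{B}$ as the magnetic induction; and $c$ is the velocity of light. It is useful to introduce the proper time $\tau$ of the moving electrons instead of the time $t$:

$$\frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2/c^2}} = 1 + \frac{e\varphi}{m_0c^2}. \tag{2}$$

The electric potential $\varphi$ is normalized in such a way that $e\varphi$ is equal to the kinetic energy. With $\mathbf{v} = d\mathbf{s}/dt$, the equation of motion (1) takes the form

$$m_0\frac{d^2\mathbf{s}}{d\tau^2} = e\,\operatorname{grad}\Phi + e\,\operatorname{curl}\mathbf{A}\times\frac{d\mathbf{s}}{d\tau}, \tag{3}$$

which differs formally from the nonrelativistic equation ($c \to \infty$) only in that $t$ is replaced by $\tau$ and $\varphi$ by

$$\Phi = \left(1 + \frac{e\varphi}{2m_0c^2}\right)\varphi. \tag{4}$$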
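As a quick numerical check of (2) and (4), the following Python sketch evaluates the Lorentz factor and the modified potential for a given accelerating potential and confirms that $(ds/d\tau)^2 = (2e/m_0)\Phi$, which follows from (2) and (4) with $ds/d\tau = \gamma v$. The function names and the 100 kV example value are illustrative assumptions; the constants are the usual CODATA values.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge e in C
M0 = 9.1093837015e-31        # electron rest mass m0 in kg
C_LIGHT = 299792458.0        # velocity of light c in m/s


def lorentz_factor(phi):
    """dt/dtau from (2) for a potential phi in volts (e*phi = kinetic energy)."""
    return 1.0 + E_CHARGE * phi / (M0 * C_LIGHT ** 2)


def modified_potential(phi):
    """Relativistically modified potential Phi from (4), in volts."""
    return phi * (1.0 + E_CHARGE * phi / (2.0 * M0 * C_LIGHT ** 2))


def proper_speed_squared(phi):
    """(ds/dtau)^2 = (gamma * v)^2, computed from the Lorentz factor alone."""
    gamma = lorentz_factor(phi)
    v = C_LIGHT * math.sqrt(1.0 - 1.0 / gamma ** 2)
    return (gamma * v) ** 2


phi = 100e3  # hypothetical 100 kV accelerating potential
lhs = proper_speed_squared(phi)
rhs = 2.0 * E_CHARGE / M0 * modified_potential(phi)
print(lhs, rhs)  # the two values should agree to rounding error
```

For small potentials the correction factor tends to 1 and the nonrelativistic relation is recovered.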

The coordinates x, y, z form a rectangular (right-handed) Cartesian coordinate system with z as the abscissa of the optic axis. We write the coordinates x, y in complex form

$$r = x + iy, \qquad r^* = x - iy. \tag{5}$$


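In the vortex-free region of the vector field $\mathbf{B} = \operatorname{curl}\mathbf{A}$, the magnetic field intensity (induction) can be represented as the gradient of a scalar potential $\psi$:

$$\mathbf{B} = \operatorname{curl}\mathbf{A} = -\operatorname{grad}\psi. \tag{6}$$

If $A_x$, $A_y$, $A_z$ are the components of $\mathbf{A}$, the system of equations (four real equations)

$$\frac{\partial}{\partial z}(\psi - iA_z) = 2i\,\frac{\partial}{\partial r}(A_x + iA_y), \qquad \frac{\partial}{\partial z}(A_x + iA_y) = 2i\,\frac{\partial}{\partial r^*}(\psi - iA_z) \tag{7}$$

is equivalent to (6) together with the equation $\operatorname{div}\mathbf{A} = 0$ (Lorentz convention for $\varphi$ independent of time), and the Laplace equation, which has the following appearance for $\psi$,

$$\left(4\frac{\partial^2}{\partial r\,\partial r^*} + \frac{\partial^2}{\partial z^2}\right)\psi = 0, \tag{8}$$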

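holds for all four functions ($\psi$, $A_x$, $A_y$, and $A_z$), in which the freedom from sources ($\operatorname{div}\mathbf{B} = 0$) and from vortices ($\operatorname{curl}\mathbf{B} = 0$) of $\mathbf{B} = \operatorname{curl}\mathbf{A}$ has been incorporated. By the use of (7) one obtains the following equations of motion for the coordinates $(r, z)$ of the moving electrons:

$$\frac{d^2r}{d\tau^2} = \frac{e}{m_0}\left\{2\frac{\partial\Phi}{\partial r^*} + i\left(2\frac{\partial\psi}{\partial r^*}\frac{dz}{d\tau} - \frac{\partial\psi}{\partial z}\frac{dr}{d\tau}\right)\right\}, \tag{9}$$

$$\frac{d^2z}{d\tau^2} = \frac{e}{m_0}\left\{\frac{\partial\Phi}{\partial z} + i\left(\frac{\partial\psi}{\partial r}\frac{dr}{d\tau} - \frac{\partial\psi}{\partial r^*}\frac{dr^*}{d\tau}\right)\right\}. \tag{10}$$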

Scalar multiplication of (3) with $d\mathbf{s}/d\tau$ and subsequent integration with respect to $\tau$ yields the energy relationship

$$\frac{m_0}{2}\left(\frac{d\mathbf{s}}{d\tau}\right)^2 = e\Phi. \tag{11}$$

Between the arc length ds and its projection dz on the z axis, there exists the relation (primes denote differentiation with respect to z)

$$(ds)^2 = (1 + r'r^{*\prime})(dz)^2, \tag{12}$$

and one obtains for the derivative of the z coordinate of the moving electron with respect to the proper time $\tau$

$$\frac{dz}{d\tau} = \sqrt{\frac{2e}{m_0}\,\frac{\Phi}{1 + r'r^{*\prime}}}. \tag{13}$$


Repeated differentiation and comparison with (10) yields the relation

III. DIFFERENTIAL EQUATION OF TRAJECTORIES

If we regard the equations of motion (9), (10) as equations determining the trajectories $r(\tau)$, $z(\tau)$ in parametric form, a constant in the modified electrostatic potential $\Phi$ has no influence, since only its gradient is effective. In order to obtain a trajectory it is necessary to set the initial values. In the case of the differential equation systems (9) and (10), these are the initial coordinates and velocity. In the trajectory bundle emerging from a point, the initial velocities may be different. Fixing the constant in the potential $\Phi$ via relation (11) would have the effect that the electrostatic potential $\varphi$ would depend on a particular trajectory. On the other hand, the electrostatic potential $\varphi(r, r^*, z)$, as the solution of the Laplace equation (8) with $\psi$ replaced by $\varphi$, is fixed by its boundary values at the given geometrical electrode arrangement. In these electrode potentials, the potential of the cathode is implicit; it is set at $\varphi = 0$. The various emission velocities of the electrons emitted by the cathode are taken into consideration by adding a quantity $\eta$ (chromatic parameter) to the potential $\varphi$. For the modified potential $\Phi$, we must then write

$$\Phi = \left(1 + \frac{e(\varphi + \eta)}{2m_0c^2}\right)(\varphi + \eta) \tag{15}$$

instead of (4). If we bear in mind that in the equations of motion (9), (10), and (13) the electric potential arises only in its modified form (15), we can subsequently neglect the relativistic correction term $1 + e(\varphi + \eta)/2m_0c^2$ and simply replace $\varphi$ by $(\varphi + \eta)$. For further simplification of the notation, we adopt the convention that $\psi$ denotes $(e/8m)^{1/2}\psi$. We write for short

and have in the normal case of the ray propagation in the direction of the positive z axis, according to (13), the following relation between z and t:

$$\frac{dz}{dt} = \sqrt{c + \Phi}. \tag{17}$$


For a trajectory in an electron mirror, the sign of the ray propagation in the z direction reverses at the point of reflection, determined by the zero value of $(c + \Phi)$. In this case, the square root in (17) is ambiguous. Using the relation (17) we can eliminate the variable t on the right-hand side of the equations of motion (9), (10) and obtain, in terms of the agreed simplifications,

8q

d2z

Likewise, we can eliminate the variable t on the left side of (18) and obtain the differential equation for the trajectory r(z). 1 D(r) = (c + @)r"+ -(c 2

84) + @'r' - __ + i44dr*

Multiplication of (20) with r*' and addition of the complex conjugate leads to the relation

which is equivalent to relation (14), taking into account the agreed simplifications. If r(z) is a solution of the differential equation (20) with the initial values $r(z_a)$, $r'(z_a)$ in the object plane $z_a$ for a given value of the chromatic parameter $\eta$, and if one inserts this in expression (16), then integration of (17) yields

The function z(t) arising from the solution of this equation, together with r(t) = r(z(t)), then satisfies the differential equations (18), (19) of the equations of motion. For the lower integration limit $z_0$ in (22) we adopt the following conventions: In the normal case (conventional lenses) the quantity $(c + \Phi)$ has no zeros in the lens space, and $z_0$ is the abscissa of some point on the axis representing the center of the lens; the choice of $z_0$ is independent of the trajectory.

IV. METHODS OF SOLUTION

The differential equation (20) is a complex composite of two coupled, ordinary second-order differential equations and hence can be integrated stepwise, for given initial conditions, by means of well-known numerical methods, e.g., the Runge-Kutta method (Zurmühl, 1963). Since this procedure is universally applicable, it cannot reflect the specific structure of the differential equation (20). Such a specific characteristic results, inter alia, from the existence of a paraxial region, the possibility of expanding the field as a power series in r and r*, and the classification of these fields as multipoles of various orders.

The procedure developed here is an iterative process. This enables the differential equation (20) to be linearized in such a way that, stepwise, differential equation systems of the paraxial type are formed. The corresponding coefficient functions arise out of the r, r', and z-dependent “field expressions” governed by the differential equation structure and field distribution; these expressions become “field functions” (functions of an axial variable) by the insertion of the trajectories obtained in a cyclic operation (as an approximation for the prescribed initial conditions). These “field functions” change in successive iteration cycles until these trajectories become the solution of the differential equation (20). The solution of the differential equations takes place by the construction of fundamental matrices, which are products of step matrices. The coupling between the elements of the step matrix and the corresponding field functions is derived by a specially developed formula system of order $h^6$, where h is the step length in the z raster (Hahn, 1985).

A. Fundamental Matrix

Consider a differential equation of the paraxial type

$$\frac{d^2r}{dt^2} + f(t)\,r = 0, \tag{23}$$

in which the field function f(t) for a given field expression f(r, r', z) is obtained by using the functions r(t), r'(t), z(t) from the above-mentioned cycle instead of the free variables r, r', z. Each solution of (23) is a linear combination of two linearly independent solutions $r_1(t)$, $r_2(t)$; these form a fundamental system. The matrix

is referred to as a fundamental matrix. The initial values of the fundamental system at t = 0 are defined by the relation S(t = 0) = 1 (unit matrix). The Wronski determinant of the fundamental system is the determinant of S, and det S(t) = 1 irrespective of t.
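The statement det S(t) = 1 can be checked numerically. The sketch below integrates (23) for two solutions whose initial values form the unit matrix and evaluates the determinant. It uses a classical Runge-Kutta step purely as a generic stand-in, not the step-matrix formulae developed in this chapter; the arrangement of the matrix (solutions in columns, values in the first row, derivatives in the second) and the example field function are assumptions made for illustration.

```python
import math


def fundamental_matrix(f, t_end, steps):
    """Integrate r'' + f(t) r = 0 from t = 0 for two solutions whose initial
    values form the unit matrix, using classical fourth-order Runge-Kutta.
    Returns [[r1, r2], [r1', r2']] at t_end."""

    def deriv(t, y):
        return (y[1], -f(t) * y[0])

    def rk4_step(t, y, h):
        k1 = deriv(t, y)
        k2 = deriv(t + h / 2, (y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]))
        k3 = deriv(t + h / 2, (y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]))
        k4 = deriv(t + h, (y[0] + h * k3[0], y[1] + h * k3[1]))
        return (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
                y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

    h = t_end / steps
    col1, col2 = (1.0, 0.0), (0.0, 1.0)   # (r, r') of the two solutions at t = 0
    for i in range(steps):
        t = i * h
        col1 = rk4_step(t, col1, h)
        col2 = rk4_step(t, col2, h)
    return [[col1[0], col2[0]], [col1[1], col2[1]]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical field function, chosen only for illustration.
m = fundamental_matrix(lambda t: 1.0 + 0.5 * t, t_end=2.0, steps=2000)
print(det2(m))  # close to 1, as expected for the Wronski determinant
```

The determinant stays at unity because (23) contains no first-derivative term, so the Wronskian of any two solutions is constant.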


The differential equation for the fundamental matrix

is on the basis of definition (24) equivalent to (23) and has the advantage of being linear and first order. If we replace the field function f(t) by f ( t ) + g(t), the fundamental matrix then goes over into the product S P, where P(t) is determined by the differential equation,

-

-+g(t)s-l(; dP dt

i)S.P=O

and P(t = 0) = 1. We put

$$\rho(t) = S^2 + T^2, \qquad \Omega(t) = \arctan\frac{S}{T} \tag{28}$$

and call $\rho$ the amplitude and $\Omega$ the phase of the basis defined by the fundamental system S(t), T(t), or simply the amplitude and phase of the fundamental matrix S(t). It can be seen that

$$\frac{d\Omega}{dt} = \frac{1}{\rho}, \tag{29}$$

and we can use $\Omega$ as a new axis variable (basis variable). We express the differential equation (27) in terms of $\Omega$. In terms of $\Omega$, S has the product form (basis form)

The dot notation indicates differentiation with respect to $\Omega$, and $C(\Omega)$ denotes the rotation matrix

$$C(\Omega) = \begin{pmatrix}\cos\Omega & \sin\Omega\\ -\sin\Omega & \cos\Omega\end{pmatrix}. \tag{31}$$

We obtain


B. Quadrupole Matrix

When the additional field function is complex, that is, it has the form g + ih with g and h real, we have a system of differential equations of the paraxial type to consider:

$$\frac{d^2x}{dt^2} + (f + g)\,x + h\,y = 0, \qquad \frac{d^2y}{dt^2} + (f - g)\,y + h\,x = 0. \tag{33}$$
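The system (33) can be read as the real and imaginary parts of a single complex equation. The sketch below assumes that the complex field function multiplies the conjugate coordinate, i.e. $\ddot r + f r + (g + ih) r^* = 0$ with $r = x + iy$ (an assumption, though it matches the $w^*$ coupling that appears later in the text), and checks that its residual splits into the two residuals of (33). All numerical values are arbitrary.

```python
def complex_residual(r, r_dd, f, g, h):
    """Residual of r'' + f r + (g + i h) r* = 0 with r = x + i y."""
    return r_dd + f * r + complex(g, h) * r.conjugate()


def real_residuals(x, y, x_dd, y_dd, f, g, h):
    """Residuals of the two real equations of system (33)."""
    res_x = x_dd + (f + g) * x + h * y
    res_y = y_dd + (f - g) * y + h * x
    return res_x, res_y


# Arbitrary test values (purely illustrative).
x, y, x_dd, y_dd = 0.3, -1.2, 0.7, 0.05
f, g, h = 2.0, 0.4, -0.9
res = complex_residual(complex(x, y), complex(x_dd, y_dd), f, g, h)
res_x, res_y = real_residuals(x, y, x_dd, y_dd, f, g, h)
print(res, res_x, res_y)
```

For h = 0 the two real equations decouple, which corresponds to the “separated” form of the quadrupole matrix.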

The fundamental matrix X = SQ

(34)

now has four rows. Its four column vectors

j = 1,..,4,

(35)

are formed analogously to (24). In (34) the 2 x 2 matrix S(t), defined with S(t = 0) = 1 according to (26), is to be replaced by a four-row matrix according to the relation (38). The quadrupole matrix Q(t), with Q(t = 0) = 1, is determined by the differential equation

/o

0

1

o\

In this the complex-valued field function, (g + ih) is a four-row matrix. First g + ih is written as a two-row matrix according to the rule g+ih=

g

+ ih

(37)

and then into a four-row matrix with real elements according to the rule

+

ac + iu, iy,

+

db + ip ia)=[

1; p 1;). c

d

d

(38)


The product of the two quantities is independent of the form of presentation, the multiplication being carried out according to the usual rules. On introducing the transformation (28),we obtain the following equation, which is the analog of (32):

\o

0

0

o/

In the case h = 0 , Q has the “separated” form

\

913

and we have

as the solution matrix of the differential equation (32) and

as the solution matrix of the differential equation (32) with -g instead of g. The following theorems hold:

1) The determinant of the solution matrix Q of (39) is constant, independent of $\Omega$, and has, in particular, the value 1 when (e.g., at $\Omega = 0$) Q = 1.

2) The inverse $Q^{-1}(\Omega)$ is obtained directly from $Q(\Omega)$ (with det Q = 1) by a simple rearrangement of its elements. If A, B, C, D are two-row matrices and A', B', C', D' are their transposes, then:

C. The Step Matrix

If $\Omega_n$ are the nodes of a raster, where n is an integer, then for the step matrix of Q we have

$$Q(\Omega_{n+1})\,Q^{-1}(\Omega_n) = C^{-1}(\Omega_{n+1})\,\bar{Q}_{n,n+1}\,C(\Omega_n). \tag{44}$$


On,,+ is the “phase-rotated” step matrix of the interval (R,, R,+

\Y, + uy, - x y + uy, whose real elements are determined by the value of the function q = p2(g + ih) at the boundary points of the step interval and whose first and second integral are determined over the step interval. For the reciprocal phase-rotated step = Q,;+,, rule (43) holds matrix

Qn+

1.n

=

i

xa

+ ua,

y, - 08, - uy, - Y , + u,,

-Xy

Ya -xa

+ ua, + Uar

-xp - ug,

-Yp

-Yy - u y , Xy

Xa

- uy,

+

+

-Yg - up xp -

ua,

Ya - u a ,

Ya -Xa

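For the matrix Q at $\Omega_n$, with n a positive integer, we have

$$Q(\Omega_n) = C^{-1}(\Omega_n)\,\bar{Q}_{n-1,n}\,\bar{Q}_{n-2,n-1}\cdots\bar{Q}_{0,1}\,C(\Omega_0)\,Q(\Omega_0), \tag{47}$$

and for Q at $\Omega_n$, with n a negative integer,

$$Q(\Omega_n) = C^{-1}(\Omega_n)\,\bar{Q}_{n+1,n}\,\bar{Q}_{n+2,n+1}\cdots\bar{Q}_{0,-1}\,C(\Omega_0)\,Q(\Omega_0). \tag{48}$$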
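The ordering of the factors in (47) and (48) can be illustrated with a small sketch: each later step acts from the left, and stepping back through the same nodes uses the reciprocal step matrices in reverse order. The 2x2 drift and thin-lens matrices below are hypothetical stand-ins for the phase-rotated step matrices, and the rotation factors C are left out.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse_unimodular(m):
    """Inverse of a 2x2 matrix with determinant 1."""
    return [[m[1][1], -m[0][1]], [-m[1][0], m[0][0]]]


def accumulate(steps):
    """Multiply step matrices so that each later step acts from the left,
    as in the ordering of (47)."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    for step in steps:
        total = matmul(step, total)
    return total


def drift(h):
    """Field-free step of length h (a hypothetical stand-in for a step matrix)."""
    return [[1.0, h], [0.0, 1.0]]


def thin_lens(k):
    """Thin-lens kick of strength k (also only an illustrative stand-in)."""
    return [[1.0, 0.0], [-k, 1.0]]


forward = [drift(0.2), thin_lens(0.5), drift(0.3)]
m_forward = accumulate(forward)
# Stepping back through the same nodes: reciprocal steps in reverse order.
backward = [inverse_unimodular(s) for s in reversed(forward)]
m_round_trip = matmul(accumulate(backward), m_forward)
print(m_forward, m_round_trip)
```

Because every step matrix is stored separately, a single step can be recomputed without touching the others.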

The formulae for the elements of the phase-rotated step matrix (45) were developed on the basis of the differential equation for the four-row quadrupole matrix Q(n) 10

0

1

o\

using the rotation matrix $C(\Omega)$ (31) for a prescribed complex-valued function $q(\Omega)$. Here C and q are to be understood as four-row matrices with real elements, according to the rules (37) and (38). For each step interval $(\Omega_n, \Omega_{n+1})$ we form the following quantities, which will be needed as “building blocks” in the formation of the matrix elements of $\bar{Q}_{n,n+1}$:

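$$a + i\alpha = \int_{\Omega_n}^{\Omega_{n+1}} q\,d\Omega, \qquad b + i\beta = -\int_{\Omega_n}^{\Omega_{n+1}}\left(\Omega - \frac{\Omega_n + \Omega_{n+1}}{2}\right) q\,d\Omega \tag{50}$$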


The following formulae for the elements of the phase-rotated step matrix $\bar{Q}_{n,n+1}$ (45) reproduce the exact solution

$$Q(\Omega)\,Q^{-1}(\Omega_n) = 1 + \int_{\Omega_n} D + \int_{\Omega_n} D\int_{\Omega_n} D + \cdots \tag{53}$$

of the differential equation

$$\frac{dQ}{d\Omega} = D\,Q \tag{54}$$

with an accuracy of the order $(\Omega_{n+1} - \Omega_n)^6$. The terms of the series (53) consist of an intricate series of integrations (the symbol $\int_{\Omega_n}$ means integration with respect to $\Omega$ with the lower integration limit $\Omega_n$) and matrix multiplications, which require a solution. We note the results:

X, = - U S

- b - d-

s4

+ 120

1 S4 y , =-as - p - 6-2 120 1 2

+ y(c2 +

1 y, =-as

s4 + p + 6-120 + y(c2 + r’,,,

s3 xp = as+ a- - c - s 2

5

SZ

yp = as+ a-

5

60 3

-yS-S2 60

3

S2 x = a - - c - s s -2 5 60

s2

s3

5

60

y =cr--y-s

- (d2

5760

s4 S6 + b + d- 120 + c(c2 + y2)- 5760

s2

= S + (UC

S6 72)-

~6 = - U S

2

U,

+ 7’)- 5760 S6

C(C’

-2

s3 + ~ 7 24 ) - - (c’ +

+ 8’)-

s4

1440

s3

U, = (U? - w)-

120

s4 + ( ~ -6 7 d ) -120

S6

s5

+

c(c2

+

+

y(c2

s5 + y2)- 960

y2)-

960

(55)


“4

= u, - (cd

+ y6)”120 s3

vg = v,

- (ay - ac)-

60

up = -s

+ (ac + cry)-?203

- (c2

+

”5

up = -(up - crb) - (c6 - yd) 2-

180

uy = s

U?

=(

+ (CZ +

s5 72)-

480

s5

~6 yd)-.

720

D. Local Basis

The fundamental matrix X in (34) is the product of successive step matrices

$$X(t_{n+1})\,X^{-1}(t_n) = S(t_{n+1})\,C^{-1}(\Omega_{n+1})\,\bar{Q}_{n,n+1}\,C(\Omega_n)\,S^{-1}(t_n). \tag{57}$$

Each step matrix (n = 0, ±1, ...) is calculated independently of the others. This constitutes an essential difference between this method and the usual integration methods with continuous calculation of the solution curve. The relation of the step matrix with the field functions (coefficients of the differential equation) is valid for a discrete interval of the t axis and z axis, respectively. We may say, therefore, that the step matrix (57) is a “discrete solution” of the differential equation (33). It is invariant under a basis transformation of the form

$$S(t) = S^{(n)}(t)\,A_n, \tag{58}$$

in which $A_n$ is a two-row matrix with constant elements:

$$X(t_{n+1})\,X^{-1}(t_n) = S^{(n)}(t_{n+1})\,C^{-1}(\Omega^{(n)}_{n+1})\,\bar{Q}^{(n)}_{n,n+1}\,C(\Omega^{(n)}_n)\,S^{(n)\,-1}(t_n). \tag{59}$$

$\Omega^{(n)}$ and $\rho^{(n)}$ are the phase and amplitude of $S^{(n)}$. The related building blocks of the transformed basis, (50) to (52), with

$$q^{(n)} = -\rho^{(n)\,2}\,(g + ih) \tag{60}$$

yield, when inserted into (45), the phase-rotated step matrix $\bar{Q}^{(n)}_{n,n+1}$. The invariance of the step matrix (57) under a basis transformation (58) tells us that for each step $(t_n, t_{n+1})$ the basis $S^{(n)}$ [as a solution matrix of (26)] may be chosen freely. If we take $A_n = S(t_n)$, we ensure that at $t = t_n$ the amplitude $\rho^{(n)}$ has the value unity and the phase $\Omega^{(n)}$ has the value zero.
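The invariance under a change of basis, and the effect of the choice $A_n = S(t_n)$, can be sketched for the two-row part of the problem. The example takes a constant field function $f = k^2$, for which the fundamental matrix is known in closed form; this choice, the function names, and the assumption that the first row of the matrix holds the pair (T, S) entering amplitude and phase are all illustrative and not taken from the text.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a general non-singular 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def basis(t, k=1.5):
    """Fundamental matrix of r'' + k^2 r = 0 with S(0) = 1 (hypothetical field)."""
    return [[math.cos(k * t), math.sin(k * t) / k],
            [-k * math.sin(k * t), math.cos(k * t)]]


def step_matrix(fund, t_n, t_next):
    """Step matrix fund(t_next) * fund(t_n)^-1 of a given basis."""
    return matmul(fund(t_next), inverse(fund(t_n)))


t_n, t_next = 0.4, 0.7
a_n = basis(t_n)                      # the choice A_n = S(t_n)


def local_basis(t):
    """S^(n)(t) = S(t) A_n^-1, i.e. S(t) = S^(n)(t) A_n as in (58)."""
    return matmul(basis(t), inverse(a_n))


global_step = step_matrix(basis, t_n, t_next)
local_step = step_matrix(local_basis, t_n, t_next)
start = local_basis(t_n)              # unit matrix at the start of the step
# Assumed layout: first row = (T, S), so amplitude and phase follow from it.
rho_start = start[0][0] ** 2 + start[0][1] ** 2
omega_start = math.atan2(start[0][1], start[0][0])
print(global_step, local_step, rho_start, omega_start)
```

Both bases give the same step matrix, while the local basis starts each step with unit amplitude and zero phase.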

V. COUPLING BETWEEN FIELD AND BASIS

With a suitable substitution of the spatial variable (r, z), the differential equation (20) can be separated into simply constructed differential equations in terms of the substituted variables. In the substitutions

$\rho$, $\gamma$, $\Omega$, and $\theta$ are real; w is complex.

A. Field Representation The field expression x(r, r*, z ) differs from the magnetic scalar potential $ only by a rotationally symmetrical round-lens component of the magnetic field: $- =

(-- 1)”

22”(p - l)!(p + l)!

Yb2”l(z)r”r*”.

$\Psi_0(z)$ is the axial function describing the magnetic round-lens potential, and $\Psi_0^{[2\mu]}$ is its $2\mu$th derivative with respect to z. Of the components of the vector potential in cylindrical coordinates $(r, \alpha, z)$ that, according to (6), describe the round-lens component, only the $\alpha$ component $A_\alpha = A(r, z)$ is nonzero. For the z component $-\partial\psi_0/\partial z$ of the magnetic induction $\mathbf{B} = \operatorname{curl}\mathbf{A}$ of the round lens we have

a

1 -__ -- -(r,

aZ

r ar

A).

For the derivative of (62) with respect to z it follows that

-a$- _ ax = -rz(h)

aZ aZ

ar

r ’

(44)


and for the sum of (63) and (64) one finds

$\psi_\chi(r, r^*, z)$ is the magnetic scalar potential minus the magnetic round-lens potential $\psi_0(r, r^*, z)$. $A(r, z)$ has the series expansion

The electric potential $\varphi(r, r^*, z)$ is composed of the round-lens component $\varphi_0(r, r^*, z)$, the deflection potential $\varphi_{\chi,1}(r, r^*, z)$, and the quadrupole potential $\varphi_{\chi,2}(r, r^*, z)$; all higher multipole potentials are included in $\varphi_\chi = \varphi_{\chi,1} + \varphi_{\chi,2}$. Similar relations hold for the magnetic potential $\psi(r, r^*, z)$. In the series expansion

$\Psi_k^{[2\mu]}(z)$ denotes the $2\mu$th derivative of the axis function of the k-fold magnetic multipole field. The vertical bar notation $(\,\cdot\,|\,\cdot\,)$ is introduced as a shorthand for a frequently occurring operation on two arbitrary complex quantities, $A = a + ib$ and $X = x + iy$: We write $(A|X) = \tfrac{1}{2}(AX^* + A^*X) = ax + by$. The electric potential $\varphi(r, r^*, z)$ has an analogous representation in terms of the axial function $\Phi_k(z)$ of the k-fold electrostatic multipole fields.
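The vertical bar operation is simply the real scalar product of the two complex numbers regarded as plane vectors. A minimal sketch (the function name `bar` is an assumption):

```python
def bar(a, x):
    """(A|X) = (A X* + A* X) / 2 for two complex numbers; the result is real."""
    value = 0.5 * (a * x.conjugate() + a.conjugate() * x)
    return value.real


a = complex(1.0, 2.0)   # A = a + i b
x = complex(3.0, 4.0)   # X = x + i y
print(bar(a, x))        # 11.0, i.e. 1*3 + 2*4
```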


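B. Field Expression

The pair

$$r_1(t) = \sqrt{\rho}\,\sin\Omega, \qquad r_2(t) = \sqrt{\rho}\,\cos\Omega \tag{68}$$

forms a fundamental system of the differential equation

$$\frac{d^2r}{dt^2} + F_\rho\,r = 0. \tag{69}$$

The expressions

$$R_1(z) = \frac{\sin\Omega}{\gamma}, \qquad R_2(z) = \frac{\cos\Omega}{\gamma} \tag{70}$$

likewise form a fundamental system of the differential equation

$$\frac{d^2R}{dz^2} + \gamma^3(\ddot{\gamma} + \gamma)\,R = \frac{d^2R}{dz^2} + \gamma^4F_\gamma\,R = 0. \tag{71}$$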


The differential equation (69) is of the paraxial type (23) if the differential expression $F_\rho(\rho)$ is understood to be the result of a basis-field coupling giving rise to a field expression $F_\rho(r, r', z)$, and the trajectory in parametric form r(t), z(t) is inserted so that $F_\rho$ appears as a function $F_\rho(t)$ of the axial variable t. The differential equation (71) is of the paraxial type if the differential expression $\gamma^4F_\gamma(\gamma)$ is specified as a field expression $\gamma^4F_\gamma(r, r', z)$ and the trajectory r(z) is inserted so that $\gamma^4F_\gamma$ appears as a function of the axial variable z. In view of the substitution (61), the expression $F_\rho(\rho)$ in the differential equation is related to $\gamma^4F_\gamma$ in the following way:

(c + 4 c+cp @)I'

=-

($GT$y

YET$

(72) *

This relationship defines either $\gamma^4F_\gamma$ or $F_\rho$ as a field expression when the other is specified as a field expression. For this, we insert in the differential equation (20) for r the substitution (61) and obtain

L) = 0,

(73)

together with (the suffix 1 indicates that the separation is not yet definitive),

(73 ax

The separation of the field expression in the substituted differential equation (73) into H, K, and L is unambiguous only in the paraxial region, since H must be real and both K and L must be continuous for r = 0. In the higher order region, the ordering is not unambiguous, as one can see from the identities

which serve to transfer components of higher order from $H_1$ to $K_1$ or $L_1$. It is even possible to change components of higher order from $K_1$ to $L_1$ and vice versa, as can be seen from the following identities.

The above separation (74) to (76) is unsatisfactory for the following reasons: a) Magnetic deflectors and quadrupole fields can have an effect on $H_1$ through the $\partial\chi/\partial z$ term, unlike electric multipole fields, for which this is not possible through the term $\partial\varphi_0/r\,\partial r^*$. In $K_1$ and $L_1$ total differentiation with respect to z is called for. For abbreviation we put

and form the identity

When this identity is added to the differential expression (73), the field expressions H, K, and L take the following (definitive) forms

The differential expression D(r) in (73) goes to zero if we require in the first place that $F_\rho$, regarded as a field expression $F_\rho(r, r', z)$, be specified through the basis-field coupling

$$F_\rho = H \tag{84}$$

and secondly that $w(\Omega)$ satisfies the differential equation

$$\ddot{w} + w - Kw^* - L = 0. \tag{85}$$

The first requirement means that in the differential equation (26) for the fundamental matrix, the field function f(t) may be written

$$f(t) = H(r(t), r'(t), z(t)). \tag{86}$$

The second requirement is fulfilled by the expression

in which Q(Q) satisfies Eq. (39) with

VI. REPRESENTATION OF THE TRAJECTORY

A. Four-Vector

We wish to transform the representation (87) into a representation for the four-vector of the trajectory

$\rho$ and $\Omega$ are the amplitude and phase of the fundamental matrix S(t), which is related to the field function f = H through the differential equation (26). S(t) possesses the basis representation (30). On the basis of the substitutions (61) one has

Furthermore, the following identity holds for an arbitrary complex quantity v(t) independent of the field functions H(t) and K(t), to which S and Q are related:

If we insert (87) into (90) and add (92), we obtain

If we now take for v the quantities defined in (79), we obtain after making the substitution (61),

Finally we apply (65) and (83) and obtain the following expression for the four-vector (89) of the trajectory as a function of its given initial values in the plane $z_a$,


In this representation the four-vector that is relevant for beam deflection has the meaning

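If we denote the electric and magnetic field strengths of the deflector fields $\varphi_{\chi,1}$ and $\psi_{\chi,1}$ by $\mathbf{E}^{(D)}$ and $\mathbf{B}^{(D)}$, respectively,

$$\mathbf{E}^{(D)} = -\operatorname{grad}\varphi_{\chi,1}, \qquad \mathbf{B}^{(D)} = -\operatorname{grad}\psi_{\chi,1}, \tag{97}$$

and the z components of the B field that remain after removal of the round-lens field by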

then (96) takes the form

The interpretation as beam deflection arises from the fact that we have freed the quadrupole expression in the original structure (75) from the term

By this means the field expression H determining the fundamental matrix S is freed from components of deflector and quadrupole fields, and the field expression K (82) determining the quadrupole matrix Q is reduced to that of the effective quadrupole field expression in the paraxial region. In the representation (95), depending on a particular trajectory, the variable t is a substitute for z and vice versa, and the relationship established in (22) holds. The calculation of the fundamental matrix S and the quadrupole matrix Q is performed in that t or z raster in which the corresponding field expression is given. In the usual case, the appropriate raster is an equally spaced z raster. In the case of an electron mirror, it is necessary to change over to a t raster in the neighborhood of the reflection plane (mirror zone), since the reversal point $z_0$ depends on the individual trajectory. The transformation of the field expression from a given z raster to the corresponding expression for a t raster is certainly possible if the reflecting field and the modulated field of the mirror surface to be imaged can be described analytically.


B. Initial Matrix

The field function (86) H(t) is related through (26) to the fundamental matrix S(t). If we replace the field function H(t) by 0 + H(t), the initial matrix

corresponds to the identically vanishing field function, from which one originally began. The initial matrix with S(t = 0) = 1 is also defined through the differential equation

""+(-; dt

gso=o.

The amplitude and phase of the initial matrix $S_0$ are denoted by $\rho$ and $\omega$, with $t = \tan\omega$.

Now S is the product of $S_0$ and P:

$$S(t) = S_0(t)\,P(t). \tag{103}$$

P(t), with P(t = 0) = 1, is coupled with the field function (86) H through the differential equation

The discrete solution of this differential equation is the step matrix, formed in an analogous way to (44),

$$P(\omega_{n+1})\,P^{-1}(\omega_n) = C^{-1}(\omega_{n+1})\,\bar{P}_{n,n+1}\,C(\omega_n), \tag{105}$$

in which $\bar{P}_{n,n+1}$ is the phase-rotated step matrix. Since $q = -H\rho^2$ is real, this can be reduced to the two-row matrix

The formulae (55) and (56) serve to calculate its elements, whose building blocks can be formed according to formulae (50) to (52) and transformed to the z raster (on the assumption that in the integration interval of the expression (16) $c + \Phi$ is appreciably different from zero) as follows:

H

The quantity $S_0(z)$ denotes $S_0(t(z))$, and likewise for $T_0$. The amplitude $\rho = 1 + t^2$ and the phase $\omega = \arctan t$ of the fundamental matrix (90) $S_0(t)$ are related to the abscissa z through Eq. (22) for t(z). The discrete solution of the differential equation

is the step matrix $S(t_{n+1})S^{-1}(t_n)$, and this comes in a formal way from (57) if we regard S there as the initial matrix $S_0$ and $\Omega$ as the phase $\omega$:

$$S(t_{n+1})\,S^{-1}(t_n) = S_0(t_{n+1})\,C^{-1}(\omega_{n+1})\,\bar{P}_{n,n+1}\,C(\omega_n)\,S_0^{-1}(t_n). \tag{109}$$

It is invariant under a basis transformation of the initial matrix $S_0(t)$ of type (58), and we can assign to each step interval $(t_n, t_{n+1})$, or $(z_n, z_{n+1})$, a particular (local) initial matrix $S_0^{(n)}(t)$, for example, with the normalization $S_0^{(n)}(t_n) = 1$:

$$S_0^{(n)}(t) = S_0(t)\,S_0^{-1}(t_n) = \begin{pmatrix} 1 & t - t_n \\ 0 & 1 \end{pmatrix}. \tag{110}$$

The phase $\omega^{(n)}$ and amplitude $\rho^{(n)}$ of $S_0^{(n)}$,

$$\tan\omega^{(n)} = t - t_n, \qquad \rho^{(n)} = 1 + (t - t_n)^2 = \frac{1}{\cos^2\omega^{(n)}}, \tag{111}$$

should be used for forming the building blocks of the elements of the phaserotated step matrix P,,,,, instead of o and p in (107). The elements of the phase-rotated step matrix Q,,,, of Q in (95) rely on the fundamental matrix S (or alternatively S'"))as a basis. With p and R as amplitude and phase, the formulae for the building blocks (50)to (52) in the z raster (q = K/p2 = K)run


as follows:

d

+ ih = (p2K),,+, - (P'K),,

In the usual case, i.e., for an ordinary electrostatic/magnetic lens, the field expression (16)

$$c + \Phi = \varphi(r, r^*, z) + \eta - \left(\sqrt{c + \Phi}\;r'\right)\left(\sqrt{c + \Phi}\;r'\right)^* \tag{113}$$

is appreciably different from zero in the optical channel. The z raster, in which the explicitly z-dependent functions in the field expressions H, K, and L are given, need not be abandoned when performing iterative trajectory calculation. The integration over $\omega$ or $\Omega$ needed for the calculation of the elements of the phase-rotated step matrix (45) according to (50) can be carried out immediately by integration in a prearranged, preferably equidistant, z raster. Amplitude and phase can be extracted from the fundamental system of the corresponding fundamental matrix according to (28), and the differential $d\omega$ or $d\Omega$ can be eliminated, based on (62), by means of dz.

VII. ITERATION CYCLE

Given the initial values (trajectory parameters) $r(z_a)$, $r'(z_a)$, $\eta$, the fundamental matrix S, the quadrupole matrix Q, and the four-vector (89) of the trajectory are calculated in each iteration cycle, according to the given formulae, at the nodes of the prescribed z raster. By insertion of the four-vector into the field expressions, these are turned into field functions. In the subsequent iteration cycle these lead to new values of S, Q, and the four-vector (89) at the nodes of the z raster. The series of iteration cycles is either broken off when a prescribed number of cycles has been worked through, or continued if the results of successive cycles are not in sufficiently good agreement or if the field expressions need to be supplemented by the inclusion of higher order terms.
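The structure of such a cycle can be sketched on a toy problem. The example below is not the field-basis coupling of this chapter: it assumes a hypothetical scalar equation $r'' + f(r)\,r = 0$ with the made-up field expression $f(r) = 1 + 0.5r^2$, and it solves each linearized equation with a simple difference scheme instead of step matrices. What it does share with the procedure described above is the loop: start from a trajectory set to zero, turn the field expression into a field function by inserting the last trajectory, solve the linear equation on the z raster, and stop when successive cycles agree or a prescribed number of cycles is reached.

```python
def linear_cycle(f_nodes, r0, dr0, h):
    """Solve the linearized equation r'' + f(z) r = 0 on the z raster, with the
    field function f given at the nodes (simple Stoermer scheme)."""
    n = len(f_nodes)
    r = [0.0] * n
    r[0] = r0
    r[1] = r0 + h * dr0 - 0.5 * h * h * f_nodes[0] * r0
    for j in range(1, n - 1):
        r[j + 1] = 2.0 * r[j] - r[j - 1] - h * h * f_nodes[j] * r[j]
    return r


def iterate(field_expression, r0, dr0, h, n_nodes, tol=1e-12, max_cycles=50):
    """Repeat the cycle: insert the last trajectory into the field expression,
    solve the resulting linear equation, compare with the previous result."""
    r = [0.0] * n_nodes            # first cycle: trajectory set to zero
    for cycle in range(1, max_cycles + 1):
        f_nodes = [field_expression(rj) for rj in r]
        r_new = linear_cycle(f_nodes, r0, dr0, h)
        change = max(abs(a - b) for a, b in zip(r_new, r))
        r = r_new
        if change < tol:
            return r, cycle
    return r, max_cycles


def direct_nonlinear(field_expression, r0, dr0, h, n_nodes):
    """Reference: the same difference scheme applied to the nonlinear equation."""
    r = [0.0] * n_nodes
    r[0] = r0
    r[1] = r0 + h * dr0 - 0.5 * h * h * field_expression(r0) * r0
    for j in range(1, n_nodes - 1):
        r[j + 1] = 2.0 * r[j] - r[j - 1] - h * h * field_expression(r[j]) * r[j]
    return r


def toy_field(r):
    """Hypothetical field expression f(r) = 1 + 0.5 r^2 (not H, K, L of the text)."""
    return 1.0 + 0.5 * r * r


h, n_nodes = 0.01, 201
trajectory, cycles = iterate(toy_field, 0.5, 0.0, h, n_nodes)
reference = direct_nonlinear(toy_field, 0.5, 0.0, h, n_nodes)
print(cycles, max(abs(a - b) for a, b in zip(trajectory, reference)))
```

In this toy case the first cycle reproduces the purely linear ("paraxial") solution, and the later cycles add the trajectory-dependent corrections.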


A. First Cycle

In the first cycle, the four-vector (89) is set identically equal to zero, whereupon the field expression is reduced to its axial distribution.

H

1 4

= -@z(z)

+ Il/z(z),

+

= K = e'2e{@2(z) i 4 a Y 2 ( z ) ] , (114)

m = @ , ( z )+ i 4 J o o ~ , ( z ) , n = 0.

The initial matrix $S_0$ of each cycle always has the form of (100) or (110). The fundamental matrix of the first cycle is

$$S^{(1)} = S_0\,P^{(1)}. \tag{115}$$

$P^{(1)}$ is determined via the field function H(r = 0, r' = 0, z), and $S^{(1)}$ contains the optical properties of the round-lens field in the paraxial approximation. The field function K(r = 0, r' = 0, z) determines the quadrupole matrix $Q^{(1)}$, in which $\rho^{(1)}$ and $\Omega^{(1)}$ are the amplitude and phase of $S^{(1)}$. After carrying out the remaining integration over the deflector part and the matrix multiplication according to (95), we obtain the four-vector (89) of the trajectory as the result of the first cycle. Since a change of trajectory parameter (apart from = 0) has hardly any influence on the field function, no new integrations are necessary for other trajectories in this paraxial region. Between the (real) components of the four-vector of the trajectory in the plane z and the plane $z_a$, there exists a linear transformation that depends solely on z and $z_a$.

B. Second Cycle and Aberrations of the Third Order

In the course of the second cycle, the coefficients of the transformation are found to be functions of the trajectory parameters, since the field function of a field expression now depends on the trajectory. Deviations (aberrations) arise with respect to the path of the corresponding trajectory in the first cycle, and these can be interpreted as image aberrations when $z$ and $z_a$ are optical conjugates. It is chiefly the third-order aberrations that are involved, since in the second cycle a field function is formed by the insertion of a trajectory of the paraxial region in the field expression. The polynomial of the aberrations of third order is not restricted to the geometrical parameters, beam position $r(z_a)$ and beam direction $r'(z_a)$ in the object plane $z_a$, but also includes a variation of the parameter $q$ of the initial energy (chromatic parameter), as well as the ray position in the image plane due to deflection.


One may recall that in electron-beam lithography, the beam deflection is program controlled according to the illumination pattern to be reproduced. The number of (real) third-order aberration coefficients is considerable, of the order of hundreds. The formal expression of the complete aberration coefficients, analogous to the 10 or so third-order image aberration coefficients of conventional round-lens optics, is not necessary in our procedure. The aberration polynomial of third order of, for example, a deflection objective in which only the beam direction (beam position in the pupil plane) and the beam deflection (ray position in the target plane) are varied contains a total of 35 complex polynomial coefficients. These can be calculated, since 35 trajectories with different ray parameters can be computed in perhaps two cycles. The aberrations of the trajectories are related to the polynomial coefficients by a linear system of equations. The formal solution of the system of equations poses no difficulties, given a suitable choice of ray parameters. The (complex-valued) elements of the reciprocal matrix of the system of equations are either zero or roots of the equation $x^8 = 1$. The values of the polynomial coefficients obtained in this way do not agree exactly with the values of the formal image aberration coefficients of third order, since in our procedure the solution of the differential equation of the trajectory itself is not treated as an approximation in the sense of an expansion of the differential equation expression in powers of $r$ and $r'$. This less-than-exact agreement has no influence on the final result, since the aberration coefficients of third order form only an intermediate stage on the way to a higher accuracy of trajectory integration.
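Why a suitable choice of ray parameters makes the linear system trivial to invert can be illustrated with a much reduced toy problem. The sketch below is an assumption-laden illustration, not the 35-coefficient system of the text: it takes a polynomial in a single complex ray parameter `w` with eight coefficients and samples it at the eight roots of $x^8 = 1$. The reciprocal matrix then consists, up to the common factor 1/8 (a normalisation chosen here), of eighth roots of unity. All names are hypothetical.

```python
import cmath

N = 8
# Sample ray parameters: the eight roots of x**8 = 1.
nodes = [cmath.exp(2j * cmath.pi * j / N) for j in range(N)]


def aberration(w, coeffs):
    # Toy aberration polynomial in one complex ray parameter w.
    return sum(c * w**k for k, c in enumerate(coeffs))


def recover_coefficients(samples):
    # The system matrix is V[j][k] = nodes[j]**k. Its inverse has the
    # elements conj(nodes[j])**k / N, i.e. roots of unity scaled by 1/N.
    return [
        sum(samples[j] * nodes[j].conjugate() ** k for j in range(N)) / N
        for k in range(N)
    ]


true_coeffs = [0.3, -1.2j, 0.0, 2.0 + 0.5j, 0.0, 0.0, -0.7, 0.1j]
samples = [aberration(w, true_coeffs) for w in nodes]  # "computed trajectories"
recovered = recover_coefficients(samples)
```

In this toy case no numerical matrix inversion is needed at all; whether the full third-order system of the text is arranged in exactly this way cannot be inferred from the passage.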

C. Aberrations of Higher Order

In the results of the third iteration cycle we obtain conclusive information as to what extent the limitation to third-order aberrations is permissible. An extension of the aberration polynomials to the relevant region of fifth order and higher by taking into account the appropriate terms in the field expressions is in principle possible. The description of the properties of an optical system by a multiplicity of aberration coefficients is, however, unsuitable for the evaluation of optical quality, which can be much better demonstrated by a reconstruction of the intensity profile. The “objects” of electron-beam lithography are deflection fields and apertures; its images are intensity patterns. Intensity profiles can be calculated by starting from a particular surface element in the target plane, calculating the representative trajectories backwards to the source, applying there the weighting of the directional brightness, and distinguishing the passage


through aperture planes between transmission or blanking. This basic approach thus constitutes an accurate trajectory calculation. With the iteration procedure described here, the question of accuracy is reduced to the question of the required number of cycles and to the question of how accurately the electric and magnetic fields can be described or included in the array plane of the $z$ raster.

VIII. ELECTRON MIRRORS

It is a characteristic of the electron mirror that the expression $c + \varphi$ (113) has a (simple) zero position $z_0$ for each trajectory. This arises since the denominator in (16) increases indefinitely. The vanishing of the numerator for finite values of the denominator is not excluded and indicates that the trajectory crosses the potential surface (mirror plane) ($\varphi + q = 0$) at normal incidence and is turned back on itself. In the application of the method previously described for trajectory calculation, two difficulties arise. The first is that in (22) the integrand for the matrix element $t(z)$ of $S_0$ (100) is singular at $z_0$; the second is that $z_0$ is not fixed but depends on the particular trajectory. These difficulties may be overcome if, in the vicinity of the mirror field region (mirror zone), a $t$ raster is employed instead of the $z$ raster. Thereby, in the formulae (107) and (112), which serve for the calculation of the elements of the step matrix of $P$ or $Q$, the integrand is finite at $z_0$ because of Eq. (17).

A. Determination of the Zero Position $z_0$

In the $t$ raster, the zero position has the fixed value $t = 0$, and $z_0$ must be recalculated for each cycle. When the four-vector of Eq. (95) has been calculated, its components for $t = 0$, which we designate by $u_0$, $v_0$, $x_0$, $y_0$, are inserted into Eq. (113) and the equation $c + \varphi = 0$ is solved for $z = z_0$. For this an iterative procedure (regula falsi) can be used:
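A generic regula falsi iteration of the kind referred to here may be sketched as follows. The sketch does not reproduce the formula of the text; the function `toy_expression` is a hypothetical stand-in for the expression $c + \varphi$ evaluated along $z$ with the components $u_0$, $v_0$, $x_0$, $y_0$ held fixed, and the bracketing interval is assumed to be known.

```python
import math


def regula_falsi(g, a, b, tol=1e-12, max_iter=200):
    """Find a zero of g in [a, b] by the method of false position."""
    ga, gb = g(a), g(b)
    if ga * gb > 0:
        raise ValueError("the zero is not bracketed by a and b")
    x = a
    for _ in range(max_iter):
        # Zero of the chord through (a, ga) and (b, gb).
        x = b - gb * (b - a) / (gb - ga)
        gx = g(x)
        if abs(gx) < tol:
            return x
        # Keep the sub-interval that still brackets the zero.
        if ga * gx < 0:
            b, gb = x, gx
        else:
            a, ga = x, gx
    return x


def toy_expression(z):
    # Hypothetical stand-in for c + phi along z; its zero is at z = ln 4.
    return math.exp(-z) - 0.25


z0 = regula_falsi(toy_expression, 0.0, 3.0)
```

Since the zero is simple, the chord construction converges steadily, although one end of the bracket may remain fixed throughout.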

The iteration can be included in the course of the iteration cycle ( j = 1,. ..), as is described in the following formula:


$u_0^{(j)}$, $v_0^{(j)}$, $x_0^{(j)}$, $y_0^{(j)}$ are the components of the four-vector for $t = 0$ as a result of the $j$th cycle. $z_0^{(0)}$ is the zero position of $\Phi(z) + q = 0$. $z^{(0)}(t)$ is the inverse function of

so that the field functions formed in the first cycle appear as functions of $t$. If, in the field expression of Eq. (19) (right-hand side), we replace the four-vector by $\{u^{(1)}(t), v^{(1)}(t), x^{(1)}(t), y^{(1)}(t)\}$ obtained in the first cycle, Eq. (19) goes over to an ordinary differential equation for $z(t)$. Its solution $z^{(1)}(t)$, with the initial values $z^{(1)}(t = 0) = z_0^{(1)}$, $dz^{(1)}/dt\,(t = 0) = 0$, forms approximately a solution to the differential equation system (18), (19). Thereby, the initial value $z_0^{(1)}$ is determined by Eq. (117) with $j = 1$.

B. Calculation of the Abscissa Function $z(t)$

1. Iterative Determination

As a result of the $j$th iteration cycle, the values of the abscissa function $z^{(j)}(t)$ can be calculated stepwise and iteratively at the points $t_\nu$ of the $t$ raster by means of the formula

If we consider the components $(u^{(j)}, v^{(j)}, x^{(j)}, y^{(j)})$ of the four-vector as known functions of $t$, then Eq. (119) is an exact step representation of the solution $z^{(j)}(t)$ of the differential equation (19). It is, however, an implicit representation, since the value $z^{(j)}(t_{\nu+1})$ to be calculated with the help of Eq. (119) is already required on the right-hand side in the calculation of the field values $\varphi(r, r^*, z)$, $\partial\varphi/\partial z$, $\partial\varphi/\partial r$, and $\partial\varphi/\partial r^*$ in the field expressions (113) $(c + \varphi)$ and (21) $(c + \varphi)'$. The integral equation (119) can be solved iteratively if on the right-hand side the value $z^{(j-1)}(t_{\nu+1})$ calculated in the previous cycle is used in place of $z^{(j)}(t_{\nu+1})$.

2. Numerical Integration of the Differential Equation for $z(t)$

One might possibly consider determining the abscissa function $z^{(j)}(t)$ by solving the appropriate differential equation. As a result of the $j$th iteration the four-vector is given by (95) at the points of the $t$ raster. Inserted in Eq. (19), this yields an ordinary differential equation of the second order for $z^{(j)}(t)$, which can be integrated numerically after setting the initial values $z^{(j)}(t = 0) = z_0^{(j)}$, $dz^{(j)}/dt\,(t = 0) = 0$, by, for example, a Runge-Kutta procedure. Here use is made of the assumption, which is usually necessary in an iterative solution procedure of equation (119), that the electric and magnetic fields together with their derivatives in the mirror zone are given analytically in $r$ and $z$. For in each iteration cycle the mapping of the fixed $t$ raster is different, and the mapping points $z^{(j)}(t)$ of the $t$ raster do not, in general, match those of a given $z$ raster. Outside the mirror zone, in order to facilitate the transition from the $t$ raster to the $z$ raster, a transition zone is called for in which the $t$ raster is flexible and accommodates itself with respect to the given (equidistant) $z$ raster, in whose nodal points the values of the axial functions of the fields are defined. In the application of the Runge-Kutta procedure, this requirement may have the consequence that the step length $t_{\nu+1} - t_\nu$ has to be adjusted to a prescribed increase $z_{\nu+1} - z_\nu$ of the solution; numerical integration methods for the solution of differential equations are not in general adapted to this task.
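As an illustration of the Runge-Kutta alternative, the following minimal sketch integrates a second-order equation $d^2z/dt^2 = a(t, z)$ with the initial values $z(0) = z_0$, $dz/dt(0) = 0$ on an equidistant $t$ raster. The right-hand side `accel` is a hypothetical placeholder for the right-hand side of Eq. (19), which is not reproduced here; the toy case of a constant right-hand side corresponds to a uniform field, for which the solution is a parabola in $t$.

```python
def rk4_second_order(accel, z0, t_end, steps):
    """Classical fourth-order Runge-Kutta for z'' = accel(t, z), z'(0) = 0."""
    h = t_end / steps
    t, z, v = 0.0, z0, 0.0
    path = [(t, z)]
    for _ in range(steps):
        k1z, k1v = v, accel(t, z)
        k2z, k2v = v + 0.5 * h * k1v, accel(t + 0.5 * h, z + 0.5 * h * k1z)
        k3z, k3v = v + 0.5 * h * k2v, accel(t + 0.5 * h, z + 0.5 * h * k2z)
        k4z, k4v = v + h * k3v, accel(t + h, z + h * k3z)
        z += h * (k1z + 2 * k2z + 2 * k3z + k4z) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += h
        path.append((t, z))
    return path


def uniform_field(t, z):
    # Toy right-hand side (an assumption): constant, as in a uniform field.
    return -2.0


# Start at the reversal point z0 = 1 with dz/dt = 0 and integrate to t = 1.5.
path = rk4_second_order(uniform_field, 1.0, 1.5, 30)
```

The sketch works with a fixed step in $t$; as remarked above, adjusting the step so that the solution increases by a prescribed amount in $z$ is not what such integrators are designed for.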

3. The Transition Zone

On the basis of the given connections (119) between the conjugated step distances of the $t$ and $z$ raster, the solution of the equation

is easier in terms of $t^{(j)}_{\nu+1} - t^{(j)}_{\nu}$ than the solution in terms of the step distance in the $z$ raster, since the coefficient of the linear term has in each case a prescribed fixed value. The contribution of the integral is small in third order and yields, by integration by Simpson's rule,

Since Eq. (120) takes the form of a quadratic equation in $t^{(j)}_{\nu+1} - t^{(j)}_{\nu}$ with coefficients whose values in the equidistant nodal points in the $z$ raster of the transition zone can be calculated by the use of the four-vector (95) of the trajectory resulting from the $j$th iteration cycle,

In general, the coefficient of the quadratic term is small compared with that of the linear term (in a uniform field it is zero), and one solves the equation iteratively. With Eq. (122) an adjustment is made of the abscissa differences of the (equidistant) underlying $z$ raster of the transition zone to the (nonequidistant) abscissa differences of the particular overlaid $t^{(j)}$ raster of the transition zone. In order to be able to assign values to the abscissae $t^{(j)}$ themselves, one needs a knowledge of an abscissa pair $(t^{(j)}, z)$ whose partners $t^{(j)}$ and $z$ are correlated. In the mirror zone the equidistant $t$ raster predominates. To its zero point $t = 0$ is correlated, according to Eq. (117), the abscissa $z_0^{(j)}$ of the reversal point. The abscissae $z^{(j)}(t_\nu)$ correlated with the points $t_\nu$ of the (equidistant) $t$ raster of the mirror zone are calculated by means of Eq. (119). The (equidistant) $t$ raster has such a length that at least one mapping point $z^{(j)}(t_\nu)$ comes to lie in the transition zone. Its distance to the nearest neighbor point $z_n$ of the (equidistant) $z$ raster is known. Then Eq. (122) yields the corresponding distance in $t$ and thereby the abscissa $t^{(j)}$ correlated with this raster point $z_n$.

The requirement concerning the overlapping of the two rasters is based on the following reasoning. For the calculation of the elements of the phase-rotated step matrix (55), (56), the quadrature (50) is needed over the relevant step interval. In the region of the (equidistant) $z$ raster, the values of the integrand are available only in the nodal points. In order to carry out the quadrature with sufficient accuracy, nodal-position values outside the actual integration interval are also needed. When using symmetrically arranged quadrature formulae (with constant step lengths), for reasons of accuracy, at the start of the integration in the range in which the $z$ raster dominates, nodal point values from the transition zone are also needed.
The width of the zone is determined by the number of iterative cycles. The nodal position values in the transition zone are predetermined by the integration process in the $t$ raster. This is carried out in equidistant $t$ rasters, and the results, i.e., the matrices $S$ and $Q$ as well as the four-vector, are interpolated at the image points $t^{(j)}(z_n)$ of the underlying $z$ raster of the transition zone. This has the advantage that all quadratures (and also interpolations) can be carried out uniformly with symmetrically arranged formulae at constant step length.
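The iterative solution of a quadratic step equation whose quadratic coefficient is small compared with the linear one, as mentioned above for the transition zone, may be sketched as follows. The coefficients of Eq. (122) are not reproduced in this sketch; `linear`, `quadratic`, and `target` are hypothetical names for the fixed linear coefficient, the small quadratic coefficient, and the prescribed step in $z$.

```python
def solve_step(linear, quadratic, target, tol=1e-14, max_iter=50):
    """Solve linear*x + quadratic*x**2 = target for the small root x.

    Fixed-point iteration x <- (target - quadratic*x**2) / linear, started
    from the uniform-field value target / linear (quadratic term zero).
    """
    x = target / linear
    for _ in range(max_iter):
        x_new = (target - quadratic * x * x) / linear
        if abs(x_new - x) <= tol * max(1.0, abs(x_new)):
            return x_new
        x = x_new
    return x


# Toy numbers (assumptions): weak quadratic term, prescribed z step of 0.1.
step_in_t = solve_step(linear=2.0, quadratic=0.05, target=0.1)
```

The iteration contracts quickly as long as the quadratic contribution stays small; in a uniform field the starting value is already the solution.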


We understand by “field call” the evaluation of a field expression by the insertion of coordinates $(r, r', z)$, so we can say: In the region in which the $z$ raster predominates, “field calls” exist only in grating planes of the (equidistant) $z$ raster. For this, field representations of type (67) are suitable. The axial functions are primarily given at the nodal positions of the $z$ raster and need not be interpolated in $z$. In the region in which the $t$ raster predominates, i.e., in the region of the mirror and transition zone, “field calls” take place in the grating planes of the (equidistant) $t$ raster. Since the $z^{(j)}$ raster (as a mapping) that is correlated to the $t$ raster (as original) is in general not equidistant, it is necessary in the field representations of the type (67) with given axial functions in equidistant $z$ points to interpolate at the nonequidistant mapping points $z^{(j)}$. This requirement on the field representation in the mirror and transition zone means that the fields must be expressed analytically not only in $r$ but also in $z$. For an electron-mirror objective, as has, for example, been proposed in Hahn (1979) for the program-controlled modulation of a multiray bundle, simple field models are available in closed form, which are suitable for answering questions of contrast production.

IX. CONVENTIONAL LENSES

A. Fundamental Matrix $R(z)$

The expression (113) for $c + \varphi$ is everywhere (i.e., in the channel under consideration) different from zero, and it is usual to eliminate the variable $t$ from the equations of motion (18), (19). This is based on optical considerations, i.e., the interest in the properties of the beam complex is of primary importance and not the mechanical considerations, which are concerned with individual trajectories and the temporal relations of the motion. Tacitly, thereby, the usual space-time considerations are used as an auxiliary aid, in which the abscissa $z$ of the optical axis is used as the space variable and $t$ as the time variable. This approach is a hindrance to the decision, based on mathematical and optical principles, to give to the variable $t$ or the variable $z$ or the basis variable $\Omega$ precedence of independence from a problem-orientated point of view. For us, $t$ and $z$ are substitutes of a basis variable, whereby the substitution of a particular optical state can depend on local or global circumstances. This becomes clear from the basis-field coupling that we have set up in Eqs. (73) and (84) with $t$ as an independent variable and now have to carry out analogously with $z$ instead of $t$.


1. $S(t)$ as Basis

For $S(t)$ as a fundamental matrix of the fundamental system (68) the differential equation becomes

$F_p$ is an abbreviation for the differential equation expression formed with the amplitude $p$

and takes on, as a result of the basis-field coupling (84), the meaning of the field expression (81)

[…] $= H(r, r', z)$. (125)

This is transformed by the insertion of the inverse function (22) $z(t)$ and the four-vector (95) calculated in the process of iteration cycles, into a field function $F_p(t)$. A discrete solution of the differential equation (123) is the step matrix $S(t_{n+1})S^{-1}(t_n)$, whose elements can be calculated with the help of the procedure developed in the section Methods of Solution.

2. $R(z)$ as Basis

For R(z) as fundamental matrix of the fundamental system (70),

the differential equation takes the form

$F_y$ is an abbreviation for the differential expression formed with the eigenfunction $y(\Omega)$

[…]

and takes on, as a result of the basis-field coupling, for the purpose of satisfying the differential equation (20) for the trajectories,

[…] (129)

the meaning of a field expression

In order to rearrange these in a field function dependent only on $z$, one has to insert the four-vector (95) calculated in the process of iteration cycles in the nodal points of the $z$ raster. On the basis of the relation (60) existing between $p$ and $y$, we can transform $S(t)$ into $R(z)$

and eliminate $S(t)$ in Eq. (95). A discrete solution of the differential equation (127) is the step matrix $R(z_{n+1})R^{-1}(z_n)$, whose elements can be calculated with the help of the method developed in Methods of Solution. In the field expression (130) a problem arises with the additional term, as opposed to the field expression (125), since a double total differentiation of it is required. Clearly, this argues against the use of the fundamental matrix $R(z)$, if this is directly calculated analogously to the fundamental matrix $S(t)$. This disadvantage is not compensated by the advantage that the substitution (22) can be omitted.

3. Initial Matrix $R_0(z)$

The substitution (22) can be regarded as a component of a method for calculating a special initial matrix $R_0(z)$ that is based on the transformation (131). Belonging to the initial matrix

together with $S_0(t)$ and (22) for $t(z)$, is the fundamental system


which is a solution of the differential equation

With the special initial matrix (132), the disturbing term in the field expression is removed, and there remains for the matrix $P_R(z)$,

$P_R(z) = R_0^{-1}(z)R(z)$, (135)

the differential equation

$R_0$ has the basis representation

$1/Y$ is the amplitude and $\omega$ is the phase of $R_0$, and the differential equation for $P_R(\omega)$ has the form

Comparison with (104) shows that, by considering the existing transformation (61) between $p$ and $Y$ ($Y$ instead of $y$, $\omega$ instead of $\Omega$), $P_R(z)$ and $P(t)$ coincide, whereby the existing substitution between $z$ and $t$ is described by (22). Analogously to (109), there holds for the step matrix of $R$

$R(z_{n+1})R^{-1}(z_n) = R_0(z_{n+1})\,C^{-1}(\omega_{n+1})\,\mathcal{P}_{n,n+1}\,C(\omega_n)\,R_0^{-1}(z_n)$. (139)

The phase-rotated step matrix $\mathcal{P}_{n,n+1}$ is a two-row matrix as in (106), since

$q = -\dfrac{H}{(c + \varphi)Y^4}$

is real.

B. Paraxial Region

In the paraxial region the additional term in the field expression (131) does not cause trouble but ensures that the second derivative of the electrical potential in the field expression (130) vanishes.


The study of the paraxial optical properties of a particular lens defined by means of the axial functions $\Phi_0(z)$ and $\Psi_0(z)$ usually begins with the calculation of two independent linear solutions of the differential equation

$R'' + f(z)R = 0$. (141)

This differential equation is of the paraxial type and can be solved by means of the procedures given in Methods of Solution. A discrete solution of the differential equation (141) is the step matrix $R(z_{n+1})R^{-1}(z_n)$. For, by supplying its initial value $R(z_0) = 1$ at the axial point $z_0$ of the “lens centre,” the fundamental matrix $R(z)$ is the sequential product of step matrices. A disturbing term need not be eliminated, and we take as initial matrix $R_0$ in the product (135) a special solution of the differential equation

By supplying at the particular raster interval $(z_n, z_{n+1})$ the relevant initial values of the initial matrix

there holds for the initial matrix $R_0$, generating the corresponding local basis,

$R_0$ has the basis form (137) with

$\tan\omega = z - \dfrac{z_n + z_{n+1}}{2}, \qquad Y(\omega) = \cos\omega$. (145)

For $P_R(\omega)$ the differential equation has the form

this can be solved by means of the step matrix of $P_R$

$P_R(z_{n+1})P_R^{-1}(z_n) = C^{-1}(\omega_{n+1})\,\mathcal{P}_{n,n+1}\,C(\omega_n)$. (147)

By means of the local basis defined by Eq. (143), the rotation matrices at the limiting nodal points $z_n$ and $z_{n+1}$ of the interval satisfy

$C^{-1}(\omega_{n+1}) = C(\omega_n)$, with $\tan\omega_{n+1} = -\tan\omega_n = \dfrac{z_{n+1} - z_n}{2}$ and $\cos\omega_n = \left\{1 + \left(\dfrac{z_{n+1} - z_n}{2}\right)^2\right\}^{-1/2}$. (148)

The building blocks (50) to (52), out of which the elements of the phase-rotated step matrix $\mathcal{P}_{n,n+1}$ are formed according to formulae (55) and (56) (it is, as in (106), a two-row matrix since $q = -f/Y^4$ is real), are:

$a = -\displaystyle\int_{z_n}^{z_{n+1}} \left\{1 + \left(z - \dfrac{z_{n+1} + z_n}{2}\right)^2\right\} f\,dz,$

$b = \displaystyle\int_{z_n}^{z_{n+1}} \left\{1 + \left(z - \dfrac{z_{n+1} + z_n}{2}\right)^2\right\} \arctan\left(z - \dfrac{z_{n+1} + z_n}{2}\right) f\,dz,$

$c = -\left\{1 + \left(\dfrac{z_{n+1} - z_n}{2}\right)^2\right\}^2 \{f(z_{n+1}) + f(z_n)\}.$

C. Checking and Reconstruction

We now consider the procedure for the solution of the paraxial differential equation (141) as a development process of the step matrix (139) and as a solution of the differential equation (127) for the fundamental matrix $R(z)$. The field function $y^4F_y$ may not at first have the special form (140) but consists, as in (130), of two summands $F$ and $G$. We therefore write instead of (127)

[…]

and designate $R_{F+G}(z)$ as the optical equivalent of $F + G$. A slice of field enclosed within two vertical planes with abscissae $z_n$ and $z_{n+1}$ may be regarded as a partial lens, whose optical action can be described in the sense of a tangent mapping by the step matrix $R_{F+G}(z_{n+1})R_{F+G}^{-1}(z_n)$. For this reason, we also call the step matrix a partial-lens matrix and introduce into the paraxial optical case $F + G = f$ the notation

If $G$ is excluded ($G = 0$), the fundamental matrix $R_F$ is the equivalent of $F$. If $G$ is included, $R_F$ is an initial matrix in the transition from $R_F$ to $R_{F+G} = R_F P_G$. The coupling between $P_G$ and $G$ subsists in the differential equation (146) (with $G$ instead of $f$), whose solution is represented by the step matrix. The “processing” of $G$ into the building blocks (50) and (52) that enter into the elements of the phase-rotated step matrix $\mathcal{P}_{n,n+1}$ according to formulae (55) and (56) rests on the amplitude $1/Y$ and phase $\omega$ of the initial matrix $R_F$. (The actual basis-field coupling is described by the differential equation (54), for which (53) is an exact solution.) In order to set the process in motion, the provision of an initial matrix is necessary. In the case of the field function (130), the initial matrix was the optical equivalent of

$F = $ […].

In the case of the paraxial field function (140), $f$, the initial matrix is the optical equivalent of $F = 0$, and the process yields, with $G = f$, the fundamental matrix $R_f$. This is now the optical equivalent of $f$, which in the sense of the development process means that $F$ goes over into $f$ and $G$ into 0. In order that the process may once more assume an initial position, we have once more to extract from the basis formulation (137) of the initial matrix $R_F$ the nodal position values of amplitude and phase. On these is once more based the processing of a newly compounded field function $G$, which is represented by its nodal point values at the points of the $z$ raster. If we take $G = -f$, there arises as a result of this (second) process step the step matrix of the initial matrix (144) $R_0$

from which we began in the first process step. In this way, the partial-lens thicknesses $z_{n+1} - z_n$ are reproduced. Since the abscissae of the nodal points of the $z$ raster are prescribed in advance and are not used explicitly in the second process step, their comparison with the result (152) enables a check to be made on the accuracy of the calculation. If we consider the partial-lens matrices (151) ($n = 1, 2, 3, \ldots, N$) as primary data, we can reconstruct, with the aid of the above second process step, the abscissae of the splicing positions (vertical planes) in which two successive partial lenses touch. Here the supplying of the splicing position values $f_n = f(z_n)$ of the field function is again required. It is now worth noticing that we can also reconstruct, without prior knowledge of the splicing position values $f_n$, i.e., with only a knowledge of a sequence of partial-lens matrices, the partial-lens thicknesses and the partial-lens excitations. This is a result of the theory of micro-lenses developed below. This opens up the possibility of operating, in a corrective manner, on the step matrices calculated according to Methods of Solution so that these are self-consistent in relation to the $z$ raster and to the field, and also form a consistent solution of the differential equation (141).

X. THEORY OF MICRO-LENSES

A. Macro-, Partial- and Micro-Lenses

The optical properties of a conventional lens (macro-lens), whose axial field function (140) may first of all fall to practically zero outside the interval $(z_l, z_r)$, derive from the fact that each trajectory is a linear combination of two linearly independent solutions of the paraxial ray equation. Each object-side incident line (object ray) is optically conjugate to an exiting straight line (image ray). This collineation rests on the premise of continuous and steadily progressing trajectories that smoothly bind the object and image rays over the lens interval $(z_l, z_r)$.

1. Partial-Lens Image

If the interval $(z_l, z_r)$ lies inside the field, the object and image rays are represented by the trajectory tangents in the vertical planes $z = z_l$ and $z = z_r$, respectively, of this partial lens. The cardinal elements determined by the partial-lens matrix (151)

$R(z_r)R^{-1}(z_l) = $ […]

($z_{FA}$, $z_{FB}$ represent the abscissae of the object- and image-side focal points, $f_A$, $f_B$ the focal lengths, and $\Phi_A$ and $\Phi_B$ the values of the electric potentials in the object- and image-side vertical planes $z_l$ and $z_r$, respectively) describe the collineation, caused by the partial lens, of the object rays and the image rays in the sense of a tangent mapping. If the lens interval $(z_l, z_r)$ of a macro-lens is covered by $N$ gap-free and nonoverlapping partial-lens intervals $(z_{l,n}, z_{r,n})$ ($n = 1, 2, \ldots, N$), the lens matrix of the macro-lens is the product of the partial-lens matrices

[…] (155)

The concept of partial-lens arises in connection with the splitting up of a macro-lens according to its fields. The lens matrices are thereby the results of the solution of the differential equation of the trajectories for prescribed axial functions of the field and division of the axial step intervals. The arrangement of the partial-lenses along the optical axis ($z$ axis) is thought of spatially, and we speak of a “partial-lens image” when we are looking at the situation chiefly from a spatial point of view. Consequently, the so-called first process step in IX.C, in which the partial-lens matrices are calculated, is strongly determined by the characteristics of partial-lens imagery.

2. Micro-Lens Image

If we regard the lens matrix as an expression of the optical action and think of this, as in the designated second process step, as given beforehand, we can then interpret the product representation (155) as a splitting up of the macro-lens according to its optical action. The component part in this kind of splitting we call a micro-lens. The micro-lens can be considered as an “event” whose action is characterized by the change from one state to another; the first state characterizes the situation before the micro-lens action, the other the situation after. By “state” we mean the structure of a ray complex in an overall field-free space (ray space), which comprises the image-side field-free space $z \ge z_{r,n}$ of the $n$th partial-lens and the object-side field-free space $z \le z_{l,n+1}$ of the $(n + 1)$th partial-lens, where $z_{r,n} = z_{l,n+1}$ is the common “dotted” splicing position. We speak of the “micro-lens image” when we mainly have it in mind as a time concept. In the micro-lens image, we are concerned with a time series ($n = 1, 2, \ldots, N$) of those ray spaces that touch the space of the partial-lens image in its “dotted” splicing position. In the partial-lens image, the “dotted” splicing positions are the nodal points of the $z$ raster.
In the role of a fixed-space reference system, the $z$ raster exists as a whole, i.e., it is always at hand. The optical equivalent of the field function appears in the form of trajectories that are represented by their ordinates at the nodal points of the $z$ raster. In the micro-lens image, from the whole of the $z$ raster only one nodal position is present, i.e., of the nodal points of the $z$ raster, there exists (momentarily) in the ray space only one dotted splicing position. The metrics in the ray space are thus decoupled from that of the $z$ raster. The spatial separation of two consecutive dotted splicing positions, i.e., the partial-lens thickness, is not “measurable” in the traditional sense. For this to be possible, the two ray spaces, correlated with both raster points as dotted splicing positions, must be present simultaneously. That is, however, not possible, since the two ray spaces, through the action of the micro-lens (as “event”), do not coincide in time but are separated. Moreover, in the step matrix (151) representing the micro-lens, no information is “stored” concerning the step length $z_{n+1} - z_n$, which was in fact required during its formation phase in the first process step previously mentioned. In the micro-lens image we consider the step matrix as given and reconstruct thickness and excitation of partial lenses in such a way that these, carried over into the partial-lens image, have their optical equivalent in precisely these step matrices.

3. Reconstruction of Trajectories

We consider the differential equation of paraxial type (141). Let $R_1(z)$, $R_2(z)$ be a fundamental system and let $R$ be the fundamental matrix according to (126). We specify two neighboring nodal points of the $z$ raster as abscissae $z = z_l$ and $z = z_r$ of two vertical planes of a partial lens. This possesses, for $z < z_l$, an object-side field-free space and, for $z > z_r$, an image-side field-free space. In the region between $z_l$ and $z_r$, the field function $f(z)$ is optically active. The fundamental matrix goes over, on the object side, into $R_l(z)$

and on the image side into $R_r(\bar z)$

$R_r(\bar z) = \begin{pmatrix} 1 & \bar z - z_r \\ 0 & 1 \end{pmatrix} R(z_r)$. (157)

The axial points with the abscissae $z$ and $\bar z$ are optically conjugate with respect to the partial lens $(z_l, z_r)$ if in

$R_r(\bar z)R_l^{-1}(z) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 1 & \bar z - z_r \\ 0 & 1 \end{pmatrix} R(z_r)R^{-1}(z_l) \begin{pmatrix} 1 & -z + z_l \\ 0 & 1 \end{pmatrix}$ (158)

the matrix element $a_{12}$ is zero. $R_l(z)$ describes a state of $R$ in the object space of the partial lens and, correspondingly, $R_r(\bar z)$ in the image space. Both the object space and the image space extend from $z = -\infty$ to $z = +\infty$. A solution $y(z)$ of the differential equation (141) may have the value $y_l'$, $y_l$ in $z_l$ and the value $y_r'$, $y_r$ in $z_r$. Between these two line elements there exists the transformation:

In the partial-lens image, both line elements exist in both vertical planes of the partial-lens simultaneously, next to each other. This idea is applicable to all partial-lenses of the macro-lens. A solution curve is shown as an envelope of the line elements defined in the nodal points of the $z$ raster. In the partial-lens image, the reconstruction of trajectories is relatively simple on production of the partial-lens matrices, since the step lengths remain unchanged and so may be regarded as given in advance. This method of reconstruction of trajectories is not transferable to the micro-lens image.

4. Identity and Optical Conjugation

a. Basis Function

The function (basis function) P(z)

defined by the above quadratic form, satisfies the differential equation 1 PI1 2 P

- --

(1

;)2

-

)(;

2

+ f(z)

= 0.

The quadratic form with i12

= D,,D2, - D f 2

>0

is positive definite. With a partial-lens for which f ( z ) = 0 outside its lens interval zI 4 z I z,, P(z)goes over, in object space, into the object side tangent parabola

and in the image space into the image-side tangent parabola 12

The geometrical defining components of both tangent parabolae are the abscissae zpASand zpeSof their apex points, their vertical heights PASand PBs,as well as their curvature parameters


b. Basis Variable

When we define an axial point (object) in relation to the mapping of a partial lens in such a way that we place it at the starting point (vertex) or, in general, at the intersection point of a ray bundle, then, for a unique determination of its position on the axis, we must add whether the relevant intersection point is meant to be asymptotic or real on the object side or asymptotic on the image side. In the micro-lens image, the identity of a ray bundle that makes its appearance in successive ray spaces is ensured by optical conjugation. To this end, the relation between the intersection point of a ray bundle in the ray space and the variable that localizes the intersection point must be unique and the description of the optical conjugation metrically simple. Since the optical state of the $z$ axis in relation to the ray bundle is three-valued, we replace the axial variable $z$ by the basis variable $\omega$ according to the following definitions:

n n a. Object-space condition - -- - < w < LIP 2

2

dz

n

n + + w ( z ) = jl vjl 2

~

71 --

-

z - ZpAS

=-

cot

2

1 (? + ->. j l 2

n n b. Lens-space condition - - < o I 2 2

c. Image-space condition

n n <w <2 2

-

n + Vjl ~

The parameter $\mu$ serves for normalizing the interval of the lens space on the basis length, $\omega(z = +\infty) - \omega(z = -\infty) = \pi$:

$$\frac{\pi}{\mu} = \int_{-\infty}^{+\infty} \frac{dz}{P}.$$


The functions

form a fundamental system. Two axial points $z$ and $\bar z$ are optically conjugate when their $\omega$ abscissae satisfy the condition

$$\frac{\lambda}{\mu}\,(\bar\omega - \omega) = m\pi, \tag{173}$$

where $m$ is an integer that depends on the particular spaces in which $z$ and $\bar z$ may lie. Since we can take it for granted that the action of the micro-lens is weak ($\lambda/\mu = 1 + \varepsilon$ with $0 < \varepsilon \ll 1$), $m = 2$, for example, for the case that $\omega$ lies in the object space and $\bar\omega$ in the image space.
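The conjugation condition can be turned into a small helper. This is only a sketch: it assumes the condition has the form $(\lambda/\mu)(\bar\omega - \omega) = m\pi$, and the function name and the numerical values are hypothetical.

```python
import math


def conjugate_omega(omega, lam_over_mu, m):
    """Return omega_bar such that lam_over_mu * (omega_bar - omega) = m * pi."""
    if lam_over_mu <= 0.0:
        raise ValueError("the eigenvalue lambda/mu is assumed to be positive")
    return omega + m * math.pi / lam_over_mu


# Hypothetical weak micro-lens: lambda/mu = 1 + eps, object point at omega = -pi.
eps = 0.01
omega_bar = conjugate_omega(-math.pi, 1.0 + eps, m=2)
```

For a weak lens the conjugate abscissa lies slightly below $+\pi$, that is, inside the interval assigned to the image space when the object abscissa lies in the object space.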

5. Eigenfunctions and Basis Elements

a. Eigenfunctions

We replace the basis function $P(z)$ by the eigenfunction $Y(\omega)$ of the micro-lens

$$Y^2(\omega) = \frac{\lambda}{P(z)}. \tag{174}$$

The eigenfunction satisfies the differential equation

$$\ddot Y + (1 - H(\omega))\,Y = 0. \tag{175}$$
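The following derivation sketches why an equation of this type arises. It assumes that (141) reads $y'' + f(z)\,y = 0$, that $Y^2 = \lambda/P$, and that the dot denotes differentiation with respect to $\tau = (\lambda/\mu)\,\omega$ with $d\tau = Y^2\,dz$. These are assumptions made for illustration, and $\rho = \sqrt{P}$ is an auxiliary symbol introduced here.

```latex
\begin{aligned}
\rho &= \sqrt{P}, \qquad
\rho'' + f\rho = \frac{\lambda^2}{\rho^3}
  \quad\text{(equivalent to the differential equation for } P\text{)},\\
Y &= \frac{\sqrt{\lambda}}{\rho}, \qquad
\dot Y = \frac{1}{Y^2}\frac{dY}{dz} = -\frac{\rho'}{\sqrt{\lambda}},\\
\ddot Y &= \frac{1}{Y^2}\frac{d\dot Y}{dz}
  = -\frac{\rho^2 \rho''}{\lambda^{3/2}}
  = \frac{f\rho^3}{\lambda^{3/2}} - \frac{\sqrt{\lambda}}{\rho}
  = \Bigl(\frac{f}{Y^4} - 1\Bigr)\,Y .
\end{aligned}
```

Under these assumptions the impact function is $H = f/Y^4$, which vanishes wherever the field function $f$ vanishes.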

It has the boundary values

and is positive in the interval $(-\pi/2, \pi/2)$. The dots now indicate differentiation of $Y$ with respect to the basis variable multiplied by the eigenvalue $\lambda/\mu$. The vertex abscissae $z_l$, $z_r$ of the partial lens correspond to the abscissae $\omega_l$, $\omega_r$ with $-\pi/2 < \omega_l < \omega_r < \pi/2$ on the "basis length" of the micro-lens. The "impact function" $H(\omega)$ is zero outside the interval $(\omega_l, \omega_r)$ and within the interval is related to the field function $f(z)$ of the differential equation (141) through $z(\omega)$:

$$H(\omega) = \frac{f(z)}{Y^4(\omega)}. \tag{177}$$

b. Tangent Parabola and Osculating Eigenfunctions

Outside the interval $(\omega_l, \omega_r)$, we have as a solution


with the “amplitude parameters”

Its connection with the determining components of the tangent parabola (163) and (164) consists of

The apex abscissae $z_{PAS}$, $z_{PBS}$ of the tangent parabolae are independent of the corresponding $\omega$ abscissae $\omega_{AS}$ and $\omega_{BS}$, since they bisect the asymptotic interval in question.

For the distance to each splicing position zI, z,

The abscissa z can be written as a substitute for w in the object and image space z - z l =+[cot;(;+o~)-cot;(;+w)], 1

[““ti (-5 +

wr) - cot;

(-5+ 41.

(183)

c. Basis Elements

We characterize the two amplitude parameters $u_\omega$, $v_\omega$, the eigenvalue $\lambda/\mu$, and the two vertex abscissae $\omega_l$, $\omega_r$ as the elements of a basis. They form a complete set of five definitive parameters for the description of the optical action of a micro-lens. Two parameters may be freely chosen; the other three are determined by the matrix elements of the micro-lens. This becomes clear from the representation:

V sin?P

-9 = - U

(-:

sin- P 2

+ ol), + o1

)

1

I

- r = uvsin-?r, 2 P

I

sin - (or- a,)

-1 -2s = wv sin! P

P

(E2

+ ,,)sin:

(-;

+ or)'

(184)

One obtains them if one forms the step matrix (153) with the fundamental matrix $R(z)$ in the basis form (137)

remembering that

$$\frac{\lambda}{\mu}\frac{d\omega}{dz} = Y^2.$$

The meaning of the basis elements for the formation of the lens matrix arises from the product representation

The factors in the matrix product from right to left can be explained as follows: insertion of the spacing $z_{PAS} - z_l$, elliptical contraction ($u$), twofold rotation in the same sense through $\lambda\pi/2\mu$, elliptical dilation ($v$), and insertion of the spacing $z_r - z_{PBS}$.

B. Basis Transformation

1. External and Internal Aspects of the Concept "Micro-Lens"

The definition, previously given, of the concept of "micro-lens" leaves open the question of to what extent its optical action as an optical component of a macro-lens may be regarded as primary data. This question is closely allied to the formation process of the micro-lens, and its answer depends on which of two aspects is regarded as primary.

Under the first aspect, which we shall call the external aspect, the micro-lens is an operator in the form of its lens matrix, which gives rise to a collineation between two ray spaces, i.e., object and image space. The procedure developed in IV for the formation of the step matrix as a discrete solution of the differential equation (141) can be understood in this connection as a formation process.

Under the second aspect, which we call the inner aspect, we consider the micro-lens not in its external optical action but in its representation as an invariant of the basis transformation. Under this aspect, the micro-lens is flexible on the basis of the two degrees of freedom in the group of the basis transformations that leave the optical action invariant to the outside. The reconstruction of the basis elements can be understood as a formation process that, for the purpose of fulfilling the requirements of consistency, reacts correctively on the lens matrix through "phantom" matrices.

In connection with the substitution of the axial variable $z$ by the basis variable $\omega$, we have introduced the concept of "basis length" (Basis-stab) with the meaning "axial component" as the carrier of the basis variable. If the eigenvalue $\lambda/\mu$ of the micro-lens is not less than unity, the interval $-3\pi/2$ to $+3\pi/2$ is adequate for mapping the $z$ points of object, lens, and image spaces by means of the substitutions introduced in (166) to (170) onto the $\omega$ points of the basis length. With the provision of five basis elements, the basis is determined. It is sensible to furnish each basis with its own basis length as a carrier of its own basis variable. Insofar as the basis length has the significance of a one-dimensional coordinate system (coordinate axis), the micro-lens space is similarly endowed with the implicit concept of object, lens, and image space.

By a basis transformation, we understand a transformation of the basis variable that is brought about by a change of basis. We call one the $\omega$ basis, i.e., a basis with the basis elements $(u_\omega, v_\omega, \lambda/\mu, \omega_l, \omega_r)$, and the other a $\varphi$ basis with the basis elements $(u_\varphi, v_\varphi, \sigma/\tau, \varphi_l, \varphi_r)$. The invariance of the lens matrix under a basis transformation expresses itself in the existence of the four equations given in (184), which can be reduced to three defining equations, since the determinant $qt + rs$ of the lens matrix has the identical value 1 in the basis variables. The basis transformation is the counterpart of the external action of the micro-lens that is described by the lens matrix and exerts a collineation between two ray spaces, the object space and the image space. The concepts of external optical action, lens matrix, collineation, and ray space are bound up with the definition given under the external aspect of the concept of micro-lens. To the inner aspect of the concept of micro-lens belong the concepts of basis transformation and basis length.

Let us think of given objects in object space. Within the framework of this one-dimensional theory, this means axial points that are marked out by intersection points of exiting ray bundles; the optical state they represent is considered to be retarded. By means of the substitution (166), valid for the object space, this state is transferred to the object-side interval of the $\omega$-basis length and by means of the optical conjugation (173) is copied onto the $\omega$ interval provided for the lens space. Anticipating the optical action of the micro-lens, the optical state in the image-side ray space is considered to be advanced and is marked out by intersection points of incident ray bundles. By means of the substitution (164) (with $\varphi$ instead of $\omega$), this state is transferred to the image-side interval of the $\varphi$-basis length and by means of the optical conjugation (173) is copied onto the $\varphi$ interval of the lens space.

The basis transformation enables one to compare simultaneously, from their identity, both optical states, which do not exist simultaneously in the micro-lens image that is endowed with the external aspect. Of service here is the mapping of the micro-lens space on itself ($\omega \rightleftarrows \varphi$), and this mapping is possible on the basis of the existence of the group of basis transformations whose invariant is the optical action of the micro-lens.

2. Representation of the Basis Transformation

We can split the lens matrix into the product

and set out for the two factors the following identities:


An


An

An

By $q$, $r$, $t$ we understand the quantities defined in the expressions (184). Since the left-hand side of (189) is basis invariant [the same also holds for (190)], we can, on the right-hand side of (189), formally replace the basis elements $(u_\omega, v_\omega, \lambda/\mu, \omega_l, \omega_r)$ of the $\omega$ basis by $(u_\varphi, v_\varphi, \sigma/\tau, \varphi_l, \varphi_r)$ of the $\varphi$ basis. If we then rearrange the lens matrix according to (188) and subsequently eliminate it by means of the representation (187), three different forms, $M_A$, $M_L$, $M_B$, arise that are correlated with the transformations of the basis variables $\omega$ and $\varphi$ in the object, lens, and image spaces, respectively. The meaning of the equations

$$M_A = M_L = M_B \tag{191}$$

will be discussed below.


. In sin - P 2 . I n -sin - P 2

I n cos - P 2

I n

cos - P 2

V,

O

ljl

1

-v;c o t q -T; + c p r )

The equation $M_A = M_B$ implies the invariance of the lens matrix with respect to a basis transformation. Since the refractive power $-r/2$ of the micro-lens,

$$-\frac{r}{2} = -u_\omega v_\omega \sin\frac{\lambda}{\mu}\pi = -u_\varphi v_\varphi \sin\frac{\sigma}{\tau}\pi, \tag{195}$$

is a basis invariant, there follows from $M_A = M_L$ an expression for the separation of the apex abscissae $z_{PAS}$ and $z_{QAS}$ of the basis functions $P_A(z)$ and $Q_A(z)$ of the object space, on which the $\omega$ or $\varphi$ basis, respectively, is founded.

ZQAS

-

There follows analogously from the equation $M_L = M_B$ an expression for the distance between $z_{PBS}$ and $z_{QBS}$ of the basis functions $P_B(z)$ and $Q_B(z)$ of the image space


3. Basis Transformation in Object and Image Space

The basis transformation in the object or in the image space can be extracted from (183), since the basis variables $\omega$ and $\varphi$ belong to the same $z$ point (as primitive form). They can be written in vectorial form by means of the transformation matrices defined in (192) and (194), respectively. In object space one has

One of these two equations serves to define the quantity p

and the other contains the invariance of (183)z - z,. Analogously, because of the invariance of (183) z - z r , one has in the image space cos - cp

cos --w

$V_A(\varphi)$ and $V_B(\varphi)$ are, analogous to $Y_A(\omega)$ and $Y_B(\omega)$, respectively, the osculating eigenfunctions of the $\varphi$ basis.

4. Basis Transformation in Lens Space

The basis transformation in the lens space is derived from the invariance of the differential

$$dz = \frac{\lambda}{\mu}\frac{d\omega}{Y^2} = \frac{\sigma}{\tau}\frac{d\varphi}{V^2} \tag{202}$$

and the condition that the boundary points of the lens space in $\omega$ and $\varphi$ should have in all cases the values $-\pi/2$ and $+\pi/2$. Let $Q(z)$ be a second solution of the differential equation (161) with the curvature parameter $\sigma$ (instead of $\lambda$) and the normalizing parameter $\tau$ (instead of $\mu$). This can be expressed, as was the case with $P$, in a positive definite quadratic form

$$Q(z) = Q_{11}R_1^2 - 2Q_{12}R_1R_2 + Q_{22}R_2^2, \tag{203}$$

whereby, analogous to (162), the coefficients satisfy the relation

$$\sigma^2 = Q_{11}Q_{22} - Q_{12}^2. \tag{204}$$

If we use $Q(z)$ as a basis function instead of $P(z)$, we denote the basis variable as $\varphi$ (instead of $\omega$), the eigenfunction as $V(\varphi)$ [instead of $Y(\omega)$], and the basis elements as $u_\varphi$, $v_\varphi$, $\sigma/\tau$, $\varphi_l$, $\varphi_r$. Between $Q(z)$ and $V(\varphi)$ there exists the relation (174)

$$V^2 = \frac{\sigma}{Q}.$$

Q'

From (202) we have for $\varphi(\omega)$ the differential equation

in which the coefficients $L$, $M$, $N$ depend on the elements of the $\omega$ and $\varphi$ basis. From (204) one can immediately write

$$L^2 - M^2 - N^2 = 1. \tag{207}$$

The general solution of (205) may be written in the form

The complex matrix elements $a$ and $b$ ($a^*$ and $b^*$ denote the complex conjugates of $a$ and $b$) with the normalization

$$aa^* - bb^* = 1 \tag{209}$$

have the following meaning:


In (208) the parameter $\sigma/\tau$ determines the basis length $\varphi(\omega = +\pi/2) - \varphi(\omega = -\pi/2)$ and the integration constant $\beta$, the basis position. With

we put (208)in the real form

and the coefficients L , M , N are represented by the elements in the matrix M

(" MN

L + N )=MINI.

The transformation (212) with (206) for $\rho(\omega)$ is interpreted as a reflection of the micro-lens space on itself. This space consists of the interval extending from $-\pi/2$ to $\pi/2$ on the basis length, whose metric is described by (202). An invariant of the metric is the differential displacement $dz$. In contrast to the transformation (198) in the object space (and analogously to (200) in the image space), where an axial point (object) exists as a primitive image of $\omega$ and $\varphi$, $\omega$ and $\varphi$ have no corresponding primitive image in the lens space. This is, however, not necessary, since the correlation between the basis variables $\omega$ and $\varphi$, in view of the required invariance of (202) because of (212), is already predetermined. One need only determine, therefore, the two parameters $\sigma/\tau$ and $\beta$ in (210). If we put $M_L$ into (212) with (193),

$$M = M_L, \tag{214}$$

we have so arranged both these parameters that the boundary points $-\pi/2$ and $+\pi/2$ of the $\omega$ basis are, at the same time, boundary points of the $\varphi$ basis

The reflection matrix $M$ of the basis transformation (212) can be defined in two ways. If we consider $\rho$ (positive) to be defined by (206), since from (207) the coefficients $L$, $M$, $N$ are fixed in advance, the elements of the matrix $M$ are determined by (211) and (210), when the eigenvalue $\sigma/\tau$ and the phase constant $\beta$ are so determined that the condition (215) is fulfilled. Thus in terms of the function $\omega(\varphi)$, now defined in (212),

$$V(\varphi) = \rho(\omega)\,Y(\omega) \tag{216}$$

is the reflected eigenfunction, on the $\varphi$ basis, in the lens space.


On the other hand, if we consider in (212) the matrix $M$ defined by (214) with (193) as $M_L$, since the basis elements $u_\omega$, $v_\omega$, $\lambda/\mu$ and $u_\varphi$, $v_\varphi$, $\sigma/\tau$ are predetermined (fixed) from a consideration of (195), then (212) is identical with the two determining equations (199) and (201) for $\rho$. Now $\omega$ and $\varphi$ in (199) are basis variables in object space and in (201) basis variables in image space, and to this extent an imaging is described by means of (212) with $M = M_L$, between object and image space, as a transition of a fixed $\omega$ basis in object space to a fixed $\varphi$ basis in image space as follows:

1

1

o\T1

or by inverting and exchanging the two bases 1 up

a

z 2

If we eliminate the action of the lens matrix in (217) by inserting the product representation (187), (217) $M_L$ goes over into (194) $M_B$. If we eliminate the action of the lens matrix in (218) in the same way, (218) $M_L$ goes over into (192) $M_A$. The content of the basis transformation (212) with $M = M_L$ in the form (217) or (218) is the imaging equation for conjugate axial points $z$ and $\bar z$ of the object and image space.

$(z - z_{FA})(\bar z - z_{FB}) =$    (219)

C. Micro-Lens Series

1. Basis Transport

Two successive micro-lenses (index $n$ and $n + 1$) have a common splicing position $z_{r,n} = z_{l,n+1}$. We denote their basis variables as $\omega^{(n)}$ and $\varphi^{(n+1)}$. Their lens matrices are determined by their basis elements $u_{\omega,n}$, $v_{\omega,n}$, $(\lambda/\mu)_n$, $\omega_{l,n}$, $\omega_{r,n}$ and $u_{\varphi,n+1}$, $v_{\varphi,n+1}$, $(\sigma/\tau)_{n+1}$, $\varphi_{l,n+1}$, $\varphi_{r,n+1}$, respectively. The transformation of the basis variable is of the type (212)

The matrix $M_{n+1,n}$ stems formally from (218) if we denote the elements of the $\omega$ basis by index $n$ and the elements of the $\varphi$ basis by index $n + 1$. Since the transition from the $\omega$-image space basis into the $\varphi$-object space basis takes place with no optical action, the lens matrix in (218) may be replaced by the unit matrix $E = 1$.


The content of the transformation (220) with (221) $M_{n+1,n}$ is the definition of $\rho_{n+1,n}$,

$$\rho_{n+1,n} = \frac{V_{A,n+1}(\varphi)}{Y_{B,n}(\omega)}, \tag{222}$$

and also the identity of $z - z_{r,n}$ and $z - z_{l,n+1}$. By a basis transformation of $\omega^{(n)}$ on $\varphi^{(n-1)}$

(..l.I-l (T

cos-cp

i-1: II

cos --o = pn-l*nMn-lVn[-sin

(223)

the matrix $M_{n-1,n}$ arises formally from (217) if the elements of the $\omega$ basis are provided with index $n$ and the elements of the $\varphi$ basis with index $n - 1$. The transition from the $\omega$ basis into the $\varphi$ basis takes place with no optical action, and the lens matrix in (217) may be replaced by the unit matrix $E = 1$.

The content of the transformation (220) and (224) is the definition of $\rho_{n-1,n}$

and the identity of $z - z_{r,n-1}$ and $z - z_{l,n}$, whereby $z_{r,n-1} = z_{l,n}$ is the common (dotted) splicing point of the two micro-lenses with index $n - 1$ and $n$.

2. Coherence

The basis transport transformation (220) has two degrees of freedom with respect to the initial basis of the micro-lens with index $n$, since from the five definitive basis elements of the micro-lens of index $n + 1$, three are fixed by (184). Therefore, for fixing the micro-lens basis (index $n + 1$), two further conditions must be fulfilled. We denote as coherence conditions the two closely related expressions

$$(u_\varphi)_{n+1} = (v_\varphi)_n. \tag{227}$$

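In this case, the transformation matrix $M_{n+1,n}$ takes on the simple form of a rotation matrix

$$M_{n+1,n} = C\!\left[\left(\left(\frac{\sigma}{\tau}\right)_{n+1} - 1\right)\frac{\pi}{2} + \left(\left(\frac{\sigma}{\tau}\right)_{n} - 1\right)\frac{\pi}{2}\right], \tag{228}$$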

and $\rho_{n+1,n} = 1$. This means that the osculating eigenfunctions $V_{B,n}(\varphi^{(n)})$ and $V_{A,n+1}(\varphi^{(n+1)})$ are the basis-related descriptions of one and the same "object."

This object is an eigenfunction $V(\varphi)$ of the macro-lens, which appears in the image space of the $n$th micro-lens as

and in the object space of the $(n + 1)$th micro-lens as

Between the basis variables $\varphi^{(n)}$ and $\varphi^{(n+1)}$ there exists, because of the coherent bonding, the relation

The integer $m$ depends on which of the three spaces of the relevant micro-lens the basis variables $\varphi^{(n)}$ or $\varphi^{(n+1)}$ happen to occupy, respectively. If both basis variables lie in their respective lens spaces, then $m = +1$. The description, with respect to a coherent basis, of an eigenfunction $V(\varphi)$ of a macro-lens in the lens space of the $n$th micro-lens is as follows:


Between the basis variable $\varphi^{(n)}$ of the $n$th micro-lens and the basis variable $\varphi$ of the macro-lens in the vertical plane interval $\varphi_l^{(n)} \le \varphi^{(n)} \le \varphi_r^{(n)}$, we have as a result of (202)

Since here $V_n(\varphi^{(n)})$ agrees with $V(\varphi)$ in the correlated points $\varphi^{(n)}$ and $\varphi$, there exists a linear transformation in the relevant splicing or raster interval $\varphi_n \le \varphi \le \varphi_{n+1}$, respectively,

The eigenvalue $\sigma/\tau$ of the macro-lens is determined in such a way that, in the macro-lens space, the boundary points $\varphi = \pm\pi/2$ of the $\varphi$ scale correspond to the $\pm\infty$ points of the $z$ axis. The integration constants $\delta_n$ serve the purpose of fulfilling the conditions of the gap-free and overlap-free connections of the partial-lens intervals with the splicing positions

$$\varphi_{r,n-1} = \varphi_{l,n} = \varphi_n, \qquad \varphi_{r,n} = \varphi_{l,n+1} = \varphi_{n+1} \tag{235}$$

on the $\varphi$-scale of the macro-lens

f [(;J

0

? - 1 = n=

1

-

I],

n-1 ~ n = = L n = ~ , , n - l =

q;)v-l];-jn[($-

v=l

11;.

(237)

The analytical expansions of the osculating eigenfunction (230) and (229) in the relevant partial-lens interval are

The eigenfunction $V(\varphi)$ of the macro-lens can be derived from this, since the constant amplitudes and phases in the lens space of the relevant micro-lens may be considered as continuous and differentiable functions of the basis variable $\varphi$ of the macro-lens.

The procedure of the coherent-basis transport yields the abscissae of the nodal points (splicing positions) and the nodal values of $u(\varphi)$ and $\delta(\varphi)$. Further, in these nodal points

(240)

so that $V(\varphi)$ is the envelope of the coherent series of osculating eigenfunctions.

3. Recursion Formulae for a Series of Coherent Bases

Given $N$ partial-lens matrices

whose product (155) represents the action of a macro-lens, we define the splicing plane of two neighboring partial lenses (index $n = m - 1$ and index $n = m$, i.e., $z_m$) as "lens center." We go over to the micro-lens image and fix in each of the two micro-lenses separate bases that are coherent with one another. The fixed basis of the micro-lens ($m$) is an initial basis for a progressive determination of the coherent basis elements of all micro-lenses with index $n > m$, and the fixed basis of the micro-lens ($m - 1$) is an initial basis for a progressive determination of the coherent basis elements of all micro-lenses with $n < m - 1$. Both initial bases are fixed by

With these initial conditions, the coherence conditions (226) and (227) are fulfilled. $l$ is a length that, with $l = 1$, represents a unit length to which all other dimensions can be related (for example, the focal length of a macro-lens).

a. Recursion Formulae for $n \ge m$


($[; +

-5+

cpi"'] = (;)n-l[

I+.,

40, 0 - 1)

(244)

1351

tan[($+[(;)n-= -tan[(:).

-

1171

+

%.n%.

cos[(;)n

n

-

-{

-2sn++cot(;)n-l[-;+ t

IIn ..-1)]}

uv,n

b. Recursion Formulae for $n \le m - 1$

cot[(;)n

- 1171 =

g& t

+ .I.+"].

- cot 1

Uv,n

=

{-tn+$cot(;)n+l[;+

uwl

-

COS[($

cpi.+'i]},

lIn

($[-; + cpj"i] + [($

+ .y+')]

= (;)n+l[;

tan{(;)$)

.-

]

=tan[(:).

-l

- 71,

1]5}

=+.qJq Uq,nVv.n

(245)

If the refractive power of a micro-lens ($n$) vanishes, $r_n = 0$ and $(\sigma/\tau)_n = 1$.

4. Consistency

In the $N + 1$ nodal points $\varphi_n$ on the $\varphi$-scale of the macro-lens, the values of the functions $u(\varphi)$ and $\delta(\varphi)$ are determined ($n = 1 \ldots N$):

The nodal position values of the eigenfunction $V(\varphi)$ are

and these define the eigenfunction $V(\varphi)$ of the macro-lens within the limited accuracy due to the discrete division of the optical action. The thickness of the $n$th partial lens is reconstructed from the relation (202)

By numerical integration, nodal point values in the vicinity of the integration interval can be included. The situation is thus similar to that which arose previously in the determination of the building blocks (50) with a prescribed field function in a discrete raster. Since here, however, the values of the derivatives of the integrand are known from (240), advantageous integration formulae can be used, based on Hermitian interpolation. We assume that each micro-lens has a refractive power ($-r/2$) that is greater than zero. In that case, the phase function $\delta(\varphi)$ increases monotonically in the interval $-\pi/2$ to $+\pi/2$ and the inverse function $\varphi(\delta)$ is single valued. With (239), $V(\varphi)$ follows from (240)

[f

]+

-cos - q - 6 ( q )

(250)

Integration yields the consistency condition

If $V(\varphi)$ is to be an eigenfunction of the macro-lens, it must satisfy the differential equation (175)

Since $V(\varphi)$ is reconstructed, we can consider this equation in connection with (177) as a determining equation for the reconstructed field function $f(z)$

(253)

f(z) = V4((P)H(d= u4(v)c0s2 z


and obtain for the reconstructed excitation of the micro-lens

$$\int_{z_n}^{z_{n+1}} f(z)\,dz = \int_{\delta_n}^{\delta_{n+1}} u^2[\varphi(\delta)]\,d\delta. \tag{254}$$

Also here, the derivative of the integrand in the nodal positions $\delta_n$ is known, in view of (250), so that numerical integration based on Hermitian interpolation may be used. In general, the reconstructed magnitudes (249) and (254) (right-hand side) will differ by small amounts from the values on the left-hand side, which are to be considered as given in advance.

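$$-\frac{1}{2}\hat r_n = \int_{z_n}^{z_{n+1}} f(z)\,dz - \int_{\delta_n}^{\delta_{n+1}} u^2[\varphi(\delta)]\,d\delta, \tag{255}$$

$$-2\hat s_n = (z_{n+1} - z_n) - \frac{\sigma}{\tau}\int_{\varphi_n}^{\varphi_{n+1}} \frac{d\varphi}{V^2(\varphi)}. \tag{256}$$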
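The kind of integration formula meant here can be illustrated with the standard two-point rule obtained by integrating the cubic Hermite interpolant over each raster interval. The sketch below is not taken from the chapter; it only shows the general technique for an integrand whose values and first derivatives are known at the nodes, and the function name is hypothetical.

```python
def hermite_integral(x, g, dg):
    """Integrate over the nodes x using cubic Hermite interpolation.

    x  -- increasing node abscissae
    g  -- integrand values at the nodes
    dg -- first derivatives of the integrand at the nodes
    """
    if not (len(x) == len(g) == len(dg)) or len(x) < 2:
        raise ValueError("need at least two nodes with values and derivatives")
    total = 0.0
    for k in range(len(x) - 1):
        h = x[k + 1] - x[k]
        # Trapezoidal term plus the derivative correction of the Hermite cubic.
        total += 0.5 * h * (g[k] + g[k + 1]) + h * h / 12.0 * (dg[k] - dg[k + 1])
    return total


# Check on a cubic, which the rule integrates exactly.
nodes = [0.0, 1.0, 2.0]
approx = hermite_integral(nodes, [t ** 3 for t in nodes], [3 * t ** 2 for t in nodes])
```

Because the rule uses the derivative values, it is exact for cubic integrands, whereas the plain trapezoidal rule is exact only for linear ones.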

By applying the nodal position values of the field function $f(z)$, the derivative of the integrand of (251) is also known, in view of (253), and numerical integration based on Hermitian interpolation can be applied. In general, the consistency condition (251) will not be fulfilled exactly, and a correction $\hat v_n/u_n$ is needed.

,.

log, un = log-u n + 1 un

Un

+

j”’ril [:

tan -q@)

]

-6

dh.

(257)

As will be shown below, the correction magnitudes set out in (255), (256), and (257) can be regarded as elements of a "phantom" matrix whose optical action arises as a result of the testing for consistency and through the formation of

i.e., through the transition of the micro-lens matrix from the "uncorrected" state (no superscript bars on quantities) to the corrected state (with superscript bars), whereupon the "phantom" matrix disappears. If we, in fact, develop the elements of the partial-lens matrix (153) in a series of ascending powers of the thickness $h = z_r - z_l$, there arises, since we develop the fundamental system $R_1(z)$, $R_2(z)$ at each of the two vertex points in a Taylor series and eliminate the higher derivatives by means of the differential equation (141),

2'"

=

L+l + f f l h

= 1J

-4

+

2

+

l

+fn

2

O(h3),

h2 2!

1

h3 3!

- - - f:, -

2

+ O(h4),
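The structure of such an expansion can be checked numerically in the simplest case. The sketch below assumes that (141) has the form $y'' + f\,y = 0$ with a field $f > 0$ that is constant over the step, and uses the column convention $(y, y')$; it is therefore not the expansion (259) itself, which involves the nodal values $f_n$ and $f_{n+1}$. The function names are hypothetical.

```python
import math


def step_matrix_constant_field(f, h):
    """Exact transfer matrix of y'' + f*y = 0 over a step h for constant f > 0."""
    k = math.sqrt(f)
    c, s = math.cos(k * h), math.sin(k * h)
    return [[c, s / k], [-k * s, c]]


def step_matrix_series(f, h):
    """Leading terms of the same matrix as a power series in the thickness h."""
    return [[1.0 - f * h ** 2 / 2.0, h - f * h ** 3 / 6.0],
            [-f * h + f ** 2 * h ** 3 / 6.0, 1.0 - f * h ** 2 / 2.0]]


exact = step_matrix_constant_field(2.0, 0.1)
series = step_matrix_series(2.0, 0.1)
det_exact = exact[0][0] * exact[1][1] - exact[0][1] * exact[1][0]
max_error = max(abs(exact[i][j] - series[i][j]) for i in range(2) for j in range(2))
```

For a thin step the truncated series reproduces the exact matrix up to terms of fourth order in $h$, while the exact matrix keeps the determinant 1.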

A comparison of (259) with (255) and (256) justifies the interpretation of the corrections $\hat r_n/2$ and $-2\hat s_n$ as elements of the phantom matrix. For the ratio $q/t$ of the partial-lens matrix we have

-4 = I t

f:+1 + f:,h" 2

+

O(h4),

3!

For a phantom matrix, this ratio is, however, a free choice. Even so, it is permissible for $\hat s$ and $-\hat r/2$ to be also negative. Externally, i.e., as component parts of the splitting up of the macro-lens according to its optical action, the phantom matrices do not make their appearance, except as correction matrices, differing only slightly from the unit matrix. They allow a splitting up of the optical action in the micro-lens image that is consistent with the primary data of the partial-lens image. From the representation (184) of the elements of a micro-lens matrix, it appears that in the limiting case $\lambda/\mu = 1$, the matrix elements $-q$ and $-t$ have the limiting values $v/u$ and $u/v$, respectively. Since the determinant of the phantom matrix has the value 1, we determine that:

$$-\hat q_n = \frac{\hat v_n}{u_n}\,(1 - \hat r_n \hat s_n)^{1/2}, \qquad -\hat t_n = \frac{u_n}{\hat v_n}\,(1 - \hat r_n \hat s_n)^{1/2}.$$
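A short numerical check of the unimodularity argument is sketched below. It assumes that the determinant of a lens matrix is formed as $qt + rs$, as stated earlier for the lens matrix, and that the diagonal elements are chosen as $-\hat q = (\hat v/u)\sqrt{1 - \hat r \hat s}$ and $-\hat t = (u/\hat v)\sqrt{1 - \hat r \hat s}$; the second point is an assumption about the form of the defining relations. Function names and sample values are hypothetical.

```python
import math


def phantom_elements(r_hat, s_hat, v_over_u):
    """Return (q, r, s, t) of a phantom matrix from two corrections and v/u."""
    if v_over_u <= 0.0:
        raise ValueError("the amplitude ratio v/u is assumed to be positive")
    root = math.sqrt(1.0 - r_hat * s_hat)
    q_hat = -v_over_u * root
    t_hat = -root / v_over_u
    return q_hat, r_hat, s_hat, t_hat


def determinant(q, r, s, t):
    """Determinant in the form q*t + r*s used for the lens matrix."""
    return q * t + r * s


# Hypothetical small corrections, possibly of either sign.
det_value = determinant(*phantom_elements(r_hat=0.02, s_hat=-0.01, v_over_u=1.05))
```

With this choice the product $\hat q \hat t$ equals $1 - \hat r \hat s$ for any ratio $\hat v/u$, so the determinant stays at 1 whatever the signs of the small corrections.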

The elements of the phantom matrix ($n = 1, 2, \ldots, N$) are determined by (255), (256), (262), and (257). Inserted in (258), they give rise to a new dismantling (155) of the optical action of the macro-lens whose parts (factors), as micro-lens matrices, reproduce more exactly the unchanged prescribed partial-lens thickness and field excitation and fulfill better the consistency conditions (251). On this sequence of lens matrices, altered in this way, the correction procedures by means of phantom matrices can be applied iteratively. As a result of the procedure described, there arises a series of step matrices that can be designated as a consistent solution of the differential equation (141) in terms of the prescribed raster.


XI. QUADRUPOLE OPTICS

A. Osculating Micro-Lenses

1. The Basis Variables of Macro- and Micro-Lenses

In the development of the theory of micro-lenses we have taken as a starting point that the solution of the differential equation of the paraxial type (141), with a field function defined in the nodal positions of a $z$ raster, is given in the form of a sequence (155) of step matrices. In the partial-lens image, this sequence corresponds to a fundamental matrix (123), in terms of which a fundamental system $R_1(z)$, $R_2(z)$ is defined. Each positive definite quadratic form (160) $P(z)$ of a fundamental system can serve here as a basis function of the macro-lens. In accordance with (166)-(171), $\Omega$ is set up as basis variable and, in accordance with (174), $Y(\Omega)$ is set up as an eigenfunction. The eigenvalue $\lambda/\mu$ is determined in such a way that the points $\pm\infty$ of the independent variable in the differential equation of the paraxial type correspond to the boundary points $\pm\pi/2$ on the $\Omega$ scale of the macro-lens. The independent variable in the case of the differential equation (141) is the abscissa $z$ with the fundamental matrix $R(z)$, and in the case of the differential equation (23), the independent variable is $t$ with the fundamental matrix $S(t)$. The optical conjugation of axial points in the object, lens, and image space caused by the action of the macro-lens is described by an equation of the form (173).

In the micro-lens image, to each step matrix (139) or (109), respectively, there is related an eigenfunction $Y^{(n)}(\omega^{(n)})$ as well as an eigenvalue $(\lambda/\mu)_n$ and an $\omega^{(n)}$ scale. The step matrix (micro-lens matrix) is an invariant of the basis transformation that in its partial-lens image corresponds to a change of the fundamental system. The five basis elements $u_{\omega,n}$, $v_{\omega,n}$, $(\lambda/\mu)_n$, $\omega_l^{(n)}$, $\omega_r^{(n)}$ define a basis. They determine the elements of the lens matrix (184) and describe a basis-related eigenfunction of the macro-lens in the form of osculating eigenfunctions (178) in the object and image space of a particular micro-lens. If in a sequence of micro-lenses the bases are coherent, then the envelope of the osculating eigenfunctions of the micro-lenses is an eigenfunction of the macro-lens. In the micro-lens interval $\omega_l^{(n)} \le \omega^{(n)} \le \omega_r^{(n)}$, the basis variable $\Omega$ of the macro-lens is a linear substitute of the coherent basis variable of the particular micro-lens.

&=nqi)v- 5 [(;)"v= 1

114-

o=n+l

11;.

(264)


2. Step Matrix S(t) as Lens Matrix

We now take as a starting point that the solution of the differential equation of the paraxial type (23) is given in the form of a sequence (155) of partial-lens matrices

that have been prepared as step matrices (109). In the partial-lens image, this sequence corresponds to a fundamental matrix (24) $S(t)$ by means of which a fundamental system $r_1(t)$, $r_2(t)$ is defined. Any positive definite quadratic form $p(t)$ of a fundamental system can serve here as a basis function of the macro-lens; in accordance with (166) to (171), with formally $t$ instead of $z$ and $p(t)$ instead of $P(z)$, we have as a basis variable $\Omega$ instead of $\omega$ and, in accordance with (174), $Y(\Omega)$ as an eigenfunction. Analogous to (185), $S(t)$ has the basis representation

which agrees with (30), if $p$ is replaced by $p/\lambda$ and $\Omega$ by $\lambda\Omega/\mu$. In the lens space of the micro-lens there exists between $\Omega$ and $t$, analogous to (168), the transformation

3. Optical Conjugation with a Lens Section

The optical action of a component part of a macro-lens (lens section) may be represented by a sequence of micro-lenses with indices $n$ to $m$. By the iterative application of (220) one obtains the transport basis transformation in a particular lens space of the micro-lens between the variables $\omega^{(n)}$ and $\omega^{(m)}$ within the limits of the lens section.

If the bases of the micro-lens series of the lens section are coherent, then $\rho_{m,n} = 1$ and $M_{m,n}$ is the product of rotation matrices (228),

$$M_{m,n} = C(\delta_m - \delta_n). \tag{269}$$
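That a chain of rotations between neighboring nodal phases collapses into a single rotation through the total phase difference can be checked with a few lines. The sketch assumes the sign convention $C(\delta) = \begin{pmatrix}\cos\delta & \sin\delta\\ -\sin\delta & \cos\delta\end{pmatrix}$, which is an assumption about the form of the rotation matrix; the function names and phase values are hypothetical.

```python
import math


def rotation(delta):
    """Rotation matrix C(delta) in the sign convention assumed here."""
    return [[math.cos(delta), math.sin(delta)],
            [-math.sin(delta), math.cos(delta)]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def chain(phases):
    """Multiply the rotations between consecutive nodal phases."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for d0, d1 in zip(phases, phases[1:]):
        m = matmul(rotation(d1 - d0), m)
    return m


# Hypothetical nodal phases delta_n, ..., delta_m of a lens section.
phases = [0.1, 0.4, 0.9, 1.3]
product = chain(phases)
direct = rotation(phases[-1] - phases[0])
```

Since planar rotations commute and their angles add, the product depends only on the first and last phase of the section.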

In the partial-lens image, $t_{n+1} - t_n$ and $t_{m+1} - t_m$ are the thicknesses of the two partial lenses corresponding to the micro-lenses at the beginning and end of the lens section. For vanishingly small thicknesses, the eigenvalues $(\lambda/\mu)_n$ and $(\lambda/\mu)_m$ take the value 1, and we denote the basis variables of the osculating micro-lenses at the point $\Omega_n$ by $\omega$ and at the point $\Omega_m$ by $\bar\omega$. The lens section extending from $t_n$ to $t_m$ in the $t$ raster, on the particular $\omega$ scale of the two micro-lenses abutting the macro-lens section, gives rise to an optical conjugation $\bar\omega = \omega\; -$

13.. q(;)” -

v=n

B. Conjugation of the Ray Coordinates

1. Nodal Position Matrix of the Ray Coordinates

If, along the component path (t,, t,), quadrupole fields are operative, the question of optical conjugation between the basis variables ω and ω̄ of the touching micro-lenses arises once more. As well as a differential equation of the paraxial type (23), we now have to deal with a differential equation of the type (33), whose solution in the form of a fundamental matrix (34) SO came into the formulation (95) of the four-vector of the trajectory (89). The field expression (75) or (82), respectively, is different from zero and optically active in the paraxial region, as is revealed by the differential equation (85). From the coefficient of w*, another characteristic of the intersection properties of the ray bundle is determined that establishes an optical conjugation on the basis of a line-grating imaging. The differential equation (85) for the "ray" w(Ω) has the solution

s stands for the nodal position matrix

s-1 =

cosp

-cosa (272)

w_α = w_1 + i w_2, w_β = w_3 + i w_4 are the ray coordinates at the nodal positions Ω_α and Ω_β.


CALCULATING THE PROPERTIES OF ELECTRON LENSES

2. Transformations of the Ray Coordinates

The four-vector (w_α, w_β) of the ray coordinates, as a result of (87), has the form

The quantities Ω and Ω̄ indicate the beginning and the end of the active length on the Ω scale of the macro-lens, inside which the field expressions K and L, defined in (82) and (83), are effective. Before the occurrence of this activity, the ray w(Ω) is in the state

$$ w(\Omega) = \frac{\sin\frac{A}{P}(\Omega-\Omega_\beta)}{\sin\frac{A}{P}(\Omega_\alpha-\Omega_\beta)}\,w_\alpha - \frac{\sin\frac{A}{P}(\Omega-\Omega_\alpha)}{\sin\frac{A}{P}(\Omega_\alpha-\Omega_\beta)}\,w_\beta, \qquad (274) $$

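where w_α and w_β are the values of w in arbitrary nodal positions Ω_α and Ω_β. After the occurrence of activity, i.e., as the active path is being traversed, w takes up the state

$$ \bar w(\Omega) = \frac{\sin\frac{A}{P}(\Omega-\bar\Omega_{\bar\beta})}{\sin\frac{A}{P}(\bar\Omega_{\bar\alpha}-\bar\Omega_{\bar\beta})}\,\bar w_{\bar\alpha} - \frac{\sin\frac{A}{P}(\Omega-\bar\Omega_{\bar\alpha})}{\sin\frac{A}{P}(\bar\Omega_{\bar\alpha}-\bar\Omega_{\bar\beta})}\,\bar w_{\bar\beta}, \qquad (275) $$

where w̄_ᾱ and w̄_β̄ are the values of w̄ in arbitrary nodal positions Ω̄_ᾱ and Ω̄_β̄.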

3. State of the Ray in the Space of the Touching Micro-Lens

The beginning and the end of the active path (Ω, Ω̄) are each limited by a corresponding micro-lens. Since we may assume coherence, in each lens interval (ω_l, ω_r) or (ω̄_l, ω̄_r), respectively, the basis variables ω and ω̄ are linear substitutes of the basis variable Ω of the macro-lens. In the lens interval of the micro-lens at the beginning of the active path we place the nodal points Ω_α and Ω_β, and in the lens interval of the micro-lens at the end of the active path, the nodal points Ω̄_ᾱ and Ω̄_β̄. By means of (263) we can go over from the basis variable of the macro-lens to that of particular micro-lenses and obtain for the touching ray, before the activity event,

$$ w(\omega) = \frac{\sin(\omega-\beta)}{\sin(\alpha-\beta)}\,w_\alpha(\Omega) - \frac{\sin(\omega-\alpha)}{\sin(\alpha-\beta)}\,w_\beta(\Omega), \qquad (276) $$


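and for the state of the touching ray after that,

$$ \bar w(\bar\omega) = \frac{\sin(\bar\omega-\bar\beta)}{\sin(\bar\alpha-\bar\beta)}\,\bar w_{\bar\alpha}(\bar\Omega) - \frac{\sin(\bar\omega-\bar\alpha)}{\sin(\bar\alpha-\bar\beta)}\,\bar w_{\bar\beta}(\bar\Omega). \qquad (277) $$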

We have passed from the micro-lens with nonvanishing lens interval (ω_l, ω_r) to the touching micro-lens and have tacitly ignored the condition that the nodal positions ω_α = α, ω_β = β and ω̄_ᾱ = ᾱ, ω̄_β̄ = β̄, respectively, lie in the lens interval of particular micro-lenses. This is allowable since already in (273) the above-named transition could be carried out.
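The sine interpolation of the touching ray between two nodal positions can be checked numerically. The following Python sketch assumes the form w(ω) = [sin(ω − β) w_α − sin(ω − α) w_β] / sin(α − β); the function name `touching_ray` and the sample values are illustrative and do not come from the original text.

```python
import math

def touching_ray(omega, alpha, beta, w_alpha, w_beta):
    """Ray coordinate w(omega) interpolated between the nodal values
    w_alpha (at omega = alpha) and w_beta (at omega = beta).
    The coordinates may be complex (w = w1 + i*w2)."""
    denom = math.sin(alpha - beta)
    if abs(denom) < 1e-12:
        # alpha - beta must not be a multiple of pi
        raise ValueError("nodal positions alpha and beta coincide modulo pi")
    return (math.sin(omega - beta) * w_alpha
            - math.sin(omega - alpha) * w_beta) / denom

# Hypothetical nodal positions and complex ray coordinates
alpha, beta = 0.3, -0.8
w_a, w_b = complex(1.0, 2.0), complex(-0.5, 0.25)
at_alpha = touching_ray(alpha, alpha, beta, w_a, w_b)
at_beta = touching_ray(beta, alpha, beta, w_a, w_b)
```

Evaluating at ω = α and ω = β returns the two nodal values, which is the property that makes α and β freely selectable as long as their difference is not a multiple of π.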

In this representation Ω and Ω̄ are the beginning and the end of the action path on the Ω scale of the macro-lens. w_α(Ω), w_β(Ω) are the coordinates of the touching ray at the nodal points α, β in the lens space −π/2 ≤ ω < π/2 of the touching micro-lens at the point Ω, and w̄_ᾱ(Ω̄), w̄_β̄(Ω̄) are the coordinates of the touching ray at the nodal points ᾱ, β̄ in the lens space −π/2 ≤ ω̄ < π/2 of the touching micro-lens at the point Ω̄.

4. Relation Between Ray and Trajectory

The ray (274) w(Ω) is according to (61) a substitute for the trajectory r(t) with p(t) as the basis function of the macro-lens and @(t) as the rotation-angle function. In the object or, alternatively, image space of the lens section (Ω, Ω̄) of the macro-lens, the trajectory tangent is a transform of (276) w(ω) or, alternatively, (277) w̄(ω̄), with p as a basis function (tangent parabola) of the particular touching micro-lens. The trajectory tangent in object or image space of the lens section has the nodal position values r_α, r_β or, alternatively, r̄_ᾱ, r̄_β̄, which arise by a rotational stretching (61) of the ray coordinates w_α(Ω), w_β(Ω) or, alternatively, w̄_ᾱ(Ω̄), w̄_β̄(Ω̄). The conjugation (278) of the beam coordinates contains the identity of the trajectory before and after the occurrence of activity of the field expressions K and L. In the investigation of the astigmatic conjugation that the field function gives rise to between the ω and the ω̄ scale, the rotational stretching plays no part. It has significance only in connection with the supplying of imaging magnifications and position angles of the line-grating imaging in real three-dimensional space and has, therefore, not been taken into consideration. Likewise, we dispense with a consideration of a deflection term caused by the field expressions L in (278).


C. Collineation

1. Collineation Matrix

For the study of the astigmatic conjugation between the basis variables ω and ω̄ of the micro-lenses touching the action path (Ω, Ω̄) of the quadrupole field, the collineation matrix K = (k_ij),

$$ K = s(\bar\alpha,\bar\beta)\,Q(\bar\Omega)\,Q^{-1}(\Omega)\,s^{-1}(\alpha,\beta), \qquad (279) $$

forms a starting point. It is an expression of the quadrupole action and describes a transformation existing between the ray coordinates in object and image space of the lens section (Ω, Ω̄)

By object and image space, respectively, of the lens section, we understand the lens space of the micro-lens touching the lens section on the object or image side, respectively. The choice of the nodal positions ω = α, β or, alternatively, ω̄ = ᾱ, β̄ in the object and image space, respectively, is at our disposal. However, the difference between α and β, and between ᾱ and β̄, respectively, must not be a multiple of π, for then the determinant of the collineation matrix

$$ \det K = \frac{\sin^2(\bar\alpha-\bar\beta)}{\sin^2(\alpha-\beta)} $$

is not zero and the transformation described by (280) is invertible. The ray coordinates correlated with each other through the collineation matrix represent one and the same object. In the partial lens image, the lens spaces (ray spaces) of both touching micro-lenses exist simultaneously. By "object" should be understood the trajectory that binds smoothly the object-side ray with the image-side ray. In the micro-lens image, both ray spaces are considered sequentially in time. The conjugation described by (280) is an expression of the identity of single rays and not of points (axial points), if these have their origin in the vertex of one and the same ray bundle. For the property of a ray bundle, in any ray space, to possess an intersection at which all rays of the bundle intersect does not hold good under the action of a quadrupole. In order to solve the identity problem of ray bundles in the micro-lens image, even in quadrupole systems, we require, analogous to XI.A.4.b, the unambiguity and metrical simplicity of the quadrupole conjugation.


2. Line-Grating Imaging

The first two equations of (280) set out a relationship between the ray coordinates w1, w2 in an α plane (a plane whose axial point is the nodal position ω_α = α) and the beam coordinates w̄1, w̄2 in an ᾱ plane, together with the ray coordinates w3, w4 in a β plane, as parameters. By linear combination, we can eliminate either one of these two parameters:

$$ k_{23}\bar w_1 - k_{13}\bar w_2 = [k_{11}k_{23}]\,w_1 + [k_{12}k_{23}]\,w_2 - [k_{13}k_{24}]\,w_4, \qquad (282) $$

$$ k_{24}\bar w_1 - k_{14}\bar w_2 = [k_{11}k_{24}]\,w_1 + [k_{12}k_{24}]\,w_2 + [k_{13}k_{24}]\,w_3. \qquad (283) $$
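The elimination step can be verified with exact integer arithmetic. The sketch below assumes that the first two rows of the collineation act as w̄_i = k_i1 w1 + k_i2 w2 + k_i3 w3 + k_i4 w4 (i = 1, 2) and that a two-row minor is [k_1a k_2b] = k_1a k_2b − k_1b k_2a; the helper names and the sample numbers are hypothetical.

```python
def minor(k, a, b):
    """Two-row minor [k_1a k_2b] = k_1a*k_2b - k_1b*k_2a (1-based columns a < b)."""
    return k[0][a - 1] * k[1][b - 1] - k[0][b - 1] * k[1][a - 1]

def eliminated_relations(k, w):
    """Return (lhs, rhs) pairs for the relation without w3 and the one without w4."""
    wb1 = sum(k[0][j] * w[j] for j in range(4))   # image coordinate w-bar_1
    wb2 = sum(k[1][j] * w[j] for j in range(4))   # image coordinate w-bar_2
    # eliminate w3: k23*wb1 - k13*wb2
    lhs_a = k[1][2] * wb1 - k[0][2] * wb2
    rhs_a = minor(k, 1, 3) * w[0] + minor(k, 2, 3) * w[1] - minor(k, 3, 4) * w[3]
    # eliminate w4: k24*wb1 - k14*wb2
    lhs_b = k[1][3] * wb1 - k[0][3] * wb2
    rhs_b = minor(k, 1, 4) * w[0] + minor(k, 2, 4) * w[1] + minor(k, 3, 4) * w[2]
    return (lhs_a, rhs_a), (lhs_b, rhs_b)

# Hypothetical first two rows of K and object-side coordinates (w1..w4)
k_rows = [[2, -1, 3, 5], [4, 7, -2, 1]]
w_obj = [1, 2, 3, 4]
rel_a, rel_b = eliminated_relations(k_rows, w_obj)
```

Both relations share the single minor [k13 k24] as the coefficient of the remaining parameter, which is why its vanishing removes w3 and w4 from both equations at once.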

In the formulation of astigmatic imaging, two-row minor determinants of the collineation matrix K must be formed, which we abbreviate as follows:

$$ [k_{ik}k_{jl}] = \begin{vmatrix} k_{ik} & k_{il} \\ k_{jk} & k_{jl} \end{vmatrix}, \qquad i<j,\quad k<l \in \{1,2,3,4\}. \qquad (284) $$

From (279) with (272) it is clear that the minor determinant [k13 k24] depends on α and ᾱ, except for the factor sin(α − β). The requirement

$$ [k_{13}k_{24}] = 0 \qquad (285) $$

is, for a given α, a determining equation for ᾱ in the form of a quadratic equation in tan ᾱ or, for a given ᾱ, a determining equation for α in the form of a quadratic equation in tan α. Two planes whose abscissae α and ᾱ fulfil equation (285) are said to be astigmatically conjugate. In the w1, w2 plane (α plane), a family of parallel "object lines" is distinguished and, likewise, in the w̄1, w̄2 plane (ᾱ plane), a family of parallel "image lines." If (285) is fulfilled, the two equations (282) and (283) are linearly dependent and therefore give rise to the same relation. In the w1, w2 plane, a straight line of the direction φ = ψ + π/2, touching a circle of radius ρ at the point (ρ cos ψ, ρ sin ψ), is described by

$$ w_1\cos\psi + w_2\sin\psi = \rho. $$
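A short numerical illustration of this line equation: points generated along the direction φ = ψ + π/2 from the contact point (ρ cos ψ, ρ sin ψ) all satisfy w1 cos ψ + w2 sin ψ = ρ. The function names and sample values below are illustrative assumptions.

```python
import math

def point_on_object_line(rho, psi, s):
    """Point at signed arc length s along the line that touches the circle
    of radius rho at polar angle psi; the line direction is psi + pi/2."""
    phi = psi + math.pi / 2
    w1 = rho * math.cos(psi) + s * math.cos(phi)
    w2 = rho * math.sin(psi) + s * math.sin(phi)
    return w1, w2

def line_residual(w1, w2, rho, psi):
    """Left side minus right side of w1*cos(psi) + w2*sin(psi) = rho."""
    return w1 * math.cos(psi) + w2 * math.sin(psi) - rho

rho, psi = 1.5, 0.4   # hypothetical radius and contact angle
residuals = [line_residual(*point_on_object_line(rho, psi, s), rho, psi)
             for s in (-2.0, 0.0, 3.0)]
```

Varying ρ at fixed ψ generates the family of parallel lines referred to in the text.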

With

there holds for the direction cp of the object lines

(286)


and with

there holds for the direction φ̄ of the image lines

The ratio of ρ̄ to ρ is the imaging magnification of the line-grating imaging

3. Symmetry Requirement

The foregoing formulation of line-grating imaging is nonsymmetric in two respects. Firstly, there is a difference between the formulae for the directions of the object and image lines. The reason for this lies in the way that the relation between the object and image rays is set out in (280). The form of the definition of the collineation matrix (279) is symmetric with respect to object and image space, but its action on the object and image rays in the form given explicitly in (280) is asymmetric. If we wished to define the collineation matrix K = (k_ij) in terms of its action and hence choose the form (280), this definition would not have the same rank as that of (279) because of the aforesaid asymmetry. This is also clear from the formation of the reciprocal collineation matrix. This may be obtained on the basis of (279) by the rules of matrix multiplication, in the form

$$ K^{-1} = s(\alpha,\beta)\,Q(\Omega)\,Q^{-1}(\bar\Omega)\,s^{-1}(\bar\alpha,\bar\beta), \qquad (292) $$

in which the commutation of object and image space finds expression. On the other hand, this operation, which according to the rules of matrix calculation yields the reciprocal of the transformation matrix of (280), of itself enables no such relationship to be recognized. Secondly, the particular choice of the elements of the collineation matrix for the formulation (285) is asymmetric. The designation of the astigmatically conjugated axial points by α and ᾱ has a merely formal character. We could just as well have formulated the astigmatic conjugation for the pair of variables β and ᾱ, or β and β̄, or α and β̄. That would have meant that in (285) the minor determinant [k13 k24] could have been replaced by [k11 k22] or [k31 k42] or [k33 k44], with which, however, the asymmetry would not have been removed. Since two-row minor determinants of the collineation matrix are important for the formulation of the astigmatic conjugation and line-grating


imaging, in symmetric form, we shall proceed to formulate the quadrupole properties in elements of a six-row matrix, which is composed from the two-row minor determinants of the four-row quadrupole matrix.

D. Formulation with Six-Row Matrices

1. Quadrupole Lens Matrix

The four-row quadrupole lens matrix

$$ H(\bar\Omega,\Omega) = Q(\bar\Omega)\,Q^{-1}(\Omega) \qquad (293) $$

with a fixed lower limit Ω and a variable upper limit Ω̄ of the action path according to (39) with (88), satisfies the differential equation

a

0 dH(fi,R) A -

= K(a)C-l(iR)[o

- dR

P

0

0

1

0

0

0

0

0 -']C(;R)H(a,R). 0 0

(294)

If we keep the upper limit Ω̄ fixed and consider the lower one as a variable, we obtain for the reciprocal lens matrix

$$ H^{-1}(\bar\Omega,\Omega) = Q(\Omega)\,Q^{-1}(\bar\Omega), \qquad (295) $$

formally the same differential equation as in (294),

lo

0

1

o\

2. Frame Matrix

A four-row matrix Q = (q_ik) has 36 two-row minor determinants, which, abbreviated as in (284), we write as

$$ [ik,jl] = q_{ik}q_{jl} - q_{il}q_{jk}. \qquad (297) $$

The rearrangement of the 36 minor determinants in a six-row matrix scheme is given in (298). The two-digit numbers ij and kl correspond to the row/column


numbers, respectively, if the numbering of the rows and columns is carried out as indicated on the left-hand and upper side, respectively.

 ij \ kl |   12       13       14       23       24       34
---------+------------------------------------------------------
   12    | [11,22]  [11,23]  [11,24]  [12,23]  [12,24]  [13,24]
   13    | [11,32]  [11,33]  [11,34]  [12,33]  [12,34]  [13,34]
   14    | [11,42]  [11,43]  [11,44]  [12,43]  [12,44]  [13,44]
   23    | [21,32]  [21,33]  [21,34]  [22,33]  [22,34]  [23,34]    = Q†.    (298)
   24    | [21,42]  [21,43]  [21,44]  [22,43]  [22,44]  [23,44]
   34    | [31,42]  [31,43]  [31,44]  [32,43]  [32,44]  [33,44]

The frame matrix has two important properties. The transpose of the frame matrix of Q is equal to the frame matrix of the transpose of Q:

$$ (Q^{\dagger})^{T} = (Q^{T})^{\dagger}. \qquad (299) $$

If Q is the product of two four-row matrices Q_1 and Q_2, this also holds for the corresponding frame matrices:

$$ Q = Q_1 Q_2, \qquad Q^{\dagger} = Q_1^{\dagger} Q_2^{\dagger}. \qquad (300) $$
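Both properties can be checked directly. The following Python sketch builds the six-row frame matrix from the two-row minors [ik, jl] = q_ik q_jl − q_il q_jk, with rows ij and columns kl ordered 12, 13, 14, 23, 24, 34 as in scheme (298), and tests the transpose and product rules on integer matrices. The function names and the test matrices are illustrative assumptions.

```python
from itertools import combinations

# Index pairs in the order 12, 13, 14, 23, 24, 34 (stored 0-based).
PAIRS = list(combinations(range(4), 2))

def frame_matrix(q):
    """6x6 matrix of two-row minors: row pair (i, j), column pair (k, l)."""
    return [[q[i][k] * q[j][l] - q[i][l] * q[j][k] for (k, l) in PAIRS]
            for (i, j) in PAIRS]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Hypothetical four-row matrices with integer entries (exact arithmetic)
Q1 = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 2, 2, 0]]
Q2 = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 0, 1, 1], [1, 1, 0, 4]]

transpose_rule = frame_matrix(transpose(Q1)) == transpose(frame_matrix(Q1))
product_rule = (frame_matrix(matmul(Q1, Q2))
                == matmul(frame_matrix(Q1), frame_matrix(Q2)))
```

The product rule is the Cauchy-Binet theorem for 2×2 minors; it is what allows the six-row formalism to follow a sequence of lens sections by matrix multiplication.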

3. System Matrix

By multiplication from the left of the frame matrix Q† by the "prematrix" V and from the right by the "postmatrix" N

V=

1

0

0 0

0 -1 1 0 1 0

, o

i:,

0 N = (0 0 1

0 0 -1 1 0 0 -

0

1 0 0 0 0 1

0

0

-1 0 0 - 1 0 1 0 0 -1 -1 0 0

0 0 0

0 1 - 1 0 0 :I3 -1 -1 0 0

I

there arises the six-row system matrix

$$ S^{(Q)} = \tfrac{1}{2}\,V Q^{\dagger} N = (s_{ik}). \qquad (302) $$

The matrix elements s_ik each consist of a sum of four two-row minor determinants. For example, the elements s_11 and s_52 go as

$$ s_{11} = \tfrac{1}{2}\{[11,22] + [31,42] + [13,24] + [33,44]\}, \qquad s_{52} = \tfrac{1}{2}\{-[11,34] + [21,44] + [12,33] - [22,43]\}. $$

The product E = VN/2 of the prematrix and postmatrix contains only elements ±1 in the principal diagonal and is designated the system unit matrix E. In the product NV/2, only the codiagonal is occupied by ±1.
1 -1

-1 -1

,

1

1

$NV=

1 -1

1 1

-1

(303)

For the inverses of the prematrix and postmatrix one has

$$ (\tfrac{1}{2}V)^{-1} = V^{T} = N E, \qquad (\tfrac{1}{2}N)^{-1} = N^{T} = E V. \qquad (304) $$

In the formation of the system matrix of a product Q = Q_1 Q_2 one has, as a result of (300) and (304),

$$ S^{(Q_1 Q_2)} = S^{(Q_1)}\, E\, S^{(Q_2)}. \qquad (305) $$

If we apply this product theorem to the case Q_2 = Q_1^{-1}, there results

$$ E = S^{(Q)}\, E\, S^{(Q^{-1})}, \qquad (306) $$

since the system matrix of the four-row unit matrix I yields the system unit matrix

$$ S^{(I)} = E. \qquad (307) $$

If we apply to (302) the operation of transposition by the application of (299) and (304), we obtain

$$ S^{(Q^{T})} = E\,(S^{(Q)})^{T} E. \qquad (308) $$

The above relations are identities in the elements of the four-row matrix Q as mutually independent variables.

4. System Matrix of the Reciprocal Quadrupole Lens Matrix

If (293) H(Ω̄, Ω) is a four-row quadrupole lens matrix that can be developed according to law (294) or (296), then there exists between the reciprocal H^{-1} and the transpose H^T the relation, according to (43),

The factors of H are to be translated as four-row matrices according to rule (38). Their system matrix is equal to the six-row unit matrix 1.

s(-:

3 = sc - 3 = 1

(310)

The product theorem (305) then provides

$$ S^{(H^{-1})} = E\,S^{(H^{T})} E \qquad (311) $$

and comparison with (308) (with H instead of Q) yields

$$ S^{(H^{-1})} = (S^{(H)})^{T}. \qquad (312) $$

To the transition of the quadrupole lens matrix (293) H(Ω̄, Ω) to the reciprocal H^{-1} there corresponds, in accordance with (295), the exchange of Ω and Ω̄ and, in accordance with (312), the reflection (transposing) of the elements of the system matrix S^(H) at the principal diagonal.

5. Differential Equation of the System Matrix

The six row vectors of the system matrix S^(H) as functions of the lower limit Ω of the action path have the components

$$ A_{ij},\quad B_{ij},\quad C_{ij},\quad g_{ij},\quad h_{ij},\quad \gamma_{ij} \qquad (313) $$

with the index numbering ij as in (298). They are solutions of the differential equation system with (88) K(Ω) = −D(Ω) e^{i2δ(Ω)}.

k = -D(gcos26 + hsin26) B

A

= Dsin2-R

e(gcos26

c1 A

C = Dcos2-R c1

j=O

+ hsin26)

(gcos26 + hsin26)

A

1 + Bsin2-R +

A

II + Bsin2-R +

P

P

(314) P P


The dots denote differentiation with respect to (A/P)Ω. The six column vectors of the system matrix S^(H) as functions of the upper limit Ω̄ of the action path have the components

$$ \bar A_{kl},\quad \bar B_{kl},\quad \bar C_{kl},\quad \bar g_{kl},\quad \bar h_{kl},\quad \bar\gamma_{kl} \qquad (315) $$

with the index notation kl as in (298). They are solutions of the differential equation system (314) with Ω̄ instead of Ω.

6. Structure of the System Matrix

If we insert (312) into (306) (with H instead of Q), integral theorems of the differential system (314) arise in the form of orthogonality relationships of the row or column vectors of the system matrix with respect to the indefinite system metrics founded on the fundamental tensor E by means of (303):

$$ S^{(H)}\, E\, (S^{(H)})^{T} = E. \qquad (316) $$

This equation for the system matrix S^(H) is a consequence of equation (309) for the quadrupole lens matrix H. The analogous form to (316) is

For Ω̄ = Ω, the quadrupole lens matrix H = I and the system matrix S^(H) = E. Stemming from the last equation of (314), γ̇ = 0, the elements γ_ij of the sixth column and those of the sixth row in the system matrix of H(Ω̄, Ω) are all zero up to s_66, which has the value −1.

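$$ (s_{ik}) = S^{(H)} = \left(\begin{array}{ccc|cc|c}
s_{11} & s_{12} & s_{13} & s_{14} & s_{15} & 0 \\
s_{21} & s_{22} & s_{23} & s_{24} & s_{25} & 0 \\
s_{31} & s_{32} & s_{33} & s_{34} & s_{35} & 0 \\ \hline
s_{41} & s_{42} & s_{43} & s_{44} & s_{45} & 0 \\
s_{51} & s_{52} & s_{53} & s_{54} & s_{55} & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & -1
\end{array}\right) \qquad (318) $$

The rows carry the labels A, B, C, g, h, γ with the index pairs 12, 13, 14, 23, 24, 34. Region I is the three-row block s_11 to s_33, region II the block of rows 1 to 3 and columns 4 and 5, region III the block of rows 4 and 5 and columns 1 to 3, and region IV the two-row block s_44, s_45, s_54, s_55.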

The decomposition of the matrix scheme into the regions I to IV corresponds to the different transformation relations of the matrix elements in the transition of the quadrupole lens matrix H, which generates the system matrix S^(H), into H' = e^{iτ̄} H e^{−iτ}. This transition takes place by a coaxial rotation τ or τ̄, respectively, of the coordinate system of the complex ray coordinates in the lens space of the touching micro-lens at the beginning or end, respectively, of the action path. The elements of region I remain unchanged and are therefore rotation invariant. The elements in regions II and III, in the complex form g + ih and ḡ + ih̄, experience a rotation of e^{iτ} or e^{iτ̄}, respectively:

$$ g' + ih' = e^{i\tau}(g + ih), \qquad \bar g' + i\bar h' = (\bar g + i\bar h)\,e^{i\bar\tau}. \qquad (319) $$

The transition of elements of region IV is described in the same way as for the transition from H to H':

$$ S_{IV}^{(H')} = e^{i\bar\tau}\, S_{IV}^{(H)}\, e^{-i\tau}. \qquad (320) $$

The determinant of the two-row matrix S_IV^(H) (matrix region IV) is equal to the determinant of the three-row matrix S_I^(H) (matrix region I):

$$ \det S_{IV}^{(H)} = \det S_{I}^{(H)}. \qquad (321) $$

7. Reciprocal Collineation Matrix

The 36 two-row minor determinants [k_ik k_jl] of the collineation matrix (279)

$$ K = s(\bar\alpha,\bar\beta)\,H(\bar\Omega,\Omega)\,s^{-1}(\alpha,\beta) \qquad (322) $$

will now be placed in the frame matrix K† formed according to the scheme (298). For this we have, according to (302) with (304),

$$ 2K^{\dagger} = V^{T} S^{(K)} N^{T}. \qquad (323) $$

Here S^(K) is the system matrix to be formed from (322) K, according to the product theorem (305), and we obtain the form

$$ 2K^{\dagger} = \bigl(V^{T} S^{(s(\bar\alpha,\bar\beta))} E\bigr)\, S^{(H)}\, \bigl(E\, S^{(s^{-1}(\alpha,\beta))} N^{T}\bigr). \qquad (324) $$

The nodal position matrix (272) s(α, β) satisfies a relation of the type (309)

and we can form the frame matrix of the reciprocal collineation matrix

$$ K^{-1} = s(\alpha,\beta)\,H^{-1}(\bar\Omega,\Omega)\,s^{-1}(\bar\alpha,\bar\beta) \qquad (326) $$

and find the relation

The operation on the right-hand side consists of a reflection of the frame matrix K† about its codiagonal and the multiplication of the second and fifth rows and columns by −1. In this way, we are in a position to remove the asymmetry that exists in formulae (288) to (291) for the line-grating imaging.


8. Line-Grating Imaging in Symmetric Form

If the abscissae α and ᾱ of two planes, perpendicular to the axis in the lens space of a particular micro-lens touching the action path (Ω, Ω̄), fulfil the condition (285), then there exists between these two planes a line-grating imaging. The object lines have the direction φ

and the image lines the direction φ̄

For the image magnification m = ρ̄/ρ one has

The two-row minor determinants [k_ik k_jl] required here are elements of the frame matrix (324), which is formed by multiplication of the system matrix S^(H) by V^T S^(s(ᾱ,β̄)) E from the left and by E S^(s^{-1}(α,β)) N^T from the right.

.

sin 28

1

sin 28 cos 2p 0

0 0

sin(cc - 8)

cos 2 j

-cos(a

+ 8) + p)

-sin@

-

-sin@

p)

0

0

+ p) + p)

sin(% cos(u

0

0 0

-sin@ - p) -sin(a - p )

sin 2a cos 2a


For a coaxial rotation, by τ̄ and τ, respectively, of the coordinate systems with complex ray coordinates in the lens space of the micro-lenses touching Ω̄ and Ω, the nodal position matrices s(ᾱ, β̄) and s(α, β) must be multiplied by e^{iτ̄} and e^{iτ}, respectively, and the strongly boxed-in two-row matrices shown in (331) and (332) must be multiplied by the rotation matrix (31) C(2τ̄) or C(−2τ), respectively.

E. Form of the Astigmatic Conjugation

1. Collineation Function

An astigmatic conjugation of the pair variables α and ᾱ, β and ᾱ, α and β̄, or β and β̄ is indicated by the vanishing of the corresponding corner elements [k13 k24], [k11 k22], [k31 k42], or [k33 k44], respectively, of the frame matrix K†. In the formation of the corner elements, only the first and sixth rows of matrix (331) and only the first and sixth columns of matrix (332) contribute, whilst from the system matrix (318) S^(H), only the elements from the three-row matrix region I contribute. An astigmatic conjugation of the pair variables ω and ω̄ is indicated by

$$ \begin{pmatrix} 1 & \sin 2\bar\omega & \cos 2\bar\omega \end{pmatrix}
\begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix}
\begin{pmatrix} 1 \\ \sin 2\omega \\ \cos 2\omega \end{pmatrix} = 0. \qquad (333) $$

From the multiplication of the first or last line of (331) with the system matrix S^(H) arise functions of the type

$$ \bar A_k(\bar\omega) = s_{1k} + s_{2k}\sin 2\bar\omega + s_{3k}\cos 2\bar\omega, \qquad (334) $$

which in the series k = 1 to k = 6 are denoted as image-side "collineation functions"

$$ \bar A(\bar\omega),\ \bar B(\bar\omega),\ \bar C(\bar\omega),\ \bar G(\bar\omega),\ \bar H(\bar\omega),\ \bar\Gamma(\bar\omega). \qquad (335) $$

Analogously, from the multiplication of the system matrix S^(H) by the first or last column of (332), arise functions of the type

$$ A_i(\omega) = s_{i1} + s_{i2}\sin 2\omega + s_{i3}\cos 2\omega, \qquad (336) $$

which in the series i = 1 to i = 6 are denoted as object-side collineation functions

$$ A(\omega),\ B(\omega),\ C(\omega),\ G(\omega),\ H(\omega),\ \Gamma(\omega). \qquad (337) $$

The form of the astigmatic conjugation (333) with the designation (335) and (337),

$$ \bar A(\bar\omega) + \bar B(\bar\omega)\sin 2\omega + \bar C(\bar\omega)\cos 2\omega = A(\omega) + B(\omega)\sin 2\bar\omega + C(\omega)\cos 2\bar\omega = 0, \qquad (338) $$


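represents a quadratic determining equation for tan ω and tan ω̄, respectively, with the solutions

$$ \tan\bar\omega = \frac{B(\omega) - D(\omega)}{C(\omega) - A(\omega)}, \qquad \tan\omega = \frac{\bar B(\bar\omega) - \bar D(\bar\omega)}{\bar C(\bar\omega) - \bar A(\bar\omega)}, \qquad (339) $$

where D̄(ω̄) and D(ω) are defined, apart from the sign, by

$$ D^2(\omega) = B^2(\omega) + C^2(\omega) - A^2(\omega), \qquad \bar D^2(\bar\omega) = \bar B^2(\bar\omega) + \bar C^2(\bar\omega) - \bar A^2(\bar\omega). \qquad (340) $$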
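A small numerical sketch may help to see how the two roots arise. It solves A + B sin 2ω̄ + C cos 2ω̄ = 0 for given object-side values A, B, C by writing B sin x + C cos x as a single cosine, which avoids the 0/0 case of the tangent formula when C = A, and it returns no root when B² + C² − A² is negative. The function name and the sample coefficients are assumptions made for illustration.

```python
import math

def conjugate_abscissae(A, B, C):
    """Roots w in [-pi/2, pi/2) of A + B*sin(2w) + C*cos(2w) = 0.
    Returns [] if B^2 + C^2 - A^2 < 0 (no real solution)."""
    R = math.hypot(B, C)
    if R == 0.0 or abs(A) > R:
        return []
    phi = math.atan2(B, C)            # B sin x + C cos x = R cos(x - phi)
    delta = math.acos(-A / R)
    roots = []
    for two_w in (phi + delta, phi - delta):
        w = 0.5 * two_w
        w = (w + math.pi / 2) % math.pi - math.pi / 2   # reduce modulo pi
        roots.append(w)
    return sorted(roots)

# Hypothetical values of the collineation functions at one object abscissa
A, B, C = 0.3, 1.0, 0.5
roots = conjugate_abscissae(A, B, C)
D = math.sqrt(B * B + C * C - A * A)
tangents = sorted([(B - D) / (C - A), (B + D) / (C - A)])
```

For C ≠ A the tangents of the two roots coincide with (B ∓ D)/(C − A), the two signs corresponding to the two-valuedness of D.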

An unsatisfactory aspect of this representation is that, firstly, the reality condition D̄²(ω̄) ≥ 0, D²(ω) ≥ 0 is not guaranteed and, secondly, D̄(ω̄) and D(ω) are two-valued.

2. Reality Condition

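The scalar products formed with reference to the indefinite system metrics

$$ \bigl(V^{T} S^{(s(\bar\alpha,\bar\beta))} E\, S^{(H)}\bigr)\, E\, \bigl(V^{T} S^{(s(\bar\alpha,\bar\beta))} E\, S^{(H)}\bigr)^{T} = NV \qquad (341) $$

and

$$ \bigl(S^{(H)} E\, S^{(s^{-1}(\alpha,\beta))} N^{T}\bigr)^{T} E\, \bigl(S^{(H)} E\, S^{(s^{-1}(\alpha,\beta))} N^{T}\bigr) = NV \qquad (342) $$

allow the validity of

$$ \bar A^2(\bar\omega) - \bar B^2(\bar\omega) - \bar C^2(\bar\omega) + \bar G^2(\bar\omega) + \bar H^2(\bar\omega) - \bar\Gamma^2(\bar\omega) = 0 \qquad (343) $$

and

$$ A^2(\omega) - B^2(\omega) - C^2(\omega) + G^2(\omega) + H^2(\omega) - \Gamma^2(\omega) = 0 \qquad (344) $$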

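to be recognized in view of (303) for NV. As a form of the astigmatic conjugation, we can, instead of a quadratic determining equation with the solution (339), also provide a linear equation system for the components sin 2ω̄, cos 2ω̄ of the row vectors of (333) (and analogously for the components sin 2ω, cos 2ω of the column vectors):

$$ B(\omega)\cos 2\bar\omega - C(\omega)\sin 2\bar\omega = D(\omega), \qquad C(\omega)\cos 2\bar\omega + B(\omega)\sin 2\bar\omega = -A(\omega). \qquad (345) $$

Its representation in matrix form

$$ \begin{pmatrix} C(\omega) & -B(\omega) \\ B(\omega) & C(\omega) \end{pmatrix}
\begin{pmatrix} \cos 2\bar\omega & \sin 2\bar\omega \\ -\sin 2\bar\omega & \cos 2\bar\omega \end{pmatrix}
= \begin{pmatrix} -A(\omega) & -D(\omega) \\ D(\omega) & -A(\omega) \end{pmatrix} \qquad (346) $$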

is also at the same time a solution. The reality condition of the solutions is contained in the requirement that the two vectors [C(ω), B(ω)] and [−A(ω), D(ω)] are of equal length for all ω, which, by means of (344) with

$$ D^2(\omega) = G^2(\omega) + H^2(\omega) - \Gamma^2(\omega), \qquad (347) $$

is guaranteed. From considerations of the special form (318) of the system matrix, Γ(ω) and Γ̄(ω̄) are identically zero, and the reality condition is satisfied:

$$ D^2(\omega) = G^2(\omega) + H^2(\omega), \qquad \bar D^2(\bar\omega) = \bar G^2(\bar\omega) + \bar H^2(\bar\omega). \qquad (348) $$
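The linear form of the conjugation lends itself to a direct computation in which the sign of D selects one of the two conjugate abscissae. The sketch below assumes the system B cos 2ω̄ − C sin 2ω̄ = D, C cos 2ω̄ + B sin 2ω̄ = −A with D² = B² + C² − A²; the function name, the `sign` parameter, and the numbers are illustrative assumptions.

```python
import math

def conjugate_from_linear_system(A, B, C, sign=1):
    """Solve the 2x2 linear system for (cos 2w, sin 2w) and return (w, D).
    `sign` = +1 or -1 picks the branch of the two-valued D."""
    n = B * B + C * C
    d2 = n - A * A
    if n == 0.0 or d2 < 0.0:
        raise ValueError("reality condition violated: no real conjugate abscissa")
    D = sign * math.sqrt(d2)
    cos2 = (B * D - A * C) / n        # from inverting [[B, -C], [C, B]]
    sin2 = -(C * D + A * B) / n
    return 0.5 * math.atan2(sin2, cos2), D

A, B, C = 0.3, 1.0, 0.5               # hypothetical object-side values
w_plus, D_plus = conjugate_from_linear_system(A, B, C, +1)
w_minus, D_minus = conjugate_from_linear_system(A, B, C, -1)
```

Because [C, B] and [−A, D] have equal length, the computed pair (cos 2ω̄, sin 2ω̄) lies on the unit circle, so the angle is recovered without further normalization.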

3. Significance of G(ω), H(ω) and Ḡ(ω̄), H̄(ω̄)

For the directions (328) φ and (329) φ̄ of the object and image lines of the line-grating imaging, which exists between astigmatically conjugated planes with abscissae ω and ω̄, one may derive the representations

H(0)- T(5) - G ( 0 ) + b(W) -

tancp(0) = -

G(0)

+ D(W)

H ( 0 ) + f(0)

+~ ( w ) ~ ( w+) ~ ( 0 )~ ( w+) r(w) ’

-G(o) tan@(w)= ~ ( w-) r ( w ) -

(349) (350)

and for the imaging magnification (330) m of the line-grating imaging ω → ω̄ one obtains

$$ m^4(\omega,\bar\omega) = \frac{\bar G^2(\bar\omega) + \bar H^2(\bar\omega)}{G^2(\omega) + H^2(\omega)}. \qquad (351) $$

The direction φ̄ of the image lines in the plane ω̄ appears as a function of the variable ω, to which ω̄ is astigmatically conjugated. To one ω value there exist, in general, two different ω̄ values, since D(ω) in (345) is two-valued. In order to assign the appropriate D(ω) to one of the two ω̄ values, one must refer back to the first equation of (345). Then the direction φ̄ of the image line appears once more as a function of two variables ω and ω̄, which are mutually astigmatically conjugated. If in (350) one goes over to the double angle 2φ̄(ω), one obtains by the use of Γ(ω) = 0

The reason for the ambiguity of D(ω) now becomes clear in that, along with φ̄(ω), φ̄(ω) + π/2 is also a solution. This means that the directions of the image lines in the two planes ω̄, astigmatically conjugate to ω (in the lens space of the micro-lens touching at Ω̄), are mutually orthogonal.

4. Ambiguity of the Astigmatic Conjugation

In each plane ω of the lens space −π/2 ≤ ω < π/2 of the micro-lens touching at Ω, there are two object lines with the directions φ1(ω) and φ2(ω) which, in general, are not orthogonal. Likewise, there are in each plane ω̄ of the lens space −π/2 ≤ ω̄ ≤ π/2 of the micro-lens touching at Ω̄, two image lines with the directions φ̄1(ω̄) and φ̄2(ω̄). Now if ω̄ is astigmatically conjugated to ω, then there exists between both planes with the abscissae ω and ω̄ a line-grating imaging. Of the two possible function values φ1(ω) and φ2(ω) or φ̄1(ω̄) and φ̄2(ω̄), respectively, only one is permitted to be the direction of the object or the image lines, as the case may be, of the line-grating imaging defined by the astigmatically conjugated point pair ω, ω̄. Assume φ1(ω1) and φ̄1(ω̄1) are the directions of the object and image lines of the line-grating imaging that is defined by the astigmatically conjugated pair ω1, ω̄1. Then φ2(ω1) and φ̄2(ω̄1) are directions of a second family of object lines in ω1 or, alternatively, a second family of image lines in ω̄1, each belonging to a different line-grating imaging. One is defined by the astigmatically conjugated pair ω1, ω̄2 and the other by the astigmatically conjugated pair ω2, ω̄1. In general, however, ω2 and ω̄2 are not astigmatically conjugated. This is in fact only the case if the determinant (321) is zero. The state characterized by det S_IV^(H) = 0 we call orthoastigmatic. It may now be that ω' and ω̄' are the abscissae of another astigmatically conjugated pair of points. Then we cannot immediately recognize whether this pair of points is of the index type 1,1 or 1,2 or, alternatively, 1,1 or 2,1. This difficulty arises from the ambiguity of the functions D(ω) and D̄(ω̄). The attempt to characterize the astigmatically conjugate points ω, ω̄ from the sign of D in (345) brought no clarity, since D(ω) can have zeros. If ω is a zero of D(ω), then it is, according to (339), conjugate to an axial point ω̄ that is a zero of D̄(ω̄), and the imaging defined by this point pair is stigmatic.

F. Advanced and Retarded Quadrupole Optical Imaging

An objective characterization of line-grating imaging may be obtained by a comparison with the axial point ω̄, which is stigmatically conjugate to the axial point ω if the quadrupole fields are not operative. If the quadrupole fields are operative, the translation of the two astigmatically conjugate points ω̄1 and ω̄2 is determined with regard to ω̄. The line-grating imaging is called "advanced" if the translation is positive and "retarded" if the translation is negative. If the identity (338) is partially differentiated, and the requirement (338) totally differentiated, with respect to ω, we obtain an equation analogous to that formed by (345),

$$ \bar B(\bar\omega)\cos 2\omega - \bar C(\bar\omega)\sin 2\omega = \bar D(\bar\omega), \qquad (353) $$

giving a relationship between the differentials dω and dω̄ that must be


fulfilled, since besides the pair ω and ω̄, the other pair ω + dω and ω̄ + dω̄ is astigmatically conjugated.

(354)

Since in the case of round-lens optics dω and dω̄ have the same sign, and this property also holds good in quadrupole optics, D(ω) and D̄(ω̄), with ω and ω̄ as the abscissae of two planes between which a line-grating imaging exists, always have different signs. The relation (354) permits a formulation of the astigmatic conjugation in the following form. If we substitute ω and ω̄ by

$$ u(\omega) = a + b\int \frac{d\omega}{\sqrt{G^2(\omega) + H^2(\omega)}}, \qquad \bar u(\bar\omega) = \bar a + \bar b\int \frac{d\bar\omega}{\sqrt{\bar G^2(\bar\omega) + \bar H^2(\bar\omega)}} \qquad (355) $$

(with a, b, ā, b̄ as constants), the quadrupole optical conjugation may be written

$$ \bar u - u = \lambda, \qquad (356) $$

with the eigenvalue λ = λ_A for the advanced, and λ = λ_R for the retarded, quadrupole imaging. The constants of the quadrupole optical conjugation are chosen in such a way that λ_A and λ_R differ only in sign. The requirement, set out in XI.C.1, concerning unambiguity and metric simplicity of the quadrupole imaging is fulfilled by (355) and (356). By analogy with the stigmatic conjugation of two axial points z and z̄ in the form of a constant translation (173) of their substitutes ω(z) to ω̄(z̄), the astigmatic conjugation of two axial points z(ω) and z̄(ω̄) is described by a constant translation (356) of the substitutes u(ω) to ū(ω̄). The amount |λ| of the translation is determined by the quadrupole lens matrix. The choice of direction of translation, positive or negative, forwards or backwards, advanced or retarded, is free. The quadrupole optical conjugation is unambiguous in the sense that the choice of translation must be decided on before a ray connection between conjugated points can be set up. This ray connection is not as "strong" as in the case of stigmatic conjugation, since only a part of the rays outgoing from the object point as vertex is focused in the image point, but it is nevertheless strong enough to guarantee the maintenance of identity, even with quadrupole optical activity.

G. Classification of Quadrupole Imaging

1. Elliptical Integral

The substitution (355) on the object side, with u(ω) or, alternatively, on the image side, with ū(ω̄), leads to an elliptical integral. The following formulation is concerned with the object-side substitution and, by a placing of bars over all relevant quantities, with the image-side substitution. The sum of the squares G² + H² may be broken down into the product (G + iH)(G − iH), and the two roots of G + iH = 0 satisfy the equation G − iH = 0 on going over to complex conjugates. We designate the two pairs of complex conjugate roots in the complex number plane of the variable

$$ x = -\cot(\omega - \omega_0) \qquad (357) $$

as p

+ iq and p’ + iq’ and obtain + H(Wo)2

G(W0)2

dx

. (358)

The relationship between the roots p + iq, p' + iq' and the values of the functions G and H at an (arbitrary) position ω0, where G² + H² is not zero, is set up in terms of

+

+

.

+ “O

I G(mo) i h ( m o ) = 2 G(oo) + iH(wo)’

as

$$ p = p_0 + c\cos\gamma, \qquad q^2 = (q_0 + c\sin\gamma)^2, \qquad p' = p_0 - c\cos\gamma, \qquad q'^2 = (q_0 - c\sin\gamma)^2. $$

The quantities p0, q0, and c (positive) are unambiguously determined as a function of ω0. For a variation of γ by an odd multiple of π, p and q go over to p' and q' and vice versa. We agree that the symbols will be so chosen that q and q' are not negative and q is not smaller than q'.

2. Complex Triad Matrix

In the formulation of the object-side substitution u(ω), the complex triad matrix A with elements s_jk of region III of the system matrix (318) S^(H) is formed:

$$ A = (s_{41} - is_{51},\ s_{42} - is_{52},\ s_{43} - is_{53})^{*}\,(s_{41} - is_{51},\ s_{42} - is_{52},\ s_{43} - is_{53}) = (a_{jk}) - i(b_{jk}), \qquad (361) $$

where the asterisk denotes the Hermitian conjugate (the column of complex conjugates). The matrix of the real part

$$ (a_{jk}) = \begin{pmatrix}
s_{41}^2 + s_{51}^2 & s_{41}s_{42} + s_{51}s_{52} & s_{41}s_{43} + s_{51}s_{53} \\
s_{42}s_{41} + s_{52}s_{51} & s_{42}^2 + s_{52}^2 & s_{42}s_{43} + s_{52}s_{53} \\
s_{43}s_{41} + s_{53}s_{51} & s_{43}s_{42} + s_{53}s_{52} & s_{43}^2 + s_{53}^2
\end{pmatrix} \qquad (362) $$

is symmetric and the imaginary part is skew symmetric:

$$ (b_{jk}) = \begin{pmatrix}
0 & s_{41}s_{52} - s_{51}s_{42} & s_{41}s_{53} - s_{51}s_{43} \\
s_{42}s_{51} - s_{52}s_{41} & 0 & s_{42}s_{53} - s_{52}s_{43} \\
s_{43}s_{51} - s_{53}s_{41} & s_{43}s_{52} - s_{53}s_{42} & 0
\end{pmatrix}. \qquad (363) $$

In the formulation of the image-side substitution $\bar u(\bar\omega)$, the complex triad matrix $\bar A$ with elements $\bar s_{jk} = s_{kj}$ of region II of the system matrix is formed

The quantities $p_0$, $q_0$, $c$ (positive) and $\gamma$ (mod $\pi$), defined as functions of $\omega_0$, may be written in the form

=

1 1

0

0

0

cos20, sin2w0

-sin2w0 cos 2w,

(365)

By placing a bar on top of all the quantities, one obtains an expression for the image-side quantities $\bar p_0$, $\bar q_0$, $\bar c$ (positive) and $\bar\gamma$ (mod $\pi$) as functions of $\bar\omega_0$ (arbitrary). The following expressions are independent of $\omega_0$ and $\bar\omega_0$ and, because of (316), are the same on the object and image sides.

=

+ + = + iizz + a,, + + + s:4 + s 3 ,

-a11

= -2

a22

($4

a33

s5 :

-iill


2

+

s5 :

-,:s

2

+

$4

- S i 5 - s:5)2

= (s44 = (s44

- sg5)2

+ 4(s44s54 + + 4(s44s45 +

s45s55)2 S54SS5)2.

(367)

-

The quantities $a_{11} - a_{22} - a_{33}$ and $\bar a_{11} - \bar a_{22} - \bar a_{33}$ are equal to each other, and likewise the quantities $b_{12}^2 + b_{13}^2 - b_{23}^2$ and $\bar b_{12}^2 + \bar b_{13}^2 - \bar b_{23}^2$. They are functions of the element combinations

$\sigma^2 = s_{44}^2 + s_{45}^2 + s_{54}^2 + s_{55}^2$, $\Delta = \begin{vmatrix} s_{44} & s_{45} \\ s_{54} & s_{55} \end{vmatrix}$ (368)

of the matrix region (318):

$a_{11} - a_{22} - a_{33} = 2 - \sigma^2$, $-b_{12}^2 - b_{13}^2 + b_{23}^2 = 1 + \Delta^2 - \sigma^2$, (369)

and are the discriminants of the quadrupole optical state.

3. First Kind of Classification

The state of a quadrupole optical system is classified according to the reality of the zeros of the radicand of (358). The state is called:

a. Elliptic, if $q q' > 0$. There is no stigmatically conjugate pair of points.
b. Parabolic, if $q' = 0$, $q > 0$. There is one stigmatically conjugate pair of points.
c. Hyperbolic, if $q' = q = 0$. There are two pairs of stigmatically conjugate points.

This classification does not include the special case in which the elements of regions II and III of the system matrix (318) are jointly zero, i.e., in which (348) $D^2(\omega) = G(\omega)^2 + H(\omega)^2$ and $\bar D^2(\bar\omega) = \bar G(\bar\omega)^2 + \bar H(\bar\omega)^2$ vanish identically in $\omega$ or $\bar\omega$, respectively. In this case the quadrupole optical imaging is fully stigmatic.

4. Second Kind of Classification

The state of a quadrupole system is classified on a scale a to f according to the criteria fulfilled by the two quadrupole discriminants.

a. $b_{12}^2 + b_{13}^2 - b_{23}^2 \neq 0$; then $q q' > 0$ and the state is elliptic.
b. $b_{12}^2 + b_{13}^2 - b_{23}^2 = 0$, $b_{23} \neq 0$; then $q' = 0$, $q > 0$ and the state is parabolic.
c. $b_{jk} = 0$, $a_{11} - a_{22} - a_{33} > 0$; then $q q' > 0$ and the state is elliptic.
d. $b_{jk} = 0$, $a_{11} - a_{22} - a_{33} < 0$; then $q' = q = 0$ and the state is hyperbolic.
e. $b_{jk} = 0$, $a_{11} - a_{22} - a_{33} = 0$; then $q' = q = 0$ with $c = 0$, and the two stigmatically conjugate point pairs coincide.
f. $b_{jk} = 0$, $a_{jk} = 0$, and the conjugation is fully stigmatic.

H. Constants of the Quadrupole Optical Conjugation

1. Elliptic and Parabolic State

Here $q > 0$, and with the substitution

$x = -\cot(\omega - \omega_0) = p + q\tan(\varphi - \varphi_0)$ (370)

and the abbreviations ($r$, $s$ both positive)

$r^2 = (p - p')^2 + (q + q')^2$, $s^2 = (p - p')^2 + (q - q')^2$, (371)

one obtains the integrand (358) of the elliptic integral in the normal form

$\dfrac{(r + s)\sqrt{G(\omega_0)^2 + H(\omega_0)^2}}{2\sqrt{G(\omega)^2 + H(\omega)^2}}\,d\omega = du = \dfrac{d\varphi}{\sqrt{1 - k^2\sin^2\varphi}}$. (372)

The constant $\varphi_0$ (mod $\pi$), which depends on $\omega_0$, is tied to the condition

$rs\cos 2\varphi_0 = (p - p')^2 + q'^2 - q^2$, $rs\sin 2\varphi_0 = 2q(p' - p)$. (373)

$k$ is the modulus of the elliptic integral

$u = F(k, \varphi) = \displaystyle\int_0^{\varphi} \frac{d\varphi}{\sqrt{1 - k^2\sin^2\varphi}}$, $k^2 = \dfrac{4rs}{(r + s)^2}$. (374)

It is independent of $\omega_0$, coincides with the image-side value, and is a function of the two discriminants of the quadrupole optical state. This is brought out in the following representation from a consideration of (366) and (367):

$1 - \dfrac{1}{2}k^2 = \dfrac{r^2 + s^2}{r^2 + s^2 + \sqrt{(r^2 + s^2)^2 - (r^2 - s^2)^2}}$, (375)

$\dfrac{r^2 + s^2}{2} = 2q_0^2 + c^2\cos 2\gamma + 3c^2$, (376)

$\left(\dfrac{r^2 - s^2}{2}\right)^2 = (2q_0^2 + c^2\cos 2\gamma - c^2)^2$. (377)

Furthermore, the quantity $g_0^2$ is independent of $\omega_0$ or $\bar\omega_0$, respectively.

2. Substitution Matrix

The substitution (370) may be written in the form

$\rho\cos(\omega - \omega_0) = p\cos(\varphi - \varphi_0) + q\sin(\varphi - \varphi_0)$,
$\rho\sin(\omega - \omega_0) = -\cos(\varphi - \varphi_0)$, (379)

where $\rho^2$ is defined by the sum of the squares of the two right-hand sides. By multiplying the two equations together or, respectively, each with itself, we form

and obtain for the substitution matrix

I

1 R,= 0 (0

0 cos 20, - sin 20,

o \ sin 20, cos 20,

I

1+p2+q2 ~2q

P

P

1

+ p2 - 42

\ +p2-q21 2q

~

-

4

-1 +p2+q2 P

-l

2q

*i 1

0 c0s2cp0 -sin2cp0 . sin2q0 c0s2cpO o ,

The image-side substitution matrix R, is obtained by placing bars on all relevant quantities.


3. Conjugation Matrix

The astigmatic conjugation in the variables o and (5 is indicated by the condition (333) and this becomes, on substituting cp and Cp, sin2F [c0s12J

- A,.

with the conjugation matrix A,

[c0s12J sin2cp = 0,

$\bar R^{T} S_{\mathrm I} R = (\alpha_{jk})$. (383)

The inverse function $\varphi(u)$ of (374) is called the amplitude, and $\sin\varphi = \operatorname{sn} u$, $\cos\varphi = \operatorname{cn} u$ are the Jacobi elliptic functions. The astigmatic conjugation in the variables $\bar u$ and $u$ is represented as

=

(384)

On the other hand, we know that the relation (356) holds between astigmatically conjugate points. This means that, on replacing $\bar u$ by $u + \lambda$, equation (384) must be satisfied identically in $u$. The addition theorem of the elliptical functions permits this if the conjugation matrix has the well-determined structure

A,

= 033

l

cn ’A 1 + dn2A

0

1 -dn2A 1 + dn2A

0

2 dnil 1 + dn2A

0

1 - dn2A 1 +dn2A

0

1

1-4

(385)
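As a numerical illustration of the amplitude function used in this subsection, the following Python sketch evaluates $u = F(k, \varphi)$ of (374) by Simpson's rule and inverts it by Newton iteration to obtain $\varphi(u)$ together with $\operatorname{sn} u$, $\operatorname{cn} u$ and $\operatorname{dn} u$. The function names, the quadrature rule and the tolerances are illustrative assumptions and are not taken from the original text; the sketch presumes $0 \le k < 1$.

```python
import math


def ellip_f(k, phi, n=2000):
    """Incomplete elliptic integral of the first kind, u = F(k, phi), cf. (374).

    Evaluated with the composite Simpson rule on n (even) subintervals.
    """
    if n % 2:
        n += 1
    h = phi / n

    def integrand(t):
        return 1.0 / math.sqrt(1.0 - (k * math.sin(t)) ** 2)

    total = integrand(0.0) + integrand(phi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def amplitude(k, u, tol=1e-12, max_iter=50):
    """Amplitude phi(u): the inverse of u = F(k, phi), found by Newton iteration."""
    phi = u  # exact for k = 0, a reasonable starting value otherwise
    for _ in range(max_iter):
        err = ellip_f(k, phi) - u
        if abs(err) < tol:
            break
        # Newton step, using dF/dphi = 1 / sqrt(1 - k^2 sin^2 phi)
        phi -= err * math.sqrt(1.0 - (k * math.sin(phi)) ** 2)
    return phi


def jacobi_sn_cn_dn(k, u):
    """Return sn u = sin(phi), cn u = cos(phi), dn u = sqrt(1 - k^2 sn^2 u)."""
    phi = amplitude(k, u)
    sn, cn = math.sin(phi), math.cos(phi)
    return sn, cn, math.sqrt(1.0 - (k * sn) ** 2)
```

With such a routine the translation (356) can be tried out numerically: for a given $u$ and an assumed value of $\lambda$, `jacobi_sn_cn_dn(k, u + lam)` gives the elliptic functions at the conjugate value $\bar u = u + \lambda$.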

4. Transformed System Matrix

The conjugation matrix (383) A, is a three-row matrix of the region I of a transformed system matrix S(”‘)


with the quadrupole matrix H‘ as generator sin@, cos@,

:)(

-cos@,)L(ij sin@o & p

sino, -cosW,

-cosoo

COSO,

sin(5, sinw,

For the transformed system matrix S(”’),the orthogonality relation (316) holds with H’ instead of H. From this we obtain, from a consideration of the structure (385) of A,, the two determining equations for A (3 - (4 + k2)sn2A+ (4 - 2k2)sn41}

(388)

These are supplemented by the determinant relation of the matrix equation (383)

5. Eigenvalue Determination

We write for brevity u = 4(1

2b

=a

+ a2)+ (4 - 2k2)g2,

+ 3k2gi, 12312

2b’ = (sign cr22)a“Z

(391)

-A

and test with these quantities, depending only on the two discriminants of the quadrupole optical state, the content of the determining equations (388) to (390). By a linear combination of (388) and (390), we obtain, firstly, an expression for ui2 12a,2, = a and secondly, a quadratic equation for sn2A

(392)

$a\,\operatorname{sn}^4\lambda - 2b\,\operatorname{sn}^2\lambda + 3g_0^2 = 0$. (393)

Further, by substituting (390) into (389) and using (392), a second quadratic equation for $\operatorname{sn}^2\lambda$ is obtained:

$a\,\operatorname{sn}^4\lambda - 2b'\operatorname{sn}^2\lambda - 3g_0^2 = 0$. (394)

From this one obtains, on the one hand, by addition, the determining equation for the eigenvalue,

$\operatorname{sn}^2\lambda = \dfrac{b + b'}{a}$, (395)

and, on the other hand, by subtraction and application of (395), the relation

$b^2 - b'^2 = 3a g_0^2$. (396)

Depending on the choice of the sign of $\alpha_{22}$, which is adjustable through $\gamma$ mod $\pi$ and $\bar\gamma$ mod $\pi$ (359), $b'$ is positive or negative (391). If we agree that $q$ should not be smaller than $q'$ (positive), one then has to decide that $b'$ should be negative:

$\operatorname{sign}\alpha_{22} = -\operatorname{sign}\Delta$. (397)

With $0 < \lambda < F(k, \varphi = \pi/2)$, $\lambda$ is unambiguously determined via (395), and $\lambda_A = \lambda$, $\lambda_R = -\lambda$ are the eigenvalues of the advanced and retarded quadrupole imaging; they depend only on the two discriminants of the quadrupole state.

6. Hyperbolic State

Here $q' = q = 0$ and the substitution (370) is not applicable. In (360) $\gamma$ takes on the value 0 mod $\pi$, and the modulus $k$ of the elliptical integral (374) has, since by (371) $r = s = 2c$, the value $k = 1$. According to (369), $\sigma^2 > 2$ and $\Delta^2 = \sigma^2 - 1$. The determining equation (395) for the eigenvalue $\lambda$ goes over into

$\cosh^2\lambda = \sigma^2 - 1 = \Delta^2$. (398)

With $b_{jk} = 0$, comparison of (366) and (367) shows that $q_0 = 0$ and $H(\omega) = \text{const}\cdot G(\omega)$ or $\bar H(\bar\omega) = \text{const}\cdot\bar G(\bar\omega)$. It is sensible to determine $\omega_0$ mod $\pi/2$ in such a way that (359) $p_0 = 0$:

$\tan 2\omega_0 = \dfrac{a_{23}}{a_{33}} = \dfrac{a_{22}}{a_{23}}$. (399)

$c(\omega_0)$ is determined from (365), and both zeros $\omega_n = \omega_1, \omega_2$ of the radicand (358) are fixed by

$\cot(\omega_n - \omega_0) = \pm c(\omega_0)$, (400)

with $-\pi/2 \le \omega_1 < \omega_2 < \pi/2$ on the $\omega$-scale. The zeros $\omega_1$ and $\omega_2$ are stigmatically conjugate to the zeros $\bar\omega_1$, $\bar\omega_2$ of $\bar D^2(\bar\omega) = \bar G(\bar\omega)^2 + \bar H(\bar\omega)^2$, in accordance with (339).


Instead of substitution (370) there now arises, with $d\varphi = du/2$,

$\dfrac{\sinh(\varphi + \varphi_0)}{\cosh(\varphi + \varphi_0)} = c(\omega_0)\tan(\omega - \omega_0)$ if $\omega_0$ is "internal" to the $\omega$-partial scale, (402)

$\dfrac{\sinh(\varphi - \varphi_0)}{\cosh(\varphi - \varphi_0)} = -\dfrac{1}{c(\omega_0)}\cot(\omega - \omega_0)$ if $\omega_0$ is "external" to the $\omega$-partial scale. (403)

The $\omega$ scale is broken up by the two zeros $\omega_1$, $\omega_2$ mod $\pi$ into two partial scales. If $\omega$ runs from $\omega_1$ to $\omega_2$ and thereby passes $\omega_0$, then $\omega_0$ lies "internally" with respect to the partial scale, otherwise "externally." On the image side, $2\bar\omega_0 = \bar\omega_1 + \bar\omega_2$ mod $\pi$ must be chosen such that with $\omega_0$ internal or external, respectively, $\bar\omega_0$ is also internal or external, respectively. The substitution (403) ($\omega_0$ external to the partial scale) can be written in the form

$\rho\cos(\omega - \omega_0) = c(\omega_0)\sinh(\varphi - \varphi_0)$,
$\rho\sin(\omega - \omega_0) = -\cosh(\varphi - \varphi_0)$. (404)

This agrees formally with (379)if we put p = 0, q = c and replace sin(q - qo), cos(q - cpo) by s i n W - q0X cosh(cp - q0). The substitution matrix R, in cosh2q (405)

C

with wo external to the partial scale, has the form 0

0

-c+-

C

1 c

c

cosh2qo - sinh2qo cosh2qo 0 The form with wo internal to the partial scale (substitution 402) comes out of this by the transformation 0,+ 0 0

+ n/2,

c + l/c,

500 +

-cpo.

(407)

The condition (333) for the quadrupole optical conjugation has, with the


(


substitutions cp and Cp, the form

c0s;2q)

[c0s;2v)

sinh2Cp A, sinh2cp = 0,

(408)

with the conjugation matrix formed analogously to (383). On the other hand, there exists between $\varphi$ and $\bar\varphi$, because of (356), the relation

2(4i - cp) = 1. (409) With the replacement of 24i by (2Cp + A), equation (408) must be identically fullfilled by cp. The addition theorem of the hyperbolic functions permits this if the conjugation matrix A, has the well-determined structure A, = RT. Sl("). R, = (sign A)

[ -'

-coshA)'

(410)

With this equation the quantity q0 - pois also determined. The substitution existing between @ and 6 can be achieved via (402) and (403),respectively, by a formal placing of bars over all relevant quantities. 7. Cardinal Elements If the biquadratic form (333) in = tan 6,c = tan w turns out as the product of two bilinear forms, the setting to zero of one of the two factors,

gives rise to a conjugation, which in the usual form

can be described with the cardinal elements

The substitution (370),which is valid for the elliptical and parabolic state, has the form

-'("i"')=(: P coscp

:)(-;:)=(-;:)(:

Ey

(414)


with sincp,

By multiplication by

-coscpo)(;

A)

(-:

=(-;ow)(;

:)(

sinw, -cosw,

cosw, sino, 1415)

there results

3'c -0.

(416)

We form the scalar product of the object- and image-side vectors (414) and (416)

"I(;

and obtain by linear combination the form (411) with the cardinal matrix

(; ;)=(;

;y[

);

$\begin{pmatrix} \sin(\bar\varphi - \varphi) & -\cos(\bar\varphi - \varphi) \\ \cos(\bar\varphi - \varphi) & \sin(\bar\varphi - \varphi) \end{pmatrix}$. (419)

When this matrix has the same value for all point pairs $\omega$, $\bar\omega$ of the advanced line-grating image, i.e., it is constant, and if this property also holds for all point pairs of the retarded astigmatic conjugation, the cardinal elements are fixed by (413). The astigmatic conjugation (356) by translation of the variable $u$ shows us that this case occurs when the modulus $k$ of the elliptical integral has the value $k = 0$. Then the difference $\bar\varphi - \varphi$ is constant and has for the advanced conjugation the value $\lambda_A = \lambda$, and for the retarded conjugation the value $\lambda_R = -\lambda$. In the general elliptical state ($k > 0$) and in the parabolic state ($k = 1$), the cardinal elements are not fixed, since the cardinal matrix is not constant. In the hyperbolic state, we obtain for the cardinal matrix constructed by analogy with (419), if we start out from the substitution (402) with $\omega_0$ internal to the $\omega$


partial scale -

(: i)

=

-sinh(cp-cp)

(g ): (

L3)

-cosh(rp-cp)(A

cosh(cp - cp)

sinh(@- cp)

C (420)

with

(: ):

sinhq, = (coshg,

-coshqo)( -sinhq,

Ol ~

](

sinw, -coswo

coswo sinw,

& (421)

The difference $\bar\varphi - \varphi$ is constant for the conjugate points $\omega$, $\bar\omega$ and has for the advanced conjugation the value $\lambda_A/2 = \lambda/2$, and for the retarded conjugation the value $\lambda_R/2 = -\lambda/2$. The cardinal matrix is invariant with respect to a transformation (407) of the object- and image-side quantities if thereby the sign of $\bar\varphi - \varphi$ is reversed. This means that the sign of the eigenvalue does not characterize the state of the cardinal matrix. That is, however, not necessary, since a conjugation described by (412) is already unambiguous.
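The first kind of classification (Section G.3) and the modulus defined by (371) and (374) lend themselves to a short numerical check. The following Python sketch is only an illustration: the function names and the tolerance are assumptions, and the inputs are taken to be the root parameters $p$, $q$, $p'$, $q'$ with $q \ge q' \ge 0$.

```python
import math


def classify_first_kind(q, q_prime, tol=1e-12):
    """Classify the quadrupole optical state from q >= q' >= 0 (first kind)."""
    if q_prime < -tol or q < q_prime - tol:
        raise ValueError("expected q >= q' >= 0")
    if q_prime > tol:
        return "elliptic"    # q q' > 0: no stigmatically conjugate pair
    if q > tol:
        return "parabolic"   # q' = 0, q > 0: one stigmatically conjugate pair
    return "hyperbolic"      # q' = q = 0: two stigmatically conjugate pairs


def modulus_from_roots(p, q, p_prime, q_prime):
    """Return (r, s, k) with r, s from (371) and k^2 = 4 r s / (r + s)^2 from (374)."""
    r = math.hypot(p - p_prime, q + q_prime)
    s = math.hypot(p - p_prime, q - q_prime)
    if r + s == 0.0:
        raise ValueError("degenerate case p = p', q = q' = 0 (c = 0)")
    return r, s, 2.0 * math.sqrt(r * s) / (r + s)
```

For $q' = 0$ (parabolic) and for $q = q' = 0$ with $p \ne p'$ (hyperbolic) the sketch returns $r = s$ and hence $k = 1$; for $p = p'$, $q = q'$ it returns $s = 0$ and $k = 0$.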

XII. CONCLUDING REMARKS

To conclude, we would like to answer the question: how is the method set out here to be classified in relation to existing methods? If one formulates the laws of motion in the form of a differential equation (Lorentz equation of motion), the solutions reveal the trajectories. If one wishes to reduce to a minimum the effort needed for the numerical integration of the differential equation and to find closed forms for the optical properties of the field configurations, one can develop the differential expression in powers of the ray position and direction and so construct the trajectory by means of perturbation methods. O. Scherzer (1936) went this way and set out formulae for the image aberrations of third order. If one formulates the laws of motion as a variational problem, one can then, as W. Glaser (1952) did, from the outset formulate the electron motion as an optical problem and treat the electron optical imaging in the region of the third order by analogy with the Seidel image aberration theory. Of basic importance here is the representation of the action function (Eikonal) as an integral over the refractive index along a particular trajectory of the paraxial region. The refractive index (Lagrange's function) is so constructed that the Euler equations of motion associated


with the variational problem agree with the Lorentz equations. Since the trajectories, expressed in beam parameter variables, can be expressed in terms of two fundamental solutions of the paraxial differential equation, these ray parameters are explicitly present, up to the fourth order, in the action function. Partial differentiation with respect to them leads to formulae for the image aberrations of third order. They agree with those obtained from formulae based on a perturbation method.

Both these methods were developed for electron optics at a time when the implementation of numerical calculations was tedious and time-consuming. It is therefore understandable, with the advent of large computers, that the methods of solving electron optical problems might also change, especially in those cases in which the degree of difficulty and complexity has greatly increased, in particular in the area of microlithography.

If the field configuration is described sufficiently accurately, and if one uses an integration method with sufficiently small step widths, one can, without any further analysis, introduce direct trajectory integration (DTI methods). By a choice of an array of initial values and numerical integration of the differential equation for each initial value, one can build up a discrete trajectory bundle and, from a system of equations, calculate the aberrations (Kasper, 1985). With the DTI method, the trajectories of a discrete bundle are calculated independently of each other. A single trajectory gives no information about the structure of the surrounding ray bundle, in particular, about defocusing and astigmatism. Since the image aberrations are determined from the differences between the trajectories evaluated, oscillations or discontinuities in the higher order coefficients of the field representations can have a very harmful effect on the final result (image aberrations). A combination of the DTI method and the perturbation method avoids these difficulties.
The differences between the trajectories are then represented in the form of integrals and the higher order coefficients may be eliminated by partial integration. In electron lithography, the electron beam serves to process a target (resist layer) and the sharpness of the shaped ray bundle (electron probe) is influenced by local image aberrations. These cannot, in general, be derived from the global image aberrations, since the deflection objective contains correcting fields that are altered in a controlled manner, according to the position of the electron probe on the target. In Hahn (1980) a method was described in which the intensity inside a shaped electron probe (line probe) may be structured in a controlled manner over many parallel electron channels in order to attain a higher fabrication productivity (writing speed). With this kind of modulation, a quadrupole lens is used to produce an astigmatically structured ray bundle. In this example, the advantage of the method described here is particularly clear, since the


quadrupole matrix is calculated as a whole and need not be constructed from an assortment of single trajectories. The method described here can be applied in such a way that each trajectory can be precisely calculated by providing its initial values. Thus, the actual trajectory integration is carried out on the principle of the field-basis coupling, i.e., the coupling between the electromagnetic field and the ray-optical totality of the ray bundle surrounding a particular trajectory. The field-basis coupling is effected not pointwise but intervalwise in the form of a step matrix (lens matrix) whose representation (53), within the framework of an iteration process and in connection with the representation (95), is a solution of the equation of motion. The trajectory is consistent with the primary data (raster and field values) and with the secondary data (lens matrix), since from the secondary data the primary data can be reconstituted (correction by phantom matrix).

In the development of this theory, the starting point made use of concepts that are directly related to space and time considerations and can be recognized as thought categories. Its application finds both these categories in the concepts of the partial-lens image and the micro-lens image. In contrast to the partial-lens image, in which a partial-lens can be characterized in the usual way, with vertical planes as limiting surfaces and entering and exiting rays, the micro-lens image is unusual. A micro-lens is described only in terms of its action, in that one postulates that in a ray space a first (object-side) state is transformed into a second (image-side) state so that one cannot compare the two states simultaneously. Such a comparison serves, in the partial-lens image, to determine the action (lens matrix), since the identity of the object and image rays, i.e., the common membership of one and the same trajectory, is postulated.
In the micro-lens image, such a comparison is not possible, and even not necessary, since the action is known. What must be determined in the micro-lens image is the identity. This problem is solved in the framework of a theory of micro-lenses by showing that the optical conjugation is unambiguous and metrically simple. The basis variable is a substitute of the axial abscissae and in this substitution, which varies from iteration to iteration, are the paraxial optics of a micro-lens with the particular trajectory included as principal ray of a bundle. In a further development of the method, not included here, the particular trajectory is the axis of a curved line coordinate system, on which the field has been transformed. Here, the paraxial optical properties of the ray bundle, enveloping a particular ray as principal ray, are directly represented. For the quadrupole optics, in a similar way, the unambiguity and metric simplicity of the optical conjugation have been built up. The substitution of the basis variable leads to an elliptical integral and enables a classification to be made of the state of a quadrupole optical system.
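The direct trajectory integration (DTI) idea discussed in these remarks, in which each trajectory of a discrete bundle is integrated independently from its own initial values, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the method of this chapter: it uses the non-relativistic Lorentz equation, a classical fourth-order Runge-Kutta step, and hypothetical field functions (`no_field`, `uniform_bz`) chosen only so that the result can be compared with the known circular motion in a uniform magnetic field.

```python
import math

Q_OVER_M = -1.758820e11  # electron charge-to-mass ratio in C/kg (non-relativistic)


def lorentz_rhs(state, e_field, b_field):
    """Right-hand side of dx/dt = v, dv/dt = (q/m)(E + v x B)."""
    x, y, z, vx, vy, vz = state
    ex, ey, ez = e_field(x, y, z)
    bx, by, bz = b_field(x, y, z)
    ax = Q_OVER_M * (ex + vy * bz - vz * by)
    ay = Q_OVER_M * (ey + vz * bx - vx * bz)
    az = Q_OVER_M * (ez + vx * by - vy * bx)
    return (vx, vy, vz, ax, ay, az)


def rk4_step(state, dt, e_field, b_field):
    """One classical fourth-order Runge-Kutta step of width dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorentz_rhs(state, e_field, b_field)
    k2 = lorentz_rhs(shifted(k1, dt / 2.0), e_field, b_field)
    k3 = lorentz_rhs(shifted(k2, dt / 2.0), e_field, b_field)
    k4 = lorentz_rhs(shifted(k3, dt), e_field, b_field)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_bundle(initial_states, dt, n_steps, e_field, b_field):
    """Integrate every initial value independently; returns a list of trajectories."""
    bundle = []
    for state in initial_states:
        trajectory = [state]
        for _ in range(n_steps):
            state = rk4_step(state, dt, e_field, b_field)
            trajectory.append(state)
        bundle.append(trajectory)
    return bundle


def no_field(x, y, z):
    """Hypothetical test field: no electric field."""
    return (0.0, 0.0, 0.0)


def uniform_bz(x, y, z):
    """Hypothetical test field: uniform 1 mT magnetic field along z."""
    return (0.0, 0.0, 1.0e-3)


# A small bundle: same start point, three different transverse velocities.
starts = [(0.0, 0.0, 0.0, vx, 0.0, 1.0e7) for vx in (1.0e5, 2.0e5, 3.0e5)]
bundle = integrate_bundle(starts, dt=1.0e-11, n_steps=1000,
                          e_field=no_field, b_field=uniform_bz)
```

As the text notes, each trajectory in `bundle` is obtained on its own; information about the surrounding ray bundle, such as aberration coefficients, would have to be extracted afterwards from differences between the trajectories.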

REFERENCES

Glaser, W. (1952). "Grundlagen der Elektronenoptik." Springer, Wien.
Hahn, E. (1985). Optik 69, 45.
Hahn, E. (1979). DD 147018, H01J 37/30.
Hahn, E. (1980). DD 158 197, H01J 37/30; US 4.472.636, H01J 37/302.
Kasper, E. (1985). Optik 69, 117.
Scherzer, O. (1936). Z. Phys. 101, 693.
Zurmühl, R. (1963). "Praktische Mathematik für Ingenieure und Physiker." Springer, Berlin, Heidelberg, New York.

ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 75

Derivation of a Focusing Criterion by a System-Theoretic Approach

MICHAEL KAISER
Dornier System GmbH
Friedrichshafen, Federal Republic of Germany

List of Symbols
I. Introduction
II. System-Theoretic Approach to Scalar Radiation Problems in Homogeneous Space
  A. The System-Theoretic Concept
  B. Scalar Formulation of the Radiation Problem
  C. The Electromagnetic Radiation Problem
III. Focusing by Plane Radiators
  A. Illumination of the Infinitely Extended Radiator, Scalar Treatment
  B. The Optimum Focusing Illumination and Its Radiated Field
  C. The Effect of Limiting the Optimum Focusing Illumination
  D. Comparison with the Conventional Focusing Illumination
  E. The Electromagnetic Case
IV. Focusing in Stratified Media
  A. Propagation of Plane Waves
  B. The Transfer Function Tensor
  C. Examples
V. Conclusions
VI. Appendix
  A. Theorems of Fourier and Hankel Transform Used in This Contribution
  B. Derivation of the Optimum Focusing Input Illumination (Eq. 59)
  C. Derivation of the Optimum Focusing Input Illumination with the Constraint of Finite Aperture Dimensions (Eq. 73)
References

Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014675-4

LIST OF SYMBOLS

$a$  aperture radius
$\mathbf{a}(k_x, k_y)$  spectral function of the electric field
$\mathbf{a}_t(k_x, k_y)$  transverse part of $\mathbf{a}(k_x, k_y)$
$a_z(k_x, k_y)$  longitudinal part of $\mathbf{a}(k_x, k_y)$
$a^i_{\parallel n}(K)$, $a^r_{\parallel n}(K)$  spectral function of the parallel polarized part of the incident and reflected field, respectively, in the n-th layer for an electric field strength impressed in the input plane
$a^i_{\perp n}(K)$, $a^r_{\perp n}(K)$  spectral function of the perpendicular polarized part of the incident and reflected field, respectively, in the n-th layer for an electric field strength impressed in the input plane
$A^i_n$, $A^r_n$  amplitude of an incident or reflected plane wave, respectively, in the n-th layer
$A^i_{\parallel n}$, $A^r_{\parallel n}$  parallel polarized part of $A^i_n$ or $A^r_n$, respectively
$A^i_{\perp n}$, $A^r_{\perp n}$  perpendicular polarized part of $A^i_n$ or $A^r_n$, respectively
$b^i_{\parallel n}(K)$, $b^r_{\parallel n}(K)$  spectral function of the parallel polarized part of the incident and reflected field, respectively, in the n-th layer for a magnetic field strength impressed in the input plane
$b^i_{\perp n}(K)$, $b^r_{\perp n}(K)$  spectral function of the perpendicular polarized part of the incident and reflected field, respectively, in the n-th layer for a magnetic field strength impressed in the input plane
diameter
thickness of the n-th layer
distance
two-dimensional Fourier-transformed vector of the electric field amplitude factor
components of $\mathbf{e}$
spectral function of the electric field
transverse part of $\mathbf{e}_0$
components of $\mathbf{e}_0$
vector of the electric field
tangential electric field in the input plane
components of $\mathbf{E}$
amplitude factor of the electric field strength
electric input field strength
components of the electric input field strength
amplitude of an incident homogeneous plane wave
frequency
Fourier transform of the function $F(x)$
two-dimensional Fourier transform of the function $F(x, y)$
Hankel transform of the function $F(\rho)$
input function of a system

input function of a two-dimensional system
circular symmetric function of $\rho$
operator of the one- and two-dimensional Fourier transform, respectively
Fourier transform of the function $G(x)$
two-dimensional Fourier transform of the function $G(x, y)$
output function of a system
output function of a two-dimensional system
vector of the magnetic field strength
tangential magnetic field in the input plane
amplitude factor of the magnetic field
magnetic input field strength
components of the magnetic input field strength
imaginary unit, $\sqrt{-1}$
Fourier-transformed surface current density
electric surface current density
wave number
propagation vector
wave number of the n-th layer
transverse components of the propagation vector or variables of the Fourier transform, respectively
z-component of the propagation vector of the n-th layer
radial component of the propagation vector or variable of the Hankel transform, respectively

$\mathbf{k}^i_n$, $\mathbf{k}^r_n$  propagation vector of an incident or reflected wave, respectively, in the n-th layer
$k_0$  free space wave number
$K$  magnitude of the transverse part of the propagation vector
$\mathbf{K}$  transverse part of the propagation vector
$\mathbf{n}$, $\mathbf{n}'$  normal vectors
$\mathbf{r}$, $\mathbf{r}'$  field point vector, source point vector
$r_{\parallel n}$, $r_{\perp n}$  reflection coefficient at the n-th layer for parallel and perpendicular polarization, respectively
3-dB radius
distance between an aperture element and the focus
surface
transfer function of a one-dimensional system
transfer function of a two-dimensional system
transfer function of a two-dimensional circular symmetric system
impulse response of a one-dimensional system
impulse response of a two-dimensional system
impulse response of a two-dimensional circular symmetric system
time variable
Fourier-transformed scalar wave function
$\mathbf{u}_x$, $\mathbf{u}_y$, $\mathbf{u}_z$  unit vectors in Cartesian coordinates
unit vectors in cylindrical coordinates
Fourier-transformed optimum focusing illumination
time-dependent voltage
amplitude factors
scalar wave function
unit vectors of the parallel polarized incident and reflected field, respectively, in the n-th layer
unit vectors of the perpendicular polarized incident and reflected field, respectively, in the n-th layer
volume
Cartesian coordinates of field point
input plane coordinates
focal plane coordinates
depth of the n-th boundary
intrinsic impedance of the n-th layer
attenuation
transfer function tensor of the system electric input field strength-radiated electric field
transfer function tensor of the system magnetic input field strength-radiated electric field
elements of $\gamma(k_x, k_y; z)$
elements of $\gamma^H(k_x, k_y; z)$
impulse response tensor of the system electric input field strength-radiated electric field
impulse response tensor of the system magnetic input field strength-radiated electric field
elements of $\Gamma(x, y, z)$
elements of $\Gamma^H(x, y, z)$

$\delta$  loss tangent
$\delta(x)$, $\delta(x, y)$  one- and two-dimensional Dirac delta function, respectively
delta-operator
relative permittivity
permittivity of vacuum
wavelength
free space wavelength
wavelength in a dielectric
integration variable
cylindrical coordinates of the field point
cylindrical coordinates of the input plane
reflection coefficient at the n-th layer for parallel and perpendicular polarization, respectively (assuming no further boundaries are present)
Fourier-transformed correlation function
correlation function
phase angle
angular frequency
one- and two-dimensional convolution operators, respectively, e.g., $U(x, y) ** S(x, y)$
correlation operator
$S^*$  complex conjugate of $S$

I. INTRODUCTION

System theory has become a widespread tool in electrical engineering and is well known to the engineer from communication and control theory. System theory has yet to find very many applications in the solution of wave propagation problems, however. In the present paper a focusing criterion for electromagnetic waves will be derived by means of a system-theoretic approach. Emphasis is placed neither on an extensive representation of all aspects of system theory nor on mathematical rigour in deriving some of the theorems of computational methods commonly used in system theory. Rather, we shall be concerned with the applicability of system theory to the treatment of wave propagation problems. For an extensive treatment of system theory, the reader is referred to the books by Papoulis (1968) and Gaskill (1978); the reader more interested


in the mathematics of system theory is referred to Papoulis (1962) or Bracewell (1965). There is an increasing number of applications of electromagnetic waves outside their classical domain (the transmission of information), for example, in industrial measurements, remote-sensing tasks, imaging purposes, and, in the medical field, diagnosis and therapy. In many of these applications, it is desirable to focus radiated electromagnetic energy to a well-defined region or to have a preferred reception of the radiation emanating from a certain region; it is essential that the transmitting or receiving device be focused on that region. As an example, in localized microwave hyperthermia in cancer treatment (Melek and Anderson, 1980) focusing is required in order to concentrate the electromagnetic energy effectively on the tumor so as to reduce damage to healthy tissue. Focusing is also essential in receiving antennas of microwave thermographic measurement devices used in the detection of warmer areas (i.e., areas with enhanced levels of emitted radiation) within the body, e.g. a tumor, (Myers et al., 1979). In this case focusing is essential to obtain good spatial resolution. In non-destructive material testing, inhomogeneities, e.g. flaws, in an otherwise homogeneous material can be detected by means of scattered radiation (Holler, 1983). In these applications, the medium in which focusing is required is usually not homogeneous. In microwave hyperthermia applications, skin and fat layers must be penetrated before the microwave radiation reaches the target area in the tissue below. As in many other applications, a planar stratified medium may serve as a good model, where skin, fat and muscle tissue are approximated by layers with different (complex) permittivities (Guy, 1971). 
Under certain assumptions, the same model may be used to approximate the different geological layers of the earth in the application of electromagnetic methods in geophysical probing (Wait, 1979; Hansen, 1983). The goals of the present paper are the development of a method of synthesizing an illumination focusing electromagnetic radiation in stratified structures and the determination of the radiated field. The classical focusing criterion given in the literature (e.g., Wehner, 1949) is a heuristic one and assumes a distribution of elementary sources in the aperture, the contributions of which have to interfere constructively at the focus. The computation of the field radiated from a focusing antenna into a stratified medium is a rather difficult field-theoretic problem. Here we shall treat radiation problems by a system-theoretic approach. Like the classical methods, such an approach allows us to carry out a field computation from a given source distribution. A synthesis of an aperture illumination for producing a desired radiation field is also possible. The input plane/focal plane system can be completely described by its impulse response in the spatial domain or by its transfer function in the spatial

FOCUSING CRITERION BY A SYSTEM-THEORETIC APPROACH


frequency domain, respectively. They are interrelated by the Fourier transform. The transfer function can be given analytically for a homogeneous medium as well as for a structure composed of planar layers between input and focal plane. The impulse response, however, can be represented by a closed expression for the homogeneous medium only, whereas the stratified medium requires a numerical solution. The desired focusing aperture illumination is obtained from the impulse response.

II. SYSTEM-THEORETIC APPROACH TO SCALAR RADIATION PROBLEMS IN HOMOGENEOUS SPACE

A. The System-Theoretic Concept

In this chapter, the application of system-theoretic methods to wave propagation problems is introduced. For this purpose some basic system-theoretic concepts are presented. Below, the term "system" applies to a configuration having at least one input, whose output or outputs exhibit a reaction to a stimulus at the input. The reaction depends on the properties of the configuration and on the stimulus itself (Gaskill, 1978). For example, the stimulus of an electric network may be a voltage pulse at the input terminals, which results in a voltage pulse at the output terminals that depends on the shape of the input pulse and on the physical realization of the network (Fig. 1a). Another example of a system is the configuration of two (fictitious) parallel planes in homogeneous space between which electromagnetic waves propagate. If the tangential components $E_{0x}$ and $E_{0y}$ of the electric field $\mathbf{E}_t(x', y', 0) = \mathbf{E}_0$ are given in the input plane $z = 0$, the radiated field $\mathbf{E}(x, y, z) = \mathbf{E}_z$ is completely determined in any output plane $z \neq 0$ by the uniqueness theorem for electromagnetic fields (Hönl et al., 1961). If one assigns one input and one output to each component of the electric field strength, then the two parallel planes represent a system with two inputs and three outputs for the propagation of electromagnetic waves (Fig. 1b). To analyze a system, a suitable mathematical model that describes its behavior has to be developed. Usually this requires certain simplifications, for otherwise the mathematics involved would be considerably more complicated. For example, when the wave propagation between two parallel planes is treated by system-theoretic means, one assumes that it is possible to impress an arbitrary field distribution in the input plane.
The realization of this field distribution and any possible feed-back of the system


MICHAEL KAISER

FIG. 1. Examples of systems. (a) Electric network. (b) Parallel planes.

to the input are neglected. Despite these restrictions, the model may be a useful tool for system analysis as long as one is aware of the simplifications. A system can be described by operators. Such an operator $L$ assigns to a function $F_i(x)$, which is a member of one class of admissible input functions, a function $G_i(x)$ of a second class of output functions:

$$L\{F_i(x)\} = G_i(x), \qquad i = 1, 2, 3, \ldots, n. \tag{1}$$

Such a rule of assignment may be given in many different ways: by a differential or integral equation, by a table, or by a graphical representation. In optical diffraction for example, Kirchhoff’s formula

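is commonly used. In the region bounded by the closed surface $S$, it is a solution of the scalar Helmholtz equation

$$\Delta U(\mathbf{r}) + k^2 U(\mathbf{r}) = 0 \tag{3}$$

from the boundary values $U(\mathbf{r}')$ and $\partial U(\mathbf{r}')/\partial n'$ on $S$. Here

$$\mathbf{r} = x\,\mathbf{u}_x + y\,\mathbf{u}_y + z\,\mathbf{u}_z$$

is the field point vector,

$$\mathbf{r}' = x'\,\mathbf{u}_x + y'\,\mathbf{u}_y + z'\,\mathbf{u}_z$$

the source point vector, and $\partial/\partial n'$ the normal derivative directed into the region $V$.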


If applied to the system of parallel planes in Fig. 1b, Kirchhoff's formula represents a method of computing the field in an output plane $z \neq 0$ from the boundary values in the input plane $z = 0$, assuming that contributions from the surface at infinity can be neglected. The assignment rule (2) has the properties of linearity and shift invariance. The analysis of a general system is practically impossible; only systems with very special properties allow a detailed analysis. The most important properties are those of linearity and shift invariance. For linear systems, the superposition principle is valid; i.e., if a system reacts to two stimuli $F_1(x)$ and $F_2(x)$ with outputs $G_1(x) = L\{F_1(x)\}$ and $G_2(x) = L\{F_2(x)\}$, respectively, it will react to the weighted sum of the input signals with

$$L\{aF_1(x) + bF_2(x)\} = aG_1(x) + bG_2(x). \tag{4}$$

A system is called shift invariant if, as a result of a shift of the inputs by a certain quantity, the outputs are shifted by the same amount. If

$$L\{F(x)\} = G(x), \tag{5}$$

then a shift-invariant system reacts to the input function shifted by $x_0$ with

$$L\{F(x - x_0)\} = G(x - x_0), \tag{6}$$

i.e., with an output signal which is shifted by $x_0$ relative to the original signal but still has the same shape. Systems possessing the properties of linearity and shift invariance (linear shift-invariant (LSI) systems) can be analyzed particularly easily. The eigenfunctions of the operators describing LSI systems are complex exponential functions (Gaskill, 1978). If an LSI system is stimulated with $F(x) = \exp(jk_x x)$, then the system reacts with

$$L\{e^{jk_x x}\} = s(k_x)\,e^{jk_x x} = G(x), \tag{7}$$

where $s(k_x)$ is the eigenvalue associated with the eigenfunction $\exp(jk_x x)$. $s(k_x)$ is called the transfer function of the system; it is computed by Fourier transforming the impulse response $S(x)$ of the system according to

$$s(k_x) = \int_{-\infty}^{\infty} S(x)\,e^{-jk_x x}\,dx = F_1\{S(x)\}, \tag{8}$$

where $S(x)$ is the reaction of the system when stimulated by a delta function:

$$S(x) = L\{\delta(x)\}. \tag{9}$$

The impulse response describes a system completely. Since an arbitrary input function can be represented by

$$F(x) = \int_{-\infty}^{\infty} F(\xi)\,\delta(x - \xi)\,d\xi \tag{10}$$

as a linear combination of delta pulses, the reaction of an LSI system can be computed by convolving the input function with the system's impulse response or, in $k_x$-space, by multiplying the Fourier-transformed input function by the transfer function:

$$G(x) = \int_{-\infty}^{\infty} F(\xi)\,S(x - \xi)\,d\xi = F(x) \ast S(x) \tag{11a}$$

$$g(k_x) = f(k_x)\,s(k_x). \tag{11b}$$

A subsequent inverse transform to the $x$-space yields the output signal $G(x)$. For our purpose, the Fourier transform and its inverse have been defined slightly differently from the usual formulation (Papoulis, 1962), for the one-dimensional case by

$$f(k_x) = \int_{-\infty}^{\infty} F(x)\,e^{+jk_x x}\,dx = F_1\{F(x)\} \tag{12a}$$

$$F(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(k_x)\,e^{-jk_x x}\,dk_x = F_1^{-1}\{f(k_x)\}, \tag{12b}$$

and for the two-dimensional case by

$$f(k_x, k_y) = \iint_{-\infty}^{\infty} F(x, y)\,e^{+j(k_x x + k_y y)}\,dx\,dy = F_2\{F(x, y)\} \tag{13a}$$

$$F(x, y) = \frac{1}{4\pi^2}\iint_{-\infty}^{\infty} f(k_x, k_y)\,e^{-j(k_x x + k_y y)}\,dk_x\,dk_y = F_2^{-1}\{f(k_x, k_y)\}. \tag{13b}$$
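The two properties used above, that complex exponentials are eigenfunctions of an LSI system (Eqs. (7) and (8)) and that convolution in $x$-space corresponds to multiplication in $k_x$-space (Eqs. (11a,b)), can be checked numerically. The following is only a minimal sketch on a periodic sample grid: the Gaussian impulse response, the grid size and the helper name `circular_convolve` are assumptions made for illustration and do not come from the text. NumPy's forward FFT uses the kernel exp(-j...), i.e., the opposite sign to Eq. (12a); neither check depends on that choice.

```python
import numpy as np

n = 128
x = np.arange(n)

# Hypothetical impulse response S(x): a short Gaussian pulse (illustrative only).
S = np.exp(-0.5 * ((x - 5) / 2.0) ** 2)

# Arbitrary real input samples F(x).
rng = np.random.default_rng(1)
F = rng.standard_normal(n)


def circular_convolve(F, S):
    """Periodic version of Eq. (11a): G[i] = sum_m F[m] * S[(i - m) mod n]."""
    n = len(F)
    m = np.arange(n)
    return np.array([np.sum(F * S[(i - m) % n]) for i in range(n)])


# Output computed in the spatial domain and via the frequency domain.
G_space = circular_convolve(F, S)
G_freq = np.fft.ifft(np.fft.fft(F) * np.fft.fft(S))

# Eigenfunction check: a complex exponential that is periodic on the grid.
kx = 2.0 * np.pi * 7 / n
E = np.exp(1j * kx * x)
G_E = circular_convolve(E, S)
eigenvalue = np.sum(S * np.exp(-1j * kx * x))  # discrete analogue of Eq. (8)
```

Here `G_space` and `G_freq` agree to rounding error, and `G_E` equals `eigenvalue * E`, so the exponential passes through the system unchanged in shape.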

If applied to radiation problems, $x$, $y$ in Eq. (13a) are transverse coordinates (Fig. 1b), and $k_x$, $k_y$ are the transverse components of the propagation vector; in Eq. (13b), $k_x$, $k_y$ are the variables after performing the Fourier transform, i.e., spatial frequency domain variables. The sign of the transformation kernel has been chosen to represent the radiated field as a superposition of plane waves $a(k_x, k_y)\exp(-j\mathbf{k}\cdot\mathbf{r})$ propagating into the half-space $z > 0$ when a time dependence like $\exp(+j\omega t)$ is assumed. The quantities transformed to the spatial frequency domain are denoted by lower-case function symbols, whereas quantities in the spatial domain are denoted by upper-case function symbols. The Hankel transform is an appropriate tool in the treatment of circularly symmetric problems (Papoulis, 1968). To distinguish it from the Fourier transform $f(k_x, k_y)$, the Hankel transform $\bar{f}(k_\rho)$ of the circularly symmetric function $F(\rho)$ is denoted by an overbar:

$$\bar{f}(k_\rho) = \int_0^{\infty} F(\rho)\,J_0(k_\rho \rho)\,\rho\,d\rho \tag{14a}$$

$$F(\rho) = \int_0^{\infty} \bar{f}(k_\rho)\,J_0(k_\rho \rho)\,k_\rho\,dk_\rho \tag{14b}$$

with

$$k_\rho = \sqrt{k_x^2 + k_y^2} \quad \text{and} \quad \rho = \sqrt{x^2 + y^2}, \tag{15a,b}$$

where $J_0$ denotes the Bessel function of order zero. The Fourier transform of the circularly symmetric function $F(\rho)$ is obtained by multiplying its Hankel transform by $2\pi$:

$$f(k_x, k_y) = 2\pi\,\bar{f}(k_\rho). \tag{16}$$

Further theorems of the Fourier and Hankel transforms used in this contribution are given in the appendix (VI.A).

B. Scalar Formulation of the Radiation Problem

As noted in the previous section, it is possible to treat the problem of wave propagation between two parallel planes by a system-theoretic approach. In this section a scalar formulation of radiation problems is given. A scalar treatment need not be limited to acoustic problems; it is also a useful approximation in optics, where aperture dimensions are usually much larger than the wavelength (Born and Wolf, 1975). A scalar treatment is also very accurate in the design and analysis of large reflector antennas (Silver, 1949). An extension of the technique to electromagnetic problems is given in the next section. The starting point of the following considerations is the linear shift-invariant system of two parallel, infinitely extended planes depicted in Fig. 2. Here the plane $z = 0$ represents the input plane with the impressed sources, while a parallel plane at a fixed but arbitrary distance $z$ from the input plane represents the output plane. The radiated field in the region $z > 0$ must satisfy the Helmholtz equation, Eq. (3), and the Sommerfeld radiation condition, and it must take the values of the field distribution $U(x', y', 0)$ given in the plane $z = 0$. Such a solution was given by Rayleigh (1877, 1878) and more recently by Sommerfeld (1964):


FIG.2. System of two parallel planes in homogeneous space.

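$$-\frac{\partial}{\partial z}\left.\frac{e^{-jk\sqrt{(x-x')^2+(y-y')^2+z^2}}}{2\pi\sqrt{(x-x')^2+(y-y')^2+z^2}}\right|_{x'=y'=0} = \frac{1}{2\pi}\left(jk + \frac{1}{\sqrt{x^2+y^2+z^2}}\right)\frac{z\,e^{-jk\sqrt{x^2+y^2+z^2}}}{x^2+y^2+z^2} = S(x, y, z) \tag{19}$$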

represents the impulse response of the system of two parallel planes in homogeneous space separated by an arbitrary distance $z = \text{const}$. So the field in the plane $z \neq 0$ (output), which is generated by the field impressed in the plane $z = 0$ (input), can be computed by a two-dimensional convolution of the input distribution with the impulse response of the system:

$$U(x, y, z) = U(x, y, 0) \ast\ast\, S(x, y, z). \tag{20}$$
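The closed form of the impulse response in Eq. (19) can be checked against its definition as the negative $z$-derivative of $e^{-jkr}/(2\pi r)$. The sketch below does this with a central finite difference; the function names, the test point and the wavelength normalization ($\lambda = 1$) are assumptions chosen for illustration, not part of the original text.

```python
import numpy as np


def spherical_wave(x, y, z, k):
    """exp(-j k r) / (2 pi r), the function that is differentiated in Eq. (19)."""
    r = np.sqrt(x**2 + y**2 + z**2)
    return np.exp(-1j * k * r) / (2.0 * np.pi * r)


def impulse_response(x, y, z, k):
    """Closed form of Eq. (19): S = (1/2pi) (jk + 1/r) z exp(-j k r) / r^2."""
    r = np.sqrt(x**2 + y**2 + z**2)
    return (1j * k + 1.0 / r) * z * np.exp(-1j * k * r) / (2.0 * np.pi * r**2)


# Illustrative test point, in units of the wavelength (lambda = 1, k = 2 pi).
k = 2.0 * np.pi
x0, y0, z0 = 0.3, -0.2, 1.5

# Central finite difference of -d/dz [exp(-j k r) / (2 pi r)].
h = 1e-6
S_fd = -(spherical_wave(x0, y0, z0 + h, k) - spherical_wave(x0, y0, z0 - h, k)) / (2.0 * h)
S_closed = impulse_response(x0, y0, z0, k)
```

The two values agree to within the accuracy of the finite difference, which supports the factor $(jk + 1/r)$ and the sign used in the closed form.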


As in the system theory of electric networks, where the response of a filter to an input function can be computed either in the time domain or in the frequency domain, the radiated field can be evaluated not only in the spatial domain but also in the spatial frequency domain. For that purpose the Helmholtz equation, Eq. (3), is Fourier-transformed according to Eq. (13a) with respect to $x$ and $y$:

$$\Delta U(x, y, z) + k^2 U(x, y, z) = 0 \tag{21a}$$

$$\frac{\partial^2}{\partial z^2}u(k_x, k_y; z) + (k^2 - k_x^2 - k_y^2)\,u(k_x, k_y; z) = 0. \tag{21b}$$

Taking into account the given source distribution $u(k_x, k_y; 0)$ in the plane $z = 0$, the solution of Eq. (21b) for $z > 0$ is given by

$$u(k_x, k_y; z) = u(k_x, k_y; 0)\,e^{-jk_z z} \tag{22}$$

with

$$k_z = \sqrt{k^2 - k_x^2 - k_y^2}. \tag{23}$$

Here the positive real root is chosen when $k_z$ is real, and the root with a negative imaginary part is chosen when $k_z$ is imaginary, so as to satisfy Sommerfeld's radiation condition. In Eq. (22),

$$e^{-j\sqrt{k^2 - k_x^2 - k_y^2}\,z} = s(k_x, k_y; z) \tag{24}$$

is the transfer function for a system of two parallel planes in homogeneous space. Thus, multiplying the Fourier-transformed input field by the transfer function in the spatial frequency domain in accordance with Eq. (22) is equivalent to computing the field by convolving the input distribution with the impulse response in the spatial domain:

$$U(x, y, z) = \iint_{-\infty}^{\infty} U(x', y', 0)\,S(x - x', y - y', z)\,dx'\,dy' = U(x, y, 0) \ast\ast\, S(x, y, z) \tag{25a}$$

$$u(k_x, k_y; z) = u(k_x, k_y; 0)\,s(k_x, k_y; z). \tag{25b}$$
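Equations (22) to (25b) translate directly into a numerical recipe: transform the input field, multiply by the transfer function of Eq. (24), and transform back. The following is a minimal sketch under stated assumptions: a square, periodic sampling grid, NumPy's FFT sign convention (which differs from Eq. (13a) but gives the same result here because the transfer function is even in $k_x$ and $k_y$), and illustrative function names `transfer_function` and `propagate` that do not come from the text.

```python
import numpy as np


def transfer_function(kx, ky, k, z):
    """s(kx, ky; z) = exp(-j kz z), with the root of kz chosen as stated after Eq. (23)."""
    kz_squared = k**2 - kx**2 - ky**2
    magnitude = np.sqrt(np.abs(kz_squared))
    # Propagating waves: positive real root. Evanescent waves: negative imaginary root.
    kz = np.where(kz_squared >= 0.0, magnitude, -1j * magnitude)
    return np.exp(-1j * kz * z)


def propagate(U0, dx, k, z):
    """Field in the plane z from samples U0 of the field in the plane z = 0 (Eq. (25b))."""
    n = U0.shape[0]
    k_axis = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    KX, KY = np.meshgrid(k_axis, k_axis, indexing="ij")
    return np.fft.ifft2(np.fft.fft2(U0) * transfer_function(KX, KY, k, z))


# Illustrative check: a uniform input field is a plane wave travelling along z,
# so after a quarter wavelength it is multiplied by exp(-j pi/2) = -j.
k = 2.0 * np.pi            # wavelength normalized to 1
U_out = propagate(np.ones((8, 8)), dx=0.1, k=k, z=0.25)

# An evanescent component (kx = 2k) is attenuated rather than phase shifted.
s_evanescent = transfer_function(2.0 * k, 0.0, k, 0.25)
```

The choice of the negative imaginary root makes `s_evanescent` a real number smaller than one, which is the decay required by the radiation condition.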

In order to find the relation between impulse response and transfer function, it is advantageous to make use of the Hankel transform pair (26)


It can be shown (cf. Appendix VI.A) that the relation

exists, according to which the transfer function is the Fourier transform of the impulse response and vice versa, corresponding to the one-dimensional case in Eqs. (11a,b). At the end of this section, two commonly used approximations in diffraction theory are introduced: the Fresnel and the Fraunhofer approximation. The first one is valid if the following assumptions are satisfied: distances $z$ larger than several wavelengths, small transverse wave numbers $k_x$ and $k_y$, and small wavelength $\lambda$. It can be obtained by expanding the root in the exponent of the exact transfer function of Eq. (24) in a binomial series, neglecting terms of order higher than the quadratic, and subsequently low-pass filtering by $\mathrm{circ}(\sqrt{k_x^2 + k_y^2}/k)$ (Blume, 1976):

With the ideal low-pass function in the spatial frequency domain

we associate a $\delta$-function approximation in the spatial domain (Bracewell, 1965), where $J_1$ denotes a Bessel function of order one. The approximating function approaches a $\delta$-function the smaller the wavelength $\lambda$ becomes, i.e., the larger the radius $k = 2\pi/\lambda$ of the ideal low-pass function. The consequence for the approximated impulse response, Eq. (28b), is that the convolution of the first term with a second term that is, for all practical purposes, similar to a $\delta$-function leaves only the first term as long as the wavelength $\lambda$ is small compared with all other dimensions.


The field

$$U_{\mathrm{Fres}}(x, y, z) = S_{\mathrm{Fres}}(x, y, z) \ast\ast\, U(x, y, 0) \tag{30}$$

resulting from a convolution of the input distribution with the approximated impulse response (28b) is called the Fresnel diffraction field. However, one usually obtains the above result by quite a different approach, i.e., by solving an inhomogeneous Helmholtz equation (in the spatial domain) using a Green's function of the unbounded space (Kirchhoff's formula) or the half-space (Sommerfeld, 1964; Goodman, 1968). If the field at greater distances $z$ from the input plane is to be evaluated, one enters the Fraunhofer zone. In order to obtain the Fraunhofer field, one replaces the exact impulse response in Eq. (25a) by the Fresnel approximation of Eq. (28b) and takes the constant factors out of the integral:

Furthermore, if the relation

$$z \gg \frac{k}{2}\left(x'^2 + y'^2\right)_{\max}$$

is valid, i.e., the distance $z$ is much larger than, essentially, (aperture dimensions)$^2$/wavelength, the phase factor in the integrand can be replaced by 1. The result

(33)

is usually called the Fraunhofer field. It is obvious that the Fraunhofer field is directly proportional to an integral of the two-dimensional Fourier type over the aperture distribution $U(x', y', 0)$. Thus the Fraunhofer field is directly proportional to the spatial frequency spectrum $u(k_x, k_y; 0)$ of the input distribution with the spatial frequencies

$$k_x = k\,\frac{x}{z} \quad \text{and} \quad k_y = k\,\frac{y}{z}. \tag{34}$$


C. The Electromagnetic Radiation Problem

The vector character of electromagnetic fields has to be taken into account if the linear dimensions of the radiating aperture are comparable to the wavelength or if polarization effects are to be studied, as, for example, in cross-polarization discrimination. Therefore, the propagation of electromagnetic waves in a system of two parallel planes in homogeneous space (Fig. 3), taking into account the vector character, is studied below by a system-theoretic approach. In the input plane $z = 0$, the tangential electric field strength $\mathbf{E}_t(x', y', 0) = \mathbf{E}_0$ is impressed. In order to compute the radiated field in a plane parallel to and at an arbitrary distance from the plane $z = 0$, the impulse response and transfer function of the system of two parallel planes in homogeneous space, now for electromagnetic wave propagation, are evaluated. The starting point of the following investigations is Huygens' principle in its vector form. In the case of plane source distributions as given here, the electromagnetic field $\mathbf{E}(\mathbf{r})$, $\mathbf{H}(\mathbf{r})$ may be computed from the tangential components $\mathbf{E}_t(\mathbf{r}')$ or $\mathbf{H}_t(\mathbf{r}')$ in the aperture $S$ (Severin, 1951):

$$\mathbf{E}(\mathbf{r}) = \frac{1}{2\pi}\,\mathrm{rot}\iint_S \left[\mathbf{n}' \times \mathbf{E}(\mathbf{r}')\right]\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,ds',$$

$$\mathbf{H}(\mathbf{r}) = -\frac{1}{2\pi}\,\frac{1}{j\omega\mu_0\mu_r}\,\mathrm{rot}\,\mathrm{rot}\iint_S \left[\mathbf{n}' \times \mathbf{E}(\mathbf{r}')\right]\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,ds', \tag{35}$$

or

$$\mathbf{E}(\mathbf{r}) = \frac{1}{2\pi}\,\frac{1}{j\omega\varepsilon_0\varepsilon_r}\,\mathrm{rot}\,\mathrm{rot}\iint_S \left[\mathbf{n}' \times \mathbf{H}(\mathbf{r}')\right]\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,ds',$$

$$\mathbf{H}(\mathbf{r}) = \frac{1}{2\pi}\,\mathrm{rot}\iint_S \left[\mathbf{n}' \times \mathbf{H}(\mathbf{r}')\right]\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,ds'. \tag{36}$$

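As in the scalar case, the radiated field takes the values assumed in the aperture if the field point is moved into the aperture. Applied to our system of parallel planes (Fig. 3), Eq. (35) yields

$$E_x(\mathbf{r}) = -\frac{1}{2\pi}\iint_S E_x(\mathbf{r}')\,\frac{\partial}{\partial z}\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,ds' \tag{37a}$$

$$E_y(\mathbf{r}) = -\frac{1}{2\pi}\iint_S E_y(\mathbf{r}')\,\frac{\partial}{\partial z}\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,ds' \tag{37b}$$

$$E_z(\mathbf{r}) = \frac{1}{2\pi}\iint_S \left[E_x(\mathbf{r}')\,\frac{\partial}{\partial x}\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|} + E_y(\mathbf{r}')\,\frac{\partial}{\partial y}\frac{e^{-jk|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\right] ds' \tag{37c}$$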


FIG.3. System of two parallel planes in homogeneous space for the propagation of electromagnetic waves.

for the electric field. The integrals in Eqs. (37a-c) are of the convolution integral type, Eq. (11a). Hence, from a system-theoretic point of view, the components of the radiated field ("output field") originate from a convolution of the aperture field components ("input field") with impulse responses $\Gamma_{ik}(x, y, z)$ as in Eq. (20):

$$E_x(x, y, z) = E_x(x, y, 0) \ast\ast\, \Gamma_{xx}(x, y, z), \qquad \Gamma_{xx} = -\frac{\partial}{\partial z}\,\frac{e^{-jk\sqrt{x^2+y^2+z^2}}}{2\pi\sqrt{x^2+y^2+z^2}}, \tag{38a}$$


The scalar impulse responses $\Gamma_{ik}(\mathbf{r})$ can be combined to form the tensor

$$\boldsymbol{\Gamma}(\mathbf{r}) = \begin{pmatrix} \Gamma_{xx}(\mathbf{r}) & 0 & 0 \\ 0 & \Gamma_{yy}(\mathbf{r}) & 0 \\ \Gamma_{zx}(\mathbf{r}) & \Gamma_{zy}(\mathbf{r}) & 0 \end{pmatrix}, \tag{39}$$

which represents the impulse response tensor $\boldsymbol{\Gamma}(\mathbf{r})$ of the system of two parallel planes (aperture plane/field point plane). Using the impulse response tensor, the electromagnetic field in the output plane

$$\mathbf{E}(x, y, z) = \boldsymbol{\Gamma}(x, y, z) \ast\ast\, \mathbf{E}(x, y, 0) \tag{40}$$

can be computed as a two-dimensional convolution of the impulse response with the aperture field. The equivalent circuit of the system aperture/field point plane is depicted in Fig. 4; to an input $\mathbf{E}_0$ the system reacts with $\mathbf{E}_z$. As in the case of scalar waves, the computation of the radiated electromagnetic field can be carried out in the spatial frequency domain. For that purpose, the Helmholtz equation for the electric field

$$\Delta\mathbf{E}(x, y, z) + k^2\,\mathbf{E}(x, y, z) = 0 \tag{41a}$$

$$\frac{\partial^2}{\partial z^2}\mathbf{e}(k_x, k_y; z) + (k^2 - k_x^2 - k_y^2)\,\mathbf{e}(k_x, k_y; z) = 0 \tag{41b}$$

is Fourier-transformed with respect to $x$ and $y$ (Collin and Zucker, 1969). The solution of this ordinary differential equation is, as in Eq. (22), a plane wave

$$\mathbf{e}(k_x, k_y; z) = \mathbf{e}(k_x, k_y; 0)\,e^{-jk_z z} \tag{42}$$

with the spectral coefficient $\mathbf{e}(k_x, k_y; 0) = \mathbf{e}_0$. The sign of

$$k_z = \sqrt{k^2 - k_x^2 - k_y^2}$$

is determined according to the rules set up in Section II.B for scalar waves. By means of an inverse two-dimensional Fourier transform of Eq. (42), one

FIG.4. Equivalent circuit of the aperture plane/field point plane system,


obtains the electric field in the half-space $z > 0$

$$\mathbf{E}(x, y, z) = F_2^{-1}\{\mathbf{e}(k_x, k_y; 0)\,e^{-jk_z z}\} = \frac{1}{4\pi^2}\iint_{-\infty}^{\infty} \mathbf{e}(k_x, k_y; 0)\,e^{-j(k_x x + k_y y + k_z z)}\,dk_x\,dk_y \tag{43}$$

as a superposition of plane waves with propagation vectors

$$\mathbf{k} = \mathbf{u}_x k_x + \mathbf{u}_y k_y + \mathbf{u}_z k_z = \mathbf{K} + \mathbf{u}_z k_z. \tag{44}$$

This form was introduced by Weyl (1919) to treat the propagation of electromagnetic waves along a boundary between two media with different electromagnetic properties. The spectral function $\mathbf{e}_0$ can be computed from the tangential field in the plane $z = 0$; one obtains

$$\mathbf{e}_{0t} = \iint_{-\infty}^{\infty} \mathbf{E}_t(x', y', 0)\,e^{+j(k_x x' + k_y y')}\,dx'\,dy' = F_2\{\mathbf{E}_t(x', y', 0)\} \tag{45}$$

for the transverse component, while the normal component is given by

Substituting Eqs. (45) and (46) in Eq. (43), one obtains the components of the radiated field and their Fourier transforms, respectively:

$$E_x(x, y, z) = F_2^{-1}\{e_x(k_x, k_y; 0)\,e^{-jk_z z}\} = F_2^{-1}\{F_2\{E_x(x', y', 0)\}\,e^{-jk_z z}\} \tag{47a}$$

$$e_x(k_x, k_y; z) = e_x(k_x, k_y; 0)\,\gamma_{xx}(k_x, k_y; z), \tag{47b}$$

$$E_y(x, y, z) = F_2^{-1}\{e_y(k_x, k_y; 0)\,e^{-jk_z z}\} = F_2^{-1}\{F_2\{E_y(x', y', 0)\}\,e^{-jk_z z}\} \tag{48a}$$

$$e_y(k_x, k_y; z) = e_y(k_x, k_y; 0)\,\gamma_{yy}(k_x, k_y; z), \tag{48b}$$


Equations (47b)-(49b) may be combined to yield the Fourier-transformed electric field

$$\mathbf{e}(k_x, k_y; z) = \boldsymbol{\gamma}(k_x, k_y; z)\,\mathbf{e}(k_x, k_y; 0) \tag{50}$$

in the output plane $z$ as a product of the transfer function tensor

$$\boldsymbol{\gamma}(k_x, k_y; z) = \begin{pmatrix} \gamma_{xx}(k_x, k_y; z) & 0 & 0 \\ 0 & \gamma_{yy}(k_x, k_y; z) & 0 \\ \gamma_{zx}(k_x, k_y; z) & \gamma_{zy}(k_x, k_y; z) & 0 \end{pmatrix} \tag{51}$$

and the Fourier-transformed field vector in the input plane $z = 0$. If one considers the transform (27), it becomes obvious that Eqs. (40) and (50) are linked by a two-dimensional Fourier transform. So the transfer function tensor $\boldsymbol{\gamma}(k_x, k_y; z)$ in Eq. (51) is the Fourier transform of the impulse response tensor $\boldsymbol{\Gamma}(x, y, z)$ in Eq. (39):

$$\boldsymbol{\Gamma}(x, y, z) = F_2^{-1}\{\boldsymbol{\gamma}(k_x, k_y; z)\}. \tag{52}$$

This result corresponds to the scalar case, where the transfer function is also obtained by a Fourier transform of the impulse response.

III. FOCUSING BY PLANE RADIATORS

A. Illumination of the Infinitely Extended Radiator, Scalar Treatment

Now we apply the system-theoretic approach to scalar radiation problems in order to derive a focusing illumination. Consider the linear shift-invariant ("isoplanatic") system of two parallel planes shown in Fig. 2, the plane $z = 0$ being the input plane and the plane $z = z_f$ the output plane containing the focus. In the plane $z = 0$, a field distribution $U(x', y', 0)$ shall be derived that will produce a spatially limited field $U(x, y, z_f)$ in the output plane having its maximum amplitude (focus) at the point $(x_f, y_f, z_f)$. A singular field distribution of the type $\delta(x - x_f, y - y_f)$ in the output plane would fulfil the above requirement ideally; however, such a field includes infinite energy, as can be shown from Parseval's formula (Papoulis, 1968), and, consequently, the input field producing this field distribution would also have infinite energy. Therefore, the input illumination maximizing the relation

between field intensity at the focus and available input power shall be derived


(Kaiser and Hetsch, 1984). Using Eqs. (25a,b), the value of $U$ at the focus can be determined by multiplying the Fourier transform of the input field distribution $u(k_x, k_y; 0)$ by the transfer function $s(k_x, k_y; z)$ of the input plane/focal plane system and performing a subsequent inverse transform to the spatial domain:

(54)

Making use of Parseval’s formula, the integral in the denominator of Eq. (53) can be expressed in the spatial frequency domain:

Squaring both sides of Eq. (54), one obtains from Schwarz' inequality

$$|U(x_f, y_f, z_f)|^2 \le \frac{1}{4\pi^2}\iint_{-\infty}^{\infty} |u(k_x, k_y; 0)|^2\,dk_x\,dk_y \times \frac{1}{4\pi^2}\iint_{-\infty}^{\infty} \left|s(k_x, k_y; z_f)\,e^{-j(k_x x_f + k_y y_f)}\right|^2 dk_x\,dk_y \tag{56}$$

and thus the upper limit for the intensity at the focus. The first integral on the right side is identical to the one in Eq. (55), so that Eq. (56) can be written as

The value of the integral is determined by the transfer function $s(k_x, k_y; z_f)$, i.e., by the geometry of the input plane/focal plane system. The upper limit for the intensity at the focus is achieved if

$$u_0 = u(k_x, k_y; 0) = \hat{u}_0\left[s(k_x, k_y; z_f)\,e^{-j(k_x x_f + k_y y_f)}\right]^* \tag{58}$$

is chosen (cf. Appendix VI.B), where $\hat{u}_0$ is the amplitude of the input illumination. Transforming Eq. (58) to the spatial domain yields

$$U_{\mathrm{opt}}(x', y', 0) = \hat{u}_0\,S^*\!\left[-(x' - x_f),\, -(y' - y_f),\, z_f\right], \tag{59}$$

which is the optimum focusing illumination, proportional to the complex conjugate of the inverted impulse response shifted by $(x_f, y_f)$. This result is true for any linear shift-invariant system, i.e., there are no restrictions on the medium between the input plane and the focal plane besides linearity and independence of the transverse coordinates.
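The matched-illumination result of Eqs. (58) and (59) holds for any linear shift-invariant system, so it can be illustrated with a simple discrete, one-dimensional stand-in. The sketch below is an assumption-laden toy model (a random complex impulse response on a periodic grid, hypothetical helper names), not the radiation problem itself: it compares the ratio of focus intensity to input energy for the conjugated, inverted and shifted impulse response with the same ratio for randomly chosen inputs.

```python
import numpy as np


def focus_value(u, h, n_f):
    """Output sample at the focus index n_f: sum_m u[m] * h[(n_f - m) mod n]."""
    m = np.arange(len(h))
    return np.sum(u * h[(n_f - m) % len(h)])


def optimum_illumination(h, n_f):
    """Discrete analogue of Eq. (59): conjugate of the inverted, shifted impulse response."""
    m = np.arange(len(h))
    return np.conj(h[(n_f - m) % len(h)])


def intensity_ratio(u, h, n_f):
    """Focus intensity divided by input energy (the quantity that is maximized)."""
    return abs(focus_value(u, h, n_f)) ** 2 / np.sum(np.abs(u) ** 2)


rng = np.random.default_rng(0)
n, n_f = 64, 10
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # hypothetical impulse response

ratio_opt = intensity_ratio(optimum_illumination(h, n_f), h, n_f)
ratios_random = [
    intensity_ratio(rng.standard_normal(n) + 1j * rng.standard_normal(n), h, n_f)
    for _ in range(200)
]
```

In this toy model `ratio_opt` equals the energy of the impulse response, which is the Schwarz upper bound, and none of the random inputs exceeds it.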


Finally it should be mentioned that the derivation of the optimum focusing illumination sketched in this section is similar to the derivation of the matched filter principle in communications applications. Here it is the intensity at the focus relative to the input power which is to be maximized, whereas in communications applications we wish to maximize the signal-to-noise ratio of a signal disturbed by white noise at the time of sampling.

B. The Optimum Focusing Illumination and Its Radiated Field

In this section, the case of a homogeneous lossless medium between an input plane and a focal plane is treated. Since a system of two infinitely extended planes in homogeneous space is circularly symmetric with respect to the $z$-axis, it is advantageous to give the impulse response of the input plane/focal plane system in cylindrical coordinates. Equation (19) yields, with $x^2 + y^2 = \rho^2$:

$$S(\rho, z) = \frac{1}{2\pi}\left(jk + \frac{1}{\sqrt{\rho^2 + z^2}}\right)\frac{z\,e^{-jk\sqrt{\rho^2 + z^2}}}{\rho^2 + z^2}. \tag{60}$$

Making use of Eq. (16) with $k_\rho = \sqrt{k_x^2 + k_y^2}$, the transfer function can be given as the Hankel transform of the impulse response multiplied by $2\pi$:

From this one obtains, assuming without any loss in generality that the focus is located on the $z$-axis, the optimum focusing illumination according to Eq. (59) and its Fourier spectrum:

With the illumination given by Eq. (62a) the input power is proportional to

Because of the circular symmetry of the functions to be convolved, the field in the focal plane

$$U(\rho, z_f) = U_{\mathrm{opt}}(\rho, 0) \ast\ast\, S(\rho, z_f) = \hat{u}_0\,S^*(\rho, z_f) \ast\ast\, S(\rho, z_f) \tag{64}$$

resulting from Eqs. (20) and (59) with $x_f = y_f = 0$ is circularly symmetric too.


Since it is not possible to evaluate the convolution integral in cylindrical coordinates (Gaskill, 1978)

analytically, the field will be computed in the spatial frequency domain. The two-dimensional Fourier transform of Eq. (64),

$$u(k_\rho; z_f) = u_{\mathrm{opt}}(k_\rho; 0)\,s(k_\rho; z_f) = \hat{u}_0\,|s(k_\rho; z_f)|^2, \tag{66}$$

shows that the spectrum of the field distribution in the focal plane is equal to the squared magnitude of the transfer function multiplied by $\hat{u}_0$. Using Eq. (61), one obtains:

The evanescent waves ($k_\rho > k$) can be neglected for sufficiently large values of $z_f/\lambda$ (cf. Fig. 5). With this "low-pass" approximation, the field in the focal plane becomes

In this approximate solution, only waves with transverse wave numbers $k_\rho \le k$, all having the same amplitude $\hat{u}_0$, contribute to the field in the focal plane. The inverse transform of Eq. (68) to the spatial domain, which is possible analytically, yields the field distribution

FIG. 5. Attenuation $a$ of the evanescent waves ($k_\rho > k$).


in the focal plane. The optimum focusing illumination in the input plane according to Eq. (62a) and the resulting field distribution in the focal plane, with and without taking the evanescent waves into account, are shown in Fig. 6 for various input plane/focal plane distances. Only in the extreme near field (e.g., $z_f = 0.1\lambda$) do the two field distributions differ from each other, whereas they differ only slightly for $z_f > \lambda$, as expected. Using Eq. (69) yields

$$U(\rho = 0, z_f) = \lim_{\rho \to 0}\,\hat{u}_0\,\frac{k}{2\pi\rho}\,J_1(k\rho) = \hat{u}_0\,\frac{k^2}{4\pi} \tag{70}$$

as an approximate value for the field strength at the focus. For $z_f \gg \lambda$, the exact value resulting from Eq. (64) approaches this value for the given input illumination:

In Fig. 6a the width of the focus resulting from the approximate solution is essentially broader than the one resulting from the exact solution. According to the uncertainty principle, which is valid for the Fourier transform (Papoulis, 1968), using the band-limited spectrum always yields a larger value for the 3-dB radius than the complete spectrum, i.e., taking into account the evanescent waves. The 3-dB radius is the distance from the $z$-axis where the amplitude has dropped to $1/\sqrt{2}$ times its maximum value (cf. Fig. 6a). This quantity can be considered a measure of the "quality" of the focus. The approximation according to Eq. (69) yields

$$r_{3\,\mathrm{dB}} = \frac{1.62}{2\pi}\,\lambda \approx \frac{\lambda}{4}. \tag{72}$$

So the field in the focal plane is concentrated in an area with a diameter of approximately half a wavelength. The axial resolution has been investigated by computing the field for a distance between input plane and focal plane of $z_f = 10\lambda$ by means of Eqs. (18) and (62a). It is less distinct than along the transverse coordinates (cf. Figs. 7 and 6c).
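The value of the 3-dB radius quoted above can be reproduced from the low-pass field profile of Eq. (69), which, normalized to its value at the focus, is $2J_1(k\rho)/(k\rho)$. The sketch below finds the argument at which this profile drops to $1/\sqrt{2}$ by bisection. It is an illustration under stated assumptions: the Bessel function is evaluated from its integral representation with a uniform quadrature rule rather than from a library routine, and the function names and bracketing interval are chosen here, not taken from the text.

```python
import numpy as np


def bessel_j(order, x, n_theta=2048):
    """J_n(x) from the integral representation (1/2pi) * int_0^2pi cos(n t - x sin t) dt."""
    theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
    return np.mean(np.cos(order * theta - x * np.sin(theta)))


def normalized_focus_profile(x):
    """U(rho)/U(0) for the low-pass approximation, with x = k * rho."""
    return 2.0 * bessel_j(1, x) / x


def three_db_argument(lo=1.0, hi=2.0, tol=1e-10):
    """Bisection for the x at which the profile equals 1/sqrt(2); it decreases on [lo, hi]."""
    target = 1.0 / np.sqrt(2.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_focus_profile(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_3db = three_db_argument()                    # k * r_3dB
r_3db_over_lambda = x_3db / (2.0 * np.pi)      # r_3dB in units of the wavelength
```

The result is close to $k\,r_{3\,\mathrm{dB}} \approx 1.62$, i.e., a 3-dB radius of roughly a quarter wavelength, in line with the statement that the focus has a diameter of about half a wavelength.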

C. The Effect of Limiting the Optimum Focusing Illumination

The optimum focusing illumination derived in the previous section by matching it to the impulse response is of infinite extent, which, of course, cannot be realized. Therefore, the effect on the shape of the focus of limiting the optimum illumination in accordance with Eq. (62a) by an aperture of

FIG. 6. Optimum focusing illumination in the input plane and resulting field distribution in the focal plane for different input plane/focal plane distances (with and without taking the evanescent waves into account). (a) Input plane/focal plane distance $z_f = 0.1\lambda$. (b) Input plane/focal plane distance $z_f = \lambda$. (c) Input plane/focal plane distance $z_f = 10\lambda$.

FIG. 7. Field distribution along the $z$-axis for the optimum focusing illumination and a distance between input plane and focal plane of $z_f = 10\lambda$.

radius $a$ will be examined. This yields the illumination (cf. Fig. 8)

$$U_{a,\mathrm{opt}}(\rho', 0) = U_{\mathrm{opt}}(\rho', 0)\,\mathrm{circ}(\rho'/a) \tag{73}$$

with

$$\mathrm{circ}(\rho'/a) = \begin{cases} 1, & 0 \le \rho' \le a \\ 0, & \rho' > a \end{cases} \tag{74}$$

and $U_{\mathrm{opt}}(\rho', 0)$ according to Eq. (62a). Among all possible illuminations with $U_a(\rho', 0)$ non-zero only if $\rho' < a$ and where the integral

takes the same value, this illumination produces the maximum field strength at the focus, as shown in the Appendix (VI.C). The analytical determination of the field distribution in the focal plane

$$U_{a,\mathrm{opt}}(\rho, z_f) = \hat{u}_0\left[S^*(\rho, z_f)\,\mathrm{circ}(\rho/a)\right] \ast\ast\, S(\rho, z_f) \tag{76a}$$

FIG. 8. Finite focusing illumination in the input plane.

resulting from Eq. (64) is not possible either by evaluating the convolution integral or by a transformation to the spatial frequency domain. Therefore Eq. (76a) is evaluated numerically. The 3-dB radius of an aperture focused at a distance $z_f = 10\lambda$ as a function of aperture radius relative to the fixed focal distance is shown in Fig. 9. As expected, the 3-dB radius becomes broader as $a/z_f$ decreases. However, a significant deterioration does occur if the aperture radius approaches the focal distance or becomes smaller. With $\rho = 0$, the amplitude at the focus can be determined analytically from Eq. (76a):

FIG. 9. 3-dB radius as a function of the aperture radius $a/z_f$ for $z_f = 10\lambda$; the value for the optimum focusing illumination ($a \to \infty$) is indicated.


As expected, for finite values of $a$ the amplitude is smaller than the value from Eq. (71). As $a \to \infty$, it approaches this value.
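The behavior described here can be illustrated numerically. At $\rho = 0$ the convolution in Eq. (76a) reduces to the integral of $|S(\rho', z_f)|^2$ over the aperture, with $S$ taken from Eq. (19). The sketch below evaluates that integral by the trapezoidal rule and compares it with a closed form obtained here by direct integration; this closed form is an independent evaluation and is not claimed to be identical in form to the chapter's Eq. (77). The normalization $\hat{u}_0 = 1$, $\lambda = 1$ and all function names are assumptions.

```python
import numpy as np


def impulse_response_squared(rho, z, k):
    """|S(rho, z)|^2 with S from Eq. (19): (z / 2pi)^2 (k^2 + 1/r^2) / r^4, r^2 = rho^2 + z^2."""
    r2 = rho**2 + z**2
    return (z / (2.0 * np.pi)) ** 2 * (k**2 + 1.0 / r2) / r2**2


def focus_amplitude_numeric(a, z_f, k, n=200001):
    """Trapezoidal evaluation of int_0^a |S|^2 2 pi rho d rho (focus amplitude for u0 = 1)."""
    rho = np.linspace(0.0, a, n)
    f = impulse_response_squared(rho, z_f, k) * 2.0 * np.pi * rho
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(rho))


def focus_amplitude_closed(a, z_f, k):
    """The same integral evaluated analytically with the substitution u = rho^2 + z_f^2."""
    R2 = a**2 + z_f**2
    return z_f**2 / (2.0 * np.pi) * (
        0.5 * k**2 * (1.0 / z_f**2 - 1.0 / R2) + 0.25 * (1.0 / z_f**4 - 1.0 / R2**2)
    )


k = 2.0 * np.pi      # wavelength normalized to 1
z_f = 10.0           # focal distance of ten wavelengths
amp_numeric = focus_amplitude_numeric(3.0 * z_f, z_f, k)
amp_closed = focus_amplitude_closed(3.0 * z_f, z_f, k)
amp_infinite = k**2 / (4.0 * np.pi) + 1.0 / (8.0 * np.pi * z_f**2)   # limit a -> infinity
```

For $z_f \gg \lambda$ the limit `amp_infinite` is dominated by $k^2/(4\pi)$, the approximate focus amplitude of the untruncated illumination, and the truncated values stay below it for every finite $a$.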

D. Comparison with the Conventional Focusing Illumination

The focusing criterion given in the literature (Wehner, 1949; Bickmore, 1957; Sherman, 1962) requires a phase distribution that compensates for the different distances between the aperture elements and the required location of the focus. The contributions of all elements then add up in phase at the focus (cf. Fig. 10). Until now, conditions for the amplitude distribution have not been derived; the amplitude has been taken constant over the aperture. The assumed relative phase distribution

$$\psi(\rho') = k\,\Delta R(\rho') = k\left(\sqrt{\rho'^2 + z_f^2} - z_f\right) \tag{78}$$

in the input plane causes a radiation of spherical waves converging to the focus. Rewriting Eq. (62a) yields

with

$$\psi_{\mathrm{opt}}(\rho') = k\sqrt{\rho'^2 + z_f^2} - \arctan\!\left(k\sqrt{\rho'^2 + z_f^2}\right) \tag{80}$$

and allows a comparison of the phase distribution according to Eq. (78) with the focusing criterion derived previously. In both cases the phase variation with $\rho'$ is very similar for $z_f > \lambda$.
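How close the two phase laws are can be quantified with a few lines of code. The sketch below evaluates the conventional phase of Eq. (78) and the phase of the optimum illumination according to Eq. (80), both taken relative to the aperture center, and reports their difference. The wavelength normalization ($\lambda = 1$), the sampled aperture range and the function names are assumptions made for illustration.

```python
import numpy as np


def conventional_phase(rho, z_f, k):
    """Eq. (78): k (sqrt(rho^2 + z_f^2) - z_f), zero at the aperture center."""
    return k * (np.sqrt(rho**2 + z_f**2) - z_f)


def optimum_phase(rho, z_f, k):
    """Eq. (80): k r - arctan(k r) with r = sqrt(rho^2 + z_f^2)."""
    r = np.sqrt(rho**2 + z_f**2)
    return k * r - np.arctan(k * r)


def relative_phase_difference(rho, z_f, k):
    """Optimum phase (referred to rho = 0) minus conventional phase, in radians."""
    optimum_relative = optimum_phase(rho, z_f, k) - optimum_phase(0.0, z_f, k)
    return optimum_relative - conventional_phase(rho, z_f, k)


k = 2.0 * np.pi                      # wavelength normalized to 1
rho = np.linspace(0.0, 30.0, 3001)   # aperture coordinate in wavelengths
diff_near = relative_phase_difference(rho, 1.0, k)    # z_f = 1 wavelength
diff_far = relative_phase_difference(rho, 10.0, k)    # z_f = 10 wavelengths
```

The difference is never positive, i.e., the optimum phase varies less rapidly than the conventional one, and its magnitude is bounded by $\pi/2 - \arctan(kz_f)$: about 0.16 rad for $z_f = \lambda$ and about 0.016 rad for $z_f = 10\lambda$.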

FIG. 10. Spherical waves converging to the focus.

Now let us compare the field radiated from an aperture of radius a with an optimum illumination according to Eq. (73) to the field radiated from a 0) of the same radius. In order to conventional focusing illumination Ua,con(p', minimize the effect of limiting the illumination, which theoretically extends to infinity, the aperture radius a is chosen so that the amplitude for p' = a is only 10% the amplitude at the center of the aperture (p' = 0), ie., With zJ 2 1 this occurs at a = 3 zJ (cf. Eq. (79). The amplitude of the conventional focusing illumination is chosen so that the integral

takes the same value for both illuminations. Figure 11 shows the fields along the aperture axis and in the focal plane for the optimum and conventional illumination. The focal distance is one wavelength. With the optimum focusing illumination the field strength is maximum at the focus, whereas the conventional illumination produces the maximum field at a distance shorter than the desired focal distance. Because of the term $\arctan\left(k\sqrt{\rho'^2 + z_f^2}\right)$ in the equation of the optimum illumination, the phase variation with $\rho'$ is less rapid than with the conventional illumination. So the in-phase superposition of the contributions of all aperture elements occurs at a distance which is slightly larger than expected from the conventional method according to Fig. 10. Furthermore, the field at the focus radiated by the optimum focusing aperture illumination is higher than the one radiated by the conventional illumination. The radial extension of the focus radiated by the latter is smaller, whereas the axial extension is nearly equal in both cases. The side lobes in the focal plane are lower for the optimum illumination. Figure 12 shows a comparison of both illuminations for a focal distance $z_f = 10\lambda$. The maximum of the radiated field occurs at $z = z_f$ in both cases, as the influence of the term $\arctan\left(k\sqrt{\rho'^2 + z_f^2}\right)$ has decreased compared with the previous example.

E. The Electromagnetic Case

Suppose we wish to consider the vector character of the electromagnetic field, for example, in determining the radiated field. According to Eq. (35) this can be done using either the tangential components of the electric field in the input plane or, according to Eq. (36),the tangential components of the magnetic field. Which of the two methods is best, or even which is the only one

FIG. 11. Fields of optimum focusing aperture and conventional focusing illumination for an aperture radius $a = 3\lambda$ with an input plane/focal plane distance of $1\lambda$. (1) Along the z-axis. (2) In the focal plane.

applicable, depends on the technical realization of a focusing radiator. In this section focusing by an impressed electric field strength will be treated. The vector of the electric field strength in the input plane $z = 0$ shall be parallel to the x-direction (Fig. 3). As in the scalar problem, the input distribution that maximizes the radiated power density at the focus shall be determined. According to Eqs. (37a-c), an input field strength oriented along

FIG. 12. Fields of optimum focusing aperture and conventional focusing illumination for an aperture radius $a = 30\lambda$ with an input plane/focal plane distance of $10\lambda$. (1) Along the z-axis. (2) In the focal plane.

the x-axis produces the components $E_x$ and $E_z$ of the radiated field. In order to maximize the x-component, the input field strength has to be matched to the component $\Gamma_{xx}$ of the impulse response tensor. With Eq. (59), one obtains

where $\hat{E}_0$ is the amplitude of the illumination of the input plane and the focus,

FIG. 13. Magnitude of $E_x$ in the xz-plane ($z_f = 10\lambda$).

FIG. 14. Magnitude of $E_z$ in the xz-plane ($z_f = 10\lambda$).

FIG. 15. Relative magnitude of $E_x$ and $E_z$ in the focal plane ($z_f = 10\lambda$); ordinate in dB, abscissa $x/\lambda$.

without any loss in generality, is assumed to be on the z-axis ($x_f = y_f = 0$). Since the element $\Gamma_{xx}$ of the impulse response tensor in Eq. (38a) is of the same form as the scalar impulse response in Eq. (19), the input field distribution has the same shape as the optimum focusing scalar illumination (Eq. (62a)). The field components $E_x$ and $E_z$ are computed by means of Eq. (83), applying Eqs. (47a,b) and (49a,b). Figure 13 depicts $|E_x|$ for a focal distance $z_f = 10\lambda$ and the circular-symmetric input illumination given in Fig. 6; like the aperture illumination, $|E_x|$ is circular-symmetric with respect to the z-axis. As expected, one obtains a field strength that greatly exceeds the one generated by a homogeneously illuminated aperture. The result for the z-component is different. Because of symmetry properties, it vanishes along the z-axis for the chosen input illumination. The magnitude of $E_z$ in the xz-plane is depicted in Fig. 14, where the maximum is about 50 dB below $|E_x|$ at the focus (Fig. 15).

IV. FOCUSING IN STRATIFIED MEDIA

A. Propagation of Plane Waves

A structure composed of planar dielectric layers with different permittivities and conductivities (Fig. 16) may be a good model (Guy, 1971) for the treatment of wave propagation in stratified media (e.g. biological tissue, geophysical structures), as long as the wavelength is short by comparison with

FIG. 16. Structure composed of planar layers ($r_n$ is the reflection coefficient at the (n + 1)th boundary).

the radii of curvature of the layers. In this chapter the field produced by an electric or magnetic field strength impressed in the input plane $z = 0$ in such a layered structure will be determined via the spatial frequency domain. For this purpose the transfer function of the input plane/output plane system within the layered structure must be determined. According to Chapter II, a field computation in the spatial frequency domain is equivalent to a field representation by a spectrum of propagating and evanescent plane waves. The solution of the problem of reflection and refraction of plane waves at parallel boundaries can be found in several standard textbooks (e.g. Wait, 1970; Kong, 1975). Here, however, special emphasis is placed on a representation suited to the given task of field computation via the spatial frequency domain. According to Fig. 16, the structure shall fill the half space $z > 0$. The n-th layer consists of a (in general) lossy dielectric with complex permittivity $\varepsilon_n$ and thickness $d_n$. All layers shall be non-magnetic ($\mu_{rn} = 1$). The field in the n-th layer can be expressed as the superposition of an incident and a reflected wave according to

the propagation vectors of which can be described as

and

since the components

$$\mathbf{u}_x k_x + \mathbf{u}_y k_y = \mathbf{K}, \qquad K = \sqrt{k_x^2 + k_y^2}$$

parallel to the stratification are equal at adjacent sides of the boundaries due to the boundary conditions for the electromagnetic field. With $\mathbf{k}_n^i \cdot \mathbf{k}_n^i = \mathbf{k}_n^r \cdot \mathbf{k}_n^r = k_n^2 = k_0^2 \varepsilon_n$, one obtains

$$k_{zn} = \sqrt{k_0^2 \varepsilon_n - K^2} \tag{89}$$

for the longitudinal part of the propagation vectors in the n-th layer. The positive real root is chosen when $k_{zn}$ is real, and the root with a negative imaginary part when $k_{zn}$ is imaginary so as to satisfy the radiation condition. With the unit vectors

$$\mathbf{u}_{\perp n}^i = \mathbf{u}_{\perp n}^r = \mathbf{u}_\perp = \frac{\mathbf{u}_z \times \mathbf{K}}{K}$$

and

introduced in Fig. 17, the field according to Eq. (84) can be divided into components parallel and perpendicular to the plane of incidence, respectively:

This representation is valid not only for plane homogeneous waves ($K < k_n$), but also for evanescent waves ($K > k_n$). In the latter case $\mathbf{u}_{\parallel n}^i$ and $\mathbf{u}_{\parallel n}^r$ are complex even for $k_n$ real, and thus do not have the geometrically intuitive meaning of direction vectors. However, for a field representation in the form of a superposition of plane waves as presented in the next section, Eqs. (91)-(93) can still be used (Kerns, 1981). The reflections at the boundaries between two layers can be described starting from the reflection coefficients of two adjacent semi-infinite media with appropriate material properties. For parallel polarization, the reflection coefficient at the boundary to the (n + 1)th layer (cf. Fig. 16) is given by (Kong,

and for perpendicular polarization by

Starting from the last layer $n = N$, the reflection coefficients $r_{\parallel n}$ and $r_{\perp n}$ at the boundaries of the layered structure are obtained by the recursion formula

with

$$w_n = e^{-j2k_{zn}d_n}. \tag{98}$$

The amplitudes $A_{\parallel n}^i$ and $A_{\perp n}^i$ in Eq. (94) can be obtained from the amplitudes $A_{\parallel 0}^i$ and $A_{\perp 0}^i$ of the incident wave known at the first boundary $z = 0$ utilizing the relation

with the recursion formulae

and

following from the boundary conditions. In analogy to Eq. (99), the terms

+1 k,(,+l)k,

1 + rl/n

e-jk.mzn

kznkn

e-jk~(.+r)zn

1+

II(n+ 1 )

e-jkr(n+i)dn+i

and e-jksnzn e-ikz(n+

112,

1

1 + rLn + T l ( n + l F -jk=c,

+ l ) d n+ I

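represent the transmission coefficients between the n-th and the (n + 1)th layer at the boundary $z = z_n$. Thus Eq. (94) for the field in the n-th layer can also be written in the form

$$\mathbf{E}_n = \mathbf{u}_\perp A_{\perp n}^i \left(e^{-j\mathbf{k}_n^i\cdot\mathbf{r}} + r_{\perp n}\, e^{-j2k_{zn}z_n} e^{-j\mathbf{k}_n^r\cdot\mathbf{r}}\right) + A_{\parallel n}^i \left(\mathbf{u}_{\parallel n}^i\, e^{-j\mathbf{k}_n^i\cdot\mathbf{r}} + \mathbf{u}_{\parallel n}^r\, r_{\parallel n}\, e^{-j2k_{zn}z_n} e^{-j\mathbf{k}_n^r\cdot\mathbf{r}}\right). \tag{102}$$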
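The backward recursion for the reflection coefficients described above can be illustrated with a short Python sketch. It uses the standard textbook recursion for perpendicular polarization in non-magnetic layers, started at the last boundary; this form and all function names are assumptions made for illustration and may differ in notation from the chapter's own recursion formula.

```python
import numpy as np

def longitudinal_wavenumber(eps_r, k0, K=0.0):
    """k_z = sqrt(k0^2 * eps_r - K^2), taking the root that decays for exp(-j k_z z)."""
    kz = np.sqrt(complex(k0**2 * eps_r - K**2))
    return -kz if kz.imag > 0 else kz

def fresnel_perpendicular(kz_a, kz_b):
    """Reflection coefficient between two non-magnetic half-spaces (perpendicular polarization)."""
    return (kz_a - kz_b) / (kz_a + kz_b)

def input_reflection(eps_r, thickness, k0, K=0.0):
    """Reflection coefficient seen from medium 0 for a stack of media 0..N.

    eps_r[n] is the relative permittivity of medium n; thickness[n] matters
    only for the inner layers 1..N-1 (the entries 0 and N have no effect).
    """
    kz = [longitudinal_wavenumber(e, k0, K) for e in eps_r]
    r_total = 0.0 + 0.0j                       # no reflected wave in the last half-space
    for n in range(len(eps_r) - 2, -1, -1):    # work backwards from the last boundary
        r_local = fresnel_perpendicular(kz[n], kz[n + 1])
        round_trip = np.exp(-2j * kz[n + 1] * thickness[n + 1])
        r_total = (r_local + r_total * round_trip) / (1.0 + r_local * r_total * round_trip)
    return r_total

k0 = 2.0 * np.pi                               # free-space wavelength of 1 (arbitrary units)
quarter_wave = 1.0 / (4.0 * np.sqrt(2.0))      # quarter wavelength inside a layer with eps_r = 2
r_single = input_reflection([1.0, 4.0], [0.0, 0.0], k0)
r_matched = input_reflection([1.0, 2.0, 4.0], [0.0, quarter_wave, 0.0], k0)
print(r_single, abs(r_matched))
```

The two calls are sanity checks: a single boundary reproduces the Fresnel coefficient, and a quarter-wave layer whose refractive index is the geometric mean of its neighbours removes the reflection at normal incidence.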

B. The Transfer Function Tensor

1. Electric Field Impressed in the Input Plane

In the plane $z = 0$ of the stratified medium of Fig. 16, a tangential electric field strength $\mathbf{E}_0$ is impressed. The field that radiates into the layered half-space $z > 0$ shall be computed via the spatial frequency domain. Therefore, the transfer function $\gamma_n(k_x, k_y; z)$ of the system consisting of the input plane $z = 0$ and an output plane $z > 0$ within an arbitrary layer of the structure of Fig. 16 has to be evaluated. In what follows, this will be termed the transfer function of the stratified medium.

According to Eq. (43), the field within the n-th layer can be represented as a continuous spectrum of incident and reflected waves, as in Eq. (102) (Clemmow, 1966):

+

,i

I

I n

(K)(e-jkL.r

+

I n

e-j2kznz,e-jkL.r

11d k x dk, .

With the definition of the two-dimensional Fourier transform (13a,b), one obtains the spectral functions of the field in the first layer from the field $\mathbf{E}_0$:

and

(107a)

$$\gamma_{yyn}(k_x, k_y; z) = \frac{k_x^2}{K^2}\,s_{\perp n}(K; z) + \frac{k_y^2}{K^2}\,s_{\parallel n}(K; z), \tag{107c}$$

(107d)

and

$$\gamma_{zyn}(k_x, k_y; z) = -\frac{k_y}{k_{zn}}\,t_n(K; z) \tag{107e}$$

are the elements of the transfer function tensor $\gamma_n(k_x, k_y; z)$ from the plane $z = 0$ to the n-th layer of the structure. The factors $s_{\parallel n}(K; z)$ and $s_{\perp n}(K; z)$ are computed as follows:

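$$s_{\parallel,\perp n}(K; z) = \frac{\prod_{i=1}^{n-1}\left(1 + r_{\parallel,\perp i}\right)e^{-jk_{zi}d_i}}{\left[\prod_{i=1}^{n}\left(1 + r_{\parallel,\perp i}\,e^{-j2k_{zi}d_i}\right)\right]e^{-jk_{zn}z_{n-1}}}\left(e^{-jk_{zn}z} + r_{\parallel,\perp n}\,e^{-j2k_{zn}z_n}e^{+jk_{zn}z}\right) \tag{108}$$

for both polarizations, whereas

$$t_n(K; z) = \frac{\prod_{i=1}^{n-1}\left(1 + r_{\parallel i}\right)e^{-jk_{zi}d_i}}{\left[\prod_{i=1}^{n}\left(1 + r_{\parallel i}\,e^{-j2k_{zi}d_i}\right)\right]e^{-jk_{zn}z_{n-1}}}\left(e^{-jk_{zn}z} - r_{\parallel n}\,e^{-j2k_{zn}z_n}e^{+jk_{zn}z}\right). \tag{109}$$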

In Eqs. (108) and (109), the term

n

(1 + r I I, l i e - J 2 k r i d i ) e - j k , " z ,

-

1

i=l

represents the transmission coefficient from the boundary $z = 0$ of the stratified medium to the boundary of the n-th layer. With the elements of the transfer function tensor, the field in the stratified medium can be computed using Eqs. (50) or (40) derived in Section II.C.

2. Magnetic Field Impressed in the Input Plane

If one assumes that a tangential magnetic field strength $\mathbf{H}_0$ is impressed in the plane $z = 0$ of the layered structure in Fig. 16, the magnetic field in the n-th layer can be represented as a superposition of plane waves in a manner analogous to the electric field in the preceding section. Applying Maxwell's equations to the electric field (103) in the n-th layer yields the associated magnetic field

+ bi,(K)[(k;

+ (k;

x

+ (k:

x uI)r I n e-i2kznzne-jk~"])dk, dk,,

Uiln)rllne-j2f~~z~e-ik:.r]

x u,)e-Jk;"

(111)

where $Z_n = \sqrt{\mu_0/(\varepsilon_0 \varepsilon_n)}$ is the intrinsic impedance of the n-th layer. As in Section B.1, one finds for the spectral function within the first layer:

and

The spectral functions $b_{\parallel n}^i(K)$ and $b_{\perp n}^i(K)$ for the other layers are computed by making use of the recursion formulae (100) and (101). In analogy to Eqs. (107a-e), the transfer function tensor

$$\gamma_n^H(k_x, k_y; z) = \begin{pmatrix} \gamma_{xxn}^H & \gamma_{xyn}^H \\ \gamma_{yxn}^H & \gamma_{yyn}^H \\ \gamma_{zxn}^H & \gamma_{zyn}^H \end{pmatrix} \tag{114}$$

can be determined. The electric field in the spatial frequency domain

$$\mathbf{e}_n = \gamma_n^H(k_x, k_y; z) \cdot (\mathbf{u}_z \times \mathbf{h}_0) \tag{115a}$$

$$\updownarrow\ \mathrm{FT}$$

$$\mathbf{E}_n = \Gamma_n^H(x, y; z) ** (\mathbf{u}_z \times \mathbf{H}_0) \tag{115b}$$

is obtained by multiplying the transfer function tensor by the Fourier-transformed input illumination. The electric field (in the spatial domain) can be determined alternatively by an inverse two-dimensional Fourier transform or according to Eq. (115b) by convolving the impulse response tensor

with the vector $\mathbf{u}_z \times \mathbf{H}_0$.

The elements of the impulse response tensor are: (117a)

and (117e) with n- 1

and "-1

(e-jkznz

-

-j2k.,zne+jkz.z rllne

1.

If the field radiated by a dipole antenna located in the input plane is to be determined, it is more advantageous to start from the dipole current. For that purpose, the relation

$$\mathbf{u}_z \times \mathbf{H}_0 = \mathbf{J}_s \tag{120}$$

is introduced, where $\mathbf{J}_s$ denotes the surface current density in the input plane. The current distribution is obtained via the current element of an infinitely thin dipole (Borgnis and Papas, 1965):

$$\mathbf{J}_s(x_0', y_0') = I_0\,\mathrm{d}\mathbf{l}\;\delta(x' - x_0', y' - y_0'). \tag{121}$$

Here $I_0$ denotes the dipole current and $\mathrm{d}l$ the length of the dipole. Thus the relations derived above are also applicable to the determination of the field of a distribution of discrete currents.

C. Examples

1. Perpendicular Incidence of a Plane Homogeneous Wave on a Structure of Two Parallel Plane Layers (Biological Tissue)

As an introductory example, we will discuss perpendicular incidence of a plane homogeneous wave on a structure (Fig. 18) to illustrate field computations in a stratified medium via the spatial frequency domain. This example may serve as a model for biological tissue. Superimposed on muscle tissue, which is assumed to be of infinite extension in the positive z-direction, there is a layer of fat tissue, typically 2 cm thick (Guy, 1971). A skin layer above the fat layer can be neglected in the model (Guy, 1971), as it is only about 0.4 mm in thickness, which is small by comparison with the wavelength of the radiation used ($\lambda_\varepsilon \approx 45$ mm at 915 MHz and $\lambda_\varepsilon \approx 18$ mm at 2,450 MHz; Johnson and Guy, 1972). The assumption of infinite extension of muscle tissue can be justified by the fact that reflections at deeper boundaries can be neglected due to high losses in muscle tissue ($\tan\delta \approx 0.6$). It is obvious that this boundary-value problem can also be solved in the classical manner by superposing incident and reflected waves in both layers, the amplitudes of which can be determined by observing the boundary conditions. This simple example, however, shall be used to demonstrate the applicability of system-theoretic methods to the solution of wave-propagation problems within stratified media.

FIG. 18. Structure of two layers as a model for biological tissue.

The incident wave is assumed to be linearly polarized in the x-direction, thus

$$\mathbf{E}^i = \mathbf{u}_x E_0\,e^{-jk_0 z}. \tag{122}$$

The field in the plane $z = 0$ is given by

$$\mathbf{E}_0 = \mathbf{u}_x E_0 (1 + r_0) \tag{123a}$$

$$\updownarrow\ \mathrm{FT}$$

$$\mathbf{e}_0 = \mathbf{u}_x E_0 (1 + r_0)\,\delta(k_x)\,\delta(k_y), \tag{123b}$$

where $r_0$ is the reflection coefficient at the air/fat tissue boundary. The field within the layered structure can be determined by substituting Eq. (123b) in Eq. (50), where the transfer function tensor is given by Eqs. (107a-e), and by a subsequent transformation into the spatial domain. With a reflection coefficient $r_1$ at the fat/muscle tissue boundary, the elements of the transfer function are given by (cf. Eqs. (107a-e))

$$\gamma_{xx1} = \gamma_{yy1} = \frac{e^{-jk_{z1}z} + r_1\,e^{-j2k_{z1}d_1}\,e^{jk_{z1}z}}{1 + r_1\,e^{-j2k_{z1}d_1}} \tag{124}$$

and

$$\gamma_{xx2} = \gamma_{yy2} = \frac{(1 + r_1)\,e^{-jk_{z1}d_1}\,e^{jk_{z2}d_1}}{1 + r_1\,e^{-j2k_{z1}d_1}}\,e^{-jk_{z2}z}. \tag{125}$$

The remaining elements are zero in both regions. As expected, the electric field, like the incident wave, has only an x-component in both layers given by

and

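The two-layer result can be explored numerically. The Python sketch below evaluates the transfer-function elements for perpendicular incidence in the form of Eqs. (124) and (125), normalized to the field in the plane $z = 0$. The complex permittivities are placeholder values chosen only for illustration (they are not the data of Johnson and Guy, 1972), and the function name is hypothetical.

```python
import numpy as np

def two_layer_field(z, d1, eps1, eps2, freq_hz):
    """Field relative to its value at z = 0 for normal incidence on layer 1 (thickness d1)
    backed by the half-space 2; both media non-magnetic, time dependence exp(+j w t)."""
    c0 = 299_792_458.0
    k0 = 2.0 * np.pi * freq_hz / c0
    kz1 = k0 * np.sqrt(complex(eps1))      # principal root: Im <= 0 for lossy media
    kz2 = k0 * np.sqrt(complex(eps2))
    r1 = (kz1 - kz2) / (kz1 + kz2)         # reflection coefficient at the 1/2 boundary
    denom = 1.0 + r1 * np.exp(-2j * kz1 * d1)
    z = np.asarray(z, dtype=float)
    # Eq. (124): incident plus reflected wave inside layer 1
    layer1 = (np.exp(-1j * kz1 * z) + r1 * np.exp(-2j * kz1 * d1) * np.exp(1j * kz1 * z)) / denom
    # Eq. (125): transmitted wave inside the half-space 2
    layer2 = (1.0 + r1) * np.exp(-1j * kz1 * d1) * np.exp(1j * kz2 * d1) * np.exp(-1j * kz2 * z) / denom
    return np.where(z <= d1, layer1, layer2)

# Illustrative (assumed) tissue parameters: "fat" 5.6 - 1.1j, "muscle" 51 - 25j
depth = np.linspace(0.0, 0.06, 7)          # 0 to 6 cm
field = two_layer_field(depth, 0.02, 5.6 - 1.1j, 51.0 - 25.0j, 915e6)
for zi, fi in zip(depth, np.abs(field)):
    print(f"z = {100 * zi:4.1f} cm   |E/E(0)| = {fi:.3f}")
```

Evaluating both expressions at $z = d_1$ gives the same value, which is a quick check that the tangential field is continuous across the boundary.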

The field distribution in the fat/muscle tissue structure is shown for both frequencies in Fig. 19. One can see that for f = 2,450 MHz there is, in addition to the propagating wave, also a standing wave, which leads to a maximum of the field strength in the fat tissue and thus to undesired high energy dissipation there. In addition, the field within the muscle tissue decays somewhat faster than at 915 MHz, which means that the input power has to be increased accordingly to generate the same field strength at the same depth. This results in a further increase of energy dissipation within the fat tissue.

FIG. 19. Field within biological tissue (material parameters taken from Johnson and Guy, 1972) at 915 (2,450) MHz for the perpendicular incidence of a plane homogeneous wave.

2. Focusing Illumination on a Biological Tissue

In order to focus electromagnetic radiation, it is necessary to match the aperture illumination to the impulse response of the input plane/focal plane system, as discussed in Chapter III. The layered structure depicted in Fig. 19 with the material parameters used there shall again serve as a model of body tissue. The “target area” (focal plane) of the microwave radiation shall be at an overall depth $z = z_f = 4$ cm. In the plane $z = 0$, the focusing illumination, by Eq. (83), given as

$$\mathbf{e}(k_x, k_y; 0) = \mathbf{u}_x \hat{E}_0\,\gamma_{xx2}^*(k_x, k_y; z_f) \tag{128b}$$

shall be impressed. The field radiated into the tissue is obtained either by Eq. (40), by convolving the input illumination with the impulse response tensor given by Eqs. (107a-e) and (52), or in the spatial frequency domain by Eq. (50), by multiplying the Fourier-transformed input illumination by the transfer function tensor given by Eqs. (107a-e). The transfer function and, thus, the focusing input illumination in the spatial frequency domain can be given by analytical expressions, whereas numerical solutions are required for the corresponding quantities in the spatial domain. For that reason, the radiated field has first been computed in the spatial frequency domain according to Eq. (50). Then it has been transformed to the spatial domain using an FFT algorithm, where the difficulties arising from the numerical evaluation of Fourier integrals have to be taken into account (Bergland, 1969; Howard, 1975). The focusing input illumination and its spectrum, as described by Eqs. (128a,b), are shown in Fig. 20. The resulting field in the focal plane $z = z_f$ is depicted in Fig. 21, the field along the z-axis in Fig. 22. As in the case of focusing in a homogeneous lossless medium, the y- and z-components of the radiated field are again much smaller than the x-component. Contrary to expectations and unlike the case of a homogeneous lossless

FIG. 20. Focusing illumination for biological tissue according to Fig. 19 (focus at a depth of 4 cm; f = 915 MHz). (a) Spatial domain. (b) Spatial frequency domain.

FIG. 21. Field in the focal plane for the illumination given in Fig. 20 (f = 915 MHz). (a) Along the x-axis. (b) Along the y-axis.

medium, there is no local field maximum with respect to the axial direction at the desired focus. This is due to several effects. As a comparison of Fig. 11 and Fig. 12 reveals, the local maximum at the focus becomes less prominent as the focus approaches the aperture due to the diminishing influence of edge zones of the aperture, and in the given case the focal distance is in the order of half a wavelength in the medium. Furthermore, the high attenuation of biological tissue causes rapid decay of the field in the axial direction. Thus, the spatial

FIG. 22. Field along the z-axis for the illumination given in Fig. 20.

resolution in the radial direction, which can also be observed for a homogeneous illumination, can be attributed primarily to the finite extension of the radiating aperture. Of course, the effect is amplified by in-phase superposition of the contributions of the aperture elements due to the focusing criterion derived in Chapter III. In general, one cannot speak of a significant focusing effect within biological tissue, though for the example we have chosen the field strength at the focus is about 6 dB higher than that generated by a comparable homogeneous illumination. For comparison, the field generated in the layered structure by a circular homogeneous illumination of radius $a$ in the input plane, $\mathbf{E}_0 = \mathbf{u}_x \hat{E}_0\,\mathrm{circ}(\rho'/a)$,

has been included in Fig. 23. The amplitude of the homogeneous illumination is, by design, equal to the amplitude of the focusing illumination; because of the choice of the radius $a$, equal power is radiated from both illuminations. As expected, the focusing illumination generates a higher field strength at the focus than the homogeneous illumination (Fig. 23a); in addition, spatial resolution is better. In the axial direction, the field decay of the focusing illumination occurs more slowly than that of the homogeneous illumination.

3. Nondestructive Testing

A further possible application of focused electromagnetic waves lies in non-destructive testing of nonconducting materials for flaws and other

FIG. 23. Field for the focusing illumination given in Fig. 20 ($z_f = 4$ cm) and for a homogeneous illumination; f = 915 MHz. (a) In the focal plane along the x-axis. (b) Along the z-axis.

inhomogeneities that cause a regional change of electric properties. If the object to be investigated is illuminated by an electromagnetic wave, the energy will be partly scattered back (Holler, 1983). A receiving antenna will deliver a particularly strong output signal if focused at the location of the scatterer. If a criterion for focusing at the scatterer is known, information as to its location can be derived from this criterion. By comparison with reflectivity measurements using non-focused antennas, in this method the signal re-radiated from the inhomogeneity is more strongly weighted than are contributions from other points of the object to be investigated. The result is an increase in sensitivity.


To better understand the underlying principles, the two-dimensional model of Fig. 24 shall be considered in the subsequent mathematical treatment. In this case the vector problem reduces to a scalar one. The principal results may be transferred to the three-dimensional case. We start with a line inhomogeneity at $x = x_0$, $z = z_0$ in a lossless dielectric plate which is assumed to have an infinite extension in the x- and y-directions. The radiation emanating from the electric line source S is linearly polarized in the y-direction. It induces a secondary line source $\hat{u}_0\,\delta(x - x_0)$ at the location of the disturbance, radiation from which will also be linearly polarized in the y-direction. The radiation generated by this secondary line source produces the field

$$U_s(x, 0) = \Gamma_{yy}(x - x_0, z_0) \tag{130}$$

in the plane of the receiving antenna $z = 0$, where $\Gamma_{yy}(x, z_0)$ is obtained by performing an inverse (one-dimensional) Fourier transform of $\gamma_{yy}(k_x; z)$ by Eq. (107c). This signal will be received optimally by an antenna focused at the location of the inhomogeneity $(x_0, z_0)$, which, by Eq. (59), must be illuminated by

$$U_{\mathrm{opt}}(x', 0) = \hat{u}_0\,\Gamma_{yy}^*[-(x' - x_0), z_0] \tag{131}$$

when it is operated as a transmitting antenna. Because of certain reciprocity relations satisfied by any electromagnetic field, a receiving antenna focused at this location will weight the radiation at that point in accordance with Eq. (131). To determine the position of the scatterer (inhomogeneity), the receiving antenna has to be focused successively to any line x, z of the object to be investigated; the received signal will take its maximum value if the antenna is focused at the location of the inhomogeneity. For the procedure described in this contribution, focusing is performed “synthetically”. To that end, the scattered signal $U_s$ that has been recorded by the antenna R along the x-axis and stored in a computer must be weighted by the illumination $U_{\mathrm{opt}}$ corresponding to a

FIG. 24. Dielectric plate with an inhomogeneity acting as a scatterer.

receiving antenna successively focused to all lines within the object to be investigated. Mathematically, the scattered signal $U_s$ in the plane $z = 0$ has to be correlated with the “illumination” of the receiving antenna. This correlation has to be performed successively for all depths $z$ within the thickness of the plate (for practical purposes, in appropriately chosen steps $\Delta z$) in order to obtain information on the depth $z_0$ of the inhomogeneity:

$$\Phi(x, z) = U_s(x, 0) \star U_{\mathrm{opt}}(x, z) = \int_{-\infty}^{\infty} U_s(\xi, 0)\,U_{\mathrm{opt}}(\xi - x, z)\,\mathrm{d}\xi = \hat{u}_0 \int_{-\infty}^{\infty} \Gamma_{yy}(\xi - x_0, z_0)\,\Gamma_{yy}^*(x - \xi, z)\,\mathrm{d}\xi, \qquad d_1 < z < d_1 + d_2. \tag{132}$$

The correlation function $\Phi(x, z)$ will reach its maximum if the receiving antenna has been focused according to Eq. (131) to the line $x = x_0$ at a depth $z = z_0$. Then $\Phi(x, z)$ is an autocorrelation function that reaches its maximum value for $x = x_0$. After a Fourier transform, the correlation integral (132) can also be evaluated in the spatial frequency domain. For correlation and convolution, the relation

$$\Phi(x, z) = U_s(x, 0) \star U_{\mathrm{opt}}(x, z) = U_s(x, 0) * U_{\mathrm{opt}}(-x, z) = U_s(x, 0) * \hat{u}_0\,\Gamma_{yy}^*(x, z) \tag{133a}$$

$$\varphi(k_x; z) = u_s(k_x; 0)\,\hat{u}_0\,\gamma_{yy}^*(-k_x; z) \tag{133b}$$

is valid (Luke, 1975), which turns the computation of an integral into a multiplication in the spatial frequency domain. An inverse transform to the spatial domain yields the correlation function. Evaluating the correlation function in the spatial frequency domain has a further advantage, in that it is no longer necessary to compute the impulse response of the structure to be investigated by Fourier transforming the transfer function. Below, we will study PVC plates with a thickness of d = 150 mm for inhomogeneities, e.g. flaws, by the method described above, as simulated by computer. The radiation scattered by the flaws is simulated by line sources at the assumed locations of the disturbances. In order to test the transverse resolution, it is assumed that there are two line sources located side by side at a depth of 85 mm and separated by a distance D. The emitted radiation is computed for the plane $z = 0$, Fourier-transformed and multiplied by the spectrum of the optimum illumination at the depth d = 85 mm in accordance with Eq. (133b). The signals obtained by inverse Fourier transforms for two distances D are depicted in Fig. 25. One can see that the two line sources can be resolved even at a distance of one and a half wavelengths ($\lambda_\varepsilon = 11.6$ mm). With a separation of five wavelengths, the result is even more distinct (Fig. 25b).
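The synthetic focusing step, that is, the correlation evaluated as a product in the spatial frequency domain, can be sketched in a few lines of Python. The sketch assumes a homogeneous lossless medium with the scalar transfer function $e^{-jk_z z}$ instead of the layered plate, a single line source, and periodic sampling; all names and parameter values are illustrative assumptions. NumPy's FFT uses the kernel $e^{-jk_x x}$ in the forward direction, which may differ in sign from the transform definition used in this chapter.

```python
import numpy as np

def transfer_function(kx, z, k):
    """Scalar transfer function exp(-j kz z) of a homogeneous medium (assumed model)."""
    kz = np.sqrt((k**2 - kx**2).astype(complex))
    kz = np.where(kz.imag > 0, -kz, kz)        # evanescent components must decay with z
    return np.exp(-1j * kz * z)

def synthetic_focus(recorded, kx, z, k):
    """Correlate the recorded signal with the reference for depth z via the FFT."""
    spectrum = np.fft.fft(recorded)
    return np.fft.ifft(spectrum * np.conj(transfer_function(kx, z, k)))

wavelength = 1.0
k = 2.0 * np.pi / wavelength
n, dx = 512, wavelength / 4.0
kx = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)

# One line source at sample 200 and depth 10 wavelengths; its field recorded at z = 0
source_index, source_depth = 200, 10.0 * wavelength
source = np.zeros(n)
source[source_index] = 1.0
recorded = np.fft.ifft(np.fft.fft(source) * transfer_function(kx, source_depth, k))

focused = synthetic_focus(recorded, kx, source_depth, k)
defocused = synthetic_focus(recorded, kx, 0.5 * source_depth, k)
print(int(np.argmax(np.abs(focused))), np.abs(focused).max(), np.abs(defocused).max())
```

When the reference depth equals the source depth, the product of the spectra has no residual phase apart from the linear term that encodes the lateral position, so the correlation peaks at the source position; at a wrong depth the remaining phase spreads the peak.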

FIG. 25. Simulated imaging of two flaws side by side at a depth of 85 mm for two different separations. (a) Separation of line sources $D = 1.5\lambda_\varepsilon = 17.4$ mm. (b) Separation of line sources $D = 5\lambda_\varepsilon = 58$ mm.

In a second simulation, it is supposed that there are two line sources below each other, separated by 80 mm. The signal in the plane $z = 0$ generated by the two line sources is correlated with the optimum illumination computed for various depths $z$. The correlation function $\Phi(x_0, z)$ for the different values of $z$ obtained at the coordinate $x_0$ of the flaw is depicted in Fig. 26. One can see that the method’s resolution capability in the vertical direction is considerably worse than in the horizontal direction. Maxima of the function $\Phi(x_0, z)$ do not occur at the true locations of the line sources, the separation does not agree with that of the line sources, and, furthermore, the sources are indicated by very broad maxima. In addition, the performance of the method is degraded by reflections at the boundaries between the media, which result in the superposition of a standing wave. To obtain a better resolution in the axial direction, a combination of focusing techniques with time-delay measurements seems to be a promising approach.

FIG. 26. Simulated imaging of flaws below each other.

V. CONCLUSIONS

In this contribution, the applicability of system-theoretic methods to the treatment of wave propagation problems has been demonstrated. Starting from methods known from the literature for the computation of the field radiated from a plane source distribution, a system-theoretic description, at first in scalar and then also in vector form, has been presented. For the scalar treatment, the input plane/output plane system can be characterized by its impulse response and by its transfer function in the spatial domain and the spatial frequency domain, respectively. In the electromagnetic case, the impulse response and the transfer function are tensors. The spatial domain and the spatial frequency domain are interrelated by the two-dimensional Fourier transform. Using this system-theoretic representation, a focusing source distribution is given by matching the input illumination to the impulse response of the input plane/focal plane system. The illumination found in this way is optimum in the sense that it produces the maximum power density possible with the available input power. This criterion has been used for focusing electromagnetic radiation in homogeneous space as well as in a stratified structure. To compute the field in such a structure and determine the optimum focusing input illumination, the associated transfer function and impulse response are determined. The impulse response is obtained by an inverse two-dimensional Fourier transform of a transfer function that describes the propagation of plane waves from the input plane to the focal plane. As an initial example illustrating the application of system-theoretic methods to field-theoretic problems, we discussed the perpendicular incidence of a plane homogeneous wave on a structure that serves as a model for biological tissue. Furthermore, the optimum focusing illumination for this structure and the resulting field distribution are determined. As a third example, the focusing criterion derived initially is applied to non-destructive testing of dielectric plates. These examples show that focusing in the transverse direction can be achieved very well, whereas resolution in the axial direction is less prominent. However, if compared to the field of a homogeneous illumination radiating the same power, the focusing illumination generates a field at the focus which is about 6 dB higher.
If the focusing criterion given in this contribution is applied to non-destructive testing, good agreement with the true locations of the disturbances is only found if they are located side by side. Further investigations are required if we wish to improve the focusing effect in the axial direction.
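The optimality statement summarized above (maximum field at the focus for a given input power when the illumination is matched to the impulse response) is an instance of the Cauchy-Schwarz inequality. The following Python sketch illustrates it for a discretized aperture; the random complex vector merely stands in for samples of an impulse response, and all names are hypothetical.

```python
import numpy as np

def focus_field(impulse_response, illumination):
    """Discrete superposition of all aperture contributions at the focus."""
    return np.sum(impulse_response * illumination)

def matched_illumination(impulse_response, power):
    """Illumination proportional to the conjugate impulse response, scaled to the given power."""
    u = np.conj(impulse_response)
    return u * np.sqrt(power / np.sum(np.abs(u)**2))

def scaled_to_power(illumination, power):
    """Rescale an arbitrary illumination so that sum |U|^2 equals the given power."""
    return illumination * np.sqrt(power / np.sum(np.abs(illumination)**2))

rng = np.random.default_rng(0)
samples = rng.normal(size=200) + 1j * rng.normal(size=200)   # stand-in impulse response
power = 1.0

best = abs(focus_field(samples, matched_illumination(samples, power)))
uniform = abs(focus_field(samples, scaled_to_power(np.ones(200, dtype=complex), power)))
trial = abs(focus_field(samples, scaled_to_power(
    rng.normal(size=200) + 1j * rng.normal(size=200), power)))
print(best, uniform, trial)
```

The matched case reaches the Cauchy-Schwarz bound $\sqrt{P}\,\lVert S \rVert$, whereas the uniform and random illuminations of equal power stay below it.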

VI. APPENDIX

A. Theorems of Fourier and Hankel Transform Used in This Contribution

1. List of Theorems

From the two-dimensional Fourier transform and its inverse defined by Eqs. (13a,b), the following theorems used in this contribution can be derived.

384

MICHAEL, KAISER

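With $f(k_x, k_y)$ as the two-dimensional Fourier transform of $F(x, y)$, one obtains:

1. Differentiation in the spatial domain

$$\frac{\partial F(x, y)}{\partial x} \longleftrightarrow -jk_x\, f(k_x, k_y), \qquad \frac{\partial F(x, y)}{\partial y} \longleftrightarrow -jk_y\, f(k_x, k_y) \tag{134}$$

2. Differentiation in the spatial frequency domain

3. Shifting in the spatial domain

$$F(x - x_0,\, y - y_0) \longleftrightarrow f(k_x, k_y)\, e^{+j(k_x x_0 + k_y y_0)} \tag{136}$$

4. Shifting in the spatial frequency domain

$$F(x, y)\, e^{-j(k_{x0} x + k_{y0} y)} \longleftrightarrow f(k_x - k_{x0},\, k_y - k_{y0}) \tag{137}$$

5. Complex conjugate function in the spatial domain

$$F^*(x, y) \longleftrightarrow f^*(-k_x, -k_y) \tag{138}$$

6. Complex conjugate function in the spatial frequency domain

$$F^*(-x, -y) \longleftrightarrow f^*(k_x, k_y) \tag{139}$$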
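The differentiation and shifting theorems can be checked numerically. The following is a minimal one-dimensional sketch, not part of the original derivation. It assumes the forward transform $f(k) = \int F(x)\, e^{+jkx}\, dx$, which is the sign convention implied by the factor $(-jk_x)$ quoted for the differentiation theorem; the Gaussian test function, the grids, and the helper name `forward_transform` are illustrative choices only.

```python
import numpy as np

def forward_transform(samples, x, k):
    """Approximate f(k) = integral of F(x) exp(+j k x) dx (assumed sign
    convention) with a rectangle rule on a uniform x grid."""
    dx = x[1] - x[0]
    kernel = np.exp(1j * np.outer(k, x))          # shape (len(k), len(x))
    return (kernel * samples[None, :]).sum(axis=1) * dx

def gauss(t):
    """Illustrative test function F(x) = exp(-x^2 / 2)."""
    return np.exp(-t**2 / 2.0)

x = np.linspace(-20.0, 20.0, 4001)   # spatial grid, wide enough for the tails
k = np.linspace(-3.0, 3.0, 13)       # spatial frequencies to test
x0 = 1.5                             # shift used for the shifting theorem

f = forward_transform(gauss(x), x, k)

# Shifting in the spatial domain: F(x - x0) <-> f(k) exp(+j k x0)
f_shifted = forward_transform(gauss(x - x0), x, k)
shift_error = np.max(np.abs(f_shifted - f * np.exp(1j * k * x0)))

# Differentiation in the spatial domain: dF/dx <-> -j k f(k)
f_deriv = forward_transform(-x * gauss(x), x, k)
deriv_error = np.max(np.abs(f_deriv - (-1j * k) * f))

print("shift theorem residual:", shift_error)
print("differentiation theorem residual:", deriv_error)
```

Both residuals should be at the level of rounding error if the assumed sign convention matches the theorems as listed; with the opposite convention the signs in the exponent and in the factor $jk_x$ would flip together.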

2. Correlation of Impulse Response and Transfer Function by the Two-Dimensional Fourier Transform (Eq. (27))

Making use of the relation between the two-dimensional Fourier transform of circular symmetric functions and the Hankel transform given by Eq. (16), one can derive the following transforms used in the present chapter:

1. Impulse response $S(x, y, z)$ and transfer function $s(k_x, k_y; z)$ for the scalar radiation problem (cf. Section II.B)

Taking into account the relation between the Fourier and Hankel transforms given by Eq. (16) and making use of the Hankel transform given by Eq. (26), one can derive the Fourier transform pair

$$\frac{1}{2\pi}\, \frac{e^{-jk\sqrt{x^2 + y^2 + z^2}}}{\sqrt{x^2 + y^2 + z^2}} \;\overset{\mathrm{FT}}{\longleftrightarrow}\; -j\, \frac{e^{-j\sqrt{k^2 - k_x^2 - k_y^2}\, z}}{\sqrt{k^2 - k_x^2 - k_y^2}}. \tag{140}$$

Differentiating both sides with respect to $z$ yields the relation between the impulse response and the transfer function given by Eq. (27).

2. Impulse response tensor $\Gamma(x, y, z)$ and transfer function $\gamma(k_x, k_y; z)$ for the electromagnetic case (cf. Section II.C)


The elements $\Gamma_{xx}(x, y, z)$ and $\Gamma_{yy}(x, y, z)$ are of the same form as the impulse response $S(x, y, z)$ for the scalar radiation problem. Therefore, the transform for the scalar case given by Eq. (27) is also valid for $\Gamma_{xx}(x, y, z)$ and $\gamma_{xx}(k_x, k_y; z)$ in Eqs. (39) and (51), respectively. Using Eq. (38c), the elements $\Gamma_{zx}(x, y, z)$ and $\Gamma_{zy}(x, y, z)$ can be obtained by differentiating the left side of the transform (140) with respect to $x$ and $y$, respectively: (141a) (141b) According to Theorem (134), the right side of Eq. (140) has to be multiplied by $(-jk_x)$ and $(-jk_y)$, respectively. This yields the transforms (142a)

and (142b) Comparison with Eqs. (49a,b) reveals that the right sides of Eqs. (142a,b) are identical to the elements $\gamma_{zx}(k_x, k_y; z)$ and $\gamma_{zy}(k_x, k_y; z)$ of the transfer function tensor as given by Eq. (51). Thus, it has been demonstrated that the spatial domain and the spatial frequency domain are interrelated by the two-dimensional Fourier transform in the electromagnetic case as well.

B. Derivation of the Optimum Focusing Input Illumination (Eq. (59))

$$U(x_f, y_f, z_f) = \frac{1}{4\pi^2} \iint_{-\infty}^{\infty} u(k_x, k_y; 0)\, s(k_x, k_y; z_f)\, e^{-j(k_x x_f + k_y y_f)}\, dk_x\, dk_y. \tag{143}$$

This delivers the following power density at the focus:


Application of Schwarz' inequality yields

$$|U(x_f, y_f, z_f)|^2 \le \frac{1}{4\pi^2} \iint_{-\infty}^{\infty} |U(x', y', 0)|^2\, dx'\, dy' \iint_{-\infty}^{\infty} \left| s(k_x, k_y; z_f)\, e^{-j(k_x x_f + k_y y_f)} \right|^2 dk_x\, dk_y. \tag{145}$$

Here the right side is constant, as the value of the integral is determined by $|U(x', y', 0)|$ and $s(k_x, k_y; z_f)$. Therefore, the ratio in Eq. (53) is maximum if equality holds. This is true for:

$$u(k_x, k_y; 0) = c_0 \left[ s(k_x, k_y; z_f)\, e^{-j(k_x x_f + k_y y_f)} \right]^* = c_0\, s^*(k_x, k_y; z_f)\, e^{+j(k_x x_f + k_y y_f)}. \tag{146}$$

Using Theorems (136) and (139) of the Fourier transform, one obtains:

$$U(x', y', 0) = c_0\, S^*\left[ -(x' - x_f),\, -(y' - y_f),\, z_f \right]. \tag{147}$$

With this input illumination, the ratio in Eq. (53) takes its maximum value:

C. Derivation of the Optimum Focusing Input Illumination with the Constraint of Finite Aperture Dimensions (Eq. (73))

If the input illumination $U(x', y', 0)$ is non-zero only within a finite region $S$, the value of $U$ at the focus can be computed thus:

$$U(x_f, y_f, z_f) = U(x, y, 0) ** S(x, y, z_f)\big|_{x = x_f,\, y = y_f} = \iint_S U(x', y', 0)\, S(x_f - x', y_f - y', z_f)\, dx'\, dy'. \tag{149}$$

Application of Schwarz' inequality yields


With the arguments from the previous section, the ratio

is maximum if equality holds. This is true if

$$U(x', y', 0) = c_0\, S^*(x_f - x', y_f - y', z_f)$$

holds within the region S.
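The matched-illumination result can be illustrated with a small discrete experiment. The sketch below is only an illustration under stated assumptions: a one-dimensional aperture with unit sample spacing, a spherical-wave-like kernel standing in for the impulse response $S$ (a hypothetical choice, not the stratified-medium response of this contribution), and all illuminations normalized to the same input power. It compares the power at the focus for the matched illumination $U \propto S^*(x_f - x')$ with a uniform illumination and with random trial illuminations, and checks them against the Schwarz bound.

```python
import numpy as np

rng = np.random.default_rng(0)

wavelength = 1.0                       # arbitrary length unit (assumption)
k = 2.0 * np.pi / wavelength
x_ap = np.linspace(-5.0, 5.0, 201)     # sample points inside the finite aperture
x_f, z_f = 0.5, 4.0                    # focus position (illustrative values)

# Hypothetical stand-in for S(x_f - x', z_f): a spherical-wave-like kernel.
r = np.hypot(x_f - x_ap, z_f)
S = np.exp(-1j * k * r) / r

def focus_power(U):
    """|sum U(x') S(x_f - x')|^2 for an illumination scaled to unit input power."""
    U = U / np.linalg.norm(U)
    return abs(np.sum(U * S)) ** 2

matched = focus_power(np.conj(S))              # U proportional to S*
uniform = focus_power(np.ones_like(S))         # homogeneous illumination
random_best = max(
    focus_power(rng.normal(size=S.size) + 1j * rng.normal(size=S.size))
    for _ in range(200)
)
bound = np.sum(np.abs(S) ** 2)                 # Schwarz upper bound at unit power

print("matched / bound        :", matched / bound)
print("uniform / bound        :", uniform / bound)
print("best random / bound    :", random_best / bound)
print("matched vs uniform (dB):", 10.0 * np.log10(matched / uniform))
```

Only the matched illumination reaches the bound; any other illumination of the same power gives less. The decibel figure printed for this toy geometry is not expected to reproduce the roughly 6 dB quoted for the structures studied in the text.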


ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 75

LIGHTWAVE RECEIVERS

GARETH F. WILLIAMS*

NYNEX SCIENCE AND TECHNOLOGY, WHITE PLAINS, NEW YORK

I. Introduction
II. Receiver and Device Requirements of Lightwave Systems
III. Receiver System and Noise Considerations
    A. Digital Receiver System Considerations
    B. Pin-Photodiode Receiver Noise and Sensitivity Calculations
    C. Avalanche Photodiode Receiver Noise and Sensitivity Calculations
IV. First- and Second-Generation Lightwave Receivers
    A. Voltage-Amplifier Optical Receivers
    B. Integrating Optical Receivers
    C. Transimpedance Optical Receivers
    D. Integrating Transimpedance Optical Receivers
V. Active-Feedback Lightwave Receiver Circuits
    A. Introduction
    B. Micro-FET Feedback Receivers
    C. Capacitive Feedback Receivers
    D. Dynamic Range Extenders
    E. Hybrid IC Active-Feedback Receivers
    F. IC Active-Feedback Receivers
    G. Design Scaling Laws for IC Receivers
    H. Sensitivity Calculations for Present- and Future-Technology IC Receivers
References

I. INTRODUCTION

This chapter covers the development of optical receiver circuits as influenced by photodetector and transistor noise physics and by lightwave system requirements. These considerations led directly to the new high-sensitivity, wide-dynamic-range, active-feedback receiver ICs. This chapter also covers photodetector and transistor technology choices and some probable future device developments, in light of the receiver design advances, the noise physics, and the system requirements. This chapter is written for both receiver designers and device physicists.

* Chapter prepared at AT&T Bell Laboratories, Holmdel, New Jersey. Copyright 1989 AT&T Bell Laboratories, reprinted by permission.

The earliest optical receiver circuits were simple voltage amplifiers and were very noisy. These receivers were followed by high-sensitivity integrating front ends (Personick, 1973a, 1973b; Goell, 1974). Because these receivers integrate the signal, they require subsequent equalization (differentiation) with its attendant problems and have limited dynamic range. Thus, although these early integrating receivers showed promise in laboratory experiments, neither they nor the voltage-amplifier receivers were ever used in a commercial system.

The transimpedance receiver was developed next and was the first commercially viable lightwave receiver. Its advantage is that it avoids signal integration; unfortunately, it is less sensitive than integrating receivers. The first commercial transimpedance receivers were developed for 0.8-μm wavelength, 45-Mb/s applications and used silicon avalanche photodetectors (APDs) to achieve high sensitivity (R. G. Smith et al., 1978). More recently, a silicon bipolar IC transimpedance receiver for the TAT-8 transatlantic cable was pioneered by Paski (1980) and perfected by Snodgrass and Klinman (1984); it operates at 296 Mb/s and a 1.3-μm optical wavelength.

The superior sensitivity of integrating receivers has continued to attract interest; recently, integrating receivers have come back in long-wavelength (1.3-1.6 μm) receivers using low-capacitance conventional (nonmultiplying) pin photodiodes and fine-line microwave FETs (Hooper et al., 1980; D. R. Smith et al., 1980; Gloge et al., 1980; Ogawa et al., 1983). These improved integrating receivers offer sensitivities comparable to present 1.3- to 1.6-μm APDs below 100 Mb/s and good sensitivities to at least 1 Gb/s (Linke et al., 1983). However, they still require equalization and have limited dynamic range.
As a result, these improved integrating receivers have primarily been used for laboratory demonstrations, where these difficulties do not matter, and in a very few leading-edge systems. Nearly all commercial systems have continued to use conventional non-integrating transimpedance receivers. A new type of lightwave receiver, the active-feedback receiver, offers high sensitivity with a wide-band nonintegrating response, plus the widest dynamic ranges yet achieved (Williams, 1982, 1985; Fraser et al., 1983; Williams and LeBlanc, 1986). Active feedback receivers avoid the signal integration of previous high-sensitivity designs, yet do not compromise the sensitivity; in fact, these new receivers are somewhat more sensitive than corresponding integrating receivers. The dynamic-range-extender circuitry incorporated in these receivers provides the wide dynamic range; in fact, most implementations of these new receivers cannot be saturated by present lightwave transmitters. Active-feedback receivers can be realized inexpensively in IC form using any fine-line FET IC technology and are the first step towards the single-chip lightwave regenerators of the future. Although specific commercial


designs are beyond the scope of this chapter, note that IC versions for 1- to 50-Mb/s datalinks and local-area networks are in production by AT&T Technologies (Morrison, 1984; Steininger and Swanson, 1986); high-speed versions for transmission systems operating at bit rates to 1.7 Gb/s have also been announced (Gloge and Ogawa, 1985; Dorman et al., 1987).

Because APDs were the only way to achieve high sensitivities with early receiver circuits, lightwave device physicists focused on APDs. With today's high-sensitivity receiver circuits, sensitivity improvements can come from advances in either photodetector or transistor technology. The lightwave device physicist must now be aware of both. For example, present long-wavelength (1.3-1.6 μm) APDs offer little sensitivity advantage over nonmultiplying pin photodiodes below 100 Mb/s when used with either the integrating receiver circuit or the new high-sensitivity, active-feedback receiver circuits. Where this dividing line will be in the future depends on the rate of long-wavelength APD development versus that of the fine-line FET technology used in the receiver. In addition, at bit rates up to a few hundred Mb/s, reducing the long-wavelength APD dark current to that of pin photodiodes is presently more important than achieving a higher ionization coefficient ratio, k, for lower excess multiplication noise; because receiver amplifiers are now so sensitive, the optimal APD gain is small and k is less critical. This makes high-sensitivity, long-wavelength APDs easier to develop.

These receiver advances and their device implications can also influence research into new types of devices. For example, these receiver advances and their implications led directly to the staircase APD proposal of Williams et al. (1982). Only one type of carrier should impact ionize in this structure, giving very low noise multiplication.
However, as mentioned in the previous paragraph, a low leakage current is just as important as low-noise multiplication. The low operating voltage and large average band gap of the staircase APD should allow even smaller leakage currents than in long-wavelength nonmultiplying pin photodiodes, where the entire depletion region is narrow gap. Finally, the staircase APD might not have been proposed without the new receiver circuits, because the maximum multiplication of the detector may typically be only 30; the high sensitivities of the receiver circuits mean that higher multiplications will not be needed in order to realize the full sensitivity of the detector.

The circuit and device developments to date have been driven by systems requirements, especially the move from the 0.8-μm to the 1.3-μm wavelength. The early first-generation 0.8-μm transmission systems used high-gain, low-noise silicon APDs, which solved both the sensitivity and dynamic range problems of the early receiver circuits. However, second-generation transmission systems operate in the 1.3- to 1.6-μm region, where the lower fiber loss


allows wider repeater spacings. Unfortunately, long-wavelength APDs comparable to 0.8-μm silicon APDs do not as yet exist. These receivers must presently use either pin photodiodes or APDs with lower gains and higher noise than the 0.8-μm silicon APDs. What is lost in the detector has had to be made up in the following preamplifier. In addition, APDs and their associated variable high-voltage supplies are presently too expensive for datalinks and local-area networks and not reliable enough for transoceanic submarine cables. In these systems, the low noise amplification and wide dynamic range must at present be achieved in the preamplifier rather than in the detector.

Section II briefly discusses lightwave systems requirements and assesses their impact on present and future optical receivers and devices. This area is sometimes ignored, but it will continue to determine which new device and receiver developments will be used.

Section III lays the foundation for the receiver circuit discussions of Sections IV and V. Almost all lightwave systems are digital; Section III begins by reviewing how optical-fiber digital regenerators are designed to achieve the best sensitivity of which the optical receiver is capable and reviews the derivation of the digital bit-error rate as a function of the receiver noise. It then reviews the classic Smith and Personick (1980) calculations of the receiver input device noise, presents FET and photodetector technology figures-of-merit, and discusses sensitivity calculations for the new active-feedback IC receivers of Section V. These calculations are then compared with sensitivity results reported in the literature. This section then covers technology choices between GaAs FETs and silicon FETs and between pin photodiodes, InGaAs/InP APD detectors, and Ge APD detectors.

Section IV reviews and discusses the first- and second-generation lightwave receiver circuit designs.
The early first-generation commercial 0.8-μm transimpedance receiver amplifiers traded sensitivity for bandwidth and used a silicon APD to make up the sensitivity loss; in the design of R. G. Smith et al. (1978), the APD improved the sensitivity by 15 dB over the same receiver with a pin photodiode. The later silicon bipolar transimpedance receiver ICs (Paski, 1980; Snodgrass and Klinman, 1984) were designed for 1.3-μm applications and use nonmultiplying pin photodiodes. They have achieved good sensitivities at bit rates of several hundred megabits per second. These receivers are a genuine tour de force and probably represent the end of their line of development. The second-generation 1.3- to 1.6-μm pin-FET receiver circuits use integrating or "high-impedance" receiver designs. Section IV discusses both the simple integrating receivers, which have been introduced commercially in a few leading-edge systems by the British Post Office (Hooper et al., 1980), and the integrating transimpedance receivers favored by Gloge et al. (1980) and Ogawa et al. (1983). Both types achieve very high sensitivities; however, they integrate the signal and have little dynamic range.


Section V describes the new active-feedback receivers. This section first reviews the basic features of these designs, including both the novel feedback techniques and the dynamic-range-extender circuitry. Response, stability, and sensitivity considerations are covered in detail. This section then briefly discusses an illustrative design for a complete hybrid IC active-feedback receiver. This is followed by an extended discussion of active-feedback receiver ICs. Two generic IC designs are presented; these can be realized in any of the standard fine-line FET IC technologies, including NMOS, CMOS, and GaAs. Section V also presents scaling laws, which are used to generate different versions of these IC receiver designs for different bit-rate applications, then presents calculations of IC receiver sensitivities for bit rates between 10 Mb/s and 4 Gb/s. These high-sensitivity active-feedback IC receivers will achieve further sensitivity advances as the pin-photodiode and FET technologies develop. The development of submicron gate FETs is driven by very-high-speed GaAs and silicon digital IC programs; the new optical receiver ICs are compatible and will piggyback on these efforts. In addition, the pin-photodetector capacitance appears to be halving every two to three years. Section V concludes with theoretical calculations of the probable sensitivities to be expected from future IC receivers using these improved pin and FET technologies. In long-haul terrestrial systems at several hundred megabits per second or above, these IC receivers often use InGaAs/InP APD detectors instead of pin photodiodes. Future IC receivers for these applications will require APDs with lower dark currents and lower junction capacitances than present devices. The importance of a small ionization coefficient ratio, k, will depend on whether the dark currents are decreased or the commonly used bit rates are increased faster than the FET technologists are able to achieve shorter channel lengths. 
Smaller APD junction capacitances would also reduce the importance of a small k ratio.

II. RECEIVER AND DEVICE REQUIREMENTS OF LIGHTWAVE SYSTEMS

This section first looks at the requirements of present and future systems and then at the likely impact of these requirements on present and future receiver and device technologies. The important receiver design goals are high sensitivity, wide dynamic range, adaptability to high bit rates, and low cost. Most of the lightwave literature to date has focused on technology for long-haul transmission systems, which comprise the bulk of the present optical-fiber applications. The key goals in transmission systems are long repeater spacings, to minimize the


number of repeaters, and high channel capacity. Accordingly, most of the receivers in the literature are optimized for high sensitivity and high bit rate; wide dynamic range and low cost are less important.

However, most future optical-fiber applications will be in the loop plant (between the central office and the subscribers), in local-area networks (e.g., on-premises office-of-the-future networks), in optical-fiber cable TV, and in data links (Miller, 1979). All require high sensitivity, wide dynamic range, and low cost; the bit rate varies with the application. The new high-sensitivity, wide-dynamic-range active-feedback receivers discussed in Section V were originally invented for these applications (Williams, 1982, 1985). The IC versions can be realized economically in fine-line digital VLSI technology and are adaptable to bit rates in excess of 1 Gb/s. They therefore are a superior alternative for high-bit-rate transmission systems as well.

These loop, local-area-network, cable-TV and datalink applications require high sensitivity because they all frequently include some type of very-high-loss optical data path. They all require wide dynamic range because the minimum loss path in each is a short, almost connectorless, fiber run with a loss of a few decibels or less; essentially the entire transmitter power can appear at a receiver. Field-installed optical attenuators are not a viable means to reduce the dynamic range requirement because of the expense of installing them and of changing them if the system is reconfigured, plus the need to keep a record of what value attenuator was installed where. Low cost is imperative because the total number of receivers in these applications is so large; each customer, work station, or computer may have its own receiver.

In loop feeder, the highest loss path is between the central office and the telephone-call concentrator of the furthest neighborhood.
The higher the sensitivity, the wider the radius that a single central office can serve without repeaters; the number of customers increases as the square of the radius and every decibel is an additional 1-2 kilometers. On-premises local-area networks also need high-sensitivity receivers, especially in star or tree configurations in which each transmitter's power can be divided among many receivers and several fiber branches. The same applies to fiber cable-TV systems. Typical local-area-network signal paths can also have many fiber connectors, each with a signal loss; in addition, a local-area network should be extendable to several buildings, e.g., on a campus or in an industrial park; link lengths can be up to a few kilometers.

The implications for circuit and transistor technology designers are clear. From now on, lightwave receivers will be IC receivers; the transistor technologies used will be IC technologies (with the possible exception of integrated pin-FETs). Most future applications are in loop, local-area networks, cable-TV, and datalink systems, which require relatively high performance and very low cost; those receivers absolutely must be realized as ICs. As mentioned in the introduction, GaAs and silicon FET IC technologies are presently preferred; any new transistor technology, e.g., high-electron-mobility transistors or heterojunction bipolar transistors, should also be realizable in IC form for these applications. In addition, the new nonintegrating high-sensitivity wide-dynamic-range receiver circuits developed for those applications are best realized in IC form (Section V). Furthermore, the big development efforts that can be expected for loop-feeder receivers will ensure that IC receivers will always be at or near the sensitivity limit of the detector and transistor technologies, and hence superior for transmission systems as well.

New detectors will first be adopted in long-haul transmission systems, where sensitivity is most important and cost is secondary. They will be adopted in loop, LANs, cable-TV and datalink applications only if their cost comes down; these systems are cost critical, and the projected advances in pin-FET sensitivities should adequately meet the future needs of these systems. Most of the applications will run on 5 volts; few can justify the expense of a high-voltage supply for an APD. Therefore, low-voltage detectors, such as heterojunction phototransistors, integrated pin-FETs, or staircase APDs are preferred possibilities; high-voltage devices like conventional APDs will probably be restricted to transmission systems only.

III. RECEIVER SYSTEM AND NOISE CONSIDERATIONS

This section covers the systems background and the device noise theory for the receiver amplifier circuit discussions of Sections IV and V. It begins with a tutorial on receiver systems, then reviews the classic Smith and Personick (1980) expression for the input device noise of pin-FET receivers. It then extends these expressions to include the noise of the rest of the receiver circuit and derives figure-of-merit expressions for pin, FET, and FET IC technologies. It then calculates theoretical sensitivities for FET IC receivers and compares present silicon and GaAs FET IC technologies on the basis of these noise calculations, experimental results from the literature, and circuit considerations. This section concludes by reviewing the APD receiver noise theory of Smith and Personick (1980) and calculating sensitivities for present-technology APD-FET lightwave receivers. Lightwave receivers using bipolar transistors will not be considered because they are presently less sensitive than FET lightwave receivers at all bit rates. However, new bipolar technologies, such as heterojunction bipolar transistors (Kroemer, 1982, 1983), may someday change this.


This section focuses on digital receiver systems because most optical fiber systems are digital. Analog video is the only major exception, but will likely be replaced by digital video once economical video codec ICs become available. These codecs are realizable in the silicon or GaAs FET technologies favored for the receivers; ultimately, the receiver and codec will probably be on the same IC.

Section III.A is a brief tutorial for device physicists on the filtering and digital signal recovery circuits that extract the digital bit stream from the analog output of the optical receiver circuit and on how these circuits affect the digital-receiver sensitivity. This section contains just enough detail to place the optical receiver in the system and to motivate the noise treatment of Section III.B; circuit engineers may wish to consult Maione et al. (1978) to see a complete digital-signal recovery circuit.

Section III.B covers sensitivity and noise calculations for pin-FET receivers, including the active-feedback IC receivers of Section V. This section first reviews the connection between the signal-to-noise ratio at the output of the receiver and the digital bit-error rate at the output of the digital signal-recovery circuit, and the assumptions involved. Subsection III.B.1 then presents a semi-intuitive derivation of the Smith and Personick (1980) expression for the receiver front-end noise due to the input FET, the pin photodiode, and the input resistor. These devices account for the majority, but not all, of the receiver noise. Subsection III.B.2 extends this expression to include the noise contributions of the first-stage load device and of later stages; the resultant new expression can be used to calculate sensitivities for the complete IC receivers of Section V.F. Subsection III.B.2 also presents lightwave receiver figure-of-merit expressions for pin, FET, and FET IC technologies and then discusses the device implications.
For example, microwave or VHSIC FETs are preferred even for 10- to 100-Mb/s lightwave receivers, but for noise, not speed, reasons. Subsection III.B.3 calculates theoretical sensitivities of present-technology silicon and GaAs FET IC receivers as a function of bit rate. Sensitivity results from the literature are also included. In theory, the two technologies should give essentially the same receiver sensitivities; GaAs FETs have a higher saturation velocity, but silicon FETs have a lower channel-noise factor.¹ In practice, GaAs is presently a few decibels superior at most bit rates, probably because fine-line GaAs FETs began as an analog microwave technology. Therefore, GaAs noise problems were important and were solved. Fine-line silicon FETs began as a digital IC technology in which analog noise was unimportant and was almost ignored.

¹ The extra channel noise in fine-line GaAs FETs is largely due to high-field scattering from the high-mobility central valley to low-mobility satellite valleys (Baechtold, 1972). This effect is used in Gunn diodes.


Section III.C discusses noise and sensitivity considerations for avalanche photodiode (APD) receivers. This section includes an APD receiver noise tutorial following the APD receiver noise theory of Smith and Personick (1980), which is based on the APD device noise theory of McIntyre (1966). It then shows optical receiver sensitivity calculations for present-technology APDs used with present-technology IC receivers. It considers only 1.3- to 1.6-μm wavelength APDs; 0.8-μm APD systems are obsolete because their repeater spacing is less than that of 1.3-μm pin-photodiode systems due to the much greater fiber loss at 0.8 μm. These calculations indicate that receivers using present-technology low-leakage-current InGaAs/InP APDs are more sensitive at all bit rates than receivers using present germanium APDs and that at a typical maximum operating temperature of 85°C, present-technology pin-FET receivers can be more sensitive than present germanium APD receivers at bit rates below 200 Mb/s. In fact, these calculations indicate that even present InGaAs/InP APDs have little sensitivity advantage (at 85°C) over present-technology pin-FETs below 100 Mb/s and are less sensitive below about 50 Mb/s due to their higher leakage currents.

A. Digital Receiver System Considerations

A typical digital receiver system block diagram is shown in Fig. 1a; associated waveforms are shown in Fig. 1b. For now, assume a non-return-to-zero (NRZ) signal format; if a bit is a "one," the signal is high during the entire bit interval; if a bit is zero, the signal is low during the entire bit interval. The receiver amplifier takes the photocurrent signal input, $i$, and gives an output voltage, $v_o$, of the same waveshape; however, this output voltage also includes noise due to the detector and the receiver amplifier. The average noise shown on $v_o$ is much less than the signal; however, occasionally a random fluctuation is bigger than the signal and can turn a 1 into a 0 or a 0 into a 1, causing an error. Typical systems require a bit-error rate (BER) of $10^{-9}$, or one error per billion bits. The digital-receiver sensitivity is then the input optical signal power required for a signal-to-noise ratio high enough to get a BER of $10^{-9}$.

The receiver output noise can be reduced by filtering; the receiver bandwidth is usually wider than necessary to pass the photocurrent signal; the extra bandwidth contains extra noise. This extra noise is removed by the channel filter, improving the signal-to-noise ratio at the input to the digital decision circuit. The total noise power on the channel filter output signal, $v_f$, is the spectral noise power density $S(\omega)$ of the amplifier plus that of the photodetector integrated over the filter bandwidth; the lower the filter cutoff frequency, $f_c$, the lower the noise bandwidth and the less noise at the input to the decision circuit.
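The link between signal-to-noise ratio and bit-error rate can be made concrete with a short calculation. The sketch below is an illustration under assumptions that the passage above does not spell out: Gaussian noise with the same rms value on ones and zeros, and a decision threshold midway between the two signal levels. Under those assumptions the error probability is $\tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$, where $Q$ is half the peak-to-peak signal divided by the rms noise. The function names are hypothetical.

```python
import math

def bit_error_rate(q):
    """BER for equal-variance Gaussian noise with a midway threshold.

    q is half the peak-to-peak signal divided by the rms noise (assumed model).
    """
    return 0.5 * math.erfc(q / math.sqrt(2.0))

def required_q(target_ber, lo=0.0, hi=10.0):
    """Bisect for the q that gives target_ber; BER falls monotonically with q."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if bit_error_rate(mid) > target_ber:
            lo = mid      # still too many errors: need a larger q
        else:
            hi = mid
    return 0.5 * (lo + hi)

q_needed = required_q(1e-9)
print(f"q needed for BER = 1e-9: {q_needed:.3f}")
print(f"peak-to-peak signal / rms noise: {2.0 * q_needed:.2f}")
print(f"BER at q = 5: {bit_error_rate(5.0):.2e}")
print(f"BER at q = 7: {bit_error_rate(7.0):.2e}")
```

The steepness of the error function is the practical point: in this model a modest change in the signal-to-noise ratio moves the error rate by orders of magnitude, which is why removing excess noise bandwidth with the channel filter translates directly into sensitivity.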


GARETH F. WILLIAMS


FIG. 1. Optical receiver system: (a) block diagram, (b) typical waveforms.

The digital decision circuit recovers the digital bit stream from $v_f$. There are two types of decision circuits. The asynchronous decision circuit is presently cheaper and has been used in datalinks. The synchronous decision circuit, in which a clock is recovered from the signal and is used to sample $v_f$ at the center of each zero or one (Maione et al., 1978), is more sensitive because it allows the use of a narrower channel-filter bandwidth, which removes more noise. In time, the synchronous circuit will be included in the receiver IC and will thus cost essentially nothing.

Note that the receiver amplifier typically includes automatic gain control (AGC) so that the peak-to-peak signal at the decision circuit input is held constant, independent of the input optical signal level. This ensures proper decision circuit operation. This AGC function has typically been implemented in a postamplifier (not shown) after the input receiver amplifier (Maione et al., 1978); in the new high-sensitivity, wide-dynamic-range amplifiers of Section V, AGC is provided in the input amplifier as well.

The asynchronous decision circuit (Fig. 2) is just a discriminator. It gives a logic zero output when the filter output, $v_f$, is less than the decision threshold voltage, $V_T$; it gives a logic one output when $v_f$ is greater than $V_T$. (Typically, $V_T$

LIGHTWAVE RECEIVERS

FIG. 2. Asynchronous decision circuit optical receiver system.

is midway between the “zero” and “one” analog signal levels.) Each digital transition (zero to one or one to zero) in the output bit stream occurs at the moment the signal $v_f$ passes through $V_T$. Any noise as the signal transition passes through $V_T$ will cause a transition timing error. These timing errors are minimized by maintaining fast rise and fall times on the signal $v_f$; this means a wide channel-filter bandwidth, hence high noise, thus reducing the achievable sensitivity. The channel-filter bandwidth for an NRZ asynchronous receiver would typically be about twice the bit rate $B$, giving a risetime of about 0.17 times the bit interval. (The 10% to 90% risetime is $t_r \approx 0.35/\mathrm{BW}$.)

The synchronous decision circuit (Fig. 3a) samples the filtered signal, $v_f$, at the center of each bit interval, as indicated by the arrows in Fig. 3b, and decides whether it represents a zero ($v_f < V_T$) or a one ($v_f > V_T$). The circuit then puts out a standard-length one or zero pulse exactly one bit interval long. The sampling circuit is triggered at the center of every bit by the clock recovery circuit,* which reconstructs the transmitter's digital clock signal (timing) from the received signal. The decision circuit output bit intervals are each one cycle of the recovered clock. Since the signal is sampled only in the center of each bit interval, the rise and fall times can be very slow; this means a narrow channel-filter bandwidth and hence low noise. Thus, the synchronous decision circuit gives the best digital-receiver sensitivities.

The channel-filter bandwidth for a synchronous NRZ receiver is typically set at 0.56 times the bit rate (Maione et al., 1978; Smith and Personick, 1980), thus removing the high-frequency part of the NRZ signal spectrum. The resultant waveform, $v_f$, is shown in Fig. 3b; the risetime is now about 0.6 times the bit interval; the waveform is still satisfactory because the time between sampling points is one bit interval. The noise bandwidth is now only 28% of

* Typically, the clock recovery circuit is a phase-locked loop in which a voltage-controlled oscillator is synchronized on the bit transitions in the analog signal (Maione et al., 1978). This circuit effectively averages over many bit periods and many transitions; noise on any one transition is averaged out.

FIG. 3. Synchronous decision circuit optical receiver system: (a) block diagram, (b) typical waveforms.

that of the asynchronous receiver example; the receiver sensitivity is increased accordingly.

Typical optical receiver noise power spectra (Section III.B) contain a term independent of frequency, which dominates at low frequencies, and a term proportional to the frequency squared, which dominates at high frequencies. The frequency-independent noise term is due to the device leakage currents and the circuit; the frequency-squared term is primarily due to the input FET. In high-sensitivity designs (Sections IV and V), the frequency-independent term can essentially be eliminated over most of the bandwidth. Therefore, the frequency-squared noise term is dominant; integrating this term over the signal bandwidth gives a total electrical noise power proportional to the receiver bandwidth cubed. Since the synchronous receiver example has only 28% of the noise bandwidth of the asynchronous receiver, the electrical noise power is only $0.28^3 = 0.022$ times as much; the noise voltage is $\sqrt{0.022} = 0.15$ times as much. Since the signal voltage is proportional to the optical power, the minimum optical signal is 0.15 times as large. This is an 8.3-dB optical sensitivity advantage for the synchronous detection example.

Until now, transmission systems have used synchronous decision circuits for maximum sensitivity; asynchronous circuits are used in some very-short-haul, low-sensitivity datalinks. However, since a linear channel and synchronous detector circuit can ultimately be very economically integrated on the same IC as the receiver amplifier, asynchronous systems will no longer be cheaper. This means that almost all transmission, loop, and local-area-network receivers plus many datalink receivers will use synchronous detection for better sensitivity. Therefore, the noise treatment of Section III.B assumes synchronous detection.

So far, this discussion has assumed an NRZ optical data transmission format. Other formats such as return-to-zero (RZ), biphase or Manchester, and block codes have been considered (Takasaki et al., 1976). Both RZ and biphase approximately double the receiver noise bandwidth, for an approximately 4.5-dB optical sensitivity penalty; they therefore are not preferred. Block-coding schemes (Rousseau, 1976; Brooks, 1980) in NRZ format have been used with the integrating receivers of Hooper et al. (1980) discussed in Section IV.B. These involve a smaller sensitivity penalty. The biphase and block-coding schemes reduce the low-frequency content of the signal, which can be helpful in ac-coupled systems or in integrating receivers; in addition, they provide more frequent signal transitions to help maintain synchronization of the clock recovery circuit. At present, the NRZ format is preferred for sensitivity; however, it may be used either with self-synchronizing scrambling or with block coding; both can be realized in IC form on the same chip.

B. pin-Photodiode Receiver Noise and Sensitivity Calculations

This section discusses noise and sensitivity calculations for lightwave receivers using (nonmultiplying) pin photodiodes and FET input stages, and lays the noise-theory foundation for the receiver circuit designs of Sections IV and V. Subsection III.B.1 is a tutorial review of the classic Smith and Personick (1980) expressions for the receiver front-end noise due to the input FET, the photodetector, and the input resistor. Subsection III.B.2 first extracts pin- and FET-technology figures of merit from the Smith-and-Personick front-end figure of merit, then discusses the device implications. It then extends the noise expressions to include noise from the first-stage load device and from later stages; the resultant expression gives accurate sensitivities for IC receivers. Subsection III.B.2 also defines a simple expression for an IC technology figure of merit. Note that the figures of merit help dictate device design; the noise expressions as a whole help dictate the receiver circuit designs of Section V. Subsection III.B.3 calculates theoretical sensitivities of present-technology GaAs and silicon FET ICs versus bit rate, and compares the two technologies on the basis of the calculations, some results from the literature, and circuit considerations.


The receiver-circuit noise expressions are for equivalent RMS input noise currents; the real noise sources in the amplifier and detector are replaced by a single equivalent noise current source at the input to a noise-free equivalent amplifier. This is convenient because the signal-to-noise ratio at the amplifier output is then just the photocurrent divided by the equivalent input noise current. In addition, these results are not affected by the feedback techniques used in the new receivers of Section V: the noise sources can be referred to the input before mentally closing the feedback loop; the loop is thus closed around the noiseless equivalent amplifier; by inspection, the equivalent input current noise source, which is outside the loop, is unchanged. Thus these open-loop noise calculations apply to all the circuits of Section V.

For digital receiver systems, the signal-to-noise ratio at the digital decision circuit input is given by the photocurrent signal divided by the receiver-amplifier equivalent input-noise expression, provided that the noise bandwidth is taken as the channel-filter bandwidth. As mentioned in Section III.A, the channel-filter bandwidth for a synchronous NRZ receiver system is typically 0.56 times the bit rate $B$. The photocurrent signal can be obtained from the optical signal power by remembering that a 1-eV photon has a wavelength of 1.240 μm; at that wavelength, 1 watt of detected optical power gives 1 ampere of photocurrent. This then gives

$$\bar{I} = \eta \bar{P}\, \frac{\lambda}{1.240},$$

where $\bar{I}$ is the average photocurrent in amperes, $\eta$ is the photodiode quantum efficiency, $\bar{P}$ is the average optical power in watts, and $\lambda$ is the optical wavelength in micrometers.

The problem now is to turn this signal-to-root-mean-square-noise ratio at the decision circuit into a digital bit-error rate (BER). Consider a synchronous detection digital receiver system, as discussed in Section III.A. The digital decision circuit samples the signal once at the center of each bit interval; if the signal, $s$, is below the decision threshold $D$, the bit is read as a zero; if $s$ is greater than $D$, it is read as a one. The probability of error, i.e., the BER, is the probability that the noise at the sampling instant will bring the zero signal above $D$ or the one signal below $D$. By the central limit theorem, the noise amplitude probability distribution is Gaussian if the noise amplitude is the sum of many small independent physical processes (Davenport and Root, 1958). The probability distribution for the zero-level signal is then, following Smith and Personick (1980),

$$P_0(s) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left(-\frac{[s - s(0)]^2}{2\sigma_0^2}\right),$$

where $s(0)$ is the noise-free zero-level signal and $\sigma_0$ is the standard deviation of the signal, i.e., the root-mean-square zero-level analog noise. The probability $E_{01}$ of mistakenly identifying a zero bit as a one is then (Fig. 4)

$$E_{01} = P(s > D) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \int_D^{\infty} \exp\left(-\frac{[s - s(0)]^2}{2\sigma_0^2}\right) ds.$$
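As an illustration, the Gaussian tail probability of reading a zero as a one can be evaluated numerically with the complementary error function. The following is a minimal Python sketch; the function name `prob_zero_read_as_one` and the numerical values are illustrative assumptions, not taken from Smith and Personick (1980).

```python
import math

def prob_zero_read_as_one(threshold, zero_level, sigma0):
    """Probability that Gaussian noise lifts a zero-level sample above the threshold.

    Evaluates the Gaussian tail integral in closed form:
    0.5 * erfc((D - s(0)) / (sqrt(2) * sigma0)).
    """
    if sigma0 <= 0:
        raise ValueError("sigma0 must be positive")
    distance_in_sigmas = (threshold - zero_level) / sigma0
    return 0.5 * math.erfc(distance_in_sigmas / math.sqrt(2.0))

# Hypothetical example: threshold placed 6 RMS noise units above the zero level.
e01 = prob_zero_read_as_one(threshold=6.0, zero_level=0.0, sigma0=1.0)
print(f"E01 = {e01:.2e}")
```

With the threshold six RMS noise units above the zero level, the tail probability comes out at roughly one part in a billion, which shows how steeply the Gaussian tail falls off.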

The derivation of $E_{10}$, the probability of mistakenly identifying a one bit as a zero, is similar. Changing variables, the general error probability is

$$P(E) = \frac{1}{\sqrt{2\pi}} \int_{Q_j}^{\infty} e^{-x^2/2}\, dx = \frac{1}{2}\,\mathrm{erfc}\left(\frac{Q_j}{\sqrt{2}}\right),$$

where

$$Q_j = \frac{|D - s(j)|}{\sigma_j},$$

and for $j = 0$, $P(E) = E_{01}$; for $j = 1$, $P(E) = E_{10}$.

FIG. 4. Error probabilities for two-level digital system. $E_{01}$ is the probability of mistakenly identifying a zero bit as a one bit; $E_{10}$ is the probability of mistakenly identifying a one bit as a zero bit. [After Smith and Personick (1980).]

For pin receivers, the zero-signal-level RMS noise, $\sigma_0$, and the one-signal-level RMS noise, $\sigma_1$, are essentially equal. Assuming a random bit stream (maximum information content), ones and zeros are equally frequent. The optimum decision level $D$ is then midway between $s(0)$ and $s(1)$, and $E_{01} = E_{10}$. The bit-error rate (BER) is now

$$\mathrm{BER} = \frac{1}{\sqrt{2\pi}} \int_{Q}^{\infty} e^{-x^2/2}\, dx = \frac{1}{2}\,\mathrm{erfc}\left(\frac{Q}{\sqrt{2}}\right),$$

where

$$Q = \frac{s(1) - s(0)}{2\sigma},$$

where $\sigma$ is the RMS noise. $Q$ is then just half the peak-to-peak signal to RMS noise ratio at the digital decider input. However, as mentioned, this SNR is just the ratio of the photocurrent signal to the equivalent input RMS noise current. Assuming that the zero-level photocurrent is zero, the average photocurrent is half the peak (one-level) photocurrent $I_{pk}$ and

$$Q = \frac{I_{pk}}{2\langle i_n^2 \rangle^{1/2}} = \frac{\bar{I}}{\langle i_n^2 \rangle^{1/2}},$$

where $\langle i_n^2 \rangle^{1/2}$ is the RMS equivalent input noise current. Equations (4) and (5) give the pin-receiver bit-error rate (BER) in terms of the ratio of the photocurrent signal to the RMS equivalent input-noise current. The BER as a function of $Q$ is shown in Fig. 5. A typical system requirement is for a BER of $10^{-9}$; this corresponds to $Q = 6$, or an average photocurrent of six times the RMS equivalent input-noise current. This corresponds to a peak-to-peak signal at the digital decider input that is 12 times the RMS noise. The optical power to get that photocurrent is then the digital optical receiver sensitivity.

The actual digital sensitivity of some early experimental receivers was less than that predicted from the measured RMS analog noise on the basis of the above Gaussian noise discussion (Williams, 1982). The reason was non-Gaussian device noise due to surface leakage or bulk defects; later receivers with better devices did not have this problem. The fundamental device noise sources (Sections III.B.1 and III.B.2) are almost always Gaussian because they are the sums of many small independent physical processes, as required by the central limit theorem (Davenport and Root, 1958). For example, a photodiode bulk-leakage current of 1 nA corresponds to 63 independent

FIG. 5. Probability of error versus the average signal-to-RMS-noise ratio, Q. [After Smith and Personick (1980).]

carrier generations per bit at 100 Mb/s; similarly, the channel Johnson noise current of a typical input FET is produced by the Brownian motion of about $10^6$ electrons. The result is the Gaussian noise amplitude probability distribution, which falls off exponentially in the square of the noise amplitude; this is why an average photocurrent that is only six times the RMS equivalent input-noise current can give a BER of $10^{-9}$.

On the other hand, device defect or surface leakage noise can be the result of only a few physical processes at a few sites and is therefore non-Gaussian; this means more bit errors for a given noise level. Thus, both the RMS noise spectrum and the noise-amplitude distribution must be measured for both devices and receivers. In addition, early devices with defect and surface noise problems may have non-Gaussian noise, leading to lower digital receiver sensitivities. For example, even though the ideal shot noise of a 20-nA photodiode leakage current may be acceptable in theory, a real diode with 20-nA leakage is unacceptable if the leakage is a non-Gaussian process due to a few device defects or surface leakage paths. Thus, a high-leakage device may be acceptable in theory but unusable in practice if the leakage is a non-Gaussian process. However, a mature device technology, in which surface and defect noise sources are under control, usually has Gaussian noise.

1. Input Device Noise Theory for pin-FET Receivers

Figure 6 shows a generalized pin-FET receiver amplifier. This subsection presents a tutorial review of the classic Smith and Personick (1980) theory for the noise due to the input devices shown in Fig. 6, i.e., the channel noise of the input FET, the Johnson noise of the bias/feedback or input resistor, and the shot noise due to the input FET gate and pin-photodiode leakage currents. This theory gives much of the information about FET- and pin-technology requirements and amplifier design constraints. Subsection III.B.2 derives pin, FET, and FET IC technology figures of merit, and extends the Smith and Personick expression to include the noise of the input FET's load device and the noise of subsequent stages. This is useful for sensitivity calculations for the circuits of Sections IV and V. As mentioned, the equivalent input noise current is the same whether or not the feedback techniques of Section V are used; for simplicity, the discussion below assumes no feedback.

FIG. 6. General pin-FET receiver amplifier.

The input bias or feedback resistor $R_i$ contributes a mean Johnson thermal input noise current squared per frequency bandwidth $df$ of

$$d\langle i_n^2 \rangle_R = \frac{4kT}{R_i}\, df. \tag{6}$$

The different frequency components of the noise are both independent and orthogonal. Therefore, the total mean-square input noise $\langle i_n^2 \rangle_R$ due to the input resistor is simply the mean-square spectral noise density of Eq. (6) integrated over the channel frequency bandwidth. If $|F(f)|$ is the magnitude of the normalized channel-filter frequency response, then

$$\langle i_n^2 \rangle_R = \frac{4kT}{R_i} \int_0^{\infty} |F(f)|^2\, df; \tag{7}$$

the independent noise currents squared are multiplied by the filter response magnitude squared, then integrated. By inspection, this resistor-noise integral is proportional to the bandwidth or to the bit rate $B$ times a numerical factor that depends on the shape of the filter frequency response function $F(f)$. (The same filter function is scaled up or down in frequency for different bit rates.) Changing variables from the frequency $f$ to $y = f/B$ (the frequency normalized to the bit rate) gives

$$\langle i_n^2 \rangle_R = \frac{4kT}{R_i}\, I_2 B, \tag{8}$$

where

$$I_2 = \int_0^{\infty} |F(y)|^2\, dy, \tag{9}$$

where $F(y)$ is the filter frequency response shape; the numerical factor $I_2$ is called the second Personick integral.

The next question is how to normalize the channel frequency response $F(f)$. $F(f)$ must give the fraction of the input-noise-current frequency component that appears at the linear channel output to the digital decider. This is best done by first normalizing in the time domain so that a unit input photocurrent pulse maps to a unit output pulse to the decider. The corresponding filter frequency function normalization then applies to the noise as well. The receiver input is a photocurrent; the input pulse shape $h_p(t)$ is normalized to correspond to a unit photocurrent over a bit interval $T = 1/B$:

$$\frac{1}{T} \int_{-\infty}^{\infty} h_p(t)\, dt = 1. \tag{10}$$

These receivers are used with synchronous decision circuits, which sample the waveform at the center of each bit interval when the pulse is a maximum (logic one bit) or a minimum (logic zero bit). Thus, a unit filtered pulse is one which is of unit amplitude at the sampling instant, $t_0$:

$$h_{out}(t_0) = 1. \tag{11}$$

Defining $H_p(f)$ and $H_{out}(f)$ as the Fourier transforms of $h_p(t)$ and $h_{out}(t)$ then gives the normalized filter function:

$$F(f) = \frac{H_{out}(f)}{H_p(f)}. \tag{12}$$
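The second Personick integral can be estimated numerically once pulse shapes are chosen. The sketch below assumes a rectangular NRZ input pulse and a full raised-cosine output pulse; these pulse shapes, the function names, and the midpoint-rule integration are illustrative assumptions rather than the filter design used in the references.

```python
import math

def filter_magnitude(y):
    """|F(y)| = |H_out(y) / H_p(y)| for the assumed pulse shapes (T = 1/B = 1).

    Assumed input pulse: rectangular NRZ, H_p(y) = sin(pi*y) / (pi*y).
    Assumed output pulse: full raised cosine, H_out(y) = 0.5 * (1 + cos(pi*y)) for |y| < 1.
    """
    if y >= 1.0:
        return 0.0  # raised-cosine output spectrum is zero beyond y = 1
    if y == 0.0:
        return 1.0  # limiting value at zero frequency
    h_out = 0.5 * (1.0 + math.cos(math.pi * y))
    h_p = math.sin(math.pi * y) / (math.pi * y)
    return h_out / h_p

def second_personick_integral(steps=20000):
    """Midpoint-rule estimate of I2 = integral of |F(y)|^2 dy over 0 <= y <= 1."""
    dy = 1.0 / steps
    return sum(filter_magnitude((k + 0.5) * dy) ** 2 for k in range(steps)) * dy

i2 = second_personick_integral()
print(f"I2 = {i2:.3f}")
```

For these assumed pulse shapes the estimate is about 0.56. A different output pulse shape would give a different numerical factor, since the integral depends only on the shape of the normalized filter response.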

Note that, for a given filter design, changing the input pulse shape changes the magnitude but not the frequency dependence of $F(f)$; $F(f)$ is normalized such that a unit average photocurrent pulse of whatever pulse shape is chosen gives a unit amplitude input at the sampling instant to the decision circuit, but the shape of $F(f)$ depends only on the channel filter design.

The remaining question is how to arrive at the frequency dependence of the channel filter function $F(f)$; in other words, how to design the channel filter. By inspection, a narrow channel-filter bandwidth means smaller Personick integrals and less receiver noise; however, a narrow filter bandwidth also means slow rise and fall times, which cause the filtered bit pulse $h_{out}(t)$ to spread into the neighboring bit intervals. Thus, choosing the filter function is a tradeoff between noise considerations in the frequency domain and pulse-shape considerations in the time domain. Since the decision circuit samples the waveform at the center of each bit interval, the bit pulse $h_{out}(t)$ should peak at the center of its own bit interval and should be zero at the centers of the neighboring bit intervals to avoid intersymbol interference (ISI). The filter typically is designed by iterating between the time and frequency domains, alternately minimizing the ISI and the noise bandwidth, respectively. The result usually is a filter bandwidth of 0.5-0.6 times the bit rate for an NRZ signal. For such a typical filter design, $I_2$ worked out to be 0.56 (Maione et al., 1978).

The FET gate-leakage current and the pin-leakage current contribute an input shot noise current squared per unit frequency bandwidth of

$$d\langle i_n^2 \rangle_l = 2 q I_l\, df, \tag{13}$$

where the total leakage current $I_l$ is the sum of the pin-photodiode leakage current $I_{l,\mathrm{pin}}$ plus the FET leakage current $I_{l,\mathrm{FET}}$, and $q$ is the electron charge. Integrating over frequency, as before, gives the total input-leakage-current noise in the channel filter bandwidth:

$$\langle i_n^2 \rangle_l = 2 q I_l \int_0^{\infty} |F(f)|^2\, df \tag{14}$$

$$\langle i_n^2 \rangle_l = 2 q I_l I_2 B, \tag{15}$$

where $I_2$ is the second Personick integral as before.

The input FET channel noise is essentially the Johnson noise of the unpinched-off portion of the channel next to the source. This channel conductance is simply the transconductance of the FET; the drain current noise per unit bandwidth is then

$$d\langle i_n^2 \rangle_{drain} = 4kT\,\Gamma g_m\, df, \tag{16}$$

where $\Gamma$ is a factor to account for high-field effects in short-channel transistors (Baechtold, 1972). For 1-μm gate-length silicon MOSFETs, $\Gamma$ is typically 1 to 1.2 (Ogawa et al., 1983); for 1-μm GaAs FETs, $\Gamma$ is typically 1.4 to 1.8 (Ogawa, 1981).
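To give a feel for the magnitudes involved, the drain noise current density can be evaluated for representative devices. In the sketch below, the 20-mS transconductance, the 300-K temperature, and the function name are hypothetical choices for illustration; only the ranges of the channel noise factor come from the text.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def drain_noise_current_density(gm_siemens, gamma, temperature_kelvin=300.0):
    """RMS drain noise current per root hertz, sqrt(4kT * Gamma * g_m), in A/sqrt(Hz)."""
    return math.sqrt(4.0 * BOLTZMANN * temperature_kelvin * gamma * gm_siemens)

# Hypothetical 20-mS FETs; Gamma values chosen inside the ranges quoted in the text.
silicon = drain_noise_current_density(gm_siemens=20e-3, gamma=1.1)
gaas = drain_noise_current_density(gm_siemens=20e-3, gamma=1.6)
print(f"Si MOSFET:   {silicon * 1e12:.1f} pA/sqrt(Hz)")
print(f"GaAs MESFET: {gaas * 1e12:.1f} pA/sqrt(Hz)")
```

At equal transconductance the RMS noise densities differ only by the square root of the ratio of the two channel noise factors.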


This mean-square FET drain noise current must be turned into an equivalent input mean-square noise current. The drain noise current squared can be turned into an equivalent gate (input) noise voltage squared by dividing by the transconductance squared ($i_d^2 = g_m^2 e_g^2$):

$$d\langle e_n^2 \rangle_{FET} = \frac{4kT\,\Gamma}{g_m}\, df. \tag{17}$$

The corresponding mean-square input noise current is simply the mean-square gate noise voltage divided by the input impedance squared (without feedback). For small input bias/feedback resistor values, that mean-square equivalent input noise current would be just $\langle i_n^2 \rangle_{FET} = \langle e_n^2 \rangle_{FET}/R_i^2$. However, this case is of no practical interest because the Johnson noise of that small-value resistor would swamp the FET noise and ruin the sensitivity. In high-sensitivity amplifiers, the input impedance (without feedback) is essentially the input FET gate capacitance, $C_{FET}$, in parallel with the junction capacitance of the pin photodiode, $C_{pin}$, plus any stray capacitance, $C_s$. The equivalent input noise current is then

$$d\langle i_n^2 \rangle_{FET} = \frac{4kT\,\Gamma}{g_m} (2\pi f C_T)^2\, df, \tag{18}$$

where the total input capacitance, $C_T$, is the sum of the FET input capacitance, $C_{FET}$, plus the photodiode capacitance, $C_{pin}$, plus $C_s$. The physical assumption behind Eq. (18) is that, in the absence of feedback, the photocurrent signal will be integrated by the input capacitance $C_T$ ($V = (1/C_T)\int i\, dt$), and that $R_i$ is so large (for low noise) that it can be neglected over most of the bandwidth. For example, in a 45-Mb/s receiver with $R_i = 1$ MΩ and $C_T = 1$ pF, $C_T$ dominates above 80 kHz. This means that the input voltage produced by the photocurrent is inversely proportional to the frequency; since the input noise voltage is fixed, the signal-to-noise ratio is also inversely proportional to the frequency. When the signal is equalized (differentiated), the low-frequency signals (and noise) are attenuated; the high frequencies are boosted. Thus, with the signal now proportional to the input photocurrent, the noise is proportional to frequency, and the mean-square noise is proportional to the frequency squared, as in Eq. (18). Again, using feedback to avoid signal integration does not change the signal-to-noise ratio versus frequency.

The total equivalent mean-square input noise current of the FET is obtained by integrating over the channel-filter frequency response as before:

$$\langle i_n^2 \rangle_{FET} = \frac{4kT\,\Gamma}{g_m} (2\pi C_T)^2 \int_0^{\infty} |F(f)|^2 f^2\, df, \tag{19}$$

or, changing variables from $f$ to $y = f/B$ as before,

$$\langle i_n^2 \rangle_{FET} = \frac{4kT\,\Gamma}{g_m} (2\pi C_T)^2 I_3 B^3, \tag{20}$$

where

$$I_3 = \int_0^{\infty} |F(y)|^2 y^2\, dy \tag{21}$$

is the third Personick integral.

The total mean-square equivalent input-noise current due to the input devices (pin photodiode plus FET plus bias resistor) is then (Smith and Personick, 1980)

$$\langle i_n^2 \rangle_T = \langle i_n^2 \rangle_R + \langle i_n^2 \rangle_l + \langle i_n^2 \rangle_{FET} \tag{22a}$$

or

$$\langle i_n^2 \rangle_T = \frac{4kT}{R_i} I_2 B + 2 q I_l I_2 B + \frac{4kT\,\Gamma}{g_m} (2\pi C_T)^2 I_3 B^3, \tag{22b}$$

where $\langle i_n^2 \rangle_R$, $\langle i_n^2 \rangle_l$, and $\langle i_n^2 \rangle_{FET}$ are given by Eqs. (8), (15), and (20), respectively; the Personick integrals $I_2$ and $I_3$ are given by Eqs. (9) and (21), respectively; and $I_l = I_{l,\mathrm{FET}} + I_{l,\mathrm{pin}}$ and $C_T = C_{pin} + C_{FET} + C_s$.

Equation (22) for the total equivalent input noise due to the input devices contains much of the information about pin-photodiode and FET technology requirements and about amplifier design constraints. In high-sensitivity designs, the input resistor Johnson noise term $\langle i_n^2 \rangle_R$ is made almost negligible by making $R_i$ large. The new receiver circuits, which can use large $R_i$ values (e.g., 1-10 MΩ at 45 Mb/s) without sacrificing bandwidth, are described in Section V; earlier lower-sensitivity receiver circuits used low values of $R_i$ to achieve the required bandwidth and were dominated by the resultant Johnson noise. (Typically $R_i$ was from a few hundred ohms to a few thousand ohms at 45 Mb/s, giving 1000-10,000 times more Johnson noise power.) The leakage current shot noise has been made negligible by reducing the pin photodiode and FET leakage currents. This leaves the input FET noise as the fundamental noise source; the receiver noise with low-leakage devices and state-of-the-art circuitry reduces to approximately

$$\langle i_n^2 \rangle_T \approx \frac{4kT\,\Gamma}{g_m} (2\pi C_T)^2 I_3 B^3. \tag{23}$$

(This omits the noise of the rest of the receiver circuit, which will be discussed next in Section III.B.2.) Thus, the circuit-equivalent input-noise power (or mean-square noise current) is approximately proportional to $B^3$, $C_T^2/g_m$, and the channel noise factor $\Gamma$. One can write an input circuit figure of merit that is independent of bit rate (Smith and Personick, 1980):

$$M = \frac{g_m}{\Gamma C_T^2}. \tag{24}$$
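The noise terms of this subsection (resistor Johnson noise, leakage shot noise, and FET channel noise) can be combined into a rough sensitivity estimate by requiring an average photocurrent of $Q$ times the RMS input noise and converting it to optical power with the photocurrent relation of Section III.B. The Python sketch below is illustrative only: the device values, the quantum efficiency of 0.8, the 1.3-μm wavelength, the 300-K temperature, and the value 0.087 used for the third Personick integral are assumptions, not figures from the text.

```python
import math

BOLTZMANN = 1.380649e-23        # J/K
ELECTRON_CHARGE = 1.602177e-19  # C

def mean_square_input_noise(bit_rate, r_in, leakage, gm, gamma, c_total,
                            i2=0.56, i3=0.087, temperature=300.0):
    """Total mean-square equivalent input noise current in A^2.

    Sum of resistor Johnson noise, leakage shot noise, and FET channel noise.
    """
    four_kt = 4.0 * BOLTZMANN * temperature
    resistor = four_kt / r_in * i2 * bit_rate
    leakage_shot = 2.0 * ELECTRON_CHARGE * leakage * i2 * bit_rate
    fet = four_kt * gamma / gm * (2.0 * math.pi * c_total) ** 2 * i3 * bit_rate ** 3
    return resistor + leakage_shot + fet

def sensitivity_dbm(noise_ms, q=6.0, quantum_efficiency=0.8, wavelength_um=1.3):
    """Average optical power in dBm needed for average photocurrent = q * RMS noise."""
    photocurrent = q * math.sqrt(noise_ms)
    power_watts = photocurrent * 1.240 / (quantum_efficiency * wavelength_um)
    return 10.0 * math.log10(power_watts / 1e-3)

# Hypothetical 45-Mb/s front end: 1 Mohm, 1 nA leakage, 20 mS, Gamma = 1.1, 1 pF.
noise = mean_square_input_noise(45e6, 1e6, 1e-9, 20e-3, 1.1, 1e-12)
print(f"RMS input noise: {math.sqrt(noise) * 1e9:.2f} nA")
print(f"Sensitivity:     {sensitivity_dbm(noise):.1f} dBm")
```

When the FET term dominates, doubling the bit rate in this sketch multiplies the mean-square noise by eight and costs about 4.5 dB of optical sensitivity, in line with the bandwidth-cubed behavior described above.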

The receiver optical sensitivity is inversely proportional to the root-mean-square noise current and therefore is approximately proportional to $\sqrt{M}\, B^{-3/2}$. Note that if the mean-square noise current were proportional to $B^2$ rather than $B^3$, the photocurrent charge per bit would be constant. In fact, the charge per bit goes up approximately as the square root of the bit rate.

2. Complete Receiver Circuit Noise Expressions with Device Figures of Merit

This subsection first derives figure-of-merit expressions for pin-photodiode, FET, and FET IC technologies, then extends the Smith and Personick noise expression of Section III.B.1 to include the noise from the input FET load device and the rest of the receiver circuit. The resultant expression is used to calculate the theoretical receiver sensitivities of Sections III.B.3 and V.H.

The front-end figure of merit of Eq. (24) can be rewritten as the product of an FET technology figure of merit times a pin-photodiode figure of merit. In a typical high-performance FET technology, the source-to-drain spacing or channel length is fixed at the minimum reliable resolution of the lithography. The FET transconductance and input capacitance are both proportional to the channel width (typically a few tens to a few hundred micrometers); the $g_m/C_{FET}$ ratio is set by the FET technology. Rewriting $g_m$ as $(g_m/C_{FET})C_{FET}$, where $C_{FET}$ determines the FET size, and taking $C_T = C_{FET} + C_{pin} + C_s$ gives

$$M = \left[\frac{g_m}{\Gamma C_{FET}}\right] \left[\frac{C_{FET}}{(C_{FET} + C_{pin} + C_s)^2}\right]. \tag{25}$$

Differentiating the second bracketed term with respect to $C_{FET}$ shows that the optimum-size FET in a given technology has a gate width such that

$$C_{FET} = C_{pin} + C_s = \tfrac{1}{2} C_T. \tag{25a}$$
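The optimum FET size can also be checked numerically by scanning the capacitance-dependent factor of the figure of merit over a range of FET capacitances. The sketch below is a minimal illustration; the function names, the scan range, and the 0.3-pF and 0.2-pF capacitances are hypothetical.

```python
def capacitance_factor(c_fet, c_pin, c_stray):
    """Capacitance-dependent factor of the figure of merit: C_FET / (C_FET + C_pin + C_s)^2."""
    return c_fet / (c_fet + c_pin + c_stray) ** 2

def best_fet_capacitance(c_pin, c_stray, steps=2000):
    """Scan C_FET from near zero up to 4 * (C_pin + C_s) and return the best value."""
    c_fixed = c_pin + c_stray
    candidates = [4.0 * c_fixed * k / steps for k in range(1, steps + 1)]
    return max(candidates, key=lambda c: capacitance_factor(c, c_pin, c_stray))

# Hypothetical front end: 0.3-pF photodiode and 0.2-pF stray capacitance.
c_pin, c_stray = 0.3e-12, 0.2e-12
c_opt = best_fet_capacitance(c_pin, c_stray)
print(f"optimum C_FET = {c_opt * 1e12:.2f} pF")
```

The scan peaks where the FET capacitance equals the sum of the photodiode and stray capacitances. The maximum is broad, so a moderately mis-sized FET gives up little sensitivity.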

Assuming such an optimum-sized FET, one can now write

$$M = M_{FET} M_{pin}, \tag{26}$$

where the FET-technology figure of merit is

$$M_{FET} = \frac{g_m}{\Gamma C_{FET}} = \frac{2\pi f_T}{\Gamma}, \tag{27}$$

where $f_T = g_m/2\pi C_{FET}$ is the unity-gain frequency of the FET technology. $M_{pin}$, the pin-photodiode (and stray capacitance) figure of merit, is

$$M_{pin} = \frac{1}{4(C_{pin} + C_s)}. \tag{28}$$

The optical receiver sensitivity is, again, proportional to the square root of the product of these figures of merit; these figures of merit will be used in Section V.H to discuss the sensitivity improvements calculated for future-technology receivers.

The FET figure of merit indicates that high-sensitivity receivers should be made with microwave or VHSIC technologies because such FETs have the highest $f_T$'s. The optical sensitivity of such receivers is approximately proportional to the square root of $f_T$; thus, these high-frequency FETs are preferred, even at low bit rates (e.g., 10 Mb/s), but for noise, not frequency-response, reasons. The two presently preferred FET technologies are 1-μm gate-length GaAs MESFETs and 1-μm gate silicon MOSFETs. The two technologies offer roughly the same sensitivity in theory; the GaAs has higher $g_m/C_{FET}$, but the silicon has a lower channel noise factor (Section III.B.3). In practice, GaAs is presently somewhat superior in performance, but silicon is much less costly.

The photodiode and stray-capacitance figure of merit of Eq. (28) indicates that the pin-photodiode capacitance $C_{pin}$ and the stray capacitance $C_s$ must be made as small as possible. Assuming an optimum-size amplifier input FET (gate width such that $C_{FET} = C_{pin} + C_s$), the optical receiver sensitivity is inversely proportional to the square root of $(C_{pin} + C_s)$. For small $C_s$, the optical sensitivity is inversely proportional to the square root of the pin-photodiode capacitance. Total front-end capacitances $C_T$ are presently about a picofarad or less.

Thus, the pin-photodiode technology objectives are low capacitance, low leakage current, and high quantum efficiency. Low capacitance means either a low doping (~$10^{15}$/cc) for a wide depletion region, a small diameter (area), or both. For multimode systems, which are becoming obsolescent, the photodiode diameter is typically somewhat larger than the fiber core diameter for efficient optical coupling.
Present photodiodes are typically 50-100 μm in diameter. However, the output of a single-mode fiber can be focused to a diffraction-limited spot; this makes possible 5- to 25-μm diameter photodiodes. How small a photodiode for a single-mode system can be made is primarily an optical coupling and device packaging problem; changing the mask to make a smaller diode would be easy. Note that the problem of efficiently coupling a single-mode fiber to a small-area optical device has already been solved for semiconductor laser transmitters. Such packages soon will be low cost as laser transmitters ride down the learning curve. A four-fold reduction in capacitance appears easily practical; this would correspond to a 3-dB improvement of the optical receiver sensitivity. However, for low-cost applications, the preferred alternative may be to leave the photodiode diameter large, use the simplest packaging, and concentrate on reducing the doping instead.

Thus, a high-sensitivity pin-FET receiver is a combined FET technology problem (high $g_m/C_T$), pin-photodiode problem (low $C_{pin}$, low leakage current), circuit design problem (large $R_i$ for low Johnson noise while preserving a wide bandwidth and dynamic range), and packaging problem (low stray capacitance $C_s$, small photodiode diameter if economic).

The pin-photodiode figure of merit can also be read as a total-input-capacitance figure of merit. For an optimized receiver, $M_{pin} = 1/(2C_T)$ by Eq. (25a); thus, for a given FET figure of merit, the sensitivity of an optimized receiver is inversely proportional to the square root of $C_T$.

Actual sensitivity calculations should also include corrections for the noise due to the input FET drain load device and for the noise of following stages. The Smith and Personick noise calculations are readily extendable to include these effects. In the new IC optical receivers of Section V, the input FET's load device $Q_L$ is typically another FET. [In hybrid IC (HIC) optical receivers, the load device is typically a resistor, but HIC receivers will become obsolescent.] The IC load transistor $Q_L$ adds an extra mean-square noise current at the drain of $Q_1$ of

$$d\langle i_n^2 \rangle_L = 4kT g_{mL} \Gamma\, df, \tag{29}$$

where $g_{mL}$ is the transconductance of $Q_L$. Equation (16) can then be rewritten as

$$d\langle i_n^2 \rangle_{drain} = 4kT (g_{m1} + g_{mL}) \Gamma\, df, \tag{30}$$

where gmlis the transconductance of input FET Q 1 . Retracing the derivation of Eqs. (17)-(20) then gives a total mean-square equivalent input noise current due to Q1 (input FET) and QL (load FET) of

where I,, the third Personick integral, is given by Eq. (21). The optimum ratio of QLsize (gate width) to Q1 size is a tradeoff. A large QL gives a higher drain current density in Q1, and therefore a higher gml;a small Q L gives less drain current noise due to Q L . Assuming an optimum QL to Q1 ratio for the particular technology, one can revise Eq. (27) to give a


figure of merit for the FET IC technology:

$$M_{IC} = \frac{g_{m1}^2}{(g_{m1} + g_{mL})\,\Gamma\,C_{FET}}. \qquad (32)$$

$M_{IC}$ depends only on the IC technology, so long as the input FET $Q_1$ is scaled so that $C_{FET} = C_T/2$ and $Q_L$ is scaled for minimum noise. The receiver figure of merit now is $M_{IC}M_{pin}$, where $M_{pin}$ is given by Eq. (28) as before; the overall receiver sensitivity is approximately proportional to the square root of the receiver figure of merit. As will be shown below, the $Q_L$ noise term typically is more important than all the following-stage noise terms combined; thus, Eq. (32) is a good IC technology figure of merit and gives good approximate sensitivities. Note that fine-line GaAs depletion-mode MESFETs typically require a higher drain current, hence a larger $Q_L$, than fine-line silicon enhancement-mode MOS technologies. This effect cancels part of the $g_m/C$ advantage of GaAs, as does the higher channel noise factor $\Gamma$ in GaAs.

The noise of the following stages of the receiver can be represented as an equivalent stage input mean-square noise voltage of $d\langle e_s^2\rangle$ per unit bandwidth $df$. If $a_{s-1}$ is the total voltage gain of the $s-1$ stages preceding stage $s$, the mean-square equivalent input noise voltage due to stage $s$ is

$$d\langle e_s^2\rangle_I = d\langle e_s^2\rangle / a_{s-1}^2. \qquad (33)$$

Assuming the spectral density $d\langle e_s^2\rangle/df$ is constant in frequency, one can repeat the derivation of Eqs. (17)-(20), substituting $d\langle e_s^2\rangle_I$ for $d\langle e^2\rangle_{FET}$. This gives the mean-square equivalent input noise current due to stage $s$:

$$\langle i^2\rangle_s = \frac{1}{a_{s-1}^2}\,\frac{d\langle e_s^2\rangle}{df}\,(2\pi C_T)^2 I_3 B^3. \qquad (34)$$

The total mean-square equivalent input noise current of the receiver is then

$$\langle i^2\rangle = \frac{4kT}{R_i} I_2 B + 2qI_l I_2 B + \frac{4kT\Gamma}{g_{m1}}(2\pi C_T)^2 I_3 B^3 + \frac{4kT\Gamma g_{mL}}{g_{m1}^2}(2\pi C_T)^2 I_3 B^3 + \sum_s \frac{1}{a_{s-1}^2}\,\frac{d\langle e_s^2\rangle}{df}\,(2\pi C_T)^2 I_3 B^3, \qquad (35)$$

where the first two terms are the input-resistor Johnson noise and the leakage-current shot noise, respectively, the third term is the input FET noise, the fourth term is the noise of the input FET load ($Q_L$), and the last term is the sum over the noise contributions of the following stages. Generally, the following-stage noise current is less important than the load device contribution; even the second-stage mean-square noise is divided by the first-stage


voltage gain squared. Thus, Eq. (32) is a good figure of merit for comparing FET IC technologies.

3. Sensitivities of Present-Technology pin-FET Receivers

This subsection calculates theoretical sensitivities for present-technology silicon and GaAs FET IC receivers and compares the two technologies. It also includes sensitivity results from the literature. Section V.H will extend the calculations of this section to include sensitivity calculations for probable future-technology IC receivers.

For the numerical sensitivity calculations, $g_m/C$ is taken as 70 mS/pF for 1-µm silicon MOSFETs and 90 mS/pF for 1-µm GaAs MESFETs; $\Gamma$ is taken as 1.2 for the silicon and 1.5 for the GaAs. Since the FET technology figure of merit is $g_m/(C\Gamma)$ by Eq. (27), the silicon and GaAs sensitivities are essentially equal in theory; GaAs FETs have higher transconductances, but silicon FETs have lower channel noise. In practice, GaAs designs are presently a few decibels more sensitive.

The Personick integrals $I_2$ and $I_3$ can be roughly estimated by remembering that the channel-filter bandwidth is typically 0.56 times the bit rate $B$ for synchronous detection NRZ receivers. Thus, taking $|F(y)| = 1$ for $y < 0.56$ and $F(y) = 0$ for $y > 0.56$ in Eq. (9) for $I_2$ gives $I_2 = 0.56$; using this approximation in Eq. (21) for $I_3$ gives $I_3 = 0.059$. In fact, $I_2$ is taken as 0.564 and $I_3$ as 0.0868 (Paski, 1980b). Values for $I_2$ and $I_3$ for different input and output (filtered) pulse shapes are found in Personick (1973) and in Smith and Personick (1980).

Figure 7 shows theoretical digital optical-receiver sensitivities versus bit rate for 1-µm gate length GaAs and silicon IC receivers using InGaAs pin photodiodes. The optical wavelength is 1.3 µm. The calculations assume the new high-sensitivity, micro-FET feedback IC receiver designs, as described in Sections V.A, V.B, and V.F, in which the input/feedback resistor noise is almost negligible; the sensitivities were calculated on paper designs using Eq. (35) for the total receiver noise (see Section V.H).
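The figure-of-merit comparison and the rough Personick-integral estimate given earlier in this subsection can be checked with a few lines of Python. This is only a sketch: it assumes that Eqs. (9) and (21), which are not reproduced here, have the forms $I_2 = \int |F(y)|^2\,dy$ and $I_3 = \int |F(y)|^2 y^2\,dy$ (an assumption that does reproduce the quoted 0.56 and 0.059), and the function names are illustrative.

```python
def fet_figure_of_merit(gm_per_c, gamma):
    """FET figure of merit g_m / (C * Gamma), in mS/pF.

    gm_per_c is g_m/C in mS/pF; gamma is the channel noise factor.
    """
    return gm_per_c / gamma


def brickwall_personick(cutoff):
    """Personick integrals for a brick-wall filter, |F(y)| = 1 for y < cutoff.

    Assumed forms: I2 = integral of |F|^2 dy, I3 = integral of |F|^2 y^2 dy.
    """
    i2 = cutoff
    i3 = cutoff ** 3 / 3.0
    return i2, i3


# Device values quoted in the text for 1-um silicon MOSFETs and GaAs MESFETs.
silicon_merit = fet_figure_of_merit(70.0, 1.2)
gaas_merit = fet_figure_of_merit(90.0, 1.5)
i2, i3 = brickwall_personick(0.56)

print(f"silicon g_m/(C Gamma) = {silicon_merit:.1f} mS/pF")
print(f"GaAs    g_m/(C Gamma) = {gaas_merit:.1f} mS/pF")
print(f"brick-wall I2 = {i2:.2f}, I3 = {i3:.3f}")
```

The two figures of merit differ by only about 3%, which is the sense in which the theoretical sensitivities are "essentially equal".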
Figure 7 assumes a 1-pF total front-end capacitance, e.g., $C_{pin}$ = 0.40 pF, $C_s$ = 0.10 pF, $C_{FET}$ = 0.5 pF; the silicon input FET then has a transconductance of 35 mS; the GaAs input FET has a transconductance of 45 mS. The photodiode leakage current is taken as 1 nA at 20°C and 15 nA at a maximum operating temperature of 85°C (the 15x increase assumes that the leakage is a G-R current via midgap states). The bit-error rate is $10^{-9}$. Since the theoretical sensitivities of the silicon and GaAs FET IC receivers are essentially the same, only one curve is plotted for both. Note also that the calculated sensitivities go approximately as $B^{-3/2}$; the first two terms in Eq. (35) for the input noise squared are negligible except at low bit rates; the

[Figure 7: horizontal axis, bit rate (10 Mb/s to 1 Gb/s); vertical-axis ticks at -40, -50, and -60.]

FIG. 7. Theoretical sensitivities of present-technology pin-FET lightwave receivers. Dots show GaAs FET receiver measurements from the literature; triangles show silicon FET receiver measurements.

others go as $B^3$ except for the (small) following-stage noise summation, which increases slightly faster than $B^3$ because the gain per stage is less for wider bandwidth stages. Finally, note that the calculated sensitivities are competitive with those calculated for present APD receivers to 100-200 Mb/s.

Section V.H presents further detail on these IC optical receiver sensitivity calculations and extends them to include sensitivity calculations for probable future-technology IC receivers. It also presents scaling laws for the basic IC design of Section V.F and uses the device figures of merit from the last section for an intuitive discussion of the calculated sensitivity improvements to be expected from the future-technology receiver designs.

Figure 7 also shows the best experimentally achieved pin-FET receiver sensitivities as of this writing. The GaAs FET receiver measurements at 34 Mb/s, 140 Mb/s, 280 Mb/s, and 565 Mb/s were by D. R. Smith et al. (1982); the 45-Mb/s measurement was by Williams and LeBlanc (1986); the 1-Gb/s measurement was by Linke et al. (1983). The 45-Mb/s silicon MOSFET receiver measurement was by Ogawa et al. (1982); the 800-Mb/s silicon result was by Abidi et al. (1984). All were measured at room temperature.

Unfortunately, the silicon MOSFET optical receivers are presently a few decibels less sensitive than either theory or RMS noise measurements would indicate. This may be because fine-line silicon FETs were developed as a digital IC technology in which noise was not important. The fine-line GaAs MESFET technology was developed as an analog technology for microwave receivers. Therefore, the GaAs noise problems were important and were solved.
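The bit-rate scaling described above can be illustrated with a short Python sketch that evaluates the first three terms of the total-noise expression, Eq. (35): the Johnson noise $(4kT/R_i)I_2B$, the leakage shot noise $2qI_lI_2B$, and the input FET noise $(4kT\Gamma/g_{m1})(2\pi C_T)^2I_3B^3$. The device values are those quoted for the GaAs case of Fig. 7; the temperature of 293 K, the neglect of the input-resistor term by default, and the omission of the load and following-stage terms are simplifying assumptions made here, not part of the original calculation.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def noise_terms(bit_rate, c_total=1e-12, gm=45e-3, gamma=1.5,
                leakage=1e-9, r_input=None, temperature=293.0,
                i2=0.564, i3=0.0868):
    """Return (johnson, shot, fet) mean-square noise currents in A^2.

    r_input=None means the input/feedback resistor noise is neglected
    (an assumption; the text calls it "almost negligible").
    """
    four_kt = 4.0 * K_BOLTZMANN * temperature
    johnson = 0.0 if r_input is None else four_kt / r_input * i2 * bit_rate
    shot = 2.0 * Q_ELECTRON * leakage * i2 * bit_rate
    fet = (four_kt * gamma / gm) * (2.0 * math.pi * c_total) ** 2 \
        * i3 * bit_rate ** 3
    return johnson, shot, fet


def fet_limited_slope_db_per_decade(low_rate=1e8, high_rate=1e9):
    """Optical sensitivity slope when only the B^3 FET term matters.

    Optical power scales as the square root of the mean-square noise.
    """
    fet_low = noise_terms(low_rate)[2]
    fet_high = noise_terms(high_rate)[2]
    decades = math.log10(high_rate / low_rate)
    return 10.0 * math.log10(math.sqrt(fet_high / fet_low)) / decades


for rate in (1e7, 1e8, 1e9):
    _, shot, fet = noise_terms(rate)
    print(f"{rate:8.0e} b/s  shot = {shot:.2e} A^2  FET = {fet:.2e} A^2")
print(f"FET-limited slope: {fet_limited_slope_db_per_decade():.1f} dB per decade")
```

With these assumed values the shot-noise and FET terms are comparable near 10 Mb/s, while the FET term dominates by several orders of magnitude at 1 Gb/s, consistent with the statement that the first two terms matter only at low bit rates.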


At present, silicon FETs appear preferable for high-volume, low-cost designs in which a modest sensitivity penalty is acceptable; GaAs takes over for higher-sensitivity, higher-cost designs (e.g., long-haul transmission systems). Silicon may well take over the high-sensitivity applications when its noise problems are solved. On the other hand, the GaAs technology may become cheaper and its scale of integration may be increased. The two technologies are in a race with each other; the silicon technologists must improve their sensitivity; the GaAs technologists must reduce their costs.

GaAs FET IC technologies are also presently preferred over silicon for optical fiber receivers above 1 Gb/s, both because of the higher saturation velocity in GaAs and because GaAs FET ICs have lower parasitic capacitances. Short-channel GaAs MESFETs are fabricated on a semi-insulating substrate; the parasitic capacitances to the substrate are very small (Sze, 1981). Short-channel silicon MOSFETs are presently junction isolated and have the full junction capacitance to the substrate (Sze, 1981). In addition, the gates of silicon MOS transistors overlap both the source and the drain; GaAs MESFETs do not have this overlap capacitance; the Schottky gate metallization defines the channel and therefore cannot overlap the source and drain.

C. Avalanche Photodiode Receiver Noise and Sensitivity Calculations

In an avalanche photodiode (APD), the primary photocurrent is multiplied by impact ionization in the p-n junction, which is operated reverse-biased near breakdown. The multiplied photocurrent signal goes to the receiver circuit. Consider an APD in which the primary photocurrent is multiplied (on average) by a factor $\langle M\rangle$. If the multiplication process were noiseless and the APD leakage current negligible, the optical signal power required by the receiver would be decreased by a factor $\langle M\rangle$ and the optical receiver sensitivity increased by a factor $\langle M\rangle$.

In fact, the multiplication process is noisy because it is the result of random impact ionizations. The equivalent mean-square primary photocurrent noise $d\langle i^2\rangle_{APD}$ per bandwidth $df$ due to the APD is the shot noise of the photocurrent $i_{ph}$ plus leakage current $I_d$, times the McIntyre excess noise factor $F(\langle M\rangle)$ (McIntyre, 1966):

$$d\langle i^2\rangle_{APD} = 2q(i_{ph} + I_d)\,F(\langle M\rangle)\,df. \qquad (36)$$

For noiseless multiplication, $F(\langle M\rangle) = 1$, and the noise is just the photocurrent plus leakage-current shot noise; $F(\langle M\rangle)$ is the factor by which the real avalanche multiplication increases the noise over that of a noiseless multiplication.


If the avalanche is initiated by injection of photocarriers at one side of the avalanche region, $F(\langle M\rangle)$ is given by (McIntyre, 1966)

$$F(\langle M\rangle) = \langle M\rangle\left[1 - (1 - k)\left(\frac{\langle M\rangle - 1}{\langle M\rangle}\right)^2\right], \qquad (37)$$

where $k$ is the ratio of the electron and hole ionization coefficients. In silicon, electrons have the higher ionization coefficient; the 0.8-µm wavelength silicon APDs use photoelectron-initiated multiplication, and $k$ is the ratio of the hole ionization coefficient to the electron ionization coefficient. In InGaAs/InP 1.3- to 1.6-µm wavelength APDs, the avalanche is photo-hole initiated and $k$ is the ratio of the electron ionization coefficient to that of holes. In silicon APDs, $k$ is typically 0.02 (Melchior et al., 1978); in InGaAs/InP APDs, $k$ is typically 0.4 at present.

The APD is not the "solid state equivalent of a photomultiplier" because both electrons and holes can impact ionize in an APD; in a photomultiplier, only electrons impact ionize (on the dynodes); there are no holes in a vacuum. In an electron-initiated APD, the electrons of the primary avalanche travel downstream, creating electron-hole pairs by impact ionization; the resultant holes travel upstream and can create more electrons, thus initiating secondary electron avalanches, etc. This hole feedback makes avalanche carrier multiplication much noisier than photomultiplier electron multiplication. When the electron-hole feedback loop gain becomes unity, the avalanche is self-sustaining and the diode breaks down. The higher the multiplication, the closer to breakdown and the noisier the multiplication.

For $k = 0$ (best case), only one carrier ionizes and the APD multiplication process is similar to that of a photomultiplier. In this limit, $F(\langle M\rangle)$ becomes

$$F(\langle M\rangle) = 2 - \frac{1}{\langle M\rangle}. \qquad (38)$$

For $k = 1$ (worst case), $F(\langle M\rangle)$ becomes

$$F(\langle M\rangle) = \langle M\rangle; \qquad (39)$$

the avalanche multiplication is then more noisy due to the carrier feedback. In general, the lower the $k$, the less noisy the multiplication. Thus, the silicon APDs ($k \approx 0.02$) are much less noisy for a given multiplication than the InGaAs APDs ($k \approx 0.5$); unfortunately, silicon is transparent at wavelengths longer than about 1.1 µm and cannot be used for 1.3- to 1.6-µm APDs.

Following Smith and Personick (1980), the total equivalent mean-square primary photocurrent noise $\langle i^2\rangle_{ph}$ is the APD noise integrated over the receiver bandwidth plus the equivalent mean-square input noise current $\langle i^2\rangle_T$


of the receiver amplifier, divided by $\langle M\rangle^2$:

$$\langle i^2\rangle_{ph} = 2q(i_{ph} + I_d)\,F(\langle M\rangle)\,I_1 B + \frac{\langle i^2\rangle_T}{\langle M\rangle^2}, \qquad (40)$$

where the integral over frequency is $I_1B$; $I_1$ is the first Personick integral. The receiver amplifier circuit noise $\langle i^2\rangle_T$ is given by the expressions of Section III.B.

When a logic-zero bit is transmitted, the optical signal, hence the photocurrent $i_{ph}$, is ideally zero; the zero-level mean-square noise then is due only to the APD leakage current and to the amplifier noise. When a logic-one bit is transmitted, the mean-square noise is increased by $2q\,i_{ph}F(\langle M\rangle)I_1B$, which is the noise due to the avalanche multiplication of the one-level photocurrent. Since an APD receiver's one-level noise is greater than its zero-level noise, the decision threshold $D$ is typically set closer to the zero level than to the one level.

In practice, the optical transmitter is not completely turned off during zero bits, for reasons of transmitter response speed and (for laser transmitters) for reasons of optical frequency stability. Take the zero-bit optical signal level $P(0)$ as a fraction $r$ of the one-bit optical signal level $P(1)$:

$$r = \frac{P(0)}{P(1)} = \frac{\langle i_{ph}(0)\rangle}{\langle i_{ph}(1)\rangle}, \qquad (41)$$

where $\langle i_{ph}(0)\rangle$ is the expected photocurrent at the sampling instant for a zero bit; $\langle i_{ph}(1)\rangle$ is the photocurrent for a one bit. $r$ is called the transmitter optical extinction ratio. Ideally, $r$ should be zero; in practice, $r$ may be as high as 0.2. This means a smaller photocurrent signal component for a given average optical power. In addition, the zero-level photocurrent is the functional equivalent of a leakage current and adds a corresponding noise term to both the zero and one signals.

Taking the avalanche-noise amplitude distribution as approximately Gaussian, the error probability or bit-error rate (BER) is given by Eq. (3):

$$\mathrm{BER} = \tfrac{1}{2}\,\mathrm{erfc}\left(Q/\sqrt{2}\right),$$

where, for a zero bit,

$$Q_0 = \frac{i_{th} - \langle i_{ph}(0)\rangle}{\langle i^2\rangle_0^{1/2}}, \qquad (42)$$

and $i_{th}$ is the decision threshold and $\langle i^2\rangle_0^{1/2}$ is the root-mean-square noise for a zero bit. Similarly, for a one bit,

$$Q_1 = \frac{\langle i_{ph}(1)\rangle - i_{th}}{\langle i^2\rangle_1^{1/2}} \qquad (43)$$

[Eq. (3), Fig. 4b]. $Q_0 = Q_1 = 6$ gives a bit-error rate of about $10^{-9}$.

Following Smith and Personick (1980), taking $Q_0 = Q_1$ equal to the $Q$ required for the desired BER, and using the APD receiver noise equation (40)


to give $\langle i^2\rangle_0^{1/2}$ and $\langle i^2\rangle_1^{1/2}$ in Eqs. (42) and (43), yields the photocurrent signal required for the desired BER. Assuming that zero bits and one bits are equally probable, and converting the photocurrent to an input optical power, gives an equation for the average optical power required for a given BER:

(44) where Q is the photocurrent signal-to-average-noise ratio for the given BER, multiplied by a prefactor that is unity for zero (ideal) extinction ratio r:

$\langle i^2\rangle_T$ is the equivalent mean-square amplifier noise, $I_{dm}$ is the leakage current of the APD that undergoes multiplication, and $F(\langle M\rangle)$ is the McIntyre excess noise factor of Eq. (37). Note that a high multiplication $\langle M\rangle$ reduces the effect of the receiver amplifier noise but gives more avalanche multiplication noise and a higher $F(\langle M\rangle)$ in Eq. (40). A low $\langle M\rangle$ gives lower avalanche multiplication noise but increases the effect of amplifier noise. The optimum $\langle M\rangle$ is determined by this trade-off. In theory, Eq. (44) can be differentiated to find the optimum gain; the result is an unmanageable expression of no particular physical interest. In practice, one is much better off using a minimum-finder program to find the optimum gain $\langle M\rangle$ and the minimum required optical power $\eta\bar{P}$ numerically.

InGaAs/InP heterostructure APDs are presently preferred for 1.3- to 1.6-µm applications because they offer low leakage currents and a reasonable k-ratio. The light is absorbed in a low-field InGaAs ($E_g$ = 0.73 eV) layer, and the avalanche multiplication takes place in a high-field InP layer ($E_g$ = 1.35 eV). The use of a wide-gap avalanche region avoids the tunneling current problem of all-InGaAs APDs (Nishida et al., 1979; Kanbe et al., 1980; Susa et al., 1980). Until recently, these APDs were slow due to hole trapping by the InGaAs-to-InP valence-band step; this problem was solved by adding several intermediate band-gap layers between the InGaAs and the InP or by continuously grading the composition between InGaAs and InP (Matsushima et al., 1982; Campbell et al., 1983).

Figure 8 shows theoretical InGaAs/InP APD receiver sensitivities versus bit rate. The figure assumes that the APD is used with an optimized 1-µm gate

[Figure 8: horizontal axis, bit rate (10 Mb/s to 1 Gb/s); vertical-axis ticks at -40, -50, and -60; curve label: InGaAs/InP APD.]

FIG. 8. Theoretical InGaAs/InP APD receiver sensitivities. $\lambda$ = 1.3 µm, $I_d$ = 3 nA at 20°C, 45 nA at 85°C, $C_T$ = 1 pF, high-sensitivity receiver ICs. Dots show APD receiver measurements from the literature.

technology GaAs FET receiver amplifier, which is the most sensitive amplifier type at present. It also assumes a total receiver input capacitance of 1 pF (this includes the APD capacitance), a primary leakage current at 20°C of 3 nA, and an APD k-ratio of 0.4. Figure 8 shows the pin-FET receiver sensitivities from Fig. 7 for comparison, plus the 20°C and 85°C InGaAs/InP APD sensitivities. The 85°C leakage current is taken as 45 nA. This assumes that the APD leakage current is a generation-recombination current via midgap states in the InGaAs, which gives a 15x increase in leakage current from 20°C to 85°C.

Figure 8 also shows InGaAs/InP APD-FET receiver measurements from the literature. The measurements at 420 Mb/s and 1 Gb/s were by Campbell et al. (1983); those at 2 Gb/s and 4 Gb/s were by Kasper et al. (1985). All are the best values at those bit rates, as of this writing; all used the APD of Campbell et al. Note, however, that these measurements were made at room temperature, whereas in field use the maximum operating temperature is typically 85°C. Unless the receiver and APD temperature is controlled, the 85°C sensitivity is what matters in practice.

Figure 8 shows that present InGaAs/InP 1.3- to 1.6-µm APDs in theory offer little sensitivity advantage below 100 Mb/s at a maximum operating temperature of 85°C. This is due to the noise caused by the avalanche multiplication of the primary leakage current; an APD without leakage

[Figure 9: horizontal axis, bit rate (10 Mb/s to 1 Gb/s); vertical-axis ticks at -20, -30, -40, and -50; curve label: 85°C.]

FIG. 9. Theoretical Ge APD receiver sensitivities. $\lambda$ = 1.3 µm, $I_d$ = 100 nA at 20°C, 1 µA at 85°C, $C_T$ = 1 pF, high-sensitivity receiver ICs.

current would offer a sensitivity improvement at all bit rates. In addition, reducing the leakage current is presently more important than improving the k-ratio for bit rates less than 500 Mb/s at 85°C.

Germanium APDs have also been used in 1.3- to 1.6-µm optical receivers (Mikawa et al., 1981). Present Ge APDs have very high leakage currents but until recently were faster than InGaAs/InP APDs and therefore once were preferred for Gb/s applications. Figure 9 shows theoretical germanium APD sensitivities versus bit rate assuming $I_d$ = 100 nA at 20°C, $I_d$ = 1 µA at 85°C, $k$ = 0.5, and a GaAs IC front end. The InGaAs/InP APD is presently superior to the germanium APD at every bit rate. In fact, at 85°C, InGaAs pin photodiodes are superior to germanium APDs for bit rates below 300 Mb/s.
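The "minimum-finder program" approach to the optimum avalanche gain mentioned in this subsection can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculation behind Figs. 8 and 9: it takes zero extinction ratio, sets the threshold so that $Q_0 = Q_1 = Q$, and uses the zero- and one-level noise of Eq. (40), which gives an average primary photocurrent of $Q\sigma_0 + Q^2 q F I_1 B$ with $\sigma_0^2 = 2qI_dFI_1B + \langle i^2\rangle_T/\langle M\rangle^2$. The value $I_1 = 0.5$, the amplifier noise of $1.9\times10^{-15}$ A², and the golden-section search are assumptions chosen for the example.

```python
import math

Q_ELECTRON = 1.602176634e-19  # C


def mcintyre_excess_noise(m, k):
    """McIntyre excess noise factor, Eq. (37)."""
    return m * (1.0 - (1.0 - k) * ((m - 1.0) / m) ** 2)


def ber_from_q(q):
    """Gaussian-noise bit-error rate, BER = 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def required_mean_photocurrent(m, k, amp_noise, dark_current, bit_rate,
                               q=6.0, i1=0.5):
    """Average primary photocurrent (A) needed for the target Q.

    Assumes zero extinction ratio and Q0 = Q1 = q; i1 = 0.5 is an
    assumed value of the first Personick integral.
    """
    f = mcintyre_excess_noise(m, k)
    sigma0 = math.sqrt(2.0 * Q_ELECTRON * dark_current * f * i1 * bit_rate
                       + amp_noise / m ** 2)
    return q * sigma0 + q ** 2 * Q_ELECTRON * f * i1 * bit_rate


def golden_section_minimum(func, lo, hi, tol=1e-4):
    """Locate the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def cost(m):
    # Illustrative 1-Gb/s case: k = 0.4, 3-nA leakage, assumed amplifier noise.
    return required_mean_photocurrent(m, k=0.4, amp_noise=1.9e-15,
                                      dark_current=3e-9, bit_rate=1e9)


m_opt = golden_section_minimum(cost, 1.0, 100.0)
gain_db = 10.0 * math.log10(cost(1.0) / cost(m_opt))
print(f"BER at Q = 6: {ber_from_q(6.0):.2e}")
print(f"optimum <M> ~ {m_opt:.1f}, improvement over <M> = 1: {gain_db:.1f} dB")
```

The search trades the falling amplifier term $\langle i^2\rangle_T/\langle M\rangle^2$ against the rising excess noise factor; raising the leakage current or the k-ratio in `cost` moves the optimum toward lower gain.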

IV. FIRST- AND SECOND-GENERATION LIGHTWAVE RECEIVERS

This section traces the evolution of lightwave receivers from the early voltage-amplifier and transimpedance designs, in which the sensitivity was limited by circuit noise, up through the present versions of the integrating or "high impedance" receiver, in which the sensitivity is limited primarily by the fundamental device noise. Unfortunately, the high-sensitivity, integrating receivers require equalization, with its attendant problems, and have very limited dynamic ranges. Therefore, almost all commercial systems still use the transimpedance receiver, despite its lower sensitivity. In time, both transimpedance receivers and integrating receivers will be displaced by the high-sensitivity, nonintegrating active-feedback IC receivers of Section V.

Subsection IV.A describes the early voltage-amplifier optical receivers. These receivers had very low sensitivities but were the first lightwave receivers.

Subsection IV.B describes the simple integrating optical receivers (Personick, 1973; Goell, 1974), which can achieve very high sensitivities but are vulnerable to saturation on long strings of ones or zeroes in the data. This problem was solved by encoding the data stream (Brooks, 1980); upgraded versions of these receivers have been introduced in a few 1.3-µm transmission systems (Hooper et al., 1980; D. R. Smith et al., 1980).

Subsection IV.C describes conventional transimpedance receivers. These receivers do not integrate the signal but are less sensitive than integrating receivers. These were first used in 0.8-µm transmission systems; a silicon APD detector was used to make up for the lower sensitivity (R. G. Smith et al., 1978). A bipolar IC version for 1.3-µm systems has achieved good sensitivities at bit rates of several hundred Mb/s while using a nonmultiplying pin photodiode (Paski, 1980; Snodgrass and Klinman, 1984).

Subsection IV.D describes integrating transimpedance receivers (Gloge et al., 1980; Ogawa et al., 1983). These receivers are as sensitive as the simple integrating receivers of Subsection IV.B and do not require encoding of the data stream because they integrate the signal over less of the bandwidth. However, the integration pole frequency is temperature sensitive; these receivers are difficult to equalize reliably, e.g., if the temperature can vary over a commercial temperature range of -25°C to +85°C.

A. Voltage-Amplifier Optical Receivers

Figure 10a shows a generalized schematic of a voltage-amplifier optical receiver. The photocurrent develops a voltage across a series load resistor $R_L$; this voltage is then amplified by a voltage amplifier of gain $A$. Ideally, the output voltage is $v_{out} = AR_Li_{ph}$. Note, however, that $R_L$ must be small enough to keep the photocurrent signal from being integrated by the photodiode capacitance $C_{pin}$ plus the capacitance $C_x$ due to the input transistor of the amplifier, plus any stray input capacitance $C_s$. Explicitly, $R_L$ must be less than the capacitive impedance $1/(\omega C_T)$ over the desired bandwidth, where $C_T = C_{pin} + C_x + C_s$ is the total capacitance seen by the photocurrent (Fig. 10b). For a bandwidth, or pole frequency, greater than the bit rate $B$,

$$R_L \le \frac{1}{2\pi C_T B}. \qquad (46)$$

Such voltage-amplifier receivers have low sensitivities because of the Johnson noise current of the small-value $R_L$ needed to avoid signal integration


FIG. 10. Voltage amplifier receiver: (a) receiver circuit, (b) equivalent circuit.

by $C_T$ (Personick, 1973b); by Eq. (8),

$$\langle i^2\rangle_R = \frac{4kT}{R_L} I_2 B. \qquad (47)$$

Consider a 44.7-Mb/s design example; take $C_T$ = 4 pF (typical for early receivers). By Eq. (46), $R_L \le 890\ \Omega$. Assuming a noiseless voltage amplifier, so that the Johnson noise associated with $R_L$ is the only noise source, the best possible sensitivity for the voltage-amplifier receiver at 1.3 µm [by Eqs. (1) and (8)] is only -35 dBm optical. The state-of-the-art receiver sensitivity at that bit rate is -51.7 dBm, an improvement of 16.7 dB optical or 33 dB electrical over the voltage-amplifier receiver circuit. If $C_T$ for the voltage-amplifier receiver were reduced to the 1 pF of present state-of-the-art receivers, the sensitivity penalty still would be 13.7 dB optical. The simple voltage-amplifier optical receiver now is not preferred for any application and has disappeared from the literature.

B. Integrating Optical Receivers

The integrating receiver or "high-impedance" receiver (Fig. 11a) can achieve excellent sensitivities because $R_L$ is made large enough so that its Johnson noise is negligible (Personick, 1973a, 1973b; Goell, 1974; Hooper


et al., 1980). However, the photocurrent signal is then integrated by $C_T$; the signal must be differentiated (equalized) later, as shown. Waveforms are shown in Fig. 11b. For 45-Mb/s receivers using state-of-the-art photodiodes and 1-µm gate-length GaAs FETs or silicon MOSFETs, $R_L$ should be $\ge$ 1 MΩ for best sensitivity. For $R_L$ = 1 MΩ and $C_T$ = 1 pF, the resultant input current-to-voltage pole is 160 kHz (Eq. 46); above 160 kHz the photocurrent signal is integrated by $C_T$. Thus the output signal, $v_o$, is integrated over most of the signal bandwidth, even though the voltage gain, $A$, is flat. For the 45-Mb/s NRZ case, the Nyquist signal bandwidth $I_2B$ is ~25 MHz; thus, the signal is integrated over seven octaves, from 160 kHz to 25 MHz. The equalizer must differentiate the signal over these same seven octaves; to do this, the equalizer zero frequency is set equal to the input pole frequency. The equalization technique of Fig. 11a introduces a noise penalty. The reason is that the passive equalizer attenuates the signal; the peak attenuation is the equalization ratio $I_2B/f_p$ = 25 MHz/160 kHz = 156x for the 45-Mb/s example. This signal attenuation enhances the effect of noise in stages following the equalizer, resulting in an equalizer noise penalty; here, if $A$ is less than 156, the low-frequency signal voltage after the equalizer is smaller than the signal at the amplifier input! Active equalization, which would reduce this

[Figure 11: circuit panel with a block labeled EQUALIZER; waveform panel with a time axis.]

FIG. 11. Integrating receiver with equalizer: (a) circuit, (b) waveforms.

noise penalty, is difficult for these high frequencies and large equalization ratios, due to stability problems.

In theory, this equalizer noise penalty may be reduced by increasing the gain, $A$, before the equalizer. However, for a random bit stream (maximum information content), increasing $A$ reduces the photocurrent for which saturation of the integrating amplifier on long strings of ones (or zeroes if ac coupled) causes an unacceptable bit-error rate. This reduces the dynamic range, resulting in a sensitivity versus dynamic-range tradeoff. Even with minimal dynamic range, $A$ is still limited and the equalizer noise penalty is still appreciable at high bit rates.

A second sensitivity versus dynamic-range tradeoff is involved in choosing $R_L$. A high $R_L$ improves sensitivity by reducing the input Johnson noise current but, for the maximum-information-content random bit stream, increases the probability of saturation on long strings of ones or zeroes because the integration pole frequency, $f_p$, is lower. A low $R_L$ reduces sensitivity but improves dynamic range; both the Johnson noise and $f_p$ are increased.

Both sensitivity versus dynamic-range tradeoffs may be reduced by encoding the data stream to limit the number of consecutive ones or zeroes; a 7B/8B encoding technique (Brooks, 1980) is used by the British Post Office (BPO) and others. In this technique, eight bits are transmitted for every seven bits of data; the redundancy allows the total disparity (difference between the number of ones and zeroes transmitted) to be kept close to zero. This removes the very-low-frequency signal components; large equalization ratios are not needed; the equalizer attenuation is reduced, and the allowable gain before the equalizer is increased; the equalizer noise penalty almost vanishes, and $R_L$ can be made large. These encoding techniques make the simple integrating receiver of Fig. 11 practical for transmission applications.
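The numbers used in the voltage-amplifier and integrating-receiver design examples of Subsections IV.A and IV.B follow from the single-pole relation $f = 1/(2\pi RC)$. The Python sketch below re-derives them as a check; the helper names are illustrative, and the signal bandwidth is taken as $0.56B$, the rough Personick-integral estimate used earlier in the text.

```python
import math


def pole_frequency(resistance, capacitance):
    """Single-pole corner frequency f = 1 / (2 pi R C), in Hz."""
    return 1.0 / (2.0 * math.pi * resistance * capacitance)


def max_load_resistance(c_total, bit_rate):
    """Largest R_L that keeps the input pole at or above the bit rate, Eq. (46)."""
    return 1.0 / (2.0 * math.pi * c_total * bit_rate)


# Voltage-amplifier example: C_T = 4 pF at 44.7 Mb/s.
r_max = max_load_resistance(4e-12, 44.7e6)

# Integrating-receiver example: R_L = 1 Mohm, C_T = 1 pF at 45 Mb/s.
f_pole = pole_frequency(1e6, 1e-12)
signal_bandwidth = 0.56 * 45e6
equalization_ratio = signal_bandwidth / f_pole
octaves = math.log2(equalization_ratio)

# Johnson-noise power scales as 1/R_L, hence as C_T; optical power scales as
# its square root, so cutting C_T from 4 pF to 1 pF is worth about 3 dB.
capacitance_gain_db = 10.0 * math.log10(math.sqrt(4e-12 / 1e-12))

print(f"R_L limit: {r_max:.0f} ohm")
print(f"input pole: {f_pole / 1e3:.0f} kHz")
print(f"equalization ratio: {equalization_ratio:.0f}x over {octaves:.1f} octaves")
print(f"gain from 4 pF -> 1 pF: {capacitance_gain_db:.1f} dB optical")
```

The small difference from the quoted 156x comes only from rounding 25.2 MHz and 159 kHz to 25 MHz and 160 kHz.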
However, there is a small sensitivity penalty because more bits must be transmitted to pass the same amount of information. If the receiver is FET noise limited, the mean-square noise current goes as $B^3$ (Eq. 35) and the optical sensitivity goes as $B^{-3/2}$. A 7B/8B code requires an 8/7 higher bit rate, reducing the optical sensitivity by a factor $(7/8)^{3/2}$, or 0.9 dB optical. In addition, for encoding schemes in which any error in an eight-bit transmitted block causes all seven recovered data bits to be in error, the bit-error rate (BER) is eight times larger for a given signal-to-noise ratio. Using Eq. (3) or Fig. 5 for the BER versus signal-to-noise ratio says that this is equivalent to an additional encoding sensitivity penalty of 0.2 dB optical, for a total encoding sensitivity penalty of 1.1 dB. If a seven-data-bit plus one-sign-bit encoding scheme is used, only a sign bit error will cause seven data bit errors; the BER is only doubled; the additional sensitivity penalty is then only 0.08 dB, for a total encoding sensitivity penalty of 1 dB.
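The encoding penalties quoted above can be reproduced with a short Python sketch. It assumes a target BER of $10^{-9}$, Gaussian noise that does not depend on the signal (so that the required optical power is proportional to $Q$), and a simple bisection to invert the BER relation of Eq. (3); the function names are illustrative.

```python
import math


def ber_from_q(q):
    """Gaussian-noise bit-error rate, BER = 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_for_ber(ber, lo=0.0, hi=10.0):
    """Invert ber_from_q by bisection (BER falls monotonically with Q)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > ber:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_penalty_db(data_bits=7, coded_bits=8, exponent=1.5):
    """Optical penalty from the higher line rate when sensitivity ~ B^(-3/2)."""
    return 10.0 * math.log10((coded_bits / data_bits) ** exponent)


def error_multiplication_penalty_db(multiplier, target_ber=1e-9):
    """Extra optical power needed to offset a BER multiplied by `multiplier`."""
    q_nominal = q_for_ber(target_ber)
    q_needed = q_for_ber(target_ber / multiplier)
    return 10.0 * math.log10(q_needed / q_nominal)


line_rate = rate_penalty_db()
block_code = error_multiplication_penalty_db(8.0)
sign_bit_code = error_multiplication_penalty_db(2.0)

print(f"line-rate penalty:        {line_rate:.2f} dB")
print(f"8x error multiplication:  {block_code:.2f} dB")
print(f"2x error multiplication:  {sign_bit_code:.2f} dB")
```

Because the BER curve is so steep near $Q = 6$, even an eightfold error multiplication costs only a few tenths of a decibel, which is why the line-rate term dominates the total.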


C. Transimpedance Optical Receivers

Transimpedance receivers were developed to increase the sensitivity achievable without signal integration; the ideal transimpedance circuit of Fig. 12a theoretically allows $R_L$ to be increased to the same value as in an integrating receiver, but without integrating the signal. The reason is that $R_L$ (now $R_F$) is connected around the voltage gain element; this negative feedback raises the input pole frequency. For the ideal circuit of Fig. 12a,

$$i_F = \frac{v_{in} - v_{out}}{R_F} = \frac{v_{in}(1 + A)}{R_F}. \qquad (48)$$

This produces a virtual input resistance (Fig. 12b) of

$$r_e = \frac{R_F}{A + 1}. \qquad (49)$$

$r_e$ should be made small enough to keep the photocurrent signal from being integrated by the receiver input capacitance $C_T$; explicitly, $r_e$ should be less than the capacitive impedance $1/(\omega C_T)$ over the signal bandwidth. Thus, the receiver bandwidth is simply the input pole frequency due to $r_e$ in parallel with $C_T$:

$$f_p = \frac{1}{2\pi r_e C_T} = \frac{A + 1}{2\pi R_F C_T}. \qquad (50)$$

FIG. 12. Ideal transimpedance receiver: (a) circuit, (b) equivalent circuit.
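For a feel of the gain the ideal circuit would need, the relations above (virtual input resistance $R_F/(A+1)$ and input pole at $(A+1)/(2\pi R_F C_T)$) can be evaluated in Python. The component values are borrowed from the 45-Mb/s integrating-receiver example ($R_F$ = 1 MΩ, $C_T$ = 1 pF, about 25 MHz of signal bandwidth) purely for illustration, and the function names are hypothetical.

```python
import math


def virtual_input_resistance(r_feedback, gain):
    """r_e = R_F / (A + 1) for the ideal transimpedance stage."""
    return r_feedback / (gain + 1.0)


def ideal_pole_frequency(r_feedback, c_total, gain):
    """Input pole of r_e in parallel with C_T, in Hz."""
    r_e = virtual_input_resistance(r_feedback, gain)
    return 1.0 / (2.0 * math.pi * r_e * c_total)


def gain_for_bandwidth(r_feedback, c_total, bandwidth):
    """Voltage gain A that places the ideal input pole at `bandwidth`."""
    return 2.0 * math.pi * r_feedback * c_total * bandwidth - 1.0


# Illustrative values: R_F = 1 Mohm, C_T = 1 pF, 25-MHz signal bandwidth.
gain_needed = gain_for_bandwidth(1e6, 1e-12, 25e6)
pole = ideal_pole_frequency(1e6, 1e-12, gain_needed)

print(f"required gain A ~ {gain_needed:.0f}")
print(f"virtual input resistance ~ "
      f"{virtual_input_resistance(1e6, gain_needed) / 1e3:.1f} kohm")
print(f"resulting pole: {pole / 1e6:.1f} MHz")
```

Under these assumed values the ideal stage needs a voltage gain of roughly 156 inside the feedback loop.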


Thus, the ideal transimpedance feedback configuration would theoretically increase the input current-to-voltage pole frequency or bandwidth by a factor of the gain, $A$, plus 1. Ideally this pole could be placed above the signal bandwidth by increasing $A$, eliminating integration of the signal. This would eliminate the need for equalization or coding and thus eliminate the associated noise penalty.

Unfortunately, real feedback resistors include a parasitic feedback capacitance $C_R$ shunting the physical $R_F$ (Fig. 13a). For frequencies greater than $f_{pR} = 1/(2\pi R_F C_R)$, the feedback resistor acts like a feedback capacitor and the signal is integrated even for large $A$. (The exact pole frequency is $f_p = 1/\{2\pi R_F[C_R + C_T/(A + 1)]\}$ by inspection of Fig. 13b.) In addition, even if an ideal feedback resistor had been available, the hybrid IC gain elements used in most conventional transimpedance receivers do not have enough voltage gain, $A$, to place the input pole frequency, $f_p$, above the passband. Adding stages would add extra phase shift inside the feedback loop and typically would cause instability problems.

Thus, the usual approach was to use a low-value $R_F$ to avoid signal integration due to $C_R$ and to reduce the gain required; the resultant Johnson noise penalty was accepted. The 0.8-µm wavelength, 45-Mb/s receivers of R. G. Smith et al. (1978) used $R_F$ = 4 kΩ, and used a silicon APD to make up the sensitivity loss. The later 1.3-µm wavelength, 274-Mb/s design of Ogawa and Chinnock (1979) used $R_F$ = 5 kΩ and accepted a modest signal integration in return for a somewhat smaller sensitivity penalty. Paski (1980) demonstrated both approaches in his experimental silicon bipolar receiver ICs; the commercial silicon bipolar receiver IC by Snodgrass and Klinman (1984) was designed to avoid signal integration.

FIG. 13. Real transimpedance receiver: (a) circuit with parasitic capacitances, (b) equivalent circuit.

D. Integrating Transimpedance Optical Receivers

The transimpedance amplifier circuit now is sometimes used as an integrating receiver (Gloge et al., 1980; Ogawa et al., 1983); this reduces the large equalization ratio often required with the simple integrating amplifier of Fig. 11. The best conventional high-value resistors have C_R ≈ 0.05 pF. Thus, for the 45-Mb/s case discussed earlier, taking R_F = 1 MΩ, the amplifier must integrate the signal above f_pR = 3.2 MHz. Equalization is still required, though the equalization ratio is typically 10-20 times less than for the simple integrating receiver example of Fig. 11. The equalizer noise penalty and the dynamic range/sensitivity tradeoffs for a nonencoded bit stream are correspondingly improved; at low bit rates (B < 100 Mb/s), further improvement might be obtained by active equalization. However, equalization noise is entirely eliminated for all bit rates by the nonintegrating receivers of Section V; the IC implementations are also much cheaper to manufacture than equalized receivers.

Both the simple integrating amplifiers of the BPO (Hooper et al., 1980; D. R. Smith et al., 1980) and the transimpedance integrating amplifiers of Ogawa et al. (1983) use a forward voltage amplifier with a FET input transistor followed by a bipolar junction transistor (BJT) cascode. The folded cascode of Ogawa et al. is illustrated in Fig. 14. The input transistor Q1 is a 1-µm gate-length GaAs FET for low noise; this is followed by a level-shifting PNP cascode stage Q2 and an emitter-follower output buffer Q3. The input voltage, v_in, creates a signal current v_in g_m1 at the drain of Q1. The emitter of Q2 absorbs most of this current because it is a forward-biased diode and acts like a short. The Q1 drain signal current passes through Q2 to R_c2; the amplifier voltage gain is then

$$A = -g_{m1} R_{c2}, \tag{51}$$

where g_m1 is the input FET transconductance. Typically these amplifiers are implemented in hybrid IC (HIC) form (~1-mil conductor and resistor patterns on a ceramic substrate with discrete semiconductor device chips). For the BPO-type receiver, the input resistor would be connected to a bias source; for a transimpedance design, the input resistor is connected to the output. One problem with GaAs FET transimpedance integrating amplifiers using the voltage gain element of Fig. 14 is that their response pole frequency, f_p, is

FIG. 14. FET-BJT cascode voltage amplifier.

approximately inversely proportional to the absolute temperature. For a -25°C to +85°C temperature range in the field, f_p would vary by 1.45:1. The equalizer zero frequency must track this variation in the receiver pole frequency for reliable operation; such tracking is difficult to achieve in practice. This problem arises because f_p is approximately proportional to the voltage gain A (Eq. 50), A is proportional to the input FET transconductance g_m1 (Eq. 51), and g_m1 is proportional to the electron velocity in the channel. Now the electron velocity in GaAs near 300 K is approximately inversely proportional to the absolute temperature, due to phonon scattering (Ruch and Fawcett, 1970). This applies from low fields through to velocity saturation. Thus, f_p goes approximately as 1/T, too.

The simple integrating receiver with restricted disparity encoding and limited equalization used by the BPO (Hooper et al., 1980; D. R. Smith et al., 1980) neatly side-steps the equalization tracking problem. The encoding removes the low frequencies from the signal; since no feedback is used, the pole frequency (above which the signal is integrated) is below the lowest frequency signal component. Exactly where the pole is does not matter; the equalizer can simply differentiate over the entire spectrum of the signal. The BPO design also allows ac coupling throughout the amplifier. Of course, the nonintegrating, high-sensitivity active-feedback receivers of Section V avoid equalization problems entirely.

Finally, for both types of integrating receiver, the maximum photocurrent is still restricted by the dc drop across R_F, even if encoding is used. For the 45-Mb/s case cited, assuming 5 V supplies, this corresponds to a maximum


photocurrent of about 2 to 4 µA, or a dynamic range of about 22 dB. Thus, the best integrating receiver dynamic range still implies the need for field-installed attenuators in many applications.
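The corner and pole frequencies quoted in Sections IV.C and IV.D can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the temperature scaling simply assumes the 1/T dependence stated in the text.

```python
import math

def feedback_resistor_corner(r_f, c_r):
    """f_pR = 1/(2*pi*R_F*C_R): above this the feedback resistor looks capacitive."""
    return 1.0 / (2.0 * math.pi * r_f * c_r)

def transimpedance_pole(r_f, c_r, c_t, gain):
    """Exact pole f_p = 1/[2*pi*R_F*(C_R + C_T/(A + 1))] quoted for Fig. 13b."""
    return 1.0 / (2.0 * math.pi * r_f * (c_r + c_t / (gain + 1.0)))

def pole_temperature_ratio(t_min_c, t_max_c):
    """Ratio of f_p at the cold and hot extremes, assuming f_p ~ 1/T (T in kelvin)."""
    return (t_max_c + 273.15) / (t_min_c + 273.15)

# Values from the text: R_F = 1 Mohm, C_R = 0.05 pF, -25 C to +85 C.
f_pr = feedback_resistor_corner(1e6, 0.05e-12)
print(f"f_pR = {f_pr / 1e6:.2f} MHz")
print(f"f_p ratio = {pole_temperature_ratio(-25.0, 85.0):.2f}:1")
```

With these inputs the corner frequency comes out near 3.2 MHz and the temperature ratio near 1.44:1, in line with the figures quoted above. Because C_R adds directly to C_T/(A + 1) in the pole expression, no amount of gain can raise f_p above f_pR.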

V. ACTIVE-FEEDBACK LIGHTWAVE RECEIVER CIRCUITS

A. Introduction

The new active-feedback receivers of this section are the first lightwave receiver circuits to achieve high sensitivities without integrating the signal and the first to achieve wide dynamic ranges without compromising the sensitivity. In addition, the IC designs can be realized very inexpensively for commercial applications. The high sensitivity, plus the response, dynamic range, and cost advantages mean that these new lightwave receivers are suitable for local-area-network, datalink, and loop-plant applications, as well as for the long-haul transmission systems addressed by most previous receiver designs.

The previous lightwave receiver designs of Section IV offered a choice between high-sensitivity circuits that integrate the signal and lower-sensitivity transimpedance circuits that do not. The high-sensitivity circuits were eventually adopted for a few leading-edge transmission systems; however, because they integrate the signal, they have limited dynamic range and require subsequent equalization, with its attendant problems and sensitivity penalty. In addition, the data stream often must be encoded, introducing another modest sensitivity penalty. Finally, it would be almost impossible to extend the dynamic range of these integrating receivers; most possible techniques would vary the input pole frequency above which the signal is integrated, thus requiring a tracking equalizer.

The active-feedback receivers described in this section comprise a basic nonintegrating high-sensitivity receiver, plus a dynamic-range-extender/automatic-gain-control (AGC) circuit. The basic receiver can be realized using either the capacitive feedback circuit or the micro-FET feedback circuit; the former is preferred for hybrid IC designs, the latter is preferred for monolithic IC designs. Both basic receivers offer excellent sensitivity, but their dynamic range is little better than that of previous designs. Accordingly, the dynamic range extender circuits are necessary for most loop, local-area-network, and datalink applications and are useful for many transmission applications.

These active-feedback receivers are somewhat more sensitive than the integrating receivers of the prior art; the equalizer and its noise penalty are eliminated, encoding and its noise penalty are unnecessary, and the dynamic-range-extender circuits permit the basic receivers to be optimized solely for


sensitivity, eliminating all sensitivity versus dynamic-range tradeoffs. The dynamic range is essentially unlimited (54 dB optical, 108 dB electrical demonstrated) and extends to higher optical powers than available from present transmitters. A hybrid IC active-feedback receiver was demonstrated and monolithic IC active-feedback receivers were proposed by Williams (1982, 1985); the first such IC was demonstrated by Fraser et al. (1983). IC versions are now in production by AT&T Technologies (Morrison, 1984; Steininger and Swanson, 1986), and a 1.7-Gb/s implementation has been announced (Dorman et al., 1987). However, these commercial designs are beyond the scope of this section.

In theory, the ideal transimpedance amplifier of Fig. 12a (Section IV.C) could have been used for the basic nonintegrating, high-sensitivity receiver. However, as was discussed in Section IV.C, previous attempts to realize this basic receiver encountered two problems, neither of which was solved in those designs. Both problems were caused by the need to use a large-value feedback resistor R_F for low Johnson noise.

The first problem was that the gain elements used did not have enough voltage gain to avoid signal integration by the receiver input capacitance if a large-value feedback resistor was used. Adding stages to these designs would have added phase shift within the feedback loop, which typically would cause ringing or instability. Solutions to this problem will be covered in the sections following.

The second, and more significant, problem was that conventional large-value feedback resistors act like feedback capacitors over most of the bandwidth, owing to their parasitic shunt capacitance C_R (Fig. 13a). This causes signal integration over most of the bandwidth; thus, using a large-value conventional feedback resistor for high sensitivity gives the integrating transimpedance receiver of Section IV.D. To avoid this problem in a typical 45-Mb/s design with R_F = 2 MΩ would require C_R to be less than about 0.001 pF, which is impossible with conventional feedback resistors.

There are two ways to solve the problem of the parasitic feedback capacitance, C_R, and make a nonintegrating high-sensitivity receiver; C_R can either be eliminated by use of a nonconventional feedback resistor or C_R can be used as part of the feedback element. Section V.B discusses the micro-FET feedback basic receiver, in which C_R is eliminated by using a special-design micro-FET as the feedback resistor; these designs are preferred for IC implementations. Section V.C discusses the capacitive feedback basic receiver, in which C_R is used as part of the feedback element; these designs are preferred for hybrid IC (HIC) implementations. Section V.D then describes the dynamic-range-extender/AGC circuits used with these two basic receiver


circuits to form the complete high-sensitivity, wide-dynamic-range, nonintegrating receivers. The remaining four sections discuss complete active-feedback receiver designs. Section V.E describes a hybrid IC design using the capacitive-feedback basic receiver circuit. Section V.F describes FET IC designs using the micro-FET-feedback basic receiver. The IC designs are preferred for commercial applications. Two principal IC circuit examples are presented, with a discussion of NMOS, CMOS, and GaAs implementations. Section V.G discusses how these IC receiver designs are scaled to different bit rates; Section V.H then presents calculations of IC receiver sensitivities for bit rates between 10 Mb/s and 4 Gb/s. These calculations include sensitivities of receivers using present pin-photodiode and FET IC technologies, plus sensitivities to be expected from probable future receivers, given expected advances in pin-photodiode and FET IC technologies. The sensitivity improvements expected in these future-technology receivers are then explained on the basis of the device figures-of-merit of Section III.B.2.

Finally, note that although the new active-feedback receiver designs behave much like the ideal transimpedance receivers of theory, the circuits are quite different. Both active-feedback designs use unconventional feedback elements never contemplated in the simple ideal-transimpedance receiver; for example, both include an active component (e.g., a transistor) in the feedback circuit. A second difference is that both active-feedback receivers usually include one of the new dynamic-range-extender/AGC circuits, which gives these receivers the dynamic range needed for most loop, local-area-network, and datalink applications. The new A

FIG. 15. Micro-FET feedback nonintegrating receiver: (a) circuit, (b) simplified equivalent circuit. [After Williams, 1982, 1985.]

performance, yet can be realized very inexpensively in IC form. Thus, these designs are preferred for commercial applications; typically, one of the dynamic-range-extender circuits of Section V.D is then integrated on the same chip as the basic receiver.

The three design goals for these new receivers are a high sensitivity, a wide-band nonintegrating response, and stability against ringing or oscillation. The noise analysis of Section III.B says that a high-sensitivity receiver must use a high-resistance feedback element for low Johnson noise, must have the minimum possible input capacitance C_T (typically ≤ 1 pF), and, at present, should be realized in either fine-line GaAs FET or fine-line silicon MOSFET IC technologies. Conventional large-value feedback resistors cannot be realized in either IC technology; the feedback micro-FET used in the new receivers is easy to make in both.

For a wide-band nonintegrating response, the current feedback must be independent of frequency; as was mentioned in Section V.A, even a small parasitic shunt feedback capacitance associated with a high-resistance feedback element will cause signal integration. In the example of a 45-Mb/s receiver with a feedback resistance of 2 MΩ, the parasitic shunt capacitance had to be less than 0.001 pF, which was impossible with conventional feedback resistors. This parasitic shunt capacitance is essentially eliminated in a properly designed feedback micro-FET. Finally, the


current feedback through Q_F must be large enough to avoid signal integration by the receiver input capacitance C_T. Since the feedback resistance R_F = r_sd must be large, the forward voltage gain A must be large. The problem is how to achieve this large voltage gain without inducing instability or oscillations; fortunately, the stability requirements for the micro-FET receiver are very lenient.

This section first discusses the micro-FET feedback element used in these receivers, then covers the forward voltage gain and stability requirements. The gain and stability requirements indicate that the forward voltage amplifier used in these receivers can be a simple multistage design, with a bandwidth only slightly greater than the photocurrent signal bandwidth. Such voltage amplifiers are readily realized in any of the fine-line FET IC technologies.

The special-design feedback micro-FET (Q_F) used in these receivers (Fig. 15a) acts as a feedback resistor (Fig. 15b) because it is operated near zero source-to-drain voltage, in the linear drain-current versus drain-voltage region. The channel is then an undepleted bar of semiconductor with the number of carriers, hence resistance, determined by the gate bias. The parasitic feedback capacitances are tiny (C_R ≈ 0.001 pF) because Q_F is tiny, typically 1-2 µm wide by 1-5 µm from source to drain, with a total area of only a few square micrometers. Such transistors are readily fabricated in the high-f_T/low-noise 1-µm fine-line silicon and GaAs FET technologies presently preferred for IC receivers. Feedback resistances greater than 1 MΩ are readily achieved by biasing the gate of Q_F barely above the turn-on threshold. Thus, Q_F functions as an ideal feedback resistor with a negligible parasitic feedback capacitance and makes possible high-sensitivity, nonintegrating IC receivers.

Unfortunately, Q_F only acts as a resistor for voltages less than the gate voltage above the turn-on threshold. The channel of Q_F pinches off whenever the output voltage swing is greater than the gate voltage above threshold; Q_F then acts as a constant current source. Thus, the maximum photocurrent through Q_F is approximately the gate voltage above threshold divided by the feedback resistance; accordingly, the dynamic range of the basic IC receiver of Fig. 15a is somewhat limited. In fact, the resistive micro-FET feedback receiver would be commercially impractical for some applications if used without a dynamic-range-extender circuit; fortunately, the dynamic-range extenders of Section V.D are readily integrated on the same IC as the basic receiver at essentially no additional cost. Furthermore, most applications (e.g., datalinks, local-area networks, or loop feeders) would require a dynamic-range extender in any case. The complete IC receiver designs of Section V.F include both the basic micro-FET feedback receiver and a dynamic-range extender on the same IC.

The forward voltage gain, A, must be large enough so that the current feedback through Q_F is sufficient to keep the photocurrent signal from being


integrated by the receiver input capacitance C_T. (C_T is the sum of the photodiode capacitance C_pin, plus C_FET due to the receiver input FET, plus the input stray capacitance C_S.) Since the simplified equivalent circuit of Fig. 15b is identical to the ideal transimpedance amplifier of Fig. 12a (Section IV.C), the receiver bandwidth is given by Eq. (50), taking R_F = r_sd of Q_F. For a nonreturn-to-zero bit stream, the signal bandwidth is 0.56 times the bit rate B (Section III.B.1). If the receiver bandwidth is set equal to the bit rate B, the minimum forward gain is then

$$A = 2\pi R_F C_T B - 1. \tag{52}$$

For high sensitivity, C_T is made as small as possible and R_F is made large enough so that its Johnson noise current is minor compared to the noise of the receiver input FET. For the 45-Mb/s receiver example, in which C_T was 1 pF and R_F was 2 MΩ, Eq. (52) gives a minimum voltage gain, A, of about 570. The principal stability requirement is that the voltage-amplifier pole frequencies be above the receiver bandwidth so that the phase shift in the voltage gain A is less than 45° over the receiver bandwidth. Now the receiver photocurrent response was determined by the current feedback through Q_F, which therefore had to be frequency independent; the stability is determined by the voltage feedback, which is not frequency independent. The voltage feedback is generated by the frequency-independent current feedback driving the input capacitance C_T (Fig. 15b). Thus, the voltage feedback is inversely proportional to frequency over most of the loop bandwidth because the feedback current is integrated by C_T for frequencies above the R_F C_T voltage-feedback pole frequency. (For the 45-Mb/s receiver example with R_F = 1 MΩ and C_T = 1 pF, the voltage-feedback pole frequency is 159 kHz.) Thus, the voltage-feedback pole must be used as the dominant voltage-loop pole; since the voltage feedback gives a 90° phase shift over most of the loop bandwidth, 90° more due to poles in A would give positive feedback and cause oscillation. In practice, the phase shift in A should remain less than 45° as the frequency is increased, until the R_F C_T feedback integration carries the loop voltage gain below unity. Mathematically, this unity-loop-gain frequency works out to be equal to the photocurrent-to-output-voltage response-pole frequency, or receiver bandwidth; thus, the stability requirement is that the phase shift in the forward voltage gain A be less than 45° over the receiver bandwidth.

Thus, the voltage amplifier used in these IC receivers can be a simple, multistage video amplifier, with a bandwidth a little greater than the receiver bandwidth so that the phase shift in A is less than 45° over the receiver bandwidth. The out-of-band rolloff is not important because the out-of-band voltage-loop gain is less than unity. These lenient stability requirements allow these receiver designs to be scaled to bit rates well in excess of 1 Gb/s.
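The gain and stability figures of this section can be reproduced numerically. The sketch below is a hedged illustration, not the author's design procedure: the function names are hypothetical, and the unity-loop-gain estimate assumes the loop voltage gain falls as A·f_pole/f well above the R_F C_T pole.

```python
import math

def minimum_forward_gain(r_f, c_t, bit_rate):
    """Eq. (52): A = 2*pi*R_F*C_T*B - 1, for receiver bandwidth equal to B."""
    return 2.0 * math.pi * r_f * c_t * bit_rate - 1.0

def voltage_feedback_pole(r_f, c_t):
    """Pole 1/(2*pi*R_F*C_T) above which C_T integrates the feedback current."""
    return 1.0 / (2.0 * math.pi * r_f * c_t)

def unity_loop_gain_frequency(gain, r_f, c_t):
    """Frequency where the loop voltage gain A*f_pole/f falls to unity (f >> f_pole)."""
    return gain * voltage_feedback_pole(r_f, c_t)

# 45-Mb/s example from the text: C_T = 1 pF, R_F = 2 Mohm.
a_min = minimum_forward_gain(2e6, 1e-12, 45e6)
f_unity = unity_loop_gain_frequency(a_min, 2e6, 1e-12)
print(f"A_min = {a_min:.0f}")
print(f"pole (R_F = 1 Mohm) = {voltage_feedback_pole(1e6, 1e-12) / 1e3:.0f} kHz")
print(f"unity loop gain at {f_unity / 1e6:.1f} MHz")
```

For these values the minimum gain evaluates to roughly 565, which the text rounds to 570, and the unity-loop-gain frequency lands just under the 45-MHz receiver bandwidth, consistent with the statement that the two coincide for large A.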


C. Capacitive Feedback Receivers

Hybrid IC active-feedback receivers use the capacitive-feedback, nonintegrating current-amplifier circuit of Fig. 16 (Williams, 1982, 1985; Williams and LeBlanc, 1986); as mentioned, any conventional large-value hybrid-technology feedback resistor is in fact a resistance in parallel with a parasitic capacitance, C_R, and acts as a feedback capacitor over most of the bandwidth. The feedback current, i_f, is (approximately) the total feedback capacitance C_F = C_F' + C_R times the time derivative of the applied voltage, v_f; v_f is the integral of the output voltage, v_o. Since i_f is the derivative of the integral of v_o, i_f is proportional to v_o; this capacitive feedback element driven by an integrator acts like an ideal feedback resistor. Thus, the capacitive feedback amplifier functions as an ideal nonintegrating transimpedance current amplifier. As mentioned in Section V.A, the usable hybrid IC voltage gain A is low because of the extra gain rolloff and phase shift caused by the relatively large hybrid circuit capacitances. In the capacitive feedback current amplifier, the missing gain is supplied by the feedback integrator, which is typically an ordinary amplifier stage with an extra collector capacitance to make it integrate. In addition, the photodiode capacitance C_pin is almost always used

FIG. 16. Capacitive feedback nonintegrating receiver: (a) basic circuit (C_F' is optional), (b) photodiode capacitance feedback receiver (C_pin acts as C_F'). [After Williams, 1982, 1985.]


as part of the feedback element by ac coupling v_f to the bias side of the photodiode, as shown in Fig. 16b; the high-frequency feedback impedance is then about 10² times smaller than that of R_F alone. The input-voltage-to-v_f high-frequency gain needed is reduced by the same factor (Section V.E). Note that using the photodiode capacitance as C_F' adds no extra capacitance to the front-end capacitance C_T and therefore does not reduce the receiver sensitivity. A complete hybrid IC capacitive feedback receiver using capacitive feedback through the photodiode will be discussed in Section V.E; the frequency-response and loop-stability considerations will be presented with that circuit.
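The claim that a capacitor driven by an integrator behaves like a resistor can be illustrated numerically. The following Python sketch is an assumption-laden toy model rather than a circuit simulation: the integrator gain constant a, the capacitance, the time step, and the test tone are all invented values, and the function name is hypothetical.

```python
import math

def feedback_current(v_out, dt, a, c_f):
    """Integrate v_o to get v_f, then differentiate: i_f = C_F * d(v_f)/dt."""
    v_f = [0.0]
    for k in range(1, len(v_out)):
        # trapezoidal integration: v_f = a * integral of v_o dt
        v_f.append(v_f[-1] + a * 0.5 * (v_out[k] + v_out[k - 1]) * dt)
    # backward difference gives the capacitor current at each later sample
    return [c_f * (v_f[k] - v_f[k - 1]) / dt for k in range(1, len(v_f))]

dt = 1e-10        # 0.1 ns time step (assumed)
a = 1e7           # feedback integrator gain constant, 1/s (assumed)
c_f = 0.5e-12     # total feedback capacitance C_F, farads (assumed)
v_out = [math.sin(2 * math.pi * 20e6 * k * dt) for k in range(2000)]

i_f = feedback_current(v_out, dt, a, c_f)
r_equiv = 1.0 / (a * c_f)   # equivalent feedback resistance seen by v_o
worst = max(abs(i_f[k - 1] - v_out[k] / r_equiv) for k in range(1, len(v_out)))
print(f"equivalent resistance = {r_equiv / 1e3:.0f} kohm")
print(f"worst-case mismatch = {worst * r_equiv:.4f} of peak")
```

The computed feedback current tracks v_o divided by 1/(a·C_F) to within the discretization error of the time step, which is the sense in which the integrator-plus-capacitor combination acts as a frequency-independent feedback resistance.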


D. Dynamic Range Extenders

Both the resistive FET feedback IC basic receiver of Fig. 15 and the capacitance feedback hybrid IC basic receiver of Fig. 16 offer excellent sensitivity, but their dynamic range is typically little better than that of previous designs and is not enough for many commercial applications. Therefore, the preferred receiver designs also include dynamic-range extender/AGC circuits. Both the IC and the hybrid nonintegrating basic receivers can be represented by the equivalent circuit of Fig. 15b because both act as ideal transimpedance amplifiers (Fig. 12a). Therefore, the same AGC circuit works for both.

To the AGC circuit, both basic receivers look like a voltage amplifier, of gain -A, with the input shunted to ground by an almost noiseless virtual input resistor, r_e, produced by the feedback, as was shown in Fig. 12b. The input is also shunted by the total capacitance, C_T; however, to avoid signal integration, the design is such that r_e is low enough to be the dominant input admittance over the signal bandwidth.

The preferred dynamic-range extension/AGC technique (Williams, 1982) is to add a variable-resistance input shunt device R_s to divert the excess photocurrent from the input of the high-sensitivity basic receiver (Fig. 17a); R_s and r_e divide the photocurrent. (Note that in Fig. 17a, the basic receiver is represented by the equivalent circuit of Fig. 12b.) Usually (Fig. 17b) the variable input shunt device is a FET (Q_s) operated in the linear drain-current-versus-drain-voltage region; the shunt resistance is the source-to-drain resistance of Q_s. The value of R_s is controlled by the gate voltage of Q_s. The AGC servo circuit varies the value of R_s to maintain the amplifier output signal voltage, v_o, constant over the input shunt AGC range. The basic receiver shown in Fig. 17b is the resistive FET feedback IC design of Fig. 15a.

The input shunt cannot be used for AGC at low photocurrents without ruining the sensitivity. The reason is that R_s must be comparable to or less

FIG. 17. Receivers with input shunt AGC: (a) general circuit, (b) FET input shunt AGC on basic IC receiver of Fig. 15a, (c) AGC characteristic. [After Williams (1982).]

than r_e to divert any appreciable photocurrent, and R_s, a real resistance, has Johnson noise, although r_e, a virtual resistance, does not. At moderate bit rates (10-100 Mb/s), r_e is typically 100-1000 times smaller than the feedback resistance R_F; since R_s ≤ r_e, the Johnson noise is typically increased 10-30 times (⟨i²⟩^{1/2} ∝ R^{-1/2}) when the shunt is used. Therefore, the shunt is turned on only when the photocurrent is large enough that the extra Johnson noise does not matter. Above this AGC threshold, R_s is servoed to maintain the peak-to-peak v_o signal constant; below this threshold R_s or Q_s is OFF and any AGC at low photocurrents must be provided by a post-amplifier stage. The resultant receiver AGC characteristic is shown in Fig. 17c.
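The current division between R_s and r_e and the associated noise penalty can be put into numbers. The sketch below is illustrative only: the function names are hypothetical, the noise ratio uses the approximation R_s much smaller than R_F, and the example values of r_e and the threshold current are assumptions.

```python
import math

def fraction_into_receiver(r_s, r_e):
    """Current divider: share of the photocurrent that still enters r_e."""
    return r_s / (r_s + r_e)

def shunt_for_constant_output(i_p, i_threshold, r_e):
    """R_s that holds the receiver input current at the AGC threshold value."""
    if i_p <= i_threshold:
        return math.inf          # shunt OFF below the AGC threshold
    return r_e * i_threshold / (i_p - i_threshold)

def johnson_noise_increase(r_f, r_s):
    """Ratio of rms Johnson noise currents, using <i^2>^(1/2) ~ R^(-1/2), R_s << R_F."""
    return math.sqrt(r_f / r_s)

r_f = 1e6                                   # feedback resistance (assumed 1 Mohm)
for ratio in (100.0, 1000.0):               # R_F / R_s range quoted in the text
    print(f"R_F/R_s = {ratio:.0f}: noise x{johnson_noise_increase(r_f, r_f / ratio):.1f}")

# Photocurrent ten times the threshold with r_e = 2 kohm (assumed values).
r_s = shunt_for_constant_output(1e-6, 1e-7, 2e3)
print(f"R_s = {r_s:.0f} ohm, share into receiver = {fraction_into_receiver(r_s, 2e3):.2f}")
```

The square-root dependence is why a 100-to-1000-fold drop in resistance raises the rms noise current by only 10 to about 32 times, matching the 10-30 times range quoted above.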


This AGC technique is implemented in the receiver of Fig. 17b by having the AGC servo circuit vary the gate voltage of input shunt Q_s to limit the output signal voltage, v_o, to a value equal to the desired AGC threshold photocurrent times the feedback resistance R_F of Q_F. For photocurrents less than the AGC threshold, v_o is less than the limiting value, the gate of Q_s is biased below the turn-on voltage, and Q_s is OFF. For photocurrents greater than the AGC threshold, the gate voltage is servoed to vary the source-to-drain resistance of Q_s so that the amplitude of the v_o signal is held constant. The AGC servo itself is typically just a peak detector followed by a slow integrator that drives the gate of Q_s with the integral of the difference between the peak detector output and a reference voltage.

Note that shunt AGC does not change the signal-frequency response of the receiver; the input pole frequency is already above the signal passband due to the virtual input resistance, r_e, caused by the feedback; R_s just moves the pole frequency even higher. Note also that there are no stability problems; the shunt AGC decreases the voltage feedback ratio, which actually improves the stability. In contrast, transimpedance AGC schemes, which use a variable-resistance feedback element, increase the voltage feedback ratio as AGC is applied. If this variable feedback were applied around the whole amplifier, the AGC range before the onset of instability would be limited unless the forward gain A were decreased as R_F was decreased. Therefore, shunt AGC is usually preferred at present, though transimpedance AGC applied around the first stage only may be used in IC versions for increased dynamic range (see Section V.F).

The variable input shunt R_s must usually be connected to a dc bias source so that the amplifier input bias does not change when the shunt is turned on; any change is multiplied by the voltage gain A. In high-bit-rate IC implementations the shunt bias source is typically an on-chip fixed-voltage source accurately matched to the input stage. (The matching is least critical in high-bit-rate designs because the forward voltage gain is low, due to the design scaling laws of Section V.G.) In low-bit-rate designs, a slow shunt-bias feedback integrator or a digital equivalent is typically used; these also can readily be integrated on a receiver IC.

This shunt AGC circuit is adequate for many applications; however, for further dynamic-range extension, the forward voltage gain A can be decreased above a second AGC threshold after the input shunt FET (Q_s) has been servoed to its minimum resistance. Any voltage-gain reduction technique that preserves the bandwidth of the voltage gain and has the dynamic range needed can be used. Decreases of 50:1 in A, with corresponding increases in the maximum photocurrent, are readily achieved.
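The peak-detector-plus-slow-integrator servo described above can be sketched as a simple discrete-time loop. Everything in this Python example is an assumption made for illustration: the function name, the component values, the ideal peak detector, and the model of Q_s as a conductance proportional to gate overdrive. The loop gain is chosen small enough to converge for photocurrents within a decade or so of the threshold; larger inputs would need a slower loop.

```python
def simulate_shunt_agc(i_p, r_f=1e6, r_e=2e3, v_ref=0.1,
                       k_s=1e-2, loop_gain=0.02, steps=5000):
    """Slow-integrator servo: raise the gate drive of Q_s until peak v_o = v_ref.

    The shunt conductance is modeled as k_s * overdrive (linear-region FET);
    all component values here are illustrative assumptions.
    """
    overdrive = 0.0                       # gate voltage above threshold; 0 means OFF
    v_peak = i_p * r_f
    for _ in range(steps):
        g_shunt = k_s * overdrive         # 1/R_s
        share = (1.0 / r_e) / (1.0 / r_e + g_shunt)
        v_peak = i_p * share * r_f        # ideal peak detector output
        # integrate the error; clamp at 0 so Q_s stays OFF below threshold
        overdrive = max(0.0, overdrive + loop_gain * (v_peak - v_ref))
    return v_peak, overdrive

# Threshold photocurrent is v_ref / r_f = 0.1 uA with the defaults above.
v_hi, od_hi = simulate_shunt_agc(1e-6)    # ten times the threshold
v_lo, od_lo = simulate_shunt_agc(5e-8)    # half the threshold
print(f"above threshold: v_o = {v_hi:.3f} V, overdrive = {od_hi:.2f} V")
print(f"below threshold: v_o = {v_lo:.3f} V, overdrive = {od_lo:.2f} V")
```

Above the threshold the loop settles with the output pinned at the reference level; below it the integrator is held at its lower limit, the shunt stays OFF, and the output simply follows the photocurrent, which is the characteristic sketched in Fig. 17c.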


Note that although decreasing A decreases the current feedback, thus increasing r_e, the minimum R_s is much less than r_e and prevents signal integration by C_T.

For full AGC, this results in a three-stage AGC system (Fig. 18): post-amplifier AGC for the lowest photocurrents (AGC-1), followed by input shunt AGC (AGC-2), followed by reduction of the forward voltage gain, A, of the receiver (AGC-3). Both the receiver and post amplifier plus the control circuitry can readily be realized on one IC. A hybrid IC version (Williams, 1982) was tested at 45 Mb/s, using a modified AT&T Technologies FT3 regenerator board (of T. L. Maione et al., 1978); a dynamic range of 54 dB optical (108 dB electrical) was achieved (Williams and LeBlanc, 1986) and the component count was less than that of the unmodified FT3 board.

Another way to increase the AGC dynamic range would simply be to increase the physical size of the input shunt FET so that voltage gain reduction would be unnecessary. At moderate bit rates, r_e is typically a few thousand ohms (for C_T ≈ 1 pF); the dynamic range without AGC is typically 20 or 25 dB. For a 50-dB-plus dynamic range, the maximum photocurrent must be increased by three orders of magnitude. Thus, the minimum R_s would be only a few ohms; Q_s would then be larger than the amplifier input transistor. The extra input capacitance due to Q_s decreases the sensitivity; the dominant mean-square photocurrent noise terms go as C_T² (Section III.B). The reason for this problem is that the maximum input-voltage swing without voltage-gain reduction is typically only a few millivolts in receivers designed for moderate bit rates; this is why Q_s would have to be a power FET with a saturation resistance of only a few ohms to handle a maximum

FIG. 18. Basic IC receiver with three-stage AGC.


photocurrent of only a few milliamperes. The voltage-gain reduction technique increases the input voltage swing; this drives the same photocurrent through a small, higher-resistance Q_s. The capacitance of this small Q_s is negligible, eliminating any sensitivity penalty. Thus, the input-shunt/voltage-gain-reduction AGC is preferred for dynamic ranges of 50 dB or more. Such dynamic ranges are presently found only in low to moderate bit-rate (less than ~200 Mb/s) systems using semiconductor laser transmitters. These systems will become more common as semiconductor lasers become cheaper. Shunt AGC only is preferred for dynamic ranges of 40 dB or less, which is adequate for present LED-based systems, even at low bit rates, and for present laser-based systems at high bit rates. (High-bit-rate receivers are less sensitive and require less dynamic range for a given transmitter power.) Future lasers, however, may be more powerful and would require more dynamic range, even at high bit rates. A transimpedance AGC circuit for IC receivers is discussed in Section V.F.
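The dynamic-range figures in this section follow from the fact that photocurrent is proportional to optical power. The short Python sketch below works through that arithmetic; the function names are hypothetical, and the 3-kΩ value for r_e is an assumed instance of "a few thousand ohms".

```python
def optical_to_electrical_db(optical_db):
    """Photocurrent is proportional to optical power, so electrical dB = 2 x optical dB."""
    return 2.0 * optical_db

def photocurrent_ratio(optical_db):
    """Ratio of maximum to minimum photocurrent for a given optical dynamic range."""
    return 10.0 ** (optical_db / 10.0)

def minimum_shunt_resistance(r_e, extra_optical_db):
    """R_s that diverts enough current to add the extra range with no gain reduction."""
    # current divider: i_in / i_p = R_s / (R_s + r_e) = 1 / N  ->  R_s = r_e / (N - 1)
    return r_e / (photocurrent_ratio(extra_optical_db) - 1.0)

print(f"54 dB optical = {optical_to_electrical_db(54.0):.0f} dB electrical")
print(f"30 dB extra range = x{photocurrent_ratio(30.0):.0f} in photocurrent")
print(f"minimum R_s for r_e = 3 kohm: {minimum_shunt_resistance(3e3, 30.0):.1f} ohm")
```

Extending a 20-to-25-dB basic range to 50 dB or more therefore needs roughly a thousandfold increase in maximum photocurrent, and a shunt-only solution would push R_s down to a few ohms, which is the large-FET penalty described above.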

E. Hybrid IC Active-Feedback Receivers

Figure 19 shows a capacitive-feedback, high-sensitivity receiver circuit with input shunt AGC, as developed for hybrid IC (HIC) optical receivers (Williams, 1982, 1985). A 45-Mb/s version achieved a sensitivity of -51.7 dBm at a 1.3-µm wavelength and 10⁻⁹ BER, with a dynamic range of 54 dB optical (Williams and LeBlanc, 1986); the sensitivity was within 1.5 dB of the best 1.3-µm APD result at the bit rate (Forrest et al., 1981), and the dynamic range extended to an average optical power of 1.8 mW. However, the IC versions of Section V.F will soon exceed this performance and are cheaper to manufacture. On the other hand, the HIC designs of this section are much easier to demonstrate in a laboratory and are particularly useful for early tests and models of receivers using new transistor technologies that are not yet available in IC form.

At low photocurrents, the input shunt FET Q_s is turned off by the AGC servo (Section V.D), and the circuit reduces to the photodiode capacitance feedback receiver of Fig. 16b. The forward voltage amplifier is a FET-BJT folded cascode with an emitter-follower output buffer (Fig. 14). This type of voltage amplifier was first disclosed by Ogawa and Chinnock (1979); it was also used in the integrating receivers of Ogawa et al. (1983), and its operation was discussed in Section IV.D. The feedback integrator is a common-base stage with an extra capacitance from the collector to ground that integrates the signal. The integrated feedback signal, v_f, is connected directly to the feedback resistor, R_F, and is ac-coupled to the bias side of the photodiode; thus, the photodiode capacitance, C_pin, is used as part of the feedback element.


LIGHTWAVE RECEIVERS

FIG. 19. Capacitive feedback HIC receiver circuit. [After Williams (1982).]

The AGC/dynamic-range-extender circuitry is identical to that of Fig. 18. At low photocurrents, $Q_s$ is off and AGC is provided by a variable-gain post amplifier. At larger photocurrents, where the extra Johnson noise due to $Q_s$ does not matter, the AGC circuit turns $Q_s$ on and uses it as a variable-resistance input shunt. The bias voltage for the variable shunt resistance is provided by a slow feedback integrator. At yet higher photocurrents, the resistance of $Q_s$ is at its minimum value, and the forward gain $A$ is reduced to provide a third stage of AGC. The AGC circuit reduces $A$ by reducing the input FET transconductance. It does this by lowering both the drain voltage and the drain current of the input FET, which is accomplished by reducing $V_{dd1}$ and the bias voltage on the base of $Q_2$ together. (Alternatively, a dual-gate input FET could have been used.)

Now consider the frequency response of the photodiode capacitance feedback amplifier without AGC. The feedback element is the resistance $R_F$ in parallel with the photodiode capacitance, $C_{pin}$, and the feedback resistor's parasitic feedback capacitance, $C_R$. The capacitances dominate over most of the bandwidth. Writing the total feedback capacitance as $C_F = C_{pin} + C_R$,


GARETH F. WILLIAMS

the feedback current, $i_f$, is approximately

$$i_f = C_F \frac{dv_f}{dt}, \tag{53}$$

where $v_f$ is the feedback voltage from the integrator. Writing

$$v_f = a \int v_o \, dt, \tag{54}$$

where $v_o$ is the output voltage and $a$ is the feedback integrator gain constant, gives

$$i_f = a C_F v_o. \tag{55}$$

$a C_F$ is the equivalent feedback conductance of the feedback element driven by the feedback integrator. Since $v_o = -A v_{in}$,

$$i_f = -A a C_F v_{in}, \tag{56}$$

and the equivalent input resistance $r_e$ due to the feedback is

$$r_e = -\frac{dv_{in}}{di_f} = (A a C_F)^{-1}. \tag{57}$$

The receiver bandwidth is simply the input pole frequency due to $r_e$ in parallel with the total input capacitance $C_T$:

$$f_c = \frac{1}{2\pi r_e C_T} = \frac{A a C_F}{2\pi C_T}. \tag{58}$$

If the receiver bandwidth is set equal to the bit rate, $B$, then

$$A a C_F = 2\pi C_T B. \tag{59}$$

This sets the minimum product $A a C_F$ of the forward gain times the feedback integrator gain constant times the feedback capacitance. Using $C_{pin}$ as part of $C_F$ typically increases $C_F$ by about a factor of 5-10 and reduces the forward voltage gain and feedback integrator gain constant needed accordingly. By Eq. (51), the forward voltage gain of Fig. 19 is

$$A = g_{m1} R_{c2}, \tag{60}$$

where $g_{m1}$ is the transconductance of the input FET $Q_1$ and $R_{c2}$ is the collector resistor of $Q_2$. The gain constant of the common base feedback integrator is approximately

$$a \approx \frac{1}{R_{eI} C_I}, \tag{61}$$

where $R_{eI}$ is the emitter resistor of the integrator transistor $Q_I$, and $C_I$ is the integrating capacitor on the collector of $Q_I$. (This expression assumes that the $Q_I$ emitter input impedance is much less than $R_{eI}$.) $a$ has the dimensions of inverse seconds, as required.

These circuits can readily be used up to 100-200 Mb/s because the voltage gain needed from $v_{in}$ to $v_f$ at the high-frequency bandwidth limit (i.e., at the transimpedance response pole, $f_c$) is only about 2-3. The extra gain needed at lower frequencies is supplied by the feedback integrator. The feedback integrator has a very wide bandwidth because the dominant collector pole is used as the integrator pole by adding the extra capacitance $C_I$.

The gain needed at $f_c = 1/(2\pi r_e C_T)$ is derived by noting that, at that frequency, the current into the total front-end capacitance, $C_T$, is equal to the current through $r_e$, i.e., to the feedback current through the feedback capacitance, $C_F$. Thus, the voltage gain required from $v_{in}$ to $v_f$ at $f_c$ is simply $C_T/C_F$. Now, $C_F = C_{pin} + C_R$ is essentially equal to $C_{pin}$. As before, $C_T = C_{pin} + C_{FET} + C_s$, where the stray capacitance $C_s$ (which includes $C_R$) is small compared to $C_{pin}$ and $C_{FET}$. For low noise, the size of the receiver input FET is scaled so that $C_{FET} = C_{pin} + C_s$ (Eq. 25a); thus, $C_T \approx 2C_{pin}$. Therefore $C_T/C_F$, the voltage gain needed from $v_{in}$ to $v_f$ at $f_c$, is only about 2-3.

At low frequencies, the parallel combination of $R_F$, $C_R$, and $C_{pin}$ acts like a feedback resistor rather than a feedback capacitance; in addition, the feedback integrator acts like a feedback amplifier at low frequencies. Therefore, at low frequencies, the capacitive feedback receiver essentially reduces to a conventional resistive feedback transimpedance receiver.
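As a rough numerical illustration of the relations above, the following Python sketch evaluates the loop product required by Eq. (59), the resulting equivalent input resistance, and the gain ratio $C_T/C_F$. The capacitance values, the forward gain of 50, and all function names are assumptions chosen only for illustration; they are not taken from the HIC receiver of Fig. 19.

```python
import math

def required_loop_product(c_total, bit_rate):
    """Minimum A*a*C_F (in siemens) for a bandwidth equal to the bit rate."""
    return 2.0 * math.pi * c_total * bit_rate

def equivalent_input_resistance(gain, a, c_feedback):
    """Equivalent input resistance r_e = 1 / (A * a * C_F), in ohms."""
    return 1.0 / (gain * a * c_feedback)

def bandwidth(gain, a, c_feedback, c_total):
    """Input pole frequency 1 / (2*pi*r_e*C_T), in hertz."""
    r_e = equivalent_input_resistance(gain, a, c_feedback)
    return 1.0 / (2.0 * math.pi * r_e * c_total)

def gain_needed_at_pole(c_total, c_feedback):
    """Voltage gain needed from v_in to v_f at the response pole: C_T / C_F."""
    return c_total / c_feedback

# Hypothetical component values (assumptions, not from the source).
C_PIN = 0.40e-12               # photodiode capacitance, F
C_R = 0.02e-12                 # feedback-resistor parasitic capacitance, F
C_S = 0.05e-12                 # total stray capacitance (includes C_R), F
C_FET = C_PIN + C_S            # input FET scaled for low noise
C_T = C_PIN + C_FET + C_S      # total input capacitance
C_F = C_PIN + C_R              # total feedback capacitance
BIT_RATE = 45e6                # bit/s
GAIN = 50.0                    # assumed forward voltage gain A

# Integrator gain constant a (1/s) that just meets the bandwidth target.
a = required_loop_product(C_T, BIT_RATE) / (GAIN * C_F)

print(f"a = {a:.3e} 1/s")
print(f"r_e = {equivalent_input_resistance(GAIN, a, C_F):.0f} ohm")
print(f"bandwidth = {bandwidth(GAIN, a, C_F, C_T) / 1e6:.1f} MHz")
print(f"C_T / C_F = {gain_needed_at_pole(C_T, C_F):.2f}")
```

With these assumed capacitances the ratio $C_T/C_F$ comes out slightly above 2, in line with the range quoted in the text.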
For a flat overall frequency response, the feedback zero frequency $f_z = 1/[2\pi R_F (C_{pin} + C_R)]$, below which the feedback is resistive rather than capacitive, must be equal to the feedback integrator pole frequency, $f_p = 1/(2\pi R_{cI} C_I)$, below which the feedback voltage is not integrated. Typically, $R_{cI}$ is trimmed accordingly. This compensation is relatively insensitive to temperature. The loop stability considerations are essentially identical to those discussed in Section V.B for the micro-FET feedback IC receivers; the phase shift due to poles in the forward voltage gain $A$ (plus any extra poles in the feedback integrator) should be less than 45° over the bandwidth of the receiver. (Since the main pole of the feedback integrator and the capacitive part of the feedback element compensate each other, they contribute no net phase shift.)

F. IC Active-Feedback Receivers


This section first presents two IC active-feedback receiver examples, with a discussion of implementations in NMOS, CMOS, and GaAs IC technologies. Both examples use the micro-FET feedback basic receiver of Section V.B; they differ in the design of the voltage amplifier and in their


dynamic-range-extender/AGC circuitry. The first example (Fig. 20) uses the input shunt AGC technique of Section V.D; this technique is presently preferred for most applications. The second example (Fig. 22) uses a transimpedance AGC circuit in which the AGC element is connected around the first stage only. This section concludes by discussing two design details concerning the amplifier response. The first extends the response discussion of Section V.B to include the parasitic gate-to-channel capacitance of the feedback FET, $Q_F$; the second describes how to approximately cancel, to first order, both the effect of this capacitance and the residual nonlinearities in $Q_F$. Both the capacitive effects and the nonlinear effects in $Q_F$ are readily minimized in the design of optimized single-bit-rate receivers; canceling these effects offers useful but minor advantages. However, in “universal” receivers designed to operate at many different bit rates, canceling these effects is very helpful indeed. Design scaling laws for these IC receivers are presented in Section V.G; sensitivity versus bit-rate calculations for present-technology implementations and for implementations in probable future technologies are presented in Section V.H.

Figure 20 shows a simplified diagram of a typical active-feedback, high-sensitivity receiver IC with input shunt AGC. This analog circuit is compatible with both silicon MOSFET and GaAs MESFET fine-line digital IC technologies. From an IC production viewpoint, it looks like a very small memory.

FIG. 20. Micro-FET feedback receiver IC with input shunt AGC. [After Williams (1982).]


This receiver circuit is a realization of the AGC receiver circuit of Fig. 17b discussed in Section V.D. The basic micro-FET receiver comprises a voltage amplifier of gain $-A$, formed by $Q_{1A}$ through $Q_4$, plus the micro-FET $Q_F$, which acts like a feedback resistor (Section V.B); $Q_B$ and $I_B$ form the gate bias voltage supply for $Q_F$. The dynamic-range-extender/AGC circuit comprises the input shunt FET, $Q_s$, the bias voltage supply ($Q_{SA}$ through $Q_{SC}$) for the source of $Q_s$, and an AGC control circuit (not shown). At low photocurrents, AGC is provided by a post amplifier (not shown); at higher photocurrents, AGC is provided by the input shunt FET $Q_s$. The third AGC stage, in which the forward voltage gain $A$ is reduced, is omitted from this example; stage-three AGC techniques will be discussed later.

As mentioned in Section V.B, $R_F$ is chosen to be large enough so that its Johnson noise (Eq. 8, Section III.B.1) is small compared to the input FET noise (Eq. 20), which is the majority of the noise in a well-designed receiver. The voltage gain, $A$, needed to keep the photocurrent signal from being integrated by the receiver input capacitance, $C_T$, is determined by $R_F$, $C_T$, and the bandwidth needed. If the receiver bandwidth is set equal to the bit rate, $B$, the minimum forward gain is then $A = 2\pi R_F C_T B - 1$ by Eq. (52) of Section V.B. Since $R_F$ scales inversely as the bit rate squared (Section V.G), the required voltage gain $A$ is inversely proportional to the bit rate; high-bit-rate designs need less voltage gain. Thus, this receiver design readily scales to high bit rates without requiring any increase in the number of stages and without stability problems. A bit rate of 1.7 Gb/s has been reported in a production version of this receiver (Dorman et al., 1987).

The voltage amplifier in the lightwave receiver circuit of Fig. 20 has three similar stages of the shunt feedback type demonstrated by Hornbuckle et al. (1981) for GaAs wide-band voltage amplifiers. Stage one is transistors $Q_{1A}$ through $Q_{1C}$; stage two is $Q_{2A}$ through $Q_{2C}$; stage three is $Q_{3A}$ through $Q_{3C}$. In silicon NMOS or GaAs MESFET ICs, the current source loads $I_{1A}$ through $I_4$ are n-channel depletion FETs with gate shorted to source (Figs. 21a, 21b); in CMOS, the loads are p-FETs for the common-source FETs and n-FETs for the source-follower FETs (Fig. 21c).

Consider stage one. $Q_{1A}$ is a common-source stage driving the source follower, $Q_{1C}$. The source follower increases the gain-bandwidth product of the common-source stage because it isolates the drain of the common-source stage from the Miller-enhanced input capacitance of the following stage. $Q_{1B}$ is a shunt feedback transistor. Without $Q_{1B}$, the low-frequency gain of the stage would theoretically be infinite (assuming for now a perfect current-source load $I_{1A}$ and infinite $Q_{1A}$ drain resistance). With $Q_{1B}$, the stage gain is the ratio of the $Q_{1A}$ size to the $Q_{1B}$ size. For example, for a typical amplifier implemented in 1-µm gate-length technology (source to drain), if $Q_{1A}$ is 500 µm wide and $Q_{1B}$ is 100 µm wide, the gain of the first stage is approximately 5.
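The two design relations just described, the minimum forward gain $A = 2\pi R_F C_T B - 1$ and the stage gain set by the width ratio, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the values $R_F$ = 1 MΩ at 45 Mb/s and $C_T$ = 1 pF are borrowed from examples elsewhere in this chapter rather than from a single documented design.

```python
import math

def min_forward_gain(r_f, c_total, bit_rate):
    """Minimum forward voltage gain A = 2*pi*R_F*C_T*B - 1."""
    return 2.0 * math.pi * r_f * c_total * bit_rate - 1.0

def stage_gain(common_source_width_um, shunt_feedback_width_um):
    """Gain of one shunt-feedback stage, set by the FET width ratio."""
    return common_source_width_um / shunt_feedback_width_um

# Assumed example values (see the lead-in): 45 Mb/s, R_F = 1 Mohm, C_T = 1 pF.
a_min = min_forward_gain(r_f=1e6, c_total=1e-12, bit_rate=45e6)
print(f"minimum forward gain at 45 Mb/s: {a_min:.1f}")

# Width ratio from the text: Q1A is 500 um wide, Q1B is 100 um wide.
print(f"first-stage gain: {stage_gain(500.0, 100.0):.1f}")
```

Because the stage gain depends only on a ratio of widths, the same helper applies to any stage that is a scale model of the first.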


FIG. 21. Typical IC voltage gain stages: (a) NMOS; (b) GaAs; (c) CMOS.

In the lightwave receiver design of Fig. 20, the stages are all identical, except for size; they are all scale models of a common design. Thus, the closed-loop quiescent input and output voltages of all the stages are equal to the same voltage. This gives several advantages. First, since the quiescent input voltage of each stage is equal to the quiescent output voltage of the stage, the common-source input transistor of each stage (e.g., $Q_{1A}$) and the shunt feedback transistor of each stage (e.g., $Q_{1B}$) have the same quiescent gate voltage, hence the same transconductance per unit gate width, so that the stage gain is set by the size ratio, as desired. This is most important for depletion-mode devices, such as GaAs MESFETs, where the transconductance depends strongly on the gate voltage. Second, the dc bias calculations need be done for only a single stage rather than for a three-stage system; in addition, this type of receiver design reduces dc biasing problems caused by processing variations that change the dc parameters of the FETs on a given wafer. Third, a single such stage with input shorted to output gives a voltage equal to the quiescent amplifier input voltage. This is used as the bias source ($Q_{SA}$ through $Q_{SC}$) for the AGC input shunt FET, $Q_s$, in this design. (At low bit rates ($B < 100$ Mb/s), a slow integrator is often preferred, as was discussed in Section V.D. This also can readily be integrated on the IC.) An output source-follower buffer, $Q_4$, is used to avoid capacitive loading of the last stage, which would degrade the loop stability (Fraser et al., 1983).

The gate bias source for the resistive feedback FET, $Q_F$, in Fig. 20 is formed by a matched bias transistor, $Q_B$, and a current source, $I_B$. The ratio of $I_B$ to the gate width (size) of $Q_B$ determines how far $Q_B$, hence $Q_F$, is biased above threshold. If $Q_F$ were operated in the constant drain current region, $Q_B$ and $Q_F$


would form a FET current mirror; though $Q_F$ operates instead in the linear drain-current versus drain-voltage region at low drain voltage, the biasing action of $Q_B$ is the same. Of course, $I_B$ is typically a FET current source similar to those used as the drain loads $I_{1A}$ through $I_{3A}$ for the common-source gain FETs. Note also that GaAs MESFET designs typically use a source follower and level-shifting diodes between the drain of $Q_B$ and the gates of $Q_B$ and $Q_F$.

The scaling of the three stages is set by the noise considerations of Section III.B.2. The optimum-size input FET $Q_{1A}$ is scaled so that its gate input capacitance is equal to the rest of the front-end capacitances combined (Eq. 25a). This sets the scale of the input stage; typically the input stage transistors are a few hundred micrometers wide, and the stage draws a few tens of milliamperes. The following stages can be made smaller to reduce both the chip area and the power consumption without much of a noise penalty; as usual, most of the noise comes from the first stage.

Consider an amplifier with $Q_{1A}$ 500 µm wide, $Q_{2A}$ 100 µm wide, and $Q_{3A}$ 50 µm wide, with a gain of 5 per stage. The mean-square equivalent input noise due to each stage is inversely proportional to both the stage size and the square of the voltage gain preceding the stage. Thus, in the example, the second stage contributes 5/25 = 1/5 the equivalent mean-square photocurrent noise of the first stage; the third stage contributes 10/625 = 0.016 times as much as the first stage. The total root-mean-square equivalent photocurrent noise is only $\sqrt{1 + 0.2 + 0.016} = 1.10$ times that of the first stage alone. The second and third stage noise thus reduce the optical sensitivity by only 0.4 dB.

Simpler designs for voltage amplifier stages can be used, especially at lower bit rates. For example, the source-follower buffers may often be omitted in lower-bit-rate silicon MOS designs, as in the receiver of Fraser et al. (1983). (In GaAs MESFET versions, the source-follower buffer is still needed to drive the level-shifting diodes.) The shunt feedback transistor $Q_B$ may often be omitted in GaAs or silicon NMOS stage designs, leaving an ordinary common-source stage. Most GaAs MESFETs have a relatively low drain resistance, $r_d$, which may sometimes be used as the signal load in place of the shunt feedback FET $Q_B$ (Hornbuckle et al., 1981). Similarly, in NMOS amplifier stages, the n-FET active loads have an output conductance equal to their back-gate transconductance; this conductance can be used as the signal load in place of $Q_B$, as in the receiver of Fraser et al. (1983). The IC yield then depends on how well the GaAs FET drain resistance or the NMOS back-gate effect is controlled.

The circuit of Fig. 20 is adequate for most LED-based systems. Typical LED transmitter optical outputs into a fiber are presently on the order of 0.05-0.1 milliwatts; at a 1.3-µm wavelength this corresponds to a maximum receiver photocurrent of 0.05-0.1 milliamperes, which the input shunt AGC circuit of Fig. 20 can easily handle.
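The stage-noise arithmetic in the three-stage example above (widths of 500, 100, and 50 µm with a gain of 5 per stage) can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch assumes only the scaling rule stated in the text: each stage's mean-square input-referred noise is inversely proportional to its width and to the square of the voltage gain preceding it.

```python
import math

def relative_noise_contributions(widths_um, gain_per_stage):
    """Mean-square input-referred noise of each stage, relative to stage one."""
    first_width = widths_um[0]
    contributions = []
    for index, width in enumerate(widths_um):
        preceding_gain = gain_per_stage ** index
        contributions.append((first_width / width) / preceding_gain ** 2)
    return contributions

def optical_penalty_db(contributions):
    """Optical sensitivity penalty relative to the first stage alone, in dB.

    Optical power is proportional to photocurrent, so the penalty is
    10*log10 of the RMS noise-current ratio.
    """
    rms_ratio = math.sqrt(sum(contributions))
    return 10.0 * math.log10(rms_ratio)

stages = relative_noise_contributions([500.0, 100.0, 50.0], gain_per_stage=5.0)
print("relative mean-square contributions:", stages)
print(f"RMS noise ratio: {math.sqrt(sum(stages)):.3f}")
print(f"optical penalty: {optical_penalty_db(stages):.2f} dB")
```

The result agrees with the factor of 1.10 and the 0.4-dB penalty quoted in the text.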


Semiconductor laser transmitters presently put 1 milliwatt of optical power into a fiber; at 1.3 µm, this corresponds to a maximum receiver photocurrent of about a milliampere. For low-bit-rate receivers (e.g., 45 Mb/s), the minimum input shunt resistance of $Q_s$ must then be only a few ohms; this means a large $Q_s$, extra input capacitance, and lower sensitivity, as was discussed in Section V.D. However, the minimum shunt resistance needed goes up in proportion to the bit rate, as will be discussed in Section V.G. Thus, the circuit of Fig. 20 is presently adequate for most laser-based systems above a few hundred megabits per second.

Both laser and LED transmitter powers will increase; in addition, lasers will become cheaper and probably will be used in more low-bit-rate systems in the future. These future systems may well need more dynamic range than the two-stage AGC circuit of Fig. 20 can provide. A solution is to add a third stage of AGC to further extend the dynamic range. One third-stage AGC technique is to reduce the forward voltage gain $A$, once $Q_s$ has been servoed to its minimum resistance, as was discussed in Section V.D. One way to do this is to reduce the transconductance of the input FET, either by using a dual-gate input FET or (equivalently) by using a cascode input stage. The input FET transconductance is then reduced by lowering the input FET drain voltage (by reducing the cascode FET gate bias) and lowering the input FET drain current (e.g., by partially turning off the input stage load devices). This gain reduction technique was discussed for the hybrid IC designs of Section V.E. Other voltage-gain reduction techniques may also be used.

A typical transimpedance AGC receiver circuit is shown in Fig. 22. This circuit also increases the AGC range and in addition reduces the AGC threshold. In this circuit, a variable-resistance AGC FET, $Q_{TZ}$, is connected around the first amplifier stage. At low photocurrents, $Q_{TZ}$ is off to avoid adding excess Johnson noise to the input photocurrent; AGC is provided by the postamplifier (not shown). $Q_{TZ}$ is turned on only when the photocurrent is large enough so that the extra Johnson noise does not cause bit errors. Above this AGC threshold, $Q_{TZ}$ is servoed to provide AGC, just as in the shunt AGC circuit. The voltage available to drive the photocurrent through $Q_{TZ}$ is the input voltage times one plus the first-stage gain. Thus, if the first stage has a gain of 5, the maximum photocurrent is increased by a factor of 6 over the corresponding input shunt AGC circuit. In addition, the AGC threshold is reduced by a factor of $\sqrt{6}$ = 3.9 dB because the resistance of $Q_{TZ}$ at the AGC threshold is six times larger than that of a corresponding input shunt AGC FET; therefore, the RMS Johnson noise current is reduced by a factor of $\sqrt{6}$.

In the circuit of Fig. 22, the stages are all identical except for size, and operate with equal quiescent input and output voltages as before. This is the simplest way to avoid changing the amplifier operating point when $Q_{TZ}$ is turned on. The same idea was used in the shunt AGC design of Fig. 20; note,

FIG. 22. IC receiver with transimpedance AGC.

however, that the output buffers on each stage are omitted in Fig. 22, which removes one high-frequency pole from inside the transimpedance AGC loop but reduces the high-frequency performance. (Note again that in GaAs designs, the buffers are needed for dc level shifting.)

The transimpedance AGC receiver of Fig. 22 might appear to be only marginally stable; for some value of the AGC feedback resistance, $R_{TZ}$, of $Q_{TZ}$, the $R_{TZ} C_T$ feedback pole frequency will equal the first-stage drain pole frequency. However, if the low-frequency gain of the first stage is 5, the loop gain at the double-pole frequency is only 2.5 and the phase margin at unity loop gain is 53°. The feedback zero due to the input transistor gate-to-drain capacitance provides additional phase margin. Of course, the receiver would be unstable for low values of $R_{TZ}$ if $Q_{TZ}$ were connected around all three stages. The same would apply if $Q_F$ were used to provide AGC. ($Q_F$ might be used to provide a limited amount of initial AGC in very-high-sensitivity, very-low-bit-rate receivers.)

In order to best design these IC receivers, the parasitic feedback capacitance effects must be considered in more detail than in Section V.B. The receiver may be represented by the equivalent circuit of Fig. 23, in which the resistive feedback FET, $Q_F$, is represented by a feedback resistance, $R_F$, which is the source-to-drain resistance, shunted by a parasitic source-to-drain capacitance, $C_{sd}$, and with a distributed capacitance, $C_{gc}$, from the gate to the channel. The total input capacitance (photodiode plus input FET plus strays) is again represented by $C_T$.
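The stability figures quoted above for the transimpedance AGC loop (a loop gain of 2.5 at the double-pole frequency and a phase margin of 53°) can be checked with a simple two-pole model. The Python sketch below assumes a loop transmission with two coincident real poles and a dc loop gain equal to the first-stage gain of 5; that model and the function names are assumptions made for illustration. The last helper reproduces the 3.9-dB reduction in AGC threshold, using the fact that optical power in dB follows 10 log10 of the noise-current ratio.

```python
import math

def loop_gain_magnitude(dc_gain, normalized_frequency):
    """|T| for two coincident real poles; frequency is normalized to the pole."""
    return dc_gain / (1.0 + normalized_frequency ** 2)

def phase_margin_deg(dc_gain):
    """Phase margin at unity loop gain for the two-coincident-pole model."""
    unity_gain_frequency = math.sqrt(dc_gain - 1.0)
    return 180.0 - 2.0 * math.degrees(math.atan(unity_gain_frequency))

def agc_threshold_reduction_db(first_stage_gain):
    """Optical AGC-threshold reduction when the AGC resistance is (1 + gain) times larger."""
    noise_current_ratio = math.sqrt(first_stage_gain + 1.0)
    return 10.0 * math.log10(noise_current_ratio)

print(f"loop gain at double pole: {loop_gain_magnitude(5.0, 1.0):.2f}")
print(f"phase margin: {phase_margin_deg(5.0):.1f} deg")
print(f"AGC threshold reduction: {agc_threshold_reduction_db(5.0):.1f} dB")
```

The model ignores the gate-to-drain feedback zero mentioned in the text, so the real circuit would be expected to have somewhat more margin than this estimate.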


FIG. 23. IC receiver equivalent circuit. [After Williams (1985).]

The parasitic source-to-drain capacitance, $C_{sd}$, appears across $R_F$ as an alternate feedback path and can cause signal integration. If the $R_F C_{sd}$ feedback bandwidth is set at twice the bit rate $B$, then

$$C_{sd} \le \frac{1}{4\pi R_F B}. \tag{62}$$

For the original 45-Mb/s receiver example, with $R_F$ = 1 MΩ, $C_{sd}$ must then be less than 0.002 pF. Experimentally, $C_{sd}$ is negligible because the gate of $Q_F$ acts as an electrostatic shield between the source and the drain; the flux lines go from source to gate and from drain to gate rather than from source to drain.

The gate-to-channel capacitance, $C_{gc}$, forms a distributed R-C delay line with the channel resistance, $r_{sd} = R_F$. If the gate is at ac ground, as in Fig. 23, the effect is to reduce the high-frequency feedback and to introduce an additional lagging phase shift into the feedback loop. Both effects increase the high-frequency response; the extra phase shift also reduces the phase margin against oscillation. Solving the R-C delay line equations and setting the extra phase shift less than 30° gives an upper bound on $C_{gc}$ that is inversely proportional to $R_F B$ (Eq. 63).

For the 45-Mb/s receiver example, $C_{gc}$ must then be less than 0.01 pF. This means the total gate area must be tiny; as mentioned, a typical feedback FET is 1 µm wide by 2-10 µm from source to drain.

The gate-to-channel capacitance effects can be canceled to first order in frequency by applying one third of the ac output voltage to the feedback transistor gate. (In MOS designs, the parasitic overlap capacitances of $Q_B$, $Q_F$, and $I_B$ can be tailored to form a three-to-one capacitive voltage divider.) Then, if Eq. (63) is satisfied, the magnitude of the current feedback at the high-frequency end of the receiver bandwidth is 96% of the magnitude of the low-frequency current feedback. In addition, the extra phase shift due to $C_{gc}$ is now leading rather than lagging, which increases the loop stability. The improved phase margin is the more important benefit in optimized single-bit-rate


receiver designs and usually justifies using this technique. In “universal” receivers designed to operate at many different bit rates, this technique is important for both response and stability reasons.

Finally, although $Q_F$ is used as a resistor, its current-voltage characteristic is not perfectly linear, even though the AGC circuit is used to restrict the range over which $Q_F$ is used (Section V.B). If necessary, the residual nonlinearity can be canceled to first order by applying half the output signal voltage to the gate of $Q_F$ so that

$$v_{gs} = V_{gs0} + \tfrac{1}{2} i_{ph} R_F, \tag{64}$$

where $V_{gs0}$ is the quiescent gate bias and $i_{ph} R_F$ is the output signal voltage due to the photocurrent $i_{ph}$ flowing through the feedback resistance $R_F$. Of course, this technique also approximately cancels the effect of the gate-to-channel capacitance; however, this technique is slightly more difficult to implement because it requires a modified gate bias circuit for $Q_F$. This technique is principally of interest for “universal” receivers that cannot be optimized for a single bit rate and for very-low-bit-rate receivers.

G. Design Scaling Laws for IC Receivers

This section discusses how the IC receiver designs of Section V.F scale to high bit rates; the following section presents calculations of IC receiver sensitivities for bit rates between 10 Mb/s and 4 Gb/s. These calculations include sensitivities of receiver designs using present pin-photodiode and FET IC technologies, plus sensitivities to be expected from probable future receiver designs, given expected advances in pin-photodiode and FET IC technologies.

These nonintegrating IC receiver designs can readily be extended to bit rates in excess of 1 Gb/s by appropriately scaling the design parameters. The feedback resistance $R_F$ was chosen so that its Johnson noise was some small fraction of the input FET noise. Assume that the total input capacitance $C_T$ due to the photodiode, the input FET, plus strays, is constant. The noise of the input FET scales as $B^3$ (Eq. 20); the Johnson noise of $R_F$ scales as $B/R_F$ (Eq. 8). If the ratio of these two noise sources is kept constant as the bit rate is increased, $R_F$ scales as

$$R_F \propto \frac{1}{B^2}. \tag{65}$$

By Eq. (52), the forward voltage gain $A$ is proportional to $R_F$ times the bit rate. Using Eq. (65) for the scaling of $R_F$, the voltage gain $A$ scales as

$$A \propto \frac{1}{B}. \tag{66}$$


Thus, the forward voltage gain needed to avoid signal integration decreases as the bit rate increases because the noise considerations of Section III allow $R_F$ to be decreased as the bit rate squared (Eq. 65). This permits these active-feedback IC receiver designs to be adapted to bit rates well in excess of 1 Gb/s.

The allowable $Q_F$ parasitic feedback capacitances, $C_{sd}$ and $C_{gc}$, are inversely proportional to $R_F B$ [Eqs. (62) and (63)]. Since $R_F$ scales as $1/B^2$, the allowable $Q_F$ parasitic capacitances scale as

$$C_{sd},\ C_{gc} \propto B. \tag{67}$$

Thus, the parasitic feedback capacitance effects diminish at high bit rates, again because noise considerations allow RF to scale as 1/B2. If input shunt AGC is used, the size of the input shunt FET is inversely proportional to the bit rate. Thus, a third, gain reduction, AGC stage is not needed in high-bit-rate designs. Assume that the transmitter optical power is the same at all bit rates and assume that the maximum output voltage swing of the receiver is the same at all bit rates. Since A is inversely proportional to the bit rate, the input voltage swing available to drive the photocurrent through Qs is proportional to the bit rate; therefore, the required conductance, hence size, of Qs is inversely proportional to the bit rate.
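The scaling laws of this section can be illustrated numerically. The Python sketch below scales $R_F$ as $1/B^2$ from a reference design, then evaluates the forward gain $A = 2\pi R_F C_T B - 1$ and an upper limit on $C_{sd}$. The reference point (1 MΩ at 45 Mb/s, $C_T$ = 1 pF) is taken from the examples in the text; the limit $C_{sd} \le 1/(4\pi R_F B)$ is an assumption here, obtained by setting the $R_F C_{sd}$ bandwidth $1/(2\pi R_F C_{sd})$ equal to $2B$, and the function names are hypothetical.

```python
import math

REFERENCE_BIT_RATE = 45e6   # bit/s, reference design from the text
REFERENCE_R_F = 1e6         # ohm, feedback resistance at the reference bit rate
C_TOTAL = 1e-12             # F, total input capacitance, assumed constant

def feedback_resistance(bit_rate):
    """R_F scaled as 1/B^2 from the reference design."""
    return REFERENCE_R_F * (REFERENCE_BIT_RATE / bit_rate) ** 2

def forward_gain(bit_rate):
    """Minimum forward gain A = 2*pi*R_F*C_T*B - 1 with R_F scaled as 1/B^2."""
    return 2.0 * math.pi * feedback_resistance(bit_rate) * C_TOTAL * bit_rate - 1.0

def max_source_drain_capacitance(bit_rate):
    """Assumed limit on C_sd from an R_F*C_sd bandwidth of twice the bit rate."""
    return 1.0 / (4.0 * math.pi * feedback_resistance(bit_rate) * bit_rate)

for bit_rate in (45e6, 450e6, 1.7e9):
    print(
        f"B = {bit_rate / 1e6:7.0f} Mb/s  "
        f"R_F = {feedback_resistance(bit_rate):10.1f} ohm  "
        f"A = {forward_gain(bit_rate):6.1f}  "
        f"C_sd limit = {max_source_drain_capacitance(bit_rate) * 1e15:6.2f} fF"
    )
```

In this sketch a tenfold increase in bit rate lowers $R_F$ a hundredfold, lowers $A + 1$ tenfold, and relaxes the $C_{sd}$ limit tenfold, matching Eqs. (65) through (67).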

H. Sensitivity Calculations for Present- and Future-Technology IC Receivers

Now consider the sensitivities achievable in optical receiver designs using present pin-photodiode and FET IC technologies and in receiver designs using probable future pin-photodiode and FET IC technologies. The receiver circuits used for these calculations are micro-FET feedback IC designs of the type illustrated in Fig. 20 in Section V.F; the sensitivities are calculated using the noise expression of Eq. (35), which includes all the noise sources in the circuit. For a given FET IC technology and total input capacitance, $C_T$, the different receiver designs for the different bit rates, $B$, are generated by scaling $R_F$ as $1/B^2$ (Eq. 65), scaling $A$ as $1/B$ (Eq. 66), and redesigning $Q_F$ and the voltage amplifier accordingly.

The sensitivity versus bit-rate calculations for typical present-technology IC receivers were presented previously in Section III.B.3, along with sensitivity results from the literature and a brief comparison of GaAs and silicon FET IC technologies. The calculations assumed a typical commercial 1-µm channel-length GaAs MESFET or silicon MOSFET IC technology, and a total input capacitance, $C_T$, of 1 pF due to the receiver-input FET capacitance, $C_{FET}$, the pin-photodiode capacitance, $C_{pin}$, plus any stray capacitance, $C_s$. ($C_s$ is typically due to the bonding pads, the shunt AGC FET, and the packaging.) For the silicon MOSFET IC designs, the ratio of the transconductance to the


gate-input capacitance, $g_m/C_{FET}$, was taken as 70 mS/pF; for the GaAs MESFETs, $g_m/C_{FET}$ was taken as 90 mS/pF. Since $C_{FET}$ was 0.5 pF, the silicon IC input FETs had $g_{m1}$ = 35 mS; the GaAs IC input FETs had $g_{m1}$ = 45 mS. The channel-noise factor, $\Gamma$, was taken as 1.2 for the silicon FETs and as 1.5 for the GaAs FETs. The photodiode leakage current was taken as 1 nA at 20°C and 15 nA at 85°C, which is typical of present commercial InGaAs photodiodes. As was discussed, the GaAs MESFET and silicon MOSFET receiver sensitivities were essentially equal in theory; the GaAs FETs have a higher transconductance, but the silicon FETs have a lower channel noise factor. In practice, the silicon FET receivers are presently a few decibels less sensitive; this performance gap should narrow in the future.

Future-technology receivers will offer superior performance due to improvements in both the FET IC technologies and the pin-photodiode technology. The principal FET IC technology improvement will be finer linewidths, which will permit shorter channel lengths. Now, to first order, the transconductance of an optimized short-channel FET is independent of the channel length from source to drain because the electrons in the channel move at the saturation velocity (Sze, 1983); however, the gate input capacitance, $C_{FET}$, is approximately proportional to the gate area and, therefore, is proportional to the gate length. Thus, $g_m/C_{FET}$ and $f_T$ are inversely proportional to the gate length; therefore, using a finer-linewidth FET technology will reduce the principal noise terms in Eq. (35), thereby improving the receiver sensitivity. In addition, a higher $f_T$ will allow higher-bit-rate receivers, and, in lower-bit-rate designs, will increase the achievable voltage gain per stage.

Since optimized 0.2-µm channel-length FETs were reported in 1983 (by Fichtner et al.), and 0.3-µm FETs are now commercially available, it is only a matter of time until 0.25-µm channel-length IC receivers are commercially practical, especially since the development of fine-line FET ICs is driven by the VLSI and VHSIC programs. The individual FETs in these future 0.25-µm IC technologies should have about four times the $g_m/C_{FET}$ ratio and four times the $f_T$ of present 1-µm IC FETs. Therefore, $g_m/C_{FET}$ should be about 280 mS/pF for the 0.25-µm silicon technologies and about 360 mS/pF for the 0.25-µm GaAs technologies.

The principal pin-photodiode improvement will be low-doping or reduced-area devices with very small capacitances. Reducing the photodiode capacitance $C_{pin}$ allows the total capacitance $C_T$ to be reduced proportionately; as mentioned in Sections III.B.2 and V.F, the sensitivity is maximized by scaling the size of the input FET so that $C_{FET} = C_{pin} + C_s = \tfrac{1}{2} C_T$; since the stray capacitance $C_s$ can be made negligible, $C_T \approx 2C_{pin}$. By inspection, reducing $C_T$ reduces all of the principal noise terms in Eq. (35), thereby improving the receiver sensitivity.
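The FET technology numbers quoted above follow from two simple proportionalities, sketched here in Python. The function names are hypothetical, and the scaling helper assumes the first-order rule stated in the text: $g_m/C_{FET}$ is inversely proportional to the channel length.

```python
def transconductance_ms(gm_over_c_ms_per_pf, c_fet_pf):
    """Input FET transconductance (mS) from the g_m/C_FET ratio and C_FET."""
    return gm_over_c_ms_per_pf * c_fet_pf

def scaled_gm_over_c(ratio_at_1um_ms_per_pf, channel_length_um):
    """First-order g_m/C_FET at a new channel length, scaling as 1/length."""
    return ratio_at_1um_ms_per_pf * (1.0 / channel_length_um)

# Present 1-um technologies with C_FET = 0.5 pF (values from the text).
print("silicon g_m1:", transconductance_ms(70.0, 0.5), "mS")
print("GaAs g_m1:   ", transconductance_ms(90.0, 0.5), "mS")

# Projected 0.25-um technologies.
print("silicon g_m/C_FET at 0.25 um:", scaled_gm_over_c(70.0, 0.25), "mS/pF")
print("GaAs g_m/C_FET at 0.25 um:   ", scaled_gm_over_c(90.0, 0.25), "mS/pF")
```

The same helper gives the intermediate 0.5-µm case by passing a channel length of 0.5.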


As discussed in Section III.B.2, the photodiode capacitance could be reduced immediately from the present ~0.4 pF to ~0.1 pF, simply by reducing the photodiode diameter from the present 75 µm to 30 µm. This means that $C_T$ would be reduced from the present ~1 pF to ~0.25 pF, simply by reducing the size of the input FET to match $C_{FET}$ to $C_{pin} + C_s$, and taking care to minimize $C_s$. Note that a reduced-area photodiode can easily be coupled to present single-mode fiber by use of a microlens or a GRIN lens; the technology to efficiently couple a small-area optical device to an optical fiber has already been developed for semiconductor laser transmitters. The coupling optics should become inexpensive as laser transmitters ride down their learning curve. Eventually, the doping density almost certainly will be reduced to the point that full-size 0.1-pF InGaAs photodiodes become available; if the present rate of progress continues, this may well happen in the next five years.

Future pin photodiodes will also have lower leakage currents, which will allow better receiver sensitivities at low bit rates. Kim et al. (1985) and Campbell et al. (1985) have already demonstrated long-wavelength photodiodes with a 50-pA leakage current at 20°C; it is only a matter of time until commercial InGaAs photodiodes match this performance.

Figure 24 shows sensitivity-versus-bit-rate calculations for present-technology IC receiver designs, next-technology IC receiver designs, and future-technology IC receiver designs. The present-technology receiver calculations are taken directly from Section III.B.2 and, as mentioned, assume a 1-µm channel-length GaAs or silicon FET IC technology, a $C_T$ of 1 pF, and a leakage current of 1 nA at 20°C and 15 nA at 85°C. The next-technology


FIG. 24. Projected evolution of pin-FET receiver sensitivities. $\lambda$ = 1.3 µm, $10^{-9}$ BER.

LIGHTWAVE RECEIVERS


receiver calculations assume a 0.5-µm channel-length FET IC technology, a $C_T$ of 0.5 pF, and a leakage current of 0.2 nA at 20°C and 3 nA at 85°C. The future-technology receiver calculations assume a 0.25-µm channel-length FET IC technology, a $C_T$ of 0.25 pF, and a leakage current of 50 pA at 20°C and 0.75 nA at 85°C.

The sensitivity improvements calculated for the “next” and “future” receivers of Fig. 24 can be understood intuitively on the basis of the figure-of-merit discussion and analysis of Section III.B.2. For these two examples, the projected sensitivity improvements are equally due to the reduced gate length of the FETs and to the lower photodiode capacitances. The FET-technology figure of merit is $M_{\mathrm{FET}} = g_m/(C\Gamma)$ by Eq. (27); the optical receiver sensitivity is approximately proportional to the square root of $M_{\mathrm{FET}}$. Since $g_m/C$ is inversely proportional to the FET channel length, the receiver sensitivity is inversely proportional to the square root of the FET channel length. Thus, using 0.5-µm FETs rather than present 1-µm FETs should increase the sensitivity by a factor of about $\sqrt{2}$, or 1.5 dB; using 0.25-µm FETs should give a 3-dB improvement. Similarly, the pin photodiode figure of merit is $M_{\mathrm{pin}} = 1/[4(C_{\mathrm{pin}} + C_s)] = 1/[2C_T]$ by Eq. (28) and thus is approximately inversely proportional to $C_{\mathrm{pin}}$ and exactly inversely proportional to $C_T$. The receiver sensitivity is approximately proportional to the square root of $M_{\mathrm{pin}}$ and thus inversely proportional to the square root of $C_T$. Reducing $C_T$ from 1 pF to 0.5 pF should increase the sensitivity by about 1.5 dB; reducing $C_T$ to 0.25 pF should give a 3-dB improvement.
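The decibel figures quoted above follow directly from the square-root dependence on each figure of merit. The sketch below reproduces that arithmetic; it assumes that sensitivity is expressed in optical-power decibels ($10\log_{10}$), that $\Gamma$ is unchanged between technologies, and that $g_m/C$ scales exactly as the inverse channel length. The function name and the `cases` table are illustrative only.

```python
import math


def sensitivity_gain_db(merit_ratio):
    """Sensitivity gain in optical dB for a given figure-of-merit improvement.

    Assumes the sensitivity is proportional to sqrt(M), as argued in the text.
    """
    return 10.0 * math.log10(math.sqrt(merit_ratio))


# (M_FET ratio, M_pin ratio) relative to the present 1-um, 1-pF technology,
# using g_m/C ~ 1/(channel length) and M_pin ~ 1/C_T.
cases = {
    "next (0.5 um, 0.5 pF)": (1.0 / 0.5, 1.0 / 0.5),
    "future (0.25 um, 0.25 pF)": (1.0 / 0.25, 1.0 / 0.25),
}

for name, (fet_ratio, pin_ratio) in cases.items():
    fet_db = sensitivity_gain_db(fet_ratio)
    pin_db = sensitivity_gain_db(pin_ratio)
    print(f"{name}: FET {fet_db:.2f} dB, pin {pin_db:.2f} dB, "
          f"combined {fet_db + pin_db:.2f} dB")
```

Because the two contributions multiply in the sensitivity, they add in decibels, so a factor-of-two improvement in each figure of merit gives roughly 1.5 dB apiece.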
Thus, adding the sensitivity improvements due to the projected photodiode advances to those due to the projected FET advances indicates that the next-technology receivers (0.5-µm FETs, $C_T$ = 0.5 pF) should give about 3 dB better sensitivity than present-technology receivers; the future-technology receivers (0.25-µm FETs, $C_T$ = 0.25 pF) should give about 6 dB better sensitivity. This agrees reasonably well with the complete calculations of Fig. 24.

Note that the calculated sensitivities in Fig. 24 scale quite accurately as $B^{-3/2}$, except at low bit rates. For a well-designed receiver, the principal terms in Eq. (35) for the mean-square receiver noise are the input FET noise, the noise of the load device for the input FET, and the feedback resistor Johnson noise. The mean-square noise of the input FET and its load device are both proportional to $B^3$; the mean-square noise of the feedback resistor usually is also proportional to $B^3$ because $R_F$ is usually scaled as $1/B^2$, following Eq. (65) of Section V.G. If these were the only noise terms, the mean-square noise would be exactly proportional to $B^3$; the root-mean-square noise would be exactly proportional to $B^{3/2}$, and the receiver sensitivity would be exactly proportional to $B^{-3/2}$. In fact, except at low bit rates, summing over these


terms usually gives the sensitivity to within 0.3-0.4 dB of the exact calculations; the sum over the following stages gives a minor mean-square noise contribution (which scales approximately as $B^3$); the leakage current noise is proportional to $B$ and becomes important only at low bit rates. At low bit rates, the calculated sensitivities are less than predicted by the $B^{-3/2}$ law because of the leakage current noise and because $R_F$ cannot be increased indefinitely as $1/B^2$ as the bit rate is decreased.
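The departure from the $B^{-3/2}$ law at low bit rates can be seen in a toy model that keeps only a $B^3$ term and a leakage term proportional to $B$. This is a minimal sketch, not Eq. (35): the coefficients `a` and `b` and the units of bit rate are arbitrary assumptions, and the limit on $R_F$ is ignored.

```python
import math


def rms_noise(bit_rate, a=1.0, b=1.0):
    """RMS noise for a toy mean-square noise of a*B**3 + b*B (arbitrary units)."""
    return math.sqrt(a * bit_rate**3 + b * bit_rate)


def loglog_slope(bit_rate, a=1.0, b=1.0, step=1.01):
    """Local exponent d(log noise)/d(log B), estimated by a finite difference."""
    low = rms_noise(bit_rate, a, b)
    high = rms_noise(bit_rate * step, a, b)
    return math.log(high / low) / math.log(step)


# Sensitivity is taken as inversely proportional to the RMS noise, so its
# exponent is the negative of the noise exponent.
for bit_rate in (1e-3, 1.0, 1e3):
    print(f"B = {bit_rate:g}: sensitivity exponent = {-loglog_slope(bit_rate):.3f}")
```

In this model the exponent tends to $-3/2$ where the $B^3$ terms dominate and flattens toward $-1/2$ where the leakage term dominates, which is the qualitative behavior described above.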

REFERENCES

Abidi, A. A., Kasper, B. L., and Kushner, R. A. (1984). In “1984 IEEE International Solid-State Circuits Conference Digest of Technical Papers,” p. 76. IEEE, New York.
Baechtold, W. (1972). IEEE Trans. Electron Devices ED-19, 674.
Brooks, R. (1980). Electron. Lett. 16, 458.
Campbell, J. C., Dentai, A. G., Holden, W. S., and Kasper, B. L. (1983). Electron. Lett. 19, 818.
Campbell, J. C., Dentai, A. G., Qua, G. J., Long, J., and Riggs, V. G. (1985). Electron. Lett., to be published.
Davenport, W. B., and Root, W. L. (1958). “Introduction to the Theory of Random Signals and Noise.” McGraw-Hill, New York.
Dorman, P. W., Yoder, J. D., Tatsuguchi, I., Gibson, W. C., Wemple, S. H., and Owen, B. (1987). To be published. IEEE, New York.
Fichtner, W., Fuls, E. N., Johnston, R. L., Watts, R. K., and Weick, W. W. (1983). In “1982 International Electron Devices Meeting Technical Digest,” p. 384. IEEE, New York.
Forrest, S. R., Williams, G. F., Kim, O. K., and Smith, R. G. (1981). Electron. Lett. 17, 917.
Fraser, D. L., Williams, G. F., Jindal, R. P., Kushner, R. A., and Owen, B. (1983). In “1983 IEEE International Solid-State Circuits Conference Digest of Technical Papers,” p. 80. IEEE, New York.
Gloge, D. C., and Ogawa, K. (1985). In “1985 Conference on Optical Fiber Communication, Digest of Technical Papers,” p. 84. Optical Society of America, Washington DC.
Gloge, D., Albanese, A., Burrus, C. A., Chinnock, E. L., Copeland, J. A., Dentai, A. G., Lee, T. P., Li, T., and Ogawa, K. (1980). Bell Syst. Tech. J. 59, 1365. Short Hills, N.J.
Goell, J. E. (1974). Bell Syst. Tech. J. 53, 629. Short Hills, N.J.
Hooper, R. C., Rejman, M. A. Z., Ritchie, S. T. D., Smith, D. R., and White, B. R. (1980). In “Sixth European Conference on Optical Communications, Proceedings,” p. 222.
Hornbuckle, D. P., and van Tuyl, R. L. (1981). IEEE Trans. Electron Devices ED-28, 175.
Kasper, B. L., Campbell, J. C., Gnauck, A. H., Dentai, A. G., and Talman, J. R. (1985). Electron. Lett. 21, 982.
Kanbe, H., Susa, N., Nakagome, H., and Ando, H. (1980). Electron. Lett. 16, 163.
Kim, O. K., Dutt, B. V., McCoy, R. J., and Zuber, J. R. (1985). IEEE J. Quantum Electron. QE-21, 138.
Kroemer, H. (1982). Proc. IEEE 70, 13.
Kroemer, H. (1983). J. Vac. Sci. Technol. B1, 126.
Linke, R. A., Kasper, B. L., Ko, J.-S., Kaminow, I. P., and Vodhanel, R. S. (1983). Electron. Lett. 19, 175.
Linke, R. A., Kasper, B. L., Campbell, J. C., Dentai, A. G., and Kaminow, I. P. (1984). Electron. Lett. 20, 489.
Maione, T. L., Sell, D. D., and Wolaver, D. H. (1978). Bell Syst. Tech. J. 57, 1837.


Matsushima, Y., Akiba, S., Sakai, K., Kushiro, Y., Noda, Y., and Utaka, K. (1982). Electron. Lett. 18, 945.
McIntyre, R. J. (1966). IEEE Trans. Electron Devices ED-13, 164.
Melchior, H., Hartman, A. R., Schinke, D. P., and Seidel, T. E. (1978). Bell Syst. Tech. J. 57, 1791.
Mikawa, T., Kagawa, S., Kaneda, T., Sakurai, T., Ando, H., and Mikami, O. (1981). IEEE J. Quantum Electron. QE-17, 210.
Miller, S. E. (1979). In “Optical Fiber Telecommunication” (S. E. Miller and A. G. Chynoweth, eds.), Chap. 21. Academic Press, New York.
Morrison, D. P. (1984). FOCLAN-84 Post-Deadline Paper.
Nishida, K., Taguchi, K., and Matsumoto, Y. (1979). Appl. Phys. Lett. 35, 251.
Ogawa, K. (1981). Bell Syst. Tech. J. 60, 923.
Ogawa, K., and Chinnock, E. L. (1979). Electron. Lett. 15, 650.
Ogawa, K., Owen, B., and Boll, H. J. (1983). Bell Syst. Tech. J. 62, 1181.
Paski, R. M. (1980). Private communication.
Personick, S. D. (1973a). Bell Syst. Tech. J. 52, 843.
Personick, S. D. (1973b). Bell Syst. Tech. J. 52, 875.
Rousseau, M. (1976). Electron. Lett. 12, 478.
Ruch, J. G., and Fawcett, W. (1970). J. Appl. Phys. 41, 3843.
Smith, D. R., Hooper, R. C., Ahmad, K., Jenkins, D., Mabbitt, A. W., and Nicklin, R. (1980). Electron. Lett. 16, 69.
Smith, D. R., Hooper, R. C., Smythe, P. P., and Wake, D. (1982). Electron. Lett. 18, 453.
Smith, R. G., and Personick, S. D. (1980). In “Semiconductor Devices for Optical Communication” (H. Kressel, ed.), Chap. 4. Springer-Verlag, New York.
Smith, R. G., Brackett, C. A., and Reinbold, H. W. (1978). Bell Syst. Tech. J. 57, 1809.
Snodgrass, M. L., and Klinman, R. (1984). IEEE J. Lightwave Technol. LT-2, 968.
Steininger, J. M., and Swanson, E. J. (1986). In “1986 International Solid-State Circuits Conference Digest of Technical Papers,” p. 60. IEEE, New York.
Susa, N., Nakagome, H., Mikami, O., Ando, H., and Kanbe, H. (1980). IEEE J. Quantum Electron. QE-16, 864.
Sze, S. M. (1981). “Physics of Semiconductor Devices.” John Wiley, New York.
Takasaki, Y., Tanaka, M., Maeda, N., Yamashita, K., and Nagano, K. (1976). IEEE Trans. Commun. COM-24, 404.
Williams, G. F. (1982). In “1982 IEEE International Solid-State Circuits Conference Digest of Technical Papers,” p. 160. IEEE, New York.
Williams, G. F. (1985). U.S. Patent #4,540,952 (Filed 1981).
Williams, G. F., and LeBlanc, H. P. (1986). IEEE J. Lightwave Technol. LT-4, 1502.
Williams, G. F., Capasso, F., and Tsang, W. T. (1982). IEEE Electron Device Lett. EDL-3, 71.

