ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
VOLUME 76
EDITOR-IN-CHIEF
PETER W. HAWKES Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITOR
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers Boston San Diego New York Berkeley London Sydney Tokyo Toronto
COPYRIGHT © 1989 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101
United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504 ISBN 0-12-014676-2 PRINTED IN THE UNITED STATES OF AMERICA
CONTENTS

CONTRIBUTORS . . . vii
PREFACE . . . ix

The Optics of Round and Multipole Electrostatic Lenses
L. A. BARANOVA AND S. YA. YAVOR
I. Introduction . . . 3
II. Equations of Motion of Charged Particles and Methods for Field Distribution Calculation . . . 9
III. The Basic Concepts of Paraxial Optics . . . 20
IV. Aberrations of Electrostatic Lenses . . . 36
V. Phase-Space Approach to Particle Beams . . . 51
VI. Current Density Distribution and Frequency-Contrast Characteristics . . . 69
VII. Round Lenses . . . 78
VIII. Quadrupole Lenses . . . 115
IX. Two-Dimensional (Cylindrical) Lenses . . . 148
X. Transaxial Lenses . . . 159
XI. Crossed Lenses . . . 174
XII. New Types of Lenses. Aberration Correctors . . . 187
XIII. Conclusion . . . 200
References . . . 201

Electron Microscopy of Fast Processes
O. BOSTANJOGLO
I. Introduction . . . 209
II. Interactions of Electrons with Matter . . . 211
III. Modes of Electron Microscopy . . . 216
IV. Time Resolved Electron Microscopy . . . 223
V. Application of Real-Time Electron Microscopy to Fast Laser-Induced Processes . . . 260
VI. Space-Time Resolution of Real-Time Microscopy . . . 273
VII. Summary . . . 276
Acknowledgment . . . 276
References . . . 276

High Resolution Transmission Electron Microscopy and Geology
MARCELLO MELLINI
I. Introduction . . . 282
II. Technical and Experimental Aspects . . . 284
III. Structure and Microstructure of Minerals . . . 292
IV. Structural Control Over Microstructure . . . 297
V. Mineral Reactions . . . 305
VI. Extraterrestrial Mineralogy . . . 317
VII. Conclusions . . . 323
References . . . 324

On Generalized Information Measures and Their Applications
INDERJEET TANEJA
I. Introduction . . . 328
II. Shannon's Entropy and Its Generalizations . . . 329
III. Generalized Distance Measures . . . 352
IV. Generalized Measures of Directed Divergence . . . 353
V. Generalized Divergence Measures . . . 359
VI. Generalized Entropies for Multivariate Probability Distributions . . . 368
VII. Applications to Statistical Pattern Recognition . . . 386
Entropy Graph . . . 410
References . . . 411

INDEX . . . 415
CONTRIBUTORS
Numbers in parentheses refer to the pages on which the authors' contributions begin.

L. A. BARANOVA (1), A. F. Ioffe Physico-Technical Institute, Academy of Sciences of the USSR, 194021 Leningrad K-21, USSR
O. BOSTANJOGLO (209), Optisches Institut der Technischen Universität Berlin, D-1000, 12, Strasse des 17. Juni 135, Federal Republic of Germany
MARCELLO MELLINI (281), Dipartimento di Scienze della Terra, Universita di Perugia, Italy
INDERJEET TANEJA (327), Departamento de Matematica, Universidade Federal de Santa Catarina, 88.035, Florianopolis, S.C., Brazil
S. YA. YAVOR (1), A. F. Ioffe Physico-Technical Institute, Academy of Sciences of the USSR, 194021 Leningrad K-21, USSR
PREFACE

This volume of the Advances is biased strongly toward particle optics and electron microscopy. The first chapter, long enough to form a short monograph, did in fact begin life as a Russian book, in which the authors brought together much of the work of the Leningrad school led by S. Ya. Yavor, who is herself co-author with V. M. Kel'man of the standard Russian textbook on electron optics. I felt that the text of L. A. Baranova and S. Ya. Yavor deserved a wider audience, and the result is the English-language version published here.

The second chapter, on the study of very fast processes in the electron microscope, is written by a specialist who has made numerous significant contributions to this difficult subject. The information that can be obtained in this way is not only of the greatest importance in microcircuit engineering but also sheds light on many fundamental physical processes.

The third chapter was solicited in the spirit of a number of reviews in earlier volumes, in which we asked a specialist in a particular field to examine the benefits of using a particular technique or type of instrument. In this chapter, M. Mellini considers the contribution of high-resolution electron microscopy to geology. The wide range of examples shows convincingly how useful this technique is proving in this field.

The final chapter is concerned with an aspect of statistics that is of particular relevance for pattern recognition: the use of generalized information measures. Much of the material in this article originated in the author's own research group, and I am very happy to include this account in which the newer results are set in context.

It is a pleasure to thank all the contributors for the trouble they have taken over their chapters. As usual, I conclude with a list of forthcoming reviews.

Peter W. Hawkes
FORTHCOMING REVIEWS

Parallel Image Processing Methodologies: J. K. Aggarwal
Image Processing with Signal-Dependent Noise: H. H. Arsenault
Pattern Recognition and Line Drawings: H. Bley
Bodo von Borries, Pioneer of Electron Microscopy: H. von Borries
Signal Analysis in Seismic Studies: J. F. Boyce and L. R. Murray
Magnetic Reconnection: A. Bratenahl and P. J. Baum
Sampling Theory: J. L. Brown
Finite Algebraic Systems and Trellis Codes: H. J. Chizeck and M. Trott
Electrons in a Periodic Lattice Potential: J. M. Churchill and F. E. Holmstrom
The Artificial Visual System Concept: J. M. Coggins
Corrected Lenses for Charged Particles: R. L. Dalglish
A Gaseous Detector Device for ESEM: G. D. Danilatos
The Development of Electron Microscopy in Italy: G. Donelli
The Study of Dynamic Phenomena in Solids Using Field Emission: M. Drechsler
Amorphous Semiconductors: W. Fuhs
Resonators, Detectors and Piezoelectrics: J. J. Gagnepain
Median Filters: N. C. Gallagher and E. Coyle
Bayesian Image Analysis: S. and D. Geman
SEM and the Petroleum Industry: J. Huggett
Emission Electron Optical System Design: V. P. Il'in
Statistical Coulomb Interactions in Particle Beams: G. H. Jansen
Number Theoretic Transforms: G. A. Jullien
Phosphor Materials for CRTs: K. Kano et al.
Tomography of Solid Surfaces Modified by Fast Ions: S. B. Karmohapatro and D. Ghose
The Scanning Tunnelling Microscope: H. Van Kempen
Scanning Capacitance Microscopy: P. J. King
Applications of Speech Recognition Technology: H. R. Kirby
Multi-Colour AC Electroluminescent Thin-Film Devices: H. Kobayashi and S. Tanaka
Spin-Polarized SEM: K. Koike
The Rectangular Patch Microstrip Radiator: H. Matzner and E. Levine
Active-Matrix TFT Liquid Crystal Displays: S. Morozumi
Electronic Tools in Parapsychology: R. L. Morris
Image Formation in STEM: C. Mory and C. Colliex
Low-Voltage SEM: J. Pawley
Languages for Vector Computers: R. H. Perrott
Electron Scattering and Nuclear Structure: G. A. Peterson
Electrostatic Lenses: F. H. Read and I. W. Drummond
CAD in Electromagnetics: K. R. Richter and O. Biro
Scientific Work of Reinhold Rüdenberg: H. G. Rudenberg
Atom-Probe FIM: T. Sakurai
Metaplectic Methods and Image Processing: W. Schempp
X-Ray Microscopy: G. Schmahl
Applications of Mathematical Morphology: J. Serra
Focus-Deflection Systems and Their Applications: T. Soma et al.
Electron Gun Optics: Y. Uchikawa
Thin-Film Cathodoluminescent Phosphors: A. M. Wittenberg
Electron Microscopy and Helmut Ruska: C. Wolpers
The Optics of Round and Multipole Electrostatic Lenses
I. Introduction
II. Equations of Motion of Charged Particles and Methods for Field Distribution Calculation
    A. Equations of Motion
    B. Potential Distribution in Electrostatic Lenses
III. The Basic Concepts of Paraxial Optics
    A. The Paraxial Equations of Trajectory
    B. The Paraxial Characteristics of Electron Lenses
    C. Cardinal Elements
    D. The Matrix Method
IV. Aberrations of Electrostatic Lenses
    A. Third-Order Geometrical Aberrations
    B. Additional Data on Geometrical Aberrations
    C. Chromatic Aberration
    D. Distortions Due to Mechanical Defects
    E. Experimental Methods for Determining Electron Optical Characteristics
V. Phase-Space Approach to Particle Beams
    A. The Conception of Phase Space and the Liouville Theorem
    B. Beam Emittance and Its Transformation in Electron Optical Systems
    C. The Beam Envelopes
    D. Crossover
VI. Current Density Distribution and Frequency-Contrast Characteristics
    A. Calculation of Current Distribution in Space Beyond the Lens
    B. Frequency-Contrast Characteristics of Electron Optical Systems
VII. Round Lenses
    A. Field Distribution and the Paraxial Optics of Round Lenses
    B. Spherical Aberration of Round Lenses
    C. Field Aberrations
    D. Chromatic Aberration
    E. Two-Electrode Immersion Lenses
    F. Einzel Lenses
    G. Multielectrode Immersion Lenses
    H. Some Applications of Round Lenses
VIII. Quadrupole Lenses
    A. Fields of Quadrupole Lenses
    B. The Paraxial Properties of Quadrupoles
    C. Quadrupole Systems
    D. Geometrical Aberrations of Quadrupole Lenses
    E. Chromatic Aberrations of Quadrupoles. Achromatic Lenses
IX. Two-Dimensional (Cylindrical) Lenses
    A. Optical Properties of Two-Dimensional Lenses
    B. The Parameters of Some Two-Dimensional Lenses
X. Transaxial Lenses
    A. Potential Distribution and Focusing in Transaxial Lenses
    B. Geometrical Aberrations of Transaxial Lenses
    C. Chromatic Aberration
    D. Transaxial Lenses Formed by Parallel Plates
XI. Crossed Lenses
    A. A Three-Electrode Einzel Crossed Lens with Identical Rectangular Apertures
    B. Modifications of an Einzel Crossed Lens
    C. Systems of Crossed Lenses
    D. Correctors of Geometrical Aberrations
XII. New Types of Lenses. Aberration Correctors
    A. Coaxial Lenses with Transverse Fields
    B. Radial Lenses
    C. Correction of Geometrical Aberrations by Means of Octupoles
    D. Lenses with Partial Aberration Correction
XIII. Conclusion
References
This chapter is concerned with electrostatic focusing of charged particles. The description is largely restricted to low-intensity beams, for which the effect of the intrinsic space charge of the beam can be neglected. Both round and astigmatic electron optical lenses are considered.

Electrostatic lenses find wide application in many fields of science and technology. Along with the traditional applications (cathode-ray devices, input devices in spectrometers, etc.), newer areas of use have appeared. In recent years, methods have been devised to probe matter by charged particle beams. The data obtained in this field have been used to design instruments for effective technological control of the production of microelectronic devices. One example is instruments to control solid surfaces. The problems of charged particle focusing that arise here are primarily resolved with electrostatic lenses.

The number of publications on electrostatic lenses and their optimization is very large. There are a few books that tackle these questions to some extent. However, there is no monograph that covers all of the lens types of interest and that includes recent results of original research.

This article considers the present state of the theory and application of electrostatic lenses. The methods for field calculation, the theory of focusing, and the aberrations of electrostatic electron lenses are discussed; recent data on round and quadrupole lenses are presented; and two types of astigmatic lenses, transaxial and crossed ones, which are not commonly known, are considered in detail. An outline of some new lens types and aberration correctors is also included.

The article consists of 12 sections. After the Introduction, the next five sections are concerned with basic concepts relevant to all lens types: potential distribution (Section II), the theory of first-order focusing (Section III), geometric and chromatic aberrations (Section IV), application of phase-space theory to problems of focusing charged particle beams (Section V), and current-density distribution in beams with finite phase space (Section VI). The sections that follow (Sections VII-XII) deal with each type of lens individually, providing the basic characteristics, the methods to calculate lens systems, and ways of designing electron optical systems with corrected aberrations.

Our objective is to give a coherent treatment of the theory of electrostatic lenses on a unified basis and to systematize the available data concerning their electron optical properties. The authors hope that this monograph will provide guidance to the extensive literature on this subject and will stimulate the solution of problems associated with the selection, calculation, and design of electron optical systems.
I. INTRODUCTION
Electron lenses¹ are the basic components of most electron optical devices. Depending on the application, a device may also contain deflecting components, various mass and energy analyzers of the charged particles, and correctors to cancel the focusing and deflection errors. But practically all schemes using charged particle beams include focusing elements, that is, lenses. The available monographs on electron optics all give much attention to lenses.

¹ The terms electron lenses and electron optics originated early in the history of this field. They are not quite adequate, and it would be more suitable to speak of lenses for charged particles and of charged particle optics. However, for brevity and for the sake of tradition, we use here the commonly accepted terminology.

It is generally believed that the history of electron lenses started with the publication of H. Busch's work in 1926 (Busch, 1926), which showed that electric and magnetic fields with rotational symmetry could focus beams of charged particles, that is, they could act as lenses. An experimental investigation of electrostatic round and two-dimensional lenses was conducted by Davisson and Calbick (1931). The further development of electron optics resulted in a detailed theory and design of both electrostatic and magnetic electron lenses. In practice, the choice between the two types of lenses is based on the ability of a lens to meet most of the requirements that a particular problem implies. Each of the two lens types possesses some advantages and disadvantages.

The advantages of electrostatic lenses are their smaller weight and size, the lack of power consumption (which facilitates stabilization and reduces the weight of the voltage supply), and the simplicity of their manufacturing technology. They have zero response time and, for this reason, may be used to work with fast processes. Unlike iron-free magnetic lenses, electrostatic lenses provide a higher field precision. Another merit of these lenses, compared to iron magnetic lenses, is the absence of residual fields and, therefore, a better reproducibility of the field distribution. The optical power of electrostatic lenses does not depend on the mass of the charged particles, but is determined only by their energy; so for focusing heavy particles of moderate energies, they should be preferred to magnetic lenses. On the other hand, for the focusing of light particles, magnetic lenses possess a greater optical power, which is practically attainable at high energies. Therefore, electrostatic lenses should only be used at low and moderate energies. A traditional application of magnetic lenses is the focusing of high-energy electrons. As a rule, magnetic lenses show a lower level of aberrations.

The development of electron optics has been closely related to its various practical applications. As far back as 1931, the idea of focusing charged particle beams to obtain an adequate electron optical image was realized in the first electron microscope by E. Ruska. Electron microscopy has made great progress since that time. The resolution of a transmission electron microscope is about 0.1 nm, which is close to the theoretical limit.
Various modifications of the electron microscope have been designed (high-voltage and scanning microscopes, and others) that have found wide application in many fields. It should be noted that the recent modifications are based on magnetic focusing. The electrostatic lens is used only in the microscope electron gun to form an accelerated electron beam. In a high-voltage microscope, the electrons are accelerated by a series of electrostatic lenses. At present, ion scanning microscopes with an ion probe formed by a system of electrostatic lenses are in wide use (Levi-Setti, 1980). A schematic diagram of such a device is shown in Fig. 1. The objective and projector of the microscope are round einzel lenses. There is a field-emission ion source, with an emitter radius of 150 nm and great brightness ($5 \times 10^{3}$ A/cm²·sr for Ar ions). This provides a total probe current of $2 \times 10^{-11}$ A at a probe radius of 100 nm and an accelerating voltage of 10-20 kV.

FIG. 1. Electron optical diagram of a scanning ion microscope: (1) ion source; (2) field emission cathode; (3) gas inlet; (4) liquid nitrogen; (5) objective; (6) aperture diaphragm; (7) double deflection plates; (8) projector; (9) deflecting plates; (10) sample; (11) detector.

There is another class of conventional electron optical devices that use electrostatic lenses as the focusing system: cathode-ray tubes. At present, the tubes are an integral part of various commercially produced devices, e.g., TV sets, oscilloscopes, etc. Figure 2 shows a diagram of a modern wideband oscilloscope tube. Its focusing system consists of three crossed electrostatic lenses, which have partly replaced round lenses, because they increase the deflection and provide a smaller spot on the screen.

FIG. 2. Principal diagram of a wideband oscillograph tube: (1) electron gun; (2) einzel crossed lenses; (3) deflecting plates; (4) post-accelerating system; (5) screen.

Electrostatic lenses find new applications in investigations of solid surfaces. Surface research is one of the most important fields in modern science. Solid-surface physics is associated with a large number of fundamental and applied problems in microelectronics, catalysis, adhesion, friction, etc. Numerous techniques have been suggested to solve these problems, most of which are based on charged particle beams probing the surface under study. The installations that have been designed are, as a rule, sophisticated and envisage the use of several techniques simultaneously (Cherepin and Vasil'ev, 1982). A schematic representation of a microprobe for secondary ion mass spectrometry (SIMS) is given in Fig. 3 (McHugh, 1975). It uses electrostatic round lenses to form the primary ion beam, which scans the specimen, and to focus secondary ions. The SIMS technique allows one to find the surface distribution of a specific chemical element with a resolution of a few microns.

FIG. 3. Diagram of an ion microprobe for surface analysis: (1) primary ion source; (2, 10) mass analyzers; (3) condenser; (4) deflecting plates; (5) objective; (6) optical system; (7) sample; (8) secondary ion focusing system; (9) electrostatic analyzer; (11) ion detector.

One should keep in mind that progress in electron optical device design is closely associated with the development of the adjacent areas of science and technology. It has been stimulated by the improvement of high-vacuum technology, fabrication of vacuum materials, design of new types of cathodes (e.g., field-emission cathodes), etc.

There are several types of electrostatic lenses, each of which has found specific applications associated with its electron optical properties. Round lenses are most commonly used because the study of their properties has the longest history. This is the only type of lens capable of uniformly converging charged particle beams in any direction, creating a correct electron optical image. All the other lenses are astigmatic, so it is necessary to use a system of lenses in order to get a point image of a point object.

Two-dimensional (cylindrical) lenses converge particles in only one direction; for this reason, they are largely employed to focus ribbon beams, for example, in ion sources. Application of quadrupoles in electron optical devices started in the 1950s (Courant et al., 1952). These lenses have a high optical power and are used to focus high-energy beams. Much effort has been made to create quadrupole-octupole correctors. A wide fan-shaped beam or a set of beams lying in the same plane can be conveniently controlled by transaxial lenses, which combine well with a prism spectrometer. Cathode-ray devices use crossed lenses, which permit correction of spherical aberration and, at the same time, are comparatively simple to produce and adjust.

One can see from the foregoing discussion that the applications of electrostatic lenses and, therefore, the requirements placed on them are very diverse. One or another lens parameter may become essential for the solution of a particular problem. For instance, transportation of high-energy particle
beams requires a system of very high optical power. The lenses used in a microscope must form a correct image, the objective must have low spherical aberration, and the projector lens must possess low distortion. The focusing of ion beams, known for their large energy spread, requires lenses with low chromatic aberration. The various requirements have given rise to a large number of modifications of electrostatic lenses, which differ in both design and optical characteristics. At present, much work is being done to optimize the properties of the available types of lenses and to design new ones.

There is a profound analogy between light propagation in an optical medium and the motion of charged particles in electric and magnetic fields, which is based on the analogy between the Fermat principle for light propagation and the least action principle for particle motion. Many of the available electron optical devices have analogues among optical devices (microscopes, spectrometers). Their principal designs are, as a rule, similar and contain similar components (lenses, prisms). Since light optics has a long history, some of its basic results have been borrowed by electron optics. For this reason, much of the theoretical treatment of electron optical components and the terminology in electron optics are basically similar to those in light optics.

However, there is a fundamental difference between the two fields of knowledge. In light optics, we deal with well-bounded homogeneous media, whose boundaries may be chosen arbitrarily. In electron optical systems, the media are essentially inhomogeneous, their transitions are continuous, and the electron optical refractive index $n = \sqrt{\varphi}$ is governed by the Laplace equation for the potential $\varphi$, and thus $n$ cannot be given arbitrarily. The refractive index variation in glass lenses is comparatively small, smaller than one order, while in electron optical systems it varies over a very wide range of values.

The implications of these differences will be considered with reference to round-lens optics. Since we can take the refractive index and boundary shape of a glass lens arbitrarily, it is not difficult to make a diverging lens or lenses with corrected spherical and chromatic aberrations. This cannot be done for round electron optical lenses. Here, we have an unambiguous correlation between the field distribution in space and its distribution along the axis due to the superposition of the axial symmetry conditions on the Laplace equation, which completely defines the shape of equipotential surfaces. Therefore, there are no diverging round electron lenses or lenses with corrected spherical or chromatic aberrations. Similar conclusions can be drawn from comparison of two-dimensional (cylindrical) electron and light lenses. In some types of astigmatic electron lenses, where the field has a different symmetry, these limitations may be removed.

An advantage of electron lenses is the possibility of electric control of their parameters, while glass optics permit only mechanical adjustments to be made. So the requirements for the calculation accuracy of electron optical design may be somewhat lower because there is an opportunity to obtain the necessary parameters by a slight adjustment of the electrode potentials.

The development of electron optics is much affected by the variety of its applications. As a result, there is a certain disintegration of electron optical studies and a variability in the terminology used, as well as some difficulties in the use of achievements in related areas. In this monograph, the theory of electrostatic lenses is treated consistently from a unified point of view; the basic characteristics of the lenses are compared to point out their common properties and specific features, together with their essential applications.
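The analogy with light optics discussed in this Introduction can be illustrated by a small numerical sketch. It assumes the nonrelativistic case, in which the particle momentum scales as the square root of the potential measured from the point of zero velocity, and an idealized abrupt potential step (real electron optical media change continuously). Conservation of the momentum component tangential to the step then gives the analogue of Snell's law, $\sqrt{\varphi_1}\,\sin\alpha_1 = \sqrt{\varphi_2}\,\sin\alpha_2$. The function name and the sample potentials are illustrative assumptions and are not taken from the text.

```python
import math


def refraction_angle(alpha1_deg, phi1, phi2):
    """Angle to the normal (degrees) after an abrupt potential step.

    phi1, phi2 are the magnitudes of the potentials (V) on the two sides,
    measured from the point where the particle velocity is zero, so that
    the nonrelativistic "refractive index" is n = sqrt(phi).
    Returns None when the particle cannot enter the second region
    (the analogue of total reflection).
    """
    n1 = math.sqrt(phi1)
    n2 = math.sqrt(phi2)
    sin_alpha2 = n1 * math.sin(math.radians(alpha1_deg)) / n2
    if abs(sin_alpha2) > 1.0:
        return None
    return math.degrees(math.asin(sin_alpha2))


if __name__ == "__main__":
    # Acceleration from 100 V to 400 V doubles n, so sin(alpha) is halved.
    print(refraction_angle(30.0, 100.0, 400.0))
    # Deceleration at a steep angle: the particle is reflected.
    print(refraction_angle(60.0, 400.0, 100.0))
```

Because the index follows the potential, a fourfold change in potential gives only a twofold change in index, while the potential itself, and hence the index, may vary over orders of magnitude within one device.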
II. EQUATIONS OF MOTION OF CHARGED PARTICLES AND METHODS FOR FIELD DISTRIBUTION CALCULATION
The various types of available electron lenses are commonly considered in terms of the general theory of motion of charged particles in electric and magnetic fields. To determine the trajectory of a charged particle, one must know the equation of motion of a particle in static fields and the potential distribution in the lens in question. The first problem is discussed in Section II.A; a review of the field calculations is given in Section II.B.

A. Equations of Motion
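The relativistic equation of motion of charged particles in an arbitrary electric field $\mathbf{E}$ and in a magnetic field $\mathbf{H}$ is as follows:

$$\frac{d}{dt}\left[\frac{m\mathbf{v}}{\sqrt{1 - v^{2}/c^{2}}}\right] = e(\mathbf{E} + \mathbf{v} \times \mathbf{H}), \tag{1}$$

where $t$ denotes the time and $\mathbf{v}$ the charged particle velocity; $c$ is the speed of light, and $m$ and $e$ are the rest mass and the charge, respectively. In the static case, the field $\mathbf{E}$ relates to the potential $\varphi$ in the following way:

$$\mathbf{E} = -\operatorname{grad}\varphi. \tag{2}$$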
Here, the energy conservation law allows one to relate the particle velocity and the electrostatic potential at a certain point in space:

$$mc^{2}\left(\frac{1}{\sqrt{1 - v^{2}/c^{2}}} - 1\right) = -e\varphi. \tag{3}$$

In this equation, $\varphi = 0$ at the point where the particle velocity is zero.
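The relation between velocity and potential can be evaluated numerically to see how much the relativistic treatment matters. The following is a minimal sketch, assuming a singly charged particle whose kinetic energy equals $|e\varphi|$ and using the standard relativistic kinetic energy $mc^{2}(\gamma - 1)$; the function names are illustrative, and the constants are rounded CODATA values.

```python
import math

C = 299_792_458.0               # speed of light, m/s
E_CHARGE = 1.602176634e-19      # elementary charge, C
M_ELECTRON = 9.1093837015e-31   # electron rest mass, kg
M_PROTON = 1.67262192369e-27    # proton rest mass, kg


def speed_relativistic(energy_ev, mass):
    """Speed (m/s) from the kinetic energy in eV, relativistic formula."""
    kinetic = energy_ev * E_CHARGE
    gamma = 1.0 + kinetic / (mass * C**2)
    return C * math.sqrt(1.0 - 1.0 / gamma**2)


def speed_nonrelativistic(energy_ev, mass):
    """Speed (m/s) from m v^2 / 2 = kinetic energy."""
    kinetic = energy_ev * E_CHARGE
    return math.sqrt(2.0 * kinetic / mass)


def relative_excess(energy_ev, mass):
    """Fraction by which the nonrelativistic formula overestimates the speed."""
    return speed_nonrelativistic(energy_ev, mass) / speed_relativistic(energy_ev, mass) - 1.0


if __name__ == "__main__":
    for name, mass, energy in (("electron", M_ELECTRON, 1.0e4),
                               ("proton", M_PROTON, 5.0e5)):
        print(name, energy, relative_excess(energy, mass))
```

For a 10 keV electron the nonrelativistic speed is too high by an amount of the order of one percent; whether that is acceptable depends on the accuracy required of the trajectory calculation.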
In the nonrelativistic case ($v^{2} \ll c^{2}$), the denominator of the left side of Eq. (1) is unity, and Eq. (3) takes the form

$$\frac{mv^{2}}{2} = -e\varphi. \tag{4}$$
In considering the motion of electrons, the relativistic correction should be taken into account at particle energies greater than 10 keV; for protons, the correction is needed at energies greater than 0.5 MeV. Equation of motion (1) in the vector form corresponds to three scalar equations that may be written in an arbitrary coordinate system. Very often it becomes possible to select the coordinate system in such a way that the field does not depend on one of the coordinates, which considerably simplifies the solution of the problem. For example, considering two-dimensional (planar) fields, it is convenient to use the Cartesian system. Then, the scalar equations of motion of a charged particle (for the nonrelativistic case) have the form

$$\begin{aligned}
m\ddot{x} &= eE_x + e(\dot{y}H_z - \dot{z}H_y),\\
m\ddot{y} &= eE_y + e(\dot{z}H_x - \dot{x}H_z),\\
m\ddot{z} &= eE_z + e(\dot{x}H_y - \dot{y}H_x).
\end{aligned} \tag{5}$$
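Here and below, the dots denote differentiation with respect to time. If the field has rotational symmetry, it is natural to find the trajectories in the cylindrical coordinate system $(r, \psi, z)$, with the $z$-axis aligned with the symmetry axis of the field. Then the equations of motion may be written as
$$\begin{aligned} m(\ddot{r} - r\dot{\psi}^2) &= eE_r + e(r\dot{\psi}H_z - \dot{z}H_\psi),\\ \frac{m}{r}\frac{d}{dt}\left(r^2\dot{\psi}\right) &= eE_\psi + e(\dot{z}H_r - \dot{r}H_z),\\ m\ddot{z} &= eE_z + e(\dot{r}H_\psi - r\dot{\psi}H_r). \end{aligned} \tag{6}$$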
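The Cartesian equations of motion (5) can be integrated numerically once the fields are known. The following is a minimal sketch under stated assumptions, not a procedure taken from the text: it uses a classical fourth-order Runge-Kutta step, the function names (`trace`, `rk4_step`, and so on) are hypothetical, the fields are supplied as user-defined functions of position, and the magnetic quantity is taken exactly as it enters Eq. (5), so in SI units it would be the flux density in tesla.

```python
def cross(a, b):
    """Vector product a x b for 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def derivative(state, e_over_m, E_field, H_field):
    """Right-hand side of Eq. (5): d(r, v)/dt = (v, (e/m)(E + v x H))."""
    r, v = state
    E, H = E_field(r), H_field(r)
    vxH = cross(v, H)
    a = tuple(e_over_m * (E[i] + vxH[i]) for i in range(3))
    return v, a

def advance(state, deriv, h):
    """Return state + h * deriv, component by component."""
    (r, v), (dr, dv) = state, deriv
    return (tuple(r[i] + h * dr[i] for i in range(3)),
            tuple(v[i] + h * dv[i] for i in range(3)))

def rk4_step(state, dt, e_over_m, E_field, H_field):
    """One classical Runge-Kutta step for the nonrelativistic equations."""
    f = lambda s: derivative(s, e_over_m, E_field, H_field)
    k1 = f(state)
    k2 = f(advance(state, k1, dt / 2))
    k3 = f(advance(state, k2, dt / 2))
    k4 = f(advance(state, k3, dt))
    # Weighted sum of the four slopes: (k1 + 2 k2 + 2 k3 + k4) * dt / 6.
    for k, w in ((k1, dt / 6), (k2, dt / 3), (k3, dt / 3), (k4, dt / 6)):
        state = advance(state, k, w)
    return state

def trace(r0, v0, dt, n_steps, e_over_m, E_field, H_field):
    """Integrate from (r0, v0) over n_steps steps; return the final (r, v)."""
    state = (tuple(r0), tuple(v0))
    for _ in range(n_steps):
        state = rk4_step(state, dt, e_over_m, E_field, H_field)
    return state

# Illustrative use: an electron (e/m about -1.76e11 C/kg) starting at rest
# in a uniform electric field of -100 V/m along x, followed for 1 ns.
uniform_E = lambda r: (-100.0, 0.0, 0.0)
no_H = lambda r: (0.0, 0.0, 0.0)
r_end, v_end = trace((0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                     1e-11, 100, -1.76e11, uniform_E, no_H)
```

In a uniform electric field the result can be compared with the elementary formula $x = (e/2m)E_xt^2$, and in a uniform magnetic field the speed should stay constant; both are convenient checks before nonuniform lens fields are substituted.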
As a rule, we are interested in the position of a particle in space, rather than in the dependence of this position on time; therefore, it is reasonable to pass from the equation of motion to the equation of trajectory. Time can be conveniently eliminated from Eq. (1) using Eq. (3). If we take the arc length of the trajectory s as an independent variable, we get
where $\varepsilon = -e/2mc^2$ and $\eta = (|e|/2m)^{1/2}$. For the electron $\eta \approx 3\times10^5$ C$^{1/2}$ kg$^{-1/2}$ and $\varepsilon \approx 10^{-6}$ V$^{-1}$; $\varepsilon$ is negative for positively charged particles. Generally, beams of charged particles move in a lens close to the optic axis coinciding with the $z$-axis. In this case, the independent variable may be the $z$-coordinate, which is related to $s$ as follows:
$$ds = \left[1 + \left(\frac{dx}{dz}\right)^2 + \left(\frac{dy}{dz}\right)^2\right]^{1/2} dz. \tag{8}$$
Then, Eq. (7) takes the form
For the nonrelativistic case, one should neglect the terms containing $\varepsilon\varphi$ in Eq. (9). To find the trajectory, Eq. (9) may be used in two ways. If the particle beams are sufficiently narrow and pass near the system axis, one can conveniently represent the field and the potential as power series in the vicinity of the axis and extract from Eq. (9) the paraxial equation and higher order aberration equations. This approach will be discussed in detail later. When we deal with very wide particle beams or when the axial trajectory has a complicated configuration (i.e., it does not coincide with the system axis or passes through a field having no symmetry axis), it is worth using Eq. (9) in its initial form.
The trajectories of charged particles can also be defined by integrating the Hamilton-Jacobi equation (see, for example, Kel'man and Yavor, 1968).
Of great importance in designing and modeling electron optical systems is the similarity theory, which permits selection of the optimal size of the electrodes and poles, as well as of the field intensities, taking into account the device dimensions and the requirements on current passage and image quality. In discussing the similarity principle, we shall restrict ourselves to the nonrelativistic case. First, we shall consider how an $l$-fold uniform extension of the electron optical system, a $p$-fold increase of the electric field, and a $q$-fold increase of the magnetic field will affect particle trajectories. Denoting the initial values with the subscript 1 and the modified values with the subscript 2, we write
$$\mathbf{r}_2 = l\,\mathbf{r}_1, \qquad \mathbf{E}_2(\mathbf{r}_2) = p\,\mathbf{E}_1(\mathbf{r}_1), \qquad \mathbf{H}_2(\mathbf{r}_2) = q\,\mathbf{H}_1(\mathbf{r}_1). \tag{10}$$
It is clear that for the potential we have $\varphi_2(\mathbf{r}_2) = pl\,\varphi_1(\mathbf{r}_1)$. The equation of motion for the initial system is
$$m\frac{d^2\mathbf{r}_1}{dt^2} = e\mathbf{E}_1(\mathbf{r}_1) + e\left[\frac{d\mathbf{r}_1}{dt}\times\mathbf{H}_1(\mathbf{r}_1)\right], \tag{11}$$
while for the modified system, taking Eq. (10) into account, it is
$$ml\frac{d^2\mathbf{r}_1}{dt^2} = ep\,\mathbf{E}_1(\mathbf{r}_1) + elq\left[\frac{d\mathbf{r}_1}{dt}\times\mathbf{H}_1(\mathbf{r}_1)\right]. \tag{12}$$
Hence, Eqs. (11) and (12) will coincide if a new variable is introduced,
$$\tau = t\sqrt{p/l}, \tag{13}$$
and if the following condition is satisfied,
$$\frac{q^2l}{p} = 1. \tag{14}$$
Then the trajectories in the modified field will be similar to those in the initial field, with the similarity coefficient $l$, and the particle velocities will be multiplied by $\sqrt{pl}$. It should be remembered that the initial conditions must be changed in a similar way, that is, the initial velocity must change $\sqrt{pl}$ times, the coordinates of the trajectory's initial point will change $l$-fold, while the trajectory inclination angle at this point must remain unchanged. Since condition (14) does not arise for a purely electric field, the similarity principle may be stated as follows: An $l$-fold increase of the system size, together with a $pl$-fold increase of the electrode potentials and the respective change of the initial conditions, will not alter the trajectory configuration, producing only an $l$-fold increase of its scale and a $\sqrt{pl}$-fold increase of the velocity. We should also note that an $n$-fold change of the specific charge $e/m$ that preserves its sign does not alter the particle trajectory in a purely electric field, but only multiplies the velocity by $\sqrt{n}$. If one does change the charge sign, the change of the potential signs on all of the electrodes will leave the trajectory unchanged.
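The scaling rules of the similarity principle can be collected in a small helper. This is only an illustrative sketch of the nonrelativistic relations stated above; the function name `similarity_scaling` and the dictionary keys are hypothetical and do not come from the text.

```python
import math

def similarity_scaling(l, p):
    """Scale factors for an l-fold size increase and a p-fold increase
    of the electric field (nonrelativistic similarity principle)."""
    if l <= 0 or p <= 0:
        raise ValueError("l and p must be positive")
    return {
        "length": l,                       # trajectory scale
        "potential": p * l,                # electrode potentials
        "velocity": math.sqrt(p * l),      # particle velocities
        "time": math.sqrt(l / p),          # transit times, from Eq. (13)
        "magnetic_field": math.sqrt(p / l) # q required by condition (14)
    }

# Example: doubling the lens size while keeping the electric field strength
# (p = 1) doubles the potentials and requires q = 1/sqrt(2) if a magnetic
# field is present.
factors = similarity_scaling(2.0, 1.0)
```

The entries are mutually consistent: the velocity factor times the time factor equals the length factor, and the magnetic factor satisfies $q^2l/p = 1$.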
B. Potential Distribution in Electrostatic Lenses
The solution of the equation of motion or the equation of trajectory of a charged particle requires knowledge of the potential distribution $\varphi(\mathbf{r})$ in the interelectrode space, which is determined by the electrode potentials and geometry. Finding this distribution is the most complicated aspect of the problem. In a space free from electric charges, the function $\varphi(\mathbf{r})$ obeys the Laplace equation
$$\Delta\varphi(\mathbf{r}) = 0, \tag{15}$$
and it must satisfy the given boundary conditions on the electrodes. In the Cartesian coordinate system, the Laplace equation (15) has the form
$$\frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} = 0. \tag{16}$$
In cylindrical polar coordinates, the Laplace equation is
$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\varphi}{\partial r}\right) + \frac{1}{r^2}\frac{\partial^2\varphi}{\partial\psi^2} + \frac{\partial^2\varphi}{\partial z^2} = 0. \tag{17}$$
The difficulty in solving this equation is largely associated with the boundary conditions that are to be satisfied. It follows from the homogeneity of the Laplace equation that the electrostatic potential distribution obeys the superposition principle, which may facilitate the calculations. If the electrostatic field is created by a series of $N$ electrodes and the potential $V_i$ is applied to the $i$-th electrode, the potential distribution has the form
$$\varphi(\mathbf{r}) = V_1\psi_1 + V_2\psi_2 + \dots + V_N\psi_N. \tag{18}$$
Here $\psi_i(\mathbf{r})$ is the function of the $i$-th electrode effect, which characterizes the potential distribution when a unit potential is applied to the $i$-th electrode while the other electrodes are grounded. A change of the $i$-th electrode potential changes only the coefficient of the respective function $\psi_i$ in expression (18). Both experimental and computational techniques have been designed to define the electrostatic fields. Experimental methods include measurements in the electrolytic tank or on the resistance network (see, for instance, Kel'man and Yavor, 1968). The possibility of defining lens fields by modeling in the electrolytic tank is based on the fact that the potential distribution in an electrolyte, as well as in a vacuum, obeys the Laplace equation. Since this equation has a unique solution for given values on the boundary surfaces (electrodes), the potential distributions in the electrolyte and in the vacuum will coincide if the boundary conditions are identical. The principle of this method is that a model of the electron optical system, cut along a symmetry plane, is immersed in the electrolyte so that the symmetry plane coincides with the electrolyte surface; then the potentials are applied to the electrodes and the potential distribution on the electrolyte surface is measured with a special probe. Owing to the homogeneity of the Laplace equation, the electrode sizes and their potentials can be varied proportionally.
For modeling the fields, the continuous conducting medium may be replaced by a resistance network. Mathematically, this procedure is equivalent to the replacement of the differential Laplace equation by the finite difference equation. The boundary conditions are preset by connecting the mesh nodes, whose positions correspond to the electrode profiles, and applying the respective potential to them. The field is found by measuring the potentials at the nodes located within the given boundaries. Generally, two-dimensional networks are used, which permit modeling planar and axially symmetrical fields if an adequate combination of resistances is selected. The measurement accuracy increases with an increasing number of nodes. This method provides higher accuracy (0.01%) than the electrolytic tank; in addition, the network is easier to operate. One of the drawbacks of experimental techniques for evaluating fields is the large size of the equipment used and the low accuracy of determining the potential distribution function derivative. Lately, due to the rapid development of computer technology, experimental methods have nearly entirely given way to computational techniques. These may be subdivided into analytical and numerical methods; they have been described in many publications (see, for example, Binns and Lawrenson, 1963; Vlasov and Shapiro, 1974; Zienkiewicz, 1977; Il'in, 1974; Tsyrlin, 1977). A strict mathematical justification of numerical approaches can be found in the work of Kantorovich and Krylov (1950). A brief review of the techniques commonly used to calculate electron lenses, with examples of their application, is presented in the work of Mulvey and Wallington (1973). A traditional approach to the planar problems is the use of conformal transformations. They are applied to the fields of two-dimensional (cylindrical) lenses, quadrupoles, and multipoles of greater lengths. 
The method reduces to finding the harmonic functions $\varphi(x, y)$ that satisfy the Laplace equation in a two-dimensional D-region and the boundary conditions on the electrode profiles $S_i$. If the D-region also includes an infinitely remote point, these conditions must be supplemented by the requirement that the potential $\varphi$ should tend to zero at infinity uniformly in all directions. An effective way of solving the problem is the construction of a function of a complex variable that is regular in the D-region and maps it conformally onto another D*-region for which the solution of the problem is known. The construction of a function performing the conformal transformation of the D-region onto the D*-region may in many cases be made by combining a number of transformations given by known elementary functions. When the
D-region is a polygon that must be mapped onto the upper half-plane, the transformation function can be found using the well-known Schwarz-Christoffel integral. In this case, the polygon boundary is mapped onto the real axis of the D*-region. It is considerably easier to solve the Laplace equation for the upper half-plane, because the boundary conditions are simplified. Difficulties arise when trying to relate that solution back to the real plane; they are due to the fact that the constants included in the function that performs the conformal transformation are intricately related to the lens geometry and must be found by solving a set of transcendental equations. The development of computational methods for solving systems of equations permits one to overcome these difficulties and opens up possibilities for a wider application of conformal transformation to determine two-dimensional lens fields. A considerable gain can be made in the time and accuracy of these calculations compared to the numerical methods of field calculation. Another advantage of this approach, as well as of the other analytical methods, is the possibility of obtaining simple analytical expressions for the potential derivatives necessary for aberration calculations.
One of the general approaches to solving partial differential equations is separation of the variables. However, it allows one to find the potential distribution for only relatively simple boundary conditions, for example, when a constant value of one of the coordinates describes the electrode profile. The method consists of writing the solution as the product of functions, each of which depends on only one variable of the coordinate system. As a result, the Laplace partial differential equation can be transformed into two or three ordinary differential equations related to each other by some constants. Their solution then becomes a relatively simple task.
A superposition of products of the interrelated partial solutions of these equations, chosen to satisfy the given boundary conditions, is the complete solution of the Laplace equation representing the distribution $\varphi(\mathbf{r})$ in question. For illustration, we shall consider the Laplace equation in the cylindrical coordinates of Eq. (17). We shall seek a solution that is a product of three functions:
$$\varphi(r, \psi, z) = R(r)\,Y(\psi)\,Z(z). \tag{20}$$
By substituting Eq. (20) into Eq. (17) and dividing by $\varphi(r, \psi, z)$, we get
$$\frac{1}{rR}\frac{d}{dr}(rR') + \frac{1}{r^2}\frac{Y''}{Y} + \frac{Z''}{Z} = 0, \tag{21}$$
where the primes denote differentiation of each function with respect to its own argument.
Each term in Eq. (21) is a function of only one variable, so Eq. (21) will be satisfied everywhere only if each of the terms $Y''/Y$ and $Z''/Z$ is equal to a constant. Therefore, we may write
$$\frac{Y''}{Y} = -n^2, \qquad \frac{Z''}{Z} = k^2. \tag{22}$$
Here, $n$ can take only integer values, because the partial solutions $\cos n\psi$ and $\sin n\psi$ of the first equation must be periodic functions of $\psi$ with a period equal to $2\pi$. The constant $k$ may be either real or imaginary. Then for $R(r)$ we get the following equation:
$$\frac{d^2R}{dr^2} + \frac{1}{r}\frac{dR}{dr} + \left(k^2 - \frac{n^2}{r^2}\right)R = 0. \tag{23}$$
By introducing a new variable $v = kr$, we can reduce Eq. (23) to the Bessel equation
$$\frac{d^2R}{dv^2} + \frac{1}{v}\frac{dR}{dv} + \left(1 - \frac{n^2}{v^2}\right)R = 0, \tag{24}$$
the solution of which for real $k$ has the form
$$R_n(kr) = AJ_n(kr) + BN_n(kr). \tag{25}$$
Here, $J_n(kr)$ is the $n$-th order Bessel function of the first kind; $N_n(kr)$ is the Bessel function of the second kind. If $k$ is imaginary, the solution takes the form
$$R_n(kr) = AI_n(kr) + BK_n(kr), \tag{26}$$
where $I_n(kr)$ and $K_n(kr)$ are the modified Bessel functions of the first and second kind, respectively. Since the functions $N_n(kr)$ and $K_n(kr)$ become infinite on the axis ($r = 0$), the arbitrary constant $B$ must be zero if there are no electrodes on the axis. In practice, this is the case in most electrostatic lenses used. The general solution of the Laplace equation, as was pointed out earlier, is the superposition of its partial solutions
$$\begin{aligned} \varphi(r, \psi, z) &= \sum_n\sum_k J_n(kr)\,[C_n\cos n\psi + D_n\sin n\psi]\,[E_k\cosh kz + F_k\sinh kz] && \text{for real } k,\\ \varphi(r, \psi, z) &= \sum_n\sum_k I_n(kr)\,[C_n\cos n\psi + D_n\sin n\psi]\,[E_k\cos kz + F_k\sin kz] && \text{for imaginary } k. \end{aligned} \tag{27}$$
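As a quick numerical illustration of the separated solutions, one can check that the rotationally symmetric ($n = 0$) products satisfy the Laplace equation (17). The sketch below is an assumption-laden illustration rather than part of the original treatment: the Bessel functions are evaluated from their power series, the Laplacian is approximated by central differences, and all function names are hypothetical.

```python
import math

def bessel_j0(x, terms=40):
    """J_0(x) from its power series: sum of (-1)^m (x/2)^(2m) / (m!)^2."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((m + 1) ** 2)
    return total

def bessel_i0(x, terms=40):
    """Modified Bessel function I_0(x): the same series without the signs."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= ((x / 2.0) ** 2) / ((m + 1) ** 2)
    return total

def laplacian_axisymmetric(f, r, z, h=1e-3):
    """Central-difference estimate of f_rr + f_r / r + f_zz (Eq. (17), no psi dependence)."""
    f_rr = (f(r + h, z) - 2.0 * f(r, z) + f(r - h, z)) / h ** 2
    f_r = (f(r + h, z) - f(r - h, z)) / (2.0 * h)
    f_zz = (f(r, z + h) - 2.0 * f(r, z) + f(r, z - h)) / h ** 2
    return f_rr + f_r / r + f_zz

k = 2.0  # arbitrary separation constant chosen for the illustration

# Partial solution for real k: J_0(kr) cosh(kz)
phi_real_k = lambda r, z: bessel_j0(k * r) * math.cosh(k * z)
# Partial solution for imaginary k (written with its modulus): I_0(kr) cos(kz)
phi_imag_k = lambda r, z: bessel_i0(k * r) * math.cos(k * z)

residual_real = laplacian_axisymmetric(phi_real_k, 0.5, 0.3)
residual_imag = laplacian_axisymmetric(phi_imag_k, 0.5, 0.3)
```

Both residuals should be small compared with the individual second derivatives, which are of order $k^2$ here; the remaining difference is the discretization error of the finite-difference Laplacian.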
For round lenses (n = 0) Eq. (27) takes the form
When the field of the electron optical system extends along the $z$-axis from $-\infty$ to $+\infty$, the Fourier series transforms to the Fourier integral. This method is widely used to calculate round lenses composed of cylinders.
Lately, integral equations have found wide application in calculations of lens electrostatic potentials. The method to be described below is referred to by many authors as the charge density method. It is based on the potential representation in space in the form of so-called single (or double) layer potentials, that is, in the form of an integral over the electrode surface, the integrand being the product of a known kernel and an unknown density function. Equipotential electrode surfaces are replaced by surface charges with a certain density distribution. Since the potential at the system boundary is known, the density function may be found from the condition that the integral is equal to the potential value at any point on the boundary (or, as it is generally phrased, from the condition of boundary collocation). The first step in this method is to find the charge distribution on the electrodes. The second step is to determine the potential in the whole space. The potential $\varphi(\mathbf{r})$ at any point $\mathbf{r}$ of the lens space is given by Coulomb's law:
$$\varphi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_{k=1}^{n}\int_{S_k}\frac{\sigma_k(\mathbf{r}_k)}{|\mathbf{r} - \mathbf{r}_k|}\,dS_k, \tag{30}$$
where $n$ denotes the number of electrodes that form the lens, $\sigma_k(\mathbf{r}_k)$ is the charge distribution on the $k$-th electrode, $S_k$ is its surface, the vector $\mathbf{r}_k$ determines the point position on the surface $S_k$, and $\varepsilon_0$ is the dielectric constant. Then, for all values of $\mathbf{r}$ that belong to the surface $S_j$ we have
$$\frac{1}{4\pi\varepsilon_0}\sum_{k=1}^{n}\int_{S_k}\frac{\sigma_k(\mathbf{r}_k)}{|\mathbf{r} - \mathbf{r}_k|}\,dS_k = V_j, \qquad \mathbf{r}\in S_j, \tag{31}$$
where $V_j$ is the potential on the $j$-th electrode. It is this integral equation that is used to find the charge distribution $\sigma_k(\mathbf{r}_k)$. The most sophisticated part of the method is the solution of integral equation (31), which relates the applied potentials to the distribution of the surface charge density. For most lenses, the analytical solution of integral equations is impossible, so they are replaced by a set of linear algebraic equations, whose order is equal to the number of collocation points. Such a set of equations can be obtained, for example, if the terminal electrodes are divided into a finite number of intervals so that the continuous charge distribution is replaced by a discrete one. A charge equal to the product of the mean charge density within the interval and its length or area is placed at the central point of each interval (piecewise-constant approximation). This procedure permits one to get an approximate solution, the accuracy of which is high enough if the number of intervals is large. However, this approach involves the solution of numerous algebraic equations, which takes much time and effort. There are other ways of approximating the surface charge density, including bilinear or spline approximation. The charge density may be approximated with allowance for the features of the surface geometry (corners), which reduces the order of the set of equations. A symmetry of the potential distribution can also facilitate the solution of the problem. Although the integral equations describing electrostatic potentials have long been known, their wide use started only with the introduction of fast computers, which can provide sufficiently high accuracy and short computation time.
The development of large-memory and high-speed computers and of computational mathematics has stimulated the widespread use of numerical methods for field solution. We shall discuss briefly two of them: the finite-difference method and the finite-element method. The principle of the finite-difference method is that one partial differential equation (the Laplace equation) is replaced by a set of simple finite-difference equations. They have the form of linear algebraic equations relating the potential at each given point to the potentials at the neighboring points surrounding it. The solution of the obtained equations yields the potentials at discrete points in space. In order to replace the Laplace equation by a set of finite-difference equations, the lens inner space is divided into a finite number of regular meshes. In two-dimensional problems, square meshes are most often used (Fig. 4), for which the finite-difference equations have the form
$$\varphi_1 + \varphi_2 + \varphi_3 + \varphi_4 - 4\varphi_0 = 0, \tag{32}$$
where $\varphi_0$, $\varphi_1$, $\varphi_2$, $\varphi_3$, $\varphi_4$ are the potential values at the lattice nodes $P_0$, $P_1$, $P_2$, $P_3$, and $P_4$, respectively. It is seen that the order of the set of equations is equal to the number of lattice nodes. For nodes lying on the boundary, the potential values are known. If the boundaries do not coincide with the lattice used (if they have a curved geometry), the meshes adjacent to the boundary are deformed and the distance between any two neighboring nodes, one of which lies on the boundary, decreases. In this case the finite-difference equations must include additional factors for the potentials and differ from Eq. (32).
FIG. 4. Division of space within a lens with a square-mesh grid in the finite difference method.
When deducing Eq. (32), the assumption was made that the potential between any two neighboring lattice nodes changes linearly and that the potential $\varphi_0$ at node $P_0$ is influenced by the potentials at nodes $P_1$ to $P_4$ only. The obtained set of equations is, of course, only an approximation of the Laplace equation, but in practice one can always select a sufficiently small mesh size for the error to lie within permissible limits. Moreover, the accuracy of the method can be increased by introducing into the equation the potentials at four other nodes lying on the diagonals from node $P_0$. This is a nine-point scheme, in contrast to the five-point scheme described above. If a more detailed analysis of the field in some area of the lens is necessary, one may use a finer lattice for this area. The finite-difference equations (32) have two important features: The number of equations in a particular practical problem is very large, while the number of terms in each equation is small. To solve them, relaxation or iteration methods are generally used, which are based on a stepwise approximation. The finite-difference method has the merit of being general and versatile. Its other advantage is the simplicity of the algorithms used. It should be noted that the integral equation approach, possessing the same degree of generality as the finite-difference method, has certain advantages over it. First, potential derivatives are obtained analytically rather than by numerical differentiation. Second, the order of the set of linear equations is much lower than in the finite-difference method. At the same time, the method of integral equations involves more complex and elaborate algorithms to calculate the coefficients in the set of linear equations.
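The five-point scheme of Eq. (32) and its solution by relaxation can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a production field solver: it uses Gauss-Seidel sweeps on a square mesh, the function name `relax_laplace` is hypothetical, and the boundary values are taken from the planar quadrupole-type potential $x^2 - y^2$, chosen here only because the five-point scheme reproduces it exactly, so the converged result can be compared with a known answer.

```python
def relax_laplace(phi, tol=1e-10, max_sweeps=100000):
    """Gauss-Seidel relaxation of Eq. (32) on a square mesh.

    phi is a list of rows; the outer rows and columns hold the fixed
    boundary (electrode) potentials and are never modified.
    Returns the number of sweeps that were needed.
    """
    n_rows, n_cols = len(phi), len(phi[0])
    for sweep in range(1, max_sweeps + 1):
        largest_change = 0.0
        for j in range(1, n_rows - 1):
            for i in range(1, n_cols - 1):
                # Eq. (32) solved for the central node P0.
                new = 0.25 * (phi[j][i - 1] + phi[j][i + 1]
                              + phi[j - 1][i] + phi[j + 1][i])
                largest_change = max(largest_change, abs(new - phi[j][i]))
                phi[j][i] = new
        if largest_change < tol:
            return sweep
    raise RuntimeError("relaxation did not converge")

# Test problem: square region -1 <= x, y <= 1 with boundary values x^2 - y^2.
n, h = 21, 0.1
exact = lambda x, y: x * x - y * y
grid = [[0.0] * n for _ in range(n)]
for j in range(n):
    for i in range(n):
        if i in (0, n - 1) or j in (0, n - 1):
            grid[j][i] = exact(-1.0 + i * h, -1.0 + j * h)

sweeps = relax_laplace(grid)
max_error = max(abs(grid[j][i] - exact(-1.0 + i * h, -1.0 + j * h))
                for j in range(n) for i in range(n))
```

For realistic electrode shapes the interior electrode nodes would also have to be held fixed and curved boundaries would need the modified equations mentioned above; successive over-relaxation is a common way to reduce the number of sweeps.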
Another numerical technique for solving field problems is the finite-element method. It is well suited to systems with a complex electrode geometry and to cases in which the finite magnetic permeability in magnetic lenses is to be taken into account. The method is based on the postulation of a variational extremal principle, valid for the whole area. The solution minimizes a certain functional, which is defined as an integral of unknown functions over the whole area. The physical meaning of the functional usually corresponds to the energy stored in the system. The method reduces to an approximate minimization of the functional, for which purpose the area under study is divided by a lattice, the meshes of which, as a rule, have (in the planar case) a triangular shape. Minimization involves all the meshes attached to a given lattice node; as a result, one gets a set of algebraic equations solvable with a computer. The conclusion that follows from the comparison of analytical and numerical methods is that the latter have the advantage of being applicable to problems with any boundary conditions. As a rule, they provide the necessary calculation accuracy if the computation is allowed to run for a sufficiently long time. The principal disadvantage of numerical techniques is that the solution must be repeated for each new combination of problem conditions. The methods for potential distribution calculation that we have discussed are also applicable to the estimation of the scalar potential in magnetic lenses. If it is necessary to take into account finite magnetic permeability, the calculation becomes more sophisticated.
III. THE BASIC CONCEPTS OF PARAXIAL OPTICS
This section is concerned with fundamental concepts of the theory of focusing electron optical systems creating an image. The problem is considered in terms of so-called Gaussian optics. A focusing effect is characteristic of arbitrary nonuniform fields. In the general case, charged particle beams passing through the fields are not only focused, but also deflected. If we identify the axial trajectory in the beam, strictly speaking, it will be curved. However, the electron lens proper is a system with a straight axis. Further consideration of the problem will be based on the straight axis concept, which considerably simplifies the lens theory. A general theory of paraxial focusing for static electric and magnetic fields of arbitrary types, in which the axial trajectory is curved, has been developed in a monograph by Grinberg (1948).
To provide high-quality focusing, relatively narrow beams are used. In this case, calculations of electron optical properties can be considerably facilitated by introducing small parameters and using well-elaborated procedures of perturbation theory. This approach is similar to that used in light optics, so some of its results may be profitably utilized here. Before discussing particular types of lenses, we shall outline their general characteristics. In Section III.A, we give paraxial equations for a charged particle trajectory with reference to both relativistic and nonrelativistic cases. The trajectory equations are analyzed in Section III.B, where it is shown that an electrostatic field can create an image. A method of constructing images in electron lenses is described in Section III.C. Finally, in Section III.D we analyze the matrix method for calculation of paraxial characteristics of multilens systems.
A. The Paraxial Equations of Trajectory
It has been pointed out earlier that a common property of all types of lenses is the straight axial trajectory. This is generally provided by the fact that the field possesses two perpendicular planes of symmetry, whose intersection line forms the lens optic axis. A particle moving along the axis is not affected by forces perpendicular to this axis, so that the trajectory remains straight. A special case is the helical quadrupole lens (Yavor, 1968), each cross section of which has two planes of symmetry rotating along the lens optic axis; as a result, the lens as a whole does not possess any planes of symmetry. There is also an asymmetric quadrupole lens that has no planes of symmetry, but its straight axis is formed by two perpendicular planes of field antisymmetry; on their intersection line the field is zero. Since our discussion is restricted to narrow beams of charged particles traveling close to the optic axis, the potential distribution $\varphi(x, y, z)$ can be represented as a power series in the transverse coordinates $x$, $y$:
$$\varphi(x, y, z) = \phi(z) - \frac{1}{4}\phi''(z)(x^2 + y^2) + \frac{1}{64}\phi^{IV}(z)(x^2 + y^2)^2 + \phi_2(z)(x^2 - y^2) - \frac{1}{12}\phi_2''(z)(x^4 - y^4) + \phi_4(z)(x^4 - 6x^2y^2 + y^4) + \dots \tag{33}$$
Here the coordinate planes coincide with the field symmetry planes, the $z$-axis lies along the symmetry axis, $\phi_i(z)$ are functions of $z$ determined by the lens type, and primes denote differentiation with respect to $z$. Expression (33) was obtained using the Laplace equation (16); it contains only even powers of the coordinates because of the presence of two planes of field symmetry.
In the cylindrical coordinate system, the potential expansion $\varphi(r, \psi, z)$ may be written as follows:
$$\varphi(r, \psi, z) = \phi(z) - \frac{1}{4}\phi''(z)r^2 + \frac{1}{64}\phi^{IV}(z)r^4 - \dots + \left[\phi_2(z) - \frac{1}{12}\phi_2''(z)r^2 + \dots\right]r^2\cos 2\psi + [\phi_4(z) - \dots]\,r^4\cos 4\psi + \dots \tag{34}$$
One can easily see from Eq. (34) that the field of a round lens is characterized by the function $\phi(z)$ only. In conventional quadrupoles, possessing both symmetry and antisymmetry planes, the potential expansion contains only the terms with $\phi_2(z)$, $\phi_6(z)$, ..., $\phi_{4n+2}(z)$. So if we restrict ourselves to the series containing powers of $r$ not higher than the fourth power, the quadrupole field will be characterized by only the function $\phi_2(z)$. In symmetric octupoles, the potential distribution includes the terms with $\phi_4(z)$, $\phi_{12}(z)$, ..., $\phi_{8n+4}(z)$. Similar expansions of the scalar potential may also be written for the vicinity of the magnetic lens axis. In the case of superposition of deflecting fields, or of some kind of mechanical defects in the lenses, expansions (33) and (34) will have additional terms containing odd powers of the transverse coordinates.
Furthermore, from the exact equation of trajectory (9) one can get an approximate equation by substituting into Eq. (9) the expansions of the electrostatic and magnetic potentials and by then expanding in terms of the small trajectory slopes and coordinates. If we truncate the series expansion at the first terms, we obtain the paraxial equations of trajectory. A small trajectory inclination means that the transverse velocity components are small compared to the longitudinal component. Retaining only the first term in the expansion over the small inclination angles corresponds to neglecting the transverse velocity components and to replacing the longitudinal component by the total velocity. For an arbitrary electrostatic field and a magnetic field having no longitudinal component, the paraxial equations in Cartesian coordinates have the form
$$\begin{aligned} x'' + \frac{\alpha\phi'}{2U}x' + \left(\frac{\alpha\phi''}{4U} - \frac{\alpha\phi_2 - \eta\Omega_2U^{1/2}}{U}\right)x &= 0,\\ y'' + \frac{\alpha\phi'}{2U}y' + \left(\frac{\alpha\phi''}{4U} + \frac{\alpha\phi_2 - \eta\Omega_2U^{1/2}}{U}\right)y &= 0, \end{aligned} \tag{35}$$
where $\alpha = 1 + 2\varepsilon\phi$ and $U = \phi(1 + \varepsilon\phi)$. The function $\Omega_2(z)$ characterizes the magnetic field and, if the expansion of the scalar magnetic potential $\omega(x, y, z)$ is restricted to the first term, we can write
$$\omega(x, y, z) = \Omega_2(z)\,xy. \tag{36}$$
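The quantity $U$ is called the relativistic potential. At small velocities $U = \phi$ and Eqs. (35) take the form
$$\begin{aligned} x'' + \frac{\phi'}{2\phi}x' + \left(\frac{\phi''}{4\phi} - \frac{\phi_2 - \eta\Omega_2\phi^{1/2}}{\phi}\right)x &= 0,\\ y'' + \frac{\phi'}{2\phi}y' + \left(\frac{\phi''}{4\phi} + \frac{\phi_2 - \eta\Omega_2\phi^{1/2}}{\phi}\right)y &= 0. \end{aligned} \tag{37}$$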
Since in Eqs. (35) and Eqs. (37) the variables are separated, in the paraxial approximation we can analyze the trajectory projections on the $xOz$ and $yOz$ planes independently. Eqs. (35) and (37) are linear homogeneous second-order differential equations, whose properties determine all the fundamental characteristics of electrostatic systems with two planes of symmetry and of magnetic systems with two planes of antisymmetry. In the absence of a magnetic field, Eqs. (37) take the form
$$\begin{aligned} \phi x'' + \frac{1}{2}\phi'x' + \frac{1}{4}\phi''x - \phi_2x &= 0,\\ \phi y'' + \frac{1}{2}\phi'y' + \frac{1}{4}\phi''y + \phi_2y &= 0. \end{aligned} \tag{38}$$
Therefore, in the electrostatic field the paraxial trajectory equations are also linear with respect to the axial potential $\phi$. Nevertheless, Eqs. (35) are nonlinear with respect to $\phi$, and, therefore, the relativistic trajectory is determined not by the relative potential distribution, but by its absolute magnitudes. Moreover, the relativistic trajectory in the electrostatic field also depends on the ratio of the particle charge to its rest mass. As a rule, the second derivative of the axial potential distribution $\phi''$ cannot be determined with high accuracy. If an electron-optical system consists of round and quadrupole lenses, it is advisable to eliminate it from Eqs. (38) by substituting
$$X = x\phi^{1/4}, \qquad Y = y\phi^{1/4}. \tag{39}$$
Then, we get
$$\begin{aligned} X'' + \left[\frac{3}{16}\left(\frac{\phi'}{\phi}\right)^2 - \frac{\phi_2}{\phi}\right]X &= 0,\\ Y'' + \left[\frac{3}{16}\left(\frac{\phi'}{\phi}\right)^2 + \frac{\phi_2}{\phi}\right]Y &= 0. \end{aligned} \tag{40}$$
Eqs. (40) provide better accuracy in finding trajectories, especially when the fields are defined by numerical methods.
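The equivalence of Eqs. (38) and (40) for a round lens ($\phi_2 = 0$) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the axial potential $\phi(z) = 1 + A\exp(-z^2)$ is a hypothetical model, not a lens from the text, and the integration uses a plain fourth-order Runge-Kutta scheme. Note that the reduced equation needs only $\phi'$, whereas Eq. (38) also needs $\phi''$.

```python
import math

A = 3.0  # hypothetical strength of the model lens (illustration only)

def phi(z):
    return 1.0 + A * math.exp(-z * z)

def dphi(z):
    return -2.0 * z * A * math.exp(-z * z)

def d2phi(z):
    return (4.0 * z * z - 2.0) * A * math.exp(-z * z)

def rhs_eq38(z, s):
    """Eq. (38) with phi_2 = 0, written as a first-order system for (x, x')."""
    x, xp = s
    return (xp, -(dphi(z) / (2.0 * phi(z))) * xp - (d2phi(z) / (4.0 * phi(z))) * x)

def rhs_eq40(z, s):
    """Eq. (40) with phi_2 = 0 for (X, X'); only phi and phi' are required."""
    X, Xp = s
    return (Xp, -(3.0 / 16.0) * (dphi(z) / phi(z)) ** 2 * X)

def integrate(rhs, s, z0, z1, n):
    """Classical RK4 integration of s' = rhs(z, s) from z0 to z1 in n steps."""
    h = (z1 - z0) / n
    for i in range(n):
        z = z0 + i * h
        k1 = rhs(z, s)
        k2 = rhs(z + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k1)))
        k3 = rhs(z + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k2)))
        k4 = rhs(z + h, tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6 * (p + 2 * q + 2 * r + t)
                  for a, p, q, r, t in zip(s, k1, k2, k3, k4))
    return s

# A ray entering parallel to the axis at unit height, far in front of the lens,
# where phi = 1 and phi' = 0, so that X = x and X' = x' initially.
z0, z1 = -10.0, 5.0
x_end, _ = integrate(rhs_eq38, (1.0, 0.0), z0, z1, 6000)
X_end, _ = integrate(rhs_eq40, (1.0, 0.0), z0, z1, 6000)
x_from_X = X_end / phi(z1) ** 0.25  # invert the substitution of Eq. (39)
```

The two values `x_end` and `x_from_X` should agree to within the integration error, which illustrates that the substitution (39) removes $\phi''$ without changing the trajectory.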
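B. The Paraxial Characteristics of Electron Lenses
The general solution of each paraxial equation is a linear combination of two independent partial solutions. We shall select the pairs of independent solutions $x_\alpha$, $x_\beta$ and $y_\alpha$, $y_\beta$ in such a way that they satisfy the following initial conditions in the plane $z = z_0$:
$$\begin{aligned} x_\alpha(z_0) = y_\alpha(z_0) = 0, &\qquad x_\alpha'(z_0) = y_\alpha'(z_0) = 1,\\ x_\beta(z_0) = y_\beta(z_0) = 1, &\qquad x_\beta'(z_0) = y_\beta'(z_0) = 0. \end{aligned} \tag{41}$$
Note that $x_\alpha$, $x_\beta$, $y_\alpha$, and $y_\beta$ are not paraxial trajectories per se, because they satisfy nonparaxial initial conditions and, moreover, $x_\beta$ and $y_\beta$ are dimensionless values. The projections of paraxial trajectories on the planes $xOz$ and $yOz$ can be written as follows:
$$x(z) = x_0x_\beta(z) + x_0'x_\alpha(z), \qquad y(z) = y_0y_\beta(z) + y_0'y_\alpha(z), \tag{42}$$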
where $x_0$, $y_0$ are the coordinates and $x_0'$, $y_0'$ are the trajectory slopes in the plane $z = z_0$. It should be kept in mind that since in the general case the trajectory equations for the planes $xOz$ and $yOz$ are different, $x(z)$ and $y(z)$ are also different. It is known from the theory of linear homogeneous second-order differential equations that their linearly independent solutions are related by the expressions (relativistic case)
$$\sqrt{U}\,(x_\beta x_\alpha' - x_\alpha x_\beta') = \text{const}, \qquad \sqrt{U}\,(y_\beta y_\alpha' - y_\alpha y_\beta') = \text{const}. \tag{43}$$
At small velocities
$$\sqrt{\phi}\,(x_\beta x_\alpha' - x_\alpha x_\beta') = \text{const}, \qquad \sqrt{\phi}\,(y_\beta y_\alpha' - y_\alpha y_\beta') = \text{const}. \tag{43a}$$
The expressions in brackets in Eqs. (43) and (43a) are known as the Wronskian of the linear homogeneous differential equation, which is denoted as $W$ ($W_x = x_\beta x_\alpha' - x_\alpha x_\beta'$). Using this notation, we can rewrite Eqs. (43a) as
$$\sqrt{\phi}\,W = \sqrt{\phi_0}\,W_0. \tag{44}$$
FIG. 5. Formation of electron optical image.
For independent solutions satisfying the initial conditions (41), the initial magnitude of the Wronskian is $W_0 = 1$. If the independent solutions are selected differently and satisfy different initial conditions, $W_0$ will have another magnitude determined by the given initial conditions. Let us consider how an electron lens produces an image. Suppose there is a point-like source of charged particles on the lens axis. Then $x_0 = y_0 = 0$ and in the $xOz$ plane, for example, the trajectory projection has the form $x(z) = x_0'x_\alpha(z)$ [see Eq. (42)]. If $x_\alpha(z)$ becomes zero at $z = z_{ix}$, at this point $x(z_{ix}) = 0$ for any $x_0'$. Therefore, the projections of all trajectories emerging from the point $z = z_0$ on the axis will intersect, after passing through the lens, at the same point $z = z_{ix}$ (Fig. 5). A planar beam emerging from the point $z = z_0$ and lying in the $xOz$ plane will converge at the point $z = z_{ix}$, so we can speak of its focusing at this point. Consider a similar beam emerging from the point $P$, which is at a distance $x_0$ from the lens axis. From Eq. (42) we get
$$x(z_{ix}) = x_0x_\beta(z_{ix}) + x_0'x_\alpha(z_{ix}) = x_0x_\beta(z_{ix}). \tag{45}$$
It can be seen from Eqs. (45) that the trajectory coordinate in the plane $z = z_{ix}$ does not depend on the value $x'_0$; therefore, the beam will again converge at a point, lying at a distance $x_0 x_\beta(z_{ix})$ from the axis (Fig. 5). Thus, in the case of a planar beam the system in question creates a point image of a point source, no matter whether it lies on the axis or near it. The plane $z = z_0$ is called the object plane and the plane $z = z_{ix}$ is known as the image plane, or the Gaussian image plane. It follows from Eq. (45) that the distance between the point and the axis in the image plane is $x_\beta(z_{ix})$ times larger than in the object plane. The value $x_\beta(z_{ix})$ is the same for all the points in the plane and is called the linear magnification $M_x$.
If the particle beam is not planar, it is necessary to also analyze the $y$-projections of the trajectories. Three situations are likely to occur.

1. The partial solution $y_\alpha(z)$ does not become zero at any value of $z$. In this case the beam emerging from a point $z = z_0$ is converged in the plane $z = z_{ix}$ into a line parallel to the $y$-axis, known as a line image; this is one-directional focusing. The line-image length is defined by the product $2 y'_{0\,\max}\, y_\alpha(z_{ix})$.
2. The function $y_\alpha(z)$ becomes zero at the point $z = z_{iy} \neq z_{ix}$. Then in the plane $z_{ix}$ the beam produces a line image parallel to the $y$-axis, but in the plane $z_{iy}$ it forms a line image parallel to the $x$-axis. This two-directional focusing is referred to as astigmatic focusing. Each of the two images has its own magnification, $M_x$ and $M_y$.
3. Both partial solutions pass through zero at $z = z_i$; this corresponds to the situation when a beam emerging from a point is again converged at a point, producing a stigmatic image. The magnifications $M_x$ and $M_y$ are, generally speaking, unequal. If $M_x = M_y = M$, the system forms an undistorted image.
Lenses that act on the beam differently in two perpendicular directions are called astigmatic. Lenses or lens systems forming a point image of a point object are called stigmatic. The foregoing discussion concerns the formation of a real image but may also be extended to a virtual image. In this case the backward extensions of the trajectories intersect, rather than the trajectories themselves.

Eqs. (43) yield a fundamental law of electron optics, which describes the transmission of a charged particle beam through a focusing system. This law is known as the Helmholtz-Lagrange theorem, by analogy with a similar theorem in glass optics. If we write Eqs. (43) in the object and image planes, taking into account Eq. (41), we obtain
where $M_x = x_\beta(z_{ix})$, $M_y = y_\beta(z_{iy})$ are the linear magnifications for the trajectory projections on the planes $xOz$ and $yOz$, respectively, and $\Gamma_x = x'_\alpha(z_{ix})$, $\Gamma_y = y'_\alpha(z_{iy})$ are the angular magnifications for the same projections. The angular magnification is equal to the ratio of the trajectory projection slopes in the image and object planes, respectively, if the point object and point image lie on the axis. The potential of object space is denoted by $\phi_o$ and that of image space by $\phi_i$ (here, as usual, we take the potential to be zero at the point where the particle velocity is zero).
Formula (46) implies that the angular magnification is inversely proportional to the linear magnification; therefore, they cannot be decreased simultaneously if the potential ratio for object and image space is given. This can only be done by increasing the potential of image space. The Helmholtz-Lagrange theorem for small velocities has the form
$$M_x \Gamma_x = \sqrt{\phi_o/\phi_i}, \qquad M_y \Gamma_y = \sqrt{\phi_o/\phi_i}. \tag{47}$$
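The trade-off between linear and angular magnification can be illustrated with a few lines of code. This is a minimal sketch assuming the nonrelativistic relation $M\Gamma = \sqrt{\phi_o/\phi_i}$; the function name and the numerical values are hypothetical.

```python
import math

def angular_magnification(M, phi_o, phi_i):
    # Nonrelativistic Helmholtz-Lagrange relation: M * Gamma = sqrt(phi_o / phi_i).
    return math.sqrt(phi_o / phi_i) / M

# Same linear magnification, two hypothetical image-space potentials.
gamma_einzel = angular_magnification(2.0, phi_o=1.0, phi_i=1.0)       # phi_i = phi_o
gamma_accelerated = angular_magnification(2.0, phi_o=1.0, phi_i=4.0)  # phi_i = 4 phi_o
```

At fixed $M$, raising the image-space potential by a factor of four halves the angular magnification, which is the only way of reducing both quantities together.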
C. Cardinal Elements
As a rule, in calculations of electron lenses it is first of all necessary to obtain data that permit the construction of the image created by the lens. As in glass optics, the optical properties of electron lenses in the paraxial region can be completely described by the positions of certain points and planes known as cardinal elements. If the cardinal elements are known, the image construction is considerably simplified and does not require tracing the real trajectories in the system. This is justified only if the object and its image lie in field-free space. We have pointed out above that to describe the paraxial properties of lenses it is sufficient to know two paraxial trajectories that correspond to two pairs of independent solutions of Eqs. (35) or (37). To construct the cardinal elements we shall choose two arbitrary space trajectories parallel to the optic axis on the left and on the right of the lens, respectively (Fig. 6). The intersection points of the trajectory projections with the optic axis determine the positions of the foci in object space, $F_{ox}$ and $F_{oy}$, and in image space, $F_{ix}$ and $F_{iy}$.

All $x$-projections parallel to the axis in object space are described by the expression $x(z) = x_0 x_\beta(z)$. Therefore, in image space they will converge at one point $F_{ix}$, where $x_\beta(z_F) = 0$. [Here, for brevity, we have introduced the designation $z_F = z(F_{ix})$.] If the trajectory projections on the left of the lens are parallel to each other and have a constant slope $x'_0$, it follows from $x(z) = x_0 x_\beta(z) + x'_0 x_\alpha(z)$ that at $x_\beta(z_F) = 0$ the trajectory projections have the form $x(z_F) = x'_0 x_\alpha(z_F)$. Thus, all of them intersect in the plane $z = z_F$ at the same point, at a distance $x'_0 x_\alpha(z_F)$ from the axis. The plane $z = z_F$ is called the focal plane of image space. A similar analysis may be made of the $y$-projections and of a parallel beam incident into the lens from the right; in this way we can find the positions of all the focal planes.

FIG. 6. Construction of cardinal elements in a lens.

If each projection is extended asymptotically into the lens field (solid lines in Fig. 6), the coordinates $z(H_o)$ and $z(H_i)$ of the intersection points of the incident and outgoing asymptotes determine the positions of the principal planes $H_o$ and $H_i$. The points where the principal planes intersect the lens axis are called the principal points. In astigmatic lenses the positions of the focal and principal points in the $xOz$ and $yOz$ planes do not coincide:
$$z(F_{ox}) \neq z(F_{oy}), \quad z(H_{ox}) \neq z(H_{oy}), \quad z(F_{ix}) \neq z(F_{iy}), \quad z(H_{ix}) \neq z(H_{iy}). \tag{48}$$
The distance between the focal and principal points is referred to as the focal length of the lens:
$$f_o = z(H_o) - z(F_o), \qquad f_i = z(F_i) - z(H_i). \tag{49}$$
One can see that $f_o$ may be considered positive if the front focal point $F_o$ is on the left of the lens, while $f_i$ is positive with $F_i$ lying on its right. It is easy to notice that the back focal length can be described by
An analogous expression can be obtained for $f_o$ from an analysis of the trajectory parallel to the axis in image space. The relation between $f_o$ and $f_i$ is given below, Eq. (57). Using these cardinal elements, one can obtain the relationships between the basic optical parameters of the system. Let $l_o$ and $l_i$ denote the distances from the object and image to the respective principal planes (Fig. 6):
$$l_o = z(H_o) - z_0, \qquad l_i = z_i - z(H_i).$$
From the triangle similarity and the fact that the magnification $M = -h_i/h_o$, we have
Note that linear magnification for the inverted image is negative. To get larger magnifications one should use a short focal-length lens, placing the object near its focal point. Eq. (51) gives a simple relationship between the positions of the object and image:
$$(l_o - f_o)(l_i - f_i) = f_o f_i. \tag{52}$$
This expression can be changed into the useful form
$$\frac{f_o}{l_o} + \frac{f_i}{l_i} = 1. \tag{53}$$
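A short numerical illustration of Eqs. (52) and (53) may be helpful. The sketch below solves Eq. (53) for the image distance and verifies Eq. (52). The magnification helper uses the form $M = -f_o/(l_o - f_o)$, which is an assumption here (it gives $M = 1$ for $l_o = 0$); the function names and the lens data are hypothetical.

```python
def image_distance(l_o, f_o, f_i):
    """Solve f_o/l_o + f_i/l_i = 1 (Eq. 53) for the image distance l_i."""
    return f_i / (1.0 - f_o / l_o)

def magnification(l_o, f_o):
    """Assumed form of the linear magnification, M = -f_o / (l_o - f_o)."""
    return -f_o / (l_o - f_o)

# Hypothetical immersion lens: f_o = 10, f_i = 20, object at l_o = 15 (same units).
f_o, f_i, l_o = 10.0, 20.0, 15.0
l_i = image_distance(l_o, f_o, f_i)
M = magnification(l_o, f_o)
newton_lhs = (l_o - f_o) * (l_i - f_i)  # left-hand side of Eq. (52)
```

With these numbers the image lies at $l_i = 60$, the product on the left of Eq. (52) equals $f_o f_i = 200$, and the image is inverted and magnified twice.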
Another implication of Eq. (51) is that if an object is placed in the principal plane of object space ($l_o = 0$), its image lies in the principal plane of image space ($l_i = 0$), with transverse magnification equal to unity ($M = 1$). It is clear that formulae (51)-(53) are valid for each of the coordinate planes $xOz$ and $yOz$, permitting image construction in each of them. There are generally two astigmatic images that do not coincide and have different magnifications. When the fields possess rotational symmetry, the equations in the $xOz$ and $yOz$ planes coincide and a correct image is produced.

In order to establish the relationship between the focal lengths of object space $f_o$ and of image space $f_i$, we shall employ the Helmholtz-Lagrange theorem. If $x_1(z)$ is the trajectory projection parallel to the axis in object space and $x_2(z)$ is parallel to the axis in image space, one can write (for the nonrelativistic case)
$$\sqrt{\phi}\,[x_1(z)x'_2(z) - x_2(z)x'_1(z)] = \text{const}. \tag{54}$$
Considering the initial conditions $x'_1(z_0) = 0$, $x'_2(z_i) = 0$, we have
$$\sqrt{\phi_o}\,[x_1(z_0)x'_2(z_0)] = -\sqrt{\phi_i}\,[x_2(z_i)x'_1(z_i)]. \tag{55}$$
The definition of the focal length gives
So we get
Electrostatic lenses are subdivided into two classes: einzel and immersion lenses. In an einzel lens, the outer electrodes have the same potential; therefore, the potentials of object and image space are equal: $\phi_o = \phi_i$. Then, from Eq. (57), $f_o = f_i = f$. Immersion lenses are used when, in addition to focusing, it is necessary to change the particle energy. In these lenses the potentials on the outer electrodes are different; therefore, the potentials of the spaces on the left and on the right of the lens will also be different. When the particles are accelerated, $|\phi_i| > |\phi_o|$, so $f_i > f_o$. For an einzel lens, relations (51)-(53) are somewhat simplified:
In estimations of electron optical systems it is convenient to use the "thin" lens approximation. A lens is thin if its focal length is much greater than the field extent and the principal planes nearly coincide; in a thin einzel lens they coincide with its center. In this approximation, the transverse coordinates of a trajectory inside the lens can be assumed to remain unchanged; only the inclination angle changes. In expressions (58) and (59) the lengths $l_o$ and $l_i$ are measured from the object and the image to the lens center.

So far, in our description of image construction involving the cardinal elements we have assumed that both the object and image lie outside the lens field. This limitation is, however, unnecessary when the object is not a real source of charged particles, but is the image produced by the preceding electron optical system. Then, one usually speaks of a virtual object, which may lie within the lens field. In an electron microscope, for example, it is the objective that forms the image of a real object, while the images of virtual objects are created by intermediate or projector lenses. If by a virtual object we mean an object produced by the asymptotic extension of the incident trajectories, the above approach to image construction will be valid for this case too.

In some cases a real charged particle source is placed within the lens field. This happens when a large magnification is required, for which purpose one should use a strong short focal-length lens, whose focal point lies within the lens field, and place the object near this point. An example of such a lens is an electron microscope objective. Here, only part of the lens field contributes to image formation, and this part varies with object position. The cardinal elements are, in turn, dependent on object position, so they become less important.
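The relation between the focal lengths of object and image space, used above to distinguish einzel and immersion lenses, follows from the Helmholtz-Lagrange theorem in a few steps. The sketch below assumes the sign convention $f_i = -x_1(z_0)/x'_1(z_i)$ and $f_o = x_2(z_i)/x'_2(z_0)$ for the trajectories $x_1$ (parallel to the axis in object space) and $x_2$ (parallel to the axis in image space); this convention is an assumption made for the illustration, chosen so that both focal lengths are positive for a converging lens.

```latex
\begin{aligned}
\sqrt{\phi_o}\,x_1(z_0)\,x'_2(z_0) &= -\sqrt{\phi_i}\,x_2(z_i)\,x'_1(z_i)
  && \text{(Wronskian invariant in both planes)} \\
\sqrt{\phi_o}\,\frac{x'_2(z_0)}{x_2(z_i)} &= -\sqrt{\phi_i}\,\frac{x'_1(z_i)}{x_1(z_0)}
  && \text{(divide by } x_1(z_0)\,x_2(z_i)\text{)} \\
\frac{\sqrt{\phi_o}}{f_o} &= \frac{\sqrt{\phi_i}}{f_i}
  && \text{(insert the assumed definitions of } f_o, f_i\text{)} \\
\frac{f_i}{f_o} &= \sqrt{\frac{\phi_i}{\phi_o}} .
\end{aligned}
```

For $\phi_o = \phi_i$ this reduces to $f_o = f_i$, and for accelerated particles it gives $f_i > f_o$, in agreement with the statements in the text.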
D. The Matrix Method

Electron optical elements are rarely used singly, but as a rule form a system. Such systems may include, in addition to lenses, deflecting and analyzing elements. Calculation of a trajectory transmitted by a complex system is a fairly time-consuming task. The use of matrix algebra, however, considerably simplifies the procedure. The application of matrix methods to calculate electron optical systems has already been described by others (Banford, 1966; Kotov and Miller, 1969; Steffen, 1965; Yavor, 1968; Hawkes, 1970).

The possibility of using matrix algebra for electron optical calculations is based on the linearity of the differential equations of trajectory in the paraxial approximation. Therefore, an electron optical element makes a linear transformation of the incident transverse shifts and slopes to their emergent values. The linear transformation coefficients depend on the type of element, its geometry, and electrical parameters. A sequence of electron optical elements and field-free (drift) spaces between them can be considered as a series of linear transformations. If each linear transformation is described by a matrix, the series can be represented by the matrix product. The transverse shifts and slopes of a trajectory emerging from a lens can be expressed in terms of incident parameters of the same kind:
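$$\begin{aligned} x_2 &= x_1 x_\beta(z_2) + x'_1 x_\alpha(z_2), & y_2 &= y_1 y_\beta(z_2) + y'_1 y_\alpha(z_2), \\ x'_2 &= x_1 x'_\beta(z_2) + x'_1 x'_\alpha(z_2), & y'_2 &= y_1 y'_\beta(z_2) + y'_1 y'_\alpha(z_2). \end{aligned} \tag{60}$$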
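Here the subscript 1 stands for the incident plane; the subscript 2 corresponds to the emergent plane; and $x_\alpha$, $x_\beta$ and $y_\alpha$, $y_\beta$ are two pairs of independent solutions of the paraxial equations satisfying the initial conditions in the incident plane:
$$x_\beta(z_1) = y_\beta(z_1) = 1, \quad x'_\beta(z_1) = y'_\beta(z_1) = 0, \quad x_\alpha(z_1) = y_\alpha(z_1) = 0, \quad x'_\alpha(z_1) = y'_\alpha(z_1) = 1. \tag{61}$$
We can write these expressions in matrix form:
$$\begin{pmatrix} x_2 \\ x'_2 \end{pmatrix} = \begin{pmatrix} x_\beta & x_\alpha \\ x'_\beta & x'_\alpha \end{pmatrix} \begin{pmatrix} x_1 \\ x'_1 \end{pmatrix}, \qquad \begin{pmatrix} y_2 \\ y'_2 \end{pmatrix} = \begin{pmatrix} y_\beta & y_\alpha \\ y'_\beta & y'_\alpha \end{pmatrix} \begin{pmatrix} y_1 \\ y'_1 \end{pmatrix},$$
where the matrix elements are taken at $z = z_2$. The columns containing $x$ and $x'$, and $y$ and $y'$, may be regarded as vectors $\mathbf{X}$ and $\mathbf{Y}$, and the arrays
$$T_x = \begin{pmatrix} x_\beta & x_\alpha \\ x'_\beta & x'_\alpha \end{pmatrix}, \qquad T_y = \begin{pmatrix} y_\beta & y_\alpha \\ y'_\beta & y'_\alpha \end{pmatrix} \tag{65}$$
represent second-order square matrices. Linear transformation by an electron optical element in the vector form is
$$\mathbf{X}_2 = T_x \mathbf{X}_1, \qquad \mathbf{Y}_2 = T_y \mathbf{Y}_1. \tag{66}$$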
Note that the matrix, unlike a determinant, is an operator, not a number. In order to find $x_2$ or $y_2$, it is necessary to multiply each term of the upper matrix row by the corresponding term of the column $\begin{pmatrix} x_1 \\ x'_1 \end{pmatrix}$ or $\begin{pmatrix} y_1 \\ y'_1 \end{pmatrix}$ and then add the products; to find $x'_2$ or $y'_2$ one should perform a similar operation with the bottom row. It is easy to show that the transformation matrix determinant in this case is equal to unity if the system does not change the particle energy ($\phi_2 = \phi_1$). The determinant of each matrix (65) is equal to the Wronskian $W$ of the respective differential equation. From the initial values of the independent solutions (61) we find the initial Wronskian to be equal to unity ($W_1 = 1$). Then, it follows from Eq. (44) that $W_2 = W_1 = 1$. This result can be used to verify the matrices and to determine the fourth matrix element if the three others are known.

In a drift space the trajectory is a straight line. If the drift space length is equal to $d$, the coordinates and slopes at its end are:
$$x_2 = x_1 + x'_1 d, \qquad x'_2 = x'_1. \tag{67}$$
Hence the drift space matrix has the form
$$T = \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}. \tag{68}$$
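The action of the drift space matrix on a trajectory can be sketched in a few lines. The helper names `drift`, `apply` and `det` and the numerical values are hypothetical and serve only as an illustration.

```python
def drift(d):
    # Transfer matrix of a field-free space of length d.
    return [[1.0, d], [0.0, 1.0]]

def apply(T, x, xp):
    # Multiply the 2x2 matrix T by the column vector (x, x').
    return T[0][0] * x + T[0][1] * xp, T[1][0] * x + T[1][1] * xp

def det(T):
    # Determinant of a 2x2 matrix.
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

# Hypothetical ray: 1 mm off axis with a slope of 0.01, drifting over 50 mm.
x2, xp2 = apply(drift(50.0), 1.0, 0.01)
```

The coordinate grows by the slope times the drift length while the slope is unchanged, and the determinant equals unity, as expected for an element that does not change the particle energy.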
The matrix for an einzel electron lens expressed in terms of cardinal elements is
$$T = \begin{pmatrix} -\dfrac{1}{f}\,[z_2 - z(F_i)] & f + \dfrac{1}{f}\,[z_2 - z(F_i)][z_1 - z(F_o)] \\ -\dfrac{1}{f} & \dfrac{1}{f}\,[z_1 - z(F_o)] \end{pmatrix}. \tag{69}$$
If the equations of trajectory in the $xOz$ and $yOz$ planes are different, the matrix for each of the planes will be different too and can be obtained by substituting the corresponding cardinal elements into Eq. (69). Between the principal planes, and for a thin lens ($z_1 = z_2 = 0$), Eq. (69) takes the form
$$T = \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix}. \tag{70}$$
The matrix of the electron optical system that transforms the coordinates and slopes of a trajectory from the object plane to the image plane can be written as follows:
It has been pointed out that the matrix of a complex system is the matrix product of all the system elements, including the drift space matrices. Matrix multiplication reduces to the multiplication of the constituent members (elements) according to a certain law. Each element has two indices: the first is the number of the row and the second is the number of the column at whose intersection the element lies. The product of matrix $A$ (with elements $a_{ik}$) and matrix $B$ (with elements $b_{kj}$) is the matrix $C = AB$, each element $c_{ij}$ of which is equal to the "product" of the $i$-th row of the first matrix $A$ and the $j$-th column of the second matrix $B$:
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \tag{72}$$
where $n$ is the number of elements in a row or column, called the matrix order. Matrix multiplication does not possess the commutative property: $AB \neq BA$. The matrix that describes the first of the systems acting on a particle must occupy the right-hand position in the product.

Let us introduce the inverse matrix $T^{-1}$, which allows the incident values of the coordinates and slopes to be expressed in terms of their emergent values. It is clear that consecutive application of the $T$ and $T^{-1}$ matrices to a trajectory does not change it. So, their product represents a unit matrix $E$, in which the principal diagonal is filled by ones, with all the other elements equal to zero:
$$T^{-1}T = E. \tag{73}$$
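The ordering rule and the inverse matrix can be illustrated with a thin lens followed by a drift space. This is a minimal sketch: the thin-lens matrix is taken in its standard form with elements $1$, $0$, $-1/f$, $1$, the inverse formula is valid only for a unit determinant, and all names and numbers are hypothetical.

```python
def matmul(A, B):
    # c_ij = sum over k of a_ik * b_kj for square matrices of equal order.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_unimodular(T):
    # Inverse of a 2x2 matrix whose determinant equals unity.
    (a, b), (c, d) = T
    return [[d, -b], [-c, a]]

f, d = 2.0, 1.0
lens = [[1.0, 0.0], [-1.0 / f, 1.0]]   # thin lens (assumed standard form)
space = [[1.0, d], [0.0, 1.0]]         # drift space of length d

lens_then_drift = matmul(space, lens)  # the lens acts first: right-hand position
drift_then_lens = matmul(lens, space)  # the drift acts first
unit = matmul(inverse_unimodular(lens_then_drift), lens_then_drift)
```

The two products differ, which shows the lack of commutativity, while the product of the inverse and direct matrices gives the unit matrix.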
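Let us write the inverse matrices for an electron optical system that does not change the particle energy, keeping in mind that the determinants of the direct matrices (65) are equal to unity. It is easy to see that
$$T_x^{-1} = \begin{pmatrix} x'_\alpha & -x_\alpha \\ -x'_\beta & x_\beta \end{pmatrix}, \qquad T_y^{-1} = \begin{pmatrix} y'_\alpha & -y_\alpha \\ -y'_\beta & y_\beta \end{pmatrix}. \tag{74}$$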
Electron optics often uses systems whose field distributions are mirror symmetrical. Let us find the relation between their transformation matrices. If the first system is characterized by the field distribution function $U_{\mathrm{I}}(z_{2\mathrm{I}} - z)$, the second system has $U_{\mathrm{II}}(z_{1\mathrm{II}} + z) = U_{\mathrm{I}}(z_{2\mathrm{I}} - z)$. Here, the subscripts 1 and 2 stand for the incident and emergent coordinates (see above); the subscripts I and II refer to the first and second systems. The trajectory in the second system will be symmetrical to the trajectory in the first system if it emerges from the symmetrical point and is oriented in the opposite direction at the same angle. So we can write
Here, the subscript $r$ denotes the mirror transformation matrix; the coordinates and slopes satisfy the relations $x(z_{1\mathrm{II}}) = x(z_{2\mathrm{I}})$ and $x'(z_{1\mathrm{II}}) = -x'(z_{2\mathrm{I}})$, that is, we have the identity
An analogous identity can be written for $x(z_{2\mathrm{II}})$. Then we can get a relation between the direct and mirror matrices
$$T_r = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} T^{-1} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{77}$$
Hence, using the expression for the inverse matrix [see Eqs. (74) and (65)], we can find:
Thus, a mirror matrix differs from a direct matrix only in the transposition of the elements on the principal diagonal. Let us consider the matrix of a system symmetrical with respect to its center. It is evident that the second half of the system is a mirror replica of the first, and, therefore, the total matrix has identical diagonal elements: and only two of the four elements are independent.
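Both statements about mirror matrices can be checked numerically. The sketch below builds the mirror matrix as the product of $\mathrm{diag}(1, -1)$, the inverse matrix, and $\mathrm{diag}(1, -1)$, and then forms a symmetric system from a half-system and its mirror replica. The helper names and the chosen half-system (a thin lens with $f = 4$ followed by a drift of unit length) are hypothetical.

```python
def matmul(A, B):
    # Product of two 2x2 matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_unimodular(T):
    # Inverse of a 2x2 matrix whose determinant equals unity.
    (a, b), (c, d) = T
    return [[d, -b], [-c, a]]

def mirror(T):
    # Mirror matrix: S * T^{-1} * S with S = diag(1, -1); assumes det T = 1.
    S = [[1.0, 0.0], [0.0, -1.0]]
    return matmul(S, matmul(inverse_unimodular(T), S))

lens = [[1.0, 0.0], [-0.25, 1.0]]    # thin lens with f = 4 (assumed standard form)
space = [[1.0, 1.0], [0.0, 1.0]]     # drift space of unit length
half = matmul(space, lens)           # first half: lens, then drift
half_mirror = mirror(half)           # second half: mirror replica
symmetric = matmul(half_mirror, half)  # the first half acts first
```

Here `half_mirror` coincides with `half` after its diagonal elements are interchanged, and the matrix of the symmetric system has equal diagonal elements.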
We shall also derive the commonly used formula for the optical power of a system of two lenses. For simplicity, we shall assume the lenses to be thin and the distance between their centers to be $d$. Using formulae (68) and (70) and multiplying the corresponding matrices, we obtain the matrix $T$ of the system
$$T = \begin{pmatrix} 1 - \dfrac{d}{f_1} & d \\ -\dfrac{1}{f_1} - \dfrac{1}{f_2} + \dfrac{d}{f_1 f_2} & 1 - \dfrac{d}{f_2} \end{pmatrix}, \tag{80}$$
where $f_1$ is the focal length of the lens nearest to the object and $f_2$ is the focal length of the second lens. Hence, the optical power of the system $1/F$ is
$$\frac{1}{F} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{d}{f_1 f_2}. \tag{81}$$
When the lenses cannot be assumed to be thin, formula (81) remains valid, but $d$ is the distance between the principal planes of the lenses. From Eq. (81) we conclude that the system optical power is close to the sum of the optical powers of the constituent lenses only when $d \ll f_i$.
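The optical power of a two-lens system can be obtained directly from the matrix product. The sketch below multiplies a thin-lens matrix, a drift matrix and a second thin-lens matrix (the first lens in the right-hand position) and reads the power from the lower left element; the helper names and numbers are hypothetical.

```python
def matmul(A, B):
    # Product of two 2x2 matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def thin_lens(f):
    # Thin-lens matrix (assumed standard form).
    return [[1.0, 0.0], [-1.0 / f, 1.0]]

def drift(d):
    # Drift space of length d.
    return [[1.0, d], [0.0, 1.0]]

def doublet_power(f1, f2, d):
    # The lens nearest to the object (f1) occupies the right-hand position.
    T = matmul(thin_lens(f2), matmul(drift(d), thin_lens(f1)))
    return -T[1][0]

# Converging and diverging lenses of equal strength, separated by d = 1.
power = doublet_power(4.0, -4.0, 1.0)
closed_form = 1 / 4.0 + 1 / (-4.0) - 1.0 / (4.0 * (-4.0))
```

For lenses of equal and opposite power the individual terms cancel and only the term with $d$ remains, so the doublet is converging.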
If the system consists of one diverging and one converging lens and if their optical powers are the same in absolute value, $f_1 = -f_2 = f$, the system as a whole will possess a converging property:
$$\frac{1}{F} = -\frac{d}{f_1 f_2} = \frac{d}{f^2}. \tag{82}$$
The convergence in a system of two thin lenses is determined by the drift space between them. Having passed through the first lens and the drift space, the particles approach the system axis if the first lens is convergent, or they move away from the axis if the lens is divergent. In both cases the particles pass through the focusing lens at a larger distance from the axis (i.e., where the fields are stronger) than through the defocusing lens. We can also write a formula for the linear magnification $M$ of a system consisting of $k$ optical elements:
$$M = \prod_{i=1}^{k} M_i. \tag{83}$$
Matrix algebra becomes especially profitable in calculations of multiplets. In particular, matrix methods are widely used to design transport systems containing mostly quadrupoles and deflectors.
IV. ABERRATIONS OF ELECTROSTATIC LENSES

If the beams of charged particles do not satisfy the paraxial condition, that is, if particles travel too far from the optic axis and form large angles with it, the image they form is distorted due to geometrical aberrations. The aberrations may be reduced by introducing apertures, but this decreases the beam intensity and is not always acceptable. Moreover, small apertures give rise to additional image defects due to electron diffraction. The theory of geometrical aberrations allows us to find the distortions and to determine the dimensions of the apertures at which these distortions are within acceptable limits. These aspects of the problem are discussed in Sections IV.A and IV.B.

In addition to geometrical aberrations, there are errors known as chromatic aberrations, by analogy with light optics. These arise from the fact that the effect of an electrical field on high-energy particles is not as strong as on low-energy particles, so that if a beam consists of particles of different energies, a point image in the Gaussian plane looks like a spot of finite size. This question will be considered in Section IV.C. There are other factors responsible for poor focusing, for example, mechanical defects in electron lenses, external fields, and electrode contamination. A brief account of these questions will be given in Section IV.D. Finally, Section IV.E describes experimental techniques to determine optical characteristics of lenses, including their paraxial properties and aberrations.

A. Third-Order Geometrical Aberrations
Aberrations are the differences $\Delta x$ and $\Delta y$ between the real trajectory coordinates and the trajectory coordinates calculated in the paraxial approximation. In the Gaussian image plane we have
It may happen that none of the real trajectories emerging from a given point of the object coincides with the theoretical trajectory derived from paraxial optics. Aberration calculations usually employ the perturbation method, for which the trajectory equations retain not only linear terms but also higher-order terms with respect to the transverse coordinates and slopes. The equations do not contain even-power terms, due to the presence of two planes of field symmetry. We must, therefore, consider third-order terms for calculating the primary (third-order) geometrical aberrations. In this case, the nonrelativistic trajectory projection on the $xOz$ plane is given by the equation
$$\phi x'' + \frac{1}{2}\phi' x' + \frac{1}{4}\phi'' x - \phi_2 x = \ldots, \tag{85}$$
where we have transposed all the nonlinear terms to the right-hand side. The equation for the $y$-projection can be derived from Eq. (85) by replacing $x \leftrightarrow y$, $\phi_2 \to -\phi_2$. Similar equations have been given (Hawkes, 1970) for the relativistic case; they are valid for a wider class of systems that contain magnetic quadrupoles and octupoles. The variables $x$ and $y$ are not separated in the third-order trajectory equations, in contrast to the paraxial equations.

Expression (85) and the corresponding expression for the $y$-projection are nonlinear second-order differential equations. Following the successive approximation method, we substitute the general solution of the linear homogeneous equations (38) into their right-hand sides. As a result, we get linear inhomogeneous equations, whose solutions yield the third-order geometrical aberrations. The projections $\Delta x$ and $\Delta y$ of the aberrational image broadening are expressed as the differences between the corresponding general solutions of the homogeneous and inhomogeneous equations. It was pointed out in Section III that the general solution of a homogeneous equation is a linear combination of two partial, independent solutions. Therefore, for the paraxial trajectory we can write
Here $x_\gamma(z)$, $x_\delta(z)$ and $y_\gamma(z)$, $y_\delta(z)$ are the linearly independent solutions of Eq. (38); $A_1, \ldots, A_4$ are arbitrary constants that can be unambiguously defined by the coordinates and slopes of the trajectory in an appropriate reference plane. In Section III, in finding the solutions $x_\alpha$, $x_\beta$ and $y_\alpha$, $y_\beta$, the reference plane was the object plane. It is also possible to use two reference planes, in which either the coordinates or the slopes are specified. The aberration broadening depends on the shape of the beam, which in turn is generally determined by the aperture. The aberration blurring is, therefore, usually expressed as a polynomial in the $x$, $y$ coordinates in the object ($z = z_0$) and aperture ($z = z_B$) planes (Fig. 7). The linearly independent solutions $x_\gamma(z)$ and $x_\delta(z)$ then have the form shown in Fig. 8 and satisfy the following conditions in the object and aperture planes.

FIG. 7. A trajectory given by the coordinates of its intersection with the object and aperture planes.
FIG. 8. Linearly independent solutions $x_\gamma(z)$ and $x_\delta(z)$ satisfying the conditions of Eqs. (87).
The constant $A_1 = x(z_B) = x_B$, and the constant $A_2 = x(z_0) = x_0$. Analogous expressions can be written for $y(z)$. After substitution of the general solutions of the homogeneous equations (86), Eq. (85) takes the form
The function $f(z)$ contains partial solutions of the paraxial equations and the functions $\phi_i(z)$ characterizing the field distribution, so it depends on the single variable $z$; it also includes four constants: $x_0$, $x_B$, $y_0$, $y_B$. The aberration broadening is the difference between the trajectory calculated to third-order accuracy and the paraxial trajectory. The aberration equation (88) can be solved by varying the arbitrary constants:
where $C$ is an arbitrary constant. A similar derivation can be made for the $y$-projection. By grouping the terms with identical powers of the transverse shifts in the object and aperture planes in Eq. (89), we obtain a general expression for the aberration broadening. For the Gaussian image plane it has the form²
$$\begin{aligned}
\Delta x_i ={}& B_{1x} x_B^3 + B_{2x} x_B y_B^2 + G_{1x} x_B^2 x_0 + G_{2x} y_B^2 x_0 + G_{3x} x_B y_B y_0 \\
&+ E_{1x} x_B x_0^2 + E_{2x} x_B y_0^2 + E_{3x} y_B x_0 y_0 + D_{1x} x_0^3 + D_{2x} x_0 y_0^2, \\
\Delta y_i ={}& B_{1y} y_B^3 + B_{2y} x_B^2 y_B + G_{1y} y_B^2 y_0 + G_{2y} x_B^2 y_0 + G_{3y} x_B y_B x_0 \\
&+ E_{1y} y_B y_0^2 + E_{2y} y_B x_0^2 + E_{3y} x_B x_0 y_0 + D_{1y} y_0^3 + D_{2y} x_0^2 y_0.
\end{aligned} \tag{90}$$
It can be seen that $\Delta x_i$ and $\Delta y_i$ contain third-order terms with respect to the transverse coordinates. For this reason, the image defects they describe are called third-order aberrations. In the general case, aberrations of an electron optical system with two planes of symmetry depend on 20 coefficients, some of which may be zero in particular systems. When a coefficient is not zero, there is a certain kind of distortion, so each coefficient is associated with its own aberration.
² Different ways of writing down the aberration broadening in terms of aberration coefficients have been suggested. In some of them a common factor, the linear magnification, is introduced. These coefficients differ from those defined in Eq. (90) by the value of this factor.
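The count of 20 coefficients can be reproduced by enumerating the admissible third-order monomials. The sketch below rests on a symmetry argument that is an assumption of this illustration rather than a statement of the text: with two planes of symmetry, $\Delta x_i$ must change sign when $x_B$ and $x_0$ change sign and must remain unchanged when $y_B$ and $y_0$ change sign, and conversely for $\Delta y_i$. The names are hypothetical.

```python
from itertools import combinations_with_replacement

VARS = ("xB", "x0", "yB", "y0")

def third_order_terms(odd_in):
    # Cubic monomials of odd total degree in the variables of one projection
    # (odd_in = "x" or "y") and even total degree in those of the other.
    terms = []
    for combo in combinations_with_replacement(VARS, 3):
        n_odd = sum(v.startswith(odd_in) for v in combo)
        if n_odd % 2 == 1:  # the remaining degree, 3 - n_odd, is then even
            terms.append(combo)
    return terms

x_terms = third_order_terms("x")  # monomials allowed in the x-projection
y_terms = third_order_terms("y")  # monomials allowed in the y-projection
total = len(x_terms) + len(y_terms)
```

Each projection admits ten monomials, one per coefficient of the $B$, $G$, $E$ and $D$ groups, which gives the total of 20.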
FIG.9. The origin of spherical aberration.
The coefficients $B_i$ describe the spherical aberration, $G_i$ correspond to coma, $E_i$ define astigmatism and image field curvature, and $D_i$ refer to distortion. It is seen from Eq. (89) that the geometric aberration coefficients can be expressed as quadratures. The aberration coefficients derived from Eq. (90) vary with the object and aperture plane positions. There are positions of the aperture for which some of the aberration coefficients become zero. It should be noted that the spherical aberration cannot be corrected by an aperture shift.

One can see from Eq. (90) that the spherical aberration does not depend on $x_0$ and $y_0$, that is, it is the same for all points of the object, being the only aberration type for a point object located on the lens axis. Figure 9 shows the trajectory projections for two particles emerging at different angles from a point on the axis. As a rule, the lens field grows too rapidly with distance from the axis; therefore, the farthest trajectories experience a greater refraction, intersecting the axis nearer to the lens than the paraxial trajectories. The other aberrations depend on the distance between the object point and the lens axis. In the case of a small angular divergence of the beam and a comparatively large object size, the main role is played by distortion, which distorts the image shape but does not cause blurring.

Aberrations make different contributions to image distortion; this depends on the construction of the electron optical device and on the function of each of its lenses. For example, an electron microscope objective creates the image of an object of small size but with large angular divergence of the emerging particles, so here the spherical aberration predominates. In intermediate and projector lenses, the dominant aberration is distortion, because they deal with the virtual object magnified by the objective, but they have a smaller beam divergence.
Thus, the overall spherical aberration of the microscope is determined by the objective spherical aberration, while the subsequent lenses contribute largely to the distortion.

In some cases, for instance, in calculations of current density distribution, it is necessary to know the values of geometrical aberrations not only in the Gaussian image plane, but also in an arbitrary plane behind the lens. They can be obtained from Eq. (89), where $z$ is the coordinate of the plane of interest. Sometimes, the angular aberrations $\Delta x'$ and $\Delta y'$ must also be known. They can be found from Eq. (89) by differentiation with respect to $z$.

B. Additional Data on Geometrical Aberrations

It may happen that an electron optical system has no aperture diaphragm. Then it is justifiable to express the lens aberrations in terms of the coordinates $x_0$, $y_0$ and the slopes $x'_0$, $y'_0$ of the particles in the object plane. In this case, in order to find the aberration coefficients, it is necessary to substitute the paraxial trajectory expression (42) into the right-hand side of Eq. (85). The partial solution of the obtained inhomogeneous equation will be written in the form
The coefficients derived here represent some linear combinations of the coefficients considered in Section IV.A. Taken separately, they cannot characterize any of those aberration types. One exception is the coefficients with the third powers of the inclinations, which are proportional to the spherical aberration coefficients.

The magnitude of geometrical aberrations depends on the position of the object plane, so when the plane is shifted, the quadratures used to find the aberration coefficients have to be recalculated. Repeated integration can be avoided by expressing the coefficients as linear combinations of quadratures independent of object position and defined only by the lens field (Harting and Read, 1976; Hawkes, 1980). Just like the calculation of the cardinal elements, this procedure should involve two pairs of independent solutions of the paraxial equations corresponding to the trajectories, one of which is parallel to the lens axis in object space and the other in image space. In the $xOz$ plane the partial solution parallel to the axis in object space is the function $x_\beta(z)$ (41); the partial solution parallel to the axis in image space is denoted as $x_\lambda(z)$. On the right of the lens the solution $x_\lambda(z)$ will satisfy the following conditions:
$$x_\lambda(z_i) = 1, \qquad x'_\lambda(z_i) = 0. \tag{92}$$
Hence, in the object plane
$$x_\lambda(z_0) = M_x^{-1}, \qquad x'_\lambda(z_0) = 1/f_{ox}. \tag{93}$$
The function $x_\alpha(z)$ defined by the conditions of Eq. (41) can be expressed in terms of $x_\beta(z)$ and $x_\lambda(z)$:
$$x_\alpha(z) = (x_\lambda - M_x^{-1} x_\beta)\, f_{ox}. \tag{94}$$
The partial solution $y_\lambda(z)$ can be introduced in a similar way. By substituting into Eq. (91) expression (94) for $x_\alpha(z)$ and the corresponding expression for $y_\alpha(z)$, we obtain the aberration coefficients written in the form of polynomials in the powers of $M_x^{-1}$ and $M_y^{-1}$. The powers of these polynomials are at most of fourth order, since (91) includes products of at most four partial solutions. The quadratures in the polynomial coefficients do not depend on the object position but are determined only by the lens parameters. The dependence on the object position enters the aberration coefficients through the magnifications. This method of aberration calculation is versatile, but in some cases it does not provide the necessary accuracy, due to the subtraction of nearly equal quantities in the polynomial evaluation (Di Chio et al., 1974).

Since in practical applications lens systems are usually preferred to single lenses, it is important to know the principle of aberration summation. To third-order accuracy, the total aberration of the system is a sum, each term of which is equal to the product of a single lens aberration and the linear magnification of the subsequent part of the system. So, if a system consists of $k$ lenses, the aberration
Here M , is the total magnification, M x j is the magnification of the j-th lens, and Axj is the aberration of the j-th lens in the image plane of this lens. We shall illustrate this with the summation formulae for the spherical aberrations in an aperture-free system consisting of k lenses. In the Gaussian image plane AX = M x ( C 1 , x f
+ CzXxbyt).
(96)
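The summation rule stated above (each lens contributes its own aberration multiplied by the linear magnification of the part of the system that follows it) can be sketched numerically. The following is only an illustration: the function name and the sample values are hypothetical and are not taken from the cited papers.

```python
from math import prod

def total_aberration(lens_aberrations, magnifications):
    """Sum single-lens aberrations referred to the final image plane.

    lens_aberrations[j] is the aberration of lens j measured in that lens's
    own image plane; magnifications[j] is the linear magnification of lens j.
    Each term is scaled by the magnification of all subsequent lenses.
    """
    if len(lens_aberrations) != len(magnifications):
        raise ValueError("need one magnification per lens")
    total = 0.0
    for j, dx in enumerate(lens_aberrations):
        # Empty product is 1, so the last lens is not rescaled.
        downstream = prod(magnifications[j + 1:])
        total += dx * downstream
    return total

# Hypothetical three-lens system (arbitrary length units).
dx_total = total_aberration([0.02, 0.01, 0.005], [-2.0, 0.5, -3.0])
```

Writing the scaling factor as the product of the downstream magnifications is equivalent to dividing the total magnification by the product of the magnifications up to and including the lens in question.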
The values $C_{1x}$ and $C_{2x}$ are generally called the spherical aberration constants of an electron optical system. They are expressed in terms of the constants of the individual lenses, $C_{1xj}$ and $C_{2xj}$, as follows:
&
Here $M_{x(j-1)} = \prod_{i=1}^{j-1} M_{xi}$ is the magnification of the part of the system located in front of the $j$-th lens. A similar conclusion is easily reached for the aberration in the yOz-plane. The summation formulae for all coefficients of geometrical aberrations
have been given by Baranova and Ovsyannikova (1971). Trajectory calculations involving third-order aberrations in an arbitrary plane of a multiplet electron optical system have been described by Ovsyannikova and Shpak (1978).

It is clear that the image quality of a system can be improved by decreasing the aberration coefficients. The necessary condition for complete correction of any kind of aberration is that the corresponding coefficients be reduced to zero. This means that the aberration coefficients of the system components must have opposite signs [for example, see (97)]. Scherzer (1936) has shown that the spherical aberration of a round lens has a constant sign; therefore, in systems consisting of round lenses this aberration cannot be compensated. This conclusion may be extended to two-dimensional lenses (Pierce, 1954) and quadrupole lenses (Yavor, 1968). The analysis from which this conclusion was drawn has been used to outline the directions in which to search for systems with corrected spherical aberration (Scherzer, 1947). These are the use of special nonaxially symmetric elements and of high-frequency lenses, the design of lenses with space charge distributed along the aperture radius according to a certain law, and the creation of a singularity in the potential distribution along the optic axis or in its first derivative. The first approach, namely, correction of geometrical aberrations using octupoles, has been studied in more detail than the others; it will be described in Section XII.

When third-order aberrations are small or have been corrected, the problem is to estimate higher order aberrations, primarily fifth-order aberrations. For this, the trajectory equations must be written to fifth-order accuracy and solved, as before, by the successive approximation method (e.g., Ovsyannikova et al., 1968; Yavor, 1968).
The above procedure for calculating aberrations is commonly known as the trajectory method (Grivet, 1972). An alternative approach is based on the eikonal method (Glaser, 1952) borrowed from light optics; however, it is less visual. In recent years computational methods have been widely used to design electron optical systems. They permit two approaches to the calculation of aberration characteristics: calculation of the aberration coefficients in terms of the theory described above and direct integration of the exact equation of motion. The latter has the advantage of including all the aberrations present in the system. The aberration is defined as the difference between the exactly calculated nonparaxial trajectory and the trajectory derived from the paraxial equation. However, this involves errors resulting from the subtraction of nearly equal quantities. Besides, numerous preliminary designs must often be calculated and compared before the electron optical system can
be optimized, so it is more convenient to use the values of the aberration coefficients themselves rather than the values $\Delta x$ and $\Delta y$. Finding the coefficients from $\Delta x$ and $\Delta y$ involves additional computational difficulties. Special computer algebra languages have lately been developed for deriving expressions for the geometrical aberrations (e.g., Hawkes, 1980). Computer algebra allows us to avoid tedious routine computations and is particularly useful for calculating the higher order coefficients or the aberrations in complicated electron optical systems. One example is the CAMAL program for computing the relativistic expressions of geometrical aberration coefficients (Hawkes, 1977).

C. Chromatic Aberration
Chromatic aberration may be due to the initial particle energy spread, oscillations of the accelerating voltage, and potential fluctuations on the lenses. The initial energy spread is associated with the particle emission conditions and with energy losses during beam passage through matter. In electron beams emerging from a thermionic source, the energy spread is relatively small, on the order of several tenths of an electron volt. In the case of photoelectron emission, it is as high as several electron volts. In ion sources, the initial ion energy spread is much larger, and in some new types of sources it is as large as a few hundred electron volts. This explains the increasing interest in the study of chromatic aberrations and of methods for correcting them.

The chromatic aberration can be calculated using the successive approximation method, similar to that employed to calculate geometrical aberrations. If the potential $\varphi(z)$ in an electrostatic lens changes by a small value $\Delta\varphi$, the paraxial trajectory equations (38) take the form

$$\begin{aligned}
[\varphi + \Delta\varphi]\,x'' + \tfrac{1}{2}\varphi' x' + \tfrac{1}{4}\varphi'' x - \varphi_2 x &= 0,\\
[\varphi + \Delta\varphi]\,y'' + \tfrac{1}{2}\varphi' y' + \tfrac{1}{4}\varphi'' y + \varphi_2 y &= 0.
\end{aligned} \tag{98}$$
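Considering that $\Delta\varphi$ is small, we transpose the small terms to the right-hand side of the equations and express $x''$ and $y''$ there in terms of the unperturbed paraxial equations (38), so that

$$\begin{aligned}
\varphi x'' + \tfrac{1}{2}\varphi' x' + \tfrac{1}{4}\varphi'' x - \varphi_2 x &= \frac{\Delta\varphi}{\varphi}\left[\tfrac{1}{2}\varphi' x' + \tfrac{1}{4}\varphi'' x - \varphi_2 x\right],\\
\varphi y'' + \tfrac{1}{2}\varphi' y' + \tfrac{1}{4}\varphi'' y + \varphi_2 y &= \frac{\Delta\varphi}{\varphi}\left[\tfrac{1}{2}\varphi' y' + \tfrac{1}{4}\varphi'' y + \varphi_2 y\right].
\end{aligned} \tag{99}$$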
Expressions (99) are transformed into linear inhomogeneous second-order differential equations after substitution of the general solutions of the paraxial equations (38) into their right-hand sides. The chromatic aberration broadening in the Gaussian image plane can be found from Eqs. (99) by using the method of variation of parameters:

$$\Delta x_c = M_x\left(C_{cx}\, x_0' + C_{mx}\, x_0\right)\frac{\Delta\varphi}{\varphi_0}, \qquad \Delta y_c = M_y\left(C_{cy}\, y_0' + C_{my}\, y_0\right)\frac{\Delta\varphi}{\varphi_0}. \tag{100}$$

Here the coefficients $C_{cx}$ and $C_{cy}$ characterize the axial chromatic aberration; $C_{mx}$ and $C_{my}$ correspond to the chromatic aberration of magnification. One can see that the axial chromatic aberration is proportional to the initial beam divergence angle and is the same for all points of the object. The chromatic aberration of magnification occurs only for off-axis points of the object and is proportional to the distance between these points and the lens axis. The chromatic aberration coefficients are expressed in terms of quadratures; the integrands include the first terms of the potential expansion in the lens and the partial solutions of the paraxial equations. They will be described in more detail in the sections on the particular types of lenses, together with the analysis of the sign of the chromatic aberration and possible ways of correcting it. It seems hardly feasible to treat these questions in general terms. One should note that the chromatic aberration can also be expressed in terms of the trajectory coordinates in the object plane, $x_0$, and in the aperture plane, $x_B$ [instead of $x_0$ and $x_0'$ in Eq. (100)]. In some cases, the chromatic aberration of magnification can be reduced to zero by a suitable choice of the aperture position.
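The effect of a small energy change on focusing can be illustrated by integrating the perturbed paraxial equation directly. The sketch below treats a round lens (no quadrupole term), so the equation reduces to $(\varphi + \Delta\varphi)x'' + \tfrac{1}{2}\varphi' x' + \tfrac{1}{4}\varphi'' x = 0$. The Gaussian axial potential, its parameters, and all function names are assumptions made purely for illustration; they do not describe any lens discussed in the text.

```python
import math

def axial_potential(z, phi0=1.0, k=0.5, w=1.0):
    """Hypothetical einzel-like axial potential with its first two derivatives."""
    g = k * math.exp(-(z / w) ** 2)
    phi = phi0 * (1.0 + g)
    dphi = phi0 * g * (-2.0 * z / w ** 2)
    d2phi = phi0 * g * (4.0 * z ** 2 / w ** 4 - 2.0 / w ** 2)
    return phi, dphi, d2phi

def focus_position(delta_phi, z_start=-6.0, z_end=6.0, steps=2400):
    """Integrate (phi + delta_phi) x'' + phi' x'/2 + phi'' x/4 = 0 with RK4.

    The ray starts parallel to the axis at unit height; the returned value is
    the point where its straight-line continuation crosses the axis.
    """
    def rhs(z, x, xp):
        phi, dphi, d2phi = axial_potential(z)
        return xp, -(0.5 * dphi * xp + 0.25 * d2phi * x) / (phi + delta_phi)

    h = (z_end - z_start) / steps
    z, x, xp = z_start, 1.0, 0.0
    for _ in range(steps):
        k1x, k1p = rhs(z, x, xp)
        k2x, k2p = rhs(z + h / 2, x + h / 2 * k1x, xp + h / 2 * k1p)
        k3x, k3p = rhs(z + h / 2, x + h / 2 * k2x, xp + h / 2 * k2p)
        k4x, k4p = rhs(z + h, x + h * k3x, xp + h * k3p)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        xp += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        z += h
    # Beyond z_end the field is negligible, so the ray is a straight line.
    return z - x / xp

z_focus_nominal = focus_position(0.0)
z_focus_shifted = focus_position(0.01)  # particles with 1% higher energy
chromatic_shift = z_focus_shifted - z_focus_nominal
```

In this model the faster particles are focused farther from the lens, so `chromatic_shift` is positive. Note that it is itself a difference of two nearly equal numbers, which is why a sufficiently fine integration step is needed.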
D. Distortions Due to Mechanical Defects
Focusing quality is greatly influenced by mechanical imperfections of an electron lens, such as inaccurate electrode machining and misalignment, which results in the shift, inclination, and rotation of electrodes relative to each other. Additional distortions may arise from misalignment of lenses, producing axial mismatch and rotation. External fields and electrode contamination may also affect focusing. These factors distort the field distribution, as a result of which the potential expansion (33) includes additional even- and odd-power terms. These terms are generally small, so the focusing errors they produce may be considered to be aberrations and may be calculated by the method described
earlier. It is clear that the lower the power of an extra term in the potential expansion, the greater its effect. It seems possible to relate each type of mechanical defect to a certain term in the field expansion, the only difficulty being the numerical expression of this relation.

Axial astigmatism associated with imperfect rotational symmetry of the lens is the most serious mechanical aberration in round lenses. The potential distribution here has the form

$$\varphi(r, \psi, z) = \varphi(z) - \frac{1}{4}\varphi''(z)\left[1 - \delta(z)\cos 2\psi\right] r^2 + \ldots, \tag{101}$$
where the deviation from axial symmetry is described by the function $\delta(z)$. Figure 10 shows a schematic trajectory path in a lens with imperfect axial symmetry, which may be due, for example, to elliptic apertures in the electrodes. One can see that each point of the object is represented not by a point but by two short lines in image space, which are perpendicular to each other and are spaced apart along the z-axis. The measure of axial astigmatism is the radius of the smallest circle lying between these lines. It is proportional to the beam divergence angle and to a constant expressed in terms of $\delta(z)$. We should stress that it is a nontrivial problem to find the relation between $\delta(z)$ and the mechanical defects. This problem has been discussed in a number of publications; the appropriate references can be found in Grivet (1972). It has been shown experimentally that even small machining and alignment defects produce substantial distortion of the image due to axial astigmatism. Since raising the machining accuracy further is not always justifiable, special stigmators have to be used to correct this type of aberration. It follows from Eq. (101) that a suitably oriented weak quadrupole or any other astigmatic lens may serve as a stigmator. For example, Rang (1949) used a sextupole as a stigmator to change the orientation of the quadrupole field component electrically.
Lens shift relative to the beam is a common defect that gives rise to terms in the potential distribution that are linear in the transverse coordinates; that is, a weak deflecting field arises, shifting the image and producing second-order aberrations. When the opposite electrodes in quadrupole lenses move apart or closer to each other, the planes of antisymmetry disappear and axially symmetric and octupole field components appear. A slight rotation of the electrodes around the axis produces an additional quadrupole rotated through 45°. This field distortion contributes substantially to the aberration in quadrupole systems (Kawakatsu et al., 1968). It can be corrected by introducing a weak quadrupole with optical power of the opposite sign, whose planes of symmetry form an angle of 45° with the planes of symmetry of the system. Second-order aberrations arise when two opposite electrodes of a quadrupole are shifted relative to the axis as a single whole.

In lenses whose fields can be considered two-dimensional with high accuracy (cylindrical lenses or long quadrupoles), the relation between the mechanical defects and the potential distribution can be found using conformal transformations. For magnetic quadrupoles this has been done by Doynikov (1966). The results can be extended to electrostatic lenses as well, if the electrode and pole profiles coincide, because the magnetic permeability was assumed to be infinite in the calculations. The adjustment errors in a quadrupole doublet (variation of the distance between the lenses, mismatch of the lens axes, and rotation of the lenses relative to each other and with respect to the transverse axes) have been studied using the matrix method by Kartashev and Kotov (1976).

E. Experimental Methods for Determining Electron Optical Characteristics
Experimental methods are widely used to determine the optical characteristics of electron lenses. For lenses with complex field configurations, experimental techniques are often less time consuming than numerical methods and are sufficiently reliable, so they should be preferred. Studies on electron lenses are generally made on an electron optical bench, which is a vacuum chamber containing an electron gun, the optical system to be studied, and a transparent fluorescent screen. If necessary, the chamber may also contain a diaphragm with apertures, precise grids, and some other accessories. A Faraday cylinder is often used to measure the current distribution. The bench usually permits movement of the lens and other devices without breaking the vacuum. The installation must be protected from external magnetic fields by a magnetic screen or compensating coils.
The simplest way to determine the first-order electron optical properties of a round lens is to produce a point image of the gun crossover on the screen and to measure the distances from the object and the image to the lens at given electrode potentials. For a thin einzel lens, one measurement at the given potentials is sufficient to find the cardinal elements [see Eqs. (58) and (59)]. In all other cases, it is necessary to make measurements at two different positions of the lens at the same potentials in order to calculate the focal lengths, the positions of the principal planes, and the magnification [see Eqs. (50)-(53)]. In each measurement the screen must be matched with the image position.

One of the most accurate approaches used in first-order optics and aberration measurements is the shadow method, or the two-grid technique. The schematic diagram of the method is shown in Fig. 11. Here the image should not coincide with the screen plane, for it is then difficult to perform the measurement with reasonable accuracy. On the screen we get a shadow projection of two grids placed on either side of the lens and illuminated by a beam of electrons from a point source. Using the shadow of the grid located between the lens and the screen, we find the distance from the screen to the image:
where $s$ is the mesh size of the screen grid, $s_1$ is the size of the mesh shadow image, and $b$ is the distance between the grid and the screen. For the image position $Q$ we get

$$Q = B - e_1. \tag{103}$$

From the mesh shadow image of the cathode grid, the angular magnification of the lens can be found as follows:
Hence, using Eq. (47), we can calculate the linear magnification $M$. To increase the measurement accuracy, several meshes rather than one must be used; however, the working area measured on the screen should not be so large that the aberrations can no longer be neglected. The object position $P$ and the image position $Q$ are related to the cardinal elements, as can be seen in Fig. 11 and in expression (51):

$$P = -z(F_o) - \frac{f_o}{M}, \qquad Q = z(F_i) - f_i M. \tag{105}$$
FIG. 11. Determination of the optical characteristics of a lens by the shadow projection method: (O) point source; (G1, G2) grids; (L) lens; (S) screen.

Here, the coordinate origin coincides with the lens center, $P$ is considered positive if the object is on the left of the lens, and $Q$ is positive for the image on the right of it. It should be remembered that the focal length signs are defined in the same way, while the coordinates $z(F)$ and $z(H)$ are defined in the conventional way, that is, they are negative for all points on the left of the lens center. Equations (105) contain four unknowns, but one measurement yields only two equations, so to determine the cardinal elements it is necessary to make measurements at two different positions of the lens, keeping the electrode potentials unchanged. If we mark the values of the first and second measurements with the subscripts 1 and 2, respectively, we find
Note that the focal lengths are related via Eq. (57), which may be useful for verification. For a symmetrical einzel lens, one measurement is sufficient; from Eq. (105) we derive
If the lens length is much smaller than its focal length, the data processing can profitably use the thin-lens approximation. Then $z(H_o) = z(H_i) = 0$, and the focal length is given by

$$f = \frac{PQ}{P + Q}. \tag{108}$$
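The data processing described above can be sketched as follows. The sketch assumes that each measurement obeys $P = -z(F_o) - f_o/M$ and $Q = z(F_i) - f_i M$, writes these relations for two lens positions, and subtracts them pairwise to eliminate the focal-point coordinates. The function names and the numbers in the example are hypothetical; this is one possible way of solving the four equations, not a reproduction of the authors' formulae.

```python
def cardinal_elements(P1, Q1, M1, P2, Q2, M2):
    """Solve the two-measurement system for f_o, f_i, z(F_o), z(F_i).

    Assumed relations for each measurement:
        P = -z(F_o) - f_o / M   and   Q = z(F_i) - f_i * M
    """
    if M1 == M2:
        raise ValueError("the two measurements must have different magnifications")
    f_o = (P1 - P2) * M1 * M2 / (M1 - M2)
    f_i = (Q1 - Q2) / (M2 - M1)
    z_Fo = -P1 - f_o / M1
    z_Fi = Q1 + f_i * M1
    return f_o, f_i, z_Fo, z_Fi

def thin_lens_focal_length(P, Q):
    """Thin-lens estimate f = P Q / (P + Q), valid when z(H_o) = z(H_i) = 0."""
    return P * Q / (P + Q)

# Synthetic data generated from f_o = f_i = 20, z(F_o) = -21, z(F_i) = 22.
f_o, f_i, z_Fo, z_Fi = cardinal_elements(41.0, 42.0, -1.0, 31.0, 62.0, -2.0)
f_thin = thin_lens_focal_length(30.0, 60.0)
```

Because the solution divides by the difference of the two magnifications, the two lens positions should be chosen so that the magnifications differ appreciably; otherwise measurement errors are strongly amplified.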
The shadow method also allows us to measure the lens aberrations. Let us discuss the spherical aberration measurement in some detail. From the shadow projection of the screen grid, we can find the value $e_2$, that is, the distance from the screen to the point where a nonparaxial trajectory intersects the optic axis:
where $d$ is the distance between the optic axis and the shadow projection of the outer boundary of the $n$-th mesh and $n$ is the number of grid meshes measured. The longitudinal spherical aberration follows directly:

$$\Delta z = e_2 - e_1. \tag{110}$$
To calculate the spherical aberration constant, it is necessary to know the tangent of the inclination angle of the incident trajectory. It can be determined from the cathode grid by counting the number of meshes $m$ of its shadow image that fit into $d$. The quantity $m$ may be either an integer or a fraction. Then
Using Eq. (96), we obtain the spherical aberration constant
To increase the accuracy of the spherical aberration measurement, the screen grid should be placed close enough to the image produced by the lens that four to six meshes fit into the working area on the screen. The procedure we have described is adequate for finding the cardinal elements and spherical aberration of round lenses. In astigmatic systems, the first-order optics can be found by making the above measurements in two mutually perpendicular directions passing through the optic axis. One should keep in mind that the measurement accuracy is much lower in a divergent plane than in a convergent one. Since the spherical aberration of astigmatic lenses is described by four coefficients [see Eq. (96)], its complete description requires measurements not only in the directions passing through the optic axis but also in parallel off-axis directions. Thus, to find the spherical aberration of astigmatic lenses, the above procedure must be repeated four times. Some other approaches to aberration measurement have been considered by Grivet (1972).
V. PHASE-SPACE APPROACH TO PARTICLE BEAMS
The discussion has so far been concerned with the position, size, and distortions of the image in electrostatic lenses. In this section, electron optical systems will be considered from another angle, namely, in terms of phase space. A large number of physical applications require compression of particle beams rather than correct imaging. Examples are the long-distance transport of beams with minimum intensity losses and the formation of narrow beams with high current density. In these cases, a different approach to the solution of electron optical problems is advisable: instead of analyzing individual particle trajectories, as has been done hitherto, one should treat the beam as a single whole (Steffen, 1965; Banford, 1966;
Kapchinsky, 1966; Lichtenberg, 1969; Lawson, 1977; Lejeune and Aubert, 1980). This can best be done by introducing the concept of phase space. The basic concepts of the well-developed phase-space theory are given in Section V.A. Since in considering “compressing” systems we are not interested in the localization of the single particles constituting the beam, we introduce in Section V.B new beam characteristics, the emittance and the phase-space boundary contour, which describe the beam behavior in phase space and are more suitable for solving the above problems. To describe the beam propagation in real space, in some problems it is sufficient to know the behavior of its envelopes. The theory of beam envelopes, which is closely associated with the beam characteristics in phase space, is described in Section V.C. Formation in a given plane of a minimum beam cross section (crossover) with the highest particle density is necessary in some applications. The crossover is one of the basic characteristics of the beam. Calculation of the crossover position and size is considered in Section V.D, together with a comparative analysis of image-forming and “compressing” systems.

A. The Concept of Phase Space and the Liouville Theorem
Analysis of phase space involves the concepts of generalized coordinates and generalized momenta. The quantities $q_1, q_2, \ldots, q_N$, which completely describe the position of a particle, are called the generalized coordinates, while their total time derivatives $\dot q_i$ are referred to as the generalized velocities. (Here $N$ is the number of degrees of freedom; for a single charged particle, $N = 3$.) The generalized coordinates do not necessarily have the dimensions of length; they may, for example, be angles in polar coordinates. The generalized momenta are the derivatives of the Lagrangian with respect to the generalized velocities:

$$P_i = \frac{\partial L}{\partial \dot q_i}. \tag{113}$$

The Lagrangian $L$ for a relativistic charged particle moving in an electric or magnetic field is equal to (Landau and Lifshits, 1960)

$$L = -mc^2\sqrt{1 - \frac{v^2}{c^2}} - e\varphi + e(\mathbf{A}\cdot\mathbf{v}), \tag{114}$$

where $(\mathbf{A}\cdot\mathbf{v})$ is the scalar product of the vector potential of the magnetic field and the particle velocity. Here $\varphi$ and $\mathbf{A}$ are functions of the generalized coordinates and time.
The particle motion can be described by Hamilton's (canonical) equations, which are equivalent to the Newtonian equations of motion:

$$\dot q_i = \frac{\partial H}{\partial P_i}, \qquad \dot P_i = -\frac{\partial H}{\partial q_i}. \tag{115}$$

Here the Hamiltonian $H$ is equal to the total energy of the particle expressed in terms of the generalized momenta:

$$H(P_i, q_i, t) = \sum_{i=1}^{N} P_i \dot q_i - L. \tag{116}$$
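From (113) and (114) we have

$$H = \sqrt{m^2c^4 + c^2(\mathbf{P} - e\mathbf{A})^2} + e\varphi. \tag{117}$$

For small velocities, the Lagrangian (114) is transformed, apart from the constant $-mc^2$, into

$$L = \frac{mv^2}{2} - e\varphi + e(\mathbf{A}\cdot\mathbf{v}). \tag{118}$$

In this approximation the Hamiltonian has the form

$$H = \frac{1}{2m}(\mathbf{P} - e\mathbf{A})^2 + e\varphi. \tag{119}$$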
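How closely the nonrelativistic Hamiltonian follows the relativistic one at small velocities can be checked numerically. The sketch below compares only the kinetic parts, that is, the relativistic expression with the rest energy $mc^2$ subtracted and the term $(\mathbf{P} - e\mathbf{A})^2/2m$. The chosen momentum (an electron moving at about $10^6$ m/s) and the function names are assumptions for illustration. The relativistic difference is evaluated in an algebraically equivalent form that avoids subtracting two nearly equal numbers.

```python
import math

M_E = 9.1093837e-31   # electron rest mass, kg
C = 2.99792458e8      # speed of light, m/s

def kinetic_relativistic(p):
    """sqrt(m^2 c^4 + c^2 p^2) - m c^2, rewritten to avoid cancellation."""
    rest = M_E * C ** 2
    return (C * p) ** 2 / (math.sqrt(rest ** 2 + (C * p) ** 2) + rest)

def kinetic_nonrelativistic(p):
    """p^2 / (2 m), the kinetic term of the small-velocity Hamiltonian."""
    return p ** 2 / (2.0 * M_E)

p = M_E * 1.0e6       # magnitude of P - eA, taken as m*v with v = 1e6 m/s
rel = kinetic_relativistic(p)
nonrel = kinetic_nonrelativistic(p)
relative_difference = (nonrel - rel) / nonrel
```

For this momentum the two expressions differ by a few parts in a million, of the order of $p^2/(4m^2c^2)$, which indicates when the approximation (119) is adequate.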
In the absence of a magnetic field in the nonrelativistic case, if the generalized coordinates are chosen in the Cartesian coordinate system, the generalized momenta coincide with the conventional ones: $\mathbf{P} = \mathbf{p} = m\mathbf{v}$. In cylindrical polar coordinates,

$$P_r = p_r = m\dot r, \qquad P_\psi = r p_\psi = m r^2\dot\psi, \qquad P_z = p_z = m\dot z. \tag{120}$$

Hence the $\psi$-component of the generalized momentum is equal to the moment of momentum.

The state of a particle at any given instant of time can be completely described by three generalized coordinates and three generalized momenta. This information can be represented by a point in the six-dimensional space $(q_1, q_2, q_3, P_1, P_2, P_3)$, which is called phase space. Each particle corresponds to a representative point in phase space, and the entire beam fills a certain part of this space bounded by a closed surface. As the particles move in real space, the corresponding representative points describe curves in phase space that are called phase-space trajectories. The transformation in time of the domain occupied by the representative points of the beam particles is shown schematically in Fig. 12. If the particle motion associated with one degree of freedom does not depend on the other two, the Hamiltonian can be represented as a sum of three
terms, each of which depends only on one pair $q_i$, $P_i$:

$$H(q, P, t) = \sum_{i=1}^{3} H_i(q_i, P_i, t). \tag{121}$$

Then each pair in Eqs. (115) is independent of the other two and has the form

$$\dot q_i = \frac{\partial H_i}{\partial P_i}, \qquad \dot P_i = -\frac{\partial H_i}{\partial q_i}. \tag{122}$$

FIG. 12. Phase-space volume of a beam and its phase-space trajectories.
Now, from consideration of six-dimensional phase space, we can pass to an independent consideration of three two-dimensional subspaces $(q_i, P_i)$, which are called phase-space planes. The phase-space trajectories can then be represented as curves in a plane, which simplifies the problem considerably.

The introduction of the concept of phase space is justified because it simplifies the investigation of the beam behavior in electron optical systems that do not need to create an image. Here we list some of the properties of phase space that underlie the simplification.

The particle trajectories do not intersect in phase space. This is due to the fact that the motion is unambiguously determined by the initial conditions, namely, by the values of $q_i$ and $P_i$ at the initial instant of time. If two trajectories could intersect, their values of $P_i$ and $q_i$ at that moment would coincide and, therefore, their further trajectories would coincide as well.

A closed surface in phase space bounding a group of particles at time $t_1$ is transformed by time $t_2$ into another surface bounding the same group of particles. So, the particles do not cross the surface. This conclusion follows from the first property, because if a particle did cross the surface, its coordinates $q_i$, $P_i$ would coincide with those of a boundary particle, which cannot happen. In studying the motion of a particle beam, this property
allows us to consider only those particles that lie at the boundary of the beam phase-space volume and in this way to decrease considerably the number of particles to be analyzed. From the above properties we may conclude that the boundary representative points remain on the bounding surface, maintaining the same arrangement, when the phase-space volume is transformed.

The particle density in phase space obeys the fundamental law described by the Liouville theorem, which states that the density of noninteracting particles in a conservative dynamic system in six-dimensional phase space is invariant along the phase-space trajectories. Forces are called conservative if their work depends only on the positions of the initial and final points and is independent of the shape of the path between them. Any system of conservative forces acting on the particles possesses a Hamiltonian of this type. The theorem is equivalent to the statement that the phase-space volume occupied by the beam remains constant and that only the shape of the bounding surface varies. A beam of charged particles behaves in phase space like an incompressible fluid. Optical transformation of the beam changes the configuration of the phase-space volume. Therefore, when designing an electron optical system, one can reduce the area occupied by the beam in a given phase-space plane by enlarging the phase-space areas in other planes. For example, by accelerating the particles, we can simultaneously reduce both the beam cross section and the transverse velocities. This conclusion is consistent with the Helmholtz-Lagrange theorem [see Eqs. (46) and (47)].

If the three degrees of freedom of the particle motion are independent and six-dimensional phase space is, therefore, considered as three two-dimensional subspaces, then each of these phase-space planes obeys the corollary of Liouville's theorem, stating that the projection of the phase-space volume onto the corresponding plane remains constant.
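The two properties used above, that it suffices to follow the boundary points and that the enclosed phase-space area is conserved, can be illustrated with a small numerical sketch. The linear restoring force, the parameter values, and the function names are assumptions chosen only because the motion can then be written down exactly.

```python
import math

def polygon_area(points):
    """Shoelace formula for the area enclosed by (q, p) vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        q1, p1 = points[i]
        q2, p2 = points[(i + 1) % n]
        area += q1 * p2 - q2 * p1
    return abs(area) / 2.0

def evolve(point, t, m=1.0, k=4.0):
    """Exact motion of one representative point under the force F = -k q."""
    q0, p0 = point
    w = math.sqrt(k / m)
    q = q0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)
    p = -m * w * q0 * math.sin(w * t) + p0 * math.cos(w * t)
    return q, p

# Only the vertices of the boundary contour are tracked.
boundary = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (0.0, 0.5)]
area_before = polygon_area(boundary)
area_after = polygon_area([evolve(pt, t=0.3) for pt in boundary])
```

Because the transformation is linear, straight edges remain straight, so tracking the four vertices is enough here; the contour is sheared and rotated while its area stays the same.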
In Cartesian coordinates, if $A_x = A_y = 0$ (i.e., $P_x = p_x$ and $P_y = p_y$) and the longitudinal component of the momentum $p_z$ is constant along the z-axis and identical for all the particles, then from the phase-space planes $(x, P_x)$ and $(y, P_y)$ one can pass to the planes $(x, x')$ and $(y, y')$. Here $x'$ and $y'$ are the tangents of the inclination angles, given by $x' = p_x/p_z$ and $y' = p_y/p_z$. Since trajectory analysis commonly deals with the coordinates and inclination angles, further simplification becomes possible. The corollary of Liouville's theorem concerning the invariance of the phase-space areas is valid for this case too. If, however, $p_z$ is not constant along the z-axis but is taken to be identical for all the particles considered (paraxial approximation), the phase-space areas occupied by the beam in the planes $(x, x')$ and $(y, y')$ are not invariant but are inversely proportional to $p_z$.

Consider in more detail the properties of linear transformations of beams in conservative systems. The equations for the charged particle motion
in these systems have no terms containing particle velocities, while the forces are linearly related to the coordinate. Then for the x-coordinate in the Cartesian system we get

$$\ddot x + g x = 0, \tag{123}$$
where $g$ is defined by the system field and depends in the general case on time $t$. The Wronskian of Eq. (123) has the form

$$W = x_1\dot x_2 - \dot x_1 x_2, \tag{124}$$

where $x_1(t)$ and $x_2(t)$ are two linearly independent solutions of this equation. It follows from the theory of linear differential equations that the Wronskian is constant, because Eq. (123) has no term containing $\dot x$:

$$W = \text{const}. \tag{125}$$

The values $x_i(t)$ and $\dot x_i(t)$, where $i = 1, 2$, are unambiguously related to their initial values:

$$x_i(t) = a_{11} x_i(0) + a_{12}\dot x_i(0), \qquad \dot x_i(t) = a_{21} x_i(0) + a_{22}\dot x_i(0). \tag{126}$$
Then we can write down in matrix form

$$\begin{pmatrix} x_1(t) & x_2(t) \\ \dot x_1(t) & \dot x_2(t) \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1(0) & x_2(0) \\ \dot x_1(0) & \dot x_2(0) \end{pmatrix}. \tag{127}$$

Here, on the left, is the matrix whose determinant is the Wronskian at a certain time $t$; on the right is the product of the matrix $A$, characterizing the transformation produced by the optical system, and the matrix whose determinant is the Wronskian at the initial time. Then, since the determinant of a product of two matrices is equal to the product of their determinants, we get for the Wronskian

$$W(t) = |A|\,W(0). \tag{128}$$
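The constancy of the Wronskian for an equation of the type $\ddot x + g(t)x = 0$ can be verified numerically. In the sketch below two solutions are started from the columns of the unit matrix, so that the propagated values are directly the elements of the transformation matrix. The particular time dependence of $g$, the integration interval, and the function names are hypothetical choices made for illustration.

```python
import math

def field_term(t):
    """Hypothetical time-dependent coefficient g(t)."""
    return 1.0 + 0.5 * math.sin(3.0 * t)

def integrate(x0, v0, g, t_end=2.0, steps=2000):
    """RK4 integration of x'' + g(t) x = 0 from t = 0 to t_end."""
    h = t_end / steps
    t, x, v = 0.0, x0, v0
    for _ in range(steps):
        k1x, k1v = v, -g(t) * x
        k2x, k2v = v + h / 2 * k1v, -g(t + h / 2) * (x + h / 2 * k1x)
        k3x, k3v = v + h / 2 * k2v, -g(t + h / 2) * (x + h / 2 * k2x)
        k4x, k4v = v + h * k3v, -g(t + h) * (x + h * k3x)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
    return x, v

# Initial conditions (1, 0) and (0, 1) give W(0) = 1, and the final values
# are the first and second columns of the transformation matrix.
a11, a21 = integrate(1.0, 0.0, field_term)
a12, a22 = integrate(0.0, 1.0, field_term)
det_A = a11 * a22 - a12 * a21
```

Within the integration error the determinant equals unity, although the individual matrix elements depend on the chosen $g(t)$.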
Using Eq. (125), we have $W(t) = W(0)$ and, hence,

$$|A| = 1. \tag{129}$$
The fact that the determinant of the transformation matrix $A$ is unity is an important property of linear systems that is widely used in electron optics. In Section III we considered the transformation of the particle trajectory coordinates and angles by electron lenses. In the paraxial approximation these transformations are linear. It was shown that in einzel lenses the determinant of the transformation matrix (65) is equal to unity. The result obtained in
Section III is a particular case of Eq. (129), because the latter can be extended to cover immersion lenses too.

B. Beam Emittance and Its Transformation in Electron Optical Systems
The concept of emittance has been introduced to describe quantitatively the beam volume in phase space. Particle beams formed by electron optical systems have, as a rule, two planes of symmetry or are axially symmetric. In the first case, beams are described by two emittances, $\varepsilon_x$ and $\varepsilon_y$, while in an axially symmetric beam both emittances are equal. The emittance $\varepsilon_x$ in the xOz-plane is defined as the area of the beam phase-space volume projected onto the $xx'$-plane divided by $\pi$. (The reason for introducing the coefficient $\pi$ will become clear later.)

In practice, it appears unfeasible to define the actual phase-space area occupied by the beam. Generally, the beam density is nonuniform, being maximum at the center and falling towards the beam edges. For this reason, the concepts of the beam boundary and, therefore, of the phase-space area occupied by the beam are somewhat arbitrary. Sometimes the boundary is defined as a line of constant density equal to a few percent of the maximum beam density. The area enclosed by the corresponding line in the phase-space plane is taken to be the beam emittance. One should keep in mind that the boundary contour thus defined may have an irregular shape, so it is quite difficult to determine its transformations when the beam passes through an electron optical system.

In calculations, the beam boundary on the phase-space plane is often approximated by simple curves, for example, by an ellipse or a parallelogram. The elliptical approximation describes quite satisfactorily beams emerging from an accelerator or from some ion sources; the parallelogram is used to describe beams collimated by two slits. Both types of boundary contour can be conveniently described mathematically, and they preserve their shape in linear transformations: straight lines are transformed into straight lines, and ellipses into ellipses.
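To prove this statement, let us consider the linear transformation of points on the phase-space contour in the xx'-plane produced by the matrix T with elements $t_{ij}$:
$$\begin{pmatrix} x_2 \\ x_2' \end{pmatrix} = \begin{pmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_1' \end{pmatrix}. \tag{130}$$
For a matrix with unit determinant, the inverse transformation will have the form [see Eqs. (65) and (74)]
$$\begin{pmatrix} x_1 \\ x_1' \end{pmatrix} = \begin{pmatrix} t_{22} & -t_{12} \\ -t_{21} & t_{11} \end{pmatrix} \begin{pmatrix} x_2 \\ x_2' \end{pmatrix}. \tag{131}$$
Let the initial phase-space contour be a straight line
$$x_1' = a x_1 + b. \tag{132}$$
By expressing $x_1$ and $x_1'$ in terms of $x_2$ and $x_2'$ using Eq. (131), substituting the result into Eq. (132), and grouping the terms, we shall again get the equation for a straight line
$$x_2' = \frac{a t_{22} + t_{21}}{t_{11} + a t_{12}}\, x_2 + \frac{b}{t_{11} + a t_{12}}. \tag{133}$$
Since the gradient of the line of Eq. (133) depends only on the gradient a of the line in Eq. (132) but is independent of its shift b, it is clear that parallel lines will be transformed to parallel ones. Likewise, let us consider the transformation of an elliptical contour,
$$\frac{x_1^2}{a^2} + \frac{x_1'^2}{b^2} = 1, \tag{134}$$
where a and b are the semi-axes of a regular ellipse. Having in mind that the ellipse area is
$$\pi a b = \pi\mathcal{E}, \tag{135}$$
we shall write down Eq. (134) in the form
$$\frac{b}{a}\, x_1^2 + \frac{a}{b}\, x_1'^2 = \mathcal{E}. \tag{136}$$
Using Eq. (131), we can get
$$\frac{b}{a}\,(t_{22} x_2 - t_{12} x_2')^2 + \frac{a}{b}\,(t_{11} x_2' - t_{21} x_2)^2 = \mathcal{E}. \tag{137}$$
After a number of simple transformations we shall find the equation for an oblique ellipse
$$\gamma x_2^2 + 2\alpha x_2 x_2' + \beta x_2'^2 = \mathcal{E}, \tag{138}$$
the parameters of which are related by the following expression:
$$\gamma\beta - \alpha^2 = 1. \tag{139}$$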
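The relation between the ellipse parameters can be checked numerically. The following Python sketch is an illustration only, not part of the original derivation. It assumes a transfer matrix of unit determinant and obtains γ, α, and β by substituting the inverse transformation into the regular-ellipse equation; the function name, the matrix element labels, and all numerical values are hypothetical.

```python
import numpy as np

def transform_ellipse(a, b, T):
    """Map the regular ellipse x1^2/a^2 + x1'^2/b^2 = 1 through the matrix T.

    Returns (gamma, alpha, beta, emittance) of the oblique ellipse
    gamma*x2^2 + 2*alpha*x2*x2' + beta*x2'^2 = emittance.
    """
    (t11, t12), (t21, t22) = T
    if abs(t11 * t22 - t12 * t21 - 1.0) > 1e-12:
        raise ValueError("T is assumed to have unit determinant")
    gamma = (b / a) * t22**2 + (a / b) * t21**2
    alpha = -((b / a) * t12 * t22 + (a / b) * t11 * t21)
    beta = (b / a) * t12**2 + (a / b) * t11**2
    return gamma, alpha, beta, a * b

# Hypothetical system: a drift of 0.5 m followed by a thin lens with f = 0.2 m.
drift = np.array([[1.0, 0.5], [0.0, 1.0]])
lens = np.array([[1.0, 0.0], [-1.0 / 0.2, 1.0]])
T = lens @ drift

a, b = 2e-3, 5e-3  # hypothetical semi-axes of the initial regular ellipse
gamma, alpha, beta, emittance = transform_ellipse(a, b, T)

# Transform points of the initial contour and test them against the oblique ellipse.
phi = np.linspace(0.0, 2.0 * np.pi, 200)
x1, x1p = a * np.cos(phi), b * np.sin(phi)
x2, x2p = T @ np.vstack([x1, x1p])
residual = gamma * x2**2 + 2 * alpha * x2 * x2p + beta * x2p**2 - emittance

print(gamma * beta - alpha**2, np.max(np.abs(residual)))
```

The transformed points satisfy the oblique-ellipse equation to rounding accuracy, and the combination γβ − α² stays equal to unity, so only the orientation and eccentricity change.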
It is clear that in linear transformations the elliptical contour is preserved; only the orientation and eccentricity of the ellipse will change. Figure 13 shows the characteristic points of an ellipse expressed in terms of the above parameters. The elliptical contour approach is useful to study transformations of phase-space areas because it has some advantages. First of all, the ellipse can rather well approximate phase-space contours of most real, charged particle
FIG. 13. Beam emittance bounded by an elliptical contour
beams. The ellipse can be conveniently described mathematically, because three parameters are sufficient to define it unambiguously, one parameter (the ellipse area) being invariant under the transformation. To illustrate this, we shall describe the transformation of an elliptical contour in two simple cases: when the beam passes through field-free (drift) space and when it passes through a thin lens. Drift space does not change the inclination angle of a particle but only shifts the representative points along the x-axis on the phase-space plane (Fig. 14). Points 1 and 3, corresponding to the trajectories
FIG. 14. Schematic representation of a beam in real space (a) and on the phase-space plane (b) when it passes through drift space. The regular ellipse corresponds to the A'A cross section, the oblique one to the B'B cross section.
FIG. 15. Schematic representation of beam refraction in a thin lens (a) and the corresponding transformations of its phase-space contour (b). The regular ellipse corresponds to a beam preceding the lens; the oblique ellipse corresponds to a beam following it.
parallel to the z-axis, do not change their position on the phase-space plane. This transformation can be described by the matrix (68). One should note that the top of the oblique ellipse does not coincide with the point of maximum x-coordinate. Figure 15 shows schematically the effect of a thin lens on the phase-space contour. It is well known that a particle passing through a thin lens does not change its coordinate but changes only its gradient [see Eq. (70)], that is, all the representative points are shifted along the x'-axis. Points 2 and 4, which correspond to the trajectories passing through the lens center, are not shifted in the transformation. One can see from Figs. 14 and 15 that the boundary points move along the contour, preserving their mutual arrangement. When the beam cross section is relatively large, the lens system produces aberrations. Then the particle motions in the planes $(x, P_x)$, $(y, P_y)$, $(z, P_z)$ are not independent, so the corollary of Liouville’s theorem about constant phase-space areas does not hold for this case. Moreover, the longitudinal momentum cannot be considered constant and identical for particles traveling at different distances from the axis. Therefore, the beam emittances $\mathcal{E}_x$ and $\mathcal{E}_y$ are not conserved in systems with high aberrations. Beam transformation in an aberration-producing system has a nonlinear character, so the emittance contours, elliptical or rectangular, become distorted. Figure 16 demonstrates the distortion of an elliptical contour by a lens system with spherical aberration. However, in calculations it is convenient to set the contours in the xx'- and yy'-planes to be elliptical. The ellipse embraces the distorted beam contour, as is shown by the dashed line in Fig. 16, thus bounding the so-called effective beam emittance. The effective emittance is larger than the real emittance by the phase-space area unfilled by particles. The emittance may decrease due to particle losses in a lens system.
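The behaviour of the characteristic points in drift space and in a thin lens can be illustrated by a short Python sketch. It assumes the usual paraxial matrices for a drift of length l and for a thin lens of focal length f, which are presumably the forms meant by Eqs. (68) and (70); the helper names and all numerical values are hypothetical.

```python
import numpy as np

def drift(length):
    """Assumed drift-space matrix: x changes by length * x', x' is unchanged."""
    return np.array([[1.0, length], [0.0, 1.0]])

def thin_lens(f):
    """Assumed thin-lens matrix: x is unchanged, x' changes by -x / f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def polygon_area(x, xp):
    """Shoelace formula for the area enclosed by ordered contour points."""
    return 0.5 * abs(np.dot(x, np.roll(xp, -1)) - np.dot(xp, np.roll(x, -1)))

# Regular ellipse with hypothetical semi-axes R0 (m) and A0 (rad).
R0, A0 = 1e-3, 2e-3
phi = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
contour = np.vstack([R0 * np.cos(phi), A0 * np.sin(phi)])

after_drift = drift(0.3) @ contour
after_lens = thin_lens(0.25) @ contour

# Points 1 and 3 (x' = 0) sit at indices 0 and 200; points 2 and 4 (x = 0) at 100 and 300.
points_13_shift = np.abs(after_drift[:, [0, 200]] - contour[:, [0, 200]]).max()
points_24_shift = np.abs(after_lens[:, [100, 300]] - contour[:, [100, 300]]).max()

area_before = polygon_area(contour[0], contour[1])
area_drift = polygon_area(after_drift[0], after_drift[1])
area_lens = polygon_area(after_lens[0], after_lens[1])
print(points_13_shift, points_24_shift, area_before, area_drift, area_lens)
```

In this sketch the points with zero inclination stay fixed in the drift, the points with zero coordinate stay fixed in the lens, and the enclosed area is the same before and after each element.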
FIG. 16. Consecutive distortions of an elliptical contour in a system of lenses having spherical aberration.
The transmittance of the lens system is characterized by the acceptance, defined as the quotient obtained when the area in phase space that includes all the points representing particles transmitted by the system is divided by π. So, the acceptance describes the system as a whole, rather than the beam; it depends on the system characteristics, namely its design and the forces acting on the beam particles. If the acceptance is smaller than the emittance, the system can transmit only particles whose representative points lie within the acceptance; therefore, particle losses are inevitable. If, on the contrary, the acceptance is larger than the emittance, particle losses can be avoided if the beam is formed correctly. By varying the emittance boundary contour, one can achieve the optimal matching with the acceptance. This procedure is called beam-device matching. In many cases the emittance calculations are complicated, so the emittance is found experimentally. The two-slit method is widely used to measure the two-dimensional phase-space volume. The first slit, located in the plane under study, fixes the x-coordinate. The second slit is placed at a distance from the first one and moves perpendicularly to the axis, thus cutting out the beams with a definite inclination x'. The part of the beam that has passed through both slits is recorded, and the measurements are then repeated at other values of x. To get a satisfactory measurement accuracy, the distance between the slits must be large compared to the slit width. A detailed description of the experimental techniques for measuring emittance can be found in the work of Lejeune and Aubert (1980).
C. The Beam Envelopes
The concept of phase space introduced earlier simplifies and helps to visualize the problem of forming charged particle beams. This problem arises
in many cases, for example, when a beam must be transported along a given path with minimum intensity losses, in making probes of small cross sections or of small widths in a given plane, or when the input beam parameters must be matched with the transmission characteristics of the system. All these applications require beam compression rather than image formation, so there is no need to produce a point-to-point correspondence of the particle density distribution in the object plane and a given plane. We must note that both “compressing” and image-forming optical systems transform the beam with finite emittance. The only difference is that the properties of an image-forming system can be completely defined by considering transmission of beams with zero emittance. The characteristics of a “compressing” system depend on the emittance and its boundary contour. Thus, the difference lies in the specific applications of the beam rather than in its nature, and this requires different approaches to beam description. The beam as a whole is characterized in real space by envelopes. When studying “compressing” probe-forming systems, it is unnecessary to consider individual trajectories; only those particles in each z = const plane that have either maximum coordinate or maximum inclination need to be considered. The curve tangent to the trajectories at the points of maximum deviation from the axis in the xOz-plane is referred to as the linear envelope of the beam in this plane and is denoted by $R_x(z)$. Likewise, the line on the x'Oz-plane connecting the points of maximum angular deviation is called the angular envelope $A_x(z)$. Since the envelopes are defined by the maximum values of coordinates and trajectory inclinations, they can be formed only by particles whose representative points lie on the phase-space contour. The differential equation of a linear envelope for the elliptical phase-space contour can be obtained by expressing the ellipse parameters in Eq. (138) in terms of $R_x$ and its derivatives. As shown in Fig. 13, $R_x = x_{\max} = \sqrt{\mathcal{E}_x\beta}$ and $R_x' = -\alpha\sqrt{\mathcal{E}_x/\beta}$, because the inclination angles of the trajectory and envelope coincide at the point of tangency. As a result of simple transformations, the elliptical contour equation will have the form
$$(R_x' x - R_x x')^2 + \frac{\mathcal{E}_x^2}{R_x^2}\, x^2 = \mathcal{E}_x^2. \tag{140}$$
By differentiation of Eq. (140) with respect to z, we obtain
$$R_x'' - \frac{x''}{x}\, R_x - \frac{\mathcal{E}_x^2}{R_x^3} = 0. \tag{141}$$
In the particular case of a quadrupole lens, x'' and x are linearly related, $x'' = -\beta^2 x$ (see Section VIII), so the differential equation for the envelope has the form
$$R_x'' + \beta^2 R_x - \frac{\mathcal{E}_x^2}{R_x^3} = 0. \tag{142}$$
Equation (142) differs from the paraxial trajectory equation for the quadrupole lens by the extra term $\mathcal{E}_x^2/R_x^3$. This term is small at large $R_x$ and becomes essential only near the beam waist. Equation (142) can be integrated for simple cases, for example, for the quadrupole in the rectangular approximation of the field distribution ($\beta^2$ = const) or in drift space ($\beta^2 = 0$). The envelopes $R_x$ and $A_x$ can also be calculated straightforwardly by expressing them in terms of two linearly independent solutions of the paraxial trajectory equation, $x_\alpha(z)$ and $x_\beta(z)$. The trajectory projection on the xOz-plane is written as follows (see Section III):
$$x = x_0 x_\beta + x_0' x_\alpha. \tag{143}$$
Let the initial phase-space contour be elliptical and the initial values of the coordinates $x_0$ and inclinations $x_0'$ of particles lying on the phase-space contour be related by the expression
$$\frac{x_0^2}{R_{x0}^2} + \frac{x_0'^2}{A_{x0}^2} = 1, \tag{144}$$
where $R_{x0}$ and $A_{x0}$ are the initial values of the linear and angular envelopes. To find the envelopes of the curve family depending on two parameters, we can make use of a method known from differential geometry. Employing the relation (144) between $x_0$ and $x_0'$, we will eliminate one of the parameters from Eq. (143), for example, $x_0$. Then Eq. (143) will take the form
$$x = \pm R_{x0}\sqrt{1 - x_0'^2/A_{x0}^2}\; x_\beta + x_0' x_\alpha. \tag{145}$$
The expression for the envelope can be found by eliminating the $x_0'$-parameter from the set of two equations: (145) and $\partial x/\partial x_0' = 0$. Hence, we will have
$$R_x = \sqrt{R_{x0}^2 x_\beta^2 + A_{x0}^2 x_\alpha^2}. \tag{146}$$
If the explicit solutions $x_\alpha$ and $x_\beta$ of the paraxial trajectory equation are known, the expression for the envelope will also take an analytical form. The values of $x_\alpha$ and $x_\beta$ have been found accurately or approximately for many lens types, so in all these cases we can use Eq. (146) to write the expression for the envelope. Similarly, one can find the expression for the angular envelope, which for an elliptical phase-space contour has the form
$$A_x = \sqrt{R_{x0}^2 x_\beta'^2 + A_{x0}^2 x_\alpha'^2}. \tag{147}$$
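The envelope expression of Eq. (146) can be compared with a direct search over the trajectories that start on the initial contour. The Python sketch below is an illustration under stated assumptions: a quadrupole in the rectangular field approximation with the focusing-plane solutions $x_\beta = \cos\beta z$ and $x_\alpha = \sin\beta z/\beta$, a regular initial ellipse, and hypothetical numerical values throughout.

```python
import numpy as np

R0, A0 = 1e-3, 4e-3   # hypothetical initial linear (m) and angular (rad) envelopes
k = 3.0               # hypothetical quadrupole parameter beta, 1/m

def x_alpha(z):
    """Solution with x_alpha(0) = 0, x_alpha'(0) = 1."""
    return np.sin(k * z) / k

def x_beta(z):
    """Solution with x_beta(0) = 1, x_beta'(0) = 0."""
    return np.cos(k * z)

def envelope_formula(z):
    """Linear envelope from the closed-form expression of Eq. (146)."""
    return np.sqrt(R0**2 * x_beta(z) ** 2 + A0**2 * x_alpha(z) ** 2)

def envelope_brute_force(z, n=20001):
    """Largest x reached at plane z by trajectories starting on the initial ellipse."""
    phi = np.linspace(0.0, 2.0 * np.pi, n)
    x0, x0p = R0 * np.cos(phi), A0 * np.sin(phi)
    return np.max(x0 * x_beta(z) + x0p * x_alpha(z))

z = np.linspace(0.0, 1.0, 11)
formula = envelope_formula(z)
brute = np.array([envelope_brute_force(zi) for zi in z])
print(np.max(np.abs(formula - brute) / formula))
```

The two results agree to within the sampling error of the contour, and the envelope never reaches zero, in line with the statement that a beam of finite emittance cannot be focused into a point.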
The expressions under the radical signs in Eqs. (146) and (147) are positive and do not vanish, so the beam envelope will never touch the axis. This implies that a beam of finite emittance cannot be focused into a point, nor can it be parallel in the optical sense. It will always have finite dimensions and finite angular spread. In the work of Gavrilov and Shpak (1983), the envelopes have been found by taking into consideration some third-order aberrations for a larger class of phase-space contours of the form
$$\left(\frac{x_0}{R_{x0}}\right)^{2m/n} + \left(\frac{x_0'}{A_{x0}}\right)^{2m/n} = 1, \tag{148}$$
where m and n are natural numbers. We will consider in more detail the beam envelopes in drift space because beams pass through drift space in all types of lens systems. From Eq. (146) we can obtain for this case
$$R_x = \sqrt{R_{x0}^2 + A_{x0}^2 z^2}, \tag{149}$$
where z is measured from the plane of the initial regular ellipse.
One can see from Eq. (149) that the envelope of the beam propagating through drift space is a hyperbola (Fig. 17). The hyperbola asymptotes represent trajectories with maximum inclination to the z-axis.
FIG. 17. The beam envelopes in drift space (a) and boundary phase-space contours (b).
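The hyperbolic envelope in drift space can also be recovered by integrating the envelope equation (142) numerically with $\beta^2 = 0$. The following Python sketch uses a classical fourth-order Runge-Kutta scheme; it assumes that the beam starts at a waist, where the ellipse is regular and the emittance equals the product of the linear and angular envelopes. The function names and numerical values are hypothetical.

```python
import numpy as np

def envelope_rhs(state, emittance, beta2):
    """First-order form of R'' + beta2 * R - emittance**2 / R**3 = 0."""
    R, Rp = state
    return np.array([Rp, -beta2 * R + emittance**2 / R**3])

def integrate_envelope(R0, Rp0, emittance, z_end, beta2=0.0, steps=2000):
    """Classical fourth-order Runge-Kutta integration from z = 0 to z_end."""
    h = z_end / steps
    state = np.array([R0, Rp0], dtype=float)
    for _ in range(steps):
        k1 = envelope_rhs(state, emittance, beta2)
        k2 = envelope_rhs(state + 0.5 * h * k1, emittance, beta2)
        k3 = envelope_rhs(state + 0.5 * h * k2, emittance, beta2)
        k4 = envelope_rhs(state + h * k3, emittance, beta2)
        state = state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return state

R0, A0 = 1e-3, 2e-3      # hypothetical waist size (m) and angular spread (rad)
emittance = R0 * A0      # regular ellipse at the waist
z_end = 1.0              # hypothetical drift length (m)

R_num, Rp_num = integrate_envelope(R0, 0.0, emittance, z_end)
R_hyperbola = np.sqrt(R0**2 + (A0 * z_end) ** 2)
print(R_num, R_hyperbola, Rp_num)
```

The integrated envelope coincides with the hyperbola, and its slope stays below the angular spread $A_{x0}$, which it approaches only asymptotically.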
D. Crossover
The plane in which the envelope coordinate is minimal corresponds to the plane of the beam waist, or crossover. The position and size of the crossover are important characteristics of a beam. Their calculation is part of the design of “compressing” systems. There are two problems: one is to determine the crossover parameters for a particular electron optical system; the other is to design a system that can form a crossover of a given size in a given plane. In most cases, the beam cross section must be minimal, so the $R_x$ and $R_y$ minima must coincide, a requirement that is automatically satisfied in round lenses. In some applications, however, a knife-like beam is necessary, that is, a beam whose thickness is reduced in only one direction. Here the envelope is minimal only in one plane, so we can speak about a linear crossover. In this case the beam formation requires astigmatic lenses. The position of the crossover can be found by setting the envelope derivative with respect to z equal to zero [see Eq. (146)]. It is known that a function has a minimum if the following two conditions are satisfied:
$$\frac{dR}{dz} = 0, \qquad \frac{d^2R}{dz^2} > 0. \tag{150}$$
For illustration, we shall describe only the x-projection of the envelope. After differentiation of Eq. (146) we have
$$R_{x0}^2\, x_\beta x_\beta' + A_{x0}^2\, x_\alpha x_\alpha' = 0, \tag{151}$$
$$R_{x0}^2\, (x_\beta x_\beta')' + A_{x0}^2\, (x_\alpha x_\alpha')' > 0. \tag{152}$$
Substituting $x_\alpha$ and $x_\beta$ of a particular system into Eq. (151) and finding the roots of the obtained equation, we can define the position of the envelope extremum. If condition (152) is satisfied, the extremum corresponds to the crossover. For an elliptical phase contour, the crossover is always associated with a regular ellipse [see, for instance, Fig. 17(b)]. If the second envelope derivative at a given z is negative, the plane is that of maximum transverse deviation of the beam. The latter is of interest from the point of view of beam transport through a channel with a preset inner diameter. We, as a rule, are especially interested in the case when the crossover lies in drift space. Then, in the space that follows the lens, the linearly independent solutions $x_\alpha$ and $x_\beta$ represent straight lines
$$x_\alpha = x_{\alpha 2} + x_{\alpha 2}'(z - z_2), \qquad x_\beta = x_{\beta 2} + x_{\beta 2}'(z - z_2), \tag{153}$$
where $z_2$ corresponds to the exit plane of the system. Substituting Eq. (153) into Eq. (151) and solving the resulting equation, we find the distance $q_x$ between the crossover and the system edge
$$q_x = -\frac{R_{x0}^2\, x_{\beta 2} x_{\beta 2}' + A_{x0}^2\, x_{\alpha 2} x_{\alpha 2}'}{R_{x0}^2\, x_{\beta 2}'^2 + A_{x0}^2\, x_{\alpha 2}'^2}. \tag{154}$$
Here the subscript 2 denotes the linearly independent solutions at the point $z = z_2$. For compressing lenses, it is useful to introduce the concepts of linear (M) and angular (Γ) magnification, as has been done for image-forming systems. The linear magnification $M_x$ in the xOz-plane is the ratio of the crossover dimension $R_{xi}$ behind the lens to the initial crossover dimension $R_{x0}$:
$$M_x = R_{xi}/R_{x0}. \tag{155}$$
Likewise, we introduce the angular magnification
$$\Gamma_x = A_{xi}/A_{x0}. \tag{156}$$
For a crossover located in drift space at a distance $q_x$ from the lens, we have
$$R_{xi} A_{xi} = W_{x2}\, R_{x0} A_{x0}, \tag{157}$$
where $W_{x2} = x_{\beta 2} x_{\alpha 2}' - x_{\alpha 2} x_{\beta 2}'$ is the Wronskian (see Section III). Using Eqs. (146), (147), and (157) we get
$$M_x = W_{x2} A_{x0}\,[A_{x0}^2 x_{\alpha 2}'^2 + R_{x0}^2 x_{\beta 2}'^2]^{-1/2}, \tag{158}$$
$$\Gamma_x = A_{x0}^{-1}\,[A_{x0}^2 x_{\alpha 2}'^2 + R_{x0}^2 x_{\beta 2}'^2]^{1/2}. \tag{159}$$
The product of linear and angular magnifications of the crossover satisfies a relation similar to the Helmholtz-Lagrange theorem in Gaussian optics (47). Using Eq. (44) and the fact that $W_0 = 1$, we have
$$M_x \Gamma_x = \sqrt{\Phi_o/\Phi_i}. \tag{160}$$
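As a worked illustration of the crossover relations, the Python sketch below treats an initial crossover followed by a drift, a thin einzel lens, and a second drift. It is a sketch under stated assumptions: the lens is described by the usual thin-lens matrix, the crossover distance is taken as the root of Eq. (151) for straight-line solutions behind the lens, and the magnifications follow Eqs. (158) and (159). All names and numerical values are hypothetical.

```python
import numpy as np

R0, A0 = 1e-4, 1e-3      # hypothetical initial crossover size (m) and angular spread (rad)
d1, f = 0.5, 0.2         # hypothetical drift to the lens and focal length (m)

# Independent solutions at the lens exit, with x_beta(0) = 1 and x_alpha'(0) = 1.
xb2, xb2p = 1.0, -1.0 / f
xa2, xa2p = d1, 1.0 - d1 / f
W = xb2 * xa2p - xa2 * xb2p          # Wronskian, equal to 1 for an einzel lens

def envelope(q):
    """Linear envelope at distance q behind the lens, Eq. (146) with straight lines."""
    xb = xb2 + xb2p * q
    xa = xa2 + xa2p * q
    return np.sqrt(R0**2 * xb**2 + A0**2 * xa**2)

# Root of Eq. (151) for the straight-line solutions behind the lens.
q_x = -(R0**2 * xb2 * xb2p + A0**2 * xa2 * xa2p) / (R0**2 * xb2p**2 + A0**2 * xa2p**2)

A_i = np.sqrt(R0**2 * xb2p**2 + A0**2 * xa2p**2)   # angular envelope behind the lens
M = W * A0 / A_i                                    # linear magnification
Gamma = A_i / A0                                    # angular magnification

# Independent check: locate the envelope minimum on a fine grid.
q_grid = np.linspace(0.0, 1.0, 100001)
q_numeric = q_grid[np.argmin(envelope(q_grid))]
print(q_x, q_numeric, M, Gamma, M * Gamma)
```

For these values the crossover lies 0.32 m behind the lens, whereas the Gaussian image of the initial crossover plane would lie at $f d_1/(d_1 - f) \approx 0.33$ m, so the crossover is somewhat closer to the lens than the image.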
Image-forming electron optical systems can be described by formulae relating the positions of the image and object in terms of the cardinal elements. It is of interest to derive similar formulae relating the positions of the initial and final crossovers in “compressing” systems. This is especially useful since it is the cardinal elements of lenses that are usually known, rather than the independent solutions $x_\alpha$ and $x_\beta$ of their paraxial equations. Expressing $x_{\alpha 2}$, $x_{\beta 2}$, $x_{\alpha 2}'$, and $x_{\beta 2}'$ in terms of the lens cardinal elements, it is easy to find the distance from the crossover behind the lens to the principal plane $H_{xi}$. Denoting it as $L_{xi}$, we have
$$L_{xi} = f_{xi} + \frac{f_{xo} f_{xi}\,(L_{xo} - f_{xo})}{(L_{xo} - f_{xo})^2 + z_{xo}^2}. \tag{161}$$
Here $L_{xo}$ is the distance between the initial crossover and the principal plane $H_{xo}$. The initial crossover for a real object coincides with it. The characteristic crossover length is denoted by $z_{xo}$:
$$z_{xo} = R_{x0}/A_{x0}. \tag{162}$$
After some transformations, Eq. (161) can be written as
$$\frac{f_{xi}}{L_{xi}} + \frac{f_{xo}}{L_{xo} + \varepsilon_x} = 1, \tag{163}$$
where $\varepsilon_x = z_{xo}^2/(L_{xo} - f_{xo})$. The expression for linear magnification of a “compressing” system can also be written in terms of the cardinal elements:
$$M_x = f_{xo}\,[(f_{xo} - L_{xo})^2 + z_{xo}^2]^{-1/2}. \tag{164}$$
By substituting $z_{xo} = 0$ into Eqs. (161), (163), and (164), which corresponds to the zero-emittance beam, we obtain the well-known formulae of Gaussian optics (see Section III); in particular, Eq. (163) changes into the Newtonian formula. It can be seen from Eq. (163) that the crossover position behind the lens depends on the beam phase-space volume. This is the principal difference from Gaussian optics, in which the image position does not depend on the object size. The analysis made in this section is applicable to both einzel and immersion electron lenses. One should keep in mind that the corollary of Liouville’s theorem concerning the preservation of the beam area on the phase-space planes xx' and yy' does not hold for immersion systems. A detailed analysis of these questions has been made by Glavish (1972). A large number of phase contours described by expression (148), without restriction to a straight axial trajectory, have been studied by Shpak and Yavor (1984). Note that the product of linear and angular magnifications cannot be described by the Helmholtz-Lagrange formula unless the phase-space contour is elliptical.
FIG. 18. Object-to-image transformation (a); crossover-to-crossover transformation (b).
We may discuss in some detail the crossover-to-crossover transformation when a beam passes through a thin einzel lens and two adjacent drift spaces. Figure 18(a) shows several typical trajectories in a lens system in real space, and Fig. 18(b) represents schematically the transformation of the beam contour in the two-dimensional phase space xx'. The object and image planes are each located at twice the focal length from the lens L. Two types of boundary phase-space contours are considered: ellipse and parallelogram. Figure 18(b) illustrates their shapes in several characteristic planes: (1) in the object plane, (2) just in front of the lens, (3) right behind it, (4) in the crossover plane, and (5) in the image plane. The dashed line in Fig. 18(a) corresponds to the trajectory with maximum deviation from the axis in the lens when the initial phase-space contour is a parallelogram. The solid straight lines in Fig. 18(b) represent the transformation of the beam emerging from the point on the axis $x_0 = 0$ (zero-emittance beam). The crossover is seen to be closer to the lens than the image. The dependence of the positions of the crossover (solid line) and image (dashed line) on the position of the initial crossover or object for a thick
immersion lens is shown in Fig. 19. The abscissa gives the distance $z_1$ from the object to the front focal point of the lens in units of the crossover characteristic length, $z_1/z_{xo}$. The ordinate shows the distance $z_2$ from the back focal point to the image or crossover behind the lens in p-units, where $p = f_{xo} f_{xi}/z_{xo}$. One can see that at large $|z_1|$ the curves are close to each other; at $z_1 = 0$ (the object or crossover in the focal plane), the image is at infinity, the other crossover being formed in the other focal plane ($z_2 = 0$). The distance of the second crossover from the focal plane is limited, reaching its maximum, $f_{xo} f_{xi}/2z_{xo}$, at $z_1 = z_{xo}$.
FIG. 19. Relationship between the positions of the image and object, and between the positions of the final and initial crossover.
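The two curves of Fig. 19 can be tabulated with a few lines of Python. The sketch rests on two assumptions: the image obeys Newton's formula $z_1 z_2 = f_{xo} f_{xi}$, and the final crossover obeys $z_2 = f_{xo} f_{xi} z_1/(z_1^2 + z_{xo}^2)$, which is the form consistent with the maximum $f_{xo} f_{xi}/2z_{xo}$ at $z_1 = z_{xo}$ quoted above. The sign convention, the function names, and the numerical values are hypothetical.

```python
import numpy as np

def image_shift(z1, f_o, f_i):
    """Distance of the image from the back focal point (Newton's formula)."""
    return f_o * f_i / z1

def crossover_shift(z1, f_o, f_i, z_xo):
    """Assumed distance of the final crossover from the back focal point."""
    return f_o * f_i * z1 / (z1**2 + z_xo**2)

f_o, f_i, z_xo = 0.2, 0.3, 0.05      # hypothetical focal lengths and crossover length (m)

z1 = np.linspace(-1.0, 1.0, 4001)    # distance of the initial crossover from the front focus
z2 = crossover_shift(z1, f_o, f_i, z_xo)

i_max = np.argmax(z2)
z1_at_max, z2_max = z1[i_max], z2[i_max]

# Far from the focal plane the crossover and image positions nearly coincide.
far_ratio = crossover_shift(10.0, f_o, f_i, z_xo) / image_shift(10.0, f_o, f_i)
print(z1_at_max, z2_max, f_o * f_i / (2.0 * z_xo), far_ratio)
```

With these values the crossover displacement peaks at 0.6 m for $z_1 = z_{xo}$ and vanishes at $z_1 = 0$, while the image position given by Newton's formula diverges there.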
VI. CURRENT DENSITY DISTRIBUTION AND FREQUENCY-CONTRAST CHARACTERISTICS
In the foregoing sections we have considered only the geometry of charged particle beams. However, such a consideration cannot be complete without analysis of the current density distribution in the beam. Indeed, for both image-forming and “compressing” systems it is ultimately essential to know how the current density is distributed in certain planes in space behind the electron optical system, for example, in the Gaussian image plane or in the crossover. In one case of importance, it is of interest to study the structure of the current density distribution function, which actually represents the image; another interesting case is concerned with other beam characteristics, such as the maximal current density and distribution uniformity. Aberration theory allows us to determine the shape and size of the aberration spot. However, if most of the current is concentrated in the spot center and drops sharply towards its edges, then it is the spot center that actually determines the focusing quality. To describe this characteristic, the concept of resolving power has been introduced; it is determined by the smallest distance between two neighboring points of an object for which the two partly overlapping intensity distributions in the image plane corresponding to these points can still be discerned. The question of the transformation of the current density distribution by lens systems is discussed in Section VI.A. The effect of this transformation on image quality and lens resolving power is described in Section VI.B.
A. Calculation of the Current Distribution in the Space Beyond the Lens
Our objective is to find the current density distribution I(x, y) in a plane beyond the electron optical system as a function of its parameters at a given brightness of the object. The brightness B is defined as the number of charged particles emitted by the object per unit time from a unit area through a unit solid angle. If the brightness is constant, the distribution function I(x, y) depends on the properties of the transforming system and the brightness is included in I(x, y) as a constant factor. There are various approaches to the solution of this problem, their underlying principle being identical for all electron optical components. The simplest but most time-consuming way of calculating the current density distribution is by constructing a large number of trajectories. The initial conditions are preset by the aperture and object dimensions; the number of trajectories from the various elements of the object is to be proportional to the brightness. The density of points at which the trajectories intercept an arbitrary plane normal to the system axis will characterize, with a certain accuracy, the distribution for this plane. A large number of trajectories is needed to obtain acceptable accuracy. The time-consuming calculations of trajectories can be simplified considerably by using aberration theory. In this case the differential equations are integrated for only a small number of trajectories necessary to obtain the aberration coefficients. The number of coefficients to be calculated depends on the aperture, object size, and the desired accuracy, which sets the order of the
aberrations. Knowing these coefficients and two linearly independent paraxial trajectories, it is easy to calculate quite a large number of trajectories by representing them in the form of an aberration series [see, for example, Eq. (90)]. A more rigorous approach to the problem is to find the current density distribution function I(x, y) directly, employing aberration theory. A description of this approach can be found in the work of Glaser (1952). In an aberration-free stigmatic system, the image reproduces on some scale the current distribution emitted by the object (the particle losses are taken to be zero). Due to the aberrations, the charged particles are redistributed to produce image blurring, so the relation between the current distributions in the object and image planes becomes more complicated. This question has been discussed by several authors (Iznar, 1977; Tsukkerman, 1972). We shall consider charged particles emerging from the point $x_0, y_0$ in the object plane in a direction determined by the angles $x_0', y_0'$ (in the general case, these angles are measured relative to the axial trajectory). These particles hit the point x, y of the given plane behind the lens (z = const), where we shall seek the current distribution. The number of emergent particles in the solid angle element dω around the $x_0', y_0'$ direction per unit time is $B(x_0, y_0, x_0', y_0')\,d\omega$. They fall upon the area element dx dy. Their number varies with the position of the points $x_0, y_0$ and x, y and with the slopes $x_0', y_0'$, and is equal to the current $i(x_0, y_0, x, y, x_0', y_0')\,dx\,dy$ passing through the area dx dy. Since the number of particles is conserved, we have
$$B(x_0, y_0, x_0', y_0')\,d\omega = i(x_0, y_0, x, y, x_0', y_0')\,dx\,dy. \tag{165}$$
Here the element dω of the solid angle is equal to
$$d\omega = \frac{dx_0'\,dy_0'}{\cos\gamma}, \tag{166}$$
where γ is the angle formed by the trajectory of the emergent particle and the system axis. The variables in the expression for the current density i are related by the equation of trajectory
$$x = x(x_0, y_0, x_0', y_0'), \qquad y = y(x_0, y_0, x_0', y_0'). \tag{167}$$
The functions in Eq. (167) are determined by the field distribution in the system in question and by the accuracy with which we calculate the trajectories, that is, by the order of the retained aberration terms. The solid angle element $dx_0'\,dy_0'$ is related to the area element dx dy in the given z-plane by the expression
$$dx_0'\,dy_0' = J(x_0, y_0, x, y, x_0', y_0')\,dx\,dy, \tag{168}$$
where J is the Jacobian of the transformations of Eq. (167):
$$J = \frac{\partial(x_0', y_0')}{\partial(x, y)} = \begin{vmatrix} \partial x_0'/\partial x & \partial x_0'/\partial y \\ \partial y_0'/\partial x & \partial y_0'/\partial y \end{vmatrix}. \tag{169}$$
Substituting Eq. (168) into Eq. (165), we obtain the relation between the source brightness B and the current density i:
$$i(x_0, y_0, x, y, x_0', y_0') = \frac{1}{\cos\gamma}\, B(x_0, y_0, x_0', y_0')\, J(x_0, y_0, x, y, x_0', y_0'). \tag{170}$$
We can eliminate the angles $x_0'$ and $y_0'$ from formula (170) by solving Eqs. (167) with respect to these angles and then substituting the result into Eq. (170). We obtain
$$i(x_0, y_0, x, y) = \frac{1}{\cos\gamma}\, B(x_0, y_0, x, y)\, J(x_0, y_0, x, y). \tag{171}$$
Expression (171) defines the current distribution i in the plane (x, y) from a point source located at the point $(x_0, y_0)$. The value $i(x_0, y_0, x, y)$ corresponding to the current density $i_0$ at the point $(x_0, y_0)$ is called the scattering function $h(x_0, y_0, x, y)$:
$$i(x_0, y_0, x, y) = i_0(x_0, y_0)\, h(x_0, y_0, x, y). \tag{172}$$
It is clear that the current density in the object plane is given by the expression
The particular form of the scattering function is determined by the system aberrations. Let us assume that all points of the object plane produce similar distributions that are merely shifted relative to each other in the image plane (isoplanatic approximation). Such a transformation will be called homogeneous, and the corresponding scattering function h will depend on the differences $(x - M_x x_0)$, $(y - M_y y_0)$ rather than on the coordinates $x_0, y_0, x, y$ separately:
$$h = h[(x - M_x x_0), (y - M_y y_0)].$$
Spherical aberration, axial astigmatism, and axial chromatic error (if the beam is not monochromatic) are all associated with homogeneous transformations because they do not depend on the position of the object point. All other aberrations vary with the distance between the object point and the axis,
so, generally speaking, all the off-axis points are characterized by a different scattering function. However, one can always find areas (isoplanatic patches) in which the geometrical aberrations vary only a little, so that within these areas the form of the scattering function remains valid, in other words, the transformation remains homogeneous with sufficient accuracy to satisfy practical needs. In the general case, the scattering function depends on the coordinates in the object plane, and the transformation of the image by the lens becomes inhomogeneous if all geometrical aberrations are taken into account. We should note that the size of the area where the scattering function for a given point essentially differs from zero (i.e., the size of the scattering spot) is, as a rule, much smaller than the image size. Sometimes, when the scattering function h(x, y) is under the integral, this allows us to pass from finite to infinite integration limits. The total current density I(x, y) is determined by the contributions of all object points and is equal to the integral [see Eq. (171)]:
$$I(x, y) = \iint_S i(x_0, y_0, x, y)\,dx_0\,dy_0, \tag{174}$$
where S is the object area. Another expression for I(x, y) is obtained if the coordinates $x_0, y_0$, but not the inclinations $x_0', y_0'$, are eliminated from Eq. (170) by using Eq. (167). In this case the current density distribution i for a point object will depend on $x_0', y_0', x, y$; to find the total current density distribution I(x, y), the integration should be made with respect to the inclinations rather than the coordinates. When a beam is intercepted by an aperture, the particle trajectories are expressed in terms of a different set of variables, namely, the coordinates in the object plane $x_0, y_0$ and in the aperture plane $x_a, y_a$:
$$x = x(x_0, y_0, x_a, y_a), \qquad y = y(x_0, y_0, x_a, y_a). \tag{175}$$
Here, again, the current density distribution I(x, y) in a given plane can be found by introducing the transformation Jacobian (see Glaser, 1952). One should bear in mind that finding the integration limits presents some difficulties. As a rule, the limits are determined not only by the object parameters, but also by the system transmittance. Since all the trajectory variables are interrelated, additional limitations arise that may be more severe than the initial ones. This question deserves careful consideration, which is not given here. The current density distribution I(x, y) in the plane of interest is usually represented schematically with the aid of isophotes, that is, with curves of
FIG. 20. Isophotes of an axially symmetric system in the plane of the disc of least confusion.
equal current densities I(x, y) = const. Figure 20 shows the isophotes calculated in the work of Grumm (1952) for the plane of the circle of least confusion on the assumption that a small element of the surface emits according to Lambert’s law (brightness is proportional to the cosine of the angle between the normal to the surface and the direction of the emergent particle). There is a more formal way of finding the current density distribution, described for a round lens by Tsyganenko and Kucherov (1973) and for a larger class of electron optical systems with two planes of symmetry by Ovsyannikova (Ovsyannikova et al., 1975; Ovsyannikova and Shpak, 1977a). The current density distribution I(x, y) in the given plane z = const can be written in the form of an integral, employing the Dirac δ-function. To calculate the integral one can make use of the formula derived from the work of Papoulis (1968).
In the general case, when a system is characterized by all the aberration coefficients, the use of expressions (174) to calculate the current density distribution in a given plane is too complicated, since it seems impossible to express analytically the coordinates $x_0, y_0$ or the inclinations $x_0', y_0'$ from Eqs. (167). For this reason, the determination of the current density distribution I(x, y) should be reduced to numerical integration, with the function B and the Jacobian J specified numerically. When the principal aberration can be singled out and the others neglected, the solution is considerably simplified. The expression for I(x, y) in Eq. (174) has been obtained on the assumption that the emergent beams of charged particles are monochromatic. In some cases, this assumption does not hold: the particles have considerable energy variation, and the results obtained then refer to a small energy range dΦ. In order to find the total current density distribution in a nonmonochromatic beam, one should take into account the energy dependence in the expressions for brightness and for trajectories and integrate I(x, y, Φ) with respect to Φ. Questions related to image transformation by axially symmetrical electron optical systems have been studied more carefully. A method for the calculation of the current density distribution in the probe of a cathode-ray device has been suggested by Gaydukova et al. (1980). It takes into account the current density distribution in the probe cross section at the entrance to the focusing system and its aberrations possessing axial symmetry. The current density distribution in the optimum focusing plane of a round lens system has been studied by Geyzler et al. (1981), taking into account fifth-order aberrations. The authors considered complex systems containing various functional elements, which permit their analysis and optimization.
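The case of a single dominant aberration can be illustrated by a one-dimensional Python sketch that compares the Jacobian approach with direct trajectory counting. The model is an assumption made for illustration and is not taken from the text: a point source on the axis, a uniform distribution of initial slopes within an aperture half-angle a, and a landing coordinate $x = C x_0'^3$ in the Gaussian image plane, as for third-order spherical aberration. The coefficient C, the half-angle, and the function names are hypothetical.

```python
import numpy as np

C = 50.0     # hypothetical aberration coefficient (m)
a = 0.01     # hypothetical aperture half-angle (rad)

def density_jacobian(x):
    """Normalized density i(x) = (1 / 2a) |dx0'/dx| for x = C * x0'**3, |x| <= C * a**3."""
    return np.abs(x) ** (-2.0 / 3.0) / (6.0 * a * C ** (1.0 / 3.0))

def fraction_within(r):
    """Integral of density_jacobian over |x| <= r, i.e. the current fraction inside r."""
    return (r / C) ** (1.0 / 3.0) / a

# Direct trajectory counting with uniformly distributed initial slopes.
rng = np.random.default_rng(0)
slopes = rng.uniform(-a, a, 200_000)
x = C * slopes**3

spot_radius = C * a**3
r = 0.125 * spot_radius
mc_fraction = np.mean(np.abs(x) <= r)
print(mc_fraction, fraction_within(r))
```

Both estimates give about one half: in this model half of the current falls within one eighth of the aberration spot radius, which illustrates why the spot center, rather than the full spot size, determines the focusing quality.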
B. Frequency-Contrast Characteristics of Electron Optical Systems

Analysis of frequency-contrast characteristics is widely used to estimate the quality of an optical system. The current density distribution in the object plane can be represented as a spectrum of spatial harmonics, each of which is described by an amplitude and spatial frequency. The frequency-contrast characteristic describes the ability of the optical system to transmit the frequency-contrast spectrum of the object to the image. Considering Eqs. (171), (172), and (174), the current density distribution in the image plane can be written as follows:

$$I(x, y) = \iint I_0(x_0, y_0)\, h(x - M_x x_0,\; y - M_y y_0)\, dx_0\, dy_0. \tag{176}$$
It is assumed that the electron optical system performs a homogeneous transformation. If the Fourier transform is applied to the function $I(x, y)$, we shall obtain the frequency spectrum $\mathcal{E}(\nu_x, \nu_y)$ of the current density distribution in the image:

$$\mathcal{E}(\nu_x, \nu_y) = \iint_{-\infty}^{\infty} I(x, y) \exp[-2\pi i(x\nu_x + y\nu_y)]\, dx\, dy, \tag{177}$$
where $\nu_x$ and $\nu_y$ represent the spatial frequencies on the x- and y-axes. The Fourier transform, when applied to the right-hand side of Eq. (176) after substitution of the variables $X = x - M_x x_0$, $Y = y - M_y y_0$, gives

$$\mathcal{E}(\nu_x, \nu_y) = \mathcal{E}_0(\nu_x, \nu_y)\, H(\nu_x, \nu_y). \tag{178}$$
Thus, the frequency spectrum of the current density distribution in the image plane is equal to the product of the Fourier transforms of the current density distribution in the object and of the scattering function. The function $\mathcal{E}_0(\nu_x, \nu_y)$ is the frequency spectrum of the current density distribution in the object, while the function $H(\nu_x, \nu_y)$ characterizes the effect of the optical system on the spatial frequency spectrum during image formation. The value $H(\nu_x, \nu_y)$ is complex, representing the complex spatial frequency transfer function of the optical system.

It is not easy to find $H(\nu_x, \nu_y)$. One way is to use a point source, for which the current density distribution is described by the $\delta$-function and the spatial frequency characteristic is $\mathcal{E}_0(\nu_x, \nu_y) = 1$. This means that the frequency spectrum of such an emitter contains the whole set of frequencies with equal amplitudes. Then the transfer function is

$$H(\nu_x, \nu_y) = \mathcal{E}_T(\nu_x, \nu_y), \tag{179}$$

where $\mathcal{E}_T(\nu_x, \nu_y)$ is the frequency spectrum of the current density distribution in the scattering spot from a point source, which can be calculated or determined experimentally. The modulus of the transfer function $|H(\nu_x, \nu_y)|$ represents the modulation transfer function of the optical system, which is also referred to as the frequency-contrast characteristic. This characteristic most completely describes the ability of an electron optical system to transmit the frequency-contrast spectrum of the object during image formation.

Eq. (167) shows that the electron optical system acts on the initial object like a spatial frequency filter. Its spatial frequency characteristic represents the spatial spectrum of the scattering function. Therefore, by varying the scattering function, we can control the spatial frequency spectrum of the image. As a rule, the frequency-contrast characteristic is close to unity for medium spatial frequencies and drops with increasing frequency, lowering the resolution of fine details. The form of this characteristic allows us to determine the maximum spatial frequencies for which the transmitted contrast exceeds a certain preset level.

We shall illustrate the relationship between the frequency-contrast characteristic and the contrast variation with reference to a simple one-dimensional case. Suppose the current density distribution in the object is given by the sum of one harmonic and a constant component:

$$I_0(x_0) = C_0 + C_i \sin 2\pi\nu_i x_0. \tag{180}$$
If the frequency-contrast characteristic is normalized, that is, the transmittance for the constant component $C_0$ is equal to unity, the current density distribution in the image plane has the form

$$I(x) = C_0 + |H(\nu_i)|\, C_i \sin 2\pi\nu_i x. \tag{181}$$
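The contrasts in the object plane $k_0$ and in the image plane $k$ can be found as follows:

$$k_0 = \frac{I_{0\max} - I_{0\min}}{I_{0\max} + I_{0\min}}, \qquad k = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}. \tag{182}$$

Hence, using Eqs. (180) and (181), we have

$$k_0 = \frac{C_i}{C_0}, \qquad k = |H(\nu_i)|\,\frac{C_i}{C_0}. \tag{183}$$

Thus, the contrast decrease is proportional to the normalized frequency-contrast characteristic $|H(\nu)|$:

$$k = k_0\,|H(\nu)|. \tag{184}$$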
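The contrast relation above can be checked numerically. The following is a minimal sketch, not a method from the cited works: it assumes a Gaussian scattering function of width `sigma` (a hypothetical choice), a periodic one-dimensional object, and arbitrary units. It blurs a single harmonic by convolution and compares the measured contrast ratio with the transfer-function modulus at that frequency.

```python
import numpy as np

def transmitted_contrast(c0, ci, nu, sigma, n=4096, length=1.0):
    """Blur a sinusoidal object with a Gaussian scattering function.

    Returns (k0, k, h): object contrast, image contrast, and the modulus
    of the normalized transfer function at the object frequency nu.
    """
    x = np.arange(n) * (length / n)
    obj = c0 + ci * np.sin(2.0 * np.pi * nu * x)

    # Periodic Gaussian scattering function centred on sample 0 and scaled
    # to unit sum, so the constant component is transmitted unchanged.
    d = np.minimum(x, length - x)
    psf = np.exp(-0.5 * (d / sigma) ** 2)
    psf /= psf.sum()

    transfer = np.fft.fft(psf)  # discrete transfer function H
    img = np.real(np.fft.ifft(np.fft.fft(obj) * transfer))

    def contrast(i):
        return (i.max() - i.min()) / (i.max() + i.min())

    m = int(round(nu * length))  # FFT bin that holds the harmonic
    return contrast(obj), contrast(img), abs(transfer[m])

# Hypothetical values: constant level 1, harmonic amplitude 0.5,
# 8 cycles per unit length, scattering width 0.01.
k0, k, h = transmitted_contrast(c0=1.0, ci=0.5, nu=8.0, sigma=0.01)
print(f"k0 = {k0:.4f}, k = {k:.4f}, k/k0 = {k / k0:.4f}, |H| = {h:.4f}")
```

For a Gaussian scattering function the modulus is expected to follow exp(-2 pi^2 sigma^2 nu^2), so the ratio k/k0 falls as the spatial frequency rises, which is the loss of fine detail described in the text.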
The expressions for frequency-contrast characteristics of axially symmetric systems have been obtained by Tsyganenko and Kucherov (1973), taking into account the spherical aberration and axial astigmatism. This work presents plots of frequency-contrast characteristics for several values of aberration coefficients. The plots were used to construct the dependence of resolving power at various contrast levels on the position of the given plane z. They also permit us to determine the position of the optimum focusing plane
as a function of the required performance characteristics of devices (receiving and transmitting cathode-ray tubes). It is shown that the resolving power is not optimal in the plane of "least confusion." The expressions for frequency-contrast characteristics of isoplanatic systems consisting of astigmatic lenses have been obtained by Ovsyannikova and Shpak (1977b), who give an example of constructing the frequency-contrast characteristic for a stigmatic quadrupole triplet.
VII. ROUND LENSES

Round lenses are the most commonly used type of electron lenses and also have the longest history. Their fields possess rotational symmetry, so the lenses are capable of stigmatic focusing with equal magnification, $M_x = M_y$, and are widely used to form correct images. Moreover, the property of the lenses to compress the beam uniformly in all directions is necessary in many applied problems. Due to these properties, round lenses have found wide application in electron optical instruments, for example, in cathode-ray tubes, probe systems, and ion microscopes. There is a vast literature on the theory and use of round lenses.

The simplest designs of round lenses represent series of coaxial circular cylinders or plates with circular apertures. The former are employed when the radial dimensions of the optical system should be small, the latter when a short system is necessary. An advantage of cylindrical electrodes is the shielding of the beam from stray electric fields. More complex designs are aimed at optimizing various lens parameters.

Round lenses are subdivided into einzel and immersion lenses. An einzel lens focuses a charged particle beam without changing the particle energy and, therefore, must contain at least three electrodes. An immersion lens accelerates or decelerates the particles while focusing the beam and may consist of two electrodes. In both cases, the lens is surrounded, on the left and on the right, by equipotential spaces. Zoom lenses are a modification of immersion lenses, which allow the energy of the exit beam to be varied by changing the final electrode potential while keeping the position of the object and image constant; in some cases this can be done without changing the magnification.

Circular electrodes can also be used in emission systems and mirrors. Emission systems are characterized by small velocities and large incident angles of the particles entering the field. Mirrors possess a reversal point of the trajectory at which the longitudinal velocity is zero. The theory of such systems requires a special treatment and will not be considered in this book.
The present chapter summarizes the basic data on electrostatic round lenses. The first-order properties and cardinal elements of round lenses are described in Section VII.A. Since lenses of this kind are most commonly used to form correct images, the problem of geometrical and chromatic aberrations is of primary importance; they are discussed in detail in Sections VII.B to VII.D. The optical properties of round lenses largely depend on whether the lens changes the energy of the particle beam. Simple immersion lenses consisting of two electrodes are described in Section VII.E; einzel lenses are considered in Section VII.F. Multielectrode immersion lenses are discussed in Section VII.G, together with their modification, zoom lenses. Applications of round lenses are numerous; some of the new trends are outlined in Section VII.H.

A. Field Distribution and Paraxial Optics of Round Lenses
The exact solution of the Laplace equation for a system possessing rotational symmetry is known for only a few simple configurations. For optical problems, it is sufficient to know the potential distribution in the vicinity of the axis, which is described by expression (33) if we set $\phi_i(z) = 0$, starting from $i = 2$. Then we obtain

$$\varphi(x, y, z) = \phi(z) - \frac{1}{4}\phi''(z)(x^2 + y^2) + \frac{1}{64}\phi^{\mathrm{IV}}(z)(x^2 + y^2)^2 + \cdots \tag{185}$$
Hence, the field in space is unambiguously determined by the potential distribution on the axis $\phi(z)$. Its form will be given for various lens types later in the chapter. Since the potential is independent of the rotation angle with respect to the lens axis, its distribution can be conveniently written in the cylindrical coordinates

$$\varphi(r, z) = \phi(z) - \frac{r^2}{4}\phi''(z) + \frac{r^4}{64}\phi^{\mathrm{IV}}(z) - \cdots = \sum_{\nu=0}^{\infty} \frac{(-1)^{\nu}}{(\nu!)^2}\,\phi^{(2\nu)}(z)\left(\frac{r}{2}\right)^{2\nu}. \tag{186}$$
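The series in cylindrical coordinates can be tried out on a field whose off-axis potential is known in closed form. The sketch below is an illustration under stated assumptions: the test field $I_0(kr)\cos kz$ and the numerical values are hypothetical choices made for checking, and `off_axis_potential` is a helper name introduced here.

```python
import math
import numpy as np

def off_axis_potential(even_derivs, r):
    """Sum the series sum_v (-1)^v / (v!)^2 * phi^(2v)(z) * (r/2)^(2v).

    even_derivs[v] is the 2v-th derivative of the axial potential at the
    chosen z, so even_derivs[0] is the axial potential itself.
    """
    total = 0.0
    for v, d in enumerate(even_derivs):
        total += (-1) ** v / math.factorial(v) ** 2 * d * (r / 2.0) ** (2 * v)
    return total

# Test field (an assumption for illustration): I0(k r) cos(k z) satisfies
# the Laplace equation and has the axial potential cos(k z), whose 2v-th
# derivative is (-1)^v k^(2v) cos(k z).
k, z, r = 1.3, 0.4, 0.5
derivs = [(-1) ** v * k ** (2 * v) * math.cos(k * z) for v in range(8)]

series = off_axis_potential(derivs, r)
exact = float(np.i0(k * r)) * math.cos(k * z)
print(series, exact)
```

Eight terms already reproduce the closed form to rounding accuracy at this radius. In practice the high-order derivatives of a measured or computed axial potential are noisy, which limits how many terms are useful.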
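The field components will be expressed as follows:

$$E_z = -\frac{\partial\varphi}{\partial z} = -\phi'(z) + \frac{r^2}{4}\phi'''(z) - \cdots, \qquad E_r = -\frac{\partial\varphi}{\partial r} = \frac{r}{2}\phi''(z) - \frac{r^3}{16}\phi^{\mathrm{IV}}(z) + \cdots \tag{187}$$

Near the axis of symmetry, neglecting terms containing powers of $r$ higher than the first, we obtain

$$E_z = -\phi'(z), \qquad E_r = \frac{r}{2}\phi''(z). \tag{188}$$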
It is clear that in the paraxial region the longitudinal field component $E_z$ does not depend on $r$ and is equal to its axial value. The radial component $E_r$ is proportional to the distance from the axis. One can also see from Eq. (188) that the transverse component is proportional to the rate of variation of the longitudinal component, $E_r \propto E_z'$; therefore, they cannot be changed separately. We should bear in mind that although the particles in a lens are deflected by the transverse field, the longitudinal component also makes a contribution to the deflection by accelerating or retarding the particles. Thus, neither the particle energy nor the optical power of the lens can be regulated independently.

Figure 21 shows some types of electrostatic round lenses and their light optics analogues. In the immersion lens shown in Fig. 21(a), the first half operates as a converging lens and the second half as a diverging lens. The transverse components of the field in both halves are identical, but the longitudinal particle velocities are different. The longitudinal electron velocity is greater in the right-hand, diverging half of the lens, so its effect on the particles is weaker. Besides, the distance of a particle from the axis is smaller on the right than on the left; therefore, the radial force acting on it is also smaller. So the net effect turns out to be converging. The einzel lens shown schematically in Fig. 21(b) can be represented as a converging lens placed between two diverging lenses. Later we shall prove rigorously that both immersion and einzel round lenses are always convergent.

FIG. 21. Equipotential lines (solid) in round lenses: (a) in an immersion lens; (b) in an einzel lens. The dashed lines represent particle trajectories.

The general equations of paraxial trajectories (35), (37), and (38) also hold for the particular case of round lenses. However, due to their rotational symmetry, it is more convenient to use the equations in cylindrical coordinates. If we do not restrict ourselves to the case when the longitudinal component of the magnetic field is absent, the paraxial equations of trajectories in a round lens will take the form
The constant C can be expressed in terms of the initial conditions as follows:
In the absence of a magnetic field ($H = 0$) and of initial rotational moment ($C = 0$), the general equations (189) and (190) take the simple form

$$\phi r'' + \frac{1}{2}\phi' r' + \frac{1}{4}\phi'' r = 0, \tag{192}$$

$$\psi' = 0. \tag{193}$$

It follows from Eq. (193) that in this case the trajectory does not depart from the meridian plane in which the initial point lies. For a point on the axis, there is always a meridian plane in which $\psi' = 0$, and, therefore, the trajectory does not depart from this plane. An analysis of Eq. (192), similar to that described in Section III.B, shows that its general solution can be represented in the form
$$r(z) = r_0 r_\beta(z) + r_0' r_\alpha(z), \tag{194}$$

where $r_\alpha(z)$ and $r_\beta(z)$ are two independent solutions of Eq. (192) that satisfy the following initial conditions:

$$r_\alpha(z_0) = 0, \quad r_\alpha'(z_0) = 1; \qquad r_\beta(z_0) = 1, \quad r_\beta'(z_0) = 0. \tag{195}$$

The trajectories emerging from a certain point $P$ at a distance $r_0$ from the z-axis (at $z = z_0$) and lying in the meridian plane will again converge at a point with the coordinate $z = z_i$, at which $r_\alpha(z_i) = 0$. The distance from the z-axis to this point, referred to as the image point (to be denoted as $Q$), is equal to $r_i = r_0 r_\beta(z_i)$. The linear magnification of a round lens can be expressed as follows:

$$M = \frac{r_i}{r_0} = r_\beta(z_i). \tag{196}$$
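The two fundamental rays can be traced numerically once an axial potential is chosen. The following is a minimal sketch under stated assumptions: the hyperbolic-tangent model of a two-cylinder lens with the constant 1.318, the electrode potentials, the integration limits, and the helper names are all illustrative choices, and the classical Runge-Kutta scheme is simply one convenient integrator.

```python
import math

V1, V2, R = 1.0, 4.0, 1.0   # hypothetical electrode potentials and radius
A = 1.318 / R               # constant of the assumed tanh model

def phi(z):
    return 0.5 * (V1 + V2) + 0.5 * (V2 - V1) * math.tanh(A * z)

def dphi(z):
    return 0.5 * (V2 - V1) * A / math.cosh(A * z) ** 2

def d2phi(z):
    return -(V2 - V1) * A ** 2 * math.tanh(A * z) / math.cosh(A * z) ** 2

def rhs(z, y):
    """First-order form of phi r'' + phi' r'/2 + phi'' r/4 = 0."""
    r, rp = y
    return (rp, -(0.5 * dphi(z) * rp + 0.25 * d2phi(z) * r) / phi(z))

def trace(r0, rp0, z0=-5.0, z1=5.0, steps=4000):
    """Integrate one paraxial ray with the classical RK4 scheme."""
    h = (z1 - z0) / steps
    z, y = z0, (r0, rp0)
    for _ in range(steps):
        k1 = rhs(z, y)
        k2 = rhs(z + h / 2, (y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]))
        k3 = rhs(z + h / 2, (y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]))
        k4 = rhs(z + h, (y[0] + h * k3[0], y[1] + h * k3[1]))
        y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        z += h
    return y

ra, rap = trace(0.0, 1.0)   # r_alpha: starts on the axis with unit slope
rb, rbp = trace(1.0, 0.0)   # r_beta: starts at unit height, parallel to axis

# For this equation sqrt(phi) * (r_beta r_alpha' - r_alpha r_beta') is
# conserved along z, which gives a simple check of the integration.
invariant_start = math.sqrt(phi(-5.0))
invariant_end = math.sqrt(phi(5.0)) * (rb * rap - ra * rbp)
print(rbp, invariant_start, invariant_end)
```

The negative exit slope of the initially parallel ray reflects the converging action of the lens, and the conserved quantity is a useful sanity check before the traced rays are used to locate the image plane or evaluate the magnification.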
We shall show that the particles emerging from the point $P$ but not lying in the meridian plane, that is, possessing a nonzero initial rotational moment ($C \neq 0$), will also converge at the point $Q$. We shall assume, as before, that the magnetic field is absent; then Eq. (190) will take the form

The angle $\psi(z)$ is the difference between the rotation angles of the nonmeridian and meridian trajectories relative to the axis. We introduce the complex function $u(z)$ that completely describes the nonmeridian trajectory:

$$u(z) = r e^{i\psi}. \tag{198}$$
If $r(z)$ is expressed in terms of $u(z)$ from Eq. (198) and substituted into Eq. (189), taking $H = 0$ and using Eq. (197), then for the function $u(z)$ we shall get Eq. (192). Its general solution can be presented as a linear combination of the known partial solutions $r_\alpha(z)$ and $r_\beta(z)$ [see Eq. (195)]; however, the arbitrary constants will be complex. These arbitrary constants can be found by satisfying the initial conditions, so for $u(z)$ we get

$$u(z) = r_0 r_\beta(z) + [r_0' + i r_0 \psi_0']\, r_\alpha(z). \tag{199}$$

It is clear from formula (199) that in the plane $z = z_i$, where $r_\alpha(z_i) = 0$, the functions $u(z_i)$ and $r(z_i)$ coincide:

$$u(z_i) = r(z_i) = r_0 r_\beta(z_i). \tag{200}$$
The function $u(z_i)$ is a real quantity; therefore, the total rotation angle of nonmeridian trajectories in the plane $z = z_i$ is zero. From Eq. (200) it follows directly that all the charged particles that emerge from the same point $P$ of the object plane will again converge at the same point $Q$ in the image plane, in spite of the fact that the rotation angles with respect to the z-axis along this path are different for particles with different initial rotational moments. Thus, analysis of the paraxial properties of round lenses can be restricted to the meridian trajectories only. For this, it is sufficient to define $r$ from Eq. (192). Therefore such a lens converges all charged particles uniformly in all directions and thus forms a correct image of an electron optical object.

At high velocities, the paraxial equation, with allowance for relativistic corrections, has the form

$$r'' + \sigma\frac{\phi'}{2U}\,r' + \sigma\frac{\phi''}{4U}\,r = 0, \tag{201}$$
where $\sigma(z)$ and $U(z)$ have been defined in Section III [see Eq. (35)]. The equation remains linear, so the earlier conclusions concerning first-order focusing are valid.

For a round lens, one can get an equation similar to Eq. (40), using the substitution

$$R = r\phi^{1/4}. \tag{202}$$

Then we have

$$R'' + \frac{3}{16}\left(\frac{\phi'}{\phi}\right)^2 R = 0. \tag{203}$$

Equation (203) does not contain the second derivative of the axial potential distribution $\phi''$, so its use is preferable when the potential is determined numerically or experimentally.

If we consider an arbitrary axially symmetric electrostatic field lying between two equipotential regions, we can show that it always operates as a converging lens. Integrating Eq. (203) from the beginning of the field to its end, we get

$$R_2' - R_1' = -\frac{3}{16}\int_{z_1}^{z_2}\left(\frac{\phi'}{\phi}\right)^2 R\, dz. \tag{204}$$

Since the integrand is always positive because $R$ does not change its sign in the region in question, we have

$$R_2' - R_1' < 0. \tag{205}$$

By differentiating Eq. (202), we find

$$R' = r'\phi^{1/4} + \frac{1}{4}\, r\,\phi'\phi^{-3/4}. \tag{206}$$

Assuming the integration limits in Eq. (204) to be outside the field, we shall find $\phi' = 0$ at these points, and from Eq. (205) we have

$$r_2'\phi_2^{1/4} - r_1'\phi_1^{1/4} < 0. \tag{207}$$

Let a parallel beam of charged particles enter the field. Then $r_1' = 0$, and from (207) $r_2' < 0$. Therefore an arbitrary axially symmetric electrostatic field bounded by equipotential spaces is always convergent. This conclusion is invalid when the object or image lies within the field, as happens in cathode lenses. These electron optical systems may be divergent; however, it was pointed out above that they are not considered in this book.

Let us consider a thin lens, inside which we may set $R = \text{const}$. In this approximation Eq. (204) yields the following expression for the focal length of an immersion lens:

$$\frac{1}{f_i} = \frac{3}{16}\left(\frac{\phi_o}{\phi_i}\right)^{1/4}\int_{-\infty}^{\infty}\left(\frac{\phi'}{\phi}\right)^2 dz. \tag{208}$$

In an einzel lens $\phi_o = \phi_i$, and for $f_i$ we obtain

$$\frac{1}{f_i} = \frac{3}{16}\int_{-\infty}^{\infty}\left(\frac{\phi'}{\phi}\right)^2 dz. \tag{209}$$
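The argument of this subsection can be followed numerically with the reduced variable $R = r\phi^{1/4}$. The sketch below is an illustration, not the authors' procedure. It assumes the standard reduced form $R'' = -(3/16)(\phi'/\phi)^2 R$ obtained from that substitution, a hyperbolic-tangent model of a two-cylinder lens with the constant 1.318, hypothetical potentials, and the thin-lens estimate $1/f_i \approx (3/16)(\phi_o/\phi_i)^{1/4}\int(\phi'/\phi)^2 dz$. All helper names are introduced here.

```python
import math

V1, V2, R_CYL = 1.0, 4.0, 1.0   # hypothetical potentials and cylinder radius
A = 1.318 / R_CYL               # constant of the assumed tanh model

def phi(z):
    return 0.5 * (V1 + V2) + 0.5 * (V2 - V1) * math.tanh(A * z)

def dphi(z):
    return 0.5 * (V2 - V1) * A / math.cosh(A * z) ** 2

def rk4(rhs, y, z0, z1, steps=4000):
    """Classical fourth-order Runge-Kutta for y' = rhs(z, y), y a 2-tuple."""
    h = (z1 - z0) / steps
    z = z0
    for _ in range(steps):
        k1 = rhs(z, y)
        k2 = rhs(z + h / 2, (y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]))
        k3 = rhs(z + h / 2, (y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]))
        k4 = rhs(z + h, (y[0] + h * k3[0], y[1] + h * k3[1]))
        y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        z += h
    return y

def reduced_rhs(z, y):
    """R'' = -(3/16) (phi'/phi)^2 R; only phi and phi' are needed."""
    big_r, big_rp = y
    return (big_rp, -3.0 / 16.0 * (dphi(z) / phi(z)) ** 2 * big_r)

z0, z1 = -4.0, 4.0
# Parallel ray at unit height (r = 1, r' = 0 at z0), converted to R and R'.
start = (phi(z0) ** 0.25, 0.25 * dphi(z0) * phi(z0) ** -0.75)
big_r1, big_rp1 = rk4(reduced_rhs, start, z0, z1)

# Back to the physical ray at the exit of the field.
r1 = big_r1 / phi(z1) ** 0.25
rp1 = (big_rp1 - 0.25 * r1 * dphi(z1) * phi(z1) ** -0.75) / phi(z1) ** 0.25

# Thin-lens estimate from the same integrand (midpoint rule).
n = 4000
h = (z1 - z0) / n
integral = sum((dphi(z0 + (i + 0.5) * h) / phi(z0 + (i + 0.5) * h)) ** 2
               for i in range(n)) * h
inv_f_thin = 3.0 / 16.0 * (phi(z0) / phi(z1)) ** 0.25 * integral
print(r1, rp1, inv_f_thin)
```

The exit slope comes out negative, as the convergence argument requires. For a ray entering at unit height the quantity `-rp1` plays the role of the reciprocal focal length, and it stays below the thin-lens estimate because $R$ decreases through the field instead of remaining constant.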
B. Spherical Aberration of Round Lenses

Analysis of geometrical aberrations shows that the number of independent aberration coefficients in round electrostatic lenses reduces to five. Expressions (90) in this case have the form

$$
\begin{aligned}
\Delta x_i &= B(x_B^2 + y_B^2)x_B + G[(x_B^2 + y_B^2)x_0 + 2x_B(x_0x_B + y_0y_B)] + 2K(x_0x_B + y_0y_B)x_0 + L(x_0^2 + y_0^2)x_B + D(x_0^2 + y_0^2)x_0,\\
\Delta y_i &= B(x_B^2 + y_B^2)y_B + G[(x_B^2 + y_B^2)y_0 + 2y_B(x_0x_B + y_0y_B)] + 2K(x_0x_B + y_0y_B)y_0 + L(x_0^2 + y_0^2)y_B + D(x_0^2 + y_0^2)y_0.
\end{aligned}
\tag{210}
$$
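There is the following relation between the aberration coefficients introduced in Eqs. (90) and (210):

$$
\begin{aligned}
B &= B_{1x} = B_{1y} = B_{2x} = B_{2y},\\
G &= \tfrac{1}{3}G_{1x} = \tfrac{1}{3}G_{1y} = G_{2x} = G_{2y} = \tfrac{1}{2}G_{3x} = \tfrac{1}{2}G_{3y},\\
2K + L &= E_{1x} = E_{1y},\\
L &= E_{2x} = E_{2y},\\
K &= \tfrac{1}{2}E_{3x} = \tfrac{1}{2}E_{3y},\\
D &= D_{1x} = D_{1y} = D_{2x} = D_{2y}.
\end{aligned}
\tag{211}
$$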
One can see that in round lenses the spherical aberration is described by only one coefficient $B$ for both the xOz- and yOz-planes. If all the coefficients in Eqs. (210), except $B$, are taken to be zero, we shall obtain

$$\Delta x_i = B(x_B^2 + y_B^2)x_B, \qquad \Delta y_i = B(x_B^2 + y_B^2)y_B. \tag{212}$$

Passing in the aperture plane to the polar coordinates $r_B$ and $\psi_B$, we have

$$\Delta x_i = Br_B^3\cos\psi_B, \qquad \Delta y_i = Br_B^3\sin\psi_B. \tag{213}$$
Eliminating the angle $\psi_B$, the aberration deviation in the Gaussian image plane is expressed as

$$\Delta r = Br_B^3. \tag{214}$$

Thus, all the trajectories intersecting the aperture plane along the circle of radius $r_B$ will describe in the Gaussian image plane a circle of radius $\Delta r$, with the center at the Gaussian image point. If the aperture of radius $R_B$ is completely filled with the beam, then in the Gaussian image plane one can observe a blurring disc of radius $\Delta r = BR_B^3$. One can see that the value of $\Delta r$ is proportional to the cube of the aperture radius and does not depend on the position of the point on the object plane.

In some cases it is convenient to express the spherical aberration in terms of the trajectory inclination $r_0'$ in the object plane rather than in terms of the aperture radius:

$$\Delta r = MCr_0'^3. \tag{215}$$

The quantity $C$ is called the spherical aberration constant. It is easy to see that $B$ and $C$ are related as follows:

$$C = \frac{B\delta^3}{M}. \tag{216}$$

Here $\delta$ is the distance between the object and aperture.

The ray paths in a lens possessing spherical aberration are shown schematically in Fig. 22. The trajectories passing at a large distance from the axis intersect it closer to the lens than the paraxial trajectories. In this case the spherical aberration is negative, and so is the coefficient $B$. The coefficients $B$ and $C$ have opposite signs, because the magnification $M$ is negative; so the constant $C$ is positive.

FIG. 22. Trajectories in a lens with spherical aberration.

When analyzing the general expression for $C$ written in the form of quadratures, Scherzer (1936) showed that the integrand may be reduced to a sum of squares, so that the spherical aberration constant in round lenses is always positive and cannot be made zero. In the work of Rose (1967), the statement that the spherical aberration in these lenses cannot be cancelled was extended to cover the case of relativistic velocities. Here lies the essential difference between electron and glass lenses, in which the spherical aberration can be completely corrected.

The difference $\Delta z$ between the z-coordinate of the Gaussian image point and that of the point of intersection of the marginal rays is referred to as the longitudinal spherical aberration. It can be seen from Fig. 22 that if $z_i - z_B \gg \Delta z$, there is the following dependence between the longitudinal and transverse spherical aberrations:

$$\Delta z = \frac{\Delta r}{r_i'} = M^2Cr_0'^2. \tag{217}$$
The disc of least confusion in Fig. 22 lies not in the Gaussian image plane but closer to the lens. Let us consider in more detail the beam structure near the image plane shown in Fig. 23. The disc of least confusion is determined by the intersection of the beam envelope with the marginal ray coming from the opposite side. Herein lies the difference from the crossover of a beam with nonzero emittance, which is determined by the minimum envelope and which we have considered in the paraxial approximation in Section V.D.

FIG. 23. Formation of the disc of least confusion in the presence of spherical aberration: (I) the Gaussian plane; (II) the plane of least confusion; (III) the plane of intersection of marginal trajectories; the heavy lines denote the beam envelopes.

The position and size of the disc of least confusion can be found as follows. The equations of trajectories emerging from the lens have the form
where

$$r_\alpha(z) = r_\alpha'(z_i)(z - z_i). \tag{219}$$
The expression for the beam envelope $R_e$, found by the method described in Section V.C, is

By equating $R_e$ to the marginal ray transverse coordinate, we shall find that the distance between the disc of least confusion and the Gaussian plane is $(3/4)\Delta z$, its radius being $(1/4)\Delta r$. When spherical aberration is present, therefore, a better image can be obtained by shifting away from the Gaussian plane towards the lens.

The spherical aberration constant $C$ can be written in quadratures in various ways, differing in the order of derivatives of the axial potential distribution and of the independent solutions of the paraxial equation. One expression can be derived from another, using integration by parts and taking into account the paraxial equation. The expression containing derivatives of the axial potential up to second order is as follows:

$$C = \frac{1}{64\sqrt{\phi_0}}\int_{z_0}^{z_i}\sqrt{\phi}\,\left[(3\gamma^4 - 4.5\gamma^2\gamma' + 5\gamma'^2)\,r_\alpha^4 + 4\gamma\gamma' r_\alpha^3 r_\alpha'\right]dz. \tag{221}$$
Here $\gamma = \phi'/\phi$ and $r_\alpha$ is the solution of the paraxial equation (192) satisfying the initial conditions (195). Expression (221) is convenient when the fields are calculated by numerical methods or measured experimentally.

We have pointed out in Section IV that the spherical aberration constant $C$ can be presented in the form of a polynomial in the powers of the reciprocal magnification $M^{-1}$:

$$C = C_0 + C_1M^{-1} + C_2M^{-2} + C_3M^{-3} + C_4M^{-4}. \tag{222}$$
The coefficients of this polynomial are quadratures independent of the object position. They are also used to express the other geometrical aberrations associated with the meridian trajectories.

In fact, the polynomial terms of Eq. (222) are not always independent; this varies with the electron optical scheme in which the lens is used. If a lens operates at high magnification, then

$$C \approx C_0. \tag{223}$$
In probe systems, where the magnification is small, the spherical aberration is usually described by the coefficient $C'$, defined as follows:

$$\Delta r = C'r_i'^3. \tag{224}$$

Here $r_i'$ is the trajectory gradient in image space. The coefficients $C$ and $C'$ are related as

$$C' = M^4C. \tag{225}$$
Turning back to expression (222), we can see that

$$C' \approx C_4 \qquad (M \ll 1). \tag{226}$$
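The two limits of the magnification polynomial are easy to check by direct arithmetic. The sketch below uses purely hypothetical coefficient values, chosen only to show the limiting behaviour; the function names are introduced here for illustration.

```python
def c_object(m, coeffs):
    """C = C0 + C1/M + C2/M^2 + C3/M^3 + C4/M^4, referred to the object side."""
    return sum(c / m ** n for n, c in enumerate(coeffs))

def c_image(m, coeffs):
    """C' = M^4 C, referred to the trajectory slope in image space."""
    return m ** 4 * c_object(m, coeffs)

# Hypothetical coefficients C0..C4, not taken from any lens calculation.
coeffs = (12.0, -30.0, 45.0, -28.0, 9.0)

print(c_object(-1000.0, coeffs))   # high magnification: close to C0
print(c_image(-0.001, coeffs))     # probe system, |M| << 1: close to C4
```

Multiplying the polynomial by $M^4$ reverses the order of the coefficients in powers of $M$, which is why the image-side constant tends to $C_4$ for small magnification.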
In a symmetrical einzel lens $C_0 = C_4$ and $C_1 = C_3$. From comparison of expressions (223) and (226) it is clear that $C = C'$.

C. Field Aberrations
Field aberrations will be described for a point in the object plane having the coordinates $x = x_0$, $y = 0$. Due to the rotational symmetry of the field, this does not violate the general character of our considerations. Expressions (58) in this case have the form

$$
\begin{aligned}
\Delta x_i &= Br_B^2x_B + G(3x_B^2 + y_B^2)x_0 + (2K + L)x_0^2x_B + Dx_0^3,\\
\Delta y_i &= Br_B^2y_B + 2Gx_0x_By_B + Lx_0^2y_B.
\end{aligned}
\tag{227}
$$
Let us consider the aberration figures in the Gaussian image plane formed by each type of aberration. We shall introduce the polar coordinates in the aperture plane:

$$x_B = r_B\cos\psi_B, \qquad y_B = r_B\sin\psi_B. \tag{228}$$
An aberration figure is a figure produced in the Gaussian plane by rays that intersect the aperture plane at the radius $r_B$. For coma described by the coefficient $G$, we have

$$
\begin{aligned}
\Delta x_i &= G(3\cos^2\psi_B + \sin^2\psi_B)x_0r_B^2 = G(2 + \cos 2\psi_B)x_0r_B^2,\\
\Delta y_i &= 2Gx_0r_B^2\sin\psi_B\cos\psi_B = Gx_0r_B^2\sin 2\psi_B.
\end{aligned}
\tag{229}
$$

The aberration curve is circular:

$$(\Delta x_i - 2Gx_0r_B^2)^2 + (\Delta y_i)^2 = G^2x_0^2r_B^4. \tag{230}$$
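The coma figure can be verified directly from the parametric expressions. The following short sketch uses hypothetical values of the coefficient, the object coordinate, and the aperture radius. It checks that every point of an aperture circle lands on the displaced circle and evaluates the angle between the two tangents drawn from the Gaussian image point.

```python
import math

def coma_point(g, x0, rb, psi):
    """Coma displacement for the aperture polar coordinates (rb, psi)."""
    dx = g * (2.0 + math.cos(2.0 * psi)) * x0 * rb ** 2
    dy = g * x0 * rb ** 2 * math.sin(2.0 * psi)
    return dx, dy

g, x0, rb = 0.8, 0.1, 0.05   # hypothetical values in arbitrary units
radius = g * x0 * rb ** 2
centre = 2.0 * radius

# Largest departure of the traced points from the circle of radius
# `radius` centred at (centre, 0).
worst = 0.0
for i in range(360):
    dx, dy = coma_point(g, x0, rb, math.radians(i))
    worst = max(worst, abs(math.hypot(dx - centre, dy) - radius))

# Tangents from the Gaussian image point touch a circle of radius r whose
# centre is 2r away, so the full angle between them is 2 * asin(1/2).
tangent_angle = math.degrees(2.0 * math.asin(radius / centre))
print(worst, tangent_angle)
```

Because the displacement depends on $2\psi_B$, one revolution around the aperture circle traces the image circle twice.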
FIG. 24. Aberration figures of coma.

The circle radius is $r = Gx_0r_B^2$, while its center is shifted meridionally relative to the Gaussian image point by the value $2r = 2Gx_0r_B^2$ (Fig. 24). If the beam fills all of the aperture and $r_B$ varies from zero to $R_B$, the superimposed circles form a figure that looks very much like a comet tail (whence the term coma). The two tangents to the aberration figures emerging from the Gaussian image point intersect at 60°. The coma length is $d = 3Gx_0R_B^2$.

The coefficient $K$ in Eq. (227) characterizes the image astigmatism. If all the aberration coefficients, except $K$, are zero, the image blurring in the Gaussian plane can be described by the following relations:

$$\Delta x_i = 2Kx_0^2x_B, \qquad \Delta y_i = 0. \tag{231}$$
The aberration figure represents a radially located line of length $4Kx_0^2R_B$, with the center at the Gaussian image point. At some distance $\Delta z$ from the Gaussian plane, the rays intersect to form another line normal to the first one. Thus, the beam emerging from a point becomes astigmatic following the lens and does not converge into a point. The distance between the two lines is given by the expression

In the general case, the beam cross section represents an ellipse that degenerates in some plane into a circle of least confusion of radius $2Kx_0^2R_B$. One can see that the radius of the circle is equal to the half-line length. An astigmatic beam of charged particles is shown schematically in Fig. 25.
FIG. 25. Astigmatic beam and a number of its cross sections.
The surface on which radially oriented lines are formed is called the tangential surface of the image. When the coefficient $L$ of Eq. (227), describing the curvature of the field, is equal to zero, the tangential surface coincides with the Gaussian plane. Linear images oriented normal to the radius are formed on the sagittal surface, which represents the rotational surface touching the Gaussian plane on the lens axis. The curvature coefficient $L$ of the image field is responsible for image blurring in the Gaussian plane given by the relation
$$\Delta x_i = Lx_0^2x_B = Lx_0^2r_B\cos\psi_B, \qquad \Delta y_i = Lx_0^2y_B = Lx_0^2r_B\sin\psi_B. \tag{233}$$
The aberration curve represents a circle with a center at the Gaussian image point:

$$(\Delta x_i)^2 + (\Delta y_i)^2 = L^2x_0^4r_B^2. \tag{234}$$
All the trajectories emerging from an off-axis object point and intersecting the aperture plane along the circle of radius $r_B$ will again converge into one point, which, however, is not in the Gaussian image plane, but lies at a distance $\Delta z$ from it:
The image of a flat object is sharp on the rotationally symmetric surface touching the Gaussian image plane on the lens axis. In the vicinity of the axis, the image surface can be replaced by a section of a sphere of radius

$$\rho = \frac{M^2}{2L(z_i - z_B)}. \tag{236}$$

FIG. 26. Astigmatism and field curvature: (I) sagittal surface of image; (II) surface of least confusion; (III) tangential image surface.
Since the astigmatism and field curvature are of the same order with respect to the quantities $x_0$, $x_B$, and $y_B$, these aberrations can be conveniently discussed together. If the two aberration coefficients $K$ and $L$ are nonzero, the aberration figure in the Gaussian plane is an ellipse. A particle beam emerging from an off-axis object point and filling all of the aperture forms two perpendicular line images on the tangential and sagittal surfaces, respectively (Fig. 26). These surfaces represent the surfaces of rotation with a common point on the axis. On an intermediate surface, the beam cross section is circular; here the aberration blurring is minimal.

The coefficient $D$ in expression (227) describes the distortion. This type of aberration is independent of the beam divergence and is determined only by the position of the object point. The distortion error is given by the expressions

$$\Delta x_i = Dx_0^3, \qquad \Delta y_i = 0. \tag{237}$$
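The effect of the sign of the distortion coefficient on a test object can be seen with a few lines of arithmetic. The sketch below is illustrative only. It assumes that, by rotational symmetry, the distortion shift acts along the radius so that the object radius `r0` can replace the coordinate along the x-axis, and the magnification and coefficient values are hypothetical.

```python
def image_radius(m, d, r0):
    """Radial image coordinate with third-order distortion, M*r0 + D*r0**3."""
    return m * r0 + d * r0 ** 3

m = -2.0                                 # hypothetical magnification, M < 0
radii = [0.1 * i for i in range(1, 6)]   # object radii, arbitrary units

for d in (0.5, -0.5):                    # hypothetical coefficients D
    ratios = [abs(image_radius(m, d, r0)) / (abs(m) * r0) for r0 in radii]
    kind = "barrel" if ratios[-1] < 1.0 else "cushion"
    print(f"D = {d:+.1f}: {kind}, |r_i| / |M r0| = "
          + ", ".join(f"{q:.4f}" for q in ratios))
```

With a negative magnification, a positive coefficient pulls the outer points of a square grid inwards (barrel shape), while a negative coefficient pushes them outwards (cushion shape); the relative shift grows as the square of the object radius.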
FIG. 27. Distortion: (a) test object; (b) cushion-shaped distortion; (c) barrel-shaped distortion.
The distortion preserves the image sharpness, producing only a shift of the image point relative to its Gaussian position. Figure 27 shows a test object used to study distortion and its images in the Gaussian plane, depending on the sign of the coefficient $D$. A positive coefficient $D$ decreases the distance of the image point from the axis ($M < 0$), producing a barrel-shaped distortion; a negative coefficient results in cushion-shaped distortion. The distortion coefficient can be represented as a polynomial in the powers of the reciprocal magnification $M^{-1}$, in much the same way as it has been done for the spherical aberration (222):
Expression (238) is the first power polynomial, the coefficients of which are independent of the object position. The formulae for the geometrical aberration coefficients of round lenses have been given in several publications (Glaser, 1952; Grivet, 1972); for the relativistic case, see the work of Hawkes (1977). We should note that some of the aberration coefficients can be reduced to zero by choosing the right aperture position. The fifth-order aberrations of round lenses have been discussed in the work of Hawkes (1965a).

D. Chromatic Aberration

The chromatic aberration of a round lens can be obtained in the Gaussian image plane from Eq. (100) if the field axial symmetry is taken into account.
FIG. 28. Change of aberration figures of the chromatic error with increasing distance between the object point and the lens axis.
Then we have

$$\Delta x_i = M(C_c x_0' + C_M x_0)\frac{\Delta\phi}{\phi_0}, \tag{239}$$

where $C_c$ describes the axial chromatic error and $C_M$ characterizes the chromatic aberration of magnification. The axial chromatic aberration has the same value for all points of the object; it produces a circle with radius

$$\Delta r_i = MC_c\,r'_{0\max}\frac{\Delta\phi_{\max}}{\phi_0}. \tag{240}$$
Here $r'_{0\max}$ is the maximum inclination of a trajectory in the beam, which is supposed to be the same in all directions, and $\Delta\phi_{\max}$ is the maximum energy spread in the beam. If the object point is off axis, there is also a chromatic aberration of magnification, proportional to the distance between this point and the axis. In this case the aberration figure is represented by a series of circles whose centers are radially shifted relative to one another. Figure 28 shows distortions due to the combined effect of the coefficients $C_c$ and $C_M$ for different values of $x_0$. If we apply the perturbation method to the paraxial equation (192) (see Section IV) to calculate the blurring circle $\Delta r_i$ produced by the axial chromatic error, we will obtain
It is clear from Eq. (241) that the integrand is always positive and that the sign of the coefficient is fixed. Particles of smaller energies are always focused closer to the lens than higher energy particles, so the cancelling of chromatic aberration in round lenses is impossible.

E. Two-Electrode Immersion Lenses
The electron optical properties of round lenses have been studied extensively. A large number of recent studies have been stimulated by their new applications and by the development of computer technology, which allows us to compute complicated electron optical systems with high precision and speed. One example of using regular computations to investigate round lenses is the work of Harting and Read (1976), which gives a vast amount of factual material in the form of tables and plots concerning the paraxial properties and aberrations in several types of round lenses.

The simplest round lens is the immersion lens formed from two electrodes: cylinders or plates with round apertures [e.g., see Fig. 21(a)]. The potential distribution on the axis of two coaxial cylinders of equal radii R can be easily found by the method of separation of variables. For an infinitesimal gap width between the electrodes, this distribution can be described by an exact expression:

$$\phi(z) = \frac{V_1 + V_2}{2} + \frac{V_2 - V_1}{\pi} \int_0^\infty \frac{\sin kz}{k\, J_0(ikR)}\, dk, \qquad (242)$$
where J_0(ikR) is the Bessel function of the first kind and V1 and V2 are the potentials on the first and second electrode, respectively. Expression (242) is well approximated by a formula widely used in approximate calculations:

$$\phi(z) = \frac{V_1 + V_2}{2} + \frac{V_2 - V_1}{2} \tanh\frac{1.318\, z}{R}. \qquad (243)$$
When the gaps are relatively small, the field on the axis practically falls to zero at a distance 2R from the lens center. Formula (243) is useful in estimations of focal lengths in the thin-lens approximation [see Eq. (208)]. It is also useful for approximate calculations of a lens made up of a series of cylinders if the gaps between them are small enough and the electrode lengths are greater than their diameters. Two-cylinder immersion lenses have found wide application because they permit comparatively easy focusing of charged particle beams when their energy is varied. For this reason, the properties of such lenses have been a subject of intense interest for many researchers; the basic results will be cited here.
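As a rough numerical check of the statements above, the following Python sketch evaluates the integral in expression (242) by the trapezoidal rule and compares it with the hyperbolic-tangent approximation usually quoted for this geometry, φ(z) ≈ (V1 + V2)/2 + [(V2 - V1)/2] tanh(1.318 z/R). The function names, the integration cutoff and step, and the identification of this tanh form with formula (243) are our assumptions, made only for illustration.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0(x) = J0(ix), summed from its power series."""
    q = 0.25 * x * x
    term = total = 1.0
    for n in range(1, 500):
        term *= q / (n * n)
        total += term
        if term < 1e-17 * total:
            break
    return total

def phi_exact(z, v1, v2, radius=1.0, t_max=40.0, steps=800):
    """Expression (242), integrated in the dimensionless variable t = k*R."""
    s = z / radius
    h = t_max / steps
    total = 0.5 * s  # half-weighted t -> 0 limit of sin(t*s) / (t*I0(t))
    for i in range(1, steps + 1):
        t = i * h
        # the integrand is negligible at t_max, so the end weight is immaterial
        total += math.sin(t * s) / (t * bessel_i0(t))
    return 0.5 * (v1 + v2) + (v2 - v1) * h * total / math.pi

def phi_tanh(z, v1, v2, radius=1.0, omega=1.318):
    """Assumed tanh approximation to the same axial potential."""
    return 0.5 * (v1 + v2) + 0.5 * (v2 - v1) * math.tanh(omega * z / radius)

# Compare the two expressions at a few axial positions (in units of R).
for z in (0.0, 0.5, 1.0, 2.0):
    print(z, phi_exact(z, 0.0, 1.0), phi_tanh(z, 0.0, 1.0))
```

In this check the two expressions differ by well under one percent of V2 - V1 for |z| up to 3R, which is consistent with the use of the simpler formula for estimates.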
Electron optical characteristics of electrostatic two-cylinder lenses were investigated by Kuyatt et al. (1972) and Natali et al. (1972) using numerical calculations of the field. The potential distribution was found by the relaxation technique, using a nine-point formula for a 40 × 320 mesh grid. The cylinder wall thickness was taken to be great, and the gap width was 0.1D (D is the cylinder diameter). The calculations were made for a wide range of potential ratios of the lens (V2/V1 = 1.5-50). The work of Kuyatt et al. (1972) and Natali et al. (1972) involved a comparison of the results with data obtained by other investigators. The conclusion drawn from this comparison and some additional tests was that the computation error for the first-order electron optical characteristics was less than 0.1%. The third-order geometrical aberration coefficients, as well as some fifth-order coefficients, were obtained for meridional trajectories. The coefficients are presented in a form independent of the particular position of the object and aperture. The accuracy of the third-order coefficients was within 10%. The work of Natali et al. (1972) provides convenient curves for the dependence of the image position on the object position (the P-Q curves).

Similar results for two-cylinder lenses were obtained by Grivet (1972) and Harting and Read (1976), covering a wider range of geometrical parameters. A more accurate method of integral equations (the charge density method; see Section II) was used to calculate the potential distribution. In this method, the cylindrical electrodes are divided into rings, each of which has its own charge. The total potential created by the rings must satisfy the boundary conditions, that is, it must have constant values on the electrodes equal to the applied voltages. Cylinders of equal diameters were studied, both thin-walled and with the wall thickness equal to the cylinder radius, with the gap width g varying between 0.1D and 1.0D. Moreover, the authors considered lenses formed from thin-walled cylinders of varying diameter, with the ratio of the first electrode diameter to the second electrode diameter varying from 0.5 to 2.0. The data are given for accelerating lenses in the range V2/V1 = 1-90. For retarding lenses the corresponding data can be obtained by replacing the cardinal elements of object (image) space with those of image (object) space of the accelerating lenses. The transpositions P ↔ Q and M → 1/M should be made when employing the P-Q curves for the retarding lenses.

A comparison of the data obtained by the researchers cited above for the case when the geometrical and electrical lens parameters are identical shows a good fit: the difference between the first-order characteristics is less than 1%, and the difference between the third-order aberrations is less than 10%. This difference is partly due to the different electrode wall thickness in the lenses under investigation.
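To make the relaxation technique mentioned above concrete, here is a minimal Python sketch for two coaxial cylinders of equal radius meeting at z = 0. It is not the scheme of the cited authors: it uses the simpler five-point formula in cylindrical coordinates, plain Gauss-Seidel sweeps, and a coarse 10 × 80 grid, and the function name and grid parameters are hypothetical choices of ours.

```python
def relax_two_cylinder(v1=0.0, v2=1.0, nr=10, nz_half=40, sweeps=800):
    """Gauss-Seidel relaxation of Laplace's equation in (r, z).

    The cylinder wall is at radial index nr (R = nr grid steps); the two
    electrodes meet at axial index nz_half. Returns the on-axis potential.
    """
    nz = 2 * nz_half
    # phi[j][i]: j is the radial index (0 on the axis), i the axial index.
    # The wall row (j = nr) and the end columns keep their initial values.
    phi = [[v1 if i < nz_half else v2 for i in range(nz + 1)]
           for _ in range(nr + 1)]
    for j in range(nr + 1):
        phi[j][nz_half] = 0.5 * (v1 + v2)
    for _ in range(sweeps):
        for i in range(1, nz):
            # On the axis the (1/r) d(phi)/dr term is replaced by its limit.
            phi[0][i] = (4.0 * phi[1][i] + phi[0][i - 1] + phi[0][i + 1]) / 6.0
            for j in range(1, nr):
                c = 1.0 / (2.0 * j)
                phi[j][i] = 0.25 * ((1.0 + c) * phi[j + 1][i]
                                    + (1.0 - c) * phi[j - 1][i]
                                    + phi[j][i - 1] + phi[j][i + 1])
    return phi[0]

axis = relax_two_cylinder()
print(axis[40], axis[50])  # potential at the lens center and at z = R
```

The on-axis values can then be compared with expression (242) or differentiated numerically for use in the paraxial equation; a finer mesh and a higher-order formula, as in the cited work, would be needed for accuracy at the 0.1% level.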
Figure 29(a) shows the dependence of the image position Q on the object position P for a conventional immersion lens formed by thin-walled cylinders of the same diameter with the gap g = 0.1D. The data are given in the form of curves of equal excitation (V2/V1 = const) and equal magnification (M = const). Figure 29(b) gives the relationship between the spherical aberration constant C and the magnification M for the same lens. Data on the chromatic aberration of such a lens are given by Vijayakumar and Szilagyi (1987). The work of Harting and Read (1976) presents approximate analytical expressions for f, z(F), and C as functions of V2/V1. After introducing the symbol R to denote the cardinal elements f_o, f_i, z(F_o), and z(F_i), the approximating function of a two-electrode lens will have the form
The values of the coefficients a_i depend on the lens geometry and are given by Harting and Read (1976). A similar formula was written for the spherical aberration constant C. The approximating formulae considerably simplify and speed up the calculation and optimization of lens systems, especially if a computer is used. They facilitate data storage in the computer memory, since only a small number of polynomial coefficients have to be stored, instead of lengthy tables of data. In two other publications (Cook and Heddle, 1976; Bonjour, 1979a), the potential distribution in a two-cylinder immersion lens was found using a variational technique in the form of a series of zero- and first-order Bessel functions. A linear potential distribution in the gap at r = R was assumed by Bonjour (1979a); Cook and Heddle (1976) approximated the potential in the gap φ(z, R) by a third-power polynomial, which permitted field calculation for a lens with a wide gap between the electrodes. The polynomial has the form
Here, all the linear dimensions are given in units of the cylinder diameter D, the first electrode potential V1 = 0, and the second electrode potential V2 = 1. The constant a of the approximating polynomial was found based on the assumption that the energy ℰ stored in the system is minimized:
FIG. 29. Dependence of the image position Q on the object position P (a); dependence of the spherical aberration on magnification (b) for a thin-walled, two-cylinder lens.
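If an approximate solution of the problem φ*, depending on the parameters α, β, ..., has been found, the variational principle states that ℰ(φ*) ≥ ℰ(φ). If the condition

$$\frac{\partial \mathcal{E}(\phi^*)}{\partial \alpha} = \frac{\partial \mathcal{E}(\phi^*)}{\partial \beta} = \dots = 0$$
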
is satisfied, the solution closest to the exact one can be found. This approach permits lens field calculations to be made with sufficient accuracy for electron optical problems, even if a modest computer is used. The work of Bonjour (1979a) describes first- and third-order electron optical characteristics for lenses with the gap g/D = 0.2, 1.0, and 2.0. A comparison with other studies (Kuyatt et al., 1972; Natali et al., 1972; Harting and Read, 1976) shows that this method provides sufficient accuracy: for the cardinal elements the error was about 1% for electrodes with wall thicknesses about as large as the radius; for thin-walled cylinders the error was 4-6%.

An analysis of the available data permits us to conclude that the optical power of this type of lens falls with increasing gap between the electrodes and with increasing diameter of one of the electrodes; however, it increases somewhat with electrode thickness. A specific property of the lens is the fact that even in a weak lens the principal planes are shifted relative to the gap towards the smaller potential. In a stronger lens, the shift becomes more pronounced and the planes move apart. This conclusion remains valid for all types of immersion lenses. A comparison of the aberration behavior of various two-cylinder lenses shows that the spherical aberration is much smaller in an accelerating lens than in a retarding lens. By comparing lenses of the same magnification M = 1, one can see that the spherical aberration is smaller when the cylinders have different diameters (D1 ≈ 1.5 D2) and when the larger potential is applied to the cylinder of smaller diameter.

The properties of two-cylinder lenses were also studied by Saito and Sovers (1979) as a function of the gap and wall thicknesses. The results agree rather well with earlier investigations in cases when the geometrical parameters of the lenses were similar. It was shown that at a fixed focal length the spherical aberration coefficient could be reduced by increasing the gap between the cylinders. Analytical functions were suggested that approximated both the cardinal elements and the spherical aberration well. The constants of the approximating functions were tabulated.

An immersion lens consisting of two cylinders of equal diameter and designed for electron acceleration in a wide range, from 1 to 10^6, was studied by Bobykin et al. (1975). To calculate the axial potential distribution, formula (242) was transformed to improve its convergence and to make it suitable for practical calculations at large potential ratios V2/V1. The dependence of the
concentrating power on electron acceleration for an off-field electron source was considered. Immersion lenses made up of two diaphragms with circular apertures are less common, which seems to be due to their relatively large transverse dimensions, higher aberration level, and the difficulty of calculating their optical properties. The latter is related to the complexity of potential distribution calculations in such lenses. A comparatively accurate representation (on the order of a few percent) of the potential distribution on the axis of a double-aperture lens with equal aperture radii is given by the formula
Here h is the distance between the electrodes, z = 0 in the geometrical center of the lens, and all linear dimensions are given in units of aperture radius (Grivet, 1972). Data on first-order focusing properties and geometrical aberrations in such lenses are presented by Harting and Read (1976). The distances between the electrodes are 0.5D and 1.0D; the electrode thickness is 0.05D, where D is the aperture diameter. To illustrate this, Fig. 30(a) shows a set of curves describing the dependence of the image position Q on the object position P; the parameters are the potential ratio V2/V1 and the lens magnification M. The aperture diameters are identical and equal to the distance between the electrodes. Figure 30(b) gives the relationship between the spherical aberration constant C and magnification M for the same lens. Kanaya and Baba (1977) studied electron optical properties of an immersion two-electrode lens, using the approximation of the axial potential by the function φ(p) = V exp(K0 arccos p). The variable p is related to the z-coordinate as follows:
Here m is a parameter dependent on the lens geometry and the potential ratio V2/V1; the value a describes the half-width of the function φ'(z)/φ(z) and is determined experimentally, like m; the constant K0 is defined by the potential ratio V2/V1. This approximation allows analytical expressions to be obtained for the basic optical characteristics. The paraxial trajectories are expressed in terms of hypergeometric functions. Lenses with small spherical and geometrical aberration coefficients were found. The obtained results are valid for lenses with various electrode configurations, for example, for two apertures or for an
FIG. 30. The same as in Fig. 29, but for a two-aperture lens.
aperture and a cylinder. The lens fields can be approximated with sufficient accuracy by the above function by selecting suitable constants. Kodama (1980) used another approximation for the field distribution in a two-aperture lens without making an experimental measurement of constants. The obtained results agreed well with the available data. Szilagyi et al. (1987) gave the electrode shapes of an optimized lens with small spherical aberration.
F. Einzel Lenses

Round einzel lenses, consisting, as a rule, of three electrodes, are most commonly used to focus charged particles without changing their energy in various electron optical instruments. There are two versions of the einzel lens: one with an accelerating, and the other with a retarding, potential on the central electrode. A particular case of a retarding potential lens is a unipotential lens, in which the cathode potential is applied to the central electrode. The focusing properties of the lens do not depend on the value of the potential applied to its outer electrodes, since the particle energy varies together with the potential; therefore, the lens optical properties can only be changed by changing its geometry. Figure 31 shows a three-cylinder einzel lens and nonparaxial trajectories in it for both an accelerating potential on the central electrode (solid curves) and a retarding one (dashed curves). It is seen that in the accelerating mode the trajectories in the first gap pass closer to the axis than in the retarding mode, which produces smaller aberrations in the image. Nevertheless, at the same
FIG. 31. Einzel lens consisting of three cylinders.
absolute potential difference, lenses with a retarding central electrode exhibit greater optical power, because retarded electrons are deflected to a greater extent by the electric field. Due to this fact, such lenses are more common in practice. It should be noted that there is a limit to the potential decrease on the central electrode, after which the lens becomes a mirror; this happens at V2 slightly smaller than zero. The cardinal elements of a symmetric einzel lens are arranged symmetrically relative to its center because of the equal potentials in object and image space. The principal planes are crossed, as in all electron lenses, shifting away from the midplane with increasing lens power.

Round einzel lenses have been studied by many researchers, so one can find detailed information in the literature about their focusing and aberration properties. We shall first discuss lenses with cylindrical electrodes. Their fields have been calculated by different methods, from numerical to analytical. Some of these methods were briefly described earlier: the integral equation method and the semianalytical method using a cubic approximation to the potential distribution in the gap between the electrodes (Bonjour, 1979b). In the work of Aničin et al. (1976), the field of a three-electrode lens was calculated by the separation of variables for a linear approximation of the potential between the electrodes. For the axial potential distribution at z > 0 (the reference point in the lens center), the following simple formula was obtained, which is similar to the one given by Read et al. (1971):
$$\phi(z) = V_1 + \frac{V_2 - V_1}{2 \cdot 1.318\, g} \ln \frac{\cosh 2.636 z + \cosh 2.636\,(g + d/2)}{\cosh 2.636 z + \cosh 1.318 d}. \qquad (249)$$
All the linear dimensions are given in units of cylinder radius. A comparison of φ(z) calculated by formula (249) and measured experimentally in the electrolytic tank showed good agreement. Numerical integration of the paraxial equations of trajectories on the basis of the axial potential distribution given by formula (249) was used to find the cardinal elements of a three-electrode einzel lens (Cirić et al., 1976a). A study was made of the dependence of the first-order properties on the central electrode length (d/D = 0.05-0.5) for two gap values (g/D = 0 and 0.1) and on the electrode potential ratio. Good agreement between the theoretical and experimental data was obtained. A detailed investigation of einzel lenses formed by three thin cylinders of equal diameter was carried out by Harting and Read (1976). The cardinal elements and spherical aberration were calculated for a wide potential range on the central electrode (V2/V1 from -0.6 to 20) for two central electrode lengths (d + g = 0.5D and 1.0D) and two gap values (g = 0 and 0.1D).
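The procedure described above, an axial potential followed by numerical integration of the paraxial equation, can be sketched in Python as follows. We assume that formula (249) has the logarithmic form φ(z) = V1 + (V2 - V1)/(2·1.318 g) · ln{[cosh 2.636z + cosh 2.636(g + d/2)] / [cosh 2.636z + cosh 1.318d]} with g > 0, and that the paraxial equation is the standard nonrelativistic one, r'' + φ'r'/(2φ) + φ''r/(4φ) = 0, with φ measured from the point of zero particle energy. The function names, the Runge-Kutta step, and the integration limits are illustrative choices, not those of the cited authors.

```python
import math

OMEGA = 1.318  # constant of formula (249); lengths in units of cylinder radius

def einzel_potential(z, v1, v2, d, g):
    """Return (phi, phi', phi'') on the axis from the assumed form of (249)."""
    u = 2.0 * OMEGA * z
    a = math.cosh(2.0 * OMEGA * (g + 0.5 * d))
    b = math.cosh(OMEGA * d)
    cu, su = math.cosh(u), math.sinh(u)
    scale = (v2 - v1) / g
    phi = v1 + scale / (2.0 * OMEGA) * math.log((cu + a) / (cu + b))
    dphi = scale * (su / (cu + a) - su / (cu + b))
    d2phi = scale * 2.0 * OMEGA * ((1.0 + a * cu) / (cu + a) ** 2
                                   - (1.0 + b * cu) / (cu + b) ** 2)
    return phi, dphi, d2phi

def focal_length(v1, v2, d, g, z_far=12.0, steps=4800):
    """Trace a ray entering parallel to the axis at r = 1 (RK4); f = -1/r'_out."""
    def deriv(z, r, s):
        phi, dphi, d2phi = einzel_potential(z, v1, v2, d, g)
        return s, -dphi * s / (2.0 * phi) - d2phi * r / (4.0 * phi)
    h = 2.0 * z_far / steps
    z, r, s = -z_far, 1.0, 0.0
    for _ in range(steps):
        k1r, k1s = deriv(z, r, s)
        k2r, k2s = deriv(z + h / 2, r + h / 2 * k1r, s + h / 2 * k1s)
        k3r, k3s = deriv(z + h / 2, r + h / 2 * k2r, s + h / 2 * k2s)
        k4r, k4s = deriv(z + h, r + h * k3r, s + h * k3s)
        r += h * (k1r + 2 * k2r + 2 * k3r + k4r) / 6.0
        s += h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        z += h
    return -1.0 / s

# Accelerating central electrode, d = 1 and g = 0.2 in units of the radius.
print(focal_length(1.0, 2.0, 1.0, 0.2))
```

The sketch returns only the focal length of a weak lens; locating the principal planes and foci would require, in addition, the exit height of the ray and a second ray traced in the opposite direction.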
Figure 32 illustrates the paraxial and aberration characteristics of an einzel lens. It shows the curves relating the positions of the object P and image Q in the lens with d + g = 0.5D and g = 0.1D [Fig. 32(a)], as well as its spherical aberration as a function of linear magnification [Fig. 32(b)] for retarding potentials of the central electrode (V2/V1 < 1). Similar curves are presented in Fig. 33 for accelerating potentials (V2/V1 > 1).

The cardinal elements and third-order aberrations were calculated by Saito and Sovers (1977) for a three-cylinder lens, in which the field was found by numerical solution of the Laplace equation using the relaxation method. A detailed study was made of the dependence of the lens parameters on the central electrode length, which was taken in a wide range of values. It was shown that the optical power varied with the central electrode length nonmonotonically, reaching the maximum at about d/D ≈ 1.0. At d/D ≥ 2.0 an einzel lens can be considered, with sufficient accuracy, as the sum of two independent immersion lenses. The spherical aberration of an einzel lens decreases as (d + g)/D, the sum of the central electrode length and the gap relative to the diameter, increases, if the ratio f/D is fixed. It also decreases as the lens diameter becomes larger, if f and (d + g)/D remain constant. The optical properties of this type of lens are largely determined by the distance between the gap centers and depend very little on the size of the gaps themselves (Bonjour, 1979b). In some publications (see, for example, Saito and Sovers, 1977) simple approximating functions have been suggested for the cardinal elements and geometrical aberration coefficients, which permit calculations of high accuracy. The functions depend on two variables: the potential ratio V2/V1 and the relative distance between the gap centers (d + g)/D. The constants of these functions are tabulated.
An experimental investigation of a three-tube lens used as a collimating and focusing lens in an electrostatic prism β-spectrometer is described by Bobykin et al. (1976). The central electrode diameter D2 was larger than the diameters of the two end electrodes (D2 = 1.17 D1). The measurements made by the shadow projection method are presented in the form of a family of curves characterizing the dependence of the focal lengths and coordinates of the foci on the central electrode potential. Of interest is the study by Berger and Baril (1982) in which the optical power and spherical and chromatic aberrations were investigated as a function of the electrode potential ratio V2/V1 in an einzel lens. It was shown that the aberration coefficients, as well as the focal length, were minimum at approximately the same value of the V2/V1 ratio; these authors considered a parallel beam entering the lens. The work of Brunt and Read (1975) was concerned with the study of
FIG. 33. The same as in Fig. 32, but for a lens with intermediate acceleration.
geometrical aberrations. Rather approximate formulae were given for estimating the aberration of a finite object using the spherical aberration coefficient.

Einzel lenses consisting of three plates with circular apertures are used in electron optical designs where the focusing system length is limited. The calculation of such lenses is more sophisticated than that of lenses made up of cylinder electrodes because of the complexity of the field estimations. The focusing properties and aberrations in these lenses are primarily determined by the following geometrical parameters: the central electrode thickness, the distance between the electrodes, and the aperture diameter of the central electrode. An einzel lens with a thick central electrode and modified end electrodes is illustrated in Fig. 21(b). A vast amount of factual information on three-aperture einzel lenses was presented by Hanszen and Lauer (1967). Now we shall discuss in more detail some of the later investigations.

The axial potential distribution, cardinal elements, and spherical aberration coefficients were calculated by Read (1969) for 16 modifications of the einzel lens formed by three apertures with different interelectrode distances and aperture diameters. The potential distribution was found by the method of separation of variables. The boundary conditions were given on an infinite cylinder with a radius larger than the aperture radii and the distance between the apertures. On the left and on the right of the lens, the cylinder has the potential equal to that on the end electrodes (V1); inside the lens there is a linear potential distribution on the cylinder surface. The boundary conditions are satisfied by the collocation method. The calculation error in this work seems to be a few percent.
Data for practical use in designing three-electrode lenses with a thick central electrode are given in the work of Shimizu and Kawakatsu (1974). This work considers the modes typical for systems forming electron and ion probes with relatively large focal lengths. The potential distribution was calculated using the relaxation method. The potential distribution between the electrodes was taken to be linear at a distance from the axis of r = 2R (where R is the aperture radius, identical for all three electrodes). It was also assumed that outside the lens the potential was equal to that on the external electrode on the surfaces z = 1.85R and r = 2R. The consideration was restricted to the retarding central electrode, because this provides greater optical power of the lenses than the intermediate acceleration and is more commonly used for high-energy beam focusing. Symmetrical lenses were studied at different thicknesses of the central electrode T and different interelectrode distances h. Since the thickness variation of the external electrodes only slightly affects the optical properties, it was taken to be 0.5R. The inner electrode edges were chamfered at 45° at a distance of 0.1R from the edge. For a central electrode of thickness T ≤ 0.3R, its shape was taken to be rectangular. The study gives universal curves for the dependence of the focal lengths,
FIG. 34. Universal curves for an einzel lens with a thick intermediate electrode.
the principal plane positions, and the spherical and chromatic aberration coefficients, relative to a certain parameter d, on the value of K(V1 - V2)/V1; this is shown in Fig. 34. The parameter d describes the half-width of the axial potential distribution at a height 1/e. Figure 35(a) shows the d values as a function of the central electrode thickness T for several values of the interelectrode distance h. The K values are defined from K = [V1 - φ(0)]/(V1 - V2); they are given in Fig. 35(b) for various lens geometries (here φ(0) is the potential in the lens center). The break of the curves at T/R = 0.3 is due to the variation in the central electrode shape, which was assumed in the calculations. Thus, the introduction of two parameters to describe the dependence of the potential distribution on the lens geometry has allowed four optical characteristics of a wide class of lenses to be presented in the form of four unified curves. The error is as small as a few percent, which is acceptable for calculations of probe systems. (The error grows in the range of small focal lengths.) Figure 36 shows a typical electrostatic lens used to form ion probes. The basic parameters of the lens are: R = T = h = 0.5 cm; f = 1.2 cm; and the aberration coefficients C = 12 cm, C_c = 5.4 cm.
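The two universal parameters can be extracted from any computed or measured axial potential. The Python sketch below assumes that "half-width at a height 1/e" means the distance from the lens center at which V1 - φ(z) has fallen to 1/e of its central value, and that this quantity decreases monotonically with |z|; the function name and the Gaussian test profile are hypothetical and serve only as an illustration.

```python
import math

def universal_parameters(phi, v1, v2, z_max=20.0, tol=1e-10):
    """Return (K, d) for an axial potential phi(z) that is symmetric about z = 0.

    K = [V1 - phi(0)] / (V1 - V2); d is found by bisection as the point where
    [V1 - phi(z)] / [V1 - phi(0)] drops to 1/e (assumed definition).
    """
    dip0 = v1 - phi(0.0)
    k = dip0 / (v1 - v2)
    lo, hi = 0.0, z_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (v1 - phi(mid)) / dip0 > 1.0 / math.e:
            lo = mid
        else:
            hi = mid
    return k, 0.5 * (lo + hi)

# Hypothetical Gaussian profile with K = 0.8 and d = 1.5 (lengths in units of R).
def profile(z):
    return 1.0 - 0.8 * (1.0 - 0.0) * math.exp(-(z / 1.5) ** 2)

print(universal_parameters(profile, 1.0, 0.0))
```

With K and d in hand, the lens excitation K(V1 - V2)/V1 can be formed and the optical characteristics read from the universal curves in units of d.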
FIG. 35. Universal parameters d and K for an einzel lens with a thick intermediate electrode.
FIG. 36. The design of a high-voltage einzel lens: (1, 3) low-voltage electrodes; (2) high-voltage electrode; (4) insulator.
Due to the complexity of field calculations in the lenses in question, some researchers have used approximate formulae for the axial potential distribution, from which the optical properties of the lenses can be derived (Kanaya and Baba, 1978; Yamazaki, 1979). In the work of Kanaya and Baba (1978), an approximate function was introduced for the axial potential distribution in a three-aperture lens:
This function allows the paraxial trajectories to be expressed in terms of hypergeometric functions and the basic optical characteristics to be presented analytically: the cardinal elements and the spherical and chromatic aberration coefficients. The constants a, k0, and m included in the field approximating function depend on the lens geometry and can be determined from the experimental data. The calculations were presented in the form of numerous plots showing the optical properties as a direct function of the lens geometry; the plots are convenient for use. The plot analysis yielded the parameters of a lens with great optical power and small aberrations. A thorough investigation of three-electrode einzel lenses was given by Szilagyi and Szep (1987). General conclusions were drawn for the axial potential distribution and electrode shapes that have small aberration coefficients.

We have so far considered symmetric einzel lenses. It is possible to optimize optical power and aberrations by creating asymmetric designs, for example, by placing the end electrodes at different distances from the central electrode. The idea of an asymmetrical lens was first realized in glass optics, where it was shown that the spherical aberration of an objective was reduced if the lens surface of larger curvature was placed on the side of the object. By analogy, one may expect the spherical aberration of an einzel electron lens (operating in the mode M > 1) to be reduced if the lens is made asymmetrical and the beam is incident on the side of the larger field gradient (that is, on the side of the minimum distance between the end and central electrodes). In probe systems operating in the mode M < 1, the beam must enter the lens on the side of the minimum field strength for the spherical aberration to be small.
Asymmetrical lenses are discussed in several publications (see, for instance, Hanszen and Lauer, 1967; Grivet, 1972) in which the spherical aberration is shown theoretically and experimentally to be several times smaller than in symmetric lenses. Der-Shvarts and Makarova (1966, 1969) gave a detailed analysis of spherical aberrations in asymmetrical probe-forming einzel lenses with an
FIG. 37. Diagram of an asymmetric einzel lens (a); its spherical aberration (b): curves (1) h1/D = 0.41, (2) 2.1, (3) 3.3. Values of M = 10; h2/D = 0.41; T/D = 1.0.
intricate configuration of the central electrode. One of these designs is illustrated in Fig. 37(a). Figure 37(b) gives the spherical aberration coefficient of the lens for three values of the interelectrode distance h1 at a constant h2 as a function of the distance l between the image and the lens exit. The figure also demonstrates the relation between the image position and the potential ratio on the electrodes. The round lens shown in Fig. 37(a) was studied in the work of Orloff and Swanson (1978) from the point of view of its application in ion microprobe systems with a field-emission cathode. The lens was found to possess a small chromatic aberration, which is of importance because field-emission ion sources are characterized by a great energy spread. This lens has also been used in the immersion modes and in the presence of acceleration between its first electrode and the ion source. An asymmetrical einzel lens of a complex configuration with the central electrode formed by two cylinders of different diameters was analyzed by Saito et al. (1979). The lens was shown to possess small spherical aberration. The effect of axial symmetry violation in an einzel lens consisting of three apertures was described by Balandin et al. (1977), together with the tolerance calculation.

G. Multielectrode Immersion Lenses

Multielectrode immersion lenses are generally used in problems that require a considerable change of the particle energy (acceleration or
retardation by several orders of magnitude). This is largely due to two reasons: the technical difficulties arising from the application of a large potential difference between two electrodes and the need to reduce the optical power of the system to avoid overfocusing. The latter can be achieved by distributing the accelerating potential between several electrodes. The axial potential distribution in this case becomes extended, which reduces its derivatives. As a result, the optical power and aberration level decrease. If the beam energy variation is not very great, three-electrode immersion lenses are preferable, because they have some advantages over two-electrode lenses: greater flexibility of the optical properties and smaller aberrations. Three-electrode immersion lenses have been analyzed in several studies; in some of them the analysis also included einzel lenses (for instance, in Adams and Read, 1972; Brunt and Read, 1975; Bobykin et al., 1976; Orloff and Swanson, 1978; Bonjour, 1979b).

Multielectrode immersion lenses, as a rule, consist of cylindrical electrodes. The study by Glikman et al. (1973a) concerned lens systems made up of tube electrodes of equal diameters and lengths with infinitesimal gaps between them. Two modifications were analyzed: (1) when the potential ratio on the neighboring electrodes was constant and (2) when the potential difference between the neighboring electrodes was constant. The potential distribution of the system was determined as the superposition of analytically found potentials of two-electrode immersion lenses. The positions of the foci, the focal lengths, and spherical aberration coefficients were found by numerical integration. It was shown that a larger number of electrodes for given potential values on the end electrodes sharply reduced the optical power and spherical aberration. A larger number of electrodes at a constant optical power also reduced spherical aberration.
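The two potential-division schemes and the superposition of two-electrode fields can be illustrated with a short Python sketch. For simplicity it replaces the exact two-cylinder potential by the tanh approximation with the constant 1.318, which is our substitution and not necessarily what the cited authors used; the function names and the electrode length of one diameter are likewise assumptions.

```python
import math

def electrode_potentials(v_first, v_last, n, mode):
    """Potentials of n electrodes between fixed end values.

    mode "ratio": V[i+1] / V[i] is constant; mode "difference":
    V[i+1] - V[i] is constant.
    """
    if mode == "ratio":
        q = (v_last / v_first) ** (1.0 / (n - 1))
        return [v_first * q ** i for i in range(n)]
    if mode == "difference":
        step = (v_last - v_first) / (n - 1)
        return [v_first + step * i for i in range(n)]
    raise ValueError("mode must be 'ratio' or 'difference'")

def axial_potential(z, potentials, length, radius=1.0, omega=1.318):
    """Sum of two-electrode tanh profiles; gap i sits at z = i * length."""
    phi = potentials[0]
    for i in range(len(potentials) - 1):
        step = potentials[i + 1] - potentials[i]
        phi += 0.5 * step * (1.0 + math.tanh(omega * (z - i * length) / radius))
    return phi

def max_gradient(potentials, length, radius=1.0, dz=0.01):
    """Largest finite-difference |d(phi)/dz| on the axis over the whole lens."""
    z_lo = -5.0 * radius
    z_hi = (len(potentials) - 2) * length + 5.0 * radius
    n = int((z_hi - z_lo) / dz)
    return max(abs(axial_potential(z_lo + (i + 1) * dz, potentials, length, radius)
                   - axial_potential(z_lo + i * dz, potentials, length, radius)) / dz
               for i in range(n))

two = [1.0, 100.0]
six = electrode_potentials(1.0, 100.0, 6, "difference")
print(max_gradient(two, 2.0), max_gradient(six, 2.0))
```

Dividing the same overall potential difference among six electrodes lowers the peak axial field several times in this model, which is the mechanism behind the reduced optical power and aberrations described above.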
It was concluded from the data comparison that, all other things being equal, the optical power and spherical aberration were somewhat greater in lenses with constant potential difference between the neighboring electrodes. Fink and Kisker (1980) calculated a system of 36 cylindrical electrodes accelerating electrons from 5 to 255 kV, which was designed to form the primary electron beam in an energy-loss spectrometer. The potential distribution in the system was determined analytically by summing up the potentials of two-electrode immersion lenses. The trajectories were found by numerical integration of the paraxial equation. The program permitted optimization of the angular divergence and cross section of the beam. The work of Ohiwa et al. (1981) described an ion-optical system of five cylinder electrodes, which forms an ion probe 10 nm in diameter on the specimen, to be used for microfabrication. The central electrode was cut to form two octupole deflecting elements that scan the probe over a specimen area of 0.2 × 0.2 mm². As in the previous work, the potential distribution
was approximated by analytical functions to speed up the calculation of multiplet systems and to optimize their parameters. The zoom lenses mentioned earlier, which differ from conventional immersion lenses in their operational mode rather than in their design, are a comparatively new type of lens. At a given position of the object and initial beam energy, a zoom lens can vary its output energy widely, leaving the image position and sometimes some other parameters (e.g., linear magnification) unchanged. This kind of problem arises in many applications, for example, when it is necessary to form a probe of varying energy. Such lenses are often used in combination with energy analyzers. By decreasing the input beam energy, they allow one to decrease the absolute resolution ΔE at a constant resolving power of the device, R = E/ΔE = const. When the beam is focused at the analyzer entrance, zoom lenses allow spectra to be recorded without changing the potentials on the analyzer electrodes, for which purpose the particles of different velocities are accelerated or retarded in such a way that their energy becomes equal to the adjustment energy (for details, see Artamonov et al., 1976; Draper and Lee, 1977; Wannberg and Skollermo, 1977). One should keep in mind that the use of an immersion system in front of an analyzer changes the beam divergence and cross section in the entrance slit plane as a function of the particle acceleration or retardation according to the Helmholtz-Lagrange law, Eqs. (46) and (47). This changes the intensity of the particle beam transmitted by the spectrometer; for this reason, the focusing system used must be carefully matched to the analyzer.
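The gain in absolute resolution obtained by retarding the beam in front of an analyzer of constant resolving power can be illustrated with a minimal sketch. The function name and the numerical values below are hypothetical and serve only to show the scaling ΔE = E/R.

```python
def absolute_resolution(pass_energy_eV, resolving_power):
    """Absolute resolution dE = E / R for an analyzer with constant R = E / dE."""
    if resolving_power <= 0:
        raise ValueError("resolving power must be positive")
    return pass_energy_eV / resolving_power


# Hypothetical resolving power and analyzer entrance energies (illustrative only).
R = 200.0
for energy in (1000.0, 100.0, 10.0):
    dE = absolute_resolution(energy, R)
    print(f"E = {energy:7.1f} eV  ->  dE = {dE:.3f} eV")
```

The sketch deliberately ignores the accompanying change of beam divergence and cross section prescribed by the Helmholtz-Lagrange law, which limits the transmitted intensity in practice.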
Zoom lenses have been studied rather thoroughly, both theoretically and experimentally (Varankin, 1974; Harting and Read, 1976; Fink and Kisker, 1980; Kisker, 1982; Heddle et al., 1982). The simplest lenses are formed from three electrodes with two independently controlled potentials V_2 and V_3. The V_3-potential changes the output beam energy, while the V_2-potential regulates the optical power so that the image position remains unchanged. If the linear magnification of the system must be kept constant, the lens must consist of at least four electrodes. Figure 38 presents curves to demonstrate the relationship between the electrode potentials of a typical zoom lens consisting of three cylinders with a central electrode length D and gap of 0.1D. The curves represent closed figures, which means that the given acceleration and image and object positions correspond to two operational modes: in one mode the particles are accelerated by the central electrode (V_2/V_1 > 1), in the other they are retarded (V_2/V_1 < 1).
Zoom systems with constant magnification formed by four cylinders were analyzed by Fink and Kisker (1980) and by Kisker (1982). The authors described simple methods to calculate such lenses; the second work includes a program for a personal computer.
H. Some Applications of Round Lenses
In this section we have described the basic electron optical properties of round lenses. Their numerous applications can be illustrated by a few examples. In recent years various techniques have been developed for materials analysis using electron and ion probes formed by special electron optical systems. Analysis of surfaces by such probes has been described in many publications (see, for example, McHugh, 1975; Levi-Setti, 1980; Cherepin, 1981; Cherepin and Vasil'ev, 1982). A typical ion microprobe is shown schematically in Fig. 3, and its design is described by Liebl (1967). In this device the primary beam is mass analyzed and can be focused to a spot of 2 to 300 μm. The focusing is performed by a condenser lens and an objective lens, which are einzel lenses. The work of Drummond (1981) describes eight modifications of three-electrode electrostatic lenses and their properties from the point of view of the requirements on probe systems. Problems of optimization of such lenses are discussed. A schematic diagram of a high-resolution scanning ion microscope is given in Fig. 1 (Levi-Setti, 1980). The microscope has a liquid ion source of high brightness; the probe is formed by electrostatic round lenses with reduced chromatic aberration (Orloff and Swanson, 1978). The use of ion optics in the promising method of secondary ion mass spectroscopy (SIMS) is described by Liebl (1981). The author considers the problems of forming a primary ion probe and of increasing the efficiency of secondary ion focusing, especially by reducing the emittance due to ion acceleration. The suggestion is made to use the last lens of the probe system to focus secondary ions. Electrostatic ion optical elements are used in transport systems for ion implantation. Matrix algebra is employed by Larson (1981) to analyze transport systems containing round and other types of lenses and deflecting elements.
Nonlinear effects due to the lens aberrations and beam space charge are taken into account. One of the advantages of electrostatic lenses over magnetic ones in this particular application is the possibility of pulsed operation with the beam energy variation. Application of electrostatic lenses in combination with energy analyzers has been discussed in Section VII.G because such devices often use immersion lenses (see also Ballu, 1980). We shall also cite the work of Hoof (1981), which suggests an original round lens as part of the installation for measuring the angular distribution of photoelectrons. The lens is formed by six cylindrical electrodes and matches the photoelectron beam with the cylindrical mirror analyzer. By electrical readjustment only, this lens permits electrons with incident angles between 20° and 70° to be focused consecutively onto the analyzer entrance slit, keeping the exit trajectory inclination constant at about 42°.
Electron lithography is a new and rapidly developing field where electron probes are applied. Here round lenses, but more often magnetic lenses, are employed because of high-energy electrons. A good review of the problem is given by Pease (1981).
VIII. QUADRUPOLE LENSES
Quadrupole lenses are the second important class of electron lenses (for extensive references to earlier works, see Yavor, 1968; Hawkes, 1970). They came into the history of electron optics primarily as magnetic lenses and became indispensable in problems of converging and transporting high-energy particle beams, especially in acceleration devices, where the optical power of round magnetic lenses is insufficient. Similarly, electrostatic round lenses are replaced by electrostatic quadrupoles when the focusing power should be increased. Moreover, quadrupole lenses are used in numerous electron optical systems when astigmatic focusing is required. Such lenses have been considered in a large number of publications containing the theory of the lenses, their designs, and various applications. The field of a standard quadrupole lens possesses two perpendicular planes of symmetry and two planes of antisymmetry. Such a field can be created by four identical electrodes located symmetrically relative to the axis, to which alternating potentials ±V are applied (Fig. 39). The axial potential is zero. Note that in this chapter, unlike the foregoing chapters, all the potentials are measured from the axial potential as the reference point, as is generally accepted in quadrupole optics. Figure 39 shows that the direction of the electrostatic field changes by −90° when the azimuthal angle changes by 90°. Hence, if the lens converges the particles in the xOz-plane, then in the yOz-plane it diverges them; in other words, the quadrupole operates like an astigmatic lens. The electrodes in a quadrupole are parallel to the z-axis and, as a rule, are rather long in this direction. Inside the lens, rather far from its edges, the longitudinal field component is close to zero, so the field is predominantly transverse. This is another difference between a quadrupole and a round lens (whose basic field component is longitudinal) resulting in a higher optical power.
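The rotation of the field direction with azimuth can be checked with a short numerical sketch. It assumes a pure quadrupole potential V(x² − y²)/R² near the axis, with the positive electrodes on the x-axis; the function names and the sample radius are illustrative choices, not part of the original treatment.

```python
import math


def quadrupole_field(x, y, V=1.0, R=1.0):
    """Transverse field E = -grad(phi) for the assumed potential phi = V (x^2 - y^2) / R^2."""
    return (-2.0 * V * x / R**2, 2.0 * V * y / R**2)


def field_on_circle(rho, azimuth_deg, V=1.0, R=1.0):
    """Return (|E|, direction of E in degrees) at radius rho and the given azimuth."""
    a = math.radians(azimuth_deg)
    ex, ey = quadrupole_field(rho * math.cos(a), rho * math.sin(a), V, R)
    return math.hypot(ex, ey), math.degrees(math.atan2(ey, ex))


# Sample the field on a circle of radius 0.5 R (illustrative value).
for azimuth in (0.0, 45.0, 90.0):
    magnitude, direction = field_on_circle(0.5, azimuth)
    print(f"azimuth {azimuth:5.1f} deg: |E| = {magnitude:.3f}, direction = {direction:6.1f} deg")
```

Under this assumption the field magnitude is the same at every azimuth, while its direction turns from 180° (towards the axis) at azimuth 0° to 90° (away from the axis) at azimuth 90°, that is, by −90°.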
It should be emphasized that the longer the quadrupole, the greater its focusing effect on a particle and, therefore, the lens optical power grows with its length. Since in such lenses the axial potential is constant, they are in fact einzel lenses and are not used to vary the particle energy. The field distribution in quadrupole lenses and methods for its linearization will be discussed in Section VIII.A, together with approximations for the
FIG. 39. The field of a quadrupole lens: lines of force (solid) and equipotential lines (dashed).
potential distribution along the lens axis. The focusing properties of quadrupoles are analyzed in Section VIII.B, primarily using the rectangular field model. Since quadrupoles are generally used in the form of systems of lenses, the properties of various systems are discussed in Section VIII.C. The last two sections (VIII.D and VIII.E) describe aberrations in quadrupoles and the problem of aberration coefficient signs.
A. Fields of Quadrupole Lenses
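The potential distribution in a quadrupole lens can be obtained from Eq. (33) by introducing the designation φ_i(z) = V k_i(z):
$$\varphi(x,y,z) = V\left\{\frac{k_2(z)}{R^2}\,(x^2-y^2) - \frac{k_2''(z)}{12R^2}\,(x^4-y^4) + \frac{k_6(z)}{R^6}\left[x^6 - 15x^2y^2(x^2-y^2) - y^6\right] + \frac{k_2^{\mathrm{IV}}(z)}{24R^2}\,x^2y^2(x^2-y^2) + \cdots\right\} \tag{250}$$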
Here R is the aperture radius (see Fig. 39). From Eq. (250) one can see that near the axis the equipotential lines have the form of rectangular hyperbolae in any plane normal to the lens axis, no matter what the configuration of the quadrupole electrodes; the field projection onto this plane grows linearly with the distance from the axis. In this region, the absolute field value along a circle centered on the axis is constant; in the xOz-plane the field is directed towards the axis, in the yOz-plane it is directed away from the axis, while in the planes at 45° to the coordinate planes it is tangent to the circle. The potential distribution in Eq. (250) holds for quadrupoles with different electrode configurations. In the central part of lenses whose lengths are several times greater than their apertures, the field practically does not vary with the z-coordinate and can be considered as two dimensional. In this case the potential distribution has a simpler form
$$\varphi(x,y) = V\left\{\frac{K_2}{R^2}\,(x^2-y^2) + \frac{K_6}{R^6}\left[x^6 - 15x^2y^2(x^2-y^2) - y^6\right] + \cdots\right\} \tag{251}$$
Here, there are no terms containing the fourth coordinate powers and some others. The coefficients K_i are determined by the electrode shape. If the reference point is placed in the quadrupole center, one can see, by comparing Eqs. (250) and (251), that for a sufficiently great length k_i(0) = K_i. The potential distribution along the z-axis is plateau shaped, and the optical properties are largely determined by the central part, depending little on its edges. Since the calculation of a three-dimensional field of a quadrupole is complex, it is usually calculated in the two-dimensional approximation and its distribution along the z-axis is approximated by some model. The potential distribution will be described completely by the first term of expansion (251) if the lens electrodes are infinite and if their profiles in the cross section normal to the z-axis have the form of rectangular hyperbolae; then K_2 = 1. Such a field is called linear because it grows linearly with the distance from the axis along the whole aperture. The field gradient is constant, so such lenses are also known as constant gradient lenses. In practice, of course, a lens cannot be infinite in the longitudinal or radial direction; therefore, its field is not linear. Commonly used electrodes (Fig. 40) have profiles in the form of a hyperbolic segment, a circle or its segment with the convex side facing the beam (cylindrical electrodes), a straight line (planar electrodes), or a circular segment with the concave side facing the beam (concave electrodes). The smaller area of field linearity in such lenses is revealed by the fact that the coefficient k_6(z) and/or the coefficients of the higher order terms are nonzero. By varying the electrode shape, one can simplify the technology and adjustment of the electrodes, thus making lens production cheaper and
FIG. 40. Quadrupole lenses with different electrode profiles: (a) hyperbolic; (b) cylindrical; (c) concave; (d) planar.
reducing the “mechanical” aberrations. Moreover, there is the problem of decreasing the radial dimensions of a lens. Let us consider from this point of view the electrodes shown in Fig. 40. The use of hyperbolic electrodes may help considerably in increasing the field linearity if the radial dimensions of a lens are assumed to be so large that the ends of the hyperbolae would be situated far from the axis. To avoid electrical breakdown on the electrodes, the potential difference should not be too large. We should not forget that the production and adjustment of such electrodes are not easy matters. Electrodes commonly have the form of round cylinders or their segments [Fig. 40(b)], the production of which is simpler, although the problem of
adjustment still remains. The problem is to find the cylinder radius r that provides maximum field linearity. It can be solved using conformal transformations, because the central part of the field in standard quadrupoles is practically two dimensional. Field calculations for cylindrical electrodes have shown the optimum radius at which K_2 ≈ 1.0, K_6 = 0 to be equal to r ≈ 1.15R. This value depends only slightly on the coordinates of the profile end points if they lie far enough from the axis. Thus, the radius r is somewhat larger than the minimum hyperbolic curvature radius R. The relative field nonlinearity in the plane x = y, when the distance from the axis is less than 0.8R, is about 2 × 10⁻³ (Shukeilo, 1959). (These data have been obtained for a magnetic lens with infinite magnetic permeability and, therefore, can be extended to electrostatic lenses.) The same value of r was obtained experimentally. Concave electrode lenses [Fig. 40(c)] have a larger aperture radius with the same transverse dimensions; their production technology is a little simpler. In the two-dimensional approximation the potential distribution can be found by separation of the variables on the assumption that the potential changes linearly in the gaps between the electrodes. In cylindrical coordinates (ρ, θ) the potential distribution takes the form
$$\varphi(\rho,\theta) = V\left[K_2\left(\frac{\rho}{R}\right)^{2}\cos 2\theta + K_6\left(\frac{\rho}{R}\right)^{6}\cos 6\theta + \cdots\right], \tag{252}$$
where 2δ is the angular gap between the electrodes. The coefficients of the series depend on δ and for the first two harmonics have the form
$$K_2 = \frac{4}{\pi}\,\frac{\sin 2\delta}{2\delta}, \qquad K_6 = -\frac{4}{3\pi}\,\frac{\sin 6\delta}{6\delta}. \tag{253}$$
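The dependence of the harmonic coefficients on the gap can be tabulated with a brief sketch. It uses the expressions K_2 = (4/π) sin 2δ/(2δ) and K_6 = −(4/3π) sin 6δ/(6δ) for the first two harmonics; the extension to arbitrary index k in the function below is an assumption made here by analogy, and the function name is hypothetical.

```python
import math


def concave_harmonic(k, delta):
    """Coefficient K_n with n = 2(2k + 1) for a half-gap angle delta (radians).

    k = 0 and k = 1 reproduce K_2 and K_6; the general-k form is an assumed extension.
    """
    n = 2 * (2 * k + 1)
    gap_factor = 1.0 if delta == 0.0 else math.sin(n * delta) / (n * delta)
    return (-1) ** k * 4.0 / (math.pi * (2 * k + 1)) * gap_factor


for half_gap_deg in (0.0, 5.0, 10.0, 20.0, 30.0):
    delta = math.radians(half_gap_deg)
    print(f"delta = {half_gap_deg:4.1f} deg: K2 = {concave_harmonic(0, delta):.4f}, "
          f"K6 = {concave_harmonic(1, delta):+.4f}")
```

Within this model the sixth harmonic decreases in magnitude as the gap widens, at the cost of a lower K_2 and hence a lower optical power.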
It is evident that when we pass to the Cartesian coordinates [see Eq. (251)], the coefficient values remain the same. For small δ we have K_2 = 1.273, K_6 = −0.424. These results are consistent with experimental data. Field nonlinearity in concave electrode lenses can be considerably decreased by increasing the number of cylindrical segments and by applying the corresponding potentials to them. In the general case, when the electrodes form a closed cylindrical surface cut in such a way that two planes of geometrical symmetry are preserved, the field distribution can be written as follows (Baranova et al., 1986):
FIG. 41. Field linearization of a lens with concave electrodes by increasing the number of electrodes (a) and by applying additional potentials (b).
If only two values of the potential ±V [Fig. 41(a)] are considered, then the coefficient a_{2(2k+1)} will take the form
$$a_{2(2k+1)} = (-1)^{l} + 2\sum_{i=1}^{l}(-1)^{l+i}\cos\left[2(2k+1)\psi_i\right], \tag{255}$$
where l = (m − 4)/8, m = 4, 12, 20, ... is the number of electrodes in the lens, and ψ_i are the angular dimensions of the additional electrodes. The condition of field linearity reduces to making the coefficients a_{2(2k+1)} of harmonics higher than the second one vanish. In a 12-electrode system, only the coefficient of one harmonic, a_6, can be reduced to zero, for which case the following condition must be satisfied:
$$\cos 6\psi_1 = \frac{1}{2}. \tag{256}$$
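Condition (256) is simple enough to solve by hand, but a small root-finding sketch shows how the same procedure would apply to less tractable conditions. It assumes that the relative sixth-harmonic amplitude of the 12-electrode lens is proportional to 2 cos 6ψ_1 − 1, the form whose zero reproduces Eq. (256); the function names and the search interval are illustrative.

```python
import math


def sixth_harmonic(psi_deg):
    """Relative sixth-harmonic amplitude, assumed proportional to 2 cos(6 psi) - 1."""
    return 2.0 * math.cos(math.radians(6.0 * psi_deg)) - 1.0


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; requires a sign change of f between lo and hi."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("no sign change in the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


psi_1 = bisect(sixth_harmonic, 0.0, 30.0)
print(f"psi_1 = {psi_1:.6f} deg")
```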
Hence, ψ_1 = 10°. For better field linearization it is necessary to increase the number of electrodes. A comparison of these results and data on linearization of a deflecting field obtained in the same way (deflectron; Bonshtedt and Markovich, 1967) shows that the optimal angular dimensions of the additional electrodes in a focusing system are half those in a deflecting system. Linearization can be achieved by varying not only the angular dimensions of the electrodes, but also their potentials. The number of electrodes in this case may be reduced; however, the lens power supply becomes more complex. If m = 12, we have
$$a_{2(2k+1)} = (1-n_1)\cos\left[2(2k+1)\psi_1\right] + n_1, \tag{257}$$
where n_1 = V_1/V. The potential ±V_1 replaces the potential ±V on the additional electrodes. The cancellation of the sixth harmonic is possible when
$$\cos 6\psi_1 = -\frac{n_1}{1-n_1}, \tag{258}$$
that is, the cancellation can be achieved for a wide range of angles and potential ratios. This range is limited by the inequalities V_1/V ≤ 0.5 and ψ_1 ≤ 30°. Of interest is the case when V_1 = 0; it is then possible to combine eight additional electrodes in pairs to form an eight-electrode lens with ψ_1 = 30° [Fig. 41(b)]. Such a system has a smaller number of electrodes and applied potentials, and it can be used for simultaneous correction of third-order spherical aberrations. High relative aperture and simple design can be provided by a quadrupole with planar electrodes [Fig. 40(d)]. The potential distribution of such a lens calculated in the two-dimensional approximation for an infinitesimal interelectrode gap is described by Lebedev et al. (1955):
$$\varphi(x,y) = -V + \frac{8V}{\pi}\sum_{k=0}^{\infty}\frac{(-1)^k\cosh\left[(2k+1)\pi x/2R\right]\cos\left[(2k+1)\pi y/2R\right]}{(2k+1)\cosh\left[(2k+1)\pi/2\right]}. \tag{259}$$
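The harmonic coefficients of the planar-electrode lens can be evaluated numerically from the series. The sketch below assumes the series in the form φ = −V + (8V/π) Σ (−1)^k cosh[(2k+1)πx/2R] cos[(2k+1)πy/2R] / {(2k+1) cosh[(2k+1)π/2]} and uses the identity cosh(cx) cos(cy) = Re cosh[c(x + iy)], so that the coefficients of the second and sixth powers of (x + iy)/R give K_2 and K_6 directly. The function name and the number of retained terms are illustrative choices.

```python
import math


def planar_coefficients(terms=40):
    """Return (K2, K6) obtained by expanding each series term in powers of (x + iy)/R."""
    k2 = 0.0
    k6 = 0.0
    for k in range(terms):
        c = (2 * k + 1) * math.pi / 2.0
        weight = (-1) ** k / ((2 * k + 1) * math.cosh(c))
        k2 += weight * c**2 / 2.0      # coefficient of the second power (2! = 2)
        k6 += weight * c**6 / 720.0    # coefficient of the sixth power (6! = 720)
    scale = 8.0 / math.pi
    return scale * k2, scale * k6


K2, K6 = planar_coefficients()
print(f"K2 = {K2:.4f}, K6 = {K6:.4f}")
```

The series converges rapidly because of the hyperbolic cosine in the denominator, so a few tens of terms are more than sufficient.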
This yields the coefficients K_2 ≈ 1.094 and K_6 ≈ −0.108. The influence of the gap size on these coefficients was studied by Ovsyannikova and Szilagyi (1970) by measuring the field on a resistance network. The coefficient K_6 was shown to fall to zero when the ratio of the planar electrode width to the aperture radius was 0.9. The problem of field linearization for a quadrupole with complex polygonal electrodes was studied by Sakudo and Hayashi (1975) and by Novgorodtsev (1982). Conformal transformations with the Schwarz-Christoffel integral were employed in the latter work to obtain the conditions for compensation of higher harmonics in the potential expansion; the variability of the field gradients was analyzed. Sometimes, the voltage is applied to the quadrupole electrodes asymmetrically with respect to the object space potential. For example, the potential of one pair of electrodes is equal to that of object space, while the potential of the other pair is 2V higher. Then the axial potential will be larger relative to object space by the value φ(z) ≈ V. Since there is a longitudinal component of the field on the axis near the lens edges, the field symmetry becomes distorted. The axial field distribution of a standard quadrupole lens depends only slightly on the electrode profiles. If the electrode lengths are several times greater than R, then the rectangular model is sufficiently simple and reliable to calculate
FIG. 42. The dependence k_2(z) and effective length L of a lens with concave electrodes (l = 5R).
first- and third-order optical properties. The function k_2(z) is approximated by a rectangle of length L known as the effective lens length, which can be defined as follows:
$$L = \frac{1}{K_2}\int_{-\infty}^{\infty}k_2(z)\,dz. \tag{260}$$
It can be seen from Fig. 42 that the value k_2(z) is practically independent of z along the whole lens length and falls sharply at the edges. The flat part of k_2(z) decreases with decreasing l/R ratio and vanishes at l/R ≤ 2. The fall-off rate of k_2(z) at the edges is largely determined by the electrode profile. For lenses with cylindrical electrodes and total length l ≥ 2.0R, the effective length can be represented by an approximate formula L = l + 1.06R (Kiss et al., 1970). In the work of Okayama and Kawakatsu (1978), the three-dimensional potential distribution in a quadrupole lens with cylindrical electrodes was calculated by numerical solution of the Laplace equation. It gives the values of K_2 = 1.003 and L = l + 1.165R. The effective lengths of electron optical elements with concave electrodes were found experimentally and theoretically by Koltay et al. (1972). For a quadrupole with l/R ≥ 3, the approximate formula L = l + 0.45R is valid. Experimental evidence shows that the rectangular model approach gives an error less than a few percent for the first-order optics of quadrupoles with l/R ≥ 3 and 20-40% for the aberrations. This model is hence quite satisfactory for engineering calculations. Higher data accuracy can be achieved by replacing the rectangular model by the trapezoidal one [Fig. 43(a)]. In the fringing field regions, the function k_2(z) is replaced by inclined lines, which serve as the lateral sides of the
FIG. 43. (a) Trapezoidal approximation of the field distribution of a quadrupole lens in the axial region; (b) composite bell-shaped approximation.
trapezoid so that the area enclosed by the real curve would be equal to the trapezoid area. The expression for k_2(z) at the edges has the form
where the upper signs refer to the left part of the field and the lower signs to the right part. This model describes well the field of a lens with grounded plates at the edges. The dependence k_2(z) is approximated more exactly by the composite bell-shaped model [Fig. 43(b)], in which the value of k_2(z) in the central part is constant, but at the edges it is given by the function
where K_2, z_k, and d are constants selected to give the optimal approximation of the real dependence. The plus sign corresponds to that part of the curve that describes the field drop at the entrance to the lens (−∞ < z ≤ −z_k); the minus sign corresponds to that at the exit (z_k ≤ z < ∞). In the region −z_k < z < z_k the relation k_2(z) = K_2 is used. If the mechanical length of the electrode is great enough (l > 2.2R), the composite bell-shaped model describes well the potential distribution in a lens with cylindrical electrodes (r = 1.15R) for the following values of the constants: 2z_k = l − 1.1R, d = 1.44R (Okayama and Kawakatsu, 1978). If the electrode lengths are equal to or smaller than the aperture radius, the value of k_2(z) is well described by expression (262), in which we should set z_k = 0 (the bell-shaped field model). The above models for the field variation along the z-axis have the advantage of permitting the exact solution of the paraxial equations and allowing us to find the aberrations analytically.
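When k_2(z) is available only as a table of computed or measured values, the effective length of the equivalent rectangle can be estimated by numerical integration. The following sketch assumes that the effective length is the area under k_2(z) divided by its plateau value K_2; the trapezoidal test profile, its dimensions, and the function names are hypothetical and are chosen only because the exact area is known.

```python
def effective_length(z, k2):
    """Area under the sampled k2(z) (trapezoidal rule) divided by its plateau (maximum) value."""
    if len(z) != len(k2) or len(z) < 2:
        raise ValueError("z and k2 must have the same length (at least 2 samples)")
    area = sum(0.5 * (k2[i] + k2[i + 1]) * (z[i + 1] - z[i]) for i in range(len(z) - 1))
    return area / max(k2)


def trapezoid_profile(z, plateau_half, edge_width, K2=1.0):
    """Hypothetical k2(z): flat for |z| <= plateau_half, linear fall-off over edge_width."""
    a = abs(z)
    if a <= plateau_half:
        return K2
    if a >= plateau_half + edge_width:
        return 0.0
    return K2 * (plateau_half + edge_width - a) / edge_width


# Lengths in units of the aperture radius R (illustrative values).
z_grid = [-4.0 + 0.01 * i for i in range(801)]
k2_samples = [trapezoid_profile(zi, 2.0, 1.0) for zi in z_grid]
L_eff = effective_length(z_grid, k2_samples)
print(f"effective length = {L_eff:.4f} R")
```

For this test profile the plateau is 4R long and each edge contributes half of its width, so the rectangle of equal area is 5R long.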
B. The Paraxial Properties of Quadrupoles
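For a lens with infinite hyperbolic electrodes, it is possible to find the exact solution of the nonrelativistic equations of motion if its field is taken to be two dimensional. From Eq. (5), with H = 0, we have
$$\ddot{x} + \omega^2 x = 0, \qquad \ddot{y} - \omega^2 y = 0, \qquad \ddot{z} = 0, \tag{263}$$
where
$$\omega^2 = \frac{2eV}{mR^2}. \tag{264}$$
The solutions of these equations are
$$x = x_0\cos\omega t + \frac{\dot{x}_0}{\omega}\sin\omega t, \qquad y = y_0\cosh\omega t + \frac{\dot{y}_0}{\omega}\sinh\omega t, \qquad z = z_0 + \dot{z}_0 t. \tag{265}$$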
Here the subscript 0 means the initial values of the coordinates and velocities of a particle at the time t = 0. Expressions (265) exactly describe the motion of a charged particle if the object and the image are in the lens field, sufficiently far from the edges. The case in which the object and image are outside the field is of interest. Here we can use expressions (265) if we assume that the field distribution along the z-axis is well approximated by the rectangular model, and if we know how the sharp potential change at the entrance to the lens and at its exit affects the particle trajectory. The problem is to determine exactly the particle coordinate and velocity on either side of the boundary at the entrance and at the exit. The calculations show that a sufficiently high accuracy can be obtained based on the assumption that the coordinates at the boundary do not change; the transverse velocity component also remains constant, while the longitudinal component changes abruptly at the boundary in accordance with the potential change,
$$\frac{1}{2}mv_1^2 = \frac{1}{2}mv_2^2 + e\,\Delta\varphi(x,y). \tag{266}$$
Here v_1 is the longitudinal velocity component on the left of the boundary, v_2
on the right of it; Δφ(x, y) is the potential change on the boundary at the point of its intersection with the trajectory. When calculating the paraxial properties of a lens, the change of the longitudinal velocity component can be neglected, since the potential φ(x, y) near the axis is proportional to the square of the transverse coordinates, so that the abrupt potential change contributes only to the aberrations. It was pointed out earlier that a standard way of studying the optical properties of lenses is to consider consecutively the first-order optics and aberrations. The paraxial equations of a trajectory in a quadrupole in the general case have the form [see Eqs. (35) and (38)]:
$$x'' + \beta^2 u(z)\,x = 0, \qquad y'' - \beta^2 u(z)\,y = 0. \tag{267}$$
The lens excitation for small velocities is
$$\beta^2 = -\frac{VK_2}{\Phi R^2}, \tag{268}$$
but in the relativistic case it has the form
$$\beta^2 = -\frac{2VK_2\,(mc^2 - e\Phi)}{\Phi R^2\,(2mc^2 - e\Phi)}. \tag{269}$$
Here Φ is the accelerating voltage corresponding to the initial particle energy; the function u(z) describes the normalized potential distribution
$$u(z) = \frac{k_2(z)}{K_2}. \tag{270}$$
A specific feature of Eqs. (267) compared to the paraxial equations for a round lens is the absence of derivatives of the axial potential distribution, which considerably simplifies the solution. Note that the paraxial trajectory equations for a magnetic quadrupole have the same form as Eqs. (267) and differ only in the expression for β². Therefore, in the description of first-order optical properties of an electrostatic lens, we can utilize the data obtained for a magnetic lens but must replace the above coefficient. (The function k_2(z) is assumed to be identical for both lenses.) The paraxial equations have the simplest solutions when the rectangular model is used. Within the lens the value u(z) is considered to be constant and equal to unity; outside the lens it is zero. The projections of the paraxial trajectory are described by the set of equations (267), which transform into
FIG. 44. The cardinal elements of a quadrupole lens in image space.
differential equations with constant coefficients. Their solution yields
$$x = x_1\cos[\beta(z-z_1)] + \frac{x_1'}{\beta}\sin[\beta(z-z_1)], \qquad y = y_1\cosh[\beta(z-z_1)] + \frac{y_1'}{\beta}\sinh[\beta(z-z_1)]. \tag{271}$$
The subscript 1 denotes the coordinates and their derivatives at the entrance to the lens. It follows from Eqs. (271) that the trajectory projection of a positive ion on the xOz-plane has an oscillatory character: at a sufficiently great quadrupole length, it falls to zero at certain values of z; the projection onto the yOz-plane grows monotonically with growing z. Figure 44 shows these projections for a trajectory parallel to the z-axis at the entrance, together with the position of the cardinal elements of image space for both planes. One can see that in the plane z(F_ix) the parallel beam converges to a line in the yOz-plane. From Eqs. (271), (49), and (50) we obtain expressions for the cardinal elements of a quadrupole (rectangular model)
$$f_{ix} = f_{ox} = \frac{1}{\beta\sin\beta L}, \qquad f_{iy} = f_{oy} = -\frac{1}{\beta\sinh\beta L}, \tag{272}$$
$$z(F_{ix}) = -z(F_{ox}) = \frac{L}{2} + \frac{\cot\beta L}{\beta}, \qquad z(H_{ix}) = -z(H_{ox}) = \frac{L}{2} - \frac{\tan(\beta L/2)}{\beta}, \tag{273}$$
$$z(F_{iy}) = -z(F_{oy}) = \frac{L}{2} - \frac{\coth\beta L}{\beta}, \qquad z(H_{iy}) = -z(H_{oy}) = \frac{L}{2} - \frac{\tanh(\beta L/2)}{\beta}. \tag{274}$$
The origin of the positions of the foci and principal planes lies in the lens center.
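The cardinal elements of the rectangular model are easily evaluated numerically. The sketch below is written under the assumption that they follow from Eqs. (271) for a ray entering parallel to the axis, with the image-space focal point at the axis crossing of the emergent ray and the principal plane one focal length in front of it; the function name, dictionary keys, and sample excitations are illustrative.

```python
import math


def cardinal_elements(beta, L):
    """Image-space cardinal elements of a rectangular-model quadrupole.

    Positions are measured from the lens center; the object-space positions
    follow by changing the sign. Assumes 0 < beta * L < pi.
    """
    bl = beta * L
    return {
        "f_x": 1.0 / (beta * math.sin(bl)),
        "f_y": -1.0 / (beta * math.sinh(bl)),
        "zF_x": L / 2.0 + 1.0 / (beta * math.tan(bl)),
        "zH_x": L / 2.0 - math.tan(bl / 2.0) / beta,
        "zF_y": L / 2.0 - 1.0 / (beta * math.tanh(bl)),
        "zH_y": L / 2.0 - math.tanh(bl / 2.0) / beta,
    }


# Compare a moderately strong lens with a weak one (illustrative values, L = 1).
for beta in (1.0, 0.1):
    ce = cardinal_elements(beta, 1.0)
    weak = 1.0 / (beta**2 * 1.0)   # weak-lens estimate of the focal length
    print(f"beta*L = {beta:.1f}: f_x = {ce['f_x']:.4f}, f_y = {ce['f_y']:.4f}, "
          f"weak-lens |f| = {weak:.4f}, zH_x = {ce['zH_x']:+.5f}, zH_y = {ce['zH_y']:+.5f}")
```

For small βL the two focal lengths approach ±1/(β²L) and the principal planes move to the lens center, while for larger βL the diverging action outweighs the converging one.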
It is clear from Eqs. (272) that the focal length in the xOz-plane at 0 < βL < π is positive, so the lens converges the charged particles. (At large βL, the x-projection of the trajectory intersects the axis inside the lens.) In the yOz-plane, the focal length is always negative, which means that the focus is virtual and corresponds to a divergent action. The focal lengths in object and image space are equal to each other; the cardinal points are symmetrical relative to the lens center [see Eqs. (273) and (274)]. In the weak lens approximation (βL << 1), we have
$$f_x = -f_y = \frac{1}{\beta^2 L}, \tag{275}$$
$$z(H_{ox}) = z(H_{ix}) = z(H_{oy}) = z(H_{iy}) = 0. \tag{276}$$
It is clear that the focal lengths of a weak quadrupole are equal in magnitude and that the principal planes coincide in the lens center. This approximation gives good accuracy at βL ≤ 0.5. Since the optical power in the converging plane is determined by the trigonometric sine, but in the diverging plane by the hyperbolic sine, the defocusing becomes stronger than the focusing as the value of βL becomes larger. It follows from Eqs. (275) and (270) that in a first approximation the lens optical power is proportional to the coefficient K_2, and so, among the quadrupoles considered, a lens with concave electrodes and small gaps between them possesses the highest optical power. In order to determine the magnification of a quadrupole, we shall write down the trajectory of a charged particle emerging from a point on the axis at a distance a from the entrance with the inclinations x_0' = x_1' = 1 and y_0' = y_1' = 1. Then from Eq. (271) we have
$$x = a\cos[\beta(z-z_1)] + \frac{1}{\beta}\sin[\beta(z-z_1)], \qquad y = a\cosh[\beta(z-z_1)] + \frac{1}{\beta}\sinh[\beta(z-z_1)]. \tag{277}$$
The values of the inclinations at the exit and, therefore, in image space, are equal to
$$x_i' = \cos\beta L - \beta a\sin\beta L, \qquad y_i' = \cosh\beta L + \beta a\sinh\beta L. \tag{278}$$
Because the linear magnifications in a quadrupole are inversely proportional to the angular magnifications, we obtain
$$M_x = \frac{1}{\cos\beta L - \beta a\sin\beta L}, \qquad M_y = \frac{1}{\cosh\beta L + \beta a\sinh\beta L}. \tag{279}$$
Formulae (272)-(279) are sufficiently simple and exact to permit all the basic calculations of quadrupole paraxial optics to be made. In order to find the trajectory in the trapezoidal or composite bell-shaped models, the field is divided into three regions: the central region with k_2(z) = K_2 and two edge regions. The paraxial equations are solved for each region individually, then the solutions are joined. The continuity of the trajectory and inclination must be guaranteed at the boundaries between the regions. The work of Okayama and Kawakatsu (1978) gives a detailed calculation of the optical properties of a quadrupole lens with cylindrical electrodes on the basis of the composite bell-shaped model. We shall also present formulae for the cardinal elements of a thin quadrupole with an arbitrary field distribution. For the focal lengths, after a single integration of Eqs. (267), using Eq. (50), we obtain
$$\frac{1}{f_{ox}} = \frac{1}{f_{ix}} = -\frac{1}{f_{oy}} = -\frac{1}{f_{iy}} = \beta^2\int_{-\infty}^{\infty}u(z)\,dz = \beta^2 L. \tag{280}$$
The positions of the principal planes of a thin quadrupole lens coincide with its center:
$$z(H_{ox}) = z(H_{ix}) = z(H_{oy}) = z(H_{iy}) = 0. \tag{281}$$
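For field models that do not admit a closed-form solution, the paraxial equations x'' + β²u(z)x = 0 and y'' − β²u(z)y = 0 can be integrated numerically. The following sketch uses the classical fourth-order Runge-Kutta scheme and compares the optical power found by ray tracing with the thin-lens estimate β²∫u(z)dz. The bell-shaped profile, its width d, the excitation, and the integration limits are hypothetical values chosen for illustration; they are not the composite model discussed above.

```python
import math


def trace(beta, u, z_start, z_end, x0, xp0, sign=+1, steps=2000):
    """Integrate x'' + sign * beta**2 * u(z) * x = 0 from z_start to z_end (classical RK4).

    sign=+1 corresponds to the converging plane, sign=-1 to the diverging plane.
    Returns the coordinate and the inclination at z_end.
    """
    h = (z_end - z_start) / steps
    z, x, xp = z_start, x0, xp0

    def acc(zz, xx):
        return -sign * beta**2 * u(zz) * xx

    for _ in range(steps):
        k1x, k1v = xp, acc(z, x)
        k2x, k2v = xp + 0.5 * h * k1v, acc(z + 0.5 * h, x + 0.5 * h * k1x)
        k3x, k3v = xp + 0.5 * h * k2v, acc(z + 0.5 * h, x + 0.5 * h * k2x)
        k4x, k4v = xp + h * k3v, acc(z + h, x + h * k3x)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        xp += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        z += h
    return x, xp


# Hypothetical bell-shaped profile; its integral over all z equals pi * d / 2.
d = 1.0


def bell(z):
    return 1.0 / (1.0 + (z / d) ** 2) ** 2


beta = 0.05  # weak excitation (illustrative value)
_, slope = trace(beta, bell, -50.0 * d, 50.0 * d, 1.0, 0.0, sign=+1)
power_numeric = -slope                     # 1/f for a ray entering at unit height, parallel to the axis
power_thin = beta**2 * math.pi * d / 2.0   # thin-lens estimate
print(f"1/f (ray trace) = {power_numeric:.6e}, 1/f (thin lens) = {power_thin:.6e}")
```

With u(z) = 1 inside the lens the same routine reproduces the trigonometric and hyperbolic solutions of the rectangular model, which provides a convenient check of the integrator.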
C. Quadrupole Systems
Quadrupoles are very rarely used separately because of their large astigmatism. Even when a beam going out from a point is to be focused in a line, one quadrupole produces too long a line. For this reason, quadrupoles are combined in doublets, triplets, or larger systems. In this case, the lenses are arranged in such a way that their planes of symmetry coincide, and the electrode polarity provides alternation of the converging and diverging planes. Such systems may be convergent in all directions: they may not only compress charged particle beams, but also provide point-to-point focusing (stigmatic systems). This principle underlies the various applications of quadrupole systems for the focusing of charged particle beams. The possibility of designing a converging system by combining a diverging and a converging lens has been discussed in Section III.D with reference to two thin lenses. Stigmatic systems may possess different magnifications in the xOz- and yOz-planes (M_x ≠ M_y), or they may have the same magnification (M_x = M_y). A system of four or more quadrupoles may possess first-order optical properties similar to those of a round lens. The calculations of quadrupole multiplets can be conveniently made using the matrix method described in Section III.D. Matrices (65), which permit
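calculation of coordinates and inclinations of particles at the exit from the lens, at given coordinates and angles at the entrance, for a quadrupole in the rectangular approximation have the form

$$T_x = \begin{pmatrix} \cos\beta L & \dfrac{1}{\beta}\sin\beta L \\ -\beta\sin\beta L & \cos\beta L \end{pmatrix}, \qquad T_y = \begin{pmatrix} \cosh\beta L & \dfrac{1}{\beta}\sinh\beta L \\ \beta\sinh\beta L & \cosh\beta L \end{pmatrix}.$$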
A weak lens is described by the following matrices:
Field approximations of any other model require substitution of the corresponding linearly independent solutions in (65). The matrix of a quadrupole multiplet is calculated as the product of the matrices of the constituent lenses and of the drift spaces between them. A simple and commonly used system is a quadrupole doublet. An example of a stigmatic doublet and the trajectory path in it is shown in Fig. 45. One can see directly from this picture that in the xOz- and yOz-planes the angular and, therefore, linear magnifications in the doublet are considerably different. The maximum deviations of the trajectory projections in these planes are also very different. Because of this, at large initial beam divergence, there is a considerable beam loss in the yOz-plane. A stigmatic doublet differs from a round lens in that a shift of the object along the axis at constant doublet parameters not only shifts the image, but also destroys its stigmatism.
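The matrix product described above can be sketched in a few lines. The example assumes the rectangular field model, with trigonometric elements in the converging plane and hyperbolic elements in the diverging plane; the helper names (`drift`, `quad_converging`, `quad_diverging`, `system_matrix`) and the numerical parameters are hypothetical and serve only as an illustration.

```python
import math

def drift(d):
    """Field-free space of length d."""
    return ((1.0, d), (0.0, 1.0))

def quad_converging(beta, length):
    """Rectangular-model quadrupole in its converging plane."""
    c, s = math.cos(beta * length), math.sin(beta * length)
    return ((c, s / beta), (-beta * s, c))

def quad_diverging(beta, length):
    """Rectangular-model quadrupole in its diverging plane."""
    c, s = math.cosh(beta * length), math.sinh(beta * length)
    return ((c, s / beta), (beta * s, c))

def matmul(a, b):
    """Product a @ b of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def system_matrix(elements):
    """Matrix of a multiplet; `elements` are listed in the order the beam meets them."""
    total = ((1.0, 0.0), (0.0, 1.0))
    for element in elements:
        total = matmul(element, total)  # later elements multiply from the left
    return total

def det(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical doublet: excitations beta1 and beta2, equal lens lengths, gap between lenses.
beta1, beta2, length, gap = 1.0, 1.2, 1.0, 0.5
doublet_x = system_matrix(
    [quad_converging(beta1, length), drift(gap), quad_diverging(beta2, length)]
)
doublet_y = system_matrix(
    [quad_diverging(beta1, length), drift(gap), quad_converging(beta2, length)]
)
print(doublet_x, det(doublet_x))
print(doublet_y, det(doublet_y))
```

Both matrices have unit determinant, as expected when the potentials in object and image space are equal, and their diagonal elements differ, which reflects the asymmetry of a doublet about its center.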
FIG. 45. The trajectory projection in a stigmatic quadrupole doublet.
The image position and magnification for an arbitrary system can be written in terms of the matrix elements as follows:
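One standard way to arrive at relations of this kind is sketched below; it is a reconstruction under stated assumptions rather than a quotation of the formulae referred to in the text. The object is assumed to lie a distance $a$ in front of the entrance edge and the image a distance $g$ behind the exit edge, the potentials in object and image space are assumed equal (so that $\det T = 1$), and $t_{ik}$ denotes the elements of the system matrix in one of the two planes. The image condition is that the upper right element of the overall matrix vanishes.

```latex
\begin{pmatrix} 1 & g \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{pmatrix}
\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix}
t_{11} + g\,t_{21} & t_{11}a + t_{12} + g\,(t_{21}a + t_{22}) \\
t_{21} & t_{21}a + t_{22}
\end{pmatrix},
\qquad
g = -\frac{t_{11}a + t_{12}}{t_{21}a + t_{22}},
\qquad
M = t_{11} + g\,t_{21} = \frac{1}{t_{21}a + t_{22}} .
```

The last equality uses $t_{11}t_{22} - t_{12}t_{21} = 1$. The same expressions apply separately in the xOz- and yOz-planes with the corresponding matrix elements.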
We will now draw some conclusions about the positions of the cardinal points in a doublet that are valid for the general case, and not only for the rectangular field model. Since a doublet is asymmetrical relative to its center, $t_{11} \neq t_{22}$. Therefore, according to Eq. (69), the distance between the focus in object space and the entrance to the doublet is not equal to the distance between the exit and the focus in image space (both in the xOz- and yOz-planes). Since the focal lengths in object and image space are the same, the principal planes are located asymmetrically relative to the doublet center. Calculations show that the shift is large and depends strongly on the lens excitation; in the planes xOz and yOz the principal planes are shifted in opposite directions. The positions of the cardinal elements in a doublet are represented schematically in Fig. 46. If a doublet is formed by lenses of the same length and of equal excitations, the distance from the front focus to the entrance in the xOz-plane is equal to the distance between the exit and the back focus in the yOz-plane. This equality remains valid if planes xOz and yOz are interchanged. It is clear that in the stigmatic mode the positions of the image in these planes coincide: $g_x = g_y = g$. By equating $g_x$ and $g_y$ obtained from Eq. (284), we can get a set of
FIG. 46. The cardinal elements of a doublet.
FIG. 47. Dependence of $\beta_1^2 L_1^2$ and $\beta_2^2 L_2^2$ on the image position in a stigmatic doublet.
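two transcendental equations to determine the excitations $\beta_1^2$ and $\beta_2^2$, which provide the stigmatic mode for the given geometrical parameters. For the rectangular field model these equations are written as follows:

$$\frac{a + \dfrac{1}{\beta_1}\tan\beta_1 L_1}{\beta_1 a\tan\beta_1 L_1 - 1} = \frac{g + \dfrac{1}{\beta_2}\tanh\beta_2 L_2}{\beta_2 g\tanh\beta_2 L_2 + 1} + s,$$

$$\frac{a + \dfrac{1}{\beta_1}\tanh\beta_1 L_1}{\beta_1 a\tanh\beta_1 L_1 + 1} + s = \frac{g + \dfrac{1}{\beta_2}\tan\beta_2 L_2}{\beta_2 g\tan\beta_2 L_2 - 1}.$$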
These formulae were used by Enge (1959) to calculate the excitation values of a stigmatic doublet for the positions of the object and image in the region of practical interest. The analysis covered doublets consisting of lenses of identical lengths ($L_1 = L_2 = L$), with the interlens distance equal to zero ($s = 0$) or to the lens length ($s = L$). Figure 47 illustrates the relationships between the excitations of the lenses in a stigmatic doublet and the image position for various object positions. This case is for $s = 0$; its practical realization is possible because the effective lengths of lenses are always greater than their mechanical lengths. One should keep in mind that when the lenses are arranged too close to each other, the distribution of their fields changes, resulting in an uncontrollable change of their effective lengths. The magnifications of a stigmatic doublet are given by the formulae
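$$M_x = \frac{\cosh\beta_2 L_2 + \beta_2 g\sinh\beta_2 L_2}{\cos\beta_1 L_1 - \beta_1 a\sin\beta_1 L_1}, \qquad M_y = \frac{\cos\beta_2 L_2 - \beta_2 g\sin\beta_2 L_2}{\cosh\beta_1 L_1 + \beta_1 a\sinh\beta_1 L_1}. \qquad (287)$$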
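The magnification formulae (287) can be cross-checked against the matrix method. The sketch below assumes the rectangular field model, a first lens that converges in the xOz-plane, equal potentials in object and image space, and the image relations $g = -(t_{11}a + t_{12})/(t_{21}a + t_{22})$ and $M = 1/(t_{21}a + t_{22})$, which are standard but are an assumption here. All function names and numerical values are hypothetical. The chosen excitations do not make the doublet stigmatic, so each plane is evaluated with its own image distance.

```python
import math

def matmul(a, b):
    """Product a @ b of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def drift(d):
    return ((1.0, d), (0.0, 1.0))

def conv(beta, length):
    """Rectangular-model lens, converging plane."""
    c, s = math.cos(beta * length), math.sin(beta * length)
    return ((c, s / beta), (-beta * s, c))

def div(beta, length):
    """Rectangular-model lens, diverging plane."""
    c, s = math.cosh(beta * length), math.sinh(beta * length)
    return ((c, s / beta), (beta * s, c))

def image_and_magnification(t, a):
    """Image distance behind the exit edge and linear magnification for an
    object a distance `a` in front of the entrance edge (det t = 1 assumed)."""
    denom = t[1][0] * a + t[1][1]
    g = -(t[0][0] * a + t[0][1]) / denom
    return g, 1.0 / denom

# Hypothetical doublet parameters (arbitrary length units).
beta1, beta2, l1, l2, s, a = 1.0, 1.2, 1.0, 1.0, 0.5, 2.5

tx = matmul(div(beta2, l2), matmul(drift(s), conv(beta1, l1)))  # xOz-plane
ty = matmul(conv(beta2, l2), matmul(drift(s), div(beta1, l1)))  # yOz-plane
gx, mx = image_and_magnification(tx, a)
gy, my = image_and_magnification(ty, a)

# Closed forms of Eq. (287), each evaluated with the image distance of its own plane.
mx_closed = (math.cosh(beta2 * l2) + beta2 * gx * math.sinh(beta2 * l2)) / (
    math.cos(beta1 * l1) - beta1 * a * math.sin(beta1 * l1)
)
my_closed = (math.cos(beta2 * l2) - beta2 * gy * math.sin(beta2 * l2)) / (
    math.cosh(beta1 * l1) + beta1 * a * math.sinh(beta1 * l1)
)
print(gx, mx, mx_closed)
print(gy, my, my_closed)
```

In this example the magnitude of $M_x$ exceeds that of $M_y$ by more than an order of magnitude, in line with the strong difference between the two planes of a doublet. Finding the stigmatic excitations themselves would require solving $g_x = g_y$ numerically, which is not attempted here.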
FIG. 48. Magnifications of a stigmatic doublet.
Figure 48 presents the magnifications of a doublet calculated using expressions (287); the doublet parameters are shown in Fig. 47. Here in the xOz-plane the first lens converges the charged particles, while the other lens diverges them; in the yOz-plane the operation of the lenses is reversed. One can see from Fig. 48 that the value of $M_x$ is several times greater than that of $M_y$. Comparison of various doublets operating in the stigmatic mode shows that the excitations $\beta_1$ and $\beta_2$ decrease with increasing distance between the lenses (at constant $a$ and $g$). With growing $L_2/L_1$, the excitations of both lenses also decrease; the value of $\beta_1 L_1$ drops, while that of $\beta_2 L_2$ increases. The value of $\beta_1 L_1$ depends strongly on $a$ and only slightly on $g$; $\beta_2 L_2$, on the contrary, is strongly affected by a change of $g$ and only slightly by a change of $a$. In the thin-lens approximation with $a \gg s$ and $g \gg s$, the focal length $f$ of the doublet is proportional to $\sqrt{s}$. The growth of $s$ results in higher $|M_x|$ but lower $|M_y|$.

A system of three quadrupoles with alternating converging and diverging planes (triplet) is more flexible in its optical properties. For given geometry and positions of the object and image, one can vary the linear magnifications by varying the lens potentials without violating the system stigmatism (this cannot be done in a doublet). In a triplet, the trajectories in the xOz- and yOz-planes do not differ so much as in a doublet, so there is not much difference between $M_x$ and $M_y$ in the stigmatic modes. Modes with equal magnifications are also possible ($M_x = M_y$). However, the larger number of lenses makes the adjustment and power supply more complex and complicates the calculations. We shall discuss in some detail a symmetric triplet in which the outer lenses have the same lengths and excitations and are located at the same distance
from the inner lens. A symmetric triplet of quadrupole lenses, in which the effective length of the inner lens is twice that of the outer lenses ($L_2 = 2L_1 = 2L_3 = L$), has been calculated in the rectangular field approximation (Enge, 1961). Figure 49 illustrates the results for $s = s_1 = s_2 = 0$ and gives $\beta_1^2 L_1^2 = \beta_3^2 L_3^2$ and $\beta_2^2 L_2^2$ (which provide the stigmatic mode) as a function of the image position; the object position is the parameter for the plots. For comparison, the illustration shows dashed curves corresponding to a doublet with the same lens lengths ($L_1 = L_2 = L$) and the distance between the lenses equal to $2s$. The reference points for the positions of the object $a$ and image $g$ are, as before, the entrance and exit edges of the system, respectively. From a comparison of the characteristics of the doublets and triplets, one can conclude that the optical power of a triplet is less than that of a doublet. The magnifications of the above triplet are given in Fig. 50. The first and third lenses have a divergent effect in the xOz-plane. One can see that the magnifications of the triplet in the two perpendicular planes are close, so image distortion is smaller than in the doublet. In a symmetrical triplet the principal planes are located on either side at the same distance from the center. They do not shift so much as in a doublet with changing lens excitation. A symmetrical triplet can be replaced by an equivalent lens located in its center. A change of the excitations changes in the first approximation only the lens focal length, which simplifies the system operation when the parameters of the object or image must be changed. Symmetrical triplets are used primarily for focusing high-energy particles. Asymmetrical triplets also find wide application, for example, in cathode-ray tubes to increase deflection or tube sensitivity (Shkunov and Semenik, 1976).
FIG. 50. Magnifications of a stigmatic triplet (panels: $M_x$ and $M_y$; horizontal axis: $g/L$).
The simplest way to increase the sensitivity is to place a quadrupole behind the deflecting system, so that its diverging plane is superimposed on the deflection plane. On passing through the quadrupole, the deflected beam is again deflected from the axis, increasing the sensitivity. However, the spot on the screen also becomes larger, due to the higher linear magnification of the lens system, lowering the specific sensitivity of the tube. As a result, the losses of specific sensitivity become greater than the gain in the deflection. Under certain conditions, a system of lenses with converging and diverging effects increases the sensitivity without increasing the spot size. For this, at least three lenses are required: two diverging lenses and one converging lens between them. Schematically, the principle of operation of such a system is shown in Fig. 51. If we consider the operation of the first two lenses located before the deflecting element, we shall see that the principal planes are shifted
FIG. 51. Diagram of deflection increase.
to the right relative to the doublet center and, therefore, the linear magnification of the system, equal to the ratio $M_2 = l_{i2}/l_{o2}$, becomes smaller than that of one converging (round) lens. [For simplicity, we assume that the principal planes in the doublet coincide: $z(H_{ox}) = z(H_{ix}) = z(H_{x2})$.] The addition of a diverging lens (between the deflection center and the screen) increases the deflection, but the system magnification increases somewhat: $M_3 = l_{i3}/l_{o3}$ (the principal planes of the triplet $H_{x3}$ are slightly shifted to the left; see Fig. 51). The optical powers and positions of the lenses can be selected in such a way that the linear magnification of the triplet is smaller than the magnification of one round lens. The condition for stigmatic focusing must also be satisfied. Thus, the principle of simultaneous increase of sensitivity and specific sensitivity of a tube is to make the beam deflection angle larger, using a postdeflector, and to simultaneously compensate for the larger spot produced by the postdeflector, using a system of prefocusing lenses. The larger the optical power of the diverging lenses, the larger the gain in the sensitivity, whereas the current transmittance of the system becomes smaller. The parameters of various quadrupole systems used in oscilloscopes are given by Lyubchik et al. (1971), Ovsyannikova et al. (1972), and Fishkova (1980). In recent years, the systems for increasing deflection have used crossed lenses instead of quadrupoles, because the former have a simpler production technology and possess lower aberrations (see Section XI). Higher tube sensitivity can be achieved not only by the use of diverging lenses, but also by employing converging ones with refocusing. The problem of optimization of electron optical systems containing a deflecting element and an arbitrary number of lenses was considered by Afanas’ev et al. (1979), who found the parameters of the system providing maximum sensitivity and specific sensitivity of the device. It was shown that at the same optical power, diverging lenses provide a larger gain in sensitivity than converging lenses with refocusing.

Systems of four quadrupoles (quadruplets) are used when it is necessary to increase the deflection in two perpendicular directions. Such systems have found wide application because they permit solution of a variety of electron optical problems in beam transport systems, probe-forming devices, and charged particle spectrometers. We shall discuss two types of quadruplets. A symmetrical quadruplet, in which the beam between the second and third lenses is parallel to the axis, is illustrated in Fig. 52. It is made up of two identical doublets, differing only in the sequence of lenses. The first doublet transforms the diverging beam into a parallel one; the other converges it into a symmetrically located point. The total magnification of the quadruplet is negative and equal to unity in absolute value. The system becomes a symmetrical triplet at zero distance between the second and third lenses. A symmetrical quadruplet is convenient for beam transport, because the distance between the doublets is
FIG. 52. A symmetric quadruplet.
not fixed and its variation does not affect the focusing. The free space between the doublets can be used for some additional elements that do not violate the parallel character of the beam. Of great interest is another quadruplet system similar in its first-order properties to a round lens, that is, a system forming a correct electron optical image. In this case, a shift of the object leads to a shift of the image without distorting it (for constant values of the parameters of the system). It has been shown by Yavor (1962) and Dymnikov and Yavor (1963) that the minimal number of lenses necessary to solve the above problem is four. The suggested quadruplet contains two identical doublets, the sequence of lenses in the second doublet being opposite to that in the first one; the fields in the identical lenses of both doublets are rotated through 90° to each other (Fig. 53). The second feature of the quadruplet makes it different from the symmetrical quadruplet we have mentioned earlier. The properties of the quadruplet in question can be conveniently described in terms of its transfer matrix $R$. We shall denote the matrices of the first doublet of the system by $T_{1x}$ and $T_{1y}$ and those of the second doublet by $T_{2x}$ and $T_{2y}$, respectively. Since the second doublet represents the mirror image of the first one rotated through 90°, we can write

$$T_{2x} = T_{1yr}, \qquad T_{2y} = T_{1xr}. \qquad (288)$$
The elements of the mirror matrix are related to the initial matrix by (78). Then, for the quadruplet we have
where $\lambda$ is the distance between the doublets. Considering that the yOz-plane of the quadruplet is the mirror image of the xOz-plane, we shall get

$$R_y = R_{xr} \qquad (290)$$
FIG. 53. Diagram of a quadruplet analogous to a round lens.
and, therefore,
Let us write the matrix elements of the quadruplet in terms of the matrix elements of the constituent doublets:

$$\begin{aligned}
r_{11} &= t_{11x}t_{22y} + t_{21x}t_{12y} + \lambda t_{21x}t_{22y},\\
r_{12} &= t_{12x}t_{22y} + t_{22x}t_{12y} + \lambda t_{22x}t_{22y},\\
r_{21} &= t_{11x}t_{21y} + t_{21x}t_{11y} + \lambda t_{21x}t_{21y},\\
r_{22} &= t_{12x}t_{21y} + t_{22x}t_{11y} + \lambda t_{22x}t_{21y}.
\end{aligned} \qquad (292)$$
It follows from Eqs. (69) and (291) that the focal lengths of the quadruplet are determined by the element $r_{21}$ and are equal in the xOz- and yOz-planes. The same expressions lead us to the conclusion that the position of the focal point in image space in the xOz-plane is determined by the matrix element $r_{11}$, while in the yOz-plane it is determined by the element $r_{22}$. For the quadruplet to be similar to a round lens, its focal points must coincide, $z(F_x) = z(F_y)$, in the same way as its focal lengths coincide, $f_x = f_y$. Hence, the condition

$$r_{11} = r_{22} \qquad (293)$$
follows, which can be satisfied by suitably selecting the distance $\lambda$ between the doublets. Considering Eq. (292), we require

$$\lambda = \frac{t_{11x}t_{22y} + t_{21x}t_{12y} - t_{12x}t_{21y} - t_{22x}t_{11y}}{t_{22x}t_{21y} - t_{21x}t_{22y}} \geq 0. \qquad (294)$$
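Condition (294) can be checked numerically. The sketch below builds the first doublet in the rectangular field model, computes $\lambda$ from Eq. (294), and assembles the quadruplet matrix as the product of the mirrored matrix of the other plane, a drift of length $\lambda$, and the matrix of the first doublet. The form of the mirror matrix (diagonal elements interchanged) and the relation $f = -1/r_{21}$ are assumptions of this illustration, and all names and numerical values are hypothetical.

```python
import math

def matmul(a, b):
    """Product a @ b of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def drift(d):
    return ((1.0, d), (0.0, 1.0))

def conv(beta, length):
    c, s = math.cos(beta * length), math.sin(beta * length)
    return ((c, s / beta), (-beta * s, c))

def div(beta, length):
    c, s = math.cosh(beta * length), math.sinh(beta * length)
    return ((c, s / beta), (beta * s, c))

def mirror(t):
    """Matrix of the same system traversed in the opposite direction
    (assumed form for unit determinant: diagonal elements interchanged)."""
    return ((t[1][1], t[0][1]), (t[1][0], t[0][0]))

# First doublet (hypothetical parameters): lens 1 converges in xOz, lens 2 diverges.
beta1, beta2, length, s = 1.0, 1.2, 1.0, 0.5
t1x = matmul(div(beta2, length), matmul(drift(s), conv(beta1, length)))
t1y = matmul(conv(beta2, length), matmul(drift(s), div(beta1, length)))

# Second doublet in the xOz-plane built lens by lens: reversed order, fields rotated by 90 degrees.
t2x_explicit = matmul(div(beta1, length), matmul(drift(s), conv(beta2, length)))

# Distance between the doublets from Eq. (294).
num = (t1x[0][0] * t1y[1][1] + t1x[1][0] * t1y[0][1]
       - t1x[0][1] * t1y[1][0] - t1x[1][1] * t1y[0][0])
den = t1x[1][1] * t1y[1][0] - t1x[1][0] * t1y[1][1]
lam = num / den

rx = matmul(mirror(t1y), matmul(drift(lam), t1x))
ry = matmul(mirror(t1x), matmul(drift(lam), t1y))
focal_length = -1.0 / rx[1][0]  # assumed sign convention
print(lam, rx, focal_length)
```

With these parameters $\lambda$ comes out positive, the diagonal elements of the quadruplet matrix coincide, and the matrix in the yOz-plane is the mirror of that in the xOz-plane.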
Inequality (294) defines the range in which a quadruplet that behaves like a round lens exists. Of importance are the focal length and focal position of the quadruplet expressed in terms of the matrix elements. Substituting (294) into (292), we obtain

$$f = \frac{t_{22x}t_{21y} - t_{21x}t_{22y}}{t_{21x}^2 - t_{21y}^2}, \qquad z^*(F) = \frac{t_{21y}t_{22y} - t_{21x}t_{22x}}{t_{21x}^2 - t_{21y}^2}. \qquad (295)$$
Here $z^*(F)$ denotes the distance between the front focus and the system entrance or the distance from the exit to the back focus, which are identical in this case. The focal lengths of the quadruplet in object and image space also coincide. In the thin-lens approximation, the above formulae become considerably simpler. The properties of a quadruplet similar to a round lens have been calculated and studied experimentally (Yavor, 1968). This type of quadruplet has found wide application in many research centers for creating high-energy probes (Legge et al., 1982; Grime et al., 1982). More complex quadrupole systems have been calculated and designed. Among them, symmetrical systems with linear and angular magnification coefficients of $\pm 1$ are very common. Such lens systems with different types of symmetry have been analyzed by Kartashev et al. (1976). The following four symmetry types are considered: transfer, mirror, crossed-transfer, and crossed-mirror (Fig. 54). Analysis of these systems becomes considerably simplified because the total transfer matrix of the particle trajectories in them is expressed in terms of the transfer matrix of the first half of the system. The conditions to be satisfied by the system parameters have been obtained in the thin-lens approximation. It is easier to obtain the identity transformation in systems possessing transfer symmetry, which requires only two conditions to be met. The maximum number of conditions (four) is imposed on systems with mirror symmetry. In accordance with this, the simplest systems (containing a smaller number of lenses) that provide transformation with $M = \pm 1$ belong to the first type of symmetry. The symmetrical quadruplet with a parallel beam between the two doublets that we
have discussed earlier (Fig. 52) belongs to the group of mirror symmetry systems. Transport of charged particle beams over long distances requires periodic systems made up of sequences of identical elements. Here the particle trajectories have an oscillating periodicity. The fundamental question in the theory of periodic systems is the condition for the stability of trajectories (Kapchinsky, 1966). The calculation of quadrupole systems using matrix transformation of the coordinates in phase space has been described by Bonner et al. (1979).

D. Geometrical Aberrations of Quadrupole Lenses
Calculation of geometrical aberrations of quadrupoles, like those of any other type of lenses, makes it necessary to include in trajectory equations terms containing transverse coordinates and inclinations of higher orders than the first order. The trajectory equations do not contain terms of even degree in these quantities. The inclusion of the third power of small values x, x’, y, and y’ gives primary third-order geometrical aberrations.
The third-order geometrical aberrations in quadrupoles can be determined from the trajectory equations (85) if we set $\varphi(z) = \text{const}$ and $\varphi_4(z) = 0$. Then

$$x'' + \beta^2 u(z)x = \beta^2\left[-u(z)x\left(x'^2 + y'^2\right) + \frac{1}{6}u''(z)x^3 + \frac{1}{2}u'(z)x'\left(x^2 - y^2\right) - \beta^2 u^2(z)x\left(x^2 - y^2\right)\right]. \qquad (296)$$
The corresponding equation in the yOz-plane is obtained by the transposition $\beta^2 \to -\beta^2$, $x \rightleftarrows y$. The variables $x$ and $y$ in Eq. (296) are not separated, unlike those in the paraxial equations. Some terms on the right-hand side of Eq. (296) are due to the variation of the longitudinal velocity component in the lens; others are associated with the longitudinal field component. The geometrical aberrations are found from Eq. (296) and from the analogous equation for the y-projection of the trajectory by the perturbation method. This procedure has been described in detail in Section IV.A. The aberration calculation is made using formula (89); $f(\zeta, x_0, x_0', y_0, y_0')$ is replaced by the right-hand side of Eq. (296), into which independent solutions of the paraxial equation are then substituted. It is clear from Eq. (296) that the function $f$ contains the potential distribution derivatives $u'(\zeta)$ and $u''(\zeta)$, which may introduce a significant error in the calculations of aberration coefficients if the fields are determined numerically or experimentally. For this reason, the expressions for the coefficients are commonly transformed by integration by parts so as to exclude these derivatives. The aberration blurring in the Gaussian image plane is described by expression (90) for a circular aperture. When the system does not contain a real aperture, the aberration blurring of the image is generally expressed as the sum of third-order terms relative to the coordinates $x_0$, $y_0$ and gradients $x_0'$, $y_0'$ of the trajectory in the object plane. In the general case, none of the aberration coefficients falls to zero in a quadrupole and, therefore, their total number is 20. In stigmatic modes, the number of independent aberration coefficients reduces to 16. The calculation of geometrical aberrations of quadrupoles is a tedious task. The expressions for aberration coefficients are given in integral form by Ovsyannikova and Yavor (1965), who have also found them in the thin-lens approximation and in the two-dimensional approximation. The work of Ovsyannikova and Yavor (1967) presents the calculations for the geometrical aberrations of quadrupoles in the rectangular potential distribution model. Let us discuss in some detail the spherical aberrations of quadrupole lenses, because this kind of aberration is very important in practice. The
expression for the spherical aberration of a lens system without an aperture is

$$\Delta x(z_{ix}) = M_x\left[C_{1x}x_0'^3 + C_{2x}x_0'y_0'^2\right], \qquad \Delta y(z_{iy}) = M_y\left[C_{1y}y_0'^3 + C_{2y}x_0'^2y_0'\right], \qquad (297)$$
that is, it is described by four coefficients instead of the one needed for a round lens. In the stigmatic mode, $C_{2x} = C_{2y}$ and only three coefficients remain independent. Since a single quadrupole lens forms a line image, the aberration blurring in it is important only in one direction (e.g., $\Delta x$), described by two coefficients $C_{1x}$ and $C_{2x}$:
c,,
=
;[
-x:’(zi)y,(z;)y;(z,)
+3
J:
x:’yp d z
where $x_\alpha$ and $y_\alpha$ are the solutions of the paraxial equations satisfying the initial conditions of Eqs. (41). It follows from Eqs. (298) that the coefficient $C_{1x}$ is always positive, as in a round lens. This means that if the image is formed by a planar beam with $y_0' = 0$, then in standard conditions at $M_x < 0$ the value of $\Delta x(z_i)$ is negative. The particles incident at a large angle to the axis are focused more strongly than the paraxial ones, intersecting the axis nearer to the lens. It is impossible to draw any conclusion about the sign of the coefficient $C_{2x}$ from Eq. (299); the calculation shows that it may be either positive or negative. When $C_{2x} < 0$, the value $\Delta x(z_i)$ may vanish at some points of the line image. The expressions for the aberration coefficients (298) and (299) are written in such a form that they do not contain the potential distribution derivatives. This facilitates their use if the field is calculated numerically. In the thin-lens approximation, expressions (298) and (299) can be easily integrated using the mean-value theorem, because the integrands do not contain sign-varying functions. We obtain
Here, $T$ is a coefficient describing the rate of the field change:

$$T = \frac{\int_{-\infty}^{+\infty} u^2(z)\,dz}{\int_{-\infty}^{+\infty} u(z)\,dz} = u(\bar{z}) \leq 1. \qquad (301)$$
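The behavior of $T$ can be illustrated numerically. The sketch below assumes that $T$ is the ratio of the integral of $u^2$ to the integral of $u$, and evaluates it for a rectangular field and for a Lorentzian profile $u(z) = 1/(1 + z^2/d^2)$. The Lorentzian is a hypothetical stand-in for a smoothly decaying field and is not one of the field models used in the text; the function names and integration limits are likewise illustrative.

```python
def integrate(f, z_min, z_max, n):
    """Composite trapezoidal rule with n equal steps."""
    h = (z_max - z_min) / n
    total = 0.5 * (f(z_min) + f(z_max))
    for i in range(1, n):
        total += f(z_min + i * h)
    return total * h

def field_rate_coefficient(u, z_min, z_max, n=40000):
    """T = integral of u**2 divided by integral of u over [z_min, z_max]."""
    numerator = integrate(lambda z: u(z) ** 2, z_min, z_max, n)
    denominator = integrate(u, z_min, z_max, n)
    return numerator / denominator

def rectangular(z, half_length=0.5):
    """Rectangular model: u = 1 inside the lens, 0 outside."""
    return 1.0 if abs(z) <= half_length else 0.0

def lorentzian(z, d=1.0):
    """Hypothetical smooth profile with u(0) = 1."""
    return 1.0 / (1.0 + (z / d) ** 2)

t_rect = field_rate_coefficient(rectangular, -1.0, 1.0)
t_smooth = field_rate_coefficient(lorentzian, -200.0, 200.0)
print(t_rect, t_smooth)
```

The rectangular field gives $T = 1$, the upper bound, while the smooth profile gives a value close to 0.5 (slightly above it because the integration range is truncated), which illustrates how $T$ decreases as the field edges become more gradual.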
It is clear from Eqs. (300) that for a thin lens the coefficient $C_{2x}$ is always smaller than zero. Expressions (298) and (299) for the spherical aberration of a quadrupole lens can be integrated for the rectangular model to get simple analytical formulae (Dymnikov et al., 1965). It should be noted that the above integration by parts of the aberration coefficients takes into account the effect of both the basic lens field and the edge fields
+ -1( 1 3
-
6p2a2
+p
u )-
sin4PL 4BL
+ j7 Lu ( 1 + /Pu2)( 1
-
cos 2BL)
The spherical aberration coefficients of a quadrupole, in which the dependence of the field distribution on $z$ is approximated by a rectangle, have been calculated for a wide range of parameter variation and presented in the form of plots by Dymnikov et al. (1966). The expressions for quadrupole spherical aberrations can also be integrated by approximating the axial field distribution by a bell-shaped or triangular function. Careful measurements of the spherical aberration in a single quadrupole have been made by Okayama and Kawakatsu (1983). It is shown that this lens could form fine-line images of submicron width. Since quadrupoles are commonly used in systems of several lenses, it is worth considering the question of aberration summation. The coefficients of spherical aberration are summed up according to formulae (97) and similar formulae for the yOz-plane. It should be remembered, however, that a quadrupole is an astigmatic lens creating a virtual image in one of the planes. In system designs, converging and diverging planes of single lenses alternate, and it is therefore necessary to know the aberration coefficients for the virtual image. Besides, a lens in the system transmits a nonhomocentric beam that has been distorted by the preceding lenses; therefore, to calculate the system we must know the aberrations of quadrupoles passing incident astigmatic beams.
The spherical aberration of a simple quadrupole system, a stigmatic doublet, has been calculated for the rectangular field approximation (Fishkova et al., 1968). The data in this work are presented in the form of numerous plots with illustrations of aberration figures created by the stigmatic doublet. The spherical aberration coefficient $C_1$ is shown to be much smaller in the plane in which the first lens converges the beam. This may be explained by the fact that, for equal initial inclinations, the trajectory projection onto the plane with the first lens divergent departs further from the axis than the projection onto the plane normal to it, where the first lens is convergent. The optimal regime for obtaining minimum spherical aberration is the regime of high magnifications, when the object is located at the field edge. The quadrupole potential distribution [Eq. (250)] shows that the deviation from the linear field distribution in the central part of a lens begins to reveal itself from the sixth-order terms; the effect of the electrode profiles on the image quality, therefore, becomes evident when fifth-order aberrations are taken into account. This question has been considered by Ovsyannikova et al. (1968). Fifth-order aberrations have been calculated by the successive approximation method described in Section IV.B. In the xOz-plane, the third- and fifth-order coefficients of spherical aberration in a linear field are shown to be similar in value but opposite in sign. The contribution of the field nonlinearity to the fifth-order coefficient may be quite considerable, coinciding in sign with the third-order coefficient if $K_6/K_2 > 0$.
In some cases, quadrupoles are used to focus charged particles with a large variation of the initial velocities. The focusing quality is then much affected by the chromatic aberrations. A method for chromatic aberration calculation has been described in Section IV.C. The trajectory equations taking into account the initial energy spread $\Delta\varphi$ can be derived from Eqs. (99) by setting $\varphi(z) = \text{const}$. Then we have
(303)
Integrating Eqs. (303) by the successive approximation method, we obtain the expression for chromatic aberration in the Gaussian line-image plane:
$$\Delta x_i = \Delta\beta^2\, x_\beta(z_i)\int_{z_0}^{z_i} u(z)\,x_\alpha(z)\left[x_0 x_\beta(z) + x_0' x_\alpha(z)\right]dz. \qquad (304)$$
This has been generalized to the case of orthogonal systems and relativistic velocities in the work of Hawkes (1965a). The chromatic aberration coefficients $C_{cx}$ and $C_{Mx}$ are defined by the first expression in Eqs. (100). Since the linear magnification $M_x$ is equal to $M_x = x_\beta(z_i)$ and $\Delta\beta^2/\beta^2 = -\Delta\varphi/\varphi$, we obtain

$$C_{cx} = -\beta^2\int_{z_0}^{z_i} u\,x_\alpha^2\,dz, \qquad C_{Mx} = -\beta^2\int_{z_0}^{z_i} u\,x_\alpha x_\beta\,dz. \qquad (305)$$
The coefficient of the axial chromatic aberration $C_{cx}$ can be reduced to the following form by integration by parts and by using the paraxial equation:
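A sketch of how such a reduction can proceed is given below. It is an illustration under stated assumptions, not a quotation of the original formula: the coefficient is taken in the integral form $C_{cx} = -\beta^2\int u\,x_\alpha^2\,dz$, the paraxial equation is $x_\alpha'' = -\beta^2 u\,x_\alpha$, and the solution $x_\alpha$ is assumed to vanish both in the object plane $z_0$ and in the image plane $z_i$.

```latex
C_{cx} = -\beta^2 \int_{z_0}^{z_i} u\,x_\alpha^2\,dz
       = \int_{z_0}^{z_i} x_\alpha''\,x_\alpha\,dz
       = \Big[x_\alpha'\,x_\alpha\Big]_{z_0}^{z_i} - \int_{z_0}^{z_i} x_\alpha'^2\,dz
       = -\int_{z_0}^{z_i} x_\alpha'^2\,dz \;<\; 0 .
```

Under these assumptions the coefficient is the negative of an integral of a squared quantity, so it cannot vanish for any field distribution that actually deflects the ray.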
It is clear from this that the chromatic aberration of electrostatic quadrupoles does not vanish for any field distribution. The value $C_{cx}$ is always negative, which means that higher energy particles are focused farther from the lens. No conclusion can be made about the sign of $C_{Mx}$ in the general case. Unlike round lenses, for which there is a strict proof that achromatic lenses cannot be designed, an achromatic quadrupole can be designed by combining electrostatic and magnetic fields (Kel’man and Yavor, 1961). To prove this, we shall consider a compound quadrupole lens, in which the poles are rotated through 45° relative to the electrodes (Fig. 55). The paraxial trajectories in such a lens in the relativistic case are described by Eqs. (35), in which we should
FIG. 55. An achromatic quadrupole lens: (1) magnetic poles; (2) electrodes.
set $\varphi(z) = \text{const}$, $\varphi'(z) = \varphi''(z) = 0$. Then Eqs. (35) take the form

$$x'' + \left(\beta_E^2 - \beta_M^2\right)u(z)x = 0, \qquad y'' - \left(\beta_E^2 - \beta_M^2\right)u(z)y = 0. \qquad (307)$$
Here $\beta_E^2$ means the excitation of the electrostatic lens [see Eq. (269)], and $\beta_M^2$ is the excitation of a magnetic lens:
The field distribution $u(z)$ along the z-axis is taken to be the same in the electrostatic and magnetic quadrupoles; $\pm W$ are the scalar magnetic potentials at the poles; $R_M$ is the aperture radius of the magnetic quadrupole; and $N_2$ is the coefficient related to the pole profiles and is introduced like the coefficient $K_2$ in the electrostatic lenses. It should be emphasized that, in contrast to Eq. (35) for dimensionless coordinates, the coordinates in Eqs. (307) are not related to the lens aperture radius. It is clear that the trajectory in the first approximation will not depend on the particle energy near a certain accelerating potential, $\varphi = \varphi_0$, if
here $\beta^2 = \beta_E^2 - \beta_M^2$. From this, we obtain the relationship between the electrostatic potential on the electrodes $V$ and the scalar magnetic potential at the poles $W$:
where $R_E$ is the aperture radius of the electrostatic quadrupole. The polarity of the potentials on the electrodes and poles must be such that the forces acting on a charged particle are oppositely directed. The polarity shown in Fig. 55 will correspond to an achromatic lens if positive particles move along the z-axis in the positive direction. For small energies, expression (310) is reduced to the form
Substituting Eq. (310) or (311) into Eqs. (269) and (308), we obtain the excitation $\beta^2$ of the achromatic lens. At small velocities $\beta^2 = -\beta_E^2 = -0.5\beta_M^2$. Figure 56 illustrates the dependence of the quadrupole excitations in a compound lens on the particle energy. The compound lens excitation (curve 2) has an extremum at $\varphi = \varphi_0$, which corresponds to an achromatic lens.
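The low-velocity relation between the excitations can be checked with a short derivation. It rests on an assumption that is not quoted from Eqs. (269) and (308): in the nonrelativistic limit the electrostatic excitation scales as $1/\varphi$ and the magnetic excitation as $1/\sqrt{\varphi}$. The constants $A$ and $B$ are placeholders introduced only for this sketch.

```latex
\beta^2(\varphi) = \beta_E^2 - \beta_M^2 = \frac{A}{\varphi} - \frac{B}{\sqrt{\varphi}},
\qquad
\left.\frac{d\beta^2}{d\varphi}\right|_{\varphi_0}
  = -\frac{A}{\varphi_0^{2}} + \frac{B}{2\varphi_0^{3/2}} = 0
\;\Longrightarrow\;
\beta_E^2(\varphi_0) = \tfrac{1}{2}\,\beta_M^2(\varphi_0),
\qquad
\beta^2(\varphi_0) = -\beta_E^2(\varphi_0) = -\tfrac{1}{2}\,\beta_M^2(\varphi_0).
```

The stationary point of $\beta^2(\varphi)$ is the extremum of the compound-lens excitation, at which a small change of particle energy leaves the first-order focusing unchanged.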
FIG. 56. Excitations of (1) magnetic, (2) compound, and (3) electrical quadrupoles.
When the functions of the axial potential distribution for the magnetic and electrostatic quadrupoles are different or the quadrupoles are not superimposed but are arranged in series, the chromatic aberration can also be corrected; however, the achromatic conditions are more complex. A doublet of achromatic lenses to form high-voltage ion probes was designed and tested by Martin and Goloskie (1982). A detailed calculation of electrostatic quadrupole aberrations, including chromatic and third-order geometrical aberrations, is made for the relativistic case by Fujita and Matsuda (1975). This work also considers the effect of fringing fields on the third-order properties. The results are presented in the form of third-order matrices, the products of which allow one to define the properties of arbitrary quadrupole systems.

To conclude this section, we should note that electrostatic quadrupoles find application in various fields of science and technology. Some applications have been described earlier; however, we shall give some more illustrative examples. Quadrupoles, unlike round lenses, are better combined with such electron optical elements as deflecting systems and spectrometers. The focusing properties of the latter are usually different in the deflection plane and the plane normal to it, so they must be combined with astigmatic lenses in order to obtain a stigmatic outgoing beam. Moreover, the fact that quadrupoles can independently vary the beam size in two mutually perpendicular directions permits optimal matching of the beam and spectrometer aperture in order to increase the spectrometer transmittance. The application of quadrupoles in combination with spectrometers has been analyzed by Petrov (1975).
FIG. 57. A doubled quadrupole used to rotate a line focus.
Systems for image analysis use astigmatic particle beams that form a line on the screen (line focus). In some cases it is necessary to move or rotate it. The line focus is shifted by deflecting systems; its rotation requires the use of two superimposed quadrupoles rotated through 45° with respect to each other. Usually, doubled magnetic quadrupoles are used (Bonshtedt and Markovich, 1967), but the use of electrostatic lenses is also possible. Figure 57 shows the design of a doubled electrostatic quadrupole that permits one to form a line focus rotated around its center. Potentials that vary sinusoidally in time and are phase shifted by $\pi/2$ relative to each other are applied to the electrodes of the individual quadrupoles. Let us suppose that at time $t = 0$ the potentials are nonzero only on the first lens and the line focus formed by the electron beam is oriented vertically. When $\omega t$ changes by $\pi/2$, the potentials become nonzero only on the second lens, and the line focus turns through 90° and acquires the horizontal position. At $\omega t = \pi$, it again becomes vertical. At the intermediate moments of time, when the potentials of both lenses are nonzero, an equivalent lens is formed, whose electrodes are rotated through an angle of 45°, the line focus occupying successively all the positions along the circle. In the paraxial region, the field of the equivalent lens does not differ from that of a standard mechanically rotated quadrupole. In the regions far from the axis, the fields are different, so the aberrational properties of the lenses are also different.

High-energy ion probes can be formed by quadrupole lenses. At ion energies $e\phi \le 1.0$-$2.0$ MeV, electrostatic quadrupoles are used. The work of Augustyniak et al. (1978) gives a detailed description of a triplet of electrostatic
quadrupoles used to focus 2 MeV ion beams produced by an electrostatic accelerator. The beam current is 10 nA, and the image size on the target is less than 25 μm. The triplet is 11 cm long, the aperture diameter is 2 mm, and the focal length is 10 cm for a singly charged ion energy of 1.5 MeV and electrode potentials of ±1 kV. The system forms both stigmatic and astigmatic images and has been used in some physical experiments, in particular, in studies of surface structure and backscattering.

Quadrupole lenses can also serve as aberration correctors, primarily for spherical aberration correction when they are used in combination with octupole elements. This question will be discussed in Section XII.C.
IX. TWO-DIMENSIONAL (CYLINDRICAL) LENSES

Two-dimensional (cylindrical) lenses, like round lenses, are among the oldest focusing elements in electron optics. The first term is associated with the type of field: in the region used, far from the edges, the field practically does not change in one of the directions, so it can be considered two-dimensional (or planar). As a result, there is no focusing in this direction, which makes the lenses similar to cylindrical glass lenses; hence the alternative name, cylindrical lenses. We shall mainly use the first term, because the term cylindrical is sometimes used to describe round lenses whose electrodes have the shape of circular cylinders.

The electrodes of two-dimensional lenses commonly have the form of plates with one dimension much larger than the other. They may be placed parallel to the beam axis, which coincides with the z-axis [Fig. 58(a)], or perpendicular to it [Fig. 58(b)]. In the first case, each electrode is formed by a pair of plates arranged symmetrically relative to the xOz-plane, with the same potential applied to them. In the second case, the electrode is a plate with a slit aperture. Since the gap size in the plane of symmetry xOz, called the midplane, is much larger than in the perpendicular direction, two-dimensional lenses are used to focus ribbon beams. Examples are ion sources and prism analyzers. It will be shown later that the optical properties of two-dimensional lenses in the plane normal to the midplane are very much like those of a round lens. However, because of the absence of focusing in the midplane, they are less common and have not attracted much attention.

Two-dimensional lenses are classified, like round lenses, into immersion and einzel lenses. In the first group the minimum number of electrodes is two; in the second group, three. Two-dimensional lenses can also form zoom lenses.
An important advantage of a two-dimensional lens is the relatively simple analytical and numerical calculation of the field. The field of a two-dimensional lens with rather complicated electrode profiles can be calculated using the method of conformal transformations developed for planar problems. This section describes the general properties of two-dimensional lenses, including the potential distribution, paraxial characteristics, and aberrations (IX.A), and gives illustrations of various types of two-dimensional lenses and their parameters (IX.B).

FIG. 58. Modifications of einzel two-dimensional lenses.

A. Optical Properties of Two-Dimensional Lenses
Since the field remote from the lens edges is independent of the coordinate x, we can derive from the Laplace equation the following relation between the coefficients in the potential expansion of (33):

Here, as in a round lens, the field in all space is determined by its axial distribution. Considering Eqs. (312), the potential expansion of a two-dimensional lens can be written as follows:
$$\varphi(y,z) = \phi(z) - \frac{1}{2}\phi''(z)\,y^2 + \frac{1}{24}\phi^{\mathrm{IV}}(z)\,y^4 - \cdots \qquad (313)$$
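The structure of the expansion (313) can be checked on a simple case. The sketch below is only an illustration: it assumes a hypothetical axial potential $\phi(z) = z^4$ (not a lens field from the text), for which the planar continuation symmetric in y is the real part of $(z + iy)^4$, and compares it with the series truncated after the $y^4$ term. The function names are illustrative.

```python
# Check of the planar series expansion (313) on a hypothetical axial potential.
# Assumption (not from the text): phi(z) = z**4, whose exact harmonic
# continuation symmetric in y is Re[(z + i*y)**4].

def phi_series(y, z):
    """Expansion (313) up to the y**4 term for phi(z) = z**4."""
    phi = z ** 4
    d2 = 12.0 * z ** 2   # second derivative of z**4
    d4 = 24.0            # fourth derivative of z**4
    return phi - 0.5 * d2 * y ** 2 + d4 * y ** 4 / 24.0

def phi_exact(y, z):
    """Real part of (z + i*y)**4, a solution of the planar Laplace equation."""
    return (complex(z, y) ** 4).real

for y, z in [(0.1, 0.5), (0.3, -1.2), (0.7, 2.0)]:
    print(y, z, phi_series(y, z), phi_exact(y, z))
```

For a quartic axial potential the series terminates, so the two columns agree to rounding error; for a general axial distribution the series is only an approximation near the midplane.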
Substituting Eq. (313) into Eq. (35) or (38), we obtain the paraxial trajectory equations, which in the nonrelativistic case have the form
$$\left(\sqrt{\phi}\,x'\right)' = 0, \qquad \phi\,y'' + \frac{1}{2}\phi'\,y' + \frac{1}{2}\phi''\,y = 0. \qquad (314)$$
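A numerical sketch may help to see how Eqs. (314) behave. The example below integrates both equations with a classical fourth-order Runge-Kutta scheme for a hypothetical einzel-type axial potential $\phi(z) = 1 + a\,e^{-z^2}$ (an assumption made only for illustration; the profile, the amplitude `a`, and all function names are not from the text). It reports the exit slopes for an accelerating ($a > 0$) and a retarding ($a < 0$) intermediate electrode.

```python
import math

def phi(z, a):
    """Hypothetical axial potential 1 + a*exp(-z**2), in units of the outer potential."""
    return 1.0 + a * math.exp(-z * z)

def dphi(z, a):
    return -2.0 * a * z * math.exp(-z * z)

def d2phi(z, a):
    return a * (4.0 * z * z - 2.0) * math.exp(-z * z)

def rhs(z, state, a):
    """First-order form of Eqs. (314); state = (x, x', y, y')."""
    x, xp, y, yp = state
    p, p1, p2 = phi(z, a), dphi(z, a), d2phi(z, a)
    xpp = -p1 * xp / (2.0 * p)                   # equivalent to (sqrt(phi) x')' = 0
    ypp = -(0.5 * p1 * yp + 0.5 * p2 * y) / p    # second equation of (314)
    return (xp, xpp, yp, ypp)

def trace(a, z_start=-6.0, z_end=6.0, steps=6000):
    """RK4 integration of a ray entering parallel to the axis in y, inclined in x."""
    h = (z_end - z_start) / steps
    z = z_start
    s = (0.0, 0.01, 1.0, 0.0)
    for _ in range(steps):
        k1 = rhs(z, s, a)
        k2 = rhs(z + h / 2, tuple(si + h / 2 * ki for si, ki in zip(s, k1)), a)
        k3 = rhs(z + h / 2, tuple(si + h / 2 * ki for si, ki in zip(s, k2)), a)
        k4 = rhs(z + h, tuple(si + h * ki for si, ki in zip(s, k3)), a)
        s = tuple(si + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                  for si, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4))
        z += h
    return s

for amplitude in (0.3, -0.3):
    x, xp, y, yp = trace(amplitude)
    print(amplitude, "exit x' =", xp, "exit y' =", yp)
```

In this sketch the potentials on both sides are equal, so the x-slope returns to its initial value, while the y-slope of the initially parallel ray ends up negative for either sign of the amplitude.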
The origin of the potential is the plane in which the particle velocity is zero. The first of equations (314) can be easily integrated to obtain
$$x' = x_0'\sqrt{\phi_0/\phi}. \qquad (315)$$
(Here the subscript 0 denotes the initial values.) Because no forces act along the x-axis, the two-dimensional lens does not form an image in the plane parallel to the xOz-plane. However, it is clear from Eq. (315) that for $\phi \neq \phi_0$ the slope $x'$ does change, owing to the changing longitudinal velocity of the particle. The sign of $x'$ remains unchanged, so a divergent incident beam remains divergent after it has passed through the lens (in the xOz-plane). One can conclude from Eq. (315) that the degree of refraction of the trajectory varies with the particle energy. This fact served as a basis for designing some spectrometers known as prism spectrometers (see Kel'man and Yavor, 1968).

The second equation in (314) differs from the paraxial equation of a round lens only in the coefficient of the third term (1/2 instead of 1/4 in Eq. (192)). Therefore, a two-dimensional lens focuses charged particles in one direction, parallel to the y-axis, and the image of a point object is a line perpendicular to the yOz-plane. The line length varies with the beam divergence in the direction parallel to the x-axis.

We shall show that a two-dimensional lens cannot be divergent (see Pierce, 1954). We shall integrate the second equation of (314) with respect to the z-coordinate from $z_1$ to $z_2$, assuming that the first limits the field on the object side and the second limits it on the image side. Then we shall have
Integrating by parts the first right-hand term, we obtain
Since the field is absent beyond the integration range, $\phi'(z_1) = \phi'(z_2) = 0$ and so Eq. (316) takes the form
The y-coordinate is assumed to be positive on the left of the lens and does not change its sign within this limit, that is, the trajectory does not intersect the axis. It follows, then, from Eq. (318) that the value of y’ decreases within the lens field, making the lens always convergent. This statement has been proved for the relativistic case by Glikman et al. (1967a). The second equation of (40) for a two-dimensional lens, taking into account (312), can be written as follows:
where $Y = y\phi^{1/4}$. By integrating Eq. (319) in the thin-lens approximation, keeping in mind that $\phi'(z_1) = \phi'(z_2) = 0$, we obtain the following formula for the focal length in image space:
The integration limits have been replaced here by $\pm\infty$, because this does not change the final result. The focal length in object space can be derived from Eqs. (320) and (57). One can see that a two-dimensional lens is stronger than a round lens; their optical powers differ by a factor of two in the thin-lens approximation. For numerical calculations of the optical properties of a two-dimensional lens, the paraxial equation should be reduced to a form that does not contain the second derivative of the potential distribution. This can be done using the following substitution:
$$\eta = y\,\phi^{1/2}. \qquad (321)$$
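Then the second equation of (314) will take the form
$$\eta'' - \frac{\phi'}{2\phi}\,\eta' + \frac{1}{2}\left(\frac{\phi'}{\phi}\right)^{2}\eta = 0. \qquad (322)$$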
Equation (322) contains only the first derivative of the axial potential distribution, which can be determined with a satisfactory accuracy from numerical calculations or experimentally. Therefore, the calculation of the
paraxial optical properties of two-dimensional lenses using Eq. (322) provides a greater accuracy than Eqs. (314).

Since a two-dimensional lens forms a line image of a point object, only the aberration of its width $\Delta y_i$ is of interest. The geometrical and chromatic errors of imaging in such a lens can be calculated by the method described in Section IV using the formulae given there. The geometrical distortions are given by the second expression in (90), which has no $x_0$-containing terms, because the image quality of a two-dimensional lens does not depend on the position of an object point relative to the x-axis. Thus, the lens in question is described by six coefficients of the third-order geometrical aberrations; four coefficients correspond to trajectories lying in a plane parallel to the yOz-plane. It has been shown by Rheinfurth (1955) that the spherical and chromatic aberration coefficients have the same signs as the corresponding coefficients in round lenses; they do not vanish for any field distribution. The expressions for the geometrical aberrations of two-dimensional lenses are given by Hawkes (1966/67) in the form of quadratures.
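The argument that a two-dimensional lens is always convergent can be retraced explicitly. The following is a sketch of one way to carry out the integration, assuming that the second equation of (314) is first divided by $\phi$ and that the term containing $\phi''$ is the one integrated by parts; the intermediate expressions may differ in form from the authors' Eqs. (316)-(318).

```latex
\begin{align*}
y'(z_2) - y'(z_1)
  &= -\int_{z_1}^{z_2} \frac{\phi'}{2\phi}\,y'\,dz
     - \int_{z_1}^{z_2} \frac{\phi''}{2\phi}\,y\,dz, \\
\int_{z_1}^{z_2} \frac{\phi''}{2\phi}\,y\,dz
  &= \left[\frac{\phi'\,y}{2\phi}\right]_{z_1}^{z_2}
     - \int_{z_1}^{z_2} \frac{\phi'}{2\phi}\,y'\,dz
     + \frac{1}{2}\int_{z_1}^{z_2} \left(\frac{\phi'}{\phi}\right)^{2} y\,dz, \\
y'(z_2) - y'(z_1)
  &= -\frac{1}{2}\int_{z_1}^{z_2} \left(\frac{\phi'}{\phi}\right)^{2} y\,dz
  \qquad \text{for } \phi'(z_1) = \phi'(z_2) = 0 .
\end{align*}
```

Because the integrand is non-negative as long as $y > 0$ within the field, the slope can only decrease, whatever the sign of the potential difference between the electrodes.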
B. The Parameters of Some Two-Dimensional Lenses

Our knowledge about most of the standard lenses is sufficiently complete. The optical properties of a large number of two-dimensional lenses formed by electrodes parallel and normal to the beam axis (Fig. 58) have been found by numerical calculation of the potential distribution, using the charge density method (Harting and Read, 1976). Two classes of two- and three-electrode lenses have been studied: immersion and einzel lenses. The work cited above gives the parameters of three-electrode immersion lenses in which the image position is not shifted when the beam energy changes (zoom lenses).

For illustration, we shall describe the parameters of some typical lenses. Figure 59 shows the curves relating the object and image positions, P and Q, respectively, in an immersion accelerating lens formed by two pairs of plates (the reference point for the P and Q values is the geometrical lens center; they lie to the left and to the right of it, respectively). The gap width between the electrodes is s = 0.1a, where a is the distance between the parallel plates. The same figure will represent the parameters of a retarding lens if the particle beam is oppositely directed. In this case, P and Q will be interchanged and M will correspond to the reciprocal magnification.

The paraxial characteristics of a three-slit two-dimensional einzel lens are shown in Fig. 60. The distance between the slit diaphragms is h = 0.5a; here a is the slit width. Figure 60(a) corresponds to the retarding potential on the intermediate electrode ($V_2 < V_1$); Fig. 60(b) corresponds to the accelerating electrode ($V_2 > V_1$). The work we have cited also presents data on the spherical
FIG. 61. A two-dimensional einzel lens with a thick intermediate electrode: (a) lens section by the yOz-plane; (b) relationship between the positions of the object P and image Q: $h_1/a_2 = h_2/a_2 = h_3/a_2 = 0.5$; $a_1/a_2 = a_3/a_2 = 0.5$.
aberrations of two-dimensional lenses. A comparison with round lenses shows that the spherical aberration of slit lenses in the yOz-plane is similar in order of magnitude to the aberration of lenses formed by circular apertures (for the same electrical and geometrical parameters).

A thick intermediate electrode is used to increase the optical power of a three-slit einzel lens. Such a modification also allows the spherical aberrations to be reduced. The calculation of this type of lens [Fig. 61(a)] has been made for a wide variation of the intermediate electrode thicknesses, interelectrode distances, and gap widths (Afanas'ev et al., 1975). The field has been found by approximate calculations based on the series expansion of the potential in each electrode slit in a complete set of orthonormalized functions. In the interslit space, Dirichlet's problem can be solved exactly, and the unknown expansion coefficients can be found from the continuity condition of the normal potential derivative in each slit. This method provides a high accuracy of the results with minimum computation. The lens parameters were calculated by numerical integration of the trajectory equations. The calculations of the paraxial properties are represented as curves of equal magnifications and equal excitations [Fig. 61(b)]. By the excitation $\beta$ we here understand the ratio of the electrode potential difference to the accelerating voltage, $\beta = (V_2 - V_1)/V_1$. The calculations were made for seven lens geometries; the excitation, object, and image positions varied widely. Figure 61(b) illustrates the relationship between the image position Q
and the object position P measured from the lens center to the right and to the left, respectively. The geometrical parameters of the lens are given in the figure caption. Note that if the object and image were in the lens field, the values of P and Q were determined from their real position, and not from the apparent position seen from the outside (asymptotic position). Because of the lens symmetry, the plots for the positive and negative excitations are given in the same picture, one half for each. So, if Q > P for $\beta > 0$ or Q < P for $\beta < 0$, the parameters can be found by interchanging $P \leftrightarrow Q$ and $M \rightarrow 1/M$. The data analysis shows that the lens optical power grows with the distance between the electrodes and with the intermediate electrode thickness, as well as with decreasing slit widths at the end electrodes.

The same authors have calculated the coefficients of spherical and chromatic aberrations. The spherical aberration is shown to decrease with increasing slit width of the end electrodes. The effect of the other dimensions is less significant. Two-dimensional lenses with the retarding intermediate electrode have, like round lenses, much greater spherical aberration than lenses with the accelerating electrode (at the same optical power). In the region of $P \geq 5a_2$, the spherical aberration coefficients can be well approximated (error within 10%) by the formula
The work of Alexandrov et al. (1977) catalogs the parameters of einzel slit lenses with thick intermediate electrodes. The lenses are symmetrical relative to the plane passing through the middle of the intermediate electrode. The potential distribution was obtained by numerical integration of the Laplace equation or by modeling on a resistance network. The optical parameters were found by numerical integration of the trajectory equations. The catalog was made for a wide range of geometrical parameters of the lens, object plane positions, and electrode potential ratios. It includes the following lens characteristics: the cardinal elements, linear and angular magnifications, the Gaussian image positions, maximum trajectory deviations from the axis, as well as the coefficients of spherical and chromatic aberrations.

The fields of two-dimensional lenses formed by pairs of parallel plates have been carefully studied because of the relative simplicity of analytical calculations. The potential distribution in two- and three-electrode lenses with an equal distance a between the parallel plates [Figs. 62(a), (b)] was calculated by the method of separation of variables (Tsyrlin, 1977), based on the assumption that the gaps between the electrodes were small and the plates of the outer electrodes were semi-infinite. The Fourier integrals obtained were transformed into series by the residue theorem. In both cases, the series can be
summed to obtain the solution in closed form. The existence of the closed form is accounted for by the possibility of solving the problem by means of a conformal transformation based on an elementary function. For a two-electrode lens [Fig. 62(a)] the potential distribution has the form
$$\varphi(y,z) = \frac{1}{2}(V_1 + V_2) + \frac{1}{\pi}(V_2 - V_1)\arctan\frac{\sinh(\pi z/a)}{\cos(\pi y/a)}. \qquad (324)$$
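A quick numerical check of a closed-form potential of this kind is sketched below. It assumes the standard solution for two pairs of semi-infinite plates at $y = \pm a/2$ with a negligible gap at $z = 0$, with $V_1$ applied for $z < 0$ and $V_2$ for $z > 0$; this sign convention and the function names are assumptions made for illustration. The script verifies the electrode boundary values and estimates the planar Laplacian by finite differences.

```python
import math

def potential(y, z, v1=0.0, v2=1.0, a=1.0):
    """Closed-form potential between two pairs of semi-infinite plates at y = +/- a/2.
    Assumed convention: V1 on the plates at z < 0, V2 at z > 0, negligible gap at z = 0."""
    u = math.atan2(math.sinh(math.pi * z / a), math.cos(math.pi * y / a))
    return 0.5 * (v1 + v2) + (v2 - v1) * u / math.pi

def laplacian(y, z, h=1e-3):
    """Five-point finite-difference estimate of the planar Laplacian."""
    c = potential(y, z)
    return (potential(y + h, z) + potential(y - h, z)
            + potential(y, z + h) + potential(y, z - h) - 4.0 * c) / (h * h)

print("on the right-hand plate:", potential(0.5, 1.0))
print("on the left-hand plate: ", potential(0.5, -1.0))
print("at the lens center:     ", potential(0.0, 0.0))
print("Laplacian estimate:     ", laplacian(0.2, 0.3))
```

Setting y = 0 in `potential` gives the axial distribution, which is the only input needed for the paraxial equations.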
For a three-electrode lens the potential distribution is written as the sum of the symmetrical and antisymmetrical functions:
$$\varphi(y,z) = \varphi_{+} + \varphi_{-}. \qquad (325)$$
These functions are expressed as follows:
where
The potential distribution on the z-axis can be obtained by setting y = 0 in Eqs. (324), (328), and (329).

Two-dimensional lenses can be designed in which each pair of plates remains symmetrical relative to the xOz-plane but the distances between the plates in the pairs are different [Figs. 62(c), (d)]. For instance, Glikman and Yakushev (1967) used the method of conformal transformations to calculate the potential distribution in the three-electrode lens shown in Fig. 62(d). The authors used the Christoffel-Schwarz method; the result is given in parametric form.

The three-electrode two-dimensional lenses represented in Fig. 62(b) are employed in prism spectrometers as separate elements of these devices or as part of electrostatic prisms. As a rule, they operate in the telescopic mode, in which a parallel incident beam remains parallel after it has passed through the field. Since two-dimensional lenses are always convergent, the telescopic regime can be provided only by creating an intermediate focus inside the system. For given geometrical parameters, this mode is obtained by suitable selection of the potentials on the electrodes. The work of Glikman et al. (1967b) presents the calculations for a three-electrode lens operating in the telescopic mode and having equal distances between the parallel electrode plates. The relationships between the electrode potentials, cardinal elements, positions of the intermediate focus, and magnifications of the beam cross sections have been found and presented in numerous plots. When a two-dimensional system is used in a prism and performs the focusing and energy separation of the beam, the charged particles are directed onto the system at a large angle to the yOz-plane.
To determine the position of the cardinal elements in lenses with oblique beams, one can use the formulae and plots for a normal beam, in which the potential $\varphi(y,z)$ is replaced by the value $\varphi^{*}(y,z) = \varphi(y,z) - \phi_0\sin^2\delta$, where $\delta$ is the angle formed by the beam axis and the yOz-plane.

The potential distribution in a three-electrode einzel lens was calculated by Vukanić et al. (1976), taking into account the gap widths between the electrodes; the calculation was made by separation of variables. For the potential distribution on the z-axis, the following simple expression was obtained based
on the assumption that the interelectrode gaps were small:
(330)
For finite interelectrode gaps s, the expression for the axial field E was found in closed form:
$$\cdots - \arctan\frac{\sinh(\pi z/a)}{\cosh(\pi(l+s)/a)} \qquad (331)$$
A comparison with experimental data obtained by measuring the field in an electrolytic tank showed good agreement. These analytical expressions for the field distribution were used by Ćirić et al. (1976b) to calculate a few hundred trajectories in the lens in question and to find its cardinal elements for a wide range of geometrical parameters and electrode potential ratios. The authors analyzed only lenses with a retarding intermediate electrode, as these are the most commonly used. The results were presented in the form of plots, from which it is clear that the optical power grows with increasing interelectrode gap in the range under study (s/a from 0 to 0.2) for constant intermediate electrode length.

The problem of finding the field between two infinite planes parallel to each other with an arbitrary potential distribution was solved by De Wolf (1978). In another work (De Wolf, 1981), the same problem was considered for a symmetrical electrode potential distribution that is stepwise constant in rectangular areas. A semianalytical method using Green's functions was suggested. The method can be applied in approximate calculations of the field between parallel plates of finite size. Thus, for the lenses discussed in this chapter, it is possible to determine the deviation of the field from the two-dimensional pattern caused by the finite size of the electrodes and to estimate the effect of this factor on the lens optics.
X. TRANSAXIAL LENSES

The term transaxial has been introduced to describe lenses in which the field has axial symmetry but the beam axis does not coincide with the symmetry axis; the optic axis is instead normal to the latter (Strashkevich, 1962). Such lenses are represented schematically in Fig. 63. They can be formed from a series of coaxial circular cylinders with ring-like slits [or parts of such
FIG. 63. Modifications of immersion transaxial lenses: (a) cylindrical electrodes; (b) planar electrodes.
cylinders, see Fig. 63(a)], with the beam axis lying in the plane of symmetry normal to the generating lines of the cylinders. In another modification [Fig. 63(b)], which has turned out to be of greater practical interest, the lens is formed from pairs of parallel plates with circular gaps between the pairs. In this case the beam axis lies in the plane of symmetry parallel to the electrodes.

In a transaxial lens, as in a two-dimensional lens, the aperture has very different dimensions in two mutually perpendicular directions (in the planes xOz and yOz; see Fig. 63). In the xOz-plane, which is usually called the midplane, the aperture is large, and the size of the beam focused in this plane may also be large. For this reason, such lenses are convenient for handling disc beams, as well as fan-shaped beams. Unlike the two-dimensional lens, the transaxial lens has a nonzero optical power in all the planes of symmetry. Such a lens converges charged particles in two perpendicular directions. However, its optical power in the midplane is considerably lower than that in the normal direction. As a result, a transaxial lens possesses, as a rule, considerable astigmatism.

If a transaxial lens has one circular gap, it represents an immersion lens; if there is more than one gap, it may be an einzel lens. In the midplane the beams usually subtend an angle of less than 360°, so the electrodes need not be axially symmetric over the whole region. Their angular size is selected so as to provide the axial symmetry of the field with sufficient accuracy in the region through which the beam passes.

In a sense, the transaxial lens is closer to a glass lens than any other type of electrostatic lens. In the midplane, its equipotentials have the form of concentric circles (or their segments); the radius of curvature and the electron optical refractive index can be regulated independently. In the midplane, this
lens may be convergent or divergent; its spherical and chromatic aberrations can be corrected. An advantage of transaxial lenses is the considerable simplicity of calculation of the electron optical properties in the midplane. It should be emphasized that the transaxial lens, unlike other types of lenses, has no magnetic analogue. Although the lenses in question are comparatively new, they have attracted much attention due to their properties and have already been applied in electrostatic prism spectrometers to improve their characteristics (Kel'man et al., 1979). The paraxial optics of transaxial lenses will be considered in Section X.A, their aberrations will be considered in Sections X.B and X.C, and some modifications of these lenses will be described in Section X.D.

A. Potential Distribution and Focusing in Transaxial Lenses
The theoretical foundations of transaxial lenses have been laid by Brodsky and Yavor (1970, 1971) and by Karetskaya et al. (1970, 1971); the first two also take into account the relativistic effect. Here we shall restrict ourselves to a description of the optical properties of transaxial lenses transmitting low-energy beams. Let us consider the field distribution in the vicinity of the lens optic axis, which represents, as usual, the line of intersection of two planes of field symmetry. We shall locate the origin at the center of curvature of the circular gaps between the electrodes and direct the z-axis along the lens axis. Because the lens field at a sufficiently large distance from the edges possesses rotational symmetry, the potential expansion coefficients in Eq. (33) in the vicinity of the lens axis are not independent, but are interrelated by the following expressions obtained from the Laplace equation:
One can see from Eq. (332) that all the coefficients are expressed in terms of the potential distribution on the axis $\phi(z)$, which completely determines the field distribution throughout the space. Considering these relations, we can obtain from Eq. (38) the paraxial trajectory equations:
In the relativistic case, the second and third terms of Eqs. (333) contain an additional factor
where $\varepsilon = -e/2mc^2$. As before, the potential is taken to be zero at the point where the particle velocity is zero.

A comparison with the paraxial trajectory equations for a two-dimensional lens (314) shows that Eqs. (333) contain new terms with the factor 1/z. These are responsible for the particle focusing in the xOz-plane and produce a certain difference between the optical properties of two-dimensional and transaxial lenses in the yOz-plane. However, for a large gap curvature radius, this difference is only slight, and for practical applications the first-order properties of transaxial lenses in the yOz-plane can be found using the formulae and data for two-dimensional lenses. If a more accurate calculation of paraxial properties in the direction parallel to the yOz-plane is necessary, they should be found from the second expression of Eqs. (333) by the conventional method described in Section III.

The second equation of (40), taking into account Eq. (332), can be written as follows:

where $Y = y\phi^{1/4}$. Equation (335) differs from the analogous Eq. (319) for a two-dimensional lens in the additional term on the right-hand side. By integrating Eq. (335) in the thin-lens approximation and taking into account $\phi'(z_1) = \phi'(z_2) = 0$, we obtain the following expression for the focal length in image space:
Here the coordinate z was taken to be constant within the lens and equal to R, the radius of curvature of the refractive layer.

Of primary importance is the focusing parallel to the midplane, which is described by the first of Eqs. (333). Its solution can be reduced to quadratures as follows:
Here the subscript 0 designates the initial values of the corresponding parameters. Expression (337) completely defines the trajectory projections onto the midplane, and it can yield the values for all the cardinal elements in this plane.
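The reduction to quadratures can be sketched as follows. The derivation assumes only that the midplane potential depends on the distance $r = \sqrt{x^2 + z^2}$ from the center of curvature, so that near the axis $\partial\varphi/\partial x \approx \phi'(z)\,x/z$; the resulting expressions are a reconstruction under this assumption and may differ in form from the authors' Eqs. (333) and (337).

```latex
\begin{align*}
& x'' + \frac{\phi'}{2\phi}\left(x' - \frac{x}{z}\right) = 0, \\
& x = z\,u: \qquad \left(z^{2}\sqrt{\phi}\;u'\right)' = 0, \\
& x(z) = z\left[\frac{x_0}{z_0}
   + \left(x_0'\,z_0 - x_0\right)\sqrt{\phi_0}
     \int_{z_0}^{z}\frac{d\zeta}{\zeta^{2}\sqrt{\phi(\zeta)}}\right].
\end{align*}
```

The particular solution $x \propto z$ is the straight ray through the center of curvature, which is not deflected; the second solution follows from it by one integration. In field-free regions the integrand has the antiderivative $-1/(\zeta\sqrt{\phi})$, which is the form to use when the integration range contains the point $\zeta = 0$.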
The relationship between the coordinates z of a point object lying on the lens axis and its image can be obtained from Eq. (337) by setting $x_0 = 0$ and $x(z_i) = 0$. Then we have
From expression (338) it is easy to determine the positions of the foci in object space $z(F_{0x})$ and in image space $z(F_{ix})$ by setting $z_i \to \infty$ or $z_0 \to -\infty$, respectively. Then we shall get
For a trajectory parallel to the z-axis in object space, from Eq. (337) we find:
Hence, taking into account Eqs. (50) and (57), we shall find the focal lengths in the midplane of a transaxial lens
From comparison of Eqs. (339) and (341), one can see that
$$z(F_{0x}) = -f_{ix}, \qquad z(F_{ix}) = f_{0x}. \qquad (342)$$
The positions of the principal planes can be found using expression (49) as well as (339) and (341):
It is clear that the principal planes $H_{0x}$ and $H_{ix}$ of the lens coincide, so it can be regarded as a thin lens.

Let us consider an immersion lens formed from a pair of electrodes separated by a narrow circular gap with radius R. When the gap width is much smaller than its radius of curvature, the integral in expression (343) can be found in the thin-lens approximation. Taking z to be constant and equal to R within the field, we can factor it outside the integral sign to obtain the expression
(344)
One can see that the position of the principal planes coincides with the z-coordinate of the gap if the latter is sufficiently narrow. Using Eq. (344), we can rewrite expressions (338), (339), and (340) in a form analogous to the formulae of glass optics for a cylindrical boundary between two media. These expressions will completely coincide if we introduce the electron optical refractive index $n = \sqrt{\phi}$.

When the interelectrode gap is not narrow, or when the lens contains several concentric gaps, one can introduce the effective radius of curvature $R_e = z(H_x)$, which is defined by expression (343). Then the formulae of paraxial optics in the midplane will also coincide, in the general case, with those of glass optics for a cylindrical surface. Expression (338) will take the form:
Correspondingly, the expressions for the positions of foci and focal lengths can be written as follows:
It follows from this that a transaxial lens with concentric gaps can be replaced by one refractive surface with a radius of curvature equal to $R_e$ and potentials $\phi_0$ and $\phi_i$ on either side of it. It is clear from Eq. (346) that the sign of the focal lengths depends on the sign of the effective radius of curvature and on the potential ratio in object and image space. By varying these values, we can easily pass from a converging lens to a diverging one and vice versa. For instance, if it is necessary to change the sign of the focal length in an immersion lens with one gap, leaving the potential ratio $\phi_i/\phi_0$ constant, we should merely change the direction of the gap curvature relative to the beam direction. The magnification of a transaxial lens is given by the expression
$$M = \frac{z_i}{z_0}. \qquad (347)$$
It is easy to see this if we draw a trajectory through the center of curvature and the object point and recall that such a trajectory is not refracted by a transaxial lens. For a lens formed from a few pairs of electrodes separated by narrow concentric gaps, the effective radius of curvature can be obtained by dividing
the integration interval in Eq. (343) into the corresponding number of parts
Here $R_k$ is the radius of curvature of the k-th gap; $\phi_k$ and $\phi_{k-1}$ represent the electrode potentials on the right and on the left of this gap, respectively; and N is the number of gaps, with $\phi_N = \phi_i$. When deducing this formula, it was assumed that the distance between the gaps was much larger than the gap width and the interelectrode distance in the direction normal to the midplane. The optical power of a transaxial lens was calculated by Brodsky and Yavor (1970), assuming that the gap was not narrow and that the potential in the refractive layer was distributed linearly.

The above formulae cannot be used directly to describe multielectrode transaxial systems with nonconcentric gaps. Such a system should be subdivided into elements in which all the gaps are concentric; the optical parameters of each element should be calculated individually, and then the parameters of the system as a whole should be determined by the methods of matrix algebra described in Section III.D. Evidently, this procedure is valid only if the fields of the neighboring gaps do not overlap and the subdivision into elements is justified. If, however, the fields of nonconcentric gaps overlap, the axial symmetry of the system is disturbed, so it cannot be regarded as transaxial. Nonconcentric systems possess greater flexibility, but their calculation is more sophisticated (Nevinny et al., 1985).
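The glass-optics analogy for the midplane can be sketched in a few lines. The example below is not the authors' formulae verbatim: it assumes the refractive index $n = \sqrt{\phi}$, narrow concentric gaps, coordinates measured from the common center of curvature, and the single-surface relation $1/(n_i z_i) - 1/(n_0 z_0) = (1/R_e)(1/n_i - 1/n_0)$ that follows from these assumptions. All function names are illustrative.

```python
import math

def midplane_power(radii, potentials):
    """Sum over narrow concentric gaps of (1/n_k - 1/n_{k-1}) / R_k.
    potentials = (phi_0, ..., phi_N) has one more entry than radii;
    the refractive index is assumed to be n = sqrt(phi)."""
    n = [math.sqrt(p) for p in potentials]
    return sum((1.0 / n[k + 1] - 1.0 / n[k]) / radii[k] for k in range(len(radii)))

def effective_radius(radii, potentials):
    """Effective radius R_e of the equivalent single surface (needs phi_N != phi_0)."""
    n0, ni = math.sqrt(potentials[0]), math.sqrt(potentials[-1])
    return (1.0 / ni - 1.0 / n0) / midplane_power(radii, potentials)

def image_position(z0, radii, potentials):
    """Image coordinate z_i from 1/(n_i z_i) = 1/(n_0 z_0) + midplane_power."""
    n0, ni = math.sqrt(potentials[0]), math.sqrt(potentials[-1])
    return 1.0 / (ni * (1.0 / (n0 * z0) + midplane_power(radii, potentials)))

# One gap at z = -1 (center of curvature to its right), potential ratio 4:1.
radii, potentials, z0 = [-1.0], [1.0, 4.0], -5.0
zi = image_position(z0, radii, potentials)
print("R_e =", effective_radius(radii, potentials))
print("z_i =", zi, " M = z_i / z_0 =", zi / z0)
```

For the single gap chosen here the result coincides with the usual refraction formula for one surface with indices 1 and 2, and several gaps of equal radius reduce to one surface of that radius.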
B. Geometrical Aberrations of Transaxial Lenses
Third-order geometrical aberrations of transaxial lenses can be calculated using Eqs. (85) by the method described in Sections IV.A and IV.B. In the general case, the aberration blurring is characterized by 10 coefficients in the direction parallel to the midplane and by 10 coefficients in the perpendicular direction. By choosing the aperture position, some of the coefficients can be reduced to zero. The calculation of the aberration blurring $\Delta x$ for transaxial lenses can be considerably simplified by writing the x-projection of a trajectory in terms of quadratures [see (335)]. Then the two linearly independent solutions for the x-projection will take the form
Expressions for all the aberration coefficients are given in the work of Kel'man et al. (1979).
FIG. 64. A diagram for the calculation of spherical aberration in the midplane of a transaxial lens.
An important property of a transaxial lens is the possibility of determining exactly its spherical aberration in the midplane without restriction to the third-order term. For this purpose, an original approach to the calculation of the lens optical properties has been developed that is based on axial field symmetry (Brodsky and Yavor, 1970, 1971). Let us assume that the lens field in the midplane is enclosed between the arcs of two concentric circles with the radii $R_1$ and $R_2$ (Fig. 64) and that the potentials on both sides of the lens are constant and equal to $\varphi_o$ and $\varphi_i$, respectively. Having passed from the Cartesian coordinates $x, z$ to the cylindrical coordinates $r, \psi$, we can write the exact expression for the trajectory in the midplane.
Here, the subscript $o$ marks the parameters on the left of the lens, the subscript $s$ marks the parameters on the right of it, and $h_o$ and $h_s$ represent the "impact" parameters of the incident and outgoing trajectory, respectively. In an axially symmetric field, the generalized angular momentum of a charged particle $P_\psi$ is constant:
$$P_\psi = m r^2 \dot{\psi} = \text{const}. \tag{351}$$
Hence, we have
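As a sketch of the step from the conserved $P_\psi$ to a relation between the impact parameters (assuming nonrelativistic motion, with the potential counted from the point of zero velocity so that the speed in each field-free region is proportional to $\sqrt{\varphi}$): outside the field the trajectory is a straight line, and its angular momentum about the lens center equals the momentum times the impact parameter.

```latex
P_\psi = m v_o h_o = m v_s h_s, \qquad v \propto \sqrt{\varphi}
\quad\Longrightarrow\quad
h_o \sqrt{\varphi_o} = h_s \sqrt{\varphi_i}.
```

This is the electron optical counterpart of Snell's law for a spherical boundary. It is offered as a consistency check under the stated assumptions, and the notation $v_o$, $v_s$ for the speeds is introduced here only for illustration.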
Let us denote by $z_s$ the exact distance between the lens center and the point at which an outgoing trajectory, emitted at an arbitrary angle from a point object located on the axis, intersects the lens axis. As before, the positions of the point object and its paraxial image will be denoted by $z_o$ and $z_i$. This notation has been retained, in spite of the use of cylindrical coordinates, in order to have a certain uniformity in the formulae of paraxial optics and spherical aberration. The exact value of the transverse spherical aberration is given by the expression
$$\Delta x_i = (z_i - z_s)\tan\gamma_s. \tag{353}$$
For $z_s$, we have from Eq. (352)
It can be seen from Fig. 64 that the angles $\gamma_s$ and $\gamma_o$ are related as follows:
$$\gamma_s = \gamma_o + \psi_2 - \psi_1 + \arcsin\!\left(\frac{h_s}{R_2}\right) - \arcsin\!\left(\frac{h_o}{R_1}\right). \tag{355}$$
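The exact relations above can be exercised numerically in the simplest case, a single narrow gap of radius $R$ treated as a refracting circle with index $\sqrt{\varphi}$. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function names, the choice of the far intersection with the circle (object on the concave side), and the sample values are hypothetical. It traces one midplane ray and returns the axis crossing $z_s$, the exit angle $\gamma_s$, and both signed impact parameters.

```python
import math

def trace_single_gap(z_obj, gamma_o, radius, phi_o, phi_i):
    """Trace a midplane ray through one narrow circular gap (illustrative model).

    The center of curvature is the origin, the gap is the circle r = radius,
    and the object sits on the axis at z = z_obj < 0. The potential jumps from
    phi_o to phi_i across the gap; refraction follows Snell's law with index
    sqrt(phi). gamma_o must be nonzero. Returns (z_s, gamma_s, h_o, h_s).
    """
    n1, n2 = math.sqrt(phi_o), math.sqrt(phi_i)
    dz, dx = math.cos(gamma_o), math.sin(gamma_o)

    # Far intersection of the ray (z_obj + t*dz, t*dx) with the circle.
    b = z_obj * dz
    disc = b * b - (z_obj * z_obj - radius * radius)
    if disc < 0.0:
        raise ValueError("ray misses the gap")
    t = -b + math.sqrt(disc)
    pz, px = z_obj + t * dz, t * dx

    # Unit normal along the radius; at the far intersection it points downstream.
    nz, nx = pz / radius, px / radius
    cos1 = dz * nz + dx * nx
    ratio = n1 / n2
    sin2_sq = ratio * ratio * (1.0 - cos1 * cos1)
    if sin2_sq > 1.0:
        raise ValueError("ray is reflected, not transmitted")
    cos2 = math.sqrt(1.0 - sin2_sq)

    # Vector form of Snell's law: scale the tangential part, rebuild the normal part.
    oz = ratio * (dz - cos1 * nz) + cos2 * nz
    ox = ratio * (dx - cos1 * nx) + cos2 * nx

    h_o = z_obj * dx             # signed impact parameter of the incident ray
    h_s = pz * ox - px * oz      # signed impact parameter of the outgoing ray
    gamma_s = math.atan2(ox, oz)
    z_s = pz - px * oz / ox      # axis crossing of the outgoing straight line
    return z_s, gamma_s, h_o, h_s

def transverse_aberration(z_i, z_s, gamma_s):
    """Eq. (353): Delta x_i = (z_i - z_s) * tan(gamma_s)."""
    return (z_i - z_s) * math.tan(gamma_s)

# Example in arbitrary units: R = 1, potentials 1 -> 4, object at z = -3.
z_i, _, _, _ = trace_single_gap(-3.0, 1e-6, 1.0, 1.0, 4.0)   # near-paraxial crossing
z_s, gamma_s, h_o, h_s = trace_single_gap(-3.0, 0.2, 1.0, 1.0, 4.0)
dx_i = transverse_aberration(z_i, z_s, gamma_s)
```

Taking the paraxial image position as the crossing point of a ray with a very small inclination is a numerical shortcut chosen for this sketch. The products $h_o\sqrt{\varphi_o}$ and $h_s\sqrt{\varphi_i}$ returned by the trace coincide, which is the conservation of $P_\psi$ in this model.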
Substituting the expression for the trajectory from Eq. (350) into Eq. (355) and integrating it by parts, we will have
Expression (353) for the spherical aberration, after substitution of $z_i$ and $z_s$ from Eqs. (338) and (354), will be written as follows:
(357) where
Since it has been assumed that the field is enclosed within the radii $R_1$ and $R_2$, but the object and image are outside the field, the extension of the integration limits in Eq. (358) will not change the result. Expressions (357) and (358) give the exact value of the spherical aberration. By series expansion of the expressions in powers of the small parameter $h_o$,
one obtains the spherical aberration with accuracy up to terms of any order:
Here
$$A_l = \frac{(2l-1)!!}{(2l)!!\,(2l+1)}\,\bigl[\,\cdots\,\bigr], \tag{360}$$
where $E_k$ are Euler numbers, and $B_k$ are Bernoulli numbers. The expression for the spherical aberration up to fifth-order accuracy has the form
$$\Delta x = \cdots\,\bigl[(\cdots)\,h_o^3 + (\cdots)\,h_o^5\bigr]. \tag{361}$$
Let us consider, as earlier, a lens formed from several pairs of electrodes separated by narrow concentric gaps. By integrating Eqs. (358) and (360) in the thin-lens approximation, we shall obtain an expression for a lens containing $N$ narrow gaps:
$$A_0 = \cdots, \qquad A_l = \frac{(2l-1)!!}{(2l)!!\,(2l+1)}\,\Bigl[\,\cdots\sum_{k=1}^{N}\cdots\,\Bigr]. \tag{362}$$
We shall write out the expression for $A_1$ because, together with $A_0$, it defines the third-order aberration
By choosing the values of the electrode potentials and the radii of circular gaps suitably, we can minimize the spherical aberration. If we restrict ourselves to third-order aberration, it is clear that a system with the desired first-order properties and compensated third-order spherical aberration can be obtained by combining the refractive layers, very much like the way in which this is done in glass optics.
C. Chromatic Aberration
The chromatic aberration of transaxial lenses can be found from expressions (99) by substituting the linearly independent solutions of paraxial equations (333) into them. Then the expression for the chromatic aberration has the form of Eq. (100). In the yOz-plane, the general expressions for linearly independent solutions of the paraxial equation in explicit form are unknown. For this reason, the aberrations are usually calculated by numerical methods or semianalytically, using some field distribution models. For the trajectory projection onto the midplane, the independent solutions are written as quadratures (349), and the expression for $\Delta x_i$ has the form (Karetskaya et al., 1970):
$$\Delta x_i = \cdots\,P\,\frac{\Delta\varphi}{\varphi_o}, \tag{364}$$
where
Here, as earlier, it has been assumed that the object is located on the concave side of the lens. For a parallel incident beam the expression in this case is
$$P = P_F = \cdots \tag{366}$$
Hence, we can derive an expression for an arbitrary position of the object
It follows from Eq. (364) that the coefficients of axial chromatic aberration and chromatic aberration of magnification are interrelated and can vanish simultaneously at $P = 0$. Moreover, as was pointed out in Section IV, the aberration of magnification can be corrected by choosing the aperture position suitably. In a transaxial lens the correction requires $z_B = 0$.
In a multielectrode transaxial lens with narrow concentric gaps, $P_F$ can be easily found:
From Eqs. (367) and (368), the conclusion can be drawn that the chromatic aberration in a lens formed by one refractive layer cannot be cancelled. One exception is the case when the object is at the center of curvature ($z_o = 0$); however, there is then no focusing. Let us consider a lens formed by electrodes with two concentric gaps. The expression for the effective radius of curvature, from Eq. (348), is
while $P_F$ is equal to
A lens consisting of two refractive layers has enough free parameters for it to be possible to cancel the chromatic aberration. For instance, for a parallel beam incident from the convex side, the condition of achromatism will take the form
Thus, the chromatic aberration in the midplane of a transaxial lens can be completely eliminated by selection of relations between the electrical and geometrical parameters of the lens, provided that it consists of at least two refractive layers.
D. Transaxial Lenses Formed by Parallel Plates
Here we shall discuss transaxial lenses in which the electrodes lie in two parallel planes and are separated by narrow circular gaps (Glikman et al., 1971). Such a lens having one gap is illustrated in Fig. 63(b). The field of a transaxial lens with planar electrodes was found by the method of separation of variables, assuming that the gaps between the electrodes are infinitesimal. For the case of two concentric gaps, the expression
is [in polar coordinates (r,z)]
Here $I_0$ and $I_1$ are the Bessel functions, $R_1$ and $R_2$ are the gap radii of curvature, $d$ is the distance between the planes on which the electrodes are located, and $\varphi_1$ is the intermediate electrode potential. In Cartesian coordinates [see Fig. 63(b)] the expression for the potential distribution along the optical axis has the form
$$\varphi(z) = (\varphi_o - \varphi_1)\,F(z, R_1) + (\varphi_1 - \varphi_i)\,F(z, R_2) + \varphi_i, \tag{373}$$
where
Because the lenses in question were designed to operate in prism spectrometers, two operational regimes were studied. In one regime the lens focuses the incident parallel beam in the midplane, leaving it parallel in the perpendicular plane; the latter is achieved by producing an intermediate focus in this plane. In the other regime (anamorphotic), the beam going out from a point source is transformed to a parallel beam in both directions. These regimes are provided by selection of the intermediate electrode potential at a given energy of the outgoing beam. The expression for the potential distribution (373) has been used to find the required values of the potential on the intermediate electrode, the cardinal elements, and the geometrical and chromatic aberrations. These data have been summarized in numerous plots and tables. It should be noted that they are in good agreement with the calculations obtained in the thin-lens approximation (see above). The transaxial lenses we have described have been used as collimating and focusing lenses in prism instruments, such as the electron Auger spectrometer (Bobykin et al., 1978) and the mass spectrometer (Kel'man et al., 1976). The design of transaxial lenses makes them suitable devices to combine with prism analyzers. Both transmit wide beams in the midplane, while in the perpendicular plane the beam cross section is very small. The astigmatism of transaxial lenses is necessary to match the particle sources with the prisms. An advantage is the low aberration of the lens in the midplane, which favors a higher resolving power of the device.
FIG. 65. Electron optical diagram of a prism Auger spectrometer: (1, 2, 3) prism electrodes; (3, 4, 5) transaxial lenses; (8) source; (9) detector.
Figure 65 shows a schematic diagram of an electron prism spectrometer. One can see that all the elements form one unit placed on two parallel planes. The parameters of the three-electrode transaxial lenses are as follows: the distance between the parallel electrodes is $d = 12$ mm; the average gap radius between electrodes 4 and 5 is $R_1 = 5d$, and between electrodes 3 and 4 it is $R_2 = 7d$; the circular gap width between the electrodes is $0.25d$. The lenses operate in the anamorphotic mode, and the ratio of the electrode potentials is $\varphi_4/\varphi_5 = 3.22$ at $\varphi_3/\varphi_5 = 0.16$. The focal lengths of the lenses differ strongly between the midplane and the plane normal to it. For the collimating lens $f_{ox} = 15.5d$ and $f_{oy} = 4.51d$; the focusing lens has the same focal lengths in image space. Due to this relation between the focal lengths, it is possible to use large divergence angles in the yOz-plane (normal to the midplane). The large focal length in the midplane increases the linear dispersion of the instrument. The basic characteristics of the instrument are as follows. The relative line half-width of the spectrometer measured from the peak of the elastically scattered electrons is 0.2% when the diameter of the primary electron beam is 1 mm and the exit slit width of the energy analyzer is 3 mm. The aperture ratio of the analyzer is 0.12% of $4\pi$, and the luminous emittance is $L = 0.32\%\ \text{mm}^2$. The analyzed electron energy varies from 150 to 2300 eV.
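To make the scaled quantities concrete, the following lines convert the lens parameters quoted above to millimeters using $d = 12$ mm and express the aperture ratio in steradians. Only the arithmetic is shown; the dictionary keys are labels invented for this illustration.

```python
import math

d_mm = 12.0  # distance between the parallel electrode planes, from the text

# Quantities quoted in units of d
scaled = {
    "R1 (gap between electrodes 4 and 5)": 5.0,
    "R2 (gap between electrodes 3 and 4)": 7.0,
    "gap width": 0.25,
    "f_ox (midplane focal length)": 15.5,
    "f_oy (focal length normal to the midplane)": 4.51,
}
in_mm = {name: value * d_mm for name, value in scaled.items()}

# Aperture ratio: 0.12% of the full solid angle 4*pi, in steradians
solid_angle_sr = 0.0012 * 4.0 * math.pi

for name, value in in_mm.items():
    print(f"{name}: {value:.2f} mm")
print(f"accepted solid angle: {solid_angle_sr:.4f} sr")
```

The midplane focal length comes out more than three times longer than the one in the perpendicular plane, which is the anamorphotic behavior the paragraph describes.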
FIG. 66. Diagram of a prism mass spectrometer with transaxial lenses: (1-4) electrodes of a transaxial lens; (4-6) electrodes of a telescopic system; (M) magnetic prism; (S) source; (D) detector; (A) aperture.
A schematic diagram of a prism mass spectrometer is shown in Fig. 66. The spectrometer uses four-electrode transaxial immersion lenses to collimate and focus the beams. The distance between the parallel plates is 8 mm, and the interelectrode gaps are 1 mm. The focal length in the midplane is 460 mm. The linear dispersion of the mass spectrometer is 11.4 mm per 1% mass change. The resolving power is 120,000 at the peak half-width and 50,000 at 10% height. Transaxial lenses can also be used to increase dispersion in spectrographs. The electron optical scheme of a magnetic spectrograph (Fig. 67) consisting of a sector analyzer and a four-electrode transaxial einzel lens was calculated by
FIG. 67. A magnetic spectrograph with higher dispersion: (1-4) electrodes of a transaxial lens; (5) sector magnet.
Afanas'ev et al. (1982). The dispersion is increased because the transaxial lens increases the angle between the axial trajectories of different monoenergetic components of the particle beam without violating the focusing of these monoenergetic components. In the system shown in Fig. 67, the center of curvature of the first refractive layer coincides with the magnet deflection center, so the first layer does not change the dispersion. It serves to retard the beam as a whole and to focus individual monoenergetic components. The main contribution to dispersion occurs in the second refractive layer, whose radius of curvature is small, but whose potential ratio is large. The third refractive layer further increases the dispersion and retards the beam down to the initial energy. The calculations show that the system provides a 4.5- to 5.0-fold increase in the angular dispersion. The foregoing discussion offers guidance about the advisability of creating and combining various electron optical elements, including transaxial lenses, on two parallel plates.
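As a quick numerical reading of the mass-spectrometer figures quoted above for Fig. 66, the linear dispersion and the resolving power together imply a peak width at the detector. The sketch assumes the usual definition, resolving power = (dispersion per unit relative mass change) / (peak width); that definition and the variable names are assumptions made for illustration.

```python
dispersion_mm_per_percent = 11.4                   # mm per 1% mass change, from the text
dispersion_mm = dispersion_mm_per_percent / 0.01   # mm per unit relative mass change

def peak_width_mm(resolving_power):
    """Peak width implied by a resolving power, under the assumed definition."""
    return dispersion_mm / resolving_power

width_half_height = peak_width_mm(120_000)   # width at half height
width_10_percent = peak_width_mm(50_000)     # width at 10% of the peak height

print(f"half-height width: {width_half_height * 1e3:.1f} um")
print(f"10%-height width:  {width_10_percent * 1e3:.1f} um")
```

Widths on the scale of ten micrometers indicate how small the aberrations of the lenses in the midplane must be for the quoted resolving power to be reached.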
XI. CROSSED LENSES
Crossed lenses are a new type of lens (Afanas'ev et al., 1980). Their basic difference from those described earlier is the intrinsic three-dimensionality of the field, which makes their calculation much more involved. A crossed lens consists of a series of parallel plates with apertures that have two planes of symmetry and are aligned coaxially. The dimensions of the apertures in two perpendicular directions are different; in the neighboring plates the apertures are rotated through 90° relative to each other. Figure 68 illustrates a simple crossed einzel lens with rectangular apertures. It consists of three electrodes. Equal potentials $V_1$ are applied to the two outer electrodes, and the potential $V_2$ is applied to the inner electrode. Sometimes einzel lenses have more complex aperture shapes or a larger number of electrodes with alternating applied potentials $V_1$ and $V_2$. A crossed lens, like any other lens with a varying axial potential, may be of immersion type. A simple immersion lens contains only two electrodes with different potentials. Crossed lenses are simple in design and easily adjustable. For these reasons, such lenses are widely used in cathode-ray tubes. Qualitatively, the optics of a crossed lens can be regarded as the combined effect of a round lens, a quadrupole, and an octupole; generally, therefore, crossed lenses are astigmatic. The relative contribution of the quadrupole component to the lens optical power is largely determined by the difference
FIG. 68. A three-electrode einzel crossed lens.
between the transverse and longitudinal aperture lengths, and commonly it exceeds the contribution of the round-lens component. A single crossed lens is thus convergent in one of the planes but divergent in the perpendicular plane. If the electrodes are arranged as in Fig. 68, the focusing occurs in the xOz-plane for $(V_2 - V_1)/V_1 > 0$ or in the yOz-plane for $(V_2 - V_1)/V_1 < 0$. A beam can be focused in all directions by a system of crossed lenses, just as in quadrupole optics. The presence of an octupole component in the potential distribution offers the possibility of correcting the lens aberrations. The term crossed lens was first introduced by Yavor (1970), where the electron optical properties of these lenses were considered in the approximation of narrow and infinitely long electrode apertures. These lenses differ from a series of two-dimensional lenses turned 90° to each other in that the field in the latter is created by plates with parallel slits, while the plates with perpendicular slits have equal potentials and field-free space between them. It was shown that crossed lenses may be divergent and achromatic. In the design of a commercial quadrupole lens, Himmelbauer (1969) also used a set of plate electrodes with curved apertures possessing two planes of symmetry (Fig. 69). A series of such electrodes alternately rotated 90° to each other models the aperture of a quadrupole lens. However, in an attempt to exactly reproduce a quadrupole field, the author failed to reveal the advantages of a crossed lens over a quadrupole, in particular, the possibility of correcting third-order aberrations. An einzel lens is an integral part of most systems formed from crossed lenses, so Section XI.A describes its simple modification and Section XI.B discusses possible optimization of its parameters.
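The sign rule for the einzel lens of Fig. 68 can be written as a small helper, reading the condition as the sign of $(V_2 - V_1)/V_1$ with $V_1$ on the outer electrodes and $V_2$ on the inner one. This is only a restatement of the rule given above; the function name and the returned labels are hypothetical.

```python
def converging_plane(v1, v2):
    """Plane in which a single crossed einzel lens (Fig. 68 layout) converges.

    v1: potential of the outer electrodes, v2: potential of the inner
    electrode, both measured from the cathode, so v1 > 0 is assumed.
    """
    if v1 <= 0:
        raise ValueError("v1 must be positive")
    sign = (v2 - v1) / v1
    if sign > 0:
        return "xOz"
    if sign < 0:
        return "yOz"
    return None  # equal potentials: no lens action
```

In the plane that is not returned the same lens is divergent, which is why doublets and triplets are needed to focus a beam in all directions.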
FIG. 69. An electrode of a crossed lens with a curved aperture.
Crossed lenses, like many other astigmatic lenses, are usually used in various combinations. This question is discussed in Section XI.C. In Section XI.D we shall describe various correctors designed on the basis of plate electrodes normal to the axis.
A. A Three-Electrode Einzel Crossed Lens with Identical Rectangular Apertures
The field of a crossed lens has two planes of symmetry, and its potential distribution is described by expression (33), in which all the coefficients are nonzero. Such a field cannot be approximated by any two-dimensional models. The calculation is a very complex problem, even for a lens with simple rectangular apertures. The potential distribution of an einzel crossed lens with equally spaced electrodes (Fig. 68) was calculated using potential expansion in the apertures in terms of orthogonal functions (Afanas'ev and Yavor, 1973, 1977). In each aperture the potential was expanded in the complete orthonormalized set of functions. The Dirichlet problem for the regions between the apertures with a common term of this series as the boundary value can be solved exactly, after which the unknown coefficients of the expansion are found from the continuity of the normal potential derivative in each aperture. The order of the set of linear equations to which the problem is reduced is much lower than in the integral equation approach, let alone in the finite difference method. However, the calculation of the coefficients becomes a
more complicated task. This method allows us to find the potential and its derivatives on the system axis with high accuracy, using a small computer, in an acceptable time; this would not be possible with the finite difference method or the integral equation approach. The field calculations for a modification of a crossed einzel lens are given in Fig. 70, which shows the potential distribution on the lens axis $\varphi_{00}(z) = \varphi(z) - V_1$, as well as the transverse derivatives of the potential $\varphi(x, y, z)$ up
FIG. 70. Potential distribution components of an einzel crossed lens.
to the fourth order inclusive:
$$\varphi_{2p\,2q}(z) = \left.\frac{\partial^{2p+2q}\varphi(x, y, z)}{\partial x^{2p}\,\partial y^{2q}}\right|_{x=y=0}.$$
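The second-order derivatives defined this way combine into an axially symmetric part, $(\varphi_{20} + \varphi_{02})/2$, and a quadrupole part, $(\varphi_{20} - \varphi_{02})/2$. A minimal numerical sketch of that split follows, using central differences on a potential supplied as a Python function. The finite-difference step, the function names, and the analytic test potential are assumptions made for illustration; this is not the expansion method of Afanas'ev and Yavor.

```python
def second_derivatives_on_axis(phi, z, step=1e-3):
    """Central-difference estimates of phi_20 and phi_02 at the point (0, 0, z)."""
    center = phi(0.0, 0.0, z)
    phi_20 = (phi(step, 0.0, z) - 2.0 * center + phi(-step, 0.0, z)) / step**2
    phi_02 = (phi(0.0, step, z) - 2.0 * center + phi(0.0, -step, z)) / step**2
    return phi_20, phi_02

def split_components(phi_20, phi_02):
    """Return (axially symmetric part, quadrupole part) of the second derivatives."""
    return 0.5 * (phi_20 + phi_02), 0.5 * (phi_20 - phi_02)

# Hypothetical test potential satisfying Laplace's equation:
# a round-lens-like term (coefficient a) plus a pure quadrupole term (coefficient k).
def sample_potential(x, y, z, a=0.3, k=0.7):
    return 1.0 + a * (x * x + y * y - 2.0 * z * z) + k * (x * x - y * y)

round_part, quad_part = split_components(
    *second_derivatives_on_axis(sample_potential, z=0.0)
)
```

For the test potential the two parts reduce to $2a$ and $2k$, so the split isolates the round-lens term from the quadrupole term as intended.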
It follows from the expression for the potential distribution in Eq. (33) that $\varphi_{00}(z)$ and $(\varphi_{20} + \varphi_{02})/2$ correspond to the axially symmetric component of the field, while $(\varphi_{20} - \varphi_{02})/2$ represents the quadrupole component. We should recall that here, as usual, the potential $\varphi(x, y, z)$ is zero where the particle velocity is zero. The electrode potentials are measured from this point, that is, from the cathode. If the potential distribution is known, the first- and third-order optical characteristics of the lenses can be obtained by solving Eqs. (38) and (85). Numerical integration was used by Afanas'ev and Yavor (1977) to find the paraxial properties and spherical aberration coefficients for a three-electrode einzel lens with rectangular apertures, as well as for doublets formed from such lenses. Because these calculations are so complex, crossed lenses have largely been studied experimentally. First-order focusing properties of crossed einzel lenses were analyzed by Petrov and Yavor (1975), and by Petrov (1976). The measurements were made on an electron optical bench by the two-grid shadow method. The lens was formed from three plate electrodes with identical rectangular apertures. The outer electrodes had the same potentials, equal to the anode potential $V_1$, while the inner electrode had a potential $V_2$. The potential ratio $V_2/V_1$ on the lens electrodes and its geometrical parameters, namely the ratio of the aperture sides $a/b$ and the interelectrode distance $h$, were varied over a wide range. Figure 71 illustrates the measurements of the distance from the lens center
FIG. 71. First-order parameters of an einzel crossed lens.
to the image $z_i/2b$, of the angular magnification $\Gamma$, and of the focal length $f/2b$ for the aperture side ratio $a/b = 2$ and the distance between the object and the lens center $z_o/2b = 13$. The abscissa shows the potential ratio $V_2/V_1$. The dashed curves represent the xOz-plane (the inner electrode aperture is extended in the direction Ox), and the solid curves refer to the yOz-plane. Curves 1, 2, and 3 correspond to the interelectrode distances $h/2b = 1.0$, 0.5, and 0.3. From the experimental data and calculations, one can conclude that, in the given range of geometrical parameters, the optical power of the lens grows with increasing $a/b$ ratio and decreasing distance between electrodes $h/2b$ (for $h/2b \geq 0.2$). This is where a crossed lens essentially differs from a quadrupole, whose optical power drops with decreasing length; in addition, a crossed lens is weaker than a quadrupole lens. A comparison with two-dimensional and round lenses consisting of three plates shows that the latter are weaker than a crossed lens for identical electrode potentials and identical distances between them. Baranova et al. (1987a) made detailed measurements of the focal lengths of an einzel crossed lens in the converging and diverging planes. Approximate functions were found that permit calculation of first-order properties with 4-5% error. The spherical aberration of a crossed einzel lens was studied by Petrov and Yavor (1975, 1976). The coefficient $C_{1x}$, responsible for the aberration of the line image in the midplane, was measured. Figure 72(a) shows the dependence
FIG. 72. The spherical aberration coefficient $C_{1x}$ of an einzel crossed lens as a function of (a) the image position $z_i$; (b) the aperture side ratio $a/b$.
of $C_{1x}$ on the distance between the lens center and the Gaussian image plane. The dashed curves describe the case when the inner electrode potential is lower than the potential on the outer electrodes ($V_2 < V_1$); the solid curves correspond to the case $V_2 > V_1$. The lens parameters are $a/b = 2$ and $h/2b = 0.33$, 0.5, and 1.0 for curves 1, 2, and 3, respectively. It is clear from the figure that the coefficient $C_{1x}$ may change its sign and become negative. This happens only when the potential on the inner electrode is higher than the potential on the outer electrodes ($V_2 > V_1$). For $V_2 < V_1$ the spherical aberration coefficient is always positive. These potential relations are valid for electrons. For ions they are reversed: for $V_2 < V_1$ the coefficient $C_{1x}$ may change its sign, but for $V_2 > V_1$ its sign is constant. This conclusion holds for crossed lenses in the whole range of the measurements and calculations. Figure 72(b) shows the dependence of the spherical aberration coefficient $C_{1x}$ on the ratio of the aperture sides $a/b$, which considerably affects the aberration value. At $a/b = 1$ (square aperture), the quadrupole component of the lens field is zero. For this reason, the spherical aberration coefficients in the planes xOz and yOz (like the paraxial properties) are identical. However, even a slight growth of $a/b$ sharply reduces the coefficient $C_{1x}$; when $V_2 > V_1$ it passes through zero near $a/b = 1$. For large values of $a/b$ the spherical aberration depends only slightly on this parameter.
B. Modifications of an Einzel Crossed Lens
The effect of aperture shape on the electron optical parameters of a three-electrode einzel lens was studied experimentally by Baranova et al. (1982). The optical power and spherical aberration coefficients $C_{1x}$, $C_{2x}$ in lenses with rectangular and curved apertures (Fig. 69) were compared. The measurements showed that at the same values of $a/b$, the optical power of the lens with rectangular apertures was higher than that of the lens with curved apertures. The aberration characteristics in the midplanes of both lenses are similar. If the inner electrode potential is higher than the potential on the outer electrodes, then the coefficient $C_{1x}$ in either lens grows with the interelectrode distance, going to zero under certain conditions. Therefore, by varying this distance, one can obtain $C_{1x} = 0$ for different positions of the image. In a lens with rectangular apertures, the variation limits for the coefficient $C_{1x}$ are somewhat smaller. The coefficient $C_{2x}$ contributes to the aberration of the image outside the midplane. In contrast to $C_{1x}$, it decreases with the distance between the lens electrodes. However, the coefficient $C_{2x}$ of a lens with curved apertures changes its sign under certain conditions, whereas in a lens with rectangular apertures it remains positive in all the operational regimes under study. The sign of the coefficient $C_{2x}$ in lenses with rectangular apertures can be
changed by reducing the aperture length in the outer electrodes down to a certain value, dependent on the inner electrode geometry. However, $C_{2x}$ and $C_{1x}$ do not vanish simultaneously. The aberration properties of a crossed lens can be considerably improved by choosing the conditions under which both coefficients are sufficiently small. For instance, at $h/2b = 0.45$ and with the ratio $a/b = 2.5$ for the inner electrode, the ratio $a/b \approx 1.9$ should be used for the outer electrodes. If the inner electrode has $a/b = 2.0$, the outer ones should have an optimal value for $a/b$ of about 1.65; for $a/b = 1.6$ in the inner electrode, the $a/b$ value for the outer electrodes is about 1.3. Methods to increase the optical power of crossed lenses were considered in the work of Baranova et al. (1986a). It was pointed out earlier that the value $1/f$ for these lenses grows with increasing $a/b$ ratio. According to the measurements, this value increases twofold in three-electrode lenses if the $a/b$ ratio is raised from 1.6 to 3.0. However, for a given aperture width, this increases the transverse dimensions of the lens, which is not always feasible. There are two other ways of increasing the optical power: by decreasing the interelectrode distance or by using a larger number of electrodes in the series. The measurements made on an electron optical bench by the two-grid shadow method have shown that for $a/b$ between 2.0 and 3.0 the optical power of a three-electrode einzel lens grows 1.5-1.7 times when the interelectrode distance $h/2b$ changes from 0.5 to 0.16. Hence, the application of lenses with large interelectrode distances is unjustifiable, because this simultaneously reduces the optical power and increases the system length. However, the dependence of $1/f$ on $h$ is not monotonic; it reaches its maximum at an interelectrode distance on the order of $(0.1\text{-}0.2)\cdot 2b$. When the electrodes approach each other more closely, $1/f$ again begins to drop due to the overlapping of the fields.
The spherical aberration coefficient $C_{1x}$ passes through zero at a certain interelectrode distance. Further increase of the lens optical power can be achieved by adding one or more pairs of serially arranged electrodes. As earlier, the apertures in the neighboring electrodes are rotated 90° to each other. All the odd electrodes have the anode potential $V_1$; the potential $V_2$ is applied to the even electrodes. It should be emphasized that the constructional complexity of crossed lenses depends little on the number of electrodes. The investigations have shown that for a sufficiently large interelectrode distance ($h/2b \geq 0.5$), the optical powers nearly add; in a five- or seven-electrode lens, $1/f$ grows about 2 or 3 times, respectively. However, with decreasing interelectrode distances, this effect becomes weaker, which seems to be due to the field overlap. When the aperture lengths become greater, the field overlap also grows. Spherical aberration measurements show that in this case there are regimes with small or negative $C_{1x}$. A larger number of electrodes at a constant focal length commonly reduces the spherical aberration.
We have so far considered the properties of einzel lenses. Immersion lenses, however, have not received that much attention from researchers. The work of Yavor (1970) gives an expression for the focal length of a two-electrode immersion lens regarded as a thin lens:
$$f_x = \frac{2V_1 h}{V_2 - V_1}, \qquad f_y = -\frac{2V_1 h}{V_2 - V_1}. \tag{374}$$
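Reading Eq. (374) as $f_x = 2V_1h/(V_2 - V_1)$ and $f_y = -f_x$, the thin-lens focal lengths can be evaluated directly. The helper below is a sketch under that reading; its name and the sample values are hypothetical.

```python
def immersion_focal_lengths(v1, v2, h):
    """Thin-lens focal lengths (f_x, f_y) of a two-electrode crossed immersion lens.

    v1, v2: electrode potentials measured from the cathode; h: interelectrode
    distance. Uses f_x = 2*v1*h / (v2 - v1) and f_y = -f_x.
    """
    if v2 == v1:
        raise ValueError("equal potentials give no lens action")
    f_x = 2.0 * v1 * h / (v2 - v1)
    return f_x, -f_x

# Sample values in arbitrary units (illustrative only)
f_x, f_y = immersion_focal_lengths(v1=1.0, v2=3.0, h=1.0)
```

The opposite signs express the astigmatism noted earlier: convergence in one plane is accompanied by divergence in the other.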
A numerical calculation of such lenses has been made by Gritsuk and Lachashvili (1979), who calculated the electric field by the method of integral equations with boundary collocation. The paraxial properties and spherical aberration were determined from a family of trajectories calculated by integration of the equations of motion. Series of trajectories enter the system for a fixed position of the object. The image position was taken as the limit approached by the point of intersection of the trajectories with the system axis as the trajectory inclination in object space tended to zero. Analogous limit values were used to find the spherical aberration coefficients.
C. Systems of Crossed Lenses
Like other astigmatic lenses, crossed lenses are most often used as doublets and triplets. Such lens systems converge the beam in all directions, permitting the creation of a stigmatic image or formation of a beam with controlled astigmatism. A doublet is the simplest system of crossed lenses capable of focusing charged particles in all directions. Figure 73 shows schematic diagrams of five-electrode and six-electrode doublets. In a five-electrode doublet, the central electrode is common to both lenses [Fig. 73(a)]. In this system the first, third, and fifth electrodes have the same
FIG. 73. Schematic diagrams of doublets formed by crossed lenses: (a) five-electrode; (b) six-electrode.
potentials $V_1$. The second and fourth electrodes have the potentials $V_2$ and $V_4$, one of which must be higher than $V_1$ and the other lower, so that a five-electrode doublet converges the particles in two perpendicular directions. A six-electrode doublet may be formed either by two equally oriented crossed lenses or by lenses rotated 90° relative to each other [Fig. 73(b)]. In the first case, one of the potentials on the inner electrodes of the lenses must be higher than $V_1$, and the other must be lower. In the second case, the inner electrode potentials ($V_2$ and $V_5$) must both be higher or both be lower than $V_1$, and they may have the same value.

Although the design of a five-electrode doublet is a little simpler, the six-electrode structure possesses a number of essential advantages. A six-electrode doublet permits a unipolar power supply to be used for the lenses (for example, $V_2 > V_1$, $V_5 > V_1$). It has a higher optical power and permits a more complete compensation of the spherical aberration. In a five-electrode doublet, only one of the lenses may possess a negative coefficient of the spherical aberration in the midplane. In a six-electrode doublet, if $V_2 > V_1$ and $V_5 > V_1$, one lens may possess a negative spherical aberration in the xOz-plane, the other in the yOz-plane. Thus, in both midplanes one can achieve at least a considerable reduction of the spherical aberration, if not a complete compensation of it.

First-order electron optical parameters and spherical aberration of stigmatic six-electrode doublets have been measured by Petrov et al. (1978). The inner electrode potentials of the doublet lenses providing the stigmatic mode are shown in Fig. 74 as a function of the image position. Figure 75 presents the measurements of the spherical aberration of a stigmatic doublet

FIG. 74. Potentials on the lenses of a six-electrode doublet providing the formation of a stigmatic image: curves 1, 2, 3, and 4 correspond to $z_0/2h$ = 10.3, 8.3, 6.7, and 5.7, respectively.
FIG. 75. Spherical aberration coefficient of a six-electrode doublet in the stigmatic mode: curves 1, 2, and 3 correspond to $z_0/2h$ = 10.3, 8.3, and 6.7, respectively.
in the xOz-plane at various positions of the object and image. In one of the regimes, measurements of the spherical aberration coefficients were made in both planes, xOz and yOz. Both coefficients, $C_{1x}$ and $C_{1y}$, were shown to be negative and small in absolute value.

A triplet of crossed lenses is an optically more flexible system, which allows, in particular, similar magnifications to be obtained in both planes of symmetry. Such systems have found wide application in oscillographs in designs with increasing deflection. They have replaced quadrupoles, which have more complex designs, and have thus permitted higher performance characteristics of the instrument to be achieved. Questions concerning the application of crossed lenses for focusing electron beams in vidicons are discussed in the work of Petrov (1982).

D. Correctors of Geometrical Aberrations
Correctors may have a construction very much like that of crossed lenses. Plate electrodes can be used to design electron optical elements, which allow us to correct geometrical aberrations of various orders. The properties of a planar electrode with an aperture are largely determined by the degree of the aperture symmetry. In a crossed lens the electrode aperture has two planes of symmetry; the basic harmonic in the potential expansion of such lenses is thus the quadrupole harmonic, in addition to the axially symmetric one.
FIG. 76. Modifications of planar correctors of geometrical aberrations: (a) with four planes of symmetry; (b) with three, five, and six planes of symmetry, respectively.
An electrode with an aperture having four planes of symmetry [Fig. 76(a)] is analogous to an octupole. The electrostatic field it creates contains the axially symmetric and octupole components as the first two harmonics. Such an element can be used to correct third-order geometrical aberrations. Generally, a planar electrode with an aperture possessing $N$ planes of symmetry corrects aberrations of the $(N - 1)$-th order. Figure 76(b) shows possible types of such correctors that are easy to combine with crossed lenses, broadening their functional capabilities.

Third-order aberration correctors have been studied experimentally in combination with an einzel crossed lens and a doublet of such lenses (Baranova et al., 1978; Baranova et al., 1982). It was shown that the axially symmetric component of a corrector field is not large and does not affect first-order focusing within the experimental error. The spherical aberration of a corrector, like that of a standard octupole, depends linearly on the voltage applied to it. We shall illustrate this by describing the result of the corrector action on the spherical aberration of a crossed lens.

Measurements were made of the spherical aberration of a crossed lens together with two types of correctors, one with a cross-shaped aperture and the other with a square one. They involved the whole length of the line image created by the lens. The lens with
FIG. 77. The spherical aberration coefficients as a function of the potential $U = V_c - V_1$: (a) cross-shaped corrector; (b) square corrector.
rectangular apertures had an aperture side ratio $a/b = 1.6$ and an interelectrode distance of $h/2b = 0.45$. The corrector was placed behind the lens at a distance $s/2b = 1.75$ from its central electrode. The smaller aperture dimension of the corrector was equal to the smaller dimension of the lens aperture. The measurements of the spherical aberration coefficients of such a system are shown in Fig. 77 as a function of the corrector potential $V_c$ (the abscissa shows the potential $U = V_c - V_1$). The curves correspond to a positive potential $U$ on the corrector with a cross-shaped aperture and to a negative potential $U$ on the corrector with a square aperture, because, for the electrode positions indicated in Fig. 77, the octupole components of the two elements are rotated 45° relative to each other when potentials of the same sign are applied. If one of the correctors is rotated 45° relative to the position shown in the figure, both electrodes will correct at the same polarity of the power supply.

One can see from the illustrations that the corrector changes the coefficients $C_{1x}$ and $C_{1y}$ in opposite directions, so that if one coefficient is negative and the other positive, the corrector can considerably reduce the spherical aberration of the line image in the lens. The aberration minimum is achieved at smaller potentials if a corrector with a square aperture is used. Such a corrector is to be preferred, since it also has a simpler aperture configuration.

A similar effect was obtained in the correction of the spherical aberration $\Delta x$ in a stigmatic doublet, when the corrector was placed between the lenses at the same distance from each of them. Under the conditions investigated, the coefficients $C_{1x}$ and $C_{1y}$ do not vanish simultaneously but at different values of the corrector potential. The minimum image width was obtained for an intermediate value of the corrector potential.
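The linear dependence of the aberration coefficients on the corrector voltage can be illustrated with a small numerical sketch. All numbers below are hypothetical and chosen only for illustration (they are not the measured values of Fig. 77), and the use of the larger of the two coefficient magnitudes as a stand-in for the image width is an assumption made here for simplicity.

```python
def coefficient(c_at_zero, slope, u):
    """Aberration coefficient modelled as a linear function of the corrector potential u."""
    return c_at_zero + slope * u


def zero_crossing(c_at_zero, slope):
    """Corrector potential at which a linearly varying coefficient vanishes."""
    if slope == 0:
        raise ValueError("coefficient does not depend on the corrector potential")
    return -c_at_zero / slope


# Hypothetical values: one coefficient positive, the other negative at U = 0,
# and the corrector shifting them in opposite directions.
C1X_AT_0, SLOPE_X = 4.0, -0.02
C1Y_AT_0, SLOPE_Y = -1.0, 0.02


def worst_coefficient(u):
    """Crude image-width proxy (assumption): the larger of |C1x| and |C1y|."""
    return max(abs(coefficient(C1X_AT_0, SLOPE_X, u)),
               abs(coefficient(C1Y_AT_0, SLOPE_Y, u)))


# Scan integer potentials and pick the one with the smallest proxy value.
best_u = min(range(0, 301), key=worst_coefficient)

print(zero_crossing(C1X_AT_0, SLOPE_X), zero_crossing(C1Y_AT_0, SLOPE_Y), best_u)
```

In this toy model the two coefficients vanish at different potentials, and the proxy is smallest at a potential between the two zero crossings, which mirrors the qualitative behaviour described for the stigmatic doublet.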
It should be noted that, in addition to the correctors we have described, crossed lenses combine easily in construction with round and two-dimensional lenses formed from planar electrodes arranged in a similar way. This principle can be used to design various electron optical systems with widely variable focusing and aberration properties.
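The symmetry rule used for the planar correctors of this section (an aperture with $N$ planes of symmetry corrects aberrations of order $N - 1$) can be sketched numerically. The sketch assumes that the lowest non-axial harmonic of such an aperture has the standard multipole form $r^N \cos N\psi$; the function names are hypothetical.

```python
import math


def lowest_harmonic(r, psi, n_planes):
    """Assumed form of the lowest non-axial harmonic: r**N * cos(N * psi)."""
    return r ** n_planes * math.cos(n_planes * psi)


def corrected_aberration_order(n_planes):
    """A potential term of power N gives a force of power N - 1 in the coordinates."""
    return n_planes - 1


def is_mirror_symmetric(n_planes, r=0.7, psi=0.3):
    """Check invariance under reflection in each of the N planes at angles k*pi/N."""
    reference = lowest_harmonic(r, psi, n_planes)
    for k in range(n_planes):
        mirrored_psi = 2.0 * k * math.pi / n_planes - psi
        if abs(lowest_harmonic(r, mirrored_psi, n_planes) - reference) > 1e-12:
            return False
    return True


for n in (2, 3, 4, 5, 6):
    print(n, corrected_aberration_order(n), is_mirror_symmetric(n))
```

For $N = 2$ this reproduces the quadrupole harmonic of the crossed lens, and for $N = 4$ the octupole harmonic used to correct third-order aberrations.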
XII. NEW TYPES OF LENSES. ABERRATION CORRECTORS

In Sections VII to XI, we have discussed in detail the properties of the principal electrostatic lenses. In this section we shall deal with lenses that have not, for some reason, found wide application but are nevertheless of interest and worth special consideration. In recent years there have been some problems requiring high-quality focusing of high-energy beams as well as high current density on the target. A possible approach to such a problem is to use lenses with transverse fields and hollow beams. Electrostatic lenses that can provide focusing of such beams will be described in Section XII.A. A description of radial lenses, whose electrodes consist of segments of a cone or wedge, is given in Section XII.B. Finally, Sections XII.C and XII.D consider some methods for the correction of geometrical aberrations. An account is given of the designs of some electrostatic systems used for this purpose.

A. Coaxial Lenses with Transverse Fields
Focusing of high-energy beams usually requires the use of quadrupole lenses possessing a predominantly transverse field. However, in order to provide stigmatic focusing with slightly differing magnifications in two perpendicular directions, it is necessary to use systems consisting of several quadrupoles. The adjustment and matching of such systems present a certain difficulty. Stigmatic focusing can clearly be obtained much more easily in round lenses, but such lenses usually exhibit a small optical power, because their fields are predominantly longitudinal. A strong-focusing round lens with a transverse field can be designed on the basis of two coaxial electrodes, one inside the other. By applying a potential difference to these electrodes, we produce a transverse axially symmetric field. Because the electrode axis lies outside the lens field, the lens is suitable for focusing hollow beams only.

Calculations of the optical properties of coaxial lenses cannot be made directly in the framework of the paraxial optics theory developed earlier. If we expand the trajectory with respect to the lens axis, as we did before, the
restriction to the first expansion term will result in large errors, because the beam travels far from the axis. If the meridian trajectory is taken to be the beam axis, the calculations of the optical properties become considerably more complicated because of its curvature. The theory of systems with a curved axis has been developed in the work of Grinberg (1948) and Vandakurov (1957).

A simple example of coaxial lens design is a lens consisting of two coaxial circular cylinders, with the radius of the inner cylinder being small. The potential distribution within the lens away from the edges has the form

$$\varphi(r) = \frac{V_1 \ln(r_2/r) + V_2 \ln(r/r_1)}{\ln(r_2/r_1)}. \qquad (375)$$
Here, $r_1$, $r_2$ and $V_1$, $V_2$ are the radii and potentials of the inner and outer cylinders, respectively. The trajectory of a charged particle in such a field is expressed in terms of quadratures. Without loss of generality, we can set $V_2 = 0$, so that for the meridian plane we have (see Kel'man and Yavor, 1968):

$$z - z_0 = r_0 \int_1^{\rho} \frac{\cos\nu_0 \, d\rho}{\sqrt{\sin^2\nu_0 + D \ln\rho}}, \qquad D = \frac{2eV_1}{m v_0^2 \ln(r_2/r_1)}. \qquad (376)$$
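The quadrature (376) can be cross-checked against a direct integration of the equations of motion. The sketch below works in dimensionless variables that are assumptions of this example: $\rho = r/r_0$, lengths along the axis in units of $r_0$, and time in units of $r_0/v_0$, so that the radial equation reads $\rho'' = D/(2\rho)$ with $\rho'(0) = \sin\nu_0$ while the axial velocity stays equal to $\cos\nu_0$. It further assumes $D > 0$ and $\nu_0 > 0$, so that $\rho$ grows monotonically and the integral has no turning point. The parameter values are arbitrary.

```python
import math


def quadrature_z(rho_end, nu0, d_param, n=2000):
    """(z - z0)/r0 from the quadrature of Eq. (376), Simpson's rule with even n."""
    def integrand(rho):
        return math.cos(nu0) / math.sqrt(math.sin(nu0) ** 2 + d_param * math.log(rho))

    h = (rho_end - 1.0) / n
    total = integrand(1.0) + integrand(rho_end)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(1.0 + i * h)
    return total * h / 3.0


def integrate_motion(tau_end, nu0, d_param, steps=20000):
    """RK4 for rho'' = D / (2 rho); returns rho and (z - z0)/r0 at time tau_end."""
    def deriv(rho, u):
        return u, d_param / (2.0 * rho)

    rho, u = 1.0, math.sin(nu0)
    h = tau_end / steps
    for _ in range(steps):
        k1 = deriv(rho, u)
        k2 = deriv(rho + 0.5 * h * k1[0], u + 0.5 * h * k1[1])
        k3 = deriv(rho + 0.5 * h * k2[0], u + 0.5 * h * k2[1])
        k4 = deriv(rho + h * k3[0], u + h * k3[1])
        rho += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        u += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    # The axial velocity is constant in a purely radial field.
    return rho, math.cos(nu0) * tau_end


rho_final, z_from_motion = integrate_motion(tau_end=5.0, nu0=0.1, d_param=0.05)
z_from_quadrature = quadrature_z(rho_final, nu0=0.1, d_param=0.05)
print(rho_final, z_from_motion, z_from_quadrature)
```

The two axial distances should agree to within the discretization error, which gives a simple consistency check on the form of the quadrature before it is used for trajectory analysis.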
Here, the subscript 0 denotes the initial parameter values, $\rho = r/r_0$ and $\cos\nu_0 = \dot{z}_0/v_0$. Expression (376) allows us to calculate particle trajectories in the lens in question and to determine its optical properties using trajectory analysis. One can also calculate a single trajectory taken to be axial and expand the others in small parameters (angles and distances) around the axial trajectory. The conditions for beam focusing with respect to the curved axis are then determined by conventional methods. One of the problems is to select the lens parameters in such a way that the plane in which the hollow beam is focused around the axial trajectories coincides with the point of intersection of the latter with the lens axis.

The optical properties of a lens with a cylindrical rod on the axis have been analyzed in a number of studies (e.g., Krejcik et al., 1979, 1980a, 1980b; Liebl, 1979). A parallel beam entering the lens through a circular slit is considered. Numerical calculations show that the position of the point of intersection of the trajectory with the lens axis depends strongly on the initial distance between the particle and the axis. This results in an increased spot size in the vicinity of the point of intersection. The spot can be made smaller by using a doublet of coaxial lenses with opposite polarity on the outer electrodes and a common axial electrode (see Fig. 78). A calculation was made of two doublet variants with different lens sequences. In one variant, the electrode potentials were such that the particles
FIG. 78. A doublet of round lenses with an electrode on the axis.
in the first lens were deflected towards the axis, while in the other lens they were deflected away from the axis due to the opposite field direction. In the other variant, the lenses were arranged in the reverse order. The focusing quality is better in the second variant, in spite of the fact that the trajectories in it travel farther from the axis than in the first one.

The numerical results were verified on an experimental device designed for focusing beams of protons with an energy of 400 keV. A system of four coaxial lenses with a common axial electrode 0.2 mm in diameter was used. The outer electrodes were 16 mm in diameter; their lengths were 50 and 100 mm. The lens potentials varied between 1.0 and 2.7 kV. The intermediate radius of the circular aperture varied from 3.0 to 3.3 mm. The spot diameter was found experimentally to be 1.3 mm for a circular aperture width of 0.6 mm, which agrees well with the calculations. Thus, coaxial lenses are suitable for the focusing of ion beams of several MeV, the lens potentials being a few tens of kV.

B. Radial Lenses
We shall discuss here the focusing properties of electron optical elements in which the potential distribution does not depend on the value of the radius vector in spherical coordinates. Such fields are created, for example, by conical electrodes, which can be cut along the generating lines [Fig. 79(a)]. Another possibility is to use planar electrodes forming a wedge and cut along straight lines emerging from the same point on the wedge edge [Fig. 79(b)]. Radial systems were suggested for use as charged particle spectrometers (wedge-like and cone-like prisms) in the studies of Glikman et al. (1973b, 1977). The possibility of focusing charged particles by means of such systems was considered by Yavor (1984), Baranova and Yavor (1984), and Baranova et al. (1985). The potentials must be applied to the electrodes in such a way that
FIG. 79. Radial lenses: (a) conic; (b) wedge-like (the potentials on the central, intermediate, and outer electrodes are indicated).
the field possesses two planes of symmetry, and the lens axis then coincides with their line of intersection. The potential distribution $\varphi(\nu, \psi)$ far from the lens edges is found by separation of variables, because the lens electrodes coincide with the coordinate surfaces of the spherical coordinate system. In a radial conical lens cut into $m$ electrodes, it has the form (Baranova et al., 1986b):

$$\varphi(\nu, \psi) = V \sum_{k=0}^{\infty} a_{2(2k+1)} \left[ \frac{\tan(\nu/2)}{\tan(\theta/2)} \right]^{2(2k+1)} \cos[2(2k+1)\psi]. \qquad (377)$$

Here, $2\theta$ is the cone angle and $\pm V$ are the potentials on the main electrodes. The coefficients $a_{2(2k+1)}$ depend on the number of electrodes, their angular dimensions, and the potentials on the additional electrodes. The simplest conical lens consists of four electrodes with alternating potentials on them, which can be regarded as a quadrupole lens with a varying aperture. If the value of the cone angle $2\theta$ tends to zero, the potential distribution in Eq. (377) is transformed into the corresponding distribution for a quadrupole with concave electrodes. By increasing the number of electrodes into which the cone has been cut, we can control the field configuration, for example, linearizing it or creating an additional octupole component. Series (377) can be reduced to closed form; in the four-electrode conical lens, we obtain the following expression (Baranova et al., 1987b):

$$\varphi(\nu, \psi) = \frac{2V}{\pi} \arctan \left[ \frac{2 \tan^2(\theta/2) \tan^2(\nu/2) \cos 2\psi}{\tan^4(\theta/2) - \tan^4(\nu/2)} \right].$$
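The closed-form potential of the four-electrode conical lens can be checked numerically. The sketch below evaluates the closed form and compares it with a harmonic series whose coefficients, $a_{2(2k+1)} = 4(-1)^k / [\pi(2k+1)]$, are not quoted from the cited papers: they are an assumption obtained here by expanding the arctangent for an idealized lens with negligibly small gaps. The function names are hypothetical.

```python
import math


def phi_closed(nu, psi, theta, v=1.0):
    """Closed-form potential of the four-electrode conical lens (cone angle 2*theta)."""
    t2 = math.tan(nu / 2.0) ** 2
    big_t2 = math.tan(theta / 2.0) ** 2
    ratio = 2.0 * big_t2 * t2 * math.cos(2.0 * psi) / (big_t2 ** 2 - t2 ** 2)
    return 2.0 * v / math.pi * math.atan(ratio)


def phi_series(nu, psi, theta, v=1.0, terms=200):
    """Harmonic series with the assumed coefficients 4*(-1)**k / (pi*(2k+1))."""
    q = (math.tan(nu / 2.0) / math.tan(theta / 2.0)) ** 2
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1
        total += (-1) ** k * q ** n * math.cos(2.0 * n * psi) / n
    return 4.0 * v / math.pi * total


theta = 0.3
print(phi_closed(0.1, 0.4, theta), phi_series(0.1, 0.4, theta))

# Just inside the cone surface the potential approaches +V or -V,
# alternating between neighbouring electrodes.
nearly_on_cone = theta * (1.0 - 1e-9)
print(phi_closed(nearly_on_cone, 0.0, theta), phi_closed(nearly_on_cone, math.pi / 2.0, theta))
```

Near the axis only the first term of the series matters, so the field there is that of a quadrupole whose strength falls off as the aperture widens along the cone.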
The field of a wedge-like radial lens was calculated by Baranova et al. (1985), who also found it in closed form. Unlike the conical lens, the wedge field has no planes of antisymmetry, which gives rise to a nonzero potential on the axis. A wedge-like lens can be compared to an asymmetrical quadrupole lens, which has no planes of field antisymmetry either.

The equations for paraxial trajectories in radial lenses can be obtained from Eq. (38) by substituting the corresponding expressions for $\varphi_2(z)$. We shall assume that the potential distribution within the lens is described by the expression for the two-dimensional case and that it drops to zero at the edges. Then we shall have

$$z^2 x'' + \beta^2 x = 0, \qquad z^2 y'' - \beta^2 y = 0, \qquad (378)$$
where $\beta^2$ is the lens excitation. In a four-electrode conical lens

(379)

In a three-electrode wedge-like lens with two of its electrode potentials set to zero [see Fig. 79(b)], we have
Here $K_0$ and $K_2$ represent the coefficients in the potential series expansion, equal, respectively, to
2 K O =-arctansinh II
n
K2
=z 0
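The trajectory equations (378) are Euler equations and have analytical solutions. For $\alpha^2 = \beta^2 - 1/4 > 0$, we obtain

$$x = \left( \frac{z}{z_0} \right)^{1/2} \left[ x_0 \cos\left( \alpha \ln\frac{z}{z_0} \right) + \frac{z_0 x_0' - x_0/2}{\alpha} \sin\left( \alpha \ln\frac{z}{z_0} \right) \right]. \qquad (382)$$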
The reference point for the z-coordinate is the tip of the cone or wedge. In order to obtain the trajectory projections on the yOz-plane, it is necessary to replace the trigonometric functions by the hyperbolic ones in expression (382), with $\alpha^2$ replaced by $\beta^2 + 1/4$. The condition $\alpha^2 \le 0$ corresponds to a weak lens, but we shall not deal with this case here.
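The analytical solution of the Euler equation $z^2 x'' + \beta^2 x = 0$ can be verified against a direct numerical integration. The sketch below writes the solution for initial values $x_0$ and $x_0'$ at the lens entrance $z_0$ and compares it with a fourth-order Runge-Kutta integration; the parameter values are arbitrary, and the function names are hypothetical.

```python
import math


def x_analytic(z, z0, x0, dx0, beta2):
    """Solution of z**2 * x'' + beta2 * x = 0 for alpha**2 = beta2 - 1/4 > 0."""
    alpha = math.sqrt(beta2 - 0.25)
    s = math.log(z / z0)
    return math.sqrt(z / z0) * (
        x0 * math.cos(alpha * s)
        + (z0 * dx0 - 0.5 * x0) / alpha * math.sin(alpha * s)
    )


def x_numeric(z_end, z0, x0, dx0, beta2, steps=20000):
    """RK4 integration of x'' = -beta2 * x / z**2 from z0 to z_end."""
    def rhs(z, x, p):
        return p, -beta2 * x / (z * z)

    h = (z_end - z0) / steps
    z, x, p = z0, x0, dx0
    for _ in range(steps):
        k1x, k1p = rhs(z, x, p)
        k2x, k2p = rhs(z + 0.5 * h, x + 0.5 * h * k1x, p + 0.5 * h * k1p)
        k3x, k3p = rhs(z + 0.5 * h, x + 0.5 * h * k2x, p + 0.5 * h * k2p)
        k4x, k4p = rhs(z + h, x + h * k3x, p + h * k3p)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        z += h
    return x


print(x_analytic(3.0, 1.0, 1.0, 0.5, 4.0), x_numeric(3.0, 1.0, 1.0, 0.5, 4.0))
```

Because the solution oscillates in $\ln z$ rather than in $z$, the focusing action is concentrated near the narrow end of the lens, where the aperture is smallest.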
The expression for the focal lengths of a radial lens in the xOz-plane, which is convergent, is
The length of the region in which the field is considered to be nonzero is called the effective length $L$ of the lens, and it somewhat exceeds the length of the electrodes. The values of all coordinates at the lens entrance are denoted by the subscript 0.

A specific feature of radial lenses, as well as of other asymmetric lenses, is the shift of their optical center relative to the geometrical center. By the optical center we understand a point to which both principal lens planes move as the optical power of the lens tends to zero. The optical center is shifted relative to the geometrical center towards the smaller aperture, and the larger the electrode divergence angle, the greater the shift (Baranova et al., 1986c).

The aberrations in conical radial lenses are described by the general expressions for aberration integrals obtained from calculations of electron optical systems with two planes of symmetry and two planes of antisymmetry. Therefore, general conclusions from these expressions also hold for them. The fundamental conclusion concerns the impossibility of cancelling third-order spherical aberration. In calculations of geometrical aberrations in wedge-like lenses, account should be taken of the fact that, due to the absence of planes of field antisymmetry, supplementary fourth-power (octupole) terms appear in the expansion. These terms are determined by the system geometry and contribute to the third-order aberrations. Because of the large choice of the geometrical and electrical parameters of wedge-like lenses, the additional octupole component may vary in sign and, over a wide range, in value, permitting us to raise the question of correcting geometrical aberrations.

Radial systems can be used to form achromatic lenses if electric and magnetic fields are applied simultaneously (Baranova and Yavor, 1984; Baranova et al., 1986d).
Equations (378) preserve their form if the lines of force of the electric and magnetic fields in the paraxial region are mutually perpendicular, which happens only when the planes of symmetry of one field coincide with the planes of antisymmetry of the other. In this case a particle traveling along the z-axis is affected by parallel forces from the electric and magnetic fields; the polarity of the electrodes and poles must be chosen in such a way that the forces act in opposite directions. As in quadrupoles, conical achromatic systems have the poles of the magnetic lens rotated 45° relative to the electrodes of the electrostatic lens. In wedge-like achromatic systems, the poles of the magnetic lens are shifted with respect to the electrostatic lens electrodes in the direction normal to the z-axis.
The condition of achromaticity can be obtained by differentiating the expression for the excitation $\beta^2$ of a compound lens with respect to energy and setting the derivative equal to zero, as we did for a quadrupole lens in Section VIII [see Eq. (309)].

Radial systems allow deflecting and focusing fields to be easily combined in one unit. One of their possible applications is the focusing of deflected beams, because in such systems the varying aperture can be matched with the trajectory path.

C. Correction of Geometrical Aberrations by Means of Octupoles
Correction of the geometrical aberrations of focusing systems would allow us to improve the parameters of a large number of electron optical devices. Scherzer (see Section IV.B) has shown that a possible way of achieving such correction is the application of elements not possessing axial symmetry. It can be seen from the expressions for the third-order geometrical aberrations that their value can be changed by changing the potential expansion term that depends on the fourth powers of the transverse coordinates, that is, the value $\varphi_4(z)$. The electron optical element whose first term in the potential expansion depends on the fourth powers of the coordinates is the octupole. It is formed by eight symmetrically arranged electrodes, to which alternating potentials $\pm U$ are applied (Fig. 80). The potential distribution in the octupole can be found from
FIG. 80. An electrostatic octupole.
Eq. (33), setting $\varphi(z) = \mathrm{const}$ and $\varphi_2(z) = 0$:

(384)
Here, $t(z)$ characterizes the dependence of the potential distribution on the z-coordinate; the value of $t(z)$ is normalized to unity. The coefficient $K_4$ is related to the electrode profile; it is equal to unity if the pole profile is described by the function $f(x, y) = (x^4 - 6x^2y^2 + y^4)/R^4$. The octupole field is predominantly transverse and has four planes of symmetry and four planes of antisymmetry.

We recall that the question of aberration correction has been partly discussed in Section XI.D with reference to specially designed octupoles (on the basis of planar electrodes with apertures), which combine well with crossed lens systems. It can be seen from the trajectory equation (85) that the octupole field does not affect the first-order focusing properties but contributes only to aberrations; this contribution can be varied by altering the potential $U$ on the electrodes.

It can be shown that the combination of octupoles with a round lens does not allow its geometrical aberrations to be completely corrected. This is due to the fact that the field of a round lens does not depend on the azimuthal angle, while the octupole field is periodic in angle with a period of $\pi/2$ and changes sign within this period. Therefore, while reducing the aberration in two perpendicular directions, the octupole increases it in the directions rotated 45° relative to them. When correcting the spherical aberration, the octupole transforms the aberration disc formed by the round lens into a rosette. It was shown by Scherzer and Typke (1967/68) that in this case, however, there is some total gain in the resolving power due to the current density redistribution in the Gaussian image plane. To provide complete correction of the third-order spherical aberration, astigmatic elements have to be introduced into the system.
Such elements deform the beam, making it astigmatic, and the octupoles are arranged in such a way that the directions in which they increase the aberration coincide with those in which the beam cross section is small. One should bear in mind that the number of aberration coefficients then increases, and the number of octupoles must be equal to it.

In aberration correction design, the octupoles may be combined with lenses or used separately. A design is much simpler if the octupole is combined with a quadrupole lens. For this, an octupole like the one shown in Fig. 80, or similar to it but having different electrode profiles, receives additional potentials, which create the quadrupole field component. For instance, the potential $V$ is applied to the two side electrodes, while $-V$ is applied to the upper and lower electrodes. The electric supply in such a system can
independently control the octupole and quadrupole field components.

The trajectory equations in a separate octupole written up to third-order terms have the form [see Eq. (85)]

$$x'' = \gamma t(z)\, x (x^2 - 3y^2), \qquad y'' = \gamma t(z)\, y (y^2 - 3x^2). \qquad (385)$$
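The angular behaviour of the octupole action follows directly from Eqs. (385). The sketch below, a minimal illustration with arbitrary values of the excitation coefficient $\gamma$ and of $t(z)$, projects the transverse acceleration onto the radial direction and confirms numerically that it varies as $r^3 \cos 4\psi$, so that it changes sign between a given direction and the one rotated by 45°.

```python
import math


def octupole_acceleration(x, y, gamma=1.0, t=1.0):
    """Right-hand sides of the third-order trajectory equations of a separate octupole."""
    return gamma * t * x * (x * x - 3.0 * y * y), gamma * t * y * (y * y - 3.0 * x * x)


def radial_component(r, psi, gamma=1.0, t=1.0):
    """Projection of the transverse acceleration onto the radial direction."""
    x, y = r * math.cos(psi), r * math.sin(psi)
    ax, ay = octupole_acceleration(x, y, gamma, t)
    return ax * math.cos(psi) + ay * math.sin(psi)


r = 0.5
for degrees in (0, 22.5, 45, 90):
    psi = math.radians(degrees)
    print(degrees, radial_component(r, psi), r ** 3 * math.cos(4.0 * psi))
```

The sign reversal at 45° is the reason a single octupole added to a round lens turns the aberration disc into a rosette rather than shrinking it uniformly.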
Here, $\gamma$ represents the octupole excitation, related to its parameters as follows:
The contribution of the octupole to the third-order geometrical aberrations can be found from expressions (385). For simplicity, we shall restrict ourselves to consideration of the spherical aberration only. Correction of geometrical aberrations in electron optical systems containing octupoles has been studied in great detail by Baranova and Ovsyannikova (1971) and Baranova et al. (1971). The spherical aberration in systems with two planes of symmetry but containing no apertures is given by expression (297). The spherical aberration coefficients of a separate octupole have the form
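$$C_{1x} = -\gamma \int_{-\infty}^{\infty} t(z)\, x_\alpha^4 \, dz, \qquad C_{1y} = -\gamma \int_{-\infty}^{\infty} t(z)\, y_\alpha^4 \, dz, \qquad C_{2x} = C_{2y} = 3\gamma \int_{-\infty}^{\infty} t(z)\, x_\alpha^2 y_\alpha^2 \, dz, \qquad (387)$$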
where $x_\alpha$ and $y_\alpha$ are independent solutions of the differential equations (385) without the right-hand sides. They depend linearly on $z$ and satisfy the following initial conditions at the entrance to the octupole:

$$x_\alpha(z_1) = a_x, \qquad y_\alpha(z_1) = a_y, \qquad x_\alpha'(z_1) = y_\alpha'(z_1) = 1. \qquad (388)$$
It has been assumed that the beam at the octupole entrance is astigmatic; $a_x$ and $a_y$ represent the distances from the object to the entrance in the xOz- and yOz-planes, respectively. The total spherical aberration of the system can be determined from formulae (97).

If the octupole is superimposed on the lens, the trajectory equations taking into account the third-order terms have the form of Eq. (85). The image distortion due to the presence of spherical aberration is described by the same expression [see Eq. (297)], in which the coefficients are determined by simple summation of the lens and octupole coefficients. The latter can be found using formulae (387), where $x_\alpha$ and $y_\alpha$ represent independent solutions of the paraxial trajectory equation for the lens.

In the thin-lens approximation, the spherical aberration coefficients are
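identical for both the separate octupole and the one superimposed on the lens:

$$C_{1x} = -\gamma \ell a_x^4, \qquad C_{1y} = -\gamma \ell a_y^4, \qquad C_{2x} = C_{2y} = 3\gamma \ell a_x^2 a_y^2, \qquad (389)$$

where $\ell$ is the effective length of the octupole:

$$\ell = \int_{-\infty}^{\infty} t(z)\, dz. \qquad (390)$$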
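From the rectangular approximation for the function $t(z)$, we can obtain the following expressions for the coefficients of the separate octupole:

$$C_{1x} = -\frac{\gamma}{5} \left[ (\ell + a_x)^5 - a_x^5 \right], \qquad C_{1y} = -\frac{\gamma}{5} \left[ (\ell + a_y)^5 - a_y^5 \right],$$

$$C_{2x} = C_{2y} = 3\gamma \left[ \frac{\ell^5}{5} + \frac{\ell^4}{2}(a_x + a_y) + \frac{\ell^3}{3}(a_x^2 + 4a_x a_y + a_y^2) + \ell^2 a_x a_y (a_x + a_y) + \ell a_x^2 a_y^2 \right]. \qquad (391)$$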
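The rectangular-model expressions can be checked by evaluating the integrals of Eqs. (387) numerically. The sketch below assumes $t(z) = 1$ over the length $\ell$ and zero elsewhere, with $x_\alpha = a_x + s$ and $y_\alpha = a_y + s$ inside the octupole, where $s$ is the distance from the entrance; the numerical values and function names are arbitrary choices for illustration.

```python
def coefficients_rectangular(gamma, ell, ax, ay):
    """Closed-form coefficients of a separate octupole in the rectangular model."""
    c1x = -gamma / 5.0 * ((ell + ax) ** 5 - ax ** 5)
    c1y = -gamma / 5.0 * ((ell + ay) ** 5 - ay ** 5)
    c2 = 3.0 * gamma * (
        ell ** 5 / 5.0
        + ell ** 4 * (ax + ay) / 2.0
        + ell ** 3 * (ax ** 2 + 4.0 * ax * ay + ay ** 2) / 3.0
        + ell ** 2 * ax * ay * (ax + ay)
        + ell * ax ** 2 * ay ** 2
    )
    return c1x, c1y, c2


def coefficients_thin(gamma, ell, ax, ay):
    """Thin-lens limit of the same coefficients."""
    return -gamma * ell * ax ** 4, -gamma * ell * ay ** 4, 3.0 * gamma * ell * ax ** 2 * ay ** 2


def simpson(func, a, b, n=1000):
    """Composite Simpson rule with an even number of intervals n."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def coefficients_numeric(gamma, ell, ax, ay):
    """Direct integration of the aberration integrals with t(z) = 1 inside the octupole."""
    c1x = -gamma * simpson(lambda s: (ax + s) ** 4, 0.0, ell)
    c1y = -gamma * simpson(lambda s: (ay + s) ** 4, 0.0, ell)
    c2 = 3.0 * gamma * simpson(lambda s: (ax + s) ** 2 * (ay + s) ** 2, 0.0, ell)
    return c1x, c1y, c2


print(coefficients_rectangular(1.0, 0.5, 1.0, 2.0))
print(coefficients_numeric(1.0, 0.5, 1.0, 2.0))
print(coefficients_thin(1.0, 0.01, 1.0, 2.0), coefficients_rectangular(1.0, 0.01, 1.0, 2.0))
```

When the octupole length is small compared with $a_x$ and $a_y$, the rectangular-model values approach the thin-lens ones, as the last line of output illustrates.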
When, however, the octupole is superimposed on a quadrupole lens, and the field distributions in them are approximated by the rectangular model, the aberration coefficients of the octupole can be calculated using formulae (387) and taking into account Eq. (271).

In order to find the conditions for correction of spherical aberration in a system, it is necessary to write down the expressions for its coefficients derived from formulae (97) and to equate them to zero. The set of equations thus obtained is used to derive the required octupole excitations. We shall illustrate this by giving the condition for correction of the coefficient $C_{1x}$ in a system consisting of a quadrupole lens and an octupole that follows it at a distance $l$. In the thin-lens approximation involving Eqs. (300) and (389), this condition will take the form
Comprehensive information about aberration correction is contained in the work of Hawkes (1977) and Yavor et al. (1969). Later results concerning the design of systems including correctors have been given by Pohner (1977), Bernhard (1980), and Hely (1982).

The octupole component can be introduced into the field by changing the lens design, which in fact amounts to superposition of an octupole lens, rather than by introducing additional elements. This can be illustrated by the crossed lens described in Section XI. Here we shall dwell on the designs based on quadrupole lenses. An octupole component arises in a quadrupole if
FIG. 81. Quadrupole-octupole lenses: (a) five-electrode; (b) asymmetrical with planar electrodes; (c) three-electrode lenses.
it has no plane of field symmetry or of field antisymmetry. This situation can be achieved either by disturbing the geometrical symmetry of the lens or (in some cases) by changing the power supply. The potential distribution in such a lens is described by the series in Eq. (33), in which $\varphi_4(z) \neq 0$. In this case a potential $\varphi(z)$ usually arises on the axis. The fourth-order term with respect to the x- and y-coordinates coincides with the first term in the potential expansion of the octupole and can be employed for correction of third-order geometrical aberrations. The value and sign of $\varphi_4(z)$ can be varied by varying the electrode profiles.

One of the simplest designs of a combined quadrupole-octupole lens permitting electrical control of the octupole field component is the five-electrode lens (Baranova et al., 1968), a possible modification of which is given in Fig. 81(a). The lens consists of four inner electrodes and one outer electrode embracing the other four. The quadrupole component is determined by the potential $V$. The octupole component is formed due to the bulging of the field between the inner electrodes, and its value varies with the potential $\pm U$. The octupole that is formed in this case differs from the one shown in Fig. 80 in that it has no planes of field antisymmetry. In the general case, the axial potential of a five-electrode lens is nonzero, which gives rise to an axially symmetric field component. The relationships between the components in the lens potential expansion, characterized by the functions $\varphi_i(z)$ [see Eq. (33)], can be varied by varying the geometrical parameters of the lens: the angular dimension of the interelectrode gap $2\delta$ and the ratio of the radii $R_1/R_2$. The potential that arises on the axis can be compensated by applying a potential equal in magnitude but opposite in sign to all the lens electrodes.
This lens compares favorably with an eight-electrode quadrupole-octupole lens because it is simpler in design but retains independent electrical control of the quadrupole and octupole components. Partial correction of third-order aberrations is also possible in quadrupole
lenses, which have only two planes of geometrical symmetry instead of four. This can be achieved, for example, by moving apart one pair of electrodes or by changing the electrode size ratios. Figure 81(b) illustrates an asymmetric lens with planar electrodes. A similar design can be made on the basis of a lens with concave electrodes by increasing the angular dimensions of one pair of electrodes at the expense of the other pair. A three-electrode quadrupole lens formed from the embracing outer electrode and two inner electrodes is also possible [Fig. 81(c)]. The advantage of asymmetrical lenses is the design simplicity and the absence of additional potentials for creating the octupole component. One disadvantage is that no electrical adjustment is possible.

A system of two sextupoles has also been suggested as a corrector of third-order spherical aberration (Crewe and Kopf, 1980).

One example of a lens with a transverse field possessing two planes of symmetry and having no planes of antisymmetry is the biplanar lens (Afanas'ev, 1982; Afanas'ev and Sadykin, 1982). Its electrodes lie in two parallel planes, and the potentials are applied to them symmetrically relative to the midplanes. Such a lens is similar in design to a two-dimensional einzel lens [Fig. %(a)]; the only difference is in the direction of particle movement: the axis of a biplanar lens is parallel to the x-axis, while that of a two-dimensional lens coincides with the z-axis. The methods for calculation of the field distribution in these lenses are the same, but the power expansion in small parameters is made around different axes. The basic field component of a biplanar lens in the vicinity of the axis is the quadrupole. The field also contains a weak axially symmetric component associated with the edge effects, as well as all even harmonics, since the planes of field antisymmetry are absent.
Calculations of the lens paraxial properties made in the rectangular model approximation for the quadrupole component and assuming linear variation for the axial potential at the lens edges are consistent with the experimental results obtained by the shadow projection method. The advantages of this lens are design simplicity and the possibility of correcting the geometrical aberrations.

D. Lenses with Partial Aberration Correction
Here we will discuss some unusual electrostatic lens designs, in which third-order aberrations can be corrected. The correction is possible due to the octupole component arising in the lens. The work of Okayama and Kawakatsu (1982) describes a new electron optical element consisting of an electrostatic quadrupole lens and a round aperture placed coaxially at some distance from it. When a nonzero potential is applied to the aperture diaphragm, an
octupole component arises in the region where the quadrupole and aperture fields overlap. The potential distribution of such an element was found by numerical solution of the three-dimensional Laplace equation using the relaxation method. It was shown that the effective octupole lies near the edge of the quadrupole lens. The paraxial properties and third-order aberration coefficients were calculated as functions of the excitation of the quadrupole and the potential on the aperture diaphragm. It was found that in such an element the spherical aberration in the converging plane could be completely corrected. This conclusion was confirmed experimentally using the shadow projection method. An advantage of this design is the automatic adjustment of the effective octupole with respect to the quadrupole lens, which permits the image errors due to imperfect adjustment to be eliminated.
A way to correct the aberrations of lenses whose electrodes are cylindrical or resemble elongated boxes with a rectangular cross section was described in the monograph by Klemperer and Barnett (1971). For this purpose, the gaps between the electrodes were curved (lipped lenses). Such lenses were studied in detail by Glikman and Sekunova (1981) and Glikman and Iskakova (1982). Depending on the degree of symmetry, the lens field may include, besides an axially symmetric component, a quadrupole and/or an octupole component. The first of these works describes a box-like lens with a square cross section that has the gaps on each face cut along a circle. Its field possesses four planes of symmetry and is a superposition of axially symmetric and octupole components. The lens forms a correct electron optical image, but its spherical aberration differs from that of a round lens in having two coefficients instead of one. It has been pointed out earlier that the spherical aberration of an axially symmetric system cannot be compensated by octupoles unless astigmatic elements are introduced.
However, partial reduction of the aberration is possible. Astigmatic einzel tube lenses formed by three electrodes are considered by Glikman and Iskakova (1982). The lines separating the electrodes are the curves along which two cylinders of identical radius intersect (Fig. 82). The lens is astigmatic because of the presence of only two planes of symmetry. The authors found the parameters that describe the paraxial properties of such lenses and the coefficients of their spherical aberrations. It is shown that the coefficients are of opposite signs and that one of them passes through zero. Tube lenses with interelectrode gaps having a rectangular tooth-like profile have been described by Glikman et al. (1984) and Glikman and Iskakova (1985). These lenses can be recommended when high-quality focusing in one direction is necessary. Tube lenses can be used as stigmators.
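As a rough illustration of the relaxation method mentioned above for the quadrupole-aperture element, the following Python sketch solves the Laplace equation on a square grid by successive over-relaxation. It is a two-dimensional simplification and not the three-dimensional calculation of Okayama and Kawakatsu (1982); the function names, the grid size, the relaxation factor, and the unit electrode potentials are assumptions chosen for illustration only.

```python
def relax_laplace(boundary, n=33, omega=1.8, tol=1e-8, max_sweeps=5000):
    """Solve the 2D Laplace equation on an n x n grid by successive over-relaxation.

    boundary(i, j) returns the fixed potential at boundary node (row i, column j).
    Interior nodes start at zero and are relaxed until the largest change
    in one sweep falls below tol (or max_sweeps is reached).
    """
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):
                phi[i][j] = boundary(i, j)
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Five-point stencil: the average of the four neighbours.
                avg = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                              + phi[i][j - 1] + phi[i][j + 1])
                change = omega * (avg - phi[i][j])
                phi[i][j] += change
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return phi


def quadrupole_boundary(n):
    """Hypothetical planar quadrupole: side walls at +1, top and bottom at -1.

    The four corner nodes are never used by the five-point stencil,
    so the value assigned to them is irrelevant.
    """
    def boundary(i, j):
        return 1.0 if j in (0, n - 1) else -1.0
    return boundary


n = 33
phi = relax_laplace(quadrupole_boundary(n), n=n)
c = n // 2  # index of the axis (grid centre)
print("potential on the axis:", phi[c][c])
print("potential 4 cells off axis along x and along y:", phi[c][c + 4], phi[c + 4][c])
```

In this model the computed potential vanishes on the axis and changes sign between the x and y directions, as expected for a quadrupole field.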
FIG. 82. An astigmatic einzel tube lens.
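The connection between electrode symmetry and the harmonic content of the field, invoked several times in this section, can be checked numerically. The sketch below assumes a two-dimensional model: infinitely long concave electrodes lying on a cylinder of unit radius, with negligible gaps and potentials of +1 and -1. It computes the coefficients a_m of the boundary expansion V(θ) = a_0 + Σ a_m cos mθ; inside the cylinder each coefficient multiplies r^m cos mθ. The function names and the electrode half-angles are illustrative assumptions and are not taken from the cited works.

```python
import math


def boundary_potential(theta, half_angle):
    """Potential on the unit cylinder.

    +1 on the electrode pair centred on the x-axis (angular half-width
    half_angle), -1 on the pair centred on the y-axis.
    """
    # Angular distance from the nearest x-axis direction (0 or pi).
    t = abs(math.remainder(theta, math.pi))
    return 1.0 if t < half_angle else -1.0


def cosine_harmonics(half_angle, m_max=8, n=7200):
    """Cosine coefficients a_0 .. a_m_max of the boundary potential (midpoint rule)."""
    coeffs = []
    for m in range(m_max + 1):
        total = 0.0
        for k in range(n):
            theta = 2.0 * math.pi * (k + 0.5) / n
            total += boundary_potential(theta, half_angle) * math.cos(m * theta)
        norm = 1.0 / n if m == 0 else 2.0 / n
        coeffs.append(total * norm)
    return coeffs


symmetric = cosine_harmonics(math.pi / 4)   # four equal electrodes
asymmetric = cosine_harmonics(math.pi / 3)  # one pair widened at the expense of the other

for name, coeffs in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(name, [round(a, 4) for a in coeffs])
```

With equal electrodes only the harmonics m = 2, 6, ... survive, as expected when planes of antisymmetry are present. Widening one pair to a half-angle of π/3 removes the planes of antisymmetry and adds an axially symmetric term (m = 0) and an octupole term (m = 4).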
XIII. CONCLUSION
Electron optics, which at its initial stage of development served mainly as a basis for electron microscopy and mass spectrometry, has expanded its range of application considerably. During the last decade, interest in electrostatic electron optics has increased greatly. This interest has been largely stimulated by new applications of electrostatic optical devices and instruments, in particular, to solve important technological problems in the field of solid state electronics. To cope with these problems, it is necessary to increase the lens gathering power, to optimize charged particle convergence, to create the desired current density distribution in the beam cross section, etc. Investigations, primarily of theoretical character, have been concerned with both conventional systems and the design of new ones.
At present, there is a great variety of lenses and lens systems. The type of electron optical element and its basic properties are determined by such parameters as field symmetry, independence of the potential distribution from any of the coordinates, as well as the mutual positions of the field symmetry axis and the axis of the focused beam. The effect of the latter factor can be illustrated with reference to various classes of electron optical elements created on the basis of an axially symmetric field. When the beam axis coincides with the axis of field rotational symmetry, we have a standard round lens. When, however, the axes are normal to each other, we obtain a transaxial lens. The fundamental difference in their electron optical properties is quite evident.
The material presented here shows that the requirements that arise in each particular case cannot be satisfactorily met by any one class of lenses. Each class of lenses has a predominant sphere of application. The choice is primarily made on the basis of lens paraxial properties.
(The choice between electrostatic and magnetic systems is made on the basis of their properties discussed in the introduction.) When a sharp, undistorted image is desired,
round lenses are undoubtedly preferable. If, however, a perfect image is unnecessary, astigmatic lenses should be preferred in many cases.
Within each class of lenses, various designs are possible, which are determined, on the one hand, by the permissible dimensions and production technology and, on the other hand, by the requirements on the optical characteristics. Of great importance is lens optimization in order to obtain the desired values of the parameters of the instruments to be designed. Extensive investigations of various lens types in a wide range of geometrical and electrical parameters have provided a basis for a possible choice of suitable modifications. This book has given a few illustrations to show how optical characteristics can be substantially improved by selection of lens designs, in particular, by introducing asymmetry, by increasing the number of electrodes, and by using a complex power supply. Another straightforward approach to optimization problems is the direct search for a potential distribution satisfying the requirements imposed on the system, in particular, the requirement of minimum aberrations. To minimize the spherical and chromatic aberrations in a system of quadrupole-octupole lenses, this approach has involved application of variational methods. However, optimization problems cannot be considered to have been solved, and the potentialities of both trends in theoretical research are still great.
Of importance also is the improvement of production technology (machining and adjustment of lenses), which may affect the “mechanical aberrations.” This question has many aspects and presents certain difficulties in calculations.
In addition to the comparatively new types of lenses, transaxial and crossed, which have found application, other promising electron optical systems are emerging. These are, for example, coaxial cylindrical lenses with a hollow beam, which provide high optical power.
Such lenses, as well as some other new lens types, have been described in Section XII. New theoretical and computational methods are currently being developed to calculate the electrostatic fields. Much work is being done to standardize the program packages for computer analysis of fields and properties of electron optical systems. Progress in this area is directly related to further advancement in the study of electron optical systems, in particular, lenses. We may expect further improvement of electron optical characteristics and more extensive applications of electrostatic lenses.
REFERENCES
Adams, A., and Read, F. H. (1972a). J. Phys. E: Sci. Instr. 5, 150.
Adams, A., and Read, F. H. (1972b). J. Phys. E: Sci. Instr. 5, 156.
Afanas’ev, V. P. (1982). Zh. Tekh. Fiz. 52, 945; Sov. Phys. Tech. Phys. 27, 604.
Afanas’ev, V. P., and Sadykin, A. D. (1982). Zh. Tekh. Fiz. 52, 1213, 1226; Sov. Phys. Tech. Phys. 27, 735, 737.
Afanas’ev, V. P., and Yavor, S. Ya. (1973). Zh. Tekh. Fiz. 43, 1371; Sov. Phys. Tech. Phys. 18, 872.
Afanas’ev, V. P., and Yavor, S. Ya. (1977). Zh. Tekh. Fiz. 47, 908; Sov. Phys. Tech. Phys. 22, 544.
Afanas’ev, V. P., Glukhoi, Yu. O., and Yavor, S. Ya. (1975). Zh. Tekh. Fiz. 45, 1526, 1973; Sov. Phys. Tech. Phys. 20, 969, 1240.
Afanas’ev, V. P., Baranova, L. A., Ovsyannikova, L. P., and Yavor, S. Ya. (1979). Zh. Tekh. Fiz. 49, 733; Sov. Phys. Tech. Phys. 24, 425.
Afanas’ev, V. P., Baranova, L. A., Petrov, I. A., and Yavor, S. Ya. (1980). Optik 56, 261.
Afanas’ev, V. P., Baranova, L. A., Ovsyannikova, L. P., and Yavor, S. Ya. (1982). Tenth Proc. Int. Congr. Electron Microscopy, Hamburg.
Alexandrov, M. L., Gall’, L. N., Lebedev, G. V., and Pavlenko, V. A. (1977). Zh. Tekh. Fiz. 47, 241; Sov. Phys. Tech. Phys. 22, 139.
Aničin, B., Terzić, I., Vukanić, J., and Babović, V. (1976). J. Phys. E: Sci. Instr. 9, 837.
Artamonov, O. M., Bolotov, B. B., and Smirnov, O. M. (1976). Prib. Tekh. Exp. N3, 207.
Augustyniak, W. M., Betteridge, D., and Brown, W. L. (1978). Nucl. Instr. Methods 149, 669.
Balandin, G. D., Gaydukova, I. S., Ignat’ev, A. N., and Der-Shvarts, G. V. (1977). Elektronnaya Tekhnika Ser. 4, N1, 29.
Ballu, Y. (1980). In “Applied Charged Particle Optics.” Part B. Adv. in Electronics and Electron Physics (A. Septier, ed.), Suppl. 13B, p. 257. Academic Press, New York.
Banford, A. P. (1966). “The Transport of Charged Particle Beams.” E. & F. N. Spon Ltd., London.
Baranova, L. A., and Ovsyannikova, L. P. (1971). Zh. Tekh. Fiz. 41, 2182; Sov. Phys. Tech. Phys. 16, 1730.
Baranova, L. A., and Yavor, S. Ya. (1984). Zh. Tekh. Fiz. 54, 1999; Sov. Phys. Tech. Phys. 29, 1173.
Baranova, L. A., Fishkova, T. Ya., and Yavor, S. Ya. (1968). Radiotekh. Electron. 13, 2108; Radio Eng. Electron. Phys. 13.
Baranova, L. A., Ovsyannikova, L. P., and Yavor, S. Ya. (1971). Zh. Tekh. Fiz. 41, 1323; Sov. Phys. Tech. Phys. 16, 1040.
Baranova, L. A., Petrov, I. A., and Yavor, S. Ya. (1978). Zh. Tekh. Fiz. 48, 2588; Sov. Phys. Tech. Phys. 23, 1481.
Baranova, L. A., Sadykin, A. D., Muchin, V. M., and Yavor, S. Ya. (1982). Zh. Tekh. Fiz. 52, 246; Sov. Phys. Tech. Phys. 27, 161.
Baranova, L. A., Narylkov, S. G., and Yavor, S. Ya. (1985). Zh. Tekh. Fiz. 55, 2209; Sov. Phys. Tech. Phys. 30, 1303.
Baranova, L. A., Sadykin, A. D., and Yavor, S. Ya. (1986a). Radiotekh. Electron. 31, 365; Radio Eng. Electron. Phys. 31.
Baranova, L. A., Narylkov, S. G., and Yavor, S. Ya. (1986b). Radiotekh. Electron. 31, 778; Radio Eng. Electron. Phys. 31 (8), 169.
Baranova, L. A., Narylkov, S. G., and Yavor, S. Ya. (1986c). Zh. Tekh. Fiz. 56, 2075; Sov. Phys. Tech. Phys. 31, 1246.
Baranova, L. A., Narylkov, S. G., and Yavor, S. Ya. (1986d). Zh. Tekh. Fiz. 56, 2279; Sov. Phys. Tech. Phys. 31, 1366.
Baranova, L. A., Bublyaev, R. A., and Yavor, S. Ya. (1987a). Zh. Tekh. Fiz. 57, 430; Sov. Phys. Tech. Phys. 32, 261.
Baranova, L. A., Narylkov, S. G., and Yavor, S. Ya. (1987b). Zh. Tekh. Fiz. 57, 156; Sov. Phys. Tech. Phys. 32, 91.
Berger, C., and Baril, M. (1982). J. Appl. Phys. 53, 3950.
Bernhard, W. (1980). Optik 57, 73.
Binns, K. J., and Lawrenson, P. J. (1963). “Analysis and Computation of Electric and Magnetic Field Problems.” Pergamon Press, Oxford.
Bobykin, B. V., Nevinnyi, Yu. A., and Yakushev, E. M. (1975). Zh. Tekh. Fiz. 45, 2369; Sov. Phys. Tech. Phys. 20, 1475.
Bobykin, B. V., Zhdanov, V. S., Zernov, A. A., Lyubov, S. K., Malka, V. Y., and Nevinnyi, Yu. A. (1976). Zh. Tekh. Fiz. 46, 1348; Sov. Phys. Tech. Phys. 21, 766.
Bobykin, B. V., Volkova, I. G., Gall’, R. N., Karetskaya, S. P., Kel’man, V. M., Nevinnyi, Yu. A., and Kholyn, N. A. (1978). Zh. Tekh. Fiz. 48, 853; Sov. Phys. Tech. Phys. 23, 500.
Bonjour, P. (1979a). Rev. Phys. Appl. 14, 533.
Bonjour, P. (1979b). Rev. Phys. Appl. 14, 715.
Bonner, R. F., Hamilton, G. F., and March, R. E. (1979). Int. J. Mass Spectrom. Ion Phys. 30, 365.
Bonshiedt, B. E., and Markovich, M. G. (1967). “Fokusirovka i Otklonenie Puchkov v Elektronnoluchevykh Priborakh.” (Focusing and Deflection of Beams in Cathode Ray Devices.) Sov. Radio, Moskow.
Brodsky, G. N., and Yavor, S. Ya. (1970). Zh. Tekh. Fiz. 40, 1304; Sov. Phys. Tech. Phys. 15, 1006.
Brodsky, G. N., and Yavor, S. Ya. (1971). Zh. Tekh. Fiz. 41, 460; Sov. Phys. Tech. Phys. 16, 356.
Brunt, J. N. H., and Read, F. H. (1975). J. Phys. E: Sci. Instr. 8, 1015.
Busch, H. (1926). Ann. Phys. 81, 974.
Cherepin, B. T. (1981). “Ionny Zond.” (Ion Microprobe.) Naukova Dumka, Kiev.
Cherepin, B. T., and Vasil’ev, M. A. (1982). “Metody i Pribory dlya Analiza Poverkhnosti.” (Technique and Instruments for Surface Analysis.) Naukova Dumka, Kiev.
Ćirić, D., Terzić, I., and Vukanić, J. (1976a). J. Phys. E: Sci. Instr. 9, 839.
Ćirić, D., Terzić, I., and Vukanić, J. (1976b). J. Phys. E: Sci. Instr. 9, 844.
Cook, R. D., and Heddle, D. W. O. (1976). J. Phys. E: Sci. Instr. 9, 279.
Courant, E. D., Livingston, M. S., and Snyder, H. S. (1952). Phys. Rev. 88, 1190.
Crewe, A. W., and Kopf, D. (1980). Optik 56, 391.
Davisson, C. J., and Calbick, C. J. (1931). Phys. Rev. 38, 585.
Der-Shvarts, G. V., and Makarova, I. S. (1966). Radiotekh. Electron. 11, 1802; Radio Eng. Electron. Phys. 11.
Der-Shvarts, G. V., and Makarova, I. S. (1969). Radiotekh. Electron. 14, 378; Radio Eng. Electron. Phys. 14.
De Wolf, D. A. (1978). Proc. IEEE 66, 85.
De Wolf, D. A. (1981). Proc. IEEE 69, 123.
Di Chio, D., Natali, S. V., and Kuyatt, C. E. (1974). Rev. Sci. Instr. 45, 559.
Doynikov, N. I. (1966). Elektrofizicheskaya Apparatura N4, 84.
Draper, I., and Lee, Ch. (1977). Rev. Sci. Instr. 48, 809.
Drummond, I. W. (1981). Vacuum 31, 579.
Dymnikov, A. D., and Yavor, S. Ya. (1963). Zh. Tekh. Fiz. 33, 851; Sov. Phys. Tech. Phys. 8, 639.
Dymnikov, A. D., Fishkova, T. Ya., and Yavor, S. Ya. (1965). Doklady Akad. Nauk SSSR 162, 1265.
Dymnikov, A. D., Fishkova, T. Ya., and Yavor, S. Ya. (1966). Izv. Akad. Nauk SSSR (Ser. Fiz.) 30, 739; Bull. Acad. Sci. USSR (Phys. Ser.) 30.
Enge, H. A. (1959). Rev. Sci. Instr. 30, 248.
Enge, H. A. (1961). Rev. Sci. Instr. 32, 662.
Fink, I., and Kisker, E. (1980). Rev. Sci. Instr. 51, 918.
Fishkova, T. Ya. (1980). Elektronnaya Tekhnika Ser. 4, N4, 19.
Fishkova, T. Ya., Baranova, L. A., and Yavor, S. Ya. (1968). Zh. Tekh. Fiz. 38, 694; Sov. Phys. Tech. Phys. 13, 520.
Fiyata, Y., and Matzuda, H. (1975). Nucl. Instr. Methods 123, 495.
Gaidukova, I. S., Il’ina, O. Yu., and Yarmusevich, Ya. S. (1980). Radiotekh. Electron. 25, 1256; Radio Eng. Electron. Phys. 25 (6), 110.
Gavrilov, E. I., and Shpak, E. V. (1983). Zh. Tekh. Fiz. 53, 1637; Sov. Phys. Tech. Phys. 28, 1007.
Geyzler, E. S., Kucherov, G. V., and Tsyganenko, V. V. (1981). Radiotekh. Electron. 26, 416; Radio Eng. Electron. Phys. 26 (2), 146.
Glaser, W. (1952). “Grundlagen der Elektronenoptik.” Springer, Wien.
Glavish, H. F. (1972). Nucl. Instr. Methods 99, 109.
Glikman, L. G., and Iskakova, Z. D. (1982). Zh. Tekh. Fiz. 52, 1874; Sov. Phys. Tech. Phys. 27, 1150.
Glikman, L. G., and Iskakova, Z. D. (1985). Zh. Tekh. Fiz. 55, 422, 620; Sov. Phys. Tech. Phys. 30, 251, 367.
Glikman, L. G., and Sekunova, L. M. (1981). Zh. Tekh. Fiz. 51, 1804; Sov. Phys. Tech. Phys. 26, 1046.
Glikman, L. G., and Yakushev, E. M. (1967). Zh. Tekh. Fiz. 37, 2097; Sov. Phys. Tech. Phys. 12, 1544.
Glikman, L. G., Kel’man, V. M., and Yakushev, E. M. (1967a). Zh. Tekh. Fiz. 37, 13; Sov. Phys. Tech. Phys. 12, 9.
Glikman, L. G., Kel’man, V. M., and Yakushev, E. M. (1967b). Zh. Tekh. Fiz. 37, 1720; Sov. Phys. Tech. Phys. 12, 1261.
Glikman, L. G., Karetskaya, S. P., Kel’man, V. M., and Yakushev, E. M. (1971). Zh. Tekh. Fiz. 41, 330; Sov. Phys. Tech. Phys. 16, 247.
Glikman, L. G., Kel’man, V. M., and Nurmanov, M. Sh. (1973a). Zh. Tekh. Fiz. 43, 1358, 2278; Sov. Phys. Tech. Phys. 18, 864, 1441.
Glikman, L. G., Kel’man, V. M., and Fedulina, L. V. (1973b). Zh. Tekh. Fiz. 43, 1793; Sov. Phys. Tech. Phys. 18, 1139.
Glikman, L. G., Pavlichkova, O. B., and Spivak-Lavrov, I. F. (1977). Zh. Tekh. Fiz. 47, 1372; Sov. Phys. Tech. Phys. 22, 788.
Glikman, L. G., Iskakova, Z. D., and Petrov, I. A. (1984). Zh. Tekh. Fiz. 54, 2342; Sov. Phys. Tech. Phys. 29, 1378.
Grime, G. W., Watt, F., Blower, G. D., Takacs, J., and Jamieson, D. N. (1982). Nucl. Instr. Methods 197, 97.
Grinberg, G. A. (1948). “Izbrannye Voprosy Matematicheskoy Teorii Elektricheskikh i Magnitnykh Yavleniy.” (Selected Problems of the Mathematical Theory of Electric and Magnetic Phenomena.) Nauka, Moskow-Leningrad.
Gritsyuk, N. P., and Lachashvili, R. A. (1979). Zh. Tekh. Fiz. 49, 2467; Sov. Phys. Tech. Phys. 24, 1389.
Grivet, P. (1972). “Electron Optics.” Pergamon Press, Oxford.
Grumm, H. (1952). Optik 9, 281.
Hanszen, K. J., and Lauer, R. (1967). In “Focusing of Charged Particles” (A. Septier, ed.), Vol. 1, p. 251. Academic Press, New York, London.
Harting, E., and Read, F. H. (1976). “Electrostatic Lenses.” Elsevier, Amsterdam.
Hawkes, P. W. (1965a). Phil. Trans. Roy. Soc. London A257, 479.
Hawkes, P. W. (1965b). Optik 22, 543.
Hawkes, P. W. (1966/1967). Optik 24, 60.
Hawkes, P. W. (1970). “Quadrupoles in Electron Lens Design.” Academic Press, London.
Hawkes, P. W. (1977). Optik 48, 29.
Hawkes, P. W. (1980). In “Applied Charged Particle Optics.” (A. Septier, ed.) Adv. in Electronics and Electron Physics, Suppl. 13A, p. 45. Academic Press, New York.
Heddle, D. W. O., Papadovassilakis, N., and Yateem, A. M. (1982). J. Phys. E: Sci. Instr. 15, 1210.
Hely, H. (1982). Optik 60, 353.
Himmelbauer, E. E. (1969). Philips Res. Repts. Suppl. 1, 1.
Hoof, H. A. (1981). J. Phys. E: Sci. Instr. 14, 325.
Il’in, V. P. (1974). “Chislennye Metody Resheniya Zadach Elektrooptiky.” (Numerical Methods for Computing Electron Optical Problems.) Nauka, Novosibirsk.
Iznar, A. N. (1977). “Elektronno-opticheskie Pribory.” (Electron Optical Devices.) Mashinostroenie, Moskow.
Kanaya, K., and Baba, N. (1977). Optik 47, 239.
Kanaya, K., and Baba, N. (1978). J. Phys. E: Sci. Instr. 11, 265.
Kantorovich, L. V., and Krylov, V. I. (1950). “Priblizhennye Metody Vysshego Analiza.” (Approximate Methods of Higher Mathematical Analysis.) Gostechteorizdat, Moskow-Leningrad.
Kapchinsky, I. M. (1966). “Dinamika Chastits v Lineynykh Rezonansnykh Uskoritelyakh.” (Particle Dynamics in Resonance Accelerators.) Atomizdat, Moskow.
Karetskaya, S. P., Kel’man, V. M., and Yakushev, E. M. (1970). Zh. Tekh. Fiz. 40, 2563; Sov. Phys. Tech. Phys. 15, 2010.
Karetskaya, S. P., Kel’man, V. M., and Yakushev, E. M. (1971). Zh. Tekh. Fiz. 41, 325; Sov. Phys. Tech. Phys. 16, 244.
Kartashev, V. P., and Kotov, V. I. (1966). Zh. Tekh. Fiz. 36, 1569; Sov. Phys. Tech. Phys. 11, 1173.
Kartashev, V. P., Kotov, V. I., and Khozyrev, Yu. S. (1976). Zh. Tekh. Fiz. 46, 1342; Sov. Phys. Tech. Phys. 21, 763.
Kawakatsu, H., Vosburg, K. G., and Siegel, B. M. (1968). J. Appl. Phys. 39, 255.
Kel’man, V. M., and Yavor, S. Ya. (1961). Zh. Tekh. Fiz. 31, 1439; Sov. Phys. Tech. Phys. 6, 1052.
Kel’man, V. M., and Yavor, S. Ya. (1968). “Electronnaya Optika” (Electron Optics), 3rd ed. Nauka, Leningrad.
Kel’man, V. M., Nazarenko, L. M., and Yakushev, E. M. (1976). Zh. Tekh. Fiz. 46, 1700; Sov. Phys. Tech. Phys. 21, 979.
Kel’man, V. M., Karetskaya, S. P., Fedulina, L. V., and Yakushev, E. M. (1979). “Elektronno-Opticheskie Elementy Prizmennykh Spektrometrov Zaryazhennykh Chastits.” (Electron Optical Elements of Prism Spectrometers for Charged Particles.) Nauka, Alma-ata (Kazakh SSR).
Kisker, E. (1982). Rev. Sci. Instr. 53, 114.
Kiss, A., Koltay, E., Ovsyannikova, L. P., and Yavor, S. Ya. (1970). Nucl. Instr. Methods 78, 238.
Klemperer, O., and Barnett, M. E. (1971). “Electron Optics,” 3rd ed. University Press, Cambridge.
Kodama, M. (1980). Jpn. J. Appl. Phys. 19, 395.
Koltay, E., Kiss, I., Baranova, L. A., and Yavor, S. Ya. (1972). Radiotekh. Electron. 17, 1906; Radio Eng. Electron. Phys. 17, 1518.
Kotov, V. I., and Miller, V. V. (1969). “Fokusirovka i Razdelenie po Massam Chastits Vysokich Energy.” (Focusing and Mass Separation of High Energy Particles.) Atomizdat, Moskow.
Krejcik, P., Dalglish, R. L., and Kelly, J. C. (1979). J. Phys. D: Appl. Phys. 12, 161.
Krejcik, P., Kelly, J. C., and Dalglish, R. L. (1980a). Nucl. Instr. Methods 168, 247.
Krejcik, P., King, B. V., and Kelly, J. C. (1980b). Optik 55, 385.
Kuyatt, C. E., Natali, S., and Di Chio, D. (1972). Rev. Sci. Instr. 43, 84.
Landau, L. D., and Lifshits, E. M. (1960). “Teoriya Polya.” (The Theory of Field.) Fizmatgiz, Moskow.
Larson, J. D. (1981). Nucl. Instr. Methods 189, 71.
Lawson, J. D. (1977). “The Physics of Charged-Particle Beams.” Clarendon Press, Oxford.
Lebedev, N. N., Skalskaya, I. P., and Ufland, Ya. S. (1955). “Zbornik Zadach po Matematicheskoy Fizike.” (Problem Exercises in Mathematical Physics.) Fizmatgiz, Moskow.
Legge, G. J. F., Jamieson, D. N., O’Brien, P. M. J., and Mazzolini, A. P. (1982). Nucl. Instr. Methods 197, 85.
Lejeune, C., and Aubert, J. (1980). In “Applied Charged Particle Optics.” Part A. Adv. in Electronics and Electron Physics (A. Septier, ed.), Suppl. 13A, p. 159. Academic Press, New York.
Levi-Setti, R. (1980). In “Applied Charged Particle Optics.” Part A. Adv. in Electronics and Electron Physics (A. Septier, ed.), Suppl. 13A, p. 261. Academic Press, New York.
Lichtenberg, A. J. (1969). “Phase-Space Dynamics of Particles.” John Wiley and Sons, New York.
Liebl, H. (1967). J. Appl. Phys. 38, 5277.
Liebl, H. (1979). Optik 53, 333.
Liebl, H. (1981). Nucl. Instr. Methods 187, 143.
Lyubchik, Ya. G., Savina, N. V., Fishkova, T. Ya., and Shkunov, V. A. (1971). Radiotekh. Electron. 16, 1941; Radio Eng. Electron. Phys. 16.
Martin, F. W., and Goloskie, R. (1982). Appl. Phys. Lett. 40, 191.
McHugh, J. A. (1975). In “Methods of Surface Analysis.” (A. W. Czanderna, ed.), p. 223. Elsevier, Amsterdam.
Mulvey, T., and Wallington, M. J. (1973). Rep. Prog. Phys. 36, 347.
Natali, S., Di Chio, D., Uva, E., and Kuyatt, C. E. (1972). Rev. Sci. Instr. 43, 80.
Nevinnyi, Yu. A., Sekunova, L. M., and Yakushev, E. M. (1985). Zh. Tekh. Fiz. 55, 1713; Sov. Phys. Tech. Phys. 30, 1001.
Novgorodtsev, A. B. (1982). Zh. Tekh. Fiz. 52, 2047; Sov. Phys. Tech. Phys. 27, 1257.
Ohiwa, H., Blackwell, R. J., and Siegel, B. M. (1981). J. Vac. Sci. Tech. 19, 1074.
Okayama, S., and Kawakatsu, H. (1978). J. Phys. E: Sci. Instr. 11, 211.
Okayama, S., and Kawakatsu, H. (1982). J. Phys. E: Sci. Instr. 15, 580.
Okayama, S., and Kawakatsu, H. (1983). J. Phys. E: Sci. Instr. 16, 166.
Orloff, J. H., and Swanson, L. W. (1978). J. Vac. Sci. Tech. 15, 845.
Orloff, J. H., and Swanson, L. W. (1979). J. Appl. Phys. 50, 2494.
Ovsyannikova, L. P., and Shpak, E. V. (1977a). Zh. Tekh. Fiz. 47, 438; Sov. Phys. Tech. Phys. 22, 260.
Ovsyannikova, L. P., and Shpak, E. V. (1977b). Zh. Tekh. Fiz. 47, 617; Sov. Phys. Tech. Phys. 22, 371.
Ovsyannikova, L. P., and Shpak, E. V. (1978). Zh. Tekh. Fiz. 48, 1304; Sov. Phys. Tech. Phys. 23, 732.
Ovsyannikova, L. P., and Szilagyi, M. (1970). Periodica Polytechnica-Electrotekhnika 14, 99.
Ovsyannikova, L. P., and Yavor, S. Ya. (1965). Zh. Tekh. Fiz. 35, 940; Sov. Phys. Tech. Phys. 10, 723.
Ovsyannikova, L. P., and Yavor, S. Ya. (1967). Radiotekh. Electron. 12, 489; Radio Eng. Electron. Phys. 12, 449.
Ovsyannikova, L. P., Chechulin, V. N., and Yavor, S. Ya. (1968). Zh. Tekh. Fiz. 38, 1953; Sov. Phys. Tech. Phys. 13, 1566.
Ovsyannikova, L. P., Utochkin, B. A., Fishkova, T. Ya., and Yavor, S. Ya. (1972). Radiotekh. Electron. 17, 1062; Radio Eng. Electron. Phys. 17, 825.
Ovsyannikova, L. P., Shpak, E. V., and Yavor, S. Ya. (1975). Zh. Tekh. Fiz. 45, 2421; Sov. Phys. Tech. Phys. 20, 1509.
Papoulis, A. (1968). “Systems and Transforms with Applications in Optics.” McGraw-Hill, New York.
Pease, R. F. W. (1981). Contemp. Phys. 22, 265.
Petrov, I. A. (1975). Zh. Tekh. Fiz. 45, 2203; Sov. Phys. Tech. Phys. 20, 1380.
Petrov, I. A. (1976). Zh. Tekh. Fiz. 46, 1085; Sov. Phys. Tech. Phys. 21, 640.
Petrov, I. A. (1982). Elektronnaya Tekhnika, Ser. 4, N3, 30.
Petrov, I. A., and Yavor, S. Ya. (1975). Pis’ma Zh. Tekh. Fiz. 1, 651; Sov. Phys. Tech. Phys. Lett. 1, 289.
Petrov, I. A., and Yavor, S. Ya. (1976). Zh. Tekh. Fiz. 46, 1710; Sov. Phys. Tech. Phys. 21, 985.
Petrov, I. A., Baranova, L. A., and Yavor, S. Ya. (1978). Zh. Tekh. Fiz. 48, 408; Sov. Phys. Tech. Phys. 23, 242.
Pierce, J. R. (1954). “Theory and Design of Electron Beams,” 2nd ed. Van Nostrand, New York.
Pohner, W. (1977). Optik 47, 283.
Rang, O. (1949). Optik 5, 518.
Read, F. H. (1969). J. Phys. E: Sci. Instr. 2, 679.
Read, F. H., Adams, A., and Soto-Montiel, J. R. (1971). J. Phys. E: Sci. Instr. 4, 625.
Rheinfurth, M. (1955). Optik 12, 411.
Rose, H. (1967). Optik 26, 289.
Saito, T., and Sovers, O. J. (1977). J. Appl. Phys. 48, 2306.
Saito, T., and Sovers, O. J. (1979). J. Appl. Phys. 50, 3050.
Saito, T., Kikuchi, M., and Sovers, O. J. (1979). J. Appl. Phys. 50, 6123.
Sakudo, N., and Hayashi, T. (1975). Rev. Sci. Instr. 46, 1060.
Scherzer, O. (1936). Zs. fur Phys. 101, 593.
Scherzer, O. (1947). Optik 2, 114.
Scherzer, O., and Typke, D. (1967/1968). Optik 26, 564.
Septier, A. (1961). In “Advances in Electronics and Electron Physics” (L. Marton, ed.), Vol. 14, p. 85. Academic Press, New York.
Septier, A. (ed.). (1967). “Focusing of Charged Particles.” Vol. 1, p. 509; Vol. 2, p. 471. Academic Press, New York.
Shimizu, K., and Kawakatsu, H. (1974). J. Phys. E: Sci. Instr. 7, 472.
Shkunov, V. A., and Semenik, G. I. (1976). “Shirokopolosnye Ostsyllograficheskie Trybki i ikh Primenenie.” (Wide Band Oscillograph Tubes and their Applications.) Energiya, Moscow.
Shpak, E. V., and Yavor, S. Ya. (1984). Zh. Tekh. Fiz. 54, 1992; Sov. Phys. Tech. Phys. 29, 1169.
Shukeylo, I. A. (1959). Zh. Tekh. Fiz. 29, 1225; Sov. Phys. Tech. Phys. 4, 1123.
Steffen, K. G. (1965). “High Energy Beam Optics.” Interscience, New York.
Strashkevich, A. M. (1962). Zh. Tekh. Fiz. 32, 1142; Sov. Phys. Tech. Phys. 7, 841.
Strashkevich, A. M. (1966). “Electronnaya Optika Elektrostaticheskikh Sistem.” (Electron Optics of Electrostatic Systems.) Energiya, Moskow, Leningrad.
Szilagyi, M., and Szep, J. (1987). IEEE Trans. Electr. Devices ED-34, 2634.
Szilagyi, M., Szep, J., and Lugosi, E. (1987). IEEE Trans. Electr. Devices ED-34, 1848.
Tsukkerman, I. I. (1972). “Preobrazovaniye Elektronnykh Izobrazheniy.” (Electron Image Processing.) Energiya, Leningrad.
Tsyganenko, V. V., and Kucherov, G. V. (1973). Radiotekh. Electron. 18, 1085; Radio Eng. Electron. Phys. 18, 805.
Tsyrlin, L. E. (1977). “Izbrannye Zadachi Rascheta Electricheskikh i Magnitnykh Poley.” (Selected Problems of Computing Electric and Magnetic Fields.) Sov. Radio, Moskow.
Vandakurov, Yu. V. (1957). Zh. Tekh. Fiz. 27, 1850; Sov. Phys. Tech. Phys. 2, 1719.
Varankin, G. K. (1974). Prib. Tekh. Exp. N1, 211.
Vijayakumar, P. S., and Szilagyi, M. (1987). Rev. Sci. Instr. 58, 953.
Vlasov, A. G., and Shapiro, Yu. A. (1974). “Metody Rascheta Emissionnykh Elektronno-Opticheskikh Sistem.” (Calculation Methods for Emission Electron Optical Systems.) Mashinostroenie, Leningrad.
Vukanić, J., Terzić, I., Aničin, B., and Ćirić, D. (1976). J. Phys. E: Sci. Instr. 9, 842.
Wannberg, B., and Skollermo, A. (1977). J. Electr. Spectr. Relat. Phen. 10, 45.
Yamazaki, H. (1979). Optik 54, 343.
Yavor, S. Ya. (1962). Proc. Sympos. Electron Vacuum Physics, Hungary, p. 125.
Yavor, S. Ya. (1968). “Fokusirovka Zaryazhennykh Chastits Kvadrupol’nymi Linzami.” (Focusing of Charged Particles by Means of Quadrupole Lenses.) Atomizdat, Moskow.
Yavor, S. Ya. (1970). Zh. Tekh. Fiz. 40, 2257; Sov. Phys. Tech. Phys. 15, 1763.
Yavor, S. Ya. (1984). Pis’ma Zh. Tekh. Fiz. 10, 183; Sov. Phys. Tech. Phys. Lett. 10, 76.
Yavor, S. Ya., Fishkova, T. Ya., Shpak, E. V., and Baranova, L. A. (1969). Nucl. Instr. Methods 76, 181.
Zienkiewicz, O. C. (1977). “The Finite Element Method in Engineering Science.” McGraw-Hill, London.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 76
Electron Microscopy of Fast Processes
O. BOSTANJOGLO
Optisches Institut der Technischen Universität Berlin
Berlin, Federal Republic of Germany
I. Introduction
II. Interactions of Electrons with Matter
A. Single Scattering of Primary Electrons
B. Secondary and Backscattered Electrons
C. Auger Electron and Characteristic X-Ray Emission
D. Optical Photons
E. Electron-Beam-Induced Conductivity (EBIC)
III. Modes of Electron Microscopy
A. Conventional Electron Microscopy
B. Scanning Electron Microscopy
IV. Time-Resolved Electron Microscopy
A. Generation of Pulsed Electron Beams
B. Image Detector for Short-Exposure Electron Microscopy
C. Periodic Processes and Stroboscopic Electron Microscopy
D. Fast Nonrepetitive Processes and Real-Time Electron Microscopy
V. Application of Real-Time Electron Microscopy to Fast Laser-Induced Processes
A. Time Scale of Fast Laser-Induced Processes
B. Explosive Crystallization of Amorphous Films
VI. Space-Time Resolution of Real-Time Microscopy
VII. Summary
References
I. INTRODUCTION
Fast processes proceeding on the microsecond to picosecond time scale are not only of fundamental scientific interest but are also of great technological importance. Autocatalytic and laser-driven phase transitions, magnetic reversal in ferromagnetic materials, and switching of integrated electronic circuits are the most conspicuous examples of fast processes related to the above time scale. A large number of very different time-resolving techniques have been introduced to analyze their dynamics. Real-time measurements of reflectivity, transmission, and scattering of light have been extensively used by many authors (Auston et al., 1978; Baeri
et al., 1985; Bergner et al., 1987; Von der Linde and Fabricius, 1982; Lo and Compaan, 1981; Lompre et al., 1984; Shank et al., 1983) to trace laser-induced semiconductor-metal transitions in the context of laser annealing. In addition to the annealing beam, a second laser beam of a different wavelength, chosen to discriminate against the scattered annealing radiation, was used as a probe. Its intensity changes caused by phase transitions were sensed by fast photodiodes.
A time-resolving electron diffraction camera, based on a modified streak camera, has been reported by Mourou and Williamson (1982) and Williamson et al. (1984), and laser-induced melting of thin metal films was studied by transmission diffraction in the subnanosecond time regime. The short illuminating electron beam pulses were produced by driving the photocathode with picosecond laser pulses. Pulsed reflection electron diffraction was employed by Khaibullin (1984), and the crystal structure and lattice temperature of bulk silicon during pulsed laser annealing were investigated.
Facilities for X-ray diffraction and shadow microscopy at nanosecond exposure times have been realized with voltage-pulsed field emitters (Jamet and Thomer, 1976), laser-produced plasma sources (Rosser et al., 1985), and synchrotron radiation (Larson et al., 1983). Applications concentrated on transformations of crystal structure and changes of temperature during pulse stressing of the specimen by shock waves and laser flashes.
Other fast but less direct methods based on photoelectron emission and electric conductivity were introduced by Eberhardt et al. (1982) and Galvin et al. (1982), respectively. The latter method is particularly sensitive to the transitions insulator/semiconductor → metal, which occur, e.g., when certain semiconductors (germanium, silicon) melt, but melting and solidification of metals can also be traced (Tsao et al., 1986).
All these different techniques have specific advantages and drawbacks.
The light optical methods are the fastest, reaching down into the femtosecond regime. They measure almost exclusively reflectivity and absorption, giving refraction and extinction indices, from which, e.g., the density and relaxation time of free charge carriers are deduced with an adequate theory of the electronic structure. Surface microscopy can be readily performed. Electron and X-ray diffraction are slower, and their instrumentation is more involved than that of light optical methods of similar time resolution. For instance, high vacuum is a must. The higher complexity, especially in the case of synchrotron radiation, may be outweighed by the direct access to the atomic packing without the uncertainties of an interposed theory. A common drawback of the above methods is the limited spatial resolution, on the order of 1 μm, and the inability to measure electric and magnetic specimen fields. An alternative is imaging electron probes. Potentially, they reach a spatial resolution on the subnanometer scale, they have direct access to crystal structure and morphology, and they sense electric and magnetic specimen
ELECTRON MICROSCOPY OF FAST PROCESSES
fields. A further outstanding feature is the large number of interactions of electrons with matter, which supply a vast number of imaging modes yielding different specific information.
II. INTERACTIONS OF ELECTRONS WITH MATTER
Some possible interactions of a probing electron with an atom are shown in Fig. 1. Interactions that depend only to a minor extent on the specific nature of the atom are omitted. Such effects, which are less useful for analysis, are bremsstrahlung of the decelerating primary electron and transition radiation. The latter is emitted from metals when the dipole formed by the impinging electron and its mirror charge collapses. Based on the interactions shown in Fig. 1, very different types of signals can be collected to give an “image” of the specimen. The different interaction processes are briefly discussed in the following. A fuller treatment is given by Reimer (1985).
A. Single Scattering of Primary Electrons
According to quantum mechanics, the scattering amplitude for a single scattering process of an electron by an atom is proportional to
$$\langle \psi_f | H | \psi_i \rangle,$$
where $|\psi_i\rangle$ and $|\psi_f\rangle$ denote the initial and final eigenfunctions of the electronic states of the atom and $H$ is the interaction Hamiltonian. Depending on whether the two atomic states are equal or different, scattering is elastic or inelastic. In the case of elastic scattering, any statistical phase factors of the eigenfunctions cancel and the scattering amplitudes of different atoms are coherent. The specimen acts as a three-dimensional diffraction grating, and the waves scattered by different atoms interfere. The total scattering amplitude contains information on the spatial atomic distribution. Elastically scattered electrons are the probe of choice if crystal structure and orientation, grain boundaries, and lattice defects are to be investigated. Bragg diffraction is then the main cause of image contrast. As fields due to electric and magnetic polarization of the specimen shift the phase of the probing electron waves, their distribution can also be imaged.
If the primary electron is inelastically scattered by the atom, a bound electron is excited into upper levels or into the continuum. In addition to such intraband or interband transitions and ionization processes, scattering can also excite quasiparticles, such as plasmons in the conduction electron gas of metals and excitons in semiconductors and insulators. In summary, the primary electron suffers an energy loss that is characteristic of the electronic structure of the specimen. Inelastically scattered electrons may be exploited for imaging combined with chemical analysis, as was demonstrated by several authors (Colliex et al., 1975; Isaacson and Johnson, 1975; Zanchi et al., 1978).
FIG. 1. Some interactions of a fast probing electron with an atom (labels: probing electron, Auger electrons, secondary and backscattered electrons, low-angle scattering). $E_i$ are ionization energies of the atomic levels.
B. Secondary and Backscattered Electrons
Electrons ejected by a specimen that is bombarded by primary electrons with energies in the keV regime have a broad energy distribution, which can be divided into three main parts. The high-energy part, extending from the energy $E_0$ of the primary electrons down to about 1 keV, is mainly due to backscattered electrons. A central part in the range 50 eV to 1 keV is characterized by peaks due to Auger electrons. Finally, there is a low-energy part with a pronounced maximum at several electron volts. Somewhat arbitrarily, the electrons with exit energies below 50 eV are called true secondary electrons. Most of the secondary electrons have energies of 5-10 eV and therefore come from within several nanometers below the surface. They are efficiently used for surface topographical investigations, as their yield depends decisively on the tilt angle $\varphi$ of the trajectory of the impinging electron, measured against the surface normal. An approximate expression for the yield $\delta_{SE}$ of secondary electrons can be derived with the Bethe stopping
law (Bethe, 1930), which gives the energy loss per unit path length $dE/ds$ of a primary electron with energy $E$ as
$$\frac{dE}{ds} \propto -\frac{1}{E}\ln\frac{E}{E_i}.$$
Energy dissipation is assumed to be caused by ionization of atoms having a mean ionization energy $E_i$. Secondary electrons generated at a depth $x$ below the surface escape with a probability $\exp(-x/D)$, with $D$ the exit depth, so that
$$\delta_{SE} \propto \frac{1}{E}\ln\!\left(\frac{E}{E_i}\right)\frac{1}{\cos\varphi}\int_0^{\infty}\exp\!\left(-\frac{x}{D}\right)dx.$$
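Carrying out the depth integral gives a factor $D$, so the yield expression reduces to $D\ln(E/E_i)/(E\cos\varphi)$ up to a constant, and the tilt dependence is simply $1/\cos\varphi$. The following Python lines are a minimal sketch of that relation. The function names are hypothetical, the unknown proportionality constant is set to 1, and any numbers passed in are illustrative only.

```python
import math


def relative_se_yield(energy_ev, mean_ionization_ev, exit_depth_nm, tilt_rad):
    """Secondary-electron yield up to an unknown constant factor.

    Evaluates D * ln(E / E_i) / (E * cos(phi)); the depth integral of
    exp(-x / D) from 0 to infinity equals D.
    """
    if energy_ev <= mean_ionization_ev:
        raise ValueError("the stopping-law expression assumes E > E_i")
    if not 0.0 <= tilt_rad < math.pi / 2:
        raise ValueError("tilt angle must lie in [0, pi/2)")
    bethe_term = math.log(energy_ev / mean_ionization_ev) / energy_ev
    return exit_depth_nm * bethe_term / math.cos(tilt_rad)


def tilt_enhancement(tilt_rad):
    """Yield at tilt phi relative to normal incidence, 1 / cos(phi)."""
    return 1.0 / math.cos(tilt_rad)
```

In this simple model a surface element tilted by 60° yields twice as many secondary electrons as one at normal incidence, which is the origin of the topographic contrast mentioned above.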
Because of their low energies, the secondary electrons are very susceptible to electric (and magnetic) surface fields. This makes them an ideal voltage probe for integrated circuits. In this context, the total yield $\delta$ of ejected electrons as a function of the energy $E$ of the primary electrons (Fig. 2) is of decisive importance. Since the absorbed current is a fraction $(1 - \delta)$ of the impinging electron current, a floating specimen is charged positively ($\delta > 1$) or negatively ($\delta < 1$), depending on whether $E_1 < E < E_2$, or $E < E_1$ or $E_2 < E$, respectively. Irradiation with primary electrons of energy $E = E_1$ or $E = E_2$ leaves the specimen neutral, as $\delta(E_1) = \delta(E_2) = 1$, but only the energy $E_2$ gives stable operation. If, for example, $\delta$ exceeds 1 by some fluctuation, i.e., the energy drops below $E_2$, the surface is positively charged. Consequently, the exciting electrons are accelerated before impact and their yield is decreased again. The energy spectrum of the backscattered electrons consists of an elastic peak and a broad low-loss maximum, which shifts towards the zero-loss peak with increasing atomic number of the scatterer. The elastic fraction is due predominantly to single scattering. Accordingly, these low-loss backscattered electrons come from a thin surface layer of several nanometers. The broad tail of the spectrum down to about 50 eV is due to multiple inelastic scattering.
FIG. 2. Yield $\delta$ of backscattered and secondary electrons as a function of the energy $E$ of the primary electron; $E_1$ and $E_2$ mark the energies at which $\delta = 1$.
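The charging argument for a floating specimen can be checked numerically. The sketch below uses a purely hypothetical yield curve (a single maximum above 1, chosen only for its shape, not fitted to any material), locates the two crossovers by bisection, and tests which one restores neutrality after a small fluctuation. All names and parameter values are assumptions made for illustration.

```python
import math


def toy_yield(energy, peak_yield=2.0, peak_energy=500.0):
    """Hypothetical yield curve: rises to peak_yield at peak_energy, then falls."""
    x = energy / peak_energy
    return peak_yield * x * math.exp(1.0 - x)


def find_crossover(lo, hi, yield_fn=toy_yield, tol=1e-6):
    """Bisection for yield_fn(E) = 1 on [lo, hi]; needs one sign change."""
    f_lo = yield_fn(lo) - 1.0
    if f_lo * (yield_fn(hi) - 1.0) > 0:
        raise ValueError("yield - 1 does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = yield_fn(mid) - 1.0
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def is_stable(crossover, yield_fn=toy_yield, step=1.0):
    """True if the yield falls through 1 with increasing landing energy.

    A positive surface charge raises the landing energy, so neutrality is
    restored only if the higher energy gives a yield below 1.
    """
    return yield_fn(crossover + step) < 1.0 < yield_fn(crossover - step)


e1 = find_crossover(1.0, 500.0)      # lower crossover E_1
e2 = find_crossover(500.0, 5000.0)   # upper crossover E_2
```

With this toy curve `is_stable(e1)` is false and `is_stable(e2)` is true, in line with the statement that only $E_2$ gives stable operation.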
The angular distribution of the backscattered electrons depends on the angle of incidence of the primary electrons. At normal incidence an approximate Lambert cosine law is observed, caused by electrons, most of which have experienced numerous scattering events and move isotropically, as in a diffusion process. In addition, there is a fraction of electrons that leave the specimen after one or two large-angle scattering events. With increasing angle of incidence (measured against the normal to the surface), a reflection-type distribution evolves with a growing portion of electrons escaping from the target after only a few small-angle scattering processes. The elastic fraction of the scattered electrons carries information on crystal structure and surface morphology. It is exploited in diffraction and microscopy with low- and high-energy reflected electrons. Reflection electron microscopy has recently received interest as a high-resolution imaging technique for surfaces, as demonstrated by the work of Telieps (1983), Telieps and Bauer (1985), Ishitsuka et al. (1986), and Ogawa et al. (1986).
C. Auger Electron and Characteristic X-Ray Emission
If the exciting primary electron has a kinetic energy that exceeds the ionization energy $E_i$ of an inner, normally filled shell, for instance a K shell, this shell can be ionized. The generated vacancy attracts an electron, which moves in from an upper level, e.g., $L_1$. The remaining energy difference $E_{iK} - E_{iL_1}$ can either be emitted as a characteristic X-ray photon $\hbar\omega = E_{iK} - E_{iL_1}$ or, in the Auger process, be transferred to an electron with a lower binding energy, e.g., an $L_3$ electron. This $L_3$ electron is then ejected with a nonzero kinetic energy. Since the ejection of electrons causes a redistribution of the Coulomb field in the atom, its energy levels are displaced in the Auger process. A satisfactory approximation for the exit energy $E_A$ of an Auger electron, emitted by an atom with atomic number $Z$ in a $KL_1L_3$ process, is $E_A = E_{iK}(Z) - E_{iL_1}(Z) - E_{iL_3}(Z+1)$. The Auger electrons appear in the energy spectrum of the ejected electrons as weak peaks in the range 40 eV-2 keV. In contrast to the peaks due to electrons that suffered characteristic energy losses by exciting interband and intraband transitions or plasmons, the Auger peaks do not move when the energy of the primary electrons is varied. As characteristic X-ray and Auger electron emission are alternate processes, their probabilities, i.e., yields, are complementary. An atom with a large yield $\delta_X$ for emission of characteristic X-rays, i.e., an atom with large atomic number, has a low yield $\delta_A = 1 - \delta_X$ for Auger electron emission, and vice versa.
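The approximation for the $KL_1L_3$ exit energy and the complementarity of the two yields can be written down directly. This is a minimal Python sketch with hypothetical function names. The binding energies in the usage note are round placeholder numbers, not tabulated data, and real level energies would have to be taken from a reference table.

```python
def auger_exit_energy(e_k_ev, e_l1_ev, e_l3_next_ev):
    """E_A = E_iK(Z) - E_iL1(Z) - E_iL3(Z + 1), all energies in eV.

    The last argument is the L3 ionization energy of the element with
    atomic number Z + 1, which accounts for the level shift in the ion.
    """
    return e_k_ev - e_l1_ev - e_l3_next_ev


def auger_yield(xray_yield):
    """Complementary yield delta_A = 1 - delta_X."""
    if not 0.0 <= xray_yield <= 1.0:
        raise ValueError("a yield must lie between 0 and 1")
    return 1.0 - xray_yield
```

For placeholder inputs of 1800 eV, 150 eV and 130 eV, `auger_exit_energy` returns 1520 eV, which falls in the 40 eV-2 keV window quoted above. The value does not depend on the primary energy, which is why Auger peaks stay fixed when the primary energy is varied.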
Both Auger electrons and characteristic X-ray photons have energies that are characteristic of the electronic structure, i.e., the chemical nature, of the specimen. Since the yield of X-ray emission increases with the atomic number, heavy atoms are best analyzed by X-ray fluorescence and light atoms are best analyzed by Auger electrons. The probe volumes are also complementary. As most Auger electrons have energies well below 1 keV, they have small scattering lengths, and most unscattered Auger electrons originate from only a few atomic layers beneath the surface. They are an excellent probe for chemical mapping of surfaces. In contrast, scattering of X-ray photons is much weaker than the Coulomb scattering of Auger electrons. Accordingly, the X-ray photons stem from depths comparable to the range of the primary electrons, usually several micrometers. Characteristic X-ray analysis thus gives information on the chemical nature of the bulk.
D. Optical Photons
Light emission from solids that are bombarded by electrons is due to various processes depending on the electronic band structure. In semiconductors and insulators, electrons are excited from the valence band, leaving free holes there, or from donor atoms across the band gap into the conduction band. The charge carriers thermalize within a time roughly coinciding with the Debye period $1/f_D \approx 1$ ps and crowd at the bottom of the conduction band or at the top of the valence band, respectively. Further relaxation is based on recombination of free electrons with holes or acceptor atoms. Most recombinations are nonradiative, and deexcitation proceeds by multiphonon production. The radiative transitions are classified as intrinsic and extrinsic processes. Intrinsic photon emission is due to recombination of free electrons with holes across the band gap with or without exchange of a phonon. The former compensates for a change of electron momentum in materials with an indirect band gap. Apart from recombination with a hole, an electron can create a bound state, the exciton, having a hydrogen-like emission spectrum within the band gap. Extrinsic emission results from recombination of electrons with holes at trapping centers such as acceptor atoms or lattice defects. Consequently, extrinsic luminescence can be exploited to image lattice defects. Cathodoluminescence of metals is caused by quite different processes, as there are no holes formed. It consists of the low-frequency part of bremsstrahlung, of transition radiation, and of radiation from decaying plasmons of the conduction electron gas.
E. Electron-Beam-Induced Conductivity (EBIC)
An electron with an energy of several keV impinging on a semiconductor can generate up to several thousand free electron-hole pairs, thus locally modulating the electric conductivity to an appreciable extent. This effect is measured as the EBIC signal in an external circuit, for instance as a current modulation if a constant voltage source is driving the circuit. Since the conductivity depends on the density and velocity of the free charge carriers, it is strongly influenced by traps and local electric fields. Consequently, electron-beam-induced currents are efficiently used to image electrically active lattice defects, pn-junctions, and sites of avalanche multiplication of charge carriers in semiconductor devices.
In principle, all the different types of signals presented in Sections A to E can be used to study time-varying processes. But, as resolution is ultimately determined by shot noise, signals with a large efficiency, such as elastically scattered and secondary electrons, are preferred as time-resolving probes for very fast processes. Therefore, applications of only these two types of signals for electron microscopy of fast processes will be discussed.
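The role of shot noise can be made concrete with a short estimate. Assuming Poisson statistics, a signal of $N$ detected particles has a signal/noise ratio of $\sqrt{N}$, so a short exposure needs a correspondingly high current. The Python sketch below is illustrative only: the function names, the `efficiency` parameter (signal particles per primary electron) and the `n_pixels` parameter are assumptions, not quantities defined in the text.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C


def electrons_in_pulse(current_a, duration_s):
    """Number of primary electrons delivered by a pulse of given current."""
    return current_a * duration_s / ELEMENTARY_CHARGE


def shot_noise_snr(n_detected):
    """Poisson-limited signal/noise ratio, N / sqrt(N) = sqrt(N)."""
    return math.sqrt(n_detected)


def required_current(snr, duration_s, n_pixels=1, efficiency=1.0):
    """Beam current (A) so that each of n_pixels collects snr**2 particles.

    efficiency is the assumed number of detected signal particles per
    primary electron; a low-efficiency signal needs more current.
    """
    n_primary = snr ** 2 * n_pixels / efficiency
    return n_primary * ELEMENTARY_CHARGE / duration_s
```

Under these assumptions the required current scales inversely with both the exposure time and the signal efficiency, which is why high-yield signals are favored for fast processes.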
III. MODES OF ELECTRON MICROSCOPY
There are two distinctly different modes of microscopy, conventional and scanning, depending on whether the signal detection and processing are parallel or sequential.
A. Conventional Electron Microscopy
In conventional microscopy, the signals are picked up simultaneously from all object points and are simultaneously processed to an image by lens or mirror imaging or simply by central projection. There are various modes of conventional electron microscopy based on different signals, as shown in Fig. 3.
1. Transmission Electron Microscopy (TEM)
Transmission microscopy is by far the most widespread conventional mode, as it allows investigations of crystal structure and crystal defects at high resolution in routine operation. The specimen, a thin film, is illuminated in transmission by a focused electron beam from a thermal or field emission gun and imaged by an objective lens and several projective lenses. In addition, selected area diffraction can be performed either with a focused beam or with an area-selecting aperture in the image plane of the objective lens. The contrast that is most frequently exploited is due to scattering (Bragg scattering in crystals) and absorption at the objective lens aperture. Strongly scattering parts in a weakly scattering matrix appear dark in a bright-field image (Fig. 4).
FIG. 3. Different modes of conventional electron microscopy: a) transmission, b) reflection, c) mirror, d) secondary, and e) field emission electron microscopy.
FIG. 4. Imaging of a thin film with conventional transmission microscopy. Contrast is generated by scattering of the illuminating electrons by the specimen and absorption of the scattered radiation by an aperture at the back focal plane of the objective lens.
FIG. 5. Defocused imaging of a phase object consisting of magnetic domains with antiparallel in-plane magnetization. The image intensity is sketched for the dashed plane.
Local fields due to electric and magnetic polarization scatter by rather small angles ($10^{-5}$ to $10^{-4}$ rad), so that imaging with an objective aperture is inconvenient. On the other hand, the pertaining potentials produce a phase shift of the electron wave, which consists of a magnetic contribution of magnitude $(e/\hbar)\left|\int_C \mathbf{A}\cdot d\mathbf{s}\right|$ and an electric contribution of magnitude $\pi e U D/(\lambda E_0)$, with $\mathbf{A}$ the magnetic vector potential of the magnetization, $U$ the electric potential of the specimen with thickness $D$, $E_0$ ($\gg eU$) and $\lambda$ the kinetic energy and wavelength of the imaging electron, and $C$ its trajectory. This phase shift causes pronounced interference patterns outside the object plane, so that magnetic and electric domain boundaries may easily be visualized by defocused imaging (Fig. 5). Because of the small scattering angles, a very small illumination divergence must be used, and, in the case of magnetic domains, the specimen must be screened from the magnetic field of the objective lens.
The spatial resolution is determined by spherical and chromatic aberration of the objective lens and by diffraction at the objective aperture, giving a disc of confusion with diameter $\delta$ (in the object plane) to which the three effects contribute terms proportional to $C_s\alpha^3$, $C_c\,\alpha\,\Delta E/E_0$, and $\lambda/\alpha$, respectively. The different letters designate the spherical ($C_s$) and chromatic ($C_c$) aberration constants, the maximum semiangle $\alpha$ passed by the objective aperture, and the spread of electron energy $\Delta E$.
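The competition between spherical aberration (growing as $\alpha^3$) and diffraction (falling as $1/\alpha$) can be explored numerically. The sketch below is an illustration under stated assumptions, not the chapter's formula: the two terms are added in quadrature with unit prefactors, the chromatic term is dropped ($\Delta E = 0$), and all function names are hypothetical. It also evaluates the relativistic de Broglie wavelength of the imaging electron.

```python
import math

PLANCK = 6.62607015e-34               # J s
ELECTRON_MASS = 9.1093837015e-31      # kg
ELEMENTARY_CHARGE = 1.602176634e-19   # C
LIGHT_SPEED = 299792458.0             # m/s


def electron_wavelength(voltage_v):
    """Relativistic de Broglie wavelength (m) after acceleration by voltage_v."""
    energy = ELEMENTARY_CHARGE * voltage_v
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * energy * (1.0 + energy / (2.0 * rest_energy))
    )
    return PLANCK / momentum


def confusion_disc(alpha, cs, wavelength):
    """Assumed quadrature sum of cs * alpha**3 and wavelength / alpha."""
    return math.hypot(cs * alpha ** 3, wavelength / alpha)


def optimal_aperture(cs, wavelength):
    """Semiangle minimising confusion_disc: alpha**8 = lambda**2 / (3 cs**2)."""
    return (wavelength ** 2 / (3.0 * cs ** 2)) ** 0.125
```

Whatever values of `cs` and `wavelength` are inserted, the minimum of `confusion_disc` divided by `cs ** 0.25 * wavelength ** 0.75` is the same number, so the best resolution in this model scales as $C_s^{1/4}\lambda^{3/4}$. Only the numerical constant depends on the assumed prefactors.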
Blurring due to inelastic scattering and chromatic aberration can be appreciably reduced by energy filtering. The maximum resolution in the elastic image ($\Delta E = 0$) is then given by
$$\delta_{\min} = \mathrm{const}\cdot C_s^{1/4}\lambda^{3/4}, \qquad (5)$$
reaching down to 0.2 nm for nonperiodic objects.
2. Reflection Electron Microscopy (REM)
Transmission electron microscopy is limited to thin films. This restriction is overcome by reflection electron microscopy. With high-energy electrons, the setup is similar, except that the specimen is illuminated by a grazing electron beam. Furthermore, the surface of bulk material is usually imaged with electrons scattered by small angles only. A small angle of incidence ($\approx 10^{-2}$ rad) is used to sense even small surface modulations. Small takeoff angles are utilized for imaging in order to provide as many elastically scattered electrons as possible. Contrast is due to shadow casting, scattering within the uppermost layers, and spatial filtering of the scattered electrons by the objective lens aperture. In conjunction with ultrahigh vacuum techniques, surface morphology, even oligoatomic steps, can be successfully studied (Ishitsuka et al., 1986; Ogawa et al., 1986). However, due to the oblique takeoff angle of the imaging electrons, the field of view is extremely anisotropic. This can be corrected with an added cylinder lens or by image processing. The resolution is primarily limited by chromatic aberration, as a substantial fraction of the “reflected” electrons has suffered inelastic collisions. In addition to imaging, the crystal lattice structure of the top layers can be studied by reflected electron diffraction in selected areas.
3. Electron Mirror and Emission Microscopy
Electron mirror microscopy exploits electrons that are reflected by an equipotential at a variable distance in front of the specimen surface. The object, being at a potential slightly more negative than the cathode, is part of an electrostatic immersion lens. Atomic steps, electric fields, and magnetic fields can be imaged, the latter with a considerably reduced resolution and only outside the center. The setup can also be operated as a powerful low-energy electron reflection or emission microscope by adjusting the potential of the specimen positive relative to the cathode (Telieps, 1983; Telieps and Bauer, 1985). In electron emission microscopy, the object is the source of a cathode lens. The imaging electrons are emitted by heating or bombardment with electrons,
positive ions, or photons. The emitted current density carries local information on material parameters such as atomic number and work function, and on the surface relief. The latter acts through local microlenses formed by the deformation of the electric field at the surface. The lateral resolution of mirror, emission, and low-energy electron reflection microscopy is restricted by the large spherical and chromatic aberration of the electrostatic immersion lens at the specimen. Optimizing the electron optics, Telieps (1983) achieved a lateral resolution of ≈ 4 nm, at least with the low-energy reflection microscope.
4. Field Electron Microscopy
The field electron microscope operates without lenses. The object is a microtip with a radius $r \approx 50$ nm at a high negative potential of several kV. Electrons are field emitted and centrosymmetrically accelerated towards the viewing screen. Contrast is generated by differences in work function, or in electron affinity in the case of adsorbed molecules. The resolution is determined by a compromise between geometric imaging of, and diffraction at, the atoms, with best values of the minimum resolved distance at around 0.5 nm.
B. Scanning Electron Microscopy (SEM)
Figure 6 schematically shows the setup for scanning electron microscopy. The electron beam from a thermal or field emission gun is focused by several lenses down to a diameter of 0.5 to 10 nm in the plane of the specimen. The beam is raster scanned across the specimen synchronously with the electron beam of a cathode-ray tube (CRT). The image is generated by modulating the intensity of the CRT beam with one of the various signals recorded. The information obtained depends on the operation mode and on the signal. Emission modes comprise secondary, backscattered, and Auger electrons, characteristic X-rays, and visible light photons. Absorbed electron current and beam-induced conductivity belong to the absorption mode.
FIG. 6. Scheme of a scanning electron microscope (labels: condenser lenses, scan generators, deflecting coils, probe-forming lens, amplifier, specimen; signals: absorbed electrons, EBIC).
Image resolution is limited by the diameter $\delta$ of the electron probe on the specimen and by the exit volume of the signal used. The former is given by the geometric optical source size, the spherical and chromatic aberration, and diffraction, which contribute terms proportional to $\delta_0\alpha_0/\alpha$, $C_s\alpha^3$, $C_c\,\alpha\,\Delta E/E$, and $\lambda/\alpha$, respectively [Eq. (6)], where $\delta_0$ is the diameter of the crossover, $\alpha_0$ is the semiangle subtended by the crossover at the first aperture, $\alpha$ is the semiangle of the beam converging on the specimen, and $C_s$ and $C_c$ are the spherical and chromatic aberration constants of the probe-forming lens. At high currents and in the case of a thermal electron source, the large crossover and emission angle of the source outweigh the aberration terms and determine the final spot size. The exit volume of the signal used increases with growing penetration depth, i.e., with the energy of the primary electrons, thus reducing resolution more seriously than the spot size $\delta$ according to Eq. (6). There is, however, no point in reducing the acceleration voltage below a certain limit: resolution also decreases with decreasing signal/noise ratio, and since the yield of signal particles falls with decreasing primary energy, a compromise between shot noise and exit volume must be sought. The resolution ranges from 0.1 nm to about 1 μm, the poorer value holding in the case of X-ray imaging.
The crystal lattice structure can be analyzed by keeping the electron beam fixed on the probed area and scanning the angle of incidence through the
Bragg angles. In this way, so-called electron channeling patterns are generated that are the result of multiple diffraction at the lattice.
IV. TIME-RESOLVED ELECTRON MICROSCOPY
Time-resolved electron microscopy can be classified into stroboscopic and real-time microscopy. Stroboscopic microscopy is chosen when periodic processes are to be investigated. The signal is picked up periodically in exact synchronism with the periodic process. Access to all phases of the process is realized by varying the phase of the sampling procedure. Because of the additive collection of signals over many periods of the process, noise, being random, is effectively averaged out. Both conventional and scanning microscopes can be operated in the stroboscopic mode. Magnetic reversals induced by high-frequency magnetic fields in low-loss ferromagnetic materials and switching of electric potentials in clocked electronic circuits are important examples of application that will be discussed in Section IV.C. The stroboscopic technique is limited to periodic processes.
Nonrecurrent events are the domain of real-time microscopy. Fast processes, proceeding on the microsecond time scale and below, can only be traced by conventional microscopy, as scanning is far too time consuming to keep pace with the rapid events. Real-time microscopy is applicable to all time-varying processes. The main applications are fast phase transitions, which will be discussed in Section IV.G.
Time-resolved microscopy is distinguished by the fact that the electron beam should be pulsed, for three reasons. In the first place, pulsing of the illumination is recommended in order to avoid radiation damage to the specimen and/or detector at the high intensities $I$ used for short recording times $\Delta t$ to maintain a sensible signal/noise ratio $I\,\Delta t/\sqrt{I\,\Delta t} = \sqrt{I\,\Delta t}$. Secondly, pulsing the illumination is a convenient way to achieve time-resolved imaging. In addition, by pulsing the illumination periodically, stroboscopic operation is easily realized.
Of course, real-time and stroboscopic microscopy can also be accomplished by gating the detector, but the problem of radiation damage remains. The third reason for pulsing the beam is the possibility of increasing the emissivity of the electron source far beyond its stationary value. In the short-exposure mode of real-time microscopy, the image signal is to be collected within only a single short period of time. Accordingly, the illuminating electron source must provide a high-intensity pulse and the
detector must be very sensitive. Before discussing time-resolving microscopy in detail, the two prerequisites, electron beam pulsers and an adequate detector, will be described.
A. Generation of Pulsed Electron Beams
Electron beams for electron microscopy and analysis are generated by thermal emission, field emission, and photoemission. Tungsten and LaB$_6$ are preferred as thermal emitter materials. Usually, a thermal gun is built as a three-electrode system consisting of an electron emitter, a current-regulating and focusing Wehnelt electrode, and an accelerating anode. The actual source is the crossover, with a diameter of about 50-100 μm, formed by the Wehnelt electrode. The thermal electron emission current density $j$ at the temperature $T$ is given by the Richardson-Dushman law:
$$j = C\,T^2 \exp\!\left(-\frac{W}{kT}\right),$$
where $C$ is a constant, $W$ is the work function of the emitter, and $k$ is Boltzmann's constant. At the usual operating temperature of 2800 K for tungsten, the values of current density $j$ and axial brightness $j/(\pi\alpha^2)$ ($2\alpha$ is the FWHM value of the beam divergence angle) are several A/cm² and $10^5$ A/cm²·sr, respectively. Due to its lower work function, a LaB$_6$ cathode has values that are an order of magnitude higher. However, this cathode must be used at sufficiently low pressures to avoid rapid corrosion of LaB$_6$ by oxidizing residual gases.
Field emitters are distinguished by a very high emissivity, orders of magnitude above that of thermal emitters, due to their small emitting area. Conventional solid-state field emitters are, however, extremely sensitive to ion bombardment and can be safely operated only in ultrahigh vacuum. A possible alternative is the liquid metal source. Here, a regenerating liquid tip is drawn by field forces from the liquid layer covering a robust, pointed wire. Stable field electron emission from such a source was recently observed by Hata et al. (1987), even at higher pressures, if certain conditions are fulfilled: the field intensity at the tip must stay below a critical value and the radius of the supporting solid tip must be less than 10 μm.
The usual photoemitters are very sensitive to oxidizing gases and can therefore be operated only in ultrahigh vacuum. Recently, however, a photocathode with a 20-nm thin, rough gold film as photoemitter that worked in a conventional vacuum was introduced by May et al. (1987). Electron emission was achieved by focusing an ultraviolet laser beam to a diameter of 10 μm on the gold film from the back side. The electrons escaped
by field-assisted photoemission. Photoemission alone would be inferior by four orders of magnitude. All three cathode types can be used for the production of electron pulses, which is done in several ways.
1. Modulation of the Wehnelt Voltage
The electron beam is turned off in the quiescent state by a negative Wehnelt bias. An electron pulse is produced by applying a short positive voltage pulse of adequate amplitude to the Wehnelt electrode via a high-tension capacitor. In this way, the pulse generator is decoupled from the high acceleration voltage of the electron gun. Electron beam pulse widths of 5 ns at operating frequencies up to 100 MHz have been reported by Szentesi (1972). The method suffers from the serious drawback that an energy spread of the beam electrons is produced at higher frequencies, as reported by Schief and Steiner (1973). This effect becomes appreciable when the time of flight of the electrons in the accelerating field becomes comparable to the period of the modulating voltage. Image resolution is then seriously degraded by chromatic aberration of the imaging lenses.
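The criterion for the onset of energy spread can be estimated by comparing the time of flight in the gun with the modulation period. The sketch below assumes a uniform accelerating field over a gap $d$ and nonrelativistic motion from rest, giving $t = d\sqrt{2m/(eU)}$. The gap length and the function names are assumptions introduced for illustration; a real gun has a nonuniform field.

```python
import math

ELECTRON_MASS = 9.1093837015e-31      # kg
ELEMENTARY_CHARGE = 1.602176634e-19   # C


def gun_transit_time(voltage_v, gap_m):
    """Nonrelativistic time of flight across a uniform accelerating field.

    Starting from rest, an electron crosses the gap d in
    t = d * sqrt(2 m / (e U)).
    """
    return gap_m * math.sqrt(
        2.0 * ELECTRON_MASS / (ELEMENTARY_CHARGE * voltage_v)
    )


def transit_to_period_ratio(voltage_v, gap_m, frequency_hz):
    """Time of flight divided by the modulation period 1 / f."""
    return gun_transit_time(voltage_v, gap_m) * frequency_hz
```

For an assumed 10 mm gap at 20 kV, the ratio stays at a few percent at 100 MHz and grows linearly with frequency, so in this simple picture the energy spread becomes a concern as the modulation approaches the gigahertz range.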
2. Pulsing of a Filter Lens
An electrostatic three-electrode Einzel lens with a sufficiently high negative potential at the central electrode is an electron mirror. According to Plies (1982), it can be used to switch off the electron beam. Due to the energy spread of the electron beam, the blocking voltage must exceed the acceleration voltage applied to the electron gun. To produce an electron pulse, an opening voltage pulse is supplied to the central lens electrode via a capacitor. The blanking lens must be shaped such that the potential saddle is very flat in the radial direction. Then the lens can be switched by small voltage pulses at the central electrode. The lens proposed by Plies is designed to pulse 20-keV electron beams with a switching voltage amplitude of only 10 V at frequencies up to 100 MHz.
3. Deflection across an Aperture
Most often, electron pulses are generated by deflecting the beam with a transverse electric or magnetic field across an aperture. As well-shaped rapid voltage pulses are more easily produced than current pulses, electric fields are preferred for deflection at high frequencies. Magnetic field pulses are disfavored, as it is rather inconvenient to eliminate damping due to eddy
currents induced in neighboring metals and due to ringing caused by selfinductance and winding capacitance of the coils.' Beam pulsing is achieved in two ways. Either the electron beam is deflected off an aperture with a dc bias voltage in the quiescent state and an electron pulse is produced by switching it off and on again for the wanted pulse duration, or, more frequently, the electron beam is swept across the aperture by a sinusoidal deflection voltage CJ, = U,cosst. The pulse duration is determined then by the slope of the sweeping voltage at the zero crossing points. In order to produce electron pulses with the very frequency w of the sinusoidal voltage, crossing of the aperture must be allowed only during one slope of the voltage. This is achieved by superimposing an appropriate second sweeping voltage. The deflecting transverse electric fields are produced either with a lumped parallel plate capacitor, a microwave cavity, or a traveling wave structure. The lumped plate capacitor is the simplest system. As the deflection angle H for electrons of energy eCJ( >>eUocos o t ) entering the capacitor at time t = 0 is given by
$$\theta = \frac{l\,U_0}{2sU}\,\frac{\sin(\omega t_f)}{\omega t_f},$$
effective deflection is realized only as long as the time of flight $t_f$ of the electron through the capacitor is negligible against the period $2\pi/\omega$ of the deflecting sine voltage. Here $l$ and $s$ are length and mutual distance of the capacitor plates. The limiting frequency is increased by reducing the length $l$ of the capacitor to a minimum. Electron pulse widths of 10 ps at deflection frequencies up to 10 GHz were achieved by Menzel and Kubalek (1979) with 1-30 keV electrons. There is a second reason to keep the length of the capacitor at a minimum at high frequencies. If the time of flight is comparable to the period of the deflection voltage, electron acceleration at the entrance and deceleration at the exit of the capacitor due to axial stray fields from the capacitor plates to the earthed surroundings do not cancel anymore. This results in a shift and spread of the electron energy. Microwave cavities tuned to the working frequency of the device under study were used as effective deflectors and bunchers in the gigahertz region. Electron pulses of 0.2 ps with a repetition rate of 1 GHz could be realized by Hosokawa et al. (1978). The microwave cavity, however, has the disadvantage that it can be used at single frequencies only.
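The transit-time limitation of the lumped capacitor can be illustrated numerically. The following is a minimal sketch, assuming the nonrelativistic relation between electron velocity and acceleration voltage that underlies the deflection formula above; the function names and the plate dimensions in the usage note are hypothetical and are not taken from the cited experiments.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
E_MASS = 9.1093837015e-31    # electron rest mass in kg


def transit_time(length_m, accel_voltage_v):
    """Time of flight t_f = l / v, with the nonrelativistic v = sqrt(2 e U / m)."""
    velocity = math.sqrt(2.0 * E_CHARGE * accel_voltage_v / E_MASS)
    return length_m / velocity


def deflection_angle(length_m, gap_m, u0_v, accel_voltage_v, freq_hz):
    """theta = (l U0 / (2 s U)) * sin(w t_f) / (w t_f), in radians."""
    static_angle = length_m * u0_v / (2.0 * gap_m * accel_voltage_v)
    phase = 2.0 * math.pi * freq_hz * transit_time(length_m, accel_voltage_v)
    # The transit-time factor tends to 1 for w t_f -> 0 (quasi-static limit).
    factor = 1.0 if phase == 0.0 else math.sin(phase) / phase
    return static_angle * factor


if __name__ == "__main__":
    # Hypothetical geometry: 2 mm long plates, 1 mm apart, 5 V amplitude, 10 kV beam.
    for f in (0.0, 1e9, 10e9):
        print(f, deflection_angle(2e-3, 1e-3, 5.0, 10e3, f))
```

With these assumed values the deflection at 10 GHz falls well below the quasi-static value, which is the reason given in the text for keeping the plates short.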
¹ Magnetic field deflectors, however, have the advantage of being significantly less sensitive to fluctuating charges on electron-beam-induced contamination layers.
FIG. 7. Traveling wave deflector for an electron beam.
A disadvantage of the lumped capacitor is the fact that in the case of a pulsed deflection voltage the electrons fall behind the latter at short pulse widths, as the electric signal propagates with the vacuum velocity of light, which exceeds the velocity of the beam electrons. In order to fully exploit the deflecting action at short pulse widths, a meander-type traveling wave deflector system was introduced (Fig. 7) by Feuerbaum and Otto (1978). The deflecting voltage is forced on a detour, giving a decreased effective propagation velocity along the electron trajectory, which can be made to coincide with the electron velocity at a particular acceleration voltage. The traveling wave deflecting structure shown in Fig. 7 can be thought of as being a coaxial-cable wave guide that is cut apart along the axis, with the two halves placed opposite and equal electrodes facing one another. In order to avoid reflections, the wave guides are terminated with a resistor R equal to the wave impedance Z. Ambipolar pulses ±U are fed to the two wave guides, generating a transverse deflecting field. This field travels along the axis with a velocity made to coincide with that of the beam electrons, which accordingly experience the full deflecting field all their way through the deflector. In contrast to microwave cavities, which are tuned to a single frequency, and unscreened traveling wave systems, which reveal dispersion (Meinke and Gundlach, 1968), the short capacitor and the trough-type traveling wave deflector are applicable to all frequencies within their bandwidth of several gigahertz.

In cases where the image signal is obtained from a probe focused onto the specimen, pulsing by simple deflection across an aperture has the serious
FIG. 8. Elliptic deformation of a circular electron spot of original diameter d by deflection of the probing beam across an aperture.
disadvantage that the effective probe size is increased along the direction of deflection (Fig. 8). The circular probe of original size d is deformed to an ellipse with axes d and d′. Disregarding the possibility of using unrealistically small chopping apertures, the deformation can be almost eliminated by placing the deflecting capacitor and the chopping aperture in suitable planes between the probe-forming lenses (Menzel and Kubalek, 1979).
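For the traveling wave deflector of Fig. 7, the required slowing of the deflecting signal can be estimated from the electron velocity. The sketch below assumes that the signal runs along the meander at the vacuum velocity of light, so that the detour factor (signal path length per unit length along the beam axis) equals c/v; the function names are illustrative only.

```python
import math

ELECTRON_REST_ENERGY_EV = 510_998.95   # m c^2 of the electron in eV


def electron_beta(accel_voltage_v):
    """Relativistic v/c of an electron accelerated through accel_voltage_v volts."""
    gamma = 1.0 + accel_voltage_v / ELECTRON_REST_ENERGY_EV
    return math.sqrt(1.0 - 1.0 / gamma**2)


def detour_factor(accel_voltage_v):
    """Signal path per unit axial length needed for synchronism (assumes signal speed c)."""
    return 1.0 / electron_beta(accel_voltage_v)


if __name__ == "__main__":
    for u in (2.5e3, 10e3, 20e3):
        print(u, electron_beta(u), detour_factor(u))
```

Because the detour is fixed by the geometry, the match holds at one acceleration voltage only, as noted in the text.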
4. Pulsed Electron Emission

Pulsed electron beams can be generated by initiating pulsed emission of electrons from the source. This is done by supplying pulsed energy to the electrons, either by light, heat, or accelerating voltage.

a. Pulsed Photoelectron Emission. In order to achieve pulsed photoelectron emission, the photocathode is irradiated with focused ultraviolet light pulses from a laser. Usually, stable photoelectron emission necessitates an ultrahigh vacuum. Recently, suitable photocathodes were introduced that operated efficiently, even in normal high vacuum. Akhmanov et al. (1985) used bulk tantalum, and May et al. (1987) used thin, rough gold films as cathode material. The thin-film cathode is illuminated for convenience from the rear. Accordingly, its thickness must approximate the exit depth of the photoelectrons (≈20 nm) but should exceed the absorption length of light.
The film must be coarse grained, containing numerous microtips to provide field-assisted photoelectron emission, as photoemission from smooth gold films is inferior by a factor of $10^4$ at the used wavelength of 266 nm (Marcus et al., 1986). The light pulses were derived from an actively mode-locked CW Nd:YAG laser. They were passed through a lightguide, where they were compressed from 70 ps to 3 ps by chirp and dispersion. Then they were frequency quadrupled and thereby sliced to 1.5 ps. These ultraviolet pulses were finally focused to a spot of 10 μm in diameter on a rough-gold-film cathode, from which electron pulses were extracted with a two-anode structure. The electron pulses had a width of 1 ps, peak values of 0.5 mA, and a repetition rate of 100 MHz. An emissivity of $10^8$ A/cm²·sr was reported.
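The quoted pulse parameters translate into a number of electrons per pulse and a time-averaged current. The following is a rough sketch that assumes rectangular pulses of the stated width and peak current; the real pulse shape is not given in the text, so the results are order-of-magnitude estimates only.

```python
E_CHARGE = 1.602176634e-19   # elementary charge in C


def electrons_per_pulse(peak_current_a, width_s):
    """Electrons in one pulse, assuming a rectangular pulse shape."""
    return peak_current_a * width_s / E_CHARGE


def average_current(peak_current_a, width_s, repetition_hz):
    """Time-averaged current of the rectangular pulse train."""
    return peak_current_a * width_s * repetition_hz


if __name__ == "__main__":
    # Values quoted in the text: 0.5 mA peak, 1 ps width, 100 MHz repetition rate.
    print(electrons_per_pulse(0.5e-3, 1e-12))     # a few thousand electrons
    print(average_current(0.5e-3, 1e-12, 100e6))  # about 50 nA
```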
b. Pulsed Thermal Electron Emission (Bostanjoglo and Heinricht, 1987). The most widely used electron guns are based on thermal electron emission from tungsten. Their emission current density, as given by Richardson’s law [Eq. (7)], increases by one order of magnitude for every 200 K increase in temperature near their normal operating temperature of 2800 K. However, a substantial stationary increase of temperature above 3000 K, even when staying below the melting point, results in rapid destruction of the cathode wire by field-assisted flow. As, however, the involved hydromechanics proceed on the 100-ns time scale (Bostanjoglo and Heinricht, 1986), heating of the cathode with focused nanosecond laser pulses is not expected to destroy the emitter, even if the melting point is exceeded. Destruction is further obstructed by the fact that laser pulse heating is localized. The setup of a laser-flash driven thermal electron gun, which can operate in a technical vacuum of mbar, is shown in Fig. 9. It is a commercial three-electrode tungsten hairpin gun, as installed in any commercial electron microscope, which has been modified for additional laser flash heating. The usual constant electron beam current operation is performed by conventional heating, focusing with the Wehnelt electrode, and accelerating with the anode. In addition, green laser pulses (532-nm wavelength, 5 ns FWHM) from a Q-switched, frequency-doubled Nd:YAG laser can be focused to a spot with a diameter of 100 μm onto the tip of the tungsten hairpin. When a 5-ns laser pulse is applied, an electron pulse is emitted, having its maximum at the end of the laser pulse and being significantly broader. Because of the delay and the increased pulse width, the majority of electrons escape by thermal emission, not photoemission. The pulse shape and, in particular, the amplitude, expectedly depend on the energy of the light pulse and on the stationary background temperature (Figs. 10 and 11).
Peak heights of focused electron beam pulses up to 5 mA were achieved, which is to be compared to the stationary value of 20 μA with the employed acceleration
FIG. 9. Scheme of a laser pulse-excited thermal tungsten hairpin electron gun. (From Bostanjoglo and Heinricht, 1987.)
FIG. 10. Electron current pulses as emitted by the gun shown in Fig. 9, using laser pulses of different energies and a steady-state temperature of 2800 K of the cathode. The pulses are shown at two different time scales. (From F. Heinricht, PhD thesis.)
FIG. 11. Electron pulse amplitude as a function of the applied laser pulse energy (μJ) and steady-state temperature $T_0$ (data for $T_0$ = 3000 K, 2800 K, and 2400 K). The pronounced scattering of the measured values is caused by scattering of the deposited laser pulse energies as the shape of the emitter is changed with each shot.
voltage. At higher laser pulse energies, a delayed and broad satellite pulse piles up, with its maximum occurring about 100 ns after the main pulse. As the laser-pulsed cathode wire shows shallow molten regions with numerous protrusions (Fig. 12), electrons are emitted from liquid material. The satellite pulse is probably caused by field-assisted thermal emission from the liquid when it disintegrates due to temperature-induced gradients of the surface energy. The
FIG. 12. Tip of a tungsten hairpin emitter after several thousand shots. The spherical protrusions show that the material is molten by the laser pulses.
delay on the order of 100 ns is typical for flow processes (see Section V.A). Of course, this irregular satellite pulse must be suppressed, and this is done with a blanking unit consisting of a deflecting capacitor and a chopping blade. When laser pulses are applied to a cold cathode, the leading electron pulse splits into several bursts. In order to obtain a smooth shape, the emitter must be kept at a stationary background temperature above 2500 K. The bursts are thought to be caused by laser-induced desorption of adsorbed oxygen and accompanying changes in the work function. The latter is known to depend heavily on temperature below 2800 K in an ordinary high vacuum of lo-’ mbar, because of oxygen adsorption (Gmelin, 1978). As shown in Fig. 11, the electron pulse amplitude at first increases with laser pulse energy up to about 150 μJ. Above this value, however, the amplitude decreases again and the leading pulse is found to become very irregular too. Simultaneously, the electron emission is succeeded by arcing of the electron gun. Apparently, laser-supported combustion and detonation waves (Pirri, 1971, 1973; Raizer, 1965) are produced. The generated tungsten plasma absorbs light by inverse bremsstrahlung and thus screens the cathode. The attractive features of this gun are ease of operation; robustness; no need for ultrahigh vacuum, as with field-emission cathodes; and operation as a pulsed cathode without obstructing the standard constant current mode.

c. Cold Cathode Semiconductor Junction Gun. If the electric field in a reverse-biased p-n junction of a semiconductor is high enough, the mobile charge carriers are so heavily accelerated that avalanche breakdown occurs. The resulting heating of electrons may stimulate a significant emission into the vacuum if the p-n junction is located close enough to the surface of the specimen. Using this effect, van Gorkom and Hoeberechts (1987) developed an efficient cold cathode semiconductor junction gun.
Figure 13 shows the principle. Conduction electrons, coming from the p-doped region into the reverse-biased depletion layer, are accelerated there by the high electric field and produce electron-hole pairs by collisions with bound electrons. The electrons crowd in the conduction band of the thin n-doped layer. Those that have a kinetic energy along the surface normal exceeding the work function escape into the vacuum. Doping and geometry must be such that the bias field provides efficient avalanching, but tunneling across the energy gap is negligible. This is necessary, as most electrons arriving at the conduction band of the n-layer by the Zener effect would have low energies and could not leave the semiconductor. A simplified scheme of the p-n junction electron source is sketched in Fig. 13(b). The emitter is produced by photolithography on a silicon chip. Driving this source in a two-electrode configuration with an acceleration voltage of 10 kV, a current density of 1500 A/cm², and a brightness of
FIG. 13. Semiconductor pn-junction electron gun. (From Van Gorkom and Hoeberechts, 1987.) (a) Energy band model of a reverse-biased pn-junction showing avalanche multiplication of electrons in the depletion region ($E_C$, $E_V$, bottom and top of conduction and valence band; $E_F$, Fermi level; U, reverse voltage; W, work function). (b) Schematic setup of a semiconductor junction gun.
$10^7$ A/cm²·sr have been achieved. Another very attractive feature, apart from these high values, is the fact that the emission current can be modulated with the reverse bias at frequencies up to the gigahertz region. There is, however, the serious drawback that the high current densities are reached only by lowering the work function with a cesium layer on the silicon surface. This cathode, therefore, requires a vacuum free from oxidizing gases such as oxygen, water, or carbon dioxide.
B. Image Detector for Short-Exposure Electron Microscopy (Bostanjoglo et al., 1987a, 1987c)
Short exposure times can be realized in two ways. Either the specimen is illuminated with a pulsed electron gun, having a brightness that exceeds that of a constant current gun by several orders of magnitude, or the image signal is collected with a gated image intensifier. Though the first method is to be preferred as the only way to reduce shot noise in the image, there are also several points that may favor a gated detector. In the first place, damaging of the specimen by electron bombardment due to thermal and electronic decomposition can be a problem at the high current densities supplied by a superradiant pulsed electron gun. Secondly, it is easier to realize a short gating time than a short electron beam pulse with a high intensity. Thirdly, the gain of a detector may be increased in the gated mode far beyond its stationary value.
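The shot-noise argument can be made concrete with a simple counting estimate. The sketch below assumes Poisson statistics, so that the signal-to-noise ratio of an image element that receives N electrons is the square root of N; the current density, element size, and exposure time in the usage note are hypothetical values chosen only to show the scaling.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C


def electrons_per_element(current_density_a_m2, element_area_m2, exposure_s):
    """Mean number of electrons hitting one image element during the exposure."""
    return current_density_a_m2 * element_area_m2 * exposure_s / E_CHARGE


def shot_noise_snr(mean_electrons):
    """Signal-to-noise ratio for Poisson-distributed counts."""
    return math.sqrt(mean_electrons)


if __name__ == "__main__":
    # Hypothetical: 1 A/cm^2 at the specimen, 100 nm x 100 nm element, 10 ns exposure.
    n = electrons_per_element(1e4, 1e-14, 10e-9)
    print(n, shot_noise_snr(n))
```

In this picture a gated detector shortens the exposure without raising N, whereas a brighter pulsed gun raises N itself, which is why only the first method reduces shot noise.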
FIG. 14. Closed-type microchannel plate image converter with proximity focusing. (FP, fiber plates; PC, photocathode; MCP, microchannel plate; SC, phosphor screen.)
Image converters with microchannel plates, incorporated as main amplifiers, are distinguished by their high gain and compact construction, especially in the case of converters of the proximity-focusing type (Fig. 14). The microchannel plate is a continuous dynode secondary electron multiplier, consisting of an array of about $10^6$ hollow-glass conducting channels. Channel diameter and length typically are 10 μm and 0.5-1 mm. A survey of properties is given by Wiza (1979). The sealed image converter consists of a photocathode, one or several cascaded microchannel plates, and an output scintillator. The open converter lacks the photocathode. The open type is more versatile and economic, as it can be assembled from selected components, reserving full access to all parts. Gating can be accomplished in principle
FIG. 15. Output current versus electron input current (Bostanjoglo et al., 1987c) of a microchannel plate (Varian VUW 8922, 40 × 0.5 mm) for pulsed (both voltage and input current) and conventional dc operation (shown in the insert as given by the manufacturer). The dashed line indicates the gain defined as the ratio of output to input. Horizontal axis: input current (A).
FIG. 16. Gain of a pulsed microchannel plate as a function of (a) MCP voltage pulse amplitude (kV) and (b) input current (A). (From Bostanjoglo et al., 1987c.)
by pulsing any of the voltages. One must, however, bear in mind that gating, apart from determining the sampling time, must serve two further purposes: to increase the gain beyond the stationary value and to protect the detector against overload during the intensive illumination. This goal, in particular the first point, is reached only by pulsing the voltage across the microchannel plate.
As the gain of a microchannel plate depends exponentially on the voltage, a moderate increase of the latter has a tremendous effect on the gain. If short gating pulses are applied, the voltage amplitudes can exceed the safe maximum dc ratings very significantly, as was demonstrated by Bostanjoglo et al. (1987c). The gain can be safely increased by orders of magnitude. A channel plate with a maximum safe dc voltage and input current density of 1 kV and 0.1 nA/cm², respectively, can be operated for at least 50 ns at 2.1 kV to amplify input currents of densities up to 0.1 μA/cm² (electron current pulse width 1 μs) without damage. Thereby the gain is increased by roughly two orders of magnitude above the stationary value (Fig. 15). The gain of a pulsed microchannel plate depends exponentially on the exciting voltage pulse amplitude, even at high input current densities, although it gradually decreases with growing input (Fig. 16). The exponential dependence of the gain on the voltage is additionally exploited to realize short exposure times with readily produced nonrectangular exciting voltage pulses (Fig. 17). These advantages of pulsed channel plates cannot be provided by sealed image converters for the following reasons. The capacitance between photocathode and microchannel plate is at least one order of magnitude smaller than the capacitance of the latter. Furthermore, the thin-film electrodes of the channel plate are not a good electric shield. Accordingly, when the voltage pulse is applied across the channel plate, a significant fraction is capacitively coupled to the photocathode. This causes arcing, with permanent damage to the voltage stability of the gap and to the photocathode layer. Therefore, the open channel-plate image converter (Bostanjoglo et al., 1987a, 1987c) was used as gated detector in short-exposure transmission microscopy, as described in Section IV.D.b. A certain disadvantage of the open image converter must be remembered, however.
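The consequences of an exponential gain law can be sketched with a few lines of arithmetic. The example below assumes a pure exponential, G proportional to exp(V/V_e), calibrated to the statement that raising the voltage from 1 kV to 2.1 kV increases the gain by about two orders of magnitude, and a Gaussian exciting voltage pulse that starts from 0 V; both the Gaussian shape and the helper names are assumptions made for illustration.

```python
import math


def efold_voltage(v_low_kv, v_high_kv, gain_ratio):
    """Voltage step V_e (kV) per factor e in gain, from two operating points."""
    return (v_high_kv - v_low_kv) / math.log(gain_ratio)


def relative_gain(v_kv, v_ref_kv, v_e_kv):
    """Gain at v_kv relative to the gain at v_ref_kv for G ~ exp(V / V_e)."""
    return math.exp((v_kv - v_ref_kv) / v_e_kv)


def exposure_to_pulse_width_ratio(v_peak_kv, v_e_kv):
    """FWHM of the gain divided by FWHM of an assumed Gaussian voltage pulse."""
    v_half = v_peak_kv - v_e_kv * math.log(2.0)   # voltage at which the gain has halved
    if v_half <= 0.0:
        raise ValueError("peak voltage too small for this estimate")
    return math.sqrt(math.log(v_peak_kv / v_half) / math.log(2.0))


if __name__ == "__main__":
    v_e = efold_voltage(1.0, 2.1, 100.0)
    print(v_e, relative_gain(2.1, 1.0, v_e), exposure_to_pulse_width_ratio(2.1, v_e))
```

Under these assumptions the effective exposure is roughly one third of the voltage pulse width, which illustrates how a nonrectangular pulse can still define a short gate.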
Due to the decreasing secondary electron emission with increasing energy of the impinging electron (Fig. 2), only about 25% of the incoming high-energy electrons are registered.

C. Periodic Processes and Stroboscopic Electron Microscopy
The main applications of stroboscopic microscopy have been investigations of periodic changes induced by high-frequency magnetic fields in ferromagnetic films and of electric potentials in semiconductor devices. Electron-beam stroboscopy is of particular technical interest as a fast and nonloading voltage probe for testing large-scale integrated circuits.

1. Applications to Magnetic Reversal
The macroscopic distribution of the magnetization vector in thin magnetic films is conveniently visualized in transmission microscopy by defocused
FIG. 17. Generation of short exposure times with a microchannel plate, based on the exponential dependence of the gain on the voltage. (a) Exciting voltage and (b) output current. (From R. P. Tornow, PhD thesis.)
imaging (Fig. 5). Domain walls, their substructures, and fluctuations of the magnetization direction within the domains show up as bright and dark lines. The mean direction of the local magnetization vector within the domains may be determined with the “right-hand rule” of the Lorentz force. Applying a magnetic field $\mathbf{H}$ to a magnetic structure sets the local magnetization vectors $\mathbf{M}$ into rotation due to the torque $\mathbf{M} \times \mathbf{H}_e$, where the effective field $\mathbf{H}_e$ is the sum of applied field and interactions due to exchange, anisotropy, demagnetization, and magnetostriction. As the magnetization is magnetostrictively coupled to the lattice, it dissipates its precession energy by spin-phonon interactions and finally orients along the effective field. The temporal change $d\mathbf{M}/dt$ of the local magnetization $\mathbf{M}$ is given by the gyroscopic equation of Landau and Lifshitz (1935)
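$$\frac{d\mathbf{M}}{dt} = \Gamma\,\mathbf{H}_e \times \mathbf{M} - \frac{\beta\Gamma}{M_s}\,\mathbf{M} \times (\mathbf{M} \times \mathbf{H}_e), \tag{9}$$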
with $\Gamma = \mu_0 e/2m$ the gyromagnetic ratio, $\mu_0$ the vacuum permeability, $e$ and $m$ charge and mass of the electron, $M_s$ the saturation magnetization, and $\beta$ an empirical damping constant. The two terms on the right-hand side describe two fundamentally different modes of response of a magnetic structure to an applied field. The first term gives an inertia-type response consisting of precession around the effective field, whereas the second one comprises relaxation into the new equilibrium. A magnetic structure frequently consists of various domains separated by domain walls, with a gradual change of the local magnetization direction. Such a magnetic structure can respond to an applied field simply by displacing the domain walls. Thereby the magnetization within the wall rotates, generating a stray field energy that is proportional to the square of the propagation velocity. In analogy to the kinetic energy of a true mass, an inertial mass $m$ can be ascribed to the moving domain wall, and, in general, to any propagating structure containing a gradient $\partial\theta/\partial z$ of the magnetization angle $\theta$. As was shown by Becker (1951) and Konishi et al. (1975), a 180° Bloch wall in a thin film (Fig. 18a) has a mass $m$ per wall area such that
which is caused by the first term in Eq. (9). Here $N$ is the in-plane demagnetization factor of the wall and the z-axis is the in-plane normal to the wall. The second expression in Eq. (9) is a damping term, which shows that the magnetization relaxes within a time $\tau = 1/(\beta\Gamma H_e)$ into equilibrium. Values typical for many low coercivity materials are $\beta$ = 0.001-0.1 and $H_e$ =
FIG. 18. 180° magnetic domain wall in a thin ferromagnetic film with in-plane magnetization. (a) Bloch wall and (b) crosstie wall with Bloch lines and crossties (the local magnetization is indicated by arrows).
10-100 A/cm. They give $\tau$ = 10-1000 ns as relaxation times for the motion of the magnetization vector. The magnetic structure can be quite complicated [Fig. 18(b)], so that magnetic reversal comprises various mechanisms, such as domain rotation, displacement of walls, and wall substructures. The different mechanisms proceed with different relaxation times and can therefore be observed individually, using exciting fields that change on different time scales. As complete decoupling is not feasible, transient stray fields build up, causing nonlinearities and irreversibilities. Using stroboscopic transmission microscopy in the defocused mode, Golubkov et al. (1979) and Bostanjoglo and Rosin (1980, 1981a, 1981b) investigated the different fast magnetic reversal effects in thin films. If a magnetic field is applied along a 180° domain wall, the latter tends to move in such a way that the domain with the favored magnetization grows. If the wall is fixed at two points, e.g., by film inhomogeneities causing a higher local coercivity, the wall will bow. A sinusoidal field $H = H_0 \sin\omega t$ will cause forced vibrations of the wall. Figure 19 shows the maximum excursion of a domain wall at three different exciting frequencies $\omega$. The amplitude $H_0$ of the exciting sinusoidal magnetic field was kept constant. The dependence of the vibration amplitude on the frequency is not monotonic, but has a resonance at 8 MHz. The resonance curve, showing the dependence in detail, is given in
FIG. 19. Stroboscopic defocused electron images of forced vibrations of a crosstie wall (Bostanjoglo and Rosin, 1980a, 1980b, 1981a, 1981b) excited by a magnetic sine field $H_0 \sin\omega t$ along the wall ($H_0$ = 190 A/m). The wall was imaged at its two maximum excursions and the negative prints were copied on top of one another to demonstrate the vibration amplitude at different frequencies $\omega/2\pi$. (a) 1 MHz; (b) 7 MHz; (c) 18 MHz.
FIG. 20. Amplitude $x_0$ of forced vibrations of the crosstie wall shown in Fig. 19 versus frequency $\omega/2\pi$ (MHz) of the exciting magnetic field $H_0 \sin\omega t$ (directed along the wall). The unsymmetric shape near the resonance is typical for anharmonic vibrations.
Fig. 20. Starting at zero frequency, the amplitude $x_0$ of the forced oscillation gradually increases with growing frequency. At about 10 MHz, the wall executes large statistical jumps and a resonance collapse occurs, resulting in a new domain structure. If, on the other hand, the resonance curve is swept coming from the high-frequency region, the amplitude at first grows steeply with decreasing frequency. Then, beyond about 10 MHz, the amplitude decreases again. Obviously, resonance is passed in this direction without a collapse. Such a nonsymmetrical resonance curve is typical for anharmonic oscillators with an additional cubic restoring force $mA\omega_0^2x^3$ in the equation of motion

$$m\frac{d^2x}{dt^2} + \frac{m}{\tau}\frac{dx}{dt} + m\omega_0^2\,(1 + Ax^2)\,x = 2M_sH_0\sin\omega t. \tag{11}$$
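The bent resonance curve implied by this equation of motion can be explored with a first-harmonic balance. The sketch below is an approximation under stated assumptions: the response is taken as a single sinusoid, the cubic term is replaced by its fundamental component, and all quantities are made dimensionless (frequency in units of the eigenfrequency, amplitude y in units of the static linear displacement, quality factor Q equal to eigenfrequency times relaxation time, and eps equal to the anharmonicity constant times the squared static displacement). The parameter values in the usage note are hypothetical and are not fitted to Fig. 20.

```python
import numpy as np


def amplitude_branches(omega_rel, q_factor, eps):
    """Steady-state amplitudes y at relative drive frequency omega_rel.

    Solves y^2 * [(1 - W^2 + 0.75*eps*y^2)^2 + (W/Q)^2] = 1 for real y > 0.
    More than one solution signals a hysteretic (multivalued) resonance curve.
    """
    detune = 1.0 - omega_rel**2
    damp = omega_rel / q_factor
    if eps == 0.0:
        # Linear oscillator: exactly one branch.
        return [1.0 / np.sqrt(detune**2 + damp**2)]
    k = 0.75 * eps
    # Cubic in u = y^2: k^2 u^3 + 2 k detune u^2 + (detune^2 + damp^2) u - 1 = 0
    roots = np.roots([k**2, 2.0 * k * detune, detune**2 + damp**2, -1.0])
    real_u = [r.real for r in roots
              if abs(r.imag) < 1e-9 * (1.0 + abs(r.real)) and r.real > 0.0]
    return sorted(float(np.sqrt(u)) for u in real_u)


if __name__ == "__main__":
    # Hypothetical parameters chosen to make the multivalued region visible.
    for w in (0.5, 1.0, 1.5, 2.0):
        print(w, amplitude_branches(w, q_factor=10.0, eps=0.05))
```

For a positive anharmonicity constant the peak leans toward higher frequencies, so that in part of the frequency range a large-amplitude and a small-amplitude state coexist; which one is observed depends on the sweep direction, in line with the asymmetry described in the text.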
Here $x$, $\omega_0$, $m$, and $\tau$ are displacement, eigenfrequency, mass per area, and relaxation time of the wall; $M_s$ is the saturation magnetization; and $A > 0$ is an anharmonicity constant. The right-hand side of Eq. (11) is the exciting force exerted by the applied field. The anharmonicity term $mA\omega_0^2x^3$ is due to stray fields generated within the domains near the displaced wall. Values $m \approx$ 4.10-’ kg/m² and $\tau \approx$ 30 ns can directly be inferred for the mass and relaxation time of the wall that is shown in Fig. 19. Bloch lines within the domain walls are the one-dimensional analogue to Bloch walls. They constitute the boundary between two antiparallel magnetized parts of the domain wall [Fig. 18(b)]. Just as domain walls, they can respond as an entirety to a time-varying magnetic field. If a step-like magnetic field with a rise time exceeding the spin-spin relaxation time (≈4 ns) is applied perpendicular to the wall, the Bloch line responds by damped oscillations
FIG. 21. Stroboscopic defocused electron images of the ringing response of a Bloch line on a magnetic wall, being excited by a rectangular magnetic field pulse H = 110 A/m (Bostanjoglo and Rosin, 1980a, 1980b, 1981a, 1981b). The field has a duration of 90 ns, rise/fall times are 5 ns, and the repetition rate is 5 MHz. The sampling electron pulse width is 4 ns. Imaging points of time, as related to the leading edge of the magnetic field pulse, are indicated in the upper left corners.
about its new equilibrium (Fig. 21). At small amplitudes of the Bloch line oscillations, the neighboring crossties remain fixed. If, however, the Bloch line approaches a crosstie beyond a critical distance, the latter at first is repulsed and then the pair is annihilated. If the rise time of the magnetic field is shorter than the spin-spin relaxation time, the magnetization vectors cannot respond coherently. The domains on both sides of the wall break up into subdomains, creating stray fields that, in turn, force the domain wall to subdivide into numerous transient Bloch lines and crossties. These are gradually annihilated within the first 10 ns of such an incoherent switching due to a steep step-type magnetic field. The stroboscopic pictures demonstrate that magnetic switching comprises reversible processes, which can be imaged, and irreversible changes, which can result in resonance collapses of the original magnetic structure. Substructures, consisting of gradients of the magnetization direction, can move as entireties, in close analogy to particles in mechanics with an associated inertial mass and friction. What these entireties, which perform an individual motion, actually are depends on the relation of the rise and fall times of the exciting magnetic fields to the spin-spin relaxation time.

2. Application to Time-Varying Electric Fields - Electron Beam Testing
The complexity of integrated circuits is continuously growing. In order to increase speed, the size of individual gates is steadily reduced and, simultaneously, the scale of integration is increased. Classical testing with mechanical probes is growing more and more prohibitive because of oversized capacitive
and mechanical influence on micrometer and submicrometer devices. Electron and light beams have been successfully introduced as alternative probes, being non-mechanical, fast, and of low loading. As there are several articles reviewing electron beam testing by Feuerbaum (1983), Menzel and Kubalek (1983), and Wolfgang (1986), only a brief survey will be given here. In the early days of electron beam testing, conventional emission (Spivak et al., 1966) and mirror electron microscopy (Gvosdover et al., 1970; Szentesi, 1972) were used in the stroboscopic mode. Rapid field changes in p-n junctions and propagation of surface acoustic waves on piezoelectric devices were successfully visualized up to 100 MHz. However, as periodic processes can be equally well analyzed with scanning microscopy, the latter was soon preferred because of its significantly larger variety of image signals and better resolution. The main interest in beam testing is directed toward the local electric potentials in large-scale integrated circuits and their time dependence. The basis of present fast beam testing modes is the “voltage contrast” effect of secondary and photoemitted electrons.

a. Voltage Contrast. Signals that depend on specimen voltage are Auger electrons, secondary electrons, and photoelectrons. Auger electrons have the advantage that topographical and material contrast can be avoided. Since, however, their yield is several orders of magnitude inferior to that of secondary electrons and photoelectrons, the latter are preferred. The number of the secondary electrons or photoelectrons generated by the primary beam and collected by an electron detector is determined by the local electric fields on the specimen. Regions with a positive voltage attract a portion of the emitted electrons with low kinetic energy back to the specimen. These electrons are missing in the image signal, and, consequently, areas with positive potential will show up as dark.
However, this voltage contrast signal depends in a nonlinear way on the specimen voltage, due to the peculiar energy distribution $N(E)$ of the secondary electrons (Fig. 22). It can therefore be used only for qualitative investigations. The basis of quantitative voltage contrast is the linear shift of the energy spectrum of the secondary electrons with the applied specimen voltage (Fig. 23). The energy spectrum can be analyzed differentially using the linear shift of the peak of the spectrum with applied voltage as the image signal. More commonly, the shifted spectrum is registered integrally with a retarding field analyzer (Fig. 24). The current $I$ collected at a fixed retarding voltage $-|U_{G0}|$ is a nonlinear function of the specimen voltage $U_{sp}$. A linear signal can, however, be generated by installing a feedback, which keeps the collector current $I$ constant by varying the retarding potential $-|U_G|$ [Fig. 23(b)]. The change $\Delta U_G$ of the retarding voltage for keeping the emission current from the
v,,
+
luspl
0
E D
eUextr. FIG.22. Energy distribution N ( E ) of the ejected electrons at different potentials - 1 USPI, 0, and lU,pl of the specimen. U,,,, is extraction voltage applied to the detector; E is the energy of an electron reaching the detector.
4
A N IE)
=o
usp
IN
TNiEldE e(Usp +IUGO
I
D
USP
-I'GOI
-iuGl
(a) (b) FIG.23. Shift of energy spectrum N ( E ) of ejected electrons by a specimen potential UVp (a) and electron current I (b), as detected by a retarding held analyzer at different retarding voltages U G . ~
FIG. 24. Scheme of a retarding field energy analyzer with hemispherical electrodes (labeled parts: probing beam, specimen, retarding grid). The probing beam may be either an electron or laser beam, ejecting secondary or photoelectrons of energy $E$, respectively.
specimen constant when the specimen potential changes by $\Delta U_{sp}$ is given by $\Delta |U_G| = -\Delta U_{sp}$. Therefore, this change in the retarding voltage can be used as an imaging signal for quantitative measurements.

In order to measure the potential distribution on large-scale integrated circuits quantitatively, which is a goal of highest technical importance, several conditions must be fulfilled. The interaction of local electric fields at the surface of the circuits must be reduced, and topographical and material contrast must be suppressed. Local electric fields, generated by a difference of voltage at neighboring devices on the integrated circuit, block the emission of low-energy electrons and alter the angular distribution of all emitted electrons. The influence of local fields on the energy spectrum can be effectively reduced by applying a strong extracting field at the specimen. Using favorably curved extracting and retarding field electrodes, the energy analyzer can be made insensitive to the angular distribution of the emitted electrons.

A highly advanced spectrometer for electron beam testing, which solves the problem of local fields, has been developed by Plies and Schweizer (1987). It is schematically shown in Fig. 25. The spectrometer is optimized to achieve low aberration focusing of the primary electrons onto the specimen, and it minimizes the influence of local electric fields on the secondary electrons. It consists of a combination of a probe-deflecting system, a retarding-field spectrometer, and a microchannel plate detector. Primary electrons of low energy ($\approx 700$ eV) are used in order to avoid electrical charging by working near the $E_2$ point (Fig. 2) and to keep radiation damage low. A low chromatic aberration constant is achieved using a short focal length, which resulted in a probe diameter of 0.15 μm at a current of 0.5 nA.
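The feedback linearization described above (the retarding potential is adjusted so that the collected current stays constant) can be illustrated with a minimal sketch. The spectrum shape $N(E) \propto E/(E+\phi)^4$, the work function value, and all function names are assumptions chosen for illustration; they are not taken from the text. An electron emitted with energy $E$ passes the grid only if $E \ge e(U_{sp} + |U_G|)$, as in Fig. 23.

```python
PHI_EV = 4.5  # assumed work function (eV) of the illustrative model spectrum


def collected_fraction(u_sp, u_g_abs, phi=PHI_EV):
    """Fraction of emitted electrons passing a grid held at -|U_G|.

    Energies are in eV, potentials in V. The closed form is the integral of
    the normalized model spectrum 6*phi**2 * E / (E + phi)**4 above the
    threshold energy u_sp + u_g_abs.
    """
    threshold = u_sp + u_g_abs
    if threshold <= 0.0:
        return 1.0  # no barrier: every emitted electron is collected
    x = phi / (threshold + phi)
    return 3.0 * x**2 - 2.0 * x**3


def feedback_retarding_voltage(u_sp, target, lo=0.0, hi=1000.0):
    """Find |U_G| that holds the collected fraction at `target` (bisection)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if collected_fraction(u_sp, mid) > target:
            lo = mid  # too much current: raise the retarding barrier
        else:
            hi = mid  # too little current: lower the barrier
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for u_sp in (-2.0, 0.0, 1.5):
        fixed = collected_fraction(u_sp, 4.5)          # nonlinear in u_sp
        u_g = feedback_retarding_voltage(u_sp, 0.5)    # linear in u_sp
        print(f"U_sp = {u_sp:+.1f} V   fixed-grid fraction = {fixed:.3f}   "
              f"|U_G| with feedback = {u_g:.3f} V")
```

In this model the fixed-grid signal varies nonlinearly with the specimen voltage, whereas the feedback value of $|U_G|$ shifts by exactly $-\Delta U_{sp}$, whatever spectrum shape is assumed.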
The secondary electrons are extracted with a high voltage $U_e = 2$ kV applied to a planar extraction grid, furnishing an extraction field of 1 kV/mm, which is far superior to local fields at the specimen surface. The high extraction voltage also helps to focus the secondary electrons, which have a finite energy spread $\Delta E \approx 10$ eV, as the chromatic aberration of the focusing magnetic lens decreases with $\Delta E/eU_e$ [see Eq. (6)]. The focus of the secondary electrons is nearly outside the magnetic field and within a region of constant electrical potential. Accordingly, the secondary electrons travel on straight lines passing their crossover. The latter is made to coincide with the center of the spherical retarding field, so that all secondary electrons move parallel to the electric field within the retarding field region. The spectrometer is therefore largely independent of the angular distribution of the secondary electrons. Those overcoming the retarding field are amplified by the microchannel plate detector.

FIG. 25. Retarding-field-spectrometer objective for electron beam testing according to Plies and Schweizer (1987) with a channel-plate detector and electromagnetic probe-forming lens ($U_D$, $U_e$, and $U_R$ are deflection, extraction, and retarding voltages, respectively).

Topographical contrast arises as the secondary electron yield varies with the angle $\varphi$ of incidence of the primary beam with respect to the surface normal as $1/\cos\varphi$ [Eq. (2)]. With increasing inclination $\varphi$, the yield increases and the energy spectrum is displaced towards smaller energies (Koshikawa and Shimizu, 1973), thus causing faulty measurements. Material contrast is caused by the dependence of the secondary electron yield on atomic number, density, and work function. Even equal materials of a specimen can give different yields due to adsorbed gases or contamination by beam-cracked hydrocarbons from the vacuum system. The influence of these effects can be eliminated by electronic processing of the signal. Two signals are obtained: one contains topographical and material contrast plus voltage information and the other is taken at zero voltage. As the first two contrast mechanisms are independent of the specimen voltage, they are eliminated by subtraction of both signals.

b. Electron Beam Testing Modes. The aim of beam testing is to provide visual information on the spatial distribution of time-periodic voltage signals
within the device under study in the time and frequency domain. There are several complementary testing methods, all exploiting the voltage contrast but processing the signal in very different ways (Wolfgang, 1986; Brust and Fox, 1985).

Waveform Sampling. This mode is particularly suited for verification of the performance of large-scale integrated circuits, where the time dependence of the potential at particular test points is wanted. The probing beam is focused onto the selected area and pulsed in synchronism with the frequency of the voltage. The emitted electrons carry information on the voltage value at the pulsing time. By sweeping the phase of the beam pulser through 360°, the total waveform is obtained. Waveforms with rise times down to a few picoseconds were resolved at working frequencies up to several gigahertz (Fujioka and Ura, 1981; Hosokawa et al., 1978). Figure 26 shows an example of the waveform recording mode that was applied to study the motion of the high-field domain in a Gunn device. The waveform was measured at successive points A to E along the active region of the device. The anode was biased slightly below the threshold voltage of the Gunn effect and a 1 GHz rf triggering voltage was added. Each time
FIG. 26. Waveform (a) sampled at points A to E in the active region of a Gunn diode, shown in (b); time scale 100 ps/div. (From Fujioka and Ura, 1981.)
FIG.27. Propagation of the high-field domain in a Gunn diode, visualized by stroboscopic scanning microscopy in the voltage contrast mode. The imaging points of time after triggering of the domain motion are shown in the upper right corners. (From Hosokawa, Fujioka, and Ura, 1978.)
the threshold voltage was exceeded, a high-field domain nucleated near the cathode and started to drift towards the anode. The arrival of the domain at a test point is signaled by a drop in the potential. A propagation velocity of $10^5$ m/s is deduced.

Stroboscopic Imaging. If an image of the spatial distribution of a voltage with a particular frequency is wanted at particular points of time, the probing beam is pulsed with this frequency and, in addition, scanned across the device, keeping the pulsing phase constant. Figures 27 and 28 give examples of such stroboscopic images of time-varying voltage distributions. Figure 27 shows, as an example of processes on the picosecond time scale, the propagation of the high-field domain in a Gunn device. Slower variations of electric potentials, proceeding on the nanosecond time scale but in a device of significantly higher complexity, are shown in Fig. 28.

Frequency Tracing. Frequency tracing is used to visualize all points of a device carrying a voltage of a wanted frequency $\omega_0$. The probing beam scans the device and ejects electrons, which are exploited as an image signal. Since their current depends on the local voltage, it is modulated by the local frequency. The signal is passed through a narrow bandpass filter centered on
FIG.28. Voltage distribution in a switching NAND gate at different points of time, indicated in the lower left corners, as visualized by stroboscopic SEM in the voltage contrast mode. (From Menzel and Kubalek (1983), reprinted with permission of the publisher, FACM, Inc. The JBI Building, Box 832, Mahwah, N. J. 07430, USA.)
$\omega_0$ before reaching the brightness control of the CRT of the scanning microscope. Thus, only those device points appear bright that carry a potential of the wanted frequency. In order to circumvent limitations due to the bandwidth of the detector system, which may be significantly smaller than the wanted frequency $\omega_0$, the heterodyne technique is applied. The probing beam is pulsed with a convenient and fixed frequency $\omega_0 + \omega_i$. The ejected electron signal current is proportional to the pulsing beam intensity and, in addition, depends on the local voltage with frequency $\omega_0$. Accordingly, the ejection process mixes the frequencies $\omega_0 + \omega_i$ and $\omega_0$. The signal, collected by the energy-analyzing electron detector, contains, among other things, a Fourier component of the
difference frequency $(\omega_0 + \omega_i) - \omega_0 = \omega_i$, i.e., the intermediate frequency, whenever a voltage with the wanted frequency $\omega_0$ is present at the object points. This component is extracted with a bandpass filter. The intermediate frequency is selected to fall well within the bandwidth of the detector system. Limits are set now not by the detector, but by the bandwidth of the beam pulser, which must exceed $\omega_0 + \omega_i$. This, however, is much more easily realized than increasing the bandwidth of the detector.
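The frequency mixing behind the heterodyne technique can be checked numerically with a short sketch. The raised-cosine beam pulsing, the modulation depth, the frequency values (in arbitrary units), and the function name are illustrative assumptions, not parameters from the text. The emitted current is modeled as the product of the pulsed beam intensity and a voltage-dependent factor, and the component at the intermediate frequency is extracted by projection, which stands in for the narrow bandpass filter.

```python
import math


def if_amplitude(f_signal, f0=1000.0, f_if=10.0, depth=0.3, n=8192):
    """Amplitude of the detected current at the intermediate frequency f_if.

    The beam is pulsed at f0 + f_if; the local voltage oscillates at
    f_signal and modulates the emitted current with relative depth `depth`.
    The record spans one unit of time, so integer frequencies complete a
    whole number of periods.
    """
    acc_cos = 0.0
    acc_sin = 0.0
    for k in range(n):
        t = k / n
        beam = 1.0 + math.cos(2.0 * math.pi * (f0 + f_if) * t)
        voltage_factor = 1.0 + depth * math.cos(2.0 * math.pi * f_signal * t)
        current = beam * voltage_factor  # mixing by the emission process
        acc_cos += current * math.cos(2.0 * math.pi * f_if * t)
        acc_sin += current * math.sin(2.0 * math.pi * f_if * t)
    return 2.0 * math.hypot(acc_cos, acc_sin) / n


if __name__ == "__main__":
    print("voltage at wanted frequency:", if_amplitude(1000.0))
    print("voltage at another frequency:", if_amplitude(1200.0))
```

In this model a component at the intermediate frequency appears only when the local voltage oscillates at the wanted frequency; a voltage at any other frequency leaves the filtered signal at zero, so that point stays dark.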
Frequency Mapping. If the voltage frequency $\omega_0$ is unknown, it can be determined by the frequency mapping mode. In this case, the probing beam scans the device along only one line. Simultaneously, a spectral analysis of the voltage contrast signal is carried out and displayed on the CRT of the scanning microscope. The y-axis coincides with the line scanned and the x-axis corresponds to the frequency axis. Frequency mapping and tracing function without synchronization with the device under study. They can be applied to asynchronously driven circuits, e.g., free-running oscillators.

Logic State Tracing. This mode was introduced by Brust and Fox (1985) to trace a specific periodic voltage, e.g., a particular bit pattern, in a circuit. The probing beam scans the device and ejects electrons, which carry information on the local time-varying voltage, consisting of locally different bit patterns. This signal is fed to a correlator, which compares it to a particular stored bit pattern, coinciding with the wanted signal. The output of the correlator controls the CRT brightness of the scanning microscope. The stored reference signal can either be derived from a pattern generator or from a preceding measurement by the probing beam itself. Figure 29 shows an example of logic-state tracing in a clocked integrated circuit.
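The correlator step of logic-state tracing can be sketched as follows. This is only an illustration under stated assumptions: the signal is taken to be sampled once per bit and in phase with the stored reference, the normalized (Pearson) correlation is one possible choice of comparison, and the bit patterns, noise level, and function names are hypothetical.

```python
import random


def correlation(signal, reference):
    """Normalized (Pearson) correlation of two equally long sequences."""
    n = len(signal)
    mean_s = sum(signal) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(signal, reference))
    var_s = sum((s - mean_s) ** 2 for s in signal)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_s == 0.0 or var_r == 0.0:
        return 0.0  # a constant line carries no bit pattern
    return cov / (var_s * var_r) ** 0.5


def brightness(signal, reference):
    """CRT brightness in [0, 1]; only positive correlation lights the pixel."""
    return max(0.0, correlation(signal, reference))


def measured(bits, rng, noise=0.1):
    """Hypothetical voltage-contrast samples: logic levels plus bounded noise."""
    return [b + rng.uniform(-noise, noise) for b in bits]


wanted = [1, 0, 1, 1, 0, 0, 1, 0]   # stored reference bit pattern
other = [0, 1, 1, 0, 1, 0, 0, 1]    # pattern on an unrelated interconnection

if __name__ == "__main__":
    rng = random.Random(0)
    print("line carrying the wanted pattern:", brightness(measured(wanted, rng), wanted))
    print("line carrying another pattern:  ", brightness(measured(other, rng), wanted))
```

Interconnections carrying the stored pattern give a correlation close to one and appear bright, while the other lines are suppressed, as in Fig. 29(b).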
c. Resolution of Electron Beam Testing. An important question is the resolution, which is threefold, concerning space, time, and voltage.

Spatial Resolution. Spatial resolution is limited by the usual parameters of scanning microscopy, such as the spot size of the probe on the specimen and the exit volume of the secondary electrons, as discussed in Section III.B. In addition, further degradation of resolution is caused by beam pulsing and by aberrations due to the extraction field of the detector. Degradation of the spot size due to electron beam pulsing depends on the pulsing technique. It can be made negligibly small in the case of a properly designed beam blanking system based on a capacitor deflector and chopping aperture (Menzel and Kubalek, 1979). High extraction fields of the electron detector,
FIG. 29. Logic-state tracing of an integrated circuit. (a) Conventional SEM image of several interconnections of the circuit, produced with secondary electrons. (b) Imaging of only those two interconnections that carry the wanted signal [marked by arrows in (a)]. (From Brust and Fox, 1985.)
being inevitable for breaking down microfields at the specimen surface, introduce only defocusing if the electrodes are planar. This is easily compensated with the probe-forming lens. Nonplanar electrodes additionally cause astigmatism, which must be corrected by an additional stigmator. A resolution of about 0.1 μm is achievable in the voltage contrast mode with advanced detectors (Plies and Schweizer, 1987).

Time Resolution of Voltage Measurements. The time resolution is determined in principle by the generation time and the dispersion of the time of flight of the secondary electrons, the bandwidth of the detector, and, in the stroboscopic mode, by the exciting beam pulse width. Secondary electron emission occurs within a time short enough to allow voltage measurements up to the terahertz range. Photoelectron emission with light of frequency $f$ occurs within a time $1/f$, according to the uncertainty relation, and is even less limiting. The time of flight of a secondary electron of energy $E$ is
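$$t_f = \frac{s}{eU_e}\left[\sqrt{2m(E + eU_e)} - \sqrt{2mE}\right], \qquad (13)$$

where $e$ and $m$ are charge and mass of electrons; $U_e$ is extraction voltage; $s$ is the distance between specimen and detector, and $E$ is the energy of the secondary electrons. Due to the energy spread $\Delta E$ of the latter, there is a dispersion,

$$\Delta t_f = s\,\sqrt{\frac{m}{2E}}\;\frac{\Delta E}{eU_e}, \qquad (14)$$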
of the time of flight, which limits the maximum voltage frequency $f_{max}$ to a value $f_{max} \approx 1/(2\Delta t_f)$. The time dispersion is reduced by keeping $s$ as small as possible and the extraction voltage $U_e$ as high as possible. With $\Delta E \approx E \approx 10$ eV, $U_e \approx 2$ kV, and $s \approx 2$ mm, maximum frequencies of several gigahertz are attainable.

The bandwidth of the detecting system is of major importance in real-time measurements. It is determined primarily by the maximum operating frequency of the linearizing feedback system used in the quantitative voltage mode (Feuerbaum, 1983), being about 300 kHz. The rise times of the electron detectors (plastic scintillators and photomultiplier tubes or microchannel plates) are on the order of 1 ns and can be neglected here. The limitation due to the small bandwidth of the feedback system is resolved by the stroboscopic or sampling technique. In both modes, the generated secondary or photoelectron signal is averaged over a long time $t$ comprising many cycles of the periodic voltage. The bandwidth $\Delta f \approx 1/t$ may be chosen conveniently small to eliminate noise without degrading the signal. Time resolution is now given by the exciting pulse width. A resolution in the picosecond and subpicosecond regime has been attained at fixed frequencies of several gigahertz (Fujioka and Ura, 1981; Gopinath and Hill, 1973, 1977; Hosokawa et al., 1978), and a resolution of several picoseconds was achieved at variable frequencies up to several hundred megahertz or even gigahertz (Bokor et al., 1986; Feuerbaum and Otto, 1978; Marcus et al., 1986; May et al., 1987; Menzel, 1981; Menzel and Kubalek, 1979; Weiner et al., 1987).

Voltage Resolution. The minimum detectable voltage $U_{min}$ at the specimen is limited by the noise of the secondary current $I_{SE}$. This current is derived by amplification of the primary current $I_p$ with an average yield $\delta$, giving $I_{SE} = T\delta I_p$, where $T$ is the transmission of the spectrometer grids.
Part of the noise is therefore caused by shot noise in the primary current, with a mean power density $\overline{\Delta I_p^2}$, which is amplified, giving

$$T^2\delta^2\,\overline{\Delta I_p^2} = 2e\,\Delta f\,I_p T^2\delta^2, \qquad (15)$$

with $\Delta f$ the bandwidth of the detector. In addition, there are fluctuations of amplification, showing up as shot noise of the secondary current

$$2e I_{SE} T\,\Delta f = 2e\,\delta I_p T^2\,\Delta f. \qquad (16)$$

The total mean square of the noise is then

$$\overline{\Delta I_{SE}^2} = 2e\,\Delta f\,I_p T^2\delta(1 + \delta). \qquad (17)$$
With a retarding field spectrometer detector, the minimum detectable voltage $U_{sp,min}$ is determined as follows. Operating the detector with a retarding voltage $U_G$ at a current level $I_{SE}$ results in a shift of the spectrum by $\Delta U_{sp}$ when the specimen voltage is abruptly changed by $\Delta U_{sp}$ (Fig. 30). This shift is accompanied by a change $\Delta I_{SE}$ of the emitted current. In order that this change of current is detected, it should be at least three times the noise amplitude. With
one finally gets
A minimum detectable voltage of $U_{sp,min} \approx 0.1$ mV is expected, according to Eq. (19), under usual testing conditions, and in fact an only slightly worse value of 0.5 mV was observed (Menzel, 1981).
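The detection criterion stated above (a current change of at least three times the noise amplitude) can be combined with the total noise of Eq. (17) in a small sketch. The slope of the collected current with respect to voltage at the working point is a hypothetical input here, and the numerical values in the usage lines are illustrative assumptions rather than the testing conditions of the cited work.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def noise_rms(i_p, delta, transmission, bandwidth):
    """RMS noise current (A) from Eq. (17): 2 e df I_p T^2 delta (1 + delta)."""
    mean_square = (2.0 * E_CHARGE * bandwidth * i_p
                   * transmission**2 * delta * (1.0 + delta))
    return math.sqrt(mean_square)


def min_detectable_voltage(i_p, delta, transmission, bandwidth, slope, factor=3.0):
    """Smallest voltage step whose current change equals `factor` times the noise.

    `slope` is |dI_SE/dU| in A/V at the working point of the integral
    spectrum; it is an assumed input, not a value given in the text.
    """
    return factor * noise_rms(i_p, delta, transmission, bandwidth) / abs(slope)


if __name__ == "__main__":
    # Illustrative numbers only: 1 nA primary current, unit yield,
    # 50 % grid transmission, 1 Hz bandwidth, assumed slope of 0.1 nA/V.
    u_min = min_detectable_voltage(1e-9, 1.0, 0.5, 1.0, slope=1e-10)
    print(f"minimum detectable voltage: {u_min * 1e3:.3f} mV")
```

Because the noise grows with the square root of the bandwidth, narrowing the bandwidth by averaging over many cycles improves the voltage resolution accordingly.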
d. Photoemission Sampling. Recently, time-resolved photoemission was introduced as an alternative to electron beam testing (Blacha et al., 1987; Bokor et al., 1986; Marcus et al., 1986; Weiner et al., 1987). Using focused visible or ultraviolet laser radiation, photoelectrons are ejected from the probed area by multiphoton or single-photon processes, respectively. Like secondary electrons, they carry information on the local electric potential and are processed in an analogous way. A resolution of 17 ps was achieved for potential variations of several tens of millivolts on the submicrometer scale.
FIG. 30. Change of ejected electron current $I_{SE}$ by a shift $\Delta U_{sp}$ of the specimen potential $U_{sp}$.
D. Fast Nonrepetitive Processes and Real-Time Electron Microscopy

Fast nonrecurrent processes can be caused either by triggering the collapse of a highly metastable phase or by rapidly transferring a stable phase far into a metastable region, from which the system relaxes, again by a fast process. In the first case, only an activation energy for nucleation of a stabler phase is to be supplied. Once present, the latter can grow precipitously due to excess energy liberated in the collapse. The second process usually requires more energetic stimulating pulses, as the total transformation energy must be supplied. Both types of fast processes can be initiated, among others, by ion, electron, and laser beams. These stimulants have a great advantage over other possible stimulants in that they can be focused in space and time. Accordingly, the fast processes can be started at a predetermined point of time and in a localized area, and even high converted power densities with the associated unusual states of matter can be handled in the laboratory.

Using an electron microscope as the analyzing tool, it is straightforward to exploit the electron beam itself for initiating the fast process. However, laser beams have several advantages as compared to electron or ion beams. Very short and powerful pulses can be much more easily obtained with lasers. Even small-sized solid state laser oscillators give pulses with duration 5-50 ns and power densities up to $10^{11}$ W/cm², when Q-switched and focused by a single lens (Koechner, 1976). With some additional effort, using mode coupling and pulse slicing, single picosecond pulses with power densities up to $10^{12}$ W/cm² are readily produced. Commercial lasers provide radiation with photon energies ranging from the far infrared, e.g., 0.1 eV for the carbon dioxide laser, up to the ultraviolet at $\approx 5$ eV for excimer lasers. The interaction of laser light with matter can therefore be selected to range from purely thermal to purely electronic.
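The photon energies quoted above follow from $E = hc/\lambda$, as the short sketch below shows. The wavelengths of 10.6 μm for the carbon dioxide laser and 248 nm for a KrF excimer laser are typical values assumed here for illustration; the text itself states only the 532 nm line of the frequency-doubled Nd:YAG laser used later in this section.

```python
PLANCK_EV_S = 4.135667696e-15  # Planck constant in eV s
LIGHT_SPEED = 2.99792458e8     # speed of light in m/s


def photon_energy_ev(wavelength_m):
    """Photon energy in eV for a vacuum wavelength given in meters."""
    return PLANCK_EV_S * LIGHT_SPEED / wavelength_m


if __name__ == "__main__":
    # Wavelengths other than 532 nm are assumed typical values.
    for name, wavelength in (("CO2 laser", 10.6e-6),
                             ("frequency-doubled Nd:YAG", 532e-9),
                             ("KrF excimer laser", 248e-9)):
        print(f"{name}: {photon_energy_ev(wavelength):.3f} eV")
```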
Of course, at high pulse intensities the probability for multiphoton absorption sharply rises and electronic transitions are excited besides the thermal vibrations, even with low-energy quanta.¹ In addition, excitation may be extremely selective due to the narrow bandwidth of the laser light. Finally, targets can be treated by laser pulses in ambient atmospheres. There is no limitation to vacuum or low-pressure atmospheres, as with electron or ion beams. This fact greatly facilitates large-scale industrial laser machining.

There are also, however, problems with laser beams. In the first place, the beam energy is deposited very inhomogeneously in the target, according to Beer's exponential law. This results from the fact that light energy is deposited by statistical absorption of photons. Treatment, therefore, is confined to surfaces in most cases. Secondly, special conditions must be met to treat highly reflecting or transparent materials effectively. Nevertheless, the advantages of laser beams prevail and there is a rich body of applications to material processing, comprising etching, ablation, deposition, and transformation (Appleton and Celler, 1982; Bauerle, 1984; Poate and Mayer, 1982).

As the electron microscope is an effective instrument for structure analysis and the laser is a powerful tool for treating matter, it is attractive to combine both techniques for investigations of fast processes. Thus far, two modes of electron microscopy have been adapted for real-time investigations, the transmission and reflection modes, suitable for thin films and surfaces of bulk materials, respectively.

¹ Details on multiphoton absorption rates are given by Tozer (1965), Keldish (1965), and Bebb and Gold (1966).

1. Real-Time Techniques
Two distinctly different techniques are feasible for time-resolved investigations. In the continuous mode, the image signal is followed with a high time resolution over an extended period of time, giving the total history of the process. In the short-exposure mode, an image, or diffraction pattern, is registered with a very short exposure time, storing a selected intermediate state. Both techniques have been realized in transmission and reflection electron microscopes.

2. Real-Time Transmission Electron Microscopy

a. Continuous Mode. Figure 31 shows a commercial transmission electron microscope, as adapted by Bostanjoglo et al. (Bostanjoglo and Endruschat, 1984, 1985; Bostanjoglo et al., 1985, 1986) for investigations of laser-induced phase transformations. For this aim, modifications were introduced at the electron illumination system, the specimen chamber, and the detector. The illuminating electron beam is pulsed using a deflecting capacitor and a chopping blade. The beam is deflected off the specimen in the quiescent state and switched on for only several microseconds. Pulsing is an inevitable precaution to avoid radiation damage of the specimen and the detector.

A pulsed laser beam is used to trigger the fast processes to be studied by electron microscopy. A Q-switched, frequency-doubled Nd:YAG laser is used, producing green pulses of 532 nm wavelength with a duration of 20-30 ns (FWHM). The laser beam is at first expanded and then focused with a laser objective lens and a dielectric mirror onto the thin-film specimen to a spot of about 15 μm in diameter. The mirror substrate, which should be polished to $\lambda/10$, is heavily doped silicon to avoid charging by the electron beam. The mirror has a central bore for transmission of the electron beam and can be tilted about two axes and translated in three directions for adjustment.
FIG. 31. Transmission electron microscope for the continuous-mode, time-resolved investigations of laser-pulse-induced phase transitions in thin films. (Labeled components: electron gun at -HV, condenser lens, blanking system, dielectric mirror, specimen, objective lens, aperture, bright-field image, scintillator, photomultiplier.)
A green HeNe laser is employed for adjusting the light beam, whereby the specimen is viewed in the backscattered light with a telescope.

The experiment runs as follows. A master pulse generator switches the electron beam on. About 100 ns later, when the electron beam has safely centered on the specimen, the Q-modulating Pockels cell of the laser is switched off. A giant laser pulse is then emitted after about 150 ns, initiating the phase transition. During illumination with the electron beam, a bright-field image (or diffraction pattern) is generated by conventional imaging on the detector, which is a combination of a fast plastic scintillator and a photomultiplier tube, placed beneath the final viewing screen. The signal of the photomultiplier is stored by storage oscilloscopes. The rise time of the total detecting system is 3 ns. Two storage oscilloscopes are used in order to display the change of contrast due to the phase transformation at two widely differing time scales.

The triggering of the oscilloscopes deserves some consideration. Oscilloscope 1 covers the whole period of the electron beam illumination of 1-10 μs and is triggered at its beginning. A phase transition will show up as a step in the image intensity. Although it will be detected, its rise time may not be resolved.
As it may occur with some unpredictable delay after the initiating laser pulse, the recording oscilloscope should be triggered by the transition itself for high resolution. Because of the heavy shot noise, the shaping of a trigger pulse from the image signal is somewhat involved. In the simplest case, if only one phase transition is present, the image signal is a double step with superimposed noise (Fig. 32). The first step is due to the switching on of the electron beam, and the second one is caused by the phase transition. The signal is fed to a trigger pulse generator (Bostanjoglo and Horinek, 1983), having a high-impedance input, and to the oscilloscopes via a low-loss 50 Ω transmission cable. The latter compensates the propagation delay of the trigger pulse generator. A differentiating network at the input of the pulse shaper transforms the double-step signal into two single pulses plus differentiated shot noise. This is fed after amplification to an ultrafast ECL comparator, the reference voltage of which is adjusted slightly above the maximum noise pulses. Accordingly, the comparator delivers a well-shaped pulse at any change of the image intensity that exceeds the shot noise. A following type D flip-flop
FIG. 32. Typical example of the continuous-mode, time-resolved TEM. (a) Final structure of laser-pulse-crystallized amorphous Si-Al layer. (b) Change of the bright-field image intensity due to crystallization within the circled area. The intensity levels of the amorphous and crystalline phase are denoted by a and c, respectively. LP is the laser pulse. The noise is due to shot noise in the electron image. (c) The crystallization step of (b) at a magnified time scale. (From Bostanjoglo et al., 1986.)
transforms the double pulse into a single, broad square pulse with duration equal to the interval between the switching on of the electron beam and the phase transition. The trailing edge of this pulse coincides with the onset of the phase transition and is used to trigger oscilloscope 2, which displays the transition on a magnified time scale. If the phase transition consists of more than one step, any one of these can be readily selected to trigger the oscilloscope with a preset down-counter replacing the D flip-flop. Using laser pulses of higher power density, the phase transition is initiated with negligible delay, so that oscilloscope 2 can be triggered directly with a fast photodiode, which monitors the laser pulse shape.

Of course, the setup in Fig. 31, omitting the laser, can be used to study fast phase transitions initiated by the illuminating electron beam itself (Bostanjoglo and Liedtke, 1980; Bostanjoglo and Schlotzhauer, 1981; Bostanjoglo, 1982, 1983; Bostanjoglo et al., 1982; Bostanjoglo and Hoffmann, 1982; Bostanjoglo and Horinek, 1983). A variant of the detecting method described above was introduced by Takaoka et al. (1986). The image intensity is picked up simultaneously at three different points of the final image with three scintillator/photomultipliers. This technique can be applied if a phase front moves across macroscopic distances. The direction of propagation and mean velocity are then determined with a high accuracy.

b. Short-Time Exposure Imaging. Figure 33 shows the commercial transmission microscope of Fig. 31 adapted for short-exposure photography (Bostanjoglo et al., 1987a, 1987b, 1987c). Electron beam illumination and laser pulsing are as before, but now the image detector is a gated image converter, as described in Section IV.B. It consists of a voltage-pulsed microchannel plate and an output scintillator at a constant acceleration potential.
By pulsing the channel plate, a short-exposure, highly amplified image is displayed on the output scintillator, from which it is picked up by a television camera and transferred to a monitor and a memory for image processing. The electron beam pulser, the annealing laser, and the image converter are driven by a master pulse generator with appropriate pulse delays.

The complementary character of information furnished by the two modes of real-time transmission microscopy is demonstrated in Fig. 34. It shows the disruption of a germanium film by an intense laser pulse. According to the oscillogram, the disruption starts quite abruptly after a delay of $\approx 120$ ns after the laser pulse and clears the field of view of 0.5 μm diameter within 20 ns. The short-exposure micrograph, taken 140 ns after the laser pulse, shows that the film disintegrates in a very inhomogeneous way by breaking up into fragments.
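The trigger-shaping chain described for the continuous mode (differentiating network, comparator set just above the largest noise pulse, flip-flop toggled by each comparator pulse) can be mimicked in a short sketch. The sampled signal, its levels, the bounded noise, and the function names are hypothetical and serve only to illustrate the logic; the real circuit works on the analog photomultiplier signal.

```python
import random


def trigger_edges(signal, threshold):
    """Sample indices at which the flip-flop output changes state.

    Each sample-to-sample difference plays the role of the differentiating
    network; the comparison with `threshold` is the comparator; every
    comparator pulse toggles the flip-flop.
    """
    edges = []
    for n in range(1, len(signal)):
        derivative = signal[n] - signal[n - 1]
        if abs(derivative) > threshold:
            edges.append(n)
    return edges


def double_step(n_samples, beam_on, transition, level_a, level_c, noise, rng):
    """Hypothetical image signal: beam switched on, then a phase transition."""
    samples = []
    for n in range(n_samples):
        if n < beam_on:
            level = 0.0
        elif n < transition:
            level = level_a
        else:
            level = level_c
        samples.append(level + rng.uniform(-noise, noise))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    sig = double_step(600, beam_on=100, transition=400,
                      level_a=1.0, level_c=0.6, noise=0.05, rng=rng)
    # Differentiated noise never exceeds 2 * 0.05, so 0.11 lies just above it.
    rising, trailing = trigger_edges(sig, threshold=0.11)
    print("square pulse from sample", rising, "to sample", trailing)
```

The second edge, which corresponds to the trailing edge of the square pulse, marks the onset of the phase transition and would trigger oscilloscope 2. Counting further edges corresponds to the preset down-counter mentioned for multi-step transitions.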
3. Real-Time Reflection Electron Microscopy

Transmission electron microscopy is limited to thin-film specimens with thicknesses below $\approx 0.5$ μm. In order to investigate fast processes on bulk
FIG. 33. Transmission electron microscope for short-exposure time imaging of laser-pulse-induced phase transitions in thin films. (Labeled components: electron gun, condenser lens, blanking system, master pulse generator, exposure pulse generator, Nd:YAG laser (532 nm, 20 ns), HeNe laser, telescope, dielectric mirror, specimen, objective lens, aperture, intermediate lens, projector lens, bright-field image, microchannel plate, scintillator.)
FIG. 34. Time-resolved transmission electron microscopy of perforation of a thin film by a laser pulse. (a) Short-exposure time micrograph of a transient state during perforation, taken 140 ns after the laser pulse. Exposure time was 15 ns. (b) Change of bright-field image intensity within an area of 0.5 μm in diameter due to perforation, observed by the continuous mode (LP: laser pulse). The two complementary results were gained on similar films with coinciding laser pulse energies.
FIG. 35. Time-resolving reflection electron microscope for investigations of laser-pulse-induced modifications of surfaces. (Labeled components: laser-pulsed thermal electron gun, condenser lens, electron blanking unit, specimen (tiltable, translatable), objective lens, aperture, intermediate lens, projector lens, MCP image converter.) (From Bostanjoglo and Heinricht, 1988.)
material, a time-resolving reflection electron microscope was assembled from components of a commercial transmission microscope (Bostanjoglo and Heinricht, 1988). The setup is shown schematically in Fig. 35. The electron optics consist of the conventional electromagnetic lenses, whereas the electron gun was modified for laser pulsing, as described in Section IV.A.4.b. It allows conventional constant and pulsed high-current electron illumination. An electron beam blanking unit is installed to cut out a short illuminating pulse. Laser pulsing of the specimen is as described before. The electron image of the surface, or the high-energy reflected electron diffraction pattern, is picked up by a closed-type channel-plate image converter, operated in the conventional constant-voltage mode. The electron illumination time is determined by the duration of the sliced electron pulse and is $\approx 20$ ns. The two lasers and the beam blanking unit are driven by a master pulse generator.

Figure 36 shows as an example the short-exposure image of a crater, shot into the surface of a silicon crystal by a 25 ns laser pulse. At the laser power density used, 90 MW/cm², the hydrodynamic processes are confined almost entirely to the first hundred nanoseconds after the initiating laser pulse.
FIG. 36. A crater shot by a laser pulse (532 nm, 25 ns FWHM, 170 μJ, 90 MW/cm²) into the (111) surface of a silicon crystal. (a) Transient shape of the crater 40 ns after the laser pulse, as imaged by the time-resolving reflection electron microscope (REM) with an exposure time of 20 ns. (b) Final shape of the crater, imaged as in (a). (c) A crater shot with a similar laser pulse energy (180 μJ), as imaged with REM but with a conventional long exposure time. (d) The crater of (c), conventionally imaged with scanning electron microscopy using secondary electrons. (From Bostanjoglo and Heinricht, 1988.)
In addition to imaging, thermal and secondary electrons emitted by the laser-pulsed specimen can be picked up with an Everhart-Thornley detector (Everhart and Thornley, 1960), an electrically screened scintillator/photomultiplier, so that surface processes can be traced continuously.
V. APPLICATION OF REAL-TIME ELECTRON MICROSCOPY TO FAST LASER-INDUCED PROCESSES

Before discussing applications of real-time electron microscopy to fast processes induced by laser pulses, estimates of the relevant time scales are presented.

A. Time Scale of Fast Laser-Induced Processes
At first, adiabatic deposition of high-density laser power in a solid results in highly excited electrons and holes. Existing free charge carriers absorb the light energy directly by inverse bremsstrahlung, and bound electrons
are excited by multiphoton absorption. At very high power densities above $10^{14}$ W/cm², corresponding to electric field amplitudes of the light on the order of the intra-atomic Coulomb fields ($10^{9}$ V/cm), field ionization and electron tunneling occur in addition. The time of interaction $\Delta t_i$ between laser light of frequency $\omega$ and atomic electrons is estimated with the uncertainty relation to be $\Delta t_i \approx \hbar/\hbar\omega = 1/\omega$, which is below a femtosecond for visible light. Equilibrium among the free charge carriers is theoretically reached within about $10^{-14}$ s (Yoffa, 1980) by distributive collisions, plasmon production, and electron-hole production and recombination in semiconductors. The laser-induced generation rate of electrons and holes and their transient concentration are usually very high. For instance, if a green laser pulse ($\hbar\omega \approx 2$ eV) with energy $E \approx 1$ mJ and a duration of ≈ 10 ns is focused to a spot of area $A$ on silicon (absorption length $d \approx 0.2$ µm, reflectivity $R \approx 0.8$), free charge carriers are produced with a concentration $N = (1 - R)E/(A d \hbar\omega)$. At these high concentrations, recombination proceeds by the Auger process, a three-particle collision process whose probability rises steeply with $N$. In this process an electron-hole pair annihilates and the band gap energy $E_g$ is transferred as kinetic energy to a third free charge carrier at lower concentrations, when the plasmon energy $\hbar\omega_p$ is smaller than the band gap energy $E_g$; here

$$\omega_p^2 = \frac{N e^2}{\varepsilon}\left(\frac{1}{m_e} + \frac{1}{m_h}\right),$$

where $e$, $m_e$, and $m_h$ are the charge and effective masses of electrons and holes, and $\varepsilon$ is the permittivity. Plasmons are also excited by electron-hole recombination. Thermalization of the electronic system with the crystal lattice proceeds by a multiphonon process within the electron/hole-lattice relaxation time $\tau$. Its value can be deduced from the imaginary part of the complex refractive index $\tilde{n} = n - i\kappa$ with
$$n^2 - \kappa^2 \approx n_n^2\left[1 - \left(\frac{\omega_p}{\omega}\right)^2\right],$$
where $\omega$ is the frequency of light and $n_n$ is the refractive index of the nonexcited material. A value of $\tau \approx 1$ ps is deduced for silicon from recent time-resolved transmission measurements with picosecond laser pulses (Baeri et al., 1985; Lompre et al., 1984), in agreement with estimates of $\tau$ based on the Hall mobility $\mu = e\tau/m$. Thus, the initially decoupled electronic system returns to equilibrium with the lattice within a picosecond, and any slower processes are expected to obey classical thermodynamics with the usual lattice temperature.
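As a rough numerical illustration of the carrier-density estimate in this subsection, the following Python sketch evaluates $N = (1 - R)E/(A d \hbar\omega)$ for the pulse parameters quoted for silicon. The spot area used in the example call is purely an assumed value chosen for illustration, and the function name is hypothetical.

```python
# Sketch: photo-generated carrier density N = (1 - R) * E / (A * d * hbar_omega),
# assuming one electron-hole pair per absorbed photon.
# The spot area in the example call is an ASSUMED value, not taken from the text.

EV_TO_J = 1.602176634e-19  # joules per electron volt


def carrier_density_cm3(pulse_energy_j, reflectivity, area_cm2,
                        absorption_length_cm, photon_energy_ev):
    """Return the carrier concentration in cm^-3."""
    absorbed_j = (1.0 - reflectivity) * pulse_energy_j
    photons = absorbed_j / (photon_energy_ev * EV_TO_J)
    return photons / (area_cm2 * absorption_length_cm)


if __name__ == "__main__":
    n = carrier_density_cm3(
        pulse_energy_j=1e-3,          # E ~ 1 mJ
        reflectivity=0.8,             # R ~ 0.8
        area_cm2=1e-2,                # assumed spot area (hypothetical)
        absorption_length_cm=0.2e-4,  # d ~ 0.2 um
        photon_energy_ev=2.0,         # hbar * omega ~ 2 eV
    )
    print(f"N ~ {n:.1e} cm^-3")
```

Because $N$ scales inversely with the spot area, tighter focusing raises the concentration in direct proportion.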
Once the energy has been transferred to the lattice, it is dissipated by thermal diffusion, as latent heat in phase transformations, and by radiation. The time scale $\Delta t$ characterizing these dissipative processes and the accompanying material transport over distances $\Delta L \approx 1$ µm is expected to be significantly longer than that of the picosecond electronic relaxations. Estimates will now be given that use material parameters typical for semiconductors and metals. Dissipation of energy by heat conduction across a distance $\Delta L \approx 1$ µm is characterized by a diffusion time $\Delta t_d = (\Delta L)^2/2D_{th} \approx 0.5$ µs with the thermal diffusivity $D_{th} \approx 0.01$ cm²/s (e.g., for amorphous Si). Phase transitions propagate at a temperature $T$ with a velocity

$$v \approx a f_D \exp\left(-\frac{E_a}{kT}\right) = v_s \exp\left(-\frac{E_a}{kT}\right). \tag{22}$$
Here $a$ is the distance between two neighboring atomic sites, $f_D$ is the Debye frequency, i.e., the most frequent jump frequency of an atom, $\exp(-E_a/kT)$ is the Boltzmann probability factor for a successful crossing of the potential wall with activation energy $E_a$, and $v_s$ is the velocity of sound in the condensed phase. Thus, the minimum propagation times $\Delta t_p$ of phase transitions on the micrometer scale are $\Delta t_p = \Delta L/v \geq \Delta L/v_s \approx 1$ ns, with $v \leq v_s \approx 1000$ m/s. Eq. (22) also holds for removal of condensed material by vaporization, with $E_a$ roughly coinciding with the sublimation heat per atom. So, the maximum velocity of ablation by boiling is again the velocity of sound. The time $\Delta t_r$ of cool-down due to heat radiation is estimated from the Stefan-Boltzmann law

$$\rho c d\,\frac{dT}{dt} \approx -\alpha\sigma T^4, \tag{23}$$

giving

$$\Delta t_r \approx \frac{\rho c d}{\alpha\sigma T^3}, \tag{24}$$

where $\alpha$ is the emissivity, $\sigma$ is the Stefan-Boltzmann constant, and $\rho$, $c$, and $d$ are the density, heat capacity, and absorption length of the target for visible light. Inserting the values $\alpha \approx 1$, $c\rho \approx 5$ J/(K·cm³), and $d \approx 20$ nm typical for metals, one gets $\Delta t_r \geq 10$ µs for temperatures $T$ up to the highest possible boiling point (≤ 6000 K), so that radiation can usually be neglected. Transport of material, such as flow of a liquid layer driven by surface forces, propagates with the Rayleigh velocity, which is of the order of

$$v_r \sim \left(\frac{\gamma}{\rho d}\right)^{1/2}, \tag{25}$$
where $\rho$, $\gamma$, and $d$ are the density, surface energy, and thickness of the crumbling liquid layer. A sensible value for the latter is the absorption length of light. Using mean values for liquid metals, $\gamma \approx 0.1$ N/m, $\rho \approx 5$ g/cm³, and $d \approx 20$ nm, the time scale for hydromechanical fluctuations across distances exceeding $\Delta L \approx 1$ µm is $\Delta t_f \approx \Delta L/v_r \approx 20$ ns.

In summary, roughly two time regimes can be associated with high-power laser-induced processes. The faster regime comprises excitation and thermalization of the electronic system, which proceed within $10^{-14}$ to $10^{-12}$ s. The slower one concerns dissipation of energy within the lattice and structural relaxations, which are characterized by a nanosecond time scale, at least for processes occurring on the micrometer level. These slower lattice processes are presently within the realm of real-time electron microscopy. Laser-initiated explosive crystallization of amorphous films is a particularly fast process that has been investigated in some detail by electron microscopy. The results will now be discussed as an example of the application of real-time electron microscopy.

B. Explosive Crystallization of Amorphous Films
1. Experimental Results

Amorphous films of a variety of elements [Sb (Bostanjoglo and Schlotzhauer, 1981; Bostanjoglo et al., 1982; Gotzeberger, 1955), Ge (Bostanjoglo, 1982; Bostanjoglo and Endruschat, 1984, 1985; Endruschat, 1986; Mineo et al., 1973; Takamori et al., 1973), Si (Andra et al., 1982; Bostanjoglo, 1982; Geiler et al., 1982, 1986; Gotz, 1986; Koester, 1978; Wagner et al., 1985)], alloys [e.g., Fe$_x$Ni$_{1-x}$ with $x \geq 0.6$ (Bostanjoglo and Liedtke, 1980)], and compounds [SiO$_2$ (Aleksandrov, 1984)] prepared in a particular metastable state exhibit self-sustained crystallization. Once crystallization is locally started by a δ-like mechanical shock, light pulse, or electron beam pulse, it spreads explosively across large areas, yielding a centrosymmetric crystal texture. Figure 37 shows a typical threefold structure generated by laser-pulse-induced explosive crystallization of an amorphous germanium film. Explosive amorphous films of other materials, e.g., Si and Sb, give similar structures. They usually consist of a fine-grained central area (region 1), which approximately coincides with the laser spot. Then comes a region 2 with large radial crystals. Finally, there is a third region (region 3), consisting of several concentric bands of tilted crystals. Only region 2 is present in all explosive crystallization events. If the triggering pulse has a sufficiently low energy, region 1 shrinks to a single crystallite and the concentric bands may be missing altogether. The three regions differ not only in their texture; their formation in time is equally dissimilar.
FIG.37. Typical three-fold structure of an explosively crystallized amorphous germanium film. The dark surrounding material has remained in the original amorphous state.
FIG. 38. Explosive crystallization of amorphous germanium within region 1, initiated by a low-energy laser pulse. (a) Final structure. (b) Abrupt change of image intensity within the circle in (a) due to crystallization. (c) Crystallization step of (b) at a larger time scale. (From Endruschat, 1986.)
FIG. 39. Long-term crystallization of amorphous germanium within region 1, initiated by a high-energy laser pulse. (a) Final structure. (b) Gradual change of image intensity within the circle in (a) due to crystallization. Crystallites are smaller and crystallization time is markedly larger than in Fig. 38. (From Endruschat, 1986.)
a. Region 1. Figures 38 and 39 show the crystallization texture and formation dynamics of the fine-grained region 1. This region is characterized by a small crystal size $D_c \leq 0.1$ µm and a delay of crystallization on the order of 10 ns, which decreases with increasing laser pulse energy. At low pulse energies, crystal growth is completed within $\Delta t_c \approx 20$-80 ns, giving crystallization velocities $v_c = D_c/2\Delta t_c \approx 5$-10 m/s. At high pulse energies, crystallization decelerates appreciably during its course, down to 0.5 m/s, and lasts up to 500 ns. At even higher laser pulse energies, a hole is opened at the center by melting during laser irradiation. The hole continues to grow, even after the laser pulse, for several hundred nanoseconds by a propagating molten rim (Fig. 40) with a velocity of 50-60 m/s in films with a thickness of $d \approx 100$ nm. Identifying this measured value with the Rayleigh velocity in Eq. (25) gives a surface tension $\gamma \approx 0.5$ N/m, which is approximately the value for liquid germanium near the melting point $T_{mc} = 1210$ K of the crystal. Thus, as long as region 1 is not completely destroyed, its temperature is in the vicinity of $T_{mc}$.
b. Region 2. This is the region that is always present after explosive crystallization. It is characterized by radial growth of large crystals with sizes in the micrometer range. Growth starts here with a considerably larger delay than in the central region, 100-200 ns after the laser pulse; but once nucleated, the crystals grow very fast. Figures 41 and 42 show the evolution dynamics. Crystallization proceeds by the advance of a diffuse phase boundary for approximately 400-500 ns. The velocity of the radial crystal growth is $v_c = 12$-15 m/s in all cases and, in contrast to region 1, it was not observed to depend on the laser pulse energy.
FIG. 40. Formation of a hole in region 1 by melting and capillarity effects. (a) Intermediate state 500 ns after the laser pulse. Exposure time was 15 ns. (b) Motion of the molten rim across the viewing field of the photomultiplier (0.5 µm in diameter). The intensity dip prior to the rise marks the entrance of the molten rim into the viewing field. (From Endruschat, 1986, and R. P. Tornow, PhD thesis.)
FIG. 41. Explosive crystallization of amorphous germanium in region 2, initiated by a laser pulse. (a) Final shape of a radially grown crystal. (b) Change of image intensity within the circle in (a) due to crystallization. (c) Crystallization step of (b) at an increased time scale. (From Endruschat, 1986.)
FIG. 42. Intermediate states (a1 to a5) of explosive crystal growth in regions 2 and 3 of amorphous germanium, initiated by a laser pulse. The points of time after the laser pulse are indicated in the upper right corners. Exposure time was 15 ns. Below (b1 to b5), the final states are shown. The images were taken from neighboring areas of the same film at the indicated times after a laser pulse of equal energy. Figures a1 to a4 show the radial crystal growth in region 2. Figure a5 shows a transient state of the delayed helical growth in region 3. The pronounced tilting of the radial crystals in region 2 of Fig. a5, generating radial grain boundaries, occurs at least several microseconds after crystallization. (From R. P. Tornow, PhD thesis.)
Though the phase transformation starts significantly earlier in region 1, it may still be in full swing there when it has entirely come to an end in region 2. This is substantiated by the delayed opening of a hole in region 1 (Fig. 42), which means that this hottest region consists of a liquid/solid mixture at a temperature near $T_{mc}$, gradually solidifying during 0.5-1 µs, well after the surrounding region 2 has transformed into a stable crystal structure.

c. Region 3. This region does not necessarily emerge in an explosive crystallization. If it is present, it consists of one or more concentric bands of rotated crystals, which are well separated from the radial crystals by a fine-grained layer (Bostanjoglo, 1982). The growth of the bands does not follow that of the radial crystals right away. Actually, there is a substantial break of about 1 µs between the end of radial growth and the appearance of the first rotated crystals. Crystallization in this region lasts for about 5 µs.
2. Model Based on Nucleation and Growth

These electron microscopical results support the following model of explosive crystallization. It is based on the assumption that the amorphous film has a roughly defined melting temperature $T_{ma}$, which is below the melting temperature $T_{mc}$ of the stable crystal phase (Aleksandrov, 1983, 1984; Baeri and Campisano, 1982; Spaepen and Turnbull, 1982). Actually, amorphous films with an atomic packing different from that of the liquid, such as Ge or Si, are expected to have a melting point. Heating in this case leads to a first-order phase transition with a latent heat of fusion instead of the gradual softening observed with glasses. If an amorphous film is now adiabatically heated with a δ-shaped energy pulse, within a limited area of several micrometers in diameter, to an intermediate temperature between $T_{ma}$ and $T_{mc}$, the amorphous film is superheated locally. This is the starting point for explosive crystallization, which, being very fast, is certainly not a solid phase transition, but is believed to proceed via a transient liquid according to the scheme:

superheated amorphous solid → (melting) → supercooled liquid → (epitaxial solidification) → crystal.
These processes may be described by the usual kinetic theory of nucleation and growth (Bostanjoglo and Endruschat, 1985; Endruschat, 1986; Feder et al., 1966; Geiler et al., 1982, 1986; Gotz, 1986; Porter and Easterling, 1981). Before a phase transition from a nonordered to an ordered structure can actually set in on a macroscopic scale, critical nuclei of the ordered phase must appear, i.e., nuclei of the new phase that are large enough to grow spontaneously. According to the above scheme, there are three processes
involved in producing a critical crystal nucleus:
1. Appearance, after a waiting time $t_{1l}$, of the first critical liquid nucleus in a volume $V$ later filled by a crystal.
2. Total melting of the volume $V$ by growth of the liquid nucleus, consuming a time $t_m$.
3. Generation of the first critical crystal nucleus in the supercooled liquid volume $V$, consuming a time $t_{1c}$.
The appearance times $t_{1l}$, $t_{1c}$ are determined by

$$\int_0^{t_{1l},\,t_{1c}} V \dot{N}\,dt = 1, \tag{26}$$
with $\dot{N}$ the appropriate nonstationary nucleation rate, given by Feder et al. (1966) as a relaxation-type expression:

$$\dot{N} = \dot{N}_s\left[1 - \exp\left(-\frac{t}{\tau}\right)\right]. \tag{27}$$
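To make Eqs. (26) and (27) concrete, the sketch below solves $\int_0^t V\dot{N}\,dt' = 1$ for the waiting time $t$, using the closed-form integral $V\dot{N}_s[t - \tau(1 - e^{-t/\tau})]$. The numerical values in the usage lines are arbitrary illustrative assumptions, the function names are hypothetical, and bisection is just one convenient way to find the root.

```python
import math


def expected_nuclei(t, rate_s, volume, tau):
    """Integral of V * N_s * (1 - exp(-t'/tau)) over t' from 0 to t."""
    if tau <= 0.0:
        return volume * rate_s * t  # purely stationary nucleation
    # expm1(x) = exp(x) - 1, so t + tau*expm1(-t/tau) = t - tau*(1 - exp(-t/tau))
    return volume * rate_s * (t + tau * math.expm1(-t / tau))


def waiting_time(rate_s, volume, tau, rel_tol=1e-12):
    """Time at which the expected number of critical nuclei in V reaches one."""
    t_stationary = 1.0 / (rate_s * volume)
    # The root lies between 0 and t_stationary + tau, because the integral
    # is never smaller than V * N_s * (t - tau).
    lo, hi = 0.0, t_stationary + max(tau, 0.0)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_nuclei(mid, rate_s, volume, tau) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    volume = 1e-21   # m^3, assumed example volume
    rate_s = 1e29    # m^-3 s^-1, assumed stationary nucleation rate
    for tau in (0.0, 1e-9):
        t = waiting_time(rate_s, volume, tau)
        print(f"tau = {tau:.0e} s -> waiting time ~ {t * 1e9:.2f} ns")
```

When $\tau$ is much shorter than $1/(V\dot{N}_s)$, the waiting time is simply lengthened by about $\tau$, which is why a short relaxation time makes nucleation effectively stationary.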
The asymptotic value $\dot{N}_s$ is the stationary nucleation rate (Porter and Easterling, 1981) at the temperature $T$,

$$\dot{N}_s = f_a n \exp\left[-\frac{16\pi\gamma_{12}^3 T_m^2}{3kT H_m^2 (T - T_m)^2}\right], \tag{28}$$

with $f_a$ the adsorption frequency of atoms at the nucleus and $n$ the density of nucleation sites, which coincides with the atomic density $n_A$ if nucleation is homogeneous. The exponential is the thermodynamic probability of generation of a critical spherical nucleus, with $\gamma_{12}$ the interface energy between phases 1 and 2, $T_m$ the melting temperature, $H_m$ the enthalpy of melting per volume, $(T - T_m)$ the supercooling or superheating, and $k$ Boltzmann's constant. The adsorption frequency $f_a$ is usually written as

$$f_a = w f_D \exp\left(-\frac{E_D}{kT}\right) \propto \frac{w D_d}{a^2}, \tag{29}$$
where $f_D$ is the mean atomic jumping frequency (roughly the Debye frequency), $E_D$ is the activation energy for self-diffusion in the parent phase, $w$ (≈ 0.5) is the probability that the atom swings in a direction suitable for a transition, $D_d$ is the coefficient of self-diffusion, and $a$ is the lattice constant. In the case of a liquid parent phase, the diffusion constant $D_d$ can be replaced by the viscosity coefficient $\eta$ according to Einstein's relation $D = kT/6\pi\eta a$. The relaxation time $\tau$ of the nonstationary nucleation rate in Eq. (27) is
given for spherical nuclei by Feder et al. (1966). It turns out to fall below 1 ns for germanium films (Endruschat, 1986) at the relevant temperatures, so that nucleation is approximately stationary on the investigated time scale. The time spent to melt a volume $V$ of superheated amorphous material by a liquid nucleus growing with velocity $v_m$ is $t_m = \sqrt[3]{V}/2v_m$. As the boundary between the two disordered phases is diffuse, the propagation velocity of the melting front is given by classical kinetic theory (Porter and Easterling, 1981) as

$$v_m = a f_D\left\{\exp\left(-\frac{E_D}{kT}\right) - \exp\left[-\frac{E_D + \Delta G}{kT}\right]\right\}, \qquad \Delta G = \frac{H_{ma}(T - T_{ma})}{n_A T_{ma}}, \tag{31}$$
where $H_{ma}$ is the melting enthalpy per volume, $a = 1/\sqrt[3]{n_A}$ is the mean atomic distance, and $(T - T_{ma})$ is the superheating of the amorphous phase. The two exponentials, $\exp(-E_D/kT)$ and $\exp[-(E_D + \Delta G)/kT]$, express the jump probabilities of an atom across the potential wall between two states of different stability, which is characterized by the difference in free enthalpy per atom, $\Delta G = H_{ma}(T - T_{ma})/n_A T_{ma}$.

Once a supercritical crystal nucleus is formed in the supercooled liquid, it will continue to grow by one of the following mechanisms: migration of a diffuse phase boundary, dendritic crystallization, transverse growth with nucleation of ledges or dislocations on an atomically smooth boundary, or twin-assisted atomic attachment. The boundary of a crystal, during growth or in the final stage, was not observed to be either dendritic or regularly shaped. Therefore, crystal growth is believed to proceed mainly by atomic attachment to a diffuse phase boundary. The growth velocity is then given by an expression analogous to Eq. (31):

$$v_c \propto \frac{kT}{\eta a^2}\left\{1 - \exp\left[-\frac{H_{mc}(T_{mc} - T)}{n_A T_{mc}\,kT}\right]\right\}, \tag{32}$$
where $H_{mc}$ is the melting enthalpy per volume of the crystal, $\eta$ is the viscosity of the supercooled liquid, and $(T_{mc} - T)$ is the supercooling. The propagation velocity $v_c$ has a maximum at a certain temperature $T_{max}$ (Fig. 43). The reason is that at lower temperatures diffusion slows down, whereas at higher temperatures the supercooling and, accordingly, the instability of the liquid are reduced. Moreover, propagation at temperatures $T < T_{max}$ is unstable. If the
FIG. 43. Velocity $v_c$ of crystallization by explosive liquid-phase epitaxy in amorphous germanium and propagation velocity $v_m$ of melting of the amorphous phase superheated by a laser pulse, plotted against temperature (K). $T_{ma}$ and $T_{mc}$ are the melting temperatures of amorphous and crystalline germanium, respectively. (From Endruschat, 1986.)
velocity is somewhat increased by an instability, the production rate of heat by crystallization, $v_c H_{mc}$, and the local temperature are also increased. This, in turn, causes a further increase of velocity because $dv_c/dT > 0$. Correspondingly, an incremental transient depression of velocity leads to a steady decrease of the latter. On the other hand, operating temperatures above $T_{max}$ yield a stable velocity because of the “negative feedback” $dv_c/dT < 0$.
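The stability argument can be illustrated with a simple model curve. The sketch below assumes a generic form, $v(T) \propto \exp(-E_D/kT)\{1 - \exp[-\Delta G/kT]\}$ with $\Delta G = H_{mc}(T_{mc} - T)/(n_A T_{mc})$ and a constant prefactor. It does not reproduce the viscosity-based prefactor of the actual model, and the exponents chosen for $E_D$ and $n_A$ are assumptions, so only the qualitative shape should be read from it: a maximum between $T_{ma}$ and $T_{mc}$, with a positive slope below it and a negative slope above it.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T_MA = 850.0         # melting temperature of amorphous Ge, K
T_MC = 1210.0        # melting temperature of crystalline Ge, K
H_MC = 2.67e9        # melting enthalpy of the crystal, J/m^3
N_A = 4.4e28         # atomic density, m^-3 (exponent ASSUMED)
E_D = 0.87e-19       # activation energy, J (exponent ASSUMED)


def growth_velocity(temp_k, prefactor=1.0):
    """Model velocity in arbitrary units: mobility factor times driving force."""
    delta_g = H_MC * (T_MC - temp_k) / (N_A * T_MC)  # free-enthalpy gain per atom
    mobility = math.exp(-E_D / (K_B * temp_k))
    driving = 1.0 - math.exp(-delta_g / (K_B * temp_k))
    return prefactor * mobility * driving


def slope(temp_k, dt=0.5):
    """Central-difference estimate of dv/dT."""
    return (growth_velocity(temp_k + dt) - growth_velocity(temp_k - dt)) / (2 * dt)


def temperature_of_maximum(step=0.5):
    """Scan from T_MA to T_MC and return the temperature of largest velocity."""
    count = int((T_MC - T_MA) / step)
    temps = [T_MA + i * step for i in range(count + 1)]
    return max(temps, key=growth_velocity)


if __name__ == "__main__":
    t_max = temperature_of_maximum()
    print(f"model T_max ~ {t_max:.0f} K")
    print("dv/dT > 0 below T_max (positive feedback):", slope(t_max - 50.0) > 0)
    print("dv/dT < 0 above T_max (negative feedback):", slope(t_max + 50.0) < 0)
```

The sign of the slope is what decides stability here: a small temperature rise speeds up the front on the low-temperature side and slows it down on the high-temperature side.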
3. Comparison of the Model with Experiments on Germanium Films

Applying the above theory to germanium, the following material parameters were used, as given by Aleksandrov (1983) (a, l, and c standing for amorphous, liquid, and crystalline): $\gamma_{al} = 0.04$ J/m², $\gamma_{lc} = 0.18$ J/m², $H_{ma} = 1.8 \times 10^{9}$ J/m³, $H_{mc} = 2.67 \times 10^{9}$ J/m³, $T_{mc} = 1210$ K, $E_D = 0.87 \times 10^{-19}$ J, $f_D = 10^{13}$ Hz, and $n = n_A = 4.4 \times 10^{28}$ m$^{-3}$ (i.e., homogeneous nucleation is assumed), together with $T_{ma} = 850$ K, as adopted by Baeri and Campisano (1982). The viscosity $\eta(T)$ of supercooled liquid germanium was obtained by extrapolating measured values to temperatures below $T_{mc}$ (Endruschat, 1986).

The hottest region, i.e., region 1, is characterized by a dense nucleation of crystals, with a volume $V = 0.1$ (µm)³ per crystal, which is to be used
FIG. 44. Appearance times $t_{1l}$ and $t_{1c}$ of the first critical liquid and crystalline nucleus in the superheated amorphous and supercooled liquid material, respectively, plotted against temperature (K). (From Endruschat, 1986.)
in Eq. (26) (film thickness $d \approx 0.1$ µm and crystal diameter $D_c \approx 0.1$ µm). Evidently, this region is transformed by the laser pulse into a slush consisting of liquid nuclei at mean distances $D_c$ in an amorphous matrix. Total melting of region 1 can be excluded, since this area would then not crystallize, owing to the additional heat-up by the liberated latent heat $H_{mc}$ by as much as $H_{mc}/\rho_l c_l \approx 1500$ K ($\rho_l$ and $c_l$ are the density and heat capacity of the liquid). Figure 44 shows the appearance times $t_{1l}$, $t_{1c}$ of the first critical nuclei in the volumes $V$. Their sum and, in particular, the melting time $t_m$ of the volume $V$ may well account for the observed delay of crystallization. When the first critical nuclei have appeared, crystallization sets in by solidification of the supercooled liquid and by melting a part of the surrounding amorphous material. At lower laser pulse energies, where region 1 is heated only slightly above $T_{ma}$, the observed velocity of crystallization is about 5 m/s, which in fact agrees with the computed value at $T_{ma}$, as shown in Fig. 43. The decelerating effect of high laser pulse energies can also be understood from the same figure. The film is heated to temperatures near the upper melting point $T_{mc}$, where the supercooling of the melt and, accordingly, the crystal growth velocity $v_c$ drop to zero.

The fast radial crystallization in region 2 may be understood as a continued explosive liquid-phase epitaxial process. This region is outside the main part of the Gaussian laser pulse and is not heated by the latter up to $T_{ma}$. So it is still amorphous after the laser pulse. As region 1 crystallizes, heat diffuses outward, heats the adjacent amorphous material up to $T_{ma}$, and melts it locally. An additional heat source is solid-state crystallization at the inner rim of region 2. As the temperature of the melt is not much above $T_{ma}$ initially,
crystallization starts with a velocity of $v_c \approx 5$ m/s. The crystallizing material releases latent heat of fusion at a rate of $H_{mc} v_c$, which is used to heat adjacent amorphous material to a temperature $T \geq T_{ma}$ and transform it to a supercooled liquid, whereby heat is consumed at a rate of $[H_{ma} + \rho_a c_a (T_{ma} - T_0) + \rho_l c_l (T - T_{ma})]\,v_m$, with $T_0$ the local starting temperature, and $c$ and $\rho$ the heat capacity and density. Because of the positive feedback $dv_c/dT > 0$ at $T_{ma} < T < T_{max}$, the crystal growth velocity $v_c$ rapidly settles at the first stable value, which actually is the maximum value $v_{c,max} \approx 12$ m/s. This stationary state is attained quite independently of what has been going on in region 1. This state and the high value of the growth velocity are exactly the observed features of region 2.

The fast autocatalytic liquid-phase epitaxial crystal growth stops when the crystallization front runs into an area with a starting temperature $T_0$ so low that the liberated heat of fusion $H_{mc}$ does not suffice to melt the amorphous material.

As heat continues to diffuse out from regions 1 and 2, the amorphous material in region 3 is slowly heated, whereby solid-state crystallization can be activated, supplying additional heat. If melting of the amorphous film is achieved, explosive liquid-phase epitaxial crystal growth can set in again. But now it cannot spread across larger areas. It stops periodically and starts again each time the slow heat diffusion catches up, giving the periodic crystalline band structure observed in region 3. The delay of crystal growth in this region is then determined by heat diffusion across region 2, being on the order of

$$\Delta t_d \approx \frac{R^2}{2D_{th}} \approx 1\ \mu\text{s}, \tag{33}$$
with the width of region 2 being $R \approx 5$ µm and the thermal diffusivity of crystalline germanium $D_{th} \approx 0.12$ cm²/s. This theoretical value is in rough agreement with the delay observed between the end of radial crystal growth in region 2 and the beginning of crystallization in region 3.
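The diffusion estimate can be checked numerically. The sketch below assumes the same form as the diffusion time used in Section V.A, $\Delta t_d = L^2/2D_{th}$, and applies it to the width of region 2 and, for comparison, to the 1 µm case discussed earlier. The function name is hypothetical.

```python
def diffusion_time_s(distance_cm, diffusivity_cm2_s):
    """Characteristic heat-diffusion time t = L^2 / (2 D), in seconds."""
    return distance_cm ** 2 / (2.0 * diffusivity_cm2_s)


if __name__ == "__main__":
    # Region 2: width R ~ 5 um, crystalline Ge with D_th ~ 0.12 cm^2/s
    delay = diffusion_time_s(5e-4, 0.12)
    print(f"delay before region 3 starts ~ {delay * 1e6:.1f} us")

    # Section V.A: 1 um in amorphous Si with D_th ~ 0.01 cm^2/s
    short = diffusion_time_s(1e-4, 0.01)
    print(f"diffusion time across 1 um ~ {short * 1e6:.1f} us")
```

Because the time grows with the square of the distance, the 5 µm width of region 2 alone is enough to push the delay to the microsecond range.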
VI. SPACE-TIME RESOLUTION OF REAL-TIME MICROSCOPY

In real-time electron microscopy, the image signal is collected in a single pulse. In order to reduce shot noise, a high-current electron pulse must be used for illumination. This, in turn, causes adiabatic heating of the specimen, since the energy deposited by the electrons is not dissipated by heat conduction, as it is in stationary microscopy. The ultimate resolution is therefore determined by a compromise between shot noise and radiation damage. As reflection microscopy suffers more from chromatic aberration than transmission microscopy, a higher resolution may be reached with the latter. It will
be discussed in the limiting case of a weakly scattering, thin-film specimen. The bright-field image current of a selected homogeneous specimen area of diameter $D$ is then

$$J = \frac{\pi}{4} D^2 j_0 \exp\left(-n_A d \int_\alpha^\pi \sigma\, 2\pi \sin\theta\, d\theta\right), \tag{34}$$
where $j_0$ is the impinging electron current density, $d$ is the film thickness, $n_A$ is the number of atoms per volume, $\sigma$ is the effective atomic differential elastic scattering cross section (taking into account Bragg scattering), and $\alpha$ is the half-aperture angle. A phase transition changes the scattering cross section and consequently causes a change, $\Delta J$, of the image current. The image signal, as picked up by a detector with gain $G$, is $J_s = GJ$. Superimposed on it is noise with an average amplitude $\sqrt{\overline{\Delta J_s^2}}$. A phase transition is resolved when the phase-induced change $\Delta J_s$ of the image signal is roughly three times larger than the noise, i.e., $\Delta J_s \geq 3\sqrt{\overline{\Delta J_s^2}}$. Now the current noise amplitude is composed of fluctuations of the gain and of shot noise of the image signal, having the rms values $J\sqrt{\overline{\Delta G^2}}$ and $G\sqrt{2eJ\Delta f}$, respectively. Here $\Delta f$ is the detector bandwidth and $e$ is the electron charge. The rms of the total noise amplitude is then

$$\sqrt{\overline{\Delta J_s^2}} = \sqrt{\overline{\Delta G^2}\,J^2 + 2eJ\Delta f\,G^2} \approx G\sqrt{2eJ\Delta f}. \tag{35}$$
Since the detectors used are based on high-gain secondary electron emission, so that $\overline{\Delta G^2} \approx G \gg 1$, and since, furthermore, shot noise and image current are of the same order of magnitude near the resolution limit, Eq. (35) simplifies as indicated on the right-hand side. Therefore, the resolution limit is determined by

$$\Delta J \geq 3\sqrt{2eJ\Delta f}. \tag{36}$$
Inserting the minimum detectable rise/fall time $\Delta t \approx 1/\Delta f$ of a transition due to the finite bandwidth $\Delta f$, one gets the condition for mutual space-time resolution:

$$D^2 \Delta t \geq \frac{72e}{\pi(\Delta J/J)^2 j_0}, \tag{37}$$
which is valid in the limit of weak scattering ($n_A d\sigma \to 0$). Assuming a change of contrast $\Delta J/J = 0.5$ and a current density at the object $j_0 \approx 10$ A/cm², which is achieved with a thermal tungsten electron gun, Eq. (37) states that phase transitions with durations $\Delta t \geq 3$ ns can be resolved in specimen areas down to $D \approx 0.1$ µm in diameter. The experimentally reached resolution is actually very near to this theoretical prediction (see Fig. 32b).
Space and time resolution can be mutually increased only by increasing the electron current density, $j_0$, impinging on the object. But here radiation damage sets an absolute limit. As heat is dissipated by conduction on the microsecond time scale in thin films, nanosecond electron pulses deposit energy adiabatically. An illuminating electron pulse of duration $t_p \geq \Delta t$ then heats the specimen of thickness $d$ by $\Delta T$ according to

$$c\rho d\,\Delta T = \frac{1}{e}\, j_0 t_p \Delta E \geq \frac{1}{e}\, j_0 \Delta t\, \Delta E, \tag{38}$$
where $\Delta E$ is the mean energy loss per beam electron. The simple Bethe stopping power formula (Reimer, 1984) gives a satisfactory approximation for the mean energy loss

$$\Delta E = K\rho d, \tag{39}$$
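where $K$ is approximately a constant for atoms with mean atomic number, $K \approx 5 \times 10^{-13}$ J cm²/g. The maximum tolerable current density $j_{max}$ is reached when melting sets in, i.e., $\Delta T \approx T_m$:

$$j_{max} = \frac{e c T_m}{K \Delta t} = \frac{3 e k T_m}{K m_A \Delta t}, \tag{40}$$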
where the high-temperature limit $c = 3k/m_A$ of the specific heat of solids was used ($k$ is the Boltzmann constant; $m_A$ is the atomic mass). Combining Eqs. (37) and (40) gives the achievable absolute spatial resolution limit $D_{min}$ in the case of adiabatic electron pulse illumination.
Inserting mean values for the mass and melting point of heavy atoms ($m_A \geq 100$ atomic mass units, $T_m \geq 2000$ K) and an optimistic value for the change of contrast, $\Delta J/J \approx 0.5$, the spatial resolution limit is estimated to be $D_{min} = 4$ nm. The absolute limit of time resolution $\Delta t_{min}$, in the case of the continuous mode of time-resolved microscopy, is given by the geometric sum $\Delta t_{min} = \sqrt{t_{det}^2 + t_{osc}^2}$ of the rise times of the electron detector, $t_{det}$, and of the oscilloscope, $t_{osc}$ (≥ 0.35 ns for 1 GHz bandwidth). Scintillator/photomultiplier tubes and microchannel plates are suitable high-gain fast electron detectors. The latter presently have the shortest rise times, $t_{det} \geq 0.2$ ns, whereas the former have better dynamic and long-period properties and are useful for high-resolution long-period tracing. The rise time of the scintillator/photomultiplier tubes is limited by the response of the scintillator, being at best $t_{sc} = 0.5$ ns for fast plastic material, and by the rise time $t_{PM}$ of the multiplier tube. This can be decreased to $t_{PM} = 0.7$ ns by reducing the number of dynodes
and by careful wiring. Taking these steps, the minimum rise time of the scintillator/photomultiplier detector is $t_{det} \approx 0.7$ ns. Consequently, using the fastest available electron detectors with high gain, the ultimate time resolution is determined mainly by the oscilloscope and is $\Delta t_{min} \approx 0.4$ ns.
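The geometric sum of rise times is easy to verify. The sketch below combines the microchannel-plate rise time with that of a 1 GHz oscilloscope. The helper names are hypothetical, and the rule of thumb $t_{osc} \approx 0.35/\text{bandwidth}$ is assumed for the oscilloscope.

```python
import math


def combined_rise_time_ns(*rise_times_ns):
    """Root-sum-square combination of independent rise times (in ns)."""
    return math.sqrt(sum(t * t for t in rise_times_ns))


def scope_rise_time_ns(bandwidth_ghz):
    """Rule of thumb t_r ~ 0.35 / bandwidth (an assumption of this sketch)."""
    return 0.35 / bandwidth_ghz


if __name__ == "__main__":
    t_osc = scope_rise_time_ns(1.0)  # 1 GHz oscilloscope
    t_mcp = 0.2                      # microchannel-plate rise time, ns
    dt_min = combined_rise_time_ns(t_mcp, t_osc)
    print(f"t_osc ~ {t_osc:.2f} ns, dt_min ~ {dt_min:.2f} ns")
```

Since the terms add in quadrature, the larger one dominates, which is why the oscilloscope sets the limit once a microchannel plate is used as the detector.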
VII. SUMMARY

Sampling and one-shot techniques of time-resolved electron microscopy have been described and specific applications have been given for illustration. Periodic processes are suitably investigated by stroboscopic microscopes operated in the conventional or scanning mode. Switching of magnetic films and of discrete and highly integrated semiconductor devices can be imaged at clock frequencies up to several gigahertz, and periodic signals with transition times of several picoseconds can be resolved. Two complementary real-time electron optical techniques have been presented, which were developed by the author’s research group to cope with very fast non-repetitive processes. Either the intensity in the electron image is continuously traced with a photomultiplier and storage oscilloscopes, or short-exposure-time images are taken with a gated image converter or a pulsed electron gun and deposited in a memory. Fast phase transitions induced by energetic laser pulses, such as explosive crystallization, forced melting, and vaporization, can be traced in thin films and on the surfaces of bulk material by transmission or reflection electron microscopy. The resolution is limited by shot noise of the electron beam. Presently, phase transitions can be traced with a time resolution of a few nanoseconds in specimen areas down to a hundred nanometers.

ACKNOWLEDGMENT

The work of the author’s research group concerning real-time TEM and REM was partially supported by the Deutsche Forschungsgemeinschaft.

REFERENCES

Akhmanov, S. A., Bagratashvili, V. N., Golubkov, V. V., Zgurskii, A. V., Ishchenko, A. A., Krikunov, S. A., Spiridonov, V. P., and Tunkin, V. G. (1985). Sov. Tech. Phys. Lett. 11, 63.
Aleksandrov, L. N. (1983). Phys. Stat. Sol. A76, 179.
Aleksandrov, L. N. (1984). In “Progr. Cryst. Growth and Charact.” (B. R. Pamplin and D. Elwell, eds.), Vol. 9, p. 227. Pergamon Press, Oxford, New York.
Andra, G.,Geiler, H. D.. Gotz, G.,Heinig, K. H., and Woittennek, H. (1982). Phys. Stat. Sol. A74, 51 1. Appleton, B. R., and Celler, G. K. ed. (1982).“Laser and Electron Beam Interactions with Solids.” Mat. Res. Soc. Symp. Proc. Vol. 4, North-Holland, New York. Auston, D. H., Surko, C. M., Venkatesan. T. N. C., Slusher, R. E., and Golovchenko, J. A. (1978). A p p l . Phys. Lett. 33, 437. Baeri, P., and Campisano, S. U. (1982).In “Laser Annealing of Semiconductors.”(J. M. Poate and J. W. Mayer, eds.) p. 93 Academic Press, New York. Baeri, P., Herith, M. A., Russo, G.,Rimini, E., Guilietti, A., and Vaselli, M. (1985). Phys. Stat. Sol. 8130,225. Bauerle, D., (ed.) (1984).“Laser Processing and Diagnostics.” Springer Ser. Chem. Physics Vol. 39, Springer, Berlin. Bebb, H. B.. and Gold, A. (1966).Phys. Rev. 143, 1. Becker, R., (1951).J . Phys. Radium 12,332. Bergner, H., Bruckner, V., Leine, L., and Supianek, M. (1987). Appl. Phys. A43,97. Bethe, H. (1930).Ann. Physik 5, 325. Blacha, A., Clauber R., Seitz, H. K., and Beha, H. (1987). Electronics Letters 23, 249. Bokor, J., Johnson, A. M.. Storz, R. H., and Simpson, W. M. (1986). A p p l . Phys. Lett. 49,226. Bostanjoglo. 0. and Rosin. T. (1980a).J . Magnetism and Magnetic Materials 15-18, 1529. Bostanjoglo, O., and Rosin, T. (1980b). Phys. Stat. Sol. A57, 561. Bostanjoglo, O., and Rosin, T. (1981a). Phys. Stat. Sol. A65, KI 17. Bostanjoglo, 0..and Rosin, T. (I981 b). Phys. Stat. Sol. A66, K5. Bostanjoglo, O., and Liedtke, R. (1980).Phys. Stat. Sol. A60,451. Bostanjoglo, O., and Schlotzhauer, G. (1981).Phys. Stat. Sol. A68, 555. Bostanjoglo, O., Schlotzhauer, G.,and Schade, S. (1982). Optik 61,91. Bostanjoglo, 0.(1982). Phys. Star. Sol. A70.473. Bostanjoglo, O., and Hoffmann G. (1982). Phys. Stcir. Sol. A73, 95. Bostanjoglo, 0.(1983). Phys. Star. Sol. A76, 525. Bostanjoglo. O., and Horinek, W. R. (1983). Optik 65, 361. Bostanjoglo, O., and Endruschat, E. (1984). Phys. Stat. Sol. A82, K I . 
Bostanjoglo, O., and Endruschat, E. (1985). Phys, Stat. Sol. 91. 17. Bostanjoglo, O., Endruschat, E., and Tornow, W. (1985).90,457. Bostanjoglo, O., Endruschat, E., and Tornow, W. (1986). Mat. Res. Soc. Symp. Proc. 71,345. Bostanjoglo, 0..and Heinricht, F. (1986).Scanning 8, 146. Bostanjoglo, O., Tornow, R. P. and Tornow, W. (1987a).1.Phys. E.: Sci. lnstr. 20, 556. Bostanjoglo, O., and Heinricht, F. (1987). J . Phys. E.: Sci. lnstr. 20, 1491. Bostanjoglo, O., and Heinricht, F. (1988).91h European Congress on Electron Microscopy York, England, September, 1988. lnst. Phys. Ser. No. 93, 1.91. Bostanjoglo, 0..Tornow, R. P., and Tornow, W. (1987b). Ultramicroscopy 21, 367. Bostanjoglo, 0..Tornow, R. P., and Tornow, W. (1987~). Scanning Microscopy Suppl. I, 197. Brust, H. D., and Fox, F. (1985). Microelectronic Engineering 3, 191. Colliex, C., Cosslett, V. E., Leapman R. P., and Trebbia, P. (1975). Ultramicroscopy I , 301. Eberhardt, W., Brickman, R., and Kaldor, A. (1982). SO/. Stute Comm. 42, 169. Endruschat. E. (1986). PhD Thesis, Technische Universitat. Berlin. Everhart, T. E., and Thornley, R. F., (1960).J . Sci. Instr. 37, 246. Feder, J., Russel, K. C., Lothe, J., and Pound, G. M. (1966). Advances in Physics 15, I 1 I . Feuerbaum, H. P., and Otto, J. (1978).J . Phys. E.: Sci. lnstr. 11, 529. Feuerbaum, H. P. (1983).Scanning 5, 14. Fujioka, H., and Ura, K. (1981). Appl. Phys. Lett. 39, 81. Calvin. G. J., Thompson, M. O., Mayer, J . W., Hammond, R. B., Paulter, N., and Peercy, P. (1982). Phys. Rev. Lett. 48, 33.
278
0. BOSTANJOGLO
Geiler, H. D., Glaser, E., Gotz, G., and Wagner, M. (1982). Phys. Stor. Sol. A73, K161. Geiler, H. D., Glaser, E., Gotz, G., and Wagner, M. (1986).J . Appl. Phys. 59, 3091. Gmelin, (1978). “Handbuch der Anorganischen Chemie.” Wolfram Part BI, pp. 89- 170, Springer. Berlin. Giitz, G. (1986). Appl. Phys. A40, 29. GGtzeberger, A. (1955).Z . Phys. 142. 182. Golubkov, V. V., Petrov. V. I., and Spivak, G. V. (1979). Phys. Stut. SO/.A54, K I . Gopinath, A., and Hill, M. S. (1973). I E E E Truns. Electron. Dev. ED-20, 610. Gopinath, A,, and Hill. M. S. (1977).J . Phys. E . 10, 229. Van Gorkom, G. G. P., and Hoeberechts, A. M. E. (1987). Phillips Twh. Rev. 43.49. Gvosdover. R. S., Lukianov, A. E., Spivak, G. V., and Rau, E. 1. (1970). Proc. 7th Int. Conqr. Electron Microscopy, Grenoble p. 199. Hata, K., Ohya, R., Nishigaki, S., Tamura, H., and Noda, T. (1987). Jpn. J . Appl. Phys. 26, L896. Hosokawa, T., Fujioka, H., and Ura, K. (1978).Reu. Sci. Insrr. 49, 624, 1293. Ishitsuka, T., Takayanagi, K. Tanishiro, Y., and Yogi, K. (1986). Proc. 11th f n t . Congr. Electron Microscopy, Kyoto, p. 1347. Isaacson, M., and Johnson, D. (1975). Ultramicroscopy I , 33. Jamet, F., and Thomer. G. (1976).“Flash Radiography.” pp. 169-183, Elsevier, Amsterdam. Keldish. L. V. (1965).Soil. Phys. J E T P 20, 1307. Khaibullin, 1. B. ( 1984). P roc. Con/’. Energy Pulse Modijcution uf Semiconductors and R e l a r d Materials. Dresden, September 25-28, 1984, p. 14 (Hennig, K. ed.). Akad. Wiss. DDR, ZF. K. 555, Dresden. Koechner. W. (1976). “Solid State Laser Engineering.” Springer Series Opt. Sciences, Vol. I , Springer, Berlin. Koester, U. (1978). Phys. Stat. Sol. A48, 313. Koshikawa T., and Shimizu R.(1973).J . Phys. D6, 1369. Konishi, S., Ueda, M., and Nakata, H. (1975). I E E E Trans. Magnetism MAGII, 1376. Landau, L., and Lifschitz, E. (1935). Phys. Zeitschrft Sowjet. 8, 153. Larson, B. C., White, C. W. Noggle, T.S., Barhorst, J. F., and Mills, D. (1983).Appl. Phys. Lett. 42, 282. 
von der Linde, D., and Fabricius, N. (1982). Appl. Phys. Lett. 41,991. Lo, H. W., and Compaan, A. (1981). Appl. Phys. Leii. 38, 179. Lompre, L. A., Liu, J . M., Kurz, H., and Bloembergen, N. (1984). Appl. Phys. Leu. 44,3. Marcus, R. B., Weiner, A. M., Abeles, J., and Lin, P. S. D. (1986). Appl. Phys. Lett. 49, 357. May, P., Halbout, J. M., and Chiu, G . (1987). A p p l . Phys. Lett. 51, 145. Meinke, H., and Gundlach, F. W. (1968). “Taschenbuch d. Hochfrequenztechnik.” p. 862, Springer, Berlin. Menzel, E. (1981). PhD Thesis, Universitit Duisburg, Duisburg. Menzel, E., and Kubalek, E. (1979). Scunning Elecrron Microscopy, I, 305. Menzel, E., and Kubalek, E. (1983). Scanning 5, 103. Mineo, A., Matsuda, A., Kurosu, T., and Kikuchi, M. (1973). Sol. State. Comm. 13, 329, 1165. Mourou G., and Williamson, S. (1982). Appl. Phys. Lett. 41,44. Ogawa, S., Tanishiro, Y., Takayanagi, K., and Yagi, K. (1986). Proc. 11th Inl. Congr. Elecrron Micro~copy,Kyoto, p. 1351. Pirri, A. N. (1971). The Physics ~f Fluids I I , 3002. Pirri, A. N. (1973). T h e Physics oj’F1uid.s 16, 1435. Plies, E. (1982). Proc. 10th Int. Congr. Electron Microscopy, Hamburg, I, 319. Plies, E., and Schweizer, M. (1987). “Siemens Forsch.-u. Entwickl, Berichte,” 16, 30 (1987), Springer-Verlag. Poate, J. M., and Mayer, J. W., (eds.) (1982). “Laser Annealing of Semiconductors.” Academic Press. New York.
ELECTRON MICROSCOPY O F FAST PROCESSES
279
Porter, D. A., and Easterling, K. E., (1981). "Phase Transformations in Metals and Alloys." pp. 135. 186, 198, Van Nostrand. London. Reimer, L. (1985 ). "Scanning Electron Microscopy." Springer Ser. Optical Sciences 45, Springer, Berlin. Reimer, L. (I984). "Transmission Electron Microscopy." Springer Ser. Optical Sciences 36. p. 423, Springer, Berlin. Raizer, Y. P. (1965).Sor. Phq". J E T P Lerr. 21. IOO9. Rosser, R. J., Feder. R., Ng, A., and Celliers, P. (1985). .J. Microscopy 140, R P 1. Schief. R.. and Steiner. M. (1973).Optik 3, 761. Shank. C. V.. Yen. R., and Hirlimann. C. (1983). Phy.5. Reo. Left. 50, 454. Spaepen. D.. and Turnbull D. (1982). In "Laser Annealing of Semiconductors," (Poate J. M. and Mayer, J. W. eds.). p. 15, Academic Press. New York. Spivak. G. V.. Diikov, V. G.. Nevzorov, A. N., and Sedov. N. N. (1966). Proc. 6 t h In!. Conqr. E/ec,/ron Mic,rowop~.Kyoto. p. 2 IS. Szentesi. 0. 1. (1972).J . PAYS.E.:Sci. Instr. 5. 563. Takamori, T.. Messier, R.. and Roy, R. (1973)../. Mat. Sci. 8, 1809. Takaoka. A., Sato. K.. and Ura, K. (1986). Proc. l l r h Inr. Conyr. Electron Microscopy, Kyoto, p. 375. Telieps. W. (1983). P h D Thesis, Unrversitiit Clausthal, Clausthal. Telieps, W., and Bauer, E. (1985). Ulrrattkrosc~opy17. 57. T o x r . B. A. (1965).Phys. Rev. 137. 1665. Tsao. J. Y.. Picraux. S. T., Peercy. P. S.. and Thompson, M. 0.(1986).Appl. Phps. Letr. 48. 278. Wagner, M.. Geiler. H . D.. and Gotz G. (1985). l'k Weiner. A. M., Lin, P. S. D., and Marcus. R. B. (1987). A p p l . Phys. Lett. 51. 358. Williamson, S.. Mourou. G., and Li, J. C. M. (1984). Phys. Rev. L&. 52. 2364. Wiz,a. J. L. (1979).Nuc/.Insti-. & Merh. 162. 587. Wolfgang. E. (1983).Sctrnninq 5, 71. Wolfgang. E. ( 19x6). Proc,. I Ith Inf. Concqr. EIwLron Mirro.sc,opy. Kyoto, p. 177. Yoffa. E. 1. (1980).Phys. Rev. BZ1. 2415. Zanchi. G.. Sevely. J., and Jouffrey. B. (1978). Proc. 9rh I n r . Cbnqr. Elecfron Mic.rosc,opy. Toronto, p. 538.
High Resolution Transmission Electron Microscopy and Geology
MARCELLO MELLINI
Dipartimento di Scienze della Terra, Università di Perugia, Italy
I. Introduction
    A. General
    B. The Nature of Problems in the Geological Sciences
II. Technical and Experimental Aspects
    A. Specimen Preparation
    B. Electron Beam Damage
    C. Lattice Imaging
    D. Structure Imaging
    E. Electron Diffraction
    F. Chemical Analysis in the TEM
III. Structure and Microstructure of Minerals
    A. Structural Determination by Electrons
    B. Microstructures as Sources of Contaminated Data
    C. Microstructures as Sources of Useful New Data
IV. Structural Control Over Microstructure
    A. General
    B. Polytypism and Polytypic Sequences
    C. Polysomatism and Polysomes
    D. Modulated Structures
V. Mineral Reactions
    A. General
    B. Weathering and Alteration
    C. Diagenetic Processes
    D. Metamorphic Reactions
    E. Metamict State and Radiation Damage
    F. Exsolution and Subsolidus Phenomena
VI. Extraterrestrial Mineralogy
    A. General
    B. Meteoritic High-pressure Minerals
    C. Layer Structures and Hydrous Material
    D. Interplanetary Dust Particles
VII. Conclusions
References
I. INTRODUCTION
A. General
When planning this review, I had to choose between two opposite approaches. The first was a detailed description of a few selected cases, studied by the most sophisticated, highest-resolution techniques. A possible example would be the complete textural, structural, and chemical characterization of extremely complex, tiny objects, such as the core of a chrysotile fiber (Fig. 1), by using ultrahigh-resolution imaging as well as sophisticated microdiffraction and microanalysis techniques. The second was a broader description of research in geology and of the possible ways in which electron-optical techniques might be profitably used. I concluded that accurate specialized reports already exist and that I would
FIG. 1. The complex structure of a fiber of chrysotile asbestos, Mg3Si2O5(OH)4, is characterized by the presence of spirally or cylindrically rolled layers with 7 Å spacing. Amorphous regions as well as interrupted layers are present. The inner core of the fiber is often deformed.
never have been able to explain the results obtained by previous authors better than the authors themselves. Therefore, the broad approach was chosen as more representative of the huge variety of problems existing in geological research. A few of these problems may be addressed by electron-optical techniques, even if the existence of alternative or complementary tools should never be forgotten. I also adopted a wide meaning for the term high resolution electron microscopy. Thus, from case to case, not only will structure imaging be considered, but also low-resolution conventional microscopy and other techniques (e.g., electron diffraction or analytical microscopy) that may well supply useful pieces of information. In particular, the overall picture resulting from well-addressed, independent observations often restores an “image” that is more informative than any sequence of technically skilled pictures.
B. The Nature of Problems in the Geological Sciences
Geology deals with the status of terrestrial planets, the sequence of episodes by which this status was achieved, their possible future evolution, the nature of the mechanisms by which geological materials may be modified, and, finally, the nature of the materials themselves. Geologists describe the current appearance of widely different objects (e.g., atomic arrangements in a crystal as well as the distribution of continents over the Earth’s surface). Further, they try to understand the spatial and temporal relationships among those objects and the ultimate reasons for movement and transformation. The preliminary problem is to decide which features should be taken into account and which should be neglected as irrelevant to the current problem. For instance, the movement of a tectonic plate as large as Africa would hardly be approached by starting from the movement of dislocations within a submillimetric crystal. Yet geologists may be required to explain either movement, or to guess how the movements of tectonic plates might be related to phase transitions, or whether and how phase transitions might be important for understanding the origin of deep-focus earthquakes. Obviously, there are too many topics for one scientist to study. Specialization becomes necessary, for instance, by working as a field geologist rather than as a laboratory specialist. The former attempts to understand large-scale phenomena, based mostly on field observations. Most often, these geologists are not directly involved in any kind of transmission electron microscopy (TEM) work. The laboratory geologists prefer to spend time and money under a roof, supposedly studying the ultimate chemical, physical, and structural reasons for large-scale phenomena. TEM is valuable, from this latter point of view, both for describing the present status of minerals and rocks and for guessing how small features might explain large-scale phenomena.
II. TECHNICAL AND EXPERIMENTAL ASPECTS
The range of TEM techniques useful to the Earth Sciences does not differ from that used in Solid State Physics, Metallurgy, or Inorganic Chemistry. Therefore, reference to technical aspects can be found in common general textbooks (e.g., Spence, 1981; Williams, 1984). However, owing to the specific nature of objects and problems, a few points need to be emphasized.
A. Specimen Preparation
A specimen may consist of a single mineral (that is, a solid body with a given crystal structure and a given chemical composition), or it may be an assemblage of several minerals (perhaps a rock fragment). In spite of the existence of many compositionally different minerals, most of the TEM work up to now has been devoted to silicates and, subordinately, to carbonates and sulfides. Both crushing and ion thinning have been widely used for the preparation of specimens for high-resolution work. Crushing in an agate mortar quickly produces several specimens and, as specimen drift is often negligible, it may be profitably used for high-resolution structural imaging. Major drawbacks are the possible introduction of mechanical artifacts, the limited extent of electron-transparent regions, the rapidly varying thickness, and the restriction of accessible projections to those determined by the most important cleavage planes. Alternatively, chips can be drilled out from a standard petrographic optical thin section and thinned to electron transparency by ion beams. The latter technique is extremely valuable for the study of geological specimens. In fact, it offers easy correlation between the observed TEM microstructures and other well-standardized observations (e.g., optical microscopy or electron microprobe analysis). Wide transparent regions may be obtained, no matter what the specimen orientation and composition, even for materials as difficult as cross sections of asbestiform silicates (Fig. 2). Finally, the textural relationships among interlocking grains are more evident than in small crushed fragments. The drawbacks are the long time required for preparation (up to 150 hours have been reported for the most difficult specimens) and the likely occurrence of specimen drift during high-resolution imaging. Special care may sometimes be required, and skilled technique may become necessary.
Possible difficult cases arise when it is necessary to prepare and observe, along a specific direction, a given submillimetric crystal (perhaps previously analyzed by single-crystal X-ray diffraction and the electron microprobe) or when only a very limited amount of dust is available.
FIG. 2. In spite of the beam sensitivity, a fiber bundle of chrysotile has been successfully ion-thinned along a section perpendicular to the fiber axis. Wide transparent regions, able to produce lattice images, are evident. Amorphization is almost absent and no separation among individual fibers occurred.
However, apart from the many possible ways in which a satisfactory specimen can be prepared, the quality of specimen preparation will subsequently affect the attainable results in a dramatic way. In the writer’s experience, several frustrating days spent wasting time, money, and one’s brain within the TEM room, together with unworkable specimens, can be avoided just by spending a few minutes more in careful specimen preparation.
B. Electron Beam Damage
Geological materials may undergo important electron beam damage during observation. Most often, damage appears as a crystalline-to-amorphous transformation or, less commonly, as a crystalline-to-crystalline transformation (Veblen and Buseck, 1983). Beam damage may also produce artifacts that simulate primary structures. Although this cannot be generalized to every substance, increasing water and silicon content seems to
FIG. 3. Electron beam damage in quartz, SiO2, was determined by a focused beam. The two damaged regions are now completely amorphous, as revealed by the absence of contrast features. The mottled area all around the glassy region is due to incipient damage.
promote the degradation of the specimen under the electron beam (Fig. 3). Also, the kind of vacuum existing within the microscope may affect the damage rate. For instance, clean vacuum environments, such as those obtained by ion getter pumps, give greater specimen stability than instruments equipped with oil diffusion pumps. In any case, apart from the detailed reasons for beam damage, some tricks must be used to minimize its effects. The first is the widely known use of a carbon coating film, carefully prepared to ensure effective dissipation of heat and electrical charge. Also, after orientation, aperture centering, focusing, astigmatism correction, and a few further minor controls, quick movement from the now-damaged region to the fresh adjacent one may supply the desired pictures (hopefully, all the imaging parameters should still be the same, especially if subsequent image matching is scheduled). Specific treatments may also be proposed, such as cation exchange and overnight degassing of the specimen within the TEM, as is done in the case of the highly unstable zeolites (Bursill et al., 1981). Last but not least, one should remember that people working with light usually do it in the darkness. Therefore, the observations should be performed using the lowest electron
irradiation still compatible with a sufficient visibility of the specimen. An image amplifier and a video recorder may be extremely valuable. For instance, very weak electron beam techniques have revealed that the numerous lenticular fissures observed in the sodium mica paragonite, rather than a primary microstructure, are a secondary feature caused by electron beam damage (Ahn et al., 1986).
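The trade-off between irradiation and visibility can be put in rough numbers with Poisson (shot-noise) statistics. The sketch below assumes a simple detectability criterion, contrast × sqrt(dose × pixel area) ≥ SNR, with ideal electron detection; the threshold of 5, the pixel size, and the contrast values are illustrative assumptions and are not taken from the text, and the function names are hypothetical.

```python
import math

def min_dose(contrast, pixel_size, snr=5.0):
    """Minimum dose (electrons per square angstrom) for a feature to be detectable.

    Assumed shot-noise criterion: contrast * sqrt(dose * pixel_size**2) >= snr.
    pixel_size is the side of the resolved picture element, in angstroms.
    """
    return (snr / (contrast * pixel_size)) ** 2

def snr_at_dose(contrast, pixel_size, dose):
    """Signal-to-noise ratio reached at a given dose under the same assumptions."""
    return contrast * math.sqrt(dose) * pixel_size

if __name__ == "__main__":
    # Illustrative contrasts only; picture element of 5 angstroms on the specimen.
    for contrast in (0.20, 0.10, 0.05):
        print(f"contrast {contrast:.2f}: about {min_dose(contrast, 5.0):.0f} e/A^2")
```

Under this criterion, halving the contrast of a feature quadruples the dose needed to see it, which is why weakly contrasting, beam-sensitive minerals benefit most from image amplification.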
C. Lattice Imaging
Often, different but related minerals consist of basic modules, repeated by different stacking operators to produce the different structures (e.g., polytypes and polysomes). The crystallographic features of the structural variants may be predetermined or in some way known. As these different structures can sometimes be distinguished by comparison of the unit cell parameters, just a one-dimensionally resolved lattice image, together with the corresponding electron diffraction pattern, may be sufficient to reveal the details of the phase distribution throughout the specimen (Fig. 4). The technique is highly
FIG. 4. Monodimensional image of the silicate mineral guarinite. At least two different structures, with 10 Å and 20 Å periodicities, respectively, alternate with each other in two directions according to a completely chaotic distribution.
informative, easy, and does not require excellent performance of the microscope-microscopist system because of the usually large periodicities occurring in minerals (often on the order of 10 Å or more). Therefore, this relatively undemanding technique should be widely used, even if caution is surely required to avoid possible structural misinterpretation. For instance, Mellini et al. (1981) reported two pictures taken on the same crystal of the mineral apuanite. Both pictures showed the occurrence of two faulted regions within a regular matrix. However, whereas the matrix shows half-periodicity 9 Å fringes in the image recorded at 1200 Å underfocus, the true 18 Å periodicity appears in the nearly focused image. The apparent occurrence of different multiple periodicities may be easily explained by using the dynamical diffraction theory (Gjønnes and Moodie, 1965) or the contrast transfer theory (Spence, 1981), and does not need any more comment, apart from advice for care and awareness before stating that one has observed a phase transition in the microscope.
D. Structure Imaging
In a very limited number of cases, high-resolution pictures can be considered to be projections of the potential that deflects the impinging electrons. Therefore, these pictures may represent not only the periodicities present in the crystal, but actual projections of the crystal structure itself. Frequently, the imaging parameters are far from the ideal values for structure imaging and straightforward interpretation is no longer possible. The interpretation then requires a comparison between images obtained under known conditions and images calculated on the basis of an assumed structural model. However, if the various experimental parameters are really known, these images convey just the same amount of information as the directly interpretable ones. Therefore, both types of pictures can be considered to be actual structural images, rather than lattice images only, in that they can supply not only periodicities but structural information too. Even if more difficult than uncontrolled lattice imaging, structural imaging becomes absolutely necessary in several cases. An obvious example is the use of high-resolution imaging to determine, at the atomic level, unknown structural arrangements. However, more subtle occasions for misinterpreted lattice images also exist, for example, while trying to decipher the right stacking sequence within a faulted crystal. Amouric et al. (1981) performed extensive calculations of the expected images for several polytypes of mica under different imaging conditions. They showed that even though the best imaging parameters were abandoned, two-dimensional images with detailed contrast were still obtained. However, the apparent stacking sequences were
FIG. 5. Computed images for 2M1 muscovite (panel labels in the original figure: defocus −800 Å and −400 Å; thickness 50 Å and 100 Å). Whereas the left picture reveals the true stacking sequence, a false 1M stacking sequence appears in the right picture. (From Amouric et al., 1981.)
completely erroneous, and further misleading contrast was also produced (Fig. 5). Taking into account the development of computing systems and image treatment within the last few years, it now seems appropriate to always perform extensive computations, even if structural imaging was not scheduled at the beginning of the experiment. This is because image computations may assist in finding the best imaging conditions for the required resolution, and undetected sources of misinterpretation may thereby be avoided.
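As a much-simplified illustration of how defocus changes which periodicities reach the image, the sketch below evaluates the linear phase-contrast transfer function sin χ(k) for two lattice spacings. It is only a weak-phase-object approximation, not the dynamical image calculation performed by Amouric et al.; the accelerating voltage (100 kV), the spherical aberration coefficient (1 mm), the sign convention (negative defocus meaning underfocus), and the particular definition of the Scherzer defocus are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

def electron_wavelength(volts):
    """Relativistic electron wavelength in angstroms for an accelerating voltage in volts."""
    return 12.2643 / math.sqrt(volts * (1.0 + 0.978476e-6 * volts))

def phase_ctf(spacing, defocus, cs, wavelength):
    """Linear phase-contrast transfer sin(chi) for a lattice spacing (lengths in angstroms).

    chi(k) = pi*lambda*defocus*k^2 + (pi/2)*Cs*lambda^3*k^4, with k = 1/spacing.
    Negative defocus means underfocus (assumed convention).
    """
    k = 1.0 / spacing
    chi = (math.pi * wavelength * defocus * k**2
           + 0.5 * math.pi * cs * wavelength**3 * k**4)
    return math.sin(chi)

def scherzer_defocus(cs, wavelength):
    """One common definition of the Scherzer underfocus: -sqrt(4/3 * Cs * lambda)."""
    return -math.sqrt(4.0 / 3.0 * cs * wavelength)

# Assumed instrument parameters, not taken from the text.
LAMBDA = electron_wavelength(100e3)   # about 0.037 angstroms
CS = 1.0e7                            # 1 mm expressed in angstroms

if __name__ == "__main__":
    for defocus in (-1200.0, scherzer_defocus(CS, LAMBDA)):
        for spacing in (18.0, 9.0):
            t = phase_ctf(spacing, defocus, CS, LAMBDA)
            print(f"defocus {defocus:8.1f} A, spacing {spacing:4.1f} A: sin(chi) = {t:+.3f}")
```

With these assumed parameters, the 9 Å frequency is transferred far more strongly than the 18 Å one at 1200 Å underfocus. This shows only how strongly the transfer depends on defocus; a reliable interpretation still requires the full image computations discussed above.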
E. Electron Diffraction
A good working knowledge of crystallography, together with careful examination of the electron diffraction patterns both at the microscope and on the plates, may help in planning subsequent work. In fact, the diffraction patterns convey unbiased structural information from crystal volumes larger than the regions imaged with two-dimensional resolution. Therefore, the attention paid to such features as shapes of the diffraction spots, existence of streaking, direction of streaking, and occurrence of satellite reflections may constitute an excellent starting point for the more demanding high-resolution work. For instance, Fig. 6 shows the complex pattern arising from overlapping [010] antigorite and [001] clinopyroxene. The latter mineral is characterized by sharp, widely spaced spots and is transforming to antigorite, according to an
FIG. 6. Selected area electron diffraction pattern from an association of clinopyroxene, CaMgSi2O6 (stronger spots), and antigorite, Mg2.8Si2O5(OH)3.6 (weaker, closely spaced spots). Streaking along c* of antigorite arises from lamellar association of twin-related domains. (See also Fig. 18.)
oriented transformation requiring [010] of antigorite almost parallel to [001] of clinopyroxene. Within the electron diffraction pattern of antigorite (closely spaced spots), both modulated structures and twins can be observed. Attempts have been made to develop more powerful, quantitative electron diffraction techniques. One of them is convergent beam electron diffraction (CBED). CBED represents a very valuable development in electron diffraction analysis, as it offers several advantages over conventional selected area electron diffraction. For instance, the diffracting area is defined by the spot size only (which can go down to a few tens of angstroms in nondedicated instruments); thicker crystals can be analyzed and thickness can be determined; accurate measurement of the lattice parameters is possible even along the observation direction; the point group symmetry, rather than the centrosymmetric Laue symmetry, can be determined; dynamic effects (leading to diffracted intensities along kinematically forbidden directions) can be checked; and true systematic extinctions can be recognized. Even if the technique is still being developed and applications to Earth Sciences are still scarce, important results may be expected. Possible applications are the
diffractometric characterization of small volumes, space group determinations in ambiguous cases, or its use for thickness determination in thin-film microanalysis (Champness, 1987).
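The inverse relation between lattice periodicity and spot spacing, which lets widely spaced and closely spaced spots be assigned to different phases in a selected area pattern, can be sketched with the small-angle camera equation R·d = λ·L. The camera length, the accelerating voltage, and the two periodicities used below are illustrative assumptions rather than values from the text, and the function names are hypothetical.

```python
import math

def electron_wavelength(volts):
    """Relativistic electron wavelength in angstroms for an accelerating voltage in volts."""
    return 12.2643 / math.sqrt(volts * (1.0 + 0.978476e-6 * volts))

def spot_spacing_mm(d_angstrom, camera_length_mm, wavelength_angstrom):
    """Distance between adjacent spots on the plate, R = lambda * L / d (small angles)."""
    return wavelength_angstrom * camera_length_mm / d_angstrom

def d_from_spot_spacing(r_mm, camera_length_mm, wavelength_angstrom):
    """Lattice periodicity in angstroms recovered from a measured spot spacing."""
    return wavelength_angstrom * camera_length_mm / r_mm

if __name__ == "__main__":
    lam = electron_wavelength(100e3)      # assumed 100 kV
    camera_length = 1000.0                # assumed camera length, mm
    # Hypothetical periodicities: a short cell edge and a long modulation period.
    for d in (5.0, 40.0):
        r = spot_spacing_mm(d, camera_length, lam)
        print(f"d = {d:5.1f} A -> spot spacing {r:.2f} mm")
```

A period eight times longer gives spots eight times closer together, so a long-period phase appears as rows of closely spaced reflections between the sparse spots of a short-period phase.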
F. Chemical Analysis in the TEM
The most popular way to perform chemical analysis in the TEM is based on energy dispersive spectrometry (EDS) of the X-rays emitted from the specimen. Apart from its obvious value as a probe with high spatial resolution (Williams, 1984), the extensive use of EDS in the TEM (analytical electron microscopy, AEM) offers more detailed knowledge of phase relationships and textural arrangements within multiphase assemblages. Also, when studying complex systems, AEM may save the time otherwise spent aligning a region that turns out to be other than what was desired. Whereas the use of X-rays for AEM is limited to elements heavier than sodium, light elements can be analyzed by using electron energy loss spectrometry (EELS). Furthermore, the fine structure of the EELS spectra may also give information on the oxidation states of elements such as Fe, Ti, or Mn, as well as on the coordination number of those cations (Otten and Buseck, 1987). This spectroscopic information is particularly important from the point of view of the geological sciences. In fact, knowledge of oxidation states is extremely important for determining the conditions of formation of the mineral (e.g., whether it was deposited within a sea basin or a continental basin, what the depth of that basin was, and what the most important postdepositional events were). Coordination has always been a matter of controversy among the different specialists. For instance, X-ray crystallographers and Mössbauer spectroscopists have rarely been in agreement about questions such as the occurrence of ferrous rather than ferric iron, perhaps in octahedral rather than in tetrahedral coordination. Therefore, any new approach capable of dealing with this kind of problem is surely welcome, especially when we consider the small amount of material necessary for TEM investigation and the simultaneous access to imaging and diffraction data.
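The text does not say how EDS intensities are converted to compositions. A common choice for thin specimens is the ratio (Cliff-Lorimer) method, C_A/C_B = k_AB · I_A/I_B, and the following is a minimal sketch of that approach only. The k-factors and peak intensities are made-up placeholders, absorption and fluorescence corrections are ignored, and the function name is hypothetical.

```python
def cliff_lorimer(intensities, k_factors):
    """Normalized weight fractions from background-subtracted peak intensities.

    intensities: dict mapping element symbol to peak intensity (counts)
    k_factors:   dict mapping element symbol to k-factor relative to one
                 common reference element (here Si, with k = 1 by definition)
    Thin-film approximation: no absorption or fluorescence correction.
    """
    weighted = {el: k_factors[el] * counts for el, counts in intensities.items()}
    total = sum(weighted.values())
    return {el: value / total for el, value in weighted.items()}

if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured data.
    counts = {"Mg": 3000.0, "Al": 2000.0, "Si": 3000.0}
    k_si = {"Mg": 1.2, "Al": 1.1, "Si": 1.0}
    for element, fraction in cliff_lorimer(counts, k_si).items():
        print(f"{element}: {100.0 * fraction:.1f} wt%")
```

Because the result depends only on intensity ratios, the specimen thickness does not need to be known in this approximation, but any orientation-dependent change in those ratios passes directly into the computed composition.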
Biased collection of EDS and EELS spectra may be due to the existence of channeling effects that are able to modify the intensity distribution among the different analytical lines (Fig. 7). On one hand, this aspect constitutes a drawback for thin-film microanalysis in the TEM, and suspicious data should be checked by first modifying the specimen orientation by a few degrees and later acquiring a new analysis. On the other hand, just as the channeling effects are position sensitive, they may become an accurate probe for site-resolved chemical determinations, namely, for the spectrometric determination of cationic distributions among similar crystallographic sites (Otten and Buseck,
FIG. 7. X-ray emission spectra for the garnet pyrope, Mg3Al2(SiO4)3 (panels labeled [311], [211], and [111] in the original figure; horizontal axis in keV). Intensity ratios for the Mg, Al, and Si lines change with the different specimen orientations. (From Otten and Buseck, 1986.)
1986). As cation ordering is a possible source of data on the thermal history of rocks, channeling-based techniques surely deserve further testing in several groups of minerals where chemical substitution is common.
III. STRUCTURE AND MICROSTRUCTURE OF MINERALS
A. Structural Determination by Electrons
From the classificatory point of view, a mineral is fully defined when the crystal structure and the chemical formula are known. Nothing more needs to be known. Moreover, many important properties can be derived starting from a good knowledge of the average atomic arrangement within the unit cell. Single-crystal X-ray diffraction has surely been the most powerful tool for structural analysis, at least as long as a sufficiently large, ideally imperfect crystal is available and defective structures are negligible. However, in several cases electron diffraction and HRTEM have also been valuable for determination of the average structure. In order of increasing complexity, three principal cases can be distinguished:
1. The crystal consists of a matrix in which several faults occur. Based on the known structure of the matrix, it may be possible to propose tailored modifications to that basic structure in order to explain spacings, image
contrast, and chemical composition of the defective areas. Several examples occur in the literature and will be referred to later, for example, in the paragraphs on polytypism and polysomatism.
2. A second possibility arises when X-ray analysis offers only a rough knowledge of the crystal structure and the resulting model appears to be contradictory. For instance, during the study of niocalite, NbCa7(Si2O7)2O3F, several results of the X-ray refinement indicated the collection of biased structure factors (e.g., a relatively poor level of refinement, ill-defined atomic temperature factors, unreliable bond geometry). The cause of that failure was detected by HRTEM, which showed the extensive occurrence of polysynthetically twinned domains, with overlapping diffraction spots from twin-related domains (Mellini, 1982). The most important consequence was that, whereas the X-ray data were indicating a disordered distribution of niobium and calcium cations, the actual structure consisted of twin-related domains, each of which was characterized by an ordered distribution of calcium and niobium cations. A slightly more complex case was orientite, Ca8Mn10[(SiO4)3(Si3O10)4(OH)10]·4H2O (Mellini et al., 1986). On the basis of X-ray data, two slightly different structural models had been proposed for that mineral, and both models required the occurrence of 9.5 Å [001] periodicities within the [010] projections. However, those models did not match the lattice images, as these invariably showed an evident 19 Å periodicity (Fig. 8). Based on those observations, a new model, a hybrid of the two previous ones, was proposed. By using this third HRTEM model, not only were
FIG. 8. Comparison between observed and computed structural images for the silicate mineral orientite.
FIG. 9. Fiber texture of the asbestiform silicate carlosturanite, Mg21Si12O28(OH)34·H2O (st). The mineral is closely intergrown with chrysotile (ch) and brucite, Mg(OH)2 (b). Polysomatic faults, due to the occurrence of variable chain width, occur within carlosturanite.
structural images and chemical data fully accounted for, but also a lower residual was obtained in the X-ray refinement.
3. The third and most difficult case arises when the small dimensions of the crystal, or the important disorder within it, prevent any X-ray work. Asbestiform material is an excellent example of such a situation. For instance, carlosturanite, ideally Mg21Si12O28(OH)34·H2O, is a water- and magnesium-rich, silicon-poor, serpentine-like asbestiform mineral. The fiber texture consists of randomly rotated fibers, approximately 1000 Å in cross section and intermixed with smaller chrysotile fibers (Fig. 9). By TEM analysis, not only was the unit cell found, but also a plausible structural model was proposed (Mellini et al., 1985). The model was based on the ideal structure of serpentine, from which carlosturanite differs by the presence of infinite rows of vacancies in tetrahedral sites that produce triple silicate chains. This model explained the physical and chemical properties of the mineral and led to a fitting of observed and calculated images. Moreover, the model allowed the prediction of a whole family of defect structures, and these structures actually occurred within the specimens. One more example is given by the similar minerals balangeroite and gageite. They have an ideal composition of M42O6(OH)40(Si4O12)4, with
FIG. 10. (a) Interlocking fibers of balangeroite, M42O6(OH)40(Si4O12)4, 2000 Å in cross section and randomly rotated. Parallel Wadsley defects occur within the fibers. (b) Enlarged view of two adjacent balangeroite fibers. (c) Structural model of balangeroite, consisting of 3 x 1 and 2 x 2 octahedral walls, which define [001] channels where silicate chains (dotted triangles) are located.
M = Mg, Mn, Zn, and they too have an asbestiform habit. TEM study showed that balangeroite is monoclinic, whereas both monoclinic and triclinic gageite exist. The structural model was obtained from electron diffraction patterns and structural images (Ferraris et al., 1987). That model (Fig. 10) was later used as a starting point for the conventional X-ray refinement of an exceptionally good gageite crystal, and the refinement converged to R = 0.049 for 871 reflections. The previous examples demonstrate that structural analysis in solid state chemistry and mineralogy can also be carried out by electron diffraction
and structural imaging, even if in an unconventional way. This kind of approach is perhaps just in its embryonic stage, and further successful development may be expected (Cowley and Smith, 1987).
B. Microstructures as Sources of Contaminated Data
As minerals are often more complex than definitions, we need to know several more things. Important deviations from the ideal crystal order frequently occur and may even modify the macroscopic behavior of the materials. Bulk properties other than those expected are observed. Sometimes, as contamination is easily recognized, it does not constitute a problem any longer. In other cases, contaminating agents may be insidious and may escape identification. The intercalation of different structural types is common in rock-forming minerals and may result in biased data, for instance, contaminated microprobe analyses. Hopefully, as we expect that contamination effects will be recognized by the skilled analyst, we may assume that contamination will not be a severe problem, at least as long as only the most abundant elements are considered. Extreme caution will be required, however, when the interest lies in trace elements and trace element partitioning between coexistent phases (as is often required for the geochemical study of magma evolution). In fact, the occurrence of extended defects will affect the trace element distribution, especially when modified crystallographic sites are produced at the boundaries between intergrown structures (Veblen, 1985a). Due to the small amounts of trace elements (from parts per billion, p.p.b., to parts per million, p.p.m.), even very few faulted regions may behave as effective garbage baskets, where elements incompatible with the matrix are easily admitted. Thus, a good knowledge of the microstructural features present in the specimen and able to modify the bulk properties is often required in order to avoid biased or misinterpreted analytical data.
As geological modeling is also based on laboratory observations, it seems improbable that any correct interpretation of the geological framework may result from poorly understood data alone.
C. Microstructures as Sources of Useful New Data
Deviations from the ideal crystalline state are sometimes more important than the crystal order itself, at least from the point of view of the thermal history of the specimen, of the deformative events, and of the geological interpretation. This is because the occurrence of a given microstructure strongly depends on the nucleation conditions and/or on the postcrystallization
subsolidus evolution. Microstructures may therefore be important indicators of the thermobaric evolution of minerals and rocks. Several examples on the meaning of microstructures as petrogenetic indicators can be found in one of the many review papers now available (e.g., Buseck, 1983; Veblen, 1985b; Zussman, 1987), as well as in the following pages.
IV. STRUCTURAL CONTROL OVER MICROSTRUCTURE
A. General
Having previously sketched the importance of defects in minerals and rocks, we are now in a position to attempt a more detailed description of microstructures, of how they look at the TEM scale, and of why they are produced. Microstructures may be defined as defective arrangements in which the three-dimensionally infinite periodic order of the crystal was lost, due to variations in the environmental conditions. The episode was perhaps an abrupt change in the composition of the melt where the crystal was growing, or a subsolidus, stress-induced reaction. Generally speaking, both growth and deformation defects can be expected in minerals. In the case of extended defects, no matter what the origin, we can expect that some positive term will be added to the free energy content of the system, and this increased G value will decrease the stability of the system. It is possible that the defective arrangement may metastably survive or, otherwise, may further react. If microstructures have to persist within the crystal, the corresponding free energy increase should be as small as possible, especially if the material is submitted to an annealing process rather than abruptly quenched. Therefore, the possible existence of some kind of structural control over possible defects may be expected. Any successful defect will produce the least perturbation within the host matrix. This type of approach to defects has largely been emphasized through 15 years of HREM research on inorganic compounds. In fact, whereas prior to HREM work emphasis was mainly placed on the discordant nature of the defective region with respect to the surrounding matrix, HREM has instead stressed the importance of concordant interfaces.
B. Polytypism and Polytypic Sequences
Using the simplest and least general definition, polytypes may be defined as structural variants built up by common structural modules that are repeated according to different stacking vectors. The most familiar example
comes from the close-packing theory, with AB (hexagonal close-packed structure) and ABC (cubic close-packed, face-centered cubic structure) as basic polytypes. A common, even if not necessary, feature of polytypes is the occurrence of two common lattice parameters (those describing the periodicities within the basic module), whereas the third parameter, variable in either magnitude or direction, defines the different stacking sequences. All the possible sequences contain discrete numbers of basic modules. Even if geometrically simple, polytypism has been a matter of controversy for a long time, and very basic questions have not yet received satisfactory answers. We still lack a general explanation that is able to deal with several substances under widely variable pressure and temperature conditions, rather than with a few given cases. We cannot state how large (if any) the energy difference between two given polytypes is; how polytypism may be affected by small compositional differences; why sometimes different polytypes seem to occur under different P and T conditions and why, in other cases, they occur all together; which are the parameters responsible for ordered or disordered stacking sequences; how periodically repeated sequences as long as 10,000 Å can exist; and what is the memory mechanism able to duplicate them (Baronnet, 1980). As polytypic behavior occurs in almost every kind of compound, satisfactory answers to those questions would be widely welcomed in several fields of solid-state science. Confining ourselves to geology, one more specific question is whether the polytypic variability may be used as a sensitive indicator for P, T conditions. Polytypism is also important because it is common in rock-forming silicates, e.g., in mica, serpentine, and chlorite.
The reason is the particular structure of those layer silicates, which is based on the occurrence of a hexagonal net of silicon tetrahedra linked to a continuous layer of edge-sharing (Mg, Fe, Al) octahedra. As adjacent layers may be rotated relative to each other by 60° or multiples of it, different stacking sequences may be produced that still have the same nearest-neighbor interactions. Even if the statistical description of the polytypic behavior in layer silicates had already been achieved by both powder and single-crystal X-ray analysis, the early HRTEM reports were quite impressive and renewed interest in the theoretical understanding of polytypism as well as in the actual description of polytypic sequences in real crystals (Amouric et al., 1978; Iijima and Buseck, 1978). In fact, it turned out that the different polytypes of mica can be distinguished just by looking at the structural image and that the correct stacking sequence can be determined from layer to layer (Fig. 11). Several alternative arrangements were found, ranging from highly ordered sequences with a limited number of stacking faults to completely disordered sequences. The faulted regions might perhaps correspond to an already known basic polytype of mica, as well as to a hitherto unreported sequence.
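The way in which relative layer rotations generate stacking sequences of different length can be illustrated with a small sketch. It is only a geometric toy model, not a method taken from the works cited: it assumes that every layer is identical, that the relative rotation between successive layers is a multiple of 60°, and that the sequence of rotations repeats cyclically. The function names are hypothetical.

```python
from math import gcd


def primitive_cycle(rotations):
    """Return the shortest block whose repetition reproduces `rotations`."""
    n = len(rotations)
    for k in range(1, n + 1):
        if n % k == 0 and rotations[:k] * (n // k) == rotations:
            return rotations[:k]
    return rotations


def layers_in_repeat(rotations_deg):
    """Number of layers in the stacking repeat (toy model, see assumptions).

    `rotations_deg` lists the relative rotations between successive layers;
    the list is assumed to be cycled indefinitely along the stacking direction.
    """
    if not rotations_deg:
        raise ValueError("at least one rotation is needed")
    if any(r % 60 != 0 for r in rotations_deg):
        raise ValueError("rotations must be multiples of 60 degrees")
    cycle = primitive_cycle([r % 360 for r in rotations_deg])
    net = sum(cycle) % 360  # net rotation accumulated over one cycle
    # The layer orientation recurs once the accumulated rotation is a
    # multiple of 360 degrees.
    turns = 1 if net == 0 else 360 // gcd(net, 360)
    return len(cycle) * turns


if __name__ == "__main__":
    for seq in ([0], [120, -120], [120], [180], [60, -60], [60]):
        print(seq, layers_in_repeat(seq))
```

Under these assumptions the six sequences give repeats of 1, 2, 3, 2, 2, and 6 layers. If the conventional assignment of rotation sequences to mica polytypes is assumed, the first two correspond to the one-layer (1M) and two-layer (2M1) structures mentioned in the text; the sketch says nothing about their relative stability.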
FIG. 11. Silicate layers, 10 Å in thickness, alternate according to a two-layer stacking sequence in muscovite 2M1. (From Amouric et al., 1981.)
Whereas early HREM work was mostly devoted to elucidating the crystallographic aspects of polytypism, later applications paid attention to the possible petrogenetic meaning of polytypism. The aim was to use this as a possible tool for further distinction within the stability field of mica, namely, to accurately determine the environmental conditions existing during formation, or weathering, of the host rock. Amouric and Baronnet (1983) analyzed the possible effects of growth conditions on the final stacking sequences and suggested that the very early nucleation stage might be important in determining the final sequence. Therefore, by careful choice of the hydrothermal synthesis conditions, they were able to observe just the embryonic stage of the nucleation of mica. With increasing nucleation temperature, from 355°C to 630°C, the 2M1 polytype became more abundant and stacking disorder between 1M and 2M1 polytypes decreased, in keeping with the field evidence on the evolution of mica under prograde metamorphic conditions. Increasing supersaturation acted in the same way as decreasing temperature. Furthermore, it was definitely shown that the basic 1M and 2M1 structures of mica were actually as-nucleated structures, rather than determined by later spiral growth mechanisms. Combined with previous results, those HRTEM data contributed to give a continuous picture of mica polytypism, from the early nucleation stage up to the macroscopic development of the crystal. Progressively more sophisticated studies have been produced later. In particular, as mica is a metamorphic mineral and the growth of metamorphic minerals is largely controlled by tectonic deformation, the effect of stress has been considered.
Various dislocations and stacking faults have been observed in naturally deformed micas and have been used to explain the occurrence of complex polytypic sequences (Amouric, 1987). Also, intercalation of units
different from the basic mica layer may occur during growth and deformation events in layer silicates, thus producing important chemical changes (Amouric et al., 1988). Polytypism is not confined to layer structures only, as several other minerals are conveniently described by stressing their polytypic nature (even if no actual layer occurs in the structure). Examples can be found among group silicates, chain silicates, framework silicates, sulfides, and so on. In all those
FIG. 12. (a) Stacking faults in sursassite (s) arise from the occurrence of pumpellyite (p) lamellae. (b) Crystal structure of pumpellyite. (c) Crystal structure of sursassite.
cases, HRTEM reveals the most subtle details of the deviations from the ideal crystalline state. For instance, although successful, the X-ray crystal structural refinement of sursassite, Mn2Al3[(SiO4)(Si2O7)(OH)3], produced results variable from crystal to crystal and pointed to the presence of some kind of disorder. Electron diffraction and lattice imaging showed that the crystals actually consist of thin intergrown lamellae (Fig. 12). Comparison with the calculated images shows that disorder occurs by the presence of common structural slabs, misplaced by 1/2(a + b) with respect to the normal c stacking of sursassite. In particular, when regularly repeated, this vector would produce the crystal structure of another group silicate, pumpellyite, Ca2Al3[(SiO4)(Si2O7)(OH)3]. The two structures are compared in Fig. 12, which shows the occurrence of common "layers" in the two minerals. Disordered crystals can, therefore, be considered as consisting of a matrix based on the stacking sequence of sursassite, but faulted by the episodic occurrence of pumpellyite stacking vectors (Mellini et al., 1984). A tentative explanation for faulting in sursassite was the possible presence of local chemical heterogeneities. In particular, the low amount of calcium substituting for manganese in sursassite (0.59 atoms per formula unit) might be heterogeneously distributed across the crystal and, more specifically, concentrated just in the regions where the stacking sequence of pumpellyite occurs. The generation of faulted lamellae would then be promoted by chemical control mechanisms. Usually, the pumpellyite lamellae are thinner than 100 Å and, more often, on the order of 20-30 Å. Therefore, as reliable analytical electron microscopy on those lamellae was not possible at the time of the investigation, the hypothesis of chemical control was not supported by any chemical data and should not be considered too seriously.
However, apart from what really occurs, sursassite is an example of the kind of problems in which we may be interested in the future. In fact, the successful combined application of high-resolution imaging and high-resolution spectrometry may better clarify the possible relationships between observed microstructures and observed chemical heterogeneities. The aim would be to understand not only which growth defects occur, but also why they occur.
C. Polysomatism and Polysomes
Polysomatic series may be considered as a further generalization with respect to polytypic families. Whereas polytypes result from the different combinations of modules with common composition, e.g., A, polysomes AmBn result from the variable combinations of two different modules, A and B. Even if the basic ideas that underlie the polysomatic approach to structural chemistry may be found scattered throughout a literature dating back to the early
1950s (Ferraris et al., 1986), the theory has become widespread only after the advent of TEM. In fact, many of the extended defects shown in the high-resolution images are amenable to polysomatic description. The most famous polysomatic series, at least in the mineralogical literature, is perhaps the so-called biopyribole series (Thompson, 1978; Veblen and Buseck, 1979). The biopyribole minerals are chemically and structurally intermediate between pyroxene, Mg2Si2O6 (P), and talc, Mg3Si4O10(OH)2 (M), with amphibole, Mg7Si8O22(OH)2 (MP), as the most important intermediate term. The crystal structures can be schematized by recalling the presence of single tetrahedral chains in pyroxene, double tetrahedral chains in amphibole, and infinite tetrahedral sheets in talc. Nonconventional biopyriboles have more complex tetrahedral ribbons, such as triple silicate chains (M2P), alternating double and triple silicate chains (MPM2P = M3P2), and so on. The connections between tetrahedral chains and octahedrally coordinated cations give rise to the so-called I-beams, responsible for much of the contrast in the structural images. Biopyriboles easily produce closely intergrown associations, normally by metastable hydration reactions of pyroxene or amphibole in retrometamorphosed rocks (Nakajima and Ribbe, 1981; Veblen and Buseck, 1981; Whittaker et al., 1981; Akai, 1982a). Most probably, up to now, several million unit cells of biopyriboles have been imaged by HRTEM, all of them pointing to quite constant relationships. An example is given in Fig. 13: Amphibole is substituting for pyroxene. Three amphibole lamellae, each of them one unit cell thick along [010], continuously cut across pyroxene. The large white dots correspond to the "I-beams" formed by two double chains in amphibole. The small white dots refer to the smaller "I-beams" of the single-chain pyroxene structure.
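The module bookkeeping behind the M and P symbols can be checked with a few lines of arithmetic. The sketch below is only an illustration under a stated assumption: the M module is taken as one talc formula unit, Mg3Si4O10(OH)2, and the P module as two pyroxene formula units, Mg4Si4O12, a normalization chosen here so that M + P reproduces the amphibole formula. The function names are hypothetical.

```python
from collections import Counter

# Assumed module compositions (pure magnesium end members).
# M: one talc formula unit; P: two pyroxene formula units (2 x Mg2Si2O6).
M_MODULE = Counter({"Mg": 3, "Si": 4, "O": 10, "OH": 2})
P_MODULE = Counter({"Mg": 4, "Si": 4, "O": 12})


def polysome_composition(n_m, n_p):
    """Sum the contents of n_m M modules and n_p P modules."""
    if n_m < 0 or n_p < 0:
        raise ValueError("module counts must be non-negative")
    total = Counter()
    for species, count in M_MODULE.items():
        total[species] += count * n_m
    for species, count in P_MODULE.items():
        total[species] += count * n_p
    return total


def formula(composition):
    """Write a composition as a plain-text formula, hydroxyl last."""
    parts = []
    for species in ("Mg", "Si", "O"):
        n = composition.get(species, 0)
        if n:
            parts.append(species if n == 1 else f"{species}{n}")
    n_oh = composition.get("OH", 0)
    if n_oh:
        parts.append("(OH)" if n_oh == 1 else f"(OH){n_oh}")
    return "".join(parts)


if __name__ == "__main__":
    for label, n_m, n_p in (("MP", 1, 1), ("M2P", 2, 1), ("M3P2", 3, 2)):
        print(label, formula(polysome_composition(n_m, n_p)))
```

With this normalization MP gives Mg7Si8O22(OH)2, the amphibole formula quoted in the text, while M2P and M3P2 give Mg10Si12O32(OH)4 and Mg17Si20O54(OH)6 for the triple-chain and the mixed double- and triple-chain polysomes.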
The [001] electron diffraction pattern in the inset shows sharp spots for the pyroxene matrix, whereas the diffraction effects from amphibole appear as weaker and diffuse streaks. Several important points have been stressed, such as:
1. Hydrous biopyriboles may occur as local faults within an ordered matrix. Otherwise, they may concentrate in particular reactive regions, arranged according to completely disordered or partially ordered sequences. Several possible different sequences have been observed, and it has been questioned how many times they should repeat before being considered an ordered polysome, rather than a random fault. Veblen and Buseck (1979) computed the probability with which an ordered sequence may occur in a crystal after random placement of structural modules and proposed to use a threshold probability value to accept the given sequence as significant rather than random.
2. Within the faulted crystals, the different polysomes coexist according to a fixed number of geometrical conditions (Veblen and Buseck, 1980;
FIG.13. Three bidimensionally infinite amphibole lamellae cut across a clinopyroxene matrix. The inset shows the electron diffraction pattern, which consists of clinopyroxene spots and amphibole streaks.
Whittaker et al., 1981). These rules explain what the possible interfaces are, how a lamella may terminate within the matrix, what is the structural misfit occurring in that faulted region, and how hydration may propagate across the crystal using faulted zones as "motorways" for chemical reactions.
3. Based on the observed textures and the suggested reaction mechanisms, schematic descriptions of the relative stabilities of the polysomes, together with kinetic models for the transformations, have been proposed (Veblen and Buseck, 1981; Akai, 1982a).
Many other examples of polysomatic series have been reported (e.g., Ferraris et al., 1986). In all those cases, high-resolution imaging has been fundamental, in that this technique has shown the structural complexity existing within the crystals and has defined what the structures are and how they have formed.
D. Modulated Structures
Polytypism and polysomatism are perhaps among the simplest cases of modulation in minerals. Actually, modulated structures are extremely
common and constitute an interesting field of investigation. Buseck and Cowley (1983) have recently reviewed this field. Modulation may arise by structural as well as by chemical variation across the crystal and may produce either strictly periodic or statistically periodic arrangements. The occurrence of modulation can be demonstrated within electron diffraction patterns by the occurrence of satellite spots. If the spacings of these satellites are integral submultiples of the basic spacing, the structure is defined as commensurate (e.g., the 17-fold superstructure of antigorite in Fig. 14). Otherwise, nonintegral submultiples produce incommensurate modulations. From the point of view of the Earth Sciences, much of the interest in modulated structures arises from the fact that they may represent intermediate stages along a continuous or discontinuous reaction series. Important examples can be found in plagioclase feldspars (Carpenter, 1986) and in the serpentine antigorite (Mellini et al., 1987). In both cases, the study of the modulation character has been used to decipher the reaction behavior of the minerals.
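The distinction between commensurate and incommensurate modulations amounts to asking whether the ratio between the modulation period and the basic period is an integer. A minimal sketch of that test, written in real space rather than in terms of satellite spacings, is given below. The function name and the tolerance are assumptions made for illustration; with measured data the tolerance would have to reflect the experimental precision, and a finite measurement can never strictly prove incommensurability.

```python
def classify_modulation(basic_period, super_period, tol=0.02):
    """Classify a modulation from two real-space periods (same units).

    Returns ("commensurate", n) when super_period / basic_period lies
    within `tol` of the integer n, otherwise ("incommensurate", ratio).
    The tolerance is an arbitrary illustrative value.
    """
    if basic_period <= 0 or super_period <= 0:
        raise ValueError("periods must be positive")
    ratio = super_period / basic_period
    nearest = round(ratio)
    if nearest >= 1 and abs(ratio - nearest) <= tol:
        return "commensurate", nearest
    return "incommensurate", ratio


if __name__ == "__main__":
    # A 45 Å superperiod built on a basic period of 45/17 Å (17-fold case).
    print(classify_modulation(45 / 17, 45))
    # Hypothetical values chosen only to show a non-integral ratio.
    print(classify_modulation(2.65, 43.5))
```

The first call reproduces the 17-fold case quoted for antigorite; the second uses invented numbers and returns a non-integral ratio.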
FIG. 14. Electron diffraction pattern of antigorite, taken along [010]. The superstructure periodicity along a is 45 Å and corresponds to a 17-fold modulation of the basic reciprocal cell. The diffracted intensity is modulated by a pseudo-sinusoidal function.
V. MINERAL REACTIONS
A given mineralogical association is thermodynamically stable within a fixed range of P, T, X conditions (i.e., pressure, temperature, bulk composition of the system, compositions of the individual phases). Escaping the stability field, the system is expected to evolve towards new assemblages. If the reaction rate is fast enough, the new equilibrium conditions may be easily and quickly attained (e.g., beta-quartz is a nonquenchable phase, as it immediately transforms to alpha-quartz at the transition point). More often, transformations proceed more slowly or do not even proceed at all. Metastable material can then survive indefinitely ("A diamond is forever," in spite of thermodynamics). The existence of kinetic controls over the advancement of reactions is important in geology, for two reasons at least. First of all, the sluggishness of reaction rates allows one to sample rock specimens that still preserve the original high temperature and/or high pressure aspects. Using these data from the Earth's surface, geologists may guess about the nature of rocks and about the processes occurring within the lower crust or within the upper mantle. Secondly, geologists can follow along a profile (e.g., across a mountain belt as thick as the Alps) a complete suite of related rocks, all of them originating from a common precursor but showing different P and T conditions, and still indicating both gradients and polarities of the ancient deformative event. Microstructural analysis, at different scales, is an important tool, able to reveal the sequence of intermediate stages responsible for the final situation. In particular, the HREM scale may reveal details about quenched reactions and sluggish reaction mechanisms. An example may be the direct observation of the reacting phase, the product phase, and the reaction intermediates, occurring all together within a few tens of angstroms.
Based on the different chemical-physical parameters and the different geologic environments, earth scientists usually separate different types of mineral reactions. A few of them will now be exemplified, roughly in order of increasing reaction conditions.
B. Weathering and Alteration
The weathering of rocks occurs almost continuously over the whole earth's surface, and it is responsible for both the formation of soils and the modeling of landscapes. Much about the effects of weathering may be better understood by knowing what the mechanism is. Several important points may be addressed by HRTEM and AEM, for example, the determination of the
chemical composition, the degree of order, and the mineralogical nature of the early formed alteration products; determination of the possible geometrical features of the rock-fluid interaction (for instance, pore circulation rather than intergranular film or bulk diffusion); analysis of the textural relationships between altering matrix and early formed material; and study of the possible effects of previous deformation on the later alteration. There is almost no limit to the number of reactions that can be studied using both natural and treated specimens. The results may also have consequences with regard to several problems in applied mineralogy. Examples are the degradation of stones and the preservation of historical and artistic objects, as well as the expected behavior of new materials under reactive conditions. Owing to their abundance within crustal rocks, feldspars were analyzed quite early. Page and Wenk (1979) studied the alteration of plagioclase feldspars adjacent to a hydrothermal vein (T = 300°C) in the copper deposit of Butte. The reaction progress was studied by performing a 10-cm traverse across an alteration profile, ranging from fresh plagioclase to progressively more altered materials (smectite, sericite, quartz, and secondary orthoclase, in that order). By use of lattice images and thin-film chemical determinations, Page and Wenk (1979) were able to recognize the different minerals and to define their complex textural relationships, to perform a complete fine-scale zoneography of the alteration profile, and to suggest the following general pattern for alteration. First, plagioclase is altered into low-grade, 10 Å layer-silicate smectite. Smectite later changes to 2M1 sericite, through a series of intermediate layer silicates. Intermediate material is commonly disordered, it extends for a few unit cells, and it reveals curved layers and partial interlayering (Page, 1980).
From the compositional point of view, AEM indicates that sericite is strongly enriched in potassium content with respect to smectite. A lower thermal environment was investigated by Eggleton and Buseck (1980), who were careful to separate the actual low-temperature weathering effects from the previous igneous or hydrothermal episodes. They limited themselves to the weathering of potassium feldspar, under a humid climate with an annual rainfall of 1000 mm and a temperature of 0-40°C. Within those conditions, weathering occurred preferentially at defective surfaces (such as the boundaries between twinned and untwinned domains), rather than simply over the outer crystal surface. Circular holes were produced by dissolution of the feldspars and amorphous ring-shaped structures were formed within the holes. This amorphous material later became the precursor for 10 Å layer silicates. Two main textures of weathering/alteration can be recognized. The first is based on strongly related orientations, possibly akin to actual topotactic replacement (Fig. 15). An example is the retrograde transformation from
FIG. 15. Pseudo-parallel intermixed chlorite (c) and poorly crystalline material (pc). Lamellae of biotite may still be found within chlorite. The poorly crystalline material has an approximate stoichiometry of Fe:Al:Si = 5:1:1.
biotite to chlorite (Yau et al., 1984; Eggleton and Banfield, 1985), where small packets of 14 Å chlorite layers appear interleaved within the 10 Å biotite matrix. Several alternative mechanisms have been proposed to explain the biotite to chlorite replacement. In any case, apart from the actual operating mechanism, this transformation provides a further example of structural inheritance in the alteration product. The reaction may be further favored by the occurrence of tunnel-like structures localized at the reaction boundaries and able to facilitate the diffusion of ions (Ahn and Peacor, 1987). The second main texture exhibits the absence of crystallographic relationships and, probably, characterizes the lowest temperature environments. For instance, the formation of iddingsite rims on olivine has been explained by a two-step process (Eggleton, 1984). At first, olivine breaks into a mosaic of tiny
needle-like crystals, 50 Å in diameter and separated by channels 20 Å wide. Smectite nucleates within the channels and later overgrows to form veins that are misoriented with respect to olivine. Much interest has also been given to the noncrystalline Fe-Si-Al oxyhydroxides (Eggleton, 1987). This material consists of hollow spheres, with outer diameter between 50 and 1000 Å. Even if these oxyhydroxides may be negligible in volume, they deserve attention because of their important geochemical role, both in adsorption and transport processes.
C. Diagenetic Processes
The fine-grained material resulting from weathering is subsequently transported and accumulates on the bottom of deep sedimentary basins. The sequence of the sediments constitutes a record of the paleoclimatic and paleogeographic conditions existing during the deposition. However, as the thickness of accumulated sediments increases, the original mineralogical and lithological features may be obliterated by a mechanism known as burial diagenesis. Water is progressively expelled from the sediments, chemical exchange occurs, the amount of open space is reduced, detrital minerals may recrystallize, and newly formed, authigenic minerals may appear. On the whole, the sediments become progressively more lithified and change aspect, for instance, from mudstone to slate. Much interest has recently been devoted to the study of diagenetic processes in clay minerals, but data have sometimes been confusing. Both X-ray powder diffraction and electron microprobe analysis have been largely hampered by the fine-grained nature of the material. Some reported interlayered minerals (namely, ordered alternations of different basic layers) may result from analytical artifacts due to chaotic intergrowth of separated discrete phases. Therefore, TEM analysis of fine-grained sediments has been rapidly growing as a powerful tool for the mineralogical determination of sedimentary sequences. The current approach is based on the combined use of electron diffraction data from dispersed particles, the careful use of analytical electron microscopy over several grains, and the acquisition of lattice images, aiming more at the general appraisal of structural and textural relationships than at extreme resolution. A detailed analysis of the layer silicates from Gulf Coast sediments and from the Martinsburg Formation, which together represent a continuous transition from mudstone to slate, has been performed by Lee et al. (1985). At the textural level of the TEM scale, four main imperfections were observed.
They were: (a) irregular, curved layers of smectite, with high dislocation
densities; (b) termination of individual layers, with lateral dislocations; (c) formation of a mosaic-like arrangement of relatively ordered layer silicates; and (d) mixed layering and formation of faulted sequences. By the collection of several images, Lee et al. (1985) defined a trend of increasing perfection of the phyllosilicate crystals. In particular, the four types of imperfections decreased in abundance, descending from the top to the bottom of the sedimentary sequence. Not only did textural variation occur, but mineralogical changes also occurred along the sequence. For instance, the 14 Å layer silicate chlorite is absent from the 1750-m sample of the Gulf Coast sediment, and starts to appear in the 2450-m sample as lamellar packets 100-150 Å in thickness. Finally, chlorite becomes dominant over the 10 Å layer silicate (illite-smectite) in the 5500-m sample (Ahn and Peacor, 1985). The same sedimentary section has been studied by HRTEM and AEM in several different aspects. Ahn and Peacor (1986) studied the details of the smectite-to-illite transition. Whereas previous X-ray work had suggested the occurrence of mixed layering between illite and smectite, lattice images have conclusively shown that the two minerals occur together as discrete packets with different compositions. Whereas smectite gives the average composition K0.[?]6Na0.13(Si3.75Al0.25)(Al1.48Fe0.34Mg[?]), illite gives K0.63Na0.01(Si3.40Al0.60)(Al1.69Fe[?]Mg0.12).
D. Metamorphic Reactions
Metamorphic reactions produce complete renewal of the rock by producing new mineral assemblages. The P-T field of metamorphism extends up to extremely high values, just before melting of the rocks. The study of metamorphic reactions provides information about the wide P-T space, leads to a definition of the stability field of a given assemblage, and, finally, by combining data from a regional area, brings an understanding of the overall geodynamic evolution. Several examples of applications of HREM to the study of metamorphic reactions have been produced within a very limited number of years. The most important results will now be recalled. Carpenter (1981) pioneered the study of the formation and growth of antiphase domains within the clinopyroxene omphacite. Antiphase domains arise from short-range cation ordering, which reverses across the so-called
antiphase boundaries. Several omphacite specimens from different geological environments showed antiphase boundaries, and it was found that the size of the equiaxed antiphase domains changed with the geological environment in a highly meaningful way. In particular, the average size of the antiphase domains was 50-100 Å in blueschists (which are metamorphic rocks equilibrated at, say, 300°C), changed to 1000 Å in California eclogites (T = 500-600°C), and finally reached 3500 Å in the Nybo eclogite (T = 800°C or more). Even though a rate law for coarsening must also be considered, the size of the antiphase domains in omphacite is more sensitive to temperature than to time. Therefore, antiphase domains in omphacites can be exploited as quantitative geothermometric indicators of the peak temperature of metamorphism.
Corona structures are concentric rims of several minerals and form by incomplete reactions between adjacent mineral grains. As the annular arrangement of the reaction products depends on the changing values of P, T, and volatile composition, quantitative estimates of these parameters may be obtained from the fine study of those reaction rims. As the corona system is, by definition, an unequilibrated system, the HREM investigation constitutes a check for bulk determinations and may even reveal unexpected results. For instance, Griffin et al. (1989), while studying the corona structures of Bergen and Sognefjord, found submicrometric inclusions of pigeonite (that is, a very low calcium clinopyroxene stable at very high temperatures) within the normal clinopyroxene of the corona. The observed occurrence of pigeonite contributed to a reinterpretation of chemical data and to a determination of an initial temperature for the formation of coronas (T = 1000°C) that was higher than was previously expected. In that case, pigeonite was a relic of a high-temperature environment that was no longer observable from a macroscopic point of view.
However, lower temperature episodes, subsequent to the corona formation, were also evident in the TEM. The first is a hydration reaction that produces oriented amphibole lamellae within clinopyroxene (Fig. 13). The second is a more abundant hydration reaction that produces large amounts of hydrated minerals (Fig. 16). Therefore, the study of the corona not only reveals the original state (“the unaltered aspect at the optical microscope scale”), but also shows, at the finer HRTEM scale, its subsequent evolution. In this particular case, later retrogression is associated with thrusting of the anorthositic gabbros from the deep crust to the surface in the late Proterozoic (900 m.y.) and further translation during Caledonian time (400 m.y.).
Thermometamorphic aureoles are formed when magma intrudes the country rocks and modifies the previously existing mineral assemblages. Provided that the original geometrical relations have not been modified by later tectonic movement, aureoles constitute a highly valuable natural laboratory. In fact, a
complete range of reaction conditions is exposed, and several succeeding mineral assemblages can be mapped moving towards the contact. Under such conditions, Worden et al. (1988) studied the mechanism of thermal decomposition of the layer silicates chlorite and muscovite. A complete transformation to high-temperature minerals (spinel, cordierite, mullite) occurred
FIG. 17. Interface between reactant chlorite (c) and product pyroxene (p) in thermally decomposed metapelitic rocks.
close to the contact, whereas 10 m from the contact only partial decomposition affected the layer silicates. In that region, a quenched reaction state could be studied and indicated that the nucleation and growth of the thermometamorphic minerals were controlled by orientation relationships between the parent and the product phases. In particular, the closest-packed zones of the layer silicate are inherited within the reaction products (Fig. 17).
By combining lower grade samples from the regionally metamorphosed ophiolites with higher grade specimens from the thermal aureole of the Bregaglia Intrusive, Mellini et al. (1987) determined the complete evolution of antigorite within its stability field (that is, from 250°C to 550°C). Antigorite forms a polysomatic series, with variable composition and variable lattice parameters. It occurs as a fine-grained mineral, characterized by many important disorder phenomena (polysynthetic twinning, polysomatic disorder, modulation dislocations, wobbling and so on; Fig. 18). The lattice parameters were systematically determined by electron diffraction along a profile of increasing metamorphic grade. The modulating parameter a was found to decrease, on the average, from longer (60 Å) to shorter (35 Å) values with increasing metamorphic grade. At the same time, antigorite became progressively more silicon rich. Individual antigorite grains almost invariably showed a heterogeneous distribution of periodicities, with higher values close to grain boundaries or reaction fronts, and lower values toward the inner
FIG. 18. Complex structure of the serpentine mineral antigorite, Mg,,Si,O,(OH), h (a), in contact with pyroxene (p). (001) twin lamellae run SW-NE in the picture. Polysomatic disorder results in variation of the a periodicity, which increases towards the reaction front where pyroxene survives.
grain. As the transformations within antigorite require broken bonds and chemical diffusion, reactions were sluggish enough to prevent the complete achievement of equilibrium. Therefore, time became an important factor to be taken into account and limited the possible use of antigorite as an accurate geothermometer.
HRTEM analysis of metamorphic environments is not necessarily confined to inorganic systems, such as the silicates described up to now. Instead, metamorphic evolution may also be determined by studying elemental carbon. In fact, elemental carbon is a common constituent of many rocks and shows important modifications in very low-grade metamorphic environments. However, optical measurements or X-ray data may be confusing, as
indicated by the existing nomenclature. Buseck and Bo-Jun (1985) examined several carbons by HREM. The images provided considerably more information than was previously available. Even if the quantitative meaning of those many parameters is not yet completely assessed, we can expect that further work may produce a better understanding of the relationships existing among carbon crystallinity, metamorphic conditions, carbonaceous precursors, and sedimentary sources.
E. Metamict State and Radiation Damage
Metamict minerals are oxides or silicates that have partially or totally lost the original crystal order due to the radioactive decay of the thorium and uranium nuclides contained in these minerals. No longer a mineralogical curiosity, metamict minerals have been recently studied, as they offer a natural example of the long-term effects of radiation (Headley et al., 1981). This point is particularly important as mankind is now producing a growing amount of radioactive waste, and an extremely safe stabilization of that waste is obviously required (e.g., Ringwood, 1985). HRTEM and AEM may be extremely useful in assuring a good knowledge of the effects of radioactive decay within a given material, as well as of the mechanisms by which damage is produced. Yada et al. (1987) have extensively studied the nature of fission tracks in natural zircon, ZrSiO4, by using a 1 MeV electron microscope operating at nearly atomic resolution. They found that damage starts to appear as fission tracks, approximately 1000 Å long and 20-30 Å wide. No relic of crystal structure can be seen within the track. Damage proceeds through a sequence of intermediate stages, characterized by the survival of crystalline domains approximately 50-100 Å in size. These crystalline domains are tilted with regard to each other and are embedded within a structureless matrix. Finally, the completely amorphous state is achieved. During that evolution, contrast anomalies appear in the lattice image as irregular bright and dark spots. Based on skilled HREM technique and computer simulation, it has been concluded that these anomalies are point defects due to vacancies and interstitial atoms, produced by direct atomic collision with alpha particles or by passage of ionizing nuclear particles. Further work concerning radiation damage has been done on the pyrochlores microlite (Lumpkin et al., 1986a), betafite (Lumpkin and Ewing, 1987), and zirconolite (Lumpkin et al., 1986b).
Zirconolite, CaZrTi2O7, has been proposed as an important constituent of SYNROC, which is a crystalline assemblage that was proposed for use in nuclear waste disposal (Ringwood, 1985). Zirconolite has been chosen because it is a stable phase and is resistant to alteration and insensitive to weathering. Even though Ringwood (1985) suggested that the principal effect of alpha decay would only be a different
scheme of cationic distribution, within an essentially intact fluorite-type anion lattice, some controversy has affected that suggestion. In particular, Lumpkin et al. (1986b), based on several kinds of data, including HRTEM observations, suggested that the most likely structure for the fully damaged state is that of a random network, with no periodicity extending beyond the first coordination sphere. Despite the modification, zirconolite does not, however, show any major sign of geochemical alteration. Moreover, the metamict structure can be annealed back to crystalline zirconolite by timely heating. Much further work will be necessary for the satisfactory design and testing of waste disposal forms that are able to resist radiation damage, alteration, and leaching for extremely long times. However, two main points can be stressed here. First, we must use as much geology as possible, because geology gives us long-term information on the behavior of natural systems. Second, we may expect that both HRTEM and AEM will be extremely valuable in understanding the behavior of both natural and synthetic materials, possibly also using environmental reaction cells within the microscope.
F. Exsolution and Subsolidus Phenomena
A common textural feature found in igneous minerals is exsolution from an original high-temperature (HT) phase to form two chemically divergent low-temperature phases. The pristine composition of the HT phase depends on the melt composition and on the crystallization temperature. The final compositions of the exsolution products depend on the equilibration temperature attained during cooling. Finally, the textural relationships give information about the cooling rate, in that more coarsely grained associations indicate slower cooling conditions. The three main exsolution mechanisms are homogeneous nucleation, heterogeneous nucleation, and spinodal decomposition. This order roughly corresponds to an increasing cooling rate. Typical TEM textures for exsolved clinopyroxenes are given in Fig. 19. The figure refers to a clinopyroxene with a pre-exsolution composition of coming from the central portion of a tholeiitic dyke that was approximately 30 meters thick (Mellini et al., 1988). The cooling rate was intermediate between truly intrusive conditions (as found in intrusive bodies with kilometric extensions) and effusive, lava-like conditions. A fine-scale association of Ca-poor (0.22 calcium atoms per formula unit) and Ca-rich (0.67 calcium a.p.f.u.) pyroxenes was formed, with the thickness of the lamellae on the order of 200 Å and 1000 Å, respectively. Based on the TEM/EDS data, it was suggested that
FIG. 19. Typical exsolution texture of magmatic pyroxenes. Calcium-rich (r) augite alternates with calcium-poor pigeonite (p) lamellae.
crystallization had occurred at 1180°C and that intercrystalline chemical exchange had terminated at 950°C.
Figure 20 refers to a faster cooling rate. The specimen is a clinopyroxene from a mafic xenolith (namely, from a rock fragment originating from the deep Earth crust and transported to the surface by magma flow during eruption). Exsolution failed to develop lamellar texture and appears as spinodal decomposition, that is, without any sharp spatial or compositional boundary. A continuous compositional modulation wave occurs across the crystal, with pseudoperiodicity on the order of 400-500 Å. Grove (1982) extensively studied exsolved clinopyroxenes from lunar basalts and compared the observations with the data obtained on thermally treated, synthetic clinopyroxenes. The thickness of the exsolution lamellae was related to the cooling rate, and the coarsening of exsolution lamellae in synthetic pyroxenes was described by the rate law $\lambda = \lambda_0 + k t^{1/3}$, where $\lambda$ is the thickness of the (001) lamella, $t$ is time in days, and $k$ is 107.3. Based on that calibration, Grove (1982) determined cooling rates that were variable from -0.02°C/day to -20°C/day for the different lunar basalts. On one hand, those values were in keeping with independent parallel determinations. On the other hand, those measurements extended the range of estimable temperature down to 800°C.
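As a rough numerical illustration of the coarsening law quoted above, the following Python sketch evaluates the lamella thickness after a given annealing time and inverts the law to estimate the time needed to reach a given thickness. It assumes the form lambda = lambda0 + k t^(1/3), with k = 107.3 and t in days as stated in the text. The zero default for lambda0 and the function names are assumptions introduced here for illustration only, and the thickness comes out in whatever length unit k is expressed in, which the text does not specify.

```python
# Sketch of a cube-root coarsening law, lambda = lambda0 + k * t**(1/3).
# K_QUOTED is the constant quoted in the text (t in days); its length unit
# is not stated there, so results are in "the unit of k".
K_QUOTED = 107.3


def lamella_thickness(t_days, k=K_QUOTED, lambda0=0.0):
    """Return the lamella thickness after t_days of coarsening.

    lambda0 (initial thickness) defaults to 0.0, which is an assumption
    made for illustration; the text gives no value for it.
    """
    if t_days < 0:
        raise ValueError("time must be non-negative")
    return lambda0 + k * t_days ** (1.0 / 3.0)


def coarsening_time(thickness, k=K_QUOTED, lambda0=0.0):
    """Invert the law: days needed to coarsen from lambda0 to thickness."""
    if thickness < lambda0:
        raise ValueError("thickness must not be smaller than lambda0")
    return ((thickness - lambda0) / k) ** 3


if __name__ == "__main__":
    for days in (1, 8, 27):
        print(days, round(lamella_thickness(days), 1))
```

Because the law depends on the cube root of time, an eightfold increase in annealing time only doubles the thickness gained, which is why lamella thickness is a slowly varying but usable record of cooling history.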
FIG. 20. Spinodal decomposition in quickly cooled clinopyroxene from mafic xenoliths. No sharp compositional or structural limit can be drawn.
VI. EXTRATERRESTRIAL MINERALOGY
A. General
Extraterrestrial material is extremely important, not only as it forms the database for any geological approach to planetary bodies other than the Earth, but also for the geophysical modeling of the Earth itself. Moreover, the knowledge of the chemical, textural, and mineralogical features present in meteorites and in cosmic dust may favor a sounder understanding of the origin, structure, and evolution of planetary systems. For instance, meteorites offer us the chance to collect samples from the asteroid belt that is situated between Mars and Jupiter. Meteorites convey to the Earth’s surface the record of the lithological state existing at different depths in the parent body, no matter whether it was actually a broken-up planet or a swarm of smaller accretional bodies. More exotic material may enter the solar system via transport by one of the many comets, and, after melting of the ice in the comet, relics of that foreign material may be left behind. Finally, hydrated meteorites are a source of information about the nature and extent of chemical reactions in space, as well as about the significance of extraterrestrial water and hydrocarbons.
Unfortunately, as meteorites and interplanetary dust are not common, they should not be wasted on useless or material-consuming determinations. Furthermore, meteorites suffer dramatic surface heating on impact with the atmosphere. As a consequence, shock metamorphism and local thermal metamorphism episodes may partially obliterate the original lithological state, which now perhaps survives only as local, tiny relics. Taking into account all these problems, the high spatial resolution of the TEM techniques becomes extremely valuable, in that it offers a way to determine the complete structural, chemical, and textural characterization of small objects, such as micrometric grains of polyphasic dust or reaction rims in unequilibrated systems.
B. Meteoritic High-pressure Minerals
The spinelloids controversy constitutes an example of the successful application of conventional and high-resolution TEM techniques to high-pressure mineralogy. Olivine, (Mg, Fe)2SiO4, is an important constituent of the upper mantle, namely of the nearly outer portion of the Earth. However, early experimental work showed that the olivine structure may transform to a polymorphic spinel structure (the so-called gamma phase), as well as to a modified spinel structure (beta phase), at temperatures and pressures on the order of 100 kbar and 1000°C (Ringwood and Major, 1970). Phase transitions from olivine to spinel and/or to modified spinel have been suggested to explain the main seismic discontinuity occurring at 400 km depth within the Earth (Jeanloz and Thompson, 1983). Therefore, much of the interest in high-pressure research has moved towards the characterization of spinelloid phases (Price, 1983). Unfortunately, as reaction rates in silicate systems are usually sluggish, equilibrium conditions may fail to be attained, and metastable products may commonly be produced in experimental runs. Paradoxically, much of the laboratory work may be completely unable to cope with the geological problem that suggested the experiment. However, in the case of high-pressure research, support may be found in the “field evidence” offered to us by meteorites, such as the Tenham or the Peace River chondrites. Binns et al. (1969) first reported gamma-Mg2SiO4, also called ringwoodite, in the Tenham chondritic meteorite. The formation of ringwoodite was explained as being due to shock waves passing through the chondritic parent body after a major impact (Binns, 1970). However, the X-ray identification by Binns was not generally accepted, as it was suggested that the powder diffraction pattern might be due to admixed goethite (FeOOH), metallic iron, and garnet. The dispute was finally settled by Putnis and Price (1979).
By TEM analysis, they definitely showed that gamma-Mg2SiO4 actually occurs in the Tenham meteorite. They showed also that the extra lines of the X-ray powder pattern
were due to the occurrence of the beta phase, formed as a quench product of gamma-Mg2SiO4. However, the controversy was not completely over, in that interest in the mechanism of the gamma-to-beta phase transition was now growing because of the possible effects of that transformation on the dynamics of the Earth’s mantle and on the production of deep-focus earthquakes. Two main proposals were made. In the first, the TEM data were interpreted to suggest that gamma-Mg2SiO4 transformed to the beta phase by a topotactic replacement mechanism, that is, layer by layer (Price et al., 1982). The second interpretation also used TEM data, but denied the martensitic nature of the transformation and suggested nucleation and growth as an alternative mechanism (Vaughan et al., 1982; Boland and Liu, 1983). The point was particularly important, as martensitic transformations may be expected to depend on the orientation of the crystal with respect to the applied nonhydrostatic stress field, with possible consequences for the rheological properties of the mantle. Confusion was possibly due in part to the fact that different authors were not comparing chemically identical systems (e.g., magnesium, iron, or nickel silicates, as well as magnesium germanates), and in part because the natural specimens available, namely, the shocked meteorites, are strongly heterogeneous because of the pressure and temperature gradients produced during the shock metamorphism episodes. Therefore, several transformation mechanisms might well be present within the same meteorite, depending on the details of the shock-wave propagation (Madon and Poirier, 1983). HRTEM has been largely used to decipher this complex situation. For instance, Davies and Akaogi (1983) reported on the important, fine-scale phase intergrowth observed in synthetic Ni2SiO4-NiAl2O4 spinelloids. Not only were beta- and gamma-Mg2SiO4 found to occur closely intermixed with each other, but new structural types were also detected and described.
Doubts about the one-phase nature of the beta phase in other systems arose. Similar results were later obtained by Barbier and Hyde (1986) for the synthetic spinelloids in the system MgGa2O4-Mg2GeO4. In both systems, HRTEM indicated a lamellar association of several structures, and that textural indication was consistent with a possible layer-by-layer martensitic transformation. Those results were found to match the field evidence offered by the natural magnesium and iron spinelloids. Price (1983) performed an extensive computation of lattice images for spinelloids and was able to determine the nature of the stacking faults present within the beta phase. Basically, faults were due to displacement vectors R1 = 1/4[010] and R2 = 1/2[-101]. Also, more complex faulting was observed in the Peace River chondrite, and it was structurally interpreted by image-matching procedures. Finally, Price (1983) proposed a two-stage mechanism that was able to explain the gamma-to-beta transformation. In the first stage, a cooperative, or martensitic, transformation
produces the (110) faults and starts to convert spinel to the beta phase. Later, the development of several faults on crystallographically equivalent (110) planes of cubic spinel creates a critical volume of the beta phase, and, during this second stage, the critical volume acts as a nucleus and starts to grow in all three spatial directions.
C. Layer Structures and Hydrous Material
The study of the material forming the matrix of many carbonaceous chondrites offers a good example of the close relationships existing between scientific knowledge and technical improvements. Within 10 years, we moved from the knowledge of only bulk properties to a textural description of phase intergrowth by HRTEM and, finally, to structural and chemical determinations of what those phases are, by progressive integration of the HRTEM data with EDS and EELS analyses. Early optical and bulk chemistry work showed the occurrence of “serpentine-like” or “chlorite-like” material in carbonaceous chondrites. However, the very-fine-grained nature of that material seemed to prevent any reliable, unbiased optical, microprobe, or X-ray determinations. People described the matrix in terms of “spinach-phase,” and acronyms such as PCP were introduced to mean “poorly characterized phase.” PCP was potentially important to determine whether the hydrous alteration evident in the carbonaceous chondrite matrix was actually due to the terrestrial history of the meteorite, or might go back to hydration within the meteorite itself or in its parent body, or was perhaps due to condensation processes in a presolar nebula. Therefore, much interest was devoted to hydrous minerals and meteorite matrix in the early 1980s (e.g., Barber, 1985).
1. It was first realized that bulk chemistry, electron diffraction, and electron imaging data might be reconciled, suggesting the occurrence of Fe-rich serpentine [something like terrestrial cronstedtite, Fe3SiFeO5(OH)4]. This variegated serpentine was able to show several different textures (Akai, 1980; Barber, 1981).
2. However, material other than terrestrial serpentine did occur within the matrix of carbonaceous chondrites (Mackinnon and Buseck, 1979; Akai, 1980; Barber, 1981). The identification was largely based on lattice fringe periodicities, which typically were 17 Å or 18 Å (depending on the author’s ruler), 11 Å, and 7 Å.
By general agreement, it was concluded that the 7-Å structure was Fe-rich serpentine; the 11-Å structure was the ordered SB intergrowth between brucite, Mg(OH)2, and Fe-rich serpentine; and the 18-Å phase was the SBB ordered intergrowth (Mackinnon and Buseck, 1979;
Akai, 1982b; Mackinnon, 1982). Unfortunately, no really quantitative approach to the nature of the 11-Å and 18-Å structures was yet possible because of scanty electron diffraction data, imaging conditions far from the best setting, and an absence of reliable chemical data. Whereas many textural data were becoming available and the existence of unexpected structures was becoming evident, interpretative work on their actual nature was still lacking. Texture was not enough to explain such a can of worms.
3. The nature of the 11-Å and 18-Å structures was finally resolved by interpretation of the EDS and EELS chemical data (Barber et al., 1983). The 11-Å structure contained Fe, Ni, O, and S, but no silicon at all, and that was too low a value for a “serpentine-like phase.” Therefore, hypotheses other than intergrown brucite and serpentine had to be put forward. Barber et al. (1983) suggested possible relationships with an almost unknown, poorly defined terrestrial hydrous sulfide, tochilinite, consisting of alternating Mg(OH)2 and (Fe, Ni)S layers. This model was further supported by Mackinnon and Zolensky (1984), who also explained the nature of the 18-Å phase in terms of alternating tochilinite and serpentine modules (Fig. 21).
4. The final petrological appraisal of the problem is due to Tomeoka and Buseck (1985a). Starting from the known occurrence of iron and nickel alloys
FIG. 21. Proposed structural arrangement within the meteoritic 18-Å phase, with alternating 11-Å tochilinite and 7-Å serpentine. (From Mackinnon and Zolensky, 1984.)
(kamacite) and of iron and magnesium silicates (olivine and pyroxene) in meteorites, they interpreted the HRTEM textural evidence. The proposed mechanism was based on three separate stages of alteration, and, most importantly, on alteration within the parent body. According to them, PCP had now moved from the status of “poorly characterized phase” to the status of “partly characterized phase”: a still more developed status, that might be named “perfectly characterized phase,” may be expected in the near future.
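The module model of Mackinnon and Zolensky (1984) can be checked with simple arithmetic: stacking one 11-Å tochilinite module on one 7-Å serpentine module gives the 18-Å repeat. The Python sketch below sums nominal module thicknesses for candidate stacking sequences and lists those that match an observed fringe spacing. The module labels, the function names, the list of candidate sequences, and the 1-Å tolerance (chosen to cover the 17 Å versus 18 Å readings mentioned above) are assumptions made for illustration, not part of the published model.

```python
# Nominal module thicknesses in angstroms, taken from the fringe spacings
# quoted in the text: S = Fe-rich serpentine, T = tochilinite.
MODULE_THICKNESS = {"S": 7.0, "T": 11.0}

# Hypothetical candidate stacking sequences, chosen for illustration only.
CANDIDATES = ("S", "T", "TS", "SS", "TT")


def repeat_distance(sequence, thickness=MODULE_THICKNESS):
    """Sum the module thicknesses along one stacking repeat."""
    return sum(thickness[module] for module in sequence)


def match_fringe(observed, candidates=CANDIDATES, tolerance=1.0):
    """Return the candidate sequences whose repeat lies within
    `tolerance` angstroms of the observed fringe spacing."""
    return [
        seq for seq in candidates
        if abs(repeat_distance(seq) - observed) <= tolerance
    ]


if __name__ == "__main__":
    for spacing in (7.0, 11.0, 17.0, 18.0):
        print(spacing, match_fringe(spacing))
```

With these nominal values, only the tochilinite-plus-serpentine sequence falls within 1 Å of both the 17 Å and the 18 Å readings. A fringe spacing alone cannot identify the modules, which is why the chemical data were decisive.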
D. Interplanetary Dust Particles
The term interplanetary dust particles (IDPs) is used to identify a recently discovered class of extraterrestrial material, namely, particles smaller than 1 mm. This material seems to be rather common, both in space, as indicated by the craters produced on spacecraft, and on the Earth’s surface. For instance, most of the mass that is annually accreted by the Earth (approximately 10,000 tons) is actually formed from IDPs, rather than from large meteorites. IDPs are important not only in terms of their annual mass, but also as they seem to represent primitive materials formed in the early solar nebula. Therefore, much work on IDPs has been produced recently. However, many frustrating uncertainties about the nature and origin of IDPs still prevent a complete understanding. TEM analysis is largely used to propose acceptable solutions to those problems. Bradley et al. (1983) studied the so-called chondritic porous dust collected at an altitude of 20 km in the stratosphere. The specimen contained whiskers and platelets of the magnesium silicate enstatite, Mg2Si2O6. Even though enstatite is a common terrestrial mineral, and volcanoes release a lot of enstatite into the atmosphere, Bradley et al. (1983) excluded a terrestrial origin for the enstatite whiskers and platelets in the dust, using the results of the TEM analysis. In fact, habits, morphologies, and twin patterns were completely different from those of terrestrial magmatic and metamorphic enstatite. The specimens usually consisted of clinoenstatite and orthoenstatite lamellae, alternating with each other along the whisker axis. The role of screw dislocations in metastable crystallization processes was suggested as an explanation for the unusual morphologies. Finally, the particles were interpreted as being pristine gas-to-solid condensates, which have survived intact, perhaps from before even the earliest stages of solar system formation.
Owing to the varied accretion phenomena and subsequent possible reequilibration, the IDPs are highly heterogeneous and usually consist of several smaller particles. Therefore, any attempt to define the mineralogical nature of IDPs must necessarily distinguish between individual grains of the same dust aggregate. Within that approach, the analytical capabilities of the
HRTEM, equipped with EDS and EELS spectrometers, and capable of collecting CBED patterns, are unrivaled. Tomeoka and Buseck (1984) offer a nice example of the complex world that is enclosed within an individual IDP. In particular, they studied the particle with the nickname Low-Ca. Within Low-Ca, they observed a fluffy material, producing 10-Å lattice fringes, that was akin to terrestrial smectite in composition. Olivine, Mg2SiO4, occurred as submicrometric grains. Different iron and nickel sulfides were also detected. Magnetite and chromite were also present in Low-Ca. For each phase, relevant chemical, structural, and textural data were supplied, including high-resolution lattice images. Based on the overall picture offered to us by the TEM analysis, Low-Ca turned out to be quite unique in its mineralogy when compared to other IDPs previously studied. A subsequent study was later performed on the Skywalker IDP (Tomeoka and Buseck, 1985b). Skywalker is also a hydrated dust particle, which contains a smectite or mica-like phase, just as Low-Ca does. This phase, however, forms by hydrous alteration of pyroxene rather than olivine, as indicated by lattice images. Tomeoka and Buseck (1985b), therefore, used TEM to identify another, different example of aqueous interplanetary alteration, even if at the time it was not possible to state whether liquid water or water vapor was responsible for the alteration. As the available details on IDPs grew in number and accuracy, progressively deeper comparisons within the IDP family and with respect to carbonaceous chondrite meteorites were becoming possible. Obviously, each IDP aggregate constitutes a completely different object with respect to other aggregates. For instance, Spray (Christoffersen and Buseck, 1986) turned out to be composed of refractory minerals and, in particular, of calcium and aluminum silicates.
A specific point of interest was the observed symplectitic intergrowth between diopside, CaMgSi₂O₆, and spinel, MgAl₂O₄. That intergrowth was suggested to be indicative of a reaction between melilite, Ca₂(Mg,Al,Si)₃O₇, formed by gas-to-solid condensation, and vapor within the nebular gas. The study of IDPs by TEM has started only recently, and we cannot yet anticipate the final results. However, even if some details of phase identification or interplanetary processes might be questioned, there is no doubt that the current TEM work on IDPs is making a good contribution to our knowledge of interplanetary matter and interplanetary processes.

CONCLUSIONS
High-resolution electron microscopy has produced very important contributions within several different areas of the Earth Sciences. The most impressive results have perhaps been obtained in the widespread application
of high-resolution imaging to the study of the defective nature of minerals. Those investigations have widely demonstrated the important role of real structure in minerals, as well as the common deviations from ideal crystal order. Based on those results, controversial bulk data may be profitably interpreted. However, high-resolution electron microscopy has been applied not only to the study of defective structures, but also to the determination of average structures. These studies have largely been qualitative, in that they have produced only a general appraisal of structural topology rather than a quantitative model based on detailed bond geometry. Further development toward more quantitative determinations may, however, be expected, for example, by supporting the electron optical imaging approach with the measurement of diffracted intensities. More recently, interest has also moved toward the analytical capabilities of the electron microscope, through full exploitation of the high spatial resolution of the different analytical modes and through the introduction of new spectroscopic techniques. These new data further contribute to the usefulness of quantitative rather than qualitative HREM. That technical development has been accompanied by an increasing complexity of the problems under investigation. Whereas early applications of HREM were perhaps ancillary with respect to other techniques, HREM has now become one of the most important tools for the study of complex systems, such as mineral reactions and petrological equilibria. Most important, not only have more data been measured, but also, since qualitatively new sources of information have become available, new approaches to old problems have been developed, and these may produce important results in the near future.
REFERENCES

Ahn, J., and Peacor, D. R. (1985). Clays Clay Mineral. 33, 228-236.
Ahn, J., and Peacor, D. R. (1986). Clays Clay Mineral. 34, 165-179.
Ahn, J., and Peacor, D. R. (1987). Amer. Mineral. 72, 353-356.
Ahn, J., Peacor, D. R., and Essene, E. J. (1986). Ultramicr. 19, 375-382.
Akai, J. (1980). Mem. Nat. Inst. Pol. Res. 17, 299-310.
Akai, J. (1982a). Contrib. Mineral. Petrol. 80, 117-131.
Akai, J. (1982b). Mem. Nat. Inst. Pol. Res. 25, 131-144.
Amouric, M. (1987). Acta Cryst. B43, 57-63.
Amouric, M., and Baronnet, A. (1983). Phys. Chem. Minerals 9, 146-159.
Amouric, M., Baronnet, A., and Finck, C. (1978). Mat. Res. Bull. 13, 627-634.
Amouric, M., Mercuriot, G., and Baronnet, A. (1981). Bull. Mineral. 104, 298-313.
Amouric, M., Gianetto, I., and Proust, D. (1988). Bull. Mineral. 111, 29-37.
Barber, D. J. (1981). Geoch. Cosmoch. Acta 45, 945-970.
Barber, D. J. (1985). Clay Mineral. 20, 415-454.
Barber, D. J., Bourdillon, A., and Freeman, L. A. (1983). Nature 305, 295-297.
Barbier, J., and Hyde, B. G. (1986). Phys. Chem. Minerals 13, 382-392.
Baronnet, A. (1980). In "Current Topics in Materials Sciences," Vol. 5. North Holland, Amsterdam, 447-548.
Binns, R. A. (1970). Phys. Earth Planet. Interiors 3, 156-160.
Binns, R. A., Davis, R. J., and Reed, S. J. B. (1969). Nature 221, 943-944.
Boland, J. N., and Liu, L. (1983). Nature 303, 233-235.
Bradley, J. P., Brownlee, D. E., and Veblen, D. R. (1983). Nature 301, 473-477.
Bursill, L. A., Thomas, J. M., and Rao, K. J. (1981). Nature 289, 157-158.
Buseck, P. R. (1983). Amer. Scient. 71, 175-185.
Buseck, P. R., and Bo-Jun, H. (1985). Geoch. Cosmoch. Acta 49, 2003-2016.
Buseck, P. R., and Cowley, J. M. (1983). Amer. Mineral. 68, 18-40.
Carpenter, M. A. (1981). Contrib. Mineral. Petrol. 78, 441-451.
Carpenter, M. A. (1986). Phys. Chem. Minerals 13, 119-139.
Champness, P. E. (1987). Mineral. Mag. 51, 33-48.
Christoffersen, R., and Buseck, P. R. (1986). Science 234, 590-592.
Cowley, J. M., and Smith, D. J. (1987). Acta Cryst. A43, 737-751.
Davies, P. K., and Akaogi, M. (1983). Nature 305, 788-790.
Eggleton, R. A. (1984). Clays Clay Mineral. 32, 1-11.
Eggleton, R. A. (1987). Clays Clay Mineral. 35, 29-37.
Eggleton, R. A., and Banfield, J. F. (1985). Amer. Mineral. 70, 902-910.
Eggleton, R. A., and Buseck, P. R. (1980). Clays Clay Mineral. 28, 137-178.
Ferraris, G., Mellini, M., and Merlino, S. (1986). Rend. Soc. Ital. Mineral. Petrol. 40, 229-240.
Ferraris, G., Mellini, M., and Merlino, S. (1987). Amer. Mineral. 72, 382-391.
Gjonnes, J., and Moodie, A. F. (1965). Acta Cryst. 19, 65-67.
Griffin, W. L., Mellini, M., Oberti, R., and Rossi, G. (1985). Contrib. Mineral. Petrol. 91, 330-339.
Grove, T. L. (1982). Amer. Mineral. 67, 251-268.
Headley, T. J., Ewing, R. C., and Haaker, R. F. (1981). Nature 293, 449-450.
Iijima, S., and Buseck, P. R. (1978). Acta Cryst. A34, 709-719.
Jeanloz, R., and Thompson, A. B. (1983). Rev. Geoph. Sp. Phys. 21, 51-74.
Lee, J. H., Ahn, J. H., and Peacor, D. R. (1985). J. Sedim. Petrol. 55, 532-540.
Lumpkin, G. R., and Ewing, R. C. (1987). Proc. 45th Ann. Meet. Electr. Micr. Soc. Amer., 376-377.
Lumpkin, G. R., Chakoumakos, B. C., and Ewing, R. C. (1986a). Amer. Mineral. 71, 569-588.
Lumpkin, G. R., Ewing, R. C., Chakoumakos, B. C., Greegor, R. B., Lytle, F. W., Foltyn, E. M., Clinard, F. W., Boatner, L. A., and Abraham, M. M. (1986b). J. Mater. Res. 1, 564-576.
Mackinnon, I. D. R. (1982). Geoch. Cosmoch. Acta 46, 479-489.
Mackinnon, I. D. R., and Buseck, P. R. (1979). Nature 280, 219-220.
Mackinnon, I. D. R., and Zolensky, M. E. (1984). Nature 309, 240-242.
Madon, M., and Poirier, J. P. (1983). Phys. Earth Planet. Interiors 33, 31-44.
Mellini, M. (1982). Tschermaks Min. Petr. Mitt. 30, 249-266.
Mellini, M., Amouric, M., Baronnet, A., and Mercuriot, G. (1981). Amer. Mineral. 66, 1073-1079.
Mellini, M., Merlino, S., and Pasero, M. (1984). Phys. Chem. Minerals 99-105.
Mellini, M., Ferraris, G., and Compagnoni, R. (1985). Amer. Mineral. 70, 773-781.
Mellini, M., Merlino, S., and Pasero, M. (1986). Amer. Mineral. 71, 176-187.
Mellini, M., Trommsdorff, V., and Compagnoni, R. (1987). Contrib. Mineral. Petrol. 97, 147-155.
Mellini, M., Carbonin, S., Dal Negro, A., and Piccirillo, E. M. (1988). Lithos 22, 127-134.
Nakajima, Y., and Ribbe, P. H. (1981). Contrib. Mineral. Petrol. 78, 230-239.
Otten, M. T., and Buseck, P. R. (1986). Proc. 44th Ann. Meet. El. Micr. Soc. Amer., 706-707.
Otten, M. T., and Buseck, P. R. (1987). Phys. Chem. Minerals 14, 45-51.
Page, R. (1980). Contrib. Mineral. Petrol. 75, 309-314.
Page, R., and Wenk, H. R. (1979). Geology 7, 393-397.
Price, G. D. (1983). Phys. Earth Planet. Interiors 33, 137-147.
Price, G. D., Putnis, A., and Smith, D. G. W. (1982). Nature 296, 729-731.
Putnis, A., and Price, G. D. (1979). Nature 280, 217-218.
Ringwood, A. E. (1985). Mineral. Mag. 49, 159-176.
Ringwood, A. E., and Major, A. (1970). Phys. Earth Planet. Inter. 3, 89-108.
Spence, J. C. H. (1981). "Experimental High-Resolution Electron Microscopy." Clarendon Press, Oxford.
Tomeoka, K., and Buseck, P. R. (1984). Earth Planet. Sci. Letters 69, 243-254.
Tomeoka, K., and Buseck, P. R. (1985a). Geoch. Cosmoch. Acta 49, 2149-2163.
Tomeoka, K., and Buseck, P. R. (1985b). Nature 314, 338-340.
Thompson, J. B., Jr. (1978). Amer. Mineral. 63, 239-249.
Vaughan, P. J., Green, H. W., and Coe, R. S. (1982). Nature 298, 357-358.
Veblen, D. R. (1985a). AGU Geoph. Monogr. 31, 122-131.
Veblen, D. R. (1985b). Ann. Rev. Earth Planet. Sci. 13, 119-146.
Veblen, D. R., and Buseck, P. R. (1979). Amer. Mineral. 64, 687-700.
Veblen, D. R., and Buseck, P. R. (1980). Amer. Mineral. 65, 599-623.
Veblen, D. R., and Buseck, P. R. (1981). Amer. Mineral. 66, 1107-1134.
Veblen, D. R., and Buseck, P. R. (1983). Proc. 41st Ann. Meet. El. Micr. Soc. Am., 350-353.
Whittaker, E. J. W., Cressey, B. A., and Hutchison, J. L. (1981). Mineral. Mag. 44, 27-35.
Williams, D. B. (1984). "Practical Analytical Electron Microscopy in Materials Sciences." Philips Electron Optics Publishing Group, Mahwah.
Worden, R. H., Champness, P. E., and Droop, G. T. R. (1988). Proc. Conf. Phase Transformations. Inst. Metals, London.
Yada, K., Tanji, T., and Sunagawa, I. (1987). Phys. Chem. Minerals 14, 197-204.
Yau, Y. C., Anovitz, L. M., Essene, E. J., and Peacor, D. R. (1984). Contrib. Mineral. Petrol. 88, 299-306.
Zussman, J. (1987). Mineral. Mag. 51, 129-138.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 76
On Generalized Information Measures and Their Applications

INDER JEET TANEJA
I. Introduction
II. Shannon's Entropy and Its Generalizations
    A. Shannon's Entropy
    B. Entropy of Order r
    C. Entropy of Degree s
    D. Entropy of Kind t
    E. Entropy of Order r and Degree s
    F. List of Generalized Entropies
    G. Unified (r, s)-Entropy
    H. Analytic and Algebraic Properties of Unified (r, s)-Entropy
    I. Inequalities and Bounds on Generalized Entropies
III. Generalized Distance Measures
IV. Generalized Measures of Directed Divergence
V. Generalized Divergence Measures
    A. Information Radius and the J-Divergence
    B. Generalizations of R-Divergence
    C. Generalizations of J-Divergence
VI. Generalized Entropies for Multivariate Probability Distributions
    A. Entropy of Degree s for Multivariate Probability Distributions
    B. Unified (r, s)-Conditional Entropies
VII. Applications to Statistical Pattern Recognition
    A. Generalized Entropies, Distance Measures and Error Bounds
    B. Generalized Jensen Difference Divergence Measures and Error Bounds
    C. Generalized Measure of Chernoff, Bhattacharya Distance, and the Probability of Error
    D. Generalizations of J-Divergence and the Probability of Error
Entropy Graph
References
Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014676-2
I. INTRODUCTION

Information theory is a relatively new branch of mathematics that was made mathematically rigorous only in the 1940s. The term information theory does not possess a unique definition. Broadly speaking, information theory deals with the study of problems concerning any system. This includes information processing, information storage, information retrieval, and decision making. In a narrow sense, information theory studies all theoretical problems connected with the transmission of information over communication channels. This includes the study of uncertainty (information) measures and various practical and economical methods of coding information for transmission. The first studies in this direction were undertaken by Nyquist in 1924 and 1928 and by Hartley in 1928, who recognized the logarithmic nature of the measure of information. In 1948, Shannon published a remarkable paper on the properties of information sources and of the communication channels used to transmit the outputs of these sources. Around the same time, Wiener (1948) also considered the communication situation and came up, independently, with results similar to those of Shannon. In the last 40 years, the literature on information theory has become quite voluminous. Apart from communication theory, it has found deep applications in many social, physical, and biological sciences, e.g., economics, accounting, language, statistics, physics, ecology, psychology, pattern recognition, fuzzy set theory, computer sciences, etc. A key feature of Shannon's information theory is that the term information can often be given a mathematical meaning as a numerically measurable quantity, on the basis of a probabilistic model, in such a way that the solutions of many important problems of information storage and transmission can be formulated in terms of this measure of the amount of information.
This important measure has a very concrete operational interpretation: It roughly equals the minimum number of binary digits needed, on the average, to encode the message in question. The coding theorems of information theory provide such overwhelming evidence for the adequacy of the Shannon information measure that looking for essentially different measures of information might appear to make no sense at all. Moreover, it has been shown by several authors, starting with Shannon (1948), that the measure of the amount of information is uniquely determined by some rather natural postulates. Still, all the evidence that the Shannon information measure is the only possible one is valid only within the restricted scope of the coding problems considered by Shannon. As pointed out by Rényi (1961) in his fundamental paper on generalized information measures, in other sorts of problems other quantities
may serve just as well, or even better, as measures of information. This should be supported either by their operational significance or by a set of natural postulates characterizing them or, preferably, by both. Thus the idea of generalized entropies arises in the literature. It originated with Rényi (1961), who characterized a scalar parametric entropy as an entropy of order r, which includes Shannon's entropy as a limiting case. In these notes we propose to discuss various generalized entropies and generalized divergence measures. Mainly, we have taken generalizations of Shannon's entropy, directed divergence, J-divergence, and information radius. Applications of these generalized information measures to statistical pattern recognition are discussed. These studies have been conducted on the generalized information measures written in unified expressions. Most of the results presented are the author's own or joint contributions involving the author. In part, these results were presented as seminars at the Università di Salerno, Italy, during 1983/1984; as a summer course at the Universidade Federal do Rio de Janeiro, Brazil, during January and February 1987; and as a short course at the International Symposium on Information and Coding Theory, Campinas, Brazil, during July 1987.
II. SHANNON'S ENTROPY AND ITS GENERALIZATIONS
This section deals with Shannon's entropy and its generalizations. The generalizations considered involve one and two scalar parameters. Some of these generalizations are written in a unified way. Some properties of the unified entropy are studied. Some results connected with inequalities and entropy series are also specified.

A. Shannon's Entropy
Let

$$\Delta_n = \Big\{ P = (p_1, p_2, \ldots, p_n) : p_i > 0,\ i = 1, 2, \ldots, n,\ \sum_{i=1}^{n} p_i = 1 \Big\},$$

and

$$\Delta_n^{*} = \Big\{ P = (p_1, p_2, \ldots, p_n) : p_i \ge 0,\ i = 1, 2, \ldots, n,\ \sum_{i=1}^{n} p_i = 1 \Big\}, \qquad n \ge 2,$$

be two sets of discrete finite (n-ary) probability distributions. Shannon (1948) first investigated and characterized, through certain
postulates, a measure of information given by

$$H(P) = -\sum_{i=1}^{n} p_i \log p_i, \tag{1}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$. The quantity $H(P)$ in Eq. (1) is known in the literature as Shannon's entropy.

Remarks. Throughout these notes it is understood that all logarithms are in base 2 and that $0 \cdot \log 0 = 0$. Also, we shall take $p_{\max} = \max\{p_1, p_2, \ldots, p_n\}$.
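Eq. (1) and the conventions in the Remarks can be illustrated by a minimal sketch. The function name `shannon_entropy` and the two sample distributions are illustrative choices, not part of the original text; the convention $0 \cdot \log 0 = 0$ is implemented by skipping zero probabilities.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(P) of Eq. (1), in bits (base-2 logarithms).

    Zero probabilities are skipped, which implements 0 * log 0 = 0.
    """
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-12):
        raise ValueError("p must be a probability distribution")
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical example distributions.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.25, 0.0]

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)
p_max = max(skewed)
```

The uniform distribution on four outcomes carries the largest uncertainty among 4-ary distributions, while the skewed one, which includes a zero-probability outcome, carries less.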
Equation (1) is also known as a measure of uncertainty. It measures the amount of information contained in a distribution, i.e., the amount of uncertainty concerning the outcome of an experiment. It has been shown by several authors, starting with Shannon (1948), that the measure of the amount of information [Eq. (1)] is uniquely determined by rather natural postulates. In other words, it arises naturally from statistical concepts (Ash, 1965). For a brief review refer to Aczél and Daróczy (1973), Mathai and Rathie (1975), and Taneja (1979). In its characterization, three properties or postulates are mainly considered by several authors, together or separately. These properties are given by:
1. Additivity. We can write

$$H(P * Q) = H(P) + H(Q),$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, $Q = (q_1, q_2, \ldots, q_m) \in \Delta_m^{*}$, and $P * Q = (p_1 q_1, \ldots, p_1 q_m, \ldots, p_n q_1, p_n q_2, \ldots, p_n q_m) \in \Delta_{nm}^{*}$.

2. Recursivity or branching. We can write

$$H(p_1, p_2, \ldots, p_n) = H(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)\, H\!\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right), \qquad p_1 + p_2 > 0,$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$.

3. Sum representation. We can write

$$H(P) = \sum_{i=1}^{n} f(p_i),$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, where $f(p) = -p \log p$, $p \in [0, 1]$.
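The three properties above can be checked numerically. The following is a small sketch under assumed example distributions `P` and `Q` (hypothetical values); each `*_gap` variable holds the difference between the two sides of the corresponding identity and should vanish up to rounding.

```python
import math
from itertools import product

def H(p):
    """Shannon entropy in bits, with 0 * log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def f(x):
    """Summand f(p) = -p log p of the sum representation."""
    return -x * math.log2(x) if x > 0 else 0.0

# Hypothetical example distributions.
P = [0.5, 0.3, 0.2]
Q = [0.6, 0.4]

# 1. Additivity: H(P * Q) = H(P) + H(Q), with P * Q the product distribution.
PQ = [pi * qj for pi, qj in product(P, Q)]
additivity_gap = H(PQ) - (H(P) + H(Q))

# 2. Recursivity: merge p1 and p2, then add the weighted entropy of the split.
p1, p2, rest = P[0], P[1], P[2:]
p12 = p1 + p2
recursivity_gap = H(P) - (H([p12] + rest) + p12 * H([p1 / p12, p2 / p12]))

# 3. Sum representation: H(P) = sum_i f(p_i).
sum_gap = H(P) - sum(f(x) for x in P)
```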
B. Entropy of Order r
A systematic attempt to develop a generalization of Shannon's entropy was carried out by Rényi (1961), who characterized an entropy of order r given by

$$H_r(P) = (1 - r)^{-1} \log\Big(\sum_{i=1}^{n} p_i^{r}\Big), \qquad r \ne 1,\ r > 0, \tag{2}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, where r is a real parameter. We can easily verify that $\lim_{r \to 1} H_r(P) = H(P)$. Campbell (1965) was the first to show that the variable-length version of the elementary coding theorem carries over to the entropy of order r if one considers exponential averaging in the definition of average codeword length instead of the standard arithmetic procedure. Parker (1980) proved that a simple generalization of the Huffman algorithm solves the problem of minimizing generalized exponential length and its increasing functions, which includes, in particular, a generalized length in terms of the entropy of order r. Blumer (1982) considered the problem of minimizing redundancy of order r, defined in terms of the entropy of order r, and obtained bounds sharper than those of Gallager (1978). Taneja (1984a, 1985b) extended the concept of exponentiated average codeword length of order r to the best one-to-one codes. For some other applications of the entropy of order r, refer to Jelinek (1968a, 1968b), Jelinek and Schneider (1972), Csiszár (1974), Nath (1975), Arimoto (1975, 1976), Ben-Bassat and Raviv (1978), Kieffer (1979), Campbell (1985, 1987), and Kapur (1983, 1986). The entropy of order r satisfies additivity but lacks recursivity as well as the sum property. Rényi (1961) considered an additional axiom generalizing the sum property that is generally known as quasi-linearity. Based on the same motivations as Rényi, later researchers (Aczél and Daróczy, 1963; Varma, 1966; Kapur, 1967; and Rathie, 1970) generalized the entropy of order r by changing some of its postulates. These generalizations are given in Section II.F.

C. Entropy of Degree s
For operational purposes, it seems more natural to consider the simpler expression $\sum_{i=1}^{n} p_i^{r}$ as an information measure instead of Rényi's entropy of order r. So Havrda and Charvát (1967) proposed the following entropy of degree s:

$$H^{s}(P) = (2^{1-s} - 1)^{-1}\Big[\sum_{i=1}^{n} p_i^{s} - 1\Big], \qquad s \ne 1,\ s > 0, \tag{3}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$. In this case, we can also easily verify that $\lim_{s \to 1} H^{s}(P) = H(P)$. This quantity permits a simpler characterization (Havrda and Charvát, 1967). It lacks the additivity property but satisfies the recursivity of degree s and the sum representation given by

$$H^{s}(p_1, p_2, \ldots, p_n) = H^{s}(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)^{s}\, H^{s}\!\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right), \qquad p_1 + p_2 > 0,$$

and

$$H^{s}(P) = \sum_{i=1}^{n} f_s(p_i),$$

respectively, for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, where $f_s(p) = (2^{1-s} - 1)^{-1}[p^{s} - p]$, $s \ne 1$, $s > 0$, for all $p \in [0, 1]$. Taneja (1975) (refer also to Sharma and Taneja, 1975, 1977) studied a generalization of $H^{s}(P)$ involving two scalar parameters. Its exact expression, along with other entropies, is given in Section II.F. Some results on redundancy of degree s can be seen in Taneja (1986a).

D. Entropy of Kind t
Arimoto (1971) considered a generalized f-entropy involving a real function f satisfying some conditions. As an example of this generalized f-entropy, Arimoto arrived at a generalized entropy involving a real parameter, which we here call the entropy of kind t, given by

$${}_{t}H(P) = (2^{t-1} - 1)^{-1}\Big[\Big(\sum_{i=1}^{n} p_i^{1/t}\Big)^{t} - 1\Big], \qquad t \ne 1,\ t > 0, \tag{4}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$. In this case, also, we can easily verify that $\lim_{t \to 1} {}_{t}H(P) = H(P)$. Arimoto's main motivation in considering the generalized f-entropy was to prove some important results in decision theory connected with the Bayesian probability of error. This entropy of kind t is neither additive nor recursive, nor does it satisfy the sum representation. Thus, we see that out of the three generalized entropies, i.e., the entropy of order r, the entropy of degree s, and the entropy of kind t, the first one is additive, the second one is recursive and satisfies the sum representation, while the third one does not satisfy any of these properties. However, all contain the common function $\sum_{i=1}^{n} p_i^{r}$, which makes them related as follows:

$$H_r(P) = (1 - r)^{-1} \log\big[(2^{1-r} - 1) H^{r}(P) + 1\big] = r(1 - r)^{-1} \log\big[(2^{r^{-1} - 1} - 1)\, {}_{r^{-1}}H(P) + 1\big], \qquad r \ne 1,\ r > 0,$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$.
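The three one-parameter entropies of Eqs. (2)-(4), their common limit, and the relation linking them can be sketched numerically as follows. The function names (`renyi`, `degree_s`, `kind_t`) and the sample distribution are illustrative assumptions; the limits at parameter value 1 are only approximated by evaluating at 1 + 1e-6.

```python
import math

def power_sum(p, a):
    """Sum of p_i**a over the support of P (zero terms are skipped)."""
    return sum(x ** a for x in p if x > 0)

def shannon(p):
    """Shannon entropy H(P) of Eq. (1), in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi(p, r):
    """Entropy of order r, Eq. (2); requires r != 1, r > 0."""
    return math.log2(power_sum(p, r)) / (1.0 - r)

def degree_s(p, s):
    """Entropy of degree s, Eq. (3); requires s != 1, s > 0."""
    return (power_sum(p, s) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

def kind_t(p, t):
    """Entropy of kind t, Eq. (4); requires t != 1, t > 0."""
    return (power_sum(p, 1.0 / t) ** t - 1.0) / (2.0 ** (t - 1.0) - 1.0)

P = [0.5, 0.3, 0.2]   # hypothetical example distribution
r = 2.0

# Relation linking the three entropies through sum_i p_i**r.
via_degree = math.log2((2.0 ** (1.0 - r) - 1.0) * degree_s(P, r) + 1.0) / (1.0 - r)
via_kind = (r / (1.0 - r)) * math.log2(
    (2.0 ** (1.0 / r - 1.0) - 1.0) * kind_t(P, 1.0 / r) + 1.0
)

# Recursivity of degree s: merge p1 and p2, weight the split by (p1 + p2)**s.
s = 0.5
p12 = P[0] + P[1]
recursive = degree_s([p12] + P[2:], s) + p12 ** s * degree_s([P[0] / p12, P[1] / p12], s)

# Each entropy approaches Shannon's entropy as its parameter tends to 1.
eps = 1e-6
near_one = (renyi(P, 1 + eps), degree_s(P, 1 + eps), kind_t(P, 1 + eps))
```

Skipping zero probabilities in `power_sum` is a design choice that keeps the three functions defined on distributions with zero entries.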
E. Entropy of Order r and Degree s

Sharma and Mittal (1975) introduced and characterized two entropies, called the entropy of order 1 and degree s and the entropy of order r and degree s, given by

$$H_1^{s}(P) = (2^{1-s} - 1)^{-1}\Big[2^{(s-1)\sum_{i=1}^{n} p_i \log p_i} - 1\Big], \qquad s \ne 1,\ s > 0, \tag{5}$$

and

$$H_r^{s}(P) = (2^{1-s} - 1)^{-1}\Big[\Big(\sum_{i=1}^{n} p_i^{r}\Big)^{(s-1)/(r-1)} - 1\Big], \qquad r \ne 1,\ s \ne 1,\ r > 0,\ s > 0, \tag{6}$$

respectively, for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$. Sharma and Mittal's main motivation was to generalize the three entropies $H_r(P)$, $H^{s}(P)$, and ${}_{t}H(P)$. With this aim, they arrived at $H_r^{s}(P)$. $H_r^{s}(P)$ reduces to $H^{s}(P)$ and ${}_{t}H(P)$ when $r = s$ and $r^{-1} = t = 2 - s$, respectively. $H_r^{s}(P)$ reduces to $H_1^{s}(P)$ and $H_r(P)$ when $r \to 1$ and $s \to 1$, respectively. Also, $H_1^{s}(P)$ reduces to Shannon's entropy, $H(P)$, when $s \to 1$. Thus, we can see that the entropy of order r and degree s contains, either as a limiting case or as a particular case, Shannon's entropy, the entropy of order r, the entropy of degree s, the entropy of kind t, and the entropy of order 1 and degree s. The entropies $H_1^{s}(P)$ and $H_r^{s}(P)$ are not additive, not recursive, and do not have sum representations. Before proceeding further, we shall present in the next section a list of most of the generalized entropies known in the literature. For convenient reference, the entropies listed above are also written again.
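The reductions of the entropy of order r and degree s described above can be checked numerically. This is a sketch only: the function names and the sample distribution are assumptions made for illustration, and the limiting cases $r \to 1$ and $s \to 1$ are approximated by evaluating at 1 + 1e-6.

```python
import math

def power_sum(p, a):
    """Sum of p_i**a over the support of P."""
    return sum(x ** a for x in p if x > 0)

def shannon(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi(p, r):            # Eq. (2)
    return math.log2(power_sum(p, r)) / (1.0 - r)

def degree_s(p, s):         # Eq. (3)
    return (power_sum(p, s) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

def kind_t(p, t):           # Eq. (4)
    return (power_sum(p, 1.0 / t) ** t - 1.0) / (2.0 ** (t - 1.0) - 1.0)

def order1_degree_s(p, s):  # Eq. (5), written through H(P)
    return (2.0 ** ((1.0 - s) * shannon(p)) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

def order_r_degree_s(p, r, s):  # Eq. (6)
    return (power_sum(p, r) ** ((s - 1.0) / (r - 1.0)) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

P = [0.5, 0.3, 0.2]   # hypothetical example distribution
eps = 1e-6

# Particular cases: r = s gives degree s; r = 1/t, s = 2 - t gives kind t.
t = 0.5
same_as_degree = order_r_degree_s(P, 0.7, 0.7)
same_as_kind = order_r_degree_s(P, 1.0 / t, 2.0 - t)

# Limiting cases: r -> 1 gives Eq. (5); s -> 1 gives Eq. (2).
near_order1 = order_r_degree_s(P, 1.0 + eps, 0.7)
near_renyi = order_r_degree_s(P, 2.0, 1.0 + eps)
```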
F. List of Generalized Entropies
For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, the following generalized entropies are known in the literature by their respective authors, starting with Shannon (1948). In some cases it is understood that $P \in \Delta_n$. By no means can we say that the list is complete. At the end of this chapter it is shown graphically how these entropies reduce to Shannon's case, either as a limiting case or as a particular case.

Shannon (1948)
Rényi (1961)

Aczél and Daróczy (1963)
$$\phi_4(P) = (s - r)^{-1} \log\left(\frac{\sum_{i=1}^{n} p_i^{r}}{\sum_{i=1}^{n} p_i^{s}}\right), \qquad r \ne s,\ r > 0,\ s > 0$$
Varma (1966)

Havrda and Charvát (1967)
&(P)
= (21-s
-
l)-l
c
p;
[iIl
-
]
1 ,
s # 1, s > 0
Belis and Guiusu (1968) PiwilogPi &(P) = - i = l " , Piwi i= 1
w i > 0 , i = 1 , 2 ,...,n
Rathie (1970)
si 2 1 , i = 1,2,..., n , r # l , r > 0
Arimoto (1971)

$$\phi_{12}(P) = (2^{t-1} - 1)^{-1}\Big[\Big(\sum_{i=1}^{n} p_i^{1/t}\Big)^{t} - 1\Big], \qquad t \ne 1,\ t > 0$$
Sharma and Mittal (1975)

$$\phi_{13}(P) = (2^{1-s} - 1)^{-1}\Big[\Big(\sum_{i=1}^{n} p_i^{r}\Big)^{(s-1)/(r-1)} - 1\Big]$$
Taneja (1975) (refer also to Sharma and Taneja, 1975, 1977)
s # kn, k = 0, 1,2,..., r
Picard (1979)
i= t
>0
s#l,.s>0
$r \ne 1$, $s \ne 1$, $r > 0$, $s > 0$, where $v_i > 0$, $i = 1, 2, \ldots, n$, and $P = (p_1, p_2, \ldots, p_n) \in \Delta_n$.
Ferreri (1980)
Sant 'anna and Taneja (1983) 423(p)
f pi);[gol
= -i =
sin sp,
1
i= 1
0<s <
2 sin -
sin sp,
sin sp,
2 sin -
2 sin -
.
and sinsp,
,
O<s
2 sin

The last entropy listed does not reduce to Shannon's entropy but has some properties similar to those of Shannon's entropy (Sant'anna and Taneja, 1983).

G. Unified (r, s)-Entropy
The reduction of $H_r^{s}(P)$ to ${}_{t}H(P)$ is obtained by substituting $r = 1/t$ and $s = 2 - t$. As $s > 0$, this implies that $0 < t < 2$. This means that ${}_{t}H(P)$ is contained in $H_r^{s}(P)$ only for $0 < t < 2$. To avoid this problem, let us relax the condition of the positivity of s, i.e., we shall allow any real s. Instead of studying the properties of the generalized entropies given in Eqs. (2)-(6) individually, our aim here is to study them in a unified way. This unification is as follows:

$$\mathcal{E}_r^{s}(P) = \begin{cases} H_r^{s}(P), & r \ne 1,\ s \ne 1,\ r > 0, \\ H_1^{s}(P), & r = 1,\ s \ne 1, \\ H_r^{1}(P), & r \ne 1,\ s = 1,\ r > 0, \\ H(P), & r = 1,\ s = 1, \end{cases} \tag{7}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, where $H_r^{1}(P)$ is the same as $H_r(P)$ given in Eq. (2). From now onwards we shall use the notation $H_r^{1}(P)$ instead of $H_r(P)$. The entropies $H^{s}(P)$ and ${}_{t}H(P)$ do not appear in the unified expression given in Eq. (7) because they are particular cases of $H_r^{s}(P)$ and hence are already contained in it. According to the notation above, $H^{s}(P)$ given in Eq. (3) means $H_s^{s}(P)$, i.e., $r = s$ in $H_r^{s}(P)$. Henceforth, we shall use this notation too. We call the unified expression (7), i.e., $\mathcal{E}_r^{s}(P)$, the unified (r, s)-entropy. As specified before, our aim here is to study some important properties of the unified (r, s)-entropy. Before proceeding further, we shall first present some definitions and composition relations used in the subsequent sections.

Definition 1. A numerical function $\theta: \Delta_n^{*} \to \mathbb{R}$ (reals) is concave in $\Delta_n^{*}$ if for all $P, U \in \Delta_n^{*}$, we have

$$\theta(\lambda P + \mu U) \ge \lambda\, \theta(P) + \mu\, \theta(U),$$

where $\lambda + \mu = 1$, $\lambda > 0$, $\mu > 0$. For convex functions the above inequality is reversed.

Definition 2. A numerical differentiable function $\theta: \Delta_n^{*} \to \mathbb{R}$ (reals) is pseudoconcave in $\Delta_n^{*}$ if for all $P, U \in \Delta_n^{*}$, we have

$$\nabla\theta(P)(U - P) \le 0 \ \text{ implies } \ \theta(U) \le \theta(P),$$

where $\nabla$ represents the gradient operator. For pseudoconvexity, we have

$$\nabla\theta(P)(U - P) \ge 0 \ \text{ implies } \ \theta(U) \ge \theta(P),$$

for all $P, U \in \Delta_n^{*}$.

Definition 3. A numerical differentiable function $\theta: \Delta_n^{*} \to \mathbb{R}$ (reals) is quasiconcave in $\Delta_n^{*}$ if for all $P, U \in \Delta_n^{*}$, we have

$$\theta(U) \ge \theta(P) \ \text{ implies } \ \nabla\theta(P)(U - P) \ge 0.$$

For quasiconvexity we have

$$\theta(U) \le \theta(P) \ \text{ implies } \ \nabla\theta(P)(U - P) \le 0,$$

for all $P, U \in \Delta_n^{*}$.
Definition 4. Schur concavity. Before giving the definition of Schur concavity, we first define the concept of majorization.

Majorization. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$ and $U = (u_1, u_2, \ldots, u_n) \in \Delta_n^{*}$, we say that P is majorized by U, i.e., $P \prec U$, if

$$\sum_{k=1}^{m} p_k \le \sum_{k=1}^{m} u_k, \qquad m = 1, 2, \ldots, n,$$

with $p_1 \ge p_2 \ge \cdots \ge p_n$ and $u_1 \ge u_2 \ge \cdots \ge u_n$, or equivalently, if there exists a doubly stochastic matrix $(c_{ik})$, $c_{ik} \ge 0$, $i, k = 1, 2, \ldots, n$, with $\sum_{i=1}^{n} c_{ik} = \sum_{k=1}^{n} c_{ik} = 1$, such that

$$p_i = \sum_{k=1}^{n} c_{ik} u_k, \qquad i = 1, 2, \ldots, n.$$

Schur concavity. A numerical function $\theta: \Delta_n^{*} \to \mathbb{R}$ (reals) is Schur concave in $\Delta_n^{*}$ if $P \prec U$, i.e., P is majorized by U in $\Delta_n^{*}$, implies $\theta(P) \ge \theta(U)$ for all $P, U \in \Delta_n^{*}$. For Schur convexity, for all $P, U \in \Delta_n^{*}$, we have

$$P \prec U \ \text{ implies } \ \theta(P) \le \theta(U).$$

Definition 1 is already known in the literature. For definitions 2 and 3 refer to Mangasarian (1961). For definition 4 refer to Marshall and Olkin (1979) or to Hardy et al. (1934). Let

$$g_s(x) = (2^{1-s} - 1)^{-1}\big[2^{(1-s)x} - 1\big], \qquad s \ne 1, \tag{8}$$

be a function defined for all $x \ge 0$. Then we can write

$$H_r^{s}(P) = g_s[H_r^{1}(P)], \tag{9}$$

and

$$H_1^{s}(P) = g_s[H(P)], \tag{10}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$.

The following proposition holds:

Proposition 2.1. We have

(i) $\lim_{s \to 1} g_s(x) = x$.

(ii) $g_s(x) \ge 0$ for all $x \ge 0$ and any s.

(iii) $g_s$ is an increasing function of x.

(iv) $g_s$ is a convex function of x for $s < 1$.

(v) $g_s$ is a concave function of x for $s > 1$.

(vi) $g_s(x) \ge N(s)\,x$ for $s < 1$, and $g_s(x) \le N(s)\,x$ for $s > 1$.

(vii) $g_s(x) \ge x$ if ($x \ge 1$, $s < 1$) or ($x \le 1$, $s > 1$), and $g_s(x) \le x$ if ($x \le 1$, $s < 1$) or ($x \ge 1$, $s > 1$),

where

$$N(s) = \frac{(1 - s)\ln 2}{2^{1-s} - 1} \begin{cases} < 1, & s < 1, \\ = 1, & s = 1, \\ > 1, & s > 1. \end{cases} \tag{11}$$

Proof. Parts (i)-(v) are easy verifications.

(vi) It follows from the known result (Hardy et al., 1934, p. 106, Theorem 150),

$$\ln v \le v - 1, \qquad v > 0,$$

where we substitute $v = 2^{(1-s)x}$.

(vii) It follows from the result (Hardy et al., 1934, p. 40, Theorem 42),

$$v^{y} - 1 \begin{cases} \ge y(v - 1), & y \ge 1, \\ \le y(v - 1), & 0 \le y \le 1, \end{cases} \qquad v \ge 0,$$

where we substitute $v = 2^{1-s}$ and $y = x$.
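The composition relations (9) and (10) suggest a compact way to evaluate the unified (r, s)-entropy of Eq. (7): compute the entropy of order r and apply $g_s$. The sketch below follows that route and spot-checks parts (ii), (vi), and (vii) of Proposition 2.1 on a small grid. The function names, the grid values, and the treatment of s = 1 through the limit $g_1(x) = x$ are illustrative assumptions.

```python
import math

def g(s, x):
    """g_s(x) of Eq. (8); for s = 1 the limit g_1(x) = x is used."""
    if s == 1.0:
        return x
    return (2.0 ** ((1.0 - s) * x) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

def N(s):
    """N(s) of Eq. (11), with N(1) = 1."""
    if s == 1.0:
        return 1.0
    return (1.0 - s) * math.log(2.0) / (2.0 ** (1.0 - s) - 1.0)

def order_r(p, r):
    """H_r^1(P): entropy of order r, with Shannon's entropy at r = 1."""
    if r == 1.0:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(sum(x ** r for x in p if x > 0)) / (1.0 - r)

def unified_entropy(p, r, s):
    """Unified (r, s)-entropy of Eq. (7), evaluated via relations (9) and (10)."""
    return g(s, order_r(p, r))

P = [0.5, 0.3, 0.2]   # hypothetical example distribution

# Direct evaluation of Eq. (6) for comparison with the composition route.
r, s = 2.0, 0.5
direct = (sum(x ** r for x in P) ** ((s - 1.0) / (r - 1.0)) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

# Spot checks of Proposition 2.1 on a small grid of x and s values.
tol = 1e-12
checks = []
for x in (0.0, 0.5, 1.0, 2.0, 3.5):
    for s_val in (-1.0, 0.5, 2.0, 3.0):
        gx = g(s_val, x)
        nonneg = gx >= -tol                                   # part (ii)
        if s_val < 1.0:
            part_vi = gx >= N(s_val) * x - tol
            part_vii = gx >= x - tol if x >= 1.0 else gx <= x + tol
        else:
            part_vi = gx <= N(s_val) * x + tol
            part_vii = gx <= x + tol if x >= 1.0 else gx >= x - tol
        checks.append(nonneg and part_vi and part_vii)
```

Routing every case of Eq. (7) through `g` keeps the four branches in one place; evaluating Eq. (6) directly gives the same value, as the `direct` comparison shows.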
H. Analytic and Algebraic Properties of Unified (r, s)-Entropy
In this subsection, we shall study some properties of the unified (r, s)-entropy given in Eq. (7). Some of these properties can be seen in Capocelli and Taneja (1985). Unless otherwise specified, it is understood that the results given below are true for all $r > 0$ and any s.

Property 1. Nonnegativity. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, $\mathcal{E}_r^{s}(P) \ge 0$.

Property 2. Continuity. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, $\mathcal{E}_r^{s}(P)$ is a continuous function of P.

Property 3. Symmetry. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, $\mathcal{E}_r^{s}(P)$ is a symmetric function of its arguments, i.e.,

$$\mathcal{E}_r^{s}(p_1, p_2, \ldots, p_n) = \mathcal{E}_r^{s}(p_{\tau(1)}, p_{\tau(2)}, \ldots, p_{\tau(n)}),$$

where $\tau$ is an arbitrary permutation of $\{1, 2, \ldots, n\}$.

Property 4. Normality. $\mathcal{E}_r^{s}(\tfrac{1}{2}, \tfrac{1}{2}) = 1$.

Property 6. Limiting case. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, we have

$$\lim_{r \to \infty} \mathcal{E}_r^{s}(P) = \begin{cases} (2^{1-s} - 1)^{-1}\big[p_{\max}^{\,s-1} - 1\big], & s \ne 1, \\ -\log p_{\max}, & s = 1. \end{cases}$$

For $s = 1$, the result, i.e., $\lim_{r \to \infty} H_r^{1}(P) = -\log p_{\max}$, can be seen in Shiva et al. (1973). For $s \ne 1$, the result can be proved by the composition relation given in Eq. (9).

Property 7. Monotonicity. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^{*}$, $\mathcal{E}_r^{s}(P)$ is a decreasing function of r (s fixed).
For s = 1, refer to Shiva et al. (1 973). For s # 1, s > 0, refer to Sharma and Mittal (1975). While the extension to s 0 is an easy verification. Property 8. Concauity. For all P = (p1,p2,.. . , p , ) E A:, &;(P) is a concave function of P for ail (r,s) E rl,where ( r , . ~ r) :> 0 with s 2 r or s 2 2
-
(12)
It is already known that H z ( P )and , H ( P )are concave functions of P for all the values of the parameters. H,!(P)is a concave function of P for 0 < r < 1. The concavity of H ( P ) is already known. The concavity of H",P) for s > 1 is a
INFORMATION MEASURES AND THEIR APPLICATIONS
34 1
direct consequence of the concavity of H ( P ) because of relation (10) and propositions 2.1 (iii) and 2.l(v). The concavity of H"P) for s 2 2 - l/r, r # 1 , s # I , r > 0 can be seen in Van der Pyl (1977). The concavity of H"P) for s 2 r > 0 follows on the lines of proposition 4.2(ii), and it can be seen in Taneja (1988~).Thus combining all these we get the required result. Property 9. P seudoconcauity. For all P = ( p l , p z ,. . . ,p n ) E A:, &;( P ) is a pseudoconcave function of P for all r > 0 and for any s. Pseudoconcavity of H ( P ) follows from the concavity of H ( P ) , because every concave function is pseudoconcave. For pseudoconcavity of H,!( P ) , r # 1, Y > 0 refer to Ben-Bassat and Raviv (1978). The pseudoconcavity of H : ( P ) ( r # I , s # 1, r > 0) and H ; ( P )follows from proposition 2.l(iii) with the composition relations (9) and (lo), respectively. Property 10. Quasiconcavity. For all P = ( p I , p 2,..., p , ) quasiconcave function of P for all r > 0 and any s.
E
A:, a ; ( P ) is a
The proof follows from property 9, because every pseudoconcave function is quasiconcave (Mangasarian, 1961, pp. 143, Theorem 5). Property 11. Schur concavity. For all P = ( p l , p z , . . . ,p , ) Schur-concave function of P for all r > 0 and any s.
E
A:, &:( P ) is a
Prooj. In case of Shannon's entropy, H ( P ) , the result is already known (Csiszir and Korner, 1961). In view of relations (9), (lo), and proposition 2,l(iii), it is sufficient to prove the result only for Hf(P) ( r # 1, r > 0). Let P, U E A: such that
and
$$\sum_{i=1}^{n} c_{ik} = \sum_{k=1}^{n} c_{ik} = 1, \qquad c_{ik} \ge 0, \quad i, k = 1, 2, \ldots, n.$$
We can write
342
INDER JEET TANEJA
We know that (Gallager, 1968, pp. 523)
for all $i = 1, 2, \ldots, n$. Taking $\log(\cdot)$ on both sides of Eq. (14), multiplying by $(1-r)^{-1}$ ($r \ne 1$), and using the expression (13), we get
$$H_r^1(P) \ge H_r^1(U), \qquad r \ne 1,\ r > 0.$$
This completes the proof.

Property 12. Maximality. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^*$, $\mathscr{E}_r^s(P)$ is maximum when the probability distribution is uniform, i.e.,
for all $r > 0$ and any $s$. The proof follows from property 11 (Marshall and Olkin, 1979, p. 7).
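The maximality property can be checked numerically for the entropy of order $r$. The sketch below assumes the standard form $H_r^1(P) = (1-r)^{-1}\log\sum_i p_i^r$ with base-2 logarithms; the function names are illustrative and not taken from the text.

```python
import math

def renyi_entropy(p, r):
    """Entropy of order r in bits (assumed standard form); r = 1 gives Shannon's entropy."""
    if abs(r - 1.0) < 1e-12:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(sum(x ** r for x in p if x > 0)) / (1.0 - r)

def uniform(n):
    """Uniform distribution on n points."""
    return [1.0 / n] * n

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    for r in (0.5, 1.0, 2.0, 5.0):
        # The uniform value log2(n) bounds the entropy of any other distribution.
        print(r, renyi_entropy(p, r), renyi_entropy(uniform(3), r))
```

For every order tried, the value at the uniform distribution equals $\log_2 n$ and is not exceeded by the value at $P$.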
I. INEQUALITIES AND BOUNDS ON GENERALIZED ENTROPIES
In this subsection we shall provide some inequalities involving generalized entropies. Upper and lower bounds on the unified (r,s)-entropy in terms of maximum probability are given. Some bounds on the entropy series in the case of Shannon's entropy are also given. Inequality I. we have
Inequalities among entropies. For all P = ( p p z ,. . .,p,)
5 H";P), 2 H",f), (ii) H:(P){ (iii) H ; ( P )
forallr# 1,r>0.
5 H(P), H(P),
r > 1,s # 1, 0 < r < I , s # 1, r > 1, 0 < r < 1,
2 N(s)* Hf(P),
5 N ( s ) . H,'(P),
s < 1, s > 1,
EAt,
where N ( s ) appearing in (5)and (iv) is given by Eq. (1 I). (v) H m {
I H,!(P), 5 H,!(P),
( H , ' ( P )2 I , s < 1) ( H , ' ( P )S I , s < I )
or or
(H,!(P) $ I , s > I), (H,!(P)>= I, s > 1)
for all r # 1, r > 0. (vi) H ; ( P ) { 2 H(P)' 5 H(P),
( H ( P )2 I, s < 1) ( H ( P )5 1, s < 1)
or or
( H ( P ) g 1, s > l), ( H ( P )2 1, s > 1).
The proof of parts (i) and (ii) follows from property 7. Parts (iii) and (iv) follow from proposition 2.1(vi). Parts (v) and (vi) follow from proposition 2.1(vii).
For the proof for $s = 1$, i.e., in the case of the entropy of order $r$, $H_r^1(P)$ ($r \ne 1$, $r > 0$), refer to Shiva et al. (1973). The other cases follow from relations (9) and (10), and from proposition 2.1(iii).

Inequality 3. Bounds on $\mathscr{E}_r^s(P)$. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^*$, we have
...
(iii) $1 - p_{\max} \le \tfrac{1}{2}\,\mathscr{E}_r^s(P)$.
Inequalities 3.(i)and 34i) are true for all r > 0 and any s, while inequality 3.(iii) is true for (r,s) E r2,where 1 or pmax2 -, (s 2 2 or (r,s) E rl 2
for all r > 0.
Proof. (i) In case of Shannon's entropy, the left-hand side of the inequality can be proved by recursivity property and the right-hand side follows from Jensen's inequality (McEliece, 1977). In view of relations (9) and (lo), and proposition 2.l(iii), it is sufficient to prove the result only for the entropy of order r, i.e., we need to show that
n-o ( n - o) times We know that (Gallager, 1968, pp. 523)
,
1 5 o 5 n. (17)
n
where $1 \le \sigma \le n$. Taking $\log(\cdot)$ on both sides of Eq. (18), multiplying by $(1-r)^{-1}$ ($r \ne 1$), and simplifying, we get the right-hand side of inequality (17). Let us now prove the left-hand side. Again, we know that (Gallager, 1968, p. 523)
Similarly,
where $1 \le \sigma \le n$ in Eqs. (19) and (20). Adding Eqs. (19) and (20), we get
Taking $\log(\cdot)$ on both sides of Eq. (21), multiplying by $(1-r)^{-1}$ ($r \ne 1$), and simplifying, we get the left-hand side of Eq. (17).

(ii) Without loss of generality we can suppose that $p_n = p_{\max}$. Let $\sigma = n - 1$; then $1 - p_{\max} = 1 - p_n = \sum_{i=1}^{\sigma} p_i$. Making these substitutions in part (i), we get the required result.
(iii) The proof of this part is divided into two parts. First part. In this part we will show that
for all r > 0. Consider a function
and
where
(2' -s - l)-l(s - l)p"-2, In2 p '
s = 1,
and
Also,
G>
[ - = ((1)
=o.
s # 1,
If s < 2, t c ( p ) < 0 for all p E (0,1]. This implies that the function [ , ( p ) is strictly concave and attains its single maximum at t : ( p ) = 0, i.e., when [2(1 - s)-'(2'-&- I)] 1i s - 2 , SZ1, s = 1.
p = L 2 2
Thus the only zeros of t , ( p ) are when p -m(s 5 1). Thus for s < 2, we have
1
LO,
Similarly for s > 2, we have
I
10, -
For s = 2, we have
+ 0,
(,(p)
+
1 ifO
LO,
U P )=
1/2 or p = 1. As p
1 ifO
10, -
5AP) =
=
0 < p 5 1.
& ( p ) = 0,
Finally,
We have an equality sign in Eq. (23) when s = 2, p = 1/2, or p replace p by pmax= max{p,,p2, . ..,p,,] in Eq. (23), we have
=
1 . If we
for all s 2, I / @ 5 pmax5 1/2 or s 2 2, pmax2 1/2. Expression (24) and property 6 together give
(25) where CrS, ( P ) = lim 8:(f'). r-
r
By property 7, we can write 0 < r < 00. €g(P)5 &;(f), Expressions (25) and (26) together give (22). Second part. In this part we shall show that
1 1 - Pmax 5 j €:(P),
1
2 -2'
(r, S ) E rl,
~ m a x
(27)
where rl is as given in Eq. (12). In order to prove this we shall use the following properties: (el) € : ( p , 1 - p ) is a continuous function of p E [0,1]. (ez) € ; ( p , 1
-
p ) is a concave function of p
(e4) S;(I , 0) = d';(O, 1)
= 0.
(es)
S &:(P),
- Pmax,Pmax)
P
E
[0, I] for (r,s)E rl.
EAt.
We know that the graph of $1 - \max\{p, 1-p\}$, $0 \le p \le 1$, consists of two straight lines, one between $(0,0)$ and $(\tfrac12, \tfrac12)$ and one between $(\tfrac12, \tfrac12)$ and $(1,0)$. Thus using (e1)-(e4), we have
$$1 - \max\{p, 1-p\} \le \tfrac{1}{2}\,\mathscr{E}_r^s(p, 1-p), \qquad \max\{p, 1-p\} \ge \tfrac12, \quad (r,s) \in \Gamma_1. \tag{28}$$
Substituting $\max\{p, 1-p\}$ with $\max\{p_1, p_2, \ldots, p_n\} = p_{\max}$ in Eq. (28) and using (e5), we get Eq. (27). Thus, combining Eqs. (22) and (27), we get the required result.

Inequality 4. Generalized Shannon's or Gibbs' inequalities. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n$ and $U = (u_1, u_2, \ldots, u_n) \in \Delta_n$, we have
$$\mathscr{E}_r^s(P) \le {}^{\alpha}\mathscr{E}_r^s(P\|U), \qquad \alpha = 1 \text{ and } 2, \tag{29}$$
where
V : ( P l1 U ) =
(21 -s - l)-'C"M,(P I I U ) r p 1 - s - ,)-1[2'" I)II(PllU) -
-
11,
(1 - r)-l ~ o g C " ~ r ( P I I ~ ) I , WPII U ) ,
11,
r#l,s#l,r>O, r = l , s # 1, r # 1,s= 1,r>0, r = 1,s= 1
for $\alpha = 1$ and 2, with
and
$$H(P\|U) = -\sum_{i=1}^{n} p_i \log u_i. \tag{30}$$
Proof. Nath (1975) and Van der Pyl (1978) proved the following inequalities:
$$H_r^1(P) \le {}^{\alpha}H_r(P\|U), \qquad r \ne 1,\ r > 0 \tag{31}$$
for all $P, U \in \Delta_n$, $\alpha = 1$ and 2, where
and
In the limiting case we have
$$\lim_{r \to 1} {}^{1}H_r(P\|U) = \lim_{r \to 1} {}^{2}H_r(P\|U) = H(P\|U),$$
where $H(P\|U)$, given in Eq. (30), is the well-known inaccuracy measure (Kerridge, 1961). In this case, the inequality
$$H(P) \le H(P\|U) \tag{34}$$
for all $P, U \in \Delta_n$ is the well-known Shannon's or Gibbs' inequality. The remaining part of the proof follows from relations (9) and (10), and proposition 2.1(iii) applied to Eqs. (31) and (34).
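Inequality (34) is easy to check numerically. The following is a minimal sketch, assuming base-2 logarithms and strictly positive $u_i$; the function names are illustrative.

```python
import math

def shannon_entropy(p):
    """Shannon's entropy H(P) in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def inaccuracy(p, u):
    """Kerridge's inaccuracy H(P||U) = -sum_i p_i log u_i, as in Eq. (30)."""
    return -sum(x * math.log2(y) for x, y in zip(p, u) if x > 0)

if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    u = [0.2, 0.3, 0.5]
    # Shannon's (Gibbs') inequality: H(P) <= H(P||U), with equality when U = P.
    print(shannon_entropy(p), inaccuracy(p, u), inaccuracy(p, p))
```

The gap $H(P\|U) - H(P)$ is the directed divergence $D(P\|U)$ of Section IV, which is why it vanishes when $U = P$.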
for all P, U E A,,, where 'Hr(PII V )and *H,(PII V )are given in Eqs. (32) and (33), respectively, and Df(PIlU)=(l-r)-'log
)
cpiu,'-' ,
(i11
r#l,r>O
is the directed divergence of order r (Renyi, 1961) given in Section IV.
Proof. We know that [Van der Pyl(1978)l
i:PI
i= 1
Taking $\log(\cdot)$ on both sides of Eq. (36), multiplying by $(1-r)^{-1}$ ($r \ne 1$), and simplifying, we get the required result.

Inequality 5. Bounds on the entropy series. Let $P_\infty = (p_1, p_2, \ldots)$ be a sequence of probabilities such that $p_n \ge 0$, $n \ge 1$, $\sum_{n=1}^{\infty} p_n = 1$, with $p_n \ge p_{n+1}$. It is well known (Wyner, 1973) that the entropy series $H(P_\infty)$ given by
$$H(P_\infty) = -\sum_{n=1}^{\infty} p_n \log p_n$$
converges if and only if the series
converges. Moreover, the following inequalities (Capocelli, Santis, and Taneja, 1988) hold:
(0 H ( e J 2 S(P,) 2 0, (ii) H ( P 3 2 UP,),
(iii) H ( f , )
5 W k ( f x )k, = 1, 2,...,
where
and
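$+\,0.766k + 8.531$ is a constant independent of the probability distribution $P_\infty$, and for $x \ge 0$, $i = 1, 2, \ldots$,
$$\log^{0} x = x, \qquad \log^{i} x = \begin{cases} \log(\log^{i-1} x), & \text{if } \log^{i-1} x \ge 1, \\ 0, & \text{otherwise,} \end{cases}$$
and
$$\log^{*} x = \begin{cases} 0, & 0 < x \le 1, \\ \log x + \log^{*}(\log x), & x > 1, \end{cases}$$
i.e.,
$$\log^{*} x = \log x + \log(\log x) + \log[\log(\log x)] + \cdots$$
with addends all positive.

(v)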
lim [ L ( f , ) - Wl(f,)]
=
a.
s(P)- r
(vi) lim [Wk(fx)- Wk+l(P,)] = m. S(P)+ n
(vii)
lim [W,(P,)
s(P)- x
-
V(Pm)]= m.
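The iterated logarithm $\log^{i} x$ and the function $\log^{*} x$ used in these bounds can be sketched as follows. Base-2 logarithms are an assumption of this sketch, and the function names are illustrative.

```python
import math

def iterated_log(x, i):
    """log^i x: apply log2 i times, returning 0 once the previous iterate drops below 1."""
    for _ in range(i):
        if x < 1:
            return 0.0
        x = math.log2(x)
    return x

def log_star(x):
    """log* x = log x + log(log x) + ..., keeping only the positive addends."""
    total = 0.0
    while x > 1:
        x = math.log2(x)
        total += x
    return total

if __name__ == "__main__":
    # For x = 16 the addends are 4, 2, 1, after which the recursion stops.
    print(log_star(16), [iterated_log(16, i) for i in range(5)])
```

The loop stops as soon as the current iterate is at most 1, which mirrors the recursive definition: $\log^{*} x = 0$ for $0 < x \le 1$.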
111. GENERALIZED DISTANCE MEASURES
The quantity $\sum_{i=1}^{n} p_i^{r}$, $r > 0$, plays an important role in the entropy of order $r$ and degree $s$. Let us write it in a simplified form, given by
for all P = ( p l , p 2 , . . .,pn) E A:. The quantity G;(P) given in Eq. (37) is either called the generalized distance measure (Boekee and Van der Lubbe, 1979; Capocelli et al., 1985) or the generalized certainty measure (Van der Lubbe et al., 1984). Another generalized distance measure considered by Capocelli et al. (1985) is given by
T;(p)=[g] 1/r-p
,
r # p, r L 0 , p 2 0
(38)
i= 1
for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^*$. The quantities (37) and (38), in particular, contain the distance measures considered by Trouborst et al. (1974), Györfi and Nemetz (1975), Devijver (1974), and Vajda (1968). The generalized distance measures (37) and (38) satisfy some properties (Capocelli et al., 1985) given in the following two propositions:

Proposition 3.1. For all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^*$, we have

(e1) (i) $G_r^\rho(P)$ is a convex function of $P$ for $r > 1$, $r\rho \ge 1$ or $0 < r < 1$, $\rho < 0$.
(ii) $G_r^\rho(P)$ is a concave function of $P$ for $0 < r < 1$, $\rho > 0$, $r\rho \le 1$.

(e2) (i) $G_r^\rho(P)$ is a decreasing function of $r$ ($\rho$ fixed and $\rho > 0$).
(ii) $G_r^\rho(P)$ is an increasing function of $r$ ($\rho$ fixed and $\rho < 0$).
(iii) $G_r^\rho(P)$ is a decreasing function of $\rho$ ($r$ fixed and $r > 1$).
(iv) $G_r^\rho(P)$ is an increasing function of $\rho$ ($r$ fixed and $0 < r < 1$).
(e3) (i) Gf(1 - Pmax, Pmax) 5 G ; ( p )
O
O(orr>l,p
(ii) Gf(l - P m a x , Pmax) 2 Gf(P)
r>l,p>O(orO
= ( p I , p 2 , . . ,pn)E An, we
have
(el)(i) T f ( P )is an increasing function of r ( p fixed). (ii) T;(P) is an increasing function of p ( r fixed). (e2) (i) T;(P) S
Pmax.
Applications of these properties of G;(P) and Tf(P) in obtaining bounds on the probability of error are given in propositions 7.2 and 7.3, respectively.
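The monotonicity properties listed in the two propositions can be explored numerically. The sketch below assumes the forms $G_r^\rho(P) = (\sum_i p_i^r)^\rho$ for Eq. (37) and $T_r^\rho(P) = (\sum_i p_i^r / \sum_i p_i^\rho)^{1/(r-\rho)}$ for Eq. (38); these are assumptions chosen to agree with the stated properties, and the function names are illustrative.

```python
def g_measure(p, r, rho):
    """Assumed form of G: (sum_i p_i^r)^rho."""
    return sum(x ** r for x in p if x > 0) ** rho

def t_measure(p, r, rho):
    """Assumed form of T: (sum_i p_i^r / sum_i p_i^rho)^(1/(r - rho)), r != rho."""
    num = sum(x ** r for x in p if x > 0)
    den = sum(x ** rho for x in p if x > 0)
    return (num / den) ** (1.0 / (r - rho))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    # G decreases in r for rho > 0; T increases in r and stays below p_max.
    print(g_measure(p, 2, 1), g_measure(p, 3, 1))
    print(t_measure(p, 2, 1), t_measure(p, 3, 1), max(p))
```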
IV. GENERALIZED MEASURES OF DIRECTED DIVERGENCE

Kullback and Leibler (1951) first introduced a measure of information between two probability distributions as
$$D(P\|U) = \sum_{i=1}^{n} p_i \log \frac{p_i}{u_i} \tag{39}$$
for all $P, U \in \Delta_n$. Equation (39) is known in the literature as a function of discrimination, relative information, or directed divergence between the distributions. Rényi (1961) first presented a parametric generalization of Eq. (39) as
$$D_r^1(P\|U) = (r-1)^{-1} \log\left(\sum_{i=1}^{n} p_i^{r} u_i^{1-r}\right), \qquad r \ne 1,\ r > 0 \tag{40}$$
for all $P, U \in \Delta_n$.

Another well-known generalization of Eq. (39) is given by
for all $P, U \in \Delta_n$. The following limits are easy to check:
$$\lim_{r \to 1} D_r^1(P\|U) = \lim_{s \to 1} D_s^s(P\|U) = D(P\|U).$$
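These limits can be checked numerically. The sketch below assumes base-2 logarithms, strictly positive distributions, the standard order-$r$ form $(r-1)^{-1}\log\sum_i p_i^r u_i^{1-r}$, and, for the degree-$s$ measure, the form $(1 - 2^{1-s})^{-1}[\sum_i p_i^s u_i^{1-s} - 1]$ (the $r = s$ case of the two-parameter family). The function names are illustrative.

```python
import math

def kl_divergence(p, u):
    """Directed divergence D(P||U) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(p, u) if x > 0)

def renyi_divergence(p, u, r):
    """Directed divergence of order r (r != 1, r > 0), assumed standard form."""
    total = sum((x ** r) * (y ** (1.0 - r)) for x, y in zip(p, u))
    return math.log2(total) / (r - 1.0)

def degree_s_divergence(p, u, s):
    """Directed divergence of degree s (s != 1), assumed normalization (1 - 2^(1-s))^-1."""
    total = sum((x ** s) * (y ** (1.0 - s)) for x, y in zip(p, u))
    return (total - 1.0) / (1.0 - 2.0 ** (1.0 - s))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    u = [0.2, 0.5, 0.3]
    eps = 1e-6
    # Both generalizations approach D(P||U) as the parameter tends to 1.
    print(kl_divergence(p, u),
          renyi_divergence(p, u, 1 + eps),
          degree_s_divergence(p, u, 1 + eps))
```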
Sharma and Mittal (1977) studied the following two generalizations:
$$D_1^s(P\|U) = (1 - 2^{1-s})^{-1}\left[2^{(s-1)D(P\|U)} - 1\right], \qquad s \ne 1, \tag{42}$$
and
$$D_r^s(P\|U) = (1 - 2^{1-s})^{-1}\left[\left(\sum_{i=1}^{n} p_i^{r} u_i^{1-r}\right)^{(s-1)/(r-1)} - 1\right], \qquad s \ne 1,\ r \ne 1,\ r > 0 \tag{43}$$
for all $P, U \in \Delta_n$. Again, we can easily verify the following limits:
$$\lim_{s \to 1} D_r^s(P\|U) = D_r^1(P\|U); \qquad \lim_{s \to 1} D_1^s(P\|U) = D(P\|U).$$
When $r = s$ in Eq. (43), we have $D_r^s(P\|U) = D_s^s(P\|U)$.
As in the case of generalized entropies, here we can also write
$$D_r^s(P\|U) = q_s\left(D_r^1(P\|U)\right). \tag{44}$$
The following proposition holds:

Proposition 4.1. The following are true:

(i) $\lim_{s \to 1} q_s(x) = x$ for all $x \ge 0$.
(ii) $q_s(x) \ge 0$ for all $x \ge 0$ and any $s$.
(iii) $q_s(x)$ is an increasing function of $x$.
(iv) $q_s(x)$ is a convex function of $x$ for $s > 1$.
(v) $q_s(x)$ is a concave function of $x$ for $s < 1$.
Proof. Parts (i)-(v) are easy verifications.
(vi) We know that
Substituting $u = 2^{(s-1)x}$ in the above inequality, we get

Multiplying both sides of Eq. (48) by $(1 - 2^{1-s})^{-1}$ ($s \ne 1$) and using Eq. (11), we get the required result.
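Parts (i)-(v) of proposition 4.1 can be illustrated numerically. The sketch below assumes $q_s(x) = (1 - 2^{1-s})^{-1}[2^{(s-1)x} - 1]$ for $s \ne 1$ and $q_1(x) = x$, which is the form implied by comparing Eqs. (40), (42), and (43); the function name is illustrative.

```python
def q(s, x):
    """Assumed composition function: (2^((s-1)x) - 1) / (1 - 2^(1-s)), with q_1(x) = x."""
    if abs(s - 1.0) < 1e-12:
        return x
    return (2.0 ** ((s - 1.0) * x) - 1.0) / (1.0 - 2.0 ** (1.0 - s))

if __name__ == "__main__":
    # (i) limit as s -> 1; (iii) monotone in x; (iv)/(v) midpoint convexity checks.
    print(q(1 + 1e-7, 2.0))
    print(q(2.0, 0.5), q(2.0, 1.0))
    print(q(2.0, 1.0), (q(2.0, 0.0) + q(2.0, 2.0)) / 2)
    print(q(0.5, 1.0), (q(0.5, 0.0) + q(0.5, 2.0)) / 2)
```

With this form, composing $q_s$ with the order-$r$ divergence reproduces the two-parameter expression, since $2^{(s-1)D_r^1}$ equals the power sum raised to $(s-1)/(r-1)$.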
Let us put the measures given in Eqs. (39), (40), (41), and (42) in a unified way as follows:
(Di(PIIU),
r # I , s # 1, r > 0,
for all $P, U \in \Delta_n$. The measure $D_s^s(P\|U)$ given in Eq. (41) is not written in the unified expression (49) because it is already contained in $D_r^s(P\|U)$ as a particular case when $r = s$. We call the measure $\mathscr{F}_r^s(P\|U)$ a unified $(r,s)$-directed divergence.
Remurks. The definitions of D”,PIIU) and DS(PIIU) given initially by Sharma and Mittal (1979) involve s > 0, but here, in our study, we have relaxed this condition. The constant initially considered was (2’-’ - l ) - ’ x (s # I). Here we have taken (1 - 2’-‘)-’(.s # 1) to simplify our study on applications of the measures given in Section VII. To avoid difficulties that arises in the measures when some of the probabilities become zero, we have taken A, instead of A t . For a simplified characterization of D;(PllU) refer to Taneja (1984b). More details on the measures D ( P ( ( U ) ,Df(PIIU), and Di(PIIU)can be seen in Mathai and Rathie (1975) and Taneja (1979). Some important properties of F ; ( P ( lU ) are summarized in the following proposition. Proposition 4.2. For all P, U E A,, the unified (r, s)-directed divergence, .Fi(PlI U ) satisfies the following:
(i) $\mathscr{F}_r^s(P\|U) \ge 0$ for all $r > 0$ and any $s$.
(ii) $\mathscr{F}_r^s(P\|U)$ is a convex function of the pair $(P, U) \in \Delta_n \times \Delta_n$ for all $s \ge r > 0$.
(iii) $\mathscr{F}_r^s(P\|U)$ is an increasing function of $r$ ($s$ fixed).
(v) Let
and
where
$$\sum_{i=1}^{n} c_{ik} = \sum_{k=1}^{n} c_{ik} = 1, \qquad c_{ik} \ge 0, \quad i, k = 1, 2, \ldots, n.$$
Then
Proof. (i) In view of relations (44), (45), and proposition 4.1(iii), it is sufficient to prove the nonnegativity of $D_r^1(P\|U)$. However, the nonnegativity of $D_r^1(P\|U)$ is already known in the literature (Mathai and Rathie, 1975).
(ii) The proof of this part is based on the following two lemmas (Taneja, 1986b).

Lemma 4.1. The quantity
$${}^{3}M_r(P\|U) = \sum_{i=1}^{n} p_i^{r} u_i^{1-r} \tag{52}$$
is a convex function of the pair $(P, U) \in \Delta_n \times \Delta_n$ for $r > 1$ or $r < 0$ and is concave for $0 < r < 1$.

Lemma 4.2. The function $\xi(x) = x^{w}$ is convex for $w > 1$ or $w < 0$ and is concave for $0 < w < 1$.
The proof of Lemma 4.1 can be seen in Ferentinos and Papaioannou (1983) and in Csiszar (1972). Lemma 4.2 is already known in the literature.
Proof of (ii). Let $P_\alpha = (p_{\alpha 1}, p_{\alpha 2}, \ldots, p_{\alpha n}) \in \Delta_n$ and $U_\alpha = (u_{\alpha 1}, u_{\alpha 2}, \ldots, u_{\alpha n}) \in \Delta_n$, $\alpha = 1$ and 2. From Lemma 4.1, we have
E.,
1 p;pf;' + jL,C ~ ; ~ u : ; ' n
n
i=1
i= 1
where $\lambda_1 > 0$, $\lambda_2 > 0$, and $\lambda_1 + \lambda_2 = 1$. By the concavity of the logarithmic function, we can write
Inequality (54) is obtained from inequality (53), where we have used the fact that the logarithmic function is increasing. Multiplying both sides of (54) by $(r-1)^{-1}$ ($r \ne 1$), we get
$$\lambda_1 D_r^1(P_1\|U_1) + \lambda_2 D_r^1(P_2\|U_2) \ge D_r^1(\lambda_1 P_1 + \lambda_2 P_2 \,\|\, \lambda_1 U_1 + \lambda_2 U_2), \qquad 0 < r < 1.$$
This gives the convexity of $D_r^1(P\|U)$ for $0 < r < 1$. The convexity of $D_1^s(P\|U)$ for $s > 1$ is an immediate consequence of the convexity of $D(P\|U)$. The convexity of $D(P\|U)$ is well known in the literature (Csiszár and Körner, 1981). By the use of Lemma 4.2 and inequalities (53), we have
Subtracting 1, multiplying by $(1 - 2^{1-s})^{-1}$ ($s \ne 1$) in (55), and simplifying, we get the convexity of $D_r^s(P\|U)$, i.e.,
$$\lambda_1 D_r^s(P_1\|U_1) + \lambda_2 D_r^s(P_2\|U_2) \ge D_r^s(\lambda_1 P_1 + \lambda_2 P_2 \,\|\, \lambda_1 U_1 + \lambda_2 U_2)$$
for all $s \ge r > 0$, $r \ne 1$, $s \ne 1$. Finally, combining all these results we get the convexity of $\mathscr{F}_r^s(P\|U)$ in $\Delta_n \times \Delta_n$ for $s \ge r > 0$.

(iii) In view of proposition 4.1(iii) and the composition relation (44), it is sufficient to prove that $D_r^1(P\|U)$ is an increasing function of $r$. In order to prove this, let us write
for all $P, U \in \Delta_n$. Let $u_i/p_i = w_i$, $i = 1, 2, \ldots, n$. Then from Eq. (56) we can write
where $P = (p_1, p_2, \ldots, p_n) \in \Delta_n$ and $W = (w_1, w_2, \ldots, w_n)$, $w_i > 0$, $i = 1, 2, \ldots, n$. We know (Hardy et al., 1934, p. 15, Theorem 5) that the function given in Eq. (56) is increasing in $r$. Since $\log(\cdot)$ is an increasing function, this proves the required result.

(iv) It can be proved on similar lines as property 3(i).

(v) It follows on the lines of property 11.
(vi) The inequalities given in (e1) and (e2) are due to proposition 4.1(vi) and the composition relations (44) and (45), respectively. The inequalities given in (e3) and (e4) are due to part (iii).
V. GENERALIZED DIVERGENCE MEASURES
This section deals with the generalizations of two different kinds of divergence measures. One is known as the information radius or the Jensen difference divergence measure (Sibson, 1969), and the other is well known as the J-divergence (Kullback and Leibler, 1951; Jeffreys, 1946).

A. Information Radius and the J-Divergence
By using the concavity of Shannon's entropy, we can write
$$\frac{H(P) + H(U)}{2} \le H\left(\frac{P+U}{2}\right) \tag{57}$$
for all $P, U \in \Delta_n$. The difference
$$R(P\|U) = H\left(\frac{P+U}{2}\right) - \frac{H(P) + H(U)}{2} \tag{58}$$
for all $P, U \in \Delta_n$ is known as the information radius (Sibson, 1969) or Jensen difference divergence measure (Burbea and Rao, 1982). For simplicity, we shall call $R(P\|U)$ the R-divergence. Another measure of divergence known in the literature is the J-divergence (Kullback and Leibler, 1951; Jeffreys, 1946), given by
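$$J(P\|U) = D(P\|U) + D(U\|P) = \sum_{i=1}^{n} (p_i - u_i) \log \frac{p_i}{u_i} \tag{59}$$
for all $P, U \in \Delta_n$, where $D(P\|U)$ is as given in Eq. (39). By simple calculations, we can write
$$R(P\|U) = \frac{1}{2}\left[D\left(P \,\Big\|\, \frac{P+U}{2}\right) + D\left(U \,\Big\|\, \frac{P+U}{2}\right)\right] \tag{60}$$
for all $P, U \in \Delta_n$.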
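The two divergences and the identities relating them to the directed divergence can be checked numerically. This is a minimal sketch assuming base-2 logarithms and strictly positive distributions; the function names are illustrative.

```python
import math

def shannon_entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, u):
    return sum(x * math.log2(x / y) for x, y in zip(p, u) if x > 0)

def midpoint(p, u):
    """The mixture (P + U) / 2."""
    return [(x + y) / 2.0 for x, y in zip(p, u)]

def r_divergence(p, u):
    """Information radius: H((P+U)/2) - (H(P) + H(U)) / 2."""
    return shannon_entropy(midpoint(p, u)) - (shannon_entropy(p) + shannon_entropy(u)) / 2.0

def j_divergence(p, u):
    """J-divergence: sum_i (p_i - u_i) log(p_i / u_i)."""
    return sum((x - y) * math.log2(x / y) for x, y in zip(p, u))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    u = [0.2, 0.5, 0.3]
    m = midpoint(p, u)
    print(j_divergence(p, u), kl_divergence(p, u) + kl_divergence(u, p))
    print(r_divergence(p, u), 0.5 * (kl_divergence(p, m) + kl_divergence(u, m)))
```

Both printed pairs agree up to rounding, and the R-divergence is nonnegative, as inequality (57) requires.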
B. Generalizations of R-Divergence
In this subsection, we shall present three different ways to generalize R-divergence, i.e., the Jensen difference divergence measure given in Eq. (58). These generalizations are as follows. Taking € ; ( P ) in place of H ( P ) in Eq. (58), we have 1w-;(PJJ U ) = €;(+ P
+U
-
W"P)
+ SS(U) 2
for all P, U E A,,, where € ; ( P ) is the unified (r,s)-entropy given in Eq. (7). More clearly, we have r # 1, s # 1 , r > O5
('RXPIIU),
for all P, U
E
An, where
'R:(PI I V )
When r
= s in
= ( 1 - 2'
')-
{5 [(,k r=l
s-ljr-l
p;)
+
(
u;]
Iir
'1
Eq. (62), we have
An alternative way to generalize R(PIIU) is to replace D(PIIU) by
,F;(PllV) in Eq. (60). Then we get 2
f
1
+
: ( P l l u ) = j [ 3 " : P l ( U ) .F;(UiiP)]
for all P, U
E
A,,, where 9 ; ( P l l V )is given in Eq. (49). More clearly, we have
r = I,s= 1
for all P, U
E
A,,, where
r # 1,s # l , r > 0
s # 1,
When r
=s
in Eq. (67), we have
s f 1. s > 0.
There is also a third way to generalize the R-divergence similar to Eqs. (67) and (69) based on an expression given in Eq. (70).These generalizations are as follows:
1'
# 1, s # 1, r > 0,
r # I,r>O.
The following limits hold: lim 3R";PllU) = 3R:(PIIU);
lim3R:(PI(U) = 3 R ; ( P l l U )= 'R;(PIIU),
s- 1
r- I
where ' R i ( P J J U )is given in Eq. (63). When r
= s in
Eq. (71), we have
3R:(PllU)= *R;(PIIU),
where 2Ri(PllU)is as given in Eq. (70). The last generalizations can be unified as follows:
for all P, U
E
An.
Remarks. The generalized measures given in Eqs. (61), (66), and (73) are the author's contributions and are presented here for the first time, except Eqs. (64) (Rao, 1982) and (65) (Burbea and Rao, 1982). The measure given in Eq. (70) can be seen in Taneja (1988a). The following proposition holds: Proposition 5.1.
For all P, U (i)
E
A,, we have
'..I.'':(PIIU) 2 0 for ( r , s )E
rl,
where I-, is given by Eq. (12). (ii) 'Y ",PllU) 2 0 for all r > 0 and any s. (iii) 3"1':(PIIU)2 0 for all r > 0 and any s.
(74)
Proof. (i) This follows from the concavity of $\mathscr{E}_r^s(P)$ ($P \in \Delta_n$) for all $(r,s) \in \Gamma_1$ given in property 8. (ii) This follows from the nonnegativity of $\mathscr{F}_r^s(P\|U)$ given in proposition 4.2(i). (iii) We can write
and

where $q_s$ is as given in Eq. (46). In view of relations (75) and (76), and proposition 4.1(iii), it is sufficient to prove the nonnegativity of ${}^{3}R_r^1(P\|U)$ ($r \ne 1$, $r > 0$), because the nonnegativity of $R(P\|U)$ is obvious from Eq. (57). Let us now prove the nonnegativity of ${}^{3}R_r^1(P\|U)$. By Lemma 4.2, we can write
for all $i = 1, 2, \ldots, n$, $P = (p_1, p_2, \ldots, p_n) \in \Delta_n$ and $U = (u_1, u_2, \ldots, u_n) \in \Delta_n$. Multiplying Eq. (77) by $(p_i + u_i)/2$ and summing over all $i = 1, 2, \ldots, n$, we get
Taking $\log(\cdot)$ on both sides of Eq. (78) and multiplying by $(r-1)^{-1}$ ($r \ne 1$), we get the required result.

(iv) Again using Lemma 4.2, we can write
s-I r- 1
-21
or
1
s-
r-l
(79)
< 0.
Subtracting 2 from both sides of Eq. (79), multiplying by (1 - 2l -')-'(s and simplifying we get
# l),
for all r > 0. Using the concavity property of the logarithmic function we can write
for any r > 0. Multiplying Eq. (81) by ( r - l)-'(r # l), we get
s
~ R ; ( P I I U ) , > 1, 'R:(PlIU){ 2 3 R : ( ~ l I ~ ) , 0 < r < 1.
In a similar way we can prove that
{:
~ R ; ( P ~ I U ) , o < s < I, 'RS(PI I U ) = 3R:(PIIU), s > 1.
Combining Eqs. (80)-(83), we get the required result.
C. Generalizations of J-Divergence

In this subsection we shall present two different ways to generalize the J-divergence given in Eq. (59), involving one and two scalar parameters. The generalizations involving one scalar parameter (Rathie and Sheng, 1981; Burbea and Rao, 1982; Taneja, 1983; Burbea, 1984) are given by
J:(PIIU) = (1 - 2l-')-'
[
+
p;.i'-s
c n
pi'-";
-
i= 1
s # l,S>O
1
2 , (84)
and 2Jf(PllU)= ( r
-
1)-'210g
2
9
(86)
r#l,r>O
for all P, U
E
A,,. We can easily verify that
limJ:(PIIU)
s- 1
= lim r-
1
'J:(PIIU) = 21im2J,'(PIIU)= J(PIIU). r-1
The generalizations involving two scalar parameters considered by Taneja (1983) are given by
s-
Ijr- 1
and
The following limits are easy to verify:
We can also write
where D f ( P I I U ) and qs are given by Eqs. (40) and (46), respectively. Also
and
Both the generalizations of J-divergence involving one and two scalar parameters can be unified in the following way:
for all P, U E A,,, and a = 1 and 2. The following proposition holds: Proposition 5.2.
For all P, U
E
A,,, we have
(i) “%‘f:(PIIU ) 2 0 ( a = 1 and 2) for all r > 0 and any s. (ii) “W-;(PIIU ) ( a = 1 and 2) are convex functions of the pair of distributions (P, U ) E A,, x A,, for all s 2 r > 0. (iii) Proof. (i) In view of proposition 4.2(i) and relation (89), the nonnegativity of ‘J:(P 11 U ) is clear. In view of relation (90), it is sufficient to prove the nonnegativity of ’J,‘(PI I U ) given in Eq. (86). Its proof is as follows: By Lemma 4.2, we can write
i.e.,
Taking log(.) on both sides of Eq. (93), multiplying by (r - l)-’(r # l), and simplifying we get
(ii) It can be proved on lines similar to proposition 4.2(ii), where instead of using Lemma 4.1, we use the fact that the function CY=,(plu!-' p! -ru,! '), r # 1 is convex in the pair ( P , U ) E A,, x A, for r > 1 or r < 0 and is concave for 0 < r < I . (iii) Again by the use of Lemma 4.2, we have
+
for all P = ( p 1 , p 2 , .. . ,p,) E A,, and U = ( u l ,u 2 , .. . , u,) E A,,. Subtracting 2 on both sides of Eq. (94), multiplying by ( 1 - 2 ' - ' ) - ' ( s # I), and simplifying we get
for all P, U E A,, and r > 0, r # 1. Using the concavity of the logarithmic function, we can write
(96) Multiplying Eq. (96) by ( r - I ) - ' ( r # l), we obtain (97 1
In a similar way we can show that
Combining Eqs. (95)-(98), we get the required result. For statistical applications of the measures given in Eq. (91) refer to Taneja (1987).
VI. GENERALIZED ENTROPIES FOR MULTIVARIATE PROBABILITY DISTRIBUTIONS

The idea of entropy measure needs to be developed for multivariate probability distributions, in particular for bivariate cases, especially in the problems of communication that require analysis of messages sent over a channel and received at the other end. The same is also required in bounding the Bayesian probability of error. In order to develop this idea, let us consider two discrete finite random variables $X = \{1, 2, \ldots, n\}$ and $Y = \{1, 2, \ldots, m\}$, or a joint experiment $(X, Y)$, with joint and individual (marginal) probabilities denoted by
$$a_{ij} = \Pr\{X = i, Y = j\}, \qquad A = (a_{11}, a_{12}, \ldots, a_{1m}, \ldots, a_{n1}, a_{n2}, \ldots, a_{nm}) \in \Delta_{nm}^*,$$
$$p_i = \Pr\{X = i\}, \qquad P = (p_1, p_2, \ldots, p_n) \in \Delta_n^*,$$
and
$$q_j = \Pr\{Y = j\}, \qquad Q = (q_1, q_2, \ldots, q_m) \in \Delta_m^*$$
for all $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$. The conditional probability of $Y = j$ given $X = i$ is denoted by
$$b_{j|i} = \Pr\{Y = j \mid X = i\}, \qquad B_i = (b_{1|i}, b_{2|i}, \ldots, b_{m|i}) \in \Delta_m^*$$
for all $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$. Similarly, the conditional probability of $X = i$ given $Y = j$ is denoted by
$$b_{i|j} = \Pr\{X = i \mid Y = j\}, \qquad B_j = (b_{1|j}, b_{2|j}, \ldots, b_{n|j}) \in \Delta_n^*$$
for all $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$. Let us also denote
$$P*Q = (p_1 q_1, p_1 q_2, \ldots, p_1 q_m, \ldots, p_n q_1, \ldots, p_n q_m) \in \Delta_{nm}^*.$$
The following relations are well known in the literature:
for all $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$. If $X$ and $Y$ are independent random variables, then
$$a_{ij} = p_i q_j, \qquad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m.$$
Based on the above notations, the joint and individual unified $(r,s)$-entropies can be written as
$$\mathscr{E}_r^s(X, Y) = \mathscr{E}_r^s(A), \qquad \mathscr{E}_r^s(X) = \mathscr{E}_r^s(P),$$
and
$$\mathscr{E}_r^s(Y) = \mathscr{E}_r^s(Q),$$
where $\mathscr{E}_r^s$ is the unified $(r,s)$-entropy given in Eq. (7). Also, we can easily write $\mathscr{E}_r^s(X, Y, Z)$, etc. Similarly, the individual conditional unified $(r,s)$-entropies are given by
$$\mathscr{E}_r^s(Y \mid X = i) = \mathscr{E}_r^s(B_i), \qquad i = 1, 2, \ldots, n$$
and
$$\mathscr{E}_r^s(X \mid Y = j) = \mathscr{E}_r^s(B_j), \qquad j = 1, 2, \ldots, m.$$
There is no unique way to define the conditional generalized entropies; they have been defined in different ways by different authors. We shall specify here five different ways to define conditional generalized entropies. One is restricted to entropies of degree $s$ given in Eq. (3), and the other four are for the unified $(r,s)$-entropy given in Eq. (7). We shall observe that these different approaches in the limiting case reduce to the well-known Shannon's conditional entropy. These five approaches have been divided into two subsections: the first covers the approach for the entropy of degree $s$, and the second covers the approaches for the unified entropy. Henceforth, unless otherwise specified, the letters $X, Y, Z, \ldots, X_1, X_2, \ldots$, etc. will represent discrete finite random variables.

A. Entropy of Degree s for Multivariate Probability Distributions
In this subsection, we shall define a conditional entropy of degree $s$, which in the limiting case contains Shannon's conditional entropy. This definition was first considered by Daróczy (1970) and satisfies many of the properties of Shannon's case. In order to simplify the results, let us unify these two entropies in the following way:
$$C^s(P) = \begin{cases} H_s^s(P), & s \ne 1,\ s > 0, \\ -\sum_{i=1}^{n} p_i \log p_i, & s = 1 \end{cases}$$
for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n^*$. Define
$$C^s(X \mid Y) = \sum_{j=1}^{m} q_j^{s}\, C^s(X \mid Y = j), \qquad s > 0,$$
where C S ( XI Y
s # 1 , s > 0,
, =j
)=
(99) -
C bil,loghi,j.
s = 1.
i= 1
In a similar way, we can define Cs(Y I X). Define
C"(X,Y I Z ) =
1 C?C"X, Y I z = !), f= Y
1
where
and C~
=
Pr{Z
=
P ) for all / = 1,2,.. . ,v.
Also define
l s ( xA Y ) = c'(x)- cs(xIY),
9
>0
and
where
and
Cs(XIY
=
j,Z = /)
=
for all $j = 1, 2, \ldots, m$ and $\ell = 1, 2, \ldots, v$. The measure $I^s(X \wedge Y)$ is known in the literature as the mutual information of degree $s$. Based on the definitions given above, the following propositions hold (Taneja, 1988b).
Proposition 6.1. For all $s > 0$, we have

(i) $C^s(X, Y) = C^s(X) + C^s(Y \mid X) = C^s(Y) + C^s(X \mid Y)$.
(ii) $C^s(X) = C^s(X \mid Y) + I^s(X \wedge Y)$.
(iii) $C^s(X, Y, Z) = C^s(X) + C^s(Y, Z \mid X) = C^s(X, Y) + C^s(Z \mid X, Y) = C^s(X) + C^s(Y \mid X) + C^s(Z \mid X, Y)$.
(iv) $C^s(X_1, X_2, \ldots, X_n) = C^s(X_1) + C^s(X_2 \mid X_1) + C^s(X_3 \mid X_1, X_2) + \cdots + C^s(X_n \mid X_1, X_2, \ldots, X_{n-1}) = \sum_{i=1}^{n} C^s(X_i \mid X_1, X_2, \ldots, X_{i-1})$.
(v) $C^s(Y, Z \mid X) = C^s(Y \mid X) + C^s(Z \mid X, Y)$.
(vi) $C^s(X \mid Z) = C^s(X \mid Y, Z) + I^s(X \wedge Y \mid Z)$.
(vii) $I^s(X, Y \wedge Z) = I^s(X \wedge Z) + I^s(Y \wedge Z \mid X)$.
(viii) $I^s(X \wedge Y) = C^s(X) + C^s(Y) - C^s(X, Y) = I^s(Y \wedge X)$.
(ix) $C^s(X) + C^s(Y) + C^s(Z) - C^s(X, Y, Z) = I^s(X, Y \wedge Z) + I^s(X \wedge Y)$.
(x) $I^s(X \wedge Z) + I^s(X \wedge Y \mid Z) = I^s(X \wedge Y) + I^s(X \wedge Z \mid Y)$.
(xi) $I^s(X_1, X_2 \wedge X_3 \mid X_4) = I^s(X_1 \wedge X_3 \mid X_4) + I^s(X_2 \wedge X_3 \mid X_1, X_4)$.

The proof of these properties is a simple verification.

Proposition 6.2. For all $s \ge 1$, we have

(i) $I^s(X \wedge Y) \ge 0$, i.e., $C^s(X \mid Y) \le C^s(X)$.
(ii) $I^s(X \wedge Y \mid Z) \ge 0$, i.e., $C^s(X \mid Y, Z) \le C^s(X \mid Z)$.

The proof of part (i) can be seen in Daróczy (1970), and part (ii) follows from part (i).
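The additivity in proposition 6.1(i) and the bound in proposition 6.2(i) can be verified numerically. The sketch below assumes the standard entropy of degree $s$, $(2^{1-s} - 1)^{-1}[\sum_i p_i^s - 1]$, for Eq. (3), base-2 logarithms, and a joint distribution stored as a matrix with `joint[i][j]` standing for $a_{ij}$; all names are illustrative.

```python
import math

def entropy_degree_s(p, s):
    """C^s(P): assumed degree-s form for s != 1, Shannon's entropy for s = 1."""
    if abs(s - 1.0) < 1e-12:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return (sum(x ** s for x in p if x > 0) - 1.0) / (2.0 ** (1.0 - s) - 1.0)

def marginal_x(joint):
    return [sum(row) for row in joint]

def marginal_y(joint):
    return [sum(col) for col in zip(*joint)]

def joint_entropy(joint, s):
    """C^s(X, Y): entropy of the flattened joint distribution."""
    return entropy_degree_s([a for row in joint for a in row], s)

def conditional_entropy(joint, s):
    """C^s(X|Y) = sum_j q_j^s C^s(X | Y = j)."""
    total = 0.0
    for j, q_j in enumerate(marginal_y(joint)):
        if q_j > 0:
            cond = [row[j] / q_j for row in joint]
            total += (q_j ** s) * entropy_degree_s(cond, s)
    return total

if __name__ == "__main__":
    joint = [[0.1, 0.2], [0.3, 0.4]]
    for s in (0.5, 1.0, 2.0):
        lhs = joint_entropy(joint, s)
        rhs = entropy_degree_s(marginal_y(joint), s) + conditional_entropy(joint, s)
        print(s, lhs, rhs)
```

The weights $q_j^s$ rather than $q_j$ are what make the chain rule hold exactly for every $s > 0$.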
(iv) C s ( Y J X + ) C s ( Z I X )2 C s ( Y , z [ X ) , (V)
I " ( x ,Y
A
z )2 Is(Y A zlx),
S
s
2 1.
2 1.
(vi) If C s ( X , , X 2 )# 0, then C S ( XI Y ) cyx, Y )
+
I
CS(Y Z ) > C S ( X12)
s
C S ( Y , Z )= C S ( X , Z ) '
2 1.
Proof. (i) This is obvious from proposition 6.1(i). (ii) This is obvious from proposition 6.l(v). (iii) For all s 2 1, we have
Cs(XI Y )
+ C2(Y I Z ) 2 Cs(X I Y ,Z ) + Cs(Y 12) =
(proposition 6.2(iii))
C"X, Y I Z ) 2 C S ( XI Z).
(iv) For all s 2 1 , we have C S ( Y1 X )
+ C S ( ZI X ) 2 C S ( YI x , Z ) + CS(ZI X ) , =
Cs(Y,Z I X ) (proposition 6.1(v)).
(v) For all s 2 1, we have
rs(Y A Z l x ) 5 Is(xA
-
+
z )+ I S ( Y A zlx)= IS(x,Y A z ) .
C S ( XI Y ) CS(Y I Z ) C S ( XI Y ) CS(Y I Z ) CS(Z)'
+
+
d t ( X , Y ) = d",x,y , C"X, Y )'
C"X, Y ) # 0,
and
Then for all u = 1,2, and 3, we have
(i) d",X, Y ) 2 0,
d " , X , X ) = 0, s > 0.
(ii) d:(X, Y ) = &( Y, x),
(iii) d",X, Y )
s > 0.
+ d",Y,Z) 2 d",X,Z),
s 2 1.
This means that for s 2 1, d:(X, Y )(u = 1,2, and 3) form pseudometric spaces among the random variables. Proof. For u = 1 and 2, the proof follows from proposition 6.3(iii) and (iv), respectively. For s = 1, when ct = 1,2, and 3 refer to Horibe (1973, 1985). Let us prove the result for u = 3. We will prove this in three different cases. Cuse I . When C s ( X )2 Cs(Y) 2 C s ( Z )> 0, we have
Cuse 2. When C s ( X )2 C s ( Z )2 Cs(Y ) > 0, we have
Case 3. When C s ( Z )2 C s ( X )2 C s ( Y )> 0, we have
This completes the proof of the proposition.
Proof. We have (i) d i ( X , Y ) = C s ( X 1 Y ) = 2C“X,
+ Cs(Y1 X ) ,
Y ) - C S ( X )- C S ( Y i ,
L(
C S ( X )- C S ( Y ) , CS(Y ) - C S ( X ) ,
=
ICS(X)- C S ( Y ) ( ,
s > 0.
(ii) For s 2 1, we know that
C s ( X 12)5 C s ( X 1 Y ) C S ( X1 Z ) - CY(Y 1 Z ) 5
+ Cs(Y I Z ) , i.e.,
cyx 1 Y ) ,
5 C”(X I Y ) + CS(YI X ) , = d ; ( X , Y ) .
(100)
Since C s ( X , I XI) 2 0, this gives
C S ( X ,1 X 2 ) 2 CS(X,)- CS(X,).
Expressions (106)and (107) together give the required result.
B. Unified (r,s)-Conditional Entropies

In the previous subsection, the definition of $C^s(X \mid Y)$ is based on a well-known property of Shannon's entropy, i.e., it is specifically defined to satisfy the following property:
$$C^s(X, Y) = C^s(Y) + C^s(X \mid Y), \qquad s > 0. \tag{108}$$
Some authors (Sahoo, 1983; Van der Lubbe et al., 1987) extended Eq. (108) to other entropies, but this did not give a simplified expression, as in the case of $C^s(X \mid Y)$ given in Eq. (99). In this subsection, we shall use four different ways to define the unified $(r,s)$-conditional entropies. When $s = 1$ in Eq. (99), we have
$$H(X \mid Y) = \sum_{j=1}^{m} q_j H(X \mid Y = j), \tag{109}$$
where
$$H(X \mid Y = j) = -\sum_{i=1}^{n} b_{i|j} \log b_{i|j}, \qquad j = 1, 2, \ldots, m.$$
Let us replace $H(X \mid Y = j)$ given in Eq. (99) by the unified $(r,s)$-conditional (individual) entropy $\mathscr{E}_r^s(X \mid Y = j)$ ($j = 1, 2, \ldots, m$). Then we have
$${}^{1}\mathscr{E}_r^s(X \mid Y) = \sum_{j=1}^{m} q_j\, \mathscr{E}_r^s(X \mid Y = j) \tag{110}$$
for all r > 0 and any s. More clearly, we have the following individual expressions: s-l/r-l
s # l , r # 1,r>0,
Y ) = (2'-l - l)-'
qj [jI1
1 b$ ) - 1 ],
(iI1
t # 1 , t > 0.
We shall now use the expressions given in Eqs. (1 14)and (1 15) as the basis for writing an alternative way of defining unified (r, s)-conditional entropies. Let us define
'H,!(X 1 Y ) = ( 1
-
r ) - I log
c qj
{jml
]
bilj , i l l
r # 1, r > 0
( 1 16)
and
The definition of 2H,'(X 1 Y )is based on expression (1 14),and the definition of 3H,'(X 1 Y ) is based on expression (1 15). In the limiting case we have lim 'H,!(X I Y ) = lim 3 ~ f ( 1 XYI r-
I
r-
=H(X
I Y),
1
where H ( X 1 Y ) is as given in Eq. (109). We shall now use expressions (116) and (117) to define the conditional entropies of order r and degree s, using the compositivity relation given in Eq. (9). These definitions are as follows: 'H:w
I Y) = YA2H,'(X I Y)) = (21 - s
-
1)-
1
{(jtl
qjbilj)-l'r-' -
I),
s # 1, r # 1, r > 0, ( 1 18)
and
"XX
I Y ) = YA3Hf(XI Y ) ) ,
In the limiting case we have lim 'H;(x1 Y ) = lim 3H;(X 1 Y), r-
I
r-l
Also we can check that
'HE(X I Y ) = 2H:(X I Y) and : H ( X I Y ) = : H ( X I Y ) . The exact expressions of : H ( X 1 Y ) and 3 H 3 X 1 Y ) are given by $4(XIY)=(2'-'-
( 120)
l)-l
and
(121)
respectively. Expression (120) is obtained from (118) by taking $r = 1/t$ and $s = 2 - t$. Expression (121) is obtained from (119) by taking $r = s$.
We know that
$$I(X \wedge Y) = H(X) - H(X \mid Y),$$
where $I(X \wedge Y)$ is the well-known mutual information (Ash, 1965) between the random variables $X$ and $Y$. Based on the definitions of unified $(r,s)$-conditional entropies given above, we can generalize $I(X \wedge Y)$ in the following way:
$${}^{\alpha}\mathscr{I}_r^s(X \wedge Y) = \mathscr{E}_r^s(X) - {}^{\alpha}\mathscr{E}_r^s(X \mid Y),$$
where $\alpha = 1$, 2, and 3. By simple calculations we can write
$$I(X \wedge Y) = D(A \,\|\, P * Q),$$
where
$$D(A \,\|\, P * Q) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} \log \frac{a_{ij}}{p_i q_j}$$
is a directed divergence between the distributions $A$ and $P * Q$. We shall now present a fourth way to define the unified $(r,s)$-conditional entropy. This is based on the generalization of $D(A \,\|\, P * Q)$ in terms of the unified $(r,s)$-directed divergence $\mathscr{F}_r^s(A \,\|\, P * Q)$ given in Eq. (49). This definition is as follows:
$${}^{4}\mathscr{E}_r^s(X \mid Y) = \mathscr{E}_r^s(X) - {}^{4}\mathscr{I}_r^s(X \wedge Y),$$
where
$${}^{4}\mathscr{I}_r^s(X \wedge Y) = \mathscr{F}_r^s(A \,\|\, P * Q).$$
Thus we can write
$${}^{\alpha}\mathscr{I}_r^s(X \wedge Y) = \mathscr{E}_r^s(X) - {}^{\alpha}\mathscr{E}_r^s(X \mid Y),$$
where
and
for $\alpha = 1$, 2, 3, and 4.
Remarks. ${}^{1}H_r^1(X \mid Y)$ is defined in a natural way, as is Shannon's entropy. ${}^{2}H_r^1(X \mid Y)$ can be found in Aczél and Daróczy (1963) and Behara and Nath (1970). ${}^{3}H_r^1(X \mid Y)$ has been taken by Arimoto (1975) to relate it to Gallager's random coding exponent function. ${}^{4}H_r^1(X \mid Y)$ has been adopted by Rényi (1960) and is based on the definition of mutual information between random variables.
I
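Based on the above definitions the following proposition holds:
Proposition 6.6. We have
(i) $\mathscr{E}_r^s(X) \ge 0$, ${}^{\alpha}\mathscr{E}_r^s(X \mid Y) \ge 0$ ($\alpha = 1$, 2, 3, and 4).
(ii) $\mathscr{E}_r^s(X, Y) \ge \mathscr{E}_r^s(X)$ or $\mathscr{E}_r^s(Y)$.
(iii) If $X$ and $Y$ are independent random variables, then
$$\mathscr{E}_r^s(X, Y) = \mathscr{E}_r^s(X) + \mathscr{E}_r^s(Y) + (2^{1-s} - 1)\,\mathscr{E}_r^s(X)\,\mathscr{E}_r^s(Y).$$
(iv) ${}^{\alpha}\mathscr{E}_r^s(X \mid Y) \le \mathscr{E}_r^s(X)$;
(e1) for $\alpha = 1$ it is true for $(r,s) \in \Gamma_1$,
(e2) for $\alpha = 2$, 3, and 4 it is true for all $r > 0$ and any $s$.
(vi) ${}^{2}\mathscr{E}_r^s(X \mid Y) \le {}^{3}\mathscr{E}_r^s(X \mid Y)$.
Parts (i), (ii), (iii), and (vi) are true for all $r > 0$ and any $s$.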
(122)
Proof. Parts (i), (ii), and (iii) are easy to verify. Part (iv) (e1) follows from the concavity of $\mathscr{E}_r^s(X)$ for $(r,s) \in \Gamma_1$ given in property 8. For part (iv) (e2) when $\alpha = 2$ and 3, it is sufficient to prove the results ${}^{2}H_r^1(X \mid Y) \le H_r^1(X)$ and ${}^{3}H_r^1(X \mid Y) \le H_r^1(X)$. The first follows from Van der Lubbe et al. (1982), and the second follows from Arimoto (1975). For $\alpha = 4$, part (iv) (e2) holds because of the nonnegativity of $\mathscr{F}_r^s(A \,\|\, P * Q)$ (i.e., $\mathscr{F}_r^s(P \,\|\, Q)$) for all $r > 0$ and any $s$, given in Section IV. Let us now prove parts (v) and (vi). (v) From Lemma 4.2, we can write
\s-l/r-l
,.
> - I
1
-> r-1
1,-
s- 1
'-
s-1 0<< 1. r-1
Subtracting 1 from both sides of Eq. (124), multiplying by $(2^{1-s} - 1)^{-1}$ ($s \ne 1$), and simplifying, we get
When $r = s$ in Eq. (125), we use the equality sign. Using concavity of the logarithmic function we can prove that
Similarly, we can prove 'H",X
1 Y){
Y), 5 2H",X( Y ) , 2H;(xI
r > 1, 0 < r < 1.
Combining Eqs. (125), (126), and (127) we get the required result. (vi) Again using Lemma 4.2, we can write
For $r = 1$, we use an equality sign in Eq. (128). Taking $\log(\cdot)$ on both sides of Eq. (128), multiplying by $(1 - r)^{-1}$ ($r \ne 1$), and simplifying, we get
$${}^{2}H_r^1(X \mid Y) \le {}^{3}H_r^1(X \mid Y), \qquad r \ne 1,\ r > 0.$$
This gives 9s(2Hf(x1 Y)) 5 9s(3Hf(xI Y ) ) ,
(129)
i.e.,
$${}^{2}H_r^s(X \mid Y) \le {}^{3}H_r^s(X \mid Y), \qquad r \ne 1,\ s \ne 1,\ r > 0. \tag{130}$$
When $r = 1$, we have
$${}^{2}H_1^s(X \mid Y) \le {}^{3}H_1^s(X \mid Y), \qquad s \ne 1. \tag{131}$$
Combining Eqs. (129)-(131), we get the required result.
I
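The ordering of the order-$r$ conditional entropies used in this proof can be illustrated numerically. This is a sketch under stated assumptions: `cond_renyi_2` follows the form of Eq. (116), while `cond_renyi_3` assumes the Arimoto-type form $\frac{r}{1-r}\log\sum_j q_j(\sum_i b_{i|j}^r)^{1/r}$, since Eq. (117) is not legible in this copy. The joint distribution and the function names are hypothetical.

```python
import numpy as np


def renyi(p, r):
    """Entropy of order r: (1 - r)^-1 * log2(sum_i p_i^r), r != 1."""
    p = np.asarray(p, dtype=float)
    return float(np.log2(np.sum(p ** r)) / (1.0 - r))


def cond_renyi_2(joint, r):
    """Form of Eq. (116): (1 - r)^-1 * log2(sum_j q_j * sum_i b_{i|j}^r)."""
    q = joint.sum(axis=0)
    b = joint / q
    return float(np.log2(np.sum(q * np.sum(b ** r, axis=0))) / (1.0 - r))


def cond_renyi_3(joint, r):
    """Assumed Arimoto-type form: r/(1 - r) * log2(sum_j q_j * (sum_i b_{i|j}^r)^(1/r))."""
    q = joint.sum(axis=0)
    b = joint / q
    inner = np.sum(b ** r, axis=0) ** (1.0 / r)
    return float(r / (1.0 - r) * np.log2(np.sum(q * inner)))


# Hypothetical joint distribution: rows index X, columns index Y.
joint = np.array([[0.10, 0.25, 0.05],
                  [0.30, 0.05, 0.25]])
p_x = joint.sum(axis=1)

results = {}
for r in (0.5, 2.0, 3.0):
    results[r] = (cond_renyi_2(joint, r), cond_renyi_3(joint, r), renyi(p_x, r))
    print(r, results[r])
```

For each tested $r$ the three printed values are nondecreasing from left to right, in line with part (iv) (e2) and the inequality between the second and third conditional entropies.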
Proposition 6.7. We have
H”X, Y )
s-1
5 H ; ( Y )+ ‘H;(X I Y),
s
2 r, r r - l 2 1,
2 H s ( Y ) + ‘H:(XI Y ) ,
r
5 s, r - -
~
s-1 r-1
(132)
5 1.
for all r # 1, s # 1, r > 0 Proof. We know (Behara and Chawla, 1974; Rathie and Taneja, 1989) that
where 0
Sj
5 y j , j = 1,2,.. .,m. We also know (Gallager, 1968, pp. 523) that
1
5
u; i= 1
2
uij]
= 45,
o < r 5 1, (1 34)
(
aijJ = 45,
r
21
for all j = 1,2,.. . ,m. Case 1. 0 < r 5 1. In this case, substituting n
h j = C u L , j = l , & ...,m, i= 1
yj=q5,j=
1,2,..., m,
and s-1
p=l_1 ,
r#l,s#l
in Eq. ( 1 33), we get
I
s-l/r-1
s- l/r- 1
2 1 or-
, m
s-l/,-l
r-1
(135) s- l / r - 1
m
s-1 I 1. r-1-
0 < r < 1,0 <--
Multiplying both sides of Eq. (135) by (2l-' - l)-'(s # 1) and simplifying, we get
where
s # 1,rf 1,rbO.
(137)
Case 2. r > 1. In this case, substituting
6.,= q',, yj
=
,Ia:j,
I=
1
j =
LZ...,m, j = I , 2,. . .,m,
and s-
p
=
1 z,
s#l,rfl,
into Eq. (133), we get
1.e..
(138)
\
s-1 r > 1,0<-5 1. r- 1
Multiplying Eq. (138) by (2'-'
-
l)-'(s # 1) and simplifying, we get
Z H I ( Y ) + V. zH:(Y)+V.
s z r > 1, l
( 1 39)
where V is as given in Eq. ( 1 37). Corn bining ( 1 36) and ( 1 39), we get
H X X , Yl{ for all r # 1 , s # 1, r > 0.
5 H : ( Y ) + V, 2 H : ( Y ) + V,
2 r, r5s s
(140)
We have
When $r = s$ in (6.43), we have
$$V = H^s(X \mid Y),$$
where $H^s(X \mid Y)$ is as given in Section VI.A. In this case we use an equality sign in Eq. (140), i.e., (140) reduces to (108) when $r = s$. We know that
for all $j = 1, 2, \ldots, m$. Expressions (140) and (142) together give the required result.
Proof. (i) This follows from propositions 6.6(iv) and 6.7. (ii) This is an extension of proposition 6.7. (iii) From Eq. (132), we can write
$${}^{1}H_r^s(Y, Z \mid X) \le {}^{1}H_r^s(Y \mid X) + {}^{1}H_r^s(Z \mid X, Y) \tag{143}$$
for all $s \ge r$, $r\left(\frac{s-1}{r-1}\right) \ge 1$ ($r \ne 1$, $s \ne 1$, $r > 0$). Also, from proposition 6.6(iv) we can write
$${}^{1}H_r^s(Z \mid X, Y) \le {}^{1}H_r^s(Z \mid X) \tag{144}$$
for all $(r,s) \in \Gamma_1$. Expressions (143) and (144) together complete the proof of this part. (iv) We know that
$${}^{1}H_r^s(Y \mid Z) \le {}^{1}H_r^s(X, Y \mid Z) \le {}^{1}H_r^s(X \mid Z) + {}^{1}H_r^s(Y \mid X, Z) \le {}^{1}H_r^s(X \mid Z) + {}^{1}H_r^s(Y \mid X)$$
for all $s \ge r$, $r\left(\frac{s-1}{r-1}\right) \ge 1$ ($r \ne 1$, $s \ne 1$, $r > 0$), where we have used propositions 6.6(ii) and (iv), and expression (132).
Proof. (i) This follows from proposition 6.8(iv). (ii) We have
d",X, Y ) = 'H:(X I Y ) + ' H ; ( Y I X ) ,
2 H ; ( X , Y ) - H ; ( X ) + ' H : ( X I Y ) - H i ( Y) [proposition 6.8(i)], =
2 H i ( X , Y ) - H F ( X ) - H:( Y),
=
I H X X ) - HXY)I,
for all r 2 s, r ( s - l/r
-
1) 2 1 ( r f 1 , s # 1, r > 0).
VII. APPLICATIONS TO STATISTICAL PATTERN RECOGNITION
In statistical pattern recognition the key problem is feature selection. Usually the performance of a recognition system is expressed in terms of the probability of error or misclassification. The aim of feature selection is to reduce the number of features without adversely affecting error performance. The feature selection problem can be viewed as the selection of the set of features that minimizes the probability of error, $P_e$. The computation of $P_e$, unfortunately, is usually very difficult, involving the determination of the decision regions and the integration of appropriate class-conditional densities over multidimensional spaces. We are, therefore, led to seek an auxiliary criterion for determining the relative importance of the features and to use it in the selection process. However, unless we know the connection between the auxiliary criterion and the error probability, the use of a feature set chosen according to this criterion, instead of any other, cannot be justified. We are thus led to the choice of an auxiliary criterion that provides a measure of the separability or distance between classes and that has a direct relationship with the probability of error. A variety of such measures have been proposed in the literature, and bounds relating $P_e$ to various distance and information measures can be found in Chen (1976) and Kanal (1974).
The classification problem is stated as follows: Suppose we have $n$ pattern classes $X = (x_1, x_2, \ldots, x_n)$ with a priori probability $p_i = \Pr\{X = x_i\}$, $i = 1, 2, \ldots, n$. Let the feature $y$ on $Y$ have a class-conditional probability density function $p(y \mid x_i)$, $i = 1, 2, \ldots, n$. We assume that $p_i$ and $p(y \mid x_i)$ are completely known. Given a feature $y$ on $Y$, we can calculate the conditional a posteriori probability $p(x_i \mid y)$ for each $i$, by the Bayes rule:
$$p(x_i \mid y) = \frac{p_i\, p(y \mid x_i)}{p(y)}.$$
It is well known (Ferguson, 1967) that the decision rule that minimizes the probability of error is the Bayes decision rule, which chooses the hypotheses (pattern classes) with the largest posterior probability. Using this rule, the partial probability of error for a given $Y = y$ is expressed by
$$P(e \mid y) = 1 - \max\{p(x_1 \mid y), p(x_2 \mid y), \ldots, p(x_n \mid y)\}.$$
Prior to observing $Y$, the probability of error $P_e$ associated with $X$ is defined as the expected probability of error, i.e.,
$$P_e = E_Y\{P(e \mid y)\} = \int_Y P(e \mid y)\, p(y)\, dy,$$
where $p(y) = \sum_{i=1}^{n} p_i\, p(y \mid x_i)$ is the unconditional density of $Y$ evaluated at $y$. In recent years, researchers have paid attention to the problem of bounding this probability of error for two- or multiple-class problems taking some information, divergence, and distance measures into consideration (Kailath, 1967; Kanal, 1974; Chen, 1976; Boekee and Van der Lubbe, 1979). Our aim here is to give bounds on the probability of error in terms of unified $(r,s)$-entropy and the distance measures given in Sections II.G and III, respectively. Some particular cases are also considered. Some bounds involving divergence measures given in Sections V.B and V.C are also given.
A. Generalized Entropies, Distance Measures, and Error Bounds
This subsection deals with the upper bounds on the probability of error in terms of the generalized entropies and distance measures given in Sections II.G and III, respectively. Analogues of the Fano-type bounds are also given. Some lower bounds on the probability of error in terms of distance measures are also presented.
Proposition 7.1.
We have
and
.
*
J
n - 1 times
where
Proof.
Substituting P ( X I Y = y) for P in inequalities 3(ii) and 3(iii), we get
and
respectively. Multiplying Eqs. (150) and (151) by $p(y)$ and integrating over $Y$, we get (147) and (148), respectively, where Eq. (148) follows because of the concavity of $\mathscr{E}_r^s(P)$ for all $(r,s) \in \Gamma_1$. Equation (147) gives the upper bound on the probability of error in terms of the unified $(r,s)$-conditional entropy. Equation (148) is a generalization of the well-known Fano inequality or Fano bound in terms of Shannon's entropy. Using Eqs. (122) and (123) given in Section VI.B we can observe that
$$P_e \le \tfrac{1}{2}\,{}^{1}\mathscr{E}_r^s(X \mid Y) \le \tfrac{1}{2}\,{}^{2}\mathscr{E}_r^s(X \mid Y) \le \tfrac{1}{2}\,{}^{3}\mathscr{E}_r^s(X \mid Y) \tag{152}$$
for all s 2 r > 0, where ' € ; ( X I Y ) and 3&s(X I Y) can be written in a similar way as was Eq. (149), using the expressions given in Section V1.B. Thus from Eq. (152) we can conclude that the bounds obtained in terms of ' b : ( X I Y ) are better for all s 2 r > 0. Proposition 7.2. We have the following bounds: -rPe + (1 - Pe)']P, r > l,p>O,rp> l ( o r O < r < l,p 1 , r p g 1.
5 [(n
- 1)'
Proposition 7.3. We have the following bounds: (i) P,
sI
-
Tf(X I Y ) ,
r 2 0, p 2 0, r # p ,
The proof of propositions 7.2 and 7.3 is based on propositions 3.1 and 3.2, respectively, and can be found in Capocelli et al. (1985). The particular cases of propositions 7.1-7.3 involve known entropies and distance measures and are as follows:
Shannon's entropy. (Chu and Chueh, 1966; Hellman and Raviv, 1970). We have
$$P_e \le \tfrac{1}{2} H(X \mid Y) \tag{155}$$
and
$$H(X \mid Y) \le -P_e \log P_e - (1 - P_e)\log(1 - P_e) + P_e \log(n - 1), \tag{156}$$
where
Chu and Chueh (1966) studied the upper bound (155) by using Shannon's inequality. Hellman and Raviv (1970) studied the same bound using the branching property given in Section II.A. The bound (156) is the well-known Fano-type bound. It has also been studied by Chu and Chueh (1966) and Kovalevski (1968).
Quadratic entropy. (Vajda, 1968). We have pe 5 @ A X I Y),
and
where
If we put $r = s = 2$ in Eqs. (147), (148), and (149), we get (158), (159), and (160), respectively.
Cubic entropy. (Chen, 1976). We have
2 p, 5 3 @ 3 ( X 1 Y), and
where
If we put $r = s = 3$ in Eqs. (147), (148), and (149), we get (161), (162), and (163), respectively.
Entropy of degree s. (Devijver, 1977; Ben-Bassat, 1978; Taneja, 1983).
We have ( 164)
and
s # 1 , s > 0,
where
When $r = s$ with $s \ne 1$ and $s > 0$ in (147), (148), and (149), we get (164), (165), and (166), respectively.
Entropy of order r. (Ben-Bassat and Raviv, 1978). We have
$$P_e \le \tfrac{1}{2}\,{}^{1}H_r^1(X \mid Y),$$
O
and
where
When $s = 1$ with $r \ne 1$, $r > 0$ in Eqs. (147), (148), and (149), we get Eqs. (167), (168), and (169), respectively. For the bound (168), also refer to Toussaint (1977) and Taneja (1983). The first condition for the bound given in (167) is $0 < r < 1$, which is because of Eq. (147), but it has been proved independently by Ben-Bassat and Raviv (1978) that it holds for $0 < r \le 2$ ($r \ne 1$).
Entropy of kind t. (Boekee and Van der Lubbe, 1980; Taneja, 1982). We have
$$P_e \le \tfrac{1}{2}\,{}^{1}_{t}H(X \mid Y), \qquad t \ne 1,\ t > 0, \tag{170}$$
and
t#I,t>O,
where
(172)
When $r^{-1} = t = 2 - s$ in Eqs. (147), (148), and (149), we get Eqs. (170), (171), and (172), respectively.
Entropy of order 1 and degree s. We have
s # l,s>O.
(175)
When $r = 1$ in Eqs. (147), (148), and (149), we get (173), (174), and (175), respectively.
Entropy of order r and degree s. (Taneja, 1985). We have
for all r # 1, s # 1, r > 0, where
rfl,s#f,r>O.
(178)
The bounds (1 76) and (1 77) are the obvious consequences of the bounds (147) and (148), respectively. Using the inequalities among the entropies given in the Section 11.1.1, we can compare some of the upper bounds given above. Bayesian distance. (Devijver, 1974). We have
$$\frac{n-1}{n}\left[1 - \sqrt{1 - \frac{n}{n-1}\left(1 - G_2^1(X \mid Y)\right)}\right] \le P_e \le 1 - G_2^1(X \mid Y), \tag{179}$$
where
c:(xIY)=1 - r n 2 ( X I Y ) =
jy[i,
P(xi I Y)z] P(Y) dy.
(180)
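The Bayesian-distance bounds on $P_e$ can be illustrated with a small numerical example. This is a sketch only: the continuous feature is replaced by a hypothetical three-valued discrete feature so that the integral over $Y$ becomes a sum, and the lower bound is coded in the form $\frac{n-1}{n}\bigl[1-\sqrt{(nG-1)/(n-1)}\bigr]$, which is an assumption about the garbled left-hand side of (179). All names and numbers are illustrative.

```python
import numpy as np


def bayes_error_and_distance(priors, likelihoods):
    """Return (P_e, G) for a discrete feature.

    likelihoods[i, k] = Pr{Y = y_k | X = x_i}, a discrete stand-in for p(y | x_i).
    G is the expected sum of squared posteriors (the Bayesian distance).
    """
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    joint = priors[:, None] * likelihoods        # Pr{x_i, y_k}
    p_y = joint.sum(axis=0)                      # unconditional distribution of Y
    posterior = joint / p_y                      # p(x_i | y_k) by the Bayes rule
    p_e = float(np.sum(p_y * (1.0 - posterior.max(axis=0))))
    g = float(np.sum(p_y * np.sum(posterior ** 2, axis=0)))
    return p_e, g


# Hypothetical three-class problem.
priors = [0.5, 0.3, 0.2]
likelihoods = [[0.7, 0.2, 0.1],
               [0.2, 0.6, 0.2],
               [0.1, 0.3, 0.6]]
n = len(priors)

p_e, g = bayes_error_and_distance(priors, likelihoods)
upper = 1.0 - g
lower = (n - 1) / n * (1.0 - np.sqrt((n * g - 1.0) / (n - 1)))
print(f"lower = {lower:.4f} <= P_e = {p_e:.4f} <= upper = {upper:.4f}")
```

The upper bound holds pointwise because $\sum_i p(x_i \mid y)^2 \le \max_i p(x_i \mid y)$; the lower bound then follows from the concavity of the square root.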
The inequalities given in Eq. (179) follow from proposition 7.2 by taking $r = 2$ and $p = 1$. The measure $G_2^1(X \mid Y)$ given in Eq. (180) is known as the Bayesian distance (Devijver, 1974).
B. Generalized Jensen Difference Divergence Measures and Error Bounds
In Section V.B we gave different generalizations of the Jensen difference divergence measure in the discrete and finite case of the probability distributions. In the same way, we will now write some of the generalizations of the Jensen difference divergence measure between the continuous probability distributions $p(y \mid x_1)$ and $p(y \mid x_2)$. These generalizations are as follows:
(p(v'x + I
-
) 2
P(vo),og(p('i"l)
+ P(Y/xz)j]dy, 2 -r
r # 1, r > 0,
r#l,r>O,
r # l,s#I,r>O,
and
r#l,s#l,r>O.
Let us write these generalizations in a unified way: " R : , r # 1, s # 1, r > 0, " R ; , r = I, s # 1, r # 1, s = I, r > 0, R, r = 1 , ~ = 1,
where CI = 2 and 3. We have the following relation [refer to expression (74)]
z3d ;,
ssr,
29"S,
s
2 r.
Let us write the measures relating to "9'; in a more general form for the two-class case as follows:
and
r # 1,sf l,r>O.
Let us write these measures in a unified way:
When $p_1 = p_2 = 1/2$ in Eq. (183), we have
Thus we have
where
and
Using the concavity of the logarithmic function, we can write
From Eqs. (186) and (187), we have
where
and
= (;)'-'c1
-
'H;(X I Y ) ] ,
where
s - I/*- 1
-p(y)dy)
-
I]},
By Lemma 4.2, we can write 1s -
lir- 1
r # 1, s # 1, I > 0.
(190)
From Eqs. (190) and (1 91), we obtain
where
and
$$H_r^s(X \mid Y = y) = (2^{1-s} - 1)^{-1}\left\{\left[p(x_1 \mid y)^r + p(x_2 \mid y)^r\right]^{(s-1)/(r-1)} - 1\right\}, \qquad r \ne 1,\ s \ne 1,\ r > 0.$$
Unifying the results given in Eqs. (185), (188), (189), and (192), we have
where ${}^{1}\mathscr{E}_r^s(X \mid Y)$ is as given in Eq. (149) with $X = (x_1, x_2)$ and $Y$ as a continuous random variable. Based on the relations given above we shall now present some error bounds.
1. Upper Bounds on the Probability of Error in Terms of ${}^{3}\mathscr{V}_r^s(p_1, p_2)$ and ${}^{3}\mathscr{V}_r^s$
Proposition 7.4. We have
and
where ${}^{3}\mathscr{V}_r^s(p_1, p_2)$ and ${}^{3}\mathscr{V}_r^s$ are given by Eqs. (183) and (181), respectively.
Proof. From Eq. (147), we have
By Eqs. (193) and (196) we get Eq. (194), while Eq. (195) follows from Eqs. (184) and (194).
2. Lower Bounds on ${}^{3}\mathscr{V}_r^s(p_1, p_2)$ and ${}^{3}\mathscr{V}_r^s$ in Terms of the Probability of Error
When $n = 2$ in Eq. (156), we have
$$H(X \mid Y) \le H(P_e), \tag{197}$$
where
$$H(P_e) = -P_e \log P_e - (1 - P_e)\log(1 - P_e).$$
From Eqs. (185) and (197), we have
$$R(p_1, p_2) \ge 1 - H(P_e). \tag{198}$$
Using Lemma 4.1, we can write
Let p(e I y ) = min
then from Eq. (199), we get
S P : + ( l -Pe)', ( 1 - Pe)',
2 P:
+
O < r < 1, r > 1.
(200)
Taking $\log(\cdot)$ on both sides of Eq. (200), multiplying by $(1 - r)^{-1}$ ($r \ne 1$), and simplifying, we obtain
$${}^{3}R_r^1(p_1, p_2) \ge 1 - H_r^1(P_e), \qquad r \ne 1,\ r > 0, \tag{201}$$
where
$$H_r^1(P_e) = (1 - r)^{-1} \log\left[P_e^r + (1 - P_e)^r\right], \qquad r \ne 1,\ r > 0.$$
When $r \to 1$, Eq. (201) reduces to Eq. (198).
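The two-class entropy bounds behind (197) and (201) can be checked on a small example. This is a minimal sketch: the feature is taken to be discrete (a hypothetical three-valued $Y$), and the order-$r$ conditional quantity is assumed to be $(1-r)^{-1}\log\sum_y p(y)[p(x_1 \mid y)^r + p(x_2 \mid y)^r]$, which is the form suggested by Eq. (200) but is not fully legible here.

```python
import numpy as np


def shannon(p):
    """Shannon entropy in bits of a strictly positive distribution."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log2(p)))


def renyi(p, r):
    """Entropy of order r in bits, r != 1."""
    p = np.asarray(p, dtype=float)
    return float(np.log2(np.sum(p ** r)) / (1.0 - r))


# Hypothetical two-class problem with a three-valued feature.
priors = np.array([0.6, 0.4])
likelihoods = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.3, 0.6]])
joint = priors[:, None] * likelihoods
p_y = joint.sum(axis=0)
posterior = joint / p_y

# Bayes error: expected value of the smaller posterior.
p_e = float(np.sum(p_y * posterior.min(axis=0)))

# Shannon case: H(X|Y) against H(P_e), as in Eq. (197).
h_cond = float(sum(p_y[k] * shannon(posterior[:, k]) for k in range(len(p_y))))
h_pe = shannon([p_e, 1.0 - p_e])

# Order-r case (assumed conditional form) against H_r(P_e).
order_r = {}
for r in (0.5, 2.0):
    cond = float(np.log2(np.sum(p_y * np.sum(posterior ** r, axis=0))) / (1.0 - r))
    order_r[r] = (cond, renyi([p_e, 1.0 - p_e], r))

print(p_e, h_cond, h_pe, order_r)
```

In each pair the conditional entropy does not exceed the entropy of $(P_e, 1 - P_e)$, which is what turns into a lower bound on the divergence measure once it is written as one minus a conditional entropy.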
We can write
=
(')
1 --s
~ 1 ~-? ( p e ) 1 9
(202)
where
$$H_1^s(P_e) = (2^{1-s} - 1)^{-1}\left[2^{(1-s)H(P_e)} - 1\right], \qquad s \ne 1.$$
Also,
V~,C~R,'(P~,PJI,
3 R X ~ 1 , ~= 2)
2 qs[1
- Hf(Pe)I,
(i)
1 -s
=
r # 1 , s # 1, r > 0,
[l
-
H"Pe)],
(203)
where
$$H_r^s(P_e) = (2^{1-s} - 1)^{-1}\left\{\left[P_e^r + (1 - P_e)^r\right]^{(s-1)/(r-1)} - 1\right\}, \qquad r \ne 1,\ s \ne 1,\ r > 0.$$
Combining Eqs. (198), (201), (202), and (203), we have proved the following proposition: Proposition 7.5. We have
Particular case of Eq. (204)
When $p_1 = p_2 = 1/2$, then from Eqs. (204) and (184), we have
2
3v;
(3'~ s[I
-
a;(Pe)].
From relation (1 82) and result (205), we can also obtain
where ${}^{2}\mathscr{V}_r^s$ is given in Eq. (181) for $\alpha = 2$. From the inequalities given in Eq. (182), it is quite clear that the result, Eq. (205), is better than Eq. (206).
C. Generalized Measure of Chernoff, Bhattacharya Distance, and the Probability of Error
Let
r > 0.
When $r = 1/2$ in Eq. (207), we have
$$K_{1/2} = F, \tag{208}$$
where $F$ is the well-known Bhattacharya distance or Matusita's measure of affinity (Matusita, 1967). The measure
$$\int_Y p(y \mid x_1)^r\, p(y \mid x_2)^{1-r}\, dy, \qquad r > 0, \tag{209}$$
is known as the Chernoff measure (Kailath, 1967). Thus, based on Eq. (209), we call $K_r$ given in Eq. (207) the generalized measure of Chernoff. Let us write $K_r$ and $F$ in a more general form involving the prior probabilities $p_1$ and $p_2$ given by
and
When $p_1 = p_2 = 1/2$, we have
and
We can simplify measures (210) and (211) in the following way:
and
where
and $F(y) = \sqrt{p(x_1 \mid y)\, p(x_2 \mid y)}$.
Based on the above notations, we have the following proposition: Proposition 7.6. We have
Proof. We have
:;:p( I:;)]
+ p 2E , [
'"]2(1
-r),
(2 15)
where $E_1$ and $E_2$ represent the expected values in their respective forms. Using Lemma 4.2 in Eq. (215), we have
2(1 - r) 2 1 or (2(1 - r) 5 0, 2(1 - r )
0 5 2(1
-
r) 5 1.
Simplifying Eq. (216) we obtain the required result. Particular cases of Eq. (214)
(i) When $r = 1/2$, we have $K_{1/2}(p_1, p_2) = F(p_1, p_2)$.
(ii) When p1 = p z = 1/2, we have
(2 F2(1-r),
Proposition 7.7. We have
1 0 < r 5 x,r 2 1 ,
where
$$K_r(P_e) = \tfrac{1}{2}\left[P_e^r (1 - P_e)^{1-r} + (1 - P_e)^r P_e^{1-r}\right], \qquad r > 0. \tag{218}$$
Proof. We have
where
Let
It is easy to verify that $K_r(p)$ is a convex function of $p$ for $r > 1$ and a concave function of $p$ for $0 < r < 1$. Therefore, we can write
i.e.,
Also, we can write
Expressions (219), (220), and (221) together give the required result. Particular case of Eq. (217) (i) When p1 = p2 = 1/2, then from Eqs. (217) and (212), we have
2 P:(l 5 P:(l
-
+
Pe)' ( I - P,)'P,' - I , Pe)l-r+ (1 - P J P ; - ~ ,
r
1,
O
s 1.
(222)
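The convexity argument behind proposition 7.7 can be illustrated numerically. This is a sketch under assumptions: Eqs. (207) and (210) are not legible here, so the prior-weighted measure is taken to be the expectation over $Y$ of $K_r$ evaluated at the posterior $p(x_1 \mid y)$, with $K_r(\cdot)$ in the form of Eq. (218). The feature is a hypothetical three-valued discrete variable and all names are illustrative.

```python
import numpy as np


def chernoff_term(p, r):
    """K_r(p) = 0.5 * [p^r (1-p)^(1-r) + (1-p)^r p^(1-r)], the form of Eq. (218)."""
    p = np.asarray(p, dtype=float)
    return 0.5 * (p ** r * (1.0 - p) ** (1.0 - r) + (1.0 - p) ** r * p ** (1.0 - r))


# Hypothetical two-class problem with a three-valued feature.
priors = np.array([0.6, 0.4])
likelihoods = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.3, 0.6]])
joint = priors[:, None] * likelihoods
p_y = joint.sum(axis=0)
post_x1 = joint[0] / p_y                         # p(x_1 | y)
p_e = float(np.sum(p_y * np.minimum(post_x1, 1.0 - post_x1)))


def expected_chernoff(r):
    """Assumed prior-weighted measure: E_Y[K_r(p(x_1 | y))]."""
    return float(np.sum(p_y * chernoff_term(post_x1, r)))


# r = 1/2 gives the Bhattacharya-type term sqrt(p(x_1|y) p(x_2|y)).
bhattacharya = float(np.sum(p_y * np.sqrt(post_x1 * (1.0 - post_x1))))

for r in (0.5, 2.0):
    print(r, expected_chernoff(r), float(chernoff_term(p_e, r)))
```

Because $K_r(p)$ is symmetric in $p$ and $1 - p$, Jensen's inequality gives a value below $K_r(P_e)$ in the concave range $0 < r < 1$ and above it in the convex range $r > 1$, matching the two cases of (222).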
D. Generalizations of J-Divergence and the Probability of Error
In Section V.C, we presented different generalizations of J-divergence in the discrete and finite cases. This section deals with the different generalizations of J-divergence between two continuous distributions $p(y \mid x_1)$ and $p(y \mid x_2)$. These generalizations are then related to Bhattacharya distance and the probability of error. We have
r#l,r>O,
r
+ 1 , s # 1 , r > 0.
The measures given above can be unified in the following way: r # 1 , s # 1 , r > 0, r = 1,s # 1, aw;=[y,?, r#l,s=l,r>o, r = 1 , s = 1,
"J;, "J;,
where a = 1 and 2. The following inequalities also hold [refer to expression (92)]:
Let us write the measures given in Eq. (223) for a = 2 in the more general case involving prior probabilities p1 and p 2 in the following way:
(J(Pl,P2),
where
r = 1 , s = 1,
and 2J;(pl,p2) =
( 1 - 2'-")-'
{[
(y(lPIP(Y
I X1)l"P2P(Y I X 2 ) 1 1 - r
+ CPIP(Y I X l ) l 1 - T P 2 P ( Y I X2)l')dY = (1 - 2'
{(Iy
I
CP(X1 Y)'P(X2
I Y)'
'-ll, --*
s- l j r - 1
+ P(X,lY)'P(X,
IY)1-rlP(Y)dY)
r#l,s#l,r>O. When p1 = p2 = 112 in Eq. (225), we have
If we write
then
and
where
- 11,
and
$$J_r^s(y) = (1 - 2^{1-s})^{-1}\left\{\left[p(x_1 \mid y)^r\, p(x_2 \mid y)^{1-r} + p(x_1 \mid y)^{1-r}\, p(x_2 \mid y)^r\right]^{(s-1)/(r-1)} - 1\right\}, \qquad r \ne 1,\ s \ne 1,\ r > 0,$$
respectively. In a unified way we can write:
where
"UY)
=
r # 1, s # r = 1,s # r # 1, s = r = 1, s =
J;(y), J",y), J,!(y), J(y),
1, r > 0, 1, 1 , r > 0, 1.
It is easy to check that the following inequalities hold:
Based on the above considerations we shall now present relations between generalizations of J-divergence, Bhattacharya distance, and the probability of error. Proposition 7.8. We have 2WS(Pl,P2) L I l Y - V e ) ,
3 W S ( p l , ~L2* p)S ( ~ e ) ,
(230) s L r > 0,
2w; 2 2WS(Pe),
(231) (232)
and where
I
' W ; 2 2W;(pe),
s 2 r > 0,
(233)
r#l,s#l,r>O, J;(Pe) = ( 1 - 2l-s)-'[K,(Pe)S-'/'-1 - 11, J",P,) = ( 1 - 21-s)-1[2(5-1)J(pe) - 11, r = 1,s # I, r # 1 , s = 1 , r > 0, Ws(Pe)= J ; ( P e ) = ( r - l ) - ' logK,(Pe), (234) J ( P e ) = (2Pe - 1)log
~
and K:(Pe) is as given in Eq. (218).
",>.
r = 1 , s = 1,
The proof of Eq. (230) follows from relation (217) given in proposition 7.7. Equation (232) follows from (230) and (226) by taking p1 = p 2 = 1/2. Equation (231) follows from relations (229) and (230). Equation (233) follows from relations (232) and (224). Proposition 7.9. We have
for any s, where
and $F(p_1, p_2)$ is as given in Eq. (211). The proof follows from inequalities (214) given by proposition 7.6. Particular case of Eq. (235).
When $p_1 = p_2 = 1/2$, we have
where $F$ is as given in Eq. (208). The particular cases of propositions 7.8 and 7.9 for $r = 1$ and $s = 1$ can be seen in Toussaint (1974) and in Devijver and Kittler (1982).
ENTROPY GRAPH
The following graph indicates how all the entropies given in Section II.F reduce to Shannon's case in the limiting or in the particular case:
[Figure: entropy graph (diagram not reproduced).]
REFERENCES
Aczél, J., and Daróczy, Z. (1963). Publicationes Mathematicae 10, 171-190.
Aczél, J., and Daróczy, Z. (1975). “On Measures of Information and their Characterizations,” Academic Press, New York.
Arimoto, S. (1971). Information and Control 19, 181-190.
Arimoto, S. (1975). Colloq. on Information Theory, Keszthely, Hungary, 41-52.
Arimoto, S. (1976). IEEE Trans. on Inform. Theory IT-20, 460-473.
Ash, R. (1965). “Information Theory,” Interscience, New York.
Behara, M., and Chawla, J. M. S. (1974). In “Entropy and Ergodic Theory: Selecta Statistica Canadiana” II, 15-38.
Behara, M., and Nath, P. (1970). In “Probability and Information Theory II” (M. Behara, K. Krickeberg, and J. Wolfowitz, eds.). Springer Verlag, Berlin, pp. 102-137.
Belis, M., and Guiasu, S. (1968). IEEE Trans. on Inform. Theory IT-14, 591-592.
Ben-Bassat, B. (1978). Information and Control 39, 227-242.
Ben-Bassat, B., and Raviv, J. (1978). IEEE Trans. on Inform. Theory IT-24, 324-331.
Blumer, A. (1982). Ph.D. Thesis, University of Illinois at Urbana-Champaign, Department of Mathematics.
Boekee, D. E., and van der Lubbe, J. C. A. (1979). Pattern Recognition 11, 353-360.
Boekee, D. E., and van der Lubbe, J. C. A. (1980). Information and Control 45, 136-155.
Burbea, J. (1984). Utilitas Mathematica 26, 171-192.
Burbea, J., and Rao, C. R. (1982). IEEE Trans. on Inform. Theory IT-28, 489-495.
Campbell, L. L. (1965). Information and Control 23, 423-429.
Campbell, L. L. (1985). Information Sciences 25, 199-210.
Campbell, L. L. (1987). Queen's Mathematical Preprint, No. 12.
Capocelli, R. M., and Taneja, I. J. (1984). Proc. IEEE Intern. Conf. on Systems, Man and Cybernetics, Oct. 9-12, Halifax, Canada, pp. 43-47.
Capocelli, R. M., and Taneja, I. J. (1985). Cybernetics and Systems 16, 341-376.
Capocelli, R. M., Gargano, L., Vaccaro, U., and Taneja, I. J. (1985). Proc. IEEE Intern. Conf. on Systems, Man and Cybernetics, Arizona, U.S.A., November 12-15, pp. 78-82.
Capocelli, R. M., de Santis, A., and Taneja, I. J. (1988). IEEE Trans. on Inform. Theory IT-34, 134-138.
Chen, C. H. (1976). Information Sciences 10, 159-171.
Chu, J. T., and Chueh, J. E. (1966). J. Franklin Inst. 282, 121-125.
Csiszár, I. (1972). Periodica Math. Hung. 2, 191-213.
Csiszár, I. (1974). Trans. of the 7th Prague Conf., pp. 83-86, Prague, Czechoslovakia.
Csiszár, I., and Körner, J. (1981). “Information Theory: Coding Theorems for Discrete Memoryless Systems.” Academic Press, New York.
Daróczy, Z. (1970). Information and Control 16, 36-51.
Devijver, P. A. (1974). IEEE Trans. on Comp. C-23, 70-80.
Devijver, P. A. (1977). Information and Control 34, 222-226.
Devijver, P. A., and Kittler, J. V. (1982). “Pattern Recognition: A Statistical Approach.” Prentice Hall, London.
Ferentinos, K., and Papaioannou, T. (1983). J. Comb. Inform. and Syst. Sci. 8, 286-294.
Ferguson, T. S. (1967). “Mathematical Statistics.” pp. 284-308, Academic Press, New York.
Ferreri, C. (1980). Statistica XL, 155-168.
Gallager, R. G. (1968). “Information Theory and Reliable Communication.” John Wiley and Sons, New York.
Gallager, R. G. (1978). IEEE Trans. on Inform. Theory IT-29, 668-674.
Guiasu, S. (1977). “Information Theory with Applications.” McGraw Hill, New York.
Györfi, L., and Nemetz, T. (1975). Colloq. on Inform. Theory, Keszthely, Hungary, pp. 309-331.
Hardy, G. H., Littlewood, J. E., and Pólya, G. (1934). “Inequalities.” Cambridge University Press, London.
Hartley, R. V. L. (1928). Bell System Tech. J. 7, 535-563.
Hellman, M. E., and Raviv, J. (1970). IEEE Trans. on Inform. Theory IT-16, 368-372.
Horibe, Y. (1973). Information and Control 22, 403-404.
Horibe, Y. (1985). IEEE Trans. on Systems, Man, and Cybernetics SMC-15, 641-642.
Jeffreys, H. (1946). Proc. Royal Soc. A186, 453-561.
Jelinek, F. (1968a). “Probabilistic Information Theory.” McGraw Hill, New York.
Jelinek, F. (1968b). IEEE Trans. on Inform. Theory IT-15, 765-774.
Jelinek, F., and Schneider, K. (1972). IEEE Trans. on Inform. Theory IT-18, 765-774.
Kailath, T. (1967). IEEE Trans. on Commun. Tech. COM-15, 52-60.
Kanal, L. N. (1974). IEEE Trans. on Inform. Theory IT-20, 687-722.
Kapur, J. N. (1967). The Math. Seminar 4, 78-94.
Kapur, J. N. (1983). J. Inform. and Optim. Sci. 4, 207-232.
Kapur, J. N. (1986). Indian J. Pure & Appl. Math. 17, 429-449.
Kerridge, D. F. (1961). J. Royal Statist. Soc. B23, 184-194.
Kieffer, J. C. (1979). Information and Control 41, 136-146.
Kovalevski, V. A. (1968). In “Character Readers and Pattern Recognition,” pp. 3-30, V. A. Kovalevsky, Ed. New York: Spartan.
Kullback, S., and Leibler, R. A. (1951). Ann. Math. Statist. 22, 79-86.
Longo, G. (1980). “Information Theory” (in Italian). Boringhieri, Torino, Italy.
Mangasarian, O. L. (1969). “Nonlinear Programming,” Tata McGraw Hill, New Delhi/Bombay.
Marshall, A. W., and Olkin, I. (1979). “Inequalities: Theory of Majorization and Its Application,” Academic Press, New York.
Mathai, A. M., and Rathie, P. N. (1975). “Basic Concepts in Information Theory and Statistics.” Wiley and Sons, New York.
Matusita, K. (1967). Ann. Inst. Statist. Math. 19, 181-192.
McEliece, R. J. (1977). “The Theory of Information and Coding.” Encyclopedia of Mathematics and its Applications, Vol. 3. Addison Wesley, Reading, Massachusetts.
Nath, P. (1975). Information and Control 29, 234-242.
Nyquist, H. (1924). Bell Syst. Tech. J. 3, 324-.
Nyquist, H. (1928). AIEE Trans. 47, 617-.
Parker, Jr., D. S. (1979). SIAM J. Comput. 9, 470-489.
Picard, C. F. (1979). J. Comb. Inform. and Syst. Sci. 4, 343-356.
Rao, C. R. (1982). Theor. Popul. Biology 21, 24-43.
Rathie, P. N. (1970). J. Appl. Probl. 7, 124-133.
Rathie, P. N., and Sheng, L. T. (1981). J. Comb. Inform. and Syst. Sci. 6, 197-205.
Rathie, P. N., and Taneja, I. J. (1989). Information Sciences, to appear.
Rényi, A. (1960). MTA III. Osztályának Közl. 10, 251-282.
Rényi, A. (1961). Proc. 4th Berk. Symp. Math. Statist. and Probl., Vol. I, pp. 547-561, University of California Press, Berkeley, California.
Sahoo, P. K. (1983). J. Comb. Inform. and Syst. Sci. 8, 263-270.
Sant'anna, A. P., and Taneja, I. J. (1985). Information Sciences 35, 145-155.
Shannon, C. E. (1948). Bell System Tech. J. 27, 379-423; 623-656.
Sharma, B. D., and Mittal, D. P. (1975). J. Math. Sci. 10, 28-40.
Sharma, B. D., and Mittal, D. P. (1977). J. Comb. Inform. and Syst. Sci. 2, 122-133.
Sharma, B. D., and Taneja, I. J. (1975). Metrika 22, 205-215.
Sharma, B. D., and Taneja, I. J. Elect. Inform. Kybern. 13, 419-433.
Shiva, S. S. G., Ahmed, N. U., and Georganas, N. D. (1973). J. Appl. Probl. 10, 666-670.
Sibson, R. (1969). Z. Wahrs. und Verw. Geb. 14, 149-160.
Taneja, I. J. (1975). Ph.D. Thesis, University of Delhi.
Taneja, I. J. (1979). J. Comb. Inform. and Syst. Sci. 4, 253-274.
Taneja, I. J. (1982). Proc. IEEE Intern. Conf. on Cybern. and Soc., Washington, D.C., October 28-30, pp. 463-466.
Taneja, I. J. (1983a). IEEE Trans. on Systems, Man, and Cybernetics SMC-13, 241-242.
Taneja, I. J. (1983b). J. Comb. Inform. & Syst. Sci. 8, 206-212.
Taneja, I. J. (1984a). Matemática Aplicada e Computacional 3, 199-204.
Taneja, I. J. (1984b). J. Comb. Inform. and Syst. Sci. 9, 169-174.
Taneja, I. J. (1985a). Pattern Recognition Letters 3, 361-368.
Taneja, I. J. (1985b). Proc. Intern. Conf. on Telecommunication and Control, Rio de Janeiro, Brazil, December 9-12, pp. 48-51.
Taneja, I. J. (1986a). Information Sciences 39, 211-216.
Taneja, I. J. (1986b). J. Comb. Inform. & Syst. Sci. 11, 99-109.
Taneja, I. J. (1987). Statistical Planning and Inference 16, 137-145.
Taneja, I. J. (1988a). Information Sciences, to appear.
Taneja, I. J. (1988b). Tamkang J. Math. 19.
Taneja, I. J. (1988c). Tamkang J. Math. 19.
Toussaint, G. T. (1974). Proc. Second Intern. Joint Conf. on Pattern Recog., Copenhagen, Denmark.
Toussaint, G. T. (1977). IEEE Trans. Systems, Man, and Cybernetics SMC-7, 300-302.
Trouborst, P. M., Backer, E., Boekee, D. E., and Boxma, Y. (1974). Proc. Second Intern. Joint Conf. on Pattern Recog., Copenhagen, Denmark.
Vajda, I. (1968). Inform. Trans. Problems 4, 9-19.
van der Lubbe, J. C. A. (1978). Proc. 8th Prague Conf., pp. 253-266, Prague, Czechoslovakia.
van der Lubbe, J. C. A., Boxma, Y., and Boekee, D. E. (1984). Information Sciences 32, 187-215.
van der Lubbe, J. C. A., Boekee, D. E., and Boxma, Y. (1987). Information Sciences 41, 139-169.
van der Pyl, T. (1977). Colloq. Intern. du C.N.R.S., No. 276, Théorie de l'Information, Cachan, France, 4-8 July, pp. 161-171.
Varma, R. S. (1966). J. Math. Sci. 1, 34-48.
Wiener, N. (1948). “Cybernetics.” M.I.T. Press, Cambridge.
Wyner, A. D. (1972). Information and Control 20, 176-181.
This Page Intentionally Left Blank
A Aberrations astigmatism, 40, 89 chromatic, 44 quadrupole lens, 143 round lens, 92 transaxial lens, 169 two-dimensional lens, 152 coma, 40, 88 distortion, 40, 91 field curvature, 40, 90 geometrical, 36, 41 coefficients, 39 correction of, 43, 184, 193, 197, 198 quadrupole lens, 139 round lens, 88 third order, 39 transaxial lens, 165 mechanical, 45, 118 spherical, 40 correction of, 43, 185, 194, 198, 199 crossed lens, 179 measurement of, 50 quadrupole lens, 140 round lens, 84 transaxial lens, 166 two-dimensional lens, 152 Absorption, laser, 261 Acceptance, 60 Additivity, 330, 331 Algebraic property, 327, 339 Amorphous Ge, crystallization, 263-273 Analysis of message, 368 Analytic property, 327, 339 Antiphase domains, 309 Arithmetic, 331 Auger process, 214 Auxiliary criterion, 386 function, 407 Average codeword length, 331
B Backscattered electrons, 212-214 Bayes decision rule, 387 rule, 387 Bayesian distance, 393 Bayesian probability of error, 332, 368 Beam damage, 285, 314 Beam-device matching, 61 Bench, electron optical, 47 Bhattacharya distance, 327, 401, 405, 408 Binary digit, 328 Bivariate, 368 Bounds, in information theory, 327, 331, 342, 343, 350, 353, 386, 387, 388, 389, 390, 391, 393 Branching, 330 property, 390 Brightness, 70, 74
C Calculated images, 289, 293 Capacitor, deflecting, 266 Cardinal elements, 27 measurement of, 48 quadrupole lens, 126 transaxial lens, 163 Cathodoluminescence, 215 Cation ordering, 292 Channeling, 291 Characterization (of information), 330, 331, 355 Chernoff measure, 401 Class, 386, 387, 395 Class-conditional density, 386 probability density function, 387 Classification problem (in information theory), 386 Clays, 308
INDEX
Coding information, 328 problem, 328 theorem, 328, 331 Communication, 328, 368 channel, 328 theory, 328 Composition relation, 337, 340, 341, 358, 359 Compositivity relation, 337, 377 Compressing systems, 62, 69 Concave, 337, 357, 367 function, 339, 340, 341, 348, 352, 354, 404 Concavity, 340, 341, 357, 359, 362, 364, 367, 380, 388, 396 Conditional, 369, 387 entropy of degree s, 369 entropy of order r and degree s, 377 generalized entropy, 369 probability, 368 Contamination, 296 Continuity, 339 Continuous distribution, 405 function, 339, 348 probability distribution, 393 random variable, 398 Convergent beam electron diffraction, 290 Convex, 337, 357, 367 function, 339, 352, 354, 356, 357, 366, 404 Convexity, 358 Coordination number, 291 Cosmic dust, 322 Crossover, 65 Crystal growth, 264-267, 270, 299 Crystallization, amorphous Ge, 263-273 Cubic entropy, 390
D Decision making, 328 Decision region, 386 Decision rule, 387 Decision theory, 332 Decreasing function, 340, 352 Defects, 297, 299 Deflection, electrons, 225-228 Density current, 69, 73 distribution, 69, 74, 75 phase-space particles, 55 Diagenesis, 308
Directed divergence, 329, 353, 378 of order r, 350, 378 Dirichlet problem, 176 Disc of least confusion, 87 Discrete finite, 393, 405 (n-ary) probability distribution, 329 random variable, 368, 369 Distance measure, 327, 352, 386, 387, 389 Divergence, 359 measure, 359, 360, 387 Doubly stochastic matrix, 338
E Effective length, 122 Electrolytic tank, 13 Electron, interaction with matter, 211-216 Electron beam induced conductivity, 216 testing, 214-254 Electron diffraction, 289 Electron emission photo, pulsed, 228 thermal, pulsed, 224, 229-232 Electron energy loss spectrometry, 291 Electron energy spectrometer, 243, 245 Electron gun, 228-233 Electron microscopy emission, 218, 220, 221 mirror, 217, 220, 221 reflection (REM), 217, 220 scanning (SEM), 221-223 transmission (TEM), 216-220 Emittance, 57 Encode (message), 328 Energy dissipation, electrons, 213 Energy spectrum, 213, 243 Entropy, 332, 333, 336, 337, 342, 368, 369, 376, 389, 410 of degree s, 327, 331, 332, 333, 369, 390 graph, 327, 410 of kind t, 327, 332, 333, 391 of order 1 and degree s, 333, 392 of order r, 327, 329, 330, 331, 332, 333, 343, 344, 391 of order r and degree s, 327, 333, 352, 392 series, 329, 342, 350 Envelope, 61, 64, 86 angular, 62 equation, 62 linear, 62
Equality, 347, 380, 384 Equation Hamiltonian, 53 Laplace, 13, 15, 18, 161 motion, 9, 124 trajectory, 11 paraxial, 22, 23, 81, 125, 150, 161 third order, 37, 140 Error, 386 bound, 327, 387, 393, 398 probability, 386 Expected probability of error, 387 Expected value, 403 Experiment, 330 Explosive crystallization, 263-273 Exponential average codeword length of order r, 331 Exposure, short time, 233, 257-260, 266, 267 Exsolution, 315
F Fano, bound, 388 Fano, inequality, 388 Fano-type bound, 387, 390 Faulted sequences, 287 Feature, 328, 386, 387 selection, 386 Field (potential) distribution, 12 coaxial lens, 188 crossed lens, 177 quadrupole lens, 116, 119, 121 nonlinearity, 119 rectangular model, 121, 124 round lens, 79 axial, 94, 99, 102 transaxial lens, 161, 171 two-dimensional lens, 149, 157, 159 Focal length, 28, 83, 126, 151, 163 Focusing astigmatic, 26, 115, 160, 174 stigmatic, 26 Frequency mapping, 249 tracing, 247 Frequency-contrast characteristic, 75 Function, 332, 338, 346, 347, 357, 358, 367 Function, of discrimination, 353 Fuzzy sets theory, 328
G Gain, MCP, 235, 236 Ge, crystallization, 263-273 Generality, 345 Generalized certainty measure, 352 coordinates, 52 distance measure, 327, 352 divergence measure, 327, 329, 359 entropy, 327, 329, 332, 333, 336, 342, 354, 368, 387 exponential length, 331 f-entropy, 332 information measure, 327, 328, 329 Jensen difference divergence measure, 393 length, 331 measure, 353, 362 of Chernoff, 327, 401 of directed divergence, 353 momenta, 52 Shannon’s or Gibb’s inequality, 348 velocities, 52 Geology, 283 Gradient operator, 337 Graphs (in information theory), 348, 410 Gunn diode, 246, 247
H Hamiltonian, 53 Huffman algorithm, 331
I Image, 25 converter, 234 formation, TEM, 218, 219 line, 26 Imaging artifact, 288 Inaccuracy measure, 349 Increasing, 358, 359 function, 331, 339, 352, 353, 354, 356, 358 Independent, 351 random variable, 368, 379 Individual, 368, 369 conditional unified (r, s)-entropy, 369 Inequality, 327, 329, 337, 342, 343, 344, 345, 348, 349, 350, 355, 357, 358, 359, 375, 388, 393, 401, 406, 408, 409 among entropies, 342, 393
Information theory, application, 327, 328, 329, 331, 353, 355, 386 Information, 328 amount of, 328, 330 distributions, 330, 378 measure, 331, 386, 387 processing, 328 radius, 327, 329, 359 retrieval, 328 source, 328 storage, 328 theory, 328 Interaction, electron with matter, 211-216
J J-divergence, 327, 329, 359, 364, 366, 405, 408 Jacobian, ?2 Jensen difference divergence measure, 359, 360 inequality, 344 Joint, 386 experiment, 368
K Kinetic theory, phase transition, 270
L Lagrangian, 52 Language, of information theory, 328 Laser annealing, 255, 258, 259, 263-273 driven electron gun, 228-232 induced processes, 260-263 Lattice imaging, 287 Lemma, 357, 358, 363, 366, 367, 380, 397, 403 Lens astigmatic, 26, 115, 160, 174 box-like, 199 coaxial, 187-189 crossed, 174-187 systems, 182 einzel, 29, 101, 155, 176, 180 immersion, 29, 94, 110, 152 quadrupole, 115-148 achromatic, 144 doublet, 129 quadruplet, 135
systems, 128 triplet, 132 radial, 189-193 round, 78-115 stigmatic, 26 transaxial, 159-174 tube, 199 two-dimensional, 148-159 zoom, 79, 112 Lens aberration, 219, 221 Limit, 354, 362, 365 Limiting case, 329, 333, 340, 349, 369, 377, 410 Literature, of information theory, 328, 329, 330, 333, 338, 353, 356, 357, 358, 359, 368, 370, 386 Logarithmic function, 357, 364, 367, 380, 396 Logarithmic nature, 328 Logic state tracing, 249 Lower bound, 342, 387, 399
M Magnetic wall, imaging, 219 oscillation, 237-240 Magnification angular, 26 crossover, 66 linear, 26, 28, 130 Majorization, 338 Matrix, 31, 129, 136 determinant, 32 drift space, 32 einzel lens, 32 inverse, 33 mirror, 34 multiplication, 33 Matusita’s measure of affinity, 401 Maximality, 342 Maximum, 342, 347 Maximum probability, 342 Measure, 328, 330, 349, 355, 359, 362, 367, 370, 386, 393, 401, 402, 406 of divergence, 359 of information, 328, 329, 353 of separability, 386 of uncertainty, 330 Metamorphism, 309 Meteorites, 317 Method charge density, 17, 95 conformal transformation, 15, 121
finite-difference, 18 finite elements, 20 separation of variables, 15, 94, 156, 170, 190 shadow, 48 Mica, 299 Microanalysis, 291 Microchannel plate (MCP), 234-237 Mineral reactions, 305 Model, explosive crystallization, 268-273 Modulated structures, 303 Monotonicity, 340 Multidimensional space, 386 Multiple-class, 387 Multivariate probability distribution, 327, 368, 369 Mutual information, 378, 379 of degree s, 370
N Nonnegativity, 339, 340, 356, 362, 366, 380 Normality, 340 Nuclear waste, 314 Nucleation, 269, 222 Numerical differentiable function, 337 function, 337, 338
O Octupole, 193 One-to-one code, 331 Oscillation Bloch line, 241 magnetic wall, 237-240 Oxidation state, 291
P Pair, 356, 357, 367 of distribution, 366 Parameter, 340 Parametric generalization, 353 Particular, 352, 368 case, 333, 337, 387, 389, 401, 403, 404, 409, 410 Pattern class, 386, 387 misclassification, 386 recognition, 328
Phase shift, by potential, 219 Phase space, 53 contour, 57 ellipse, 57, 68 parallelogram, 57, 68 properties, 54 six-dimensional, 53 trajectories, 53 two-dimensional, 54 volume, 55 Phase transition, theory, 268-271 Photo emission, 215 Plane focal, 28 image, 25 object, 25 phase-space, 54, 57 principal, 28 Polysomatism, 301 Polytypism, 289, 297, 299 Positivity, 336 Posterior probability, 387 Prior probability, 401, 406 Probabilistic model, 328 Probability, 342, 355, 368, 386, 387 a posteriori, 387 a priori, 386 distribution, 342, 350, 351, 353, 368, 393 of error, 327, 353, 386, 387, 388, 398, 399, 401, 408 Problems (related to information theory), 328, 331, 336, 386, 387 Propagation crystallization, 265-267, 270 high-field domain, 246, 247 phase transition, 262 Property, 328, 329, 330, 331, 336, 337, 339, 340, 341, 342, 343, 344, 347, 348, 352, 353, 355, 358, 362, 364, 369, 370, 371, 376, 380 Proposition, 338, 341, 342, 344, 349, 352, 353, 354, 355, 356, 358, 359, 362, 363, 366, 367, 370, 371, 372, 374, 379, 381, 384, 385, 387, 388, 389, 393, 398, 400, 402, 403, 408, 409 Pseudoconcave, 337, 341 function, 341 Pseudoconcavity, 341 Pseudoconvexity, 337 Pseudometric space, 373 Pulsed detector, 233-237
Pulsing electron beam, 224-233
Q Quadratic entropy, 390 Quantity, 328, 331, 352, 357 Quasi-concave, 337, 341 function, 341 Quasi-concavity, 341 Quasi-convexity, 337, 341 Quasi-linearity, 331
R R-divergence, 327, 359, 360, 361 Radioactivity, 314 Random coding exponent function, 379 variable, 373, 378, 379 Real function, 332 parameter, 331, 332 Real-time microscopy, 223 REM, 257, 259 TEM, 254-258, 263-268 Recognition system, 386 Recursive, 332, 333, 344 Recursivity, 330, 331, 344 of degree s, 331 Redundancy of degree s, 332 of order r, 331 Reflection microscopy, 217, 220, 257 Refractive index, 8 Relative information, 353 Remark, 355, 362 Resistance network, 14 Resolution electron beam testing, 249-252 real-time microscopy, 273-276 spatial, SEM, 221, 249 spatial, TEM, 219 Results, summaries of, 329, 339, 340, 341, 344, 348, 350, 355, 358, 363, 364, 367, 369, 373, 374, 380, 384, 398, 401, 403, 404
S Scalar parameter, 329, 332, 364, 365, 366 parametric entropy, 329 Scanning microscopy, 221-223 Scattering electrons, 211-214 function, R Schur concave, 338 function, 341 concavity, 338, 341 convexity, 338 Secondary electrons, 212-214 Semiconductor junction gun, 232 Serpentine, 282, 285, 290, 304, 312 Shannon information measure, 328 theory, 328 Shannon’s case, 369, 410 Shannon’s conditional entropy, 369 Shannon’s entropy, 327, 329, 330, 333, 336, 341, 342, 344, 359, 375, 376, 379, 388, 389, 390 Shannon’s inequality, 390 Shannon’s or Gibb’s inequality, 349 Shot noise, 251, 274 Similarity theory, 11 Solidification, 270-273 Specimen preparation, 284 Spinelloids, 319 Statistical applications, 367 Statistical concepts, 330 Statistical pattern recognition, 327, 329, 386 Strictly concave, 347 Stroboscopic microscopy, 223, 237-241, 247 Structure determination, 292, 294 Sum property, 331 representation, 330, 332, 333 Supercooled liquid, 270-273 Superposition principle, 13 Symmetric function, 340 SYNROC, 314
T Theorem Helmholtz-Lagrange, 27, 29 Liouville, 55 Liouville, corollary, 55, 60 Thin lens approximation, 30, 32 quadrupole lens, 128 round lens, 83
transaxial lens, 163 two-dimensional lens, 151 Time-resolving REM, 257, 259 techniques, 210, 254-268 TEM, 254-258, 263-268 Transformation Fourier, 76 linear, 31, 55, 57 Transient states, laser-induced, 258, 260, 266, 267 Transmission, 328 of information, 328 Transmission microscopy, 216-220, 254-258 Transmit, 328
U Uncertainty amount of, 330 Uncertainty measure, 328 Unconditional density, 387 Unified entropy, 329, 339, 369 expression, 329, 337, 355 (r, s)-conditional entropy, 327, 376, 377, 378, 388 (r, s)-directed divergence, 355, 378 (r, s)-entropy, 327, 336, 337, 339, 342, 360, 368, 369, 387 way, 329, 355, 369, 394, 395, 408
Upper bound, 342, 387, 388, 390, 393, 398
V Variable length, 331 Velocity crystallization, 264-268, 270-273 flow, 262, 265 Verification, 339, 340, 355, 371 Voltage contrast, 242-245
W Waveform sampling, 246 Weathering, 306 Wronskian, 24, 32, 56
X X-ray emission, 214
Y Yield, electron emission, 213
Z Zero, 347, 355 Zirconolite, 314