ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
VOLUME 72
EDITOR-IN-CHIEF
PETER W. HAWKES Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITOR
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
Advances in
Electronics and Electron Physics EDITED BY PETER W. HAWKES Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
VOLUME 72
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers Boston San Diego New York Berkeley London Sydney Tokyo Toronto
COPYRIGHT © 1988 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101
United Kingdom Edition published by
ACADEMIC PRESS INC. (LONDON) LTD. 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504 ISBN 0-12-014672-X PRINTED IN THE UNITED STATES OF AMERICA 88 89 90 91
9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS
PREFACE

Optical Characterization of III-V and II-VI Semiconductor Heterolayers
G. BASTARD, C. DELALANDE, Y. GULDNER, AND P. VOISIN
I. Introduction
II. Energy Levels in Heterolayers
III. Formal Optical Properties
IV. Experimental Methods in Unstrained III-V Systems
V. Strained Layer Systems
VI. II-VI Superlattices: Optical Determination of the Band Structure
Acknowledgements
References

Dimensional Analysis
JOSE F. CARIÑENA AND MARIANO SANTANDER
I. Introduction
II. Conventional Dimensional Analysis
III. The Mathematical Foundations of Dimensional Analysis
IV. The Physical Meaning of Dimensional Analysis
V. Kinematic Groups and Dimensional Analysis
VI. Dimensional Analysis and Symmetries of Differential Equations
VII. Appendix
References

Lattice Quantization
JERRY D. GIBSON AND KHALID SAYOOD
I. Introduction
II. Scalar Quantization
III. Definitions and Motivation for Optimal Vector Quantization
IV. Motivation for Lattice Quantization
V. Lattices
VI. Lattice Quantizer Design
VII. Fast Quantization Algorithms
VIII. Performance Comparisons
IX. Research Areas and Connections to Other Fields
X. Conclusions
Acknowledgment
Notes
References

INDEX
CONTRIBUTORS

The numbers in parentheses indicate the pages on which the authors' contributions begin.

G. Bastard (1), Groupe de Physique des Solides de l'Ecole Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France
Jose F. Cariñena (181), Departamento de Física Teórica, Facultad de Ciencias, Universidad de Zaragoza, 50.009 Zaragoza, Spain
C. Delalande (1), Groupe de Physique des Solides de l'Ecole Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France
Jerry D. Gibson (259), Department of Electrical Engineering, Texas A&M University, College Station, Texas 77843
Y. Guldner (1), Groupe de Physique des Solides de l'Ecole Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France
Mariano Santander (181), Departamento de Física Teórica, Facultad de Ciencias, Universidad de Valladolid, 47.005 Valladolid, Spain
Khalid Sayood (259), Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska 68588
P. Voisin (1), Groupe de Physique des Solides de l'Ecole Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France
PREFACE

Two of the three chapters of this volume are devoted to subjects in lively development, while the third is concerned with dimensional analysis, a topic of perennial interest that every now and again proves to have still unexplored depths.

We begin with a long and detailed examination of optical methods of characterizing the two families of semiconductor heterolayers, the III-V group and the II-VI group. Despite all the attention that these materials are attracting, there is still much to be discovered about them, and optical absorption and photoluminescence measurements can be very informative. G. Bastard, C. Delalande, Y. Guldner and P. Voisin describe these methods fully and put the results in context, explaining how the different types of information combine to help us unravel the structure and properties of these semiconductor layers, of such great potential commercial interest.

Dimensional analysis, the subject of the second chapter, has a long history, and, although most of us meet it in our schooldays, there are aspects of it that are still in active development and are occasionally the object of polemic. The group theoretical foundations of the subject are of particular concern to J. F. Cariñena and M. Santander, but they take care to juxtapose these with the more traditional approach. They thereby succeed in shedding light on both the old and new features of dimensional analysis.

The closing chapter by J. D. Gibson and K. Sayood is concerned with a new and exciting branch of signal coding, vector quantization. With the need to store large numbers of images, often multi-component images, more efficient data compression is becoming an urgent need, and vector quantization, with which several sample values (or pixel values) are coded as a single entity, may provide the answer. Shannon showed long ago that vector quantization is superior to scalar quantization, and methods of exploiting this superiority are now emerging. This clear and authoritative review should enable those of us who need improved data compression to understand what these vector techniques have to offer.

As usual, we end with a list of forthcoming reviews in these Advances.

Peter W. Hawkes
J. K. Aggarwal: Parallel Image Processing Methodologies
H. H. Arsenault: Image Processing with Signal-Dependent Noise
M. Bertero: Inverse Problems
H. Bley: Pattern Recognition and Line Drawings
O. Bostanjoglo: Electron Microscopy of Very Fast Processes
A. Bratenahl and P. J. Baum: Magnetic Reconnection
J. L. Brown: Sampling Theory
J. M. Churchill and F. E. Holmstrom: Electrons in a Periodic Lattice Potential
J. M. Coggins: The Artificial Visual System Concept
H. G. Craighead: High-Resolution Electron Beam Lithography
R. L. Dalglish: Corrected Lenses for Charged Particles
G. Donelli: The Development of Electron Microscopy in Italy
J. Fink: Energy-Loss Spectroscopy
W. Fuhs: Amorphous Semiconductors
N. C. Gallagher and E. Coyle: Median Filters
J. J. Gagnepain: Resonators, Detectors and Piezoelectrics
S. and D. Geman: Bayesian Image Analysis
E. Hahn: Aberration Theory
J. Huggett: SEM and the Petroleum Industry
D. Ioanoviciu: Ion Optics
G. H. Jansen: Statistical Coulomb Interactions in Particle Beams
M. Kaiser: Systems Theory and Electromagnetic Waves
K. Kano et al.: Phosphor Materials for CRTs
H. Van Kempen: The Scanning Tunnelling Microscope
H. Kobayashi and S. Tanaka: Multi-Colour AC Electroluminescent Thin-Film Devices
K. Koike: Spin-Polarized SEM
J. S. C. McKee and C. R. Smith: Proton Microprobes
M. Mellini: HREM and Geology
S. Morozumi: Active-Matrix TFT Liquid Crystal Displays
C. Mory and C. Colliex: Image Formation in STEM
J. Pawley: Low-Voltage SEM
R. H. Perrott: Languages for Vector Computers
G. A. Peterson: Electron Scattering and Nuclear Structure
F. H. Read and I. W. Drummond: Electrostatic Lenses
J. H. Reisner: Historical Development of Electron Microscopy in the USA
T. Sakurai: Atom-Probe FIM
G. Schmahl: X-Ray Microscopy
J. Serra: Applications of Mathematical Morphology
T. Soma et al.: Focus-Deflection Systems and Their Applications
Y. Uchikawa: Electron Gun Optics
K. Ura: Electron Beam Testing
A. M. Wittenberg: Thin-Film Cathodoluminescent Phosphors
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 72

Optical Characterization of III-V and II-VI Semiconductor Heterolayers
G. BASTARD, C. DELALANDE, Y. GULDNER AND P. VOISIN
Groupe de Physique des Solides de l'Ecole Normale Supérieure, Paris, France
I. Introduction
II. Energy Levels in Heterolayers
   A. The Envelope Function Model
   B. Specific Examples for Flat Band Heterostructures
   C. Perturbation of Heterostructure Electronic States by External Fields
   D. Coulombic Impurity States in Heterostructures
   E. Many Body Effects in Heterostructure Energy Levels
III. Formal Optical Properties
   A. Introduction
   B. Interband Absorption in an Idealised Quantum Well
   C. Band Mixing Effects
   D. Optical Absorption in Superlattices
   E. Excitonic Effects
   F. Magneto-Optical Absorption
IV. Experimental Methods in Unstrained III-V Systems
   A. The GaAs-Ga$_{1-x}$Al$_x$As System
   B. Other Unstrained III-V Systems
V. Strained Layer Systems
   A. Structural Aspects
   B. Electronic Properties of Strained-Layer Superlattices
   C. Experimental Studies
VI. II-VI Superlattices: Optical Determination of the Band Structure
   A. HgTe-CdTe SL Band Structure Calculations
   B. Magneto-Optical Measurements in HgTe-CdTe SL's
   C. HgTe-CdTe SL Infrared Transmission at 300 K
   D. Other II-VI SL Systems
Acknowledgements
References
I. INTRODUCTION

The last few years have witnessed an explosive increase in the research activities on semiconductor heterolayers (Kyoto, 1985, 1986; Ando et al., 1982). On the one hand, the most studied GaAs-Ga$_{1-x}$Al$_x$As system has
benefitted from the significant improvement in sample quality. In fact, it is now recognized as the reference system against which new device ideas or electronic properties are tested. On the other hand, the diversity of available heterolayers has considerably increased owing to the dissemination and increased mastery of modern growth techniques such as Molecular Beam Epitaxy (M.B.E.) or Metal Organic Chemical Vapor Deposition (M.O.C.V.D.). To GaAs-Ga$_{1-x}$Al$_x$As one may now add GaSb-InAs, InPIno~,,As-Gao,,, Ino,s3A's, GaSb-AlSb, G a l -&,AsGao.4, Ino,s 3 As, GaAs. The successful growth of strained layer materials relaxes the severe constraint of choosing lattice-matched semiconductors as possible hosts for the heterolayers and therefore considerably multiplies the number of available heterostructures. To the III-V heterolayers, one should add II-VI's, either with a narrow bandgap such as HgTe-CdTe or a wide bandgap such as Cd$_{1-x}$Mn$_x$Te-CdTe. All these technological efforts have been motivated by device aspects: search for improved materials, design of light emitters and detectors operating in the 1.2 µm, 1.5 µm windows for optical communications, realization of fast field effect transistors, etc. Both the quality improvement and the increased sample versatility are giving substance to Capasso's concept of bandgap engineering (Capasso et al., 1983), where a given electronic or electro-optic function is achieved by designing on the fine scale (~100 Å) the required band edge profiles of multi-heterolayers. Yet, these fascinating prospects and achievements should not hide the plain fact that the control of growth processes often escapes our hands, resulting in heterostructures which are less perfect than one would desire. Residual impurities and interface defects often plague the quality of heterolayers, most significantly when Al is involved. For instance, the growth of the inverted GaAs-Ga$_{1-x}$Al$_x$As interface (i.e. 
GaAs grown on top of Ga$_{1-x}$Al$_x$As) is usually difficult (Morkoc et al., 1981; Heiblum et al., 1984; Miller et al., 1983) due to the incorporation in GaAs of residual impurities which have kept floating on Ga$_{1-x}$Al$_x$As. The study of energy levels associated with coulombic impurities has had some success in the specific case of GaAs-Ga$_{1-x}$Al$_x$As (Miller et al., 1982b; Masselink et al., 1983, 1985; McCombe et al., 1986; Mailhiot et al., 1982; Meynadier et al., 1985a) but is still unexplored in other heterostructures. Systematic studies are required, correlating growth conditions with optical characterizations and recognizing the marked dependence of the impurity binding energies upon the impurity location in the heterolayers (Masselink et al., 1983, 1985; Meynadier et al., 1985a; Bastard, 1981a). A better understanding of the electronic energy levels in heterolayers is often obtained by perturbing these energy levels by static external fields (electric or magnetic) and by using optical probes to measure the effects of the perturbations. The electro-optics (Miller et al., 1984d, 1985a, 1985b, 1985;
Wood et al., 1984; Polland et al., 1985b; Alibert et al., 1985; Yamanishi et al., 1986; Mendez et al., 1982; Viña et al., 1986) or magneto-optics (Voisin et al., 1983; Englert et al., 1983; Stormer et al., 1983; Schlesinger and Wang, 1986; Erhardt et al., 1986; Guldner et al., 1981; Maan et al., 1984; Miura et al., 1984; Belle et al., 1985; Duffield et al., 1986) of semiconductor quantum wells or superlattices have recently received some attention. The room temperature excitonic electro-absorption (Miller et al., 1984d, 1985a, 1985b, 1985; Wood et al., 1984) is now relatively well understood and even used to achieve fast electro-optical modulators. The magneto-optics (Voisin et al., 1983; Englert et al., 1983; Stormer et al., 1983; Schlesinger and Wang, 1986; Erhardt et al., 1986; Guldner et al., 1981; Maan et al., 1984; Miura et al., 1984; Belle et al., 1985; Duffield et al., 1986) should in principle be an ideal tool to disentangle the complicated valence energy levels of heterolayers. However, although a wealth of interband magneto-optical transitions is usually observed, their related fan charts have up to now eluded any quantitative interpretation, allowing only qualitative access to the valence subbands. This may arise from the significant electron-hole pairing (excitons) which competes with the Landau quantization in wide gap, undoped materials such as GaAs-Ga$_{1-x}$Al$_x$As. Apart from the doping (or n-i-p-i) superlattices (Ploog and Dohler, 1983; Ruden and Dohler, 1983; Ploog et al., 1986; Dohler, 1986b), most of the photoluminescence experiments were performed on heterolayers which did not contain free carriers. This was largely due to the difficulty of growing high quality modulation-doped quantum wells. 
Recently, improved growth conditions (Inoue et al., 1984; Tanaka et al., 1986; Fukunaga et al., 1986; Drummond et al., 1983) have allowed access to photoluminescence characterization of doped quantum wells (Pinczuk et al., 1984; Kleinman and Miller, 1985; Ryan et al., 1984; Meynadier et al., 1986; Chaves et al., 1986; Sooryakumar et al., 1985; Skolnick et al., 1986; Delalande et al., 1986). Even though the size quantization of the carrier motion in the wells and the Moss-Burstein (i.e. band filling) effect contribute to a blue shift of the optical absorption edge of the doped GaAs well with respect to the bulk GaAs bandgap, the experimental emission or absorption lines are often found well below their expected energy positions (i.e. those calculated within the Hartree approximation). This feature calls for a large bandgap renormalization (Bauer and Ando, 1986a; Kleinman, 1985, 1986; Ruckenstein et al., 1986; Schmitt-Rink and Ell, 1985) due to exchange and correlation effects amongst the electrons and to correlation effects of the photocreated hole with the electrons (in the case of n-type modulation-doped quantum wells). The scarce experimental results are well explained by theoretical estimates of these many-body effects, but again systematic studies are still lacking. In this review, we shall present a survey of some basic optical data obtained
on a variety of III-V and II-VI heterolayers. We shall focus our attention on characterization purposes and therefore discuss only absorption and photoluminescence data. We shall thus omit several important topics such as Raman spectroscopy (Abstreiter et al., 1984; Zucker et al., 1983; Jusserand and Paquet, 1986; Colvard et al., 1980), high excitation phenomena (Gobel et al., 1985; Chemla et al., 1984), laser characteristics and, more generally, device-related aspects. The second Section will deal with the energy levels in heterolayers: firstly for free particles in unperturbed structures, then for particles subjected to static external fields (electric or magnetic) and for particles bound to hydrogenic impurities. The last part of this Section will deal with some many-body effects: the excitonic bound states in insulating heterolayers and the energy levels of heterostructures containing charges. In the third Section, we shall discuss the selection rules governing the optical transitions in heterolayers and their links with the absorption coefficient of light beams and photoluminescence spectra. The fourth and fifth Sections will be devoted to a presentation of some results obtained in unstrained and strained III-V heterolayers respectively, while the sixth Section will deal with II-VI based superlattices.

II. ENERGY LEVELS IN HETEROLAYERS

A. The Envelope Function Model
Many elaborate theoretical approaches (Schulman and Chang, 1981, 1985; Chang and Schulman, 1983, 1985; Caruthers and Lin-Chung, 1978; Ihm et al., 1979; Pickett et al., 1978; Jaros et al., 1985; Ninno et al., 1985) such as pseudopotential or tight binding calculations have been applied to the determination of the energy levels in semiconductor heterolayers. These methods usually provide a global description of all the heterolayer electronic states. However, they are often computationally prohibitive and, for this reason, restricted to heterolayers which are very thin (few atomic planes), either thin slabs or short period superlattices. Besides, most of the optical characterization experiments probe heterostructure states whose energies are close to the band edges of the host materials. In the case of direct gap III-V and II-VI based materials these states are relatively well described by the envelope function model (Bastard, 1981b, 1982, 1986; White and Sham, 1981; Schuurmans and 't Hooft, 1985; Broido and Sham, 1985; Yang et al., 1985; Altarelli, 1983; Ekenberg and Altarelli, 1984; Fasolino and Altarelli, 1984, 1986; Kriechbaum, 1986; Potz et al., 1985; Smith and Mailhiot, 1986; Bangert and Landwehr, 1985, 1986; Ando, 1985; Bastard and Brum, 1986; Sanders and Chang, 1985; Nedorezov, 1971). This model fully exploits the similarities found between the periodic parts of the Bloch functions in various III-V and II-VI materials, as witnessed by the near constancy of the Kane (1957) matrix element:
among these two families of compound semiconductors. In Eq. (1), $|S\rangle$ and $|X\rangle$ denote the periodic parts of the Bloch functions at the center of the Brillouin zone which transform like atomic $s$ and $x$ functions under the symmetry operations which map the local tetrahedron onto itself (see e.g. Bir and Pikus, 1974). Suppose that two semiconductors A and B are lattice-matched and crystallize in the same crystallographic structure, and consider a single A-B heterojunction. We assume that the heterostructure states can be written in the A and B layers in the form:

$$\psi(\mathbf{r}) = \sum_l f_l(\mathbf{r})\, u_{l0}^{(A,B)}(\mathbf{r}) \tag{2}$$
where the summation in Eq. (2) runs over a finite number of band edges. The $u_{l0}^{(A,B)}(\mathbf{r})$ are the periodic parts of the Bloch functions of the bulk A and B materials and the $f_l(\mathbf{r})$ are envelope functions which are slowly varying on the scale of the hosts' unit cell. The heterostructure hamiltonian is written:
$$\mathcal{H} = \frac{p^2}{2m_0} + V_A(\mathbf{r})\,Y(z) + V_B(\mathbf{r})\,Y(-z) \tag{3}$$
where $Y(z)$ is the step function and $V_A(\mathbf{r})$, $V_B(\mathbf{r})$ are the one-electron atomic potentials (possibly including spin-orbit terms) in the A and B materials respectively. The growth axis is the $z$ axis and $z = 0$ is the location of the A-B interface, assumed to be perfect. Since the $u_{l0}$ will be assumed to be identical in the A and B materials, we shall in fact find an effective hamiltonian which acts on the slowly varying envelopes. In this effective hamiltonian, the rapidly varying functions $u_{l0}$ will have disappeared, surviving only implicitly through effective parameters: bandgaps, interband $\mathbf{p}$ matrix elements, etc. Letting $\mathcal{H}$ act on $\psi$, we readily obtain that the envelope functions $f_l(\mathbf{r})$ satisfy the coupled second-order differential system:
where we have made use of the different scales of spatial variations of the $f_l$'s and the $u_{l0}$'s. In Eq. (4), $\varepsilon_l^{(A)}$, $\varepsilon_l^{(B)}$ are the energy positions of the $l$th band edge at the $\Gamma$ point (Brillouin zone center) in the A and B layers respectively and $\mathbf{p}$ is the electron momentum operator ($\mathbf{p} = -i\hbar\nabla$). We notice that the heterostructure states depend on the band edge discontinuities $\varepsilon_l^{(A)} - \varepsilon_l^{(B)}$. Often the bandgaps $\varepsilon_l^{(A)} - \varepsilon_{l'}^{(A)}$, $\varepsilon_l^{(B)} - \varepsilon_{l'}^{(B)}$ are known.
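As a small numerical illustration of this bookkeeping, the sketch below assumes the sign convention used later in this section (offsets taken as the B edge minus the A edge, with $V_s$ for the $\Gamma_6$ conduction edge and $V_p$ for the $\Gamma_8$ valence edge). The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the text. Once the two bandgaps are given, one offset fixes the other, and the sign of the product $V_sV_p$ gives the type I / type II label.

```python
def conduction_offset(gap_a, gap_b, valence_offset):
    """Return V_s = eps_G6(B) - eps_G6(A), in eV.

    gap_a, gap_b   : bulk bandgaps eps_G6 - eps_G8 of materials A and B (eV)
    valence_offset : V_p = eps_G8(B) - eps_G8(A) (eV), the single free parameter
    """
    # gap_b - gap_a = V_s - V_p, hence V_s = V_p + (gap_b - gap_a)
    return valence_offset + (gap_b - gap_a)


def alignment(v_s, v_p):
    """Label the band alignment from the sign of the product V_s * V_p."""
    product = v_s * v_p
    if product < 0:
        return "type I"    # one layer confines both kinds of carriers
    if product > 0:
        return "type II"   # staggered: carriers confined in different layers
    return "undetermined"  # one edge is flat across the interface


# Illustrative (assumed) numbers: gaps of 1.52 eV and 1.90 eV, V_p = -0.15 eV.
v_p = -0.15
v_s = conduction_offset(1.52, 1.90, v_p)
print(round(v_s, 3), alignment(v_s, v_p))
```

This simple labelling does not cover the Hg-based heterolayers, which the text places in a separate family.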
Thus, the heterostructure energy levels depend only on a single band offset, say the $\Gamma_8$ valence band offset $\Delta E_v$. The envelope function model uses $\Delta E_v$ as an input parameter which is adjusted to reproduce the experimental data. More elaborate approaches attempt to predict the magnitude of $\Delta E_v$, notably Harrison's tight binding approach (Harrison, 1977, 1985, 1986) or the Tejedor-Flores-Tersoff model (Tejedor and Flores, 1978; Flores and Tejedor, 1979; Tersoff, 1984, 1985, 1986). The situation is presently controversial. Harrison's approach leads to the common anion rule, which states that two semiconductors having a common anion (say GaAs and AlAs) and being lattice-matched should display a valence band offset which is essentially zero. This arises from the fact that the valence band states are mostly built from the anion $p$ orbitals, which are spatially well localized. In the Tejedor-Flores-Tersoff model one assumes that the relevant energies which have to be lined up are those of the bulk gap states whose associated wavefunctions have an equal admixture of conduction and valence states. In all the theoretical estimates, the $\Delta E_v$'s are obtained as an energy difference between two large quantities. For instance, in Harrison's model $\Delta E_v$ is equal to the difference of the energy separations between the vacuum level and the valence band maximum in each bulk material. This means that $\Delta E_v$ can
FIG. 1. Illustration of the part played by different apportionments between the valence and conduction bands of the bandgap energy difference $\varepsilon_B - \varepsilon_A$ of two semiconductors A and B on the electronic states of a BAB rectangular quantum well. (a) Electrons are confined in the A layer, holes in the B layer (type II quantum well). (b) Same as in (a) except that the A layer is no longer a barrier for holes. (c) Both electrons and holes are essentially confined in the A layer (type I quantum well). (d) The A layer is a barrier for electrons and a well for holes. Thus this structure is a type II quantum well, inverted with respect to case (a). After Bastard and Brum (1986).
seldom be known to better than a fraction of an eV, which is an impressive performance for a band structure calculation but is insufficient for most practical purposes. As will be shown in Section IV, optical probes can be useful to improve our knowledge of valence band discontinuities between semiconductors. The key part played by band alignment in the electronic properties of heterostructures is exemplified in Fig. 1 in the specific case of quantum wells. When a slowly varying potential $V(\mathbf{r})$ is added to the heterostructure hamiltonian (Eq. (3)), it appears in the effective hamiltonian (Eq. (4)) as a diagonal term $V(\mathbf{r})\delta_{ll'}$. Such a slowly varying potential is for instance produced by a coulombic potential ($V(\mathbf{r}) = -e^2/\kappa r$ for donors, where $\kappa$ is the relative dielectric constant), by an electric field ($V(\mathbf{r}) = e\mathbf{F}\cdot\mathbf{r}$) or by a band bending potential $-e\varphi_{sc}(\mathbf{r})$ due to the background fixed charges and the mobile carriers. As for the effects of an external magnetic field $\mathbf{B}$, they are described in terms of a vector potential $\mathbf{A}$ ($\mathbf{B} = \nabla \times \mathbf{A}$) which also varies slowly in space. When $\mathbf{A}$ is nonvanishing, one should replace the momentum operator $\mathbf{p}$ in Eqs. (3, 4) by $\mathbf{p} + e\mathbf{A}/c$. Depending on the heterostructure under consideration and for direct gap III-V or II-VI host materials, the summation over $l$ in Eq. (2) will involve eight ($\Gamma_6$, $\Gamma_7$, $\Gamma_8$), six ($\Gamma_6$, $\Gamma_8$), four ($\Gamma_8$) or two ($\Gamma_6$) edges. The $u_{l0}$'s corresponding to these edges are listed in Table I. In this table, the total angular momentum $\mathbf{J}$ has been quantized along the [001] direction, which is the growth direction for many III-V heterostructures (GaAs-Ga$_{1-x}$Al$_x$As, GaSb-AlSb). On the other hand, the HgTe-CdTe and CdTe-Cd$_{1-x}$Mn$_x$Te superlattices are often grown along the [111] direction. For the S-like band ($\Gamma_6$ symmetry) $J = 1/2$, while the P-like levels are split into a $\Gamma_8$ quadruplet ($J = 3/2$) and a $\Gamma_7$ doublet ($J = 1/2$) which lies lower in energy than the quadruplet (Fig. 2). In the case of multi-heterojunctions (i.e. 
quantum wells and superlattices), one may define three piecewise constant functions $V_s(z)$, $V_p(z)$, $V_\delta(z)$ which account for the spatial variations of the $\Gamma_6$, $\Gamma_8$, $\Gamma_7$ edges across the heterostructures. These functions vanish inside one type of layer (A) and are equal to $V_s$, $V_p$, $V_\delta$ in the other type of layer (B) in the case of AB heterostructures. $V_s$, $V_p$, $V_\delta$ are thus the algebraic energy shifts of the $\Gamma_6$, $\Gamma_8$, $\Gamma_7$ edges when going from the A to the B materials (see Fig. 3).
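$$V_s = \varepsilon_{\Gamma_6}^{(B)} - \varepsilon_{\Gamma_6}^{(A)}, \qquad V_p = \varepsilon_{\Gamma_8}^{(B)} - \varepsilon_{\Gamma_8}^{(A)}, \qquad V_\delta = \varepsilon_{\Gamma_7}^{(B)} - \varepsilon_{\Gamma_7}^{(A)} \tag{5}$$

In terms of carrier confinement there exist two main types of heterostructures (Fig. 4), depending on whether the product $V_sV_p$ is positive or negative. If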
[TABLE I: periodic parts $u_{l0}$ of the Bloch functions at the $\Gamma_6$, $\Gamma_7$, $\Gamma_8$ band edges]

FIG. 2. Band structure of a direct gap III-V or II-VI semiconductor in the vicinity of the center of the Brillouin zone.

FIG. 3. Definition of the quantities $V_s$, $V_p$, $V_\delta$ in terms of the bulk parameters $\Delta_A$, $\Delta_B$, $\varepsilon_A$, $\varepsilon_B$.

FIG. 4. Band edge profiles of a type I (a) and a type II (b) quantum well.
$V_sV_p < 0$ one gets a type I configuration in which one kind of layer attracts both the valence ($\Gamma_8$) and conduction ($\Gamma_6$) electrons. This situation is the most frequent (GaAs-Ga$_{1-x}$Al$_x$As; GaSb-AlSb; Ga$_{0.47}$In$_{0.53}$As-InP; Ga$_{0.47}$In$_{0.53}$As-Al$_{0.48}$In$_{0.52}$As ...) and is the most useful for opto-electronic devices. If $V_sV_p > 0$, one kind of layer attracts the conduction electrons while the other attracts the valence electrons. This type II, or staggered, configuration is relatively rare, being found in GaSb-InAs, InP-Al$_{0.48}$In$_{0.52}$As and GaAs n-i-p-i superlattices (although in the latter case there is no $V_s$, $V_p$ but a band bending potential whose curvature is of opposite sign in n-doped and p-doped GaAs layers). Finally, the Hg chalcogenide-based heterolayers HgTe-CdTe and Hg$_{1-x}$Cd$_x$Te-Hg$_{1-y}$Cd$_y$Te ($x < 0.16$ and $y > 0.16$ at low temperature) do not fit into this type I-type II classification but constitute a type III family with unique electronic properties (see Section VI). The effective hamiltonian, which is an 8 × 8 matrix acting on the envelope functions $f_l(\mathbf{r})$, is easily written using Table I:
[8 × 8 effective hamiltonian matrix: entries not recoverable from the source scan]
where the free electron term $p^2/2m_0$ has been dropped and where $\varepsilon_A$ and $\Delta_A$ are the bandgap and the $\Gamma$ spin-orbit energy in the A material respectively. Z, which is real, is defined as:
In the absence of perturbing potentials other than $V_s(z)$, $V_p(z)$, $V_\delta(z)$ the effective hamiltonian does not depend on $x$ and $y$. Therefore the in-plane components of the momentum operator $\mathbf{p}_\perp = (p_x, p_y)$ commute with $\mathcal{H}$ and their associated eigenvalues $\hbar k_x$, $\hbar k_y$ are good quantum numbers. Their conservation reflects the in-plane translational invariance of the heterostructure. The $f_l(\mathbf{r})$ in Eq. (2) can thus be written:
where S is the sample area. It is noticeable that the 8 x 8 effective hamiltonian is block diagonal if k, = 0. Moreover, the eigenvectors can be classified according to m,, the eigenvalue of the component of total angular momentum along the growth axis. One readily finds that the heterostructure eigenstates fall into two categories. (i) The light particle states which correspond to m, = f 112 and are hybrids of r6,T8 (m, = f 1/2) and r7states. (ii) The heavy hole states which correspond to m, = k 3/2. As apparent in Eq. (7) the heavy role states are dispersionless; i.e. corresponds to fixed V, respectively).This is due to our truncation and E = energies ( E = of the expansion over 1 in Eq. (1) to r6, r7,T8 edges and to the fact that within this subspace f3/2 states are not k p coupled to +_ 1/2 states if k, = 0. This short coming may be cured in many ways, for instance by including more edges in Eq. (1). However, the other edges are significantly separated from the r6,r7, ones. Thus, one often accounts for the existence of remote bands only up to the second order in k. This procedure amounts to adding to i%?an 8 x 8 matrix 6% which is parametrized by higher band (Luttinger) parameters (Bir and Pikus, 1974; Luttinger, 1956). The 6 2 matrix has been derived by several
authors for bulk materials. For heterolayers it is written:
\[ c = \frac{\sqrt{3}\,\hbar^{2}}{2 m_{0}}\left[\gamma_{2}\,(k_{x}^{2} - k_{y}^{2}) - 2 i \gamma_{3}\, k_{x} k_{y}\right] \tag{15} \]
In Eqs. (10-13), the summation over ν runs over all the states but the Γ6, Γ7, Γ8 levels. In Eqs. (14-19), products of γ's by p_z have been symmetrized and the bulk kinetic energy operators written in such a way that δℋ is hermitian. In the δℋ matrix the inversion-asymmetry splitting terms have been neglected (quasi-Ge model). These terms involve odd powers of p and arise because the hosts' unit cells have a basis composed of two different atoms (e.g. Ga and As in the case of GaAs). The inversion asymmetry terms are however so small that their magnitude has seldom been reliably determined in bulk materials.

The heterostructure eigenstates at finite k_⊥ can only be determined numerically. If the host materials are under flat band conditions, a possible way to calculate their in-plane dispersion relations consists of computing the band structure of the host materials for real and imaginary wavevectors and expanding the heterostructure wavefunction inside each host layer as a linear combination of these bulk states. The necessity of using bulk states with imaginary wavevectors stems from the lack of translational invariance of the heterostructure along the growth axis. The evanescent states, which would be forbidden in the bulk materials because their wavefunctions are not normalizable, may become allowed in layers of finite thickness and be matched to propagating or evanescent states in the other layers.

Let us write the eigenfunctions of ℋ + δℋ as an 8-component column vector f. Across an interface f is continuous to warrant the continuity of the total wavefunction ψ(r) of Eq. (2). The f continuity thus provides 8 continuity conditions at each interface. Since ℋ + δℋ is a second order differential system, we may integrate ℋ + δℋ across each interface to obtain 8 other continuity conditions. The integration of ℋ (which is of the first order in d/dz) does not give rise to any new independent continuity condition. Rather, we retrieve the f continuity.
The integration of δℋ does provide 8 independent boundary conditions. They are formally written:

\[ M f \quad \text{continuous} \]

where

\[ M_{ij} = \int dz\, \delta\mathcal{H}_{ij}; \qquad 1 \le i, j \le 8 \]

In addition to the boundary conditions at the interface, we need to specify how f behaves at large |z|. This asymptotic behavior depends on the heterostructure under consideration. For the bound states of a quantum well
structure f should decay to zero at large |z|, whereas for superlattices the spatial periodicity of V_s(z), V_p(z), V_δ(z), γ_1(z), ... leads to the Bloch theorem:

\[ f(z + d) = \exp(iqd)\, f(z) \tag{22} \]

In Eq. (22), d is the superlattice period (d = L_A + L_B, where L_A and L_B are the layer thicknesses of the A and B materials respectively) and q is the superlattice wavevector along the growth axis. Without loss of generality q can be restricted to the segment

\[ -\frac{\pi}{d} \le q < \frac{\pi}{d} \]

In superlattices each unit cell contains two interfaces. The bulk hamiltonian generates for each energy ε 8 independent wave-vectors (k_⊥, k_i^{(A,B)}), 1 ≤ i ≤ 8. The z components k_i^{(A,B)} are either real or imaginary in each host layer. Thus the eigenstates of ℋ + δℋ can be written:
where:

\[ \chi_{j}(z) = \sum_{i=1}^{8}\left[\alpha_{i}^{(A,B)}(k_{\perp})\exp\!\left(i k_{i}^{(A,B)} z\right) + \beta_{i}^{(A,B)}(k_{\perp})\exp\!\left(-i k_{i}^{(A,B)} z\right)\right] \tag{26} \]

Altogether 32 coefficients α_i^{(A,B)}, β_i^{(A,B)} have to be determined. The 2 × 16 continuity conditions obtained at the interfaces are just sufficient for this purpose and the eigenenergies (labelled by k_⊥ and q) are obtained as the zeros of a 32 × 32 determinant. A similar reasoning can be made for quantum wells.

If band bending is present in the heterostructure, the eigenvalue problem is formally identical to that of flat band heterostructures but much more complicated because of the self-consistency requirements. One still has the continuity conditions at the hetero-interfaces, but the plane waves in Eq. (26) have to be replaced by the exact wavefunctions of the bulk hamiltonian in the presence of band bending. Needless to say, there is no other method but
numerical to find the energy levels (Lassnig, 1985; Lommer et al., 1985; Stern and Das Sarma, 1984; Ando, 1982; Mori and Ando, 1979; Bastard, 1984). Nevertheless, in the case of large bandgap materials, the Γ6 levels decouple from the Γ8, Γ7 ones and the band bending effects on the Γ6 levels can be taken into account by variational methods.
B. Specific Examples for Flat Band Heterostructures

In this paragraph, we present some results of calculated energy levels and subband dispersions in heterolayers. We first discuss the k_⊥ = 0 case and then the subband dispersions. GaAs-Ga_{1-x}Al_xAs heterolayers will often be used as examples. The band parameters we have used in the calculations are:

\begin{align}
\varepsilon_{g}(x) &= 1519.2 + 1247x \ \text{meV} \tag{27} \\
\Delta(x) &= 348 - 34x \ \text{meV} \tag{28} \\
m_{\Gamma_6}(\text{GaAs}) &= 0.067\, m_{0} \tag{29} \\
\gamma_{1}(\text{GaAs}) &= 6.95 \tag{30} \\
\gamma_{2}(\text{GaAs}) &= 2.25 \tag{31} \\
\gamma_{3}(\text{GaAs}) &= 2.86 \tag{32}
\end{align}
(33) The relative part Q_e of the bandgap energy difference ε_g(x) − ε_g(0) taken by the Γ6 edge will be taken as 0.6: F(Ga_{1-x}Al_xAs) = 0
Even in the extensively studied GaAs-Ga_{1-x}Al_xAs system, the values of the bulk band parameters remain a subject of controversy. Notably, the Luttinger parameters of the Ga_{1-x}Al_xAs valence band are poorly known. Even the Γ-bandgap of the alloy is disputed. In other systems, the situation is worse. The uncertainties on the bulk parameters, which are the input of heterolayer energy level calculations, cast doubts on the reliability of those calculations. Although the gross features of the calculations are certainly correct, the fine details may be illusory. In addition to the limitations of our knowledge of the bulk band parameters, the material parameters of the actual heterostructures are seldom as well known as one may desire. For instance, the thickness of each individual layer is often known through calibrations obtained on thick (i.e. several thousand Ångström) layers and the Al concentration in the ternary material Ga_{1-x}Al_xAs through flux calibration. These materials limitations also hamper the comparison between theory and experiments.
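As a small illustration of how the quoted parameters feed into a calculation, the following Python sketch evaluates the alloy bandgap, the spin-orbit energy and the band offsets for a given Al fraction x. The function names are hypothetical, and the assignment of the remaining fraction 1 − Q_e of the gap difference to the Γ8 edge, with V_p < 0 for a type I system, is an assumption of this sketch rather than a statement taken from the text.

```python
# Band-edge parameters of Ga(1-x)Al(x)As as quoted in the text (energies in meV).
Q_E = 0.6  # fraction of the bandgap difference taken by the Gamma6 edge


def bandgap(x):
    """Gamma bandgap of Ga(1-x)Al(x)As in meV, Eq. (27)."""
    return 1519.2 + 1247.0 * x


def spin_orbit(x):
    """Spin-orbit energy of Ga(1-x)Al(x)As in meV, Eq. (28)."""
    return 348.0 - 34.0 * x


def band_offsets(x, q_e=Q_E):
    """Return (V_s, V_p) in meV for GaAs wells with Ga(1-x)Al(x)As barriers.

    Assumption of this sketch: the Gamma8 edge takes the remaining
    fraction (1 - q_e) of the gap difference, with V_p < 0 (type I).
    """
    delta_gap = bandgap(x) - bandgap(0.0)
    return q_e * delta_gap, -(1.0 - q_e) * delta_gap


if __name__ == "__main__":
    v_s, v_p = band_offsets(0.3)
    print(f"x = 0.3: gap = {bandgap(0.3):.1f} meV, "
          f"V_s = {v_s:.2f} meV, V_p = {v_p:.2f} meV")
```

With these inputs the conduction barrier for x = 0.3 comes out at roughly 0.22 eV, which is the order of magnitude against which the confinement energies discussed below should be compared.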
1. Energy Levels at k_⊥ = 0 for Flat Band Heterostructures
An inspection of the ℋ and δℋ matrices reveals that the energy levels at k_⊥ = 0 are relatively easy to obtain. This is because the heavy hole and light particle states decouple. The heavy hole states (hereafter labelled HH_n) correspond to m_J = ±3/2 while the light particle states correspond to m_J = ±1/2. The HH_n energy levels are obtained as the eigenstates of

where χ(z) is either χ_5(z) or χ_6(z). In a type I system V_p < 0. Thus, for −ε_A ≥ ε ≥ −ε_A + V_p the bulk states in the well-acting material (the A material for definiteness) are propagating (real k_A) while they are evanescent in the barrier-acting (B) material (k_B = iκ_B). For ε < −ε_A + V_p, k_A and k_B are real. The boundary conditions at the AB interfaces are that χ and M_hh^{-1}(z) dχ/dz are continuous, where M_hh(z) is the heavy hole mass in the heterolayer. This mass is position-dependent since the γ's are a priori different in the A and B layers:
Making use of the boundary conditions as well as of the Bloch theorem, we readily obtain the superlattice dispersions for the HH_n states at k_⊥ = 0:

\[ \cos(qd) = \cos(k_A L_A)\cosh(\kappa_B L_B) - \frac{1}{2}\left(\xi - \frac{1}{\xi}\right)\sin(k_A L_A)\sinh(\kappa_B L_B) \tag{39} \]

if −ε_A > ε > −ε_A + V_p. The heavy hole energy spectrum falls into bands of allowed energy states separated by forbidden gaps (Fig. 5). The magnitude of the superlattice bandwidths (respectively of the bandgaps separating allowed superlattice bands) increases (decreases) with increasing hole energy. Clearly, at large |ε|, i.e. large hole kinetic energy, the modulation of the superlattice potential V_p(z) becomes negligible and the continuum of the bulk A material is progressively recovered.
FIG. 5. The allowed heavy hole superlattice states (hatched areas) of GaAs-Ga_{0.7}Al_{0.3}As superlattices are plotted versus the superlattice period d for equal GaAs and Ga_{0.7}Al_{0.3}As layer thicknesses. For hole energies larger than |V_p| the holes are in propagating states in both kinds of layers. The zero of energy is set at the Γ8 point of GaAs. After Bastard (1988).
The superlattice states corresponding to Eq. (39), i.e. those such that the hole energies are smaller than |V_p| in Fig. 5, can be viewed as resulting from the hybridization of the bound states in each well due to the tunnel effect across barriers of finite thickness. Indeed, this view is quantitatively supported by performing a tight binding analysis of the superlattice envelope functions:

\[ \chi_{q}^{\nu}(z) = \frac{1}{\sqrt{N}}\sum_{n=1}^{N}\exp(iqnd)\,\chi^{\nu}(z - nd) \tag{41} \]

where χ^ν(z − nd) is the envelope function of the νth bound state (energy HH_ν) in a well centered at z = nd and clad between infinitely thick barriers. By retaining only the tunnel effects which occur between wells which are nearest neighbors, Eq. (41) leads to the approximate dispersion relation of the νth superlattice hole band:

\[ \varepsilon_{\nu}(q) = HH_{\nu} + s_{\nu} + 2 t_{\nu}\cos(qd) \tag{42} \]

where s_ν and t_ν are the shift and transfer integrals respectively. The isolated level HH_ν shifts towards lower energy by s_ν and splits symmetrically with respect to HH_ν + s_ν to form the νth hole subband. Its bandwidth 4|t_ν| decreases exponentially with the barrier thickness L_B.
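The nearest-neighbor tight binding result, ε_ν(q) = HH_ν + s_ν + 2t_ν cos(qd), is simple enough to evaluate directly. The Python sketch below does so for purely illustrative numbers; the values of the isolated level, the shift integral and the transfer integral are hypothetical and are not taken from the text, and the function names are ours.

```python
import math


def tight_binding_energy(q, d, level, shift, transfer):
    """Nearest-neighbor tight binding subband energy at wavevector q.

    level    : energy of the isolated-well bound state
    shift    : shift integral s
    transfer : transfer integral t
    d        : superlattice period (q * d is dimensionless)
    """
    return level + shift + 2.0 * transfer * math.cos(q * d)


def tight_binding_bandwidth(transfer):
    """Width of the subband, 4|t|."""
    return 4.0 * abs(transfer)


if __name__ == "__main__":
    # Hypothetical numbers (meV and Angstrom), for illustration only.
    level, shift, transfer, d = 20.0, -0.5, -1.5, 100.0
    for frac in (0.0, 0.5, 1.0):
        q = frac * math.pi / d
        e = tight_binding_energy(q, d, level, shift, transfer)
        print(f"q = {frac:.1f} pi/d : energy = {e:.3f} meV")
    print(f"bandwidth = {tight_binding_bandwidth(transfer):.3f} meV")
```

The two band edges sit at q = 0 and q = π/d, symmetrically about the shifted level, so the full width is 4|t| regardless of the sign of the transfer integral.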
The light particle states are more complicated to handle than the HH_n ones. This arises from the off-diagonal coupling between the Γ7 and (light hole) Γ8 levels. This coupling is not of much importance in antimonide compounds (Δ ∼ 0.7 eV) nor in tellurides (Δ ∼ 1 eV), which are characterized by large spin-orbit effects (heavy atoms). It may be of relevance in arsenides (Δ ∼ 0.3 eV) and certainly in phosphides (Δ ∼ 0.1 eV) when the light hole energy becomes of the order of 2Δ/3. Actually, when the δℋ terms are omitted, the energies −ε_A − 2Δ_A/3 and −ε_A + V_p − 2Δ_B/3 become singular points in the light hole dispersion relations of bulk A and B layers respectively. At these energies, the light holes are infinitely heavy and, moreover, no propagating states exist in the energy segments −ε_A − Δ_A ≤ ε ≤ −ε_A − 2Δ_A/3 and −ε_A + V_p − Δ_B ≤ ε ≤ −ε_A + V_p − 2Δ_B/3. Such singularities are washed out by the Γ8-Γ7 off-diagonal terms of the δℋ matrix. Schuurmans and 't Hooft (1985) have shown how to treat these terms approximately in the case of GaAs-Ga_{1-x}Al_xAs heterostructures. In the case of strained layer materials involving arsenic, e.g. GaAs-Ga_{1-x}In_xAs, many Γ8 light hole levels have hole energies which are comparable to 2Δ_A/3 and a more exact treatment of the Γ8-Γ7 coupling is necessary.

For the Γ6-related states (the conduction states E_n in most of the III-V based heterolayers), the δℋ terms have a very small effect and we feel it legitimate to discard the whole δℋ matrix. Under this approximation, the superlattice dispersion relations become very easy to derive as we have to deal only with coupled first order differential equations. The physics becomes more transparent if one projects the 8 × 8 system onto the 2 × 2 subspace spanned by S↑ and S↓ (Bastard, 1981b, 1982, 1986; Lassnig, 1985; Lommer et al., 1985; Larsen, 1968). One readily finds that the conduction states are twice degenerate and that the envelope functions for the z motion χ_1(z) and χ_2(z) satisfy the same second order differential equation:
\[ \left[T_{s} + V_{s}(z)\right]\chi(z) = \varepsilon\,\chi(z) \tag{43} \]

where the kinetic energy term is:

where P has been defined in Eq. (7) and V_p, V_δ in Eq. (5). It may be remarked that Eq. (43) is non-linear in the eigenvalue ε, as a result of the projection technique. The boundary conditions which have to be fulfilled across the interfaces are that: χ(z) and (45)
are both continuous. The continuity condition for dχ/dz can be written:

in complete analogy with the heavy hole case. In Eq. (47), μ(ε, z) is a position- and energy-dependent effective mass which appears in the implicit formulation of the dispersion relations in the Kane model for each host layer:
The continuity condition of Eq. (47) is thus a generalization to materials with non-parabolic bands of the m^{-1}(z) dχ/dz continuity derived by Ben Daniel and Duke (1966) for heterostructures with uncoupled bands. The superlattice dispersion relations for Γ6-related states are obtained by writing that within each kind of layer χ(z) is a linear combination of incoming and outgoing plane waves. The associated wavevectors k_A, k_B are related to ε by Eqs. (46, 47) respectively. k_A is real while k_B is imaginary (real) if ε < V_s (ε > V_s). The dispersion relations have the same functional form as found for heavy holes (Eqs. (37, 39)) except that the parameter ξ is equal to:

if ε > V_s and:

if ε < V_s. The characteristic features of the Γ6-related superlattice states (shown in Fig. 6 for GaAs-Ga_{0.7}Al_{0.3}As) are qualitatively the same as the heavy hole ones, except that the lighter Γ6 mass, which favors tunnelling, leads to larger bandwidths. As shown in Fig. 7, the exponential decay law of the superlattice bandwidth upon the barrier thickness L_B is very well obeyed, at least for the ground (E_1) subband. Figure 7 demonstrates that the bandwidths are quite significant as soon as the barrier is thinner than 70 Å. Obviously, excited subbands have even larger bandwidths.
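The decay of the ground subband width with barrier thickness can be checked with a few lines of Python. The sketch below is a simplified stand-in for the calculation described here, not a reproduction of it: it assumes parabolic bands (no energy dependence of the masses), a dispersion relation of the form cos(qd) = cos(k_A L_A) cosh(κ_B L_B) − ½(ξ − 1/ξ) sin(k_A L_A) sinh(κ_B L_B) for 0 < ε < V_s, and the mass-weighted ratio ξ = (k_A/m_A)/(κ_B/m_B). The barrier mass 0.0919 m_0 and the barrier height 224.46 meV (0.6 × 1247 × 0.3) are illustrative assumptions, as are all function names.

```python
import math

HBAR2_OVER_2M0 = 3809.98  # hbar^2 / (2 m0) in meV * Angstrom^2


def rhs(e, l_a, l_b, v, m_a, m_b):
    """Right-hand side of the dispersion relation, valid for 0 < e < v."""
    k_a = math.sqrt(m_a * e / HBAR2_OVER_2M0)
    kappa_b = math.sqrt(m_b * (v - e) / HBAR2_OVER_2M0)
    xi = (k_a / m_a) / (kappa_b / m_b)  # assumed mass-weighted ratio
    return (math.cos(k_a * l_a) * math.cosh(kappa_b * l_b)
            - 0.5 * (xi - 1.0 / xi)
            * math.sin(k_a * l_a) * math.sinh(kappa_b * l_b))


def first_root(g, e_min, e_max, n_grid=4000):
    """Locate the first sign change of g on a grid, then refine by bisection."""
    step = (e_max - e_min) / n_grid
    lo, g_lo = e_min, g(e_min)
    for i in range(1, n_grid + 1):
        hi = e_min + i * step
        g_hi = g(hi)
        if g_lo * g_hi <= 0.0:
            for _ in range(200):
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            return 0.5 * (lo + hi)
        lo, g_lo = hi, g_hi
    raise ValueError("no root found below the barrier height")


def ground_band(l_a, l_b, v=224.46, m_a=0.067, m_b=0.0919):
    """Return (bottom, top) of the lowest allowed band, in meV.

    The bottom corresponds to cos(qd) = +1 and the top to cos(qd) = -1.
    """
    eps = 1e-6
    bottom = first_root(lambda e: rhs(e, l_a, l_b, v, m_a, m_b) - 1.0, eps, v - eps)
    top = first_root(lambda e: rhs(e, l_a, l_b, v, m_a, m_b) + 1.0, eps, v - eps)
    return bottom, top


if __name__ == "__main__":
    for l_b in (20.0, 40.0, 80.0):
        bottom, top = ground_band(50.0, l_b)
        print(f"L_A = 50 A, L_B = {l_b:.0f} A : width = {top - bottom:.4f} meV")
```

Plotting the logarithm of the width against L_B for a fixed well thickness should give a nearly straight line, which is the exponential law referred to above; the slope is set by the decay constant κ_B of the ground state in the barrier.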
FIG. 6. The allowed superlattice states (hatched areas) for electrons in GaAs-Ga_{0.7}Al_{0.3}As superlattices are plotted versus the superlattice period d for equal GaAs (L_A) and Ga_{0.7}Al_{0.3}As (L_B) layer thicknesses. For electron energies larger than V_s the electrons are in propagating states in both kinds of layers. The energy zero is set at the Γ6 point of GaAs. After Bastard (1988).
FIG. 7. The bandwidth of the ground (E_1) superlattice subband for electrons in GaAs-Ga_{0.7}Al_{0.3}As superlattices is plotted versus the barrier thickness L_B for three different GaAs slab thicknesses: L_A = 30 Å, 50 Å and 100 Å respectively. After Bastard (1988).
When the barriers become thick enough to suppress the tunnel coupling between the wells, the superlattice states tend towards the discrete bound states supported by isolated single wells. These bound states are the solutions of

where ξ is given by Eq. (51). A rectangular quantum well always admits at least one bound state, irrespective of L_A and V_s. If V_s is infinite the number of bound states is infinite and the series of allowed energy levels is given by the well known formula:

\[ k_A L_A = p\pi; \qquad p = 1, 2, \ldots \tag{53} \]

The key parameter which controls the energy positions of the quantum well bound states is the magnitude of the barrier height. This is illustrated in Fig. 8 for the ground states E_1, LH_1, HH_1 of GaAs-Ga_{1-x}Al_xAs quantum wells with three different thicknesses. Figure 8 demonstrates that the particle-in-a-box result (Eq. 53) is justified only for very thick wells. It can however be checked that the significant departure from a simple particle-in-a-box law for the energy levels is not accompanied by a dramatic leakage of the associated envelope
FIG. 8. Left panel: dependences of the ground light (LH_1) and heavy (HH_1) hole bound states upon the barrier height in GaAs-Ga_{1-x}Al_xAs quantum wells for three different GaAs slab thicknesses (30 Å, 80 Å and 150 Å respectively). Right panel: same as in left panel but for the ground electronic state (E_1). After Bastard (1988).
functions outside the GaAs layers. In fact, for L_A ≥ 100 Å and x = 0.3, the integrated probability of finding the carrier in the Ga_{0.7}Al_{0.3}As barrier is smaller than 3.5% and 0.65% for the E_1 and HH_1 states respectively. In Figs. 9 and 10, we show the L dependences of the electron and heavy hole bound states in GaAs-Ga_{0.7}Al_{0.3}As quantum wells. The dashed lines which appear above V_s and |V_p| are the loci of the transmission resonances in the quantum well continuum. They fulfill:

\[ k_A L_A = p\pi; \qquad p = 1, 2, \ldots \tag{54} \]
where k_A is the wave-vector appropriate to electron and heavy hole in the GaAs layer respectively. These transmission resonances can also be viewed (Bohm, 1951) as quantum well virtual bound states whose classical picture is that of a carrier swinging back and forth in the well (like a true bound state) before escaping to infinity (unlike a true bound state). In fact, the transmission resonances match the quantum well true bound states at the onset of the continuum (ε = V_s and ε = −ε_A + V_p for conduction and valence electrons respectively).
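The gap between the particle-in-a-box formula and the bound states of a finite well is easy to quantify numerically. The Python sketch below solves the even-parity matching condition (k_A/m_A) tan(k_A L/2) = κ_B/m_B for the ground state of a single rectangular well and compares it with the infinite-well energy. It assumes parabolic bands, a barrier height of 224.46 meV (0.6 × 1247 × 0.3) and a barrier mass of 0.0919 m_0; the last value, the matching condition as written and the function names are assumptions of this sketch, which neglects the non-parabolicity included in the calculations described in the text.

```python
import math

HBAR2_OVER_2M0 = 3809.98  # hbar^2 / (2 m0) in meV * Angstrom^2


def infinite_well_e1(l, m_a=0.067):
    """Ground level of an infinitely deep well of thickness l (Angstrom), in meV."""
    return HBAR2_OVER_2M0 * (math.pi / l) ** 2 / m_a


def finite_well_e1(l, v=224.46, m_a=0.067, m_b=0.0919):
    """Ground level of a finite rectangular well, in meV, found by bisection."""
    def mismatch(e):
        k = math.sqrt(m_a * e / HBAR2_OVER_2M0)
        kappa = math.sqrt(m_b * (v - e) / HBAR2_OVER_2M0)
        return (k / m_a) * math.tan(0.5 * k * l) - kappa / m_b

    # The mismatch is negative near e = 0 and positive just below e_max,
    # so a root always exists: the well always binds at least one state.
    e_max = min(v, infinite_well_e1(l, m_a))
    lo, hi = 1e-9, e_max * (1.0 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for l in (30.0, 80.0, 150.0):
        e_fin, e_inf = finite_well_e1(l), infinite_well_e1(l)
        print(f"L = {l:5.0f} A : finite {e_fin:7.2f} meV, "
              f"infinite {e_inf:7.2f} meV, ratio {e_fin / e_inf:.2f}")
```

The ratio of the two energies approaches unity only slowly as the well gets thicker, which is the sense in which the particle-in-a-box result is justified only for very thick wells.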
FIG. 9. The energies of bound (solid lines) and virtually bound (dashed lines) states for electrons in GaAs-Ga_{0.7}Al_{0.3}As quantum wells are plotted versus the GaAs slab thickness L. After Bastard (1988).
FIG. 10. The energies of bound (solid lines) and virtually bound (dashed lines) states for heavy holes in GaAs-Ga_{0.7}Al_{0.3}As quantum wells are plotted versus the GaAs slab thickness L. After Bastard (1988).
The non-parabolicity effects, i.e. the increase of μ_A(ε) with ε and its related effects on the energies of the Γ6-related bound states, play a minor (though not negligible) part in GaAs-Ga_{1-x}Al_xAs quantum wells. This is because the dimensionless parameter E_n/ε_A which controls the effects is usually small (≤ 0.2) in GaAs. For heterostructures whose well-acting materials have a narrower bandgap than GaAs, the non-parabolicity is more important. This is illustrated in Fig. 11 where the Γ6 states of Hg_{0.84}Cd_{0.16}Te-Hg_{0.76}Cd_{0.24}Te rectangular quantum wells are plotted versus the Hg_{0.84}Cd_{0.16}Te well thickness L. One notices that the E_n energies decrease like L^{-1} rather than L^{-2}. This is because ε_A vanishes in Hg_{0.84}Cd_{0.16}Te, which leads to light particle dispersion relations which are linear rather than quadratic in the wavevector.

Most of the quantum well and superlattice structures grown so far were designed to display rectangular band edge profiles, which implies abrupt interfaces. The abruptness is however never perfect. At the very least, there exists a transition region, two monolayers thick (see Fig. 12), where the local environment of the C atom is neither that of an AC bulk material nor that of a BC bulk material. Thus, rectangular quantum wells are better viewed as the idealization of actual structures which display some interface grading. Very little is known about the modelling of interface grading. Clearly, the grading
FIG. 11. The energies of bound (solid lines) states for electrons in Hg_{0.84}Cd_{0.16}Te-Hg_{0.76}Cd_{0.24}Te quantum wells are plotted versus the Hg_{0.84}Cd_{0.16}Te slab thickness L. The T, band offset V, has been taken equal to 3.2 meV. After Bastard et al. (1987).
FIG. 12. Schematic representation of an interface between two materials A (chemical formula BC) and B (chemical formula AB). Notice the existence of the hybrid bond A-B-C at the interface.
should be sensitively dependent upon the growth conditions, host materials, etc. Empirically, one assumes that the band edge profile experienced by the envelope functions in an undoped heterostructure is piecewise constant except near the ith interface, where the band discontinuities are taken up over a transition region of width b_i. In the transition regions, the band edges
are assumed to vary smoothly with position. Energy level calculations of graded but otherwise rectangular GaAs-Ga_{1-x}Al_xAs quantum wells have been reported (Stern and Schulmann, 1985) and, as expected, small energy changes have been found if the widths b_i of the graded regions are much smaller than the GaAs slab thickness L. This amounts to saying that the carrier wavefunction is little affected by changes which occur in regions where the particle has little probability of being found. It may in fact be shown (Bastard and Voos, 1985) that if the graded regions b_1, b_2 of a quantum well are such that b_1 + b_2 << L and if the grading is reasonably smooth, the energy levels of such a graded rectangular quantum well are the same as those of a perfectly rectangular well of thickness L + (b_1 + b_2)/2, irrespective of the exact shape of the grading. It seems that this rule is obeyed over a wide range of b_1, b_2, as shown in Fig. 13.

In some non-standard heterostructures, the band edge profiles are intentionally designed not to be rectangular. This is either to meet a specific device requirement (e.g. sawtooth superlattices (Capasso et al., 1983)) or to
FIG. 13. A comparison between two kinds of energy level calculations for a graded quantum well (linear grading on one side of the well). The crosses correspond to the energy levels of a rectangular quantum well with thickness L + b/2 while the dots correspond to the numerical integration of the Schrödinger equation. The effective mass is taken equal to 0.048 m_0 and only the E_1 state is considered.
FIG. 14. Conduction band edge of a pseudo-parabolic (upper figure) and a narrow Separate Confinement Heterostructure (lower figure). The thick parabola in the upper figure is the apparent conduction band edge profile experienced by the carrier.
display energy levels which are sensitive to a given parameter, for instance the band offset. Two band edge profiles have successfully been designed to achieve a better sensitivity to the band offset in GaAs-Ga_{1-x}Al_xAs quantum wells. These are the pseudo-parabolic wells (Miller et al., 1984b) and the Separate Confinement Heterostructures (Meynadier et al., 1985b) (also called two steps quantum wells (Miller et al., 1985a)). Their conduction band edge profiles are shown in Fig. 14. The reason why non-rectangular quantum wells are better suited than rectangular ones to extract the value of, say, the conduction band offset out of optical measurements of energy levels can be found by simple qualitative reasoning. Consider the Schrödinger equation for the Γ6 envelope function and neglect the band non-parabolicity, which is irrelevant to the present discussion. Assume that the electron moves in a symmetrical potential (Fig. 15):

\[
V(z) = V_{0}\left(\frac{z}{L}\right)^{\alpha}, \quad 0 \le z \le L; \qquad
V(z) = V_{0}, \quad z \ge L; \qquad
V(z) = V(-z)
\tag{55}
\]
FIG. 15. The potential energy V/V_0 is plotted versus the dimensionless carrier position z/L for a potential profile (|z|/L)^α and several values of α.
We are interested in selecting the α which renders the energy levels the most sensitive to the offset V_0. Using the W.K.B. method we obtain the approximate quantization rule:

where m* is the carrier effective mass, assumed to be position-independent for simplicity. Thus, qualitatively, we get:
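The scaling can be recovered with a short worked derivation. What follows is a sketch under stated assumptions, not a transcription of the original equations: the potential is taken as V(z) = V_0 (|z|/L)^α, the level E_N is assumed to lie below V_0, the WKB rule is written with the usual half-integer phase for two smooth turning points, and the labelling N = 0, 1, 2, ... is our own convention.

```latex
% WKB quantization between the turning points \pm z_N, with V(z_N) = E_N
\int_{-z_N}^{z_N} \sqrt{2 m^{*}\left[E_N - V(z)\right]}\; dz
   = \left(N + \tfrac{1}{2}\right)\pi\hbar ,
\qquad z_N = L \left(\frac{E_N}{V_0}\right)^{1/\alpha}

% Substituting z = z_N u turns the integral into a pure number I_\alpha
2\sqrt{2 m^{*} E_N}\; L \left(\frac{E_N}{V_0}\right)^{1/\alpha} I_\alpha
   = \left(N + \tfrac{1}{2}\right)\pi\hbar ,
\qquad
I_\alpha = \int_{0}^{1}\sqrt{1 - u^{\alpha}}\; du
         = \frac{\sqrt{\pi}}{2}\,
           \frac{\Gamma\!\left(1 + 1/\alpha\right)}{\Gamma\!\left(3/2 + 1/\alpha\right)}

% Solving for E_N gives the dependence on the offset V_0
E_N = \left[\frac{\left(N + \tfrac{1}{2}\right)\pi\hbar}
                 {2\, I_\alpha\, L \sqrt{2 m^{*}}}\right]^{2\alpha/(\alpha + 2)}
      V_0^{\,2/(\alpha + 2)}
```

For α = 2 one has I_2 = π/4 and the expression reduces to the harmonic oscillator ladder (N + 1/2)ħω with ω² = 2V_0/(m*L²), so that E_N scales as V_0^{1/2}; for α = 1 it scales as V_0^{2/3}; and for α → ∞ the exponent 2/(α + 2) tends to zero, which is the rectangular well limit in which the levels become insensitive to V_0.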
Equation (59) shows that the smaller α is, the more sensitive the E_N's are to V_0. In fact, very large α's correspond to almost rectangular quantum wells whose energy levels are nearly independent of V_0, while small α's make E_N depend markedly upon V_0. This is why band offset determinations are easier in non-rectangular quantum well structures. The pseudo-parabolic wells are multiple quantum wells whose thicknesses are adjusted to mimic a parabolic band edge profile. Thus, for these structures α = 2 and E_N ∼ V_0^{1/2}. The multiple steps quantum wells are close to α = 1 if they contain many narrow steps (pseudo-linear quantum wells). In such a limit α = 1 and E_N ∼ V_0^{2/3}. Even in their two steps version (Separate Confinement Heterostructures), there are several advantages in optically investigating these structures. The lowest lying electron and hole states are essentially confined within the narrow wells, while
more excited levels are spread over the whole structure. This helps in identifying the origin of the optical transitions (see Section IV).

2. In-plane dispersion relations

The in-plane dispersion relations E_n(k_⊥) are, in general, impossible to obtain in closed form. This is because the off-diagonal terms in ℋ + δℋ prevent the light particle states from decoupling from the heavy hole states at finite k_⊥. However, for Γ6-related states, if well separated from the Γ8-related ones, it is sensible to discard the δℋ matrix. This allows the use of the same projection technique as outlined previously and yields entirely analytical results for flat band heterostructures. Let k_A and k_B denote the z projection of the carrier wave-vector in the A and B layers. Then, we have:

\[ \varepsilon\,(\varepsilon + \varepsilon_A) = \frac{2\hbar^{2}}{3}\,\left(k_{\perp}^{2} + k_{A}^{2}\right) P^{2} \]
For an AB superlattice with period d (d = L_A + L_B) the in-plane dispersion relations are (Bastard, 1981b, 1982, 1986):

where:

and where Δ_A, Δ_B have been set equal to infinity. In Eqs. (60-62) propagating waves have been assumed to exist in both kinds of layers. In the case of evanescent propagation in the B layer k_B should be replaced by iκ_B in Eqs. (61, 62). It may be pointed out that Eq. (62) is not simply obtained from Eqs. (39, 50) by using k_⊥-dependent k_A, k_B. Instead, the superlattice in-plane dispersion relations contain an explicit k_⊥-dependent contribution, which originates from the hosts' non-parabolicity. Since there is an effective mass mismatch between the A and B materials, the apparent band offset V_s for the z motion becomes implicitly k_⊥-dependent. For instance, there may exist situations where a quantum well binds n Γ6-related states at k_⊥ = 0 but n' ≠ n at k_⊥ ≠ 0. This k_⊥-dependent binding or de-binding of states was recently discussed by Doezema and Drew (1986).
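The non-parabolic bulk relation underlying these formulas, ε(ε + ε_A) = (2/3)ħ²P²k² with k² = k_⊥² + k_A² and infinite spin-orbit energy, can be explored numerically. In the Python sketch below the constant C = (2/3)ħ²P² is fixed from a band-edge mass through C = ε_A ħ²/(2m*), which follows from the small-k limit of the relation when remote bands are neglected; that identification, the GaAs numbers used and the function names are assumptions of this illustration.

```python
import math

HBAR2_OVER_2M0 = 3809.98  # hbar^2 / (2 m0) in meV * Angstrom^2


def kane_constant(gap, m_rel):
    """C = (2/3) hbar^2 P^2 in meV^2 * Angstrom^2, from the band-edge mass.

    Assumption: the small-k limit gap * e = C k^2 is matched to
    e = hbar^2 k^2 / (2 m*), with remote bands neglected.
    """
    return gap * HBAR2_OVER_2M0 / m_rel


def kane_energy(k, gap, c):
    """Positive root of e * (e + gap) = c * k^2 (k in 1/Angstrom, e in meV)."""
    return 0.5 * (-gap + math.sqrt(gap * gap + 4.0 * c * k * k))


if __name__ == "__main__":
    gap, m_rel = 1519.2, 0.067  # GaAs values quoted in the text
    c = kane_constant(gap, m_rel)
    for k in (0.01, 0.03, 0.05):
        parabolic = HBAR2_OVER_2M0 * k * k / m_rel
        print(f"k = {k:.2f} 1/A : Kane {kane_energy(k, gap, c):8.2f} meV, "
              f"parabolic {parabolic:8.2f} meV")
    # Zero-gap limit: the dispersion becomes linear in k.
    print(f"zero gap, k = 0.01 1/A : {kane_energy(0.01, 0.0, c):.2f} meV")
```

Setting the gap to zero makes the energy strictly proportional to k, so a confined level with k of order pπ/L scales as 1/L instead of 1/L², which is the behavior noted earlier for wells made of a zero-gap alloy.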
The flat band heterostructures are a particular case of band edge profiles which are symmetrical with respect to the centers of the A and B layers. Owing to this parity property, to the fact that we have neglected the inversion-asymmetry splitting of bulk materials and to the assumed zero magnetic field, each allowed energy in Eq. (62) is twice degenerate (Kramers degeneracy). We shall see below that non-centro-symmetric band edge profiles combine with non-vanishing spin-orbit coupling to induce a sizeable lifting of the Kramers degeneracy for Γ8-related states at finite k_⊥. For Γ6-related levels in wide bandgap heterostructures the corresponding splittings are much smaller (Stein et al., 1984). This is due to the S-like symmetry of the periodic parts of the Bloch functions.

Even if the hosts' bandgaps are large enough to allow the Γ6 and Γ8 levels to be decoupled, the in-plane dispersion relations of the Γ8 subbands are very complicated. Numerical diagonalizations become necessary. These can be achieved in many ways.

(i) An approximate but flexible method consists of diagonalizing the off-diagonal δℋ terms on the basis generated by the solutions of the diagonal Γ8 terms of the δℋ matrix (Ando, 1985; Bastard and Brum, 1986). Notice that when we restrict our considerations to the Γ8 subbands the definitions of the γ_1, γ_2, γ_3 parameters ought to be modified in such a way as to include the Γ6 bands in the sets of remote bands. This amounts to making the changes:
Similar changes hold for the γ parameters in the B layers. The diagonalization of the off-diagonal perturbation on a basis built from the eigensolutions of the δℋ diagonal part requires that the boundary conditions imposed on the perturbed and unperturbed solutions are the same. Clearly this is possible only if the γ parameters are identical in both kinds of layers or if the eigenstates under consideration are fairly well localized within one kind of layer, so as to minimize the effects of the jumps of the γ parameters at the interfaces. Thus, the treatment is actually applicable only to heterostructures whose host materials have similar γ's. Fortunately this is the case for many systems including the GaAs-Ga_{1-x}Al_xAs one. A clear counter-example is
FIG. 16. In-plane dispersion relations of Γ8-related states in GaAs-Ga_{0.7}Al_{0.3}As single quantum wells (L = 100 Å and L = 150 Å respectively) in the axial approximation. The dashed lines correspond to the diagonal approximation while the solid lines account for the off-diagonal terms in the Luttinger hamiltonian. After Bastard and Brum (1986).
provided by the HgTe-CdTe heterostructures where the γ's change sign at the interfaces (see Section VI). To illustrate the previous method we show in Fig. 16 the in-plane dispersions of Γ8-related states for two GaAs-Ga_{1-x}Al_xAs rectangular quantum wells. Since the hole energies are much smaller than the GaAs spin-orbit splitting Δ_A, the coupling between Γ8 and Γ7 levels has been neglected. The dashed lines represent the subband dispersions obtained by neglecting the off-diagonal terms. Hereafter this will be termed the diagonal approximation. In the diagonal approximation all the results are analytical. One finds that the k_⊥ = 0 heavy hole eigenstates HH_1, HH_2, ... behave as light holes for their in-plane motion (see Eq. (16)). Conversely, the light hole states for the z motion display a heavy hole character for their in-plane motion (see Eq. (17)). As a result of this mass reversal, there should exist crossings between the HH_n and LH_m levels. The off-diagonal terms replace these crossings by anticrossings. The only parameter-independent conclusion one can draw about the Γ8 dispersions is that HH_1, the topmost lying valence level, always displays a heavier in-plane mass than predicted by the diagonal approximation. This stems from the negative sign of the energy denominators HH_1-HH_n, LH_n-HH_1 at k_⊥ = 0, which implies that all the couplings between HH_1 and the other valence levels contribute to lowering the HH_1 hole energy. Another feature often found in the in-plane dispersions is the electron-like segment exhibited by several subbands in the vicinity of k_⊥ = 0 (notably LH_1). For
LH_1 this is a consequence of the balance between the downward shift due to HH_1 and the upward shifts due to HH_2, LH_2, .... For narrow wells, such that only HH_1 and LH_1 are bound, LH_1 always displays a hole-like curvature. As soon as HH_2 (and other levels) becomes bound, LH_1 first behaves like an electronic subband and then turns into a hole subband at large k_⊥. All the levels shown in Fig. 16 are twice degenerate since a rectangular band edge profile is symmetric with respect to the center of the well. For asymmetric structures such as those in Fig. 17 one should expect a lifting of the Kramers degeneracy. In fact, the calculated subband dispersion of a GaAs-Ga_{1-x}Al_xAs quantum well tilted by an external electric field does show (Fig. 18) a pronounced lifting of the Kramers degeneracy for accessible field strengths (10^4-10^5 V/cm).

Finally, it may be stressed that although we keep labeling the k_⊥ ≠ 0 Γ8 subbands in the same manner as used at k_⊥ = 0, i.e. HH_n, LH_n, this denomination is somewhat misleading. At k_⊥ ≠ 0 there is a strong mixing between the k_⊥ = 0 eigenstates, which is witnessed by the marked departure of the actual subband dispersions from those obtained in the diagonal approximation. In other words, the expectation value of the z-component of the total angular momentum J_z (or of any function f(J_z)) is neither purely ±3/2 nor
FIG. 17. Some examples of band edge profiles of four different heterostructures lacking inversion symmetry. (a) Asymmetric quantum well. (b) Pseudo saw-tooth quantum well. (c) Rectangular quantum well tilted by an electric field. (d) Modulation-doped p-type heterostructure.
FIG. 18. In-plane dispersion relations of the valence subbands of a GaAs-Ga₀.₇Al₀.₃As quantum well (L = 100 Å) subjected to an external electric field F // z (F = 10⁵ V/cm). The twofold Kramers degeneracy prevailing at k⊥ = 0 is lifted at k⊥ ≠ 0. After Bastard and Brum (1986).
±1/2 (or f(±3/2), f(±1/2)). This is illustrated in Fig. 19 for the two wells whose subband dispersions were shown in Fig. 16. This band mixing effect (Sooryakumar et al., 1985; Chang and Schulman, 1983, 1985; Altarelli, 1983, 1985, 1986; Bastard and Brum, 1986) influences the optical spectra (see Section III). (ii) Another method, which in practice works only for flat band heterostructures, consists of finding the propagating and evanescent bulk states (k⊥, k_A), (k⊥, k_B) of the host layers and then, by applying the k⊥ ≠ 0 boundary conditions at the interfaces, numerically calculating the in-plane dispersion relations. Such a method is exact and therefore applicable to any flat band heterostructure, with wide gap or narrow gap host materials, similar or completely different γ's, etc. We show in Figs. 20 and 21 the calculated in-plane subbands of a 70 Å-70 Å and a 100 Å-100 Å InAs-GaSb superlattice, obtained in the six band model (i.e. Δ_A, Δ_B being set equal to infinity) (Berroir, 1985). The energy zero has been set at the bottom of the InAs (Γ₆) conduction band, which lies 0.15 eV below the top of the GaSb valence band (Esaki, 1980). This staggered configuration leads to a crossing, for k⊥ = 0, of the E₁ and HH₁ subbands at d = d_c ≈ 180 Å, where d is the superlattice period if equal InAs and GaSb layer thicknesses are assumed. The E₁ and HH₁
FIG. 19. The quantity ⟨J_z²⟩^(1/2) is plotted versus k⊥ for two rectangular GaAs-Ga₀.₇Al₀.₃As quantum wells (L = 100 Å and L = 150 Å respectively). In the diagonal approximation ⟨J_z²⟩^(1/2) would be equal to 3/2 (HHₙ levels) or 1/2 (LHₙ levels) irrespective of the k⊥ value. After Bastard et al. (1987).
subbands are heavily localized within the InAs and GaSb layers respectively. For superlattice periods d < d_c, E₁(k⊥ ≠ 0) lies above HH₁(k⊥ ≠ 0), while the reverse is true for d > d_c. The in-plane subbands of the 70 Å-70 Å superlattice exhibit a relatively simple pattern. The E₁ subband is electron-like in the layer plane, while there exist hybridizations and anticrossings between the HHₙ and LHₙ subbands, as in GaAs-Ga₁₋ₓAlₓAs heterolayers. On the other hand, the 100 Å-100 Å superlattice is such that E₁(k⊥ = 0) < HH₂(k⊥ = 0) < HH₁(k⊥ = 0). As a result of the k⊥-induced couplings, HH₁, HH₂ and E₁ anticross. A noticeable consequence of this level repulsion is the electron-like curvature exhibited by HH₁ at large enough k⊥. It is in fact a remnant of what the E₁ subband would have given if it had been allowed to cross HH₁. One notices in Fig. 21 that the lowest lying electron branch (HH₁) is always separated from the topmost hole branch (HH₂) by a finite energy gap. This result invalidates the concept of a semimetal → semiconductor transition which
FIG. 20. In-plane dispersion relations of a (70 Å-70 Å) InAs-GaSb superlattice with d = 140 Å. The energy zero is taken at the bottom of the InAs conduction band. q = 0. Courtesy J. M. Berroir.
FIG. 21. In-plane dispersion relations of a (100 Å-100 Å) InAs-GaSb superlattice with d = 200 Å. The energy zero is taken at the bottom of the InAs conduction band. q = 0. Horizontal axis: k⊥ (π/d). Courtesy J. M. Berroir.
would have occurred for d > d_c (Altarelli, 1983, 1985, 1986). However, the forbidden gaps are very small (a few meV) and to our knowledge have never been evidenced (e.g. by a truly semiconducting behavior at low temperature). More work remains necessary before the complicated band structure of InAs-GaSb superlattices is understood. (iii) Another method, which works in any heterostructure, has been developed by Altarelli and coworkers (Altarelli, 1983, 1985, 1986; see also Ekenberg and Altarelli, 1984, and Fasolino and Altarelli, 1984, 1986). It amounts to projecting the vector envelope function on a prescribed basis within each kind of layer and enforcing the correct boundary conditions at the interfaces. The numerical calculations appear to be heavy but the method is fairly general. It has been successfully implemented for rectangular type I, II and III quantum wells and superlattices, modulation-doped p-type heterojunctions, sawtooth superlattices, etc. When a comparison can safely be made, the three types of calculations give very similar results. Most of the differences arise from the choice of the higher band parameters which, as stressed previously, are often disputable.

C. Perturbation of Heterostructure Electronic States by External Fields
The energy spectra of bulk III-V or II-VI cubic materials subjected to external electric (F) or magnetic (B) fields depend relatively little on the directions of these fields, and only through the band warping. The latter is most pronounced for the Γ₈ valence bands. On the other hand, heterostructures, being strongly anisotropic media, display a marked dependence upon the direction of the applied fields. The most striking effects arise when F or B is collinear with the growth direction, whereas in-plane fields produce effects which are either similar to those observed in bulk materials (electric field effects) or relatively weak (magnetic field effects). In the latter case the Landau quantization (characteristic energy: ħeB/m*c, where m* is an appropriate effective mass) hardly competes with the size quantization (characteristic energy: ħ²π²/2m*L², where L is the characteristic length of the quantum well).

1. Electric Field Effects
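To make the comparison between the two characteristic energies concrete, the following minimal sketch evaluates them in SI units (where the cyclotron energy reads ħeB/m*). The effective mass m* = 0.067 m₀, the well width L = 100 Å and the field B = 10 T are illustrative assumptions typical of GaAs conduction states; they are not values quoted in the text, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
M0 = 9.1093837015e-31       # kg, free electron mass


def cyclotron_energy_meV(b_tesla, m_eff):
    """Landau spacing hbar*e*B/m* in meV (SI form of hbar*e*B/(m* c))."""
    joules = HBAR * E_CHARGE * b_tesla / (m_eff * M0)
    return joules / E_CHARGE * 1e3


def size_quantization_energy_meV(l_angstrom, m_eff):
    """Characteristic confinement energy hbar^2 pi^2 / (2 m* L^2) in meV."""
    l_m = l_angstrom * 1e-10
    joules = (HBAR * math.pi) ** 2 / (2.0 * m_eff * M0 * l_m ** 2)
    return joules / E_CHARGE * 1e3


if __name__ == "__main__":
    # Assumed, illustrative parameters (GaAs-like electron).
    print(cyclotron_energy_meV(10.0, 0.067))
    print(size_quantization_energy_meV(100.0, 0.067))
```

With these assumed numbers the confinement energy exceeds the Landau spacing by roughly a factor of three, which is the sense in which an in-plane magnetic field "hardly competes" with the size quantization.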
i. In-plane Electric Field (F // x)

Let us for simplicity restrict our consideration to Γ₆-related heterostructure states. In the parabolic limit, and neglecting effective mass and dielectric mismatches, the envelope functions are the solutions of:

$$\left[\frac{p^2}{2m^*} + V(z) + eFx\right] f(\mathbf{r}) = \varepsilon f(\mathbf{r}) \qquad (67)$$
where V(z) is the heterostructure potential. The electronic motion can thus be separated into x-, y- and z-dependent contributions, f(r) = g(x) exp(ik_y y) χₙ(z), with:

$$\varepsilon = \varepsilon_x + \frac{\hbar^2 k_y^2}{2m^*} + \varepsilon_n \qquad (68)$$

In Eq. (68) εₙ is the nth eigenvalue for the carrier motion along the growth axis and ħ²k_y²/2m* takes care of the free motion along the y axis. The energy spectrum for the x motion is continuous and unbound (−∞ < ε_x < +∞). In fact, g(x) can be expressed in terms of the Airy function:

$$g(x) = \mathrm{Ai}\left[\left(\frac{2m^*eF}{\hbar^2}\right)^{1/3}\left(\frac{\varepsilon_x}{eF} - x\right)\right]$$

where Ai is defined here as the solution of:

$$\frac{d^2\mathrm{Ai}}{dt^2} + t\,\mathrm{Ai}(t) = 0$$

The continuous spectrum for the x motion merely recalls the fact that an electron is constantly accelerated along the field, which leads to an unbound motion. Also, a possible motion can be associated with any ε_x, positive or negative, since the eFx term takes arbitrarily large values of either sign. This behaviour is not peculiar to heterostructures but has already been analyzed in bulk materials (Franz, 1958; Keldysh, 1958). The only difference between bulk materials and heterostructures lies in the replacement of a plane wave exp(ik_z z) (bulk materials) by the envelope function χₙ(z) (heterostructures). In the case of Γ₈-related subbands an in-plane electric field will induce inter-subband transitions, since x, y and z motions as well as spin and orbital variables are admixed, even at F = 0. This particular configuration has not been much studied to our knowledge.

ii. Longitudinal Electric Field (F // z)

Restricting again our considerations to Γ₆-related subbands and considering a quantum well configuration, we separate the electronic energy ε into longitudinal (ε_z) and transverse (ε_⊥) parts. The latter is the kinetic energy associated with the x and y motions:

$$\varepsilon = \varepsilon_z + \varepsilon_\perp = \varepsilon_z + \frac{\hbar^2 k_\perp^2}{2m^*}$$
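The envelope function f(r) is factorized accordingly:

$$f(\mathbf{r}) = \frac{1}{\sqrt{S}} \exp(i\mathbf{k}_\perp \cdot \mathbf{r}_\perp)\, \chi(z)$$

where S is the sample area and χ(z) is the eigensolution of:

$$\left[-\frac{\hbar^2}{2m^*}\frac{d^2}{dz^2} + eFz + V(z)\right]\chi(z) = \varepsilon_z\,\chi(z) \qquad (74)$$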
Again the eigenvalues ε_z can take any value between −∞ and +∞ which, as in II.C.1.i, arises from the unboundedness of the electrostatic potential. However, if the electric field intensity is not too large (say F ≲ 10⁵ V/cm), certain quasi-discrete energies ε_z are such that their associated envelope functions pile up inside the quantum well, i.e. these ε_z's are virtual bound states. They often have lifetimes which are so long that they behave, for all practical purposes, as true bound states. Several authors (Miller et al., 1985a; Austin and Jaros, 1985; Singh, 1986) have recently discussed the widths and positions of the resonant states of a quantum well tilted by an external electric field. When the widths are small (see below for a criterion) their results coincide with those of perturbation and variational calculations performed by neglecting the tunnel effect across the tilted barriers, i.e. by treating the virtual bound states as true bound states (Bastard et al., 1983). Most of the experimental results and device applications involve the ground quantum well states E₁, HH₁, LH₁ in GaAs-Ga₁₋ₓAlₓAs single or multiple quantum wells. Thus from now on we shall only discuss these cases. The criterion for neglecting the finite lifetime of the quantum well bound states due to the field-induced carrier escape outside the well is that the effective barrier height V_e − E₁ (respectively |V_p| − HH₁, |V_p| − LH₁) is not appreciably lowered by the electric field over a distance equal to the penetration length κ_e⁻¹ (respectively κ_hh⁻¹, κ_lh⁻¹) in the barrier, i.e.

$$\frac{eF\hbar}{\sqrt{2m_e\left(V_e - E_1^{(0)}\right)}} \ll V_e - E_1^{(0)} \qquad (75)$$

$$\frac{eF\hbar}{\sqrt{2m_{hh}\left(|V_p| - HH_1^{(0)}\right)}} \ll |V_p| - HH_1^{(0)} \qquad (76)$$

$$\frac{eF\hbar}{\sqrt{2m_{lh}\left(|V_p| - LH_1^{(0)}\right)}} \ll |V_p| - LH_1^{(0)} \qquad (77)$$
In Eqs. (75-77), E₁⁽⁰⁾, HH₁⁽⁰⁾, LH₁⁽⁰⁾ are the ground state confinement energies for electron, heavy hole and light hole respectively, while m_e, m_hh and m_lh are the corresponding effective masses. Assuming that the conditions (75-77) are fulfilled, we notice that at low enough electric field a quadratic Stark shift prevails, e.g. for electrons one may write:

$$E_1(F) = E_1^{(0)} - \alpha_e F^2 \qquad (78)$$
There is no term linear in F in Eq. (78) since, for a rectangular quantum well, the zero field eigenfunctions have a definite parity, which prevents the existence of a permanent electric moment. A finite electric field polarizes the carrier wavefunction and the quadratic Stark shift reflects the interaction of the induced moment with the inducing field. The coefficient α_e can be calculated by a second order perturbation expansion:

$$\alpha_e = e^2 \sum_{n \neq 1} \frac{\left|\langle \chi_1^{(0)} | z | \chi_n^{(0)} \rangle\right|^2}{E_n^{(0)} - E_1^{(0)}} \qquad (79)$$

where the χₙ⁽⁰⁾ are the zero field eigenfunctions of the quantum well. We see from Eq. (79) that α_e > 0, i.e. that E₁ experiences a red shift, and that α_e scales like m_e L⁴. This scaling law indicates that, L being given, the heavy holes are more easily polarizable than the electrons. On the other hand, the range of validity of the perturbation expansion is narrower for heavy holes than for electrons. In fact, Eq. (79) is justified if:

$$eFL \ll \frac{\hbar^2\pi^2}{2m_eL^2} \qquad (80)$$
i.e. the electric field domain where Eq. (79) is valid narrows like m_e⁻¹L⁻³. When the perturbative approach fails (but the tunnel escape remains negligible) one enters the carrier accumulation regime. This regime is characterized by a variation of E₁(F) with F which is weaker than a quadratic law, and corresponds to a situation where the induced dipole tends to saturate instead of growing linearly with F. The carrier wavefunction accumulates near one interface (Fig. 22). If the confining barriers were impenetrable this regime would last up to infinite F, the carrier escape being blocked by the quantum well walls. Actually, χ₁(z) leaks more and more heavily outside the well until the tunnel effect becomes large enough to invalidate Eqs. (75-77). For larger fields the carrier is swept out of the well: the continuous spectrum of allowed ε_z values no longer displays quasi-discrete ε_z's. It should be stressed that the critical field F_c beyond which the quasi-discrete level fades away is typically larger than 10⁵ V/cm for actual GaAs quantum well thicknesses (50-200 Å). The energy shifts E₁(F) − E₁(0), HH₁(F) − HH₁(0) may be equal to several tens of meV for F < F_c. The capability of tuning the energies of the quantum well states, and thus of the optical transitions (see Fig. 23), over a significant range while (essentially) preserving the carrier localization in the well is one of the key reasons for the achievement of novel high speed electro-optical devices. The effect of a longitudinal electric field on the in-plane dispersion of the valence subbands can only be analyzed numerically. We have already discussed the main effect, which is the lifting of the Kramers degeneracy due to the large spin-orbit effects in the Γ₈-related valence subbands (see Fig. 18 for a relevant example).
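The m L⁴ scaling of the quadratic Stark coefficient can be checked with a short sketch. It assumes infinitely high barriers (an idealization not made in the text), for which the dipole matrix elements between well states are known analytically, and sums the second order perturbation series for the ground state. The GaAs-like mass 0.067 m₀ used in the example call is likewise an assumption, and all function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
M0 = 9.1093837015e-31       # kg


def stark_coefficient(n_max=200):
    """Dimensionless C with E1(F) = E1(0) - C * m* e^2 F^2 L^4 / hbar^2,
    for an infinite-barrier well, from second order perturbation theory."""
    total = 0.0
    for n in range(2, n_max + 1, 2):  # parity: only even n couple to n = 1
        z_1n = 8.0 * n / (math.pi ** 2 * (n ** 2 - 1) ** 2)  # |<1|z|n>| / L
        gap = (n ** 2 - 1) * math.pi ** 2 / 2.0  # (E_n - E_1) / (hbar^2 / m L^2)
        total += z_1n ** 2 / gap
    return total


def stark_shift_meV(field_v_per_cm, l_angstrom, m_eff):
    """Quadratic Stark shift of the ground level in meV (negative: red shift)."""
    f = field_v_per_cm * 1e2          # V/m
    l_m = l_angstrom * 1e-10          # m
    joules = (stark_coefficient() * m_eff * M0
              * (E_CHARGE * f) ** 2 * l_m ** 4 / HBAR ** 2)
    return -joules / E_CHARGE * 1e3


if __name__ == "__main__":
    print(stark_coefficient())
    print(stark_shift_meV(1e5, 100.0, 0.067))  # assumed GaAs-like electron
```

Within this idealization the series sums to (15/π² − 1)/(24π²), and doubling L at fixed field multiplies the shift by 16. For finite barriers the coefficient changes, but the parity argument and the perturbative structure of the sum are the same.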
FIG. 22. Envelope functions for electron and heavy hole in a quantum well (L = 30 Å and L = 100 Å respectively) tilted by an external electric field (F = 10⁵ V/cm). The dashed lines represent the well boundaries. After Bastard et al. (1983).
2. Landau Levels in Heterostructures

Let a static, uniform magnetic field B be applied to a semiconductor heterostructure. We denote by θ the angle between B and the growth (z) axis:

$$\mathbf{B} = (0,\; B\sin\theta,\; B\cos\theta) \qquad (81)$$
FIG. 23. The calculated exciton resonance energy shift ħω(F) − ħω(0) is plotted versus the electric field intensity F (in kV/cm) for GaAs-Ga₁₋ₓAlₓAs quantum wells of different thicknesses. After Bastard and Brum (1986).
The vector potential A associated with B can be chosen as:

$$\mathbf{A} = (Bz\sin\theta,\; Bx\cos\theta,\; 0) \qquad (82)$$
In the presence of a vector potential the momentum p of the electron should be replaced in the Hamiltonian Eq. (4) by p + eA/c, where e is the magnitude of the electronic charge and c is the speed of light. One should also add to Eq. (14) the interaction between the electron spin σ and the field, i.e. g₀μ_B B·σ, where g₀ = 2 is the free electron Landé g factor and μ_B is the Bohr magneton. These changes amount to replacing in the effective hamiltonian ℋ + δℋ the operator p by p + eA/c and to adding to ℋ + δℋ the diagonal "spin" matrix:

$$g^*\mu_B\,\mathbf{B}\cdot\boldsymbol{\sigma}\;1_{\Gamma_6} - \kappa^*\mu_B\,\mathbf{J}\cdot\mathbf{B}\,\left(1_{\Gamma_8} + 1_{\Gamma_7}\right) \qquad (83)$$

where g* (g* = 2F) and κ* are the remote band contributions to the Γ₆, Γ₈, Γ₇ gyromagnetic moments and where 1_Γ₆, 1_Γ₈, 1_Γ₇ are the identity matrices operating in the Γ₆, Γ₈ and Γ₇ subspaces respectively.

i. Longitudinal magnetic field (θ = 0)

If B // z, we expect a separability between the x, y and z motions, since the heterostructure potential depends only on z while B only affects the in-plane components of the orbital motion.
However, the Landau levels cannot in general be labelled by a single index n, owing to the in-plane warping of the dispersion relations. If, following Altarelli (1983, 1985), one adopts the axial approximation, which amounts to replacing γ₂ and γ₃ in Eq. (15) by their arithmetic average, the in-plane dispersion relations become isotropic and it is possible to label the Landau levels by a single index n. An inspection of ℋ + δℋ reveals that an ansatz of the form:

$$f_n(\mathbf{r}) = \frac{1}{\sqrt{L_y}}\exp(ik_yy) \times \left[\text{eight-component column of } z\text{-dependent envelopes multiplied by oscillator functions } \varphi\right] \qquad (84)$$

is a solution of the problem. In Eq. (84), φₙ is the nth harmonic oscillator function centered at −λ²k_y, where λ is the magnetic length:

$$\lambda = \left(\frac{\hbar c}{eB}\right)^{1/2} \qquad (85)$$

For B = 10 T the magnetic length is equal to 81 Å. In fₙ(r) it is assumed that the indexes of the oscillator wavefunctions are ≥ 0. Thus n ≥ −2. The Landau level energies are independent of k_y, which reflects the in-plane translational invariance of the heterostructure problem when the field is applied along the growth axis. For a given n, except −2, −1, 0 which are singular, there exist eight different manifolds of ℋ + δℋ eigenstates. Each of these manifolds can, for convenience, be labelled according to the zero field nomenclature: HHⱼ, LHⱼ, Eⱼ, (Γ₇)ⱼ. This convenient notation should not hide the fact that a finite magnetic field, like a finite k⊥, significantly admixes the zero field light particle states (LHⱼ, Eⱼ, (Γ₇)ⱼ) with the zero field heavy hole states (HHⱼ). Finally, even if the B = 0 eigenstates exhibit a twofold degeneracy, this degeneracy is lifted at finite B. Once n is given one may apply to the Landau level calculations the same recipes as outlined in the k⊥ ≠ 0 case (Section II.B.2). If flat band conditions prevail one may again search for the exact B ≠ 0 host eigenstates, extract the relevant k_z⁽ᴬ⁾, k_z⁽ᴮ⁾ wavevectors (2 × 8 per layer), write that inside each layer the heterostructure eigenstate is a linear superposition of these host eigenstates and finally determine the coefficients of the linear expansion by using appropriate boundary conditions across the interfaces. Or one can take the B = 0 eigenstates and diagonalize the B-dependent terms (for a given n) or
proceed like Altarelli and co-workers (Altarelli, 1983, 1985, 1986; Ekenberg and Altarelli, 1984; Fasolino and Altarelli, 1984, 1986). In any case, the numerical calculations are laborious. In heterostructures with wide gap hosts like GaAs-Ga₁₋ₓAlₓAs, the Γ₆-related states are well separated in energy from the Γ₈-related ones. The Landau levels of the Γ₈-related subbands are complicated (see Fig. 24 for an example). In a given manifold they are unevenly spaced in n and exhibit a pronounced non-linearity in B. This is a mere consequence of the complexity of the Γ₈ in-plane dispersion relations. The associated wavefunctions display a large mixing of m_J = ±3/2 and m_J = ±1/2 components. Only the n = −2 manifold is simple, for the associated wavefunctions have only an m_J = −3/2 component (see Eq. (84)). Thus, the n = −2 manifold is purely heavy hole-like. The Γ₆-related Landau levels of GaAs-based heterostructures are much simpler than the Γ₈-related ones. They exhibit a faint spin-splitting (Lassnig, 1985; Lommer et al., 1985; Stein et al., 1984) and increase almost linearly with B (the deviation from linearity may be due to the small band non-parabolicity of GaAs). If we retain a parabolic description of the hosts' Γ₆ bands and neglect the effective mass and g* mismatches, one can analytically derive the Landau level energies. The effective hamiltonian acting on the f₁
FIG. 24. Valence Landau levels of a GaAs-Ga(Al)As rectangular quantum well, plotted versus B (T). L = 100 Å. Axial approximation. After Brum (1987).
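and f₂ envelopes is:

$$\mathcal{H} = \frac{p_z^2}{2m^*} + V(z) + \frac{1}{2m^*}\left[p_x^2 + \left(p_y + \frac{eBx}{c}\right)^2\right] \pm \frac{1}{2}g^*\mu_B B$$

where the + sign refers to f₁. One readily obtains the eigen-energies:

$$\varepsilon_{\nu,n,\sigma_z} = E_\nu + \left(n + \tfrac{1}{2}\right)\hbar\omega_c + g^*\mu_B B\,\sigma_z, \qquad \sigma_z = \pm\tfrac{1}{2}$$

where:

$$\omega_c = \frac{eB}{m^*c}$$

and ν labels the eigenstates of the z motion. The wavefunction associated with ε_{ν,n,σ_z} is:

$$f_{\nu,n,\sigma_z}(\mathbf{r}) = \frac{1}{\sqrt{L_y}}\exp(ik_yy)\,\varphi_n(x + \lambda^2k_y)\,\chi_\nu(z)$$

with φₙ the nth harmonic oscillator function and λ the magnetic length. The interesting point is that the Landau level spectrum is entirely discrete (for the bound states of the z motion). For a given bound state ν one can write the density of states as:

$$\rho_\nu(\varepsilon) = g_B \sum_{n,\sigma_z} \delta\left(\varepsilon - \varepsilon_{\nu,n,\sigma_z}\right)$$

where g_B is the universal degeneracy factor associated with the center of the cyclotron orbit:

$$g_B = \frac{L_xL_y}{2\pi\lambda^2}$$

The discreteness of the B ≠ 0 spectrum is in marked contrast with the continuous spectrum which prevails at B = 0:

$$\rho_\nu^{B=0}(\varepsilon) = L_xL_y\,\frac{m^*}{\pi\hbar^2}\,Y(\varepsilon - E_\nu) \qquad (93)$$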
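As a numerical cross-check of the quantities entering the Landau level density of states, the sketch below evaluates the magnetic length and the orbit-center degeneracy in SI units, where the magnetic length reads (ħ/eB)^(1/2). It assumes the usual expression 1/(2πλ²) for the degeneracy per unit area, per Landau level and per spin, and an illustrative GaAs-like mass of 0.067 m₀. Neither assumption is a value quoted in the text, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
M0 = 9.1093837015e-31       # kg


def magnetic_length_angstrom(b_tesla):
    """Magnetic length sqrt(hbar / (e B)) in angstrom (SI form of sqrt(hbar c / e B))."""
    return math.sqrt(HBAR / (E_CHARGE * b_tesla)) * 1e10


def landau_degeneracy_per_area(b_tesla):
    """States per m^2, per Landau level and per spin: 1 / (2 pi lambda^2)."""
    lam = math.sqrt(HBAR / (E_CHARGE * b_tesla))
    return 1.0 / (2.0 * math.pi * lam ** 2)


def dos_2d_per_spin(m_eff):
    """Zero-field 2D density of states m* / (2 pi hbar^2), per J, per m^2, per spin."""
    return m_eff * M0 / (2.0 * math.pi * HBAR ** 2)


def cyclotron_energy_joule(b_tesla, m_eff):
    """hbar * omega_c = hbar e B / m* in joule."""
    return HBAR * E_CHARGE * b_tesla / (m_eff * M0)


if __name__ == "__main__":
    b, m = 10.0, 0.067  # assumed field (T) and effective mass (units of m0)
    print(magnetic_length_angstrom(b))
    # Each Landau level collects the zero-field states of one interval hbar*omega_c.
    print(landau_degeneracy_per_area(b) / cyclotron_energy_joule(b, m), dos_2d_per_spin(m))
```

The last line illustrates the bookkeeping behind the discrete spectrum: the degeneracy of one Landau level equals the number of zero-field states contained in an energy interval ħω_c, so the field redistributes the states without changing their total number.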
The imperfections and defects round off the δ singularities of the B ≠ 0 density of states (Fig. 25). Ando has performed extensive numerical computations of
FIG. 25. Density of states associated with the Landau levels of a semiconductor heterostructure (schematic). From top to bottom: unperturbed Landau levels; Landau levels perturbed by disorder (self-consistent Born approximation); empirical gaussian broadening; actual density of states: the hatched areas correspond to wavefunctions which are localized in space.
the density of states of the broadened Landau levels (Ando, 1983, 1984). It appears, at least for short range scatterers, that the calculated density of states approximately retains a gaussian shape:

$$\rho_{\nu n\sigma_z}(\varepsilon) = \left(2\pi\Gamma_n^2\right)^{-1/2}\exp\left[-\frac{\left(\varepsilon - E_\nu - \left(n + \tfrac{1}{2}\right)\hbar\omega_c - g^*\mu_B B\,\sigma_z\right)^2}{2\Gamma_n^2}\right] \qquad (94)$$

The widths Γₙ depend on the magnetic field B. More importantly, Ando found that the states which correspond to the tails of the gaussians are spatially localized, whereas a fringe of states around the unperturbed Landau level energies is spatially extended. The existence of localized states separating the extended states is one of the central ingredients for the explanation of the Quantum Hall Effect (von Klitzing et al., 1980).
The previous features (discreteness of the energy spectrum, localized tail states, etc.) are also present in the Landau levels of the Γ₈-related subbands. The complexity of the unbroadened spectrum has, to our knowledge, precluded any attempt at extending Ando's work to the calculation of the magnetic field dependent density of states of the valence subbands.

ii. Arbitrary θ

When the magnetic field is not parallel to the growth axis, the z and in-plane motions become coupled. As a result the eigenstates cannot be labelled by a single n (see however the case of n-i-p-i structures (Maan, 1984)). The most striking feature of the tilted magnetic field orientation is that the Landau orbital quantization can, to an excellent approximation, be described as if only the longitudinal component B cos θ of B played a part. This holds if the z motion is strongly quantized. This feature provides an immediate experimental test of the quasi bi-dimensionality of the carrier motion. The spin degree of freedom, however, depends only on the magnitude of B, as imposed by the rotational invariance of the Zeeman term. Thus, for Γ₆-related subbands, where to an excellent approximation spin and orbital motions separate, there is a possibility of some kind of "orientational spectroscopy" of the energy levels obtained by tilting the magnetic field. For Γ₈-related subbands the spin-orbit coupling prevents such a simple decoupling. In superlattices with thin enough barriers, the carrier motion along the growth axis exhibits significant dispersion (see Figs. 5, 7 for the order of magnitude of superlattice bandwidths). Thus, by tilting the magnetic field with respect to the growth axis one should not observe peak positions varying like B cos θ but rather probe the anisotropy of the superlattice subbands. Belle et al. (1985) have shown how interband magneto-optics with an in-plane magnetic field provides a decisive test of the existence of superlattice bands along the growth axis of GaAs-Ga₁₋ₓAlₓAs superlattices.
Electron cyclotron resonance experiments performed by Duffield et al. (1986) on GaAs-Ga₁₋ₓAlₓAs superlattices have evidenced the anisotropy of the superlattice conduction band. The interpretation of these experiments has yielded the first determination of the electron effective mass along the growth direction.
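The "orientational spectroscopy" idea for strongly quantized Γ₆-related subbands can be illustrated with a minimal sketch: the orbital Landau term is taken to depend on B cos θ only, while the Zeeman term depends on the full magnitude B. The confinement energy is omitted, the spin projection is taken as ±1/2, and the GaAs-like values m* = 0.067 m₀ and g* = −0.44 are assumptions for illustration rather than parameters given in the text; the function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
M0 = 9.1093837015e-31       # kg
MU_B = 9.2740100783e-24     # J/T, Bohr magneton


def gamma6_level_meV(n, spin, b_tesla, theta_deg, m_eff=0.067, g_eff=-0.44):
    """Landau level energy (meV) of a strongly confined parabolic subband.

    n: Landau index (0, 1, ...); spin: +0.5 or -0.5.
    Orbital part uses B*cos(theta); Zeeman part uses the full B.
    The subband confinement energy is left out (assumed constant).
    """
    b_parallel = b_tesla * math.cos(math.radians(theta_deg))
    orbital = (n + 0.5) * HBAR * E_CHARGE * b_parallel / (m_eff * M0)
    zeeman = g_eff * MU_B * b_tesla * spin
    return (orbital + zeeman) / E_CHARGE * 1e3


if __name__ == "__main__":
    for theta in (0.0, 30.0, 60.0):
        spacing = gamma6_level_meV(1, 0.5, 10.0, theta) - gamma6_level_meV(0, 0.5, 10.0, theta)
        splitting = gamma6_level_meV(0, 0.5, 10.0, theta) - gamma6_level_meV(0, -0.5, 10.0, theta)
        print(theta, spacing, splitting)
```

In this picture, tilting the field at fixed magnitude shrinks the Landau spacing like cos θ while leaving the spin splitting unchanged, which is what allows the two contributions to be separated experimentally.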
D. Coulombic Impurity States in Heterostructures

As in bulk materials, substitutional coulombic impurities give rise to series of shallow donor or acceptor levels in semiconductor heterostructures. These levels have been thoroughly analyzed theoretically, as well as their behaviour under the action of external perturbations (magnetic field, electric field, uniaxial stresses). In most of the III-V heterolayers, the dielectric mismatch between the host materials is small. If one neglects this mismatch, a substitutional coulombic
impurity gives rise to a potential energy:

$$V(\mathbf{r}) = -\frac{Ze'e}{\kappa\,|\mathbf{r} - \mathbf{r}_i|} \qquad (95)$$
where Ze' is the extra charge of the impurity (in practice only Z = 1 has been studied to our knowledge), r and rᵢ are the position vectors of the electron and of the impurity respectively, and κ is the relative dielectric constant of the heterostructure. The in-plane translational invariance of the heterostructure allows one to set xᵢ = yᵢ = 0 without loss of generality. On the other hand, one of the salient features of the coulombic bound states in heterolayers lies in their dependence upon the impurity position zᵢ along the growth axis. Such a dependence is a direct consequence of the lack of translational invariance of the heterostructure along this axis. The acceptor problem (e' < 0) is more difficult to handle than the donor one (e' > 0) because the Γ₈-related subbands are more intricate than the Γ₆-related ones. However, in both cases similar trends are obtained. Let us for simplicity restrict our considerations to rectangular quantum well heterostructures (other band edge profiles have been considered: see Brum et al., 1985). If not too thin, the quantum well supports several bound states for the z motion. These bound states are the onsets of two-dimensional subbands associated with the in-plane free motion. A coulombic impurity with zᵢ fixed will, in general, create an infinite number of true bound states below the ground quantum well subband and a (presumably) finite number of resonances (or virtual bound states) below the onset of each excited subband. The resonant nature of the latter impurity levels stems from their degeneracy with the two-dimensional continua of the lower lying quantum well subbands. As pointed out by Priester et al. (1984a), the broadening of the ground resonant donor states due to this degeneracy is very small. Both acceptor and donor binding energies are markedly dependent upon the quantum well thickness L.
If we denote by a_B* the three-dimensional Bohr radius of the impurity, one may take L/a_B* as the dimensionless parameter which governs this effect. When L/a_B* ≫ 1 the quantum well structure behaves as a bulk material with respect to the binding energy of the ground bound state of an on-center impurity. Qualitatively, the on-center impurity has enough room in the quantum well to display a 1s bulk-like hydrogenic wavefunction. When L decreases, the confinement energy of the ground quantum well subband, measured from the bottom of the well, increases due to the increasing spatial localization of the carrier. So does the energy of the ground bound state of the impurity. However, the energy difference between these two quantities, i.e. the impurity binding energy, also increases. This is because the particle is held close to the attractive impurity center by the quantum well walls. Thus, when L
decreases the binding energy first increases. If the barriers were impenetrable this increase would last until L = 0 (Bastard, 1981a) and one would recover the well known result that a two-dimensional hydrogenic impurity has a binding energy which is four times larger than the three-dimensional one. In actual heterostructures, the barrier heights are finite and, instead of monotonically increasing with decreasing L, the hydrogenic binding energy reaches a maximum. Then it decreases with decreasing L until it reaches at L = 0 the binding energy of a hydrogenic impurity in a bulk barrier material (Masselink et al., 1983, 1985; Mailhiot et al., 1982; Greene and Bajaj, 1983; Chang, 1987). Off-center impurities display a similar behaviour with L as the on-center one. Considered as a function of the impurity location zᵢ, measured from the center of the quantum well, the impurity binding energy decreases monotonically with increasing |zᵢ| (Masselink et al., 1983, 1985; Bastard, 1981a; Tanaka et al., 1983; Chang, 1987). This may again be qualitatively understood by noticing that it is only at zᵢ = 0 that the impurity center and the maximum of the quantum well bound state wavefunction coincide, i.e. that the carrier is forced by the quantum well to be closest to the impurity site. As shown in Fig. 26 the on-edge donors still retain a sizeable binding energy and so do on-edge acceptors (Masselink et al., 1983, 1985; Mailhiot et al., 1982; Bastard, 1981a; Tanaka et al., 1983; Priester et al., 1984b; Chang, 1987). When the impurity is in the barrier it still creates a bound state below the ground quantum well subband. The binding energy of this state decreases relatively slowly with increasing separation between the impurity and the well. Moreover, a barrier impurity also gives rise to a resonant level roughly located one effective Rydberg below the onset of the barrier continuum.
As soon as zᵢ − L/2 (or |zᵢ + L/2|) is larger than a_B*, this resonance is very narrow and, in practice, it is justified to consider it as a regular hydrogenic bulk impurity level attached to the barrier edge, unless it coincides in energy with a quantum well bound state (see Fig. 27). The impurity hamiltonian one has to consider for a quantum well structure is:

$$\mathcal{H} = T + V_b\,Y\!\left(z^2 - \frac{L^2}{4}\right) - \frac{e^2}{\kappa\sqrt{x^2 + y^2 + (z - z_i)^2}} \qquad (96)$$

where the z origin has been taken at the center of the quantum well and V_b is the barrier height (V_e for electrons, |V_p| for holes). In Eq. (96) T is the scalar kinetic energy term of Γ₆-related electrons or the Luttinger 4 × 4 matrix for the kinetic energy of Γ₈-related holes. In the latter case both the barrier and impurity potentials have to be multiplied by a 4 × 4 identity matrix. It is clear that the eigenvalues ε(L, zᵢ) of ℋ are such that:

$$\varepsilon(L, z_i) = \varepsilon(L, -z_i) \qquad (97)$$
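The two limiting binding energies invoked above for an on-center impurity (bulk-like when L/a_B* ≫ 1, four times larger in the strictly two-dimensional limit) follow from the hydrogenic formulas and can be put into numbers with a short sketch. The effective mass 0.067 m₀ and dielectric constant κ = 12.5 are illustrative GaAs-like donor values assumed here, not parameters quoted in the text, and the function names are hypothetical.

```python
RYDBERG_MEV = 13605.693  # hydrogen Rydberg in meV
BOHR_ANGSTROM = 0.529177  # hydrogen Bohr radius in angstrom


def effective_rydberg_meV(m_eff, kappa):
    """Effective Rydberg R* = Ry * m* / kappa^2 (m* in units of m0)."""
    return RYDBERG_MEV * m_eff / kappa ** 2


def effective_bohr_radius_angstrom(m_eff, kappa):
    """Effective Bohr radius a_B* = a_0 * kappa / m*."""
    return BOHR_ANGSTROM * kappa / m_eff


def hydrogenic_binding_meV(m_eff, kappa, dimension, n=1):
    """Binding energy of level n of an ideal hydrogenic impurity.

    dimension 3: R* / n^2; dimension 2: R* / (n - 1/2)^2.
    """
    ry = effective_rydberg_meV(m_eff, kappa)
    if dimension == 3:
        return ry / n ** 2
    if dimension == 2:
        return ry / (n - 0.5) ** 2
    raise ValueError("dimension must be 2 or 3")


if __name__ == "__main__":
    m_eff, kappa = 0.067, 12.5  # assumed GaAs-like donor parameters
    print(effective_bohr_radius_angstrom(m_eff, kappa))
    print(hydrogenic_binding_meV(m_eff, kappa, 3), hydrogenic_binding_meV(m_eff, kappa, 2))
```

With these assumed parameters the effective Bohr radius is of the order of 100 Å, comparable to typical well widths, which is why the binding energy in actual quantum wells lies between the two limits and depends markedly on L.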
FIG. 26. The binding energy of on-edge donors in GaAs-Ga(Al)As quantum wells is plotted versus the GaAs slab thickness L (Å) for several assumed barrier heights: (1) V_b = 212 meV, (2) V_b = 318 meV, (3) V_b = 424 meV, (4) infinite V_b. After Priester et al. (1984b).
FIG. 27. The average tunnel time from a quasi-discrete hydrogenic donor level attached to the barrier to the quantum well extended states is plotted versus the quantum well thickness L (Å) for two distances separating the impurity site from the interface: zᵢ − L/2 = 400 Å (left scale); zᵢ − L/2 = 150 Å (right scale). After Brum and Bastard (1987).
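This parity property results from the symmetry of the unperturbed heterostructure. Let us denote by ρ_imp(εᵢ) the impurity density of states per unit binding energy, assuming that the impurity sites are distributed at random in the well:

$$\rho_{imp}(\varepsilon_i) = \frac{1}{L}\int_{-L/2}^{L/2} dz_i\; \delta\left[\varepsilon_i - \varepsilon(L, z_i)\right] \qquad (98)$$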
The parity property Eq. (97) implies that ρ_imp(ε_max) is infinite, where ε_max is the maximum binding energy, which occurs at zᵢ = 0. It has been shown (Masselink et al., 1983, 1985; Bastard, 1981a) that ρ_imp(εᵢ) may also display a secondary maximum if L/a_B* ≲ 1. This secondary maximum takes place at the binding energy of the on-edge impurities. Thus, for an equirepartition of the impurity sites in the heterostructure, one expects that the on-center impurities play a dominant part and the on-edge impurities a significant part if the well is narrow enough. However, actual GaAs-Ga₁₋ₓAlₓAs quantum wells are seldom characterized by an equirepartition of the impurity sites. At least for acceptors there is a segregation of impurities in the vicinity of the inverted interface. If we denote by g_imp(zᵢ) the impurity profile in the heterostructure, ρ_imp(εᵢ) becomes:

$$\rho_{imp}(\varepsilon_i) = \int dz_i\; g_{imp}(z_i)\, \delta\left[\varepsilon_i - \varepsilon(L, z_i)\right] \qquad (99)$$
Thus, if the inverted interface segregates many impurities, the peak of ρ_imp(εᵢ) due to on-edge impurities may offset that due to the on-center impurities. The correlation between impurity distribution functions and optical spectra will be thoroughly discussed in Section V. There does not seem to exist an exact solution to the impurity hamiltonian. Variational solutions of the impurity problem have thus been sought (Masselink et al., 1983, 1985; Mailhiot et al., 1982; Bastard, 1981a; Brum et al., 1985; Priester et al., 1984a, 1984b; Greene and Bajaj, 1983; Tanaka et al., 1983; Chang, 1987), either by projecting the impurity wavefunction on a set of gaussian functions with prescribed characteristic lengths or by using trial wavefunctions of the form:

$$\psi(\mathbf{r}) = N\,\chi_1(z)\,\exp\left[-\frac{1}{\lambda}\sqrt{x^2 + y^2 + (z - z_i)^2}\right] \qquad (100)$$
for coulombic donors. In Eq. (100), χ₁(z) is the ground quantum well bound state wavefunction and λ is the variational parameter. The acceptor problem is more difficult to handle, but symmetry considerations help to reduce the difficulties for on-center acceptors. Masselink et al. (1983, 1985) have used the
trial wavefunction:

$$\psi(\mathbf{r}) = \sum_{n=0}^{4} \Gamma_n^{1/2} \sum_{i=1}^{7} C_n(i)\, \exp\left[-\alpha_i\left(x^2 + y^2 + \mu z^2\right)\right] \qquad (101)$$

where $\Gamma_n^{1/2}$ is a five-component basis whose elements display s or d symmetry, $C_n(i)$ and $\mu$ are variational parameters, and the $\alpha_i$ are fixed lengths chosen to cover a large physical range. Masselink et al.'s results for the ground acceptor state are presented in Fig. 28. We also show in Fig. 29 Mailhiot et al.'s results (1982) for the ground on-center donor state. As expected, both curves display similar trends: in both cases the binding energy first increases with decreasing $L$ and then decreases for very narrow wells. There have been several theoretical studies devoted to the perturbation of the coulombic bound states by external fields.
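As a rough numerical guide to the energy scale of Figs. 28 and 29, the following sketch evaluates the hydrogenic scaling of a coulombic donor in the effective-mass approximation. The relative mass 0.067 and the dielectric constant 12.5 are assumed GaAs-like values that are not quoted in the text, and the function names are ours. The bulk effective Rydberg is compared with the exact ground state of a strictly two-dimensional hydrogenic atom, which is four times larger; this is the limit towards which an on-center impurity in an idealized, infinitely deep well would tend as $L$ decreases.

```python
# Hydrogenic scaling of a shallow donor (effective-mass approximation).
RYDBERG_EV = 13.605693      # hydrogen Rydberg energy, eV
BOHR_ANGSTROM = 0.529177    # hydrogen Bohr radius, angstrom


def effective_rydberg_meV(m_rel, kappa):
    """Bulk (3D) binding energy in meV for relative mass m_rel and dielectric constant kappa."""
    return 1e3 * RYDBERG_EV * m_rel / kappa**2


def effective_bohr_radius_A(m_rel, kappa):
    """Bulk effective Bohr radius in angstrom."""
    return BOHR_ANGSTROM * kappa / m_rel


# Assumed GaAs-like parameters (not taken from the text).
M_REL = 0.067
KAPPA = 12.5

ry3d = effective_rydberg_meV(M_REL, KAPPA)
a_star = effective_bohr_radius_A(M_REL, KAPPA)
ry2d = 4.0 * ry3d   # exact ground state of the strictly 2D hydrogenic atom

print(f"bulk effective Rydberg : {ry3d:.2f} meV")
print(f"effective Bohr radius  : {a_star:.1f} angstrom")
print(f"strict 2D limit        : {ry2d:.2f} meV")
```

In an actual well of finite depth the binding energy stays below the two-dimensional value, in line with the decrease at very small $L$ noted above.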
FIG. 28. The binding energies of acceptors in GaAs-Ga$_{1-x}$Al$_x$As quantum wells are plotted versus the GaAs slab thickness (horizontal axis: well width, in Å). After Masselink et al., 1983, 1985.
FIG. 29. The binding energy of on-center donors (ground state) in GaAs-Ga$_{1-x}$Al$_x$As is plotted versus the GaAs slab thickness (horizontal axis: number of GaAs monolayers; vertical axis: energy with respect to the first conduction subband). One GaAs monolayer is 2.83 Å thick. After Mailhiot et al., 1982.
When a magnetic field is applied parallel to the growth direction of a heterostructure, it shrinks the impurity wavefunction in the layer plane and therefore increases the impurity binding energy (Masselink et al., 1983, 1985; Greene and Bajaj, 1985; McDonald and Ritchie, 1986). This effect, well known in bulk materials (Yafet et al., 1956), has been analyzed in great detail in GaAs-Ga$_{1-x}$Al$_x$As quantum wells. At low field, the “1S” donor state exhibits a quadratic Zeeman shift, while at large field it approaches from below the energy of the $n = 0$ Landau level. Excited “2P$_\pm$” states can be analyzed in a similar fashion: the “2P$_+$” (“2P$_-$”) states approach the $n = 1$ ($n = 0$) Landau levels at large field. The calculated energy differences between the 1S and 2P$_\pm$ states and their variations with $B$ are in good agreement with experiments (McCombe et al., 1986). The effect of a longitudinal electric field is peculiar in that it increases the binding energy of the impurities located near the interface where the carrier wavefunction accumulates under the action of the field, while it decreases the binding energy of the impurities located near the other interface (see Fig. 30 for the donor case; Brum et al., 1985). This feature may explain some reinforcement of the impurity photoluminescence line in p-type doped GaAs quantum wells subjected to a longitudinal field (Miller and Gossard, 1983a).
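The crossover between the low-field and high-field regimes of the magnetic field effect can be located with a back-of-the-envelope estimate. The sketch below, written for assumed GaAs-like parameters (relative mass 0.067, dielectric constant 12.5; neither is quoted in the text), computes the cyclotron energy $\hbar\omega_c$, the Landau level energies $(n + 1/2)\hbar\omega_c$ and the customary dimensionless ratio $\gamma = \hbar\omega_c / 2R^*$, where $R^*$ is the bulk effective Rydberg. Function names are hypothetical.

```python
RYDBERG_MEV = 13605.693            # hydrogen Rydberg energy, meV
HBAR_E_OVER_M0 = 0.1157676         # hbar*e/m0 = 2 Bohr magnetons, meV per tesla


def cyclotron_energy_meV(b_tesla, m_rel):
    """Cyclotron energy hbar*omega_c in meV for a carrier of relative mass m_rel."""
    return HBAR_E_OVER_M0 * b_tesla / m_rel


def landau_level_meV(n, b_tesla, m_rel):
    """Energy of the n-th Landau level, (n + 1/2) hbar*omega_c, in meV."""
    return (n + 0.5) * cyclotron_energy_meV(b_tesla, m_rel)


def gamma_parameter(b_tesla, m_rel, kappa):
    """Ratio of the cyclotron energy to twice the bulk effective Rydberg."""
    rydberg = RYDBERG_MEV * m_rel / kappa**2
    return cyclotron_energy_meV(b_tesla, m_rel) / (2.0 * rydberg)


# Assumed GaAs-like parameters (not taken from the text).
M_REL, KAPPA = 0.067, 12.5

for b in (1.0, 5.0, 10.0, 20.0):
    print(f"B = {b:5.1f} T  hbar*omega_c = {cyclotron_energy_meV(b, M_REL):6.2f} meV"
          f"  gamma = {gamma_parameter(b, M_REL, KAPPA):5.2f}")
```

With these assumed parameters $\gamma$ reaches unity near 7 T: well below that field the coulombic energy dominates and the shift is quadratic in $B$, while well above it the levels follow the Landau ladder.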
FIG. 30. Impurity binding energy versus electric field intensity $F$ (in kV/cm; $F \parallel z$) for quantum wells of thickness $L = 200$ Å and $L = 100$ Å. $V_b = 0.4$ eV. Five impurity positions are considered: (a) $-L/2$; (b) $-L/4$; (c) 0; (d) $L/4$ and (e) $L/2$. After Brum et al. (1986).
E. Many Body Effects in Heterostructure Energy Levels
We conclude Section II by considering two different sorts of effects linked to the electron-electron interaction in heterolayers. In insulating heterolayers the creation of electron-hole pairs gives rise to a series of sharp lines below the threshold of the band-to-band absorption (see Section IV). These lines are due to the excitons. The excitons in undoped quantum wells have been the subject of a number of theoretical investigations (Miller and Kleinman, 1985; Miller et al., 1981b; Bastard et al., 1982; Greene et al., 1984; Jiang, 1984; Brum and Bastard, 1985a; Duggan et al., 1985; Dawson et al., 1986; Sham, 1986; Sanders and Chang, 1985; Chan, 1986; Ekenberg and Altarelli, 1986; Bauer and Ando, 1987). Besides their interest for fundamental studies, they have enabled the emergence of a new generation of optical and electro-optical devices (Miller et al., 1985b; Miller, 1986) operating at room temperature.
On the other hand, one may create high mobility quasi bi-dimensional electron or hole gases by selectively doping the barrier-acting material of a heterostructure (modulation doping; Stormer, 1980). Finding the energy levels of heterolayers containing charges is a complicated many body problem. We shall present a brief overview of this problem, restricting ourselves to a Hartree treatment of the carrier-carrier interaction (exchange and correlation effects are discussed in Bauer and Ando, 1986a; Kleinman, 1985, 1986; Ruckerstein et al., 1986; Stern and Das Sarma, 1984; Ando, 1982).

1. Exciton States in Type I Quantum Wells

The lowest lying excited states of undoped semiconductor quantum wells are the excitons, i.e. the shallow bound states formed between a conduction electron and a valence hole (Wannier excitons). If the valence subbands were as simple as the conduction ones, the exciton problem would resemble very much that of the coulombic donor problem. Actually, despite the fact that the hole kinematics considerably obscure the exciton algebra, it remains true that the exciton states in quantum wells share many features with the coulombic impurity levels. Notably, the trend of the binding energy versus the quantum well thickness and the occurrence of sharp excitonic resonances below the edge of excited subband → subband transitions (i.e. $HH_n \rightarrow E_m$ or $LH_n \rightarrow E_m$; $n + m$ even, $n$ or $m > 1$) in addition to the $HH_1 \rightarrow E_1$ exciton bound state are identical to what is found in the impurity problem. The enhanced binding energies (in type I quantum wells) allow the observation of excitonic absorption peaks at room temperature in GaAs-Ga$_{1-x}$Al$_x$As and other type I quantum well structures (see Sections III and IV). The excitons are delocalized entities: the electron-hole reduced motion can be bound or unbound, but the center of mass (or its equivalent) is free to move in the layer plane of perfect heterostructures.
Coulombic or interface defects may however trap the excitons at low temperature, which affects the optical properties (see Section IV). In this section we summarize some of the salient features of excitonic bound states in type I quantum wells. Let us first consider an idealized quasi bi-dimensional exciton formed between an electron and a hole whose subbands display a quadratic dispersion upon the in-plane wave-vector. The exciton hamiltonian is then:
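with:

$$\mu_\perp = \frac{m_{e\perp}\, m_{h\perp}}{m_{e\perp} + m_{h\perp}} \qquad (103)$$

$$M_\perp = m_{e\perp} + m_{h\perp} \qquad (104)$$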
$\mu_\perp$ and $M_\perp$ are the reduced and total in-plane effective masses of the electron and the hole respectively, $\varepsilon_A$ is the bandgap of the well-acting material (thickness $L$) and $\kappa$ the relative dielectric constant of the heterostructure. The two-dimensional vectors:
are the in-plane projections of the reduced electron-hole position vector and of the center of mass position vector respectively. It is clear that the in-plane wave-vector ($\mathbf{K}_\perp$) of the exciton center of mass is a good quantum number, its associated wavefunction being the plane wave $\exp(i\mathbf{K}_\perp \cdot \mathbf{R}_\perp)$. Since the center of mass motion decouples from the internal degrees of freedom, we shall drop the former from now on. The in-plane reduced motion and the electron and hole longitudinal motions do not separate. Thus we shall again resort to the variational method to obtain the exciton binding energies. Notice that several sets of excitons may be formed with the electron and the hole belonging to the various conduction and valence quantum well subbands. A single set is truly bound: the excitons which are formed between electrons and holes both belonging to the ground subbands. The other excitons are resonances superimposed on the two-dimensional continua of the lower lying electron-hole subbands. The broadenings which are associated with the autoionization of the excited exciton states are in fact small, since it is possible (Miller et al., 1984b, 1985a; Meynadier et al., 1985b; Dingle, 1975) to observe excitonic structures (i.e. peaks) in the absorption spectra associated with optical transitions between highly excited electron and hole subbands. Here we shall neglect these broadenings. Consequently, if $\psi_{nm}$ denotes the trial exciton wavefunction with the electron in the $n$th conduction subband ($E_n$) and the hole in the $m$th valence subband ($H_m$), and $R_{nm}$ the ($H_m - E_n$) exciton binding energy, then:

$$R_{nm} = \varepsilon_A + E_n + H_m - \min\, \langle \psi_{nm} | \mathcal{H}_{exc} | \psi_{nm} \rangle \qquad (107)$$

where the minimization is performed over all the variational parameters of the $\psi_{nm}$ trial wavefunction. Several $\psi_{nm}$ have been used. A very convenient one works rather well over the range of GaAs quantum well thicknesses which is the most investigated, i.e.
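$0.5\, a_\perp^* \lesssim L \lesssim 2\, a_\perp^*$, where $a_\perp^*$ denotes the in-plane effective bulk Bohr radius (i.e. calculated with the in-plane reduced mass $\mu_\perp$). This simple wavefunction is:

$$\psi_{nm}(\mathbf{r}_\perp, z_e, z_h) = \chi_n(z_e)\, \chi_m(z_h)\, g_{nm}(\mathbf{r}_\perp) \qquad (108)$$

$$g_{nm}(\mathbf{r}_\perp) = L_j(r_\perp) \exp(i j \varphi); \quad j = 0, \pm 1, \pm 2, \ldots; \quad 0 \le \varphi \le 2\pi \qquad (109)$$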
where the $\chi$'s are the $E_n$ and $H_m$ envelope functions and where we have made use of the isotropy of the coulombic interaction in the layer plane. For the ground (quasi 1S) exciton state $j = 0$ and:
where $\lambda_{nm}$ is a variational parameter. To write Eqs. (108, 109) amounts to assuming that the longitudinal electron and hole motions are forced by the quantum well potential, while the bound states for the in-plane reduced motion are provided by the coulombic electron-hole interaction averaged over the probability densities $\chi_n^2(z_e)\, \chi_m^2(z_h)$. Besides leading to simple algebra, Eqs. (108, 109) have the advantage of providing excited state ($n, m \neq 1$) wavefunctions which are automatically orthogonal to the lower lying solutions. More elaborate trial wavefunctions have been proposed, leading to small improvements over the results derived from Eqs. (108, 109).

The idealized quasi bi-dimensional excitons are an approximation of actual excitons built from $\Gamma_6$-related electrons paired with $\Gamma_8$-related holes. To obtain $\mathcal{H}_{exc}$ from the exact treatment one should use the diagonal approximation for the hole subbands (see Section II.2). Under this approximation (Miller et al., 1981b; Bastard et al., 1982; Greene et al., 1984; Jiang, 1984; Brum and Bastard, 1985a), the excitons formed between heavy holes and electrons (heavy hole excitons) are decoupled from the excitons formed between light holes and electrons (light hole excitons). The effective masses $m_{h\parallel}$, $m_{h\perp}$ are defined in terms of the Luttinger parameters of the valence band:

$$m_{hh\parallel}^{-1} = (\gamma_1 - 2\gamma_2)\, m_0^{-1}; \qquad m_{lh\parallel}^{-1} = (\gamma_1 + 2\gamma_2)\, m_0^{-1} \qquad (111)$$

$$m_{hh\perp}^{-1} = (\gamma_1 + \gamma_2)\, m_0^{-1}; \qquad m_{lh\perp}^{-1} = (\gamma_1 - \gamma_2)\, m_0^{-1} \qquad (112)$$

where hh (lh) stands for heavy hole (light hole) and $m_0$ is the free electron mass. The definitions of $\mu_\perp$, $M_\perp$ have to be modified accordingly. We show in Fig. 31 the $L$ dependences of several ground (i.e. quasi 1S) light hole and heavy hole exciton binding energies $R_{nm}$ in GaAs-Ga$_{1-x}$Al$_x$As quantum wells calculated from Eqs. (108, 109) (Brum and Bastard, 1985a). Several features are noticeable in Fig. 31:
(i) $R_{nm}(L)$ first increases with decreasing $L$ and then decreases. This behaviour is the same as found for coulombic impurity states. It reflects the
FIG. 31. Thickness dependence of the binding energies of the ($n$, $m$) excitons formed between the $n$th electron subband and the $m$th hole subband ($n, m = 1, 2, \ldots$) of a GaAs-Ga$_{1-x}$Al$_x$As quantum well (diagonal approximation). The full (broken) curves correspond to electron-heavy (light) hole excitons. After Brum and Bastard (1985a).
increasing quasi bi-dimensionality of the coulombic problem when $L$ decreases, until one of the two particles (or both) loses this quasi bi-dimensionality by having a confinement energy which approaches the top of the confining well. For this reason all the excitons $E_n - HH_m$, $E_n - LH_m$ disappear at small enough $L$, unless $n = m = 1$. Notice however that it is in principle possible to form an exciton between a true bound state (say a valence state) and a virtual bound state (say a conduction state) of the quantum well if the resonant state is narrow enough. The magnitude of the binding energy for such an exciton and its stability against dissociation have however never been calculated, to our knowledge.

(ii) The binding energies $R_{nm}$ decrease with increasing $n$ or $m$, but relatively slowly. In a first approximation, the exciton binding energies $R_{nn}$ are nearly the same for all $n$ (granted that both the $E_n$, $HH_n$ or $E_n$, $LH_n$ pairs of levels are tightly bound in the well). This arises from the weak dependence of the averaged electron-hole interaction upon the electron and hole quantum well bound state.

(iii) The ground light hole exciton $LH_1 - E_1$ is found more bound than the heavy hole exciton $HH_1 - E_1$, unless $L$ is very small. This is a consequence of the mass reversal effect for the hole subbands in the diagonal approximation: the heavy hole subbands have a lighter in-plane mass than the light
hole subbands. Thus, the in-plane reduced mass is heavier for electron and light hole than for electron and heavy hole. This, joined to the insensitivity of the averaged electron-hole interaction to the difference between the $\chi_1$'s of the light hole and heavy hole, implies that $LH_1 - E_1$ is more bound than $HH_1 - E_1$ unless $L$ is so small that the light hole wavefunction $\chi_1$ leaks much more heavily than the heavy hole $\chi_1$ in the barrier.

There have recently been several calculations (Duggan et al., 1985; Dawson et al., 1986; Sham, 1986; Sanders and Chang, 1985; Chan, 1986; Ekenberg and Altarelli, 1986) attempting to improve the diagonal approximation for holes in the evaluation of the exciton binding energies. We have already stressed in Section II.B.2 that the diagonal approximation poorly describes the actual hole subbands. However, a great deal of the exciton characteristics arise from the electron, whose light mass prevails in the exciton reduced masses and thus in the binding energy. In fact, calculations including the subband mixing in the valence band improve very little ($\lesssim 1$ meV) over the evaluation of the $HH_1 - E_1$ binding energy performed in the diagonal approximation for the holes. On the other hand, it appears that the actual camel-back shape of $LH_1$ may significantly enhance (2-3 meV) the binding energy of the $E_1 - LH_1$ excitons over the value obtained in the diagonal approximation. In no case, however, have these refined exciton calculations obtained $E_1 - HH_1$ excitons which were more bound than the $E_1 - LH_1$ excitons. This is in agreement with optical data but disagrees with magnetooptical data (see Section IV).

i. Excitons in a Longitudinal Electric Field

If a longitudinal electric field is applied to a quantum well, one should add a term $eF(z_e - z_h)$ to the exciton hamiltonian Eq. (102). The electric field polarizes the electron and hole wavefunctions in opposite directions and therefore weakens the excitonic binding.
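To fix orders of magnitude for the term $eF(z_e - z_h)$, the short sketch below evaluates the potential energy drop $eFd$ experienced by a charge $e$ over a distance $d$ in a uniform field $F$. The field and thickness values are illustrative choices, not data from the text, and the function name is ours.

```python
def potential_drop_meV(field_kV_per_cm, length_angstrom):
    """Energy e*F*d in meV for a field in kV/cm and a distance in angstrom."""
    field_V_per_m = field_kV_per_cm * 1e5    # 1 kV/cm = 1e5 V/m
    length_m = length_angstrom * 1e-10       # 1 angstrom = 1e-10 m
    # For a unit charge e, the energy in eV equals the potential drop in volts.
    return 1e3 * field_V_per_m * length_m


# Illustrative values (assumed): fields from 10 to 100 kV/cm across a 100 angstrom well.
for field in (10.0, 50.0, 100.0):
    print(f"F = {field:5.1f} kV/cm  ->  e*F*L = {potential_drop_meV(field, 100.0):6.1f} meV")
```

For $F = 100$ kV/cm and $d = 100$ Å the drop is 100 meV, which is large compared with quasi bi-dimensional exciton binding energies, typically of the order of 10 meV.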
However, we have seen in Section II.1 that the field-induced carrier escape is considerably inhibited by the presence of the quantum well barriers. This allows for the persistence of a sizeable excitonic binding up to very large fields (Miller et al., 1984d, 1985a; Miller, 1986; Brum and Bastard, 1985b; Bauer and Ando, 1987). In fact, clear excitonic resonances in the absorption coefficient have been observed in GaAs-Ga$_{1-x}$Al$_x$As quantum wells subjected to longitudinal fields in excess of $10^5$ V/cm. These fields are one order of magnitude larger than the ones which ionize the excitons in bulk GaAs. We have already discussed in Section II.1 the longitudinal Stark effect of free particles. Keeping the same ideas as the ones which led to Eqs. (108, 109), we notice that the external field will change the electron and hole confinement energies in the well but will affect the excitonic binding only through the modification of the averaged electron-hole coulombic interaction; i.e. the electric field effect on the excitonic binding is a second order effect. These ideas
can be substantiated by detailed variational calculations using Eqs. (108, 109) for the exciton wavefunction, except that the $\chi_1$'s now refer to the ground electron and hole envelope functions in a quantum well tilted by the field. In order to treat both the quadratic Stark shift and the carrier accumulation regimes, the ground exciton wavefunction is written:
where $N$ is a normalization constant and the $\chi_1^{(0)}$'s are the zero-field envelope functions for the ground electron and hole states in the well. $\beta$ is a variational parameter which, through the exponential $\exp[-\beta(z_e - z_h)]$, expresses the shifts of the electron and hole envelope functions towards opposite interfaces of the quantum well. We show in Fig. 32 the electric field dependence of the $E_1 - HH_1$ exciton binding energy in GaAs-Ga$_{0.68}$Al$_{0.32}$As quantum wells of different thicknesses. The most striking feature is that the exciton binding energy varies little with $F$ compared with the $E_1(F)$, $HH_1(F)$ shifts. At low field the exciton binding energy decreases quadratically with $F$, the decrease becoming steeper with increasing quantum well thickness. At larger fields the carrier accumulation regime prevails and the exciton binding energy varies little with $F$. Ultimately the electric field sweeps the carriers outside the well and the excitonic binding energy should drop to zero. This regime is however not describable by the trial wavefunction given in Eq. (113). Figure 33 shows a comparison between the calculated and measured $HH_1 - E_1$ exciton peak energies in GaAs-Ga$_{1-x}$Al$_x$As multiple quantum wells ($L = 95$ Å). The small discrepancy between theory and experiment may be due to the simplified form of the exciton trial wavefunction and/or to strain effects not included in the calculations (absorption measurements require etching away part of the GaAs substrate). One may conclude from Fig. 33 that the electric field effects in quantum wells are at least semi-quantitatively understood.

FIG. 32. Electric field dependence of the heavy hole exciton binding energy in GaAs-Ga$_{0.68}$Al$_{0.32}$As quantum wells of different thicknesses ($F \parallel z$; horizontal axis: $F$ in kV/cm). After Brum and Bastard (1985b).

FIG. 33. Comparison between the calculated (solid line) and measured (bars) values of the energy position of the $HH_1 - E_1$ exciton peak in GaAs-Ga$_{1-x}$Al$_x$As (horizontal axis: $F$ in kV/cm). After Brum and Bastard (1985b).

When the electric field is applied in the layer plane, it affects the excitonic binding drastically. In fact the physical situation is much the same as that found in bulk materials: when the field-induced energy difference across one Bohr diameter becomes comparable with the zero-field exciton binding energy, the exciton ionizes. Being in a quantum well, the exciton is more bound than in bulk material and its in-plane effective Bohr radius is smaller. Thus a larger field is needed to ionize an exciton in a quantum well than in the corresponding bulk material. The gain is however modest, typically a factor of 2.

ii. Bound Excitons in Quantum Wells

Up to now we have considered delocalized excitons in ideal heterolayers. In the presence of defects or impurities the excitons can be trapped to become bound excitons. As in bulk materials, bound excitons have relatively small binding energies in quantum
well structures. The excitons bound to neutral acceptors were considered by Miller et al. (1982d) and Kleinman (1983), who showed that their binding energies are a fixed fraction of the acceptor binding energies (Haynes' rule). The fraction is however larger in GaAs-Ga$_{1-x}$Al$_x$As quantum wells (0.13) than in bulk GaAs.

A new class of defects exists in heterostructures: the interface defects. At the scale of the envelope function an ideal interface is a plane (e.g. $z = L/2$). In actual heterostructures this plane is warped. There exist islands where the barrier material protrudes in the well and vice versa. In good GaAs-Ga$_{1-x}$Al$_x$As heterolayers these islands are one monolayer thick and may extend over several hundreds of Angstroms in the layer plane. It has to be kept in mind that the dimensions of the interface defects are strongly dependent upon the growth techniques (M.B.E. versus M.O.C.V.D.) and upon growth conditions (substrate temperature, flux of elemental species, etc.). Some optical data in M.O.C.V.D.-grown GaAs-Ga$_{1-x}$Al$_x$As quantum wells have been interpreted in terms of small islands (in-plane extension $\sim 300$ Å) (Bastard et al., 1984a), while other experiments performed in M.B.E.-grown GaAs-Ga$_{1-x}$Al$_x$As systems called for the existence of large islands ($\gtrsim 1000$ Å wide) (Brum and Bastard, 1986 (unpublished)).

The physics of excitons bound to interface defects is markedly dependent upon the defect size. Let us assume for simplicity that the exciton binds to the interface defects through a size quantization of its center of mass while keeping its internal degrees of freedom unchanged. If the interface defect has a large area the motion of the exciton center of mass will be poorly quantized: the effective two-dimensional quantum well which binds the exciton center of mass will admit many bound states. In other words, inside the large interface defect (characterized by a local thickness $L + \delta L$) the exciton is almost free.
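The distinction between small and large islands can be illustrated with an order-of-magnitude sketch. It compares two hard-wall estimates: the lowering of the electron confinement energy when the local thickness grows from $L$ to $L + \delta L$ (a crude measure of the depth of the lateral trap) and the lateral quantization energy of the exciton center of mass in an island of size $a$. All numerical inputs are assumptions made for illustration: an electron relative mass of 0.067, a hypothetical total in-plane exciton mass of 0.18 $m_0$, $L = 70$ Å and $\delta L$ equal to one monolayer. Infinite barriers overestimate the confinement shift, and the hole contribution is ignored, so only the trend with $a$ is meaningful.

```python
import math

HBAR2_OVER_2M0 = 3.80998   # hbar^2 / (2 m0) in eV * angstrom^2


def box_ground_energy_meV(size_A, m_rel):
    """Ground-state energy (meV) of a mass m_rel*m0 in a hard-wall box of width size_A."""
    return 1e3 * HBAR2_OVER_2M0 * math.pi**2 / (m_rel * size_A**2)


# Assumed, illustrative parameters (not taken from the text).
M_ELECTRON = 0.067    # electron relative mass
M_EXCITON = 0.18      # total in-plane exciton mass, hypothetical round value
L_WELL = 70.0         # well thickness, angstrom
MONOLAYER = 2.83      # one GaAs monolayer, angstrom

# Hard-wall estimate of the trap depth created by a one-monolayer-thick island.
depth = (box_ground_energy_meV(L_WELL, M_ELECTRON)
         - box_ground_energy_meV(L_WELL + MONOLAYER, M_ELECTRON))
print(f"trap depth (electron only, hard walls): {depth:.1f} meV")

for a in (100.0, 300.0, 1000.0):
    lateral = box_ground_energy_meV(a, M_EXCITON)
    print(f"a = {a:6.0f} angstrom  lateral quantum = {lateral:6.2f} meV")
```

Within these assumptions the lateral quantum falls from about 20 meV at $a = 100$ Å to a fraction of an meV at $a = 1000$ Å, so that a large island holds many closely spaced center-of-mass levels whereas a small one holds at most a few.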
The actual heterostructure then behaves as a collection of micro samples, each being almost perfect and each having a given thickness. This leads to inhomogeneously broadened excitonic lines (see Section V). If the interface defects are small (in-plane extension $\lesssim 500$ Å) the exciton center of mass will be effectively size quantized and it will be possible to observe these bound excitons at low temperature. Very little is known about the shape of the interface defects. Thus, we are forced to use rough models. A defect is characterized by its depth $b$ and its in-plane extension $a$ (neglecting any in-plane anisotropy). If the defect corresponds to the barrier protruding in the well, it repels both the electron and the hole (type I quantum well) and therefore can only scatter the excitons. If, on the other hand, the defect corresponds to the well protruding in the barrier it may also bind the exciton. Suppose one such attractive defect is centered at the point $(0, 0, -L/2)$ of the $z = -L/2$ interface. If one assumes that it displays a
FIG. 34. (a) Schematic representation of a semi-gaussian interface defect. (b) In-plane average potential energy seen by an electron (or a hole) moving in a quantum well whose interfaces display semi-gaussian defects. The average is taken over the ground state for the z motion of the unperturbed well. Attractive and repulsive defects of equal depth $b$ are considered. Courtesy J. A. Brum.
semi-gaussian shape (Fig. 34), the exciton-defect hamiltonian will be:

(114)
The exciton-defect hamiltonian has to be inserted in Eq. (102) to obtain the full hamiltonian of the bound excitons. We are interested in finding the lowest bound exciton state of the lowest lying exciton ($HH_1 - E_1$). The latter (quasi
1S free exciton) has a wavefunction:

(115)

where we have explicitly written the wavefunction of the center of mass degree of freedom and where $S$ is the sample area. The energy corresponding to Eq. (115) is:

The binding energy $q$ of the bound exciton is measured with respect to $E_{1S}(0)$. If $q$ is small enough it may be roughly calculated by assuming that only the center of mass is affected by the interface defect. The trial wavefunction for the bound exciton will thus be written as:

(117)

The choice of a gaussian localization is somewhat arbitrary. Note however that it provides an exact result in the limit of large $a$, where a two-dimensional oscillator ladder is obtained. Figure 35 presents the binding energy $q$ as a function of the in-plane defect size $a$ for several depths $b$ in the case of a GaAs-Ga$_{0.48}$Al$_{0.52}$As quantum well. It may be seen that the curves $q$ versus $a$ flatten when $a$ exceeds 300 Å. It may also be seen that the binding energy increases sublinearly with $b$ when $b$ increases: with the trial wavefunction Eq. (117) the defects whose $b$'s are $\gtrsim 2\kappa_e^{-1}, 2\kappa_h^{-1}$ (where $\kappa_e^{-1}$, $\kappa_h^{-1}$ are the penetration lengths of the $E_1$ and $HH_1$ wavefunctions in the barrier) do not bind the exciton more effectively than those with $b \lesssim 2\kappa_e^{-1}, 2\kappa_h^{-1}$. Figure 36 presents the trapped exciton binding energy versus the in-plane extension of the defects $a$ for several well widths $L$ in GaAs-Ga$_{1-x}$Al$_x$As quantum wells. The defect depth $b$ has been set equal to one GaAs monolayer, i.e. 2.83 Å. The binding energies are very small if $L \gtrsim 150$ Å. This is due to the fact that for such “wide” quantum wells the ground state wavefunctions $\chi_1(z_e)$, $\chi_1(z_h)$ are heavily localized in the GaAs well, which prevents the $E_1 - HH_1$ exciton from probing the interface defects. When $L$ decreases, $\kappa_e^{-1}$ and $\kappa_h^{-1}$ increase and concomitantly the trapped exciton binding energy also increases. Thus, the manifestation of excitons bound to interface defects is a characteristic feature of narrow quantum wells. The defect sizes $a$, $b$ are randomly distributed in a given sample. Thus, the optical features associated with the excitons bound to these interface defects
FIG. 35. The trapped exciton binding energy is plotted versus the lateral size $a$ (in Å) of attractive interface defects for different defect depths $b$ in GaAs-Ga$_{1-x}$Al$_x$As quantum wells ($x = 0.52$). $L = 70$ Å. After Bastard et al. (1984a).

FIG. 36. The trapped exciton binding energy is plotted versus the lateral size $a$ of attractive interface defects for several quantum well thicknesses $L$ in GaAs-Ga$_{1-x}$Al$_x$As quantum wells. The defect depth $b$ is taken equal to 2.83 Å. After Bastard (1988).
should represent an average of the energy level results over the distribution of the defect sizes. The connection between the energy level calculations and the optical experiments will be more thoroughly discussed in Section V.

2. Energy Levels of Heterostructures Containing Charges

In bulk semiconductors the carriers are, apart from thermal generation, supplied by impurities. Thus, their low temperature mobility, which is dominated by the ionic scattering, is inherently low. In semiconductor heterostructures it is possible to simultaneously create a quasi bi-dimensional electron (or hole) gas and to maintain a high carrier mobility. This is achieved (Stormer, 1980) by selectively doping the barrier-acting material of the heterostructure (Fig. 37). Let us for definiteness assume that the dopants are donors (e.g. Si in Ga$_{1-x}$Al$_x$As). The extra electrons supplied by the donors cannot all stay bound to their parent atoms without leading to a discontinuity of the chemical potential in the heterostructure. Thus, to ensure the necessary continuity of the chemical potential, some of the electrons are transferred to the well-acting material, leaving positively charged impurities in the barrier. The electrostatic potential which results from the charge separation adds to the “natural” potential energy profile of the heterostructure to give rise to a quasi-triangular potential shape in the vicinity of
FIG. 37. Schematic illustration of the modulation doping (panels: unstable configuration; thermal equilibrium after charge transfer).
the interfaces which separate the wells and the barriers. The electronic motion becomes quantized along the growth direction while being free in the layer plane (if one neglects the doping fluctuations). The electron envelope functions are essentially localized in the wells. Thus, the ionic scattering is considerably reduced over the bulk situation by the imposed spatial separation between the carriers and their parent donors. This separation can be further increased by inserting an undoped barrier spacer between the doped part of the barrier and the well. In GaAs-Ga$_{0.7}$Al$_{0.3}$As modulation doped single heterojunctions the low temperature electronic mobility has reached values as large as $\sim 2 \times 10^6$ cm$^2$/Vs for areal concentrations of $\sim 10^{11}$ cm$^{-2}$ (Weimann and Schlapp, 1985). This is considerably larger than any low temperature mobility achievable in bulk GaAs. Similar mobility improvements have been reported for hole gases (Mendez and Wang, 1985).

Compared with the undoped heterostructures, the energy levels in heterostructures which contain free carriers are significantly changed by the coulombic interaction between the carriers. In the simplest approach, namely the Hartree approximation, the interaction of a given carrier with all the others is expressed in terms of an averaged self-consistent potential which depends only on the coordinates of that given carrier. Thus, a complicated many body problem seems to have been reduced to a one-body problem which resembles those already encountered in previous sections. However, the formal analogy is incomplete to the extent that the band edge profile is not a rigid one, i.e. fixed from the outside, but actually depends, via the Poisson equation, on the electronic states of all the other carriers. This self-consistency requirement between the potential energy profile and the carrier wavefunctions is one of the genuine features of energy level calculations in heterostructures containing charges (see e.g. Ando et al., 1982).
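The transport figures quoted above can be translated into more intuitive quantities with a few lines of arithmetic. The sketch below assumes a single occupied, spin-degenerate, parabolic subband with a relative mass of 0.067 (an assumed GaAs-like value, not quoted in the text) and a simple Drude picture relating the mobility to a scattering time; the function name is ours.

```python
import math

HBAR = 1.054571817e-34       # J s
E_CHARGE = 1.602176634e-19   # C
M0 = 9.1093837015e-31        # kg


def two_deg_figures(n_s_cm2, mobility_cm2_Vs, m_rel=0.067):
    """Fermi wavevector, Fermi energy, Drude scattering time and mean free path of a 2D gas."""
    n_s = n_s_cm2 * 1e4               # areal density in m^-2
    mu = mobility_cm2_Vs * 1e-4       # mobility in m^2 / (V s)
    mass = m_rel * M0
    k_f = math.sqrt(2.0 * math.pi * n_s)            # spin-degenerate, one subband
    e_f_meV = 1e3 * (HBAR * k_f) ** 2 / (2.0 * mass) / E_CHARGE
    tau = mu * mass / E_CHARGE                       # Drude scattering time, s
    mean_free_path = HBAR * k_f / mass * tau         # Fermi velocity times tau, m
    return {"k_f": k_f, "e_f_meV": e_f_meV, "tau": tau, "mfp": mean_free_path}


figures = two_deg_figures(1e11, 2e6)
print(f"Fermi energy    : {figures['e_f_meV']:.2f} meV")
print(f"scattering time : {figures['tau'] * 1e12:.0f} ps")
print(f"mean free path  : {figures['mfp'] * 1e6:.1f} micrometres")
```

Under these assumptions a mobility of $2 \times 10^6$ cm$^2$/Vs at $10^{11}$ cm$^{-2}$ corresponds to a mean free path of the order of 10 micrometres, far larger than the well thicknesses considered here.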
The Hartree approximation is only the leading term in the expansion of the potential energy of the carriers versus their density $n_s$. There exist corrections, the exchange and correlation terms, which may be parametrized as a function of $n_s$. These corrections should be larger for hole gases than for electron gases of equivalent areal concentrations, owing to the larger masses of the holes. In the following, we restrict our considerations to the Hartree approximation and for simplicity we only discuss $\Gamma_6$-related subbands. The two coupled Schrödinger and Poisson equations which have to be solved self-consistently are:
where we have neglected the in-plane fluctuations of the doping profiles $N_d^+(z)$, $N_a^-(z)$ to restore the in-plane translational invariance of the heterostructure. In Eq. (118), $V_b(z)$ is the conduction band edge profile in the absence of charges. In Eq. (119), $n_i$ is the areal concentration of carriers of the $i$th occupied subband whose band edge is $\varepsilon_i$; $\kappa$ is the static dielectric constant of the heterostructure (the dielectric mismatch between the wells and the barriers has been neglected) and $N_d^+(z)$, $N_a^-(z)$ are the ionized donor and ionized acceptor profiles in the heterostructure respectively. The doping is either intentional (e.g. the donors which have been deliberately introduced in the barriers) or residual, arising e.g. from the imperfect control of the growth. In M.B.E.-grown layers the residual doping is often due to carbon, which is an acceptor, while in M.O.C.V.D.-grown layers the residual doping is often of the donor type.

In thermal and electrical equilibrium the chemical potential is constant and the net charge is zero. The latter condition amounts to requiring that the electric field ($-d\varphi_{sc}/dz$) is the same at both ends of the heterostructure, i.e. vanishes if there is no external field imposed from the outside. These equilibrium conditions together with the solutions of Eqs. (118, 119) uniquely determine the charge transfer and the energy levels in the heterostructures (Delagebeaudeuf and Linh, 1982; Wang et al., 1984). Notice that the existence of an interface between the heterostructure and the ambient may alter the charge transfer if the doped part of the barrier is not thick enough. Also the charge transfer in photoexcited samples has to be calculated by taking into account the persistent photoconductivity effect arising from deep donor levels (DX centers) (Stern, 1986).
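The self-consistency loop between the Schrödinger and Poisson equations can be sketched numerically in a deliberately simplified setting. The code below is not the calculation of the references cited above: it assumes a hard-wall well of width $L$, a single occupied subband holding an areal density $n_s$, and neutralizing donors placed symmetrically outside the well so that the field vanishes at the well center. The Poisson equation is taken in the form $d^2V_H/dz^2 = -(e^2/\kappa\varepsilon_0)\, n_s \chi_1^2(z)$ for the electron potential energy $V_H = -e\varphi_{sc}$, which is our assumption for the sign convention. The mass, dielectric constant, grid and mixing factor are likewise assumed values, and all names are hypothetical.

```python
import numpy as np

HBAR2_OVER_2M0 = 3.80998   # hbar^2 / (2 m0), eV * angstrom^2
E_OVER_EPS0 = 180.9513     # e / eps0, V * angstrom


def integrate_from_centre(f, dz, mid):
    """Cumulative trapezoidal integral of f on a uniform grid, set to zero at index mid."""
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dz)))
    return cum - cum[mid]


def hartree_ground_state(L=200.0, n_s=5e-5, m_rel=0.067, kappa=12.5,
                         n_grid=201, mixing=0.3, tol=1e-6, max_iter=500):
    """Self-consistent Hartree ground subband of a hard-wall well (lengths in angstrom).

    n_s is the areal density in angstrom^-2 (1e11 cm^-2 = 1e-5 angstrom^-2).
    Returns (z, V_H in eV, chi_1, E_1 in eV) on the interior grid points.
    n_grid must be odd so that z = 0 is a grid point.
    """
    z_full = np.linspace(-L / 2, L / 2, n_grid)
    dz = z_full[1] - z_full[0]
    z = z_full[1:-1]                     # chi vanishes on the walls
    mid = z.size // 2                    # index of z = 0
    t = HBAR2_OVER_2M0 / (m_rel * dz**2)
    kinetic = (2 * t * np.eye(z.size)
               - t * np.eye(z.size, k=1) - t * np.eye(z.size, k=-1))
    v_h = np.zeros(z.size)
    for _ in range(max_iter):
        energies, vectors = np.linalg.eigh(kinetic + np.diag(v_h))
        chi = vectors[:, 0] / np.sqrt(dz)            # sum(chi^2) * dz = 1
        density = n_s * chi**2                       # electrons per angstrom^3
        # Poisson step: V'(z) = -(e^2 / kappa eps0) * integral of n from 0 to z,
        # then V(z) = integral of V' from 0 to z, with V(0) = 0 and V'(0) = 0.
        slope = -(E_OVER_EPS0 / kappa) * integrate_from_centre(density, dz, mid)
        v_new = integrate_from_centre(slope, dz, mid)
        if np.max(np.abs(v_new - v_h)) < tol:
            break
        v_h = (1 - mixing) * v_h + mixing * v_new    # damped update for stability
    return z, v_h, chi, energies[0]


z, v_h, chi, e1 = hartree_ground_state()
print(f"E1 = {1e3 * e1:.2f} meV, V_H at the well edge = {1e3 * v_h[0]:.2f} meV")
```

The damped (linear mixing) update is the simplest way to stabilize the iteration; realistic calculations would add the finite barrier profile, the actual doping profiles and the equilibrium conditions on the chemical potential discussed above.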
The selective doping and the pronounced carrier localization in the wells make the details of the self-consistent potential in the barrier unimportant for the determination of the $\varepsilon_i$'s. Thus, instead of having to solve simultaneously the Poisson, Schrödinger and equilibrium equations, one may split the problem into two parts: (i) search for the $\varepsilon_i$'s and $\chi_i$'s for a given set of $n_i$'s; (ii) search for the equilibrium values of the $n_i$'s, using the functionals $\varepsilon_i(n_i)$ in the equations which express the electrical neutrality of the heterostructure and the equality of the chemical potential on both sides of the interfaces. In practice, Eq. (118) can be solved in several ways: purely numerically, by diagonalizing $-e\varphi_{sc}(z)$ on a prescribed basis, or by means of a variational procedure. We illustrate the energy level calculations in doped heterostructures by considering the case of one-side modulation-doped GaAs-Ga$_{1-x}$Al$_x$As quantum wells (Meynadier et al., 1985b). The selective doping takes place in the barrier which is grown after the GaAs well. This ensures an improved mobility
of the electron gas over that found in quantum wells which are doped on both sides (Burkhard et al., 1986). Compared with single heterojunctions, the one-side doped wells offer the advantage of confining both the electrons and the (photocreated) holes in the GaAs well. This allows photoluminescence studies of the energy levels. Figure 38 shows the conduction band edge profile of a one-side modulation-doped quantum well. The Schrödinger equation has been solved in the Electric Quantum Limit, i.e. by assuming that only the ground subband is occupied. This assumption is valid in most instances. The trial wavefunction is chosen in the form:
$$\chi_1(z) = N\,\chi_1^{(0)}(z)\,\exp(-\beta z) \tag{120}$$
to account for the accumulation of electrons near the interface separating the GaAs and the doped Ga$_{1-x}$Al$_x$As layers. In Eq. (120), $\chi_1^{(0)}(z)$ is the ground state envelope function of the quantum well under flat band conditions and $\beta$ is a variational parameter. The residual doping includes both charged acceptors sitting at the inverted interface (areal concentration $N_A$) and volumic charged acceptors (volumic concentration $N_{Acc}$) which are ionized over a depletion length $l_A$ and enter in the calculations via their equivalent areal concentration $N_{dep}$ ($N_{dep} = N_{Acc}\,l_A$). The Fermi level is assumed to coincide with the Ga$_{1-x}$Al$_x$As neutral acceptors far away in the undoped Ga$_{1-x}$Al$_x$As side.
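To make the role of the variational parameter concrete, the following minimal sketch evaluates the normalization constant and the mean position of the trial function of Eq. (120). It assumes, for illustration only, that the flat-band ground state is that of an infinitely deep well, sin(πz/L) on 0 ≤ z ≤ L, with the doped interface at z = 0. The function name `trial_state`, the default width and the grid size are hypothetical choices, and no energy minimization is attempted.

```python
import math

def trial_state(beta, L=100.0, n_points=2001):
    """Return (N, mean_z) for chi(z) = N * sin(pi z / L) * exp(-beta z) on [0, L].

    Lengths are in angstroms and beta in inverse angstroms. The infinite-well
    envelope sin(pi z / L) is an illustrative stand-in for the flat-band
    ground state chi_1^(0)(z) of the text.
    """
    dz = L / (n_points - 1)
    norm = 0.0
    first_moment = 0.0
    for k in range(n_points):
        z = k * dz
        density = (math.sin(math.pi * z / L) * math.exp(-beta * z)) ** 2
        # Plain Riemann sum: it coincides with the trapezoid rule here because
        # the integrand vanishes at both ends of the well.
        norm += density * dz
        first_moment += z * density * dz
    N = 1.0 / math.sqrt(norm)
    return N, first_moment / norm

if __name__ == "__main__":
    for beta in (0.0, 0.01, 0.03):
        N, mean_z = trial_state(beta)
        print(f"beta = {beta:.2f} 1/A   N = {N:.4f}   <z> = {mean_z:.1f} A")
```

For β = 0 the routine recovers N = √(2/L) and a mean position of L/2; a positive β pulls the mean position toward the z = 0 interface, which is the accumulation that the trial function is meant to describe.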
FIG. 38. Schematic conduction and valence band edge profiles in a one-side modulation-doped GaAs-Ga$_{1-x}$Al$_x$As quantum well. The actual growth occurs from positive to negative z. The narrow wells are included in the structure to improve the quality of the inverted GaAs-Ga$_{1-x}$Al$_x$As interface.
FIG. 39. Quantum well thickness (L, in Å) dependences of the ground electron state ($E_1$) and the conduction band edge drop ($\Delta V$) between the two sides of the well in a one-side modulation-doped GaAs-Ga$_{1-x}$Al$_x$As quantum well. $N_{dep}$ and $N_A$ denote the equivalent areal concentration of charged acceptors in the bulk and the areal concentration of charged acceptors at the inverted interface, respectively. $n_e = 5 \times 10^{11}$ cm$^{-2}$. After Bastard (1986).
Figure 39 illustrates the evolution of the ground electron state $E_1$ and of the potential energy drop over the well $\Delta V$ versus the quantum well thickness L for $n_e = 5 \times 10^{11}$ cm$^{-2}$ and a fixed set of heterojunction design parameters ($N_{dep}$, $N_A$, $V_b$, ...). One clearly sees the progressive evolution from a quantum well situation ($L \to 0$), where $E_1 > \Delta V$, towards that of a single heterojunction ($L \to \infty$), where $\Delta V > E_1$, which corresponds to a situation where the electron practically does not experience the second interface of the well. These trends are also visible in Fig. 40, where the dimensionless wavefunction $\sqrt{L}\,\chi_1(z)$ and the self-consistent potential $-e\varphi_{sc}(z)$ are plotted against the dimensionless position in the well $z/L$ for three different quantum well thicknesses, L = 50 Å, 100 Å and 150 Å, keeping the same set of heterostructure parameters. The wavefunction is more and more concentrated near the z = 0 interface which separates the GaAs well from the doped part of the barrier. The potential energy drop across the well increases about linearly with L. The calculated $\Gamma_8$-related subbands of one-side modulation-doped quantum wells display a significant lifting of the Kramers degeneracy (Fig. 41), which is due to the marked asymmetry of the electrostatic potential with respect to the center of the well. Otherwise, they do not differ much qualitatively from the valence subbands of rectangular wells: they exhibit anticrossings between light and heavy hole subbands, camel-backs, etc.
FIG. 40. Dimensionless envelope function (dashed lines) and conduction band edge profiles (solid lines) versus the dimensionless position in the well for one-side modulation-doped GaAs-Ga$_{1-x}$Al$_x$As quantum wells. The material parameters are the same as in Fig. 39.
FIG. 41. In-plane dispersion relations of the valence levels (vertical axis: energy; horizontal axis: in-plane wave vector $k_\perp$ in units of $10^6$ cm$^{-1}$) in a 150 Å thick GaAs-Ga$_{1-x}$Al$_x$As one-side modulation-doped quantum well. After Brum (1987).
III. FORMAL OPTICAL PROPERTIES
A. Introduction
This section is devoted to the basic optical property of superlattices or quantum well structures, namely the way they absorb light. The importance of optical absorption lies in the fact that it reveals essentially intrinsic properties, that is, electronic states having large densities of states. As such, optical absorption is an ideal tool for the characterization of band structure properties. However, the experimental procedure consists of measuring the transmission of light through the sample, which results in two serious problems: (i) the substrate has to be transparent in the region of interest, or has to be removed using elaborate etching techniques; (ii) the absorption is detected as a variation in the large background of transmitted light and, due to signal-to-noise ratio considerations, it is in practice impossible to study structures having fewer than about 10 quantum wells. On the other hand, luminescence is an extremely sensitive tool suitable for the study of near-band-extrema intrinsic states as well as shallow defects. An intermediate technique is excitation spectroscopy, in which the intensity of a luminescence line is detected as a function of the energy of the exciting light. If the process of internal energy relaxation which occurs between the excitation and the luminescence has no resonant features, the excitation spectrum should be essentially proportional to the absorption coefficient. For samples having a high luminescence efficiency, this technique allows the measurement of the absorption by a single quantum well, without any constraint on the nature of the substrate. Starting with the general formula of the simple one-electron model of the absorption process, we examine in Section B the selection rules which arise from the symmetry properties of the cell-periodic and envelope parts of the wave function. The latter are quite different in type I and type II systems.
We first discuss the shape and the magnitude of the absorption coefficient (Section B) within a simplified model, neglecting the effects of valence subband mixing at finite $k_\perp$. Then (in Section C) we describe the influence of the valence band mixing which occurs in real structures. In Section D, we emphasize the differences between structures having isolated quantum wells and superlattices with strongly coupled QWs. In type I heterostructures, the excitonic interaction is enhanced by the two-dimensional character, and this considerably affects the shape of the absorption edges. The theory of excitonic absorption is beyond the scope of this paper, and only the major results and their consequences for the absorption lineshape are described in Section E.
When a magnetic field is applied parallel to the growth axis, the in-plane motion becomes quantized into Landau levels, and the density of states becomes quasi-discrete. Interband magneto-absorption brings rich experimental information, which we discuss in Section F.
B. Interband Absorption in an Idealised Quantum Well
The calculation of the absorption coefficient (Johnson, 1967) proceeds in two steps: the rate at which photons disappear in the sample is first calculated, using the Fermi golden rule, and the spatial attenuation of the electromagnetic wave is then deduced from the energy balance equation. In the weak absorption regime, this procedure gives the absorption coefficient associated with the transition between states $|i\rangle$ and $|f\rangle$ of energies $E_i$ and $E_f$:
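$$\alpha(\omega) = \frac{4\pi^2 e^2}{n c m_0^2 \Omega \omega} \sum_{i,f} |\langle f|\boldsymbol{\epsilon}\cdot\mathbf{p}|i\rangle|^2\, \delta(E_f - E_i - \hbar\omega)\,\{f(E_i) - f(E_f)\} \tag{121}$$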
In this formula, $\hbar\omega$ is the photon energy, $n$ the refractive index of the medium, and $\Omega = S L_k$ the volume of the sample, $L_k$ being the length along the propagation direction. More specifically, $\Omega$ is the volume in which the states $|i\rangle$ and $|f\rangle$ are normalized. $\boldsymbol{\epsilon}$ and $\mathbf{p}$ are the polarisation vector and momentum operator, respectively, and the function $f(E)$ is the Fermi distribution. We now have to estimate the optical matrix element $\langle f|\boldsymbol{\epsilon}\cdot\mathbf{p}|i\rangle$ and perform the summation over the relevant states (Voisin, 1986). The wavefunction of a state $|i\rangle$ is written in the envelope function scheme, as in Eq. 3 and Eq. 8:
where n is a subband index, while the index $\nu$ runs over the eight host band edges. The basic assumption, inherent to the effective mass treatment, is that the envelope functions $\chi$ are slowly varying at the scale of a bulk lattice parameter, so that we may write:
In the following, we consider transitions between the few low-lying subbands in QWs grown out of reasonably large gap materials, and we neglect the complications arising from the intricate $\Gamma_8$-band in-layer dispersion. We assume that the conduction band is purely S-type and that the valence band consists of uncoupled heavy hole, light hole and split-off subbands. In this case, only one index $\nu$ contributes significantly for each band, the envelope functions $\chi_\nu$ do not depend on $k_\perp$, and we have simple parabolic in-layer
dispersion relations:
$$E_m(k_\perp) = E_m(0) + \hbar^2 k_\perp^2/2m_\perp^{e}, \qquad H_n(k_\perp) = H_n(0) - \hbar^2 k_\perp^2/2m_\perp^{H}$$
where H stands for any of the valence subbands. These assumptions correspond basically to the “diagonal approximation” of Sec. II.B. By examining the cell-periodic part of the Bloch wave functions shown in Table I, we readily get the selection rules associated with the polarization, which are indicated in Table II, with the notation $\langle S|p_x|X\rangle = \Pi$. It is seen that in the usual configuration $\mathbf{k} \parallel z$, corresponding to photons propagating along the growth axis, the three transitions are allowed. In contrast, for an electromagnetic wave propagating in the layer plane, the transitions $HH_n \to E_m$ become forbidden in the $\epsilon_z$ polarization while the $LH_n \to E_m$ transitions remain allowed. This circumstance can be exploited to determine the nature of observed transitions in systems where their assignment is not obvious (Marzin, 1985). The overlap of the envelope functions reads:
It leads to two selection rules: the first one, $\mathbf{k}_\perp^{e} = \mathbf{k}_\perp^{H}$, is the direct consequence of the translational invariance in the layer plane. The second acts on the subband index:
(i) In type I systems (Fig. 4a), the conduction and valence wave functions are localised in the same QW layer. They have a definite parity with respect to the center of this layer, which leads to the selection rule n - m even. If the quantum wells are infinitely deep, these envelope functions are:
$$\chi_n^{e,H}(z) = \sqrt{2/L_z}\,\sin(n\pi z/L_z) \tag{126}$$
and their overlap is simply $\delta_{nm}$. In QW's with finite depth, the conduction and valence wave functions are not identical, due to their different evanescent wings in the adjacent barrier layers. Transitions with n - m = 2, 4, ... become allowed, though they usually remain weak.
(ii) In type II systems (Fig. 4b), the overlap is always small, as it is due to the leakage of the wave functions outside their confinement layers. For a given width $L_A$ of the conduction QW, it goes to zero as $1/\sqrt{L_B}$ (but the number of possible transitions increases accordingly). All the states still have a definite parity with respect to the centers of the A layers, but even and odd valence states are degenerate, so that there is no selection rule on the subband index. Transitions with $\Delta n = 0$ or $\Delta n \neq 0$, $\Delta n$ even or odd, have a priori equivalent strengths.
TABLE II. POLARIZATION SELECTION RULES FOR INTERBAND TRANSITIONS (entries listed by transition, for each polarization $\epsilon_x$, $\epsilon_y$, $\epsilon_z$ and propagation direction).
We now examine the order of magnitude of the absorption coefficient. To be specific, we consider the $HH_1 \to E_1$ transition in a type I QW, for which the overlap of the envelope functions is nearly 1, so that the optical matrix element is simply $\Pi/\sqrt{2}$; we have:
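$$\alpha(\omega) = \frac{8\pi^2 e^2}{n c m_0 \Omega \omega}\,\frac{\Pi^2}{2m_0} \sum_{\mathbf{k}_\perp} \delta\!\left(E_g + HH_1 + E_1 + \frac{\hbar^2 k_\perp^2}{2\mu_\perp} - \hbar\omega\right) \tag{127}$$
or:
$$\alpha(\omega) = \frac{1}{L_z}\,\frac{4\pi e^2}{n c m_0 \omega \hbar^2}\,\frac{\Pi^2}{2m_0}\,\mu_\perp\, Y(\hbar\omega - E_g - E_1 - HH_1) \tag{128}$$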
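As a numerical check of the order of magnitude implied by Eq. (128), the following sketch evaluates the dimensionless plateau height α(ω)L_z = 4π (e²/ħc) (μ⊥/m0) (Π²/2m0) / (n ħω), which is obtained by regrouping the prefactor of Eq. (128). The GaAs-like inputs (μ⊥ ≈ 0.06 m0, n ≈ 3.6, ħω ≈ 1.5 eV) and the function name are illustrative assumptions; the value 2Π²/m0 ≈ 23 eV is the one quoted in this section.

```python
import math

FINE_STRUCTURE = 1.0 / 137.036  # e^2 / (hbar c), dimensionless in CGS units

def plateau_absorption(mu_over_m0, kane_energy_ev, refractive_index, photon_energy_ev):
    """Dimensionless step height alpha * L_z of Eq. (128) for one transition.

    kane_energy_ev is 2 * Pi^2 / m0 in eV, so that Pi^2 / (2 m0) = kane_energy_ev / 4.
    """
    pi2_over_2m0 = kane_energy_ev / 4.0
    return (4.0 * math.pi * FINE_STRUCTURE * mu_over_m0 * pi2_over_2m0
            / (refractive_index * photon_energy_ev))

if __name__ == "__main__":
    single_well = plateau_absorption(0.06, 23.0, 3.6, 1.5)
    print(f"alpha * L_z, one well : {single_well:.2e}")
    print(f"alpha * L_z, ten wells: {10 * single_well:.2e}")
```

With these inputs the result is close to 6 × 10⁻³ per well. Because μ⊥ and ħω both scale roughly with the band gap, their ratio, and hence the plateau height, changes little from one III-V compound to another.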
Here $\mu_\perp$ is the in-plane reduced mass and $Y(x)$ the step function. For any III-V compound, $2\Pi^2/m_0 \simeq 23$ eV. If $m_\perp^{HH}$ is large compared to $m_\perp^{e}$, the reduced mass is not too different from the conduction mass, which scales with the bandgap. We thus get a “universal” number $\alpha(\omega)L_z \simeq 6 \times 10^{-3}$ per transition and per quantum well. The absorption coefficient displays a step at the energy $E_g + E_1 + HH_1$, which evidences the energy level quantization and the 2D density of states. The observation of step-like absorption spectra, together with absorption edges shifted to higher energies compared to the bulk material (Dingle, 1974), was the clearest experimental evidence of the quantization of the energy levels in real QW structures. If the structure consists of 10 QW's, a degeneracy factor of 10 appears in Eq. 128, and we find $\alpha(\omega)L_z = 6 \times 10^{-2}$, which indeed agrees very well with various experimental data. Transitions involving light holes are three times weaker (Table II) and should also exhibit a different reduced mass. The absorption in a type II system is generally much smaller, due to the poor overlap between the conduction and valence envelope functions. The order of magnitude of the absorption is usually a sufficient argument to decide of which type a given system is (Voisin et al., 1984b, 1986; Marzin et al., 1985). Finally, the $1/L_z$ dependence of the absorption coefficient deserves some attention: for a separate confinement heterostructure, the in-layer absorption coefficient for a given transition is proportional to the reciprocal thickness of the cavity where photons are confined; it does not depend on the QW thickness.
Remark: it is true that heavy and light particles are completely decoupled at $k_\perp = 0$. However, non-parabolicity often significantly affects the calculation of the energy levels $E_n$, $LH_n$. This means that even at $k_\perp = 0$, the admixture of P (S) states in the conduction (light-hole) wavefunction is not negligible. For a light-hole to conduction transition, for instance, the optical matrix element becomes:
$$\left\{\langle\chi^{S}_{e,n}|\chi^{P}_{LH,m}\rangle + \langle\chi^{P}_{e,n}|\chi^{S}_{LH,m}\rangle\right\}\Pi/\sqrt{6} \tag{129}$$
In fact, at $k_\perp = 0$, $\chi^{P}_{e}$ is proportional to the derivative $d\chi^{S}_{e}/dz$, and $\chi^{S}_{LH}$ is proportional to $d\chi^{P}_{LH}/dz$. The two terms in Eq. 129 thus have the same parity. The general result is that at $k_\perp = 0$, non-parabolicity does not affect the parity selection rule.
C. Band Mixing Effects
On the other hand, in real structures and at $k_\perp \neq 0$, light and heavy valence subbands are coupled and mix even in “parabolic” or large gap materials. This means that the selection rules (both parity and polarization) are certainly relaxed away from $k_\perp = 0$. However, in general, a forbidden transition will not become abruptly allowed, nor an allowed one abruptly forbidden. This is why the selection rules which we have discussed certainly remain a sensible guideline. The expansion of the eigenfunctions at finite $k_\perp$ on the basis of the $k_\perp = 0$ solutions described in Section II.B is particularly suitable for the discussion of the effect of the valence subband mixing on the optical properties. Indeed, the wavefunction of the ith valence subband is written:
$$\Psi_{i,\mathbf{k}_\perp}(\mathbf{r}) = \frac{1}{\sqrt{S}}\, e^{i\mathbf{k}_\perp\cdot\mathbf{r}_\perp} \sum_\nu \chi^{\nu}_{i,\mathbf{k}_\perp}(z)\,|u_\nu\rangle \tag{130}$$
with:
In these equations, the indices i, l refer to the valence subband labelling and $\nu$ = 1 to 8 to the host band edges. The a's can be determined from the solution of the eigenvalue problem at finite $k_\perp$ (see Sec. II.B).
Equivalent equations may be written for the conduction subbands, though the band mixing in this case is often negligible (except for the InAs-GaSb and HgTe-CdTe systems). The optical matrix element then reads:
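In many cases non-parabolicity is weak enough to be neglected in this calculation. For “parabolic” materials (large $E_g$ and large $\Delta$), we have:
$$|u_j\rangle = |u_1\rangle \ \text{or}\ |u_2\rangle \tag{133}$$
which corresponds to a purely S-type conduction band, and:
$$\begin{aligned}
|u_i\rangle &= |u_5\rangle \ \text{or}\ |u_6\rangle \quad \text{if $i$ corresponds to a heavy-hole subband} \\
|u_i\rangle &= |u_3\rangle \ \text{or}\ |u_4\rangle \quad \text{if $i$ corresponds to a light-hole subband} \\
|u_i\rangle &= |u_7\rangle \ \text{or}\ |u_8\rangle \quad \text{if $i$ corresponds to a spin split-off subband}
\end{aligned} \tag{134}$$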
Besides, the optical transitions at $k_\perp = 0$ often obey the practical selection rule j - i = 0. In such a case, the summation in Eq. 132 reduces to a reasonably small number of terms. To be specific, consider the absorption toward the first conduction subband in a type I QW. At $k_\perp = 0$, only the $HH_1$ and $LH_1$ subbands contribute significantly to the absorption, with respective weights 1 and 1/3. In contrast, at $k_\perp \neq 0$, any of the valence subbands may have a noticeable admixture of $HH_1$ and $LH_1$ character in its eigenfunction, and can therefore contribute to the absorption toward $E_1$. The general trend is that valence subband anticrossings occurring at large $k_\perp$ can be considerably washed out when the optical transitions are considered, giving rise to a step-like absorption coefficient looking much like that discussed in the “diagonal approximation”. Conversely, special band configurations can be designed, with strong interactions at small $k_\perp$: such samples clearly exhibit in their excitation spectra the effect of band mixing and the related breakdown of parity selection rules (Miller et al., 1985c).
D. Optical Absorption in Superlattices
A system of coupled QW's presents several additional features. The envelope functions $\chi_{n,q,k_\perp}(z)$ are now Bloch waves and depend on an additional quantum number, the superlattice (SL) wave vector q in the z direction. To this new translational invariance corresponds a selection rule $\Delta q = 0$, equivalent to the selection rule $\Delta k_\perp = 0$ found in Sec. III.B. As discussed below, the q-dependence of the optical matrix elements results from the symmetry properties of the super-potential, which are different in type I and type II systems (Voisin, 1984; Voisin et al., 1984a).
At $k_\perp = 0$, $\chi_{n,q,k_\perp=0}$ (hereafter $\chi_{n,q}$) is a solution of a 1 × 1 effective hamiltonian $H_\nu$ obtained by projecting the 8 × 8 hamiltonian on the state $|\nu\rangle$. Consider two consecutive A, B layers, and let $P_A$, $P_B$ be the planes bisecting the A, B layers. The product $R_B R_A$ of two reflections $R_A$ and $R_B$ with respect to $P_A$ and $P_B$, respectively, is equal to a translation $\tau_d$ of the superlattice period $d = L_A + L_B$ (Fig. 42). $R_A$, $R_B$, and $\tau_d$ commute with $H_\nu$, but not with each other, except for q = 0 and q = $\pi/d$. For these values, which correspond to standing Bloch waves, the $\chi_{n,q}$'s are eigenfunctions of $R_A$ and $R_B$ with eigenvalues ±1. They are even or odd with respect to $P_A$, $P_B$. The relation $R_B R_A \chi_{n,q} = e^{iqd}\chi_{n,q}$ shows that the parity with respect to the centers of one type of layers must be the same at q = 0 and q = $\pi/d$, while the parity with respect to the centers of the other type of layers must be opposite at q = 0 and q = $\pi/d$. This is easily seen in Table III. For an electromagnetic wave propagating along the SL axis, the interband optical matrix element becomes, in the simple “diagonal approximation” (Sec. II.B and Sec. III.B):
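$$\langle u_e|\mathbf{p}|u_H\rangle \int \chi^{e*}_{n,q,k_\perp}\,\chi^{H}_{m,q,k_\perp}\,dz = \langle u_e|\mathbf{p}|u_H\rangle\, M_{nm}(q,k_\perp) \tag{135}$$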
TABLE III. TRUTH TABLE OF THE PARITY STATEMENT
                                     R_A     R_B
q = 0 ($R_B R_A = +1$)               +1      +1
                              or     -1      -1
q = $\pi/d$ ($R_B R_A = -1$)         +1      -1
                              or     -1      +1

FIG. 42. Schematic envelope functions for the conduction and valence ground subbands at q = 0 (full lines) and q = $\pi/d$ (dashed lines) in type I (a) and type II (b) superlattices.

For type I systems, we assume that $\chi^e$, $\chi^H$ both retain the same symmetry with respect to $P_A$ at both q = 0 and q = $\pi/d$. Then if the transition is parity allowed at q = 0 (n - m even), it will remain parity allowed at q = $\pi/d$, as illustrated in Fig. 42(a) for the case of the ground state wavefunctions, n = m = 1. On the other hand, for a type II SL, $\chi^e$ (respectively $\chi^H$) is expected to retain the same parity with respect to $P_A$ (respectively $P_B$) at both q = 0 and q = $\pi/d$. If the transition is parity allowed at q = 0 (say that $\chi^e$ and $\chi^H$ are even with respect to $P_A$, as in Fig. 42(b)), then this transition becomes parity forbidden at q = $\pi/d$ because, in the integral in Eq. 135, $\chi^e$ remains even with respect to $P_A$ whereas $\chi^H$ becomes odd with respect to $P_A$. More generally, we find in this model that in type I SL's, $M_{nm}(k_\perp, q)$ is essentially independent of q, is parity allowed if n - m is even and parity forbidden if n - m is odd, the n - m = 0 transitions being by far the most intense. In contrast, in type II SL's, the interband matrix element depends strongly on q. The transitions which are parity allowed at q = 0 become parity forbidden at q = $\pi/d$, and vice versa. In a wide range of practical situations, e.g. semiconductor InAs-GaSb SL's with reasonably large band gap, we find:
$$|M_{nm}(k_\perp, q)|^2 = \tfrac{1}{2}\,|M_{nm}(k_\perp, 0)|^2\,\left(1 + (-1)^{n-m}\cos qd\right) \tag{136}$$
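The q-dependence expressed by Eq. (136) is easy to tabulate. The sketch below returns the factor that multiplies $|M_{nm}(k_\perp, 0)|^2$ in Eq. (136) for a type II superlattice; the function name and the choice of passing the dimensionless product qd are assumptions made for illustration.

```python
import math

def type2_strength_factor(n, m, qd):
    """Factor (1/2) * (1 + (-1)^(n - m) * cos(qd)) of Eq. (136); qd is q times the period d."""
    sign = -1.0 if (n - m) % 2 else 1.0   # (-1)^(n - m)
    return 0.5 * (1.0 + sign * math.cos(qd))

if __name__ == "__main__":
    for n, m in ((1, 1), (1, 2)):
        for qd in (0.0, math.pi / 2.0, math.pi):
            factor = type2_strength_factor(n, m, qd)
            print(f"n = {n}, m = {m}, qd = {qd:.3f}: factor = {factor:.3f}")
```

For n - m even the factor falls from 1 at q = 0 to 0 at q = π/d, and for n - m odd it does the opposite, which is the exchange of allowed and forbidden transitions between the two zone-boundary values described above.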
Note that transitions with n - m = 0 or n - m ≠ 0 will a priori have equivalent strengths, always small compared to that of an n = m transition in a type I SL. Finally, it is worth noting that in the case of type II SL's, the selection rule relies basically on the phase coherence of at least one of the Bloch envelope functions $\chi^e$, $\chi^H$. Thus it could be relaxed if both the conduction and valence subband widths become smaller than their scattering-induced broadenings: here, we recover the “no selection rule” statement of Sec. III.B. Conversely, the selection rules for type I SL's are essentially those of the isolated quantum wells, and therefore they are less dependent on the scattering. The shape of the absorption edge will reflect the three-dimensional character of the SL. For many realistic cases, the subband dispersion relations along the SL axis are very well approximated by the following formulas, which are obtained analytically in a tight-binding description of the envelope
functions (Voisin et al., 1984a):
$$\begin{aligned}
E_m(k_\perp, q) &= E_m(k_\perp, 0) - (-1)^m\,\Delta E_m\,(1 - \cos qd)/2 \\
H_n(k_\perp, q) &= H_n(k_\perp, 0) + (-1)^n\,\Delta H_n\,(1 - \cos qd)/2
\end{aligned} \tag{137}$$
where $\Delta E_m$ ($\Delta H_n$) is the width of the involved conduction (hole) subband. Neglecting again the valence subband mixing, and assuming that $M_{nm}(k_\perp, 0)$ does not depend on $k_\perp$, we perform the summation over $k_\perp$ and q and get, for the $HH_1 \to E_1$ transition:
for type I and type II SL's respectively. $\zeta$ is the reduced photon energy $(\hbar\omega - E_g^{SL})/(\Delta E_1 + \Delta HH_1)$, where $E_g^{SL}$ is the superlattice bandgap; A is the overlap of the electron and hole envelope functions near one interface, in the type II case, and $C/L_z$ is the prefactor of the step function in Eq. 128. The corresponding profiles are shown in Fig. 43. Near the threshold ($\zeta = 0$) we get an absorption profile corresponding to an anisotropic 3D material, while far above ($\zeta > 1$), the 2D plateau is recovered. The transition at $\zeta = 1$ is marked by a Van Hove singularity in the type I SL, while this singularity is washed out by the q-dependence of the optical matrix element in the type II SL. These characteristics of type II SL's are easily observed in Fig. 44 (Chang et al., 1981), which shows the experimental absorption spectrum in a 27 Å-44 Å InAs-GaSb SL, together with a theoretical spectrum. The calculation was made at $k_\perp = 0$ (and with thicknesses 27 Å-40 Å), in the two-band model; only $M_{11} = 0.23$ was calculated. The step heights in Fig. 44 have been adjusted to fit the data: for the $HH_1 \to E_1$ transition, the experimental absorption is about twice the calculated value (see Sec. III.E). Figure 44 also displays the joint density of states (no selection rule, no q-dependence of $M_{nm}$), which shows the Van Hove singularities. Finally, it is worth noting that in this type II system, due to non-parabolicity, the subband width $\Delta E_1$ and the overlap A decrease significantly at large $k_\perp$ (Bastard, 1982; Schulman and Chang, 1981, 1985). It follows that the absorption coefficient associated with a given transition decreases instead of being constant on the two-dimensional plateau.

FIG. 43. Absorption lineshapes in type I (a) and type II (b) superlattices. $\zeta = (h\nu - E_g^{SL})/(\Delta E_1 + \Delta HH_1)$ is the reduced photon energy.

FIG. 44. Low temperature optical absorption of a 27 Å-44 Å InAs-GaSb superlattice versus photon energy in eV (dashed-and-dotted line), from Chang et al. (1981); joint density of states of the conduction and valence subbands (dashed line), arbitrarily scaled with respect to the experimental curve (calculation using thicknesses 27 Å-40 Å); theoretical absorption lineshape (solid line), fitted to the experimental curve.

E. Excitonic Effects
The theory of optical absorption which we have sketched in Sec. III.B is oversimplified because it relies on a one-electron model. In fact, we should consider the initial state as N electrons in the valence band, and the final state as N - 1 electrons in the valence band plus one electron in the conduction band. As long as Coulomb interaction is neglected, this correct treatment
gives exactly the same result as the one-electron model, though much more algebra is involved. Coulomb interaction in the final state leads to exciton states. In the electron-hole formalism, with some approximations and after considerable manipulations (Johnson, 1967; Knox, 1963), an exciton may be described by a two-particle envelope wave function in the form:
$$\psi_n^{\mathbf{K}}(\mathbf{r}_e, \mathbf{r}_H) = \sum_{\mathbf{k}} A^{n}_{\mathbf{k}}\, e^{i\mathbf{k}\cdot(\mathbf{r}_e - \mathbf{r}_H)}\, e^{i\mathbf{K}\cdot(m_e\mathbf{r}_e + m_H\mathbf{r}_H)/(m_e + m_H)} \tag{139}$$
The $A^{n}_{\mathbf{k}}$'s are the Fourier components of a function $\varphi_n(\mathbf{r}_e - \mathbf{r}_H)$ which is the solution of the Schrödinger equation of the hydrogen atom. The optical matrix element associated with the creation of an exciton is $\langle u_e|\mathbf{p}|u_H\rangle\,\varphi_n(0)$, and the exciton equivalent of the selection rule $\mathbf{k}_e = \mathbf{k}_H$ is $\mathbf{K} = 0$. Thus only discrete bound states with S-type wave functions and zero wave vector contribute to the optical absorption, and they manifest themselves as sharp lines below the one-particle band gap $E_g$. Their oscillator strength is proportional to $|\varphi_n(0)|^2$ and it decreases rapidly with n. Usually, only the 1s exciton is clearly resolved. The excitonic interaction is enhanced by the 2D character, as the series of binding energies in isotropic 3D and purely 2D systems are respectively:
$$E_n^{3D} = R^*/n^2 \quad \text{and} \quad E_n^{2D} = R^*/(n - 1/2)^2 \qquad (n \geq 1) \tag{140}$$
where $R^*$ is the 3D effective Rydberg energy:
$$R^* = 0.5\,(e^2/\hbar\kappa)^2\,\{m_e m_H/(m_e + m_H)\} \tag{141}$$
The oscillator strength $|\varphi_n(0)|^2$ is also correlatively larger and decreases faster in the 2D than in the 3D regime:
$$|\varphi_n(0)|^2_{3D} = 1/\pi a^{*3} n^3 \quad \text{and} \quad |\varphi_n(0)|^2_{2D} = 4/\pi a^{*2}(n - 1/2)^3 \tag{142}$$
where $a^* = (2\kappa R^*/e^2)^{-1}$ is the 3D effective Bohr radius. Beyond $E_g$, the absorption in the continuum corresponding to ionized pair states is still affected by the Coulomb interaction. The band-to-band absorption coefficient given by Eq. 121 is multiplied by the “Sommerfeld factor”, which considerably increases the absorption near the onset. This Sommerfeld factor $S(\omega)$ has been calculated analytically for the 3D and 2D cases (Lederman and Dow, 1976):
$$\begin{aligned}
S_{3D}(E_g + \varepsilon) &= \left(2\pi/\sqrt{\varepsilon/R^*}\right)\Big/\left(1 - e^{-2\pi/\sqrt{\varepsilon/R^*}}\right) \\
S_{2D}(E_g + \varepsilon) &= 2\Big/\left(1 + e^{-2\pi/\sqrt{\varepsilon/R^*}}\right)
\end{aligned} \tag{143}$$
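The two limiting cases can be compared numerically. The sketch below evaluates the binding-energy series of Eq. (140) and the Sommerfeld factors of Eq. (143) in units of the effective Rydberg R*. The function names are illustrative, and the energy above the gap is passed as the dimensionless ratio ε/R*, a choice made in this sketch to avoid any material parameter.

```python
import math

def binding_energy_3d(n):
    """E_n / R* for an isotropic 3D exciton, n >= 1 (Eq. 140)."""
    return 1.0 / n ** 2

def binding_energy_2d(n):
    """E_n / R* for a purely 2D exciton, n >= 1 (Eq. 140)."""
    return 1.0 / (n - 0.5) ** 2

def sommerfeld_3d(eps):
    """3D Sommerfeld factor at eps = epsilon / R* > 0 above the gap (Eq. 143)."""
    x = 2.0 * math.pi / math.sqrt(eps)
    return x / (1.0 - math.exp(-x))

def sommerfeld_2d(eps):
    """2D Sommerfeld factor at eps = epsilon / R* > 0 above the gap (Eq. 143)."""
    x = 2.0 * math.pi / math.sqrt(eps)
    return 2.0 / (1.0 + math.exp(-x))

if __name__ == "__main__":
    ratio = binding_energy_2d(1) / binding_energy_3d(1)
    print(f"1s binding energy, 2D over 3D: {ratio:.1f}")
    for eps in (0.01, 1.0, 10.0, 100.0):
        print(f"eps/R* = {eps:6.2f}   S_3D = {sommerfeld_3d(eps):7.3f}   "
              f"S_2D = {sommerfeld_2d(eps):.3f}")
```

The 1s binding energy comes out four times larger in 2D than in 3D, and the 2D factor tends to 2 at the band edge while decaying only slowly toward 1 at higher energies, whereas the 3D factor diverges at the edge.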
In the 2D case, the Sommerfeld factor exactly doubles the absorption near the band gap, and in fact, it decreases quite slowly above it. The ideal absorption profiles in the 2D and 3D cases are shown in Fig. 45.

FIG. 45. Schematic absorption profiles versus $\varepsilon/R^*$ for 3D (a) and purely 2D (b) materials, with (solid lines) and without (dashed lines) Coulomb interaction. The 1S and nS exciton lines and the continuum are indicated.

Real type I quantum well structures represent an intermediate situation in which the density of states is purely 2D while the Coulomb interaction remains essentially 3D. The measured binding energies, which are in reasonable agreement with the various calculations, are considerably larger than in bulk materials. The related enhancement of the oscillator strength allows the observation of exciton features at room temperature in relatively small gap materials such as In$_{0.53}$Ga$_{0.47}$As-InP (Temkin et al., 1985) or In$_{0.53}$Ga$_{0.47}$As-In$_{0.52}$Al$_{0.48}$As (Miller et al., 1985b) QW structures. In real structures, the exciton features are broadened, both homogeneously through various scattering mechanisms and inhomogeneously through trapping on layer thickness fluctuations, etc. If the integrated absorption is conserved, the contrast between the exciton peak and the band-to-band absorption is easily evaluated as a function of the exciton broadening. A representative figure is a contrast of ~ 15 for an exciton linewidth of ~ 3 meV. In type II heterostructures, the spatial separation of the carriers weakens the Coulomb interaction. The corresponding “interface exciton” has a binding
energy which is smaller than the effective Rydbergs of the host materials (Bastard et al., 1982). However, the experimental data discussed in Sec. III.D seem to indicate that the Sommerfeld factor is still of the order of 2.
F. Magneto-Optical Absorption
Whereas direct optical absorption brings rich information such as the nature of the system, the confinement energies of several pairs of subbands, etc., it is clear that this technique cannot address the fundamental problem of the in-layer kinematics of the $\Gamma_8$ subbands in 2D systems. The only parameter directly related to this problem, which is the electron-hole pair reduced mass $\mu_\perp$, is not measured with sufficient accuracy. Magneto-optical absorption brings richer experimental data which may help for this purpose. When a magnetic field B is applied parallel to the growth axis, the in-plane motion is quantized into Landau levels, and the density of states becomes a series of $\delta$ functions, as sketched in Fig. 46a. We discuss first an extremely simplified situation assuming parabolic, non-degenerate subbands. The energies of the Landau levels associated with the ground conduction and valence subbands are:
$$\begin{aligned}
E_{1,M,\pm} &= E_1 + (M + 1/2)\,\hbar\omega_c^{e} \pm \tfrac{1}{2}\, g_e \mu_B B \\
H_{1,N,\pm} &= -H_1 - (N + 1/2)\,\hbar\omega_c^{H} \pm \tfrac{1}{2}\, g_H \mu_B B
\end{aligned} \tag{144}$$
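where $\omega_c^{i} = eB/c\,m_\perp^{i}$ is the cyclotron frequency, $g_{e(H)}$ the Landé factor of the corresponding subband and $\mu_B$ the Bohr magneton. The associated wave functions are:
$$\begin{aligned}
\psi^{e}_{1,M,\pm} &= \frac{1}{\sqrt{L_y}}\, e^{ik_y y}\, \chi^{e}_1(z)\, \varphi_M((x - x_0)/\lambda)\, |u_e, \pm\rangle \\
\psi^{H}_{1,N,\pm} &= \frac{1}{\sqrt{L_y}}\, e^{ik_y y}\, \chi^{H}_1(z)\, \varphi_N((x - x_0)/\lambda)\, |u_H, \pm\rangle
\end{aligned} \tag{145}$$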
where 1= (hc/eB)1/2 is the magnetic length, xo = -A2kY, and the cp's are the eigenfunctions of the harmonic oscillator. In this simplified picture, optical transitions between the El and H, Landau levels are allowed, with the selection rule N - M = 0 imposed by the overlap of the oscillator functions. The strength of these transitions is proportional to and thus it increases linearly with the magnetic field. Transmission minima are observed at photon energies hv, = E, + El, H 1,. This may be done experimentally by recording the transmission at fixed photon energy while sweeping the magnetic field, or at fixed B while sweeping the photon energy. These transition energies extrapolate at zero magnetic field toward E, + E, H,, and exhibit a slope:
+
dhv,/dB = ( N + l/2)he/cp1
(146)
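As an illustration of how Eq. (146) may be used in practice, the sketch below builds a synthetic Landau fan and recovers the zero-field intercept and the reduced mass from straight-line fits. It rests on stated assumptions: the formula is written in SI units (so the factor c of the Gaussian form drops out), spin splittings are ignored, and the function names and numerical values (an intercept of 1550 meV and a reduced mass of 0.05 free-electron masses) are hypothetical, chosen only for the example.

```python
import numpy as np

HBAR = 1.054571817e-34   # J s
M0 = 9.1093837015e-31    # free-electron mass, kg


def fan_energies_meV(B, n_levels, e0_meV, mu_rel):
    """Transition energies (meV) of Landau indices N = 0..n_levels-1.

    SI form of Eq. (146): h*nu_N = e0 + (N + 1/2) * hbar * e * B / mu.
    hbar*e*B/mu in joules divided by e gives hbar*B/mu in eV.
    """
    B = np.asarray(B, dtype=float)
    N = np.arange(n_levels)[:, None]
    hw_meV = 1e3 * HBAR * B[None, :] / (mu_rel * M0)
    return e0_meV + (N + 0.5) * hw_meV


def fit_fan(B, energies_meV):
    """Fit one straight line per Landau index.

    Returns the B = 0 intercepts (meV) and the reduced mass (units of m0)
    deduced from each slope.
    """
    intercepts, masses = [], []
    for N, e_N in enumerate(energies_meV):
        slope, intercept = np.polyfit(B, e_N, 1)   # slope in meV per tesla
        intercepts.append(intercept)
        masses.append((N + 0.5) * HBAR / (M0 * slope * 1e-3))
    return np.array(intercepts), np.array(masses)


# Synthetic data (hypothetical values), then recovery by the fit.
B = np.linspace(2.0, 10.0, 9)
E = fan_energies_meV(B, n_levels=3, e0_meV=1550.0, mu_rel=0.05)
intercepts, masses = fit_fan(B, E)
```

With real data the lines are not strictly straight (non-parabolicity, excitonic corrections at low field), so such a fit would only give a first estimate.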
Again, the electronic contribution to the reduced mass usually dominates, but the accuracy of magneto-optical data may allow the determination of $m_h$. If we are dealing with true superlattices having noticeable bandwidths, the Landau levels become Landau subbands. When the spacing $\hbar\omega_e$ becomes larger than the subband width $\Delta E_1$, the density of states of the conduction band presents two sharp maxima per Landau subband, at q = 0 and q = $\pi/d$ respectively. This is illustrated in Fig. 46b. Magneto-optical transitions occurring at q = 0 and q = $\pi/d$ may be observed simultaneously, allowing a direct determination of the bandwidth $\Delta E_1 + \Delta H_1$ (Maan et al., 1981). Non-parabolicity and $\Gamma_8$-subband mixing do not change this qualitative picture, but the quantitative interpretation of actual data will generally require a sophisticated calculation of the Landau level energies and of the associated oscillator strengths. Again, the expansion on the basis of the B = 0 envelope functions is particularly suitable for this calculation. The spinor of Eq. 84 is rewritten:

$$\psi_{i,N,B}(\mathbf{r}) = \frac{1}{\sqrt{L_y}}\, e^{i k_y y} \sum_{\nu} \chi_{i,\nu,N,B}(z)\, \varphi_{\nu,N}((x - x_0)/\lambda)\, |u_\nu\rangle \qquad (147)$$

with

[equation not legible in the source]

FIG. 46. Comparison of the densities of states of Landau levels in a QW structure (a) and of Landau subbands in a superlattice (b), in the high magnetic field limit.
where, again, i and l refer to the subband labelling and $\nu$ to the host band edges, and the a's can be determined from the solution of the eigenenergy problem. The optical matrix element then reads:

[Eqs. (149) and (150), which express the optical matrix element in terms of the wave function overlap M, are not legible in the source]
Note that, at finite magnetic field B, the selection rules are more conveniently discussed using the circular polarizations. The evaluation of the oscillator strength is in fact a prerequisite for the comparison of theory with magneto-optical data (Ancilotto et al., 1987; Brum et al., 1988). Indeed, the valence band mixing at finite $k_\perp$ or finite B relaxes the parity selection rule, and a very large number of transitions become allowed, though only a few of them have a significant strength. This is illustrated in Fig. 47, which shows calculated interband magneto-optical transitions for a GaSb-AlSb strained layer QW (see Part V). Only the transitions towards the first spin doublet of the ground conduction subband, with the circular polarization $\sigma^-$, are represented here. The three bold lines correspond to the few transitions for which the square of the wave function overlap M is larger than 0.2n2.
FIG. 47. Calculated magneto-optical transitions towards the first spin doublet of the ground conduction subband, in the $\sigma^-$ polarization, for a 180 Å strained GaSb QW. The three bold lines correspond to the few transitions which have a significant strength.
IV. EXPERIMENTAL METHODS IN UNSTRAINED III-V SYSTEMS
This section is devoted to a survey of the experimental results obtained in III-V systems. Given the number of publications related to the optical properties of quantum wells (QW's), we apologize for any omission in the reference list. We have only tried to emphasize some aspects of the problem in relation to the previous theoretical sections. We recall that, except when the results are directly related to a problem under discussion, Raman spectroscopy, high excitation behaviour, hot electron problems and device applications are outside the scope of this review. We will begin with the GaAs-Ga$_{1-x}$Al$_x$As system, in which most of the experimental results have been obtained. We will continue by surveying other unstrained III-V systems, i.e. systems where the two host materials have approximately the same lattice constant, and whose quality is steadily improving. The strained layers will be approached in Section V.

A. The GaAs-Ga$_{1-x}$Al$_x$As System
The low temperature absorption and luminescence spectra of GaAs-Ga$_{1-x}$Al$_x$As quantum wells (QW's) generally display well-defined excitonic peaks. This is partly due to the enhancement of the exciton binding energy discussed in Section II. However, the most striking optical property of QW's is obviously the increase of the bandgap associated with the confinement of electrons and holes in the thin GaAs slab. This gives a unique opportunity for measuring the energy position of the fundamental and excited subbands, to within the uncertainty (between 5 and 15 meV) on the binding energy of the exciton. We will deal with an application of such measurements, the determination of the conduction and valence band offsets, in the second part of this section, after having recalled the main experimental techniques: absorption and photoluminescence excitation spectroscopies, photoluminescence, photoconductivity and related methods. In the third part, the experimental data concerning the exciton problem (binding energy, temperature dependence, width of the excitonic line and interface defects) will be discussed. The problem of extrinsic processes is approached in the fourth part. The optical properties of quantum wells are modified when an electron or hole gas is confined in the well; this will be the subject of the fifth part. Finally, we will deal with some related problems: the optical behaviour of QW's under magnetic and electric fields, and the so-called "vertical" transport in superlattices.

1. Experimental Methods

i. Absorption Spectroscopy. One of the most striking pieces of evidence for the size-quantization of the energy levels in GaAs-Ga$_{1-x}$Al$_x$As multi-quantum
wells was obtained from optical absorption measurements (Dingle, 1975; Dingle et al., 1974; Weisbuch et al., 1981; Masumoto et al., 1985). The low-temperature transmission spectra exhibit a series of sharp exciton peaks separated by absorption plateaus characteristic of the constant two-dimensional density of states. Let $E_n$, $HH_n$, $LH_n$ be the electron, heavy-hole and light-hole levels in the well; an exciton and the onset of a new plateau appear at each $E_n - HH_n$ or $E_n - LH_n$ optical transition. In real QW's, which necessarily have a finite barrier height, the weakly allowed $\Delta n = 2$ lines ($E_1 - HH_3$ essentially) can be observed in symmetrical square quantum wells (Miller et al., 1981a) (see also Fig. 48). In asymmetrical structures, these selection rules are obviously lifted. Because of the low absorption coefficient of a single QW (about 0.6%, see Sect. III), a multi-quantum well structure is needed to perform absorption experiments. A multi-quantum well structure is composed of n (usually n > 10 to 20) QW's separated by GaAlAs barriers which are sufficiently thick to prevent any coupling between successive QW's. Furthermore, since the bandgap energy of the GaAs substrate and buffer layers is smaller than the bandgap energy of the multi-QW system, it is necessary to remove the GaAs substrate and buffer layers. In this case a hole a few mm in diameter is selectively etched in the substrate and the thin remaining multi-QW film can be distorted. On the contrary, when the multi-QW structure is still on the GaAs substrate, the GaAs layers are expected to be close to their natural lattice constant, as dictated by epitaxial growth on the thick substrate. Finally, the use of a multi-QW structure supposes that all the wells have the same thickness and are equivalent. This tends to be the case in high-quality multi-QW structures grown by Molecular Beam Epitaxy. Another method is available for a single QW structure: photoluminescence excitation spectroscopy.
This technique essentially provides the same kind of information as absorption spectroscopy.
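The need for many wells in an absorption experiment can be made concrete with a small estimate. The sketch below takes the 0.6% single-well absorption quoted above and compounds it over n identical, uncoupled wells; it deliberately neglects reflections at the surfaces, interference effects and any absorption in the barriers, and the function name is hypothetical.

```python
def mqw_absorbed_fraction(n_wells, per_well=0.006):
    """Fraction of the incident light absorbed by n identical, uncoupled wells.

    Each well is assumed to transmit (1 - per_well) of the light reaching it.
    """
    return 1.0 - (1.0 - per_well) ** n_wells


single = mqw_absorbed_fraction(1)    # one well: the 0.6% quoted in the text
stack = mqw_absorbed_fraction(20)    # a 20-well stack, at the peak of a transition
```

Under these assumptions a stack of 20 wells absorbs a little over a tenth of the incident light, against less than one percent for a single well.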
ii. Photoluminescence Excitation Spectroscopy. In an excitation spectroscopy experiment, the variation of the intensity of a given luminescence line is measured versus the wavelength of the exciting light source (Weisbuch et al., 1981). The similarity of the absorption and excitation spectra is shown in Fig. 48. The substrate etching problem is also resolved, as excitation and luminescence can be performed on the same side of the sample. Excitation spectroscopy also makes it possible to determine, in a complicated luminescence spectrum, which features involve the QW and which involve other parts of the investigated structure. The LD 700 dye laser, pumped by a Kr$^+$ laser, covers without any difficulty the wavelength range of interest (between 7100 Å and 8300 Å). But excitation spectroscopy also involves the intra-band relaxation process (full arrow in Fig. 49) which occurs between the creation of the
[Figure 48: sample label "Multi-quantum wells GaAs-Ga0.7Al0.3As, 25 × 188 Å, 1.8 K"; peaks labelled n = 1 e-hh, n = 1 e-lh, n = 2 e-hh and "forbidden transition"; curves labelled luminescence (a) and transmission (b); horizontal axis: photon energy (eV).]
FIG. 48. Luminescence spectrum (a), transmission spectrum (b) and excitation spectrum (c) of a 25 × 188 Å GaAs/Ga0.7Al0.3As multi-QW structure. The low energy shift in the transmission peak is due to the strain induced by the etching of the substrate in this case. After Weisbuch et al. (1981).
FIG. 49. Schematic energy diagram of the lowest conduction ($E_1$) and heavy hole ($H_1$) levels in the growth (left) and in-plane (right) directions. Between optical excitation ($h\nu_{exc}$) and photoluminescence ($h\nu_{lum}$), the electrons and holes relax towards the bottom of the bands (full arrows).
photoexcited electron and the recombination near the bottom of the bands. In bulk GaAs, this gives rise to large disparities between the absorption spectrum (Sturge, 1962), which is characteristic of the conduction- and valence-band densities of states, somewhat complicated by the exciton problem, and the excitation spectrum (Weisbuch, 1977). Figure 50 shows the
[Figure 50: horizontal axis: electron energy (meV), with ticks at 0, 36, 72, 108 and 144.]
FIG. 50. Conduction band density of states (lower part) and photoluminescence excitation spectrum (upper part) of a thick GaAs layer. At each LO phonon replica, the relaxation of electrons towards the bottom of the conduction band is more efficient with respect to the non-radiative mechanisms. Due to wave-vector conservation, resonances separated by $\hbar\omega_{LO}(1 + m_e/m_h)$ are observed.
excitation spectrum of the bulk exciton photoluminescence of the (bulk) buffer layer of a high quality M.B.E.-grown sample. It is easily observed that the excitation spectrum does not exhibit a $(E - E_{gap})^{1/2}$ behaviour. On the contrary, due to non-radiative losses, the photoluminescence intensity decreases as the photon energy increases, with oscillations corresponding to efficient optical phonon-electron interaction. The fact that the excitation spectra of QWs clearly display steplike densities of states without any phonon replica strongly indicates that, as compared to the bulk case, the non-radiative recombination processes are less effective with respect to the relaxation of the excited electrons towards the bottom of the conduction band.

iii. Photoluminescence. Absorption and excitation spectroscopies give information on high-density-of-states intrinsic processes over a wide band of energy above the bandgap. In contrast, photoluminescence spectra often exhibit one line, sometimes several lines, in a small range of energy. This is due to the rapid relaxation of photoexcited carriers towards the bottom of the bands or towards energy levels, related to shallow impurity levels, close to these bands. Thus, photoluminescence involves levels which remain populated long enough during the carrier recombination process and for which radiative recombination is efficient enough with respect to non-radiative processes. As the bandgap of a GaAs QW structure depends on the well width, the data obtained from photoluminescence spectroscopy (on excitons, extrinsic processes) often have to be completed by excitation or absorption spectroscopy. Nevertheless, photoluminescence experiments are easy to perform and give unique information on low-density-of-states levels which are not observed in absorption.
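The contrast drawn above between the bulk $(E - E_{gap})^{1/2}$ law and the steplike density of states of a QW can be made concrete with the textbook single-band expressions. The sketch below is only illustrative: it assumes one parabolic, spin-degenerate band, ignores excitons entirely, and uses hypothetical subband edges (31.5 and 120 meV) and the commonly used GaAs conduction mass 0.067 m0.

```python
import numpy as np

HBAR = 1.054571817e-34    # J s
M0 = 9.1093837015e-31     # free-electron mass, kg
MEV = 1.602176634e-22     # joules per meV


def dos_bulk(E_meV, m_rel):
    """3D density of states (spin included), in cm^-3 meV^-1.

    E_meV is measured from the band edge; negative energies give zero.
    """
    E = np.clip(np.asarray(E_meV, dtype=float), 0.0, None) * MEV
    g = (2.0 * m_rel * M0 / HBAR**2) ** 1.5 * np.sqrt(E) / (2.0 * np.pi**2)
    return g * MEV * 1e-6      # J^-1 m^-3  ->  meV^-1 cm^-3


def dos_qw(E_meV, m_rel, edges_meV):
    """2D staircase density of states (spin included), in cm^-2 meV^-1.

    Each subband edge below E contributes the constant m / (pi hbar^2).
    """
    E = np.asarray(E_meV, dtype=float)
    step = m_rel * M0 / (np.pi * HBAR**2) * MEV * 1e-4
    n_open = sum((E >= edge).astype(float) for edge in edges_meV)
    return step * n_open


energies = np.array([10.0, 50.0, 150.0])
staircase = dos_qw(energies, 0.067, edges_meV=[31.5, 120.0])
bulk_ratio = dos_bulk(40.0, 0.067) / dos_bulk(10.0, 0.067)
```

The staircase is zero below the first edge and then rises by equal steps, whereas the bulk value only doubles when the energy above the edge is quadrupled.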
iv. Photoconductivity and Other Related Methods. Although much less used than the previous techniques, other methods of measuring the optical properties of QWs have been developed on the basis of photoconductive or modulation techniques. Using the photoconductivity response of a selectively doped n-AlGaAs/GaAs heterojunction grown beneath a multi-QW structure in the same sample, it is possible to obtain the low-temperature absorption spectrum of the MQW without the problems of the substrate etching and of the photoluminescence aperture detection (see Fig. 51). One has only to contact the 2D electron gas with suitable electrodes (Tanaka et al., 1986). Without the use of such an intricate structure (i.e. without the growth of a heterojunction), one can obtain the sum of the photoconductivity signal of the GaAs buffer layer and of the QW's (Meynadier et al., to be published). Fundamental excitonic features have also been observed by measuring the photoconductive effect along the growth axis of the structure by inserting the QW's between a
[Figure 51: layer sequence GaAs-AlAs MQW / Si-doped AlGaAs / undoped GaAs / S.I. GaAs substrate; spectrum recorded at 77 K; horizontal axis: wavelength (nm).]
FIG. 51. Scheme of a sample grown for a photoconductivity experiment, from Tanaka et al. (1986).
conducting doped layer (possibly the substrate) and a Schottky contact evaporated on the surface (Vina et al., 1986; Yamanaka et al., 1986). The modulation techniques, which are in fact sensitive to the singularities of the dispersion curve, include the electro-reflectance method: the wavelength dependence of the reflection coefficient is measured while the potential of a Schottky contact is modulated by an external potential. They also include the photo-reflectance method, in which the modulation of the Schottky contact is replaced by a modulation due to another, high intensity laser. These modulation techniques (Glembocki et al., 1985; Shen et al., 1986; Alibert et al., 1985; Klipstein et al., 1986), which will not be discussed here, can be very sensitive and useful, in particular in the infrared range. Nevertheless, they generally do not provide results as fine as those of purely optical methods.

2. Band Offsets Determination

From the measurement of the energies of the fundamental and excited optical transitions, it is possible to check the results of the theory described in Sec. II. One can also attempt a measurement of the conduction band ($\Delta E_c$) and valence band ($\Delta E_v = \Delta E_g - \Delta E_c$) offsets. The first determination was achieved by Dingle (1975) by absorption spectroscopy, and a value $Q_c = \Delta E_c/\Delta E_g = 0.85$ was determined for a concentration $x \approx 0.2$ of the Ga$_{1-x}$Al$_x$As barrier. But if the energy positions of the electron states $E_n$ and hole states $HH_m$ (or $LH_m$) are sensitive to the conduction and valence-band offsets, because of the relative penetration of the carrier wavefunctions in the GaAlAs barrier, the predominant n - m = 0 optical transitions are found to
be weakly sensitive to the ratio of the offsets. The increase of the confinement energy of $E_n$ which is obtained when $\Delta E_c$ is increased is partly compensated by the corresponding decrease of the confinement energy of $HH_n$. On the contrary, the energy differences between successive hole and electron levels are sensitive to the offsets; so are the weakly allowed $E_n - HH_{n+2}$ (or $E_n - LH_{n+2}$) optical transitions, when they are observable. A study of QW's with a parabolic shape, whose subband energies are more sensitive to band offsets (see the discussion of Sec. II), and which present many allowed n - m = 2 transitions, leads to $Q_c \approx 0.5$ ($x \approx 0.3$) (Miller et al., 1984b). A careful analysis of the $E_1 - HH_3$ optical transition occurring in rectangular QWs in excitation spectroscopy gives $Q_c \approx 0.57$ (Miller et al., 1984c). Other excitation spectroscopy experiments performed on M.B.E.-grown samples whose well widths are carefully controlled by R.H.E.E.D. oscillations during growth provide $Q_c \approx 0.75$ ($x \approx 0.3$) with the use of n - m = 0 transition lines [Dawson]. Meynadier et al. (1985) have carried out a determination of $Q_c$ from excitation spectroscopy in Separate Confinement Heterostructures. These structures consist of a GaAs QW embedded in a thicker Ga$_{1-x_1}$Al$_{x_1}$As one, the latter being clad between thick Ga$_{1-x_2}$Al$_{x_2}$As layers with $x_2 > x_1$. A conduction (valence) level located in the wide Ga$_{1-x_1}$Al$_{x_1}$As QW is only weakly confined in this QW and its energy does not depend on $\Delta E_c$ ($\Delta E_v$); on the contrary, the energy of a valence (conduction) level located in the thin GaAs QW is sensitive to $\Delta E_v$ ($\Delta E_c$). Thus an optical transition between these two kinds of levels is very sensitive to the offsets, as shown in Figs. 52 and 53. On the other hand, the measurement of the energies of the transitions involving two levels of the thin GaAs QW or two levels of the wide Ga$_{1-x_1}$Al$_{x_1}$As QW provides precise values of the well widths and of the Ga$_{1-x_1}$Al$_{x_1}$As bandgap.
Those measurements give a value $Q_c \approx 0.59$ ($x \approx 0.13$), which has been corroborated by another study performed on another set of similar samples (Miller et al., 1985d). The problem of the sensitivity of this determination has been carefully discussed by Meynadier (Thèse de Doctorat) and an uncertainty of about 10% in the $Q_c$ determination has been proposed (see Fig. 54). An optical transition energy is sensitive simultaneously to the conduction and valence sublevels. Some authors have attempted to use a physical phenomenon involving only the conduction band. Internal photoemission experiments in modulation-doped heterojunctions provide a value $Q_c \approx 0.8$ (x = 0.2) (Abstreiter et al., 1986a). A recent determination from electronic light scattering under intense illumination in undoped QW's gives $Q_c \approx 0.69$ (x = 0.06) (Menendez et al., 1986). This shows that the experimental determination is still a subject of controversy, although a value $Q_c \approx 0.6$ to 0.65, independent of the aluminum concentration, is currently more consistent with
[Figure 52: horizontal axis: energy (meV).]
FIG. 52. Photoluminescence excitation spectrum of a 50 Å-thick GaAs QW embedded in a symmetric 250 Å-thick Ga0.867Al0.133As QW, the latter being clad between thick Ga$_{1-x_2}$Al$_{x_2}$As layers. $E_1H_1$ and $E_1L_1$ are transitions between a conduction level and a hole level both essentially located in the GaAs QW: their energies are used to determine precisely the well width. The transitions which involve a conduction level and a hole level both located in the Ga0.867Al0.133As barrier are sensitive to the aluminum concentration x = 13.3%. The transitions which involve an electron level located in the well and a hole level located in the barrier are used to determine $Q_c$ [from Meynadier et al. (1985)].
FIG. 53. Calculated dependence of the transition energies on the conduction-band discontinuity $Q_c = \Delta E_c/\Delta E_g$ for the structure of Fig. 52. The observed transitions, corrected for the binding energies of the excitons, are indicated on the right side of the figure [from Meynadier et al. (1985)].
FIG. 54. Calculated dependence of the transition energies in a separate confinement heterostructure on the conduction-band discontinuity $Q_c$ and on various parameters of the sample: the GaAs QW thickness W, the intermediate barrier thickness L + W + R and the aluminum concentration x of this barrier [from Meynadier, Thèse de Doctorat].
most of the recent determinations. This is pointed out in a recent review paper by Duggan (1985), which includes also electrical determinations (see also (Kroemer, 1986)), as well as theoretical calculations and methods based on charge-transfer processes in heterojunctions (Wang et al., 1984).
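The weak sensitivity of the n - m = 0 transitions to the offset ratio, noted at the beginning of this part, can be checked with a simple finite square well model. The sketch below is a minimal illustration under stated assumptions, not the calculation used by the authors cited: one parabolic band per carrier, the same mass in the well and in the barrier, hypothetical masses (0.067 and 0.34 in units of the free-electron mass for the electron and the heavy hole along the growth axis), a 100 Å well and an assumed total gap difference of 374 meV.

```python
import math

HBAR2_OVER_2M0 = 3809.98   # hbar^2 / (2 m0) in meV * angstrom^2


def ground_level_meV(width_A, depth_meV, m_rel):
    """Ground (even) bound level of a finite square well, by bisection.

    Solves k * tan(k L / 2) = kappa with the same mass inside and outside.
    """
    def mismatch(E):
        k = math.sqrt(m_rel * E / HBAR2_OVER_2M0)
        kappa = math.sqrt(m_rel * (depth_meV - E) / HBAR2_OVER_2M0)
        return k * math.tan(k * width_A / 2.0) - kappa

    # The root lies below both the barrier top and the infinite-well level.
    e_inf = HBAR2_OVER_2M0 * math.pi**2 / (m_rel * width_A**2)
    lo, hi = 0.0, min(depth_meV, e_inf)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confinement_energies(q_c, gap_diff_meV=374.0, width_A=100.0,
                         m_e=0.067, m_hh=0.34):
    """Return (E1, HH1) confinement energies in meV for an offset ratio q_c."""
    e1 = ground_level_meV(width_A, q_c * gap_diff_meV, m_e)
    hh1 = ground_level_meV(width_A, (1.0 - q_c) * gap_diff_meV, m_hh)
    return e1, hh1


e1_a, hh1_a = confinement_energies(0.57)
e1_b, hh1_b = confinement_energies(0.85)
shift = (e1_b + hh1_b) - (e1_a + hh1_a)   # change of the E1-HH1 transition energy
```

In this model, raising $Q_c$ from 0.57 to 0.85 pushes $E_1$ up and $HH_1$ down, so the shift of the $E_1 - HH_1$ transition is smaller than the shift of either level taken alone and amounts to a few meV at most.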
3. Excitonic Effects

i. The Excitonic Effect: A Striking Feature of QW Spectroscopy. Due to the high quality of the samples and to the enhancement of the exciton binding energy, the optical properties of GaAs QWs display important excitonic features. As already shown in Figs. 48 and 52, perfectly resolved excitonic peaks are observed in absorption or excitation spectroscopy at low temperature. These peaks are still well-defined at room temperature in absorption spectroscopy (Fig. 55; Miller et al., 1982a). They are observed not only near the fundamental ($E_1 - HH_1$ and $E_1 - LH_1$) transitions but, slightly more weakly, near the excited transitions, even if one or two of the levels are located in the GaAlAs barrier (Meynadier et al., 1985b; Bastard et al., 1984b; Zucker et al., 1984). QW's grown by molecular beam epitaxy or by Metal Organic Vapor Deposition exhibit the excitonic peaks (Bastard et al., 1984a). The same behavior occurs in GaAs-Ga(Al)As superlattices, i.e. structures with thin GaAlAs barriers (Deveaud et al., 1984), or in GaAs-AlAs superlattices (Iwamura et al., 1984).
[Figure 55: top axis: photon energy (eV); bottom axis: wavelength (nm).]
FIG. 55. Room temperature absorption spectra of a GaAs layer and of a GaAs-Ga(Al)As multi-QW structure showing the enhancement of excitonic effects [from Miller et al. (1982a)].
At low temperature, in undoped QWs, the photoluminescence generally displays one line whose width depends on the quality of the structure. From the energy position of this line compared to the excitonic peak in absorption or excitation spectroscopy (the same energy), and also from polarization arguments (the high polarization degree of the luminescence line indicates that the luminescence does not involve highly depolarizing centers), it is possible to assign the photoluminescence line to excitonic recombination (Weisbuch et al., 1981). The direct measurement of the recombination lifetime of quasi-2D excitons at low temperature has been performed using picosecond light source excitation and streak camera detection (Gobel et al., 1983; 1985). An exponential decay was found, with a lifetime which decreases with decreasing well width, reaching about 250 ps for 50 Å thick QWs (see Fig. 56). This behaviour at low temperature for high quality QWs is very likely due to a decrease of the radiative lifetime because of exciton localization. The radiative lifetime of the free exciton is indeed proportional to the square of the exciton diameter. In a QW, this size decreases as the extension of the exciton both in the growth direction and in the plane decreases with decreasing well width. The fact that the low temperature data follow the same $L_z$ dependence for different structures also supports this interpretation in terms of radiative recombination. The results of cathodoluminescence experiments display the same behavior (Christen et al., 1984).

FIG. 56. Measured recombination lifetime $\tau$ in various GaAs-Ga(Al)As QW's (single QW, multi-QW and bulk samples) versus well thickness $L_z$ (nm) [from Gobel et al. (1983)].

ii. Room Temperature Photoluminescence: Excitonic Versus Band-to-Band Process. When the lattice temperature is increased, the exciton X, which can be created in absorption, is ionised before recombination following the relation $X \rightleftharpoons e + h$ and the two-dimensional mass action law (Chemla et al., 1984):

[equation not legible in the source]
($N_e$, $N_h = N_e$ and $N_X$ are the electron, hole and exciton densities, $E_X$ the binding energy of the exciton and T the temperature). At 300 K, in GaAs QWs, the minimum ionization ratio $N_e/(N_e + N_X)$ is found to be about 0.5, assuming the maximum exciton density $1/(\pi a_X^2) \approx 3 \times 10^{11}$ cm$^{-2}$, corresponding to excitons of radius $a_X$ occupying the whole QW plane. (In fact, many-body processes intervene before this critical density, as shown in Sec. IV.A.5.) This calculation indicates that almost all the excitons are ionized before recombination (i.e. $N_e \gg N_X$) at low excitation level and that the luminescence does involve band-to-band recombination. On the contrary, some authors claim that the room temperature photoluminescence is excitonic, either by measuring the temperature dependence of the energy of the QW photoluminescence (Bimberg et al., 1985) or by comparing the position of the exciton peak in excitation spectroscopy and that of the luminescence peak (Dawson et al.,
1983). But the results of time-resolved experiments at room temperature (Tanaka et al., 1984; Fouquet et al., 1985; Arakawa et al., 1985) are consistent with a bimolecular band-to-band recombination process (at least at high excitation level, the low-excitation regime being sometimes dominated by extrinsic effects). When the energy difference between the excited subbands and the fundamental ones is not too large with respect to the carrier temperature, excited transitions can be observed in luminescence. This carrier temperature may be that of the lattice (at room temperature for instance) or may be obtained in the high excitation regime, when a pseudo-equilibrium of the excited carriers arises with a temperature much larger than the lattice temperature (Xu et al., 1983) (Fig. 57).

iii. Exciton Binding Energy Measurements. As the bandgap is a function of the QW thickness, and due to the width of the excitonic line and to the smoothness of the band-to-band absorption onset in real QWs, the measurement of the binding energy of quasi-2D excitons in QW's is difficult. Nevertheless, in high quality samples, excitation spectra exhibit weak secondary peaks on the high energy side of the $E_1 - HH_1$ and $E_1 - LH_1$ excitonic peaks (Fig. 58). These peaks have been identified as being related to the excited 2s state of the exciton (Miller et al., 1982d). It is then possible to obtain a direct measurement of the well-width dependence of the difference between the
[Figure 57: horizontal axis: energy (eV).]
FIG. 57. Photoluminescence spectra of a 7 × 145 Å GaAs-Ga(Al)As multi-QW structure showing, under very high excitation level in case (b), emission between the n = 1, 2 and 3 subbands of the conduction and valence bands, from Xu et al. (1983). The temperature is 77 K.
[Figure 58: horizontal axis: energy (meV).]
FIG. 58. Photoluminescence excitation spectrum of a 50 Å-thick QW. The two dashed arrows indicate the shoulders assigned to the excited 2s level of the $E_1H_1$ and $E_1L_1$ excitons.
binding energies $E_X^{1s}$ and $E_X^{2s}$ of the 1s and 2s states. The model calculations reviewed in Sec. II have been checked. The net result is that the binding energy is slightly smaller for the heavy hole ($E_1 - HH_1$) exciton than for the light hole ($E_1 - LH_1$) one and that the binding energy is of the order of 9 to 12 meV in the 50-100 Å range for reasonable values of the aluminum concentration x (0.15 < x < 0.45) (Meynadier et al., 1985b; Duggan et al., 1985; Dawson et al., 1986; Miller et al., 1982d). This 2s line has also been observed in photoluminescence (Moore et al., 1986). When a magnetic field B is applied along the growth axis of the structure, discrete Landau levels are created in the conduction and valence bands and a set of optical transitions between these levels is observed in excitation spectroscopy or absorption spectroscopy (Maan et al., 1984; Miura et al., 1985), or photoconductivity spectroscopy (Rogers et al., 1986). The excitonic transition is also observable, even in luminescence (Sakaki et al., 1985a). By an extrapolation toward B = 0 of the energy of these transitions, one can attempt to perform a reliable measurement of the gap of the QW and of the exciton binding energy. The values obtained by this method are larger than the values measured by the 2s method. That may be due to the integrated structure of the spectra and to the uncertainties of the extrapolations towards B = 0. Another analysis by Ossau et al. (1986), magneto-photoluminescence experiments (Petrou et al., 1986), a careful analysis of oscillator strengths (Ancilotto et al., 1987), and the observation of the diamagnetic shift of the exciton give a lower value of the binding energy of the exciton, in agreement with the 2s measurements and the most sophisticated calculations.
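With a binding energy of the order of 9 to 12 meV, the room-temperature ionization balance discussed in part ii can be checked numerically. The sketch below is a hedged illustration: it assumes the standard two-dimensional Saha form $N_e N_h / N_X = (\mu k_B T / 2\pi\hbar^2) \exp(-E_X / k_B T)$ with $N_e = N_h$ (this prefactor, with degeneracy factors omitted, is an assumption and is not taken from the text), and it uses hypothetical values for the in-plane reduced mass (0.04 free-electron masses), the binding energy (9 meV) and the exciton radius (100 Å).

```python
import math

K_B = 0.0861733              # Boltzmann constant, meV / K
HBAR2_OVER_M0 = 7.61996e-13  # hbar^2 / m0 in meV * cm^2


def saha_constant(T, e_x_meV, mu_rel):
    """Equilibrium constant N_e * N_h / N_X of the assumed 2D law, in cm^-2."""
    kT = K_B * T
    prefactor = mu_rel * kT / (2.0 * math.pi * HBAR2_OVER_M0)
    return prefactor * math.exp(-e_x_meV / kT)


def ionization_ratio(n_total, T=300.0, e_x_meV=9.0, mu_rel=0.04):
    """N_e / (N_e + N_X) for a total pair density n_total (cm^-2), N_e = N_h."""
    K = saha_constant(T, e_x_meV, mu_rel)
    # N_e^2 / (n_total - N_e) = K  ->  positive root of the quadratic.
    n_e = 0.5 * (-K + math.sqrt(K * K + 4.0 * K * n_total))
    return n_e / n_total


a_x_cm = 100e-8                              # assumed exciton radius, 100 angstrom
n_max = 1.0 / (math.pi * a_x_cm**2)          # close-packed excitons, about 3e11 cm^-2
ratio_at_n_max = ionization_ratio(n_max)
ratio_dilute = ionization_ratio(1e10)
```

Under these assumptions the ionization ratio is close to one half at the close-packing density and rises above 0.9 at a pair density of 10^10 cm^-2, in line with the statement that most excitons are ionized at low excitation level.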
iv. Width of the Excitonic Luminescence Line and Interface Defects. The width of the excitonic line in absorption (or excitation spectroscopy), at low temperature, has been related to interface defects (Weisbuch et al., 1981). These interface defects must be of the order of one (or a few) atomic layers in depth and larger than the exciton diameter in the layer plane. A continuous distribution of defect sizes gives rise to a broadening of the absorption line roughly of the order of $\Delta E$, the variation of the confinement energy of the electrons induced by the variation of the well width. When the size of these defects is larger, a series of sharp lines corresponding to the various well widths is seen ($L_z$, the nominal thickness of the well, $L_z \pm a/2$, ..., where a is the lattice parameter), as shown in Fig. 59 (Deveaud et al., 1985). This behaviour, i.e. the observation of a set of sharp lines, can also be observed in the low temperature photoluminescence of QWs grown by M.B.E. either continuously (Deveaud et al., 1985; Reynolds et al., 1985) or with growth interruption (Hayakawa et al., 1986). But the lineshape of the excitonic luminescence depends not only on the shape and the distribution of shapes of the defects, but also on the diffusion and relaxation mechanisms. An experimental study has been performed in M.O.C.V.D. grown samples
[Figure 59: horizontal axis: energy (eV).]
FIG. 59. Photoluminescence excitation spectrum of a M.B.E.-grown multi-QW structure composed of L = 66 Å-thick wells. Various sharp peaks corresponding to areas where the well width is L, L + a/2 are observed. No excitation transfer is observed through the barriers in this sample. This demonstrates that L and L + a/2 zones do occur in the same well [from Deveaud et al. (1985)].
[Figure 60: spectra recorded at T = 2 K; horizontal axis: energy (eV).]
FIG. 60. 2 K excitation (solid line) and luminescence (dashed line) spectra of a 70 Å-thick moderate quality single GaAs QW, showing a 3 meV Stokes shift between the heavy-hole peak in excitation and in luminescence.
(Bastard et al., 1984a). It is worth noticing that a Stokes shift between the first absorption (or excitation) peak and the luminescence appears in moderate quality samples (see Fig. 60). This Stokes shift can be associated with the binding of free excitons (observed in absorption because of their high density of states) on interface defects. Following the approach explained in Sec. II, the lineshape of the luminescence, related to the size distribution of the defects, and a typical value of the Stokes shift can be found in reasonable agreement with experiment, in view of the lack of knowledge of the distribution of defects (see Fig. 61). In this comparison, it is supposed (i) that a free exciton has enough time to be trapped by the defects before recombination, and (ii) that a bound exciton does not have enough time to migrate from one defect to another defect of higher binding energy, i.e. that there is no thermalization of the bound exciton luminescence line. Following these assumptions, a calculation of the phonon-assisted trapping and hopping time provides an estimate of the density of defects in the range of 10" to 10" cm-2, for typical 300 Å-diameter defects. When the temperature is raised, the bound excitons are de-trapped and the Stokes shift between the free-exciton-related absorption and the bound-exciton-related luminescence decreases (Miller et al., 1984a; Delalande et al., 1985). Moreover, the temperature dependence of the position of the luminescence line can be calculated through a simple two-level model (Delalande et al., 1985). As shown in Fig. 62, the comparison with experiment gives a density of defects in the 10" cm-2 range, consistent with the density found in the calculation of the trapping and hopping times in the same structure. A more sophisticated theory (Takagahara, 1985) of energy transfer of the
FIG. 61. Calculated shape of the averaged trapped exciton density of states, taking the values of Bastard et al. (1984a). The energy origin is taken at the free exciton edge. [Horizontal axis: binding energy (meV).]
FIG. 62. Calculation, for different values of the trap density $N_D$, of the Stokes shift between the luminescence and first excitation peak as a function of temperature (solid lines). The observed Stokes shift in the 70 Å-thick sample of Fig. 60 is also presented (dashed line). After Delalande et al. (1985). [Horizontal axis: T (K).]
quasi-two-dimensional excitons in GaAs-AlAs QWs has also been developed to explain the observed (Masumoto et al., 1984) slow and non-exponential energy relaxation of excitons in terms of one-phonon-assisted transfer of localized excitons among island-like defects within the QW. In the best
quality samples, the excitation and luminescence widths are smaller and no Stokes shift is seen between excitation and luminescence. The models we have discussed are essentially isolated-defect models plus possible coupling between the defects for the transfer of energy (see also Singh et al. (1984), without thermalization analysis). In moderate quality M.B.E.-grown samples, it has been shown by Resonant Rayleigh Scattering (Hegarty et al., 1982) and transient grating experiments (Hegarty et al., 1984) that the motion of an exciton in these samples must be regarded as motion in a random potential due to well-width fluctuations. A mobility edge has been found, as predicted by percolation theory (Mott and Davis, 1979), the excitons being localized below this mobility edge and delocalized above it. When the temperature is raised (say above 100°K), the thermal LO phonons contribute to the broadening of the free exciton line in absorption. The temperature dependence of the width $\Gamma$ of the heavy-hole exciton peak has been well fitted by (Chemla et al., 1984)
$$\Gamma = \Gamma_0 + \frac{\Gamma_{ph}}{\exp(\hbar\omega_{LO}/kT) - 1}$$
where $\Gamma_0$ is the low-temperature inhomogeneous broadening due to interface defects (2 meV for the sample under consideration in Chemla et al. (1984)), and $\Gamma_{ph}$ (about 5.5 meV) represents the strength of the LO phonon-exciton interaction, which is multiplied by the density of thermal LO phonons.
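As a rough numerical illustration of this fit, the following Python sketch evaluates the linewidth with the two values quoted above ($\Gamma_0$ = 2 meV, $\Gamma_{ph}$ = 5.5 meV). The LO phonon energy of 36 meV is an assumed typical value for GaAs, not a number taken from the text, and the function name is purely illustrative.

```python
import math

K_B_MEV_PER_K = 8.617333e-2  # Boltzmann constant in meV/K


def exciton_linewidth(t_kelvin, gamma0=2.0, gamma_ph=5.5, lo_phonon_mev=36.0):
    """Return Gamma(T) in meV: gamma0 + gamma_ph * n_LO(T).

    n_LO is the Bose-Einstein occupation of the LO phonon mode.
    lo_phonon_mev = 36 meV is an assumed value for GaAs.
    """
    if t_kelvin <= 0:
        return gamma0  # no thermal phonons at zero temperature
    n_lo = 1.0 / math.expm1(lo_phonon_mev / (K_B_MEV_PER_K * t_kelvin))
    return gamma0 + gamma_ph * n_lo


for t in (10, 100, 200, 300):
    print(f"T = {t:3d} K  ->  Gamma = {exciton_linewidth(t):.2f} meV")
```

With these inputs the phonon term is negligible at helium temperature and nearly doubles the width at room temperature, which is the qualitative behavior described above.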
4. Extrinsic Processes

In high quality non-intentionally doped multi-QWs, the photoluminescence appears to be essentially intrinsic, in sharp contrast with the bulk case. This feature occurs both in M.B.E.-grown structures, which present a residual p-type doping $N_A - N_D \lesssim 10^{14}$ cm$^{-3}$ for GaAs (Weimann and Schlapp, 1986), and in M.O.C.V.D.-grown samples, which are residually n-type with $N_D - N_A$ of a few $10^{14}$ cm$^{-3}$ (Frijlink, 1986). This can be due to the trapping of excitons on interface defects, which prevents the excitation from reaching the impurity, or to the decrease of the exciton lifetime, which favors this de-excitation channel, or to the segregation of impurities in the first few wells of a multi-QW structure, as will be discussed below. The best way to characterize extrinsic luminescence is by its appearance in an intentionally doped structure. The binding energy $E$ of the impurity, in the case of e-A$^0$ or D$^0$-h luminescence, has to be measured with respect to the extrema of the bands. As absorption (and luminescence) exhibits excitonic lines, the binding energy $E_x$ of the exciton has to be taken into account, and the Stokes shift between the excitonic line and the impurity line is then $E - E_x$.
FIG. 63. Photoluminescence spectrum of a nominally undoped 50 Å-thick GaAs/Ga$_{0.85}$Al$_{0.15}$As quantum well for three different power densities. The low-energy impurity-related line saturates with increasing power density from a to b, whereas the integrated intensity of the high-energy excitonic line is found to be linear with the excitation power.
Another characteristic of extrinsic luminescence is the possibility of its saturation at high excitation level, as shown in Fig. 63. By selectively doping a few monolayers of a GaAs QW, it has been possible from photoluminescence and excitation spectroscopy (Miller, 1984) to measure the binding energy of the acceptors as a function of the well thickness and of the position of the impurity in the well (on-center and on-edge impurities). These measurements support the results of the theoretical models discussed in Sec. II, i.e. the increase of the binding energy when the well width decreases and when the impurity is located near the center of the QW. The well-width dependence of hydrogenic Si donor states located at the center of the well has also been investigated by Shanabrook and Comas (1984), including by Resonant Raman Scattering experiments (Perry et al., 1985). The first experimental observation of impurity-related features in photoluminescence spectra of non-intentionally doped M.B.E.-grown QWs was reported by Miller et al. (1982b, 1983). Depending on the concentration of residual impurities and on the shape of the investigated structure, the energy of the impurity peak indicates that the residual acceptor is randomly
FIG. 64. Calculated (full line) and experimental (dots, in nominally undoped QWs) dependence of the on-edge acceptor binding energy on the well thickness for an aluminum concentration of 0.15. After Meynadier et al. (1985a).
distributed throughout the well in multi-QW structures (in this case, the density of states effect provides a peak with an energy characteristic of on-center impurities), or that there is a segregation of impurities near the interface in the case of single or double QW structures (see Fig. 64). It has been suggested that interface roughness and impurity trapping are correlated. An explanation (Miller et al., 1983; Petroff et al., 1984) is that the impurity, probably carbon, is less soluble in GaAlAs than in GaAs and keeps floating at the GaAlAs/vacuum interface during growth. After the growth of a thick GaAlAs layer, a large amount of these impurities is trapped in the first few monolayers of GaAs and in the first few QWs of a multi-QW structure. Interface roughness may arise from the growth-inhibiting nature of the carbon floating at the GaAlAs surface. It has already been observed (Meynadier et al., 1985a; Weimann and Schlapp, 1986; Masselink et al., 1984; Fischer et al., 1984) that the inverted GaAs/Ga(Al)As interface (i.e. GaAs grown on top of GaAlAs) is improved and that the luminescence efficiency is enhanced when a few GaAs prelayers, acting as trapping layers, are grown before the GaAs QW. A simultaneous decrease of the luminescence efficiency and increase of the impurity-related recombination has also been recently observed in some QWs grown by M.B.E. with growth interruption (Bimberg et al., to be published). Conversely, optical spectroscopy appears to be a powerful tool for the determination of the impurity distribution near the QW edge. Supposing that there is no thermalization of the holes on the acceptors, the luminescence lineshape is then related to this distribution, through the dependence of the
FIG. 65. Experimental (solid line) and calculated (dashed line) electron-to-acceptor photoluminescence lineshape for an L = 50 Å-thick QW. The extension of the impurity distribution is found to be b = 6.5 Å in the Ga(Al)As barrier and W = 12 Å in the GaAs QW, and the maximum of the impurity concentration is found one monolayer inside the barrier. After Meynadier et al. (1985a). [Horizontal axis: energy (meV).]
impurity binding energy on its position in the QW. Such a determination has been performed by Meynadier et al. (1985a) in a set of M.B.E.-grown samples. The impurity distribution peaks at the GaAlAs/GaAs interface, with an extension of about 12 to 30 Å in the well and 6 to 8 Å in the Ga(Al)As barrier (see Fig. 65), this extension being wider when the QW is grown after a thicker Ga(Al)As layer. Other impurity-related lines have been reported in non-intentionally doped M.B.E.-grown samples: the D$^0$-h line (Lambert et al., 1982; Yu et al., 1985; Reynolds et al., 1984) and D$^0$-X recombination (Yu et al., 1985; Reynolds et al., 1984). A D$^+$-X mechanism has also been observed in a Si-doped sample (Miller and Gossard, 1983b). The M.O.C.V.D.-grown samples, perhaps because of the trapping of the excitation on interface defects, seem to exhibit fewer extrinsic recombination lines.
5. 2D Carrier Gas in Quantum Wells and Optical Spectroscopy

When an electron or a hole gas is present in the QW, the optical properties can be modified through many-body effects. Such a gas can be obtained, as in 3D systems, by directly doping the QW with acceptors or donors (Shanabrook and Comas, 1984). Another possibility is to use modulation doping and to place the impurities in the Ga(Al)As barrier at some distance
(the spacer layer thickness) from the QW. One then obtains a high mobility carrier gas in which it is possible to study the intrinsic optical properties of quasi-2D systems in the presence of a low temperature electron (or hole) gas. The first optical property of a modulation-doped QW (M.D.Q.W.) is the so-called Moss-Burstein shift (Pinczuk et al., 1984): because of the Pauli principle and the conservation of the in-plane wave-vector $k_\perp$, the onset of the absorption takes place at the in-plane Fermi wave-vector $k_F = (2\pi n_s)^{1/2}$ ($n_s$ is the 2D carrier density). Assuming quadratic dispersion relations for both the electrons and the holes, the onset of the absorption for an electron gas is shifted by $E_F(1 + m_e/m_h)$ above the 2D bandgap. For a hole gas, $m_e$ and $m_h$ must be exchanged in this formula. In this case, however, the absorption onset may arise at $k_\perp = 0$ on the light-hole subband if $E_F(1 + m_h/m_e)$ is larger than the difference between the light-hole and heavy-hole confinement energies (Fig. 66). The heavy- or light-hole character of the onset of absorption can be determined by polarization spectroscopy: the polarizations of the onset peak (light-hole character) and of the luminescence peak (heavy-hole character) are opposite (Miller and Kleinman, 1985) (Fig. 67). It is worth noting that this polarization spectroscopy is possible only in p-type modulation-doped structures, as the photocreated electrons can conserve a part of their polarization, while in n-type M.D.Q.W.'s the photocreated hole rapidly loses its spin. A second feature of M.D.Q.W.'s is the vanishing of the binding energy of the exciton, due to screening and occupancy effects, at least for carrier densities larger than a few $10^{11}$ cm$^{-2}$ (Kleinman, 1985, 1986). In fact, the luminescence
FIG. 66. Schematic in-plane dispersion relation in n-doped (left) and p-doped (right) QWs. In an n-doped QW, the Stokes shift between the onset of absorption $h\nu_{abs}$ and the photoluminescence is $E_F(1 + m_e/m_h)$. In a p-doped QW, the onset of absorption may involve the light-hole subband at $k_\perp = 0$, with a transition energy $h\nu_{abs}$ lower than the energy of the transition involving the heavy-hole subband at $k_\perp = k_F$.
FIG. 67. Photoluminescence excitation spectra (full line) and photoluminescence (dashed line) of a 100 Å-thick modulation-doped GaAs-Ga(Al)As QW with an areal hole concentration of $7 \times 10^{11}$ cm$^{-2}$: lower spectrum, without any polarization measurement; upper spectrum, in $\sigma^+$ configuration for luminescence, showing the enhancement of the heavy-hole transition; middle spectrum, in $\sigma^-$ configuration, showing the enhancement of the light-hole transition, especially the first excitation transition. The excitation polarization is $\sigma^+$. [Horizontal axis: energy (meV).]
involves band-to-band recombination between a degenerate electron (hole) Fermi gas and a photocreated Boltzmann hole (electron) population near $k_\perp = 0$. This has been clearly observed in M.D.Q.W.'s of various widths and mobilities (Pinczuk et al., 1984; Meynadier et al., 1986). In particular, the hole subband involved in the band-to-band luminescence is always the fundamental heavy-hole subband, even in p-type M.D.Q.W.'s (Miller and Kleinman, 1985). The growth of one-side M.D.Q.W.'s presents the advantage of a high electron mobility. The latter is due to the larger spatial separation achieved between the quasi-two-dimensional electron gas and the inverted interface (i.e. GaAs grown on GaAlAs) and to the absence of the deleterious Si segregation and/or in-well diffusion which apparently takes place when one attempts to dope on the inverted-interface side. In these samples, and with the help of a calculation of the conduction and valence dispersion curves, it has been shown that the Stokes shift between excitation and luminescence is in accordance
with a band-to-band absorption at $k_\perp = k_F$ and a band-to-band luminescence at $k_\perp = 0$ (see Fig. 68a). It should be stressed that the measurement of the carrier concentration by transport experiments must be performed under the same illumination conditions as the optical experiments, because of persistent photoconductivity effects (Stern, 1986). The fact that peaks, and not step-like structures, are observed in the excitation spectrum may arise from enhanced absorption due to unbound excitons (Kleinman, 1985, 1986) and/or from resonances in the relaxation mechanism. In the case of rather wide n-type M.D.Q.W.'s (about 500 Å) with more than one occupied subband, the recombination may occur between an excited (but populated) conduction subband and the fundamental hole subband because of a better overlap integral due to band bending effects (Meseguer et al., 1987). A third property of M.D.Q.W.'s is an apparent shrinkage of the bandgap which partly compensates the confinement effects. One part of this shrinkage is due to the curvature of the bottom of the well: electrons and holes are not located in the same place in the QW and the difference between the respective subband energies is lower (Fig. 68b). Nevertheless, a careful comparison between theory, in the Hartree approximation, and experiment shows that a shrinkage of the bandgap due to exchange and correlation mechanisms, the so-called Bandgap Renormalization, must be assumed (Pinczuk et al., 1984; Meynadier et al., 1986). This bandgap renormalization is of the order of 20 meV for a carrier gas with a density of a few $10^{11}$ cm$^{-2}$. Moreover, it is possible to control the carrier concentration by illuminating the sample with photons of energy larger than the Ga(Al)As bandgap (Chaves et al., 1986).
A simultaneous probing of the excitation spectrum and a comparison with the results of the Hartree calculation in the actual structure provide a measurement of the Bandgap Renormalization as a function of the carrier concentration (Delalande et al., 1986) (Fig. 69). A comparison between experimental results and theoretical ones (Bauer and Ando, 1985, 1986b; Sanders and Chang, 1985; Schmitt-Rink et al., 1984; Kleinman and Miller, 1985; Chang and Sanders, 1985) obtained under various assumptions can then be attempted (Delalande et al., 1986). Many-body effects can lead to an anomalous behavior of the polarization of optical spectra in M.D.Q.W.'s. A shake-up of free electrons in the conduction band by the electron-hole Coulomb interaction, in which optical processes occur with simultaneous creation or annihilation of single-particle or collective excitations of the Fermi sea, has been invoked (Sooryakumar et al., 1985; Chang and Sanders, 1985; Sooryakumar, 1986) to explain the polarization of in-plane photoluminescence. Similarly, the anomalous polarization of the n = 2 heavy-hole excitation transition may be explained by many-body effects (Ruckenstein et al., 1986).
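The Moss-Burstein shift $E_F(1 + m_e/m_h)$ introduced earlier in this subsection can be estimated with a few lines of Python. This is only a sketch under parabolic-band assumptions: the masses ($m_e = 0.067\,m_0$, in-plane $m_h = 0.4\,m_0$) and the density of $5 \times 10^{11}$ cm$^{-2}$ are illustrative values chosen here, not data from the experiments discussed, and the function names are hypothetical.

```python
import math

HBAR2_OVER_M0 = 7.62  # hbar^2 / m0 in eV * Angstrom^2 (approximate)


def fermi_wavevector(n_s_cm2):
    """In-plane Fermi wave-vector k_F = sqrt(2*pi*n_s), in 1/Angstrom."""
    n_s_per_a2 = n_s_cm2 * 1e-16  # convert cm^-2 to Angstrom^-2
    return math.sqrt(2.0 * math.pi * n_s_per_a2)


def fermi_energy_mev(n_s_cm2, mass_ratio):
    """Fermi energy hbar^2 k_F^2 / (2 m) in meV for a parabolic 2D band.

    mass_ratio is the carrier mass in units of the free electron mass.
    """
    k_f = fermi_wavevector(n_s_cm2)
    return 1e3 * HBAR2_OVER_M0 * k_f ** 2 / (2.0 * mass_ratio)


def moss_burstein_shift_mev(n_s_cm2, m_gas, m_other):
    """Absorption-onset shift E_F * (1 + m_gas / m_other) in meV.

    m_gas is the mass of the carriers forming the gas, m_other the mass
    of the photocreated minority carrier (exchange them for a hole gas).
    """
    return fermi_energy_mev(n_s_cm2, m_gas) * (1.0 + m_gas / m_other)


n_s = 5e11  # assumed electron density in cm^-2
print(f"k_F   = {fermi_wavevector(n_s):.4f} 1/Angstrom")
print(f"E_F   = {fermi_energy_mev(n_s, 0.067):.1f} meV")
print(f"shift = {moss_burstein_shift_mev(n_s, 0.067, 0.4):.1f} meV")
```

For a hole gas the same function is called with the two masses exchanged; because the valence band is strongly non-parabolic, the result is then only indicative.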
FIG. 68. (a) Upper part: aluminum concentration profile of one-side n-type modulation-doped QWs. Lower part: band-edge profiles in the vicinity of the doped well (left panel); schematic in-plane dispersion relations (right panel). (b) Photoluminescence (dashed line) and excitation spectrum (solid line) of a 150 Å-thick modulation-doped QW. The lower part of the figure indicates the calculated energies of the various subband-to-subband transitions accounting for a 19 meV bandgap renormalization. The transitions without asterisks occur at $k_\perp = 0$, the ones with asterisks at $k_\perp = k_F$.
FIG. 69. Line: dependence on the electron areal density $n_s$ of the band-to-band luminescence energy calculated in the Hartree approximation as described in Delalande et al. (1986). Crosses: experimental data obtained by optical control of $n_s$ in the 150 Å-thick QW. Points: experimental data obtained by voltage control. The difference between the Hartree calculation and the experimental data gives the bandgap renormalization. [Horizontal axis: $n_s$ ($10^{11}$ cm$^{-2}$).]
Because of the relaxation of electrons towards the lowest Landau level, the luminescence of undoped QWs involves only the lowest Landau level of the conduction band. On the contrary, in n-type M.D.Q.W.'s, the luminescence may occur from all the Landau levels located in the Fermi sea towards the populated hole levels, with various intensities depending on the strength of the transition. Provided that the separation between the hole Landau levels is small, several hole levels are populated and several recombination lines can be observed (Petrou et al., 1984). Furthermore, because the exciton has no bound state in M.D.Q.W.'s with sufficient carrier concentration, a comparison is possible between the energies of the subband-to-subband peaks of the luminescence and the theoretical values, which must take into account the complex structure of the in-plane valence band dispersion (Broido and Sham, 1985; Yang et al., 1985; Altarelli, 1983, 1985, 1986; Ando, 1985). By carrying out magnetoluminescence experiments and theoretical calculations on the same sample, Delalande et al. (1987) have recently performed this comparison in a one-side n-type M.D.Q.W. and have provided striking experimental
FIG. 70. (a) Photoluminescence spectrum of the modulation-doped sample under consideration in Fig. 68, obtained at B = 4.2 Tesla, T = 2°K through a circular ($\sigma$) polarization analyser. The structures near 1515 meV are attributed to the bulk GaAs buffer from excitation spectroscopy. (b) Comparison between the experimental (crosses and circles) and theoretical (curves) values of the interband magneto-optical transitions. For notations, see Delalande et al. (1987). [Horizontal axes: (a) energy (meV); (b) B (Tesla).]
evidence of the coupling between the valence subbands (Fig. 70). Several valence Landau levels are probed in these experiments, while in the case of cyclotron resonance and magneto-transport experiments in p-type modulation-doped heterojunctions (Stormer et al., 1983; Eisenstein et al., 1984), the valence subband is probed only close to the Fermi level. Indirectly, the broadness of the HH1 → LH1 resonant Raman line in a p-type M.D.Q.W. has been found to be likely due to the intricate nature of the valence band dispersion
already described in Sec. II (Pinczuk et al., 1986). Finally, although this is outside the scope of this review, it is worth noting that the presence of a predominant type of carrier provides an interesting system for the study of the relaxation of hot carriers in quasi-2D systems (Ryan et al., 1984; Hopfel et al., 1985). The properties of so-called ni-pi superlattices can be examined briefly in this part of the section. For a complete review, the reader is referred to a recent paper by Dohler (1986a). These superlattices, which have long been an alternative proposal to heterostructure superlattices, consist of a periodic structure of n- and p-doped semiconductor layers separated by intrinsic layers of the same semiconductor, GaAs for instance. Usually, the intrinsic layer is suppressed. The doping concentration is of the order of $10^{18}$ cm$^{-3}$ and the layer thicknesses are in the range of a few hundred Å. The n- and p-doped parts of the superlattice can be contacted by separate selective contacts in order to vary the magnitude of the oscillations of the band edges, which are sketched in Fig. 71. It is clear from the real space energy diagram that the band bending induces a dramatic decrease of the effective bandgap: the photoluminescence energy can be reduced by about 30 per cent from its uniform bulk value (Schmitt-Rink et al., 1984). Incidentally, the observation of this low-power photoluminescence shows that the "tunneling" radiative recombination, which involves a spatially separated electron and hole, is an effective recombination process with respect to non-radiative channels. The main optical property of the ni-pi structures is the tunability of the photoluminescence energy. By electrical or optical injection it is possible to accumulate the electrons and the holes in their respective potential valleys. The bands are then flattened and the effective bandgap is increased (Fig. 72).
The same effect can be obtained by applying an external voltage through selective contacts (Kunzel et al., 1982). This tunability is an example of the specific properties of these structures that may lead to interesting applications.
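The order of magnitude of the bandgap reduction in a ni-pi structure can be checked with a simple depletion estimate. The sketch below is not taken from the works cited above: it assumes fully depleted n and p layers of equal doping $N$ and equal thickness $d$ with no intrinsic layer, for which Poisson's equation gives a peak-to-valley band-edge modulation $e^2 N d^2 / (4\varepsilon)$. It neglects subband quantization and free carriers, and uses assumed GaAs values (relative permittivity 12.9, low-temperature gap 1.52 eV). All names are illustrative.

```python
E_OVER_EPS0 = 1.8095e-8  # e / epsilon_0 in V*m, so e^2 / epsilon_0 is in eV*m


def band_modulation_ev(n_dop_cm3, d_angstrom, eps_r=12.9):
    """Peak-to-valley band-edge modulation (eV) in the depletion picture.

    Assumes equal doping and thickness for the n and p layers, full
    depletion and no intrinsic layers: e^2 * N * d^2 / (4 * eps).
    """
    n_m3 = n_dop_cm3 * 1e6      # cm^-3 -> m^-3
    d_m = d_angstrom * 1e-10    # Angstrom -> m
    return E_OVER_EPS0 * n_m3 * d_m ** 2 / (4.0 * eps_r)


def effective_gap_ev(n_dop_cm3, d_angstrom, bulk_gap_ev=1.52, eps_r=12.9):
    """Effective (real-space indirect) gap: bulk gap minus the modulation."""
    return bulk_gap_ev - band_modulation_ev(n_dop_cm3, d_angstrom, eps_r)


modulation = band_modulation_ev(1e18, 350.0)
print(f"modulation    = {modulation:.2f} eV")
print(f"effective gap = {effective_gap_ev(1e18, 350.0):.2f} eV")
print(f"reduction     = {100.0 * modulation / 1.52:.0f} % of the bulk gap")
```

With a doping of $10^{18}$ cm$^{-3}$ and 350 Å layers, this crude estimate gives a reduction of a little under 30 per cent, the same order as the value quoted in the text. Because the modulation scales as $N d^2$, it is very sensitive to the layer thickness.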
FIG. 71. Schematic conduction and valence-band edges in a ni-pi-structure.
FIG. 72. Photoluminescence spectrum of a ni-pi structure at various excitation powers, from Dohler (1986a). [Horizontal axes: energy (eV) and wavelength (µm).]
6. Quantum Wells Under Magnetic or Electric Fields

Most of the results obtained from photoluminescence or absorption experiments carried out with a magnetic field parallel to the growth axis have been discussed previously, in relation to the problems of the valence-band dispersion relation and of the exciton binding energy. The Landau levels in a superlattice with a magnetic field parallel to the layers have also been studied by excitation spectroscopy (Belle et al., 1985). When the cyclotron-orbit radius is larger than the superlattice period, one obtains a tool for probing the dispersion relation in the growth direction. The optical properties of QWs under an electric field perpendicular to the well plane have been the subject of many experiments, which have been recently reviewed (Miller et al., 1986b). The electric field can be applied between a semi-transparent Schottky contact evaporated on the surface of the sample and a conductive part of the structure (substrate, doped buffer). The QW or multi-QW structure can also be placed in the undoped part of a p$^+$-i-n$^+$ junction. A decrease of the photoluminescence intensity when an electric field is applied has been observed by several authors both at low temperature (Mendez et al., 1982; Miller and Gossard, 1983a) and at room temperature (Yamanishi et al., 1985). An increase of the recombination lifetime has also been found (Polland et al., 1985b; Yamanishi et al., 1986; Kash et al., 1985) and can be explained in terms of the field-induced reduction in the overlap between the electron and hole wavefunctions. This increase of the recombination
FIG. 73. Energy of the excitonic peak as a function of applied external voltage, obtained by 2°K photoluminescence in a 5 × 160 Å-thick multi-QW structure sandwiched between a semi-transparent Schottky contact and a Si-doped substrate, from Vina et al. (1986). [Horizontal axis: voltage.]
lifetime relative to that of the non-radiative recombination processes, or a possible field-induced escape of the electron and hole out of the QW, explains the reduction of the luminescence efficiency. The band bending also induces a shift of the exciton-related transition towards low energy, without any ionization of the exciton, usually called the quantum-confined Stark effect (Polland et al., 1985b; Mendez et al., 1982; Vina et al., 1986; Iwamura et al., 1985) (Fig. 73). At fields up to $10^5$ V/cm, and when the hole is pushed towards the "inverted" QW interface containing defects and acceptors, a strong quenching of the exciton luminescence has been observed (Polland et al., 1985a). This is followed, at higher fields, by an efficient extraction of free carriers out of the QW, as demonstrated by the increase of the photocurrent measured in the same experiment. At room temperature, the quantum-confined Stark effect has been observed in the absorption spectrum of multi-QW structures (Miller et al., 1984d, 1985a) (Fig. 74). This property, and other non-linear properties associated with the excitonic nature of the absorption onset in 2D systems, have prompted many fundamental and device-related studies, including studies in the waveguide configuration (Miller et al., 1984d, 1985a, 1986b). Finally, the crossed transport and optical experiments under an in-plane electric field rapidly involve an ionization of the exciton and a high-field, hot-electron regime (Hopfel et al., 1985; Shah et al., 1984), and are outside the scope of this review.

7. Superlattices and Vertical Transport

When, in a multi-QW structure, the GaAlAs barriers are thin enough, a strong coupling occurs between successive QWs. The quantum states must be considered as delocalized over the whole superlattice and not localized in a single QW. The formation of the superlattice mode with decreasing barrier thickness has
FIG. 74. Room temperature absorption of a 65 × 96 Å multi-QW structure near the band edge for three static potentials applied in the growth z-direction, from Miller et al. (1984d, 1985a). [Horizontal axis: photon energy $\hbar\omega$ (eV).]
FIG. 75. Excitation (full line) and photoluminescence (dashed line) spectra of a two-coupled-well GaAs-Ga(Al)As sample at 2 and 40 K. The wells are 45 Å thick and the barrier is 12 Å thick. The transitions A and C (B and D) involve heavy (light) hole levels; the transitions A and B (C and D) occur between symmetric (antisymmetric) levels induced by the coupling through the 12 Å-thick barrier.
been studied in a double QW structure (Bastard et al., 1984b; Dingle et al., 1975) (Fig. 75): a succession of symmetric and antisymmetric states is observed in the case of a symmetric two-QW structure. Levels in accordance with theoretical calculations have also been observed in GaAs-GaAlAs superlattice structures (Chomette et al., 1985). In a two-QW structure, the GaAlAs barrier can be thick enough to avoid any significant coupling (i.e. the splitting $\Delta E$ of the eigenstates due to the coupling is very small with respect to $E$, the confinement energy), but thin enough to allow a tunneling of the excitation from the thinner well, which has the larger confinement energy, to the wider one, which contains the fundamental level (i.e. $\Delta E$ large with respect to $\hbar/\tau$, where $\tau$ is the recombination lifetime). One then obtains the simplest scheme of vertical transport (Fig. 76). When a thin enough barrier separates a single QW from the surface
FIG. 76. Upper curve: low-temperature photoluminescence spectrum of two QWs, 50 Å thick (dashed line) and 100 Å thick (full line), separated by a 200 Å-thick Ga(Al)As barrier. Middle curve: excitation spectrum of the 50 Å QW. Lower curve: excitation spectrum of the 100 Å QW, showing the peaks involving the levels of the 100 Å-thick QW and those involving the levels of the 50 Å-thick QW. The latter peaks are due to a tunneling of the excitation from the 50 Å QW to the 100 Å QW. [Horizontal axis: energy (meV).]
of the sample, suitably activated to negative electron affinity, the electrons can be photoemitted out of the structure by tunneling (Houdre et al., 1985). By selectively exciting a heavy-hole or light-hole transition, this system could lead to the production of highly polarized electron beams, owing to the lifting of the degeneracy of heavy and light holes. In the same spirit, vertical transport of the excitation through a tailored structure, in particular resonant tunneling through double barriers, leads to numerous device applications reviewed recently by Capasso et al. (1986). Impurities or enlarged wells localize the excitation in a superlattice structure. By studying the photoluminescence of an enlarged well either by steady-state (Chomette et al., 1985) (Fig. 77) or time-resolved (Deveaud et al., 1986) experiments, the mean transfer time through 9000 Å of superlattice was found to be about 4 ns in a 40/40 Å superlattice and 880 ps in a 30/30 Å superlattice, which is evidence of the rather high mobility in the growth direction in small-period superlattices. An intentionally increased disorder in the well width has also been shown to localize the excitation in the wider wells, below a mobility edge, in accordance with localization theory (Chomette et al., 1986).
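The transfer times quoted above can be turned into rough transport figures. The following sketch converts the 9000 Å path and the two measured times into a mean drift-like velocity and, under the additional assumption of one-dimensional diffusion with $L^2 = 2Dt$ (an assumption made here for illustration, not the analysis of the cited authors), into an effective diffusion coefficient.

```python
def mean_velocity_cm_s(length_angstrom, time_s):
    """Average velocity L / t in cm/s."""
    return length_angstrom * 1e-8 / time_s


def diffusion_coefficient_cm2_s(length_angstrom, time_s):
    """Effective 1D diffusion coefficient D = L^2 / (2 t) in cm^2/s.

    The relation L^2 = 2 D t is an assumed model of the transport.
    """
    length_cm = length_angstrom * 1e-8
    return length_cm ** 2 / (2.0 * time_s)


PATH_ANGSTROM = 9000.0
for label, t in (("40/40", 4e-9), ("30/30", 880e-12)):
    v = mean_velocity_cm_s(PATH_ANGSTROM, t)
    d = diffusion_coefficient_cm2_s(PATH_ANGSTROM, t)
    print(f"{label} superlattice: v = {v:.2e} cm/s, D = {d:.2f} cm^2/s")
```

In this picture the vertical transport in the 30/30 Å superlattice is about 4.5 times faster than in the 40/40 Å one, in line with the stronger inter-well coupling expected for thinner barriers.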
FIG. 77. Photoluminescence spectra of 30/30 Å, 40/40 Å and 70/70 Å GaAs/Ga(Al)As superlattices where an enlarged well W is grown at a 9000 Å distance from the surface of the sample. The relative intensity of the superlattice and enlarged-well luminescences indicates the efficiency of the vertical transport through the superlattices, from Chomette et al. (1985). [Horizontal axes: energy (eV) and wavelength (µm).]
It is possible to grow superlattices with very short periods (a few monolayers thick). By replacing the AlGaAs ternary barriers by AlAs layers, the alloy disorder is suppressed and one may obtain structures of better crystallographic quality (Sakaki et al., 1985b; Ishibashi et al., 1985). The name "super-alloys" is sometimes given to such structures. For superlattices with AlAs to GaAs thickness ratios significantly less than one, the lowest energy transition is direct in character: the GaAs well is wide enough and the AlAs barrier thin enough to avoid a large confinement of the conduction state and/or to permit a substantial penetration of the wavefunction into the barrier, and the fundamental transition is associated with the $\Gamma$ state. For superlattices with larger AlAs to GaAs thickness ratios, the $\Gamma$ conduction state may be higher than the X state in the AlAs barrier, the $\Gamma$ heavy-hole state remaining the fundamental level for holes. One then obtains a fundamental transition which is indirect both in real space (spatial separation of holes and electrons) and in reciprocal space ($\Gamma$ and X valleys). The shift between the direct absorption onset and the indirect photoluminescence, the decrease of the radiative efficiency, and the increase of the recombination lifetime up to a few µs support this interpretation (Danan et al., 1987; Danan, 1988; Tamargo et al., private communication). These problems, vertical transport and short-period superlattices, are among those which will contribute to the development of the study of optical properties in the GaAs-Ga(Al)As system and which will also be examined in the future in other III-V systems.

B. Other Unstrained III-V Systems
The lattice parameter of unstrained III-V heterostructures is dictated by the lattice parameter of existing high-quality substrate materials. Figure 78 shows various possibilities: the GaAs-AlAs and GaAs-Ga$_{1-x}$Al$_x$As heterostructures have been studied in the previous section. The ternary alloy In$_{0.485}$Ga$_{0.515}$P is lattice matched to GaAs and can be grown by the M.O.C.V.D. technique (Kuo et al., 1985; Andre et al., 1986; Hino and Suzuki, 1984). This system is of interest for its potential short-wavelength (visible) optical applications, provided that the growth of the large-gap AlGaInP quaternary alloy and the realization of abrupt ternary-quaternary interfaces can be controlled in the future. In the same way, high quality QWs based on ternary and quaternary compounds containing Al, Ga, As and Sb and lattice matched to InP have not yet been achieved. On the contrary, the ternary alloy In$_{1-x}$Ga$_x$As, lattice matched to InP for x = 0.467, is the subject of intense study for a variety of opto-electronic applications. In$_{1-x}$Ga$_x$As-InP QWs permit emission to be obtained in the wavelength range 1.1 to 1.6 µm. This wavelength range covers the regions of low absorption and low dispersion in optical-fibre communication systems. The
FIG. 78. Room-temperature energy gap and lattice constant (in Å) for various binary and ternary III-V alloys.
InP barriers can be replaced by In$_{0.52}$Al$_{0.48}$As, especially in M.B.E.-grown structures, as M.B.E. growth is difficult to achieve with phosphorus. The properties of these two systems, In$_{0.53}$Ga$_{0.47}$As-InP and In$_{0.53}$Ga$_{0.47}$As-In$_{0.52}$Al$_{0.48}$As, will be reviewed in this section. Finally, a short study of the type II InAs-GaSb system will be carried out.

1. InP-In$_{0.53}$Ga$_{0.47}$As

Following Goetz et al. (1983), the 2 K bandgap value of the In$_{1-x}$Ga$_x$As ternary alloy at optimum lattice match (x = 0.468) is 811 meV. The bulk exciton binding energy is about 2 meV. A new difficulty in the growth of InGaAs QWs is the control of the gallium concentration. The dependence of the bandgap (in eV) on the concentration is (Goetz et al., 1983):

$$E_g(x) = 0.4105 + 0.6337\,x + 0.475\,x^2$$
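As a rough numerical check of this fit, the short Python sketch below evaluates the bandgap and its slope at the lattice-matched composition. The function names are hypothetical, and a "1% variation of x" is assumed here to mean an absolute change of 0.01 in x.

```python
def bandgap_ev(x):
    """Low-temperature gap of In(1-x)Ga(x)As in eV, quadratic fit quoted in the text."""
    return 0.4105 + 0.6337 * x + 0.475 * x ** 2


def bandgap_slope_ev(x):
    """Analytic derivative dE_g/dx of the quadratic fit, in eV per unit x."""
    return 0.6337 + 2.0 * 0.475 * x


x_match = 0.468                       # lattice-matched gallium fraction from the text
gap_mev = 1000.0 * bandgap_ev(x_match)
# Assumption: "1% variation of x" is taken as an absolute change dx = 0.01.
shift_mev = 1000.0 * bandgap_slope_ev(x_match) * 0.01

print(f"E_g({x_match}) = {gap_mev:.1f} meV")
print(f"shift for dx = 0.01: {shift_mev:.1f} meV")
```

Under these assumptions the sketch gives a gap close to the quoted 811 meV and a shift of roughly 11 meV per 0.01 change in x.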
Thus a 1% variation of x induces an 11 meV variation of $E_g(x)$, which is of the same order of magnitude as the energy details discussed previously for the optical properties of QWs. The lack of a measurement of x in a single QW (such a measurement is possible in MQWs by X-ray double diffraction), the difficulty of knowing the well width, and the possible existence of rather smooth interfaces make a reliable understanding of the optical properties of InGaAs-InP QWs difficult. Nevertheless, the study of QWs grown by Metal-Organic Chemical Vapor Deposition at low or atmospheric pressure (Skolnick et al., 1986; Razeghi et al., 1983; Kuo et al., 1985b; Miller et al., 1986a; Moroni et al., 1987), by
Molecular Beam Epitaxy (Marsh et al., 1985), by chloride transport vapor phase epitaxy (Di Giuseppe et al., 1983; Kodoma et al., 1983), and by chemical epitaxy (Tsang and Schubert, 1986) has provided low-temperature emissions at energies as large as 1.100 eV in QWs whose nominal thickness was in the 10 Å range, with linewidths in the range of 3 to 20 meV, depending on the quality of the sample and on the well width (Fig. 79). Excitonic peaks corresponding to various subband-to-subband transitions have been observed in 2 K and room-temperature multi-QW absorption spectra (Temkin et al., 1985) (the etching of the large-gap (1410 meV) InP substrate is not necessary in this system), in 2 K photoluminescence excitation spectra (Skolnick et al., 1986; Tsang and Schubert, 1986), and in photoconductivity measurements (Skolnick et al., 1986) (Fig. 80). Because the intersubband transitions in a simple QW are rather insensitive to the conduction- to valence-band discontinuity ratio $Q_c$, it is difficult to determine $Q_c$ precisely from measurements of these transition energies, and a fortiori from the measurement of the photoluminescence energy alone. The reported results include a 0.5/0.5 ratio (Temkin et al., 1985), a 0.35/0.65 ratio (Kuo et al., 1985b), and a 0.4/0.6 ratio (Skolnick et al., 1986; Andre et al., 1986). The 220-240 meV conduction-band offset corresponding to this last ratio was also obtained from the observation of a threshold in the photoconductivity spectrum under a forward bias perpendicular to the QW plane, this threshold being interpreted as a transition from a valence-band well level to the barrier conduction-band continuum (Skolnick et al., 1986). The nature of the low-temperature photoluminescence is difficult to determine and may vary with sample quality. The Stokes shift between the first absorption peak and the photoluminescence peak amounts to 7.9 meV in an 80 Å-thick QW (Temkin et al., 1985) and to 12.5 meV in a 110 Å-thick QW (Skolnick et al., 1986).
These Stokes shifts are consistent (although this is not a proof) with a trapping of excitons on defects: clusters of lower x concentration, interface defects (see Fig. 81), or donors in these n-type samples. The problem is complicated by the persistent photoconductivity effect, which may generate free carriers in the QW under illumination (Skolnick et al., 1986). A measurement of the carrier concentration under illumination is then necessary. By thermally modulating the photoluminescence of GaInAs-InP QWs, the two-component nature of the photoluminescence has been evidenced (Gal et al., 1986), and attributed by the authors to the excitonic and band-to-band nature of the photoluminescence line. When the temperature is raised, an increase of the photoluminescence energy by a few meV has been observed and related by other authors to a de-trapping of excitons (Skolnick et al., 1986; Moroni et al., 1987); a simultaneous increase of the linewidth may arise, as well as the appearance of a low-energy impurity-related line. This shows the difficulty of drawing definite conclusions from photoluminescence experiments. Nevertheless, the minor role of alloy broadening with respect to
FIG. 79. (a) Low-temperature photoluminescence of four InP-In$_{0.53}$Ga$_{0.47}$As QWs with thicknesses of 18, 26, 80 and 180 Å, grown by M.O.C.V.D. [from Moroni et al., 1987]. (b) Low-temperature photoluminescence energy as a function of well width (in Å) for InP/In$_{0.53}$Ga$_{0.47}$As. The curve indicates the band-to-band $E_1$-$H_1$ transition following the model of Section II, using the following parameters: spin-orbit splitting $\Delta$(InP) = 110 meV, $\Delta$(In$_{0.53}$Ga$_{0.47}$As) = 360 meV; electron effective mass $m_e$(InP) = $m_e$(In$_{0.53}$Ga$_{0.47}$As) = 0.041 $m_0$; heavy-hole effective mass $m_{hh}$(InP) = 0.56 $m_0$, $m_{hh}$(InGaAs) = 0.5 $m_0$; bandgap $E_g$(InP) = 1420 meV, $E_g$(In$_{0.53}$Ga$_{0.47}$As) = 811 meV; conduction-band discontinuity $\Delta E_c$ = 220 meV.
FIG. 80. Photoluminescence excitation spectrum of a 75 Å-thick InP-In$_{0.53}$Ga$_{0.47}$As QW, plotted against photon energy (eV) and photon wavelength (μm), from Razeghi et al., 1984.

FIG. 81. Binding energy of the exciton on a defect of radius 300 Å and of depth one or two monolayers as a function of well width (in Å) in InP/In$_{0.53}$Ga$_{0.47}$As (Gerbier, private communication).
other broadening mechanisms (Tsang and Schubert, 1986) and the band-to-band nature of the photoluminescence above 100 K (Moroni et al., 1987) seem to be clearly proved.

2. In$_{0.53}$Ga$_{0.47}$As-In$_{0.52}$Al$_{0.48}$As
The In$_{0.53}$Ga$_{0.47}$As-In$_{0.52}$Al$_{0.48}$As system lattice matched to InP substrates is an attractive material because the large In$_{0.52}$Al$_{0.48}$As bandgap (1470 meV at 2 K) and the large conduction-band discontinuity (about
FIG. 82. The sum of the confinement energies of the nth heavy-hole state $HH_n$ and the nth electron bound state $E_n$ is plotted versus the Ga$_{0.47}$In$_{0.53}$As slab thickness L (in Å) in Ga$_{0.47}$In$_{0.53}$As/Al$_{0.48}$In$_{0.52}$As QWs. The parameters entering the calculations are $E_g$(InGaAs) = 750 meV (810 meV) at 300 K (77 K), $E_g$(InAlAs) = 1470 meV; $\Delta E_c$ = 440 meV; $\Delta$(InGaAs) = $\Delta$(InAlAs) = 360 meV; $m_e$ = 0.038 $m_0$; $m_{hh}$ = 0.465 $m_0$. The experimental results are taken from Temkin et al. (1983) (crosses), Alavi et al. (1983) (triangles), Welch et al. (1983) (circles) and Stolz et al. (1985) (full circles).
500 meV, measured from C/V profiling (People et al., 1983)) can induce quantum confinement effects more pronounced than in the In$_{0.53}$Ga$_{0.47}$As-InP system. Emission energies as large as 1280 meV have actually been obtained in M.B.E.-grown samples, as shown in Fig. 82. Here also, the intrinsic or extrinsic nature of the photoluminescence is not clearly determined, the lines being rather broad. The problem may also be complicated by the presence of carriers in the QWs, in modulation-doped samples (Penna et al., 1985a) or perhaps in unintentionally doped structures (Welch et al., 1985). A definite conclusion about $Q_c = \Delta E_c/\Delta E_g$ (the curves in Fig. 82 are obtained for $\Delta E_c$ = 440 meV) is difficult to deduce, even from the assignment of excitonic peaks obtained in the absorption spectra of multi-QW structures at low temperature (Fig. 83) or at room temperature (Weiner
FIG. 83. Absorption spectrum of a Ga$_{0.47}$In$_{0.53}$As/Al$_{0.48}$In$_{0.52}$As multiple quantum well at 77 K. Courtesy J. Y. Marzin.
et al., 1985; Kawamura et al., 1985), as well as by electroreflectance (Goldstein et al., 1985) or by photothermal deflection spectroscopy (Penna et al., 1985b).
3. InAs-GaSb

InAs-GaSb is an example of a type II superlattice, where electrons and holes are not located in the same host material. Electrons are located in the InAs layers, holes in the GaSb layers. In fact, the interpretation of the optical absorption edges in a series of InAs-GaSb superlattices indicated that the conduction band of InAs overlaps the valence band of GaSb by (150 ± 50) meV (see Fig. 44) (Sai-Halasz et al., 1978; Chang et al., 1981). Nevertheless, we shall, in the following, restrict ourselves to the short-period regime (superlattice period < 150 Å) where, due to confinement effects, the semiconductor nature of the superlattice is clearly established. The absorption coefficient is found to be typically 5 times smaller than that of an equivalent type I superlattice. The small overlap of the electron and hole wavefunctions also weakens the Coulomb interaction, which explains the absence of excitonic features. In spite of the reduced optical matrix element, semiconductor InAs-GaSb superlattices luminesce (Voisin et al., 1981). The photoluminescence consists of a single line with a low-energy tail which tends to saturate when the excitation level is increased. The low-energy tail can be attributed to recombination processes involving shallow defects, which may be thickness fluctuations or acceptors in the GaSb layers. The energy position, close to the calculated bandgap, and both the temperature and excitation dependence of the
FIG. 84. Experimental (solid line; T ≈ 1.8 K, Kr$^+$ laser, 80 mW) and theoretical (dotted line) low-temperature band-to-band luminescence lineshape, as a function of photon energy (meV), in a 30-50 Å InAs-GaSb superlattice. The effective carrier temperature is 43 K. After P. Voisin (1983).

FIG. 85. Experimental (dashed and dotted line) and theoretical (solid line) room-temperature lineshape, as a function of photon energy (meV), in a 30-50 Å InAs-GaSb superlattice. The fit is actually sensitive to the width $\Delta E_1$ of the first conduction subband. After P. Voisin (1983).
spectrum support the interpretation of the main line in terms of band-to-band recombination. A fit of the lineshape corroborates this interpretation (Voisin, 1983, Thèse de Doctorat) (Fig. 84). The luminescence signal is still observed at room temperature, as shown in Fig. 85. At this temperature, the electron distribution thermally occupies a large part of the Brillouin zone in the growth (z) direction, and the lineshape reflects the $k_z$ dependence of the optical matrix element explained in Sec. III, which blurs out the Van Hove discontinuity of the density of states at $E_1 + \Delta E_1$, where $\Delta E_1$ is the width of the electron state due to the coupling between the wells.

V. STRAINED LAYER SYSTEMS

Most of the theoretical considerations on the energy levels in semiconductor heterostructures developed in Part II apply exclusively to systems grown out of lattice-matched materials. Unfortunately, the list of the lattice parameters of III-V compounds given in Table IV shows at first glance that this perfect lattice-matching condition can never be fulfilled with all-binary systems. Instead, the lattice mismatch is always sizeable, ranging for instance from 0.13% (GaAs-AlAs) and 0.6% (InAs-GaSb and GaSb-AlSb) to 3.5% (InP-InAs and InP-GaAs) and 7% (InAs-GaAs and GaSb-GaAs). Some ternary and quaternary alloys can be perfectly lattice matched to InP (In$_{0.53}$Ga$_{0.47}$As) or closely lattice matched to GaAs (Al$_x$Ga$_{1-x}$As, x = 0.3) but, clearly, perfect lattice matching is a highly restrictive condition. Fortunately, it has turned out to be unnecessary for the realization of heterostructures presenting a perfect crystalline integrity, as long as the layers under consideration are thin enough for the lattice mismatch to be accommodated by built-in biaxial elastic strain. The presence of built-in strains affects the structural aspects of the materials, and raises important questions concerning the (mechanical and thermodynamical) stability of the heterostructures.
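The mismatch figures quoted above can be checked against the 300 K lattice parameters of Table IV. The Python sketch below is only illustrative: the helper name is hypothetical, and the mismatch is assumed to be normalized by the smaller of the two lattice parameters (the text does not state which normalization is used).

```python
# 300 K lattice parameters in angstroms, copied from Table IV.
LATTICE_PARAMETER = {
    "AlAs": 5.6605, "AlSb": 6.1355, "GaAs": 5.6533,
    "GaSb": 6.0959, "InP": 5.8686, "InAs": 6.0584,
}


def mismatch_percent(material_1, material_2):
    """Relative lattice mismatch in percent.

    Assumption: normalized by the smaller lattice parameter of the pair.
    """
    a_1 = LATTICE_PARAMETER[material_1]
    a_2 = LATTICE_PARAMETER[material_2]
    return 100.0 * abs(a_1 - a_2) / min(a_1, a_2)


for pair in [("GaAs", "AlAs"), ("InAs", "GaSb"), ("GaSb", "AlSb"),
             ("InP", "InAs"), ("InP", "GaAs"), ("InAs", "GaAs")]:
    print(f"{pair[0]}-{pair[1]}: {mismatch_percent(*pair):.2f} %")
```

With this normalization the values fall close to the rounded figures given in the text (about 0.13% for GaAs-AlAs and about 7% for InAs-GaAs).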
Built-in strain also influences the electronic properties of the heterostructures through the strain-induced changes of the band structure of the host materials. In this review, we discuss briefly the structural properties of heterolayers involving lattice-mismatched materials (V.A). Then we examine how their band structure can be obtained within the envelope function approach (V.B), and we finally discuss the most salient results obtained in systems such as In$_x$Ga$_{1-x}$As-GaAs, GaSb-AlSb, In$_x$Al$_{1-x}$As-GaAs and InAs-GaAs (V.C).

TABLE IV
LATTICE PARAMETERS (IN Å) OF THE MAIN III-V SEMICONDUCTORS AT 300 K

AlP     AlAs    AlSb    GaP     GaAs    GaSb    InP     InAs    InSb
5.4510  5.6605  6.1355  5.4512  5.6533  6.0959  5.8686  6.0584  6.4794

A. Structural Aspects
Consider two perfect crystals having the same structure but different lattice parameters $a_A$ and $a_B$, and try to connect them along a plane interface. Two limiting situations are easily imagined: (i) exert biaxial stresses of opposite signs on materials A and B until they have the same in-plane lattice parameter $a_\perp^A = a_\perp^B = a_\perp$, and then let the atoms establish their chemical bonds across the interface. This is the strained-layer regime, characterized by a perfect or coherent interface, and an excess of energy equal to the elastic energy stored in the A and B layers; (ii) keep the unperturbed lattices, and let the atoms establish as many chemical bonds as possible across the interface. This is the relaxed-layer regime, characterized by an areal density of dangling bonds equal to the difference of the areal densities of atoms on each side of the interface, $n_s \simeq (1/a_A)^2 - (1/a_B)^2$. This number would be as large as 2.6 × 10$^{11}$ cm$^{-2}$ for the GaAs-Al$_{0.3}$Ga$_{0.7}$As interface! To be specific, consider a layer of material B of thickness $d_B$, grown on a semi-infinite substrate of material A. In the strained-layer regime, the excess of energy is proportional to the epilayer thickness $d_B$, while in the relaxed-layer regime, this excess of energy is essentially a property of the interface, and it does not depend on the epilayer thickness. Thus the two regimes will cross for some critical layer thickness $d_c$. This, however, is an oversimplified view, as the point defects in the relaxed-layer regime rearrange in a network of dislocations which have long-range (logarithmic) strain fields. The important problem of calculating $d_c$ is still a matter of investigation: there are several models based on approximate calculations of the thermodynamical or mechanical equilibrium (Van der Merwe, 1972; Matthews and Blakeslee, 1974), but they seem to predict values of $d_c$ definitely smaller than those found experimentally (People and Bean, 1985; Kasper, 1986).
Furthermore, it was recently found, in the case of the Ge$_x$Si$_{1-x}$-Si system, that $d_c$ depends on the growth temperature (Kasper, 1986), which tends to indicate that the actual critical layer thicknesses are dominated by kinetic or thermally activated processes. We discuss in the following one possible thermodynamical approach.

1. Thermodynamical Approach to the Critical Layer Thickness
The geometry of misfit dislocations in III-V heterostructures grown along the (001) axis has been studied in detail in the pioneering work of Matthews and Blakeslee (1974). These dislocations were found to be straight lines lying along the ⟨110⟩ directions of the interface plane, with a Burgers vector b parallel to the ⟨101⟩ or ⟨011⟩ directions, $|b| = a/\sqrt{2} \simeq 4$ Å. Such a dislocation contributes to the accommodation of the lattice mismatch in the (001) plane by $a/(2\sqrt{2}L)$, if L is the length of the sample perpendicular to the dislocation line. One can find in textbooks that in a semi-infinite homogeneous material, a dislocation parallel to the free surface is attracted towards this surface with a force per unit length (Landau and Lifchitz, 1967; Nabarro, 1967; Hirth and Lothe, 1968):

$$F = \frac{\mu b^2}{4\pi K d} \qquad (151)$$

where $1/K = \cos^2\psi + \sin^2\psi/(1-\nu)$, $\psi$ being the angle between the dislocation line and its Burgers vector, $\mu = 1/S_{44}$ is the shear modulus, $\nu = -S_{12}/S_{11}$ is Poisson's ratio (the $S_{ij}$'s are the elastic compliance constants), and d is the distance to the free surface. We immediately derive the energy per unit length $E_d$ associated with the creation of such a dislocation:

$$E_d = \frac{\mu b^2}{4\pi K}\left(\ln\frac{d}{b} + \theta\right) \qquad (152)$$

where the constant $\theta = 1$ stands for the core energy. We now consider a rectangular sample consisting of an $L_1 \times L_2$ substrate with a large thickness $d_s$, on top of which is an epilayer of a lattice-mismatched material of thickness d, as sketched in Fig. 86. The accommodation of the lattice mismatch $\delta a/a$ in the direction parallel to $L_{1(2)}$ is supposed to be shared between a homogeneous strain $\varepsilon_{1(2)}$ and a network of $n_{2(1)}$ dislocations parallel to $L_{2(1)}$:

$$|\delta a/a - \varepsilon_1| = \frac{n_2\,a}{2\sqrt{2}\,L_1}; \qquad |\delta a/a - \varepsilon_2| = \frac{n_1\,a}{2\sqrt{2}\,L_2} \qquad (153)$$

FIG. 86. Schematic representation of the square network of dislocations at the interface between a semi-infinite substrate and a partly relaxed epilayer.
As a consequence of Colonetti's theorem (Landau and Lifchitz, 1967; Nabarro, 1967; Hirth and Lothe, 1968), there is no interference term, in the expression of the total elastic energy, between the dislocation strain field and the homogeneous strain. If we neglect the interaction between dislocations, the excess of energy E associated with this configuration of the epilayer is the sum of the homogeneous elastic energy $E_{el}$ and of the dislocation energy $(n_1 L_1 + n_2 L_2)E_d$, where:

$$E_{el} = \frac{L_1 L_2\, d\, \mu}{1-\nu}\left(\varepsilon_1^2 + \varepsilon_2^2 + 2\nu\,\varepsilon_1\varepsilon_2\right) \qquad (154)$$
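Writing that $\partial E/\partial\varepsilon_1 = \partial E/\partial\varepsilon_2 = 0$, and setting in these equations $\varepsilon_1 = \varepsilon_2 = \delta a/a$, then gives the critical layer thickness at which the first misfit dislocation will be generated:

$$d_c = \frac{b\,(1-\nu/4)}{4\pi(1+\nu)\,(\delta a/a)}\left(\ln\frac{d_c}{b} + \theta\right) \qquad (155)$$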
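Because $d_c$ appears on both sides of Eq. 155, it has to be solved numerically. The Python sketch below uses a plain fixed-point iteration; the function name, the starting guess and the value ν = 0.31 are assumptions made for illustration (b = 4 Å and θ = 1 are taken from the text).

```python
import math


def critical_thickness(misfit, b=4.0, nu=0.31, theta=1.0,
                       tol=1e-9, max_iter=500):
    """Solve Eq. 155 for d_c (in angstroms) by fixed-point iteration.

    misfit is the relative lattice mismatch (0.01 for 1 %).
    nu = 0.31 is an assumed, typical Poisson ratio, not a value from the text.
    """
    prefactor = b * (1.0 - nu / 4.0) / (4.0 * math.pi * (1.0 + nu) * misfit)
    d = 100.0 * b  # start well above b so the iteration reaches the physical root
    for _ in range(max_iter):
        d_next = prefactor * (math.log(d / b) + theta)
        if d_next <= 0.0:
            raise ValueError("no physical solution for this misfit")
        if abs(d_next - d) < tol:
            return d_next
        d = d_next
    raise RuntimeError("fixed-point iteration did not converge")


for misfit in (0.005, 0.01, 0.02):
    d_c = critical_thickness(misfit)
    print(f"misfit {100 * misfit:.1f} %: d_c = {d_c:.0f} angstroms")
```

With these assumed parameters, a 1% misfit gives a critical thickness of the order of 90 Å, and the thickness decreases as the misfit grows.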
This expression is exactly one fourth of that deduced from the mechanical equilibrium model in the case of a superlattice where the lattice mismatch is equally shared by the two materials, as also discussed in Matthews and Blakeslee (1974, 1975, 1976). Surprisingly, the thermodynamical approach described here is thus equivalent to the mechanical model of Matthews and Blakeslee (1974, 1975, 1976). The critical layer thicknesses $d_c$ vs. the lattice mismatch $\delta a/a$ are plotted in Fig. 87. A representative figure is $d_c$ = 90 Å for $\delta a/a$ = 1%. It is clear that this calculation is not exact, because Eq. 151 applies to a semi-infinite homogeneous material, the approximation $\theta = 1$ is a rough estimate of the core energy, etc. However, these approximations are not likely to account for the discrepancy between the experimental data (People and Bean, 1985; Kasper, 1986) and the prediction of Eq. 155. In the model that we discuss here, the work required to bring the dislocation line from the surface to a distance $\alpha d$ ($\alpha \le 1$) is:

$$W(\alpha d) = \frac{b^2\mu}{4\pi K}\left(\ln\frac{\alpha d}{b} + \theta\right) - b\,\mu\,\varepsilon_\perp\,\alpha d\,\frac{1+\nu}{1-\nu} \qquad (156)$$
This quantity exactly vanishes for $\alpha = 0$ (dislocation outside the sample), and for $\alpha = 1$, at the thermodynamical equilibrium. In between, there is a potential barrier which has to be overcome. This accounts, at least qualitatively, for the thermally activated character of the experimental data. Note also that W depends explicitly on the elastic constant $\mu$, while the thermodynamical critical thickness $d_c$ does not.

2. Plastic Relaxation of the Misfit

An interesting consequence of the model of Sec. V.A.1 is the analysis of the way the plastic relaxation occurs for layer thicknesses larger than $d_c$.
FIG. 87. Thermodynamical critical layer thickness $d_c$ as a function of the lattice mismatch $\delta a/a$ (%), for a single layer grown on a thick substrate (solid line) and for a superlattice with equal layer thicknesses (dashed line).
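Indeed, for $d_B > d_c$, the lattice mismatch is shared between the homogeneous strain $\varepsilon_1 = \varepsilon_2 = \varepsilon_\perp$ and the network of dislocations $n_1/L_2 = n_2/L_1 = 2\sqrt{2}(\delta a/a - \varepsilon_\perp)/a$. We get:

$$\frac{\varepsilon_\perp}{\delta a/a} = \frac{d_c}{d_B}\,\frac{\ln(d_B/b) + \theta}{\ln(d_c/b) + \theta} \qquad (157)$$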
If we neglect the logarithmic dependence in Eq. 157, the residual strain relaxes to zero as $d_c/d_B$, that is rather slowly, and very large layer thicknesses are required if one wants to ensure a quasi-perfect plastic relaxation. This is particularly important when a superlattice is grown on a buffer layer which is designed to match the equilibrium in-plane lattice parameter of the SL, and has to accommodate the misfit with respect to the substrate. An example of such a situation is illustrated in Fig. 88 (Sauvage et al., 1986), which shows the X-ray double diffraction spectrum obtained in an Al$_{1-x}$In$_x$As (177 Å)-GaAs (195 Å) SL grown on top of a 5500 Å-thick Al$_{1-x}$In$_x$As buffer layer. The quantitative fit of this diffraction spectrum, also shown in Fig. 88, indicates that the amount of plastic relaxation of the buffer layer is only 42%, which explains why the actual strain of the GaAs layers of the SL (0.165%) is significantly different from the designed value (0.41%).
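To give a feeling for how slowly the equilibrium residual strain of Eq. 157 decays with thickness, the Python sketch below evaluates the ratio of residual strain to misfit. The function name and the example value $d_c$ = 90 Å are illustrative assumptions; the numbers are not meant to model the buffer layer of Fig. 88, whose misfit and critical thickness are not used here.

```python
import math


def residual_strain_fraction(d_layer, d_crit, b=4.0, theta=1.0):
    """Equilibrium residual strain divided by the misfit, following Eq. 157.

    All lengths are in angstroms. Below the critical thickness the layer is
    taken as fully strained (fraction 1).
    """
    if d_layer <= d_crit:
        return 1.0
    log_ratio = (math.log(d_layer / b) + theta) / (math.log(d_crit / b) + theta)
    return (d_crit / d_layer) * log_ratio


d_crit = 90.0  # assumed example value, of the order quoted for a 1 % misfit
for d_layer in (90.0, 200.0, 1000.0, 5500.0):
    fraction = residual_strain_fraction(d_layer, d_crit)
    print(f"d = {d_layer:6.0f} angstroms: residual strain / misfit = {fraction:.3f}")
```

In this equilibrium picture a few percent of the misfit strain survives even in a layer some sixty times thicker than $d_c$, which illustrates the slow, roughly $d_c/d_B$, decay discussed above.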
FIG. 88. Experimental and theoretical X-ray rocking curves in an Al$_{1-x}$In$_x$As-GaAs superlattice grown on top of an Al$_{1-x}$In$_x$As buffer layer which, in this case, presents only a partial plastic relaxation with respect to the GaAs substrate.
Remarks:
(i) The conclusion that there is a gradual and not an abrupt transition between the strained-layer and the relaxed-layer regimes is likely to remain correct even if the actual critical layer thicknesses are dominated by thermal activation of the motion of the dislocations.
(ii) This may be a fundamental difficulty for the experimental determination of $d_c$, especially for relatively small misfits. The study of the optical properties of a thin QW layer grown on top of the partly relaxed layer, or the investigation of the mobility degradation in modulation-doped structures, can provide information which can be compared with structural characterization data.
(iii) Due to the absence of interference between the dislocation strain field and the homogeneous strain (which can be considered as resulting from an external stress), Eq. 151 should remain correct for the complex system of alternate layers, where the inhomogeneous built-in strain is piecewise constant along the growth axis. On the other hand, parallel dislocations with parallel Burgers vectors do interact (they repel each other), which may affect the plastic relaxation.
(iv) It was suggested that experimental data for the Ge$_x$Si$_{1-x}$-Si system can be fitted by a law. Such a misfit dependence cannot be found in a thermodynamical equilibrium model, and supports the idea of activated processes. In this respect, the fact that the data points were obtained at various growth temperatures ranging from 550 °C to 700 °C (Kasper, 1986; People, 1986) should be considered.
(v) For layer thicknesses smaller than the “thermodynamical” critical thickness discussed here, the heterostructure is certainly stable and can a priori be used to fabricate devices subjected to heating (F.E.T.s, lasers...). For layer thicknesses larger than $d_c$, it is perhaps still possible to get strained-layer structures but, clearly, these will be metastable and will probably have poorer device performances.
3. Elastic Properties of Strained-Layer Heterostructures

We now consider an A-B superlattice in the strained-layer regime. Two situations may be encountered: (i) the SL is in self mechanical equilibrium, i.e. its in-plane lattice parameter $a_\perp$ is equal to the lattice parameter of the substrate (or buffer layer), $a_s$; (ii) the SL is strained as a whole to conform with the lattice of the substrate, which often occurs when the misfit is small and/or when one of the SL components is the substrate material itself (e.g. GaAs-AlGaAs). In this case the overall SL thickness is limited to some critical value, and the crystal is bent by the finite torque of the biaxial stress (Marzin, 1987); the radii of curvature are of the order of meters, and this effect (though easily observed in X-ray topographies) does not change significantly the strain distribution in the epilayer. In the following, we assume that the SL is in self mechanical equilibrium. The layers in the SL exert on each other biaxial tensile and compressive stresses. A 1% strain involves stresses of the order of 10 kilobars. With the usual definition of the strain tensor, $\varepsilon_{ij} = \frac{1}{2}(\partial u_i/\partial x_j + \partial u_j/\partial x_i)$, where u is the displacement vector, the elastic properties of cubic materials are described by the simple matrix equation:
where the indices 1, 2, 3 refer to the crystallographic axes x ∥ (100), y ∥ (010), z ∥ (001). The $\sigma_{ij}$'s are the components of the stress tensor and the $S_{ij}$'s the elastic compliance constants. The $S_{ij}$'s of the various III-V's are remarkably similar (Landolt-Bornstein, 1982), but not identical, and we keep them explicitly in the following. The analysis of the strain distribution is particularly simple for systems grown along the (001) axis, as the strain tensor reduces to $\varepsilon_{11} = \varepsilon_{22} = \varepsilon_\perp$ and $\varepsilon_{33} = \varepsilon_\parallel$. From the absence of external stress on the (001) free surface, we immediately get:

$$\varepsilon_\parallel^{A,B} = \left(\frac{2S_{12}}{S_{11}+S_{12}}\right)^{A,B}\varepsilon_\perp^{A,B} \simeq -\varepsilon_\perp^{A,B} \qquad (159)$$
By minimizing the total elastic energy $E_{el} = \frac{1}{2}(L_A\,\varepsilon_\perp^A\sigma^A + L_B\,\varepsilon_\perp^B\sigma^B)$ with respect to the strain distribution, taking into account $\varepsilon_\perp^A - \varepsilon_\perp^B = \delta a/a$, one readily gets the equilibrium in-plane lattice parameter of the superlattice:

$$a_\perp\left(\frac{L_A}{\zeta^A} + \frac{L_B}{\zeta^B}\right) = \frac{a_A L_A}{\zeta^A} + \frac{a_B L_B}{\zeta^B} \qquad (160)$$

where $\zeta^{A,B} = (S_{11} + S_{12})^{A,B}$. The case of SL's grown along the (111) axis leads to more complicated algebra. We get:

where $\varepsilon_\parallel$ is the strain along the growth axis and $\varepsilon_\perp$ the strain in the layer plane, which are related by:

$$\frac{\varepsilon_\parallel}{\varepsilon_\perp} = \frac{S_{11} + 2S_{12} - S_{44}/2}{S_{11} + 2S_{12} + S_{44}/4} \qquad (162)$$

The self equilibrium condition is still given by Eq. 160, with $\zeta^{A,B}$ replaced by:

$$\zeta^{A,B} = (S_{11} + 2S_{12} + S_{44}/4)^{A,B} \qquad (163)$$

The best experimental method to determine the strain state of a given heterostructure is certainly the quantitative analysis of X-ray double diffraction rocking curves (Quillec et al., 1984). However, it is sometimes necessary to cross-check such structural analysis with other experimental data. For example, the strain changes the effective spring constants between the atoms, and therefore it shifts the frequency of the LO phonons (Jusserand et al., 1985; Abstreiter et al., 1986b), which can be determined by Raman scattering. A representative figure for this effect is a shift of 3.5 cm$^{-1}$ for a strain $\varepsilon_\perp$ = 1% (Abstreiter et al., 1986b), which means that the accuracy of this interesting method is relatively poor compared to X-ray data. Finally, the strain considerably affects the band structure of the host materials, and consequently the energy levels of the SL or QW structure.
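The self-equilibrium condition of Eq. 160 is a weighted average of the two bulk lattice parameters, with weights given by the layer thickness divided by the compliance combination of each layer. A minimal Python sketch follows; the function name is hypothetical, and the example assumes equal compliance factors for both layers, in the spirit of the remark that the $S_{ij}$'s of the III-V's are very similar.

```python
def equilibrium_lattice_parameter(a_A, L_A, zeta_A, a_B, L_B, zeta_B):
    """In-plane lattice parameter of a free-standing A-B superlattice (Eq. 160).

    a_A, a_B: bulk lattice parameters; L_A, L_B: layer thicknesses;
    zeta_A, zeta_B: compliance combinations, S11 + S12 for (001) growth.
    """
    weight_A = L_A / zeta_A
    weight_B = L_B / zeta_B
    return (a_A * weight_A + a_B * weight_B) / (weight_A + weight_B)


# Example: GaSb-AlSb with equal layer thicknesses (lattice parameters from Table IV).
# Assumption: identical compliance factors, so only their ratio (here 1) matters.
a_eq = equilibrium_lattice_parameter(6.0959, 50.0, 1.0, 6.1355, 50.0, 1.0)
print(f"equilibrium in-plane lattice parameter: {a_eq:.4f} angstroms")
```

With equal thicknesses and equal compliances the result is simply the mean of the two lattice parameters; making one layer much thicker pulls the equilibrium value toward that layer's bulk parameter.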
B. Electronic Properties of Strained-Layer Superlattices
In the presence of elastic strain, the SL hamiltonian becomes: H = H;
+ H r + Hrk + HE + H : + H:k + &,
( 164)
Invariant expansions of the various bulk terms have been deduced by Pikus and Bir (1974), and useful explicit formulas for the III-V's or II-VI's can be found in the literature (Pollak and Cardona, 1968; Trebin et al., 1979). As each of the terms in Eq. 164 is piecewise constant, the basic assumptions of the envelope function model still hold. In the following, we restrict the discussion to the cases of SL's grown along a (100) or (111) axis, which are of practical interest. The $H_{\varepsilon k}$ terms, which correspond to relatively small strain-induced k-linear terms, vanish at $k_\perp = 0$ for these two high-symmetry directions (Trebin et al., 1979). To our knowledge, their influence on the valence band structure of strained-layer SL's has not been investigated yet, and we neglect them in the following. The strain hamiltonian $H_\varepsilon^{c(v)}$ acting on the conduction ($\Gamma_6$) and valence ($\Gamma_8$, $\Gamma_7$) bands is (Pollak and Cardona, 1968):
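$$H_\varepsilon^{c(v)} = -a^{c(v)}(\varepsilon_{11} + \varepsilon_{22} + \varepsilon_{33}) - 3b^v\left\{(L_1^2 - \tfrac{1}{3}L^2)\,\varepsilon_{11} + \text{c.p.}\right\} - \sqrt{3}\,d^v\left\{(L_1L_2 + L_2L_1)\,\varepsilon_{12} + \text{c.p.}\right\} \qquad (165)$$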
where the $L_i$ are the components of the angular momentum operator L, and c.p. denotes cyclic permutations of the indices. $a^{c(v)}$ is the hydrostatic deformation potential, which shifts the corresponding band as a whole, while $b^v$ and $d^v$ are uniaxial deformation potentials which lift the residual degeneracies of the valence band. The biaxial strain configuration can be analysed as the sum of a hydrostatic dilatation (or compression) and of a uniaxial compression (or dilatation) along the growth direction. For the (001) growth axis, the expansion of the strain hamiltonian on the ($\Gamma_6$, $\Gamma_8$, $\Gamma_7$) basis yields:

$$\begin{pmatrix} \alpha^c & 0 & 0 & 0 \\ 0 & -E_g + \alpha^v - \beta & 0 & 0 \\ 0 & 0 & -E_g + \alpha^v + \beta & -\sqrt{2}\,\beta \\ 0 & 0 & -\sqrt{2}\,\beta & -E_g - \Delta + \alpha^v \end{pmatrix} \qquad (166)$$

with rows and columns ordered as $|1/2, 1/2\rangle$ ($\Gamma_6$), $|3/2, 3/2\rangle$, $|3/2, 1/2\rangle$ ($\Gamma_8$) and $|1/2, 1/2\rangle$ ($\Gamma_7$),
where, using the notations $S = 2(S_{11} + 2S_{12})/(S_{11} + S_{12}) \simeq 1$ and $S' = (S_{11} - S_{12})/(S_{11} + S_{12}) \simeq 2$, $\alpha^{c(v)}$ and $\beta$ are given by:

$$\alpha^{c(v)} = -S\,\varepsilon_\perp\,a^{c(v)} \quad\text{and}\quad \beta = -S'\,\varepsilon_\perp\,b^v \qquad (167)$$

The biaxial strain leaves the heavy holes decoupled from the light particles (at $k_\perp = 0$), but admixes the light-hole and split-off bands. In the following we keep the denominations "light hole" and "split-off" for, respectively, the upper and the lower of these coupled bands. The layer submitted to a biaxial strain is characterized by heavy-hole to conduction, "light-hole" to conduction, and "split-off" to conduction bandgaps approximately given by ($\beta \ll \Delta$):

$$\begin{aligned} E_c - E_{HH} &= E_g + (\alpha^c - \alpha^v) + \beta \\ E_c - E_{LH} &= E_g + (\alpha^c - \alpha^v) - \beta - 2\beta^2/\Delta \\ E_c - E_{SO} &= E_g + \Delta + (\alpha^c - \alpha^v) + 2\beta^2/\Delta \end{aligned} \qquad (168)$$
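The three gaps of Eq. 168 are easy to tabulate once the hydrostatic shift and the shear term are known. The Python sketch below is illustrative only: the function name is hypothetical, the unstrained gap and spin-orbit splitting are the In$_{0.53}$Ga$_{0.47}$As values quoted in the caption of Fig. 79, and the magnitudes of 80 meV and 40 meV for a 1% strain follow the typical figures given in the text. The sign of the hydrostatic shift (gap narrowing under in-plane tension) is an assumption of this sketch.

```python
def strained_gaps(e_gap, delta_so, alpha_cv, beta):
    """Heavy-hole, light-hole and split-off gaps of Eq. 168 (all energies in meV).

    alpha_cv is the hydrostatic shift alpha^c - alpha^v and beta the shear term;
    the expressions are valid for |beta| much smaller than delta_so.
    """
    coupling = 2.0 * beta ** 2 / delta_so
    gap_hh = e_gap + alpha_cv + beta
    gap_lh = e_gap + alpha_cv - beta - coupling
    gap_so = e_gap + delta_so + alpha_cv + coupling
    return gap_hh, gap_lh, gap_so


E_GAP, DELTA_SO = 811.0, 360.0  # meV, InGaAs values from the Fig. 79 caption

# Assumed signs: 1 % in-plane tension narrows the gap and gives beta > 0;
# 1 % in-plane compression reverses both signs.
tensile = strained_gaps(E_GAP, DELTA_SO, alpha_cv=-80.0, beta=40.0)
compressive = strained_gaps(E_GAP, DELTA_SO, alpha_cv=80.0, beta=-40.0)

print("tensile     (HH, LH, SO):", [round(g, 1) for g in tensile])
print("compressive (HH, LH, SO):", [round(g, 1) for g in compressive])
```

With these assumed signs the fundamental gap is the heavy-hole one under compression and moves only modestly, whereas under tension it is the light-hole one and moves much faster, in line with the qualitative behaviour described in the text.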
For an in-plane strain $\varepsilon_\perp$ = 1%, the band gap shift $\alpha^c - \alpha^v$ is typically 80 meV and $\beta \simeq 40$ meV. The modification of the band structure of a bulk layer grown along a (001) axis as a function of the in-layer deformation is shown in Fig. 89. In the compressive regime ($\varepsilon_\perp < 0$), the material has a
FIG. 89. Band structure of a bulk material submitted to a biaxial compression ($\varepsilon_\perp < 0$) or extension ($\varepsilon_\perp > 0$). The panels are labelled $\varepsilon_{xx} < 0$, $\varepsilon_{xx} = 0$ and $\varepsilon_{xx} > 0$.
fundamental heavy-hole to conduction band gap which varies slowly with the strain, while in the tensile regime ($\varepsilon_\perp > 0$), it has a fundamental light-hole to conduction band gap which varies rapidly with the strain. The coupling between the light holes and the split-off band may become important for materials having a small spin-orbit coupling constant like GaAs ($\Delta$ = 350 meV) or InP ($\Delta$ = 108 meV). For a growth axis parallel to (111), the diagonalisation of $H_\varepsilon$ is still relatively simple provided we rotate the coordinates in such a way that the new z axis (z*) is parallel to (111) (Pollak and Cardona, 1968). This operation is trivial, since the basis wavefunctions X, Y, and Z transform as the corresponding coordinates: $S^* = S$, $X^* = (X - Y)/\sqrt{2}$, $Y^* = (X + Y - 2Z)/\sqrt{6}$ and $Z^* = (X + Y + Z)/\sqrt{3}$. The expansion of $H_\varepsilon$ on the new $\Gamma_6^*$, $\Gamma_8^*$, $\Gamma_7^*$ basis has the same form as Eqs. 166-168, with the quantities $\alpha$ and $\beta$ replaced by:

$$\alpha^{*c(v)} = -S^*\,\varepsilon_\perp\,a^{c(v)} \quad\text{and}\quad \beta^* = -S'^*\,\varepsilon_\perp\,d^v/\sqrt{3} \qquad (169)$$

where $S^* = 3(S_{11} + 2S_{12})/(S_{11} + 2S_{12} + S_{44}/4)$ and $S'^* = (3S_{44}/4)/(S_{11} + 2S_{12} + S_{44}/4)$.
The layers in the SL are under alternate biaxial tensile and compressive stresses, which gives rise to a variety of band-edge line-ups, as shown in Fig. 90. Due to the strain-induced splittings of the host valence bands, heavy and light holes are confined in quantum wells of different depths. The strain- and confinement-induced valence band splittings may add or tend to compensate each other; furthermore, besides the configurations directly inherited from the parent generation of type I and type II, strains may generate “mixed type” SL's (Marzin et al., 1985) in which the heavy holes (or light holes) are confined in the same material as the conduction states, while the light holes (or heavy holes) are localised in the other material.
FIG. 90. The six possible band offset configurations at the interface between strained materials (panels: type I, mixed type, type II).
An important feature of the strain Hamiltonian is that, at $k_\perp = 0$, it does not couple the heavy ($J_z = 3/2$) and light ($J_z = 1/2$) particles. It follows that the procedure described in Secs. II.A and II.B can easily be adapted to calculate the subband energies in the strained-layer SL. We once again get the Kronig-Penney-like formula:

$$\cos qd = \cos k_A L_A \cos k_B L_B - \tfrac{1}{2}\,(\xi + 1/\xi)\,\sin k_A L_A \sin k_B L_B \tag{170}$$

where $\xi = \{k_A/m_A(E)\}/\{k_B/m_B(E)\}$, the light-particle dispersion relations being written formally $E = \hbar^2 k^2/2m(E)$, where the energy E is measured from the bottom of the conduction band and, using the notation $E_L = E_g + (\alpha^c - \alpha^v) - \beta$:

$$m(E) = \frac{3}{2P^2}\,\frac{(E + E_L)(E + E_L + \Delta + \beta) - 2\beta^2}{(E + E_L) + 2(E + E_L + \Delta + \beta) + 4\beta} \tag{171}$$
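Equations 170 and 171 lend themselves to a direct numerical evaluation. The Python sketch below is a minimal illustration under stated assumptions: energies are in meV, lengths in Å, masses in units of the free-electron mass, and `P2` is an energy chosen so that Eq. 171 returns a mass in those units. The constant `HBAR2_OVER_2M0`, the function names and all argument values are choices made for this example and do not come from the text.

```python
import cmath

HBAR2_OVER_2M0 = 3809.98  # hbar^2 / (2 m0) in meV * Angstrom^2


def kane_mass(E, E_L, delta, beta, P2):
    """Energy-dependent light-particle mass of Eq. 171 (units of m0)."""
    num = (E + E_L) * (E + E_L + delta + beta) - 2.0 * beta ** 2
    den = (E + E_L) + 2.0 * (E + E_L + delta + beta) + 4.0 * beta
    return 3.0 / (2.0 * P2) * num / den


def wavevector(E, V, m):
    """Complex wavevector (1/Angstrom) from E - V = hbar^2 k^2 / (2 m)."""
    return cmath.sqrt(m * (E - V) / HBAR2_OVER_2M0)


def kp_rhs(E, LA, LB, VA, VB, mA, mB):
    """Right-hand side of Eq. 170; E must differ from both VA and VB."""
    kA = wavevector(E, VA, mA)
    kB = wavevector(E, VB, mB)
    xi = (kA / mA) / (kB / mB)
    value = (cmath.cos(kA * LA) * cmath.cos(kB * LB)
             - 0.5 * (xi + 1.0 / xi) * cmath.sin(kA * LA) * cmath.sin(kB * LB))
    return value.real  # the imaginary part vanishes for real E


def is_allowed(E, LA, LB, VA, VB, mA, mB):
    """True if a real superlattice wavevector q exists at energy E."""
    return abs(kp_rhs(E, LA, LB, VA, VB, mA, mB)) <= 1.0
```

Inside a barrier the wavevector becomes imaginary, and the complex arithmetic then turns the trigonometric functions into their hyperbolic counterparts without a separate code path. For an energy-dependent mass, `kane_mass` can be evaluated at each energy and passed as `mA` or `mB`.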
As in Sec. II.A, the heavy-hole problem is calculated independently in a parabolic approximation, using the quantum well defined by the offset of the heavy-hole bands of the A and B strained materials. The in-plane dispersion relations of the valence subbands in these systems are also considerably influenced by the presence of strain. Various theoretical calculations have been reported (Osbourn, 1985a; Osbourn et al., 1986; O'Reilly and Witchlow, 1986), which have emphasised the possibility of obtaining systems in which the ground heavy-hole subband presents a light in-plane mass, which can be qualitatively argued in a rather simple way (see Sec. V.D.2). Conversely, for a system in which the quantum well layer is under biaxial tensile stress, the ground hole subband may be the first light-hole subband (Voisin et al., 1984b), and it presents in this case a very heavy in-plane mass (Bastard and Brum, 1986; Brum et al., 1988) (see Sec. V.D.1).

C. Experimental Studies
As already mentioned, any real structure involves layers which are strained to some extent. In some cases, the influence of strains on the electronic properties is really negligible. A remarkable example is the GaAs-Al$_x$Ga$_{1-x}$As system in which, generally, the small misfit $\delta a/a = 1.3 \times 10^{-3}$ is accommodated by straining the Al$_x$Ga$_{1-x}$As barrier. The resulting valence band splitting ($\simeq 3.5$ meV for x = 0.3) simply appears as an insignificant difference in the barrier heights for heavy and light holes. However, if the GaAs substrate is removed, the epilayer tends to recover its mechanical equilibrium; in this case, part of the misfit is accommodated in the GaAs layer, and the presence of strain becomes apparent in the optical spectra (Dingle and Wiegmann, 1975).

For heterostructures in which the misfit is of the order of 1%, the strain-induced effects compare in magnitude with the quantum confinement. In the following, we discuss the results obtained in systems displaying the two opposite strain configurations, which have been investigated in detail: in the GaSb-AlSb and GaAs-Al$_{1-x}$In$_x$As systems the small gap material (GaSb or GaAs) is under biaxial tensile stress, while in the In$_x$Ga$_{1-x}$As-GaAs system it is under biaxial compressive stress. Finally, we discuss the case of the InAs-GaAs system, in which the misfit is much larger (7%), so that the problems of very large stresses and very thin layers both come into play.

1. The GaSb-AlSb and GaAs-Al$_{1-x}$In$_x$As Systems
The GaSb-AlSb heterostructures which we have studied (Voisin et al., 1984b) are sketched in Fig. 91a. They were grown on a GaAs substrate, which presents a misfit of ~7% with the epilayer. This considerable misfit is certainly entirely accommodated by the plastic relaxation of the first AlSb layer, and the subsequent layers are essentially in self mechanical equilibrium. The X-ray double diffraction spectra shown in Fig. 91b show that the structural quality of these SLs is excellent, which means that, despite the large density of the interfacial dislocation network, no or only few threading dislocations are present in the ten-period GaSb-AlSb epilayer. The superperiods $d = L_A + L_B$ turned out to be significantly smaller than the target values (d = (200 + 500) Å for sample S1, and (100 + 500) Å for sample S2), but the quantitative fit of the diffraction spectrum of sample S1 shows that the ratio $L_A/L_B$ is equal to the designed value and that the strain distribution closely obeys Eq. 160. The layer thicknesses are $L_A$ = 181 Å, $L_B$ = 452 Å for sample S1 and $L_A$ = 84 Å, $L_B$ = 419 Å for sample S2. The strain in the GaSb layers was also determined from the shift of the GaSb LO phonons (Jusserand et al., 1985) with respect to a bulk reference sample, and was again found to be in agreement with Eq. 160.

FIG. 91. Sketch of the GaSb-AlSb heterostructures which we have studied (a) and X-ray double diffraction spectra obtained in samples S1 and S2 (b).

The low temperature optical transmission spectra of these two samples are shown in Fig. 92. These spectra clearly exhibit the steplike behavior characteristic of the two-dimensional density of states and marked excitonic peaks at the onset of the first absorption steps. The type I nature of the system is definitely established from the consideration of the magnitude of the absorption, which is not far from the characteristic value of 0.5% per transition and per quantum well (see Sec. III.B). The importance of misfit strains is evidenced by two unusual features apparent in the transmission
spectrum of sample S1: the intensities of the first two excitons (denoted A and B in Fig. 92) are reversed compared to the usual GaAs-AlGaAs case, and the absorption edge is below the bandgap of bulk GaSb (810 meV). These features are easily understood in the theoretical framework sketched in the previous section: the GaSb layers in our samples experience a biaxial tensile stress, in order to conform to the lattice of the much thicker AlSb layers. According to Eq. 168, they are characterized by a fundamental light-hole to conduction bandgap $E_c - E_{LH}$ = 756 meV and a somewhat larger heavy-hole to conduction bandgap $E_c - E_{HH}$ = 791 meV. In this strain configuration, there is a competition between the effect of strain and the effect of the quantum confinement, which, in the case of samples S1 and S2, results in a reversal of the energy positions of the heavy- and light-hole excitons.

FIG. 92. Low temperature optical absorption in samples S1 and S2 (transmission versus photon energy in meV).

The full and open arrows in Fig. 92 show the transitions involving the heavy- and light-hole subbands respectively, calculated using standard values of the host parameters at the $\Gamma$ point (Landolt-Börnstein, 1982). As usual, the energies of the allowed transitions do not depend strongly on the value of the band offsets, but the present system is remarkable because the large bandgap difference ($\Delta E_g \simeq 1.5$ eV) is certainly accommodated for the largest part in the conduction band. It follows that the number of confined conduction subbands is larger than the number of light-hole subbands, which will therefore fix the number of observable light-hole to conduction transitions. This leads to a maxima-minima argument (at least four HH → E transitions and at most two LH → E transitions in sample S1) which, if we can trust it, determines quite accurately the band offsets: $\Delta E_{HH}$ = 40 meV, $\Delta E_{LH}$ = 90 meV, $\Delta E_c$ = 1350 meV. The same analysis gives the same result for sample S2, which has
completely different parameters. It should also be noted that the observed light- to heavy-hole splitting is consistent with small valence band offsets (Voisin, unpublished). On the other hand, from the study of the resonance of the Raman scattering intensity near the $E_1$ gap of GaSb in samples grown on a GaSb substrate and having different parameters, Tejedor et al. (1985) concluded that the valence band offset in these samples should be larger than 300 meV. The same experiment performed on our samples (Calleja et al., 1986) did not lead to the same conclusion. This raises the important question of whether the band offset in a strained layer system can depend strongly on the actual strain distribution. This question presently remains unanswered.

The comparison between theory and experiment shown in Fig. 92 becomes somewhat unsatisfactory for the upper-lying transitions. This effect may arise from the proximity of the L minimum which, in bulk GaSb, lies only 84 meV above the $\Gamma$ point. Indeed, the potential barrier in k-space when going from $\Gamma$ to L is only $\simeq 1$ eV high, and significant deviations from the Kane dispersion relations can be expected for energies in the conduction band larger than ~200 meV. Another consequence of the proximity of the L minimum is the possible crossing of the energy levels originating from the L and $\Gamma$ points. When the motion along the (001) axis is quantized, the energy minimum in the L valleys rises, in first approximation, as $\hbar^2\pi^2/2m_c L_A^2$, where the "confinement mass" $m_c$ is equal to $(2m_t + m_l)/3 \simeq 0.51\,m_0$ (Griffiths et al., 1983). Because of the symmetry mismatch of the atomic parts of the Bloch wavefunctions, the quantized states built from the L minima should mix only weakly with those built from the $\Gamma$ minimum. (This is also the reason why we consider for AlSb the gap at $\Gamma$, $E_{g\Gamma}$ = 2.3 eV, instead of the much smaller gap at L, $E_{gL}$ = 1.6 eV.)
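To give an order of magnitude for the first-approximation rise of the L valleys quoted above, the confinement energy $\hbar^2\pi^2/2m_c L_A^2$ can be tabulated against the layer thickness. The Python sketch below assumes an infinitely deep well, takes the confinement mass of 0.51 free-electron masses from the text, and uses a constant and function name chosen for this example.

```python
import math

HBAR2_OVER_2M0 = 3809.98  # hbar^2 / (2 m0) in meV * Angstrom^2
M_CONFINEMENT_L = 0.51    # confinement mass of the L valleys, in units of m0


def confinement_energy(n, width, mass):
    """Energy (meV) of level n in an infinitely deep well.

    width is in Angstrom and mass in units of m0; this is the
    hbar^2 pi^2 n^2 / (2 m L^2) estimate, valid only as a first approximation.
    """
    return HBAR2_OVER_2M0 * (n * math.pi / width) ** 2 / mass


def l_valley_table(widths):
    """Return (width, confinement energy) pairs for the lowest L-derived level."""
    return [(w, confinement_energy(1, w, M_CONFINEMENT_L)) for w in widths]
```

For example, `l_valley_table([60.0, 100.0, 180.0])` lists the L-valley shifts for three GaSb thicknesses. Because the confinement mass is large, these shifts stay small compared with the 84 meV bulk separation over this whole range, which is why the L-derived levels can end up below the much lighter $\Gamma$-derived ones in thin layers.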
The L-originating states are not seen in optical absorption, at least because, as they lie in the (110) directions of the SL Brillouin zone, the corresponding bandgaps remain indirect. However, because of the large value of $m_L$, these L-originating states become the fundamental conduction states for small GaSb layer thicknesses. The $\Gamma$-L energy separation depends on the strain state: under biaxial tensile strain, it increases as $\Delta(E_{gL} - E_{g\Gamma}) = S E_1 \varepsilon_\perp$, where the intra-band deformation potential $E_1$ is about 5 eV (Landolt-Börnstein, 1982). In the limit of thick AlSb barriers ($d_B \gg d_A$), $E_{gL} - E_{g\Gamma}$ thus becomes equal to 118 meV. Neglecting the non-parabolicity at the L minima, we estimate that the $\Gamma$-L crossover should occur at $d_A \simeq 60$ Å. The sharp decrease of the direct gap luminescence reported by Griffiths et al. (1983), and the evolution of the luminescence lifetime with layer thickness, are consistent with this estimate (Forchel et al., 1986).

As already mentioned, the SL band structure at $k_\perp \neq 0$ results from the coupling of all the valence subbands, which, as evidenced by the treatment outlined in Sec. II.B, depends on their spacing and ordering at $k_\perp = 0$. In this respect, the strain-induced light and heavy hole subbands reversal in our
samples is particularly interesting. Figure 93 shows magneto-optical transmission spectra in sample S1 (Voisin et al., 1986a), recorded either at a constant photon energy or at a constant magnetic field. These spectra show many transmission minima which correspond to the transitions $LH_1^N \to E_1^N$ or $HH_1^N \to E_1^N$ between the Landau levels associated with the first light-hole, heavy-hole and conduction subbands, respectively. These transmission minima, or absorption maxima, are reported in the usual transition energy versus magnetic field plot shown in Fig. 94. This plot exhibits two distinct fan diagrams, eye-marked by the solid and dashed lines, which extrapolate towards $E_L$ = 799 meV and $E_H$ = 829 meV respectively. In addition, there are two dashed-and-dotted lines having a non-linear behavior, which have been drawn through the exciton data points. They extrapolate to 795 meV and 820 meV for the light- and heavy-hole excitons respectively.

FIG. 93. Low temperature interband magneto-optical absorption spectra in sample S1, recorded either at a fixed magnetic field (a) or at a fixed photon energy (b).

As shown in Sec. III.F, a quantitative analysis of these data requires a model calculation of the Landau level energies, and of the oscillator strengths associated with the different transitions, which represents a considerable amount of theoretical work. We first discuss a considerably simpler semi-empirical interpretation, which is analogous in principle to the "diagonal approximation": (i) we discard any spin effect, because the overall polarization dependence of the data is weak, even though there is a strong polarization dependence of the excitonic absorption; (ii) we evaluate the energies $E_1^N$ from the semi-classical quantization rule $k_\perp^2 \to (2N + 1)eB/\hbar$, using simplified in-layer dispersion relations, which would be exact if the heavy-hole mass were infinite. (These $k_\perp$-dispersion relations are likely to be quite accurate for the conduction subbands in a relatively large gap material.
Note that the accuracy in the evaluation of the $E_1^N$ energies is a crucial point of the interpretation, as they are the dominant contribution to the observed transitions); (iii) finally, we estimate the energies $LH_1^N$ and $HH_1^N$ in the same semi-classical approximation, using empirical parabolic $k_\perp$-dispersion relations with the in-plane effective masses $m^\perp_{LH}$ and $m^\perp_{HH}$ as fitting parameters. This procedure leads to the fan diagrams shown in Fig. 94, with a very heavy mass $m^\perp_{LH} = 0.8\,m_0$ (solid lines) for the ground light-hole subband $LH_1$ and a rather light mass $m^\perp_{HH} = 0.11\,m_0$ (dashed lines) for the first heavy-hole subband, respectively. The overall agreement is fair, which merely indicates that the qualitative interpretation is not too far from the truth. However, several features are not accounted for by this empirical approach, namely the polarization dependence of the data and the exciton data points. Also, the energy difference between the extrapolation $E_H$ of the "$HH_1^N \to E_1^N$" transitions and the energy of the $HH_1$-$E_1$ exciton at zero magnetic field is significantly larger than the possible binding energy of this exciton. The theoretical fan diagram (Brum et al., to be published) obtained by calculating the Landau level energies and the transition oscillator strengths as
described in Secs. II.B and III.F is shown in Fig. 95. Apart from the criterion for a transition to be observable, which was arbitrarily chosen to be a squared wavefunction overlap (M in Eq. 150) larger than 0.2, the calculation does not contain any fitting parameter.

FIG. 94. Plot of the magneto-optical transmission minima as a function of the magnetic field at which they occur. The lines connecting the data points correspond to the fit by the empirical model, which neglects the valence subband mixing.

The theory agrees with the empirical analysis for the transitions extrapolating towards $LH_1 \to E_1$, i.e. it gives approximately linear transitions involving a very heavy "light-hole" mass $m_{LH}$. Transitions in the $\sigma^+$ and $\sigma^-$ polarizations are split by $\simeq 5$ meV at 10 Tesla, which corresponds essentially to the g-factor, g = -9, of bulk GaSb, and only a hint of band mixing appears in the splitting of the $\sigma^-$ transitions at high magnetic field. Conversely, the heavy-hole transitions evidence a strong band mixing effect, as the oscillator strength shifts rapidly from the "allowed" $HH_1^N \to E_1^N$ transitions at low field to the "forbidden" $HH_2^N \to E_1^N$ transitions at high field. Except for the lowest transitions, which have a singular behavior, the transitions which can be observed extrapolate nearly linearly toward $HH_2 \to E_1$, which does explain the experimental observation. The calculation certainly describes the main characteristics of the data, but the fit is relatively poor, essentially because the slope of the transitions is always too large. We presently believe that the observed discrepancy
essentially represents the effect of the Coulomb interaction. Indeed, in the presence of a strong magnetic field, any two-dimensional electron-hole pair state is an excitonic bound state, $H_i^N$-$E_j^N$ pairs corresponding to (and extrapolating to) the (N + 1)s state of the $H_i$-$E_j$ exciton. The energies of two-dimensional exciton states in an arbitrary magnetic field were calculated by MacDonald and Ritchie (1986) in the case of parabolic, non-degenerate bands. Their results indicate that the Coulomb interaction considerably affects the transition fan diagram in the whole range of energy and magnetic field of interest. For instance, corrections to the transition energies as large as $\simeq 10$ meV should be expected in our case for the lowest transitions at 10 Tesla, which could account quantitatively for the discrepancies between our experimental and theoretical results.

FIG. 95. Experimental fan diagram and theoretical calculation of the magneto-optical transitions having a significant strength.

Figure 92 also shows in dashed lines the low temperature luminescence spectra observed in samples S1 and S2. They lie 30 to 50 meV below the exciton peak seen in the absorption spectra, and are typically 20 meV broad. Clearly, this luminescence is related to shallow defects which are most probably
residual acceptors. In fact, it is likely that the observed luminescence corresponds to electron-acceptor recombination, as often observed in moderate quality bulk GaSb. On the other hand, Ploog et al. recently reported a luminescence and excitation spectroscopy study of GaSb-AlGaSb SLs in which the confinement barrier consists of a short-period or pseudo-alloy GaSb-AlSb SL (Ploog et al., 1985); the samples were grown on GaSb substrates. They attribute the 11 meV Stokes shift between luminescence and excitation to the trapping of excitons on interface fluctuations, and conclude that the luminescence of their samples is essentially excitonic. However, the observed Stokes shift has precisely the magnitude of the binding energy of the exciton on a neutral acceptor in bulk GaSb, so that the contribution of these bound excitons in their data seems difficult to exclude. More recently, we have investigated the optical properties of a 90 Å thick GaSb-AlSb single quantum well (Raisin et al., 1987), in which the 14 meV Stokes shift between the luminescence and the $LH_1$-$E_1$ transition observed in the electro-reflectance spectrum may be unambiguously interpreted in terms of binding of excitons on neutral acceptors.

The Al$_{1-x}$In$_x$As-GaAs ($x \le 0.2$) system presents the same strain configuration, and is remarkably suitable for its investigation. The QW layers (GaAs) are made of a constant, well-defined, high quality material, and the biaxial strain on these GaAs layers may be tuned from ~0 to ~1.4% by changing the In content x in the large gap alloy barrier and/or the ratio $L_A/L_B$ of the layer thicknesses. The luminescence excitation spectra of three samples (Sauvage et al., 1986) corresponding to increasing strain in the GaAs well (from top to bottom) at a supposedly constant confinement (nominal GaAs layer thickness $d_A$ = 133 Å) are shown in Fig. 96. The values of the strain indicated in Fig.
96 were obtained from a cross-checking of the optical data with the analysis of X-ray rocking curves like that shown in Fig. 88. The most striking feature is certainly the different strain-induced downward shift of the heavy- and light-hole transitions, which leads to the reversal on the energy scale of the $HH_1$-$E_1$ and $LH_1$-$E_1$ excitons. This system has also been investigated by Kato et al. (1986), in a somewhat larger strain regime (x = 0.18, $\varepsilon_\perp \simeq 0.64\%$). Structures in the broad-band room temperature luminescence were attributed to band-to-band transitions, in agreement with a theoretical estimate of the transition energies.

2. In$_x$Ga$_{1-x}$As-GaAs

This system has attracted considerable attention because of promising perspectives for device applications: (i) in optics, as a possibility of extending into the infrared the energy range of GaAs-based materials; (ii) in electronics, for the predictably light in-plane mass of the ground heavy-hole subband which
would permit the fabrication of highly symmetric complementary FETs (Osbourn, 1985b). The system has been studied in some detail in the two possible configurations of an SL strained as a whole on GaAs (Marzin et al., 1985) and an SL in self-mechanical equilibrium. Here, the small gap material (In$_x$Ga$_{1-x}$As) is under biaxial compressive stress, and the strain-induced and confinement-induced effects add instead of competing. This appears clearly in the absorption spectrum of an MBE-grown 100 Å-200 Å In$_x$Ga$_{1-x}$As-GaAs SL (strained on GaAs) shown in Fig. 97: no light-hole transition is observed in the vicinity of the ground heavy-hole to conduction transition, because the heavy- to light-hole splitting is considerably enhanced by the strain. Here again, the magnitude of the absorption is a decisive proof of the type I nature of the system for the heavy-hole to conduction transitions.

FIG. 96. Low temperature excitation spectra in three Al$_{1-x}$In$_x$As-GaAs superlattices having the same GaAs layer thickness and increasing strain (from top to bottom) in the GaAs layer. The arrows at 1.515 eV indicate the ~30% decay of the signal arising from the absorption edge of the GaAs substrate, which removes the contribution of the exciting light reflected at the back of the substrate.

The qualitative identification of the other transitions apparent in Fig. 97 was obtained from an on-edge excitation experiment (Marzin et al., 1985): the sample was excited with a dye laser beam focused on the cleaved edge of the GaAs substrate, which is transparent in the spectral
range of interest. Thus, photons propagate inside the sample nearly parallel to the interface, and the polarization may be in the layer plane or along the growth axis. In the first case, both heavy- and light-hole transitions are allowed, while in the second case, heavy-hole transitions are forbidden (see Sec. III.B). Then the quantitative fit of the transition energies, using the X-ray determined structural parameters, strongly indicated that the actual band-edge configuration is of the "mixed type", with the heavy holes and the electrons confined in the ternary alloy and the light holes in the adjacent GaAs layers. In fact, the light-hole band offset was found to be very small ($\simeq 20$ meV), so that the light holes are only weakly confined, and the $LH_1 \to E_1$ transition remains strong enough to be observable. Low temperature luminescence in these samples was 7 to 10 meV broad and presented a Stokes shift of similar magnitude with respect to the absorption. Luminescence arises from recombination of excitons bound to interface defects and/or to shallow defects. Still smaller linewidths (6.5 meV) were observed more recently in equivalent structures grown by low pressure MOCVD (Roth et al., 1986).

FIG. 97. Low temperature optical absorption in a ten-period GaInAs-GaAs MQW structure strained on its GaAs substrate (from Marzin et al., 1985).

Modulation-doped SLs of both n-type (Fritz et al., 1983) and p-type (Shirber et al., 1985) have also been investigated, in the configuration of rather thick SLs grown on graded-composition buffer layers. The characteristic mobility plateau at low temperature, with mobilities up to 35000 cm$^2$ V$^{-1}$ s$^{-1}$, was observed for the two-dimensional electron gas (Fritz et al., 1983), which is fair, given the moderate thickness of the undoped spacer layer ($\simeq 45$ Å). More surprising was the observation of two-dimensional hole mobilities up to 14000 cm$^2$ V$^{-1}$ s$^{-1}$ at 4 K (Shirber et al., 1985), which implies that the in-plane mass of the carriers is rather light.
This in-plane mass was measured from the temperature dependence of the amplitude of the Shubnikov-de Haas oscillations in the magneto-conductance of the 2-D hole gas (Shirber et al., 1985), and, independently, from a magneto-luminescence experiment on an n-type modulation-doped sample (Jones et al., 1985); both experiments yield an in-plane hole mass $m^\perp_{HH} = 0.15\,m_0$. The fundamental valence state in these SLs is certainly the ground heavy-hole state; in the "diagonal" approximation (Sec. II.B), the in-plane mass of the heavy hole is:

$$m^\perp_{HH} = 1/(\gamma_1 + \gamma_2) = 4\,m_{HH}\,m_{LH}/(3\,m_{HH} + m_{LH}) \simeq \tfrac{4}{3}\,m_{LH} \tag{172}$$
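The two forms of Eq. 172 can be checked against each other numerically. The Python sketch below assumes the usual relations between the bulk masses along the growth axis and the Luttinger parameters, $m_{HH} = 1/(\gamma_1 - 2\gamma_2)$ and $m_{LH} = 1/(\gamma_1 + 2\gamma_2)$, with all masses in units of the free-electron mass. The function names and the mass values used in any call are illustrative only.

```python
def luttinger_from_masses(m_hh, m_lh):
    """Invert m_HH = 1/(g1 - 2 g2) and m_LH = 1/(g1 + 2 g2) for (g1, g2)."""
    gamma1 = 0.5 * (1.0 / m_hh + 1.0 / m_lh)
    gamma2 = 0.25 * (1.0 / m_lh - 1.0 / m_hh)
    return gamma1, gamma2


def in_plane_hh_mass(m_hh, m_lh):
    """In-plane heavy-hole mass in the diagonal approximation (Eq. 172)."""
    return 4.0 * m_hh * m_lh / (3.0 * m_hh + m_lh)


def in_plane_hh_mass_from_gammas(gamma1, gamma2):
    """Same quantity written as 1/(gamma1 + gamma2)."""
    return 1.0 / (gamma1 + gamma2)
```

For a heavy-hole mass much larger than the light-hole mass, `in_plane_hh_mass` tends to 4/3 of the light-hole mass, which is the last step of Eq. 172.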
In the In$_x$Ga$_{1-x}$As-GaAs SLs, the interaction between the hole subbands at finite $k_\perp$ is reduced, compared to unstrained structures, as the gap between $HH_1$ and $LH_1$ is increased by the effect of strains. This is why the first heavy-hole subband may manifest a rather light in-plane mass, as suggested by Eq. 172. Note that the measured values are still much heavier than the prediction of the "diagonal" model, which proves the importance of the subband interaction. It was suggested (Osbourn et al., 1986) that the non-parabolicity of the hole mass could be directly related to the gap between $HH_1$ and $LH_1$. This, however, cannot be a universal result, as the gap between $LH_1$ and $HH_2$ is usually small enough to play an important role. Finally, hole concentrations were found as large as $4 \times 10^{11}$ cm$^{-2}$ (Shirber et al., 1985). This seems hardly compatible with a very small valence band offset (Marzin et al., 1985), which raises again the question of the dependence of this quantity upon the strain state, which is not the same in the samples of Marzin et al. (1985) and in those of Fritz et al. (1983), Shirber et al. (1985), and Jones et al. (1985).

3. InAs-GaAs

A particularly interesting system is the InAs-GaAs short-period SL, as it is potentially free from alloy disorder and may present electronic properties similar to the bulk alloy, which is an important material for optoelectronics in the 1.5 μm wavelength range. Superlattices with equal layer thicknesses, nearly lattice-matched to the InP substrate, have been grown successfully by MBE with layer thicknesses in the range of 10 Å to 20 Å (Tamargo et al., 1985). The thermodynamical critical layer thickness for this system, according to the model of Sec. V.B, is 44 Å for the SL with the lattice mismatch of 7%, or 15 Å for a single layer with the average lattice mismatch of 3.5%. Thus, it is not completely surprising that the structures were found to be dislocation free and gave a fair luminescence signal.
The structures that we have studied (Voisin et al., 1986c) consisted of a ten-period stacking of alternate layers of InAs and GaAs with equal layer thicknesses $L_A = L_B$ = 10 Å (sample R 490) or 20 Å (sample R 520), sandwiched between thick AlInAs buffer and protective layers lattice-matched to the InP substrate. The low temperature luminescence spectra of these two samples are shown in Fig. 98. They
consist of a single line, lying around 765 meV, about 50 meV broad, accompanied by a low energy tail which is somewhat sample dependent and tends to saturate with respect to the emission maximum when the excitation level is increased. This is an indication that the low energy part of the spectrum corresponds to recombination involving shallow defects, while the main part of the line is likely to be due to band-to-band recombination. Thus, the SL band gap certainly lies near the center of the line, and the most interesting characteristic of these data is that this band gap is essentially independent of the individual layer thickness, as can be expected intuitively in the very thin layer (or "pseudo-alloy") regime where the electron wave function becomes delocalized and therefore averages the potential energy. Note also that the SL band gaps ($\simeq 760$ meV) are not far from the band gap of a bulk In$_{0.5}$Ga$_{0.5}$As alloy (~800 meV).

FIG. 98. Low temperature luminescence observed in two InAs-GaAs short-period superlattices with equal layer thicknesses, $L_A = L_B$ = 10 Å (S1) and $L_A = L_B$ = 20 Å (S2).

It is of course attractive to try to calculate the band structure of such short-period SLs with the envelope function formalism developed in Sec. II.A, at least for the simplicity of the calculations. Unfortunately, the basic assumption of envelope functions varying slowly at the scale of the host unit cell is not fulfilled anymore, and a new argument has to be developed here. When the SL period is decreased, electrons are more and more delocalised, and the envelope wave function progressively transforms into a plane wave which, at q = 0, has essentially no spatial variation. The envelope function approach is then still valid, because the variations of the envelope themselves are small, even though they occur at high spatial frequency. This, in some respect, can be checked directly by expanding the Kronig-Penney formula (Eq. 37) at vanishing $qd$, $k_A L_A$ and $k_B L_B$, which gives the band extrema $E(q = 0)$ for
the light particles at the energies satisfying:

$$k_A^2 L_A^2\,\{1 + L_B m_B(E)/L_A m_A(E)\} + k_B^2 L_B^2\,\{1 + L_A m_A(E)/L_B m_B(E)\} = 0 \tag{173}$$

or

$$k_A^2 L_A^2 / k_B^2 L_B^2 = -L_A m_A(E)/L_B m_B(E) \quad\text{or}\quad L_A m_A(E) = -L_B m_B(E)$$
The first solution gives the conduction band extremum, while the second gives the energies of the light-hole and split-off bands. For $L_A = L_B$ and parabolic dispersion relations, the first solution is $E(q = 0) = (V_B - V_A)/2$ (measured from $V_A$), where $V_{A(B)}$ is the conduction band energy in the A(B) material. This, indeed, is a physically sensible result, which supports the extension of the envelope function calculation to the short-period SL regime.

In the InAs-GaAs system, with the $6.8 \times 10^{-2}$ misfit equally shared between the two hosts, Eq. 168 yields, using conventional values of the host band structure parameters (Landolt-Börnstein, 1982):
For InAs: $E_c - E_{HH}$ = 476 meV; $E_c - E_{LH}$ = 632 meV; $E_c - E_{SO}$ = 1058 meV.
For GaAs: $E_c - E_{HH}$ = 1389 meV; $E_c - E_{LH}$ = 1033 meV; $E_c - E_{SO}$ = 1668 meV.
The strain-induced perturbation is clearly considerable, and in particular, the quadratic coupling between the light-hole and split-off bands is a very important effect. The band extrema are shown in Fig. 99 as a function of the individual layer thickness, for the case of a conduction band offset $\Delta E_c$ of 550 meV, and in Fig. 100 (Voisin et al., 1986b) as a function of $\Delta E_c$ for individual layer thicknesses of 10 Å and 20 Å. A remarkable result apparent in Fig. 99 is that the heavy- to light-hole splitting does not vanish with the layer thickness, which should be expected because the SL does not recover the cubic symmetry. Note however that this residual splitting is essentially a consequence of the finite value of the spin-orbit coupling constants and of the related strain-induced coupling of the light-hole and split-off bands. By examining Fig. 100, it is seen that only a value of the order of 550 meV for $\Delta E_c$ can explain our observation of a band gap in the range of 760 meV, depending weakly on the layer thickness.
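The short-period limit of the conduction band extremum discussed above can be checked numerically against the full Kronig-Penney relation. The Python sketch below assumes parabolic, energy-independent masses and searches by bisection for the energy at which the right-hand side of the Kronig-Penney formula equals 1 (q = 0). It is only meant for thin layers, where a single crossing lies between the two conduction edges. The constant, function names and all parameter values are assumptions of this example.

```python
import cmath

HBAR2_OVER_2M0 = 3809.98  # hbar^2 / (2 m0) in meV * Angstrom^2


def kp_rhs(E, LA, LB, VA, VB, mA, mB):
    """Right-hand side of the Kronig-Penney formula for parabolic masses."""
    kA = cmath.sqrt(mA * (E - VA) / HBAR2_OVER_2M0)
    kB = cmath.sqrt(mB * (E - VB) / HBAR2_OVER_2M0)
    xi = (kA / mA) / (kB / mB)
    value = (cmath.cos(kA * LA) * cmath.cos(kB * LB)
             - 0.5 * (xi + 1.0 / xi) * cmath.sin(kA * LA) * cmath.sin(kB * LB))
    return value.real


def band_edge(LA, LB, VA, VB, mA, mB, iterations=200):
    """Energy of the q = 0 conduction extremum, assuming VA < VB and thin layers."""
    low, high = VA + 1e-6, VB - 1e-6   # the right-hand side is > 1 at low, < 1 at high
    for _ in range(iterations):
        middle = 0.5 * (low + high)
        if kp_rhs(middle, LA, LB, VA, VB, mA, mB) > 1.0:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


def average_potential(LA, LB, VA, VB):
    """Thickness-weighted mean of the two conduction edges."""
    return (LA * VA + LB * VB) / (LA + LB)
```

For equal thicknesses `average_potential` reduces to the midpoint of the two edges, that is $(V_B - V_A)/2$ above $V_A$. Comparing `band_edge` with `average_potential` for decreasing layer thicknesses shows how quickly the pseudo-alloy regime is approached for a given pair of masses.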
Finally, it is noteworthy that the interaction with the remote bands, which adds $k^2$ terms in the dispersion relations, may play a non-negligible role in the band structure of these short-period SLs (Marzin, 1987). A test of the 3-D character of these SLs is the width $\Delta E$ of the first conduction subband. $\Delta E$ decreases exponentially with increasing barrier-layer thickness $L_B$ at large $L_B$, and varies as $1/L_B^2$ when $L_B$ goes to zero. Here, we get $\Delta E$ = 273 meV for $L_A = L_B$ = 20 Å and $\Delta E$ = 920 meV for $L_A = L_B$ = 10 Å.
FIG. 99. Band extrema of the InAs-GaAs short-period superlattice as a function of the individual layer thickness LA = LB, assuming a conduction band offset of 550 meV.
FIG. 100. Heavy-hole to conduction (solid line) and "light"-hole to conduction (dashed line) bandgaps of the InAs-GaAs superlattices as a function of the conduction band offset ΔEc (meV), for the case of equal layer thicknesses LA = LB.
VI. II-VI SUPERLATTICES: OPTICAL DETERMINATION OF THE BAND STRUCTURE
Semiconductor superlattices (SL) involving II-VI materials are new materials of great technical and fundamental interest. HgTe-CdTe (Faurie et al., 1982), CdTe-Hg1-xCdxTe (Reno et al., 1986), CdTe-Cd1-xMnxTe (Kolodziejski et al., 1984; Bicknell et al., 1984), ZnTe-CdTe (Monfroy et al., 1986) and ZnTe-HgTe (Faurie et al., 1986) SL's have recently been grown by molecular beam epitaxy, and experimental studies of the electronic properties of these heterostructures have been undertaken by different groups (Guldner et al., 1983; Olego et al., 1985; Hetzler et al., 1985; Ong et al., 1983; Berroir et al., 1986a; Reno et al., 1986b; Bicknell et al., 1985; Miles et al., 1986). These new systems exhibit very diverse characteristics, mainly because of the peculiar band structure of the II-VI materials, which can be either semiconductors or semimetals. II-VI superlattices involving a zero-gap (semimetallic) mercury compound and an open-gap semiconductor, such as HgTe-CdTe SL's, form a new class of heterostructures called Type III SL's. The band structure of these heterostructures can be calculated by using the L.C.A.O. (Schulman and McGill, 1979, 1981) or the envelope function (Bastard et al., 1981b, 1982, 1986) models, which give very similar results. In the present section, we shall first focus on the HgTe-CdTe system. It has been proposed, for instance, as a novel infrared material (Smith et al., 1983) for wavelengths around 10 μm. We shall present a survey of the optical data obtained on HgTe-CdTe SL's in the temperature range 2-300 K. The magneto-absorption measurements at low temperature as well as the infrared transmission experiments at 300 K are described. All the experimental data are interpreted in the envelope function formalism, and the value of the valence band discontinuity between HgTe and CdTe is discussed.
Finally, the electronic properties of some other II-VI SL systems recently grown by molecular beam epitaxy are briefly reviewed.

A. HgTe-CdTe SL Band Structure Calculations
The bulk band structure of HgTe and CdTe near the Γ point and the band line-up of these two materials are shown in Fig. 101. CdTe is an open-gap semiconductor with a direct gap at the Brillouin zone center. At k = 0, the conduction band Γ6 has s-type symmetry, whereas the upper valence band Γ8 is degenerate and has p-type symmetry (J = 3/2). The spin-orbit split-off Γ7 band (p-type symmetry, J = 1/2) is located below the Γ8 states, with a spin-orbit separation Δ = E(Γ8) - E(Γ7) ≈ 0.93 eV. HgTe is a zero-gap semiconductor (or a semimetal) due
to the inversion of the relative positions of the Γ6 and Γ8 edges. What was the Γ8 light hole band in CdTe forms the conduction band in HgTe, and the Γ6 conduction band in CdTe becomes a light hole band in HgTe. The ground valence band is the Γ8 heavy hole band, so that the Γ8 states represent both the top of the valence band and the bottom of the conduction band, yielding a zero-gap configuration. The spin-orbit separation Δ is 1.05 eV in HgTe. The evidence for the inverted structure of HgTe was mainly provided by magneto-optical measurements. The band structure of HgTe-CdTe SL's depends on the discontinuity Λ between the Γ8 band edges of HgTe and CdTe, this parameter being measured from the top of the CdTe valence band (Fig. 101). It has been shown that most of the HgTe-CdTe heterostructures grown by molecular beam epitaxy present p-type conduction at low temperature (Faurie et al., 1985a). From that observation, one can conclude that Λ must be positive; otherwise electron transfer would occur from the CdTe valence band to the HgTe conduction band, yielding an n-type conduction that would not be compatible with the experiments. As shown later from the analysis of the experimental data, Λ is found to be small and positive. The first experimental determination of Λ was obtained from far-infrared magneto-optical techniques and gave Λ ≈ 40 meV (Guldner et al., 1983). This positive value implies that the HgTe layers are potential wells for heavy holes, while the situation for light particles (electrons or light holes) is more complicated, because the bands which contribute most significantly to the light-particle SL states are the Γ8 conduction band in HgTe and the Γ8 light valence band in CdTe. These two bands have opposite curvatures and the same Γ8 symmetry.
FIG. 101. Band structure of bulk HgTe and CdTe at 4 K. The lh, hh and e indices refer to light holes, heavy holes and electrons, respectively.
This mass reversal for the light particles at each HgTe-CdTe interface is a unique property of the Type III SL's, in particular of HgTe-CdTe SL's. An important consequence of these very unusual features is the existence of interface states (Chang et al., 1985; Lin Liu and Sham, 1985; Bastard et al., 1981b, 1982, 1985) in the energy region (0, Λ), which are evanescent in both the HgTe and CdTe layers, with a wavefunction peaking at the interfaces. This special situation met in Type III SL's contrasts with the more common one corresponding, for instance, to GaAs-AlGaAs SL's (Type I), where the SL states arise mainly from bands in GaAs and AlGaAs displaying the same curvature. The simplest description of the SL band structure is obtained in the framework of the envelope function approximation (Bastard et al., 1981b, 1982, 1985; Altarelli, 1983, 1985, 1986), as shown in Sec. II. The band structure of both HgTe and CdTe near the Γ point is described in this approach by the Kane model, which takes into account the non-parabolicity of the Γ6 and light Γ8 bands. This non-parabolicity is important in HgTe, where the separation ε0 between the Γ6 and Γ8 edges is small. The interaction with the higher bands is included up to second order and is described by the Luttinger parameters γ1, γ2 = γ3 = γ̄ (spherical approximation). The band parameters of HgTe and CdTe at 300, 77 and 4 K used throughout this chapter are given in Table V. At 4 K, the Luttinger parameters are well known and are taken from Weiler (1981) and Lawaetz (1971) for HgTe and CdTe, respectively. The temperature variation of γ1 and γ̄ between 4 K and 300 K is assumed to arise essentially from the variation of the interaction gap ε0 between the Γ6 and Γ8 band edges. For a HgTe-CdTe heterostructure, the envelope function is a six-component spinor (Altarelli, 1983, 1985, 1986) in each kind of layer, if one
TABLE V. Band parameters of HgTe and CdTe at 4, 77 and 300 K. ε0 is the interaction energy gap between the Γ6 and Γ8 edges, Ep is related to the square of the Kane matrix element, and γ1, γ̄, κ are the Luttinger parameters of the Γ8 band.

                       HgTe       CdTe
ε0 (meV)   300 K       -122       1425
           77 K        -261       1550
           4 K         -302       1600
Ep (eV)                18         18
γ1         300 K       -44.8      5.15
           4 K         -15.5      5.29
γ̄          300 K       -23.55     2.12
           4 K         -8.9       1.89
κ          300 K       -25.50     1.50
           4 K         -10.85     1.27
considers only the Γ6 and Γ8 bands. A system of six differential equations for the six-component envelope function is established from the 6 x 6 Kane hamiltonian, and the boundary conditions are obtained by writing the continuity of the wavefunction at the interfaces and by integrating the coupled differential equations across an interface (Altarelli, 1983, 1985, 1986). Taking into account the SL periodicity d (Bloch theorem), the dispersion relations of the SL bands are obtained along the growth axis, which is usually the [111] direction, and in the plane of the layers. The model depends on a single unknown parameter, the valence band offset Λ, the others being well-established HgTe and CdTe bulk parameters as well as the HgTe and CdTe layer thicknesses d1 and d2, respectively. Figure 102 presents the calculated band structure of a (100 Å) HgTe-(36 Å) CdTe SL along kz (z being the [111] SL growth axis) and kx (x being a direction in the (111) plane of the layers). The zero of energy corresponds to the CdTe valence band edge, and the calculations are done for Λ = 40 meV and T = 4 K. The lowest conduction band, E1, the ground light-particle band, I, and the heavy hole bands, HH1, HH2, HH3, are shown.
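The temperature scaling of the Luttinger parameters mentioned above can be illustrated with a short numerical sketch. It assumes the usual Kane-model form for the gap-dependent part of each parameter, Ep/(3 ε0) for γ1 and Ep/(6 ε0) for γ̄ and κ; this explicit form is an assumption made here for illustration and is not written out in the text. The function name and argument names are hypothetical.

```python
# Sketch: scale 4 K Luttinger parameters to another temperature through the
# interaction gap eps0 only. ASSUMPTION (not stated in the text): the
# gap-dependent part is E_p/(3*eps0) for gamma1 and E_p/(6*eps0) for
# gamma_bar and kappa, as in the standard Kane-model relations.

E_P_MEV = 18000.0  # E_p = 18 eV for both materials (Table V)


def scale_luttinger(params_4k, eps0_4k, eps0_t, e_p=E_P_MEV):
    """Return (gamma1, gamma_bar, kappa) for an interaction gap eps0_t (meV).

    params_4k is the (gamma1, gamma_bar, kappa) triple at 4 K and eps0_4k the
    4 K interaction gap in meV (negative for the inverted HgTe structure).
    """
    gamma1, gamma_bar, kappa = params_4k
    shift = e_p * (1.0 / eps0_t - 1.0 / eps0_4k)
    return (gamma1 + shift / 3.0, gamma_bar + shift / 6.0, kappa + shift / 6.0)


# 4 K values and gaps taken from Table V
hgte_300k = scale_luttinger((-15.5, -8.9, -10.85), eps0_4k=-302.0, eps0_t=-122.0)
cdte_300k = scale_luttinger((5.29, 1.89, 1.27), eps0_4k=1600.0, eps0_t=1425.0)

print("HgTe 300 K:", [round(v, 2) for v in hgte_300k])
print("CdTe 300 K:", [round(v, 2) for v in cdte_300k])
```

Under this assumption the 300 K HgTe entries of Table V are recovered to within rounding, which suggests that the tabulated values follow a scaling of this kind.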
FIG. 102. Calculated band structure along kz ([111] axis) and kx (in the (111) plane) of a (100 Å) HgTe-(36 Å) CdTe superlattice. The zero of energy corresponds to the CdTe valence band edge, d is the superlattice period and Λ = 40 meV.
For kx = 0, the light-particle and the heavy-particle bands are completely decoupled. The I band lies in the forbidden energy region (0, Λ) for the light particles and corresponds to an interface state at k = 0, with an envelope function peaking at the interfaces (Chang et al., 1985; Lin Liu and Sham, 1985; Bastard et al., 1981b, 1982, 1986). This state results from the mass reversal occurring for the Γ8 light-particle band at each interface. The SL bandgap Eg is defined as the separation between E1 and HH1 at k = 0 and is 17 meV for the SL presented in Fig. 102. The calculated bandgap Eg is found to decrease when Λ is increased and can even become negative for large Λ. The width of the E1 band along kz is small and, as a consequence, the calculated electron effective mass along kz is found to be much larger than the very small value occurring in the Hg1-xCdxTe alloys with a similar bandgap. This might be an important advantage of the HgTe-CdTe SL's as infrared detector materials (Smith et al., 1983) compared to the corresponding Hg1-xCdxTe alloys, because of the reduction of the tunneling effects, which are usually important in small-gap materials. For kx ≠ 0, there is a hybridization between the I and the heavy hole subbands, which results in a complicated valence band structure. In particular, it can be seen in Fig. 102 that the in-plane mass of HH1 is rather light for small kx compared to the heavy hole mass in bulk HgTe (≈0.4 m0). This could explain the high hole mobility obtained in p-type SL's from Hall measurements (Faurie et al., 1985a). In this analysis, the strain effects due to the small lattice mismatch between HgTe and CdTe (≈0.3%) are assumed to be negligible. The effects of strain were calculated by different groups (Wu and McGill, 1985; Schulman and Chang, 1986; Berroir and Brum, 1987).
They found that strains change the band energies only by a few meV, and they have shown that the band structure of semiconducting SL's grown along the [111] direction is not significantly influenced by strain. The conduction band E1 is nearly unaffected, whereas the order of the light (I) and heavy hole (HH1) bands can be reversed at k = 0; the resulting valence band structure along kx is nevertheless nearly unaffected because of the strong hybridization between the I and HH1 bands (Schulman and Chang, 1986; Berroir and Brum, 1987). In no case can the strain effects strongly influence the experimental determination of the valence band offset Λ. Figure 103 shows the SL bandgap Eg and the corresponding cutoff wavelength λg calculated (Guldner et al., 1985) at 300 K, 77 K and 4 K using Λ = 40 meV for SL's with equally thick layers of HgTe and CdTe (d1 and d2, respectively). For each temperature, a narrowing of the SL bandgap is predicted when the layer thickness increases. More generally, when d1 ≠ d2, it is found that d1 essentially controls the SL bandgap, while d2 governs the width of the bands along kz and, therefore, the effective masses along the SL axis. Another important feature is that Eg increases when the temperature is raised, as observed in bulk Hg1-xCdxTe alloys with a similar energy gap (Weiler,
FIG. 103. Energy gap Eg and cutoff wavelength λg as a function of layer thickness for HgTe-CdTe SL's with equally thick HgTe and CdTe layers (d1 = d2).
1981). Nevertheless, the temperature variation is calculated to be smaller for a SL than for the ternary alloy. In Fig. 103, it can be noted that the cutoff wavelengths λg of interest for infrared detectors (8-12 μm) should be obtained at 77 K for layer thicknesses in the range 50-70 Å. Note that the small value of Λ used in the calculations presented here is consistent with the phenomenological common anion rule (McCaldin et al., 1976) and the L.C.A.O. approach of Harrison (1977, 1985), taking into account that the valence band energy depends essentially on the anion and that HgTe and CdTe are closely matched in lattice constant (within 0.3 percent). Nevertheless, recent theoretical results based on the role of interface dipoles do not support the common anion rule and predict a much larger value, Λ ≈ 0.5 eV (Tersoff, 1984a, 1984b, 1985, 1986). We shall show that the optical measurements interpreted in the envelope function formalism are consistent with a small positive offset between the HgTe and CdTe valence bands (Λ ≤ 0.12 eV), in agreement with the common anion rule for lattice-matched heterostructures.
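The correspondence between bandgap and cutoff wavelength used in Fig. 103 is simply λg = hc/Eg. A minimal sketch of the conversion follows; the function names are illustrative, and the only physical input is the value of hc.

```python
# Sketch: convert between a bandgap (meV) and the corresponding cutoff
# wavelength (micrometres) using lambda_g = h*c / E_g.

HC_EV_UM = 1.239841984  # h*c expressed in eV * micrometre


def cutoff_wavelength_um(gap_mev):
    """Cutoff wavelength in micrometres for a bandgap given in meV."""
    return HC_EV_UM / (gap_mev * 1e-3)


def gap_mev_for_cutoff(wavelength_um):
    """Bandgap in meV that gives the requested cutoff wavelength."""
    return 1e3 * HC_EV_UM / wavelength_um


# Bandgap window corresponding to the 8-12 micrometre detector range
detector_window_mev = (gap_mev_for_cutoff(12.0), gap_mev_for_cutoff(8.0))
print("Eg window for 8-12 um: %.0f to %.0f meV" % detector_window_mev)
```

The 8-12 μm window therefore corresponds to bandgaps of roughly 100 to 155 meV, which is the range the 50-70 Å layers are predicted to reach at 77 K.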
B. Magneto-Optical Measurements in HgTe-CdTe SL's
Useful information on the SL band structure can be obtained from far-infrared (F.I.R.) magneto-absorption experiments. When a strong magnetic field B is applied perpendicular to the layers, the SL bands are split into Landau levels. At low temperature, the F.I.R. transmission signal, recorded at fixed photon energy as a function of B, presents pronounced minima which correspond to resonant optical transitions between the different Landau levels. Intraband (namely cyclotron resonance) and interband magneto-optical transitions can be observed, depending on the Fermi level position at low temperature. From the theoretical analysis of the data, the SL band structure can be deduced (Guldner et al., 1983; Berroir et al., 1986a, 1986b). The four samples (S1, S2, S3, S4) used in the magneto-optical investigations reported here were grown by molecular beam epitaxy in the (111) orientation at low temperature (≈185°C) on (111) CdTe, CdZnTe or (100) GaAs substrates (Faurie et al., 1985a, 1985b, 1986). The HgTe and CdTe layer thicknesses (d1 and d2, respectively), the number of periods and the type of substrate for each sample are listed in Table VI. Because of the lower growth temperature used in M.B.E. compared to other epitaxial techniques such as L.P.E. or M.O.C.V.D., the interdiffusion between HgTe and CdTe layers is very small. The interdiffusion constant was recently measured between 110 and 185°C, and it turns out that for a 2 μm thick superlattice grown at 185°C, one can expect an interdiffused interface of 10 Å or less near the substrate (Arch et al., 1986). Samples S1, S3 and S4 are p-type at liquid helium temperature and undergo a p- to n-type transition when the temperature is raised. Sample S2 is n-type in the whole temperature range investigated (2-300 K), with a maximum Hall mobility μ ≈ 40,000 cm²/V s at 77 K.
TABLE VI. Characteristics of the HgTe-CdTe superlattices used in the magneto-optical investigations (d1 = HgTe layer thickness, d2 = CdTe layer thickness, n = number of periods). The samples are grown in the (111) orientation.

Sample   d1 (Å)   d2 (Å)   n     Substrate
S1       180      44       100   CdTe
S2       100      36       100   CdTe
S3       71       38       70    GaAs
S4       38       20       250   CdZnTe

The infrared magneto-absorption experiments reported here (Berroir et al., 1986b) were done at liquid helium temperature using a grating monochromator (3 μm ≤ λ ≤ 5 μm), a CO2 laser (9 μm ≤ λ ≤ 11 μm), a far-infrared laser (41 μm ≤ λ ≤ 255 μm) and carcinotrons (600 μm ≤ λ ≤ 1 mm). The magnetic field B, which was provided by a superconducting coil, was applied perpendicular to the plane of the SL layers. Figure 104 shows typical transmission spectra obtained in sample S1 for several infrared wavelengths, and Fig. 105 gives the energy positions of the transmission minima (i.e., absorption maxima) as a function of B for this sample. The observed transitions extrapolate to an energy ≈0 at B = 0, but they cannot be due to electron cyclotron resonance, because S1 is found to be p-type for T < 20 K, nor to hole cyclotron resonance, because they would lead to hole masses that are much too small. They are attributed to interband transitions from Landau levels of the topmost valence band HH1 up to Landau levels of the ground conduction band E1, arising in a zero-gap SL
FIG. 104. Typical transmission spectra observed in sample S1 as a function of the magnetic field B for different infrared wavelengths.
FIG. 105. Energy position of the transmission minima shown in Fig. 104 as a function of B (full dots). The solid lines are theoretical fits.
(Guldner et al., 1983). The band structure of sample S1 along kz (SL axis), calculated in the framework of the envelope function model using Λ = 40 meV and T = 4 K, is shown in Fig. 106, together with the band structure of bulk HgTe and CdTe. For this particular value of Λ, S1 presents a zero-gap configuration because E1 and HH1 are degenerate at kz = 0, and this is in qualitative agreement with the results presented in Fig. 105. Note that similar results were also obtained from L.C.A.O. calculations (Schulman and Chang, 1986). The energies E1(n) and HH1(n) of the Landau levels of index n = 0, 1, 2, ... associated with E1 and HH1 were calculated using an approximate model in which the influence of the higher bands is neglected (Bastard et al., 1981b, 1982, 1986). The selection rule for the interband magneto-optical transitions HH1(n) → E1(n') is taken to be n' - n = ±1, as for the interband Γ8 → Γ8 transitions in bulk HgTe (Groves et al., 1967; Tuchendler et al., 1973). The transition energies calculated using n' - n = -1 and Λ = 40 meV are shown in Fig. 105 (solid lines). For example, the curve labelled 1 → 0 corresponds to the transition HH1(1) → E1(0). Note that the experimental data could be interpreted equally well with the selection rule n' - n = +1, except for the transition 1 → 0. The agreement between theory and experiment is fairly good for Λ = 40 meV. The deviation from the theoretical fit of the experimental data for the 1 → 0
FIG. 106. (a) Band structure of bulk HgTe and CdTe at 4 K (Λ = 40 meV). (b) Calculated band structure of sample S1 along kz ([111] axis) at 4 K.
transition around 2.5 T is not understood at the moment. The calculated band structure of sample S1 (Fig. 106(b)) is confirmed by the observation of interband transitions from Landau levels of LH1, which is the topmost SL band arising from the Γ6 HgTe states, up to Landau levels of E1, in the photon energy range 300-400 meV. Figure 107(a) shows a typical magneto-transmission spectrum observed in this energy range in the Faraday configuration. The positions of the transmission minima are presented in Fig. 107(b), and the solid lines correspond to the transition slopes calculated using the approximate model neglecting the influence of the higher bands (Bastard et al., 1981b, 1982, 1986). The selection rules are taken to be Δn = ±1, as established for Γ6 → Γ8 magneto-optical transitions in bulk HgTe (Guldner et al., 1973) in the Faraday configuration. The observed broad minima (Fig. 107(a)) correspond to the two symmetric transitions n → n + 1 and n + 1 → n, which are not experimentally resolved. The agreement between theoretical and experimental slopes is rather good. The transitions converge to 344 meV at B = 0, while the energy separation between LH1 and E1 is calculated to be 330 meV at kz = 0. The 14 meV difference can be explained by the approximations of the model. It might also be explained by the 0.3% lattice mismatch between HgTe and CdTe, which results in an increase of the interaction gap in HgTe (Schulman and Chang, 1986) and, therefore, in an increase of the separation between LH1 and E1. Note that these observations rule out any appreciable interdiffusion between HgTe and CdTe layers. Indeed, in the case of strongly interdiffused HgTe layers, the interaction gap |ε0| of the resulting HgCdTe alloy would be significantly smaller than 302 meV, its
FIG. 107. (a) Transmission spectrum observed in sample S1 in the Faraday configuration for an infrared photon energy E = 376 meV. (b) Energy position of the observed transmission minima versus B (full dots). The solid lines are the calculated LH1 → E1 transition slopes using Λ = 40 meV, as described in the text.
value in pure HgTe at 4 K and, as a consequence, the energy separation between LH1 and E1 would be much smaller than 340 meV, which is not observed in the experiments. Quite different results are expected for sample S2, which is an open-gap SL (see the calculated band structure in Fig. 102) with n-type conduction at low temperature. Figure 108(a) shows typical transmission spectra obtained (Berroir et al., 1986a) for different F.I.R. wavelengths in sample S2. A single broad minimum is observed, whose energy position as a function of B is shown in Fig. 108(b). The transition extrapolates to an energy ≈0 at B = 0 and is attributed to cyclotron resonance arising in the E1 conduction band. The corresponding cyclotron mass at B ≈ 1 T is m = (0.017 ± 0.003) m0. No transmission spectra are obtained around 20 meV, which corresponds to the LO phonon energy in CdTe and to the reststrahlen band of the substrate. When the magnetic field is tilted from the normal to the layers, the line
FIG. 108. (a) Typical transmission spectra obtained at 1.6 K in sample S2 as a function of B for several infrared wavelengths. (b) Energy position of the transmission minima versus B (full dots). The dashed lines correspond to theoretical fits of the E1 cyclotron resonance.
becomes broader and the minimum is shifted to higher magnetic field because of the anisotropy of the E1 band (Fig. 102). To calculate the Landau level energies when a magnetic field is applied along the kz direction, the model (Berroir et al., 1986a) is formally the same as that used at B = 0, replacing k by k - (eA/c) in the Kane hamiltonian, where A is the vector potential, and taking into account the direct coupling of the electron and hole spins to the field by introducing the additional Luttinger parameter κ (Luttinger, 1956) (see Table V). As described in Sec. II.C, the motion parallel to the layers is described by a six-component vector (Fasolino and Altarelli, 1984) if the Γ7 components are neglected:

$$\psi_n = \left(C_1\varphi_{n-1},\; C_2\varphi_{n-2},\; C_3\varphi_n,\; C_4\varphi_n,\; C_5\varphi_{n-1},\; C_6\varphi_{n+1}\right)$$
where φn is the nth harmonic oscillator function and n = -1, 0, 1, 2, ... For n ≤ 1, the coefficients Ci corresponding to a negative oscillator index vanish. The calculated Landau levels associated with E1, I, HH1 and HH2 are shown in Fig. 109 using Λ = 40 meV. The situation is fairly complicated, and the Landau levels are strongly mixed due to the coupling between the interface state I and the heavy hole bands. The ground conduction level corresponds to n = 1 and the second level to n = 0. The first E1 intraband transitions fulfilling the selection rule Δn = +1 (cyclotron resonance) correspond to 1 → 2 and 0 → 1' (Fig. 109). The dashed lines in Fig. 108(b) are
FIG. 109. Calculated Landau levels corresponding to the E1, I, HH1 and HH2 bands (see Fig. 102) in the case of sample S2. The calculations are done for T = 4 K and Λ = 40 meV. For each band, the two Landau levels corresponding to the multi-component wavefunction ψn (see text) are noted n and n'.
the calculated energies of those two transitions using Λ = 40 meV. At low photon energies (E < 15 meV), the dashed lines correspond fairly well to the observed broad F.I.R. absorption, showing that both the n = 1 and n = 0 levels are populated. For E = 30 meV, the calculated magnetic field separation between the two lines is larger than the width of the observed absorption line. Only one transition, i.e., 1 → 2, is observed, indicating that only the n = 1 Landau level is populated at B ≈ 5 T. The interband transitions between valence and conduction Landau levels are not observable in the investigated F.I.R. region (0-30 meV) because of the population of the ground conduction levels and of the value of the superlattice bandgap. Such transitions have been investigated in the CO2 laser energy region (Berroir et al., 1986a), as shown in Fig. 110. Three transitions are observed in the energy region 110-130 meV and extrapolate to ≈20 meV at B = 0 (Fig. 110). They are interpreted as being due to HH1 → E1 magneto-optical transitions obeying the
FIG. 110. Energy position of the transmission minima (full dots) as a function of B corresponding to the interband HH1 → E1 transitions observed in sample S2 (100 Å HgTe-36 Å CdTe) at 1.6 K. The dashed lines correspond to the theoretical fits.
selection rule Δn = ±1. The dashed lines in Fig. 110 correspond to the transitions calculated using Δn = +1 and Λ = 40 meV. Given the width of the observed absorption lines, the experimental data could also be interpreted with the selection rule Δn = -1 but, for the sake of simplicity, only one type of transition is presented in Fig. 110. The results for sample S2 are consistent with a valence band offset Λ ≈ 40 meV. The sensitivity of the fitting procedure to the value of Λ was studied, and it turns out that an acceptable agreement between experiment and the calculated transitions can be obtained for Λ within the limits 0-100 meV, if one takes into account the uncertainties in the sample characteristics, in the data (broad absorption minima) and in the band parameters of HgTe and CdTe used in the model. In addition, the S2 bandgap becomes nearly zero for Λ > 100 meV, and interband transitions should then be observed, in addition to cyclotron resonance, in the 0-30 meV F.I.R. region. Figure 111 presents magneto-optical spectra obtained in sample S3. The observed transmission minima are again interpreted as interband magneto-optical transitions between HH1 and E1 Landau levels, and a bandgap Eg = (45 ± 10) meV is deduced by extrapolating the energy of the observed transitions to B = 0. Finally, for each sample, a precise determination of the SL bandgap Eg at low temperature can be obtained from magneto-absorption experiments (Berroir et al., 1986b), and Fig. 112 shows the value of Eg deduced from such experiments for samples S1, S2, S3 and S4. The solid lines in Fig. 112
FIG. 111. Magneto-transmission spectra at 2 K associated with HH1 → E1 transitions in sample S3.

FIG. 112. Variation of the superlattice bandgap Eg as a function of the HgTe layer thickness d1. The experimental data for samples S1, S2, S3 and S4 are given by the solid dots; for each sample, the first number corresponds to d1 and the second one to d2 (in Å). The solid lines are the theoretical variations Eg(d1) for three values of d2.
are the theoretical variations of Eg as a function of d1 calculated, as described in Sec. VI.A, for d2 = 20, 30 and 50 Å using Λ = 40 meV and T = 4 K. Experiments and theory are in very satisfactory agreement. An acceptable agreement can in fact be obtained for Λ within the limits 0-100 meV by taking into account the uncertainties in the sample characteristics, in the experimental data and in the band parameters of HgTe and CdTe.
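As a rough numerical companion to the cyclotron resonance analysis of sample S2, the relation between the resonance energy, the field and the cyclotron mass can be sketched as follows. The sketch assumes a simple parabolic band, for which the resonance energy is ħeB/m; the E1 band is in fact non-parabolic and the text quotes the mass at B ≈ 1 T only, so this is an order-of-magnitude check rather than the model used for the fits. Function names are illustrative.

```python
# Sketch: cyclotron energy for a given mass, and mass from a resonance,
# ASSUMING a parabolic band (hbar * omega_c = hbar * e * B / m).

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
M0 = 9.1093837015e-31       # free electron mass, kg


def cyclotron_energy_mev(b_tesla, mass_ratio):
    """Cyclotron energy in meV for a field B (T) and a mass m = mass_ratio * m0."""
    energy_joule = HBAR * E_CHARGE * b_tesla / (mass_ratio * M0)
    return 1e3 * energy_joule / E_CHARGE


def cyclotron_mass_ratio(energy_mev, b_tesla):
    """Mass in units of m0 deduced from a resonance at energy_mev and field B (T)."""
    energy_joule = energy_mev * 1e-3 * E_CHARGE
    return HBAR * E_CHARGE * b_tesla / (energy_joule * M0)


# Mass quoted in the text for sample S2 at B of about 1 T
energy_1t = cyclotron_energy_mev(1.0, 0.017)
print("resonance energy at 1 T for m = 0.017 m0: %.2f meV" % energy_1t)
```

With m = 0.017 m0 the resonance falls near 7 meV at 1 T, inside the far-infrared range (below 15 meV) where the broad cyclotron absorption is reported.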
C. HgTe-CdTe SL Infrared Transmission at 300 K
In order to determine the SL bandgap Eg at 300 K, infrared transmission measurements were performed (Reno et al., 1986b) between 50 and 600 meV on several SL's whose characteristics are reported in Table VII. The absorption coefficient α was obtained by taking the negative of the natural logarithm of the transmission spectrum and then dividing by the thickness of the SL. The energy bandgap was defined to be the energy where α is equal to 1000 cm⁻¹.

TABLE VII. Characteristics of the HgTe-CdTe superlattices presented in Fig. 114 (d1 = HgTe layer thickness, d2 = CdTe layer thickness). The superlattices are grown in the (111) orientation.

Sample   d1 (Å)   d2 (Å)
SL1      38       20
SL2      40       60
SL3      45       17
SL4      74       36
SL5      97       60
SL6      110      40
SL7      100      36
SL8      17       38
SL9      58       35
SL10     47       30
SL11     85       45
SL14     74       32
SL15     70       35
SL16     61       25
SL17     70       41
SL18     37       61
SL19     52       34
FIG. 113. Infrared transmission curve of SL16 (see Table VII) as a function of wavelength (μm).
Even though the accuracy of this determination is questionable, because the value of 1000 cm⁻¹ for α is rather arbitrary, it is found that the values of the bandgap determined in this way are in very good agreement with those obtained from the photoconductivity threshold at the same temperature (De Souza et al., unpublished results). Figure 113 exhibits a typical infrared transmission curve for a HgTe-CdTe superlattice. The value of the SL absorption coefficient is comparable to those measured in HgCdTe alloys with a similar energy gap. Figure 114 presents a comparison of the experimentally determined bandgap with the theoretical curves Eg(d1) calculated for d2 = 10, 20, 30 and 100 Å using Λ = 40 meV and T = 300 K. There is good agreement if one considers the uncertainties in the HgTe and CdTe parameters used in the theoretical calculation and in the experimental determination. The fact that the fit worsens for small d2 (samples SL1 and SL3, for instance) is most probably due to the increased effect of interdiffusion, which is estimated to be 10 Å for the first grown layers (Arch et al., 1986). The effect of interdiffusion is to shift Eg towards smaller energies, due to the decreasing CdTe layer thickness. It can be seen in Fig. 114 that the SL bandgap is essentially governed by d1 when d2 > 30 Å. Indeed, d2 governs the width of the subbands along kz, which strongly increases when d2 is decreased. For d2 > 30 Å, the band widths are small and Eg is nearly independent of d2. When d2 becomes less than 30 Å, the increase in the width of the E1 and HH1 bands strongly influences Eg = E1 - HH1, which is, therefore, no longer governed only by d1.
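The bandgap criterion described above (absorption coefficient from the logarithm of the transmission, then the energy at which α reaches 1000 cm⁻¹) can be sketched in a few lines. The transmission values and the 2 μm thickness below are hypothetical, chosen only to exercise the procedure; reflection losses are ignored, as in the definition given in the text, and the linear interpolation between data points is an implementation choice made here.

```python
import math

# Sketch of the 300 K bandgap criterion: alpha = -ln(T) / thickness, and the
# gap is the photon energy at which alpha first reaches 1000 cm^-1.


def absorption_coefficient(transmission, thickness_cm):
    """Absorption coefficient in cm^-1 from a transmission value (0 < T <= 1)."""
    return -math.log(transmission) / thickness_cm


def gap_at_threshold(energies_mev, alphas, threshold=1000.0):
    """Energy (meV) of the first upward crossing of the threshold.

    Uses linear interpolation between neighbouring points and returns None
    if alpha never crosses the threshold in the measured range.
    """
    points = list(zip(energies_mev, alphas))
    for (e_lo, a_lo), (e_hi, a_hi) in zip(points, points[1:]):
        if a_lo < threshold <= a_hi:
            return e_lo + (threshold - a_lo) * (e_hi - e_lo) / (a_hi - a_lo)
    return None


# HYPOTHETICAL data: a 2 micrometre thick superlattice and made-up transmissions
thickness_cm = 2.0e-4
energies = [50.0, 100.0, 150.0, 200.0, 250.0]
transmissions = [0.98, 0.95, 0.85, 0.70, 0.55]

alphas = [absorption_coefficient(t, thickness_cm) for t in transmissions]
gap = gap_at_threshold(energies, alphas)
print("estimated bandgap: %.1f meV" % gap)
```

Because the threshold is arbitrary, changing it shifts the extracted gap along the absorption edge, which is the source of the uncertainty acknowledged in the text.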
FIG. 114. Variation of the bandgap Eg of different HgTe-CdTe superlattices at 300 K as a function of the HgTe layer thickness d1. The sample characteristics are listed in Table VII, and the experimental data correspond to the solid circles (17 Å ≤ d2 ≤ 24 Å), crosses (25 Å ≤ d2 ≤ 34 Å) and open circles (35 Å ≤ d2 ≤ 60 Å). The solid lines are theoretical variations Eg(d1) for different values of d2 using T = 300 K and Λ = 40 meV.
From the results of Fig. 114, one can also deduce (Reno et al., 1986b) that the SL cutoff wavelength is easier to control than that of the Hg$_{1-x}$Cd$_x$Te alloy of the same bandgap. This effect, predicted by Smith et al. (1983), is another advantage of SL's over the ternary alloy as infrared detector materials. Even though the bandgap determination is rather arbitrary, it is clear that the absorption spectra cannot be interpreted with a large value of $\Lambda$. For $\Lambda$ > 200 meV, the calculated bandgap energy lies below the onset of the infrared absorption for each sample, and the best agreement between experiment and theory is again obtained for $\Lambda$ in the range 0-100 meV.

Resonant Raman scattering experiments (Olego et al., 1985; Olego and Faurie, 1986) were also carried out in HgTe-CdTe SL's in order to investigate the SL valence states arising from the spin-orbit split-off $\Gamma_7$ bands of HgTe and CdTe. The measurements were performed in backscattering geometry at 12 K with laser excitation in the neighborhood of the $\Gamma_6 - \Gamma_7$ edge of CdTe. From these experiments, one can conclude unambiguously that the $\Gamma_7$ holes are confined in the CdTe layers. That implies an upper limit of 120 meV for $\Lambda$, because the spin-orbit energy $\Delta = E_{\Gamma_8} - E_{\Gamma_7}$ is 1.05 eV (Weiler, 1981) and ~0.93 eV (Olego et al., 1985) in HgTe and CdTe, respectively. Recently, $\Lambda$ was also measured by X-ray photoemission spectroscopy (XPS) (Kowalczyk et al., 1986) and a much larger value, $\Lambda$ ~ 0.35 eV, was obtained, which supports the idea that lattice-matched heterojunctions with a common anion may present large valence-band discontinuities (Tersoff, 1984a, 1984b, 1985, 1986). The magneto-optical data at 2 K as well as the infrared transmission measurements at 300 K cannot be interpreted by using such a large valence band offset in the envelope function model. The reason for this discrepancy between the optical data and the XPS measurements is not understood at the moment, and more experiments are needed to clarify this point.

D. Other II-VI SL Systems
Cd$_{1-x}$Mn$_x$Te SL's have been grown successfully by M.B.E. (Kolodziejski et al., 1984; Bicknell et al., 1984). These heterostructures are type I SL's similar to GaAs-AlGaAs SL's, in which both electrons and holes are confined in the CdTe layers. Stimulated emission from an optically pumped Cd$_{1-x}$Mn$_x$Te-CdTe multilayer structure has been reported (Bicknell et al., 1985). Hg$_{1-x}$Cd$_x$Te-CdTe SL's have been grown recently (Reno et al., 1986b; Faurie, 1986), and magneto-optical and magneto-transport investigations are now in progress. For x ≈ 0.16, a type III → type I transition is expected at 4 K in these heterostructures (Reno et al., 1986b), corresponding to the semimetal → semiconductor transition which occurs in the alloys at x ≈ 0.16. The valence band offset must be comparable to its value in HgTe-CdTe SL's, so that Hg$_{1-x}$Cd$_x$Te-CdTe type I SL's should display a conduction band offset much larger than the valence band offset and should present interesting potential applications for avalanche photodetectors (Capasso, 1983). A similar system, Hg$_{1-x}$Mn$_x$Te-CdTe SL's (Chu et al., 1987; Boebinger et al., 1987), should present attractive properties due to the magnetic properties of HgMnTe alloys. An exchange interaction exists between the localized magnetic moments associated with the Mn$^{2+}$ ions and the conduction electrons. One can expect, for example, two-dimensional spin glasses to occur in such structures. The highly strained-layer ZnTe-HgTe and ZnTe-CdTe SL's, with a 6.5% mismatch between the lattice parameters of the two host materials, were also grown recently by M.B.E. (Monfroy et al., 1986; Faurie et al., 1986) and look promising for basic physics and as an infrared material (Faurie, 1986).

The II-VI SL's, in particular the type III SL's involving a II-VI zero-gap compound, widen the field of two-dimensional systems in a very interesting way. Their band structures are more complicated and subtle than those of the usual III-V compound systems, and optical and magneto-optical measurements appear to be a powerful tool to investigate their electronic properties.
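The 120 meV upper limit on $\Lambda$ inferred from the resonant Raman data in Section C follows from a short band-alignment argument, sketched below. It assumes that $\Lambda$ denotes the energy of the HgTe $\Gamma_8$ edge measured from the CdTe $\Gamma_8$ edge, and that confinement of the $\Gamma_7$ holes in CdTe simply means that the $\Gamma_7$ edge of CdTe lies above that of HgTe; strain and confinement energies are neglected in this sketch.

```latex
\begin{align}
E_{\Gamma_7}^{\mathrm{HgTe}} &= E_{\Gamma_8}^{\mathrm{HgTe}} - \Delta_{\mathrm{HgTe}},
\qquad
E_{\Gamma_7}^{\mathrm{CdTe}} = E_{\Gamma_8}^{\mathrm{CdTe}} - \Delta_{\mathrm{CdTe}}, \\
\Lambda &\equiv E_{\Gamma_8}^{\mathrm{HgTe}} - E_{\Gamma_8}^{\mathrm{CdTe}}, \\
E_{\Gamma_7}^{\mathrm{CdTe}} > E_{\Gamma_7}^{\mathrm{HgTe}}
\;&\Longrightarrow\;
\Lambda < \Delta_{\mathrm{HgTe}} - \Delta_{\mathrm{CdTe}}
\approx 1.05\ \mathrm{eV} - 0.93\ \mathrm{eV} = 0.12\ \mathrm{eV}.
\end{align}
```

With the same convention, the XPS value $\Lambda$ ~ 0.35 eV would place the $\Gamma_7$ edge of HgTe above that of CdTe, which is the opposite of the confinement deduced from the Raman experiments.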
ACKNOWLEDGEMENTS

We are much indebted to M. Voos, J. M. Berroir, J. Bleuse, J. A. Brum, F. Gerbier, J. P. Hirtz, M. H. Meynadier, J. Orgonasi and J. P. Vieren for their active participation in the work reported here. We gratefully acknowledge the generous supply of high-quality heterostructures by Drs. L. Esaki and L. L. Chang (I.B.M. Yorktown Heights, U.S.A.), F. Alexandre and J. L. Lievin (C.N.E.T. Bagneux, France), G. Weimann (D.B.P. Darmstadt, F.R.G.), P. M. Frijlink (L.E.P. Limeil-Brévannes, France), A. Regreny (C.N.E.T. Lannion, France), M. Tamargo (Bell Communications Research, Red Bank, U.S.A.), J. P. Faurie (University of Illinois, Chicago, U.S.A.) and M. Razeghi (Thomson C.S.F., Corbeville, France). The Groupe de Physique des Solides is Unité Associée au C.N.R.S. This work has been supported in part by the GRECO “Expérimentations Numériques”.
REFERENCES

Abstreiter, G., Cardona, M., and Pinczuk, A. (1984). In Light Scattering in Solids IV (M. Cardona and G. Guntherodt, eds.) and references cited therein. Springer, Berlin, West Germany.
Abstreiter, G., Prechtel, U., Weimann, G., and Schlapp, W. (1986a). Surface Science 174, 312.
Abstreiter, G., Brugger, H., Wolf, T., Jorke, H., and Herzog, H. J. (1986b). Surface Science 174, 640.
Alavi, K., Pearsall, T. P., Forrest, S. R., and Cho, A. Y. (1983). Electronic Letters 19, 227.
Alibert, C., Gaillard, S., Brum, J. A., Bastard, G., Frijlinck, P. M., and Erman, R. (1985). Solid State Commun. 53, 457.
Altarelli, M. (1983). Phys. Rev. B 28, 842; (1986). In Heterojunctions and Semiconductor Superlattices. Springer-Verlag, Berlin, West Germany; and (1985). J. of Lumines. 30, 472.
Ancilotto, F., Fasolino, A., and Maan, J. C. (1987). Proc. 2nd Int. Conf. Superlattices, Microstructures and Microdevices. Goteborg, 1986. (To be published 1987.)
Ando, T. (1982). J. Phys. Soc. Japan 51, 3893.
Ando, T. (1983). J. Phys. Soc. Japan 52, 1740. Ibidem (1984). 53, 3101. Ibidem (1984). 53, 3126.
Ando, T. (1985). J. Phys. Soc. Japan 54, 1528.
Ando, T., Fowler, A. B., and Stern, F. (1982). Rev. of Mod. Phys. 54, 437.
Andre, J. P., Dupont-Nivet, E., Moroni, D., Patillon, J. N., Erman, M., and Ngo, T. (1986). J. Crystal Growth 71, 354.
Arakawa, Y., Sakaki, H., Nishioka, M., Yoshino, J., and Kamiya, T. (1985). Appl. Phys. Lett. 46, 519.
Arch, D. K., Chow, P. P., Hibbs-Brenner, M., Faurie, J. P., and Staudenmann, J. L. (1986). J. Vac. Sci. Technol. A4(4), 2101.
Austin, E. J., and Jaros, M. (1985). Phys. Rev. B 31, 5569.
Bangert, E., and Landwehr, G. (1985). Superl. and Microstr. 1, 363; (1986). Surf. Sci. 170, 593.
Bastard, G. (1981a). Phys. Rev. B 24, 4174.
Bastard, G. (1981b). Phys. Rev. B 24, 5693.
Bastard, G. (1982). Phys. Rev. B 25, 7584.
Bastard, G. (1984). Surf. Sci. 142, 284.
Bastard, G. (1986). Surf. Sci. 170, 426.
Bastard, G. (1988). Wave Mechanics Applied to Semiconductor Heterostructures. Les Editions de Physique, Les Ulis, France.
Bastard, G., and Brum, J. A. (1986). IEEE J. Quantum Electron. QE-22, 1625.
Bastard, G., and Voos, M. (1985). Unpublished.
Bastard, G., Mendez, E. E., Chang, L. L., and Esaki, L. (1982). Phys. Rev. B 26, 1974.
Bastard, G., Mendez, E. E., Chang, L. L., and Esaki, L. (1983). Phys. Rev. B 28, 3241.
Bastard, G., Delalande, C., Meynadier, M. H., Frijlinck, P. M., and Voos, M. (1984a). Phys. Rev. B 29, 7042.
Bastard, G., Ziemelis, U. O., Delalande, C., Voos, M., Gossard, A. C., and Wiegmann, W. (1984b). Solid State Comm. 49, 671.
Bastard, G., Berroir, J. M., and Brum, J. A. (1987). In Optical Properties of Narrow-Gap Low-Dimensional Structures, NATO ASI Series B: Physics Vol. 152. Plenum Press, New York, New York.
Bauer, G., and Ando, T. (1985). Phys. Rev. B 31, 8321.
Bauer, G. E. W., and Ando, T. (1986a). J. Phys. C 19, 1537.
Bauer, G. E. W., and Ando, T. (1986b). Phys. Rev. B 34, 1300.
Bauer, G. E. W., and Ando, T. (1987). Journ. de Phys. 48, Colloque C5, 215.
Ben Daniel, D. J., and Duke, C. B. (1966). Phys. Rev. 152, 683.
Belle, G., Maan, J. C., and Weimann, G. (1985). Surf. Sci. 170, 611.
Berroir, J. M. (1985). Unpublished.
Berroir, J. M., and Brum, J. A. (1987). Superlattices and Microstructures 3, 239.
Berroir, J. M., Guldner, Y., Vieren, J. P., Voos, M., and Faurie, J. P. (1986a). Phys. Rev. B 34, 891.
Berroir, J. M., Guldner, Y., and Voos, M. (1986b). IEEE Journal of Quantum Electronics QE-22, 1793.
Bicknell, R. N., Yanka, R. W., Giles-Taylor, N. C., Blanks, D. K., Buckland, E. L., and Schetzina, J. F. (1984). Appl. Phys. Lett. 45, 92.
Bicknell, R. N., Giles-Taylor, N. C., Schetzina, J. F., Anderson, N. G., and Laidig, W. D. (1985). Appl. Phys. Lett. 46, 238.
Bimberg, D., Christen, J., Steckenborn, A., Weimann, G., and Schlapp, W. (1985). In High Excitation and Short Pulse Phenomena (M. H. Pilkuhn, ed.). North-Holland, Amsterdam, The Netherlands, 562.
Bimberg, D., Mars, D., Miller, J. N., Bauer, R., and Oertel, D. J. Vac. Techn. To be published.
Bir, G. L., and Pikus, G. E. (1974). Symmetry and Strain-Induced Effects in Semiconductors. Wiley, New York, New York.
Boebinger, G. S., Guldner, Y., Berroir, J. M., Voos, M., Vieren, J. P., and Faurie, J. P. (1987). Phys. Rev. B 36, 7930.
Bohm, D. (1951). Quantum Theory. Prentice-Hall, New York, New York.
Broido, D. A., and Sham, L. J. (1985). Phys. Rev. B 31, 888.
Brum, J. A. (1987). Ph.D. Thesis, Paris (unpublished).
Brum, J. A., and Bastard, G. (1985a). J. Phys. C 18, L-789.
Brum, J. A., and Bastard, G. (1985b). Phys. Rev. B 31, 3893.
Brum, J. A., and Bastard, G. (1987). Superlattices and Microstructures 3, 51.
Brum, J. A., Priester, C., and Allan, G. (1985). Phys. Rev. B 32, 2378.
Brum, J. A., Voisin, P., Bastard, G., Voos, M., Maan, J. C., Chang, L. L., and Esaki, L. (1988). Surf. Sci. 196, 545.
Burkhard, H., Schlapp, W., and Weimann, G. (1986). Surf. Sci. 174, 387.
Calleja, J. M., Meseguer, F., Tejedor, C., Mendez, E. E., Chang, C.-A., and Esaki, L. (1986). Surf. Sci. 168, 558.
Capasso, F. (1983). J. Vac. Sci. Technol. B 1(2), 457.
Capasso, F., Luryi, S., Tsang, W. T., Bethea, C. G., and Levine, B. F. (1983). Phys. Rev. Lett. 51, 2318.
Capasso, F., Mohammed, K., and Cho, A. Y. (1986). IEEE J. Quantum Electr. QE-22, 1853.
Caruthers, E., and Lin-Chung, P. J. (1978). Phys. Rev. B 17, 2705.
Chan, K. S. (1986). J. Phys. C 19, L-125.
Chang, Y. C. (1987). Journ. de Phys. 48, Colloque C5, 373.
Chang, Y. C., and Sanders, G. D. (1985). Phys. Rev. B 32, 8321.
Chang, Y. C., and Schulman, J. N. (1983). Appl. Phys. Lett. 43, 536; (1985). Phys. Rev. B 31, 2069.
Chang, L. L., Sai-Halasz, G. A., Esaki, L., and Aggarwal, R. L. (1981). J. Vac. Sci. Techn. 19, 589.
Chang, Y. C., Schulman, J. N., Bastard, G., Guldner, Y., and Voos, M. (1985). Phys. Rev. B 31, 2557.
Chaves, A. S., Penna, A. F. S., Worlock, J. M., Weimann, G., and Schlapp, W. (1986). Surf. Sci. 170, 618.
Chemla, D. S., Miller, D. A. B., Smith, P. W., Gossard, A. C., and Wiegmann, W. (1984). IEEE J. of Quantum Electronics QE-20, 265, and references cited therein.
Chomette, A., Deveaud, B., Emery, J. Y., and Regreny, A. (1985). Superlattices and Microstructures 1, 201.
Chomette, A., Deveaud, B., Regreny, A., and Bastard, G. (1986). Phys. Rev. Lett. 57, 1464.
Christen, J., Bimberg, D., Steckenborn, A., and Weimann, G. (1984). Appl. Phys. Lett. 44, 84.
Chu, X., Sivananthan, S., and Faurie, J. P. (1987). Appl. Phys. Lett. 50, 597.
Colvard, C., Merlin, R., Klein, M. V., and Gossard, A. C. (1980). Phys. Rev. Lett. 45, 198.
Danan, G. (1988). Thèse de Doctorat, Paris (unpublished).
Danan, G., Jean-Louis, A. M., Alexandre, F., Jusserand, B., Leroux, G., Marzin, J. Y., Mollot, F., Planel, R., and Etienne, B. (1987). Proceedings 18th International Conference on the Physics of Semiconductors.
Dawson, P., Duggan, G., Ralph, H. I., and Woodbridge, K. (1983). Phys. Rev. B 28, 7381.
Dawson, P., Moore, K. J., Duggan, G., Ralph, H. I., and Foxon, C. T. B. (1986). Phys. Rev. B.
Delagebeaudeuf, D., and Linh, N. T. (1982). IEEE Trans. Electron. Devices ED-29, 955.
Delalande, C., Meynadier, M. H., and Voos, M. (1985). Phys. Rev. B 31, 2497.
Delalande, C., Orgonasi, J., Meynadier, M. H., Brum, J. A., Bastard, G., Weimann, G., and Schlapp, W. (1986). Solid State Commun. 59, 613.
Delalande, C., Brum, J. A., Orgonasi, J., Meynadier, M. H., Bastard, G., Maan, J. C., Weimann, G., and Schlapp, W. (1987). Microstructures and Superlattices 3, 29.
De Souza, M., Boukerche, M., and Faurie, J. P. Unpublished results.
Deveaud, B., Emery, J. Y., Chomette, A., Lambert, B., and Baudet, M. (1984). Appl. Phys. Lett. 45, 1078.
Deveaud, B., Emery, J. Y., Chomette, A., Lambert, B., and Baudet, M. (1985). Appl. Phys. Lett. 45, 1078.
Deveaud, B., Chomette, A., Lambert, B., Regreny, A., Romestain, R., and Edel, P. (1986). Solid State Comm. 57, 885.
Di Giuseppe, M. A., Temkin, H., Peticolas, L., and Bonner, W. A. (1983). Appl. Phys. Lett. 43, 906.
Dingle, R. (1975). In Festkorperprobleme XV (H. J. Queisser, ed.). Pergamon Vieweg, Braunschweig, p. 21.
Dingle, R., and Wiegmann, W. (1975). J. of Appl. Phys. 46, 4312.
Dingle, R., Wiegmann, W., and Henry, C. H. (1974). Phys. Rev. Lett. 33, 827.
Dingle, R., Gossard, A. C., and Wiegmann, W. (1975). Phys. Rev. Lett. 34, 1327.
Doezema, R., and Drew, H. D. (1986). Phys. Rev. Lett. 57, 762.
Dohler, G. H. (1986a). IEEE Journal of Quant. Electr. QE-22, 1682.
Dohler, G. H. (1986b). In Two-Dimensional Systems: Physics and New Devices. Solid State Sciences 67, Springer-Verlag, Berlin, West Germany, p. 270.
Dohler, G. H., Fasol, G., Low, T. S., Miller, J. N., and Ploog, K. (1986). Solid State Comm. 57, 563.
Drummond, T. J., Klem, J., Arnold, D., Fisher, R., Thorne, R. E., Lyons, W. G., and Morkoc, H. (1983). Appl. Phys. Lett. 42, 615.
Duffield, T., Bhat, R., Koza, M., De Rosa, F., Hwang, D. M., Grabbe, P., and Allen, S. J., Jr. (1986). Phys. Rev. Lett. 56, 2724.
Duggan, G., Ralph, H. I., Chan, K. S., and Elliott, R. J. (1985). Proceedings of the MRS Conference 47, Les Editions de Physique.
Duggan, G. (1985). J. Vac. Sci. Technol. B3, 1224.
Eisenstein, J. P., Stormer, H. L., Narayanamurti, V., Gossard, A. C., and Wiegmann, W. (1984). Phys. Rev. Lett. 53, 2579.
Ekenberg, U., and Altarelli, M. (1984). Phys. Rev. B 30, 3369.
Ekenberg, U., and Altarelli, M. (1986). Superl. and Micros. In press.
Englert, T., Maan, J. C., Uihlein, Ch., Tsui, D. C., and Gossard, A. C. (1983). Physica 117B & 118B, 631.
Erhardt, W., Staghuhn, W., Byszewski, P., von Ortenberg, M., Landwehr, G., Weimann, G., van Bockstal, L., Janssen, P., Herlach, F., and Witters, J. (1986). Surf. Sci. 170, 581.
Esaki, L. (1980). In Narrow Gap Semiconductors-Physics and Applications. Lecture Notes in Physics 133, Springer-Verlag, Berlin, West Germany.
Fasolino, A., and Altarelli, M. (1984). Surf. Sci. 142, 322.
Fasolino, A., and Altarelli, M. (1986). Surf. Sci. 170, 606.
Faurie, J. P. (1986). IEEE Journal of Quant. Electr. QE-22 (9), 1656.
Faurie, J. P., Million, A., and Piaguet, J. (1982). Appl. Phys. Lett. 41, 713.
Faurie, J. P., Boukerche, M., Sivananthan, S., Reno, J., and Hsu, C. (1985a). Superlattices and Microstructures 1, 237.
Faurie, J. P., Reno, J., and Boukerche, M. (1985b). J. of Cryst. Growth 72, 11.
Faurie, J. P., Hsu, C., Sivananthan, S., and Chu, X. (1986). Surface Science 168, 473 and references therein.
Faurie, J. P., Sivananthan, S., Chu, X., and Wijewarnasuriya, P. S. (1986). Appl. Phys. Lett. 48, 785.
Fischer, R., Masselink, W. T., Sun, Y. L., Drummond, J., Chang, Y. C., Klein, M. V., and Morkoç, H. (1984). J. Vac. Sci. Tech. B2, 117.
Flores, F., and Tejedor, C. (1979). J. Phys. C 12, 731.
Forchel, A., Cebulla, U., Trankle, G., Kroemer, H., Subbanna, S., and Griffiths, G. (1986). Surface Science 174, 143.
Fouquet, J. E., Siegman, A. E., Burham, R. D., and Paoli, T. L. (1985). Appl. Phys. Lett. 46, 374.
Franz, W. (1958). Z. Naturforsch. 13a, 484.
Frijlink, P. M. (1986). In Heterojunctions and Semiconductor Superlattices (G. Allan, G. Bastard, N. Boccara, M. Lanoo, and M. Voos, eds.). Springer-Verlag, Berlin, West Germany.
Fritz, I. J., Dawson, L. R., and Zipperian, T. P. (1983). Appl. Phys. Lett. 43, 846.
Fukunaga, T., Kobayashi, K. L. I., and Nakashima, H. (1986). Surf. Sci. 174, 71.
Gal, M., Kuo, C. P., Lee, B., Ranganathan, R., Taylor, P. C., and Stringfellow, G. B. (1986). Phys. Rev. B 34, 1356.
Gerbier, F. Private Communication.
Glembocki, O. J., Shanabrook, B. V., Bottka, N., Beard, W. T., and Comas, J. (1985). Appl. Phys. Lett. 46, 970.
Gobel, E. O., Jung, H., Kuhl, J., and Ploog, K. (1983). Phys. Rev. Lett. 51, 1588.
Gobel, E. O., Kuhl, J., and Hoger, R. (1985). J. of Lumines. 30, 541, and references cited therein.
Goetz, K. H., Bimberg, D., Jürgensen, H., Selders, J., Solomonov, A. V., Glinskii, G. F., and Razeghi, M. (1983). J. Appl. Phys. 54, 4543.
Goldstein, L., Jean-Louis, A. M., Marzin, J. Y., Allavon, M., Alibert, C., and Gaillard, S. (1985). In GaAs and Related Compounds, Biarritz. Institute of Physics Conference Series number 74. Adam Hilger Ltd, Bristol and Boston, page 133.
Greene, R. L., and Bajaj, K. K. (1983). Solid State Commun. 45, 825.
Greene, R. L., and Bajaj, K. K. (1985). Phys. Rev. B 31, 913.
Greene, R. L., Bajaj, K. K., and Phelps, D. E. (1984). Phys. Rev. B 29, 1807.
Griffiths, G., Mohammed, K., Subbanna, S., Kroemer, H., and Merz, J. L. (1983). Appl. Phys. Lett. 43, 1059.
Groves, S. H., Brown, R. N., and Pidgeon, C. R. (1967). Phys. Rev. 161, 779.
Guldner, Y. Unpublished results.
Guldner, Y., Rigaux, C., Grynberg, M., and Mycielski, A. (1973). Phys. Rev. B 8, 3875.
Guldner, Y., Vieren, J. P., Voisin, P., Voos, M., Chang, L. L., and Esaki, L. (1981). Phys. Rev. Lett. 45, 1716.
Guldner, Y., Bastard, G., Vieren, J. P., Voos, M., Faurie, J. P., and Million, A. (1983). Phys. Rev. Lett. 51, 907.
Guldner, Y., Bastard, G., and Voos, M. (1985). J. Appl. Phys. 57, 1403.
Harrison, W. (1977). J. Vac. Sci. Technol. 14, 1016 and (1985) B 3 (4), 1231.
Harrison, W. (1985). In Two-Dimensional Systems: Physics and New Devices, Solid State Sciences 67, Springer-Verlag, Berlin, West Germany, p. 62.
Hayakawa, T., Suyama, T., Takahashi, K., Kondo, M., Yamamoto, S., Yano, S., and Hijikata, T. (1986). Surface Science 174, 76.
Hegarty, J., Sturge, M. D., Weisbuch, C., Gossard, A. C., and Wiegmann, W. (1982). Phys. Rev. Lett. 49, 930.
Hegarty, J., Goldner, L., and Sturge, M. D. (1984). Phys. Rev. B 30, 7346.
Heiblum, M., Mendez, E. E., and Stern, F. (1984). Appl. Phys. Lett. 44, 1046.
Hetzler, S., Baukus, J. P., Hunter, A. T., Faurie, J. P., Chow, P. P., and McGill, T. C. (1985). Appl. Phys. Lett. 47, 260.
Hino, I., and Suzuki, T. (1984). J. Crystal Growth 68, 483.
Hirth, J. P., and Lothe, J. (1968). Theory of Dislocations. McGraw-Hill, New York, New York.
Hopfel, R. A., Shah, J., Gossard, A. C., and Wiegmann, W. (1985). Physica 134B, 174.
Houdré, R., Hermann, C., Lampel, G., Frijlink, P. M., and Gossard, A. C. (1985). Phys. Rev. Lett. 55, 734.
Ihm, J., Lam, P. K., and Cohen, M. L. (1979). Phys. Rev. B 20, 4120.
Inoue, K., Sakaki, H., and Yoshino, J. (1984). Japan. Journ. Appl. Phys. 23, L-767.
Ishibashi, A., Mori, Y., Itabashi, M., and Watanaka, N. (1985). J. Appl. Phys. 58, 2691.
Iwamura, H., Kobayashi, H., and Okamoto, H. (1984). Jap. J. Appl. Phys. 23, L-795.
Iwamura, H., Saku, T., and Okamoto, H. (1985). Jpn. J. Appl. Phys. 24, 104.
Jaros, M., Wong, K. B., and Gell, M. (1985). Phys. Rev. B 31, 1205.
Jiang, T. F. (1984). Solid St. Commun. 50, 589.
Johnson, E. J. (1967). In Semiconductors and Semimetals, Vol. 3 (R. K. Willardson and A. C. Beer, eds.). Academic Press, New York, New York, p. 153.
Jones, E. D., Ackermann, H., Shirber, J. E., Drummond, T. J., Dawson, L. R., and Fritz, I. J. (1985). Solid State Comm. 55, 525.
Jusserand, B., and Paquet, D. (1986). In Heterojunctions and Semiconductor Superlattices. Springer, Berlin, West Germany, and references cited therein.
Jusserand, B., Voisin, P., Voos, M., Chang, L. L., Mendez, E. E., and Esaki, L. (1985). Appl. Phys. Lett. 46, 678.
Kane, E. O. (1957). J. Phys. Chem. Solids 1, 249.
Kash, J. A., Mendez, E. E., and Morkoc, H. (1985). Appl. Phys. Lett. 46, 173.
Kasper, E. (1986). Surface Science 174, 630.
Kato, H., Iguchi, N., Chica, S., Nakayama, M., and Sano, N. (1986). J. Appl. Phys. 59, 588.
Kawamura, Y., Wakita, K., and Asaki, H. (1985). Electron. Lett. 21, 371.
Keldysh, L. V. (1985). Soviet Phys. JETP 7, 788.
Kleinman, D. A. (1983). Phys. Rev. B 28, 871.
Kleinman, D. A. (1985). Phys. Rev. B 32, 3766; (1986). Phys. Rev. B 33, 2540.
Kleinman, D. A., and Miller, R. C. (1985). Phys. Rev. B 32, 2266.
Klipstein, P. C., Tapster, P. R., Apsley, N., Anderson, D. A., Skolnick, M. S., Kerr, T. M., and Woodbridge, K. (1986). J. Phys. C 19, 857.
Knox, R. (1963). Theory of Excitons, Solid State Physics Supplement 5. Academic Press, New York, New York.
Kodoma, K., Ozeki, M., and Komeno, J. (1983). J. Vac. Sci. Technol. B 1, 696.
Kolodziejski, L. A., Bonsett, T. C., Gunshor, R. L., Datta, S., Bylsma, R. B., Becker, W. M., and Otsuka, N. (1984). Appl. Phys. Lett. 45, 440.
Kowalczyk, S. P., Cheung, J. T., Kraut, E. A., and Grant, R. W. (1986). Phys. Rev. Lett. 56, 1605.
Kriechbaum, M. (1986). In Two-Dimensional Systems: Physics and New Devices, Solid State Sciences 67, Springer-Verlag, Berlin, West Germany, p. 120.
Kroemer, H. (1986). Surface Science 174, 299.
Kunzel, H., Dohler, G. H., Ruden, P., and Ploog, K. (1982). Appl. Phys. Lett. 41, 852.
Kuo, C. P., Vong, S. K., Cohen, R. M., and Stringfellow, G. B. (1985a). J. Appl. Phys. 57, 5428.
Kuo, C. P., Fry, K. L., and Stringfellow, G. B. (1985b). Appl. Phys. Lett. 47, 855.
Kyoto, “Electronic Properties of Two-Dimensional Systems” (1985). Surf. Sci. 170, 1-767 (1986); see also Kyoto, “Modulated Semiconductor Structures” (1985). Surf. Sci. 174, 1-700.
Landau, L., and Lifchitz, E. (1967). Theory of Elasticity. MIR, Moscou, Chap. 4.
Larsen, D. M. (1968). J. Phys. Chem. Solids 29, 271.
Lassnig, R. (1985). Phys. Rev. B 31, 8076.
Lawaetz, P. (1971). Phys. Rev. B 4, 3460.
Lambert, B., Deveaud, B., Regreny, A., and Talalaeff, G. (1982). Solid State Comm. 43, 443.
Landolt-Bornstein (1982). Numerical Data and Functional Relationships in Science and Technology (O. Madelung, ed.), Group III, Vol. 17, Springer-Verlag, Berlin, West Germany.
Lederman, F. L., and Dow, J. D. (1976). Phys. Rev. B 13, 1633.
Lin Liu, Y. R., and Sham, L. J. (1985). Phys. Rev. B 32, 5561.
Lommer, G., Malcher, F., and Rossler, U. (1985). Phys. Rev. B 32, 6965.
Luttinger, J. M. (1956). Phys. Rev. 102, 1030.
Maan, J. C. (1984). In Two-dimensional Systems, Heterostructures and Superlattices, Springer Series in Solid State Sciences 53, Springer-Verlag, Berlin, West Germany.
Maan, J. C., Guldner, J. P., Vieren, J. P., Voisin, P., Voos, M., Chang, L. L., and Esaki, L. (1984). Solid State Commun. 39, 683.
MacDonald, A. H., and Ritchie, D. S. (1986). Phys. Rev. B 33, 8326; MacDonald, A. H., and Ritchie, D. S. Private communication.
Mailhiot, C., Chang, Y.-C., and McGill, T. C. (1982). Phys. Rev. B 26, 4449.
Marsh, J. H., Roberts, J. S., and Claxton, P. A. (1985). Appl. Phys. Lett. 46, 1161.
Marzin, J. Y. (1987). Thèse de Doctorat, Paris.
Marzin, J. Y., and Goldstein, L. Private communication.
Marzin, J. Y., Charasse, M. N., and Sermage, B. (1985). Phys. Rev. B 31, 8298.
Masselink, W. T., Chang, Y.-C., and Morkoc, H. (1983). Phys. Rev. B 28, 7373.
Masselink, W. T., Sun, Y. L., Fisher, R., Drummond, T. J., Chang, Y. C., Klein, M. V., and Morkoc, H. (1984). J. Vac. Sci. Technol. B 2, 117.
Masselink, W. T., Chang, Y.-C., and Morkoc, H. (1985). Phys. Rev. B 32, 5190.
Masumoto, Y., Shionoya, S., and Kawaguchi, H. (1984). Phys. Rev. B 29, 2324.
Masumoto, Y., Matsuura, M., Tarucha, S., and Okamoto, H. (1985). Phys. Rev. B 32, 4275.
Matthews, J. W., and Blakeslee, A. E. (1974). Journal of Crystal Growth 27, 118.
Matthews, J. W., and Blakeslee, A. E. (1975). Journal of Crystal Growth 29, 273.
Matthews, J. W., and Blakeslee, A. E. (1976). Journal of Crystal Growth 32, 265.
McCaldin, J. O., McGill, T. C., and Mead, C. A. (1976). Phys. Rev. Lett. 36, 56.
McCombe, B. D., Jarosik, N. C., and Mercy, J. M. (1986). In Two-Dimensional Systems: Physics and New Devices, Solid State Sciences 67, Springer-Verlag, Berlin, West Germany, p. 156 and references cited therein.
Mendez, E. E., and Wang, W. I. (1985). Appl. Phys. Lett. 46, 1159.
Mendez, E. E., Bastard, G., Chang, L. L., Esaki, L., Morkoç, H., and Fisher, R. (1982). Phys. Rev. B 26, 7101.
Menendez, J., Pinczuk, A., Weber, D. J., Gossard, A. C., and English, J. H. (1986). Phys. Rev. B 33, 8863.
Meseguer, F., Maan, J. C., and Ploog, K. (1987). Phys. Rev. B 35, 2505.
Meynadier, M. H. Thèse de Doctorat, Ecole Nationale Supérieure des Télécommunications, Paris.
Meynadier, M. H., Brum, J. A., Delalande, C., Voos, M., Alexandre, F., and Lievin, J. L. (1985a). J. Appl. Phys. 58, 4307.
Meynadier, M. H., Delalande, C., Bastard, G., Voos, M., Alexandre, F., and Lievin, J. L. (1985b). Phys. Rev. B 31, 5539.
Meynadier, M. H., Orgonasi, J., Delalande, C., Brum, J. A., Bastard, G., Voos, M., Weimann, G., and Schlapp, W. (1986). Phys. Rev. B 34, 2482.
Meynadier, M. H., Nahory, R. E., and Tamargo, M. C. To be published.
Miles, R. H., Wu, G. Y., Johnson, M. B., McGill, T. C., Faurie, J. P., and Sivananthan, S. (1986). Appl. Phys. Lett. 48, 1383.
Miller, D. A. B. (1986). Surf. Sci. 174, 221.
Miller, R. C. (1984). J. Appl. Phys. 56, 1136.
Miller, R. C., and Gossard, A. C. (1983a). Appl. Phys. Lett. 43, 954.
Miller, R. C., and Gossard, A. C. (1983b). Phys. Rev. B 28, 3645.
Miller, R. C., and Kleinman, D. A. (1985). J. Lumines. 30, 520.
Miller, R. C., Kleinman, D. A., Munteanu, O., and Tsang, W. T. (1981a). Appl. Phys. Lett. 39, 1.
Miller, R. C., Kleinman, D. A., Tsang, W. T., and Gossard, A. C. (1981b). Phys. Rev. B 24, 1134.
Miller, D. A. B., Chemla, D. S., Eilenberg, D. J., Smith, P. W., Gossard, A. C., and Tang, W. T. (1982a). Appl. Phys. Lett. 41, 679.
Miller, R. C., Gossard, A. C., Tsang, W. T., and Munteanu, O. (1982b). Phys. Rev. B 25, 3871.
Miller, R. C., Gossard, A. C., Tsang, W. T., and Munteanu, O. (1982c). Solid State Commun. 43, 519.
Miller, R. C., Kleinman, D. A., Tsang, W. T., and Gossard, A. C. (1982d). Phys. Rev. B 26, 1974.
Miller, R. C., Gossard, A. C., and Tsang, W. T. (1983). Physica 117B and 118B, 714.
Miller, R. C., Dupuis, R. D., and Petroff, P. M. (1984a). Appl. Phys. Lett. 44, 508.
Miller, R. C., Gossard, A. C., Kleinman, D. A., and Munteanu, O. (1984b). Phys. Rev. B 29, 3740.
Miller, R. C., Kleinman, D. A., and Gossard, A. C. (1984c). Phys. Rev. B 29, 7085.
Miller, D. A. B., Chemla, D. S., Damen, T. C., Gossard, A. C., Wiegmann, W., Wood, T. H., and Burrus, C. A. (1985d). Phys. Rev. Lett. 53, 2173.
Miller, D. A. B., Chemla, D. S., Damen, T. C., Gossard, A. C., Wiegmann, W., Wood, T. H., and Burrus, C. A. (1985a). Phys. Rev. B 32, 1043.
Miller, D. A. B., Chemla, D. S., Damen, T. C., Wood, T. H., Burrus, C. A., Gossard, A. C., and Wiegmann, W. (1985b). IEEE J. Quantum Electron. QE-21, 1462.
Miller, R. C., Gossard, A. C., Sanders, G. D., Chang, Y. C., and Shulman, J. N. (1985c). Phys. Rev. B 32, 8452.
Miller, R. C., Gossard, A. C., and Wiegmann, W. (1985d). Phys. Rev. B 32, 5443.
Miller, B. I., Schubert, E. F., Koren, U., Ourmazd, A., Dayem, A. H., and Capik, R. J. (1986a). Appl. Phys. Lett. 49, 1384.
Miller, D. A. B., Weiner, J. S., and Chemla, D. S. (1986b). IEEE QE-22, 1816.
Miura, N., Iwasa, Y., Tarucha, S., and Okamoto, H. (1985). In “Proceedings of the 17th International Conference on Physics of Semiconductors,” San Francisco, 1984 (D. J. Chadi and W. Harrison, eds.). Springer, Berlin, p. 359.
Monfroy, G., Sivananthan, S., Chu, X., Faurie, J. P., Knox, R. D., and Staudenmann, J. L. (1986). Appl. Phys. Lett. 49, 152.
Moore, K. J., Dawson, P., and Foxon, C. T. B. (1986). Phys. Rev. B 34, 6022.
Mori, S., and Ando, T. (1979). J. Phys. Soc. Japan 47, 1518.
Morkoç, H., Drummond, T. J., Thorne, R. E., and Kopp, W. (1981). Japan. J. Appl. Phys. 20, L-913.
Moroni, D., Andre, J. P., Menu, E. P., Centric, Ph., and Patillon, J. N. (1987). Journal de Physique C5, 143.
Mott, N. F., and Davis, E. A. (1979). Electronic Processes in Noncrystalline Material, 2nd Ed. Oxford University Press, New York, New York.
Nabarro, F. R. N. (1967). Theory of Crystal Dislocations. Oxford University Press, New York, New York.
Nedorezov, S. S. (1971). Sov. Phys. Sol. State 12, 1814.
Ninno, D., Wong, K. B., Gell, M. A., and Jaros, M. (1985). Phys. Rev. B 32, 2700; see also Gell, M. A., Wong, K. B., Ninno, D., and Jaros, M. (1986). J. Phys. C 19, 3821.
Olego, D. J., and Faurie, J. P. (1986). Phys. Rev. B 33, 7357.
Olego, D. J., Faurie, J. P., and Raccah, P. M. (1985). Phys. Rev. Lett. 55, 328.
Ong, N. P., Kote, G., and Cheung, J. T. (1983). Phys. Rev. B 28, 2289.
O’Reilly, E. P., and Witchlow, G. P. (1986). Phys. Rev. B 34, 6030.
Osbourn, G. C. (1985a). Superlattice and Microstructures 1, 223.
Osbourn, G. C. (1985b). J. Vac. Sci. Technol. A 3, 826.
Osbourn, G. C., Schirber, J. E., Drummond, T. J., Dawson, L. R., Doyle, B. L., and Fritz, I. J. (1986). Appl. Phys. Lett. 49, 731.
Ossau, W., Jakel, B., Bangert, E., Landwehr, G., and Weimann, G. (1986). Surface Science 174, 188.
Penna, A. F. S., Shah, J., Pinczuk, A., Sivco, D., and Cho, A. Y. (1985a). Appl. Phys. Lett. 46, 184.
Penna, A. F. S., Shah, J., Di Giovanni, A. E., Cho, A. Y., and Gossard, A. C. (1985b). Appl. Phys. Lett. 47, 591.
People, R. (1986). IEEE J. of Quantum Electronics QE-22, 1696.
People, R., Wecht, K. W., Alavi, K., and Cho, A. Y. (1983). Appl. Phys. Lett. 43, 118.
People, R., and Bean, J. C. (1985). Appl. Phys. Lett. 47, 322.
Perry, T. A., Merlin, R., Shanabrook, B. V., and Comas, J. (1985). Phys. Rev. Lett. 54, 2623.
Petroff, P. M., Miller, R. C., Gossard, A. C., and Wiegmann, W. (1984). Appl. Phys. Lett. 44, 217.
Petrou, A., Smith, M. C., Perry, C. H., Worlock, J. M., and Aggarwal, R. L. (1984). Solid State Comm. 52, 93.
Petrou, A., Waytera, G., Liu, X., Ralston, J., and Wicks, G. (1986). Phys. Rev. B 34, 7436.
Pickett, W. E., Louie, S. G., and Cohen, M. L. (1978). Phys. Rev. B 17, 815.
Pinczuk, A., Shah, J., Stormer, H. L., Miller, R. C., Gossard, A. C., and Wiegmann, W. (1984). Surf. Sci. 142, 492.
Pinczuk, A., Heiman, D., Sooryakumar, R., Gossard, A. C., and Wiegmann, W. (1986). Surface Science 170, 573.
Ploog, K., and Dohler, G. H. (1983). Adv. Phys. 32, 285.
Ploog, K., Ohmori, Y., Okamoto, H., Stolz, W., and Wagner, J. (1985). Appl. Phys. Lett. 47, 384.
Ploog, K., Fischer, A., and Schubert, E. F. (1986). Surf. Sci. 174, 120.
Pollak, F. H., and Cardona, M. (1968). Phys. Rev. 172, 816.
Polland, H. J., Horikoshi, Y., Hoger, R., Gobel, E. O., Kuhl, J., and Ploog, K. (1985a). Physica 134B, 412.
Polland, H. J., Schultheis, L., Kuhl, J., Gobel, E. O., and Tu, C. W. (1985b). Phys. Rev. Lett. 55, 2610.
Potz, W., Porod, W., and Ferry, D. K. (1985). Phys. Rev. B 32, 3868.
Priester, C., Allan, G., and Lannoo, M. (1984a). Phys. Rev. B 29, 3408.
Priester, C., Bastard, G., Allan, G., and Lannoo, M. (1984b). Phys. Rev. B 30, 6029.
Quillec, M., Goldstein, L., Le Roux, G., Burgeat, J., and Primot, J. (1984). Journal of Appl. Phys. 55, 2904.
Raisin, C., Lassabatere, L., Allibert, C., Girault, B., Abdel-Fattah, G., and Voisin, P. (1987). Solid State Comm. 61, 17.
Razeghi, M., Hirtz, J. P., Ziemelis, U. O., Delalande, C., Etienne, B., and Voos, M. (1983). Appl. Phys. Lett. 43, 585.
Razeghi, M., Nagle, J., and Weisbuch, C. (1984). In Proceedings of Int. Symp. GaAs and Related Compounds, Biarritz (B. de Cremoux, ed.). Adam Hilger Ltd., Bristol and Boston, Conference Series Number 74.
Reynolds, D. C., Bajaj, K. K., Litton, C. W., Yu, P. W., Masselink, W. T., Fisher, R., and Morkoç, H. (1984). Phys. Rev. B 29, 7038.
Reno, J., Sou, I. K., Wijewarnasuriya, P. S., and Faurie, J. P. (1986a). Appl. Phys. Lett. 48, 1069.
Reno, J., Sou, I. K., Faurie, J. P., Berroir, J. M., Guldner, Y., and Vieren, J. P. (1986b). Appl. Phys. Lett. 49, 106.
Reynolds, D. C., Bajaj, K. K., Litton, C. W., Yu, P. W., Singh, J., Masselink, W. T., Fischer, R., and Morkoç, H. (1985). Appl. Phys. Lett. 46, 51.
Rogers, D. C., Singleton, J., Nicholas, R. J., Foxon, C. T., and Woodbridge, K. (1986). Phys. Rev. B 34, 4002.
Roth, A. P., Sacilotti, M., Masut, R. A., D'Arcy, P. J., Watt, B., Sproule, G. I., and Mitchell, D. F. (1986). Appl. Phys. Lett. 48, 1452.
Ruckenstein, A. E., Schmitt-Rink, S., and Miller, R. C. (1986). Phys. Rev. Lett. 56, 504.
Ruden, P., and Dohler, G. H. (1983). Phys. Rev. B 27, 3547.
Ryan, J. F., Taylor, R. A., Tuberfield, A. J., Maciel, A., Worlock, J. M., Gossard, A. C., and Wiegmann, W. (1984). Phys. Rev. Lett. 53, 1841.
Sai-Halasz, G. A., Chang, L. L., Welter, J. M., Chang, C. A., and Esaki, L. (1978). Solid State Comm. 27, 935.
Sakaki, H., Arakawa, J., Nishioka, M., Yoshino, J., Okamoto, H., and Miura, M. (1985a). Appl. Phys. Lett. 46, 83.
Sakaki, H., Tanaka, M., and Yoshino, J. (1985b). Jap. J. of Appl. Phys. 24, L-417.
Sanders, G. D., and Chang, Y. C. (1985). Phys. Rev. B 31, 6892.
Sauvage, M., Delalande, C., Voisin, P., Etienne, P., and Delescluse, P. (1986). Surface Science 174, 573.
Schlesinger, Z., and Wang, W. I. (1986). Phys. Rev. Lett. 33, 8867.
Schmitt-Rink, S., and Ell, C. (1985). Journ. of Lumines. 30, 585.
Schmitt-Rink, S., Ell, C., Koch, S. W., Schmidt, H. E., and Haug, H. (1984). Solid State Comm. 52, 123.
Schulman, J. N., and Chang, Y. C. (1981). Phys. Rev. B 24, 4445; (1985). Phys. Rev. B 31, 2056.
Schulman, J. N., and Chang, Y.-C. (1986). Phys. Rev. B 33, 2594.
Schulman, J. N., and McGill, T. C. (1979). Appl. Phys. Lett. 34, 663; (1981). Phys. Rev. B 23, 4149.
Schuurmans, M. F. H., and t’Hooft, G. W. (1985). Phys. Rev. B 31, 8041.
Shah, J., Pinczuk, A., Stormer, H. L., Gossard, A. C., and Wiegmann, W. (1984). Appl. Phys. Lett. 44, 322.
Sham, L. J. (1986). Surf. Sci. 174, 105.
Shanabrook, B. V., and Comas, J. (1984). Surface Science 142, 504.
Shen, H., Parayanthal, P., Pollack, F. H., Tomkiewicz, M., Drummond, T. J., and Schulman, J. N. (1986). Appl. Phys. Lett. 48, 653.
Shirber, J. E., Fritz, I. J., and Dawson, L. R. (1985). Appl. Phys. Lett. 46, 187.
Singh, J. (1986). Appl. Phys. Lett. 48, 434; (1986). J. Appl. Phys. 59, 2953.
Singh, J., Bajaj, K. K., and Chaudhuri, S. (1984). Appl. Phys. Lett. 44, 805.
OPTICAL CHARACTERIZATION O F SEMICONDUCTOR HETEROLAYERS
179
Skolnick, M. %,Tapster, P. R., Bass, S. J., Pitt, A. D., Apsley, N., and Aldred, S. P.(1986).Semicond. Sci. Technol. I, 29. Smith, D. L., and Mailhiot, C. (1986). Phys. Rev. B 33,8345; (1986). Phys. Rev. B33,8360. Smith, D. L., McGill, T. C., and Schulman, J. N. (1983).Appl. Phys. Lett. 43, 180. Sooryakumar, R. (1986). IEEE Quant. Electron. QE-22, 1645. Sooryakumar, R., Chemla, D. S., Pinczuk, A,, Gossard, A. C., Wiegmann, W., and Sham, L. J. (1985). Solid State Commun. 54,859. Stein, D., Ebert, C., von Klitzing, K., and Wiemann, G. (1984). SurfJci. 142,406. Stern, F. (1986).Surjace Science 174,425. Stern, F., and Das Sarma, S. (1984). Phys. Rev. B 30,840. Stern, F., and Schulmann, J. (1985). Superl. and Microstr. I, 303. Stolz, W., Fujiwara, K., Tapfer, L., Oppolzer H., and Ploog, K. (1985). In GaAs and Related Compounds. Biarritz. Institute of Physics Conference Series number 74. Adam Hilger Ltd, Bristol and Boston, page 139. Stormer, H. L. (1980). In Proceedings of the ISthInt. Conf. Physics of Semiconductors. Kyoto. (1980).J . Phys. Soc. Japan49, Supp. A, 1013. Stormer, H. L., Schlesinger, Z., Chang, A,, Tsui, D. C., Gossard, A. C., and Wiegmann, W. (1983). Phys. Rev Lett. 51, 126. Sturge, M. D. (1962).Phys. Rev. 127,768. Takagahara, T. (1985). Phys. Rev. B 31,6552. Tamargo, M. C., Hull, R., Greene, L. H., Hayes, J. R., and Cho, A. Y. (1985). Appl. Phys. Lett. 46, 569. Tamargo, M. C., Nahory, R: E., Meynadier, M. H., Finkman, E.. Sturge, M. D., Huang, D. M., and Ihm, J. (1987). Private communication. Tanaka, K., Nagaoka, M., and Yamabe, T. (1983). Phys. Rev. B 28,7068. Tanaka, S., Kuno, M., Yamatomoto, A., Kobayashi, H., Mizuta, M., Kukimoto, H., and Saito, H. (1 984). Jap. J . of Appl. P hys. 23, L-427. Tanaka, K., Sakaki, H., Joshino, J., and Furuta, T. (1986). Surjace Science 174,65. Tejedor, C., and Flores, F. (1978).J . Phys. C 11, L-19. Tejedor, C., Calleja, J. M., Meseguer, F., Mendez, E. E., Chang, C.-A,, and Esaki, L. (1985). Phys. Rev. B 32,5303. 
Temkin, H.,Alavi, K., Wagner, W. R., Pearsall, T. P., and Cho, A. Y. (1983).Appl. Phys. Lett. 42, 845. Temkin, H., Panish, M. B., Petroff, P. M., Hamm, R., Vandenberg, J. M., and Sunski, S. (1985). Appl. Phys. Lett. 47, 394. Tersoff, J. (1984a).Phys. Rev. Lett. 52,465. Tersoff, J. (1984b).Phys. Rev. B 30,4874. Tersoff, J. (1985). J . Vac. Sci. Technol. B3, 1157. Tersoff, J. (1986). Phys. Rev. Lett. 56,2755. Trebin, H. R., Rossler, U., and Ranvaud, R. (1979). Phys. Rev. B 20,686. Tsang, W. T., and Schubert, E. F. (1986). Appl. Phys. Lett. 49,220. Tuchendler, J., Grynberg, M., Couder, Y., Thome, H., and Le Toullec, R. (1973). Phys. Rev. B8, 3884. Van der Merwe, J. H. (1972).Surface Science 31, 198. Vifia, L., Collins, R. T., Mendez, E. E., and Wang, W. I. (1986). Phys. Rev. B 33, 5939. Voisin, P. (1983).Thise de Doctorat, Paris. Unpublished. Voisin, P. (1984).Surf. Sci. 142,460. Voisin, P. (1986). In Heterojunction and Semiconductor Superlattices (G. Allan, G . Bastard, N. Boccara, M. Lannoo, and M. Voos, eds.). Springer Verlag, Berlin, West Germany, p. 73. Voisin, P. Unpublished.
180
G. BASTARD, C. DELALANDE, Y. GULDNER AND P. VOISIN
Voisin, P., Bastard, G., Gonzalves da Silva, C. E. T., Voos, M., Chang, L. L., and Esaki, L. (1981). Solid State Comm. 39, 982. Voisin, P., Guldner, Y., Vieren, J. P., Voos, M., Maan, J. C., Delescluse, P., and Linh, N. T. (1983). Physica 1178 and 118B, 634. Voisin, P., Bastard, G., and Voos, M. (1984a).Phys. Rev. B 29,935. Voisin, P., Delalande, C., Voos, M., Chang, L. L., Segmuller, A., Chang, C.-A., and Esaki, L. (1984b).Phys. Rev. B 30,2276. Voisin, P., Maan, J. C., Voos, M., Chang, L. L., and Esaki, L. (1986a).Surf. Sci. 170,651. Voisin, P., Voos, M., Marzin, J. Y., Tamargo, M. C., Nahory, R. E., and Cho, A. Y. (1986b).Appl. Phys. Lett. 48, 1476. Voisin, P., Voos, M.,Tamargo, M. C., Nahory, R. E., and Cho, A. Y. (1986c),Surf. Sci. 174,615. von Klitzing, K., Dorda, G., and Pepper, H. (1980).Phys. Rev. Lett. 45, 494. Wang, W. I., Mendez, E. E., and Stern, F. (1984).Appl. Phys. Lett. 45,639. Weiler, M. H. (1981).In Semiconductors and Semimetals Vol. 16 (R. K. Willardson and A. C. Beer, eds.). Academic Press, New York, New York, p. 119. Weimann, G., and Schlapp, W. (1985). Appl. Phys. Lett. 46.41 1. Weimann, G., and Schlapp, W. (1986). In Two-Dimensional Systems: Physics and New Devices (G. Bauer, F. Kuchar and H. Heinrich, eds.). Springer-Verlag, Berlin, West Germany. Weiner, J. S., Chemla, D. S., Miller, D. A. B., Wood, T. H., Sivco, D., and Cho, A. Y. (1985).Appl. Phys. Lett. 46, 619. Weisbuch, C., (1977). These d’Etat, Orsay. Weisbuch, C., Miller, R. C., Dingle, R.,Gossard, A. C., and Wiegmann, W. (1981). Solid State Comm. 37,219. Welch, D. F., Wicks, G. W., and Eastman, L. F. (1983).A p p l . Phys. Lett. 43,762. Welch, D. F., Wicks, G. W., and Eastman, L. (1985). Appl. Phys. Lett. 46,991. White, S., and Sham, L. J. (1981).Phys. Rev. Lett. 47,879. Wood, T. H., Burrus, C. A,, Miller, D. A. B., Chemla, D. S., Damen, T. C., Gossard, A. C., and Wiegmann, W. (1984). Appl. Phys. Lett. 44, 16. Wu, G. Y., and McGill, T. C. (1985). Appl. Phys. Lett. 47,634. Xu, 2. 
Y., Kreismanio, V. G., and Tang, C. L. (1983).Appl. Phys. Lett. 43,415. Yafet, Y., Keges, R. W., and Adams, E. N. (1956). J . Phys. Chem. Solids 1, 196. Yamanaka, K., Fukunaga, T., Tsukada, N., Kobayshi, K. L. I., and Ishii, M., (1986). Appl. Phys. Lett. 48,840. Yamanishi, M., Kan, Y., Minami, T., Suemune, I., Yamamoto, H., and Usami, Y. (1985). Superlattices and Microstructures 1, 1 1 1. Yamanishi, M., Usami, Y., Kan. Y., and Suemune, I. (1986). Surf. Sci. 174,248. Yang, S. R. Eric, Broido, D. A. and Sham, L. J. (1985). Phys. Rev. B 32,6630. Yu, P. W., Chandhuri, S., Reynolds, D. C., Bajaj, K. K., Litton, C. W., Masselink, W. T., Fisher, R., and Morkoq, H. (1985). Solid State Comm. 54, 159. Zucker, J. E., Pinczuk, A., Chemla, D. S., Gossard, A. C., and Wiegmann, W. (1983). Phys. Rev. Lett. 57, 1294. Zucker, J. E., Pinczuk, A,, Chemla, D. S., Gossard, A. C., and Wiegmann, W. (1984). Phys. Rev. B 29, 7065.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 72

Dimensional Analysis

JOSÉ F. CARIÑENA
Departamento de Física Teórica, Facultad de Ciencias, Universidad de Zaragoza, Zaragoza (Spain)

and

MARIANO SANTANDER
Departamento de Física Teórica, Facultad de Ciencias, Universidad de Valladolid, Valladolid (Spain)
I. Introduction
II. Conventional Dimensional Analysis
    A. Introduction
    B. Physical Quantities
    C. Quantities and Real Numbers
    D. From Empirical Quantities to Formal Concepts
    E. Physical Algebra
    F. Dimensional Analysis
III. The Mathematical Foundations of Dimensional Analysis
    A. Axiomatics for the Physical Algebra
    B. Dimensional Dependence and Units
    C. The Group of Unit Changes and the Gauge Group
    D. Dimensionless Products
    E. Functions in the Physical Algebra
    F. The Π-Theorem
    G. The Group-Theoretical Meaning of the Π-Theorem
    H. Remarks on the Application of the Π-Theorem
IV. The Physical Meaning of Dimensional Analysis
    A. The Dimension Group of a Theory
    B. A Detailed Study of an Example
V. Kinematic Groups and Dimensional Analysis
VI. Dimensional Analysis and Symmetries of Differential Equations
VII. Appendix
    A. Group Theory
    B. Differentiable Manifolds
    C. Lie Algebras and Lie Groups
References
I. INTRODUCTION
Dimensional Analysis, hereafter shortened to D.A., has been an appealing field throughout this century and has motivated a great many papers and books. Interest in D.A. has been maintained, and there have been a good number of recent papers on the subject. We feel that the group-theoretical aspects have not been emphasized sufficiently and that an updated presentation of D.A. from a group-theoretical perspective, including references to those recent contributions, may be useful to a great number of mathematicians and physicists. Our aim is therefore neither to give an exhaustive set of references on the subject nor to give a complete description of every aspect of D.A. Rather, we restrict our study to the setting usually known as D.A., leaving aside many interesting relationships with other fields of physics, such as renormalization and critical phenomena (see e.g., Stanley, 1971; Ma, 1973 and 1976; Stevenson, 1981) and metrology (Petley, 1985). We apologize in advance for the possible omission of references to relevant previous works. As a general guideline, we have tried to insist on the group-theoretical and structural aspects of D.A. and not on the development of examples, which can be found scattered through most of the publications on D.A.

This review is organised as follows. The next section contains a descriptive exposition of Conventional Dimensional Analysis, which is of a rather simple character. We have tried to give a more systematic explanation of the concepts, attempting in this way to address the persistent feeling that there is something metaphysical about D.A. In Section III, which is of a formal mathematical character, we introduce an axiomatization of the physical algebra and develop the connection between the relevant concepts of D.A. and the underlying group-theoretical aspects. In particular, we interpret the celebrated Π-theorem. The physical meaning of D.A. is analysed in Section IV, with special emphasis on the difference between the group of dimensions of a theory, a well-defined object, and a particular dimensional structure of the physical algebra that can be used to obtain information in a concrete problem through the Π-theorem. A detailed study of an example is given to illustrate this point. In Section V we show how the knowledge of a symmetry group of the theory may be useful in determining its correct dimension group. Both geometric and kinematic group examples are discussed in this approach. Another traditional application of D.A. has been the reduction of some ordinary and partial differential equations. For the sake of completeness, we give in Section VI a sketch of the general theory of the symmetry of differential equations and systems, with examples referring to scaling symmetries.

In order to make the paper more self-contained, we have added some appendices with mathematical properties and definitions that are needed for a full understanding of the review and that are not yet part of every physicist's background. We hope, however, that most of the paper may be profitably followed without a deep knowledge of these mathematical properties and definitions.
II. CONVENTIONAL DIMENSIONAL ANALYSIS
A. Introduction

The idea of dimension and the birth of Dimensional Analysis are traditionally ascribed to Baron Joseph Fourier in his work on the Analytical Theory of Heat (Fourier, 1822) but, as usually happens, traces of it in a more or less explicit form are found much earlier. Galileo had remarked that geometrical similarity cannot be simply extended to mechanics, in the well-known example of the resistance of two geometrically similar bones made of the same material. Newton (1686) went further in his remarks on Dynamical Similarity. Maxwell also contributed to Dimensional Analysis (the standard notation [f] for the dimension of the physical magnitude f is due to him), which was subsequently developed around the turn of the century by many people, e.g. Vaschy (1892), Reynolds, Rayleigh (1915), Buckingham (1914, 1915), Einstein (1911) and Bridgman (1922). For an account of the history of Dimensional Analysis, the reader may consult Macagno (1971) and Martins (1981).

Some peculiar features of the application of Dimensional Analysis to a given problem were a subject of controversy, the most famous being the one between Rayleigh (1915) and Riabouchinski (1915). The question concerned the application of D.A. to the Boussinesq problem of heat transfer by convection between a solid and the fluid in which it is placed. The point was the following: if the nature of heat as energy is not taken into account and heat is treated as an independent “dimension”, an expression for the rate of heat transfer is obtained. If, however, heat is given the same “dimension” as energy, one could naively expect a more precise answer as a consequence of the greater insight into the nature of heat borrowed from statistical physics; as a matter of fact, the answer given by Dimensional Analysis is then only a partial answer, not incorrect but less detailed. This example has very often been expounded, with various explanations (e.g. Drobot, 1953; Brand, 1957; Sedov, 1959; Gibbings, 1980).

Without entering into a discussion of these arguments, they clearly showed that the use of Dimensional Analysis in a given concrete problem should not be reduced to a straightforward mechanical application of a symbolic calculation with no bearing on the physics of the problem. With such a blind calculation, not only incomplete but perhaps erroneous and even nonsensical results would be obtained.

Dimensional Analysis has a very extensive literature. Without attempting to be exhaustive, we may quote the books by Bridgman (1952), Duncan (1955), Focken (1953), Huntley (1952), Ipsen (1960), Kurth (1978), Langhaar (1951), Massey (1971), Palacios (1964), Pankhurst (1964), Saint Guilhem (1971), San Juan (1947), and Sedov (1959), as well as the monographic issue 292 (6) of the Journal of the Franklin Institute (1971). There are also a great number of brief articles dealing with a more or less specific application of Dimensional Analysis to a given problem, as well as many interesting contributions to its mathematical structure, which will be referred to in due course. In short, it is an established branch of the foundations of physics.

Nevertheless, a consensus on the basic questions of Dimensional Analysis has apparently not been reached, and there remains some feeling against it as being of some cloudy, even metaphysical, nature. In fact, it is not easy to find any publication on D.A. without a more or less explicit reference to this situation. Furthermore, a great deal of what has been written under the heading “Dimensional Analysis” is of a rather vague nature. This applies particularly to some discussions of the “fundamental” dimensions, their number and their relations. In most cases, these discussions are ad hoc justifications supporting the writers' particular positions, rather than true expositions of theoretical constructions using clearly stated and accepted principles.

In any case, the physicist's instinct has always been able to recognize in the ideas commonly used under the heading “D.A.” a useful guideline in all stages of the development of any theoretical and/or practical study, and these ideas have been freely used from a pragmatic point of view. However, as remarked by many writers, the pragmatic attitude can be dangerous with regard to ideas. In fact, one can operate with dimensions before knowing what they are, and perhaps obtain correct and interesting results; in the same manner, it is also very easy to obtain different, meaningless, or even plainly wrong answers for a given concrete problem. In any deductive system, obtaining a wrong answer is always the consequence of a flaw at some stage in the argument (for example, neglecting a relevant variable), which may not be imputed to D.A. itself. In general, such a flaw is more easily run into if the structure behind Dimensional Analysis is left in darkness and the dimensional symbols are used without regard to their meaning. Here are some basic questions that cannot be meaningfully discussed until a sensible understanding has been attached to the objects of D.A.:
Is Dimensional Analysis a physical theory that can be used to make predictions on real phenomena? Or is it only a structural aspect of other physical theories, without any independent well-determined meaning?

Is there any meaning to be ascribed to the dimensions, to their relations and to their number? Or is this problem just a matter of convention?

Does Dimensional Analysis deal with a passive aspect of the description of phenomena, such as the problem of changing the units used to describe them together with the numerical values of the quantities, or does it deal with the active aspect of relating results for two different natural systems, such as an actual airplane and its model in a wind tunnel? Are the two alternatives in the preceding question equivalent?

If we try a rough classification of the reasons for this state of things, we can recognise the following:

(a) The word “dimension” carries many different meanings, and confusion may result either if some of them are inadvertently mixed or if the word is used without a proper elucidation of its meaning in the particular context. In part, these difficulties are of a semantic nature, but they also arise as a consequence of avoiding the explicit use of a formal model for the algebraic operations between physical magnitudes, a so-called “physical algebra”.

(b) Another potential source of confusion is the fact that in the conventional expositions of D.A. there is a considerable overlap between at least two very different kinds of problems, theoretical and metrological ones. This is paralleled by the lack of the necessary distinction between the definition of the nature of a physical quantity and the definition of its measure. In this context, dimensions are linked with units, this link being one side of the story but by no means the whole story, as we shall see below.

(c) The most frequent use of Dimensional Analysis is made in physics, applied science or engineering in the study of a particular problem, where one looks for a relation between given magnitudes relevant to that problem. But one can sometimes find other uses of D.A. For example, when studying the relation of relativity to its classical counterpart, the conditions of validity of the latter are stated as: “all relevant physical quantities with dimensions of speed, whether velocities, ratios of energies to momenta, ratios of electrical to magnetic fields, etc., must be small when compared to c” (Levy-Leblond, 1977). This relates to the possible meaning of a given dimensional relation in the context of a given theory, that is, in all the situations covered by the theory, and it is unnecessarily rigid to enforce the use of the same model in these two extreme cases.

All these difficulties are related and have been recognised for a long time.
For the most part, adequate treatments are scattered in the literature. It has been pointed out (e.g., Massey, 1971) that perhaps the main single reason for the existing difficulties and uncertainties is the failure to distinguish between “physical” algebra and “ordinary” algebra, and hence one may expect that the explicit use of a physically sound mathematical model of this “physical” algebra would be helpful. This is the algebra ordinarily used for all calculations in physics or engineering, from the most elementary level upwards. Strange as it may seem, the literature on this kind of model is not extensive, though it includes very interesting contributions (the most recent ones being by Bunge, 1971; Houard, 1981; Kurth, 1978; Szekeres, 1978; and Whitney, 1968) which deserve to be more widely known. Thus, we begin with a brief and informal exposition of the ideas of the physical algebra and of Dimensional Analysis, which will then be made much more precise at a later stage.

B. Physical Quantities
Physical quantities are abstractions such as length, time, energy, electric charge, temperature, and also more complex conceptual constructs, such as potential, gravitational tidal field, action, probability, etc. As remarked by Einstein, all physical quantities are freely invented abstractions, in some cases rather remote from experience. Physical laws are expressed as relations between these quantities. Physical quantities are classified into species, the quantities in a given species being susceptible to comparison by means of an “equality”, i.e., a given equivalence relation. The specific nature of the equivalence relation used depends on the quantity under consideration and on the particular theory or situation being analysed, and can essentially be settled in two different ways, either operationally or theoretically. In any case, this comparison criterion should have all the properties required of an equivalence relation, and this explains the relevance of group theory in the theory of quantities.

C. Quantities and Real Numbers
The first formal and comprehensive theory of quantities was that of Eudoxus. Originally adopted by classical Greek mathematics, it was expounded by Euclid in the Elements (see e.g., Bourbaki, 1969). In this theory, a very careful distinction was made between the quantities of a given species and the ratios of magnitudes, which act as operators on the quantities. In modern terms (Bourbaki, 1969), Eudoxus' theory considers a species of quantities as a system with an internal composition law (addition) and an external composition law, whose operators are the quantity ratios, which themselves form an abelian multiplicative group. The axioms used by Eudoxus, which include in particular the so-called axiom of Archimedes, lead necessarily to the real number system and allow an identification of the real numbers with the ratios of any quantities whatever. In the middle of the XVI century, Bombelli pointed out that once a unit of length has been chosen, there is a one-to-one correspondence between lengths and ratios of lengths, so that one can define an algebra on the lengths themselves. This true “physical algebra” opened the way to all further progress through the identification of real numbers with points on a “geometrical line”, whose merit is usually credited to Descartes. The theory of Eudoxus was rescued from oblivion by Barrow, and its further development gave rise to our present construction of the well-known real number system. We only wish to underline two main features:

(a) Our real number system is the crown of a long process that started with the theory of measurement of Eudoxus, and in which the idea of a real number was the final result. In naive terms, a real number is a symbolic label, to be attached to the ratios of magnitudes or, after Bombelli and Descartes, to the points of a straight line, that can be manipulated according to the rules of arithmetic.

(b) The axiom of Archimedes is essential to the construction, but from the point of view of the measurement of quantities it cannot be taken for granted. The success of real numbers for almost all kinds of measurements, after an adequate reduction has been performed, could lie simply in their convenience rather than in any “universal” significance.

Nevertheless, it is by now clear that real numbers (or some of their subsets) are basic objects in all our present theories, this being more literally true when one moves from speculative “fundamental” physical theories to “on-the-earth” problems in applied physics, engineering, modelling, and biology.
In all of these branches, a fundamental though elementary fact that underlies the use of mathematics in physics or, more generally, in any applied science, is that a physical quantity can be represented by or identified with a number only after a choice of a unit of measurement and a zero point. For most quantities, this number is a real number, although there exist quantities which can only take on some discrete values obtained by counting. There are also quantities associated with more complex conceptual constructions, ranging from vectors and tensors to fields, or to linear operators in a Hilbert space having an associated probability distribution on each state, such as the quantities of Quantum Mechanics. In all these cases, real numbers appear as an essential ingredient, and in the last instance, quantities can be reduced to real numbers. For example, an ordinary vector in physical (Euclidean) $\mathbb{R}^3$ space is considered as a quantity having a magnitude (a real number) as well as a direction and an orientation as a directed line in space (to be described in vector algebra or, better, in geometric algebra, where it would be considered a directed number).
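The dependence of the representing number on the unit and on the zero point can be illustrated with a minimal sketch. The function and variable names below are hypothetical and chosen only for this illustration. Under a change of unit alone (metres to feet), ratios of lengths keep their value; under a change of zero point (kelvin to degrees Celsius), ratios of temperatures do not, although differences do.

```python
import math

# Assumed conversion constants (standard definitions, not taken from the text).
FOOT_IN_METRES = 0.3048          # international foot, exact by definition
CELSIUS_ZERO_IN_KELVIN = 273.15  # zero point of the Celsius scale


def length_number(length_m, unit_m):
    """Number representing a length once a unit (given in metres) is chosen."""
    return length_m / unit_m


def celsius_number(temp_k):
    """Number representing a temperature after shifting the zero point."""
    return temp_k - CELSIUS_ZERO_IN_KELVIN


# Change of unit only: the ratio of two lengths is unaffected.
a_m, b_m = 3.0, 1.5
ratio_metres = length_number(a_m, 1.0) / length_number(b_m, 1.0)
ratio_feet = (length_number(a_m, FOOT_IN_METRES)
              / length_number(b_m, FOOT_IN_METRES))

# Change of zero point: ratios change, differences do not.
hot_k, cold_k = 400.0, 200.0
ratio_kelvin = hot_k / cold_k
ratio_celsius = celsius_number(hot_k) / celsius_number(cold_k)
diff_kelvin = hot_k - cold_k
diff_celsius = celsius_number(hot_k) - celsius_number(cold_k)

print(math.isclose(ratio_metres, ratio_feet))     # ratios of lengths agree
print(math.isclose(ratio_kelvin, ratio_celsius))  # ratios of temperatures differ
print(math.isclose(diff_kelvin, diff_celsius))    # differences agree
```

In this sketch the identification of a quantity with a number is a two-step choice, and only the first step (the unit) is compatible with forming ratios.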
D. From Empirical Quantities to Formal Concepts
In general terms, the quantities that are considered physically important and meaningful are formal concepts, which ought not to be confused with empirical quantities, that is, quantities defined with reference to the result of other measurements. This difference is essential indeed, but is not always clearly stated. As pointed out by Taylor and Wheeler (1966), it is not possible in physics to define the concepts related by a physical law in an independent manner, but “all the physical laws have this subtle character, in that they implicitly define the terms used and relate them”. In fact, empirical quantities must be related to “corresponding” formal concepts, but this relation may not be a simple equality. The concept of speed has been discussed clearly by Levy-Leblond (1980) in this context. Usually one says that the average speed of a material body is defined as the quotient of the length traversed by the body and the time interval taken, and from this idea one is led to the instantaneous velocity through a standard argument not discussed here. There are three different ways to find a numerical measure of speed (say, of a train) through this idea. Lengths and time intervals can both be measured in an external inertial reference frame (lengths along the railtrack, synchronized clocks at the stations); or one can be measured in an external reference frame and the other in the reference frame of the moving body (lengths along the railtrack, time as measured by the watch of a passenger); or one can even consider measurements made only in the body's own reference frame (changes in velocity measured by an adequate accelerometer, coupled to a time integrator). According to classical mechanics, i.e., Galilean relativity, the three definitions always give the same result.
This is not the case in relativity: the first speed is the ordinary velocity (bounded by c), and the last, the so-called rapidity that is becoming more frequently used in relativity textbooks, is additive (for motions in the same direction), has a geometric interpretation as the Minkowskian angle between two world lines, and is physically suitable, for it measures only the velocity changes caused by external agents (Levy-Leblond, 1980). The previous discussion can easily be translated to Euclidean geometry, where one can envisage three numerical ways of measuring “how tilted” a line is: the ordinary slope with respect to a “horizontal”, the inclination (height gained divided by the length along the line), and finally, the integral of the curvature along the line. Only this last concept corresponds exactly to the ordinary angle, and it is undoubtedly the correct formal concept in this problem.

Geometry, as the oldest physical theory, offers a great variety of examples that are useful because they refer to situations where our knowledge and intuition have been established for a long time. For example, the angle is thought to be a physical quantity that is defined as the quotient between the length of an arc of circle and that of its radius. This procedure actually defines a way of measuring angles and is invoked when one says that the unit of angle so obtained, the radian, is a derived unit and that the quantity itself, the angle, is dimensionless, because its numerical value is independent of the unit used in the measurement of the lengths concerned. One apparently naive query could then be the following: what lengths are we talking about? If one takes the equally naive answer, namely, that they are the “actual” lengths of the “actual” arc of circle and of its radius, we run into a difficulty. After all, “actual” space, that is to say, space as measured by (perfect) rods, might not be exactly a Euclidean space but a Riemannian space with nonzero curvature, in which the concepts of length and angle still have a well-defined meaning, but where the length of an arc of circle is not proportional to its radius. If we stick too literally to our previous definition, we are faced with a number of unpleasant features. For instance, for a fixed unit of length, the measure of a complete turn would be different from $2\pi$ and, even worse, the value of this angle of a complete turn would depend on the radius of the circle used. Confronted with this situation, the wise man of the Casimir parable (Casimir, 1968), knowing differential geometry, would say: “Angle (the formal concept) is a quantity which measures ‘how far apart’ two lines which intersect at a point are; this holds no matter whether the space is Euclidean or not. But your original definition of the angle through (the empirical concept of) its measure as a quotient of lengths is too restricted, and only matches the correct one if the space is exactly Euclidean, a question which is to be experimentally determined.
If not, you either need another “definition” for the measure of the angle, to be provided by some adequate relation between the true angle and the magnitudes which you can experimentally measure, which holds independently of the space curvature, or continue using your definition, but remembering that the number you will obtain in this way must be “corrected” to supply the true angle. And you could continue speaking of angles and studying their relations with other magnitudes, but you will need to remember that strictly speaking, angle is not a quotient of two lengths, but an independent concept that in some particular cases bears a simple relation to that quotient”. It is also worth while to remark that the “definition” of angle as a quotient of lengths is a natural extension of the first of our numerical measures in the previous paragraph. Yet, here it runs into a difficulty, whereas a definition adapted from the third numerical measure, would do the job required in a new and “good” definition. There is a twofold lesson to be drawn from these intentionally simple examples: (a) The “definition” of a derived quantity in terms of measurements makes ordinary use of some laws that hold in a given theory, but perhaps
JOSÉ F. CARIÑENA AND MARIANO SANTANDER
not in another one. Hence its literal use could lead to a good measure of a magnitude in the former case but not in the latter, although the quantity itself continues to be significant in the latter. These questions lead to a comparison of the meaning of quantities having the same name in two different theories, by no means a trivial problem, especially where quantum mechanics is concerned. (b) Although we can envisage all the quantities afforded by the different ways to turn an empirical concept into a formal one, not all are equally meaningful, and in general it is not wise to promote all of them to formal concepts of the theory. In our example this is plainly clear, but other examples, using for instance temperature, can also be given.

E. Physical Algebra
In physics, as well as in engineering or applied science, numbers most often appear as the values of various physical quantities, and physical laws are expressed as relations between these quantities. In principle, thus, these relations should be expressed independently of the settlement of units. However, the ordinary way of writing the physical laws is as numerical relations between the magnitudes of the quantities, once a set of units for all quantities has been chosen. It is plainly clear that the algebra of physics is something more than the algebra of pure numbers. Strange as it may seem, the literature on this “physical algebra” is not extensive, its shortage being more or less compensated by the fact that for the physicist an equation such as $h = \tfrac{1}{2} g t^2$ deals with symbols which carry a concrete physical interpretation. When the magnitudes of $h$, $g$, $t$ are entered into the relation, they are not treated as pure numbers 1.056, 0.81, etc., but as expressions such as $1.056\ \mathrm{m}$, $43 \times 10^{3}\ \mathrm{kg}$, etc., that is, a real number times a unit.

The arithmetical computations to be performed when using any physical expression to calculate a specific value of a given quantity (say, in a given experimental setting) have two facets: first, the purely arithmetical computation on all the real numbers as if units were not present, and then a symbolic computation on the units, carried out by making use also of the formal rules of ordinary algebra. Hence, more complex expressions such as $9.81\ \mathrm{m/s^2}$ or $6.68 \times 10^{-*}\ \mathrm{m\,kg\,s^{-1}}$ will arise. Every such expression is to be considered as a possible value of some physical quantity, which may, or perhaps may not, have a known physical meaning. Therefore, in order to be able to use physical algebra, we must be able to interpret symbols like $\mathrm{m\,kg\,s^{-1}}$ as possible units for some physical quantities. The procedure used to interpret such symbols is commonly called the settlement of derived units from primary ones.
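The two facets of a computation in the physical algebra can be sketched in a few lines of Python. This is only an illustrative sketch, not a construction taken from the text: the class name `Quantity`, its dictionary of unit exponents, and the numerical values chosen for $g$ and $t$ are assumptions made for the example.

```python
from fractions import Fraction


class Quantity:
    """A real number times a formal unit monomial, e.g. 9.81 m s^-2."""

    def __init__(self, value, units=None):
        self.value = value
        # Map each unit symbol to its exponent; zero exponents are dropped.
        self.units = {u: Fraction(e) for u, e in (units or {}).items() if e != 0}

    def __mul__(self, other):
        if not isinstance(other, Quantity):
            other = Quantity(other)  # a pure number carries no unit
        units = dict(self.units)
        for u, e in other.units.items():
            units[u] = units.get(u, 0) + e  # exponents add under a product
        return Quantity(self.value * other.value, units)

    __rmul__ = __mul__  # the product is commutative

    def __pow__(self, p):
        # The number is raised to the power p; unit exponents are multiplied by p.
        return Quantity(self.value ** p, {u: e * p for u, e in self.units.items()})

    def __repr__(self):
        symbols = " ".join(f"{u}^{e}" for u, e in sorted(self.units.items()))
        return f"{self.value} {symbols}".strip()


# Hypothetical input values for h = (1/2) g t^2.
g = Quantity(9.81, {"m": 1, "s": -2})
t = Quantity(2.0, {"s": 1})
h = 0.5 * g * t ** 2  # arithmetic on the numbers, formal algebra on the units

# A formally admissible expression whose physical meaning is not evident.
odd = Quantity(6.0, {"m": 1, "kg": 1, "s": -1})
```

The unit bookkeeping reduces `h` to a pure length, while `odd` shows that the formal rules accept any monomial in the unit symbols, whether or not it has a known interpretation.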
This aspect of the physical algebra concerned with the settlement of derived units is thus directly linked with a set
DIMENSIONAL ANALYSIS
of physical laws, accepted as the correct ones in a given domain. For example, if space is believed to be Euclidean, a law of physics is that the area of a square is proportional to the square of its side. If arbitrary area units have not been chosen previously, we may well take the symbol $\mathrm{m}^2$ to stand for such a unit. The implicit interpretation is that $\mathrm{m}^2$ is the area of a square whose side is one meter, the unit of length. This interpretation cannot be universal, but is only on secure grounds if space is Euclidean; if the space has nonzero curvature, the symbol $\mathrm{m}^2$ could still be interpreted as a unit of area, but it definitely no longer refers to the area of a square of unit side, because in this case not only is the area of a square not proportional to the square of its side but, even worse, there are no squares at all! Although theoretically inadequate, this interpretation could prove to be sufficiently accurate for most practical purposes in a specific range.

Physical relations thus appear as equations in a physical algebra that can be reduced to the algebra of pure numbers once concrete units have been chosen for all quantities concerned. For example, our physical law (in a theory which assumes space to be Euclidean) $A \propto l^2$ between the physical quantities takes the form $A = k l^2$ for the values of the corresponding magnitudes. Thus, $k$ appears as a universal constant which must be experimentally determined and has a numerical value which depends on the units chosen for area and length respectively. A change in the units of the concerned quantities (as a set) would in general, but not always, also change the numerical value of $k$ in the expression above.
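The dependence of $k$ on the chosen units can be checked numerically. The following is a minimal sketch assuming Euclidean space; the function name `k_value`, the convention of giving unit sizes in metres and square metres, and the test square of side 3 m are all assumptions introduced for the illustration.

```python
import math


def k_value(length_unit_m, area_unit_m2, side_m=3.0):
    """Numerical value of k in A = k l^2 for a Euclidean square.

    length_unit_m and area_unit_m2 give the sizes of the chosen units in
    metres and square metres (a convention of this sketch). side_m is the
    side of the test square; the result should not depend on it.
    """
    l = side_m / length_unit_m        # measure of the side in the chosen unit
    A = side_m ** 2 / area_unit_m2    # measure of the area in the chosen unit
    return A / l ** 2


k_coherent = k_value(1.0, 1.0)          # metre and square metre
k_mixed = k_value(0.01, 1.0)            # centimetre and square metre
k_rescaled = k_value(0.01, 0.01 ** 2)   # both units changed consistently
```

Changing only the unit of length alters `k_mixed`, whereas changing the unit of area by the square of the factor applied to the unit of length leaves the numerical value of $k$ untouched.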
Just as the use of two different reference frames generally changes both the numerical and formal expression of physical laws, but does not if both frames are inertial, a change of all the units concerned in a problem generally changes the numerical relation between the values, but does not in some particular cases. In our previous example, it suffices that the new unit of area be $\lambda^2$ times the old one if the new unit of length is $\lambda$ times the old one. Of course, the first manifestations of this phenomenon were recognised as such in (Euclidean) geometry, for there any change in the unit of length leaves invariant the numerical relations between lengths and angles, whereas a change in the unit of angle does not share this property, and a change in the unit of area leaves invariant a numerical relation between lengths, angles and areas only if accompanied by a change in the unit of length restricted in the way described above.

The preceding statements refer to the purely numerical relations between the values of the length and the area, and their invariance under some changes of units. But, as often occurs, there is a way of writing an equation so that it is formally invariant under all changes of units: it is to consider $k$ on the same footing as the real physical variables $A$ and $l$, letting its numerical value depend on the set of units used in such a way that the relation is numerically correct no matter what set of units is used. But be warned, the statement that now this
equation is valid for all possible sets of units is a tautological one, because $k$ was adjusted from the beginning just in order for this to happen.

Dimensional Analysis provides the concepts needed to describe how the invariance of an equation under changes of a set of units is realized. To each quantity (generically, $Q$) appearing in a physical law, it assigns a symbol, to be called its dimensional symbol, or simply its dimension, and to be denoted $[Q]$. Then, using the rules of a formal multiplicative algebra, any monomial expression in the physical algebra is translated into an expression in the dimensional symbols of the corresponding quantities, and if one monomial equals another, their corresponding dimensional symbols also satisfy the same relation. It is well known that this procedure has a strong conventional component. For our present example, $A = k l^2$, which quantities are the physical ones? There are two extreme possible answers. We could say that area and length are here physical quantities, susceptible to variation, but not $k$, which can be taken as an expression of a fixed property of the natural world. Instead, we could well maintain that $k$, being an abstraction, must be ranged with $A$ and $l$ as a physical quantity of a special type. If we adopt the first answer, we obtain the relation between dimensional symbols $[A] = [l]^2$, whereas if we stick to the second one we have $[A] = [k][l]^2$.

What is the meaning, if any, of these relations? Let us start with the second equation: according to the way $k$ was introduced in order to ensure the numerical invariance of the relation $A = k l^2$ under arbitrary changes of the units of area and length, the relation $[k] = [A][l]^{-2}$ could be interpreted as a mnemonic of the way in which $k$ changes its numerical value when a change in the units of the physical magnitudes $A$ and $l$ is performed, if $[A]$ is interpreted as the scale change factor of the measures of areas, etc., in an arbitrary change of the relevant units, and, in particular, as a reminder of the structure of the changes which leave the value of $k$ numerically invariant. This interpretation is clear and well defined, but due to the tautological nature of the initial expression, it seems difficult to accept any further significance for this dimensional equation.

From this viewpoint, the first relation must be interpreted as follows: if we wish the same numerical relations between the measures of $l$ and $A$ to hold for various systems of units, the changes of the units of area and length cannot be made in an independent and arbitrary way but instead must be related in a way which depends on how the units have been defined. For instance, one can take as unit of area the area of a rectangle, one of its sides being of unit length, the other being of a standard fixed length. For the measures, there is also the equation $A = k l^2$, but now the changes of units of area and length which leave $k$ numerically invariant are those corresponding to a multiplication by the same factor. Every such instance is merely a particular case of the preceding one, which was a more general dimensional equation.
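Read as a mnemonic, the relation $[k] = [A][l]^{-2}$ is a rule for rescaling the numerical value of $k$. A minimal sketch of that reading follows; the function name `rescaled_constant` and the dictionary encoding of the exponents are hypothetical, and the factors are those by which the measures (not the units) of each quantity are multiplied.

```python
def rescaled_constant(k, exponents, measure_factors):
    """New numerical value of a constant after a change of units.

    exponents:       dimensional exponents of the constant, e.g.
                     {"A": 1, "l": -2} for [k] = [A][l]^-2
    measure_factors: factor by which the measures of each quantity
                     are multiplied under the change of units
    """
    factor = 1.0
    for name, exponent in exponents.items():
        factor *= measure_factors[name] ** exponent
    return k * factor


dim_k = {"A": 1, "l": -2}

# Lengths now measured in centimetres (measures x100), areas unchanged.
k_new = rescaled_constant(1.0, dim_k, {"A": 1.0, "l": 100.0})

# Area measures multiplied by 100^2 as well: k keeps its numerical value.
k_same = rescaled_constant(1.0, dim_k, {"A": 100.0 ** 2, "l": 100.0})
```

The changes that leave $k$ numerically invariant are exactly those for which the factor for area measures is the square of the factor for length measures.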
There is also another aspect to be considered. One could say, at least as far as the physical significance ascribed to areas and lengths is concerned, that these two quantities are of different kinds. Then $[A]$, $[l]$ could be interpreted as symbols for types of quantities, and the relation between such symbols as a description of the way in which the relevant quantities enter into the physical algebra. If one adopts this interpretation, the relation $[A] = [l]^2$ appears as the more natural, reflecting a fact of nature, or of a theory which we take as a fact of nature in the range of validity of the theory, and from this viewpoint the introduction of $[k]$ as a “type of quantity”, although formally possible and consistent, is not particularly enlightening.

From this simple introduction, two different ways to give a well-defined interpretation of the dimensional symbols appear to take shape. Both ways are meaningful, possible, and useful when considered on their own. But if both interpretations get mixed, the end product is very likely to be unreliable and/or misleading.

An example is provided by plane angles. There has been a continued discussion about the status of plane angles from the viewpoint of Dimensional Analysis, which is echoed by their special placing in a category of special units by the SI International Conference. Angle measure obviously depends on the units used, and from this, if we use the first interpretation, we could treat it as a dimensional magnitude. But Euclidean geometry predicts a simple relationship between angle and length measures, namely the proportionality between angle and the quotient of arc and radius lengths, that could be used to set a particular definition of the unit of angle, the radian. Once a unit of length is chosen, there is no necessity of a separate, independent unit for angles.
Furthermore, a change in the unit of length has no effect on the numerical value of the measure of the angle, so we could say that angles are dimensionless. But some people feel uneasy about this situation, saying that torque and energy, as physically different concepts, are then dimensionally identical, and favour a dimensional assignment to angles, for then [torque][angle] = [energy], which is the way in which these concepts enter into physical algebra. More complex examples arise in Electromagnetism; the arguments are in general a consequence of mixing two different kinds of ideas about dimensions, and they are not to be expected to shed any additional light on the problem.
F. Conventional Dimensional Analysis
As remarked before, ordinarily D.A. is used to find a form of relation between relevant variables in a given concrete problem. In fact, for some authors, D.A. is nothing more than this.
We give here a schematic presentation of the steps usually followed in such an application. This exposition serves as the starting point for a discussion of the relation of these ideas with more formal settings. Hence, this is only a brief reminder of Conventional Dimensional Analysis, and for the time being we will use the current “physical” terminology without trying to show clearly the mathematical structures behind these recipes.
(i) The first thing needed is an identification of all the quantities concerned in the problem at hand. These could be both real physical quantities and physical constants, including constants expressing physical properties of particular objects or characterizing whole classes of physical phenomena, and universal constants.

(ii) A set of units having been set up for all the physical quantities in (i), any relation between these quantities will be expressed as a numerical relation between the corresponding values. But if this equation accounts for a relation between physical quantities, it must be independent of the particular units used. Like the relativity principle, this is an assumption of a physical nature, and it is ordinarily translated into an invariance requirement to be imposed on the equations (between real numbers) which are thought of as genuine expressions of the physical laws. In its simplest form, the requirement is: “numerical equalities of the values of quantities are to be preserved under arbitrary changes of units of whatever quantity appears in the problem”.

(iii) In practice the units for all physical quantities are not defined in an independent manner; instead one chooses, in a more or less arbitrary way, the units for a set of the physical quantities, $Q_1, Q_2, \ldots, Q_n$, and then the units for all other quantities $X$ are defined by means of some well-defined procedures. Without exception, all such procedures are based on the existence of a relation, incorporated by a given theory, between the quantity $X$ and the basic quantities $Q_1, Q_2, \ldots, Q_n$. A fundamental point to be emphasized here is that this possibility of using derived units has a meaning only when the underlying relation has been incorporated in a theoretical frame. It is also worth remarking that, although one could imagine “pathological” ways of writing possible relations, the ones which are physically relevant tend to be deceptively simple.
(iv) The search for relations satisfying the invariance requirement in (ii) is made in Dimensional Analysis through the following symbolic procedure:

(a) To each quantity $P$ relevant in the problem under consideration we associate a symbol, $[P]$, which for the moment is just a mnemonic device. “Like quantities” must have the same symbol.

(b) These symbols are supposed to generate a formal algebraic multiplicative system in which expressions such as $[P][Q]$ and $[P]^r$ are allowed,
$r$ being an integer or a rational number. In particular, there is an identity element, to be denoted 1, which satisfies $1[P] = [P]$, etc.

(c) The symbols so obtained are not all considered independent. Instead, for each relation defining a unit for a quantity $X$, a “dimensional relation” between the symbols $[X]$ and $[Q_1], [Q_2], \ldots, [Q_n]$ is associated by replacing in the definition of $X$ the quantities $X, Q_1, Q_2, \ldots, Q_n$ by their corresponding dimensional symbols and using the rules of the formal multiplicative algebra. If the relation between $X$ and the $Q_i$ is complex, this step will present some difficulties, but as the defining relations are simple in actual cases, there is no need to worry.

(d) To each constant relevant to the problem, a dimensional symbol is also assigned following the same procedure as in (c). As remarked at the end of point (iii), this is only meaningful if all these constants have previously been incorporated in the structure of a given theory, for only then can one make use of this translation procedure.

(e) Finally, any relation between the magnitudes appearing in the problem must be invariant (see (ii) above) under changes of units for all the quantities. Owing to the way the units have been chosen, this holds true provided it holds for arbitrary changes in the units for the basic quantities $Q_1, Q_2, \ldots, Q_n$.

The search for such relations, called unit free relations, can be carried out using a direct method, known as the Rayleigh indicial method, or using a systematic method, known as the $\Pi$-theorem. This theorem appears in the literature under many different formulations, to be discussed below, but essentially it says that any relationship for which the requirement in (ii) holds may be rewritten as a relationship among some combinations $\Pi_1, \Pi_2, \ldots, \Pi_q$ of the quantities $P_1, P_2, \ldots$ and of the constants $C_1, C_2, \ldots$ whose dimensional symbols are equal to 1.

As an example of application of D.A.
along these lines, we give the rederivation by Taylor (1946) of the relation

$$r = t^{2/5} (E/\rho_0)^{1/5} f(\gamma),$$

between the radius $r$ at time $t$ of a spherical blast wave produced by releasing a quantity of energy $E$ at time $t = 0$ in air of density $\rho_0$ and index of polytropy $\gamma$. The first stage, (i), is implemented by assuming that $t$, $r$, $\rho_0$, $E$ and $\gamma$ are the relevant quantities. The numerical relation searched for will be of the form

$$t = \phi(r, \rho_0, E, \gamma).$$

We choose for (iii) the ordinary set of units, with length, time and mass as primitive quantities and units for energy and density derived as usual. The dimensional symbols of our variables are then easily shown to be

$$[t] = T, \quad [r] = L, \quad [\rho_0] = M L^{-3}, \quad [E] = M L^{2} T^{-2}, \quad [\gamma] = 1.$$
The Rayleigh method, or indicial method, consists of looking for the values of the exponents $a_i$ in an equality between $r$ and a monomial in the remaining quantities, $t^{a_1} \rho_0^{a_2} E^{a_3} \gamma^{a_4}$, which make this equality a unit free relation. These conditions on the exponents can be directly obtained by “solving” the symbolic equality

$$[r] = [t]^{a_1} [\rho_0]^{a_2} [E]^{a_3} [\gamma]^{a_4} \quad \text{or} \quad L^{1} T^{0} M^{0} = L^{(-3a_2 + 2a_3)}\, T^{(a_1 - 2a_3)}\, M^{(a_2 + a_3)},$$

from which we obtain the set of linear equations

$$-3a_2 + 2a_3 = 1, \qquad a_1 - 2a_3 = 0, \qquad a_2 + a_3 = 0,$$

whose solution is $a_1 = 2/5$, $a_2 = -1/5$, $a_3 = 1/5$, while $a_4$ is arbitrary. Hence, any monomial $(t^2 E/\rho_0)^{1/5} \gamma^{n}$, $n$ arbitrary, can be equated with $r$ in a unit free relation, and one may expect to obtain the correct relation giving $r$ as a function of $t$, $E$, $\rho_0$, $\gamma$ as a sum of monomials with such structure, obtaining an expression $r = (t^2 E/\rho_0)^{1/5} f(\gamma)$,
$f$ being some unspecified function. The method based on the $\Pi$-theorem gives the same result. It is a systematic method of search for all dimensionless monomials in the quantities of the problem. There is, in general, an infinite number of such monomials, which can however be expressed in terms of a finite number of basic, independent monomials. The theorem of Buckingham, or $\Pi$-theorem, gives the number of basic independent dimensionless monomials, and an algorithmic procedure to generate a set of them. In our case, there are two basic dimensionless monomials,

$$\Pi_1 = \gamma, \qquad \Pi_2 = \frac{r^5 \rho_0}{t^2 E}.$$
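The exponents of the blast-wave example can be recovered mechanically by solving the linear system with exact rational arithmetic. The sketch below assumes the dimensional symbols quoted in the text, stored as exponent triples in the order (L, T, M); the helper names `solve_exact` and `dimension_of` are hypothetical, and $\gamma$ is left out because its dimension is 1 and its exponent is therefore arbitrary.

```python
from fractions import Fraction

# Dimensional exponents (L, T, M) of the quantities in the blast-wave problem.
DIMS = {
    "t": (0, 1, 0),
    "rho0": (-3, 0, 1),
    "E": (2, -2, 1),
    "r": (1, 0, 0),
}


def solve_exact(matrix, rhs):
    """Solve a square, nonsingular linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(x) for x in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(i for i in range(col, n) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for i in range(n):
            if i != col:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[col])]
    return [row[-1] for row in rows]


def dimension_of(monomial):
    """Exponents (L, T, M) of a monomial given as {quantity: exponent}."""
    return tuple(sum(Fraction(p) * DIMS[q][d] for q, p in monomial.items())
                 for d in range(3))


# Rayleigh method: [r] = [t]^a1 [rho0]^a2 [E]^a3, one equation per L, T, M.
names = ["t", "rho0", "E"]
matrix = [[DIMS[q][d] for q in names] for d in range(3)]
a1, a2, a3 = solve_exact(matrix, DIMS["r"])

# Pi-theorem check: r^5 rho0 / (t^2 E) should have all exponents equal to zero.
pi2 = {"r": 5, "rho0": 1, "t": -2, "E": -1}
pi2_dimension = dimension_of(pi2)
```

Working over `Fraction` rather than floating point keeps the exponents $2/5$, $-1/5$, $1/5$ exact, which matters because the test for a dimensionless monomial is an exact equality with zero.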
The assertion of the $\Pi$-theorem is that any unit free relation among the variables, say $F(r, t, E, \rho_0, \gamma) = 0$, is equivalent to a relation between a complete set of basic $\Pi$ monomials, say $G(\Pi_1, \Pi_2) = 0$, from which the expression $r = (t^2 E/\rho_0)^{1/5} f(\gamma)$ is recovered.

The use of Dimensional Analysis following this pattern has led to a great variety of applications in various fields. But through many standard and common examples it is clearly seen that there are many peculiar properties of such analysis which may give “an amount of information dependent on the skill and the experience of the analyst” (Bridgman and Sedov, 1974). Some points deserve remark.

First, the application of this procedure does not require any knowledge of the meaning of dimensional symbols. In the context we have presented, there is, however, a particularly adequate interpretation which makes the content and meaning of the symbolic calculations almost transparent, $[X] = [Q_1]^{a_1} [Q_2]^{a_2} \cdots [Q_n]^{a_n}$ being a code for telling us how the numerical value of the quantity $X$ changes under particular changes of the units of the basic quantities $Q_i$. This interpretation (Langhaar, 1951) has the great value of being well defined and has explicitly been adopted by some authors who have proposed abolishing the term “dimension” in favour of the expression “measure formula” (Duncan, 1955), no doubt a better term for this aspect of the problem. Unfortunately, there are still many other interpretations, more or less explicit, for such symbols, and it is not at all evident whether and how all these interpretations are in some sense equivalent. For example, in addition to the former interpretation, $[X]$ may be taken either only as a mnemonic device in order to perform the procedure indicated above, or as a symbol for a unit of the quantity $X$, or as a symbol for a “species” of physical quantities, that is, a class of comparable quantities. For example, quantity of heat can be unambiguously compared to energy, so that both can be ranged in the same class. A glance at various publications on Dimensional Analysis will quickly reveal that all these interpretations are being used, independently or, worse, in a mixed form.
Second, if the relations between dimensional symbols, implicitly assumed in conventional D.A., arise through the essentially arbitrary definition of derived units, one can hardly expect them to have any deep significance. This is true, but one must remember that it is through the incorporation, in the form of laws, of some physical relations that any definition of derived units becomes possible. Hence, dimensional relations so derived reflect in some entangled way properties of the natural world which are incorporated in the theory. However, there is little profit in discussing the possible meaning of dimensional relations so obtained until a formal scheme has been developed. As we can expect, the question “what is a dimension?” has more than one reasonable answer.

Third, on the practical side, it is possible to obtain different answers for a given problem when apparently following all the prescriptions given above. In particular, this can happen if different interpretations of the dimensional symbols get mixed. The standard example is the aforementioned resolution by D.A. of the Boussinesq problem on the heat transfer by convection between a
solid and the fluid in which it is placed, which gave rise to the Rayleigh-Riabouchinski controversy. The warning in (iv)(a) that “like quantities” must have the same symbol is linked with this controversy. Another source of confusion lies in the overlapping of a problem of practical metrology, such as the definition of primitive and derived units, with the complete listing of all constants relevant to the problem, through the inadvertent use of the so-called Zanzibar units (see e.g., Petley, 1984, p. 8). All these difficulties together have created the belief that D.A. is an unreliable device. Among others, Bridgman and Sedov (1974) have rightly pointed out that having different answers for a given problem does not necessarily mean that they are incorrect, but instead that they are only partial solutions, ranging from trivial ones to an almost complete resolution.

There are two different levels at which one can try to clarify the formal structure of Dimensional Analysis. The first one is to give a sensible interpretation to the symbolic calculations one performs in D.A. when applying it to a concrete given problem, assuming that dimensional symbols have been assigned to every quantity, but without showing how these assignments have been made. As has been known for a long time, this interpretation is made in terms of linear algebra (Brand, 1957; Curtis et al., 1982; Drobot, 1953; Hulin, 1980). Here, the most adequate and almost mandatory interpretation of dimensions is the one corresponding to the change of the derived units when a change in the primitive units is performed. The symbolic calculations are then easily shown to be a direct transcription of the general invariance requirement of the unit free relation. This part of the problem is a purely mathematical one, and the best treatments are those that do not obscure its content with imprecise physical terminology.
The second level refers to the structure which underlies the assignment of dimensional symbols to the quantities in the problem at hand. This has a deep bearing on physics and also has to do with the mathematical models of the “physical algebra” and, ultimately, with group theory. The assignment of dimensional symbols to every quantity is “solved” by conventional D.A. in a “universal way”, that is, in a way which does not depend on the specificity of the problem, through recourse to an all-purpose comprehensive unit system. Once this assignment has been made in this or another way, the remaining mathematical structure is essentially linear algebra. We propose a separate consideration of these two aspects, as we feel that their mixing is the root of many misconceptions commonly encountered. There are many relations between both aspects which will become clear as we proceed. For the most part, these ideas can in no way be presented as original, but the persistence of misunderstandings shows that they are not as widely known as they ought to be.
III. THE MATHEMATICAL FOUNDATIONS OF DIMENSIONAL ANALYSIS
We now discuss the first level referred to. In this part, we shall discuss the mathematical results, such as Buckingham’s $\Pi$-theorem, which arise in the mathematical structure underlying the physical algebra and whose meaning is usually obscured by an undue dependence on an imprecise “physical” terminology. We shall try to give a brief, but self-contained, exposition of the main results. We shall particularly emphasize the group theoretical aspects of the problem, whose role is not always properly appreciated and which help in placing the results of Dimensional Analysis in a proper perspective.

Historically, the $\Pi$-theorem, the key result in Dimensional Analysis, seems to have appeared in the work of Vaschy (1892), but it was Buckingham (1914) who gave a proof for some special cases and introduced the current terminology. Most books on Dimensional Analysis contain proofs of the $\Pi$-theorem, sometimes under apparently different statements and with different levels of rigour and scope. These classical proofs were clarified and improved by Birkhoff (1950), and later by Langhaar (1951), Drobot (1953) and Brand (1957). Even in recent times, Carlson (1978) and Curtis et al. (1982) gave neater presentations; both treatments are recommended.
A. Axiomatics for the Physical Algebra
The set of all physical quantities, or at least a set including all the physical quantities relevant in a problem or theory, and perhaps others demanded by reasons of mathematical completeness or coherence, has a structure commonly not made explicit but sanctioned by long practice. A basic problem in the foundations of the subject is to give a suitable axiomatization of the physical algebra which would lead to a complete description of that structure. There have been many attempts at such an axiomatization, in general very similar although they differ in small technical points. It is not our aim here to give a complete exposition, and the interested reader may consult the works by Drobot (1953), Brand (1957), Whitney (1968), Krantz et al. (1971), Kurth (1972), Carlson (1978b), Szekeres (1978), and Houard (1981). There is an evident and nontrivial overlap of the structure of this physical algebra with the existence of some laws of nature which allow such a structure to be built. But it seems better to start with a description of the structure generally agreed upon for the set of all physical quantities, and then to take that structure as the basic frame in which one can develop Dimensional Analysis. The main features of the description are formally independent of the particular physical laws to be
taken into account, so that the developed formalism has a greater degree of flexibility, from which the solution of a particular problem can profit. Leaving aside some technical details, there is satisfactory agreement in the description of such a structure. Maybe the moral of all these works is that, contrary to a widespread yet implicit belief, an intrinsic description in terms of the physical quantities themselves is possible, and it allows us to see how the key property of the functions used in Dimensional Analysis, namely dimensional homogeneity, or unit-free character, arises as a natural expression of a particular group invariance of the intrinsic description in terms of functions of the physical quantities.

The physical algebra adequate to a particular kind of problem (in mechanics, electrodynamics, maybe biology, economics, etc.) will be denoted by $\Phi$. It includes physical quantities, to be denoted by capital letters, $X, Y, \ldots$. In every field of application, the attention will be limited to some quantities, which are assumed to have been defined with reference to a theoretical frame or by means of some operational procedure. The relevant physical algebra will include these quantities, as well as a great number of other quantities, in many cases lacking a sensible direct physical interpretation. In addition to the set of these quantities, there is also a comparison criterion for them. That criterion is not necessarily an absolute one, valid everywhere once defined. On the contrary, it is better for theoretical purposes, and in many cases a general practice, to allow for some flexibility in the particular choice of the criterion. Whereas the ultimate motivation for the use of different criteria may come from disparate roots, the point is that the structure in the set $\Phi$ is always of the same kind, no matter what the particular comparison criteria are.
As a consequence of the criteria, all quantities may be considered as classified by the equivalence relation “being comparable” and therefore classed into “species”. As two extreme examples, one may take a comparison criterion for lengths and time intervals that is not artificially imposed but physically sound, in view of the Einsteinian relativity theory (of course such a criterion is also possible, but physically unnatural, in non-relativistic physics). At the other end, one could consider a criterion that does not allow a comparison between horizontal and vertical lengths, because they may play different roles in the context of a given particular situation, for example, in a homogeneous gravitational field, as is done consistently in practice in aircraft. Hence one may envisage the “species” of lengths, of energies, or the “species” of horizontal lengths, vertical lengths, angles in vertical planes, etc. All the elements in a given species are supposed to be unambiguously comparable (within the chosen criteria), and within a given species the elements can be added or multiplied by positive (or arbitrary) real numbers, which appear as an external set of operators in each class.
Furthermore, in addition to this rather restricted form of composition, there is a product of quantities, which is defined for quantities in the same class or in different classes, and a “raising to a power” operation. The treatment given to powers is recognized as the weakest part in all axiomatizations, and it is mainly there that the different treatments differ (e.g. raising to rational powers, real powers, etc.). As a matter of fact, as pointed out by Krantz et al. (1971), in classical mechanics, of all possible expressions $m^{\alpha} v^{\beta}$, only $mv$ and $mv^2$ appear to be physically meaningful, but not $m^{1/3}$ or v’. However, there is no axiomatization taking this fact into account. We emphasize that in no way is $\Phi$ to be considered as a “universal” physical algebra; instead, one must consider $\Phi$ as being adapted to a particular kind of problem or theory, having as an essential building block a specific comparison criterion. However, although the specific structure of different $\Phi$’s will differ, their algebraic structures are always of the same kind.

Definition: A physical algebra, or $\Phi$-algebra, is a set $\Phi$ with a composition law called product and denoted multiplicatively, and an external “composition law” (maybe not well defined everywhere), with rational (or real) numbers as external operators, called power and denoted exponentially, such that:
(i) The set of real numbers is a subset of $\Phi$, and the product and powers (when defined) of real numbers as elements of $\Phi$ coincide with the usual operations in the real field.
(ii) $XY = YX$, $(XY)Z = X(YZ)$, $1X = X$, for all $X, Y, Z \in \Phi$, where $1 \in \mathbb{R}$.
(iii) $X^p X^q = X^{p+q}$ for all $X \in \Phi$, $p, q \in \mathbb{Q}$, whenever $X^p$, $X^q$ and $X^{p+q}$ are defined.
(iv) Whenever $Y \neq 0Y$, $XY = Y$ is equivalent to $X = 1 \in \mathbb{R}$.
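The axioms above can be tried out on a toy model. The following Python sketch is only an illustration under stated assumptions: each quantity is stored as a real measure together with a tuple of rational exponents over two hypothetical base classes, and the class name `Quantity` and its method `same_as` are invented for this example. It is not an axiomatization, merely one concrete set on which conditions (i) to (iii) can be checked numerically.

```python
from fractions import Fraction


class Quantity:
    """Toy element of a physical algebra: a real measure times a class.

    Assumption of this sketch: the class is recorded as a tuple of rational
    exponents over some hypothetical base classes (here two of them).
    """

    def __init__(self, measure, exponents=(0, 0)):
        self.measure = float(measure)
        self.exponents = tuple(Fraction(e) for e in exponents)

    def __mul__(self, other):
        if not isinstance(other, Quantity):
            # A bare real number is an element of the algebra with zero exponents.
            other = Quantity(other, (0,) * len(self.exponents))
        exponents = tuple(a + b for a, b in zip(self.exponents, other.exponents))
        return Quantity(self.measure * other.measure, exponents)

    __rmul__ = __mul__  # the product is commutative, axiom (ii)

    def __pow__(self, p):
        p = Fraction(p)
        if self.measure < 0 and p.denominator != 1:
            # The power is an external law that need not be defined everywhere.
            raise ValueError("power not defined for this quantity")
        return Quantity(self.measure ** float(p),
                        tuple(a * p for a in self.exponents))

    def same_as(self, other, tol=1e-12):
        """Equality up to floating-point rounding of the measures."""
        scale = max(1.0, abs(self.measure), abs(other.measure))
        return (self.exponents == other.exponents
                and abs(self.measure - other.measure) <= tol * scale)


X = Quantity(3.0, (1, 0))
Y = Quantity(2.0, (0, 1))
Z = Quantity(5.0, (1, -2))
half, three_halves = Fraction(1, 2), Fraction(3, 2)

checks = {
    "commutative": (X * Y).same_as(Y * X),
    "associative": ((X * Y) * Z).same_as(X * (Y * Z)),
    "unit element": (1 * X).same_as(X),
    "power law": ((X ** half) * (X ** three_halves)).same_as(X ** 2),
    "reals embed": (Quantity(2.0) * Quantity(4.0)).same_as(Quantity(8.0)),
}
print(checks)
```

Rational exponents are kept exact with `Fraction`, while measures are ordinary floats, which is why the comparison uses a tolerance.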
If $X = xU$, with $x \in \mathbb{R}$, a multiple $aX$ of $X$ is the physical quantity $aX = (ax)U$. Any particular species is the subset of all real multiples of a fixed element in $\Phi$; such a subset is called a “ray” (a biray in the terminology of Whitney (1968)) and, as a subset of $\Phi$, will be denoted $\phi$. Notice that this ray includes the multiple by $0 \in \mathbb{R}$ of any element in it, and is therefore not quite the same as the rays in projective geometry. This ray can be considered as a point in the set of all classes, and on this set, denoted $\Xi$, the two operations in $\Phi$ induce a structure of linear space (over $\mathbb{Q}$ or over $\mathbb{R}$), usually written multiplicatively, with $\mathbf{1}$ denoting the null vector. The structure of the linear space $\Xi$ will depend both on the choice of the relevant quantities and on the comparison criterion, but the fact that $\Xi$ is a linear space holds independently of these details. In the following, we shall assume that $\Xi$ has a fixed structure of linear space, and we will denote by $\langle\ \rangle$ the canonical map $\langle\ \rangle: \Phi \to \Xi$ which carries each quantity $X \in \Phi$ to its class $\langle X \rangle$, considered as a point in $\Xi$. The “kernel” of this map, that is, the subset of
JOSÉ F. CARIÑENA AND MARIANO SANTANDER
elements in $\Phi$ going to the neutral element $\mathbf{1} \in \Xi$, is the subset of real numbers considered as physical quantities. As we have pointed out, there are many minor differences between the different axiomatizations, and we are not attempting to give a thoroughly complete discussion. A moment of reflection will convince the reader that these ideas capture the main features implicit in the ordinary usage in physics of the rules for calculating with physical quantities.

The set $\Phi$ of all physical quantities is therefore like $\Phi = \mathbb{R} \times \Xi$, its elements being pairs $(x, \phi)$, with $x \in \mathbb{R}$ and $\phi \in \Xi$, $\phi$ describing the species of the quantity and $x$ its amount. For each species $\phi$, any particular element $U \in \phi$, $U \neq 0U$, can be taken as a unit, and in terms of it any element $X$ in $\phi$ is written as a multiple $X = xU$ of $U$, the real number $x$ being called the “measure” of $X$ relative to the unit $U$. Thus, the selection of a system of units is equivalent to giving a “section” for the projection $\langle\ \rangle$, with the proviso that it does not meet any element $Z$ with $Z = 0Z$.

But one does not actually choose the units for all the classes in an arbitrary, independent way. Instead, one usually chooses units in such a way that the expression of laws in the given theory is as simple as possible, and this involves arbitrary conventions, whose physical basis can be more or less sound. In rough terms, these conventions are of two kinds. First, for some quantities there exist natural standards with reference to a given theory, and one may or may not define a unit in these species in order to give the standard some specific but arbitrary value. Second, for other quantities for which there is no natural standard, and maybe for those of the first kind for which we prefer not to take into account the existence of a natural standard, it is advisable to have some simple relation between units for quantities which are related by physical laws, and hence related in the physical algebra.
The simplest example of such a situation is furnished by Euclidean geometry when considered as a physical theory. The physical quantities are lengths, angles and areas. There is a natural standard for angles, for instance the angle between the two half-rays in a straight line, but there are no standards for length and area. In this sense, angle can be thought of as being dimensionless, and a unit for this quantity is obtained by fixing forever the value of the standard, for instance as the number $\pi$. But we can also deliberately forget about the existence of this standard and assign a unit for this species in an arbitrary way, admitting therefore arbitrary changes, the angle being then considered as a dimensional quantity. Among all quantities usually dealt with in elementary physics, the angle is the only one having such a natural standard in a manifest way, and this fact has given rise to misconceptions about the nature of angle dimensions: either we take into account the existence of the standard and choose it as a natural unit, and then the angle turns out to be a dimensionless quantity, or we simply treat angle as an ordinary extensive and additive quantity, the angle becoming then
a dimensional quantity. The point we want to stress here is that the existence of a standard in Euclidean, non-Euclidean and even Riemannian geometry is the fact that allows the angle to be considered dimensionless, and not the property, specific to Euclidean geometry, that the angle can be measured as the quotient of two lengths. Were the space geometry non-Euclidean, this last definition would break down. This example is taken from geometry, but one could as well take the example of speed in relativity, for “light speed” is a physical standard. Nothing prevents us, of course, from not using this standard to fix the corresponding unit, but in this case the price we would pay is the appearance of a universal constant.

Let us consider first the set of physical quantities for which standards exist and have been used for setting a corresponding set of units (fixed according to some convention). If $X$ and $Y$ are in this subset, with units $U(X)$ and $U(Y)$, one may naturally extend this set of units to the rays of $XY$ and $X^p$, as $U(XY) = U(X)U(Y)$ and $U(X^p) = (U(X))^p$. We shall denote by $\Phi_0$ the subalgebra of $\Phi$ which contains all elements for which units have been fixed forever in this way, and by $\Xi_0$ the vector subspace of $\Xi$ whose points are the corresponding classes. We now consider the quotient vector space $V_0 = \Xi/\Xi_0$. As follows from the theory of linear spaces, $\Xi$ is isomorphic to the direct sum (a direct product in our multiplicative notation) of $\Xi_0$ and $V_0$. This means that we can choose another subset of the physical algebra, say $\Phi'$, which is a subalgebra of $\Phi$ and whose set of classes is isomorphic to $V_0$. This choice is not canonical, but in the following we will assume that a fixed choice has been made, in order to forget about the existence of $\Phi_0$. The composition of the class projection $\langle\ \rangle$ with the canonical projection $\Xi \to V_0$ is denoted $[\ ]$ and corresponds to the conventional idea of dimension.
We remark that the kernel of $[\ ]$ is the subalgebra $\Phi_0$, which now includes the real numbers as a subset of $\Phi$ and also some other quantities, each in its own class different from that of real numbers, which are conventionally termed dimensionless. This rather involved treatment is needed if we wish to give a consistent distinction, in the conventional presentation, between “dimensionless quantities” and real numbers. In the following, $V_0$ is assumed to have a specific structure of finite-dimensional linear space, and we shall always assume that $V_0$ is the linear space of classes of a subalgebra $\Phi'$. In this context, we discuss the ordinary concepts in Dimensional Analysis. The traditional way has been through the settlement of a set of units. The interpretation of the units so obtained as actual units for their corresponding quantities depends, eventually, on a set of physical laws implicitly contained in the structure of $V_0$. Here, we are exclusively concerned with its formal description.
B. Dimensional Dependence and Units
We now define two basic concepts in the theory, that is, dimensional independence and fundamental set of units. Conventionally, D.A. is linked with both concepts, although, as will be seen in the next section, one can meaningfully discuss some problems of D.A. without any reference to units.

Definition: A set $\{X_1, \ldots, X_k\}$ of $k$ elements of $\Phi$ is said to be dimensionally dependent if the vectors $[X_1], \ldots, [X_k]$ are linearly dependent in $V_0$. Otherwise, they are said to be dimensionally independent. That is, $\{X_1, \ldots, X_k\}$ are dimensionally dependent if there exist rational numbers $c_1, \ldots, c_k$, not all zero, $c_i \in \mathbb{Q}$, such that $X_1^{c_1} \cdots X_k^{c_k} \in \Phi_0$. Otherwise, the set $\{X_1, \ldots, X_k\}$ is said to be dimensionally independent.

Definition: A fundamental set of units $\{U_1, \ldots, U_m\}$ for $\Phi$ is a set of $m$ elements in the subalgebra $\Phi'$ such that
(i) $\{U_1, \ldots, U_m\}$ are not dimensionally dependent.
(ii) For each rational number $a$, $U_i^a$ is defined for all the units $U_i$.
(iii) For each element $X \in \Phi$, the set $\{X, U_1, \ldots, U_m\}$ is dimensionally dependent.

We notice that for every $X \in \Phi$, its class $\langle X \rangle$ has a unique decomposition (depending on the isomorphism between $V_0$ and the set of $\langle\ \rangle$ classes of $\Phi'$) as a product $\langle X \rangle = \langle X_0 \rangle \langle X' \rangle$ with $\langle X_0 \rangle \in \Xi_0$ and $\langle X' \rangle \in V_0$. In the class $\langle X_0 \rangle \in \Xi_0$ there is a well-defined unit, to be denoted simply $U_0(X_0)$. Now the last requirement in the preceding definition means that for every $X \in \Phi$ there exist a unique element $U_0(X_0) \in \Phi_0$, a unique $x \in \mathbb{R}$ and $m$ unique rational numbers $a_1, \ldots, a_m$ such that $X = x\, U_1^{a_1} \cdots U_m^{a_m}\, U_0(X_0)$. The uniqueness follows from the factorization $\langle X \rangle = \langle X_0 \rangle \langle X' \rangle$. To this there does not correspond a unique factorization $X = X_0 X'$ with $X_0 \in \Phi_0$ and $X' \in \Phi'$, since both factors are only defined up to a real number. In fact, for each possible factorization of $X$ with $X_0 = x_0\, U_0(X_0)$ and $X' = x'\, U_1^{a_1} \cdots U_m^{a_m}$, only the product $x = x_0 x'$ is uniquely defined. The product $U_1^{a_1} \cdots U_m^{a_m}\, U_0(X_0)$ may be considered as a unit in the ray of $X$, obtained from the set of fundamental units $\{U_1, \ldots, U_m\}$. So, the accepted linear space structure of the set $\Xi$ gives us a well-defined description of the structure of a “coherent” set of units for all rays, although this does not provide by itself an interpretation for these units. Actually, most rays in $\Xi$ never appear in physical theories, and their units, although formally defined, lack any reasonable interpretation. The real number $x$, the measure of $X$ relative to the unit in its ray, is also called the measure of $X$ relative to the set of
fundamental units $\{U_1, \ldots, U_m\}$, and the column vector $(a_1, \ldots, a_m)^T \in \mathbb{Q}^m$ is called the dimension vector of $X$ relative to the set $\{U_1, \ldots, U_m\}$. In the conventional way of writing, each symbol $[U_i]$ stands for a (canonical) basis vector of $\mathbb{Q}^m$, and the dimension vector of $X$ is hence written (remember the multiplicative notation) as $[U_1]^{a_1} \cdots [U_m]^{a_m}$. The $[U_i]$ are called “basic” dimensions, the name being more appropriate than one could expect, because they are in fact a basis in a linear space. The explicit use of the linear space structure and of the language of linear algebra actually makes most results in D.A. almost transparent.

The appearance of the factor $U_0(X_0)$ in the definition of the derived unit for $X$ is noteworthy. It distinguishes units for two classes of elements differing by a factor which had been declared dimensionless. Hence a complete set of units, that is, a set of units for all rays, is obtained from: (i) the specification of the quantities for which we have used standards for selecting “natural” units, stated in terms of some conventions, and (ii) a fundamental set of units $\{U_1, \ldots, U_m\}$. Once such a unit system has been selected, one can define a map $\mathcal{X}: \Phi \to \mathbb{R}$ mapping every element $X \in \Phi$ into its measure relative to the complete set of units. As we shall assume that the conventions which underlie the settlement of units in $\Phi_0$ are left fixed, even though implicit, we will refer to $x$ as the measure of $X$ relative to the fundamental set of units $\{U_1, \ldots, U_m\}$. That map satisfies the following conditions:

(i) $\mathcal{X}(x) = x$ for all real numbers $x \in \mathbb{R}$.
(ii) $\mathcal{X}(XY) = \mathcal{X}(X)\,\mathcal{X}(Y)$ for all $X, Y \in \Phi$.
(iii) $\mathcal{X}(X^p) = (\mathcal{X}(X))^p$ for all $X \in \Phi$, $p \in \mathbb{Q}$.
(iv) $\mathcal{X}(X) = 0$ if and only if $X = 0X$.
Any map $\mathcal{X}: \Phi \to \mathbb{R}$ satisfying these conditions is called a gauge on $\Phi$. Any set of fundamental units determines a gauge in a well-defined way. Conversely, one may take the gauge concept as a primitive one. Assume that we have a map $\mathcal{X}: \Phi \to \mathbb{R}$ satisfying the four conditions (i) to (iv) above. We may associate to every $X \neq 0X$ in $\Phi$ an element $U_{\mathcal{X}}(X)$ in such a way that we have the identity $X = \mathcal{X}(X)\, U_{\mathcal{X}}(X)$. Furthermore, if $X_1$ and $X_2$ are in the same class $\langle X \rangle$ but are not equal to the zero element in their class (that is, one is a multiple of the other and $X_1 \neq 0X_1$, $X_2 \neq 0X_2$), it is easy to see that $U_{\mathcal{X}}(X_1) = U_{\mathcal{X}}(X_2)$, and $U_{\mathcal{X}}(X_1)$ may be taken as a unit in the ray of $X$. Naturally enough, we may also set $U_{\mathcal{X}}(0X) = U_{\mathcal{X}}(X)$ for all $X \neq 0X$. As a consequence of (ii), if a ray $Z$ is the product of two rays, $X$ and $Y$, the unit in the ray of $Z$ obtained by means of that procedure, $U_{\mathcal{X}}(XY)$, equals $U_{\mathcal{X}}(X)\, U_{\mathcal{X}}(Y)$. Thus, the concept of gauge leads to the same idea of a coherent system of units as our preceding method, which was based on a set of fundamental units.
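As an illustration of the gauge conditions and of the coherence of the units $U_{\mathcal{X}}$, here is a minimal Python sketch. Everything concrete in it is an assumption made for the example: quantities are stored as a measure in a reference system (metres and seconds) plus rational exponents over two hypothetical fundamental classes, and a gauge is specified by the size of its two fundamental units in reference units. The helper names `mul`, `power`, `make_gauge` and `unit` are invented here.

```python
from fractions import Fraction
from math import prod, isclose

# A quantity is a pair (reference_measure, exponents); the exponents are
# rationals over two hypothetical fundamental classes, length and time.
def mul(X, Y):
    return (X[0] * Y[0], tuple(a + b for a, b in zip(X[1], Y[1])))

def power(X, p):
    return (X[0] ** float(p), tuple(a * p for a in X[1]))

def make_gauge(scales):
    """Gauge whose i-th fundamental unit equals scales[i] reference units."""
    def gauge(X):
        return X[0] / prod(s ** float(a) for s, a in zip(scales, X[1]))
    return gauge

def unit(gauge, X):
    """U(X) = X / gauge(X), defined for X different from 0X."""
    return (X[0] / gauge(X), X[1])

gauge = make_gauge((0.01, 60.0))              # centimetre and minute
L = (2.0, (Fraction(1), Fraction(0)))         # 2 m
T = (30.0, (Fraction(0), Fraction(1)))        # 30 s
V = mul(L, power(T, Fraction(-1)))            # a speed, L / T
seven = (7.0, (Fraction(0), Fraction(0)))     # a real number as a quantity

print(gauge(L), gauge(T), gauge(V))           # measures in cm, min, cm/min

# Condition (ii) and the coherence of the derived unit, up to rounding.
product_rule = isclose(gauge(mul(L, T)), gauge(L) * gauge(T))
derived = unit(gauge, mul(L, T))
composed = mul(unit(gauge, L), unit(gauge, T))
coherent = isclose(derived[0], composed[0]) and derived[1] == composed[1]
```

In this model the unit attached to a ray depends only on the exponents, so two non-zero quantities of the same class receive the same unit, as stated above.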
C. The Group of Unit Changes and the Gauge Group
Once the background for the idea of a system of units has been given, whether through a set of fundamental units or through a gauge, we have to discuss the allowed changes. Let us discuss first the case of sets of fundamental units. We assume that the conventions defining the units in $\Phi_0$ are maintained. Accordingly, the “direct product” structure $\Phi = \Phi_0 \times \Phi'$ will be maintained. From the definition of fundamental sets of units, and using elementary results in the theory of linear spaces, the most general change (provided that $\Phi'$ is fixed) of a fundamental set of units to another one is easily proven to be given by

$$(U_1, \ldots, U_m) \to (U_1', \ldots, U_m'), \qquad U_i' = \lambda_i\, U_1^{b_{1i}} \cdots U_m^{b_{mi}}, \qquad (1)$$

with real numbers $\lambda_i > 0$ and $B = (b_{ij})$ a rational matrix with $\det B \neq 0$. Under such a change, the measure and the dimension vector of $X$ transform as

$$x' = \frac{x}{\lambda_1^{a_1'} \cdots \lambda_m^{a_m'}}, \qquad (a_1', \ldots, a_m')^T = B^{-1} (a_1, \ldots, a_m)^T.$$
This transformation law displays the fact that the sign of the measure of a quantity $X$ does not change under a change of units. For some applications, it is convenient to restrict our study to the “positive” part of the ray, selected by the choice of a unit system. The corresponding subset $\Phi_+$ will be called the positive part of $\Phi$; it depends on the initial choice of the unit system, but thereafter is independent of any later change of units.

The set of all changes of units is of course a group. This group receives, as is the case in geometry, two possible interpretations: a passive one, in which one studies the transformation laws of the measures and dimension vectors of a fixed set of physical quantities when one changes the system of units, and an active one, with a change of units considered as a transformation of the set $\Phi$ into itself. When considered in this way, the structure of the group is easily ascertained. In fact, we can regard a change of units as a transformation $c: \Phi \to \Phi$ which maps each ray into itself and carries the old unit $U_i$ into the new unit $U_i'$. In this way every change of units (induced by a change of fundamental units) generates a mapping $c: \Phi \to \Phi$ satisfying the conditions
(i) $c(XY) = c(X)\, c(Y)$.
(ii) $c(X^p) = (c(X))^p$.
(iii) $c(X) = X$ for all $X \in \Phi_0$.
(iv) $\langle c(X) \rangle = \langle X \rangle$.
(v) $c$ leaves invariant the positive part of $\Phi$.
A transformation with these properties is called a similarity of the physical algebra. One can easily show (cf. Krantz et al., 1971) that a similarity is completely described by the images of a set of fundamental units, which, being in the same ray, must be simple multiples $c(U_i) = \lambda_i U_i$, with the $\lambda_i$ ($i = 1, \ldots, m = \dim V_0$) positive real numbers. If $X$ has dimension vector $(a_i)$ relative to $\{U_1, \ldots, U_m\}$, then $c(X) = \lambda_1^{a_1} \cdots \lambda_m^{a_m} X$. In this way, we can identify the group of similarities of the physical algebra with $(\mathbb{R}_+)^m$.

The same conclusion could also be reached through the use of the set of fundamental units. Suppose that we have a set of fundamental units $\{U_1, \ldots, U_m\}$, together with the units in all rays obtained from them and from the fixed units $U_0$ in the subalgebra $\Phi_0$. We ask for the most general form of another set of fundamental units $\{V_1, \ldots, V_m\}$ which gives rise to the same set of units for each ray. Simple algebra shows that the $V_i$'s are given in terms of a rational regular matrix $B = (b_{ij})$ as $V_i = U_1^{b_{1i}} \cdots U_m^{b_{mi}}$.

The set of all such changes is a group, and invariance under such a group is conceptually the proper way of expressing the independence of a law from the unit system used in its formulation. But it is pertinent to recognize the great amount of conventionality in this group. In fact, as remarked before, we are free either to use existing standards or not. If they are not used at all, then for the same set of physical quantities the part of the algebra we have called $\Phi_0$ reduces to the real numbers, and the linear space $\Xi$ becomes identified with $V_0$. In this way, the number of elements in a fundamental set of units becomes larger, and hence the group itself becomes larger. But the price we pay will be the reappearance of the standards as a number of universal constants.
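The transformation law attached to Eq. (1) lends itself to a short numerical sketch. The Python fragment below is illustrative only and rests on the index convention assumed here, $U_i' = \lambda_i \prod_j U_j^{b_{ji}}$, so that the new dimension vector solves $B a' = a$; the function names `solve` and `change_units` and the two examples (metres and seconds as old units) are hypothetical.

```python
from fractions import Fraction
from math import prod


def solve(B, a):
    """Solve B y = a exactly by Gauss-Jordan elimination (B square, regular)."""
    m = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(a[i])] for i, row in enumerate(B)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col:
                M[r] = [v - M[r][col] * w for v, w in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def change_units(x, a, lambdas, B):
    """Measure and dimension vector after U'_i = lambda_i * prod_j U_j**B[j][i]."""
    a_new = solve(B, a)                                   # a' = B^{-1} a
    factor = prod(lam ** float(e) for lam, e in zip(lambdas, a_new))
    return x / factor, a_new


# Pure scale change (B = identity): metre, second -> kilometre, hour.
speed = change_units(10.0, (1, -1), (1000.0, 3600.0), [[1, 0], [0, 1]])

# Change of basis without rescaling: new fundamental units are
# U'_1 = U_1 * U_2**(-1) (a speed unit) and U'_2 = U_2 (the time unit).
length = change_units(5.0, (1, 0), (1.0, 1.0), [[1, 0], [-1, 1]])

print(speed)    # about 36 with exponents (1, -1): 10 m/s is 36 km/h
print(length)   # measure 5 with exponents (1, 1): length = speed * time
```

Because every $\lambda_i$ is positive, the factor dividing the measure is positive, which is the sign invariance noted in the text.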
The peculiar position of angle in Dimensional Analysis can be traced to the fact that, in contrast to length, time or mass, for which no direct standards exist (at least in elementary physics), for angle a direct standard does exist. Carlson has pointed out that from the conceptual viewpoint, the proper setting for this discussion is the consideration of general transformations of units, which gives no undue prominence to any particular system of “fundamental dimensions”. However, in view of the previous result, there is no loss of generality in considering only the pure scale changes ($B = 1$), and this explains why the apparently greater generality afforded by the consideration of general changes is not actually relevant for the results obtained.

Of course, all the preceding discussion could be framed in terms of gauges and gauge changes. Two gauges $\mathcal{X}$ and $\mathcal{Y}$ define a unique real function $\alpha: \Phi \to \mathbb{R}$ by $\mathcal{Y}(X) = \alpha(X)\, \mathcal{X}(X)$. Obviously, $\alpha$ is a function satisfying $\alpha(X)\,\alpha(Y) = \alpha(XY)$ for all $X, Y \in \Phi$ and $\alpha(X^p) = (\alpha(X))^p$, with $\alpha(X) > 0$ and $\alpha(x) = 1$ for all $x \in \mathbb{R}$. The set of all these gauge changes forms an Abelian group under the natural composition law $(\alpha\beta)(X) = \alpha(X)\,\beta(X)$. From the preceding expressions, $\alpha$ can be considered as a function on the set $\Xi$ of rays, in fact a linear map of the linear space $\Xi$ into $\mathbb{R}$. A gauge is therefore completely defined by its values on a
set of physical quantities whose associated vectors in $\Xi$ generate the whole space $\Xi$, and these values can be ascribed independently if the corresponding set of associated vectors is a basis for $\Xi$. We may restrict ourselves to changes of gauge with $\alpha(X) = 1$ for $X \in \Phi_0$. Such a gauge change is completely defined by the values of $\alpha$ on a set of physical quantities whose associated vectors in $V_0$ generate the whole space $V_0$. Hence, each gauge change of this kind (the ones corresponding to changes of sets of fundamental units) appears to be completely determined by $m$ positive real numbers. As an abstract group, the group of changes of fundamental units or group of gauge changes, called the gauge group, is an Abelian group isomorphic to $(\mathbb{R}_+)^m$.

In the active point of view, once a fundamental set of units $\{U_1, \ldots, U_m\}$ is fixed, the element $g_\lambda = (\lambda_1, \ldots, \lambda_m)$ acts on the set $\Phi$ as follows: if the dimension vector of $X$ relative to the set of fundamental units $\{U_1, \ldots, U_m\}$ is $(a_1, \ldots, a_m)^T$, then $g_\lambda(X) = \lambda_1^{a_1} \cdots \lambda_m^{a_m} X$. In the passive point of view, and for each “species”, the group $G$ acts on the set (identified with $\mathbb{R}$) of the measures of the quantities in a ray, the transformation corresponding to $g_\lambda$ being $x' = x\, \lambda_1^{-a_1} \cdots \lambda_m^{-a_m}$. When there is no risk of confusion, we shall also denote this action as $x \to g_\lambda x$, but we warn that in this expression $x$ does not mean simply a real number but something carrying information on the aspect of the physical nature of the quantity $X$ described by the dimension $[X]$.

A set of $n$ quantities $\{X_1, \ldots, X_n\}$ determines a set of $n$ measures $(x_1, \ldots, x_n)$ and an $m \times n$ rational matrix $A$ whose $i$-th column is the dimension vector of $X_i$ relative to the fundamental set of units $\{U_1, \ldots, U_m\}$, i.e. $X_i = x_i \prod_{j} U_j^{a_{ji}}$. The set of their measures is identified with $\mathbb{R}^n$ (or $\mathbb{R}_+^n$), and the action of the group $G$ on the point $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ is given by

$$x_i' = x_i\, \lambda_1^{-a_{1i}} \cdots \lambda_m^{-a_{mi}}.$$
If one considers the apparently more general change of a set of fundamental units given in (1), then in addition to this transformation law, the dimension matrix $A$ also changes, as $A' = B^{-1}A$. The rank of $A$ is invariant under these transformations because $B$ is regular. The transformation law (1) is easily remembered by means of the conventional expression $[U_1]^{a_{1i}} \cdots [U_m]^{a_{mi}}$ for the dimension of $X_i$. In particular, if the dimension vector of $X$ is the vector $(0, 0, \ldots, 0)^T$ relative to some fundamental set of units, it is also the same vector relative to any other fundamental set of units, and the measure of $X$ is invariant under all changes of fundamental units; such an $X$ is called dimensionless.

D. Dimensionless Products

The next concept is that of a dimensionless product. We shall restrict ourselves here to the subset of physical quantities whose measure is strictly positive, a subset that is stable under changes of fundamental units.
Definition: A dimensionless product of the $n$ physical quantities $X_i$, $i = 1, \ldots, n$, is a product of rational powers of the $X_i$'s, $\Pi = X_1^{k_1} \cdots X_n^{k_n}$, where the $k_i$ are rational numbers, whose dimension vector is the zero vector.

An element of $\Phi_0$ will be dimensionless, but there exist other dimensionless products of elements in $\Phi$. Any product $\Pi$ of rational powers of the $X_i$'s is therefore characterized by an element of $\mathbb{Q}^n$, $k = (k_i)$, and its dimension vector is easily obtained by direct substitution of the $X_i$ in terms of the $U_i$, which gives $Ak$ as the dimension vector of the product. So, $\Pi$ is dimensionless if and only if $Ak = 0$. The well-known results of linear algebra on the general solution of a homogeneous linear equation immediately imply the basic result: “Any dimensionless product of the $n$ quantities $\{X_1, \ldots, X_n\}$ can be written as a product of rational powers of $n - r$ dimensionless products associated to any set of $n - r$ linearly independent rational solutions of the equation $Ak = 0$.” Any such set of dimensionless products is called a complete set of dimensionless products.

In order to have an algorithmic procedure to compute a complete set of dimensionless products of the $n$ quantities $\{X_1, \ldots, X_n\}$, whose dimension matrix $A$ relative to a fundamental set of units has rank $r$, it is convenient to define a new matrix $K$, called by Carlson the exponential matrix of a set of $p \le n - r$ dimensionless products $\Pi_1, \ldots, \Pi_p$. $K$ is a rational $n \times p$ matrix whose $j$-th column is the vector $k_j$ associated to the $j$-th dimensionless product. Of course, we have $AK = 0$. Conversely, if $K$ is an $n \times p$ rational matrix satisfying $AK = 0$, then $K$ is the exponential matrix of a set of dimensionless products. For a complete set of dimensionless products of the $\{X_1, \ldots, X_n\}$, $K$ will be an $n \times (n - r)$ rational matrix with rank $n - r$.
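The criterion $Ak = 0$ and the count $n - r$ can be checked on a familiar case. The sketch below uses the simple pendulum as an assumed example (period $t$, length $l$, gravitational acceleration $g$, bob mass $m$, with mass, length and time as fundamental units); this example and the helper names `rank` and `dimension_of_product` are not taken from the text.

```python
from fractions import Fraction


def rank(A):
    """Rank of a rational matrix by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def dimension_of_product(A, k):
    """Dimension vector A k of the product X_1**k_1 * ... * X_n**k_n."""
    return [sum(Fraction(a) * Fraction(e) for a, e in zip(row, k)) for row in A]


# Rows: mass, length, time.  Columns: period t, length l, gravity g, mass m.
A = [[0, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 0, -2, 0]]

n, r = len(A[0]), rank(A)
k = [1, Fraction(-1, 2), Fraction(1, 2), 0]     # t * l**(-1/2) * g**(1/2)

print(n - r)                                    # size of a complete set
print(dimension_of_product(A, k))               # zero vector: dimensionless
print(dimension_of_product(A, [1, 1, 0, 0]))    # t * l is not dimensionless

# Dimensional dependence of {t, l, g}: their three columns have rank 2.
sub = [[row[0], row[1], row[2]] for row in A]
dependent = rank(sub) < 3
```

The same rank test applied to a subset of columns decides dimensional dependence in the sense of the definition in Section B.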
A possible $K$ for such a complete set is determined following the method used by Brand (1957). Without loss of generality, and simply by an adequate reordering of the $X$'s and $U$'s, we can suppose that the $m \times n$ dimension matrix $A$, whose rank is $r$, can be decomposed into blocks

$$A = \begin{bmatrix} P & Q \\ R & S \end{bmatrix}, \qquad (2)$$

$P$ being an $r \times r$ rational matrix with $\det P \neq 0$. From the fact that $\operatorname{rank} A = r$, it easily follows that $S = RP^{-1}Q$. In fact, as the last $m - r$ rows are linear combinations of the first $r$, there will be an $(m - r) \times r$ matrix $M$ such that $MP = R$, $MQ = S$. From the first equation, $P$ being regular, $M = RP^{-1}$, and the second equation then gives $S = RP^{-1}Q$. Then the matrix

$$K = \begin{bmatrix} -P^{-1}Q \\ I_{n-r} \end{bmatrix} \qquad (3)$$
does the job, although it is, of course, not the only possible solution. Due to the particular structure of that matrix, the $j$-th dimensionless product has the structure $\Pi_j = \prod_{s=1}^{r} (X_s)^{k_{sj}}\, X_{r+j}$, and contains no other factors $X_s$ with $s > r$. Thus, it is easy to compute a complete set of dimensionless products of the $X$'s such that some of the $X_k$ appear only in a single dimensionless product. This is an interesting result, which may make the application of D.A. easier in a particular problem in which we are actually interested in an explicit expression of a particular variable in terms of other quantities.
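The block construction just described can be sketched in a few lines of Python. This is an illustration under assumptions, not a general-purpose routine: the columns and rows of $A$ are taken to be already ordered so that the leading $r \times r$ block $P$ is regular, the value of $r$ is supplied by the caller, and the names `solve`, `exponent_matrix` and `matmul` as well as the pendulum data are hypothetical.

```python
from fractions import Fraction


def solve(P, q):
    """Solve P y = q exactly by Gauss-Jordan elimination (P square, regular)."""
    m = len(P)
    M = [[Fraction(v) for v in row] + [Fraction(q[i])] for i, row in enumerate(P)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col:
                M[r] = [v - M[r][col] * w for v, w in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def exponent_matrix(A, r):
    """K = [[-P^{-1} Q], [I]] for A = [[P, Q], [R, S]], P the leading r x r block."""
    n = len(A[0])
    P = [row[:r] for row in A[:r]]
    columns = []
    for j in range(r, n):
        q = [row[j] for row in A[:r]]                 # j-th column of Q
        y = solve(P, q)                               # y = P^{-1} q
        identity_part = [Fraction(int(i == j)) for i in range(r, n)]
        columns.append([-v for v in y] + identity_part)
    return [list(row) for row in zip(*columns)]       # n x (n - r) matrix


def matmul(A, K):
    return [[sum(Fraction(a) * k for a, k in zip(row, col)) for col in zip(*K)]
            for row in A]


# Rows: mass, length, time.  Columns ordered as bob mass, length, gravity, period.
A = [[1, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 0, -2, 1]]
K = exponent_matrix(A, 3)
print(K)               # exponents 0, -1/2, 1/2, 1: the product l**(-1/2) g**(1/2) t
print(matmul(A, K))    # every entry is zero, i.e. A K = 0
```

Placing the period in the last column makes it appear, with exponent one, in a single dimensionless product, which is the situation the text singles out as convenient when one variable is to be expressed in terms of the others.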
E. Functions in the Physical Algebra

We now define the functions in the physical algebra and their representations.

Definition: A $\Phi$-function is any function of $n$ physical quantities, defined on the Cartesian product of $n$ rays $\phi_1, \ldots, \phi_n$ and taking values in a ray $\phi$,

$$F: \phi_1 \times \phi_2 \times \cdots \times \phi_n \to \phi.$$
Notice that each of the $\phi_i$, which is a point in $\Xi$, must here be considered as a set, in fact a subset of $\Phi$, when appearing in the domain or the range of $F$. A particular case arises when the range is the set of real numbers considered as physical quantities, and there we speak of a real $\Phi$-function. A gauge $\mathcal{X}$ having been fixed, it establishes a well-defined relation between $\Phi$-functions and real $\Phi$-functions: for each $\phi$-valued function $F$, $F^{\mathcal{X}} = \mathcal{X} \circ F$ is a real-valued $\Phi$-function. The usual practice in physics is not to work directly with these functions, but with their representations in a gauge. For the purpose of achieving a clear separation between the ordinary algebra and calculus of real numbers and the physical algebra, this distinction is essential.

Definition: Let $\mathcal{X}$ be a gauge in $\Phi$ and let $F$ be a real-valued $\Phi$-function. The function of $n$ real variables defined by

$$F_{\mathcal{X}}(x_1, \ldots, x_n) = F\big(x_1 U_{\mathcal{X}}(\phi_1), \ldots, x_n U_{\mathcal{X}}(\phi_n)\big)$$

is called the representation of $F$ in the gauge $\mathcal{X}$. The representation in the gauge $\mathcal{X}$ of a $\phi$-valued $\Phi$-function $F$ is simply the representation of $\mathcal{X} \circ F$, that is, a real function of $n$ real variables defined by

$$F_{\mathcal{X}}(x_1, \ldots, x_n) = \mathcal{X}\big[F\big(x_1 U_{\mathcal{X}}(\phi_1), \ldots, x_n U_{\mathcal{X}}(\phi_n)\big)\big],$$

i.e., $F_{\mathcal{X}} = (\mathcal{X} \circ F)_{\mathcal{X}}$. (For details, see Szekeres, 1978.)

Now consider a relationship between the physical quantities $X_1, \ldots, X_n$ and assume that it is written in terms of a $\Phi$-function. In view of the preceding
comments, there is no loss of generality in assuming that our relation is written in terms of a real $\Phi$-function, as $F(X_1, \ldots, X_n) = 0$. This equation may or may not be invariant under a gauge transformation, or under a change of a fundamental set of units. For any $g \in G$, the transform of $F$ under $g$ is defined in the usual way, as

$$(gF)(gX_1, \ldots, gX_n) = F(X_1, \ldots, X_n).$$

The equation $F = 0$ determines a subgroup $G_F$ of the group of all changes of fundamental sets of units, namely the invariance group of that equation: $g \in G_F$ if $gF = F$. In other words, $g \in G_F$ means that the equation $F(gX_1, \ldots, gX_n) = 0$ is equivalent to $F(X_1, \ldots, X_n) = 0$. Once a gauge or a fundamental set of units has been fixed, we may consider the representation of $F(X_1, \ldots, X_n) = 0$, to be denoted $f(x_1, \ldots, x_n) = 0$; the arguments of $f$ are the numerical measures $x_i$ of the $X_i$ relative to the gauge $\mathcal{X}$ or to the fundamental set of units $\{U_1, \ldots, U_m\}$. The condition of invariance of the equation $F = 0$ under the group $G_F$ is translated as follows: $(f^g)(x_1, \ldots, x_n) = 0$ is equivalent to $f(x_1, \ldots, x_n) = 0$, where the function $f^g$ is defined, as before, by

$$(f^g)(gx_1, \ldots, gx_n) = f(x_1, \ldots, x_n),$$

where the real numbers $gx_i$ are given in terms of the $x_i$ as follows: if, in the parametrization of $G$ corresponding to the fundamental system of units $\{U_1, \ldots, U_m\}$, the elements are denoted as $g_\lambda = (\lambda_1, \ldots, \lambda_m)$, then $gx_i = \lambda_1^{-a_{1i}} \cdots \lambda_m^{-a_{mi}}\, x_i$. The equivalence of $f^g = 0$ and $f = 0$ holds for all elements of the group $G_F$.

In particular, an equation $F = 0$, where $F$ is a given real $\Phi$-function, may be invariant under all elements in the group of unit changes. In this case, we shall say that $F$ is dimensionally invariant, or unit-free, or dimensionally homogeneous. Under this form we recognize a particular kind of group invariance, akin to the invariance required by the principle of relativity: the functional relations between the measures of the quantities are the same, no matter what particular choices of the fundamental set of units are made. When a dimensionally invariant equation $F(X_1, \ldots, X_n) = 0$ is represented in some fundamental set of units, the equation $f(x_1, \ldots, x_n) = 0$ has the corresponding invariance property under all changes of gauge. Sometimes this property is the object of a definition of dimensional homogeneity for equations whose arguments are measures, but actually this condition is a consequence of the assumed group invariance of the original $\Phi$-equation $F = 0$. In practice, the equations are usually written in a particular representation, and one must remember that the variables $x_1, \ldots, x_n$ in $f$ are not merely real numbers, but are to be considered as the measures of the physical quantities $X_1, \ldots, X_n$, and hence each of the $x_i$ must be attached to a dimension vector, the
$i$-th column of the dimension matrix $A$ of the set $\{X_1, \ldots, X_n\}$ relative to the set of units $\{U_1, \ldots, U_m\}$. The condition of being dimensionally homogeneous or unit-free appears as the equivalence of the equations $f(x_1, \ldots, x_n) = 0$ and $f(gx_1, \ldots, gx_n) = 0$ for all changes of fundamental sets of units.
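The equivalence of $f(x_1, \ldots, x_n) = 0$ and $f(gx_1, \ldots, gx_n) = 0$ can be probed numerically. The following Python sketch is a spot check on a few scale changes, not a proof; the pendulum relation, the deliberately non-homogeneous relation $t - l = 0$, the dictionary `DIMENSIONS` and the function names are all assumptions introduced for the example.

```python
from math import pi, sqrt, prod

# Dimension vectors over two hypothetical fundamental units (length, time).
DIMENSIONS = {"t": (0, 1), "l": (1, 0), "g": (1, -2)}


def change_measures(measures, lambdas):
    """Passive action on measures: x_i -> x_i * prod_j lambda_j**(-a_ji)."""
    return {name: x * prod(lam ** (-a) for lam, a in zip(lambdas, DIMENSIONS[name]))
            for name, x in measures.items()}


def pendulum(t, l, g):
    """Left side of the unit-free relation t - 2*pi*sqrt(l/g) = 0."""
    return t - 2 * pi * sqrt(l / g)


def mixed(t, l, g):
    """Left side of t - l = 0, which is not unit-free."""
    return t - l


def still_holds(relation, measures, lambdas, tol=1e-9):
    """Does the relation remain satisfied by the transformed measures?"""
    new = change_measures(measures, lambdas)
    return abs(relation(**new)) <= tol * abs(new["t"])


changes = [(100.0, 1.0), (0.3048, 60.0), (1e-3, 1e3)]
on_pendulum = {"l": 1.0, "g": 9.81, "t": 2 * pi * sqrt(1.0 / 9.81)}
on_mixed = {"l": 1.0, "g": 9.81, "t": 1.0}

pendulum_ok = all(still_holds(pendulum, on_pendulum, c) for c in changes)
mixed_ok = all(still_holds(mixed, on_mixed, c) for c in changes)
print(pendulum_ok, mixed_ok)
```

The second relation is satisfied by the chosen measures in the original units but fails after rescaling, which is what distinguishes an accidental numerical coincidence from a dimensionally homogeneous law.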
F. The $\Pi$-Theorem

Let us make precise the structure of dimensionally homogeneous functions. Assume we have a set of units $\{U_1, \ldots, U_m\}$. As we have seen, the most general change of a system of fundamental units is that given by Eq. (1), and is determined by the $m$ positive real numbers $\lambda_i$ and a rational non-singular matrix $B$. If $f(x_1, \ldots, x_n) = 0$ is a unit-free relation, then a simple use of the transformation laws shows that if one has $f(x_1, \ldots, x_n) = 0$, one also has

$$f\big(\lambda_1^{-a_{11}} \cdots \lambda_m^{-a_{m1}}\, x_1, \ \ldots, \ \lambda_1^{-a_{1n}} \cdots \lambda_m^{-a_{mn}}\, x_n\big) = 0, \qquad (4)$$

and conversely. This has to be an identity for all possible choices of the $\lambda_i$. The $\Pi$-theorem is in fact a result on the structure of unit-free relations, obtained by applying an adequate specialization of the values of the $\lambda_i$ to the functional Eq. (4). We give this result in the form:
$\Pi$-Theorem: Let $\{U_1, \ldots, U_m\}$ be a fundamental set of units, and let $\{X_1, \ldots, X_n\}$ be $n$ physical quantities, with (positive) measures $\{x_1, \ldots, x_n\}$ and dimension matrix $A$, with $\operatorname{rank} A = r$. If the equation $f(x_1, \ldots, x_n) = 0$ is unit-free, then that equation is equivalent to a relation of the form $f(1, \ldots, 1, \Pi_1, \ldots, \Pi_{n-r}) = 0$, where the $\Pi_i$ are a complete set of dimensionless products of the $X$'s. If we assume that the ordering of the $X$'s and $U$'s is such that $A$ has a partition in blocks as the one given in Eq. (2), a suitable complete set of dimensionless products of the $X$'s is the one with exponential matrix $K$ given in Eq. (3).

For a careful proof of the $\Pi$-Theorem, see, e.g., Carlson (1978) or Curtis et al. (1982). We shall return to this result from the point of view of group theory, but at this point we want to remark that, under the hypothesis of the relation $f = 0$ being unit-free, one may choose a particular set of values of the $\lambda_i$ such that the transformed values of the first $r$ of the $x_i$ are equal to one and the remaining $n - r$ transformed values of the $x_i$ equal the values of a complete set of dimensionless products.

G. The Group-Theoretical Meaning of the $\Pi$-Theorem
We will next discuss the preceding results from a group-theoretical perspective. Let $F$ be a $\Phi$-function defined in a subset of $\phi_1 \times \cdots \times \phi_n$ of $\Phi^n$ with
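values in a particular ray $\varphi$. This covers the case of real functions if $\varphi$ is $\mathbb{R}$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the representation of $F$ in a gauge $\chi$, defined by the following commutative diagram: [diagram not recoverable from the source; its bottom row is $f : \mathbb{R}^n \to \mathbb{R}$], i.e., the function $f$ is given by [expression not recoverable from the source].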
The similarity group acts on the functions $F$ as follows: [equation not recoverable from the source]. This transformation law, when written in terms of the functions $f$ representing $F$, becomes a locally operating realization (Asorey et al., 1983) of the similarity group $G$, where the gauge exponent does not depend on the point $x$ and which is determined by the dimension matrix $A$ of the set of rays $\varphi_1, \ldots, \varphi_n$ and by the dimension vector $\alpha$ of $\varphi$. More accurately, if one chooses the parametrization $(\lambda_1, \ldots, \lambda_m)$ for the element of $G$ corresponding to the multiplication of each element $X$ with a dimension vector $\alpha$ by the factor $\lambda_1^{\alpha_1} \cdots \lambda_m^{\alpha_m}$, then the locally operating realization of $G$ is given by
$$f'(x_1, \ldots, x_n) = \lambda_1^{\alpha_1} \cdots \lambda_m^{\alpha_m}\, f(\lambda_1^{-a_{11}} \cdots \lambda_m^{-a_{m1}} x_1, \ldots, \lambda_1^{-a_{1n}} \cdots \lambda_m^{-a_{mn}} x_n).$$
For the case of a real-valued $\Phi$-function, the factor $\lambda_1^{\alpha_1} \cdots \lambda_m^{\alpha_m}$ is identically equal to 1, and we then obtain the so-called quasi-regular representation of $G$ associated with the action of $G$ on $\mathbb{R}^n$ via the matrix $A$. There is no need to worry about the factor $\lambda_1^{\alpha_1} \cdots \lambda_m^{\alpha_m}$, because there is a simple way of making it disappear, namely, by associating to every $\varphi$-valued function $F$ the real-valued function $f$ obtained from $F$ through the gauge. This new function will depend on the gauge, but no factor $\lambda_1^{\alpha_1} \cdots \lambda_m^{\alpha_m}$ will appear in its transformation law, because the dimension vector of $\mathbb{R}$ is 0.
Therefore, we have the following building blocks: an Abelian group $G$ of similarities of the physical algebra $\Phi$, isomorphic to $(\mathbb{R}_+)^m$, as well as a representation of $G$ in the algebra $\Phi$ via the natural action of $G$ on $\Phi$. For each subset $\varphi_1 \times \cdots \times \varphi_n$ of $\Phi^n$ we may use a gauge to identify that subset with $\mathbb{R}^n$, and we obtain in this way a linear representation of $G$ in $\mathbb{R}^n$ specified by the dimension matrix $A$ and explicitly given by
$$(x_1, \ldots, x_n) \mapsto (\lambda_1^{a_{11}} \cdots \lambda_m^{a_{m1}} x_1, \ldots, \lambda_1^{a_{1n}} \cdots \lambda_m^{a_{mn}} x_n).$$
This representation is, of course, reducible if $n > 1$ because of the Abelian nature of $G$. Finally, we have a $\Phi$-function invariant under $G$. This is an intrinsic property, and for the real function $f$ representing $F$ it appears as the property of being a dimensionally homogeneous function. As the Π-theorem is the result that specifies the structure of such functions, we may expect it to have a group-theoretical interpretation, as is actually the case. In fact, if $F$ is an invariant real-valued $\Phi$-function, the invariance condition $F(gX_1, \ldots, gX_n) = F(X_1, \ldots, X_n)$ is expressed in terms of its representation $f$ by
$$f(x_1, \ldots, x_n) = f(\lambda_1^{a_{11}} \cdots \lambda_m^{a_{m1}} x_1, \ldots, \lambda_1^{a_{1n}} \cdots \lambda_m^{a_{mn}} x_n),$$
and hence says that the function $f$ has to be constant on each orbit of the action of the group $G$ on $\mathbb{R}^n$. We recall that the isotropy group of a point is defined as the subgroup of the elements leaving the point invariant. In general, points in the same orbit have different but conjugate isotropy groups. However, the group $G$ being Abelian in this case, all points in the same orbit have the same isotropy group. Moreover, in the case we are studying it is easy to check that even points in different orbits have the same isotropy group, which is made up of the elements $(\lambda_1, \ldots, \lambda_m)$ satisfying
$$\lambda_1^{a_{11}} \cdots \lambda_m^{a_{m1}} = 1, \quad \ldots, \quad \lambda_1^{a_{1n}} \cdots \lambda_m^{a_{mn}} = 1.$$
This system of equations becomes a homogeneous linear system, $A z = 0$, in terms of the new variables $z_i = \log \lambda_i$, and it is well known from linear algebra that the general solution is a linear combination of $m - r$ particular solutions, with $r$ denoting the rank of the matrix $A$. We also recall that the orbit $Gx$ of any point $x$ is a submanifold of $\mathbb{R}^n$, and as the isotropy group $G_x$ is a subgroup of dimension $m - r$ isomorphic to $\mathbb{R}^{m-r}$, the orbit has dimension $d = \dim G - \dim G_x = m - (m - r) = r$ and may be identified with $\mathbb{R}^r$. The point is that we can choose local coordinates in $\mathbb{R}^n$ in such a way that the last $n - r$ coordinates single out an orbit while the first $r$ coordinates specify a point in that orbit. Then the dimensionally homogeneous functions are constant on each orbit, and hence they are fully determined by their values at a single point of each orbit and may be described as functions on the space of orbits, which is parametrized by the last $n - r$ coordinates. This is actually the content of the Π-theorem: setting the first $r$ variables equal to 1 selects a particular point in the orbit determined by the remaining $n - r$ coordinates, which characterize the orbit. The result of the Π-theorem is then particularly clear from this group-theoretical viewpoint, and it can be ranged along with other more familiar
results on functions invariant under some groups. For instance, functions invariant under rotations of physical space are characterized by the condition $F(\mathbf{r}) = F(\mathcal{R}\mathbf{r})$ for any rotation $\mathcal{R}$. The group concerned here is the group $SO(3, \mathbb{R})$ of all proper rotations. The orbits under $SO(3, \mathbb{R})$ are the spheres of radius $r$, and a set of (spherical) coordinates may be chosen such that two coordinates, namely $\theta$ and $\phi$, specify a point in the orbit singled out by the third, the radius $r$. In these coordinates, rotationally invariant functions are those depending only on the variable $r$.
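The counting argument of this section (orbits of dimension $r$, an isotropy group of dimension $m - r$, and $n - r$ independent invariants) can be sketched numerically. The following is only an illustrative sketch under stated assumptions: the rows of `A` are taken to index the units and the columns the quantities, so that the isotropy conditions read `transpose(A) z = 0`; the pendulum quantities and the helper names `null_space` and `transpose` are hypothetical and do not come from the text.

```python
from fractions import Fraction

def null_space(matrix):
    """Return (rank, basis) where basis spans {k : matrix k = 0} exactly."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []                      # pivot column of each reduced row
    for col in range(n_cols):
        r = len(pivots)
        if r == len(rows):
            break
        # Look for a row at or below r with a non-zero entry in this column.
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][col]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        k = [Fraction(0)] * n_cols
        k[free] = Fraction(1)
        for row_index, pivot_col in enumerate(pivots):
            k[pivot_col] = -rows[row_index][free]
        basis.append(k)
    return len(pivots), basis

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

# Hypothetical example (an assumption, not from the text):
# pendulum period t, length l, gravity g.
# Rows are the units (L, T, M); columns are the quantities (t, l, g).
A = [
    [0, 1, 1],    # exponents of L
    [1, 0, -2],   # exponents of T
    [0, 0, 0],    # exponents of M (no quantity involves mass)
]
m, n = len(A), len(A[0])

r, invariants = null_space(A)            # exponents of dimensionless products
_, isotropy = null_space(transpose(A))   # z_i = log(lambda_i) fixing every point

print("orbit dimension r =", r)
print("isotropy dimension m - r =", m - r, [[str(v) for v in z] for z in isotropy])
print("independent invariants n - r =", n - r, [[str(v) for v in k] for k in invariants])
```

In this toy case the only isotropy direction is a change of the mass unit alone, and the single invariant has exponents $(2, -1, 1)$, i.e., the product $g t^2 / l$.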
H. Remarks on the Application of the Π-Theorem
The Π-theorem is a result on the structure of functions invariant under a group $G$ of changes of units. As a mathematical result, it belongs to the realm of group theory. But our preceding exposition suggests that the choice of the group $G$ itself is, in great part, a matter of convention, and this is reflected in the theoretical frame only through different structures of the vector space $E$ and different selections of the subalgebra $\Phi_0$ for which units are chosen from standards and are not allowed to change. The first conclusion to be drawn is that there is no “universal” physical algebra, and hence, for the aspects of dimensions taken into account in this treatment, which links dimensions with units, the “dimension of $X$” has no absolute significance. In fact, one possibility would be to choose not natural but artifact standards for lengths, masses, etc. The original definitions of the M.K.S. units were actually made this way. As long as these standards share the properties exhibited by natural standards (wide availability, reproducibility, stability, accuracy, etc.), there is no formal reason to reject them. In our model, this could well mean that $\Phi_0$ covers the whole physical algebra, and that all quantities are dimensionless. In this sense, Planck’s much-quoted remark that the question of the “real” dimensions of a physical quantity has no more meaning than that of the “real” name of an object (Planck, in his lectures on Electrodynamics) is pertinent here. But, as we remarked before, even if we assume that we have (natural or artifact) standards for some quantities, we still have the freedom of choosing them, or not, for fixing units, and we could ask what would happen if units were changed, leaving aside practical problems that are the concern of metrology.
In other words, in this context, the definition of the group $G$ is largely a matter of convention, and for a given problem or theory one must make it explicit, for instance through a description of $E$ and $E_0$. The flexibility in the selection of $G$ may be used to extract the maximum of information in a given problem. As a particular instance, for Newtonian mechanics, the fundamental dimensions are conventionally taken as $L$, $M$, $T$. This choice is in fact a natural
one for that theory, as will be seen later. But for a given problem, all relations of interest to us may be invariant under a group $G$ larger than the group, isomorphic to $\mathbb{R}^3$, of changes of the $L$, $M$, $T$ units, yet one which may still be realized as scale changes, and which can therefore appear as the gauge group of some different dimensional structure defined in the physical algebra. The structure $L$, $M$, $T$ is natural only for Newtonian mechanics; for other theories, such as Newtonian gravitation, thermodynamics, relativity theory, electromagnetism, general relativity, quantum mechanics, etc., there are other natural assignments which use some natural standards related to the laws of the theory. In this sense, and this will be discussed later, the dimensional structure is not completely conventional. But our point here is that the “natural” assignment of dimensions in a given theory may nevertheless not be the most appropriate for the solution of a concrete problem, and it is an unnecessary rigidity to narrow the potential of Dimensional Analysis by unduly restricting the number of dimensions: the larger the number of basic dimensions, the more information is obtained via an appropriate use of the Π-theorem. To sum up, the aim is to look, in a particular problem, for a dimensional structure such that its gauge group coincides with the scaling symmetry group of the initial relations, that is to say, of all the relations between physical quantities that would be necessary in order to produce a complete quantitative solution of the problem. These initial relations may even leave aside fundamental physical relations of the theory when they are not relevant to the problem. This viewpoint has been emphasized by many authors ( H u h , 1980 and references therein, and Supplee, 1985) and will be illustrated in a very detailed example.
IV. THE PHYSICAL MEANING OF DIMENSIONAL ANALYSIS
This leads us to the second aspect of Dimensional Analysis. All the preceding discussion has deliberately been very formal. The choice of a particular dimensional structure of the physical algebra is not universal, and in this sense the gauge group of the chosen structure is a conventional choice. But in saying this we do not go so far as to say that dimensions have no meaning; we only say that for every problem there are different ways of describing the pertinent relations in the context of a physical algebra, and some selections are better for the purpose of obtaining information via the Π-theorem. To conclude from this fact that the concept of dimension is completely devoid of meaning seems excessive, if the only reason is that
many different dimensional structures (mainly in Electromagnetism) seem to be, each one in turn, perfectly adequate for the description of phenomena. Of course, that remark is to be found, in a more or less explicit way, in most writings on Dimensional Analysis; a recent work rightly emphasizing this point is that of Houard (1983). This leads us directly to the second, nonconventional aspect of dimensional analysis, which we discuss now.
Although our presentation of the idea of physical algebra has been formal, it is clear that it rests on a set of physical laws, which are the ultimate reason for the possibility of building such a structure. It is convenient to keep this in mind, because neglecting this fact is one of the reasons for the somewhat metaphysical look of Dimensional Analysis. This is linked with the question of giving an interpretation to the mathematical relations in the physical algebra. Manin (1981) has pointed out that this is one of the most striking features of physics; the communication between mathematics and physics is made difficult by the inclination among physicists to pass directly from mathematical expressions to their factual meaning, using a more or less implicit interpretation. This feature is very noticeable in Dimensional Analysis. In fact, in the previous section the physical algebra has been characterized, according to the present algebraic ideas, only through its formal properties. Whereas this suffices for the development of formal manipulations, it is not enough in itself to provide a physical interpretation of the product, which has to be given from outside and which involves the physical laws.
The main idea in the former presentation of the mathematical part of Dimensional Analysis, namely, that from a fundamental set of units we can obtain in a well-defined way a set of units for all quantities, assumes that the products and powers in the physical algebra, which are used to produce an expression such as $U_1^{a_1} \cdots U_m^{a_m}$ as a unit in its class, have some definite interpretation in terms of an element of this class. That interpretation rests on a law taken as being exact in some domain. Let us take the much-quoted example of torque and energy, for which ordinary SI units are in both cases kg m$^2$ s$^{-2}$. This only means that we have two different procedures, based on simple physical laws, which, starting with units for length, time and mass, end up with two different units for the classes of torque and energy, having however the same name and assigning to the two classes the same dimensions. In everyday work, no physicist or engineer gets confused by this, because the interpretation given to the same symbol kg m$^2$ s$^{-2}$ differs according to whether it applies to a torque or to an energy, even though its formal, operational properties do in fact coincide.
From our exposition of the physical algebra, we recognize that the reason for this particular feature is the fact that angle falls into the category $\Phi_0$ of classes where some unit is fixed from natural standards. The implicit interpretation,
referred to above, is made explicit when one recognizes the classes of angle, torque and energy as related in the physical algebra by (torque) × (angle) = (energy); if $u_a$ denotes the fixed unit of angle, which lies in $\Phi_0$, and $u_l$, $u_t$, $u_m$ denote the units for lengths, times and masses, the unit in the class of torque is to be written in full as $u_m u_l^2 u_t^{-2} u_a^{-1}$, in contrast with the energy unit, $u_m u_l^2 u_t^{-2}$.
Hence, it is very important to have in mind the differences between the two partitions of the quantities into the classes denoted ( ) and [ ], respectively. The first classification is by means of a relation of “being the same kind of magnitude”, whereas the other corresponds to “having the same dimensions”, and refers both to a specified subalgebra $\Phi_0$ with fixed units and to a specified set of fundamental units for some quantities. Both classifications also involve physical laws and are described in formal terms by the structure of the vector spaces $E$, $E_0$ and $V'$.
Now it is very interesting that, from a conceptual viewpoint, all this discussion is greatly clarified if one forgets everything about units and restricts oneself to the study of the structure of the products in the physical algebra. A study from this point of view is not new: a recent and recommended work is Houard (1983), and there are also previous works by Fleischmann (1951, 1954), Landolt (1952) and Stille (1961), and by Quade (1961) and Bunge (1971). As in the former case, we do not intend to give an exhaustive presentation but only to introduce the main ideas; the interested reader is referred to the quoted literature.
A. The Dimension Group of a Theory
Consider anew the set of all physical quantities. They are classified by an equivalence relation corresponding to “being of the same kind of quantity”. Now we choose not to allow flexibility in this equivalence relation, and we assume that the question of whether or not two concrete quantities are of the same kind can be unambiguously answered in the context of a particular theory. For each class, the set of possible values of the quantity is the set $\mathbb{R}$ of real numbers, or the set $\mathbb{R}_+$ of positive real numbers. Hence, the set of all physical quantities can be written as $Z \times \mathbb{R}$, where $Z$ is the set of classes. The next main ingredient is the product of quantities. Whereas in the first part we discussed its formal properties, we now approach the issue from a more physical point of view. The product of magnitudes involves both an abstract, formal definition, which concerns the formal properties of the product, and an identification principle, which rests on the physical laws themselves and provides an identification of otherwise different classes. This double aspect of the process has been clearly discussed by Houard. For example, if we consider the quantities length and area, both assumed to be measurable, the innocent
looking relation 1 m$^2$ = (1 m)$^2$ hides such an identification, which, in its conventional interpretation as the area of a square with a side of length 1 m, is made possible only by a particular property of Euclidean space, and where the physical law involved is simply the bilinearity of the area in the sides of a parallelogram. For mathematical reasons of completeness, it is assumed that these definitions are made for all interesting quantities in the theory. The set $Z$ of all classes acquires in this way a natural group structure, whose product is induced by the one defined between quantities and which is actually isomorphic to the Abelian additive group of the vector space $E$ in the former treatment. A basis in this space corresponds to a subset $\Gamma$ of quantities, ordinarily with a direct physical interpretation, such that every other quantity of interest can be reached by products and/or powers starting from this subset $\Gamma$ of quantities. We insist that here we are assuming that the products involve an explicit identification of classes, and hence every quantity either has been introduced as a “primitive” quantity in the theory or has been defined through some product. The set of all quantities is determined by the set $\Gamma$ and the set of the identifications defining the products. We remark that here we do not consider $E$ as a given vector space, but are instead building it from a subset of quantities. In group-theoretical language, the basic subset of quantities are the generators of an Abelian group, which is the free Abelian group generated by them, and the additive group of the vector space $E$ is an Abelian formal group whose elements are products of rational powers of the generators. The customary usage at this point is to consider only the group generated by integer powers of the generators.
If only a minimal set of quantities in $\Gamma$ are considered as generators, the group is the free Abelian group generated by them, but one can alternatively take more quantities as generators, and then one has to add some defining relations (in the sense of group theory), which will be the expression of the homogeneity of the laws used to define the products. The somewhat imprecise characterization of $\Gamma$ as the set of “basic” quantities has no consequences for the theory provided that this fact is duly taken into account. When discussing the physical algebra we have shown how one could select a subalgebra where units are supposed to have been fixed from existing standards, and whose elements are considered dimensionless in the subsequent development of Dimensional Analysis within that structure of the physical algebra. In the present terms, we have a group, called a group of dimensions, $D$, whose elements are all the quantities which are products of integer powers of the basic quantities, and a subgroup $D_0$, whose elements are products of integer powers of the basic quantities in $\Phi_0$. To the quotient vector space $E/E_0$ there will now correspond a factor group, $D/D_0$, which is called a reduction of the initial group of dimensions. The new group appears thus as the genuine
dimension group when some more standards have been introduced in the definition of the products, or when one has fixed the units, and does not allow them to change, for a greater subset of the basic quantities. Remarks similar to those made on the conventionality in the structure of the physical algebra also apply here, but if one limits the standards made dimensionless to those appearing in the fundamental physical laws of a theory, one obtains a well-defined group, characteristic of the theory and called its group of dimensions. Its physical meaning is clear after the preceding discussion: it describes the different kinds of dilatation-like symmetries of the theory, and the fact that such a dilatation for some kinds of quantities must be accompanied by a dilatation (with particular factors) for others. To put it in a more formal way, for a given group of dimensions we can consider a general dilatation, that is, a transformation which is simply a scale change with a positive scale factor $\alpha_{[X]}$ for each kind of quantity or dimension, $X \to \alpha_{[X]} X$. That transformation on the quantities will be a symmetry of the theory if and only if the map $[X] \to \alpha_{[X]}$ is a homomorphism from the group of dimensions into the positive real numbers. The set of all homomorphisms of the group $D$ into the positive real numbers, which in the physical algebra corresponds to the gauge group, is a group which could also be called here the gauge or similarity group. Its physical interpretation is very direct: if $[X] \to \alpha_{[X]}$ is an element of the gauge group, a change of all quantities $X$, each by the factor $\alpha_{[X]}$, is a symmetry of the theory.
In order to see these ideas in an example, let us consider Newtonian mechanics. As basic quantities one has to include lengths, times, masses and forces, whose dimensions will be called $L$, $T$, $M$, $F$, and these could be taken as generators of the dimension group.
These generate a free Abelian group, which is not however the dimension group of the theory: because of the relation $F = ma$ one has to impose a relation on the corresponding dimensions, $L T^{-2} M F^{-1} = 1$, or, what is equivalent, to replace the free Abelian group by its quotient by the subgroup generated by $L T^{-2} M F^{-1}$, obtaining the free Abelian group generated by $L$, $T$, $M$. In that sense we have a “natural” group of dimensions for each theory. For Newtonian mechanics, furthermore, that group can also be shown to be related to the peculiar properties of the invariance group of Classical Mechanics, the Galilei group, as we shall see in the next section. If one considers Newtonian gravitation, a new law enters, $F = GMm/r^2$. The value $G$ is a standard for its class, whose dimension in the group of Newtonian mechanics is $L^3 T^{-2} M^{-1}$. No symmetry of Newtonian gravitation can change masses independently of lengths and times, and its natural group of dimensions is the quotient of the free Abelian group generated by $L$, $T$, $M$ by the subgroup generated by $L^3 T^{-2} M^{-1}$. This is isomorphic to the free Abelian group generated by only two of them, say, $L$ and $T$. Any similarity of Newtonian
gravitation (for example, a question like “how would the periods of planets be changed if all the masses were doubled”) is described by a homomorphism of that group into the positive real numbers. In this case, as $M = L^3 T^{-2}$, the similarity would be $\alpha(L) = 1$, $\alpha(L^3 T^{-2}) = 2$, and from there one obtains the value of $\alpha(T)$: since $\alpha(L)^3 \alpha(T)^{-2} = 2$, one has $\alpha(T) = 2^{-1/2}$, so the periods would be shortened by a factor $\sqrt{2}$.
For other theories, we can likewise consider their groups of dimensions. Let us make a brief comment on the case of electromagnetism. The problem of dimensions in electromagnetism has been the root of many controversies, and a concise but very clear statement of the different possibilities is given in the book by Jackson (1975). One can also read the parable given by Casimir (1968). From the present point of view, the important thing is to recognize what the physical standards in the electromagnetic theory are. If one treats the Coulomb law in the same way as Newtonian gravitation, one is led to take the free Abelian group generated by $L$, $T$, $M$ and $Q$ (the new dimension, in the sense of a new physical quantity, for the electrical charge) and to reduce it by the subgroup generated by $L^{-3} T^{2} M^{-1} Q^{2}$. The physical standard in question would be the constant $1/4\pi\varepsilon_0$. But one must keep in mind that there is an important difference between Newtonian gravitation, with the law $F = GMm/r^2$, and electrostatics: whereas the gravitation law is to be taken as an exact law in its theoretical frame, the Coulomb law is only a small part of the complete electromagnetic theory and does not describe all the forces acting on charged particles. A more substantive treatment will have to consider a) electromagnetism in its correct theoretical frame, that is, relativity theory, and b) the complete theory, as given not by the Coulomb law alone, but by the set of Maxwell equations and the expression of the Lorentz force.
When such a study is done (Guissard, 1972), the result strongly supports the view that the only standard in the classical electromagnetic theory is the speed of light, and that the permittivity and permeability of free space are not fundamental properties of free space, in the sense here given to the term standards, but only constants whose appearance, values and dimensions are fixed by the particular choice of units.
B. A Detailed Study of an Example
Let us develop in some detail a concrete example. In the gravitational field of the Earth, consider the ballistic motion, assuming ideal conditions, of a projectile of mass $m$, fired at some angle relative to the horizontal plane, so that its velocity has horizontal component $v_x$ and vertical component $v_z$. Let us ask for the horizontal range $x$. The relevant physical theory is, of course, Newtonian gravitation, and from it we know that the other relevant physical quantities are the Earth mass, $M_\oplus$, and the Earth radius, $R_\oplus$, which
determine the local gravitational field at the point where the motion takes place. As the gravitation law is also invoked, we must keep in mind Newton's gravitational constant $G$ as well; as we shall see, $G$ can be hidden in the dimensional structure. We expect a relation described by a $\Phi$-function, $\Phi(v_x, v_z, G, M_\oplus, R_\oplus, m, x) = 0$. We know from the Π-theorem that if this function is invariant under the gauge group of its physical algebra, it will be equivalent to a relation between a complete set of dimensionless products. Of course, this invariance depends both on the choice of the gauge group, through the precise physical algebra used, and on the physical laws concerned in the problem, here the gravitation law.
Let us start with a conventional discussion. In terms of the ordinary physical algebra for Newtonian mechanics, a fundamental set of dimensions is $L$, $T$, $M$, and we have the following dimensional matrix:
$$\begin{array}{c|rrrrrrr} & v_x & v_z & G & M_\oplus & R_\oplus & m & x \\ \hline L & 1 & 1 & 3 & 0 & 1 & 0 & 1 \\ T & -1 & -1 & -2 & 0 & 0 & 0 & 0 \\ M & 0 & 0 & -1 & 1 & 0 & 1 & 0 \end{array}$$
We have $n = 7$ quantities and $m = 3$. The rank of the matrix is $r = 3$, and a complete set of dimensionless products is easily found to be $v_x/v_z$, $x/R_\oplus$, $m/M_\oplus$, $v_z^2/x(GM_\oplus/R_\oplus^2)$. As the Newton gravitation law is obviously invariant under the full gauge group thanks to the explicit presence of $G$, we conclude that our relation will be equivalent to another one involving only these four dimensionless products. But from this we cannot draw further detailed information about the dependence of $x$ on the other magnitudes. Even if we forget about the fundamental level of use of the law of gravitation, and we replace $G$, $M_\oplus$ and $R_\oplus$ by $g = GM_\oplus/R_\oplus^2$, we have 5 quantities and the dimensional matrix is:
$$\begin{array}{c|rrrrr} & v_x & v_z & g & m & x \\ \hline L & 1 & 1 & 1 & 0 & 1 \\ T & -1 & -1 & -2 & 0 & 0 \\ M & 0 & 0 & 0 & 1 & 0 \end{array}$$
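The dimensional bookkeeping for the conventional $L$, $T$, $M$ structure can be checked mechanically. The sketch below is only illustrative: the dictionary `DIMS`, the helper `dimension`, and the ASCII names `M_E`, `R_E` (standing for the Earth mass and radius) are hypothetical choices made for this example. It adds up the exponents of each candidate product and confirms that the result is the zero vector.

```python
# Dimension vectors (L, T, M) of each quantity in the conventional structure.
DIMS = {
    "v_x": (1, -1, 0),
    "v_z": (1, -1, 0),
    "G":   (3, -2, -1),
    "M_E": (0, 0, 1),    # Earth mass
    "R_E": (1, 0, 0),    # Earth radius
    "m":   (0, 0, 1),
    "x":   (1, 0, 0),
    "g":   (1, -2, 0),   # g = G M_E / R_E^2
}

def dimension(product):
    """Dimension vector of a product given as {quantity name: exponent}."""
    total = [0, 0, 0]
    for name, power in product.items():
        for i, d in enumerate(DIMS[name]):
            total[i] += power * d
    return tuple(total)

# Products for the seven quantities v_x, v_z, G, M_E, R_E, m, x.
products_full = {
    "v_x/v_z": {"v_x": 1, "v_z": -1},
    "x/R_E": {"x": 1, "R_E": -1},
    "m/M_E": {"m": 1, "M_E": -1},
    "v_z^2/(x G M_E/R_E^2)": {"v_z": 2, "x": -1, "G": -1, "M_E": -1, "R_E": 2},
}

# Products after replacing G, M_E, R_E by g.
products_with_g = {
    "v_x/v_z": {"v_x": 1, "v_z": -1},
    "v_z^2/(x g)": {"v_z": 2, "x": -1, "g": -1},
}

for label, product in {**products_full, **products_with_g}.items():
    print(label, dimension(product))

# Consistency of the definition of g with the dimensions of G, M_E, R_E.
g_from_parts = dimension({"G": 1, "M_E": 1, "R_E": -2})
```

Every printed vector should be `(0, 0, 0)`, and `g_from_parts` should agree with the entry assumed for `g`.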
Here a complete set of dimensionless products is $v_x/v_z$ and $v_z^2/xg$. But now it is not clear whether the full gauge group ought to be an exact symmetry group of the $\Phi$-function $\Phi'(v_x, v_z, g, m, x) = 0$. In physical terms, and in the active interpretation, a change in all masses by some factor, with no changes in lengths or times, would of course change the Earth mass and along with it the value of $g$, so that, strictly speaking, we cannot apply the Π-theorem. Nevertheless, we may expect in any case that group to be a very approximate
invariance group of the $\Phi$-function, and if we insist on applying the Π-theorem, we obtain a relation of the form $x = (v_z^2/g)\,F(v_x/v_z)$.
We may however increase the information by using a different structure for the physical algebra. We start by remarking that the problem is symmetric under rotations about the vertical axis, and there is a consistent distinction between “horizontal” and “vertical” lengths. If we range these lengths into different classes and introduce independent units for horizontal and vertical lengths, we have $n = 7$, $m = 4$, with dimensions $(L_x, L_z, T, M)$, and now the dimensional matrix is:
$$\begin{array}{c|rrrrrrr} & v_x & v_z & G & M_\oplus & R_\oplus & m & x \\ \hline L_x & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\ L_z & 0 & 1 & 3 & 0 & 1 & 0 & 0 \\ T & -1 & -1 & -2 & 0 & 0 & 0 & 0 \\ M & 0 & 0 & -1 & 1 & 0 & 1 & 0 \end{array}$$
As its rank is $r = 4$, a complete set of dimensionless products has 3 elements: $m/M_\oplus$, $v_z^2/R_\oplus(GM_\oplus/R_\oplus^2)$, $v_x v_z/x(GM_\oplus/R_\oplus^2)$. The full gauge group is now a symmetry group of the $\Phi$-function, and from this $x = \Phi(m/M_\oplus, v_z^2/R_\oplus(GM_\oplus/R_\oplus^2))\, v_x v_z/(GM_\oplus/R_\oplus^2)$. We see that $G$ appears in the combination $GM_\oplus/R_\oplus^2$, which gives the acceleration due to Earth’s gravity. If we replace $G$, $M_\oplus$ and $R_\oplus$ by $g = GM_\oplus/R_\oplus^2$, as before, we will now have
$$\begin{array}{c|rrrrr} & v_x & v_z & g & m & x \\ \hline L_x & 1 & 0 & 0 & 0 & 1 \\ L_z & 0 & 1 & 1 & 0 & 0 \\ T & -1 & -1 & -2 & 0 & 0 \\ M & 0 & 0 & 0 & 1 & 0 \end{array}$$
with $n = 5$, $m = 4$, and $r = 4$. A complete set consists only of the single product $v_x v_z/gx$. The symmetry of the new $\Phi$-function $\Phi(v_x, v_z, g, m, x) = 0$ will also be approximate in the same sense as previously. Rotational symmetry about the vertical axis would be maintained if all “horizontal” lengths were changed by a factor different from that of “vertical” lengths. The Earth would then be an ellipsoid, and the breakdown of symmetry comes from the fact that $g$ would also change when masses are changed. Neglecting this effect, we obtain $x = \text{const}\; v_x v_z/g$. When compared to the previous result, this gives more physical insight, as we recognize the approximations involved: an approximate solution to the problem is obtained by replacing the value of $\Phi$ at $(m/M_\oplus, v_z^2/R_\oplus(GM_\oplus/R_\oplus^2))$ by its value at small arguments, $\approx (0, 0)$. From the physical viewpoint, this corresponds to neglecting the effect of the projectile back on the Earth and, through the second argument, the variation of gravity with height along the trajectory.
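The gain obtained by splitting lengths into horizontal and vertical classes can be read off from the count $n - r$ of independent dimensionless products in each structure. The following sketch, with a hypothetical helper `rank` and hypothetical matrix names, compares the four dimensional matrices used so far; columns are ordered as ($v_x$, $v_z$, $G$, $M_\oplus$, $R_\oplus$, $m$, $x$) or ($v_x$, $v_z$, $g$, $m$, $x$), and the entries are those implied by the dimensionless products quoted in the text.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a small integer matrix by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

LTM_FULL = [            # rows L, T, M
    [1, 1, 3, 0, 1, 0, 1],
    [-1, -1, -2, 0, 0, 0, 0],
    [0, 0, -1, 1, 0, 1, 0],
]
LTM_G = [               # rows L, T, M with g replacing G, M_E, R_E
    [1, 1, 1, 0, 1],
    [-1, -1, -2, 0, 0],
    [0, 0, 0, 1, 0],
]
XZ_FULL = [             # rows L_x, L_z, T, M
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 3, 0, 1, 0, 0],
    [-1, -1, -2, 0, 0, 0, 0],
    [0, 0, -1, 1, 0, 1, 0],
]
XZ_G = [                # rows L_x, L_z, T, M with g
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [-1, -1, -2, 0, 0],
    [0, 0, 0, 1, 0],
]

counts = {}
for name, A in [("LTM_FULL", LTM_FULL), ("LTM_G", LTM_G),
                ("XZ_FULL", XZ_FULL), ("XZ_G", XZ_G)]:
    n, r = len(A[0]), rank(A)
    counts[name] = n - r          # number of independent dimensionless products
    print(f"{name}: n = {n}, r = {r}, n - r = {n - r}")
```

Under these assumptions the counts are 4, 2, 3 and 1 respectively, in agreement with the complete sets listed in the text.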
We now use a different dimensional structure, adapted to Newtonian gravitation. Starting from the one given by $L$, $T$, $M$, the new structure is obtained by reducing the group of dimensions by $L^3 T^{-2} M^{-1}$, that is, by taking as the new group of dimensions the quotient group of the old one by the subgroup generated by the old dimensions of the gravitational constant. This has one definite advantage: now the gravitational law is written as $F = Mm/r^2$, which only involves physical quantities and is invariant under the full gauge group, which is now generated by $L$ and $T$. Mass has there the dimensions $L^3 T^{-2}$, and the explicit relation between its old and new values is $m_{\text{new}} = G\, m_{\text{old}}$. In this treatment, there is no need to worry about the gravitational constant, which is hidden through the selection of a system of units. Now the relevant quantities are the physical variables $v_x$, $v_z$, $M_\oplus$, $R_\oplus$, $m$, $x$. We are still free to consider a splitting of horizontal and vertical lengths; in this case, the reduction from the structure based on $L_x$, $L_z$, $T$, $M$ is by $L_z^3 T^{-2} M^{-1}$ (because the lengths involved in the law of gravitation are to be considered vertical). It is a simple task to see that a complete set has 3 dimensionless products, $m/M_\oplus$, $v_z^2/R_\oplus(M_\oplus/R_\oplus^2)$, $v_x v_z/x(M_\oplus/R_\oplus^2)$, that are formally identical to the old ones with $G = 1$. This is to be expected: a simple reduction of the dimensional structure by the dimension of a universal constant simply amounts to deleting that constant (making it equal to 1 and dimensionless). But if we now replace $M_\oplus$ and $R_\oplus$ by $g = M_\oplus/R_\oplus^2$, we have
$$\begin{array}{c|rrrrr} & v_x & v_z & g & m & x \\ \hline L_x & 1 & 0 & 0 & 0 & 1 \\ L_z & 0 & 1 & 1 & 3 & 0 \\ T & -1 & -1 & -2 & -2 & 0 \end{array}$$
Mass enters nontrivially into gravitational theory, and this is adequately captured by our new dimensional structure in the physical algebra. We have $n = 5$ quantities and $m = 3$. The rank of the dimensional matrix is 3 and a complete set of dimensionless products is $v_x v_z/xg$ and $mg/v_z^4$. For a typical situation (say, in SI units, $m = 1$ kg, $|v| = 100$ m s$^{-1}$), the product $mg/v_z^4$ has a value of the order of $10^{-17}$ (remember that now mass has to be measured in gravitational units), but still carries information on the dependence, however negligible, of the range on the mass. The other dimensionless product has, according to the exact theory, the value 1/2, since the exact range is $x = 2 v_x v_z/g$. The invariance group of the sought-for relation is the full gauge group, as we see that changes in masses alone are not in the group; in the active picture a change in masses is only possible along with changes in vertical lengths or times, the special structure of these changes assuring the invariance of the pertinent relations. So if we apply the Π-theorem, we obtain:
$$x = (v_x v_z/g)\, F(mg/v_z^4).$$
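The order of magnitude of the mass-dependent product can be estimated directly. The sketch below is only a rough numerical check under stated assumptions: it uses the commonly quoted SI value of Newton's constant, a surface gravity of 9.81 m s$^{-2}$, and takes the vertical velocity component equal to the full speed of 100 m s$^{-1}$; the variable names are hypothetical.

```python
G = 6.674e-11     # m^3 kg^-1 s^-2, Newton's constant in SI units (assumed value)
g = 9.81          # m s^-2, assumed acceleration of gravity at the surface
m_si = 1.0        # kg, projectile mass in SI units
v_z = 100.0       # m s^-1, assumption: vertical component taken equal to |v|

# In the reduced structure, mass is measured in gravitational units: m_new = G * m_old.
m_grav = G * m_si                  # m^3 s^-2

pi_mass = m_grav * g / v_z**4      # dimensionless product m g / v_z^4
print(f"m g / v_z^4 = {pi_mass:.2e}")
```

With a smaller vertical component (for instance a launch at 45 degrees) the product is a few times larger, but it stays of the same tiny order of magnitude.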
The further assumption that our projectile is a test particle, so that its range ought not to depend on $m$ when $m$ is very small, leads to $x = \text{const} \cdot v_xv_z/g$.
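The counting above can be checked mechanically. The following is a minimal sketch, assuming the dimensional matrix of the five quantities $v_x$, $v_z$, $g$, $m$, $x$ in the reduced structure $L_x$, $L_z$, $T$ (mass carrying $L_z^3T^{-2}$); the function names and the numerical values ($G \approx 6.674\times10^{-11}$ SI, $g \approx 9.81$ m s$^{-2}$, $v_z = 100$ m s$^{-1}$) are illustrative choices, not taken from the text.

```python
from fractions import Fraction

# Columns: v_x, v_z, g, m, x.  Rows: exponents of L_x, L_z, T.
# Mass is measured in gravitational units, so it carries L_z^3 T^-2.
DIM_MATRIX = [
    [1, 0, 0, 0, 1],      # L_x
    [0, 1, 1, 3, 0],      # L_z
    [-1, -1, -2, -2, 0],  # T
]

def rank(matrix):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(e) for e in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def dimension_of(exponents):
    """Dimension vector of the product of the quantities raised to `exponents`."""
    return [sum(row[j] * exponents[j] for j in range(len(exponents)))
            for row in DIM_MATRIX]

PI_RANGE = [1, 1, -1, 0, -1]  # v_x v_z / (x g)
PI_MASS = [0, -4, 1, 1, 0]    # m g / v_z^4

n_products = len(DIM_MATRIX[0]) - rank(DIM_MATRIX)  # n minus rank

# Illustrative numbers (assumed): 1 kg converted to gravitational units.
G_NEWTON = 6.674e-11          # m^3 kg^-1 s^-2
m_grav = G_NEWTON * 1.0       # m^3 s^-2
pi_mass_value = m_grav * 9.81 / 100.0 ** 4

print(n_products, dimension_of(PI_RANGE), dimension_of(PI_MASS), pi_mass_value)
```

Both exponent vectors lie in the null space of the matrix, and the number of independent products is $n$ minus the rank, which is how the count of two products arises here.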
The interesting features in this example are: (i) If all relevant quantities are taken into account, any solution for the relationship between $x$ and other magnitudes is not wrong. Nevertheless, there are some dimensional structures that lead to a more informative solution. (ii) For each dimensional structure, the $\Pi$-theorem ought not to be applied in an automatic way, but only after its applicability has been ascertained by considering whether the full gauge group of the physical algebra is, or is not, a symmetry group of the searched-for relation. (iii) Universal constants may appear if one uses a dimensional structure that is not "natural" for the theory under consideration and will not appear explicitly if a "natural" structure (which is some reduction of the former) is used. The amount of information which can be obtained in both ways is the same. (iv) The comparison between the results obtained when considering different dimensional structures, or when making different groupings of the relevant quantities that could enter a problem in some combination according to its physical relevance, tends to be enlightening. In our example, the difference made by replacing $G$, $M_\oplus$, $R_\oplus$ by $g = GM_\oplus/R_\oplus^2$ is striking. If this is made within the dimensional structure $L_x$, $L_z$, $T$, $M$, we have three dimensionless products in the first case,
$$\Pi_1 = m/M_\oplus, \qquad \Pi_2 = \frac{v_z^2}{R_\oplus(GM_\oplus/R_\oplus^2)}, \qquad \Pi_3 = \frac{v_xv_z}{x(GM_\oplus/R_\oplus^2)},$$
while only one, $\Pi_3 = v_xv_z/gx$, in the second. But if this is made within the reduced dimensional structure $L_x$, $L_z$, $T$, we have the same three $\Pi$'s in the first case (of course with $G = 1$), but only two for the second, $\Pi = mg/v_z^4$, $\Pi_3 = v_xv_z/xg$. The new $\Pi$ is some combination of the old ones, $\Pi = \Pi_1\Pi_2^{-2}$; this combination does not appear in the former case because the mass has been left as a primitive dimension, which, as we have said, does not correspond to the symmetry of the facts under consideration. (v) It has sometimes been claimed that lowering the number of fundamental dimensions by means of the process of reduction is of little value for D.A., because the solution for every problem loses informative power when the number $m$ is lowered. This claim is generally supported by the existence of three fundamental natural standards, namely the limit speed $c$ (the speed of light and massless particles), the gravitational constant $G$ (Newton's constant), and the quantum constant $h$ (Planck's constant). If the conventional structure $L$, $T$, $M$ is reduced by these three dimensions, the
dimensional group obtained is trivial, that is, there are natural standards, the famous Planck units, for lengths, times and masses, and hence for all quantities in the physical algebra adequate to classical mechanics. Hence, within such a dimensional structure, the $\Pi$-theorem cannot provide information. Whereas this is of course true, one may be reminded that there is no compulsory need to limit oneself to the use of such a trivial reduction, even if one is discussing a problem which really involves some universal constants and wishes them to disappear explicitly, as is the case in practically all writings in theoretical physics, because the particular problem at hand could be invariant under some transformations which could allow for the use of a different dimensional structure. In our example, the invariance under rotations with vertical axis provides an example. So, even if we initially have enough standards for considering all physical algebra as the subalgebra @,, we still have the freedom of considering a different way of classifying all the quantities into classes and, furthermore, of having a different dimensional structure with a sufficient number of fundamental dimensions for finding useful information. It is perhaps not out of place to remark that most of the non-trivial applications of D.A. have been developed for hydrodynamics, fluid mechanics, etc. (see Sedov's unsurpassed book), a physical theory whose theoretical frame is Newtonian mechanics, where the conventional dimensional structure is fairly rich and where the basic equations are in fact invariant under the full gauge group, so that there is no urgent need to improve the information obtained by the conventional application of the $\Pi$-theorem. This need is more evident when one moves to other physical theories, where Dimensional Analysis is apparently a useless tool.
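The triviality of the fully reduced structure can be made concrete with a short computation. This is a minimal sketch, assuming rounded CODATA values and the reduced constant $\hbar$ rather than $h$ (the usual convention for Planck units, which the text does not specify); the variable names are illustrative.

```python
import math

# Assumed numerical values (SI units, rounded CODATA figures).
C = 299_792_458.0        # speed of light, m s^-1
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
HBAR = 1.054_571_817e-34 # reduced Planck constant, J s

# The three standards fix a unit for every one of L, T, M.
planck_length = math.sqrt(HBAR * G / C ** 3)  # metres
planck_time = planck_length / C               # seconds
planck_mass = math.sqrt(HBAR * C / G)         # kilograms

# Expressed in these units, the three constants all take the value 1.
c_in_planck = C * planck_time / planck_length
g_in_planck = G * planck_mass * planck_time ** 2 / planck_length ** 3
hbar_in_planck = HBAR * planck_time / (planck_mass * planck_length ** 2)

print(planck_length, planck_time, planck_mass)
print(c_in_planck, g_in_planck, hbar_in_planck)
```

Once every quantity has a natural unit of this kind, no independent scale change remains, which is why the $\Pi$-theorem gives no information within that structure.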
V. KINEMATIC GROUPS AND DIMENSIONAL ANALYSIS
Until now the assignment of the dimensional structure, or the choice of the set of primitive dimensions along with the definition of all relevant products in the physical algebra, has been considered as previously given, but it is to be obtained starting from basic physical knowledge of the theory under discussion. Now we are going to see that, at least in some cases, a particular and natural dimensional structure for some quantities with a direct interpretation in terms of the symmetry group can be obtained by assuming a symmetry group as the basic object of a theory. Some important quantities are related to the group in most cases, so that this is not a fundamental restriction. The role of symmetry groups as "superlaws" is a well known topic in theoretical physics (Wigner, 1967). In connection with Dimensional Analysis, the most important invariance groups are those that arise from the
realizations of the principle of relativity. Two aspects of this principle must be carefully distinguished (Bacry and Lévy-Leblond, 1968): (a) the "abstract" principle, stating that all laws of Nature are invariant under a particular set of "physical" transformations, such as space and time translations, space rotations, pure inertial transformations and consequently all their products obtained by composition, and (b) a concrete realization of this principle, which gives the specific form of these physical transformations as a group of transformations in the mathematical sense, as well as the specific form of the transformation laws of physical quantities under this group. In all classical physics and quantum "non-relativistic" theory the so-called Galilei group $\mathcal{G}$ is assumed to be the concrete realization of the relativity group. $\mathcal{G}$ is a ten dimensional Lie group, whose action in terms of the ordinary space-time coordinates $(\mathbf{x}, t)$ as referred to an inertial reference frame is:
$$\mathbf{x}' = R\mathbf{x} + \mathbf{v}t + \mathbf{a}, \qquad t' = t + b,$$
with $R$ a rotation. The Lie algebra of this group is:
$$[\mathbf{J},\mathbf{J}] = \mathbf{J}, \quad [\mathbf{J},H] = 0, \quad [\mathbf{J},\mathbf{K}] = \mathbf{K}, \quad [\mathbf{J},\mathbf{P}] = \mathbf{P}, \quad \text{and} \quad [\mathbf{K},H] = \mathbf{P},$$
where $[\mathbf{A},\mathbf{B}] = \mathbf{C}$ means $[A_i, B_j] = \epsilon_{ijk}C_k$, $[\mathbf{A}, B] = \mathbf{C}$ means $[A_i, B] = C_i$ and $[\mathbf{A},\mathbf{B}] = C$ means $[A_i, B_j] = \delta_{ij}C$. All the other commutators are equal to zero. For a review of the role of the Galilei group as an invariance group in classical and quantum physics, see Lévy-Leblond (1972). According to relativity theory, the set of physical invariance transformations also includes space and time translations, space rotations, and all their products. But the expressions of some of these transformations in terms of the space-time coordinates $(\mathbf{x}, t)$ relative to an inertial frame are different from those of the Galilean case; namely, a pure inertial transformation along a coordinate axis is now given by:
$$x' = \frac{x + vt}{\sqrt{1 - v^2/c^2}}, \qquad t' = \frac{t + vx/c^2}{\sqrt{1 - v^2/c^2}}.$$
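This group is also a ten dimensional Lie group, and its Lie algebra is the following: $[\mathbf{J},\mathbf{J}] = \mathbf{J}$, $[\mathbf{J},H] = 0$, $[\mathbf{J},\mathbf{K}] = \mathbf{K}$, $[\mathbf{J},\mathbf{P}] = \mathbf{P}$, $[\mathbf{K},H] = \mathbf{P}$, $[\mathbf{K},\mathbf{P}] = (1/c^2)H$, $[\mathbf{K},\mathbf{K}] = -(1/c^2)\mathbf{J}$.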
An interesting property is that in both cases space-time appears as a homogeneous space of the corresponding group: Newtonian space-time, with its particular geometrical structure, is a homogeneous space of the Galilei group, and relativistic Minkowskian space-time is a homogeneous space of the Poincaré group. The isotropy subgroup is generated by space rotations and pure inertial transformations, $\mathbf{J}$ and $\mathbf{K}$. As length and time always appear as basic quantities in all expositions of Dimensional Analysis, and space-time can be given a group theoretical interpretation, one may ask whether there is some group theoretical justification for their appearance as basic quantities. This question has been raised and developed by Cariñena, del Olmo and Santander (1981, 1985) and will be briefly reviewed here. Although we cannot expect this kind of treatment to give a complete justification for the appearance of all quantities in other physical theories, such as temperature, we feel that it has the value of linking, in an explicit way, Dimensional Analysis with the conventional form of group invariance, which is nowadays well known and thoroughly developed. Furthermore, the reduction process of the dimensional structure (in this case by the dimensions of the physical standard $c$, explicitly involved in the replacement of Newtonian mechanics by relativistic mechanics) appears in this view as directly related to the Inönü-Wigner (1953) contraction from the Poincaré group to the Galilei group. This contraction is, at the group level, the singular limit process that corresponds in geometry to the non-relativistic limit in physics. The value of this viewpoint is that theories with different symmetry groups can be considered either simultaneously, as an "exact" and an approximate theory, or as two different theories, each with a given range of validity.
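The non-relativistic limit mentioned here can be illustrated numerically. The sketch below, with hypothetical function names and arbitrary sample values, applies the pure inertial transformations quoted above in one space dimension, shows the relativistic one approaching the Galilean one as $c$ grows, and checks that successive collinear relativistic boosts compose by adding rapidities $\operatorname{artanh}(v/c)$.

```python
import math

def galilei_boost(x, t, v):
    """Pure Galilean inertial transformation along one axis."""
    return x + v * t, t

def lorentz_boost(x, t, v, c):
    """Pure relativistic inertial transformation along one axis."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x + v * t), gamma * (t + v * x / c ** 2)

def rapidity(v, c):
    """Additive parameter of collinear boosts."""
    return math.atanh(v / c)

def velocity(chi, c):
    """Inverse of `rapidity`."""
    return c * math.tanh(chi)

# 1. As c grows, the relativistic boost approaches the Galilean one.
x, t, v = 5.0, 2.0, 10.0
x_gal, t_gal = galilei_boost(x, t, v)
deviations = []
for c in (1e2, 1e4, 1e8):
    x_rel, t_rel = lorentz_boost(x, t, v, c)
    deviations.append(abs(x_rel - x_gal) + abs(t_rel - t_gal))

# 2. Two successive boosts equal one boost whose rapidity is the sum.
c = 1.0
v1, v2 = 0.3, 0.5
event = (1.0, 2.0)
two_steps = lorentz_boost(*lorentz_boost(*event, v1, c), v2, c)
one_step = lorentz_boost(*event, velocity(rapidity(v1, c) + rapidity(v2, c), c), c)

print(deviations)
print(two_steps, one_step)
```

In the Galilean case the velocity itself is the additive parameter, while in the relativistic case it is the rapidity.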
In this way one obtains, for each of them, a natural assignment of the basic dimensions which is adapted to the actual existence of standards for some quantities in the corresponding theory, a feature which is somewhat implicitly hidden in the group structure. For the sake of brevity, we will here consider only some of the most important groups and their associated physical theories; we refer the reader to previous papers (Cariñena et al., 1981, 1985) for more details. Specifically, we shall discuss: (a) The isometry group of a two-dimensional Euclidean plane, a group whose corresponding physical theory is classical Euclidean geometry. (b) The isometry group of a two-dimensional hyperbolic plane (the Lobachewski plane, a Riemann space of constant negative curvature). These two groups are related by a local (point-like) contraction, which means, in geometrical terms, that in a small neighborhood of all points which are at a small distance from a given point, the hyperbolic transformations
appear to be very close to the Euclidean transformations. That relationship is akin to the non-relativistic limit, where the groups concerned are: (c) The Galilei group, the group of motions of the 3 + 1 Newtonian space-time, already discussed. (d) The Poincaré group, the familiar group of symmetry transformations of 3 + 1 relativistic (Minkowskian) space-time. These groups are related by an axial (line-like) contraction, which, in a small neighborhood of all points lying on lines at small Minkowskian angles with a given time-like line, makes Poincaré transformations closely approximated by the corresponding Galilei transformations. For any Lie group, a set of basic objects from the mathematical point of view is that of its one-parameter subgroups. For any such subgroup, there is a canonical parameter, a parameter in terms of which the group product corresponds to the addition of the parameters of the factors. Note that two one-parameter Lie subgroups with proportional canonical parameters give rise to the same one-dimensional Lie subgroup, and when considering the latter the "canonical parameter" is determined only up to a factor. Note the analogy with the structure of quantities, where a real value is associated with a quantity only up to a factor to be fixed by the choice of a unit. Coming back to our examples, we see:

Euclidean case
Here the one-parameter subgroups correspond geometrically either to translations along a line $l$, or to rotations around some point $P$. The canonical parameters may be identified with the distance between any point on $l$ and its image, in the first case, and the angle between any line through $P$ and its image, in the second. Thus, lengths and angles, the basic quantities in classical Euclidean geometry, appear here as being the canonical parameters of one-parameter subgroups. The ratios of canonical parameters within a given one-dimensional subgroup have an intrinsic meaning, and can be considered the measure of one quantity linked with the subgroup when the other has been chosen as a unit, but ratios of canonical parameters corresponding to different one-dimensional subgroups do not in general have such intrinsic meaning.

Hyperbolic Geometry
There are three conjugacy classes of one-parameter (and one-dimensional) subgroups, and in addition to translations along lines and rotations around proper points, there are also so-called "horocyclic displacements", the common limit for a translation when its line goes to infinity and for a rotation around a point which moves to infinity. We have in this case, in
addition to lengths and angles, a third geometrical quantity that is a measure of "how far apart" two parallel lines are, different from the distance along a common perpendicular, a distance which is not constant in hyperbolic geometry and cannot therefore be taken as a measure of the "separation" between two lines. If the Lie algebra is referred to the basis $\{J, P_1, P_2\}$, horocyclic displacements are generated by $J + a_1P_1 + a_2P_2$, with $a_1^2 + a_2^2 = 1$. Now it is well known that in hyperbolic geometry there exists a standard of length, in the sense that this plane has curvature, and one can uniquely select a unit of length so that the curvature has some prescribed value, say equal to $-1$. Our aim is to see how that peculiarity is contained in the Lie algebra structure of the hyperbolic group, as opposed to the Lie algebra of the Euclidean group. To this end, we must first recognize that conjugate subgroups in the Lie group correspond to subgroups with the same kind of geometrical significance. For a subgroup of, say, translations along a line, all its conjugates are translations along all other lines in the plane; for the subgroup of rotations around a point, the conjugates are rotations around all other points, etc. Hence, the inclusion of lengths along all lines under a common heading of lengths presupposes that all subgroups of translations become comparable, so that we can meaningfully speak of the ratio of any two lengths along different lines, and, mutatis mutandis, the same for any other quantity, angles, etc. Whether this is or is not possible turns out to be a property of the group of the geometry, a property that, in our case, is fulfilled for lengths and angles in Euclidean and hyperbolic geometries. Let us put the preceding ideas in a more explicit setting.
We consider a group $G$ as a transformation group acting transitively on some space $X$, hence identified with a homogeneous space of $G$, and consider as natural candidates for geometrical quantities the canonical parameters of its one-parameter subgroups. For two elements in the same one-dimensional subgroup, $g$ and $g'$, the quotient of their canonical parameters is well-determined and can be considered as the measure of $g'$ when $g$ is taken as a unit. Since the exponential map relates the Lie algebra with the group, a change in the unit $g$ amounts to a change $A \to \lambda A$, with $\lambda \neq 0$, in the algebra. Whereas independent units can be chosen for every one-parameter subgroup, one must try to ascertain (i) to what extent units, initially defined only for some subgroups, can be "propagated" to others, and (ii) whether, among all possible changes of units, there are some "natural" changes. In regard to the first part of the problem, conjugation appears as a natural method for the propagation of units and gives rise to two equivalence relations in the Lie algebra. Consider the action of $G$ on its Lie algebra by inner automorphisms on the one-parameter subgroups. Two generators $A$ and $B$ are
in the same orbit if there exists an element $g$ of the group such that $\mathrm{Ad}_gA = B$, where $\exp(\mathrm{Ad}_gX) = g(\exp X)g^{-1}$. In this case, we will write $A \sim B$. There is also another equivalence relation, corresponding to the action of $G$ on the set of its one-dimensional subgroups (one-parameter subgroups without a particular parametrization). $A$ and $B$ are defined to be equivalent according to this relation, to be denoted $A \approx B$, if there exists a real number $\lambda \neq 0$ such that $A \sim \lambda B$. If $A$ and $B$ are equivalent in this sense, the one-parameter subgroups generated by them are conjugate, so that the relation $\approx$ corresponds to "being the same kind" of geometric quantity, and in order to embrace that quantity under some common heading, one has to specify the way of relating units for all one-parameter subgroups whose generators are related by $\approx$. The idea is: if $A$ has been chosen as a unit for its one-dimensional subgroup and $A \sim B$, then choose $B$ as a unit for its subgroup. This idea works provided the one-dimensional subgroup $\exp(tA)$ has no non-trivial selfconjugations. Hence, we have two different situations, according to whether the subgroup generated by $A$ has, or does not have, non-trivial selfconjugations. From now on we shall only consider the case where there are no non-trivial selfconjugations, and refer the reader to a previous paper (Cariñena et al., 1985) for the general case. We now have all the ingredients needed to obtain a unit system for the set of all one-parameter subgroups of $G$. For the case we are considering, such a unit system is completely specified by a set $\{A_\alpha\}$ of elements $A_\alpha$, one in each $\approx$-class. If $B$ is obtained by propagation from $A_\alpha$ to some other one-parameter subgroup conjugate to $A_\alpha$, and therefore in the same $\approx$-class, we will write $B = A_\alpha$. Note that for classes where there are no non-trivial selfconjugations, the transport starting from any element leads to a unique result.
Hence, we have there a structure very similar to the physical algebra, in the sense that for every element $X$ of the Lie algebra, whose $\approx$-class is denoted $\alpha(X)$, we have a well-defined unit $A_{\alpha(X)}$, and one has a unique non-zero real number $x$ such that $X = xA_{\alpha(X)}$. This interpretation is very similar to the one in the physical algebra, and $x$ will be considered the measure of $X$ in the given unit system. Having completed the discussion of point (i), we now see whether the group structure itself selects some particular set of changes of units, which we expect to be, from a physical point of view, those which leave invariant the values of any possible standard in the theory. To see this, we select a basis of the Lie algebra, $\{X_i\}$, $i = 1, \ldots, d$, and we consider, for every non-zero Lie bracket $[X_i, X_j]$, the non-zero real number $x_{ij}$ which gives the measure of that commutator in the chosen unit system, namely,
$$[X_i, X_j] = x_{ij}\,A_{\alpha([X_i,X_j])}.$$
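If we now perform a generic unit change, described by the replacements $A_\alpha \to \lambda(\alpha)A_\alpha$, the numbers $x_{ij}$ transform according to the relation:
$$x_{ij} \to \frac{\lambda(\alpha(X_i))\,\lambda(\alpha(X_j))}{\lambda(\alpha([X_i,X_j]))}\,x_{ij},$$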
and we select the so-called natural changes, defined as being those for which the linear transformation of the algebra corresponding to the change of units, that is $X_i \to \lambda(\alpha(X_i))X_i$, is a Lie algebra automorphism. Roughly speaking, this will mean that the commutation relations defining the Lie algebra will be formally invariant under the change, or in other words, that the changes will automatically take into account the existence of standards in the theory corresponding to that group. Mathematical characterizations relative to these automorphisms are to be found in the quoted references. The meaning of the automorphism condition can be clearly seen in the context of two of our examples, the Euclidean and the hyperbolic group. In the first case, a basis is $\{J, P_1, P_2\}$ and a unit system consists of $\{J, P_1\}$, because there are only two $\approx$-classes, both without selfconjugations. The generic unit change is $J \to aJ$, $P_1 \to \lambda P_1$, and the automorphism condition implies that $a = 1$, but leaves free the value of $\lambda$ for the Euclidean group. That corresponds to a familiar feature, according to which all the (numerical) relations do not change explicitly when the unit of length is changed, but do change if one changes units for angles. The requirement of the automorphism condition keeps fixed the unit of angle and describes the existence of a natural standard for angle, whereas leaving free the unit of length describes the existence of similarities in the geometry, the first root of Dimensional Analysis. If we now consider hyperbolic geometry, in the basis $\{J, P_1, P_2\}$, a unit system for the classes of rotations and translations is again $\{J, P_1\}$, but now the automorphism condition forces the factors in a generic change of units to be both equal to 1.
That corresponds precisely to the features of hyperbolic geometry, where the numerical relations between lengths, angles, etc., involve a universal constant whose value depends on the unit of length chosen, or in other words, describes the existence of a standard for length. These examples show in the simplest cases how the requirement of the automorphism condition for the change of basis associated with a change of units captures the main aspect of the problem, according to the existence, or not, of standards in the theory. A more formal way of introducing that result, which shows more closely the analogy with the ideas of gauge group introduced in the discussion of the physical algebra, is the following: The natural changes of units are characterized by a set of equations, which must hold for the scale factors $\lambda(\alpha)$, one for each $\approx$-class, amongst the basis $\{X_i\}$ and the set of commutators $[X_i, X_j]$. For every nonzero commutator $[X_i, X_j] \neq 0$, we must have
$$\lambda(\alpha(X_i))\,\lambda(\alpha(X_j)) = \lambda(\alpha([X_i, X_j])).$$
From these equations, one obtains a linear system of equations, by taking logarithms, whose solutions are linear combinations of a set of $n$ independent solutions. The set of all natural changes has then a group structure and is an Abelian group isomorphic to $(\mathbb{R}^+)^n$, the factor of change $\lambda(\alpha)$ for a class being expressed in terms of the factors for $n$ particular classes, say $\lambda_k$, $k = 1, \ldots, n$, through an expression
$$\lambda(\alpha) = \prod_{k=1}^{n} \lambda_k^{\,d_{\alpha,k}},$$
where, for each class, there is a set of real numbers $d_{\alpha,k}$ that play the role of the components of the dimension vector of any element in the class relative to a particular basis determined by the $n$ classes for which the scale factors are initially chosen as independent. In terms of the group $G$ itself, this can also be described by saying that the group $G$ admits $n$ independent outer automorphisms, which act on the canonical coordinates of the first kind as dilatations. Let us comment briefly on the relevance of this theory in the examples of the Galilei and Poincaré groups, where the result is a justification in group-theoretical terms of the conventional dimensional structures used in classical and relativistic mechanics. For the Galilei group $\mathcal{G}$, one should note that the invariance group relevant in Quantum Mechanics, and to some extent also in Classical Mechanics (Lévy-Leblond, 1969, 1972; Martinez Alonso, 1977), is not actually the Galilei group itself, but a central extension of $\mathcal{G}$, called the extended or quantum mechanical Galilei group (in this connection, see e.g. the review of Lévy-Leblond, 1972). The results are accordingly different. In the true Galilei case, the elements in the ordinary bases fall into four $\approx$-classes, $\{J_1, J_2, J_3\}$, $\{P_1, P_2, P_3\}$, $\{K_1, K_2, K_3\}$ and $\{H\}$. In all these classes, there are no non-trivial selfconjugations, and a unit system can be generated starting from $J_1$, $P_1$, $K_1$, $H$. Under a general change $J_1 \to aJ_1$, $P_1 \to \lambda P_1$, $K_1 \to \sigma K_1$, $H \to \tau H$, the automorphism conditions are $a = 1$, $\sigma\tau = \lambda$, and we see that the most general automorphism has scale factors $(a, \lambda, \sigma, \tau) = (1, \lambda, \lambda\tau^{-1}, \tau)$. There are two basic automorphisms, which correspond to the fact that length and time can be scaled independently by arbitrary factors while the geometry of Newtonian space-time is unchanged; in other words, space and time dilatations are outer automorphisms of the Galilei group.
When going to the quantum mechanical Galilei group, we will have an extra central generator, $I$, commuting with all the other generators and appearing in the commutator $[K_i, P_j] = \delta_{ij}I$. We have another class, whose elements give the transformation law of the quantum phase, which is related to the mass of the particle under consideration. If $\mu$ denotes the scale factor for the new class, the automorphism relations
now imply $\lambda^2\tau^{-1} = \mu$. The meaning of that result is clear if we remember that the factor for the masses is the inverse of the factor for the generator $I$, which leads directly to the action being dimensionless. Hence, the quantum mechanical Galilei group has a group of outer automorphisms with two generators, corresponding to the possibility of performing independent dilatations on all lengths and all times; any such transformation also has an effect on the masses, so that the action (the quantum standard $h$) is invariant. For the Poincaré group the discussion is similar, but there are some differences in the results. The classification into classes is formally identical: $\{J_1, J_2, J_3\}$, $\{P_1, P_2, P_3\}$, $\{K_1, K_2, K_3\}$, $\{H\}$. (The conventional four-dimensional formulation can be confusing here because it puts all space-time translations on an apparently equal footing, whereas, in fact, translations along time-like straight lines are not conjugate to translations along space-like ones.) With the same scale factors, the automorphism condition is in this case more restrictive and gives $a = 1$, $\sigma = 1$, $\lambda\tau^{-1} = 1$. So, the most general automorphism is $(a, \lambda, \sigma, \tau) = (1, \lambda, 1, \lambda)$. When compared with the Galilean case, one clearly sees that the existence of the standard $c$ forbids dilatations with different factors for lengths and times and makes the canonical parameters of pure inertial transformations (the Minkowskian angle in geometry, or the rapidity (Lévy-Leblond, 1980) in physics) dimensionless. To sum up, the structure of the symmetry group of a theory contains in an implicit way information on the existence of standards in that theory. This has been discussed for two purely geometrical theories, Euclidean and hyperbolic plane geometries, and for two kinematical theories, Galilean and Einsteinian relativity, which are the basic frames where other physical theories are developed, but it is equally valid for any other physical theory with a symmetry group.
We refer to the original papers for some other examples and a more comprehensive discussion.
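The counting of independent natural unit changes in the examples of this section can be reproduced with a small linear-algebra sketch. It assumes the simplified setting used above (no non-trivial selfconjugations, one scale factor per class) and encodes each non-zero commutator only by the classes involved; the function names and the class labels are illustrative, not notation from the original papers.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(e) for e in row] for row in matrix]
    if not rows:
        return 0
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def independent_unit_changes(classes, brackets):
    """Number of independent natural changes of units.

    Each triple (a, b, c) stands for a non-zero commutator between classes
    a and b lying in class c, and imposes, after taking logarithms,
    log l(a) + log l(b) - log l(c) = 0 on the scale factors l.
    """
    index = {name: k for k, name in enumerate(classes)}
    rows = []
    for a, b, c in brackets:
        row = [0] * len(classes)
        row[index[a]] += 1
        row[index[b]] += 1
        row[index[c]] -= 1
        rows.append(row)
    return len(classes) - rank(rows)

EUCLID = [("J", "P", "P")]
HYPERBOLIC = EUCLID + [("P", "P", "J")]
GALILEI = [("J", "J", "J"), ("J", "K", "K"), ("J", "P", "P"), ("K", "H", "P")]
POINCARE = GALILEI + [("K", "P", "H"), ("K", "K", "J")]
EXTENDED_GALILEI = GALILEI + [("K", "P", "I")]

results = {
    "Euclidean": independent_unit_changes(["J", "P"], EUCLID),
    "hyperbolic": independent_unit_changes(["J", "P"], HYPERBOLIC),
    "Galilei": independent_unit_changes(["J", "P", "K", "H"], GALILEI),
    "Poincare": independent_unit_changes(["J", "P", "K", "H"], POINCARE),
    "extended Galilei": independent_unit_changes(
        ["J", "P", "K", "H", "I"], EXTENDED_GALILEI),
}
print(results)
```

The resulting numbers are the dimension $n$ of the group $(\mathbb{R}^+)^n$ of natural changes: one free factor (length) for the Euclidean plane, none for the hyperbolic plane, two (length and time) for the Galilei group and its extension, and one for the Poincaré group.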
VI. DIMENSIONAL ANALYSIS AND SYMMETRIES OF DIFFERENTIAL EQUATIONS

The theory of the reduction of a differential equation through the use of one-parameter invariance groups, which originated around 1873, is due to the pioneering work of the Norwegian mathematician S. Lie, who introduced the theory of (Lie) transformation groups as a useful tool in the process of reduction of ordinary and partial differential equations to simpler ones, following some ideas taken from Galois theory. For a historical reference see
the first section in the paper by Helgason (1977). Lie's original approach has recently undergone a revival, probably because of the increasing relevance of nonlinear problems both in mathematics and physics. Our aim here is not to give an exhaustive study of Lie's theory, its derivation and applications; instead, we will only try to present some simple ideas exhibiting its main properties and, in particular, we will apply the ideas of dimensional analysis to concrete problems. There are many approaches to the theory of symmetries of differential equations, each approach with its own advantages. We will try to sketch briefly the different approaches and establish the relations among them. To start with, we will consider the simple case of a first order ordinary differential equation in which $x$ and $y$ may take values in some open sets in $\mathbb{R}$,
$$y' = F(x, y). \tag{5}$$
In Modern Differential Geometry, such equations, when written in the form of a system
$$\frac{dx}{dt} = 1, \qquad \frac{dy}{dt} = F(x, y), \tag{6}$$
can be interpreted as being the equations (locally) determining the integral curves of the vector field
$$X = \frac{\partial}{\partial x} + F(x, y)\frac{\partial}{\partial y}. \tag{7}$$
But similarly, we can also express the differential Eq. (5) by the system
$$\frac{dx}{dt} = \lambda(x, y), \qquad \frac{dy}{dt} = \lambda(x, y)F(x, y),$$
that determines the integral curves of the vector field $\lambda X$, where $\lambda(x, y)$ is an arbitrary nonvanishing function, whose integral curves have the same graph in the $(x, y)$ plane but with a different parametrization. The important point is that the general solution of Eq. (5) is given by $\phi(x, y) = \text{const.}$, where $\phi$ is an invariant function for $X$ given by (7) (and for each of its multiples), that is to say,
$$X\phi = d\phi(X) = 0, \tag{8}$$
such that $\partial\phi/\partial y$ does not vanish. In fact, if $\phi$ satisfies (8), then we have simultaneously
$$\phi_x + F\phi_y = 0, \qquad \phi_x + y'\phi_y = 0,$$
from which we see that (5) follows. Conversely, if (5) holds, then
$$X\phi = \phi_x + y'\phi_y = \frac{d}{dx}\{\phi(x, y)\} = 0.$$
The search for the general solution of (5) is then reduced to looking for an exact 1-form in the kernel of $X$. The corresponding functions are locally determined by the characteristic system. Now, let us assume that a (Lie) group $G$ acts on $\mathbb{R}^2$. Such an action translates to an action on the real valued functions by
$$\phi^g(g(x, y)) = \phi(x, y), \tag{9}$$
and then we can say that $G$ is a symmetry group for the differential Eq. (5) if and only if $G$ maps invariant functions under $X$ one into another; namely, if $X\phi = 0$ then $X\phi^g = 0$. In the particular case of a one-parameter Lie group generated by a vector field $Y \in \mathfrak{X}(\mathbb{R}^2)$, this means that for every function $\phi$ such that $X\phi = 0$, the condition $XY\phi = 0$ holds. This condition may also be written as
$$[Y, X]\phi = 0, \quad \forall\phi \ \text{such that} \ X\phi = 0. \tag{10}$$
But given a function $\phi$, the vector field $X$ satisfying $X\phi = 0$ is only determined up to multiplication by a nonvanishing function, and therefore $Y$ generates a one-parameter Lie group of symmetry for (5) if and only if there exists a function $\lambda \in C^\infty(\mathbb{R}^2)$ such that
$$[Y, X] = \lambda X, \tag{11}$$
which also means that the set of exact 1-forms of Ker $X$ is invariant under the Lie derivative along the vector field $Y$. For instance, the simplest case is that of the first order differential equation $y' = 0$ describing the pencil of horizontal straight lines in the plane. The vector field is $X = \partial/\partial x$ and the invariant functions are the functions depending only on the variable $y$, the general solution being then $\phi(y) = \text{const.}$, namely $y = \text{const.}$ The invariance one-parameter groups for such a differential equation are generated by vector fields $Y$ of the form
$$Y = \xi(x, y)\frac{\partial}{\partial x} + \eta(y)\frac{\partial}{\partial y}. \tag{12}$$
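The symmetry criterion $[Y, X] = \lambda X$ can be probed numerically for the example $y' = 0$. The following is a rough sketch, assuming vector fields are given as Python functions returning their two components and approximating derivatives by central differences; the helper names, the sample fields and the sample points are illustrative. Passing the test at a few points is only a numerical indication, not a proof.

```python
def lie_bracket(Y, X, point, h=1e-5):
    """Components of [Y, X] at `point`, using central differences.

    A field maps (x, y) to the pair of its d/dx and d/dy components.
    """
    x, y = point

    def jacobian(field):
        # jac[i][j] = derivative of component i along coordinate j
        fxp, fxm = field(x + h, y), field(x - h, y)
        fyp, fym = field(x, y + h), field(x, y - h)
        return [[(fxp[i] - fxm[i]) / (2 * h), (fyp[i] - fym[i]) / (2 * h)]
                for i in range(2)]

    jac_x, jac_y = jacobian(X), jacobian(Y)
    x_val, y_val = X(x, y), Y(x, y)
    # [Y, X]^i = Y^j d_j X^i - X^j d_j Y^i
    return tuple(
        sum(y_val[j] * jac_x[i][j] - x_val[j] * jac_y[i][j] for j in range(2))
        for i in range(2)
    )

def looks_like_symmetry(Y, X, points, tol=1e-6):
    """True if [Y, X] is parallel to X at every sample point."""
    for p in points:
        bracket = lie_bracket(Y, X, p)
        x_val = X(*p)
        # In the plane, two vectors are parallel iff their cross product vanishes.
        if abs(bracket[0] * x_val[1] - bracket[1] * x_val[0]) > tol:
            return False
    return True

def X_flat(x, y):
    """Vector field d/dx associated with y' = 0."""
    return (1.0, 0.0)

def Y_admissible(x, y):
    """d/dy component depends on y only: xi = x*y, eta = y**2."""
    return (x * y, y ** 2)

def Y_not_admissible(x, y):
    """d/dy component depends on x: xi = 0, eta = x."""
    return (0.0, x)

SAMPLE_POINTS = [(0.5, 1.0), (2.0, -1.5), (-1.0, 0.7)]
print(looks_like_symmetry(Y_admissible, X_flat, SAMPLE_POINTS))
print(looks_like_symmetry(Y_not_admissible, X_flat, SAMPLE_POINTS))
```

The first field maps horizontal lines into horizontal lines, while the flow of the second tilts them, which is what the failure of the criterion reflects.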
The case of scale changes can also be considered. Here the starting point is not Eq. (5) but the symmetry group, and the aim is then to find the explicit form of the functions $F$ such that Eq. (5) admits the two-dimensional symmetry group of scale changes
$$x_1 = e^{\gamma\epsilon}x, \qquad y_1 = e^{\beta\epsilon}y, \tag{13}$$
with $\beta, \gamma \in \mathbb{R}$ arbitrary. The vector field $Y$ generating such a one-parameter transformation group is
$$Y = \gamma x\frac{\partial}{\partial x} + \beta y\frac{\partial}{\partial y}.$$
The condition that $Y$ is a symmetry for $X = \partial/\partial x + F(x, y)\,\partial/\partial y$ becomes
$$xF_x + F = 0, \qquad yF_y = F, \qquad \lambda = -\gamma, \tag{14}$$
because
$$[Y, X] = -\gamma\frac{\partial}{\partial x} + \left[\gamma xF_x + \beta(yF_y - F)\right]\frac{\partial}{\partial y}, \tag{15}$$
and the solution of the first equation gives $F(x, y) = g(y)/x$, with $g$ a function to be determined by the second condition, which becomes $y\,g'(y) = g(y)$, i.e. $g(y) = by$, with $b$ an arbitrary constant. In other words, only an equation of the form (Bluman and Cole, 1974, p. 9)
$$y' = b\,\frac{y}{x} \tag{16}$$
is invariant under such a two-dimensional group. If, on the contrary, Eq. (5) is not invariant under the two-dimensional group of transformations given by (13), but only under one one-parameter subgroup, which without loss of generality can be assumed to be that of (13) with $\gamma = 1$, the invariance condition reads
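$$xF_x + F + \beta(yF_y - F) = 0 \tag{17}$$

which is a quasi-linear partial differential equation whose characteristic system is

$$\frac{dx}{x} = \frac{dy}{\beta y} = \frac{dF}{(\beta - 1)F},$$

from which we obtain the integrals

$$\frac{y}{x^{\beta}} = c_1, \qquad \frac{F}{x^{\beta - 1}} = c_2,$$

the general solution being then

$$F(x, y) = x^{\beta - 1}H\!\left(\frac{y}{x^{\beta}}\right) \tag{18}$$

with $H$ an arbitrary function, as indicated in Bluman and Cole (1974, p. 11). Another way of dealing with the differential equation is by means of the 1-form

$$\alpha = dy - F(x, y)\,dx \tag{19}$$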
which satisfies $\alpha(X) = 0$. Such a 1-form $\alpha$ is not exact and, as indicated before, solving Eq. (5) is equivalent to finding an integrating factor $\mu$ for $\alpha$ such that $\mu\alpha = d\phi$, the general solution being then given by $\phi = \text{const}$. From this new point of view, solving the differential Eq. (5) amounts to determining the curves $\gamma: I \to \mathbb{R}^2$ such that $\gamma^*\alpha = 0$. As far as the symmetry theory is concerned, the condition (11) for symmetry is equivalent to the existence of a function $f$ such that $L_Y\alpha = f\alpha$, because from the general identity $L_Y i(X)\alpha - i(X)L_Y\alpha = i([Y, X])\alpha$ applied to our particular case, we find

$$i(X)(L_Y\alpha) = -\lambda\, i(X)\alpha = 0,$$

and as two 1-forms annihilating $X$ are proportional, there will exist a function $f$ such that $L_Y\alpha = f\alpha$. The remarkable property is that once we know a one-parameter group of symmetry transformations for $\alpha$, we are able to construct an integrating factor for $\alpha$ and, therefore, to find the general solution of (5). This integrating factor is the reciprocal of $i(Y)\alpha$. To prove this, we first remark that

$$d\!\left(\frac{\alpha}{i(Y)\alpha}\right) = \frac{1}{[i(Y)\alpha]^2}\Big\{-d[i(Y)\alpha] \wedge \alpha + [i(Y)\alpha]\,d\alpha\Big\},$$

which, by making use of the Cartan homotopy identity, becomes

$$d\!\left(\frac{\alpha}{i(Y)\alpha}\right) = \frac{1}{[i(Y)\alpha]^2}\Big\{[-L_Y\alpha + i(Y)\,d\alpha] \wedge \alpha + [i(Y)\alpha]\,d\alpha\Big\}.$$

But since $L_Y\alpha$ is proportional to $\alpha$, the 2-form $L_Y\alpha \wedge \alpha$ vanishes and then

$$d\!\left(\frac{\alpha}{i(Y)\alpha}\right) = \frac{1}{[i(Y)\alpha]^2}\Big\{[i(Y)\,d\alpha] \wedge \alpha + [i(Y)\alpha]\,d\alpha\Big\} = \frac{1}{[i(Y)\alpha]^2}\, i(Y)[\alpha \wedge d\alpha].$$
Now as any 3-form on the plane is identically null, we conclude that the 1-form $\alpha/(i(Y)\alpha)$ is closed and therefore locally exact, the function $1/(i(Y)\alpha)$ being an integrating factor for $\alpha$. This result, like most of those presented here, is due to Lie (see Helgason, 1977). The coordinate expression for this integrating factor is $(F\xi - \eta)^{-1}$, where the vector field $Y$ is assumed to be

$$Y = \xi\frac{\partial}{\partial x} + \eta\frac{\partial}{\partial y}. \tag{20}$$
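The statement that $1/(i(Y)\alpha)$ is an integrating factor can be checked numerically. The following is only a minimal sketch under stated assumptions: the choice $F(x, y) = y^2/x^3 + x$ with $Y = x\,\partial/\partial x + 2y\,\partial/\partial y$ is an illustrative example that does not appear in the text (it satisfies $xF_x + F + 2(yF_y - F) = 0$), and the function names and sample points are hypothetical. The test of closedness, $\partial N/\partial x = \partial M/\partial y$ for a 1-form $M\,dx + N\,dy$, is estimated by central differences.

```python
# Illustrative data (assumptions of this sketch, not taken from the text):
#   F(x, y) = y**2 / x**3 + x   and   Y = x d/dx + 2 y d/dy.

def F(x, y):
    return y**2 / x**3 + x

def xi(x, y):
    return x

def eta(x, y):
    return 2.0 * y

def mu(x, y):
    """Candidate integrating factor 1 / i(Y)alpha for alpha = dy - F dx."""
    return 1.0 / (eta(x, y) - F(x, y) * xi(x, y))

def closedness_defect(M, N, x, y, h=1e-5):
    """Central-difference estimate of dN/dx - dM/dy for the 1-form M dx + N dy."""
    dM_dy = (M(x, y + h) - M(x, y - h)) / (2.0 * h)
    dN_dx = (N(x + h, y) - N(x - h, y)) / (2.0 * h)
    return dN_dx - dM_dy

def defect_with_factor(x, y):
    # mu * alpha = (-mu F) dx + mu dy
    return closedness_defect(lambda a, b: -mu(a, b) * F(a, b), mu, x, y)

def defect_without_factor(x, y):
    # alpha itself = (-F) dx + 1 dy, which is not closed
    return closedness_defect(lambda a, b: -F(a, b), lambda a, b: 1.0, x, y)

# Sample points chosen away from x = 0 and from the curve y = x**2,
# where mu is singular for this particular F.
SAMPLE_POINTS = [(1.0, 3.0), (2.0, 1.0), (0.5, 2.0)]

if __name__ == "__main__":
    for x, y in SAMPLE_POINTS:
        print(x, y, defect_with_factor(x, y), defect_without_factor(x, y))
```

For this choice the defect of $\mu\alpha$ should vanish up to discretization error, while that of $\alpha$ alone equals $F_y \neq 0$.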
As an instance, the two integrating factors for Eq. (16) corresponding to the one-parameter groups of (13) are, up to a constant factor, the same function $y^{-1}$, while in the case in which $F$ is given by (18), the integrating factor is $[x^{\beta}H(y/x^{\beta}) - \beta y]^{-1}$. Lie's original approach to the theory of symmetries of differential equations was based on the concept of prolongations and extended transformations, which actually correspond to the modern idea of jet-extensions. In the case of the first-order equations we are considering, the action of a one-parameter group with generator $Y = \xi\,\partial/\partial x + \eta\,\partial/\partial y$ is extended to an action on the three-dimensional space $(x, y, y')$ in a very natural way. In intuitive terms, if under the action of $Y$ the point $(x, y)$ is transformed to $(x_t, y_t)$ given by

$$x_t = x + t\xi + O(t^2), \qquad y_t = y + t\eta + O(t^2),$$

then the derivative $p = y' = dy/dx$ transforms as

$$p_t = \frac{dy_t}{dx_t} = \frac{dy + t\,d\eta + O(t^2)}{dx + t\,d\xi + O(t^2)} = \frac{y' + t(\eta_x + \eta_y y') + O(t^2)}{1 + t(\xi_x + \xi_y y') + O(t^2)}$$

and therefore, up to first order in $t$,

$$p_t = p + t\pi + O(t^2),$$

with $\pi$ being

$$\pi = \eta_x + (\eta_y - \xi_x)p - \xi_y p^2.$$

Then, instead of the vector field $Y$, we will consider the extended generator

$$Y^{(1)} = \xi\frac{\partial}{\partial x} + \eta\frac{\partial}{\partial y} + \pi\frac{\partial}{\partial p} \tag{23}$$

whose integration will give the extended one-parameter group. The differential equation (5) is in this new space a surface $\Sigma$ that is the inverse image of 0 under the function $G(x, y, p) = p - F(x, y)$. An infinitesimal symmetry of the system is a vector field $Y \in \mathfrak{X}(\mathbb{R}^2)$ such that $Y^{(1)}$ is tangent to $\Sigma$, for then $G$ is invariant under $Y^{(1)}$. The invariance condition $Y^{(1)}G = 0$ is
equivalent to

$$(\pi - YF)|_{\Sigma} = 0 \tag{24}$$

and on the left-hand side, the condition of restricting to $\Sigma$ means that we must replace $p$ by $F(x, y)$, so that it is a condition involving only $x$, $y$ and $F$. Next we will illustrate it with several examples. The invariance condition

$$\eta_x + (\eta_y - \xi_x)F - \xi_y F^2 = \xi F_x + \eta F_y \tag{25}$$

may be seen from different viewpoints. Given $\xi$ and $\eta$, it is a partial differential equation determining the functions $F$ such that $Y = \xi\,\partial/\partial x + \eta\,\partial/\partial y$ is a symmetry of Eq. (5). On the other hand, given a differential equation, the condition is a differential equation in $\xi$ and $\eta$ whose particular solutions will give us one-parameter symmetry groups of the differential equation (4). For instance, fixing $\xi$ and $\eta$ to be as in (13), $\xi = \gamma x$, $\eta = \beta y$, with arbitrary $\gamma$ and $\beta$, we will find (14) as invariance conditions from (25). As an example of how this theory works when we try to determine the symmetry group of a given differential equation, we analyse in some detail the case of the equation $y' = a$, where $a$ is a fixed real number. In this case the invariance condition (25) reads

$$\eta_x + (\eta_y - \xi_x)a - \xi_y a^2 = 0.$$

Let us assume that we are interested in the case of invariance of each equation in the set of equations $y' = a$. The condition must then hold for every value of $a$, so the coefficient of each power of $a$ vanishes: $\xi_y = \eta_x = 0$ and $\eta_y = \xi_x$, from which we obtain the explicit form of the vector field $Y$,

$$\xi = a_1 + kx, \quad \eta = a_2 + ky, \qquad Y = a_1\frac{\partial}{\partial x} + a_2\frac{\partial}{\partial y} + k\left(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y}\right).$$
The three vector fields $Y_1 = \partial/\partial x$, $Y_2 = \partial/\partial y$ and $Y_3 = x\,\partial/\partial x + y\,\partial/\partial y$ generate translations in the $x$ and $y$ directions and dilatations, respectively. The Lie algebra structure, given by the commutation relations

$$[Y_1, Y_2] = 0, \qquad [Y_1, Y_3] = Y_1, \qquad [Y_2, Y_3] = Y_2,$$

shows that $\{Y_1, Y_2\}$ generate an Abelian ideal and the symmetry group is a semidirect product group $G = T_2 \odot D$ of two-dimensional translations and dilatations in the plane. This group is known as the "Aristotle group" and is the group of point transformations transforming a straight line into another parallel one. Let us now consider the case of a first-order system of differential equations

$$\frac{dy^i}{dx} = F^i(x, y^j), \qquad i, j = 1, \dots, n \tag{26}$$
from which higher-order differential equations may be considered as particular instances. Without loss of generality we can restrict ourselves to the case of autonomous systems, where the functions $F^i$ do not depend on the independent variable $x$. In fact, given a system

$$\frac{dz^i}{dt} = G^i(t, z^j), \qquad i, j = 1, \dots, m \tag{27}$$

we can associate to it an autonomous system as (26) by putting $n = m + 1$, $y^i = z^i$ for $i = 1, \dots, m$, $y^{m+1} = t$, and the functions $F^i$ defined by $F^i(y^j) = G^i(y^j)$ for $i = 1, \dots, m$, and $F^{m+1}(y^j) = 1$. From the geometric viewpoint the differential system (26) can be considered as the set of equations locally determining the integral curves of the vector field $X \in \mathfrak{X}(\mathbb{R}^{n+1})$

$$X = \frac{\partial}{\partial x} + F^i\frac{\partial}{\partial y^i}$$

when (26) is written as

$$\frac{dx}{dt} = 1, \qquad \frac{dy^i}{dt} = F^i(y^j).$$

But if we write it as

$$\frac{dx}{dt} = \lambda(x, y), \qquad \frac{dy^i}{dt} = \lambda(x, y)F^i(y^j),$$

it corresponds to the vector field $\lambda X$. Then, what actually can be associated to the differential system (26) is not a vector field but a family of them or, even better, the set of 1-forms annihilating the vector field $X$ and all its function multiples. This implies that a vector field $Y \in \mathfrak{X}(\mathbb{R}^{n+1})$ is to be considered as an infinitesimal symmetry of (26) if there exists a function $\lambda$ such that

$$[Y, X] = \lambda X. \tag{31}$$

If $X^{\perp}$ denotes the set of 1-forms in $\mathbb{R}^{n+1}$ annihilating $X$, then $Y$ is a symmetry of (26) if and only if $L_Y\alpha \in X^{\perp}$ for every $\alpha \in X^{\perp}$. In fact, this follows from the relation

$$0 = L_Y\langle\alpha, X\rangle = \langle L_Y\alpha, X\rangle + \langle\alpha, L_Y X\rangle.$$
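The step from this relation to the symmetry criterion can be spelled out as follows. This is only a short sketch added for clarity; it uses nothing beyond the definitions above, namely that $\alpha \in X^{\perp}$ means $\langle\alpha, X\rangle = 0$ and that $L_Y X = [Y, X]$.

```latex
% Sketch: [Y, X] = \lambda X  is equivalent to  L_Y(X^\perp) \subset X^\perp.
\begin{align*}
\alpha \in X^{\perp}
  &\;\Longrightarrow\; \langle L_Y\alpha, X\rangle = -\langle\alpha, [Y, X]\rangle . \\
\text{If } [Y, X] = \lambda X:\quad
  &\langle L_Y\alpha, X\rangle = -\lambda\,\langle\alpha, X\rangle = 0
  \;\Longrightarrow\; L_Y\alpha \in X^{\perp} . \\
\text{If } L_Y\alpha \in X^{\perp}\ \ \forall\,\alpha \in X^{\perp}:\quad
  &\langle\alpha, [Y, X]\rangle = 0 \ \ \forall\,\alpha \in X^{\perp}
  \;\Longrightarrow\; [Y, X] = \lambda X .
\end{align*}
```

The last implication relies on the fact that $X$ never vanishes (its $\partial/\partial x$ component is 1), so that at each point the only vectors annihilated by every form of $X^{\perp}$ are the multiples of $X$.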
In particular, $X^{\perp}$ can be determined by the knowledge of $n$ $X$-invariant, functionally independent functions $\phi_i$, because if $X\phi_i = 0$ then $d\phi_i \in X^{\perp}$. If we know such $n$ invariant functions, we will obtain the general solution of (26) through the implicit function theorem by considering the set of equations

$$\phi_i(x, y^1, \dots, y^n) = c_i$$

and expressing the $y$'s in terms of $x$ and the constants $c_i$. Then, in order to find the general solution of (26), we must look for a set of $n$ exact 1-forms in $\mathbb{R}^{n+1}$ annihilating $X$, which amounts to finding $n$ functionally independent $X$-invariant functions.

We now aim to show how the knowledge of an infinitesimal symmetry of $X$ may be used to simplify its solution. We start by considering $Y$ to be a strict symmetry of $X$, so that the function $\lambda$ appearing in (31) vanishes identically. Then, instead of using the coordinates $(x, y^1, \dots, y^n)$, we can use a new set of local coordinates $(u^1, \dots, u^{n+1})$ in which $Y$ is just $Y = \partial/\partial u^{n+1}$. Consequently, the condition $[Y, X] = 0$ means that the coordinate functions $X^i$ of $X$ with respect to the new set of coordinates do not depend on $u^{n+1}$, and therefore the problem is reduced to a simpler one with one coordinate fewer plus a quadrature for the last variable. If $Y$ is a symmetry of $X$ in the wider sense of (31), in the new coordinates the coordinate functions $X^i$ of $X$ are of the form $X^i = C(u^1, \dots, u^{n+1})R^i(u^1, \dots, u^n)$, with $C$ being a common factor, because (31) implies that $\partial X^i/\partial u^{n+1} = \lambda X^i$. As indicated above, only the ratios of the components of the vector field $X$ are relevant for the problem at hand and, these ratios being independent of $u^{n+1}$, the problem is also reduced to a simpler one. For a more detailed study, the reader is referred to the paper by Sayegh and Jones (1986).

The problem of symmetries for a higher-order differential equation in normal form may be reduced to that of a differential system because if the equation is

$$y^{(n)} = F(x, y, y^{(1)}, \dots, y^{(n-1)}),$$

we can associate it with the system

$$\frac{dy}{dx} = u^{(1)}, \quad \frac{du^{(1)}}{dx} = u^{(2)}, \quad \dots, \quad \frac{du^{(n-1)}}{dx} = F(x, y, u^{(1)}, \dots, u^{(n-1)}) \tag{32}$$
and the theory of symmetries for differential systems may be extended to such higher-order differential equations. Alternatively, Lie's original approach was based on the "prolongation" of a vector field. So, if $Y = \xi\,\partial/\partial x + \eta\,\partial/\partial y$ is the coordinate expression of a vector field, its $n$-extension is defined in a way generalizing (23), the final expression being

$$Y^{(n)} = \xi\frac{\partial}{\partial x} + \eta\frac{\partial}{\partial y} + \sum_{k=1}^{n}\eta^{(k)}\frac{\partial}{\partial u^{(k)}},$$

where $\eta^{(k)}$ is to be calculated by the recurrence formula

$$\eta^{(k)} = \frac{d\eta^{(k-1)}}{dx} - u^{(k)}\frac{d\xi}{dx}, \qquad \eta^{(0)} = \eta.$$

Here $d/dx$ is a symbol for

$$\frac{d}{dx} = \frac{\partial}{\partial x} + u^{(1)}\frac{\partial}{\partial y} + \sum_{k \geq 1} u^{(k+1)}\frac{\partial}{\partial u^{(k)}}.$$

We remark for later use that if $X = \partial/\partial x$, then $X^{(n)} = \partial/\partial x$ too, and the same is true for any coordinate vector field. The condition for $Y$-invariance of the differential equation is just that the restriction of $Y^{(n)}$ to $\Sigma$, $Y^{(n)}|_{\Sigma}$, is tangent to $\Sigma$, where $\Sigma$ denotes the hypersurface defined by $G^{-1}(0)$ with $G$ being the function $G(x, y, u^{(1)}, \dots, u^{(n)}) = u^{(n)} - F(x, y, u^{(1)}, \dots, u^{(n-1)})$, in complete analogy with the case of first-order systems. Let us now suppose that $Y$ is a symmetry of the differential Eq. (32). We are going to show how it is possible to reduce the order of the equation by one. In fact, we only need to take appropriate coordinates $(z, w)$ in which the variable $w$ is such that $Y = \partial/\partial w$. Then, using the chain rule, we can express the differential equation in terms of the new variable and its derivatives $w^{(k)} = d^k w/dz^k$,

$$G(z, w, w^{(1)}, \dots, w^{(n)}) = 0.$$

The vector field $Y$ being a symmetry of the differential equation and the $n$-prolongation $Y^{(n)}$ of $Y$ being $Y$ again, we will have as the invariance condition that the function $G$ does not depend on $w$. It is then well known that setting $u = dw/dz$ we will get an $(n-1)$-st order equation for $u$ whose solutions will give the general solution of (32). As a particular example we will consider a homogeneous second-order linear equation

$$y'' + p(x)y' + q(x)y = 0,$$
which is invariant under the infinitesimal symmetry given by $Y = y\,\partial/\partial y$, i.e., the group of scale transformations $(x, y) \to (x, \lambda y)$, of which the prolongation $Y^{(2)}$ is given by

$$Y^{(2)} = y\frac{\partial}{\partial y} + u^{(1)}\frac{\partial}{\partial u^{(1)}} + u^{(2)}\frac{\partial}{\partial u^{(2)}}$$

and satisfies

$$Y^{(2)}G|_{\Sigma} = 0$$

with $G = u^{(2)} + pu^{(1)} + qy$. The local coordinate $w = \log y$ can be augmented, for instance, by $z = x$, and in these coordinates Equation (32) becomes

$$\frac{d^2w}{dz^2} + \left(\frac{dw}{dz}\right)^2 + p(z)\frac{dw}{dz} + q(z) = 0,$$

which is independent of $w$. Now the change $v = dw/dz$ leads to a Riccati equation.

The preceding example is one of the most important cases in Physics, because second-order differential equations describe the motion of classical one-dimensional systems. Moreover, the motion of classical systems with a finite number of degrees of freedom is described by regular systems of second-order differential equations, very often arising as the Euler-Lagrange equations of a regular Lagrangian function. Their symmetries have received much attention because of the well-known connection, via the first Noether theorem, between point symmetries of the Lagrangian and constants of motion, which enables us to simplify the problem. The reader interested in the symmetries of these systems of second-order differential equations can find a good development in the paper by Meinhardt (1981), for point transformation symmetries, and in that of Sayegh and Jones (1986), where not only point transformations but also contact, and even non-contact, transformations of regular second-order differential equations are considered by defining a set of dynamical systems associated with an arbitrary system of second-order differential equations, and recovering for a particular choice of the dynamical system the symmetry conditions of Meinhardt (1981) and Bluman and Cole (1974). The case in which the second-order differential equation comes from a Lagrangian deserves much more attention, but the subject cannot be considered here and a list of references would be too long to include. We will only mention that, as far as scale changes are concerned, the problem of a "mechanical type" Lagrangian, that is, the sum of a kinetic energy term quadratic in the velocities plus a function $U$ depending only on the position coordinates (a potential energy), was studied by Wolsky (1971), who proved that if $U$ is a homogeneous function of degree $d$,
the equations of motion will be invariant under a change

$$t' = S_t t, \qquad Q' = S_Q Q$$

provided that $S_t^2 = S_Q^{2-d}$. The homogeneity property of $U$ is a strong condition but leaves room for the motion in a uniform field, the harmonic oscillator, the Kepler problem, the free particle and the Coulomb field. The problem of reduction for partial differential equations works in a similar way, but a mathematical description would be well beyond the scope of this review. The interested reader is referred to the original papers by Michal (1951) and Morgan (1952) and to the excellent books by Bluman and Cole (1974) and Olver (1986), as well as to other papers and books included in the references.
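The scaling rule for homogeneous potentials can be illustrated with a few numbers. The sketch below assumes the relation in the form $S_t^2 = S_Q^{2-d}$, which follows from comparing how the two sides of $m\,d^2Q/dt^2 = -\nabla U$ scale (the left side as $S_Q/S_t^2$, the right side as $S_Q^{d-1}$). The function name `time_scale` and the sample degrees are hypothetical choices made for illustration.

```python
def time_scale(S_Q, d):
    """Time dilation S_t paired with a length dilation S_Q > 0
    for a potential homogeneous of degree d, from S_t**2 == S_Q**(2 - d)."""
    return S_Q ** ((2.0 - d) / 2.0)

# Degrees of homogeneity for three of the systems mentioned in the text.
DEGREES = {
    "uniform field": 1,        # U proportional to Q
    "harmonic oscillator": 2,  # U proportional to Q**2
    "Kepler problem": -1,      # U proportional to 1/Q
}

if __name__ == "__main__":
    for name, d in DEGREES.items():
        print(name, time_scale(4.0, d))
```

For the Kepler problem the rule reduces to $S_t^2 = S_Q^3$, which is Kepler's third law; for the harmonic oscillator $S_t = 1$, so the period does not depend on the amplitude.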
VII. APPENDIX

A. Group Theory

The concept of a group has been closely related, since its first days in the months of May and June 1829 [Ro 82], to that of an equivalence relation, the structure of a group being suggested by the properties of a set of transformations preserving something. So, transitivity suggests the existence of an internal law, reflexivity the existence of an identity element and symmetry that of an inverse for each element. More accurately:

Definition A1: A group is a pair $(G, \top)$ where $\top$ is a binary operation (i.e. a function $\top: G \times G \to G$) such that:

(i) $\top$ is associative: $(g_1 \top g_2) \top g_3 = g_1 \top (g_2 \top g_3)$ for every three elements of $G$.
(ii) There is an element $e \in G$, called the identity element, such that $e \top g = g$ for every $g \in G$.
(iii) For any $g \in G$, there exists an element $g'$ such that $g' \top g = e$.

If the binary operation is commutative, the group $(G, \top)$ is said to be an Abelian group.

It is easy to see that $g \top e = g$ for every $g \in G$ is also true and that the element $g'$ is uniquely defined and also satisfies $g \top g' = e$. It is usually denoted $g^{-1}$, but when the group is Abelian the notation $-g$ is much more often used.

Definition A2: If $(G, \top)$ is a group, a subset $H \subset G$ is a subgroup of $G$ if $(H, \top|_H)$ is a group. Here $\top|_H$ denotes the restriction of $\top$ to $H \times H$.
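For a finite set the axioms of Definition A1 can be tested by brute force. The following is a minimal sketch, not an efficient algorithm: the helper `is_group` and the example of addition modulo 6 are illustrative assumptions. It checks closure, associativity, the existence of a left identity and the existence of left inverses, exactly in the one-sided form used in the definition.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of Definition A1 on a finite set with operation op."""
    elems = list(elements)
    members = set(elems)
    # op must be a map G x G -> G (closure)
    if any(op(a, b) not in members for a, b in product(elems, repeat=2)):
        return False
    # (i) associativity
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # (ii) a left identity e with e T g = g for every g
    identities = [e for e in elems if all(op(e, g) == g for g in elems)]
    if not identities:
        return False
    e = identities[0]
    # (iii) a left inverse g' with g' T g = e for every g
    return all(any(op(h, g) == e for h in elems) for g in elems)

def add_mod6(a, b):
    return (a + b) % 6

def mul_mod6(a, b):
    return (a * b) % 6

if __name__ == "__main__":
    print(is_group(range(6), add_mod6))      # the cyclic group of order 6
    print(is_group([0, 2, 4], add_mod6))     # a subgroup in the sense of Definition A2
    print(is_group([0, 1, 2], add_mod6))     # not closed, hence not a subgroup
    print(is_group(range(1, 6), mul_mod6))   # not closed: 2 * 3 = 0 mod 6
```

Applying `is_group` to a subset with the same operation is precisely the subgroup test of Definition A2.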
The composition law is generally denoted by a dot or omitted and sometimes, with an abuse of language, we will also say that $G$ is a group, with no mention of the composition law. A subgroup $H$ of the group $G$ has two different equivalence relations associated with it:

$$g_1\,\mathcal{R}_L\,g_2 \quad\text{if}\quad g_1^{-1}g_2 \in H, \qquad\qquad g_1\,\mathcal{R}_R\,g_2 \quad\text{if}\quad g_1g_2^{-1} \in H.$$

The equivalence classes are the left cosets $gH = \{gh \mid h \in H\}$ and the right cosets $Hg = \{hg \mid h \in H\}$, respectively. The first relation is compatible with left multiplication while the second one is so with respect to right multiplication:

$$g_1\,\mathcal{R}_L\,g_2 \;\Rightarrow\; gg_1\,\mathcal{R}_L\,gg_2, \qquad\qquad g_1\,\mathcal{R}_R\,g_2 \;\Rightarrow\; g_1g\,\mathcal{R}_R\,g_2g.$$

Definition A3: A subgroup $H$ of $G$ such that both classifications coincide (i.e. the left coset $gH$ is the same as the right coset $Hg$ for any $g \in G$) will be called a normal subgroup.

Definition A4: Let $(G, \top)$ and $(G', \circ)$ be groups. A map $\varphi: G \to G'$ is a morphism if $\varphi(g_1 \top g_2) = \varphi(g_1) \circ \varphi(g_2)$. A bijective morphism is called an isomorphism.

It can be checked that if $\varphi$ is a morphism, $\varphi(e)$ is the identity element of $G'$ and $\varphi(g^{-1})$ is the inverse of $\varphi(g)$. Furthermore, the kernel of $\varphi$, defined as the inverse image of the identity $e'$ of $G'$, $\text{Ker}\,\varphi = \varphi^{-1}(e')$, is a normal subgroup. It reduces to one element, $\text{Ker}\,\varphi = \{e\}$, if and only if the morphism $\varphi$ is injective. We may now go back to the original idea of groups whose elements are permutations or group actions.
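Cosets, normal subgroups and kernels can be made concrete on the permutations of three objects. The sketch below is an illustration under stated assumptions: permutations are stored as tuples, `compose`, `left_cosets`, `right_cosets`, `is_normal` and `sign` are hypothetical helper names, and the sign of a permutation plays the role of the morphism $\varphi$ of Definition A4 with values in $\{1, -1\}$.

```python
from itertools import permutations

def compose(p, q):
    """Product p*q of permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))  # all 6 permutations of {0, 1, 2}

def left_cosets(H):
    return {frozenset(compose(g, h) for h in H) for g in G}

def right_cosets(H):
    return {frozenset(compose(h, g) for h in H) for g in G}

def is_normal(H):
    """Definition A3: the partitions into left and right cosets coincide."""
    return left_cosets(H) == right_cosets(H)

def sign(p):
    """Morphism onto ({1, -1}, *): parity of the number of inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

KERNEL = [p for p in G if sign(p) == 1]   # Ker(sign): the even permutations
SWAP = [(0, 1, 2), (1, 0, 2)]             # subgroup generated by one transposition

if __name__ == "__main__":
    print(is_normal(KERNEL), is_normal(SWAP))
    print(len(left_cosets(SWAP)), len(left_cosets(KERNEL)))
```

The kernel of the sign morphism comes out normal, as stated above, while the two-element subgroup generated by a transposition does not.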
Definition A5: Let $G$ be a group and $M$ a set. A left action of $G$ on $M$ is a map $\phi: G \times M \to M$ such that

(i) $\phi(e, m) = m$,
(ii) $\phi(g_1, \phi(g_2, m)) = \phi(g_1g_2, m)$.

The property (i) may be replaced by: (i') $\phi$ is onto. We also often say that $G$ is a transformation group for $M$, the action $\phi$ being understood. A right action is similarly defined by an onto map $\phi: M \times G \to M$ with $\phi(\phi(m, g_1), g_2) = \phi(m, g_1g_2)$. The action $\phi$ of $G$ on $M$ permits the definition of the marginal maps: for any $g \in G$, $\phi_g: M \to M$ is given by $\phi_g(m) = \phi(g, m)$, and if $m \in M$, $\phi_m: G \to M$ is
defined by $\phi_m(g) = \phi(g, m)$. The symbol $\phi$ for the action may be omitted, and then the notation $gm$ for $\phi(g, m)$ will be used.

Definition A6: Given a left action of $G$ on $M$, the orbit of an element $m \in M$ is the set $Gm = \{\phi(g, m) \mid g \in G\} = \phi_m(G)$. The stability group or isotropy group of $m$ is defined by $G_m = \phi_m^{-1}(m) = \{g \in G \mid \phi(g, m) = m\}$.

The action of $G$ on $M$ gives an equivalence relation on $M$: two points $m$ and $m'$ are equivalent if there exists an element $g \in G$ such that $m' = gm$. The set $M$ is then partitioned into classes, each one being the orbit of any of its elements. We also recall that a map $F: X \to Y$ gives an equivalence relation $\mathcal{R}_F$ in $X$ by: $x_1\,\mathcal{R}_F\,x_2$ if $F(x_1) = F(x_2)$. The map $F$ may then be factorized as a product $i \circ \hat{F} \circ q$, in which $q: X \to X/\mathcal{R}_F$ is the canonical projection and $\hat{F}$ is a bijection given by $\hat{F}(q(x)) = F(x)$. Finally, $i$ denotes the map $i: F(X) \to Y$ given by $i(x) = x$. When we consider the case of the map $\phi_m: G \to M$ for an arbitrary but fixed element $m \in M$, we find a bijection $\hat{\phi}_m$ of the set of left cosets, $G/G_m$, onto the orbit $Gm$, because the $\phi_m$-equivalence in $G$ is just the left equivalence associated with the subgroup $G_m$. Therefore, the orbit $Gm$ of $m \in M$ can be seen as the set of left cosets $G/G_m$.

Definition A7: The action $\phi$ of $G$ on $M$ is called effective if for each $g \in G$ different from $e \in G$ there exists an $m \in M$ such that $\phi(g, m) \neq m$. It is said to be transitive if there is just one orbit, i.e. given two points $m$ and $m'$ there exists $g \in G$ such that $\phi(g, m) = m'$. In this last case, it is also said that $M$ is a homogeneous space for $G$. Finally, the action is called free if for any $g \in G$ different from $e$, $\phi(g, m) \neq m$ for every $m \in M$.

It is worth remarking that $\phi_g$ is a permutation of $M$ and, moreover, the map associating $\phi_g$ with $g$ is a morphism which is injective when the action is effective: the subset $\{\phi_g \mid g \in G\}$ will then be a subgroup of the group of permutations of $M$ that is isomorphic to $G$. If $H$ is a subgroup of $G$, $H$ acts freely on the right on $G$ by right multiplication, the orbits being the left cosets $gH$, and similarly with left and right interchanged. The group $G$ also acts on itself by conjugation, by means of $\phi(g_1, g_2) = g_1g_2g_1^{-1}$, this action playing a fundamental role in the theory of linear representations of $G$.
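The bijection between the orbit $Gm$ and the cosets $G/G_m$ implies, for a finite group, that the size of an orbit times the size of the stability group equals the order of $G$. The sketch below checks this, together with the notions of Definition A7, for the permutations of $\{0, 1, 2\}$ acting on that set. The tuple representation and all helper names are illustrative assumptions.

```python
from itertools import permutations

def compose(p, q):
    """Product of permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def act(g, m):
    """Left action phi(g, m) of a permutation g on a point m."""
    return g[m]

M = range(3)
IDENTITY = (0, 1, 2)
G = list(permutations(M))                    # all permutations of M
ROTATIONS = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # the cyclic subgroup of order 3

def orbit(group, m):
    return {act(g, m) for g in group}

def stabilizer(group, m):
    return [g for g in group if act(g, m) == m]

def is_effective(group):
    return all(any(act(g, m) != m for m in M) for g in group if g != IDENTITY)

def is_free(group):
    return all(act(g, m) != m for g in group if g != IDENTITY for m in M)

def is_transitive(group):
    return len(orbit(group, 0)) == len(M)

if __name__ == "__main__":
    print(len(orbit(G, 0)), len(stabilizer(G, 0)), len(G))
    print(is_effective(G), is_free(G), is_free(ROTATIONS))
```

The full permutation group acts effectively and transitively but not freely, since a transposition fixes a point; the rotations act freely.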
B. Differentiable Manifolds

Generalized coordinates have long been used in Mechanics for a simpler description of a mechanical system or for incorporating different holonomic constraints. The mathematical structure enabling us to use such
generalized coordinates is that of a differentiable manifold. Let $M$ be a topological space, which will be assumed to be a second-countable Hausdorff space.

Definition B1: An $m$-dimensional chart is a pair $(U, \varphi)$ where $U$ is an open set of $M$, $\varphi(U)$ is an open set of $\mathbb{R}^m$ and $\varphi: U \to \varphi(U)$ is a homeomorphism.

Definition B2: Two $m$-dimensional charts $(U, \varphi)$ and $(V, \psi)$ are $C^\infty$-compatible if either $U \cap V$ is empty or the maps $\psi \circ \varphi^{-1}: \varphi(U \cap V) \to \psi(U \cap V)$ and $\varphi \circ \psi^{-1}: \psi(U \cap V) \to \varphi(U \cap V)$ are diffeomorphisms of the open sets $\varphi(U \cap V)$ and $\psi(U \cap V)$.

Definition B3: An atlas $A$ on $M$ is a family $\{(U_\alpha, \varphi_\alpha) \mid \alpha \in \Gamma\}$ of charts such that $\{U_\alpha \mid \alpha \in \Gamma\}$ covers $M$ and every pair of charts is $C^\infty$-compatible.

A chart that is $C^\infty$-compatible with those of an atlas may be added to it and a larger atlas is then obtained. The procedure may be repeated until a maximal atlas has been obtained.

Definition B4: A differentiable structure on $M$ is a maximal atlas of $m$-dimensional charts. The number $m$ is called the dimension of $M$.

Examples: Interesting examples of differentiable manifold structures are the open submanifolds and the product manifolds: if $S$ is an open subset of $M$, an atlas $A = \{(U_\alpha, \varphi_\alpha) \mid \alpha \in \Gamma\}$ of $M$ gives an atlas on $S$ by considering the open sets $V_\alpha = U_\alpha \cap S$ and the restrictions $\psi_\alpha = \varphi_\alpha|_{V_\alpha}$. On the other hand, if $A = \{(U_\alpha, \varphi_\alpha) \mid \alpha \in \Gamma\}$ and $A' = \{(V_a, \psi_a) \mid a \in \Lambda\}$ are atlases on $M$ and $N$ respectively, the family $\{(U_\alpha \times V_a, \varphi_\alpha \times \psi_a) \mid (\alpha, a) \in \Gamma \times \Lambda\}$ gives an atlas on $M \times N$, and the corresponding maximal $C^\infty$-compatible atlas defines a differentiable structure on $M \times N$ called the product manifold of $(M, A)$ and $(N, A')$.

The charts enable us to locally identify points on $M$ with the corresponding ones in $\mathbb{R}^m$, and similarly for maps between manifolds. The definitions concerning differentiability of such maps are introduced by means of the corresponding concepts for the associated $\mathbb{R}^n$-valued functions.

Definition B5: Let $M$ and $N$ be differentiable manifolds. A map $F: M \to N$ is differentiable at the point $p \in M$ if there are local charts $(U, \varphi)$ at $p$ and $(V, \psi)$ at $F(p)$ with $F(U) \subset V$ such that $\psi \circ F \circ \varphi^{-1}: \varphi(U) \to \psi(V)$ is differentiable. The map $F$ is said to be differentiable if it is so at every point $p$. This definition does not depend on the choice of the local charts because of the $C^\infty$-compatibility
condition for charts. Of course, differentiability implies continuity, as for the corresponding functions $F: \mathbb{R}^m \to \mathbb{R}^n$. A property worthy of note is that if $F: M \to N$ is differentiable at $p \in M$ and $G: N \to P$ is differentiable at $F(p)$, then the composed map $G \circ F$ is differentiable at $p \in M$ too. The set of functions $f: M \to \mathbb{R}$ that are differentiable at $p$ will be denoted $C^\infty(p)$, and it can be endowed with an algebra structure with the usual composition laws.

Definition B6: A bijective differentiable map $F: M \to N$ for which the inverse map $F^{-1}: N \to M$ is differentiable too will be called a diffeomorphism.

A particular example of a diffeomorphism is the map $\varphi$ of a chart $(U, \varphi)$ of $M$, when the structure of an open submanifold is considered on both $U$ and $\varphi(U)$. Another remarkable concept is that of a vector. The traditional concept of a vector at a point of $\mathbb{R}^n$ cannot be generalized directly to a differentiable manifold, but there are other equivalent concepts in $\mathbb{R}^n$ which admit such a generalization. The main idea is to replace the traditional concept of a vector $v$ at a point $p$ of $\mathbb{R}^n$ by that of an equivalence class of curves $x: I_\varepsilon \to \mathbb{R}^n$ (where $I_\varepsilon = (-\varepsilon, \varepsilon)$), two equivalent curves being those such that $x(0) = p$ and $\dot{x}(0) = v$. This equivalence class gives a map $v_p: C^\infty(p) \to \mathbb{R}$ (where $C^\infty(p)$ denotes the set of $C^\infty$-differentiable functions in a neighborhood of $p$) as follows:

$$v_p f = \frac{d}{dt}f(x(t))\Big|_{t=0}.$$

In particular, we may choose as a representative of the class the straight line $x(t) = p + tv$. It can be easily checked that such a map is linear and satisfies the "Leibniz" condition $v_p(fg) = f(p)\,v_p g + g(p)\,v_p f$. Conversely, given a linear map $X_p$ of $C^\infty(p)$ in $\mathbb{R}$ satisfying the Leibniz condition, it is possible to find a curve $x$ whose associated linear map is just $X_p$. The preceding argument leads to the following definition.

Definition B7: A vector at a point $p$ of a differentiable manifold $M$ is a linear map $X_p: C^\infty(p) \to \mathbb{R}$ satisfying the Leibniz condition

$$X_p(fg) = f(p)\,X_p g + g(p)\,X_p f.$$

An example of such a vector is given by the choice of a curve $\gamma: I_\varepsilon \to M$ such that $\gamma(0) = p$. Then $d\gamma/dt|_0$ is the vector

$$\frac{d\gamma}{dt}\Big|_0 f = \frac{d}{dt}(f \circ \gamma)\Big|_{t=0},$$
which can also be presented by choosing a local chart $(V, \varphi)$ at $p \in M$ as follows:

$$\frac{d\gamma}{dt}\Big|_0 f = v(f \circ \varphi^{-1}),$$

where $v$ is the tangent vector at $\varphi(p)$ to the curve $\varphi \circ \gamma$. Then, curves whose images under the map of a local chart are equivalent are also to be considered as equivalent, this property being independent of the particular choice of the local chart. The concept of differential of a map $F: M \to N$ is a generalization of the corresponding concept for a map of $\mathbb{R}^m$ in $\mathbb{R}^n$. The relation of the differential with the derivative of $F$ at the point $p$ along a vector $v$, $\frac{d}{dt}\{F(p + tv)\}_{t=0} = (DF)_p(v)$, suggests that the differential may be considered as a mapping of the vectors at $p$ (the "curve" $p + tv$) into the vectors at $F(p)$ (the curve $F(p + tv)$). This is actually the meaning of the differential introduced as follows.

Definition B8: If $F: M \to N$ is a differentiable map, the differential of $F$ at $p$, to be denoted $F_{*p}$, is the linear map $F_{*p}: T_p(M) \to T_{F(p)}N$ defined by

$$F_{*p}(X_p)f = X_p(f \circ F) \qquad \forall f \in C^\infty(F(p)).$$

With this definition it is easy to check that the chain rule, $(G \circ F)_{*p} = G_{*F(p)} \circ F_{*p}$, holds and therefore, if $F$ is a diffeomorphism, $F_{*p}$ is regular. Consequently, a chart $(V, \varphi)$ at $p \in M$ defines a basis of $T_p(M)$ by

$$\{\varphi^{-1}_{*\varphi(p)}(\partial_i|_{\varphi(p)});\ i = 1, \dots, m\}.$$

On the other hand, a curve $\gamma: I_\varepsilon = (-\varepsilon, \varepsilon) \to M$ gives a map $\gamma_{*t}: T_tI_\varepsilon \to T_{\gamma(t)}M$, and the vector $\gamma_{*t}(d/ds|_{s=t})$ is called the vector tangent to $\gamma$ at the point $\gamma(t)$ and generally denoted $d\gamma/dt$. This makes it possible to see more clearly the geometric meaning of the differential according to the comment preceding Definition B8. In fact, let $X_p$ be a vector in $T_pM$ and $\gamma$ a curve such that $\gamma(0) = p$ and $d\gamma/dt|_0 = X_p$. Then, the curve $F \circ \gamma$ has the tangent vector at $F(p)$ given by

$$\frac{d(F \circ \gamma)}{dt}\Big|_0 f = \frac{d}{dt}(f \circ F \circ \gamma)\Big|_{t=0} = X_p(f \circ F),$$

where $f \in C^\infty(F(p))$. This is just the vector $F_{*p}(X_p)$, according to Definition B8.

Definition B9: A differentiable vector field $X$ on a manifold is a map $X: M \to T(M) = \bigcup_p T_p(M)$ such that $X_p \in T_p(M)$ and, for every $f \in C^\infty(M)$, the function $Xf$, given by $(Xf)(p) = X_pf$, is $C^\infty$-differentiable. In a local chart, $X_p$ is written $X_p = v^i(p)\,\partial_i|_p$, where $\partial_i|_p$ is used as a shorthand for

$$\varphi^{-1}_{*\varphi(p)}(\partial_i|_{\varphi(p)}).$$
Given a vector field $X$ on $M$ and a differentiable map $F: M \to N$, if there is a vector field $Y$ in $N$ such that $Y_{F(p)} = F_{*p}(X_p)$, the vector fields $X$ and $Y$ are said to be $F$-related. In particular, if $F$ is a diffeomorphism, there exists a uniquely defined vector field on $N$ that is $F$-related with $X$, which is usually denoted $F_*(X)$, whose definition is just $[F_*(X)]_{F(p)} = F_{*p}(X_p)$.

Definition B10: The curve $\gamma: I \to M$ is an integral curve of the vector field $X$ in $M$ if $d\gamma/ds|_{s=t} = X_{\gamma(t)}$ for any $t \in I$.

In particular, if $F: M \to N$ is a diffeomorphism, the curve $\gamma$ is an integral curve of $X$ if and only if $F \circ \gamma$ is an integral curve of $F_*(X)$. It is possible to show that, given a vector field $X$, for any $p \in M$ there exists an open neighborhood $V$ of $p$ such that for any $q \in V$ there is a maximal integral curve $\gamma_q$ of $X$ with domain $I(q)$ starting from $q$. Moreover, the map $\Phi_X: I(q) \times V \to M$ given by $\Phi_X(t, q) = \gamma_q(t)$ is differentiable, and if $s$, $t$ and $s + t$ are in the domain $I(p)$ of $p \in M$ and $t \in I(\gamma_p(s))$, then

$$\Phi_X(t, \Phi_X(s, p)) = \Phi_X(t + s, p).$$

The map $\Phi_X$ is called the flow of $X$, and of particular interest is the case of complete vector fields, for which $I(p) = (-\infty, \infty)$ for every $p \in M$. These fields define an action of $\mathbb{R}$ on $M$ by means of the flow $\Phi_X$.
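The composition property of the flow can be observed numerically. The sketch below is an illustration under stated assumptions: the one-dimensional vector field $X = x(1 - x)\,\partial/\partial x$ is a hypothetical example not taken from the text, its flow is approximated with a classical fourth-order Runge-Kutta scheme, and the closed form $q e^t/(1 - q + q e^t)$ of the integral curve through $q$ is used only as a cross-check.

```python
import math

def X(x):
    """Component of the illustrative vector field X = x (1 - x) d/dx."""
    return x * (1.0 - x)

def flow(t, q, steps=1000):
    """Approximate Phi_X(t, q) with fixed-step fourth-order Runge-Kutta."""
    h = t / steps
    x = q
    for _ in range(steps):
        k1 = X(x)
        k2 = X(x + 0.5 * h * k1)
        k3 = X(x + 0.5 * h * k2)
        k4 = X(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

def exact_flow(t, q):
    """Closed-form integral curve of dx/dt = x (1 - x) with x(0) = q."""
    return q * math.exp(t) / (1.0 - q + q * math.exp(t))

if __name__ == "__main__":
    p, s, t = 0.2, 0.5, 0.7
    print(flow(t, flow(s, p)), flow(t + s, p), exact_flow(t + s, p))
```

Flowing for a time $s$ and then for a time $t$ agrees, up to integration error, with flowing once for $t + s$, which is what makes $\Phi_X$ an action of $\mathbb{R}$ when the field is complete.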
C. Lie Algebras and Lie Groups
Definition C1: By an algebra we mean a pair $(\mathcal{A}, \phi)$ in which $\mathcal{A}$ is a vector space and $\phi: \mathcal{A} \times \mathcal{A} \to \mathcal{A}$ is a bilinear map. Sometimes, the notation $a * b$ is used instead of $\phi(a, b)$. The algebra is said to be commutative if $\phi$ is symmetric, $\phi(a, b) = \phi(b, a)$, and associative if $\phi(a, \phi(b, c)) = \phi(\phi(a, b), c)$, which with the $*$-notation becomes $a * (b * c) = (a * b) * c$. The dimension of the algebra is that of the vector space $\mathcal{A}$.

Definition C2: A Lie algebra $(\mathcal{L}, [\ ,\ ])$ is an algebra such that:

(i) $[\ ,\ ]$ is skew-symmetric, i.e. $[a, b] + [b, a] = 0$,
(ii) the Jacobi identity, $[a, [b, c]] + [b, [c, a]] + [c, [a, b]] = 0$, holds.

Examples C1: (i) As a first example of an associative algebra we can consider the set $C^\infty(M)$ of differentiable functions on a differentiable manifold $M$ with the usual product of functions.
(ii) The set End $V$ of the endomorphisms of a vector space $V$ is endowed with an associative algebra structure when the usual composition law is considered.
(iii) End $V$ can also be endowed with a Lie algebra structure by means of the following bilinear map: $[f, g] = f \circ g - g \circ f$. The corresponding Lie algebra structure is usually denoted $\mathfrak{gl}(V)$.
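That the commutator satisfies both conditions of Definition C2 can be checked on explicit matrices. The following is a minimal sketch in exact integer arithmetic; the helper names and the three sample $2 \times 2$ matrices are illustrative assumptions, and endomorphisms of $V = \mathbb{R}^2$ are represented by nested lists.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def bracket(A, B):
    """Commutator [A, B] = AB - BA."""
    return sub(matmul(A, B), matmul(B, A))

def jacobi_defect(A, B, C):
    """[A,[B,C]] + [B,[C,A]] + [C,[A,B]], which should be the zero matrix."""
    return add(add(bracket(A, bracket(B, C)),
                   bracket(B, bracket(C, A))),
               bracket(C, bracket(A, B)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 5]]
C = [[2, 0], [7, -3]]
ZERO = [[0, 0], [0, 0]]

if __name__ == "__main__":
    print(add(bracket(A, B), bracket(B, A)))  # skew-symmetry
    print(jacobi_defect(A, B, C))             # Jacobi identity
```

Both printed matrices should be zero. The matrix product itself is associative but not commutative, which is why the same set carries the two different structures of examples (ii) and (iii).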
Definition C3: A morphism of the algebra $(\mathcal{A}, *)$ in the algebra $(\mathcal{B}, \top)$ is a linear map $\phi: \mathcal{A} \to \mathcal{B}$ such that $\phi(a * b) = \phi(a) \top \phi(b)$. If $\phi$ is a bijective map, then $\phi$ is said to be an isomorphism. When $(\mathcal{A}, *)$ coincides with $(\mathcal{B}, \top)$, the morphisms and isomorphisms are called endomorphisms and automorphisms, respectively.

Definition C4: A derivation of an algebra $(\mathcal{A}, *)$ is a linear map $D: \mathcal{A} \to \mathcal{A}$ such that $D(a * b) = (Da) * b + a * (Db)$.

For instance, if $X$ is a vector field on a differentiable manifold $M$, we can associate with it a derivation (also denoted $X$) of the associative algebra $(C^\infty(M), \cdot)$ by means of $(Xf)(m) = X_mf$, because of the Leibniz rule for vectors at a point. Conversely, given a derivation of the algebra $(C^\infty(M), \cdot)$, there exists a uniquely defined vector field such that the derivation associated with it coincides with the given derivation. In other words, the vector fields on $M$ can be identified with the derivations of the associative algebra $(C^\infty(M), \cdot)$. This fact may be used to see that the set $\mathfrak{X}(M)$, endowed with the commutator of derivations as the inner composition law, is a Lie algebra. Moreover, if $F: M \to N$ is a diffeomorphism, then $F_*: \mathfrak{X}(M) \to \mathfrak{X}(N)$ is a morphism of Lie algebra structures.

Definition C5: A Lie group is a group endowed with a differentiable structure such that the following maps,

$$\theta: G \times G \to G, \quad \theta(g_1, g_2) = g_1 \cdot g_2, \qquad\qquad \psi: G \to G, \quad \psi(g) = g^{-1},$$

are differentiable. For each $g \in G$, $L_g: G \to G$ and $R_g: G \to G$ will denote the left and right translations respectively, defined by $L_g(g') = g \cdot g'$, $R_g(g') = g' \cdot g$. Such maps are diffeomorphisms. A particularly interesting example of a Lie group is the group $GL(\mathbb{R}, n)$ of the invertible square $n \times n$ real matrices. Its dimension is $n^2$.

Definition C6: A Lie subgroup $H$ of a Lie group $G$ is a subgroup of $G$ that is a submanifold too and a Lie group with respect to its differentiable structure.

A very useful mechanism for defining Lie groups is the following: let $G$ be a Lie group and assume that $F: G \to M$ is a differentiable map of constant rank. If $m \in M$ is such that $F^{-1}(m)$ is a group, then it is endowed with a regular
manifold structure and is a Lie subgroup. As an instance, if F : GL(R,n) -+ R is given by F ( A ) = det A , then F-'(l) is the Lie subgroup denoted SL(R, n). In a similar way, if F : GL(R,n) -+ GL(R,n) is given by F ( A ) = A'A, then F-'(l) is the Lie subgroup O ( R ,n), called the orthogonal group. Definition C7: A homomorphism of Lie groups is a homomorphism F : GI + G2 that is a differentiable map too. The rank of F is constant and therefore its kernel is a Lie subgroup. Definition C8: Let G be a Lie group. A vector field X E 9 ( G )is said to be leftinvariant (respectively right-invariant) if L,.X = X , (resp. R,,X = X ) Vg E G , i.e. X,. . g = Lg.,,Xg(resp. X,.., = Rg.Xg,). Dqfinirion C9: The set of left-invariant vector fields on G is a finitedimensional Lie subalgebra of Y(G),to be denoted 93, called the Lie algebra of G. Its dimension coincides with that of G. A left-invariant vector field in G is determined by its value in the identity element e E G and the Lie algebra structure of 93 may be translated to an isomorphic Lie structure on T,G. Given a basis of '3, the m3 real numbers cijk determined by
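$$[X_i, X_j] = \sum_{k=1}^{m} c_{ij}{}^{k} X_k$$
are called the structure constants of $\mathcal{G}$ with respect to the basis $\{X_i\}$ ($i = 1, \ldots, m$) and satisfy the relations
$$c_{ij}{}^{k} + c_{ji}{}^{k} = 0, \qquad \sum_{s=1}^{m} \left( c_{ij}{}^{s} c_{sk}{}^{t} + c_{jk}{}^{s} c_{si}{}^{t} + c_{ki}{}^{s} c_{sj}{}^{t} \right) = 0 \qquad (i, j, k, t = 1, \ldots, m).$$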
Example C2: A differentiable structure is given in $GL(R, n)$ with an atlas of just one chart, the map $\varphi$ being
$$\varphi(A) = (a_{11}, \ldots, a_{1n}, a_{21}, \ldots, a_{2n}, \ldots, a_{n1}, \ldots, a_{nn}),$$
and the coordinate functions will therefore be $x_{ij}(A) = a_{ij}$. The left-invariant vector field $X_{ij}$ determined by its value $X_{ij,e} = \partial/\partial x_{ij}|_e$ is shown to be given by
$$X_{ij} = \sum_{r=1}^{n} x_{ri} \frac{\partial}{\partial x_{rj}},$$
and with respect to this basis of $gl(R, n)$ the defining relations turn out to be
$$[X_{ij}, X_{kl}] = \delta_{jk} X_{il} - \delta_{il} X_{kj}.$$
This shows that this Lie algebra is isomorphic to the one given in the set $M_n(R)$ of real square $n \times n$ matrices by means of the commutator. In fact, a basis of $M_n(R)$ is made up of the matrices $E_{ij}$ whose only nonvanishing entry is a one in the site $(i, j)$, i.e. $[E_{ij}]_{kl} = \delta_{ik}\delta_{jl}$. For such matrices it is an easy task to compute the relation $[E_{ij}, E_{kl}] = \delta_{jk}E_{il} - \delta_{il}E_{kj}$, the correspondence
$$A \to X_A = \sum_{i,j,r=1}^{n} a_{ij}\, x_{ri} \frac{\partial}{\partial x_{rj}}$$
being the above mentioned isomorphism.
Definition C10: We shall call one-parameter Lie subgroup of a Lie group $G$ a homomorphism of Lie groups $\alpha: (R, +) \to (G, \cdot)$.
Let us remark that $\alpha(0) = e$ and the value of $(d\alpha/dt)|_{t=0}$ defines a left-invariant vector field $X$. The point is that $X$ is complete and the value of $X$ at a point $\alpha(t)$ is the tangent vector of the curve $\alpha(t)$ at this point. The vector field so defined is called the infinitesimal generator of the subgroup. The converse is also true: every $X \in \mathcal{G}$ is complete and its integral curve starting at $e \in G$ defines a one-parameter subgroup of $G$. The point of such a curve reached for the value $t = 1$ is an element of $G$ usually denoted $\exp X$. The reason for this is that $\exp(t_1 + t_2)X = \exp t_1X \, \exp t_2X$ and that when computed for the group $GL(R, n)$ the matrix $e^A$ is $\exp X_A$, where $X_A$ is the element of $gl(R, n)$ corresponding to $A$ in the aforementioned isomorphism.
The Lie algebra $\mathcal{G}$, being a finite-dimensional vector space, can be endowed in a natural way with a differentiable structure. The point now is that it can be shown that the exp map, $\exp: \mathcal{G} \to G$, is differentiable, and moreover, that there will be neighborhoods $\mathcal{U}$ of $X = 0$ in $\mathcal{G}$ and $V$ of $e$ in $G$ such that the restriction $\exp|_{\mathcal{U}}$ is a diffeomorphism of $\mathcal{U}$ onto $V$. For every isomorphism $\varphi: \mathcal{G} \to R^m$, the inverse map of $\exp|_{\mathcal{U}}$, $\log: V \to \mathcal{U}$, enables us to define a local chart $(V, \varphi \circ \log)$ at $e \in G$. The corresponding coordinates are called first kind canonical coordinates. In particular, when the isomorphism $\varphi$ is given by the choice of a basis in $\mathcal{G}$, the coordinates so obtained are called first kind canonical coordinates with respect to the basis. For instance, if $\{X_i\}$ ($i = 1, \ldots, m$) is a basis of $\mathcal{G}$ and $(u^1, \ldots, u^m)$ are the first kind canonical coordinates of $g$, then $g = \exp\{u^1X_1 + \cdots + u^mX_m\}$. With these coordinates, the parameter space for a one-parameter subgroup is a straight line.
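As a numerical illustration of two statements made above, the following sketch checks the matrix-unit commutator in the form $[E_{ij}, E_{kl}] = \delta_{jk}E_{il} - \delta_{il}E_{kj}$ and the one-parameter property $\exp(t_1 + t_2)A = \exp t_1A \, \exp t_2A$ for a matrix $A$. It is only a sketch under stated assumptions: the helper names (`matmul`, `unit`, `expm`, and so on) are hypothetical, the exponential is approximated by a truncated power series, and the chosen generator of plane rotations is just a convenient test case.

```python
import math

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(A, B, b=1.0):
    """Entrywise A + b*B."""
    return [[x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    """Entrywise s*A."""
    return [[s * x for x in row] for row in A]

def unit(n, i, j):
    """Matrix unit E_ij: a single one at site (i, j)."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(n)]
            for r in range(n)]

def commutator(A, B):
    return combine(matmul(A, B), matmul(B, A), -1.0)

def relation_holds(n):
    """Check [E_ij, E_kl] = d_jk E_il - d_il E_kj for all index choices."""
    idx = range(n)
    for i in idx:
        for j in idx:
            for k in idx:
                for l in idx:
                    lhs = commutator(unit(n, i, j), unit(n, k, l))
                    rhs = combine(scale(unit(n, i, l), float(j == k)),
                                  scale(unit(n, k, j), float(i == l)), -1.0)
                    if lhs != rhs:
                        return False
    return True

def expm(A, terms=30):
    """Matrix exponential from a truncated power series (small ||A|| only)."""
    n = len(A)
    result = [[float(r == c) for c in range(n)] for r in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(matmul(term, A), 1.0 / k)   # term = A^k / k!
        result = combine(result, term)
    return result

def max_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

A = [[0.0, 1.0], [-1.0, 0.0]]        # assumed test case: generator of rotations
t1, t2 = 0.3, 0.9
lhs = expm(scale(A, t1 + t2))
rhs = matmul(expm(scale(A, t1)), expm(scale(A, t2)))
print(relation_holds(3), max_diff(lhs, rhs))
```

For this particular $A$ one has $A^2 = -1$, so $\exp tA$ is the rotation matrix with entries $\cos t$ and $\pm\sin t$, which gives an independent check of the series.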
It is also possible to show that there exists a neighborhood $V$ of $e \in G$ such that its elements can be written in a unique way as a product $\exp u^1X_1 \, \exp u^2X_2 \cdots \exp u^mX_m$. The numbers $(u^1, \ldots, u^m)$ are then called second kind canonical coordinates.
Another remarkable property is the following: if $F: G \to G'$ is a homomorphism of Lie groups (not necessarily a diffeomorphism) there exists a map
$F_*: \mathcal{G} \to \mathcal{G}'$ that is a homomorphism of Lie algebras and satisfies $\exp \circ F_* = F \circ \exp$. The map $F_*$ is defined as follows: $F_*(X)$ is the left-invariant vector field in $G'$ whose value in $e' \in G'$ is $F_{*e}(X_e)$. From the property $\exp \circ F_* = F \circ \exp$ we can also see that if $F$ and $F'$ are Lie group homomorphisms of $G$ into $G'$ such that $F_* = F'_*$, then $F$ and $F'$ coincide on the connected component of the identity $e$ of $G$. Moreover, if $H$ is a Lie subgroup of $G$, and $i$ denotes the canonical injection $i: H \to G$, then $i_*$ is an injective map $i_*: \mathcal{H} \to \mathcal{G}$ picking out a subalgebra of $\mathcal{G}$ that corresponds to $H$.
If $M$ is a differentiable manifold and $G$ a Lie group, by a differentiable left-action of $G$ on $M$ we will mean that the map $\Phi: G \times M \to M$ defining the action of $G$ on $M$ is differentiable when the product differentiable structure is considered in $G \times M$. Given such an action, for every element $a \in \mathcal{G}$ we can define the fundamental vector field $X_a \in \mathfrak{X}(M)$ by means of
$$(X_a f)(m) = \frac{d}{dt} f\big(\Phi(\exp(-ta), m)\big)\Big|_{t=0},$$
i.e. $X_a$ is the vector field defined by $X_a(m) = -\Phi_{m*e}(a)$, where $\Phi_m: G \to M$ is the differentiable map $\Phi_m(g) = \Phi(g, m)$. The minus sign in the above expression is conventional and was just added to make the map $X: \mathcal{G} \to \mathfrak{X}(M)$, $X: a \to X_a$, a Lie algebra homomorphism, i.e. $[X_a, X_b] = X_{[a,b]}$, the kernel of such a homomorphism being the subalgebra associated with the ineffectiveness subgroup.
Now, if $H$ is a Lie subgroup of a Lie group $G$, the set $G/H$ of the left cosets admits a uniquely defined differentiable structure such that the natural projection is differentiable and admits local sections. Moreover, if $G$ acts transitively on the differentiable manifold $M$ (or in other words $M$ is a homogeneous space for $G$) and $H$ is the isotropy subgroup of an arbitrary but fixed point $m_0 \in M$, there will exist a diffeomorphism $\beta: G/H \to M$ equivariant with respect to the actions of $G$ on $M$ and the natural left-action of $G$ on $G/H$ by left translations.
REFERENCES Aldersley, S. J. (1 977). “Dimensional Analysis in relativistic gravitational theories”, Phys. Reo. D 15, 370. Asorey, M., Carifiena J. F. and del Olmo, M. A. (1983). “Vector bundle representations in Quantum Physics”, J . Phys. A 16, 1603. Bacry, H. and Levy-Leblond, J. M. (1968). “Possible kinematics”, J. Math. Phys. 9, 1605. Barenblatt, G . I. (1981). “Selfsimilarity: Dimensional Analysis and Intermediate Asymptotics”, J . Appl. Math. 44, 267. Barenblatt, G . I. and Zeldovich, Ya. B. (1972). “Selfsimilar Solutions as Intermediate Asymptotics”, in Annual Rev. Fluid. Mech., p. 285, Annual Rev. Inc., Palo Alto.
256
JOSE F. CARIRENA AND MARIAN0 SANTANDER
BirkhofT, G. (1950).Hydrodynamics: A study in logic, fact and similitude, Princeton U. P. Bluman, G. W. and Cole, J. D. (1974). Similarity Methods for Diflerential Equations, Springer, Berlin. Boer, J. de (1979).“Group properties of quantities and units”, Am. J . Phys. 47, 818. Bourbaki, N. (1945). Topologie yenerale, IV, p. 214; V p. 12; Hermann, Paris. Boyer, C. P. (1976).“Symmetries and Exterior Differential Forms” in Proc. of the Int. Symposium on Math. Physics, Mexico D.F. Brand, L. (1957).“The Pi-Theorem of Dimensional Analysis”, Archiu. Rat. Mech. Anal. 1.35. Bridgmann, P. W. (1922, 1931, 1932). Dimensional Analysis, 1” Ed, Yale U. P. (1932 ed. New Haven, Yale U. P.; Rev. ed. Dover, 1952). Bridgman, P. W. and Sedov, L. I. (19.57). Dimensional Analysis, Encyclopaedia Britannica, Macropaedia, 14,422. Buckingham, E. (1914). “On physically similar systems; illustrations of the use of dimensional equations”, Phys. Rev. 4, 345. Bunge, M. (1971).“A Mathematical theory of dimensions and units of physical quantities”. In Problems in the Foundations of Physics, p. 16, Springer, Berlin. Carifiena, J. F., del Olmo, M. and Santander, M. (1981). “Kinematic groups and dimensional analysis”, J . Phys. A 14, 1. Cariiiena, J. F.,del Olmo, M. and Santander, M. (1985).”A new look at dimensional analysis from a group theoretical viewpoint”, J . Phys. A 18, 1855. Carlson, D. E. (1978).“On some new results in Dimensional Analysis”, Archiu. Rat. Mech Anal. 68, 191. Casimir, H. B. G. (1968).”Helvetica Phys. Acta”, 41, 741; reprinted in A Random Walk in Science, eds. R. L. Weber and E. Mendoza, London, 1973. Causey, R. L. (1969). “Derived measurement, Dimensions and Dimensional Analysis”, Phil. Sci. 36,252. Cohen E. R., Crowe, K. M. and Dumond, J. W. (1957). The fundamental Constants of Physics, New York. Corrsin, S. (1951).“A simple proof of Buckingham’s n-theorem”, Am. J . Phys. 19, 180. Curtis W. D., Logan, J. D. and Parker, W. A. 
(1982).“Dimensional Analysis and the Pi-Theorem”, Linear Aly. and Its Appl. 47, I 17. Drobot, S. (1953).“On the Foundations of Dimensional Analysis”, Studia Math. 14,84. Duncan, W. J. (1955). Physical Similarity and Dimensional Analysis, E. Arnold & Co, London. Einstein, A. (1911). Ann. Phys. Leipzig 35, 686. Evans, J. H. (1972). “Dimensional Analysis and the Buckingham II-Theorem”, Am. J . Phys. 40, 1815. Fleichsmann, R. (1951).Z . f.Phys. 129,377. Fleichsmann, R. (1954). Naturwiss. 41, 131; Z . f. Phys. 138, 301. Focken, C. M. (19.53).Dimensional Methods and their Applications, E. Arnold, London. Gibbings, J. C. (1980).“On Dimensional Analysis”, J . Phys. A 13, 75. Gibbings, J. C. (1982).“A logic of Dimensional Analysis”, J . Phys. A 15, 1991. Guissard, A. (1972).“Electrical Units and Electromagnetic field vectors”, IEEE Trans. on Educ. E 15,41. Hainzl, J. (1971). “On local generalizations of the n-Theorem of Dimensional Analysis”, J . Franklin Inst. 292,463. Hamermesh, M. (1983).“The symmetry group of a Differential Equation” in Group Theoretical Methods in Physics, Lecture Notes in Phys., p. 201, G. Denardo et al. eds. Hansen, A. G. (1964). Similarity Analysis of Boundary Value Problems in Enyineering, Prentice Hall. Helgason, S . (1977). “Invariant differential equations on homogeneous manifolds”, Bull. A.M.S. 83, 751.
DIMENSIONAL ANALYSIS
257
Houard, J. C. (1981). “Sur la description intrinseque des grandeurs dimensionelles”, Ann., fnst. H . PoincarP XXXV, 225. H u h , M. (1980). “Dimensional Analysis: some suggestions for the modification and generalization of its use in Physics teaching”, Eur. J . Phys. 1.48. Huntley, H. E. (1952). Dimensional Analysis, Dover, New York. Inonu, E. and Wigner, E. P. (1953). “On the contraction of groups and their representations”, Proc. Nat. Acad. Sci 39, 510. Ipsen, D. C. (1960). Units, Dimensions and Dimensionless Numbers, McGraw Hill, New York. Jackson, J. D. (1975). Classical Electrodynamics, John Wiley, New York. Kline, S. J. (1965). Similitude and Approximation Theory, McGraw Hill, New York. Krantz, D. H., Luce, R. D., Suppes, P. and Tversky, A. (1971). Foundations of Measurements, I, Academic Press. Kurth, R. (1965). “A Note on Dimensional Analysis”. Am. Math. Monthly 72,965. Kurth, R. (1972). Dimensional Analysis and group theory in astrophysics, Pergamonn Press. Landolt, M. (1952).Grosse, Masszahl und Einheit (Rasher Verlag, Zurich). Langhaar, H. L. (1951).“Dimensional Analysis and the theory of models”, J. Wiley, New York. Leroy, 9. ( 1984).“Conversion of electromagnetic quantities from M.K.S.A. to Gaussian units (and vice versa) using dimensional analysis”, Am. J . Phys. 52, 230. Levy-Leblond, J. M. (1969). “Group theoretical foundations of the classical mechanics: The Lagrangian gauge problem”, Comm. Math. Phys. 12,64. Levy-Leblond, J. M. (1972). “Galilei Group and Galilei Invariance” in Group Theory and Its Applications, 2, ed. E. M. Loebl, Academic Press, New York. Levy-Leblond, J. M. (1977). “On the conceptual nature of the Physical constants”, Riu. Nuovo Cim. 7, 187. Levy-Leblond, J. M. (1980). “Speed(s)” Am. J . Phys. 48, 345. Ma. S. K . (1973). Introduction to Renormalization Group, Rev. Mod. Phys. 45,589. Ma, S . K. (1976). Modern Theory of Critical Phenomena, Benjamin, London. Macagno, E. 0. (1971). 
“Historic0 critical review of Dimensional Analysis”, J . Franklin fnst. 292, 39 1. Manin, Yu. 1. (1981). Mathematics and Physics, Progress in Physics, 3, Birkhauser, Boston. Martinez Alonso, L. (1977).“Group theoretical foundations of classical and quantum mechanics. 1. Observables associated with Lie algebras”, J . Math. Phys. 18, 1577. Martins, R. D. A. (1981).“The origin of Dimensional Analysis”, J . Franklin fnst. 311, 331. Massey, 9. S. (1971). Units, dimensional analysis and physical similarity, Van Nostrand Reinhold, London. Meinhardt, J. (R). (1981). “Symmetries and Differential Equations”, J . Phys. A 14, 1893. Michal, A. D. (1951).“Invariant differential forms in several group variables as solutions of partial differential equations” in Frechet DifJerentials. Proc. N.A.S. 37, 766. Morgan A. J. A. (1952).“The reduction by one of the number of independent variables in some systems of partial differential equations”, Quart. J . Math. Oxford 3, 250. Murphy, G . (1950).Similitude in Engineering, Ronald Press, New York. Newton, 1. (1 686). Philosophiae Naturalis Principia Mathematica, U. California Press, 1962. Olver, P. J. (1979). “How to find the Symmetry Group of a Differential Equation” in Appendix to D. H. Sattinger, Group Theoretical Methods in EiJurcation Theory, Lecture Notes in Maths. No. 762, Springer, Berlin. Olver. P. J. (1986). Application of Lie groups to Diflerential Equations, Springer, Berlin. Ovsiannikov, L. V. (1982). Group Analysis of Dijferential Equations, Academic Press, New York. Palacios, J. (1964). Dimensional Analysis, Macmillan, London. Pankhurst. R. C. (1964). Dimensional Analysis and Scale Factors, Chapman Hall. Petley, 9. W. (1985). The Fundamental Constants and the Frontieraf Measurement, Adam Hilger. Bristol.
258
JOSk F. CARIRENA AND MARIAN0 SANTANDER
Quade, W. (1961). Abh. Braunsch. Wiss. Gess. 13,24. Rayleigh, (1915). Nature 96,66 and 644. Remillard, W. J. (1983). “Applying Dimensional Analysis”, Am. J . Phys. 51, 137. Riabouchinski, (1915). Nature 96,591. Rothman, T. (1982). “The short life of Evariste Galois”, Sci. Am. 246, 112. Saint Guilhem, R. (1971). Lesprincipes generaux de la simillitudephysique, Gauthier Villars, Paris. San Juan, R. (1947). Teoria de las magnitudes fisicas y de sus fundamentos algebraicos, C. Bermejo, Madrid. Also published in Reoista de la Real Academia de Ciencias 39, Madrid, 1947. Sayegh, S. I. and Jones, G. L. (1986). ‘Symmetries of Differential Equations”, J . Phys. A 19,1793. Sedov, L. I.(1959). Similarity and Dimensional Methods in Mechanics, Academic Press, New York. Stille, U. (1961). Messen und Rechnen in der Physik, Vieweg, Braunschweig. Stanley, H. E. (1971). Introduction to Phase Transitions and Critical Phenomena, Clarendon, Oxford. Stevenson, P. M. (1981). Dimensional Analysis in Field Theory, Ann. Phys. 132, 383. Supplee, J. M. (1985). “Systems of equations versus extended reference sets in dimensional analysis”, Am. 1.Phys. 53, 549. Szekeres, P. (1978). ‘The mathematical foundations of Dimensional Analysis and the question of fundamental units”, Int. J. Theor. Phys. 17, 957. Taylor, G. I. (1946). “The air wave surrounding an expanding sphere”, Proc. Roy. SOC.London A, 273.
Taylor, E. F. and Wheeler, J. A. (1966). Space-time Physics, Freeman, San Francisco. Vaschy, A. (1892). Ann. Telegraphiques 19,25 and 180. Wigner, E. P. (1967). Symmetries and rejections, Indiana University Press, Bloomigton. Withney, H. (1968). “The Mathematics of Physical Quantities”, Am. Math. Monthly 75, 115 and 227.
Wolsky, A. M. (1971). “The scales of length and Time in Classical and Modern Physics”, Am. J . Phys. 39, 529. Zassenhaus, H. (1954). “What is an angle?”, Am. Math. Monthly 61, 369.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 72
Lattice Quantization
JERRY D. GIBSON
Department of Electrical Engineering, Texas A&M University, College Station, TX
and
KHALID SAYOOD
Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, NE
I. Introduction
II. Scalar Quantization
III. Definitions and Motivation for Optimal Vector Quantization
IV. Motivation for Lattice Quantization
V. Lattices
VI. Lattice Quantizer Design
VII. Fast Quantization Algorithms
VIII. Performance Comparisons
IX. Research Areas and Connections to Other Fields
X. Conclusions
Acknowledgment
Notes
References
I. INTRODUCTION
We are all familiar with the process of analog-to-digital (A-to-D) conversion, whereby continuous-time, continuous-amplitude signals are converted into a sequence of binary words suitable for storage or for manipulation in digital form. The process of A-to-D conversion consists of three distinct operations: sampling, quantization, and coding. When we sample a signal, we represent the continuous-time signal by a set of sample values taken at discrete time instants, and, as long as these samples are taken at a uniform rate greater than the Nyquist rate, there is no appreciable loss in fidelity as a result of this sampling operation. Quantization generates a
discrete-amplitude representation of the continuous-amplitude sample values, but unlike sampling, quantization produces a non-recoverable loss in fidelity. The combined operations of sampling and quantization generate a sequence of discrete-time, discrete-amplitude values, and this sequence is changed into digital form by coding, which assigns a distinct binary word to each allowable discrete-amplitude level at the quantizer output.
A standard A-to-D converter is an example of a uniform, scalar quantizer, and if the fidelity of the digitized output is not adequate, we simply select another A-to-D converter with more bits (quantization levels). When we do this, we are increasing the number of bits/sample required to store the digitized sequence or to transmit the sequence over a communications link. For two common and important sources, speech and images, the number of bits/sample (called the rate and denoted by $R$) required by straightforward A-to-D conversion to achieve acceptable fidelity can be excessive for many applications. As a result, a research area called data compression has emerged, which has as its goal the representation of a source sample with as few bits as possible while still maintaining adequate fidelity for the particular application at hand. Familiar examples of data compression systems are delta modulation (DM), logarithmic pulse code modulation (log-PCM), and differential PCM (DPCM). See Jayant and Noll (1984) for details concerning these systems.
Vector quantization is a relatively new data compression technique which has been and continues to be the subject of intense research interest and which also is beginning to find applications in practical systems. The fundamentally different characteristic of vector quantization, as opposed to scalar quantization where each scalar sample is quantized individually, is that a block or vector of scalar quantities is formed and this vector is quantized as a single entity.
The reader can perhaps imagine that this approach could prove useful if the scalar components of the vector are dependent or correlated, but it may be surprising to note that quantizing and coding of blocks (or vectors) always yields better theoretical performance than scalar quantization, even if the vector components (the scalars) are uncorrelated or independent. This last fact is a result from a branch of information theory called rate distortion theory, originally delineated by Claude Shannon (1948; 1959).
Research in vector quantization has been pursued vigorously only within the last ten years, primarily for the following three reasons: (1) although Shannon's results provide bounds on the performance of optimal data compression systems, they do not provide guidance as to how vector quantizers might be designed; (2) early rate distortion theory results were primarily concerned with Gaussian sources, and optimal block coding of these sources only offers an asymptotic gain of about 0.25 bit/sample over scalar quantization followed by optimal noiseless coding, which did not provide sufficient motivation for further investigations by researchers; and (3) block coding or vector quantization/coding requires operations in multidimensional space, which is not only mathematically more difficult than scalar quantization, but also implies substantially increased complexity over a scalar approach.
Recent research has uncovered various vector quantizer design techniques, all of which are based upon one of two approaches: the iterative design procedure often called the Linde, Buzo, Gray (LBG) algorithm (Linde, Buzo and Gray, 1980) or the specification of uniform quantizers by using lattices. The former approach generates a locally optimal vector quantizer design, but the quantization/encoding problem may be formidable. The lattice-based approach can greatly simplify the quantization operation, but the resulting quantizers are only optimal for uniformly distributed sources or asymptotically optimal as the number of output points becomes large. Nevertheless, experimental results using the various vector quantizer designs have indicated that substantial performance improvements are available with vector quantizers, and these results, coupled with further rate distortion theoretic results and studies of the asymptotic performance of vector quantizers, have combined to intensify research and development efforts concerning vector quantization.
The present chapter attempts to introduce the concept of vector quantization and to describe how lattices can be used to advantage in the vector quantization process. Several excellent survey/tutorial articles have previously appeared in the literature (Gersho and Cuperman, 1983; Gray, 1984; Makhoul, Roucos, and Gish, 1985), and these papers are highly recommended. The development in this chapter differs from these papers in that their emphasis is on the LBG algorithm (see Gersho and Cuperman, 1983; Gray, 1984) while we are concerned almost wholly with the use of lattices in vector quantization.
Additionally, these articles were written with the goal of minimizing the number of equations in their presentation in order to reach a wider audience. There is some overlap with Makhoul, Roucos and Gish (1985), particularly on the topics of rate distortion theoretic and asymptotic performance results. Other perspectives on vector quantization are available in the tutorial/survey chapters written by Gersho (1986), Swaszek (1986), and Adoul (1987).
We begin our presentation with discussions of scalar quantization and vector quantization in Sections II and III, followed by reasons for the consideration of lattice-based vector quantizers in Section IV. Section V defines the various lattices of interest and develops their important properties, while Section VI illustrates the application of these lattices to designing vector quantizers. The utility of lattices for devising fast quantization algorithms is demonstrated in Section VII. Performance comparisons among scalar quantizers and the best known vector quantizers are presented in Section VIII, including theoretical results, experimental results on synthetic sources, and
experimental results for speech and images. Current research areas and variations of lattice quantizers are developed in Section IX, followed by a few summary thoughts and conclusions in Section X.
II. SCALAR QUANTIZATION
A scalar quantizer is a quantizer that discretizes only a single input sample at a time. An $L$-level scalar quantizer $Q(x)$ is determined by specifying $L + 1$ values $x_0 < x_1 < \cdots < x_L$, called step points or decision levels, that partition the real line $\mathbb{R}$, and a set of $L$ output points $y_1, y_2, \ldots, y_L$, such that if the input sample $x$ satisfies $x_{i-1} \le x < x_i$, then $Q(x) = y_i$. A typical quantizer input-output characteristic for $L$ even is shown in Fig. 1(a), which can be equivalently represented by the one-dimensional diagram in Fig. 1(b), where the hash marks are step points and the dots are output "levels" or points. Although the quantizer representation in Fig. 1(b) is not as familiar as the one in Fig. 1(a), the Fig. 1(b) diagram generalizes easily to two dimensions.
For $L$ even, a symmetric, uniform $L$-level quantizer has step points $0, \pm\Delta, \pm 2\Delta, \ldots, \pm(L/2 - 1)\Delta$ with $x_0 = -\infty$ and $x_L = +\infty$, and output points $\pm\Delta/2, \pm 3\Delta/2, \ldots, \pm(L - 1)\Delta/2$, where $\Delta$ is called the step size. An $L$-level, symmetric, nonuniform scalar quantizer has step points $0, \pm\eta_i\Delta$, $i = 1, 2, \ldots, (L/2) - 1$, with $x_0 = -\infty$ and $x_L = +\infty$, and output points $\pm\xi_j\Delta$, $j = 1, 2, \ldots, L/2$, where the constants $\{\eta_i\}$ and $\{\xi_j\}$ can be selected to yield the desired quantizer characteristic (Jayant and Noll, 1984).
For the design or performance analysis of a quantizer, the input samples are regarded as sequences of random variables, since the quantizer has no way of knowing exactly what the next sample will be. If the input signal or source samples are independent and identically distributed, each with the probability density function (pdf) $f_x(x)$, and if the chosen error measure is $g[\hat{x} - x]$, then the average distortion can be expressed as
$$D = \int_{-\infty}^{\infty} g[\hat{x} - x]\, f_x(x)\, dx, \tag{1}$$
where $\hat{x}$ denotes the quantizer output for input value $x$. Limiting consideration to the squared error distortion measure, $g[\hat{x} - x] = (\hat{x} - x)^2$, we can write the distortion in terms of the step points and output levels as
$$D = \sum_{i=1}^{L} \int_{x_{i-1}}^{x_i} [y_i - x]^2 f_x(x)\, dx. \tag{2}$$
FIG. 1(a). Symmetric midriser quantizer ($L$ even), output versus input. (b). One-dimensional quantizer characteristic.
The entropy of the quantizer output is given by
$$H(Y) = -\sum_{i=1}^{L} p_i \log_2 p_i, \tag{3}$$
where $p_i = P[x_{i-1} \le x < x_i]$. Since the entropy of a discrete random variable is the minimum rate in bits/sample required to represent that variable for storage or transmission applications, $D$ in Eq. (2) and $H(Y)$ in Eq. (3) specify the (minimum) rate necessary to achieve a distortion $D$ with the chosen quantizer.
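The quantities above can be made concrete with a short numerical sketch. It builds the step points and output points of a symmetric uniform midriser quantizer and estimates the squared-error distortion and the output entropy $-\sum_i p_i \log_2 p_i$ by a midpoint rule. The function names, the choice of a source uniformly distributed on $[-1, 1]$, and the integration grid are all assumptions made for illustration; for that source with $L = 4$ and $\Delta = 0.5$ the distortion should be close to the familiar $\Delta^2/12$ and the entropy close to $\log_2 L$.

```python
import bisect
import math

def uniform_midriser(L, delta):
    """Step points x_0..x_L and output points y_1..y_L for L even."""
    half = L // 2
    steps = ([-math.inf]
             + [k * delta for k in range(-(half - 1), half)]
             + [math.inf])
    outputs = [(2 * k + 1) * delta / 2 for k in range(-half, half)]
    return steps, outputs

def quantize(x, steps, outputs):
    """Return y_i for the cell x_{i-1} <= x < x_i."""
    return outputs[bisect.bisect_right(steps, x) - 1]

def distortion_and_entropy(steps, outputs, pdf, a, b, n=40000):
    """Midpoint-rule estimates of D and H(Y) for a pdf supported on [a, b]."""
    h = (b - a) / n
    D = 0.0
    p = [0.0] * len(outputs)
    for k in range(n):
        x = a + (k + 0.5) * h
        i = bisect.bisect_right(steps, x) - 1   # index of the cell holding x
        w = pdf(x) * h                          # probability mass of this slice
        D += (outputs[i] - x) ** 2 * w
        p[i] += w
    H = -sum(q * math.log2(q) for q in p if q > 0.0)
    return D, H

# Assumed example: source uniform on [-1, 1], L = 4 levels, step size 0.5.
steps, outputs = uniform_midriser(4, 0.5)
D, H = distortion_and_entropy(steps, outputs, lambda x: 0.5, -1.0, 1.0)
print(D, 0.5 ** 2 / 12, H)
```

Replacing the `pdf` argument by another density on a finite interval gives the corresponding estimates for a mismatched source, at the cost of the simple closed-form check.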
Assigning codewords to the quantizer output levels is an example of what is called noiseless source coding, which is so named because the coding process is invertible. For an $L$-level quantizer, with $L$ an integer power of 2, the simplest approach to choosing codewords is to assign a binary word of length $b = \log_2 L$ to each quantization level. If $L$ is odd or $L \ne 2^b$, $\log_2 L$ is no longer an integer and the procedure must be modified. For example, if we have a three-level quantizer, we could assign a two-digit binary word to each of the three quantization levels, thus yielding a rate of 2 bits/sample, but this would leave one binary word, or 25% of the codewords, unused. Another possibility would be to group the quantizer outputs into groups of three ternary symbols which are then mapped into five-digit binary words, which is a rate 5/3 bit/sample source code. This latter code has less than 16% of the codewords unused, and is clearly more efficient than the former 2 bits/sample code. Other such codes could be pursued, but since the minimum rate required by a fixed-length code to represent three levels is $\log_2 3 = 1.585$ bits/sample, there is less than 0.1 bit/sample yet to be gained with this approach for the three-level quantizer.
If the probabilities of the quantizer output levels are known or can be estimated, an entropy coding technique such as Huffman coding can be used. This method assigns short codewords to highly probable levels and longer codewords to less probable levels, thus yielding a short average codeword length, $\bar{n}$. Since this procedure maps fixed-length blocks into variable-length binary words, buffering is necessary at both the transmitter and receiver. It is important to note that grouping quantizer output levels into blocks before using the Huffman procedure can produce a smaller $\bar{n}$, since the average codeword length $\bar{n}$ of the Huffman code satisfies
$$H(Y) \le \bar{n} < H(Y) + \frac{1}{M} \tag{4}$$
for a block of $M$ quantizer output levels. Thus, coding pairs of output levels ($M = 2$) is at least as good as coding one level or sample ($M = 1$) at a time, and for large $M$, $\bar{n}$ can be made arbitrarily close to $H(Y)$. How large $M$ has to be for efficient encoding depends upon the probabilities of the particular quantizer output levels of interest.
Other details concerning methods for scalar quantizer design and the various approaches to coding, that is, assigning binary words to the quantizer output points, can be found in Jayant and Noll (1984) and Gallager (1968). However, a few observations are in order. First, designing a scalar quantizer consists of partitioning the real line into a finite set of disjoint and exhaustive intervals and assigning a single output value to each interval. Second, efficient coding methods require that the output points be collected in groups or blocks, and in order to approach the minimum possible rate, delay and/or complexity may be significant.
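The two coding approaches discussed in this section can be sketched in a few lines. The first function reproduces the fixed-length rates quoted for the three-level quantizer (2 bits/sample for single outputs, 5/3 bit/sample for groups of three). The remaining functions build a binary Huffman code for blocks of $M$ outputs and compare the resulting average length per sample with the entropy. The function names and the output-level probabilities 0.7, 0.2, 0.1 are hypothetical, and the outputs are assumed independent so that block probabilities are simple products.

```python
import heapq
import itertools
import math

def fixed_length_rate(levels, block):
    """Bits/sample when each block of `block` outputs gets an equal-length word."""
    words_needed = levels ** block
    bits = (words_needed - 1).bit_length()   # smallest b with 2**b >= words_needed
    return bits / block

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)                          # unique tie-breaker for merged nodes
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                     # each symbol under the merge gains a bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def huffman_rate(probs, M):
    """Average bits/sample when blocks of M independent outputs are Huffman coded."""
    block_probs = [math.prod(c) for c in itertools.product(probs, repeat=M)]
    lengths = huffman_lengths(block_probs)
    return sum(p * n for p, n in zip(block_probs, lengths)) / M

probs = [0.7, 0.2, 0.1]   # hypothetical output-level probabilities
print(fixed_length_rate(3, 1), fixed_length_rate(3, 3))
for M in (1, 2, 3):
    print(M, entropy(probs), huffman_rate(probs, M))
```

For each $M$ the printed rate should lie between $H(Y)$ and $H(Y) + 1/M$, in line with Eq. (4); the block alphabet grows as $3^M$, which is the complexity cost mentioned in the text.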
III. DEFINITIONS AND MOTIVATION FOR OPTIMAL VECTOR QUANTIZATION
In this section, we present some technical details concerning the performance improvement possible using optimal vector quantization, but we begin by carefully defining what is meant by vector quantization. For simplicity in the sequel, we abbreviate both vector quantization and vector quantizer by VQ. Whether VQ stands for vector quantization or vector quantizer should be evident from the context.
Let $X$ be an $N$-component source vector with joint pdf $f(X) = f(x_1, x_2, \ldots, x_N)$. An $N$-dimensional VQ is a function $Q(X)$ that maps $X \in \mathbb{R}^N$ into one of $L$ output points, with each output point corresponding to an output vector $Y_1, Y_2, \ldots, Y_L$ belonging to $\mathbb{R}^N$. The quantizer is completely specified by listing the $L$ output vectors and their corresponding partitions of $\mathbb{R}^N$ into $L$ disjoint and exhaustive regions denoted by $\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_L$, so that $Q(X) = Y_i$ if $X \in \mathcal{P}_i$ for $i = 1, 2, \ldots, L$. An $N$-dimensional VQ is sometimes called a block quantizer with block length $N$. Throughout this chapter we confine the discussion to the mean squared error (MSE) per sample distortion measure given by
$$D = \frac{1}{N} E\{d(X - Y)\} = \frac{1}{N} E\|X - Q(X)\|^2 \tag{5}$$
where $\|\cdot\|$ denotes the usual $\ell_2$ norm. For purposes of transmission or storage each output vector $Y_i$ is assigned a binary codeword $c_i$ of length $b_i$ bits. The average codeword length is thus
$$\bar{b} = \sum_{i=1}^{L} b_i P(X \in \mathcal{P}_i) \ \text{bits/vector} \tag{6}$$
so the average rate in bits/sample is
$$R = \frac{\bar{b}}{N}, \tag{7}$$
and hence
$$\frac{1}{N} H(Y) \le R \le \frac{1}{N} \log_2 L. \tag{8}$$
The design of a VQ for a given distortion measure and input vector pdf requires the selection of the partitions $\mathcal{P}_i$ and the output vectors $Y_i$, $i = 1, 2, \ldots, L$, often called the VQ codebook, such that the partitions are nonoverlapping and cover $\mathbb{R}^N$. In the scalar case ($N = 1$), the problem is
relatively simple since partitioning of $\mathbb{R}$ consists of choosing nonoverlapping intervals along the real line. For $N > 1$, the partitions $\mathcal{P}_i$ can take on any shape, and hence, there are infinitely many candidates for the set of optimum partitions. Even when the $N$-dimensional VQ is uniform, which implies that the $\mathcal{P}_i$ are just translates of the same shape, there are many different kinds of partitions which cover $\mathbb{R}^N$. For example, all triangles, quadrilaterals, and hexagons can be used to partition $\mathbb{R}^2$ (Gersho, 1979). This availability of many possible shapes for the partition makes the design procedure more complicated for $N > 1$, but it also provides a possible performance advantage over scalar quantization.
The performance of a VQ is completely determined by two quantities, the average distortion $D$ in Eq. (5) and the required rate $R$ in Eq. (7). If we wished to find the optimum performance possible using an $N$-dimensional VQ, we could take either one of two possible approaches (Gray and Davisson, 1974). We could fix the acceptable distortion $D$, and find the VQ that requires the minimum rate $R$ which has distortion less than or equal to $D$, or we could fix the maximum rate $R$ and find the VQ that achieves the smallest distortion $D$ with rate less than or equal to $R$. However, both of these approaches, as described, imply that we must design an optimum $N$-dimensional VQ, and we do not know how to do this as yet.
The ultimate bound on the performance of vector quantizers, indeed, on any data compression system, is provided by rate distortion theory as originally developed by Shannon (1948; 1959). The utility of rate distortion theory stems from the fact that the optimum performance theoretically attainable for any data compression system can be computed without actually designing such a system; in fact, all that is needed is a characterization of the source and a specification of the distortion measure.
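The definitions of this section can be illustrated by a minimal sketch of an $N$-dimensional VQ specified by its codebook. The text allows partitions of any shape; the sketch assumes the nearest-neighbor (minimum squared error) partition, which is one common choice and not the only one. The function names, the four-point codebook, and the three input vectors are hypothetical. The code evaluates an empirical version of the per-sample MSE of Eq. (5) and the fixed-length rate $(1/N)\log_2 L$ appearing as the upper bound in Eq. (8).

```python
import math

def vq_index(x, codebook):
    """Index i of the output vector Y_i nearest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def mse_per_sample(vectors, codebook):
    """Empirical counterpart of Eq. (5): average of ||X - Q(X)||^2 divided by N."""
    N = len(codebook[0])
    total = 0.0
    for x in vectors:
        y = codebook[vq_index(x, codebook)]
        total += sum((a - b) ** 2 for a, b in zip(x, y))
    return total / (len(vectors) * N)

def fixed_rate(codebook):
    """Rate in bits/sample when all L output vectors get equal-length codewords."""
    return math.log2(len(codebook)) / len(codebook[0])

# Hypothetical two-dimensional codebook with L = 4 output vectors.
codebook = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
vectors = [(0.4, 0.6), (-0.5, -0.5), (0.9, -0.1)]

print([vq_index(x, codebook) for x in vectors])
print(mse_per_sample(vectors, codebook), fixed_rate(codebook))
```

This particular codebook is just a product of two scalar quantizers, so its cells are squares; replacing it by points of another arrangement changes the cell shape without changing the code.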
To define the rate distortion function of a source, which specifies the minimum possible rate for a given distortion $D$, consider a discrete-time, continuous-amplitude, stationary source that produces a sequence of scalar random variables $x_i$, $i = 1, 2, \ldots$. A block of $n$ source samples $x = (x_1, x_2, \ldots, x_n)$ is represented as $y = (y_1, y_2, \ldots, y_n)$ by the source coder with probability density function $f_{Y|X}(y \mid x)$, and so for the single-letter (sample) fidelity criterion $d(x - y)$, the average distortion is given by
$$E\{d(x - y)\} = \iint f_X(x)\, f_{Y|X}(y \mid x)\, d(x - y)\, dx\, dy \tag{10}$$
where $f_X(x)$ is the joint probability density for the components of $x$. Now, we
are only interested in values of the average distortion less than or equal to D, and thus we define
$$F_D = \{ f_{Y|X}(y \mid x) : E[d(x - y)] \le D \} \tag{11}$$
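as those transition probability densities between the source coder input and output which produce an average distortion less than or equal to $D$. The rate required to transmit these $n$-sample input blocks $x$ with an average distortion of $D$ or less is therefore
$$R_n(D) = \frac{1}{n} \inf_{f_{Y|X} \in F_D} \iint f_X(x)\, f_{Y|X}(y \mid x) \log \frac{f_{Y|X}(y \mid x)}{f_Y(y)}\, dx\, dy \tag{12}$$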
where
$$f_Y(y) = \int f_X(x)\, f_{Y|X}(y \mid x)\, dx. \tag{13}$$
Finally, we can define the rate distortion function of the given source as
$$R(D) = \lim_{n \to \infty} R_n(D) \tag{14}$$
which is the effective rate at which the source produces information for reproduction with fidelity $D$. Therefore, $R(D)$ constitutes a lower bound on the rate required by any data compression scheme to achieve an average distortion of $D$ or less (Jayant and Noll, 1984; Shannon, 1959; Gallager, 1968; Gray and Davisson, 1974; Berger, 1971).
While the rate distortion function $R(D)$ has received most of the attention in the information theory literature, a pre-specified acceptable average distortion value is seldom available, and it is much more common in communication systems to know the maximum allowable bit rate. It is therefore more direct to define a distortion rate function $D(R)$ that is obtained by minimizing the average distortion subject to a constraint on the transmission rate $R$. The distortion rate function is the inverse of $R(D)$, and it can be defined precisely as follows. The rate required to represent the source coder output is given by the average mutual information between $x$ and $y$,
$$I(x; y) = \iint f_X(x)\, f_{Y|X}(y \mid x) \log \frac{f_{Y|X}(y \mid x)}{f_Y(y)}\, dx\, dy \tag{15}$$
where $f_Y(y)$ is as shown in Eq. (13). Each choice of $f_{Y|X}(y \mid x)$ gives rise to an average mutual information, and in minimizing the average distortion, we wish to consider only those conditional densities which correspond to a transmission rate less than or equal to some specified rate $R$. Thus, the admissible set of conditional probability densities is defined by
$$F_R = \{ f_{Y|X}(y \mid x) : I(x; y) \le R \}. \tag{16}$$
Therefore, the minimum average distortion possible when transmitting the $n$-vector $x$ at a rate of $R$ bits/sample or less is
$$D_n(R) = \frac{1}{n} \inf_{f_{Y|X} \in F_R} E\{d(x - y)\} \tag{17}$$
with the average distortion expressed by Eq. (10). It follows then that the minimum average distortion when transmitting at a rate of $R$ bits/sample or less is
$$D(R) = \lim_{n \to \infty} D_n(R). \tag{18}$$
Equations (14) and (18) actually constitute the original motivation for considering vector quantizers since they imply that as the block length $n$ of the vector $x$ increases, the performance of the data compression system or source coder approaches the best performance possible, namely $R(D)$ and $D(R)$, respectively.
A slightly closer connection with VQ can be made if we avoid the expressions involving average mutual information and consider the average distortion of an $N$-dimensional VQ as specified in Eq. (5). The distortion-rate approach to designing a VQ is to choose $Q(X)$ to minimize the average distortion in Eq. (5) subject to the rate constraint in Eq. (8). Therefore, we have
$$D_N(R) = \min_{Q(X)} \frac{1}{N} E\{d(X - Y)\} = \min_{Q(X)} \frac{1}{N} E\|X - Q(X)\|^2 \tag{19}$$
over all $Q(X)$ which satisfy
$$\frac{1}{N} H(Q(X)) = \frac{1}{N} H(Y) \le R. \tag{20}$$
The distortion rate function can be obtained from Eq. (19) as
$$D(R) = \lim_{N \to \infty} D_N(R). \tag{21}$$
Equation (21), like Eqs. (14) and (18), implies that as the vector length becomes large, the performance of a VQ can be made to approach the best performance possible by any data compression system. Thus, in theory at least, we have a source coder structure, namely vector quantization, which can achieve the performance promised by the rate distortion and distortion rate bounds (Makhoul, Roucos and Gish, 1985).
Of course, we would also like to have some quantitative indication of the performance gain available with VQ in comparison to scalar quantization. Some early and very important results were obtained by Gish and Pierce (1968) who showed that at high rates (large $R$), the uniform scalar quantizer is
the optimum entropy constrained quantizer, and that the performance of the optimum entropy constrained quantizer is within 0.255 bits/sample of $R(D)$ for the mean squared error distortion measure, independent of the source probability density function. Farvardin and Modestino (1984) have demonstrated that this excellent performance by the optimum entropy constrained quantizer is also maintained at low rates for the uniform, Gaussian, Laplacian, and gamma probability densities. In particular, they have found that the optimum entropy constrained quantizer performs within 0.3 bits/sample of $R(D)$ in all cases that they considered and for all distortions.
Possible performance gains of 0.3 bits/sample or less seem relatively small, and one may feel that such modest increases in performance are not worth the additional effort and complexity required by VQ. However, VQ may offer subjective performance improvements in many applications not evident in these objective performance measures, and furthermore, entropy-coded scalar quantization has some disadvantages of its own. Entropy coding of the scalar quantizer output always involves delay and usually consists of fixed-length to variable-length coding. Variable-length codes require buffering at both the transmitter and receiver, and they can be extremely susceptible to loss of synchronization in the presence of channel errors. Therefore, it is often desirable to avoid entropy coding, and, as a result, it may be more meaningful to compare the performance of scalar quantizers whose $2^R$ output levels, $R$ an integer, are encoded by direct assignment of $R$-bit binary words. In such cases, the rates required by scalar quantizers operating at 1 to 3 bits/sample are at least 0.6 bits/sample greater than $R(D)$ for a Gaussian source and 1 bit/sample greater than $R(D)$ for a Laplacian source (Farvardin and Modestino, 1984; Max, 1960; Adams and Giesler, 1978). Certainly, these possible performance gains are significant.
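The high-rate gap of about 0.255 bits/sample quoted above can be checked numerically. The following is a minimal sketch, assuming a unit-variance memoryless Gaussian source, a uniform scalar quantizer with an illustrative step size of 0.25, and the standard Gaussian rate distortion function $R(D) = \frac{1}{2}\log_2(\sigma^2/D)$; the variable names and parameter values are assumptions chosen for illustration and do not come from the cited papers.

```python
import numpy as np

# Assumed illustration parameters (not from the text): unit-variance Gaussian
# source, step size 0.25, 200000 samples.
rng = np.random.default_rng(0)
sigma, step, n = 1.0, 0.25, 200_000

x = rng.normal(0.0, sigma, n)
idx = np.round(x / step).astype(int)     # cell index of a uniform scalar quantizer
y = idx * step                           # reconstruction at the cell midpoint

D = np.mean((x - y) ** 2)                # empirical mean squared error

# Empirical output entropy in bits/sample (the rate after ideal entropy coding).
_, counts = np.unique(idx, return_counts=True)
p = counts / n
H = -np.sum(p * np.log2(p))

# Rate distortion function of a memoryless Gaussian source under squared error.
R_of_D = 0.5 * np.log2(sigma ** 2 / D)

gap = H - R_of_D
print(f"D = {D:.5f}, H(Y) = {H:.3f} bits, R(D) = {R_of_D:.3f} bits, gap = {gap:.3f} bits")
```

Under these assumptions the measured gap should land close to $\frac{1}{2}\log_2(2\pi e/12)$; the exact figure depends on the random seed and on how small the step is relative to $\sigma$.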
It should also be noted that VQ with large dimension or blocklength offers the possibility of avoiding entropy coding altogether. This result follows from Shannon's work (1959) which showed that for a fixed number of output levels, $L$, the output entropy of the VQ approaches $\log_2 L$ when $N$ is large. Sakrison (1968) gave a geometrical demonstration of this fact for Gaussian sources by proving that for large dimension ($N$), the optimum quantization vectors fall with high probability on the surface of an $N$-dimensional sphere and that the output points are uniformly distributed on the sphere's surface. Because of these results, the minimum output rate can be achieved by a straightforward assignment of appropriate-length binary words. We thus see that vector quantization can theoretically achieve the optimum performance promised by the rate distortion bound and that this performance is possible without entropy coding. Furthermore, the performance increment available with VQ in comparison to scalar quantization is at least 0.3 bits/sample, and possibly even greater if entropy coding is not used on the scalar quantizer output.
IV. MOTIVATION FOR LATTICE QUANTIZATION
Now that we have defined vector quantization and have established that optimal vector quantizers can provide significant performance advantages, we are ready to see how optimal vector quantizers can be found. As noted previously, an $N$-dimensional VQ is completely determined by specifying the partitions $\mathcal{P}_i$ and the output vectors $Y_i$, $i = 1, 2, \ldots, L$, such that the $\mathcal{P}_i$ are nonoverlapping and completely cover $\mathbb{R}^N$. Two necessary conditions that must be satisfied for an optimal $N$-dimensional quantizer are that: (i) the partition of $\mathbb{R}^N$ must be a Dirichlet partition (also called a Voronoi partition), that is,
$$\mathcal{P}_i = \{X : \|X - Y_i\| \le \|X - Y_j\| \text{ for each } j \ne i\} \tag{22}$$
and (ii) the output points must be centroids of their respective regions, so
$$Y_i = \left\{Y : \int_{\mathcal{P}_i} \|X - Y\|^2 f(X)\, dX \text{ is minimum}\right\}. \tag{23}$$
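The two necessary conditions in Eqs. (22) and (23) can be applied alternately to a set of training vectors, which is the idea behind K-means style codebook design. The following is a minimal sketch, assuming squared-error distortion, a synthetic two-dimensional Gaussian training set, and random initialization; the function names `lloyd_iteration` and `design_vq` and all parameter values are hypothetical illustrations, not the procedure of any cited reference.

```python
import numpy as np

def lloyd_iteration(train, codebook):
    """Apply conditions (22) and (23) once on a training set (hypothetical helper)."""
    # Condition (i): assign each training vector to its nearest output vector.
    d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Average squared error per vector for the current codebook and partition.
    distortion = d2[np.arange(len(train)), labels].mean()
    # Condition (ii): replace each output vector by the centroid of its cell.
    new_codebook = codebook.copy()
    for i in range(len(codebook)):
        members = train[labels == i]
        if len(members) > 0:              # an empty cell keeps its old output vector
            new_codebook[i] = members.mean(axis=0)
    return new_codebook, distortion

def design_vq(train, L, iters=20, seed=0):
    """Iterate the two conditions from a random initial codebook of L training vectors."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), size=L, replace=False)].copy()
    history = []
    for _ in range(iters):
        codebook, dist = lloyd_iteration(train, codebook)
        history.append(dist)
    return codebook, history

# Assumed toy data: 2000 two-dimensional Gaussian training vectors, L = 16.
train = np.random.default_rng(1).standard_normal((2000, 2))
codebook, history = design_vq(train, L=16)
print("distortion per vector, first and last iteration:", history[0], history[-1])
```

Each pass cannot increase the training-set distortion, which is why such iterations settle at a local optimum that depends on the initial codebook.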
One important approach for VQ design is based upon using training sequences representative of the vectors to be quantized and the LBG algorithm, which is a version of the K-means algorithm in the pattern recognition literature (Linde, Buzo and Gray, 1980; Makhoul, Roucos and Gish, 1985; MacQueen, 1967). This iterative algorithm can be shown to converge to at least a local optimum, and global optimality can be approximated by repeatedly running the algorithm with different initialization vectors. This algorithm (in general) produces a nonuniform partition of $\mathbb{R}^N$ and a nonuniform distribution of VQ output points. This design procedure is performed totally off-line, but the computational and storage requirements of this process are still substantial. In fact, for $M$ training vectors and $I$ iterations of the algorithm, the computational cost is about $NLMI = NMI\,2^{RN}$ operations and the storage cost is $N(L + M)$. Since $M$ must be at least $10L$, these quantities are large and grow exponentially with an increase in $R$ and $N$ (Makhoul, Roucos and Gish, 1985).
Once we have obtained a VQ codebook, the quantization process consists of calculating the distortion between the current input vector and each output vector and choosing that output vector with minimum distortion. If each distortion calculation requires $N$ operations, then the quantization process requires $NL = N\,2^{RN}$ operations. The storage cost is also $N\,2^{RN}$. Since these operations must be accomplished in real time, this computational burden is quite significant. For example, if $R = 2$ bits/sample and $N = 10$, the number of operations is $10 \cdot 2^{20} \approx 10$ million! These computational and storage
requirements are for a full-search VQ, and much research effort is going into reducing these numbers with some loss in performance.
Vector quantizer codebooks designed using the LBG algorithm generally have no discernible structure, and this “random” codebook distribution is what complicates the quantization or encoding process. Lattices in $\mathbb{R}^N$ have considerable structure, and hence, lattice-based quantizers offer the promise of design simplicity and reduced-complexity encoding, provided that lattices can be found in high dimensions which yield good quantization performance. A lattice is defined as a set of vectors
$$\Lambda = \{X : X = u_1 a_1 + u_2 a_2 + \cdots + u_N a_N\} \tag{24}$$
where $a_i$, $i = 1, 2, \ldots, N$, are the basis vectors of the lattice and the $u_i$ are integers. We form a VQ from a lattice by selecting $L$ of the lattice points to be the output points $Y_i$ and forming Voronoi regions about these output points so that if a source vector $X \in \mathcal{P}_i$, then $Y_i = Q(X)$.
Now that we have defined a lattice quantizer, how do we find a good lattice quantizer? First, Gersho (1979) has conjectured that for $N$ asymptotically large, the optimal quantizer for a uniformly distributed source will have all of its Voronoi regions (except the boundary regions) congruent to some basic polytope. Second, a quantizer will perform well if its Voronoi region approaches the shape of a sphere. This statement comes from the following argument. It is generally felt that for asymptotic $N$ the best covering of space is a dense packing of nonoverlapping spheres. Since a nonoverlapping covering is not possible for finite $N$, the best covering will be a covering by spheres with minimal overlap. The nonoverlapping regions are the Voronoi regions. Thus, to find a good lattice quantizer, we are interested in regular lattices and Voronoi regions which best approximate a sphere in $\mathbb{R}^N$.
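To make the contrast between full-search encoding and structured encoding concrete, the sketch below builds a finite set of lattice points from a basis as in Eq. (24), encodes one vector by exhaustive search, and compares the result with coordinate-wise rounding, which finds the nearest point directly for the cubic lattice. The cubic lattice, the coefficient range, the test vector, and the helper names `lattice_points` and `full_search` are assumptions chosen for illustration; other lattices need their own fast nearest-point rules.

```python
import itertools
import numpy as np

def lattice_points(basis, coeff_range):
    """Points u_1 a_1 + ... + u_N a_N of Eq. (24) for integer u_i in coeff_range.

    The rows of `basis` are the basis vectors a_i (hypothetical helper).
    """
    N = basis.shape[0]
    coeffs = np.array(list(itertools.product(coeff_range, repeat=N)))
    return coeffs @ basis

def full_search(x, codebook):
    """Return the codebook vector with minimum squared error to x."""
    d2 = ((codebook - x) ** 2).sum(axis=1)
    return codebook[d2.argmin()]

# Assumed example: the cubic lattice in three dimensions, truncated to 5^3 points.
basis = np.eye(3)
codebook = lattice_points(basis, range(-2, 3))

x = np.array([0.4, -1.3, 1.6])
y_search = full_search(x, codebook)   # about N * L operations
y_round = np.round(x)                 # N rounding operations for the cubic lattice

print(len(codebook), y_search, y_round)

# Full-search operation count N * 2^(R N) for R = 2 bits/sample and N = 10.
R, N = 2, 10
ops = N * 2 ** (R * N)
print(ops)
```

The rounding shortcut is valid here only because the basis is the identity and the input lies inside the truncated region; the operation count reproduces the figure of roughly 10 million quoted in the text.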
Note that we can also use the second argument to justify approaching the problem as a search for dense sphere packings in $\mathbb{R}^N$ with the lattice points as the sphere centers, called a lattice packing (Sloane, 1984). Details concerning the design of lattice vector quantizers, or simply lattice quantizers, are pursued in subsequent sections. Here we examine the performance offered by lattice quantizers. That is, since we are restricting the possible output points to lie on a lattice, which is a regular structure, and we know that locally optimal vector quantizers designed with the LBG algorithm can be quite irregular, exactly what is the performance penalty for limiting consideration to lattice quantizers?
We begin to answer this question by presenting a version of a derivation due to Sakrison (1979). We consider an $N$-dimensional VQ as previously defined in Sec. III with an output rate in bits/sample of
$$\frac{1}{N} H(Y) = -\frac{1}{N} \sum_{i=1}^{L} P[Y = Y_i] \log_2 P[Y = Y_i]. \tag{25}$$
Since
$$P[Y = Y_i] = \int_{\mathcal{P}_i} f(X)\, dX, \tag{26}$$
which is an $N$-dimensional integral over the input vector probability density, we can rewrite Eq. (25) as
$$\frac{1}{N} H(Y) = -\frac{1}{N} \sum_{i=1}^{L} \int_{\mathcal{P}_i} f(X) \log P[Y = Y_i]\, dX. \tag{27}$$
Under the assumptions of small distortion and a sufficiently smooth input density, the argument of the logarithm in Eq. (27) can be approximated by
$$P[Y = Y_i] = \int_{\mathcal{P}_i} f(X)\, dX \approx V_i f(X) \tag{28}$$
where $V_i$ is the volume of the $i$th partition, so that Eq. (27) becomes
$$\frac{1}{N} H(Y) \approx -\frac{1}{N} \sum_{i=1}^{L} \int_{\mathcal{P}_i} f(X) \log [V_i f(X)]\, dX = \frac{1}{N} H(X) - \frac{1}{N} \sum_{i=1}^{L} P[Y = Y_i] \log V_i. \tag{29}$$
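The small-distortion approximation leading to Eq. (29) can be checked in the simplest setting. The sketch below assumes $N = 1$, a unit-variance Gaussian density, and cells of equal width $\Delta$ centered on multiples of $\Delta$, so that every $V_i = \Delta$ and the right side of Eq. (29) reduces to $H(X) - \log_2 \Delta$ with $H(X) = \frac{1}{2}\log_2 2\pi e$. The function names and step sizes are hypothetical choices for illustration.

```python
import math

def gaussian_cdf(t):
    """Cumulative distribution function of a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def output_entropy(step, half_cells=2000):
    """Exact output entropy (bits) of a uniform scalar quantizer with cells
    [(k - 1/2) step, (k + 1/2) step) applied to a unit-variance Gaussian."""
    H = 0.0
    for k in range(-half_cells, half_cells + 1):
        p = gaussian_cdf((k + 0.5) * step) - gaussian_cdf((k - 0.5) * step)
        if p > 0.0:                       # skip cells whose probability underflows
            H -= p * math.log2(p)
    return H

h_X = 0.5 * math.log2(2.0 * math.pi * math.e)   # differential entropy in bits

results = {}
for step in (1.0, 0.5, 0.25, 0.1):
    exact = output_entropy(step)
    approx = h_X - math.log2(step)                # Eq. (29) with V_i = step, N = 1
    results[step] = (exact, approx)
    print(f"step = {step:4.2f}: exact H(Y) = {exact:.4f}, approximation = {approx:.4f}")
```

In this setting the two columns should approach each other as the step shrinks, which is the regime of small distortion assumed in the derivation.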
To simplify further we also assume that all of the partitions $\mathcal{P}_i$ are translated versions of the same shape, say $\mathcal{P}$, and that each partition contributes an average distortion equal to $D$. Then $V_i = V$ for all $i$, and
$$\frac{1}{N} H(Y) \approx \frac{1}{N} H(X) - \frac{1}{N} \log V \tag{30}$$
is the rate in bits/sample required by the vector quantizer to achieve an average distortion of $D$ or less.
What we would like to do now is show that the $N$-dimensional VQ rate-distortion performance just derived is near the rate distortion bound for the input source sequence $x_j$, $j = 1, 2, \ldots$. Letting $y_i$, $i = 1, 2, \ldots$, denote the representation sequence produced by the source coder, we consider the single-letter fidelity criterion
$$d(x - y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2 \tag{31}$$
which allows us to write the average distortion
$$E\{d(x - y)\} = \iint f_X(x)\, f_{Y|X}(y \mid x)\, d(x - y)\, dx\, dy. \tag{32}$$
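The average mutual information between the source coder input $x$ and the reproduction $y$ is
$$I(x; y) = \iint f_X(x)\, f_{Y|X}(y \mid x) \log \frac{f_{Y|X}(y \mid x)}{f_Y(y)}\, dx\, dy \tag{33}$$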
where
$$f_Y(y) = \int f_X(x)\, f_{Y|X}(y \mid x)\, dx, \tag{34}$$
and to find $R(D)$ we wish to minimize $I(x; y)$ over all transition probability densities $f_{Y|X}(y \mid x)$ that yield an average distortion of $D$ or less. Thus, defining the admissible set
$$F_D = \{ f_{Y|X}(y \mid x) : E[d(x - y)] \le D \}, \tag{35}$$
we can write the rate distortion function for the given source and the chosen fidelity criterion as
$$R(D) = \inf_{f_{Y|X} \in F_D} I(x; y). \tag{36}$$
By using the facts that $I(x; y) = H(x) - H(x \mid y)$, $H(x - y \mid y) = H(x \mid y)$, and $H(x - y) \ge H(x - y \mid y)$, we can manipulate Eq. (36) as follows,
$$\begin{aligned} R(D) &= \inf_{f_{Y|X} \in F_D} [H(x) - H(x \mid y)] \\ &= H(x) - \sup_{f_{Y|X} \in F_D} H(x - y \mid y) \\ &\ge H(x) - \sup_{f_{Y|X} \in F_D} H(x - y) \triangleq R_L(D), \end{aligned} \tag{37}$$
where the second equality results since $H(x)$ does not depend on $f_{Y|X}(y \mid x)$. The final result in Eq. (37) is called the Shannon lower bound to the rate distortion function (Shannon, 1959; Berger, 1971), and our particular derivation is due to Sakrison (1979). The Shannon lower bound, denoted here by $R_L(D)$, is particularly useful since Eq. (36) defining $R(D)$ is difficult to evaluate for general sources and distortion measures, but $R_L(D)$ can be calculated in many cases of interest.
To see how close the performance of a uniform VQ comes to $R(D)$, we compare Eq. (30) and $R_L(D)$ in Eq. (37). Note that if we assume that the
sequence of source inputs is independent and identically distributed, then $H(X) = NH(x)$, so that the first terms in Eq. (30) and $R_L(D)$ are identical. Although it is beyond our development here, it is possible to show that for $N$ large, $H(X - Y)$ equals $\log V$ with high probability (Shannon, 1948; Sakrison, 1968; Sakrison, 1979), and since $H(x - y) = (1/N)H(X - Y)$, then the last terms in Eq. (30) and in $R_L(D)$ are equal for large dimensions. Therefore, we conclude that the performance of a uniform VQ achieves the Shannon lower bound on $R(D)$ as the number of dimensions becomes asymptotically large. This result is encouraging since it says that VQs with identical quantization cells, which are obviously highly structured, are asymptotically optimum.
To evaluate the performance for small $N$, we need only compare $\sup_{f_{Y|X} \in F_D} H(x - y)$ and $(1/N)\log V$, where $V$ is the $N$-dimensional volume of the basic quantizer partition $\mathcal{P}$. Before $V$ can be calculated, however, we must scale $\mathcal{P}$ such that the desired average distortion, say $D^*$, is achieved. To perform this scaling, we first note from Eq. (28) that the probability density of any $N$-vector $X$ is uniform throughout the region $\mathcal{P}$, and that if the representation vector is at the center of $\mathcal{P}$, then it follows that the quantization errors are uniformly distributed throughout $\mathcal{P}$. Based upon these observations, we can show that for an average distortion $D^*$, the uniform quantization region $\mathcal{P}$ in one dimension is the closed interval $[-\sqrt{3D^*}, \sqrt{3D^*}]$, and thus,
$$\frac{1}{N} \log V = \frac{1}{2} \log 12D^* = 1.792 + \frac{1}{2} \log D^* \text{ bits/sample.} \tag{38}$$
Similarly, if in two dimensions we choose $\mathcal{P}$ to be a hexagon, then for an average distortion $D^*$, the radius of the hexagon is $[24D^*/5]^{1/2}$ and
$$\frac{1}{N} \log V = \frac{1}{2} \log [72\sqrt{3}\, D^*/10] = 1.82 + \frac{1}{2} \log D^* \text{ bits/sample.} \tag{39}$$
(Note that Eq. (30) in Sakrison (1979) appears to be in error.) The values in Eqs. (38) and (39) should be compared to $\sup_{f_{Y|X} \in F_D} H(x - y)$ for the squared error distortion measure, which is (Berger, 1971; Sakrison, 1979)
$$\sup_{f_{Y|X} \in F_D} H(x - y) = \frac{1}{2} \log 2\pi e D^* = 2.047 + \frac{1}{2} \log D^* \text{ bits/sample.} \tag{40}$$
Subtracting Eq. (38) from (40), we see that optimal VQ offers a reduction of
0.255 bits/sample over scalar quantization, but comparing Eqs. (38), (39), and (40), the two-dimensional uniform hexagonal quantizer performance is only 0.028 bits/sample better than the scalar quantizer.
These last results have both an encouraging and a discouraging aspect. On one hand, it is encouraging that uniform VQ performance can approach $R(D)$ for large dimensions. On the other hand, however, it is discouraging to find that multidimensional VQ only performs 0.255 bits/sample better than uniform scalar quantization, and that a two-dimensional quantizer provides only a small portion of this available improvement. This last point implies that we will not be able to close the gap between vector quantizer performance and $R(D)$ without going to higher dimensions ($N$). In fact, Sakrison (1968, 1979) felt that because of this seemingly negligible gain in performance and the complexity involved in the implementation of $N$-dimensional VQs, vector quantization “...may never be used in practice”. However, as stated at the end of the immediately preceding section, VQ may provide subjective improvements not evident in our mathematical development, and it is possible that we can avoid the use of entropy coding with vector quantization.
In this section, it has been demonstrated that uniform vector quantizers can achieve performance at or near the rate distortion bound, and hence, we conclude that highly structured VQs based upon lattices, which offer the possibility of significant reductions in implementation complexity, are viable alternatives to optimal and locally optimal VQs designed using the LBG algorithm. We are now ready to begin our investigation of lattice quantizers.
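The constants in Eqs. (38), (39), and (40) and the differences quoted above follow from a few lines of arithmetic. The sketch below evaluates them with logarithms to base 2 and $D^* = 1$, so that the common $\frac{1}{2}\log D^*$ term vanishes; the variable names are illustrative.

```python
import math

# Constant terms of Eqs. (38)-(40), in bits/sample, for D* = 1.
scalar = 0.5 * math.log2(12.0)                          # Eq. (38): interval cell
hexagon = 0.5 * math.log2(72.0 * math.sqrt(3.0) / 10.0)  # Eq. (39): hexagonal cell
bound = 0.5 * math.log2(2.0 * math.pi * math.e)          # Eq. (40): sup of H(x - y)

print(f"scalar  : {scalar:.3f}")
print(f"hexagon : {hexagon:.3f}")
print(f"bound   : {bound:.3f}")
print(f"bound - scalar   = {bound - scalar:.3f}")
print(f"hexagon - scalar = {hexagon - scalar:.3f}")
print(f"bound - hexagon  = {bound - hexagon:.3f}")
```

A larger value of $(1/N)\log V$ means a larger cell volume at the same distortion, and therefore a lower rate in Eq. (30).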
V. LATTICES
Given a set $A$ of $n$ linearly independent vectors $a_1, a_2, \ldots, a_n$, a lattice in $n$-dimensional Euclidean space is defined by taking integral linear combinations of the vectors in $A$ as shown in Eq. (24). The set $A$ is called the basis of the lattice $\Lambda$, and as is evident from Eq. (24), there is considerable structure associated with a lattice. In fact, most of this section and the two subsequent sections are concerned with presenting and exploiting lattice structural properties. However, it is also clear from Eq. (24) that it is possible to define many different lattices, and we wonder exactly how to proceed to find those lattices which can serve as good quantizers. One approach is to examine lattices which have already been widely studied in other contexts or applications, such as sphere packing, and investigate the performance of quantizers built upon these lattices. Such has been the approach employed in the lattice quantization literature, and it has led to the study of lattices that are based upon the root systems of Lie algebras and which have the designations $A_n$ ($n \ge 1$), $B_n$ ($n \ge 1$),
$C_n$ ($n \ge 1$), $D_n$ ($n \ge 2$), and $E_n$ ($n = 6, 7, 8$). The subscripts indicate the dimension of the lattice. Fortunately, we can develop these lattices from group theory without having to understand Lie algebras. Other important lattices that are not root lattices but have been investigated in other situations are the Coxeter-Todd lattice ($K_{12}$), the Barnes-Wall lattice ($\Lambda_{16}$), and the Leech lattice ($\Lambda_{24}$). Since some of these lattices and their duals (to be defined shortly) give the best packings and coverings in their respective spaces, we are led to consider them as promising candidates for lattice quantizers.
Before proceeding to consider these specific lattices, we provide a few more properties of lattices in general. To begin, we state the definition of a group.
Definition: A group $(G, *)$ consists of a set of elements $\{g_i\}$ of $G$ and an operation $*$ such that
1. If $g_1, g_2 \in G$ then $g_1 * g_2 \in G$; (41)
2. $g_1 * (g_2 * g_3) = (g_1 * g_2) * g_3$; (42)
3. There is an identity element 1 in $G$ which has the property that $g_i * 1 = 1 * g_i = g_i$ for all $g_i$ in $G$;
4. For each element $g_i$ in $G$ there exists an element $g_i^{-1}$, called the inverse of $g_i$, with the property $g_i * g_i^{-1} = g_i^{-1} * g_i = 1$.
If in addition to these properties, we also have the property that $g_i * g_j = g_j * g_i$ then the group is called an Abelian or commutative group. With this definition we can show that a lattice is a discrete subgroup of $\mathbb{R}^n$. Let $x$ and $y$ be two points of the lattice
$$x = u_1 a_1 + u_2 a_2 + \cdots + u_n a_n \tag{43}$$
$$y = v_1 a_1 + v_2 a_2 + \cdots + v_n a_n. \tag{44}$$
Then the vector $x + y$
$$x + y = (u_1 + v_1) a_1 + (u_2 + v_2) a_2 + \cdots + (u_n + v_n) a_n \tag{45}$$
is of the form in Eq. (24). Thus $\Lambda$ is closed under standard addition and the first requirement of the definition is satisfied. The second requirement is obviously satisfied by the properties of addition. The identity element of this group is the zero vector, and the inverse of $x$ is simply $-x$, where $-x$ is a member of the lattice by Eq. (24). Therefore $\Lambda$ is an additive group. To show that it is discrete, all we have to do is show that the points of the lattice are separated by some quantity $r$. To do this we define
$$\rho = \min\{|a_i|\}. \tag{46}$$
Let $C$ be the cube $|x_i| \le \rho$. Then by definition there is no nonzero point of the lattice inside $C$. In fact if $x_0$ is a point of the lattice then there is no point not
equal to $x_0$ inside $C + x_0$. Now $C$ contains within it some sphere $|x| \le r \le \rho$. Therefore all points of the lattice are separated by at least $r$. We have proved that lattices are a discrete subgroup of $\mathbb{R}^n$. We can also prove the converse, that is: if $\Lambda$ is a discrete subgroup of $\mathbb{R}^n$ and $\Lambda$ is not contained in some linear subspace of $\mathbb{R}^{n-1}$, then $\Lambda$ is an $n$-dimensional lattice. The proof can be found in Lekkerkerker (1969).
Let us now turn our attention to the basis of a lattice. In Fig. 2, we have a two-dimensional lattice. The lattice points marked with x form a basis for the lattice, but so do the lattice points marked with + and the lattice points marked with *. In fact, there are an infinite number of bases for a given lattice. All of these basis sets, however, are related to each other. If $A$ is a basis set for the lattice $\Lambda$, then $B$ is a basis set if and only if
$$B = UA \tag{47}$$
where $U$ is a matrix with integral components and $|U| = \pm 1$. This in turn means that the magnitude of the determinant of any basis set of a lattice is invariant, and thus is a property of the lattice rather than simply the property of individual basis sets. The determinant of a lattice is denoted by $d(\Lambda)$ and is useful in computing certain parameters of lattice quantizers. We can get a
geometric interpretation of $d(\Lambda)$ by defining a half open polytope $P$ consisting of points
$$x = r_1 a_1 + r_2 a_2 + \cdots + r_n a_n, \qquad 0 \le r_i < 1. \tag{48}$$
The volume of this polytope is equal to the determinant of the lattice. Furthermore, $\mathbb{R}^n$ is completely covered by the polytopes $P + y$, $y \in \Lambda$, and all such polytopes associated with the lattice have the same volume.
With each lattice we can associate another lattice called the dual lattice. To demonstrate how to obtain a dual lattice, we define a basis set composed of the vectors $e_i$, where $e_i$ is the vector with $i$th component equal to one and all other components zero. From this basis, we generate the cubic lattice, denoted $Z^n$, consisting of all points with integral coordinates in $\mathbb{R}^n$. Now if we select points $x$ such that $u \cdot x$ is an integer for all $u$ in $Z^n$ (where $\cdot$ is the standard inner product), it is easy to see that $x$ also belongs to $Z^n$. Defining the lattice $\Lambda = AZ^n$, where $A$ is the matrix of an arbitrary nonsingular transformation, we consider the matrix $A^*$ which is the transpose of the inverse of $A$. Then for all points $y \in \mathbb{R}^n$ with the property that $x \cdot y$ is an integer for all $x$ in $\Lambda$,
$$X \cdot Y = Ax \cdot A^{*}y, \tag{49}$$
which implies that the set of points $y$ form a lattice $\Lambda^*$ with basis $A^*$. This lattice is called the dual lattice of $\Lambda$. Note that $\det A^* = (\det A)^{-1}$ and $A^* = (\det A)^{-1} (\operatorname{adj} A)^T$, where $\operatorname{adj} A$ denotes the adjoint of the matrix $A$. It then follows that
$$d(\Lambda^*) = [d(\Lambda)]^{-1}. \tag{50}$$
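The properties in Eqs. (47) and (50) are easy to check numerically for a small example. The sketch below assumes a two-dimensional hexagonal lattice whose basis vectors are the columns of `A`, forms $A^* = (A^{-1})^T$, and verifies that inner products between points of the two lattices are integers, that the determinants are reciprocal, and that multiplying by an integer matrix of determinant $\pm 1$ leaves the magnitude of the determinant unchanged. The particular matrices and variable names are assumptions for illustration.

```python
import numpy as np

# Assumed example: columns of A are basis vectors of a hexagonal lattice.
A = np.array([[1.0, 0.5],
              [0.0, np.sqrt(3.0) / 2.0]])
A_dual = np.linalg.inv(A).T              # transpose of the inverse of A

# Random integer coefficient vectors (columns) generate points of each lattice.
rng = np.random.default_rng(0)
u = rng.integers(-5, 6, size=(2, 100))
v = rng.integers(-5, 6, size=(2, 100))
X = A @ u                                # points of the lattice
Y = A_dual @ v                           # points of the dual lattice

# Every inner product between a lattice point and a dual point is an integer.
dots = X.T @ Y
integral = np.allclose(dots, np.round(dots))

d = abs(np.linalg.det(A))                # d(Lambda)
d_dual = abs(np.linalg.det(A_dual))      # d(Lambda*)

# An integer matrix with determinant +1 gives another basis, as in Eq. (47).
U = np.array([[2, 1],
              [1, 1]])
d_B = abs(np.linalg.det(U @ A))

print(integral, d, d_dual, d * d_dual, d_B)
```

The check of Eq. (47) uses only the fact that the determinant is multiplicative, so it holds whether the basis vectors are stored as rows or as columns.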
Some lattices and their duals are equivalent within a rotation and a scale change, which we indicate by the notation $\Lambda \cong \Lambda^*$. The $Z^n$ lattice with $A = I$ is a trivial example of this behavior, since $A^* = (\det A)^{-1} (\operatorname{adj} A)^T = I$. As we shall see, other important lattices also exhibit this property.
We now turn to a discussion of the root lattices, that is, the lattices based upon the root systems of Lie algebras. As stated previously, we can develop these lattices within the context of finite reflection groups and avoid the additional complication of having to study Lie algebras. The basic idea is easy to describe in two dimensions. What we desire to do is to cover $\mathbb{R}^2$ with identical cells or tiles. For an appropriately chosen cell, this can be accomplished by starting with a single cell and propagating this cell throughout the space by a series of reflections. The process is illustrated in Fig. 3 where the initial cell or tile is the triangle labeled with the point $a$. The point $a$, and indeed every point in the cell (triangle), is reflected through the plane (line) labeled P1 to get the cell containing the point $a'$. Point $a'$ is the reflection of point $a$ with respect to P1. Similarly, we can reflect this new cell through the plane P2 to obtain the cell labeled with the point $a''$, which is the reflection of
FIG. 3. Tiling two-dimensional space with reflections of a basic cell.
$a'$ with respect to P2. Continuing these reflections, the entire space can be covered or tiled. We make these ideas more specific by considering the group of all orthogonal transformations over a real Euclidean space $V$, denoted $O(V)$. An orthogonal transformation is one whose matrix representation, say $A$, satisfies $A^T A = I$. We are interested, in particular, in the finite subgroups of $O(V)$ generated by reflections, and even more specifically, in the subgroup of transformations $G$ which leaves a lattice $\Lambda$ invariant under all transformations $T \in G$.
Definition: A reflection of $V$ is a linear transformation $T$ that takes each vector to its mirror image with respect to a fixed hyperplane $P$ (Grove and Benson, 1985). Specifically, we can write that for $x \in V$, $Tx = x$ if $x \in P$ and $Tx = -x$ if $x \in P^\perp$ ($P^\perp$ is the orthogonal complement of $P$). To give an explicit description of these reflections, let $r \in P^\perp$, $r \ne 0$, and define the transformation
$$T_r x = x - \frac{2\, x \cdot r}{r \cdot r}\, r \tag{51}$$
for all $x \in V$. We see that if $x \in P$, $x \cdot r = 0$, so $T_r x = x$, and $T_r r = r - 2r = -r$.
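The reflection in Eq. (51) translates directly into a few lines of code. The sketch below assumes an arbitrary nonzero vector $r$ in three dimensions and checks that vectors orthogonal to $r$ are fixed, that $r$ is mapped to $-r$, that applying the map twice returns the original vector, and that its matrix is orthogonal; the function name `reflect` and the test vectors are hypothetical choices.

```python
import numpy as np

def reflect(x, r):
    """Reflection of x through the hyperplane orthogonal to r, as in Eq. (51)."""
    return x - 2.0 * np.dot(x, r) / np.dot(r, r) * r

# Assumed example vectors.
r = np.array([1.0, -1.0, 0.0])
x_in_P = np.array([1.0, 1.0, 5.0])       # orthogonal to r, so it lies in P
x = np.array([3.0, -2.0, 0.5])           # a generic vector

fixed = reflect(x_in_P, r)               # should equal x_in_P
flipped = reflect(r, r)                  # should equal -r
twice = reflect(reflect(x, r), r)        # should equal x

# Matrix form of the same map: T = I - 2 r r^T / (r . r).
T = np.eye(3) - 2.0 * np.outer(r, r) / np.dot(r, r)
orthogonal = np.allclose(T.T @ T, np.eye(3))

print(fixed, flipped, twice, orthogonal)
```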
Note that $P \cup \{r\}$ contains a basis for $V$ and that $T_r^2 = 1$, so that $T_r$ is a reflection and $T_r \in O(V)$. We say that $T_r$ is a reflection through $P$ and along $r$. Letting $G$ be the subgroup of $O(V)$ generated by reflections, $T_r \in G$ and the vectors $\pm r$ previously defined are called roots of $G$. The positive roots of $G$ are those roots for which, for some $t \in V$, $t \cdot r > 0$, and the negative roots are those for which $t \cdot r < 0$. The entire set of positive and negative roots is said to be a root system for $G$. A simple positive root is a positive root that cannot be expressed as a linear combination of two (or more) other positive roots (Gilmore, 1974). A definition of the base of a root system is as follows.
Definition: A subset $R$ of $\mathbb{R}^n$, where $R = \{r_1, r_2, \ldots, r_n\}$ and the $r_i$ are the positive simple roots, is called a base for the root system if $R$ is a basis for $\mathbb{R}^n$ and if each positive root can be written in the form
$$\alpha = \sum_{i=1}^{n} u_i r_i, \qquad u_i \in Z \tag{52}$$
with $u_i$ all positive. The cardinality of a base is the dimension of the underlying Euclidean space. The reflections of the $r_i$ are often called the fundamental reflections of $G$.
If the roots of a system cannot be broken down into subsets of roots where the roots of each subset are orthogonal to the roots of the other subset, then the system is irreducible and the associated lattice is also irreducible. A reducible system can always be written as the direct product of its irreducible subsystems, where the sum of the dimensions of the subsystems is equal to the dimension of the original reducible system. The reducible cases can then be studied through their component systems. The $Z^n$ lattice is an example of a reducible system. Irreducible systems are the most important, but we will have occasion to study both reducible and irreducible systems.
There are several important properties of root systems that we need to continue our development. One property is that for two simple positive roots $r_i$ and $r_j$, $i \ne j$,
$$r_i \cdot r_j \le 0. \tag{53}$$
We can surmise this property from Eq. (52), since this definition yields for any positive root $\alpha$ that
which in order to guarantee that $u_k > 0$ when all other $u_i > 0$, implies Eq. (53). Furthermore, since we know that
$$r_i \cdot r_j = \|r_i\| \|r_j\| \cos\theta_{ij} \tag{55}$$
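where $\theta_{ij}$ is the angle between the two roots, it is evident from Eqs. (53) and (55) that $\cos\theta_{ij} \le 0$, which implies that $\theta_{ij}$ is always a right or obtuse angle. We can obtain another property if we let $T$ be the reflection in the $r_i$ direction and consider
$$T r_j = r_j - \frac{2\, r_j \cdot r_i}{\|r_i\|^2}\, r_i. \tag{56}$$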
Now if the simple positive roots are a basis for a lattice $\Lambda$, then
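Since $T r_j$ must also belong to $\Lambda$, this implies that
$$\frac{2\, r_j \cdot r_i}{\|r_i\|^2} = \text{integer}. \tag{58}$$
Using Eq. (55), we can write
$$\cos^2\theta_{ij} = \frac{(r_i \cdot r_j)^2}{\|r_i\|^2 \|r_j\|^2} \tag{59}$$
which can also be expressed as
$$\cos^2\theta_{ij} = \frac{r_i \cdot r_j}{\|r_i\|^2} \cdot \frac{r_i \cdot r_j}{\|r_j\|^2}, \tag{60}$$
or from Eq. (58)
$$\cos^2\theta_{ij} = \frac{m}{2} \cdot \frac{m'}{2} \tag{61}$$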
for some integers m, m'. These integers can be determined by noting that
$$0 \le \cos^2\theta_{ij} = \frac{m}{2} \cdot \frac{m'}{2} \le 1, \tag{62}$$
so that all of the possible values for the quantities in Eq. (62) can be enumerated as shown in Table I (Gilmore, 1974). The quantity in Eq. (58) is very important in the study of the root lattices, and it is sometimes given the special shorthand notation
$$(r_j, r_i) = \frac{2\, r_j \cdot r_i}{\|r_i\|^2}. \tag{63}$$
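The integrality condition in Eq. (58) and the restricted set of angles can be illustrated with a few pairs of simple roots. The sketch below uses conventional coordinate choices for the simple roots of the rank-two systems $A_2$, $B_2$, and $G_2$; these coordinates, the dictionary labels, and the helper names `bracket` and `angle_deg` are assumptions brought in for illustration and are not given in the text.

```python
import numpy as np

def bracket(rj, ri):
    """The shorthand (r_j, r_i) = 2 r_j . r_i / ||r_i||^2."""
    return 2.0 * np.dot(rj, ri) / np.dot(ri, ri)

def angle_deg(a, b):
    """Angle between two vectors in degrees."""
    c = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

# Assumed conventional simple roots (r_1, r_2) for three rank-two root systems.
examples = {
    "A2": (np.array([1.0, -1.0, 0.0]), np.array([0.0, 1.0, -1.0])),
    "B2": (np.array([1.0, -1.0]), np.array([0.0, 1.0])),
    "G2": (np.array([1.0, -1.0, 0.0]), np.array([-2.0, 1.0, 1.0])),
}

results = {}
for name, (r1, r2) in examples.items():
    m = bracket(r2, r1)                  # (r_2, r_1)
    m_prime = bracket(r1, r2)            # (r_1, r_2)
    theta = angle_deg(r1, r2)
    results[name] = (m, m_prime, theta)
    # The product m m' / 4 reproduces cos^2 of the angle, as in Eq. (61).
    print(name, m, m_prime, round(theta, 1), m * m_prime / 4.0)
```

For these choices both brackets are nonpositive integers and the angles fall among the few values permitted by Eq. (62).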
Note that $(r_i, r_i) = 2$, and from Eq. (53) and Table I that $(r_j, r_i)$ for $i \ne j$ is either 0, $-1$, $-2$, or $-3$, which correspond respectively to inter-root angles of 90°, 120°, 135°, and 150°. The Cartan matrix corresponding to the simple roots of a root system is defined to have as its $ij$th member the quantity $(r_i, r_j)$ for
TABLE I
RELATIVE LENGTHS OF ROOTS AND ANGLES BETWEEN ROOTS (GILMORE, 1974)

cos^2 θ_ij    θ_ij          m      m'     ‖r_j‖ / ‖r_i‖
1             0°, 180°      ±2     ±2     1
3/4           30°, 150°     ±3     ±1     √3
                            ±1     ±3     1/√3
1/2           45°, 135°     ±2     ±1     √2
                            ±1     ±2     1/√2
1/4           60°, 120°     ±1     ±1     1
0             90°           0      0      undetermined
FIG. 4. Cartan matrices for some irreducible root systems (Humphreys, 1972).
$i, j = 1, 2, \ldots, n$. Because of the previously stated properties of $(r_i, r_j)$, the diagonal terms of a Cartan matrix are always 2 and the off-diagonal terms are either 0, -1, -2, or -3. For symmetric root systems, which are root systems where all roots are of the same length, $(r_i, r_j) = (r_j, r_i)$, and the Cartan matrix is symmetric. Sometimes the elements of the Cartan matrix are
JERRY D. GIBSON AND KHALID SAYOOD
normalized to make the main diagonal unity, but once the Cartan matrix (even normalized) is known, the simple roots of a root system are known to within an arbitrary spatial orientation and a scale factor. Cartan matrices for the root systems $A_n$ ($n \ge 1$), $B_n$ ($n \ge 2$), $C_n$ ($n \ge 3$), $D_n$ ($n \ge 4$), $E_n$ ($n = 6, 7, 8$), $F_4$, and $G_2$ are shown in Fig. 4 (Humphreys, 1972). The systems $A_n$, $D_n$, and $E_n$ are symmetric, and all of the matrices shown represent irreducible root systems. Coxeter-Dynkin diagrams are a graphical shorthand way of representing the Cartan matrix. (The terminology Coxeter-Dynkin diagram is a combination of Coxeter graphs from group theory and Dynkin diagrams from Lie algebras.) Coxeter-Dynkin diagrams corresponding to the Cartan matrices in Fig. 4 are presented in Fig. 5 (Humphreys, 1972). The nodes correspond to the simple roots of the system and are numbered in the order that they appear in the Cartan matrix. The number of lines connecting the $i$th and $j$th nodes represents $(r_i, r_j)$, called the Cartan integers, ignoring the negative sign. Nonsymmetric values have an arrow which points to the larger of the two, with the larger value being entered in the position indicated by reading the nodes in the direction of the arrow.

FIG. 5. Coxeter-Dynkin diagrams for several root systems (Humphreys, 1972).

Rather than just defining all of the root systems of interest, what we would like to do now is "back up" and show how root systems can be constructed using some of the properties that we have obtained for root systems. We consider a subset $R$ of $\mathbb{R}^n$ and take the following four properties as axioms which must be satisfied in order for $R$ to be a root system (Humphreys, 1972):
1. $R$ is finite, spans $\mathbb{R}^n$, and does not contain 0;
2. If $\alpha \in R$ then the only multiples of $\alpha$ in $R$ are $\pm\alpha$;
3. If $\alpha \in R$ the reflection $T_\alpha$ leaves $R$ invariant;
4. If $\alpha, \beta \in R$ then $2\,\alpha \cdot \beta / \alpha \cdot \alpha = (\beta, \alpha)$ = negative integer.
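The enumeration behind Table I can be reproduced mechanically from Eq. (62). The following is a minimal sketch, not taken from the references: the function name `cartan_integer_pairs` is hypothetical, and the exclusion of pairs such as (1, 4) rests on axiom 2, since those pairs would make one root twice the other.

```python
import math
from fractions import Fraction

def cartan_integer_pairs():
    """Enumerate integer pairs (m, m') with 0 <= m*m'/4 <= 1, as in Eq. (62)."""
    rows = []
    for m in range(-4, 5):
        for mp in range(-4, 5):
            if (m == 0) != (mp == 0):
                continue  # m and m' are both multiples of r_i . r_j
            if not 0 <= m * mp <= 4:
                continue  # Eq. (62); m*m' >= 0 also forces equal signs
            if m * mp == 4 and abs(m) != 2:
                continue  # (1,4)-type pairs mean r_j = +/-2 r_i (axiom 2)
            cos_theta = math.copysign(math.sqrt(m * mp) / 2, m)
            angle = round(math.degrees(math.acos(cos_theta)))
            # m / m' equals the squared length ratio |r_j|^2 / |r_i|^2
            ratio_sq = Fraction(m, mp) if mp else None
            rows.append((m, mp, angle, ratio_sq))
    return rows

if __name__ == "__main__":
    for m, mp, angle, ratio_sq in cartan_integer_pairs():
        print(m, mp, angle, ratio_sq)
```

Restricting the output to negative $m$ gives the angles 120°, 135°, 150°, and 180°; the first three are the obtuse inter-root angles quoted for simple roots.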
The algorithm for constructing a root system simply consists of starting with a set of roots and adding roots that satisfy the axioms listed above. Generally, the procedure terminates because we find that $(\beta, \alpha) > 0$ in axiom 4. Using the four axioms and Table I, we demonstrate the process by constructing root spaces of rank 2 from the root space of rank 1. Starting with the roots $\pm\alpha$ we add a root $\beta$ at 60°. As $m$ and $m'$ are both 1 or -1, $|\beta| = |\alpha|$. If $\beta$ is a root so is $-\beta$. Now we can use reflection to obtain the root $\beta'$. This is the reflection of $\beta$ with respect to the plane normal to $\alpha$. Along with $-\beta'$, we now have six roots in our root space. This construction is shown in Fig. 6. It turns out that we cannot add any further roots without violating the axioms. This particular root system is known as $A_2$, where the subscript denotes the dimension, and the root system represents the Lie algebra $A_2$. If we consider $A_2$ as a subspace of $\mathbb{R}^3$, we can see some interesting algebraic symmetry (Gilmore, 1974). From Fig. 7 the roots can be written as
$$\pm(e_i - e_j), \quad i \ne j;\ i, j = 1, 2, 3. \tag{64}$$
This representation can be generalized to obtain the root systems $A_n$ in $n$ dimensions. That is, the root system $A_n$ in $n$ dimensions is made up by
$$\pm(e_i - e_j), \quad i \ne j;\ i, j = 1, 2, \ldots, n + 1. \tag{65}$$
FIG. 6. The root system $A_2$.

FIG. 7. The root system $A_2$ imbedded into $\mathbb{R}^3$.

Another root system can be obtained from the root space of rank 1 if we add a root $\beta$ at an angle of 45° to the root $\alpha$. The length of this root is either $\sqrt{2}$ or $1/\sqrt{2}$ of the length of $\alpha$. If we let the length of $\beta$ be $|\alpha|\sqrt{2}$, the rest of the root system can be obtained through reflection as shown in Fig. 8. The roots are therefore
$$\pm e_1,\ \pm e_2,\ \pm e_1 \pm e_2. \tag{66}$$
This can be generalized to obtain the roots in $n$ dimensions as
$$\pm e_i \pm e_j;\ \pm e_i, \quad i \ne j;\ i, j = 1, 2, \ldots, n. \tag{67}$$

FIG. 8. The root system $B_2$.

This root system is denoted as $B_n$, and represents the family of Lie algebras $B_n$. If instead of taking the length of $\beta$ to be $|\alpha|\sqrt{2}$, we take it to be $|\alpha|/\sqrt{2}$, we obtain a different root system, $C_n$ (Fig. 9), whose roots in $n$ dimensions are given by
$$\pm e_i \pm e_j,\ \pm 2e_i; \quad i \ne j;\ i, j = 1, 2, \ldots, n. \tag{68}$$
Adding a root with an angle of 90° to the root space of rank 1 gives us the $D_2$ system (Fig. 10), consisting of two mutually orthogonal sets of roots. These roots can be written in $n$ space as $\pm e_i \pm e_j$. Alternatively, if we add a root with angle 30° to $\alpha$ and complete the root system through reflections, we get the system $G_2$ (Fig. 11). This root system cannot be generalized to higher dimensions. These are all of the root systems of rank 2. Apart from the root systems developed thus far, namely, $A_n$, $B_n$, $C_n$, $D_n$, and $G_2$, there are only four other distinct root systems, $F_4$, $E_6$, $E_7$, and $E_8$. The root system for $F_4$ contains the following nonzero roots (Gilmore, 1974),
$$F_4:\ \pm e_i \pm e_j,\ \pm 2e_i,\ \pm e_1 \pm e_2 \pm e_3 \pm e_4, \quad i \ne j;\ i, j = 1, 2, 3, 4, \tag{69}$$
while for $E_n$ ($n = 6, 7, 8$), the nonzero roots are
$$E_n:\ \pm e_i \pm e_j,\ i \ne j;\ i, j = 1, \ldots, n - 1, \qquad \tfrac{1}{2}(\pm e_1 \pm e_2 \pm \cdots \pm e_{n-1}) \pm \sqrt{2 - \tfrac{n-1}{4}}\; e_n, \tag{70}$$
where there are an even number of + signs in the first six terms for $E_6$ and $E_7$, and for all eight terms in $E_8$. The quantity under the square root indicates clearly that the $E_n$ series ends at $E_8$.

FIG. 9. The root system $C_2$.

FIG. 10. The root system $D_2$.

FIG. 11. The root system $G_2$.
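The construction by repeated reflection, and the Cartan integers of Eq. (63), can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, exact rational arithmetic is an implementation choice, and the bases used (for example $e_1 - e_2$ and $-2e_1 + e_2 + e_3$ for $G_2$, written in the plane of zero coordinate sum) are standard choices assumed here rather than taken from the text.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reflect(beta, alpha):
    """Reflect beta in the hyperplane normal to alpha (exact arithmetic)."""
    c = Fraction(2 * dot(beta, alpha), dot(alpha, alpha))
    return tuple(b - c * a for b, a in zip(beta, alpha))

def generate_roots(simple_roots):
    """Close a base under reflections in its own members."""
    base = [tuple(Fraction(x) for x in r) for r in simple_roots]
    roots = set(base)
    frontier = list(base)
    while frontier:
        beta = frontier.pop()
        for alpha in base:
            new = reflect(beta, alpha)
            if new not in roots:
                roots.add(new)
                frontier.append(new)
    return roots

def cartan_matrix(simple_roots):
    """Entry [i][j] is (r_i, r_j) = 2 r_i . r_j / ||r_j||^2, as in Eq. (63)."""
    return [[int(Fraction(2 * dot(ri, rj), dot(rj, rj))) for rj in simple_roots]
            for ri in simple_roots]

# Assumed bases for three rank-2 systems.
A2 = [(1, -1, 0), (0, 1, -1)]
B2 = [(1, -1), (0, 1)]
G2 = [(1, -1, 0), (-2, 1, 1)]

if __name__ == "__main__":
    for name, base in (("A2", A2), ("B2", B2), ("G2", G2)):
        print(name, len(generate_roots(base)), cartan_matrix(base))
```

With these bases the closure produces 6, 8, and 12 roots respectively, in agreement with the rank-2 constructions described above.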
It is sometimes possible to simplify the generation of a root system by starting with a lower dimensional, say $(n - 1)$-dimensional, root system. For example, if the root system of $A_7$ is known and $\{r_2, \ldots, r_8\}$ is a base for $A_7$, since $A_7$ is a subgroup of $E_8$ and the 28 positive roots of $A_7$ are among the roots of $E_8$, we can start with the larger set of roots to find the roots of $E_8$. This procedure is illustrated in Grove and Benson (1985). Table II summarizes our results on root systems and lists the number of roots in each system (Gilmore, 1974). For purposes of tying many of the concepts in this section together, let us show how to start with a root system and generate the Cartan matrix. In particular, we consider the $A_2$ root system. Figure 6 shows the $A_2$ root system with the base $(\alpha, \beta')$ marked with heavy lines, so
$$(\alpha, \beta') = \frac{2\,\alpha \cdot \beta'}{\beta' \cdot \beta'} = \frac{2\,|\alpha|\,|\beta'| \cos 120^\circ}{|\beta'|^2}. \tag{71}$$
By construction $|\alpha| = |\beta'|$, therefore $(\alpha, \beta') = -1$, and similarly $(\beta', \alpha) = -1$. So the Cartan matrix for $A_2$ is
$$\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}. \tag{72}$$
To get the Cartan matrix for $A_n$ we look at the base $e_i - e_j$, $j = i + 1$, $i = 1, \ldots, n$,
TABLE II
ROOT SYSTEMS, ROOTS, AND CARDINALITY (GILMORE, 1974)

Root system    Number of roots    Roots
A_n            n^2 + n            ±(e_i − e_j), 1 ≤ i < j ≤ n + 1
B_n            2n^2               ±e_i ± e_j, ±e_i
C_n            2n^2               ±e_i ± e_j, ±2e_i
D_n            2n(n − 1)          ±e_i ± e_j
F_4            48                 ±e_i ± e_j, ±2e_i, ±e_1 ± e_2 ± e_3 ± e_4
E_8            240                ±e_i ± e_j, 1 ≤ j < i ≤ 8; (1/2) Σ u_i e_i, u_i = ±1, even number of positive u_i
E_7            126                ±e_i ± e_j, 1 ≤ i ≠ j ≤ 6; ±√2 e_7; (1/2)(±e_1 ± ... ± e_6) ± (1/√2) e_7, even number of + signs
E_6            72                 ±e_i ± e_j, 1 ≤ i ≠ j ≤ 5; (1/2)(±e_1 ± e_2 ± ... ± e_5) ± (√3/2) e_6, even number of + signs
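and order them
$$e_1 - e_2,\ e_2 - e_3,\ \ldots,\ e_n - e_{n+1}. \tag{73}$$
Then since
$$(e_i - e_{i+1},\ e_k - e_{k+1}) = \begin{cases} 2, & i = k, \\ -1, & |i - k| = 1, \\ 0, & \text{otherwise,} \end{cases} \tag{74}$$
we get a Cartan matrix which has 2 on the diagonal and -1 on the adjacent off-diagonals,
$$A_n:\ \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & & & \ddots & -1 \\ 0 & 0 & \cdots & -1 & 2 \end{pmatrix}, \tag{75}$$
which checks with Fig. 4. By examining the root systems, Cartan matrices, and Coxeter-Dynkin diagrams, it is possible to discern relationships between the various root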
systems. In Fig. 5, the Coxeter-Dynkin diagrams are shown only for certain restricted values of $n$, since these are the only values that give distinct root systems. For example, $D_2$ has the roots $\pm e_1 \pm e_2$, which, since $A_1$ has the roots $\pm e_1$, is equivalent to the direct product $A_1 \times A_1$. Thus, $D_2$ is reducible and is better studied in terms of its components, namely two $A_1$ systems. Also, examining the roots for $B_2$ and $C_2$ in Eqs. (66) and (68), respectively, we see that they are related by a rotation through 45° and a scale change by $\sqrt{2}$, so $B_2$ and $C_2$ are said to be equivalent, which is expressed by $B_2 \cong C_2$. In three dimensions, we can write the roots of $D_3$ and construct its Cartan matrix, which turns out to be the same as that of $A_3$, so $D_3 \cong A_3$. We can also draw conclusions concerning dual lattices if we recall the discussion between Eqs. (49) and (50), which says that the matrix of basis vectors for the dual lattice is the inverse transposed of the matrix of basis vectors for the original lattice, and thus that the Cartan matrices of a symmetric lattice and its dual are inverses. Beginning with $A_1$, it is immediate that $A_1^* \cong A_1$. Now, the inverse of the Cartan matrix for $A_2$ is
$$\begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}, \tag{76}$$
which is the Cartan matrix for $A_2^*$. Clearly the matrix in Eq. (76) is just a scaled version of the $A_2$ matrix (with the sign of the off-diagonal terms changed, which corresponds to a different choice of basis), so $A_2^* \cong A_2$. It is also evident that since $A_3 \cong D_3$, then $A_3^* \cong D_3^*$. More tedious calculations also reveal that $D_4 \cong D_4^*$ and $E_8 \cong E_8^*$, among other relations. We may associate a lattice with a quadratic form (Barnes and Wall, 1959; Gameckii, 1962)
$$\sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}\, x_i x_j, \tag{77}$$
where
$$g_{ij} = a_i \cdot a_j \tag{78}$$
and the $a_i$ are basis vectors of the lattice. The normalized Cartan matrix is the matrix of the $g_{ij}$ components, which are sometimes called metrics. From these metrics, it is a straightforward task to obtain a set of basis vectors for the lattice using a Gram-Schmidt procedure (Sayood, Gibson and Rost, 1984). Being able to generate such a basis set is often extremely useful when designing and studying lattice quantizers. We develop this procedure here. Let $A$ be the basis set of the lattice $\Lambda$ where
$$A = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \tag{79}$$
There are an infinite number of basis sets which satisfy the inner product requirements in the metric matrix. Each of these basis sets corresponds to a different orientation of the lattice in $\mathbb{R}^n$. To obtain a basis set, we first have to fix the orientation of the lattice in space. One way to do this is to require that $A$ be a triangular matrix. If we force $A$ to be a lower triangular matrix, this forces the first basis vector $a_1$ to be in the direction of $e_1$. To see how this leads to obtaining the basis vectors, write the matrix $A$ as
$$A = \begin{pmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix}. \tag{80}$$
Now
$$a_1 \cdot a_1 = g_{11}, \tag{81}$$
but
$$a_1 \cdot a_1 = a_{11}^2. \tag{82}$$
So from Eqs. (81) and (82),
$$a_{11} = \sqrt{g_{11}}, \tag{83}$$
and the basis vector $a_1$ is completely specified. Now to obtain the basis vector $a_2$ we use the equations resulting from
$$a_2 \cdot a_1 = g_{21}, \tag{84}$$
$$a_2 \cdot a_2 = g_{22}, \tag{85}$$
or
$$a_{11} a_{21} = g_{21}, \qquad a_{21}^2 + a_{22}^2 = g_{22}. \tag{86}$$
Thus
$$a_{21} = \frac{g_{21}}{a_{11}}, \qquad a_{22} = \left(g_{22} - a_{21}^2\right)^{1/2}. \tag{87}$$
Proceeding in this fashion we can obtain a general algorithm for finding the basis vectors, which is summarized below.
Step 1: Obtain the inner products $g_{ij}$;
Step 2: $j = 1$; $a_{11} = \sqrt{g_{11}}$;
Step 3: $j = j + 1$; for $i = 1, \ldots, j - 1$: $a_{ji} = \left(g_{ji} - \sum_{k=1}^{i-1} a_{jk} a_{ik}\right) / a_{ii}$;
Step 4: $a_{jj} = \left(g_{jj} - \sum_{k=1}^{j-1} a_{jk}^2\right)^{1/2}$;
Step 5: If $j < n$ go to Step 3.
In this algorithm, we assume that the sum when the upper limit is less than the lower limit is zero. As an example of using this algorithm, consider the $A_2$ lattice. The inner products are given by the normalized version of the Cartan matrix in Fig. 4, so
Step 1: $g_{11} = g_{22} = 1$, $g_{12} = g_{21} = -\tfrac{1}{2}$.
Step 2: $j = 1$; $a_{11} = 1$.
Step 3: $j = 2$; $a_{21} = -\tfrac{1}{2}$.
Step 4: $a_{22} = (1 - \tfrac{1}{4})^{1/2} = \tfrac{\sqrt{3}}{2}$.
Thus the basis vectors are
$$a_1 = (1, 0), \qquad a_2 = \left(-\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right). \tag{88}$$
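Steps 1 through 5 amount to a Cholesky-type factorization of the metric matrix. A minimal sketch in Python follows; the function name `basis_from_metric` is hypothetical, indices are zero-based, and the metric is assumed to be symmetric and positive definite so that every square root is real.

```python
import math

def basis_from_metric(g):
    """Return a lower triangular matrix A (rows = basis vectors) with A A^T = g."""
    n = len(g)
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Step 3: off-diagonal entries of row j, left to right
        for i in range(j):
            s = sum(a[j][k] * a[i][k] for k in range(i))
            a[j][i] = (g[j][i] - s) / a[i][i]
        # Steps 2 and 4: diagonal entry of row j
        a[j][j] = math.sqrt(g[j][j] - sum(a[j][k] ** 2 for k in range(j)))
    return a

if __name__ == "__main__":
    # Normalized Cartan matrix of A_2, as in the worked example
    for row in basis_from_metric([[1.0, -0.5], [-0.5, 1.0]]):
        print(row)
```

For the $A_2$ metric this reproduces the basis of Eq. (88), with $\sqrt{3}/2$ appearing as a floating-point value.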
Note that depending upon the form of the metric matrix used, the enumeration of the basis vectors may be different. Confusion in comparing lattices can be avoided by transforming the matrix of basis vectors into minimal canonical form, which is unique within a scale factor for a given lattice (Cohn, 1962). We will have occasion to use this procedure in a later section. Note again that we can get basis vectors for a dual lattice by taking the inverse of the metric matrix transposed and then generating the basis vectors, or by taking the inverse transposed of the basis vector matrix for the original lattice.

There are many other ways to construct lattices in addition to using the root lattices. One particularly intuitive approach is to stack (or pack) spheres in $n$-dimensional space (Conway and Sloane, 1982c). We begin by considering the densest "sphere" packing possible in one dimension, which consists of spheres (line segments) of radius 1/2 centered at the integers $Z^1$. To go to a two-dimensional lattice, we extend these one-dimensional spheres to two dimensions by drawing circles of radius 1/2 centered at the integers. We then form another layer of two-dimensional spheres with radius 1/2 that is identical to the first layer, which we fit into the gaps of the first layer. These gaps or slots are often called deep holes. A deep hole is a point in $\mathbb{R}^n$ that is at the maximum distance possible from a lattice point. If we continue to stack two-dimensional spheres in the preceding fashion, we get the $A_2$ lattice packing. In three dimensions, the circles centered at $A_2$ are extended to three-dimensional spheres with radius 1/2 and an identical layer of spheres is placed in the holes of the first layer, and the process is continued. The resulting lattice packing is equivalent to $D_3$. As we continue into higher dimensions, we take an appropriate $(n - 1)$-dimensional lattice and place a copy as close to the first layer as possible to get an $n$-dimensional lattice packing. In higher dimensional spaces, intuition escapes us, and in fact, in higher dimensions, this layering approach can yield more than one lattice (Conway and Sloane, 1982c). Lattices produced by this sphere packing procedure are called laminated lattices, and in dimensions 1 through 8, the laminated lattice packings are unique and coincide, respectively, with the lattices $\Lambda_1 \cong Z^1$, $A_2$, $A_3 \cong D_3$, $D_4$, $D_5$, $E_6$, $E_7$, and $E_8$. In dimension 16, the laminated lattice is unique and is equivalent to the Barnes-Wall lattice $\Lambda_{16}$ (Barnes and Wall, 1959), and in 24 dimensions, the unique laminated lattice is the Leech lattice $\Lambda_{24}$ (Leech, 1967). There is another sequence of lattices, denoted by $K_i$, $i = 1, 2, \ldots, 24$, which can be obtained from suitable cross-sections of the Leech lattice and which are the same as the laminated lattices for dimensions $n \le 6$ and $18 \le n \le 24$, but are not laminated lattices for $7 \le n \le 17$ (Conway and Sloane, 1982c). One of these lattices, $K_{12}$, was discovered by Coxeter and Todd (Coxeter and Todd, 1953). Since the Coxeter-Todd lattice $K_{12}$, the Barnes-Wall lattice $\Lambda_{16}$, and the Leech lattice $\Lambda_{24}$ are the densest known sphere packings in their respective dimensions and they have been useful for quantization, we develop these lattices a little further here. Let $a_1, a_2, \ldots, a_n$ be row vectors that are a basis for lattice $\Lambda_n$ in $\mathbb{R}^n$. Define the generator matrix $G$ for the lattice as
$$G = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \tag{89}$$
so that for any $x \in \Lambda_n$,
$$x = cG, \quad \text{where } c \in Z^n. \tag{90}$$
A generator matrix for the Coxeter-Todd lattice $K_{12}$ is shown in Table III (Conway and Sloane, 1984). Another generator matrix for $K_{12}$ is given in Sloane (1981). We note without proof that $K_{12} \cong K_{12}^*$, that is, $K_{12}$ is equivalent to its dual, and in the next section we study its importance for lattice quantization. The Barnes-Wall lattice is also equivalent to its dual, $\Lambda_{16} \cong \Lambda_{16}^*$, and a generator matrix for $\Lambda_{16}$ is given in Table IV (Sloane, 1981). The Leech lattice is equal to its dual, $\Lambda_{24} = \Lambda_{24}^*$, and a generator matrix for $\Lambda_{24}$ is shown in Table V (Sloane, 1981). There are numerous other representations for $K_{12}$, $\Lambda_{16}$, and $\Lambda_{24}$, depending upon the particular purpose at hand, and we mention others in Section VII on fast quantization algorithms. For simplicity of our development here, we leave these alternative representations to the literature (Sloane, 1981; Conway and Sloane, 1984; Conway and Sloane, 1982c).
TABLE III
A GENERATOR MATRIX FOR THE COXETER-TODD LATTICE, $K_{12} \cong K_{12}^*$ (CONWAY AND SLOANE, 1984)

TABLE IV
A GENERATOR MATRIX FOR THE BARNES-WALL LATTICE, $\Lambda_{16} \cong \Lambda_{16}^*$ (SLOANE, 1981)

TABLE V
A GENERATOR MATRIX FOR THE LEECH LATTICE, $\Lambda_{24} = \Lambda_{24}^*$ (SLOANE, 1981)
VI. LATTICE QUANTIZER DESIGN

Recall from previous sections that an $N$-dimensional VQ is completely determined by specifying the partitions $P_i$ and the output vectors $Y_i$, $i = 1, 2, \ldots, L$, such that the $P_i$ are nonoverlapping and completely cover $\mathbb{R}^N$. Two necessary conditions that must be satisfied for an optimal VQ are given in Eqs. (22) and (23) of Section IV. Given one of the lattices in Section V, we design a lattice VQ by letting the lattice points be the output points $Y_i$, and we construct Voronoi regions or Dirichlet partitions about these lattice points according to Eq. (22). Note that under the assumption that the $N$-dimensional input pdf $f(X)$ is uniform, once we construct the Voronoi region around a lattice point, the output points (lattice points) are guaranteed to be the centroids of their respective regions (see Eq. (23)). Consider the following examples. For the $Z^1$ lattice the Voronoi regions are simply unit-length intervals along the real line centered on each lattice point. Clearly, this partition covers $\mathbb{R}^1$ and the output points are centroids of their
FIG. 12. Voronoi regions and output points for the $Z^2$ lattice.
respective regions. The $Z^2$ lattice has Voronoi regions that are squares with the lattice/output points at the center of each square, as shown in Fig. 12. Similarly, the output points and Voronoi regions for the $A_2$ lattice are shown in Fig. 13. The hexagonal partitions in Fig. 13 can also be obtained by considering $A_2$ as a sphere packing in $\mathbb{R}^2$. In fact, $A_2$ is the laminated lattice packing in $\mathbb{R}^2$, as mentioned in Section V. The hexagonal partitions can be generated from the laminated lattice by connecting the deep holes. Alternatively, the spheres in the laminated lattice packing can be expanded in
FIG. 13. Voronoi regions and output points for the $A_2$ lattice.
radius until all of $\mathbb{R}^2$ is just covered, which, incidentally, occurs just when the spheres pass through the deep holes. The expanded spheres overlap, but if we draw chords through each overlapping region, we get the hexagonal covering in Fig. 13. Figure 14 demonstrates the last two interpretations based upon the laminated lattice packing (Sloane, 1984). In three dimensions, some interesting Voronoi regions are the Voronoi region for the $A_3^* \cong D_3^*$ lattice, called a truncated octahedron, shown in Fig. 15, and the Voronoi region for the $A_3 \cong D_3$ lattice, called a rhombic dodecahedron, shown in Fig. 16. The Voronoi regions are shown inscribed in a cube so that the locations of the surrounding lattice (output) points, shown as dark dots, can be easily discerned. There is an output point at the center of each Voronoi region as indicated, and in each figure there is one lattice (output) point behind the Voronoi region that is suppressed (not seen) to avoid confusion. The lattice points for the $A_3^* \cong D_3^*$ quantizer in Fig. 15 are at the vertices of the cube and at the center of the cube, while in Fig. 16, the output points are at the midpoints of the edges of the cube and at the center of the cube (Gersho, 1979; Conway and Sloane, 1982a and 1984; Barnes and Sloane, 1983; Joshi, 1977). The $A_3^*$ lattice is sometimes called the body centered cubic lattice, and the $A_3$ lattice is sometimes called the face centered cubic lattice (Conway and Sloane, 1982a and 1984; Barnes and Sloane, 1983). The origin of these names is not transparent, but the terms come from the following interpretation (Coxeter, 1961). Consider a simple lattice in $\mathbb{R}^3$ whose points have only even coordinates. In fact, it is instructive to consider the cube with vertices at (0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (2, 2, 0), (2, 0, 2), (0, 2, 2), and (2, 2, 2). The center of a face in this cube has two odd coordinates and these points along with the point at the origin are equivalent to the $A_3 \cong D_3$ lattice, thus the name "face centered cubic lattice." Further, the center of the cube or "body"
FIG. 14. Generating the hexagonal covering of $\mathbb{R}^2$ from the laminated lattice sphere packing (Sloane, 1984).

FIG. 15. Voronoi region and surrounding output points for the $A_3^* \cong D_3^*$ lattice quantizer.

FIG. 16. Voronoi region and surrounding output points for the $A_3 \cong D_3$ lattice quantizer.
has three odd coordinates and these points (using all other cubes nearest the origin) along with the origin are equivalent to the $A_3^* \cong D_3^*$ lattice, motivating the nomenclature "body centered cubic lattice." It is also of interest to note that the term Brillouin zone is sometimes used. The Brillouin zone of a lattice $\Lambda$ is the Voronoi region of the dual or reciprocal lattice $\Lambda^*$. Thus, the Brillouin zone of $A_3$ is a truncated octahedron, which is the Voronoi region of the $A_3^*$ lattice; and similarly, the Brillouin zone of the $A_3^*$ lattice is the rhombic dodecahedron, which is the Voronoi region of the $A_3$ lattice. We finally point out that the Voronoi region of a lattice, which is a nearest neighbor or Dirichlet partition, may also be called a Wigner-Seitz cell for the same lattice (Coxeter, 1961).

Another way to construct uniform quantizers that is consistent with the finite reflection group technique for generating the root lattices and that has considerable intuitive appeal is to cover space with copies of an admissible polytope. This approach, due to Gersho (1979), is particularly insightful in $\mathbb{R}^2$ and $\mathbb{R}^3$. A convex polytope $P$ is said to tile or cover space or to generate a tessellation of $\mathbb{R}^N$ if a partition of $\mathbb{R}^N$ exists where all regions are congruent to the basic polytope $P$. Furthermore, Gersho defines the class of admissible polytopes as those polytopes which tessellate (or tile or cover) $\mathbb{R}^N$ and which constitute a Dirichlet partition with respect to the centroids of each region. Thus, admissible polytopes in $\mathbb{R}^2$ are the equilateral triangle, the rectangle, and the regular hexagon. In three dimensions, the space-filling polytopes are the cube (Fig. 17), the hexagonal prism (Fig. 18), the rhombic dodecahedron (Fig. 19), the elongated dodecahedron (Fig. 20), and the truncated octahedron (Fig. 21) (Gersho, 1979; Lyusternik, 1963). Of course, the cube is the Voronoi region for the $Z^3$ lattice, the rhombic dodecahedron is the Voronoi region for the $A_3 \cong D_3$ lattice, and the truncated octahedron is the Voronoi region for the $A_3^* \cong D_3^*$ lattice. It is more difficult to conceive admissible polytopes in $\mathbb{R}^N$ for $N > 3$, but Gersho points out that one method is to form cross-products of lower dimensional polytopes. For example, $Z^N$ is the $N$th cross product of the
FIG. 17. The cube in $\mathbb{R}^3$.

FIG. 18. The hexagonal prism.

FIG. 19. The rhombic dodecahedron.

FIG. 20. The elongated dodecahedron.
interval, and the hexagonal prism is the cross product of the regular hexagon in $\mathbb{R}^2$ and the interval. A polytope in $\mathbb{R}^5$ can be generated by the cross product of the regular hexagon in $\mathbb{R}^2$ and the truncated octahedron in $\mathbb{R}^3$. From Section V recall that lattices built from direct products of lower dimensional lattices are called reducible and have a dimension equal to the sum of the dimensions of the lower dimensional systems. A lattice that cannot be written as a direct product of lower dimensional systems is called irreducible. Carrying this nomenclature over to the admissible polytopes, we could say that the cube is reducible but that the truncated octahedron is irreducible. The descriptions of the Voronoi regions of the root lattices and their duals and the $K_{12}$, $\Lambda_{16}$, and $\Lambda_{24}$ lattices are far from trivial. In fact, the Voronoi regions for $E_6^*$, $E_7^*$, $K_{12}$, $\Lambda_{16}$, and $\Lambda_{24}$ are not known, but Conway and Sloane (1982a) have determined the Voronoi regions for the lattices $A_n$ ($n \ge 1$),
FIG. 21. The truncated octahedron.
$A_n^*$ ($n \ge 1$), $D_n$ ($n \ge 3$), $D_n^*$ ($n \ge 3$), $E_6$, $E_7$, and $E_8 = E_8^*$. The method they used requires a description of the fundamental simplex for the affine Weyl group of the lattice. This development would take us further into group theory and into Lie algebras, and hence, for simplicity, we do not pursue the description of Voronoi regions for lattices further here. Fortunately, in order to implement a lattice quantizer we do not need to know its Voronoi region; we need only be able to determine which lattice point a given vector is nearest to according to Eq. (22), which we can do. A knowledge of the Voronoi regions is useful for theoretical performance predictions of lattice quantizers, however, as given by Conway and Sloane (1982a).

Therefore, given a particular lattice, we can use it for vector quantization by simply calculating the nearest lattice point for a particular input vector. However, we are not quite finished with designing a lattice VQ, since the lattices are generally infinite and we are only interested in $L$ output points, even though $L$ may be large. We must therefore restrict the number of lattice points that are possible output points. One possible selection rule is to generate a codebook with $L$ vectors that has a minimum peak energy, where peak energy is defined as the squared distance of the output point (lattice point or code vector) furthest from the origin. This minimum peak energy rule entails filling the codebook (choosing allowable lattice points) with $L$ points from the innermost shells of the lattice, where a shell or layer consists of all points that fall a fixed distance from the origin. The number of lattice points in each shell is available from the coefficients in the theta function for a given lattice. Sloane (1981) has found the theta functions for the $A_n$, $D_n$, $D_n^*$, $E_n$, $K_{12}$, $\Lambda_{16}$, and $\Lambda_{24}$ lattices and has tabulated the number of points in the innermost shells for the $A_2$, $D_3$, $D_3^*$, $D_4$, $E_6$, $E_7$, $E_8$, $K_{12}$, $\Lambda_{16}$, and $\Lambda_{24}$ lattices. As an example, the $A_2$ hexagonal lattice with its five innermost shells is shown in Fig. 22.
FIG. 22. The five innermost shells for the $A_2$ lattice containing 1, 6, 6, 6, and 12 points (Sloane, 1981).
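The shell populations quoted in the caption of Fig. 22 can be checked by direct enumeration rather than from the theta function. The sketch below assumes the basis $a_1 = (1, 0)$, $a_2 = (-\tfrac{1}{2}, \tfrac{\sqrt{3}}{2})$ of Eq. (88), for which the squared norm of the lattice point $i a_1 + j a_2$ is $i^2 - ij + j^2$; the function name `a2_shells` and the search bound `max_coeff` are illustrative choices.

```python
from collections import Counter

def a2_shells(num_shells=5, max_coeff=6):
    """Count A_2 lattice points by squared distance from the origin.

    With |i|, |j| <= max_coeff every shell of squared radius up to
    3 * max_coeff**2 / 4 is enumerated completely, which covers the
    innermost shells requested here.
    """
    counts = Counter()
    for i in range(-max_coeff, max_coeff + 1):
        for j in range(-max_coeff, max_coeff + 1):
            counts[i * i - i * j + j * j] += 1
    return sorted(counts.items())[:num_shells]

if __name__ == "__main__":
    for norm_sq, n_points in a2_shells():
        print(norm_sq, n_points)
```

A minimum peak energy codebook with $L$ points would then be filled shell by shell, starting from the origin, until $L$ points have been chosen.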
Another lattice codebook selection rule, due to Conway and Sloane (1983), is based on the Voronoi region of a lattice point. Specifically, for a lattice $\Lambda$ in $\mathbb{R}^N$, we define the Voronoi region around a lattice point $Y_i$ as
$$V(Y_i) = \{x \in \mathbb{R}^N : \|x - Y_i\| \le \|x - Y_j\| \ \text{for all } Y_j \in \Lambda\},$$
and we let $V(0)$ denote the Voronoi region around the origin. We call $V(0)$ the Voronoi region of the lattice. For the positive integers $r = 1, 2, 3, \ldots$, we let $V_r$ denote the Voronoi region for the lattice $r\Lambda$, so that the Voronoi region of $\Lambda$ is magnified $r$ times. Recall that the volume of the Voronoi region is equal to the determinant of the lattice, so we note that $V_r$ has the volume $\det(r\Lambda) = d(r\Lambda) = r^N d(\Lambda)$. Thus, $V_r$ has $r^N$ times the volume of $V(0)$, and since $V(0)$ contains one lattice point, $V_r$ contains $r^N$ lattice points. Therefore, we define a Voronoi code with $L = r^N$ codewords (or output points) as all vectors $x - a$ for $x \in \Lambda \cap (a + V_r)$ for some vector $a \in \mathbb{R}^N$. We denote the Voronoi code by $C_\Lambda(r, a)$ and discard all lattice points not in the code. The vector $a$ is included to prevent lattice points from falling on the boundary. For some Euclidean code in $\mathbb{R}^N$, $C = \{x_1, \ldots, x_L\}$, define the centroid
$$\bar{x} = \frac{1}{L} \sum_{i=1}^{L} x_i$$
and average energy
where $d$ is the minimum distance between codewords (lattice points). It is desirable to choose $a$ so that $C$ has the smallest average energy, which, since the Voronoi codes usually have their centroid at $a$, gives $\bar{x} = a$. Figure 23 illustrates the construction of the Voronoi code $C_{A_2}(4, a)$ with $a = (-\tfrac{1}{2}, 0)$. The Voronoi region $V_4$ for the lattice $4A_2$ is shown by the hexagonal dashed line. The Voronoi code consists of all lattice points falling within the solid hexagonal line, which are the $L = 4^2 = 16$ points with circles around them. Conway and Sloane (1983) present more details on this example and construct Voronoi codes based on the $D_4$ and $E_8$ lattices.

These two approaches are rather ad hoc and it seems that better quantizers could be obtained by a more refined selection rule. Unfortunately, our choice of selection rules is limited by the requirement that we maintain the regular structure of the lattice. In the next section we show how this regular structure is used to obtain fast encoding algorithms for lattice quantizers.
FIG. 23. The Voronoi code $C_{A_2}(4, a)$ with $a = (-\tfrac{1}{2}, 0)$ within the solid hexagon (Conway and Sloane, 1983).
VII. FAST QUANTIZATION ALGORITHMS

Because of the regular structure of lattices, it is relatively easy to obtain fast quantization algorithms for lattice quantizers. Conway and Sloane have developed fast quantization algorithms for the vector quantizers based upon the root lattices (1982b) and upon several other important lattices (1984), while Sayood, Gibson, and Rost (1984) have described a general algorithm for use with large codebook lattice quantizers. Several of these fast quantization algorithms are presented in this section.

The algorithms of Conway and Sloane (1982b) are based upon noticing the specific properties of the root lattices. For example, the $D_n$ lattice points are an integral combination of vectors of the form $\pm e_i \pm e_j$. This implies that the sum of the coordinates of each lattice point is always even. Furthermore, if we overlay the $D_n$ lattice on a rectangular lattice, we find that the $D_n$ lattice consists of those points of the rectangular lattice whose coordinate sum is even. Thus, to obtain the closest $D_n$ lattice point to an input vector, all we have to do is find the closest point of the rectangular lattice with an even coordinate sum. The defining characteristic of the $A_n$ lattice is that if we embed it in an $(n + 1)$-dimensional space, the lattice points are integral combinations of the vectors of the form $e_i - e_{i+1}$, $i = 1, 2, \ldots, n$. This means that the coordinate sum of the lattice points is zero. Therefore, if we project the $n$-dimensional input point onto the hyperplane $\sum x_i = 0$, then all we have to do is find the closest point with integral coordinates on this hyperplane. Relying on similar properties, fast quantizing algorithms for the other root lattices can be found. Before describing these algorithms more carefully, we define a few special functions which simplify the presentation of the algorithms (Conway and
LATTICE QUANTIZATION
305
Sloane, 1982b). For a real number x , define f ( x ) = closest integer to x ,
so f ( 0 . 7 ) = 1 and f(-0.3) = 0. In case of a tie, we pick the integer with the smaller magnitude. The difference between x and f ( x ) is denoted by 6 ( x )= x
-
f(x).
We also define the function w ( x ) as w(x) = the next closest integer to x,
so w(0.7) = 0 and w( -0.3) = - 1. In case of a tie, we pick the integer with the larger magnitude. For a vector x E a",we define the vector version of these functions in a componentwise fashion as f ( x ) = (f(Xl),f(X2),....f(Xn)). Now, given x E W", let k be an integer such that
and and define the function g ( x ) by g ( x ) = (f(x,),f(x2),.. ~ ~ w ( x ~ L . ~ . > . f ( x n ) ) .
Thus, if x = (1.7,-0.4,O.Q f ( x ) = (2,0, l), 6(x) = (-0.3,-0.4, -0.2), and w ( x ) = (1, - l,O), and hence, g ( x ) = (2, - 1,l). The specific algorithms for the root lattices and their duals are developed in the following. Dn
As noted previously, to find the closest point in the $D_n$ lattice, we find the closest point in the rectangular lattice whose coordinate sum is even. By construction, the functions $f(x)$ and $g(x)$ provide the closest and the next closest points in the rectangular lattice, and their coordinate sums differ by exactly one. Therefore, one of them will have an even coordinate sum while the other will have an odd coordinate sum. To find the closest point in the $D_n$ lattice, we simply pick the one with the even coordinate sum. For example, if $x = (1.2, 0.9, -0.8, 0.3)$, then $f(x) = (1, 1, -1, 0)$, $w(x) = (2, 0, 0, 1)$, $\delta(x) = (0.2, -0.1, 0.2, 0.3)$, and $g(x) = (1, 1, -1, 1)$. The coordinate sum of $f(x)$ is 1 and the coordinate sum of $g(x)$ is 2, so the lattice quantizer output vector is

$$y = Q(x) = (1, 1, -1, 1).$$
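The rounding functions and the $D_n$ rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `f`, `w`, `g`, and `quantize_dn` mirror the symbols in the text, while the handling of an exactly integer argument in `w` (where both neighbours tie) is an assumption made here, since the text does not cover that case.

```python
import math

def f(x):
    """Closest integer to x; ties go to the integer of smaller magnitude."""
    lo = math.floor(x)
    frac = x - lo
    if frac < 0.5:
        return lo
    if frac > 0.5:
        return lo + 1
    return lo if abs(lo) < abs(lo + 1) else lo + 1

def w(x):
    """Next closest integer to x; ties go to the integer of larger magnitude."""
    lo = math.floor(x)
    if x == lo:
        # Assumption: for an exact integer both neighbours tie, so move away from zero.
        return lo + 1 if x >= 0 else lo - 1
    return lo + 1 if f(x) == lo else lo

def g(x):
    """f(x) with the worst-rounded coordinate (first index on ties) replaced by w."""
    fx = [f(t) for t in x]
    dev = [abs(t - ft) for t, ft in zip(x, fx)]
    k = dev.index(max(dev))          # first index attaining the largest |delta|
    gx = list(fx)
    gx[k] = w(x[k])
    return gx

def quantize_dn(x):
    """Closest D_n point: whichever of f(x), g(x) has an even coordinate sum."""
    fx = [f(t) for t in x]
    return fx if sum(fx) % 2 == 0 else g(x)

y = quantize_dn([1.2, 0.9, -0.8, 0.3])
```

Called on the example vector of the text, `quantize_dn` follows the same path as the worked example: `f(x)` has an odd coordinate sum, so `g(x)` is returned.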
JERRY D. GIBSON A N D KHALID SAYOOD
$A_n$

The first step when quantizing with the $A_n$ lattice is to map the $n$-dimensional input point onto the plane $\sum_{i=1}^{n+1} x_i = 0$ in $(n+1)$-dimensional space. The lattice points in this plane are integral combinations of the basis vectors $e_i - e_{i+1}$, $i = 1, 2, \ldots, n$, and all that is needed to accomplish the quantization operation is to find the closest point on the plane with integer coordinates. The final step is to transform the output lattice point from the plane in $n+1$ dimensions back into $n$ dimensions.

The $1 \times n$ input vector $x$ is projected into the plane $\sum_{i=1}^{n+1} x_i = 0$ in $n+1$ dimensions by post-multiplying $x$ by an $n \times (n+1)$ matrix $P$ to obtain the $1 \times (n+1)$ vector $x'$. The transformation matrix $P$ can be found as follows. Let $U$ be an $n \times n$ matrix whose rows are basis vectors for the $A_n$ lattice in $n$ dimensions, and simply assume that $U$ is in minimal canonical form, so

$$U = [u_1 \; u_2 \; \cdots \; u_n]^T. \tag{94}$$

The matrix $U$ can be obtained from the Cartan matrix for $A_n$ using the Gram-Schmidt procedure described in Sec. V (Sayood, Gibson and Rost, 1984). Also let the basis vectors for the lattice in the plane $\sum_{i=1}^{n+1} x_i = 0$ be the rows of the matrix

$$X' = [x'_1 \; x'_2 \; \cdots \; x'_n]^T, \tag{95}$$

then $U$ and $X'$ are related by

$$U P = X' \tag{96}$$

where $P$ is the desired transformation matrix. Since $U$ is square and invertible, we have from Eq. (96) that

$$P = U^{-1} X'. \tag{97}$$

Note that due to the construction of $U$ and since the rows of $X'$ are a basis for the $A_n$ lattice (in $n+1$ dimensions), we know that $U U^T = k X'(X')^T$, where $k = 1$ or $k = \tfrac12$ depending upon whether the Cartan matrix is unnormalized or normalized, respectively, before $U$ is found. An example will clarify this point shortly. Once we have $x'$, the procedure for finding the closest point of $A_n$ to $x'$ is (see Conway and Sloane, 1982b):

Step 1: Calculate $f(x')$ and $\Delta = \sum_{i=1}^{n+1} f(x'_i)$.

Step 2: Sort the $x'_i$ in order of increasing value of $\delta(x'_i)$, so that

$$-\tfrac12 \le \delta(x'_{i_1}) \le \delta(x'_{i_2}) \le \cdots \le \delta(x'_{i_{n+1}}) \le \tfrac12,$$

where $x'_{i_1}$ has the smallest $\delta(x'_i)$, $x'_{i_2}$ has the next smallest $\delta(x'_i)$, and so on.

Step 3: If $\Delta = 0$, $f(x')$ is the closest point of $A_n$ to $x'$. If $\Delta > 0$, the closest point is obtained by subtracting 1 from the $\Delta$ components $f(x'_{i_1}), \ldots, f(x'_{i_\Delta})$. If $\Delta < 0$, the closest point is obtained by adding 1 to the $|\Delta|$ components $f(x'_{i_{n+1}}), \ldots, f(x'_{i_{n+\Delta+2}})$.

Step 4: Project the lattice point back into $n$ dimensions.

This procedure works because $f(x')$ is the closest point of $Z^{n+1}$ to $x'$, and if $f(x')$ does not fall on the plane $\sum_i x_i = 0$, then Step 3 finds that point on the plane with integer coordinates which changes the norm of $f(x')$ the least. The following two examples illustrate the entire process for vector quantization with the $A_2$ and $A_3$ lattices, respectively.

Example 1:
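Here we wish to quantize a two-dimensional vector with the $A_2$ lattice. We must first find the matrix $P$ which, in turn, requires that we know $U$ and $X'$. Given the Cartan matrix for the $A_2$ lattice (Fig. 4; Humphreys, 1972),

$$G = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix},$$

we normalize the matrix such that the diagonal terms are unity, so that it becomes

$$\begin{bmatrix} 1 & -\tfrac12 \\ -\tfrac12 & 1 \end{bmatrix},$$

and then use the Gram-Schmidt procedure in Sec. V to obtain

$$U = \begin{bmatrix} 1 & 0 \\ -\tfrac12 & \tfrac{\sqrt{3}}{2} \end{bmatrix}. \tag{99}$$

Since the basis vectors for the lattice in $n+1$ dimensions are differences of unit vectors of the form $e_i - e_{i+1}$, $i = 1, 2, \ldots, n$ (taken here as $e_1 - e_3$ and $e_3 - e_2$, which generate the same lattice and reproduce the numerical values below), we can write

$$X' = \begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \end{bmatrix} \tag{100}$$

and hence, from Eqs. (96) and (97), we can calculate

$$P = U^{-1} X' = \begin{bmatrix} 1 & 0 & -1 \\ \tfrac{1}{\sqrt{3}} & -\tfrac{2}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}. \tag{101}$$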
We note that $U U^T = \tfrac12 X'(X')^T$, where the factor of $\tfrac12$ results since $U$ was obtained from a normalized version of the Cartan matrix; without the normalization of $G$, we would have found that $U U^T = X'(X')^T$ (see Conway and Sloane, 1982b). We are now ready to quantize a given two-vector. Let the vector to be quantized be $x = (1, 1)$. First, this vector is transformed to the plane $\sum_{i=1}^{n+1} x_i = 0$ in $(n+1)$-dimensional space by post-multiplying by $P$ to yield

$$x' = (1.577, -1.155, -0.423). \tag{102}$$

Figure 24 gives a geometric view of what is being accomplished by the transformation matrix $P$. In Fig. 24(a), the $A_2$ lattice points nearest the origin in $\mathbb{R}^2$ are shown, while Fig. 24(b) shows the plane onto which $\mathbb{R}^2$ is projected by the matrix $P$. From Step 1 of the procedure, we have

$$f(x') = (2, -1, 0) \tag{103}$$

with

$$\Delta = \sum_{i=1}^{3} f(x'_i) = +1. \tag{104}$$

In Step 2 we calculate

$$\delta(x') = x' - f(x') = (-0.423, -0.155, -0.423) \tag{105}$$

which upon ordering the components gives

$$\delta(x'_1) \le \delta(x'_3) \le \delta(x'_2). \tag{106}$$

Since $\delta(x'_1) = \delta(x'_3)$ and $\Delta > 0$, we could subtract 1 from either $f(x'_1)$ or $f(x'_3)$. Subtracting 1 from $f(x'_1)$ yields the closest lattice point in the plane, $(1, -1, 0)$, which we then multiply by $\tfrac12 P^T$ to find

$$\tfrac12\,(1, -1, 0)\begin{bmatrix} 1 & \tfrac{1}{\sqrt{3}} \\ 0 & -\tfrac{2}{\sqrt{3}} \\ -1 & \tfrac{1}{\sqrt{3}} \end{bmatrix} = \left(\tfrac12, \tfrac{\sqrt{3}}{2}\right). \tag{107}$$

Thus, the quantized value of $x = (1, 1)$ is

$$y = Q(x) = \left(\tfrac12, \tfrac{\sqrt{3}}{2}\right). \tag{108}$$
FIG. 24. The $A_2$ lattice in $\mathbb{R}^2$ and $\mathbb{R}^3$.
What if we had subtracted 1 from $f(x'_3)$ instead of $f(x'_1)$? Then the closest lattice point in the plane would have been $(2, -1, -1)$, which when mapped back into two dimensions is $\left(\tfrac32, \tfrac{\sqrt{3}}{2}\right)$. Note that the Euclidean distances between $x$ and this last vector and between $x$ and $y = Q(x)$ in Eq. (108) are the same, and thus, either point is an acceptable output vector. The input vector $(1, 1)$ is marked by an "X" in Fig. 24(a), where it is evidently equidistant from $\left(\tfrac12, \tfrac{\sqrt{3}}{2}\right)$ and $\left(\tfrac32, \tfrac{\sqrt{3}}{2}\right)$.

An alternative quantization procedure for the $A_2$ lattice is to note that it is the union of two rectangular lattices, so that a vector can be quantized by finding the closest point in each of the rectangular sublattices and then finding which of the two candidates is closer by direct calculation (Gersho, 1982). This is a simple example of the "union of cosets" approach described shortly in the subsection on $E_n$.
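Steps 1 to 4 of the $A_n$ procedure can be sketched as follows for the two-dimensional example. This is a hedged illustration, not the authors' implementation: the helper names `closest_an_point`, `vec_mat`, and `transpose` are hypothetical, and the matrix `P` is the one obtained from $P = U^{-1}X'$ for Example 1 (an assumption of this sketch). Because $\delta(x'_1)$ and $\delta(x'_3)$ tie in this example, floating-point rounding decides which of the two equally good lattice points is returned.

```python
import math

def f(x):
    """Closest integer to x; ties go to the integer of smaller magnitude."""
    lo = math.floor(x)
    frac = x - lo
    if frac < 0.5:
        return lo
    if frac > 0.5:
        return lo + 1
    return lo if abs(lo) < abs(lo + 1) else lo + 1

def closest_an_point(xp):
    """Steps 1-3: closest integer point with zero coordinate sum to xp."""
    fx = [f(t) for t in xp]                      # Step 1: round and sum
    delta = sum(fx)
    # Step 2: indices sorted by increasing deviation x'_i - f(x'_i)
    order = sorted(range(len(xp)), key=lambda i: xp[i] - fx[i])
    if delta > 0:                                # Step 3: repair the coordinate sum
        for i in order[:delta]:                  # smallest deviations are lowered
            fx[i] -= 1
    elif delta < 0:
        for i in order[delta:]:                  # largest deviations are raised
            fx[i] += 1
    return fx

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def transpose(m):
    return [list(col) for col in zip(*m)]

S3 = math.sqrt(3.0)
# Assumed projection matrix for Example 1 (normalized Cartan matrix, so P P^T = 2 I).
P = [[1.0, 0.0, -1.0], [1 / S3, -2 / S3, 1 / S3]]

x = [1.0, 1.0]
x_plane = vec_mat(x, P)                          # point on the plane sum = 0
lattice_point = closest_an_point(x_plane)        # integer point on the plane
y = [0.5 * t for t in vec_mat(lattice_point, transpose(P))]   # Step 4
```

Sorting by the signed deviation rather than its magnitude is the point of Step 2: coordinates that were rounded up the most are the cheapest to lower, and those rounded down the most are the cheapest to raise.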
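Example 2:

As a slightly more complicated example, we consider a vector quantization problem using the $A_3$ lattice. The Cartan matrix for $A_3$ with the diagonal terms normalized to one is

$$\begin{bmatrix} 1 & -\tfrac12 & 0 \\ -\tfrac12 & 1 & -\tfrac12 \\ 0 & -\tfrac12 & 1 \end{bmatrix}, \tag{109}$$

which, based upon the Gram-Schmidt procedure in Sec. V, allows us to generate

$$U = \begin{bmatrix} 1 & 0 & 0 \\ -\tfrac12 & \tfrac{\sqrt{3}}{2} & 0 \\ 0 & -\tfrac{1}{\sqrt{3}} & \sqrt{\tfrac23} \end{bmatrix}. \tag{110}$$

The $X'$ matrix is formed from the basis vectors for the lattice in the plane $\sum_i x'_i = 0$ as

$$X' = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}. \tag{111}$$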
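From Eqs. (110) and (111), we find $P$ to be

$$P = U^{-1} X' = \begin{bmatrix} 1 & -1 & 0 & 0 \\ \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & -\tfrac{2}{\sqrt{3}} & 0 \\ \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & -\tfrac{3}{\sqrt{6}} \end{bmatrix}. \tag{112}$$

Now suppose that we wish to quantize the vector $x = \left(1, -\tfrac{\sqrt{3}}{2}, \tfrac{\sqrt{3}}{2}\right)$. Post-multiplying by $P$ gives

$$x' = xP = (0.854, -1.146, 1.354, -1.061), \tag{113}$$

which has $\sum_i x'_i = 0$ (to within rounding), and hence falls on the plane defined by the rows of $X'$ in Eq. (111). From Step 1, we find

$$f(x') = (1, -1, 1, -1) \tag{114}$$

and

$$\Delta = 0. \tag{115}$$

Step 2 yields

$$\delta(x') = x' - f(x') = (-0.146, -0.146, 0.354, -0.061), \tag{116}$$

the components of which are ordered as

$$\delta(x'_1) \le \delta(x'_2) \le \delta(x'_4) \le \delta(x'_3). \tag{117}$$

Note, however, that since $\Delta = 0$, Eqs. (116) and (117) are not needed, and $f(x')$ in Eq. (114) is the nearest $A_3$ lattice point in the plane. The output vector in three dimensions is obtained by post-multiplying $f(x')$ by $\tfrac12 P^T$ (the factor $\tfrac12$ arising, as in Example 1, from the normalized Cartan matrix), giving

$$y = \tfrac12 f(x') P^T = \left(1, -\tfrac{1}{\sqrt{3}}, \sqrt{\tfrac23}\right) \approx (1, -0.577, 0.816). \tag{118}$$

We note that since $A_3 \cong D_3$, we could have also used the $D_3$ lattice, which has the fast algorithm previously described.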
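As a brief aside, the scale factor needed when a lattice point found in the plane is mapped back to $n$ dimensions can be derived from the relation $U U^T = k X'(X')^T$ stated earlier. The following is only a sketch under that relation and the definition $P = U^{-1}X'$; the symbol $y'$ for the lattice point in the plane and $y$ for its $n$-dimensional counterpart are introduced here for illustration.

```latex
\begin{align*}
P P^{T} &= U^{-1} X' (X')^{T} U^{-T}
         = U^{-1}\left(\tfrac{1}{k}\, U U^{T}\right) U^{-T}
         = \tfrac{1}{k}\, I_n , \\
y' = y P \;&\Longrightarrow\; y' P^{T} = \tfrac{1}{k}\, y
         \;\Longrightarrow\; y = k\, y' P^{T} .
\end{align*}
```

With the normalized Cartan matrix ($k = \tfrac12$) the back-projection is therefore $\tfrac12 P^T$, while for the unnormalized matrix ($k = 1$) it is simply $P^T$.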
$E_n$

There are numerous possible representations of the lattices $E_6$, $E_7$, and $E_8$, but the definitions given here have been the most useful for finding fast quantizing algorithms (Conway and Sloane, 1982b). We begin with the $E_8$ lattice since $E_6$ and $E_7$ are subspaces of $E_8$. The $E_8$ lattice is the union of the $D_8$ lattice and the coset

$$\left(\tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12\right) + D_8.$$
Note that this last coset is often written in the form

$$\left(\left(\tfrac12\right)^8\right) + D_8$$

where the superscript on the $\tfrac12$ means that the $\tfrac12$ is repeated eight consecutive times. The definition of $E_7$ follows very simply since $E_7$ is a subspace of dimension 7 in $E_8$ specified by the points $(x_1, x_2, \ldots, x_8) \in E_8$ with $x_7 = -x_8$. Similarly, $E_6$ is defined as a subspace of dimension 6 in $E_8$ with $(x_1, x_2, \ldots, x_8) \in E_8$ and $x_6 = x_7 = -x_8$. For the purposes of finding a fast quantization algorithm, the alternative definition of $E_7$ in terms of $A_7$ given by

$$E_7 = A_7 \cup \left(\left(\left(-\tfrac12\right)^4, \left(\tfrac12\right)^4\right) + A_7\right)$$

has been used.

Since we already have fast algorithms for $D_8$ and $A_7$, and $E_8$ and $E_7$ can be written as the union of cosets of $D_8$ and $A_7$, respectively, we can use the following approach (Conway and Sloane, 1982b). If $\Phi(x)$ is the algorithm for finding the closest point of the lattice $\Lambda$ to a point $x$, then the closest point of the coset $r + \Lambda$ to $x$ is

$$\Phi(x - r) + r. \tag{120}$$

For a union of cosets, which is the situation for $E_8$ and $E_7$, we simply find the closest point of each coset to $x$, and then of these candidates, select that point as the output vector which is closest to $x$ by direct calculation of Euclidean distance. More explicitly, if a lattice $\mathcal{L}$ can be expressed as a union of $d$ cosets of the lattice $\Lambda$, so

$$\mathcal{L} = \bigcup_{i=1}^{d} (r_i + \Lambda), \tag{121}$$

for each coset we can find the candidate output points

$$y_i = \Phi(x - r_i) + r_i. \tag{122}$$

The closest output vector is that $y_i$ such that

$$\|x - y_i\| \le \|x - y_j\| \tag{123}$$

for all $i \ne j$.

Example 3:
We wish to find the closest point of $E_8$ to the input vector

$$x = (0.1, 0.1, 0.8, 1.3, 2.2, -0.6, -0.7, 0.9). \tag{124}$$
We first find the point in $D_8$ closest to $x$ by the algorithm demonstrated previously for $D_n$. Thus, we compute

$$f(x) = (0, 0, 1, 1, 2, -1, -1, 1)$$

and

$$g(x) = (0, 0, 1, 1, 2, 0, -1, 1),$$

and since the sum of the components of $g(x)$ is even, $y_1 = g(x)$. Next we must find the closest $D_8$ lattice point to $x - \left(\left(\tfrac12\right)^8\right)$, so we compute

$$f\left(x - \left(\left(\tfrac12\right)^8\right)\right) = (0, 0, 0, 1, 2, -1, -1, 0)$$

and

$$g\left(x - \left(\left(\tfrac12\right)^8\right)\right) = (-1, 0, 0, 1, 2, -1, -1, 0).$$

Therefore,

$$y_2 = g\left(x - \left(\left(\tfrac12\right)^8\right)\right) + \left(\left(\tfrac12\right)^8\right) = (-0.5, 0.5, 0.5, 1.5, 2.5, -0.5, -0.5, 0.5).$$

By direct calculation,

$$\|x - y_1\|^2 = 0.65$$

and

$$\|x - y_2\|^2 = 0.95,$$

so

$$y_1 = (0, 0, 1, 1, 2, 0, -1, 1)$$

is the closest point of $E_8$ to $x$.

For $E_7$ we can use the same approach in conjunction with the previously described quantization algorithm for $A_7$. We do not give an example here, since the procedure is similar to that just demonstrated (Conway and Sloane, 1982b).

$D_n^*$

Like many of the lattices discussed in this chapter, the lattice $D_n$ is contained in its dual $D_n^*$, and thus $D_n^*$ can be written as a union of cosets of $D_n$ as

$$D_n^* = \bigcup_{i=1}^{4} (r_i + D_n) \tag{127}$$
where $r_1 = (0^n)$, $r_2 = \left(\left(\tfrac12\right)^n\right)$, $r_3 = (0^{n-1}, 1)$, and $r_4 = \left(\left(\tfrac12\right)^{n-1}, -\tfrac12\right)$.

With Eq. (127), a fast quantization algorithm for $D_n^*$ can be developed using the algorithms for $D_n$ and the method for lattices that are the union of cosets as described in the section on $E_n$. However, an alternate definition of $D_n^*$ leads to a faster algorithm. Designating the rectangular lattice by $Z^n$, the $D_n^*$ lattice is given by (Conway and Sloane, 1982b)

$$D_n^* = Z^n \cup \left(\left(\left(\tfrac12\right)^n\right) + Z^n\right). \tag{128}$$

Comparing Eqs. (127) and (128), it is evident that the latter expression will yield a faster algorithm since there are only two cosets and quantizing with $Z^n$ is easier than with $D_n$. Illustrative examples of the two methods are given in Conway and Sloane (1982b). We present here only the simpler method for the example of the body-centered cubic lattice $D_3^*$.

Example 4:

From Eq. (128), the $D_3^*$ lattice is the union of two cosets of $Z^3$ which have the coset representatives $r_1 = (0, 0, 0) = (0^3)$ and $r_2 = \left(\tfrac12, \tfrac12, \tfrac12\right) = \left(\left(\tfrac12\right)^3\right)$. Given the input vector $x = (0.2, 0.5, 0.8)$, we first find $y_1 = f(x) = (0, 0, 1)$, and then compute $f(x - r_2) = f((-0.3, 0, 0.3)) = (0, 0, 0)$ so $y_2 = f(x - r_2) + r_2 = (0.5, 0.5, 0.5)$. Calculating the distance between $x$ and $y_1$,

$$\|x - y_1\|^2 = 0.33 \tag{129}$$

and between $x$ and $y_2$,

$$\|x - y_2\|^2 = 0.18, \tag{130}$$

we conclude that $y_2 = (0.5, 0.5, 0.5)$ is the closest lattice point of $D_3^*$ to $x$ (Conway and Sloane, 1982b).

$A_n^*$
The lattice $A_n$ is also contained in its dual $A_n^*$, and hence $A_n^*$ can be written as a union of cosets of $A_n$,

$$A_n^* = \bigcup_{i=1}^{n+1} (r_i + A_n) \tag{131}$$
where

$$r_i = \left(\left(\frac{i-1}{n+1}\right)^{j}, \left(\frac{-j}{n+1}\right)^{i-1}\right), \tag{132}$$

$i = 1, 2, \ldots, n+1$, and $j = n + 2 - i$. A fast quantization algorithm for $A_n^*$ thus consists of the fast algorithm for $A_n$ and the "union of cosets" technique already used for $E_n$ and $D_n^*$. Note, however, that for $A_n^*$, the number of cosets, and hence the number of lattice points to be compared by direct calculation, grows linearly with $n$, the number of dimensions.

$E_n^*$

The dual lattice $E_8^* = E_8$, so a new algorithm is not required. For the dual $E_7^*$, we note that

$$E_7^* = \bigcup_{i=1}^{4} (s_i + A_7) \tag{133}$$

where

$$s_i = \left(\left(\frac{-2(i-1)}{8}\right)^{2j}, \left(\frac{2j}{8}\right)^{2(i-1)}\right), \tag{134}$$

$i + j - 1 = 4$. It is evident that the nearest point of $E_7^*$ to an input vector $x$ can thus be found from the algorithm for $A_7$ and the union of cosets approach.

Fast quantizing algorithms for the Coxeter-Todd ($K_{12}$), Barnes-Wall ($\Lambda_{16}$), and Leech ($\Lambda_{24}$) lattices also exist and are developed in Conway and Sloane (1984). All of these fast algorithms make use of the union of cosets method.
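The union of cosets rule of Eqs. (120) to (123) can be sketched generically and checked against Examples 3 and 4. This is an illustrative sketch only: `quantize_union` and its argument names are hypothetical, and the base quantizers are the simple $Z^n$ and $D_n$ rules described earlier, re-implemented here so that the block is self-contained.

```python
import math

def f(x):
    """Closest integer to x; ties go to the integer of smaller magnitude."""
    lo = math.floor(x)
    frac = x - lo
    if frac < 0.5:
        return lo
    if frac > 0.5:
        return lo + 1
    return lo if abs(lo) < abs(lo + 1) else lo + 1

def w(x):
    """Next closest integer to x (assumed to move away from zero for integer x)."""
    lo = math.floor(x)
    if x == lo:
        return lo + 1 if x >= 0 else lo - 1
    return lo + 1 if f(x) == lo else lo

def quantize_zn(x):
    """Closest point of the rectangular lattice Z^n."""
    return [f(t) for t in x]

def quantize_dn(x):
    """Closest point of D_n: f(x) if its coordinate sum is even, else g(x)."""
    fx = [f(t) for t in x]
    if sum(fx) % 2 == 0:
        return fx
    dev = [abs(t - ft) for t, ft in zip(x, fx)]
    k = dev.index(max(dev))
    fx[k] = w(x[k])
    return fx

def quantize_union(x, base_quantizer, reps):
    """Closest point of the union of cosets r + Lambda, r in reps, to x."""
    best, best_d2 = None, None
    for r in reps:
        shifted = [a - b for a, b in zip(x, r)]
        cand = [a + b for a, b in zip(base_quantizer(shifted), r)]   # Eq. (122)
        d2 = sum((a - b) ** 2 for a, b in zip(x, cand))
        if best is None or d2 < best_d2:                             # Eq. (123)
            best, best_d2 = cand, d2
    return best, best_d2

# Example 3: E_8 as D_8 and its coset shifted by (1/2)^8.
x8 = [0.1, 0.1, 0.8, 1.3, 2.2, -0.6, -0.7, 0.9]
y_e8, d2_e8 = quantize_union(x8, quantize_dn, [[0] * 8, [0.5] * 8])

# Example 4: D_3^* as Z^3 and its coset shifted by (1/2)^3.
x3 = [0.2, 0.5, 0.8]
y_d3, d2_d3 = quantize_union(x3, quantize_zn, [[0] * 3, [0.5] * 3])
```

The cost is one call to the base quantizer plus one distance computation per coset representative, which is why the number of cosets dominates the running time for the larger lattices discussed next.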
$K_{12}$

The $K_{12}$ lattice has a sublattice that is isomorphic to $A_2^6$ with 64 coset representatives. Thus, the fast algorithm requires that we find the candidate output point for each coset and then compare the given input vector to all 64 candidates to find the closest one.

$\Lambda_{16}$

The Barnes-Wall lattice $\Lambda_{16}$ has a sublattice $2D_{16}$ with 32 coset representatives that are the codewords of the [16, 5, 8] first-order Reed-Muller code. Using the fast algorithm for $D_n$ from Conway and Sloane (1982b), we generate the 32 candidate points from which we choose the closest by direct calculation (Conway and Sloane, 1984). For simplicity, we do not attempt to develop the Reed-Muller codes here (see MacWilliams and Sloane, 1977).

$\Lambda_{24}$

The Leech lattice $\Lambda_{24}$ has a sublattice $4D_{24}$ with 8192 coset representatives of the form $2c$ and $2c + u$, where $u = (-3, 1, 1, \ldots, 1)$ and $c$ ranges over the vectors of the [24, 12, 8] Golay code (MacWilliams and Sloane, 1977). Therefore, for this lattice, 8192 direct distance calculations must be performed for each input vector, which yields a relatively slow quantizing algorithm (Conway and Sloane, 1984).
VIII. PERFORMANCE COMPARISONS

Performance evaluations for lattice quantizers are based upon the conjectures mentioned in Section IV which imply that the Voronoi regions of good lattice quantizers are those which best approximate a sphere in $\mathbb{R}^N$. Furthermore, since the best covering of $\mathbb{R}^N$ is a dense packing of nonoverlapping spheres, we may find good lattice quantizers by looking for dense sphere packings in $\mathbb{R}^N$ where the sphere centers are the lattice points.

The sphere packing interpretation is very useful for gaining insight into the problem by examining spaces with dimension $N \le 3$. In one dimension the densest lattice packing is called $Z^1$ with the lattice points corresponding to the integers. As shown in Fig. 25(a), the "spheres" are line segments of unit length, and the entire space is covered by nonoverlapping spheres. A lattice packing in two dimensions is $Z^2$, as shown in Fig. 25(b), which has spheres centered at every point in the plane with integer coordinates. The nonoverlapping spheres clearly do not cover $\mathbb{R}^2$. Another two-dimensional lattice packing is the hexagonal or triangular lattice packing, denoted by $L_2$ or $A_2$ and illustrated in Fig. 25(c). This packing is constructed by forming one layer of spheres with centers at the integers along the horizontal axis and then adding a layer of spheres that fits in the "slots" of the first layer. The third layer, like the first layer, has sphere centers that are integers in the x-coordinate, and the process is continued. The nonoverlapping spheres in $L_2$ also do not cover $\mathbb{R}^2$.

Which is the denser packing, $Z^2$ or $L_2$? The density of a lattice packing is that fraction of the space covered by spheres, and can be calculated by dividing the volume of a sphere by the volume of space nearer to its center than any other center. Thus, for $Z^1$ the density is 1, for $Z^2$ the density is $\pi/4 \approx 0.7854$, and for $L_2$ the density is $\pi/\sqrt{12} \approx 0.9069$. The denser sphere packing is therefore $L_2$.
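The density calculation just described (sphere volume divided by the volume of the region nearest a lattice point) can be sketched numerically. The sketch below assumes two standard facts that are not spelled out at this point in the text: the volume of the unit sphere in $N$ dimensions is $\pi^{N/2}/\Gamma(N/2+1)$, and the volume of the nearest-neighbor region equals the absolute determinant of a matrix whose rows are lattice basis vectors. The function names are hypothetical.

```python
import math

def unit_ball_volume(n):
    """Volume of the unit sphere in n dimensions (standard formula)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                      # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return d

def packing_density(generator, rho):
    """Fraction of space covered by spheres of radius rho centered on the lattice."""
    n = len(generator)
    return unit_ball_volume(n) * rho ** n / abs(det(generator))

density_z1 = packing_density([[1.0]], 0.5)
density_z2 = packing_density([[1.0, 0.0], [0.0, 1.0]], 0.5)
density_l2 = packing_density([[1.0, 0.0], [-0.5, math.sqrt(3.0) / 2.0]], 0.5)
```

The radius 0.5 is half the minimum distance between lattice points in all three cases, which is the largest radius for which the spheres do not overlap.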
A dense sphere packing is not guaranteed to yield a good quantizer, and hence it is necessary to calculate the distortion associated with each lattice when used as a quantizer. A lattice quantizer can be constructed from the lattice packings $Z^2$ and $L_2$ by forming Voronoi, or nearest neighbor, regions about each lattice point (sphere center), which is the output point for the particular region of interest. The Voronoi regions are squares in Fig. 25(b) and hexagons in Fig. 25(c). To find the MSE per dimension, we simply find the average squared error between the output point and all other points in the region. For the $Z^1$, $Z^2$, and $L_2$ quantizers, the MSE per dimension can be readily evaluated as $\tfrac{1}{12} = 0.08333\ldots$, $\tfrac{1}{12} = 0.08333\ldots$, and $5/(36\sqrt{3}) = 0.0801875\ldots$, respectively, assuming a uniform input distribution (Makhoul, Roucos and Gish, 1985; Conway and Sloane, 1982a).

FIG. 25. Sphere packings in one and two dimensions (Sloane, 1984). (a) One dimension: $Z^1$. (b) Two dimensions: $Z^2 = D_2$. (c) Two dimensions: $A_2$ ($L_2$).

Although it is simple to calculate the MSE for quantizers in one and two dimensions, the calculation becomes increasingly difficult in higher dimensions. The structure of lattices provides assistance in these cases. The basis vectors of a lattice $\Lambda$ can be selected in many different ways, and so there is tremendous flexibility in specifying a lattice quantizer. The generator matrix for $\Lambda$ is defined as the $N \times N$ matrix
$$M = [a_1 \; a_2 \; \cdots \; a_N]^T, \tag{135}$$

whose rows are the basis vectors $a_1, \ldots, a_N$, and the determinant of $\Lambda$ is

$$\det \Lambda = (\det M M^T)^{1/2} = |\det M|. \tag{136}$$

The determinant of a lattice gives an indication of the amount of space represented by a single point of the lattice, so the determinant of a lattice is the volume of that lattice's Voronoi region (Conway and Sloane, 1982a). Furthermore, the density of a lattice (sphere) packing of radius $\rho$ is

$$\Delta = \frac{V_N \rho^N}{\det \Lambda} \tag{137}$$

where

$$V_N = \frac{\pi^{N/2}}{\Gamma(N/2 + 1)} \tag{138}$$

is the volume of the unit sphere in $\mathbb{R}^N$. Some of the most important lattices for VQ design are the root lattices $A_N$ ($N \ge 1$), $D_N$ ($N \ge 2$), and $E_N$ ($N = 6, 7, 8$) and their duals which yield the densest known sphere packings and coverings for $N \le 8$ (Conway and Sloane, 1982b). As an example of the calculation of the quantities in Eqs. (136) and (137), consider the two-dimensional lattice $A_2$ ($L_2$). The basis vectors for this lattice are $a_1 = (1, 0)$, $a_2 = \left(-\tfrac12, \tfrac{\sqrt{3}}{2}\right)$, so that

$$M = \begin{bmatrix} 1 & 0 \\ -\tfrac12 & \tfrac{\sqrt{3}}{2} \end{bmatrix}$$
and $\det \Lambda = \det M = \sqrt{3}/2$ (Sloane, 1981). With $\rho = 1/2$, we have from Eq. (137) that $\Delta = 0.9069$, which agrees with the earlier direct calculation. It is also easy to check that with $\rho = 1/2$, the volume of the $A_2$ lattice's Voronoi region (a hexagon with side $1/\sqrt{3}$) is $\det \Lambda = \sqrt{3}/2$. The dual lattice $A_2^*$ only differs from $A_2$ by a rotation and scale change, so these lattices are considered equivalent, which is indicated by the notation $A_2^* \cong A_2$.

To compute the MSE per dimension for lattice quantizers in higher dimensions, it is common to rely on Gersho's previously mentioned conjecture that for large $L$, the Voronoi regions of an optimal quantizer are all congruent to some polytope, say $P$, and define quantities called the volume, the unnormalized second moment, and the normalized second moment of $P$, respectively, as

$$\mathrm{vol}(P) = \int_P dx, \tag{139}$$

$$U(P) = \int_P \|x - \hat{x}\|^2 \, dx, \tag{140}$$

and

$$I(P) = \frac{U(P)}{\mathrm{vol}(P)}, \tag{141}$$

where $\hat{x}$ is the centroid of $P$. Using Eqs. (139)-(141), we can then define the dimensionless second moment of $P$, denoted $G(P)$, to be

$$G(P) = \frac{1}{N} \frac{I(P)}{\mathrm{vol}(P)^{2/N}} = \frac{1}{N} \frac{U(P)}{\mathrm{vol}(P)^{1 + 2/N}}. \tag{142}$$

Gersho (1979) calls the quantity in Eq. (142) the coefficient of quantization, but it is equivalent to the MSE per dimension for large $L$ as previously calculated for lattices in dimensions 1 and 2 under the assumption of a uniform input distribution. The connection between $G(P)$ and the MSE per dimension can also be made through a result of Zador's. If the MSE per dimension for a source with probability density $f(x)$ is

$$D^{(N)} = \frac{1}{N} \int_{\mathbb{R}^N} \|x - Q(x)\|^2 f(x) \, dx, \tag{143}$$

then under rather general assumptions on $f(x)$, Zador (1982) showed that

$$\lim_{L \to \infty} L^{2/N} D^{(N)} = G_N \left( \int_{\mathbb{R}^N} f(x)^{N/(N+2)} \, dx \right)^{(N+2)/N} \tag{144}$$

where $G_N$ does not depend upon $f(x)$. Therefore, $G_N$ is interpreted to be the minimum MSE per dimension achievable by vector quantization, and
assuming (as Gersho conjectures) that the Voronoi regions are all congruent to some polytope $P$, then

$$G_N = \min_P G(P) \tag{145}$$

where the minimum is taken over all admissible $N$-dimensional polytopes (Conway and Sloane, 1982a). Since $G_N$ does not depend upon $f(x)$, any convenient $f(x)$ can be used to find $G_N$, and hence, $f(x)$ is often chosen to be uniform. If Eq. (145) holds, then we can find $G_N$ by calculating $G(P)$ for all admissible $N$-dimensional polytopes and selecting the smallest as $G_N$. If the conjecture does not hold or if we cannot specify all possible admissible polytopes, then we still have an upper bound on $G_N$ by finding $G(P)$ for any admissible $P$.

For $N = 1$, the optimum uniform quantizer is a uniform partition of the real line and $G_1 = \tfrac{1}{12} = 0.08333\ldots$. In two dimensions there are many admissible polytopes, including all triangles, quadrilaterals, and hexagons (Gersho, 1979), but the minimum MSE per dimension is achieved by the hexagonal quantizer based upon the $A_2$ lattice, and $G_2 = 5/(36\sqrt{3}) = 0.0801875\ldots$ (Conway and Sloane, 1982a; Newman, 1982). Gersho (1979) specified five admissible polytopes in three dimensions, namely, the cube, the hexagonal prism, the rhombic dodecahedron, the elongated dodecahedron, and the truncated octahedron, and found by calculating $G(P)$ for all five polytopes that the truncated octahedron had the smallest $G(P)$ of these five, which is $0.0785433\ldots$. Table VI lists $G(P)$ for four of the admissible polytopes in $\mathbb{R}^3$. He conjectured that this value was not just an upper bound to $G_3$, but that the truncated octahedron is the optimal polytope in three dimensions so that $G_3 = 0.0785433\ldots$. For lattice quantizers, this conjecture is proved by Barnes and Sloane (1983), who show that the optimal lattice quantizer in three dimensions is based upon the body centered cubic lattice $D_3^* \cong A_3^*$, which has Voronoi regions that are truncated octahedra.

TABLE VI
$G(P)$ FOR FOUR POLYTOPES IN $\mathbb{R}^3$ (CONWAY AND SLOANE, 1982)

Polytope                  G(P)
Cube                      0.0833333...
Hexagonal Prism           0.0812227...
Rhombic Dodecahedron      0.0787451...
Truncated Octahedron      0.0785433...

As the dimension of the VQ increases, the problem centers around finding admissible space-filling polytopes and then evaluating $G(P)$. The principal approach to solving this problem has been to determine the Voronoi regions corresponding to the root lattices in each dimension, calculate $G(P)$, and select the lattice with the smallest $G(P)$ as the best known lattice quantizer of dimension $N$. Conway and Sloane (1982) have carried out this procedure for the lattices $A_N$ ($N \ge 1$), $A_N^*$ ($N \ge 1$), $D_N$ ($N \ge 3$), $D_N^*$ ($N \ge 3$), $E_6$, $E_7$, and $E_8 = E_8^*$. Neither finding the Voronoi regions for a lattice nor evaluating the corresponding $G(P)$ is necessarily simple, and different methods may have to be used for different lattices. For example, the lattice $A_N$ and its dual $A_N^*$ demand quite a separate treatment for $N > 2$. The Voronoi regions and normalized second moment are calculated by Monte Carlo integration for the $E_6^*$ and $E_7^*$ lattices in Conway and Sloane (1984).

Table VII lists the best known lattice quantizers in dimensions 1-10 along with the normalized second moment $G(P)$. Also shown in Table VII is something called the sphere bound. Zador (1982) showed that a lower bound to $G_N$ is $\frac{1}{N+2} V_N^{-2/N}$ for the squared error distortion measure, where $V_N$ is the volume of an $N$-dimensional sphere as given in Eq. (138). This lower bound is the column labeled "Sphere Bound" in Table VII. Another lower bound to $G_N$ suggested by Conway and Sloane (1985) is presented in the "Proposed Bound" column. While this bound is tighter than the sphere bound, only a plausibility argument has been given for its validity.

TABLE VII
BEST KNOWN LATTICE QUANTIZERS AND DIMENSIONLESS SECOND MOMENT $G(P)$ (CONWAY AND SLOANE, 1982 AND 1984)

N     Sphere Bound    Proposed Bound (Conway and Sloane, 1985)    G(P) of Best Lattice
1     0.0833          0.0833                                       0.0833
2     0.0796          0.0802                                       0.0802
3     0.0770          0.0779                                       0.0785
4     0.0750          0.0761                                       0.0766
5     0.0735          0.0747                                       0.0756
6     0.0723          0.0735                                       0.0742
7     0.0713          0.0725                                       0.0731
8     0.0704          0.0716                                       0.0717
9     0.0697          0.0709                                       0.0747
10    0.0691          0.0703                                       0.0747

Another way to find candidates for good vector quantizers in $N$ dimensions is to study lattices which have the densest known sphere packings. Lattices which fall in this category are the Coxeter-Todd lattice $K_{12}$ (Coxeter and Todd, 1953), the Barnes-Wall lattice $\Lambda_{16}$ (Barnes and Wall, 1959), and the Leech lattice $\Lambda_{24}$ (Leech, 1964 and 1967). Conway and Sloane (1984) use Monte Carlo integration to compute the normalized second moment for VQs based upon these lattices as $G(K_{12}) = 0.0701$, $G(\Lambda_{16}) = 0.0683$, and $G(\Lambda_{24}) = 0.0658$. The duals of these lattices are also contained in the original lattice, so $K_{12} \cong K_{12}^*$, $\Lambda_{16} \cong \Lambda_{16}^*$, and $\Lambda_{24} = \Lambda_{24}^*$, and VQs based upon these lattices are the best known quantizers in their respective dimensions. Figure 26 presents the normalized second moment $G(P)$ for several important lattice quantizers, as well as the sphere lower bound, the conjectured lower bound of Conway and Sloane (1985), and Zador's upper bound given by

$$G_N \le \frac{1}{N}\, \Gamma\!\left(1 + \frac{2}{N}\right) V_N^{-2/N}.$$
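The sphere bound column of Table VII can be checked numerically from the expression $\frac{1}{N+2} V_N^{-2/N}$ quoted in the text. The sketch below assumes the standard unit-sphere volume $V_N = \pi^{N/2}/\Gamma(N/2+1)$; the function names and the dictionary `TABLE_VII_G`, which simply copies the $G(P)$ column of the table, are introduced here for illustration. Since the tabulated values carry only four decimals, agreement is expected only to about that precision.

```python
import math

def unit_ball_volume(n):
    """Volume of the unit sphere in n dimensions (standard formula)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

def sphere_bound(n):
    """Sphere lower bound on G_N: V_N^(-2/N) / (N + 2)."""
    return unit_ball_volume(n) ** (-2.0 / n) / (n + 2.0)

# G(P) of the best known lattice quantizer in each dimension, copied from Table VII.
TABLE_VII_G = {1: 0.0833, 2: 0.0802, 3: 0.0785, 4: 0.0766, 5: 0.0756,
               6: 0.0742, 7: 0.0731, 8: 0.0717, 9: 0.0747, 10: 0.0747}

# Gap between each tabulated G(P) and the sphere bound in the same dimension.
gaps = {n: g - sphere_bound(n) for n, g in TABLE_VII_G.items()}
```

In one dimension the bound is met exactly ($1/12$), and the entries of `gaps` show how much room remains between the best known lattices and the sphere bound as the dimension grows.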
In light of the results in Fig. 26, which show that known lattice quantizers are close to the sphere bound and extremely close to the proposed bound, it is natural to inquire as to how close the performance of these quantizers is to $D(R)$. Recall the result of Gish and Pierce (1968) that the optimum entropy constrained scalar quantizer for the MSE distortion measure performs within 1.53 dB of $D(R)$ for a large number of quantization levels $L$. In comparison, for $N = 8$ and large $L$, the Gosset lattice ($E_8$) quantizer with entropy coding reduces this gap to (Makhoul, Roucos and Gish, 1985)

$$1.53 - 10 \log_{10} \frac{G(A_1)}{G(E_8)} = 0.879 \text{ dB},$$

where $G(A_1)$ and $G(E_8)$ are taken from Table VII. The reduction in the rate $R$ provided by a higher dimensional quantizer with respect to scalar quantization can be expressed as

$$\Delta R = \frac12 \log_2 \frac{G(A_1)}{G(\Lambda)} \text{ bits/sample}, \tag{147}$$

and Makhoul, Roucos, and Gish (1985) plot this quantity for many of the best available lattices through dimension $N = 24$. We can check the results obtained in Eqs. (39) and (40) by substituting $G(A_2) \approx 0.0802$ (actually, $0.0801875\ldots$) into Eq. (147) to find a rate reduction of 0.028 bits/sample for the hexagonal quantizer over the scalar quantizer.

FIG. 26. Performance comparison of several important lattice quantizers: $G(P)$ versus dimension $N$ (Conway and Sloane, 1982, 1984 and 1985).

Lattice quantizer performance results for sources other than those with a uniform distribution are relatively meager. Some performance comparisons for rates less than 2 bits/sample and Gaussian, Laplacian, and Gamma distributed sources are given in Sayood, Gibson and Rost (1984), Rost and Sayood (1984) and Rost (1984).

Of course, the most important question of all is, "How well do these lattice quantizers perform for moderate $N$ and $L$?" Now we are moving completely out of the realm of the performance analyses presented previously, since the source distribution may no longer be uniform and edge effects at the overload regions may not be negligible. There is a dearth of results for lattice quantizers with finite $N$ and $L$, with most VQ work in this range emphasizing the LBG algorithm. One particularly striking illustration of the perceptual performance improvement available with lattice VQ is provided by the results in Sayood, Gibson and Rost (1984), where the $A_N^*$ lattices are used to quantize the two-dimensional discrete cosine transform (DCT) coefficients calculated on a monochrome 256 by 256 pixel image at 0.5 bit/pixel for one, four, and eight-dimensional quantization. These results are reproduced here in Fig. 27(a) for scalar quantization, Fig. 27(b) for $A_4^*$ lattice quantization, and Fig. 27(c) for $A_8^*$ lattice quantization. The improvement from one to eight dimensions is visually striking. Figure 27(d) is the image that has been reconstructed using the same DCT coefficients as in Figs. 27(a)-(c), but without the coefficients being quantized. Comparing Figs. 27(c) and (d) reveals that the eight-dimensional quantizer is contributing very little distortion. For more details, the reader is referred to Sayood, Gibson, and Rost (1984). Since $A_4^*$ is not the optimal lattice quantizer (for uniform inputs) in four dimensions and $A_8^*$ is not the optimum lattice quantizer (for uniform inputs) in eight dimensions, it may be possible to improve on the performance shown in Fig. 27. Performance curves for the $A_N$ and $A_N^*$ ($N \ge 1$) lattices and the $D_N$ and $D_N^*$ ($N \ge 4$) lattices are shown in Conway and Sloane (1982a; 1984; 1985).

FIG. 27. A comparison of one, four, and eight dimensional lattice quantization of DCT coefficients (Sayood, Gibson and Rost, 1984). (a) Scalar quantizer. (b) Four-dimensional quantizer. (c) Eight-dimensional quantizer. (d) Ideal quantizer.
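The two high-resolution figures quoted earlier in this section, the 0.879 dB gap for $E_8$ and the 0.028 bits/sample rate reduction for the hexagonal quantizer, can be checked with a few lines of arithmetic. The sketch assumes the relations $1.53 - 10\log_{10}[G(A_1)/G(\Lambda)]$ dB for the gap and $\tfrac12 \log_2[G(A_1)/G(\Lambda)]$ bits/sample for the rate reduction, which reproduce those quoted numbers; the function names are hypothetical.

```python
import math

G_SCALAR_EXACT = 1.0 / 12.0      # G(A_1) for the uniform scalar quantizer
G_SCALAR_TABLE = 0.0833          # four-decimal value listed in Table VII
G_E8_TABLE = 0.0717              # G(E_8) from Table VII
G_A2_EXACT = 5.0 / (36.0 * math.sqrt(3.0))   # G(A_2) = 0.0801875...

def gap_to_rate_distortion_db(g_lattice, g_scalar, scalar_gap_db=1.53):
    """Assumed form: scalar gap minus the granular gain of the lattice, in dB."""
    return scalar_gap_db - 10.0 * math.log10(g_scalar / g_lattice)

def rate_reduction_bits(g_lattice, g_scalar=G_SCALAR_EXACT):
    """Assumed form of the rate reduction over scalar quantization, bits/sample."""
    return 0.5 * math.log2(g_scalar / g_lattice)

gap_e8 = gap_to_rate_distortion_db(G_E8_TABLE, G_SCALAR_TABLE)
rate_a2 = rate_reduction_bits(G_A2_EXACT)
```

Using the four-decimal table entries for the gap and the exact values for the rate reduction mirrors how the corresponding numbers appear to have been obtained in the text; with other roundings the last digit can differ.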
It is interesting to observe, as pointed out by Conway and Sloane, that for all of the results presently available, the optimal or best known lattice quantizer is the dual of the densest lattice packing (sphere packing whose sphere centers form a lattice). This goes against our intuition since it says that the best $N$-dimensional VQ is not the same as the best lattice covering of $N$-space. As specific examples, note from Table VII that $D_4^*$ and $D_5^*$ are the best quantizers in four and five dimensions, but it has been shown that $A_4^*$ and $A_5^*$ are the optimal coverings. Furthermore, the best known coverings for dimensions $N \le 23$ are based on the $A_N^*$ lattices (Ryskov and Baranovskii, 1978; Bambah and Sloane, 1982; Conway and Sloane, 1984).
IX. RESEARCH AREASAND CONNECTIONS TO OTHERFIELDS There is much current research on lattice VQs along the same lines as the work described herein; that is, work is proceeding to find the optimal lattice coverings and packings for all those cases still open, to determine the Voronoi regions of lattices, to compute G ( P ) and ultimately GN,and to discover fast encoding and decoding algorithms. Another direction of VQ research based upon lattices is to employ lattices as an encoding tool for non-lattice VQs. One such effort is motivated by the work of Sakrison (1968) alluded to in Sec. 111. Sakrison showed that for an N-vector of Gaussian i.i.d. source samples, that as N + m, the source vectors fall with high probability on the surface of an Ndimensional sphere. Thus, a good VQ would simply place its representation vectors throughout this high probability region (the sphere surface). This same concept is pursued in Fischer (1986) for a memoryless Laplacian source. In this case, the region of high probability for the source vectors is the surface of an N-dimensional hyperpyramid. Based upon this observation for N large, a finite N VQ is proposed where the output points lie on concentric hyperpyramids, but only those points which lie on the pyramid that are also points of the cubic lattice Z N are allowable representation points. Thus, the output points lie on a lattice, and a fast encoding procedure is possible (Fischer, 1986). Applications of this pyramid vector quantization approach to speech and images are given in Fischer and Malone (1985) and Tseng and Fischer (1987), respectively. Similar research in spirit has been conducted by Adoul(1986a, b) on what he called spherical vector quantizers. These quantizers are constructed from the points of the Leech lattice (A2J that fall on the shells at various radii around the origin. 
The norm (radius) and the lattice point in a shell are encoded separately, and the main ideas behind this approach are that “sphere hardening” is already taking effect in 24 dimensions and that relatively
fast quantizing algorithms are available because of the lattice structure. Another lattice-based VQ performs the encoding in two steps (Moayeri, Neuhoff and Stark, 1985). First, the source vector is finely quantized using a VQ with a fast encoding algorithm; second, a table look-up finds the codebook output point which is closest to the finely quantized vector. Yet another research area is that of multidimensional companding. Motivated by the success of logarithmic companding for the scalar quantization of speech signals, investigations are underway to utilize multidimensional companding with lattice-based uniform quantizers to produce nonuniform VQs with reduced encoding complexity (Bucklew, 1981 and 1984). A totally different research area that makes use of lattices is that of coding for reliable transmission of information over communications channels. Pertinent references for this field are Sloane (1984 and 1981), Conway and Sloane (1982b), Leech and Sloane (1971), Forney (to be published) and Forney (1984). The papers by Forney present some very interesting constructions for some of the lattices discussed here which may serve as fast algorithms for vector quantization in the near future. Further results on these topics are left to the references.
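The pyramid construction described earlier in this section, in which the output points are the points of $Z^N$ lying on a hyperpyramid, can be sketched in a few lines. The following is a minimal Python sketch under stated assumptions: the names `pyramid_count` and `pyramid_quantize` are hypothetical, and the greedy repair step is a simple heuristic for illustration rather than the encoding procedure published in Fischer (1986). The counting recurrence follows from conditioning on the value of the first coordinate.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pyramid_count(n, k):
    """Number of points x in Z^n with |x_1| + ... + |x_n| = k."""
    if k == 0:
        return 1          # only the origin
    if n == 0:
        return 0          # no coordinates left but k > 0
    return (pyramid_count(n - 1, k)
            + pyramid_count(n - 1, k - 1)
            + pyramid_count(n, k - 1))

def pyramid_quantize(x, k):
    """Map a real vector x to a nearby integer point with L1 norm k.

    Illustrative heuristic (an assumption, not the published algorithm):
    scale x onto the pyramid, round each coordinate, then add or remove
    unit steps until the L1 norm is exactly k.
    """
    norm1 = sum(abs(v) for v in x)
    if norm1 == 0:
        return [k] + [0] * (len(x) - 1)
    scaled = [k * v / norm1 for v in x]   # projection onto the pyramid surface
    y = [round(v) for v in scaled]
    while True:
        excess = sum(abs(v) for v in y) - k
        if excess == 0:
            return y
        if excess > 0:
            # Shrink the nonzero coordinate whose magnitude was rounded up the most.
            i = max((j for j in range(len(y)) if y[j] != 0),
                    key=lambda j: abs(y[j]) - abs(scaled[j]))
            y[i] -= 1 if y[i] > 0 else -1
        else:
            # Grow the coordinate whose magnitude was rounded down the most.
            i = max(range(len(y)), key=lambda j: abs(scaled[j]) - abs(y[j]))
            y[i] += 1 if scaled[i] >= 0 else -1
```

For example, `pyramid_count(3, 2)` evaluates to 18, so a three-dimensional pyramid of radius 2 could be indexed with fewer than five bits, and `pyramid_quantize([0.9, -0.2, 0.4], 4)` returns `[2, -1, 1]`.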
X. CONCLUSIONS

An introduction to vector quantization, in general, and lattice quantization, in particular, has been provided. The development presented here shows that lattice-based vector quantizers can perform arbitrarily close to the rate distortion bound as the number of dimensions becomes large and that it may be possible to avoid entropy coding of the quantizer output points with lattice quantizers. Furthermore, fast quantization algorithms are known for many important vector quantizers. On the other hand, the performance improvement provided by lattice quantizers over scalar quantization with entropy coding may be only a few tenths of a dB. What is definitely lacking, however, is enough applications of lattice quantizers to non-i.i.d. sources, such as speech and images, to be able to discern the available subjective performance gains not evident in the mathematical analyses of idealized sources. The few studies available are encouraging, but much work is needed in this area. The development in this chapter includes enough mathematical detail for the reader to be able to implement lattice quantizers for many applications and to allow judicious tradeoffs among the various lattice-based vector quantizers to be made. A mastery of the material in this chapter is a necessary background for a fruitful investigation of the literature on vector quantization and lattice quantizers; however, this material is not sufficient to conduct
research on many fundamental theoretical issues which remain unresolved in lattice quantization, such as finding the Voronoi regions of certain lattices, and a more detailed examination of the references is required to pursue this goal.

ACKNOWLEDGMENT

The authors are indebted to Dr. Thomas R. Fischer for numerous discussions concerning vector quantization over the past few years.
NOTES

1. Adapted from Table 8.1 of R. Gilmore, Lie Groups, Lie Algebras, and Some of Their Applications, copyright © 1974 John Wiley & Sons, New York. Reprinted by permission of John Wiley & Sons, Inc.
2. Adapted from Table 1 on p. 59 of J. E. Humphreys, Introduction to Lie Algebras and Representation Theory, Springer-Verlag, New York, 1972.
3. Adapted from the diagram at the top of p. 58 of J. E. Humphreys, Introduction to Lie Algebras and Representation Theory, Springer-Verlag, New York, 1972.
4. Adapted from Table 8.2 of R. Gilmore, Lie Groups, Lie Algebras, and Some of Their Applications, copyright © 1974 John Wiley & Sons, New York. Reprinted by permission of John Wiley & Sons, Inc.
5. Adapted from Fig. 4 of J. H. Conway and N. J. A. Sloane, “On the Voronoi regions of certain lattices,” SIAM J. Algebraic Discrete Methods, vol. 5, pp. 294-305, 1984, copyright © 1984 Society for Industrial and Applied Mathematics, Philadelphia, PA.
6. Adapted from Fig. 4 of N. J. A. Sloane, “Tables of sphere packings and spherical codes,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 327-338, May 1981. Copyright © 1981 IEEE.
7. Adapted from Fig. 5 of N. J. A. Sloane, “Tables of sphere packings and spherical codes,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 327-338, May 1981. Copyright © 1981 IEEE.
8. Adapted from the figure on p. 122 of N. J. A. Sloane, “The packing of spheres,” Scientific American, pp. 116-125, Jan. 1984. Copyright © 1984 by Scientific American, Inc. All rights reserved.
9. Adapted from Fig. 2 of N. J. A. Sloane, “Tables of sphere packings and spherical codes,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 327-338, May 1981. Copyright © 1981 IEEE.
10. Adapted from Fig. 2 of J. H. Conway and N. J. A. Sloane, “A fast encoding method for lattice codes and quantizers,” IEEE Trans. Inform. Theory, vol. IT-29, pp. 820-824, Nov. 1983. Copyright © 1983 IEEE.
11. Adapted from the figure on p. 118 of N. J. A. Sloane, “The packing of spheres,” Scientific American, pp. 116-125, Jan. 1984. Copyright © 1984 by Scientific American, Inc. All rights reserved.
12. Adapted from Table I of J. H. Conway and N. J. A. Sloane, “Voronoi regions of lattices, second moments of polytopes, and quantization,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 211-226, March 1982. Copyright © 1982 IEEE.
13. Adapted from Table V of J. H. Conway and N. J. A. Sloane, “Voronoi regions of lattices, second moments of polytopes, and quantization,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 211-226, March 1982. Copyright © 1982 IEEE.
14. Adapted from Fig. 1 of J. H. Conway and N. J. A. Sloane, “A lower bound on the average error of vector quantizers,” IEEE Trans. Inform. Theory, vol. IT-31, pp. 106-109, Jan. 1985. Copyright © 1985 IEEE.
15. Adapted from Figs. 3-6 of K. Sayood, J. D. Gibson, and M. C. Rost, “An algorithm for uniform vector quantizer design,” IEEE Trans. Inform. Theory, vol. IT-30, pp. 805-814, Nov. 1984. Copyright © 1984 IEEE.
REFERENCES

Adams, Jr., W. C., and Geisler, C. E. (1978). “Quantizing characteristics for signals having Laplacian amplitude probability function,” IEEE Trans. Commun. COM-26, 1295-1297.
Adoul, J.-P. (1986a). “La quantification vectorielle des signaux: Approche algébrique,” Ann. Télécommun. 41.
Adoul, J.-P. (1986b). “Decoding algorithm for spherical codes from the Leech lattice,” submitted for publication.
Adoul, J.-P. (1987). “Speech-Coding Algorithms and Vector Quantization,” in Advanced Digital Communications, K. Feher, ed., Prentice-Hall, Inc., Englewood Cliffs, NJ, 133-181.
Bambah, R. P. and Sloane, N. J. A. (1982). “On a problem of Ryskov concerning lattice coverings,” Acta Arithmetica 42, 107-109.
Barnes, E. S. and Sloane, N. J. A. (1983). “The optimal lattice quantizer in three dimensions,” SIAM J. Algebraic Discrete Methods 4, 30-41.
Barnes, E. S. and Wall, G. E. (1959). “Some extreme forms defined in terms of Abelian groups,” J. Australian Math. Soc. 1, 41-63.
Berger, T. (1971). Rate Distortion Theory, Prentice-Hall, Inc., Englewood Cliffs, NJ.
Bucklew, J. A. (1981). “Companding and random quantization in several dimensions,” IEEE Trans. Inform. Theory IT-27, 207-211.
Bucklew, J. A. (1984). “Two results on the asymptotic performance of quantizers,” IEEE Trans. Inform. Theory IT-30, 341-348.
Cohn, H. (1962). A Second Course in Number Theory, John Wiley & Sons, New York.
Conway, J. H. and Sloane, N. J. A. (1982a). “Voronoi regions of lattices, second moments of polytopes, and quantization,” IEEE Trans. Inform. Theory IT-28, 211-226.
Conway, J. H. and Sloane, N. J. A. (1982b). “Fast quantizing and decoding algorithms for lattice quantizers and codes,” IEEE Trans. Inform. Theory IT-28, 227-232.
Conway, J. H. and Sloane, N. J. A. (1982c). “Laminated lattices,” Annals of Mathematics 116, 593-620.
Conway, J. H. and Sloane, N. J. A. (1983). “A fast encoding method for lattice codes and quantizers,” IEEE Trans. Inform. Theory IT-29, 820-824.
Conway, J. H. and Sloane, N. J. A. (1984). “On the Voronoi regions of certain lattices,” SIAM J. Algebraic Discrete Methods 5, 294-305.
Conway, J. H. and Sloane, N. J. A. (1985). “A lower bound on the average error of vector quantizers,” IEEE Trans. Inform. Theory IT-31, 106-109.
Coxeter, H. S. M. (1961). Introduction to Geometry, John Wiley & Sons, New York.
Coxeter, H. S. M. and Todd, J. A. (1953). “An extreme duodenary form,” Canad. J. Math. 5, 384-392.
Farvardin, N. and Modestino, J. W. (1984). “Optimum quantizer performance for a class of non-Gaussian memoryless sources,” IEEE Trans. Inform. Theory IT-30, 485-497.
Fischer, T. R. (1986). “A pyramid vector quantizer,” IEEE Trans. Inform. Theory IT-32, 568-583.
Fischer, T. R. and Malone, K. T. (1985). “Transform Coding of Speech with Pyramid Vector Quantization,” Conf. Rec., MILCOM '85, 620-623.
Forney, Jr., G. David. “Coset codes I: Geometry and classification,” IEEE Trans. Inform. Theory, to appear.
Forney, Jr., G. David. “Coset codes II: Binary lattices and related codes,” IEEE Trans. Inform. Theory, to appear.
Forney, Jr., G. David, et al. (1984). “Efficient modulation for bandlimited channels,” IEEE J. Selected Areas Commun. SAC-2, 632-647.
Gallager, R. G. (1968). Information Theory and Reliable Communication, John Wiley & Sons, Inc., New York.
Gameckii, A. F. (1962). “On the theory of covering n-dimensional space with equal spheres,” Soviet Math. 3, 1410-1414.
Gersho, A. (1979). “Asymptotically optimal block quantization,” IEEE Trans. Inform. Theory IT-25, 373-380.
Gersho, A. (1982). “On the structure of vector quantizers,” IEEE Trans. Inform. Theory IT-28, 157-166.
Gersho, A. (1986). “Vector Quantization: A New Direction in Source Coding,” in Digital Communications, E. Biglieri and G. Prati, eds., North-Holland, Amsterdam, 267-281.
Gersho, A. and Cuperman, V. (1983). “Vector quantization: A pattern-matching technique for speech coding,” IEEE Communications Magazine 21, 15-21.
Gilmore, R. (1974). Lie Groups, Lie Algebras, and Some of Their Applications, John Wiley & Sons, New York.
Gish, H. and Pierce, J. N. (1968). “Asymptotically efficient quantizing,” IEEE Trans. Inform. Theory IT-14, 676-683.
Gray, R. M. (1984). “Vector quantization,” IEEE ASSP Magazine 1, 4-29.
Gray, R. M. and Davisson, L. D. (1974). “A Mathematical Theory of Data Compression?,” in Proc. 1974 Int. Conf. Commun., 40A-1-40A-5.
Grove, L. C. and Benson, C. T. (1985). Finite Reflection Groups, Second edition, Springer-Verlag, New York.
Humphreys, J. E. (1972). Introduction to Lie Algebras and Representation Theory, Springer-Verlag, New York.
Jayant, N. S. and Noll, P. (1984). Digital Coding of Waveforms, Prentice-Hall, Inc., Englewood Cliffs, NJ.
Joshi, A. W. (1977). Elements of Group Theory for Physicists, Second ed., John Wiley & Sons, New York.
Leech, J. (1967). “Notes on sphere packings,” Canad. J. Math. 19, 251-267.
Leech, J. (1964). “Some sphere packings in higher space,” Canad. J. Math. 16, 657-682.
Leech, J. and Sloane, N. J. A. (1971). “Sphere packings and error-correcting codes,” Canad. J. Math. 23, 718-745.
Lekkerkerker, C. G. (1969). Geometry of Numbers, John Wiley & Sons, New York.
Linde, Y., Buzo, A. and Gray, R. M. (1980). “An algorithm for vector quantizer design,” IEEE Trans. Commun. COM-28, 84-95.
Lyusternik, L. A. (1963). Convex Figures and Polyhedra, Dover, New York.
MacQueen, J. (1967). “Some Methods for Classification and Analysis of Multivariate Observations,” in Proc. 5th Berkeley Symp. on Math. Statist. and Prob., Berkeley, CA: Univ. of Calif. Press, 281-297.
MacWilliams, F. J. and Sloane, N. J. A. (1977). The Theory of Error-Correcting Codes, North-Holland, Amsterdam.
Max, J. (1960). “Quantizing for minimum distortion,” IRE Trans. Inform. Theory IT-6, 7-12.
Makhoul, J., Roucos, S. and Gish, H. (1985). “Vector quantization in speech coding,” Proc. IEEE 73, 1551-1588.
Moayeri, N., Neuhoff, D. L. and Stark, W. E. (1985). “Fast Vector Quantizers,” in Proc. of the 23rd Annual Allerton Conf. on Commun., Control, and Computing, Monticello, IL, 347-353.
Newman, D. J. (1982). “The hexagon theorem,” IEEE Trans. Inform. Theory IT-28, 137-139.
Rost, M. C. (1984). “Lattice Quantization,” M.S. Thesis, Dept. of Electrical Eng., University of Nebraska, Lincoln, NE.
Rost, M. C. and Sayood, K. (1984). “Investigation of Lattice Vector Quantizer,” Proc. of the Twenty-Seventh Midwest Symp. on Circuits and Systems, Morgantown, West Va., 149-152.
Ryskov, S. S. and Baranovskii, E. P. “C-Types of n-dimensional lattices and 5-dimensional primitive parallelohedra (with application to the theory of coverings)” (in Russian), Trudy Mat. Inst. Steklov. 137, 1976. English translation in Proc. Steklov Inst. Math., Issue 4, 1978.
Sakrison, D. J. (1968). “A geometric treatment of the source encoding of a Gaussian random variable,” IEEE Trans. Inform. Theory IT-14, 481-486.
Sakrison, D. J. (1979). “Image Coding Applications of Vision Models,” in Image Transmission Techniques, W. K. Pratt, ed., Academic Press, New York, 21-51.
Sayood, K., Gibson, J. D. and Rost, M. C. (1984). “An algorithm for uniform vector quantizer design,” IEEE Trans. Inform. Theory IT-30, 805-814.
Shannon, C. E. (1948). “A mathematical theory of communication,” Bell Syst. Tech. J. 27, 379-423, 623-656.
Shannon, C. E. (1959). “Coding Theorems for a Discrete Source with a Fidelity Criterion,” in IRE Nat. Conv. Rec., Pt. 4, 142-163.
Sloane, N. J. A. (1981). “Tables of sphere packings and spherical codes,” IEEE Trans. Inform. Theory IT-27, 327-338.
Sloane, N. J. A. (1984). “The packing of spheres,” Scientific American, 116-125.
Swaszek, P. F. (1986). “Vector Quantization,” in Communications and Networks, I. F. Blake and H. V. Poor, eds., Springer-Verlag, New York, 362-389.
Tseng, H.-C. and Fischer, T. R. (1987). “Transform and hybrid transform/DPCM coding of images using pyramid vector quantization,” IEEE Trans. Commun. COM-35, 79-86.
Zador, P. (1982). “Asymptotic quantization error of continuous signals and their quantization dimension,” IEEE Trans. Inform. Theory IT-28, 139-149.
Q Quadratic form, 291 Quantities, 185-186, 190 action, 186 angle, 188, 218 area, 192 area of a rectangle, 192 area of a square, 191 average speed, 188 basic quantities, 194 charge, 186 derived quantities, definition, 189 dimensionless quantities, 193 empirical versus formal, 188 energy, 193, 218 instantaneous velocity, 188 length, 187-188, 192 measurements of, 187 radian, 193 rapidity, 188, 234 speeds, 185 temperature, 186 torque, 193, 218 velocity, 185 Quantizer performance comparisons, 316-325 Quantum well asymmetric, 31 band edge profiles, 9 bound excitons, 59-64 densities of states of Landau levels, 83 graded, energy level calculations, 25 idealised, interband absorption, 71-74 M.B.E.-grown, photoluminescence, 98 modulation-doped, 3 conduction and valence band edge profiles, 67 in-plane dispersion relations, 68-69 pseudo saw-tooth, 31 rectangular, coulombic impurity states, 46 tilted by electric field, 31 valence Landau levels, 42 Quasi-Ge model, 13
R Rate distortion function, 267, 273 Rate distortion theory, 260, 266 Real numbers as physical quantities, 202 Reducible system, 280 Reduction of the dimensional structure, 228 Reduction of the group of dimensions, 219, 224 Reduction of the number of fundamental dimensions, 225 Relativity Einsteinian relativity, 200 Galilean relativity, 188 general relativity, 216 Renormalization, 182 Representation of a function in a gauge, 210 Resonant Raman scattering, HgTe-CdTe superlattices, 168 Rhombic dodecahedron, 298, 300, 320 Root lattice(s), 278, 321 Root(s), 280 Root system(s), 280, 289-290 Root systems of Lie algebras, 275 Rotation, 215 Rydberg energy, 3D effective, 80
S Scalar quantization, 262 Scalar quantizer design, 264 Scale changes, 237, 243 Schrödinger and Poisson equations, coupled, 65-66 Schrödinger equation, 26 electric quantum limit, 67 Second order differential equations linear, 243 symmetries, 244 Semiconductor heterolayers, see also specific systems absorption coefficient, 71, 73 absorption profiles, 80-81 band mixing effects, 74-75 band structure, 7-8, 151, see also HgTe-CdTe superlattices binding energies, isotropic 3D and purely 2D systems, 80 calculated magneto-optical transitions, 84 Capasso’s concept of bandgap engineering, 2
Cd₁₋ₓMnₓTe SL, 169 coulombic impurity states, 45-52 acceptor and donor binding energies, 46-47, 50 density of states, 49 hamiltonian, 47 hydrogenic binding energy, 47 longitudinal electric field, 51-52 magnetic field, 51 off-center impurities, 47 on-edge donors, 47-48, 51 potential energy, 46 rectangular quantum wells, 46 trial wavefunction, 49-50 tunnel time, 47-48 doped quantum wells, 3 eigenstates, 11 electro-optics, 2-3 energy-gap and lattice constant, 118 energy levels, 2-3 envelope function, 71-73 envelope function model, see Envelope function model excitonic effects, 79-82 flat band heterostructures, see Flat band heterostructures Hg₁₋ₓCdₓTe-CdTe SL, 169 Hg₁₋ₓMnₓTe-CdTe SL, 169 InAs-GaSb, 123-125 In₀.₅₃Ga₀.₄₇As-In₀.₅₂Al₀.₄₈As, 121-123 InP-In₀.₅₃Ga₀.₄₇As, 118-121 interband absorption, idealised quantum well, 71-74 Landau level energies, 82 low temperature optical absorption, 78-79 magneto-optical absorption, 3, 82-84 many body effects, 52-69 binding energy, thickness dependence, 56 bound excitons in quantum wells, 59-64 bound exciton trial wavefunction, 62 coupled Schrödinger and Poisson equations, 65-66 dimensionless envelope function, 68-69 effective masses, 55 electric field dependence of heavy hole exciton binding energy, 58-59 energy level calculations, doped heterostructures, 66-67 energy levels of heterostructures containing charges, 64-69
energy position of HH1-E1 exciton peak, 59 exciton binding energy, 54-56 exciton defect hamiltonian, 61 excitons in longitudinal electric field, 57-59 r,-related subbands, 68 ground electron state and conduction band edge drop, thickness dependence, 68 ground exciton wavefunction, 58 Hartree approximation, 65 in-plane dispersion relations, 68-69 interface defects, 60-61 light hole and heavy hole excitons, 56-57 modulation-doped quantum well, 67 modulation doping, 64-65 thermal and electrical equilibrium, 66 trapped exciton binding energy, 62-63 trial wavefunction, 54-55, 67 two-dimensional vectors, 54 multi-heterojunctions, 7 optical absorption, superlattices, 75-79 optical matrix element, 74-75, 84 optical properties, 70-71 oscillator strength, 80 parabolic in-layer dispersion relations, 71-72 perturbation of electronic states, 35 Airy functions, 36 arbitrary, 8, 45 associated wavefunctions, 42-43 carrier wavefunction, 38-39 density of states, 43-44 electric field effects, 35-36 exciton resonance energy shift, 38, 40 F/z, 36-39 r6and r,-related, 42 ground state confinement energies, 37 kinetic energy, 36 Landau levels, 39-45 perturbation expansion, 38 vector potential, 40 with wide gap hosts, 42 polarization selection rules, interband transitions, 72 residual impurities and interface defects, 2 Sommerfeld factor, 80
staggered configuration, 9 strained layer systems, 125-126, see also GaAs-Al₁₋ₓInₓAs systems; GaSb-AlSb band offset configurations, 134-135 band structure, material submitted to biaxial compression, 133-134 critical layer thickness, 126-128 elastic properties, 131-132 energy per unit length, 127 hamiltonian, 132-133 homogeneous elastic energy, 128 InAs-GaAs, 148-150 in-plane dispersion relations, 135 in-plane lattice parameter, 132-136 InₓGa₁₋ₓAs-GaAs, 144-148 Kronig-Penney like formula, 135 lattice parameters, 125 plastic relaxation of misfit, 128-131 strain tensor, 131 structural aspects, 126-132 X-ray rocking curves, 129-130 superlattice wavevector, 14 transmission minima, 82 truth table of parity statement, 76 two-particles envelope function, 80 valence subband anticrossings, 75 ZnTe-CdTe, 169 ZnTe-HgTe, 169 Separate confinement heterostructures, 26 Set of physical quantities, 202 Shannon lower bound, 273-274 Similarity dynamical similarity, 183 geometrical similarity, 183 of the physical algebra, 207 Similarity group, 213, 220 Simple positive root, 280 Simple roots of a root system, 284 2s method, 97 Sommerfeld factor, 80 Space of classes of dimensions in physical algebra, 201 Euclidean space, 191 Space-filling polytopes, 300 Space-time Minkowskian Space-time, 228 Newtonian Space-time, 228
Species, 186, 200, 208 of angles, 200 of energies, 200 of horizontal lengths, 200 of lengths, 200 Specific properties of root lattices, 304 Sphere bound, 321 Sphere hardening, 325 Sphere packing, 293-294, 325 Spherical Blast, 195 Spherical vector quantizers, 325 Stability group, 247 Standard, 202, 207 action h, 234 natural standards for angle in Euclidean geometry, 233 natural standards for length in hyperbolic geometry, 233 in the electromagnetism theory, 221 Standard for length, 230 Stark effect, 57 Stark shift, quadratic, 37-38 Stokes shift GaAs-Ga₁₋ₓAlₓAs system, 99-100 InₓGa₁₋ₓAs-GaAs, 146 Strain tensor, 131 Subgroup of transformations, 279 Super-alloys, 117 Superlattices absorption edge shape, 77 absorption lineshapes, 78 densities of states of Landau levels, 83 enlarged well, photoluminescence spectra, 116 GaAs-Ga₁₋ₓAlₓAs system, vertical transport, 113-117 in plane lattice parameter, 132 ni-pi, 111 optical absorption, 75-79 type I and II, 77-78 Symbols, dimensional symbols, 184 Symmetry of differential equations, 183, 234 of first order differential equations, 236 group of symmetry, 226 of higher order differential equations, 242 infinitesimal symmetry of a system of differential equations, 241 scaling, 183 strict infinitesimal symmetry of a vector field, 242 Symmetry group of scale changes, a two-dimensional example, 237 Systems of differential equations, 240 autonomous systems, 241 Systems of units as sections, 202
T Tejedor-Flores-Tersoff’s model, 6 Temperature, 190 Theorem of Buckingham, see Π-theorem of Noether, 244 Theta functions, 302 Tight binding analysis, envelope function, 17 Time as a basic quantity, 228 Transformation groups, 246 Transformation law as a locally-operation realization, 213 Truncated octahedron, 298, 300, 320 Two-dimensional uniform hexagonal quantizer, 275
U Uniform VQ performance, 274 Union of cosets, 310, 312, 314-315 Unit free function, 211 Unit free relations, 195 Units of angle, 191 of area, 191 change of units, 185, 191, 194, 231 coherent set of units, 205 derived units, 190, 194, 197 fundamental of complete set, 204-205 of length, 191 of measurements, 187 natural change of units, 232 obtained from a set of fundamental units, 204 Planck units, 226 primary units, 190 primitive units, 197 propagation from one class to another, 230 for quantities with standards, 203 set of units, 191 special units, 193
systems for one-parameter subgroups, 231 Unnormalized second moment, 319
V Variable-length coding, 269 Vector at a point, 249 Vector field, 250 associated to a first order differential equation, 235 complete vector fields, 254 flow of a vector field, 251 F-related vector fields, 251 fundamental vector fields, 255 integral curves, 235, 251 left- and right-invariant vector fields, 253 Vector quantization, 260 Volume, 319 Voronoi code, 303 Voronoi region(s), 270-271, 296-298, 301-303
W Weyl group, 302 Wigner-Seitz cell, 300
X X-ray double diffraction spectra, GaSb-AlSb, 136-137
Z Zador's upper bound, 322 ZnTe-CdTe, 169 ZnTe-HgTe, 169