ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
VOLUME 85
EDITOR-IN-CHIEF
PETER W. HAWKES Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITOR
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
Advances in
Electronics and Electron Physics EDITED BY
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 85
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers Boston San Diego New York London Sydney Tokyo Toronto
This book is printed on acid-free paper.
COPYRIGHT © 1993 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101-4311
United Kingdom Edition published by ACADEMIC PRESS LIMITED 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504 ISSN 0065-2539 ISBN 0-12-014727-0 PRINTED IN THE UNITED STATES OF AMERICA
93 94 95 96 BC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS
PREFACE

Recent Developments in Kalman Filtering with Applications in Navigation
HANS-JÜRGEN HOTOP
I. Introduction
II. The New Generation of Navigation Systems
III. Filter Theory
IV. New Kalman Filter Formulations
V. Review of the Backward Kalman Filter Theory
VI. Application of the Kalman Filter in Navigation
VII. Summary
Acknowledgments
References

Recent Advances in 3D Display
D. P. HUIJSMANS AND G. J. JENSE
I. Introduction
II. Representation Schemes
III. Voxel-Based Display Methods
IV. Spatial Selection and Division
V. Hardware Support
VI. Implementations
VII. Conclusion
Acknowledgments
Bibliography

Applications of Group Theory to Electron Optics
YU LI
I. Introduction
II. M Function and Its Symmetry Group
III. Applications to Electrostatic Multipoles
IV. Applications to Magnetostatic Multipoles
V. A General Method for Deriving Constraint Relations
Appendix
References

Parallel Programming and Cray Computers
R. H. PERROTT
I. Introduction
II. Approaches to Parallel Programming
III. Implicit Parallelism
IV. Explicit Parallelism
V. Cray Computers
VI. Parallel Computing Forum
VII. Summary
Acknowledgments
Bibliography

INDEX
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors' contributions begin.

HANS-JÜRGEN HOTOP (1), Fachhochschule Hamburg, Fachbereich Elektrotechnik/Informatik, D-2000 Hamburg, Germany
D. P. HUIJSMANS (77), Computer Science Department, University of Leiden, PO Box 9512, 2300 RA Leiden, The Netherlands
G. J. JENSE (77), Institute for Applied Computer Science, TNO (ITI), PO Box 6032, 2600 JA Delft, The Netherlands
YU LI (231), Research Section of Applied Physics, PO Box 251, Shanghai Institute of Mechanical Engineering, Shanghai 200093, China
R. H. PERROTT (259), Department of Computer Science, Queen's University, Belfast BT7 1NN, United Kingdom
PREFACE

The four chapters that make up this volume cover a range of subjects, though none is really a newcomer to the series. We begin with an account of new developments in recursive filters of the Kalman type. With increased computing power, new versions have been developed, and H.-J. Hotop shows how these have grown out of the earlier formulations. The applications in the domain of navigation are explored in detail, and this will surely be of interest to many who are not specialists in this field, for aircraft navigation affects almost all of us.

The bulk of this volume is occupied by a very full and scholarly account of advances in three-dimensional display. The authors, D. P. Huijsmans and G. J. Jense, have written a monograph on the subject that covers virtually all the various possible techniques, including their theory and implementation; it is abundantly illustrated. The range of fields in which 3-D display is needed is extremely wide, ranging from medicine to geology, with several types of microscopy in between, and I have no doubt that readers from many of these fields will be grateful for so careful a survey.

Next, we have a short chapter by Yu Li on the use of group theoretical reasoning in electron optics. This is a relatively new approach in this subject and I hope that this succinct account will generate further developments.

We end with a chapter that has an interesting history. Some years ago, I read a plea by R. H. Perrott for urgent consideration of the language to be adopted for the parallel computers which were then relatively new. I invited him to present his ideas at greater length in these Advances, but as time has passed, the theme of his chapter has evolved and we now have not only a general discussion of programming for such computers, but also a detailed examination of a particular system, the Cray computer family.

It only remains for me to thank most warmly all the contributors and to encourage anyone who is contemplating preparing a survey on one of the themes covered by this series to get in touch with me. A list of forthcoming reviews follows.
FORTHCOMING ARTICLES

Neural networks and image processing (J. B. Abbiss and M. A. Fiddy)
Image processing with signal-dependent noise (H. H. Arsenault)
Parallel detection (P. E. Batson)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Magnetic reconnection (A. Bratenahl and P. J. Baum)
Sampling theory (J. L. Brown)
ODE methods (J. C. Butcher)
Interference effects in mesoscopic structures (M. Cahay)
Integer sinusoidal transforms (W. K. Cham)
The artificial visual system concept (J. M. Coggins)
Dynamic RAM technology in GaAs (J. A. Cooper)
Minimax algebra and its applications (R. A. Cuninghame-Green)
Corrected lenses for charged particles (R. L. Dalglish)
Data structures for image processing in C (M. R. Dobie and P. H. Lewis)
The development of electron microscopy in Italy (G. Donelli)
Electron crystallography of organic compounds (D. L. Dorset)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Amorphous semiconductors (W. Fuhs)
Median filters (N. C. Gallagher and E. Coyle)
Bayesian image analysis (S. and D. Geman)
Non-contact scanning force microscopy with applications to magnetic imaging (U. Hartmann)
Theory of morphological operators (H. J. A. M. Heijmans)
Noise as an indicator of reliability in electronic devices (B. K. Jones)
Applications of speech recognition technology (H. R. Kirby)
Spin-polarized SEM (K. Koike)
Fractal signal analysis using mathematical morphology (P. Maragos)
Expert systems for image processing (T. Matsuyama)
Electronic tools in parapsychology (R. L. Morris)
Image formation in STEM (C. Mory and C. Colliex)
Phase-space treatment of photon beams (G. Nemes)
Fuzzy tools for image analysis (S. K. Pal)
Z-contrast in materials science (S. J. Pennycook)
Electron scattering and nuclear structure (G. A. Peterson)
Edge detection (M. Petrou)
The wave-particle dualism (H. Rauch)
Electrostatic lenses (F. H. Read and I. W. Drummond)
Scientific work of Reinhold Rüdenberg (H. G. Rudenberg)
Metaplectic methods and image processing (W. Schempp)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Focus-deflection systems and their applications (T. Soma)
The suprenum project (U. Trottenberg)
Knowledge-based vision (J. K. Tsotsos)
Electron gun optics (Y. Uchikawa)
Spin-polarized SEM (T. R. van Zandt and R. Browning)
Cathode-ray tube projection TV systems (L. Vriens, T. G. Spanjer and R. Raue)
n-Beam dynamical calculations (K. Watanabe)
Parallel imaging processing methodologies (S. Yalamanchili)
Parasitic aberrations and machining tolerances (M. I. Yavor)
Signal description (A. Zayezdny and I. Druckmann)
The Aharonov-Casher effect (A. Zeilinger, E. Rasel and H. Weinfurter)
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 85
Recent Developments in Kalman Filtering with Applications in Navigation

HANS-JÜRGEN HOTOP*
Fachhochschule Hamburg, Fachbereich Elektrotechnik/Informatik, Hamburg, Germany

* This material is based on the author's work at the Deutsche Forschungsanstalt für Luft- und Raumfahrt (DLR).

I. Introduction
II. The New Generation of Navigation Systems
 A. Inertial Navigation
 B. Radio Navigation
 C. Error Models for the Navigation Systems
III. Filter Theory
 A. The Conventional Kalman Filter Theory
 B. The Discrete Kalman-Bucy Filter
 C. Square-Root Formulations of the Kalman Filter Algorithm
 D. Other Kalman Filter Algorithms
IV. New Kalman Filter Formulations
 A. Motivation
 B. Application of Orthogonal Transformations
 C. New Formulation of the Kalman Filter Prediction
 D. New Formulation of the Kalman Filter Update
 E. Review of a New Kalman Filter Algorithm
V. Review of the Backward Kalman Filter Theory
 A. New Formulation of the Backward Kalman Filter
 B. Review of the Backward Kalman Filter
VI. Application of the Kalman Filter in Navigation
 A. Establishing a Simulation
 B. Simulation Data Results
 C. Presenting the Data of a Flight Test
 D. Flight Test Data Results
VII. Summary
Acknowledgments
References
I. INTRODUCTION

In recent years the Kalman filter technique has been used for more and more different applications, for example speech parameter estimation
(Asmuth and Gibson, 1984), image processing (Biemond and Plompen, 1983; Kaufman et al., 1983), traffic control (Okutani and Stephanedes, 1984), control of turbines (Sasiadek and Kwok, 1983), tracking problems (Kolodziej and Mohler, 1984; Ramachandra, 1984), and so on. Parallel to this evolution of physical applications, the new generation of computers has also increased the number of new Kalman filter algorithms. Although many of these new formulations are nearly equivalent, the development of algorithms points in the direction of greater numerical stability. Here a selection of different Kalman filter algorithms is outlined and studied with regard to computer time usage and numerical stability. A new Kalman filter algorithm is presented, and its advantage over the conventional formulations is discussed.

The basic application of the Kalman filter technique considered here is the support of inertial navigation systems. A short introduction to inertial navigation, radio navigation and the combination of these systems explains the problems and motivates the development of the new Kalman filter algorithms. The comparison of the various Kalman filter formulations is presented by the evaluation of a highly accurate reference path and inertial navigation data for aircraft usage. In this case simulation data as well as real flight data for the support of inertial navigation systems by radar data are utilized.
II. THE NEW GENERATION OF NAVIGATION SYSTEMS
This section describes the application of the Kalman filter with regard to the support of inertial navigation systems. As the main intention is to present new Kalman filter techniques, only a short introduction to the main principles of the different navigation systems for aircraft is given.

A. Inertial Navigation
An inertial navigation system contains two principal kinds of hardware components: accelerometers and gyros. These instruments measure the translational and rotational motions of the body. An accelerometer in its simplest form (see Fig. 1) is a mass balanced between two spring forces and can be used to measure the translational motion of a body in only one direction. The problems of manufacturing such a sensor have been solved, and many high-performance accelerometers with bias errors of less than $10^{-5}\,g$ have been built. To measure the translational motion of a body in the three-dimensional coordinate frame, a so-called acceleration triad is needed, with three
FIGURE 1. Principle of a simple accelerometer.
accelerometers sensing along orthogonal axes. The main problems for an accelerometer installed in an aircraft are the influence of the earth's gravity field and, specifically, how to keep the triad aligned orthogonally to the earth coordinate frame, which means with one axis parallel to the gravity vector (g). As the earth is rotating, the effects of Coriolis and centrifugal acceleration are measured by the accelerometers too. In the z-axis, pointing to the earth's center, these effects of earth and transport rate are negligible because the accelerometer senses mainly the g-vector of 9.81 m/s². With the g-vector the accelerometer triad is aligned orthogonally to the earth coordinate frame. Therefore, on the ground and before the start, the direction of maximum acceleration is measured and the z-axis of the accelerometer cross is moved into that direction. Otherwise, if the triad is fixed, the evaluated data of the g-vector in all three directions of the accelerometer cross are stored in the navigation computer and provide the elements of the corresponding transformation matrix.

The interesting data for navigation purposes are normally the position and the velocity. From the laws of Newton it is well known that the acceleration is equal to the first derivative of the velocity and the second derivative of the position. On the ground, after the acceleration cross is aligned, this situation is uncomplicated; but if the aircraft is moving, other sensors must provide the angles between the measured signals in the aircraft (body) coordinate frame and the navigation coordinate frame. Therefore a navigation system requires three gyros in addition to the accelerometers. Let us assume that the data of the gyros are present; then the accelerometer data have to be transformed from the aircraft coordinate frame to the earth coordinate frame by a transformation matrix $C_{nb}$ (b = body to n = geographical), or all hardware components have to be mounted on a gimbal platform (see later), which is stabilized by the gyros. Therefore
two different navigation equations hold true: for the platform system,

$$\dot{v}_n = a_n - (2\cdot\omega_{ie}^n + \omega_{en}^n)\times v_n + g_n \tag{1}$$

and for the fixedly mounted accelerometers,

$$\dot{v}_n = C_{nb}\cdot a_b - (2\cdot\omega_{ie}^n + \omega_{en}^n)\times v_n + g_n \tag{2}$$

The indices are defined as follows: n, navigational; i, inertial; e, earth-fixed; and b, body-fixed coordinates. The vector v is the velocity, ω the rotation rate, a the acceleration and $g_n$ the g-vector containing the gravitation and the centrifugal acceleration of the earth,

$$g_n = G - \omega_{ie}^n\times(\omega_{ie}^n\times S_n) \tag{3}$$

with $S_n$ as the position vector. The term $(2\cdot\omega_{ie}^n + \omega_{en}^n)\times v_n$ is well known as the Coriolis acceleration. To calculate the velocity, the preceding equation has to be integrated while additionally compensating the g-vector as well as the Coriolis vector, as sketched below. The position vector in the earth geographical frame is evaluated by integrating the velocity vector components.
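To make the use of Eq. (2) concrete, the following minimal sketch in Python performs one Euler integration step of the strapdown velocity equation; the function name, the toy flight condition and the neglected transport rate are illustrative assumptions, not data from any system described here.

    import numpy as np

    def velocity_step(v_n, a_b, C_nb, w_ie_n, w_en_n, g_n, dt):
        """One Euler step of Eq. (2):
        v_dot = C_nb * a_b - (2*w_ie + w_en) x v_n + g_n."""
        a_n = C_nb @ a_b                               # body -> navigation frame
        coriolis = np.cross(2.0 * w_ie_n + w_en_n, v_n)
        v_dot = a_n - coriolis + g_n
        return v_n + v_dot * dt

    # Illustrative values: flight heading north at 100 m/s, latitude 50 deg.
    w0 = np.radians(15.0) / 3600.0                     # earth rate, rad/s
    lat = np.radians(50.0)
    v_n = np.array([100.0, 0.0, 0.0])                  # north, east, down velocity
    a_b = np.array([0.0, 0.0, -9.81])                  # accelerometer output
    C_nb = np.eye(3)                                   # body aligned with NED here
    w_ie_n = w0 * np.array([np.cos(lat), 0.0, -np.sin(lat)])
    w_en_n = np.zeros(3)                               # transport rate neglected
    g_n = np.array([0.0, 0.0, 9.81])
    print(velocity_step(v_n, a_b, C_nb, w_ie_n, w_en_n, g_n, dt=0.01))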
FIGURE 2. Principles of a gyro (gyro element gimbal with rotor drive; input axis I, output axis O, spin axis S with angular momentum H).

The other measurement components of an inertial navigation system are the gyros. Many different gyro types are available, but they fall into two basic classes: mechanical and optical rate sensors. A mechanical gyro is basically a rotor whose axis is fixed in a gimbal element (see Fig. 2). If a torque vector or angular rate vector is applied to the gyro, the response vector is orthogonal to both the applied vector and the angular momentum vector; the sense axis points in the direction that would take the momentum vector by the shortest way towards the applied vector. To measure the angular rate, a pickoff at one side of the gyro element gimbal senses the rotation, and at the other side a torque generator applies the
negative rotation to the gyro element. The sensed data are integrated step by step and supply the angular rate of the gyro. The main problem of a gyro is its drift due to fabrication errors such as unbalance, anisoelasticity, motor hunting, etc. (Stieler and Winter, 1982; Wrigley, Hollister, and Denhard, 1969). For minimal sensor errors the gyros are built for special applications, and the accuracy is specified mainly by the drift per hour. For navigation purposes, gyros are needed with a drift of less than 0.1°/h for attitude and heading systems and less than 0.01°/h for real navigation systems. The mechanical differences between these gyros are the bearings for the rotor, which can be gas bearings or ball bearings. Other differences are the number of gimbals, which implies one or two measurement axes.

Optical gyros are divided into two different types: laser gyros and fiber gyros. The physical principle of both gyros is based on the "Sagnac" effect. Here two light beams travel in opposite directions in a closed loop and are influenced by a rotation around the axis normal to the optical ring. This main principle is seen in Fig. 3, where M2 and M3 are mirrors, LS is the laser tube, M1 is a partially transmitting mirror, P is a prism and SC is a screen. The light beams I and II are added together and their interference pattern can be observed on the screen. If the whole gyro is rotated, the interference pattern moves and the number of fringes that pass can be measured by a photodetector. The whole optical part of the gyro is built around a Cervit or Zerodur block, with a tube for the beam drilled inside the block. The differences between the several ring laser gyros are the number of mirrors (three or four) and the diameter of the circular path. In general the perimeter of the gyros is between 20 cm and 40 cm.

FIGURE 3. Principles of a laser gyro.

These gyros show no errors due to the dynamic environment, which is the main advantage in contrast to the mechanical gyros. But the laser gyro cannot detect low frequencies of rotation, the reason being the so-called "lock-in" effect (Rodloff, 1981; 1987; Aronowitz, 1971). This effect occurs in any two weakly coupled oscillators with neighbouring natural frequencies: if they are excited near the natural frequency, they oscillate with the same time period. A mechanical dithering of the gyro or special magnetic mirrors prevents these errors. Many of the production laser gyros use the mechanical dither, where a torsion spring between case and gyro produces angular vibrations with different frequencies. In the readout electronics these vibrations have to be compensated. Figure 4 shows the Honeywell laser gyro GD4003, which is used in Honeywell navigation systems. In the center of the gyro the dither mechanism can be seen, and unlike the principle drawing of Fig. 3, the laser tube is inside the whole triangular tube with its cathode and anode. The drift error of such a laser gyro for navigation purposes must be less than 0.01°/h, although normally these gyros achieve 0.001°/h. The other optical gyro used for navigation systems is the fiber gyro, which works on the same principle as the laser gyro except that the light beams travel inside a fibre circuit that is wrapped around a coil. For highly accurate rotation measurements these sensors are not applicable; on the other hand, the production of a fiber gyro is cheaper than that of a laser gyro.
FIGURE 4. The Honeywell laser gyro GD4003.
Commercial navigation systems using optical gyros are equipped with laser gyros.

These two hardware components, accelerometers and gyros, are the basis of a navigation system. In the past the first navigation systems were platform systems. The principle is demonstrated by the schematic view in Fig. 5. On the middle platform three gyros and three accelerometers are mounted in orthogonal directions. This platform is decoupled from the angular motion of the vehicle, which means it is base-motion isolated. The stabilization of the platform is realized by the gyros. For example, if a rotation around one axis is sensed by a gyro, the signal is measured by the pickoff and transmitted to the associated gimbal servo motor; this is called the servo loop. The signal of the angle is measured by the resolver on the same gimbal. The platform with the gyros and accelerometers keeps its orientation with respect to inertial space, and therefore it moves with reference to the ground once in 24 hours, in a direction opposite to the earth's rotation. These errors as well as the drift and acceleration errors have to be compensated by a computer program. A picture of one of the first platform systems, the "LN-3A" of Litton Industries, is shown in Fig. 6.

The advantage of the platform system is a nearly total isolation from the vibrations of the aircraft, because the gyros sense only small rotation rates. However, a sophisticated mechanisation is necessary for the fabrication of the gimbal platform, as seen in Fig. 6. In addition, the rotation rates and the accelerations in the aircraft body axes must be measured by an additional gyro and acceleration cross. Therefore a new generation of inertial navigation systems has been designed.
FIGURE 5. Schematic view of a gimbal platform. NA, EA, VA: north, east, vertical accelerometer; NG, EG, VG: north, east, vertical gyro; RR, PR, YR, AR: roll, pitch, yaw, auxiliary resolver; SM: gimbal servo motors; P: platform; OG: former outer gimbal; OG': outer gimbal; IG: inner gimbal.
FIGURE 6. The inertial measurement unit of the platform navigation system Litton LN-3A.
The idea is to put the gyros and accelerometers parallel to the aircraft's body axes and to transfer the hardware gimbal platform mechanisation into an analytical computation, which has to be evaluated in a computer. The main problems of these "strapdown systems" were, in the beginning, a decreased navigation accuracy and an extremely wide measurement range required for the rotation rate. In this mechanisation all the rotations of the aircraft have to be measured directly - not in parts, as described for the platform system - and the vibrations of the aircraft are detected as well. Only the laser gyros cover such a wide range of rotation rates with the required high accuracy; the accelerometers cause no problem. The functional diagram of a strapdown navigation system is outlined in Fig. 7. As can be seen, the main work is done by the computer. Important here is the calculation of the transformation matrix $C_{nb}$, which transforms all data from the aircraft body coordinate frame into the geographical coordinate frame. The elements of this matrix are evaluated by using the theory of quaternions, as sketched below. The advantage of the strapdown system is that it delivers additional data useful for the flight guidance of an aircraft. Nevertheless the navigation accuracy is comparable to that of platform systems, which is about 1 nm/h (nautical miles per hour) for a standard navigation system.
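The quaternion evaluation of $C_{nb}$ mentioned above can be illustrated by a short Python sketch; the conversion below is the standard quaternion-to-direction-cosine-matrix formula, and the variable names as well as the sign conventions are my assumptions rather than those of any particular strapdown computer.

    import numpy as np

    def quat_to_dcm(q):
        """Direction cosine matrix C_nb from a unit quaternion q = (q0, q1, q2, q3),
        with q0 the scalar part."""
        q0, q1, q2, q3 = q / np.linalg.norm(q)     # guard against unit-length drift
        return np.array([
            [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
            [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
            [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
        ])

    # Quick check: a 90 degree rotation about the z-axis.
    q = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])
    print(np.round(quat_to_dcm(q), 6))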
The main advantages of all inertial navigation systems are independence from ground information and the provision of additional information useful for aircraft guidance.
B. Radio Navigation

Radio navigation is the most common and well-known navigation aid for aircraft and ships. The navigation information is based on the amplitude, frequency or phase angle of the transmitted radio signals. These systems measure the orientation with reference to a ground station by calculating the distance or the angle. The main radio navigation systems for aircraft are NDB/ADF (nondirectional beacon/automatic direction finder), VOR (VHF omnidirectional radio range), DME (distance measuring equipment), TACAN (tactical air navigation system), LORAN (long range navigation) and GPS (global positioning system). All radio navigation systems, with the exception of LORAN and GPS, have a range of about 200 nm.

NDB/ADF and VOR are systems that produce angle information, so the pilot gets the angle of the aircraft with reference to the ground station. To get position information two systems are needed, and the intersection gives the position sought. While the NDB/ADF works on 200-1700 kHz and the VOR on 108-118 MHz, the DME and TACAN use frequencies between 960 MHz and 1215 MHz. The DME measures the direct distance to a ground station in nm (nautical miles). In this case the pilot knows only the radial distance to the station, and the information of two stations is generally not sufficient because two arcs have two crossing points. Therefore more DME stations must be used to give proper position information. A so-called multi-DME receives data from all DME stations around the present position and calculates the position of the aircraft, additionally eliminating the errors of the utilized signals if sufficient stations are present; a sketch of such a computation follows below. On the other hand, the pilot often combines the navigation information of a VOR and a DME, because these ground stations are in many places at the same position. The navigation aid TACAN used to produce information only for military purposes, but nowadays it is combined with a VOR and is called VORTAC. Analogously to VOR/DME, this system evaluates a radial and an angle to the transmitting ground station. The LORAN navigation system works on 1750-1950 kHz and is utilized especially by mariners and for transocean flights, because the stations are located near the coast. A master station and two slave stations transmit the same frequency, and by regarding the time differences, two hyperbolic lines of position can be crossed and give the user position. The accuracy of all these radio navigation systems depends on the distance between the transmitter and the receiver and can reach 200 m.
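The multi-DME computation just described amounts to a small nonlinear least-squares problem. The Python fragment below illustrates one way to solve it by Gauss-Newton iteration; the station coordinates, the measurement noise and the flat-earth 2D geometry are invented for illustration only.

    import numpy as np

    def dme_fix(stations, ranges, x0, iterations=10):
        """Estimate a 2D position from ranges to several DME stations by
        Gauss-Newton iteration on r_i = ||x - s_i||."""
        x = np.asarray(x0, dtype=float)
        for _ in range(iterations):
            d = np.linalg.norm(stations - x, axis=1)      # predicted ranges
            A = (x - stations) / d[:, None]               # Jacobian: unit vectors
            dx, *_ = np.linalg.lstsq(A, ranges - d, rcond=None)
            x = x + dx
        return x

    stations = np.array([[0.0, 0.0], [80e3, 10e3], [30e3, 90e3]])   # metres
    truth = np.array([45e3, 40e3])
    ranges = np.linalg.norm(stations - truth, axis=1) + np.random.normal(0, 100, 3)
    print(dme_fix(stations, ranges, x0=[10e3, 10e3]))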
The newest radio navigation system is the satellite-based GPS, which is composed of three parts: the space, the control and the user segments. Eighteen satellites form the space segment, to guarantee the reception of a minimum of four satellite signals. Currently 16 satellites are in space, and therefore the GPS information is available during nearly all hours of the day. The satellites move on Kepler paths around the earth, and the path parameters are measured by the ground stations. The information of the control segment is sent to the satellites, so the user receives this coded information in addition to the time at which the signal is transmitted. The user segment contains a microwave receiver, a high-precision time base and a computer for calculating the position from these signals. The computation program mostly contains a Kalman filter for the evaluation of the present position of the receiver from the satellite data. The accuracy of the GPS is less than 10 cm for stationary operation and about 50 m for a manoeuvering basis. Some GPS receivers are on the market but, due to the missing satellites in orbit in the past, the first receivers were used only for testing or stationary operations (Hurrass, 1988; Schanzer, 1989). In aircraft the GPS information is influenced by the switching between the satellites and the signal cutoff during roll manoeuvers.
C. Error Models for the Navigation Systems

Combining the signals of the two navigation types, the following becomes evident: Inertial navigation system errors increase with time, while the errors of the radio navigation systems are determined by the position. As regards time, an inertial navigation system has a small error for a short flight duration and a large error for long flights. This can be seen from the accuracy figure of 1 nm/h, which becomes a 10 nm position error after a 10 h flight, for example from Frankfurt (West Germany) to New York (United States). To reduce these errors, the two kinds of system information have to be merged. To do this, a good description of the errors of both systems is needed. As the inertial navigation system delivers a lot of signals for aircraft flight guidance, the errors of all these signals have to be calculated using the radio navigation data as support information.

Here the error model of an inertial navigation system is outlined. This is nearly independent of the kind of inertial navigation (platform or strapdown); the only difference is the additional transformation with the matrix $C_{nb}$. For navigation purposes the following three error parts are of interest: angle, velocity and position errors. The angle error equation has to be evaluated from the transformation equation

$$v_n = C_{nb}\cdot v_b \tag{4}$$
which represents the relation between the navigation and body coordinate frames. Differentiating this equation leads to

$$\dot{v}_n = \dot{C}_{nb}\cdot v_b + C_{nb}\cdot\dot{v}_b = C_{nb}\cdot[\dot{v}_b + C_{bn}\cdot\dot{C}_{nb}\cdot v_b] \tag{5}$$

By multiplying this equation with $C_{nb}^{-1} = C_{bn}$ - because $C_{nb}$ is a rotation matrix, which implies $C_{nb}^{-1} = C_{nb}^T = C_{bn}$ - it follows that

$$C_{bn}\cdot\dot{v}_n = \dot{v}_b + C_{bn}\cdot\dot{C}_{nb}\cdot v_b \tag{6}$$

The last term on the right-hand side of this equation produces the errors of the transformation, especially the errors of the measured angles relative to the rotation rates. This term can be written in the following form:

$$C_{bn}\cdot\dot{C}_{nb}\cdot v_b = \omega_{nb}^b\times v_b = \Omega_{nb}^b\cdot v_b \tag{7}$$

where $\omega_{nb}^b$ is the rotation rate of the body coordinate frame with respect to the navigation coordinate frame, which means

$$\Omega_{nb}^b = \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} \tag{8}$$
These equations yield the matrix differential equation $\dot{C}_{nb} = C_{nb}\cdot\Omega_{nb}^b$, and the error equation with respect to the angle errors is evaluated as

$$\dot{\epsilon}_n = M_{\epsilon\epsilon}\cdot\epsilon_n + M_{\epsilon v}\cdot\delta v_n + M_{\epsilon S}\cdot\delta S_n - C_{nb}\cdot\delta\omega_b \tag{9}$$

with the angle error $\epsilon_n$, the velocity error $\delta v_n$, the position error $\delta S_n$ and the drift error $\delta\omega_b$:

$$\epsilon_n = \begin{pmatrix}\epsilon_N\\\epsilon_E\\\epsilon_D\end{pmatrix} \qquad \delta v_n = \begin{pmatrix}\delta v_N\\\delta v_E\\\delta v_D\end{pmatrix} \qquad \delta S_n = \begin{pmatrix}\delta S_N\\\delta S_E\\\delta S_D\end{pmatrix} \qquad \delta\omega_b = \begin{pmatrix}\delta\omega_x\\\delta\omega_y\\\delta\omega_z\end{pmatrix} \tag{10}$$
(N = north, E = east, D = down). The drift error part is multiplied by the transformation matrix only for strapdown error evaluation. Defining φ as the latitude, λ as the longitude, $\omega_0 = 15°/h$ as the earth rotation and E as the earth radius (calculated with respect to the reference ellipsoid), the matrices M can be evaluated with the definitions

$$\alpha = \omega_0\cdot\sin\varphi \qquad \beta = \omega_0\cdot\cos\varphi \qquad \gamma = \frac{\tan\varphi}{E} \tag{11}$$
as follows in Eqs. (12)-(14), which give the explicit elements of the matrices $M_{\epsilon\epsilon}$, $M_{\epsilon v}$ and $M_{\epsilon S}$; these elements are combinations of terms such as $-\omega_0\sin\varphi$, $1/E$, $1/(E\cos^2\varphi)$ and $-\gamma$.
-
.ah + Cnh hah
( 2 * w;
-
( 2 . SW;
+
x v,
SW;)
(15)
+ w?) x sv, + sg,
and this can be remodeled to a similar equation as described for the angle error Sv,
=
M,,,
- en + M,,,,- Sv, + Mus- bS,
-
-
Crib 6ab
(16)
Finally, the position error has to be calculated using the equation bS, = R-' bv,. Putting all the results together, the error model of an inertial navigation system can be established. As explained previously, the vertical velocity as well as the height are incorrect, because of influences of the variation of the gravity field. In addition, the great value of the gravity vector leads to an inaccuracy of the measured vertical acceleration. The vertical velocity and height are the results of the integrated acceleration and are therefore characterized by significant errors. Thus the vertical axis of the navigation part is continuously supported by the barometric height, calculated by the air data computer. Therefore the system error model is formulated only for the horizontal axes and a separate error model exists for the vertical axis. For supporting the position data of a navigation system only the horizontal error model is significant and is described here while the
.
14
HANS-JURGEN HOTOP
other model can be found in the literature (Hotop, 1989; Lohl, 1982; Stieler and Winter, 1982). Using the Eqs. (11) as an abbreviation for the error model evaluation, one can additionally define VE
C=a+vE.y
K = p +
E*
C O S cp ~
Together with these abbreviations the error model is given in a vectormatrix formulation:
0 1 E
--
0 ...
-
1 E
a
0
0
0
0
-7
--K
0
...
0
-aD
aE
aD
0
-aN
...
3
-2.C
E a+C
P
0
1
0
0
0
T
O
0 ..
0
E 0
..
-E ...
... 1 -
I
E.
cOS
9
0
0
...
..
0
0
This differential equation describes the error behaviour of the horizontal axes of an inertial navigation system. The vertical error description is based on the error model of the barometric height, which can be evaluated by partially differentiating the barometric height formula. The errors of radio navigation systems are very different and depend on
the receiver in the aircraft as well as on the transmitter on the ground. Other effects, such as the intensity of the signal and the distance to the ground station, also influence radio navigation accuracy. With this information as well as the geometry and the electromagnetic influences, the accuracy data can be calculated with probability theory. Some equipment, especially the GPS, evaluates the stochastical data for the measured position, for example the covariance and correlation coefficients, which may be useful when combining with inertial navigation systems. Because such accuracy information is very different, the actual value for the radio navigation system currently in use has to be adapted before applying it as support data; it is not reasonable to evaluate all these errors with different error models.

For a first comparison between these two different navigation systems - inertial versus radio navigation - Figs. 8 and 9 show the position error of an inertial navigation system and the stochastical characteristic of radar measurements for real flight data. In the first figure the smoothed curve indicates a minimum of stochastical behaviour, but the maximum error is about 1.5 km after 60 min, which corresponds to the accuracy of a good inertial navigation system. The second figure outlines the errors of a radio navigation system, specifically a radar, which represents very accurate radio navigation systems. Deterministic errors of these systems are very small; only the distance error can be observed. At about 40 min after takeoff the error increases, and at this time the distance between the aircraft and the radar station reaches a maximum. The stochastical errors of about ±5 m
FIGURE 8. Typical east position error of an inertial navigation system during a flight.
characterize a good radio navigation system. Noise values of other radio navigation systems, except the GPS, are normally higher.
III. FILTER THEORY
For merging the signals of an inertial navigation system with those of a radio navigation system, many mathematical algorithms can be used. The main problem is how to reduce the noise of the signals, especially the high noise of the radio navigation system. In this case one can use digital filtering, based on the theory of differential equations. A digital filter equation is developed as the solution of a differential equation, primarily of the form

$$\ddot{x} + a\cdot\dot{x} + b\cdot x + c = y \tag{19}$$
or with derivatives of y on the right-hand side. The solution of such an equation in the form of a difference equation can be calculated with the Laplace transformation and the Z-transformation, or in a simple form with the difference quotient. All evaluations lead to an expression of the type

$$x_k = \alpha\cdot x_{k-1} + \beta\cdot x_{k-2} + \gamma\cdot y_k + \delta\cdot y_{k-1} \tag{20}$$
The coefficients $\alpha$, $\beta$, $\gamma$, $\delta$ depend on the coefficients of the differential equation as well as on the solution method. With this method the signals of the input $y_k$ can be smoothed, but cannot
be combined. If one uses this equation for filtering the signals of the radio navigation system, only the position errors of the inertial navigation system can be evaluated and subtracted. If there are no radio navigation signals, because of cutoffs, then the error of the position increases analogously to the error equation of the inertial navigation position. Such a sawtooth graph for the position error cannot be accepted; moreover, the errors of the accelerations, velocities and angles cannot be calculated exactly. So this method is inadequate for solving the given problem.

Looking further into probability theory, the next candidate method for this problem is regression. In its multiple-dimensional form the errors are described by a matrix-vector equation:

$$y = A\cdot x + u \tag{21}$$

where x is the unknown vector to be calculated, y is the input vector and u is a random error characterising nonmodelled system parameters. The solution of this equation is well known as the pseudo-inverse matrix. The following formulation solves the equation:

$$\hat{x} = (A^T\cdot A)^{-1}\cdot A^T\cdot y \tag{22}$$
This solution must meet the usual requirements of the regression method, which means that the distribution is Gaussian and that the matrix A is well known and describes the problem almost exactly. For our problem with inertial navigation, the matrix A models the errors of the inertial navigation system, and if the radio navigation is present, the errors can be calculated with the regression analysis. On the other hand, the errors and the random process of the radio navigation are not taken into account. To utilize all this information, the theory of the Kalman filter should be applied.

A. The Conventional Kalman Filter Theory
Here, the so-called Kalman-Bucy filter (Kalman, 1960; Kalman and Bucy, 1961) is generally named the Kalman filter (or conventional Kalman filter), as is standard in many papers. The following stochastic process x(t) is given:

$$\dot{x}(t) = F(t)\cdot x(t) + G(t)\cdot u(t) \tag{23}$$
$$z(t) = y(t) + v(t) = H(t)\cdot x(t) + v(t) \tag{24}$$
These definitions include:

• a vector x(t) as system state vector (e.g., the INS errors) with n components,
• a vector z(t) as measurement vector (e.g., radio navigation measurements) of m components with m ≤ n,
• a matrix F(t) as n × n matrix,
• a matrix G(t) as n × p matrix (p ≤ n),
• a matrix H(t) as m × n matrix, and
• random vectors u(t) and v(t) as vectors with p or m components.
The vectors u(t) and v(t) should be independent stationary processes with constant spectral density, which means the probability distribution has "white noise." Since the processes u(t) and v(t) are random vectors with zero mean value, it follows that

$$E[u(t)] = E[v(\tau)] = 0 \tag{25}$$

The covariances, or to be more precise the covariance matrices, for all t and τ of the definition interval, can be set:

$$\mathrm{cov}[u(t), u(\tau)] = Q(t)\cdot\delta(t - \tau) \tag{26}$$
$$\mathrm{cov}[v(t), v(\tau)] = R(t)\cdot\delta(t - \tau) \tag{27}$$
$$\mathrm{cov}[u(t), v(\tau)] = 0 \tag{28}$$
δ is the Dirac delta function; Q(t) and R(t) are symmetrical nonnegative matrices, differentiable in t. The differential equation of the linear dynamic system and the continuous function describe the physical system. F(t) should be transformed into the discrete matrix $\Phi(k+1; k)$, $k = 0, 1, 2, 3, \dots$, with $t_k = k\cdot\Delta t$ for a time interval Δt, because the input signals of such physical processes are discrete values. This matrix is often referred to as a transition matrix and can be described by evaluating the differential equation. A solution is given by an integral equation:

$$\Phi(t; t_0) = I + \int_{t_0}^{t} F(\tau)\cdot\Phi(\tau; t_0)\,d\tau \tag{29}$$
The matrix $\Phi(t; t_0)$ is nonsingular, and for $t_0 \geq t_1 \geq t_2 \geq 0$ and with I as the unit matrix it follows (Coddington and Levinson, 1955) that

$$\Phi(t_0; t_0) = I \qquad \Phi(t_0; t_2) = \Phi(t_0; t_1)\cdot\Phi(t_1; t_2) \tag{30}$$
If the solution is regarded for a small time interval, the assumption can be made that the physical system matrix F(t) is independent of time and therefore F is a constant matrix. In this case the solution of the differential
equation is much simpler:

$$\Phi(t; t_0) = e^{F\cdot(t - t_0)} \tag{31}$$
Because this solution only holds true if the time interval is very small, the e-function can be expanded by Taylor into a series:

$$e^{F\cdot\Delta t} = \sum_{i=0}^{\infty}\frac{(F\cdot\Delta t)^i}{i!} \tag{32}$$
and with a linear approximation the equation has the form

$$\Phi(t; t_0) \approx I + F\cdot\Delta t \tag{33}$$

For each time interval Δt the elements of the physical matrix F change analogously to the measured data. In addition, this formulation is particularly suitable for applications in the computer; a sketch follows after Eq. (35). The theory behind these explanations is the theory of linear dynamic systems. With the solution of the differential equation in the preceding form, the equation of the stochastic process now takes the discrete form

$$x_{k+1} = \Phi(k+1; k)\cdot x_k + u_k \tag{34}$$

with $u_k$ as an independent Gaussian stochastic process of zero mean value. Similarly, the measurement vector $z_k$ can be calculated by the measurement equation for a fixed discrete time point $t_k = k\cdot\Delta t$:

$$z_k = H_k\cdot x_k + v_k \tag{35}$$

As indicated, $v_k$ should be an independent Gaussian stochastic process with zero mean value. For both random vectors $u_k$ and $v_k$ the covariance is given in Eqs. (26) to (28). For the application in navigation, the mathematical error model shown in Eq. (18) can be used here: the state vector $x_k$ and the matrix F are those lined out in Eq. (18), so, to construct the transition matrix $\Phi$, only the value 1 must be inserted in the main diagonal or added to the diagonal values. The measurement vector $z_k$ is - for the support of inertial navigation systems by radio navigation data - normally a two-dimensional vector and contains only the horizontal position values. The measurement matrix $H_k$ has only standard values in the main diagonal, corresponding to the position elements. If the supporting system supplies additional data, for example the horizontal velocities, the dimension of the measurement vector increases.
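Equations (33)-(35), referred to above, translate directly into code. The following Python sketch builds the transition matrix from the linear approximation of Eq. (33) (optionally with more Taylor terms of Eq. (32)) and propagates the noise-free part of Eq. (34); the two-state model is an invented example, not the navigation error model of Eq. (18).

    import numpy as np

    def transition_matrix(F, dt, order=1):
        """Phi = I + F*dt (Eq. (33)); higher 'order' adds Taylor terms of Eq. (32)."""
        n = F.shape[0]
        Phi, term = np.eye(n), np.eye(n)
        for i in range(1, order + 1):
            term = term @ (F * dt) / i
            Phi = Phi + term
        return Phi

    # Toy two-state model: position driven by velocity, x = (pos, vel).
    F = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    Phi = transition_matrix(F, dt=0.1)
    x = np.array([0.0, 2.0])
    for _ in range(5):                 # Eq. (34) without noise: x_{k+1} = Phi x_k
        x = Phi @ x
    print(Phi, x)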
B. The Discrete Kalman-Bucy Filter

The idea of the Kalman filter is to minimize the expected value of the quadratic error between the state variable and an estimation vector, which
means

$$E[\|x_k - \hat{x}_{k-1,k}\|^2] = \min_{\hat{x}\in T} E[\|x_k - \hat{x}\|^2] \tag{36}$$

with

$$T = \{\hat{x} \mid \hat{x} \text{ is a linear estimation of } x_k\}$$

Generally the Kalman filter is outlined for continuous equations. For applications in engineering and physical problems the following discrete Kalman filter equations are used.
Theorem 1 (Discrete Kalman Filter). The optimal state estimation $\hat{x}(k+1|k)$ and the corresponding covariance matrix $P_k$ of the estimation problem

$$x_{k+1} = \Phi(k+1; k)\cdot x_k + u_k$$
$$z_k = H_k\cdot x_k + v_k$$

has to be evaluated by the following equations.
Update. kk(+)
(39)
=Xk(-)fKk'[Zk-Hk'Xk(-)]
P k ( f ) =
[I
(40)
-Kk'Hk]*Pk(-)
Kk =P k ( - ) . H ~ . [ H k . P k ( - ) . H ~ +
Rk1-l
(41)
The matrix Kk is the so-called Kalman gain matrix. In these equations the symbol (-) means the value immediately at the time before a measurement takes place and (+) the value after the measurement. The following diagram shows this situation:
measurement 1
measurement 2
The so-called prediction is evaluated for the covariance matrix P and the
RECENT DEVELOPMENTS IN KALMAN FILTERING
21
state variable x, even if no measurement vector exists. In this case, the system equations for the estimation problem are processed for the future time intervals. Otherwise, if there are measurements for fixed time intervals, the update Kalman filter equations have to be evaluated in addition to the prediction equations. The proof of this theorem is described in the literature (Kalman, 1960; Kalman and Bucy, 1961; Hotop, 1989a). The main problems of the conventional Kalman filter for practical applications are the inversion of the matrix (H Pa H T + R), which is part of the Kalman gain matrix equation, and within the calculation of the covariance matrix in the update Eq. (40) determining numerical instabilities. In this equation a minus sign appears between two matrices. Evaluating the equation with single precision (REAL*4) on a computer creates negative main diagonal elements of the covariance matrix, which conflicts with the theory of covariance matrices. A first new formulation to eliminate the negative diagonal elements was made by Joseph (1964). He utilized for the covariance matrix P the following equation:
-
Pk(+) = ( I - K k . H k ) . P k ( - ) - ( l - K k . H k ) T + K k - R k - K r
(42)
which is part of the discrete Kalman filter proof. However, this formulation requires more calculation operations and has no advantages in the numerical stability (Carlson, 1973; Thornton and Bierman, 1980). In most of the papers about Kalman filter techniques (Carlson, 1973; Bierman, 1977; etc.) the authors use the update algorithm of the conventional Kalman filter for only one measurement after the other, which means the components of the vector zk are evaluated successively. For all “squareroot” algorithms this is a necessary condition. With this idea, the measurement vector of m components at time r = tl is interpreted as m measurements following directly one after the other. Remembering this and defining hk,ras the ith-row vector of the measurement matrix Hk and of course R k as diagonal matrix D(r,) at this time instant, the Kalman gain matrix K is transformed in the following form:
with a j = h kT, i . P k . h k , i + r l
( i = 1 , . . . ,m )
The other equations of the conventional update algorithm have to be modified analogously. Especially the so-called Joseph algorithm can also be reformulated to use one measurement after the other. The main effect, using only one measurement, is the substitution of the
22
HANS-JURGEN HOTOP
matrix inversion, necessary for the calculation of the Kalman gain matrix Kk - which is performed generally by numeric computer inversion - by division. The advantage lies in a decreasing computer running time as well as the numeric stability. But with this method the whole update algorithm has to be evaluated m times, where m is the dimension of the measurement vector zk. As seen in the preceding equations, an additional assumption is a statistically uncorrelated ratio of the measurement vector zk. This follows from the diagonal characteristic of the matrix Rk,containing the stochastic description of the measurement. For practical reasons these assumptions cannot be made in reality; therefore, this is a large source of error. C . Square- Root Formulations of the Kalman Filter Algorithm
To eliminate the problem of a negative covariance matrix Pk Carlson (1973) and Bierman (1977) formulated a new Kalman filter algorithm, called the square-root formulation. The idea is to divide the matrix P k into two (three) matrices with an upper or lower triangular structure. The algorithm of Carlson is based on the assumption that the measurement was a scalar zk with uncorrelated covariance r, and uses the Cholesky decomposition to transform the positive definite quadratic covariance matrix Pk into Pk =
Sk'Sl
(44)
The Kalman filter equation can then be reformulated. The optimal state estimation x(k + 1 Jk) and the corresponding covariance matrix Pk of the estimation problem
Theorem 2 (Carlson Algorithm).
xk+l = Zk
k ) ' x k f uk
@ ( kf 1; T
= hk
*
xk
+ rk
has to be evaluated by the following equations. Prediction. xk+l(-)
= @ ( k f 1;
k)'xk(f)
(45) (46)
Update.
23
RECENT DEVELOPMENTS IN KALMAN FILTERING
where a = f k ( - ) *f,'(-)
+ rk
1
I?=
~+(cr.r~)"~
f k ( - ) = SF(-)* hk
Q -- Qi1I2 . The proof of the theorem is easily seen by inserting the equations into those of Theorem 1. For the factorization of the covariance matrix the method of Cholesky, which is outlined in the next section, can be used. A more practical decomposition of the covariance matrix is demonstrated by Bierman (1977). He represents the covariance matrix in the form of an upper triangular matrix U and a diagonal matrix D: Pk = Uk ' Dk * Uk7
(49) The other equations and thus their evaluation is developed analogously to the Carlson algorithm. The following theorem shows the factorization calculation of the matrices U and D in the form of vectors u,, which represents thejth column of the matrix U , and constants d, as thejth diagonal element of the matrix D.
- -
Theorem 3 (Bierman Algorithm). The covariance matrix Pk = Uk Dk Uk7 of the estimation problem can be calculated by the following equations: f
=
U T . h with f T = ( f,, . . . ,f,)
(50)
w = D . f c ~ , = d , - f ; ( i = l , . . . ,n )
(51)
The elements of the upper triangular and the diagonal matrix can be evaluated with the recursion d, . r d I' -- with a l = r + wI j = 1 ffl
u;
= uj
+ A.,
k, with A, = -
.r, ffj-1
(53)
24
HANS-JURGEN HOTOP
With this recursion at each time interval, new matrices U' and D* with the corresponding elements are calculated, At the beginning (time to = 0) the covariance matrix is generally a diagonal matrix, otherwise a decomposition of this matrix has to be done with a different algorithm. The proof of the theorem can be found in Bierman (1977) or Hotop (1989a). The triangularization of the covariance matrix produces a numerically stable algorithm. But this formulation can be used only if there are singledimension measurements with uncorrelated measurement noise, as explained earlier. Generally, one does not have this in practice. To handle correlated measurement noise, additional transformations are required, and this increases the computing time. But these algorithms cannot be utilized on a vector processor effectively - in contrast to the conventional Kalman filter algorithms - because the equations are made for serial working computers. They are structured serially, using a minimum of vector-matrix operations to save storage. D. Other Kalman Filter Algorithms
In contrast to the discrete Kalman filter the continuous formulation is used in special applications when the system matrix as well as the analytical solution of the differential equation are known. These formulations are different from the algorithm presented previously. For the support of navigation systems only the discrete formulation is applicable, and therefore no continuous formulation is outlined here. A further problem is often the unknown system noise matrix, which cannot be calculated or estimated in some applications, or it is possible that the system noise varies over the course of time. To manage all these problems, adaptive Kalman filter algorithms are formulated. The idea is to estimate the system noise matrix during execution, when a measurement takes place. A lot of different formulations are described in the literature (Groutage, Jacquot, and Smith, 1983; 1984; Jazwinski, 1970; Lechner, 1982; 1983; etc.). Here the one based on Jazwinski (1970) is presented as an example. The main purpose of each filter is to minimize the error residuals. In the Kalman filter algorithm the residuals themselves are mapped into the noise matrices Q and R. The expectation of the difference between a measurement vector z and the corresponding part of the state variable x is calculated. This difference must depend on the measurement noise matrix R, the system noise matrix Q and those parts of the covariance matrix P that describe the measurement statistics. Therefore the following equation can be formulated: E [ ( z ~ - H ~ ' X ~ ) ' ( Z ~ - H=Rk+Hk*[Pk+Qk].HT ~.X~)~]
(54)
RECENT DEVELOPMENTS IN KALMAN FILTERING
25
The covariance matrix and the state estimation vector are evaluated just in time before the update algorithm is calculated: x k = xk(-) and Pk = Pk(-). This equation is utilized to determine the system noise matrix Q or the elements of that part of the matrix which corresponds to the measurement vector :
The matrix Q is set to these values if they are greater than 0, otherwise the elements are set identically to 0. This often-used algorithm is very effective and simple. However, to determine all elements of the system noise matrix a sophisticated algorithm must be developed, which costs a lot of computation time and often leads to unrealistic results. The adaptive filter is an option of the usual Kalman filter algorithm. Therefore, all new formulations of the Kalman filter are independent of the adaptive part.
IV. NEWKALMAN FILTER FORMULATIONS A . Motivation
As a result of the evolution of the computer market, a new generation of computers is now available. Two aspects have influenced the development of software and algorithms: low-cost computers with decreasing CPU-time and increasing storage, PCs and new-generation computers, these are vector, array and systolic array computers. Here is a short characterisation of these computers vector computers: They use the “pipeline principle,” which consists of chaining and vector registers. Chaining in this case means that different floating point operations can be evaluated successively without temporary storage, which is needed if von-Neumann architecture is used. In addition the vector registers store the participated vector values, so the calculating units can load them without time delay. This explains the advantage for matrix-matrix or matrix-vector operations. Typical vector computers in the past were factorized by CRAY-Research Inc. (CRAY-1, CRAY-IS, CRAY-XMP etc.) and Control Data (CDC-CYBER 205 etc.). Today IBM and other computer manufacturers produce vector processors in addition to their computers with conventional architecture. array computers: They consist of several processors arranged and connected like the elements of an array. The conventional processors are handled by a host computer and each has its own storage and software. The problem
26
HANS-JURGEN HOTOP
is how to divide the software into parts, so that all processors work and terminate at nearly the same time. For the personal computer (PC) generation the transputer boards can be used in addition to the host PC. Special software for these transputers is available, whereas the hardware boards contain four or more transputer units. Beside these new applications on PC, some models of array computers were built in the past, such as ICL-DAP and Illiac IV of Burroughs. New developments of array computers are being made in several countries, for example in the United States, Japan and Germany. The inves’tigations focus especially on software, architectural problems and increasing the number of processors (for example 4096 or more). systolic array computers: They are not built yet, because the main idea of an optical processor has not yet been realised. The idea is to make a matrix-matrix operation in one cycle time with acoustooptic cells by using multiplication by convolution (Fisher, Casasent, and Neuman, 1986; Kung, 1984, 1985; Travassos, 1983, 1985). The software development is similar to that of an array computer. A few components of these optical computers or processors are already available, but the complete hardware has not yet been built. Therefore, new algorithms must be developed that are adapted to all these computers. In particular the main difference between the von-Neumann computer generation and the new ones has to be regarded. Otherwise, the principle of algorithm stability, which was negligible in the past, is now essential because of high-precision codes in the computers. For example, the conventional computers work on a 16 or 32bit data bus, the vector computer uses a 64bit bus, which implies a presentation of 128bits for double precision calculation. While in the past CPU time and storage were the restrictions, now the stability and special architecture of the computers are the main aspects in software development. In this chapter some new Kalman filter algorithms are presented, which include the numerical stability of the so-called UDUT formulations and are adapted to the new generation of computers. Not all the new formulations and ideas are presented, but all those that can be coded on the existing computers. Each of these new formulations is based on orthogonal transformation, so a short introduction into this theory is given and then the presentation of the Kalman filter algorithms follows. B. Application of Orthogonal Transformations As described earlier, all UDUT formulations need an orthogonal transformation for factorising the matrix P into that form. The familiar decomposition
27
RECENT DEVELOPMENTS IN KALMAN FILTERING
for a quadratic, positive-definite matrix P into an upper triangular matrix S is the Cholesky decomposition. Here, this transformation P = S S T for computer program application is briefly outlined, for i = n , . . . , 1 one has to evaluate
-
I
n
Pa f o r j = i - 1, . . . , 1
-
c+
/=I
.slf =
Sf,
forj> i
S/I'S,I
I
(56)
s/,= 0
The numerical problem of this transformation can be clearly seen in the negative sign in each equation, which probably erases the leading digits in the number representation of the computer. This extinguishment of leading bits can accumulate, and a negative value for the square root operation produces a fatal run-time error. To find these errors, additional software must be implemented to make the program safe. Therefore, other transformations are needed to prevent this disadvantage. The basis of all orthogonal transformations - in contrast to the Cholesky decomposition - is finding an upper triangular ( n , n ) matrix B and an orthogonal matrix W for an arbitrary ( n , n ) matrix A, so that W . A = B. 1. Householder Transformation
The idea of the Householder transformation is to evaluate an arbitrary ( n ,n ) matrix column by column using an orthogonal matrix. Naturally, a triangular matrix is the aim of this decomposition. Therefore the result of the transformation of each ith column is the ith unit vector, which means a vector with a 1 at the ith column and all other elements 0.
Theorem 4. Let v be an n-dimensional vector: v = V I , ~ 2 , 2 1 3 , .. . , unjT $ 0 and e l thefirst unit vector, with el = ( e l ,e2, e 3 , .. . , e n(7) = ( l , O , . . . ,0) , then there exists an orthogonal matrix W with
where f l
u1
LO
Ut
if
u={
-1
28
HANS-JURGEN HOTOP
Proof. A matrix, satisfying the assertions of the theorem is defined by
with u=v
+
llvll -el
0-
This can be seen by the following equations:
+ 0 -llvll -er) - (v + 0 -llvll . e l ) = vT * v + 2 ' 0(Iv(I 211 + 0211v112
llull 2 = u T . u = (vT
*
= 2.
llvll * (Ilvll
(59)
+a-vl)
For the multiplication of the matrix W with the vector v, the following equation is necessary: u - u T - v= (v
+
= (Ilvll
llvll . e l ) . (v
0.
+
0-
llvll . e l ) T . v
+ 0 * 211) * llvll * (v + 0 -llvll - el 1
(60)
and together with the previous denominator evaluation the result is
= v - 2*(IIvII+ ~ . ~ l ) . l l V l l . ( V + ~ ' l l V l l . e l ) 2 * llvll * (Ilvll + 0 * V I ) = v - (v + 0.llvll . e l ) = - a - llvll - e l
(61)
With the definition in Eq. (58) a transformation is specified that solves Eq. (57). One need only show, that the matrix W is orthogonal and symmetric. This can be easily seen by proving W WT = I, which is a simple calculation.
-
The matrix W is calculated for each column of the covariance matrix P, so that the output matrix has upper triangular form. The transformation itself is numerically stable, but for the calculation of the norm llvll a square root must be evaluated. Other aspects, for example minimal CPU time for computer realisation (see Table I), are motivations for us to look for other orthogonal transformations.
2 . Givens Transformation The Givens orthogonal transformation is similar to the Householder transformation, but here only a two-dimensional vector is evaluated.
29
RECENT DEVELOPMENTS IN KALMAN FILTERING
TABLE 1. CPU T I M EI N SECONDS FOR SEVERAL ORTHOGONAL TRANSFORMATIONS ON THE COMPUTERS ISM 4381, IBM 3090 A N D CRAY-IS 100 applications
IBM4381
IBM3090
CRAY- IS
Cholesky decomposition Householder transformation Givens transformation:conventional D”*.B formulation Last formulation
0.3120n 4.85233 0.78964 0.84426 0.68422
0.058 I 6 0.91450 0.14311 0.15675 0.13066
0.03740 0.14222 0.03904 0.03742 0.03242
Theorem 5. Let v be a two-dimensional vector: v = ( u , , ~ gonal matrix G exists with
G.v=
[
2 $ )0. An ~
ortho-
]
where a n d c2 f s 2 = 1 Proof. The matrix G, especially the elements c and s, can be evaluated in form of the following equations: v1 v2 and s = (63)
c=m
-
With these definitions, the multiplication G v can easily be evaluated and therefore the matrix G satisfies the assertions of the theorem. So, the transformation matrix has the form of a rotation matrix (s sin and c 2 cos), which completes the proof.
For the transformation of a whole covariance matrix P, ( n - i) matrices G has to be calculated for transforming the ith column of the matrix. Let us remember the presentation of the P-matrix factorization in square root form: the Givens transformation must be modified to become the same output form. The following construction is made for an arbitrary ( n , n ) matrix A to form an algorithm, which can be utilized not only for the Kalman filter, but also generally for similar problems. The matrix A can be presented in the form A = D’/2. (64) For example, let us set the diagonal matrix D as the unit matrix and B = A.
30
HANS-JURGEN HOTOP
The aim is to find an orthogonal transformation G , which transforms this matrix A into a diagonal matrix D and an upper triangular matrix B. So, the Givens transformation is made for a two-dimensional vector. The description of the algorithm is first outlined for a (2,2) matrix A:
If the element b21= 0, then no transformation is needed, because the matrix B is already in upper triangular form (==+ G = I). Otherwise, if bZI# 0, the following matrix G is used together with the definition:
Transforming the original matrix in its factorized form, it follows
(68)
But the result should be of the same form as the input matrix, therefore the matrix in Eq. (68) is divided into the two matrices D and B:
These two matrices are the result of the Givens transformation, which means no matrix G must be evaluated. Otherwise in this formulation no square root is necessary, because the matrix D can be used as the diagonal matrix instead of the multiplication of D 1 l 2 by itself. But the algorithm requires a lot of operations, because all columns and rows of the (n,n) matrix A has to be transformed.
RECENT DEVELOPMENTS IN KALMAN FILTERING
31
To save computing time the algorithm is reformulated to produce the value I in the main diagonal of the matrix B. Several formulations can be found to obtain such a matrix. Here an algorithm is outlined that requires a minimum of CPU time on a computer. To transform an ( n ,n ) matrix A, the same decomposition A = D1/' B as just described is used together with the formulation
- d l O 0
d2
0
..
0 . ..
'.
D=
. -0
0
...
...
...
0
0
0 dn
-
Theorem 6. The result D ' / 2 B of the Givens transjbrmation upplied to an arbitrary (2, n ) matrix, presented in the ,form D112 B, cun be described us followls, if bZI # 0 yields
-
whereas the variable r is deji'ned as in Eq. (66). For evaluating the next rows of the whole ( n ,n ) matrix - with the assumption that bIl # 0 - the calculation process can be reduced because the main diagonal element of the matrix B is 6 ,I = 1. Therefore the result of thejth row of the complete ( n , n ) matrix A or, to be precise, D 1 / 2B . is
Remark.
,.l(I)
= dl( I - ' )
.b:(,J-')+ 4 . b2/ I
= ,.Z(I-I)
+ dJ
*
bfl
32
HANS-JURGEN HOTOP
Proof of the Theorem. Since the matrix A is represented in factorized form, the product AT A = B T . D B is evaluated and compared with the transformed matrices BT D B. For a (2,2) matrix it follows
-
=
- -
-
- + d2
. . . = dl b:2
*
b&
-
This result is equivalent to the matrix B T - D 8 , which can easily be evaluated out of Eq. (65). The numerical stability can be proved by using the rounding error analysis of Wilkinson (1965; Wilkinson and Reinsch, 1971) and is described in details by Hotop (1987; 1989). The quantity of operations is reduced, which can also be seen by a comparison of the CPU-times for the presented orthogonal transformations on a computer IBM438 I , IBM3090 (with vector processor) and a vector computer CRAY-IS (Table I). 3. Singular- Value Decomposition In most problems the matrices that have to be transformed are nearly singular, which means Rg(A) 5 n for a ( n , n ) matrix. The singular values
33
RECENT DEVELOPMENTS IN KALMAN FILTERING
of a matrix A are the positive square root of the eigenvalues of the matrix A - A T . It can be proved, that an arbitrary matrix A can be factorized into orthogonal matrices U , V and a diagonal matrix D(A = U D - V T ) ,containing the singular values. In the literature (Bunse and Bunse-Gerstner, 1985; Maess, 1985; Schwarz, 1986; etc.) several algorithms are described to evaluate a singular value decomposition (svd). Here the so-called QR algorithm with “shifts” is demonstrated. The algorithm is based on Francis (1961) and is adapted to the outlined Givens transformation. Especially for the Kalman filter formulation the covariance matrix P = U D . U?’ has to be transformed. In a first step, the covariance matrix is transformed into a bidiagonal matrix using the Givens transformation:
-
.
- & J;I;.e2
o
BT =
0
a
0
...
0 -
0
...
0
o
&.e3
0
...
(75)
0
0
0 -
0
0
0
0
a-
a * e ,
2 . Calculate the eigenvalues by using the characteristic equation:
o = [d, - ( I + ef ) - XI - [d,+ I - ( I + ef+ I =
X’ - x
-
ti
r!+f
1
2
4
+ c) + r .s - d,
-
-
XI
-
- d, d,+ I
- + ef )
6 = d, (1
2
I
2
d,+ I e, + 1
(77)
- x ~ , =~ - ~ - - J ( d - t ) 2 + 4 . d , . ~ , + I . e f + l with
- e,+
c = d,+l *
(1
+ ef+l)
34
HANS-JURGEN HOTOP
3. Evaluate the singular value oj,which is the totally greatest eigenvalue! Therefore the minus sign holds, if 6 + E < 0, otherwise the positive sign must be used for oi= A. Now the QR decomposition: B ~B .- o k . I = a,. R
has to be evaluated and the transformation of the matrix a,' BT again into bidiagonal form is the last part of the singular value decomposition. The whole algorithm together with a FORTRAN program is described in Hotop (1989). The steps - calculation of the singular values, evaluation of the Q R decomposition Eq. (78) and transformation of (2,'. BT to bidiagonal form must be repeated until the element e, is 0 or nearly 0. Then the same procedure is realized for all bidiagonal elements of the matrix BT and the algorithm terminates if all these elements become 0. The number of iterations for a singular value decomposition can be found in Maess (1985), Schwarz (1986), etc., and they outline a number from 2n up to 5n iterations. For the covariance matrix decomposition on a CRAY-IS with an accuracy limit of the number of iterations is 2.51 on average. The description of the algorithm shows that a lot of CPU time is needed to get the singular-value decomposition of a matrix. But otherwise this algorithm is extremely numerically stable and evaluates a triangularisation for nearly singular matrices. C . New Formulation of the Kalman Filter Prediction For most of the new Kalman filter algorithms the covariance matrix is factorized by the square root formulation of Bierman (1977). Using the UDUT decomposition for the covariance equation and, for simplification and reducing the formula, omitting the signs (-) and (+), the prediction part of the Kalman filter can be written as follows:
Together with the additional definition
u; = @(k
-
+ 1;
k) Uk
Equation (79) can be rewritten into the form U k + l * Dk+l *
T
-
Uk+l = U; Dk * U;'
+ Qk
(81) The goal is to calculate Uk+, as an upper triangular matrix and Dk+, as a diagonal matrix with the preceding equations. For the decomposition of
RECENT DEVELOPMENTS IN KALMAN FILTERING
35
each U; and Qk as upper triangular matrices, one of the orthogonal transformations can be chosen. Without restrictions to the generality, the orthogonal transformation matrix is called G . Using this definition and evaluating first an upper triangular matrix U; for the right-hand side of the covariance equation, it follows
-
-
G DL/* * UIT = G DL'2 * ( @ ( k + I ;
k ) * Uk)7 = D, - 1P.0,'
(82)
with DL/?.D ' / * = D x!
k
This formulation is equivalent to the formulations for the Givens transformation. To get the full covariance matrix, it must be multiplied by its transposed matrix:
0,.
Dk .0[ = u; .D:/*. G . G T . D;/*. U: = u;. D ~u;7 .
(83)
= @ ( k + I ; k)-Uk-Dk-U[-@T(k+l: k ) This provides the decomposition of the first term, which is, of course, the main part of the covariance equation. The matrix Q, as the second part of the covariance equation, can be factorized by the orthogonal transformation into the form Q 1 / 2Q . ' / 2= Q . So, Q is normally a diagonal matrix, and therefore the decomposition is simple. But the main problem for the addition of the two decomposed parts still remains. This can be solved by the decomposition of a separate presentation for the two matrices as follows:
1
1
To show that the decomposition represents the prediction Eq. (79) of the Kalman filter algorithm, the matrix must be multiplied by its transposed matrix. Using the calculation rules for matrices, the equation can be confirmed by Uk.Dk-U,T=@(k+
1; k ) * U k * D k * U [ * @ 7 ( k +I ; k ) + Q 1 / 2 . G * G T . Q ' / 2
= @ ( k + l ; k).Uk*Dk*U,'*@7'(k+1;k ) + Q (85)
Summarizing the result, the matrices Uk and 6, can be used as a decomposition for the prediction covariance equation. The orthogonal transformation G is applied to the transposed matrices and is not needed in reality, as the evaluation shows. The matrix Q is needed in decomposed form, which
36
HANS-JURGEN HOTOP
poses no problem, because in general Q is a diagonal system noise matrix. Otherwise, a decomposition has to be made using one of the procedures described previously, or if the matrix is constant for the complete application time, an analytical decomposition can save CPU time. To sum up, the set of these equations define an algorithm, which is numerically stable by means of orthogonal transformations. Different transformations can be utilized in accordance with the nature of the matrix. This also applies to singular covariance matrices, which can be decomposed by the svd. The state vector evaluation is not affected by the decomposition, because no covariance matrix is used in that equation. So only the transition matrix @ is part of this equation. In that case no numerical problem arises and the calculation of the state vector is not changed from the conventional formulation. D . New Formulation of the Kalman Filter Update
Similar to the prediction part of the Kalman filter the update equation is now reformulated in the UDUT notation. For these equations as well, the index k and the (-) and (+) signs are omitted by defining the covariance matrices as follows P* = Pk(+) and P = Pk(-). Looking at Theorem 1 the Eqs. (39) and (40) have to be recalculated, and in a first step, the definition of the matrix Kk is substituted into the covariance and measurement update equation, applying the previous abbreviations: X* = X
+ P - H T . [H - P *H T + R]-' - [Z - H * X I
(86)
P*= P-P*HT.[H-P.HT+R]-l-H*P
(87)
The problems occur mainly in the covariance equation, and therefore a decomposition of these matrices is of fundamental interest. Using the factorization of P with P = U D ' UT, it yields the result
.
.
. .
. .
.'
.
U*. D*. U*T= U D. UT- U D UT.HT.[H U D. U HT+R]-'. H .U D. UT =
U . { D - D . U T . H T . [ H . U . D . U T . H T + R1-I . H . U . D } . U T
= U.{D
-
D . X T . [ X . D . X T + R]-' . X - D } - U T
.
(88)
The assistance matrix is defined as X = H U. This formulation is used for the Bierman algorithm with the restriction to a scalar measurement with uncorrelated noise. In particular Bierman designed a formulation for serial computers and on-line computation, at a time when CPU time and storage were important for the implementation.
RECENT DEVELOPMENTS IN KALMAN FILTERING
37
I . Update Formulation Based on Andrews Other authors such as Andrews (1968) uses the full covariance equation in matrix form. In contrast to the original description, here the covariance matrix P is used in the form P = W . W T instead of the formulation P =U*D*UT.
Theorem 7 (“Andrews” Update Formulation). The symmetric, positive definite covariance matrix P’ o f t h e Kulman filter E y . ( 4 0 ) can be,factorized in the form p* = U * . D * . U * 7 = U . Y . D . Y T . U T
(89)
where Y=I-D.XT.Z-T.[Z+R1/2]-1.X
X=H-U
z . z T =x . D . x ~ +R The proof of this theorem is easily seen by substituting the additionally defined matrices Y, X and Z into Eq. (89) and calculating the covariance matrix (Andrews, 1968; Hotop, 1989). This algorithm was not used in the past, because more computing time is necessary in comparison with the other formulations. The reason for the increased CPU time is the decomposition of the matrix 2.With the formulation described here and the Givens transformation of Section 1V.B the CPU time can be reduced. To triangularize the matrix Z the same idea as that for the prediction can be used:
Utilizing the decomposition by the Givens transformation for the righthand side matrix, the lower part RTi2 becomes 0, while the result of the upper part is a diagonal and an upper triangular matrix, as required. Only the problem of the matrix inversion [Z R1’2]-1remains. Since these matrices have an upper triangular form, the following lemma calculates the inversion based on the Gaussian algorithm (Lawson and Hanson, 1974).
+
Lemma 1. Let an arbitrary quadratic matrix B be factorized into a regular upper triangular ( m ,m ) matrix V (elements u,,) and a regular diagonal (m,m ) matrix D (elements d,) in the f o r m B = V D WT.The elements of the inverse
. -
38
HANS-JURGEN HOTOP
matrix 6-1 = v - ~D-' . .v-' , or to be more precise, the elements Gly of the matrix V-l are calculated by the following equations:
I
J
-
uly i-l
0
j < i
The elements of the matrix D-I are the reciprocal values of the diagonal element di. Proof. The proof is divided into two parts: First the matrices V and D are regular, which implies the existence of their inverse matrices; second, to show the preceding construction for the elements, the equivalence V-' . V = I is used by inserting the definition of Go. If the Givens transformation is utilized for the decomposition, the values of the diagonal elements vii of the matrix V are all 1. This yields a reformulation for the preceding equation of the element f i g : ( 1 j = i
I
i-l / =i
I
0
j
and reduces the number of operations as well as the CPU time. One problem remains for this algorithm: The factorized matrix R 1 / 2is needed. As described for the system noise matrix the measurement noise matrix can be decomposed with one of the orthogonal transformations. In general, this matrix has a special form and can therefore be triangularized analytically. For example, let R be a (2,2) matrix and characterise the covariance of a position measurement, say, a radio navigation system, described in the form
with p I 2 as the correlation coefficient between the two measurements. A decomposition can be found: R = S - 1 S T with S=
[
-
q
.
r 0
H
P12'rll r22
1
(94)
RECENT DEVELOPMENTS IN KALMAN FlLTERlNG
39
which can be proved by multiplying the matrix with its transposed matrix. The decomposition of other kinds of matrices R differs, but this previous work on analysis reduces the computing time and in most cases factorization can be done very easily. The last work on this algorithm has to be done on the calculation of the state-variable update. Therefore the Kalman gain matrix K is needed. Together with the defined matrices X and Z this matrix can be constructed in The results of this algorithm are disthe form K = U-D-XT-ZpT-Z-l. cussed together with those of the others in Section V.
2. Update Algorithm Based on Jover and Kailath The following algorithm has its origin in Jover and Kailath (1986). The basic idea of this evaluation is the so-called Schur complement, a method to solve a linear system of equations. Any given linear system of equations in the form A * x = r with A (n,n) matrix, x and r n-dimensional vectors can be evaluated if the matrix A is split into four blocks and analogously the two vectors x and r into two new section vectors x I ,x2 and r l , r2. Using these split terms, the linear system of equations becomes and
All
A = [A2,
x=
,I:[
r=
[ri]
(95)
Now we will define the so-called reduced matrix A;,, which is called the Schur complement in several papers: A;2 = A22
A21 *AT,' A12
(96) Similar, for the right-hand side of the equation one has to define a new vector: -
r; = r2 - AZI.ATI' . r l
(97)
Using these definitions, the linear system of Eqs. (95) is reduced to
The solution is a lower stepped linear system of equations, which can be easily evaluated by calculating first the vector x2 and then the vector xI. If one uses the idea of the Schur complement for the update formulation of the covariance matrix equation in the form p' = p - p . H T . [H. p . H7 + R1-I. H . p
(99)
this equation can be modified analogously to the definition Eq. (96). The simplest way is to define abbreviations for the concerned matrices as
40
HANS-JURGEN HOTOP
follows: p* = P - A . R - ' . A T
(100)
with R=H.P.HT+R
and A = P . H T
If one compares this modified covariance equation with the Schur complement equations and matrices, the matrix P* can be described as the Schur complement of the following matrix M: M=[:
21
Let us bear in mind the decomposition of the covariance matrix, the aim is to construct an analogous factorization for the matrix M.
Theorem 8. Define R as a regular quadratic matrix with R = . = H p . HT + R
v vT
.
(102) and a symmetric matrix P decomposed into P = U * D UT, then the matrix M outlined in Eq. (101) can he decomposed into the following three matrices:
.
M= where K=P.HT.VpT and p * = U * . D * . U j T P' satisfies the covariance Eq. (40) of the Kalmanfilter algorithm.
(104)
Proof. Multiplying the three matrices in Eq. (103) with each other, it follows
Inserting the abbreviations outlined in Eq. (104), the matrix M can be reformulated into
RECENT DEVELOPMENTS IN KALMAN FILTERING
41
The lower right matrix X can be recalculated together with the covariance matrix Eq. (100) as follows
.H T . V-T. 9-1 .H . p + p* = p . H T . (v.vT)-'. H . p + (p - p . H T. R-' .H.p)
X =p
= p.HT.R-'.H.P
(107)
+ p - P.HT.R-' .H . p = p
In addition to this evaluation, the full matrix M of Eq. (101) is obtained. In addition, a decomposition for the same matrix M of Eq. (101) can be evaluated, using the measurement noise matrix R, factorized in the form R =V.VT: V '=[O
H*U I U ]*[O
0 D]'[U$
This can be easily checked by multiplying the matrices with each other. In this case, two decompositions of the matrix M are available, and based on the theory of transformations, an orthogonal matrix G must exist, so that the following equation is valid:
For the transposed matrices the Givens transformation decomposition of Section 1V.B can be used analogously:
With these equations the covariance matrix U*T and the corresponding diagonal matrix D* can be obtained by evaluating the orthogonal transformation for the matrices on the left-hand side in Eq. (1 10). These matrices are all well known. The right-hand side matrices are calculated by eliminating the product of the matrices U T .HT by applying the orthogonal transformation. The interesting parts of the covariance matrix development are the decomposed matrices D* and U**. With this equation the covariance matrix P' is available in triangular form. To get the full Kalman filter in triangular form, the state vector after measurement has to be considered:
+ Kk* [Z - H .XI = x + { P HT (H - P
X =x
*
*
-
-
HT + R)-'} [Z- H X]
(111)
42
HANS-JURGEN HOTOP
analogously, rZ = x i k ( + )and x = xk(-), see Eq. (37). This poses a problem, because the Kalman gain matrix, used for calculating the state vector x, is not explicitly present in the preceding decomposition. A new algorithm has to be developed, which needs only matrices of the previous covariance matrix decomposition. Especially the upper part of the matrix M in form of the right-hand side in Eq. (1 10) can be utilized. Placing the definition of the matrix V into this upper part of the matrix, it follows:
[VTIV-'. H .PI = [(H. P . HT. R)T/2I (H. P . H T + R)-'/2. H -PI
(112) Remembering again the Schur complement and the equation for the state vector x one can construct an analogous matrix C to evaluate the state vector or, rather, the transposed vector xT:
( H aP.H'+ R)T/2 (H . P . H T + R)-'/2. H . P
c=[
...
...
T
( H - x- 2 )
XT
]
(113)
Applying the idea of the Schur complement to the state vector evaluation, the solution of the preceding matrix C can be calculated analogously to the Eq. (96): i'
= x'
-
.
(H . x - z ) ' ( ~ .p . H T + R ) - T / ~ (H . p . HT
+ ~ ) - 1 / 2 . H. p
= x T - (H . X - z ) ~ ( H . P . H ~ +R)-[ . H . P = xT
+ [PT-H T . (H - P
*
HT + R)-T * (Z
-
-
(1 14)
H * x)]'
.
So the matrix H P H' + R is triangularized by V V T .This matrix is symmetric and regular, and therefore the transposed matrix is identical to the origin matrix, which holds true for the inverse matrix as well. This explains the conversion in the previous evaluation. The covariance matrix itself is symmetric, so that the last equation is identical to the state vector equation of the Kalman filter. For evaluating the state vector X* of the update formulation the following recursion can be used.
Theorem 9. Let K be an ( m , m + n ) matrix of the following form:
- -
-
K = [(H P HT + R)T'2I (H P H'
+ R)-'12 - H - PI
(115)
with these components:
kj,j
i = 1 ,... , m ; j = 1 , . . . ,n + m
K is the upper part of the matrix shown in Eq. (113).That means this matrix is part of the result of the decomposition of the covariance matrix. In this case the two part-matrices are lower triangular matrices. Dejine a vector y as
y T = [ ( H - x- z ) I x'] ~
(116)
43
RECENT DEVELOPMENTS IN KALMAN FILTERING
The state vector f consists of the last n components of the vector y*, whose components yT are evaluated by using the following recursion (kji E K): f o r k = 1, . . . ,n + m
y;=yk
for j = m, . . . , 1
Ym = Y;
(117)
for i = 1,. . . , m + n
y: = y; - kji.ym
To prove this theorem an additional lemma must be utilized. Lemma 2. Let X be an upper triangular (m,m) matrix with 1 in the main diagonal and y an m-dimensional vector. The jollowing equation
.
y* = x-I y
(1 18)
can be evaluated by calculating the elements of the result vector with the recursion Y ; = Ym
The proof of this lemma is easily seen, because it is the so-called backward recursion of the inversion, using the Gauss algorithm for linear equation systems (Schwarz, 1986).
-
Proofofthe Theorem 9. Define an m-dimensional vector a as a = H x - z. For an arbitrary (m, m + n) matrix K the following multiplication holds true: m
m
c k j , 2 - a j. . . ,
ckj,n+m.a,
j=l
j=l
and, if the result is subtracted from an m-dimensional vector y T , it follows
If the output vector is named y*, each element is calculated by the following equation: m
y; = y / - ~ k j ~ ~ . a lI = 1, . . . ,m + n j=1
In this equation the elements kj,/ are set to the elements of the upper part of matrix C (see Eq. (113)). For the last n components of Eq. (122), the
44
HANS-JURGEN HOTOP
variable 1 starts at m and ends at m + n, the elements yl are those of the vector x (see definition of y Eq. (1 16)) and the elements y* are those of the vector x*. Using the definition of the elements kj,l (right part of the matrix K, Eq. (115)), it follows X*T=XT-ilT.(~.p.~T+~)-1/2.~.p (123) If the first m components of y (Eq. (116)) are defined as the vector a = H x - z, the recursion of Eq. (122) produces
.
m
y ; = a , - ~ k j 3 1 . a l I = l,...,m
(1 24)
j=l
This equation is equal to the recursion in Lemma 2, so it can be reformulated by using the transposed vectors and the definition of the left part of K (see Eq. (1 15)) and vector y (see Eq. (1 16)):
a*T = a . X - T = (H . x - z l T . (H . P . H ~ +~
) - ~ / 2
The equivalent evaluation of this equation occurs as in Lemma 2: a; = a,
fori=m-1,m-2
,..., 1
af=ai-
2
kl,i.a;
l=i+l
and outlined step by step, it follows
A new arrangement for these equations can be found, if the calculation is done column by column and not row by row. An additional vector b saves the column results, which means, for the first column, to set for all i, b y ) = a, and then to store the result a; = b:). For the last column in each row the following equation has to be evaluated: bi(m-l) = ai - k m.1. - a,* = b,( m ) - k m,r b p ) (i = 1 , . . . ,m - 1) (128) and the column before the last one is given by ,
-
45
RECENT DEVELOPMENTS IN KALMAN FILTERING
Summarizing these results, a recursion represents the evaluation of the elements a: forj= l , . . , , m : for I = m,.. . , 1 :
hJ = aJ a; = bl
f o r i = I , ...,1 - 1 :
b,=hj-kl,,.a;
So the matrix C is of the upper triangular form, which means the elements
k,, of the upper right part are 0 for i > I . This implies, for the recursion, that the index of the variable i can stop at m. Comparing this result with the assertion of the theorem, one concludes that the preceding equation is identical to the recursion for the first m components. For proving the whole recursion (122) and (124) are combined. Because the equations are identical, the same recursion, as outlined in Eq. (130), can be obtained for the last n components. Together this implies the recursion for the index i from 1 to m + n. It remains to be shown, that the recursion evaluates the state vector estimation x of the Kalman filter. The first m components produce the Eq. (125). The result is then the input for the second part of the recursion from m up to m + n. So the vector a* is identical to the vector a in Eq. (123). Combining these equations, it follows
ir= x T - (H . x
-
z)T. (H. p . H
~ R+) - T I Z . (H. p . H T +
R)-I/~.
H. p
(131) and comparing this result with Eq. ( 1 14) shows the evaluation of the state variable estimation by this recursion.
This theorem completes the formulation of the new Kalman filter algorithm. The three parts - the prediction described in Section IV.C, the update of Section IV.D.2 and the state vector evaluation - are specially arranged for the Givens transformation of Section IV.B.2. Likewise other orthogonal transformations can be used, which deliver the same form of decomposed matrices. This shows the multiple usage of these algorithms to solve numerical stability problems in the Kalman filter design. Because this algorithm is outlined here in an analytical form, in Section 1V.E a conclusion with aspects for the implementation will follow. 3. Update Algorithms f o r Array Processors
Using algorithms for an array processor, all operations should be matrixvector or matrix-matrix operations, to save time. A special problem depends on the architecture of each computer, which means how many
46
HANS-JURGEN HOTOP
processors are available and in which configuration they work. The adaptation to these specialised hardware can be made only by special software for the computer itself, but the main adaptation for array computers is the design of algorithms in matrix-matrix architecture. The conventional Kalman filter algorithm equations are suitable for array computers, because they are in a matrix-vector form. In the literature (Kung, 1984; 1985; Travassos, 1983; 1985; Fisher et al., 1986; etc.) the authors look for equations without vector addition to make such operations parallel. This can be done with the same formulation used here for the prediction evaluation in Eq. (84) of Section 1V.C. For example, the conventional update state vector equation can be written in a similar form:
= (I - K . H ) . x + K . z
=X +
K*[z- H-X]
These conversions are helpful for the adaptation of the conventional Kalman filter for array computers, but no stability or matrix-inversion requirements are improved. The new formulation outlined here can save CPU time on these special computers, for the algorithm is described in matrix-vector notation exclusively and high numerical stability requirements are obtained. The whole prediction algorithm as well as the covariance matrix calculation in the update formulation comply with the requirements, but for the update state variable evaluation this is not clearly seen. Remembering Eq. ( I 17) in Theorem 9, the last part of the recursion can be written in a vector multiplication form as follows n+m times
-
y* = y* - kj y; =
-[
m,
n f m times
yj*, . . . ,yj*]
(133)
where the vector kj represents thejth column of the matrix K, which means the upper part of C. Analogous reformulations for the Givens transformation algorithm of Section 1V.B in vector form produces a complete Kalman filter algorithm adapted to array computers. To sum up, the advantage of the new Kalman filter algorithm can be seen in the numerical stability and especially the suitable formulation in matrixvector form. Nevertheless, for special hardware components or computers architecture, specifically adapted Kalman filter algorithms can be evaluated that are numerically stable and save CPU time.
47
RECENT DEVELOPMENTS IN KALMAN FILTERING
E. Review of a New Kalman Filter Algorithm The described Kalman filter algorithm including the Givens transformation has been developed, but for implementation purposes a complete presentation is still missing. Therefore, in this section, a conclusion of the algorithm is outlined. Because one can use a number of different orthogonal transformations, this part of the algorithm is not included in the listing. As described, the Givens transformation shown earlier can be utilized effectively. As an abbreviation set D' Y T = ortho(D X'), if a triangularization should be done. That means, the product of the diagonal matrix D and the matrix X T has to be transformed into a diagonal matrix D* and an upper triangular matrix Y T using the orthogonal transformation ortho(D. X'). The input matrices are arbitrarily given ( m , n ) matrices ( m 2 n ) and the output at any time (n,n)matrices.
-
-
Initialisation. At the beginning we generally set the matrix Uo as a diagonal matrix. This means the square roots of the starting covariance matrix Po are placed into the main diagonal of this matrix Uo or the decomposed upper triangular covariance matrix is used. Normally, the diagonal matrix Do is the identity matrix. The other matrices and vectors are initialised as usual for the conventional Kalman filter formulations. Let
n
=
dimension of the state vector xk
m = dimension of the measurement vector zk
Uk
=
Dk xk, zk
=
Hk
=
1,0
=
@(k + I ;
=
k) = Q, R = M[2n, m + n] = D*[2n] = U', x* =
- -
upper triangular covariance matrix of Pk = u k Dk U,' diagonal covariance matrix of Pk = U k Dk Up state and measurement vectors measurement matrix identity matrix and zero matrix transition matrix inside the interval [ ( k 1 ) . At, k At] system noise matrix and measurement noise matrix matrix of dimension (2n, m + n) diagonal matrix of dimension (2n, 2n) matrix and vector for the evaluation.
- +
-
Prediction. This part of the algorithm has to be calculated during each cycle k (at each time)
u;7=U;'@T(k+1;
k)
M[1 : n, 1 : n] = UiT
+
~ [ ( nI ) : 2n, I : 4 = Generally Q is a diagonal matrix,
~ =iJG' ~
flis an abbreviation for the square roots
48
HANS-JURGEN HOTOP
of the main diagonals. In the case, that Q is not a diagonal matrix, Q has to be decomposed into a triangular matrix. This operation can be performed analytically at the initialization of the algorithm, because Q is constant in general. Otherwise an adaptive formulation of the Kalman filter must be utilized. D*[l : n] = Dk
-
Dk+l UEl
+
D*[(n 1) : 2n] = I
and
.
M)
= ortho(D*
-
and Pk+l = Ukfl Dk+l U,TI
The state variable is calculated with the following equation: X&+l = @ ( k + l ; k)'Xk Update. This part of the algorithm is evaluated after each measurement. At this instant, the measurement noise covariance matrix Rk has to be decomposed into a triangular matrix. Often an analytical evaluation for the transformation of this matrix into a triangular form is possible and can save CPU time. M[l : rn, 1 : rn] = R;'*
and M[1 : m,(m
+ 1) : (rn + n)]= 0
U i T = U,T1* H l M[(m+ 1 ) : ( m + n ) , 1 : m ]= U i T M[(m
+ 1) : (rn + n), (m+ 1) : (m+ n)] = U;+I
D*[l : m]= I
and D*[(rn + 1) : (rn + n ) ] = Dk+l D' M = ortho(D* M)
Uk+l = M[(m
-
+ 1) : ( m + n), ( m + 1) : ( m + n)] +
+
D*[(rn 1) : (m n)] These equations evaluate the covariance matrix Pk+lafter a measurement update. Now the calculation of the state vector xk+l for the update follows: Dk+1 =
+
x*[l : m]= zk - Hk - x ~ + ~and x*[(rn 1) : (rn
+ n)] = xk+l
for j
xrn = x*(j ) fori=l tomfndo x * ( j ) = x * ( j ) - M ( j ,i) Xk+l
+
= x*[(m 1) : (rn
* xrn
+ n)]
With this presentation and the orthogonal transformation best suited for the user's application this Kalman filter algorithm can easily be implemented.
RECENT DEVELOPMENTS IN KALMAN FILTERING
49
V. REVIEW OF THE BACKWARD KALMAN FILTER THEORY
For navigation problems measurement is often not available at any time. In this case the prediction extrapolates the variables of the state vector depending on the transition matrix and the system noise matrix. After a long time interval without measurement, the estimate and the covariance is unstable because the measurement noise is then normally smaller than the covariance of the state variable. Such a configuration is shown in Fig. 10 for the position estimate by the so-called forward Kalman filter of an 80-minute flight and measurement update for the first and last 5 minutes of the flight. This is a realistic demonstration, because at the start and after the landing the position is highly accurate and well known, while during a flight other navigational aids are probably not available. If the information of the measurement is available only at the end of the flight, the estimation error of the position is not continuous, when the airfield measurements are used. For on-line evaluation this situation cannot be changed but for off-line (postflight) computation all information is present at any time. For off-line calculation the Kalman filter equations can be evaluated backward in time, which means beginning at the end of the flight back to the start. This situation is also plotted in Fig. 10 as the backward Kalman filter curve. Symmetric patterns of the backward and forward Kalman filter estimations are observed. To reduce the error of backward filter with siiiootlihg Iwtckward Kaliiiaii filter forward Kaliiiaii filter 0
I
-t --(t
-t-
-
I
( 0 -
a
FIGURE10. Principle of the forward, backward and smoothing algorithms of the Kalman filter.
50
HANS-JURGEN HOTOP
the estimation and cut off the instabilities both sorts of information can be combined by smoothing. The result of such an algorithm is also demonstrated in Fig. 10. In most cases the backward smoothing Rauch-Tung-Striebel (1965) algorithm is used, which is represented here.
+
Theorem 10 (Backward Kalman Filter Algorithm). Let @ ( k 1; k ) be the system matrix, P k + l (-), P k ( + ) be the covariance matrices and Xkfl(-), Xk(+) be the state variables, evaluated by means of the forward Kalmanfilter (Theorem 1 ) . The smoothed backward Kalman filter estimation and covariance can then be calculated by the equations ffk = Xk(+)
+ Ak ‘ [ x k + l
- xk+l(-)1
Pk =Pk(f)
+ Ak ’ [ p k + l
- Pk+l
(-11
(1 34) ’
(135)
with Ak = P k ( + ) * a T ( k +1;
k).Piil(-)
Regarding the index of the equations, one can see that the higher index is on the right-hand side and therefore the algorithm has to be evaluated backward in time. This algorithm is based on the well-known “maximum likelihood estimator” (MLE) and can be proved with this theory (Rauch et al., 1965; Hotop, 1989a; etc.). A . New Formulation of the Backward Kalman Filter
The conventional forward Kalman filter can produce numerical instabilities, which implies a singular covariance matrix. In this case, the backward Kalman filter is stopped by a fatal error, due to the inversion error of the covariance matrix. T o avoid this effect and to design a description analogous to the new forward Kalman filter formulation, the backward algorithm is reformulated using orthogonal transformations. For the covariance matrix together with the definitions, P = Pk, P = P k ( + ) , P = Pk+l(-), @ = @ ( k + 1; k ) I
-
and the properties of the covariance matrix (symmetric, positive definite), the following holds true: p=p =p
+ p .@ T .p-1 . [pk+l- p] .p-1 .@ * p + p . @ T . p-1. p,,,
.p-’ .@ . p
-
p . @ T . p-1 * @ . p
(136)
As with the forward Kalman filter a matrix is established to apply the Schur complement for a part of the above equation.
RECENT DEVELOPMENTS IN KALMAN FILTERING
51
Theorem 11. Let P be the covariance matrix of the update algorithm and p be that of the prediction part - lf no measurement occurs it follows P = P k ( + ) = P k ( - ) - and @ as the transition matrix. The connection between these matrices is esrablished by
[email protected]+Q
A decomposition into the upper triangular matrix U and the diagonal matrix D with the help of an orthogonul transformation G of the matrix p, which consists of t h e j r s t and third parts of Eq. (136) p = Ij.D,(iT= p -
[email protected]'
[email protected]
can be evaluated with the following matrix equation:
using the definition X = P
- aT- P-l .u.
Proof. The proof of this theorem is analogous to that of Theorem 3 in Section IV.D, which means these matrices have to be calculated with their transposed matrices. The result of the lowest right matrix is X . D . X T + I j . D . I j T = p . @ T . p - I. u . d . U T . p - T . @ . p . p (138) = p.@T. p-T.@. p + p = p These whole evaluations can be found in Hotop (1989a). Now the remaining part of the backward Kalman filter equation has to be decomposed together with the state vector evaluation.
Theorem 12. The same assumption as that for Theorem 5 is asserted and the state variable fi and the matrix U#, defined as U#T = iJk+l .p-1.
a .p
( 139)
can be evaluated out of the following matrix Y: y=
[
UT
UT.
p-1. @ . p 0
-uk+I
(xl+I - %k+l)
%k
using the recursion
1
for i = n + l , n + 2 , . . . , 2 n + 1 : f o r j = n , n - l , ...,I : for k = 1 , 2 , . . . ,2n :
ym=yii yjk = yik -yjk * y m
(140)
52
HANS-JURGEN HOTOP
The elements of the matrix U#T are those of the submatrix of Y starting at the ( n + 1)th row and column (yik E U#T for i = n + 1 , . . . ,2n and k = n + 1,. . . ,2n) after evaluating this matrix with the recursion. In addition, the state vector is the (2n+ I)th row of the evaluated matrix Y ( ~ ~ ~for +1~ = n, + 1,, = . . . ,i2n). , The proof is identical to that for Theorem 4 in Section IV.C.2 (Hotop, 1989a). This theorem supplies the elements of the state vector and one part of the covariance matrix for the backward Kalman filter. The complete covariance matrix must be calculated as follows: P = P$U#.D,+l .U#T
.p-1. 9 . p = p - p . fjT. p-1.9. p + p . t p T . p-7.. p,,, .p - 1 . 9 . p = p + p . @ T , p - T . lj,T+I, Dk+l* irk+,
(142)
Regarding this solution, the addition of the two upper triangular matrices in Eq. (142) still remains to be decomposed. This can be easily realized with the known presentation of the matrices, if one seeks the decomposition of parts that contain addition. Using also orthogonal transformation G it follows
Multiplying this presentation with its transposed matrix, the desired decomposition of the covariance matrix for the backward Kalman filter is evaluated. (The proofs can be found in Hotop (1989a).) B. Review of the Backward Kalman Filter Since the development of the new backward Kalman filter algorithm is rather complicated, a rough summary here describes the evaluation:
Qk/2
1. Generate the matrix 2. Calculate the product Ul+l(-) aT(k+ 1; k ) 3. Construct the matrices for the Schur complement
53
RECENT DEVELOPMENTS IN KALMAN FILTERING
4. Apply the Givens transformation to the preceding matrices and calculate the following results:
with p=u.fj.u7 = P/((+) - Pk(+).
+
aT(k 1;
k ) . P&(-) .@(k
+ 1;
k ) * Pk(+)
This is one part of the backward Kalman filter covariance matrix (see Theorem 6). 5 . Calculate the state variable xk and the matrix U r T = u;+1
.Pill(-). 9 ( k
+ I;
-
k ) Pk(+) by applying the recursion lined out in Theorem 7 to the matrix Y: Y=
[
U[+l(-)
u;+I %;+I(-)
Ur+l(-)*PFil - Q T ( k +1: 0
-C+l
k).Pk(+)
Xkr(+)
6 . Use again the Givens transformation for the matrices
1
The result provides the lower triangular matrix U l and the diagonal matrix 6, and therefore the backward Kalman filter covariance matrix: Pk
--
" T
= U k Dk * Uk
VI. APPLICATION OF THE KALMAN FILTER
IN
NAVIGATION
There are a lot of algorithms to calculate the errors of an inertial navigation system with support information. The additional measurements could be done by one of the radio navigation systems, described in Section 1I.B. The errors and the error models of such systems are very different. For example a VOR/DME receiver produces probabilistic errors and of course
54
HANS-JURGEN HOTOP
additional errors due to the transmission geometry. Nevertheless, all these errors can be described in the form of covariance information dependent on the distance between the transmitter and receiver, which has to be evaluated during the calculation of the support process. Furthermore, these measurements are not continuously available, because of switching to other stations, overhead a station (cone of silence) or disconnecting during roll manoeuvers of the aircraft, etc. Different from these errors are those of the GPS, which provides highly accurate position information (less than 10 m), but the signal disappears if the antenna is not directed to the satellites. Besides, not all the GPS satellites are already launched, so the GPS navigation is not available at all times. Therefore other highly accurate support information will be needed. To demonstrate the work of the Kalman filter one can use simulated flight data and support data, which is outlined in the next section. In general, real flight data are different from simulations, so here too results of a real flight together with radar information are used. This radar information simulates the support data of a highly accurate radio navigation system. In addition to the radio navigation information, ground-ground information is used. These support data are produced while the aircraft is at the airport. Each airport has an aerodrome reference point, which provides highly accurate position measurement. So, if the aircraft stays near this position for about 5 minutes, a highly accurate position update can be made. All flight tests at the DLR in Braunschweig include such a ground-ground testing in order to evaluate off-line, after the flight, a highly accurate reference path (Lechner, Hotop, and Zenz, 1981; Hotop, 1983; 1984). Of course, for real-time filtering a backward Kalman filter is not suitable, but to demonstrate the advantages of the application as well as the accuracy of a high precision flight path calculation, the results of the backward Kalman filter are shown in the figures. As well as the application of this backward filter to the whole flight, the evaluation between parts of the flight can make the navigation information precise. The intention of this chapter is to present the differences between the various Kalman filter formulations. In this case a technique for evaluating highly precise reference navigation data is used. These values are needed to investigate other navigation systems. For example to test a radio landing system or other inertial navigation systems, the reference data might be ten times more accurate than the test object. Therefore, this procedure using ground-ground and radar information together with an off-line forwardbackward Kalman filter calculation is developed. The advantages of this method can be seen by the results of a flight analysis. The flight path is shown in Fig. I I , where the aircraft (a DO-28
RECENT DEVELOPMENTS IN KALMAN FILTERING
55
I
52 .So
Celle
Helm: Braunschweig
P2.00
-
k ildesheim
-
F I G U R11. E Flight path of a
test
flight
turboprop aircraft) is flying in the northern area of Germany for test purposes. The start and landing aerodrome was Braunschweig, and the aircraft was equipped with a platform navigation system Delco Carousel IVa and an onboard data recorder. In addition, the data of a radar situated at Braunschweig Airport could be used to calculate a reference path. These data are available only after the flight. The evaluation was realized in the computer center of DLR in Braunschweig. To show the accuracy of the evaluation, in Fig. 12 only the ground-ground support data are used for the calculation of the reference path. In the five minutes before takeoff and after landing, the position and the velocities are known. The position can be calculated from the airport map and the velocity is 0 in all axes, because the aircraft did not move. This implies a four-dimensional measurement vector and a (4,lO) measurement matrix H with 1 for four elements and 0 for the rest of the elements. In Fig. 12 the accuracy of the reference path, which is the corresponding diagonal element of the covariance matrix, can be seen in contrast to the difference between the reference and the radar path. The theory of the sigma band, normal distribution, is confirmed, because only a few differences are outside the covariance graph. The estimate has an accuracy of about
I
8
-
time
FIGURE12. Difference between reference path and radar data and the covariance of the Kalman filter if only ground-ground support data are used.
l00m for a 75-minute flight. Let us keep in mind the accuracy of the pure navigation system 1 nm/h (M 2.2 km for this flight), the improvement of the reference path calculation has a factor of 22! Next Fig. 13 shows the results using the radar data as support. The covariance has a maximum of 10 m and therefore the accuracy is improved. (-'stiiiiatioii covariance ( i - u ) iiortli positioii (radar rstiiiiatioii)
+
~
-ia I
I
I
II
44 t h e
I
I
-
61
mln
08
FIGURE 13. Difference between reference path and radar data and the covariance of the Kalman filter if radar support data are used.
57
RECENT DEVELOPMENTS IN KALMAN FILTERING
Naturally, this accuracy is influenced by that of the radar system, which of course is affected by the range. I n addition, the noise characteristic of the radar data are demonstrated again. The previous two Figs. 12 and 13 show smoothed curves, a result of the backward Kalman filter. If one uses only the forward Kalman filter the covariance of the position error is calculated with respect to the corresponding value in the system noise matrix Q. The effect of this error modeling is presented in Fig. 14, where estimates are plotted of the position error of the inertial navigation system for the forward as well as for the backward Kalman filter with only ground-ground support data. The effect of a noncontinuous path is seen at the end of the flight, when the ground support information is used in the forward Kalman filter. This effect is normally regarded at each point of support information so the position reference path becomes a noncontinuous function. To prevent this fault the backward Kalman filter can use all support information of the whole flight and smooth the graph as seen in Fig. 14. This demonstrates again the advantage of the backward Kalman filter for high-precision reference path evaluation. Naturally, these effects are observed for all state variables evaluated by the Kalman filter technique. A . Estublishing a Simulation
To develop building a simulation two problems have to be solved: constructing iiortli position estiiiiatioii (backward) * iiortli positioii estiiiintioii (forward) +.
8
-2.rt3 0
2m
4a
time
-
1
.In
a
FIGURE14. North position estimation of forward and backward Kalman filter with ground-ground support data for the test flight.
58
59
RECENT DEVELOPMENTS IN KALMAN FILTERING
a nearly realistic flight path and calculating the errors of an inertial navigation system. For the flight path computation one can ignore the roll and pitch angle because they are not part of the system model matrix (Eq. (18) in Section 1I.C). Here simulation flight data are constructed with three programs: one for a constant acceleration, one for a linear acceleration and one for a horizontal circular flight path. By connecting these programs and applying them to the three axes, the flight path shown in Fig. 15 is created. The imaginary aircraft starts after five minutes of ground test at the null position to the northeast, performs four standard turns at about 90 km north and 70 km east from the start point, leaves the circle in a southwest direction and makes another two turns in about llOkm west and 20km north from the start. The flight ends 70 km west and 60 km north from the starting point. The altitude is 1600m and the time of the flight 134minutes. As an example Fig. 16 shows the north velocity during this simulated flight. Second the errors of the inertial navigation system must be evaluated, because the simulation above calculates only the flight path, velocities and accelerations. Equation (18) presents the error differential equation of an inertial navigation system. To evaluate the differential equation i t is assumed that inasmall timeinterval AT, themeasurement valuesoftheacceleration, the velocity and the position could be regarded as a constant. With these assumptions, the differential equation is given in the form t+At).x(t) =ao.x(t)
x(t)=@(t;
(144)
Using the Laplace transformation it follows s * x(s)
-
iiortli velocity
xo = 9
0
*
x(s) I (I * s - @o) x(s) = xo
(145)
tj,y
I)
W
time FIGURE
-
11)
16. North velocity of the simulated flight.
@In
18)
60
HANS-JURGEN HOTOP
with xo as the state vector at the starting time to = 0. This linear equation can be evaluated by a numerical method, for example the Runge-KuttaVerner method as part of the IMSL (1982) library (DVERK). It is well known that many numerical methods produce errors for special applications. Therefore this method is verified by a special evaluation for the differential equation. An analytical evaluation of the differential equation is not realistic, because the linear equation leads to a partial fraction, as for example for the north position error in the form 7
i =O
The coefficients a; and b, should be variable, which implies, that the 0 positions of the denominator polynomial cannot be evaluated analytically. Therefore all cases of polynomial 0 positions have to be regarded and then for all the different equations a transformation from the Laplace space into the time space must be performed. Summarizing the number of equations and back transformations, 8 66 = 528 cases have to be examined. In addition, a numerical method must be used for evaluating the 0 positions of the denominator. This is the reason to use a numerical evaluation for the whole differential equation. For checking the numerical method, the number of cases for an analytical solution is reduced and only the following special conditions take place: U N = a E= 0, aD = -9.81 m/s 2 , W N = V E = vD = cp = 0. The analytical evaluation (Hotop, 1989a) is compared with the numerical method and a relative maximum difference of 5 lo-' is reached using the computer CRAY-1S with a relative accuracy of For the simulation of an inertial navigation system these errors are negligible, and the numerical method of the IMSL library can be used to evaluate the differential equation without any restrictions. With this solution and the simulation of a flight path the errors of the inertial navigation system are calculated. At the beginning the following values are set for the errors
-
-
angle error: eNo
= em =
=0
velocity error: 6wNo = 6wm = 0 position error: 6SNo = 6Sm = 0 drift error: DNo = DEo = DDo = 0.0l0/h As an example Fig. 17 shows the evaluated north velocity error with these start values. These errors are then added to the pure velocities, positions and
RECENT DEVELOPMENTS IN KALMAN FILTERING
61
FIGURE 17. North velocity error of the simulated flight by using the Runge -Kutta -Verner procedure for the evaluation of the ditrerential equation.
angles. In addition, a random number generator produces the noise of the signals. All this together is the test input signal for the Kalman filter algorithm. The advantages of such a simulation are the variable construction of a flight path and the possibilities, to investigate different error behaviour for correlated or uncorrelated noise. Simulation of the support data is easier, because one needs only the position and these signals without any additioial error calculation. Also a modification of the signals can be performed to demonstrate the statistical behaviour of the support. Analogous to the inertial navigation errors, this is done by adding the values of a noise generator. All other errors of any radio navigation signal is dependent on the functioning and physical behaviour, which differ among the systems. Naturally, different error models can be implemented and used for the support data, but the main concern of this simulation is the calculation of the inertial navigation errors. B. Simulation Datu Results
The simulation is used as input data for the different Kalman filter formulations. The advantages of the new formulations are examined especially with respect to the computation time as well as the accuracy of the algorithms. Different simulations are used, while the amount of support data changes. A total of 80 500 pieces of data on position, velocity, acceleration and angle vectors are generated as input for the error model of the inertial navigation system. As support data three increments of updated values are used: 100,
450 and 1750. Each number of data includes the ground-ground measurement, which means that highly accurate support data are available for the Kalman filter. For the simulation 50 values are taken before the start and 50 values after landing, with a time interval of 6 s (summing to 5 minutes), as ground-ground measurements. The other values are equally distributed between the 30th minute and the 100th minute of the flight. This is realistic, because when the aircraft is climbing, as well as in the landing configuration, radio navigation information is often not present; during the approach, air traffic control operators or special radio landing systems guide the aircraft.

The comparison of CPU times is always critical, because these values depend on the computers themselves as well as on the implementation. Most computers have clocks whose time resolution is too coarse, so the calculation of the CPU time for a program part is either impossible or inaccurate; adding such time values leads to errors that cannot be assessed. The other problem concerns the implementation, which is very important on a vector computer. All the algorithms described here are implemented in FORTRAN IV. For the vector computer the algorithms are implemented by using special subprograms available from the distributor. The CPU times of the vector processor can give an estimate for the application of the algorithms in array processors, although, of course, the construction of array computers is different from that of vector processors.

Tables II and III show the CPU times of the different computers for all formulations of the Kalman filter described here. In the tables the new algorithm means the formulations in Section IV.C, which are differentiated no further because their CPU time variation is very small. The CPU times for the singular-value decomposition as an orthogonal transformation can be presented only for the CRAY-1S: one gets 574.95 s for 100 inputs of support data and 587.178 s for 1750 inputs of data. The high CPU time results from the number of Givens transformations that have to be evaluated and afterward multiplied by the covariance matrix.

TABLE II. CPU TIME IN SECONDS FOR THE SIMULATED FLIGHT DATA WITH 100 INPUTS OF SUPPORT DATA FOR SEVERAL KALMAN FILTER ALGORITHMS ON THE IBM 4381, IBM 3090, CRAY-1S

Kalman filter                         IBM 4381    IBM 3090    CRAY-1S
Conventional                          2290.566     432.716     11.954
Conventional with 1 measurement       2141.808     396.756     11.004
Joseph algorithm                      2299.383     434.989     12.129
Carlson algorithm                     2306.117     438.846     55.271
Bierman algorithm                     1729.598     333.471    122.476
New algorithm                         1565.542     303.811     61.824
TABLE III. CPU TIME IN SECONDS FOR THE SIMULATED FLIGHT DATA WITH 1725 INPUTS OF SUPPORT DATA FOR SEVERAL KALMAN FILTER ALGORITHMS ON THE IBM 4381, IBM 3090, CRAY-1S

Kalman filter                         IBM 4381    IBM 3090    CRAY-1S
Conventional                          2327.551     437.187     13.149
Conventional with 1 measurement       2187.493     400.627     12.519
Joseph algorithm                      2385.250     448.343     13.489
Carlson algorithm                     2402.644     462.627     58.5609
Bierman algorithm                     1738.802     333.439    124.211
New algorithm                         1651.920     312.780     63.0425
These operations are calculated for the columns as well as for the rows. To reduce the CPU time for the singular-value decomposition, a new algorithm could probably merge the orthogonal transformation and the QR algorithm (which might be the subject of a future project). For the two UDUᵀ formulations, uncorrelated measurement noise is used, which means the additional transformation needed to process correlated measurements was not implemented; the CPU time is normally longer for correlated measurements, while in the new algorithm this case is automatically included.

Comparing the CPU times, it is easy to see that the conventional algorithm is the fastest formulation on the vector computer. This can be explained by the consistent matrix-matrix operations in the algorithm. The increased CPU time of the UDUᵀ formulation is caused by the factorisation: it is generally understood that improvement in numerical stability costs time. The new algorithm requires less CPU time than the other factorised Kalman filter formulations. The additional advantage of the new formulation becomes evident when one compares the CPU times between the conventional Kalman filter and the new algorithm on the conventional computer IBM 4381 and on the CRAY-1S. These four values demonstrate that the new formulation is at least 30% faster on the conventional computer, while on the vector computer this algorithm needs six times the CPU time of the conventional algorithm. While the prediction part of the algorithm is executed 80 450 times, the update algorithm is evaluated only 100 and 1725 times. The main advantage of the new formulations, as well as those of Carlson and Bierman, is numerical stability. For this purpose the factorisation of the covariance matrix is introduced, which of course needs a lot of additional CPU time for the prediction. Therefore it is of interest to examine the CPU time differences among the various update formulations, because this is the key task in the development of a new algorithm. In Table IV the CPU times are listed for the update algorithms on the CRAY-1S for 100 and 1725 support measurements.
TABLE IV. CPU TIME IN SECONDS FOR THE SIMULATED FLIGHT DATA WITH 100 AND 1725 INPUTS OF SUPPORT DATA FOR SEVERAL UPDATE FORMULATIONS ON THE CRAY-1S

                                       Number of measurements
Kalman filter                            100        1725
Conventional                            0.1032     1.2597
Conventional with 1 measurement         0.0555     0.6587
Joseph algorithm                        0.1197     1.5624
Carlson algorithm                       0.1111     1.3099
Bierman algorithm                       0.0708     0.8515
New algorithm                           0.0596     0.8404
Singular-value decomposition            0.9940    16.2651
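The row "conventional with 1 measurement" can be made concrete with a small sketch. The following Python fragment (assumed variable names; not the chapter's FORTRAN implementation) processes the measurements of one update one scalar at a time, which is valid when the measurement noise covariance R is diagonal, and avoids any matrix inversion:

    import numpy as np

    def sequential_update(x, P, z, H, r_diag):
        """Kalman measurement update, one scalar measurement at a time.
        Valid only for uncorrelated noise: r_diag[i] is the variance of
        measurement i, i.e., R is diagonal."""
        for i in range(len(z)):
            h = H[i]                       # one measurement row
            Ph = P @ h
            s = float(h @ Ph) + r_diag[i]  # scalar innovation variance
            K = Ph / s                     # gain vector, no matrix inversion
            x = x + K * (z[i] - float(h @ x))
            P = P - np.outer(K, Ph)        # (I - K h) P
        return x, P

    # hypothetical example: two position measurements of a 4-state system
    x = np.zeros(4)
    P = np.eye(4)
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    x, P = sequential_update(x, P, np.array([3.0, -1.0]), H, np.array([4.0, 4.0]))

With correlated measurement noise, R is no longer diagonal and this shortcut requires the additional decorrelating transformation mentioned in the text.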
One concludes that the CPU time of the new algorithm is the shortest, with the exception of the conventional filter with consecutive and constant measurements; but this algorithm, like the other factorisation formulations, generally needs additional transformations to process correlated measurement noise.

After these results on the CPU time differences, the accuracy of the Kalman filter is analysed. Comparing the different formulations with the conventional Kalman filter algorithm, no significant difference can be observed. For example, the absolute difference of the east position estimation is less than 2 m (or relatively less than 1 m) and therefore negligible in view of the representation of values in a computer. Figure 18 shows the difference between the simulated and the Kalman filter data evaluating the east velocity error. This computer run utilizes 1725 measurements as support information. The figure points out that all differences are within the covariance data (1-σ) calculated by the Kalman filter. The large value of the covariance at the beginning of the flight depends on the system noise value in the matrix Q: because a velocity error influences the position error, the noise characterisation value must not be too small. Furthermore, the stochastic behaviour must correspond to the resolution of the inertial navigation system output signals. In the digital output bus the velocity has a resolution of about 0.06 m/s, and this value corresponds to the diagonal element of the system noise matrix or, to be precise, the square of this value divided by the update rate. This is the cause of the relatively high value of the covariance when no measurement update is present; nevertheless the covariance varies in small increments and
FIGURE 18. Difference of the east velocity error between Kalman filter estimation and simulation data, together with the 1-σ values.
provides an accuracy of about 0.08 m/s. For the standard turns the estimated error is higher and depends on the rotation rate. Only during these manoeuvers does the difference between Kalman filter estimation and real data reach the accuracy specified by the covariance value; for the other flight times, the differences are less than 0.01 m/s. The difference of the position error reaches a maximum of 3 m. Naturally, for all data the Kalman filter algorithm estimates the errors of the inertial navigation system with the specified accuracy. For this simulation the stochastic values in the system noise matrix Q and the measurement noise matrix R are identical to those for a real flight. This is appropriate, because the data for the simulation are calculated with the accuracy of the number representation in the computer, which is much higher than that of the digital data output bus of an inertial navigation system. Moreover, these simulation data should demonstrate the equivalence between the different Kalman filter formulations, which can be seen in the negligible difference values. In this simulation, however, no correlated measurement noise was utilized.
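The additional transformation that the factorised formulations need for correlated measurement noise is typically a whitening step; the chapter does not spell out its exact form, so the following Python sketch is an assumption for illustration. A Cholesky factor of R turns the measurement equation into one with uncorrelated, unit-variance noise that can then be processed component by component:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def whiten(z, H, R):
        """Transform z = H x + v, cov(v) = R, into an equivalent measurement
        with identity noise covariance, via the Cholesky factor R = L L^T."""
        L = cholesky(R, lower=True)
        z_w = solve_triangular(L, z, lower=True)   # L^{-1} z
        H_w = solve_triangular(L, H, lower=True)   # L^{-1} H
        return z_w, H_w

    R = np.array([[2.0, 0.8],                      # hypothetical correlated noise
                  [0.8, 1.0]])
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    z = np.array([5.0, -2.0])
    z_w, H_w = whiten(z, H, R)                     # components now uncorrelated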
C. Presenting the Data of a Flight Test

To compare a number of different Kalman filter formulations for the support of inertial navigation systems, real flight data are needed to demonstrate the algorithms under real stochastic behaviour. In this case the data are based on a laser-gyro strapdown inertial navigation system. A helicopter
flight with a BO-105 was selected, including extremely demanding manoeuvers. The requirements on the navigation system were very high, because of the manoeuvers and the strong vibration of the helicopter in comparison with the normal operational environment, where the system is installed in an Airbus A310. Figure 19 outlines the flight path of this demonstration flight. The steps in the graph show the position increment of the last bit: the computer of the inertial navigation system produces position data varying in a range of ±180° and stores these data in a word with 20 significant bits, which corresponds to 0.000172° ≈ 18 m for the last significant bit. This is the accuracy of the position data output. The position error of the pure navigation system can be determined from Fig. 19, because the starting point is set to 0 and the landing point is identical to the start. The difference between these points in the plot marks the navigation error, which is approximately 1 km in the northern and 0.5 km in the eastern direction.

The mission of the helicopter was to fly overhead along a road in both directions with different velocities and different drift angles. The drift angle is defined as the angle between the aircraft's longitudinal axis and the direction in which the velocity vector points. The different velocities are seen in Fig. 20, where the north velocity is plotted. The oscillations of up to ±50 m/s within less than 30 s at the end of each mission part demonstrate the demanding manoeuvers during this flight test. As support information the so-called ground-ground information before starting and after landing is used. In addition, a laser radar tracks the helicopter during its straight flights above the road. The accuracy of the position measurement of this laser radar is about 1 to 2 m, and owing to the measurement method the values are correlated. This is one of the reasons to show these results.

D. Flight Test Data Results
The results of this test flight demonstrate the influence of the correlated measurement noise, which can be used with the new Kalman filter formulation without any additional transformation, in contrast to the other factorised formulations. Figure 21 shows the difference between the conventional Kalman filter and the new formulation for the north position estimation of the described flight path. All support data, ground-ground data as well as the radar measurements, are used to evaluate the state variables. Both formulations are thus able to utilize correlated support data; the difference is below 10⁻⁹ m, which is less than the computer accuracy.

FIGURE 19. Flight path of the helicopter demonstration flight evaluated by the pure inertial navigation system.

FIGURE 20. North velocity for the flight path evaluated by the inertial navigation system on the helicopter BO-105.

Moreover, the estimation of the position error is up to 1000 m, whereas the
relative difference must be less than 10⁻¹². The results for the east position are similar. The covariance for the position error of this flight is less than 1 m. The reason for this small value is the highly accurate support information: for this test flight the distance between helicopter and laser radar was between 200 m and 2000 m, which implies a negligible error of the radar measurements. In this case the accuracy of the support data is between 1 m and 2 m, which explains the accurate position and covariance estimation.

Figure 22 shows the difference between the north position estimation of the Bierman and the new formulations. In this case the Bierman formulation is used without any additional transformation, so that the effect of correlated measurement noise is not taken into account. The difference in Fig. 22 reaches values of about 3 m, which is more than the calculated covariance of about 1 m. The covariances could, however, be higher in the Bierman algorithm: a comparison of the covariance values between the two algorithms yields a maximum difference of about 0.2 m. To sum up, the Bierman algorithm estimates a low covariance of about 1.2 m, while the difference from the "real" position is greater than 3 m. For high-precision navigation this error cannot be neglected. So, for the Bierman and all other factorisation formulations, additional transformations must be evaluated; however, this increases the CPU time.

Analogous to the simulation results, a comparison of the CPU times is
FIGURE 21. Difference between the conventional Kalman filter and the new formulation for the north position estimation.
noted. In Table V the CPU times for both the forward and backward Kalman filters are outlined. The same effects as for the simulation results are seen. The results of the backward formulations for the first three algorithms are identical, while the CPU time differs; here the errors in calculating these CPU times show up. For the Bierman algorithm the same backward formulation is used as for the others, but for evaluating these matrices an additional matrix multiplication U · D · Uᵀ must be executed. This consumes about 13 s on the IBM 4381 and 0.7 s on the CRAY-1S.

FIGURE 22. Difference between the Bierman algorithm and the new formulation of the Kalman filter for the north position estimation.

On the CRAY-1S the new backward formulation has nearly the
same CPU time as the conventional algorithm, although the numerical stability is better for the new formulation. On the IBM 4381 the new algorithm needs about 5 s, or 7%, more CPU time. This result is particularly important, because many applications of the Kalman filter for supporting inertial navigation systems produce fatal errors during computer runs. These errors are caused by problems in the inversion of the covariance matrix during the backward Kalman filter evaluation; the reason for nearly singular covariance matrices is a highly accurate inertial navigation system supported by a likewise highly accurate radio navigation system.

TABLE V. CPU TIME IN SECONDS FOR THE HELICOPTER FLIGHT WITH ALL SUPPORT DATA FOR SEVERAL KALMAN FILTER ALGORITHMS ON THE IBM 4381 AND CRAY-1S

                          Forward                  Backward
Kalman filter         IBM 4381    CRAY-1S     IBM 4381    CRAY-1S
Conventional           911.026      5.017      67.7399     2.7464
Joseph algorithm       950.864      5.125      67.1563     2.7312
Carlson algorithm      925.422     22.717      67.0391     2.7261
Bierman algorithm      680.610     47.986      80.8773     3.4189
New algorithm          607.495     24.350      72.0721     2.5081

VII. SUMMARY

The support of inertial navigation systems is an important and expanding part of aeronautics and aviation research. The systems used for air navigation today are laser-gyro strapdown inertial navigation systems and, for support data, radio navigation systems. The accuracy of the inertial navigation, which depends on the accuracy of the gyros (0.01°/h drift implies a 1 nm/h position error), will probably not be improved, because higher accuracy would dramatically increase the costs. But the systems deliver a lot of data, which are important for flight guidance and control. The present radio navigation systems have errors depending on the principles of measurement and do not calculate any data for flight control. In the future the position accuracy will improve if the global positioning system (GPS) is available anywhere, any time. The analysis for the support of inertial navigation is described by the Kalman filter technique, which has been well known since 1960. As accuracy increases, the numerical stability of the software grows more important. As the new generation of computers, such as vector or array computers,
enters the market and the performance of low-cost computers (PCs) keeps growing, new software must be developed to take advantage of this evolution. For this purpose new Kalman filter algorithms have been developed to improve numerical stability. The first new formulations, by Bierman, were made before the computers of the new generation were available; today the main interest is to adapt new formulations to special computer hardware. The Kalman filter formulations outlined here are usable on most computers. The advantages of the new formulation are shorter CPU time, numerical stability, use of correlated measurements without additional computation, and adaptation to vector and array computers. The improvement in numerical stability is made possible by the use of orthogonal transformations. A special algorithm for the Givens transformation helps to decrease the CPU time of the Kalman filter algorithm. The CPU times on several computers are outlined, together with the results of simulated and real test-flight data. The future task for new algorithms is surely the development of new algorithms for various orthogonal transformations. Perhaps new formulations for the singular-value decomposition can provide more numerical stability while reducing CPU time.
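To illustrate the orthogonal building block referred to in this summary, the following Python fragment computes and applies one Givens rotation. It is illustrative only; the special vectorisable formulation of this chapter (Hotop, 1989b) is organised quite differently.

    import numpy as np

    def givens(a, b):
        """Return c, s such that [c s; -s c] applied to (a, b) yields (r, 0)."""
        if b == 0.0:
            return 1.0, 0.0
        r = np.hypot(a, b)
        return a / r, b / r

    def zero_element(A, i, j):
        """Apply a Givens rotation of rows i and j to annihilate A[j, i]."""
        c, s = givens(A[i, i], A[j, i])
        G = np.array([[c, s], [-s, c]])
        A[[i, j], :] = G @ A[[i, j], :]
        return A

    A = np.array([[3.0, 1.0],
                  [4.0, 2.0]])
    A = zero_element(A, 0, 1)        # A[1, 0] is now zero up to rounding

Products of such rotations are orthogonal, so they do not amplify rounding errors, which is the source of the numerical stability discussed above.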
ACKNOWLEDGMENTS

The author wishes to thank Prof. W. Lechner for his critical reading of the manuscript and Lien-sung Chang for his help in improving the English. Moreover, the author thanks the DLR in Braunschweig, especially Dr. B. Stieler and Dr. R. Rodloff, for their assistance during the author's research work at the DLR.

REFERENCES

Agee, W. S., and Turner, R. H. (1972). "Triangular Decomposition of a Positive Definite Matrix Plus a Symmetric Dyad with Application to Kalman Filtering," White Sands Missile Range Technical Report, No. 38.
Andrews, A. (1968). "A Square Root Formulation of the Kalman Covariance Equations," AIAA Journal 6, No. 6, 1165–1166.
Aronowitz, F. (1971). "The Laser Gyro," in "Laser Applications" (M. Ross, ed.). Academic Press, New York, 1, 135–200.
Asmuth, J. C., and Gibson, J. D. (1984). "Sequential Noise Spectral Shaping in ADPCM," IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-32, No. 2, 228–235.
Baird, C. A. (1983). "Performance Analysis of Elevation Map Referenced Navigation Systems," in "Proceedings of the IEEE/AIAA 5th Digital Avionics Systems Conference," 14.6.1–14.6.7.
Bar-Itzhack, I. Y. (1982). "Minimal Order Time Sharing Filters for INS In-Flight Alignment," Journal of Guidance 5, No. 4, 396–402 (AIAA 82-4196).
Bar-Itzhack, I. Y., and Porat, B. (1980). "Azimuth Observability Enhancement During Inertial Navigation System In-Flight Alignment," Journal of Guidance and Control 3, No. 4.
Bellantoni, J. F., and Dodge, K. W. (1967). "A Square Root Formulation of the Kalman-Schmidt Filter," AIAA Journal 5, No. 7, 1309–1314.
Biemond, J., and Plompen, R. H. J. M. (1983). Comments on "A Recursive Kalman Window Approach to Image Restoration," IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-31, No. 6, 1573–1576.
Bierman, G. J. (1977). "Factorisation Methods for Discrete Sequential Estimation," Academic Press, New York, San Francisco and London.
Bodewig, E. (1959). "Matrix Calculus," Wiley (Interscience), New York.
Britting, K. (1971). "Inertial Navigation System Analysis," John Wiley & Sons, New York.
Bronstein, S. (1973). "Taschenbuch der Mathematik," Verlag Harri Deutsch, Zurich and Frankfurt am Main.
Buchberger, B. (1986). "Rechenorientierte Verfahren," Teubner Verlag, Stuttgart, Germany.
Bunse, W., and Bunse-Gerstner, A. (1985). "Numerische lineare Algebra," Teubner Verlag, Stuttgart, Germany.
Caglayan, A. K., and Lancraft, R. E. (1983). "A Bias Identification and State Estimation Methodology for Nonlinear Systems," Identification and System Parameter Estimation 1982, Proceedings of the 6th IFAC Symposium 2, 1313–1318.
Carlson, N. A. (1973). "Fast Triangular Formulation of the Square Root Filter," AIAA Journal 11, No. 9, 1259–1265.
Coddington, E. A., and Levinson, N. (1955). "Theory of Ordinary Differential Equations," McGraw-Hill, New York.
CRAY Research. (1984). "Library Reference Manual," Publication No. SR-0014, Rev. 1, CRAY Research.
Cuppen, J. J. M. (1983). "Product Form – SVD Singular Value Decomposition," SIAM Journal Scientific Statistical Computing 4, No. 2, 216–222.
Doetsch, G. (1976). "Einführung in Theorie und Anwendung der Laplace-Transformation," Birkhäuser Verlag, Basel, Switzerland, and Stuttgart, Germany.
Dongarra, J. J., Sameh, A. H., and Sorensen, D. C. (1984). "Implementation of Some Concurrent Algorithms for Matrix Factorisation," Mathematics and Computer Science Division Technical Memorandum, No. 25.
Dyer, P., and McReynolds, S. (1969). "Extension of Square-Root Filtering to Include Process Noise," Journal of Optimization Theory and Applications 3, No. 6, 444–458.
Faddejew, D. K., and Faddejewa, W. N. (1973). "Numerische Methoden der linearen Algebra," R. Oldenbourg Verlag, Munich and Vienna.
Feilmeier, M. (1977). "Parallel Computers – Parallel Mathematics," North-Holland Publ. Co., Amsterdam.
Feilmeier, M., and Rönsch, W. (1977). "Parallel Nonlinear Algorithms," Computer Physics Communication, No. 26, 107–112.
Fisher, J. L., Casasent, D. P., and Neuman, C. P. (1986). "Factorised Extended Kalman Filter for Optical Processing," Applied Optics 25, No. 10, 1615–1621.
Fisz, M. (1976). "Wahrscheinlichkeitsrechnung und Mathematische Statistik," Hochschulbücher für Mathematik, Band 40, VEB Deutscher Verlag der Wissenschaften, Berlin.
Forsythe, G. E., and Moler, C. B. (1971). "Computer-Verfahren für lineare algebraische Systeme," R. Oldenbourg Verlag, Munich and Vienna.
Francis, J. G. F. (1961/62). "The QR Transformation, Parts I and II," Computation Journal 4, 265–271 and 332–345.
Fraser, D. C. (1965). "A New Technique for the Optimal Smoothing of Data," MIT doctoral thesis T474.
Gelb, A. (1974). "Applied Optimal Estimation," MIT Press, Cambridge, MA.
Gentleman, W. (1973). "Least Squares Computations by Givens Transformations Without Square Roots," Journal Inst. Maths. Applics. 12, 329–336.
Golub, G. H., and Kahan, W. (1965). "Calculating the Singular Values and Pseudo-Inverse of a Matrix," Journal SIAM Numer. Anal. Ser. B 2, No. 2.
Groutage, F. D., Jacquot, R. G., and Smith, D. E. (1983). "State Variable Estimation Using Adaptive Kalman Filter with Robust Smoothing," Proceedings of the 22nd IEEE Conference on Decision and Control 3, 1316–1318.
Groutage, F. D., Jacquot, R. G., and Smith, D. E. (1984). "Adaptive State Variable Estimation Using Robust Smoothing," Proceedings of the 1984 American Control Conference 3, 1481–1483.
Heath, M. T., and Sorensen, D. C. (1985). "A Pipelined Givens Method for Computing the QR Factorization of a Sparse Matrix," Mathematics and Computer Science Division Technical Memorandum, No. 47.
Ho, Y. C. (1963). "On the Stochastic Approximation Method and Optimal Filtering Theory," Math. Analysis and Applications 6, 152–155.
Hockney, R. W., and Jesshope, C. R. (1981). "Parallel Computers," Adam Hilger Ltd., Bristol, England.
Hotop, H.-J. (1983). "Flighttesting of a Strapdown-System (MSS). Results of a Special Flighttest," ESA Technical Translation ESA-TT-813.
Hotop, H.-J. (1984). "Using Regression Analysis for Determining Air Data Sensor Errors by Means of an Inertial Navigation System," ESA Technical Translation ESA-TT-886.
Hotop, H.-J. (1985). "Flight Testing the LITTON LTN 90 Laser Gyro Strapdown System for Civil Aviation," ESA Technical Translation ESA-TT-904.
Hotop, H.-J. (1989a). "New Stable, Vectorisable Kalman Filter Algorithms Based on Orthogonal Transformations," ESA Technical Translation ESA-TT-1108.
Hotop, H.-J. (1989b). "New Kalman Filter Algorithms Based on Orthogonal Transformations for Serial and Vector Computers," Parallel Computing, No. 12, 233–247.
Hotop, H.-J., and Zenz, H.-P. (1987). "Test Equipment with Fibre Optic Data Bus and Laser Gyro Strapdown System LTN-90 for the Helicopter BO-105," ESA Technical Translation ESA-TT-1038.
Hotop, H.-J., Lechner, W., and Zenz, H.-P. (1981). "Flugerprobung von Trägheitsnavigationsanlagen. Beschreibung der Mess- und Auswertetechniken," DFVLR Research Report DFVLR-Mitt 81-14.
Householder, A. S. (1964). "The Theory of Matrices in Numerical Analysis," Ginn (Blaisdell), Waltham, MA.
Hull, T. E., Enright, W. H., and Jackson, K. R. (1976). "User's Guide for DVERK – A Subroutine for Solving Non-Stiff ODEs," Department of Computer Science, TR No. 100, University of Toronto.
Hurrass, K. (1988). "Erprobung eines GPS Empfängers," DFVLR Research Report DFVLR-Mitt 88-30.
IMSL. (1982). "IMSL Library, Reference Manual," 9th ed., IMSL Inc., Houston.
Jazwinski, A. H. (1970). "Stochastic Processes and Filtering Theory," Mathematics in Science and Engineering 64, Academic Press, New York.
Jover, J. M., and Kailath, T. (1985). "A Parallel Architecture for Kalman Filter Measurement Update," Mathematics and Computer Science Division Technical Memorandum, No. 47.
Jover, J. M., and Kailath, T. (1986). "A Parallel Architecture for Kalman Filter Measurement Update and Parameter Estimation," Automatica 22, No. 1, 43–57.
Joseph, P. D. (1964). "Space Control Systems – Attitude, Rendezvous and Docking," Course Notes, Engineering Extension Course UCLA, Los Angeles.
Kalman, R. E. (1960). "A New Approach to Linear Filtering and Prediction Problems," Journal of Basic Engineering, Transactions of the ASME 83, 35–44.
Kalman, R. E., and Bucy, R. S. (1961). "New Results in Linear Filtering and Prediction Theory," Journal of Basic Engineering, Transactions of the ASME 83d, 95–108.
Kaufman, H., Woods, J. W., Dravida, S., and Tekalp, A. M. (1983). "Estimation and Identification of Two-Dimensional Images," IEEE Transactions on Automatic Control AC-28, No. 7.
Kolodziej, W. J., and Mohler, R. R. (1984). "Conditionally Bilinear Filter with Tracking Application," Journal of the Franklin Institute 317, No. 4, 263–274.
Kung, H. T. (1984). "Systolic Algorithms and Their Implementation," Proceedings of the Seventeenth Annual Hawaii International Conference on System Sciences 1, 5–11.
Kung, H. T. (1985). "VLSI Signal Processing: From Transversal Filtering to Concurrent Array Filtering," VLSI and Modern Signal Processing, 127–152.
Lawson, C. L., and Hanson, R. J. (1974). "Solving Least Square Problems," Prentice-Hall, Englewood Cliffs, NJ.
Lechner, W. (1982). "Techniques for the Development of Error Models for Aided Strapdown Navigation Systems," AGARDograph, No. 256.
Lechner, W. (1983). "Application of Model Switching and Adaptive Kalman Filtering for Aided Strapdown Navigation Systems," Advances in Control and Dynamic Systems 20, Part 2.
Lohl, N. (1982). "Genauigkeitsanalyse von Trägheitsnavigationssystemen," dissertation, TU Braunschweig, Germany.
Maess, G. (1985). "Vorlesungen über numerische Mathematik I. Lineare Algebra," Birkhäuser Verlag, Basel, Boston, Stuttgart, UTB für Wissenschaft.
Modi, J. J., and Clarke, M. R. B. (1984). "An Alternative Givens Ordering," Numerische Mathematik, Springer Verlag, No. 43, 83–90.
Okutani, I., and Stephanedes, Y. J. (1984). "Dynamic Prediction of Traffic Volume Through Kalman Filtering Theory," Transpn. Res. Part B 18B, No. 1, 1–11.
Potter, J. E. (1963). "New Statistical Formulas," Space Guidance Analysis Memo 40, Instrumentation Lab., MIT, Cambridge, MA.
Ramachandra, K. V. (1984). "Position, Velocity and Acceleration Estimates from the Noisy Radar Measurements," IEE Proceedings 131, Part F, No. 2, 167–168.
Rauch, H. E., Tung, F., and Striebel, C. T. (1965). "Maximum Likelihood Estimates of Linear Dynamic Systems," AIAA Journal 3, No. 8, 1445–1450.
Rönsch, W. (1983). "Stabilitäts- und Zeituntersuchungen arithmetischer Ausdrücke auf dem Vektorrechner CRAY-1S," dissertation, TU Braunschweig.
Rodloff, R. (1981). "Untersuchung zum Lock-in-Problem und zur Signalauslesung an Laserkreiseln," DFVLR Research Report DFVLR-FB 81-39.
Rodloff, R. (1987a). "A Laser Gyro with Optimized Resonator Geometry," IEEE Journal of Quantum Electronics QE-23, No. 4, 438–445.
Rodloff, R. (1987b). "Concept for a High Precision Experimental Laser Gyroscope 'ELSy'," Precision Engineering 12, No. 2, 67.
Rodloff, R., Burchardt, W., and Jungbluth, W. (1985). "Measurements of Laser Gyro Errors as a Function of Beam Path Geometry," ESA Technical Translation ESA-TT-992.
Sasiadek, J., and Kwok, C. K. (1983). "Optimal Control of Turbine with Great Output Using Asymptotic Prediction Estimator or Kalman Filter," Proceedings of the 1983 American Control Conference 1, 252–256.
Schanzer, G. (1989). "Einsatzmöglichkeiten von Satellitennavigation," Symposium Satellitennavigation in der Flugführung, Braunschweig, Germany.
Schur, I. (1909). "Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen," Mathematische Annalen 66, 488–510.
Schwarz, H. R. (1986). "Numerische Mathematik," Teubner Verlag, Stuttgart, Germany.
Stieler, B., and Lechner, W. (1977). "Calibration of an INS Based on Flight Data," Application of Advances in Navigation to Guidance and Control, AGARD Conference Proceedings, No. 220.
Stieler, B., and Winter, H. (1982). "Gyroscopic Instruments and Their Application to Flight Testing," AGARD Flight Test Instrumentation Series, Vol. 15.
Thornton, C. L., and Bierman, G. J. (1980). "UDUᵀ Covariance Factorisation for Kalman Filtering," in "Control and Dynamic Systems, Advances in Theory and Applications" (C. T. Leondes, ed.). Los Angeles, 16, 178–248.
Travassos, R. H. (1983). "Real-Time Implementation of Systolic Kalman Filters," Proc. SPIE Int. Soc. Opt. Eng. 431, 97–104.
Travassos, R. H. (1985). "Application of Systolic Array Technology to Recursive Filtering," VLSI and Modern Signal Processing, 375–388.
Wilkinson, J. H. (1965). "The Algebraic Eigenvalue Problem," Clarendon Press, Oxford.
Wilkinson, J. H., and Reinsch, C. (1971). "Handbook for Automatic Computation," Vol. II, "Linear Algebra," Springer-Verlag, Berlin, Heidelberg, and New York.
Wrigley, W., Hollister, W. M., and Denhard, W. G. (1969). "Gyroscopic Theory, Design and Instrumentation," MIT Press, Cambridge, MA.
Zurmühl, R., and Falk, S. (1984/86). "Matrizen und ihre Anwendungen," Vol. 1, "Grundlagen"; Vol. 2, "Numerische Methoden," Springer-Verlag, Berlin.
Recent Advances in 3D Display

D. P. HUIJSMANS
Computer Science Department, University of Leiden, The Netherlands

G. J. JENSE
Institute for Applied Computer Science, Delft, The Netherlands
I. Introduction
   A. Models, Computers and Humans
   B. Interactive Exploration
   C. Image Computing
   D. Related Work
   E. Overview of this Chapter
II. Representation Schemes
   A. Related Work
   B. An Overview of Representations
   C. Conversions
   D. Discussion
III. Voxel-Based Display Methods
   A. Sampling Aspects of Voxel Models
   B. Geometrical Aspects
   C. Rendering Methods
   D. Comparisons of Algorithm Time Performance
IV. Spatial Selection and Division
   A. Binary Space Partitioning
   B. Creating a Subdivision
   C. Displaying Subdivided Volume Data
V. Hardware Support
   A. Special Purpose Architectures
   B. Transputer-Based Parallel Hardware
   C. Image-Processing Hardware
   D. TAAC-1: A Flexible Pipeline Architecture
   E. Comparison
VI. Implementations
   A. 3D Image Processing and Image Analysis
   B. Exploring Binary Voxel Models with a PC-Based Image Processing System
   C. Exploview TAAC-1: Interactive Exploration of Gray-Value Voxel Models
VII. Conclusion
Acknowledgments
Bibliography
I. INTRODUCTION

A. Models, Computers and Humans
1. Models at a Human Scale

Before the advent of computers, physical models were used to bring certain aspects of a multidimensional structure into a handy scale for human inspection. An architectural scale model, for instance, is a cheap (compared to the actual construction of the planned object) scaled-down representation of a geometrical form and its spatial relations. Chemists often build scaled-up versions of molecular structures, because a molecule's three-dimensional form and chemical function are closely interrelated and real molecules are too small to be seen even with the most powerful microscope.

The study of real-world phenomena by computer always involves the construction of a mathematical model. The model represents the interesting parts of the phenomena's structure in numbers or equations; the aspects not represented are those thought to be irrelevant to the study. Model making is always incomplete; a scaled architectural model, for instance, is not built from the same materials as the building planned, and in the molecular scale model the atoms do not move. Computer models, however, can easily be extended to cover extra aspects, like the structure and color of wallpaper within modeled rooms, or the amount of rotation and vibration as a function of temperature within a molecule. Not only can computer models be extended more easily, one can just as easily reduce the number of parameters involved. One is free to choose any particular combination of parameters from among the modeled ones; e.g., in the architect's computer model wall textures or colors can be enabled or disabled. Computer models also have no fixed scale like a physical model. This enables one to adjust the scale according to the view wanted, global or detailed. When time is one of the modeled parameters, the model can be run faster or slower than real time and as many times as one wants. When the architect's plans are represented within a computer, both an external global view and an interior view from one of the rooms can be shown just as well using the same information.

The construction of a model may be justified for a number of reasons.

• One reason may be that the modeled object does not (yet) exist in the physical world. Architects and engineers design models of buildings or machines to study properties like spatial layout, structural strength, etc., prior to the actual building.
• Sometimes, the use of a model is motivated by the requirement of being able to study a phenomenon in a controlled fashion. This may not always be possible in the physical world. Scientists can hardly be expected to study climatic change or the evolution of galaxies (or even the entire universe, for that matter) without employing mathematical and computer models.

• Finally, it may not be possible to observe or manipulate the object of interest directly. Examples of models used for the "indirect" study of objects are, for instance, molecular models determined from measurement data, or anatomical models reconstructed from x-ray scans.
This final reason originally inspired the research described here. The main purpose was to provide insight into the spatial structure of complex three-dimensional physical objects through the (re-)construction and manipulation of their computer models. With a computer model one can more easily experiment with different ways to convey information via artificial stimulation of the senses. Do we, for instance, convey the amount of force exerted by nearby surface patches of interacting molecules via color, or do we give real force feedback to a researcher wearing a so-called data glove while manipulating the models of the two molecules? For all parameters that do not naturally map to sensory stimuli we have to find out which ways are effective (Tufte, 1990) and which are incomprehensible or even misleading.
2. Ever-Growing Data Sets

The motivation behind computer-aided scientific visualization is the ever-growing size, complexity and rate of data pouring out of measuring instruments and numeric simulation programs.
• In meteorology, measuring devices on the ground, on balloons and onboard satellites monitor the parameters of our atmosphere on an almost continuous basis at a growing level of detail. In order to predict the evolution of the weather system, large amounts of data are needed from extended parts of the globe. Without computers, the collection of input data, the simulation for the weather forecast, as well as the distribution of the output would always lag far behind the actual changes. Without computer visualization the mechanisms of a hurricane would largely remain obscure.

• Medical imaging: hospitals all over the globe operate all sorts of noninvasive scanning devices like CT, MRI, PET and ultrasound scanners. A number of diagnosis or treatment planning applications
(reconstructive surgery, for instance) need series of scans for three-dimensional views. Especially when working with a series of sections in space or time, it takes too much time to evaluate the images in a clinical session. Visualization tools are being developed that deal with volume data, allowing the clinician to get a view of the three-dimensional structure in one image or to follow the evolution in time via an animation series.

• In the oil industry, complete 3D seismic studies mapping the layered structure beneath the surface have become common practice. A larger percentage of the oil within a reservoir is pumped up in a shorter period by using some of the wells to inject steam into the oil field, enhancing overall production. In order to plan the amount of steam injected over time, simulation programs are run to evaluate the effects of certain scenarios.

• Aerodynamic simulation studies are accompanied by wind tunnel tests to measure the flow characteristics of different wing designs. Visualization of the three-dimensional flow structure has pertinent advantages over series of cross-sections. In computational fluid dynamics the flow structure of incompressible fluids (usually water) is simulated, and this largely replaces simulation using scaled-down physical models.

• In finite element analysis one started out with surface stress contours. Reality, however, is always three-dimensional, and what is actually needed is analysis with finite volume elements and ways to find out what stresses occur throughout the volumes and not just on the outside.
Many scientific studies deal with inherently three-dimensional phenomena changing over time. Until about a decade ago such systems were studied in a layered fashion. Recently the number of layers has grown to the extent that the distances between layers are getting small enough to make a transition to volumetric models. Visualization has to keep up with this way of looking at nature. When observing a phenomenon with one percent accuracy, every extra dimension involves about a hundredfold increase in data.

3. How to Make Optimal Use of Our Senses?
As stated before, the task we set ourselves is how to make the best use of our senses to gain insight into unstructured sampled multidimensional data sets of complex phenomena. Since normal human beings have several senses we will first sketch their relative strengths. Human sensory perception can be grouped into five systems, each based upon a different class of sensors (Goldstein, 1984). We have light sensors
for the detection of electromagnetic radiation (our visual system); sound sensors for the detection of air pressure waves (our sense of hearing); pressure sensors for the perception of forces, and temperature sensors for avoiding extremely hot or cold objects (our touch); chemical sensors for gaseous (our smell) and liquid or solid substances (our taste). Each of these perception systems provides us with a different way of dealing with the world we live in. Not all senses have the same complexity and discrimination power.

In hearing we translate a time series of sound waves into words, sentences, melodies or noises (unrecognized sound patterns). By using the information from both ears we can selectively concentrate upon sounds coming from a certain direction. However, we are bad at understanding several sound patterns simultaneously. Hearing is made to follow series of sound patterns from a certain direction, one at a time. Although this makes our hearing sense rather one-dimensional, it is the most powerful system for communicating abstract concepts.

In seeing we perceive a world of objects with spatial dimensions, changing in time, through a number of wavelength windows (their relative contributions make up the perceived colors), via reflected, absorbed and refracted light rays focussed on the retinas of our two eyes. Of all our nerves, about two-thirds deal with the detection and processing of visual information. The visual system is by far the most complex, multidimensional and accurate perception system. Especially the dimensionality is high: for location we employ a three-dimensional space, changes in time allow us to detect motion within this space, and objects are characterized by their interaction with light sources, which can be described in a three-dimensional colour space. Although we consciously concentrate our attention upon specific locations one at a time, we remain sensitive to many visual phenomena in parallel.

Touch enables us to characterize objects by the way they react to forces and the way heat flows upon contact. Solids have characteristic surfaces (smooth or rough), a specific temperature, and are more or less flexible. In order to recognize three-dimensional spatial shape we follow an object's surface with our fingers. Taste and smell allow us to characterize the chemical composition of gases, liquids and solids. Although the elementary sensors are very sensitive, the low number of sensors employed limits the accuracy of chemical characterization to a rather low number of smells and tastes.

Our sound and vision systems are also used to communicate about the inferred meaning of what we sense. Analyzed information can be talked about using standardized sounds, or read using standardized two-dimensional patterns (spoken or written languages). Because we have no sense for speed like we have one for acceleration, speed is inferred from visual information, especially from peripheral vision.
There is no way to obtain insight into multidimensional data stored in a computer but by artificially stimulating one or more of the researcher's senses. As far as computer-aided insight gaining is concerned, we can evaluate how well, how quickly and how cheaply certain sensual stimuli can be mimicked. No doubt the interaction of the researcher with his or her data set is maximal when all senses are stimulated. The tendency to totally immerse the researcher in a simulated environment is one of the ideas behind virtual reality. In some of these experimental setups, the spectator wears glasses with tiny screens, whose contents are updated as soon as the viewer moves his head or walks around (head motion parallax). When provided with a force input device like the data glove and surrounded by stereo sound, most of the perceptual spectrum can be covered. The remaining sensations could be provided by using a force feedback device (Iwata, 1990; Brooks et al., 1990) and by producing odors and flavors, or by direct electrical stimulation of the nerves associated with those senses. The researcher, however, might soon become more of a victim than an explorer, because most senses only allow slow changes in order to feel comfortable. If we refrain from direct electrical stimulation of nerves, the use of contact stimuli like smell and taste would be especially cumbersome. Noncontact stimuli like sound and light that hardly show after-effects can be used to process much more rapidly changing situations. Of the contact stimuli, only force feedback by using hands or feet may provide an effective user interface for the manipulation or orientation of shapes; but according to Brooks et al. (1990) it makes visual feedback only about twice as efficient. If we rank our senses according to contact versus noncontact, with or without after-effects, ease of shielding stimuli from other people nearby, dimensionality (limiting the number of independent parameters), degree of parallelism, and the amount of nerve activity associated with each sense, we find that visual stimuli are by far the most effective, followed at a distance by audio stimuli. Their combination is effective enough to let people cling to their TV sets for several hours a day! We also say that "we do not believe it until we have seen it." Already a generation of TV viewers compares reality to TV instead of the other way around. This is because the imaged world is so much richer and changes more rapidly than everyday life. The use of image and speech is complementary; images present the unstructured raw events, sound the structure (speech conveys the meaning behind the pictures and music evokes emotions). As far as the exploration of unstructured data is concerned, it is images rather than sound that prove effective. Scientific visualization deals with methods using all or part of this up to seven-dimensional space, time and color environment to help the viewer grasp the structure in his multidimensional data set.
This chapter will be devoted to the use of interactive visualization methods for the display of stationary, discrete, three-dimensional scalar field data obtained by sampling real objects. Interaction especially puts high demands on the applications: ideally, new frames must be provided in a split second. The gain in effectiveness provided by near real-time interaction is so great that it is certainly worth the effort.
B. Interactive Exploration

1. The High Demand of Interactivity
The exploration of a complex data set involves the presentation of selections of data by a range of visualization methods. For instance, when investigating a 3D flow field of a river passing a harbor, one might:

• Follow particles in time.
• Steer a cross-section plane through the 3D field and display color-coded velocities or the direction of the flow using shadowed vectors.
• Show the form of isovelocity surfaces using contours, wireframes or solid surfaces with flat, Gouraud or Phong shading.
• Get an idea of the amount of turbulence or torsion by displaying ribbons that follow a set of originally parallel trajectories.

The sooner feedback is obtained from one of these aspects, the more aspects can and will be brought forward in a session, and the more completely the phenomenon can be understood, because all the views of the underlying structure are still fresh and can be mentally integrated more easily. Especially when motion or a range of views from different angles is essential, the refresh rate needed to create the impression of continuous change must be at least 10 frames a second. Almost any interaction mechanism will do if the feedback takes place within a split second, because one soon gets a feel for it. When feedback takes too long, the specification of how we want to look at the data must be much more specific and accurate, so as not to frustrate the user, and the building of interfaces then becomes an art.

What does it mean in terms of computing power if we want to be able to refresh the display of a scene often enough to be perceived as continuous? First of all, when does a still scene look continuous? The real world we look at is projected on a retina containing about 150 million rods, sensitive to overall intensity, and about 7 million cones, red, green and blue ones, sensitive to subranges of the visual spectrum. The triple of phosphor points of the RGB color monitor for every pixel lights up when struck by electrons from the three guns; their relative amount of excitation determines the color you see.
Three numbers per pixel must be provided. How accurate should these numbers be? According to Weber's law, our visual system is logarithmically sensitive and accurate at any level to a 1–2% change; so about 50 to 100 levels per primary color will do, which means that three 6- or 7-bit numbers are enough to create continuous changes in color space. How many pixels do we need? Again, to create the impression of a continuous intensity pattern, the spatial resolution of the screen should remain below the spatial resolution of our visual system. Because computer displays are looked at from much shorter distances than when watching TV (with its spatial resolution of about 300 x 250 pixels), we need an increase of about five times in linear resolution, and we end up with an image of about 1500 x 1250 pixels with at least three 6-bit numbers per pixel, recalculated at least 10 times a second; that comes down to calculating some 60 million numbers with 6-bit accuracy per second. As will be explained in Section III, with a 3D scalar data set of 512 x 512 x 512 voxels this requires the following series of calculations per pixel (a simplified sketch follows this list):

• Calculate where a ray through the viewer's eye and the screen pixel intersects the subdivided, translated, rotated and scaled voxel cube.
• Search along the ray between the first and last contact point.
• Collect information about the local environment of the next volume element along the ray.
• Determine local gradients.
• Update the local surface orientation value when the surface is reached, or in the case of transparency update opacity values till the ray leaves the model.
• Set pixel RGB values according to the surface orientation of the first contact point, taking lighting conditions into account, or according to the total opacity along the ray.
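The following minimal Python sketch (hypothetical names; all geometric setup, such as the transformation of the voxel cube, is omitted) shows the core of this per-pixel work: march along one ray, detect a threshold surface, estimate the gradient by central differences, and compute a simple diffuse shade.

    import numpy as np

    def render_pixel(volume, origin, direction, threshold, light, step=0.5):
        """March one ray through a gray-value voxel cube and shade the hit."""
        pos = origin.astype(float)
        n = np.array(volume.shape)
        for _ in range(3 * int(n.max() / step)):
            pos += step * direction
            i, j, k = pos.astype(int)
            if not (0 < i < n[0] - 1 and 0 < j < n[1] - 1 and 0 < k < n[2] - 1):
                return 0.0                      # ray left the cube: background
            if volume[i, j, k] >= threshold:    # first surface contact
                # central differences: local gradient approximates the normal
                g = np.array([volume[i+1, j, k] - volume[i-1, j, k],
                              volume[i, j+1, k] - volume[i, j-1, k],
                              volume[i, j, k+1] - volume[i, j, k-1]])
                norm = np.linalg.norm(g)
                if norm == 0.0:
                    return 0.1                  # flat region: ambient only
                return max(0.0, float(g @ light) / norm)   # Lambertian shade
        return 0.0

    vol = np.random.rand(32, 32, 32)            # stand-in for a CT-like volume
    eye = np.array([1.0, 16.0, 16.0])           # assumed viewpoint
    ray = np.array([1.0, 0.0, 0.0])             # unit viewing direction
    light_dir = np.array([0.577, 0.577, 0.577]) # unit light direction
    intensity = render_pixel(vol, eye, ray, 0.95, light_dir)

Repeating this for every pixel of a 1500 x 1250 screen, 10 times a second, gives a feel for the computational load discussed next.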
At present, even on powerful workstations this may take from 10 seconds up to minutes, which means that we need another speedup factor in the range of 100 to 1000. If the current general speedup of serial computers continues to double every 2 years, it will take another 10 to 20 years before this potential is reached in general-purpose hardware (clock speed would then be measured in giga- rather than megahertz). There are physical limits to this speedup, so it remains to be seen whether serial computers can become that fast. If we want to attain this speedup earlier, we have to spread the intrinsically parallel tasks among a number of processors (at present no less than about a few hundred of them) or build special-purpose hardware designed for this specific task. In Section V we will address this solution.
Instead of trying to be fast we can also try to be smart: adapt the software to the display task at hand. There is a lot of ongoing research into display algorithms that somehow make use of the amount of coherence that exists at all levels of the calculation. Often, changes from frame to frame, from ray to ray, and from voxel to voxel are small and can be predicted with almost certainty. In practical applications we can be sure the changes are gradual, for if neighboring display frames differ too much (more than about a few percent) the time series of frames becomes as incomprehensible as ill-designed video clips. In Section III we will discuss possible ways to exploit coherence at all stages during the calculation.

2. A World of Illusions
The display methods we will talk about in this chapter are aimed at a certain range of displays: high-resolution analog RGB color monitors of at least 640 x 480 pixels, RGB with 6-bit intensity values and a 50 Hertz refresh rate. Since this enables us to show only consecutive frames with a two-dimensional pattern, we have to use additional tricks to mimic three-dimensional space. Our visual system is used to extracting three-dimensional information from the projected scenes light throws on our retinas. The following depth cues can be used:
• View angle (projection changes with viewing direction during rotation).
• Viewpoint (perspective: parts of the object nearby look greater; a small sketch following this list illustrates this cue together with atmospheric perspective).
• Atmospheric perspective (contrast and intensity nearby are greater).
• Hidden features (nearby objects may obscure those further away).
• Shadows (shadows of nearby objects may partly cover further ones).
• Shading (due to the presence of light sources, surface geometry, and reflection and absorption characteristics of the surface).
• Ambient light (indirect light from reflections).
• Stereo vision (a small viewpoint difference leads to slightly different projections).
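As an illustration (not taken from this chapter), the following small Python sketch combines two of the cheapest of these cues, perspective projection and atmospheric attenuation; the focal length and fog coefficient are arbitrary assumptions.

    import numpy as np

    def project_with_depth_cues(points, focal=2.0, fog=0.02):
        """points: (n, 3) array in viewer coordinates, z > 0 away from the eye.
        Returns (n, 2) screen positions and an intensity factor per point."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        screen = np.stack([focal * x / z, focal * y / z], axis=1)  # perspective
        intensity = np.exp(-fog * z)           # dimmer when further away
        return screen, intensity

    pts = np.array([[1.0, 1.0, 2.0],           # the same offset, near and far
                    [1.0, 1.0, 20.0]])
    scr, inten = project_with_depth_cues(pts)  # far point: smaller and dimmer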
A sophisticated 3D display program may employ any or all of these depth cues. Most programs employ only a few of them, because some take too much time. In Section III they will be discussed, and often-used simplifications will be explained. Although there are research projects going on that focus on holographic screens able to display three-dimensional scenes, the mass of color displays will be confined to two-dimensional projections of three-dimensional scenes for some time to come.
C. Image Computing
The computer science fields that deal with information in two-dimensional images are computer graphics and image processing. These fields encompass all aspects of generating, processing and analyzing pictorial information. Three-dimensional images are the domain of volume visualization. This new field employs techniques from both computer graphics and image processing. Speedup methods are essential, because interactive visualization is the aim.

1. Computer Graphics
In computer graphics, data structures and algorithms are developed to generate images from geometric descriptions of objects. Generally, a distinction is made between the actual construction of a geometric description (geometrical modeling) and the generation of images from such a model (computer graphics). The design of data structures for the representation of geometric information is an important element of computer graphics research. Another issue is the development of display algorithms. A display algorithm performs the transformation of the information in the geometric model, described in some object space, into pictorial information in image space. This chapter deals with models of physical objects, in which case the objects are described in a three-dimensional object space. Since image space is by definition two-dimensional, display algorithms must provide some way of creating an illusion of the lost (third) dimension.' Several criteria influence the choice of a spatial representation model (data structure) and algorithms for processing the geometric data. At the same time the choice of a data structure influences that of a display algorithm, and vice versa. Some of the criteria involved in these choices are Accuracy of representation. Efficient usage of computer memory. a Interactivity; i.e., rapid visual feedback from user actions. Rendering quality. Each of these criteria puts different demands on the geometric representation model and algorithms. Figure 1 shows the flow of information in a
¹ Sometimes, a finer distinction is made: lying between three-dimensional object space and two-dimensional screen space, image space is called two-and-a-half dimensional when the Z- (or depth-) value is retained; e.g., when there is a depth-ordering among objects in screen space (Teunissen and Van den Bos, 1990).
FIGURE 1. Interactive 3D computer graphics.
Of special importance here is the interaction loop formed by the user's influence on the image generation process, based on observation of the displayed image. The key elements from the computer graphics field for this chapter are:

- Three-dimensional models are synthesized from geometric descriptions.
- Images are the output of some display algorithm that takes these models as input data.
In the next section we will look into a related, but fundamentally different, computer science field to identify several other important issues.

2. Image Processing
Image processing itself is concerned with the processing and analysis of information in two-dimensional images, yielding as output either new images or information in other forms, e.g., geometric or symbolic descriptions. Rosenfeld and Kak (1982), in what is generally seen as the standard reference book of the field, distinguish three major subareas: image sampling, image processing and image analysis. Before images can be processed, they have to be represented in some machine-readable form. After all, in the physical world an image is a continuous two-dimensional distribution of light intensity and color. This is called a physical or continuous image. To obtain a digital image, discrete samples are taken (of the light intensity, for instance) at many places in the continuous image. The sampling locations usually lie on some regular grid.
FIGURE 2. Interactive image processing.
The samples are subsequently digitized and stored, after which they can be processed. A digitized sample value is called a picture element or pixel. A digital image is simply an exhaustive enumeration of the occupancy values of cells (represented by the pixels) on a regular 2D grid. When the result of some processing step is another, possibly enhanced, image, we call this image processing. When the aim is to extract information in some other form from an image, we speak of image analysis. Both areas are usually referred to as image processing. As an example, consider a system for counting blood cells in microscopic images. This may be done by first enhancing the raw image (processing step) and then locating connected regions in the image, isolating these regions from the background and counting them (analysis step). Figure 2 shows the relationships between both types of image processing. When this figure is compared to Fig. 1, several differences and similarities will be obvious. As has been said before, in computer graphics, images are the output of the display algorithm. In image processing, images occur as input data to both processing and analysis steps and as output from processing algorithms. At the same time, the required sampling of the continuous image, prior to processing, adds an extra component to image processing systems. Note that in both figures, images also play a role as intermediaries. They convey information to the user, upon which some action may be taken to influence the display-processing-analysis process. This aspect will play an important role in the rest of this chapter.
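By way of illustration, the following C sketch mimics the analysis step of such a cell counter: it counts the 4-connected foreground regions of a small binary image by flood filling. The image contents, the sizes and the choice of 4-connectivity are placeholder assumptions of ours, not details taken from any particular system.

    #include <stdio.h>

    #define W 8
    #define H 8

    /* Mark an entire 4-connected foreground region as visited. */
    static void flood(int img[H][W], int y, int x)
    {
        if (y < 0 || y >= H || x < 0 || x >= W || img[y][x] != 1)
            return;
        img[y][x] = 2;                       /* 2 = visited */
        flood(img, y - 1, x); flood(img, y + 1, x);
        flood(img, y, x - 1); flood(img, y, x + 1);
    }

    int main(void)
    {
        /* Toy binary image: 1 = cell (foreground), 0 = background. */
        int img[H][W] = {
            {0,1,1,0,0,0,0,0},
            {0,1,1,0,0,1,1,0},
            {0,0,0,0,0,1,1,0},
            {0,0,0,0,0,0,0,0},
            {0,1,0,0,1,1,0,0},
            {0,1,0,0,1,1,0,0},
            {0,0,0,0,0,0,0,0},
            {0,0,0,0,0,0,1,1},
        };
        int count = 0;
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                if (img[y][x] == 1) { flood(img, y, x); count++; }
        printf("regions found: %d\n", count);   /* prints 5 */
        return 0;
    }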
3. Volume Visualization
a. Volume Data. The result from the sampling of some physical phenomenon or object in three dimensions is a volume data set. Volume data may be produced by various types of acquisition equipment:

- Computer assisted tomography (CAT), magnetic resonance (MR), and ultrasound (US) scanners provide three-dimensional data for medical diagnosis, organized as series of parallel virtual slices, where each slice typically consists of 256 x 256 or 512 x 512 8- or 12-bit gray-value elements.
- Series of microscope photographs of physical slices or coupes from a biological specimen. Usually, regions of interest in each coupe are colored by (immuno-)histochemical dyes.
- Confocal scanning laser microscopes (CSLMs), where the extremely small depth of field of these devices is used to make virtual slices through a sample of tissue. The resulting data set may consist of up to 512³ 8-bit samples.
Other sources of volume data are:

- Analysis programs for seismic data, which yield 3D models of geological structures from which geologists try to deduce the location of, e.g., oil, gas and water deposits.
- Simulation programs for various physical phenomena, such as those used in atmospheric research, which usually provide numeric data on some 3D grid.
Note that several of these acquisition methods produce a volume data set via series of intermediate images. All the previously mentioned methods produce a so-called voxel model of the sampled physical object; i.e., an exhaustive enumeration of the occupancy of elementary volume cells on a regular 3D grid. Each voxel represents the sampled physical quantity (e.g., density) at the implied spatial coordinates. When a voxel represents one (or possibly several) scalar value(s), the model is also called a scalar field. Analogous to traditional 2D image processing, scalar voxel models may be perceived as three-dimensional digital images. A crucial difference, however, is that an explicit display operation is required before the contents of a 3D image can be visualized. This means that the development of methods for (interactive) volume visualization should incorporate techniques from both the fields of computer graphics and image processing. In this chapter the boundaries between the two fields will repeatedly be crossed and the distinctions between the two will often blur. Voxels may also represent more complex quantities, e.g., direction. Such vector fields are encountered in, for instance, fluid-flow experiments.
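A scalar voxel model is, in essence, a linear array whose grid coordinates are implicit in the array index. The following C sketch shows one possible layout of such an exhaustive enumeration; the type names, the 8-bit samples and the x-fastest storage order are illustrative assumptions.

    #include <stdlib.h>

    /* A gray-value voxel model: an exhaustive enumeration of 8-bit
       samples on a regular nx * ny * nz grid, stored in one linear
       array so that the coordinates (x, y, z) stay implicit. */
    typedef struct {
        int nx, ny, nz;
        unsigned char *v;        /* nx*ny*nz samples, x varies fastest */
    } VoxelModel;

    static VoxelModel vm_alloc(int nx, int ny, int nz)
    {
        /* allocation check omitted in this sketch */
        VoxelModel m = { nx, ny, nz,
                         calloc((size_t)nx * ny * nz, 1) };
        return m;
    }

    /* The spatial coordinates are implied by the array index. */
    static unsigned char vm_get(const VoxelModel *m, int x, int y, int z)
    {
        return m->v[((size_t)z * m->ny + y) * m->nx + x];
    }

    static void vm_set(VoxelModel *m, int x, int y, int z, unsigned char d)
    {
        m->v[((size_t)z * m->ny + y) * m->nx + x] = d;
    }

    int main(void)
    {
        VoxelModel m = vm_alloc(256, 256, 64);   /* e.g., 64 CT slices */
        vm_set(&m, 10, 20, 30, 255);
        return vm_get(&m, 10, 20, 30) == 255 ? 0 : 1;
    }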
FIGURE 3. Interactive volume visualization.
It will be obvious that these voxel models demand an entirely different visualization technique. The work described here deals with scalar voxel models only. In Fig. 3 we have tried to illustrate the most important parts of a system for the interactive inspection of volume data. As can be seen, it contains elements of computer graphics and traditional 2D image processing.

4. Speedup Methods
The generation of images to support the interactive exploration of voxel models will usually be too slow if programmed in a straightforward way and run on a general purpose computer system. In setting up interactive visualization systems, two routes can be followed to enhance the time performance: dedicated hardware and smart algorithms. Both routes, hardware and software optimization, will be treated. Hardware architectures can be found in Section V, smart algorithms mainly in Sections III and IV. In this section we will discuss a number of methods that can be used to reduce the number of calculations needed for volume visualization tasks without resorting to special purpose hardware. In combination with such special architectures these methods may give even better performance. The speedup gained may be due to the use of look-up tables, the lowering of dimensionality, the exploitation of coherence, the replacement of a complex situation by a series of simpler ones, carefully designed spatial data structures, a careful choice of algorithm, or a mix of software and hardware acceleration techniques.

a. Choice of Algorithm. One of the first things to do when searching for an implementation of a visualization task is to scan the scientific literature for algorithms and select the best one, based upon time complexity analysis and timing tests. Although this seems to us an obvious road to follow, one often encounters systems based upon suboptimal algorithms. We hope this overview gives ample references from which to choose better ones.
b. Look-up Tables. Complex transformations that take a lot of time can be substantially sped up if the numbers of possible inputs and outputs are limited because of the discrete character of the transformation. When the numbers of possible inputs and outcomes are an order or orders of magnitude less than the number of transformed pixels or voxels, the complete set of possible values can be precalculated and stored in a table, the transformation look-up table. This way the cost of many intricate and costly calculations can be reduced to the cost of a single table access. A simple example of such an approach, which is also incorporated into graphics and image processing hardware, is the implementation of point operations on pixel values using a look-up table. Take for instance the thresholding of a gray level image, which transforms an image of pixels with 256 possible gray values into a binary image of foreground and background pixels. Instead of checking every pixel value against a range of gray values, the resulting intensity 0 (background) or 1 (foreground) is stored in a 256-entry table indexed by the gray value of the input pixel. Now that memory size is no longer a problem, this method will often give the desired speedup. In this overview of three-dimensional display it will be presented as an attractive solution in certain phases of voxel projection algorithms and for decision making in morphological operations like thinning algorithms.
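The thresholding example can be sketched in a few lines of C: the 256-entry table is filled once, after which each pixel costs a single table access instead of a comparison against a gray-value range. The image contents and the threshold are placeholders.

    #include <stdio.h>

    #define NPIX 16

    int main(void)
    {
        unsigned char lut[256];
        unsigned char img[NPIX] = { 12, 200, 97, 31, 180, 99, 100, 5,
                                    255, 101, 64, 140, 98, 77, 210, 0 };
        int t = 100;                        /* threshold: an assumption */

        for (int i = 0; i < 256; i++)       /* precompute once */
            lut[i] = (i >= t) ? 1 : 0;      /* 1 = foreground */

        for (int i = 0; i < NPIX; i++)      /* per pixel: one access */
            img[i] = lut[img[i]];

        for (int i = 0; i < NPIX; i++)
            printf("%d ", img[i]);
        printf("\n");
        return 0;
    }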
c. Lowering Dimensionality. Dealing with three-dimensional data does not always mean that the transformation of these data sets is an inherently three-dimensional problem. Three-dimensional transformations can often be solved in a lower dimension. A well-known example is the calculation of the Fourier transform of a 3D "image." Because the Fourier transform is separable, the three-dimensional transform can be obtained from a series of one-dimensional Fourier transforms. In this overview we will show in Section III how the three-dimensional problem of shading can be reduced to a two-dimensional problem by using the projected voxels' distance image (depth-buffer gradient shading) or by using the depth-coordinate buffer. Both methods reduce the 3D shading problem to a 2D visible-voxels shading problem. This method was also tried with success in 3D thinning; 18-connected thinning can be implemented as a series of 8-connected thinning steps in three mutually perpendicular 2D neighborhoods.

d. Exploiting Coherence. Most of the geometric transformations needed in volume visualization have to be applied to large sets of voxels with input coordinate values that are nearly identical to those of their immediate neighbors. By scanning a volume slice by slice and row by row, minimal changes between successive inputs become available. When changes are small enough, their effects can be approximated by linear differences; this way a series of multiplications can be replaced by a series of incremental additions that are performed much faster. Visualization algorithms make extensive use of incremental techniques to exploit spatial coherence between input values, and the performance gains are tremendous. In ray casting a volume, for instance, only the coordinate triples of the bounding box have to be calculated in full. All the other outcomes within the transformed bounding box can be obtained at a fraction of the cost by incremental techniques. The same incremental approach can be followed for the entire voxel cube as well as for a subdivided volume.

e. Replacing Complex Situations by a Series of Simpler Ones. If the dimensionality of a problem cannot be lowered, its extent often can. Sorting n coordinates on depth to find the nearest one (minimum filter) can be reduced to a series of pairwise comparisons; each new depth value is compared with the current lowest, and after n comparisons the nearest of n depths is obtained. This hidden volume method can be hardware supported by a so-called Z-buffer. The result of a merge of n opacities along a line of sight can be transposed into a series of merges of the current sum and the weighted next opacity. In that case the arrival order must be strictly increasing or decreasing. Hardware support for this incremental transparency calculation is called an alpha channel. Twenty-six-connected thinning via local changes involves a 3 x 3 x 3 configuration; an exhaustive table of all possible configurations consumes too much memory. A number of ways have been suggested to reduce the complexity. One of them lowers the extent to a series of 2 x 2 x 2 configurations within a 3 x 3 x 3 neighborhood; that way the configurations can be tabulated using an 8-digit binary number (one bit per cell of a 2 x 2 x 2 configuration) instead of a table with a 27-digit binary index.
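To make the incremental idea of paragraph d concrete, the C sketch below evaluates one output coordinate of an affine transform along a scan row: the full expression is evaluated once, and every following voxel costs one addition instead of two multiplications. The coefficient values are arbitrary placeholders.

    #include <stdio.h>

    int main(void)
    {
        double a = 0.866, b = -0.5, c = 10.0;   /* sample coefficients */
        int y = 3;                              /* current row */

        /* full evaluation only once, at the start of the row:
           x' = a*x + b*y + c, at x = 0 */
        double xt = a * 0.0 + b * y + c;

        for (int x = 0; x < 8; x++) {
            printf("x=%d  x'=%f\n", x, xt);
            xt += a;              /* incremental addition per voxel */
        }
        return 0;
    }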
f. Data Representations. Visualization of multidimensional data has also led to new sorts of data structures and databases. Ordinary databases turn out to be very inefficient when used for data sets defined on a regular grid and with geometric characteristics. They are far too slow when one has to decide in what order an object has to be scanned out to produce a hidden-feature display. All sorts of geometric questions, like "Is this point in space within one of the volumes?", are dealt with very inefficiently. The appearance of volumetric data has given rise to new data structures that deal efficiently with geometric data and to interactive ways to divide space using cutting planes. In Section IV we will treat some useful examples.
D. Related Work

Important work on visualization techniques for binary voxel models has been done by Herman and co-workers. Display methods have been reported in Artzy, Frieder and Herman (1981); Udupa (1983); Tuy and Tuy (1984); and Frieder, Gordon and Reynolds (1985), while voxel-based shading techniques are described in Chen et al. (1985) and Gordon and Reynolds (1985). Goldwasser (1984; Goldwasser and Reynolds, 1987) has developed special purpose hardware. Some manipulation and simple spatial editing facilities have been described in Brewster et al. (1984) and Trivedi (1986). "Real" volume rendering, i.e., working from the unsegmented volume data with transparency taken into account, is the research subject of many groups all over the world. A number of important techniques are described in the Siggraph 1988 proceedings, notably the Pixar volume rendering system (Drebin, Carpenter and Hanrahan, 1988), but also Upson and Keeler (1988) and Sabella (1988). Volume ray tracing algorithms have been described extensively by Levoy (e.g., 1988 and 1990). The volume visualization field has expanded rapidly (Kaufman, 1991). Several conferences and workshops have been devoted solely to this subject. The first one was the Chapel Hill Workshop on Volume Visualization in 1989 (Upson, 1989; Herr, 1989), which has had successors in Paris (Grave, 1990) and San Diego in 1990, and Delft (Post and Hin, 1991) in 1991. Many more publications are available in the various application fields, such as medical diagnostics, molecular graphics, computational fluid dynamics, etc. In the beginning, focus tended to center on the details of rendering data sets for a particular application, i.e., to bring out as much detail as possible from specific volume data sets, while being less suitable for rendering volume data from other application areas. Recently, more generally applicable visualization tools have emerged, like apE (Dyer, 1990) and Upson's AVS (Post and Hin, 1991).
E. Overview of’this Chapter An important shortcoming of current volume visualization methods is that most of them have been developed for individual applications: they often do
not provide an adequate solution for displaying volume data other than those from their own application domain. The image that is produced by a volume rendering algorithm often depends on a large number of display parameters. When the contents of a volume data set are unknown, one would like to be able to experiment with the settings of these parameters. However, volume rendering algorithms, implemented on general purpose hardware, e.g., a workstation, often take minutes to produce an image. Special purpose hardware solutions yield much better performance figures, but they often support only one particular display algorithm, thereby trading off flexibility against display speed. It is likely that the nature of the underlying physical phenomenon, from which a volume data set originated, to a large extent determines the choice of a particular visualization technique. For instance, a more or less amorphous distribution of some scalar variable may be displayed using a method that employs transparency. However, when there are clearly delimited structures present in the data, it is often desirable to extract these substructures and manipulate them separately. Automatic segmentation of voxel models can be difficult. An obvious solution is to create, as a first step towards segmentation, a subdivision of the voxel model through interactive manipulation. Such a spatial selection facility would allow a user to remove parts of the volume data in order to reveal hidden details. This requires the "embedding" of several different volume rendering and 3D image processing algorithms in an interactive environment. Such an environment would enable a user to inspect the contents of a volume data set in an exploratory manner. In order to achieve this interactive visualization of volume data, several aspects of the visualization process will be studied in this chapter. In Section II several relevant data representations and methods for conversions between them are presented. From that overview it turns out that voxel models are suitable representations for the interactive inspection of irregular 3D objects. Because of the requirement of rapid visual feedback, the efficiency of display algorithms is important. Several different methods are described and compared quantitatively in Section III. Section IV describes a new method for the spatial selection of voxel models. It is based on the notion of binary space partitioning and allows the incorporation of various different display algorithms, thus providing a flexible way for the interactive inspection of volume data sets. Section V deals with hardware to speed up various operations on voxel models. First, an overview is presented of special purpose hardware systems for the display of voxel models. Then, a PC-based image processing system
is described. Finally, the architecture of an accelerator system for workstations is described. The various subjects from previous sections are brought together in Section VI, where a description is given of three systems for the processing, display and interactive selection of voxel models. First, we have experimented with extending a number of existing 2D image processing methods to 3D for the processing of voxel models. Section VI.A shows that some image processing techniques may be extended quite naturally, or even trivially, to 3D, while others require a fundamentally different approach. The second system is meant for the visualization of binary voxel models. This PC-based system, described in Section VI.B and equipped with image processing boards, is well suited to support the exploration of binary voxel models. The third system, described in Section VI.C, offers facilities both for spatial selection by subdividing and for interactive selection of viewing and rendering parameters for gray value and labeled voxel models. It was implemented on a TAAC-1 to demonstrate volume visualization as well. Finally, the conclusions from this research are drawn and suggestions for further work are made.
II. REPRESENTATION SCHEMES
There are many schemes for the representation of complex 3D objects. Each of these exhibits various characteristics that must be taken into account to select one for a given application. A specific representation may not suit all the requirements of an application equally well. This occasionally demands a conversion from one representation to another. In this section a systematic overview is presented of a number of representation schemes and algorithms for conversions between them. Particular attention will be paid to the suitability of representations for interactive spatial selection purposes. This will show that the so-called exhaustive enumerations, or voxel models, have a number of pertinent advantages when representing volume models of sampled 3D objects.

A. Related Work
The way in which representations can be classified often depends on the application area. Much work has been done in the area of geometric modeling on developing efficient representations for synthesized objects. A well-known theory for the representation of solid models has been developed by Requicha and his coworkers. It is based on the mathematics of
regularized point-set topology. In Requicha (1980) several representations are classified according to their properties under various point-set operations. These representations are targeted mainly at mechanical engineering applications. In his book on solid modeling, Mantyla (1988) gives a classification of representation schemes that is similar to Requicha's, but more systematic. His taxonomy is based on the observation that a representation is a method to encode an infinite 3D point set (describing the model) in a limited amount of computer memory. Different representations exhibit different characteristics. Representation schemes are subdivided into three broad classes:

- Decomposition models, in which an object is subdivided in simple elements such as elementary cubes.
- Constructive models, which describe how an object is constructed from primitive building blocks like cubes, spheres and cylinders.
- Boundary models, where a three-dimensional object is described in terms of its bounding surface.
Again, the emphasis in this classification is on the representation of synthesized objects. An overview that takes into consideration both synthesized objects and arbitrarily shaped sampled objects has been given by Badler and Bajcsy (1978). These authors divide representation schemes into two classes, each of which is subdivided according to the primitive describing elements. They distinguish:

- surface models, where an object's boundary is composed of 3D surface points, polygonal elements (triangular tiles are often used) and curved or quadric patches; and
- volume models, where an object's volume is decomposed in cubic volume cells, convex polyhedra and primitive geometric forms such as ellipsoids, cylinders or spheres.
Several points of criticism can be raised against this classification. First, the distinction between curved and quadric surface patches is confusing: quadric surface patches are curved (in general), but not all curved patches are quadric; i.e., can be described by polynomial equations of degree two. Second, the categories of ellipsoids, cylinders and spheres are special cases of geometric forms, and there appears to be no special reason to assign them to separate categories. In a review article on software for 3D reconstruction applications, Huijsmans et al. (1986) provide an overview of representations according to the geometric dimension of the describing elements. They distinguish
between:

- 0D descriptors: point clouds and volume occupancy arrays.
- 1D descriptors: wire frames and stick figures.
- 2D descriptors: stacks of planar curved contours.
- 3D descriptors: surface tilings with flat polygons or curved patches.
Chen (1987) takes a similar approach, but considers only contour stacks (which he calls 1D primitives), surface descriptions in terms of boundary faces of cubic volume cells, i.e., flat squares (2D primitives), and volume tesselations with cubic cells (3D primitives). This clearly shows that classifications are subject to debate: are the primitive cells in volume occupancy arrays 0D or 3D primitives? The dimensionality of planar contours is also unclear. The contours themselves are 1D, i.e., linear primitives, but they enclose (2D) polygons and surface elements that are curved in the third dimension. It seems that each of the previously mentioned classifications addresses a different part of the problem of providing a unifying framework to evaluate the pros and cons of various representation schemes, but none of them is particularly useful for our aim; that is, to select a suitable representation for highly irregular 3D objects. Above all, none of them pays attention to the properties of the different representations regarding spatial selection and editing operations.
B. An Overview of Representations

We will review a number of known schemes that may represent sampled 3D objects. However, attention will also be paid to several representations that are more suitable for synthesized objects. This is done for two reasons:

1. Sampled 3D objects may have to be combined with synthesized objects, for instance when a series of CT scans of a hip is displayed together with a CAD model of an implant.
2. Selection and editing of sampled volume data involves the specification of (sometimes many) spatial intervals. These may be represented as synthetic 3D objects, and this representation is then used with the original volume data at display time.

An attempt will be made to classify the representations, based on the dimensionality of their primitive elements. In this we follow the previously mentioned approach by Huijsmans et al. and Chen, but with several alterations and additions.
Representations will be classified according to the highest topological dimensionality of the primitive describing elements. This is similar to Chen's approach. However, like Huijsmans et al., we will be more complete by including 0D primitives and several kinds of 2D primitives. Furthermore, the distinction between continuous and discrete representations will be made. These notions are used rather loosely, by defining a continuous representation to be one in which coordinate values can be real valued, while in a discrete representation coordinates can assume only integer values² (Kong and Rosenfeld, 1989). The following properties of representations will be evaluated:

- Data availability. The form in which volume data become available is determined by the type of acquisition equipment. Sampled volume data may originate from a number of widely different sources (see Section VI.C).
- Storage cost. The cost of computer memory is continuously decreasing, making the storage of larger models more feasible than would have been practical only a few years ago. Another issue is that of conceptual simplicity versus expressive power of primitives.
- Display complexity. Displaying a 3D object involves the calculation of a viewing transformation as well as clipping, hidden feature removal, shading, transparency and possibly other rendering operations. The performance of algorithms for these operations depends heavily on the underlying representation.
- Spatial editing. The ability to perform spatial selections in order to remove parts or move them with respect to each other may provide additional insight into the object's structure. The underlying representation should support such manipulations.
- Quantifications. In addition to the visual inspection of 3D objects, it may be required to quantify parameters like distances, areas (both cross-sectional and surface) and volume, center of gravity and moments of inertia.
Since a particular choice may not be the best one with regard to all these issues, the suitability for conversion to other representation schemes is an additional requirement. The characteristics of representation schemes with regard to conversion algorithms are discussed in a separate section.
’
The distinction continuous versus discreet is not the same as exact versus approximative. A continuous representation can also be approximative; e.g., a surface tesselation with many small triangular elements can be used to approximate a curved surface. However, thz use of a continuous or a discreet representation does form an important difference between surface and solid models, on the one hand, and volume models (volume data sets), on the other.
FIGURE 4. Point cloud representation (laser range meter data).
1. 0-Dimensional Primitives

The simplest primitive descriptors are those of dimension 0: points. In this case the representation of a spatial data set is simply the collection of coordinate triples (x, y, z) of all data elements. Such a representation is called a point cloud. In a point cloud representation, the coordinates of a data point are noted explicitly (see Fig. 4). When the point data are organized on planes, lying perpendicular to the three major axes at certain (possibly nonregular) intervals, a 3D grid may be defined over the modeling space, and the data points can be considered values of some continuous scalar function of three real-valued spatial variables f(x, y, z), sampled at fixed points in 3D space. When viewed in this way, a model is called a 3D scalar field. A spatial ordering may now be imposed on the data values by storing them in a 3D array. As the grid spacing may vary, it still has to be represented explicitly. In spite of this, the scalar field representation is more implicit than the point cloud. Because the coordinate values in both point cloud and scalar field representations can be real valued, they are considered continuous representations.
FIGURE 5. Volume cell versus grid point view of voxel models.
A case of special interest arises when the sampling is done in a uniform fashion; i.e., when the sampling distances along each of the axes are constant (but not necessarily all of the same value). This allows an even more implicit representation of the coordinate values. The coordinates of the grid points can, after a suitable scaling, be represented by integers. When a volume is intersected with a regular three-dimensional grid, it is subdivided in small, rectangular volume cells. The volume can then be represented by marking all volume cells that are contained (say, at least 50%) within the object. This representation is often called an exhaustive enumeration. The name voxel model is also appropriate (voxel = volume element), although this name usually denotes the case when the volume cells are cubic; i.e., when the grid planes are placed at equal distances along all three coordinate axes. Note that this 3D-primitive-based representation is, for all practical purposes, equivalent to a representation described in the section on 0D primitives; namely, the scalar field with uniform grid spacing. In Section III we will show that both views of the exhaustive enumeration scheme have their advantages. Some voxel-based methods are understood most clearly by seeing voxels as grid points of a scalar field, while others are easier to understand when they are seen as volume cells (see Fig. 5). The amount of information that is stored per voxel can vary. The sampled value of some continuous physical quantity, like density, is recorded. The dynamic range and accuracy of the numeric value are determined by the sampling method. Since both are often limited, a suitably scaled 8- or 16-bit integer interval can be used. Such a model is called a gray value voxel model. A second kind of voxel model is the one in which each voxel value indicates to which structure (possibly none, or the "background") it belongs; this will be called a labeled voxel model.
When the only information is whether a voxel lies inside or outside the sampled object, which takes 1 bit of information, the model is called a binary voxel model. Usually some kind of processing (3D image analysis) is required to obtain a binary or labeled voxel model from the gray value data, via a process known as segmentation and classification (Rosenfeld and Kak, 1982). A binary voxel model can be seen both as the simplest gray value voxel model and as the simplest labeled voxel model.
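As a toy illustration of this last step, the following C sketch derives a binary voxel model from gray values by a global threshold and packs the result at 1 bit per voxel; a single threshold is of course a crude stand-in for real segmentation and classification, and all values shown are placeholders.

    #include <stdio.h>

    #define N 16                      /* a tiny run of voxels */

    int main(void)
    {
        unsigned char gray[N] = { 3, 7, 120, 130, 255, 90, 100, 99,
                                  101, 0, 64, 200, 150, 98, 97, 255 };
        unsigned char bits[(N + 7) / 8] = { 0 };
        int t = 100;                  /* assumed global threshold */

        for (int i = 0; i < N; i++)   /* 1 bit per voxel */
            if (gray[i] >= t)
                bits[i / 8] |= (unsigned char)(1u << (i % 8));

        for (int i = 0; i < N; i++)
            printf("%d", (bits[i / 8] >> (i % 8)) & 1);
        printf("\n");
        return 0;
    }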
2. 1-Dimensional Primitives

Going up one level in the dimension hierarchy, we arrive at descriptors that have spatial dimension one: lines. If one-dimensionality were the only criterion, infinite, semi-infinite and finite straight and curved lines would all have to be taken into account, but we will confine ourselves to straight line segments of finite length.

In applications such as the study of neural networks, objects are often represented as so-called stick figures. These are treelike figures, consisting of a collection of points, or vertices, and line segments, or edges, connecting pairs of vertices. In this application the points represent neurons, branch points or synapses, and the edges represent the nerve tracks in between. In order to be called trees in the strict sense of the word, cycles of edges may not occur in stick figures. Because of this, stick figures are inherently unsuitable for the representation of volumetric data.

A similar representation is the wireframe, which is used to represent objects in, e.g., computer-aided design and drafting systems. Usually, a wireframe model of such a synthetic object is subject to various constraints; for instance, each vertex should have more than one incoming edge, prohibiting the occurrence of "dangling" edges. Such constraints stem from the fact that in these applications wireframes actually represent the boundary surfaces of physically realisable objects. Because in the case of wireframes cycles must occur (indeed, each edge must be part of a cycle), wireframes are graphlike representations (see Fig. 6).

FIGURE 6. Wireframe representation (laser range meter data).

An example of a mixed representation can be found in molecular modeling. Here, points represent the location of atoms and line segments the interatomic bonds. The "backbone" of a complicated linear molecule is often visualized by means of a chain of straight segments, possibly with branches (stick figure). On the other hand, such a model may contain cycles, for instance when a benzene group is present in the molecule (wireframe).
FIGURE 7. Contour stack representation (reconstructed snail reproductive organ).
The contour stack is an example of a representation with a more hierarchical structure (see Fig. 7). In fact, a contour stack encompasses 0D, 1D and 2D descriptive elements. At the lowest level of the hierarchy the descriptors are points. Within a plane, points are chained by means of straight line segments into closed contours. Sets of nested inner and outer contours implicitly represent a cross-sectional surface through the object described. At the highest level, the parallel planar contours are stacked in the third dimension. Contours must be simple; i.e., not self-intersecting. (After all, the contour stack is supposed to represent a physical object.) A cross-sectional plane may contain several, possibly nested, contours. The surface area enclosed by a contour (or by nested outer and inner contours) can be used to perform hidden line removal when displaying the contour stack model.

In the description of the contour stack given previously, all real-valued contour points are stored explicitly. This requires a large storage space. The amount of storage needed can be lowered dramatically in the following way: when a two-dimensional grid is defined on the planes, a contour can be efficiently represented by means of a chain or crack code. This eliminates the need to store all contour point coordinates. Instead, the coordinates of only one starting point are represented explicitly. The other points are represented as a sequence of steps from this starting point of the contour. When there are four possible step directions (up, down, left, right), each step can be encoded in 2 bits (crack code). When diagonal steps are allowed as well, 3 bits are required to encode each step (chain code). It could be argued that, because of the chain coded contours, this version of the contour stack is a mix between a continuous and a discrete representation.

3. 2-Dimensional Primitives

A common way of representing a 3D object is by means of its bounding surface. A surface description closely corresponds to the way an object is perceived in the real world; i.e., via the interaction of light with the visible surfaces. The boundary surface of a sampled real-world object can be complex, both topologically and geometrically. In such a case, the surface can be approximated by tesselating it with simple two-dimensional elements (see Fig. 8). The simplest form of planar polygon is the triangle, which can be described by the coordinates of its vertices. Optionally, the equation ax + by + cz + d = 0 of the plane that embeds the triangle may be included, although this is redundant since it may be derived from the vertex coordinate values.
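That derivation is straightforward: the plane normal (a, b, c) is the cross product of two edge vectors of the triangle, and d follows by substituting one vertex. A minimal C sketch, with arbitrary sample vertices:

    #include <stdio.h>

    typedef struct { double x, y, z; } Vec3;

    int main(void)
    {
        /* sample triangle in the plane z = 0 */
        Vec3 p0 = {0, 0, 0}, p1 = {1, 0, 0}, p2 = {0, 1, 0};
        Vec3 u = { p1.x - p0.x, p1.y - p0.y, p1.z - p0.z };
        Vec3 v = { p2.x - p0.x, p2.y - p0.y, p2.z - p0.z };

        double a = u.y * v.z - u.z * v.y;      /* normal = u x v */
        double b = u.z * v.x - u.x * v.z;
        double c = u.x * v.y - u.y * v.x;
        double d = 0.0 - (a * p0.x + b * p0.y + c * p0.z);

        printf("%gx + %gy + %gz + %g = 0\n", a, b, c, d);
        return 0;
    }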
FIGURE 8. Surface tesselation with triangles (laser range meter data).
For higher-order planar polygons, the number of vertices may vary, and we can no longer use a fixed-size array to store them. A circular list is used instead, which has the additional advantage of implicitly representing the edges between successive vertices. A more compact representation of a 3D surface can be achieved by using curved surface elements. These can for instance be Bézier or B-spline patches, depending on whether we want first- or second-order continuity at the joining edges of the patches. The representation of a curved surface patch contains the coordinates of the control points. When visualization of the model is the only objective, merely storing all elementary surface elements is basically all that is required. However, this does not provide a very useful representation when the 3D object will be subject to further analysis. The idea is to represent the object in terms of its "skin," so information about the connectivity of the primitive surface elements will have to be stored as well. These considerations involve concepts from topology (e.g., what is a valid surface representation of a 3D "solid" object, will a representation still be valid when a number of surface elements is added or removed, etc.). For a rigorous treatment of these issues we refer to the literature on geometric modeling (Mantyla, 1988).
FIGURE 9. CSG representation.
4. 3-Dimensional Primitives
In volume-based representations the primitive elements have extent in all three spatial dimensions. There are two basic approaches to describing an object's volume: by constructing it from basic building blocks or by decomposing it in smaller primitive volumes. The two are more or less complementary. However, the way in which volume elements are combined may be more complex in constructive descriptions. Also, a constructive description is more natural when dealing with synthetic objects, while a decomposition is more appropriate for sampled objects. The constructive approach is seen in computer-aided design systems based on constructive solid geometry (CSG), as in Fig. 9. The repertoire of building blocks usually consists of simple geometric forms such as cubes, spheres and cylinders that can be combined pairwise using set operations (union, difference, intersection). The resulting objects can in turn be combined with other (basic or composite) objects. The overall structure of the final object is described in a binary tree with primitive objects in the leaf nodes and set operations in the internal nodes.³ More on CSG representations can be found in Requicha's standard review (1980).

³ The representation of these basic primitives is a separate issue. In geometric modeling applications, they can be represented by, e.g., surface tesselations (Jansen, 1987). A voxel model representation would also be possible because of their inherent suitability for Boolean set operations.
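A minimal C sketch of such a binary CSG tree, evaluated here as a point-membership test; the node layout and the sphere and box primitives are illustrative choices of ours, not a prescribed design.

    #include <stdbool.h>

    typedef enum { CSG_PRIM, CSG_UNION, CSG_DIFF, CSG_ISECT } CsgKind;

    typedef struct CsgNode {
        CsgKind kind;
        bool (*inside)(double x, double y, double z);  /* leaves only */
        struct CsgNode *left, *right;                  /* internal only */
    } CsgNode;

    /* Is the point inside the composite solid described by the tree? */
    static bool csg_inside(const CsgNode *n, double x, double y, double z)
    {
        switch (n->kind) {
        case CSG_PRIM:  return n->inside(x, y, z);
        case CSG_UNION: return csg_inside(n->left, x, y, z) ||
                               csg_inside(n->right, x, y, z);
        case CSG_DIFF:  return csg_inside(n->left, x, y, z) &&
                              !csg_inside(n->right, x, y, z);
        default:        return csg_inside(n->left, x, y, z) &&   /* CSG_ISECT */
                               csg_inside(n->right, x, y, z);
        }
    }

    static bool in_unit_sphere(double x, double y, double z)
    { return x * x + y * y + z * z <= 1.0; }

    static bool in_unit_box(double x, double y, double z)
    { return x >= 0 && x <= 1 && y >= 0 && y <= 1 && z >= 0 && z <= 1; }

    int main(void)
    {
        CsgNode s = { CSG_PRIM, in_unit_sphere, 0, 0 };
        CsgNode b = { CSG_PRIM, in_unit_box, 0, 0 };
        CsgNode d = { CSG_DIFF, 0, &s, &b };         /* sphere minus box */
        return csg_inside(&d, -0.5, 0, 0) ? 0 : 1;   /* point is inside */
    }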
On the other hand, an object's volume may be decomposed into a collection of basic volume elements. The primitive elements in a decomposition are simple volumes such as cubes or convex polyhedra. There is only one "combination operator": the entire object is simply the union of all primitive volume elements. For example, the object's volume may be partitioned into convex polyhedra by recursive binary subdivision with arbitrarily oriented planes. The resulting data structure, a binary tree, is called a BSP tree (binary space partitioning tree). It was originally designed to store a spatially sorted set of planar polygons (Fuchs, Kedem and Naylor, 1980; Fuchs, Abram and Grant, 1983). Thibault and Naylor (1987) describe how BSP trees can be used to represent 3D solid objects. We refer to these publications for discussions of storage cost and the complexity of algorithms that operate on BSP trees. The BSP tree is a particularly good example of the duality between constructive and decomposition models: on the one hand, it partitions a volume in convex subvolumes; but on the other hand, each subvolume can be thought of as being constructed by the intersection of all half-spaces encountered on a path from the root of the tree to the subvolume's node (either internal or leaf). In Section IV we treat the BSP tree more extensively. In particular, we use it as an auxiliary data structure to perform spatial selections on voxel models. A major disadvantage of voxel models is that they tend to consume large amounts of memory, because information on every elementary volume cell is stored. This drawback can be alleviated by exploiting spatial coherency of the object; that is, by adjusting voxel size to detail size. This can be achieved by subdividing the object's volume adaptively. A well-known adaptive subdivision scheme is the octree (Jackins and Tanimoto, 1980), shown in Fig. 10. Three-dimensional space is recursively subdivided into octants. If an octant is found to lie entirely inside or outside the object, its value is marked and subdivision stops; if not, the octant is further subdivided. This process continues until the level of individual volume cells (voxels) is reached. The object can now be represented by the tree structure that describes the spatial subdivision. Unfortunately, because it relies on the occurrence of large homogeneous areas in the voxel data, octree encoding is of limited use in the case of gray-value voxel models. Octrees have also been used for spatially presorting other kinds of data, such as line segments in polygonal maps and surface elements in solid modeling systems (Samet, 1990a; 1990b).
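The recursive octant subdivision can be sketched compactly in C. In the sketch below, a region that tests homogeneous (here by exhaustive scanning, for simplicity) becomes a leaf; otherwise it is split into eight octants. The node layout and the voxel-access callback are illustrative assumptions, and error handling and deallocation are omitted.

    #include <stdlib.h>

    typedef unsigned char (*GetVoxel)(int x, int y, int z);

    typedef struct OctNode {
        unsigned char value;         /* meaningful for leaves */
        struct OctNode *child[8];    /* all NULL for a leaf */
    } OctNode;

    static OctNode *build(GetVoxel get, int x, int y, int z, int size)
    {
        OctNode *n = calloc(1, sizeof *n);
        unsigned char v0 = get(x, y, z);
        int homogeneous = 1;

        for (int k = 0; k < size && homogeneous; k++)
            for (int j = 0; j < size && homogeneous; j++)
                for (int i = 0; i < size; i++)
                    if (get(x + i, y + j, z + k) != v0) {
                        homogeneous = 0;
                        break;
                    }

        if (homogeneous || size == 1) {   /* stop at voxel level too */
            n->value = v0;
            return n;
        }
        int h = size / 2;
        for (int c = 0; c < 8; c++)       /* octant c: bit 0=x, 1=y, 2=z */
            n->child[c] = build(get, x + (c & 1) * h,
                                     y + ((c >> 1) & 1) * h,
                                     z + ((c >> 2) & 1) * h, h);
        return n;
    }

    /* toy model: a 2x2x2 cube of 1s in the corner of a 4x4x4 volume */
    static unsigned char toy(int x, int y, int z)
    { return (x < 2 && y < 2 && z < 2) ? 1 : 0; }

    int main(void)
    {
        OctNode *root = build(toy, 0, 0, 0, 4);
        return root->child[0]->value;   /* octant 0 is the solid cube: 1 */
    }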
C. Conversions

As a single representation generally will not fit all requirements of a given application equally well, conversion from one representation to another will sometimes be required.
FIGURE 10. Octree representation: (a) octant labeling, (b) object, (c) corresponding octree (black and gray nodes only).
This section deals with conversion algorithms and their properties. Several classes of conversion algorithms can be distinguished:

- Conversion at the 0D level between continuous (real-valued) and discrete (integer-valued) representations.
- Conversion from 0D discrete or continuous representations to higher-dimensional (continuous or discrete) representations: these conversions constitute the 3D equivalents of 2D image analysis.
- Conversion from higher-dimensional continuous to 0D discrete representations: this can be called 3D scan conversion, due to the analogy with traditional scan conversion algorithms in computer graphics.
- Conversion from a contour stack representation to a surface description, which has traditionally been a subject of research for 3D reconstruction purposes. This is due to the fact that data input procedures of reconstruction systems are usually based on manual contour tracing on photographs of microscopic slices.

Each of these classes is dealt with in more detail in the following subsections.

1. Conversion Between 0D Representations
When compared to the other categories, conversions between different 0D representations are almost trivial. Converting from 0D continuous representations to voxel models can be accomplished most easily by resampling the original data and interpolating to obtain the new data values at the grid points. The accuracy of the conversion then depends on the choice of the interpolation method. An example is the interpolation of gray values
between adjacent slices in a system for 3D reconstruction from serial slices. Often, the interslice distance is much larger than the distance between pixels within a slice (Huijsmans et al., 1986).
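Such interslice interpolation is, in the simplest (linear) case, a weighted average of corresponding pixels in the two adjacent slices, as in the following C sketch; the sizes and data are toy values.

    #include <stdio.h>

    #define W 4
    #define H 4

    int main(void)
    {
        unsigned char slice0[H][W] = {{ 0 }};      /* all zero */
        unsigned char slice1[H][W], mid[H][W];
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                slice1[y][x] = 200;                /* toy data */

        double t = 0.5;    /* fractional position between the slices */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                mid[y][x] = (unsigned char)
                    ((1.0 - t) * slice0[y][x] + t * slice1[y][x] + 0.5);

        printf("%d\n", mid[0][0]);                 /* prints 100 */
        return 0;
    }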
2. Conversion from 0D Representations

Huijsmans (1983) outlines algorithms to connect manually traced contour points into chain-coded contours. Special attention is given to obtaining well-defined, i.e., reasonably smooth and closed, contours. This involves connecting starting and end points across gaps, correcting overshooting of the end point past the starting point, and detecting and removing local double points from the contour. Veltkamp (1991) has described a method to obtain the boundary of a set of points that describes the "shape" of the point set. A data structure, called the γ-neighborhood graph, is used to represent geometric information about the point set. The result is a boundary surface description in terms of triangles. Extraction of a discrete surface description is the aim of the algorithm that was first described by Artzy et al. (1981) and later, in an improved version, by Gordon and Udupa (1989). Their method is an example of the "volume cell view" of voxels. Input consists of a binary voxel model, while the output is a list of voxel faces that lie on the boundary of the object; i.e., that separate object from background. The "marching cubes" algorithm (Lorensen and Cline, 1987) constructs a surface tesselation of tiny triangles from a gray-value voxel model. It operates by examining the gray values of all "cubes" of eight adjacent voxels and determining the location of an iso-gray-value surface element within each "cube." Note that this algorithm is best understood by taking the "grid point" view of voxel models.
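The classification step at the heart of a marching-cubes style algorithm is easily illustrated: the eight inside/outside decisions of a cell are packed into a case index that would select one of 256 precomputed triangle configurations. The C sketch below computes only that index (the case table itself is omitted); the corner values and iso-value are placeholders.

    #include <stdio.h>

    int main(void)
    {
        /* gray values at the 8 cube corners; corner c sits at
           (c & 1, (c >> 1) & 1, (c >> 2) & 1) */
        unsigned char corner[8] = { 10, 10, 10, 10, 200, 200, 10, 10 };
        unsigned char iso = 100;
        unsigned int index = 0;

        for (int c = 0; c < 8; c++)
            if (corner[c] >= iso)
                index |= 1u << c;

        printf("case index: %u\n", index);   /* 48: corners 4 and 5 set */
        return 0;
    }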
3. Conversion to 0D Representations

Of special interest in this category are the conversions from higher-dimensional continuous representations to voxel models. In principle, this could be accomplished by "sampling" the continuous representation on a regular 3D grid, but this would be a very inefficient approach. A better way is to employ the spatial coherency of continuous representations and perform 3D scan conversion. Kaufman (1986; 1987a; 1987b) has developed efficient algorithms for the conversion of lines, polygons, polyhedra, parametric curves and surfaces, etc., to voxel models. The conversion of CSG trees to voxel models can be accomplished by 3D scan converting the primitive descriptors in the tree's leaf nodes and combining them with bitwise logical operations (Jackel, 1988). Similarly, converting from BSP trees to voxel models involves 3D scan converting the object's constituting polyhedral cells. An algorithm for 3D scan conversion of convex polyhedra and its use in the display of subdivided voxel models will be described in Section IV.
4. Conversion from Contour Stacks

As stated previously, conversions from contour stack representations have traditionally been of special interest to researchers in 3D reconstruction work. Application areas for 3D reconstruction are, e.g., medical diagnostics, anatomy and embryology. For the construction of a triangulated surface from a contour pile, the algorithms take pairs of contours lying on adjacent parallel slices through the object as input and produce a "band" of triangles, each with two vertices on one contour and the third vertex on the other contour. The algorithms of Keppel (1975); Fuchs, Kedem and Uselton (1977); and Christiansen and Sederberg (1978) all have difficulties handling topologically complex cases. Such cases arise, for instance, when a given cross-sectional slice contains one contour, while the next slice contains two (or more) contours. For a recent discussion of heuristic solutions, see Ekoule, Peyrin and Odet (1991). Boissonnat (1988) describes an algorithm for shape reconstruction from contour data that is capable of handling such cases automatically. However, it employs a volume-based technique: starting with the 3D Delaunay triangulation (or, more accurately, the Delaunay tetrahedralization) of the contour points, a shape representation in terms of tetrahedral primitives is constructed. For the conversion of contour piles into labeled voxel models, the previously cited article by Huijsmans (1983) describes an algorithm that takes well-defined sets of nested outer and inner contours as input and produces a "scanline" representation of the area enclosed within these contours. This is the most important step in the slice-by-slice conversion of a contour stack description to a voxel model.
D. Discussion

Now that the various representation schemes and conversion algorithms have been described, a "reasonable" choice has to be made for the representation of 3D irregular objects, for purposes of visualization. Our taxonomy is depicted in Fig. 11. The boxes denoting the cases of discrete, higher-dimensional descriptive elements have been separated to indicate the notion that for discrete representations, higher dimensionality is present only implicitly, and not in explicit form.
FIGURE 11. Representation taxonomy (continuous: point cloud, scalar field; discrete: voxel model).
This figure also illustrates the idea that the 0D-level continuous and discrete representations are closely related. First, the availability of real-world data is important. It turns out that most data acquisition equipment favors voxel models, because it performs either direct 3D sampling or yields data as series of cross-sectional slices (intermediate images). Programs for analyzing data from physics experiments and simulation programs are typical sources of scalar field data. Examples of sampled data that can conveniently be represented as point clouds are the results of simulation programs for particle kinematics, measurements of tracer chemical diffusion for medical diagnostic purposes, and measurements from equipment such as laser range meters. Some 3D reconstruction systems, as used in anatomical research, employ manual input procedures and produce data in the form of contour stacks. However, it can be argued that this is already segmented and classified data, the segmentation and classification having been applied to the original photographic images; the resulting contour stack is then not "raw" input for direct visualization but a labeled voxel model. Generally, higher-order representations are the result of some image analysis step, performed on a 0D discrete representation (image or voxel model). Storage cost for point cloud models is high because the coordinates of each data point are represented explicitly. Because the coordinate values for scalar field data and voxel models can be represented more implicitly, they do slightly better in this area, but they still require large amounts of storage. Higher-order representations hold more implicit spatial information (or, alternatively, have more expressive power). For instance, all the points of a triangular surface element are implied by the coordinates of the three corner vertices. There also is a tradeoff between the conceptual simplicity of the
descriptive elements (i.e., triangles are simpler than B-spline patches) and the number of them (it takes more triangles than B-spline patches to accurately represent a curved surface). Recently, techniques have been developed for the display of sampled objects directly from their original scalar field or voxel model representation. Conceptually, these display algorithms are simple, although display times can be high due to the amount of data involved. Display complexity for stick figures is low, as it involves only the viewing transformation and projection of the vertices, followed by scan conversion of the line segments.⁴ When a wire frame is used to represent an object's surface in terms of the edges of the surface elements, hidden line elimination must be performed and display complexity increases. The same holds for the contour stack representation. A boundary surface description of a 3D object is especially appropriate when generation of high-quality shaded images of the object is important. Current computer graphics hardware systems allow the real-time display of surfaces, made with tens of thousands of surface elements, with hidden surface removal and sophisticated Gouraud or Phong shading. In the case of 3D descriptive elements, e.g., CSG, BSP tree, or octree representations, the higher-dimensional information in the data structure may be employed to reduce the number of surface elements that are processed at display time. Spatial selection and editing operations are readily performed on scalar field data and voxel models because the descriptive elements are spatially presorted. Representations based on 3D descriptive elements (like CSG, BSP tree and octree representations) are also well suited for spatial selection and editing. Spatial selection and editing with all other representations require searching and sorting operations that are more time consuming. For instance, in the case of surface representations for 3D solid modeling, the removal or addition of elements usually involves extensive checking of the data structure to ensure the validity of the resulting representation. To cut down on this overhead, operations on boundary surface representations are often designed to be inherently "validity preserving." See Mantyla (1988) for a detailed treatment of this matter. Generally, higher-order representations are suited for extraction of those numerical parameters that are related to their dimensionality. For instance, stick figures and wireframes allow easy measurement of linear distances, while surface representations are preferred for computation of surface area. This also applies to point clouds, because coordinate values are represented explicitly here.

⁴ Although they have now largely been replaced by raster-scan displays, vector refreshing displays are especially suited for the display of stick figures and wireframes, allowing real-time rotation to provide the illusion of depth.
TABLE 1. SUMMARY OF REPRESENTATIONS. Rows: 0-D (point cloud, scalar field, voxel model); 1-D (stick figure, wireframe, contour stack); 2-D (triangles, polygons, curved patches); 3-D (CSG tree, BSP tree, octree). Columns: data availability, storage, display, editing, quantification; entries are +, ~ or -.
Scalar fields and voxel models are well suited for numeric quantifications. For instance, computation of volume- and surface-related parameters usually involves counting the number of voxels that meet certain constraints (such as belonging to the object, or having one or more neighbors that do not belong to the object). This presumes that the original volume data has already been segmented or can be classified with enough accuracy. The properties of the various representations are summarized in Table 1. The column heads are the properties of representations mentioned in Section II.B: data availability, storage requirements, display complexity, and suitability for spatial editing and numeric quantification. The symbols +, ~ and - denote the, rather informal, qualifications "good," "average" and "bad." From what has been presented so far the following may be observed:

- The distinction between scalar field (continuous 0D) and voxel model (discrete 0D) representations is of minor importance. The real distinctions are those between 0D representations and those of higher dimensionality, and between gray-valued and labeled voxel models.
- For display purposes, a surface description would be most advantageous, mainly because most modern display hardware can readily handle such representations. However, surface descriptions must always be constructed from another representation. The surface construction algorithms involved are time consuming and cannot always handle complex 3D shapes.
- Display algorithms that operate directly from the 0D representation are available, although display complexity is rather high (see the next section).
- Spatial selection and editing is most easily done on the original volume data and on 3D representations.
- Conversions from other representations to voxel models (3D scan conversion) are available; sampled and synthetic data sets can therefore be combined.
- Numerical analysis (extraction and quantification of parameters such as surface area, volume and tensor of inertia) may also be performed directly on voxel models.
- Extraction of higher-order representations (3D image analysis) remains a (longer-term) goal.
Therefore, in the remainder of this chapter we will concentrate on voxel-based methods, i.e., methods that operate directly on the (more or less "raw") voxel data, for the interactive visual inspection of volume data sets. Such an exploration may serve as a first step, or preparation, towards later 3D image analysis.
III. VOXEL-BASED DISPLAY METHODS
The information contained in a three-dimensional density distribution is often too complex to be grasped from one view direction or from a single display method. Our visual system can deal better with scenes that consist of opaque surfaces than with a completely transparent world. We are used to inferring the three-dimensional characteristics of opaque objects by cutting pieces away and by looking at them from all sides. A truly three-dimensional density distribution hardly ever appears in reality (an aquarium with transparent, distinctly colored liquids would be such an exception). Therefore one must be able to look at the voxel model from all sides, at different magnifications, and in all the aforementioned forms in which real-world phenomena present themselves to our eyes. In our view a visual inspection tool would have to offer the following support: arbitrary geometric transformations, spatial selection (cutaway views), intersection views, opaque surface views and true 3D, i.e., transparent, views. One must also be able to combine a number of different viewing modes in one image. Most of these presentations would still contain three-dimensional
information, but the computer generation of truly three-dimensional images, e.g., by means of holography, is at this time not really feasible (Owczarczyk and Owczarczyk, 1990). The visualization of voxel models therefore requires generation of a two-dimensional projection image on a computer display screen. This involves a projection from three-dimensional object space to two-dimensional image space. A range of drawing techniques called depth cues may be employed to add a three-dimensional impression to the image on the screen. Some of these are:

- Hidden line/surface/volume removal.
- Simultaneous views from different directions.
- Perspective projection.
- Surface shading.
- Casting of shadows (shadowing).
- Depth shading or atmospheric perspective.
- Stereo display.
- Transparency.
- Continuous rotation.
The more irregular and complex biological structures are, the more additional depth cues will be needed to create an unambiguous three-dimensional illusion. Rotation is particularly effective, but requires that images be generated at a rate of at least seven per second. The large amount of data in a voxel model makes this demand difficult to meet. Two main approaches to image generation can be followed. The first method scans 3D object space, transforms the voxel coordinates, and projects (a function of) the voxel values onto the 2D image. This approach, called voxel projection, is a generalization of pixel carryover in image processing. The second method starts from 2D image space: the screen is placed facing the desired view direction, rays are cast through all pixels, and the intersection of each ray with the voxel model determines which value is given to the associated screen pixel. This approach is called ray casting and, in the display of opaque surfaces, generalizes pixel filling in image processing. Both approaches can be used to render opaque surfaces. Transparent volumes can be rendered as well, provided the output voxels' depth coordinates come out sorted along the lines of sight. Both approaches have been implemented in our experimental exploration system to evaluate their relative strengths. The size of volume data sets is such that conventional display methods, based on a standard graphics pipeline (e.g., Foley and Van Dam, 1982) are
usually incapable of generating an image within an acceptable time for interactive exploration. Most CAD display algorithms are based on surface descriptions of 3D objects, such as surface tessellations with triangular (or, generally, polygonal) patches. To fit the visualization of volume data into these CAD renderers, the required surface description (for instance an isosurface) must first be extracted from the volume data set in a separate preprocessing step (Artzy et al., 1981; Gordon and Udupa, 1989; Lorensen and Cline, 1987). Although the extracted and tiled surface can then be viewed from all sides at animation speed, changing the isosurface value is a slow procedure not integrated with the display pipeline, which means the researcher has to wait until the tiler has generated the new surface description. Generating a 2D image directly from the volume data seems to be a better way. This approach integrates selection, extraction, classification and rendering operations in one pass, allowing the explorer to quickly change each of them. At present, research groups all over the world are developing methods for volume visualization (Upson, 1989; Kaufman, 1991; Herr, 1989). This has resulted in a number of volume rendering methods, i.e., algorithms that display particular features of interest directly from volume data sets.

A. Sampling Aspects of Voxel Models
Before we go further into voxel-based algorithms, some basic concepts are introduced that will be used in the rest of this section. In Section II it has been shown that there are two dual interpretations of voxel models:

1. A voxel may be interpreted as the average value of the sampled physical quantity in an elementary volume cell centered around the sampling point (most types of scanners). These extended voxels are the three-dimensional equivalent of pixels. Results from two-dimensional image processing may easily be carried over to 3D for these extended voxels.
2. A voxel may also be thought of as the value of some continuous scalar function of three spatial variables, sampled at a specific point in 3D space (as in computational fluid dynamics). In this approach voxels are more naturally viewed as point samples. Results from signal processing may be generalized to 3D for these point voxels.

Although the choice of interpretation is a fundamental one, methods developed for either type of voxel are often used interchangeably. Once obtained, both kinds of volume samples are stored as one scalar value per grid element in a three-dimensional array and are indistinguishable. Algorithms treat the voxel values sometimes as point samples and sometimes as average cell
values. From now on, we will use the term voxel model for both interpretations unless the situation requires otherwise. Voxel models are assumed to be stored as 3D arrays of values. The spatial information is therefore only implicitly coded; only the voxel value is explicitly recorded, saving a lot of memory. A voxel may then be addressed by its indices (i, j, k). When the sampling distances along the coordinate axes are δx, δy, δz, respectively, a voxel's world space coordinates are, implicitly, (x, y, z) = (i δx, j δy, k δz). Unless otherwise indicated it will be assumed that voxels represent cubic regions of space, i.e., δx = δy = δz = δ.
1. Adjacency

An important notion is that of neighborhood, or adjacency. When considered as volume cells, voxels can be adjacent to one another across a face, an edge or a vertex. More formally, when no more than n coordinates of two voxels differ by 1, the voxels are called n-adjacent. A voxel can have three kinds of neighbors: 6 that are 1-adjacent (across a face), 18 that are 2-adjacent (across a face or an edge), and 26 that are 3-adjacent (across a face, an edge, or a vertex). These neighboring voxels are called the 6-, 18-, and 26-neighbors, respectively (Fig. 12). The exact type of adjacency is important in many cases, such as morphological operations and the estimation of gradient directions.
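As an illustration, the following minimal Python sketch (ours, not part of the original text) tests two voxels for n-adjacency according to this definition and enumerates the 6-, 18- and 26-neighborhoods:

```python
from itertools import product

def adjacency(v1, v2):
    """Return n if voxels v1 and v2 are n-adjacent (n = 1, 2, 3), else None.

    Voxels are n-adjacent when no coordinate differs by more than 1 and
    exactly n coordinates differ (by 1)."""
    diffs = [abs(a - b) for a, b in zip(v1, v2)]
    if max(diffs) != 1:
        return None                      # identical voxels or too far apart
    return sum(d == 1 for d in diffs)

def neighbors(v, n):
    """All voxels that are at most n-adjacent to v: the 6-, 18-, or 26-neighbors."""
    result = []
    for offset in product((-1, 0, 1), repeat=3):
        w = tuple(c + o for c, o in zip(v, offset))
        a = adjacency(v, w)
        if a is not None and a <= n:
            result.append(w)
    return result

assert len(neighbors((5, 5, 5), 1)) == 6     # face neighbors
assert len(neighbors((5, 5, 5), 2)) == 18    # face + edge neighbors
assert len(neighbors((5, 5, 5), 3)) == 26    # face + edge + vertex neighbors
```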
B. Geometrical Aspects

1. From Object to Image: Voxel Projection
Algorithms in this class project geometrically transformed voxels from 3D object space onto screen pixels in 2D image space while solving the hidden volume problem, a process sometimes referred to as forward mapping (Westover, 1989). In principle voxels can be projected either unsorted or sorted on their new depth coordinate. When transformed voxels arrive with
FIGURE 12. 6-, 18-, and 26-neighbors.
depth coordinates in a random order, an extra so-called Z buffer is needed to solve the hidden volume problem. No such extra support is needed if the transformed voxels can be delivered in such a way that, along lines of sight, their order of arrival is either from back to front or from front to back. In a back-to-front arrival order the hidden volume problem is solved by overwriting current pixel values (temporal priority, or painter's algorithm). In a front-to-back arrival order an extra bit mask has to be kept to decide whether the output image pixel has already been addressed. The front-to-back method is worth the extra effort of testing against a bit mask only when enough rendering calculations can be avoided. Both forward mapping approaches have been tested. Voxel projection involves the following steps:

1. Calculation of a viewing transformation matrix.
2. Scanning the voxel array in some sequence.
3. Selecting which voxels are to be projected.
4. Applying the viewing transformation matrix to the coordinates of all selected voxels.
5. Computing the intensity of the pixels corresponding to the projected voxels (rendering).
Because it is common to all voxel projection algorithms, the viewing transformation will be described first.

a. The Viewing Transformation. The viewing transformation specifies the orientation of the voxel model with regard to the observer. Voxel positions are given in a right-handed world-coordinate system called object space. An object space coordinate triple will be denoted by (x, y, z). The viewing transformation transforms these world coordinates into image space, which is also right handed and three-dimensional. Image space coordinates will be denoted by triples (u, v, w). Screen space is obtained by projection (setting the depth w to 0). Row vectors will be used throughout: a coordinate vector is transformed by multiplying it on the right by the corresponding transformation matrix. Figure 13 shows the relationship between object space and image space.
FIGURE 13. (a) Object space and image space; (b) the three rotations.

The viewing transformation consists of a rotation R about an axis through the center of the voxel model and a scaling S with regard to that center. The rotation is composed of three separate rotations about the coordinate axes, R_x, R_y, R_z. The concatenated viewing transformation matrix M is computed from the following steps:
1. A translation T_O to align the center of the voxel model with the object space origin:

   T_O = |  1          0          0         0 |
         |  0          1          0         0 |
         |  0          0          1         0 |
         | -X_max/2   -Y_max/2   -Z_max/2   1 |                      (1)

where X_max, Y_max and Z_max are the sizes of the voxel model in the X, Y and Z directions, respectively.

2. A scaling S with factor s_x = s_y = s_z = s relative to the origin (which now coincides with the voxel model's center):

   S = | s 0 0 0 |
       | 0 s 0 0 |
       | 0 0 s 0 |
       | 0 0 0 1 |                                                   (2)

3. The rotations by angles α, β and γ about the X, Y and Z axes:

   R_x(α) = | 1    0       0      0 |
            | 0    cos α   sin α  0 |
            | 0   -sin α   cos α  0 |
            | 0    0       0      1 |                                (3)

   R_y(β) = | cos β   0   -sin β  0 |
            | 0       1    0      0 |
            | sin β   0    cos β  0 |
            | 0       0    0      1 |                                (4)

   R_z(γ) = |  cos γ   sin γ  0  0 |
            | -sin γ   cos γ  0  0 |
            |  0       0      1  0 |
            |  0       0      0  1 |                                 (5)

Note that positive angles denote counterclockwise rotations (when looking from a positive axis towards the origin).

4. Finally, a translation in image space, T_I, to center the transformed voxel model in the viewport (with sizes U_max and V_max). Additionally, this translation ensures that depth values, i.e., w coordinates, are positive:

   T_I = | 1         0         0         0 |
         | 0         1         0         0 |
         | 0         0         1         0 |
         | U_max/2   V_max/2   D_max/2   1 |                         (6)

where the maximum depth value is given by the scaled diagonal of the voxel model,

   D_max = s sqrt(X_max² + Y_max² + Z_max²)                          (7)
The concatenated transformation matrix is computed by multiplying the preceding matrices together:

   M = T_O · S · R_z · R_x · R_y · T_I                               (8)

The order in which the rotations are specified, first R_z, then R_x, and finally R_y, is chosen to achieve a natural way of manipulating an object on the screen (Frieder et al., 1985). After transformation from world to image space coordinates, the voxels are projected on the screen (a plane in image space). This is done by simply "throwing away" the w coordinate (the new depth coordinate). Orthographic projection is faster than perspective projection, and the effect of perspective projection is limited in the case of irregular objects because of the lack of straight features. (Only when the object rotates on the screen is a perspective projection required.) Note that
the projection matrix

   P_ortho = | 1 0 0 0 |
             | 0 1 0 0 |
             | 0 0 0 0 |
             | 0 0 0 1 |                                             (9)
cannot be "multiplied into" the concatenated viewing transformation matrix. This would simplify calculation of the projected screen space coordinates (u, v) by eliminating several elements from M, but the depth coordinate w may be needed for hidden feature elimination (when using Z-buffer algorithms) and during the rendering stage (see the section on surface shading).

b. Fixed Incremental Scanning Order. If all voxels were transformed in a random order, each transformation would have to be computed explicitly by means of a vector-matrix multiplication:

   (u, v, w, 1) = (x, y, z, 1) · M                                   (10)

This would require nine multiplications and nine additions for each voxel. By scanning the voxel array in a regular order, incrementing one coordinate at a time, the viewing transformation may be computed incrementally by using table look-up operations (Frieder et al., 1985; Trivedi, 1986). Writing out the preceding equation for the individual coordinates we obtain

   u = M_00 x + M_10 y + M_20 z + M_30
   v = M_01 x + M_11 y + M_21 z + M_31                               (11)
   w = M_02 x + M_12 y + M_22 z + M_32
Since the coordinates x, y and z can assume integer values only between 0 and some maximum, parts of the transformation formulas may be precomputed and stored in tables L_kl, with k ∈ {X, Y, Z} and l ∈ {0, 1, 2}:

   L_X0(x) = M_00 x,        L_X1(x) = M_01 x,        L_X2(x) = M_02 x        (0 ≤ x < X_max)
   L_Y0(y) = M_10 y,        L_Y1(y) = M_11 y,        L_Y2(y) = M_12 y        (0 ≤ y < Y_max)   (12)
   L_Z0(z) = M_20 z + M_30, L_Z1(z) = M_21 z + M_31, L_Z2(z) = M_22 z + M_32 (0 ≤ z < Z_max)
The voxels are now scanned in an ordered sequence by an algorithm that
contains three nested loops. Within each loop, part of the viewing transformation is calculated by simple look-up tables and additions. The advantage of performing the viewing transformation via look-up tables is an enormous reduction of the number of arithmetic operations. When the transformation is calculated by full matrix multiplication, it requires X_max × Y_max × Z_max × 9 multiplications. For a 128³ voxel model that would be approximately 2 × 10⁷. Using the look-up table approach brings the number of multiplications down to 3 X_max + 3 Y_max + 3 Z_max, or slightly more than 10³: a reduction by a factor of 10⁴! The disadvantage of scanning the voxels in a fixed, but regular, sequence is that it may require depth sorting of the transformed voxel coordinates for the removal of hidden features. The easiest way to do this is to use a depth or Z buffer (Foley and Van Dam, 1982) to decide which voxel is nearest to the viewer. In Section III.D timing tests are given for this forward mapping, using a Z buffer to suppress further rendering calculations in case the transformed voxel turns out to be invisible. The extra storage requirements for this can also be avoided, as will be shown in the following section.
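By way of illustration, a minimal Python sketch of this table-driven scan (ours, not from Frieder et al.; the matrix M, the voxel array, the selection predicate and the render callback are assumed to be supplied by the caller):

```python
import numpy as np

def precompute_tables(M, xmax, ymax, zmax):
    """Per-axis look-up tables for the row-vector transform of eq. (10).
    The translation row M[3] is folded into the Z tables, as in eq. (12)."""
    x, y, z = np.arange(xmax), np.arange(ymax), np.arange(zmax)
    LX = np.outer(x, M[0, :3])                 # LX[x] = (M00*x, M01*x, M02*x)
    LY = np.outer(y, M[1, :3])                 # LY[y] = (M10*y, M11*y, M12*y)
    LZ = np.outer(z, M[2, :3]) + M[3, :3]      # LZ[z] = (M20*z + M30, ...)
    return LX, LY, LZ

def project(volume, M, select, render):
    """Scan the voxel array in a fixed order; the viewing transformation
    reduces to table look-ups and additions inside the inner loop."""
    xmax, ymax, zmax = volume.shape
    LX, LY, LZ = precompute_tables(M, xmax, ymax, zmax)
    for x in range(xmax):
        tx = LX[x]
        for y in range(ymax):
            txy = tx + LY[y]
            for z in range(zmax):
                if select(volume[x, y, z]):
                    u, v, w = txy + LZ[z]      # three additions per voxel
                    render(int(u), int(v), w, volume[x, y, z])
```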
FIGURE 14. The slice-by-slice, back-to-front algorithm: numbers indicate the scanning order for a correct back-to-front projection on the UV plane.

c. Slice-by-Slice, Back-to-Front Scanning Order. The slice-by-slice, back-to-front algorithm (Frieder et al., 1985) also scans the voxel array in a regular sequence; however, the directions of scanning along the axes are not fixed but depend upon the location of the viewpoint with regard to the object's origin (see Fig. 14). The directions of scanning are chosen in such a way that output voxels that end up closer to the screen are projected later than those that end up further away. This sequence guarantees that, although the voxels are not projected in a strictly depth-sorted order, voxels
closer to the screen overwrite previously projected voxels that project onto the same screen location. The need for a (hardware or software) Z buffer is thereby eliminated. The fact that the outer loop is across slices has an additional advantage. When the voxel model is stored on disc, as is often the case in, e.g., medical applications, the slices (the images forming the voxel model) may be sequentially read into memory, which is substantially faster than random disc access. Depending upon the view direction, slices are scanned from low to high or vice versa.
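A sketch of this direction selection (ours; the viewpoint is assumed to be given by the signs of its object space coordinates relative to the model's center):

```python
def scan_range(size, view_from_positive):
    """Index range along one axis: scan the far side first, so that nearer
    voxels are projected later and overwrite earlier ones (painter's
    algorithm)."""
    return range(size) if view_from_positive else range(size - 1, -1, -1)

def back_to_front_order(shape, view_sign):
    """Yield voxel indices slice by slice in back-to-front order;
    view_sign holds the sign of the viewpoint along each axis."""
    for x in scan_range(shape[0], view_sign[0] > 0):     # outer loop: slices
        for y in scan_range(shape[1], view_sign[1] > 0):
            for z in scan_range(shape[2], view_sign[2] > 0):
                yield x, y, z
```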
d. Recursive Back-to-Front Scanning Order. The slice-by-slice scanning order is not the only one that yields a back-to-front depth ordering of the voxels. In two articles that describe the hardware architecture of a voxel-based display system (see also Section V), Goldwasser and Reynolds give the outlines of an algorithm for a recursive back-to-front scanning order (Goldwasser, 1984; Goldwasser and Reynolds, 1987). One way of describing the recursive back-to-front algorithm is to view the voxel model as a complete octree. The post-order, or depth-first, traversal of the octree recursively visits the octants and projects the octants at the lowest level in the tree (i.e., the individual voxels) on the screen. The order in which the octants at any level in the tree are visited is determined by the position of the viewpoint with respect to the origin of the world coordinate system (Fig. 15). Of course, the visiting order is the same for all levels. As in the slice-by-slice (SBS) BTF algorithm, the regular incremental scanning order is used to simplify the calculation of screen space coordinates. Look-up tables are used to store the octant visiting order and the transformed screen space coordinates. Unlike in the SBS BTF algorithm, only the screen space
FIGURE 15. The recursive back-to-front algorithm: (a) small numbers are the octant labels, while large numbers indicate the back-to-front scanning order; (b) the "unit" cube.
coordinates of the transformed corner vertices of a unit cube are precomputed and stored. The screen space coordinates of any octant can then be computed incrementally from these "reference values" using multiplication and addition operations. Two look-up (or sequence control) tables are used, called SCT1 and SCT2. The entries in SCT1 determine the visiting order of the octants, while SCT2 stores the transformed unit cube vertices. The algorithm consists of the following steps:

1. Compute the entries of SCT2 by applying the viewing transformation to the eight vertices of a "unit" cube.
2. Sort these entries by decreasing w coordinate.
3. The depth ordering of the vertices thus obtained gives rise to a permutation of the octant labels. This permutation is stored in the entries of SCT1.
4. Recursively visit the octants, using SCT1, and compute the screen space coordinates of each transformed octant from SCT2.
Several things should be noted about this algorithm:

- Octree encoding of gray-valued voxel models is usually very inefficient. Prime candidates for octree encoding are labeled voxel models, binary voxel models in particular.
- It is assumed that the size of the voxel model is a power of 2. In that case, the multiplications and divisions by 2 can be replaced by more efficient shift operations.

A sketch of the recursive visiting order is given below.
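This small Python sketch (ours; it enumerates octant labels only, leaving out SCT2's incremental screen coordinate arithmetic) illustrates the recursion:

```python
def btf_octant_order(view_sign):
    """Back-to-front visiting order of the eight octants (the role of SCT1).
    Octant label bits: bit 0 = +x half, bit 1 = +y half, bit 2 = +z half;
    view_sign holds the sign of the viewpoint along each axis."""
    near = ((view_sign[0] > 0)
            | ((view_sign[1] > 0) << 1)
            | ((view_sign[2] > 0) << 2))   # label of the octant nearest the viewer
    # XOR-ing the nearest label with 7, 6, ..., 0 visits every octant before
    # any octant that could occlude it: the farthest octant comes first.
    return [near ^ i for i in range(7, -1, -1)]

def visit_btf(node, order, project_voxel):
    """Post-order, depth-first traversal; the same order is used at all levels."""
    if isinstance(node, list):             # internal node: eight child octants
        for i in order:
            visit_btf(node[i], order, project_voxel)
    else:                                  # leaf: an individual voxel value
        project_voxel(node)
```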
e. Slice-by-Slice, Front-to-Back Scanning Order. The bit mask needed to check whether a pixel has already been addressed by a projected voxel nearer to the screen can be used to save rendering calculations for voxels covered by those nearer to the screen. The Z buffer from the fixed scanning order forward mapping can be used for the bit mask, or the bit mask can be integrated with the screen, for instance by setting an initial background.

f. The Dynamic Screen Algorithm. The algorithms treated so far select voxels for projection on the screen (segment the gray value data) by, e.g., thresholding during the scanning of the gray-value voxel model. Binary voxel models, the simplest labeled voxel models, that are produced as the result of some separate preprocessing step may be efficiently displayed by the dynamic screen algorithm (Reynolds, Gordon and Chen, 1987). Briefly, the algorithm works as follows:

- The binary voxel model is run-length encoded, in one-dimensional runs, parallel to the X axis;
- The viewing transformation contains scaling and rotations about the X and Y axes; the Z rotation is done afterwards on the projected "preimage";
- Runs, parallel to the object space X axis, are parallel to the screen space U axis after transformation;
- The voxel model is scanned in front-to-back order;
- A dynamic data structure (the dynamic screen) maintains the parts of the screen that are still unpainted;
- Transformed runs are merged with the horizontal strips in the bucket lists of the dynamic screen data structure.
This algorithm is suitable only for the display of binary voxel models in a specific format, and may well be used to quickly render a binary skeleton of the gray-value voxel model during interactive rotation (see Section VI.A).
g. Affected Screen Pixels in Forward Rendering. Forward-mapping methods can easily miss screen pixels if one applies a one-to-one correspondence between projected voxels and screen pixels, especially when zooming in. To ensure that the projected image remains connected, neighboring pixels of those hit should be addressed as well. Depending upon the orientation and the scaling factor, a minimum extent of the projected voxels can be calculated. In our implementations we simply copied the calculated pixel value to its n × n neighborhood. For better-quality pictures one may use the "splatting" approach advocated by Turner Whitted (Herr, 1989).

2. From Image to Object: Ray Casting
Instead of projecting voxels from object space to screen space, backward-mapping algorithms operate the other way around: for each screen pixel, they determine which voxels project onto it. This is usually accomplished by some type of ray-casting algorithm: rays are fired from the view direction or viewpoint through the screen pixels toward the volume model. A search is made along each ray, and some function of the voxel values along the ray determines the pixel's color. In interactive exploration the name ray casting is used in order to make the distinction with ray tracing, a well-known technique for the rendering of highly realistic images of synthetic objects or scenes. The term tracing refers to the fact that rays are traced as they reflect and refract at surfaces, spawning secondary rays (Glassner, 1989). In ray casting, rays are followed in straight lines from the observer's viewpoint only until they leave the object scene. In terms of simulating lighting conditions this means that no reflection or refraction, but only absorption, is taken into account. True ray tracing is computationally expensive, while the resulting effects, such as
FIGURE 16. Volume ray casting.
surface (inter)reflection, highlights, shadowing, etc., are of secondary importance for our purposes. For the inspection of volume data, the relevant objects to be visualized are surfaces (either opaque or semitransparent) and transparent layers. Ray casting (or zero-order ray tracing, as it is sometimes called) suffices for explorative interactive inspection (see Fig. 16). The volume ray casting process consists of several steps:

1. The generation of rays in screen space and the transformation to object space.
2. Computing the hither and yon intersection points of the rays with the voxel model's boundaries (ray-bounding-box intersection).
3. Determining which voxels are intersected by the rays (voxel traversal).
4. Calculating the pixel values from the values of the intersected voxels (rendering).
The ray generation process, the ray-bounding-box intersection calculation and algorithms for voxel traversal are described more fully in the next paragraphs. Various rendering methods are treated in Section III.C.

a. Ray Generation. Rays that originate at the viewpoint and pass through the projection plane (screen) are generated in a way that depends on the type of projection. In the case of perspective projection (viewpoint at a finite distance from the screen) the rays diverge and pass in different directions through object space (the voxel model). With parallel projection, all rays have the same direction. As was done in the section on voxel projection display methods, only parallel projection will be considered from now on. Because the rays that are cast through the voxel model originate in screen
space, the first step in the process is the calculation of the origins and direction of the rays in object space. The transformation needed is the inverse of the viewing transformation used for the forward mapping, eq. (8), hence the name backward mapping. The origin of a ray in screen space is simply determined by the coordinates (u, v) of the screen pixel from which the ray is fired. The direction vector, which is the same for all rays, has to be transformed only once. The resulting origin and direction vector of a ray in object space will be denoted by R_orig and R_dir, respectively.

b. Ray-Bounding-Box Intersection. An efficient algorithm for ray-bounding-box intersection testing is given by Haines (in Glassner, 1989). It is based on the method of slabs by Kay and Kajiya (1986). A slab is the space between a pair of parallel planes. The bounding box that encloses a voxel model may be formed by the intersection of three mutually perpendicular slabs. Let the minimum and maximum extent of the box be given by the two vectors B_min = (X_min, Y_min, Z_min) and B_max = (X_max, Y_max, Z_max), and the points on the ray be defined by

   R(t) = R_orig + t R_dir                                           (13)
The ray-bounding-box intersection test yields the value FALSE when the ray does not intersect the box, and TRUE when it does. In the latter case, the values t_near and t_far contain the values of the parameter t for the near and far intersection points. The procedure computes the values t_near and t_far for each of the slabs. The final values are the maximum of the t_near values and the minimum of the t_far values, respectively. When at the end t_near > t_far, there is no intersection; otherwise the values can be substituted in eq. (13) to compute the near and far intersection points. Figure 17 shows two (two-dimensional) cases that
may occur: one in which there is no intersection, and one in which there is.
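A direct transcription of this slab test into Python (our sketch; the tolerance for rays parallel to a slab is an implementation detail not discussed in the text):

```python
def ray_box_intersection(r_orig, r_dir, b_min, b_max, eps=1e-12):
    """Slab test: return (t_near, t_far) for R(t) = r_orig + t * r_dir,
    or None when the ray misses the bounding box."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(r_orig, r_dir, b_min, b_max):
        if abs(d) < eps:                   # ray parallel to this pair of planes
            if o < lo or o > hi:
                return None                # origin outside the slab: no hit
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            if t1 > t2:
                t1, t2 = t2, t1
            t_near, t_far = max(t_near, t1), min(t_far, t2)
    if t_near > t_far:                     # slab intervals do not overlap
        return None
    return t_near, t_far

# Example: a ray entering a 128^3 voxel model's bounding box from the left.
hit = ray_box_intersection((-10.0, 64.0, 64.0), (1.0, 0.0, 0.0),
                           (0.0, 0.0, 0.0), (128.0, 128.0, 128.0))
assert hit == (10.0, 138.0)
```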
c. Voxel Cube Traversal Algorithms. Once the entry and exit points of the rays on the boundary of the voxel model are known, the voxels along the rays have to be identified. The search for the relevant voxels starts at the entry point (the near intersection point) and continues until
1. An opaque surface is detected (full absorption),
2. The opacity along the ray saturates at some predetermined value (partial absorption), or
3. The exit point (far intersection) is reached before either of these cases occurs.

Because of the large number of rays that may be cast and the large number of voxels along each ray, the efficiency of the voxel traversal algorithm is paramount. Identifying the voxels "along" a ray is similar to the well-known scan conversion problem for two-dimensional straight lines in raster graphics. The problem comes down to determining which sequence of grid points best approximates a straight line segment, because a straight line segment whose starting and end points lie on grid points may fail to pass through any other grid point (Newman and Sproull, 1979). Three methods for voxel traversal will be described and compared:

1. Nearest neighbor interpolation: stepping along the ray in small increments and taking at each step the value of the voxel (grid point) closest to the ray.
2. Trilinear interpolation: following the ray in a similar manner, but reconstructing values on the ray by trilinearly interpolating the values of the eight closest surrounding voxels (see Fig. 18).
3. Digital differential analyzer: approximating the ray by either a 26- or 6-connected path of voxels using a 3D version of the well-known Bresenham line-drawing algorithm (Bresenham, 1965).

Levoy (1988) describes his volume-rendering algorithm as a process of reconstruction and resampling of the object represented by the voxel model. The voxel model is resampled at evenly spaced locations along the rays, and the sample values are reconstructed by trilinear interpolation. In this context, the interpolation of samples from voxel values can be seen as a form of antialiasing. Note that trilinear interpolation is the only one of these methods that cannot be used for labeled voxel models, where the voxel values stand for discrete classification labels; in that case nearest neighbor, DDA or a rank-filtered value are acceptable interpolations.
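A sketch of the second method, trilinear reconstruction (ours; the volume is assumed to be a NumPy array and the sample point to lie at least one voxel inside its boundary):

```python
import numpy as np

def trilinear(volume, p):
    """Reconstruct the scalar value at the continuous point p = (x, y, z)
    from the eight surrounding voxels (the point-sample view of voxels)."""
    x, y, z = p
    i, j, k = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    fx, fy, fz = x - i, y - j, z - k              # fractional offsets in [0, 1)
    c = volume[i:i + 2, j:j + 2, k:k + 2].astype(float)
    c = c[0] * (1 - fx) + c[1] * fx               # interpolate along x
    c = c[0] * (1 - fy) + c[1] * fy               # then along y
    return c[0] * (1 - fz) + c[1] * fz            # then along z
```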
FIGURE 18. Trilinear interpolation: a sample position and its neighboring voxels.
The following traversal algorithm is based on a three-dimensional digital differential analyzer (DDA), but in a slightly modified form. It has been described by Fujimoto, Tanaka and Iwata (1986). A much simpler description of this algorithm has been given both by Amanatides and Woo (1987) and by Cleary and Wyvill (1988). The algorithms originated as acceleration techniques for ray tracing synthesized objects, where a coarse grid of voxels is used to spatially presort the objects in order to speed up ray-object intersection tests. The modification of the standard DDA algorithm is necessary in order to identify all voxels that are intersected by a ray, and not only those that lie close to it. In order to determine which voxel is intersected next by the ray, given a current voxel location, the algorithm maintains three running sums, dX, dY and dZ, for the distances along the ray to the next crossings of voxel "walls" perpendicular to the X, Y and Z axis, respectively; see Fig. 19(a). The values with which these sums are repeatedly incremented are the distances along the ray between two successive wall crossings, deltaX, deltaY and deltaZ. Figure 19(b) shows the variables for the two-dimensional case. The decision of which wall to cross next is based on the comparison of the running sums. In the case shown, the step taken will be a horizontal one (dX < dY). Trousset and Schmitt (1987) mention in their article that they also use the Bresenham algorithm for voxel traversal. However, the position of the viewpoint in their system is limited to lie in the XZ plane, which allows them to employ a 2D Bresenham algorithm. A full 3D Bresenham algorithm is described later. This routine digitizes a straight line segment into a sequence of voxels that lie as close to the real line as possible, using only integer arithmetic. We start by computing the axis of greatest motion: the
FIGURE 19. DDA voxel traversal algorithm: (a) identifying all voxels that are intersected by a ray (voxel centers, simple DDA, modified DDA); (b) variables from the algorithm.
driving axis. Next, we consider the two orthogonal projections of the line segment on the two planes that contain both the axis of greatest motion and one of the other coordinate axes. The digitization of the line segment in 3D space can now be performed as a simultaneous digitization of the two projected line segments, each by a 2D Bresenham algorithm. In 3D space, the intersection of two planes is a line, but a description of a line in these terms is not unique: infinitely many pairs of planes intersect in a given line. Two particular planes are chosen from this set: those planes that contain the line segment and are each orthogonal to one of the two coordinate planes that contain the driving axis, i.e., to two of the three planes x = 0, y = 0 or z = 0 (the third one being orthogonal to the driving axis).
FIGURE 20. Voxel traversal by an adapted 3D Bresenham algorithm: (a) a 26-connected path misses thin features (intersection missed); (b) a 6-connected path (intersection found).
When this method is employed for surface detection, one has to take care to choose the proper connectivity for the generated voxel sequence (see Fig. 20). If the generated voxel sequence is not 6-connected, it is possible to step through an 18- or 26-connected solid object without detecting a single object voxel. This might cause problems with very thin (1 voxel thick) surfaces. To avoid this problem, a 6-connected path of voxels should be generated; this guarantees that every voxel pierced by the line segment is in the generated sequence. It is accomplished by simulating diagonal and double diagonal steps as a series of two or three steps, incrementing only one of the coordinates at a time.
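The modified DDA itself is compact. The following Python sketch (ours, after Amanatides and Woo, 1987) generates the 6-connected voxel sequence one single-axis step at a time; the ray origin is assumed to be the entry point returned by the slab test, and the assumed `visit` callback implements the termination tests listed earlier:

```python
import math

def traverse(r_orig, r_dir, grid_shape, visit):
    """Enumerate the 6-connected sequence of voxels pierced by a ray.
    `visit(i, j, k)` returns True to stop early (e.g., opaque voxel found)."""
    voxel = [int(math.floor(c)) for c in r_orig]          # current voxel
    step, t_delta, t_max = [0] * 3, [math.inf] * 3, [math.inf] * 3
    for a in range(3):
        if r_dir[a] > 0:
            step[a] = 1
            t_delta[a] = 1.0 / r_dir[a]                   # distance between walls
            t_max[a] = (voxel[a] + 1 - r_orig[a]) / r_dir[a]
        elif r_dir[a] < 0:
            step[a] = -1
            t_delta[a] = -1.0 / r_dir[a]
            t_max[a] = (voxel[a] - r_orig[a]) / r_dir[a]
    while all(0 <= voxel[a] < grid_shape[a] for a in range(3)):
        if visit(*voxel):
            return
        a = t_max.index(min(t_max))       # next voxel wall crossed by the ray
        voxel[a] += step[a]               # a single-axis (6-connected) step
        t_max[a] += t_delta[a]
```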
C. Rendering Methods

1. Intersections

Traditionally, researchers in medical and biological fields have attempted to visualize complex 3D objects by rapidly displaying the volume data slice by slice, from back to front and vice versa, and mentally reconstructing the 3D features of interest. There are systems that use interactive video disks of anatomical cross-section atlases to get acquainted with slice anatomy. Reslicing a volume data set along an arbitrary direction, different from the original slicing planes, and moving through the data set provides additional information about the internal structure of the object in arbitrary directions (Huijsmans et al., 1986). This method is especially effective when the plane can be interactively positioned and "steered" through the volume (Johnson and Mosher, 1989). It allows the user to connect structures in the changing image locally and thus mentally reconstruct the three-dimensional structure of the object. The generation of an image of the voxels that are intersected by a slicing plane through a volume data set containing N³ voxels requires only on the order of N² voxels to be accessed, as opposed to nearly all N³ voxels for volume rendering algorithms. This makes slicing algorithms potentially fast and a prime candidate for interactive exploration systems. An algorithm for voxel mapping convex polygons will now be described. This algorithm will be used later in a spatial selection system for voxel models to help position intersection planes. Kaufman (1987a) gives an algorithm for more general polygons (simple, nonconvex). The algorithm is a 3D extension of well-known algorithms for scan conversion of 2D polygons, based on the exploitation of edge coherence. Such algorithms can be found in general computer graphics textbooks (Foley and Van Dam, 1982; Hearn and Baker, 1986).
FIGURE 21. Voxel mapping a convex polygon (the scan plane and the current span).
For the explanation of the algorithm it is assumed that the projection area of the 3D polygon on the XY plane is the largest (i.e., C is the largest coefficient in the polygon's plane equation Ax + By + Cz + D = 0). First the intersection with a scan plane is computed. As in the 2D case, this may be done incrementally, as the scan plane "sweeps" through the polygon, using the plane equation. The main steps that constitute the algorithm (see also Fig. 21) are

1. Sort the vertices of polygon p according to increasing Y coordinate; precompute the values (∂z/∂x)_p = -A/C and (∂z/∂y)_p = -B/C.
2. Sweep the scan plane from Y_min to Y_max.
3. At a vertex, one of the active edges terminates, while another becomes active.
4. Between vertices, compute the current edge-scan plane intersections incrementally, using the active edge's dz/dy and dx/dy values.
5. "Scan convert" the voxels on the current span, i.e., the line segment between the current edge-scan plane intersections, using the values (∂z/∂x)_span and (∂z/∂y)_span.
It is obvious that the restriction to convex polygons leads to substantial simplifications with regard to the general case. Since only two edges can intersect the scan plane simultaneously at any given time, the current intersection of the scan plane with the polygon always consists of a single segment. Because of this, the maintenance of a dynamic active edge list is not required; a simple fixed-size table with two entries per scan-plane position (starting and end point of the current span) suffices. Figure 22 shows two images with one or more voxel-mapped polygons.

a. 3D Cutaway Views. By using multiple cutting planes the user can delimit part of the volume data and view the values of the 3D
FIGURE 22. Slicing: (a) a single slicing plane; (b) multiplanar reformatting (combined with voxel projection).
density within the cutting planes. This method is known as multiplanar reprojection, or multiplanar reformatting. Examples are shown in Figs. 22 and 23.

2. Opaque Surface Rendering
The values of the voxels determine color and opacity values, which in turn determine the color of the screen pixel onto which (possibly many) voxels project. For the display of opaque surfaces the value of each image pixel is determined by the value of a single voxel (and possibly its immediate neighbors). The voxel needed is the one that satisfies the display constraints and ends up closest to the screen (the visible voxel). Visible surface detection in gray-value voxel models may be performed by, e.g., gray-level thresholding. Such a binary voxel classification effectively yields a surface rendering of the opaque volume formed by all the voxels with values above the selected threshold. When a back-to-front voxel projection display algorithm is used, the last voxel projected determines the visible surface. In the complementary case of volume ray casting, it is the first voxel with a value above the threshold. After the visible surface has been detected, intensity values must be calculated for the screen pixels onto which the surface projects. This process is known as shading. Several shading methods exist. We will discuss depth shading and surface normal-based shading. The surface normal direction can be estimated from the distance map (depth-buffer gradient shading) or directly from the voxel model (context shading, binary gradient shading and gray-level gradient shading).

a. Depth Shading. By storing, either in the frame buffer or in a separate buffer, the distance, or depth, of the visible surface voxels to the viewing plane, a so-called distance map (Horn, 1986) is obtained. Such a map is referred to as a depth-shaded pre-image in Gordon and Reynolds (1985). Strictly speaking, at this point the distance map does not yet represent a shaded image, but it may be used as input data for a depth-shading algorithm in which each pixel is given an intensity value that depends only on the distance value. To this aim, the attenuation of light under foggy atmospheric conditions can be simulated. In this case the light intensity depends exponentially on the distance from the surface voxel to the light source. In practice, a clearer perception of depth is obtained when the intensity depends linearly on the distance, as

   I = (I_max - I_amb) (D_max - d)/D_max + I_amb                     (14)

where I_max is the total light intensity; I_amb denotes a constant, or ambient,
intensity term (typically 0.10...0.25 I_max) added to keep parts of the object not directly lit by the light source somewhat visible (this term may be seen as a zero-order approximation of indirect diffuse lighting); D_max is the maximum distance that occurs in the distance map. Formulas similar to this can be found in Chen et al. (1985) and Gordon and Reynolds (1985). Because visible voxels that lie at approximately the same distance from the screen are painted in nearly the same color (intensity), even though they may have different orientations, depth-shaded images have a very smooth appearance, showing little surface detail.
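In code, eq. (14) is a one-liner; a Python sketch (ours, with arbitrary example intensities):

```python
def depth_shade(d, d_max, i_max=1.0, i_amb=0.2):
    """Linear depth shading, eq. (14): nearer surface voxels appear brighter;
    the ambient term keeps the farthest parts faintly visible."""
    return (i_max - i_amb) * (d_max - d) / d_max + i_amb

assert depth_shade(0.0, 100.0) == 1.0      # at the screen: full intensity
assert depth_shade(100.0, 100.0) == 0.2    # farthest point: ambient term only
```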
b. Normal-Based Gradient Shading. Real surface shading requires the calculation of the light intensity along the surface as a result of the diffuse and specular reflection of light. The amount of light diffusely reflected off a surface depends on the angle between the direction of a light source and the direction of the local surface normal vector:

   I = (I_max - I_amb)(L · N) + I_amb                                (15)

Again an ambient term has been added. The vectors L and N are vectors in the direction of the light source and the surface normal, respectively. When the effects of the ambient, diffuse surface reflection and depth terms are combined, the lighting equation becomes

   I = (I_max - I_amb)(L · N) (D_max - d)/D_max + I_amb              (16)

In both cases, the values of the surface normal vectors have to be estimated. Based on the information that is used to compute these values, surface shading algorithms can be divided into two categories:
- Image space shading: the values of normal vectors are determined from a local (2D) configuration of projected visible voxel values.
- Object space shading: surface normal vectors are estimated from a 3D configuration of voxels in a local object space neighborhood of the visible voxel.
c. Z-Buffer Gradient Shading. Chen et al. (1985) provide a comparative overview of several image space and object space shading methods for binary voxel models, two of which will be described here. The first method is known as Z-buffer gradient shading, or image space gradient shading, and was first described in Gordon and Reynolds (1985). When a distance map w = w(u, v) has been produced, the normal vectors may be estimated from the gradient over the distance map:

   N(u, v) ∝ (∂w/∂u, ∂w/∂v, 1)                                       (17)

The partial derivatives are approximated from the forward and backward differences, δf(u) = w_{u+1,v} - w_{u,v} and δb(u) = w_{u,v} - w_{u-1,v}, and similarly for δf(v) and δb(v). In order to minimize the influence of sudden depth changes due to the projection of different surfaces onto adjacent pixels (false edges), the partial derivatives are computed as a weighted sum of forward and backward differences; e.g., for the u direction:

   ∂w/∂u ≈ ω_f δf(u) + ω_b δb(u)                                     (18)
The weight factors are determined experimentally, and depend on the absolute values of the differences. They are chosen in such a way as to assign large weights to very small differences, gradually diminishing weights to “medium” differences, and very small weights to large differences.
d. Context Shading. The second shading method is an object space method called normal-based contextual shading (Chen et al., 1985). The algorithm takes the "volume cell view" of voxels and determines the surface normal vector of each visible voxel face by examining the configuration of the face and the four neighboring faces that share an edge with it. The authors claim that the 81 possible arrangements of adjoining faces lead to 25 different possible orientations of a surface normal vector.

e. Binary Gradient Shading. A similar object space shading technique was developed by us in the context of the exploitation of image processing hardware for the display of binary voxel models (see Section VI.A and Laan et al., 1989). Here, the "grid point view" is taken. The values of the normal vectors are approximated by examining the locally six-connected (or face-adjacent) neighbors of the surface voxels. When a neighboring voxel has value 1, the normal vector gets a component in the opposite direction. Thus, there are 26 possible values (not counting the 0-normal) of the normal vectors (see Fig. 23). We call this method binary gradient shading because the surface normal is estimated from the local (object space) gradient of the binary voxel values. When it is described from the "grid point view," the binary gradient shading method can be seen as the application of a simple 3D edge detection operator to a 3D digital image.

f. Gray-Level Gradient Shading. Starting from the traditional 2D edge detectors as used in 2D image processing, a number of these have been extended to 3D and used to estimate surface normal vectors in gray-value voxel models. The first operator we tried is the equivalent of the Kirsch operator (Ballard and Brown, 1982). The placement of the weight factors in the 3 × 3 × 3 operator kernel is given in Fig. 24(a). Similarly, a 3D equivalent
FIGURE 23. (a) A voxel and its 6-connected neighbors; (b) 17 of the 26 possible surface normal vectors.
of the Sobel, or Prewitt, operator can be devised; see Fig. 24(b). An operator that we arrived at by trial and error is shown in Fig. 24(c). It turns out to be nearly identical to the Zucker-Hummel operator (Ballard and Brown, 1982), which has also been used for shading gray-value voxel models by Hohne and Bernstein (1986). All of these operators compute the components of the surface normal vectors from weighted central differences in a 3 × 3 × 3 neighborhood:

   δ(x) = f_{x+1,y,z} - f_{x-1,y,z}
   δ(y) = f_{x,y+1,z} - f_{x,y-1,z}                                  (19)
   δ(z) = f_{x,y,z+1} - f_{x,y,z-1}

with f a function of three discrete coordinate variables.

FIGURE 24. Gray-value gradient detectors: (a) 3D Kirsch operator; (b) 3D Sobel operator; (c) "Zucker-Hummel" operator. The small sphere indicates the central voxel. To obtain the three components of the gradient vector, each operator is applied three times, each time in the appropriate (x, y or z) direction.
FIGURE 25. Adaptive gray-value gradient shading for thin surface layers. Dashed lines indicate voxel boundaries, arrows the computed difference. Forward differencing is applied in cases (a) and (b), backward differencing in cases (c) and (d).
This may cause problems for thin surfaces, i.e., surfaces with a thickness of 1 voxel, separating two regions both of which have a gray value that is either lower or higher than that of the surface layer. The backward or forward difference, depending on the gray values on either side of the surface layer, should then be used instead. Figure 25 shows the four cases that may occur. Tiede et al. (1990) have reported this method, which they call adaptive gray-level gradient shading.
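Combining the central differences of eq. (19) with the diffuse lighting model of eq. (15) gives a compact gray-level gradient shader. A Python sketch (ours; the voxel is assumed to lie in the interior of the array, and the clamping of back-facing normals to the ambient term is our own choice):

```python
import numpy as np

def gradient_shade(volume, x, y, z, light, i_max=1.0, i_amb=0.2):
    """Gray-level gradient shading for the visible voxel at (x, y, z):
    estimate the surface normal from central differences, eq. (19),
    then apply the diffuse lighting model of eq. (15)."""
    g = np.array([
        float(volume[x + 1, y, z]) - float(volume[x - 1, y, z]),
        float(volume[x, y + 1, z]) - float(volume[x, y - 1, z]),
        float(volume[x, y, z + 1]) - float(volume[x, y, z - 1]),
    ])
    n = g / (np.linalg.norm(g) or 1.0)         # unit surface normal estimate
    l = np.asarray(light, dtype=float)
    l = l / np.linalg.norm(l)                  # unit vector toward the light
    diffuse = max(0.0, float(np.dot(l, n)))    # clamp back-facing normals
    return (i_max - i_amb) * diffuse + i_amb
```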
3. The Depth-Coordinate Buffer

Initially, the system used the intermediate buffer only to render a depth, or distance, map as input to the image space gradient shading algorithm: for each visible voxel at screen coordinates (u, v) the buffer contains the distance d from the visible voxel to the screen. The buffer depth of 8 bits allows 256 different depth values. The gray-value gradient shading method was first implemented, in conjunction with the back-to-front projection algorithm, as object space shading. Thus, when a voxel is projected, it is rendered at the same time. Since many projected voxels are overwritten later by those that lie closer to the screen, a lot of superfluous shading calculations are performed. If the object space coordinates of the voxels that are projected last can somehow be
FIGURE 26. The use of the depth-coordinate buffer.
retained, shading of the projected image can be applied after projection, based on the information available in image space. This has the advantage that the shading calculations are performed only for the visible voxels. When only rendering parameters are altered, the projected view stays the same; reprojection is not needed and only the shading step has to be performed anew. The depth buffer has been extended to a depth-coordinate buffer (D/C buffer), which now also stores the original object space coordinates of the visible voxels (see Fig. 26). Some of the shading information is now available in image space (the coordinates of the visible voxels) and some has to be retrieved from object space (the values of the voxels in the local neighborhoods of the visible voxels). Gray-value gradient shading can thus be considered a hybrid between object space and image space shading. Voxel mapping of the bounding planes can now also be applied after projection. In a scan of the D/C buffer, the voxel (x, y, z) coordinates are compared to the coordinate values of the bounding planes. When they are equal, the corresponding gray value is retrieved from the voxel model and rendered on the screen. For the performance of voxel mapping it makes no difference whether it is applied during or after the projection step. An additional advantage is that the dynamic range of the distance map (the difference between the smallest and largest depth values occurring in the projected image) may now be fully exploited to improve the depth cue in the shaded image. Because the dynamic range of the depth values is known after projection, the 256 available depth values can be linearly scaled between the minimum and maximum values (contrast stretching).
4. Transparent Volume Rendering

True volume visualization algorithms must in one way or another handle transparency. When transparency comes into play, a range of voxel values
along a line of sight determines the value of an output pixel. Under these circumstances an incremental calculation of the output value, using the current pixel value and the next accessed voxel value, is feasible only when the transformed voxels are presented in a strictly increasing or decreasing depth order. Only in that situation can we merge the new voxel value with the current pixel value; hardware support for this merge is called a channel. A Z buffer will not suffice, because when transformed voxels arrive in random order it can act only as a minimum or maximum filter. Fortunately, both forward and backward mapping algorithms exist that deliver their transformed voxels depth ordered along lines of sight.
a. Integral Method. The simplest way to get a transparent image is by directly interpreting the voxel values f_{x,y,z} as opacity values. One way of visualizing such a semitransparent voxel model is by assigning to each screen pixel the integrated opacity along the line of sight. This may be implemented using a ray-casting algorithm and summing all voxel values along a ray, a method that is known as additive reprojection (Hohne and Bernstein, 1986). The resulting images are comparable to x-ray projection images. A disadvantage of this method is that thin features with significantly larger voxel values than their surrounding voxels are lost, due to the averaging along the ray.

b. Extremal Method. Thin features may be rendered more clearly by maximum value reprojection, a method where the maximum voxel value found along a ray determines the pixel color (Johnson and Mosher, 1989). This method is rather effective in magnetic resonance angiography (MRA). Examples of both additive and maximum value reprojection are given in Fig. 27.
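For rays cast along one of the volume's axes (an axis-aligned parallel projection), both methods reduce to a single array reduction per pixel; a NumPy sketch (ours), with random data standing in for a gray-value voxel model:

```python
import numpy as np

volume = np.random.rand(64, 64, 64)   # stand-in for a gray-value voxel model

# Integral method: sum the opacities along each line of sight (additive
# reprojection); the result resembles an x-ray projection image.
additive = volume.sum(axis=2)

# Extremal method: keep the maximum value along each line of sight (maximum
# value reprojection); thin, bright features survive, as in MRA.
maximum = volume.max(axis=2)
```

For an arbitrary view direction the same reductions would be applied to the samples delivered by the ray caster of Section III.B.2.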
c. Alpha Blending. Porter and Duff (1984) have formulated a more general method for handling transparency. Their method was originally used to composite series of 2D images by means of transparent overlaying and stenciling. A number of volume rendering algorithms, notably those of the Pixar system (Drebin et al., 1988) and Levoy (1988), are essentially based on this method. Voxel values now specify both color and opacity values indirectly: a voxel value is an index into a look-up table. Each entry of the table consists of a quadruple (R, G, B, α). The R, G and B are the red, green and blue color components in the range 0-1, while α is the opacity value. The α component can also assume values between 0 (completely transparent) and 1 (opaque). The final composition is reached via a series of incremental steps. Each incremental step takes two color-opacity values, e.g., A (current weighted sum) and B (RGBα quadruple indexed by the next voxel value), and com-
FIGURE 27. Volume rendered images: (a) averaged gray-values along the rays (additive reprojection); (b) maximum gray-values along the rays.
putes a weighted sum of the color values (separately for each of the red, green and blue "channels"), using the respective α values as weight factors. So, when voxel A lies in front of voxel B, we get

   α_O = α_A + (1 - α_A) α_B
   α_O C_O = α_A C_A + (1 - α_A) α_B C_B                             (20)

where a color component is denoted by C and the subscript O indicates output values.⁵ This equation shows that, when voxel A lies in front of B, it blocks a fraction α_A, and lets through (1 - α_A) of B's color. In volume rendering, a pixel color component C is the composite of many voxel colors. The compositing process may operate in either back-to-front or front-to-back order. In the first case, we start with a background value and successively add voxels in front of the accumulated total, merging the voxel's color and opacity (C, α) with the total so far (C_O, α_O), resulting in a new value of the accumulated total (C'_O, α'_O):

   α'_O = α + (1 - α) α_O
   α'_O C'_O = α C + (1 - α) α_O C_O                                 (21)
This compositing order is, of course, naturally incorporated in the back-to-front voxel projection algorithm. When the compositing is done in front-to-back order, which is the most appropriate way for volume ray-casting display algorithms, the merging is slightly different:
   α'_O = α_O + (1 - α_O) α
   α'_O C'_O = α_O C_O + (1 - α_O) α C                               (22)
The advantage of front-to-back compositing is that it is easily extended with an adaptive ray termination condition (Levoy, 1990). When the addition of voxels along a ray no longer causes significant changes in the total color value, the compositing can be stopped. The termination criterion is that the total opacity, which increases monotonically along the ray, reaches the value 1 - ε, with ε a small value, e.g., 0.05. The value of ε may be adjusted: larger values increase rendering speed, while smaller values lead to improved image quality.
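A Python sketch of front-to-back compositing with this termination test (ours; a single color channel is shown for brevity, and the assumed look-up table `lut` returns the premultiplied color and opacity discussed in the footnote):

```python
def composite_front_to_back(samples, lut, eps=0.05):
    """Front-to-back alpha blending, eq. (22), with adaptive ray termination.
    `samples` yields voxel values front to back; `lut` maps a voxel value to
    a premultiplied color c = alpha * C and an opacity alpha."""
    c_out, a_out = 0.0, 0.0
    for value in samples:
        c, a = lut(value)
        c_out += (1.0 - a_out) * c         # premultiplied form of eq. (22)
        a_out += (1.0 - a_out) * a
        if a_out >= 1.0 - eps:             # ray is (nearly) opaque: stop early
            break
    return c_out / a_out if a_out > 0.0 else 0.0   # undo the premultiplication
```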
⁵ In practice, the color components are premultiplied by the value of α; i.e., a quadruple (r, g, b, α) represents a pixel with opacity α and color (RGB) components (r/α, g/α, b/α). This convention greatly facilitates the compositing calculations: writing c = αC, eq. (20) becomes c_O = c_A + (1 - α_A) c_B. After all compositing calculations have been carried out, the actual color value C_O is obtained by a single division by α_O.

d. Fuzzy-Voxel Classification. The construction of the look-up table,
the quadruples (R, G, B, α), usually depends on knowledge about the source of the volume data set and is known as voxel classification. A voxel classification scheme assigns material properties, such as color and opacity, to each voxel value. When every voxel is labeled as one particular material, we speak of binary voxel classification. Gray-level voxel models generated by scanners suffer from the so-called partial volume effect: densities of different structures within an elementary voxel all contribute to the single gray value stored. In an attempt to partially undo this effect, every voxel is assigned a chance of belonging to each of a set of materials. This is called fuzzy-voxel classification. For instance, the display algorithm of the Pixar volume-rendering system employs a sophisticated fuzzy-voxel classification scheme. This method considers voxels as containing certain percentages of different materials. From these percentages, color and opacity values are determined, as well as density gradients. The color and opacity values are used to render volumes of more or less translucent materials, while the gradient vectors are used to render (possibly semitransparent) surfaces that mark transitions between different materials. Fuzzy-voxel classification, while potentially yielding superior, alias-free images, enormously increases rendering times.

D. Comparisons of Algorithm Time Performance
In order to gain both experience with and insight into the computational costs of various voxel-based display and rendering algorithms, two things have been investigated: the absolute display times (in seconds), and the relative times spent on the different stages of the algorithms. The aim of these measurements was to discover which combinations of display and rendering algorithms could most effectively be used in an interactive visualization system. Implementations were made on a "standard" workstation (a Sun 3/160C, a 68020-based system with 16 MB of memory), as well as on an accelerated system: the same Sun 3/160C equipped with a TAAC-1 board (see Section V for a description of the TAAC-1 accelerator).

1. Voxel Projection Algorithms

First, the performance of our implementation of two voxel-projection algorithms was measured. The implementations are of the fixed-order and the slice-by-slice, back-to-front algorithms. The measurements were done on the unaccelerated workstation, using software Z buffering for the fixed-order algorithm and depth shading for rendering the projected voxels. The measurements were obtained for two settings of the viewpoint: view 1 involved rotations about the X and Y axes of 45° and −45°, respectively, while for view 2 these angles were 45° and 135°. A CT data set, consisting of
TABLE II
VOXEL-PROJECTION ALGORITHMS ON THE SUN 3 (BONE TISSUE)

SBS BTF order, view 1                   SBS BTF order, view 2
Size   Projected   Rendered   Time     Size   Projected   Rendered   Time
32³    1270        1270       2        32³    1374        1374       2
64³    10782       10782      7        64³    10801       10801      7
128³   86553       86553      47       128³   86553       86553      47

Fixed order, view 1                     Fixed order, view 2
Size   Projected   Rendered   Time     Size   Projected   Rendered   Time
32³    1370        1312       2        32³    1370        1370       2
64³    10813       7829       5        64³    10813       10813      7
128³   86553       25904      26       128³   86553       86553      47
2 million voxels, was used, and voxels were selected for projection by density windowing (intervals of gray values). Each row in the tables gives the size of the voxel model, the number of voxels that are projected, the number that are finally rendered, and the display time in seconds. Table II relates to the display of the voxels that represent the bone tissue in the volume data set. The second series of measurements (Table III) was obtained with the same parameter settings, but now the range of selected voxel values was widened to include skin and soft tissue, increasing the number of voxels projected. The fixed scanning order in view 1 corresponds to an SBS front-to-back output order, while in view 2 it corresponds to an SBS back-to-front output order.

From the timings it is clear that the time lost in checking depth values against those in the Z buffer is outweighed by the time saved in not rendering invisible voxels. In conclusion, one may state that the fixed-order algorithm takes at least as long as the front-to-back and at most as long as the back-to-front algorithm. Depending on the number of voxels displayed, the front-to-back algorithm is two to four times faster than its back-to-front counterpart. Taking into account an overhead of 1-1.5 s (for setting up the transformation tables, etc.), display times increase linearly with the number of voxels projected. For the slice-by-slice back-to-front algorithm, there is no difference in performance for different viewpoints, because only the scanning order is adapted to the location of the viewpoint. These figures also show that the rendering of voxels is very expensive.

In order to measure the performance increase of the accelerated workstation over the standard system, the SBS BTF algorithm was implemented on the Sun/TAAC-1 workstation. The performance figures are shown in Table IV.
TABLE III
VOXEL-PROJECTION ALGORITHMS ON THE SUN 3 (SKIN-SOFT-BONE TISSUE)

SBS BTF order, view 1                   SBS BTF order, view 2
Size   Projected   Rendered   Time     Size   Projected   Rendered   Time
32³    10065       10065      5        32³    10863       10863      5
64³    80723       80723      36       64³    80772       80772      36
128³   647307      647307     270      128³   647307      647307     270

Fixed order, view 1                     Fixed order, view 2
Size   Projected   Rendered   Time     Size   Projected   Rendered   Time
32³    10193       6676       4        32³    10193       10193      6
64³    81106       15673      15       64³    81106       81106      36
128³   647302      32376      76       128³   647302      647302     275
The display parameters were set to the same values as those for view 1 of the previous tables. Because the display time for the SBS BTF algorithm does not depend on the view, view 2 could of course have been chosen as well. Comparing Tables II and III with Table IV, the performance increases by a factor of between three and seven.

TABLE IV
SBS BTF PROJECTION ALGORITHM ON THE TAAC-1: (LEFT) BONE TISSUE; (RIGHT) SKIN-SOFT-BONE TISSUE

Size   Projected/Rendered   Time      Size   Projected/Rendered   Time
32³    1270                 1         32³    10065                2
64³    10782                3         64³    80723                8
128³   86553                16        128³   647307               40

2. Ray Casting

A volume ray-casting algorithm was implemented on the Sun/TAAC-1 workstation. The ray-bounding-box intersections are computed with the slab-based method given in Section III, while the voxel traversal is done by the 3D Bresenham algorithm, generating either 6-connected or 26-connected voxel paths. The ray-casting algorithm was used in conjunction with surface rendering by depth shading. The display times for this algorithm are presented in Table V. The column Rays gives the number of rays that were generated, while the display times are again given in seconds, both for 6-connected and 26-connected paths.

Studying the table, it appears that voxel traversal along a 6-connected path is roughly 10% more expensive computationally than along a 26-connected path. Furthermore, comparing the table for the "skin" surface to the one for the "bone" surface shows that display times decrease when the number of object voxels increases. This is because, when the object consists of more voxels, it is larger, and ray-object intersections are detected at an earlier stage along the rays.

The relative performance of volume ray casting versus voxel projection can be deduced by comparing Tables IV and V. When generating a 512 × 512 image (from the 128 × 128 × 128 model), volume ray casting
is between 2.5 and 7.5 times slower. However, a rather inefficient method was used to calculate the ray-bounding-box intersections: in the slab-based method, each single ray is transformed from screen space to object space and each intersection is computed anew, an approach more representative of perspective views. When ray-bounding-box intersections are computed incrementally, using spatial coherence, the times presented in Table V are almost halved.
TABLE V
RAY-CASTING ALGORITHM ON THE TAAC-1: (LEFT) BONE SURFACE; (RIGHT) SKIN SURFACE

       Time (s)                         Time (s)
Rays   6-connected   26-connected     Rays   6-connected   26-connected
128²   8             7                128²   6             6
256²   31            27               256²   25            23
512²   121           108              512²   101           92

3. Rendering Methods

The rendering algorithms used in this section were all implemented on the Sun/TAAC-1 workstation. Again, the CT data set was used. For a comparison of surface-rendering algorithms the skin surface was projected with the SBS BTF algorithm, with the object rotation set to view 1. The results are shown in Table VI. For the image space Z grad method, the image size was 512 × 512 pixels. Note that the columns Depth and G grad show total display times (in seconds), while the numbers in the column Z grad should be added to those in the column Depth to obtain the total display time for a depth-gradient shaded image.

These figures again confirm what became evident from Tables II and III: the performance of the SBS BTF algorithm is low because it renders a lot of superfluous voxels. When a sophisticated surface-rendering technique such as gray-value gradient shading is used, display times become even larger.

Table VII gives performance figures for several rendering methods when
used with the depth-coordinate buffer. Depth-gradient shading (Z grad), a "pure" image space method, has been taken as a reference. The figures for gray-value gradient (G grad) and adaptive gray-value gradient shading (AG grad) can be compared with the object space versions of these rendering methods. The figures were obtained using the CT data set. As can be seen from these figures, by first projecting and then rendering (from the D/C buffer), a gray-value gradient-shaded image is obtained in 47 + 18 = 65 seconds instead of 164 seconds (see Table VI) when projection and rendering are done simultaneously, a speed-up factor of 2.5. Moreover, another image of the same view, but with different rendering parameters, is now available in at most 20 seconds, which is a ninefold increase in rendering speed.

Two relatively simple volume rendering methods, average-value and maximum-value reprojection, were also compared. Both were implemented on the Sun/TAAC-1 system with the same ray-casting algorithm used in the previous paragraph, and the voxel models were viewed from the same direction. The table with display times (Table VIII) shows that even these simple volume rendering methods result in substantially longer display times than surface rendering.
4. Evaluation

The following conclusions may be drawn from the figures presented:

• Front-to-back scanning of the voxel model, in combination with Z buffering or a bit mask, is the fastest display method, but it requires additional storage for the Z buffer. The overhead of software Z buffering is very small; hardware Z buffering could further improve the performance.
• Although the SBS back-to-front algorithm is the slowest of the incremental scanning algorithms, it is very informative to watch when the rendering is slow enough: during the rendering phase one gets a view of the entire volume data set.

TABLE VI
OBJECT SPACE VERSUS IMAGE SPACE SURFACE RENDERING METHODS

              Time (s)
Object size   Depth   G grad   Z grad
32³           2       3        5
64³           7       41       5
128³          21      164      5
TABLE VII
SHADING VIA THE DEPTH-COORDINATE BUFFER

Method     Z grad   G grad   AG grad
Time (s)   5        18       20
• Display times for object space surface rendering methods depend on the number of voxels that are projected. For image space surface rendering, on the other hand, they depend only on the size of the image, and they do not increase as more voxels are projected.
• Even simple transparent rendering methods, such as average- and maximum-value reprojection, are significantly slower than surface rendering.
• Volume ray casting is at least 10 times slower than the SBS FTB projection algorithm. However, a twofold increase in the performance of ray casting is obtained by using more efficient ray-bounding-box intersection computations, making use of spatial coherence to compute them incrementally.

Most of the display and rendering algorithms discussed here have been implemented in the system described in Section VI.

TABLE VIII
TRANSPARENT RENDERING METHODS COMPARED

       Time (s)
       Average value                Maximum value
Rays   6-connected   26-connected   6-connected   26-connected
128²   12            9              12            10
256²   44            34             47            41
512²   180           138            190           144

IV. SPATIAL SELECTION AND DIVISION

The visualization of a volume data set may require more than just the display of the entire voxel model. The ability to interactively select parts of the voxel model for separate display, or for removal from the image, may provide additional insight into a complex three-dimensional structure. Hidden details of an object may be revealed by removing obscuring parts.
Alternately, object models could be interactively constructed by repeated addition of simple primitive parts. When a ray-casting display algorithm is used, slicing an object along a plane is easy to accomplish: instead of starting the search for the visible surface along a ray at the screen or viewpoint, the search is started at the slicing plane. This ensures that voxels in front of the slicing plane are not displayed. In transparent rendering the search may also be performed from the viewpoint up to the slicing plane, resulting in the removal of all voxels behind it.

Because voxel models contain all sampled values on a regular 3D grid, the positions of the voxels are implicitly known. Voxel models are sets (in the mathematical sense) of elementary volume elements or point samples. This property allows the combination of voxel models by Boolean set operations (union, intersection, complement) in the case of binary voxel models, or by arithmetic operations (addition, subtraction, multiplication) in the case of gray-value voxel models. The set operations may be performed by logical operations (Or, And, Not) on corresponding pairs of voxels from two voxel models (see the sketch at the end of this introduction). For instance, voxels in a certain region may be removed from a voxel model by computing the difference between it and a secondary voxel model that represents the region to be removed.

A spatial selection method that is particularly suitable for use in combination with ray-casting display methods is "peeling" the voxel model. After detecting and displaying the outermost visible surface, the rays are followed further inward, subsequently detecting and displaying inner surfaces. Applied to an anatomic model, for example, this results in first displaying the skin, then the subcutaneous tissue, followed by muscles and bones. Systems that allow for spatial selection in some form have been described before by others (Brewster et al., 1984; Chen and Sontag, 1989; Upson, 1989), but the functions offered by their systems are rather limited.
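As an illustration of these set operations, here is a minimal sketch in C that removes a region from a binary voxel model by computing A And (Not B) voxel by voxel; the model size and the one-byte-per-voxel storage layout are assumptions made for the example:

#include <stdlib.h>

#define N 64                      /* voxel model is N x N x N (assumed size) */

typedef unsigned char Voxel;      /* binary voxel: 0 = empty, 1 = object */

/* Linearized voxel address on the regular 3D grid. */
static size_t vaddr(int x, int y, int z) { return ((size_t)z * N + y) * N + x; }

/* Difference A \ B: removes from A the region represented by B.
   Union and intersection follow the same pattern with |= and &=. */
static void voxel_difference(Voxel *a, const Voxel *b)
{
    for (size_t i = 0; i < (size_t)N * N * N; i++)
        a[i] = a[i] & (Voxel)!b[i];        /* A And (Not B) */
}

int main(void)
{
    Voxel *model  = calloc((size_t)N * N * N, 1);
    Voxel *region = calloc((size_t)N * N * N, 1);

    model[vaddr(10, 10, 10)] = 1;          /* a voxel of the object          */
    region[vaddr(10, 10, 10)] = 1;         /* same voxel in the "cut" region */
    voxel_difference(model, region);       /* the voxel is removed from the model */

    free(model);
    free(region);
    return 0;
}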
A. Binary Space Partitioning
One way to create a general subdivision of a voxel model is by recursively bisecting the voxel model with partitioning planes. These planes may be placed at arbitrary positions and orientations through the voxel model. The resulting subdivision of the voxel model (into convex parts) is described with an auxiliary data structure: the binary space partitioning tree, or BSP tree. The voxel model can subsequently be visualized as an exploded view. A BSP tree-based dividing scheme allows the incorporation of voxel projection, volume ray tracing and slicing methods to render the volume data.
1. BSP Tree Fundamentals

Originally, the binary space partitioning tree was devised to speed up the display of 3D objects represented by a collection of polygonal boundary surface elements (Fuchs et al., 1980; 1983). BSP trees have also been used in the evaluation of set operations on polyhedral objects (Thibault and Naylor, 1987) and in ray-tracing algorithms (Naylor and Thibault, 1986). These applications all require some sort of spatial presortedness of the polygonal input data set. The spatial sorting of the input data is achieved by constructing a tree in the following way:

1. Choose a polygon from the input data set and place it at the root of the tree. Determine the equation of the infinite plane that embeds the polygon.
2. Partition the remaining polygons into two subsets:
   (a) those that lie entirely to the left of the partitioning plane,
   (b) and those that lie entirely to the right of the partitioning plane.
   Polygons that are intersected by the partitioning plane are split along the intersection, and each part is added to the appropriate subset.
3. Recursively apply the previous steps to the two subsets, until the input data set is exhausted.
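A minimal sketch of this construction in C, for the 2D analogue of Fig. 28 (line segments and their embedding lines). Note that it inserts segments one at a time rather than partitioning a whole input set at once; the resulting tree is of the same kind. All names and the sample input are illustrative:

#include <stdio.h>
#include <stdlib.h>

/* 2D analogue of BSP construction: segments instead of polygons,
   embedding lines instead of embedding planes. */
typedef struct { double x0, y0, x1, y1; } Seg;

typedef struct Node {
    Seg seg;                    /* segment stored at this node          */
    double a, b, c;             /* embedding line: a*x + b*y + c = 0    */
    struct Node *left, *right;  /* subtrees on either side of the line  */
} Node;

static double side(const Node *n, double x, double y)
{
    return n->a * x + n->b * y + n->c;
}

/* Insert a segment, splitting it when it straddles the node's line. */
static Node *bsp_insert(Node *root, Seg s)
{
    if (!root) {
        Node *n = calloc(1, sizeof *n);
        n->seg = s;
        n->a = s.y1 - s.y0;            /* normal of the embedding line */
        n->b = s.x0 - s.x1;
        n->c = -(n->a * s.x0 + n->b * s.y0);
        return n;
    }
    double d0 = side(root, s.x0, s.y0);
    double d1 = side(root, s.x1, s.y1);
    if (d0 <= 0 && d1 <= 0) {
        root->left = bsp_insert(root->left, s);
    } else if (d0 >= 0 && d1 >= 0) {
        root->right = bsp_insert(root->right, s);
    } else {                            /* straddles: split at the line */
        double t = d0 / (d0 - d1);
        double xm = s.x0 + t * (s.x1 - s.x0);
        double ym = s.y0 + t * (s.y1 - s.y0);
        Seg s0 = { s.x0, s.y0, xm, ym }, s1 = { xm, ym, s.x1, s.y1 };
        if (d0 < 0) { root->left  = bsp_insert(root->left, s0);
                      root->right = bsp_insert(root->right, s1); }
        else        { root->right = bsp_insert(root->right, s0);
                      root->left  = bsp_insert(root->left, s1); }
    }
    return root;
}

int main(void)
{
    Seg input[] = { {0,0, 4,0}, {2,-2, 2,2}, {1,1, 3,3} };
    Node *root = NULL;
    for (int i = 0; i < 3; i++)
        root = bsp_insert(root, input[i]);
    printf("root line: %gx + %gy + %g = 0\n", root->a, root->b, root->c);
    return 0;
}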
The choice of which side is "left" and which is "right" is arbitrary; it usually corresponds to the counterclockwise and clockwise orientations of the polygon vertex list. A 2D example of a BSP tree, constructed from a set of line segments, is presented in Fig. 28. Displaying an image from the constructed BSP tree is straightforward: given the position of the viewpoint, the BSP tree is traversed in the following
FIGURE 28. (a) A set of 2D line segments, arrows indicate "left" side; (b) the embedding lines and convex subspaces; (c) BSP tree.
order:

1. At a node, determine the side of the polygon the viewpoint is on.
2. Process the subtrees of this node in this order:
   (a) Traverse the subtree on the other side of the polygon.
   (b) Display the polygon at the current node.
   (c) Traverse the subtree on the same side of the polygon.
3. Apply the previous steps recursively, until the root is reached from its subtree at the "same" side.
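A sketch of this back-to-front traversal in C; the node layout and the polygon stand-in are assumptions made for the example:

#include <stdio.h>

/* Back-to-front traversal of a BSP tree: polygons on the far side of
   each plane are emitted before the plane's own polygon, so closer
   polygons overwrite farther ones on the screen. */
typedef struct BspNode {
    double a, b, c, d;              /* plane a*x + b*y + c*z + d = 0   */
    int polygon_id;                 /* stand-in for the stored polygon */
    struct BspNode *left, *right;   /* negative / positive half-space  */
} BspNode;

static void display_polygon(int id) { printf("draw polygon %d\n", id); }

static void bsp_back_to_front(const BspNode *n, double ex, double ey, double ez)
{
    if (!n) return;
    double s = n->a * ex + n->b * ey + n->c * ez + n->d;  /* eye side */
    const BspNode *same  = (s >= 0) ? n->right : n->left;
    const BspNode *other = (s >= 0) ? n->left  : n->right;
    bsp_back_to_front(other, ex, ey, ez);   /* far subtree first */
    display_polygon(n->polygon_id);         /* then this node    */
    bsp_back_to_front(same, ex, ey, ez);    /* near subtree last */
}

int main(void)
{
    /* Two polygons embedded in the planes x = 1 and x = 2. */
    BspNode leaf = { 1, 0, 0, -2, 2, NULL, NULL };
    BspNode root = { 1, 0, 0, -1, 1, NULL, &leaf };
    bsp_back_to_front(&root, -5, 0, 0);     /* eye at x = -5: prints 2, then 1 */
    return 0;
}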
This algorithm displays the polygons in a back-to-front sequence, overwriting polygons on the screen that are farther away with those closer to the viewpoint. Note that the BSP tree is a static data structure: it is constructed once for a given input data set, and individual polygons or partitioning planes cannot be dynamically inserted or deleted. In case the input data set changes, e.g., when a new polygon is added or removed, the whole tree must be rebuilt, starting at the internal node where the new polygon is inserted. The deletion of internal nodes may even lead to a complete reordering of the tree. The planes associated with the polygons partition 3D space into several, possibly half-open, convex subvolumes or cells. An internal node of the BSP tree represents a partitioning plane, ax + by + cz + d = 0, which splits 3D space into two half-spaces, each represented by the node's two children:
{(x, y, z) | ax + by + cz + d > 0}   and   {(x, y, z) | ax + by + cz + d < 0}    (23)
A leaf node then represents a convex volume cell, formed by the intersection of all half-spaces encountered on the path from the root of the tree to that leaf. Figure 29 gives a simple example of a cube, representing (part of) 3D space, subdivided by several planes, along with the associated BSP tree. As stated before, a BSP tree is a static data structure. This means that
FIGURE 29. (a) Cube, partitioned by several planes; (b) cells shown in exploded view; (c) corresponding BSP tree. Letters denote partitioning planes, numbers denote volume cells.
when a new partitioning plane is added, a new BSP tree has to be constructed. Because N partitioning planes may potentially yield a subdivision of 3D space consisting of N³ convex cells, one would expect this to be a costly operation in terms of processing time. However, in practice a BSP tree constructed from N partitioning planes contains CN cells, with C a constant between 1 and 5 (Fuchs et al., 1983).

At the internal nodes of our "augmented" BSP tree the coefficients a, b, c and d of the plane equation are stored, and the pointers to the node's children are non-null. The other fields, in this case, are of no significance. When the node is a leaf, the fields npolys and plist give the number of polygons and the first element in the polygon list, respectively. Each node also has an attrib field; this indicates if and how the cell is to be rendered. The individual polygons also have such a field, which is used to determine whether the polygon is to be voxel mapped or not. The centroid field contains the arithmetic mean of the polyhedron's vertices; its use will be clarified later. A polygon is described in terms of its vertices v_i (0 ≤ i < N), stored in an ordered sequence in vlist. The ordering is clockwise when seen from the outside of the polyhedron. The vertices v_0, ..., v_i, v_{i+1}, ..., v_{N-1} implicitly represent the polygon's edges e_0 = (v_0, v_1), ..., e_i = (v_i, v_{i+1}), ..., e_{N-1} = (v_{N-1}, v_0).
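Expressed as a data declaration, the augmented node described above might look as follows in C. Only the field names (the plane coefficients, npolys, plist, attrib, centroid, vlist) come from the text; the exact types are assumptions:

/* Polygon: ordered vertex list, clockwise seen from outside the cell. */
typedef struct { double x, y, z; } Vec3;

typedef struct Polygon {
    int   nverts;
    Vec3 *vlist;                  /* vertices v_0 .. v_{N-1}; edges implicit */
    int   attrib;                 /* voxel-map this polygon or not           */
    struct Polygon *next;         /* linear polygon list of a leaf cell      */
} Polygon;

/* Augmented BSP tree node: internal nodes carry a partitioning plane,
   leaf nodes carry the boundary polygons of one convex volume cell. */
typedef struct BspNode {
    double a, b, c, d;            /* plane equation (internal nodes)   */
    struct BspNode *left, *right; /* non-null at internal nodes        */
    int      npolys;              /* leaf: number of boundary polygons */
    Polygon *plist;               /* leaf: first polygon in the list   */
    int      attrib;              /* if and how the cell is rendered   */
    Vec3     centroid;            /* mean of the cell's vertices       */
} BspNode;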
B. Creating a Subdivision

Construction of a BSP tree that represents a subdivision proceeds as follows. The root of the BSP tree represents the entire voxel model. This root cell is split in two by a splitting plane, resulting in two sibling cells, left and right (sometimes called front and back). Either of these may be selected for further subdivision, making it the current cell. These actions may be repeated until the desired subdivision is achieved. In the following subsections we will look further into the splitting process and give details of various interaction techniques.

1. Positioning the Splitting Planes
Positioning a splitting plane is done interactively by means of a mouse pointer device. Coarse positioning is done via sliders that control the rotation angles about two axes in the plane and the translation of the plane along its normal vector (see Fig. 30). Fine adjustment of the plane's position is achieved by moving the plane-polyhedron intersection points along the edges of the polyhedron's wireframe. The interaction facilities that support
FIGURE 30. Directions of movement for a splitting plane.
this direct manipulation of the splitting plane are described more extensively in Section VI.C. Visual feedback is provided by voxel mapping the section polygon. This provides the user with a sense of the orientation of the splitting plane with respect to the contents of the voxel model.
2. Splitting the Polyhedra

After a partitioning plane has been positioned, either all polyhedral cells or only the current cell may be split. In the first case, the entire BSP tree is traversed and all cells that are intersected by the plane are split. When, in the latter case, the current cell has already been subdivided, all children that are intersected by the plane are also split. Splitting a convex polyhedron results in two new convex polyhedra (see Fig. 31). The splitting procedure sequentially examines the polyhedron's polygon list plist and attempts to split each polygon. Because they are
FIGURE 31. Splitting a convex polyhedron with a plane.
also convex, the splitting of a polygon will yield two new polygons, each of which is added to the plist of the appropriate new polyhedron. When the original polyhedron's plist has been scanned, the plists of the two new polyhedra are complete, except for the newly created section polygon, which they have in common. Every time a polygon is split in two, the vertices of the resulting section edge are stored in a separate list. After all polygons have been split, this list contains the vertices of the section polygon, but not in (circular) order; the ordering of this list requires a separate sorting step.

A number of special cases arise when one or more vertices lie on the splitting plane:

1. One vertex lies on the plane. In this case there are two possibilities:
   (a) the plane splits the polyhedron exactly on the vertex; the vertex should be added to both left and right children;
   (b) the polyhedron only touches the splitting plane; all other vertices lie on the same side and all vertices should be added to the same child node.
2. Two adjacent vertices lie on the plane. Again, two cases must be distinguished:
   (a) the plane passes through the connecting edge and splits the polyhedron; the edge becomes part of the section polygon;
   (b) the polyhedron touches the plane along the edge; all other vertices lie on the same side of the splitting plane and are added to the same child node.
3. Three or more vertices of the same polygon lie on the plane. This means that the entire polygon lies on the plane. Since the polyhedron is convex, the whole polyhedron lies on one side of the splitting plane, and it is copied into the appropriate child node.

The basis of the splitting of polyhedra is a routine for splitting convex polygons: the polygon's (ordered) vertex list is scanned until a "transition" from vertices lying on the left to vertices lying on the right side of the partitioning plane occurs (see also Fig. 32). Determining on which side of the splitting plane a vertex lies simply means substituting the vertex coordinates (x_v, y_v, z_v) in the plane equation ax + by + cz + d, with three possible results:
ax_v + by_v + cz_v + d < 0:   v lies to the left of the plane
ax_v + by_v + cz_v + d = 0:   v lies on the plane
ax_v + by_v + cz_v + d > 0:   v lies to the right of the plane    (24)
Which side is left and which is right is arbitrary. The vertices on either side are stored in the appropriate vertex lists.

FIGURE 32. Splitting a convex polygon with a plane.

When the positions of the current vertex v_i and the previous vertex v_{i-1} with respect to the plane differ, the corresponding edge intersects the plane. The coordinates of the intersection point can then be computed by linear interpolation along the edge, e.g., for the x coordinate

x_s = x_{i-1} + t(x_i − x_{i-1}),   t = −(ax_{i-1} + by_{i-1} + cz_{i-1} + d) / (a(x_i − x_{i-1}) + b(y_i − y_{i-1}) + c(z_i − z_{i-1}))    (25)

analogous to the y and z coordinates.
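The polygon-splitting routine just described can be sketched as follows in C. The handling of the degenerate cases enumerated above is omitted, and the vertex representation is an assumption:

#include <stdio.h>

typedef struct { double x, y, z; } Vec3;

/* Split a convex polygon (ordered vertex loop) with the plane
   a*x + b*y + c*z + d = 0, distributing vertices over a "left" and a
   "right" polygon and inserting the edge-plane intersection points.
   Vertices exactly on the plane are not treated specially here; the
   text's case analysis would handle them. */
static void split_polygon(const Vec3 *v, int n, double a, double b,
                          double c, double d,
                          Vec3 *left, int *nl, Vec3 *right, int *nr)
{
    *nl = *nr = 0;
    double dprev = a * v[n-1].x + b * v[n-1].y + c * v[n-1].z + d;
    Vec3 prev = v[n-1];
    for (int i = 0; i < n; i++) {
        double di = a * v[i].x + b * v[i].y + c * v[i].z + d;
        if (di * dprev < 0) {               /* transition: edge crosses plane */
            double t = dprev / (dprev - di);    /* cf. eq. (25) */
            Vec3 s = { prev.x + t * (v[i].x - prev.x),
                       prev.y + t * (v[i].y - prev.y),
                       prev.z + t * (v[i].z - prev.z) };
            left[(*nl)++]  = s;             /* section vertex goes to both */
            right[(*nr)++] = s;
        }
        if (di < 0) left[(*nl)++]  = v[i];
        else        right[(*nr)++] = v[i];
        dprev = di; prev = v[i];
    }
}

int main(void)
{
    /* Unit square in the z = 0 plane, split by the plane x - 0.5 = 0. */
    Vec3 quad[4] = { {0,0,0}, {1,0,0}, {1,1,0}, {0,1,0} };
    Vec3 l[8], r[8]; int nl, nr;
    split_polygon(quad, 4, 1, 0, 0, -0.5, l, &nl, r, &nr);
    printf("left: %d vertices, right: %d vertices\n", nl, nr);  /* 4 and 4 */
    return 0;
}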
C. Displaying Subdivided Volume Data

In the previous section we showed how a subdivision of 3D space may be described by means of a BSP tree. We will now describe how the standard BSP tree display algorithm can be adapted to obtain an exploded view of a subdivided voxel model. The advantage of using a BSP tree to describe the subdivision is that it represents both the partitioning of the volume data into convex cells and the polygons that separate these cells. The back-to-front display algorithm can employ the back-to-front ordering of the partitioning planes and the volume cells, while a ray-casting algorithm can benefit from the front-to-back ordering of the partitioning planes to speed up ray-polygon intersection tests. A slicing algorithm for the display of individual polygons is also easily incorporated in this scheme.

1. The Main Display Algorithm
The outline of the display algorithm is as follows: a modified version of the standard BSP tree display algorithm traverses the tree and determines how the individual volume cells and their bounding polygons must be rendered. The actual display of the convex volume cells is then performed by an
extended version of the front-to-back algorithm, or a volume ray-casting algorithm, while the partitioning polygons are rendered by a slicing algorithm (see Section III). The display algorithm must be adapted in order to yield the leaf nodes, instead of the internal nodes, in front-to-back order:
1. Starting at the root node, determine the node type (internal or leaf). If the node is a leaf, render the cell according to its attribute field and the polygons in the list according to the values of their respective attributes, or
2. Visit the nodes of the subtree rooted at this internal node in the following order:
   (a) traverse the subtree in front of the partitioning plane,
   (b) traverse the subtree behind the partitioning plane.

Our version of the BSP tree display algorithm is a preorder tree traversal (instead of an in-order traversal) because in the original BSP tree the polygons are stored in the internal nodes, while we store them in the leaf nodes. At each node, different display actions are taken, depending on the type of the node (leaf or internal) and the values of its attributes. Some information in our BSP tree is redundant: the coefficients of the plane equations in the internal nodes occur again in the polygon lists of the leaf nodes. The availability of these data in two places simplifies various calculations at the cost of a small memory penalty.

In a standard BSP tree the internal nodes would hold a representation of a partitioning plane, for instance the coefficients of the plane equation, while the leaf nodes are empty, indicating that there is no further subdivision below this level. The polygons that make up the boundary surface of a convex volume cell at a leaf node are not represented explicitly; they would have to be determined at display time. This involves traversing the BSP tree from the root to the designated leaf node and computing the intersection of all half-spaces encountered on this path, finally yielding a polygonal boundary surface when the leaf node is reached. However, this computation "on the fly" would greatly increase display time. For our "augmented" BSP tree, we calculate the polygonal boundary surfaces of the convex subvolumes once, during construction of the BSP tree, and store them as a linear list of polygons at the corresponding leaf node. This approach is advantageous when the subdivision of the volume model changes less frequently than the viewpoint from which an image is displayed.

2. Displaying Exploded Views

Another way to reveal the contents of the voxel model, while at the same time retaining the spatial relationships between different substructures, is to
provide an exploded view facility. An exploded view is obtained by translating all volume cells along a vector T_e by a certain amount, away from the center of gravity C of the voxel model. The amount of translation is determined by the distance of the polyhedron's centroid, c_p, to the center of the voxel model, multiplied by a constant factor f that may be set by the user. The direction of translation is from the main center to the cell's centroid:

T_e = f(c_p − C)    (26)
Thus, polyhedra that are further out from the main center are translated by a larger amount than those closer to the main center. This, together with the fact that all cells are convex, guarantees that there can be no "collisions" between translated cells (see also Fig. 33).

FIGURE 33. Determining the translation vectors.
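A small sketch of eq. (26) in C; the vector type and the sample values are illustrative:

#include <stdio.h>

typedef struct { double x, y, z; } Vec3;

/* Translation vector of eq. (26): a cell whose centroid cp lies farther
   from the model's center of gravity C is moved farther outward. */
static Vec3 explode_translation(Vec3 cp, Vec3 C, double f)
{
    Vec3 te = { f * (cp.x - C.x), f * (cp.y - C.y), f * (cp.z - C.z) };
    return te;
}

int main(void)
{
    Vec3 center = { 64, 64, 64 };          /* center of a 128^3 model */
    Vec3 cp     = { 96, 64, 64 };          /* centroid of one cell    */
    Vec3 te     = explode_translation(cp, center, 0.25);
    printf("Te = (%g, %g, %g)\n", te.x, te.y, te.z);   /* (8, 0, 0) */
    return 0;
}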
3. Front-to-Back Display of Volume Cells

The conventional front-to-back algorithm assumes that the volume data set is bounded by a rectangular box. Scanning all the voxels within a polyhedral boundary can be accomplished by a 3D extension of the well-known scan-line polygon-filling algorithm, as found in standard computer graphics textbooks (Hearn and Baker, 1986). The outline of an algorithm for the 3D scan conversion of arbitrary polyhedra has been given by Kaufman and Shimony (1986). This algorithm, however, scans the voxels in a sequence that does not guarantee the front-to-back order, but a simple extension remedies this. Another difference is that in our case all volume cells are convex; this knowledge may be used to simplify calculations. The skeleton of the extended front-to-back display algorithm follows. Assume the viewpoint is located such that the voxels are traversed from Z_min → Z_max, Y_min → Y_max, X_min → X_max. The algorithm performs the following steps:
1. Sort the polyhedron's vertices according to Z coordinate into a list.
2. "Sweep" the XY scan plane through the polyhedron.
3. At a vertex, some edges are terminated, while others are made active. This information is kept in an active edge list.
4. Between the vertices in the list, edge-scan-plane intersections may be computed incrementally.

At a vertex, updating the active edge list is done by a routine that performs the following actions:

1. Delete edges that terminate at that vertex.
2. Insert edges that start at that vertex.
3. Initialize edge-scan-plane intersections for newly inserted edges.

The incremental updating of the intersections between active edges and the scan plane is handled by a routine that uses the edge's dx/dz and dy/dz. Routine scan_xy() (see Fig. 34) takes care of scanning the intersection area in the scan plane. It is simply a conventional 2D scan conversion algorithm for convex polygons:

1. Sort the intersection list according to Y coordinate into a list.
2. "Sweep" the Y scan line through the scan plane.
3. At an intersection vertex, some edges between intersection points terminate, while others become active. The active ones are stored in another active edge list.
4. Between vertices, the span of a row of voxels may be computed incrementally from the edge's dx/dy.

Figure 34 shows an example, where the edges that are active at the current location of the scan plane and scan line have been indicated; a sketch of the incremental update follows the figure. When the position of the viewpoint is in a different octant, the sweeping directions of the scan plane and scan line, as well as the scan directions of the voxel rows, must be altered to assure the correct front-to-back order. Of course, the sorting order of vertices also depends on the location of the viewpoint.
FIGURE 34. Back-to-front display of a convex polyhedral volume cell.
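A sketch of the incremental part of this scheme in C: each active edge carries its dx/dz and dy/dz, and advancing the scan plane one slice updates all intersections by simple additions. The data layout is an assumption:

#include <stdio.h>

/* Active edge of the 3D scan conversion: its intersection with the
   moving XY scan plane is updated incrementally from dx/dz and dy/dz. */
typedef struct {
    double x, y;          /* current intersection with the scan plane */
    double dxdz, dydz;    /* increments per unit step in z            */
} ActiveEdge;

/* Advance all active edges one voxel slice in the +z direction. */
static void step_scan_plane(ActiveEdge *edges, int nactive)
{
    for (int i = 0; i < nactive; i++) {
        edges[i].x += edges[i].dxdz;
        edges[i].y += edges[i].dydz;
    }
}

int main(void)
{
    ActiveEdge e[1] = { { 0.0, 0.0, 0.5, 0.25 } };
    for (int z = 1; z <= 3; z++) {
        step_scan_plane(e, 1);
        printf("z=%d: edge crosses scan plane at (%.2f, %.2f)\n",
               z, e[0].x, e[0].y);
    }
    return 0;
}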
4. Volume Ray Casting Convex Polyhedral Cells
During traversal of the BSP tree, volume cells may be encountered that are to be displayed with volume ray casting. The basics of volume ray casting have been covered in Section III.B. A significant difference from the basic algorithm is that the boundary of the volume is now a convex polyhedron instead of a box. This means that the ray-bounding-box intersection test must be replaced by a more general ray-convex-polyhedron intersection test. An efficient algorithm, utilizing the convexity of the polyhedra and scan-line coherence, is described next.

The required results of the ray-convex-polyhedron intersection routine are the coordinates of the entry and exit points of the ray. Since the polyhedron is convex, the polygonal faces can be divided into two groups, the front faces and the back faces (see Fig. 35), i.e., faces whose normal vector points toward or away from the screen, respectively. The entry point of the ray will lie on a front face, while the exit point must lie on a back face. Computation of the intersection points is done, in parallel but separately for the front and back intersections, by an algorithm derived from the well-known scan-line visible-surface algorithms (Foley and Van Dam, 1982; Sutherland, Sproull and Schumacker, 1974). The fact that the polyhedra are convex allows various forms of coherence to be exploited (see Fig. 36):

Face coherence. At a given scan-line position, several adjacent polygonal faces may cross it, but they never penetrate each other.
Edge coherence. As long as the scan-line crosses an edge, edge-scan-line intersections may be computed incrementally.
Scan-line coherence. The several segments on a scan-line that together make up the current span show little change from one scan-line to another (as a result of edge coherence).
FIGURE 35. Screen space view of a convex polyhedral volume cell.
FIGURE 36. Principle of the ray-convex-polyhedron intersection test.
The algorithm sweeps a scan-line across the screen from top to bottom. The vertex positions on the screen indicate events at which the list of edge-scan-line intersections must be updated to reflect the current situation. Between event points the edge-scan-line intersections can be computed incrementally from the edges' du/dv values. Pairs of edge-scan-line intersections delimit scan-line segments along which the ray-polygon intersections can be computed. Again, this may be done incrementally, using the derivative dw/du, which can be determined from the polygon's plane equation. From one scan-line to the next, the value dw/dv is used to update the current edge-scan-line intersections (see Fig. 36). Once the entry and exit points have been found, one of the voxel-skipping algorithms from Section III.B can be used to follow the ray's progress through the voxel model.
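The incremental scan-line formulation does not condense well into a few lines of code, but the result it computes, the entry and exit points of a ray through a convex polyhedron, can be illustrated by a simpler per-ray alternative: clipping the ray's parametric interval against all face planes. The sketch below is this alternative formulation, not the authors' scan-line algorithm:

#include <stdio.h>

typedef struct { double x, y, z; } Vec3;
typedef struct { double a, b, c, d; } Plane;   /* outward normal (a,b,c) */

/* Entry/exit of ray p(t) = o + t*dir, t >= 0, through a convex
   polyhedron given as the intersection of half-spaces
   a*x + b*y + c*z + d <= 0.  Returns 0 if the ray misses. */
static int ray_convex(Vec3 o, Vec3 dir, const Plane *f, int nf,
                      double *tin, double *tout)
{
    *tin = 0.0; *tout = 1e30;
    for (int i = 0; i < nf; i++) {
        double denom = f[i].a * dir.x + f[i].b * dir.y + f[i].c * dir.z;
        double dist  = f[i].a * o.x + f[i].b * o.y + f[i].c * o.z + f[i].d;
        if (denom == 0.0) {               /* ray parallel to the face */
            if (dist > 0.0) return 0;     /* and outside: no hit      */
        } else {
            double t = -dist / denom;
            if (denom < 0.0) { if (t > *tin)  *tin  = t; }  /* front face */
            else             { if (t < *tout) *tout = t; }  /* back face  */
        }
    }
    return *tin <= *tout;
}

int main(void)
{
    /* Unit cube [0,1]^3 as six half-spaces. */
    Plane cube[6] = {
        {-1,0,0,0}, {1,0,0,-1}, {0,-1,0,0}, {0,1,0,-1}, {0,0,-1,0}, {0,0,1,-1}
    };
    Vec3 o = { -1, 0.5, 0.5 }, dir = { 1, 0, 0 };
    double tin, tout;
    if (ray_convex(o, dir, cube, 6, &tin, &tout))
        printf("entry t=%g, exit t=%g\n", tin, tout);   /* 1 and 2 */
    return 0;
}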
a. Evaluation of BSP Tree. The BSP tree is an extremely efficient data structure for the sort of selective 3D views we aim at. The various display modes (intersection, surface and transparent) can easily be combined with spatial selection, depth ordering and incremental scanning techniques. The only disadvantage is its static character: when the subdivision is changed, the tree may have to be rebuilt. This, however, did not turn out to be a bottleneck.

V. HARDWARE SUPPORT

In a previous section we have seen that the large number of primitive elements in a voxel model can be the source of two major problems. The
first problem, that of storage requirements, is often mentioned in the early literature on voxel models (Badler and Bajcsy, 1978; Srihari, 1981). It is of less importance nowadays, due to the continuously decreasing price of memory hardware. The second problem is that the large number of elementary volume elements makes algorithms extremely computation intensive. On the other hand, the algorithms are often conceptually very simple, which makes them prime candidates for hardware support. Several special purpose voxel processing systems have been developed; an overview of the different approaches is presented in Section V.A. The algorithms could also be implemented in software on efficient, but more general purpose, architectures. There are two principal ways in which voxel-based algorithms can benefit from such hardware support:

1. All voxels are distributed among many processing units that are connected in some network. All units process the voxels that have been assigned to them in parallel.
2. Voxels are processed in some ordered sequence, e.g., slice by slice, row by row, column by column. The operations on successive voxels can then be performed by a pipeline of elementary processing units.
The speed of a network of processors depends in principle on the number of units. However, adding more units to the system implies more communication (data exchange, passing of intermediate results, system messages) through the network links. Therefore, the problem becomes one of selecting a network layout that provides a maximum performance gain, while minimizing the communication overhead. The way in which a voxel model is subdivided and the parts distributed among the processors greatly influences the performance of parallel systems. The evaluation of a number of parallel systems is presented in Section V.B.

Instead of trying to design special purpose hardware, we decided to exploit readily available hardware. A not so obvious system, which performs surprisingly well, is an auxiliary processor for two-dimensional image processing purposes. Section V.C describes how such a system can be used in various ways to support a number of operations on binary voxel models.

The speed of a pipeline architecture depends on the number of results per unit of time that can be delivered at the pipeline's output. The main problem with this approach is, given a repertoire of operations on voxels, choosing a number and order of processing units in the pipeline that enables their optimal utilization for all basic operations of the algorithm. Special attention has been paid to evaluating the effectiveness of a relatively general purpose pipelined architecture. The system investigated has a kind of flexible pipeline architecture in which the order of the elementary processing units may be altered under software control. It is described in Section V.D.
A. Special Purpose Architectures
Implementation in special dedicated hardware generally yields the highest performance increase for a system. On the other hand, hardwired logic severely limits the possibilities of adaptation to changing needs. The following sections present three different approaches to the design of voxel-based hardware. The systems have been chosen because they represent three different "philosophies": one is designed for visualizing sampled data, the second is targeted primarily at solid modeling (CSG) applications, while the third attempts to offer a solution when sampled and synthesized data have to be combined.

1. The GODPA System
Under several different names, Goldwasser and his coworkers have described an architecture for the real-time display of gray-value voxel models, specifically for medical applications. The system is alternately called GODPA, or generalized object display processor architecture (Goldwasser, 1984), and the voxel processor architecture (Goldwasser and Reynolds, 1987). A version known as VPPW, or the voxel processor physician's workstation (Goldwasser et al., 1988), is commercially available. Although these systems differ in minor aspects, their underlying architectures are the same.

The basic assumption of the designers was that, in order to achieve a significant speed-up of a voxel-based display algorithm, some form of parallelism has to be employed. This is achieved by partitioning object space (the voxel data set) and assigning each subspace to a separate processor to generate an image for its part of the voxel model. These images are then merged into the final image of the entire voxel model. The advantage of this approach is that the projection of N³ voxels results in an image of O(N²) pixels, which means a reduction in the amount of data of O(N). This data reduction results in fewer conflicts when accessing shared memory than a method based on the partitioning of image space would entail.

For a description of the GODPA system we refer to Fig. 37.

FIGURE 37. (a) Object space partitioning; (b) GODPA architecture.

A cubic voxel model of size 256³ can be partitioned into 4³ = 64 subcubes of size 64³. For each of these subcubes a so-called mini-image is generated by the subcube's own processing element (PE). The resulting 64 mini-images are (eight-way) merged into 8 intermediate images. These are in turn merged by the intermediate processors (IPs) to yield the final output image. The output image can then be processed further, e.g., used as input for the image space gradient shading method (Gordon and Reynolds, 1985).

The display algorithm that has been implemented on this architecture is the recursive back-to-front forward-mapping algorithm. The PEs calculate
both the object space coordinates (voxel addresses) and the image space coordinates (pixel addresses) by using look-up tables and simple arithmetic (addition only) and shift instructions.

The performance of systems based on the GODPA architecture is as follows. In Goldwasser (1984) an estimated image generation time for a 64-PE system is given as 1/30 of a second for a 256³ voxel model. In Goldwasser and Reynolds (1987) performance figures are given for a hardware implementation of a single-PE prototype system: a depth-shaded image can be generated from a 64³ voxel model at a rate of 16 frames per second. Finally, the single-PE VPPW system is able to produce a surface-shaded image from a 256³ voxel model in 0.1 to 10 seconds (1 second typically), depending on the number of object voxels in the volume data set. The system offers some facilities for interactive selection of volume data. Subvolumes may be selected for display by specifying the minimum and maximum voxel coordinates. Depending on the version of the system, a single or several (one per subcube) slicing planes may be positioned to cut away parts of the voxel model.

2. The PARCUM System

The central feature of Jackèl's (1985; 1988) "processor architecture based on a cubic memory," or PARCUM, is the cubic organization of its large voxel memory. This memory can hold up to either 512³ binary voxels or 256³ gray-value voxels. Voxels are grouped in sets of 64 (4³) so-called macro volume elements (MVEs); see Fig. 38. The memory is divided into 64 modules in such a way that the individual voxels in an MVE each lie in a different module. This allows the system to read or write 64 voxels from or to memory in parallel.
FIGURE 38. (a) Memory organization; (b) PARCUM architecture.
The PARCUM system provides hardware support for a ray-casting display method. An address processor incrementally generates the addresses of all MVEs that are pierced by the rays, while a selector unit determines which individual voxels within each MVE are visible. The transformed (image space) coordinates of the voxels are derived from their object space coordinates by an address converter. As in the GODPA system, the resulting image is basically a distance map that is subsequently shaded in a postprocessing step. In addition to these units, the system contains an object generator. This unit is capable of generating voxel models of geometric primitives in the cubic memory. The volume primitives can be combined with the current memory contents by Boolean set operations; in this way complex objects can be constructed in a CSG manner. Performance figures for the PARCUM system are not available, since the system is still under development.
3. The CUBE System

Similar to the previously discussed architecture, Kaufman's CUBE system (Kaufman, 1986; 1988a; 1988b; Kaufman and Bakalash, 1988a; 1988b) is centered around a cubic voxel memory (CFB, or cubic frame buffer). Unlike the PARCUM system, however, the CFB is organized in a skewed manner. The memory cube of size N³ is subdivided into N modules of size N² each. A voxel with coordinates (x, y, z) is stored in module number k at address (i, j), where
k = (x + y + z) mod N    (27)

and

i = x,   j = y    (28)
This organization guarantees that each of the N voxels on a beam, i.e., a line segment along one of the main axes, lies in a different module and may thus be accessed in parallel. Figure 39, adapted from Kaufman (1986), illustrates this memory organization for a small (3³) CFB.

The second novel feature of the CUBE system is the so-called voxel multiple write bus (VMWB). The VMWB forms part of the 3D viewing processor (called VP3) that produces 2D projections along the main (±x, ±y, ±z) axes. After a voxel beam has been read out of the CFB and placed on the VMWB, the surface voxel that is visible from the given viewing direction is determined in O(log N) time (instead of O(N) in, e.g., the PARCUM system). The VMWB has been designed to handle clipping planes perpendicular to the viewing direction (hither-yon clipping) and transparency of voxels. Because the viewing processor is capable of producing orthographic projections in only one direction, an additional 3D frame buffer processor (FBP3) has been added to geometrically transform the data within the CFB. For example, to produce a parallel projection from an arbitrary direction, FBP3 first performs the rotation of the voxel data within the CFB, and VP3 then generates the (orthographic) projection. Because one of the design goals of the CUBE system was to support the combination of both sampled and synthesized data, there is an additional processor, the 3D geometry processor (GP3), for the generation of voxel representations of geometric objects in the CFB. The overall architecture of the CUBE system is shown in Fig. 40.
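A small sketch of the skewed addressing of eqs. (27) and (28) in C, showing that the N voxels of a beam along the x axis indeed land in N different modules; the CFB size is an illustrative choice:

#include <stdio.h>

#define N 8   /* CFB of size N x N x N (small for illustration) */

/* Skewed CFB mapping of eqs. (27)-(28): module k, in-module address (i, j). */
static void cfb_address(int x, int y, int z, int *k, int *i, int *j)
{
    *k = (x + y + z) % N;
    *i = x;
    *j = y;
}

int main(void)
{
    /* A beam along the x axis at y = 2, z = 5: one voxel per module. */
    for (int x = 0; x < N; x++) {
        int k, i, j;
        cfb_address(x, 2, 5, &k, &i, &j);
        printf("voxel (%d,2,5) -> module %d, address (%d,%d)\n", x, k, i, j);
    }
    return 0;
}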
FIGURE 39. (a) CFB organization, numbers along the axes denote module numbers; (b) contents of the modules.
FIGURE 40. The overall architecture of the CUBE system.
Typical performance figures for the CUBE system, assuming a data set of 256³ 8-bit voxels, are a rotation about a main axis in 45 msec, while an orthographic projection takes 15 msec. This means that an object may be rotated about one axis in almost real time (60 msec per image, or about 16 images per sec). A rotation about an arbitrary axis is computed as three subsequent rotations about the main axes and takes 135 msec. Rotating an object about an arbitrary axis is therefore possible at a rate of about 6 images per sec (150 msec per image). A prototype of the CUBE system with a 16³ CFB has been realized in hardware. The quoted performance figures were computed from both the measurements obtained from the prototype and software simulations of a full-sized CUBE system. A VLSI prototype with a 64³ CFB is currently under construction.
B. Transputer-Based Parallel Hardware

A very brief description of some transputer-related terminology will be given here as an introduction to the systems that are discussed in the following sections. For a more extensive description of transputers and their relationship to other parallel computers, see Hockney and Jesshope (1988).

Transputers were designed as processor building blocks for parallel computers. A T800 transputer consists of a central processor, a floating point unit, four bidirectional communication links, some fast local memory, an interface to external memory and process management hardware, all on a single chip. The CPU, FPU and links (all four of them) may function concurrently. The transputer efficiently executes programs written in the Occam programming language (May, 1987). In fact, Occam may be seen both as a
high-level language, capable of expressing the sequential and concurrent execution of algorithms, and as the transputer's assembly language.

Building a processing network from transputers means that the concurrency aspects of the program should be analyzed. A network topology can then be determined that best matches the structure of the program. The trade-offs between the static and dynamic distribution of data across the network, as well as the allocation of processes to processor nodes, should also be considered. Finally, the overall speed-up of the program, i.e., the ratio S_N = T_1/T_N between the program's execution times on a single processor and on N processors, or alternately the efficiency of the parallel implementation, E_N = T_1/(N × T_N), should be weighed against the number of processors N in the system (which determines the cost). Since these issues are closely interrelated, the choice of a transputer configuration for a particular application is not easy. In the next sections three different system architectures are described, all based on transputers. Each of them illustrates a different approach to the problems involved in using transputers for the interactive display of voxel models.

1. The Multitransputer Display Engine

At the ETH Zurich a display system has been designed that consists of a network of transputers and some additional dedicated hardware (Hiltebrand, 1988). The system architecture is based on the GODPA (see the previous section) but uses transputers instead of special purpose processors. A prototype version of the system has been built with eight transputers functioning as PEs and one as IP. The problem with this system turned out to be that the transfer of intermediate images from the PEs to the IP over the transputer hardware links was too slow (although no exact performance figures are given). Note that these links are serial communication lines, whereas communication between processors in the GODPA system occurs over parallel busses. The proposed solution was to add a special 8-bit-wide bus to the system for the transfer of image data between processing nodes. Across this bus, data can be transported at 10 Mbytes/s (versus 10 Mbits/s for the transputer links). Interfacing each transputer to the bus requires extra hardware; synchronization of bus data transfers is done via the transputer links. Although no (estimated) performance figures are given for the enhanced system, display times apparently are significantly longer than 1 sec (for a 128³ voxel model). In order to reduce display times to less than 1 sec, the designers suggest additional hardware to implement the inner loop of the (slice-by-slice BTF) display algorithm.
2. The Transputer-Based Voxel Processor

The FEL/TNO Voxel Processor system is described by Huiskamp et al. (1990). It is on the one hand similar to the previously described system, because it also employs the slice-by-slice BTF display algorithm. It differs on the other hand because it is based on transputer hardware only. The transputers in this system are connected in a doubly linked chain topology, as shown in Fig. 41. A voxel data set is subdivided into a number of subcubes, each of which is assigned to a processing node. Each node in the chain has two main tasks: first, to perform a geometric transformation on its local data (subcube); and second, to merge the resulting subimage with that of its immediate neighbor in the chain. The merging sequence is determined by the depth priority order of the transformed subcubes. In addition to these main tasks, a third task is to receive and distribute control information. The realized system consists of 16 transputers that handle the transformation and merging processes (PEs), and 1 transputer that acts as a controller (CTRL). On this system a depth-shaded image can be generated from a 256² × 32 voxel model in 1 sec (a surprisingly fast figure when compared to the hardware-assisted transputer system of the previous section).

3. The MEIKO Computing Surface
Herberts (1989) has reported his findings about the implementation of a volume visualization algorithm on a MEIKO M40 Computing Surface system. This machine was designed to overcome the problem that a transputer can be connected to only four other transputers. Instead of being hardwired in a fixed topology, the system's 72 processor nodes are connected to a link switch. This link switch can be reconfigured under software control, enabling the user to define the network topology. The approach chosen here differs from the previous two in that the display algorithm is based on volume ray tracing. It is a parallel version, with some minor alterations, of the algorithm described by Levoy (1988).
FIGURE 41. The transputer-based voxel processor.
The network topology that was devised is depicted in Fig. 42. It is a hybrid topology, consisting of a ring (double-headed arrows) and a number of chains (downward arrows). As indicated by the arrowheads, communication on the ring is bidirectional, while along the chains it occurs one way only. The driver node handles the communication with the host system and controls the ray-casting processes by initiating new rays when render nodes become idle. The render nodes perform the actual calculation of pixel colors, determined by the rays that are cast through the volume data. Calculated pixel values are sent to the grafix node that controls the frame buffer.

The parallelization method that was used is based on image space partitioning: work (calculations) for the render nodes is handed out by the driver node on a scan-line basis. When a render node has computed all pixel values for a scan line, it sends them to the grafix node. It is then ready to receive a new scan line from the driver node. Object space partitioning is not used: all render nodes have a copy of the complete volume data set.

Measurements of the efficiency E_N of the system were carried out for configurations of up to 22 transputers, showing that an efficiency of 95% was attainable even for the largest configurations. This means that communication overhead within the network is low. However, other experiments brought to light that, when the number of chains in the network is increased, total image generation time does not decrease as expected. The explanation for this phenomenon is that the
FIGURE 42. Transputer configuration of the MEIKO system.
render nodes calculate pixel values faster than the grafix node can handle them, thus creating a bottleneck in the system. Typical image generation rates, for the 22-transputer system and voxel models of approximately 2M to 3M voxels, are six images per second using surface shading and one image per second when transparency is taken into account.
C . Image-Processing Hurdware The computations involved in many image-processing operations, notably point operations and m x n convolutions, can be calculated in parallel for each pixel. Various hardware architectures have been proposed (and built) that consist of many processing elements functioning in parallel. Each processing element then operates on a separate part of the image, sometimes even down to the single pixel level. A substantial speed-up of local image operations may be achieved in a simpler way by pipeline architecture. Here the separate steps of the local image operation are executed in an assembly-line fashion by a single processor, consisting of various different functional units. The sequence in which pixels are passed through the pipeline in this case can be provided by the video scanout hardware. To obtain real-time performance for point operations, all pixels should be processed within one video cycle; i.e., 1/25th of a second. The time an operation needs to pass through the pipeline should be less than the horizontal retrace time of the video cycle (about 10 11s). This prevents resulting pixel values from being written on the next scan line. Performing an m x n convolution within one video cycle, would require a pipeline of sufficient length to process the largest possible convolution kernel in one “pass.” Such a pipeline should consist of mn multipliers, mn - 1 adders and a divider. Apart from the fact that the convolution kernel can in principle be as large as the entire image, this means that for the more frequently performed 3 x 3 and 5 x 5 convolutions, only a small number of processing elements in the pipeline is effectively used. To overcome these problems, the operation can be spread out among several video cycles, allowing the use of an even shorter pipeline. The functional pipeline of the processor in image processing systems often consists of one arithmetic unit that is able to perform multiplications/ divisions and additions, one logical unit for bitwise logical operations between pixels’ and a table look-up processor. Thus, an m x n convolution requires mn video cycles for the multiplications/additions and 1 additional cycle for the final division.
¹ Sometimes the arithmetic and logical units are combined into a single one.
Spreading out a complex operation over several video cycles requires the storage of intermediate results. To avoid truncation errors at every step, a frame store for intermediate results with twice the number of bit planes of an input image is needed.

1. Hardware Description

The image processing hardware available for this project consists of two printed circuit boards that plug into an IBM PC/AT backplane. Figure 43 gives an overall view of the system, showing the interconnections between the two cards, the PC/AT host and the outside world. Each of the frame store memories is addressable through the PC/AT bus. In addition to this, image data may be exchanged between the two subsystems by means of an external video bus. Control of the two units is effected through a number of hardware registers that are mapped in the PC/AT host's memory space. The following sections describe the two subsystems in more detail. The first board is a Data Translation DT2851 High Resolution Frame Grabber (Data Translation, 1986a). Figure 44 is a block diagram showing the most relevant functional units. The frame grabber is capable of digitizing analog video signals into 8-bit samples in real time. There are eight 256 x 8-bit RAM input look-up tables for input data conversion. The digitized images may be stored in one of two 512 x 512 x 8-bit frame-store memory buffers. The two frame buffers are mapped into the IBM PC/AT memory space and can be accessed using normal memory instructions. Write protection of image data may be enabled for several combinations of bit planes. A feedback loop from the output of the frame buffers to the input look-up
FIGURE 43. Overall view of the system: the frame grabber/buffer (two 512 x 512 8-bit frame stores) and the frame processor (one 512 x 512 16-bit frame store), connected through their video ports.
FIGURE 44. Block diagram of frame buffer-grabber.
tables allows image data to be passed through any of these tables and rewritten in either frame buffer. This permits several arithmetic or logical functions to be performed on images. Eight 256 x 24-bit RAM output RGB color look-up tables are available for pseudo-color or gray-level conversion of image pixels. There are also two external video I/O ports. These ports can be used to transfer data in and out of the frame buffers at video speeds to other image processing devices such as the frame processor that will be described next. The second board is a Data Translation DT2858 Auxiliary Frame Processor (Data Translation, 1986b). See Fig. 45 for a block diagram of this device. The frame processor is connected to the frame grabber through the external video I/O ports. The six 8-bit to 16-bit look-up conversion tables are used to perform operations such as pixel offsetting, multiplication or thresholding. For instance, when calculating a convolution, the tables are used to multiply or divide data by a constant. A histogram generator is capable of calculating a 256-bin x 32-bit histogram of the pixel values in the frame buffer. The resulting histogram is stored in (part of) the conversion table RAM. The division logic employs a successive approximation method to divide 16-bit numbers from the frame buffer by an 8-bit divisor, giving 8-bit results. A division is performed by the ALU, along with the successive approximation register and the conversion table. The 16-bit ALU can perform arithmetic and logic operations on both incoming data and data in the frame buffer.
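As an illustration of how the conversion tables support convolution, the following sketch fills a 256-entry table that multiplies every 8-bit gray value by a constant kernel coefficient. The function is a hypothetical host-side helper, not part of the Data Translation library; loading the table into the conversion RAM is hardware-specific and omitted.

#include <stdint.h>

/* Hypothetical host-side helper: fill an 8-bit-in, 16-bit-out conversion
 * table so that passing pixels through it multiplies them by a constant
 * kernel coefficient. Loading the table into the frame processor's
 * conversion RAM is hardware-specific and omitted here. */
void build_multiply_table(int16_t table[256], int coefficient)
{
    for (int v = 0; v < 256; v++)
        table[v] = (int16_t)(v * coefficient);  /* one product per gray value */
}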
FIGURE 45. Block diagram of frame processor.
The incoming data may be added to, subtracted from, or logically ANDed, ORed or XORed with data from the frame buffer. Various combinations of these functions are also possible. The 512 x 512 x 16-bit frame buffer stores video frames sent from the frame grabber/buffer, intermediate computational results or image data that are to be sent back to the frame grabber/buffer. Data may also be looped back into the frame buffer via the ALU. Finally, there are two offset registers controlling the start address for the scan-out sequence of the frame buffer. Programming these registers allows panning and scrolling of images. An important use of these registers is the computation of convolutions of image data with a convolution kernel matrix. In several passes, all pixels of the image are successively multiplied by the convolution matrix coefficients and added to the intermediate result. In each pass, the pixels are offset by an amount that corresponds to the place of the coefficient in the convolution kernel matrix. During scan-out of the frame buffer, pixel values may be replicated in both x and y directions either one, two, four or eight times. When used in conjunction with the offset registers, this feature allows a general zoom operation on stored images.
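Putting the offset registers and conversion tables together, the multi-pass convolution can be summarized as follows. All four hardware hooks are hypothetical stand-ins for the register-level control steps described above, not actual Data Translation library calls.

/* Sketch of an m x n convolution in m*n video cycles plus one division
 * cycle, following the scheme described above. */
extern void set_offset_registers(int dx, int dy);  /* pan the scan-out          */
extern void load_multiply_table(int coefficient);  /* fill the conversion RAM   */
extern void pass_and_accumulate(void);             /* one video cycle: mul+add  */
extern void divide_frame_store(int divisor);       /* final cycle: 16-bit -> 8  */

void convolve(const int *kernel, int m, int n, int kernel_sum)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            /* Shift the image so every pixel lines up with the neighbor
             * that this kernel coefficient applies to. */
            set_offset_registers(i - m / 2, j - n / 2);
            load_multiply_table(kernel[j * m + i]);
            pass_and_accumulate();   /* partial sums in the 16-bit store */
        }
    divide_frame_store(kernel_sum);  /* back to 8-bit result pixels */
}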
D. TAAC-1: A Flexible Pipeline Architecture

An entirely different approach to hardware support is described by England (1986). In this article, several different application areas are reviewed that have one thing in common: a need for hardware support for computation-intensive tasks such as graphics display functions, image processing
operations and heavy use of (vector) arithmetic. The point is argued that, while each of the mentioned applications would be speeded up most by special dedicated hardware, such systems provide a solution that is useful only in that specific application. Moreover, such a solution is too inflexible when the applications have to be adapted to changing user needs. The architecture of the system that was eventually developed, the TAAC-1, is briefly described in England (1988) and more extensively in Sun Microsystems (1988). Like a pipelined architecture, the TAAC-1 was designed to efficiently support the fine-grain, i.e., at the instruction level, parallelism of many algorithms. To distinguish it from pipelined architectures, the TAAC-1 is called a low-latency design. This means that, while several instructions can still be executed in parallel, the TAAC-1 does not suffer from the delays that occur when a pipeline has to be filled or flushed. The most important subsystems of the TAAC-1 are the processor and the various memory modules (see Fig. 46). The processor consists of six processing units that are all capable of operating concurrently. There are two integer ALUs, a floating point unit, an integer multiplier, a barrel shifter and a look-up table processor (see Fig. 47). These units communicate with each other and the peripheral units, such as I/O registers, by means of six separate internal busses. This layout of processors and busses explains why the TAAC-1 could also be called a flexible pipeline design: the flow of data between processors across the busses can be altered on a per instruction basis and is not fixed as in a "standard" pipeline design. Instructions for the TAAC-1 are 200 bits wide. An instruction specifies the actions of each of the six processors, the bus(ses) from which their respective input data are to be read and the bus(ses) to which their output data have to be written.
FIGURE 46. TAAC-1 subsystems: processor, host interface, video output, and the DRAM, SRAM and PRAM memory modules.
FIGURE 47. Main functional units of the TAAC-1 processor.
On the TAAC-1, program instructions are stored separately from program data in the module PRAM (16 K 200-bit words). Program data, such as global variables and the run-time stack, reside in a special, very fast, static RAM module (SRAM, 64 Kbyte). Such an organization is known as a Harvard architecture. The bulk of the data is stored in the data-image memory (DRAM), organized as 2M 32-bit words (8 Mbyte). As the name indicates, this memory can be used to hold both images and any other kind of data; no real distinction is made between them. Any part of data-image memory may be selected for display. The data-image memory is accessible by normal read and write operations. In addition to this, special conditional write hardware allows the whole memory to be used as a Z buffer. Address decoding hardware makes it possible to efficiently specify linear (1D) addresses, pixel (2D) addresses and voxel (3D) addresses when accessing data-image memory. The video output unit is programmable to generate various video signal formats and consists, among other things, of four 8 x 24-bit color look-up tables. Finally, the interface unit controls the communication with the host computer. Software development for the TAAC-1 is aided by the availability of several tools. First, the system comes with a C compiler that allows existing software to be rapidly ported to the TAAC-1. A library of readily usable functions for graphics, image processing, volume visualization and various control purposes is included. For the development of more efficient software, which fully utilizes the system's capabilities, an assembler is available. Furthermore, there is a debugging tool and an execution profiler for the analysis of running programs.
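The three addressing modes of the data-image memory, mentioned above, can be pictured with a small sketch. The row-major strides used below are an assumption chosen for illustration, not the documented TAAC-1 decoding scheme.

#include <stdint.h>

/* Data-image memory viewed as 2M 32-bit words. The strides (row-major,
 * 512-word rows, 512 x 512 slices) are illustrative assumptions, not the
 * actual TAAC-1 address decoding. */
static uint32_t dram[2u * 1024 * 1024];

uint32_t read_linear(uint32_t addr)         { return dram[addr]; }          /* 1D */
uint32_t read_pixel(uint32_t x, uint32_t y) { return dram[y * 512 + x]; }   /* 2D */
uint32_t read_voxel(uint32_t x, uint32_t y, uint32_t z)                     /* 3D */
{
    return dram[(z * 512 + y) * 512 + x];   /* 8 slices of 512 x 512 fill 2M */
}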
Programs that run on a host/TAAC-1 system are divided into two parts. The part that runs on the host takes care of disk I/O, handles the user interface, etc. The computation-intensive tasks run on the TAAC-1 accelerator. Because the two parts of the program run asynchronously, some sort of protocol should handle the intertask synchronization. For instance, the TAAC-1 program should halt when the host has new data available that are to be downloaded into the TAAC-1 data-image memory. A simple handshaking protocol, implemented by means of two mutually readable flags, is usually sufficient to handle this (a sketch is given below). A number of performance figures are mentioned in England (1988). A speed-up of a factor of 60 with respect to an unaccelerated workstation (Sun 3/160) is given for different tasks such as a ray-tracing program, a 2D FFT function and an adaptive histogram equalization routine. The library graphics function for rendering Gouraud-shaded and Z-buffered 3D polygons is capable of generating more than 18,000 polygons per sec. These figures indicate that, while the TAAC-1 does not perform as well in each of these individual cases as dedicated hardware systems, it offers a substantial speed-up in all of them. The general programmability of the system is responsible for this "overall performance."
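Such a two-flag handshake might look as follows. The flag names and the busy-wait style are illustrative assumptions, not the protocol actually used.

#include <stdbool.h>

/* Two mutually readable flags, e.g., at fixed locations in data-image
 * memory visible to both programs. Names and busy-waiting are
 * illustrative assumptions. */
volatile bool host_data_ready;   /* set by the host, cleared by the TAAC-1 */
volatile bool taac_idle;         /* set by the TAAC-1, cleared again below */

void host_download(void)         /* host side */
{
    while (host_data_ready || !taac_idle)
        ;                        /* previous batch consumed, TAAC-1 halted */
    /* ... write the new data into data-image memory ... */
    host_data_ready = true;
}

void taac_main_loop(void)        /* TAAC-1 side */
{
    for (;;) {
        taac_idle = true;        /* signal: safe for the host to download */
        while (!host_data_ready)
            ;                    /* halt until fresh data are available */
        taac_idle = false;
        /* ... process the downloaded data ... */
        host_data_ready = false;
    }
}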
E. Comparison

The GODPA system is targeted specifically at medical applications. It provides several capabilities for the visualization of voxel models that are obtained by x-ray tomography. Depending on the version of the system, the size of the voxels is limited to either 4 (VPP) or 8/16 (VPPW) bits. The display algorithm is restricted to image space gradient shading. Gray-level gradient shading and transparency are not supported, which we think is a serious drawback for a volume visualization system. The spatial selection facilities are limited to slicing along a single plane. Since it is more or less a stand-alone system, loosely coupled to a host computer, the development of software tools for experimenting with alternative visualization methods will not be easy. Advantages of the GODPA system are that it is "scalable," which is confirmed by the quoted performance figures, and that it has been fully developed, with a commercial version available. Several of the previously mentioned shortcomings also apply to the PARCUM system: voxels are limited to 1 or 8 bits in size and only image space gradient shading is available for surface rendering. Moreover, the display algorithm performs parallel projection, which is a little strange since the system is aimed specifically at solid modeling applications, where perspective projections would be very useful (see Section III.B). For our
requirements (display of irregular objects) the lack of perspective projection is of less importance. However, should the performance of the system be sufficient to allow real-time rotation, it would be a serious omission. The PARCUM system does not appear to be easily extendible, limiting the performance to what is offered by the basic system. An advantage of the system is that it offers CSG modeling operations. These could be applied for spatial-editing purposes, for instance to specify complex regions that are to be removed from a sampled volume data set. However, the modeling operations are applied directly to the cubic memory, physically erasing data that is marked for hiding. Using these modeling operations for visualization purposes would therefore require extensive modifications to the system. Currently, the system is still very much in the prototype stage of development. The preceding also holds for the CUBE system, although it has an important advantage over the other systems: it offers the possibility to render voxel models with transparency and has segmentation facilities built into the display hardware. Similar to the previously mentioned systems, CUBE is loosely coupled to a host computer that handles the interaction with the user and data I/O. How this affects the development of software remains unclear. Like the PARCUM system, CUBE is still under development. The big advantage of transputer-based systems is that they can be scaled to meet given performance requirements. Another advantage is that they can be programmed in a high-level language and that various tools exist for software development. On the negative side, the inherently difficult tasks of load balancing and data distribution should be mentioned. Due to the distribution of data, spatial selection and editing techniques will be difficult to implement on such systems. Also, a bottleneck between the computing nodes and the frame buffer is hard to avoid, unless some sort of distributed frame buffer is used. In Sandler, Hayat and King (1990) the performance of transputers is compared to a number of digital signal processors and a conventional processor. The transputer architecture turns out to be the slowest for typical image-processing tasks. A transputer network will therefore present an unnecessarily costly solution. The PC-based image processing hardware can be used for interactive visualization of voxel models, as will be shown in Section VI.A. Due to the relatively small amount of memory, it is limited to small binary voxel models. Images can be rendered at reasonable speed, but viewed only along the principal axes. Several other hardware features can be exploited to enhance the surface rendering of the volume data. Spatial selection capabilities are in practice limited to the selection of block-shaped regions and the positioning of a slicing plane. The system is inexpensive, though, and available "off the shelf."
The TAAC-1 accelerator is generally applicable to tasks involving heavy numeric computations, image processing and graphic display functions. Because of the tight coupling with the host computer (easy access to data-image memory) and the well-integrated software development environment, it offers a suitable platform for experimenting with interactive visualization of voxel models. The raw processing speed appears to be adequate for these purposes. The disadvantage of the system is that it is not extendible and that the development of assembly language programs is not very well supported. This last issue is important, since neither the accompanying software library nor the C compiler seems to fully exploit the available processing power, as witnessed by the performance figures given in Section III.D. The hardware-supported 2D- and 3D-addressing scheme only uses 32-bit words, whereas 8-bit words (bytes) prevail among 3D data sets. As to the portability of software, this is not as straightforward as the TAAC-1 documentation suggests. The C code for the basic display algorithms can indeed be recompiled with the TAAC C compiler, but in order to achieve a speed-up of an order of magnitude the special (nonportable) features of the TAAC C compiler must be used. Furthermore, the subdivision of existing code into host-resident and TAAC-resident parts, along with the required communication between them, often requires an extensive reorganization and partial rewriting of existing C code.

VI. IMPLEMENTATIONS

A. 3D Image Processing and Image Analysis
In Section I it was argued that the voxel model or 3D scalar field may be considered as a three-dimensional extension of a two-dimensional digital image model when the scalar value is visualized as an intensity value. It is therefore natural to investigate whether and how the theory and applications of traditional two-dimensional image processing may be extended to a three-dimensional image processing concept. Note, however, that the term image processing, as used in this context, is strictly speaking not correct. In a voxel model we are dealing with a three-dimensional data set from which, through the use of various display methods, a two-dimensional image must be created before the data can be visualized. However, the analogy with 2D image processing is so strong that we will also call the 3D case image processing. Because an exhaustive survey of 3D image processing and analysis techniques would be far beyond the scope of this section, we have chosen a number of techniques that proved useful for interactive visualization.
1. Point Operations
Several classes of image processing functions exist. A first class is that of point operations, where the value of an output pixel is determined solely by the value of the corresponding pixel in the input image. All point operations can be trivially extended to three dimensions. As an example, an intensity transformation called histogram equalization will be described.

a. Histogram Equalization. The gray-value histogram of an image gives the fraction (or relative frequency) of each possible gray value as it occurs in a specific image. The gray-value histogram of an image can be an important aid in evaluating the quality of the image and for the segmentation of the image into different parts. When the segmentation is done by thresholding, the histogram can provide information about the possible locations of threshold values in the gray scale; e.g., when the gray-value histogram shows marked peaks, the bottoms of the valleys between them are good candidates for the threshold values (Rosenfeld and Kak, 1982). In the context of volume rendering, the voxel classification problem, mentioned in Section III.C, can be seen as a segmentation of the voxel model into differently labeled parts, based on the distribution of gray values. A second issue related to the distribution of gray values in the voxel model is that of edge or surface detection. In Section III.C a method has been described for rendering surfaces from volume data, based on estimating local surface normal vectors from gray-value gradients. When the voxel gray values are redistributed, errors are introduced in the estimation of the normal vectors. Altering the directions of the surface normal vectors may lead to artifacts in the resulting image. Histogram equalization is an image processing technique for global contrast enhancement by transforming pixel gray values. In the continuous case, where the gray levels are not quantized, it can be shown that the resulting gray-value distribution function p_s(s) = 1; i.e., it is uniform across the gray-value range. In the discrete case, which is considered here, the gray-value distribution is only approximately uniform (Gonzalez and Wintz, 1987). Usually, histogram equalization is applied to the entries of an output look-up table in order to redistribute the displayed colors or gray values more evenly and thus enhance visual contrast. When the redistribution of the gray values is applied directly to the voxels, and the resulting voxel model is rendered using gray-value gradient shading, some of the surface irregularities are removed. This can be seen in Fig. 48, where a number of dents in the forehead and side of the skull have disappeared, due to the "stretching" of the local gray-value gradient, while regular bone surfaces have not been affected.
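In the discrete case the equalizing transform is simply the scaled cumulative histogram. A minimal sketch for an 8-bit voxel model, applying the redistribution directly to the voxels as described above:

#include <stddef.h>
#include <stdint.h>

/* Discrete histogram equalization of an 8-bit voxel model (n > 0): the
 * new gray value is the scaled cumulative relative frequency of the old. */
void equalize(uint8_t *voxels, size_t n)
{
    size_t hist[256] = {0};
    uint8_t map[256];
    size_t cum = 0;

    for (size_t i = 0; i < n; i++)
        hist[voxels[i]]++;                     /* gray-value histogram */
    for (int g = 0; g < 256; g++) {
        cum += hist[g];
        map[g] = (uint8_t)((255 * cum) / n);   /* scaled cumulative distribution */
    }
    for (size_t i = 0; i < n; i++)
        voxels[i] = map[voxels[i]];
}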
FIGURE 48. (a) Gray-value gradient shading; (b) histogram-equalized gray-value gradient shading.
2. Local Neighborhood Operations

A second class is formed by the so-called local filter operations, in which a, usually small, number of neighboring pixels determines the value of the output pixel. This class may be further subdivided into two subclasses, depending on whether just the values of the neighboring pixels play a role, or both their locations and values. Rank filters belong to the first subclass, while the second one is formed by the weight filters and geometric transformations. Extending local neighborhood operations to 3D is essentially as easy as extending point operations, as will be shown by two examples: an implementation of a 3D rank filter, and the implementation and analysis of a previously published algorithm for the rotation of voxel models.

a. 3D Rank Filtering. The display of surfaces from voxel models, based on local gray-value gradients, is a form of 3D local edge detection. This is also reflected in our description of the gradient-based shading methods in Section III as local edge operators. Because edges correspond to sharp changes in gray values, they are features with high spatial frequency and are thus sensitive to high-frequency noise (Ballard and Brown, 1982). This becomes clear when looking at Fig. 49, where many artifacts of the surface shading can be observed that are due to the noisy character of most volume data sets, especially when the input comes from confocal scanning laser microscopes or MR scanners. In order to make the gradient-based detectors less susceptible to noise, image smoothing should be applied to the voxel model prior to display. Because the edges (surfaces) that are present in the volume data should be preserved, filtering methods based on neighborhood averaging should be avoided, as they tend to blur edges. Instead, median filtering seems the appropriate way to decrease the noise content of a voxel model. Median filtering is a form of rank filtering. In a rank filter, operating in a small local neighborhood of each pixel, the value of the output pixel is determined solely by the gray values of the pixels in the neighborhood. Contrary to weight filters, the locations of the surrounding pixels do not play a role. A rank filter operates as follows: to each output voxel a local 3D neighborhood of voxels surrounding the corresponding input voxel is assigned. All gray values within the neighborhood are sorted in ascending order. One of the values from this sequence is assigned to the output pixel. When the smallest (largest) value is used, this is called minimum (maximum) filtering, but any value from the sorted sequence can be assigned. In practice, the median value is most often used because it effectively removes noise spikes, but leaves edges intact (Rosenfeld and Kak, 1982). In the implementation of the 3D rank filter, special care was taken to minimize the number of times voxels are read from memory, as well as the
amount of time spent in sorting the voxel values in the local neighborhood. The algorithm consists of three nested loops over the z coordinates (slices), y coordinates (rows) and x coordinates (columns). When, for instance, a 3 x 3 x 3 neighborhood is used, the algorithm does not read all 3³ voxels when it advances to the next column. Instead, this is done only once on each row. After that, a slice of 3² voxels is dropped and a new slice of 3² voxels is read each time the x coordinate is incremented. Minimizing the sorting was done by using an adapted version of Quicksort that returns the kth order statistic (such as the minimum, maximum or median value) from a series of N elements, with 1 ≤ k ≤ N. The adaptation of Quicksort is based on the observation that Quicksort partitions an array a[1..N] into a subarray a[1..i-1] with all elements less than or equal to a[i], and a subarray a[i+1..N] with all elements greater than or equal to a[i]. Thus, when the kth order statistic is sought and i = k, the sort can be terminated. Otherwise, we recurse on the left subarray if k < i, or on the right subarray if k > i. It can be shown that the time complexity of this algorithm is linear on average (Sedgewick, 1988). In spite of this efficient sorting algorithm and the buffering strategy for reading voxels, our implementation on the Sun/TAAC-1 workstation still takes about 6 minutes to median filter a 256² x 32 voxel model using a 3 x 3 x 3 neighborhood, while the 5 x 5 x 5 median filter takes 20 minutes. A comparison of Fig. 50 with the original image in Fig. 49 shows that many of the noise artifacts are indeed removed from the voxel model.
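The adapted Quicksort amounts to what is nowadays called quickselect. The following sketch follows the idea described above; the partitioning details are our own and not necessarily those of the original implementation.

/* Return the kth order statistic (1 <= k <= n) of a[0..n-1], partially
 * reordering the array -- linear time on average. For the median of a
 * 3 x 3 x 3 neighborhood: select_kth(values, 27, 14). */
static unsigned char select_kth(unsigned char *a, int n, int k)
{
    int lo = 0, hi = n - 1;
    k--;                                    /* 0-based rank */
    while (lo < hi) {
        unsigned char pivot = a[(lo + hi) / 2];
        int i = lo, j = hi;
        while (i <= j) {                    /* partition around the pivot */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                unsigned char t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        if (k <= j)                         /* sought value in the left part  */
            hi = j;
        else if (k >= i)                    /* sought value in the right part */
            lo = i;
        else
            break;                          /* a[k] already equals the pivot */
    }
    return a[k];
}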
FIGURE 49. Surface shaded image of a grain of Scotch pine pollen before filtering.
After applying the 3 x 3 x 3 filter a few noise residues remain, but they disappear when the 5 x 5 x 5 filter is used. Note, however, that the 5 x 5 x 5 filter removes so many voxels that the background shows through several holes in the object.

b. Geometric Transformations. Geometric transformations (translation, scaling, and rotation) occur for instance when the voxel models of multiple objects have to be positioned with respect to each other to be combined or compared. Geometric transformations may be carried out in two ways.
FIGURE 50. Images of the same object as in the previous figure after 3D median filtering: (a) 3 x 3 x 3 neighborhood, (b) 5 x 5 x 5 neighborhood.
When they are performed in the forward direction, the input or source image is scanned sequentially and source pixel coordinates are transformed to their corresponding values in the output or destination image. Because the destination coordinates are not necessarily integer values, holes may occur in the image due to round-off errors; i.e., multiple source pixels may end up at the same destination grid point, while some destination grid points will never be addressed. These holes may be avoided by performing the transformations in the reverse direction: the destination image is scanned and the inverse transformation is applied to the destination pixel coordinates. The corresponding coordinates in the source image again generally do not lie on the discrete grid points, but somewhere in an elementary volume cell with eight input values at its corners. The destination gray value is now calculated by some interpolation scheme, e.g., nearest neighbor, trilinear, etc., from the values of the eight surrounding grid points. Forward transformation from source to destination is sometimes called pixel carryover, and its 3D equivalent can be named voxel carryover. Backward transformation from destination to source is known as pixel filling, with voxel filling as its 3D equivalent. We will now look at an algorithm for the efficient rotation of voxel models, which is the most computation intensive of the geometric transformations. This is an extension of an algorithm for rotating binary 2D images by Hersch (1985), and it has as such been described by Kaufman and Bakalash (1988b). As in the 2D situation, the transformation is performed from destination to source (back rotation). Briefly, the 2D version of the algorithm works as follows. Given the coordinates (x1', y1') of a pixel P1 in the destination image, the coordinates of the corresponding pixel P0 in the source image are

    ( x0 )   ( cos(-α)   -sin(-α) ) ( x1' )
    ( y0 ) = ( sin(-α)    cos(-α) ) ( y1' )

where -α is the angle of rotation from destination to source. The transformation does not have to be applied to all individual pixels of the destination image. Instead, back-rotated coordinates may be computed incrementally from those of neighboring pixels that have already been transformed. Consider two neighbors of P1 in the destination image, P' = (x1' + 1, y1') and P'' = (x1', y1' + 1). Applying the back rotation to these pixels gives the coordinates in the source image:

    ( x0' )   ( x0 )   ( cos(-α) )        ( x0'' )   ( x0 )   ( -sin(-α) )
    ( y0' ) = ( y0 ) + ( sin(-α) ),       ( y0'' ) = ( y0 ) + (  cos(-α) )        (30)
So, cos(-α) and sin(-α) need to be computed only once, and the source image coordinates may be computed from previously calculated coordinates by two addition or subtraction operations per pixel. An entire image is rotated by stepping along scan lines in the output image, one scan line after another. Back-rotated coordinates of the starting and end point of each scan line are calculated using Bresenham's algorithm for scan conversion of straight line segments. Thus, the full inverse transformation need be applied only to the four corner pixels of the destination image. All other coordinates can then be computed incrementally at a much faster rate. The algorithm is easily generalized to 3D. Rotation about a principal axis is done by rotating the entire voxel model slice by slice, with slices chosen to be perpendicular to the rotation axis. A rotation about an arbitrary axis is performed by decomposing it into three separate rotations, each about a principal axis. Note that, unlike translation and scaling, rotation of raster images requires the use of intermediate storage, because no scanning sequence of the destination image can be determined that guarantees that all pixels are read from the source image before they are overwritten by new (destination) values. However, since the algorithm is restricted to rotation about one of the principal axes, source and destination pixels lie in the same plane perpendicular to the axis, and the algorithm can proceed slice by slice; auxiliary memory for only one slice at a time is required instead of an entire 3D buffer. In the preceding description it is assumed that the source and destination images are embedded in an infinite grid (coordinate system). In practical applications the following difficulty arises. As can be seen in Fig. 51, not every pixel in the rotated image lies within the bounds of the original image. When the original image fills the entire image buffer, some back-rotated pixels end up outside the buffer, where no interpolation values are available. This requires a modification to the algorithm in order to calculate the clipped subarea, indicated in Fig. 51, that can be "safely" processed. Fortunately this subarea is convex, and it can be represented simply in terms of two tables storing the starting and end points (minimum and maximum x coordinates of each scan line). Because the rotation of a voxel model can be done slice by slice, the amount of auxiliary storage is the same as for the 2D algorithm. In total, for a voxel model of size N³, the algorithm requires N² + 2N additional storage: N² for the intermediate slice buffer and 2N for the subarea edge tables.
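For one destination scan line, the incremental back rotation of eq. (30) reduces to the following inner loop. Nearest-neighbor sampling is used, and clipping of [x_min, x_max] against the "safe" subarea edge tables is assumed to have been done by the caller.

#include <math.h>

/* One destination scan line of the incremental back rotation, eq. (30):
 * stepping one pixel in destination x adds (cos(-alpha), sin(-alpha)) to
 * the source coordinates. (x0, y0) is the back-rotated starting point. */
void rotate_scanline(const unsigned char *src, unsigned char *dst,
                     int width, int y, int x_min, int x_max,
                     double x0, double y0, double alpha)
{
    double dx = cos(-alpha), dy = sin(-alpha);
    double sx = x0, sy = y0;
    for (int x = x_min; x <= x_max; x++) {
        /* nearest-neighbor interpolation of the source pixel */
        dst[y * width + x] = src[(int)(sy + 0.5) * width + (int)(sx + 0.5)];
        sx += dx;            /* two additions per pixel, as in eq. (30) */
        sy += dy;
    }
}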
FIGURE 51. Source and destination image, with the clipped "safe" subarea indicated.
An implementation of the algorithm was made on the accelerated Sun/TAAC-1 workstation mentioned before in Section V.D. The aim was to compare the performance of a display algorithm based on rotation of the voxel data, followed by a fast orthographic projection along a principal axis, with the display algorithms discussed in Section III, which allow more general viewing directions. Table IX shows both the relative and absolute times spent in various stages of the algorithm for a rotation of 90° about all three principal axes. The first thing that can be learned from the table is that the overhead of the algorithm, lumped together in the routine Setup, takes a negligible amount of time. Most of the set-up time is for filling the edge tables. Reading the voxel values from the source voxel model and writing the destination values are evidently very expensive. These routines were taken from the vendor-supplied TAAC-1 software library. They apparently do not use the hardware facilities for 3D addressing, but instead translate 3D to linear addresses in software via a number of computations and table look-up operations (Sun Microsystems, 1988). When this table is compared with the tables in the previous section, it becomes clear that displaying a voxel model by rotating it in memory and then performing an orthographic projection along a principal axis is not a good alternative to the other display algorithms investigated. Even if the projection would take 0 seconds, display times would be nearly twice as long (80 sec) as with the SBS BTF algorithm (47 sec). The geometric transformations during interactive exploration will therefore be incorporated in the display algorithm. Only when the volumetric data itself is needed in a different orientation or scale will this geometric transformation algorithm be applied.
TABLE IX
PROFILE OF TAAC-1 VOXEL ROTATION ALGORITHM

                Time spent
Routine        %        sec
Setup          --       --
Rotate         17       13.5
ReadVox        31       25.2
WriteVox       52       41.4
Total          100      80.1
3. Mathematical Morphology
In image analysis techniques, the extension from two to three dimensions suffers from problems of a more fundamental nature. For example, due to the greater topological complexity, contour following in 2D is basically simpler than surface tracking in 3D (Artzy et al., 1981). Moreover, when a voxel model has been acquired via the sampling of many intermediate 2D sections, there are two ways in which an image analysis technique may be applied. In the first place, the technique may be applied to the separate sections, after which a processed voxel model is constructed from the processed sections. Another way is to apply a 3D equivalent of the analysis technique directly to the voxel model. The results of these two approaches can be quite different. This will be illustrated by the description and comparison of a number of 3D image thinning methods.

a. 3D Image Thinning. The structural shape of an object can be represented by its skeleton. In 2D image analysis, the skeleton of an area is a 1-pixel-thick stick figure or wireframe representation, while in the 3D case a skeleton can consist of both linear and surface elements. Local symmetry axes and surfaces form the structural elements of a skeleton. A skeleton may be obtained via the medial axis transform (MAT). The following definition of the MAT has been proposed by Blum (Gonzalez and Wintz, 1987):

• Let G be a set of connected points that define an object,
• Let the contour C be the set of points that make up the boundary of G,
• Let x ∈ G and d_x be the minimum distance of x to C,
• Let N_x be the number of points on C at a distance d_x from x.

The medial axis M is then defined as the set of points x in G with N_x > 1 for
the corresponding d_x. The MAT is in fact the collection of symmetry points and their corresponding radii of maximally inscribed circles that touch the boundary in at least two different points. Note that this defines the MAT as a continuous transformation. When only the symmetry points are retained, the structure is called the skeleton. Because the information about the radii of the inscribed circles has been discarded, the original object can no longer be reconstructed from the skeleton. In the case of the MAT the original object is the union of all inscribed circles. Thinning is a method to obtain an approximate skeleton of an object on a discrete grid. Our interest in 3D thinning algorithms lies in the fact that the speed performance of voxel projection display algorithms is to a large extent determined by the number of voxels that are projected and rendered (see Section III.D). This performance is at present not sufficient for display rates of several images per second, so that an object's shape could be deduced from, e.g., a rotating image. If, however, the number of voxels can be significantly reduced, while at the same time enough structural information is retained to provide visual clues of the object's shape, the display time can perhaps be reduced to an acceptable level. Several CAD programs use a wireframe representation during interactive reorientation and the slower shaded views once the correct orientation is reached or when the user waits long enough between successive changes. A thinned voxel model can also be used to provide more rapid visual feedback, e.g., for setting up proper viewing and rendering parameters, while the detailed image, based on the full-volume data set, has to be generated only once the correct orientation and selection is determined. A skeletonized voxel model can be rotated and displayed much faster because it is only a binary voxel model and the number of object voxels is greatly reduced.

b. Fundamentals of 2D Image Thinning. Thinning algorithms operate on binary images. It is therefore presumed that in the binary image an object has already been defined as the result of some segmentation algorithm. Thinning an object is done by repeatedly removing pixels from its boundary. The conditions under which a pixel may be peeled off are subject to certain constraints, determined by the requirement that the skeleton of the object should be approximated in order to reflect the object's shape as closely as possible. The final goal is to obtain the skeleton after a finite number of thinning operations; that is, until the skeleton has become stable. This leads to the following requirements for thinning algorithms:

• The algorithm should not disrupt the connectedness of the thinned object,
• End points of linear structures and boundary contours should be preserved.
A remark has to be made here about the choice of connectivity of the object and the background. The notion of adjacency for the 3D (voxel) case was introduced in Section III. In 2D images pixels can be either 4-adjacent (across an edge) or 8-adjacent (across an edge or vertex). Two pixels are called 4-connected when a path of 4-neighbors, i.e., pixels that are adjacent across an edge, can be found from one to the other. Similarly, 8-neighbors are those pixels that are adjacent across an edge or a vertex (node), and two pixels are 8-connected when a path of 8-neighbors exists between them. A paradox arises when object and background have the same connectivity. In the case of 8-connectivity, a connected object could completely surround a part of the background, while at the same time the surrounded part would remain connected to the rest of the background. When both are 4-connected, the piece of the background inside the object would not be connected to the outer part, even though the object is not fully connected around it. To avoid this difficulty, the background should be 8-connected when the object is 4-connected and vice versa. Even better would be a 6-connected topology for both, but this optimal topology is not in widespread use in the image analysis community. In 2D image processing, several thinning algorithms have been described. The algorithm by Zhang and Suen (1984) is a basic reference. We will briefly describe this algorithm here. The ZS algorithm is a so-called parallel thinning algorithm. This means that the value of a pixel in the new image depends only on its old value and those of its neighbors in a (3 x 3) window in the old image. The consequence of this is that all pixels can be processed simultaneously. It will be assumed that the pixels in the 3 x 3 neighborhood of the central pixel are numbered as in Fig. 52. The ZS algorithm achieves the preservation of connectivity by dividing the thinning operation into two steps. In the first step, a pixel P1 is removed from the image only if it satisfies all of the following conditions:

1. 2 ≤ B(P1) ≤ 6
2. A(P1) = 1
3. P2 · P4 · P6 = 0
4. P4 · P6 · P8 = 0

Here, A(P1) denotes the number of 01 patterns in the ordered sequence P2, P3, ..., P9; i.e., the number of times a transition from value 0 to value 1 is encountered in a "walk" around the central pixel. B(P1) is the number of neighbors of P1 that have value 1. The first two conditions select pixels that lie on the boundary of the region. The last two conditions are satisfied by those pixels for which P4 = 0 ∨ P6 = 0 ∨ (P2 = 0 ∧ P8 = 0). In other words, the pixel could be on an east or south border, or it could be a northwest corner. In the second step, the first two conditions are applied
    P9 P2 P3
    P8 P1 P4
    P7 P6 P5

FIGURE 52. The clockwise numbering of pixels in a 3 x 3 neighborhood.
unaltered, but the last two are modified to
3'. P2 · P4 · P8 = 0
4'. P2 · P6 · P8 = 0

With an argument similar to that given earlier it can be shown that this step removes only pixels lying on a north or west border or in a southeast corner. By subdividing the thinning operation into two steps the removal of structures with a thickness of two pixels is prevented. Several points of criticism can be raised against the ZS algorithm:

• Straight diagonal segments with a thickness of 2 pixels, which ought to be preserved, are eliminated instead.
• Any pattern that can be reduced to a 2 x 2 pixel square disappears.
• Noise is enhanced.

When the first condition in the ZS algorithm is modified to 3 ≤ B(P1) ≤ 6, the first two shortcomings are eliminated. This improvement to the Zhang and Suen algorithm has been suggested by Lu and Wang (1986). In order to speed up 2D thinning algorithms, the conditions that determine whether a pixel may be removed are stored in a look-up table. This table represents all 256 possible pixel configurations within a 3 x 3 neighborhood, along with the decision whether deletion is permitted or not. Indexing the table is done by examining the occupancy of the neighborhood of each image pixel. This yields an ordered sequence of 0/1 bits, which in turn is interpreted as an 8-bit table index, resulting in the appropriate yes/no decision.
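One table-driven pass of the ZS step might be sketched as follows. The deletion table is assumed to have been precomputed from conditions 1-4 (or 1, 2, 3', 4' for the second step), pixels are assumed to hold the values 0 and 1, and the bit order P2..P9 is a convention that must match the table's construction.

/* One parallel pass of a table-driven ZS thinning step. "deletable" is
 * a 256-entry table precomputed from the removal conditions; border
 * pixels are skipped for simplicity. */
void zs_pass(const unsigned char *in, unsigned char *out,
             int w, int h, const unsigned char *deletable)
{
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            unsigned idx =
                (in[(y-1)*w + x]     << 0) |   /* P2 */
                (in[(y-1)*w + x + 1] << 1) |   /* P3 */
                (in[ y   *w + x + 1] << 2) |   /* P4 */
                (in[(y+1)*w + x + 1] << 3) |   /* P5 */
                (in[(y+1)*w + x]     << 4) |   /* P6 */
                (in[(y+1)*w + x - 1] << 5) |   /* P7 */
                (in[ y   *w + x - 1] << 6) |   /* P8 */
                (in[(y-1)*w + x - 1] << 7);    /* P9 */
            out[y*w + x] = (in[y*w + x] && !deletable[idx]) ? 1 : 0;
        }
}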
c. 3D ZS-Based Thinning Algorithms. In this section a number of 3D thinning algorithms will be presented, which are all based on the 2D ZS algorithm. If our 2D image thinning algorithm were extended to 3D straight away, the removal of voxels would be based on the local configuration in a 3 x 3 x 3 neighborhood, and the configuration table would become prohibitively large (2²⁶ = 64 M entries). One way of avoiding such a large configuration table is by not taking all voxels in the local neighborhood into account. A number of different approaches and their relative merits will be compared in this section. Another solution is to use a form of indirect table indexing, which will be described in the next section. Adjacency in the 3D case has already been defined in Section III. In 3D, connectivity on a rectangular grid comes in three "flavors": voxels may be 6-, 18- or 26-connected. In order to ensure the complementarity of object and background in 3D, the object should be 6-connected when the background is 26-connected, or the other way around. In the following, 6-connectedness of the object will be assumed. By limiting the neighborhood to 3 x 3 x 1, the thinning operation can be carried out in a slice-by-slice order, and the ZS algorithm can be applied directly. The configuration table length is thus the same as in the 2D case. This is not really 3D thinning, and it will be referred to as slice-based thinning. The big disadvantage of slice-based thinning is that the connectivity between voxels in adjacent slices may be entirely lost, especially when the differences between slices are large. We will now describe a true 3D thinning operation that still employs the 2D ZS algorithm, but takes connectivity between the 18-neighbors into account. This is accomplished by decomposing the thinning operation into three steps. In each step a 3 x 3 neighborhood around the central voxel is considered that lies perpendicular to one of the principal axes (see Fig. 53). Because the 18-connected neighbors of the central voxel play a role in the thinning operation we will call this algorithm 18-connected thinning. A voxel may be removed when it may be removed in all three perpendicular 2D cases.
FIGURE 53. 18-connected thinning.
Just as when the ZS algorithm was applied in the 2D case, there are some structures that disappear although they should be preserved. Body diagonals, as well as any structure that can be reduced to a 2 x 2 x 2 cube, are entirely removed. This is caused by the fact that the 26-neighbors are not considered during the thinning operation. Like the 2D ZS algorithm, 18-connected thinning suffers from noise problems, as shown in Fig. 54. The figure shows how a single voxel, protruding from the 3D structure, gives rise to a "sheet" of spurious voxels after repeated thinning. A second 3D algorithm, adapted from the 2D ZS algorithm, also preserves a certain amount of 3D connectivity across slices: in addition to the 3 x 3 neighborhood in a slice, the six neighbors in the adjacent slices are considered. The extra condition on the central voxel now becomes: when the six neighbors in the adjacent slices are all empty, the voxel may not be deleted. The disadvantages of this approach are these:

• As in 18-connected thinning, diagonal structures are eliminated.
• Structures that lie entirely in one slice, i.e., without any neighbors in adjacent slices, will not be thinned.
d. 3D Thinning by Local Connectivity Analysis. An entirely different approach to 3D thinning has been reported by Lobregt, Verbeek and Groen (Lobregt, 1979; Lobregt et al., 1980). Their method is based on the assumption that voxels are six-sided cubes. The faces of a number of connected voxels constitute a so-called netted surface. When a voxel is removed, the once connected netted surface may become disconnected. By analyzing the contribution of a voxel to the local connectivity of the netted surface, criteria may be established by which the decision about removal of the voxel can be made. A measure of connectivity is provided by the Euler number (Mantyla, 1988).
FIGURE 54. The influence of a noise voxel on the thinning operation: (a) a single voxel protrudes from the structure; (b) after repeated thinning, a "sheet" of spurious voxels remains.
For a closed 3D netted surface the Euler number is

    n - e + f = 2                    (32)

where f is the number of faces, e the number of edges separating the faces, and n the number of end points of edges, or nodes. This formula may be extended for closed 3D surfaces that are connected to themselves in various ways. The Euler number then becomes

    n - e + f = 2 - 2h               (33)

where h represents the number of handles and tunnels occurring in the surface. An object may be bounded (separated from the background) by several surfaces. The global connectivity number N of an object can now be defined as

    N = Σ_i (2 - 2h_i)               (34)

that is, the sum of the contributions of each of the (netted) surfaces, counting their handles and tunnels. The connectivity number N is a global quantity. It can be shown that the global quantity N, as well as changes to it, can be computed from local contributions. By substituting formula (33) in (34) we obtain

    N = Σ_i (n_i - e_i + f_i)        (35)
This means that when a voxel is removed, changes will occur in the number of faces, edges and nodes of the netted surface. The change in the number of faces, edges and nodes should be computed for a 3 x 3 x 3 local neighborhood, because they occur in the faces of the central voxel and thus possibly involve all direct neighbors. As was mentioned previously, the length of a look-up table that holds the number of faces, edges and nodes for all possible voxel configurations in that neighborhood would be prohibitively large. Therefore, the 3 x 3 x 3 neighborhood is subdivided into eight 2 x 2 x 2 neighborhoods, which all have the central voxel at one of their corners. This leads to 2⁸ = 256 possible configurations that are precomputed and stored. The contributions to N are now computed separately for each of the eight 2 x 2 x 2 neighborhoods, once with the central voxel and once without it. The separate contributions are summed, and when the removal of the central voxel would result in a change to N, it is left in place. For further details of how to compute the values of n, e and f for each of the 256 possible voxel configurations we refer to Lobregt (1979).
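The removal test then reduces to a sum of table look-ups. In the sketch below, octant_config() and the contribution table are illustrative placeholders for the precomputed data of Lobregt (1979); the central voxel is assumed to occupy bit 0 of each configuration index.

/* Sketch of the LCA removal test: sum the precomputed contributions of
 * the eight 2 x 2 x 2 octants, once with and once without the central
 * voxel. Both helpers are illustrative placeholders. */
extern int octant_config(const unsigned char *vol,
                         int x, int y, int z, int octant);
extern const int contrib[256];   /* n - e + f per 2x2x2 configuration */

int may_remove(const unsigned char *vol, int x, int y, int z)
{
    int delta = 0;
    for (int o = 0; o < 8; o++) {
        int cfg = octant_config(vol, x, y, z, o);
        delta += contrib[cfg] - contrib[cfg & ~1];  /* with vs. without center */
    }
    return delta == 0;   /* removal leaves the connectivity number N unchanged */
}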
4. Implementations and Comparison

A comparison will now be made between the various thinning algorithms. Both the time performance of the implementations and the quality of the resulting voxel models will be compared. All 3D algorithms were implemented on the Sun/TAAC-1 workstation. In order to be able to evaluate the performance increase of the TAAC-1 over an unaccelerated workstation, two algorithms were also implemented on a "generic" workstation. The first test data set is a 128³ voxel model of 8-bit voxels, obtained from a series of CT scans. A binary voxel model was created from it by thresholding the gray-value data (see Fig. 55). The second data set is a hollow sphere; i.e., those voxels for which the inequality

    -F < (x - 64)² + (y - 64)² + (z - 64)² - R² < F        (36)

holds. Table X shows the execution times for the four 3D thinning algorithms in minutes and seconds, for three different binary voxel models; i.e., two different versions of the hollow sphere and the CT voxel model. The columns are labeled with abbreviations of the algorithm names: SB = slice-based, 6-N = slice-based with six neighbors, 18-C = 18-connected, LCA = local connectivity analysis. The performance difference between the Sun/TAAC-1 and an unaccelerated workstation becomes clear from Table XI.
FIGURE 55. CT data set used in testing of thinning algorithms.
TABLE X
PERFORMANCE OF 3D THINNING ALGORITHMS

Object                        SB       6-N      18-C     LCA
Sphere (R = 40, F = 100)      1:08     1:45     6:05     4:50
Sphere (R = 40, F = 400)      2:45     6:33     24:15    17:50
CT data set                   1:48     17:20    15:52    11:00
This result is somewhat disappointing, as the TAAC-1 has proved to be a factor of 3-7 faster for other algorithms (see Section III.D). The explanation for this is that the actual computations in thinning operations consist merely of fast table look-up operations. Memory operations, instead of computations, dominate the total execution time. The implementation of the voxel rotation algorithm has already made clear (see Table IX) that the memory access routines are responsible for this disappointing performance.
Although originally designed for the processing of two-dimensional images, we have used the image processing system, described in Section V, for the storage, manipulation and display of binary voxel models. One of the frame buffers is used to store the binary voxel model. The other frame buffer TABLE XI 3D THINNING ON DIFFERENT HARDWARE SYSTEMS Sun/TAAC- I
Method
Sun3/60C
SB LCA
3 : 00
I :48
18 : 30
11 : 0 0
RECENT ADVANCES IN 3D DISPLAY
FiGriRE
56. Thinned CT data set: (a) SB algorithm; (bj 6-N algorithm.
195
196
D. P. HUIJSMANS AND G . J. JENSE
FIGURE 57. Thinned CT data set: (a) 18-C algorithm; (b) LCA algorithm
RECENT ADVANCES IN 3D DISPLAY
197
holds either a surface normal view, a distance map or a combined surface-distance image. Altering the light direction or shading function is performed by manipulating the hardware output look-up tables. The frame processor is used to run various filter operators over pre-images, and for pan and zoom operations. The software was written in the C language, with the exception of a few low-level assembly language routines for the transfer of voxel data between the PC and the frame buffer.

a. Direct Display of the Data Set. One of the available frame buffers is used to store and directly display the binary voxel data set (see Fig. 58). By choosing a special storage scheme for the 3D voxel array, we can directly display the entire data set on screen (see Fig. 59). Conceptually, the binary voxel model of an object is stored as a three-dimensional array of bits. A three-dimensional binary array of size 128³ occupies 256 Kbytes of memory. Our image processing hardware has two frame buffers of 512 x 512 8-bit pixels. Each of these frame buffers is therefore also 256 Kbytes in size. We use one of these to store the binary voxel model. To this aim the 512 x 512 frame buffer is subdivided into 16 (4 x 4) sections of 128 x 128 8-bit pixels. Each section holds eight consecutive slices through the object. This storage scheme leads to a simple mapping of 3D voxel coordinates (x, y, z) to 2D frame buffer coordinates (u, v) and a bit number m (i.e., the number in the range 0...7 of the appropriate bit within the byte (u, v)):
    u = [(z div 8) mod 4] x 128 + x
    v = [(z div 8) div 4] x 128 + y          (37)
    m = z mod 8
I? i!
12
..
slices 4 0 . . 4 7 0
127
FIGURF 58. (a) 3D binary voxel array: (b) frame buffer storage scheme.
(37)
FIGURE 59. Direct display of the binary voxel model of an embryonic rat heart.
The operator div denotes integer division, while mod is the "modulo" (or "remainder") operator. Because all arithmetic operations involve constants that are powers of 2, they may be implemented efficiently using bit shift and logical masking operations. Because of the spatial coherence, the changes between consecutive slices are small. Therefore, when the contents of the frame buffer are displayed, the screen shows 16 cross-sectional views of the binary voxel model, each of which in fact consists of eight consecutive slices. Individual bit planes may be displayed by choosing suitable entries for the output look-up table. Together with the zoom option, this allows for the display of individual slices.
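Because 128 = 2⁷, 8 = 2³ and 4 = 2², mapping (37) turns into shifts and masks. A minimal sketch of testing a single voxel:

/* Mapping (37) with shifts and masks: z div 8 = z >> 3, z mod 8 = z & 7,
 * mod 4 = & 3, div 4 = >> 2, and x 128 = << 7. Tests whether voxel
 * (x, y, z) of the 128^3 binary model is set in the 512 x 512 8-bit
 * frame buffer fb. */
int voxel_set(const unsigned char fb[512 * 512], int x, int y, int z)
{
    int u = (((z >> 3) & 3) << 7) + x;   /* section column * 128 + x */
    int v = (((z >> 3) >> 2) << 7) + y;  /* section row    * 128 + y */
    int m = z & 7;                       /* bit number within the byte */
    return (fb[v * 512 + u] >> m) & 1;
}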
b. Generating a Surface View. A 3D model can be rendered on a flat display screen only as a 2D projection. However, by using several depth cues an illusion of the third dimension can be created. The following depth cues are available in our system:

• Hidden surface elimination.
• Simultaneous projections from different viewpoints.
• Depth (or distance) shading.
• Surface shading.
• Interactive rotation of the light source.
Hidden surface elimination, or more accurately visible surface detection, is
performed by a ray-casting algorithm. The voxels along a ray from the viewpoint through the voxel cube are examined until the first object voxel is hit. Because of the way the cube is stored, each byte in the binary voxel array holds eight voxels. This means that when rays are cast along one of the main axes (±X, ±Y, ±Z), the algorithm can examine eight voxels at a time until a nonzero byte is found. Two cases have to be distinguished here (see also Fig. 60):

1. Casting a ray along the Z axis: when a nonzero byte is found, the first (or last, when looking along the negative Z axis) 1-bit in this byte is the visible surface voxel.
2. Casting, as it were, eight parallel rays simultaneously along the X or Y axis: each nonzero byte encountered along the ray(s) is logically ORed with a current mask byte. The search continues until the mask byte consists of all 1s, or the end of the ray is reached.

This method results in a reasonably fast surface detection algorithm (see the sketch following Fig. 60). Of course, the viewing direction is limited to along the main axes. By displaying several (possibly all six) views at the same time this disadvantage is partially overcome. Several different shading techniques were implemented in conjunction with the ray-casting visible surface detection algorithm.

c. Depth Shading. By letting the rays, cast along the viewing axis, yield the distance of the surface voxels to the viewing plane, a distance map is obtained. This distance map may be rendered using eq. (14). Figure 61(a) is an example of a depth-shaded image. The distance map may be enlarged in the X and Y directions by employing the pan and zoom facilities of the frame processor card.
c. Depth Shading. By letting the rays, cast along the viewing axis, yield the distance of the surface voxels to the viewing plane, a distance map is obtained. This distance map may be rendered using eq. (14). Figure 61(a) is an example of a depth-shaded image. The distance map may be enlarged in the X and Y directions by employing the pan and zoom facilities of the frame processor card.
FIGURE 60. (a) Casting a ray along the Z axis; (b) casting eight parallel rays along the X axis.
FIGURE 61. Reconstructed embryonic rat heart: (a) depth shading; (b) surface shading; (c) both.
When an image is zoomed by a factor of 2, each pixel (distance value) is simply replicated four times in a 2 x 2 pixel square. Zoom factors of four and eight are also possible. The jagged appearance of the zoomed image may be reduced by running a pixel-averaging operator over it. The effect of this operation is to interpolate the replicated distance values.
d. Surface Shading. As a second shading technique, surface shading was implemented. This requires calculation of surface normal vectors (see eq. (15)), for which the binary gradient-shading method (see Section III.C) was used. Figure 61(b) shows the result of applying this shading method to the same surface as the depth-shaded image in Fig. 61(a). Finally, an image that has been shaded using a combination of these two techniques, i.e., according to eq. (16), is shown in Fig. 61(c).
e. Interactive Rotation of the Light Source. Real-time interactive rotation of an object on the screen provides a very effective depth illusion, especially for objects of irregular shape, such as the ones that we are interested in. The changing light intensity on the surface as the object moves with respect to a light source provides the viewer with many clues about the shape of the object. Real-time rotation of an object is not possible with
the system that is described here, but instead several features of the hardware can be used to implement interactive changing of the lighting function, including the position of the light source. In Section III.C the lighting equations were given. The quantity I_max is a constant that depends on the hardware, i.e., the maximum value of a pixel. For a given viewing direction, d_max is also constant. For each pixel in the image (from the given viewing direction) the values of the surface normal N and the distance d at the corresponding surface voxel are known. Instead of calculating the values of I and storing these in the frame buffer, the visible voxel's surface normal and distance are stored for each pixel. The lighting equation, which now gives I as a function of the light direction and the percentage of ambient light,
  I = f(L, I_amb),   (38)
may be calculated for all possible combinations of N and d and stored in one of the hardware output look-up tables. Changing either the light direction or the proportion of ambient versus reflected light is then a matter of altering only the output look-up table entries (in this case 256 numbers) instead of recalculating all pixel values (16K numbers). A technique similar to this, but applied to the shading of polygonal objects, has been described in Bass (1981) and Sloan and Brown (1979). The value of an approximated surface normal vector may be encoded in 5 bits, as there are only 26 possible values. The remaining 3 bits of each 8-bit pixel are used to hold an approximation of the surface distance value. These encoded values are stored in the second frame buffer on the frame grabber board. Figure 62 shows the effect of rotating a light source from the viewing direction toward the right side of the object. The changes in light intensity at various irregularly shaped features on the surface can clearly be observed.
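The following C sketch illustrates how such a 256-entry table could be precomputed; the ambient-plus-diffuse lighting model, the distance attenuation, and the name normal_table are assumptions made for the sake of the example, not the system's exact equations:

    #include <stdint.h>

    /* Precompute the 256-entry output look-up table.  Each pixel value
       encodes a quantized surface normal in bits 0..4 (26 possible
       normals) and a coarse distance in bits 5..7.  normal_table[]
       holds the 26 unit normal vectors. */
    void build_shading_lut(uint8_t lut[256],
                           const double normal_table[26][3],
                           const double light[3],  /* unit light vector  */
                           double ambient,         /* fraction in [0, 1] */
                           int i_max)              /* maximum pixel value */
    {
        for (int code = 0; code < 256; code++) {
            int n = code & 31;             /* 5-bit normal index  */
            int d = code >> 5;             /* 3-bit distance code */
            double diffuse = 0.0;
            if (n < 26) {
                double dot = normal_table[n][0] * light[0]
                           + normal_table[n][1] * light[1]
                           + normal_table[n][2] * light[2];
                if (dot > 0.0)
                    diffuse = dot;
            }
            double shade = ambient + (1.0 - ambient) * diffuse;
            shade *= 1.0 - d / 16.0;       /* assumed depth attenuation */
            lut[code] = (uint8_t)(i_max * shade);
        }
    }

Rotating the light source then reduces to recomputing these 256 entries and reloading one output look-up table.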
f. Reslicing a Reconstructed Object. The incorporation of a facility to reslice a reconstructed object along an arbitrary plane requires only a slight modification of the original ray-casting algorithm. This way parts of the object can be removed to reveal hidden inner details (see Fig. 63). The equation describing a plane in three dimensions is
  ax + by + cz + d = 0.   (39)
Assuming that the object is viewed along the positive Z axis, this equation may be rewritten as
  z = -(a/c)x - (b/c)y - d/c.   (40)
The ray casting is now performed by either starting the search for the first 1-voxel along the ray at the z coordinate determined by this equation, or by continuing the search to this z coordinate value.
FIGURE 62. Rotating the light source around the object.
These two cases correspond to either removing the voxels for which the inequality ax + by + cz + d < 0 holds, or those for which the inequality ax + by + cz + d > 0 holds.
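A minimal sketch of the modification for rays cast along the positive Z axis, assuming c > 0 so that the half-space ax + by + cz + d < 0 lies at smaller z:

    #include <math.h>

    /* For a ray cast along +Z at pixel (x, y), return the z at which
       the search for the first 1-voxel should begin, so that the
       half-space ax + by + cz + d < 0 is cut away.  For the sketch,
       the result is simply clamped to the cube bounds. */
    int ray_start_z(double a, double b, double c, double d, int x, int y)
    {
        double zp = -(a / c) * x - (b / c) * y - d / c;  /* Eq. (40) */
        int z = (int)ceil(zp);
        if (z < 0)   z = 0;
        if (z > 127) z = 127;
        return z;
    }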
g. Results and Conclusions. As an experimental extension to an earlier, contour stack-based reconstruction system (Huijsmans et al., 1984), the capability was added to output the series of parallel 2D area masks originally used for the hidden-line display of the contour stack. This made it possible to obtain binary voxel models of several reconstructed objects. These voxel models were used as input data sets for the display routines of the system described previously. Computation of a depth-shaded image takes about 10 sec. For comparison purposes, the extended contour stack-based system generated a depth-shaded image in 3 min. A binary gradient-shaded surface view along
FIGURE 63. The same object, resliced along three different planes.
a major axis can also be generated in 10 sec. When both depth shading and binary gradient shading are applied, the display time is still 10 sec. The inclusion of the reslicing capability did not measurably affect the display times. These measurements indicate that the search for the visible surface voxels dominates the display process. Updating a look-up table takes about 1/10th of a second, so that interactive rotation of a light source around the object is possible in near real time. This amount of time is relatively long because the scan-out of pixel values from the frame buffer has to be interrupted for the duration of one frame while the output look-up table is being loaded. Unfortunately, this leads to an annoying flicker of the image on the screen. Because the frame grabber card has eight output look-up tables, up to eight different lighting functions, e.g., for eight different light source
positions, can be precomputed and loaded. After this, switching look-up tables is instantaneous and changed views are obtained without flicker. Another way to avoid flicker when changing look-up tables is to use an image processing board that allows synchronization with horizontal retraces; LUT entries can then be changed one at a time. A disadvantage of moving the light source around a fixed object is that the object appears to move when the position of the light source changes. This psychophysical effect may be due to the viewer's built-in assumption of a stationary light source. In spite of this, observing the changing surface shading gives a clear impression of the surface irregularities of the object. An important element in this is the immediate visual feedback the user gets by observing the result (a changed shaded surface view) in response to an input action such as pressing a cursor key to change the light direction.
h. Further Possibilities. In addition to the techniques described in the previous sections, the image processing hardware offers several other features that might be exploited.
i. Image Space Gradient Shading. The distance map may also be used as input data for another gradient-based shading method: image space gradient shading (Gordon and Reynolds, 1985; Bright and Laflin, 1986). The frame processor hardware can be used to compute the gradient image of the distance map.
j. Modeling Objects. Complex objects may be built from combinations of precomputed and stored primitive objects by applying the set operations union, intersection and difference in a CSG-like way (Requicha, 1980). Parts may be removed from an object by computing the difference between it and a secondary object. This is of interest when a reconstructed real-world object is to be inspected. To implement set operations between objects, the correspondence between Boolean set operations and bitwise logical Boolean operations is used. Assuming there are two objects, represented by sets A and B of binary volume elements w, then
  A ∪ B = {w | (w ∈ A) OR (w ∈ B)},
  A ∩ B = {w | (w ∈ A) AND (w ∈ B)},   (41)
  A − B = A ∩ B′,
where B′ denotes the complement of B, which corresponds to the Boolean NOT operation, and the expression w ∈ A corresponds to w's bit in the binary voxel model having value 1. The ability of the image processing hardware to perform various bitwise
logical operations between two frame buffers may be used for an efficient implementation of the set operations. The primary object's binary voxel model is loaded into frame buffer 1, while the secondary object's representation is loaded into frame buffer 2. Then the frame processor calculates the result of the bitwise logical operation and stores it in one of the input frame buffers, after which it can be displayed.
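In software, the same bitwise correspondence of Eq. (41) can be expressed as follows (a sketch; the hardware performs the equivalent operation between the two frame buffers):

    #include <stddef.h>
    #include <stdint.h>

    /* Bitwise implementation of the set operations of Eq. (41); each
       buffer holds eight voxels per byte, and n is the buffer size in
       bytes (256 Kbytes for a 128-cubed model). */
    enum setop { SET_UNION, SET_INTERSECTION, SET_DIFFERENCE };

    void voxel_setop(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t n, enum setop op)
    {
        for (size_t i = 0; i < n; i++) {
            switch (op) {
            case SET_UNION:        dst[i] = a[i] | b[i];  break; /* OR      */
            case SET_INTERSECTION: dst[i] = a[i] & b[i];  break; /* AND     */
            case SET_DIFFERENCE:   dst[i] = a[i] & ~b[i]; break; /* AND NOT */
            }
        }
    }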
k. Calculating the Volume of an Object. Determining the volume of an object involves counting all 1-voxels of the binary voxel model. This operation may be sped up by using the histogram hardware of the frame processor. The result of a histogram operation is a table of 256 values, giving for each pixel value the number of pixels with that value. Each 8-bit pixel value represents eight voxels, the number of 1-bits being equal to the number of 1-voxels. The total number of 1-bits (and thus the total number of 1-voxels, or volume V) in the frame buffer can now be calculated from the following formula:
  V = Σ_{i=0}^{255} H(i) · B(i).   (42)
That is, the value H of each histogram table entry is multiplied by the number of 1-bits B of the corresponding table index i, and all (256) terms are summed.
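A C sketch of this summation, assuming the 256-entry histogram has already been computed (by the frame processor in the actual system):

    /* Volume of the object per Eq. (42): each histogram entry hist[i]
       counts the bytes with value i; weight it by the number of 1-bits
       in i and sum over all 256 possible byte values. */
    long volume_from_histogram(const long hist[256])
    {
        long volume = 0;
        for (int i = 0; i < 256; i++) {
            int bits = 0;
            for (int v = i; v != 0; v >>= 1)   /* count the 1-bits of i */
                bits += v & 1;
            volume += hist[i] * bits;
        }
        return volume;
    }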
l. Voxel Address Calculations. The storage scheme that is used for the binary voxel array greatly favours the computation of orthographic projections of the object, because in these cases the ray-casting algorithm examines eight voxels in parallel. When computing general parallel projections, voxel address calculations will probably cause display times to grow unacceptably large. However, the generation of isometric projections (i.e., when the projection plane is perpendicular to one of the lines ±x = ±y = ±z) might offer a reasonable compromise, since voxel address calculations are relatively easy in this case. For general parallel and perspective projections, some form of hardware support for 3D addressing of voxels is needed. There are relatively cheap graphics subsystems for PCs on the market today, based on coprocessors from the Texas Instruments TMS340 range (Asal et al., 1986). These processors offer 2D addressing in hardware as well as bit-addressable memory and therefore come a long way toward meeting our requirements. In one of the following sections a system will be described that offers facilities for 3D addressing, but that system is in an entirely different class (regarding price) than our off-the-shelf image processing hardware.
C. ExploView TAAC-1: Interactive Exploration of Gray-Value Voxel Models

1. Available Volume Data Sets
Four volume data sets were available to us:
1. 128 images of a child's head, taken with a CT scanner (see Fig. 64(a)). Each image consists of 256 x 256 pixels. Pixels have 12 significant bits and are stored as 16-bit integers.
2. 28 images of a grain of Scotch pine pollen, taken with a confocal scanning laser microscope (see Fig. 64(b)). Image size is 256 x 256, while the pixels have 8 bits.
3. MRI scans of a human head (see Fig. 65(a)), consisting of 64 images of 256 x 256 8-bit pixels.
4. Another MRI data set, of the same size, of a human knee (see Fig. 65(b)).
All of these data sets were converted to gray-value voxel models of 128³ 8-bit voxels.
a. Synthetic Test Objects. For testing purposes it is convenient to know in advance how a voxel model should look. Errors or artifacts in display and rendering algorithms can easily be detected that way. The system offers a facility to create artificial test scenes, consisting of two objects: a rectangular parallelepiped and a sphere. Voxels of the test scene are assigned one of four values, depending on whether they are inside none, one, the other, or both of the test objects. This way a labeled voxel model is obtained, which allows the selection of various combinations of the two objects for display by setting an appropriate window on the voxel gray scale. Figure 66 shows one of the possible configurations of the test scene.

2. TAAC-1 Organization
A number of the display and rendering algorithms described in Section III have been integrated into a system that allows interactive control of display and rendering parameters. This system was implemented on the accelerated workstation, i.e., the Sun 3/160C/TAAC-1 system, running under the SunOS V4.0 operating system. All software was written in the C language, and the SunView user interface facilities were used. The software modules for the TAAC use the TAAC software library and were compiled with the optimizing TAAC C compiler.
FIGURE 64. (a) CT data set (gray-value gradient shading); (b) CSLM data set (depth-gradient shading).
FIGURE 65. (a) MRI data set 1, human head (gray-value gradient shading and voxel mapping); (b) MRI data set 2, human knee (gray-value gradient shading and voxel mapping).
FIGURE 66. Test scene of a cube from which a sphere has been subtracted (Cube - Sphere).
The 8 Mbyte of TAAC-1 data/image memory were used as follows:
• One 512 x 512 32-bit/pixel image (1 Mbyte).
• One 512 x 512 32-bit/pixel intermediate buffer, to be described later (1 Mbyte).
• 4 Mbyte of volume data (8-bit voxels): either one single voxel model, or two voxel models of, e.g., 128³ = 2 Mbyte each (for 3D image-processing purposes).
• 2 Mbyte of miscellaneous storage.
3. User Interface

Pop-up windows with menu choices, sliders, radio and push buttons, scroll windows, and spatial selection with a mouse-steered cursor provide most of the control within our graphical user interface. One of the interaction facilities, to help divide the voxel model or a part of it, presented us with a problem that we tried to solve in several ways.
a. Moving the Splitting Planes. When a subdivision is being created, the user moves a splitting plane through the current cell and selects the position where the cell is to be split. Visual feedback is provided by voxel mapping at
the intersection of the splitting plane and the current cell. Originally, the positioning of the splitting plane was done by means of three sliders, labeled pitch and yaw for the rotation of the plane and push for the translation along its normal vector (see also Fig. 30). With these sliders it is possible to roughly position the plane, but fine adjustment of the plane turned out to be rather difficult. We found a fine adjustment solution based on the movement of the intersection points of the splitting plane and the edges of the volume cell. By allowing the user to change the position of these points, the position of the splitting plane is changed. Given (the equation of) a plane, there are many triples of points that can define it. The problem now is choosing points that both support the splitting plane and can be easily selected and moved on the display screen. It turned out that, to properly judge the position of the splitting plane with respect to the current cell, the movement of the supporting points should be confined to be along the edges of the volume cell (see Fig. 67). The other two supporting points are chosen from the remaining intersection points. When the intersection of the splitting plane and the cell has N vertices and the user has selected vertex i (i ∈ {0, . . . , N − 1}), then the other two supporting points s1 and s2 are
  s1 = (i + N div 3) mod N,   (43)
  s2 = (i + 2N div 3) mod N.   (44)
In other words, the two additional supporting points are chosen such that the three points defining a splitting plane lie "evenly spaced" in the list of intersection points. The user selects one of the intersection points of the splitting plane and cell edges. The selected point may then be translated along its edge by means of a slider. For an initial rough positioning, the original method of steering the plane with the three pitch, yaw and push sliders remains the preferred way.
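Expressed in C, the choice of Eqs. (43) and (44) is simply:

    /* Given the user-selected vertex i of the N intersection points,
       choose the other two supporting points per Eqs. (43) and (44). */
    void supporting_points(int i, int n, int *s1, int *s2)
    {
        *s1 = (i + n / 3) % n;        /* (i + N div 3) mod N  */
        *s2 = (i + 2 * n / 3) % n;    /* (i + 2N div 3) mod N */
    }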
4. Experimenting with Display and Rendering Parameters

The user interface allows the setting of various display and rendering parameters, as well as a number of spatial selection facilities. The display algorithms used are the SBS FTB algorithm and ray casting. The parameters that may be changed interactively are as follows:
• Viewing parameters: rotations about the X, Y and Z axes, and a scaling factor.
• Rendering methods: depth, depth-gradient (Z grad), gray-value gradient (G grad), and adaptive gray-value gradient (AG grad) shading; the last three methods can also be used in combination with depth shading (see also Subsection III.C.2). In addition to this, planes of the bounding box (see under spatial selections) can be voxel mapped.
• Gray-scale window: a lower and upper threshold on the voxel values, which determine the range of voxels that will be projected.
• Spatial selections: bounding box (minimum and maximum object coordinates), resolution (scanning the voxel model with strides larger than 1), viewing window (minimum and maximum image space coordinates), slicing plane position, and selection of front/back halfspace (hither and yon clipping).
• Lighting model: direction of the light source and fraction of ambient light.
• Postprocessing: the final image may be enhanced by contrast stretching.
Several of these features will be described in more detail.
a. Spatial Selection Facilities. There are several facilities in the system to make spatial selections:
• A bounding box can be defined by six planes, two associated with each principal axis. Only those voxels inside the bounding box are projected. The bounding box, as well as the outline of the entire voxel model, may be displayed as wireframes (Fig. 68).
• A cutting plane may be positioned arbitrarily in the voxel model, subdividing it into two convex polyhedra. The voxels on either side may be selected for projection by a version of the SBS FTB algorithm that has been adapted to "3D scan-convert" convex polyhedra (see Section III.B).
• The resolution of the projected image can be selected by setting the stride with which the voxel model is scanned (see the sketch after this list). When the stride equals 1, every voxel is projected; when it equals 2, only every second slice, row and column are scanned; when it equals 4, every fourth slice, row and column, etc. Because the voxels are rendered as correspondingly larger rectangles of pixels, the resulting image is an approximate rendering, at a reduced resolution, of the object (Fig. 69). Because the display times of reduced-resolution images are significantly lower (see the performance figures given in Section III.D), the setting of viewing parameters has faster feedback.
• An axes-parallel viewing window may be selected on the screen (see Fig. 70). This determines which part of the screen is affected by the current display action.
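A sketch of the stride-based scan; voxel_is_set(), shade_voxel() and fill_rect() are hypothetical interfaces standing in for the system's actual projection and drawing routines:

    /* Hypothetical interfaces for the sketch below. */
    extern int  voxel_is_set(int x, int y, int z);
    extern int  shade_voxel(int x, int y, int z);
    extern void fill_rect(int x, int y, int w, int h, int shade);

    /* Reduced-resolution rendering: scan the voxel model with the
       given stride (1, 2 or 4) and draw each projected voxel as a
       stride-by-stride square of pixels. */
    void render_reduced(int stride)
    {
        for (int z = 0; z < 128; z += stride)
            for (int y = 0; y < 128; y += stride)
                for (int x = 0; x < 128; x += stride)
                    if (voxel_is_set(x, y, z))
                        fill_rect(x, y, stride, stride,
                                  shade_voxel(x, y, z));
    }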
FIGURE 67. Moving a splitting plane: (a) initial position; (b) select supporting point;
FIGURE 67. (c) move point along edge (two of the other points remain fixed).
FIGURE 68. Selection of a bounding box.
FIGURE 69. Reduced resolution renderings (depth-gradient shading): (a) 1/4 resolution; (b) 1/2 resolution.
FIGURE 69. (c) full resolution.
5. The Exploded View Facility

Our BSP tree-based volume divider, ExploView, offers facilities to construct a BSP tree-based subdivision of a voxel model and to display an exploded view of the subdivided model. Rendering parameters and other attributes can be selected for individual polyhedral cells and polygons. The user interface and the routines related to the BSP tree, as well as the file I/O parts of the programs, all run on the Sun host computer. The TAAC-1 is used to store the volume data and execute the display routines.
a. Graphical User Interface of the Volume Divider. The basic user interface of the program offers facilities for the following:
• Control: for setting the viewing parameters, such as the position of the viewpoint, zoom and explosion factor, and selection of display and edit modes;
• Planes: sliders to position intersection planes, etc.;
• Cells: to move around between cells in the BSP tree and to set node attributes.
On the screen the current cell is displayed with its parent in the BSP tree and its two children (when present). This provides the user with information about the spatial relationships between cells in the neighborhood of the
FIGURE 70. Composite renderings: (a) using an image space viewing window; (b) combining different shading methods (gray-value gradient for the skin and depth-gradient for the bone surfaces).
current cell. More partitioning planes may be added until the desired subdivision of the voxel model is reached.
b. Selecting Cells and Polygons. The system offers two display "modes" during the editing operation: one shows only the currently selected cell; the other shows the current cell together with its parent in the BSP tree and its two children. The second mode provides the user with information about the spatial relationships between cells in the neighborhood of the current cell in the BSP tree, as shown in Fig. 71. Initially, the selection of the current cell was done by means of three buttons, parent, front and back. This turned out to be cumbersome, and direct selection of the current cell by pointing with a mouse cursor was implemented instead (Fig. 71(a)). The cell pointed at becomes the current cell, while the displays of the parent and sibling cells are adjusted accordingly. Also, a polyhedral cell may become the current one by pointing at it in the displayed exploded view. This direct selection also applies to individual polygons of volume cells.
c. Display Attributes. The purpose of selecting individual volume cells and polygons is to set their display attributes. The values of these attributes are controlled via a separate "pop-up" window for the attributes of the current cell. A polygon has one attribute: it indicates whether it is to be displayed by voxel mapping or as an outline. The display of volume cells, on the other hand, is affected by several attributes:
• Visibility: on or off.
• Display method: it can assume the following values:
  - voxel mapping: all polygons of the cell are displayed by voxel mapping. The attribute values of the individual polygons are also in effect; i.e., when a cell is displayed by voxel mapping, some of its polygons may be invisible.
  - FTB: the cell is displayed with the modified front-to-back display algorithm.
  - Surface ray casting: the cell is displayed with a ray-casting algorithm, using thresholding to determine the visible surface.
  - Transparent volume ray casting: the ray-casting method is now used with the transparent compositing method, described in Section III.C, to render both opaque surfaces and transparent volumes.
• Shading: either depth gradient or gray-value gradient.
• Gray-scale window: consisting of a lower and upper threshold value (between 0 and 255), it selects which voxel values are used for display.
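Collected in one data structure, the cell attributes described above might look as follows in C (a sketch; the type and field names are illustrative, not taken from the ExploView source):

    #include <stdint.h>

    /* Display attributes of a BSP volume cell. */
    enum display_method {
        VOXEL_MAPPING,           /* all polygons voxel mapped          */
        FTB,                     /* modified front-to-back algorithm   */
        SURFACE_RAY_CASTING,     /* thresholded visible surface        */
        TRANSPARENT_RAY_CASTING  /* compositing of transparent volumes */
    };

    struct cell_attributes {
        int visible;                 /* visibility: on or off           */
        enum display_method method;
        int depth_gradient;          /* 1 = depth-gradient shading,
                                        0 = gray-value gradient shading */
        uint8_t gray_low, gray_high; /* gray-scale window, 0..255       */
    };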
FIGURE 71. (a) Some of the cells; (b) part of the BSP tree that corresponds to the cells shown in (a).
This set of attributes offers various facilities to create composite images from a volume data set by allowing different parts of the voxel model to be rendered in different ways. In Fig. 72, for instance, the jaws have been rendered using maximum value reprojection, while the rest of the object was rendered with gray-value gradient shading.
d. The Voxel-Layer Labeling Table. In Section III.C the voxel classification problem was mentioned briefly as a means of segmenting the voxels into different sets, which supposedly correspond to different structures in the voxel model. For this purpose a look-up table has been incorporated in the ExploView system, whereby the color and opacity values of voxels can be specified in an indirect way. The voxel values are no longer interpreted directly as gray or "density" values, but serve as indices into the table, where the color or gray values and opacities are stored. This table is only used with the ray-casting display method.
FIGURE 72. A composite rendering.
For the creation and modification of the look-up table, another subwindow is provided in the user interface of the system. Using sliders and selection items, the voxel gray-value range (0 . . . 255) can be divided into four intervals. For each interval, determined by a gray-scale window (lower and upper threshold), an opacity and color value can be defined. Additionally, a layer thickness between 1 and 20 voxels can be set. This determines whether voxels along a ray are "composed into" the final pixel color or not: a layer contributes only when the minimum layer thickness is exceeded (hence the name voxel layer table). Finally, a voxel layer may be rendered as an opaque surface by depth or gray-value gradient shading. This allows the display of semitransparent volumes over opaque surfaces, providing a better depth cue than with transparency alone.
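A sketch in C of what such a voxel-layer table could look like, with assumed field types; the actual table resides in TAAC-1 memory and is edited through the subwindow described above:

    #include <stddef.h>
    #include <stdint.h>

    /* One of the four entries of the voxel-layer labeling table. */
    struct voxel_layer {
        uint8_t low, high;      /* gray-scale window of the interval  */
        float   opacity;        /* 0.0 = fully transparent            */
        uint8_t color;          /* color or gray value for the layer  */
        int     min_thickness;  /* 1..20 voxels; thinner runs ignored */
        int     opaque_surface; /* nonzero: render as gradient-shaded
                                   opaque surface                     */
    };

    /* Classify a voxel value by finding the interval containing it. */
    const struct voxel_layer *classify(const struct voxel_layer table[4],
                                       uint8_t value)
    {
        for (int i = 0; i < 4; i++)
            if (value >= table[i].low && value <= table[i].high)
                return &table[i];
        return NULL;  /* value outside all four intervals */
    }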
6. Results

In the ExploView system, individual volume cells are displayed by the volume ray-casting algorithm. This algorithm was extended to handle general convex polyhedra instead of rectangular boxes, as described in Section IV. Display times for the ray-casting algorithm, using various rendering methods, are presented in Table XII. The viewing parameters were set to values that are comparable to those used in Section III.D. In the final two rows, the difference in display time between the CT data set and the MR data set is explained by the difference in opacity values that were set in the voxel layer tables for the two models: high opacity values (on the average) in the case of the CT model, and low opacity values for the MR model.

TABLE XII
DISPLAY TIMES FOR THE EXPLOVIEW SYSTEM, USING VOLUME RAY CASTING
(THE NUMBERS BETWEEN BRACKETS INDICATE THE SELECTED VOXEL GRAY-SCALE INTERVAL)

Rendering method                          Time (sec)
Surface rendering
  CT data set [30, 255]                   21
  MR data set [20, 255]                   21
Avg. value reprojection
  CT data set [30, 150]                   43
  MR data set [20, 255]                   252
Max. value reprojection
  CT data set [30, 150]                   38
  MR data set [20, 255]                   44
Transparent layers
  CT data set (high avg. opacity)         48
  MR data set (low avg. opacity)          100
TABLE XIII
"BRUTE FORCE" VERSUS "SMART" RAY CASTING (TIMES IN SECONDS)

Rendering    Brute force    Using coherence
Surface          92                21
Average         138                43
Maximum         144                38
When the display times in Table XII are compared to the values in the tables in Section III.D, the speed-up factor between the "brute force" ray-casting algorithm and the accelerated version can be deduced. For purposes of comparison, the figures have been reproduced, side by side with the new figures, in Table XIII. From these figures, a speed-up factor between 3 and 4.5 emerges. Finally, two examples of "exploded views" are given. In Fig. 73 the possibility to disable the translation of certain volume cells is demonstrated: the upper part of the head has been subdivided into three cells. One of these has been marked invisible. The other two are not translated outward, resulting in a "pie-wedge" cutout. Figure 74 shows a composite rendering of the CT model. The two cells on the lower right side are rendered semitransparently. The upper and lower left, and the upper right, cells show gray-value gradient-shaded rendering of the skin surface, together with voxel mapping of the cutting planes. For the other two cells in the upper half, depth-gradient shading of the bone surface has been selected.

VII. CONCLUSION

In our quest for interactive volume visualization we took an inventory of data structures and found that voxel models, 3D arrays of scalar values, are very well suited to represent unstructured three-dimensional measurement and simulation data. Display from 3D voxel data sets on a 2D screen turned out to be a time-consuming operation, unless fast hardware and smart programming are used in all the phases of geometric transformation, hidden feature removal and rendering. Incremental scanning techniques were used to exploit spatial coherence. Depending upon the view direction, scanning occurs in an ordered way so that output appears sorted along lines of sight. Extra buffers for depth values and visible voxel coordinates further help to lower the amount of
FIGURE 73. Partial exploded view of the CT data set, using surface rendering (gray-value gradient shading) and voxel mapping of the cutting planes.
calculations needed. Just by using smart programming, the generation of visual feedback can be accelerated by orders of magnitude. Further acceleration is supported by such hardware features as extra buffers (z-buffer, coordinate buffers, surface orientation) and an alpha channel. Special-purpose hardware produced so far appears to be too restricted and inflexible. For spatial selection and division the binary space partitioning tree proved very effective. It allows for a spatial subdivision in which each polyhedral cell can be displayed in a different rendering mode. It can also be used to suppress the display of specific parts, and the faces of the volume cells can be mapped
FIGURE 74. Composite visualization of the CT data set, using various rendering methods.
with the original voxel values. The BSP tree can easily be combined with the incremental scanning methods and is used in our final implementation to compose exploded views of three-dimensional sampled objects. A further acceleration in all stages of interactive exploration is needed before one can truly speak of real-time interaction. Increasing processor throughput alone will probably not suffice. Some form of parallelism will be needed. Further challenges in scientific visualization are offered by 3D vector fields, interactively steered simulations and time-varying phenomena. The exploration of structures changing in time adds yet another dimension to scientific visualization. Even faster hardware and smarter programming will be needed to tackle those 4D data sets.
ACKNOWLEDGMENTS

This research was performed mainly at the Department of Computer Science of Leiden University. Over the years there have been many stimulating discussions with Peter van Oosterom, Remco Veltkamp and Chris Laffra. The following students contributed to the implementations during their master's degree thesis research: Rob Beersma, Jurgen den Hartog, Kees Ouwehand and Bob Schijvenaars. Paul Kranenburg, who took care of the hardware and software environment, also provided invaluable aid. The cooperation of the following people and institutes is gratefully acknowledged for making available the various data sets:
• W. M. ter Kuile, of the Instituut voor Milieu Wetenschappen TNO, afd. Milieubiotechnologie, Delft, the Netherlands, for the CLSM data set of the Scotch pine pollen;
• S. Lobregt, CT Scanner Science Department, Philips Medical Systems, Best, the Netherlands, and Dr. F. W. Zonneveld, Department of Diagnostic Radiology, Utrecht University Hospital, Utrecht, the Netherlands, for the CT data set of the child's head (used by permission of Prof. J. C. van der Meulen, Department of Plastic and Reconstructive Surgery, Rotterdam University Hospital "Dijkzigt," Rotterdam, the Netherlands);
• A. A. van Est, MR Predevelopment Department, Philips Medical Systems, Best, the Netherlands, for the MR studies of the human head and knee;
• The laser range meter data set of the mask is from M. Rioux and L. Cournoyer, The NRCC Three-Dimensional Image Data Files, National Research Council Canada, CNRC 29077, June 1988;
• The contour stack model of the snail reproductive organ is courtesy of N. W. Runham of University College, North Wales, UK.

BIBLIOGRAPHY
Amanatides, J., and Woo, A. (1987). "A Fast Voxel Traversal Algorithm for Ray Tracing," in "Eurographics 87" (G. Marechal, ed.), 3-10, North-Holland, Amsterdam.
Artzy, E., Frieder, G., and Herman, G. T. (1981). "The Theory, Design and Evaluation of a Three-Dimensional Surface Detection Algorithm," Computer Graphics and Image Processing 15, 1-24.
Asal, M., Short, G., Preston, T., Simpson, R., Roskell, D., and Guttag, K. (1986). "The Texas Instruments 34010 Graphics System Processor," IEEE Computer Graphics and Applications 6, 24-39.
Badler, N., and Bajcsy, R. (1978). "Three-Dimensional Representations for Computer Graphics and Computer Vision," Computer Graphics 12, 153-160.
Ballard, D. H., and Brown, C. M. (1982). "Computer Vision," Prentice-Hall, Englewood Cliffs, NJ.
Bass, D. H. (1981). "Using the Video Lookup Table for Reflectivity Calculations: Specific Techniques and Graphic Results," Computer Graphics and Image Processing 17, 249-261.
Boissonnat, J. D. (1988). "Shape Reconstruction from Planar Cross Sections," Computer Vision, Graphics and Image Processing 44, No. 1, 1-29.
Bresenham, J. E. (1965). "Algorithm for Computer Control of a Digital Plotter," IBM Systems Journal 4, No. 1, 25-30.
Brewster, L. J., Trivedi, S. S., Tuy, H. K., and Udupa, J. K. (1984). "Interactive Surgical Planning," IEEE Computer Graphics and Applications 4, No. 3, 31-40.
Bright, S., and Laflin, S. (1986). "Shading of Solid Voxel Models," Computer Graphics Forum 5, 131-137.
Brooks, F. P., Ouh-Young, M., Batter, J. J., and Kilpatrick, P. J. (1990). "Project GROPE - Haptic Displays for Scientific Visualization," Computer Graphics 24, No. 4, 177-185.
Chen, L. S. (1987). "Representation, Display and Manipulation of 3D Digital Scenes and Their Medical Applications," Ph.D. thesis, University of Pennsylvania.
Chen, L. S., and Sontag, M. R. (1989). "Representation, Display, and Manipulation of 3D Digital Scenes and Their Medical Applications," Computer Vision, Graphics and Image Processing 48, No. 2.
Chen, L. S., Herman, G. T., Reynolds, R. A., and Udupa, J. K. (1985). "Surface Shading in the Cuberille Environment," IEEE Computer Graphics and Applications 5, No. 12, 33-43.
Christiansen, H. N., and Sederberg, T. W. (1978). "Conversion of Complex Contour Line Definitions into Polygonal Element Mosaics," Computer Graphics 13, No. 2, 187-192.
Cleary, J. G., and Wyvill, G. (1988). "Analysis of an Algorithm for Fast Ray Tracing Using Uniform Space Subdivision," The Visual Computer 4, No. 2, 65-83.
Data Translation (1986a). "User Manual for DT2851 High Resolution Frame Grabber," Data Translation, Inc., Marlborough, MA.
Data Translation (1986b). "User Manual for DT2858 Auxiliary Frame Processor," Data Translation, Inc., Marlborough, MA.
Drebin, R. A., Carpenter, L., and Hanrahan, P. (1988). "Volume Rendering," Computer Graphics 22, No. 4, 65-74.
Dyer, D. S. (1990). "A Dataflow Toolkit for Visualization," IEEE Computer Graphics and Applications 10, No. 4, 60-69.
Ekoule, A. B., Peyrin, F. C., and Odet, C. L. (1991). "A Triangulation Algorithm from Arbitrary Shaped Planar Contours," ACM Transactions on Graphics 10, No. 2, 182-199.
England, N. (1986). "A Graphics System Architecture for Interactive Application-Specific Display Functions," IEEE Computer Graphics and Applications 6, 60-70.
England, N. (1988). "Application Acceleration: Development of the TAAC-1 Architecture," Technical Note 9, Sun Microsystems, Inc., Application Accelerators Group, Mountain View, CA.
Foley, J. D., and Van Dam, A. (1982). "Fundamentals of Interactive Computer Graphics," Addison-Wesley, Reading, MA.
Frieder, G., Gordon, D., and Reynolds, R. A. (1985). "Back-to-Front Display of Voxel Based Objects," IEEE Computer Graphics and Applications 5, No. 1, 52-60.
Fuchs, H., Kedem, Z., and Uselton, S. (1977). "Optimal Surface Reconstruction from Planar Contours," Communications of the ACM 20, 693-712.
Fuchs, H., Kedem, Z., and Naylor, B. (1980). "On Visible Surface Generation by A-priori Tree Structures," Computer Graphics 14, No. 3, 124-131.
Fuchs, H., Abram, G. D., and Grant, E. D. (1983). "Near Real-Time Shaded Display of Rigid Objects," Computer Graphics 17, No. 3, 65-69.
Fujimoto, A., Tanaka, T., and Iwata, K. (1986). "ARTS: Accelerated Ray-Tracing System," IEEE Computer Graphics and Applications 6, No. 4, 16-26.
Glassner, A. S. (1989). "An Introduction to Ray Tracing," Academic Press, San Diego, CA.
Goldstein, E. B. (1984). "Sensation and Perception," 2nd ed., Wadsworth, Belmont, CA.
Goldwasser, S. M. (1984). "A Generalized Object Display Processor Architecture," IEEE Computer Graphics and Applications 4, No. 10, 43-55.
Goldwasser, S. M., and Reynolds, R. A. (1987). "Real-Time Display and Manipulation of 3-D Medical Objects: The Voxel Processor Architecture," Computer Vision, Graphics and Image Processing 39, 1-27.
Goldwasser, S. M., Reynolds, R. A., Talton, D. A., and Walsh, E. S. (1988). "High Performance Graphics Processors for Medical Imaging Applications," in "Proc. Int. Conf. on Parallel Processing for Computer Vision and Display" (P. M. Dew, R. A. Earnshaw, and T. R. Heywood, eds.).
Gonzalez, R. C., and Wintz, P. (1987). "Digital Image Processing," 2nd ed., Addison-Wesley, Reading, MA.
Gordon, D., and Reynolds, R. A. (1985). "Image Space Shading of 3-Dimensional Objects," Computer Vision, Graphics and Image Processing 29, 361-376.
Gordon, D., and Udupa, J. K. (1989). "Fast Surface Tracking in Three-Dimensional Binary Images," Computer Vision, Graphics and Image Processing 45, No. 2, 196-214.
Grave, M., ed. (1990). "Proc. First Eurographics Workshop on Visualization in Scientific Computing," Paris.
Hearn, D., and Baker, M. P. (1986). "Computer Graphics," Prentice-Hall, Englewood Cliffs, NJ.
Herberts, I. (1989). "Realtime Transparent Volume Rendering on a Parallel Computer," Master's thesis, Afdeling Wiskunde en Informatica, Rijksuniversiteit, Leiden, the Netherlands.
Herr, L., ed. (1989). "Volume Visualization: State of the Art," ACM Siggraph Video Review, issue 44 (videotape).
Hersch, R. D. (1985). "Raster Rotation of Bilevel Bitmap Images," in "Eurographics 85" (C. E. Vandoni, ed.), 295-307, North-Holland, Amsterdam.
Hiltebrand, E. G. (1988). "Hardware Architecture with Transputers for Fast Manipulation of Volume Data," in "Proc. Int. Conf. on Parallel Processing for Computer Vision and Display" (P. M. Dew, R. A. Earnshaw, and T. R. Heywood, eds.).
Hockney, R. W., and Jesshope, C. R. (1988). "Parallel Computers 2," Adam Hilger, Bristol, England.
Hohne, K. H., and Bernstein, R. (1986). "Shading 3D Images from CT Using Gray Level Gradients," IEEE Trans. Med. Imaging 5, 45-47.
Horn, B. K. P. (1986). "Robot Vision," MIT Press, Cambridge, MA.
Huijsmans, D. P. (1983). "Closed 2D Contour Algorithms for 3D Reconstruction," in "Eurographics 83" (P. W. ten Hagen, ed.), 157-168, North-Holland, Amsterdam.
Huijsmans, D. P., Lamers, W. H., Los, J. A., Smith, J., and Strackee, J. (1984). "Computer-Aided Three-Dimensional Reconstruction from Serial Sections," in "Eurographics 84" (B. Tucker, ed.), 3-13, North-Holland, Amsterdam.
Huijsmans, D. P., Lamers, W. H., Los, J. A., and Strackee, J. (1986). "Toward Computerized Morphometric Facilities," The Anatomical Record 216, 449-470.
Huiskamp, W., Elgershuizen, P. M., Langenkamp, A. A. J., and Van Lieshout, P. L. J. (1990). "Visualization of 3-D Empirical Data: The Voxel Processor," in "Proc. Eurographics Workshop on Visualization in Scientific Computing" (M. Grave, ed.).
Iwata, H. (1990). "Artificial Reality with Force Feedback," Computer Graphics 24, No. 4, 165-170.
Jackel, D. (1985). "The Graphics PARCUM System: A 3D Memory Based Computer Architecture for Processing and Display of Solid Models," Computer Graphics Forum 4, 21-32.
Jackel, D. (1988). "Reconstructing Solids from Tomographic Scans: The PARCUM II System," in "Advances in Graphics Hardware II" (A. A. M. Kuijk and W. Strasser, eds.), Springer-Verlag, Berlin.
Jackins, C. L., and Tanimoto, S. L. (1980). "Oct-Trees and Their Use in Representing Three-Dimensional Objects," Computer Graphics and Image Processing 14, No. 3, 249-270.
Jansen, F. W. (1987). "Solid Modelling with Faceted Primitives," Ph.D. thesis, Technische Universiteit, Delft.
Johnson, E. R., and Mosher, C. E. (1989). "Integration of Volume Rendering and Geometric Graphics," in "Proc. Chapel Hill Workshop on Volume Visualization" (C. Upson, ed.), Dept. of Computer Science, Univ. of North Carolina, Chapel Hill.
Kaufman, A. (1986). "Memory Organization for a Cubic Frame Buffer," in "Eurographics 86" (A. A. G. Requicha, ed.), 93-100, Elsevier Science Publishers B.V. (North-Holland), Amsterdam.
Kaufman, A. (1987a). "An Algorithm for 3D Scan-Conversion of Polygons," in "Eurographics 87" (G. Marechal, ed.), 197-208, Elsevier Science Publishers B.V. (North-Holland), Amsterdam.
Kaufman, A. (1987b). "Efficient Algorithms for 3D Scan-Conversion of Parametric Curves, Surfaces and Volumes," Computer Graphics 21, No. 3, 171-179.
Kaufman, A. (1988a). "The CUBE Three-Dimensional Workstation," in "Proceedings NCGA '88 Conference," 344-354.
Kaufman, A. (1988b). "The CUBE Workstation - A 3-D Voxel-Based Graphics Environment," The Visual Computer 4, No. 4, 210-221.
Kaufman, A. (1991). "Volume Visualization," IEEE Computer Science Press, Washington, DC.
Kaufman, A., and Bakalash, R. (1988a). "CUBE - An Architecture Based on a 3D Voxel Map," in "Theoretical Foundations of Computer Graphics and CAD," 40 (R. A. Earnshaw, ed.), Springer-Verlag, Berlin.
Kaufman, A., and Bakalash, R. (1988b). "Memory and Processing Architecture for 3D Voxel-Based Imagery," IEEE Computer Graphics and Applications 8, No. 6, 10-23.
Kaufman, A., and Shimony, E. (1986). "3D Scan-Conversion Algorithms for Voxel-Based Graphics," in "Proc. ACM Workshop on Interactive 3D Graphics," 45-76, ACM, New York.
Kay, T. L., and Kajiya, J. T. (1986). "Ray Tracing Complex Scenes," Computer Graphics 20, No. 4, 269-278.
Keppel, E. (1975). "Approximating Complex Surfaces by Triangulation of Contour Lines," IBM Journal of Research and Development 19, 2-11.
Kong, T. Y., and Rosenfeld, A. (1989). "Digital Topology: Introduction and Survey," Computer Vision, Graphics and Image Processing 48, 357-393.
Laan, A. C., Lamers, W. H., Huijsmans, D. P., te Kortschot, A., Smith, J., Strackee, J., and Los, J. A. (1989). "Deformation-Corrected Computer-Aided Three-Dimensional Reconstruction of Immunohistochemically Stained Organs," Anatomical Record 244, 443-457.
Levoy, M. (1988). "Display of Surfaces from Volume Data," IEEE Computer Graphics and Applications 8, No. 2, 29-37.
Levoy, M. (1990). "Efficient Ray Tracing of Volume Data," ACM Transactions on Graphics 9, No. 3, 245-261.
Lobregt, S. (1979). "Logische Operaties op 3D Beelden," Master's thesis, Vakgroep Signaal/Systeem Techniek, Technische Hogeschool, Delft, the Netherlands [in Dutch].
Lobregt, S., Verbeek, P., and Groen, F. C. A. (1980). "Three Dimensional Skeletonization: Principle and Algorithms," IEEE Trans. Pattern Anal. Machine Intell. 2, 75-77.
Lorensen, W., and Cline, H. (1987). "Marching Cubes: A High Resolution 3D Surface Construction Algorithm," Computer Graphics 21, No. 4, 163-169.
Lu, H. E., and Wang, S. P. (1986). "A Comment on 'A Fast Parallel Method for Thinning Digital Patterns'," Comm. ACM 29, No. 3, 239-242.
Mantyla, M. (1988). "An Introduction to Solid Modelling," Computer Science Press, Rockville, MD.
May, D. (1987). "Occam 2 Language Definition," Technical Report, INMOS.
Naylor, B. F., and Thibault, W. C. (1986). "Application of BSP Trees to Ray-Tracing and CSG Evaluation," Technical Report GIT-ICS 86/03, School of Information and Computer Science, Georgia Institute of Technology, Atlanta.
Newman, W. M., and Sproull, R. F. (1979). "Principles of Interactive Computer Graphics," 2nd ed., McGraw-Hill, New York.
Owczarczyk, J., and Owczarczyk, B. (1990). "Evaluation of True 3D Display Systems for Visualizing Medical Volume Data," The Visual Computer 6, No. 4, 219-226.
Porter, T., and Duff, T. (1984). "Compositing Digital Images," Computer Graphics 18, No. 3, 253-259.
Post, F. H., and Hin, A. J. S., eds. (1991). "Proc. Second Eurographics Workshop on Visualization in Scientific Computing," Technische Universiteit, Delft.
Requicha, A. A. G. (1980). "Representations for Rigid Solids: Theory, Methods and Systems," ACM Computing Surveys 12, No. 4, 437-464.
Reynolds, R. A., Gordon, D., and Chen, L. S. (1987). "A Dynamic Screen Technique for Shaded Graphics Display of Slice Represented Objects," Computer Vision, Graphics and Image Processing 38, 275-298.
Rosenfeld, A., and Kak, A. C. (1982). "Digital Picture Processing," 2nd ed., Academic Press, Orlando, FL.
Sabella, P. (1988). "A Rendering Algorithm for Visualizing 3D Scalar Fields," Computer Graphics 22, No. 4, 51-58.
Samet, H. (1990a). "The Design and Analysis of Spatial Data Structures," Addison-Wesley, Reading, MA.
Samet, H. (1990b). "Applications of Spatial Data Structures," Addison-Wesley, Reading, MA.
Sandler, M. B., Hayat, L., and King, L. D. (1990). "Benchmarking Processors for Image Processing," Microprocess. Microsyst. 14, No. 9, 583-588.
Sedgewick, R. (1988). "Algorithms," 2nd ed., Addison-Wesley, Reading, MA.
Sloan, K. R., and Brown, C. M. (1979). "Color Map Techniques," Computer Graphics and Image Processing 10, 297-317.
Srihari, S. N. (1981). "Representation of Three-Dimensional Digital Images," ACM Computing Surveys 13, No. 4, 399-424.
Sun Microsystems (1988). "TAAC-1 Application Accelerator: User Guide," Sun Microsystems, Inc., Mountain View, CA.
Sutherland, I. E., Sproull, R. F., and Schumacker, R. A. (1974). "A Characterization of 10 Hidden-Surface Algorithms," ACM Computing Surveys 6, No. 1, 1-55.
Teunissen, W. J. M., and Van den Bos, J. (1990). "3D Interactive Computer Graphics; the Hierarchical Modelling System HIRASP," Ellis Horwood.
Thibault, W. C., and Naylor, B. F. (1987). "Set Operations on Polyhedra Using Binary Space Partitioning Trees," Computer Graphics 21, No. 4, 153-162.
Tiede, U., Hohne, K. H., Bomans, M., Pommert, A., Riemer, M., and Wiebecke, G. (1990). "Investigation of Medical 3D Rendering Algorithms," IEEE Computer Graphics and Applications 10, No. 2, 41-53.
Trivedi, S. S. (1986). "Interactive Manipulation of Three-Dimensional Binary Scenes," The Visual Computer 2, 209-218.
Trousset, Y., and Schmitt, F. (1987). "Active-Ray Tracing for 3D Medical Imaging," in "Eurographics 87" (G. Marechal, ed.), 139-150, North-Holland, Amsterdam.
Tufte, E. R. (1990). "Envisioning Information," Graphics Press, Cheshire, CT.
Tuy, H. K., and Tuy, L. T. (1984). "Direct 2-D Display of 3-D Objects," IEEE Computer Graphics and Applications 4, No. 10, 29-33.
Udupa, J. K. (1983). "Display of 3D Information in Discrete 3D Scenes Produced by Computed Tomography," Proc. IEEE 71, No. 3, 420-431.
Upson, C., ed. (1989). "Proc. Chapel Hill Workshop on Volume Visualization," Dept. of Computer Science, Univ. of North Carolina, Chapel Hill.
Upson, C., and Keeler, M. (1988). "V-Buffer: Visible Volume Rendering," Computer Graphics 22, No. 4, 59-64.
Veltkamp, R. C. (1991). "2D and 3D Object Reconstruction with the γ-Neighborhood Graph," Technical Report CS-R9116, CWI, Amsterdam.
Westover, L. (1989). "Interactive Volume Rendering," in "Workshop on Volume Visualization" (C. Upson, ed.), 9-16, Dept. of Computer Science, Univ. of North Carolina, Chapel Hill.
Zhang, T. Y., and Suen, C. Y. (1984). "A Fast Parallel Method for Thinning Digital Patterns," Comm. ACM 27, No. 3, 236-239.
Applications of Group Theory to Electron Optics
YU LI
Research Section of Applied Physics, Shanghai Institute of Mechanical Engineering, Shanghai, China
I. Introduction
II. M Function and Its Symmetry Group
   A. Some Concepts of Set Theory and Group Theory
   B. Group G × V of Transformations of ℛ × Ξ
   C. Symmetry Group G_φ of an M Function φ(P, ξ)
   D. Constraint Relations Among the mth Partial Harmonic Potentials
III. Applications to Electrostatic Multipoles
   A. The M Function for an Electrostatic Multipole
   B. Transformations of an Electrostatic Multipole
   C. Induced Transformations of Its M Function
   D. Symmetry Transformations of an Electrostatic Multipole
IV. Applications to Magnetostatic Multipoles
   A. The M Function for a Magnetostatic Multipole
   B. Transformations of a Magnetostatic Multipole
   C. Induced Transformations of Its M Function
   D. Symmetry Transformations of a Magnetostatic Multipole
V. A General Method for Deriving Constraint Relations
   A. Determination of the Symmetry Group G_P of a Multipole
   B. Determination of Constraints of the Symmetry Group G_φ
   C. Concrete Examples
Appendix: Application to Algebraic Reconstruction Techniques
References
I. INTRODUCTION

Electrostatic and magnetostatic multipoles with control vector ξ = (ξ₁, . . . , ξ_ω) have been applied to correct aberrations in electron optics. These multipoles have many symmetries in the Cartesian product space ℛ × Ξ of the ordinary three-dimensional Euclidean space ℛ and an ω-dimensional linear space Ξ consisting of all control vectors ξ. It is well known in mathematics that group theory presents a powerful tool for treating symmetries. In order to discuss the constraints of these complicated symmetries on the harmonic potentials of the multipole, we must utilize the rigorous and powerful methods of group theory. As is known from Ximen and Li (1982) and Ohiwa (1985), the electron optical properties of a multipole are, in essence, determined by the
special relations satisfied by its harmonic potentials. In fact, these relations are just the constraints of the symmetry group of this multipole on its harmonic potentials. Therefore, in order to choose an appropriate multipole to correct some aberrations, it is necessary to set up a rigorous and general method for deriving these constraint relations from the symmetry group of this multipole. We shall solve this important problem in the following steps:
1. Define the M function φ(P, ξ) for the multipole.
2. Derive the constraint relations among the mth partial harmonic potentials Φ_m1, . . . , Φ_mω of the multipole determined by a symmetry transformation of its M function.
3. Derive the induced symmetry transformation of the M function from the symmetry transformation of the multipole; hence from the symmetry group G_P of the multipole we can determine the symmetry group G_φ of its M function.
4. From the generating set of G_φ we can derive all the constraint relations of G_φ on the mth partial harmonic potentials. Hence from the constraint relations among the mth partial harmonic potentials we get the constraint relation for the mth harmonic potential.
It is important to note that there is no approximation in the proofs that follow; therefore, this method is rigorous. Furthermore, by this method, starting only from the symmetry group of the multipole, without knowing the concrete form of the multipole and its M function, one can derive the constraint relations of its harmonic potentials; hence this method is general. Precisely because of this, we can, moreover, add some additional conditions among the controlling variables ξ₁, . . . , ξ_ω, or give some concrete form of the multipole, to obtain more useful constraint relations for their harmonic potentials. Since the symmetry group of the M function of a multipole, especially its generating set, is very important for the determination of the constraint relations, it is best to find all the possible types of the symmetry group expressed by their generating sets; hence the author derived them (Li, 1988). Most results of Li (1988) are helpful for finding and understanding the generating set of the symmetry group. Its deductive arguments are mathematically too abstract and hence are omitted in this chapter. In Section II, the M function φ(P, ξ) for a multipole and the possible types of the symmetry group G_φ of the M function are introduced, and some important theorems (for example, the theorem about the constraint relations among the mth partial harmonic potentials determined by a symmetry transformation of the M function) are given. In Sections III and IV, the symmetry transformation of the M function for a multipole induced by a symmetry transformation of the multipole is derived. Since the electric
potential is a scalar field and the magnetic scalar potential is a pseudo-scalar field, and since under reflection the behavior of a magnetostatic multipole is more complicated than that of an electrostatic multipole, the conclusions and proof arguments for the two cases are quite different. Therefore, we discuss the electrostatic multipole in Section III and the magnetostatic multipole in Section IV. In Section V, a general method for determining the constraints of the symmetry group of a multipole on its harmonic potentials is given. In the appendix, an introduction to the application of group theory to algebraic reconstruction techniques is given.
II. M FUNCTION AND ITS SYMMETRY GROUP

A. Some Concepts of Set Theory and Group Theory
Some concepts and symbols of set theory and group theory used in this chapter are given here. They may be found in Jacobson (1974).

1. Set

g ∈ G: g is an element of the set G.
{g | P(g)}: the set of all g such that P(g) holds.
∪_{G∈A} G: the union of a collection A of sets; it is the set of elements belonging to at least one set of the collection A.
∩_{G∈A} G: the intersection of a collection A of sets; it is the set of elements belonging to all sets of the collection A.
×_{i=1}^{n} G_i: the Cartesian product of sets G₁, . . . , G_n; it is the set of all ordered n-tuples (a₁, . . . , a_n), where a_i ∈ G_i (i = 1, . . . , n).

×_{i=1}^{n} G_i may be written as G₁ × G₂ × . . . × G_n. If A is a finite collection of sets G₁, . . . , G_n, then ∪_{G∈A} G and ∩_{G∈A} G may be written as G₁ ∪ G₂ ∪ . . . ∪ G_n and G₁ ∩ G₂ ∩ . . . ∩ G_n, respectively.

2. Group
A group G is defined by the following conditions:
1. G has a binary operation xy defined on it.
2. The operation is associative: (xy)z = x(yz) for all x, y, z ∈ G.
3. G has an identity element 1: 1x = x1 = x for all x ∈ G.
4. Every element x ∈ G has an inverse x⁻¹: xx⁻¹ = x⁻¹x = 1.
A subgroup of a group G is defined as a subset of G that is a group relative to the binary operation in G. The subgroup [L] generated by a nonempty subset L of a group G is defined to be the intersection of all subgroups of G that contain L. The set L is called a generating set of [L]. If L is the set {l₁, . . . , l_r}, we write [L] = [l₁, . . . , l_r]. If G₁, . . . , G_n are n groups, then for any (a₁, . . . , a_n) and (b₁, . . . , b_n) in ×_{j=1}^{n} G_j we define (a₁, . . . , a_n)(b₁, . . . , b_n) to be (a₁b₁, . . . , a_nb_n); then ×_{j=1}^{n} G_j is a group, called the direct product of the groups G₁, . . . , G_n, under this binary operation. A map from a set S to S is called a transformation of the set S. In this chapter we consider groups of invertible transformations, with composition of two transformations as the binary operation. Hereafter, for any set S, the identity transformation of S is simply denoted by 1.

B. Group G × V of Transformations of ℛ × Ξ

1. A Group G of Orthogonal Transformations of ℛ
Consider the following orthogonal transformations of the ordinary threedimensional Euclidean space .% expressed in a cylindrical coordinate system (PI o,z>:
1. A rotation C ( a ) by angle
(Y
radian about the z axis:
for all (p,B,z) E 9 2. A reflection R ( a ) in the plane containing the half-plane 0 = a: C(a)(p,B,z)= (p,B+a,z),
R ( a ) ( p , B , z ) = (p,2a - B , z ) , for all (p,B,z) EB. Thus we have (Li, 1988), for any real numbers a, a l , and a2,
+ 2k7r) = C ( a ) , R(a+kr) = R(a),
C(a
(1)
(2)
k = integer,
(3)
k
(4)
= integer,
C ( a , > C ( a 2= ) C(Ql + a211
(5)
R(QI)R(az) = C[2(a1 - .2)1,
(6)
C(aI)R(a2) = R(Q2 + + I )
[C(a)]-l= C ( - a ) ,
= R(a2)c(-QI),
(7)
[ R ( a ) ] - '= R ( a ) .
(8)
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
235
Notations: 1. G denotes the group generated by all the above rotations C ( a ) and
reflections R ( a ) .
2. C,, = C(27r/n),
R, = R(7r/n)l
n # 0,
(9)
Then, by Eqs. (3-9), we have (Li, 1988)
G = { C J n 2 I } u {R,,ln 2 I}.
(10)
2. A Group V of Linear Transformations of E Consider the following linear transformations of E: For each p E { 1 , . . . ,w},let N,, be the linear transformation of E satisfying X = I , . . . ,w, for any
(N&)A = (-l)6eA(~,
< E Z,
( 1 1)
where SbA is Kronecker delta. Let S, be the group formed by all permutations of the set { I , . . . ,w}.For each p E S,, let a( p ) be the linear transformation of E satisfying X = 1 . . . . w, for any
(a(p)E)A = E ( p - l A ) ,
< E E,
(12)
Hence we have (Li, 1988)
4Pl)4P2) =4PlP2),
[4P)l-I
=
4P - %
(13) (14)
and for all A, p E { 1 , . . . ,w},for all p E S, we get (Li, 1988) N;
=
1,
NA
#
1,
N ANp = N,, N A ,
N A 4 P ) = “(P)N(p-IA).
Notations: 1 . V denotes the group generated by all the above Np and a(p ) . def 2. ( p ) = {A,, . . . , X r } , if p is a cycle (XI . . . X r ) .
By Eqs. (13-17), we obtain (Li, 1988)
Furthermore we can prove the following theorem (Li, 1988).
(15) (16)
(17)
236
Y U LI
Theorem 1. Zfv E V , we have
j= I
’€(pJ)
where p l , . . . ,pr are disjoint cycles satisfying
and k, = 0 or 1 ( u = 1 , . . . ,w). I f furthermore v2 = 1, then for each j E { 1, . . . , r } , p, is a two-cycle or one-cycle, and when pj is a two-cycle, say, pi = (ap),we have
k,
= k,.
(21)
Since P I , . . . , p r in Theorem I are disjoint, therefore by Eqs. (11, 12). [ O ( ~ , ) I I ~ ~ ( , , ~influences ) N , ~ ” ] only the set (pi), and for any i , j E { 1,. . . , r } ,
3. The Group G x V of Transformations of 9 x S For any g E G, for any v E V , the transformation (g,v) of the Cartesian product space 9 x E is defined as follows ( g ,. ) ( P I
E ) = w,v‘9,
( P ,E )
€
9 x E
(23)
The group formed by all these transformations (g,v) with composition of transformations as its binary operation is the direct product G x V of groups G and V . The ( g , 1) and (1, v) may be abbreviated to g and v, respectively. Hence by Eq. (23) we have (g,v)
= gv = vg.
(24)
C. Symmetry Group G, of an M Function p(P,6 ) Let R be a bounded fine cylindrical vacuum region in 9. In the following we always choose the cylindrical coordinate system ( p ,8, z ) such that its z axis is the rotation axis of R and its origin is in R.
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
237
1 . M Function p(P,[) An M function p(P,E) is a function satisfying (Li, 1988) 1. w
cp(P,<)= C
(P,O E Q x
z;
(25)
A= I
2. For each X E { 1 , . . . w } , pAis harmonic in R and is not rotational about the z axis; 3. For any p, X E { 1,. . . ,u}, if p # A, then pp# IZ pA.
Now give its physical interpretation as follows (Li, 1986). An electrostatic (or magnetostatic) multipole with control vector ( = (<, , . . . , Eu) is an ordered set of s poles arranged around the cylindrical vacuum region 0. We assume that w < s and for each k E { I , . . . ,s}, the excitation q k supplied to the kth pole is a linear function of c1 . . &, as follows ] .
]
w
where WkA (k = 1 , . . . ,s; X = 1 , . . . ,w)are real constants. This set of Eq. (26) is called the control relation for the multipole. In Sections I11 and IV, we shall prove that the potential cp of a multipole at a point P in R produced by this multipole is a function of P and as in Eq. ( 2 5 ) , where cpA is the potential field produced by the multipole when = 1 and other components of are 0. The function p(P,<)is called the Mfunction for the multipole, which, for the useful multipole, satisfies the previous mathematical definition of the M function. In the cylindrical coordinate system ( p , 0, z)we can expand cpA and p ( P 1I ) as follows (Li, 1986):
<
where the inner product of any two complex numbers a and b is denoted by ( a ,b) (Li and Ximen, 1982); QmA (m= 1 , 2 , 3 , . . . ; X = 1 ] . . . w) are complex-valued functions, QOA (A = I , . . . , w ) real-valued functions, and w
238
YU LI
Thus, the mth harmonic potential am of a multipole is a linear combination of the mth partial harmonic potentials am,. . . am.It is easy to prove that = (%P)I
(%YP) = (Y%P)l
(30)
where the upper bar indicates the complex conjugate. Thus by Eqs. (27, 28) the relation between am and {aml,. . . @ m u } , Eq. (29), is unchanged when the cylindrical coordinate system rotates by any angle about the z axis. 2. Transformation T ( g , v ) of the M Function p ( P , ( ) The transformation T(g,v) of the M function p ( P ,E ) is defined as follows (Elliott and Dauber, 1979): for all (PI()E a x = .
T ( g ,w)p(PlE ) = p(g-'P, v-lt),
(31)
Thus we have, for all g, gl g2 E G, for all w,v l , v2 E V , T ( g ,I VI 1T k 2 ,212) = T(g1g2I v1 V2) I (T(g,u)]-'= T(g-', .-I).
(32) (33)
By Eq. (24) T ( g ,v) may be denoted by T(gw). 3. The Possible Types of the Symmetry Group G, of the M Function p In this chapter we consider the following symmetry group G , of the M function p: G, = {T(gv)lg E GI
E
VI
T(gw)cp= p).
(34)
We can prove the following theorems (Li, 1988).
Theorem 2. r f n is a positive integer and T(Cnw)E G,, then 21"
= 1.
(35)
Theorem 3. For any real number s, if T(R,v) E G,, then v2 = 1.
(36) By Theorems 1 and 3, the generator T(R,w) is very convenient to be treated. Therefore one prefers to express the symmetry group G, with more generators of type T(R,v) to the utmost.
Theorem 4. We can choose an appropriate basic half-plane 0 = 0 for a cylindrical coordinating system ( p ,0, z ) such that G, can be expressed as one of the following types:
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
wheren=2,3,4, ...;
V,V~,V,,E
V
239
un = v 2l = v ,2= l .
Li (1988) remarks: If Clv E G, then C,v = 1. Consequently, only two following types of generators for the symmetry group of the M function should be considered: T(C,V),
where n
= integer,
T(R,u),
(38)
and v takes the form described in Theorem 1.
D. The Constraint Relations Among the mth Partial Harmonic Potentials
nut
Theorem 5. If T(g nS=lo(p J ) ( p J )N:) is a symmetry transformation of an M function p ( P ,t),where g = C, or R, with n being integer; p I, . . . ,pr are disjoint cycles satisfying Eq. (20);k , = 0 or 1 (v = 1,. . . ,w ) , then for each , . . . , amw m E { 0 , 1,2, . . . , }, among the mth partial harmonic potentials am1 hold the ,following constraint relations: 1. When g = C,, these relations are
(- 1 l k A@mX 2. When g
=
e~2m?r/n--@m(p,~),
x ~ ( p j ) , j=I,...,r.
(39)
j = li...,r.
(40)
R,, these relations are
(- l)kA5mXe‘2mx/n-@m(p,~)i
E
(Pj)i
Proof: Eqs. (1, 2, 8, 9, 27) yield
By Eq. (30),we can rewrite Eqs. (41, 42) as follows:
240
YU LI
Since p l , . . . ,pr are disjoint cycles satisfying Eq. (20), so by Eqs. (1 1-16, 23, 25, 31) we have
EZl nuE(PJ)
Now T(g o(p j ) N?) is a symmetry transformation of the M function p(P,E ) , so by the preceding equation we have
cC i=l
XE(P,)
Cc r
r
(-l)kA<(P,A)VA(g-'P)=
t(pJA)V(PJA)(p).
(46)
j=1 AE(P,)
Since [ in Eq. (46) is an arbitrary element of Z, Eq. (46) yields (Li, 1986) (-PV'Px(g-lp) = V(P,A)(P)l
(47)
where X E (pi),j = 1 , . . . , r, and P is arbitrary in R. We now derive Eqs. (39,40) from Eq. (47). When g = R,, from Eqs. (47, 44, 27) we get
which holds for any real value of 0. Hence the corresponding coefficients of
24 1
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
the trigonometric series on both sides of the preceding equation are equal:
which hold for any p E [0,c), here c is some positive number. Accordingly, the corresponding coefficients in the power series on both sides of Eq. (49) are equal, thus we get Eq. (40). Similarly we can derive Eq. (39).
111. APPLICATIONS TO ELECTROSTATIC MULTIPOLES
A . The M Function for an Electrostatic Multipole
An electrostatic multipole with control vector E = (El , . . . ,Ed) is an ordered set of s solid conductors, arranged around a cylindrical vacuum region R. We assume that w < s and for each k E { 1 , . . . , s}, the excitation potential q k supplied to the kth conductor is a linear function of variables El, . . . , as in Eq. (26), which is called the control relation for the electrostatic multipole. Since the multipole is in reality of finite extent, the electric potential field cp outside the previously mentioned conductors produced by the multipole is (Glaser, 1952)
<,
k= 1
where & is the potential field produced by the multipole when the excitation potential q/, of the kth conductor equals 1 and excitation potentials of other conductors equal 0. Thus by Eq. (26) we know that the potential cp at a point P in R produced by such a multipole with control vector ( is a function cp(P,E) of P and as Eq. (25), which is called the M function for the electrostatic multipole.
<
B. Transformations of an Electrostatic Multipole Let there be only one electrostatic multipole in 2.The transformation (g,v) of this multipole is defined as follows. Under the transformation (g,v) of the Cartesian product space 9 x E , the conducting material with its order at
242
each point P of point P* with
YU LI
W is assumed to be transferred to the corresponding new
P* = g P , (51) at each [ of Z are assumed to be and the excitation potentials ( q l , . . . ,a) transferred to the corresponding new [* with [* = v[. ( 52) The preceding transformation ( g , w) changes a multipole to some new one that we are going to derive. Being an orthogonal transformation of W,g does not change the distance between any two points of W (Ilyin and Poznyak, 1984). So, for each k E { 1 , . . . ,s}, g changes the kth old conductor of this multipole to a kth new conductor with the kth old conductor surface being changed to the kth new conductor surface, and hence, the old multipole of s ordered poles is changed to a new multipole of s ordered poles. Furthermore, v transfers the excitation potentials (ql, . . . ,qs)from corresponding to the old control vector [ to corresponding to the new control vector [* = v[; i.e., transforms the old control relation given by Eq. (26) into the new control relation 77; = A;([*),
k = 1,... ,s,
(53)
which satisfies
A= 1
for any k E { 1, . . . ,s}. Therefore we have
X=l
Now (q;,. . . ,Q;) and [* are the excitation potentials and control vector for this new multipole, respectively, so the control relation for this new multipole may be written as (Li, 1987) W
by replacing the notations T i , . . . ,71,* and <* in Eq. (55) with u l , . . . , qs and <, respectively. Let w take the following form w
u= 1
we have, by Eqs. (1 1, 12, 14, 15),
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
(w-'<)~= (-l)kA
= 1 , . . . ,w, for any
E E E,
243 (58)
The transformation g of a multipole can be easily understood. By comparing Eq. ( 5 6 ) with Eq. (26), we see that the transformation u of a multipole can be realized by replacing each Ex with ( v - ' < ) ~for any E E Hence, the transformation w of the form in Eq. (57) can be realized by replacing each Ex with (- I ) k A E ( p Xfor ) any E E.Hence we get the physical meaning of Np and o(p ) as follows.
<
<
Proposition 1. For a given multipole with control vector = (tl,.. . ,EU), N,, means to change the sign of the pth controlling variable E,, only, u(p ) means to carry out the permutation p of the ser (El, . . . ,EU}. Henceforth, the transformations ( g , 1) and ( 1, u) of an electrostatic multipole are abbreviated to g and v, respectively. Then we have Eq. (24). C. Induced Transformations of Its M Function
Given a rectangular coordinate system (0,(e,)} for W,with its origin 0 lying on the z axis and its associated orthonormal basis being (e,) = ( e l , e 2 , e 3 )Then . the position vector r = OP of a point P with coordinates (x,) E ( X I , x2, x3) can be expressed as follows: 3
r=
C x,e,.
(59)
a= I
An orthogonal transformation g (= C, or R,) of W changes each point P of W to the new point P* = g P , each position vector r to the new position vector r*:
(1=
I
with and hence changes the old rectangular coordinate system (0,(e,)} to the new rectangular coordinate system (0,(e:}. Thus the (point) transformation P + P' = gPofWinducesacoordinatetransformation ( 0 ,(e,)} -+ ( 0 ,(e:)} of 9. A transformation gv of an electrostatic multipole induces a corresponding transformation of its M function, which we shall derive in the following. The M function of the old electrostatic multipole is denoted by cp(P,().
244
YU L1
First, let this multipole undergo transformation g (= C,, or R,,).Thus its excitation potentials (q,,. . . , qs) and control vector remain unchanged, the old multipole of s ordered conductors changes to a new multipole of s ordered conductors. By Eqs. (59, 60), the coordinates of the point P* of the new multipole expressed in { 0 , ( e : ) } are just the coordinates (x,) of the corresponding point P of the old multipole expressed in ( 0 ,( e , ) } . Therefore the equations of the conductor surfaces of the new multipole expressed in ( 0 ,(e:)} are just the equations of the conductor surfaces of the old multipole expressed in { 0,( e , ) } , say,
<
f k ( x , ) = 0, k = 1 , . .. ,s, (62) Let 'pf(P)and 'p;(P*)be the potential fields produced by the old multipole and the new multipole, respectively, and let
CPC(~,)= ~ ~ ( P ) l ( e o ) , C P ; ( X ~= )
q;(P*)l(e;),
(63) (64)
where P* = gP, and subscripts (e,) and ( e z ) indicate the representations in ( 0 ,( e , ) } and ( 0 ,(e:)}, respectively. Since all multipoles are in reality of finite extent, the potential field ' p f ( x a ) outside the conductors of the old multipole and the potential field ' p ; ( x , ) outside the conductors of the new multipole are the solutions of the same Laplace equation (Jackson, 1975): (65) satisfying T%2)lfk(xu)=O
=
and lim $(x,) = 0, T'OO
So, by the uniqueness theorem, we 'p;(xa)= ' p < ( X a ) .
(68)
Thus, by Eqs. (63, 64), (69) where P* = gP, with g = C,, or R,. Since the electrostatic potential is a scalar field, we have (Arfken, 1985) v;(P*)l(e:)= vc(P)I(en),
'p;(p*)l(e:) = 'p;
(P*)l(e,)*
Hence Eq. (69) gives q;(P)I(ee)= 'p<(g-'p)I(e").
(70)
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
245
which is just the representation of the following equation in a given rectangular coordinate system:
C P p ?= (PdR-IP).
(71) Both sides of Eq. (71) are scalar fields, so it holds for any rectangular coordinate system. Now the in q ( P ) and cp;(P) is an arbitrary element of E,so by Eq. (71), the M function for the new multipole is
<
CP*(PIt) = cp(g-'p,t). (72) Next, let this new multipole undergo a transformation v, then, by Eq. ( 5 6 ) , we get another new multipole with its M function being
cp**(P,t)= c P ( R - k ~ - ' o . Accordingly, we get the following important proposition (Li, 1987).
(73)
Proposition 2. A transformation g71 of an electrostatic multipole induces a corresponding transformation T(gv) of its Mfunction p(P , E ) defined by Eq. (31). D. Symmetry Transformations of an Electrostatic Multiple A transformation gv is called a symmetry transformation of a given electrostatic multipole, if gv does not change the distribution of its conductors and that of its excitation potentials in 9 no matter what value the control vector of the old multipole takes. Here the orders of these conductors are not considered in the previous distributions. Since the electrostatic potential field outside the conductors of an electrostatic multipole is uniquely determined by the distribution of its conductors and that of its excitation potentials irrespective of the orders of these conductors, the M function p(P,E) for a multipole is unchanged under the symmetry transformation gv of the multipole. So, by Proposition 2, we get the following proposition (Li, 1987). Proposition 3. I f g v is a symmetry transformation of an electrostatic multipole, then T ( g v ) is a symmetry transformation of its M function. IV. APPLICATIONS TO MAGNETOSTATIC MULTIPOLES A . The M Function for a Magnetostatic Multipole A magnetostatic multipole with control vectors t = (I,, . . . , td)is an ordered set of s coils of wire wound around s cores that are arranged around a
246
YU LI
cylindrical vacuum region R. We assumed that each turn in the s coils is nearly closed. Along any coil, a reference current direction is given, and points of the coil are ordered so that the reference current direction is from its initial terminal to its final terminal. Therefore the reference current direction is determined once the initial terminal is given. We assume that w < s and for each k E { 1,. . . s}, the excitation current qk supplied to the kth coil is a linear function of controlling variables El,. . . Ew as in Eq. (26), which is called the control relation for the magnetostatic multipole. The conduction current distribution j, of the magnetostatic multipole is given by us as follows (Li, 1992):
where rk(Sk) describes the geometrical distribution of the kth coil with s k being the arc length measured from its initial terminal, prk(r - rk) denotes the projection of r - rk on the plane perpendicular to drk/dsk, S[prk(r- rk)] is the two-dimensional Dirac delta function (Arfken, 1985) in the previously mentioned plane. Hence if the magnetic material of the s cores works in a range such that
B = pH, (75) where p is independent of the magnetic field, then from relations (Jackson, 1975; Glaser, 1952)
B = -V$,
(78) we see that, by taking the magnetic scalar potential at the origin to be 0, the magnetic scalar potential field $ in the region R produced by the multipole is
where 4 k is the magnetic scalar potential field in R produced by the multipole when the excitation current r]k of the kth coil equals 1 and that of other coils equal 0. Thus, by Eq. (26), the magnetic scalar potential $ at a point P of 0, produced by the multipole with control vector E, is a function of P and E as follows: w
$JV,E) = CEX$X(P), X=l
(P10 E 0 x
5
(80)
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
247
where
This is the magnetic scalar potential field produced by the multipole when = 1 and all other components of equal 0. The function @(P,<) is called the M function for the magnetostatic multipole.
<
B. Transformations of n Magnetostatic Multipole Let there be only one magnetostatic multipole in W.The transformation gv of this multipole is defined as follows. Under the transformation (g,u)of the Cartesian product space W x E , the conducting and magnetic material with its order at each point P of W is assumed to be transferred to the corresponding new point P* satisfying Eq. (51), and the excitation currents (ql,. . . , 71,)at each [ of 3 are assumed to be transferred to the corresponding new <* satisfying Eq. (52). Following the arguments given in the case of electrostatic multipole, we can prove that the above transformation (g,w) changes an ordered set of s coils of wire wound around s cores into a new ordered set of s coils of wire wound around s new cores and changes the old control relation, Eq. (26), into the new control relation, Eq. (56). Therefore Proposition 1 holds also f o r the magnetostatic multipole. Hereafter, the transformations (g, 1) and ( 1 , w) of a magnetostatic multipole are abbreviated to g and TI, respectively, and Eq. (24) holds. By definition, the transformation g is order preserving. So, under transformation g, the initial terminal of any old coil is transferred to the initial terminal of its corresponding new coil and hence the reference current direction of the new coil is determined. By Eqs. (56, 24), the transformation g of the multipole does not change the control relation. Thus, for a given control vector = ( E l , . . . ,<"), the excitation currents (q,,. . . , q 3 ) are unchanged under transformation g. However, under transformation g, the rotation sense of any circulation in each nearly closed turn of any coil is unchanged if g is a rotation, and is reversed if g is a reflection (Arfken, 1985), so do the rotation sense of reference current direction and excitation current in each turn of any coil. Here we must note that, under the transformation g, the screw sense of any coil is unchanged if g is a rotation and reversed if g is a reflection (Schouten, 1954). Now instead of R, we consider the following transformation (Li, 1992):
<
R , = R,N = N R , ,
(81)
248
YU LI
where N = NI . . . Nu,
(82)
By Eqs. (81, 82) and Proposition 1, we see that R,, is such a transformation, under which the multipole is subjected to transformation R,, and at the same time each q k is replaced by ( -qk), respectively ( k = 1, . . . ,s), hence, in each turn of any coil, the absolute value and rotation sense of excitation current remain unchanged. C. Induced Transformations of Its M Function
A transformation gv of a magnetostatic multipole induces a corresponding transformation of its M function, which we shall derive in the following. The M function of the old magnetostatic multipole is denoted by $(Pl 5). First, let this multipole undergo transformation g (= C,, or R,,). Thus, its excitation currents (ql, . . . , qs) and control vector f remain unchanged. The old multipole of s ordered coils and cores changes into a new multipole of s ordered coils and cores. By Eqs. (74, 59, 60), the new conduction current distribution j;(r*) is (Li, 1992)
where r* = gr, ri = grk and prk(r*- ri) denotes the projection of r* - ri on the plane perpendicular to dr;/dsk. Here, by Eqs. (59, 60), the coordinate of the point P*(=gP) of the new multipole expressed in ( 0 ,(e:)} are just the coordinates of the corresponding point P of the old multipole expressed in { 0,( e a ) } .Therefore we have (Li, 1992) ~ ( x a=) P(f'*)I(e;)
= dP)l(eo),
(84)
[ x k y ( s k ) l = ri(sk)l(e:)
= rk(sk)l(e,).
(85)
Hence by Eqs. (85, 83, 74), we get 1.&(xa)1 =j;(p*)l(e:)
= j&P)l(en).
(86)
Let Be and B; be the magnetic induction produced by the old and new multipoles, respectively, and let [B
(87)
l~;o(xo)l= B;(P")l(c,,
(88)
where P* = g P , then [Beo(xa)] and [B;p(x,)] are solutions of the same set of
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
249
following equations:
where =
{
if a , P , y is an even permutation of 1, 2, 3; if 0, p, y is an odd permutation of 1, 2, 3; otherwise.
fl,
1 0,
-
(91)
Since a magnetostatic multipole in reality is of finite extent, all its conduction and magnetization currents are confined to a bounded region. Hence, its magnetic induction B tends to 0 at infinity as quickly as l/lr13 (Jackson, 1975). Thus, we can prove that for given distributions p(x,), and j E r ( x a ) there is only one solution of Eqs. (89, 90) for [BEo(x,)].Therefore, we have (Li, 1992) =
P&N)I
P<JX,)l.
(92)
Let y+ and 11; be the magnetic scalar potential fields in region R produced by the old and new multipoles, respectively, and let $<(xa) = +f(P)l(e,,)j
PE
$;(xo) = $:(P*)I(e;,l,
P'
a1
(93)
R,
(94)
E
where P ' = gP, then q+(x,) and $:(xa) are solutions of the same set of the following equations: =
-alClc/ax
1,
P
= 1,2,3,
CL<(Ol 0,O) = 0.
(95) (96)
So we have @(Xa)
Thus Eqs. (93, 94) give
= @&CJ?
+: (P') I
(xn) E 0.
(e:,) == +<(P)I(e,).
(97) (98)
Hence,
+;(PII ( e 3
= + < ( g - ' P )I(e,).
(99)
By Jackson (1975), we see that the magnetic scalar potential is a pseudo-scalar
250
YU L1
field, so we have (Arfken, 1985) if g = C,, if g = R,. Eqs. (99, 100) give if g = C,,
if g = R,, which is just the representation of the following equation in a given rectangular coordinate system:
Both sides of Eq. (101) are pseudo-scalar (Arfken, 1985), so, Eq. (101) holds in any rectangular coordinate system. Now [ in Eq. (101) is an arbitrary element of 3. Therefore, after transformation g the M function for the new multipole is (Li, 1992)
Next, let this new multipole undergo a transformation v, then by Eq. (56) we get another new multipole whose M function is
By Eqs. (103, 25, 81, 82), we get the following important Proposition (Li, 1992).
Proposition 4. A transformation hv ( h = C, or R,) of a magnetostatic multipole induces a corresponding transformation T b(hv) of its M function +( P, E ) as follows: T*(C,u) = T(C,w),
T*(R,v) = T(R,v),
(104)
where operator T(gv) is deJined by Eq. (31). D. Symmetry Transformations of a Magnetostatic Multipole
A transformation hv ( h = C, or R,) is called a symmetry transformation of a given magnetostatic multipole, if it does not change the distributions of its
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
251
magnetic material, conducting wire, and excitation currents in 9 no matter what value the control vector of the old multipole takes. Here the orders of these coils and their cores are not considered in these distributions. Since the magnetic scalar potential in the region R produced by a magnetostatic multipole is uniquely determined by the distributions of its magnetic material, conducting wire, and excitation currents in 9 and the assumption that the magnetic scalar potential at the origin is 0 , hence, by Proposition 4 and the definition of a symmetry transformation of a magnetostatic multipole, we have the following proposition (Li, 1992).
Proposition 5. A symmetry transformation hv ( h = C, or R,) of a magnetostatic multipole induces a symmetry transformation T*(hv)of its M function as defined by Eq. (104). V. A GENERAL METHODTO FINDCONSTRAINT RELATIONS A . Determination of the Symmetr.v Group Gp of a Multipole
Here we consider only such a multipole, the excitation vk of which equals just one of the controlling variables C I , . . . ,CU ( k = 1 , . . . ,s). Other than the symmetry group G , of the M function p of the multipole, the corresponding symmetry group Gp of the multipole can be seen directly, and the former can be easily derived from the latter by Propositions 3 and 5. Therefore, it is important to find the symmetry group G,, of the multipole. Then, how do we determine the symmetry group C, of the multipole from its geometrical distributions of conducting material, magnetic material, and controlling variables C l , . . . ,CU in a? First, in view of Theorem 4, the group G , takes only one of types described by Eq. (37). Then, by Proposition 3, the corresponding symmetry group Gp of an electrostatic multipole may probably be one of the following types:
4. [Rnvn,R ~ v t l ,
v" = vl2 = v;' = IE.AndbyPropwheren = 2 , 3 , 4 , . . .; v , v l , v nE V osition 5, the corresponding symmetry group Cpof a magnetostatic multipole may probably be one of types described by Eq. (105) with symbol R being replaced by R . Here, we must emphasize that the transformation Rn may be considered as R, with the absolute value and rotation sense of current in each turn of any coil remaining unchanged. It is worthy to note that the generating set of any one of these types of groups contains at most two elements.
252
YU LI
By the definition of a symmetry transformation of a multipole, if gw were a symmetry transformation of a multipole, this transformation would not change the distributions of its conducting material and magnetic material in g . Since w does not influence the distribution of conducting material and magnetic material, so g must be a symmetry transformation of the multipole with respect to the distributions of conducting material and magnetic material only; that is, without considering the distributions of controlling variables E l , . . . ,E, in 9. Though the converse is not necessarily true, we had better first find the symmetry transformation g of the multipole without considering the distributions of controlling variables. Ifg is C, or the multipole is electrostatic, we prefer to find the w so that gv does not change the multipole. If such w exists, the multipole has the symmetrygw. Ifgis R, and the multipole is magnetostatic, we must replace R, by R,, then try to find the w so that R,v does not change the multipole. If such w exists, the multipole has the symmetry R,v. This w may be expressed by Eq. (19). Here, in finding w, we must utilize Eqs. (13-17) and Proposition 1 and must note that the product wIv2 of two transformations wl,w2 means to perform first w2, then wl.If the multipole possesses a reflection type symmetry, say R ,w 1 for an electrostatic multipole or R,vl for a magnetostaticmultipole, relativetosomeplane,wehadbettertakeitasagenerator. Here we have chosen a half-plane lying in this plane as the basic half-plane. Then we try to find another reflection type symmetry that cannot be generated by the first reflection type generator. If no such new symmetry exists, we have G, = [R1wI] (or G, = [l?lvl]). If this new reflection type symmetry does exist, we must find the second reflection type generator R,u, (or R,w,) that cannot be generated by the first reflection type generator, and R, is a reflection in a plane with smallest intersection angle with the basic half-plane. Thus we have Gp = [ RIw1, R,w,] (or G, = [R1vl,R,w,]). If the multipole contains no reflection type symmetry, we must trytofindallitsrotationtypesymmetriesC(B)w.ThenwehaveG, = [C,w], where C, = C(Bmi,)with eminbeing the minimum of all previous 0. B. Determination of Constraints of the Symmetry Group G,
Knowing the symmetry group Gp of a multipole, by Propositions 3 and 5, we can easily get its corresponding symmetry group G, of the M function for this multipole and express G, by its generating set. Hereafter, o(p) and T(gv) are abbreviated to p and gw, respectively. From the symmetry group G, of the M function for a multipole with control vector t = (t,,, .. ,E,), we can derive the constraint relations among the mth partial harmonic potentials aml, . . . , amw by applying Theorem 5 to elements of G,. Since the symmetry group G, is generated by its generating set, we need only consider all elements in the generating set. Applying Theorem 5 to these generators we get a set of constraint relations among
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
253
a",,,. . . ,amw.Then utilizing Eq. (29) we get the constraints of G, on all harmonic potentials of the multipole. Since for a given m, the Q m is as yet not determined completely by these constraints, it also depends on the controlling variables t', , . . . , [, so we can add more constraint conditions, for example, put am = 0 for some integer m. This will give some additional relations among E l , . . . ,.,[ C. Concrete Examples
The previous sections A and B present a rigorous and general method for deriving the constraint of the symmetry group Gp of a multipole on its harmonic potentials. In this section we give some concrete examples to illustrate the applications of this general method. The symmetries (but not the concrete configurations) and the basic half-plane 0 = 0 of the cylindrical coordinate system ( p , B , : ) are indicated in Fig. 1. However, the rotation senses on any circulations are not indicated in it, therefore the symmetry transformation R,v of a magnetostatic multipole seems to be the symmetry transformation R,v of an electrostatic multipole. In order to avoid confusion, by Propositions 3 and 5, we prefer to take the expression of the symmetry group G, of the M function of the multipole indicated in Fig. 1 . i l
52
I
i 3
254
YU LI
1. Deflectors All the M functions of these multipoles possess the symmetry NC2, so, by Eq. (39), we get -QmX ei2mn/2
-amX, -
m = 0 , 1 , 2 , 3, . . . ,
X = l , . . . ,w ,
(106)
which yields Qmx=O,
m = 0 , 2 , 4 , 6, . . . .
X = l , . . . ,w ,
(107)
Hence we have, by Eq. (29), i f m = 0 , 2 , 4 , 6,....
Qm=O,
(108)
2. Detail Analysis of Case d i n Fig. 1 The symmetry group of its M function is (Li, 1987) G , = [RI( 1 (3IN3 (24)NzN4,
R8 ( 12)(34)N3N41.
(109)
Hence by Theorem 5 , we have 6ml
= Qml 1
-
-am3
= @m3,
-
-@m2
6ml eimn/4 -5 m4 eimn/4
= Qm4r =Q -
m2 3
am33
where m E {0,1,2,.. .}. So, by Eq. (29), the mth harmonic potentials can be written as follows: @(2k) @(2k+l)
= O, = =
(El
{
El
+ E2ei(2k+ 1 ) n / 4 + J3ei(2k+1)"/2 + ( E 2 - t 4 ) COS
(2k
-
[4e-i(2k+l)a/4 )@(2k+l)l
+ 1). 4
where k = 0,1,2,. . . . This is the constraint relations of the harmonic potentials of this multipole. Its nonzero harmonic potentials depend linearly on the components of [. Thus we can set some relations among variables E l , Ez, t3,E4 such that some nonzero harmonic potentials satisfy new conditions. For example, if we wish that this multipole become a deflector of
APPLICATIONS OF GROUP THEORY TO ELECTRON OPTICS
255
high quality, we must further demand @3
( 1 12)
= 0.
It is easy to see that @31 is a nonzero function. Therefore Eqs. ( I 1 1 , 112) give additional constraint relations as follows
I
1
-G + -( E 2 + (4)
El - -((2 - E 4 ) = 01
v2
= 0.
( 1 13)
Thus we get also @* = 0,
+ i&)@I,.
(114)
= 2(E1
@I
This is an octopole with two-dimensional control vector
(El E3).
3. Detail Analysis of Case b in Fig. I The symmetry group G , of its M function is [ R I(1)(2)12,
R4( 12))
Hence, by Theorem 5, we have ~
@mi
= @mi
i
-
= am21
-am2 ei2mn/4
ml
-
@
m2 1
for m = 0, 1,2,3,. . . . The mth harmonic potential is
am
+
= < I @ ~ I €2@m2.
(117)
Consequently, we have if m = 0 , 2 , 4 , 6 , . . .
01
@,nl(
ifm=1,5, ... ifm=3,7, ...
( 1 18)
Let
I
= El
+ i121
(119)
then if m = I , 5 , . . . if m
= 3,7,. . .
which is the generalization of Eq. (A2) of Ximen and Li (1982).
256
YU L1
APPENDIX TO ALGEBRAIC RECONSTRUCTION TECHNIQUES’ APPLICATION In algebraic reconstruction techniques (ART) for electron microscopic images (Ximen and Kapp, 1986) three orthogonal projections for a threedimensional (3D) objects were used. According to the eigenvalue and eigenvector analysis in multivariate matrix theory, an observed projection image can be described by a data matrix and its variance-covariance matrix. The mass distribution is described by a n x n x n squared matrix: M = (mijk),
i , j , k = 1 , . . . ,n,
the symmetry group of which is denoted by G,. Let a rectangular coordinate system be ( x , y ,z), i, j, k be the unit vector along x , y , z axes, and x axis be the rotation axis of a 3D object in ART. The geometrical transformation groups for a 3D object are defined by Go, which keeps the x axis and its origin 0 invariant. Some special elements of Go are listed as follows (Hamermesh, 1962): Ci(cp):Rotation about axis i by angle ‘p. Ci,:Rotation about axis i defined as Ci, = Ci(27r/p), p # 0. and 0;: Reflection in a plane with the normal i. oV:Reflection in a plane with the normal v Ii. For a 3D object described by a n x n x n matrix ( m $ ) with its center at the origin 0 of the coordinate system (x, y , z ) the i,j , k projection operators P i , Pj, Pk are defined as follows:
For any data matrix F i t s variance-covariance matrix V F is the product of F and its transpose FT: VF
’ See Ximen and Li (1986).
=F~F.
APPLICATIONS OF G R O U P THEORY TO ELECTRON OPTICS
257
In ART we usually consider that the test object rotates about axis i by an angle p. Therefore, its 3D mass distribution matrix M , 2D projection P , M ( t = i,j, k ) and their variance-covariance 2D matrix V P , M ( t = i,j , k ) will be changed as follows: M
+
C,(Cp)M
P,M
+
P,C,(p)M
VP,M
+
VP,c,(p)M
I
( t = i,j,k)
It is shown that these matrices are periodic functions of variable p with periods closely related to the subgroup C = G, n Go. Confined to the case of finite point group, C may be taken as one of the following four cases (Harmermesh, 1962; Miller, 1972):
[C,,,l.
ICI,‘?0 1 1 7
lC,/t’a,,l,
“TICl(2p)Ir
where p E { 1 , 2 , 3 , .. .}. Here we have utilized the result that each element g of C is a transformation that keeps the .Y axis and its origin 0 invariant. Let the projection direction k be along the electron beam, and the test object be rotated by a series of angles p around axis i. We have the following conclusion (Ximen and Li, 1986). Let the period of VPkC,(cp)Mis a divisor of p , then, p = 2 n / p for case (1)-(3), and different from this, p = 7r/p for case (4). Furthermore the periods of the eigenvalue spectrum X,(p) and eigenvector system x,(cp) are also divisors of p . In principle, one can utilize these periodicities of the eigenvalue spectrum and eigenvector system to classify the orientation of the rotating test object.
REFERFNCES Arfken, G . (1985). “Mathematical Methods for Physicists,” 3rd ed.. Academic Press, Orlando, FL. Elliott, J. P., and Dauber, P. G . (1979). “Symmetry in Physics.” Vol. I , Macmillan, London. Glaser, W. (1 952). “Grundlagen der Electronenoptik,” 94, Springer-Veriag, Vienna. Hamermesh, M . (1962). “Group Theory and Its Application to Physical Problems,” AddisonWesley, Reading, MA. Ilyin. V. A,, and Poznyak, E. G . (1984). “Analytic Geometry,” Mir Publishers. Moscow. Jacobson, N. (1974). “Basic Algebra I,” W. H. Freeman. San Francisco. Jackson, J . D. (1975). “Classical Electrodynamics,” 2nd ed., John Wiley and Sons, New York. Li, Y. (1986). Optik 75, 8. Li, Y . (1987). Optik 76, 48. Li, Y. (1988). Acru Math. Scientiu 8, 131, 353. Li, Y . (1992). Acru Phys. Sinicu 41 (to be published).
258
YU LI
Li, Y., and Ximen, J. (1982). Acra Phys. Sinica 31, 604; U p f i k 61, 315. Miller, W., Jr. (1972). “Symmetry Groups and Their Applications.” Academic Press, New York. Ohiwa, H . (1985). Uptik 70, 72. Schouten, J. A. (1954). “Ricci-Calculus,” 2nd ed., 5, Springer-Verlag, Berlin. Ximen, J., and Kapp, 0. H. (1986). Upfik 72, 87, 143. Ximen, J., and Li, Y. (1982). A c f a Phys. Sinicu 31, 1617; Uptik 62, 287. Ximen, J., and Li, Y. (1986). Uptik 74, 27.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 85
Parallel Programming and Cray Computers R. H. PERROTT Department of Computer Science, Queen’s University, Beljast, United Kingdom
I. 11. 111. IV. V. VI. VII.
Introduction. . . . . . . . . . Approaches to Parallel Programming Implicit Parallelism . . . . . . . Explicit Parallelism . . . . . . . Cray Computers . . . . . . . . Parallel Computing Forum . . . . Summary. . . . . . . . . . . Acknowledgments. . . . . . . . Bibliography. . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
259 261 263 265 271 297 299 300 300
I . INTRODUCTION The development of sequential computers has been helped by the fact that the underlying hardware has essentially followed the same architectural model, known as the von Neumann model. The improvements in this model were caused primarily by advances in component technology: each improvement led to better performance in each generation of computers based on this model. Thus sequential computers have had a model of computation that formed a relatively stable base for the development of languages and tools. This is one of the main reasons for the widespread use of these tools and languages, however, the same cannot be said in the case of parallel machines. Parallelism has always been utilised in component technology but only in the 1970s did it become explicitly available in a machine architecture for the programmer; e.g., the Cray-1. It was not until the 1980s that commercially available parallel machines incorporating a wide variety of architectures were introduced; e.g., AMT distributed array processor, Convex, Alliant, Hypercub, Cray X-MP. It was the scientific community that first discovered the limitations of sequential machines in their applications. Applications such as weather forecasting were limited in their usefulness by the lack of sufficient processing power to deliver the results in a realistic time scale. The developments in component technology made it clear that the required increase in speed 259
Copyright 0 1993 by Academic Press, Inc. All nghts of reproduction in any form reserved. ISBN 0-12-014727-0
260
R. H.PERROTT
could never be obtained on sequential machines and that the only solution on offer was that of parallel computing. Hence, one of the main promises of parallel processing is that the speed up in the execution of an application would be substantial and that this speed up would increase as the amount of parallelism in the system increases. As a consequence users would be able to attempt to solve larger problems as the machine’s capabilities and functionality increases. Such improvements are possible only because of the economics of mass produced VLSI components. The production of such components increasingly favours parallel systems built mainly from cheap processors. Early results of experiments using parallel machines report a price-performance advantage in the range of 10 to 30 times better than traditional machines (Wadsworth, 1988). However, such comparisons usually ignore the cost of software, in particular, the effort involved in programming the parallel machine, which has been shown to be a nontrivial exercise requiring considerable skill and expertise. The developments in parallel software are not so far reaching nor so nearly well understood as the developments in parallel hardware. In the case of sequential computers the architectural model, the programming paradigms and the method of constructing algorithms all have a single objective. In the case of parallel computers there is at present no single architectural model to represent parallelism but rather a variety of different parallel architectures. The main issue affecting the architectural model is how to organise multiple processors to execute in parallel. One of the first models was that of an array processor - the SIMD model - where multiple processors execute the same instruction but on different data; the processors operate under the control of a single processor, which broadcasts the instructions to be executed. Array processors are particularly suited to problems involving matrices, and some impressive results have been achieved. However, the main criticism of this model is that there is little flexibility in the architecture for problems that could benefit from the execution of different instructions at the same time - the MIMD model. The earliest MIMD models were based on the shared memory concept, where all the processors are connected to the same memory. In this scenario the processors can execute different parts of an application concurrently, thus ideally reducing the time to execute the complete program. However, this model can lead to severe memory contention problems as the processors attempt to access the same data. There is some question as to whether this model will scale to larger orders of parallelism.
PARALLEL PROGRAMMING A N D CRAY COMPUTERS
26 1
A more recent MIMD model is the distributed memory model, where each processor has its own local memory and processors communicate by passing messages. However, there is an overhead cost associated with such communication, which in many instances can be substantial. The amount of overhead is influenced by such factors as the distance between the two processors wishing to communicate and the interconnection topology. The distributed model is scalable to greater orders of parallelism than that currently implemented. In the case of parallel software the choice of programming language is no longer confined to a single approach. The main division of these languages is into either imperative or declarative languages. The declarative group can be further divided into logic and functional languages while the imperative group consist of procedural and object-oriented languages. All the various languages that have been proposed offer some different way of capitalising on the power of parallel machines. To date it is not clear if any one approach is substantially better than any other as enough experience has not yet been accumulated. In many cases the concepts have not been efficiently implemented on parallel machines. In addition there is a considerable lack of tools to assist in all aspects of parallel programming and debugging. One consideration, which was perhaps not so important with sequential languages, is the ability to prove a program correct. This is becoming increasingly important as parallel machines are applied to more crucial aspects of human applications. However, the criteria for judging a language’s design that were established for sequential languages are still valid; criteria such as readability, simplicity, efficiency and expressiveness. The third important aspect of programming parallel systems is the choice of algorithm. Studies have shown that transfering an efficient sequential algorithm to a parallel machine results in an inefficient parallel algorithm. It is now apparent that the design and construction of a new parallel algorithm for a particular application area can produce major performance improvements. Hence, in the case of parallel systems there are three important and contributing factors; namely, the architectural model, the programming language and the choice of algorithm. The following sections concentrate on the programming language.
11. APPROACHES TO PARALLEL PROGRAMMING
Essentially, three main methods have been used to promote the wider use of parallel processing:
262
R. H . PERROTT
(i) Extend an existing sequential language with features to represent parallelism. The advantage of extensions is that existing software can be transferred to a new parallel machine with relative cases. This is possible because programmers are already trained in the base language and can introduce the extensions gradually as they become more familiar with the situation in which they should be used and the effect they produce. However, experience to date has shown that extension languages have been limited to a certain range of hardware and to machines with a small number of processors. Problems have also been reported in the debugging of programs written in such languages, as the interaction of the sequential and the parallel features can give rise to difficulties in detecting errors. A more general problem is that many of these extensions have been developed by different groups using the same language base, which has led to nonstandard variants of the same language being produced, making the production of a standard for such languages difficult. (ii) Implicit: use a sequential language but rely on the compiler to detect which parts of the program can be executed in parallel. Most of the work in this area is based on FORTRAN and examines the DO loops of the program to determine if it is possible to spread the iterations of the loop across different processors. The advantage of such an approach is that existing sequential programs can be moved relatively inexpensively and quickly to the parallel machine. This can represent a substantial saving in development costs and is an attractive proposition for many purchasers of a new parallel machine. However, it is rare that the full parallelism of the program is exploited without the help of a programmer to restructure the program; this usually requires a reorganisation of the loops of the program so that the automatic detection techniques will work. In the case of the construction of new programs it is advisable that a programmer have some knowledge of the detection techniques if as much parallelism as possible is to be detected. This represents a diversion for a programmer from the main task of program construction. In addition, such an approach inhibits the development of parallel languages and algorithms as it is confined to a sequential notation. (iii) Develop a new parallel language. In this case a completely new parallel language is developed, ignoring all existing languages and applications. The main advantage of this approach is that a coherent approach to parallelism is presented. The parallel notation will enable a user to express directly the parallelism in an application and, in addition, will
PARALLEL PROGRAMMING A N D CRAY COMPUTERS
263
assist with the development of new parallel algorithms. However, it does mean that a user will have to rebuild the entire software base in the new language, which is a labour intensive, expensiveand perhaps an error prone exercise. All existing applications are ignored, which requires courage on the part of the management of large installations, particularly since many new languages have not had the property of longevity. 111. IMPLICIT PARALLELISM Recent years have demonstrated that parallel processors are now a viable and commercially successful product and that it is the software for these machines which is lagging behind and causing the most difficulties. The highly successful tactic of Cray in the 1970s, of providing a FORTRAN engine, a machine that would take existing FORTRAN programs and detect which parts could be automatically vectorised, is the objective for the newer breed of multiprocessors only now based on parallelism. The origins of many of these systems can be traced to the research of David Kuck at the University of Illinois on vectorisation technique. This research has been extended to incorporate the situation where many processors, possibly vector processors, share the same memory. The model used for most shared memory machines is very similar in nature. However there may be differences in how the processors and memory interact; for example, in the Cray X-MP the connections between processors and memory are direct while on the Alliant and Convex a bus is used to connect memory and processors. The latter machines incorporate a data cache memory to provide acceptable access times, but it is not significant enough to disrupt their classification as shared memory machines. Such factors are not relevant as far as the programming of these machines is concerned. In general, the main tactic of parallelisation systems is to examine nested DO loops, with the object of vectorising the innermost loop and parallelising the outermost loop. The methods rely on data dependence analysis techniques that determine the flow of data in a program. This, in turn, enables statements to be identified that can be executed in parallel. Data dependence analysis is the cornerstone on which all automatic parallelism detection methods are built; the quality of a paralleliser is directly related to the quality of the dependence analyser. Currently techniques are available for nested DO loops but have not yet been commercially applied to complete programs. This requires full interprocedural analysis - the tracking of data across procedure calls - to be performed on a user program. Once a compiler uses interprocedural information as a basis for compiling time decisions, data dependencies between
264
R. H . PERROTT
procedures in a program can be resolved. The systems at Rice University and IBM provide a limited form of interprocedural analysis. There are certain parallel programming situations that can be automatically parallelised without any user intervention. The most straightforward situation consists of loops with no data dependency between the iterations. In this case the iterations can be assigned to the processors either individually or in groups depending on the scheduling algorithms. In some systems the programmer can decide. In other situations if there is a possibility of a data dependency the compiler takes a conservative view, which usually means that no parallelisation is attempted. The burden is then placed on the programmer to decide if the compiler’s decision should be overridden; this is achieved by a user inserting compiler directives into the program code. This is particularly the case in situations where interprocedural analysis is required as most existing systems are not capable of performing this analysis. This is not always an easy decision and can require a considerable level of skill on behalf of the programmer. To help with parallelisation several manufacturers have incorporated into their hardware special features to handle synchronisation of the processors. This, in turn, can be used when processing, in parallel, different iterations of a DO loop that has data dependencies. For example, the Alliant machines have a concurrency control bus that is used to reduce the overhead involved in processor synchronisation and can be utilised in loops with data dependencies. The success of vectorisation in the 1970s has raised the expectations of the scientific programming community so that they now expect (if not demand) compilers that will automatically translate a sequential program for efficient execution on a parallel computer. This has meant that users are less tolerant towards new languages or language extensions. The experiences of Cray as reported in the fifth section of this paper would seem to confirm this. They started with vectorisation then introduced macrotasking wherre the user had explicit tasks and synchronisation features; due to the amount of code restructuring required, these features were not well received by users. Their next step was the introduction of microtasking where the user had to incorporate compiler directives throughout the program; the response from the users was better but not enthusiastic. Finally came autotasking, which required no user effort. If this is the general trend or a reflection of the state of expertise of the general user community then it is quite depressing. To summarise, the work on parallelisation currently taking place in the United States is strongly based on the shared memory model. Parallelism is still a relatively unexplored research area; it will be many years before compilers can capture all the parallelism in a sequential program. The data dependence analyser is responsible for providing information about data
PARALLEL PROGRAMMING A N D CRAY COMPUTERS
265
flow in order that a program can be restructured to find blocks that can execute in parallel - this must be done within the constraint of correctness as defined by the original sequential program. The approach is based essentially on vectorisation of the inner loop and parallelisation of the outer loop; in certain circumstances such situations can be handled automatically without user intervention. for other situations the user has the option of introducing compiler directives to force parallelisation when certain that data dependencies are not present or relevant. Different systems provided different levels of parallelisation, with many of the techniques directly dependent for their efficient implementation on additional hardware. However, there is no consistency in what current systems can d o or the notation they use for doing it. The mechanism used to introduce or force parallelisation is through a FORTRAN comment; however, each implementation has chosen its own notation and range of situations that can be handled. In addition, there are usually a few parallel language features that the programmer can employ. There is no consistency in the syntax of these features, and their use reduces the possibility of movement of programs between machines. The main research laboratories that have been active in this area and the systems that they have produced are the following: Cray Research with CFT (Minnesota); University of Illinois with Parafrase (Illinois); IBM with Ptran (New York); Rice University with the PFC systems (Texas); Superb as part of the Suprenum project (Europe); Velour at Honeywell Bull (Europe). In addition, many computer manufactures such as Convex and Alliant have produced systems for their particular machine.
IV. EXPLICIT PARALLELISM In the case of languages for multiple processor systems the imperative approach has received the most attention. This arose out of the work carried out on operating systems in the early 1960s. At that time programmers were designing programs to control and coordinate the many independent activities in an operating system. It is this work that laid many of the foundations for this type of parallel programming language. The term process, or more recently task, is used to describe a sequence of program instructions that can be performed in parallel with other groups of program instructions. A program can therefore be represented as a number
266
R. H. PERROTT
of processes that can be executing concurrently. The point at which a processor is withdrawn from one process and given to another is dependent on the progress of the processes and the algorithm used to assign the available processor(s). The net effect is that processes are capable of interacting in a time dependent manner. Thus, in a concurrent programming environment, a programmer requires not only program and data structures similar to those required in a sequential programming environment but also mechanisms to control the interaction of the processes - processes that are proceeding at fixed but unknown rates. The situations in which processes interact can be divided into two categories. The first situation occurs whenever several processes wish access to a resource at the same time. For example, when several processes wish to update a shared variable, only one process must succeed in gaining access to the resource at any time. Once a process has obtained the resource it must be able to use the resource without interference from the other competing processes; this is referred to as mutual exclusion. The second situation occurs whenever porcesses are cooperating; they must be correctly synchronised with respect to each other’s activities. For example, when one process requires a result not yet produced by another process, the first process must be able to wait for the second process and the second process must take the responsibility of resuming the first process when it arrives with the result. The processes are therefore scheduling one another and are aware of each other’s existence and purpose; this is referred to as conditional synchronisation. The methods that have been used for solving these problems have evolved over the years. Originally they were at a low level and involved instructions like test and set, where two operations could be performed without interruption. In the early 1960s Dijkstra introduced the semaphore based on the railway signalling system. Although sufficient to solve the problems, they could lead to programming situations that were difficult to understand and complex in nature. Experience and research produced a series of solutions such as critical regions and conditional critical regions, finally resulting in the monitor (Perrott, 1987). The monitor was first proposed at a conference at Queen’s University, Belfast, United Kingdom, in 1972 and can be regarded as the state of the art in shared memory concepts for parallel programming. The monitor was influenced by the class concept of Simula. In summary the main problems that must be catered for are as follows: (i) The identification of processes (or tasks) that represent activities that can take place in parallel; (ii) The sharing of variables among a number of competing processes
such that the shared variables can be manipulated without interference by other processes;
(iii) The synchronisation of processes that wish to co-operate with each other to their mutual benefit, which requires the processes to be ordered with respect to time.

Monitor. A monitor defines a shared data structure and all the operations that can be performed on it. These operations are defined by the procedures or functions of the monitor. In addition, a monitor defines an initialisation operation that is executed when its data structure is created. In general, a process can access the shared data of a monitor by calling one of the monitor's procedures. If there is more than one call then only one of the calling processes is allowed to succeed in entering the monitor at any time; this guarantees that the data of the monitor is accessed exclusively. Only when a process exits the monitor is it possible for one of the calling processes that was delayed to enter the monitor.

It is also possible for a process to enter the monitor and discover that the information it requires has not yet arrived. In such a situation, it can join a queue associated with that condition and thereby release its exclusive access over the monitor, after which another process is able to enter the monitor. Eventually, a process may enter the monitor and enable a suspended process to continue. The queues within a monitor are usually identified by condition variables, and a process can append itself to a single condition variable queue by executing a wait operation. Another process executing a signal operation on a condition variable queue will cause a process delayed on that queue (if there is one) to be resumed. These features are illustrated in the following code, which shares a single resource among a number of processes.

monitor RESOURCE;
var
  FREE : BOOLEAN;
instance
  BUSY : CONDITION;
procedure *ACQUIRE;
begin
  if not FREE then BUSY.WAIT;
  FREE := FALSE
end (* acquire *);
procedure *RELEASE;
begin
  FREE := TRUE;
  BUSY.SIGNAL
end (* release *);
begin
  FREE := TRUE;
end (* resource *);

process PRODUCER;
(* local data *)
begin
  (* statements *)
  RESOURCE.ACQUIRE;
  (* use resource *)
  RESOURCE.RELEASE;
  (* statements *)
end;

instance BEES : array [1..N] of PRODUCER;
The last declaration causes N producer processes to be created with a life-style defined by the preceding code. The N processes operate in parallel and independently, making calls to the monitor whenever they wish to acquire and release the resource. The condition variable acts as the means of synchronisation between the processes: if the resource is being used a process waits on the busy queue, and when a process finishes using the resource it signals the queue so that a waiting process can continue.

Several languages have been developed using this technique, for example, Concurrent Pascal (Brinch Hansen, 1974), Modula (Wirth, 1977) and Pascal Plus (Welsh and Bustard, 1979); all these languages are based on the monitor plus condition variable approach. However, the technique has not gained widespread use despite having been well tried and tested in academic projects. In addition, much formal work has been performed on the concept, giving it a firm theoretical basis. Many of the multiprocessor machines that appeared in the early 1980s, such as the Alliant and Convex models, do subscribe to the model of parallel computing represented by the monitor, although their programming languages are not based on it. It does appear that the monitor is well suited for implementation on a shared memory machine.

Message Passing. A second technique that can be used to solve the problems associated with mutual exclusion and conditional synchronisation is based on message passing. The origin of this technique can be traced to the work carried out on coroutines (Conway, 1963). In most languages the procedure structure is hierarchical, in that procedures
are subservient to one another when called in the course of program execution. It was felt there was a need for routines at the same level, so that a symmetrical rather than a hierarchical relationship could be established.

In 1975 Dijkstra introduced the concept of a guarded command. A guarded command is simply a guard, which is a Boolean expression, followed by a list of statements. If the guard is true the associated statements can be executed; otherwise they cannot. It is possible to group several of these guarded commands together to form selection and repetitive constructs. A significant introduction at this time was the idea of nondeterminism; in other words, given a choice of several true guards, one is picked at random. This was meant to reflect the way in which real-time events occur: it is not possible to predict their order, and therefore this should be incorporated into any programming notation.

It was Hoare (1978) who incorporated these ideas into the notation known as communicating sequential processes (CSP). The essential idea is that synchronisation is set up by message passing using input-output commands. An output command in a sender process must specify the destination process; an input command in a receiver process must specify the source process, and the parameter lists must match. Only under these conditions is it possible for two processes to communicate. Hoare took this idea and combined it with Dijkstra's guarded commands by enabling an input command to occur in a guarded command. One significant feature of Hoare's processes is that they must name each other in order to communicate; there is a symmetrical relationship between the processes. This was felt to be restrictive, and Brinch Hansen (1978) proposed a notation known as distributed processes (DP) in which the process that is called does not need to know the name of the process that called it. This would seem reasonable in an environment where, for example, a library facility is being called by many processes: it is not necessary for the library process to know which process is calling it. This amounts to an asymmetric relationship between the calling and called processes.

This technique of message passing has been adopted in languages such as Ada (1983) and Occam (Inmos, 1984). Ada was designed by a team led by Jean Ichbiah of France. The contract to design the language was with the U.S. Department of Defense and was won against competition from other organisations around the world. Of particular interest is the method of process communication. In Ada the tasks first synchronise their activities and then communicate directly without the help of an intermediate data structure. Hence one process may have to wait for another process to arrive; when it does, the processes can exchange messages directly. Such an encounter is referred to as a rendezvous. In addition, nondeterministic selection has been introduced so that, given a choice between several alternatives, one is chosen at random.

Hence, with this technique, all communication is explicit through the transfer of values; there is no reading and writing of shared variables, and no explicit queues have to be manipulated by a user. There is a series of rules that must be obeyed when two processes wish to communicate. First, the guards of the called task must be evaluated, and those that are ready (true) have the possibility of communicating. If there is a choice of several ready guards, one is chosen at random.

To illustrate these features the problem of sharing a single resource among a number of competing processes is expressed in Ada. Each task (process) in Ada is expressed in two parts:

(i) A specification part, which is the interface presented to the other tasks and may contain entry specifications, that is, a list of the services provided by the task;
(ii) The task body, which contains a sequence of statements to be executed when any of its services are requested.
In Ada the resource must be represented as a task, and the calls accepted by this task are those listed in the specification part, in this case calls to ACQUIRE or RELEASE. The statements of the task body following the reserved word accept are executed whenever another task calls the appropriate entry procedure and that call is accepted by the called task. Thus whichever task reaches its communication statement first must wait for the other task; that is, either the calling task calls an entry procedure such as RESOURCE.ACQUIRE or the called task reaches the appropriate accept statement, in this case accept ACQUIRE do.

The body of the task RESOURCE has been set up as an infinite loop that contains a select statement with two limbs. Each time the select statement is encountered one of the limbs is selected arbitrarily. The first limb has a condition attached in the form of a when clause; this limb cannot be selected unless the condition is true, in this case that the resource is available.

task RESOURCE is            -- specification
  entry ACQUIRE;
  entry RELEASE;
end RESOURCE;

task body RESOURCE is       -- body
begin
  loop
    select
      when FREE =>
        accept ACQUIRE do
          FREE := FALSE;
        end;
    or
      accept RELEASE do
        FREE := TRUE;
      end;
    end select;
  end loop;
end RESOURCE;

task type PRODUCER is       -- specification
end;

task body PRODUCER is
begin
  -- statements
  RESOURCE.ACQUIRE;
  -- use resource
  RESOURCE.RELEASE;
  -- statements
end;

BEES : array (1..N) of PRODUCER;
Ada has been in existence for many years, and compilers are commercially available. The language is required to be reviewed every 10 years, and extensions are currently being discussed.
V. CRAY COMPUTERS

Since the early 1970s Cray Research has been designing machines that have dominated the supercomputer market. These have ranged from the first commercially successful supercomputer, the Cray-1, through the Cray X-MP and Cray-2 to the latest machines, the Cray Y-MP and the recently launched C-90.

The Cray-1 was based on the principle of pipelining, whereby an individual operation is divided into a number of stages that can operate in parallel; a unit organised in this way is referred to as a functional unit. In order for a unit to
operate efficiently it is necessary to input two vectors of operands. The concurrent operation of the stages then enables the data to be processed in parallel. It was also possible, by means of the concept of chaining, to use the output of one functional unit as the input of another, further increasing the amount of concurrent operation in the system.

In the Cray-1 there were a total of 13 functional units, split into four groups: vector, floating point, scalar and address units. The data to be processed was held in shared memory and had to pass through intermediate registers before reaching the appropriate functional unit. For example, there are eight vector registers, each of which holds 64 operands. These registers provide the input operands that are fed into a vector functional unit and receive the results after processing. The operands must also be moved between the main memory and the registers before and after processing. The intermediate registers reduce the time of operation of the functional units compared to a system that operates directly on memory.

The preceding architecture is the basic model of operation that has been carried forward into successive machines in the Cray range: a shared memory model with a large number of intermediate registers placed between the memory and the processing units.

The Cray Y-MP consists of eight central processing units, all of which may operate simultaneously on a single job or on eight separate jobs. Each processor has a computation section composed of operating registers, functional units and an instruction control network. The instruction control network makes all decisions relating to instruction issue as well as coordinating the three types of processing available: vector, scalar and address. As in the Cray-1 each of these modes has its associated registers and functional units. An interprocessor communications section coordinates processing between the processors and the shared central memory. This memory consists of 32 million 64-bit words of directly addressable central memory organised into 256 banks. The eight processors share the central memory, which is organised into interleaved sections, subsections and banks that can be accessed independently and in parallel during each machine clock period. The large number of memory banks greatly reduces memory contention.

The Cray Y-MP, unlike earlier machines, includes instructions for the efficient manipulation of randomly distributed data elements and conditional vector operations. Gather-scatter instructions allow for the vectorisation of randomly organised data, and the compressed index instruction allows for the vectorisation of unpredictable conditional operations. Operations that were problems on earlier machines have thus been incorporated into the instruction set, resulting in improved performance.
The Cray Y-MP therefore provides a multiprocessor environment that enables the user to improve turnaround time using multiprocessing and multitasking techniques. The Cray definition of these terms is as follows: multiprocessing enables several programs to be executed concurrently on multiple processors of a single mainframe; multitasking allows two or more parallel execution segments of a program to be executed while sharing a common memory space. All processors are identical and symmetrical in their programming function. When executing in multitasking mode any number of processors can be assigned to perform multiple tasks of a single job.

These are the main characteristics of the architecture of the Cray machines with which a user is advised to be familiar before using such a machine.

5.1. Operating Systems and Languages
There are two operating systems that can be used on the later Cray machines:

(i) COS, which is the original operating system provided by Cray;
(ii) UNICOS, which is an operating system based primarily on the AT&T UNIX System V operating system.

UNICOS offers both an interactive and a batch environment and provides features that deal with multiprocessing and multitasking. Like UNIX it is written in the C language, and it contains a small kernel that is accessed through system calls and a set of utilities and library programs. The plan is that UNICOS will eventually become the main operating system of the Cray range of machines.

On the original Cray-1 a vectorising FORTRAN compiler, the C(ray) F(ORTRAN) T(ranslator), was provided, which was successful in taking "dusty deck" FORTRAN programs and detecting which parts could be executed on the vector functional units. It automatically vectorised inner DO loops and provided scalar optimisation without sacrificing high compilation rates. The library includes highly optimised scientific subroutines that enable the user to take maximum advantage of the hardware.

For example, a conditional or GOTO statement within a DO loop causes problems for a vectorising compiler, since it would require the functional unit to deal with the two different data paths necessitated by the logical branch. To allow for conditional statements within a DO loop special library utility procedures are provided. The following code illustrates the use of one of these library routines:
      DO 3 I = 1,200
        X(I) = A(I)
        IF (B(I).GT.C(I)) X(I) = D(I)
    3 CONTINUE

should be rewritten as

      DO 3 I = 1,200
        X(I) = CVMGM (D(I), A(I), C(I)-B(I))
    3 CONTINUE
to allow vectorisation. The vector procedure CVMGM is one of several procedures that allow vectorisation to continue; the parameters determine when the updates are performed. In this example the first parameter is chosen if C(I)-B(I) is negative, otherwise the second. Such subroutines are nonstandard FORTRAN and, if introduced to a program, lead to problems of nonportability between different machines.

This approach to the detection of parallelism has been expanded on subsequent machines and, with the advent of the multiprocessor machines, has led to the introduction of additional language features. The latest FORTRAN compiler is CFT77, which complies with the ANSI X3.9-1978 version of the language and offers a high degree of automatic scalar and vector optimisation. It is claimed that users can program using the syntax of standard FORTRAN and have access to the full power of the Cray Y-MP system architecture.

As stated previously, the later Cray machines offer a multiprocessing capability in conjunction with vectorisation. This enables an application program to be partitioned into independent tasks that can execute in parallel and should result in substantial throughput improvements over serially executed programs. Three methods of multiprocessing can be used:
(i) Macrotasking (multitasking), which is best suited to programs with larger, longer-running tasks. This is introduced by means of a set of FORTRAN-callable subroutines that explicitly define and synchronise tasks at the subroutine level;
(ii) Microtasking, which breaks the code into small units that can be executed in parallel on multiple processors. It uses a preprocessor to allow programmers to multiprocess the low-level parallelism found in codes. Extremely fast synchronisation allows microtasking's self-scheduling algorithm to make effective use of any available processor cycles, providing load balancing for the system;
(iii) Automatic multitasking or autotasking, which partitions a program
into tasks without user intervention and uses whichever processors are available at any point during execution.

These methods of multiprocessing are considered in detail in the following sections, with particular reference to the Cray programming manual.

In addition to the FORTRAN compilers, Pascal is provided; it complies with the ISO level 1 standard but includes extensions such as separate compilation of modules, imported and exported variables and an array syntax. It also provides scalar optimisation and automatic vectorisation of FOR loops. The Pascal compiler provides access to FORTRAN common block variables and uses a common calling sequence that allows Pascal code to call FORTRAN, C and CAL (assembly language) routines. C is also provided and performs scalar optimisations and vectorises code automatically. An Ada compiler has been developed under contract by Telesoft; currently release 1.0 is available on Cray X-MP machines under UNICOS, and version 2.0 of the compiler will shortly be released on Y-MP machines. Lisp is being developed as a result of requests from users. However, for the purposes of this chapter only the FORTRAN variants will be examined further, to demonstrate the various approaches to parallel programming available on Cray machines.

5.2. Macrotasking
Both the CFT and CFT77 compilers automatically perform vectorisation across inner DO loops without user interaction; however, in order that the maximum amount of vectorisation can be performed it may be necessary for a user to modify the code of these loops to remove interdependences. The modifications for macrotasking involve larger sections of code than those for vectorisation, since macrotasking can enhance the performance of outer DO loops containing already vectorised code.

As stated earlier, macrotasking is the process of dividing a program into tasks, which may then be executed simultaneously on several processors. For this to be achieved macrotasking requires the user to identify explicitly the tasks that can be executed in parallel. In this way multiple tasks that share a common memory can execute simultaneously on different processors.

A single task consists of code and associated data, which can be executed on a processor. A task must be explicitly created by a call to the multitasking library routine TSKSTART, and a program may wait for the completion of a task by using the library routine TSKWAIT. Both the called and the calling tasks execute in parallel.
In a macrotasked program, the main program is the initial task and can be considered as the task that initialises the global variables, performs the processing that cannot be macrotasked and initiates the subsidiary program tasks. Macrotasking is supported only for subprograms, and the tasks and the data structures must be arranged such that the tasks can run in parallel. As macrotasking is nondeterministic with respect to time, but must be deterministic with respect to results, communication and synchronisation mechanisms need to be added between parallel tasks to provide for the protection of shared data. Critical regions must be protected by locks (flags) to indicate that this data is shared; this will work only if all other tasks check the lock before they enter a corresponding critical region. There are three forms of synchronisation mechanism available on the Cray:

• Tasks can wait for events, post events (that others may be waiting for) and clear events;
• Locks can be used by a signalling task (by using the standard subroutine LOCKOFF) and a waiting task (by using the standard subroutine LOCKON);
• A task can wait for another task to complete execution.
The language provides a complete range of standard routines to initiate task execution, control critical regions and set and clear event flags.

In macrotasking any data appearing in a COMMON statement or a SAVE statement, or passed to a subtask in a calling sequence, is known to all tasks. All other data is local; that is, each task has its own copy of the data. Task COMMON enables subroutines in a task to share data that is local to the task. Improper data sharing can introduce bugs that are difficult to locate; these are usually due to time-dependent behaviour among the interacting tasks.

5.2.1 Declaration of Tasks

A task is defined as starting execution at a FORTRAN entry point (typically a subroutine), and it can call other subroutines during its execution. It completes execution when it executes a RETURN statement in the subroutine where it began execution. Macrotasking within a program occurs as soon as the program explicitly creates a task through a call to the macrotasking library routine TSKSTART. A program may wait on the completion of a task by using the library routine TSKWAIT. The syntax of these functions is as follows:

CALL TSKSTART (TASKID, SUBNAME [,ARGS])
creates a task with identification TASKID and entry point SUBNAME, where SUBNAME is typically a subroutine and ARGS is an optional list of arguments to the subroutine.

CALL TSKWAIT (TASKID)

suspends the calling task until the task with identification TASKID has completed.

An integer task control array, which must be constructed by the user program, represents each user-created task. The array can be either two or three words in length. The value of the first element of the array is the length, while the second element is used by the macrotasking library for identification purposes; throughout the following programs the length is set to 2. The use of these routines is illustrated by the following example, where the elements of a 100-element array are incremented by using two tasks:
      INTEGER ARRAY(100), TASK1(2), TASK2(2)
      COMMON/MEM/ARRAY,TASK1,TASK2
      EXTERNAL INC
      TASK1(1) = 2
      TASK2(1) = 2
      CALL TSKSTART (TASK1, INC, 1, 50)
      CALL TSKSTART (TASK2, INC, 51, 100)
      CALL TSKWAIT (TASK1)
      CALL TSKWAIT (TASK2)
      ...
      STOP
      END

      SUBROUTINE INC (FIRST,LAST)
      INTEGER FIRST,LAST,ARRAY(100),TASK1(2),TASK2(2)
      COMMON/MEM/ARRAY,TASK1,TASK2
      DO 10 I = FIRST,LAST
        ARRAY(I) = ARRAY(I) + 1
   10 CONTINUE
      RETURN
      END

The array is declared globally to the tasks by placing it in a block of common memory labelled MEM. The tasks may then gain access to the array by referring to this block of memory. Each task, upon being called, starts execution of the subroutine INC using the parameters FIRST and LAST passed to it in the TSKSTART call. Thus TASK1 increments each element of the array from position 1 to position 50, while TASK2 simultaneously increments each element from position 51 to position 100. The main calling task is suspended until both tasks have completed. In this way it is hoped that the updating is performed in half the time.

5.2.2 Locks
In order to protect data that has to be shared a user can use the LOCK routines to create a critical section of code. In CFT critical regions of code are enclosed by means of the lock mechanism. When a task enters a critical region of code it sets a LOCK; this LOCK acts as a signal that the critical region is in use. Any other task attempting to execute the critical region is suspended until the LOCK is cleared by a task exiting the critical region. Thus only one task at a time may access this code, guaranteeing mutual exclusion. The syntax of the library routines concerning LOCKs is as follows:
CALL LOCKASGN (LOCK)

identifies an integer variable LOCK that the program intends to use as a lock;

CALL LOCKON (LOCK)

suspends the calling task until the status of the lock variable LOCK is unlocked, then changes the status to locked;

CALL LOCKOFF (LOCK)

changes the status of the lock variable LOCK to unlocked.

Tasks waiting for a lock variable to become unlocked are suspended on a queue in the order of their arrival. When the lock status is changed to unlocked the task at the head of the queue is reactivated and the lock status is changed back to locked. Hence by means of this mechanism mutual exclusion can be provided among competing tasks. For example, in the situation where two tasks wish to increment a variable X stored in common memory,

      INTEGER X, LOCK, TASK1(2), TASK2(2)
      COMMON/MEM/X, LOCK
      EXTERNAL INC
      TASK1(1) = 2
      TASK2(1) = 2
      CALL LOCKASGN (LOCK)
      CALL TSKSTART (TASK1, INC)
      CALL TSKSTART (TASK2, INC)
      CALL TSKWAIT (TASK1)
      CALL TSKWAIT (TASK2)
      ...
      STOP
      END
      SUBROUTINE INC
      INTEGER X, LOCK
      COMMON/MEM/X, LOCK
   10 CONTINUE
      CALL LOCKON (LOCK)
      X = X + 1
      CALL LOCKOFF (LOCK)
      GOTO 10
      RETURN
      END

Since both tasks are operating in parallel the lock mechanism ensures that only one task at any time may alter the value of the variable. The lock variable LOCK must be placed in common memory in order that the two tasks may have access to it. When one task is executing the critical region no other task may enter this region. If the other task tries to enter it is suspended until the first task has exited the critical region and changed the status of the lock variable LOCK to unlocked. Thus mutual exclusion is achieved.

In addition to protecting critical regions of code the lock mechanism may be used to synchronise the execution of tasks. A waiting task may be suspended by calling LOCKON and subsequently reactivated at the appropriate time by the signalling task calling LOCKOFF.

Locks can be used to guard access to a critical region of user code that normally comprises several statements; however, standard routines are available to enable single operations to be performed under the protection of a hardware semaphore. These routines are considerably faster than surrounding a single statement with calls to LOCKON and LOCKOFF. Examples of these routines and the functions they perform are

ISELFSCH (X)      function returning X + 1
ISELFADD (X, Y)   function returning X + Y
ICRITADD (X, Y)   subroutine setting X = X + Y
ISELFMUL (X, Y)   function returning X * Y
ICRITMUL (X, Y)   subroutine setting X = X * Y

These routines are applicable only to integer variables. Corresponding routines for real values are available for the latter four listed routines.
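As an illustration, the lock-protected increment in the preceding subroutine could be collapsed into a single call to one of these routines. The following is a minimal sketch, assuming only the semantics listed above; the exact calling conventions should be checked in the Cray library documentation.

      SUBROUTINE INC
      INTEGER X, LOCK, INCR
      COMMON/MEM/X, LOCK
   10 CONTINUE
C     X = X + INCR is performed under a hardware semaphore,
C     replacing the LOCKON/LOCKOFF pair around a single statement
      INCR = 1
      CALL ICRITADD (X, INCR)
      GOTO 10
      RETURN
      END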
5.2.3 Events
Another method of synchronising the actions of tasks running in parallel is provided by the EVENT mechanism. EVENTs allow for signalling between tasks and have two states, cleared and posted. An EVENT is represented by an integer variable, and the syntax of the library routines concerning EVENTs is as follows:

CALL EVASGN (EVENT)

identifies an integer variable EVENT that the program wishes to use as an event;

CALL EVWAIT (EVENT)

suspends the calling task until the status of the event variable EVENT is posted;

CALL EVPOST (EVENT)

changes the status of the event variable EVENT to posted;

CALL EVCLEAR (EVENT)

changes the status of the event variable EVENT to cleared.

When an event variable is posted all the tasks waiting for that event are reactivated. By means of these routines it is possible for a number of tasks to cooperate to their mutual benefit.

An example of the use of the EVENT mechanism follows. Two tasks are created in the main body of the program: one task initialises the elements of an integer array to a certain value, the other task calculates the sum of the elements. The operations of the two tasks must not overlap. The use of the EVENT mechanism ensures that only one task has access to the array at any time.

      INTEGER ARRAY(100), EVENT1, EVENT2, TASK1(2), TASK2(2)
      COMMON/MEM/ARRAY, EVENT1, EVENT2
      EXTERNAL INITLSE, ADDARRAY
      TASK1(1) = 2
      TASK2(1) = 2
      CALL EVASGN (EVENT1)
      CALL EVASGN (EVENT2)
      CALL TSKSTART (TASK1, INITLSE, 1)
      CALL TSKSTART (TASK2, ADDARRAY, 2)
      ...
      STOP
      END

      SUBROUTINE INITLSE (K)
      INTEGER ARRAY(100), EVENT1, EVENT2
      COMMON/MEM/ARRAY, EVENT1, EVENT2
      M = K
    1 CONTINUE
      M = M + K
      DO 10 J = 1, 100
        ARRAY(J) = M + J
   10 CONTINUE
      CALL EVPOST (EVENT1)
      CALL EVWAIT (EVENT2)
      CALL EVCLEAR (EVENT2)
      ...
      GOTO 1
      RETURN
      END

      SUBROUTINE ADDARRAY (N)
      INTEGER ARRAY(100), EVENT1, EVENT2, TOTAL
      COMMON/MEM/ARRAY, EVENT1, EVENT2
    2 CONTINUE
      CALL EVWAIT (EVENT1)
      CALL EVCLEAR (EVENT1)
      TOTAL = 0
      DO 20 I = 1, 100
        TOTAL = TOTAL + ARRAY(I) * N
   20 CONTINUE
      CALL EVPOST (EVENT2)
      ...
      GOTO 2
      RETURN
      END

In the preceding example TASK1 allocates values to the elements of the array and TASK2 performs a calculation on these values. TASK2 is suspended on an event queue by means of the CALL EVWAIT (EVENT1) statement until initialisation is complete. TASK1 then reactivates TASK2 by posting the event variable EVENT1; it is then itself suspended on event variable EVENT2 while the array is accessed. TASK2 reactivates TASK1 by posting EVENT2 and is itself suspended again on EVENT1. This process is repeated.

Other subtleties affect the use of events. There is no problem if an event is used only once, but if it is used repeatedly, for example in a DO loop, a second event is needed to reset the first event to its initial state. The first event is, in turn, needed to reset the second event to its initial state. If the second event is not used a race condition results.

Case Study: The Spaghetti Eaters. Most examples of the use of these facilities tend to be numerically based. The following example is one of the classic problems of parallel programming; it is used to illustrate the problems of mutual exclusion and synchronisation. The problem is known as the Dining Philosophers or Spaghetti Eaters and can be stated as follows (see Fig. 1).

FIGURE 1. A representation of the Dining Philosophers.

Five philosophers (labelled A to E) spend their lives either thinking or eating. Each philosopher has his own place at a circular table, in the centre of which is a large and continually replenished bowl of spaghetti. Each philosopher helps himself from the bowl of spaghetti when seated at his own place. To eat the spaghetti a philosopher requires two forks, but only five forks are provided, one between each pair of plates. A philosopher can pick up only the fork on his immediate right or the one on his immediate left; e.g., philosopher A can pick up only forks 0 and 1. Since a philosopher requires two forks to eat the spaghetti, no two adjacent philosophers can be eating at the same time.

If philosophers were allowed to pick up their forks one at a time a situation could arise where each philosopher held one fork and was waiting for a neighbour to release his second fork. This scenario is known as deadlock: one process waiting indefinitely for a resource held by another process. Making the acquisition of the two forks dependent on both forks being available prevents this possibility. Starvation cannot occur if queues are organised and serviced according to the order of arrival of requests.
A solution can be formulated by identifying the philosophers as tasks and the forks as the shared resources. The problem is then to organise a means of cooperation for the processes such that deadlock and starvation are avoided and the resources are well utilised. The features of CFT are now applied to the Spaghetti Eaters problem in order to evaluate their expressiveness, clarity and ease of use in the construction of a solution.

The philosophers are represented by five concurrently operating tasks P1 to P5, and the number of forks available to a philosopher is recorded in a corresponding array element FORKCNT. At the start of the program all the elements have the value 2; i.e., all the forks are on the table. A philosopher who wishes to eat must first check that the two specified forks are still on the table. If so, the philosopher picks up the forks and changes the fork counts of his or her immediate neighbours. Acquisition of forks must take place only if both are available; if only one were available another attempt would have to be made later. Upon finishing eating a philosopher puts down the forks.

The regions of code involved in changing the values of the array elements of FORKCNT are critical regions, to be accessed by only one task at a time, and so are enclosed by a lock, FORKLOCK. Tasks that attempt to enter the critical region while it is in use are suspended on the lock queue and are released in order of their arrival. A task competing for the same resource with another task, e.g., a neighbouring philosopher desiring the same fork, cannot jump ahead of its competitor in the queue.

      PROGRAM PHILOS
      EXTERNAL EATER
      INTEGER P1(2), P2(2), P3(2), P4(2), P5(2)
      INTEGER FORKLOCK, FORKCNT(5)
      COMMON/MEM/FORKLOCK, FORKCNT
      INTEGER EVGO(5)
      COMMON/EVENTS/EVGO
      DATA FORKCNT/5*2/
      CALL LOCKASGN (FORKLOCK)
      CALL EVASGN (EVGO(1))
      CALL EVASGN (EVGO(2))
      CALL EVASGN (EVGO(3))
      CALL EVASGN (EVGO(4))
      CALL EVASGN (EVGO(5))
      P1(1) = 2
      P2(1) = 2
      P3(1) = 2
      P4(1) = 2
      P5(1) = 2
C     initiate tasks
      CALL TSKSTART (P1, EATER, 1)
      CALL TSKSTART (P2, EATER, 2)
      CALL TSKSTART (P3, EATER, 3)
      CALL TSKSTART (P4, EATER, 4)
      CALL TSKSTART (P5, EATER, 5)
C     suspend until tasks complete
      CALL TSKWAIT (P1)
      CALL TSKWAIT (P2)
      CALL TSKWAIT (P3)
      CALL TSKWAIT (P4)
      CALL TSKWAIT (P5)
      STOP
      END

      SUBROUTINE EATER (IDENT)
C     philosopher life-style
    1 CONTINUE
C     thinking
      CALL PICKUP (IDENT)
C     eating
      CALL PUTDOWN (IDENT)
      GOTO 1
      RETURN
      END

      SUBROUTINE PUTDOWN (IDENT)
      INTEGER FORKLOCK, FORKCNT(5)
      COMMON/MEM/FORKLOCK, FORKCNT
      INTEGER EVGO(5)
      COMMON/EVENTS/EVGO
C     change fork counts of the two neighbours (indices wrap on 1..5)
      CALL LOCKON (FORKLOCK)
      FORKCNT(MOD(IDENT+3,5)+1) = FORKCNT(MOD(IDENT+3,5)+1) + 1
      FORKCNT(MOD(IDENT,5)+1) = FORKCNT(MOD(IDENT,5)+1) + 1
      CALL LOCKOFF (FORKLOCK)
C     release all delayed tasks
      CALL EVPOST (EVGO(MOD(IDENT+3,5)+1))
      CALL EVPOST (EVGO(MOD(IDENT,5)+1))
      RETURN
      END

      SUBROUTINE PICKUP (IDENT)
      LOGICAL OBTAINED
      INTEGER FORKLOCK, FORKCNT(5)
      COMMON/MEM/FORKLOCK, FORKCNT
      INTEGER EVGO(5)
      COMMON/EVENTS/EVGO
    2 OBTAINED = .FALSE.
      CALL LOCKON (FORKLOCK)
      IF (FORKCNT(IDENT) .EQ. 2) THEN
        FORKCNT(MOD(IDENT+3,5)+1) = FORKCNT(MOD(IDENT+3,5)+1) - 1
        FORKCNT(MOD(IDENT,5)+1) = FORKCNT(MOD(IDENT,5)+1) - 1
        OBTAINED = .TRUE.
      END IF
      CALL LOCKOFF (FORKLOCK)
      IF (.NOT. OBTAINED) THEN
        CALL EVWAIT (EVGO(IDENT))
        CALL EVCLEAR (EVGO(IDENT))
        GOTO 2
      END IF
      RETURN
      END

A philosopher who cannot acquire both forks is suspended on the event variable EVGO. Whenever a philosopher returns forks to the table all waiting philosophers are reactivated, after which they attempt to pick up their forks again. It would be preferable if only those philosophers that were able to continue were reactivated; this is not so easily programmed. If a philosopher is still not successful the task is suspended again to await the next deposit of forks. This means that processing power may be wasted by tasks whose resources are not immediately available.

In summary, macrotasking is a technique whereby programs are modified with explicit calls to a special FORTRAN-callable library of synchronisation routines, the Macrotasking Library. There are three sets of routines: one for task creation and manipulation and two for synchronisation (locks, events). Macrotasking works best when the amount of work to be
partitioned over multiple processors is large; otherwise the synchronisation overhead may be significant. Introducing macrotasking to an existing program can lead to a significant amount of code restructuring; this, in turn, can lead to the introduction of errors. When the work is not easily partitionable into equal-sized tasks, load imbalance may occur, producing smaller speed gains than expected; to solve these problems microtasking was introduced.
5.3 Microtasking

In the case of microtasking a user must divide a program into small units that can be executed in parallel on multiple processors; microtasking thus operates at a much finer level of granularity than macrotasking. On the Cray Y-MP all microtasking is directed by the user through the insertion of preprocessor directives in the program. The directives mark loops whose iterations can be executed concurrently by the available processors. The burden of verifying the independence of loop iterations is placed on the user, a nontrivial exercise in some circumstances. A preprocessor reads the directives and translates them and their associated DO loops into a form acceptable to the compiler. Code is then generated that allows the use of extra processors if they are available. In addition, the user can identify critical regions inside a microtasked loop that can be protected (locked) for single-processor access with inserted directives. Microtasking is realised by using the following compiler directives:

CMIC$ GETCPUS P
informs the system that this application may use more than one processor. P is the maximum number of processors permitted to work on a microtasked program, where P is an integer constant or variable. The default value for P is the maximum number of physical processors available.

CMIC$ RELCPUS
this directive specifies that the processors acquired for microtasking should be released back to the system. It is the complement of the GETCPUS directive. This directive should be used when no microtasking is to be performed for a long period of time or when the program is preparing to terminate. This directive is optional; if it is not used, all processors acquired by the GETCPUS directive are held until the program terminates. CMIC$ MICRO
must appear just before the SUBROUTINE statement in order to inform the system that the subroutine is to be microtasked, i.e., may use more than one processor. The RETURN statement signals the end of the multiprocessing work and must be accessible to all processors that enter the routine.

CMIC$ PROCESS

indicates that only one process executes the following code, up to the directive CMIC$ END PROCESS.

CMIC$ ALSO PROCESS

marks the beginning of a process other than the first process inside a control structure, and the end of the previous process. A PROCESS directive followed by any number of ALSO PROCESS directives implements the classic multitasking fork structure.

CMIC$ DO GLOBAL

tells the system that the following loop is to be partitioned among all available processors; that is, the iterations of a DO loop comprise all the processes. The number of processors executing the loop is unknown until execution begins. Because the number may change on every execution of the loop, the code must be independent of the number of processors. The end of the control structure is marked by the statement containing the label referred to in the DO statement.

CMIC$ STOP ALL PROCESS

provides a way to exit from both PROCESS and DO GLOBAL control structures without performing all of the processes or iterations. This directive forces all processors to complete work in a process if they are in one, then accept no more work, closing the control structure. Processors resume work at the first statement after the end of the control structure.

CMIC$ GUARD N

marks the beginning of a section of code to be protected from concurrent execution; N is an integer from 0 through 63. It may occur within a control structure or within a routine called from inside a control structure. The guarded portion of the program is not itself a control structure. GUARD with no value of N supplied constitutes a rapid guard, in which case no other guarded code can be executed simultaneously. Only sections of code guarded by the same N are prevented from executing simultaneously. The preprocessor cannot determine if one GUARD is nested inside another; a user must ensure that this does not happen, because it will lead to deadlock. The preprocessor issues a warning message to alert users to this potential error when it encounters a GUARD directive inside a subroutine that is not microtasked. The end of a critical section is indicated by CMIC$ END GUARD N.
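As an illustration of the fork structure, the following minimal sketch (not taken from the Cray manuals; the subroutine and the routines it calls are hypothetical) shows a PROCESS/ALSO PROCESS pair whose two limbs can be executed by different processors when more than one is available:

CMIC$ MICRO
      SUBROUTINE FORKED
C     each limb is taken by one processor; with two processors
C     available the calls to PARTA and PARTB proceed in parallel
CMIC$ PROCESS
      CALL PARTA
CMIC$ ALSO PROCESS
      CALL PARTB
CMIC$ END PROCESS
      RETURN
      END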
Within a microtasked subroutine the processors enter the subroutine and proceed to execute it in parallel. The flow of control of these processors is restricted only by the control structures introduced into the code. The user should not make any assumptions about how many processors are available. The basic model of execution is single program multiple data (SPMD) programming, where each process executes the entire program unless there is code that restricts it.

For example, a matrix multiplication program can be written in sequential FORTRAN as

      SUBROUTINE MXN(A,B,C,L,M,N)
      DIMENSION A(L,M), B(M,N), C(L,N)
      DO 100 K = 1,N
        J = 1
        DO 200 I = 1,L
          C(I,K) = A(I,J)*B(J,K)
  200   CONTINUE
        DO 100 J = 2,M
          DO 100 I = 1,L
            C(I,K) = C(I,K) + A(I,J)*B(J,K)
  100 CONTINUE
      RETURN
      END
The solution is such that only the result matrix changes as a result of calling the subroutine. In the subroutine the calculation of each element is independent of every other result. This knowledge can be used to introduce microtasking to the preceding code as follows:

CMIC$ MICRO
      SUBROUTINE MXN(A,B,C,L,M,N)
      DIMENSION A(L,M), B(M,N), C(L,N)
CMIC$ DO GLOBAL
      DO 100 K = 1,N
        J = 1
        DO 200 I = 1,L
          C(I,K) = A(I,J)*B(J,K)
  200   CONTINUE
        DO 100 J = 2,M
          DO 100 I = 1,L
            C(I,K) = C(I,K) + A(I,J)*B(J,K)
  100 CONTINUE
      RETURN
      END

The work on any one column is independent of the work on any other column and may therefore proceed in parallel with the work on other columns. This fact has been used to determine where to place the DO GLOBAL directive, which indicates that the following loop is to be partitioned among all available processors.

Another example of microtasking, applied to a four-way unrolled matrix multiplication, is now given:
      SUBROUTINE MXN(A,B,C,L,M,N)
      DIMENSION A(L,M), B(M,N), C(L,N)
      DO 100 K = 1,N
        DO 100 I = 1,L
          C(I,K) = (((A(I,1)*B(1,K) + A(I,2)*B(2,K)) +
     $              A(I,3)*B(3,K)) + A(I,4)*B(4,K))
  100 CONTINUE
C
C     DO REMAINING J'S
C
      DO 110 J = 5,M,4
        DO 110 K = 1,N
          DO 110 I = 1,L
            C(I,K) = C(I,K) + (((A(I,J)*B(J,K) + A(I,J+1)*B(J+1,K)) +
     $                A(I,J+2)*B(J+2,K)) + A(I,J+3)*B(J+3,K))
  110 CONTINUE
      RETURN
      END

The inner loop of the triple loop vectorises because its iterations are independent. The nesting order of the outer two loops does not matter for the single-threaded version. However, the outer loop has dependencies if it iterates on J, because each iteration modifies all elements of C. If, however, the nested loop is restructured so that the outer loop iterates on K, then each processor that takes an iteration will get a distinct column of C to modify. Proper microtasking requires only the following modifications.
CMIC$ MICRO
      SUBROUTINE MXN(A,B,C,L,M,N)
      DIMENSION A(L,M), B(M,N), C(L,N)
CMIC$ DO GLOBAL
      DO 100 K = 1,N
        DO 100 I = 1,L
          C(I,K) = (((A(I,1)*B(1,K) + A(I,2)*B(2,K)) +
     $              A(I,3)*B(3,K)) + A(I,4)*B(4,K))
  100 CONTINUE
C
C     DO REMAINING J'S
C
CMIC$ DO GLOBAL
      DO 110 K = 1,N
        DO 110 J = 5,M,4
          DO 110 I = 1,L
            C(I,K) = C(I,K) + (((A(I,J)*B(J,K) + A(I,J+1)*B(J+1,K)) +
     $                A(I,J+2)*B(J+2,K)) + A(I,J+3)*B(J+3,K))
  110 CONTINUE
      RETURN
      END
Within a microtasked subroutine, the objective is to permit parallel processing of global data by imposing structure on the parts of the routine that modify it. This is done by allowing processors to enter the subroutine and proceed through it; in Cray terminology this disorder is referred to as the fray. The fray itself is indeterminate from the beginning of the subroutine until the RETURN statement. The fray can be entered by any number of processors that become available while it is being processed. The flow of control of these processors is restricted only by the control structures introduced into the code. To enforce order, control structures can be defined within the body of the fray. A user cannot control or know in advance how many processors will operate in any control structure.

The advised rules for using microtasking ("Cray Multitasking Reference Manual", 1986) are as follows:

• Make no assumptions about how many processors are available.
• Modify global data only inside control structures.
• Local variables set inside control structures are not visible outside.
• Do not nest control structures.
• Iterations of microtasked DO loops must be genuinely independent.
• Use GUARD only inside control structures.
• Verify a microtasked program before using it for production.
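The following minimal sketch (the subroutine name and variables are hypothetical, not from the Cray manuals) illustrates two of these rules: the global variable TOTAL, assumed to be zeroed by the caller, is modified only inside a control structure, and the update itself is serialised with a GUARD used inside that structure:

CMIC$ MICRO
      SUBROUTINE SUMSQ (A, N, TOTAL)
      DIMENSION A(N)
C     iterations are shared among the available processors; the
C     guarded region serialises the updates of the shared TOTAL
CMIC$ DO GLOBAL
      DO 100 I = 1,N
CMIC$ GUARD 1
        TOTAL = TOTAL + A(I)*A(I)
CMIC$ END GUARD 1
  100 CONTINUE
      RETURN
      END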
The aim of microtasking is to improve the execution time of a program by a factor close to P, where P is the number of processors requested in CMIC$ GETCPUS. The low overhead of microtasking makes this more attainable than it is with
other forms of multitasking. According to Cray the total processor time over all processors of the parallel code should not be much higher than that of the single-processor version. In fact, a microtasked program, running on one processor, should never be more than 5 percent slower than a single-processor version of the program.

A user must impose structure on, and essentially partition, entire subroutines; the scope of all variables in such a subroutine must be carefully checked. However, the ability to protect critical regions allows a user considerable freedom in choosing the type and granularity of tasks to be concurrently processed. Also, a user can always simplify the scoping of variables in complicated routines by pulling each loop to be microtasked out of line into a new subroutine. With large and sometimes complicated software packages, the most important part of applying microtasking is identifying which parts of a code to microtask. Vectorisation should never be overlooked, since on the Cray there is a high vector to scalar speed ratio.

Microtasking has three advantages over macrotasking. First, a user need make fewer modifications to convert an existing program. This is important because very few applications have been written with parallelism in mind; programmers will be converting code for a long time. Second, microtasking has a much lower synchronisation overhead than macrotasking, which lets a user parallelise smaller pieces of code effectively. Because microtasking is so efficient, users can parallelise individual loops rather than entire subroutines. Thus, the compiler can vectorise the inner loop while the outer loop runs in parallel. Not only does this approach use the hardware effectively, it is also easier to detect data dependencies in a loop than across subroutine boundaries. Third, the microtasking code is often easier to understand than the macrotasking version.

The simplicity of microtasking is deceptive, since in practice this code is often difficult to debug. The culprit is the SPMD model, which requires great care in specifying data sharing. Some errors are extremely hard to locate because they are time dependent. Microtasking also reduces the degree of parallelism because it does not permit nested parallel structures: a parallel DO may not appear in the scope of a PROCESS block. As the number of processors increases this may become a serious drawback.

Microtasking does not use the hardware as efficiently as macrotasking. In macrotasking, processes are suspended while they wait, freeing the processor for processes that are ready to do useful work. Microtasking achieves efficient synchronisation by using spin waits, which require each process to check continually the status of a semaphore and restart as soon as the flag is set. Unfortunately, this approach keeps processors busy even if
processes are not doing useful work; thus, there is no point in defining more processes than processors.

In summary, microtasking is a technique for multiprocessing programs that is based on exploiting the parallelism in DO loops. The programmer determines where parallelism exists and then places directives in the text of the program. A preprocessor reads these directives and translates them and their associated DO loops into a form suitable for the compiler. Code is generated that allows the program to use extra processors if they are available. Loop iterations are assigned to the next processor ready for work, resulting in excellent load balancing. Microtasking is based on subroutine rather than loop boundaries; subroutines provide a natural break as far as the scope of variables is concerned. A disadvantage of microtasking is that it may not be safe for multiple processors to execute the code inside apparently parallelisable DO loops. Also the burden is still placed on the programmer to find the potential parallelism, and this may be difficult; to solve these problems autotasking was introduced.

5.4 Autotasking

Automatic multitasking or autotasking is the latest tool provided by Cray to help users execute their programs in parallel. It automatically partitions outer FORTRAN DO loops and spreads the tasks across multiple processors (when possible). Autotasking is intended to capitalise upon the strengths of microtasking, such as good load balancing, low synchronisation overhead and maximum use of idle processor cycles. Autotasking is most effective when it is used on programs that have a high degree of parallelism, that is, on programs in which most of the work is in nested DO loops. To execute correctly in parallel, the iterations of a DO loop must be independent, a property that is often difficult to recognise in complex loops; in certain situations this may require loops to be transformed. Autotasking in the compiling system consists of three phases: data dependence analysis, translation and code generation.

5.4.1 Data Dependence Analysis
This phase looks for parallelism within program units, recognising when iterations of DO loops operate on independent elements of arrays and inserting directives. The input to the dependence analysis phase is FORTRAN source code and the output is FORTRAN source code (possibly restructured) with autotasking directives added to express the parallelism. The main benefits from the dependence analyser are these:
(i) Enhanced vectorisation as a result of possible statement reordering, ambiguous subscript resolution, reference reordering, splitting subroutine calls out of loops, loop nest restructuring and loop exchanges to get a stride of one or the longest vector length on the innermost loop;
(ii) Recognition and generation of parallel constructs by the insertion of directives for use by the translation phase;
(iii) Automatic inlining of subroutines to remove the overhead of the subroutine call;
(iv) Specific code sequence recognition for situations such as matrix multiplication, first and second order linear recurrences, dot product and search for maximum and minimum. The system generates calls directly to optimised library routines that can perform these functions in parallel.

Because some types of parallelism are difficult to locate using the dependence analyser, the programmer can insert directives identified as CFPP$ directives. CFPP$ directives inform the dependence analyser where to look for parallelism or make assertions that may allow it to recognise parallelism. There are four dependence analyser directives, which begin with CFPP$ and are defined as follows:
CFPP$ option [S]

where S refers to the scope of the directive; namely, L[OOP] for the next DO loop (but not its inner loops), R[OUTINE] for the rest of the current subprogram, F[ILE] for the rest of the file. Examples of these directives are as follows (the second alternative disables the first):

(i) CFPP$ CONCUR [S]
    CFPP$ NOCONCUR [S]
The CONCUR directive tells the dependence analyser to look for parallelisation over the scope S. The NOCONCUR directive disables concurrency analysis.
(ii) CFPP$ INNER [S]
    CFPP$ NOINNER [S]
The INNER directive enables concurrency analysis for innermost vectorisable DO loops over the scope specified. It is recognised only
when concurrency analysis is enabled; that is, when the dependence analyser phase cannot find exploitable parallelism on an outer DO loop, it detects an innermost vectorisable DO loop. The NOINNER directive disables concurrency analysis for innermost vectorisable DO loops over the scope specified. Hence INNER/NOINNER directives refer only to vectorisable innermost DO loops; nonvectorisable innermost DO loops are subject to concurrency analysis under the control of the CONCUR directive.
(iii) CFPP$ VECTOR [S]
     CFPP$ NOVECTOR [S]
The VECTOR directive enables vectorisation analysis for innermost DO loops over the scope specified. The NOVECTOR directive disables vectorisation analysis over the scope specified; it is activated only when NOCONCUR has precedence and is ignored otherwise.
(iv) CFPP$ SKIP[S]
This directive disables both concurrency and vectorisation analysis.
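As a sketch of directive placement (the routine is hypothetical, and the exact scope notation should be checked against the compiler documentation), concurrency analysis could be disabled for a routine as a whole while being re-enabled for one loop that is known to be safe:

      SUBROUTINE SMOOTH (A, B, N)
      DIMENSION A(N), B(N)
CFPP$ NOCONCUR [R]
C     concurrency analysis is off for the rest of this routine,
C     but enabled for the next loop, whose iterations are independent
CFPP$ CONCUR [L]
      DO 10 I = 2,N-1
        B(I) = (A(I-1) + A(I) + A(I+1)) / 3.0
   10 CONTINUE
      RETURN
      END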
5.4.2 Translation

This phase uses directives to restructure the code for parallel execution. The directives (indicated by the prefix CMIC$) may be inserted by either the dependence analyser or the programmer. The output is FORTRAN source code enhanced with calls to machine-dependent library routines and intrinsic statements embedded in the source code to control parallel execution. This phase enforces the scoping requirements by ensuring that every variable has a scope, which is either private or shared. There is only one copy of each shared variable, but there are potentially many copies of private variables. The primary function of the translator is to rewrite the FORTRAN code with directives into pure FORTRAN for use by the code generation phase. The directives are expanded into a series of special function calls and compiler intrinsics. The programmer can insert directives, which are instructions to the translator. The directives to the translator begin with CMIC$ and are as follows:

(i) CMIC$ PARALLEL [if(expr)] [shared(var[,var]...)] [private(var[,var]...)]
    CMIC$ END PARALLEL
These directives delimit a parallel region, thereby providing a technique for modifying the scope of some variables to enable multiprocessing to occur; they indicate where multiple processors enter execution. For example,
CMIC$ PARALLEL
      PYE = 3.14
      BEE = 1.2*XX
      BUZZ = SQRT (A+D)
CMIC$ END PARALLEL
indicates that the three assignment statements can be executed in parallel. The if(expr) parameter is used as a run-time test to choose between uniprocessing and multiprocessing. The variables listed under shared have global scope, so that they are accessible to all tasks. The variables listed under private have local scope; that is, each task will have its own copy.
(ii) CMIC$ DO ALL [if(expr)] [shared(var[,var]...)] [private(var[,var]...)] [savelast] [single] [chunksize(n)] [numchunks(m)] [guided] [vector]
indicates that a DO loop is to be executed in parallel using multiple processors. The DO ALL directive is a special shorthand notation for the two consecutive directives PARALLEL and DO PARALLEL; that is, DO ALL initiates a parallel region whose only code is a DO loop with independent iterations. Savelast indicates that the last value of the iteration is to be retained; the other parameters specify the work distribution policy for the iterations of the DO loop, e.g., whether single or multiple iterations should be assigned to a processor. For example,

CMIC$ DO ALL IF (N*M .GT. 1000)
CMIC$1 SHARED (M,N,A) PRIVATE (I,J,X)
CMIC$2 SAVELAST
      DO 100 J = 1,M
CDIR$ IVDEP
        DO 100 I = 1,N
          X = SQRT (A(I,J))
          A(I,J) = X + 1.0/X
  100 CONTINUE
The IVDEP directive ensures that the inner loop is vectorised. All variables used in the loop are identified as shared or private, that is, as accessible to all processors or local to each processor. The if test is evaluated at run time to determine whether the loop should execute on multiple processors or be run on a single processor.
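The work distribution parameters of DO ALL can be combined with the clauses shown above. The following sketch, with hypothetical array names and an arbitrary chunk size, hands out iterations to processors 64 at a time; GUIDED would instead assign chunks of decreasing size as the loop nears completion (guided self-scheduling):

CMIC$ DO ALL SHARED (A, B, N) PRIVATE (I) CHUNKSIZE(64)
      DO 150 I = 1, N
         A(I) = A(I) + B(I)
150   CONTINUE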
(iii) CMIC$ DO PARALLEL [single] [chunksize(n)] [numchunks(m)] [guided]

indicates that the DO loop that follows may be executed in parallel by multiple processors. The parameters can be used to specify the work distribution policy for the iterations of the loop. (Recall that DO ALL is simply shorthand for PARALLEL followed by DO PARALLEL.) An example of the use of some of these directives is the case of a reduction, which can be expressed in sequential FORTRAN as
      SUM = 0.0
      XSUM = 0.0
      DO 200 J = 1, JMAX
         XSUM = XSUM + A(J)*B(J)
200   CONTINUE
      SUM = SUM + XSUM
which can be changed to
      SUM = 0.0
CMIC$ PARALLEL PRIVATE (XSUM)
      XSUM = 0.0
CMIC$ DO PARALLEL
      DO 200 J = 1, JMAX
         XSUM = XSUM + A(J)*B(J)
200   CONTINUE
CMIC$ GUARD
      SUM = SUM + XSUM
CMIC$ END GUARD
CMIC$ END DO
CMIC$ END PARALLEL

In fact, this is a case that can be handled automatically by the dependence analyser.

(iv) CMIC$ GUARD [n]
CMIC$ END GUARD [n]

These directives delimit a critical section and provide the necessary synchronisation to protect the code inside the critical region; the code must be executed by only one processor at any time. The optional parameter n is used to identify guarded regions individually; differently identified regions can then be executed in parallel.
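As a sketch of the optional identifier (the updates shown are illustrative), two guarded regions with different identifiers exclude processors independently, so one processor may be inside guard 1 while another is inside guard 2:

CMIC$ GUARD 1
      SUM1 = SUM1 + X
CMIC$ END GUARD 1
CMIC$ GUARD 2
      SUM2 = SUM2 + Y
CMIC$ END GUARD 2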
(v) CMIC$ CASE
CMIC$ END CASE
These directives are used to identify blocks of code that can be executed in parallel. For example,

CMIC$ CASE
      CALL ABC
CMIC$ CASE
      CALL DEF
CMIC$ END CASE
A single CASE directive in a parallel region can be used to force a single processor to execute a block of code.

(vi) CMIC$ SOFT EXIT

indicates that the GOTO statement on the next line branches outside the currently executing parallel region.
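A minimal sketch of SOFT EXIT placement; the loop, test and labels are illustrative only:

CMIC$ DO ALL SHARED (A, N) PRIVATE (I)
      DO 400 I = 1, N
         IF (A(I) .LT. 0.0) THEN
C           Warn the system that the GOTO on the next line leaves
C           the currently executing parallel region.
CMIC$ SOFT EXIT
            GOTO 999
         ENDIF
400   CONTINUE
999   CONTINUE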
(vii) CMIC$ CONTINUE

indicates that the external call on the next line has been specially prepared for execution in parallel.

5.4.3 Code Generation

This phase uses the translation phase output to generate executable machine code. When possible, the compiling system vectorises the innermost loop of a nest of DO loops and autotasks the outermost loop across multiple processors. A user can direct the compiling system to stripmine a single vectorised DO loop with a long iteration count, effectively transforming the loop into a nested pair with a vector inner loop and a parallel outer loop (see the sketch at the end of this subsection). However, vectorisable loops are not required for autotasking to work.

In summary, autotasking offers the possibility of more parallelism being found and exploited; this can be enhanced by the user selecting the most efficient forms of parallel processing and directing the dependence analyser and translator to create faster running programs. The concept of a parallel region allows the computational overhead of processor startup to be minimised and spread over multiple exploitable regions of code. It should be remembered that the benefits of vectorisation are substantially greater than those of parallelisation and should therefore not be neglected.
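The stripmining transformation can be visualised with the following before-and-after sketch; the strip length of 512 is an arbitrary illustrative choice:

C     Before: one long vectorisable loop.
      DO 10 I = 1, N
         A(I) = B(I) + C(I)
10    CONTINUE

C     After stripmining: the outer loop over strips can be
C     autotasked, while the inner loop over each strip vectorises.
      DO 30 IS = 1, N, 512
         DO 20 I = IS, MIN(IS+511, N)
            A(I) = B(I) + C(I)
20       CONTINUE
30    CONTINUE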
VI. PARALLEL COMPUTING FORUM

As a result of the proliferation of parallel FORTRAN dialects, steps have been taken in the United States to try to standardise parallel languages
for shared memory machines. A number of U.S. academics and suppliers of parallel machines have been meeting as a group known as the Parallel Computing Forum (PCF). The objective of this forum is to produce common extended FORTRAN syntax and semantics that allow concurrent processing to be expressed easily. The membership of the forum is such that its deliberations are likely to be widely adopted. The main proposals are as follows:

(i) PARALLEL DO, which describes parallelism between the iterations of a DO loop. Each processor executes an iteration, and execution of the statement after END PARALLEL does not occur until all processes have completed.

(ii) PARALLEL SECTIONS, which describes parallelism between sections of code. For example,
PARALLEL SECTIONS
LOCAL VARIABLES
SECTION
   ...
SECTION
END PARALLEL SECTIONS

The construct can have local variable definitions.

(iii) PARALLEL REGION, which is executed in parallel by a number of processors that cooperate in the execution of any PDO or PSECTIONS constructs that they encounter. The PDO construct means that the iterations of a loop are assigned to all available processors; the PSECTIONS construct describes parallelism between blocks of code. There is an implicit tasking control protocol that specifies subroutine-level parallelism.
In addition, PCF FORTRAN directly supports several mechanisms for providing synchronisation between the concurrent activities that make up a parallel program:

(i) LOCKS are for the mutual exclusion of access to shared data. A lock has two states, locked and unlocked, and only the process that performed the lock can unlock it.
(ii) CRITICAL SECTIONS provide a structured mechanism for using locks without having to use the LOCK primitives directly.
(iii) EVENTS have two states, cleared and posted. An event may be posted by any process and cleared by any process, not necessarily the one that posted it.
(iv) SEQUENCES are for communicating between iterations of a loop or between distinct loops.
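As a rough sketch of a lock protecting a shared update (the primitive spellings here are hypothetical, not the actual PCF ones):

C     LOCK/UNLOCK are illustrative names, not the PCF spellings;
C     only one process at a time may execute the guarded update.
      CALL LOCK (L)
      SUM = SUM + XSUM
      CALL UNLOCK (L)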
The PCF syntax and semantics provide basic parallel constructs for specifying parallel processing, suitable for execution on a shared memory multiprocessor. One goal of these extensions is to avoid a program's dependence on the actual number of processors available at any given time to execute the program. Since a programmer is not constrained to coding for a limited number of processors, the program may describe either more or less parallelism than there are processors in the actual system. An implementation of these extensions requires only that at least one processor execute a parallel construct; consequently a programmer cannot control the number of processors executing a program.

Recently the deliberations of the Parallel Computing Forum have been passed to the U.S. standards organisation, ANSI, which has set up a committee, X3H5, to produce a FORTRAN-based standard for shared memory multiprocessor machines. If, or when, such a standard is produced, it is likely to have a major impact on languages for multiprocessor machines throughout the United States and beyond.

VII. SUMMARY

The developments in languages for parallel computing within the academic community have produced synchronisation and communication primitives such as message passing and the monitor. Over the years, languages based on these primitives have been implemented on a number of parallel machines; however, they have not gained widespread acceptance within the commercial community.

Typical of the language approaches available for commercial machines are those found on Cray machines, namely macrotasking, microtasking and autotasking. These exploit different levels of parallelism, and, when appropriate, all three can be used in the same program; however, microtasking and autotasking are not allowed in the same subroutine. Macrotasking is best suited to very high levels of parallelism; microtasking is most effective at the subroutine level; and autotasking, the only method not requiring programmer intervention, can exploit parallelism at the DO loop level.

The Cray experience provides an interesting, perhaps depressing, insight into the ability of the programming community to adapt to the concepts of parallel programming. Originally Cray provided a vectorising compiler that was enthusiastically received, as it could be used without user intervention; it was partly responsible for the success of the Cray-1. For their first multiprocessor machine they took the radical step of providing explicit parallel features in the form of the macrotasking library routines; this was
not in general a success with users, who found the concepts difficult to understand and difficult to use. The next approach moved away from explicit features and introduced compiler directives that the user could apply within subroutines; it was the user's responsibility to perform the data dependence analysis needed to decide which loops could be executed in parallel on multiple processors. This approach was more successful with users, as it did not require as much code restructuring as macrotasking. Later Cray introduced autotasking, which automatically partitions DO loops across multiple processors and can be used without any user intervention.
ACKNOWLEDGMENTS

It is a pleasure to acknowledge the help received from P. McMurray and A. Ramasubbu in connection with the Cray X-MP.