Imaging for Detection and Identification
NATO Security through Science Series

This Series presents the results of scientific meetings supported under the NATO Programme for Security through Science (STS). Meetings supported by the NATO STS Programme are in security-related priority areas of Defence Against Terrorism or Countering Other Threats to Security. The types of meeting supported are generally “Advanced Study Institutes” and “Advanced Research Workshops”. The NATO STS Series collects the results of these meetings. The meetings are co-organized by scientists from NATO countries and scientists from NATO’s “Partner” or “Mediterranean Dialogue” countries. The observations and recommendations made at the meetings, as well as the contents of the volumes in the Series, reflect those of participants and contributors only; they should not necessarily be regarded as reflecting NATO views or policy. Advanced Study Institutes (ASI) are high-level tutorial courses which convey the latest developments in a subject to an advanced-level audience. Advanced Research Workshops (ARW) are meetings of experts where an intense but informal exchange of views at the frontiers of a subject aims at identifying directions for future actions. Following a transformation of the programme in 2004 the Series has been re-named and re-organised. Recent volumes on topics not related to security, which result from meetings supported under the programme earlier, may be found in the NATO Science Series. The Series is published by IOS Press, Amsterdam, and Springer, Dordrecht, in conjunction with the NATO Public Diplomacy Division.

Sub-Series
A. Chemistry and Biology                      Springer
B. Physics and Biophysics                     Springer
C. Environmental Security                     Springer
D. Information and Communication Security     IOS Press
E. Human and Societal Dynamics                IOS Press

http://www.nato.int/science
http://www.springer.com
http://www.iospress.nl

Series B: Physics and Biophysics
Imaging for Detection and Identification

edited by

Jim Byrnes
Prometheus Inc., Newport, U.S.A.
Published in cooperation with NATO Public Diplomacy Division
Proceedings of the NATO Advanced Study Institute on Imaging for Detection and Identification, Il Ciocco, Italy, 23 July–5 August 2006

A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10  1-4020-5618-4 (PB)
ISBN-13  978-1-4020-5618-5 (PB)
ISBN-10  1-4020-5619-2 (HB)
ISBN-13  978-1-4020-5619-2 (HB)
ISBN-10  1-4020-5620-6 (e-book)
ISBN-13  978-1-4020-5620-8 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved
© 2007 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
CONTENTS

Preface  vii

Multi-perspective Imaging and Image Interpretation
Chris J. Baker, H. D. Griffiths, and Michele Vespe  1

Radar Imaging for Combatting Terrorism
Hugh D. Griffiths and Chris J. Baker  29

Optical Signatures of Buried Mines
Charles A. Hibbitts  49

Chemical Images of Liquids
L. Lvova, R. Paolesse, C. Di Natale, E. Martinelli, E. Mazzone, A. Orsini, and A. D'Amico  63

Sequential Detection Estimation and Noise Cancelation
E. J. Sullivan and J. V. Candy  97

Image Fusion: A Powerful Tool for Object Identification
Filip Šroubek, Jan Flusser, and Barbara Zitová  107

Nonlinear Statistical Signal Processing: A Particle Filtering Approach
J. V. Candy  129

Possibilities for Quantum Information Processing
Štěpán Holub  151

Multifractal Analysis of Images: New Connexions between Analysis and Geometry
Yanick Heurteaux and Stéphane Jaffard  169

Characterization and Construction of Ideal Waveforms
Myoung An and Richard Tolimieri  195

Identification of Complex Processes Based on Analysis of Phase Space Structures
Teimuraz Matcharashvili, Tamaz Chelidze, and Manana Janiashvili  207

Time-resolved Luminescence Imaging and Applications
Ismail Mekkaoui Alaoui  243

Spectrum Sliding Analysis
Vladimir Ya. Krakovsky  249

Index  263
PREFACE
The chapters in this volume were presented at the July–August 2006 NATO Advanced Study Institute on Imaging for Detection and Identification. The conference was held at the beautiful Il Ciocco resort near Lucca, in the glorious Tuscany region of northern Italy. For the eighth time we gathered at this idyllic spot to explore and extend the reciprocity between mathematics and engineering. The dynamic interaction between world-renowned scientists from the usually disparate communities of pure mathematicians and applied scientists which occurred at our seven previous ASIs continued at this meeting. The fusion of basic ideas in mathematics, radar, sonar, biology, and chemistry with ongoing improvements in hardware and computation offers the promise of much more sophisticated and accurate detection and identification capabilities than currently exist. Coupled with the dramatic rise in the need for surveillance in innumerable aspects of our daily lives, brought about by hostile acts deemed unimaginable only a few short years ago, the time is ripe for image processing scientists in these usually diverse fields to join together in a concerted effort to combat the new brands of terrorism. This ASI was one important initial step. To encompass the diverse nature of the subject and the varied backgrounds of the participants, the ASI was divided into three broadly defined but interrelated areas: the mathematics and computer science of automatic detection and identification; image processing techniques for radar and sonar; and detection of anomalies in biomedical and chemical images. A deep understanding of these three topics, and of their interdependencies, is clearly crucial to meet the increasing sophistication of those who wish to do us harm. The principal speakers and authors of the following chapters include many of the world's leading experts in the development of new imaging methodologies to detect, identify, and prevent or respond to these threats. The ASI brought together world leaders from academia, government, and industry, with extensive multidisciplinary backgrounds evidenced by their research and participation in numerous workshops and conferences. This forum provided opportunities for young scientists and engineers to learn more about these problem areas, and the crucial role played by new insights, from recognized experts in this vital and growing area of harnessing mathematics and engineering in the service of a world-wide public security interest. An ancillary benefit will be the advancement of detection and identification capabilities for natural threats such as disease, natural disasters, and environmental change.
The talks and the following chapters were designed to address an audience consisting of a broad spectrum of scientists, engineers, and mathematicians involved in these fields. Participants had the opportunity to interact with those individuals who have been on the forefront of the ongoing explosion of work in imaging and detection, to learn firsthand the details and subtleties of this exciting area, and to hear these experts discuss in accessible terms their contributions and ideas for future research. This volume offers these insights to those who were unable to attend. The cooperation of many individuals and organizations was required in order to make the conference the success that it was. First and foremost I wish to thank NATO, and especially Dr. F. Pedrazzini and his most able assistant, Ms. Alison Trapp, for the initial grant and subsequent help. Financial support was also received from the Defense Advanced Research Projects Agency (Drs. Joe Guerci and Ed Baranoski), AFOSR (Drs. Arje Nachman and Jon Sjogren), ARO (Drs. Russ Harmon and Jim Harvey), EOARD (Dr. Paul Losiewicz), ONR (Mr. Tim Schnoor, Mr. Dave Marquis and Dr. Tom Swean) and Prometheus Inc. This additional support is gratefully acknowledged. I wish to express my sincere appreciation to my assistants Parvaneh Badie, Marcia Byrnes and Ben Shenefelt and to the co-director, Temo Matcharashvili, for their invaluable aid. Finally, my heartfelt thanks to the Il Ciocco staff, especially Bruno Giannasi, for offering an ideal setting, not to mention the magnificent meals, that promoted the productive interaction between the participants of the conference. All of the above, the speakers, and the remaining conferees, made it possible for our Advanced Study Institute, and this volume, to fulfill the stated NATO objectives of disseminating advanced knowledge and fostering international scientific contacts. August 6, 2006
Jim Byrnes Il Ciocco, Italy
MULTI-PERSPECTIVE IMAGING AND IMAGE INTERPRETATION
Chris J. Baker, H. D. Griffiths, and Michele Vespe Department of Electronic and Electrical Engineering, University College London, London, UK
Abstract. High resolution range profiling and imaging have been the principal methods by which more and more detailed target information can be collected by radar systems. The level of detail that can be established may then be used to attempt classification. However, this has typically been achieved using monostatic radar viewing targets from a single perspective. In this chapter methods for achieving very high resolutions are reviewed. The techniques include wide instantaneous bandwidths, stepped frequency and aperture synthesis. Examples showing the angular dependency of high range resolution profiles and two-dimensional imagery of real, full scale targets are presented. These data are examined as a basis for target classification and highlight how the features observed relate to the structures that compose the target. A number of classification techniques are introduced, including statistical, feature vector and neural based approaches. These are combined into a new method of classification that exploits multiple perspectives. Results, again based upon analysis of real target signatures, are presented and used to examine the selection of perspectives to improve the overall classification performance.

Key words: ATR, radar imaging, multi-perspective classification
1. Introduction

Target classification by radar offers the possibility of remotely identifying objects at ranges well in excess of those of any other sensor. Indeed, radar has been employed in operational systems for many years. However, these systems use human interpretation of radar data, and performance is generally unreliable and slow. Nevertheless, the benefits of fast and reliable classification are enormous and have the potential to open huge areas of new application. Research in recent years has been intense but still, automated or semi-automated classification able to work acceptably well in all conditions
seems a long way off. The prime approach to developing classification algorithms has been to use higher and higher spatial resolutions, either one-dimensional range profiles (Hu and Zhu, 1997) or two-dimensional imagery (Novak et al., 1997). High resolution increases the level of detail in the data to be classified and this has generally been seen as providing more and better information. However, the performance of classifiers, whilst very good against a limited set of free-space measurements, is much less satisfactory when applied to operationally realistic conditions. In this chapter we will review methods for obtaining high resolution, use these to generate high resolution target signatures and subsequently illustrate some of their important aspects that require careful understanding if classification is to be successful. We then examine some typical classifiers before considering in more detail an approach to classification that uses a multiplicity of perspectives as the data input. This also enables much information about the nature of target signatures and their basis for classification to be evaluated.

Firstly, though, the concepts of resolution and classification are discussed, as these terms are often used with imprecise or varying meanings. Most often, resolution is defined as the ability of radar (or any sensor) to distinguish between two closely spaced scatterers. A measure of this ability is captured in the radar system point spread function or impulse response function, with resolving power being determined by the 3-dB points. This is a reasonable definition but care needs to be taken, as there is an implicit assumption that the target to be considered has point-like properties. This is well known not to be the case, but nevertheless it has proved a useful descriptor of radar performance. As will be seen later, if resolution is improved more scatterers on a target can be distinguished from one another and there is undoubtedly an improved amount of detail in the resulting signature.

Care also has to be taken with synthetic aperture imaging in two dimensions such as SAR and ISAR. These imaging techniques again have an implicit assumption that targets have point-like properties and are continuously illuminated during formation of the image. Once again most targets are not points and quite often there is occlusion of one part of a target by another. For example, high-placed scatterers near the front of a target often place the scatterers further back in shadow and hence they are not imaged. Thus the resolution in a typical SAR image is not necessarily constant, a fact often overlooked by developers of classification algorithms. However, once again, these assumptions have still resulted in robust SAR and ISAR image generation and as larger apertures are synthesised there is an undeniable increase in detail in the resulting imagery. We will return to these themes as we explore high resolution techniques and approaches to classification of the resulting target signatures.
2. High Down Range Resolution Techniques

Despite the reservations discussed above we nevertheless begin by modelling the reflectivity from a target as the coherent sum of a series of spatially separated scatterers (Keller, 1962). Its simplicity offers valuable insight into some of the scattering processes observed. Thus the frequency domain reflectivity function ζθ(f) of a complex target, illuminated at a given aspect angle θ by an incident field of frequency f whose wavelength λ ≪ D, where D is the physical dimension of the target, is given by:

\zeta_\theta(f) = \sum_{i=1}^{N} \zeta_\theta^i(f)     (1)

where

\zeta_\theta^i(f) = A_\theta^i(f) \exp\left(-j \vartheta_\theta^i(f)\right).     (2)
Thus we see that the amplitude and phase of the ith scatterer depend on both frequency and aspect angle. An inverse FFT will yield the complex reflectivity function or range profile, where the magnitude in each of the IFFT bins represents the magnitude of the reflections from the scatterers in a range resolution cell. The resolution in the range dimension is related directly to the pulse width Tp of the waveform. Two targets of equal RCS are said to be resolved in range when they are separated from each other by a distance:

d = \frac{c T_p}{2}.     (3)
This equation tells us the range resolution of a pulsed radar system when the pulse is not modulated. Thus a very short pulse is needed if high range resolution is required. Indeed, to resolve all the scatterers unambiguously the pulse length has to be short enough that only one scatterer appears in each range cell. Normally as high a range resolution as possible is used and it is accepted that not all scatterers will be resolved. The resulting ambiguity is one source of ensuing difficulty in the next stage of classification. In long range radar, a long pulse is needed to ensure sufficient energy to detect small targets. However, this long waveform has poor range resolution. The use of a short duration pulse in a long range radar system implies that a very high peak power is required. There is a limitation on just how high the peak power of the short pulse can be: ultimately the voltage required will arc, or break down, risking damage to the circuitry and poor transmission efficiency.
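A minimal numerical sketch may help fix ideas about Equations (1)–(3): a few ideal point scatterers are summed in the frequency domain and an inverse FFT produces the range profile. All parameter values (bandwidth, scatterer positions and amplitudes) are illustrative, not taken from the measurements described later in this chapter.

```python
import numpy as np

# Point-scatterer model of Equations (1)-(2): each scatterer at range r
# contributes a phase ramp exp(-j*4*pi*f*r/c) across the sampled band.
c = 3e8                                  # propagation speed (m/s)
B = 500e6                                # sampled bandwidth (Hz)
n_bins = 256                             # frequency samples
f = np.arange(n_bins) * (B / n_bins)     # baseband frequency grid

ranges = np.array([1.2, 4.5, 9.0])       # scatterer down-range positions (m)
amps = np.array([1.0, 0.8, 0.6])         # scatterer amplitudes

zeta = np.zeros(n_bins, dtype=complex)   # Equation (1): coherent sum
for A, r in zip(amps, ranges):
    zeta += A * np.exp(-1j * 4 * np.pi * f * r / c)

profile = np.abs(np.fft.ifft(zeta))      # range profile magnitude
bin_m = c / (2 * B)                      # one IFFT bin spans 0.3 m of range
top = np.sort(np.argsort(profile)[-3:])  # three strongest range cells
print("range bin size: %.2f m" % bin_m)
print("recovered scatterer ranges (m):", top * bin_m)
```

Scatterers closer together than c/2B would merge into a single range cell, which is exactly the ambiguity referred to above.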
Figure 1. Changing the frequency as a function of time across the duration of a pulse. B is the modulation bandwidth and T the pulse duration.
One method which is commonplace today and overcomes this limitation is pulse compression (Knott et al., 1985). Pulse compression is a technique that consists of applying a modulation to a long pulse or waveform such that the bandwidth B of the modulation is greater than that of the un-modulated pulse (i.e. 1/Tp). On reception the long pulse is processed by a matched filter to obtain the equivalent resolution of a short pulse (of width 1/B). In the time domain the received signal is correlated with a time-reversed replica of the transmitted signal (delayed to the chosen range). This compresses the waveform into a much shorter duration, equivalent to a pulse of length 1/B. In other words the resolution is determined by the modulation bandwidth and not the pulse duration. Thus the limitation of poor range resolution using long pulses is overcome and long range as well as high resolution can both be achieved together. Pulse compression can be achieved by frequency, phase and amplitude modulation. The most common form of modulation used is to change the frequency from the start to the end of the pulse such that the required bandwidth B is swept out, as shown in Figure 1. This leads to a resolution given by:

d = \frac{c}{2B}.     (4)
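The following sketch, with illustrative values only, demonstrates the compression step: a 10 μs linear FM pulse with B = 500 MHz is matched-filtered against itself, and the width of the compressed mainlobe comes out at roughly 1/B, which is the origin of Equation (4).

```python
import numpy as np

# Pulse compression sketch: a long linear FM pulse is correlated with its
# conjugate replica (matched filter), compressing it to a width of about
# 1/B regardless of the transmitted pulse length Tp.
fs = 2e9                  # complex sampling rate (Hz)
Tp = 10e-6                # uncompressed pulse length: 10 us
B = 500e6                 # swept bandwidth (Hz)
t = np.arange(0.0, Tp, 1.0 / fs)
chirp = np.exp(1j * np.pi * (B / Tp) * t ** 2)   # LFM sweep, 0 -> B Hz

# FFT-based autocorrelation = matched-filter output for the noiseless pulse
n = 2 * len(chirp)
spec = np.fft.fft(chirp, n)
compressed = np.abs(np.fft.fftshift(np.fft.ifft(spec * np.conj(spec))))
compressed /= compressed.max()

# -3 dB mainlobe width, converted to range resolution d = c * width / 2
above = np.where(compressed >= 0.5)[0]
width_s = (above[-1] - above[0]) / fs
print("time-bandwidth product: %d" % round(B * Tp))
print("compressed width %.2f ns vs 1/B = %.2f ns" % (width_s * 1e9, 1e9 / B))
print("range resolution ~ %.2f m (c/2B = %.2f m)" % (3e8 * width_s / 2, 3e8 / (2 * B)))
```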
To achieve even higher range resolutions a frequency modulated stepped-frequency waveform may be employed. This reduces the instantaneous modulation bandwidth requirement while increasing the overall bandwidth. In other words the necessary wider bandwidth waveform is
Figure 2. Spectrum reconstruction of the target reflectivity function using a series of chirp pulses (a) and the corresponding synthesised bandwidth (b).
synthesised using a number of pulses. However, note that this has the disadvantage of collecting the individual waveforms over a finite period of time, making the required coherency vulnerable to target motion. High range resolution (HRR) profiles are subsequently produced by processing a wideband reconstruction of a target's reflectivity spectrum in the frequency domain (Wilkinson et al., 1998). By placing side by side N narrower chirp waveforms or pulses of bandwidth B and using an inter-pulse frequency step increment equal to B/2, it is possible to synthesise a total bandwidth of Bt = (N + 1)B/2. This is illustrated in Figure 2. A range profile may then be defined as a time sequence of the vector sum of signals reflected back by different scatterers within a range cell. By matched filtering this stepped frequency waveform a range resolution d = c/(2Bt) is achieved. The received signal is the transmitted pulse modulated by the corresponding sub-spectrum representing the target reflectivity function. By adding the compressed individual portions of the reflectivity function, which result from time convolution of each received pulse with the complex conjugate of the corresponding transmitted pulse, the entire spectrum is eventually obtained, commensurate with the extended bandwidth. The HRR profile may then be synthesised from an inverse FFT applied to each row of the time history of the target's frequency domain signature matrix.
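A conceptual sketch of this spectrum stitching follows. For simplicity it assumes ideal point scatterers, perfect pulse-to-pulse coherence, and non-overlapping sub-bands (step equal to B), so the synthesised bandwidth is N·B rather than the overlapped (N + 1)B/2 of Figure 2; the carrier value and all other numbers are illustrative.

```python
import numpy as np

# Stepped-frequency synthesis: sample the target reflectivity in N adjacent
# sub-bands of width B, stitch the sub-spectra side by side, and take one
# IFFT of the stitched spectrum to obtain an HRR profile whose resolution
# improves N-fold over a single sub-band.
c = 3e8
B = 250e6                 # per-pulse (sub-band) bandwidth (Hz)
N = 8                     # number of frequency steps
bins = 64                 # frequency samples per sub-band
f0 = 9.5e9                # nominal X-band carrier (assumed value)

ranges = np.array([1.2, 1.5, 4.5])   # 1.2 m and 1.5 m: unresolved at c/2B
amps = np.array([1.0, 0.9, 0.7])

def sub_spectrum(step):
    # reflectivity samples of the step-th sub-band (point-scatterer model)
    f = f0 + step * B + np.arange(bins) * (B / bins)
    return sum(A * np.exp(-1j * 4 * np.pi * f * r / c)
               for A, r in zip(amps, ranges))

stitched = np.concatenate([sub_spectrum(s) for s in range(N)])
profile = np.abs(np.fft.ifft(stitched))
print("single sub-band resolution: %.3f m" % (c / (2 * B)))
print("synthesised resolution:     %.3f m" % (c / (2 * N * B)))
print("resolved peaks (m):", np.sort(np.argsort(profile)[-3:]) * c / (2 * N * B))
```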
3. High Cross Range Resolution

In real aperture radar the cross range resolution is determined by the antenna beamwidth. This is often thought of as being so large that it effectively produces no resolution of any value, and is why HRR profiles are often referred to as one-dimensional target signatures. In reality they are two-dimensional images where one of the dimensions of resolution is very poor. The real beamwidth (or cross range resolving power) is defined as the width at the half power or 3-dB points from the peak of the main lobe. The beamwidths may be the same in both the elevation (vertical) and the azimuth (horizontal) dimensions, although this is by no means mandatory. The beamwidth in radians is a function of antenna size and transmission wavelength. For a circular aperture (i.e. a circular profile of a parabolic dish) the beamwidth in radians at the half power or 3-dB points is given approximately by:

B_{az} = \frac{\lambda}{D}.     (5)

This means that the cross-range extent of the beam at a range R is given by:

R_{az} = R\,\frac{\lambda}{D}.     (6)
Thus for a range of only 10 km and a wavelength of 3 cm, a 0.5 m diameter antenna will have a cross range resolution of 600 m. This contrasts with a frequency modulated pulse bandwidth of 500 MHz leading to a range resolution of 30 cm. It is usually desirable to have equal down and cross range resolutions. Two techniques by which much higher cross range resolution may be obtained are SAR and ISAR. These are essentially completely equivalent and rely on viewing a target over a finite angular ambit to create a synthetic aperture with a length in excess of that of the real radar aperture, and hence able to provide a higher resolution. The greater the angular ambit traversed, the greater the length of the synthesised aperture and the higher the cross range resolution. Here we will confine our discussion to ISAR imaging only; the interested reader is referred to excellent textbooks that cover SAR imaging (Curlander and McDonough, 1991; Carrara et al., 1995). To form an ISAR image the HRR profiles assembled using the step frequency technique reviewed in the previous section are used. To obtain the magnitude of the ISAR image pixel in the lth range cell and jth Doppler cell (D_{l,j}), an FFT is applied to each column of the matrix representing the time history of the target's range profiles. This is demonstrated in Figure 3. For ISAR, the resolution achieved in cross-range depends upon the minimum resolvable frequency Δf_D between two adjacent scatterers (Wehner, 1995). Doppler resolution is also related to the available coherent time of integration T, which is equal to the time required to collect the N chirp returns.
Figure 3. Matrix decomposition for HRR profiles and ISAR imagery.
Therefore, consecutive reflectivity samples from the same range cells are taken every NΔT seconds:

\Delta f_D = \frac{1}{N\,\Delta T} \approx \frac{1}{T}.     (7)
As a consequence, the cross-range resolution Δr_c can be written as:

\Delta r_c = \frac{c\,\Delta f_D}{2 \omega_0 f_c} = \frac{\lambda}{2 \omega_0 T}     (8)
where λ = c/f_c is the illuminating wavelength and ω0 is the angular velocity of the target's rotational motion. In ISAR image processing the motion of the target is usually unknown. The target motion can be seen as the superposition of rotational and translational motions with respect to the radar system. While the former contributes the ability to resolve in cross-range, in order to obtain a focused image of the target it is necessary to compensate for phase errors due to the translational motion occurring during data collection. This is usually referred to as motion compensation. After this correction an ISAR image may be obtained by processing data collected over an arc of circular aperture whose dimensions depend on the rotational speed of the target ω0 and the integration time T. There may still be further residual motions that require correction using autofocus techniques.
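The end-to-end processing of Figure 3 can be condensed into a short sketch: an IFFT along each row (fast time) range-compresses the data and an FFT along each column (slow time) resolves Doppler. A single rotating point scatterer is simulated in target-centred coordinates, i.e. translational motion compensation is assumed already applied, and every numerical value is illustrative.

```python
import numpy as np

# Minimal ISAR sketch after Figure 3 and Equations (7)-(8): rows of the
# slow-time / frequency matrix are range-compressed with an IFFT, then an
# FFT down each range cell resolves Doppler, i.e. cross-range.
c, fc = 3e8, 9.5e9
B, n_f = 2e9, 128                    # synthesised bandwidth, freq samples
n_sw, T = 64, 0.5                    # sweeps and total integration time (s)
omega0 = 0.05                        # target rotation rate (rad/s)

lam = c / fc
f = fc + np.arange(n_f) * (B / n_f)
t = np.arange(n_sw) * (T / n_sw)
x0, y0 = 2.0, 1.5                    # scatterer position on the target (m)

# instantaneous line-of-sight range of the scatterer as the target rotates
r = x0 * np.cos(omega0 * t) - y0 * np.sin(omega0 * t)
data = np.exp(-1j * 4 * np.pi * f[None, :] * r[:, None] / c)  # (sweep, freq)

profiles = np.fft.ifft(data, axis=1)                           # fast time: HRR
image = np.fft.fftshift(np.fft.fft(profiles, axis=0), axes=0)  # slow time

m, l = np.unravel_index(np.argmax(np.abs(image)), image.shape)
print("down-range resolution:          %.3f m" % (c / (2 * B)))
print("cross-range resolution, eq (8): %.3f m" % (lam / (2 * omega0 * T)))
print("brightest pixel at (Doppler, range) bin (%d, %d)" % (m, l))
```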
We now introduce experiments by which HRR profiles and ISAR images are generated.

4. High Resolution Target Signatures

In this section we exploit the high resolution techniques introduced above to examine the form of the resulting signatures. To do this we use measurements of calibration and vehicle targets that have been mounted on a slowly rotating platform. Figure 4 shows the experimental set-up. Here the radar system views a turntable at a shallow grazing angle of a few degrees. The figure shows corner reflector calibration targets in place. Two are located on the turntable and two are located in front of and behind the turntable. Additional experiments have also been performed with the corners removed and a vehicle target situated on the turntable instead. The profiles are generated from an X-band radar having an instantaneous bandwidth of 500 MHz. Eight frequency steps spaced by 250 MHz are used to synthesise a total bandwidth of 2.25 GHz. The turntable data is first enhanced by removing any stationary clutter (Showman et al., 1998). Estimation and subtraction have been performed in the frequency domain. Figure 5 shows the resulting range profiles and their variation as the turntable rotates through 360°. The two stationary trihedrals show a constant response at near and far range, as expected. For the two rotating trihedral targets, when the line-of-sight is on the trihedral bisector a peak of reflection occurs. This is consistent with the expected theoretical response. As the trihedral targets rotate, the backscattered field decreases progressively until a point is reached where there is a peak of specular reflection. This is a reflection from one of the sides making up the trihedral which is orthogonal to the illuminating radar system (i.e. it faces the radar beam and looks like a flat plate reflector). At increasing rotation angles the RCS of the target drops since the orientation of the trihedral is such that it tends to reflect incident radiation away from the radar. This angular dependency of the RCS of a well known reflector such as a trihedral begins to illustrate how the backscattering properties of real targets may vary with the orientation of observation. For example, if a target has part of its structure that mimics a trihedral it may show this feature over a similarly
Figure 4. ISAR geometry: two stationary corner reflectors are in front of and behind the turntable, while two rotating ones are placed on the turntable.
Figure 5. History of HRR range profiles (30 cm range resolution) from four corner reflectors, two rotating and two stationary. The 0° angle corresponds to the two corners being at the same range from the system and facing the radar beam directly.
limited angular range. Thus in a multi-perspective (M-P) environment, different angular samples of a target's signature should improve the likelihood of observing a corner or corner-like reflector and recognising it as such. Particular shapes such as flat plates and corners can be common on many man-made structures and are often quite dominant features that may prove useful for classification. In Figure 6, the complete angular ambit of range profiles spanning 360° from a Land Rover vehicle rotating on the turntable is shown. This highlights a number of different scattering behaviours: the strong peaks from specular reflections (0°, 90°, 180°) appear over a very limited angular range and obscure nearby point-like backscattering. Corner-like returns can be observed at far range (∼6 m) for two angular spans (approximately [10°–60°] and [130°–180°]). These returns correspond to the trihedral-like structures formed at the rear of the Land Rover. This is a vehicle without the rear soft top; it has a metallic bench seat that makes a corner where it joins the rear bulkhead of the driver's cabin. At ∼8 m range there is a double bounce return corresponding to one of the corners. This type of effect increases the information that can be processed, which would otherwise be impossible to reconstruct by a traditional single perspective approach. It also illustrates the
Figure 6. History of HRR range profiles (range resolution less than 15 cm) from a series of X-band stepped frequency chirps illuminating a ground vehicle as it rotates over 360°. At 0° the target is broadside oriented, while at 90° it has its end-view towards the radar.
complexity and subtlety of radar target scattering. Note also that here we are dealing with targets on turntables, where there is no clutter or multipath and the signal to noise ratio is high. In more realistic scenarios the scattering from targets will in fact be even more complicated. The turntable data can be processed using the ISAR technique to yield two-dimensional imagery. Figure 7 shows a series of ISAR images of a Ford Cougar car in which the number of perspectives used to form the imagery is slowly increased. It is clear that when only a single perspective is used there is considerable self-shadowing by the target and the shape cannot be determined with good confidence. When all eight perspectives are employed, the effects of self-shadowing are largely eliminated and much more complete information is generated. Note, however, that the concept of resolving scatterers, as seemed to be observed in the range profiles of the Land Rover, is much less obvious. As with most modern vehicles the Cougar is designed to have low wind resistance and has little in the way of distinct scattering centres. This begins to call into question both the image formation process and classification techniques if they are using a point target scatterer model assumption.
Figure 7. Multi-look image reconstruction: (a) is generated using a single perspective, (b) three perspectives, (c) five perspectives and (d) eight perspectives.
Figure 8 shows a multi-look ISAR image of the Land Rover target used to generate the HRR profiles seen earlier. There are two ‘point-like’ scatterers that are consistent with the area where the rear metal bench seat meets the bulkhead behind the driver's cabin and forms a corner-like structure. There are some
Figure 8. Multi-look image of the Land Rover target.
‘pointish-like’ scatterers at the rear of the vehicle, although these are rather less distinct. Otherwise the scattering appears quite ‘un-point-like’ and resembles in many ways the form of the image for the Cougar. Nevertheless, the shape of the Land Rover is easily discernible and is clearly quite different to that of the Cougar. In this way we start to appreciate the complexity of electro-magnetic backscatter and the challenge of classifying real targets reliably and confidently.
5. Target Classification

Having examined the detailed structure and composition of target signatures, we now consider operating on them to make classification decisions. Robust and reliable target classification has the potential to radically increase the importance and value of radar systems. Interest in radar target classification has intensified due to the increase of information available from advanced radar systems. As seen in the previous section, this is mainly due to the backscattered signature having high resolution in range and cross-range. Although the description of target scattering is detailed enough to attempt radar target classification, the reliability of this crucial task is influenced by a number of unpredictable factors and is generally poor. For example, one- and two-dimensional signatures can both appear significantly different even though the measurements are collected with the same radar and a very similar target geometry (measurement noise, rotational and translational range migration, speckle, etc.). In addition, global or local shadowing effects also noticeably challenge attempts to classify targets reliably, as will multipath. As many sources of target signature variability depend on target orientation as viewed by the radar system, it may be possible to aid target recognition by using more than one perspective and hence average out the sources causing misclassification. Furthermore, the possible employment of bistatic modes of operation between the nodes of the network might additionally provide a valuable counter to stealth technology (Baker and Hume, 2003). In a monostatic netted radar scenario, the classification performance increases with the number of perspectives. Although the improvements are significant, those benefits are dependent on the location of the nodes with respect to the position of the target and its orientation.
6. Classification Techniques

There are two main aspects to target classification. The first is to isolate the target returns from the clutter echoes (e.g. by filtering) and to extract the
features that can help to distinguish the class of the target. The second aspect relates to the method used for deciding to which class or target type the feature data belongs. When target classification is achieved through automatic computation it is usually referred to as automatic target recognition (ATR). In ATR the classification task requires complex techniques and there are a number of approaches that can be used. For example, in a model based technique a model of the target is made by computer aided design (CAD) and electro-magnetic simulations. This enables many simulated versions to be compared with the target signature to be classified. This is a computationally intensive technique. Alternatively, in a template matching based technique, many real versions of the target signatures (at a large number of geometries) are stored in a database and subsequently compared with the target detected in order to assign it to a class. Consequently, a very large database is needed. Further, if the target is altered in some way (e.g. a tank may carry some additional equipment) then the templates may no longer represent the modified signature and the classification can fail. Finally, pattern based techniques exploit features extracted from the input signature. These might include peak amplitudes and their locations in a HRR profile or ISAR image. These are then used to make a multi-dimensional feature vector which can be compared with the stored feature vectors from previous measurements of known targets in order to perform classification. This technique is less costly in terms of computation and it is consistent with a netted radar framework, as a number of perspectives of the same object are available. Classification typically requires a high probability of declaration Pdec, which is the probability that a detected target will be classified, either correctly or incorrectly, as a member of the template training set. A target is known when its feature vector belongs to the training data set. A second performance requirement is to have a high probability of correct classification Pcc. These two parameters are often related to each other: if Pdec is low, the ATR system may declare just those cases of very high certainty of correct classification. As a result, the system is able to achieve a very high Pcc but may not classify all possible targets. Finally, a low probability of false alarm Pfa is required, i.e. a low probability that an unknown target is incorrectly classified as a different object in the ATR database. This final task is particularly difficult since it is impossible to include a specific class of unknown targets in the template set. This is because there is always a possibility that the classifier would need more information to make a correct decision. The decision, which is usually made automatically, may be performed with two different data input types. The first is one-dimensional target classification by HRR profiles. The HRR profile can be thought of as representing the
projection of the apparent target scattering centers onto the range axis. Hence the HRR profile is a one-dimensional feature vector. Three classifiers are considered here; a more exhaustive treatment may be found in (Tait, 2006). These are (i) a naïve Bayesian classifier, (ii) a nearest neighbour classifier and (iii) a classifier using neural networks.

6.1. NAÏVE BAYESIAN MULTI-PERSPECTIVE CLASSIFIER
The naïve Bayesian classifier is a pattern recognition statistical method (Looney, 1998). The decision-making is reduced to pure calculations of feature probabilities. The training set is stored in different databases of templates and therefore the number of classes is known, as well as the sets of their representative vectors (i.e. the learning strategy is supervised). We consider a set of nc classes {Ci : i = 1, . . . , nc} and a single HRR profile X formed by a sequence of n elements x1, x2, . . . , xn as the feature vector. The classifier decides that X belongs to the class Ci showing the highest posterior probability P(Ci | X). This is done using Bayes' theorem to calculate the unknown posterior probability of classes conditioned on the unknown feature vector to be classified:

P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}.     (9)

The naïve Bayesian classifier is based on the assumption of class independence of each attribute of the feature vector. Furthermore, since the values x1, x2, . . . , xn are continuous, they are assumed to be Gaussian distributed. Their statistical parameters are deduced from the training set and used to calculate the conditional probability P(xi | Ci) for the attribute xi and, eventually, the likelihood ratio test:

\text{If } \frac{P(X \mid C_i)}{P(X \mid C_j)} > \frac{P(C_j)}{P(C_i)} \;\Rightarrow\; X \in \text{class } i.     (10)

These concepts are integrated for the M-P naïve Bayesian classifier. Here we consider a network of N radars, and the sequence of 1D signatures {Xj : j = 1, . . . , N} is the information collected by the system. The posterior probability P(Ci | X1, . . . , XN) of the sequence of range profiles conditioned on Ci is the probability that the sequence belongs to that class and, by applying Bayes' theorem, can be expressed as follows:

P(C_i \mid X_1, \ldots, X_N) = \prod_{j=1}^{N} \frac{P(X_j \mid C_i)\,P(C_i)}{P(X_j)}.     (11)
Assuming the probability P(X1, . . . , XN) constant for a fixed number of perspectives, the final decision is made for the class Ci that maximises P(Ci | X1, . . . , XN). This procedure enables a single-perspective stage in which all the conditional probabilities P(Xj | Ci) are computed separately for each perspective.
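A compact sketch of this decision rule, on synthetic placeholder data rather than the measured profiles, is given below: per-feature Gaussian statistics are estimated from the templates of each class, per-perspective log-likelihoods are computed independently, and their sum over the N perspectives (the log form of Equation (11)) selects the class.

```python
import numpy as np

# Multi-perspective naive Bayes sketch for Equations (9)-(11). Class priors
# are taken as equal, and the prior enters once here, in log form.
rng = np.random.default_rng(0)
n_classes, n_feat, n_templates, n_persp = 3, 14, 36, 4

class_means = rng.normal(0.0, 2.0, (n_classes, n_feat))
train = [m + rng.normal(0.0, 1.0, (n_templates, n_feat)) for m in class_means]

mu = np.stack([t.mean(axis=0) for t in train])          # fitted class means
var = np.stack([t.var(axis=0) + 1e-6 for t in train])   # fitted variances

def log_likelihood(x):
    """log P(X | Ci) under the class-independent Gaussian assumption."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=1)

# N perspectives of a target actually drawn from class 1
X = class_means[1] + rng.normal(0.0, 1.0, (n_persp, n_feat))
log_post = np.log(1.0 / n_classes) + sum(log_likelihood(x) for x in X)
print("declared class:", int(np.argmax(log_post)))      # expected: 1
```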
6.2. K-NN MULTI-PERSPECTIVE CLASSIFIER
The K-nearest neighbours (K-NN) method is a non-parametric approach to classification (Duda et al., 2001). It consists of measuring and selecting the minimum K distances from the feature vector to be classified to the templates of the different classes. Consider an input vector X and a population made up of a set of classes {Ci : i = 1, . . . , nc}. Then the distances di,j from X to Ti,j, the jth template vector of the ith class, can be computed and stored:

d_{i,j} = d(X, T_{i,j}) = \lVert X - T_{i,j} \rVert.     (12)
Subsequently, the K minimum scalars di,j are selected from each class, forming a K-dimensional vector D labelled in ascending order. The final decision is made on the basis of the largest number of votes over the K-dimensional vector obtained. There are three stages of the M-P K-NN classifier, which are implemented as follows:

1. The mono-perspective stage: after the collection of the sequence of feature vectors {Xj : j = 1, . . . , N}, where N is the number of sensors in the network, the same number of single-perspective classifiers is implemented. The jth classifier computes a vector Dj consisting of the K minimum distances from the templates.
2. M-P processing: the whole set of vectors Dj is processed and the minimum K distances are selected, giving a weight for the decision.
3. Classification: the input sequence of feature vectors is associated with the class with the greatest number of weights.

Different K values have been tested for this problem: the best trade-off between complexity and classification performance suggests a value of K = 5 minimum distances.
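The three stages translate almost directly into code. The sketch below uses synthetic placeholder templates and test vectors; only the data flow of the M-P K-NN decision is meant to be faithful.

```python
import numpy as np

# Multi-perspective K-NN sketch: each perspective finds its K nearest
# templates per class (Equation (12)), the pooled distances are re-sorted,
# and the K globally smallest cast the class votes.
rng = np.random.default_rng(1)
n_classes, n_feat, n_templates, n_persp, K = 3, 14, 36, 4, 5

templates = rng.normal(0.0, 1.0, (n_classes, n_templates, n_feat))
templates += np.arange(n_classes)[:, None, None]        # separate the classes
X = 1.0 + rng.normal(0.0, 0.3, (n_persp, n_feat))       # target from class 1

candidates = []                                          # (distance, class)
for x in X:                                              # mono-perspective stage
    d = np.linalg.norm(templates - x, axis=2)            # (class, template)
    for ci in range(n_classes):
        for dist in np.sort(d[ci])[:K]:                  # K best per class
            candidates.append((dist, ci))

# M-P processing and classification: keep the K smallest pooled distances
candidates.sort(key=lambda dc: dc[0])
votes = np.bincount([ci for _, ci in candidates[:K]], minlength=n_classes)
print("votes per class:", votes, "-> declared class", int(np.argmax(votes)))
```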
6.3. FANN MULTI-PERSPECTIVE CLASSIFIER

Given a feature vector X, artificial neural networks (ANN) learn how to execute the classification task by means of examples (Christodoulou and Georgiopoulos, 2001). They are able to analyse inputs and associate an output corresponding to a particular class of objects. A feed-forward ANN (FANN) supervised with a back-propagation strategy can be implemented. During the learning phase the training samples are used to set the internal parameters of the network,
that is, after giving the templates as inputs to the classifier, the weights are modified on the basis of the distance between the desired and actual outputs of the network. Considering an input vector X and a population made up of a set of nc classes, the execution mode consists of the calculation of the output vector Y = (y1, y2, . . . , ync). Since the unipolar sigmoid (or logistic) function was chosen as the activation function, the elements of the output vectors range from zero to one. The ultimate decision is made for the ith class, where i is the index of the maximum value of Y. If we assume N perspectives, the sequence of feature vectors {Xj : j = 1, . . . , N} is the input for the first stage. Each single-perspective network accepts a vector and the partial outputs {Yj : j = 1, . . . , N} are calculated. Subsequently, the output Y of the M-P stage is the mean value of the partial outcomes, and the classification decision is finally made on the basis of the maximum index of Y.
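The fusion stage just described is sketched below. In the chapter the network weights come from back-propagation training; the random weights here only demonstrate the data flow (one hidden layer, unipolar sigmoid outputs, mean of the partial outputs, maximum index decision), with all sizes illustrative.

```python
import numpy as np

# Multi-perspective FANN fusion sketch: each perspective feeds a small
# feed-forward network; the partial output vectors Yj are averaged and the
# class is the index of the maximum of the mean Y.
rng = np.random.default_rng(2)
n_classes, n_feat, n_hidden, n_persp = 3, 14, 20, 4

W1 = rng.normal(0.0, 0.5, (n_feat, n_hidden))   # untrained stand-in weights
W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def fann_output(x):
    """Single-perspective network: outputs in (0, 1), one per class."""
    return sigmoid(sigmoid(x @ W1) @ W2)

X = rng.normal(0.0, 1.0, (n_persp, n_feat))       # one feature vector per node
Y = np.mean([fann_output(x) for x in X], axis=0)  # M-P stage: mean output
print("mean output Y:", np.round(Y, 3), "-> class", int(np.argmax(Y)))
```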
7. Multi-perspective Classification We concentrate on the particular situation in which a single non-cooperative target has been previously detected and tracked by the radar system. Preprocessing raw data is necessary in order to increase the quality of the radar signatures: the target region is isolated and made more prominent thanks to the noise level subtraction from the rest of the range profile (Zyweck and Bogner, 1996). Principal discriminating factors for classification purposes are range resolution, side-lobe level (SLL) and noise level. Higher resolution means better point scatterers separation but the question of compromise regarding how much resolution is needed for good cost-recognition is difficult to resolve. Generally, a high SLL means clearer range profiles but this also implies deterioration in resolution. Eventually, a low noise level means high quality range profiles for classification. Real ISAR turntable data have been used to produce HRR range profiles and images. Three vehicles classified as A, B and C constitute the subpopulation problem. Each class is described by a set of range profiles covering a 360◦ rotation of the turntable. Single chirp returns are compressed giving 30 cm range resolution. The grazing angle of the radar is 8◦ and 2
of turntable rotation is the angular interval between two consecutive range profiles. Therefore, approximately 10,000 range profiles are extracted from each data file over the complete rotation of 360◦ . The training set of representative vectors for each class is made by 36 range profiles, taken approximately every 10◦ for rotation of the target. The testing set of each class consists of the remaining range profiles excluding the templates. The features extracted after principal component analysis (PCA) are the input attributes to the classifier.
Figure 9. Multi-perspective environment deduced from ISAR geometry.
The three algorithms have been implemented and tested in both single and multi-perspective environments. In this way any bias introduced by a single algorithm should be removed. Figure 9 represents a possible approximation of the multi-perspective scenario: each node of the network is assumed to have a fixed position while the target rotates by an angle ψ. The perspective angle ϕ is the angular displacement between the lines-of-sight of two consecutive radars. From each of the radar positions either a series of range profiles can be generated as inputs to a one-dimensional classifier, or they can be processed into an ISAR image that can be input to a two-dimensional classifier. It is therefore possible to perform classification using multiple perspectives. In Figure 10, the classification performance of the three previously described classifiers is plotted versus the number of perspectives used by the network. Because of the nature of the data and the small available number of targets, the classifiers present a high level of performance even when using only a single aspect angle. Improved performance is achieved by increasing the number of radars in the network, but the greatest improvement is seen with a small number of radars. Since the number of perspectives, and therefore the number of radars, is strongly related to the complexity, cost and time burden of the network, it may be a reasonable trade-off, for classification purposes, to implement networks involving a small number of nodes. However, this analysis is against a small number of target classes and these conclusions require further verification.
Figure 10. Multi-perspective classification rates using different numbers of perspectives.
The extent to which SNR affects classification, and whether multi-perspective scenarios are effective at different SNR levels, is now examined. The FANN classifier has been used for this particular task. The range profiles' I and Q components are corrupted with additive white Gaussian noise. The original data, after noise removal, has a 28.24 dB SNR. Subsequently, the classifier is tested with range profiles presenting a progressively lower SNR. In Figure 11, five different SNR levels of the same range profile from class A are represented. At this particular orientation the length of the object is 5.5 m (spanning almost 18 range bins). As the SNR decreases, some of the useful features become less distinct, making the range profile more difficult to classify. In Figure 12, the performance of the FANN classifier only is plotted versus the number of perspectives used and SNR levels, showing how the enhancement in classification varies with different noise levels. The graph illustrates an increase in classification performance with the number of perspectives in each case, particularly valuable for the lowest SNR levels. However, below an SNR of 17 dB the performance quickly degrades, indicating that classifiers will be upset by relatively small amounts of noise.
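The noise-corruption step of this experiment is easy to reproduce. The sketch below scales complex white Gaussian noise to hit a requested SNR in dB; the range profile itself is a synthetic placeholder, not one of the measured class A profiles.

```python
import numpy as np

# Corrupt the I and Q components of a complex range profile with AWGN
# scaled to a target SNR (in dB), as in the experiment described above.
rng = np.random.default_rng(3)

def add_awgn(profile_iq, snr_db):
    """Return a copy of a complex profile with AWGN at the given SNR."""
    p_signal = np.mean(np.abs(profile_iq) ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    noise = np.sqrt(p_noise / 2.0) * (rng.standard_normal(profile_iq.shape)
                                      + 1j * rng.standard_normal(profile_iq.shape))
    return profile_iq + noise

profile = np.zeros(52, dtype=complex)     # 52 range bins, as in Section 8
profile[18:24] = [1.0, 3.0, 2.0, 0.5, 1.5, 0.8]
for snr in (28, 23, 17, 11):
    noisy = add_awgn(profile, snr)
    print("SNR %2d dB -> peak/median ratio %.1f"
          % (snr, np.abs(noisy).max() / np.median(np.abs(noisy))))
```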
Figure 11. Different SNR levels of the same range profile.
Figure 12. Correct classification rates for different SNR levels and netted radars.
8. Choice of Perspectives

We now examine the geometrical relationships of node and target location and orientation, using full scale measurements of representative targets. We start by re-visiting the neural network classification process in a little more detail and examine the classification performance as a function of the angle between the two perspectives taken. In a two-perspective (2-P) scenario, the parameter that distinguishes the perspective node locations is their relative angular displacement φ1,2 = φ2 − φ1. Hence, after fixing φ1,2, the 2-P classifier is tested with all possible pairs of HRR profiles displaced by that angle, covering all the possible orientations of the target. Having a test set consisting of N profiles, the same number of pairs can be formed to test the 2-P classifier. The training set of representative vectors for each class is made up of 36 range profiles, taken approximately every 10° of target rotation. The testing set of each class consists of the remaining range profiles, excluding the templates. The M-P classifier can be seen as the combination of N single-perspective classifiers (i.e. N nodes in the network), where the eventual decision is made by processing the outputs of each FANN. For this first stage of investigation, the angle φ1,2 is not processed as information by the M-P classifier. Features have been extracted from the radar signatures in order to reduce the intrinsic redundancy of the data and simplify the model. This methodology, to a certain extent, decreases the overall correct classification rates but, on the other hand, makes the representation model in the feature space less complex, yielding a classification process consistent with those features that effectively characterise the object. A typical HRR profile is shown in Figure 13, where a threshold is applied to the HRR profile. The threshold is determined by measuring the mean intensity value, neglecting the maximum peak, after normalising the profile. This guarantees adaptive target area isolation and less sensitivity to the main scatterer reflection. The radar length of the target for that particular orientation is then measured as the distance between the first and last threshold crossings. This is the first component of the feature vector f. The second component is a measure of the average backscattering of the target, whilst the successive M triples contain the information of the M peaks extracted in terms of amplitude, location and width. If the number P of peaks above the threshold is less than M, the last M − P triples are set to zero (Huaitie et al., 1997). Different numbers of peaks have been extracted until the classification process revealed a certain degree of robustness. For M = 4, the feature vector has a dimension of 14 elements, while 52 range bin values make up the raw echo profile.
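A minimal sketch of this feature extraction follows. The exact peak-width measure used by the authors is not specified, so a simple threshold-crossing count near each peak stands in for it, and the demonstration profile is synthetic.

```python
import numpy as np

def extract_features(profile, bin_m=0.3, M=4):
    """Feature vector: [radar length, mean level, M x (amp, loc, width)]."""
    p = np.abs(profile) / np.abs(profile).max()       # normalise
    thr = np.delete(p, np.argmax(p)).mean()           # mean excluding max peak
    above = np.where(p > thr)[0]
    feats = [(above[-1] - above[0]) * bin_m,          # radar length (m)
             p.mean()]                                # average backscattering
    # local maxima above the threshold, strongest first (crude peak picker)
    peaks = [i for i in range(1, len(p) - 1)
             if p[i] > thr and p[i] >= p[i - 1] and p[i] >= p[i + 1]]
    peaks.sort(key=lambda i: -p[i])
    for i in peaks[:M]:
        width = np.sum(p[max(i - 2, 0):i + 3] > thr)  # width stand-in (bins)
        feats += [p[i], i * bin_m, width * bin_m]
    feats += [0.0] * (2 + 3 * M - len(feats))         # zero-pad missing triples
    return np.array(feats)

profile = np.zeros(52)
profile[15:33] = np.hanning(18)                       # extended target body
profile[[18, 25, 30]] += [0.8, 1.2, 0.6]              # three dominant scatterers
print(extract_features(profile))                      # 14 elements for M = 4
```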
Figure 13. Feature extraction from HRR profile.
Next, PCA has been applied to the profiles in order to better separate the classes in the feature space. By choosing the largest K eigenvectors of the covariance matrix from the original 14, the dimension of the new feature vector f′ is further reduced. In Table I the resulting confusion matrix for K = 10 is shown for the four-class population problem for a single-perspective classifier.

TABLE I. Single-perspective FANN confusion matrix on features after PCA using K = 10 (correct classification rate = 74.4%).

Output →    Class A   Class B   Class C   Class D
Input ↓
Class A      75.85     11.19      9.57      3.39
Class B       4.75     80.91     13.26      1.08
Class C       2.81     18.98     69.94      8.27
Class D      12.91      0.76     15.26     71.07

Radar classification is highly dependent on the object orientation. Since radar targets are usually man-made objects, they often present a number of 3D symmetries. In this work, a low grazing angle is used to simulate a ground-based scenario, where the radar system, the target and their relative motion vectors lie on the same plane. This geometry allows us to consider the 2D problem, whereby the perspectives represented by the 1D signatures collected by the measurement system are on the same plane. For example, for most of the possible 2D ground-vehicle orientations, with 180° between the two perspectives, the corresponding profiles might be expected to be quite highly correlated.
Figure 14. 2-P correct classification rates versus the angular displacement φ1,2 for the three-class problem.
This is due to the 180° symmetry typically exhibited by vehicles, and hence little extra information is added. If this is the case it will cause a reduction in target characterisation and, eventually, of the M-P correct classification rate (CRR) improvement. However, details such as rear-view mirrors, the antenna position and any non-symmetrical or moving parts will change the two signatures, producing CRR benefits when compared with single-perspective classifiers. We now consider the relationship between node location and target orientation, investigated for the cases of both two and three-perspective classifiers. In a 2-P scenario the angular perspective displacement between radars is the discriminant factor for the combined CRR: as the two range profiles decorrelate (i.e. φ1,2 increases) the information content of the pair increases. Figure 14 shows the classification performance as a function of the angular separation of the two perspectives. The equivalent monostatic CRR is shown at the 0° and 360° positions. The single target classification rates show how the global accuracy depends on the peculiar geometric features of the target. The drop at φ1,2 = 180°, as previously hypothesised, is due to the multiple axes of symmetry of the targets and is visible for all the classes. For targets A and B there is also a drop at 45°, indicating a possible further degree of symmetry. The relationship between M-P classification and range profile information content can be deduced from Figure 15, where the cross-correlation between profiles collected from different perspectives is represented for the class C target.
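A map of this kind is straightforward to compute. The sketch below builds a non-coherent cross-correlation matrix between range-profile magnitudes at different orientations; the profiles are synthetic stand-ins for the turntable data (coarse 2° sampling for brevity), with a crude 180°-symmetric scatterer built in so the symmetry ridge appears.

```python
import numpy as np

# Non-coherent cross-correlation between range profiles collected at
# different target orientations; high off-diagonal ridges betray symmetry.
rng = np.random.default_rng(4)
n_angles, n_bins = 180, 52                  # one profile per 2 degrees

angles = np.deg2rad(np.arange(n_angles) * 2.0)
profiles = rng.gamma(1.0, 0.1, (n_angles, n_bins))
idx = (10 + 30 * np.abs(np.cos(angles))).astype(int)   # |cos|: 180-deg symmetric
profiles[np.arange(n_angles), idx] += 2.0              # dominant scatterer

mags = profiles - profiles.mean(axis=1, keepdims=True)
mags /= np.linalg.norm(mags, axis=1, keepdims=True)
corr = mags @ mags.T                        # (angle, angle) correlation map

print("corr(60 deg, 240 deg) = %.2f" % corr[30, 120])  # symmetric pair: high
print("corr(60 deg, 150 deg) = %.2f" % corr[30, 75])   # unrelated pair: low
```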
Figure 15. Non-coherent cross-correlation between profiles belonging to class C.
The regions of high cross-correlation influence the M-P classifier: when 90° of separation occurs between two perspectives (dotted line in Figure 15), the input profiles taken in the range [120–150]° are highly correlated with those belonging to the orientations [210–240]°. M-P classification maxima and minima are mainly related to geometric symmetries. We now add another sub-population, class D, to the dataset, together with a new M-P FANN classifier. All the internal parameters of the neural network were changed by the new learning phase. As a result, the decision boundaries between the classes are modified. For this reason, as can be observed in Figure 16, class B is more likely to be misclassified than class C, whose CRR remains almost unaltered. The four-class CRR shows the overall performance deterioration in terms of classification. On the other hand, the internal symmetries that influence the classifier remain unchanged by adding elements to the population. For example, it is thought that the target belonging to class C exhibits a number of multiple-bounce phenomena from corner-like scatterers. Their persistence is less than 15°, causing a number of relative maxima within less than 90° of separation. Every target shows different trends: a 90° angular perspective displacement can mean either an improvement or a reduction in information content from a backscattering point of view. This is due to the detailed make-up of the target and the way in which differential symmetries can be exhibited.
Figure 16. 2-P correct classification rates versus the angular displacement φ1,2 for the four-class problem.
This can make the overall M-P classification rates less sensitive to the angular perspective displacement of the nodes. The effects of symmetries on CRR performance can also be seen in the neighbourhood of φ1,2 = 90° and φ′1,2 = 270°. This is verified when the relationship between the two perspective displacements φ1,2 and φ′1,2 is:

\varphi'_{1,2} = \pi - \varphi_{1,2}.     (13)
When the perspective condition expressed in the above equation is verified, the 2-P performance is similar because of the intrinsic geometrical symmetries of classes A, B and C. This is verified in both the three and four-class problems shown in Figures 14 and 16, illustrating the M-P independence of the cross-affinity between different targets. The de-correlation rate of the HRR profiles also seems to vary depending on the particular target. In general, it appears proportional to the number of wave-trapping features and their persistency. If a single main scatterer has a high persistency over a wide range of orientations, the cross-correlation of the profiles over the same range of target orientations is expected to be high, and hence we conclude there is less extra information to improve classification. The more distinguishing features that exist, the greater the separation benefits achieved for single-aspect classification. This concept is amplified for the M-P environment, since those crucial features appearing for a few orientations affect a greater number of inputs, now represented by the perspective
Figure 17. 3-P correct classification rates versus the angular displacements φ1,2 and φ1,3 for the three-class problem.
combination. For example, the length of the target is an important classification feature. In the M-P environment those targets presenting lengths not common to the other objects have even greater benefits. The mean CRR of the 2-P classifier is increased from 74.4% to 85.7% (+11.3%) with respect to the single-perspective case, while for the single class D the probability of correct classification increases by 13.1%. This result is highly averaged but is indicative of the overall performance improvement. In Figure 17 the classification rates for three perspectives are shown. Here the first perspective is fixed whilst the other two slide around the target, covering all 360°. Thus the rates are shown as a function of the angular displacement φ1,2 between the second and the first node, and φ1,3 between the third and the first node. The origin represents the mono-perspective case and again indicates the overall improvement offered by the multi-perspective approach. The bisector line corresponds to two radar systems at the same location, while the lines parallel to the bisector symbolise two perspectives displaced by 90° and 180°. The inherent symmetry in the radar signature of the vehicle gives rise to the relatively regular structure shown in Figure 17. In a 2-Perspective scenario, if the two nodes view the target from the same perspective (i.e. the two radars have the same LOS) then this gives the mono-perspective classification performance. This is not the case for a 3-Perspective
Figure 18. 3-P correct classification rates versus the angular displacements Δφ1,2 and Δφ1,3 for the four-class problem.
network. This is a consequence of the coinciding perspectives being weighted twice and the third node's perspective only once. As a result, the 3-perspective performance when Δφ1,2 = 0 and Δφ1,3 = 0 is worse than that of a 2-perspective scenario which simply neglects one of the coinciding perspectives. In Figure 18, the CRR of the four-class population problem is represented with respect to the angular displacements between the three nodes of the network. As was observed for the 2-perspective case, the classification performance is less sensitive to the aspect angle when a greater number of classes is considered. This suggests that in a real environment with a large number of classes, M-P classification could be equally effective for any target position, provided that each node LOS is sufficiently spaced from the others to collect uncorrelated signatures. The 3-P classifier probability of correct classification is increased from 74.4% to 90.0% (+15.6%) with respect to the single-perspective case.
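The chapter does not state how the per-perspective classifier outputs are fused, so the following is only a hypothetical sketch of one common choice: averaging per-perspective class scores before taking the arg-max decision. All numbers are invented for illustration.

```python
import numpy as np

def fuse_and_classify(scores):
    """Average the per-perspective class scores and take the arg-max.
    `scores` has shape (n_perspectives, n_classes)."""
    return int(np.argmax(scores.mean(axis=0)))

# Invented scores for a four-class (A-D) problem seen from three nodes;
# no single perspective is confident, but the fused decision is clearer.
scores = np.array([[0.40, 0.35, 0.15, 0.10],   # perspective 1
                   [0.20, 0.25, 0.45, 0.10],   # perspective 2
                   [0.45, 0.20, 0.25, 0.10]])  # perspective 3
print("fused decision: class", "ABCD"[fuse_and_classify(scores)])
```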
9. Summary and Conclusions

After implementing a multi-perspective classifier for ATR, the results in terms of classification rates have been examined using features extracted from HRR profile signatures. The benefits of the M-P classifier implementation have
been analysed, showing a non-linear but very clear CRR improvement with the number of perspectives. Furthermore, the correct classification gain from employing multiple perspectives is achievable for different SNRs and any radar node location. However, there is a small variation in the correct classification rate for a relatively constrained set of node locations, due to the inherent geometrical symmetries of man-made objects. The M-P classification performance has been described for two and three perspectives and applied to a three- and a four-class problem. The multiple perspectives affect the single-class probability of correct classification differently, depending mainly on the number, nature and persistency of the scattering centers appearing in the profiles. As a consequence, the node location dependence of the global classification rate decreases when a greater number of classes is involved, making the classifier equally reliable for a wide range of angular displacements. The M-P classification improvements are reduced when the nodes are closely separated, since the perspectives exhibit a significant degree of correlation. Nevertheless, the overall probability of correct classification is well above the mono-perspective case (+11.3% and +15.6% using two and three perspectives respectively). In addition, the complexity and variability of reflectivity from real targets has been highlighted. Multiple-perspective classification does not necessarily offer a trouble-free route to acceptable classification and requires further testing under more realistic conditions. It also helps to indicate what information in the radar signature is important for classification. However, much further research remains before routine and reliable classification by radar becomes the norm.
Acknowledgement

The work reported herein was funded by the Electro-Magnetic Remote Sensing (EMRS) Defence Technology Centre, established by the UK Ministry of Defence and run by a consortium of Selex, Thales Defence, Roke Manor Research and Filtronic. The authors would also like to thank Thales Sensors for providing ADAS files to investigate HRR range profiles and ISAR images from real targets.
RADAR IMAGING FOR COMBATTING TERRORISM
Hugh D. Griffiths and Chris J. Baker
Department of Electronic and Electrical Engineering, University College London, London, UK
Abstract. Radar, and in particular imaging radar, has many and varied applications to counterterrorism. Radar is a day/night, all-weather sensor, and imaging radars carried by aircraft or satellites are routinely able to achieve high-resolution images of target scenes, and to detect and classify stationary and moving targets at operational ranges. Short-range radar techniques may be used to identify small targets, even buried in the ground or hidden behind building walls. Different frequency bands may be used: for example, high frequencies (X-band) can support the high bandwidths needed for high range resolution, while low frequencies (HF or VHF) are used for foliage penetration to detect targets hidden in forests, or for ground penetration to detect buried targets. The purpose of this contribution is to review the fundamental principles of radar imaging, and to consider the contributions that radar imaging can make in five specific aspects of counterterrorism: through-wall radar imaging, radar detection of buried targets, tomography and detection of concealed weapons, passive millimetre-wave imaging, and passive bistatic radar.

Key words: radar imaging, ground penetrating radar, passive bistatic radar
1. Introduction

Radar techniques and technology have been developed over many decades. Radar has the key attributes of day/night, all-weather performance, and the ability to measure target range accurately and precisely. The techniques of imaging radar date back to the Second World War, when crude radar images were obtained on the displays of airborne radar systems, allowing features such as coastlines to be distinguished. A major advance took place in the 1950s, when Wiley in the USA performed the first experiments with airborne synthetic aperture radar (Wiley, 1985), and nowadays synthetic aperture radar is an important part of the radar art, with radar imagery from satellites and from aircraft routinely used for geophysical remote sensing and for military surveillance purposes. The enormous advances that have been made in imaging radar
since the 1950s owe a great deal to the development of fast, high-performance digital processing hardware and algorithms. The objective of this chapter is to review the application of imaging radar systems to counterterrorism. In so doing, we need to bear in mind the things that radar is good at doing, and the things that it is not good at doing. At a fundamental level, then, we will be concerned with the electromagnetic properties of the targets of interest, and with their contrast with the surroundings. We will be concerned with how these properties vary with frequency, since targets may exhibit resonances (and hence enhanced signatures) at certain frequencies, and with the propagation characteristics through materials such as building walls and soil. Hence, we will be interested in techniques which allow us to distinguish targets from the background by exploiting differences in signature, wherever possible making use of prior knowledge. Additional information may be obtained from multiple-perspective views of targets (Baker et al., 2006), and further information may be obtainable through techniques such as radar polarimetry and interferometry. There are two distinct stages to this: (i) the production of high-quality, artefact-free imagery, and (ii) the extraction of information from imagery. Radar images should not be expected to look like photographs. Firstly, radar frequencies are very different from those in the optical region. Secondly, radar is a coherent imaging device and, just like a laser, exhibits multiplicative speckle noise. This makes for an extremely challenging image interpretation environment. The structure of the chapter is therefore as follows. Firstly, we provide a brief review of the fundamentals of radar imaging, establishing some of the fundamental relations for the resolution of an imaging radar system. This is followed by a discussion of five specific applications to counterterrorism: (i) the detection of buried targets; (ii) through-wall radar imaging; (iii) radar tomography and the detection of concealed weapons; (iv) passive mm-wave imaging; and (v) passive bistatic radar. This is followed by some discussion and some comments on future prospects.

2. Fundamentals of Radar Imaging

Firstly we can establish some of the fundamental relations for the resolution of an imaging radar system. In the down-range dimension, the resolution Δr is related to the signal bandwidth B by

    Δr = c / (2B)    (1)

where c is the velocity of propagation. High range resolution may be obtained either with a short-duration pulse or by a coded wide-bandwidth signal, such
as a linear FM chirp or a step-frequency sequence, with the appropriate pulse compression processing. A short-duration pulse requires a high peak transmit power and instantaneously broadband operation; these requirements can be relaxed in the case of pulse compression. With the advances in digital processing power it is now relatively straightforward to generate wideband coded waveforms and to perform the pulse compression processing in the receiver in real time. In the first instance, cross-range resolution is determined by the product of the range r and the beamwidth θB (in radians). The beamwidth is determined by the dimension d of the antenna aperture, and thus the cross-range resolution is given by

    Δx = r θB ≈ rλ / d    (2)
where λ is the wavelength. As the dimensions of most antennas are limited by practical considerations (such as fitting to an aircraft), the cross-range resolution is invariably much poorer than that in the down-range dimension. However, there are a number of techniques that can improve upon this, all of which are ultimately a function of the change in viewing (aspect) angle. The cross-range resolution achieved in synthetic aperture radars is determined by the relative motion between the radar and the object. Consider the scenario in Figure 1. For this geometry, Brown (1967) defines the point target response of a monochromatic CW radar as

    W(x) = A(x/r) exp[ j (4π/λ) √(r² + x²) ].    (3)
Figure 1. A target moving perpendicularly past a monochromatic CW radar.
Computation of the instantaneous frequency of the response allows the bandwidth to be written as

    B = (4π/λ) · x / √(r² + x²) = (4π/λ) sin(θ/2)    (4)

and hence the cross-range resolution is given by

    Δx = π / B = λ / (4 sin(θ/2)).    (5)
For a linear, stripmap-mode synthetic aperture, Equation (5) reduces to Δx = d/2, which is independent of both range and frequency. Even higher resolution can be obtained with a spotlight-mode synthetic aperture, steering the real-aperture beam (either mechanically or electronically) to keep the target scene in view for a longer period, and hence forming a longer synthetic aperture. Thus as θ → 180◦, the cross-range resolution Δx → λ/4. A practical maximum value for θ is likely to be no more than 60◦, leading to Δx ≈ λ/2, for real systems limited by practical considerations such as range ambiguity, range curvature, SNR, etc. The equivalent result for range resolution may be obtained by writing (1) as

    Δr = c / (2B) = (λ/2) (f0 / B).    (6)

The fractional bandwidth B/f0 could in principle be 200%, with the signal bandwidth extending from zero to 2f0 and giving Δr = λ/4, but a practical maximum value is likely to be closer to 100%, giving Δr ≈ λ/2. In the last year or so, results have appeared in the open literature which approach this limit (Brenner and Ender, 2004; Cantalloube and Dubois-Fernandez, 2004). Figure 2 shows one example, from a recent conference, of an urban target scene. Critical to the ability to produce such imagery is the ability to characterise and compensate for platform motion errors, since in general the platform will not move with perfectly uniform motion in a straight line. Of course, motion compensation becomes most critical at the highest resolutions. It is conventionally achieved by a combination of inertial navigation (IN) and autofocus processing, and a number of different autofocus algorithms have been devised and evaluated (Oliver and Quegan, 1998). In practice, IN sensors are sensitive to rapidly varying errors while autofocus techniques are best able to correct errors which vary on the same spatial scale as the synthetic aperture length, so the two techniques are essentially complementary. It is possible to improve the resolution in images using so-called superresolution image processing techniques. Under the most favourable conditions an improvement of a factor of between 2 and 3 in image resolution is achievable,
Figure 2. High resolution aircraft-borne SAR image of a part of the university campus in Karlsruhe (Germany). The white arrow refers to a lattice in the left courtyard, which is shown in more detail in the small picture on the left bottom. The corresponding optical image is shown on the left top (after Brenner and Ender (2004)).
though to achieve this it is necessary that the signal-to-noise ratio be adequately high (at least 20 dB) to start with and that the imagery be free from artefacts. Other means of extracting information from the target scene, and hence of providing information to help discriminate the target from clutter and to classify it, include: (i) interferometric processing, to provide high-resolution three-dimensional information on target shape; (ii) polarimetric radar, since the polarimetric scattering properties of targets and of clutter may be significantly different, especially if the target includes dihedral- or trihedral-like features; and (iii) multi-aspect imaging, since views from different aspects of a target will almost certainly provide greater information to assist the classification process (Baker et al., 2006).
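As a quick numerical companion to Equations (1), (2) and (5), the sketch below evaluates the resolution relations for illustrative X-band parameters (the values are ours, not the authors').

```python
import numpy as np

C = 3e8  # propagation velocity (m/s)

def range_resolution(bandwidth_hz):
    return C / (2.0 * bandwidth_hz)                      # Eq. (1)

def real_beam_cross_range(range_m, wavelength_m, aperture_m):
    return range_m * wavelength_m / aperture_m           # Eq. (2)

def sar_cross_range(wavelength_m, delta_theta_rad):
    return wavelength_m / (4.0 * np.sin(delta_theta_rad / 2.0))  # Eq. (5)

f0 = 10e9                    # assumed X-band carrier
lam = C / f0
print(range_resolution(600e6))                 # 0.25 m from 600 MHz bandwidth
print(real_beam_cross_range(10e3, lam, 1.0))   # 300 m real-beam limit at 10 km
print(sar_cross_range(lam, np.deg2rad(60)))    # lambda/2 for a 60-degree aspect change
```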
3. Applications to Counterterrorism

3.1. RADAR DETECTION OF BURIED TARGETS
An important application is the use of radar to detect and classify objects buried in the ground. In a counterterrorism context, such objects may take the form of landmines and improvised explosive devices (IEDs), weapons caches and tunnels, though other applications include archaeology
and the detection of buried pipes and cables. Fundamental to such applications are the propagation characteristics of electromagnetic radiation through soil, and at the boundary between air and soil, and how these characteristics depend on frequency and on soil properties. In general, a lower frequency gives lower propagation loss than a higher frequency, but poorer resolution, both in range and in azimuth. Daniels (2004) has provided a comprehensive account of the issues in ground penetrating radar (GPR) and examples of systems and results. He states that '. . . GPR relies for its operational effectiveness on successfully meeting the following requirements: (a) efficient coupling of electromagnetic radiation into the ground; (b) adequate penetration of the radiation through the ground having regard to target depth; (c) obtaining from buried objects or other dielectric discontinuities a sufficiently large scattered signal for detection at or above the ground surface; (d) an adequate bandwidth in the detected signal having regard to the desired resolution and noise levels.' Daniels provides a table of losses for different types of material at 100 MHz and 1 GHz (Table I) and presents a taxonomy of system design options (Figure 3). The majority of systems use an impulse-type waveform and a sampling receiver, processing the received signal in the time domain. More recently, however, FMCW and stepped frequency modulation schemes have been developed, which require lower peak transmit powers. Both types of system, though, require components (particularly antennas) with high fractional bandwidths, which are not necessarily straightforward to realise (Figure 4). As an example of the results that can be achieved, Figure 5 shows images of a buried antipersonnel mine at a depth of 15 cm, showing both the original image and the results after image processing techniques have been used to enhance the target.

TABLE I. Material loss at 100 MHz and 1 GHz (after Daniels (2004), © IET 2004).

Material             Loss at 100 MHz (dB m⁻¹)   Loss at 1 GHz (dB m⁻¹)
Clay (moist)         5–300                      50–3000
Loamy soil (moist)   1–60                       10–600
Sand (dry)           0.01–2                     0.1–20
Ice                  0.1–5                      1–50
Fresh water          0.1                        1
Sea water            100                        1000
Concrete (dry)       0.5–2.5                    5–25
Brick                0.3–2                      3–20
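As a worked illustration of Table I, the sketch below estimates the depth at which two-way propagation loss alone would consume a notional 60 dB loss budget; the per-metre losses are illustrative picks from within the table's ranges, and antenna coupling, target scattering and noise are all ignored.

```python
# Two-way propagation loss through soil: loss_db = 2 * depth_m * alpha_db_per_m.
# Alpha values are illustrative picks from within the ranges in Table I.
MATERIAL_LOSS_DB_PER_M = {           # (at 100 MHz, at 1 GHz)
    "clay (moist)":   (150.0, 1500.0),
    "sand (dry)":     (1.0, 10.0),
    "concrete (dry)": (1.5, 15.0),
}

def max_depth_m(alpha_db_per_m, budget_db=60.0):
    """Depth at which the two-way path loss alone exhausts the budget."""
    return budget_db / (2.0 * alpha_db_per_m)

for material, (a100, a1g) in MATERIAL_LOSS_DB_PER_M.items():
    print(f"{material:15s} 100 MHz: {max_depth_m(a100):6.2f} m   "
          f"1 GHz: {max_depth_m(a1g):6.2f} m")
# Dry sand supports metres of penetration; moist clay only centimetres,
# which is why frequency and soil type dominate GPR design.
```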
Figure 3. System design and processing options for ground penetrating radar (after Daniels (2004), © IET 2004).
Figure 4. Physical layout of a ground penetrating radar system (after Daniels (2004), © IET 2004).
Figure 5. Oblique antipersonnel mine at an angle of 30◦: (a) B-scan of raw data; (b) after migration by deconvolution; (c) after Kirchhoff migration (after Daniels (2004), © IET 2004).
3.2. THROUGH-WALL RADAR IMAGING
The ability to image targets through building walls, to detect and classify personnel and objects within rooms, is of significant importance in counterterrorism operations, and there has been a great deal of work on the subject in the last decade. Essentially similar considerations to those for GPR apply, in that a lower frequency may give lower propagation loss than a higher frequency, but will in general give poorer resolution, both in range and in azimuth. The final line of Table I shows the attenuation of brick at frequencies of 100 MHz and 1 GHz, but many internal building walls may be made of material of lower attenuation. Police officers, search and rescue workers, urban-warfare specialists and counterterrorism agents may often encounter situations where they need to detect, identify, locate and monitor building occupants remotely. An ultra-high resolution through-wall imaging radar has the potential to supply this key intelligence in a manner not possible with other forms of sensor. High resolution and the ability to detect fine Doppler, such as heartbeat movements, enable a detailed picture of activity to be built up so that appropriate and
informed decisions can be made as to how to tackle an incident. For example, in a hostage scenario the layout of a room can be established and it may be possible to differentiate hostage from terrorist. An escaping suspect hidden behind a bush or wall can be detected and tracked. Alternatively, in the aftermath of a terrorist event, people buried alive in rubble can be detected on the basis of radar reflections from their chest cavity, thus helping to direct rescue workers to best effect. Key enabling technologies and techniques are:

• distributed aperture sensing for very high (∼λ or better) cross-range resolution in both the azimuth and elevation dimensions;
• ultra-wideband pulsed waveforms (to give wavelength-equivalent range resolution);
• novel near-field focused tomographic processing;
• narrowband adjunct waveforms for fine Doppler discrimination.

Whilst aspects of each of these have been reported individually (Aryanfar and Sarabandi, 2004; Song et al., 2005; Yang and Fathy, 2005), there has been little or no research combining them to generate wavelength-sized resolution in three dimensions, enabling fine discrimination of internal objects (e.g. an individual holding a gun could potentially be identified and located against background furniture). Much of the technology necessary already exists and can be procured at low cost (of the order of a few thousand pounds, aided by the fact that transmitter powers need only be a fraction of a watt). The key research challenges lie primarily in developing processing techniques and algorithms that achieve wavelength-scale (<10 cm) resolution whilst avoiding unacceptable sidelobes and ambiguities, and in doing so over extended spatial areas. In addition, the approach also includes the use of a Doppler waveform with very high resolution. To achieve all this requires a combination of distributed-aperture near-field processing coupled with reflection-based tomography. Lastly, information extraction from the resulting imagery remains a major challenge in radar; however, this should be considerably aided by the very high resolutions to be employed. There is a great deal of interest in this area, as evidenced by the DARPA Visibuilding program, which is investigating a wide variety of sensor and processing techniques with the aim of providing a comprehensive internal picture of urban and sub-urban environments (http://www.darpa.mil/baa/baa06-04.html). At the heart of any solution is likely to be through-wall technology of the type indicated above.
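None of the systems referenced above is reproduced here, but the near-field focusing idea behind a distributed aperture can be sketched very simply: back-project each receiver's time series onto an image grid using the exact round-trip delay to each pixel (delay-and-sum). The geometry, sampling rate and idealised impulse returns below are all assumptions for illustration; wall refraction is deliberately neglected.

```python
import numpy as np

C = 3e8
FS = 20e9                       # receiver sampling rate (Hz), illustrative

def delay_and_sum(signals, tx, rxs, grid):
    """Near-field focused image: for each pixel, sum each receiver's
    sample at the round-trip delay tx -> pixel -> rx.
    signals: (n_rx, n_samples); tx: (2,); rxs: (n_rx, 2); grid: (n_pix, 2)."""
    image = np.zeros(len(grid))
    for sig, rx in zip(signals, rxs):
        d = (np.linalg.norm(grid - tx, axis=1) +
             np.linalg.norm(grid - rx, axis=1))        # two-way path (m)
        idx = np.round(d / C * FS).astype(int)
        valid = idx < sig.size
        image[valid] += sig[idx[valid]]
    return image

# Simulate a point scatterer behind a (lossless, for simplicity) wall.
target = np.array([0.5, 3.0])
tx = np.array([0.0, 0.0])
rxs = np.array([[-0.6, 0.0], [-0.2, 0.0], [0.2, 0.0], [0.6, 0.0]])
n = 4096
signals = np.zeros((len(rxs), n))
for k, rx in enumerate(rxs):
    d = np.linalg.norm(target - tx) + np.linalg.norm(target - rx)
    signals[k, int(round(d / C * FS))] = 1.0           # ideal impulse return

xs, ys = np.meshgrid(np.linspace(-1, 1, 41), np.linspace(2, 4, 41))
grid = np.column_stack([xs.ravel(), ys.ravel()])
img = delay_and_sum(signals, tx, rxs, grid)
print("peak at", grid[np.argmax(img)], "true target", target)
```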
3.3. TOMOGRAPHY AND DETECTION OF CONCEALED WEAPONS

The techniques of tomography were originally developed for medical imaging, to provide 2D cross-sectional images of a 3D object from a set of narrow X-ray views taken over the full 360◦ of direction. The results of
the received signals measured from various angles are then integrated to form the image, by means of the projection slice theorem. The Radon transform is derived from this theorem and is used by various techniques to generate tomographic images; two examples are filtered backprojection (FBP) and time domain correlation (TDC). Further descriptions of these techniques may be found in Soumekh (1999). In radar tomography the observation of an object from a single radar location can be mapped into Fourier space. Coherently integrating the mappings from multiple viewing angles enables a three-dimensional projection in Fourier space. This enables a three-dimensional image of an object to be constructed using conventional tomography techniques such as wavefront reconstruction theory and backprojection, where the imaging parameters are determined by the occupancy in Fourier space. Complications can arise when target surfaces are hidden or masked at any stage in the detection process. This shows that the intervisibility characteristics of the target scattering function are partly responsible for determining the imaging properties of moving target tomography. In other words, if a scatterer on an object is masked it cannot contribute to the imaging process and thus no resolution improvement is gained. However, if a higher number of viewing angles is employed then this effect can be minimised. Further complications may arise if (a) the point scatterer assumption used is unrealistic (as in the case of large scatterers introducing translational motion effects), (b) the small angle imaging assumption does not apply, and (c) targets with unknown motions (such as non-uniform rotational motions) create cross-product terms that cannot be resolved. The tomographic reconstruction (TR) algorithm applies the projection-slice theorem of the Fourier transform to compute the image. The projection-slice theorem states that the 1D Fourier transform of the projection of a 2D function g(x, y) made at an angle w is equal to a slice of the 2D Fourier transform of the function at an angle w (see Figure 6). Whereas some algorithms convert the outputs from many radars simultaneously into a reflectivity image using a 2D Fourier transform, TR generates an image by projecting the 1D Fourier transform of each radar projection individually back onto a 2D grid of image pixels. This operation gives rise to the term backprojection. The image can be reconstructed from the projections using the Radon transform, as the following equation shows:

    g(x, y) = ∫₀^π ∫₋∞^+∞ P(f) · |f| · e^{j2πf(x cos w + y sin w)} df dw    (7)

where w is the projection angle and P(f) is the Fourier transform of the 1D projection p(t).
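A compact numerical sketch of Equation (7) is given below (our illustration, not the authors' code): the |f| filtering is performed with FFTs and the backprojection uses nearest-neighbour sampling, exactly the two steps described below.

```python
import numpy as np

def filtered_backprojection(sinogram, angles_rad, grid_xy):
    """Reconstruct g(x, y) from 1-D projections via Equation (7).
    sinogram: (n_angles, n_samples) projections p(t); t spans [-1, 1).
    grid_xy: (n_pix, 2) pixel coordinates in the same units as t."""
    n_angles, n_s = sinogram.shape
    f = np.fft.fftfreq(n_s)                      # frequency axis for P(f)
    t = np.linspace(-1, 1, n_s, endpoint=False)
    g = np.zeros(len(grid_xy))
    for p, w in zip(sinogram, angles_rad):
        p_filt = np.real(np.fft.ifft(np.fft.fft(p) * np.abs(f)))   # |f| filter
        s = grid_xy[:, 0] * np.cos(w) + grid_xy[:, 1] * np.sin(w)
        idx = np.clip(np.round((s - t[0]) / (t[1] - t[0])).astype(int),
                      0, n_s - 1)
        g += p_filt[idx] * (np.pi / n_angles)    # angle integration weight
    return g

# Point target at the origin: every projection is an impulse at t = 0.
n_s, n_angles = 128, 90
sino = np.zeros((n_angles, n_s))
sino[:, n_s // 2] = 1.0
angles = np.linspace(0, np.pi, n_angles, endpoint=False)
xs, ys = np.meshgrid(np.linspace(-1, 1, 33), np.linspace(-1, 1, 33))
grid = np.column_stack([xs.ravel(), ys.ravel()])
g = filtered_backprojection(sino, angles, grid)
print("brightest pixel:", grid[np.argmax(g)])    # expect approximately (0, 0)
```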
Figure 6. Tomographic reconstruction: the projection slice theorem.
The filtered backprojection (FBP) method reconstructs the original image from its projections in two steps: filtering and backprojection.

Filtering the projections: the first step of FBP reconstruction is to perform the frequency integration (the inner integration) of the above equation. This entails filtering each of the projections using a filter with frequency response of magnitude |f|. The filtering operation may be implemented by ascertaining the required filter impulse response, then performing a convolution or an FFT/IFFT combination to correlate p(t) against the impulse response.

Backprojection: the second step of FBP reconstruction is to perform the angle integration (the outer integration) of the above equation. This projects the 1D filtered projection p(t) onto the 2D image as follows:

• place a pixel-by-pixel rectangular grid over the XY plane;
• place the 1D filtered projection p(t) in position at angle w;
• for each pixel, obtain the position of the required sample from the projection angle and pixel position, interpolate the filtered projection to obtain the sample, and add this backprojection value multiplied by the angle spacing;
• repeat the whole process for each successive projection.

3.3.1. Tomography of Moving Targets

A development of these concepts has been the idea of imaging moving targets using measurements from a series of multistatic CW or quasi-CW transmissions, giving rise to the term 'ultra-narrow band' (UNB) radar. This may be attractive in situations of spectral congestion, in which the bandwidth necessary to achieve high resolution by conventional means (Equation (1)) may not be available. Narrowband CW radar is also attractive as peak powers
Figure 7. Relationship between bistatic sensor geometry and representation in Fourier space (after Bonneau et al. (2002)).
are reduced to a minimum, sidelobes are easier to control, noise is reduced and transmitters are generally low cost. Applications may range from surveillance of a wide region and the detection of aircraft targets to the detection of concealed weapons carried by moving persons. In general, the projection of the target trajectory back to a given radar location will determine the resolution. A random trajectory of constant velocity will typically generate differing resolutions in the three separate dimensions. However, even if there is no resolution improvement there will be an integration gain due to the time series of radar observations. A Hamming window or similar may be required to reduce any cross-range sidelobe distortions. The treatment which follows is taken from that of Bonneau et al. (2002). Figure 7 shows the relationship between the bistatic sensor geometry and the representation in Fourier space. The bistatic angle is β and the bistatic bisector is the unit vector uB. The corresponding vector F in Fourier space is given by

    F = (4πf / c) cos(β/2) · uB.    (8)

Figure 8 shows the equivalent relationship for a monostatic geometry. The resolutions are inversely proportional to the sampled extents Δu and Δv in
Figure 8. Fourier space sampling and scene resolution for a monostatic SAR (after Bonneau et al. (2002)).
Figure 9. Fourier space sampling and scene resolution for four examples: (i) stationary tx/rx, wideband waveform; (ii) stationary tx, moving rx, CW waveform; (iii) stationary tx, moving rx, wideband waveform; (iv) monostatic tx/rx, wideband waveform (after Bonneau et al. (2002)).
Fourier space; thus

    Δr_u = 2π / Δu ,    Δr_v = 2π / Δv    (9)
which should be compared with Equations (1), (2) and (5). In a UNB radar the finite bandwidth of the radar signal limits the range resolution. However, this resolution can be recovered by multistatic measurements over a range of angles. Figure 9 shows four examples, and the Fourier space sampling corresponding to each. Recent experimental work under controlled conditions in anechoic chambers (Wicks et al., 2005; Wicks, 2005) and using turntable-mounted targets (Coetzee et al., 2006) has demonstrated the validity of these theoretical approaches, and may be expected to pave the way for more practical experiments and systems.
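The sketch below strings Equations (8) and (9) together for an assumed 1 GHz CW illuminator observed over a 60◦ spread of bisector directions (all parameters illustrative): the angular diversity fills an arc in Fourier space whose occupied extents set the achievable resolutions.

```python
import numpy as np

C = 3e8

def fourier_sample(f_hz, beta_rad, bisector_unit_xy):
    """Equation (8): Fourier-space sample for one bistatic observation."""
    return (4 * np.pi * f_hz / C) * np.cos(beta_rad / 2.0) \
           * np.asarray(bisector_unit_xy)

f0 = 1e9                                   # CW illuminator (assumed)
beta = np.deg2rad(20.0)                    # fixed bistatic angle (assumed)
bisectors = np.deg2rad(np.linspace(-30, 30, 61))
samples = np.array([fourier_sample(f0, beta, (np.cos(a), np.sin(a)))
                    for a in bisectors])

du = samples[:, 0].max() - samples[:, 0].min()   # occupied extents
dv = samples[:, 1].max() - samples[:, 1].min()
print("resolutions (m):", 2 * np.pi / du, 2 * np.pi / dv)   # Equation (9)
# The narrowband signal alone gives essentially no range resolution;
# the 60-degree angular spread recovers it through Fourier-space extent.
```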
3.4. PASSIVE mm-WAVE IMAGING

Although not strictly a radar technique, one form of imaging that has shown significant counterterrorism application is passive mm-wave imaging. Here, an image is formed at mm-wave frequencies (typically 35 or 94 GHz) from the thermal energy radiated or reflected by the target scene, in a similar way to thermal infrared imaging. However, the angular resolution (Equation (2)) of an aperture of a given size is significantly poorer at mm-wave frequencies, so to achieve equivalent image resolution a significantly larger sensor aperture
Figure 10. (Left) MITRE passive mm-wave imager; (right upper) optical image from Malvern Hills; (right lower) equivalent passive mm-wave image (courtesy of QinetiQ).
is necessary. On the other hand, in contrast to thermal infrared imaging, the transmission properties of radiation at these frequencies are such that mm-wave signals are not significantly attenuated by obscurants such as smoke or fog, or in particular by clothing. The image contrast is due to the difference between emissive sources, which appear warm (∼290 K), and reflective (metallic) objects, which reflect the cold sky (∼80 K at W-band) and hence appear cold. Figure 10 shows an example of an image obtained by a system of this kind, from the Malvern Hills in the west of the UK, on a summer day. Whilst the optical image is rather hazy, the mm-wave image is much clearer. The physical aperture of the imager is 1.2 m, giving an angular resolution at 94 GHz of ∼2.5 mrad, and the image is formed by mechanical scanning of the antenna, taking several seconds to build up. The receiver technology uses broad-bandwidth low-noise amplifiers ahead of a detector; these achieve noise figures of a few decibels and several tens of decibels of gain, but the millimetre-wave MMICs require specialised fabrication facilities. More formally, we can write down an equation which relates the temperature sensitivity ΔT of a radiometer to the system noise temperature Tsys, the RF bandwidth B and the integration time τ:

    ΔT = Tsys / √(Bτ).    (10)
The number of resolution cells (i.e. pixels) N is the product of the image width X and image height Y, divided by the resolution cell size δ²:

    N = XY / δ².    (11)
If the number of simultaneous beams (receivers) is n, then the image acquisition time T is given by

    T = Nτ / n    (12)
and so

    (ΔT)² = (N / n) · Tsys² / (BT).    (13)
These relations demonstrate the tradeoffs involved in a passive mm-wave imaging system, and show that if high-resolution images are to be obtained at operationally useful frame rates (T of the order of 1 second), multiple receiver channels will be necessary. Several types of scanning have been evaluated: mechanical, electronic, optical and digital, and some elegant forms of mechanical scanning have been devised (Appleby et al., 1997). In the context of counterterrorism the technique has shown great promise in the detection of concealed weapons. Figure 11 shows an image of a person with a concealed gun, and the corresponding optical image. When used indoors, some form of low-level noise illumination is needed. Figure 12 shows the result
Figure 11. (Left) passive mm-wave image of a person with a concealed weapon; (right) corresponding optical image (courtesy of QinetiQ, Malvern).
Figure 12. Image fusion and detection (image courtesy of the Farran Technology 94 GHz scanning system and ERA Technology).
of fusing an optical image (which would show the identity of the individual) with a passive mm-wave image.

3.5. PASSIVE BISTATIC RADAR
Although not strictly an imaging radar technique, there is considerable current interest in passive bistatic radar (PBR), which makes use of 'illuminators of opportunity' such as broadcast, communications or radionavigation signals, rather than dedicated radar transmissions. Such illuminators – particularly VHF FM radio and UHF television – are high power and have very wide coverage, although digital radio and television signals are more favourable in certain respects. Passive bistatic radar systems have several attractive features: (i) the radar system is passive, and hence undetectable (at least, by means of any radiated signal); (ii) reduced complexity and cost, since the transmit hardware already exists; (iii) the use of frequency bands (VHF and UHF) which are not normally available for radar use; and (iv) a bistatic geometry that offers a counterstealth capability against certain types of target. One type of threat for which passive bistatic radar systems may be particularly well suited is the defence of high-value assets such as government buildings, power stations or defence establishments against small airborne vehicles, such as low-flying light aircraft, gliders or unmanned air vehicles (UAVs), which might carry explosive, chemical or biological warheads. Examples of incidents involving such threats in recent years include the landing of a light
Figure 13. Three examples of incidents involving low-signature aircraft threats: (i) light aircraft landing in Red Square, Moscow in 1987; (ii) Pirellone Building, Milan, 18 April 2002; (iii) White House, Washington DC, 12 May 2005.
aircraft in Red Square, Moscow, in 1987, the attack on the Pirellone Building in Milan, Italy, in April 2002, and the landing of a light aircraft close to the White House in Washington DC in May 2005 (Figure 13). Small air platforms of this kind have much lower radar cross sections than conventional aircraft, so although conventional surveillance radar systems may in principle give coverage of these areas, their ability to detect such low-signature targets may be somewhat limited. The properties of illumination sources for passive bistatic radar, and the detection performance against different targets, have been evaluated and discussed (Baker et al., 2005; Griffiths and Baker, 2005a,b). Detection and tracking of conventional commercial aircraft (whose RCS would be of the order of 20 dB m²) at ranges in excess of 100 km, using VHF FM radio transmissions, have been demonstrated, and these results can be extrapolated to predict the performance against low-signature targets of the type shown in Figure 13. Such systems could therefore be used as 'gap fillers' for regions where the coverage of conventional surveillance sensors is inadequate, or to protect particular high-value assets. Another application might be against low-signature maritime targets, such as might be used to attack high-value naval targets or for gun- or drug-running, although it is necessary to be sure that the illuminator coverage out to sea is adequate (Willis and Griffiths, in press).
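To indicate how such an extrapolation might run, the sketch below scales the reported ~100 km reference range with the fourth root of RCS, which is what the monostatic radar equation implies at fixed sensitivity; true bistatic geometry and propagation modify this, so the outputs are indicative only and the target RCS values are assumptions.

```python
def scaled_range_km(ref_range_km, ref_rcs_dbsm, target_rcs_dbsm):
    """Scale detection range with sigma**(1/4) at fixed sensitivity
    (monostatic radar-equation approximation, in dB form)."""
    return ref_range_km * 10 ** ((target_rcs_dbsm - ref_rcs_dbsm) / 40.0)

# Reference: ~20 dB m^2 commercial aircraft detected beyond 100 km.
# The target RCS values below are assumed for illustration.
for name, rcs in (("light aircraft", 3.0), ("glider", -3.0), ("small UAV", -10.0)):
    print(f"{name:14s} ({rcs:+5.1f} dB m^2): "
          f"~{scaled_range_km(100.0, 20.0, rcs):5.1f} km")
```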
4. Future Prospects

We have shown that there is a great deal of valuable information that can be gleaned by using radar sensors. It is equally clear that, alone, they do not provide a trouble-free route to reliable imaging and the detection of terrorist threats. Indeed, as terrorists become more sophisticated they may well use jamming and deception techniques, a topic that we have not even considered
here. Nevertheless, the promise shown by these radar-based approaches, both individually and collectively, certainly makes it worth investing in their further development. Specifically, low-cost high resolution is key where fine detail is required, such as for the identification of concealed weapons. Wide-area sensors based on low-cost, passive concepts and able to fill gaps left, for example, by current air defences need to be fully understood if optimum performance is to be achieved. Maximising dynamic range and coping with transmitted waveforms of an unknown type will be drivers for immediate research. Overall, it is clear that there are many differing challenges for a wide variety of sensor systems. However, the benefit in greatly improved capability provides more than sufficient motivation for overcoming them.

Acknowledgements

We gratefully acknowledge the assistance of Professor Mike Wicks and his colleagues (AFRL), Dr. Roger Appleby (QinetiQ) and David Daniels (ERA Technology) in the preparation of this paper, both for the provision of results and figures and for invaluable discussions. We also thank the organisations, including the UK Ministry of Defence, the US Air Force Office of Scientific Research, the UK Engineering and Physical Sciences Research Council, QinetiQ and its predecessors, BAE SYSTEMS, Thales Sensors and AMS, who have supported the various pieces of work whose results are reported here.

References

Appleby, R., Gleed, D., and Lettington, A. H. (1997) The Military Applications of Passive Millimetre-wave Imaging, Journal of Defence Science 2.
Aryanfar, F. and Sarabandi, K. (2004) Through Wall Imaging at Microwave Frequencies Using Space-time Focusing, In IEEE International Antennas and Propagation Symposium, Vol. 3, pp. 3063–3066.
Ausherman, D. A., Kozma, A., Walker, J. L., Jones, H. M., and Poggio, E. C. (1984) Developments in Radar Imaging, IEEE Transactions on Aerospace & Electronic Systems AES-20, 363–400.
Baker, C. J., Griffiths, H. D., and Papoutsis, I. (2005) Passive Coherent Radar Systems – Part II: Waveform Properties, In IEE (ed.), Special Issue of IEE Proceedings Radar, Sonar and Navigation on Passive Radar Systems, Vol. 152, pp. 160–168.
Baker, C. J., Vespe, M., and Griffiths, H. D. (2006) Multi-perspective Imaging and Image Interpretation, Il Ciocco, Italy.
Bell, M. (1993) Information Theory and Radar Waveform Design, IEEE Transactions on Information Theory 39, 1578–1597.
Bonneau, R. J., Bascom, H. F., Clancy, J. T., and Wicks, M. C. (2002) Tomography of Moving Targets (TMT), Italy.
Brenner, A. R. and Ender, J. H. G. (2004) Airborne SAR Imaging with Subdecimetre Resolution, In Proceedings of the EUSAR 2004 Conference, Ulm, pp. 267–270.
Brown, W. M. (1967) Synthetic Aperture Radar, IEEE Transactions on Aerospace & Electronic Systems AES-3, 217–229.
Cantalloube, H. and Dubois-Fernandez, P. (2004) Airborne X-band SAR Imaging with 10 cm Resolution – Technical Challenge and Preliminary Results, In Proceedings of the EUSAR 2004 Conference, Ulm, pp. 271–274.
Carrara, W., Goodman, R. S., and Majewski, R. M. (1995) Spotlight Synthetic Aperture Radar: Signal Processing Algorithms, Boston, Artech House.
Coetzee, S., Baker, C. J., and Griffiths, H. D. (2006) Narrow Band High Resolution Radar Imaging, In Proceedings of the IEEE Radar Conference 2006, Verona, 24–27 April 2006, pp. 622–625.
Daniels, D. J. (2004) Ground Penetrating Radar (2nd ed.), Bodmin, Cornwall, IEE Peter Peregrinus Publishing.
Griffiths, H. D. and Baker, C. J. (2005a) Fundamentals of Tomography and Radar, In Bistatic and Multistatic Sensors for Homeland Security (NATO Advanced Study Institute Advances in Sensing with Security Applications), Il Ciocco, Italy.
Griffiths, H. D. and Baker, C. J. (2005b) Passive Coherent Radar Systems – Part I: Performance Prediction, In IEE (ed.), Special Issue of IEE Proceedings Radar, Sonar and Navigation on Passive Radar Systems, Vol. 152, pp. 153–159.
Howland, P. E., Maksimiuk, D., and Reitsma, G. (2005) FM Radio Based Bistatic Radar, In IEE (ed.), Special Issue of IEE Proceedings Radar, Sonar and Navigation on Passive Radar Systems, June 2005, pp. 107–115.
Knaell, K. K. and Cardillo, G. P. (1995) Radar Tomography for the Generation of Three-dimensional Images, IEE Proceedings – Radar, Sonar and Navigation, Vol. 142, pp. 54–60.
Munson, D. C. J., O'Brien, J. D., and Jenkins, W. K. (1983) A Tomographic Formulation of Spotlight-mode Synthetic Aperture Radar, In Proceedings of the IEEE, Vol. 71, pp. 917–925.
Oliver, C. J. and Quegan, S. (1998) Understanding SAR Images, Boston, Artech House.
Pasmurov, A. Y. and Zinoviev, Y. (2005) Radar Imaging and Tomography, Stevenage, Peter Peregrinus.
Sakamoto, T. and Sato, T. (1999) A Target Shape Estimation Algorithm for Pulsed Radar Systems Based on Boundary Scattering Transform, IEICE Transactions on Communications E78-B, 1357–1365.
Soldovieri, F., Brancaccio, A., Leonore, G., and Pieri, R. (2005) Shape Reconstruction of Perfectly Conducting Objects by Multi-View Experimental Data, IEEE Transactions on Geoscience & Remote Sensing 43, 65–71.
Song, L.-P., Yu, C., and Liu, Q. H. (2005) Through-wall Imaging (TWI) by Radar: 2-D Tomographic Results and Analyses, IEEE Transactions on Geoscience and Remote Sensing 43, 2793–2798.
Soumekh, M. (1999) Synthetic Aperture Radar Signal Processing with MATLAB Algorithms, Boston, Artech House.
Vespe, M., Baker, C. J., and Griffiths, H. D. (2005) Multi-perspective Target Classification, In Proceedings of the RADAR 2005 Conference, IEEE Publ. No. 05CH37628, Washington DC, 9–12 May 2005, pp. 877–882.
Walker, J. L. (1980) Range Doppler Imaging of Rotating Objects, IEEE Transactions on Aerospace & Electronic Systems 16, 23–52.
Wehner, D. R. (1987) High Resolution Radar, Boston, Artech House.
Wicks, M. (2005) Tomography, Waveform Diversity and Intelligent Systems, In AOC Fourth Multinational Conference on Passive and Covert Radar, Syracuse, 5–7 October 2005.
Wicks, M. C., Himed, B., Bascom, H., and Clancy, J. (2005) Tomography of Moving Targets (TMT) for Security and Surveillance, In NATO Advanced Study Institute Advances in Sensing with Security Applications, Il Ciocco, Italy.
Wiley, C. A. (1985) Synthetic Aperture Radars – A Paradigm for Technology Evolution, IEEE Transactions on Aerospace & Electronic Systems AES-21, 440–443.
Willis, N. and Griffiths, H. D. (eds.) (in press) Advances in Bistatic Radar, SciTech.
Yang, Y. and Fathy, A. E. (2005) See-through-wall Imaging Using Ultra-wideband Short-pulse Radar System, In IEEE International Antennas and Propagation Symposium, Vol. 3B, 3–8 July 2005, pp. 334–337.
OPTICAL SIGNATURES OF BURIED MINES
Charles A. Hibbitts ([email protected])
Johns Hopkins University Applied Physics Lab, 11100 Johns Hopkins Rd., Laurel, MD 20723
Abstract. Soils can be disturbed by the burial of mines. When disturbed, the soils have a smaller effective size distribution than when undisturbed, and this difference can be detected optically. Measuring apparent emissivity differences in the thermal infrared is an effective passive optical technique, developed by others, for discriminating disturbed silicate soils from undisturbed soils, and appears particularly sensitive to the presence of grains less than about 60 μm in diameter. It is not, however, an effective means for detecting grain size variations in carbonate sands. Optical techniques that may prove less sensitive to composition include (1) changes in the reflectance in the shortwave infrared outside of absorption features, nominally near 2.4 μm, and (2) reflectance changes due to photometric backscattering by particulate surfaces. The reflectance of particulate surfaces increases with decreasing phase angle, peaking sharply at opposition, when the illumination and detector are collocated (0◦ phase angle). For a given high-albedo material, finer grains are brighter than larger-grained powders when observed at opposition.

Key words: Remote Sensing, Spectroscopy, Photometry, Landmines, IED

1. Introduction

Detecting and countering Improvised Explosive Devices (IEDs) is a central theme for the US and other NATO countries operating in Iraq and Afghanistan. The US spent over USD 3 billion in 2006 on countermeasures to IEDs. The threat is greater than the US first estimated, often with highly educated persons building the weapons and others delivering them. The costs of IEDs are low, but countermeasures can be both expensive and not fully effective. The US is pursuing many lines of attack to counter IEDs, considering it a huge, long-lasting problem that will not lend itself to a single solution. Tactics range from direct detection and electronic countermeasures to going after the bomb makers, suppliers, and the bombers, as well as, in some parts of the world, attempting to apply crisis prevention to provide people incentives
to not resort to terrorism. IEDs are employed and detonated by a variety of means. Some of the most notorious are vehicle-borne IEDs, but other common strategies are burying them in roads, hiding them alongside roads disguised as trash, or even hanging them from bridges. They can be detonated remotely by cell phones or command wire, as well as by suicide bombers. Burying IEDs on or alongside roads continues to be a common method of emplacement. The act of burying the mine will leave traces, invisible to the eye, that can be detected via optical remote sensing. The burial of mines will disturb soil so that it has a different grain size distribution, and possibly packing density, than surrounding undisturbed soils. The grain size of the disturbed soil will usually be smaller and the packing density lower. These differences will persist until the surface is reprocessed through washing and sorting by rain and wind (Johnson et al., 1998). The rate of weathering that resets the soil conditions depends on the environment, with desert surfaces potentially maintaining differences for several months (Horton et al., 1998). However, in a wet environment such as a beach the reset time can be virtually instantaneous, and in a dusty environment, such as next to a well-traveled unimproved surface, it may take only days or hours to cover the ground with a masking layer of road dust and loess. While the quick reset times in the latter examples may pose limitations for inferring the presence of buried IEDs, a technique sensitive to a change in dust abundance or distribution could also be used to detect other human activity related to IED emplacement. Inferring the presence of buried IEDs in this way could be a rapid, large-area process suited to area reconnaissance. Possible IED sites could then be investigated with slower, more resource-intensive, but definitive methods.

2. Infrared Spectroscopic Techniques for Detecting Disturbed Soils

All surfaces reflect light and emit thermal energy. The intensity, the distribution along the electromagnetic spectrum, and the polarization can all be measured, and each provides information on surface composition and structure. Our eyes are excellent detectors of relative intensity, but are poor at detecting the polarization of light and are sensitive to light only over a very narrow wavelength range, from about 400 nm to 700 nm, where solar illumination is greatest. Thus, our eyes provide excellent context information for object identification but limited information on the composition or micro-physical structure of objects. The brightness and spectral characteristics of light detected from a particulate surface are fundamentally controlled by its ability to scatter, reflect, and absorb incident and previously scattered illumination, as well as to emit light. The absorption of light can change rapidly with wavelength for many materials, while the scattering varies less quickly. For instance, both quartz and carbonate sands are bright and often white at visible wavelengths because they scatter
sunlight without strongly absorbing at visible wavelengths. However, in the SWIR, where both continue to scatter appreciably, the carbonate sands also possess absorption bands. These variations have a strong wavelength dependency because they occur at energies associated with vibrations within the molecules and with lower-energy electronic transitions. Spectral variations in the short-wave infrared (SWIR) region of 1–2.5 μm have been exploited for decades for determining the composition of surfaces and individual minerals (e.g. Adams, 1975). However, not all minerals possess electronic transitions or bonds that vibrate at SWIR wavelengths. Those that do not, such as quartz and a few other common rock-forming minerals, are spectrally bland in the SWIR. These minerals, though, tend to be very spectrally active at longer wavelengths, most notably in the 8–12-μm atmospheric window (e.g. Estep-Barnes, 1977; Salisbury, 1987). Commonly termed the thermal infrared (TIR), because thermally emitted radiation far outshines any reflected solar illumination, spectral variations at these wavelengths are controlled by vibrations in the structure of the molecule. Silicate minerals are readily discerned at these wavelengths. Because atomic substitutions for silicon and the presence of charge-compensating ions alter the crystal structure, and thus the spectral characteristics in the TIR, absorption band positions and shapes can act as fingerprints for specific minerals. Additionally, the significant absorption of light in both the SWIR and TIR compared with the visible, coupled with scattering, can result in a myriad of effects on the spectral distribution of energy at those wavelengths. At visible and near-infrared (NIR) wavelengths (0.4 to 1 μm), where minerals such as quartz and carbonates are not strongly absorbing, a wide range of particle sizes will be nearly equivalently bright because scattering will eventually lead to the light exiting the material without much energy being absorbed. At longer SWIR wavelengths (about 1 to 2.5 μm) materials are more, but often only slightly, absorbing, and particulate materials remain bright. However, the absorption is also usually sufficient that finer-grained particulates, which scatter more light, appear brighter than larger-grained particles (Okin and Painter, 2004). At TIR wavelengths grain size differences manifest different, less intuitive effects, because the radiation is both emitted and subsequently reflected and scattered by the material itself. Most materials are also highly absorbing at these wavelengths, and at some wavelengths some materials are so absorbing that they become semi-metallic: opaque but reflective, i.e. like metal at visible wavelengths. The reststrahlen features of silicate minerals at 8–10 μm are an example of this semi-metallic behavior. These bands are sensitive to variations in grain size as well as compositional variations (e.g. Salisbury and Wald, 1992; Johnson et al., 1998). The TIR spectra of finer-grained silicate minerals have shallower reststrahlen bands (they become less reflective) and exhibit lower contrast than do the larger grains characteristic of undisturbed soils. The lower reflection becomes significant when grain sizes are sufficiently small
so that reflection is less efficient and the particles begin to scatter and absorb more of the infrared radiation (Johnson et al., 1998). Porosity effects and the consequent photon trapping do not appear to be a significant factor for irregularly shaped grains such as quartz. However, it remains possible that the reststrahlen bands of plate-like mineral grains, such as phyllosilicates, would depend on porosity and packing. A tightly packed surface of flat grains will have fewer voids and possess grains oriented with the long axis horizontal, reflecting light with fewer bounces. A porous surface of randomly oriented grains would, in contrast, tend to reflect light into itself, so that a sufficiently porous surface will contain enough multiple reflections to erase spectral contrast (e.g. Conel, 1969).

2.1. LABORATORY SPECTRAL RESULTS AND DISCUSSION
The SWIR spectra of quartz-rich sand (Figure 1a) and of fine-grained debris of desert silicate rocks (Figure 2a) are largely bland. These spectra are of samples that
Figure 1. Directional hemispherical reflectance of ground and wet-sieved sand from San Onofre Beach, CA. (a) The NIR and SWIR reflectance increases for the grain size fractions <250 μm compared to larger size fractions. (b) The calculated emissivity of the same sample assuming Kirchhoff's Law (emissivity = 1 − reflectance). The emissivity at the bottom of the reststrahlen bands decreases with decreasing grain size, especially for the smallest grain size fraction, <63 μm.
Figure 2. (a) Directional hemispherical reflectance and (b) emissivity of a ground and wet-sieved mixture of fine-grained fragments of various siliceous rocks collected from desert pavements near Mecca Hills, CA. Plots are in the same style as Figure 1.
have been wet-sieved into various size fractions and subsequently oven-dried for several hours at about 100◦C. Multiple scattering leads to increased reflectance of the finer size fractions. There is a small amount of clay present that causes the 2.2–2.3-μm bands. The large, broad absorption feature near 3 μm is due to the presence of the OH⁻ cation vibration in the clay, perhaps another common alteration product of silicates such as goethite (FeOOH), or remnant adsorbed water. In Figure 1a, the increase in reflectance beyond 3.5 μm for the 125 to 250-μm size fraction suggests that scattering begins to brighten quartz grains whose diameters are less than approximately 50 times the wavelength of light. This effect is not obvious in the spectra of the compositionally more diverse desert grus in Figure 2a, where only the finest grain size fraction significantly brightens. At the short wavelengths for both samples, where the signal-to-noise ratio of the measurements begins to degrade, the reflectances of all grain sizes increase and begin to converge as scattering dominates over absorption. Differences in grain size are more apparent in the SWIR reflectance spectra of carbonate sand (Figure 3a). The spectral structure is largely due to the carbonate and possibly organic contaminants. As with silicate sands, the reflectances of all size fractions of carbonate sand
Figure 3. (a) Directional hemispherical reflectance and (b) emissivity of ground and wet-sieved carbonate sand. The increase in reflectance with decreasing grain size is present over a wider range of particle sizes than for siliceous material. The thermal emission does not change progressively with grain size. The 11.3-μm feature is due to the carbonate. The structure between about 8.5 and 10.5 μm may be due to some silicates.
increase and merge at shorter wavelengths. The depths of weak absorption bands (such as those at 2.3 and 2.5 μm) increase with increasing grain size, whereas the reflectance in strongly absorbing regions (about 3.4 μm) increases only for the smallest size fraction, when scattering begins to dominate over absorption. The thermal infrared spectra of the two silicate samples are consistent with the results of Johnson et al. (1998). The emission spectra of the beach sand exhibit the reststrahlen doublet split by the higher-emissivity Christiansen feature of quartz. The emission spectra of the ground desert rock debris in Figure 2b possess these features, though subdued and affected by other silicate features that broaden the long-wavelength reststrahlen band. This spectral contamination is consistent with the observed presence of dark mafic minerals in the desert sample. Specifically, the reststrahlen band depths of both samples decrease with decreasing grain size, becoming significantly shallower only for the smallest size fraction (Figures 1b and 2b). In contrast to the emission spectra of silicates, the emission spectra of carbonate sands do not change
consistently with grain size (Figure 3b). The depth of the strong carbonate band near 11.3 μm does not change significantly, although for the finest size fraction it does shift position and change shape. Also, for the finest size fraction, other bands in the TIR appear stronger and shift position. Some of this effect could be due to the presence of silicates in the finest size fraction and the onset of volume scattering in the 11.3-μm reststrahlen band.
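Since the emissivities compared above are obtained from measured reflectance through Kirchhoff's law, a minimal Python sketch of that conversion, together with a simple reststrahlen band-depth metric, may be useful. The spectrum below is synthetic and all function names are illustrative, not part of the measurement protocol described here.

    import numpy as np

    def emissivity_from_reflectance(reflectance):
        """Kirchhoff's law for an opaque surface: emissivity = 1 - reflectance."""
        return 1.0 - np.asarray(reflectance)

    def band_depth(wavelengths, emissivity, band=(8.0, 9.5)):
        """Depth of an emissivity minimum (e.g., a reststrahlen band) below a
        linear continuum drawn between the band edges."""
        wavelengths = np.asarray(wavelengths)
        emissivity = np.asarray(emissivity)
        in_band = (wavelengths >= band[0]) & (wavelengths <= band[1])
        edges = np.interp(band, wavelengths, emissivity)       # edge emissivities
        continuum = np.interp(wavelengths[in_band], band, edges)
        return float(np.max(continuum - emissivity[in_band]))

    # Illustrative use: a coarse fraction with a strong reflectance peak near
    # 8.7 um should show a deep reststrahlen band in emissivity.
    wl = np.linspace(7.0, 10.5, 200)                                # microns
    refl_coarse = 0.05 + 0.25 * np.exp(-((wl - 8.7) / 0.5) ** 2)    # synthetic
    depth = band_depth(wl, emissivity_from_reflectance(refl_coarse))
    print(f"reststrahlen band depth: {depth:.3f}")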
3. Backscattered Illumination Effects for Detecting Disturbed Soils

Backscattering of light by particulate materials results in high reflectance at low phase angles, peaking near 0°. The increase in reflectance due to backscattering at very low phase angles offers an optical technique, independent of spectral measurements, for inferring grain size. Multiple scattering of light between grains induces a sharp increase in brightness below 1° phase angle due to the constructive interference of light between portions of wave fronts traveling in opposite directions along the same multiple scattering path (Nelson et al., 1998). This is sometimes referred to as the weak localization of light or coherent backscattering, and it produces an increase in reflected illumination at opposition that is significant, though smaller than the surge associated with single scattering. For instance, high-albedo dust of 1 to 10 μm diameter exhibits an opposition surge of approximately 40 percent or more in brightness relative to viewing at 10° phase (Buratti et al., 1996). As the grain size increases or the albedo decreases, the illumination surge at opposition decreases. However, as the albedo becomes exceedingly low, so that single scattering dominates, the surge increases again (Shkuratov et al., 2002). Dark granular material has different backscattering characteristics, dominated by single-scattering phenomena in which grains are hidden by other grains and are not well illuminated by scattered light. The angular width of this geometric opposition surge peak is wider than for high-albedo materials, and its magnitude is greater, capable of brightening by a factor of up to two (Hapke, 1986). For a material to exhibit a peak in reflectance near 0° phase, its grain size needs to be several times larger than the wavelength of light (Shkuratov et al., 2002). The optimum phase angle is less than a few tenths of a degree, and the incident illumination needs to be collimated to a similar divergence. Attempts to model the photometric backscatter effect of particulate media have ranged from theoretically complex solutions of the scattering phenomenon (Mishchenko et al., 2000), to numerical solutions based upon ray tracing (Muinonen and Saarinen, 2000), to a semi-quantitative analytical simplification of the scattering model (Shkuratov et al., 1999). For very dark materials, the optical constant which describes absorption, the imaginary part of the complex index of refraction, k, is sufficiently high that the material
is single-scattering. However, the absorption coefficient of most geological materials is sufficiently low in the visible and NIR that multiple scattering and coherent backscattering will dominate.

3.1. EXPERIMENTAL RESULTS AND DISCUSSION
The opposition surge of a wide selection of high-albedo materials is greater when fine-grained than when coarse-grained (Figure 4), consistent with observations and theory. Each data point in Figure 4 is a ratio of the brightness near 0° phase to the brightness at a phase angle of at least 4°, beyond the angle at which coherent backscattering is a factor. Figure 4 shows that high-albedo particles with diameters between 10 and 20 μm, found coating larger grains in disturbed soils (Johnson et al., 1998), would be expected to have an opposition surge of approximately 20 to 25%, compared to a background material with a surge of no more than 5%. The solid line is a multi-order polynomial fit to each point, or to the center of mass of the region for the
Figure 4. Grain-size dependence of the photometric brightening of high-albedo particulate materials. The surge is the ratio of the sample brightness near 0° phase to that at approximately 5° or 10° phase. Finer-grained, high-albedo particulate materials have the largest nonlinear increase in illumination.
unwashed sandstone. The dimensions of the ovals are defined by the 1-sigma deviation of those measurements. All ratios are derived from laboratory measurements, conducted by others, of high-albedo materials that are well sorted, with the exception of the powdered sandstone, which is red, lower in albedo, and unwashed. Thus, these grains will be coated with fine particulates and will have a smaller effective grain size than plotted. That the surge of the unwashed powdered sandstone is lower than expected (even for its plotted grain size) is evidence that its lower albedo may have reduced coherent backscattering, and that its albedo is not sufficiently low to produce significant backscattering due to shadow hiding. The effect of albedo on backscattering properties demonstrates a need to separate albedo effects from grain size effects when using this technique to estimate grain size. Preliminary field measurements exploring the backscatter effect have been conducted to confirm that the laboratory results could be reproduced. Holes were dug and refilled in an unimproved road surface in the wet temperate coastal region of Washington State to simulate IEDs buried in other than desert soil. The road is of moderately low-albedo material, darker than powdered quartz or aluminum oxide, and is frequented on an hourly or daily basis, largely by pedestrian traffic. Thus, the site offers both an example of applying this technique to non-desert terrain and the ability to use the technique on soil that has not lain undisturbed for an exceedingly long period. Its low albedo will tend to reduce the opposition surge of both disturbed and less disturbed areas, and the partially disturbed background will result in a smaller difference in the opposition surge between it and the disturbed soil. The measurements used passive solar illumination, which is collimated to about 0.5°. Several holes were dug and refilled in the early morning and allowed to dry throughout a clear summer day. Upon return in the late afternoon, one remained untraversed, and it was immediately apparent that macroscopic shadowing due to soil clumps dominated any granular photometric effect, making the patch of disturbed soil obvious to the unaided eye. In order to present a realistic and difficult simulation of detecting the location of a buried device, we subsequently disguised the refilled hole by smoothing its surface to eliminate shadows caused by the clumps of soil and then sprinkling dirt gathered from the top surface of nearby less-disturbed terrain to match albedos. We completed the disguise by randomizing any patterns, lightly brushing the surface with leafy vegetation. The resulting disturbed patch of soil is indeed difficult to discern from the less disturbed surrounding terrain under usual viewing geometries (Figure 5a). However, when the same disturbed spot is imaged at low phase angle (with the sun behind the camera), it is readily apparent (Figure 5b). The illumination enhancement is not centered on the camera, and thus is not a glory or another effect associated only with low phase angle, but is specifically associated with the patch of disturbed soil. The lowest phase angle
Figure 5. (a) The same disturbed region of ground seen at an angle of 50° from nadir and at high phase angle, to simulate a more typical viewing geometry. (b) Illumination surge near 0° phase angle associated with the disturbed soil, seen in a contrast-stretched image. The shadow of the camera body is in the lower right of the patch of disturbed soil.
achieved is about 0.5° because zero phase angle is blocked by the shadow of the camera. Comparison of the brightness of the disturbed patch with a patch of less disturbed soil at an equivalent phase shows the surge is approximately 4%. This effect is detected at >0° phase angle, using imperfectly collimated illumination reflected from a patch of soil disguised to the unaided eye, against a background of soil that was not pristine. The opposition effect raised the reflectance of the soil significantly compared to observations at larger phase angles. The disturbed soil at 0.5° phase angle is 17% brighter than the less disturbed road surface at a greater phase angle, outside the range of the opposition surge (Figure 6). As the phase angle of the disturbed soil exceeds 1.5°, its brightness decreases to nearly 6% greater than the background road surface. The existence of some backscattered illumination enhancement to approximately 1.5° is consistent with a low-albedo
Figure 6. Ratio of the reflectance within pixels of the disturbed soil to pixels about 2° to the right, in the soil of the background road. For the first datum, pixels of disturbed soil at 0.5° phase are selected; they are about 17% brighter than the background road at a higher phase angle. The remaining seven data points are ratios of the same portion of the disturbed soil, but from observations at increasing phase angles. The portion of the disturbed region that is at 0.5° phase in the first observation is near 1.6° in the second observation, and so on. At the minimum phase angle of the disturbed soil in the second observation, the disturbed patch is only 6% brighter than the background road, which is at a 2° higher phase angle.
soil (relative to quartz, etc.). Backscattering in higher-albedo desert soils is expected to be stronger, especially if better-collimated illumination than sunlight is used at a lower phase angle. The approximately 6% higher reflectivity of the disturbed patch compared to the background road surface demonstrates that disturbing moderate-albedo soils will also brighten them, consistent with the previous section.

4. Discussion and Conclusions

The three optical methods presented here for detecting disturbed soils to infer the presence of buried mines each have strengths and weaknesses. The existing TIR spectral method is well proven in deserts and other terrains dominated by silicate lithology. However, that technique would not work well to detect mines buried in carbonate-dominated material. Instead, laboratory measurements suggest that increased reflectance in the SWIR associated with finer-grained material could be used to detect that a surface has been disturbed, independent
of lithology. For a silicate sand, the effectiveness of the technique appears to be comparable to the TIR spectral method, and it is more effective for a carbonate sand. However, only the 2.4-μm region appears suitable, because the other portions of the SWIR spectrum are affected by absorption features that could be confused with grain-size-induced reflectance differences. Additionally, while the 3 to 4-μm region is also devoid of absorption features, other than those due to organics at about 3.2 to 3.5 μm, spectral contrast is expected to be low at those wavelengths due to the presence of some thermal emission, weaker solar illumination, and an atmosphere that is less transparent than at shorter wavelengths. Moreover, reliance on an albedo difference risks introducing error if there is a reflectance difference in the subsurface material that is now on the surface. Fieldwork needs to be conducted to test the viability of the SWIR technique. The third technique proposed, photometric backscattering, appears to offer a promising new means of characterizing the relative grain size of a surface. Laboratory, theoretical, and preliminary field measurements consistently suggest that photometric backscattering can be used to infer relative grain size. The effect in bright media such as desert soils should be tens of percent at visible and NIR wavelengths and should work with a wide range of compositions. The effect will, however, depend strongly on the albedo. As with relying on SWIR reflectance to infer disturbance, the possibility for error exists from albedo changes that could be confused with, or reduce, the effect of backscattering. It may prove prudent to conduct backscattering measurements at SWIR wavelengths to also take advantage of the wavelength-dependent reflectance increase associated with finer-grained materials.
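As a worked illustration of the surge metric used in Section 3.1 (mean brightness of a patch near 0° phase divided by its brightness well outside the coherent-backscatter peak), here is a minimal sketch; the brightness values are invented, not measurements from this study.

    import numpy as np

    def opposition_surge(brightness_low_phase, brightness_reference):
        """Surge ratio as plotted in Figure 4: mean patch brightness near 0 deg
        phase divided by its mean brightness at a reference phase angle
        (>= 4 deg, outside the coherent-backscatter peak)."""
        return float(np.mean(brightness_low_phase) / np.mean(brightness_reference))

    # Illustrative values: a disturbed patch roughly 17% brighter at 0.5 deg
    # phase than the same patch observed outside the opposition surge.
    patch_at_low_phase = np.array([117.0, 118.2, 116.5])   # arbitrary DN
    patch_reference = np.array([100.1, 99.8, 100.0])       # arbitrary DN
    print(f"surge ratio: {opposition_surge(patch_at_low_phase, patch_reference):.2f}")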
Acknowledgements

Thanks to Dr. James Bauer for insightful discussions and the initial suggestion to investigate the opposition surge effect. Thanks to Dr. Bonnie Buratti for the unpublished data on powdered sandstones. Assistance in the field by Amit Mushkin and discussions with Dr. Alan Gillespie also proved essential. Much of this initial work was funded by a small portion of a federal contract to the Planetary Science Institute and subsequently by internal funds from the Johns Hopkins University Applied Physics Laboratory.
References

Adams, J. B. (1975) Interpretation of Visible and Near-infrared Diffuse Reflectance Spectra of Pyroxenes and Other Rock-forming Minerals, in Infrared and Raman Spectroscopy of Lunar and Terrestrial Minerals, Academic Press, New York, pp. 94–116.
Buratti, B. J., Hillier, J. K., and Wang, M. (1996) The Lunar Opposition Surge: Observations by Clementine, Icarus, 124, p. 490.
Conel, J. E. (1969) Infrared Emissivities of Silicates: Experimental Results and a Cloudy Atmosphere Model of Spectral Emission from Condensed Particulate Mediums, Journal of Geophysical Research, 74, 1614–1634.
Estep-Barnes, P. A. (1977) Infrared Spectroscopy, in J. Zussman (ed.), Physical Methods in Determinative Mineralogy, Academic Press, New York, pp. 529–603.
Hapke, B. W. (1986) Bidirectional Reflectance Spectroscopy IV. The Extinction Coefficient and the Opposition Effect, Icarus, 67, 264–280.
Horton, K. A., Johnson, J. R., and Lucey, P. G. (1998) Infrared Measurements of Pristine and Disturbed Soils 2. Environmental Effects and Field Data Reduction, Remote Sensing of Environment, 64, 47–52.
Johnson, J. R., Lucey, P. G., Horton, K. A., and Winter, E. M. (1998) Infrared Measurements of Pristine and Disturbed Soils 1. Spectral Contrast Differences between Field and Laboratory Data, Remote Sensing of Environment, 64, p. 34.
Mishchenko, M. I., Luck, J.-M., and Nieuwenhuizen, T. M. (2000) Full Angular Profile of the Coherent Polarization Opposition Effect, Journal of the Optical Society of America, 17, p. 888.
Muinonen, K. and Saarinen, K. (2000) Ray Optics Approximation for Gaussian Random Cylinders, Journal of Quantitative Spectroscopy and Radiative Transfer, 64, p. 201.
Nelson, R. M., Hapke, B. W., Smythe, W. D., and Horn, L. J. (1998) Phase Curves of Selected Particulate Materials: The Contribution of Coherent Backscattering to the Opposition Surge, Icarus, 131, 223–230.
Okin, G. S. and Painter, T. H. (2004) Effect of Grain Size on Remotely Sensed Spectral Reflectance of Sandy Desert Surfaces, Remote Sensing of Environment, 89, p. 272.
Salisbury, J. (1987) Mid-Infrared (2.1–25 micron) Spectra of Minerals, USGS Open File Report 87-263.
Salisbury, J. and Wald, A. (1992) The Role of Volume Scattering in Reducing Spectral Contrast of Reststrahlen Bands in Spectra of Powdered Minerals, Icarus, 96, 121–128.
Shkuratov, Y. G., Kreslavsky, M. A., Ovcharenko, A. A., Stankevich, D. G., and Zubko, E. S. (1999) Opposition Effect from Clementine Data and Mechanisms of Backscatter, Icarus, 141, p. 132.
CHEMICAL IMAGES OF LIQUIDS

L. Lvova¹, R. Paolesse¹,³, C. Di Natale²,³, E. Martinelli², E. Mazzone², A. Orsini², and A. D'Amico²,⁴
¹ Department of Chemical Science and Technologies, University of Rome "Tor Vergata", Via della Ricerca Scientifica 1, 00133 Rome, Italy
² Department of Electronic Engineering, University of Rome "Tor Vergata", Via del Politecnico 1, 00133 Rome, Italy
³ CNR-IMM, Via del Fosso del Cavaliere 100, 00133 Rome, Italy
⁴ CNR-IDAC, Via del Fosso del Cavaliere 100, 00133 Rome, Italy
Abstract. Following the success of electronic noses in a variety of application areas (industrial, medical, environmental, space, etc.) where the objective was to construct chemical images of volatile compounds, including odors, here we introduce another system able to produce chemical images of liquids of different origin, quality, and composition. In line with mammalian senses such as olfaction and taste, a strategy has recently been developed that employs sensors which are not specific to a single analyte but are broadly sensitive, with the only prerequisite that they differ from each other in their responses. The sensors are grouped in an array and supported by a suitable pattern recognition analysis in order to generate a global response of the overall array. For liquid phase analysis such a system is called an electronic tongue. This chapter describes in some detail the operating principle of the sensors forming the electronic tongue multisensor system, the array strategy, the most suitable data analysis adopted for the purpose, and a number of chemical images of different liquids, including potable and waste waters, soft drinks, alcoholic beverages, clinical samples, and some biological objects, taking into account the concentration distributions of the different components present in the tested samples. Key words: electrochemical sensors, electronic tongue, pattern recognition
1. Introduction

The application of sensor arrays to the analysis of liquid samples started in the mid-1980s, when Otto and Thomas (1985) applied the principle of simultaneous measurement with several plasticized polymeric membrane potentiometric sensors to the detection of Mg²⁺ ion concentration in multicomponent
solutions of composition similar to biological liquids (blood plasma and urine). Accurate detection of magnesium ion concentration in biological liquids is especially important owing to the absence of highly selective Mg²⁺ electrodes that would permit direct measurement of Mg²⁺ against the high background of calcium. After this pioneering work, many applications of multisensor arrays for liquid phase analysis were reported, both for qualitative discrimination and for quantitative determination of various components, and several comprehensive reviews have been published (Albert et al., 2000; Legin et al., 2003). Multisensor arrays composed of various types of electrochemical sensors and using different detection principles were devised. An array based on sensors with polymeric lipid membranes was reported by Toko and colleagues (Hayashi et al., 1990) from Kyushu University, Japan. Such a system was named a "taste sensor", since it showed an ability to distinguish the five basic tastes: sweet, bitter, salty, sour, and umami. Later, a "taste sensor" with global selectivity similar to the human perception principle (Toko, 1996) and a commercial version of the "taste sensor" (?) were reported. A multisensor array based on potentiometric chalcogenide-glass chemical sensors with membranes doped with various metals was reported by Vlasov and Legin (1998) from St. Petersburg University, Russia. As a result of joint work of the same authors with the Chemical Sensors Group of "Tor Vergata" University, Rome, Italy, the term "electronic tongue" was suggested for a multisensor system composed of an array of chemical sensors and an appropriate data processing tool (Legin et al., 2000). This term was chosen based on the similarity between the working principle of the electronic tongue and the human gustatory system. Later, the chalcogenide-glass electronic tongue was supplemented with various solvent polymeric membrane electrodes, whose suitability for electronic tongue applications was specially studied (Legin et al., 1999a, 2002). A potentiometric electronic tongue based on electrochemically polymerized porphyrin and metalloporphyrin films was reported in Paolesse et al. (2003). Other types of transducers, such as ISFETs and LAPS (light addressable potentiometric sensors), were recently reported for multisensor array applications (Yoshinobu et al., 2005). An artificial tongue composed of four sensors produced as ultra-thin Langmuir–Blodgett films of conducting polymers and their mixtures, and utilizing AC measurements (impedance spectroscopy), was suggested in Riul et al. (2003). Arrays of ISEs were also reported for qualitative discrimination of several liquid foodstuffs (Ciosek et al., 2006). The electronic tongue developed at Linköping University (Winquist and Lundström, 1997), consisting of an array of different inert metal electrodes and based on pulsed voltammetry, has been used for analysis of liquid samples. The application of potentiometric metallic sensors, so-called electrodes of the first kind, in electronic tongue systems was recently reported in Martínez-Máñez et al. (2005b), while a thick-film technology application for metallic
sensor array development was reported by Martínez-Máñez et al. (in press a). Hybrid electronic tongues based on various types of electrochemical sensors and various detection methods (Winquist et al., 2000; Arrieta et al., 2003), as well as combinations of electronic tongues and electronic noses for liquid phase applications, have also been reported (Winquist et al., 1999; Buratti et al., 2004). It was found that a fusion of several techniques may significantly improve the analysis results, while the comparison of the various methods used in electronic tongues may reveal their weak and strong sides, suggesting further improvements. An electronic tongue system based on an array of electrochemical sensors may not perceive the taste of analyzed samples as the human tongue does, but the "chemical images" of target samples, obtained as a result of chemical interaction between samples and sensors, may be correlated either with a taste or with other particular features. Such "chemical images" may be used for different analytical purposes, for discrimination and classification tasks as well as for quantitative analysis. That is why multisensor systems, and electronic tongues in particular, have attracted much interest in the last two decades. In this chapter we focus on new results and achievements in the area of electrochemical multisensor array applications for liquid phase analysis.
2. Liquid Phase Analytical Methods

Common methods of liquid analysis, especially for complex multicomponent samples, are the well-established and highly sensitive gas/liquid chromatography (GLC) coupled with specific detection methods such as mass spectrometry (MS) or visible (or ultraviolet) spectroscopic detection, atomic absorption and atomic emission spectroscopy (AAS and AES), colorimetry, and elemental analysis (Rouessac and Rouessac, 2002). Classical titrimetry and precipitation gravimetry are often used for water and wastewater analysis, while the application of immunoassays or other biochemical tests is common for the analysis of biological liquids and foodstuffs (Harvey, 2000). The application of electrochemical methods to the analysis of complex liquid media by means of electrochemical sensors (electrodes) is a large part of modern analytical chemistry (Wang, 2006). Electrochemistry deals with the study of chemical reactions that produce electrical effects (oxidation and reduction of species present in solution), and of the chemical phenomena that are caused by the action of currents and voltages. In contrast to the many analytical measurements that involve homogeneous bulk solutions, electrochemical processes take place at the electrode-solution interface. The operating principle of an electrochemical sensor is the transformation of its electrical signal into a concentration of analyte based on known theoretical principles of electrode processes. The two main groups of electrochemical methods are dynamic and static
methods (Bard and Faulkner, 2000). In potentiometry, which is a static method, the change of electrode potential due to the interactions of the electrode sensing membrane with charged analyte species in solution is registered in open circuit static mode (zero-current conditions). Dynamic electrochemical methods deal mainly with metallic electrodes and address the situation in which current flows and the analyte concentration changes as the result of an electron-transfer redox reaction at the electrode surface. A minimum of two electrodes is required for electrochemical measurements. Dipped in an electrolyte solution, these electrodes constitute an electrochemical cell. An electrode that responds to the target analyte is the indicator (or working) electrode. In general, the potential of the working electrode, whether in a static or a dynamic electrochemical technique, has a meaningful value only when measured against a well-defined reference point: an electrode with a stable potential, independent of the composition of the analyzed solution. This electrode is called a reference electrode. The first well-known reference electrode, which is rarely used for routine applications now but is still important because it is used to establish the standard-state oxidation-reduction potentials of many electroactive species, is the standard hydrogen electrode (SHE). The SHE is composed of a porous Pt plate electrode immersed in an HCl solution in which the hydrogen ion activity is 1.00 and in which H₂ gas is bubbled at a pressure of 1 atm (Figure 1). The standard potential of this electrode is defined as E⁰ = 0.00 V at all temperatures and is determined by the reaction H⁺(aq) + e⁻ → ½H₂(g). In practice, the application of the SHE is difficult due to the complexity of its preparation and handling. Other types of reference electrodes, which satisfy the main requirements for a reference electrode (a high inertness to the composition of the analyzed solution and a stable potential), are used nowadays: the saturated calomel (SCE) and silver/silver chloride (Ag/AgCl) electrodes (Figure 2). Both these
Figure 1. The SHE schematic diagram.
Figure 2. Schematics of the SCE (left) and Ag/AgCl (right) reference electrodes. An SCE constructed using an aqueous solution saturated with KCl has a potential of +244 mV at 25°C, while an Ag/AgCl electrode, often prepared with 3.5 M KCl, has a potential of +205 mV at 25°C.
electrodes are based on the redox couple between a metal (Hg or Ag) and its insoluble salt (calomel Hg₂Cl₂ or AgCl), and their potentials are determined by the concentration of the Cl⁻ ion used in their preparation. Hence such electrodes can be classified as potentiometric sensors of the second kind (see below).

3. Theoretical Basis Behind Electrochemical Sensor Response

3.1. DYNAMIC ELECTROCHEMICAL METHODS
3.1.1. Coulometry
Among the dynamic electrochemical methods, coulometry is the most attractive, since it does not require any calibration procedure. The method is based on the electrolysis of a solution, in which the analyte is quantitatively oxidized or reduced at the working electrode, or reacts quantitatively with a reagent generated at the working electrode. The total charge Q, in coulombs, passed during an electrolysis is related to the absolute amount of analyte by Faraday's law:

    Q = nFN                                  (1)

where n is the number of electrons transferred per mole of analyte, F is Faraday's constant (96,487 C mol⁻¹), and N represents the number of moles of analyte. A coulomb is equivalent to an ampere-second (A·s); thus,
68
L. LVOVA ET AL.
for a constant current i the charge is given as:

    Q = i·te                                 (2)

where te is the electrolysis time. If the current varies with time, as it does in controlled-potential coulometry, then the total charge is given by

    Q = ∫₀^te i(t) dt                        (3)
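As a minimal numerical sketch of eqs. (1)-(3), assuming a recorded current-versus-time trace (the trace below is synthetic and the variable names are illustrative):

    import numpy as np

    F = 96487.0  # Faraday's constant, C/mol

    def moles_from_charge(q_coulombs, n_electrons):
        """Faraday's law, eq. (1): N = Q / (n F)."""
        return q_coulombs / (n_electrons * F)

    # Controlled-potential coulometry: integrate a sampled current trace, eq. (3).
    t = np.linspace(0.0, 600.0, 601)        # time, s
    i = 2.0e-3 * np.exp(-t / 120.0)         # decaying electrolysis current, A
    Q = np.trapz(i, t)                      # total charge, C
    print(f"Q = {Q:.4f} C -> N = {moles_from_charge(Q, n_electrons=2):.3e} mol")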
3.1.2. Voltammetry
The voltammetric method is based on the determination of ion concentration in dilute solutions from the current flow as a function of voltage, when ion polarization (depletion of concentration caused by electrolysis) occurs around the electrode. In general, the working electrode potential is used to drive an electron-transfer reaction:

    O + ne⁻ ↔ R                              (4)

where O and R are the oxidized and reduced forms of the redox couple, and the resulting current is measured with the help of an auxiliary (counter) electrode. For a system in thermodynamic equilibrium, the potential of the electrode can be expressed by the Nernst law, which follows from the expressions for the Gibbs free energy (G) and the electrochemical potential change:

    E = E⁰ + (2.3RT/nF) · log([O]/[R])       (5)
where E⁰ is the standard potential of the redox reaction (4), R is the universal gas constant, T is the absolute temperature in kelvins, n is the number of electrons transferred in the reaction, F is the Faraday constant, and [O] and [R] are the concentrations of the electroactive species at the electrode surface. The current resulting from the change of oxidation state of the electroactive species is a faradaic current, since it obeys Faraday's laws (1) to (3). Nevertheless, the total current flowing at the electrode surface is the result of several processes: the sum of the faradaic currents for the analyte and the blank solution, as well as non-faradaic charging currents (Wang, 2006). Often several steps are involved in an electrochemical reaction; the simplest are mass transfer of the electroactive species to the electrode surface, electron transfer at the electrode surface, and the transfer of the resulting product back into the bulk solution. The resulting current of the reaction is limited by the slowest step. For mass transport-limited reactions (when the overall electrode reaction is controlled only by the rate at which electroactive species reach the surface), the flux J of species to the electrode is expressed by the Nernst–Planck equation. The overall current i is proportional to the flux of species (diffusion):

    i = nFAJ                                 (6)

and according to Fick's first law, the rate of diffusion is directly proportional to the concentration gradient:

    J(x, t) = −D ∂C(x, t)/∂x                 (7)

where t is time, x is the position of the species relative to the electrode surface, and D is the diffusion coefficient. By combining (6) and (7), a general equation for the current response limited by diffusion (the diffusion current) may be expressed as follows:

    i = −nFAD ∂C(x, t)/∂x                    (8)
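The following sketch evaluates eqs. (5) and (8) numerically; the numbers (electrode area, diffusion coefficient, concentrations) are illustrative assumptions, not values from the text.

    import numpy as np

    R, F = 8.314, 96487.0        # gas constant, J/(mol K); Faraday constant, C/mol

    def nernst_potential(e0, n, conc_ox, conc_red, temp_k=298.15):
        """Eq. (5): E = E0 + (2.3RT/nF) * log10([O]/[R])."""
        return e0 + (2.303 * R * temp_k / (n * F)) * np.log10(conc_ox / conc_red)

    def diffusion_current(n, area_cm2, d_cm2_s, grad_mol_cm4):
        """Eq. (8): i = -nFAD * dC/dx, with the gradient taken as constant
        across the diffusion layer; units chosen so the result is in amperes."""
        return -n * F * area_cm2 * d_cm2_s * grad_mol_cm4

    # A 1:10 [O]/[R] couple at 25 C:
    print(f"E = {nernst_potential(0.250, 1, 1e-4, 1e-3):.4f} V")
    # Concentration rising linearly from 0 at the electrode to 1e-6 mol/cm^3
    # at 10 um into the solution; the negative sign marks a cathodic current.
    grad = (1e-6 - 0.0) / 1e-3   # dC/dx in mol/cm^4
    print(f"i = {diffusion_current(1, 0.1, 1e-5, grad):.3e} A")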
The time and position dependence of the diffusion flux (and hence of the diffusion current) may then be derived using Fick's second law. The resulting voltammetric diagram (voltammogram) represents the current signal (Y axis) versus the variation of the applied potential (X axis). A voltammogram may be used either for qualitative analysis, by measuring half-wave potentials to detect electroactive species which undergo electrode discharge at characteristic potentials, or for quantitative determination of the concentration of species in the test solution by means of the diffusion current. In the latter case, calibration with standards and the preparation of a linear standard curve of diffusion current versus concentration are first required. There are several modifications of the voltammetric technique, depending on the form of the applied potential (pulse voltammetry, amperometry, which uses a constant applied potential, etc.) and on the measurement procedure (for instance, stripping voltammetry). Several types of biosensors use voltammetry as the detection method (Sanz Alaejos and Garcia Montelongo, 2004).

3.1.3. Conductimetry
In this method the conductance of a solution is measured using inert (third-kind metallic) electrodes, alternating current, and an electrical zero-circuit. In this way no net current flows through the electrochemical cell and no electrolysis occurs. The concentration of ions in the solution is estimated from the conductance value.
3.2. POTENTIOMETRY

As noted before, potentiometry is a static electrochemical method, in which the potential of an electrochemical cell is measured under static conditions (zero current). The electromotive force (EMF) of the electrochemical cell, E,
may be expressed as:

    E_cell = E_ind − E_ref + E_lj            (9)

where E_ind is the potential of the indicator electrode, E_ref is the potential of the reference electrode, and E_lj is the liquid junction potential, which develops at the interface between two ionic solutions that differ in composition and in which the mobilities of the ions differ. The liquid junction potential can be almost eliminated by using a salt bridge filled with an electrolyte solution of a salt whose ions have very similar mobilities (KCl, for instance). The potential of the indicator electrode is expressed by the Nernst law (5); thus, for the SHE discussed earlier:

    E = E⁰ + 0.05916 · log([H⁺]/[H₂]^½) = 0.05916 · log[H⁺]     (10)
There are two main classes of indicator electrodes used in potentiometry: metallic electrodes and ion-selective electrodes (ISEs).

3.2.1. Metallic Potentiometric Sensors
Potentiometric metallic electrodes can be divided into three groups. Metallic electrodes of the first kind are wires of an active metal immersed in a solution containing the ions of that metal; for instance, a copper wire dipped in a solution of copper chloride is a metallic electrode of the first kind. The potential of such an electrode is a function of the concentration of the metal ion in solution:

    E = 0.3419 + (0.05916/2) · log[Cu²⁺]     (11)

where E⁰ = 0.3419 V is the standard potential of the Cu²⁺(aq) + 2e⁻ ⇌ Cu(s) reaction determined versus the SHE. Metallic electrodes of the second kind involve an Mⁿ⁺/M redox couple and respond to the concentration of another species if that species is in equilibrium with Mⁿ⁺. The SCE and silver/silver chloride electrodes discussed earlier are potentiometric electrodes of the second kind. Their potentials are a function of the chloride concentration in solution:

    E_SCE = 0.268 − 0.05916 · log[Cl⁻]
    E_Ag/AgCl = 0.2223 − 0.05916 · log[Cl⁻]  (12)

Inert metallic electrodes (namely Pt, Au, etc.), which simply serve as a source of, or a sink for, electrons in other redox reactions, are the third kind of potentiometric metallic electrodes, or redox electrodes. These electrodes do not participate in the electrochemical reaction.
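A minimal sketch evaluating eqs. (11)-(12) at 25°C follows. Note that concentrations are used in place of activities, so the Ag/AgCl result differs slightly from the +205 mV quoted in the Figure 2 caption; the values and names are those of the text, the code itself is only illustrative.

    import math

    S = 0.05916  # Nernst slope at 25 C, V per decade

    def first_kind_potential(e0, n, metal_ion_conc):
        """Eq. (11): electrode of the first kind, e.g., Cu wire in Cu2+ solution."""
        return e0 + (S / n) * math.log10(metal_ion_conc)

    def second_kind_potential(e0, chloride_conc):
        """Eq. (12): electrode of the second kind (SCE, Ag/AgCl)."""
        return e0 - S * math.log10(chloride_conc)

    print(f"Cu electrode in 0.01 M Cu2+: {first_kind_potential(0.3419, 2, 0.01):.4f} V")
    print(f"Ag/AgCl in 3.5 M KCl:        {second_kind_potential(0.2223, 3.5):.4f} V")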
Figure 3. Schematic presentation of an ISE with a plasticized polymeric membrane.
3.2.2. ISEs
A large group of potentiometric sensors with enhanced selectivity to only one analyte in solution are the ISEs. ISEs function by using a membrane that reacts selectively with a single primary ion. ISEs differ in the material of the sensing membrane, which can be made of glass, crystal, chalcogenide glass, or polymer. A schematic presentation of an ISE with a plasticized polymeric membrane is given in Figure 3. The properties of an ISE membrane depend on its composition and on its permeability to the various ions. When the ISE is placed in a solution containing the particular ion to which the electrode is reversible, a small number of ions to which the membrane is selective pass from the solution of higher concentration through the membrane and give rise to a membrane potential. One side of the membrane is in contact with an internal solution containing a fixed concentration of analyte, while the other side of the membrane is in contact with the sample. Current is carried through the membrane by the movement of either the analyte or an ion already present in the membrane's matrix. The membrane potential is given by a Nernst-like equation:

    E_mem = E_asym − (RT/nF) · ln([A]_int/[A]_sam)     (13)

where [A]_sam and [A]_int are the concentrations of analyte in the sample and the internal solution, respectively, and n is the analyte's charge. Ideally, E_mem should be zero when the concentrations of analyte on both sides of the membrane are equal. The term E_asym, which is called the asymmetry potential, accounts for the fact that the membrane potential is usually not zero under these conditions. Membrane potentials result from a chemical interaction between the analyte and active sites on the membrane's surface. Because the
signal depends on a chemical process, most membranes are not selective toward a single analyte. Instead, the membrane potential is proportional to the concentration of all ions in the sample solution capable of interacting at the membrane's active sites. The Nikolskiy equation includes the contribution of an interferent:

    E = E⁰ + (0.05916/n_A) · log([A] + K_A,I · [I]^(n_A/n_I))     (14)

where n_A and n_I are the charges of the analyte and the interferent, and K_A,I is a selectivity coefficient accounting for the relative response to the interferent. The selectivity coefficient is defined as:

    K_A,I = [A]_E / [I]_E^(n_A/n_I)          (15)

where [A]_E and [I]_E are the concentrations of analyte and interferent yielding identical cell potentials. Selectivity is the main characteristic of ISEs. Selectivity is the ability of a sensor to respond only to the primary ion in the presence of other ions in the analyte; in other words, an ideal ISE will respond only to changes of the primary ion. The selectivity of an ISE is determined by the composition of the sensor membrane. The membrane may be made of glass (a composition of approximately 22% Na₂O, 6% CaO, and 72% SiO₂ for the pH-sensitive glass electrode), crystalline material (Ag₂S, Ag₂S/AgCl, LaF₃ doped with Eu), or polymer. The polymeric membrane of an ISE is based on a polymeric material (polyvinyl chloride, polyurethane, silicone rubber, etc.) dissolved in a water-immiscible solvent-plasticizer (esters of organic acids). This mixture is doped with membrane active components (MAC), the active ionic and/or neutral sites which interact specifically with the ISE primary ion. A great number of MACs have been studied over the last four decades, as the application of ISEs has been extremely important and useful in many areas (Bühlmann et al., 1998; Kruse-Jarres, 1988).
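A minimal sketch of the Nikolskiy equation (14), estimating the response shift of a monovalent-ion ISE caused by an interferent; all values are illustrative assumptions.

    import math

    def nikolskiy_potential(e0, n_a, n_i, conc_a, conc_i, k_ai, slope=0.05916):
        """Eq. (14): mixed-solution response of an ISE with one interferent."""
        effective = conc_a + k_ai * conc_i ** (n_a / n_i)
        return e0 + (slope / n_a) * math.log10(effective)

    # A hypothetical K+-selective electrode (n=1) with Na+ interference,
    # selectivity coefficient K_K,Na = 1e-4:
    e_pure = nikolskiy_potential(0.0, 1, 1, 1e-3, 0.0, 1e-4)
    e_mixed = nikolskiy_potential(0.0, 1, 1, 1e-3, 0.1, 1e-4)
    print(f"shift due to interferent: {(e_mixed - e_pure) * 1e3:.2f} mV")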
4. The Electronic Tongue Strategy

In the traditional approach, the ideal chemical sensor should be highly selective, meaning that the device is able to detect a single analyte in a complex mixture (Cattrall, 1997). In this case the response of the sensor can be directly correlated to the concentration of the target analyte, after a calibration procedure. The high selectivity of a chemical sensor can be achieved through two different strategies:
(a) the chemical approach: synthesis of a molecular receptor designed to bind only the target analyte in the complex mixture, either through a strong interaction (which, however, can be detrimental to the reversibility of the device) or through multiple weak interactions operated by groups geometrically arranged in an optimized way (Albert et al., 2000);
(b) the physical approach: selection of a transduction mechanism in which only the target analyte is able to induce the physical property variation that is then transformed into a readable signal by the chosen transducer (Albert et al., 2000).

Both of these approaches can be useful for the development of a highly selective sensor, although the progress of supramolecular chemistry concepts has made the first, chemical approach very popular. In this concept of the chemical sensor, all the chemical species present in the analyzed environment other than the target analyte are considered interfering species, which should be ignored as much as possible (ideally completely) by the sensor, because their interaction induces an error in the sensor response, reducing the reliability of the device. This kind of chemical sensor has been exploited in different fields of application where it is important to know the exact concentration of a chemical species, such as tracking a metabolite in clinical analyses. However, this high selectivity of chemical sensors is in several cases not possible, or even not useful, mainly for two reasons: (a) it is difficult to suppress the interaction with interfering species, and the design and preparation of a selective molecular receptor is in some cases a Tantalus effort; (b) the chemical environment to be analyzed is often particularly complex, with hundreds or thousands of different compounds, each of them contributing to the characterization of the analyzed matrix. In this case it is not possible to single out one compound able to characterize the whole chemical environment. A classical example is a food aroma: the complex chemical image cannot be described with a single pixel of the whole picture, because that induces a significant loss of the necessary information (Albert et al., 2000). Because of these difficulties a different approach has been proposed, taking inspiration from the working mechanisms of the "chemical" mammalian senses, and in particular from olfaction. In these senses Nature does not use selective receptors, as is the case for enzymes; receptors are not selective toward single chemical species, but can interact with a large family of chemicals, although with different intensities, and it is the brain that is in charge of extracting the information derived from the senses, classifying and discriminating among the different odors or tastes. When applied to the sensor field, these concepts led
to a different idea of the chemical sensor, in which high selectivity of the device is no longer necessary; instead, the sensor should be able to interact with almost all the analytes present in the environment, although with different sensitivities. In this case interfering species contribute to the sensor responses, giving important chemical information, and are no longer considered in a simply negative way. The sensor response loses the direct relationship with the concentration of a single chemical species because of the action of interfering molecules. As a result, the sensor cannot be used stand-alone, because of this response ambiguity, but should be integrated in a sensor array. Each sensor contributes to the multidimensional response of the array, which can be considered the chemical image of the analyzed sample. Data analysis techniques are in charge of extracting and decoding the chemical information contained there, like the definition of a single color in a multicolored picture. It is worth mentioning that in this approach the rational design of the molecular receptor is not abandoned, because it is necessary that the sensor responses in the array not be correlated with each other, and the selectivity toward a molecule, or class of molecules, cannot be completely lost. The sensors in the array should be cross-reactive, or cross-selective, but they should differ from one another in order to preserve the content of information present in the chemical image of the analyzed environment (Lavigne and Anslyn, 2001). This approach is now quite popular for the characterization of gaseous environments: several examples of electronic noses have been reported in the literature and are now available on the market (Pearce et al., 2002), while, quite surprisingly, similar devices operating in the liquid phase are far less studied. The attempt to mimic the natural taste sense has been the starting point of "electronic tongue" applications. This concept was later widened to the development of devices able to give multicolored chemical images of different liquid environments, not just discrimination of the basic tastes. Thus, electronic tongue devices have been exploited in different fields of application, as detailed in the next sections of this review.
5. Experimental Routine

Since the target of electronic tongue applications is to provide an alternative to common analytical methods (which are often rather complex), the strategy of the experimental approach is to reach the simplest possible experimental routine. As a consequence, no probe pretreatment such as dilution, pre-concentration, or addition of special co-reagents is normally used. In some specific cases a minimal pretreatment, for instance sample buffering, is needed, but in the majority of cases the sample is analyzed as it is. A simple rinsing of the sensors' working surfaces with distilled or tap water is all that is needed for e-tongue sensor cleaning. Sometimes a short exposure
(about 3 to 5 min) of the sensors to a conditioning solution, whose composition is determined by the application task or by the particular analyzed sample, is required in order to ensure a stable sensor response before signal acquisition. In comparison to the other types of sensors (glass or solvent polymeric membrane sensors), metallic sensors suffer from surface contamination due to possible chemisorption or physisorption, oxidation, or complex formation, which causes a serious sensor response drift and may cause serious problems for multisensor system functionality. Hence special attention must be paid to diminishing the drift of metallic sensors. One possibility is the application of a mathematical transform for drift adjustment, while another is to apply either electrochemical cleaning (Holmin et al., 2004) or mechanical buffing, which can be performed automatically (Olsson et al., 2005). Depending on the application, and mainly on the properties of the analyzed liquid sample, an initial procedure for evaluating the sensor array composition is required. There is no well-defined method for this, but one of the most often used strategies is to evaluate the electrochemical response of individual sensors in individual solutions of the various compounds known to be components of the target sample. Those sensors which show the highest response, as well as better reproducibility, toward one or several analytes are then incorporated in the electronic tongue array. Based on the same strategy, an approach providing a set of three empirical "cross-sensitivity" parameters, the average slope S, the formal sensitivity (signal-to-noise ratio) factor K, and the non-selectivity factor F, was reported in Vlasov et al. (1997). The higher all three cross-sensitivity parameters, the more favorable the application of the sensor in an electronic tongue array. It is important to note here that the utility of ISEs in multisensor array applications is evident (Nam et al., 2005), but restricted by the existence of highly selective sensors for many analytes of interest. The reproducibility of the sensors in the array should be checked from time to time in specific calibration solutions to ensure a stable sensor response with minimal drift. A multisensor array may be applied either for qualitative identification (imaging) of analyzed samples, or for quantitative determination of several compounds in the target probe. In the latter case a procedure of multicomponent calibration of the sensor array in multicomponent solutions mimicking the composition of the target sample is demanded (Martens and Naes, 1989). During the calibration process the correlation between the matrix of sensor array responses and the known content of the calibration solution components is evaluated. The content of all analytes of interest is varied in the calibration solutions over the concentration range in which they appear in the target probe. Obviously, multivariate calibration is the most time-consuming step of a multisensor system application; nevertheless, once calibrated with a set of calibration solutions of known composition, such a system can be used repeatedly for determination of these components. Moreover, several automated modifications of electronic tongue systems (including the sensor calibration step) have been reported (Grl et al., 2005; Duran et al., 2005).
Miniaturized electronic tongue systems are required for applications where only a small amount of sample is available for measurement (Lvova et al., 2002). Various substrates may be utilized for the fabrication of microelectrodes for e-tongue systems: silicon, alumina, or plastic materials. Working electrodes are fabricated on the substrate surface by multi-step procedures such as silicon etching, screen-printing, or thick-film technologies. After fabrication, the working microelectrodes (mainly metallic, metal-oxide, or carbon black) may either be used as independent sensors in an e-tongue system (Martínez-Máñez et al., 2005a) or be covered with a specific sensing material (membrane), mainly a polymeric one. Electrochemical polymerization of a monomer from aqueous or organic solution (where the material is amenable to electropolymerization) (Verrelli et al., 2005), photocurable polymerization (Moreno et al., 2005), and solvent casting of a polymeric membrane cocktail dissolved in a volatile solvent (Yoon et al., 2000) are the most often used methods of membrane deposition on the working electrode surface. The possibility of functioning in FIA (flow injection analysis) mode (Mortensen et al., 2000; Gallaro et al., 2004) is a favorable feature of multisensor systems, especially for on-line process monitoring. Moreover, together with the sensor equilibrium potential and the FIA peak height, the shape of the FIA "potential over time" curve can be used as an additional parameter for multivariate calibration of multiple-sensor systems (Legin et al., 2005a).
6. Data Analysis

The electronic tongue produces as measurement results multidimensional data that can be analyzed with multivariate data analysis techniques. In the following examples of electronic tongue applications, techniques such as principal component analysis, SIMCA, and partial least squares are utilized. In this section a brief introduction to these data analysis methods is given.

6.1. PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is an unsupervised technique used to describe a dataset, often in a two- or three-dimensional graphical representation carrying most of the data variance (Massart et al., 1988). The axes of the new representation space are given by linear combinations of the original axes and are required to be uncorrelated and orthogonal. The first axis, called the first "principal component", is chosen in the direction of the largest variance; the second axis, called the second "principal component", is taken along the direction orthogonal to the first PC with the largest variance, and so on. This technique allows one to visualize multidimensional datasets for preliminary data exploration and for studying the intrinsic capability of the system to
discriminate the data into clusters (Duda et al., 2001). PCA is one of a wide set of linear transformations that may be described as:

    S = W × X                                (16)
where X is the original data set, W is the transformation matrix, and S are the data in the representation space. The peculiarity of principal component analysis is the faithful representation of the data set in a sub-space of reduced dimensionality. The term faithful here means preserving the statistical properties of the original data set. Chemical sensors always exhibit a certain degree of correlation among them, so their data set may be represented in a sub-space of reduced dimensionality. In order to preserve the statistical properties of a data set it is necessary to make assumptions about the statistical distribution of the data. In PCA a Gaussian distribution of the data is assumed. Indeed, PCA is calculated considering only the second moment of the probability distribution of the data (the covariance matrix). Only for normally distributed data does the covariance matrix (XᵀX) describe the data completely once they are zero-centered. Geometrically, the covariance matrix describes a hyper-ellipsoid in the N-dimensional space, and PCA rotates the coordinates to represent this hyper-ellipsoid in canonical form. The calculation procedure of PCA is the following: consider a matrix X of data and let C = XᵀX be the covariance matrix of X. The ith principal component of X is Xλ(i), where λ(i) is the ith normalized eigenvector of C corresponding to the ith largest eigenvalue. Given a matrix of data, PCA provides three kinds of information: scores, loadings, and the eigenvalues of the covariance matrix. The scores are concerned with the measurements; they are defined as the coordinates of each measurement vector (a row of matrix X) in the principal component basis. The loadings are concerned with the sensors: they measure the contribution of each sensor to the array. A high loading for a sensor means that the principal component is aligned along the sensor direction. The eigenvalues are directly proportional to the variance explained by their corresponding eigenvectors, so by looking at the relative eigenvalues λ(i) it is possible to confine the representation to those components carrying most of the information.
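A minimal sketch of this procedure in Python, following the zero-centering, covariance, and eigendecomposition steps described above; the sensor-response matrix below is synthetic.

    import numpy as np

    def pca(x, n_components=2):
        """PCA via eigendecomposition of the covariance matrix.
        Rows of x are measurements, columns are sensors."""
        x = x - x.mean(axis=0)                    # zero-center each sensor
        cov = x.T @ x / (x.shape[0] - 1)          # covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues, ascending
        order = np.argsort(eigvals)[::-1]         # sort descending
        loadings = eigvecs[:, order[:n_components]]
        scores = x @ loadings                     # coordinates in the PC basis
        explained = eigvals[order] / eigvals.sum()
        return scores, loadings, explained

    # Synthetic array of 8 correlated "sensors" over 50 measurements.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(50, 2))
    x = latent @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(50, 8))
    scores, loadings, explained = pca(x)
    print("variance explained by PC1, PC2:", np.round(explained[:2], 3))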
6.2. SIMCA

The soft independent modeling of class analogy technique, simply called SIMCA, is a supervised technique that builds one PCA model for each of the classes under study, in order to investigate whether or not a new sample belongs to one of the classes. The class identification is realized through the calculation of the distance between the sample under study and each of the built PCA class models. This technique is based on the assumption
that the data distribution is Gaussian and that the estimation errors of the mean and variance of the distribution go to zero as the number of samples in a cluster increases.

6.3. PARTIAL LEAST SQUARES (PLS)
Partial least squares (PLS) regression is a data analysis technique that combines characteristics of principal component analysis and multiple regression. It is most useful when there is a need to predict a set of dependent variables from a large set of independent variables (predictors). Consider M observations described by K dependent variables, stored in an M×K matrix denoted Y, and let the values of J predictors collected on these M observations be stored in the M×J matrix X. The goal of PLS regression is to predict Y from X and to describe their common structure. When Y is a vector and X is of full rank, this goal can be accomplished using ordinary multiple regression (Geladi and Kowalski, 1986). When the number of predictors is large compared to the number of observations, X can be singular and the regression approach is not feasible (because of multi-collinearity). Several approaches have been developed to resolve this problem. One approach is to eliminate some predictors (trying to avoid the multi-collinearity). Another approach, called principal component regression, is to perform a principal component analysis (PCA) of the X matrix and then use the PCs of X as regressors on Y. The orthogonality of the principal components eliminates the multi-collinearity problem, but the problem of choosing an optimum subset of predictors remains; for example, the first PCs of X may not be relevant for finding a correlation with the variables Y. Instead, PLS regression finds components of X that are relevant for Y. In particular, PLS regression searches for a set of components, called latent vectors, that achieves a simultaneous decomposition of X and Y with the constraint that these components explain as much as possible of the covariance between X and Y. The PLS algorithm then searches for the subspace of X that shows the maximum correlation with Y. The prediction error of the model depends strongly on the number of variables (features) and on the number of measurements. In the case of a small dataset, a cross-validation technique is applied in order to have a realistic error estimate. The latent variables that minimize the prediction error in the validation phase are then used to build a model for the subsequent test phase (Wold, 1966).
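As a hedged sketch of the calibration workflow described above (X holding sensor array responses, Y known analyte concentrations), using scikit-learn's PLSRegression with cross-validation on synthetic data:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    # Synthetic calibration set: 40 samples, 12 sensor readings, 2 analytes.
    rng = np.random.default_rng(1)
    concentrations = rng.uniform(0.1, 1.0, size=(40, 2))      # "known" Y
    responses = concentrations @ rng.normal(size=(2, 12))     # sensor matrix X
    responses += 0.02 * rng.normal(size=responses.shape)      # measurement noise

    pls = PLSRegression(n_components=2)
    y_cv = cross_val_predict(pls, responses, concentrations, cv=5)
    corr = [np.corrcoef(concentrations[:, k], y_cv[:, k])[0, 1] for k in range(2)]
    print("cross-validated correlation per analyte:", np.round(corr, 3))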
7. Electronic Tongue Electronics

The data acquisition device for measurements with an array of potentiometric electrodes was designed at the Department of Electronic Engineering of the "Tor Vergata" University, Rome, Italy. The measured quantities are expressed according to the classical relationship:

    V_1,...,8 = V_WE1,...,WE8 − V_REF        (17)
where the voltage of each working electrode is referred to the fixed potential of a reference electrode. The work was aimed at an overall methodological optimization, and the result is a system with the following main hardware characteristics:

- multi-channel input: eight working electrodes (WE), one reference electrode (REF);
- high input impedance;
- low noise;
- low drift;
- simple connection: RS232 to PC;
- small dimensions;
- low weight.

The device is also equipped with control software, whose simple interface provides the user with many functionalities:
- a simple parameter editor for measurement set-up;
- stacked and overlaid plots for the sensor array;
- graphical utilities;
- text format log files, easily importable into data processing tools.
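As an illustration of the acquisition scheme in eq. (17), here is a minimal sketch of the per-channel differential readout; the read_* functions are hypothetical stand-ins for the multiplexed A/D reads and are not part of the device's documented interface.

    N_CHANNELS = 8

    def read_working_electrode(ch):
        # Placeholder: in the real device this would address the multiplexer
        # and trigger an A/D conversion for working electrode `ch`.
        return 0.100 + 0.001 * ch   # volts (dummy values)

    def read_reference():
        # Placeholder for the reference-electrode reading.
        return 0.205                # volts (dummy value)

    def acquire_frame():
        """One frame of eq. (17): V_i = V_WEi - V_REF for i = 1..8."""
        v_ref = read_reference()
        return [read_working_electrode(ch) - v_ref for ch in range(N_CHANNELS)]

    print(acquire_frame())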
The hardware interface is suited to an eight-element potentiometric sensor array, as shown in Figure 4. It includes:

- an analog section, which provides adequate buffering for the electrodes and multiplexing to the A/D converter;
- a digital section, based on a RISC microcontroller, which manages the correct measurement flow: multiplexer port addressing, A/D control, data storage, and communication with the control software.

The device is also provided with an internal power supply module (not shown in the figure). Each input channel ensures an input impedance >10¹⁵ Ω/0.2 pF, with an input bias current as low as I_B = 3 fA. The core of the electronic tongue is an ad-hoc programmed 8-bit RISC microcontroller (AT90LS8535), which is responsible for managing the measurement flow:

- control of the multiplexer's line switching;
- control of the A/D's functionalities;
80
L. LVOVA ET AL.
Figure 4. Simplified schematic of the Electronic Tongue electronics.
A/D conversion is performed using a 12-bit (plus polarity) converter, whose functionalities are managed by the controller. An external low-noise precision voltage reference with an extremely low temperature coefficient (0.5 ppm/°C) is used.

8. "Chemical Images of Liquids"—Selected Applications

A brief review of selected applications of electrochemical multisensor arrays is presented in this section, and several examples of Electronic tongue applications are discussed in detail. The ability to distinguish the compounds responsible for the basic tastes, and to evaluate the "taste" of a sample by detecting such compounds in it, was examined in several early works of Toko and coworkers (Iiyama et al., 1995, 2003) and by other authors (Habara et al., 2004; Riul et al., 2002). Besides the many works devoted to the qualitative discrimination of waters, this target still attracts many researchers, since the monitoring of water quality is an important task for food safety and environmental control (Di Natale et al., 1997; Rudnitskaya et al., 2001). Common methods permit a separate determination of several standard water parameters such as pH, COD (chemical oxygen demand), BOD (biological
oxygen demand), water hardness expressed as Ca²⁺ and Mg²⁺ content, conductivity, turbidity, the content of several inorganic anions, etc. Nevertheless, some of these methods are rather complicated (for instance, COD and BOD determination requires sample pre-exposition and harmful dichromate oxidation). Often an integral overview is preferable for the classification of water samples, especially in ecological monitoring tasks for alarm-like purposes (Krantz-Rülcker et al., 2001). The application of multisensor systems can provide this type of information. The "images" of water samples provided by multisensor systems may be classified according to membership in a defined class (such as pure standard water or polluted water). Moreover, several quantitative parameters can be evaluated. An interesting application of a potentiometric sensor array, based on thick-film technology and composed of RuO₂, C, Ag, Ni, Cu, Au, Pt and Al micro-electrodes printed onto an alumina substrate, to the analysis of six Spanish natural mineral waters was reported in Martínez-Máñez et al. (2005c). Based on the possibility of extracting chemical information from spontaneous polarization processes on the electrode surface, the authors could perform not only an identification of the mineral waters, but also a quantitative analysis of the HCO₃⁻, SO₄²⁻, Ca²⁺ and Mg²⁺ ion content. Calibration PLS models were built on the basis of the known content of the mentioned ions and the potential readings of a 12-element sensor array. The correlation coefficients obtained were 0.954, 0.913, 0.980 and 0.902 for bicarbonate, sulphate, calcium and magnesium ions, respectively. A quantitative determination of copper, zinc, lead and cadmium activity at low ppb and ppt levels in a mixed artificial solution mimicking marine water (salinity about 30‰) was reported in Legin et al. (2004b). The average relative error in prediction was 20 to 30%, an encouraging result at such ultra-low activities. Wine production is one of the biggest branches of the modern food industry, and control of wine production, aging and storage processes, as well as of taste properties, is in demand. Many applications of Electronic tongue systems have been reported for the recognition of red wines from different vineyards (Legin et al., 1999b, 2001). The results of the wine analysis were correlated with taste panel data and standard chemical analysis data. At the same time, only a few applications have been reported for white wine analysis. Even though white wine has a shorter lifetime (3 to 5 years, in comparison to 5 to 20 years for red wines), it occupies a considerable part of the wine market. In Lvova et al. (2004) a potentiometric electronic tongue with PVC plasticized membrane sensors doped with several porphyrins (H₂TPP, Co(II, III) and Pt(II, IV) porphyrinates) deposited on glassy carbon working electrodes was evaluated for the analysis of Italian 'Verdicchio' dry white wines. The porphyrin-based "electronic tongue" system discriminated between artificial and real wines and simultaneously distinguished wine samples from different cantinas and production years, Figure 5. Older wines have negative scores along the PC3 component, while a clear discrimination between real and artificial wines is directed along PC1.
Figure 5. ‘Verdicchio’ wines identification.
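Most of the qualitative results in this section, including Figure 5, are PCA score plots of the raw array potentials. A minimal sketch of this "imaging" step, with synthetic readings standing in for real sensor data, might look as follows:

```python
# Hedged illustration of PCA "imaging": array readings are projected onto
# the first two principal components and inspected as a score plot.
# All data below are synthetic stand-ins for an 8-sensor array.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
pure    = rng.normal(scale=0.05, size=(20, 8)) + rng.normal(size=8)
diluted = 0.4 * pure[:15] + rng.normal(scale=0.05, size=(15, 8))

X = np.vstack([pure, diluted])
pca = PCA(n_components=2)
scores = pca.fit_transform(X - X.mean(axis=0))
print(pca.explained_variance_ratio_)   # the PC1/PC2 percentages on the axes
```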
Several attempts at quantitative ethanol determination with artificial taste systems have been undertaken. Thus, in Arikawa et al. (1995), along with the discrimination of four kinds of sake, the concentration of ethanol in the analytes was determined by means of a taste sensor. An Electronic tongue array comprising eight potentiometric solid-state sensors with polymeric plasticized membranes based on metallo-porphyrins was applied to the "chemical imaging" of alcoholic beverages made from two different source materials, grape and barley, and to the quantitative determination of the alcoholic degree of these beverages (Lvova et al., in press b). Prior to the analysis of real samples, a multivariate calibration of the Electronic tongue was performed using 36 calibration solutions containing four alcohols in various concentrations. The model was trained with PLS, and a good correlation between real and predicted ethanol content (R = 0.952) was obtained. The alcoholic strength of real alcoholic beverages was evaluated according to the resulting one-dimensional scale of 'alcoholic degree' in ethanol concentration, Figure 6. Except for the outlying samples of beer "Ceres" and Ballantine's whiskey, the alcohol content predicted by the electronic tongue correlated well with that declared by the manufacturers. The ability of the porphyrin-based electronic tongue system to distinguish alcoholic beverages made from different source materials, grapes and barley, was also shown, Figure 7.
Figure 6. Evaluation of the alcoholic degree of beverages using the porphyrin-based electronic tongue.
Figure 7. Imaging of grape- and barley-based alcoholic drinks by means of the porphyrin-based electronic tongue system (PCA score plot; PC1: 69%, PC2: 21%). All the analyzed beverages were available in local stores in Rome, Italy. Grape-based drinks were as follows: two white dry wines ("Velletri superiore" by Le Contrade, Italy, 11.5 vol%, 2001; "Marino" by Gotto D'Oro, Italy, 11 vol%, 2002); two red dry wines ("Castelli Romani" by Le Contrade, Italy, 11 vol%, 2002; "Castelli Romani" by Gotto D'Oro, Italy, 10.5 ± 1.5 vol%, 2003); and grappa "Colli del Frascati" by "IlahCoral" s.r.l., Italy, 38 vol%, 2003. All the wines were of DOC quality, i.e., of denominated controlled origin. Beverages made from barley were: two lager beers ("Moretti", 4.6 vol% and "Peroni", 4.7 vol%, both from Italian manufacturers); red beer "Ceres", 6.5 vol%, Denmark; strong amber beer "Leffe", 8.2 vol%, Belgium; and Ballantine's scotch whiskey by G. Ballantine and Son Ltd., Scotland, 40 vol%, 2003.
Figure 8. A taste map (PCA score plot; PC1: 32%, PC2: 30%) of several beers available in Korea, obtained from the all-solid-state electronic tongue chips; the samples group into "light taste", "mild taste" and "rich taste" regions.
Two areas, corresponding to the alcoholic drinks made from grape and from barley (areas I and II along the PC1 axis in Figure 7, respectively), can be distinguished. The classification of spirits such as ethanol, vodka, eau-de-vie and cognac by means of an electronic tongue system based on an array of potentiometric sensors was performed in Legin et al. (2005b). The system distinguished different samples of cognac from each other and from any eau-de-vie, distinguished synthetic and alimentary ethanol and various sorts of spirit differing in ethanol quality, and detected the presence of contaminants in vodka. Samples of different brands of beer produced in different countries were analyzed with a lipid taste sensor (Toko, 2000). In Lvova et al. (2002) several beers were distinguished from each other by means of an all-solid-state potentiometric Electronic tongue; a correlation between beer taste characteristics and the multisensor system output was found, and a "beer taste map" was derived, Figure 8. An impressive example of the chemical imaging of port wines according to wine age was demonstrated in Rudnitskaya et al. (2005). An electronic tongue comprising 28 potentiometric sensors was applied to the analysis of two sets of port wines (22 and 170 samples, respectively): wines aged in oak casks for 10, 20, 30 and 40 years, and Vintage, LV and Harvest wines (second set) from 2 to 70 years old. The resulting PLS precision of port wine age prediction with the Electronic tongue system was 8%, i.e., one year. Several quantitative parameters
such as pH (1%), total (8%), volatile (21%) and fixed (10%) acidity, as well as the content of organic acids (tartaric, 10%; malic, 15%), sulphate (9%) and sulphur dioxide (24%), were evaluated in the port wines by means of a preliminary calibration of the Electronic tongue system and further prediction of unknown samples with adequate accuracy. Another application closely related to wine analysis is the identification of vinegars. Vinegar production is related to the wine industry (wine is utilized for vinegar acidification), and the identification of vinegars is important in the effort to achieve an adequate production quality, to ensure uniformity within a brand, and to avoid falsifications (dilutions). An electronic tongue system based on an array of six metallic potentiometric sensors (Cu, Sn, Fe, Al, brass and stainless steel wires) was developed and utilized for the discrimination of foodstuffs: several types of vinegar and fruit juices. The result of the "imaging" of vinegars prepared from white and red wines, balsamic vinegar and diluted vinegar samples is given in Figure 9.

Figure 9. PCA classification of diluted and pure vinegars.

According to the first two principal components (PC1, 86% and PC2, 13%), two main areas can be indicated on the plot: one of them includes all the pure vinegar samples, while the other embraces tap water and the diluted vinegar samples. Simultaneously with the qualitative discrimination, the metallic electronic tongue system was able to follow the direction of quantitative dilution of the vinegars. Fruit juices, dairy products (Winquist et al., 2005), various drinks (black and green teas, coffee) and soft drinks are popular objects of electronic tongue
applications. A multicomponent analysis of Korean green tea samples was performed in Lvova et al. (2003) by an all-solid-state 'electronic tongue' microsystem comprising polymeric sensors of different types, based both on PVC and on aromatic polyurethane (ArPU) matrices doped with various membrane-active components, electrochemically deposited conductive films of polypyrrole (PPy) and polyaniline (PAn), and potentiometric glucose biosensors. The system successfully discriminated different kinds of teas (black and green) and natural coffee, Figure 10.

Figure 10. Imaging of Lavazza coffee, Lipton black tea and several sorts of Korean green tea.

The output of the electronic tongue in green teas correlated well with the manufacturer's specifications for (−)-EGC concentration, total catechin and sugar content, and L-arginine concentration, the latter determined with an enzyme destruction method; the concentrations of the mentioned components were then determined in green tea samples with unknown manufacturer specifications. The clinical analysis of biological liquids is another area of application of electronic tongue systems. In Legin et al. (1999c) a multisensor system comprising an array of 30 sensors with solvent polymeric membranes doped with various MACs, with a back-propagation artificial neural network as the data processing tool, was applied to the multicomponent analysis of solutions close in composition to biological liquids (blood plasma). It was found that such an approach allows the determination of Mg²⁺, Ca²⁺, pH, HCO₃⁻ and HPO₄²⁻ in typical ranges with an average precision of 24%, which suggests the method as a promising one for clinical analysis.
Figure 11. Human urine classified by potentiometric electronic tongue based on metallic sensors.
Six potentiometric metallic sensors — Co (99.9% pure), brass (an alloy of Cu with 20 wt% Zn), two silver alloys (Ag42-Cu17-Zn16-Cd25 and Ag60-Cu26-Sn14, numbers in wt%), the two-component alloy Sn40-Pb60, and Cu-P (copper doped with 5 wt% phosphorus) — together with two PVC-based electrodes doped with 3 wt% monensin and 5 wt% TpClPBK, were utilized for the clinical analysis of human urine. A PCA discrimination score plot of 36 solutions mimicking human urine composition (points 1–131) and 14 real human urine samples (points 132–173) is shown in Figure 11. Simultaneously with the discrimination between artificial and real urine samples, groups corresponding to healthy persons and to persons affected by pathologies emerged at the PCA classification step. It is interesting to notice that samples 91 and 92 were provided by a person following a strict vegetarian diet. Moreover, a blind sample (points 103–105) was classified as a non-urine sample (according to the legend, the sample was in fact a standard black tea preparation). In closing, several examples of less typical applications of electronic tongue systems should be reviewed. First of all, several impressive examples of the imaging of bacteria, mold and yeast species (Söderström et al., 2003, 2005), as well as of the observation of a fermentation process with Aspergillus niger (Legin et al., 2004a) and of mold growth (Söderström et al., 2003), were recently reported. A potentiometric metallic sensor array composed of six metallic sensors of the first kind (Fe, Al, Cu, Sn, brass and stainless steel) was applied to the discrimination of water-soluble humic substance (HS) preparations, organic fertilizers and aqueous extracts of various soil types. Such a classification may be a first step in the qualitative agricultural analysis of soils according to their actual fertility, and of fertilizers according to their effectiveness, since fertility is strictly connected to the amount of water-soluble humic substances. The classification was performed on the basis of the HS complexation affinity for transition metals. A PCA score plot for the discrimination of five organic fertilizers, a humic acid preparation and two soils (Haplic Chernozem and Albic Luvisol) is shown in Figure 12.

Figure 12. Identification of organic fertilizers and soil aqueous extracts with the metallic sensor array (PCA score plot; PC1: 59%, PC2: 36%).

The aqueous extracts of both soil samples are well separated from the group of all organic fertilizers along the PC1 axis, which may be attributed to the lower content and different nature of the HS in the soil extracts. The HC and AL soils are also well distinguished along PC2, which may be correlated with the pH of the samples as well as with the HS lability, since HC contains a smaller amount of labile fulvic acids. An interesting application to the imaging of vegetable oils was reported by Apetrei et al. (2005). Vegetable oils were used as the electroactive binder material of carbon paste electrodes. The electrochemical response of such electrodes immersed in a variety of electrolytic aqueous solutions was exploited to discriminate the oils. The features observed in the voltammograms reflect the redox properties of the electroactive compounds (mainly antioxidants) present in the oils inside the carbon paste matrix. The different polyphenol content allows olive oils to be easily discriminated from sunflower or corn oil. These parameters, together with the influence of pH on the voltammograms,
were used as the input variables for PCA. The results indicated that the suggested method allows a clear discrimination among oils of different vegetal origins, and also allows olive oils of different quality (extra virgin, virgin, lampante and refined olive oil) to be discriminated. An electronic tongue multisensor system (ET) composed of 28 potentiometric chemical sensors, with both chalcogenide glass and plasticized polymeric membranes, was used for the discrimination of standard and Mancha Amarela (yellow spot) cork (Rudnitskaya et al., 2006). Extracts of cork in 10% ethanol–water solutions were analyzed. Two sets of cork samples, including both standard (S) and Mancha Amarela (MA) cork from two different factories, were studied. It was found that the ET could reliably distinguish extracts made from S and MA cork regardless of the samples' origin. The ET could predict the total phenol content with an average precision of 9% when calibrated using reference data obtained by the Folin–Ciocalteu method. The composition of S and MA cork was also studied by the ET. The largest difference in concentration between the two types of cork extracts was found for two acids: gallic and protocatechuic.
9. Non-Electrochemical Electronic Tongues

Although the majority of the electronic tongues reported in the literature are based on electrochemical transduction principles, in the last few years some examples of electronic tongues based on different sensing mechanisms have appeared. In this scenario, optical transduction has been exploited by the group working at the University of Texas at Austin (Lavigne et al., 1998). The electronic tongue developed by this group was based on polyethylene glycol–polystyrene resin beads functionalized with different dyes. These beads are positioned within silicon microcavities in order to mimic the natural taste buds, Figure 13.

Figure 13. Schematic representation of the optical Electronic tongue based on polyethylene glycol–polystyrene resin beads functionalized with different dyes.

Variations of both the colorimetric and the fluorescence properties induced by the interaction of the immobilized dyes with target analytes can be exploited for the sensing mechanism. These changes are recorded by a CCD camera to obtain
the chemical images of the analyzed environments. The analysis of the RGB patterns allows the identification, and potentially the quantification, of the analytes present in the analyzed sample. More recently the electronic tongue has been complemented with a micromachined fluidic structure for the introduction of the liquid samples (Sohn et al., 2005). As an extension of his work on the "smell seeing" approach, Suslick has recently reported a colorimetric array for the detection of organic compounds in water (Zhang and Suslick, 2005). The array was prepared by printing metalloporphyrin, solvatochromic and pH indicator dye spots on a hydrophobic surface. The array was saturated with a reference solution and imaged by an ordinary flatbed scanner to obtain the starting image. The interaction with target analytes induces color changes recorded by the scanner; subtraction of the original reference image provides the color change profile, which is the fingerprint of each analyte. In this configuration the array is not influenced by salts or hydrophilic compounds, and for this reason the authors noted that the array is not really an electronic tongue. Besides optical arrays, nanogravimetric sensors have also been exploited for the development of electronic tongues. The first report was by Hauptmann and co-workers, with the use of quartz crystal microbalances (QCM) in liquids (Hauptmann et al., 2000). QCMs are more difficult to use in the liquid phase, because the high damping of the oscillator requires more elaborate electronics and care in the measurements, although the advantages of QCMs in terms of miniaturization and integration with a wide range of different sensing materials are preserved in the liquid phase as well. While the use of QCMs has not been developed further, more recently Gardner and co-workers have reported the use of a dual shear-horizontal surface acoustic wave (SH-SAW) device for the realization of a miniaturized electronic tongue (Sehra et al., 2004). The main advantage claimed by the authors lies in the fact that this device is based on physical, rather than chemical or electrochemical, transduction principles, and for this reason it is more robust and durable. The sensing mechanism is based on the measurement of both mechanical (physico-acoustic) properties and electrical (electro-acoustic) parameters of the analyzed liquid. The selectivity of the system can, however, be improved, if necessary, by the deposition of a selective membrane onto the sensing area of the device. The system has been tested on the classification of model analytes representing the basic human tastes, and exploited also for the discrimination of real samples, such as cow milk.
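A minimal sketch of the difference-map fingerprinting described above might look as follows; the arrays stand in for the reference and analyte scanner images, and the spot coordinates are assumed to be known in advance:

```python
# Hedged sketch of colorimetric fingerprinting: the reference scan is
# subtracted from the analyte scan, spot by spot, to give an RGB
# color-change profile. Images are H x W x 3 arrays; spot centers assumed.
import numpy as np

def fingerprint(before, after, spots, radius=3):
    """Mean RGB change in a small window around each dye spot."""
    profile = []
    for (i, j) in spots:
        win = np.s_[i - radius:i + radius + 1, j - radius:j + radius + 1]
        d = after[win].astype(float).mean(axis=(0, 1)) \
            - before[win].astype(float).mean(axis=(0, 1))
        profile.extend(d)                 # (dR, dG, dB) for this spot
    return np.asarray(profile)            # concatenated profile = fingerprint
```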
References

Albert, K. J., Lewis, N. S., Schauer, C. L., Sotzing, G. A., Stitzel, S. E., Vaid, T. P., and Walt, D. R. (2000) Cross-Reactive Chemical Sensor Arrays, Chemical Reviews 100, 2595–2626.
Apetrei, C., Rodriguez-Mendez, M. L., and de Saja, J. A. (2005) Modified Carbon Paste Electrodes for Discrimination of Vegetable Oils, Sensors and Actuators B 111/112, 403–409.
Arikawa, Y., Toko, K., Ikezaki, H., Shinha, Y., Ito, T., Oguri, I., and Baba, S. (1995) Sensors and Materials 7, 261.
Arrieta, A., Rodriguez-Mendez, M. L., and de Saja, J. A. (2003) Sensors and Actuators B 95, 357–365.
Bard, A. J. and Faulkner, L. R. (2000) Electrochemical Methods: Fundamentals and Applications, 2nd ed., New York, Wiley, 856 pp.
Bühlmann, P., Pretsch, E., and Bakker, E. (1998) Carrier-Based Ion-Selective Electrodes and Bulk Optodes. 2. Ionophores for Potentiometric and Optical Sensors, Chemical Reviews 98, 1593–1687.
Buratti, S., Benedetti, S., Scampicchio, M., and Pangerod, E. C. (2004) Characterization and Classification of Italian Barbera Wines by Using an Electronic Nose and Amperometric Electronic Tongue, Analytica Chimica Acta 525, 133–139.
Cattrall, R. W. (1997) Chemical Sensors, Vol. 1, Oxford, Oxford University Press.
Ciosek, P., Sobanski, T., Augustyniak, E., and Wroblewski, W. (2006) ISE-based Sensor Array System for Classification of Foodstuffs, Measurement Science and Technology 17, 6–11.
D'Amico, A., Di Natale, C., and Paolesse, R. (2000) Portraits of Gasses and Liquids by Arrays of Nonspecific Chemical Sensors: Trends and Perspectives, Sensors and Actuators B 68, 324–330.
Di Natale, C., Macagnano, A., Davide, F., D'Amico, A., Legin, A., Rudnitskaya, A., Vlasov, Yu., and Seleznev, B. (1997) Multicomponent Analysis of Polluted Waters by Means of an Electronic Tongue, Sensors and Actuators B 44, 423–428.
Di Natale, C., Paolesse, R., Macagnano, A., Mantini, A., D'Amico, A., Legin, A., Lvova, L., Rudnitskaya, A., and Vlasov, Yu. (2000a) Electronic Nose and Electronic Tongue Integration for Improved Classification of Clinical and Food Samples, Sensors and Actuators B 64, 15–21.
Di Natale, C., Paolesse, R., Macagnano, A., Mantini, A., D'Amico, A., Ubigli, M., Legin, A., Lvova, L., Rudnitskaya, A., and Vlasov, Yu. (2000b) Application of a Combined Artificial Olfaction and Taste System to the Quantification of Relevant Compounds in Red Wine, Sensors and Actuators B 69, 342–347.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001) Pattern Classification, 2nd ed., New York, Wiley.
Duran, A., Cortina, M., Velasco, L., Rodriguez, J. A., Alegret, S., Calvo, D., and del Valle, M. (2005) Virtual Instrument as an Automated Potentiometric e-Tongue Based on SIA, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 316–319.
Gallardo, J., Alegret, S., and del Valle, M. (2004) A Flow-injection Electronic Tongue Based on Potentiometric Sensors for the Determination of Nitrate in the Presence of Chloride, Sensors and Actuators B 101, 72–80.
Geladi, P. and Kowalski, B. (1986) Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta 185, 1–17.
Grl, M., Cortina, M., Calvo, D., and del Valle, M. (2005) Automated e-Tongue Based on Potentiometric Sensors for Determining Alkaline-earth Ions in Water, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 296–299.
Habara, M., Ikezaki, H., and Toko, K. (2004) Study of Sweet Taste Evaluation Using Taste Sensor with Lipid/Polymer Membranes, Biosensors and Bioelectronics 19, 1559–1563.
Harvey, D. (2000) Modern Analytical Chemistry, 1st ed., New York, McGraw-Hill, p. 468.
Hauptmann, P., Borngraeber, R., Schroeder, J., and Auge, J. (2000) In IEEE/EIA International Frequency Control Symposium and Exhibition, p. 22.
Hayashi, K., Yamanaka, M., Toko, K., and Yamafuji, K. (1990) Sensors and Actuators B 2, 205.
Holmin, S., Krantz-Rülcker, C., and Winquist, F. (2004) Multivariate Optimization of Electrochemically Pre-treated Electrodes Used in a Voltammetric Electronic Tongue, Analytica Chimica Acta 519, 39–46.
Iiyama, S., Ezaki, S., Toko, K., Matsuno, T., and Yamafuji, K. (1995) Study of Astringency and Pungency with Multichannel Taste Sensor Made of Lipid Membranes, Sensors and Actuators B 24/25, 75–79.
Iiyama, S., Kuga, H., Ezaki, S., Hayashi, K., and Toko, K. (2003) Peculiar Change in Membrane Potential of Taste Sensor Caused by Umami Substances, Sensors and Actuators B 91, 191–194.
Izs, S. and Garnier, M. (2005) The Umami Taste and the IMP/GMP Synergetic Effects Quantified by the ASTREE Electronic Tongue, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 66–69.
Krantz-Rülcker, C., Stenberg, M., Winquist, F., and Lundström, I. (2001) Electronic Tongues for Environmental Monitoring Based on Sensor Arrays and Pattern Recognition: A Review, Analytica Chimica Acta 426, 217–226.
Kruse-Jarres, J. D. (1988) Ion-selective Potentiometry in Clinical Chemistry, Medical Progress Through Technology 13, 107–130.
Lavigne, J. J. and Anslyn, E. V. (2001) Angewandte Chemie International Edition 40, 3118.
Lavigne, J., Savoy, S., Clevenger, M. B., Ritchie, J. E., Bridget, M., Yoo, S. J., Anslyn, E. V., McDevitt, J. T., Shear, J. B., and Neikirk, D. (1998) Journal of the American Chemical Society 120, 6429.
Legin, A., Kirsanov, D., Rudnitskaya, A., Iversen, J. J. L., Seleznev, B., Esbensen, K. H., Mortensen, J., Houmøller, L. P., and Vlasov, Yu. (2004a) Multicomponent Analysis of Fermentation Growth Media Using the Electronic Tongue (ET), Talanta 64, 766–772.
Legin, A., Lvova, L., Rudnitskaya, A., Vlasov, Yu., Di Natale, C., Mazzone, E., and D'Amico, A. (2001) In Proceedings of the Second Symposium in Vino Analytical Science, Bordeaux, France, p. 165.
Legin, A., Makarychev-Mikhailov, S., Goryacheva, O., Kirsanov, D., and Vlasov, Yu. (2002) Cross-Sensitive Chemical Sensors Based on Tetraphenylporphyrin and Phthalocyanine, Analytica Chimica Acta 457, 297–303.
Legin, A. V., Rudnitskaya, A. M., Legin, K. A., Ipatov, A. V., and Vlasov, Yu. G. (2005a) Methods for Multivariate Calibrations for Processing of the Dynamic Response of a Flow-Injection Multiple-Sensor System, Russian Journal of Applied Chemistry 78, 89–95.
Legin, A., Rudnitskaya, A., Seleznev, B., Kirsanov, D., and Vlasov, Yu. (2004b) Chemical Sensor Arrays for Simultaneous Activity of Several Heavy Metals at Ultra Low Level, In Proceedings of Eurosensors XVIII, Rome, Italy, pp. 85–86.
Legin, A., Rudnitskaya, A., Seleznev, B., and Vlasov, Yu. (2005b) Electronic Tongue for Quality Assessment of Ethanol, Vodka and Eau-de-vie, Analytica Chimica Acta 534, 129–135.
Legin, A., Rudnitskaya, A., Smirnova, A., Lvova, L., and Vlasov, Yu. (1999a) Journal of Applied Chemistry (Russia) 72, 114.
Legin, A., Rudnitskaya, A., and Vlasov, Yu. (2003) In S. Alegret (ed.), Integrated Analytical Systems, Comprehensive Analytical Chemistry, Vol. XXXIX, Amsterdam, Elsevier, p. 437.
Legin, A., Rudnitskaya, A., Vlasov, Yu., Di Natale, C., Mazzone, E., and D'Amico, A. (1999b) Electroanalysis 11, 814.
Legin, A., Rudnitskaya, A., Vlasov, Yu., Di Natale, C., Mazzone, E., and D'Amico, A. (2000) Application of Electronic Tongue for Qualitative and Quantitative Analysis of Complex Liquid Media, Sensors and Actuators B 65, 232–234.
Legin, A., Smirnova, A., Rudnitskaya, A., Lvova, L., Suglobova, E., and Vlasov, Yu. (1999c) Chemical Sensor Array for Multicomponent Analysis of Biological Liquids, Analytica Chimica Acta 385, 131–135.
Lvova, L., Kim, S. S., Legin, A., Vlasov, Yu., Yang, J. S., Cha, G. S., and Nam, H. (2002) All Solid-state Electronic Tongue and its Application for Beverage Analysis, Analytica Chimica Acta 468, 303–314.
Lvova, L., Legin, A., Vlasov, Yu., Cha, G. S., and Nam, H. (2003) Multicomponent Analysis of Korean Green Tea by Means of Disposable All-solid-state Potentiometric Electronic Tongue Microsystem, Sensors and Actuators B 95, 391–399.
Lvova, L., Martinelli, E., Mazzone, E., Pede, A., Paolesse, R., Di Natale, C., and D'Amico, A. (in press a) Electronic Tongue Based on an Array of Metallic Potentiometric Sensors, Talanta.
Lvova, L., Paolesse, R., Di Natale, C., and D'Amico, A. (in press b) Detection of Alcohols in Beverages: An Application of Porphyrin-based Electronic Tongue, Sensors and Actuators B.
Lvova, L., Verrelli, G., Paolesse, R., Di Natale, C., and D'Amico, A. (2004) An Application of Porphyrin-based "Electronic Tongue" System for "Verdicchio" Wine Analysis, In Proceedings of Eurosensors XVIII, September 12–15, Rome, Italy, pp. 385–386.
Martens, H. and Naes, T. (1989) Multivariate Calibration, London, Wiley.
Martínez-Máñez, R., Soto, J., Garcia-Breijo, E., Gil, L., Ibáñez, J., and Gadea, E. (2005a) A Multisensor in Thick-film Technology for Water Quality Control, Sensors and Actuators A 120, 589–595.
Martínez-Máñez, R., Soto, J., Garcia-Breijo, E., Gil, L., Ibáñez, J., and Llobet, E. (2005b) An "Electronic Tongue" Design for the Qualitative Analysis of Natural Waters, Sensors and Actuators B 104, 302–307.
Martínez-Máñez, R., Soto, J., Gil, L., Garcia-Breijo, E., Ibáñez, J., Gadea, E., and Llobet, E. (2005c) Electronic Tongue for Quantitative Analysis of Water Using Thick-film Technology, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 136–137.
Massart, D. L., Vandeginste, B. G., Deming, S. N., Michotte, Y., and Kaufman, L. (1988) Data Handling in Science and Technology, Vol. 2: Chemometrics: A Textbook, Amsterdam, The Netherlands, Elsevier.
Moreno, L., Bratov, A., Abramova, N., Jimenez, C., and Dominguez, C. (2005) Multi-sensor Array Used as an Electronic Tongue for Mineral Water Analysis, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 103–105.
Mortensen, J., Legin, A., Ipatov, A., Rudnitskaya, A., Vlasov, Yu., and Hjuler, K. (2000) A Flow Injection System Based on Chalcogenide Glass Sensors for the Determination of Heavy Metals, Analytica Chimica Acta 403, 273–277.
Nam, H., Cha, G. S., Jeon, Y. H., Kim, J. D., Seo, S. S., Shim, J. H., and Shim, J. H. (2005) Artificial Neural Network Analysis and Classification of Beverage Tastes with Solid-state Sensor Array, In Proceedings of the Pittcon'05 Conference, Orlando, Florida, February 27–March 4, 2050–10.
Olsson, J., Winquist, F., and Lundström, I. (2005) A Self-polishing Electronic Tongue, In Proceedings of Eurosensors XIX, September 11–14, Barcelona, Spain, TA21.
Otto, M. and Thomas, J. D. R. (1985) Analytical Chemistry 57, 2647.
Paolesse, R., Di Natale, C., Burgio, M., Martinelli, E., Mazzone, E., Palleschi, G., and D'Amico, A. (2003) Porphyrin-based Array of Cross-selective Electrodes for Analysis of Liquid Samples, Sensors and Actuators B 95, 400–405.
Pearce, T. C., Schiffman, S. S., Nagle, H. T., and Gardner, J. W. (eds.) (2002) Handbook of Machine Olfaction: Electronic Nose Technology, New York, Wiley.
Riul Jr., A., dos Santos Jr., D. S., Wohnrath, K., Di Tommazo, R., Carvalho, A. C. P. L. F., Fonseca, F. J., Oliveira Jr., O. N., Taylor, D. M., and Mattoso, L. H. C. (2002) Artificial Taste Sensor: Efficient Combination of Sensors Made from Langmuir–Blodgett Films of Conducting Polymers and a Ruthenium Complex and Self-assembled Films of an Azobenzene-Containing Polymer, Langmuir 18, 239–245.
Riul Jr., A., Malmegrim, R. R., Fonseca, F. J., and Mattoso, L. H. C. (2003) Nano-Assembled Films for Taste Sensor Application, Artificial Organs 27, 469–472.
Rouessac, F. and Rouessac, A. (2002) Chemical Analysis: Modern Instrumental Methods and Techniques, New York, Wiley, 445 pp.
Rudnitskaya, A., Delgadillo, I., Legin, A., Rocha, S., Da Costa, A.-M., and Simoes, T. (2005) Analysis of Port Wines Using the Electronic Tongue, In Proceedings of ISOEN'11, April 13–15, Barcelona, Spain, pp. 178–179.
Rudnitskaya, A., Delgadillo, I., Rocha, S. M., Costa, A. M., and Legin, A. (2006) Quality Valuation of Cork from Quercus suber L. by the Electronic Tongue, Analytica Chimica Acta 563, 315–318.
Rudnitskaya, A., Ehlert, A., Legin, A., Vlasov, Yu., and Büttgenbach, S. (2001) Multisensor System on the Basis of an Array of Non-specific Chemical Sensors and Artificial Neural Networks for Determination of Inorganic Pollutants in a Model Groundwater, Talanta 55, 425–431.
Sakai, H., Iiyama, S., and Toko, K. (2000) Evaluation of Water Quality and Pollution Using Multichannel Sensors, Sensors and Actuators B 66, 251–255.
Sanz Alaejos, M. and Garcia Montelongo, F. J. (2004) Chemical Reviews 104, 3239–3265.
Sehra, G., Cole, M., and Gardner, J. W. (2004) Sensors and Actuators B 103, 233.
Söderström, C., Borén, H., Winquist, F., and Krantz-Rülcker, C. (2003) Use of an Electronic Tongue to Analyze Mold Growth in Liquid Media, International Journal of Food Microbiology 83, 253–261.
Söderström, C., Rudnitskaya, A., Legin, A., and Krantz-Rülcker, C. (2005) Differentiation of Four Aspergillus Species and One Zygosaccharomyces with Two Electronic Tongues Based on Different Measurement Techniques, Journal of Biotechnology 119, 300–308.
Söderström, C., Winquist, F., and Krantz-Rülcker, C. (2003) Recognition of Six Microbial Species with an Electronic Tongue, Sensors and Actuators B 89, 248–255.
Sohn, Y. S., Goodey, A., Anslyn, E. V., McDevitt, J. T., Shear, J. B., and Neikirk, D. P. (2005) Biosensors and Bioelectronics 21, 303–312.
Toko, K. (1996) Taste Sensor with Global Sensitivity, Materials Science and Engineering C 4, 69–82.
Toko, K. (2000) Sensors and Actuators B 64, 205.
Verrelli, G., Francioso, L., Paolesse, R., Siciliano, P., Di Natale, C., and D'Amico, A. (2005) Electronic Tongue Based on Silicon Miniaturized Potentiometric Sensors, In Proceedings of EUROSENSORS XIX, September 11–14, Barcelona, Spain, TA24.
Vlasov, Yu. and Legin, A. (1998) Fresenius Journal of Analytical Chemistry 361, 255.
Vlasov, Yu., Legin, A., and Rudnitskaya, A. (1997) Cross-sensitivity Evaluation of Chemical Sensors for Electronic Tongue: Determination of Heavy Metal Ions, Sensors and Actuators B 44, 532–537.
Wang, J. (2006) Analytical Electrochemistry, 3rd ed., New York, Wiley, 250 pp.
Winquist, F., Björklund, R., Krantz-Rülcker, C., Lundström, I., Östergren, K., and Skoglund, T. (2005) An Electronic Tongue in the Dairy Industry, Sensors and Actuators B 111/112, 299–304.
Winquist, F., Holmin, S., Krantz-Rülcker, C., Wide, P., and Lundström, I. (2000) A Hybrid Electronic Tongue, Analytica Chimica Acta 406, 147–157.
Winquist, F. and Lundström, I. (1997) An Electronic Tongue Based on Voltammetry, Analytica Chimica Acta 357, 21–31.
Winquist, F., Lundström, I., and Wide, P. (1999) The Combination of an Electronic Tongue and an Electronic Nose, Sensors and Actuators B 58, 512–517.
Wold, H. (1966) Estimation of Principal Components and Related Models by Iterative Least Squares, In P. R. Krishnaiah (ed.), Multivariate Analysis, New York, Academic Press, pp. 391–420.
Yoon, H. J., Shin, J. H., Lee, S. D., Nam, H., Cha, G. S., Strong, T. D., and Brown, R. B. (2000) Solid-state Ion Sensors with a Liquid Junction-free Polymer Membrane-based Reference Electrode for Blood Analysis, Sensors and Actuators B 64, 8–14.
Yoshinobu, T., Iwasaki, H., Ui, Y., Furuichi, K., Ermolenko, Yu., Mourzina, Yu., Wagner, T., Näther, N., and Schöning, M. J. (2005) The Light-addressable Potentiometric Sensor for Multi-ion Sensing and Imaging, Methods 37, 94–102.
Zhang, C. and Suslick, K. S. (2005) Journal of the American Chemical Society 127, 11548–11549.
SEQUENTIAL DETECTION ESTIMATION AND NOISE CANCELATION

E. J. Sullivan¹ ([email protected]) and J. V. Candy² ([email protected])
¹ Prometheus Inc., Newport, Rhode Island 02840
² Lawrence Livermore National Laboratory, Livermore, California 94551
Abstract. A noise canceling technique based on a model-based recursive processor is presented. Beginning with a canceling scheme using a reference source, it is shown how to obtain an estimate of the noise, which can then be incorporated in a recursive noise canceler. Once this is done, recursive detection and estimation schemes are developed. The approach is to model the nonstationary noise as an autoregressive process, which can then easily be placed into a state-space canonical form. This results in a Kalman-type recursive processor where the measurement is the noise and the reference is the source term. Once the noise model is in state-space form, it is combined with the detection and estimation problems in a self-consistent structure. It is then shown how estimates of parameters of interest, such as a signal bearing or range, can be enhanced by including (augmenting) these parameters in the state vector, thereby jointly estimating them along with the recursive updating of the noise model.

Key words: sequential detection, Kalman filter, own-ship noise, noise canceling, recursive detection, nonstationary noise, Neyman–Pearson, likelihood ratio, signal model, noise model, Gauss–Markov
1. Introduction
In underwater acoustic signal processing, own-ship noise is a major problem plaguing the detection, classification, localization and tracking problems (Tolstoy and Clay, 1987; Clay and Medwin, 1977). For example, this noise is a major contributor to towed-array measurement uncertainties that can lead to large estimation errors in any form of signal processing problem aimed at extracting quiet target information. Many sonar processing approaches deal with this problem by relying on straightforward filtering techniques to remove these undesirable interferences, but at the cost of precious signal-to-noise ratio (SNR).
This chapter addresses the idea of noise cancelation by formulating it in terms of joint detection and estimation problems (Candy and Sullivan, 2005a,b). Since such problems are, in general, nonstationary in character, the solution must be adaptive, and therefore the approach taken here is to cast it as a recursive model-based problem. By model-based (Candy, 2006) we mean that as much a priori information as possible is included in the form of physical models, where the parameters of these models can be adaptively estimated. Such an approach leads naturally to state-space based recursive structures. In other words, we will be dealing with Kalman filter based processors. We begin in the next section with the development of a recursive detection structure. This is followed in Sections 3 and 4 with the development of the ship noise and signal models, respectively. Section 5 develops the detection/noise-cancelation scheme. Section 6 continues the development by showing how this structure can provide enhanced estimation. Finally, Section 7 contains a discussion.

2. Sequential Detection
The detector is configured as a binary hypothesis test based on the Neyman–Pearson criterion. Thus, the optimum solution is given by the likelihood ratio, i.e.,

L(t) = Pr(P_t | H_1) / Pr(P_t | H_0),   (1)
where P_t is the set of time samples of the spatial vector of pressure measurements, i.e.,

P_t = {p(1), p(2), . . . , p(t)} = {P_{t−1}, p(t)},   (2)
where H_1 and H_0, respectively, refer to the signal-present and the null hypotheses, and p(t) is the spatial vector of measurements at time t. Thus, using Bayes' rule, the probabilities in Equation (1) can be written as

Pr(P_t | H_{1,0}) = Pr(p(t) | P_{t−1}; H_{1,0}) Pr(P_{t−1} | H_{1,0}).   (3)
Substitution into Equation (1) yields

L(t) = [Pr(p(t) | P_{t−1}; H_1) / Pr(p(t) | P_{t−1}; H_0)] · [Pr(P_{t−1} | H_1) / Pr(P_{t−1} | H_0)],   (4)
but from Equation (1) we see that this can be written as

L(t) = L(t − 1) · Pr(p(t) | P_{t−1}; H_1) / Pr(p(t) | P_{t−1}; H_0).   (5)
Upon taking logarithms, we can now write the sequential log-likelihood ratio as

Λ(t) = Λ(t − 1) + ln Pr(p(t) | P_{t−1}; H_1) − ln Pr(p(t) | P_{t−1}; H_0),   (6)
which leads to the binary test implemented as

Λ(t) ≥ ln T_1:  Accept H_1
ln T_0 < Λ(t) < ln T_1:  Continue
Λ(t) ≤ ln T_0:  Accept H_0.   (7)
Following Wald (1947), these thresholds are usually chosen to be

T_0 = (1 − Pr_D) / (1 − Pr_FA) = Pr_miss / (1 − Pr_FA)   (8)

and

T_1 = Pr_D / Pr_FA = (1 − Pr_miss) / Pr_FA.   (9)

In these thresholds, Pr_D and Pr_FA are the probabilities of detection and false alarm chosen by the user. Before we can continue with the detection problem, we must develop expressions for the relevant probability densities in Equation (6). In order to do this we must develop the models for the ship noise and the signal.
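As a minimal illustration of the sequential test of Equations (6)–(9), the following sketch treats the simple case of a known, deterministic signal in white Gaussian noise, so the per-sample log-likelihood increment has a closed form; the signal, noise level and operating probabilities are assumptions chosen for the example:

```python
# Hedged sketch of the sequential (Wald) test of Eqs. (6)-(9) for a known
# signal s(t) in white Gaussian noise of variance sigma2.
import numpy as np

def sprt(p, s, sigma2, PrD=0.95, PrFA=0.05):
    lnT1 = np.log(PrD / PrFA)                # upper threshold, Eq. (9)
    lnT0 = np.log((1 - PrD) / (1 - PrFA))    # lower threshold, Eq. (8)
    Lam = 0.0
    for t, x in enumerate(p):
        # ln Pr(x|H1) - ln Pr(x|H0) for Gaussian densities, Eq. (6)
        Lam += (x * s[t] - 0.5 * s[t] ** 2) / sigma2
        if Lam >= lnT1:
            return "H1", t
        if Lam <= lnT0:
            return "H0", t
    return "continue", len(p)

rng = np.random.default_rng(2)
s = 0.5 * np.ones(500)
x = s + rng.normal(size=500)                 # signal-present data
print(sprt(x, s, sigma2=1.0))
```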
3. Ship Noise Model
It can be shown (Candy, 2006; Widrow et al., 1975; Friedlander, 1982) that the optimum noise canceling structure can be formulated as an identification problem, with the noise reference input z(t) related to the ship-noise estimate η̂(t) by a coloring filter h(t), i.e.,

η̂(t) = h(t) ∗ z(t).   (10)
Using a canonical form, Equation (10) can be written in Gauss–Markov (Jazwinski, 1970) form as

η(t) = Σ_{i=1}^{N_h} h_i z(t − i),   (11)
with z(t) being the reference. This leads to the state equation

ξ(t) = A_ξ ξ(t − 1) + B_ξ z(t − 1) + w_ξ,   (12)

and the measurement equation

η(t) = C_ξ ξ(t) + v_ξ,   (13)

where

A_ξ = [ 0 · · · 0   0 ]
      [ I_{N_h−1}   0 ],    B_ξ = [1  0  · · ·  0],   (14)

and

C_ξ = [h_1  h_2  · · ·  h_{N_h−1}  h_{N_h}].   (15)
The recursive solution is given by the following algorithm:

ξ(t|t − 1) = A_ξ ξ(t − 1|t − 1) + B_ξ z(t − 1)   [Prediction]
η(t|t − 1) = C_ξ ξ(t|t − 1)   [Measurement Prediction]
ε_η(t) = η(t) − η(t|t − 1)   [Innovation]
R_{ε_η ε_η}(t|t − 1) = C_ξ P̃_{ξξ}(t|t − 1) C_ξ′ + R_{ν_ξ ν_ξ}   [Innovation Covariance]
ξ(t|t) = ξ(t|t − 1) + K_ξ(t) ε_η(t)   [Correction]
K_ξ(t) = P̃_{ξξ}(t|t − 1) C_ξ′ R_{ε_η ε_η}^{−1}(t|t − 1)   [Gain]   (16)
The P̃_{ξξ}(t|t − 1) term is the state error covariance, and is computed as an integral part of the Kalman algorithm. The notation in these equations has been generalized in order to explicitly indicate the predictive nature of the algorithm. That is, ξ(t|t − 1) is the value of ξ at time t, based on the data up to time t − 1.
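The recursion of Equations (11)–(16) can be sketched directly in code. In the following minimal example the coloring-filter weights h, the noise statistics and the simulated reference are illustrative assumptions, not values from the chapter:

```python
# Hedged numpy sketch of the recursive noise canceler of Eqs. (11)-(16).
import numpy as np

Nh = 4
h = np.array([0.8, -0.4, 0.2, -0.1])                 # coloring filter h_i
A = np.zeros((Nh, Nh)); A[1:, :-1] = np.eye(Nh - 1)  # A_xi of Eq. (14)
B = np.zeros(Nh); B[0] = 1.0                         # B_xi of Eq. (14)
C = h                                                # C_xi of Eq. (15)
Rw, Rv = 1e-4 * np.eye(Nh), 1e-2                     # assumed noise levels

rng = np.random.default_rng(0)
z = rng.normal(size=200)                             # reference input z(t)
eta = np.zeros(200)
for t in range(Nh, 200):
    eta[t] = h @ z[t - Nh:t][::-1]                   # Eq. (11): sum h_i z(t-i)
eta += 0.05 * rng.normal(size=200)                   # measured ship noise

xi, P = np.zeros(Nh), np.eye(Nh)
eta_hat = np.zeros(200)
for t in range(1, 200):
    xi = A @ xi + B * z[t - 1]                       # [Prediction]
    P = A @ P @ A.T + Rw
    e = eta[t] - C @ xi                              # [Innovation]
    Ree = C @ P @ C + Rv                             # [Innovation covariance]
    K = P @ C / Ree                                  # [Gain]
    xi = xi + K * e                                  # [Correction]
    P = P - np.outer(K, C @ P)
    eta_hat[t] = C @ xi                              # ship-noise estimate
```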
4. Signal Model
The measurement system is taken to be a towed array of N receiver elements. The received pressure field at the array, as a function of space and time, is given by

p(x_n(t), t) = s(r_n(t), t) + η(x_n(t), t) + ν(x_n(t), t),   (17)
where x_n is the spatial location of the nth receiver element on the array, s is the source (target) signal, η is the interfering own-ship noise and ν is the ambient noise. It is assumed that the signal is a weak narrow-band planar wave given by

s(x_n, t) = a_0 e^{i(ω_0 t − k[x_n(0) + vt] sin θ)}.   (18)
The radian frequency at the source is ω_0, θ is the bearing of the source and v is the speed of forward motion of the array; k = ω_0/c is the wavenumber and c is the speed of sound. Note that for this signal model, x(t) = x(0) + vt. That is, the array motion is modeled as a Galilean transformation on the receiver elements. The significance of this is that, in bearing estimation, the motion contributes information, contained in the Doppler, that enhances the estimate (Sullivan and Candy, 1997; Sullivan and Edelson, 2004). The broadband measurement noise is modeled as zero-mean, white Gaussian. Note that we are not restricting the statistics to be stationary, so we can accommodate the nonstationarities that occur naturally in the ocean environment.
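For reference, the following short sketch generates synthetic element data according to the model of Equations (17)–(18), with the Galilean transformation applied to the element positions; all parameter values are illustrative assumptions:

```python
# Synthetic snapshot of the narrow-band planar-wave model of Eq. (18)
# on a moving towed array; parameter values are illustrative only.
import numpy as np

N, d, c = 16, 1.5, 1500.0             # elements, spacing [m], sound speed [m/s]
f0, theta, v = 100.0, np.deg2rad(30.0), 2.5
w0 = 2 * np.pi * f0
k = w0 / c                            # wavenumber
x0 = d * np.arange(N)                 # element positions at t = 0

def signal(t, a0=1.0):
    # Galilean transformation: x_n(t) = x_n(0) + v t
    return a0 * np.exp(1j * (w0 * t - k * (x0 + v * t) * np.sin(theta)))
```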
5. Joint Detection/Noise Cancelation
We now have everything we need to construct our processor. Under the null hypothesis, we have that Pr(p(t)|P_{t−1}; H_0) is conditionally Gaussian, since η is a consequence of the Gauss–Markov model of Equations (16). Thus

Pr(p(t)|P_{t−1}; H_0) ∼ N(p̂(t|t − 1), R_{ε_p ε_p}(t|t − 1)).   (19)

Here, p̂(t|t − 1) = E{p(t)|P_{t−1}; H_0} is the conditional mean estimate at time t based on the data up to time t − 1, and R_{ε_p ε_p} is the innovations covariance, both of which are available from the estimator of Equations (16). Including the ship noise term in the signal model, we see that for
the null hypothesis,

p_o(t) = η(t) + ν(t),   (20)

so that

ε_{p_o}(t) = p(t) − p̂(t|t − 1) = [η(t) − η̂(t|t − 1)] + ν(t),   (21)

or

ε_{p_o}(t) = ε_η(t) + ν(t),   (22)

which has the corresponding covariance

R_{ε_{p_o} ε_{p_o}}(t) = R_{ε_η ε_η}(t) + R_{νν}(t).   (23)
In the case of a deterministic signal, it now follows that for the alternate hypothesis

p_1(t) = s(t) + η(t) + ν(t).   (24)
From Equations (20) and (24), it now follows that the noise-canceled log-likelihoods are

ln Pr(p(t)|P_{t−1}; H_0) = [p_0(t) − η̂(t)]′ R_{ε_{p_o} ε_{p_o}}^{−1}(t) [p_0(t) − η̂(t)]   (25)

and

ln Pr(p(t)|P_{t−1}; H_1) = [p_1(t) − η̂(t)]′ R_{ε_{p_o} ε_{p_o}}^{−1}(t) [p_1(t) − η̂(t)],   (26)

where we have used the fact that

R_{ε_{p_1} ε_{p_1}}(t) = R_{ε_{p_o} ε_{p_o}}(t).   (27)

Finally, upon substitution of Equations (25) and (26) into Equation (6), the recursive detection/cancelation processor is given by

Λ(t) = Λ(t − 1) + [p_1(t) − η̂(t)]′ R_{ε_{p_o} ε_{p_o}}^{−1}(t) [p_1(t) − η̂(t)] − [p_0(t) − η̂(t)]′ R_{ε_{p_o} ε_{p_o}}^{−1}(t) [p_0(t) − η̂(t)].   (28)
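The corresponding log-likelihood update of Equation (28) is a one-line computation once the canceler supplies the ship-noise estimate η̂(t) and the innovations covariance; a hedged sketch:

```python
# Sketch of the noise-canceled log-likelihood update of Eq. (28): the
# current ship-noise estimate eta_hat is removed from both hypothesized
# measurement vectors before the quadratic forms are accumulated.
# R_inv is the inverse innovations covariance from the canceler.
import numpy as np

def update_llr(Lam, p1, p0, eta_hat, R_inv):
    r1 = p1 - eta_hat
    r0 = p0 - eta_hat
    return Lam + r1 @ R_inv @ r1 - r0 @ R_inv @ r0
```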
6. Joint Estimation/Noise Cancelation
Consider a moving towed array of N elements. The signal at the nth element is given by Equation (18). It has been shown that by including the motion in this manner, the variance of the bearing estimate is reduced, as compared to that of the conventional (beamformer) estimator (Sullivan and Candy, 1997).
Here, we wish to further enhance the estimation results by incorporating the noise canceler into a recursive estimation scheme. As can be seen in Equation (18), there are three parameters in the signal model: the amplitude, the source frequency and the bearing. However, if we choose to work in the phase domain, the amplitude is eliminated and we are left with two parameters, ω_0 and θ. Including the ship noise, Equation (18) generalizes to¹

s(x_n, t) = a_0 e^{i(ω_0 t − k[x_n(0) + vt] sin θ)} + η(t).   (29)
The hydrophone measurement is then

p_n = a_0 e^{i(ω_0 t − k[x_n(0) + vt] sin θ)} + η(t) + ν(t),   (30)
so that the canceled measurement for the Kalman filter is

p_n − η̂(t) = a_0 e^{i(ω_0 t − k[x_n(0) + vt] sin θ)} + ν(t).   (31)
Although the source frequency appears to be a nuisance parameter, its inclusion is necessary in order to obtain the performance improvement gained by including the bearing information contained in the Doppler. This is sometimes referred to as the passive synthetic aperture effect. Making the assumption that the bearing changes slowly in time, the recursive estimator can now be cast in the form of a Kalman filter with a "random walk" state equation (Sullivan and Candy, 1997). That is,

Θ(t|t) = [ θ(t|t)  ]   [ 1  0 ] [ θ(t|t − 1)  ]
         [ ω_o(t|t) ] = [ 0  1 ] [ ω_o(t|t − 1) ].   (32)

Since we wish to work in the phase domain, the measurement is based on the exponent of Equation (18), and is given by

y_n(Θ) = (ω_0/c)([x_n − x_{n−1}] + vt) sin θ,   n = 1, 2, · · · , N − 1.   (33)

Note that the ω_0 t term does not appear as it does in Equation (18). This is due to the fact that only the phase differences are relevant to the problem; thus, there are N − 1 phase-difference measurements for N hydrophones. There is an auxiliary measurement equation that is based on the observed frequency. This is basically the Doppler relation,
¹ Although this development is based on a narrow-band signal model, it can easily be generalized to accommodate a broadband model.
and is given by

y_N(Θ) = ω = ω_0 (1 + (v/c) sin θ),   (34)
with ω being the observed radian frequency. Note that the measurement equations are nonlinear due to the appearance of the term ω_0 sin θ. Thus, we will need to use the extended Kalman filter. The joint parametrically adaptive model-based processor (Enhancer/Estimator) is now given by

ξ(t|t − 1) = A_ξ ξ(t − 1|t − 1) + B_ξ z(t − 1)   [Prediction]
Θ(t|t − 1) = Θ(t − 1|t − 1)
η(t|t − 1) = C_ξ ξ(t|t − 1)   [Measurement Prediction]
y(t|t − 1) = c[Θ(t − 1|t − 1)]
ε_η(t) = η(t) − η(t|t − 1)   [Innovation]
ε_y(t) = y(t) − y(t|t − 1) − η(t|t − 1)
R_{ε_η ε_η}(t|t − 1) = C_ξ P̃_{ξξ}(t|t − 1) C_ξ′ + R_{νν}(t)   [Innovation Covariance]
R_{ε_y ε_y}(t|t − 1) = J(t) P̃_{ΘΘ}(t|t − 1) J′(t) + R_{vv}(t)
ξ(t|t) = ξ(t|t − 1) + K_ξ(t) ε_η(t)   [Correction]
Θ(t|t) = Θ(t|t − 1) + K_Θ(t) ε_y(t)
K_ξ(t) = P̃_{ξξ}(t|t − 1) C_ξ′ R_{ε_η ε_η}^{−1}(t)   [Gain]
K_Θ(t) = P̃_{ΘΘ}(t|t − 1) J′(t) R_{ε_y ε_y}^{−1}(t)   (35)
The term J(t) in the last of Equations (35) is the Jacobian of the (nonlinear) measurements of Equations (33) and (34), which are collectively written as c[Θ] in the third of Equations (35), i.e.,

J = ∂c[Θ]/∂Θ.   (36)

That is, the nonlinearity of the measurement equations necessitates the use of the EKF (Candy, 2006), which in turn requires the Jacobian.
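A minimal sketch of the parameter (Θ) portion of the processor of Equations (32)–(36) is given below, with the Jacobian of Equation (36) written out analytically for the measurements (33) and (34); the array geometry, speeds and noise levels are illustrative assumptions:

```python
# Hedged EKF sketch for the random-walk state Theta = [theta, omega0]:
# phase-difference measurements of Eq. (33) plus the Doppler measurement
# of Eq. (34); Jacobian per Eq. (36). Parameter values are illustrative.
import numpy as np

N, d, c, v = 8, 1.5, 1500.0, 2.5
dx = d * np.ones(N - 1)                        # x_n - x_{n-1}

def h_meas(Theta, t):
    th, w0 = Theta
    y = (w0 / c) * (dx + v * t) * np.sin(th)               # Eq. (33)
    return np.append(y, w0 * (1 + (v / c) * np.sin(th)))   # Eq. (34)

def jacobian(Theta, t):                                    # Eq. (36)
    th, w0 = Theta
    J = np.zeros((N, 2))
    J[:-1, 0] = (w0 / c) * (dx + v * t) * np.cos(th)       # d y_n / d theta
    J[:-1, 1] = (dx + v * t) * np.sin(th) / c              # d y_n / d omega0
    J[-1, 0] = w0 * (v / c) * np.cos(th)
    J[-1, 1] = 1 + (v / c) * np.sin(th)
    return J

def ekf_step(Theta, P, y, t, Rv):
    P = P + 1e-6 * np.eye(2)                   # random-walk prediction, Eq. (32)
    J = jacobian(Theta, t)
    Ree = J @ P @ J.T + Rv                     # innovation covariance
    K = P @ J.T @ np.linalg.inv(Ree)           # gain
    Theta = Theta + K @ (y - h_meas(Theta, t)) # correction
    P = (np.eye(2) - K @ J) @ P
    return Theta, P
```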
7. Discussion
We have shown theoretically that recursive model-based processing can provide adaptive processing schemes that are capable of enhancing both detection and estimation procedures through the proper application of
physical models. Here, we have used an all-pole model of own-ship noise and a recursive detector to enhance the detection of a signal. Further, we have shown how the use of such noise models, along with a realistic signal model, can enhance towed-array bearing estimation.

References

Candy, J. V. (2006) Model-Based Signal Processing, New York, John Wiley & Sons.
Candy, J. V. and Sullivan, E. J. (2005a) Joint Noise Canceling/Detection for a Towed Array in a Hostile Ocean Environment, presented at the International Conference on Underwater Acoustics Measurements, Heraklion, Crete, Greece, June 28–July 1.
Candy, J. V. and Sullivan, E. J. (2005b) Canceling Tow Ship Noise Using an Adaptive Model-based Approach, presented at the IEEE OES 8th Working Conference on Current Measurement Technology, Oceans 05 Europe, Southampton, UK, June 27–29.
Clay, C. S. and Medwin, H. (1977) Acoustical Oceanography, New York, John Wiley & Sons, pp. 110–134.
Friedlander, B. (1982) System Identification Techniques for Adaptive Noise Canceling, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-30(5), 699–709.
Jazwinski, A. (1970) Stochastic Processes and Filtering Theory, New York, Academic Press.
Sullivan, E. J. and Candy, J. V. (1997) Space-Time Array Processing: The Model-based Approach, Journal of the Acoustical Society of America 102(5), 2809–2820.
Sullivan, E. J. and Edelson, G. S. (2004) A Generalized Cramér–Rao Lower Bound for Line Arrays, presented at the 148th Meeting of the Acoustical Society of America, San Diego, CA, Nov. 15–19.
Tolstoy, I. and Clay, C. S. (1987) Ocean Acoustics: Theory and Experiment in Underwater Sound, Acoustical Society of America, pp. 3–12.
Wald, A. (1947) Sequential Analysis, New York, John Wiley & Sons.
Widrow, B., Glover, J. R. Jr., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., Zeidler, J. R., Dong, E. Jr., and Goodlin, R. C. (1975) Adaptive Noise Cancelling: Principles and Applications, Proceedings of the IEEE 63, 1692–1716.
IMAGE FUSION: A POWERFUL TOOL FOR OBJECT IDENTIFICATION

Filip Šroubek ([email protected]), Jan Flusser ([email protected]), and Barbara Zitová ([email protected])
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 4, Praha 8, 182 08, Czech Republic
Abstract. Due to imperfections of imaging devices (optical degradations, limited resolution of CCD sensors) and instability of the observed scene (object motion, media turbulence), acquired images are often blurred, noisy and may exhibit insufficient spatial and/or temporal resolution. Such images are not suitable for object detection and recognition. Reliable detection requires recovering the original image. If multiple images of the scene are available, this can be achieved by image fusion. In this chapter we review the respective methods of image fusion. We address all three major steps – image registration, blind deconvolution and resolution enhancement. Image registration brings the acquired images into spatial alignment, multiframe deconvolution estimates and removes the blur, and the spatial resolution of the image is increased by so-called superresolution fusion. Superresolution is the main topic of the chapter. We propose a unifying system that simultaneously estimates blurs and recovers the original undistorted image, all in high resolution, without any prior knowledge of the blurs and the original image. We accomplish this by formulating the problem as constrained least squares energy minimization with appropriate regularization terms, which guarantees a close-to-perfect solution. We demonstrate the performance of the method on many examples, namely on car license plate recognition and face recognition. Both of these tasks are of great importance in security and surveillance systems.

Key words: image fusion, multichannel systems, blind deconvolution, superresolution, regularized energy minimization

1. Introduction

Imaging devices have limited achievable resolution due to many theoretical and practical restrictions. An original scene with a continuous intensity function o[x, y] warps at the camera lens because of the scene motion and/or
change of the camera position. In addition, several external effects blur images: atmospheric turbulence, camera lens, relative camera-scene motion, etc. We will call these effects volatile blurs to emphasize their unpredictable and transitory behavior, yet we will assume that we can model them as convolution with an unknown point spread function (PSF) v[x, y]. This is a reasonable assumption if the original scene is flat and perpendicular to the optical axis. Finally, the CCD discretizes the images and produces a digitized noisy image g[i, j] (frame). We refer to g[i, j] as a low-resolution (LR) image, since the spatial resolution is too low to capture all the details of the original scene. In conclusion, the acquisition model becomes

g[i, j] = D((v ∗ o[W(n_1, n_2)])[x, y]) + n[i, j],   (1)
where n[i, j] is additive noise and W denotes the geometric deformation (spatial warping) of the image. Geometric deformations are partly caused by the fact that the image is a 2D projection of a 3D world, and partly by lens distortions and/or motion of the sensor during the acquisition. D(·) = S(g ∗ ·) is the decimation operator that models the function of the CCD sensors. It consists of convolution with the sensor PSF g[i, j], followed by the sampling operator S, which we define as multiplication by a sum of delta functions placed on an evenly spaced grid. The above model for one single observation g[i, j] is extremely ill-posed. Instead of taking a single image, we can take K (K > 1) images of the original scene and, in this way, partially overcome the equivocation of the problem. Hence we write

g_k[i, j] = D((v_k ∗ o[W_k(n_1, n_2)])[x, y]) + n_k[i, j],   (2)
where k = 1, . . . , K and D remains the same in all the acquisitions. In the perspective of this multiframe model, the original scene o[x, y] is a single input and the acquired LR images g_k[i, j] are multiple outputs. The model is therefore called a single input multiple output (SIMO) formation model. To our knowledge, this is the most accurate, state-of-the-art model, as it takes all possible degradations into account. Because of the many unknown parameters of the model, it is hard to analyze (automatically or visually) the images g_k and to detect and recognize objects in them. A very powerful strategy is offered by image fusion. The term fusion refers in general to an approach to the extraction of information adopted in several domains. The goal of image fusion is to integrate complementary information from all frames into one new image containing information whose quality cannot be achieved otherwise. Here, the term "better quality" means less blur and geometric distortion, less noise, and higher spatial resolution. We may expect that object detection and recognition will be easier and more reliable when performed on the fused image.
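For concreteness, the following minimal sketch generates LR frames according to the SIMO model (2), with the warps W_k reduced to small translations and the decimation D implemented as plain subsampling (the sensor PSF is omitted for brevity); all parameters are illustrative assumptions:

```python
# Hedged numpy sketch of the SIMO acquisition model (2): each LR frame is
# a shifted, blurred, decimated and noisy copy of the HR scene.
import numpy as np
from scipy.ndimage import convolve, shift

def acquire(o, psf, s, factor=2, sigma=0.01, rng=np.random.default_rng(0)):
    warped = shift(o, s, order=1)          # W_k: small (subpixel) translation
    blurred = convolve(warped, psf)        # volatile blur v_k
    lr = blurred[::factor, ::factor]       # decimation D (sensor PSF omitted)
    return lr + sigma * rng.normal(size=lr.shape)

o = np.random.default_rng(1).random((64, 64))   # stand-in HR scene
psf = np.ones((3, 3)) / 9.0
frames = [acquire(o, psf, s) for s in [(0, 0), (0.5, 0.3), (1.0, -0.7)]]
```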
fused image can recover the original scene $o[x, y]$ exactly. A reasonable goal of the fusion is a discrete version of $o[x, y]$ that has higher spatial resolution than the LR images and that is free of the volatile blurs. In the sequel, we will refer to this fused image as a high-resolution (HR) image $f[i, j]$.

Fusion of images acquired according to the model (2) is a three-stage process: it consists of image registration (spatial alignment), which should compensate for the geometric deformations $W_k$, followed by a multichannel (or multiframe) blind deconvolution (MBD) and superresolution (SR) fusion. The goal of MBD is to remove the impact of the volatile blurs, and the aim of SR is to increase the spatial resolution of the fused image by a user-defined factor. While image registration is actually a separate procedure, we integrate both MBD and SR into a single step (see Figure 1), which we call blind superresolution (BSR). The approach presented in this chapter is one of the first attempts to solve BSR under realistic assumptions with only little a priori knowledge.

Figure 1. Image fusion in brief: acquired images (left), registered frames (middle), fused image (right).

Image registration is a very important step of image fusion, because all MBD and SR methods require either perfectly aligned channels (which is not realistic) or allow at most small shift differences. Thus, the role of registration methods is to suppress large and complex geometric distortions. Image registration in general is a process of transforming two or more images into a geometrically equivalent form. From the mathematical point of view, it consists of approximating $W_k^{-1}$ and of resampling the image. For images which are not blurred, registration has been extensively studied in the recent literature (see Zitová and Flusser (2003) for a survey). However, blurred images require special registration techniques. Like the general-purpose registration methods, they can be divided into two groups: global and landmark-based ones. Regardless of the particular technique, all feature extraction methods,
similarity measures, and matching algorithms used in the registration process must be insensitive to image blurring.

Global methods do not search for particular landmarks in the images; they try to estimate directly the between-channel translation and rotation. Myles and Lobo (1998) proposed an iterative method which works well if a good initial estimate of the transformation parameters is available. Zhang et al. (2000, 2002) proposed to estimate the registration parameters by bringing the channels into a canonical form. Since blur-invariant moments were used to define the normalization constraints, neither the type nor the level of the blur influences the parameter estimation. Kubota et al. (1999) proposed a two-stage registration method based on hierarchical matching, where the amount of blur is considered as another parameter of the search space. Zhang and Blum (2001) proposed an iterative multiscale registration based on optical flow estimation in each scale, claiming that optical flow estimation is robust to image blurring. All global methods require considerable (or even complete) spatial overlap of the channels to yield reliable results, which is their major drawback.

Landmark-based blur-invariant registration methods appeared very recently, just after the first paper on moment-based blur-invariant features (Flusser et al., 1996). Originally, these features could only be used for registration of mutually shifted images (Flusser and Suk, 1998; Bentoutou et al., 2002). The proposal of their rotational-invariant version (Flusser and Zitová, 1999), in combination with a robust detector of salient points (Zitová et al., 1999), led to registration methods that are able to handle blurred, shifted and rotated images (Flusser et al., 1999, 2003).

Although the above-cited registration methods are very sophisticated and can be applied to almost all types of images, the results are rarely perfect. The registration error usually varies from subpixel values to a few pixels, so only MBD and SR methods sufficiently robust to between-channel misregistration can be applied to channel fusion. We will assume in the sequel that the LR images are roughly registered and that the $W_k$'s reduce to small translations.

During the last 20 years, blind deconvolution has attracted considerable attention as a separate image processing task. Initial blind deconvolution attempts were based on single-channel formulations, such as in Lagendijk et al. (1990), Reeves and Mersereau (1992), Chan and Wong (1998), and Haindl (2000); a good overview is in Kundur and Hatzinakos (1996a,b). The problem is extremely ill-posed in the single-channel framework and cannot be resolved in the fully blind form. These methods do not exploit the potential of multiframe imaging, because in the single-channel case the missing information about the original image in one channel cannot be supplemented by information obtained from the other channels. Research on intrinsically
multichannel methods has begun fairly recently; refer to Harikumar and Bresler (1999), Giannakis and Heath (2000), Pai and Bovik (2001), Panci et al. (2003), and Šroubek and Flusser (2003) for a survey and other references. Such MBD methods break the limitations of previous techniques and can recover the blurring functions from the degraded images alone. We further developed the MBD theory in Šroubek and Flusser (2005) by proposing a blind deconvolution method for images which might be mutually shifted by unknown vectors. A similar idea is used here as a part of the fusion algorithm to remove the volatile blurs, and it will be explained in more detail in Section 3.

Superresolution has been mentioned in the literature with increasing frequency in the last decade. The first SR methods did not involve any deblurring; they just tried to register the LR images with subpixel accuracy and then to resample them on a high-resolution grid. A good survey of SR techniques can be found in Park et al. (2003) and Farsiu et al. (2004). Maximum likelihood (ML), maximum a posteriori (MAP), the set-theoretic approach using POCS (projection on convex sets), and fast Fourier techniques can all provide a solution to the SR problem. Earlier approaches assumed that subpixel shifts are estimated by other means. More advanced techniques, such as in Hardie et al. (1997), Segall et al. (2004), and Woods et al. (2006), include shift estimation in the SR process. Other approaches focus on fast implementation (Farsiu et al., 2004), space–time SR (Shechtman et al., 2005) or SR of compressed video (Segall et al., 2004).

Some of the recent SR methods consider image blurring and involve blur removal. Most of them assume a priori known blurs; however, a few exceptions exist. Nguyen et al. (2001) and Woods et al. (2003) proposed BSR methods that can handle parametric PSFs with a single parameter. This restriction is unfortunately very limiting for most real applications. Probably the first attempts at BSR with an arbitrary PSF appeared in Wirawan et al. (1999) and Yagle (2003), where polyphase decomposition of the images was employed.

Current multiframe blind deconvolution techniques require no or very little prior information about the blurs; they are sufficiently robust to noise and provide satisfactory results in most real applications. However, they can hardly cope with the downsampling operator, which violates the standard convolution model. By contrast, state-of-the-art SR techniques achieve remarkable results in resolution enhancement in the case of no blur. They accurately estimate the subpixel shifts between images but lack any apparatus for calculating the blurs.

We propose a unifying method that simultaneously estimates the volatile blurs and the HR image without any prior knowledge of the blurs and the original image. We accomplish this by formulating the problem as a minimization of a regularized energy function, where the regularization is carried out in both the image and blur domains. Image regularization is based on variational
integrals and a consequent anisotropic diffusion with good edge-preserving capabilities. A typical example of such regularization is total variation. However, the main contribution of this work lies in the development of the blur regularization term. We show that the blurs can be recovered from the LR images up to a small ambiguity. One can consider this a generalization of the results proposed for blur estimation in the case of MBD problems. This fundamental observation enables us to build a simple regularization term for the blurs even in the case of the SR problem. To tackle the minimization task we use an alternating minimization approach, consisting of two simple linear equations.

The rest of the chapter is organized as follows. Section 2 outlines the degradation model. In Section 3 we present a procedure for volatile blur estimation, which blends naturally into a regularization term of the BSR algorithm as described in Section 4. Finally, Section 5 illustrates the applicability of the proposed method in real situations.
2. Mathematical Model

To simplify the notation, we will assume only images and PSFs with square supports. An extension to rectangular images is straightforward. Let $f[x, y]$ be an arbitrary discrete image of size $F \times F$; then $f$ denotes an image column vector of size $F^2 \times 1$, and $C_A\{f\}$ denotes a matrix that performs convolution of $f$ with an image of size $A \times A$. The convolution matrix can have a different output size. Adopting the Matlab naming convention, we distinguish two cases: "full" convolution $C_A\{f\}$ of size $(F + A - 1)^2 \times A^2$ and "valid" convolution $C^v_A\{f\}$ of size $(F - A + 1)^2 \times A^2$. In both cases the convolution matrix is a Toeplitz-block-Toeplitz (TBT) matrix. In the sequel we will not specify dimensions of convolution matrices if they are obvious from the size of the right argument.

Let us assume we have $K$ different LR frames $\{g_k\}$ (each of size $G \times G$) that represent degraded (blurred and noisy) versions of the original scene. Our goal is to estimate the HR representation of the original scene, which we denote as the HR image $f$ of size $F \times F$. The LR frames are linked with the HR image through a series of degradations similar to those between $o[x, y]$ and $g_k$ in (2). First $f$ is geometrically warped ($W_k$), then it is convolved with a volatile PSF ($V_k$) and finally it is decimated ($D$). The formation of the LR images in vector-matrix notation is then described as
$$g_k = D V_k W_k f + n_k, \tag{3}$$
where $n_k$ is additive noise present in every channel. The decimation matrix $D = SU$ simulates the behavior of digital sensors by first performing
convolution with the $U \times U$ sensor PSF ($U$) and then downsampling ($S$). The Gaussian function is widely accepted as an appropriate sensor PSF, and it is also used here; its justification is experimentally verified in Capel (2004). A physical interpretation of the sensor blur is that the sensor is of finite size and integrates impinging light over its surface. The sensitivity of the sensor is highest in the middle and decreases towards its borders with a Gaussian-like decay. Further, we assume that the subsampling factor (or SR factor, depending on the point of view), denoted by $\varepsilon$, is the same in both the x and y directions. It is important to underline that $\varepsilon$ is a user-defined parameter.

In principle, $W_k$ can be a very complex geometric transform that must be estimated by image registration or motion detection techniques. We have to keep in mind that sub-pixel accuracy in the $g_k$'s is necessary for SR to work. Standard image registration techniques can hardly achieve this, and they leave a small misalignment behind. Therefore, we will assume that complex geometric transforms are removed in a preprocessing step and $W_k$ reduces to a small translation. Hence $V_k W_k = H_k$, where $H_k$ performs convolution with the shifted version of the volatile PSF $v_k$, and the acquisition model becomes
$$g_k = D H_k f + n_k = S U H_k f + n_k. \tag{4}$$
The BSR problem then adopts the following form: we know the LR images $\{g_k\}$ and we want to estimate the HR image $f$ for the given $S$ and sensor blur $U$. To avoid boundary effects, we assume that each observation $g_k$ captures only a part of $f$. Hence $H_k$ and $U$ are "valid" convolution matrices $C^v_F\{h_k\}$ and $C^v_{F-H+1}\{u\}$, respectively. In general, the PSFs $h_k$ are of different sizes; however, we postulate that they all fit into an $H \times H$ support.

In the case of $\varepsilon = 1$, the downsampling $S$ is not present and we face a slightly modified MBD problem that has been solved elsewhere (Harikumar and Bresler, 1999; Šroubek and Flusser, 2005). Here we are interested in the case of $\varepsilon > 1$, when downsampling occurs. Can we estimate the blurs as in the case $\varepsilon = 1$? The presence of $S$ prevents us from using the cited results directly. However, we will show that the conclusions obtained for MBD apply here in a slightly modified form as well.
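To make the model concrete, the following is a minimal NumPy sketch generating one LR frame according to (4) for a single channel. It assumes a Gaussian sensor PSF and an integer SR factor; the sizes, the noise level and the helper names are illustrative, not taken from the chapter.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_psf(size, sigma):
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def acquire_lr(f, h_k, eps=2, sigma_hr=0.68, noise_std=0.01, seed=0):
    """One LR frame g_k = S U H_k f + n_k, cf. eq. (4).

    f        : HR image (2D array)
    h_k      : volatile PSF of this channel (registration assumed done)
    eps      : integer SR/subsampling factor (S keeps every eps-th sample)
    sigma_hr : sensor-PSF std on the HR grid (e.g. 0.34 * eps for the
               chapter's LR-scale value; an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    blurred = convolve2d(f, h_k, mode='same')                             # H_k f
    sensed = convolve2d(blurred, gaussian_psf(5, sigma_hr), mode='same')  # U
    g = sensed[::eps, ::eps]                                              # S
    return g + noise_std * rng.normal(size=g.shape)                       # + n_k
```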
3. Reconstruction of Volatile Blurs

Estimation of blurs in the MBD case (no downsampling) attracted considerable attention in the past. A wide variety of methods were proposed, such as in Harikumar and Bresler (1999) and Giannakis and Heath (2000), that provide a satisfactory solution. For these methods to work correctly, a certain channel disparity is necessary. The disparity is defined as weak co-primeness of the channel blurs, which states that the blurs have no common factor except a
scalar constant. In other words, if the channel blurs can be expressed as a convolution of two subkernels, then there is no subkernel that is common to all blurs. An exact definition of weakly co-prime blurs can be found in Giannakis and Heath (2000). Many practical cases satisfy the channel co-primeness, since the necessary channel disparity is mostly guaranteed by the nature of the acquisition scheme and the random processes therein. We refer the reader to Harikumar and Bresler (1999) for a relevant discussion. This channel disparity is also necessary for the BSR case. Let us first recall how to estimate blurs in the MBD case, and then we will show how to generalize the results for integer downsampling factors. For the time being we will omit the noise $n$, until Section 4, where we will address it appropriately.

3.1. THE MBD CASE
The downsampling matrix $S$ is not present in (4) and only convolution binds the input with the outputs. The acquisition model is of the SIMO type with one input channel $f$ and $K$ output channels $g_k$. Under the assumption of channel co-primeness, we can see that any two correct blurs $h_i$ and $h_j$ satisfy
$$g_i * h_j - g_j * h_i = 0. \tag{5}$$
Considering all possible pairs of blurs, we can arrange the above relations into one system
$$\mathcal{N} h = 0, \tag{6}$$
where $h = [h_1^T, \ldots, h_K^T]^T$ and $\mathcal{N}$ consists of matrices that perform convolution with the $g_k$. In most real situations the correct blur size (we have assumed a square size $H \times H$) is not known in advance, and therefore we can generate the above equation for different blur dimensions $\hat{H}_1 \times \hat{H}_2$. The nullity (null-space dimension) of $\mathcal{N}$ is exactly 1 for the correctly estimated blur size. By applying SVD (singular value decomposition), we recover the blurs precisely, except for a scalar factor. One can eliminate this magnitude ambiguity by stipulating that $\sum_{x,y} h_k[x, y] = 1$, which is a common brightness-preserving assumption. For an underestimated blur size, the above equation has no solution. If the blur size is overestimated, then $\mathrm{nullity}(\mathcal{N}) = (\hat{H}_1 - H + 1)(\hat{H}_2 - H + 1)$.
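For intuition, here is a toy two-channel sketch of this null-space procedure. The "valid" convolution matrices are built column by column by convolving the observed frame with unit impulses, and the blur size H is assumed known; this is only a schematic of the SVD step, not the chapter's implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def valid_conv_matrix(g, H):
    """Matrix C with C @ h.ravel() == convolve2d(g, h, mode='valid').ravel()."""
    cols = []
    for k in range(H * H):
        e = np.zeros((H, H)); e.flat[k] = 1.0
        cols.append(convolve2d(g, e, mode='valid').ravel())
    return np.stack(cols, axis=1)

def estimate_blurs_mbd(g1, g2, H):
    # Cross-relation (5) for K = 2: g2 * h1 - g1 * h2 = 0,
    # i.e. [C{g2}, -C{g1}] [h1; h2] = 0 -- the system N h = 0 of (6).
    N = np.hstack([valid_conv_matrix(g2, H), -valid_conv_matrix(g1, H)])
    _, _, Vt = np.linalg.svd(N, full_matrices=False)
    h = Vt[-1]                     # right singular vector of the smallest singular value
    h1 = h[:H * H].reshape(H, H)
    h2 = h[H * H:].reshape(H, H)
    # remove the scalar ambiguity via the brightness convention sum(h_k) = 1
    return h1 / h1.sum(), h2 / h2.sum()
```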
3.2. THE BSR CASE

Before we proceed, it is necessary to define the sampling matrix $S$ precisely. Let $S^\varepsilon_1$ denote a 1D sampling matrix, where $\varepsilon$ is the integer subsampling factor. Each row of the sampling matrix is a unit vector whose nonzero element is at such a position that, if the matrix multiplies an arbitrary vector $b$, the result
of the product is every $\varepsilon$th element of $b$, starting from $b_1$. If the vector length is $M$, then the size of the sampling matrix is $(M/\varepsilon) \times M$. If $M$ is not divisible by $\varepsilon$, we can pad the vector with an appropriate number of zeros to make it divisible. A 2D sampling matrix is defined by
$$S^\varepsilon := S^\varepsilon_1 \otimes S^\varepsilon_1, \tag{7}$$
where $\otimes$ denotes the matrix direct product (Kronecker product operator). Note that the transposed matrix $(S^\varepsilon)^T$ behaves as an upsampling operator that interlaces the original samples with $(\varepsilon - 1)$ zeros.

A naive approach, proposed e.g. in Šroubek and Flusser (2006) and Chen et al. (2005), is to modify (6) of the MBD case by applying downsampling and formulating the problem as
$$\min_h \left\| \mathcal{N} \left[ I_K \otimes S^\varepsilon U \right] h \right\|^2, \tag{8}$$
where $I_K$ is the $K \times K$ identity matrix. One can easily verify that the condition in (5) is not satisfied in the BSR case, as the presence of downsampling operators violates the commutative property of convolution. Even more disturbing is the fact that minimizers of (8) do not have to correspond to the correct blurs.

We are going to show that if one uses a slightly different approach, reconstruction of the volatile PSFs $h_k$ is possible even in the BSR case. However, we will see that some ambiguity in the solution of $h_k$ is inevitable. First, we need to rearrange the acquisition model (4) and construct from the LR images $g_k$ a convolution matrix $\mathcal{G}$ with a predetermined nullity. Then we take the null space of $\mathcal{G}$ and construct a matrix $\mathcal{N}$, which will contain the correct PSFs $h_k$ in its null space. Let $E \times E$ be the size of "nullifying" filters; the meaning of this name will become clear later. Define $\mathcal{G} := [G_1, \ldots, G_K]$, where $G_k := C^v_E\{g_k\}$ are "valid" convolution matrices. Assuming no noise, we can express $\mathcal{G}$ in terms of $f$, $u$ and $h_k$ as
$$\mathcal{G} = S^\varepsilon F U \mathcal{H}, \tag{9}$$
where
$$\mathcal{H} := \left[ C_{\varepsilon E}\{h_1\} (S^\varepsilon)^T, \ldots, C_{\varepsilon E}\{h_K\} (S^\varepsilon)^T \right], \tag{10}$$
$U := C_{\varepsilon E + H - 1}\{u\}$ and $F := C^v_{\varepsilon E + H + U - 2}\{f\}$. The convolution matrix $U$ has more rows than columns and is therefore of full column rank (see the proof in Harikumar and Bresler (1999) for general convolution matrices). We assume that $S^\varepsilon F$ has full column rank as well. This is almost certainly true for real images if $F$ has at least $\varepsilon^2$-times more rows than columns. Thus $\mathrm{Null}(\mathcal{G}) \equiv \mathrm{Null}(\mathcal{H})$, and the difference between the number of
columns and rows of $\mathcal{H}$ bounds the null-space dimension from below, i.e.,
$$\mathrm{nullity}(\mathcal{G}) \ge K E^2 - (\varepsilon E + H - 1)^2. \tag{11}$$
Setting $N := K E^2 - (\varepsilon E + H - 1)^2$ and $\mathcal{N} := \mathrm{Null}(\mathcal{G})$, we visualize the null space as
$$\mathcal{N} = \begin{bmatrix} n_{1,1} & \cdots & n_{1,N} \\ \vdots & \ddots & \vdots \\ n_{K,1} & \cdots & n_{K,N} \end{bmatrix}, \tag{12}$$
where $n_{kn}$ is the vector representation of the nullifying filter $\eta_{kn}$ of size $E \times E$, $k = 1, \ldots, K$ and $n = 1, \ldots, N$. Let $\tilde{\eta}_{kn}$ denote $\eta_{kn}$ upsampled by the factor $\varepsilon$, i.e., $\tilde{\eta}_{kn} := (S^\varepsilon)^T \eta_{kn}$. Then we define
$$\mathcal{N} := \begin{bmatrix} C_H\{\tilde{\eta}_{1,1}\} & \cdots & C_H\{\tilde{\eta}_{K,1}\} \\ \vdots & \ddots & \vdots \\ C_H\{\tilde{\eta}_{1,N}\} & \cdots & C_H\{\tilde{\eta}_{K,N}\} \end{bmatrix} \tag{13}$$
and conclude that
$$\mathcal{N} h = 0, \tag{14}$$
where $h^T = [h_1^T, \ldots, h_K^T]$. We have arrived at an equation that is of the same form as (6) in the MBD case. Here we have the solution to the blur estimation problem for the BSR case. However, since $S^\varepsilon$ is involved, the ambiguity of the solution is higher. Without proofs, we provide the following statements. For the correct blur size, $\mathrm{nullity}(\mathcal{N}) = \varepsilon^4$. For an underestimated blur size, (14) has no solution. For an overestimated blur size $\hat{H}_1 \times \hat{H}_2$, $\mathrm{nullity}(\mathcal{N}) = \varepsilon^2 (\hat{H}_1 - H + \varepsilon)(\hat{H}_2 - H + \varepsilon)$. The conclusion may seem pessimistic: for example, for $\varepsilon = 2$ the nullity is at least 16, and for $\varepsilon = 3$ the nullity is already 81. Nevertheless, Section 4 will show that $\mathcal{N}$ plays an important role in the regularized restoration algorithm and its ambiguity is not a serious drawback.

It is interesting to note that a similar derivation is possible for rational SR factors $\varepsilon = p/q$. We downsample the LR images with the factor $q$, thereby creating $q^2 K$ images, and apply the above procedure thereon for the SR factor $p$. Another consequence of the above derivation is the minimum number of LR images necessary for the blur reconstruction to work. The condition on the nullity of $\mathcal{G}$ in (11) implies that the minimum number is $K > \varepsilon^2$. For example, for $\varepsilon = 3/2$, 3 LR images are sufficient; for $\varepsilon = 2$, we need at least 5 LR images to perform blur reconstruction.
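The sampling operator of (7) and its transpose are easy to verify numerically; a small sketch with illustrative sizes follows.

```python
import numpy as np

def sampling_matrix_1d(M, eps):
    """1D sampling matrix S_1^eps: picks every eps-th entry, starting from the first."""
    rows = np.arange(0, M, eps)
    S = np.zeros((len(rows), M))
    S[np.arange(len(rows)), rows] = 1.0
    return S

M, eps = 8, 2
S1 = sampling_matrix_1d(M, eps)
S = np.kron(S1, S1)                  # 2D sampling matrix, eq. (7)
b = np.arange(M * M, dtype=float)    # a vectorized M x M image
down = S @ b                         # every eps-th sample in each direction
up = S.T @ down                      # upsampling: interlaces (eps - 1) zeros
```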
4. Blind Superresolution

In order to solve the BSR problem, i.e., determine the HR image $f$ and the volatile PSFs $h_k$, we adopt a classical approach of minimizing a regularized energy function. This way the method will be less vulnerable to noise and better posed. The energy consists of three terms and takes the form
$$E(f, h) = \sum_{k=1}^{K} \|D H_k f - g_k\|^2 + \alpha Q(f) + \beta R(h). \tag{15}$$
The first term measures the fidelity to the data and emanates from our acquisition model (4). The remaining two are regularization terms with positive weighting constants $\alpha$ and $\beta$ that attract the minimum of $E$ to an admissible set of solutions. The form of $E$ very much resembles the energy proposed in Šroubek and Flusser (2005) for MBD. Indeed, this should not come as a surprise, since MBD and SR are related problems in our formulation.

The regularization $Q(f)$ is a smoothing term of the form
$$Q(f) = f^T L f, \tag{16}$$
where $L$ is a high-pass filter. A common strategy is to use convolution with the Laplacian for $L$, which in the continuous case corresponds to $Q(f) = \int |\nabla f|^2$. Recently, variational integrals $Q(f) = \int \phi(|\nabla f|)$ were proposed, where $\phi$ is a strictly convex, nondecreasing function that grows at most linearly. Examples of $\phi(s)$ are $s$ (total variation), $\sqrt{1 + s^2} - 1$ (hypersurface minimal function), $\log(\cosh(s))$, or nonconvex functions, such as $\log(1 + s^2)$, $s^2/(1 + s^2)$ and $\arctan(s^2)$ (Mumford–Shah functional). The advantage of the variational approach is that, while in smooth areas it has the same isotropic behavior as the Laplacian, it also preserves edges in images. The disadvantage is that it is highly nonlinear; to overcome this difficulty one must use, e.g., the half-quadratic algorithm (Aubert and Kornprobst, 2002). For the purpose of our discussion it suffices to state that after discretization we arrive again at (16), where this time $L$ is a positive semidefinite block tridiagonal matrix constructed of values depending on the gradient of $f$. The rationale behind the choice of $Q(f)$ is to constrain the local spatial behavior of images; it resembles a Markov random field. Some global constraints may be more desirable but are difficult (often impossible) to define, since we develop a general method that should work with any class of images.

The PSF regularization term $R(h)$ follows directly from the conclusions of the previous section. Since the matrix $\mathcal{N}$ in (13) contains the correct PSFs $h_k$ in its null space, we define the regularization term as a least-squares fit
$$R(h) = \|\mathcal{N} h\|^2 = h^T \mathcal{N}^T \mathcal{N} h. \tag{17}$$
The product $\mathcal{N}^T \mathcal{N}$ is a positive semidefinite matrix. More precisely, $R$ is a consistency term that binds the different volatile PSFs to prevent them from moving freely and, unlike the fidelity term (the first term in (15)), it is based solely on the observed LR images. A good practice is to include, with a small weight, a smoothing term $h^T L h$ in $R(h)$. This is especially useful in the case of less noisy data, to overcome the higher nullity of $\mathcal{N}$. The complete energy then takes the form
$$E(f, h) = \sum_{k=1}^{K} \|D H_k f - g_k\|^2 + \alpha f^T L f + \beta_1 \|\mathcal{N} h\|^2 + \beta_2 h^T L h. \tag{18}$$
To find a minimizer of the energy function, we perform alternating minimizations (AM) of $E$ over $f$ and $h$. The advantage of this scheme lies in its simplicity. Each term of (18) is quadratic and therefore convex (but not necessarily strictly convex), and the derivatives w.r.t. $f$ and $h$ are easy to calculate. This AM approach is a variation on the steepest-descent algorithm. The search space is a concatenation of the blur subspace and the image subspace. The algorithm first descends in the image subspace and, after reaching the minimum, i.e., $\nabla_f E = 0$, it advances in the blur subspace in the direction $\nabla_h E$ orthogonal to the previous one, and this scheme repeats. In conclusion, starting with some initial $h^0$, the two iterative steps are:

Step 1.
$$f^m = \arg\min_f E(f, h^m) \;\Leftrightarrow\; \Big( \sum_{k=1}^{K} H_k^T D^T D H_k + \alpha L \Big) f = \sum_{k=1}^{K} H_k^T D^T g_k, \tag{19}$$

Step 2.
$$h^{m+1} = \arg\min_h E(f^m, h) \;\Leftrightarrow\; \big( [I_K \otimes F^T D^T D F] + \beta_1 \mathcal{N}^T \mathcal{N} + \beta_2 L \big) h = [I_K \otimes F^T D^T] g, \tag{20}$$

where $F := C^v_H\{f\}$, $g := [g_1^T, \ldots, g_K^T]^T$ and $m$ is the iteration step. Note that both steps consist of simple linear equations.

Energy $E$ as a function of both variables $f$ and $h$ is not convex, due to the coupling of the variables via convolution in the first term of (18). Therefore, it is not guaranteed that the BSR algorithm reaches the global minimum. In our experience, convergence properties improve significantly if we add feasible regions for the HR image and the PSFs, specified as lower and upper bound constraints. To solve Step 1, we use the method of conjugate gradients (function cgs in Matlab) and then adjust the solution $f^m$ to contain values in the admissible range, typically the range of values of $g$. It is common to assume that a PSF is positive ($h_k \ge 0$) and that it preserves image brightness.
We can therefore write the lower and upper bound constraints for the PSFs as $h_k \in [0, 1]^{H^2}$. In order to enforce the bounds in Step 2, we solve (20) as a constrained minimization problem (function fmincon in Matlab) rather than using the projection as in Step 1. Constrained minimization problems are more computationally demanding, but we can afford this in our case, since the size of $h$ is much smaller than the size of $f$.

The weighting constants $\alpha$ and $\beta_i$ depend on the level of noise. If noise increases, $\alpha$ and $\beta_2$ should increase and $\beta_1$ should decrease. One can use parameter estimation techniques, such as cross-validation (Nguyen et al., 2001) or expectation maximization (Molina et al., 2003), to determine the correct weights. However, in our experiments we set the values manually, according to a visual assessment. If the iterative algorithm begins to amplify noise, we have underestimated the noise level; on the contrary, if the algorithm begins to segment the image, we have overestimated the noise level.
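The following dense-matrix sketch of one AM iteration mirrors (19)–(20) on a toy scale. All operator builders (D, conv_mat_h, conv_mat_f, the Laplacians Lf and Lh, and NtN from (17)) are hypothetical helpers, and the final clip is only a crude stand-in for the constrained solver (the chapter uses cgs for Step 1 and fmincon for Step 2).

```python
import numpy as np

def am_iteration(h, g_list, D, conv_mat_h, conv_mat_f, Lf, Lh, NtN,
                 alpha, beta1, beta2):
    K = len(g_list)
    blurs = np.split(h, K)

    # Step 1, eq. (19): linear system in the HR image f, blurs fixed
    Hk = [conv_mat_h(hk) for hk in blurs]
    A = sum(H.T @ D.T @ D @ H for H in Hk) + alpha * Lf
    b = sum(H.T @ D.T @ gk for H, gk in zip(Hk, g_list))
    f = np.linalg.solve(A, b)
    lo = min(gk.min() for gk in g_list)
    hi = max(gk.max() for gk in g_list)
    f = np.clip(f, lo, hi)                 # project into the admissible range

    # Step 2, eq. (20): linear system in all blurs h, image fixed
    F = conv_mat_f(f)                      # "valid" convolution matrix C_H^v{f}
    M = np.kron(np.eye(K), F.T @ D.T @ D @ F) + beta1 * NtN \
        + beta2 * np.kron(np.eye(K), Lh)   # per-blur smoothing (our reading of the L term)
    rhs = np.kron(np.eye(K), F.T @ D.T) @ np.concatenate(g_list)
    h = np.clip(np.linalg.solve(M, rhs), 0.0, 1.0)  # crude bound enforcement
    return f, h
```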
5. Experiments

This section consists of two parts. In the first, a set of experiments on synthetic data evaluates the performance of the BSR algorithm with respect to the SR factor and compares the reconstruction quality with other methods. The second part demonstrates the applicability of the proposed method to real data. The results are not evaluated with any measure of reconstruction quality, such as mean-square error or peak signal-to-noise ratio. Instead, we print the results and leave the comparison to the human eye, as we believe that in this case visual assessment is the only reasonable method.

In all the experiments the sensor blur is fixed and set to a Gaussian function of standard deviation $\sigma = 0.34$ (relative to the scale of the LR images). One should underline that the proposed BSR method is fairly robust to the choice of the Gaussian variance, since it can compensate for an insufficient variance by automatically including the missing factor of Gaussian functions in the volatile blurs.

Another potential pitfall that we have to take into consideration is the feasible range of SR factors. Clearly, as the SR factor $\varepsilon$ increases, we need more LR images and the stability of BSR decreases. In addition, rational SR factors $p/q$, where $p$ and $q$ are coprime and large regardless of the effective value of $\varepsilon$, also make the BSR algorithm unstable: it is the numerator $p$ that determines the internal SR factor used in the algorithm. Hence we limit ourselves to $\varepsilon$ between 1 and 2.5, such as 3/2, 5/3, 2, etc., which is sufficient in most practical applications.
Figure 2. Simulated data: (a) original 150 × 230 image; (b) six 7 × 7 volatile PSFs used to blur the original image.
5.1. SIMULATED DATA
First, let us demonstrate the BSR performance with a simple experiment. The 150 × 230 image in Figure 2a, blurred with the six masks in Figure 2b and downsampled by a factor of 2, generated six LR images. In this case registration is not necessary, since the synthetic data are precisely aligned. Using the LR images as input, we estimated the original HR image with the proposed BSR algorithm for $\varepsilon = 1.25$ and $1.75$. In Figure 3 one can compare the results, printed in their original size. The HR image for $\varepsilon = 1.25$ (Figure 3b) has improved significantly over the LR images due to deconvolution; however, some details on the column are still distorted. For the SR factor 1.75, the reconstructed image in Figure 3c is almost perfect.

Next we compare the performance of the BSR algorithm with two methods: an interpolation technique and a state-of-the-art SR method. The former consists of the MBD method proposed in Šroubek and Flusser (2005) followed by standard bilinear interpolation (BI) resampling: the MBD method first removes the volatile blurs, and then BI of the deconvolved image achieves the desired spatial resolution. The latter method, which we will call herein a "standard SR algorithm", is a MAP formulation of the SR problem proposed, e.g., in Hardie et al. (1997) and Segall et al. (2004). This method uses a MAP framework for the joint estimation of image registration parameters (in our case only translation) and the HR image, assuming only the sensor blur ($U$) and no volatile blurs. For an image prior, we use edge-preserving Huber Markov random fields (Capel, 2004).
Figure 3. BSR of simulated data: (a) one of six LR images with the downsampling factor 2; (b) BSR for ε = 1.25; (c) BSR for ε = 1.75.
In the case of BSR, Section 3 showed that two distinct approaches exist for blur estimation: either we use the naive approach in (8), which directly utilizes the MBD formulation, or we apply the intrinsic SR approach summarized in (14). Altogether we thus have four distinct methods for comparison: the standard SR approach, MBD with interpolation, BSR with naive blur regularization, and BSR with intrinsic blur regularization. Using the original image and PSFs in Figure 2, six LR images (see one LR image in Figure 3a) were generated as in the first experiment, only this time we added white Gaussian noise with SNR = 50 dB. (The signal-to-noise ratio is defined as $\mathrm{SNR} = 10 \log(\sigma_f^2/\sigma_n^2)$, where $\sigma_f$ and $\sigma_n$ are the image and noise standard deviations, respectively.)

Estimated HR images and volatile blurs for all four methods are shown in Figure 4. The standard SR approach in Figure 4a gives unsatisfactory results, since heavy blurring is present in the LR images and the method assumes only the sensor blur and no volatile blurs. (For this reason, we do not show volatile blurs in this case.) The MBD method in Figure 4b ignores the decimation operator, and thus the estimated volatile blurs are similar to LR projections of the original blurs. Despite the fact that blind deconvolution in the first stage performed well, many details are still missing, since interpolation in the second stage cannot properly recover high-frequency information.
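For reference, generating noise at a prescribed SNR in this sense is a one-liner (a hypothetical helper, using the definition above):

```python
import numpy as np

def add_noise_at_snr(img, snr_db, seed=0):
    # SNR = 10 log10(sigma_f^2 / sigma_n^2)  =>  sigma_n = sigma_f * 10^(-SNR/20)
    sigma_n = img.std() * 10.0 ** (-snr_db / 20.0)
    return img + np.random.default_rng(seed).normal(0.0, sigma_n, img.shape)
```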
Figure 4. Comparison of four different SR approaches (ε = 2): (a) standard SR method, (b) MBD followed by bilinear interpolation, (c) naive BSR approach and (d) proposed intrinsic BSR approach. Volatile blurs estimated by each method, except in the case of standard SR, are in the top row. Due to blurring, the standard SR method in (a) failed to reconstruct the HR image. MBD in (b) provided a good estimate of the blurs in the LR scale and performed correct deconvolution, but the HR image lacks many details, as simple interpolation increased the resolution. Both BSR approaches in (c) and (d) gave close-to-perfect results; however, in the case of the naive approach, inaccurate blur regularization resulted in several artifacts in the HR image.
Both the naive and the intrinsic BSR methods outperformed the previous approaches, and the intrinsic one provides a close-to-perfect HR image. Due to the inaccurate regularization term in the naive approach, the estimated blurs contain tiny erroneous components that resulted in artifacts in the HR image (Figure 4c). The stricter and more accurate regularization term of the intrinsic BSR approach improved the results, as one can see in Figure 4d.

5.2. REAL DATA
The next two experiments demonstrate the true power of our fusion algorithm. We used real photos acquired with two different devices: a webcam and a standard digital camera. The webcam was a Logitech QuickCam for Notebooks Pro, with a maximum video resolution of 640 × 480 and a minimum shutter speed of 1/10 s. The digital camera was a 5-Mpixel Olympus C5050Z equipped with 3× optical zoom. In both experiments we used cross-correlation to roughly register the LR images.
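A rough translation-only registration by cross-correlation, as used here, can be sketched as follows (sign conventions and sub-pixel refinement omitted; the helper name is ours):

```python
import numpy as np
from scipy.signal import fftconvolve

def coarse_shift(ref, img):
    """Integer shift of img relative to ref from the cross-correlation peak."""
    ref0 = ref - ref.mean()
    img0 = img - img.mean()
    corr = fftconvolve(ref0, img0[::-1, ::-1], mode='same')  # cross-correlation
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    center = (np.array(corr.shape) - 1) // 2
    return np.array(peak) - center                           # (dy, dx)
```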
Figure 5. Reconstruction of images acquired with a webcam (ε = 2): (a) one of ten LR frames extracted from a short video sequence captured with the webcam, zero-order interpolation; (b) HR image and blurs estimated by the BSR algorithm. Note that many facial features, such as glasses, are not apparent in the LR image, but are well reconstructed in the HR image.
In the first experiment we held the webcam in our hands and captured a short video sequence of a human face. We then extracted 10 consecutive frames and considered a small section of size 40 × 50. One frame with zero-order interpolation is shown in Figure 5a; the other frames look similar. The long shutter speed (1/10 s), together with the inevitable motion of the hands, introduced blurring into the images. In this experiment the SR factor was set to 2. The proposed BSR algorithm removed the blurring and performed SR correctly, as one can see in Figure 5b. Note that many facial features (eyes, glasses, mouth) indistinguishable in the original LR image became visible in the HR image.

The second experiment demonstrates a task of license plate recognition. With the digital camera we took eight photos, registered them with cross-correlation, and cropped each to a 100 × 50 rectangle. All eight cuttings, printed in their original size (no interpolation), including one image enlarged with zero-order interpolation, are shown in Figure 6a. As in the previous experiment, the camera was held in the hands and, due to the longer shutter speed, the LR images exhibit subtle blurring. We set the SR factor to 5/3. In order to
Figure 6. Reconstruction of images acquired with a digital camera (ε = 5/3): (a) eight LR images, one enlarged with zero-order interpolation; (b) HR image estimated by the BSR algorithm; (c) image acquired with optical zoom 1.7×. The BSR algorithm achieved reconstruction comparable to the image with optical zoom.
better assess the obtained results, we took one additional image with optical zoom 1.7× (close to the desired SR factor 5/3). This image served as the ground truth; see Figure 6c. The proposed BSR method returned a well-reconstructed HR image (Figure 6b), which is comparable to the ground truth acquired with the optical zoom.

6. Conclusions

In this chapter we proposed a method for improving the visual quality and spatial resolution of digital images acquired by low-resolution sensors. The method is based on fusing several images (channels) of the same scene. It consists of three major steps – image registration, blind deconvolution and superresolution enhancement. We reviewed all three steps, and we paid special attention to superresolution fusion. We proposed a unifying system that simultaneously estimates the image blurs and recovers the original undistorted image, all in high resolution, without any prior knowledge of the blurs and the original image. We accomplished this by formulating the problem as constrained least squares
energy minimization with appropriate regularization terms, which guarantees a close-to-perfect solution. By showing the good performance of the method on real data, we demonstrated its capability to improve the image quality significantly and, consequently, to make the task of object detection and identification much easier for human observers as well as for automatic systems. We envisage the application of the proposed method in security and surveillance systems.
Acknowledgements

This work has been supported by the Czech Ministry of Education under project No. 1M0572 (Research Center DAR) and by the Grant Agency of the Czech Republic under project No. 102/04/0155.
References

Aubert, G. and Kornprobst, P. (2002) Mathematical Problems in Image Processing, New York, Springer Verlag.
Bentoutou, Y., Taleb, N., Mezouar, M., Taleb, M., and Jetto, L. (2002) An Invariant Approach for Image Registration in Digital Subtraction Angiography, Pattern Recognition 35, 2853–2865.
Capel, D. (2004) Image Mosaicing and Super-Resolution, New York, Springer.
Chan, T. and Wong, C. (1998) Total Variation Blind Deconvolution, IEEE Transactions on Image Processing 7, 370–375.
Chen, Y., Luo, Y., and Hu, D. (2005) A General Approach to Blind Image Super-resolution Using a PDE Framework, In Proceedings of SPIE, Vol. 5960, pp. 1819–1830.
Farsiu, S., Robinson, M., Elad, M., and Milanfar, P. (2004) Fast and Robust Multiframe Super Resolution, IEEE Transactions on Image Processing 13, 1327–1344.
Farsiu, S., Robinson, D., Elad, M., and Milanfar, P. (2004) Advances and Challenges in Super-Resolution, International Journal of Imaging Systems and Technology 14, 47–57.
Flusser, J., Boldyš, J., and Zitová, B. (2003) Moment Forms Invariant to Rotation and Blur in Arbitrary Number of Dimensions, IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 234–246.
Flusser, J. and Suk, T. (1998) Degraded Image Analysis: an Invariant Approach, IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 590–603.
Flusser, J., Suk, T., and Saic, S. (1996) Recognition of Blurred Images by the Method of Moments, IEEE Transactions on Image Processing 5, 533–538.
Flusser, J. and Zitová, B. (1999) Combined Invariants to Linear Filtering and Rotation, International Journal of Pattern Recognition and Artificial Intelligence 13, 1123–1136.
Flusser, J., Zitová, B., and Suk, T. (1999) Invariant-based Registration of Rotated and Blurred Images, In I. S. Tammy (ed.), Proceedings IEEE 1999 International Geoscience and Remote Sensing Symposium, Los Alamitos, IEEE Computer Society, pp. 1262–1264.
Giannakis, G. and Heath, R. (2000) Blind Identification of Multichannel FIR Blurs and Perfect Image Restoration, IEEE Transactions on Image Processing 9, 1877–1896.
Haindl, M. (2000) Recursive Model-based Image Restoration, In Proceedings of the 15th International Conference on Pattern Recognition, Vol. III, Piscataway, NJ, IEEE Press, pp. 346–349.
Hardie, R., Barnard, K., and Armstrong, E. (1997) Joint MAP Registration and High-Resolution Image Estimation Using a Sequence of Undersampled Images, IEEE Transactions on Image Processing 6, 1621–1633.
Harikumar, G. and Bresler, Y. (1999) Perfect Blind Restoration of Images Blurred by Multiple Filters: Theory and Efficient Algorithms, IEEE Transactions on Image Processing 8, 202–219.
Kubota, A., Kodama, K., and Aizawa, K. (1999) Registration and Blur Estimation Methods for Multiple Differently Focused Images, In Proceedings International Conference on Image Processing, Vol. II, pp. 447–451.
Kundur, D. and Hatzinakos, D. (1996a) Blind Image Deconvolution, IEEE Signal Processing Magazine 13, 43–64.
Kundur, D. and Hatzinakos, D. (1996b) Blind Image Deconvolution Revisited, IEEE Signal Processing Magazine 13, 61–63.
Lagendijk, R., Biemond, J., and Boekee, D. (1990) Identification and Restoration of Noisy Blurred Images Using the Expectation-maximization Algorithm, IEEE Transactions on Acoustics, Speech, and Signal Processing 38, 1180–1191.
Molina, R., Vega, M., Abad, J., and Katsaggelos, A. (2003) Parameter Estimation in Bayesian High-resolution Image Reconstruction With Multisensors, IEEE Transactions on Image Processing 12, 1655–1667.
Myles, Z. and Lobo, N. V. (1998) Recovering Affine Motion and Defocus Blur Simultaneously, IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 652–658.
Nguyen, N., Milanfar, P., and Golub, G. (2001) Efficient Generalized Cross-validation With Applications to Parametric Image Restoration and Resolution Enhancement, IEEE Transactions on Image Processing 10, 1299–1308.
Pai, H.-T. and Bovik, A. (2001) On Eigenstructure-based Direct Multichannel Blind Image Restoration, IEEE Transactions on Image Processing 10, 1434–1446.
Panci, G., Campisi, P., Colonnese, S., and Scarano, G. (2003) Multichannel Blind Image Deconvolution Using the Bussgang Algorithm: Spatial and Multiresolution Approaches, IEEE Transactions on Image Processing 12, 1324–1337.
Park, S., Park, M., and Kang, M. (2003) Super-resolution Image Reconstruction: a Technical Overview, IEEE Signal Processing Magazine 20, 21–36.
Reeves, S. and Mersereau, R. (1992) Blur Identification by the Method of Generalized Cross-validation, IEEE Transactions on Image Processing 1, 301–311.
Segall, C., Katsaggelos, A., Molina, R., and Mateos, J. (2004) Bayesian Resolution Enhancement of Compressed Video, IEEE Transactions on Image Processing 13, 898–911.
Shechtman, E., Caspi, Y., and Irani, M. (2005) Space-time Super-resolution, IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 531–545.
Šroubek, F. and Flusser, J. (2003) Multichannel Blind Iterative Image Restoration, IEEE Transactions on Image Processing 12, 1094–1106.
Šroubek, F. and Flusser, J. (2005) Multichannel Blind Deconvolution of Spatially Misaligned Images, IEEE Transactions on Image Processing 14, 874–883.
Šroubek, F. and Flusser, J. (2006) Resolution Enhancement Via Probabilistic Deconvolution of Multiple Degraded Images, Pattern Recognition Letters 27, 287–293.
Wirawan, Duhamel, P., and Maitre, H. (1999) Multi-channel High Resolution Blind Image Restoration, In Proceedings of IEEE ICASSP, pp. 3229–3232.
Woods, N., Galatsanos, N., and Katsaggelos, A. (2003) EM-based Simultaneous Registration, Restoration, and Interpolation of Super-resolved Images, In Proceedings of IEEE ICIP, Vol. 2, pp. 303–306.
Woods, N., Galatsanos, N., and Katsaggelos, A. (2006) Stochastic Methods for Joint Registration, Restoration, and Interpolation of Multiple Undersampled Images, IEEE Transactions on Image Processing 15, 201–213.
Yagle, A. (2003) Blind Superresolution From Undersampled Blurred Measurements, In Advanced Signal Processing Algorithms, Architectures, and Implementations XIII, Vol. 5205, Bellingham, pp. 299–309, SPIE.
Zhang, Y., Wen, C., and Zhang, Y. (2000) Estimation of Motion Parameters from Blurred Images, Pattern Recognition Letters 21, 425–433.
Zhang, Y., Wen, C., Zhang, Y., and Soh, Y. C. (2002) Determination of Blur and Affine Combined Invariants by Normalization, Pattern Recognition 35, 211–221.
Zhang, Z. and Blum, R. (2001) A Hybrid Image Registration Technique for a Digital Camera Image Fusion Application, Information Fusion 2, 135–149.
Zitová, B. and Flusser, J. (2003) Image Registration Methods: a Survey, Image and Vision Computing 21, 977–1000.
Zitová, B., Kautsky, J., Peters, G., and Flusser, J. (1999) Robust Detection of Significant Points in Multiframe Images, Pattern Recognition Letters 20, 199–206.
NONLINEAR STATISTICAL SIGNAL PROCESSING: A PARTICLE FILTERING APPROACH
J. V. Candy ([email protected])
University of California, Lawrence Livermore National Laboratory & Santa Barbara, Livermore, CA 94551
Abstract. An introduction to particle filtering is discussed, starting with an overview of Bayesian inference from batch to sequential processors. Once the evolving Bayesian paradigm is established, simulation-based methods using sampling theory and Monte Carlo realizations are discussed. Here the usual limitations of nonlinear approximations and non-Gaussian processes prevalent in classical nonlinear processing algorithms (e.g. Kalman filters) are no longer a restriction on performing Bayesian inference. It is shown how the underlying hidden or state variables are easily assimilated into this Bayesian construct. Importance sampling methods are then discussed, and it is shown how they can be extended to sequential solutions implemented using Markovian state-space models as a natural evolution. With this in mind, the idea of a particle filter, which is a discrete representation of a probability distribution, is developed, and it is shown how it can be implemented using sequential importance sampling/resampling methods. Finally, an application is briefly discussed, comparing the performance of particle filter designs with classical nonlinear filter implementations.

Key words: particle filtering, Bayesian processing, Bayesian approach, simulation-based sampling, nonlinear signal processing
1. Introduction

In this chapter we develop the "Bayesian approach" to signal processing for a variety of useful model sets. It features the next generation of processor, recently enabled by the advent of high-speed/high-throughput computers. The emphasis is on nonlinear/non-Gaussian problems, but classical techniques are included as special cases, to enable the reader familiar with such methods to draw a parallel between the approaches. The common ground is the model sets. Here the state-space approach is emphasized because of its inherent applicability to a wide variety of problems, both linear and nonlinear, as well as time-invariant and time-varying, including what
has become popularly termed "physics-based" models. Here we discuss the next generation of processors that will clearly dominate the future of model-based signal processing for years to come (Candy, 2006). This chapter offers a unique perspective on signal processing from the Bayesian viewpoint, in contrast to the pure statistical approach. The underlying theme of the chapter is the Bayesian approach, which is uniformly developed and followed throughout.
2. Bayesian Approach to Signal Processing

In this section we motivate the idea of Bayesian estimation from the purely probabilistic perspective; that is, we do not consider underlying models at all, just densities and distributions. Modern statistical signal processing techniques evolve directly from a Bayesian perspective, that is, they are cast into a probabilistic framework using Bayes' theorem as the fundamental construct. Bayesian techniques are constructed simply around Bayes' theorem. More specifically, the information about the random signal, $x(t)$, required to solve a vast majority of estimation/processing problems is incorporated in the underlying probability distribution generating the process. For instance, the usual signal enhancement problem is concerned with providing the "best" (in some sense) estimate of the signal at time $t$ based on all of the data available at that time. The filtering distribution provides that information directly in terms of its underlying statistics. That is, by calculating the statistics of the process directly from the filtering distribution, the enhanced signal can be extracted using a variety of estimators, like maximum a posteriori, maximum likelihood and minimum mean-squared error, accompanied by a variety of performance statistics, such as error covariances and bounds (Candy, 2006; Sullivan and Candy, 1997).

We cast this discussion into a dynamic variable/parameter structure by defining the "unobserved" signal, or equivalently "hidden" variables, as the set of $N_x$-vectors $\{x(t)\}$, $t = 0, \ldots, N$. On the other hand, we define the observables, or equivalently measurements, as the set of $N_y$-vectors $\{y(t)\}$, $t = 0, \ldots, N$, considered to be conditionally independent of the signal variables. The goal in recursive Bayesian estimation is to sequentially (in time) estimate the joint posterior distribution $\Pr(x(0), \ldots, x(N); y(0), \ldots, y(N))$. Once the posterior is estimated, many of the interesting statistics characterizing the process under investigation can be exploited to extract meaningful information. We start by defining two sets of random (vector) processes: $X_t := \{x(0), \ldots, x(t)\}$ and $Y_t := \{y(0), \ldots, y(t)\}$. Here we can consider $X_t$ to be the set of dynamic random variables or parameters of interest, and $Y_t$ as the
set of measurements or observations of the desired process. (In Kalman filtering theory, the $X_t$ are considered the states or hidden variables, not necessarily observable directly, while the $Y_t$ are observed or measured directly.) In any case, we start with Bayes' theorem for the joint posterior distribution:
$$\Pr(X_t|Y_t) = \frac{\Pr(Y_t|X_t) \times \Pr(X_t)}{\Pr(Y_t)}. \tag{1}$$
In Bayesian theory, the posterior, defined by $\Pr(X_t|Y_t)$, is decomposed in terms of the prior $\Pr(X_t)$, its likelihood $\Pr(Y_t|X_t)$ and the evidence or normalizing factor $\Pr(Y_t)$. Each has a particular significance in this construct. It has been shown (Doucet et al., 2001) that the joint posterior distribution can be expressed, sequentially, as the joint sequential Bayesian posterior estimator
$$\Pr(X_t|Y_t) = \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{\Pr(y(t)|Y_{t-1})} \, \Pr(X_{t-1}|Y_{t-1}). \tag{2}$$
This result is satisfying in the sense that we need only know the joint posterior distribution at the previous stage, $t - 1$, scaled by a weighting function, to sequentially propagate the posterior to the next stage, that is,
$$\underbrace{\Pr(X_t|Y_t)}_{\text{new}} = \underbrace{\mathcal{W}(t, t-1)}_{\text{weight}} \times \underbrace{\Pr(X_{t-1}|Y_{t-1})}_{\text{old}}, \tag{3}$$
where the weight is defined by
$$\mathcal{W}(t, t-1) := \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{\Pr(y(t)|Y_{t-1})}.$$
Even though this expression provides the full joint posterior solution, it is not physically realizable unless the distributions are known in closed form and the underlying multiple integrals or sums can be analytically determined. In fact, a more useful solution is the marginal posterior distribution (Doucet et al., 2001), given by the update recursion
$$\underbrace{\Pr(x(t)|Y_t)}_{\text{posterior}} = \frac{\overbrace{\Pr(y(t)|x(t))}^{\text{likelihood}} \times \overbrace{\Pr(x(t)|Y_{t-1})}^{\text{prior}}}{\underbrace{\Pr(y(t)|Y_{t-1})}_{\text{evidence}}}. \tag{4}$$
(Note that this expression precisely satisfies Bayes' rule, as the labels in the equation illustrate.)
Here, as in the full joint case above, we can consider the update or filtering distribution as a weighting of the prediction distribution, that is,
$$\underbrace{\Pr(x(t)|Y_t)}_{\text{update}} = \underbrace{\mathcal{W}_c(t, t-1)}_{\text{weight}} \times \underbrace{\Pr(x(t)|Y_{t-1})}_{\text{prediction}}, \tag{5}$$
Pr(y(t)|x(t)) . Pr(y(t)|Yt−1 )
We summarize the sequential Bayesian processor in Table I. These two sequential relations form the theoretical foundation of many of the sequential particle filter designs. Next we consider the idea of Bayesian importance sampling.
3. Monte Carlo Approach In signal processing, we are interested in some statistical measure of a random signal or parameter usually expressed in terms of its moments. For example, suppose we have some signal function, say f (X ), with respect to some underlying probabilistic distribution, Pr(X ), then a typical measure to seek is its performance “on the average” which is characterized by the expectation E X { f (X )} = f (X )Pr(X ) d X. (6) Instead of attempting to use numerical integration techniques, stochastic sampling techniques known as Monte Carlo (MC) integration have evolved as an alternative. The key idea embedded in the MC approach is to represent the required distribution as a set of random samples rather than a specific analytic function (e.g. Gaussian). As the number of samples becomes large, they provide an equivalent representation of the distribution enabling moments to be estimated directly.
NONLINEAR STATISTICAL SIGNAL PROCESSING
133
MC integration draws samples from the required distribution and then forms sample averages to approximate the sought after distributions, that is, it maps integrals to discrete sums. Thus, MC integration evaluates Equation 6 by drawing samples, {X (i)} from Pr(X ) with “→” defined as drawn from. Assuming perfect sampling, this produces the estimated or empirical distribution given by N 1
ˆ Pr(X )≈ δ (X − X (i)) N i=1
which is a probability distribution of mass or weights, 1/N , and random variable or location X (i). Substituting the empirical distribution into the integral gives N 1
ˆ f (X )Pr(X ) dX ≈ f (X (i)) ≡ f (7) E X { f (X )} = N i=1 which follows directly from the sifting property of the delta function. Here f is said to be a MC estimate of E X { f (X )}. A generalization to the MC approach is known as importance sampling which evolves from: g(x) I = × q(x) d x for q(x) d x = 1. (8) g(x) d x = q(x) X X Here q(x) is referred to as the sampling distribution or more appropriately the importance sampling distribution, since it samples the target distribution, g(x), non-uniformly giving “more importance” to some values of g(x) than others. We say that the support of q(x) covers that of g(x), that is, the samples drawn from q(·) overlap the same region (or more) corresponding to the samples of g(·). The integral in Equation 8 can be estimated by: r Draw N -samples from ˆ q(x) : X (i) → q(x) and q(x) ≈
N 1
δ(x − X (i)); N i=1
r Compute the sample mean,
N N g(x) 1
g(X (i)) g(x) 1
I = Eq ≈ × δ(x − X (i)) d x = q(x) q(x) N i=1 N i=1 q(X (i)) Consider the case where we would like to estimate the expectation of the function of X given by f (X ); then choosing an importance distribution, q(x),
134
J. V. CANDY
that is similar to f (x) with covering support gives the expectation estimator p(x) f (x) × p(x) d x = f (x) × q(x) d x. (9) E p { f (x)} = q(x) X X If we draw samples, {X (i)}, i = 0, 1, . . . , N from the importance distribution, q(x), and compute the sample mean, then we obtain the importance sampling estimator N 1
p(x) p(X (i)) E p { f (x)} = × q(x) d x ≈ f (x) f (X (i)) × q(x) N i=1 q(X (i)) X (10) demonstrating the concept. NNote we are again assuming perfect (uniform) ˆ sampling with q(x) ≈ N1 i=1 δ(x − X (i)). The “art” in importance sampling is in choosing the importance distribution, q(·), that approximates the target distribution, p(·), as closely as possible. This is the principal factor effecting performance of this approach, since variates must be drawn from q(x) that cover the target distribution. Using the concepts of importance sampling, we can approximate the posterior distribution with a function on a finite discrete support. Since it is usually not possible to sample directly from the posterior, we use importance sampling coupled with an easy to sample proposal distribution, say q(X t |Yt ) – this is the crucial choice and design step required in Bayesian importance sampling methodology. Here X t = {x(0), . . . , x(t)} represents the set of dynamic variables and Yt = {y(0), . . . , y(t)}, the set of measured data as before. Therefore, starting with a function of the set of variables, say g(X t ), we would like to estimate its mean using the importance concept, that is, (11) E{g(X t )} = g(X t ) × Pr(X t |Yt ) d X t where Pr(X t |Yt ) is the posterior distribution. Using the MC approach, we would like to sample from this posterior directly and then use sample statistics to perform the estimation. Therefore we insert the proposal importance distribution, q(X t |Yt ) as before Pr(X t |Yt ) × q(X t |Yt ) d X t . (12) gˆ (t) := E{g(X t )} = g(X t ) q(X t |Yt ) Now applying Bayes’ rule to the posterior distribution, and defining an unnormalized weighting function as W (t) :=
Pr(Yt |X t ) × Pr(X t ) Pr(X t |Yt ) = q(X t |Yt ) q(X t |Yt )
(13)
NONLINEAR STATISTICAL SIGNAL PROCESSING
and substituting gives
gˆ (t) =
W (t) g(X t ) × q(X t |Yt ) d X t . Pr(Yt )
135
(14)
The evidence or normalizing distribution, Pr(Yt ), is very difficult to estimate; however, it can be eliminated in this expression by first replacing it by the total probability and inserting the importance distribution to give gˆ (t) =
E q {W (t) × g(X t )} E q {W (t)}
(15)
that are just expectations with respect to the proposal importance distribution. Thus, drawing samples from the proposal X t (i) → X t ∼ q(X t |Yt ) and using the MC approach (integrals to sums) leads to the desired result. That is, from the “perfect” sampling distribution, we have that N 1
ˆ t |Yt ) ≈ q(X δ(X t − X t (i)) N i=1
(16)
and therefore substituting, applying the sifting property of the Dirac delta function and defining the “normalized” weights by Wi (t) Wi (t) := N i=1 Wi (t)
for Wi (t) =
Pr(Yt |X t (i)) × Pr(X t (i)) q(X t (i)|Yt )
(17)
we obtain the final estimate

$$\hat{g}(t) \approx \sum_{i=1}^{N} \mathcal{W}_i(t) \times g(X_t(i)). \qquad (18)$$
This importance estimator is biased, being the ratio of two sample estimators, but it can be shown that it asymptotically converges to the true statistic and that the central limit theorem holds (Tanner, 1993; West and Harrison, 1997; Liu, 2001). Thus, as the number of samples increases ($N \to \infty$), a reasonable estimate of the posterior is

$$\hat{\Pr}(X_t|Y_t) \approx \sum_{i=1}^{N} \mathcal{W}_i(t) \times \delta(X_t - X_t(i)) \qquad (19)$$
which is the goal of Bayesian estimation. This estimate provides a “batch” solution, but we must develop a sequential estimate from a more pragmatic perspective. The importance distribution can be modified to enable a sequential estimation of the desired posterior distribution, that is, we estimate the posterior,
$\hat{\Pr}(X_{t-1}|Y_{t-1})$, using importance weights $W(t-1)$. As a new sample becomes available, we estimate the new weights $W(t)$, leading to an updated estimate of the posterior, $\hat{\Pr}(X_t|Y_t)$. This means that in order to obtain the new set of samples, $X_t(i) \sim q(X_t|Y_t)$, sequentially, we must use the previous set of samples, $X_{t-1}(i) \sim q(X_{t-1}|Y_{t-1})$. Thus, with this in mind, the importance distribution $q(X_t|Y_t)$ must admit a marginal distribution $q(X_{t-1}|Y_{t-1})$, implying the Bayesian factorization

$$q(X_t|Y_t) = q(X_{t-1}|Y_{t-1}) \times q(x(t)|X_{t-1}, Y_t). \qquad (20)$$
This type of importance distribution leads to the desired sequential solution (Ristic et al., 2004). Recall the Bayesian solution to the batch posterior estimation problem (as before),

$$\Pr(X_t|Y_t) = \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{\Pr(y(t)|Y_{t-1})} \times \Pr(X_{t-1}|Y_{t-1})$$

and, recognizing the denominator as just the evidence or normalizing distribution and not a function of $X_t$, we have

$$\Pr(X_t|Y_t) \propto \Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1)) \times \Pr(X_{t-1}|Y_{t-1}). \qquad (21)$$
Substituting this expression for the posterior into the weight relation, we have

$$W(t) = \frac{\Pr(Y_t|X_t)}{q(X_t|Y_t)} = \Pr(y(t)|x(t)) \times \frac{\Pr(x(t)|x(t-1))}{q(x(t)|X_{t-1}, Y_t)} \times \underbrace{\frac{\Pr(X_{t-1}|Y_{t-1})}{q(X_{t-1}|Y_{t-1})}}_{\text{Previous Weight}} \qquad (22)$$

which can be written as

$$W(t) = W(t-1) \times \frac{\Pr(y(t)|x(t)) \times \Pr(x(t)|x(t-1))}{q(x(t)|X_{t-1}, Y_t)} \qquad (23)$$
giving us the desired relationship – a sequential updating of the weight at each time-step. These results then enable us to formulate a generic Bayesian sequential importance sampling algorithm:

1. Choose samples from the proposed importance distribution: $x_i(t) \sim q(x(t)|X_{t-1}, Y_t)$;
2. Determine the required conditional distributions: $\Pr(x_i(t)|x(t-1))$, $\Pr(y(t)|x_i(t))$;
3. Calculate the unnormalized weights: $W_i(t)$, using Equation 23 with $x(t) \to x_i(t)$;
4. Normalize the weights: $\mathcal{W}_i(t)$ of Equation 17; and
5. Estimate the posterior distribution:
$$\hat{\Pr}(X_t|Y_t) = \sum_{i=1}^{N} \mathcal{W}_i(t)\,\delta(x(t) - x_i(t)).$$
Once the posterior is estimated, the desired statistics follow directly. Next we consider a model-based approach incorporating state-space models (Candy, 2006).
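To make the generic recursion concrete, here is a minimal Python sketch of one sequential importance sampling step following Equations 17 and 23. The callables `draw_proposal`, `proposal_pdf`, `transition_pdf`, and `likelihood_pdf` are hypothetical placeholders that a user must supply for a particular model; this is an illustrative sketch, not the author's implementation:

```python
import numpy as np

def sis_step(particles, weights, y, draw_proposal, proposal_pdf,
             transition_pdf, likelihood_pdf):
    """One generic sequential importance sampling step (Eqs. 17 and 23)."""
    # 1. Sample new particles from the proposal q(x(t)|x(t-1), y(t)).
    new_particles = draw_proposal(particles, y)
    # 2-3. Update the unnormalized weights (Eq. 23).
    weights = weights * (likelihood_pdf(y, new_particles)
                         * transition_pdf(new_particles, particles)
                         / proposal_pdf(new_particles, particles, y))
    # 4. Normalize the weights (Eq. 17).
    weights = weights / np.sum(weights)
    return new_particles, weights
```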
4. Bayesian Approach to the State-Space

Bayesian estimation relative to state-space models is based on extracting the unobserved or hidden dynamic (state) variables from noisy measurement data. The Markovian state vector with initial distribution $\Pr(x(0))$ propagates temporally throughout the state-space according to the probabilistic transition distribution $\Pr(x(t)|x(t-1))$, while the conditionally independent measurements evolve from the likelihood distribution $\Pr(y(t)|x(t))$. We see that the dynamic state variable at time $t$ is obtained through the transition probability based on the previous state (Markovian property), $x(t-1)$, and the knowledge of the underlying conditional probability. Once propagated to time $t$, the dynamic state variable is used to update or correct based on the likelihood probability and the new measurement, $y(t)$. This evolutionary process is illustrated in Fig. 1. Note that it is the knowledge of these conditional distributions that enables the Bayesian processing. The usual model-based constructs of the dynamic state variables indicate that there is an equivalence between the probabilistic distributions and the underlying state/measurement transition models. The functional discrete state representation is given by

$$x(t) = A(x(t-1), u(t-1), w(t-1))$$
$$y(t) = C(x(t), u(t), v(t)) \qquad (24)$$

where $w$ and $v$ are the respective process and measurement noise sources, with $u$ a known input. Here $A(\cdot)$ is the nonlinear (or linear) dynamic state transition function and $C(\cdot)$ the corresponding measurement function. Both conditional probabilistic distributions embedded within the Bayesian framework are completely specified by these functions and the underlying noise
Figure 1. Bayesian state-space probabilistic evolution.
distributions: $\Pr(w(t-1))$ and $\Pr(v(t))$. That is, we have the equivalence³

$$A(x(t-1), u(t-1), w(t-1)) \Rightarrow \Pr(x(t)|x(t-1)) \Leftrightarrow \mathcal{A}(x(t)|x(t-1))$$
$$C(x(t), u(t), v(t)) \Rightarrow \Pr(y(t)|x(t)) \Leftrightarrow \mathcal{C}(y(t)|x(t)) \qquad (25)$$

[³ We use this notation to emphasize the influence of both the process ($A$) and measurement ($C$) representations on the conditional distributions.]

Thus, the state-space model, along with the noise statistics and prior distributions, defines the required Bayesian representation or probabilistic propagation model describing the evolution of the states and measurements through the transition probabilities. This is a subtle point that must be emphasized; it is illustrated in the diagram of Figure 1. Here the dynamic state variables propagate throughout the state-space specified by the transition probability ($\mathcal{A}(x(t)|x(t-1))$) using the embedded process model. That is, the "unobserved" state at time $t-1$ depends on the transition probability distribution to propagate to the state at time $t$. Once evolved, the state combines with the corresponding measurement at time $t$ through the conditional likelihood distribution ($\mathcal{C}(y(t)|x(t))$) using the embedded measurement model to obtain the required likelihood distribution. These events continue to evolve, with the states propagating through the state transition probability using the process model, and the measurements generated by the states and likelihood using the measurement model. From the Bayesian perspective, the broad initial prior is scaled by the evidence and "narrowed" by the likelihood to estimate the posterior. With this in mind we can now return to the original Bayesian estimation problem, define it, and show (at least conceptually) the solution. Using
the state-space and measurement representation, the basic dynamic state estimation (signal enhancement) problem can now be stated in the Bayesian framework as:

GIVEN a set of noisy uncertain measurements, $\{y(t)\}$, and known inputs, $\{u(t)\}$, $t = 0, \ldots, N$, along with the corresponding prior distributions for the initial state and the process and measurement noise sources: $\Pr(x(0))$, $\Pr(w(t-1))$, $\Pr(v(t))$, as well as the conditional transition and likelihood probability distributions: $\Pr(x(t)|x(t-1))$, $\Pr(y(t)|x(t))$, characterized by the state and measurement models: $\mathcal{A}(x(t)|x(t-1))$, $\mathcal{C}(y(t)|x(t))$, FIND the "best" (filtered) estimate of the state, $x(t)$, say $\hat{x}(t|t)$, based on all of the data up to and including $t$, $Y_t$; that is, find the best estimate of the filtering posterior, $\Pr(x(t)|Y_t)$, and its associated statistics.

Analytically, to generate the model-based version of the sequential Bayesian processor, we replace the transition and likelihood distributions with the conditionals of Equation 25. The solution to the signal enhancement, or equivalently state estimation, problem is given by the filtering distribution, $\Pr(x(t)|Y_t)$, which was solved previously in Section 2 (see Table I). We start with the prediction recursion characterized by the Chapman–Kolmogorov equation, replacing the transition probability with the implied model-based conditional, that is,

$$\Pr(x(t)|Y_{t-1}) = \int \underbrace{\mathcal{A}(x(t)|x(t-1))}_{\text{Embedded Process Model}} \times \underbrace{\Pr(x(t-1)|Y_{t-1})}_{\text{Prior}}\,dx(t-1). \qquad (26)$$

Next we incorporate the model-based likelihood into the posterior equation, with the understanding that the process model has been incorporated into the prediction:

$$\Pr(x(t)|Y_t) = \frac{\overbrace{\mathcal{C}(y(t)|x(t))}^{\text{Embedded Measurement Model}} \times \overbrace{\Pr(x(t)|Y_{t-1})}^{\text{Prediction}}}{\Pr(y(t)|Y_{t-1})}. \qquad (27)$$
Thus, we see from the Bayesian perspective that the sequential Bayesian processor employing the state-space representation of Equation 24 is straightforward. Under linear-Gaussian assumptions, a more detailed development of the processor yields a closed-form solution – the linear Kalman filter (Candy, 2006).
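Equations 26 and 27 can be exercised directly on a small problem by discretizing the state onto a grid, so that the prediction integral becomes a matrix-vector product. The following Python sketch is an illustration added here (a so-called point-mass approximation under the assumption of a scalar state; the density callables are user-supplied), not part of the original text:

```python
import numpy as np

def bayes_recursion_step(prior, y, grid, transition_pdf, likelihood_pdf):
    """One prediction-update cycle of Eqs. 26-27 on a discretized state grid.

    prior          : Pr(x(t-1)|Y_{t-1}) evaluated on `grid` (sums to 1)
    transition_pdf : callable, transition_pdf(x_new, x_old) = A(x(t)|x(t-1))
    likelihood_pdf : callable, likelihood_pdf(y, x) = C(y(t)|x(t))
    """
    # Prediction (Chapman-Kolmogorov, Eq. 26): sum over x(t-1).
    trans = transition_pdf(grid[:, None], grid[None, :])  # rows: x(t), cols: x(t-1)
    prediction = trans @ prior
    prediction /= prediction.sum()
    # Update (Bayes' rule, Eq. 27): scale by the likelihood and renormalize;
    # the normalizing sum plays the role of the evidence Pr(y(t)|Y_{t-1}).
    posterior = likelihood_pdf(y, grid) * prediction
    posterior /= posterior.sum()
    return posterior
```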
5. Bayesian Particle Filters

Particle filtering (PF) is a sequential MC method employing the recursive estimation of relevant probability distributions using the concepts of "importance
sampling" and the approximation of distributions with discrete random measures (Ristic et al., 2004; Godsill and Djuric, 2002; Djuric et al., 2003; Haykin and de Freitas, 2004; Doucet and Wang, 2005). The key idea is to represent the required posterior distribution by a set of $N_p$ random samples, the particles, with associated weights, $\{x_i(t), \mathcal{W}_i(t)\},\ i = 1, \ldots, N_p$, and compute the required MC estimates. Of course, as the number of samples becomes very large the MC representation becomes an equivalent characterization of the analytical description of the posterior distribution. Thus, particle filtering is a technique to implement recursive Bayesian estimators by MC simulation. It is an alternative to approximate Kalman filtering for nonlinear problems (Candy, 2006; Sullivan and Candy, 1997; Ristic et al., 2004). In PF, continuous distributions are approximated by "discrete" random measures composed of these weighted particles or point masses, where the particles are actually samples of the unknown or hidden states from the state-space and the weights are the associated "probability masses" estimated using the Bayesian recursions, as shown in Fig. 2. From the figure we see that associated with each particle, $x_i(t)$, is a corresponding weight or (probability)
Figure 2. Particle filter representation of posterior probability distribution in terms of weights (probabilities) and particles (samples).
mass, $\mathcal{W}_i(t)$. Therefore knowledge of this random measure, $\{x_i(t), \mathcal{W}_i(t)\}$, characterizes the empirical posterior distribution – an estimate of the filtering posterior, that is,

$$\hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - x_i(t))$$
at a particular instant of time $t$. Importance sampling plays a crucial role in state-space particle algorithm development. PF does not involve linearizations around current estimates, but rather approximations of the desired distributions by these discrete measures. In comparison, the Kalman filter recursively estimates the conditional mean and covariance that can be used to characterize the filtering posterior $\Pr(x(t)|Y_t)$ under Gaussian assumptions (Candy, 2006).

In summary, particle filters are sequential MC-based "point mass" representations of probability distributions. They only require a state-space representation of the underlying process to provide a set of particles that evolve at each time step, leading to an instantaneous approximation of the target posterior distribution of the state at time $t$ given all of the data up to that time. Fig. 2 illustrates the evolution of the posterior at a particular time step. Here we see the estimated posterior based on 21 particles (non-uniformly spaced), and we select the 5th particle and weight to illustrate the instantaneous approximation at time $t$ for $x_i$ versus $\hat{\Pr}(x(t)|Y_t)$. Statistics are calculated across the ensemble created over time to provide the inference estimates of the states. For example, the minimum mean-squared error (MMSE) estimate is easily determined by averaging over $x_i(t)$, since

$$\hat{x}_{\text{mmse}}(t) = \int x(t)\,\Pr(x(t)|Y_t)\,dx \approx \int x(t)\,\hat{\Pr}(x(t)|Y_t)\,dx = \int x(t)\sum_{i=1}^{N_p}\mathcal{W}_i(t)\,\delta(x(t) - x_i(t))\,dx = \sum_{i=1}^{N_p}\mathcal{W}_i(t)\,x_i(t)$$
while the maximum a posteriori (MAP) estimate is simply determined by finding the sample corresponding to the maximum weight of $x_i(t)$ across the ensemble at each time step, that is,

$$\hat{x}_{\text{MAP}}(t) = \arg\max_{x_i}\{\hat{\Pr}(x(t)|Y_t)\}. \qquad (28)$$
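Given a weighted particle set, both estimators reduce to one line each. A minimal illustrative Python sketch (added here; normalized weights are assumed):

```python
import numpy as np

def point_estimates(particles, weights):
    """MMSE and MAP estimates from a normalized particle set (Eq. 28)."""
    x_mmse = np.sum(weights * particles)      # weighted ensemble mean
    x_map = particles[np.argmax(weights)]     # particle carrying the largest weight
    return x_mmse, x_map
```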
The sequential importance sampling solution to the recursive Bayesian state estimation problem was given previously, starting with the recursive form for the importance distribution,

$$q(X_t|Y_t) = q(X_{t-1}|Y_{t-1}) \times q(x(t)|X_{t-1}, Y_t)$$
and evolving to the recursive expression for the importance weights:

$$W(t) = \frac{\Pr(X_t|Y_t)}{q(X_t|Y_t)} = \frac{\Pr(Y_t|X_t) \times \Pr(X_t)}{q(X_{t-1}|Y_{t-1}) \times q(x(t)|X_{t-1}, Y_t)}$$

$$W(t) = W(t-1) \times \frac{\overbrace{\Pr(y(t)|x(t))}^{\text{Likelihood}} \times \overbrace{\Pr(x(t)|x(t-1))}^{\text{Transition}}}{q(x(t)|X_{t-1}, Y_t)}. \qquad (29)$$
The state-space particle filter (SSPF) evolving from this sequential importance sampling construct follows directly after sampling from the importance distribution, that is,

$$x_i(t) \to q(x(t)|x(t-1), y(t))$$
$$W_i(t) = W_i(t-1) \times \frac{\mathcal{C}(y(t)|x_i(t)) \times \mathcal{A}(x_i(t)|x_i(t-1))}{q(x_i(t)|x_i(t-1), y(t))}$$
$$\mathcal{W}_i(t) = \frac{W_i(t)}{\sum_{i=1}^{N_p} W_i(t)} \qquad (30)$$

and the filtering posterior is estimated by

$$\hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t) \times \delta(x(t) - x_i(t)). \qquad (31)$$
Assuming that

$$q(x(t)|X_{t-1}, Y_t) \to q(x(t)|x(t-1), y(t)), \qquad (32)$$

the importance distribution is only dependent on $[x(t-1), y(t)]$, which is common when performing filtering, $\Pr(x(t)|Y_t)$, at each instant of time. This completes the theoretical motivation of the state-space particle filter. Next we consider a pragmatic approach for implementation.
6. Bootstrap Particle Filter

In the previous section we developed the generic SSPF from the simulation-based sampling perspective. The basic design tool when developing these algorithms is the choice of the importance sampling distribution $q(\cdot)$. One of the most popular realizations of this approach was developed by using the transition prior as the importance proposal (Doucet et al., 2001). This prior
is defined in terms of the state-space representation by $\mathcal{A}(x(t)|x(t-1)) \to A(x(t-1), u(t-1), w(t-1))$, which is dependent on the known excitation and process noise statistics. It is given by

$$q_{\text{prior}}(x(t)|x(t-1), Y_t) \to \Pr(x(t)|x(t-1)).$$

Substituting this choice into the expression for the weights gives

$$W_i(t) = W_i(t-1) \times \frac{\Pr(y(t)|x_i(t)) \times \Pr(x(t)|x_i(t-1))}{q_{\text{prior}}(x(t)|x_i(t-1), Y_t)} = W_i(t-1) \times \Pr(y(t)|x_i(t))$$
since the priors cancel. Note two properties of this choice of importance distribution. First, the weight does not use the most recent observation, $y(t)$; second, this choice is easily implemented and updated by simply evaluating the measurement likelihood, $\mathcal{C}(y(t)|x_i(t)),\ i = 1, \ldots, N_p$, for the sampled particle set. These weights require the particles to be propagated to time $t$ before the weights can be calculated. This choice can lead to problems, since the transition prior is not conditioned on the measurement data, especially the most recent. Failing to incorporate the latest available information from the most recent measurement when proposing new values for the states leads to only a few particles having significant weights when their likelihood is calculated. The transition prior is a much broader distribution than the likelihood, indicating that only a few particles will be assigned a large weight. Thus, the algorithm degenerates rapidly. The SSPF algorithm therefore takes the same generic form as before, with the importance weights much simpler to evaluate with this approach. It has been called the bootstrap PF, the condensation PF, or the survival-of-the-fittest algorithm (Doucet et al., 2001).

One of the major problems with the importance sampling algorithms is the depletion of the particles; that is, the weights tend to increase in variance at each iteration. The degeneracy of the particle weights creates a problem that must be resolved before these particle algorithms can be of any pragmatic use in applications. The problem occurs because the variance of the importance weights can only increase in time (Doucet et al., 2001), thereby making it impossible to avoid this weight degradation. Degeneracy implies that a large computational effort is devoted to updating particles whose contribution to the posterior is negligible. Thus, there is a need to somehow resolve this problem to make the simulation-based techniques viable. This requirement leads to the idea of "resampling" the particles.

The main objective in simulation-based sampling techniques is to generate i.i.d. samples from the targeted posterior distribution in order to perform
statistical inferences, extracting the desired information. Thus, the importance weights are quite critical, since they contain probabilistic information about each specific particle. In fact, they provide us with information about "how probable it is that a sample has been drawn from the target posterior" (van der Merwe, 2004; Schoen, 2006). Therefore, the weights can be considered acceptance probabilities enabling us to generate (approximately) independent samples from the posterior, $\Pr(x(t)|Y_t)$. The empirical distribution, $\hat{\Pr}(x(t)|Y_t)$, is defined over a finite set of ($N_p$) random measures, $\{x_i(t), \mathcal{W}_i(t)\},\ i = 1, \ldots, N_p$, approximating the posterior, that is,
$$\hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - x_i(t)). \qquad (33)$$
Resampling, therefore, can be thought of as a realization of enhanced particles, $\hat{x}_k(t)$, extracted from the original samples, $x_i(t)$, based on their "acceptance probability", $\mathcal{W}_i(t)$, at time $t$; that is, statistically we have

$$\Pr\left(\hat{x}_k(t) = x_i(t)\right) = \mathcal{W}_i(t) \quad \text{for } i = 1, \ldots, N_p \qquad (34)$$

or, written symbolically, $\hat{x}_k(t) \Rightarrow x_i(t)$, with the set of new particles, $\{\hat{x}_k(t)\}$, replacing the old set, $\{x_i(t)\}$. The fundamental concept in resampling theory is to preserve particles with large weights (large probabilities) while discarding those with small weights. Two steps must occur to resample effectively: (1) a decision, on a weight-by-weight basis, must be made to select the appropriate weights and reject the inappropriate ones; and (2) resampling must be performed to minimize the degeneracy. The overall strategy, when coupled with importance sampling, is termed sequential importance resampling (SIR) (Doucet et al., 2001).
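Equation 34 is commonly implemented by multinomial resampling: drawing $N_p$ indices with replacement, with particle $i$ selected with probability $\mathcal{W}_i(t)$. A minimal Python sketch of this idea (an illustration added here; the original text does not prescribe a particular resampling scheme, and `particles` is assumed to be a NumPy array):

```python
import numpy as np

def multinomial_resample(particles, weights, rng=np.random.default_rng()):
    """Resample particles with replacement according to Eq. 34."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    new_particles = particles[idx]
    # After resampling the survivors are (approximately) equally probable,
    # so the weights are reset to uniform.
    new_weights = np.full(len(particles), 1.0 / len(particles))
    return new_particles, new_weights
```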
We illustrate the evolution of the particles through a variety of prediction–update time-steps in Fig. 3, where we see the evolution of each set of particles through prediction, resampling, and updating. We summarize the bootstrap particle filter algorithm in Table II. This completes the algorithm; next we apply it to a standard problem.

7. Example: Nonlinear Non-Gaussian Prediction

We consider a well-known problem that has become a benchmark for many of the PF algorithms. It is highly nonlinear, non-Gaussian, and nonstationary
Figure 3. Evolution of particle filter weights and particles using the sequential state-space SIR algorithm: resampling, propagation (state-space transition model), update (state-space measurement likelihood), resampling . . . .
and evolves from studies of population growth (Doucet et al., 2001). The state transition and corresponding measurement model are given by

$$x(t) = \frac{1}{2}x(t-1) + \frac{25x(t-1)}{1 + x^2(t-1)} + 8\cos(1.2(t-1)) + w(t-1)$$
$$y(t) = \frac{x^2(t)}{20} + v(t)$$

where $\Delta t = 1.0$, $w \sim \mathcal{N}(0, 10)$ and $v \sim \mathcal{N}(0, 1)$. The initial state is Gaussian distributed with $x(0) \sim \mathcal{N}(0.1, 5)$. In terms of the state-space representation, we have

$$a[x(t-1)] = \frac{1}{2}x(t-1) + \frac{25x(t-1)}{1 + x^2(t-1)}$$
$$b[u(t-1)] = 8\cos(1.2(t-1))$$
$$c[x(t)] = \frac{x^2(t)}{20}.$$
TABLE II. Bootstrap SIR state-space particle filtering algorithm

INITIALIZE:
  $x_i(0) \to \Pr(x(0))$, $\mathcal{W}_i(0) = \frac{1}{N_p}$, $i = 1, \ldots, N_p$  [sample]

IMPORTANCE SAMPLING:
  $x_i(t) \sim \mathcal{A}(x(t)|x_i(t-1))$; $w_i \sim \Pr(w_i(t))$  [state transition]

  Weight update: $W_i(t) = \mathcal{W}_i(t-1) \times \mathcal{C}(y(t)|x_i(t))$  [weights]

  Weight normalization: $\mathcal{W}_i(t) = \dfrac{W_i(t)}{\sum_{i=1}^{N_p} W_i(t)}$

RESAMPLING: $(\hat{x}_i(t) \Rightarrow x_j(t))$

DISTRIBUTION:
  $\hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - \hat{x}_i(t))$  [posterior distribution]

STATE ESTIMATION:
  $\hat{x}(t|t) = E\{x(t)|Y_t\} \approx \frac{1}{N_p}\sum_{i=1}^{N_p} \hat{x}_i(t)$  [conditional mean]
  $\hat{X}_{\text{MAP}}(t) = \arg\max \hat{\Pr}(x(t)|Y_t)$  [MAP]
In the Bayesian framework, we would like to estimate the instantaneous posterior filtering distribution,

$$\hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - x_i(t)) \qquad (35)$$
where the unnormalized importance weight is given by

$$W_i(t) = W_i(t-1) \times \frac{\mathcal{C}(y(t)|x(t)) \times \mathcal{A}(x(t)|x(t-1))}{q(x(t)|X_{t-1}, Y_t)}. \qquad (36)$$

The weight recursion for the bootstrap case is $W_i(t) = W_i(t-1) \times \mathcal{C}(y(t)|x(t))$. Therefore, for our problem, the Bayesian processor has its state transition probability given by

$$\Pr(x(t)|x(t-1)) \to \mathcal{A}(x(t)|x(t-1)) \sim \mathcal{N}(x(t) : a[x(t-1)], R_{ww}). \qquad (37)$$

Thus, the SIR algorithm becomes:
- Draw samples (particles) from the state transition distribution: $x_i(t) \to \mathcal{N}(x(t) : a[x(t-1)], R_{ww})$, $w_i(t) \to \Pr(w(t)) \sim \mathcal{N}(0, R_{ww})$, with
$$x_i(t) = \frac{1}{2}x_i(t-1) + \frac{25x_i(t-1)}{1 + x_i^2(t-1)} + 8\cos(1.2(t-1)) + w_i(t-1)$$
- Estimate the likelihood, $\mathcal{C}(y(t)|x(t)) \to \mathcal{N}(y(t) : c[x(t)], R_{vv}(t))$, with
$$c[x_i(t)] = \frac{x_i^2(t)}{20}$$
- Update and normalize the weights: $\mathcal{W}_i(t) = W_i(t)\,\big/\sum_{i=1}^{N_p} W_i(t)$
- Resample: $\hat{x}_i(t) \Rightarrow x_j(t)$
- Estimate the instantaneous posterior:
$$\hat{\Pr}(x(t)|Y_t) \approx \sum_{i=1}^{N_p} \mathcal{W}_i(t)\,\delta(x(t) - \hat{x}_i(t))$$
- Estimate the corresponding statistics:
$$\hat{X}_{\text{MAP}}(t) = \arg\max \hat{\Pr}(x(t)|Y_t)$$
$$\hat{X}_{\text{MMSE}}(t) = E\{x(t)|Y_t\} = \sum_{i=1}^{N_p} \hat{x}_i(t)\,\hat{\Pr}(x(t)|Y_t)$$
$$\hat{X}_{\text{median}}(t) = \mathrm{median}\left(\hat{\Pr}(x(t)|Y_t)\right).$$
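Putting the pieces together, here is a compact, self-contained Python implementation of the bootstrap SIR filter for this population growth model. It is an illustrative sketch added for this edition (the particle count, horizon, and seed are arbitrary choices), not the code used to produce the figures:

```python
import numpy as np

rng = np.random.default_rng(1)
Np, T = 1000, 150                  # particles and time steps (arbitrary choices)
Rww, Rvv = 10.0, 1.0               # process and measurement noise variances

a = lambda x, t: 0.5 * x + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * (t - 1))
c = lambda x: x**2 / 20.0

# Simulate the hidden state and the noisy measurements.
x_true, y = np.zeros(T), np.zeros(T)
x_true[0] = rng.normal(0.1, np.sqrt(5.0))
y[0] = c(x_true[0]) + rng.normal(0, np.sqrt(Rvv))
for t in range(1, T):
    x_true[t] = a(x_true[t - 1], t) + rng.normal(0, np.sqrt(Rww))
    y[t] = c(x_true[t]) + rng.normal(0, np.sqrt(Rvv))

# Bootstrap SIR filter (Table II).
particles = rng.normal(0.1, np.sqrt(5.0), Np)     # x_i(0) ~ Pr(x(0))
x_mmse = np.zeros(T)
for t in range(1, T):
    # Importance sampling: draw from the transition prior A(x(t)|x_i(t-1)).
    particles = a(particles, t) + rng.normal(0, np.sqrt(Rww), Np)
    # Weight update: bootstrap weights are just the (unnormalized Gaussian)
    # likelihood C(y(t)|x_i(t)); constants cancel in the normalization.
    W = np.exp(-0.5 * (y[t] - c(particles))**2 / Rvv)
    W /= W.sum()
    x_mmse[t] = np.sum(W * particles)             # conditional-mean estimate
    # Resample (multinomial), resetting to uniform weights.
    particles = particles[rng.choice(Np, size=Np, p=W)]

print("RMS state error:", np.sqrt(np.mean((x_mmse[1:] - x_true[1:])**2)))
```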
We show the simulated data in Fig. 4. In Fig. 4a we see the hidden state, and in Fig. 4b the noisy measurement. The estimated instantaneous posterior distribution surface for the state is shown in Fig. 5a, while slices at selected instants of time are shown in Fig. 5b. Here we see that the posterior is clearly not unimodal, and in fact we can see its evolution in time, as suggested by Fig. 2 previously. The final state and measurement estimates are shown in Fig. 4c and 4d, demonstrating the effectiveness of the bootstrap PF processor for this problem. Various ensemble estimates are shown (e.g., median, MMSE, MAP). It is clear that the extended Kalman filter gives a poor MMSE estimate, since the posterior is not Gaussian (unimodal).

8. Summary

In this chapter we have provided an overview of nonlinear statistical signal processing based on the Bayesian paradigm. We have shown that the next generation processors are well-founded on MC simulation-based sampling
Figure 4. Population growth problem: (a) Simulated state with mean. (b) Simulated measurement with mean. (c) Ensemble of state estimates: median, EKF, MMSE, MAP. (d) Ensemble of measurement estimates: median, EKF, MMSE, MAP.
Figure 5. Population growth problem: (a) instantaneous posterior surface. (b) time slices of the posterior (cross-section) at selected time-steps.
techniques. We reviewed the development of the sequential Bayesian processor using the state-space models. The popular bootstrap algorithm was outlined and applied to a standard problem used within the community to test the performance of a variety of PF techniques.
References

Candy, J. V. (2006) Model-Based Signal Processing, New Jersey, John Wiley.
Djuric, P., Kotecha, J., Zhang, J., Huang, Y., Ghirmai, T., Bugallo, M., and Miguez, J. (2003) Particle Filtering, IEEE Signal Processing Magazine 20(5), 19–38.
Doucet, A., de Freitas, N., and Gordon, N. (2001) Sequential Monte Carlo Methods in Practice, New York, Springer-Verlag.
Doucet, A. and Wang, X. (2005) Monte Carlo Methods for Signal Processing, IEEE Signal Processing Magazine 24(5), 152–170.
Godsill, S. and Djuric, P. (2002) Special Issue: Monte Carlo Methods for Statistical Signal Processing, IEEE Transactions on Signal Processing 50, 173–449.
Haykin, S. and de Freitas, N. (2004) Special Issue: Sequential State Estimation: From Kalman Filters to Particle Filters, Proceedings of the IEEE 92(3), 399–574.
Liu, J. (2001) Monte Carlo Strategies in Scientific Computing, New York, Springer-Verlag.
Ristic, B., Arulampalam, S., and Gordon, N. (2004) Beyond the Kalman Filter: Particle Filters for Tracking Applications, Boston, Artech House.
Schoen, T. (2006) Estimation of Nonlinear Dynamic Systems: Theory and Applications, PhD Dissertation, Linköpings Universitet, Linköping, Sweden.
Sullivan, E. J. and Candy, J. V. (1997) Space-Time Array Processing: The Model-Based Approach, Journal of the Acoustical Society of America 102(5), 2809–2820.
Tanner, M. (1993) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 2nd ed., New York, Springer-Verlag.
van der Merwe, R. (2004) Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, PhD Dissertation, OGI School of Science & Engineering, Oregon Health & Science University.
West, M. and Harrison, J. (1997) Bayesian Forecasting and Dynamic Models, 2nd ed., New York, Springer-Verlag.
POSSIBILITIES FOR QUANTUM INFORMATION PROCESSING*

Štěpán Holub
Charles University, Prague, Czech Republic
Abstract. This tutorial introduces quantum information science, a quickly developing area of multidisciplinary research. The basic idea of the field is to study physical systems obeying the laws of quantum mechanics to obtain promising possibilities for computation, image processing, data transfer and/or high precision detection. Quantum information science is an intersection of mathematics, physics and informatics. We explain basic ingredients that contribute to the general picture. Although we mention also some up-to-date experimental results, the focus will be on the theoretical description of quantum data transfer and a possible quantum computer. Key words: quantum mechanics, quantum information, quantum computer, Mach-Zehnder interferometer
1. Introduction

Automatic processing of information is one of the most striking features of our present civilization. The origins of what we call the "computer" date back to the first half of the twentieth century, when Alan Turing gave an ingenious mathematical description of a general algorithmic process (Turing, 1936). All our computers are in fact a kind of physical realization of the old-fashioned device known as a Turing machine. From the mathematical point of view computers do not evolve; they are just faster and faster, due to enormous technological improvement, which has obeyed, since 1965, the famous prediction of Gordon Moore: the power of computer processing doubles every eighteen months. There are, however, ultimate physical restrictions for classical computers, given by the speed of light (electromagnetic field) and the size of atoms. At the sub-atomic level, the laws of quantum physics begin to predominate, which profoundly challenges the classical approach to information. Quantum

[* The work is a part of the research project MSM 0021620839 financed by MSMT.]
mechanics, on the other hand, is not only a threat. More importantly, it offers possibilities which give rise to a new and fascinating field of research called quantum information science. The mathematical notion of information is usually based on symbols chosen from a finite alphabet, typically the famous zeros and ones. The physical part of the question is how to represent these symbols: which states of which physical system should distinguish between them. If the information is "encoded" by a quantum system the situation seems to be quite analogous. The crucial difference, however, is the fact that quantum systems can exist in superpositions of basic states, a strange kind of mixture of them. Quantum information science is basically an attempt to take advantage of this property.
2. The Difference Between Classical and Quantum Physics

2.1. MACH-ZEHNDER INTERFEROMETER
The difference between classical and quantum physics can be illustrated by a simple experiment called the Mach-Zehnder interferometer. The device is depicted in Figure 1. A and B are half-silvered mirrors, which split a beam of light into two parts, and are hence called beamsplitters. Half of the beam goes through the mirror; the other half is reflected. U and L are just ordinary mirrors. By the laws of optics, light changes its phase by π if it is reflected by a surface bounding a medium with a higher refractive index (i.e., a lower speed of light). Therefore in our picture the only reflection not causing the phase shift is the reflection of the beam arriving from mirror U at mirror B, since the
Figure 1. Mach-Zehnder interferometer.
reflection occurs between glass (foreground) and air (background). (The phase shift caused by the transition through the glass is neglected in our description.) Numbers 1 to 4 refer to possible positions of detectors. If we measure at positions 1 and 2, the same energy is detected by each detector. Measurement by detectors 3 and 4 (with detectors 1 and 2 removed) shows that there is no signal on detector 3, only on 4. The explanation is simple: the beam traveling to 3 through U is shifted by π with respect to the beam traveling through L. Therefore destructive interference cancels the signal. On detector 4, on the other hand, the interference is constructive. This is the classical description of the experiment.

When the amount of light is diminished below a certain quantum, the light starts to behave as a particle, the famous photon. This means that the whole photon is either reflected or passes through the mirror; it does not divide in two. The probability that the photon will be detected by detector 1 (resp. 2) is exactly one half. This is confirmed by experiment. A strange thing happens when we let the photon go through mirror B without a previous measurement. The classical probabilistic interpretation suggests that the probability will again be one half for both detectors 3 and 4. Indeed, with probability 1/4 the photon will be reflected both by A and B, and similarly for the other three combinations. But the experimental result is the same as in the interference model: the signal is detected always at 4, never at 3. It is now the task of quantum mechanics to explain this kind of phenomenon.
2.2. THE POSTULATES OF QUANTUM MECHANICS
Quantum mechanics is a mathematical tool for the description of quantum phenomena, and is based on three postulates.

Postulate 1. A quantum system with $m$ possible observable values is described by the complex vector space $H_m = \mathbb{C}^m$ with an inner product (called a Hilbert space). A state of the system is a one-dimensional subspace of $H_m$. It is always represented by a unit vector $u$, traditionally written as $|u\rangle$.

From the point of view of quantum information science the basic quantum system is two-dimensional, and is called a qubit, as an analog of the classical bit. To stress the analogy, the basis vectors of $H_2$ are usually denoted by $|0\rangle$ and $|1\rangle$. Note that even if we restrict ourselves to a unit representation of the one-dimensional state subspace, it is not given uniquely. All vectors of the form $e^{i\alpha}|u\rangle$ are equivalent expressions of the same state. Note also that
the arithmetic expression of the basis vectors is the canonical

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

The basic difference between bit and qubit, already mentioned above, is the possibility of the system existing in a superposition of the basis states, as for example the state

$$|u\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}.$$
Postulate 2. The time evolution of a quantum system is given by a unitary operator $U$. Therefore, if a system is in state $|u\rangle$ at time $t_0$, then it is in state $U|u\rangle$ at time $t_1$.

Obviously, the operator $U$ depends on what happened to the system. It may for example be affected by an electromagnetic field of given intensity. In quantum information science we usually consider operators $U$ as black boxes. This is again analogous to the classical approach to bits, where we for example speak about a NOT-gate without examining its physical realization. An operator is unitary, by definition, if it satisfies $U^*U = \mathrm{Id}$, where $U^*$ is the adjoint operator of $U$. It is natural to work with the matrix representation of operators. Then $U^*$ is the transposed complex conjugate matrix of $U$. An equivalent definition of a unitary operator is that it preserves the inner product, and therefore also the norm. This is a natural requirement on the time evolution of quantum systems, since then a unit vector evolves again to a unit vector. It is also important that unitary transformations are invertible; it is therefore always possible to reverse the process. Probably the most useful unitary operator in quantum computing, as we shall see, is the Hadamard transform

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Verifying that it is unitary is a simple exercise.
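As a quick check of that exercise, one can also verify $H^*H = \mathrm{Id}$ numerically (a small illustrative snippet added here, not part of the original text):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
# A matrix is unitary iff its conjugate transpose is its inverse.
print(np.allclose(H.conj().T @ H, np.eye(2)))  # True
```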
m i=1
αi |bi
QUANTUM INFORMATION
155
then the eigenvalue of |bi will be obtained with probability |αi |2 . Note that the sum of probabilities of all outcomes is one, since |u is a unit vector. The complex number αi is called the amplitude of probability. The measurement postulate is a very strange one. It claims that the result of a measurement is given only with certain probability. Quantum physics has no means to say more about the outcome; it claims that it is not the fault of the theory, it is nature, which is substantially random when it comes to measurements. Moreover, the collapse of the system is in general not a unitary operation. Therefore measurements do not obey Postulate 2. These facts have caused a lot of controversies. Albert Einstein, for example, insisted that the theory cannot be considered a complete description of physical reality (Einstein et al., 1935), and it seems that he never acquiesced to quantum mechanics. Nevertheless, the point of view we are describing here, advocated for example by Bohr (1935), became the mainstream. A very important fact is that the same system can be measured in different bases of our choice. In other words, we can measure different observables. To obtain respective probabilities, it is then necessary to express the state in the chosen basis (or to use some other linear algebra tricks). Let us give an example. Measure the qubit |u =
|0 + |1 √ 2
in the canonical basis |0, |1, with eigenvalues 1, −1, respectively. Then the outcome will be plus or minus one with probability 1/2, and the state of the system after the measurement will be |0 or |1. Note that the measurement is destructive. We will never be able to figure out which state has been measured. If, on the other hand, one measures the same state in the basis |b1 = |u =
|0 + |1 , √ 2
|b2 =
|0 − |1 , √ 2
then one obtains the eigenvalue of |b1 with probability one, and the state does not change. 2.3. MACH-ZEHNDER: THE SOLUTION OF THE RIDDLE
Let us now show how quantum mechanics explains the experimental results of the Mach-Zehnder interferometer. Observe that the interferometer consists of an experiment repeated twice in the same way, namely an interaction of a photon with a half-silvered mirror. The elementary experiment has two possible inputs: the photon arrives either from the glass-side, or from the
ˇ EP ˇ AN ´ HOLUB ST
156
silver-side of the mirror, and similarly two possible outputs: it leaves the mirror on the glass side or on the silver side. (In Figure 1 only one possible input for the mirror A is shown.) Schematically the situation can be depicted as follows:
The detection of the photon on one of the two paths is an observable. Denote the basis for this observable by |0, |1. The photon is in state |0 if it travels along the upper path (mirror U in Figure 1) and in state |1 if traveling along the lower path (mirror L). The key to the riddle is the superposition. The state of the photon can be a mixture of both basis states, and that’s exactly what, according to quantum mechanics, happens after the interaction of the photon with the beamsplitter. The transformation is given by |0 →
(|0 + |1) , √ 2
|1 →
(|0 − |1) . √ 2
Recall that |0, |1 are vectors of C2 : 0 1 . , |1 = |0 = 1 0 The matrix of the interaction is then 1 1 1 , √ 2 1 −1 which is the above Hadamard transform. The action of the beamsplitter on a photon in a state α|0 + β|1 can be portrayed as:
In our description of the experiment in Figure 1 we suppose that the photon begins in state |0. Therefore after the interaction with the mirror A it will be √ in state (|0 + |1)/ 2, and, by Postulate 3, the probability that the photon will be detected by the detector 1 (2 resp.) is exactly one half. After detection the photon will be in a basis state corresponding to the detector by which it was detected.
QUANTUM INFORMATION
157
An inspection of Figure 1 shows that the entire Mach-Zehnder interferometer can be schematized as:
The matrix of the second beamsplitter is 1 1 1 , √ 2 −1 1 and the overall effect of the apparatus is given by 1 1 1 1 1 1 01 ·√ = , √ 10 2 −1 1 2 1 −1 which agrees with experimental results: the photon which enters along the upper path will be detected in the lower one (and vice versa).
3. Complexity and Circuits An algorithm is a method to solve a problem, which consists of many (typically infinitely many) instances. For example, an instance of the traveling salesman problem (TSP) is a weighted graph, and the solution is the cheapest path through all nodes. An algorithmic solution of the problem must describe a finite and uniform procedure which solves all possible instances. It is not clear, a priori, which instructions should be accepted in an algorithm. One will surely not allow an instruction “find the cheapest path” in the algorithm for TSP, since instructions have to be clear and elementary. On the other hand, restricting the set of tools can result in too weak a concept of the algorithm. During the twentieth century the Turing machine (TM) was accepted as the correct expression of what we mean by algorithm. This general agreement is based on the fact that all alternative models are either weaker or equivalent to TM. Whence the Church–Turing thesis: Any problem, which can be algorithmically solved, can be solved by the TM.
The thesis cannot be proved, of course, since it hinges on an intuitive notion of algorithmic solution. In fact, it is something between a claim and a definition. A natural question arises: what about possible quantum computers? Could they do more than TM? The answer is negative. It is possible to simulate all quantum processes by the conventional computer, it just is very complicated and slow. Consequently, the right question is whether quantum computers can
158
ˇ EP ˇ AN ´ HOLUB ST
do something substantially faster than TM. The answer to this question is not so simple. We first have to specify what is meant by “substantially faster”, and therefore how the complexity of an algorithm should be measured. The usual concept of algorithmic complexity deals with a function f (n), which gives an upper bound for the number of elementary steps leading to the solution of any instance of length n. It turns out that given an algorithm, various kinds of classical computers do not differ too much in their ability to execute the algorithm. More precisely, if the complexity of an algorithm is given by a polynomial on the traditional TM, then it is given by a (smaller) polynomial also on our best contemporary computers. Somehow, since Turing, we have not invented better computers, we have only constructed faster TM. It seems, therefore, that whether an algorithm is polynomial is independent of the machine we use. In this respect (hypothetical) quantum computers can beat TM. As we shall see, there is a problem—although a bit artificial—which cannot be solved in polynomial time by TM, but is easy for a quantum computer. Even here, however, the situation is not so clear if we allow probabilistic TMs, i.e. algorithms which give the right answer with some required probability close to one. Of course, deterministic algorithms are always better, but if the error probability of a probabilistic TM is lower than the probability that your computer will be destroyed by a meteorite, it does not seem very reasonable to insist on deterministic algorithms. It is not known whether quantum probabilistic algorithms are stronger than classical probabilistic ones, and it seems unlikely that this could be proved in the near future, since the fact would imply extremely hard theoretical claims. From the theoretical point of view, therefore, quantum computers are not convincingly better than classical ones. In practice, however, quantum algorithms have an ace in the hole: there is a practical probabilistic quantum algorithm which factors large numbers. The intractability assumption of such factorizations is the cornerstone of many presently used secure algorithms. 3.1. BOOLEAN AND QUANTUM CIRCUITS
Boolean circuits are a convenient way to describe an algorithm for a given input length, since each algorithm can be seen as a device computing a boolean function. The input is encoded as a sequence of 0s and 1s, and the output as well. In the previous chapter we represented the beamsplitter in a way which recalls the schematic notation of logical gates. This was not by chance. We want to view quantum phenomena as building blocks for construction of algorithms. There is, however, one important difference between classical logical gates and quantum ones: quantum processes are all reversible, while
159
QUANTUM INFORMATION
for example NAND, which is a universal boolean gate, is irreversible. It is impossible to find out whether the resulting 1 came from two zeros or from 0,1. So far we have discussed the possibility that quantum computers are stronger than classical ones. Now it turns out that the possibilities of quantum gates may be limited since they have to be reversible. In reality, every classical circuit can be simulated in a reversible way, perhaps with some auxiliary inputs. There are universal reversible gates, for example the Toffoli gate T , which has three input and three output bits. It is defined by T : (a, b, c) → (a, b, c ⊕ ab), where ⊕ denotes the logical sum (i.e. XOR). The Toffoli gate is reversible, since T ◦ T = Id: (a, b, c) → (a, b, c ⊕ ab) → (a, b, c ⊕ ab ⊕ ab) = (a, b, c), and it is universal, since NAND(a, b) = T (a, b, 1). In order to simulate boolean circuits by quantum circuits, it is therefore enough to construct a quantum Toffoli gate, which is in principle possible. Every quantum circuit will have a lot of auxiliary bits as is clear from the following comparison. a
a
[Diagram: a classical NAND gate takes inputs $a, b$ and outputs $\overline{ab}$; the reversible NAND realized by the Toffoli gate takes inputs $a, b, 1$ and outputs $a, b, \overline{ab}$, carrying its first two inputs through unchanged.]
ˇ EP ˇ AN ´ HOLUB ST
160
Shor’s algorithm is, from the quantum point of view, a direct application of the Fourier transform, which is a well known tool able to detect periodicity. It allows to find the order of a chosen element a in the cyclic group Z N , where N is the factored number, which allows, with a high probability, to find a factor. The key capability of quantum computers is therefore fast computation of the Fourier transform. While the classical Fast Fourier Transform works in time O(N log N ), the Quantum Fourier Transform is polynomial in log N , which means an exponential speedup. Here we explain in detail a simple version of Deutch-Jozsa algorithm. Although it has limited importance for practical problems, the algorithm makes clear why the quantum approach can have advantages over the classical one. First we need some more theory regarding composite systems. 4.1. COMPOSITE SYSTEMS
Consider a quantum system made up of n smaller systems. Then we have the following additional Postulate 4 The system V consisting of systems V1 , V2 , . . . , Vn is the tensor product V = V1 ⊗ V2 ⊗ · · · ⊗ Vn . This means that it is a n1 di dimensional system, where di is the dimension of Vi , and the basis of V is the set {|v j1 ⊗ · · · ⊗ |v jn | ji = 1, . . . , di , i = 1, . . . , n}. We usually consider only systems composed of several qubits. They have dimension 2n , where n is the number of qubits. The notation of basis vectors |k1 ⊗ · · · ⊗ |kn , where ki ∈ {0, 1} is often simplified into |k1 . . . kn . This is a very clever notation, since if sequences k1 . . . kn are understood as binary numbers, one obtains the basis {|0, |1, |2, . . . , |2n − 1} of the composite system. For example the basis of the four dimensional system is {|00, |01, |10, |11}, which indicates its possible decomposition into two qubits. It is important to note that, although the composite system is a tensor product of smaller systems, it also contains states which cannot be written as tensor product of vectors from those systems; it cannot be decomposed. For
QUANTUM INFORMATION
161
instance, the state |w =
|00 + |11 √ 2
of the four dimensional system is not a product of two qubits, as is easy to verify. The two qubits are said to be entangled. Note that if two qubits are in the state |w, they will with probability 1/2 have both values corresponding to |0, or both values corresponding to |1. This implies a very strange fact, which challenges basic intuitions of classical physics. It is experimentally possible to prepare entangled qubits (photons, for instance), which can be then separated to a distance of many kilometers. If one of the qubits is measured, the outcome will immediately determine the outcome of the measurement of the distant entangled qubit. In other words, quantum mechanics allows non-local effects, which was one of the motives for Einstein’s disconcertion.
4.2. DEUTSCH’S ALGORITHM
Deutsch’s problem is linked to origins of quantum algorithms. We are given a black box function f : {0, 1}2 → {0, 1}, and have to decide whether it is constant or not. In the classical setting it is obvious that we need to make two queries, i.e. to learn both values of f , in order to decide the question. On the other hand, the question asks for just one bit of information: constant yes, or no? Unfortunately there is no way to ask just this question. But that is exactly what a quantum computer can do. Deutch’s algorithm shows how to decide the question by a single query. First it is important to clarify how a quantum black box function is given. The function f is in general not injective, therefore irreversible, and we have seen that all quantum circuits have to be reversible. The standard way to represent such functions is the following gate, with an auxiliary input and output:
x
[Gate diagram: inputs $x$ and $y$ enter the box $U_f$; the outputs are $x$ and $y \oplus f(x)$.]
162
The U f transform acts on the two qubit system and is defined on the basis vectors |x ⊗ |y by Uf
|x ⊗ |y −→ |x ⊗ |y ⊕ f (x). It is easy to see that it just permutes the basis vectors |00, |01, |10, |11. Note that U f ◦ U f = Id, therefore the transform is unitary. This is the first two-qubit gate we see here. It should not be confused with the beamsplitter gate, which is one-qubit, and in this context should be depicted as
where |v = H |u, or even better as
to indicate that we do not care about whether the Hadamard gate is physically realised by the beamsplitter or otherwise. The quantum circuit solving Deutsch’s problem is quite simple. It consists, apart from the black box, of three Hadamard gates: H
0
x
H
x
Uf H
1
y
y f(x)
s2
s1
s3
s4
We have indicated the four stages of the algorithm by vertical lines. In the initial stage, the two qubit are in the state s1 = |01. In the second stage we obtain s2 =
|0 + |1 |0 − |1 ⊗ √ . √ 2 2
What happens in the black box depends, of course, on the function f . The simplest case is f (0) = f (1) = 0, when U f is the identity. Therefore s3 =
|0 + |1 |0 − |1 ⊗ √ √ 2 2
for f (0) = f (1) = 0.
163
QUANTUM INFORMATION
If f (0) = f (1) = 1, the action of U f is given by |00 → |01
|01 → |00
|10 → |11
|11 → |10.
Then 1 1 s3 = U f (|00 − |01 + |10 − |11) = (|01 − |00 + |01 − |10) = 2 2 |0 + |1 |0 − |1 =− √ ⊗ √ . 2 2 Similarly we can proceed with the other two possibilities to obtain altogether ⎧ |0+|1 |0−|1 ⎪ ⎨ ± √2 ⊗ √2 if f (0) = f (1), s3 = ⎪ ⎩ ± |0−|1 √ √ ⊗ |0−|1 if f (0) = f (1). 2 2 Finally, s4 =
⎧ ⎪ ⎨ ±|0 ⊗
|0−|1 √ 2
if f (0) = f (1),
⎪ ⎩ ±|1 ⊗
|0−|1 √ 2
if f (0) = f (1).
Now it suffices to measure the first qubit. The eigenvalue of |0 will disclose that f is constant, the eigenvalue of |1 the opposite. Deutch’s algorithm shows the basic idea of all quantum algorithms: the superposition of states allows, in a sense, to compute many values at the same time. Note that after the Hadamard transforms, a combination of all basis vectors enters the gate U f . On the other hand, this alone does not mean that after the evaluation of the combination we have a direct approach to any value we wish to know, since there is no obvious way to extract the desired information from the result. For example, if we measured the first qubit of the state s3 , we would get both eigenvalues with the same probability regardless of f .
5. Quantum Information Information science is not limited to algorithms and solving problems. An equally important task is sending information over a channel, and doing so efficiently and often also secretly, which gives rise to classical coding theory and cryptology. What happens if we consider a channel which has quantummechanical properties, instead of the classical one? Quantum mechanics suggest that quantum systems contain much more information than classical bits. Superposition of states allows parallel transmission of a lot of information.
164
ˇ EP ˇ AN ´ HOLUB ST
Suppose for example that we want to transmit four bits bi , i = 0, . . . , 3 of classical information. We can prepare a two qubit system in superposition 3 1 |00 |01 |10 |11 + (−1)b1 + (−1)b2 + (−1)b3 , (−1)bi |i = (−1)b0 2 i=0 2 2 2 2
which contains four bits of information encoded as signs of the basis states. This is a promising idea, but alas, it does not work. The irreparable reason is that it is impossible to gain the information from the system. There is no way to distinguish reliably between the sixteen possible states suggested above. By a measurement we obtain again just two bits of information, and maybe not even that, if the measurement is not reasoned: we have already seen that to measure the state |0 + |1 √ 2 yields no information at all, since the outcome is a completely random bit. The problem is that each measurement, by Postulate 3, is destructive. Some information is always lost as soon as we measure a superposition of basis states. 5.1. NO-CLONING THEOREM
One may think of the following solution. Make many copies of the received state, and then, by many different experiments, for example by repeated measurements in different bases, learn what the original state was like. Unfortunately, this idea does not help either. Quantum mechanics prohibits copying states! More precisely, it allows one to make a copy only of basis states, which corresponds to the classical copying of information, and hence yields no advantage. The result is known as the No-cloning theorem, and we are going to prove it.

First we have to formulate the claim properly. Suppose we have an unknown state $|u\rangle$ and want to copy it. This means we want to implement the transformation

$$|u\rangle \otimes |0\rangle \xrightarrow{U} |u\rangle \otimes |u\rangle.$$
The choice of $|0\rangle$ as the "blank state" is arbitrary, but it has to be fixed. We do not know anything about $u$, and therefore cannot choose the blank state to somehow fit the copied one. By Postulate 2, the transformation has to be unitary. The No-cloning theorem now claims that if $U$ is a unitary transformation, then the desired copying equality can hold for two particular states $|a\rangle, |b\rangle$, but for no nontrivial superposition of them.
To prove the theorem, let

$$U(|a\rangle \otimes |0\rangle) = |a\rangle \otimes |a\rangle, \qquad U(|b\rangle \otimes |0\rangle) = |b\rangle \otimes |b\rangle.$$

Consider now a general state $\alpha|a\rangle + \beta|b\rangle$. Since $U$ is a linear operator, we have

$$U((\alpha|a\rangle + \beta|b\rangle) \otimes |0\rangle) = \alpha \cdot U(|a\rangle \otimes |0\rangle) + \beta \cdot U(|b\rangle \otimes |0\rangle) = \alpha \cdot |a\rangle \otimes |a\rangle + \beta \cdot |b\rangle \otimes |b\rangle.$$

Suppose that $U$ copies $\alpha|a\rangle + \beta|b\rangle$. Then also

$$U((\alpha|a\rangle + \beta|b\rangle) \otimes |0\rangle) = (\alpha|a\rangle + \beta|b\rangle) \otimes (\alpha|a\rangle + \beta|b\rangle) = \alpha^2 \cdot |a\rangle \otimes |a\rangle + \alpha\beta\,|a\rangle \otimes |b\rangle + \beta\alpha\,|b\rangle \otimes |a\rangle + \beta^2 \cdot |b\rangle \otimes |b\rangle.$$

Comparing the two expressions, it is obvious that either $\alpha$ or $\beta$ must be zero, which completes the proof.

5.2. QUANTUM CRYPTOGRAPHY
So far we have spoken about drawbacks of quantum information, which essentially follow from the destructive nature of measurement. That very feature, however, turns out to be very useful from the security point of view. In short, if an eavesdropper intercepts a quantum channel, he or she cannot escape unnoticed. We describe the cryptographic protocol BB84, which exploits this basic idea. The name of the protocol comes from the fact that it was published by Bennett and Brassard (1984). The aim of the protocol is to exchange a secret key over a public quantum channel. The key can subsequently be used for other cryptographic tasks.

Suppose A and B want to exchange a secret key of length $n$. Then A generates two sequences $b_1, \ldots, b_m$ and $c_1, \ldots, c_m$ of $m = \delta n$ random bits. The number $\delta$ has the following significance: during the protocol many transmitted qubits will be discarded. Therefore $\delta$ is chosen so that at least $n$ usable bits remain with the required high probability. A then encodes each bit by a qubit in the following way. Denote

$$|+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |-\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}}.$$
There are now two possibilities to encode each bit $b_i$ by a qubit $|u_i\rangle$:

$$0 \to |0\rangle, \quad 1 \to |1\rangle \qquad \text{or} \qquad 0 \to |+\rangle, \quad 1 \to |-\rangle.$$
A chooses the encoding according to the values of $c_i$. If $c_i = 0$, the bit $b_i$ is encoded by $|0\rangle$ or $|1\rangle$; if $c_i = 1$, by $|+\rangle$ or $|-\rangle$. All qubits are now transmitted through a public channel to B, who measures $|u_i\rangle$ in a randomly chosen basis $d_i$. If $d_i \neq c_i$, the outcome is a useless random number. If, on the other hand, $d_i = c_i$, the output is equal to the input. After the measurement, both A and B make their sequences of basis choices $c_i$ and $d_i$ public. It is now clear which bits are usable for the shared secret. Suppose there are at least $2n$ of them (which happens with high probability, depending on $\delta$). In the control stage A and B randomly choose half of the usable bits to check publicly, seeing whether the values agree. They should agree, and if they do not, then the channel was intercepted (or is otherwise unreliable). Therefore, if a disagreement appears in some chosen number $t$ of control bits, the exchange is discarded. The eavesdropper E is successful on the $i$th bit only if $c_i = d_i = e_i$, where $e_i$ is the basis in which E measured $|u_i\rangle$, and only if the bit was not chosen for the control. Since the basis in which a bit is encoded is random and is disclosed only after the transmission, an interception by E will, with high probability, cause a detectable disturbance of the transmitted signal. Note that the no-cloning theorem plays an important role here. Even if the transmission of the signal is public, an eavesdropper cannot make a copy of it for further inspection.
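The logic of the protocol is easy to model classically. The following illustrative Python sketch (added here; the parameters are arbitrary, and the intercept-resend attack is modeled statistically rather than with actual quantum states) simulates the basis choices and the sifting step, showing how an eavesdropper raises the disagreement rate on the usable bits:

```python
import numpy as np

rng = np.random.default_rng(42)
m = 2000                       # number of transmitted qubits (arbitrary)
eavesdrop = True

b = rng.integers(0, 2, m)      # A's secret bits
c = rng.integers(0, 2, m)      # A's basis choices (0: {|0>,|1>}, 1: {|+>,|->})
d = rng.integers(0, 2, m)      # B's basis choices

received = b.copy()
if eavesdrop:                  # intercept-resend attack
    e = rng.integers(0, 2, m)  # E's basis choices
    # Where E guesses the wrong basis, her measurement and re-send
    # randomize the bit that B later measures, even in the matching basis.
    wrong = e != c
    received[wrong] = rng.integers(0, 2, wrong.sum())

# B's measurement: reliable only where d == c (elsewhere random, discarded anyway).
measured = np.where(d == c, received, rng.integers(0, 2, m))

usable = d == c                                  # sifting: keep matching bases
errors = np.mean(b[usable] != measured[usable])  # disagreement on usable bits
print(f"usable bits: {usable.sum()}, error rate: {errors:.2f}")
# roughly 0.25 with the eavesdropper present, 0.00 without
```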
5.3. PRACTICAL REALIZATIONS

Although the theory of quantum information is very nice, there is an obvious question: how much of the theory can be implemented? The effort to build quantum computers by such research giants as IBM, the U.S. Department of Defense, and the NEC Electronic Corporation is extremely intense. Opinions on the practicality of quantum computers differ from scientist to scientist, and the situation is changing very quickly. The difficulties encountered are enormous, and it is hard to say whether they are fundamental, or whether everything is just a question of technology development. The theoretical results described in this chapter are experimentally verifiable. The main problem is that quantum systems are very unstable. Quantum mechanics works for closed quantum systems, and it is difficult to avoid an interaction of the system with the environment and to keep the system under control. For reasonably large systems this is impossible so far.
In 2001 IBM announced an experimental realization of a 7-qubit computation which implemented Shor's algorithm to verify that $15 = 3 \cdot 5$. At the moment, however, it seems that the approach used in this experiment cannot be extended beyond 10 qubits. The approach is called nuclear magnetic resonance (NMR); it handles a large number of molecules (about $10^8$), which produce a macroscopically detectable signal. An alternative approach is called the ion trap, and deals with single atoms cooled to a very low temperature. This approach seems to be more promising at the moment. In December 2005, researchers from the University of Michigan announced that they had constructed an ion trap using a technology similar to classical chips (Stick et al., 2006).

In the area of quantum communication the situation is much more optimistic. The first computer network in which communication is secured with quantum cryptography was unveiled in 2004 in Cambridge, Massachusetts. It is based on optical fibers and covers a distance of about 10 km. The quantum key distribution described in this chapter was experimentally demonstrated in 1993 over a distance of several tens of kilometers. However, at the present time the maximum is about 50 km, due to weakening of the signal.
References
Bennett, C. H. and Brassard, G. (1984) Quantum Cryptography: Public Key Distribution and Coin Tossing, In Proceedings of the IEEE International Conference on Computers, Systems, and Signal Processing, pp. 175–179.
Bohr, N. (1935) Can Quantum-Mechanical Description of Physical Reality be Considered Complete?, Physical Review 48, 696–702.
Einstein, A., Podolsky, B., and Rosen, N. (1935) Can Quantum-Mechanical Description of Physical Reality be Considered Complete?, Physical Review 47, 777–780.
Stick, D., Hensinger, W. K., Olmschenk, S., Madsen, M. J., Schwab, K., and Monroe, C. (2006) Ion Trap in a Semiconductor Chip, Nature Physics 2, 36–39.
Turing, A. M. (1936) On Computable Numbers, with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society (2) 42, 230–265.
MULTIFRACTAL ANALYSIS OF IMAGES: NEW CONNEXIONS BETWEEN ANALYSIS AND GEOMETRY

Yanick Heurteaux¹ ([email protected]) and Stéphane Jaffard² ([email protected])
¹ Laboratoire de Mathématiques, Université Blaise Pascal, 63177 Aubière Cedex, France
² Laboratoire d'Analyse et de Mathématiques Appliquées, Université Paris XII, 61 Avenue du Général de Gaulle, 94010 Créteil Cedex, France
Abstract. Natural images can be modelled as patchworks of homogeneous textures with rough contours. The following stages play a key role in their analysis:
- Separation of each component
- Characterization of the corresponding textures
- Determination of the geometric properties of their contours.
Multifractal analysis proposes to classify functions by using as relevant parameters the dimensions of their sets of singularities. This framework can be used as a classification tool in the last two steps enumerated above. Several variants of multifractal analysis were introduced, depending on the notion of singularity which is used. We describe the variants based on Hölder and $L^p$ regularity, and we apply these notions to the study of functions of bounded variation (indeed the BV setting is a standard functional assumption for modelling images, which is currently used in the first step, for instance). We also develop a multifractal analysis adapted to contours, where the regularity exponent associated with points of the boundary is based on an accessibility condition. Its purpose is to supply classification tools for domains with fractal boundaries.
Key words: pointwise exponents, fractal boundaries, multifractal analysis
1. Mathematical Modelling of Natural Images
In order to develop powerful analysis and synthesis techniques in image processing, a prerequisite is to split the image into simpler components, and to develop some classification procedures for these components. Consider the example of a natural landscape: It consists of a superposition of different pieces
which present some homogeneity. Perhaps there will be a tree in the foreground, mountains in the background and clouds at the top of the picture. An efficient analysis procedure should first separate these components, which display completely unrelated features, and analyse each of them separately, since they individually present some homogeneity. Considering an image as a superposition of overlapping components is referred to as the "dead-leaves" model, introduced by Matheron; see (Matheron, 1975) and (Bordenave et al., 2006) for recent developments. Each piece appears as a relatively homogeneous part which is "cut" along a certain shape Ω. The homogeneous pieces are the "textures", and their modelling can be performed by interpreting them as the restriction to the "shape" Ω of random fields on $\mathbb{R}^2$ with particular statistical properties (stationarity, ...). If a statistical model depending on a few parameters is picked, then one needs to devise a robust statistical test in order to estimate the values of these parameters. Afterwards, the test can be used as a classification tool for these textures. Furthermore, once the relevant values of the parameters have been identified, the model can be used for simulations. The procedure is now standard to classify and generate computer simulations of clouds, for instance (Arneodo et al., 2002; Naud et al., 1997). Another problem is the modelling of the shape of Ω; indeed, natural scenes often do not present shapes with smooth edges (this is typically the case for the examples of trees, mountains or clouds that we mentioned), and the challenge here is to develop classification tools for domains with non-smooth (usually "fractal") boundaries. Until recently, the only mathematical tool available was the box-dimension of the boundary (see Definition 3.1), which is an important parameter but nevertheless very insufficient for classification (many shapes share the same box dimension for their boundary, but clearly display very different features). Let us come back to the separation of the image into "simpler" components that present some homogeneity. It can be done using a "global" approach: The image is initially stored as grey-levels f(x, y) and is approximated by a simple "cartoon" u(x, y). What is meant by "simple" is that textures will be replaced by smooth pieces and rough boundaries by piecewise smooth curves.¹ The building of a mathematical model requires summarizing these qualitative assumptions by choosing an appropriate function space setting. In practice, the space BV (for "bounded variation") is usually chosen. A function f belongs to BV if its gradient (in the distribution sense) is a bounded measure (the name BV refers to the one-dimensional case, where a function f belongs to BV if the sums $\sum_i |f(x_{i+1}) - f(x_i)|$ are uniformly bounded, no matter how we choose the finite increasing sequence $(x_i)$).
¹ This kind of simplification was anticipated by Hergé in his famous "Tintin" series and by his followers of the Belgian "ligne claire" school.
Indeed, the space BV presents several of the required features: It allows only relatively smooth textures but, on the other hand, it allows for sharp discontinuities along smooth lines (or hypersurfaces, in dimension > 2). It is now known that natural images do not belong to BV (Gousseau and Morel, 2001), but this does not prevent BV being used as a model for the "sketch" of an image: f is decomposed as a sum u + v, where u ∈ BV and v is an approximation error (for instance, it should have a small $L^2$ norm), and one uses a minimization algorithm in order to find u. In such approaches, we may expect that the discontinuities of the "cartoon" u will yield a first approximation of the splitting we were looking for. Such decompositions are referred to as "u + v" models and lead to minimization algorithms which are currently in use; they were initiated by L. Rudin and S. Osher, and recent mathematical developments were performed by Meyer (2001) and the references therein. Once this splitting has been performed, one can consider the elementary components of the image (i.e. shapes that enclose homogeneous textures) and try to understand their geometric properties in order to obtain classification tools; at this point, no a priori assumption on the function is required; one tries to characterize the properties of the textures and of the boundaries by a collection of relevant mathematical parameters; these parameters should be effectively computable on real images in order to be used as classification parameters and hence for model selection. Multifractal analysis is used in this context: It proposes different pointwise regularity criteria as classification tools and relates them to "global" quantities that are actually computable. The different pointwise quantities (regularity exponents) used in multifractal analysis are presented in Section 2, where we also recall their relationships; see Jaffard (2004). In Section 3, we deal with "global" aspects. The tools (fractional dimensions) used in order to measure the sizes of sets with a given pointwise regularity exponent are defined (they are referred to as spectra of singularities). We also draw a bridge between these local analysis tools and the global segmentation approach described above. The implications of the BV assumption on the multifractal analysis of a function are derived. The results of this section supply new tools to determine whether particular images (or homogeneous parts of images) belong to BV. In Sections 4 and 5 we concentrate on the analysis of domains with fractal boundaries. Section 4 deals with general results concerning the pointwise exponents associated with these domains, and Section 5 deals with their multifractal analysis. Apart from image processing, there are other motivations for the multifractal analysis of fractal boundaries, e.g. in physics and chemistry: turbulent mixtures, aggregation processes, rough surfaces; see (Jaffard and Melot, 2005) and references therein.
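To make the "u + v" idea concrete, here is a minimal numerical sketch of ours (not from the text): it performs a Rudin–Osher-type decomposition by gradient descent on the energy ‖f − u‖² + λ·TV_ε(u), with a smoothed total variation so that the gradient is defined everywhere. The parameters λ, ε, the step size and the test image are arbitrary illustrative choices.

```python
import numpy as np

def tv_denoise(f, lam=0.2, eps=1e-2, step=0.05, n_iter=500):
    """Crude 'u + v' sketch: gradient descent on ||f-u||^2 + lam*TV_eps(u),
    where TV_eps(u) = sum of sqrt(|grad u|^2 + eps^2)."""
    u = f.copy()
    for _ in range(n_iter):
        # forward differences (replicated last row/column)
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        norm = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / norm, uy / norm
        # backward-difference divergence; the wrap-around at the border
        # is a harmless simplification for this illustration
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= step * (2.0 * (u - f) - lam * div)
    return u

rng = np.random.default_rng(0)
f = np.zeros((64, 64)); f[16:48, 16:48] = 1.0     # a "cartoon" square
f += 0.2 * rng.standard_normal(f.shape)           # plus noise/"texture"
u = tv_denoise(f)                                 # sketch component u
v = f - u                                         # oscillatory component v
print(f"||v|| = {np.linalg.norm(v):.1f}, u range = [{u.min():.2f}, {u.max():.2f}]")
```

The point of the sketch is only that u keeps the sharp discontinuity along the square's edge while v absorbs the oscillatory part, as the "u + v" models predict.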
2. Pointwise Smoothness
Each variant of multifractal analysis is based on a definition of pointwise smoothness. In this section, we introduce the different definitions used and explain their motivations and their relationships.
2.1. POINTWISE EXPONENTS FOR FUNCTIONS AND MEASURES
The simplest notion of smoothness of a function is supplied by $C^k$ differentiability. Recall that a bounded function f belongs to $C^1(\mathbb{R}^d)$ if it has everywhere partial derivatives $\partial f/\partial x_i$ which are continuous and bounded; $C^k$ differentiability for k ≥ 2 is defined by recursion: f belongs to $C^k(\mathbb{R}^d)$ if it belongs to $C^1(\mathbb{R}^d)$ and each of its partial derivatives $\partial f/\partial x_i$ belongs to $C^{k-1}(\mathbb{R}^d)$. Thus a definition is supplied for uniform smoothness when the regularity exponent k is an integer. Taylor's formula follows from the definition of $C^k$ differentiability and states that, for any $x_0 \in \mathbb{R}^d$, there exist C > 0, δ > 0 and a polynomial $P_{x_0}$ of degree less than k such that
$$\text{if } |x - x_0| \le \delta, \text{ then } |f(x) - P_{x_0}(x)| \le C|x - x_0|^k.$$
This consequence of $C^k$ differentiability is just in the right form to yield a definition of pointwise smoothness which also makes sense for fractional orders of smoothness; following a usual process in mathematics, this result was turned into a definition.
Definition 2.1. Let α ≥ 0 and $x_0 \in \mathbb{R}^d$; a function $f : \mathbb{R}^d \to \mathbb{R}$ is $C^\alpha(x_0)$ if there exist C > 0, δ > 0 and a polynomial $P_{x_0}$ of degree less than α such that
$$\text{if } |x - x_0| \le \delta, \text{ then } |f(x) - P_{x_0}(x)| \le C|x - x_0|^\alpha. \quad (1)$$
The Hölder exponent of f at $x_0$ is $h_f(x_0) = \sup\{\alpha : f \text{ is } C^\alpha(x_0)\}$.
Remarks: The polynomial $P_{x_0}$ in (1) is unique and, if α > 0, the constant term of $P_{x_0}$ is $f(x_0)$; $P_{x_0}$ is called the Taylor expansion of f at $x_0$ of order α. (1) implies that f is bounded in a neighbourhood of $x_0$; therefore, the Hölder exponent is defined only for locally bounded functions; it describes the local regularity variations of f. Some functions have a constant Hölder exponent: They display a "very regular irregularity". A typical example is Brownian motion B(t), which satisfies almost surely: $\forall x$, $h_B(x) = 1/2$. Hölder regularity is the most widely used notion of pointwise regularity for functions. However, it suffers from several drawbacks; one of them appeared at the very beginning of the introduction of multifractal analysis, in the mid-eighties; indeed, multifractal analysis was introduced as a tool to study the velocity of turbulent fluids, which is not necessarily a locally bounded function (see Parisi and Frisch (1985)); and, as mentioned above, Hölder regularity can only be applied
to locally bounded functions. Several mathematical drawbacks were already discovered at the beginning of the sixties by Calderón and Zygmund (1961). Another one, which appeared recently, is that the Hölder exponent of a function which has discontinuities cannot be deduced from the size of its wavelet coefficients. This is a very serious drawback for image analysis, since images always contain objects partly hidden behind each other (this is referred to as the "occlusion phenomenon"), and therefore necessarily display discontinuities. If B is a ball, let $\|f\|_{B,\infty} = \sup_{x\in B}|f(x)|$ and, if $1 \le p < \infty$,
$$\|f\|_{B,p} = \left(\frac{1}{\mathrm{Vol}(B)}\int_B |f(x)|^p\,dx\right)^{1/p};$$
finally, let $B_r = \{x : |x - x_0| \le r\}$ (not mentioning $x_0$ in the notation won't introduce ambiguities afterwards). A clue to understanding how the definition of pointwise Hölder regularity can be weakened (and therefore extended to a wider setting) is to notice that (1) can be rewritten $\|f - P_{x_0}\|_{B_r,\infty} \le Cr^\alpha$. Therefore, one obtains a weaker criterion by substituting a local $L^p$ norm for the local $L^\infty$ norm in this definition. The following definition was introduced by Calderón and Zygmund (1961).
Definition 2.2. Let $p \in [1, +\infty)$; a function $f : \mathbb{R}^d \to \mathbb{R}$ in $L^p_{loc}$ belongs to $T^p_\alpha(x_0)$ if there exist R, C > 0 and a polynomial $P_{x_0}$ of degree less than α such that
$$\forall r \le R,\quad \|f - P_{x_0}\|_{B_r,p} \le Cr^\alpha. \quad (2)$$
The p-exponent of f at $x_0$ is $h^p_f(x_0) = \sup\{\alpha : f \in T^p_\alpha(x_0)\}$. It follows from the previous remarks that the Hölder exponent $h_f(x_0)$ coincides with $h^\infty_f(x_0)$. Note that (2) can be rewritten
$$\forall r \le R,\quad \int_{B_r}|f(x) - P_{x_0}(x)|^p\,dx \le Cr^{\alpha p + d}. \quad (3)$$
These p-smoothness conditions have several advantages when compared with the usual Hölder regularity conditions: They are defined as soon as f belongs locally to $L^p$, and the p-exponent can be characterized by conditions bearing on the moduli of the wavelet coefficients of f (Jaffard, 2006). Note that the $T^p_\alpha(x_0)$ condition gets weaker as p goes down; therefore, for a given $x_0$, $p \mapsto h^p_f(x_0)$ is a decreasing function. Let us now focus on the weakest possible case, i.e. when p = 1. First, recall that, if f is a locally integrable function, then $x_0$ is a Lebesgue point of f if
$$\frac{1}{\mathrm{Vol}(B_r)}\int_{B_r}(f(x) - f(x_0))\,dx \longrightarrow 0 \quad\text{when } r \to 0. \quad (4)$$
Therefore, one can see the $T^1_\alpha(x_0)$ smoothness criterion as a way to quantify how fast convergence takes place in (4) when $x_0$ is a Lebesgue point of f. The $L^1$ norm of $f - P_{x_0}$ expresses an average smoothness of f: how close (in the mean) f is to a polynomial. Sometimes one rather wants to determine how large f is in the neighbourhood of $x_0$; then the relevant quantity is the rate of decay of the local $L^1$ norms $\int_{B(x_0,r)}|f(x)|\,dx$ when $r \to 0$. This quantity can also be considered for a nonnegative measure μ instead of an $L^1$ function. In that case, one considers $\int_{B(x_0,r)}d\mu = \mu(B(x_0,r))$. This leads us to the following pointwise size exponent.
Definition 2.3. A nonnegative measure μ belongs to $S_\alpha(x_0)$ if there exist positive constants R and C such that
$$\forall r \le R,\quad \int_{B_r}d\mu \le Cr^\alpha. \quad (5)$$
The size exponent of μ at $x_0$ is
$$s_\mu(x_0) = \sup\{\alpha : \mu \in S_\alpha(x_0)\} = \liminf_{r\to 0}\frac{\log \mu(B(x_0,r))}{\log r}.$$
If $f \in L^1$, then $s_f(x_0)$ is the size exponent of the measure $d\mu = |f(x)|\,dx$. If $f \in L^1$ and if $P_{x_0}$ in (3) vanishes, then the definitions of the 1-exponent and the size exponent of f coincide, except for the normalization factor $r^d$ in (3) which has been dropped in (5); thus, in this case, $s_f(x_0) = h^1_f(x_0) + d$. This discrepancy is due to historical reasons: Pointwise exponents for measures and for functions were introduced independently. It is however justified by the following remark, which is a consequence of two facts: $\mu((x,y]) = |F(y) - F(x)|$, and the constant term of $P_{x_0}$ is $F(x_0)$.
Remark: Let μ be a non-negative measure on $\mathbb{R}$ such that $\mu(\mathbb{R}) < +\infty$ and let F be its repartition function, defined by $F(x) = \mu((-\infty,x])$; if the polynomial $P_{x_0}$ in (3) is constant, then $s_\mu(x_0) = h^1_F(x_0)$. One does not subtract a polynomial in the definition of the pointwise exponent of a measure because one is usually interested in the size of a measure near a point, not in its smoothness. Consider the very important case where μ is the invariant measure of a dynamical system; then the size exponent expresses how often the dynamical system comes back close to $x_0$, whereas a smoothness index has no direct interpretation in terms of the underlying dynamical system. We will need to use $T^p_\alpha(x_0)$ smoothness expressed in a slightly different form:
Proposition 2.4. Let $f \in L^p_{loc}$ and $\alpha \in (0,1]$; let $f_r = \frac{1}{\mathrm{Vol}(B_r)}\int_{B_r}f(x)\,dx$. Then
$$f \in T^p_\alpha(x_0) \iff \left(\frac{1}{\mathrm{Vol}(B_r)}\int_{B_r}|f(x) - f_r|^p\,dx\right)^{1/p} \le Cr^\alpha. \quad (6)$$
Proof. Suppose that $f \in T^p_\alpha(x_0)$ and let A be the constant polynomial which appears in the definition of $T^p_\alpha$; then
$$f_r - A = \frac{1}{\mathrm{Vol}(B_r)}\int_{B_r}(f(x) - A)\,dx;$$
Hölder's inequality yields that $|f_r - A|$ is bounded by
$$\frac{1}{\mathrm{Vol}(B_r)}\left(\int_{B_r}|f(x) - A|^p\,dx\right)^{1/p}(\mathrm{Vol}(B_r))^{1/q} \le C(\mathrm{Vol}(B_r))^{1/q - 1 + 1/p}\,r^\alpha = Cr^\alpha.$$
Thus, $f_r = A + O(r^\alpha)$. As a consequence, if we replace A by $f_r$ in the quantity to be estimated in the definition of $T^p_\alpha$, the error is $O(r^\alpha)$. Conversely, suppose that (6) is true. Let $r, r'$ be such that $0 < r' \le r$. We have
$$\|f - f_r\|_{L^p(B_r)} \le Cr^{\alpha + d/p} \quad\text{and}\quad \|f - f_{r'}\|_{L^p(B_{r'})} \le C(r')^{\alpha + d/p}.$$
Since $r' \le r$, $\|f - f_r\|_{L^p(B_{r'})} \le Cr^{\alpha + d/p}$; therefore $\|f_r - f_{r'}\|_{L^p(B_{r'})} \le Cr^{\alpha + d/p}$, so that $|f_r - f_{r'}| \le Cr^\alpha$. It follows that $f_r$ converges to a limit $f_0$ when r goes to 0. Moreover $f_r = f_0 + O(r^\alpha)$, and therefore one can take $A = f_0$.
2.2. POINTWISE EXPONENTS FOR BOUNDARY POINTS OF DOMAINS
We will show how to draw distinctions between points of the boundary of a domain Ω by associating to each of them an exponent, which may change from point to point along the boundary. This will allow us afterwards to perform a multifractal analysis of the boundary, i.e. to use as a discriminating parameter between different types of boundaries the whole collection of dimensions of the corresponding sets where this exponent takes a given value. Let us check whether the exponents previously introduced can be used; the function naturally associated with a domain Ω is its characteristic function $1_\Omega(x)$, which takes the value 1 on Ω and 0 outside Ω. The Hölder exponent of $1_\Omega$ cannot play the role we expect, since it only takes two values: +∞ outside ∂Ω and 0 on ∂Ω.
Let us now consider the p-exponents and the size exponent. We start with a toy example: the domain $\Omega_\alpha \subset \mathbb{R}^2$ defined by $(x,y) \in \Omega_\alpha$ if and only if $|y| \le |x|^\alpha$. At the point (0,0) one immediately checks that, if α ≥ 1, the p-exponent takes the value $(\alpha-1)/p$ and the size exponent takes the value α + 1. On the other hand, if 0 < α < 1, the p-exponent takes the value $(1-\alpha)/(\alpha p)$ but the size exponent is always equal to 2. This elementary computation shows the following facts: The p-exponent of a characteristic function can take any nonnegative value, the size exponent can take any value larger than 2, and the 1-exponent and the size exponent give different types of information. The following proposition, whose proof is straightforward, gives a geometric interpretation of the size exponent of $1_\Omega$.
Proposition 2.5. Let Ω be a domain of $\mathbb{R}^d$ and let $x_0 \in \partial\Omega$; $1_\Omega \in S_\alpha(x_0)$ if and only if there exist R > 0 and C > 0 such that
$$\forall r \le R,\quad \mathrm{Vol}(\Omega \cap B(x_0,r)) \le Cr^\alpha.$$
The following definition encapsulates this geometric notion.
Definition 2.6. A point $x_0$ of the boundary of Ω is weak α-accessible if there exist C > 0 and $r_0 > 0$ such that
$$\forall r \le r_0,\quad \mathrm{Vol}(\Omega \cap B(x_0,r)) \le Cr^{\alpha+d}. \quad (7)$$
The supremum of all values of α such that (7) holds is called the weak accessibility exponent at $x_0$. We denote it by $\alpha_w(x_0)$. Thus $\alpha_w(x_0)$ is a nonnegative number, and it is nothing but the size exponent of the measure $1_\Omega(x)\,dx$ shifted by d. The following proposition of Jaffard and Melot (2005) shows that, for characteristic functions, all the p-exponents yield the same information, and therefore one can keep only the 1-exponent.
Proposition 2.7. Let Ω be a domain of $\mathbb{R}^d$ and let $x_0 \in \partial\Omega$; then $1_\Omega \in T^p_{\alpha/p}(x_0)$ if and only if either $1_\Omega \in S_{\alpha+d}(x_0)$ or $1_{\Omega^c} \in S_{\alpha+d}(x_0)$, where $\Omega^c$ denotes the complement of Ω.
Following the same idea as above, one can also define a bilateral accessibility exponent of a domain, which is the geometric formulation of the 1-exponent of the function $1_\Omega$; see Jaffard and Melot (2005).
Definition 2.8. A point $x_0$ of the boundary ∂Ω is bilaterally weak α-accessible if there exist C > 0 and $r_0 > 0$ such that
$$\forall r \le r_0,\quad \min\left(\mathrm{Vol}(\Omega \cap B(x_0,r)),\ \mathrm{Vol}(\Omega^c \cap B(x_0,r))\right) \le Cr^{\alpha+d}. \quad (8)$$
The supremum of all values of α such that (8) holds is called the bilateral weak accessibility exponent at $x_0$. We denote it by $\beta_w(x_0)$.
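The toy computation above can be checked numerically: a log–log regression of Monte Carlo estimates of $\mathrm{Vol}(\Omega_\alpha \cap B(0,r))$ against r should give a slope close to the size exponent α + 1 (for α ≥ 1), so that the weak accessibility exponent at the origin is the slope minus d = 2. The sketch below is our illustration; the sample sizes are arbitrary and the estimate is noisy at small r.

```python
import numpy as np

def vol_in_ball(alpha, r, n=200_000, rng=np.random.default_rng(1)):
    """Monte Carlo estimate of Vol({|y| <= |x|^alpha} ∩ B(0, r)) in R^2."""
    pts = rng.uniform(-r, r, size=(n, 2))
    inside_ball = (pts**2).sum(axis=1) <= r**2
    inside_domain = np.abs(pts[:, 1]) <= np.abs(pts[:, 0])**alpha
    return (inside_ball & inside_domain).mean() * (2 * r)**2

alpha = 2.0
radii = np.logspace(-2, 0, 10)
vols = np.array([vol_in_ball(alpha, r) for r in radii])
slope = np.polyfit(np.log(radii), np.log(vols), 1)[0]
print(f"slope ≈ {slope:.2f} (theory: alpha + 1 = {alpha + 1}), "
      f"weak accessibility exponent ≈ {slope - 2:.2f}")
```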
Remark 1: It follows immediately from the above definitions that the bilateral exponent $\beta_w(x_0)$ is the supremum of the unilateral exponents $\alpha_w(x_0)$ associated with Ω and its complement $\Omega^c$. In practice, using unilateral or bilateral exponents as classification tools in multifractal analysis will be irrelevant when Ω and $\Omega^c$ have the same statistical properties. This is the case when they are obtained by a procedure which makes them play the same role (for instance, if ∂Ω is the edge of the fracture of a metallic plate). On the other hand, unilateral exponents should yield different types of information when the roles played by Ω and its complement are very dissymmetric (electrodeposition aggregates, for instance).
Remark 2: If $1_\Omega \in BV$, then, by definition, $\mathrm{grad}(1_\Omega)$ is a measure, and therefore one could also consider an additional exponent: the size exponent of $|\mathrm{grad}(1_\Omega)|$. We won't follow this idea because, in applications, one has no direct access to the measure $\mathrm{grad}(1_\Omega)$, and we want to base our analysis only on information directly available from Ω. We will also use the following alternative accessibility exponents.
Definition 2.9. A point $x_0$ of the boundary of Ω is strong α-accessible if there exist C > 0 and $r_0 > 0$ such that
$$\forall r \le r_0,\quad \mathrm{Vol}(\Omega \cap B(x_0,r)) \ge Cr^{\alpha+d}. \quad (9)$$
The infimum of all values of α such that (9) holds is called the strong accessibility exponent at $x_0$. We denote it by $\alpha_s(x_0)$. A point $x_0$ of the boundary of Ω is bilaterally strong α-accessible if there exist C > 0 and $r_0 > 0$ such that
$$\forall r \le r_0,\quad \min\left(\mathrm{Vol}(\Omega \cap B(x_0,r)),\ \mathrm{Vol}(\Omega^c \cap B(x_0,r))\right) \ge Cr^{\alpha+d}. \quad (10)$$
The infimum of all values of α such that (10) holds is called the bilateral strong accessibility exponent at $x_0$. We denote it by $\beta_s(x_0)$. The following result yields alternative definitions of these exponents.
Proposition 2.10. Let $x \in \partial\Omega$; then
$$\alpha_w(x) + d = \liminf_{r\to 0}\frac{\log \mathrm{Vol}(\Omega \cap B(x,r))}{\log r},\qquad \alpha_s(x) + d = \limsup_{r\to 0}\frac{\log \mathrm{Vol}(\Omega \cap B(x,r))}{\log r}.$$
Similar relations hold for the indices $\beta_w(x)$ and $\beta_s(x)$. Other exponents associated with boundaries have been introduced; they were based on the notion of density, which we now recall.
Definition 2.11. Let $x_0 \in \overline\Omega$; the density of Ω at $x_0$ is
$$D(\Omega, x_0) = \lim_{r\to 0}\frac{\mathrm{Vol}(B(x_0,r)\cap\Omega)}{\mathrm{Vol}(B(x_0,r))}. \quad (11)$$
This limit does not necessarily exist everywhere; thus, if one wants to obtain an exponent which allows a classification of all points of ∂Ω, the upper density $\overline D(\Omega,x_0)$ or the lower density $\underline D(\Omega,x_0)$ should rather be used; they are obtained by taking in (11) a lim sup or a lim inf respectively. The set of points where $D(\Omega,x_0)$ differs from 0 and 1 is called the measure-theoretic boundary; see Chap. 5 of Ziemer (1989). This allows one to introduce topological notions which have a measure-theoretic content. The measure-theoretic interior of Ω is the set of points satisfying $D(\Omega,x_0) = 1$; the measure-theoretic exterior is the set of points satisfying $D(\Omega,x_0) = 0$. See Chap. 5 of Ziemer (1989) for more on these notions, which bear some similarities with the ones we will develop in Section 4.1. Note that points with a positive weak-accessibility exponent all have vanishing density, so that density exponents are a way to draw a distinction between different points of weak accessibility 0. This refinement has been pushed even further when Ω has a finite perimeter (i.e. when $1_\Omega \in BV$). Points of density 1/2 can be classified by considering points where the boundary is "close" to a hyperplane (see (Ziemer, 1989) for precise definitions); such points constitute the "reduced boundary" introduced by de Giorgi. We will come back to these classifications in Section 4.2.
3. Fractional Dimensions, Spectra and Multifractal Analysis
3.1. FRACTIONAL DIMENSIONS
In order to introduce global parameters which allow one to describe the "fractality" of the boundary of a domain, we need to recall the notions of dimension that will be used. Their purpose is to supply a classification among sets of vanishing Lebesgue measure in $\mathbb{R}^d$. The simplest notion of dimension of a set E (and the only one that is computable in practice) is the upper box-dimension. It can be obtained by estimating the number of dyadic cubes that intersect E. Recall that a dyadic cube of scale j is of the form
$$\lambda = \left[\frac{k_1}{2^j},\frac{k_1+1}{2^j}\right)\times\cdots\times\left[\frac{k_d}{2^j},\frac{k_d+1}{2^j}\right),\quad\text{where } k = (k_1,\dots,k_d)\in\mathbb{Z}^d;$$
$F_j$ denotes the set of dyadic cubes of scale j.
Definition 3.1. (Upper box-dimension) Let E be a bounded set in $\mathbb{R}^d$ and let $N_j(E)$ be the number of cubes $\lambda \in F_j$ that intersect E. The upper
box-dimension of the set E is defined by
$$\Delta(E) = \limsup_{j\to+\infty}\frac{\log N_j(E)}{\log 2^j}.$$
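Since the upper box-dimension is the one quantity here that is computable in practice, a short numerical sketch may be useful. The following illustration is ours (not from the text): it estimates Δ(E) for a point-cloud approximation of the middle-thirds Cantor set by regressing $\log N_j(E)$ on $j\log 2$; the expected slope is $\log 2/\log 3 \approx 0.631$.

```python
import numpy as np

def box_counts(points, j_max):
    """N_j(E): number of dyadic cubes of scale j hit by the point set."""
    counts = []
    for j in range(1, j_max + 1):
        cells = np.unique(np.floor(points * 2**j).astype(np.int64), axis=0)
        counts.append(len(cells))
    return counts

# points of the middle-thirds Cantor set via random ternary digits in {0, 2}
rng = np.random.default_rng(0)
digits = 2 * rng.integers(0, 2, size=(50_000, 25))
cantor = (digits / 3.0 ** np.arange(1, 26)).sum(axis=1).reshape(-1, 1)

js = np.arange(1, 11)
N = box_counts(cantor, 10)
slope = np.polyfit(js * np.log(2), np.log(N), 1)[0]
print(f"estimated box dimension: {slope:.3f} "
      f"(log 2 / log 3 = {np.log(2)/np.log(3):.3f})")
```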
This notion of dimension presents two important drawbacks. The first one is that it takes the same value for a set and for its closure. For example, the upper box-dimension of the set Q of rational numbers is equal to 1, whereas we would expect the dimension of a countable set to vanish. The second one is that it is not a σ-stable index, i.e. the dimension of a countable union of sets usually differs from the supremum of the dimensions of the sets. In order to correct these drawbacks, a very clever idea, introduced by Tricot (1982), consists in "forcing" σ-stability as follows:
Definition 3.2. (Packing dimension) Let $E \subset \mathbb{R}^d$; the packing dimension of E is
$$\dim_P(E) = \inf\left\{\sup_{i\in\mathbb{N}}\Delta(E_i)\ ;\ E \subset \bigcup_{i\in\mathbb{N}}E_i\right\},$$
where the infimum is taken over all possible "splittings" of E into a countable union. The Hausdorff dimension is the one most widely used by mathematicians.
Definition 3.3. (Hausdorff dimension) Let $E \subset \mathbb{R}^d$ and α > 0. Let us introduce the following quantities: Let $n \in \mathbb{N}$; if $R = \{\lambda_i\}_{i\in\mathbb{N}}$ is a countable collection of dyadic cubes of scales at least n which forms a covering of E, then let
$$H^\alpha_n(E, R) = \sum_{i\in\mathbb{N}}\mathrm{diam}(\lambda_i)^\alpha,\qquad H^\alpha_n(E) = \inf_R H^\alpha_n(E, R),$$
where the infimum is taken over all possible coverings of E by dyadic cubes of scales at least n. The α-dimensional Hausdorff measure of E is
$$H^\alpha(E) = \lim_{n\to+\infty}H^\alpha_n(E).$$
The Hausdorff dimension of E is
$$\dim_H(E) = \sup\{\alpha > 0\ ;\ H^\alpha(E) = +\infty\} = \inf\{\alpha > 0\ ;\ H^\alpha(E) = 0\}.$$
Remark 1. Hausdorff measures extend to fractional values of d the notion of d-dimensional Lebesgue measure; indeed, $H^d$ is the Lebesgue measure in $\mathbb{R}^d$. The Hausdorff dimension is an increasing, σ-stable index.
Remark 2. The following inequalities are always true (see Falconer (1990)):
$$0 \le \dim_H(E) \le \dim_P(E) \le \Delta(E) \le d.$$
3.2. SPECTRA OF SINGULARITIES
In all situations described in Section 2, a "pointwise smoothness function" is associated to a given signal (this may be, for example, the Hölder exponent, the p-exponent or the size exponent). In the case where the signal is irregular, it is of course impossible to describe this function point by point. That is why one tries to obtain a statistical description, by determining only the dimensions of the sets of points with a given exponent. This collection of dimensions, indexed by the smoothness parameter, is called the spectrum of singularities. Actually, two kinds of spectra are used, depending on whether one picks the Hausdorff or the packing dimension; see Theorems 5.3 and 5.4 for estimates on such spectra. In the next section, we will estimate the p-spectrum of BV functions. This p-spectrum $d^p_f(H)$ is the Hausdorff dimension of the set of points whose p-exponent is H. If p = ∞, $d^\infty_f(H)$ is simply denoted by $d_f(H)$; it is the Hausdorff dimension of the set of points where the Hölder exponent is H, and is called the spectrum of singularities of f.
3.3. MULTIFRACTAL ANALYSIS OF BV FUNCTIONS
We saw that the space BV is currently used in order to provide a simple functional setting for "sketchy" images, i.e. images which consist of piecewise smooth pieces separated by lines of discontinuities which are themselves piecewise smooth. This approach is orthogonal to the multifractal point of view; indeed, multifractal analysis makes no a priori assumption on the function considered and, therefore, is relevant also in the analysis of non-smooth textures and irregular edges. In order to go beyond this remark, it is important to understand the implications of the BV assumption on the multifractal analysis of a function. They strongly depend on the number of variables of f; therefore, though our main concern is with functions defined on $\mathbb{R}^2$, considering the general case of functions defined on $\mathbb{R}^d$ will explain some phenomena which, if dealt with only for d = 1 or 2, might appear as strange numerical coincidences. We start by recalling the alternative definitions of the space $BV(\mathbb{R}^d)$. Let Ω be an open subset of $\mathbb{R}^d$ and $f \in L^1(\mathbb{R}^d)$. By definition,
$$\int_\Omega|Df| = \sup\left\{\int_\Omega f\,\mathrm{div}\,g\ ;\ g = (g_1,\dots,g_d)\in C^1_0(\Omega,\mathbb{R}^d) \text{ and } \|g\|_\infty \le 1\right\},$$
where $\mathrm{div}\,g = \sum_{i=1}^d\frac{\partial g_i}{\partial x_i}$. This notation is justified as follows: An integration by parts shows that if $f \in C^1(\mathbb{R}^d)$,
$$\int_\Omega|Df| = \int_\Omega|\mathrm{grad}\,f|\,dx,\quad\text{where } \mathrm{grad}\,f = \left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_d}\right).$$
Definition 3.4. Let $\Omega \subset \mathbb{R}^d$ and $f \in L^1(\mathbb{R}^d)$; f belongs to $BV(\Omega)$ if $\int_\Omega|Df| < +\infty$.
Recall that the alternative definition is: $f \in BV(\Omega)$ if $f \in L^1(\Omega)$ and grad f (defined in the sense of distributions) is a Radon vector-measure of finite mass. What is the correct setting in which to perform the multifractal analysis of a BV function? In dimension 1, the alternative definition in terms of Radon measures immediately shows that a BV function is bounded (indeed, a Radon measure is the difference of two positive measures, and the primitive of a positive measure of finite mass is necessarily bounded). Therefore, one can expect that the BV assumption has a consequence on the "usual" spectrum $d_f(H)$ based on the Hölder exponent. On the other hand, if d > 1, then a BV function need not be locally bounded (consider for instance the function $1/\|x\|^\alpha$ in a neighborhood of 0, for α small enough). A simple superposition argument shows that it may even be nowhere locally bounded; therefore, we cannot expect the BV assumption to yield any information concerning the "usual" spectrum of singularities in dimension 2 or more. The following Sobolev embeddings determine precisely for which values of p a BV function locally belongs to $L^p$ (see (Giusti, 1984)).
Proposition 3.5. ((Giusti, 1984)) Let $d' = \frac{d}{d-1}$ ($d'$ is the conjugate exponent of d). If $f \in BV(\mathbb{R}^d)$ then
$$\|f\|_{d'} \le C(d)\int|Df|. \quad (12)$$
Moreover, if $B = B(x_0,r)$ and $f_B = \frac{1}{\mathrm{Vol}(B)}\int_B f(x)\,dx$,
$$\|f - f_B\|_{L^{d'}(B)} \le C(d)\int_B|Df|. \quad (13)$$
Since (12) states that $BV(\mathbb{R}^d)$ is embedded in $L^{d'}(\mathbb{R}^d)$, we can infer from this proposition that the "right" value of p in order to study the pointwise smoothness of functions in $BV(\mathbb{R}^d)$ is $p = d'$. The following result actually gives estimates of the $d'$-spectrum of BV functions.
Theorem 3.6. Let $f \in BV(\mathbb{R}^d)$. The $d'$-spectrum of f satisfies
$$d^{d'}_f(H) \le H + (d-1).$$
Proof of Theorem 3.6. If d = 1, f is the difference of two increasing functions. The theorem is a consequence of the classical bound $d(H) \le H$ for probability measures (see Brown et al. (1992)) and of the remark that, if H ≤ 1, the
size exponent of a positive measure and the Hölder exponent of its primitive coincide. We can therefore assume that d ≥ 2. We can clearly suppose that H ≤ 1. Let us consider f on the unit cube $[0,1]^d$ and let j ≥ 0. We split this cube into $2^{dj}$ dyadic cubes of width $2^{-j}$. If λ is a dyadic cube in $F_j$, let TV(λ) denote the total variation of f on the ball $B_\lambda = B(\mu_\lambda, \sqrt d\,2^{-j})$, where $\mu_\lambda$ is the center of λ, i.e. $TV(\lambda) = \int_{B_\lambda}|Df|$. Let δ > 0; denote by A(δ, j) the set of λ's such that $TV(\lambda) \ge 2^{-\delta j}$ and by N(δ, j) its cardinality. Since only a finite number C(d) of the balls $B_\lambda$ overlap,
$$N(\delta,j)\,2^{-\delta j} \le \sum_{\lambda\in A(\delta,j)}TV(\lambda) \le C(d)\int|Df|.$$
Therefore
$$N(\delta,j) \le C\,2^{\delta j}. \quad (14)$$
Let $x_0$ be a point which belongs to only a finite number of the sets A(δ, j). Let $\lambda_j(x_0)$ denote the dyadic cube of width $2^{-j}$ which contains $x_0$. For j large enough, $TV(\lambda_j(x_0)) \le 2^{-\delta j}$. If $B = B(x_0, \sqrt d\,2^{-(j+1)})$, (13) implies that
$$\|f - f_B\|_{L^{d'}(B)} \le C\int_B|Df| \le C\int_{B_\lambda}|Df| \le C\,2^{-\delta j};$$
thus, using Proposition 2.4, $f \in T^{d'}_{\delta - d/d'}(x_0)\ (= T^{d'}_{\delta - d + 1}(x_0))$. Denote
$$A_\delta = \limsup_{j\to+\infty}A(\delta,j).$$
The set $A_\delta$ consists of points that belong to an infinite number of sets A(δ, j). Then, (14) implies that $\dim_H(A_\delta) \le \delta$. If $x_0 \notin A_\delta$, we just showed that $f \in T^{d'}_{\delta - d + 1}(x_0)$. It follows that the set of points of $d'$-exponent δ − d + 1 is of Hausdorff dimension at most δ. In other words, $d^{d'}_f(\delta - d + 1) \le \delta$, hence Theorem 3.6 holds.
Remark: Let us pick δ > d − 1 but arbitrarily close to d − 1. We saw that $A_\delta$ has dimension less than δ, and if $x_0 \notin A_\delta$, then $x_0$ belongs to $T^{d'}_\alpha$ for an α > 0, so that $x_0$ is a Lebesgue point of f. It follows that, if f is a BV function, then the set of points which are not Lebesgue points of f has Hausdorff dimension at most d − 1. Related results are proved in Section 5.9 of Evans and Gariepy (1992) (see in particular Theorem 3). Theorem 3.6 only gives information on the $d'$-exponent and cannot give additional information on q-regularity for $q > d'$, since a function of $BV(\mathbb{R}^d)$ may nowhere be locally in $L^q$ for such values of q. However, images are just grey-levels at each pixel and therefore are encoded by functions that take values between 0 and 1. Therefore, a more realistic modelling is supplied by
the assumption $f \in BV \cap L^\infty$. Let us now see if this additional assumption allows us to derive an estimate on the q-spectrum.
Lemma 3.7. Let $f \in T^p_\alpha(x_0) \cap L^\infty(\mathbb{R}^d)$ for some p ≥ 1 and let q satisfy $p < q < +\infty$. Then $f \in T^q_{\alpha p/q}(x_0)$.
Proof. By assumption, $\|f - f_{B_r}\|_{L^p(B_r)} \le Cr^{\alpha + d/p}$, where $B_r$ denotes the ball $B(x_0,r)$. Let $\omega = p/q$, so that 0 < ω < 1; since f is bounded, by interpolation,
$$\|f - f_{B_r}\|_{L^q(B_r)} \le (2\|f\|_\infty)^{1-\omega}\,\|f - f_{B_r}\|^\omega_{L^p(B_r)}.$$
Therefore, if β = αp/q, then $\|f - f_{B_r}\|_{L^q(B_r)} \le Cr^{(\alpha + d/p)\omega} = Cr^{\beta + d/q}$.
Corollary 3.8. Let $f \in BV(\mathbb{R}^d) \cap L^\infty(\mathbb{R}^d)$, and $q \ge d'$. The q-spectrum of f satisfies
$$d^q_f(H) \le \frac{q}{d'}H + (d-1).$$
Remark: Of course this inequality is relevant only when $H \le \frac{d'}{q}$.
Proof. We come back to the proof of Theorem 3.6. We proved that outside the set $A_\delta$, f belongs to $T^{d'}_{\delta - d + 1}(x_0)$. It follows from the previous lemma that f also belongs to $T^q_\gamma(x_0)$ for $\gamma = \left(\delta - \frac{d}{d'}\right)\frac{d'}{q} = \frac{\delta d'}{q} - \frac{d}{q}$. Since $A_\delta$ is of dimension at most δ, the corollary follows just as at the end of Theorem 3.6.
4. Topological and Geometric Properties of the Essential Boundary
4.1. ESSENTIAL BOUNDARY AND MODIFIED DOMAIN
The geometric quantities introduced in Section 2 do not change if Ω is replaced by another set Ω̃, as long as they differ by a set of measure 0. This is clear when we consider the function $1_\Omega$ (viewed as an $L^p_{loc}$-function), the measure $1_\Omega(x)\,dx$, or the indices $\alpha_w$, $\alpha_s$, $\beta_w$ and $\beta_s$. Therefore, the only points of the boundary that are pertinent to analyse from a "measure" point of view are those for which
$$\forall r > 0,\quad \mathrm{Vol}(B(x_0,r)\cap\Omega) > 0 \quad\text{and}\quad \mathrm{Vol}(B(x_0,r)\cap\Omega^c) > 0.$$
This motivates the following definition.
Definition 4.1. (Essential boundary) Let Ω be a Borel subset of $\mathbb{R}^d$. Denote by $\partial_{ess}\Omega$ the set of points $x_0 \in \mathbb{R}^d$ such that for every r > 0,
$$\mathrm{Vol}(B(x_0,r)\cap\Omega) > 0 \quad\text{and}\quad \mathrm{Vol}(B(x_0,r)\cap\Omega^c) > 0.$$
The set $\partial_{ess}\Omega$ is called the essential boundary of Ω.
It is clear that $\partial_{ess}\Omega \subset \partial\Omega$. More precisely, we have the following characterization of $\partial_{ess}\Omega$; recall that, if A and B are subsets of $\mathbb{R}^d$, then $A\triangle B = (A\cup B)\setminus(A\cap B)$.
Proposition 4.2. Let $x \in \mathbb{R}^d$. Then $x \in \partial_{ess}\Omega$ if and only if x is a boundary point of every Borel set $\Omega'$ such that $\mathrm{Vol}(\Omega\triangle\Omega') = 0$.
Remark: In particular, $\partial_{ess}\Omega$ is a closed subset of $\mathbb{R}^d$.
Proof of Proposition 4.2. Let
$$A = \bigcap_{\mathrm{Vol}(\Omega\triangle\Omega') = 0}\partial\Omega'.$$
It is clear that $\partial_{ess}\Omega \subset A$. Conversely, suppose for example that there exists r > 0 such that $\mathrm{Vol}(\Omega\cap B(x,r)) = 0$. Define Ω′ by $\Omega' = \Omega\setminus B(x,r)$. Then $\mathrm{Vol}(\Omega\triangle\Omega') = 0$ and $x \notin \partial\Omega'$. The essential boundary can also be defined as the support of the distribution $\mathrm{grad}(1_\Omega)$. According to Proposition 4.2, it is natural to ask if there exists a modified Borel set Ω̃ which is minimal, in the sense that $\mathrm{Vol}(\Omega\triangle\tilde\Omega) = 0$ and $\partial_{ess}\Omega = \partial\tilde\Omega$.
Proposition 4.3. (Modified domain) Let Ω be a Borel set in $\mathbb{R}^d$. There exists a Borel set Ω̃ such that
$$\mathrm{Vol}(\Omega\triangle\tilde\Omega) = 0 \quad\text{and}\quad \partial_{ess}\Omega = \partial\tilde\Omega.$$
˜ = \ Bn Bn . n∈I −
n∈I +
˜ = 0. There remains to prove that ∂ ˜ ⊂ ∂ess . Let It is clear that Vol() ˜ and r > 0. Let n be such that x ∈ Bn ⊂ B(x, r ). Since x is in the x ∈ ∂ ˜ Bn ∩ ˜ = ∅. So, n ∈ I − and Vol(Bn ∩ ) > 0. In the same way, closure of , ˜ c and Bn ∩ ˜ c = ∅; thus n ∈ I + and Vol(Bn ∩ c ) > 0. x is in the closure of
MULTIFRACTAL ANALYSIS OF IMAGES
185
Finally Vol(B(x, r ) ∩ ) > 0 and Vol(B(x, r ) ∩ c ) > 0 so that x ∈ ∂ess . We can also define the essential interior and essential closure of by ◦ ess = x ∈ Rd ; ∃r > 0 ; Vol(B(x, r ) ∩ c ) = 0 and ess
= x ∈ Rd ; ∀r > 0, Vol(B(x, r ) ∩ ) > 0 .
They are respectively open and closed subsets of Rd and satisfy ess
◦ ess
∂ess = \
.
4.2. BALANCED POINTS
We now explore the topological properties of the sets of points of the essential boundary ∂ess for which either βw or βs vanishes. We begin with a definition which identifies natural subsets of the sets of points with accessibility 0. Definition 4.4. Let ⊂ Rd be a Borel set and x0 ∈ ∂ess . 1. A point x0 is strongly balanced if there exists 0 < η < 1/2 and r0 > 0 such that Vol(B(x0 , r ) ∩ ) ∀r ≤ r0 , η ≤ ≤ 1 − η. Vol(B(x0 , r )) 2. A point x0 is weakly balanced if there exists 0 < η < 1/2 such that ∀r0 > 0, ∃r ≤ r0 ;
η≤
Vol(B(x0 , r ) ∩ ) ≤ 1 − η. Vol(B(x0 , r ))
We denote by SB() (resp. WB()) the set of strongly (resp. weakly) balanced points in ∂ess . It is clear that SB() ⊂ x0 ∈ ∂ess ; βs (x0 ) = 0 and
WB() ⊂ x0 ∈ ∂ess ; βw (x0 ) = 0 .
Recall that Baire’s theorem asserts that, if E is a complete metric set, a countable intersection of open dense sets is dense. A set which contains such an intersection is called generic. Proposition 4.5. Let be a Borel subset of Rd and ∂ess its essential boundary. The set W B() of weakly balanced points is generic in ∂ess (for the
186
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
induced topology). As a consequence, the set of points x0 ∈ ∂ess such that βw (x0 ) = 0 is generic in ∂ess . Proposition 4.6. Let be a Borel subset of Rd and ∂ess its essential boundary. The set SB() of strongly balanced points is dense in ∂ess . As a consequence, the set of points x0 ∈ ∂ess such that βs (x0 ) = 0 is dense in ∂ess . Remark: It would be interesting to determine if SB() is generic in ∂ess . Proof of Proposition 4.5. We first remark that Baire’s theorem can be applied in ∂ess (because it is a closed subset of Rd ). Let x0 ∈ ∂ess and ε > 0. Lebesgue’s differentiability theorem, applied to the Borel function f = 11 asserts that, for almost every x ∈ Rd , Vol(B(x, r ) ∩ ) −→ f (x) Vol(B(x, r ))
when r −→ 0.
Recall that Vol({x ∈ B(x0 , ε/2) ; f (x) = 1}) > 0 and Vol({x ∈ B(x0 , ε/2) ; f (x) = 0}) > 0. We can then find y0 , y1 ∈ B(x0 , ε/2) such that Vol(B(y0 , r ) ∩ ) 3 ≥ Vol(B(y0 , r )) 4
and
Vol(B(y1 , r ) ∩ ) 1 ≤ Vol(B(y1 , r )) 4
when r is small enough. Let yt = t y1 + (1 − t)y0 . The intermediate value theorem applied to the continuous function t −→
Vol(B(yt , r ) ∩ ) Vol(B(yt , r ))
allows us to construct a point x1 ∈ B(x0 , ε/2) (which is equal to yt for some value of t) such that Vol(B(x1 , r ) ∩ ) 1 = . Vol(B(x1 , r )) 2 Such an open ball B(x1 , r ) will be called a “perfectly balanced” ball. The connexity of the ball B(x1 , r ) implies that it intersects ∂ess (remember that ˜ see Proposition ∂ess is the topological boundary of the modified domain , 4.3). Let On be the union of all the open balls of radius r ≤ 1/n that are “perfectly balanced”. We just have seen that On ∩ ∂ess is an open dense subset of ∂ess . So n≥1 On ∩ ∂ess is a countable intersection of open dense subsets of the essential boundary ∂ess . Moreover, if x ∈ n≥1 On ∩ ∂ess ,
MULTIFRACTAL ANALYSIS OF IMAGES
187
we can find a sequence of points xn ∈ Rd and a sequence of positive real numbers rn ≤ 1/n such that for every n ≥ 1 x ∈ B(xn , rn )
and
1 Vol(B(xn , rn ) ∩ ) = Vol(B(xn , rn )). 2
We then have 2−(d+1) Vol(B(x, 2rn )) ≤ Vol(B(x, 2rn ) ∩ ) ≤ (1 − 2−(d+1) ) ×Vol(B(x, 2rn )), which proves that x ∈ W B().
Proof of Proposition 4.6. We develop the same idea as in Proposition 4.5. We use the norm ∞ instead of the Euclidian norm in Rd and we will denote by B∞ (x, r ) the “balls” related to this norm (which are cubes!). Let x0 ∈ ∂ess and ε > 0. Using the same argument as in Proposition 4.5, we can find x1 ∈ B∞ (x0 , ε/2) and r ≤ ε/2 such that Vol(B ∞ (x1 , r ) ∩ ) Vol(B ∞ (x1 , r ))
1 = . 2
The closed cube B ∞ (x1 , r ) can be divided into 2d closed cubes of radius r/2 whose interiors do not overlap. Suppose that none of them is “perfectly balanced”. We can then find two points z 0 , z 1 such that B ∞ (z 0 , r/2) ⊂ B ∞ (x1 , r ), B ∞ (z 1 , r/2) ⊂ B ∞ (x1 , r ),
1 Vol(B ∞ (z 0 , r/2) ∩ ) > Vol(B ∞ (z 0 , r/2)) 2 1 Vol(B ∞ (z 1 , r/2) ∩ ) < Vol(B ∞ (z 1 , r/2)). 2
Using once again the intermediate value theorem, we can construct a point x2 (which is a barycenter of z 0 and z 1 ) such that B ∞ (x2 , r/2) ⊂ B ∞ (x1 , r ) and such that the ball B ∞ (x2 , r/2) is “perfectly balanced”. Iterating this construction we obtain a sequence of “perfectly balanced” cubes B ∞ (xn , r 2−(n−1) ) such that B ∞ (xn+1 , r 2−n ) ⊂ B ∞ (xn , r 2−(n−1) ). Let x∞ = limn→∞ xn and 0 < ρ ≤ r . Let us denote by n the integer such that ρ r 2−n < √ ≤ r 2−(n−1) . d We observe that
√ B (x∞ , ρ) ⊃ B∞ x∞ , ρ/ d ⊃ B∞ x∞ , r 2−n ⊃ B∞ xn+2 , r 2−(n+1) .
188
´ YANICK HEURTEAUX AND STEPHANE JAFFARD
In other words, the√ball B (x∞ , ρ) contains a “perfectly balanced” cube with length at least ρ/2 d. We deduce that ⎧ d ⎪ ρ 1 ⎪ ⎪ √ ⎨Vol(B(x∞ , ρ) ∩ ) ≥ 2 2 d (15) d ; ⎪ 1 ρ ⎪ c ⎪ √ ⎩Vol(B(x∞ , ρ) ∩ ) ≥ 2 2 d (15) asserts that x∞ ∈ SB(). Moreover, x0 − x∞ ∞ ≤ ε and the proof is finished. 4.3. THE FRACTAL DIMENSION OF THE SET OF BALANCED POINTS
We first consider the dimension of the set of points of accessibility 0. Theorem 4.7. Let be a Borel subset of Rd such that ∂ess = ∅. Then dim P (W B()) ≥ d − 1. Remark: In particular, dim P (∂ess ) ≥ d − 1. Let us begin with a lemma which is a slight modification of a well known result (see (Falconer, 1990) or (Heurteaux, 2003)). Lemma 4.8. Let G be a nonempty subset of Rd which satisfies Baire’s property (for the induced topology) and δ > 0. Suppose that for every x ∈ G, and every r > 0, (G ∩ B(x, r )) ≥ δ. Then dim P (G) ≥ δ. Proof. Suppose that G ⊂ n∈N E n . Denote by E n the closure (in Rd ) of E n . Baire’s property implies that one of the related closed sets E n ∩ G has an interior point in G. Thus there exist x ∈ G, r > 0 and n 0 ∈ N such that G ∩ B(x, r ) ⊂ E n 0 ∩ G, so that (E n 0 ) = (E n 0 ) ≥ (E n 0 ∩ G) ≥ (G ∩ B(x, r )) ≥ δ and Lemma 4.8 follows.
Proof of Theorem 4.7. As in Section 4.2, let On be the union of all “perfectly balanced” open cubes of radius r ≤ 1/n and let G = n≥1 On ∩ ∂ess ; G is a dense Gδ of the Baire space ∂ess , so that it satisfies Baire’s property. Moreover, G ⊂ W B(). According to Lemma 4.8, it is sufficient to prove that for every x ∈ G and every r > 0, (G ∩ B(x, r )) ≥ d − 1. Let x ∈ G and r > 0. We can find y ∈ Rd and ρ > 0 such that the cube B∞ (y, ρ) is “perfectly balanced” and x ∈ B∞ (y, ρ) ⊂ B(x, r ). Let us split the cube B∞ (y, ρ) into 2d j cubes of length 2− j+1 ρ which are called Ci . We want to estimate the number N j of cubes Ci that intersect G. For each cube Ci , let ω(Ci ) = Vol(Ci ∩ )/Vol(Ci ). The mean of ω(Ci ) is 1/2. So, at least 1/3r d of the ω(Ci ) are greater than 1/4
MULTIFRACTAL ANALYSIS OF IMAGES
189
and 1/3r d of the ω(Ci ) are lower than 3/4. Now, there are two possibilities: Either 1/6th of the cubes Ci are such that 1/4 ≤ ω(Ci ) ≤ 3/4; all those cubes intersect G (see the proof of Proposition 4.5 and 4.6) and N j ≥ 2d j /6. Else, there are at least 1/6th of the cubes such that ω(Ci ) ≤ 1/4 and 1/6th of the cubes such that ω(Ci ) ≥ 3/4. Let A be the union of all the closed cubes C i such that ω(Ci ) ≤ 1/2. Then 5 1 Vol(B∞ (y, ρ)) ≤ Vol(A) ≤ Vol(B∞ (y, ρ)). 6 6 Isoperimetric inequalities (see for example (Ros, 2005)) ensure that the “surface” of the boundary of A is at least Cρ d−1 . In particular, there exist at least C(ρ)2 j(d−1) couples of cubes (C, C ) such that C ∩ C = ∅, ω(C) ≤ 1/2 and ω(C ) ≥ 1/2. It follows that C ∩ G = ∅ or C ∩ G = ∅ (an intermediate cube is “perfectly balanced”). It follows that N j ≥ C2 j(d−1) . In either case, N j ≥ C2 j(d−1) . So, (G ∩ B(x, r )) ≥ d − 1.
5. Multifractal Properties of the Essential Boundary 5.1. CONSTRUCTION OF THE SCALING FUNCTION
We will construct a multifractal formalism based on the dyadic grid whose purpose is to derive the Hausdorff (or packing) dimensions of the level sets of the functions αw and αs . Recall that Fn is the set of dyadic (semi-open) cubes of scale n; denote by λn (x) the unique cube in Fn that contains x. The following proposition is a simple consequence of the inclusions B(x, 2−n ) ⊂ √ −n 3λn (x) ⊂ B(x, 3 d2 ). Proposition 5.1. Let be a Borel subset of Rd and x ∈ ∂ess . Then log Vol (3λn (x) ∩ ) , −n log 2 log Vol (3λn (x) ∩ ) . αs (x) + d = lim sup −n log 2 n→+∞ αw (x) + d = lim inf n→+∞
Proposition 5.1 suggests introducing a scaling function as follows. Let Ω be a Borel set such that $\partial_{ess}\Omega$ is bounded and nonempty; let
$$S(q,n) = \sum_{\lambda\in F_n^*}(\mathrm{Vol}(3\lambda\cap\Omega))^q,\quad\text{where } F_n^* = \{\lambda\in F_n : \lambda\cap\partial_{ess}\Omega \ne \emptyset\},$$
and
$$\tau(q) = \limsup_{n\to+\infty}\frac{1}{n\log 2}\log S(q,n). \quad (16)$$
The function τ is decreasing and convex. The standard justification of the multifractal formalism runs as follows: First, the contribution to S(q, n) of the set of points where the (weak or strong) accessibility exponent takes a given value α is estimated. If the dimension of this set is d(α), then there are about $2^{d(\alpha)n}$ dyadic cubes in $F_n^*$ which cover this set, and such a cube satisfies $\mathrm{Vol}(3\lambda\cap\Omega) \sim 2^{-\alpha n}$. Therefore the order of magnitude of the contribution we look for is $2^{(d(\alpha) - \alpha q)n}$. When $n \to +\infty$, the preponderant contribution is clearly obtained for the value of α that maximizes the exponent $d(\alpha) - \alpha q$; thus $\tau(q) = \sup_\alpha(d(\alpha) - \alpha q)$. If d(α) is a concave function, then this formula can be inverted and d(α) is recovered from τ(q) by an inverse Legendre transform:
$$d(\alpha) = \inf_q(\alpha q + \tau(q)).$$
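In practice, τ(q) and the Legendre spectrum are estimated from box data exactly as described. The following sketch is our illustration, not from the paper: it assumes the volumes Vol(3λ ∩ Ω) have already been measured for the cubes of $F_n^*$ at each scale n (the toy input fakes such data with two pure exponents in d = 2), and all names and the discretization grid are our choices.

```python
import numpy as np

def tau_hat(vols_by_scale, qs):
    """Estimate tau(q) = slope in n of log2 S(q, n), from the volume lists
    {n: [Vol(3*lambda ∩ Omega) for lambda in F_n*]} as in (16)."""
    ns = np.array(sorted(vols_by_scale))
    taus = []
    for q in qs:
        logS = [np.log2(np.sum(np.asarray(vols_by_scale[n]) ** q)) for n in ns]
        taus.append(np.polyfit(ns, logS, 1)[0])
    return np.array(taus)

def legendre(qs, taus, alphas):
    """Inverse Legendre transform d(alpha) = inf_q (alpha*q + tau(q))."""
    return np.min(alphas[:, None] * qs[None, :] + taus[None, :], axis=1)

# toy data: cubes carrying Vol ~ 2^{-n*a} with two exponents a = 2.2, 2.6,
# held by ~2^{0.8n} and ~2^{0.5n} cubes respectively (a "bi-fractal" boundary)
vols = {n: [2.0 ** (-n * 2.2)] * int(2 ** (0.8 * n))
           + [2.0 ** (-n * 2.6)] * int(2 ** (0.5 * n)) for n in range(6, 14)}
qs = np.linspace(-2, 2, 41)
taus = tau_hat(vols, qs)
alphas = np.linspace(2.0, 3.0, 101)     # here alpha plays the role of alpha_w + d
print(legendre(qs, taus, alphas).round(2))
```

On this toy input the Legendre transform recovers values close to 0.8 near α = 2.2 and 0.5 near α = 2.6, as the heuristic predicts.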
The multifractal formalism holds if, indeed, this relationship between the scaling function and the spectrum of singularities holds. We give in Section 5.3 some results in this direction.
Remark 1: The factor 3 in the definition of S(q, n) is not always used in the derivation of the multifractal formalism for measures; however, it improves its range of validity, as shown by Riedi (1995). The novelty in our derivation is the restriction of the sum to the cubes λ such that $\lambda\cap\partial_{ess}\Omega \ne \emptyset$; this allows us to eliminate all the points in $\mathring\Omega^{ess}$ and in $\mathbb{R}^d\setminus\overline\Omega^{ess}$.
Remark 2: In (Testud, 2006), Testud had already introduced such a "restricted" scaling function. In the context of his paper, a strange Cantor set K perturbs the multifractal analysis of the measure, and the multifractal formalism breaks down at different levels. Testud introduces the scaling function $\tau_K$, in which the sum is restricted to the dyadic intervals that meet the Cantor set K, and proves that for all the "bad exponents" the dimension of the level set is given by the Legendre transform $\tau_K^*$.
5.2. PROPERTIES OF THE SCALING FUNCTION
Theorem 5.2. Let Ω be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega$ is nonempty and bounded. Define τ(q) as in (16). The following properties hold.
1. $\tau(0) = \Delta(\partial_{ess}\Omega)$ and $\forall q \ge 0$, $\tau(q) \le \Delta(\partial_{ess}\Omega) - dq$.
2. $\forall q \ge 0$, $\tau(q) \ge d - 1 - dq$.
3. $\forall q \in \mathbb{R}$, $\dim_P(SB(\Omega)) \le \tau(q) + dq$.
4. $\forall q \in \mathbb{R}$, $\dim_H(WB(\Omega)) \le \tau(q) + dq$.
Proof of Theorem 5.2.
1. If $\lambda\cap\partial_{ess}\Omega \ne \emptyset$, then $\mathrm{Vol}(3\lambda\cap\Omega) > 0$ and $(\mathrm{Vol}(3\lambda\cap\Omega))^0 = 1$; thus $\tau(0) = \Delta(\partial_{ess}\Omega)$. More precisely, if q > 0, then
$$\sum_{\lambda\in F_n^*}(\mathrm{Vol}(3\lambda\cap\Omega))^q \le \mathrm{Card}(F_n^*)\,(3\cdot 2^{-n})^{dq};$$
it follows that $\tau(q) \le \Delta(\partial_{ess}\Omega) - dq$.
2. If n is large enough, using a similar argument as in Theorem 4.7, we can find at least $c2^{(d-1)n}$ cubes in $F_n^*$ which are "quite balanced". These cubes satisfy $\mathrm{Vol}(3\lambda\cap\Omega) \sim 2^{-dn}$, and the inequality follows.
3. It is easy to see that $x_0 \in SB(\Omega)$ if and only if there exist $0 < \eta < 1/2$ and $n_0$ such that
$$\forall n \ge n_0,\quad \eta \le \frac{\mathrm{Vol}(3\lambda_n(x_0)\cap\Omega)}{(3\cdot 2^{-n})^d} \le 1 - \eta. \quad (17)$$
Let $U_{n_0,\eta}$ denote the set of points that satisfy (17). Let $\alpha < \dim_P(SB(\Omega))$. We can find $p, n_0 \in \mathbb{N}^*$ such that $\Delta(U_{n_0,1/p}) \ge \dim_P(U_{n_0,1/p}) > \alpha$. If $N_k$ is the number of cubes $\lambda \in F_k$ we need to cover $U_{n_0,1/p}$, then $N_k \ge 2^{k\alpha}$ infinitely often. Suppose q > 0 (the proof is similar if q < 0). We get
$$\sum_{\lambda\in F_k^*}(\mathrm{Vol}(3\lambda\cap\Omega))^q \ge N_k\,\frac{1}{p^q}(3\cdot 2^{-k})^{dq} \ge \frac{3^{dq}}{p^q}\,2^{k(\alpha - dq)}$$
infinitely often. We conclude that $\tau(q) \ge \alpha - dq$.
4. Note that $x_0 \in WB(\Omega)$ if and only if there exists $0 < \eta < 1/2$ such that
$$\forall n_0,\ \exists n \ge n_0\ ;\quad \eta \le \frac{\mathrm{Vol}(3\lambda_n(x_0)\cap\Omega)}{(3\cdot 2^{-n})^d} \le 1 - \eta. \quad (18)$$
Let $V_\eta$ denote the set of points that satisfy (18). Let $p \in \{2,3,\dots\}$, $n_0 \in \mathbb{N}^*$, and suppose that q > 0 (the proof is similar if q < 0). We can cover $V_{1/p}$ with cubes of scale $n \ge n_0$ such that $\mathrm{Vol}(3\lambda\cap\Omega) \ge \frac{1}{p}(3\cdot 2^{-n})^d$. Let R be such a covering and let $s > \tau(q)$. We have
$$\sum_{\lambda\in R}\mathrm{diam}(\lambda)^{s + dq} \le C\sum_{\lambda\in R}(\mathrm{Vol}(3\lambda\cap\Omega))^q\,\mathrm{diam}(\lambda)^s \le C\sum_{n\ge n_0}\sum_{\lambda\in F_n^*}(\mathrm{Vol}(3\lambda\cap\Omega))^q\,2^{-ns}.$$
Moreover, if $s > s' > \tau(q)$ and $n_0$ is sufficiently large, then
$$\sum_{\lambda\in F_n^*}(\mathrm{Vol}(3\lambda\cap\Omega))^q \le 2^{ns'}.$$
It follows that
$$\sum_{\lambda\in R}\mathrm{diam}(\lambda)^{s + dq} \le C\sum_{n\ge n_0}2^{n(s' - s)} \le \frac{C}{1 - 2^{s' - s}}.$$
We conclude that $\dim_H(V_{1/p}) \le s + dq$ for every $s > \tau(q)$; hence $\dim_H(V_{1/p}) \le \tau(q) + dq$ and $\dim_H(WB(\Omega)) \le \tau(q) + dq$.
5.3. THE MULTIFRACTAL FORMALISM ASSOCIATED WITH $\partial_{ess}\Omega$
The proofs of points 3 and 4 in Theorem 5.2 allow us to obtain estimates of the level sets of the accessibility index.
Theorem 5.3. Let Ω be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega$ is nonempty and bounded. Define τ(q) as in (16). If α ≥ 0, let $E^w_\alpha = \{x \in \partial_{ess}\Omega\ ;\ \alpha_w(x) \le \alpha\}$ and $E^s_\alpha = \{x \in \partial_{ess}\Omega\ ;\ \alpha_s(x) \le \alpha\}$. For every q > 0,
$$\dim_H(E^w_\alpha) \le (d + \alpha)q + \tau(q) \quad\text{and}\quad \dim_P(E^s_\alpha) \le (d + \alpha)q + \tau(q).$$
In particular, if $\alpha + d \le -\tau'_-(0)$, then
$$\dim_H(E^w_\alpha) \le \tau^*(\alpha + d) \quad\text{and}\quad \dim_P(E^s_\alpha) \le \tau^*(\alpha + d).$$
The proof uses the same ideas as in Theorem 5.2 and requires introducing the set of points $x \in \partial_{ess}\Omega$ such that $\mathrm{Vol}(3\lambda_n(x)\cap\Omega) \ge 2^{-n(\alpha + d + \varepsilon)}$ infinitely often (resp. for n large enough). In the same way, we can also prove the following twin result.
Theorem 5.4. Let Ω be a Borel subset of $\mathbb{R}^d$ such that $\partial_{ess}\Omega$ is nonempty and bounded. Define τ(q) as in (16). If α ≥ 0, let $F^w_\alpha = \{x \in \partial_{ess}\Omega\ ;\ \alpha_w(x) \ge \alpha\}$ and $F^s_\alpha = \{x \in \partial_{ess}\Omega\ ;\ \alpha_s(x) \ge \alpha\}$. For every q < 0,
$$\dim_P(F^w_\alpha) \le (d + \alpha)q + \tau(q) \quad\text{and}\quad \dim_H(F^s_\alpha) \le (d + \alpha)q + \tau(q).$$
In particular, if $\alpha + d \ge -\tau'_+(0)$,
$$\dim_P(F^w_\alpha) \le \tau^*(\alpha + d) \quad\text{and}\quad \dim_H(F^s_\alpha) \le \tau^*(\alpha + d).$$
Remark 1: The set $E^s_\alpha$ (resp. $F^w_\alpha$) is quite similar to the set of strong α-accessible points (resp. weak α-accessible points).
Remark 2: The results in Theorems 5.3 and 5.4 are standard multifractal inequalities adapted to the context of boundaries (see (Brown et al., 1992)).
References
Arneodo, A., Audit, B., Decoster, N., Muzy, J.-F., and Vaillant, C. (2002) Wavelet-based Multifractal Formalism: Applications to DNA Sequences, Satellite Images of the Cloud Structure and Stock Market Data, In A. Bunde, J. Kropp, and H. J. Schellnhuber (eds.), The Science of Disasters, New York, Springer, pp. 27–102.
Bordenave, C., Gousseau, Y., and Roueff, F. (in press) The Dead Leaves Model: A General Tessellation Modelling Occlusion, Advances in Applied Probability.
Brown, G., Michon, G., and Peyrière, J. (1992) On the Multifractal Analysis of Measures, Journal of Statistical Physics 66, 775–790.
Calderón, A. P. and Zygmund, A. (1961) Local Properties of Solutions of Elliptic Partial Differential Equations, Studia Mathematica 20, 171–227.
Evans, L. and Gariepy, R. (1992) Measure Theory and Fine Properties of Functions, Boca Raton, FL, CRC Press.
Falconer, K. (1990) Fractal Geometry: Mathematical Foundations and Applications, New York, John Wiley & Sons Ltd.
Giusti, E. (1984) Minimal Surfaces and Functions of Bounded Variation, Basel, Birkhäuser.
Gousseau, Y. and Morel, J.-M. (2001) Are Natural Images of Bounded Variation?, SIAM Journal on Mathematical Analysis 33, 634–648.
Heurteaux, Y. (2003) Weierstrass Functions with Random Phases, Transactions of the American Mathematical Society 355, 3065–3077.
Jaffard, S. (2004) Wavelet Techniques in Multifractal Analysis, In M. Lapidus and M. van Frankenhuijsen (eds.), Fractal Geometry and Applications: A Jubilee of Benoît Mandelbrot, Vol. 72, Proceedings of Symposia in Pure Mathematics, AMS, pp. 91–152.
Jaffard, S. (2006) Wavelet Techniques for Pointwise Regularity, Annales de la Faculté des Sciences de Toulouse 15, 3–33.
Jaffard, S. and Melot, C. (2005) Wavelet Analysis of Fractal Boundaries, Part 1: Local Regularity and Part 2: Multifractal Formalism, Communications in Mathematical Physics 258, 513–565.
Jaffard, S., Meyer, Y., and Ryan, R. (2001) Wavelets: Tools for Science and Technology, SIAM.
Matheron, G. (1975) Random Sets and Integral Geometry, New York, John Wiley and Sons.
Meyer, Y. (2001) Oscillating Patterns in Image Processing and Nonlinear Evolution Equations, University Lecture Series 22, AMS.
Naud, C., Schertzer, D., and Lovejoy, S. (1997) Radiative Transfer in Multifractal Atmospheres: Fractional Integration, Multifractal Phase Transitions and Inversion Problems, IMA Vol. Math. Appl. 85, New York, Springer, pp. 239–267.
Parisi, G. and Frisch, U. (1985) On the Singularity Spectrum of Fully Developed Turbulence, In Turbulence and Predictability in Geophysical Fluid Dynamics, Proceedings of the International Summer School of Physics Enrico Fermi, North Holland, pp. 84–87.
Riedi, R. (1995) An Improved Multifractal Formalism and Self-similar Measures, Journal of Mathematical Analysis and Applications 189, 462–490.
Ros, A. (2005) The Isoperimetric Problem, In Global Theory of Minimal Surfaces, Clay Mathematics Proceedings 2, Providence, RI, American Mathematical Society, pp. 175–209.
Testud, B. (2006) Phase Transitions for the Multifractal Analysis of Self-similar Measures, Nonlinearity 19, 1201–1217.
Tricot, C. (1982) Two Definitions of Fractional Dimension, Mathematical Proceedings of the Cambridge Philosophical Society 91, 57–74.
Ziemer, W. (1989) Weakly Differentiable Functions, New York, Springer.
CHARACTERIZATION AND CONSTRUCTION OF IDEAL WAVEFORMS
Myoung An and Richard Tolimieri
Prometheus Inc., 17135 Front Beach Road, Unit 9, Panama City Beach, FL 32413
Abstract. Using the finite Zak transform (FZT), periodic polyphase sequences satisfying the ideal cyclic autocorrelation property and sequence pairs satisfying the optimum cyclic cross correlation property are constructed. The Zak space correlation formula plays a major role in the design of signals having special correlation properties. Sequences defined by permutation matrices in Zak space always satisfy the ideal cyclic autocorrelation property. A necessary and sufficient condition on pairs of permutation matrices in Zak space such that the corresponding pairs of sequences satisfy the optimum cyclic correlation property is derived. This condition is given in terms of a collection of permutations, called ∗-permutations. The development is restricted to the case $N = L^2$, L an odd integer. An algorithm for constructing ∗-permutations is provided.
1. Introduction
In this work we use the finite Zak transform (FZT) to construct periodic polyphase sequences satisfying the ideal cyclic autocorrelation property and sequence pairs satisfying the optimum cyclic cross correlation property. In communication theory these sequences are called $Z_q$-sequences. Their correlation properties have been extensively studied in Chu (1972), Frank and Zadoff (1973), Heimiller (1961), Popovic (1992), and Suehiro and Hatori (1988). A detailed treatment of $Z_q$-sequences and their application to spread spectrum can be found in Mow (1995). The approach in these works has the merit that it is direct. The approach in this work is based on designing periodic sequences in Zak space (ZS). The ZS correlation formula plays a major role in the design of signals having special correlation properties. Sequences defined by permutation matrices in ZS always satisfy the ideal cyclic autocorrelation property. The analogous result, proved directly in communication theory, can be found in Frank and Zadoff (1973) and Heimiller (1961). We will derive a necessary and sufficient condition on pairs of
permutation matrices in ZS such that the corresponding pairs of sequences satisfy the optimum cyclic correlation property. This condition is given in terms of a collection of permutations, called ∗-permutations. The development is restricted to the case $N = L^2$, $L$ an odd integer. We will show examples and provide an algorithm for constructing ∗-permutations. We believe this concept to be new. The main consequence is that we can construct significantly larger collections of sequence pairs satisfying the optimal correlation property than those constructed in Popovic (1992).

The following notation and elementary results will be used throughout this work. A vector $x \in \mathbb{C}^N$ is written $x = [x_n]_{0 \le n < N}$, and the inner product of $x, y \in \mathbb{C}^N$ is
$$\langle x, y \rangle = \sum_{n=0}^{N-1} x_n y_n^*,$$
where $*$ denotes complex conjugation. $e_n^N$, $0 \le n < N$, is the vector in $\mathbb{C}^N$ having 1 in the $n$th component and 0 in all other components. The set $\{e_n^N : 0 \le n < N\}$ is an orthonormal basis of $\mathbb{C}^N$. The notation $xy$ (and $XY$) denotes the vector in $\mathbb{C}^N$ formed by componentwise multiplication. The trace of an $N \times N$ matrix $x$ is
$$\operatorname{Tr} x = \sum_{n=0}^{N-1} x_{n,n}.$$
$F$ is the $N$-point Fourier transform matrix
$$F = [w^{mn}]_{0 \le m, n < N}, \qquad w = e^{2\pi i \frac{1}{N}}.$$
The $N \times N$ time reversal matrix $R_N$ is defined by
$$R_N x = [x_{N-n}]_{0 \le n < N},$$
$N - n$ taken modulo $N$. $F$ and $R_N$ are related by
$$F R_N = R_N F = F^*$$
and
$$F^2 = N R_N.$$
We also have $R_N^2 = I_N$, the identity matrix, and $F F^* = N I_N$. As a result $F^{-1} = N^{-1} F^*$. $S_N$ is the $N \times N$ cyclic shift matrix defined by
$$S_N x = [x_{N-1+n}]_{0 \le n < N}, \qquad x \in \mathbb{C}^N,$$
indices taken modulo $N$. $D_N$ is the diagonal matrix defined by
$$D_N = \operatorname{Diag}\left([w^n]_{0 \le n < N}\right), \qquad w = e^{2\pi i \frac{1}{N}}.$$
An important result is
$$F S_N F^{-1} = D_N.$$
Linear combinations of vectors and matrices will be used. For example
$$\sum_{n=0}^{N-1} x_n D_N^n = \operatorname{Diag}(Fx), \qquad x \in \mathbb{C}^N.$$
$1_N$ is the vector in $\mathbb{C}^N$ all of whose components are 1. $E(L, K)$ is the $L \times K$ matrix of all ones. $\operatorname{Perm}(N)$ is the group of permutations of the set $\{0, 1, \ldots, N-1\}$. An element $\pi \in \operatorname{Perm}(N)$ is defined by $(\pi(0)\ \pi(1) \cdots \pi(N-1))$. If $X$ is an $N \times N$ matrix, $X = [X_0 \cdots X_{N-1}]$, where $X_n$, $0 \le n < N$, is the $n$th column vector of $X$, and for $\pi \in \operatorname{Perm}(N)$ we define
$$X_\pi = [X_{\pi(0)} \cdots X_{\pi(N-1)}].$$
2. Cyclic Correlation

We identify $\mathbb{C}^N$ with the space of periodic complex sequences of period $N$. For $x, y \in \mathbb{C}^N$ the cyclic correlation $v = x \circ y \in \mathbb{C}^N$ is defined by
$$v_m = \sum_{n=0}^{N-1} x_n y_{n-m}^*.$$
$x \circ x$ is called the cyclic autocorrelation of $x$, while if $x$ and $y$ are distinct, $x \circ y$ is called the cyclic cross correlation of $x$ and $y$. A vector $x \in \mathbb{C}^N$ satisfies the ideal cyclic autocorrelation property if
$$x \circ x = \|x\|^2 e_0^N,$$
where $\|x\|$ is the norm of $x$. If $x$ and $y$ satisfy the ideal cyclic autocorrelation property, then the cross correlation $v = x \circ y$ must satisfy (Sarwate, 1979)
$$\max_{0 \le n < N} |v_n| \ge \frac{\|x\|\,\|y\|}{\sqrt{N}}.$$
A pair of vectors $x, y \in \mathbb{C}^N$ satisfying the ideal cyclic autocorrelation property satisfies the optimum cross correlation property if
$$|v_n| = \frac{\|x\|\,\|y\|}{\sqrt{N}}, \qquad 0 \le n < N.$$
In this case we say that the pair $(x, y)$ satisfies the ideal cyclic correlation property.
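To make these definitions concrete, here is a minimal numpy sketch (ours, not part of the original text; the sequence length and root u are illustrative) that computes cyclic correlations via the FFT and checks the ideal cyclic autocorrelation property on a Chu (1972) polyphase sequence.

```python
import numpy as np

def cyclic_correlation(x, y):
    # v_m = sum_n x_n * conj(y_{n-m}), circular correlation computed via the FFT
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y)))

N, u = 25, 2                                   # odd length, u coprime to N
n = np.arange(N)
x = np.exp(1j * np.pi * u * n * (n + 1) / N)   # a Chu (1972) polyphase sequence

v = cyclic_correlation(x, x)
print(np.isclose(v[0], np.linalg.norm(x) ** 2))   # main lobe v_0 = ||x||^2
print(np.allclose(v[1:], 0, atol=1e-9))           # all sidelobes vanish
```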
3. Zak Space Correlation Formula

The FZT of a vector determines a two-dimensional time–frequency representation of the vector called the Zak space representation. In this section we define the FZT and derive the ZS correlation formula. Set $N = LK = L^2 M$, $L$ and $M$ positive integers. Identify $\mathbb{C}^L \times \mathbb{C}^K$ with the space of $L \times K$ complex matrices. For $y \in \mathbb{C}^N$ define the vectors $y_k \in \mathbb{C}^L$, $0 \le k < K$, by
$$y_k = [y_{k+lK}]_{0 \le l < L}, \qquad 0 \le k < K.$$
We call the vectors $y_k$, $0 \le k < K$, the decimated components of $y$. Define
$$M_L y = [y_0 \cdots y_{K-1}].$$
$M_L$ is a linear isomorphism from the one-dimensional space $\mathbb{C}^N$ onto the two-dimensional space $\mathbb{C}^L \times \mathbb{C}^K$. No computations are involved. It is simply a data rearrangement. For the remainder of this work, $F$ is the $L$-point Fourier transform matrix. Define
$$Z_L y = F M_L y, \qquad y \in \mathbb{C}^N.$$
$Z_L y$ is called the $L \times K$ ZS representation of $y$. The mapping $Z_L : \mathbb{C}^N \longrightarrow \mathbb{C}^L \times \mathbb{C}^K$ is called the $L \times K$ FZT. The FZT is, up to the scalar factor $L$, a linear isometry from $\mathbb{C}^N$ onto $\mathbb{C}^L \times \mathbb{C}^K$.

Theorem 1. If $x, y \in \mathbb{C}^N$ and $v = x \circ y$, then
$$V_0 = \sum_{m=0}^{K-1} X_m Y_m^*$$
and
$$V_k = D_L^{-1} \sum_{m=0}^{k-1} X_m Y_{K-(k-m)}^* + \sum_{m=k}^{K-1} X_m Y_{m-k}^*, \qquad 1 \le k < K,$$
where $V_k$ is the $k$th column of $Z_L v$.

Example 1. $N = 9$ and $L = K = 3$.
$$V_0 = X_0 Y_0^* + X_1 Y_1^* + X_2 Y_2^*,$$
$$V_1 = D_3^{-1} X_0 Y_2^* + X_1 Y_0^* + X_2 Y_1^*,$$
$$V_2 = D_3^{-1} X_0 Y_1^* + X_1 Y_2^* + X_2 Y_0^*.$$
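The FZT and the ZS correlation formula are easy to exercise numerically. The following sketch (ours; the function names and random test vectors are illustrative) implements $Z_L$ as $F M_L$ and checks Theorem 1 column by column against a directly computed cyclic correlation.

```python
import numpy as np

def fzt(y, L, K):
    # Z_L y = F M_L y: the reshape stacks the decimated components y_k as columns,
    # and F = [w^{mn}] with w = exp(2*pi*i/L) is the L-point Fourier matrix
    F = np.exp(2j * np.pi * np.outer(np.arange(L), np.arange(L)) / L)
    return F @ y.reshape(L, K)

def cyclic_correlation(x, y):
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y)))

L, K = 3, 3
N = L * K
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

X, Y = fzt(x, L, K), fzt(y, L, K)
V = fzt(cyclic_correlation(x, y), L, K)
Dinv = np.exp(-2j * np.pi * np.arange(L) / L)        # diagonal of D_L^{-1}

for k in range(K):
    rhs = sum(X[:, m] * np.conj(Y[:, m - k]) for m in range(k, K))
    rhs = rhs + Dinv * sum(X[:, m] * np.conj(Y[:, K - (k - m)]) for m in range(k))
    print(k, np.allclose(V[:, k], rhs))              # True for every column
```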
4. Permutation Waveforms

In this section we use the ZS correlation formula to derive correlation properties of signals whose ZS representations are permutation matrices. Set $N = L^2$, $E = I_L$ and $F = [F_0 \cdots F_{L-1}]$. Define $e_\pi \in \mathbb{C}^N$ by
$$Z_L e_\pi = E_\pi, \qquad \pi \in \operatorname{Perm}(L).$$
Then
$$M_L e_\pi = F^{-1} E_\pi = L^{-1} F_\pi^*.$$
In the following discussion set $1 = 1_L$ and $0 = 0_L$. Consider $\pi \in \operatorname{Perm}(L)$ and set $v = e_\pi \circ e_\pi$. By the ZS correlation formula
$$V_0 = \sum_{m=0}^{L-1} E_{\pi(m)} E_{\pi(m)} = \sum_{m=0}^{L-1} E_{\pi(m)} = 1$$
and
$$V_k = D_L^{-1} \sum_{m=0}^{k-1} E_{\pi(m)} E_{\pi(m-k)} + \sum_{m=k}^{L-1} E_{\pi(m)} E_{\pi(m-k)}, \qquad 1 \le k < L,$$
where $m - k$ is taken modulo $L$. Since
$$E_{\pi(m)} E_{\pi(m-k)} = 0, \qquad 1 \le k < L,$$
we have $V_k = 0$, $1 \le k < L$, proving the following result.

Theorem 2. If $\pi \in \operatorname{Perm}(L)$, then $Z_L(e_\pi \circ e_\pi) = [1\ 0 \cdots 0]$ and $e_\pi$ satisfies the ideal cyclic autocorrelation property.
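As a numerical companion to Theorem 2 (our sketch; the particular permutation is arbitrary), $e_\pi$ can be synthesized by inverting the FZT on a permutation matrix, after which the ideal cyclic autocorrelation can be checked directly.

```python
import numpy as np

def perm_waveform(pi):
    # Z_L e_pi = E_pi, so M_L e_pi = F^{-1} E_pi with F^{-1} = conj(F)/L;
    # un-stacking the columns (row-major reshape) recovers e_pi in C^{L^2}
    L = len(pi)
    F = np.exp(2j * np.pi * np.outer(np.arange(L), np.arange(L)) / L)
    E_pi = np.eye(L)[:, pi]                  # column k is the pi(k)-th basis vector
    return (np.conj(F) @ E_pi / L).reshape(-1)

def cyclic_correlation(x, y):
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y)))

e = perm_waveform([2, 0, 3, 1, 4])           # an N = 25 permutation waveform
v = cyclic_correlation(e, e)
print(np.isclose(v[0], np.linalg.norm(e) ** 2))  # main lobe equals ||e||^2
print(np.allclose(v[1:], 0, atol=1e-12))         # ideal cyclic autocorrelation
```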
5. ∗-Permutations

In this section we determine a necessary and sufficient algebraic condition on a permutation pair $(\pi, \sigma)$ such that $(e_\pi, e_\sigma)$ satisfies the ideal cyclic correlation property. This condition is given in terms of a collection of permutations, the ∗-permutations. For an odd integer $L$ the collection of ∗-permutations is nonempty. Details are given in the next section. For even $L$ it is likely, but not proved, that the collection of ∗-permutations is the empty set. Suppose $\gamma \in \operatorname{Perm}(L)$, and define the subsets
$$\Lambda_k(\gamma) = \{0 \le m < L : m = \gamma(m - k)\}, \qquad 0 \le k < L,$$
where $m - k$ is taken modulo $L$. Setting $E_\gamma = [E_{\gamma(0)} \cdots E_{\gamma(L-1)}]$, we have
$$\operatorname{Tr}\left(E_\gamma S_L^{-k}\right) = o\left(\Lambda_k(\gamma)\right), \qquad 0 \le k < L,$$
where $o(\ )$ denotes the number of elements.
Theorem 3. For $\gamma \in \operatorname{Perm}(L)$,
$$L = \sum_{k=0}^{L-1} o\left(\Lambda_k(\gamma)\right).$$

Proof. By linearity of $\operatorname{Tr}$,
$$\sum_{k=0}^{L-1} \operatorname{Tr}\left(E_\gamma S_L^{-k}\right) = \operatorname{Tr}\left(E_\gamma \sum_{k=0}^{L-1} S_L^{-k}\right) = \operatorname{Tr}\left(E_\gamma I(L, L)\right),$$
where $I(L, L)$ is the $L \times L$ matrix of all ones. The theorem follows from
$$\operatorname{Tr}\left(E_\gamma I(L, L)\right) = \operatorname{Tr}\left(I(L, L)\right) = L.$$

The following corollary will be useful in describing the sets $\Lambda_k(\gamma)$, $0 \le k < L$, when $\gamma$ is a ∗-permutation.

Corollary 1. For $\gamma \in \operatorname{Perm}(L)$, $o(\Lambda_k(\gamma)) > 0$, $0 \le k < L$, if and only if $o(\Lambda_k(\gamma)) = 1$, $0 \le k < L$.

For any $\gamma_1, \gamma_2 \in \operatorname{Perm}(L)$, define $\gamma_1 + \gamma_2$ by
$$(\gamma_1 + \gamma_2)(r) \equiv \gamma_1(r) + \gamma_2(r) \bmod L, \qquad 0 \le r < L.$$
$\gamma_1 + \gamma_2$ is a mapping from $\mathbb{Z}/L$ into itself, but is not necessarily a permutation. Denote the identity permutation in $\operatorname{Perm}(L)$ by $\pi_0$. $\gamma$ is called a ∗-permutation if $\pi_0 - \gamma$ is a permutation. Since
$$(\pi_0 - \gamma)\gamma^{-1} = -\left(\pi_0 - \gamma^{-1}\right),$$
$\gamma$ is a ∗-permutation if and only if $\gamma^{-1}$ is a ∗-permutation. For odd $L$, there always exist ∗-permutations. The mapping
$$\gamma(n) \equiv 2n \bmod L, \qquad 0 \le n < L,$$
is a ∗-permutation. For the rest of this section $\gamma$ is a ∗-permutation. Since $\gamma^{-1}$ is a ∗-permutation,
$$\pi_m = \left(\pi_0 - \gamma^{-1}\right)^{-1}$$
is a permutation, where $\pi_m = (m_0\ m_1 \cdots m_{L-1})$. $\pi_m$ satisfies $\pi_m - \pi_0 = \gamma^{-1}\pi_m$, and $\pi_m$ is a ∗-permutation. This means
$$m_k - k \equiv \gamma^{-1}(m_k) \bmod L$$
and
$$m_k = \gamma(m_k - k), \qquad 0 \le k < L,$$
which by Theorem 3 implies the following result.

Theorem 4. If $\gamma$ is a ∗-permutation, then
$$\Lambda_k(\gamma) = \{m_k\}, \qquad 0 \le k < L.$$
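The defining test for a ∗-permutation translates directly into code. The short sketch below (ours; the function names are illustrative) checks whether $\pi_0 - \gamma$ is again a permutation, confirming the doubling map for odd $L$ and rejecting the identity.

```python
def is_perm(f):
    # f lists the values f(0), ..., f(L-1); test that f is a bijection of Z/L
    return sorted(f) == list(range(len(f)))

def is_star_perm(g):
    # g is a *-permutation iff r -> (r - g(r)) mod L is a permutation (pi_0 - g)
    L = len(g)
    return is_perm([(r - g[r]) % L for r in range(L)])

L = 5
print(is_star_perm([2 * n % L for n in range(L)]))   # doubling map: True for odd L
print(is_star_perm(list(range(L))))                  # identity: False, pi_0 - pi_0 = 0
```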
The converse of Theorem 4 is also true. Suppose $\sigma \in \operatorname{Perm}(L)$ and
$$\Lambda_k(\sigma) = \{r_k\}, \qquad 0 \le k < L.$$
$r_k$ is the unique solution of
$$r \equiv \sigma(r - k) \bmod L, \qquad 0 \le k < L.$$
This implies that
$$r_k \ne r_j, \qquad 0 \le j, k < L,\ k \ne j,$$
and we can form the permutation $\pi_r = (r_0\ r_1 \cdots r_{L-1})$. $\pi_r$ satisfies $\pi_r = \sigma(\pi_r - \pi_0)$, implying $\pi_r^{-1} = \pi_0 - \sigma^{-1}$. Since $\pi_r^{-1}$ is a permutation, $\sigma^{-1}$ and $\sigma$ are ∗-permutations, proving the converse.

Theorem 5. Suppose $\pi, \sigma \in \operatorname{Perm}(L)$ and $\gamma = \pi^{-1}\sigma$. The sequence pair $(e_\pi, e_\sigma)$ satisfies the ideal correlation property if and only if $\gamma$ is a ∗-permutation.

Let $\mathcal{S}_L$ denote the set of ∗-permutations in $\operatorname{Perm}(L)$. For $\sigma \in \mathcal{S}_L$ the set of pairs of permutations
$$\mathcal{S}_L(\sigma) = \{(\pi, \pi\sigma) : \pi \in \operatorname{Perm}(L)\}$$
determines a set of pairs of sequences $\{(e_\pi, e_{\pi\sigma}) : \pi \in \operatorname{Perm}(L)\}$ satisfying the ideal cyclic correlation property.
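Theorem 5 can also be checked numerically. The sketch below (ours, reusing the helpers from the earlier listings; the chosen permutations are arbitrary) pairs $e_\pi$ with $e_{\pi\sigma}$ for a ∗-permutation $\sigma$ and verifies that the cross correlation has constant modulus $\|x\|\,\|y\|/\sqrt{N}$.

```python
import numpy as np

def perm_waveform(pi):                        # as in the Theorem 2 sketch
    L = len(pi)
    F = np.exp(2j * np.pi * np.outer(np.arange(L), np.arange(L)) / L)
    return (np.conj(F) @ np.eye(L)[:, pi] / L).reshape(-1)

def cyclic_correlation(x, y):
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y)))

L = 5
N = L * L
sigma = [2 * n % L for n in range(L)]         # a *-permutation (odd L)
pi = [3, 1, 4, 0, 2]                          # an arbitrary permutation
x = perm_waveform(pi)
y = perm_waveform([pi[sigma[k]] for k in range(L)])   # e_{pi sigma}

v = cyclic_correlation(x, y)
flat = np.linalg.norm(x) * np.linalg.norm(y) / np.sqrt(N)
print(np.allclose(np.abs(v), flat))           # optimum cross correlation: True
```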
Suppose $N = LK = L^2 R$, $L$ an odd positive integer, $R$ a positive integer. Define $x \in \mathbb{C}^N$ by
$$Z_L x = [E_\pi \cdots E_\pi] D,$$
where $E_\pi$ is the $L \times L$ permutation matrix corresponding to $\pi \in \operatorname{Perm}(L)$ and $D$ is a $K \times K$ diagonal matrix whose diagonal entries have absolute value one. Under these conditions we have the following theorem.

Theorem 6. $x$ satisfies the ideal cyclic autocorrelation property if and only if, for $0 \le s < L$ and $1 \le r < R$,
$$v^{-\pi(s)} \sum_{t=0}^{r-1} d_{tL+s}\, d_{(R+t-r)L+s}^{*} + \sum_{t=r}^{R-1} d_{tL+s}\, d_{(t-r)L+s}^{*} = 0,$$
where $v = e^{2\pi i \frac{1}{L}}$.
1
v = e2π L L , 0 ≤ s < L .
One class of solutions is given by choosing d0 , . . . , d L−1 arbitrarily having absolute value one, then setting d L+s = e−2πi 2L ids , s
0 ≤ s < L.
6. Golay Pairs Suppose π, σ ∈ Perm(L) such that γ = π −1 σ is a ∗-permutation. Define x ∈ C N , N = L 2 , by Z L x = Eπ + Eσ and set v = x ◦ x. The problem is to determine the collection of all permutation pairs (π1 , σ1 ), π1 , σ1 ∈ Perm(L) such that γ1 = π1−1 σ1 is a ∗-permutation and v1 = x1 ◦ x1 = v, where x1 ∈ C N is defined by Z L x1 = E π1 + E σ1 . Given such a permutation pair (π1 , σ1 ), the pair (x, y), where Z L y = E π1 − E σ1 , is a Golay pair. Write
and
−1 πm = π0 − γ −1 ,
−1 πm1 = π0 − γ1−1 ,
πn = (π0 − γ )−1 πn1 = (π0 − γ1 )−1 .
204
MYOUNG AN AND RICHARD TOLIMIERI
As in Section 5, the sum and difference of permutations is taken modulo L. We will have v = v1 if the following two conditions are satisfied.
A. π (m k ) = π1 m 1k and π (n k − k) = π1 n 1k − k , 0 ≤ k < L. B. 0 ≤ m k < k if and only if 0 ≤ m 1k < k and 0 ≤ n k < k if and only if 0 ≤ n 1k < k, 0 ≤ k < L. We can have v = v1 under other conditions, but we will only discuss this case. Given a permutation pair (π, σ ) such that γ = π −1 σ is a ∗-permutation, we say that a permutation pair (π1 , σ1 ), such that γ1 = π1−1 σ1 is a ∗permutation satisfies condition A with respect to (π, σ ) if ππm = π1 πm1 and π (πn − π0 ) = π1 (πn1 − π0 ) . This statement is equivalent to condition A above. Algorithm A (π, σ ) is a permutation pair such that π −1 σ is a ∗permutation. The algorithm computes a permutation pair (π1 , σ1 ) such that π1−1 σ1 is a ∗-permutation and (π1 , σ1 ) satisfies condition A with respect to (π, σ ) r Compute δ1 = π −1 − σ −1 . Since π −1 σ is a ∗-permutation, δ1 is a permutation. r Compute δ2 ∈ Perm(L) such that δ2 δ −1 is a ∗-permutation. 1 r Compute σ −1 = −δ2 . 1 r Compute π −1 = δ1 − δ2 . Since δ2 δ −1 is a ∗-permutation, π −1 is a permu1 1 1 tation. Example 3. L = 5, u = 2 and v = 1. The permutation pair (γ3 , γ4 ) satisfies condition A with respect to (γ1 , γ2 ). Since πm = γ2 and πm1 = γ4 , we have the following table. k 0 1 2 3 4
mk 0 2 4 1 3
m 1k 0 4 3 2 1
π1 m 1k 0 2 4 1 3
Example 4. $L = 5$, $u = 2$ and $v = 4$. The permutation pair $(\gamma_4, \gamma_1)$ satisfies condition A with respect to $(\gamma_1, \gamma_2)$. Since $\pi_m = \gamma_2$ and $\pi_m^1 = \gamma_3$, we have the following table.

k    m_k    m_k^1    π_1(m_k^1)
0     0      0        0
1     2      3        2
2     4      1        4
3     1      4        1
4     3      2        3
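Algorithm A transcribes almost line for line into code. The sketch below (ours; the brute-force search for $\delta_2$ and the final consistency check are illustrative conveniences) represents permutations as value lists and takes pointwise sums and differences modulo $L$, as in the text.

```python
from itertools import permutations

def inv(p):
    q = [0] * len(p)
    for r, v in enumerate(p):
        q[v] = r
    return q

def sub(p, q):                                  # pointwise difference mod L
    return [(a - b) % len(p) for a, b in zip(p, q)]

def compose(p, q):                              # (p q)(r) = p(q(r))
    return [p[v] for v in q]

def is_perm(f):
    return sorted(f) == list(range(len(f)))

def is_star(g):                                 # pi_0 - g must be a permutation
    return is_perm(sub(list(range(len(g))), g))

def algorithm_A(pi, sigma):
    d1 = sub(inv(pi), inv(sigma))               # a permutation since pi^-1 sigma is a *-perm
    d2 = next(list(p) for p in permutations(range(len(pi)))
              if is_star(compose(list(p), inv(d1))))   # any d2 with d2 d1^-1 a *-perm
    sigma1 = inv(sub([0] * len(pi), d2))        # sigma_1^-1 = -d2
    pi1 = inv(sub(d1, d2))                      # pi_1^-1 = d1 - d2
    return pi1, sigma1

L = 5
pi, sigma = list(range(L)), [2 * n % L for n in range(L)]
pi1, sigma1 = algorithm_A(pi, sigma)
print(is_star(compose(inv(pi1), sigma1)))       # pi_1^-1 sigma_1 is a *-permutation
```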
A modification of Algorithm A is convenient for dealing with the construction of permutation pairs satisfying condition B.

Algorithm A1. Consider a permutation pair $(\pi, \sigma)$ such that $\pi^{-1}\sigma$ is a ∗-permutation, and a ∗-permutation $\pi_m^1$. The algorithm constructs a permutation pair $(\pi_1, \sigma_1)$ such that $\pi_1^{-1}\sigma_1$ is a ∗-permutation,
$$\pi_m^1 = \left(\pi_0 - \sigma_1^{-1}\pi_1\right)^{-1} \quad \text{and} \quad \pi\pi_m = \pi_1\pi_m^1.$$

- Compute $\pi_1 = \pi\pi_m\left(\pi_m^1\right)^{-1}$.
- Compute $\sigma_1 = \pi_1\pi_m^1\left(\pi_m^1 - \pi_0\right)^{-1}$. Since $\pi_m^1$ is a ∗-permutation, $\sigma_1$ is a well-defined permutation.

Theorem 7. For $0 \le k < L$, $0 \le m_k < k$ if and only if $L - k \le n_{L-k} < L$.

Choosing the ∗-permutation $\pi_m^1$ in Algorithm A1 such that, for $0 \le k < L$, $0 \le m_k < k$ if and only if $0 \le m_k^1 < k$, the permutation pair $(\pi_1, \sigma_1)$ constructed in Algorithm A1 satisfies condition A and condition B with respect to $(\pi, \sigma)$.

References

Chu, D. C. (1972) Polyphase Codes with Good Periodic Correlation Properties, IEEE Transactions on Information Theory IT-18, 531–532.
Frank, R. and Zadoff, S. (1973) Phase Shift Pulse Codes with Good Periodic Correlation Properties, IEEE Transactions on Information Theory IT-19, 244.
Heimiller, R. C. (1961) Phase Shift Pulse Codes with Good Periodic Correlation Properties, IRE Transactions on Information Theory IT-7, 254–257.
Mow, W. H. (1995) Sequence Design for Spread Spectrum, Hong Kong, Chinese University Press.
Popovic, B. M. (1992) Generalized Chirp-like Polyphase Sequences with Optimum Correlation Properties, IEEE Transactions on Information Theory IT-38, 1406–1409.
Sarwate, D. V. (1979) Bounds on Cross Correlation and Autocorrelation of Sequences, IEEE Transactions on Information Theory IT-25, 720–724.
Scholtz, R. A. and Welch, L. R. (1978) Group Characters: Sequences with Good Correlation Properties, IEEE Transactions on Information Theory IT-24 (5).
Suehiro, N. and Hatori, M. (1988) Modulatable Orthogonal Sequences and their Applications to SSMA Systems, IEEE Transactions on Information Theory IT-34, 93–100.
Tolimieri, R. (2004) Final Technical Report, Brooks AFB.
Tolimieri, R. and An, M. (2005) Progress Report, Brooks AFB.
Tolimieri, R. and An, M. (in preparation) Affine Group Zak Transform.
IDENTIFICATION OF COMPLEX PROCESSES BASED ON ANALYSIS OF PHASE SPACE STRUCTURES

Teimuraz Matcharashvili 1,2 ([email protected]), Tamaz Chelidze 1 ([email protected]), and Manana Janiashvili 3
1 Institute of Geophysics, 1 Alexidze str., 0193 Tbilisi, Georgia
2 Georgian Technical University, 77 Kostava ave., 0193 Tbilisi, Georgia
3 Institute of Cardiology, 2 Gudamakari str., 0141 Tbilisi, Georgia
Abstract. The problem of investigating the temporal and/or spatial behavior of highly nonlinear or complex natural systems has long been of fundamental scientific interest. At the same time it is now well understood that identification of the dynamics of processes in complex natural systems, through their qualitative description and quantitative evaluation, is far from a purely academic question and has essential practical importance. This is quite understandable, as systems with complex dynamics abound in nature and examples can be found in very different areas such as medicine and biology (rhythms, physiological cycles, epidemics), the atmosphere (climate and weather change), geophysics (tides, earthquakes, volcanoes, magnetic field variations), economics (financial market behavior, exchange rates), engineering (friction, fracturing), communication (electronic networks, internet packet dynamics), etc. The past two decades of research on qualitative and especially quantitative investigation of the dynamics of real processes of different origin brought significant progress in the understanding of the behavior of natural processes. At the same time serious drawbacks have also been revealed. This is why exhaustive investigation of the dynamical properties of complex processes for scientific, engineering or practical purposes is now recognized as one of the main scientific challenges. Much attention is paid to the elaboration of appropriate methods aiming at measuring the complexity of both global and local dynamical behaviors from observed data sets, i.e. time series. This chapter presents a short overview of modern methods of qualitative and quantitative evaluation of the dynamics of complex natural processes, such as calculation of Lyapunov exponents and fractal dimensions, recurrence plots and recurrence quantification analysis. Other related methods are also described. The traditional approach to studying the dynamical behavior of complex nonlinear systems is to reconstruct a state (phase) space plot from an observed scalar time series. This graph indicates how the system's behavior changes over time. We focus on methods of identification and quantitative evaluation of complex dynamics that are based on testing the evolutionary and geometric properties
of phase space graphs as images of the investigated complex dynamics. As practical examples of the application of nonlinear methods to the identification of complex natural processes, our results on medical, geophysical, hydrological, and stick-slip time series analysis are presented.

Key words: complexity, dynamics, nonlinear time series analysis, natural processes.

1. Introduction

A common statement in the scientific literature is that the complexity of nature has always been an inevitable problem in our efforts towards understanding and describing the spatial forms and temporal evolution of natural systems. "Complex" and "complexity" are often heard scientific terms, though there is little consensus on their formal definitions and they still have a variety of meanings depending on the context (Arecchi and Fariny, 1996; Shiner et al., 1999). This is because the study of complexity, in both a dynamical and a structural sense, is in its infancy and is a rapidly developing field at the forefront of many areas of science including mathematics, physics, geophysics, economics, biology, etc. Currently available scientific definitions of complexity are often so controversial that the half-serious statement that the complexity problem is itself complex (Yao et al., 2004) looks quite reasonable. In spite of this, it is clear that natural systems and/or processes are complex due to their nonlinearity, an intrinsic property of the underlying laws that precludes strict determinism of the universe. The presence of this property is revealed in the specific character of a system's temporal behavior and spatial structures, which is what we call complex (Kantz and Schreiber, 1997; Matcharashvili et al., 2000). In order to avoid misunderstanding caused by the tradition of associating the term nonlinearity exclusively with dynamics, it should be stressed that at present the terms nonlinearity and complexity are commonly regarded as synonyms. This is convenient in order to address both the complex nonlinear temporal evolution and the complex non-Euclidean spatial forms of natural systems. As an inherent property, nonlinearity or complexity is revealed in the absence of a deterministic cause-effect relation, observed on different spatial and temporal scales. This property incorporates phenomena with a very broad diversity of dynamical features. Generally this diversity manifests itself in a certain kind of hierarchy of dynamical behavior ranging from strict determinism to total randomness. Most important is that between these extremes there are many intermediate states that reveal different degrees of orderliness such as, e.g., periodicity, quasiperiodicity, deterministic chaos, low and high dimensional dynamics, hyperchaos, etc. (Kantz and Schreiber, 1997; Theiler and Prichard, 1997). Until recently both qualitative detection and quantitative evaluation
of these intermediate states were impossible due to the absence of a corresponding mathematical formalism and appropriate data analysis methods. It is necessary to mention that traditional linear methods are mostly not effective for the complex processes of interest. This is why in different fields of science and practice there has been an explosion of papers searching for methods aimed at detecting peculiarities of complex systems' evolution in order to achieve reliable identification of processes by their dynamics. At present, nonlinear time series analysis techniques have been elaborated (Abarbanel et al., 1993; Berge et al., 1984; Eckmann and Ruelle, 1985; Kantz and Schreiber, 1997; Packard et al., 1980; Rapp et al., 1993), which often (but not always, due to the same cause, the complexity of nature) enable us to achieve a correct qualitative and quantitative assessment of complex processes by measuring their dynamical characteristics. These methods, aiming to measure complexity on both local and global spatial and temporal scales, are universal and applicable to a very broad range of complicated processes. There are several main approaches to quantifying the complexity of processes by analysis of measured time series (Boffetta et al., 2002). Some of them have roots in dynamical systems and fractal theory and include Lyapunov exponents, Kolmogorov-Sinai entropy and fractal dimensions (Abarbanel et al., 1993; Eckmann and Ruelle, 1985; Kantz and Schreiber, 1997). These methods are based on the reconstruction and testing of phase space objects that are equivalent to the unknown dynamics. Other methods stem from information theory, including Shannon entropy (Schreiber, 1993) and algorithmic complexity (Shiner et al., 1999; Yao et al., 2004), and are mostly based on symbolic dynamics. For different complex systems, various approaches to complexity measurement can be used. The common problem of many methods, including ones based on phase space reconstruction, is the requirement of long, high quality, stationary data sets, which is not always easy to fulfill when dealing with natural or laboratory systems. To overcome these difficulties new phase space structure testing measures have been proposed, such as recurrence plots (RP) and recurrence quantification analysis (RQA). These methods equip us to gain new understanding of complex natural dynamics. At the same time it should be stressed that, notwithstanding all these efforts, measures of complex dynamics obtained by different modern methods sometimes may be mutually complementary or may even be contradictory, and should be carefully tested.
2. Phase Space Structures as Image of Dynamics

As mentioned above, we focus here on methods of investigating complex dynamics that are based on testing phase space structures. The practical method of reconstructing a structure equivalent to the real complex dynamics from the
time series of a single observable was invented by Packard et al. (1980) and Takens (1981). Generally, the state of an investigated system can be described by its state variables $x_1(t), x_2(t), \ldots, x_d(t)$. These state variables form a vector $x(t)$ in a $d$-dimensional phase space. The evolution of $x(t)$ traces out phase space trajectories or orbits. When the system of interest is nonrandom it has a property known as recurrence (Ruelle, 1994). This means that after some transients, the system comes back close to some points in the phase space again and again. The character of the time evolution of the trajectory forms a phase space structure: the attractor of the system. The shape of the attractor provides essential information on the dynamical features of the investigated process. Thus the attractor in phase space can be considered an equivalent image of the dynamics under consideration. Generally a point in a phase space is associated with a single state of the system, which is fully defined by a set of $m$ dynamical variables. It is clear that, in order to have a complete description of the state of the dynamical system of interest, these $m$ physical quantities should all be measured, at least in principle. Unfortunately, in most experimental situations, not all (and often only a single one) of the physical quantities of a state variable can be measured; all that we have is a one-dimensional time series, and from this series one would like to learn as much as possible about the system that generated the signal. As a rule measurements result in a discrete time series $g_i = g(i\Delta t)$, where $\Delta t$ is the sampling interval. Commonly the sampling rate is constant, forming an equidistant time series, but this is not always the case. Time series taken at time intervals of different length, so called unevenly sampled time series, are also quite common (Schreiber and Schmitz, 1999). Insofar as the system's variables are coupled, a single component contains essential information about the dynamics of the whole system (Castro and Sauer, 1997; Kantz and Schreiber, 1997; Rapp et al., 1993). Therefore the trajectory reconstructed from this scalar time series is expected to have the same properties as the trajectory embedded in the original phase space, formed by all $m$ state variables. Loosely speaking, Packard et al. (1980) and Takens (1981) independently proposed the idea of using a single sequence of measurements to transform the process dynamics into a phase space structure (the image of the investigated dynamics) and to gain information on the unknown underlying dynamics from this structure. According to the embedding theorem, there exists a one-to-one image of an attractor in the embedding space, if the embedding dimension is sufficiently high (Hegger et al., 1999). The idea was successfully realized after Takens (1981) proved that it is possible to reconstruct from a single scalar time series a new attractor which is diffeomorphically equivalent to the attractor in the original state space of the system under study. Essentially two methods of reconstruction are available: the delay coordinates and the derivative coordinates techniques. Derivative coordinates were originally proposed by Packard et al. (1980) and consist of using
the higher order derivatives of the measured time series as the independent coordinates. Since derivatives are more susceptible to noise, this technique is usually not very practical for real life data, which are inherently very noisy. Therefore the method of delay coordinates was recognized as a more practical tool. Data delayed by a lag $T$ help to exclude distortions of the analyzed dynamics caused by the temporal closeness of observations. The value of $T$ should be large enough to avoid an insubstantial functional dependence between data, and not so large as to make them statistically completely independent. If these conditions are fulfilled, a set of $d$-dimensional vectors in $d$-dimensional space can be reconstructed:
$$X(i) = [x(i), x(i+T), x(i+2T), \ldots, x(i+(d-1)T)]. \tag{1}$$
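A minimal delay-coordinate reconstruction following Eq. (1) might look as follows (our sketch; the toy signal and the parameter values are illustrative).

```python
import numpy as np

def delay_embed(x, d, T):
    # rows are X(i) = [x(i), x(i+T), ..., x(i+(d-1)T)] as in Eq. (1)
    n = len(x) - (d - 1) * T
    return np.column_stack([x[k * T: k * T + n] for k in range(d)])

t = np.linspace(0, 50, 2000)
x = np.sin(t) + 0.05 * np.random.default_rng(0).standard_normal(t.size)
X = delay_embed(x, d=3, T=5)
print(X.shape)                                  # (1990, 3)
```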
The Takens (1981) theorem guarantees that the dynamics reconstructed in this way, if properly embedded, is equivalent to the dynamics of the original, underlying system (Packard et al., 1980; Takens, 1981). Equivalence of these two dynamics means that their dynamical invariants (e.g. generalized dimensions, the Lyapunov spectrum, to be described below) are identical. The delay time $T$ for the reconstruction can be calculated from the autocorrelation function or from the first minimum of the mutual information (MI). The averaged mutual information evaluates the amount of bits of information shared between two data sets over a range of time delays and is defined as
$$I(X, Y) = \sum_{i,j} p(i, j) \log_2 \frac{p(i, j)}{p_x(i)\, p_y(j)}$$
(Abarbanel et al., 1993; Cover and Thomas, 1991; Kantz and Schreiber, 1997; Kraskov et al., 2004), where $p_x(i)$ and $p_y(j)$ are the probabilities of finding the measurements $x(i)$ and $x(i+T)$ in the time series, $p(i, j)$ is the joint probability of finding the measurements $x(i)$ and $x(i+T)$ in the time series, and $T$ is the time lag. It is important to mention that, in contrast to the linear correlation coefficient (which can also be used for delay time calculation), MI is also sensitive to dependencies which are not linear, i.e. do not manifest themselves in the covariance. MI is zero if and only if the two random variables are strictly independent. The MI calculation is not only used to find the correct delay time value; it is also important as a tool for providing information on the probability distribution of phase space points. In order to define the correct value of the embedding dimension $d_e \ge 2d_a + 1$ (where $d_e$ is the dimension of the embedding space and $d_a$ is the attractor's dimension) one uses the so-called false nearest neighbor method (Hegger et al., 1999; Kennel et al., 1992). The percentage of false nearest neighbors (phase points projected into neighborhoods of points to which they would not belong in higher dimensions) approaches zero as the dimension of the phase space increases.
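As an illustration of the delay-selection recipe just described, the following sketch (ours; the histogram estimator and bin count are a common but by no means unique choice) scans the mutual information over a range of lags.

```python
import numpy as np

def mutual_information(x, T, bins=16):
    # histogram estimate of I(x(i); x(i+T)) in bits
    pxy, _, _ = np.histogram2d(x[:-T], x[T:], bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))

x = np.sin(np.linspace(0, 400, 3000))           # toy signal, ~47 samples per period
mi = [mutual_information(x, T) for T in range(1, 40)]
T_opt = 1 + int(np.argmin(mi))                  # minimum over the scanned range
print(T_opt)                                    # the textbook recipe: first minimum of MI
```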
Generally, recurrence to certain states or phase space locations is a feature of deterministic dynamics (more generally, of nonrandom dynamics with some degree of determinism). Therefore many statistical tools for the analysis of complex dynamics are based on the inspection of neighborhoods in a reconstructed phase space (e.g. see Casdagli, 1997). Indeed, if the phase structures of two processes located in a correctly embedded phase space are close, then the dynamics of interest will also be close in their characteristics (of course they will not completely coincide, due to dynamical instability and/or noise). We will shortly describe several methods of phase space structure testing. Once the attractor (the phase space image of the dynamics) is formed, the two most popular ways to quantitatively evaluate the complexity of the analyzed dynamics are: quantification of the average evolution patterns of neighboring trajectories in state space, and/or quantification of the geometric patterns of the state space object. The evolution of phase space trajectories can be analyzed by calculating the spectrum of Lyapunov exponents or the maximal Lyapunov exponent $\lambda_{max}$. Generally, Lyapunov exponents quantify the average exponential rate of divergence of neighboring trajectories in state space, and thus provide a measure of the sensitivity of a system to local perturbations (Rosenstein et al., 1993; Kantz and Schreiber, 1997). For measured data sets the maximal Lyapunov exponent $\lambda_{max}$ of a dynamical system can be determined from $d(t) = d_0 e^{\lambda_{max} t}$, where $d(t)$ is the mean divergence between neighboring trajectories in state space at time $t$ and $d_0$ is the initial separation between neighboring points. There are several methods (Wolf et al., 1985; Sato et al., 1987; Rosenstein et al., 1993) for estimating $\lambda_{max}$, which often suffer from serious practical drawbacks. Namely, they are unreliable for small data sets and need essential computational resources. Generally, if $\lambda < 0$ phase trajectories draw together and the dynamical system has an attractor in the form of a fixed point. When $\lambda = 0$, the system tends to a stable limit cycle. $\lambda > 0$ means that phase trajectories move apart, and such a system may be chaotic or random (Rosenstein et al., 1993). In order to characterize unknown dynamics by the geometry of reconstructed phase structures, algorithms for calculating the fractal dimensions of phase space point sets are used. It is known that the fractal dimension of an attractor roughly characterizes the complexity of a system and gives a lower bound for the number of equations or variables needed for modeling the underlying dynamical process. There are several such measures based on the quantification of the self-similar properties of phase space objects. These measures are the information dimension ($d_i$), the Hausdorff dimension ($d_H$), etc. (Abarbanel et al., 1993; Kantz and Schreiber, 1997). We describe here only the method of computing the correlation dimension, or fractal dimension, proposed by Grassberger and Procaccia (1983) (GPA). In spite of difficulties when applied
to real data sets, GPA remains the most popular and most often used method of quantifying the geometrical features of phase space objects. This is probably due to the simplicity of the algorithm (Bhattacharya, 1999) and the fact that the same intermediate calculations are used to estimate both the fractal dimension and the entropy. The correlation sum $C(r, N)$ quantifies the way in which the density of points in state space scales with the size of the volume containing those points. The correlation sum $C(r)$ of a set of points in a vector space is defined as the fraction of all possible pairs of points which are closer than a given distance $r$. The basic formula useful for practical application is
$$C(r, N) = \frac{2}{(N-w)(N-w+1)} \sum_{i=1}^{N} \sum_{j=i+w}^{N} \Theta\left(r - \|x_i - x_j\|\right), \tag{2}$$
where $\Theta(x)$ is the Heaviside step function, $\Theta(x) = 0$ if $x < 0$ and $\Theta(x) = 1$ if $x \ge 0$; $\|x_i - x_j\|$ is the Euclidean norm ($i = j$ is excluded); $w$ is Theiler's window. For fractal systems, for long enough time series and for small $r$, the relationship $C(r) \propto r^{\nu}$ holds. Ordinarily such a dependence is correct only for a restricted range of $r$ values, the so-called scaling region. The correlation dimension $\nu$, or $d_2$, is defined as
$$\nu = d_2 = \lim_{r \to 0} \frac{\log C(r)}{\log r}. \tag{3}$$
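An illustrative implementation of Eqs. (2)–(3) (ours; the uniformly distributed test points, the radii and the Theiler window value are arbitrary choices):

```python
import numpy as np

def correlation_sum(X, r, w=1):
    # C(r, N) of Eq. (2): fraction of pairs (j >= i + w) closer than r
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.triu_indices(N, k=w)
    return 2.0 * np.count_nonzero(D[i, j] < r) / ((N - w) * (N - w + 1))

rng = np.random.default_rng(0)
X = rng.random((800, 2))                        # points filling the unit square
radii = np.logspace(-2, -0.7, 8)
C = [correlation_sum(X, r) for r in radii]
slope = np.polyfit(np.log(radii), np.log(C), 1)[0]
print(round(slope, 2))                          # slope of log C vs log r: close to d2 = 2
```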
In practice the value of $d_2$ is found from the slopes of the $\log C(r, N)$ versus $\log r$ curves for different phase space dimensions. The correlation dimension of an unknown process is the saturation value of $d_2$, which does not change with increasing phase space dimension. If saturation does not take place, the correlation dimension is infinitely large, which is typical for random processes. For a correct analysis it is necessary to have long enough data sequences, at least $N \ge 10^{d/2}$, where $N$ is the length of the time series and $d$ is the dimension of the attractor (Abarbanel et al., 1993). The three dimensions above satisfy the relation $d_2 \le d_i \le d_H$, with equality when the points in the state space are distributed uniformly over the attractor. In spite of the popularity of the $d_2$ calculation method, GPA results must be interpreted with great care, as it is well known that linear stochastic processes can also mimic low-dimensional dynamics (Rapp et al., 1993; Theiler and Prichard, 1997). In other words, the saturation of the correlation dimension and the existence of positive Lyapunov exponents cannot always be considered decisive proof of deterministic chaos, a dynamical regime which is predictable in the sense of patterns and is closest to quasiperiodicity (Kantz and Schreiber, 1997; Rapp et al., 1993). Since linear correlations lead to many spurious conclusions in nonlinear time series analyses, it is important to verify results using the so-called surrogate data approach. This is a method to test the null hypothesis that the
analyzed time series is generated by a specific process with known linear properties (Theiler et al., 1992). It should be stressed again that the above phase space measures have strict restrictions in the sense of time series length and are mostly relevant for the analysis of low dimensional or deterministically chaotic systems. When the dynamics of the investigated process is more complex, or when the dimension of the underlying attractor is moderately large, say $d_2 > 5$, the results of dimensional analysis on a finite amount of real data are not reliable enough (Schreiber and Schmitz, 1999). Moreover, real data series are often very noisy and contain measurement noise as well as dynamical noise (noise interacting with the dynamics); then the conventional estimates fail as well. Therefore, when we deal with complex dynamics, a less ambitious and more realistic goal commonly is to search for the inherent nonlinearity of the processes, or to rank them by the extent of nonlinearity. The practical importance of this statement becomes clear in the light of the known facts that in most cases the dynamical behavior of natural scale-invariant processes is nonrandom, revealing nonlinear structure. Very seldom is there some evidence of a deterministic chaotic type of dynamics (Goltz, 1998; Marzochi, 1996; Theiler and Prichard, 1997). The mentioned method of surrogate data equips us to test for nonlinear structure in complex dynamics. Indeed, the concept of surrogates was introduced (Theiler et al., 1992) in order to detect nonlinearity in time series. Surrogate data is an inherently stochastic signal which mimics certain statistical properties of the original signal, such as the temporal autocorrelation and the Fourier power spectrum. The surrogates can be constructed from the original time series on the basis of different null hypotheses. The three types of surrogates used most often address the three main hypotheses, namely, temporally independent noise, linearly filtered noise, and nonlinear transformation of linearly filtered noise. So whenever we try to quantify the degree of nonlinearity, the results of calculating the above measures should be compared with the similar quantities for surrogate data sets. Phase randomized surrogate sets (PR, obtained by destroying the nonlinear structure through randomization of the phases of a Fourier transform of the original time series and the subsequent inverse transformation) are often used to test the null hypothesis that the time series is linearly correlated Gaussian noise (Theiler et al., 1992). Also, the Gaussian scaled random phase (GSRP) surrogate set can be generated to address the null hypothesis that the original time series is linearly correlated noise that has been transformed by a static, monotone nonlinearity (Rapp et al., 1993, 1994). GSRP surrogates are generated in a three-step procedure. First, a Gaussian set of random numbers is generated which has the same rank structure as the original time series. Then phase randomized surrogates of these Gaussian sets are constructed. Finally the rank structure of the original time series is reordered according to the rank structure of the phase randomized Gaussian set (Theiler et al., 1992).
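A sketch of the phase-randomized surrogate construction described above (ours; the test signal is illustrative and the discriminating statistic M is left abstract):

```python
import numpy as np

def pr_surrogate(x, rng):
    # same power spectrum as x, Fourier phases randomized
    Xf = np.fft.rfft(x)
    ph = np.exp(2j * np.pi * rng.random(Xf.size))
    ph[0] = 1.0                                  # keep the mean
    if x.size % 2 == 0:
        ph[-1] = 1.0                             # keep the Nyquist bin real
    return np.fft.irfft(Xf * ph, n=x.size)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 40, 1024)) ** 3        # a signal with nonlinear structure
surr = np.array([pr_surrogate(x, rng) for _ in range(19)])
# significance of a statistic M: S = |mean(M(surr)) - M(x)| / std(M(surr))
```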
Generally these two methods of generating surrogates are based on shuffling the original data set but, in the case of Gaussian scaled random phase surrogates, controlled shuffles (Rapp et al., 1994) can give more precise and reliable results than the unstructured shuffles of the random phase surrogates. Commonly, for testing the null hypothesis, $d_2$ is used as the discriminating metric. There are several ways to measure the difference between the discriminating metric measure of the original ($M_{orig}$) and the surrogate ($M_{surr}$) time series. The most common measure of the significance of the difference between the original time series and the surrogate data is given by the criterion
$$S = \frac{\left|\langle M_{surr} \rangle - M_{orig}\right|}{\sigma_{surr}},$$
where $\langle M_{surr} \rangle$ and $\sigma_{surr}$ denote the mean and the standard deviation of $M_{surr}$. The details of the procedure, as well as an analytic expression for $\sigma_S$, the uncertainty in $S$, are described in (Theiler et al., 1992). Alternatively, the Monte Carlo probability can be used, defined as $P_M = (\text{number of cases } M_{surr} \le M_{orig})/(\text{number of cases})$, where $P_M$ is an empirical measure of the probability that a value of $M_{surr}$ will be less than $M_{orig}$. It is particularly appropriate when the number of surrogates is small, or when the distribution of the values of $M$ obtained with surrogates is non-Gaussian (Rapp et al., 1994). For rejecting the null hypothesis, the Barnard and Hope nonparametric test (Rapp et al., 1993) can be used. With this criterion, the null hypothesis is rejected at a confidence level $p_c = 1/(N_{surr} + 1)$ if $M_{orig} < M_{surr}$ for all surrogates. One serious problem in real data analysis is the influence of noise. One often uses the so-called nonlinear noise reduction method (which in fact is phase space nonlinear filtering) instead of common linear filtering procedures. The latter, as is well known, may destroy the original nonlinear structure of the analyzed complex processes (Hegger et al., 1999). Nonlinear noise reduction relies on the exploration of the reconstructed phase space of the considered dynamical process instead of using the frequency information of linear filters (Hegger et al., 1999; Kantz and Schreiber, 1997; Schreiber, 1999). As pointed out above, most methods of analysis need rather long and stationary data sets, which is commonly not typical for measured time series. This limitation gave a strong impetus to the further development of new techniques to gain insight into complex processes from relatively short and noisy observed time series. For this purpose several measures of complexity, mostly based on a symbolic dynamics approach, have been proposed. These include Renyi entropies, effective complexity, ε-complexity, etc. (Rapp et al., 2001; Wackerbauer et al., 1994). Here we will not discuss all these methods but focus on the approach based on the study of attractor organization (or testing of the topology of phase space images of unknown dynamics). This relatively new technique, oriented on the exploration of the phase space structure, the image of the dynamics, is the method of recurrence plots (RP) (Marwan, 2003). Let us recall here that if a dynamical system under investigation has any deterministic structure, an attractor appears in state space. As was already mentioned, an attractor is the set of points in phase
space towards which dynamical trajectories will converge. Again, recurrence is a fundamental property of nonrandom dynamical systems, which exponentially diverge under small disturbances although, after some time, they return to a state that is arbitrarily close to a former state. Recurrence plots visualize such recurrent behavior of a dynamical system. Usually real processes are characterized by complex dynamics embedded in a high dimensional phase space. RP enables one to investigate the structure of these high dimensional phase spaces through a two-dimensional representation of their recurrences. It is important to observe that the recurrence plot method is effective even for nonstationary and rather short time series (Gilmore, 1993, 1998). Generally, recurrence plots are designed to locate hidden recurring patterns and structure in time series and are defined by the $N \times N$ symmetric matrix
$$R_{i,j} = \Theta\left(\varepsilon_i - \|\vec{x}_i - \vec{x}_j\|\right), \qquad i, j = 1, \ldots, N, \tag{4}$$
where the $\vec{x}_{i,j}$ are reconstructed phase space vectors, obtained using Takens' time delay method. Insofar as the RP is based on Takens delay-coordinate embedding, the dynamical invariants of the true and reconstructed dynamics are identical when that embedding is correctly chosen. Therefore it is natural to assume that the RP of a correctly reconstructed trajectory bears similarity to an RP of the true dynamics. In fact, in (4) $\vec{x}_i$ stands for the point in phase space at which the system is situated at time $i$, $\varepsilon_i$ is a predefined cut-off distance and $\Theta(x)$ is the Heaviside function. The cut-off distance defines a sphere centered at $\vec{x}_i$. Recurrence of the phase space trajectory to a certain state is a fundamental property of deterministic dynamical systems (Argyris et al., 1994; Marwan et al., 2002). If the trajectory in the reconstructed phase space returns at time $i$ into the $\varepsilon$-neighborhood of its location at time $j$ (i.e. if $\vec{x}_i$ is closer to $\vec{x}_j$ than the cut-off distance), then $R_{i,j} = 1$ and these two vectors are considered to be recurrent; otherwise $R_{i,j} = 0$. It is possible to visualize the values of $R_{i,j}$ by black and white dots (Marwan, 2003; Zbilut and Weber, 1992), but often the recurrence plot relates the distances to colors, e.g. the larger the distance, the "cooler" the color. Thus, the recurrence plot is a rectangular plot consisting of pixels whose colors correspond to the magnitude of data values in a two-dimensional array and whose coordinates correspond to the locations of the data values in the array. The black points indicate the recurrences of the investigated dynamical system, revealing its hidden regular and clustering properties. By definition the RP is symmetric and has a black main diagonal (the line of identity) formed by distances in the matrix compared with themselves. In order to understand the RP it should be stressed that it visualizes a distance matrix which represents autocorrelation in the series at all possible time (distance) scales. Since distances are computed for all possible pairs, elements near the diagonal of the RP correspond to short range correlation, whereas the long range
correlations are revealed by the points distant from the diagonal. Hence if the analyzed dynamics (time series) is deterministic (ordered, regular), then the recurrence plot shows short line segments parallel to the main upward diagonal. At the same time, if the dynamics is purely random, the RP will not present any structure at all. One of the crucial points in RP analysis is the selection of the cutoff distance ε, or radius. If ε is selected too low, no recurrent point will be found. At the same time it cannot be set too high, because then every point would mistakenly be counted as recurrent. Exhaustive overviews on this subject can be found in (Zbilut et al., 1998; Marwan, 2003), etc. The primary aim of RP testing was the visual inspection of structures located in high dimensional phase spaces, where other methods are useless, especially when we deal with real data sets. The view of recurrence plots provides a unique possibility to observe the time evolution patterns of phase space trajectories at both large and small scales. By their large scale patterns, or topology, recurrence plots can be characterized as homogeneous (dynamics with uniformly distributed characteristics), periodic (dynamics with distinct periodic components), drift (dynamics with slowly varying parameters) and/or disrupted-intermittent (dynamics characterized by abrupt changes) (Marwan, 2003). At small scales the patterns (or texture) of recurrence plots can be characterized as single dots, diagonal lines, vertical lines and horizontal lines. Exactly recurrent dynamics generate long diagonal lines separated by a fixed distance. A large number of single isolated scattered dots and a vanishing number of lines are typical for heavily fluctuating dynamics under the influence of uncorrelated noises (in which case an insufficient dimension of the embedding space is not excluded). The irregular occurrence of short as well as long diagonal lines is characteristic of low dimensional chaotic processes, and the irregular occurrence of extended uniform areas corresponds to irregular high dimensional dynamics. In a more general sense the line structures in an RP exhibit the local time relationship between the current phase space trajectory segments. Stationarity of the whole time series requires that the density of line segments be uniform. RP was developed for single data sets, but Zbilut et al. (1998) have expanded it by considering two different time series. The cross-recurrence between two series $\{x_i\}$ and $\{y_i\}$ is defined as
$$CR_{i,j} = \Theta\left(\varepsilon_i - \|\vec{x}_i - \vec{y}_j\|\right).$$
Here, the two time series are embedded in the same phase space. The representation is analogous to the RP, and is called a cross-recurrence plot (CRP) (Marwan, 2003). Qualitative patterns of an unknown dynamics, presented as the fine structure of an RP or CRP, are often too difficult to be considered in detail. Zbilut and Webber have developed a tool which quantifies the structures in RPs, called Recurrence Quantification Analysis (RQA) (Zbilut and Weber, 1992). They define measures using the recurrence point density, the length of diagonal and vertical structures in the recurrence plot, the recurrence rate, the entropy of the recurrent points' distribution, etc. Presently at least 8 different statistical RQA characteristics
are known (Zbilut and Weber, 1992; Ivanski and Bradley, 1998; Marwan, 2003), though their practical meaning is not always quite clear. Computation of these measures in small windows moving along the main diagonal of the RP reveals the time dependent behavior of these variables, making possible the identification of unknown dynamical patterns in time series (Zbilut and Weber, 1992; Marwan et al., 2002). Here we briefly touch only on the main RQA statistical values. The first of these statistics, termed recurrence (%REC), is simply the percentage of points on the RP that are darkened or, in other words, those pairs of points whose spacing is below the predefined cut-off distance $\varepsilon_i$. It quantifies the number of time instants characterized by a recurrence in the signals: the more periodic the signal dynamics, the higher the %REC value. The second RQA statistic is called determinism (%DET); it measures the percentage of recurrent points in an RP that are contained in lines parallel to the main diagonal. The main diagonal itself is excluded from these calculations because points there are trivially recurrent. Intuitively, %DET measures how "organized" an RP is. This variable discriminates between the isolated recurrent points and those forming diagonals. Since a diagonal represents points close to each other, %DET also contains information about the duration of a stable interaction: the longer the interactions, the higher the %DET value. The third often used RQA statistic, called entropy, is closely related to %DET. Entropy (ENT) is the Shannon information entropy of the line distribution measured in bits and is calculated by binning the diagonal lines according to their lengths and using the formula
$$ENT = -\sum_{k=1}^{N} P_k \log_2 P_k,$$
where $N$ is the number of bins and $P_k$ is the percentage of all lines that fall into bin $k$. In other words, $P_k$ is defined as the ratio of the number of $k$-point-long diagonals to the total number of diagonals. ENT is measured in bits of information because of the base-2 logarithm. Thus, whereas %DET accounts for the number of diagonals, ENT quantifies the distribution of the diagonal line lengths. The larger the variation in the lengths of the diagonals, the more complex the deterministic structure of the RP. A more complex dynamic requires a larger number of bits (ENT) to be represented. The fourth RQA statistic, termed TREND, measures how quickly an RP goes away from the main diagonal. As the name suggests, TREND is intended to detect nonstationarity in the data. The fifth RQA statistic is called the length of the maximal deterministic line (MAXLINE) and is equal to the longest line length found in the computation of DET. Eckmann, Kamphorst, and Ruelle claim that line lengths on RPs are directly related to the inverse of the largest positive Lyapunov exponent (Zbilut and Weber, 1992; Marwan, 2003). Small MAXLINE values are therefore indicative of random-like behavior. In a purely periodic signal, lines tend to be very long, so MAXLINE is large. This completes our short overview of qualitative and quantitative methods of nonlinear time series analysis based on testing of phase space images.
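To ground the RP and RQA definitions in code, the compact sketch below (ours; the cutoff ε, the minimal line length and the test orbit are illustrative) builds the recurrence matrix of Eq. (4) and computes %REC and %DET.

```python
import numpy as np

def recurrence_matrix(X, eps):
    # R_ij = Theta(eps - ||x_i - x_j||), Eq. (4)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (D <= eps).astype(int)

def rec_det(R, lmin=2):
    # %REC: share of recurrent points off the main diagonal;
    # %DET: share of those lying on diagonal lines of length >= lmin
    N = len(R)
    total = R.sum() - N                          # exclude the line of identity
    on_lines = 0
    for k in range(1, N):                        # upper diagonals; doubled by symmetry
        idx = np.flatnonzero(np.diag(R, k))
        runs = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
        on_lines += 2 * sum(len(run) for run in runs if len(run) >= lmin)
    rec = 100.0 * total / (N * N - N)
    det = 100.0 * on_lines / total if total else 0.0
    return rec, det

t = np.linspace(0, 20 * np.pi, 400)
X = np.column_stack([np.sin(t), np.cos(t)])      # periodic orbit
rec, det = rec_det(recurrence_matrix(X, eps=0.2))
print(round(rec, 1), round(det, 1))              # high %DET for periodic dynamics
```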
3. Investigation of Dynamics of Complex Natural Processes and Laboratory Models

The last two decades witnessed the development of robust nonlinear methods of time series analysis for the detection and identification of complex dynamical processes. The importance of qualitative and quantitative analysis of such complicated processes is well recognized in almost every field of science. Below we describe some of our results demonstrating the ability of the universal methods of time series analysis briefly discussed above to detect hidden dynamical properties in seemingly random processes. In order to avoid repetition we state that, from the classical linear data analysis point of view, almost all processes and time series considered below appear random.

3.1. DETECTION OF NONRANDOM NONLINEAR STRUCTURE IN GEOPHYSICAL PROCESSES
Significant variability, both in time and space, makes the problem of identification and quantification of complex geophysical phenomena extremely difficult. There are no mathematical models formulated (and found appropriate) for describing the space-time dynamics of these phenomena; all attempts to formulate a "universal" mathematical model for describing geophysical phenomena have proven futile (Sivakumar et al., 2002). Therefore it is accepted that the only way to understand the dynamical features of complex geophysical processes is through the analysis of measured data sets using modern nonlinear methods.

3.1.1. Dynamical Structure of Seismicity

Earthquakes are one of the strongest natural calamities, causing great human and material losses. The 1976 earthquake in China took 255,000 human lives, Armenia (1988) 25,000, Iran (1990) 50,000, etc. Earthquakes are the expression of the continuing evolution of the planet Earth, more precisely of the deformation of its crust. According to modern plate tectonics, the Earth's lithosphere is mobile and is divided into several blocks (plates) that are forced to move due to convection in the underlying viscous asthenosphere. The relative motion of these plates results in the accumulation of stresses on their boundaries. These stresses can be released by a fast slip (earthquakes) or by seismic creep. Most of the strain energy accumulated in the fault is transformed into heating, deformation and destruction of rocks. Only a small part (1 to 10%) is converted into seismic waves that propagate in the lithosphere. The energy radiated during a moderate earthquake is comparable with that of several large nuclear explosions. The lithosphere is an extremely complex system from the point of view of both its structure and the processes
therein (Turcotte, 1992; Korvin, 1992; Kagan, 1994; Keilis-Borok, 1994). The dynamics of seismic processes is viewed as extremely complicated, so that the level of "turbulence" of the lithosphere exceeds that of the atmosphere (Kagan, 1994, 1997). During more than one hundred years of instrumental observations several important characteristics of the spatial, temporal and energetic distributions of earthquakes have been revealed (Scholz, 1990; Turcotte, 1992; Keilis-Borok, 1994; Goltz, 1998; Matcharashvili et al., 2000; Rundle et al., 2000). Nevertheless the character of the dynamics of seismic processes remains the subject of intense discussion, because it is directly related to the problem of earthquake prediction. Critics of earthquake prediction (Kagan, 1994, 1997; Geller, 1999; Kanamori and Brodsky, 2001, etc.) regard the seismic process as completely random, while its proponents consider it complex and high-dimensional though not random (Main, 1997; Wyss, 1997; Chelidze and Matcharashvili, 2003; Knopoff, 1999, etc.). Indeed, a completely random process is unpredictable on any spatial and temporal scales. On the other hand, for processes with a nonrandom dynamical structure there always exist specific spatial and temporal scales at which the system's behavior is deterministic, i.e. predictable. From this point of view a nonrandom seismic process cannot be regarded as completely unpredictable. Of course it is clear that predictability in this sense does not necessarily mean a "real" forecast of every hazardous event on practically meaningful time scales. At present, the evidence of a nonrandom structure of seismicity has mainly scientific importance, because it gives grounds for looking for predictive markers. This is also important for testing modern ideas on the possible control of practically unpredictable seismic processes. To shed some light on this field let us consider seismic processes from the point of view of their dynamical structure. As mentioned above, one of the most popular approaches to the problem of identification of patterns of complex dynamics, including seismicity, is based on the evaluation of the nonlinear structure of the process (or, what is the same, the nonlinear structure of the appropriate time series) (Rapp et al., 1993; Theiler et al., 1992). In this way it is possible to achieve reliable detection of the dynamical regime(s) of seismic processes by calculation of their measurable characteristics. These characteristics can also be calculated prior to and after strong earthquakes. This is important in the search for possible earthquake-predictive dynamical markers. In order to answer the above question on nonlinear structure in earthquake generation (or, in other words, its predictability) it is necessary to investigate the dynamical properties of seismic processes in all three characteristic domains: energetic, spatial and temporal. For this purpose "time series" of inter-event time intervals (waiting times), magnitude sequences and inter-event distances of earthquakes in the entire Caucasian region have been used. All these time series were taken from the earthquake catalogue of the Caucasus and the adjacent territories for the 1962–1993 time period (Seismological Data Base of the Institute of Geophysics,
Figure 1. Typical plot of correlation dimension $d_2$ versus embedding dimension $p$ for the Caucasus, the Greater Caucasus, and the Javakheti region: magnitude (middle curve) and inter-event time interval (lower curve) sequences. The upper curve represents a random number set.
Tbilisi, Georgia). It was shown (see Figure 1) that, despite the fact that the size and temporal distributions of earthquakes both obey a power law, they are dynamically quite different. The magnitude distribution of earthquakes in the Caucasian region is undoubtedly high-dimensional; $d_2$ as a rule is larger than 8 ($d_2 > 5$ is the high dimensionality threshold) (Sprott and Rowlands, 1995). According to our results, as well as to the reports of other authors (Korvin, 1992; Smirnov, 1995), the fractal dimension of the distribution of inter-earthquake distances is low ($d_2 < 2$). Most interesting is that the waiting time interval distribution reveals an obviously low dimensional nonlinear structure ($d_2$ of the order of 1.6–2.5 and positive $\lambda_{max}$ of the order of 0.2–0.7), although it cannot be recognized as deterministic chaos (Matcharashvili et al., 2000) (see Figure 1). This low dimensionality of the earthquakes' temporal distribution is in complete agreement with earlier results for other parts of the world (Goltz, 1998). Having found the low dimensional temporal distribution of earthquakes, it is important to reveal the character of this dynamics before and after strong earthquakes. So, as a next step on the way to a better understanding of the underlying dynamics of earthquake generation, we have undertaken a comparison of the properties of the waiting time distribution before and after large events. For this purpose we have considered waiting time sequences of a seismic catalogue separately before and after the largest events, using the above-mentioned tests, such as correlation dimension and Lyapunov exponent calculation, as measures of nonlinearity. According to our results, the general properties of the dynamics of the earthquakes' temporal distribution before and after the largest regional events do not indicate a qualitative difference from the integral dynamics obtained by consideration of time series from the complete original catalogues (see Figure 2). Indeed, the correlation dimensions of all considered waiting time
Figure 2. Variation of the correlation dimension d2, computed in sliding windows of 1000 inter-event time intervals moved in steps of 50 events, along the Paravani earthquake region inter-event time interval sequence.
Indeed, the correlation dimensions of all considered waiting time sequences from the original catalogue (containing all independent events and aftershocks above the threshold magnitude), both preceding and following the largest events in the Caucasus, converge to a limit value. At the same time, it is important that these values do not coincide. Consequently, as long as all investigated time series have a correlation dimension below the low-dimensionality threshold (d2 < 5; see also Goltz, 1998), it can be deduced that the earthquakes' temporal distribution is characterized by low-dimensional dynamics before as well as after the largest regional events. At the same time, in the energetic domain the earthquakes' magnitude distribution remains high-dimensional before and after strong events. As stressed above, the results of dimension calculations, especially when a low-dimensional process is detected, should be verified using special methods. In testing low-dimensional waiting time sequences for deterministic chaos we face the typical problems always encountered in testing real, usually short and noisy, time series. In order to overcome these discrimination problems, as in the case of high-dimensional processes, one has to test the time series for evidence of nonlinearity (Theiler and Prichard, 1997). One additional reason why this approach has become popular is that, from the practical point of view, detecting nonlinearity in low-dimensional data is easier than a confident identification of chaotic dynamics (Theiler et al., 1992). It was found that, in all cases, time interval sequences obtained from the original catalogue above threshold magnitudes, before and after the largest events, reveal evidence of a nonlinear structure. In other words, the null hypothesis that these sequences are generated by linearly correlated noise
or by a static monotone nonlinearity should be rejected. The differences between the S-measures of the natural sequences before and after the considered earthquakes and those of the corresponding phase-randomized (SPR) and Gaussian-scaled random phase (SGSRP) surrogates are significant at the p < 0.005 level. Specifically, for the waiting time sequences before and after the Dagestan (M = 6.6) earthquake, SPR = 55.6 ± 0.27, SGSRP = 15.9 ± 0.20 and SPR = 50.5 ± 0.15, SGSRP = 17.1 ± 0.13; for the Paravani (M = 5.6) earthquake, SPR = 51.1 ± 0.21, SGSRP = 16.2 ± 0.13 and SPR = 64.2 ± 0.27, SGSRP = 11.5 ± 0.17; for the Spitak (M = 6.9) earthquake, SPR = 49.2 ± 0.12, SGSRP = 11.4 ± 0.12 and SPR = 52.2 ± 0.27, SGSRP = 15.2 ± 0.19; for the Racha (M = 6.9) earthquake, SPR = 57.6 ± 0.23, SGSRP = 16.3 ± 0.23 and SPR = 51.5 ± 0.17, SGSRP = 18.4 ± 0.11. To understand the above differences in the correlation dimension values before and after the largest earthquakes, we used a sliding window technique. We considered the sequence of 6695 inter-event time intervals around the Paravani earthquake. Here N0 = 5300 is the ordinal number of the time interval that directly preceded the largest earthquake. We calculated d2 for sliding windows of 1000 events with a step of 50 events, starting with event N0 = 3200 and ending with event N0 = 5800; hence the first window consists of the time intervals between earthquakes 3200 to 4200. As shown in Figure 2, the values of d2 decrease for the windows following the largest event. The decrease begins when a sliding window contains about 20 inter-event time intervals after the largest event and becomes significant when 40 to 50 such events are included in the sequence. Note that the window 4310 to 5310, like the window 4300 to 5300, still shows the background value of the correlation dimension of the waiting time sequence before the largest earthquake. It seems doubtful that such an essential change in the dynamical properties of the considered sequence can be caused by the addition of so few new data, unless there is a hidden regularity in a sufficiently long waiting time sequence containing data preceding the largest event. These results indicate that measuring the complexity of seismic time series may provide markers with precursory meaning and in the future may help to elaborate earthquake prediction tests (Matcharashvili et al., 2002).
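The window computation just described is simple to express in code. The fragment below is a hedged sketch only: waiting_times stands in for the catalogue's inter-event intervals (not reproduced here), correlation_dimension is the estimator sketched earlier, and the window geometry mirrors the values quoted in the text.

```python
import numpy as np

window, step, m_embed = 1000, 50, 8      # as quoted for the Paravani sequence
d2_track = []
for start in range(3200, 5800 - window + 1, step):
    segment = np.asarray(waiting_times[start:start + window], dtype=float)
    d2_track.append((start, correlation_dimension(segment, m_embed)))
# Windows whose tail reaches past the mainshock (interval No. 5300) should
# show the drop in d2 discussed above (cf. Figure 2).
```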
Thus it is clear that seismicity reveals a low-dimensional nonlinear structure in two domains (temporal and spatial) out of three (energetic, temporal and spatial). This and similar results obtained in the last decade lead to the understanding that, in spite of their extreme complexity, the processes related to earthquake generation are characterized by some internal dynamical structure and thus are not completely random (Smirnov, 1995; Goltz, 1998; Rundle et al., 2000; Matcharashvili et al., 2002). Despite this evidence that seismic activity is a non-random process, the physics of the internal and external factors involved is still poorly understood; it can be asserted, however, that the general problem of earthquake prediction and/or earthquake triggering, one of the most challenging targets of current science, should not be considered an "alchemy of present time" (Geller, 1999). In other words, the quest for earthquake-predictive markers or triggering factors should be recognized as an obviously difficult but scientifically well-grounded task, related to the search for determinism in the complex seismic process.

3.1.2. Detection of Changes in Earthquake Dynamics Caused by External Anthropogenic Influences

As discussed above, the dynamics of earthquake-related processes in the earth's crust are nonrandom, having low- and/or high-dimensional structures. One property of such nonrandom systems, when they are close to the critical state, is their high sensitivity to initial conditions as well as to relatively weak external influences. This general property of complex systems acquires special significance for practically unpredictable seismic processes. Indeed, insofar as we cannot govern the initial conditions of lithospheric processes, even the possibility of controlling seismic processes has immense scientific and practical importance (e.g. inducing, by specific external influences, the release of accumulated seismic energy via a series of small or moderate earthquakes instead of one strong devastating event). To begin to understand such possible control we investigate the dynamics of seismic processes and model natural seismicity in laboratory systems. According to recent investigations, the Earth's crust in seismically active regions can be in the critical state or close to it (Bak et al., 1988; Scholz, 1990). This can explain the known effectiveness of relatively small influences on the seismic process, such as tidal variations, the filling of large reservoirs, or the pumping of water into boreholes (Sibson, 1994), which add only an insignificant contribution to the existing tectonic strains. In experiments initially aimed at finding resistivity precursors of strong earthquakes in the upper layers of the earth's crust by MHD sounding, an unexpected effect of microseismicity activation after the discharges was discovered in the Bishkek, Central Asia test area (Tarasov, 1997). Experiments demonstrating the triggering effect of MHD (magnetohydrodynamic) soundings on the microseismic activity of the region were performed from 1975 to 1996 by IVTAN (Institute of High Temperatures of the Russian Academy of Sciences) in the Central Asia test area (Tarasov, 1997; Jones, 2001). During these experiments deep electrical sounding of the crust was carried out at the Bishkek test site from 1983 to 1989. The source of electrical energy was an MHD generator; the load was an electrical dipole of 0.4 Ω resistance with electrodes located 4.5 km from each other. When the generator was fired, the load current was 0.28 to 2.8 kA, the sounding pulses had durations of 1.7 to 12.1 s, and the energy generated was mostly in the range of 1.2 to 23.1 MJ (Volykhin et al., 1993).
Figure 3. Correlation dimension versus embedding dimension of waiting time sequences: asterisks—integral time series (1975–1996); circles—before the beginning of the experiments (1975–1983); squares—during the experiments (1983–1988); triangles—after the experiments (1988–1992); diamonds—random number sequence.
In order to test the possibility of an external anthropogenic impact on the seismic regime, the dynamics of the temporal distribution of earthquakes around the test area was analyzed. For this purpose sequences of time intervals (in seconds) between consecutive earthquakes, compiled from the seismic catalogue of the Institute of Physics of the Earth (Moscow), were investigated using the tools of nonlinear time series analysis (Chelidze and Matcharashvili, 2003). The main goal was to find out whether strong EM discharges may lead to changes in the dynamics of the temporal distribution of seismicity. The time period before the beginning of the experiments (1975–1983), the period of cold and hot runs (1983–1988), the period immediately after the completion of the experiments (1988–1992), and the period long after the experiments (1992–1996) were considered separately. The inter-event time sequences corresponding to these periods have approximately equal lengths (about 3660 events). The integral time series (5297 time intervals) for the whole period of observation (1975–1996), containing the waiting times between all events above the threshold, reveals a clearly low correlation dimension (d2 = 1.22 ± 0.43) (Figure 3). Shorter time series were also considered, namely 1760 waiting times before (1975–1983), 1953 waiting times during (1983–1988) and 1584 waiting times after the MHD experiments (1988–1992). The time series before, and especially during, the MHD experiments also have a low correlation dimension (d2 < 5), namely d2 = 3.83 ± 0.80 before and d2 = 1.04 ± 0.35 during the experiments.
Figure 4. Qualitative RP analysis of the temporal distribution of earthquakes (M > 1.7): recurrence plots of waiting time sequences (a) before the experiments (1975–1983), (b) during the experiments (1983–1988), and (c) after the experiments (1988–1992).
On the other hand, in contrast to the above, after the cessation of the experiments (Figure 3, triangles) the correlation dimension of the waiting time sequences noticeably increases (d2 > 5.0), exceeding the low-dimensionality threshold (d2 = 5.0). This means that after the termination of the experiments the extent of regularity, or extent of determinism, in the temporal distribution of earthquakes decreases. The process becomes much more random, both qualitatively (Figure 4) and quantitatively (Figure 3, triangles). For comparison, results for a random number sequence are also shown in Figure 3 (diamonds). The low correlation dimension found for the integral waiting time series in Central Asia is in good agreement with the above results on the low-dimensional dynamical structure of the earthquakes' temporal distribution in other seismically active regions (Goltz, 1998). On the other hand, Figure 3 shows that a relatively short ordered sequence can decrease the correlation dimension of a long integral time series which contains much larger sequences of high dimension. Thus a low value of d2 for a long time series does not mean that the whole sequence is ordered: the ordering can be intermittent. The data on correlation dimension, together with the qualitative RP results shown in Figure 4, provide evidence that after the beginning of the EM discharges the dynamics of the temporal distribution of earthquakes around the IVTAN test area undergoes essential changes: it becomes more regular, i.e. the events of the corresponding time series become functionally much more interdependent. These results were tested against the possible influence of noise using a noise reduction procedure (Schreiber, 1993; Kantz and Schreiber, 1997), as well as against trends and non-stationarity in the inter-event data sets (Goltz, 1998). The tests confirm our conclusions. Thus the changes before, during and after the experiments (Figure 3) are indeed related to changes in the dynamics of the temporal
distribution of earthquakes caused by the external anthropogenic influence (MHD discharges) (Chelidze and Matcharashvili, 2003). Subsequently, in order to have a basis for a more confident rejection of spurious conclusions caused by possible linear correlations in the data sets considered, we used the surrogate data approach to test the null hypothesis that our time series are generated by a linear stochastic process (Theiler et al., 1992; Rapp et al., 1993, 1994; Kantz and Schreiber, 1997). Specifically, PR and GSRP surrogate sets for the waiting time series were used (Chelidze and Matcharashvili, 2003). Surrogate testing of the waiting time sequences before and during the experiments, using d2 as a discriminating metric, was carried out for each of our data sequences; 75 PR and GSRP surrogates were generated. The significance criterion S for the analyzed time series before the experiments is 22.4 ± 0.2 for the PR and 5.1 ± 0.7 for the GSRP surrogates. After the beginning of the experiments the null hypothesis that the original time series is linearly correlated noise was rejected with a significance criterion S of 39.7 ± 0.8 for the PR and 6.0 ± 0.5 for the GSRP surrogates. These results can be considered strong enough evidence that the analyzed time series are not linear stochastic noise. The above conclusion about the increase of regularity in the earthquakes' temporal distribution after the beginning of the experiments (external influence on the seismic process) is confirmed also by the results of RQA; namely, RR(t) = 9.6 and DET(t) = 3.9 before, RR(t) = 25 and DET(t) = 18 during, and RR(t) = 3 and DET(t) = 1.5 after the experiments. It was shown in our previous research that small earthquakes play a very important role in the general dynamics of the earthquakes' temporal distribution (Matcharashvili et al., 2000). This is why we also carried out the analysis of time series containing all data available from the entire catalogue of waiting time sequences, including small earthquakes below the magnitude threshold. This test is also valid for checking the robustness of the results when a new, not necessarily complete, set of data is added to the original set. The total number of events in the whole catalogue increased to 14,100, while the complete catalogue contained about 4000 data points in each of the three above-mentioned periods (before, during and after the MHD experiments). Both the complete and the whole catalogues of waiting time sequences reveal a low-dimensional nonlinear structure in the temporal distribution of earthquakes before, and especially during, the experiments. This means that our results on the influence of the hot and cold EM runs on the general characteristics of the dynamics of the earthquakes' temporal distribution remain valid for small earthquakes too. Thus the conclusion drawn from this analysis is that the anthropogenic external influence, in the form of strong electromagnetic pulses, invokes distinct changes in the dynamics of the earthquakes' temporal distribution: the dynamical characteristics of the regional seismic process are changed.
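The surrogate test itself can be sketched as follows. This is an illustrative fragment, not the authors' code: it builds phase-randomized (PR) surrogates and forms the significance criterion S as the distance of the original statistic from the surrogate distribution in units of its standard deviation. The GSRP (Gaussian-scaled) variant additionally requires amplitude rank-ordering steps that are omitted here.

```python
import numpy as np

def pr_surrogate(x, rng):
    """Phase-randomized surrogate: keep Fourier amplitudes, scramble phases."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0                    # keep the mean (DC bin) real
    if len(x) % 2 == 0:
        phases[-1] = 0.0               # Nyquist bin must also stay real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))

def significance(x, statistic, n_surrogates=75, seed=1):
    """S = |Q_orig - <Q_surr>| / std(Q_surr) for any discriminating statistic,
    e.g. the correlation dimension estimator sketched earlier."""
    rng = np.random.default_rng(seed)
    q0 = statistic(np.asarray(x, dtype=float))
    qs = np.array([statistic(pr_surrogate(x, rng))
                   for _ in range(n_surrogates)])
    return abs(q0 - qs.mean()) / qs.std()
```

With d2 as the statistic, S values of the size quoted above (tens of surrogate standard deviations for the PR case) would firmly reject the linear-noise null hypothesis.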
Figure 5. (a) Variation of the water level (above sea level) in the Enguri high dam reservoir in 1978–1995. (b) RQA %DET of the daily number of earthquakes, calculated for consecutive one-year sliding windows. (c) Cumulative sum of the daily released seismic energy. (d) RQA %DET of the magnitude (black columns) and inter-earthquake time interval (grey columns) sequences (1) before impoundment, (2) during flooding and reservoir filling, and (3) during periodic changes of the water level in the reservoir.
Though the energy released during these experiments was very small compared to the energy of even a small earthquake, it can still be considered a strong enough man-made impact. It was recently found that it is possible to control the dynamics of complex systems through small external periodic influences. In this respect we have investigated the possible influence of periodic water level variations on the dynamics of regional seismic activity. For this purpose the data sets of the water level variation in the Enguri high dam reservoir in Western Georgia and the seismicity data sets for the surrounding area for 1973–1995 were investigated. The height of the dam is 272 m and the (average) volume of water in the reservoir is 1.1 × 10⁹ m³. The Enguri reservoir was built in 1971–1983. Preliminary flooding of the territory started at the end of December 1977; from 15 April 1978 the reservoir was filled step by step to the 510 m mark (above sea level). Since 1987 the water level in the reservoir has been changing seasonally, almost periodically. Thus we have defined three distinct periods for our analysis, namely (i) before impoundment, (ii) flooding and reservoir filling, and (iii) periodic change of the water level. Figure 5a shows the daily record of the water level in the Enguri dam reservoir for the period 1978–1995. The relevant size of the region to be investigated, i.e. the area around the Enguri high dam which can be considered sensitive to the reservoir influence (90 km), was evaluated using the energy release acceleration analysis approach (Bowman et al., 1998), as described in Peinke et al. (2006). The number of earthquakes above the representative threshold (1200) was too small to carry out a correct correlation dimension and Lyapunov exponent calculation for these three periods, so an RQA calculation was carried out instead. It was shown that when the external influence on the earth's crust caused by the reservoir water becomes periodic, the extent of regularity of the daily distribution of earthquakes increases essentially (see Figure 5a and b). It is also clear that during flooding and non-regular reservoir filling the amount of released seismic energy increases (Figure 5a and c), in complete accordance with well-known concepts of reservoir-induced seismicity (Talwani, 1997; Simpson et al., 1988). It is important to mention that the influence of the increasing amount of water and its subsequent periodic variation essentially affects the character of the earthquakes' magnitude and temporal distributions (see Figure 5d). In particular, the extent of order in the earthquakes' magnitude distribution increases substantially when the external influence becomes periodic (black columns). At the same time, the dynamics of the earthquakes' temporal distribution changed even under the irregular influence, though not as much as under the periodic influence. Similar conclusions are drawn from other RQA measures. All these results indicate that a slow periodic influence (loading and unloading) may change the dynamical properties of local seismicity. Assuming the possibility of controlling the dynamics of seismic processes by small periodic influences of the water level variation, the following question might arise: if there exists control
caused by periodic reservoir loading, why do earthquakes correlate only weakly with the solid earth tides (see Vidale et al., 1998; Beeler and Lockner, 2003)? A possible explanation is that the strength of the control depends not only on the amplitude of the forcing, but also on its frequency. If both parameters are varied, the synchronization area between the external periodic influence and the seismic activity forms a so-called Arnold tongue, with some preferred frequency that requires minimal forcing (Pikovsky et al., 2003; Chelidze et al., 2005). At larger or smaller frequencies the forcing needed to achieve synchronization increases drastically, and at large deviations from the optimal frequency synchronization becomes impossible. According to Beeler and Lockner (2003), the daily tidal stress changes too fast in comparison with the characteristic time of moderate seismic event nucleation, which is of the order of months or years. On the other hand, the period of the loading exerted by the reservoir exploitation is one year, which is close to the characteristic time of significant earthquake preparation. Therefore, synchronization of seismic events by periodic reservoir loading can be accepted as a realistic physical mechanism for the related dynamical changes.
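The Arnold tongue can be made concrete with the simplest phase model of synchronization, the Adler equation dφ/dt = Δω − ε sin φ, for which phase locking occurs exactly when |Δω| ≤ ε. The toy sketch below (a generic illustration of the concept, not a model of seismicity) integrates this equation and traces out the tongue: the larger the detuning Δω from the optimal frequency, the larger the forcing amplitude ε needed for locking.

```python
import numpy as np

def locks(detuning, eps, t_end=2000.0, dt=0.01):
    """Integrate the Adler equation dphi/dt = detuning - eps*sin(phi);
    a bounded phase difference means the oscillator is locked."""
    phi = 0.0
    for _ in range(int(t_end / dt)):
        phi += (detuning - eps * np.sin(phi)) * dt
    return abs(phi) < 2.0 * np.pi

# The width of the locking region grows with the forcing amplitude,
# tracing out the Arnold tongue |detuning| <= eps.
for eps in (0.05, 0.10, 0.20):
    locked = [dw for dw in np.linspace(0.0, 0.3, 31) if locks(dw, eps)]
    print(f"eps = {eps:.2f}: locked up to detuning ~ {max(locked):.2f}")
```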
3.2. DETECTION OF DYNAMICAL CHANGES IN LABORATORY STICK-SLIP ACOUSTIC EMISSION UNDER PERIODIC FORCING

The above results provide serious arguments that, under an external periodic influence, the dynamics of natural seismicity can be changed. This is very important from both the scientific and the engineering points of view. At the same time, real field seismic data are often too short and incomplete to draw unambiguous conclusions about the underlying complex dynamics. Therefore, in order to test our field data results, a similar analysis of acoustic emission data sets recorded during stick-slip has been carried out (Chelidze and Matcharashvili, 2003; Chelidze et al., 2005). Acoustic emission accompanying stick-slip experiments is considered a model of the natural seismic process (Johansen and Sornette, 1999; Rundle et al., 2000). Our laboratory setup consisted of two samples of roughly finished basalt. One of the samples (plates), the lower one, was fixed; the upper one was pulled at constant speed by a special mover connected to the sample by a rope-spring system; the acoustic and EM emissions accompanying the slip were registered. The following cases were studied: (i) pulling the sample without any additional impact; (ii) slip with an additional weak mechanical (periodic) impact applied; (iii) slip with a periodic EM field applied. Thus, experiments were carried out with or without periodic mechanical or electromagnetic (EM) forcing, which simulates the external periodic influence. If the large pulling force can be modulated by a weak periodic force of EM or mechanical nature, this would demonstrate the high sensitivity of critical or "nearly critical" systems to small external impacts.
Figure 6. Dynamical changes in temporal distribution of acoustic emission under increasing external periodic forcing.
As mentioned, the aim was to prove experimentally the possibility of controlling the slip dynamical regime by a weak mechanical or EM impact. The elementary theoretical model of EM coupling with friction can be formulated in the following way. It is well known that the application of an EM field to a dielectric invokes forces acting upon the molecules of the body; their resultant, called the ponderomotive or electrostriction force Fp, affects the whole sample. The force is proportional to the gradient of the square of the field intensity and pulls the sample in the direction of the largest intensity.
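For orientation, a standard form of this force for a small, linearly polarizable body (the proportionality constant below is a textbook assumption; the chapter itself quotes only the proportionality) is

$$\mathbf{F}_p = \tfrac{1}{2}\,\alpha\,\nabla\!\left(E^{2}\right),$$

where α is the effective polarizability and E the field intensity; the force points up the gradient of E², i.e. toward the region of the strongest field, as stated above.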
Quite different regimes of the time distribution of the maximal amplitudes of acoustic emission were observed, depending on the intensity of the applied external perturbations, which were weak relative to the pulling force. As shown in Figure 6, increasing periodic forcing first leads to a more regular temporal distribution, while a further increase decreases the extent of regularity. This experiment confirms the above field results on the possible controlling effect of an external periodic influence on the seismic process.

3.2.1. Changes in Dynamics of Water Level Variation During Increased Regional Seismic Activity

As the next example of applying phase space structure testing methods to the detection of dynamical changes in earthquake-related natural processes, we present the results of our investigation of water level variations in deep boreholes. The variation of the water level in deep boreholes is caused by a number of endogenous and exogenous factors.
Figure 7. Variation of the monthly extent of regularity in the water level data sets observed in deep boreholes, calculated for 720-data windows at 720-data steps. Triangles—Akhalkalaki borehole, circles—Lisi borehole, asterisks—Kobuleti borehole.
The most important factor is the strain change in the upper Earth's crust. Deep boreholes act as sensitive volumetric strainmeters, in which the water level reacts to small ambient deformations. Hence, water level variations will also reflect the inherent response of the aquifer to the earthquake-related strain redistribution in the earth's crust (Kumpel, 1994; Gavrilenko et al., 2000). Therefore the investigation of water level variations in deep boreholes may provide additional understanding of the dynamics of processes related to earthquake preparation in the earth's crust (King et al., 1999). We have investigated the dynamics of water level variations in deep boreholes in Georgia. The analysis was carried out on hourly water level time series from three boreholes of about 2000 m depth: Lisi (44.45 N, 21.45 E), Akhalkalaki (43.34 N, 41.22 E) and Kobuleti (41.48 N, 41.47 E), for the period 01.03.1990–29.02.1992. Several strong seismic events took place in the Caucasus during this time period, namely the M = 6.9 (29.04.1991), M = 6.3 (15.06.1991) and M = 6.3 (23.10.1992) earthquakes. The aim was to answer the question whether the strain redistribution in the earth's crust related to earthquake preparation may lead to dynamical changes in the water level variation in deep boreholes. As shown in Figure 7, in most cases the regularity of the variation of the
water level in deep boreholes decreases essentially several months prior to a strong earthquake. At the same time, the extent of this decrease differs from borehole to borehole and obviously depends on the geological structure of the area.

3.3. DETECTION OF DYNAMICAL CHANGES IN EXCHANGE RATE DATA SETS
Economists have shown increased interest in the investigation of complex processes during the last decade (Bunde et al., 2002). In fact, though economists have postulated the existence of economic cycles, there are still serious problems in the practical detection and identification of the different types of behavior of economic systems (Hodgson, 1993; McCauley, 2004). Here we show the results of our analysis of the time series of the Georgian Lari–US Dollar exchange rate. The data sets were provided by the National Bank of Georgia for the time period 03.01.2000–31.12.2003. This time series of 7230 data points consists of GEL/USD exchange rate values calculated 5 times per day. It follows from our analysis that the dynamics of the GEL/USD exchange rate is characterized by a clear internal structure. Indeed, as seen in Figure 8a, the recurrence plot of the GEL/USD exchange rate time series reveals visible structure in the phase point distribution. The same time series after distortion of its dynamical structure (by data shuffling, or by the PR and GSRP procedures) does not reveal visible structure (see e.g. Figure 8b). We conclude that the revealed nonrandom dynamical structure of the analyzed GEL/USD exchange rate time series is an inherent characteristic and is not caused by linear effects.
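The recurrence plot construction and the RQA measures used throughout this chapter (recurrence rate RR and determinism DET) can be sketched as follows. This is a minimal illustrative implementation rather than the authors' code: the embedding parameters and the rule-of-thumb threshold ε are assumptions, and the main diagonal is excluded from the counts, as is common practice.

```python
import numpy as np

def recurrence_matrix(x, m=3, tau=1, eps=None):
    """R[i, j] = 1 where the embedded states i and j are closer than eps."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    y = np.column_stack([x[j:j + n] for j in range(0, m * tau, tau)])
    d = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    if eps is None:
        eps = 0.1 * d.max()              # rule-of-thumb threshold (assumption)
    r = (d < eps).astype(int)
    np.fill_diagonal(r, 0)               # drop the trivial line of identity
    return r

def rr_and_det(r, lmin=2):
    """Recurrence rate, and determinism (share of recurrence points lying
    on diagonal lines of length >= lmin)."""
    n = len(r)
    total = r.sum()
    on_lines = 0
    for k in range(-(n - 1), n):
        if k == 0:
            continue                     # main diagonal already excluded
        run = 0
        for v in np.append(np.diag(r, k), 0):  # trailing 0 closes the last run
            if v:
                run += 1
            else:
                if run >= lmin:
                    on_lines += run
                run = 0
    rr = total / (n * n - n)
    det = on_lines / total if total else 0.0
    return rr, det
```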
Figure 8. Recurrence plots of (a) the original and (b) the GSRP surrogate of the GEL/USD exchange rate time series.
Figure 9. RQA metrics of the Georgian Lari/US Dollar exchange rate time series: percentage of recurrence points in 7-day sliding windows of the original (solid line) and phase-randomized (thin line) time series. Grey peaks correspond to the percentage of determinism in the original time series.
Quantitatively, the GEL/USD exchange rate time series was analyzed using shifted sliding windows of 7-day span. Quantitative analysis of the recurrence plot structure shows that the dynamics of the GEL/USD exchange rate is a complex process with a variable share of deterministic components (Figure 9). The approximate period of this variation is one year (see Figure 9). The maximal extent of GEL/USD exchange rate regularity was detected at the beginning of 2001.

3.4. DETECTION OF DYNAMICAL CHANGES IN ARTERIAL PRESSURE DATA SETS
Detection and identification of the dynamical aspects of physiological processes remain among the main challenges of current clinical and experimental medicine. In this regard, significant attention has been paid to changes in heart dynamics under different cardiac conditions (Elbert et al., 1994; Pikkujamsa et al., 1999; Weiss et al., 1999; Bunde et al., 2002), and physiological time series of different origin have been investigated. In previous years it was suggested, on the basis of the correlation integral approach, that there is some evidence of nonlinear structure in normal heart dynamics (Lefebvre et al., 1993; Govindan et al., 1998). Moreover, it has been established for physiological data sets of different origin that a high degree of variability is often a feature of young healthy hearts, and that an increase in the variability of physiological characteristics is common in aging and disease (Zokhowski et al., 1997; Pikkujamsa et al., 1999). Owing to the demands of the GPA algorithm, these results were
obtained from long time series, e.g. 24-h Holter monitor recordings. At the same time, despite a number of important findings, from the practical point of view the nonlinear analysis of medical time series discussed above is problematic. This stems from the absence of stability in the obtained data sequences, i.e. the measured signal depends on the patient's emotional and physical condition at a given moment (Zokhowski et al., 1997; Zhang and Thakor, 1999). Because of the difficulty of obtaining long time series, the RQA method has become increasingly popular among physiological researchers. In the present study, time series of indexes of the myocardial contractile function have been investigated, including time series of the maximal velocity of circular contraction of the myocardial fibers, the time of intraventricular pressure increase, the mean velocity of circular contraction of the myocardial fibers, and the maximal rate of intraventricular pressure. These time series were derived from apexcardiographic records (Mason et al., 1970; Antani et al., 1979). A total of 120 adult males were studied: 30 healthy subjects, and 30 patients each with the first, second and severe third stage of arterial hypertension (according to the classification of 1996). These time series correspond to a composite set of concatenated subsequences containing 15 to 20 observables (calculated contractile indexes) for each of the 30 people in the same group. These subsequences form time series of 500 data points in total for the separate indexes. A similar multivariable reconstruction approach was successfully applied earlier for the calculation of the correlation dimension of different composite time series (Elbert et al., 1994; Rombouts et al., 1995; Yang et al., 1994). Besides systolic and diastolic pressure, the heart rate variability in healthy subjects and in patients with different stages of arterial hypertension was investigated. The analyzed time series consist of 24-h ambulatory blood pressure recordings from 160 patients aged between 30 and 70. The patients under study were not given medicines for 2–3 days preceding the examination. Blood pressure recording was carried out in a calm environment, in the sitting position, according to the standard method provided by hypertension guidelines. The 24-h monitoring of blood pressure was carried out from 11:00 a.m. to 11:00 a.m. of the next day, taking into consideration the physiological regime of the patients. The interval between measurements was 15 min. Systolic, diastolic and average pressure, as well as heart rate, were determined. The analyzed multivariable time series were compiled as consecutive sequences of the appropriate data sets of each patient from the considered groups. The integral time series contain about 1300 data points for each healthy and pathological group analyzed. As seen in Figure 10, the extent of regularity of the time series of myocardial indexes increases in pathology. A similar result was obtained for the systolic pressure time series (Figure 11, diamonds). An increase of regularity is also visible for the heart rate time series, although it is less essential. At the same time, the dynamics of the diastolic pressure remained practically unchanged.
Figure 10. RQA %DET of the time series of indexes of myocardial contractile function versus the state of health. Diamonds—maximal velocity of circular contraction of the myocardial fibers, squares—time of intraventricular pressure increase, triangles—mean velocity of circular contraction of the myocardial fibers, asterisks—maximal rate of intraventricular pressure.
It is important to mention that the time series used were multivariable, as they contain data from different patients of the same physiological group. Moreover, the myocardial index and arterial pressure data sets were taken from different groups. Taking all these facts into account, it is very significant that the dynamical changes accompanying an increasing extent of pathology are similar. In other words, in almost all considered cases the dynamics of the physiological processes became more regular in pathology than in healthy conditions. These results are in accord with our own and other authors' results (Elbert et al., 1994; Garfinkel et al., 1997; Pikkujamsa et al., 1999; Weiss et al., 1999; Matcharashvili and Janiashvili, 2001) and show that nonlinear time series analysis methods enable the detection of dynamical changes in complex physiological processes.
Figure 11. (a) RQA %DET and (b) Laminarity of blood pressure time series versus the state of health. Diamonds—systolic pressure, squares—diastolic pressure, triangles—heart rate.
4. Conclusions

Modern methods of qualitative and quantitative analysis of the complexity of natural processes are able to detect tiny dynamical changes that are imperceptible to classical data analysis methods. In this chapter the main principles of modern time series analysis methods, based on testing the reconstructed phase space structure, have been briefly described. It was shown, using data sets of different origins, that these methods, correctly applied, may indeed provide a unique opportunity for the qualitative detection and quantitative identification of the dynamical peculiarities of complex natural processes.

Acknowledgements

The authors gratefully acknowledge the director of the NATO ASI "Imaging for Detection and Identification", Dr. James S. Byrnes, for manuscript corrections and overall support. We also wish to thank Dr. Mikheil Tsiklauri of Tbilisi State University for technical assistance.

References

Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., and Tsimring, L.S. (1993) The Analysis of Observed Chaotic Data in Physical Systems, Rev. Mod. Phys. 65(4), 1331–1392.
Antani, J.A., Wayne, H.H., and Kuzman, W.J. (1979) Ejection Phase Indexes by Invasive and Noninvasive Methods: An Apexcardiographic, Echocardiographic and Ventriculographic Correlative Study, Am. J. Cardiol. 43(2), 239–247.
Arecchi, F.T. and Farini, A. (1996) Lexicon of Complexity, ABS, Sest. F., Firenze.
Argyris, J.H., Faust, G., and Haase, M. (1994) An Exploration of Chaos, North-Holland, Amsterdam.
Bak, P., Tang, C., and Wiesenfeld, K. (1988) Self-organized Criticality, Phys. Rev. A 38, 364–374.
Beeler, N.M. and Lockner, D.A. (2003) Why Earthquakes Correlate Weakly with the Solid Earth Tides: Effects of Periodic Stress on the Rate and Probability of Earthquake Occurrence, J. Geophys. Res. 108(B8), 1–17.
Bennett, C.H. (1990) in Complexity, Entropy and the Physics of Information, Zurek, W.H., ed., Addison-Wesley, Reading, MA.
Berge, P., Pomeau, Y., and Vidal, C. (1984) Order within Chaos, J. Wiley, NY.
Bhattacharya, J. (1999) Search of Regularity in Irregular and Complex Signals, PhD thesis, Indian Institute of Technology, India.
Boffetta, G., Cencini, M., Falcioni, M., and Vulpiani, A. (2002) Predictability: A Way to Characterize Complexity, Phys. Rep. 356, 367–474.
Bowman, D., Ouillon, G., Sammis, C., Sornette, A., and Sornette, D. (1998) An Observational Test of the Critical Earthquake Concept, J. Geophys. Res. 103, 24359–24372.
Bunde, A., Kropp, J., and Schellnhuber, H.J., eds. (2002) The Science of Disasters: Climate Disruptions, Heart Attacks, Market Crashes, Springer, Heidelberg.
Casdagli, M.C. (1997) Recurrence Plots Revisited, Physica D 108, 12–44.
Castro, R. and Sauer, T. (1997) Correlation Dimension of Attractors through Interspike Intervals, Phys. Rev. E 55(1), 287–290.
Chelidze, T. and Matcharashvili, T. (2003) Electromagnetic Control of Earthquake Dynamics? Computers and Geosciences 29(5), 587–593.
Chelidze, T., Matcharashvili, T., Gogiashvili, J., Lursmanashvili, O., and Devidze, M. (2005) Phase Synchronization of Slip in Laboratory Slider System, Nonlinear Processes in Geophysics 12, 1–8.
Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory, Wiley, New York.
Eckmann, J.P. and Ruelle, D. (1985) Ergodic Theory of Chaos and Strange Attractors, Rev. Mod. Phys. 57(3), 617–656.
Elbert, T., Ray, W.J., Kowalik, Z.J., Skinner, J.E., Graf, E.K., and Birbaumer, N. (1994) Chaos and Physiology, Physiol. Rev. 74, 1–49.
Garfinkel, A., Chen, P., Walter, D.O., Karagueuzian, H., Kogan, B., Evans, S.J., Karpoukhin, M., Hwang, C., Uchida, T., Gotoh, M., and Weiss, J.N. (1997) Quasiperiodicity and Chaos in Cardiac Fibrillation, J. Clin. Invest. 99(2), 305–314.
Gavrilenko, P., Melikadze, G., Chelidze, T., Gibert, D., and Kumsiashvili, G. (2000) Permanent Water Level Drop Associated with the Spitak Earthquake: Observations at Lisi Borehole and Modeling, Geophys. J. Int. 143, 83–98.
Geller, R.J. (1999) Earthquake Prediction: Is this Debate Necessary? Nature (online debate), Macmillan Publishers Ltd.; http://helix.nature.com.
Gilmore, R. (1993) A New Test for Chaos, J. Econ. Behav. Organization 22, 209–237.
Gilmore, R. (1998) Topological Analysis of Chaotic Dynamical Systems, Rev. Mod. Phys. 70, 1455–1529.
Goltz, C. (1998) Fractal and Chaotic Properties of Earthquakes, Springer, Berlin.
Govindan, R.B., Narayanan, K., and Gopinathan, M.S. (1998) On the Evidence of Deterministic Chaos in ECG: Surrogate and Predictability Analysis, Chaos 8(2), 495–502.
Grassberger, P. and Procaccia, I. (1983) Estimation of the Kolmogorov Entropy from a Chaotic Signal, Phys. Rev. A 28(4), 2591–2593.
Hegger, R., Kantz, H., and Schreiber, T. (1999) Practical Implementation of Nonlinear Time Series Methods: The TISEAN Package, Chaos 9, 413–440.
Hodgson, G.M. (1993) Economics and Evolution: Bringing Life Back into Economics, University of Michigan Press, Ann Arbor.
Iwanski, J. and Bradley, E. (1998) Recurrence Plots of Experimental Data: To Embed or not to Embed? Chaos 8, 861–871.
Johansen, A. and Sornette, D. (1999) Acoustic Radiation Controls Dynamic Friction: Evidence from a Spring-Block Experiment, Phys. Rev. Lett. 82, 5152–5155.
Jones, N. (2001) The Quake Machine, New Scientist 30(6), 34–37.
Kagan, Y.Y. (1994) Observational Evidence for Earthquakes as a Nonlinear Dynamic Process, Physica D 77, 160–192.
Kagan, Y.Y. (1997) Are Earthquakes Predictable? Geophys. J. Int. 131, 505–525.
Kanamori, H. and Brodsky, E.E. (2001) The Physics of Earthquakes, Physics Today 6, 34–40.
Kantz, H. and Schreiber, T. (1997) Nonlinear Time Series Analysis, Cambridge University Press, Cambridge.
Keilis-Borok, V.I. (1994) Symptoms of Instability in a System of Earthquake-Prone Faults, Physica D 77, 193–199.
Kennel, M.B., Brown, R., and Abarbanel, H.D.I. (1992) Determining Minimum Embedding Dimension using a Geometrical Construction, Phys. Rev. A 45, 3403–3411.
King, C.Y., Azuma, S., Igarashi, G., Ohno, M., Saito, H., and Wakita, H. (1999) Earthquake-Related Water-Level Changes at 16 Closely Clustered Wells in Tono, Central Japan, J. Geophys. Res. 104(6), 13073–13082.
Knopoff, L. (1999) Earthquake Prediction is Difficult But Not Impossible, Nature (online debate), Macmillan Publ. Ltd.; http://helix.nature.com.
Korvin, G. (1992) Fractal Models in the Earth Sciences, Elsevier, NY.
Kraskov, A., Stogbauer, H., and Grassberger, P. (2004) Estimating Mutual Information, Phys. Rev. E 69, 066138.
Kumpel, H. (1994) Evidence for Self-similarity in the Harmonic Development of Earth Tides, in Fractals and Dynamic Systems in Geoscience, Kruhl, J.H., ed., Springer, Berlin, pp. 213–220.
Lefebvre, J.H., Goodings, D.A., Kamath, M.V., and Fallen, E.L. (1993) Predictability of Normal Heart Rhythms and Deterministic Chaos, Chaos 3(2), 267–276.
Main, I. (1997) Earthquakes: Long Odds on Prediction, Nature 385, 19–20.
Marzocchi, W. (1996) Detecting Low-dimensional Chaos in Time Series of Finite Length Generated from Discrete Parameter Processes, Physica D 90, 31–39.
Marwan, N., Wessel, N., Meyerfeldt, U., Schirdewan, A., and Kurths, J. (2002) Recurrence-plot-based Measures of Complexity and their Application to Heart-rate-variability Data, Phys. Rev. E 66, 026702.
Marwan, N. (2003) Encounters with Neighbourhood, PhD thesis, University of Potsdam, Germany.
Mason, D.T., Spann, J.F., and Zelis, R. (1970) Quantification of the Contractile State of the Intact Human Heart, Am. J. Cardiol. 26(3), 248–257.
Matcharashvili, T., Chelidze, T., and Javakhishvili, Z. (2000) Nonlinear Analysis of Magnitude and Interevent Time Interval Sequences for Earthquakes of the Caucasian Region, Nonlinear Processes in Geophysics 7, 9–19.
Matcharashvili, T. and Janiashvili, M. (2001) Investigation of Variability of Indexes of Myocardial Contractility by Complexity Measure in Patients with Hypertension, in Sulis, W. and Trofimova, I., eds., Proceedings of the NATO ASI Nonlinear Dynamics in Life and Social Sciences, IOS Press, Amsterdam, pp. 204–214.
Matcharashvili, T., Chelidze, T., Javakhishvili, Z., and Ghlonti, E. (2002) Detecting Differences in Dynamics of Small Earthquakes Temporal Distribution before and after Large Events, Computers & Geosciences 28(5), 693–700.
McCauley, J.L. (2004) Dynamics of Markets: Econophysics and Finance, Cambridge University Press, Cambridge, UK.
Ott, E. (1993) Chaos in Dynamical Systems, Cambridge University Press.
Packard, N.H., Crutchfield, J.P., Farmer, J.D., and Shaw, R.S. (1980) Geometry from a Time Series, Phys. Rev. Lett. 45, 712–716.
Peinke, J., Matcharashvili, T., Chelidze, T., Gogiashvili, J., Nawroth, A., Lursmanashvili, O., and Javakhishvili, Z. (2006) Influence of Periodic Variations in Water Level on Regional Seismic Activity Around a Large Reservoir: Field and Laboratory Model, Physics of the Earth and Planetary Interiors 156(2), 130–142.
Pikkujamsa, S., Makikallio, T., Sourander, L., Raiha, I., Puukka, P., Skytta, J., Peng, C.K., Goldberger, A., and Huikuri, H. (1999) Cardiac Interbeat Interval Dynamics from Childhood to Senescence: Comparison of Conventional and New Measures Based on Fractals and Chaos Theory, Circulation 100, 383–393.
Pikovsky, A., Rosenblum, M., and Kurths, J. (2003) Synchronization: A Universal Concept in Nonlinear Sciences, Cambridge University Press.
Rapp, P.E., Albano, A.M., Schmah, T.I., and Farwell, L.A. (1993) Filtered Noise can Mimic Low-dimensional Chaotic Attractors, Phys. Rev. E 47(4), 2289–2297.
Rapp, P.E., Albano, A.M., Zimmerman, I.D., and Jimenez-Montano, M.A. (1994) Phase-randomized Surrogates Can Produce Spurious Identification of Non-random Structure, Phys. Lett. A 192(1), 27–33.
Rapp, P.E., Cellucci, C.J., Korslund, K.E., Watanabe, T.A., and Jimenez-Montano, M.A. (2001) Effective Normalization of Complexity Measurements for Epoch Length and Sampling Frequency, Phys. Rev. E 64, 016209.
Rombouts, S.A., Keunen, R.W., and Stam, C.J. (1995) Investigation of Nonlinear Structure in Multichannel EEG, Phys. Lett. A 202(5/6), 352–358.
Rosenstein, M.T., Collins, J.J., and De Luca, C.J. (1993) A Practical Method for Calculating Largest Lyapunov Exponents from Small Data Sets, Physica D 65, 117–134.
Ruelle, D. (1994) Where Can One Hope to Profitably Apply the Ideas of Chaos? Physics Today 47(7), 24–32.
Rundle, J., Turcotte, D., and Klein, W. (2000) GeoComplexity and the Physics of Earthquakes, AGU, Washington.
Sato, S., Sano, M., and Sawada, Y. (1987) Practical Methods of Measuring the Generalized Dimension and the Largest Lyapunov Exponent in High Dimensional Chaotic Systems, Prog. Theor. Phys. 77, 1–5.
Scholz, C.H. (1990) Earthquakes as Chaos, Nature 348, 197–198.
Schreiber, T. (1993) Extremely Simple Nonlinear Noise-reduction Method, Phys. Rev. E 47(4), 2401–2404.
Schreiber, T. (1999) Interdisciplinary Application of Nonlinear Time Series Methods, Phys. Rep. 308, 1–64.
Schreiber, T. and Schmitz, A. (1999) Testing for Nonlinearity in Unevenly Sampled Time Series, Phys. Rev. E 59, 4044.
Shannon, C.E. (1948) A Mathematical Theory of Communication, Bell System Technical Journal 27, 623–656.
Shannon, C.E. (1964) The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL.
Shiner, J.S., Davison, M., and Landsberg, P.T. (1999) Simple Measure for Complexity, Phys. Rev. E 59(2), 1459–1464.
Sibson, R. (1994) Crustal Stress, Faulting and Fluid Flow, in Geofluids: Origin, Migration and Evolution of Fluids in Sedimentary Basins, Parnell, J., ed., The Geological Society, London, pp. 69–84.
Simpson, D.W., Leith, W.S., and Scholz, C. (1988) Two Types of Reservoir-induced Seismicity, Bull. Seism. Soc. Am. 78, 2025–2040.
Sivakumar, B., Berndtsson, R., Olsson, J., and Jinno, K. (2002) Reply to "Which Chaos in the Rainfall-runoff Process?", Hydrol. Sci. J. 47(1), 149–158.
Smirnov, V.B. (1995) Fractal Properties of Seismicity of the Caucasus, J. of Earthq. Prediction Res. 4, 31–45.
Sprott, J.C. and Rowlands, G. (1995) Chaos Data Analyzer: The Professional Version, AIP, NY.
Takens, F. (1981) Detecting Strange Attractors in Fluid Turbulence, in Dynamical Systems and Turbulence, Rand, D. and Young, L.S., eds., Springer, Berlin, pp. 366–381.
Talwani, P. (1997) On the Nature of Reservoir-induced Seismicity, Pure and Appl. Geophys. 150, 473–492.
Tarasov, N.T. (1997) Crustal Seismicity Variation Under Electric Action, Transactions (Doklady) of the Russian Academy of Sciences 353(3), 445–448.
Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., and Farmer, J.D. (1992) Testing for Nonlinearity in Time Series: The Method of Surrogate Data, Physica D 58, 77–94.
Theiler, J. and Prichard, D. (1997) Using "Surrogate-surrogate Data" to Calibrate the Actual Rate of False Positives in Tests for Nonlinearity in Time Series, in Nonlinear Dynamics and Time Series, Cutler, C.D. and Kaplan, D.T., eds., Fields Institute Communications, pp. 99–113.
Turcotte, D. (1992) Fractals and Chaos in Geology and Geophysics, Cambridge University Press, Cambridge.
Vidale, J.E., Agnew, D.C., Johnston, M.J.S., and Oppenheimer, D.H. (1998) Absence of Earthquake Correlation with Earth Tides: An Indication of High Preseismic Fault Stress Rate, J. Geophys. Res. 103, 24567–24572.
Volykhin, A.M., Bragin, V.D., and Zubovich, A.P. (1993) Geodynamic Processes in Geophysical Fields, Nauka, Moscow.
Wackerbauer, R., Witt, A., Atmanspacher, H., Kurths, J., and Scheingraber, H. (1994) A Comparative Classification of Complexity Measures, Chaos, Solitons & Fractals 4, 133–137.
Weiss, J.N., Garfinkel, A., Karagueuzian, H., Zhilin, Q., and Chen, P. (1999) Chaos and the Transition to Ventricular Fibrillation: A New Approach to Antiarrhythmic Drug Evaluation, Circulation 99(21), 2819–2826.
Wolf, A., Swift, J., Swinney, H., and Vastano, J. (1985) Determining Lyapunov Exponents from a Time Series, Physica D 16, 285–317.
Wyss, M. (1997) Cannot Earthquakes Be Predicted? Science 278, 487–488.
Yang, P., Brasseur, G.P., Gille, J.C., and Madronich, S. (1994) Dimensionalities of Ozone Attractors and their Global Distribution, Physica D 76, 331–343.
Yao, W., Essex, C., Yu, P., and Davison, M. (2004) Measure of Predictability, Phys. Rev. E 69, 110–123.
Zbilut, J.P. and Webber, C.L., Jr. (1992) Embeddings and Delays as Derived from Quantification of Recurrence Plots, Phys. Lett. A 171, 199–203.
Zbilut, J.P., Giuliani, A., and Webber, C.L., Jr. (1998) Detecting Deterministic Signals in Exceptionally Noisy Environments Using Cross-recurrence Quantification, Phys. Lett. A 246, 122–128.
Zhang, X. and Thakor, N.V. (1999) Detecting Ventricular Tachycardia and Fibrillation by Complexity Measure, IEEE Trans. on Biomed. Eng. 46(5), 548–555.
Zokhowski, M., Winkowska-Nowak, K., and Nowak, A. (1997) Autocorrelation of R-R Distributions as a Measure of Heart Variability, Phys. Rev. E 56(3), 3725–3727.
TIME-RESOLVED LUMINESCENCE IMAGING AND APPLICATIONS
Ismail Mekkaoui Alaoui ([email protected])
Department of Physics, Faculty of Sciences Semlalia, Cadi Ayyad University, BP. 2390, Marrakech 40000, Morocco
Abstract. The registration of luminescence emission spectra at different decay times after the excitation pulse is called time-resolved (gated) luminescence spectroscopy. This technique makes it possible to distinguish between short and long decay components. In particular, a long decay emission (phosphorescence) can be separated from the background signal, which most often consists of short decay fluorescence and scattering, because measurements do not begin until a certain time has elapsed from the moment of excitation. Time-resolved spectroscopy finds applications in many areas, such as molecular phosphorescence, fingerprint detection, immunoassay, enzyme activity measurements, etc. In this presentation we focus on fingerprint detection by time-resolved luminescence imaging using Eu(III) and Tb(III). These two ions react with Ruhemann's Purple, RP (the reaction product of ninhydrin with the amino acid glycine), and are suitable for this purpose, since they have a relatively long luminescence decay of millisecond order. The instrumentation and fingerprint samples developed by this technique are presented and discussed.

Key words: luminescence, time-resolved, imaging, laser, fingerprints
1. Introduction

The detection of latent fingerprints by laser (Menzel, 1989) or, more generally, luminescence detection of fingerprints provides great sensitivity. Fingerprint treatments such as rhodamine 6G staining and ninhydrin/zinc chloride are now routine. What has kept laser fingerprint detection from becoming a truly universal technique is that many surfaces fluoresce very intensely under laser illumination, overwhelming the fingerprint luminescence. Such surfaces were until recently not amenable to the current routine procedures such as those mentioned above. To mitigate this deficiency, time-resolved luminescence imaging has been explored (Mitchell and Menzel, 1989).
Two general chemical strategies were explored. One involved the use of transition metal complexes that yield charge-transfer phosphorescence (long-lived luminescence) with microsecond-order lifetimes. These complexes would be used as staining dyes for smooth surfaces, much like rhodamine 6G, or be incorporated into dusting powders (Menzel, 1988). Alternatively, for porous surfaces such as paper, ninhydrin treatment followed by treatment with rare earth salts involving the lanthanides Eu3+ or Tb3+, an analog of the ninhydrin/zinc chloride treatment, was investigated (Mekkaoui Alaoui, 1992). The rare earth-RP complexes formed exhibit luminescence lifetimes of millisecond order (Mekkaoui Alaoui and Menzel, 1993). Some of the above transition metal complexes are amenable to excitation with blue-green light (the customary Ar-laser output); others respond to near-ultraviolet excitation (also obtainable with Ar-lasers). Such wavelength switching, although not difficult, is clumsy nonetheless. Worse, for time-resolved imaging of luminescence with microsecond-order lifetimes, chopping the laser requires devices such as electro-optic modulators. These are not only expensive, but demand delicate optical alignment and careful gain and bias adjustment. For luminescence of millisecond-order lifetimes the laser chopping is much easier, since a cheap and easily usable mechanical light chopper suffices.

2. Time-Resolved Luminescence Imaging

2.1. SENSITIZED LUMINESCENCE AND INTRAMOLECULAR ENERGY TRANSFER
Sensitized luminescence is the process whereby an element having no appreciable absorption band in the visible or ultraviolet (UV) spectrum is made to emit appreciable radiation upon excitation in this region, as a result of energy transfer from the absorbing ligand with which it is complexed. The luminescent rare earth ions (Eu3+ and Tb3+) are known for their narrow and weak absorption bands in the UV region, coupled with emission bands of narrow half-width in the visible region. The radiative transitions of these elements are weak because the absorption of the ion itself is very low. The emission can be enhanced, however, when Eu3+ or Tb3+ is bonded to appropriate organic ligands, via intramolecular energy transfer from the organic ligands (good absorbers) to the rare earth ions (good luminescers) under the right excitation (Weissman, 1942). Rare earth-RP complexes show such emission enhancement of the rare earth ions. So far, the only excitations that lead to energy transfer from the organic ligands (RPs) to Eu3+ or Tb3+ are in the near-UV range (200 to 400 nm) (Mekkaoui Alaoui, 1995). Eu-RP complexes are readily prepared in solution. If such solutions were effective for fingerprint staining, akin to rhodamine 6G, then one should be able to use
this chemical strategy for fingerprints on smooth and porous surfaces alike. This could reduce instrumentation cost and also make for ease of operation. We should note that some fingerprint reagents, such as 1,2-indanedione (Joullie and Petrovskaia, 1998), fluoresce under Ar-laser excitation and develop fingerprints without requiring time-resolved luminescence imaging (Mekkaoui Alaoui et al., 2005).

2.2. PRINCIPLE OF TIME-RESOLVED LUMINESCENCE
The basic principle of the technique is as follows. The beam of an Ar-laser is chopped by a mechanical light chopper or an electro-optic modulator, so that laser pulses with sharp cut-off illuminate the article under scrutiny. The article is viewed with an imaging device (CCD camera) synchronized to the laser pulses, such that it turns on shortly after laser cut-off and turns off shortly before the onset of the next laser pulse. The imaging device is operated in this way because the offending background fluorescences have short lifetimes (of nanosecond order), i.e. the background fluorescence decays very quickly after laser pulse cut-off. If fingerprint treatments can be developed that produce much longer luminescence lifetimes, then the imaging device will detect only this long-lived luminescence and will suppress the background. The principle of time-resolved luminescence imaging is sketched in Figure 1. Time-resolved spectroscopy also finds applications in many other areas (Diamandis, 1988). The system used in time-resolved luminescence imaging consists of a laser, a modulator, a microscope, a camera, a signal trigger discriminator and a microcomputer. An argon-ion laser modulated by a mechanical chopper excites the samples. The sample luminescence is detected after the background fluorescence has died away. The image is taken by the CCD (charge-coupled device) camera through a microscope. The camera is connected to a microcomputer; the image taken by the camera is sent to the monitor and can be manipulated, stored or printed.
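The effect of the gate can be illustrated with a toy calculation in which both emissions are modeled as single exponentials. All numbers below are illustrative assumptions (a strong nanosecond-lifetime background, a millisecond-lifetime Eu-RP signal, and a camera gate opening 0.2 ms after laser cut-off), not measured values.

```python
import numpy as np

# Toy model of gated detection: a short-lived background plus a long-lived
# Eu-RP signal.  All numbers are illustrative assumptions, not measured data.
tau_bg, tau_fp = 10e-9, 3e-3        # lifetimes in seconds
a_bg, a_fp = 1e3, 1.0               # background dominates at t = 0

def collected(a, tau, t_on, t_off):
    """Photons from an exponential decay a*exp(-t/tau) between t_on and t_off."""
    return a * tau * (np.exp(-t_on / tau) - np.exp(-t_off / tau))

t_on, t_off = 0.2e-3, 5.9e-3        # camera gate after laser cut-off
bg = collected(a_bg, tau_bg, t_on, t_off)  # ~0: suppressed by exp(-t_on/tau_bg)
fp = collected(a_fp, tau_fp, t_on, t_off)  # ~80% of the total Eu-RP emission
print(f"gated background: {bg:.3g}   gated fingerprint: {fp:.3g}")
```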
2.3. LASER MODULATIONS

CW argon-ion lasers are used to excite our samples. They deliver a continuous-wave signal. For time-resolved luminescence purposes we need some method of switching the excitation on and off (modulation) as desired, depending on the luminescence decay time and also on the background decay lifetime. Two ways of laser modulation were used, depending on the decay time range of the compounds: a mechanical light chopper for relatively long lifetimes (of millisecond order), and electro-optic modulation for relatively short decay lifetimes (of nanosecond or microsecond order).
Figure 1. Principle of time-resolved luminescence imaging.
For Eu-RP compounds (3 ms lifetime), a mechanical chopper is sufficient when operating at 169 Hz (about a 6 ms period).
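The quoted numbers are consistent: at the stated chopper frequency the full period is

$$T = \frac{1}{f} = \frac{1}{169\ \mathrm{Hz}} \approx 5.9\ \mathrm{ms} \approx 2\,\tau_{\mathrm{Eu\text{-}RP}},$$

so roughly half a period of darkness (about one 3 ms lifetime) is available for gated detection, long enough to let the nanosecond-scale background vanish while most of the Eu-RP emission persists. This back-of-envelope reading of the quoted values is ours; the exact duty cycle of the chopper is not specified in the text.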
3. Application in Fingerprint Development

3.1. FINGERPRINT STAINING
The stability of the RP-Eu3+ complex when stored at low temperature inspired us to use it for fingerprint staining. Fingerprints were deposited on surfaces that can be stained (aluminum cans, plastics, tapes, etc.), and Eu-RP complexes were used to stain them. The surfaces were chosen to show
Figure 2. Fingerprint on a coke can, developed by the time-resolved luminescence technique: ungated camera (left), gated camera (right)
high background fluorescence, and the time-resolved luminescence imaging system was used to suppress the background. The results are promising: a fingerprint from a soft drink can (a highly fluorescent surface) stained with a Eu-nitrate-RP solution has been developed, as shown in Figure 2. A high background luminescence signal (left image) overwhelms the fingerprint luminescence signal, whereas when the image is gated (right) the fingerprint signal shows up.

3.2. FINGERPRINT DEVELOPMENT
Fingerprints were deposited on porous surfaces (i.e., paper, cardboard) and were then treated with ninhydrin and its analogs. Some of the samples were left at ambient humidity and temperature, and some were incubated at about 50°C and 60% relative humidity; the samples left at ambient conditions were slower to develop. The samples were then sprayed with solutions of EuCl3·6H2O. Under UV light, the red luminescence from the fingerprint was generally comparable to that of the rest of the surface, with only slight enhancement. Using the time-resolved luminescence imaging system to picture these fingerprints was not always possible, because something quenches the energy transfer from the RP to Eu, or possibly enhances the emission of unreacted Eu (paper absorbs strongly in the near-UV and fluoresces as a result). We concluded that either the problem comes from the surface itself or the chloride salt is simply insufficient in many instances. We therefore began using different kinds of paper and different Eu anions. Some samples were developed in this way, but more work is still needed in this area.
Acknowledgements

Most of this work was done at the Center for Forensic Studies, Texas Tech University (TTU), with a grant from the Moroccan-American Commission for Cultural and Educational Exchanges and the Fulbright Program.
References

Diamandis, E. P. (1988) Immunoassay with Time-resolved Fluorescence Spectroscopy: Principle and Applications, Clinical Biochemistry 21, 139–150.
Joullie, M. M. and Petrovskaia, O. (1998) A Better Way to Develop Fingerprints, ChemTech 28(8), 41–44.
Mekkaoui Alaoui, I. (1992) Optical Spectroscopic Properties of Ruhemann's Purple Complexes, Time Resolved Luminescence Imaging and Applications, Ph.D. dissertation, Texas Tech University.
Mekkaoui Alaoui, I. (1995) Non-Participation of the Ligand First Triplet State in Intramolecular Energy Transfer in Europium and Terbium Ruhemann's Purple Complexes, Journal of Physical Chemistry 99, 13280–13282.
Mekkaoui Alaoui, I. and Menzel, E. R. (1993) Spectroscopy of Rare Earth Ruhemann's Purple Complexes, Journal of Forensic Sciences 38(3), 506–520.
Mekkaoui Alaoui, I., Menzel, E. R., Farag, M., Cheng, K. H., and Murdock, R. H. (2005) Mass Spectra and Time-resolved Fluorescence Spectroscopy of the Reaction Product of Glycine with 1,2-Indanedione in Methanol, Forensic Science International 152, 215–219.
Menzel, E. R. (1988) Laser Detection of Latent Fingerprints: Tris(2,2′-bipyridyl)ruthenium(II) Chloride Hexahydrate as a Staining Dye for Time-resolved Imaging, In E. R. Menzel (ed.), Fluorescence Detection II, SPIE Proceedings, Vol. 910, pp. 45–51.
Menzel, E. R. (1989) Detection of Latent Fingerprints by Laser-excited Luminescence, Analytical Chemistry 61, 557–561A.
Mitchell, K. E. and Menzel, E. R. (1989) Time Resolved Luminescence Imaging: Application to Latent Fingerprint Detection, In E. R. Menzel (ed.), Fluorescence Detection III, SPIE Proceedings, Vol. 1054, pp. 191–195.
Weissman, S. I. (1942) Intramolecular Energy Transfer, the Fluorescence of Complexes of Europium, Journal of Chemical Physics 10, 214–217.
SPECTRUM SLIDING ANALYSIS
Vladimir Ya. Krakovsky (vladimir [email protected])
National Aviation University, Kiev, Ukraine
Abstract. Spectrum sliding analysis (SSA) is a dynamic spectrum analysis in which each analysis interval differs from the previous one by including the next signal sample and excluding the first sample of the previous interval. Such a spectrum analysis is necessary for time-frequency localization of an analyzed signal with given peculiarities. Using the well-known fast Fourier transform (FFT) for this purpose is not efficient. Recursive algorithms, which use only one complex multiplication to compute one spectrum sample during each analysis interval, are more efficient. The author improved one such algorithm so that one complex multiplication suffices for computing two, four, and even eight (for complex signals) spectrum samples simultaneously. Problems of realization and application of spectrum sliding analysis are also considered in the paper.

Key words: spectrum sliding analysis, dynamic spectrum analysis, recursive algorithms, sliding spectrum algorithms and analyzers, instant spectrum algorithms and analyzers, algorithms for instantaneous spectrum digital analyzers, multichannel filter
1. Introduction

Depending on the form of signal presentation, spectrum sliding analysis (SSA) may be implemented by analog, digital, or discrete-analog spectrum analyzers. Narrow-band analog filters can be used to implement SSA at some points of the working frequency band. However, such analyzers are usually designed to analyze the power spectrum and are not capable of analyzing the phase spectrum or the Cartesian constituents of the complex spectrum, which restricts their application (Chajkovsky, 1977). The discrete-analog method of spectrum analysis uses the discrete signal formed by sampling the signal to be analyzed with a non-overlapping pulse sequence whose magnitudes follow the sample values. This type of analyzer permits spectrum analysis under compatible quality and quantity conditions, satisfying the information completeness condition (Chajkovsky, 1977), and it may be adapted for SSA (Chajkovsky and Krakovsky, 1979; Krakovsky, 1981).
Spectrum digital analyzers (Plotnikov et al., 1990) are the most popular. There are two SSA methods. In the first, the origin of the reference function is matched with the origin of the analysis interval; this approach computes the sliding spectrum (Chajkovsky, 1976). In many applications (e.g., speech recognition) there is no need for such matching; when this is the case, the computation is called the instant spectrum (Chajkovsky, 1976). Algorithms and devices implementing the two methods are considered in the following sections.
2. Sliding Spectrum Algorithms and Devices

The discrete sliding spectrum is given by the formula

$$F_q(p) = \frac{1}{N} \sum_{k=q-N+1}^{q} f(k)\, W_N^{-p(k-(q-N+1))}, \quad p \in \overline{0, P-1},\quad q = 0, 1, 2, \ldots, \tag{1}$$

where q is the analysis interval index, k is the signal sample index within the limits of the sliding window, k ∈ [q − N + 1, q], N is the size of the processing extract, F_q(p) is the complex spectrum sample at the frequency pΔω at the instant qΔt, f(k) is the analyzed signal sample value at the instant kΔt, and W_N^{−pk} is the symbolic representation of the complex reference coefficient exp(−j(2π/N)pk). Δt and Δω are the discrete time and frequency intervals, determined by the information completeness condition (Chajkovsky, 1977).

Direct computation of the functional (1) requires NP complex multiplications and (N − 1)P additions. The fast Fourier transform (FFT) (Cooley and Tukey, 1965) reduces this number by a factor of N/log₂N. Further reduction is possible. The formula

$$F_q(p) = \left[F_{q-1}(p) + \frac{1}{N}\bigl(f(q) - f(q-N)\bigr)\right] W_N^{p}, \quad p \in \overline{0, P-1},\quad q = 0, 1, 2, \ldots \tag{2}$$

leads to a recursive algorithm (Halberstein, 1966) computing the sliding spectrum samples. The computational requirements (number of complex multiplications) are considerably lower using (2) rather than (1): in (1), N complex multiplications are needed for each spectrum sample (log₂N if the FFT is used), whereas (2) requires only one complex multiplication for the same computation. This advantage has led to many sliding spectrum analyzers implementing (2).
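To make the recursion concrete, here is a minimal numerical sketch of algorithm (2) (test signal and sizes are assumed, not from the text), verified against the direct definition (1); note that each step spends exactly one complex multiplication per spectrum sample:

```python
# A minimal sketch of the sliding spectrum recursion (2), checked against a
# direct evaluation of definition (1); parameters and test signal are assumed.
import numpy as np

rng = np.random.default_rng(0)
N, P = 16, 16                       # processing extract size and harmonic count
x = rng.normal(size=128)            # analyzed signal f(k), taken as zero for k < 0
x_pad = np.concatenate([np.zeros(N), x])

p = np.arange(P)
F = np.zeros(P, dtype=complex)
W_p = np.exp(2j * np.pi * p / N)    # W_N^p of (2): one multiplication per sample
for q in range(len(x)):
    F = (F + (x_pad[N + q] - x_pad[q]) / N) * W_p

# direct computation of (1) for the final analysis interval
q = len(x) - 1
n = np.arange(N)                    # n = k - (q - N + 1)
window = x[q - N + 1 : q + 1]
F_direct = (window[None, :] * np.exp(-2j * np.pi * p[:, None] * n / N)).sum(1) / N
assert np.allclose(F, F_direct)
```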
Figure 1. Functional diagram of the spectrum analyzer implementing algorithm (2).
The functional diagram of the spectrum analyzer implementing algorithm (2) is shown in Fig. 1. The following symbols are used in the diagram: ADC is the analog-digital converter, DR is the delay register, SU is the subtraction unit, AD is the adder, WU is the weighting unit, MU is the multiplier, and RAM is the random-access memory; f(t) is the analog input signal; f_q are digital readouts of the signal f(t); f_{q−N} are the readouts f_q delayed by N steps; Δf_q are the readouts of the difference signal f_q − f_{q−N}; W_N^p are the readouts of the weighting function exp(j(2π/N)p); and F_q(p), F_{q−1}(p) are the sliding spectrum readouts in the current and the preceding steps, respectively. A detailed description of the characteristics of the analyzers implementing (2) (Chajkovsky et al., 1977, 1978; Dillard, 1977) is given in Krakovsky (1980).

Computational stability of algorithm (2) is achieved by appropriate rounding of the twiddle factor constituents, whose modulus does not exceed unity (Krakovsky and Chajkovsky, 1984). The word size of the constituents should be (1/2)log₂(number of sliding steps) bits greater than the word size of the signal samples (Krakovsky, 1983a). For an unlimited number of sliding steps it is possible to use the technical solution described in Perreault and Rich (1980). While using algorithm (2), the reference function is always matched with the analysis interval origin, thereby ensuring matched filtering (Chajkovsky and Krakovsky, 1983, 1985, 1986).

3. Instant Spectrum Algorithms and Devices

The instant spectrum

$$F_q(p) = \frac{1}{N} \sum_{k=q-N+1}^{q} f(k)\, W_N^{-pk}, \quad p \in \overline{0, P-1},\quad q = 0, 1, 2, \ldots \tag{3}$$

can be computed by the following simple recursive algorithm (Lejtes and Sobolev, 1969; Schmitt and Starkey, 1973):

$$F_q(p) = F_{q-1}(p) + \frac{1}{N}\bigl(f(q) - f(q-N)\bigr) W_N^{-pq}, \quad p \in \overline{0, P-1},\quad q = 0, 1, 2, \ldots \tag{4}$$

This algorithm is always stable, and the word size of the twiddle factors W_N^{−pq} is the same as the word size of the signal samples.
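A corresponding sketch of the instant spectrum recursion (4) (again with an assumed test signal) shows that the same one-multiplication update reproduces definition (3), the reference function now being tied to the absolute sample index rather than to the window origin:

```python
# A minimal sketch of the instant spectrum recursion (4), verified against a
# direct evaluation of definition (3); the test signal and sizes are assumed.
import numpy as np

rng = np.random.default_rng(0)
N, P = 16, 16
x = rng.normal(size=128)
x_pad = np.concatenate([np.zeros(N), x])

p = np.arange(P)
F = np.zeros(P, dtype=complex)
for q in range(len(x)):
    # the twiddle W_N^{-pq} depends on the absolute index q, unlike in (2)
    F = F + (x_pad[N + q] - x_pad[q]) / N * np.exp(-2j * np.pi * p * q / N)

q = len(x) - 1
k = np.arange(q - N + 1, q + 1)     # absolute indices of the current window
F_direct = (x[k][None, :] * np.exp(-2j * np.pi * p[:, None] * k / N)).sum(1) / N
assert np.allclose(F, F_direct)
```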
Figure 2. Functional diagram of the spectrum analyzer implementing algorithm (4).
Moreover, it is possible to achieve matching with an additional multiplier and a complex conjugate device (Chajkovsky, 1976). Devices that implement this algorithm in conveyer mode are described in Krakovsky and Koval (1982, 1984). The functional diagram of the spectrum analyzer implementing algorithm (4) is shown in Fig. 2, where ΔF_q(p) = (1/N)Δf_q W_N^{−pq} are the increments of the spectrum readouts. The functional diagram of the spectrum analyzer implementing algorithm (4) with matching is shown in Fig. 3, where CCU and AMU are the complex conjugate unit and the additional multiplier, respectively (Chajkovsky, 1976). This algorithm has a remarkable property which permits one to organize SSA so that one complex multiplication may be used for computing two, four, and even eight (for complex signals) spectrum harmonics at once (Krakovsky, 1990, 1997, 1998, 2000; Krakovskii, 1993). This may be done by presenting algorithm (4) as follows:

$$F_q(p) = F_{q-1}(p) + \Delta F_q(p), \quad p \in \overline{0, P-1},\quad q = 0, 1, 2, \ldots, \tag{5}$$

$$\Delta F_q(p) = \frac{1}{N}\bigl(f(q) - f(q-N)\bigr)\exp\left(-j\frac{2\pi}{N}pq\right). \tag{6}$$
The spectrum increments ΔF_q(p) may be used not only for the spectrum harmonic p, but also for the spectrum harmonics

$$p_i = i\,\frac{N}{4} + p, \quad i \in \overline{1, 3}, \tag{7}$$

and

$$p_k = k\,\frac{N}{4} - p, \quad k \in \overline{1, 4}, \tag{8}$$
Figure 3. Functional diagram of the spectrum analyzer implementing algorithm (4) with matching.
Figure 4. Generalized functional diagram of devices implementing algorithms (9) and (10).
using known properties of the complex exponential function. A simplified summary of algorithm (5) can be described as follows:

(a) for harmonics of the form (7),

$$F_q(p_i) = F_{q-1}(p_i) + (-j)^{iq}\,\Delta F_q(p), \quad q = 0, 1, 2, \ldots; \tag{9}$$

(b) for harmonics of the form (8),

$$F_q(p_k) = F_{q-1}(p_k) + (-j)^{kq}\,\Delta F_q(-p), \quad q = 0, 1, 2, \ldots, \tag{10}$$

where the ΔF_q(−p) are the complex conjugates of the spectrum increments ΔF_q(p) if the signal samples are real. If the signal samples are complex, the increments ΔF_q(−p) are generated by inverting the signs of the products of the signal increments Δf_q = (1/N)(f(q) − f(q−N)) with the imaginary part of the weighting function and then forming the appropriate algebraic sums.

Devices implementing algorithms (9) and (10) are presented in Krakovsky and Koval (1988a, 1988b, 1989) and Krakovsky et al. (1991a, 1991b). Their generalized functional diagram is shown in Fig. 4, where SW are switching components and m is the number of additional subranges, m ∈ {1, 3, 7}. In the simplest case of complex signal analysis (P = N), the harmonic range is partitioned into two subranges p ∈ [0, N/2 − 1], and by (7) with i = 2 we have p₂ = N/2 + p. By (9) the readouts for the harmonics p₂ are determined by

$$F_q(p_2) = F_{q-1}(p_2) + (-1)^q\,\Delta F_q(p). \tag{11}$$
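A minimal numerical sketch of this doubling trick (assumed complex test signal, P = N) updates harmonics p and p₂ = N/2 + p from the single increment of (6), flipping its sign on odd steps per (11), and checks both against definition (3):

```python
# A minimal sketch of (11): one complex multiplication per step yields the
# increment for harmonic p; harmonic N/2 + p is updated with the same
# increment, sign-flipped on odd q -- no extra multiplication is spent.
import numpy as np

rng = np.random.default_rng(0)
N, p = 16, 3
p2 = N // 2 + p
x = rng.normal(size=200) + 1j * rng.normal(size=200)   # complex test signal

F = np.zeros(2, dtype=complex)          # running readouts for p and p2
f_pad = np.concatenate([np.zeros(N, dtype=complex), x])
for q in range(len(x)):
    dfq = (f_pad[N + q] - f_pad[q]) / N
    inc = dfq * np.exp(-2j * np.pi * p * q / N)   # the single multiplication, eq. (6)
    F[0] += inc                                   # harmonic p, eq. (4)
    F[1] += inc if q % 2 == 0 else -inc           # harmonic N/2 + p, eq. (11)

# check against a direct evaluation of (3) over the last N samples
q = len(x) - 1
k = np.arange(q - N + 1, q + 1)
direct = lambda h: (x[k] * np.exp(-2j * np.pi * h * k / N)).sum() / N
assert np.allclose(F, [direct(p), direct(p2)])
```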
TABLE I. Multiplexer control signal

q mod 4   ΔF_q(p₁)                       ΔF_q(p₂)     ΔF_q(p₃)
0         ΔF_q(p)                        ΔF_q(p)      ΔF_q(p)
1         Im ΔF_q(p) − j Re ΔF_q(p)      −ΔF_q(p)     −Im ΔF_q(p) + j Re ΔF_q(p)
2         −ΔF_q(p)                       ΔF_q(p)      −ΔF_q(p)
3         −Im ΔF_q(p) + j Re ΔF_q(p)     −ΔF_q(p)     Im ΔF_q(p) − j Re ΔF_q(p)
That is, the spectrum increments ΔF_q(p) for harmonic p obtained at the MU output are fed through SW to the input of the adder for subrange p₂. In SW, as we see from (11), the sign of the increments ΔF_q(p) is either preserved or inverted, depending on the parity of the readout index q. To this end, it is sufficient to include in SW two operand sign control units (for the real and imaginary components of ΔF_q(p)), controlled by the binary signal from the output of the least significant bit of the q-counter (Krakovsky and Koval, 1988a). Operand sign control can be performed by exclusive-OR circuits. Simultaneous evaluation of two harmonics doubles the functional speed at a small additional hardware cost. Using (7) and dividing the harmonic range into four (for a complex signal) or two (for a real signal) subranges, we can similarly design appropriate analyzer structures (Krakovsky and Koval, 1988b, 1989). In this case SW, in addition to the operand sign control units, also contains multiplexers for the operations (−j)^{iq} ΔF_q(p). The activation sequence of these multiplexers with the corresponding operand sign control circuits can be determined from the data in Table I, which gives the dependences of the increments ΔF_q(p₁), ΔF_q(p₂), and ΔF_q(p₃) on ΔF_q(p) and q. We see from Table I that the multiplexer control signal can be taken from the output of the least significant bit of the q-counter, while the control signals for the operand sign control circuits can be provided either by the outputs of the two least significant bits of this counter or by their logical XOR. The SSA speed can be further doubled if, in addition to (7), we also use (8). In this case, SW must also include components that generate the spectrum increments ΔF_q(−p) of (10). Moreover, the second SW input should receive from the SU the signal readout increment Δf_q (dashed line in Fig. 4). This is needed for organizing simultaneous processing, in the corresponding subranges, of the operands of the harmonics (2i + 1)N/8 and kN/4, i = 1, 2, 3, k = 1, …, 4. To explain this technique, Tables II and III list the indices of simultaneously determined harmonics for real (Krakovsky et al., 1991a) and complex (Krakovsky et al., 1991b) signals.
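As a sanity check on the Table I routing rules (an illustrative sketch only, not the original hardware), multiplication by (−j)^{iq} can be realized purely with component swaps and sign inversions and compared against a true complex multiplication:

```python
# A minimal sketch of the Table I routing: multiplication by (-j)^{iq} is
# realized without a complex multiplier, using only component swaps and sign
# inversions (the operand sign control units and multiplexer of SW).
import numpy as np

def route(dF, i, q):
    """Return (-j)**(i*q) * dF using swap/negate operations only."""
    re, im = dF.real, dF.imag
    k = (i * q) % 4
    if k == 0:
        return complex(re, im)          # pass through
    if k == 1:
        return complex(im, -re)         # Im dF - j Re dF   (Table I, q mod 4 = 1)
    if k == 2:
        return complex(-re, -im)        # -dF
    return complex(-im, re)             # -Im dF + j Re dF

rng = np.random.default_rng(2)
dF = complex(rng.normal(), rng.normal())
for i in (1, 2, 3):
    for q in range(8):
        assert np.isclose(route(dF, i, q), (-1j) ** (i * q) * dF)
```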
TABLE II. Indices of simultaneously determined harmonics (real signal)

Subrange p:   1         2         ···   N/8 − 1    N/8
p₁:           N/4 − 1   N/4 − 2   ···   N/8 + 1    N/4
p₂:           N/4 + 1   N/4 + 2   ···   3N/8 − 1   3N/8
p₃:           N/2 − 1   N/2 − 2   ···   3N/8 + 1   0
TABLE III. Indices of simultaneously determined harmonics (complex signal)

Subrange p:   1          2          ···   N/8 − 1    N/8
p₁:           N/4 − 1    N/4 − 2    ···   N/8 + 1    N/4
p₂:           N/4 + 1    N/4 + 2    ···   3N/8 − 1   3N/8
p₃:           N/2 − 1    N/2 − 2    ···   3N/8 + 1   N/2
p₄:           N/2 + 1    N/2 + 2    ···   5N/8 − 1   5N/8
p₅:           3N/4 − 1   3N/4 − 2   ···   5N/8 + 1   3N/4
p₆:           3N/4 + 1   3N/4 + 2   ···   7N/8 − 1   7N/8
p₇:           N − 1      N − 2      ···   7N/8 + 1   0
With the determination of the harmonic p = N/8, the increment ΔF_q(N/8) is used to determine the harmonics (2i + 1)N/8, i = 1, 2, 3, and Δf_q is used at the same time to calculate the harmonics kN/4, k = 1, …, 4. Structural diagrams of analyzers based on algorithms (9) and (10) are given in the invention descriptions (Krakovsky and Koval, 1988a, 1988b, 1989; Krakovsky et al., 1991a, 1991b). These analyzers use modern hardware components and require some additional hardware cost to expand the frequency range by two, four, or eight times. The additional hardware cost grows much more slowly than the incremental benefit produced by group complex multiplication.
4. Multichannel Matched Filtering on the Basis of SSA

SSA may be used for FIR filtering (Rabiner and Gold, 1975) in a mode invariant to information shift (stationary mode). Such filters perform signal convolution
with a given impulse response:

$$s(\tau) = \int_{\tau-T}^{\tau} f(t)\, g^*(t-\tau)\, dt = \int_{0}^{T} f^*(t+\tau)\, g(t)\, dt. \tag{12}$$
In accordance with Parseval's theorem, (12) is equivalent to the integral

$$s(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F_\tau(\omega)\, G^*(\omega)\, d\omega, \tag{13}$$
where G(ω) is the unshifted impulse response spectrum and F_τ(ω) is the sliding spectrum of the signal in the analysis interval T. The discrete version of (13) is

$$r(q) = \sum_{p=0}^{P-1} F_q(p)\, G^*(p), \tag{14}$$
where F_q(p) and G*(p) are samples of the spectra F_τ(ω) and G*(ω), and P is the number of impulse response spectrum readouts. Expression (14) is equivalent to the traditional discrete correlation

$$r(q) = \sum_{k=0}^{N-1} f(k+q)\, g(k). \tag{15}$$
Expression (14) is the formal description of the FIR-filtering algorithm on the basis of SSA (Chajkovsky and Krakovsky, 1986). Practical realization of the filtration procedure with a set of D different impulse responses may be performed by the device whose functional diagram is shown in Fig. 5, where SSA is a harmonic sliding analyzer; MUi are multipliers,
Figure 5. Functional diagram of the multichannel filter with varying impulse responses.
i ∈ 1, …, D; ROM G*_i are ROMs holding the spectral readouts of the complex conjugated transfer functions for the corresponding filter channels; and AAi are accumulating adders of the pair products. In such a device, mutually overlapping series of input signal readouts are used, with the help of the SSA, to produce continuously in time the series of sliding spectrum readouts F_q(p), q = 0, 1, 2, …. These values are fed simultaneously to all the MUi. The second inputs of the MUi are connected to the outputs of the ROM G*_i, which, synchronously with each pth harmonic readout of the sliding spectrum, present the respective pth harmonic readout of the complex conjugated transfer function spectrum. The products from the MUi outputs are passed to the AAi inputs, and at the end of each sliding step the sums (14), the readouts r_i(q), are formed. As follows from Fig. 5, each filtering channel consists of one accumulating adder, one ROM G*_i, and one multiplier connected to the common output of the SSA. The channel structure is uniform throughout; the channels differ only in the contents of the transfer function spectral readout ROMs G*_i. Concrete implementations of devices for multichannel filtering on the basis of SSA are given in Chajkovsky and Krakovsky (1983) for real signals and Chajkovsky and Krakovsky (1985) for complex signals.

A criterion of effectiveness of the multichannel filter based on SSA is the number of complex multiplications used to realize one sliding step over the set of D filtering channels. In accordance with algorithm (14) and the structure of the complex multichannel filter, the number α_s of complex multiplications realizing one sliding step is the sum of the operations in the first processing stage (computing the sliding spectrum) and in the second stage (D weightings and summations of the spectral components): α_s = P + DP = P(1 + D). The corresponding computing expenditure for the standard correlation procedure of complex multichannel filtering (15) is α_c = DN. Consequently, the benefit in effectiveness of multichannel filtering on the basis of SSA can be estimated by the ratio of the computing expenditures:

$$B = \frac{\alpha_c}{\alpha_s} = \frac{DN}{P(1+D)}. \tag{16}$$
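The following sketch (hypothetical impulse responses and signal; P = N is assumed) mirrors the structure of Fig. 5: a single recursive SSA stage per (2) feeds D channels, each of which merely weights the spectrum readouts by its stored G_i*(p) and accumulates the sum (14):

```python
# A minimal sketch of multichannel filtering via SSA: the sliding spectrum
# F_q(p) is updated recursively by (2), then each channel weights it with its
# conjugated transfer-function readouts G_i*(p) and sums, as in (14).
import numpy as np

rng = np.random.default_rng(1)
N, D = 32, 3
g = rng.normal(size=(D, N))                 # D impulse responses (assumed real)
G = np.fft.fft(g, axis=1)                   # transfer-function readouts, P = N
x = rng.normal(size=300)

F = np.zeros(N, dtype=complex)              # sliding spectrum readouts
twiddle = np.exp(2j * np.pi * np.arange(N) / N)   # W_N^p of (2)
x_pad = np.concatenate([np.zeros(N), x])
outputs = np.zeros((D, len(x)))
for q in range(len(x)):
    F = (F + (x_pad[N + q] - x_pad[q]) / N) * twiddle   # one step of (2)
    outputs[:, q] = (G.conj() @ F).real                 # eq. (14), per channel

# cross-check channel 0 against direct correlation of the final window with g[0]
window = x_pad[len(x) : len(x) + N]         # samples f(q-N+1) ... f(q)
assert np.isclose(outputs[0, -1], np.dot(window, g[0]))
```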
The larger N and the smaller P in comparison with N, the more advantageous is filtering on the basis of SSA. It is possible to show that, in the case of matched
(optimal) filtering, the number P practically coincides with the base of the useful signal, Q = FT. The greatest benefit is realized in matched filtering when P = 1; in this case the filter degenerates into D independent digital resonators. If N ≫ D, then

$$B_{\max} = \frac{DN}{1+D} \approx N. \tag{17}$$
As the order of the realized filter or the base of the filtered signal increases (i.e., as P increases), the relative benefit decreases. In the limit, when P = N,

$$B_{\min} = \frac{D}{1+D} \approx 1, \tag{18}$$
and the effectiveness of the digital filter on the basis of SSA does not essentially differ from that of the correlation filter. It is important to note that the decrease in computing expenditure when using SSA leads to an almost proportional decrease in the hardware volume used. For example, when realizing D-channel filtering on the basis of the correlation convolution (15), it is necessary to use D multipliers and adders working in parallel, the speed of each being determined by the need to perform N multiplications and additions per sliding step. When realizing multichannel filters on the basis of SSA, the number of multipliers and adders needed to achieve the same productivity may be decreased approximately N/P times, because each channel performs only P multiplications and additions. In the extreme case of organizing D digital resonators (P = 1), it is sufficient to use solely the multiplier in the SSA, since the secondary processing in each channel completely degenerates and the SSA itself serves as the multichannel filter. When filtering simple and low-complexity signals under a high degree of a priori frequency uncertainty, when N ≫ P, the multichannel filter on the basis of SSA has advantages in hardware expenditure not only in comparison with correlation FIR filtering but also in comparison with the very efficient filtering on the basis of the FFT (Rabiner and Gold, 1975). FFT-based multichannel filtering devices contain a direct Fourier transform unit whose output is connected to D filtering channels, each consisting of a weighting unit and an inverse Fourier transform unit. Without special buffer memory units, these devices form the readouts of the filtered signal irregularly in time, which distorts the time scale and the regularity of the output information; achieving stationary mode with them requires additional hardware expenditure. Furthermore, the presence of an inverse Fourier transform unit in each channel increases the weight, volume, and cost of the hardware. The devices of Chajkovsky and Krakovsky (1983, 1985), which realize the discussed filtration algorithm, are free of these drawbacks.
5. Recommendations on Spectrum Sliding Analysis Realization

It is also possible to increase the response speed of instant spectrum digital analyzers by using table multiplication (Sobolev, 1991), provided there is enough memory for the appropriate tables. An A-D converter may be used with the aim of increasing accuracy; a converter (Chajkovsky et al., 1988) with a sample-hold circuit (Krakovsky, 1985) allows an increase in both accuracy and speed. The project Computer System for Precise Measurement of Analog Signals (Litovchenko, 2003) showed that it is possible to achieve an accuracy of 16 bits at sampling frequencies up to 2 MHz. The projects Real Signal Instant Spectrum Digital Analyzer in the Information Processing Systems (Sergienko, 2003) and Complex Signal Instant Spectrum Digital Analyzer in the Information Processing Systems (Smetaniuk, 2003) showed that, using Xilinx Virtex-II programmable logic (http://www.xilinx.com), it is possible to implement the inventions of Krakovsky et al. (1991a, 1991b) with sampling frequencies up to 3.9 MHz for a processing extract size N = 256 and an accuracy of 8 bits. Spectrum digital analyzers output only the Cartesian constituents of the complex spectrum samples. If only the modulus of these samples is required, the response speed can be increased by using approximate methods of modulus calculation based on comparison, shift, and summation (Krakovsky, 1983b, 1992).
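One classical shift-and-add modulus approximation of the kind alluded to here (the "max plus half min" rule; not necessarily the exact variant analyzed in the cited papers) can be sketched as follows:

```python
# A minimal sketch of a modulus approximation using only comparison, shift,
# and summation: |z| ~ max(|re|,|im|) + min(|re|,|im|)/2 (the /2 is a right
# shift in hardware). Illustrative only; accuracy is about 12% worst case.
import numpy as np

def modulus_approx(re, im):
    """Approximate sqrt(re**2 + im**2) without any multiplication."""
    a, b = abs(re), abs(im)
    return max(a, b) + min(a, b) / 2

z = np.exp(1j * np.linspace(0, np.pi / 2, 1000))   # unit-modulus test points
est = np.array([modulus_approx(w.real, w.imag) for w in z])
print(f"max relative error: {np.max(np.abs(est - 1)):.3f}")   # about 0.118
```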
Acknowledgments

The author would like to thank Myoung An, Richard Tolimieri, and Benjamin Shenefelt for their help in preparing the paper for publication. All the material presented in this paper has been reproduced from the text Spectrum Sliding Analysis, Vladimir Ya. Krakovsky, Psypher Press, 2006, with explicit permission from the author and the publisher.
References

Chajkovsky, V. I. (1976) Real Time Spectrum Digital Analyzers Functional Peculiarities, Kiev (preprint/Ukraine SSR NAS Institute of Cybernetics; 76-39) (in Russian).
Chajkovsky, V. I. (1977) Spectrum Experimental Analysis Metrology Peculiarities, Kiev (preprint/Ukraine SSR NAS Institute of Cybernetics; 77-6) (in Russian).
Chajkovsky, V. I., Koval, V. F., Krakovsky, V. Ya., and Pikulin, V. S. (1977) Fourier Spectrum Analyzer, USSR Author Certificate 560232, May 30.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1979) Sliding Spectrum Discrete-analog Analyzer, USSR Author Certificate 666488, June 5.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1983) Multichannel Filter, USSR Author Certificate 995282, February 7.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1985) Multichannel Filter, USSR Author Certificate 1193778 A, November 11.
Chajkovsky, V. I. and Krakovsky, V. Ya. (1986) Signal Discrete Filtration on the Basis of Spectrum Sliding Analysis, Kibernetika i vychislitelnaja tekhnika (71), 55–58.
Chajkovsky, V. I., Krakovsky, V. Ya., and Koval, V. F. (1978) Fourier Spectrum Digital Analyzer, USSR Author Certificate 614440, July 5.
Chajkovsky, V. I., Krakovsky, V. Ya., and Koval, V. F. (1988) A-D Converter, USSR Author Certificate 1401608 A2, May 7.
Cooley, J. W. and Tukey, J. W. (1965) An Algorithm for the Machine Calculation of Complex Fourier Series, Mathematics of Computation 19, 297–301.
Dillard, G. M. (1977) Method and Apparatus for Computing the Discrete Fourier Transform Recursively, USA Patent 4023028, May 10.
Halberstein, J. H. (1966) Recursive, Complex Fourier Analysis for Real-time Applications, Proceedings of the IEEE 54, 903.
Krakovskii, V. Ya. (1993) Digital Analyzers for Dynamic Spectral Analysis, Measurement Techniques (USA) 36(12), 1324–1330 (translation of Izmer. Tekh. (Russia) 36(12), 13–16 (1993)).
Krakovskii, V. Ya. (1997) Generalized Representation and Implementation of Speed-improvement Algorithms for Instantaneous Spectrum Digital Analyzers, Cybernetics and Systems Analysis 32(4), 592–597.
Krakovsky, V. Ya. (1980) Development and Investigation of Specialized Computing Devices for Electrical Signals Spectrum Sliding Analysis, Eng. Sci. Cand. Dissertation, NAS Institute of Cybernetics, Kiev (in Russian).
Krakovsky, V. Ya. (1981) Instant Spectrum Discrete-analog Analyzer, USSR Author Certificate 834576, May 30.
Krakovsky, V. Ya. (1983a) Selection of Sliding Spectrum Digital Analyzer Twiddle Factors Word Size, Izmeritelnaja tekhnika 10, 13–14 (in Russian).
Krakovsky, V. Ya. (1983b) Complex Quantities Modulus Approximate Determination, Metrology (4), 7–10 (in Russian).
Krakovsky, V. Ya. (1985) Sample-Hold Circuit, USSR Author Certificate 1185398 A, October 15.
Krakovsky, V. Ya. (1990) Algorithms for Increase of Speed of Response for Digital Analyzers of the Instant Spectrum, Kibernetika (5), 113–115 (in Russian).
Krakovsky, V. Ya. (1992) Complex Quantities Modulus Approximate Determination Errors, Metrology (4), 25–30 (in Russian).
Krakovsky, V. Ya. (1998) Spectrum Sliding Analysis Algorithms and Device, In Proceedings of the 1st International Conference on Digital Signal Processing and Its Application, June 30–July 3, Vol. I, Moscow, Russia, pp. 104–107.
Krakovsky, V. Ya. (2000) Harmonic Sliding Analysis Problem, In J. S. Byrnes (ed.), Twentieth Century Harmonic Analysis—A Celebration, NATO Science Series, II. Mathematics, Physics and Chemistry, Vol. 33, pp. 375–377.
Krakovsky, V. Ya. and Chajkovsky, V. I. (1984) Spectrum Sliding Analysis Peculiarities, Autometrija 6, 34–37 (in Russian).
Krakovsky, V. Ya. and Koval, V. F. (1982) Instant Spectrum Digital Analyzer, USSR Author Certificate 932419, May 30.
Krakovsky, V. Ya. and Koval, V. F. (1984) Instant Spectrum Digital Analyzer, USSR Author Certificate 1095093 A, May 30.
Krakovsky, V. Ya. and Koval, V. F. (1988a) Instant Spectrum Digital Analyzer, USSR Author Certificate 1377762 A2, February 29.
Krakovsky, V. Ya. and Koval, V. F. (1988b) Complex Signal Instant Spectrum Digital Analyzer, USSR Author Certificate 1406507 A2, June 30.
Krakovsky, V. Ya. and Koval, V. F. (1989) Instant Spectrum Digital Analyzer, USSR Author Certificate 1456904 A2, February 7.
Krakovsky, V. Ya., Koval, V. F., Kuznetsova, Z. A., and Eresko, V. V. (1991a) Real Signal Instant Spectrum Digital Analyzer, USSR Author Certificate 1563408 A2 (permission for publication N 267/22, 28.11.91).
Krakovsky, V. Ya., Koval, V. F., Kuznetsova, Z. A., and Eresko, V. V. (1991b) Complex Signal Instant Spectrum Digital Analyzer, USSR Author Certificate 1568732 A2 (permission for publication N 267/22, 28.11.91).
Lejtes, R. D. and Sobolev, V. N. (1969) Synthetic Telephone Systems Digital Modeling, Moscow, Svjaz (in Russian).
Litovchenko, L. S. (2003) Computer System for Precise Measurement of Analog Signals (Diploma Project), V. Ya. Krakovsky (project leader), Kyiv, National Aviation University (in Ukrainian).
Perreault, D. A. and Rich, T. C. (1980) Method and Apparatus for Suppression of Error Accumulation in Recursive Computation of a Discrete Fourier Transform, USA Patent 4225937, September 30.
Plotnikov, V. N., Belinsky, A. V., Sukhanov, V. A., and Zhigulevtsev, Yu. N. (1990) Spectrum Digital Analyzers, Moscow, Radio i svjaz (in Russian).
Rabiner, L. R. and Gold, B. (1975) Theory and Application of Digital Signal Processing, Englewood Cliffs, NJ, Prentice-Hall.
Schmitt, J. W. and Starkey, D. L. (1973) Continuously Updating Fourier Coefficients Every Sampling Interval, USA Patent 3778606, December 11.
Sergienko, P. V. (2003) Real Signal Instant Spectrum Digital Analyzer in the Information Processing Systems (Diploma Project), V. Ya. Krakovsky (project leader), Kyiv, National Aviation University (in Ukrainian).
Smetaniuk, S. V. (2003) Complex Signal Instant Spectrum Digital Analyzer in the Information Processing Systems (Diploma Project), V. Ya. Krakovsky (project leader), Kyiv, National Aviation University (in Ukrainian).
Sobolev, V. N. (1991) Dynamic Spectral Analysis Fast Algorithm, In All-Union ARSO-16 Seminar Reports Abstracts, Moscow State University, pp. 232–233 (in Russian).
Tolimieri, R. and An, M. (1998) Time-Frequency Representations, Basel, Birkhäuser.
INDEX
algorithms for instantaneous spectrum digital analyzers 249
automatic target recognition (ATR) 1, 13
batch posterior estimation 136
Bayesian estimation 130, 135, 137, 138
Bayesian inference 129
blind deconvolution 107, 109–111, 121, 124
bootstrap 142–144, 146, 147, 150
central limit theorem 135
complexity 10, 12, 15, 17, 27, 44, 66, 157, 158, 207–209, 212, 215, 223, 236, 258
cyclic autocorrelation 195, 198, 200, 203
cyclic cross correlation 195, 198
dynamic spectrum analysis 249
dynamics 207–224, 226, 227, 229–236
electrochemical sensors 63–65, 67
electronic tongue 63–65, 72, 74–76, 78–87, 89, 90
expectation 119, 132–135
filtering 5, 12, 39, 97, 129–132, 139–142, 146, 215, 251, 255–258
fingerprints 90, 243–247
finite support 134
Finite Zak Transform (FZT) 195, 198, 199
fractal boundaries 169–171
Gauss-Markov 97, 100, 101
Golay pair 203
ground penetrating radar 29, 34–36
image fusion 44, 107–109
imaging 1, 2, 6, 29, 30, 33, 36–39, 41–45, 75, 82–88, 107, 108, 243–247
importance sampling 129, 132–134, 136, 141–144, 146
Improvised Explosive Devices (IED) 49, 50
instant spectrum algorithms & analyzers 249–251
joint research 64
Kalman filter 97, 98, 103, 104, 129, 139, 141, 147
landmines 33, 49
laser 30, 243–245
Latex 49
likelihood ratio 14, 97–99
luminescence 243–247
Mach-Zehnder interferometer 151, 152, 155, 157
Markov 117, 120
Monte Carlo 129, 132, 215
multichannel filter 249, 257, 258
multichannel systems 107
multifractal analysis 169, 171, 172, 175, 178, 180, 181, 190
multi-perspective classification 1, 16, 18
NATO 49
natural processes 207, 208, 219, 231, 236, 238
Neyman-Pearson 97, 98
noise canceling 97, 99
noise model 97, 99
non-Gaussian 131, 144, 215
nonlinear statistical signal processing 129, 147
nonlinear time series analysis 208, 213, 218, 225
nonstationary noise 97
orthonormal basis 154, 196
own-ship noise 97, 101, 105
particle filter 129, 132, 139–142, 144, 145
passive bistatic radar 29, 30, 44, 45
pattern recognition 14, 63
permutation matrix 195, 196, 199, 203
permutation waveforms 199
photometry 49
point mass 140, 141
pointwise exponents 169, 171, 172, 174, 175
posterior distribution 130, 131, 134, 135, 137, 140, 141, 143, 146, 147
prediction distribution 132
probability distribution 77, 129, 130, 133, 138–141, 211
quantum computer 151, 157–161, 166
quantum information 151–154, 163, 165, 166
quantum mechanics 151, 153, 155, 156, 161, 163, 164, 166
radar imaging 1, 29, 30, 36
recursive algorithms 249–251
recursive detection 97, 98, 102
regularized energy minimization 107
remote sensing 29, 49, 50
sequential detection 97, 98
signal model 97, 98, 101, 103, 105
sliding spectrum algorithms & analyzers 249, 250
spectroscopy 49, 64, 65, 243, 245
spectrum sliding analysis 249, 259
star permutation 195, 196, 200–205
state-space model 129, 137, 138, 150
superresolution 32, 107, 109, 111, 117, 124
time reversal matrix 196
time-resolved 243–247
waveforms 3–5, 31, 34, 37, 41, 46, 195, 199
Zak space (ZS) 195, 196, 198–200
Zak transform 195
sequential detection 97, 98 signal model 97, 98, 101, 103, 105 sliding spectrum algorithms & analyzers 249, 250 spectroscopy 49, 64, 65, 243, 245 spectrum sliding analysis 249, 259 star permutation 195, 196, 200–205 state-space model 129, 137, 138, 150 superresolution 32, 107, 109, 111, 117, 124 time reversal matrix 196 time-resolved 243–247 waveforms 3–5, 31, 34, 37, 41, 46, 195, 199 Zak space ZS) 195, 196, 198–200 Zak transform 195