Modeling, Estimation and Optimal Filtering in Signal Processing
Mohamed Najim
First published in France in 2006 by Hermes Science/Lavoisier entitled “Modélisation, estimation et filtrage optimal en traitement du signal” First published in Great Britain and the United States in 2008 by ISTE Ltd and John Wiley & Sons, Inc. Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address: ISTE Ltd 6 Fitzroy Square London W1T 5DX UK
John Wiley & Sons, Inc. 111 River Street Hoboken, NJ 07030 USA
www.iste.co.uk
www.wiley.com
© ISTE Ltd, 2008 © LAVOISIER, 2006 The rights of Mohamed Najim to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. Library of Congress Cataloging-in-Publication Data Najim, Mohamed. [Modélisation, estimation et filtrage optimal en traitement du signal. English] Modeling, Estimation and Optimal Filtering in Signal Processing / Mohamed Najim. p. cm. Includes bibliographical references and index. ISBN: 978-1-84821-022-6 1. Electric filters, Digital. 2. Signal processing--Digital techniques. I. Title. TK7872.F5N15 2008 621.382'2--dc22 2007046085 British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN: 978-1-84821-022-6 Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire.
Table of Contents
Preface

Chapter 1. Parametric Models
1.1. Introduction
1.2. Discrete linear models
1.2.1. The moving average (MA) model
1.2.2. The autoregressive (AR) model
1.3. Observations on stability, stationarity and invertibility
1.3.1. AR model case
1.3.2. ARMA model case
1.4. The AR model or the ARMA model?
1.5. Sinusoidal models
1.5.1. The relevance of the sinusoidal model
1.5.2. Sinusoidal models
1.6. State space representations
1.6.1. Definitions
1.6.2. State space representations based on differential equation representation
1.6.3. Resolution of the state equations
1.6.4. State equations for a discrete-time system
1.6.5. Some properties of systems described in the state space
1.6.5.1. Introduction
1.6.5.2. Observability
1.6.5.3. Controllability
1.6.5.4. Plurality of the state space representation of the system
1.6.6. Case 1: state space representation of AR processes
1.6.7. Case 2: state space representation of MA processes
1.6.8. Case 3: state space representation of ARMA processes
1.6.9. Case 4: state space representation of a noisy process
1.6.9.1. An AR process disturbed by a white noise
1.6.9.2. AR process disturbed by colored noise itself modeled by another AR process
1.6.9.3. AR process disturbed by colored noise itself modeled by a MA process
1.7. Conclusion
1.8. References

Chapter 2. Least Squares Estimation of Parameters of Linear Models
2.1. Introduction
2.2. Least squares estimation of AR parameters
2.2.1. Determination or estimation of parameters?
2.2.2. Recursive estimation of parameters
2.2.3. Implementation of the least squares algorithm
2.2.4. The least squares method with weighting factor
2.2.5. A recursive weighted least squares estimator
2.2.6. Observations on some variants of the least squares method
2.2.6.1. The autocorrelation method
2.2.6.2. Levinson's algorithm
2.2.6.3. The Durbin-Levinson algorithm
2.2.6.4. Lattice filters
2.2.6.5. The covariance method
2.2.6.6. Relation between the covariance method and the least squares method
2.2.6.7. Effect of a white additive noise on the estimation of AR parameters
2.2.6.8. A method for alleviating the bias on the estimation of the AR parameters
2.2.7. Generalized least squares method
2.2.8. The extended least squares method
2.3. Selecting the order of the models
2.4. References

Chapter 3. Matched and Wiener Filters
3.1. Introduction
3.2. Matched filter
3.2.1. Introduction
3.2.2. Matched filter for the case of white noise
3.2.3. Matched filter for the case of colored noise
3.2.3.1. Formulation of problem
3.2.3.2. Physically unrealizable matched filter
3.2.3.3. A matched filter solution using whitening techniques
3.3. The Wiener filter
3.3.1. Introduction
3.3.2. Formulation of problem
3.3.3. The Wiener-Hopf equation
3.3.4. Error calculation in a continuous physically non-realizable Wiener filter
3.3.5. Physically realizable continuous Wiener filter. Rational spectra case
3.3.6. Discrete-time Wiener filter
3.3.6.1. Finite impulse response (FIR) Wiener filter
3.3.6.2. Infinite impulse response (IIR) Wiener filter
3.3.7. Application of non-causal discrete Wiener filter to speech enhancement
3.3.7.1. Modified filter expression
3.3.7.2. Experimental results
3.3.7.3. Enhancement using combination of AR model and non-causal Wiener filter
3.4. References

Chapter 4. Adaptive Filtering
4.1. Introduction
4.2. Recursive least squares algorithm
4.2.1. Exact RLS method
4.2.2. Forgetting factor RLS method
4.3. The least mean squares algorithm
4.4. Variants of the LMS algorithm
4.4.1. Normalized least mean squares (NLMS)
4.4.2. Affine projection algorithm (APA)
4.5. Summary of the properties of the different adaptive filters
4.6. Application: noise cancellation
4.7. References

Chapter 5. Kalman Filtering
5.1. Introduction
5.2. Derivation of the Kalman filter
5.2.1. Statement of problem
5.2.2. Propagation step: relationship between x̂(k+1/k) and x̂(k/k); recurrence relationship between the error covariance matrices P(k+1/k) and P(k/k)
5.2.3. Update step: relationship between x̂(k/k) and x̂(k/k–1); recursive relationship between P(k/k) and P(k/k–1)
5.2.4. Expression of the Kalman filter gain
5.2.5. Implementation of the filter
5.2.6. The notion of innovation
5.2.7. Derivation of the Kalman filter for correlated processes
5.2.8. Relationship between the Kalman filter and the least squares method with forgetting factor
5.3. Application of the Kalman filter to parameter estimation
5.3.1. Estimation of the parameters of an AR model
5.3.2. Application to speech analysis
5.4. Nonlinear estimation
5.4.1. Model linearization: linearized Kalman filter
5.4.2. The extended Kalman filter (EKF)
5.4.3. Applications of the EKF
5.4.3.1. Parameter estimation of a noisy speech signal
5.4.3.2. Application to tracking formant trajectories of speech signals
5.5. Conclusion
5.6. References

Chapter 6. Application of the Kalman Filter to Signal Enhancement
6.1. Introduction
6.2. Enhancement of a speech signal disturbed by a white noise
6.2.1. State space representation of the noisy speech signal
6.2.2. Speech enhancement procedure
6.2.3. State of the art dedicated to the single-channel enhancement methods using Kalman filtering
6.2.4. Alternative methods based on projection between subspaces
6.2.4.1. Introduction
6.2.4.2. Preliminary observations
6.2.4.3. Relation between subspace-based identification methods and the Kalman algorithm
6.2.4.4. Signal prediction using the optimal Kalman filter
6.2.4.5. Kalman filtering and/or smoothing combined with subspace identification methods
6.2.4.6. Simulation results
6.2.5. Innovation-based approaches
6.2.5.1. Introduction
6.2.5.2. Kalman-filter based enhancement without direct estimation of variances Q and R
6.2.5.3. Kalman-filter based enhancement using a suboptimal gain
6.2.5.4. Alternative approach to Kalman-filter based enhancement, using the estimation of variances Q and R
6.3. Kalman filter-based enhancement of a signal disturbed by a colored noise
6.4. Conclusion
6.5. References

Chapter 7. Estimation using the Instrumental Variable Technique
7.1. Introduction
7.2. Introduction to the instrumental variable technique
7.2.1. Principle
7.2.2. Review of existing instrumental variable methods for the estimation of AR parameters
7.3. Kalman filtering and the instrumental variable method
7.3.1. Signal estimation using noisy observations
7.3.2. Estimation of AR parameters using the filtered signal
7.3.3. Estimation of the variances of the driving process and the observation noise
7.3.4. Concluding observations
7.4. Case study
7.4.1. Preliminary observations
7.4.2. Comparative study. Case 1: white additive noise
7.5. Conclusion
7.6. References

Chapter 8. H∞ Estimation: an Alternative to Kalman Filtering?
8.1. Introduction
8.2. Introduction to H∞ estimation
8.2.1. Definition of the H∞ norm
8.2.2. H∞ filtering
8.2.3. Riccati equation-based recursive solution of H∞ filtering
8.2.4. Review of the use of H∞ filtering in signal processing
8.3. Estimation of AR parameters using H∞ filtering
8.3.1. H∞ filtering for the estimation of AR parameters
8.3.2. Dual H∞ estimation of the AR process and its parameters
8.4. Relevance of H∞ filtering to speech enhancement
8.5. Conclusion
8.6. References

Chapter 9. Introduction to Particle Filtering
9.1. Monte Carlo methods
9.2. Sequential importance sampling filter
9.3. Review of existing particle filtering techniques
9.4. References

Appendix A. Karhunen Loeve Transform
Appendix B. Subspace Decomposition for Spectral Analysis
Appendix C. Subspace Decomposition Applied to Speech Enhancement
Appendix D. From AR Parameters to Line Spectrum Pair
Appendix E. Influence of an Additive White Noise on the Estimation of AR Parameters
Appendix F. The Schur-Cohn Algorithm
Appendix G. The Gradient Method
Appendix H. An Alternative Way of Understanding Kalman Filtering
Appendix I. Calculation of the Kalman Gain using the Mehra Approach
Appendix J. Calculation of the Kalman Gain (the Carew and Belanger Method)
Appendix K. The Unscented Kalman Filter (UKF)
Index
Preface
“There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.” Karl Marx
The core of this book is taken from a series of lectures I gave in Shanghai in 1983 and 1985. On this occasion, the Chinese Association of Science and Technology (CAST) gave me a calligraphic version of the philosopher Karl Marx's famous quotation. I have always found the above quotation very pertinent and have often used it to perk up my students in times of discouragement. I would like to thank Professor Yong Xiang Lu, President of the Chinese Academy of Sciences, who encouraged me to use the quotation at the beginning of this book.
The performance of computers in general, and of signal processors in particular, has been constantly improving: devices keep getting smaller and faster, with ever higher storage capabilities. This improved performance allows today's engineer to implement algorithms that are ever more complex1.
Through this book, we wish to give the reader the significant results hitherto obtained in the field of parametric signal modeling, and to present some new approaches. To the best of our knowledge, these results are dispersed through various textbooks, and there is no single compendium grouping them all together. Moreover, a large part of these results are only presented in journal articles that are often inaccessible to the average student. This book attempts to fill this gap.
We will mostly consider signal estimation/identification, traditionally placed in the domain of control engineering. Today, however, parameter estimation/identification deserves a stature of its own in view of its recent maturity. One example of the importance now given to it is the triennial conference organized by the International Federation of Automatic Control (IFAC) and called the "Symposium on Identification and System Parameter Estimation". A recent trend at these conferences has been the increasing importance given to challenges lying in the field of signal processing. Over the past 15 years or so, identification in signal processing has undergone fervent activity, even a renaissance.
One of the fundamental differences between control engineering and signal/image processing comes from the nature of the input signal: in the former, the input is known, whereas in the latter it is always unknown. Several problems specific to signal processing have their roots in this difference. For example, the rapid development of digital communications has led to a renewed interest in the identification of Single Input Multiple Output (SIMO) and Multiple Input Multiple Output (MIMO) systems. Additionally, equalization and blind-deconvolution issues are also increasingly important. This rise in interest is best attested by the number and quality of related articles in the IEEE Transactions on Signal Processing and the ICASSP conferences.
This book is organized into 9 chapters. Chapter 1 starts with a brief introduction and review of the basic theory of discrete linear models, notably the AR and ARMA models. We then analyze the shortcomings of these models and present an alternative composed of sinusoidal models and ARCOS models to characterize periodic signals.
1 The term "complex" here includes both the computation cost and the abstraction level involved in the associated mathematical approaches.
Once we have chosen the model and its order, the estimation of its parameters has to be addressed. In Chapter 2, we present the least squares method and its variants in signal processing, namely the autocorrelation method and the covariance method for the AR model. For the non-recursive case, we then define the Normal (or Yule-Walker) equations. Thereafter, we take up the recursive forms of the least squares algorithm and consider lower-complexity algorithms such as the Levinson and Durbin-Levinson methods. This latter topic also serves as the framework in which we introduce the reflection coefficients and the lattice algorithms. However, as we will note in the appropriate place, the least squares method gives biased estimates for correlated measurements. To obtain unbiased estimates, we introduce the generalized least squares method and the extended least squares method. Finally, we analyze the effect of an additive white measurement noise on the least squares estimation of AR parameters. We also present a review of existing methods used to compensate for the influence of the measurement noise.
In Chapters 3 to 5, we take up parameter estimation using optimal filters and adaptive filters such as the LMS, RLS and APA. For the discrete-time case, we present the relation linking the Wiener filter to the least squares method. To put R. E. Kalman's contribution into perspective, we first present N. Wiener's original derivation [1] for continuous signals, leading to the Wiener-Hopf integral equation. The non-recursive nature of the Wiener filter led Kalman to propose an alternative recursive solution, the Kalman filter [2]. This alternative solution consisted of transforming the integral equation into a stochastic differential equation, for which he then found a recursive solution.
In Chapter 5, we derive the Kalman filter using an algebraic approach. Even though this algebraic approach may lack a certain elegance, it is based on fundamental notions of linear algebra. This presentation can form the starting point for the interested reader, leading him to the innovation-based presentation of the Kalman filter given by Kailath et al. in their book Linear Estimation [3]. We then present the Extended Kalman Filter (EKF) for nonlinear cases. This extended filter is useful when we have to carry out the joint estimation of the desired signal and the model parameters associated with it. The EKF is, however, not the only possible solution for nonlinear estimation cases. The purpose of the following chapters is to present other solutions, treating a case often seen in signal processing: when the covariance matrices of the driving process, Q, and of the noise, R, are not known a priori.
Thus, in Chapter 6, we restate the classic methods proposed by Carew-Belanger and R. K. Mehra in the domain of control in the early 1970s. The use of subspace approaches for identification frees us from the constraint of knowing the covariance matrices Q and R; this allows us to see the signal enhancement
problem as a realization issue. We then analyze the relevance of these approaches to enhance a signal. As the test signal, we choose the speech signal because it combines features such as quasi-periodicity in the case of vowels and randomness for consonants. In addition, approaches such as Linear Predictive Coding (LPC), initially derived for speech signals, found widespread use as a generic technique in many other areas. This is also the case for wavelets in the area of seismic signals.
Chapter 7 concerns parameter estimation techniques using instrumental variables. They are an alternative to the least squares methods and provide unbiased estimates. Instrumental variable techniques require the formulation of an intermediate matrix which is constructed, for example, using the system input in the case of control. However, information on the input is not available in speech processing, and we thus propose an alternative approach based on two interactive Kalman filters.
Moreover, to use the optimal Kalman filter, we have to rely on a number of assumptions which cannot always be respected in real cases. The filter known as the H∞ filter makes it possible to relax these assumptions; more specifically, this concerns the nature of the random processes and the necessity of knowing the covariance matrices a priori. Chapter 8 is thus dedicated to this filter. We first recall the work done and the results obtained so far, as concerns significant applications in signal processing, as well as some recent solutions. We then compare the H2- and H∞-based approaches in the field of signal enhancement. This comparison will, moreover, justify our decision to place the LMS algorithm, in this book, in the category of optimal filters. For this justification, we use the results obtained by the Stanford school, wherein it was shown that this filter is H∞-optimal.
To further ease the statistical assumptions, we present particle filtering as an alternative to Kalman filtering in Chapter 9.
The work presented in this book is the result of classes given in several universities and of the work carried out by our research group. I have been fortunate to share my wishes with my PhD students and fellow professors. I would like to mention the following people for their contributions in writing this book: Eric Grivel, who has been involved in this adventure since its outset; without his continuous availability, I would not have been able to complete this book; Marcel Gabrea, presently at the Ecole de Technologie Supérieure de Montréal in Canada; and David Labarre, to whom I am indebted for the material from his PhD dissertation, which he provided to write Chapters 8 and 9. I would also like to extend my gratitude to the following people for their constructive criticism and
suggestions during the writing of this book: Audrey Giremus, Pierre Baylou, Ali Zohlgadri, Nicolai Christov from the University of Lille and Ezio Todini from the University of Bologna.
This book is the first in a new series being launched by ISTE, under my direction, concerning signal processing. The first part of this book is suitable as a textbook for students in the first year of Masters programs. The other chapters are the result of recent work performed on the different aspects of model parameter estimation in realistic scenarios. Even though this second part is the core of this book, it is accessible to a wide readership, assuming that the fundamental theory is known.
Mohamed NAJIM
[1] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiley and Sons, New York, 1949.
[2] R. E. Kalman, "A new approach to linear filtering and prediction problems", ASME, Series D, Journal of Basic Engineering, vol. 82, pp. 35-45, March 1960.
[3] T. Kailath, A. Sayed and B. Hassibi, Linear Estimation, Prentice Hall, 2000.
Chapter 1
Parametric Models
1.1. Introduction
A "signal" corresponds to a physical quantity that varies with time, space, etc. A wide range of parameters are converted to electrical signals. In an industrial process, for example, sensors allow the translation of parameters such as temperature, pressure, and liquid and gas flow rates into electrical signals. The variation of the surrounding air pressure due to a person speaking into a microphone is translated into an electrical signal. Ground pressure variations can sometimes result from controlled events, as in the case of artificial seismology used for oil exploration. However, variations in ground pressure can also result from uncontrolled events such as earthquakes. In such a case, seismographs provide electrical signals to characterize the phenomenon. Signals, the closest approximations of physical magnitudes, can be deterministic processes, random processes, or combinations of periodic and random components; deterministic signals can be either periodic or non-periodic. Their spectral content can be studied using transformations such as the Fourier transform. Such representations are said to be "non-parametric", whether they are in the time or frequency domain. Their major advantage comes from the fact that they are easy to use. However, the development of digital processors – their spread on the one hand and, on the other, the influence of identification methods developed for the analysis of systems – leads electronics engineers to use "economical" representations of
signals, i.e., representations which use only a finite number of parameters, such as the ARMA (autoregressive moving average), AR (autoregressive) and MA (moving average) models. The AR model is highly popular. In speech processing, several coding schemes based on Code-Excited Linear Prediction (CELP) [15] have been used. These rely on a 10th- or 16th-order AR model of the speech signal (Figures 1.1 and 1.2). For speech enhancement, several approaches have been developed; these will be discussed in Chapter 6. In speech recognition, some of the signal's characteristics are extracted using parametric approaches. In mobile communications, the Rayleigh fading channel has a U-shaped spectrum (Figure 1.3). Even though it is bandlimited in frequency, it is sometimes approximated using a 1st- or 2nd-order autoregressive model, mainly because of the simplicity of this model. Baddour et al. have, moreover, analyzed the relevance of very high-order autoregressive models for channel simulations [6] [7]. AR-based parametric approaches also find widespread use in spectral analysis. For example, when analyzing the cardiorespiratory system – made up of fluctuations in the heart beat, respiratory movements and blood pressure – an AR-based spectral analysis of the electrocardiogram (ECG) allows the detection of changes in the frequency-domain properties of the ECG [4]. Thus, the operation of the cardiorespiratory system and its variations over time can be studied. Similarly, fetal breathing movements can be studied using the signals recorded by a series of electrodes placed in the mother's womb [2]. Finally, the characterization of the frequency response of electroencephalogram (EEG) signals using an AR model can be envisaged for the detection of diseases [23]. For example, this approach is used to determine the brain areas responsible for epileptic seizures [35].
Figure 1.1. Plot of an unvoiced speech signal and its power spectral density (PSD)
Figure 1.2. Plot of a voiced speech signal and its PSD
Figure 1.3. Power spectral density of a Rayleigh fading channel (normalized maximum Doppler frequency equal to 0.15)
However, the ARMA models are only well-adapted for signals with dominant random components. A new model, which also contains sinusoidal components, has to be proposed to accommodate the frequency-domain singularities of signals with strongly periodic segments. In this chapter, we will first consider the characterization of the MA, AR and ARMA models. Thereafter, we will introduce sinusoidal models. Finally, we will briefly review the state space representation of systems. We will also review the properties of discrete linear systems, such as observability and controllability. This representation will be used in Chapters 5 to 8.
1.2. Discrete linear models
The use of models such as ARMA can be traced back to the beginning of the 20th century. The ARMA model was introduced by Yule in the 1920s for the study of time series [49]. It is based around the central idea that a time series containing large amounts of iterative or correlated information can be obtained by linearly filtering a series of independent random variables. When these variables are assumed to be zero-mean Gaussian random variables with constant variances, as is mostly the case, we speak of a white Gaussian noise. The theoretical framework of these models can be attributed to the mathematician Wold at the end of the 1930s. Wold's decomposition theorem states
that any regular and stationary process can be expressed as the sum of two orthogonal processes, one deterministic and the other random [47]. Signals can also be modeled using alternative approaches, including those using exponential or other types of functions [12] [34]. Let us consider an analog signal y(t) represented by p+1 samples corresponding to the time instants kTs, (k – 1)Ts, …, (k – p)Ts, where Ts is the sampling period:
{y(k), y(k – 1), …, y(k – p)}
Supposing that this signal is generated by using a white process, denoted u and characterized by q+1 samples:
{u(k), u(k – 1), …, u(k – q)}
A discrete linear model of the signal can be defined as a linear combination of the samples {y(n)}, n = k, …, k–p, and {u(n)}, n = k, …, k–q, which can be expressed as follows:
$a_0 y(k) + \ldots + a_p y(k-p) = b_0 u(k) + \ldots + b_q u(k-q)$   [1.1]
This makes up the ARMA¹ model, which is said to be of order (p, q), where {a_i}, i = 0, …, p, and {b_i}, i = 0, …, q, are called the transversal parameters. Equation [1.1] is attractive because, instead of saving an infinite number of samples necessary for representing the signal, it only uses a finite number of characteristic parameters and allows for the reconstitution of the signal using this finite number. The conditions for this reconstitution will be discussed later. In the rest of this chapter, we will use the convention whereby a0 = 1. Equation [1.1] is thus changed to:
¹ A process y(k) is said to be autoregressive moving average with exogenous input (ARMAX) if it is defined as follows:
$a_0 y(k) + a_1 y(k-1) + \ldots + a_p y(k-p) = b_0 u(k) + b_1 u(k-1) + \ldots + b_q u(k-q) + c_0 v(k) + c_1 v(k-1) + \ldots + c_r v(k-r)$
where the driving process u(k) is zero-mean white, and v(k) is an exogenous input applied to the system. This model is often used in fields such as control engineering and econometrics.
$y(k) = -\sum_{i=1}^{p} a_i y(k-i) + \sum_{i=0}^{q} b_i u(k-i)$   [1.2]
The bilateral z-transform, Y(z), of the sequence y(k) is defined by:
$Y(z) = \sum_{k=-\infty}^{+\infty} y(k) z^{-k}$   [1.3]
where z is a complex variable included in the convergence domain of the series. Noting that the z-transform of y(k – l) satisfies:
$\sum_{k=-\infty}^{+\infty} y(k-l) z^{-k} = z^{-l} Y(z)$ for all integer values of l,
equation [1.2] of the ARMA model, after z-transformation of its two sides, becomes:
$Y(z) + a_1 z^{-1} Y(z) + \ldots + a_p z^{-p} Y(z) = b_0 U(z) + b_1 z^{-1} U(z) + \ldots + b_q z^{-q} U(z)$
The transfer function of the ARMA model is thus given as follows:
$H(z) = \frac{Y(z)}{U(z)} = \frac{b_0 + b_1 z^{-1} + \ldots + b_q z^{-q}}{1 + a_1 z^{-1} + \ldots + a_p z^{-p}} = \frac{B(z)}{A(z)}$   [1.4]
H(z) is known as the transfer function.
Figure 1.4. Transfer function of the ARMA model
The ARMA model can thus be understood as a filter with a transfer function H(z). This filter is fed with an input u(k) whose z-transform is denoted U(z), and it delivers an output signal y(k) whose z-transform is denoted Y(z). The polynomials A(z) and B(z) are characterized by the location of their zeros in the z-plane. The zeros of B(z) are the same as the zeros of H(z) while the zeros of A(z) are the poles of H(z) (Figure 1.5).
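As an illustration of this filtering interpretation (this sketch is not from the book; the coefficient values are arbitrary examples), an ARMA realization can be generated by filtering white Gaussian noise with B(z)/A(z) of equation [1.4]:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

# Illustrative ARMA(2, 1) coefficients: A(z) = 1 + a1 z^-1 + a2 z^-2, B(z) = b0 + b1 z^-1
a = [1.0, -0.9, 0.5]             # [1, a1, a2], with a0 = 1 as in the text
b = [1.0, 0.4]                   # [b0, b1]

u = rng.standard_normal(1000)    # zero-mean white Gaussian driving process u(k)
y = lfilter(b, a, u)             # y(k) satisfies equation [1.2]: u(k) filtered by B(z)/A(z)
```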
Figure 1.5. Example of an ARMA process: the location of the zeros and poles, and the power spectral density of the corresponding process
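The link between pole/zero locations and the shape of the power spectral density, as in Figure 1.5, can be checked numerically. A minimal sketch (illustrative coefficients, not taken from the book) evaluates |H(z)|² on the unit circle with scipy.signal.freqz and lists the roots:

```python
import numpy as np
from scipy.signal import freqz

# Illustrative ARMA coefficients (a0 = 1 by convention)
b = [1.0, -0.5, 0.25]                        # B(z): defines the zeros of H(z)
a = [1.0, -1.2, 0.81]                        # A(z): defines the poles of H(z)

w, H = freqz(b, a, worN=512)                 # H evaluated at z = exp(j*w)
f_norm = w / (2 * np.pi)                     # normalized frequency f/fs in [0, 0.5)
psd_shape_db = 20 * np.log10(np.abs(H))      # |H|^2 in dB gives the PSD shape for a white input

print("zeros:", np.roots(b))                 # zeros of H(z)
print("poles:", np.roots(a))                 # poles of H(z)
```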
At this stage, the following questions arise:
– if we consider different realizations of the signal, do the parameters {a_i}, i = 0, …, p, and {b_i}, i = 0, …, q, remain the same?
– what is the minimum number of parameters {a_i}, i = 0, …, p, and {b_i}, i = 0, …, q, needed to satisfactorily represent the signal?
– is it possible to obtain a model in which some of the parameters {a_i}, i = 0, …, p, or {b_i}, i = 0, …, q, are zero?
– how can the order (p, q) of the model be determined?
To better understand the influence of these parameters, we will first take up two specific cases of the ARMA model: the models known as MA and AR.
1.2.1. The moving average (MA) model
For the MA model, we will suppose that all the parameters {a_i}, i = 0, …, p, are zero, except a0 = 1. Model [1.1] is thus named the "moving average" model, and is expressed simply as follows:
$y(k) = b_0 u(k) + b_1 u(k-1) + \ldots + b_q u(k-q)$   [1.5]
Since A(z) = 1, this model is characterized by the location of its zeros in the z-plane, giving rise to the name "all-zero model" (Figure 1.6):
$H(z) = b_0 + b_1 z^{-1} + \ldots + b_q z^{-q}$   [1.6]
Figure 1.6. Transfer function of the MA model
The MA model can also be interpreted as the output of a Finite Impulse Response (FIR) filter excited by an input signal u(k). H(z) is the transfer function of this filter. Figures 1.7 and 1.8 show the shape of the power spectral density of a MA process characterized by the two following complex conjugate zeros within the unit circle in the z-plane:
$z_1 = R \exp(j\theta_0) = R \exp\!\left(j 2\pi \frac{f_0}{f_s}\right)$
and:
$z_2 = R \exp(-j\theta_0) = R \exp\!\left(-j 2\pi \frac{f_0}{f_s}\right)$   [1.7]
In the following, we will pay attention to the evolution of the spectrum shape given by a MA model as its zeros move closer to the unit circle in the z-plane.
Given the properties of u(k), the above equation entails a study of the squared modulus of H(z) with z = exp(j2πf/fs). First, we set the normalized angular frequency θ0 to π/3 and vary the magnitude of the zeros from 0.09 to 0.99 in steps of 0.075. The modification in the location of the zeros in the z-plane can thus be observed, as can the square of |H(z)|.
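As a complement (not part of the original text, and with an arbitrary sweep), this experiment can be reproduced numerically: for each zero modulus R, B(z) = (1 − z1 z⁻¹)(1 − z2 z⁻¹) is built from its zeros and |H(f)|² is evaluated on the unit circle.

```python
import numpy as np
from scipy.signal import freqz

theta0 = np.pi / 3                           # normalized angular frequency of the zeros
for R in np.arange(0.09, 1.0, 0.075):        # zero modulus swept towards the unit circle
    z1 = R * np.exp(1j * theta0)
    z2 = np.conj(z1)
    b = np.real(np.poly([z1, z2]))           # B(z) coefficients from its zeros (real: conjugate pair)
    w, H = freqz(b, [1.0], worN=1024)        # MA model: A(z) = 1
    mag_db = 20 * np.log10(np.abs(H) + 1e-12)
    print(round(R, 2), round(mag_db.min(), 1))   # the notch deepens as R approaches 1
```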
Figure 1.7. Representation of |H(z)|² with z = exp(j2πf/fs), for zeros of H(z) with the same argument and different magnitudes. MA process
If the modulus of the zeros is kept constant at 0.95 and if θ0 varies from π/6 to 5π/6 in steps of π/6, we can observe how the minima of |H(z)| change (Figure 1.8).
Figure 1.8. Representation of |H(z)|² with z = exp(j2πf/fs), for zeros of H(z) with the same magnitude and different arguments. MA process
1.2.2. The autoregressive (AR) model
When the transversal parameters {b_i}, i = 0, …, q, are all zero, except b0 = 1, the model is called the "autoregressive" model and is expressed simply as follows:
$y(k) = -\sum_{i=1}^{p} a_i y(k-i) + u(k)$   [1.8]
In equation [1.4], the polynomial B(z) reduces to a constant value of B(z) = 1, and H(z) contains only poles, giving rise to the name "all-pole model". The transfer function H(z) can be written as follows, to highlight the poles {p_i}, i = 1, …, p:
$H(z) = \frac{1}{\sum_{i=0}^{p} a_i z^{-i}} = \frac{1}{\prod_{i=1}^{p} \left(1 - p_i z^{-1}\right)}$   [1.9]
The AR model can be thought of in the following ways:
– the term $-\sum_{i=1}^{p} a_i y(k-i)$ in equation [1.8] can be seen as a prediction of y(k) based on the last p values of the process, and u(k) as an error term in such a prediction (see the sketch below);
– equation [1.8] can itself be seen as a difference equation and a filtering of signal u(k) by an Infinite Impulse Response (IIR) filter.
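A minimal illustration of the prediction-error reading of [1.8] (not from the book; the AR(2) coefficients are arbitrary): generate an AR process, then verify that the prediction residual recovers the white driving process.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)

a = np.array([1.0, -1.3, 0.64])     # A(z) = 1 + a1 z^-1 + a2 z^-2 (illustrative AR(2))
u = rng.standard_normal(2000)       # white driving process u(k)
y = lfilter([1.0], a, u)            # all-pole filtering, equation [1.9]

# Prediction error: u_hat(k) = y(k) + a1*y(k-1) + a2*y(k-2) should recover u(k)
u_hat = lfilter(a, [1.0], y)
print(np.allclose(u_hat, u))        # True: the residual is the driving process
```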
Figure 1.9. Transfer function of the AR model
The location of the poles in the z-plane completely defines the filter and, consequently, the model associated with this filter. If we suppose that the driving process u(k) and the signal y(k) are real, the poles {p_i}, i = 1, …, p, are either real or complex conjugates, since this is the only case in which the coefficients {a_i}, i = 1, …, p, of equation [1.8] are real. Let us then consider a second-order AR model in which u(k) is a real zero-mean Gaussian white process with a variance of 1. If the transfer function allows for two complex conjugate poles in the z-plane, we can write:
$p_1 = R \exp(j\theta_0) = R \exp\!\left(j 2\pi \frac{f_0}{f_s}\right)$
and:
$p_1^* = R \exp(-j\theta_0) = R \exp\!\left(-j 2\pi \frac{f_0}{f_s}\right)$   [1.10]
The Fourier transform Y(f) of y(k) will then be equal to the product of U(f) and H(z) at z = exp(j2πf/fs), where U(f) is the Fourier transform of u(k), i.e.:
$Y(f) = H(f)\, U(f) = \frac{U(f)}{\left[1 - p_1 \exp\!\left(-j 2\pi \frac{f}{f_s}\right)\right]\left[1 - p_1^* \exp\!\left(-j 2\pi \frac{f}{f_s}\right)\right]}$   [1.11]
Equivalently:
$Y(f) = \frac{U(f)}{\left[1 - R \exp\!\left(-j 2\pi \frac{f - f_0}{f_s}\right)\right]\left[1 - R \exp\!\left(-j 2\pi \frac{f + f_0}{f_s}\right)\right]}$   [1.12]
The power spectral density of y(k) depends on the values of R and the normalized angular frequency θ0 = 2π f0/fs, and could show a more or less sharp resonance. This power spectral density will have the same form as the square of the magnitude of the corresponding H(z) with z = exp(j2πf/fs).
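A short numerical check of this resonance behavior (illustrative, not from the book): build A(z) from the conjugate pole pair of [1.10] and evaluate |H(f)|² for several pole moduli R.

```python
import numpy as np
from scipy.signal import freqz

theta0 = np.pi / 3                            # pole angle, i.e. 2*pi*f0/fs
for R in (0.5, 0.8, 0.95, 0.99):              # pole modulus: the resonance sharpens as R -> 1
    p1 = R * np.exp(1j * theta0)
    a = np.real(np.poly([p1, np.conj(p1)]))   # A(z) = 1 - 2R*cos(theta0) z^-1 + R^2 z^-2
    w, H = freqz([1.0], a, worN=1024)         # all-pole model, equation [1.9]
    f_peak = w[np.argmax(np.abs(H))] / (2 * np.pi)
    print(R, round(f_peak, 3))                # peak near the normalized frequency theta0/(2*pi) = 1/6
```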
To illustrate this point, let us first fix the normalized angular frequency θ0 to π/3, and let us vary R from 0 to 0.99 in steps of 0.075. The evolution of the location of the different poles in the z-plane, and of |H(z)|², in such a case is shown in Figure 1.10.
We notice that two resonances appear, at normalized frequencies of ±θ0/2π = ±1/6. These resonances sharpen as R approaches the value 1.
Figure 1.10. Representation of |H(z)|² with z = exp(j2πf/fs), for poles of H(z) with the same argument and different magnitudes. AR process
If R is maintained constant at 0.95 and if θ0 varies from π/6 to 5π/6 in steps of π/6, we can see that the resonances appear successively at normalized frequencies which are multiples of ±1/6 (Figure 1.11).
Figure 1.11. Representation of |H(z)|² with z = exp(j2πf/fs), for poles of H(z) with the same magnitude and different arguments. AR process
Moreover, the variance of the driving white Gaussian process u(k) has an effect on the power spectral density of the AR process y(k): it undergoes an upward shift as the variance of u(k) goes from 1 to 100 (Figure 1.12).
Figure 1.12. Effect of the variance of the driving process on the PSD of the AR process. Representation of the mean spectral value and the PSD
1.3. Observations on stability, stationarity and invertibility
Stability and causality considerations are imposed on systems due to practical constraints. A system is said to be stable in the bounded-input bounded-output (BIBO) sense if there is a bounded output for every bounded input. A linear time-invariant system is thus said to be stable if and only if its impulse response h(k) satisfies the following condition:
$\sum_{k=-\infty}^{+\infty} |h(k)| < +\infty$   [1.13]
The transfer function is the z-transform of this impulse response; thus, for every z in the region of convergence:
$|H(z)| = \left| \sum_{k=-\infty}^{+\infty} h(k) z^{-k} \right| \le \sum_{k=-\infty}^{+\infty} \left| h(k) z^{-k} \right|$   [1.14]
However, on the unit circle in the z-plane:
$\sum_{k=-\infty}^{+\infty} \left| h(k) z^{-k} \right| = \sum_{k=-\infty}^{+\infty} |h(k)|$   [1.15]
thus giving rise to:
$\left| H(z) \right|_{z = \exp(j 2\pi f / f_s)} < +\infty$   [1.16]
Several "stability criteria" have been proposed to study and characterize stability, such as the criteria of Routh, Jury, Schur-Cohn, etc. Moreover, a system is said to be causal if its response never precedes its input. This corresponds to the philosophical idea of causality: cause precedes consequence. For a causal system, the necessary and sufficient condition for stability is that all the poles of the transfer function be inside the unit circle in the z-plane.
1.3.1. AR model case
The simplest case is that of a real first-order AR model defined as follows:
$y(k) + a\, y(k-1) = u(k) \quad \forall k$   [1.17]
where the input u(k) is a zero-mean real white sequence with variance σu². The autocorrelation functions of y(k) and u(k) are denoted, respectively, r_yy(τ) = E{y(n) y(n−τ)} and r_uu(τ) = E{u(n) u(n−τ)}, where E{.} denotes the mathematical expectation. It is clear that r_uu(τ) = σu² δ(τ). Moreover, it is also known that the autocorrelation r_yy(τ) is an even function because the process y(k) is real. Starting with the case where τ > 0, symmetry allows us to deduce the case where τ < 0. Considering equation [1.17], we obtain, on the one hand:
$r_{yy}(0) = E\{y(n)\, y(n)\} = E\{[-a\, y(n-1) + u(n)][-a\, y(n-1) + u(n)]\} = a^2 r_{yy}(0) + \sigma_u^2$   [1.18]
with:
$r_{yy}(0) = \frac{\sigma_u^2}{1 - a^2} \quad \text{if } |a| \neq 1$   [1.19]
On the other hand:
$r_{yy}(\tau) = E\{y(n)\, y(n-\tau)\} = E\{[-a\, y(n-1) + u(n)]\, y(n-\tau)\} = -a\, r_{yy}(\tau-1)$   [1.20]
Given relations [1.19] and [1.20], we can show by recurrence that for all τ > 0:
$r_{yy}(\tau) = \frac{\sigma_u^2}{1 - a^2} (-a)^{\tau}$   [1.21]
Therefore, for all values of τ:
$r_{yy}(\tau) = \frac{\sigma_u^2}{1 - a^2} (-a)^{|\tau|}$   [1.22]
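The closed form [1.22] can be checked by simulation; the sketch below (not from the book, with an arbitrary value of a) compares the empirical autocorrelation of a long AR(1) realization with σu²(−a)^|τ|/(1 − a²).

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)
a, sigma_u = 0.7, 1.0                               # illustrative AR(1) coefficient, |a| < 1

u = sigma_u * rng.standard_normal(200_000)          # white driving process
y = lfilter([1.0], [1.0, a], u)                     # y(k) + a*y(k-1) = u(k), equation [1.17]

for tau in range(5):
    r_emp = np.mean(y[tau:] * y[:len(y) - tau])     # empirical r_yy(tau)
    r_theo = sigma_u**2 / (1 - a**2) * (-a) ** tau  # closed form [1.21]/[1.22]
    print(tau, round(r_emp, 3), round(r_theo, 3))
```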
Let us express r_yy(0) as a function of r_uu(0). To do so, we transform the AR model into a MA model by expressing the samples of signal y(k) of equation [1.17] in a recursive manner. Thus:
$y(k) = -a[-a\, y(k-2) + u(k-1)] + u(k)$
and:
$y(k) = -a\{-a[-a\, y(k-3) + u(k-2)] + u(k-1)\} + u(k)$
Finally:
$y(k) = u(k) - a\, u(k-1) + a^2 u(k-2) - a^3 u(k-3) + a^4 u(k-4) - \ldots$   [1.23]
Using equation [1.23] above:
$r_{yy}(0) = \left[1 + a^2 + a^4 + \ldots\right] r_{uu}(0) = \left[1 + a^2 + a^4 + \ldots\right] \sigma_u^2$   [1.24]
and combining equations [1.24] and [1.19], we can deduce that:
$\frac{1}{1 - a^2} = \sum_{k=0}^{+\infty} a^{2k}$   [1.25]
This implies that |a| < 1. This is the criterion that assures the stability of the model. Using the z-transform of the model in equation [1.17], we obtain:
$\frac{Y(z)}{U(z)} = \frac{1}{1 + a z^{-1}}$   [1.26]
This leads to the same conclusion, i.e., |a| < 1. All the poles should thus be inside the unit circle in the z-plane.
1.3.2. ARMA model case
As before, the ARMA model can be replaced by an AR model by verifying the equality relation:
$\frac{B(z)}{A(z)} = \frac{1}{A'(z)}$   [1.27]
The inverse of the model, $A'(z) = \frac{A(z)}{B(z)}$, is obtained by polynomial division as follows:
$A'(z) = 1 + \alpha_1 z^{-1} + \ldots + \alpha_i z^{-i} + \ldots$
The following observations can be made:
– a stable ARMA model does not necessarily imply an invertible model. Stability in this case simply means that the zeros of A(z) are inside the unit circle in the z-plane. Invertibility requires the fulfilment of another condition: the zeros of B(z) should also be inside the unit circle (see the sketch below);
– the wide-sense stationarity of a random process signifies that the mean value of the process is constant over time. It also means that the autocorrelation function depends only on the relative time shift between the samples of the signal, and not on the original point from which this shift is considered.
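A minimal numerical check of these two conditions (illustrative coefficients, not from the book): both tests reduce to computing root moduli.

```python
import numpy as np

# Illustrative ARMA coefficients: A(z) = 1 + a1 z^-1 + ..., B(z) = b0 + b1 z^-1 + ...
a = [1.0, -1.5, 0.7]
b = [1.0, 0.3]

poles = np.roots(a)   # zeros of A(z) = poles of H(z): stability requires |poles| < 1
zeros = np.roots(b)   # zeros of B(z): invertibility requires |zeros| < 1

print("stable:", np.all(np.abs(poles) < 1))
print("invertible:", np.all(np.abs(zeros) < 1))
```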
1.4. The AR model or the ARMA model?
Even though most in the signal processing community "prefer" to deal with the AR model, it is not easy to provide a definite answer to this question. This is partly due to the difficulty in estimating the {b_i}, i = 0, …, q, parameters in the case of the ARMA models. To determine these parameters, the input u(k) would also have to be determined. However, u(k) is not necessarily easy to access or analyze in the case of signal processing. In the case of identification in control, the input is assumed to be a priori known.
The performances of the two models have also been compared in real-world situations and applications. For example, for EEG signals, Bohlin drew the following conclusions for the ARMA model [10] [11]:
– the orders (p, q) are not easy to determine;
– the rounding-off errors are higher in the ARMA model, and there is a real risk of obtaining unstable models.
The results for spectral analysis are, however, comparable to those of the AR model.
The insufficiency of the AR model has been evoked by several authors, notably for the case of speech analysis. These authors have proposed modified ARMA models and related techniques for the estimation of the parameters [36]. However, the high complexity of these modified models does not always do justice to the expected signal quality, both for transmission and synthesis. In the case of speech synthesis, the most notable rise in quality undoubtedly comes from the "multi-pulse" excitation [5]. The limitations of the Linear Predictive Coding (LPC) technique for speech synthesis arise from the LPC model being a simple description of the electrical signal and not of the vocal tract. The model's {a_i}, i = 0, …, p, parameters have no simple or direct relation to the physical parameters of the vocal tract. A little further on, we will consider modeling by formants (i.e., the resonances of the vocal tract) and their tracking using nonlinear identification techniques. This in no way detracts from the advantages of the LPC technique, which has been successfully applied in many domains, notably in the synthesis of several languages [38].
It was mentioned above that the basic pitfall of the AR model results from the difficulty encountered in modeling strong periodicities such as voiced sounds in speech, periods of sleep in EEGs, radar signals, etc. In these cases, good results have been obtained using models which contain periodic terms. One such class of models is the sinusoidal models, which will now be presented.
1.5. Sinusoidal models
1.5.1. The relevance of the sinusoidal model
Using the linear acoustic theory of speech production developed in the 1960s as a starting point, we can choose an autoregressive modeling of pth order for the speech signal:
$y(k) = -\sum_{i=1}^{p} a_i y(k-i) + u(k)$   [1.28]
This source filter model corresponds to a white input source u(k) processed by a filter with an impulse response h(k) and the following transfer function:
$H(z) = \sum_{k=-\infty}^{+\infty} h(k) z^{-k} = \frac{1}{1 + \sum_{i=1}^{p} a_i z^{-i}}$   [1.29]
In practical cases, as discussed before, the value of p lies between 10 and 16 for the modeling of the first formants of the speech signal. The notion of a short-term predictor is relevant: y(k) can be expressed as a function of a linear prediction based on the last samples of the speech signal. Under these conditions, however, the spectrum of voiced sounds such as vowels presents a harmonic character which is difficult to capture. One possible solution would be to increase the AR model order, but this runs counter to the aim here, which is to compress the speech. In this case, we analyze the speech signal frame by frame. These frames do not last longer than about 30 ms, signifying around 250 samples for a signal sampled at 8 kHz. To mitigate this problem, one solution could be an approach wherein the driving process is a Dirac "comb" with a period T0 = 1/f0. While processing speech, f0 corresponds to the fundamental frequency, which can vary in the range 80 Hz to 600 Hz. The input and the filtering contribute significantly to the spectrum of the signal under study as:
$y(k) = h(k) * u(k)$   [1.30]
We obtain:
$Y(f) = H(f)\, U(f)$   [1.31]
By taking $U(f) = \frac{1}{T_0} \sum_{n} \delta\!\left(f - \frac{n}{T_0}\right)$, the desired periodicity is introduced.
Figure 1.13. The source filter model: autoregressive approach
As the Dirac comb is difficult to implement, we can suppose that the input satisfies the following relationship:
$u(k) = \alpha\, u(k-q) + v(k)$   [1.32]
where α tends towards 1, v(k) is a white Gaussian sequence and q is such that qTs ≈ T0. This relation can be thought of as a long-term predictor since q >> p. The above considerations give rise to Figure 1.14. This modeling scheme is used for speech coding in mobile phones, with the code-excited linear prediction coders (CELP coders) [15]. In the field of speech enhancement, Z. Goh et al. have also put forward the idea of using a representation wherein the input is adjusted according to the frame being analyzed [21] [22]. An alternative approach consists of the direct use of sinusoidal models. Over the past 20 years or so, they have been used for analysis, synthesis, low rate coding and high quality prosodic transformations of speech signals. They also find applications in the processing of musical and radar signals as well as in communications for the modulation of the data to be transmitted.
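A toy illustration of this long-term predictor excitation (arbitrary values, not from the book): v(k) is white, and u(k) = α u(k−q) + v(k) produces a quasi-periodic driving process with a period close to q samples.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
fs, f0 = 8000, 100                       # sampling rate and fundamental frequency (Hz), arbitrary
q = int(round(fs / f0))                  # long-term lag: q*Ts ~= T0
alpha = 0.97                             # close to 1

v = rng.standard_normal(4000)
den = np.zeros(q + 1)
den[0], den[q] = 1.0, -alpha
u = lfilter([1.0], den, v)               # u(k) = alpha*u(k-q) + v(k), equation [1.32]

r = np.correlate(u, u, mode="full")[u.size - 1:]
print(q, int(np.argmax(r[1:])) + 1)      # dominant autocorrelation lag is close to q
```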
Figure 1.14. The source filter model: autoregressive approach
We now detail some existing sinusoidal models. More specifically, we will discuss the case where the signal is represented either as a sum of complex exponentials or as a sum of sinusoids.
1.5.2. Sinusoidal models
The simplest representation of a periodic signal consists of a signal y(k) which is a sum of L complex exponentials:
$y(k) = \sum_{i=1}^{L} a_i(k) \exp\!\left[j\left(2\pi k \frac{f_i}{f_s} + \varphi_i\right)\right]$   [1.33]
We can also consider a more general model with damping factors α_i > 0. This leads to a sum of L damped complex exponentials:
$y(k) = \sum_{i=1}^{L} a_i \exp\!\left(-\alpha_i \frac{k}{f_s}\right) \exp\!\left[j\left(2\pi k \frac{f_i}{f_s} + \varphi_i\right)\right]$   [1.34]
For a random signal with amplitude terms a_i having the form a_i = |a_i| e^{j\theta_i}, the model of equation [1.33] can be augmented by an additional zero-mean term b(k) which is generally white. This leads to a modeling whose spectrum has two components, one discrete and the other continuous:

y(k) = \sum_{i=1}^{L} a_i \exp\!\left( j \left( 2\pi k \frac{f_i}{f_s} + \varphi_i \right) \right) + b(k)    [1.35]
The model presented in equation [1.35] is the core of “high resolution” spectrum analysis techniques such as Pisarenko’s harmonic decomposition, MUSIC (Multiple Signal Classification) and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques). These techniques are based on the analysis of the autocorrelation matrix of the observed data, namely the separation of the dominant eigenvalues, which characterize the signal subspace, from the smallest ones, which correspond to the variance of the noise b(k). The signal’s frequency profile is then found by exploiting the orthogonality between the two subspaces. However, it is not always straightforward to isolate the dominant eigenvalues, especially when the signal has a rich spectral content [44]. For further details, see Appendices A and B.
The model in equation [1.35] is also used to retrieve speech signals from noisy observations [16] [17] [18] [19] [24] [25] [26] [48]. Appendix C presents more details on this topic. Since the estimation of the parameters of sinusoidal models is beyond the scope of this chapter, we refer the reader to the contributions of Christensen and of Li for further details [14] [31].
In the early 1980s, McAulay and Quatieri introduced a low-rate speech coder based on an analysis-synthesis scheme [33]. In it, the speech signal y(k) is modeled as a sum of L sinusoidal components:
y(k) = \sum_{i=1}^{L} a_i(k) \cos\big( \varphi_i(k) \big) = \sum_{i=1}^{L} a_i(k) \cos\!\left( 2\pi k \frac{f_i}{f_s} + \varphi_{i,0} \right)    [1.36]
Here, a_i(k), \varphi_i(k), \varphi_{i,0} and f_i denote, respectively, the instantaneous amplitude, the instantaneous phase, the original phase and the frequency of the ith component of the model. This equation forms the starting point of several other approaches to modeling signals with strong periodicities [20] [37]. The model in [1.36] can be enriched by adding a noise b(k) to the observation:

y(k) = \sum_{i=1}^{L} a_i(k) \cos\big( \varphi_i(k) \big) + b(k) = \sum_{i=1}^{L} a_i(k) \cos\!\left( 2\pi k \frac{f_i}{f_s} + \varphi_{i,0} \right) + b(k)    [1.37]
Within the domain of analysis/synthesis of musical signals, this model is known as the “signal + noise” model [40]. It has also been used by Stylianou [42] [43] for the analysis, synthesis and enhancement of speech signals. The amplitude a_i(k) is unknown and can hence be modeled by an autoregressive process of order p_i:

a_i(k) = -\sum_{l=1}^{p_i} \alpha_{l,i} a_i(k-l) + u_i(k)    [1.38]

where u_i(k) is a white zero-mean Gaussian driving process with variance \sigma_{u_i}^2. The inverse filter associated with each frequency band is defined as:

A_i(z) = \sum_{l=0}^{p_i} \alpha_{l,i} z^{-l} = \prod_{l=1}^{p_i} \left( 1 - \rho_{l,i} \exp(j 2\pi \mu_{l,i}) z^{-1} \right)    [1.39]

where \mu_{l,i} = f_i / f_s.
When the number of frequency bands equals 1, this model is known as autoregressive amplitude-modulated cosinusoid (ARCOS), and is used in radar applications [9].
The spectrum has a shape similar to that of an ARMA model of order (2p_1, p_1). The autocorrelation functions R_{yy}(\tau) and R_{aa}(\tau) corresponding, respectively, to the signal y(k) and to the amplitude a_1(k), satisfy the following relation:

R_{yy}(\tau) = \frac{1}{2} R_{aa}(\tau) \cos\!\left( 2\pi\tau \frac{f_1}{f_s} \right)    [1.40]
Figure 1.15. Plot of an ARCOS process, and its corresponding spectrum (in dB)
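As a rough illustration of what Figure 1.15 depicts, the following minimal sketch, assuming numpy and scipy, generates an ARCOS-type process by modulating a cosine with an AR(2) amplitude as in equation [1.38]; the carrier frequency and AR coefficients are illustrative choices, not the values used for the figure.

```python
import numpy as np
from scipy.signal import lfilter

fs, N = 1000.0, 500
f1 = 100.0                                # illustrative carrier frequency (Hz)
rng = np.random.default_rng(2)

# AR(2) amplitude a(k), cf. [1.38]; [1, alpha1, alpha2] is an illustrative stable denominator
den = np.array([1.0, -1.6, 0.95])
a = lfilter([1.0], den, rng.standard_normal(N))

k = np.arange(N)
y = a * np.cos(2 * np.pi * f1 * k / fs)   # ARCOS process

# Spectrum in dB: energy concentrated around +/- f1, shaped by the AR amplitude spectrum ([1.41])
Y = np.fft.rfft(y)
S_dB = 20 * np.log10(np.abs(Y) + 1e-12)
```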
Taking the Fourier transform of equation [1.40] gives:

S_{yy}(f) = \frac{1}{4} \left\{ S_{aa}(f - f_1) + S_{aa}(f + f_1) \right\}    [1.41]

However, considering equation [1.38], the PSD can be expressed using the prediction coefficients \{\alpha_{l,1}\}_{l=1,\dots,p_1} related to the inverse filter:

S_{aa}(f) = \left. \frac{1}{A_1(z)\, A_1^{*}(1/z^{*})} \right|_{z = \exp(j 2\pi f / f_s)}    [1.42]

Consequently, we obtain the power spectral density S_{aa}(f) in the rational form:

S_{aa}(f) = \lambda^2 \frac{B(f)}{A(f)}    [1.43]
After the presentation of the different models, we will introduce the state space representation, which will be used later in Chapters 5 to 8.

1.6. State space representations

1.6.1. Definitions
In traditional circuit and system analysis, differential equations are used to represent the system. This type of representation directly translates the behavior of the system under study and can also be converted into a representation using a transfer function, although the time-domain and frequency-domain descriptions are not necessarily perfectly equivalent. A stationary system can be described using a convolution equation, which is perfectly equivalent to a representation using a transfer function, because the transfer function is the Laplace transform of the convolution kernel. The simplicity of these representations has made them highly popular. Nevertheless, they are ill-adapted to transitory or non-stationary cases. Moreover, representing a system in a state space allows us to benefit from the many results on linear algebra currently available, and to identify the salient properties of the system more easily.
Firstly, let us define x(t) as a column vector of dimension p \times 1 containing the components \{x_i(t)\}_{i=1,\dots,p}:

x(t) = [x_1(t) \; \cdots \; x_p(t)]^T    [1.44]

x(t) is called the state vector; it is the smallest set of components x_1(t), …, x_p(t) that allows us to determine the behavior of the system at t > t_0, given an initial value x(t_0) and a known input u(t). Otherwise expressed, the state of a system is a collection of information sufficient to determine the evolution of the system if the inputs are known. A system is said to be described in a state space if it is governed by the following differential equation:

\dot{x}(t) = A(t) x(t) + B(t) u(t)    [1.45]

with:

\dot{x}(t) = \frac{d}{dt} x(t)    [1.46]

The state vector x(t) is not directly accessible. Generally, only measurements are available. They satisfy:

y(t) = H(t) x(t)    [1.47]
For the modeling to be realistic, it is necessary to add a noise term to equation [1.47].

1.6.2. State space representations based on differential equation representation
Let us consider a linear system governed by the following differential equation:

y^{(p)}(t) + a_{p-1} y^{(p-1)}(t) + \cdots + a_0 y(t) = b_0 u(t)    [1.48]

where:

y^{(p)}(t) = \frac{d^p y(t)}{dt^p}    [1.49]
Equations [1.48]-[1.49] allow us to choose several possible state space representations. For example, let us construct a state vector x(t) in which the first component is y(t) and the pth component is the (p-1)th derivative of y(t):

x_1(t) = y(t)
x_2(t) = \dot{y}(t) = \dot{x}_1(t)
\vdots
x_p(t) = y^{(p-1)}(t) = \dot{x}_{p-1}(t)    [1.50]

Thus:

x(t) = [x_1(t) \; \cdots \; x_p(t)]^T and u(t) = [u(t)]    [1.51]

We can then restate [1.48] in the form of [1.45]:

\dot{x}(t) = A(t) x(t) + B(t) u(t),

by defining A and B as follows:

A = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & \cdots & -a_{p-1} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ b_0 \end{bmatrix}    [1.52]

Considering [1.47], the observation matrix H assumes the following form:

H = [1 \; 0 \; \cdots \; 0]    [1.53]
State space representations have been the subject of much research effort. The following are considered classic references in the field [1] [3] [8] [13] [28] [30] [39] [41] [50].
Note: a system is said to be stationary if the matrices A, B and H are time-invariant.

1.6.3. Resolution of the state equations
For the case of a stationary system, equations [1.45] and [1.47] can be reformulated as follows:
\dot{x}(t) = A x(t) + B u(t)    [1.45]

y(t) = H x(t)    [1.47]

In the absence of input, x(t) satisfies the following homogeneous differential equation:

\dot{x}(t) = A x(t)    [1.54]

The solution of equation [1.54] is thus given by:

x(t) = \exp[A(t - t_0)] \, x(0), where x(t_0) = x(0)

The general solution for a system with input u(t), at t > t_0, is given by:

x(t) = \exp[A(t - t_0)] \, x(0) + \int_{t_0}^{t} \exp[A(t - \tau)] B u(\tau) \, d\tau    [1.55]

The term \exp[A(t - t_0)] is called the “transition matrix” and is also denoted \Phi(t - t_0).

1.6.4. State equations for a discrete-time system
Starting from the following equations for continuous systems:

\dot{x}(t) = A(t) x(t) + B(t) u(t)    [1.45]

y(t) = H(t) x(t)    [1.47]

for discrete-time systems, we can:
– either formally write the state equations as follows:

x(k+1) = \Phi(k) x(k) + G(k) u(k)    [1.56]

y(k) = H(k) x(k) + v(k)    [1.57]

– or discretize the continuous equations.
To do the latter, the following approximate relation is used to calculate the derivative of the state vector:

\dot{x}(kT_s) \approx \frac{x[(k+1)T_s] - x(kT_s)}{T_s}    [1.58]

where T_s is the sampling period. It can also be written more simply as:

\dot{x}(k) = \frac{x(k+1) - x(k)}{T_s}    [1.59]

Equation [1.45] is changed to:

\frac{x(k+1) - x(k)}{T_s} = A x(k) + B u(k)    [1.60]

Rearranging the terms of the above equation, we obtain the following form:

x(k+1) = (I + A T_s) x(k) + B T_s u(k)    [1.61]

where I is the identity matrix. Comparing equation [1.61] with equation [1.56] gives:

\Phi(k) = I + A T_s \quad \text{and} \quad G(k) = B T_s    [1.62]

This result is not unexpected: it was shown in section 1.6.3 that the transition matrix took the following form:

\Phi(t - t_0) = \exp[A(t - t_0)]    [1.63]

Substituting t = (k+1)T_s and t_0 = k T_s, this becomes:

\Phi(T_s) = \exp(A T_s) = \Phi(k+1, k) = \Phi(k)    [1.64]

However, if we consider the series expansion of the exponential function:

\exp(A T_s) = I + A T_s + \frac{(A T_s)^2}{2} + \cdots    [1.65]
and truncate this series at the first-order term, we obtain equation [1.62].

1.6.5. Some properties of systems described in the state space
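The quality of the first-order truncation [1.62] is easy to check numerically. The following minimal sketch, assuming numpy and scipy are available, compares it with the exact transition matrix exp(A T_s) of [1.64] on an illustrative second-order system; the matrices and sampling period are arbitrary examples, not taken from the text.

```python
import numpy as np
from scipy.linalg import expm

# Continuous-time model x'(t) = A x(t) + B u(t); illustrative 2x2 example
A = np.array([[0.0, 1.0],
              [-4.0, -0.5]])
B = np.array([[0.0], [1.0]])
Ts = 0.01                                  # sampling period

Phi_exact = expm(A * Ts)                   # transition matrix exp(A*Ts), eq. [1.64]
Phi_approx = np.eye(2) + A * Ts            # first-order truncation, eq. [1.62]
G_approx = B * Ts

# Small when Ts is short compared with the system dynamics
print(np.max(np.abs(Phi_exact - Phi_approx)))
```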
1.6.5.1. Introduction

Various studies have been conducted on the characterization of systems represented in the state space. Two of the resulting notions are observability and controllability. Observability is important for the modeling of signals, while controllability characterizes systems we wish to control.

1.6.5.2. Observability

Observability stands for the possibility of finding out the state of a system by studying its outputs. It can be expressed in a number of ways. The simplest definition is: a system is said to be observable if a measurement of its output allows the determination of its initial state. In the discussion that follows, we establish the conditions of observability of such systems. Taking a noise-free observation y(k), in the discrete stationary case:

y(k) = H x(k)    [1.66]
The variation of the state vector is given by:

x(k+1) = \Phi x(k)    [1.67]

For all k \in [0, p-1], y(k) can be expressed as a function of the transition matrix \Phi, of the observation matrix H and of the initial state x(0) of the state vector:

y(0) = H x(0)
y(1) = H x(1) = H \Phi x(0)
y(2) = H x(2) = H \Phi^2 x(0)
\vdots
y(p-1) = H x(p-1) = H \Phi^{p-1} x(0)    [1.68]

In matricial form, the above equation is written as follows:

Y = \begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(p-1) \end{bmatrix} = \begin{bmatrix} H \\ H\Phi \\ \vdots \\ H\Phi^{p-1} \end{bmatrix} x(0) = \Psi x(0)    [1.69]

with the observability matrix given by:

\Psi = \left[ H^T \;\; (H\Phi)^T \;\; \cdots \;\; (H\Phi^{p-1})^T \right]^T    [1.70]
In order to determine the initial state x(0) from the observation vector Y, the observability matrix \Psi must be non-singular. The condition of observability can therefore be stated as follows: for a system to be observable, the observability matrix must be of full rank. This notion, first proposed by Kalman, is closely linked to the inversion and deconvolution of systems [29].

1.6.5.3. Controllability

The controllability of a system denotes the possibility of driving it, by imposing a particular input, from an initial state x(0) to a desired final state x(k) within a finite duration of time. Such a property is useful in control engineering, for control procedures, guiding aircraft or ballistic devices, etc. Controllability might appear to be of limited interest in signal and image processing, fields whose major aims are modeling and identification, and where the input is almost never accessible. Nevertheless, we state the condition, in terms of \Phi and G, that a discrete-time system governed by equations [1.56] and [1.57] must satisfy:
– the controllability matrix \left[ G \;\; \Phi G \;\; \cdots \;\; \Phi^{p-1} G \right] should have rank p.
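Both rank conditions are straightforward to check numerically. The following minimal sketch, assuming numpy, builds the observability matrix of [1.70] and the controllability matrix just defined for an illustrative third-order system; the matrices are arbitrary examples.

```python
import numpy as np

def observability_matrix(Phi, H):
    """Stack H, H*Phi, ..., H*Phi^(p-1) as in [1.70]."""
    p = Phi.shape[0]
    return np.vstack([H @ np.linalg.matrix_power(Phi, i) for i in range(p)])

def controllability_matrix(Phi, G):
    """Concatenate G, Phi*G, ..., Phi^(p-1)*G."""
    p = Phi.shape[0]
    return np.hstack([np.linalg.matrix_power(Phi, i) @ G for i in range(p)])

# Illustrative 3rd-order system
Phi = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.1, -0.5, 0.8]])
G = np.array([[0.0], [0.0], [1.0]])
H = np.array([[1.0, 0.0, 0.0]])

print(np.linalg.matrix_rank(observability_matrix(Phi, H)))    # 3 -> observable
print(np.linalg.matrix_rank(controllability_matrix(Phi, G)))  # 3 -> controllable
```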
1.6.5.4. Plurality of the state space representation of the system

Let us consider the following equations:

x(k+1) = \Phi(k) x(k) + G(k) u(k)    [1.56]

y(k) = H(k) x(k) + v(k)    [1.57]
A system can have an infinite number of representations, and we can cross over from one to another using a non-singular transformation T. This amounts to a change of basis in the state space. Setting x(k) = T x_T(k), we obtain:

x_T(k+1) = T^{-1} \Phi T \, x_T(k) + T^{-1} G u(k)
y(k) = H T \, x_T(k) + v(k)    [1.71]

Up to the non-singular transformation T, the triplet [\Phi, G, H] is thus similar to:

\left[ \Phi_T = T^{-1} \Phi T, \;\; G_T = T^{-1} G, \;\; H_T = H T \right]    [1.72]
A number of representations, all characterized by a triplet \Phi(k), G(k) and H(k), are thus possible. The major ones are the following:
– Jordan’s representation;
– the controllable representation;
– the observable representation.
Different forms can also be adopted to highlight different physical parameters of the process being studied, or to identify parameters which play a special role in the particular application [32]. All these forms are of interest in the field of control engineering.

1.6.6. Case 1: state space representation of AR processes
Let us start with an AR model of order p:

y(k) = -\sum_{i=1}^{p} a_i y(k-i) + u(k)    [1.73]

We can define a state vector by concatenating the p last values of the process, these p values then being the p state variables:

x(k) = [y(k-p+1) \; \cdots \; y(k)]^T    [1.74]

The state vector is updated according to:

x(k+1) = \Phi x(k) + G u(k)    [1.75]

where the transition matrix \Phi has the following companion form:

\Phi = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ \vdots & & & 0 & 1 \\ -a_p & -a_{p-1} & \cdots & \cdots & -a_1 \end{bmatrix}    [1.76]

and the input weight vector G is defined as:

G = [\underbrace{0 \; \cdots \; 0}_{p-1} \;\; 1]^T    [1.77]
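A minimal sketch, assuming numpy, of how the matrices of [1.75]-[1.77] can be assembled and used to simulate an AR process; the AR(2) coefficients are an illustrative choice.

```python
import numpy as np

def ar_state_space(a):
    """Build Phi, G, H of [1.75]-[1.77] for y(k) = -sum_i a_i y(k-i) + u(k)."""
    p = len(a)
    Phi = np.eye(p, k=1)                     # ones on the superdiagonal
    Phi[-1, :] = -np.asarray(a)[::-1]        # last row: -a_p ... -a_1
    G = np.zeros(p); G[-1] = 1.0
    H = np.zeros(p); H[-1] = 1.0             # y(k) is the last state component
    return Phi, G, H

# Illustrative AR(2): y(k) = -0.2*y(k-1) + 0.7*y(k-2) + u(k), i.e. a1 = 0.2, a2 = -0.7
a = [0.2, -0.7]
Phi, G, H = ar_state_space(a)

rng = np.random.default_rng(3)
u = rng.standard_normal(200)
x = np.zeros(len(a))
y = np.empty(len(u))
for k, uk in enumerate(u):
    x = Phi @ x + G * uk                     # state update [1.75]
    y[k] = H @ x                             # read y from the last state component
```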
1.6.7. Case 2: state space representation of MA processes
Let us start with a pth-order MA model:

y(k) = b_0 u(k) + \cdots + b_p u(k-p)    [1.78]

To simplify the state space representation of this system, we can introduce the following processes:

x_j(k) = \sum_{i=1}^{p+1-j} b_{j+i-1} u(k-i), \quad j \in \{1,\dots,p\}    [1.79]

We see that, on the one hand:

y(k) = \sum_{l=0}^{p} b_l u(k-l) = b_0 u(k) + \sum_{l=1}^{p} b_l u(k-l) = x_1(k) + b_0 u(k)    [1.80]
and, on the other hand, for all j \in \{1,\dots,p-1\}, we have:

x_j(k) = \sum_{i=1}^{p+1-j} b_{j+i-1} u(k-i)
       = \sum_{i=2}^{p+1-j} b_{j+i-1} u(k-i) + b_j u(k-1)
       = \sum_{l=1}^{p-j} b_{j+l} u(k-1-l) + b_j u(k-1) \qquad (l = i-1)
       = x_{j+1}(k-1) + b_j u(k-1)    [1.81]

The state vector for the MA process can be written as follows:

x(k) = [x_1(k) \; \cdots \; x_p(k)]^T    [1.82]

Updating this state vector gives:

x(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ \vdots \\ x_p(k) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ \vdots & & & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(k-1) \\ x_2(k-1) \\ \vdots \\ \vdots \\ x_p(k-1) \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ \vdots \\ b_p \end{bmatrix} u(k-1)    [1.83]

and:

y(k) = [1 \; 0 \; \cdots \; \cdots \; 0] \, x(k) + b_0 u(k)    [1.84]
1.6.8. Case 3: state space representation of ARMA processes
Let us consider an ARMA model of order (p, q) and start with the simplification p = q:

y(k) + a_1 y(k-1) + \cdots + a_p y(k-p) = b_0 u(k) + \cdots + b_p u(k-p)    [1.85]
To represent this model in the state space, we will take up an observable form appropriate to the estimation of the model’s parameters.
Following relation [1.2], we can write y(k) by introducing additional outputs denoted x_1(k), x_2(k), …, x_p(k). These outputs are defined as follows:

y(k) = b_0 u(k) + x_1(k)    [1.86]

where the first additional output satisfies the following condition:

x_1(k) = [b_1 u(k-1) - a_1 y(k-1)] + \cdots + [b_p u(k-p) - a_p y(k-p)]
       = [b_1 u(k-1) - a_1 y(k-1)] + x_2(k-1)    [1.87]

Considering equation [1.86] at the instant k-1, equation [1.87] becomes:

x_1(k) = \big[ b_1 u(k-1) - a_1 [b_0 u(k-1) + x_1(k-1)] \big] + x_2(k-1)
       = (b_1 - a_1 b_0) u(k-1) - a_1 x_1(k-1) + x_2(k-1)    [1.88]

Similarly, we can define the second additional output as follows:

x_2(k-1) = [b_2 u(k-2) - a_2 y(k-2)] + \cdots + [b_p u(k-p) - a_p y(k-p)]
         = [b_2 u(k-2) - a_2 y(k-2)] + x_3(k-2)    [1.89]

Considering equation [1.86] at the instant k-2, equation [1.89] becomes:

x_2(k-1) = \big[ b_2 u(k-2) - a_2 [b_0 u(k-2) + x_1(k-2)] \big] + x_3(k-2)
         = (b_2 - a_2 b_0) u(k-2) - a_2 x_1(k-2) + x_3(k-2)    [1.90]

and so on, up to the instant k:

x_2(k) = (b_2 - a_2 b_0) u(k-1) - a_2 x_1(k-1) + x_3(k-1)    [1.91]

All the other additional outputs x_i(k) can be deduced in a similar manner:

x_p(k-p+1) = [b_p u(k-p) - a_p y(k-p)] = (b_p - a_p b_0) u(k-p) - a_p x_1(k-p)    [1.92]

Thus, at instant k:

x_p(k) = (b_p - a_p b_0) u(k-1) - a_p x_1(k-1)    [1.93]
Writing the above equations in a matricial form, outputs xi appear to be the elements of the following state vector at instant k:
x(k) = [x_1(k) \; \cdots \; x_p(k)]^T

These elements are expressed as functions of the input u(k-1). Finally, we can write all the above equations in the form of state equations:

\begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_{p-1}(k) \\ x_p(k) \end{bmatrix} = \begin{bmatrix} -a_1 & 1 & 0 & \cdots & 0 \\ -a_2 & 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & & \ddots & 0 \\ -a_{p-1} & \vdots & & & 1 \\ -a_p & 0 & \cdots & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(k-1) \\ x_2(k-1) \\ \vdots \\ x_{p-1}(k-1) \\ x_p(k-1) \end{bmatrix} + \begin{bmatrix} b_1 - a_1 b_0 \\ b_2 - a_2 b_0 \\ \vdots \\ b_{p-1} - a_{p-1} b_0 \\ b_p - a_p b_0 \end{bmatrix} u(k-1)    [1.94]

Equation [1.86] becomes:

y(k) = [1 \; 0 \; \cdots \; 0] \begin{bmatrix} x_1(k) \\ x_2(k) \\ \vdots \\ x_p(k) \end{bmatrix} + b_0 u(k) = [1 \; 0 \; \cdots \; 0] \, x(k) + b_0 u(k)    [1.95]

We note that equations [1.83] and [1.84] are the same as [1.94] and [1.95] with \{a_i\}_{i=1,\dots,p} = 0.
The state equations can also be considered as a representation of a digital filter with input u(k) and output y(k) [27] [45]. This is called the observable form of the equations, the observability matrix being non-singular for a_p \neq 0. The representation of the ARMA model [1.2] is thus defined by:

x(k) = J x(k-1) - \Theta y(k-1) + B u(k-1)    [1.96]

y(k) = H x(k) + b_0 u(k)    [1.97]

with:

J = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \quad \Theta = \begin{bmatrix} a_p \\ \vdots \\ a_1 \end{bmatrix}, \quad B = \begin{bmatrix} b_p \\ \vdots \\ b_1 \end{bmatrix}

and:

H = [0 \; \cdots \; 0 \; 1]    [1.98]

Matrix J has the following properties:

1. J^i = 0 \quad \forall i \geq p    [1.99]

2. H J^i = [\underbrace{0 \; \cdots \; 0}_{p-i-1} \;\; 1 \;\; \underbrace{0 \; \cdots \; 0}_{i}]    [1.100]
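A minimal sketch, assuming numpy, of the observable-form matrices [1.96]-[1.98]; it verifies property [1.99] and checks numerically that the state equations reproduce the direct ARMA recursion of [1.85]. The ARMA coefficients are illustrative.

```python
import numpy as np

# Observable-form matrices [1.96]-[1.98] for an illustrative ARMA(p, p) model
a = np.array([0.5, -0.3, 0.2])            # a_1 ... a_p (illustrative)
b = np.array([0.4, 0.8, -0.1])            # b_1 ... b_p (illustrative)
b0 = 1.0
p = len(a)

J = np.eye(p, k=-1)                       # ones on the subdiagonal
Theta = a[::-1].reshape(-1, 1)            # [a_p ... a_1]^T
B = b[::-1].reshape(-1, 1)                # [b_p ... b_1]^T
H = np.zeros((1, p)); H[0, -1] = 1.0      # H = [0 ... 0 1]

assert np.allclose(np.linalg.matrix_power(J, p), 0)   # property [1.99]: J^p = 0

# Compare the state-space recursion with the direct ARMA difference equation
N = 50
rng = np.random.default_rng(4)
u = rng.standard_normal(N)

x = np.zeros((p, 1))
y_ss = np.zeros(N)
y_ss[0] = b0 * u[0]                       # zero initial state
for k in range(1, N):
    x = J @ x - Theta * y_ss[k - 1] + B * u[k - 1]     # state update [1.96]
    y_ss[k] = (H @ x).item() + b0 * u[k]               # observation [1.97]

bb = np.concatenate(([b0], b))
y_dir = np.zeros(N)
for k in range(N):
    ar = sum(a[i] * y_dir[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
    ma = sum(bb[i] * u[k - i] for i in range(p + 1) if k - i >= 0)
    y_dir[k] = -ar + ma

print(np.allclose(y_ss, y_dir))           # True: both recursions give the same output
```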
1.6.9. Case 4: state space representation of a noisy process
From Chapter 5 onwards, we will take up the modeling of a signal disturbed by additive noise, as well as the representation of the corresponding system in the state space. We will also implement techniques for the estimation of the state vector using the available noisy observations. To set the stage for this, we present some additional cases.

1.6.9.1. An AR process disturbed by a white noise

Let us first consider a pth-order AR process, denoted s(k), which is disturbed by an additive zero-mean white noise b(k) with variance R. The observation y(k) is defined as follows:

y(k) = s(k) + b(k)    [1.101]

The state vector x(k) is then defined as follows:

x(k) = [s(k-p+1) \; \cdots \; s(k)]^T    [1.102]

The updating equation of the state vector is the same as in [1.75], while the observation equation is:

y(k) = H x(k) + b(k)    [1.103]

where:

H^T = G = [\underbrace{0 \; \cdots \; 0}_{p-1} \;\; 1]^T    [1.104]
1.6.9.2. AR process disturbed by colored noise itself modeled by another AR process2 We sometimes have to process signals disturbed by colored noises. First, let us model this additive colored noise as a qth-order AR process. b( k )
q
¦ c j b(k j ) w(k )
[1.105]
j 1
Here, the sequence w(k) is white, zero-mean and Gaussian, with a variance of W. It is independent of the process u(k) which generates the signal s(k). The state vector x(k) is then composed of the p last values of the signal and the q last values of the additive noise:
>
x(k ) = s T (k ) v T (k )
@
T
[1.106]
where: s (k ) = >s (k - p + 1) " s (k )@ T
[1.107]
b(k ) = >b(k - q + 1) " b(k )@ T .
[1.108]
and:
The available observation y(k) is thus: y (k )
s ( k ) b( k )
H x(k )
[1.109]
where:
2 The use of such a state space representation will be illustrated in Chapter 6, for the
enhancement of a signal disturbed by a colored noise.
Parametric Models
H
ª º «0 " 0 1 0 " 0 1»
» « q 1 ¬ p 1 ¼
41
[1.110]
The equation updating the state vector x(k) can be expressed as follows: x(k ) ĭ x(k 1) GJ (k ) .
[1.111]
where the transition matrix ĭ is partitioned as follows: ĭ
ªĭ s «0 ¬
0º ĭ v »¼
[1.112]
with:
ĭs
ª 0 « # « « 0 « ¬« a p
1 # 0 a p 1
0 º % # »» " 1 » » " a1 ¼»
[1.113]
0 º % # »» . " 1 » » " c1 »¼
[1.114]
"
and:
ĭv
ª 0 « # « « 0 « «¬ c q
1 # 0 c q 1
"
Matrix G is defined by: G
ª 0 " 0 1 0 " " 0º «0 " " 0 0 " 0 1 » ¬ ¼
T
[1.115]
The noise vector is constructed as follows:
J (k ) = >u (k ) w(k )@T
[1.116]
42
Modeling, Estimation and Optimal Filtering in Signal Processing
A system, defined by a signal contaminated with an AR colored noise, is said to be represented in the state space by a “perfect measurement” representation. This name arises from the fact that the noise does not appear explicitly in relation [1.109]. In this case, the observation is a linear combination of state variables. There are no other terms in the observation equation of the state space representation. 1.6.9.3. AR process disturbed by colored noise itself modeled by a MA process3 Let us suppose that the autoregressive signal s(k) is disturbed by a noise b(k) modeled by a qth-order MA process as follows: q
¦ ci w(k i)
b( k )
[1.117]
i 0
where ^ci `i
0 , ..., q
are the MA parameters and w(k) is a zero-mean white Gaussian
noise with variance V w2 . As in section 1.6.7, we introduce the following terms:
[ j (k )
q 1 j
¦ ci j 1 w(k i) ,
j ^1,..., q`
[1.118]
i 1
Using equations [1.79], [1.101] and [1.117], we can write: y (k )
s (k ) s (k )
q
¦ ci w(k i) i 0 q
¦ ci w(k i) c 0 w(k )
[1.119]
i 1
s (k ) [ 1 (k ) c 0 w(k ) .
Let us define the process v(k) = c0 w(k) as a zero-mean white Gaussian noise with variance R V v2 c 02V w2 . We can rewrite equation [1.119] as follows: y (k )
s(k ) [1 (k ) v(k )
[1.120]
We can define the state vector as follows:
3 The use of such a state space representation will be illustrated in Chapters 6 and 7, for the
enhancement of a signal disturbed by a colored noise.
Parametric Models
>s(k )
x(k )
@
" s (k p 1) [ 1 (k ) " [ q (k ) T ,
43
[1.121]
The state space representation corresponding to the system in equations [1.79], [1.101] and [1.117] can be written as: x(k ) ) x(k 1) Gu (k ) ® ¯ y (k ) H x(k ) v(k )
[1.122]
Finally, the matrices corresponding to this state space representation of equation [1.122] are defined as: ª a1 " " a p « 1 0 0 0 « « 0 % 0 # « 0 1 0 « 0 « « « O ( q, p ) « « ¬
)
O ( p, q ) 0
1
0
0 % % # % 0 " "
º » » » » » 0» 0»» 1» » 0¼
[1.123]
G
ª H Tp O( p,1)º « » *n ¼» ¬«O(q,1)
[1.124]
H
>H p
[1.125]
with *n
>c1
Hq
@
" cq
@T . Independently of the values of the positive integers m and
ª º «1 0 " 0» is a row vector with m-1 zeros and O(m,n) is a zero matrix
« » m 1 ¬ ¼ of dimensions m u n .
n, H m
Moreover, the driving process vector u(k) is defined by: u (k )
ª u (k ) º « w(k 1)» ¬ ¼
[1.126]
44
Modeling, Estimation and Optimal Filtering in Signal Processing
This is a zero-mean Gaussian vector with a covariance matrix Q
ªV u2 « «¬ 0
0 º ». V w2 »¼
It should be noted that this state representation is no longer a “perfect measurement” representation. 1.7. Conclusion
This chapter has aimed at presenting two major families of models. One of these comprises the AR, MA and ARMA models; the other is composed of sinusoidal models and their various forms, which together encompass a wide range of signals. These models are used in a variety of applications, from biomedical engineering to speech processing and mobile communications. In the next chapter, we will take up the issue of estimating the AR parameters using least squares methods.

1.8. References

[1] D. Alpay and I. Gohberg (eds.), The State Space Method: Generalizations and Applications, Linear Operators and Linear Systems, Operator Theory Advances and Applications, vol. 16, Birkhäuser Verlag, 2000.
[2] M. N. Ansourian, J. H. Dripps, G. J. Beattie and K. Boddy, “Autoregressive Spectral Estimation of Fetal Breathing Movement”, IEEE Trans. on Biomedical Engineering, vol. 36, no. 11, pp. 1076-1084, November 1989.
[3] M. Aoki, State Space Modeling of Time Series, Springer-Verlag, 1987.
[4] M. Arnold, W. H. R. Miltner, H. Witte, R. Bauer and C. Braun, “Adaptive AR Modelling of Nonstationary Time Series by Means of Kalman Filtering”, IEEE Trans. on Biomedical Engineering, vol. 45, no. 5, pp. 553-562, May 1998.
[5] B. Atal and J. Remde, “A New Model for LPC Excitation for Producing Natural Sounding Speech at Low Bit Rate”, IEEE-ICASSP ’82, Paris, France, pp. 614-617, 3-5 May 1982.
[6] K. E. Baddour and N. C. Beaulieu, “Autoregressive Models for Fading Channel Simulation”, IEEE-GLOBECOM ’01, pp. 1187-1192, November 2001.
[7] K. E. Baddour and N. C. Beaulieu, “Autoregressive Modeling for Fading Channel Simulation”, IEEE Trans. on Wireless Commun., vol. 4, no. 4, pp. 1650-1662, July 2005.
[8] A. V. Balakrishnan, Elements of State Space Theory of Systems, Optimization Software, Inc. Publications Division, New York, 1983.
[9] O. Besson and P. Stoica, “Sinusoidal Signals with Random Amplitude: Least Squares Estimators and their Statistical Analysis”, IEEE Trans. on Signal Processing, vol. 43, no. 11, November 1995. [10] T. Bohlin, “Analysis of Stationary EEG Signals by the Maximum Likelihood and Generalized Least Squares Methods”, Tehnical paper TP 18.200 Systems Development Division IBM Nordic Laboratory – Sweden. [11] T. Bohlin, “Comparison of Two Methods of Modeling Stationary EEG Signals” IBM J. Res. Dev., vol. 17, pp. 194, 1973. [12] J. A. Cadzow and Th.T. Hwang, “Signal Representation: An Efficient Procedure”, IEEE Trans. on Acoustics Speech and Signal Processing, vol. ASSP-25, no. 6, pp. 461-465, 1977. [13] C. T. Chen, Introduction to Linear System Theory, Holt Rinehart Winston, New York, 1970. [14] M. G. Christensen and S. H. Jensen, “On Perceptual Distortion Minimization and Nonlinear Least-Squares Frequency Estimation”, IEEE Trans. on Audio, Speech and Language Processing, vol. 14, no. 1, January 2006. [15] Coding of speech at 8kbps using conjugate-structure algebraic code-excited linear prediction; “Codage de la Parole à 8 kbit/s par Prédiction Linéaire avec Excitation par Séquences Codées à Structure Algébrique Conjuguée”, ITU-T Recommendation G.729, 1996. [16] M. Dendrinos, S. Bakamidis and G. Carayannis, “Speech Enhancement from Noise: a Regenerative from Noise”, Speech Communications, vol. 10, no. 2, pp. 45-57, February 1991. [17] S. Doclo and M. Moonen, “SVD-Based Optimal Filtering With Applications to Noise Reduction in Speech Signals”, IEEE-WASPAA ’99, New York, USA, October 1999. [18] S. Doclo and M. Moonen, “GSVD-Based Optimal Filtering for Single and Multimicrophone Speech Enhancement”, IEEE Trans. on Signal Processing, vol. 50, no. 9, pp. 2230-2244, September 2002. [19] Y. Ephraim and H. L. Van Trees, “A Signal Subspace Approach for Speech Enhancement”, IEEE Trans. on Speech Audio Processing, vol. 3, no. 4, pp. 251-266, July 1995. [20] E. B. George and M. J. T. Smith, “Speech Analysis/Synthesis and Modification Using an Analysis-by-Synthesis/Overlap-Add Sinusoidal Model”, IEEE Trans. on Speech and Audio Processing, vol 5, no. 5, pp. 389-406, September 1997. [21] Z. Goh, K. C. Tan and B. T. G. Tan, “Speech Enhancement Based On a VoicedUnvoiced Speech Model”, IEEE-ICASSP ’98, Seattle, Washington, USA, vol. 1, pp. 401404, 12-15 May 1998. [22] Z. Goh, K.-C. Tan and B. T. G. Tan, “Kalman-Filtering Speech Enhancement Method Based on a Voiced-Unvoiced Speech Model”, IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 510-524, September 1999.
[23] I. Güler, M. K. Kiymik, M. Akin and A. Alkan, “AR Spectral Analysis of EEG Signal by using Maximum Likelihood Estimation”, Computers in Biology and Medicine, no. 31, pp. 441-450, 2001. [24] J. Jensen, R. Hendricks, R. Heusdens and S. Jensen, “Smoothed Subspace Based Noise Suppression with Application to Speech Enhancement”, Eurasip-EUSIPCO ’05, Antalya, Turkey, 4-8 September 2005. [25] S. H. Jensen, P. C. Hansen, S. D. Hansen and J. Sorensen, “A Signal Subspace Approach for Noise Reduction of Speech Signals”, Eurasip-EUSIPCO ’94, Edinburgh, Scotland, pp. 1174-1177, 13-16 September 1994. [26] S. H. Jensen, P. C. Hansen, S. D. Hansen and J. Sorensen, “Reduction of Broad Band Noise in Speech by Truncated QSVD”, IEEE Trans. on Speech and Audio Processing, vol. 3, no. 6, pp. 439-448, 1995. [27] M. J. Jong, Methods of Discrete Signal and Systems Analysis, McGraw-Hill, 1982. [28] T. Kailath, Linear Systems, Prentice Hall, Englewood Cliffs, 1980. [29] R. E. Kalman, “On the General Theory of Control Systems”, 1st IFAC Congress, Moscow, USSR 1960, “Automation and Remote Control” Butterworths and Co, London, pp. 481-492, 1961. [30] J. B. Lewis, Linear Dynamic Systems, Matix Publishers Inc., Champaign, Illinois, 1977. [31] H. Li, P. Stoica and J. Li, “Computationally Efficient Parameter Estimation for Harmonic Sinusoidal Signals”, Signal Processing, vol. 80, pp. 1937-1944, 2000. [32] P. S. Maybeck, Stochastic Models, Estimation and Control, vol. 1, Academic Press, New York, 1979. [33] R. J. McAulay, T. F. Quatieri, “Speech Analysis/Synthesis Based on a Sinusoidal Representation”, IEEE Trans. on Acoust., Speech, Signal Processing, vol. 34, no. 4, August 1986. [34] R. N. McDonough and W. H. Huggins, “Best Least Squares Representation of Signals by Exponentials”, IEEE Trans. on Automatic Control, vol. AC13, no. 4, pp. 408-412, 1972. [35] F. Miwakeichi, A. Galka, S. Uchida, H. Arakaki, N. Hirai, M. Nishida, T. Maehara, K. Kawai, S. Sunaga and H. Shimizu, “Impulse Response Function Based on Multivariate AR Model Can Differentiate Focal Hemisphere in Temporal Lobe Epilepsy”, Epilepsy Research, vol. 61, no. 1-3, pp. 73-87, September-October 2004. [36] H. Morikawa and H. Fujisaki, “System Identification of the Speech Produced Process Based on the State Space Representation”, IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 252-262, 1984. [37] D. O’Brian and A. I. C. Monaghan, “Concatenative Synthesis Based on a Harmonic Model”, IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, pp. 11-20, January 2001. [38] A. Rajouani, M. Najim and A. Mouradi, “Synthesis of Arabic Speech by Linear Predictive Coding”, Workshop on Signal Processing and its Applications, 29 September1 October, Porto, 1982.
[39] R. J. Schwartz and B. Fienland, Linear Systems, McGraw-Hill, New York, 1965. [40] X. Serra and J. Smith III, “Spectral Modeling Synthesis: A Sound System Based on a Deterministic Plus Stochastic Decomposition”, Computer Music Journal, vol. 14, no. 4, 1990. [41] T. Söderström, Discrete-Time Stochastic Systems, Estimation and Control, 2nd edition, Springer, 2002. [42] Y. Stylianou, “On the Implementation of the Harmonic Plus Noise Model for Concatenative Speech Synthesis”, IEEE-ICASSP 2000, Istanbul, 5-9 June 2000. [43] Y. Stylianou, “Applying the Harmonic Plus Noise Model in Concatenative Speech Synthesis”, IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, January 2001. [44] C. W. Therrien, Discrete random signals and statistical signal processing, Prentice Hall 1992. [45] S. A. Tretter, Introduction to Discrete time Signal Processing, John Wiley, 1976. [46] S. Van Huffel, “Enhanced Resolution Based on Minimum Variance Estimation and Exponential Data Modeling”, Signal Processing, vol. 33, no. 3, pp. 333-355, September 1993. [47] H. Wold, “A Study in the Analysis of Stationary Time Series”, Almquist and Wicksells, Uppsala, Sweden 1938. [48] H. Yi and P. C. Loizou, “A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise”, IEEE Trans. on Speech and Audio Processing, vol. 11, no. 4, July 2003. [49] G. U. Yule, “On Methods of Investigating Periodicities in Disturbed Series with Special Reference to Wölfe’s Sunspot Numbers”, Phil. Trans. Roy. Soc. (London), vol. A226, pp. 267-298, 1927. [50] L. Zadeh and C. Desoer, Linear Systems: A State Space Approach, McGraw-Hill, 1963.
Chapter 2
Least Squares Estimation of Parameters of Linear Models
2.1. Introduction

The purpose of this chapter is to present different methods used to obtain least squares estimates of the parameters of linear models. To illustrate these approaches, we will focus our attention on the parameters of the “autoregressive” model introduced in Chapter 1.
First, we consider the simple case where the observations are not disturbed by a measurement noise. We present non-recursive techniques, in which the available samples are processed as blocks of data. This leads to the Yule-Walker equations. These equations can be solved recursively using the Durbin-Levinson algorithm [10]. Thereafter, we take up the recursive least squares (RLS) algorithm, and successively treat the cases where the autoregressive process is stationary or non-stationary.
The second part of this chapter deals with cases where the observations are perturbed by an additive white noise. Here, we will first analyze the effect of the measurement noise on the estimation of the AR parameters, and then present non-recursive and recursive methods which give rise to unbiased estimations of the AR parameters.

2.2. Least squares estimation of AR parameters

The least squares method is the starting point of various methods for the identification and estimation of parameters. It was introduced by Gauss in 1809, but is sometimes attributed to Legendre, who worked towards predicting the movements of planets using measurements taken from a telescope [26].
Let us start with a process y(k) defined as a weighted sum of the p previous measurements, with weights given by the parameters \{a_i\}_{i=1,\dots,p}:

y(k) = -a_1 y(k-1) - \cdots - a_p y(k-p)    [2.1]

To simplify the analysis, we define the two following column vectors:

H_p(k) = [-y(k-1) \; \cdots \; -y(k-p)]^T    [2.2]

and:

\theta = [a_1 \; \cdots \; a_p]^T    [2.3]

Equation [2.1] can thus be written as follows:

y(k) = H_p^T(k) \, \theta    [2.4]

However, no parameter vector \theta = [a_1 \; \cdots \; a_p]^T allows an exact prediction of all the values of the process y(k) from its p last measurements. The process y(k) therefore involves a prediction error, denoted u(k):

y(k) = -a_1 y(k-1) - \cdots - a_p y(k-p) + u(k) = \hat{y}(k) + u(k) = H_p^T(k) \, \theta + u(k)    [2.5]

The goal is then to determine the prediction coefficients \{a_i\}_{i=1,\dots,p} from the measurements of the process y(k).

2.2.1. Determination or estimation of parameters?

The word “determination” is probably inappropriate here, because it leads us to believe, erroneously, that we can calculate the model parameters without any error. As we will see below, the calculation always involves a certain error. We therefore speak of the “estimation” of the parameters.
We will formalize this estimation problem using equation [2.5]. Intuition suggests that if the number of observations is increased, estimating the coefficients \{a_i\}_{i=1,\dots,p} reduces to the resolution of a system of linear equations. Thus, we can conduct a series of N measurements or observations, with N > p, and write:

y(k) = H_p^T(k) \theta + u(k)
y(k-1) = H_p^T(k-1) \theta + u(k-1)
\vdots
y(k-N+1) = H_p^T(k-N+1) \theta + u(k-N+1)    [2.6]

By expressing the above equations in matrix form, we obtain:

\begin{bmatrix} y(k) \\ \vdots \\ y(k-N+1) \end{bmatrix} = \begin{bmatrix} H_p^T(k) \\ \vdots \\ H_p^T(k-N+1) \end{bmatrix} \theta + \begin{bmatrix} u(k) \\ \vdots \\ u(k-N+1) \end{bmatrix}    [2.7]

Let us substitute the following simplified notations:

Y_N(k) = [y(k) \; \cdots \; y(k-N+1)]^T    [2.8]

U_N(k) = [u(k) \; \cdots \; u(k-N+1)]^T    [2.9]

and:

H_N(k) = \begin{bmatrix} H_p^T(k) \\ \vdots \\ H_p^T(k-N+1) \end{bmatrix} = -\begin{bmatrix} y(k-1) & \cdots & y(k-p) \\ \vdots & & \vdots \\ y(k-N) & \cdots & y(k-N-p+1) \end{bmatrix}    [2.10]

where Y_N(k) and U_N(k) are column vectors of size N and H_N(k) is a matrix of dimensions N \times p. Equation [2.7] thus reduces to:

Y_N(k) = H_N(k) \theta + U_N(k)    [2.11]
Modeling, Estimation and Optimal Filtering in Signal Processing
0 , the determination of the terms ^ a i `i
If U N (k )
1, ..., p
is trivial provided that
the p u p matrix H N T (k ) H N (k ) is invertible:
>H
T
N
T
(k ) H N (k )
@
1
H N T ( k )Y N ( k )
[2.12]
When U N (k ) z 0 , the system of equation [2.11] can be rewritten as follows:
T
>H >H
N N
@ (k )@
T
(k ) H N (k )
T
(k ) H N
1
H N T (k )Y N (k )
1
H N T (k )U N (k )
[2.13]
The above equation shows that U N (k ) directly influences the determination of the parameters. The aim is to estimate the vector of the parameters T , and not to determine it. Let Tˆ N be the column vector of the parameters T estimated using N ~ observations, and let T N be the corresponding estimation error vector. ~
T
T Tˆ N
N
[2.14]
y (k ) the error involved when predicting y(k), the following column By denoting ~ error vector is introduced: ~ Y N (k ) Y N (k ) Yˆ N (k ) Y N (k ) H N (k )Tˆ N > ~y (k ) " ~y (k N 1)@ T
[2.15]
The least squares method consists of estimating the parameters which minimize the optimization criterion, J N , which is the sum of the squares of the elementary
^
`
y 2 (k i) errors, i.e. ~ J N (Tˆ N )
N 1
i 0,..., N 1 :
¦ ~y 2 (k i) i 0
[2.16]
Least Squares Estimation of Parameters of Linear Models
53
Again, considering equation [2.15], the above criterion can be expressed using the column error vector: ~ ~ J N (Tˆ N ) Y N T (k )Y N (k )
>Y
N
(k ) H N (k )Tˆ N
@ >Y T
Y N (k )Y N (k ) 2Tˆ N T
T
N
(k ) H N (k )Tˆ N
H NT
(k )Y N (k ) Tˆ N
@ T
[2.17] H NT
(k ) H N (k )Tˆ N
Since J N (Tˆ N ) is a function of Tˆ N , it depends on the p variables ^ai `i
1,..., p
.
The vector Tˆ N should satisfy the two following conditions to give a minimum in criterion [2.17]: – the first of these conditions concerns the gradient of J N , denoted J N , which is defined as the column vector of the p partial derivatives of J N with respect to the AR parameters ^ai `i 1,..., p . This gradient should satisfy: J N (Tˆ N )
[2.18]
0
where 0 denotes the zero vector. Therefore: w J N (Tˆ N ) wa j
0 j 1,..., p
[2.19]
– the second condition concerns the Hessian matrix of J N (Tˆ N ) , which is
denoted H J N , has dimensions pup, and is composed of the second partial derivatives
w2J N of J N (Tˆ N ) . This Hessian matrix should be positive definite, wa i wa j
i.e.: V T H J N V ! 0 independently of the value of the non-zero vector V. [2.20]
For a single-variable function, conditions [2.19] and [2.20] lead to the search for an extremum and then a local minimum.
54
Modeling, Estimation and Optimal Filtering in Signal Processing
We can easily show that for any column vector P
w Tˆ N T P wa i
§ p · ¨ ¦ an pn ¸ ¨ ¸ ©n 1 ¹
w wa i
>p1
"
pp
@T :
p i i 1,..., p
[2.21]
[2.22]
Consequently:
Tˆ
T N
P
ª w Tˆ T P N « «¬ wa1
w Tˆ N T P º » " wa p » ¼
T
P
H N (k ) T Y N (k ) in equation [2.22], we obtain:
Substituting P
Tˆ N T H N (k ) T Y N (k )
H N T (k )Y N (k )
[2.23]
Similarly, for any symmetric matrix R with elements1 rij , and which respects rij
r ji , we obtain:
w Tˆ N T RTˆ N wa i
p p · · w §¨ §¨ a m rmn ¸a n ¸ ¸ ¸ wa i ¨ n 1 ¨© m 1 ¹ ¹ ©
¦ ¦
p
2
¦ a n rni
i 1,..., p
[2.24]
n 1
Consequently:
Tˆ N T RTˆ N Taking R
2 RTˆ N
[2.25]
H N (k ) T H N (k ) , we have:
Tˆ N T H N (k ) T H N (k )Tˆ N
2 H N (k ) T H N (k )Tˆ N
[2.26]
Then, taking into account relations [2.17], [2.23] and [2.26], equation [2.19] becomes:
1 With i being the row and j the column.
Least Squares Estimation of Parameters of Linear Models
J N (Tˆ N )
>
Y N T (k )Y N (k ) 2Tˆ N T H N T (k )Y N (k ) Tˆ N T H N T (k ) H N (k )Tˆ N 2H N T (k )Y N (k ) 2 H N T (k ) H N (k )Tˆ N
@
55
[2.27]
0
Thus: H N T ( k )Y N ( k ) H N T ( k ) H N ( k )Tˆ N
0.
[2.28]
If the matrix H N T (k ) H N (k ) is non-singular, this leads us to the following expression for the least squares estimation of T :
Tˆ N
>H
N
T
(k ) H N (k )
@
1
H N T (k )Y N (k )
[2.29]
Let us consider cases where the estimation of T is biased, i.e. where the following inequality holds:
^ `
E Tˆ N z T
[2.30]
with E^.` being the mathematical expectation. For this, we must recall the expression of the difference between the parameter vector T and its estimation Tˆ N using relation [2.11]. Thus:
Tˆ N
>H >H
N N
@ (k ) @
T
( k ) H N (k )
T
(k ) H N
>
1
H N T (k )Y N (k )
1
H N T (k )>H N (k )T U N (k )@
T H N T (k ) H N (k )
@
1
[2.31]
H N T (k )U N (k )
Shifting T to the left hand side of the equation, we obtain:
Tˆ N T
>H
N
T
(k ) H N (k )
@
1
H N T (k )U N (k ) .
[2.32]
Calculating the mathematical expectation on both sides of equation [2.32] gives:
^
E Tˆ N T
`
>
E ® H N T (k ) H N (k ) ¯
@
1
H N T (k )U N (k )½¾ ¿
[2.33]
56
Modeling, Estimation and Optimal Filtering in Signal Processing
When U N (k ) and H N (k ) are independent of each other, this current equation is modified to:
^
E Tˆ N T
`
>
E ® H N T (k ) H N (k ) ¯
@
1
H N T (k )½¾ E ^ U N (k ) ` ¿
[2.34]
^
Thus, if E ^ U N (k ) ` 0 , the estimator is unbiased, because E Tˆ N T
`
0.
Nevertheless, the least squares estimation of T is biased in two cases: – if U N (k ) and H N (k ) are correlated sequences; – if the mean value of U N (k ) is non-zero.
We also have to verify that condition [2.20] is also satisfied. 2 J N Tˆ N can be calculated as follows: 2 J N (Tˆ N )
>
2 Y N T (k )Y N (k ) 2Tˆ N T H N T (k )Y N (k ) Tˆ N T H N T (k ) H N (k )Tˆ N
>
@
>
2 Tˆ N T H N T (k )Y N (k ) 2 Tˆ N T H N T (k ) H N (k )Tˆ N 2
>
2 H N T (k ) H N (k )Tˆ N
@
@
@
[2.35]
2 H N T (k ) H N (k )
Considering equation [2.10], we check that H N T (k ) H N (k ) is indeed a positive definite matrix of dimensions pup. We have thus far presented a first approach which allows us to express the parameter vector T using equation [2.29]. This approach involves the Nup matrix H N (k ) and requires the inversion of the H N T (k ) H N (k ) matrix. It is an nonrecursive approach. In the section below, we will develop a recursive approach to estimate T .
2.2.2. Recursive estimation of parameters
The aim of this section is to put forth a recursive procedure to estimate the parameters T . Starting from Tˆ N , the estimator of T based on N measurements, we now express Tˆ N 1 which uses N+1 measurements.
Least Squares Estimation of Parameters of Linear Models
57
We know that:
Y N (k )
y k º ª « » # « » «¬ y k N 1 »¼
H N (k )T U N (k )
[2.11]
In addition, a k+1th observation of the process, y(k + 1), is available according to equation [2.5]: H Tp (k 1)T u (k 1) .
y (k 1)
[2.36]
For the rest of this derivation, we will suppose that u is a zero-mean white sequence with variance V u2 . We can introduce a new extended vector Y N 1 (k 1) consisting of N+1 measurements, and defined by means of the equations [2.11] and [2.36]: ª H T (k 1)º ªu (k 1)º « p »T « » «¬ H N (k ) »¼ ¬U N (k ) ¼
Y N 1 (k 1)
ª y (k 1)º « Y (k ) » ¬ N ¼
Y N 1 (k 1)
ª y k 1 º » « # » « «¬ y k N 1 »¼
[2.37]
and:
H N 1 (k 1)T U N 1 (k 1)
[2.38]
By applying [2.29], the estimation Tˆ N 1 of the vector of the AR parameters using N + 1 observations { y(k-N+1), …, y(k+1) } can be expressed as follows:
>H
Tˆ N 1
N 1
T
(k 1) H N 1 (k 1)
@
1
H N 1T (k 1)Y N 1 (k 1)
[2.39]
This can be rewritten in terms of H Tp (k 1) and y(k + 1):
Tˆ
>
N 1
>
@
§ ª T º· ¨ H (k 1) H T (k ) « H p (k 1)» ¸ N p ¨ «¬ H N (k ) »¼ ¸¹ © ª y (k 1)º H p (k 1) H N T (k ) « » ¬ Y N (k ) ¼
@
1
[2.40]
58
Modeling, Estimation and Optimal Filtering in Signal Processing
Expanding the matrix multiplication on the right-hand side of the above equation, we get the following form:
>H >H
Tˆ N 1
p (k
1) H Tp (k 1) H N T (k ) H N (k )
T p ( k 1) y ( k 1) H N ( k )Y N ( k )
@
@
1
[2.41]
To complete the derivation, we use the inversion lemma for a matrix. Let A be a matrix which can be decomposed as follows: B CD T
A
[2.42]
A–1, the inverse matrix of A, is given by:
A 1
B 1 B 1C I D T B 1C
1
D T B 1
[2.43]
Consequently, if we choose: H N T (k ) H N (k ) and C
B
D
H p (k 1)
applying the inversion lemma to equation [2.41] gives:
>
@
H T (k ) H (k ) 1 N ° N 1 ° T T T ® H N (k ) H N (k ) H p (k 1) H p (k 1) H N (k ) H N (k ) ° 1 ° 1 H Tp (k 1) H N T (k ) H N (k ) H p (k 1) ¯
Tˆ N 1
>
@
>
>
u H N T (k )Y N (k ) H p (k 1) y (k 1)
@
@
>
@
½ ° 1 ° ¾ ° [2.44] ° ¿
Taking into account equation [2.31] for Tˆ N , this leads to the “straightforward” calculation:
Tˆ N 1
Tˆ N
>H
N
T
(k ) H N (k )
>
@
1
H p (k 1)
1 H Tp (k 1) H N T (k ) H N (k )
@
1
H p (k 1)
>y(k 1) H
T p (k
1)Tˆ N
@
[2.45]
Least Squares Estimation of Parameters of Linear Models
59
This equation can be condensed using a weighting factor to account for the change brought about by y(k + 1). This factor is called the gain, and is denoted by K N (k 1) .
>
Tˆ N 1 Tˆ N K N (k 1) y (k 1) H Tp (k 1)Tˆ N
@
[2.46]
Equation [2.46] shows that to determine Tˆ N 1 , we only require Tˆ N and the new measurement y(k + 1). The estimation of the parameters is updated by the weighted difference between the effective measurement y(k+1) and the predictable measurement, i.e.: H Tp (k 1)Tˆ N
yˆ (k 1)
[2.47]
It is noteworthy that in the absence of noise, yˆ (k 1) would represent the best prediction of the measurement. In equation [2.45], the denominator term corresponding to the gain, namely
>
1 H Tp (k 1) H N T (k ) H N (k )
@
1
H p (k 1) , is a scalar quantity. The operation of
matrix inversion is replaced by the simple operation of division. 2.2.3. Implementation of the least squares algorithm
In order to implement the least squares algorithm and to facilitate comparisons with other algorithms, especially the Kalman filter presented in Chapter 5, we will adopt the following notation: PN (k )
>H
N
T
(k ) H N (k )
@
1
[2.48]
The gain K N (k 1) is thus given by: K N (k 1)
>
PN (k ) H p (k 1) 1 H Tp (k 1) PN (k ) H p (k 1)
@
1
[2.49]
In order to determine a recursive equation for the calculation of PN (k ) , let us start with the definition: PN 1 (k 1)
>H
N 1
T
(k 1) H N 1 (k 1)
@
1
[2.50]
60
Modeling, Estimation and Optimal Filtering in Signal Processing
On breaking H N 1 (k 1) up into the terms H N (k ) and H Tp (k 1) , we obtain: PN 1 (k 1)
>H
N
T
(k ) H N (k ) H p (k 1) H Tp (k 1)
@
1
Equivalently: PN 1 (k 1)
>P
N
1
(k ) H p (k 1) H Tp (k 1)
@
1
Applying the matrix inversion lemma [2.43] and taking: A
PN 1 1 (k 1), B
PN 1 (k ), C
H p (k 1) and D T
H Tp (k 1) ,
we can derive the following relationship between PN 1 (k ) and PN (k ) : PN 1 (k 1)
>I K
N
@
(k 1) H Tp (k 1) PN (k )
[2.51]
The new algorithm can thus be implemented using the following set of three equations: K N (k 1)
>
PN (k ) H (k 1) 1 H Tp (k 1) PN (k ) H p (k 1)
>
Tˆ N 1 Tˆ N K N (k 1) y (k 1) H Tp (k 1)Tˆ N PN 1 (k 1)
>I K
N
@
(k 1) H Tp (k 1) PN (k )
@
@
1
[2.49] [2.46] [2.51]
The determination of PN (k ) makes it possible to calculate the gain K N (k 1) , which in turn allows the estimation of Tˆ N 1 at the k+1th instant. Figure 2.1 shows a representative case: the recursive estimation of the parameters of a 2nd-order AR model. In addition, u(k) is a real white zero-mean Gaussian noise with unit variance. The parameters of the model are a1 = 0.2 and a2 = –0.7. This figure presents the results averaged over 100 realizations of the process.
Least Squares Estimation of Parameters of Linear Models
Parametres
Parameters
0 .4
a a
61
1 2
- 0 .2 5
- 0 .9
5 0
1 0 0 1 5 0 N o m b r e d 'i t é r a t i o n
2 0 0
2 5 0
Number of iterations
Figure 2.1. An example of recursive estimation of the parameters of a 2nd order AR process (with a1 = 0.2 and a2 = –0.7)
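A minimal sketch, assuming numpy, of the recursive estimator summarized by equations [2.49], [2.46] and [2.51], run on a single simulated realization of the AR(2) process used for Figure 2.1 (a_1 = 0.2, a_2 = -0.7); the initialization of P is an illustrative choice.

```python
import numpy as np

def rls_ar(y, p, delta=100.0):
    """Recursive least-squares AR estimation, cf. equations [2.49], [2.46] and [2.51]."""
    theta = np.zeros(p)                          # estimate of [a_1 ... a_p]
    P = delta * np.eye(p)                        # P(k), initialised "large"
    history = []
    for k in range(p, len(y)):
        h = -y[k - p: k][::-1]                   # H_p(k) = [-y(k-1) ... -y(k-p)]^T
        K = P @ h / (1.0 + h @ P @ h)            # gain, cf. [2.49]
        theta = theta + K * (y[k] - h @ theta)   # parameter update, cf. [2.46]
        P = (np.eye(p) - np.outer(K, h)) @ P     # cf. [2.51]
        history.append(theta.copy())
    return np.array(history)

# Illustrative run on an AR(2) process with a1 = 0.2 and a2 = -0.7, 250 samples as in Figure 2.1
rng = np.random.default_rng(6)
u = rng.standard_normal(250)
y = np.zeros_like(u)
for k in range(2, len(u)):
    y[k] = -0.2 * y[k - 1] + 0.7 * y[k - 2] + u[k]

print(rls_ar(y, 2)[-1])                          # final estimate, close to [0.2, -0.7]
```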
2.2.4. The least squares method with weighting factor
Let us recall the error-minimization criterion [2.16]: J N (Tˆ N )
N 1
¦ ~y 2 (k i)
[2.16]
i 0
J N (Tˆ N ) is the sum of N elementary errors all having the same unit weight. To increase the emphasis on the first or the last measurements, we can adopt a criterion which weights each elementary error ~ y 2 (k ) by a factor O k , such that: J N (Tˆ N )
O k ~y 2 (k ) " O k N 1 ~y 2 (k N 1)
[2.52]
The above relation can also be written with a matricial form by introducing the diagonal weighting matrix W(k) as follows: ~ ~ J N (Tˆ N ) Y N T (k )W (k )Y N (k )
[2.53]
where:
W (k )
ªO k « # « ¬« 0
º % # »» . " O k N 1 ¼»
"
0
[2.54]
62
Modeling, Estimation and Optimal Filtering in Signal Processing
Let us choose O k i Oi where O is a scalar entity and where the current term is obtained using a geometric progression. In that case, taking 0 O d 1 favors the latest measurements with respect to the earliest ones. Thus, Oi behaves as a forgetting factor. The estimator can be found by solving the following equation: J N (Tˆ N )
0 , or
wJ N wa j
0 j 1,..., p
[2.19]
Since:
>Y
J N (Tˆ N )
N
(k ) H N (k )Tˆ N
@
T
>
W (k ) Y N (k ) H N (k )Tˆ N
@
[2.55]
we obtain: J N (Tˆ N ) Y N T (k )W (k )Y N (k ) Y N T (k )W (k ) H N (k )Tˆ N Tˆ N T H N T (k )W (k )Y N (k ) Tˆ N T H N T (k )W (k ) H N (k )Tˆ N
[2.56]
Taking equations [2.19], [2.22] and [2.25] into account, we check that the estimator satisfies the following relation: H N T (k )W (k )Y N (k ) H N T (k )W (k )Y N (k ) 2 H N T (k )W (k ) H N (k )Tˆ N
0
[2.57]
Rearranging the terms of the above equation, we obtain:
Tˆ N
>H
N
T
(k )W (k ) H N (k )
@
1
H N T (k )W (k )Y N (k )
[2.58]
By choosing matrix W(k) to be equal to Ruu 1 (k ) , the inverse of the autocorrelation matrix of the sequence u(k), we obtain the so-called “best linear least squares” estimator. For consistency in the naming convention, we specify that the matrix W(k) = Ruu –1 (k) or, simply put, W(k) = R –1 (k).
Least Squares Estimation of Parameters of Linear Models
63
2.2.5. A recursive weighted least squares estimator
For the weighted least squares case, the non-recursive estimator is given by:
>H
Tˆ N
N
T
(k ) R 1 (k ) H N (k )
@
1
H N T (k ) R 1 (k )Y N (k )
[2.59]
@
[2.60]
If we recall equation [2.48]:
>H
PN (k )
N
T
(k ) R 1 (k ) H N (k )
1
the recursive estimator will be defined by the following set of three equations:
>
PN (k ) H p (k 1) O H Tp (k 1) PN (k ) H p (k 1)
K N (k 1)
>
Tˆ N 1 Tˆ N K N (k 1) y (k 1) H Tp (k 1)Tˆ N PN 1 (k 1)
>I K
N
(k 1) H Tp (k 1)
@
1
@
[2.61]
@ P O(k ) N
2.2.6. Observations on some variants of the least squares method
The least squares method is widely used to obtain model parameters of various types of signals. In the field of speech processing, it has been adapted under various forms: covariance methods, autocorrelation, partial correlation (“parcor”), etc. [3]. The analysis below, first undertaken by Atal and Hanauer [4], aims at finding the coefficients ^ ai `i 1, ..., p that minimize the sum of the squares of the prediction errors. J
¦e k
¦ k
2
(k )
¦ k
p · § ¨ y (k ) a i y (k i) ¸ ¸ ¨ i 1 ¹ ©
¦
p ª 2 « y (k ) a i y (k ) y (k i) i 1 ¬«
¦
p
2
p
º a i a j y (k i) y (k j )» 1 ¼»
¦¦ i 1 j
[2.62]
64
Modeling, Estimation and Optimal Filtering in Signal Processing
Minimizing criterion J with respect to the coefficients a j j 1,..., p gives: wJ wa j
p § · 2¦ y (k j )¨ y (k ) ¦ a i y (k i ) ¸ ¨ ¸ k i 1 © ¹
0 j 1,..., p
Thus: p
¦ a i ¦ y (k j ) y (k i) i 1
k
¦ y (k ) y (k j ) j 1,..., p
[2.63]
k
Defining the autocorrelation function ryy of the signal y (k ) by: r yy (i, j )
¦ y (k i) y (k j )
[2.64]
¦ y (k ) y (k j ) ,
[2.65]
k
and: r yy (0, j )
k
we obtain: p
¦ a i ryy ( j, i)
r yy (0, j ) j 1, " , p
[2.66]
i 1
Varying j from 1 to p, equation [2.66] can be expressed in matrix form as follows: ª r yy (1,1) " r yy (1, p ) º ª a1 º « »« » % # « # »« # » «r yy ( p,1) " r yy ( p, p )» «a p » ¬ ¼¬ ¼
ª r yy (0,1) º « » « # » «r yy (0, p)» ¬ ¼
[2.67]
and condensed to: R yy ( p )T
R yy ( p )
[2.68]
Least Squares Estimation of Parameters of Linear Models
65
with:
T
>a1
" ap
@T
[2.69]
Ryy (p) denotes the pup autocorrelation matrix of process y:
R yy ( p )
ª r yy (1,1) " r yy (1, p) º « » % # « # » «r yy ( p,1) " r yy ( p, p)» ¬ ¼
[2.70]
and RyyT (p) is the pu1 autocorrelation column vector of process y: R yy T ( p)
>ryy (0,1)
" r yy (0, p)
@
[2.71]
Considering the values of the prediciton parameters, and incorporating equations [2.62] and [2.63], the minimum quadratic error E p can be expressed by: p
Ep
¦ y 2 (k ) ¦ a i ¦ y (k ) y (k i ) k
i 1
[2.72]
k
In the following section, we will describe the bounds of summation imposed on criterion J. This description leads us to two classical methods, namely the autocorrelation method and the covariance method. 2.2.6.1. The autocorrelation method The introduction of the autocorrelation method is generally attributed to Markel and Gray [21]. This method consists of taking the summation limits in equations [2.63]-[2.72] to be f and f . Equations [2.64] and [2.72] are thus modified to: r yy (i, j )
f
¦ y (k i) y (k j )
[2.73]
k f
p
Ep
r yy (0,0) ¦ a i r yy (0, i ) i 1
[2.74]
66
Modeling, Estimation and Optimal Filtering in Signal Processing
Equation [2.73] defines the autocorrelation function of the real signal y (k ) . Moreover, if we assume that the signal is stationary, the elements of the pup matrix R yy ( p ) and those of the vector R yy T ( p ) are such that: ryy i, j
ryy i j
[2.75]
The autocorrelation function also verifies the property of symmetry: ryy ( j )
ryy ( j )
[2.76]
Rewriting the matrix of equation [2.67] gives:
R yy ( p )
ryy (1) ª ryy (0) « r (1) r yy (0) « yy « # % « « ryy ( p 2) « ryy ( p 1) ryy ( p 2) ¬
" ryy ( p 2) % %
%
% "
% ryy (1)
ryy ( p 1) º ryy ( p 2)»» » # » ryy (1) » ryy (0) »¼
[2.77]
This symmetric matrix has a peculiar structure, called the Toeplitz structure, in which all diagonal elements are equal. This property can be used to develop several algorithms for the resolution of equation [2.68]. When applying this method to the estimation of the parameters of the signal model, we always consider a finite number N of samples of the signal y (k ) . Thus: y (k )
0 for k 0 and k ! N 1
This changes equation [2.64] to: r yy i, j r yy i j
N 1 i j
¦ yk yk i j
[2.78]
k 0
If the mean of the driving process u is assumed to be zero, the prediction error is as follows: p
Ep
V u2 ( p ) r yy (0) ¦ a i r yy (i ) i 1
[2.79]
Least Squares Estimation of Parameters of Linear Models
67
where V u2 ( p ) denotes the variance of the white sequence u related to the pth-order AR process. Taking into account equations [2.67], [2.79] and [2.77], we obtain the following matricial relation: r yy (1) ª r yy (0) « r (1) r yy (0) « yy « # # « «¬r yy ( p) r yy ( p 1)
" r yy ( p) º ª 1 º " r yy ( p 1)»» «« a1 »» »« # » % # »« » " r yy (0) »¼ «¬a p »¼
ªV u2 ( p)º « » « 0 » « # » « » «¬ 0 »¼
[2.80]
For a p th-order AR model, the resolution of a system of p+1 linear equations, called the “normal” or “Yule-Walker” equations, will provide the estimation of the a i parameters. The word “normal” arises from the equation we previously established j 1,..., p : wJ wa j
p § · 2¦ y ( k j )¨ y ( k ) ¦ ai y ( k i ) ¸ ¨ ¸ k i 1 © ¹
2¦ y ( k j )ek 0 k
which implies that y(k-j) and e(k) are orthogonal. The resolution of normal equations is the subject of a vast amount of published work. An exhaustive review of this literature would be difficult so we cite the following notable contributions: Robinson [28], Kailath [16], Morf [24], Carayannis [5] and Leroux-Gueguen [18]. The first algorithm derived for a recursive solution on the order of these equations is called the Levinson algorithm [19]. It was subsequently improved by Durbin [10]. 2.2.6.2. Levinson’s algorithm We established in section 2.2.6.1 that the estimation of the parameters of an AR model using the least squares criterion leads to the resolution of the Normal or Yule-Walker equations. This resolution is based on the inversion of the R yy ( p 1) matrix. The Toeplitz structure of this matrix allows us to obtain a simplified procedure. The “traditional” resolution of the system in [2.68], i.e., where a NuN matrix is considered, entails a computational cost of the order N 3 . Levinson’s
68
Modeling, Estimation and Optimal Filtering in Signal Processing
algorithm, which makes it possible to reduce this complexity to N 2 , is based on a recursive approach on the model order. Starting from the solution for an order p, the solution for order p+1 can be deduced. This gives rise to special parameters, called the reflection coefficients, which can be interpreted in many different ways. There exists an alternative to Levinson’s algorithm [25]. At the beginning of the 20 th century, Schur introduced entities called the partial correlations useful for the search of polynomial roots. By the end of the 1940s, Levinson spoke of a set of reflection coefficients for recursive solutions to systems of linear equations. Today, these partial correlations and reflection coefficients are widely used in signal processing2, in applications such as analysis and modeling of signals, synthesis and speech coding, to high resolution spectral analysis. Let us derive Levinson’s algorithm in two steps: first of all, we recall the expressions for the linear predictor and for the AR model; thus, we can concentrate on some results concerning the prediction error. Equation [2.80] can be written as below: r yy (1) ª r yy (0) « r (1) r yy (0) « yy « # # « ( ) ( r p r yy p 1) ¬« yy
^ `
Here, a ip
i 1,..., p
r yy ( p) º ª 1 º " r yy ( p 1)»» ««a1p »» »« # » % # »« » r yy (0) ¼» ¬«a pp ¼» " "
ªV u2 ( p )º « » « 0 » « # » « » ¬« 0 ¼»
[2.81]
corresponds to the AR parameters of a p th-order autoregressive
model. The ith parameter ai in the case of order p is different from the ith parameter
in the case of order p+1. Thus, we will denote them aip and aip 1 respectively.
We will increase the dimension of the autocorrelation matrix by adding one row and one column, changing equation [2.81] as follows:
2 In Appendix E, we present the Schur-Cohn algorithm as well as the equivalence that can be
established between the Schur coefficients and Levinson’s reflection coefficients.
Least Squares Estimation of Parameters of Linear Models
ryy (1) ª ryy (0) « r (1) ryy (0) « yy « # # « « ryy ( p) ryy ( p 1) «ryy ( p 1) ryy ( p) ¬
" " % " "
ryy ( p 1)º ª 1 º ryy ( p 1) ryy ( p) »» «a1p » « » »« # » # # »« » ryy (0) ryy (1) » «a pp » ryy (1) ryy (0) »¼ «¬ 0 »¼ ryy ( p)
ªV u2 ( p)º « » « 0 » « # » « » « 0 » « D » ¬ p ¼
69
[2.82]
and, including the notation introduced in equation [2.77]: ª1º «a p » « 1» R yy p 2 « # » « p» «a p » «¬ 0 »¼
ªV u2 ( p )º » « « 0 » « # » » « « 0 » « D » ¬ p ¼
[2.83]
For equation [2.83] to be compatible with equation [2.81], the term D p must satisfy the following condition:
Dp
p
ryy ( p 1) ¦ aip ryy ( p i 1)
[2.84]
i 1
Since the Toeplitz matrix R yy ( p 2) has a special symmetry, equation [2.83] can be rewritten as follows: ª r yy ( 0 ) « r (1 ) yy « « # « r ( p) yy « « r yy ( p 1 ) ¬
r yy (1 ) r yy ( 0 )
" "
r yy ( p ) r yy ( p 1 )
# r yy ( p 1 ) r yy ( p )
% " "
# r yy ( 0 ) r yy (1 )
r yy ( p 1 ) º ª 0 º r yy ( p ) » « a pp » »« » »« # » # » r yy (1 ) » «« a 1p »» r yy ( 0 ) »¼ «¬ 1 »¼
ª Dp º « 0 » » « » « # » « 0 » « 2 «¬ V u ( p ) »¼
[2.85]
Multiplying equation [2.85] by U p 1 which will be defined later and adding the result to relation [2.83], we obtain:
70
Modeling, Estimation and Optimal Filtering in Signal Processing
ryy (1) ª ryy (0) « r (1) ryy (0) « yy « # # « « ryy ( p) ryy ( p 1) «ryy ( p 1) ryy ( p) ¬
" ryy ( p) ryy ( p 1)º ª§ 1 · § 0 ·º ¨ p ¸» «¨ p ¸ » " ryy ( p 1) ryy ( p) » «¨ a1 ¸ ¨ a p ¸» ¸ ¨ » « # U p1¨ # ¸» % # # ¨ ¸» » «¨ ¸ ryy (1) » «¨ a pp ¸ " ryy (0) ¨ a1p ¸» ¨ ¸» ¸ ¨ ryy (0) »¼ «¬© 0 ¹ " ryy (1) © 1 ¹¼
ªVu2 ( p) U p1D p º » « 0 » « » « # » « 0 » « «D U V 2 ( p)» p 1 u ¼ ¬ p
[2.86]
By choosing U p 1 such that:
D p U p 1V u2 ( p ) 0
[2.87]
equation [2.86] is modified to: ryy (1) ª ryy (0) « r (1) ryy (0) « yy « # # « 1) r p r p ( ) ( yy « yy «ryy ( p 1) ryy ( p) ¬
" " % " "
1 ryy ( p 1)º ª º » « p p» ryy ( p 1) ryy ( p) » «a1 U p1ap » »« » # # # »« p p» ryy (0) ryy (1) » «ap U p1a1 » ryy (1) ryy (0) »¼ «¬ U p1 »¼ ryy ( p)
ªVu2 ( p) U p1D p º « » 0 « » « » # « » 0 « » « » 0 ¬ ¼
[2.88]
Rewriting the Yule-Walker relation [2.81] for order p+1, we obtain: ryy (1) ª ryy (0) « r (1) ryy (0) « yy « # # « r p r ( 1 ) yy ( p ) ¬« yy
" ryy ( p 1)º ª 1 º « p 1 » " ryy ( p ) » «a1 » » »« # » % # » « p 1 » ryy (0) ¼» «¬a p 1 »¼ "
ªV u2 ( p 1)º » « 0 » « » « # » « 0 »¼ «¬
[2.89]
Least Squares Estimation of Parameters of Linear Models
71
Thus, comparing equation [2.88] with [2.89] leads to the following recursive relations: ªa1p 1 º « » « # » «a p 1 » « pp 1 » «¬a p 1 »¼
ªa p U p a pp º « 1 » # « » «a p U a p » p 1 « p » «¬ U p 1 »¼
ªa p º ªa pp º « 1 » « » « # »U « # » p 1 « p » «a p » a « p» « 1 » 0 ¬« 1 ¼» ¬« ¼»
[2.90]
or: a ip 1
a ip U p 1 a ppi 1 1 d i d p
[2.91]
a pp 11
U p 1 .
[2.92]
and:
Moreover, we can deduce the relation between the variances of errors when dealing with model orders p and p+1:
V u2 ( p 1) V u2 ( p) U p 1D p
[2.93]
Equation [2.87] becomes:
V u2 p 1 (1 U 2p 1 )V u2 p
[2.94]
U p 1 is called the reflection coefficient. Since the terms V u2 ( p) and V u2 ( p 1) are necessarily positive, equation [2.94] dictates that U p 1 1 . As opposed to the AR models’ transversal parameters
^ai `i
0, ..., p ,
the reflection coefficients have
physical meaning. For example, for a speech signal, they correspond to the reflection coefficients of the propagation waves along the vocal tract. These reflections are a result of the changes in the impedance due to the transitions in the vocal tract. In seismology, when modeling the Earth’s strata, these coefficients can be understood as the reflection of acoustic waves during a change in the nature of the different subterranean layers.
72
Modeling, Estimation and Optimal Filtering in Signal Processing
2.2.6.3. The Durbin-Levinson algorithm Like the Levinson algorithm, the Durbin-Levinson algorithm is a recursive algorithm on the order of an AR process. Recalling the Yule-Walker equations for a pth-order AR process:
R yy ( p )T ( p )
r yy (1) ª r yy (0) « r (1) r yy (0) « yy « # # « ¬«r yy ( p 1) r yy ( p 2) ª r yy (1) º « r (2) » yy » « « # » « » ¬«r yy ( p)¼»
The
T ( p)
#
>
above
a pp
a pp1
R yy ( p )T ( p ) #
relation "
a1p
" r yy ( p 1) º ªa1p º « » " r yy ( p 2)»» «a 2p » »« # » % # »« » r yy (0) ¼» «a pp » " ¬ ¼
[2.68]
R yy ( p)
can
be
expressed
using
the
vector
@ which stores the AR parameters in reverse order:
r yy (1) ª r yy (0) « r (1) r yy (0) « yy « # # « «¬r yy ( p 1) r yy ( p 2) ª r yy ( p ) º «r ( p 1)» yy » « « » # « » «¬ r yy (1) »¼
" r yy ( p 1) º ª a pp º » « " r yy ( p 2)»» «a pp1 » »« # » % # » »« r yy (0) »¼ « a1p » " ¼ ¬
[2.95]
R #yy ( p)
Rewriting equation [2.68] for a p+1th-order the AR process yields: R yy ( p 1)T ( p 1)
R yy ( p 1)
[2.96]
If we partition the R yy ( p 1) matrix by the pup autocorrelation matrix R yy ( p ) for process y, and if we write R yy ( p 1) as a function of the pu1 column autocorrelation vector R yy ( p ) of process y, we obtain:
Least Squares Estimation of Parameters of Linear Models
ª « « « « ¬«
º ªa1p 1 º » « « # » » «a p 1 » »« p » r yy (0) ¼» «a pp11 » ¼ ¬
R #yy ( p )»»
R yy ( p )
R
# yy ( p )
T
º ª « R ( p) » » « yy » « «r ( p 1)» yy ¼ ¬
73
[2.97]
Equating the different blocks of system [2.97]: ªa1p 1 º ° « » # p 1 ° R yy p « # » R yy p a p 1 R yy ( p ) ° «a p 1 » ° ¬ p ¼ ® ªa1p 1 º ° T« » # p 1 ° R p « # » r yy 0 a p 1 r yy p 1 ° yy «a p 1 » ° ¬ p ¼ ¯
[2.98]
1 Multiplying the first element of system [2.98] by R yy p , the first p coefficients
of the p+1th-order AR process can be written as: ªa1p 1 º « » « # » «a p 1 » ¬ p ¼
1 R yy p R yy ( p) R yy1 p R #yy p a pp11
[2.99]
Given equations [2.68] and [2.95], the above equation can be simplified as follows: ªa1p 1 º « » « # » «a p 1 » ¬ p ¼
ªa1p º ªa pp º « » « » p 1 « # » « # » a p 1 « a p » «a p » ¬ p¼ ¬ 1 ¼
T ( p ) T # ( p)a pp11
[2.100]
Introducing equation [2.100] into the second part of system [2.98], we obtain:
R
>T ( p) T
T # yy ( p )
#
@
( p )a pp11 r yy (0) a pp11
r yy ( p 1)
[2.101]
74
Modeling, Estimation and Optimal Filtering in Signal Processing
or equivalently:
ªr (0) R # ( p) T T # ( p)º a p 1 yy »¼ p 1 «¬ yy
T ª«ryy ( p 1) R #yy ( p) T ( p)º» ¼ ¬
[2.102]
Using the following two equalities:
R
T # # yy ( p ) T ( p )
R Tyy ( p)T ( p )
R
T # yy ( p ) T ( p )
R Tyy ( p)T # ( p)
p
¦ aip ryy (i)
[2.103]
i 1
and: p
¦ aip ryy ( p i 1) ,
[2.104]
i 1
we obtain the following expression for a pp11 :
a pp11
º ª «ryy ( p 1) R Tyy ( p )T # ( p )» ¼ ¬ ryy (0) R Tyy ( p )T ( p )
Ep V u2 ( p )
[2.105]
Observing equations [2.80] and [2.105], we notice that the denominator of the latter corresponds to the variance of the driving process in the case of a pth-order regression model. We now have two recursive relations, i.e., equations [2.100] and [2.105], to perform the identification. We can further simplify this algorithm thanks to a recurrence on the variance V u2 ( p) of the driving process:
V u2 ( p 1) r yy (0) R Tyy ( p 1)T ( p 1)
>
@
T
r yy (0) R Tyy ( p ) r yy ( p 1) T ( p 1)
[2.106]
ªa1p 1 º » « r yy (0) R Tyy ( p ) « # » r yy ( p 1)a pp11 «a p 1 » ¬ p ¼
Substituting the various terms of equations [2.79], [2.92], [2.100] and [2.105], we obtain a modified form of equation [2.106]:
Least Squares Estimation of Parameters of Linear Models
>
@
75
V u2 ( p 1) r yy (0) R Tyy ( p) T ( p) T # ( p)a pp11 r yy ( p 1)a pp11
>
@
r yy (0) R Tyy ( p)T ( p) r yy ( p 1) R Tyy ( p)T # ( p) a pp11
V u2 ( p) a pp11 E p
@V ( p)
2
V u2 ( p) a pp11 V u2 ( p )
>1 U
2 p 1
[2.107]
ª1 a p 1 2 ºV 2 ( p ) p 1 » u «¬ ¼
2 u
The above equation is the same as relation [2.94] obtained for Levinson’s algorithm. This new algorithm is initialized by: a11
r yy (1)
[2.108]
r yy (0)
The Levinson and Durbin-Levinson algorithms introduce the reflection coefficients. We will see in the following section that they define a structure called “lattice” filters. 2.2.6.4. Lattice filters Let us recall that equation [2.5], given below, presents the AR process as a linear forward prediction of y (k ) , based on its last p values with an error term u f , p ( k ) : p
y (k )
¦ a ip y ( k i ) u f , p ( k )
yˆ f , p ( k ) u f , p ( k )
i 1
T
H p (k ) T ( p ) u
f ,p
[2.5]
(k )
Let us consider a backward linear prediction of y (k p 1) . This is a prediction based on future samples y (k p) , …, y (k 1) : y (k p 1)
p
¦ aip y(k p 1 i) u b, p (k ) i 1
yˆ b, p (k p 1) u b, p (k ) H p (k ) T T # ( p) u b, p (k )
[2.109]
76
Modeling, Estimation and Optimal Filtering in Signal Processing
The term yˆ f , p (k ) of equation [2.5] is expressed as:
yˆ f , p (k )
ª a1p º » « H p 1 (k ) T « # » a pp y (k p ) «a p » ¬ p 1 ¼
[2.110]
If we insert equation [2.100] adjusted to order p-1 and the reflection coefficient U p of equation [2.92] and [2.109], we obtain: yˆ f , p (k )
H
p 1 ( k )
T
H
p 1 ( k )
T
>T ( p 1) T
#
@
( p 1)a pp a pp y (k p)
>
T ( p 1) a pp H p 1 (k ) T T # ( p 1) y (k p )
>
yˆ f , p 1 (k ) U p yˆ b, p 1 (k p) y (k p )
@
@
[2.111]
This leads to the first recurrence between the error terms u f , p (k ) , u f , p 1 (k ) and u b, p 1 (k ) : u f , p (k )
y (k ) yˆ f , p (k )
u f , p 1 (k ) U p u b, p 1 (k p )
[2.112]
Similarly, we can determine the second recurrence by writing yˆ b, p (k p 1) as: H p (k ) T T # ( p)
yˆ b, p (k p 1)
ªa pp1 º » « y (k 1)a pp H p 1 (k 1) T « # » « ap » ¬ 1 ¼
[2.113]
Substituting equation [2.100], we obtain: yˆ b, p (k p 1) # y (k 1)a pp H p 1 (k 1)T ª¬T ( p 1) T ( p 1)a pp º¼
[2.114]
H p 1 (k 1)T T ( p 1) a pp ª¬ H p 1 (k 1)T T ( p 1) y (k 1) º¼ #
By adjusting equation [2.114] at instant k p and using equations [2.5] and [2.109], we obtain:
Least Squares Estimation of Parameters of Linear Models
yˆ b, p (k p )
H
p 1 ( k )
T
>
T # ( p 1) a pp H p 1 (k ) T T ( p 1) y (k )
>
yˆ b, p 1 (k p) U p yˆ f , p 1 (k ) y (k )
@
@
77
[2.115]
The second recurrence between error terms u f , p 1 (k ) , u b, p (k p) and u b, p 1 (k p ) is thus:
u b, p (k p )
u b, p 1 (k p) U p u f , p 1 (k )
[2.116]
Finally, the recursive equations [2.112] and [2.116] can be represented as a lattice structure (see Figure 2.2).
+
+ Figure 2.2. The basic cell of a lattice structure
2.2.6.5. The covariance method Developed by Atal and Hanauer [4], the covariance method finds an equivalent in the context of identification of systems, namely the Mehra covariance algorithm [22]. This parallel will be exploited in Chapter 6 for the development of new approaches for signal enhancement. Let us consider the case where the summation values on criterion [2.62] lie within the interval p d k d N 1 . Equation [2.64] is modified to: r yy (i, j )
N 1
¦ y (k i ) y (k j )
k p
[2.117]
78
Modeling, Estimation and Optimal Filtering in Signal Processing
and we still have: r yy (i, j )
[2.118]
r yy ( j , i )
However, the condition of equality of r yy (i 1, j 1) and r yy ( j , i ) is no longer respected. We have: ryy (i 1, j 1) N 1
¦ y(k (i 1))y(k ( j 1))
k p
y( p i 1) y( p j 1)
N 1
¦ y(k 1 i) y(k 1 j)
[2.119]
k p 1
l k 1
y( p i 1) y( p j 1)
N 1
¦ y(l i)x(l j) y(N 1 i)x(N 1 j)
l p
y( p i 1) y( p j 1) ryy (i, j) y( N i 1) y( N j 1)
This relationship signifies that the diagonal terms are no longer equal. Thus, R yy ( p ) is still a symmetric matrix, but is no longer Toeplitz. 2.2.6.6. Relation between the covariance method and the least squares method While comparing the two methods, we will see that inspite of the different notations, the approaches are equivalent. Recall that the observation or measurement is written as follows: y (k )
H Tp (k )T u (k ) , k
[2.5]
with: H p (k )
> y(k 1)
" y (k p )@T
[2.2]
and:
T
> a1
" ap
@T .
[2.3]
If we have N available measurements, y (0),..., y ( N 1) , we obtain: Y N ( N 1)
H N ( N 1)T U N ( N 1)
[2.120]
Least Squares Estimation of Parameters of Linear Models
79
with: U N ( N 1)
>u ( N 1)
" u ( 0) @ T
[2.121]
In equation [2.12], we derived an expression for the least squares estimation. The estimation for a noise-free case is given by:
Tˆ N
>H
T N
( N 1) H N ( N 1)
@
1
H NT ( N 1)Y N ( N 1)
[2.122]
thus:
>H
N
@
( N 1) H NT ( N 1) Tˆ N
H NT ( N 1)Y N ( N 1)
[2.123]
The criterion to be minimized, denoted J N (T ) , is given by: J N (T )
>Y N ( N 1) H N ( N 1)T @ T >Y N ( N 1) H N ( N 1)T @ .
[2.124]
Replacing Y N ( N 1) by Y N p ( N 1) and H N ( N 1) by H N p ( N 1) in equation [2.124], we get the covariance method [20]: Y N p ( N 1)
>y( N 1)
" y ( p)@T
H N p ( N 1)
ª H T ( N 1)º « p » # « » « H T ( p) » p ¬ ¼
[2.125]
and: ª y ( N 2) " y ( N p 1)º » «« # % # » «¬ y ( p 1) " »¼ y (0)
[2.126]
80
Modeling, Estimation and Optimal Filtering in Signal Processing
We end up with:
J N p (T )
ª y ( N 1) H T ( N 1)T º p « » # « » « y ( p ) H T ( p )T » p ¬ ¼
¦ >y(k ) H Tp (k ) T @
N 1
T
ª y ( N 1) H T ( N 1)T º p « » # « » « y ( p) H T ( p )T » p ¬ ¼
2
[2.127]
k p
N 1 ª
p º ¦ «« y(k ) ¦ ai y(k i)»» k p¬ i 1 ¼
2
Substituting Y N p ( N 1) for Y N ( N 1) and H N p ( N 1) for H N ( N 1) in equation [2.123], and using equation [2.117], we obtain the following equality: ª r yy (1,1) " r yy (1, p ) º « »ˆ % # « # »T «r yy ( p,1) " r yy ( p, p )» ¬ ¼
ª r yy (0,1) º « » # « » «r yy (0, p )» ¬ ¼
[2.128]
or: R yy ( p, p)Tˆ
R yy ( p)
[2.129]
Taking into account the two notations, we thus demonstrate the equivalence between the covariance method and the least squares method. Now, to compare both methods, we will use the following notations: Y N p ( N p 1)
>0
" 0
y ( N 1) " y (0)@ T
[2.130]
Least Squares Estimation of Parameters of Linear Models
H
Np
( N p 1)
ª H T ( N p 1) º « p » # « » « » H Tp ( N ) « » T « H p ( N 1) » « » # « » T « » H p ( p) « » T « H p ( p 1) » « » # « » T « » H p (1) « » T H p (0) «¬ »¼ " 0 0 ª « # % # « « y ( N 1) " y ( N p 1 ) « y(N p) « y ( N 2) " « # % # « y p y " ( 1 ) ( 1) « « y ( p 2) " y (0) « # % # « « y " ( 0 ) 0 « " 0 0 ¬«
81
y ( N 1) # y(N p)
º » » » » y ( N p 1) » » # » y (0) » » 0 » # » » 0 » 0 ¼»
[2.131]
If we place the above notations in equation [2.17] for the estimator of the least squares method, with y(k) = 0 for k < 0 and k > N–1 we obtain: T
J N p (T )
ª y( N p 1) H T ( N p 1)T º p « » # « » « » y(0) H Tp (0)T ¬ ¼ N p 1
¦ >y(k ) >y(k 1)
ª y( N p 1) H T ( N p 1)T º p « » # « » « » y(0) H Tp (0)T ¬ ¼
" y(k p)@T @ 2
[2.132]
k 0
N p 1 ª
¦
k 0
º « y(k ) ¦ai y(k i)» «¬ »¼ i 1 p
2
Similarly, replacing Y N ( N 1) with Y N p ( N p 1) and H N ( N 1) with H N p ( N p 1) in equation [2.123], or by using equations [2.75] and [2.78], it
follows that:
82
Modeling, Estimation and Optimal Filtering in Signal Processing
ª ryy (0) " r yy ( p 1)º « » ˆ # % # « »T «r yy ( p 1) " » r ( 0 ) yy ¬ ¼
ª r yy (1) º « » « # » «r yy ( p)» ¬ ¼
[2.133]
or: R yy ( p, p)Tˆ
R yy ( p)
[2.134]
2.2.6.7. Effect of a white additive noise on the estimation of AR parameters In many cases, the signal, modeled by a p th order AR process y k , is disturbed by an additive measurement noise bk . The expression for the observations or measurements is thus modified to (see Figure 2.3): z k
y k bk
This is the case, for example, in the field of speech enhancement [23]. In this section, we will analyze the influence of a measurement noise bk , assumed to be white, Gaussian, with zero-mean and variance V b2 . This additive noise provides a biased estimation of the AR parameters. In the next section, 2.2.6.8, we will present different approaches to alleviating this problem, i.e. to obtain unbiased estimates.
Figure 2.3. Representation of an AR process disturbed by an additive noise
To analyze the effect of bk on the estimation of the AR parameters, Kay has proposed to make the comparison between the spectral flatnesses, [ y and [ z , of the process y k and z k respectively [17].
Least Squares Estimation of Parameters of Linear Models
83
Recall that for any given process x, this indicator of spectral flatness is defined as follows:
[x
1/ 2 exp§¨ ³ ln S xx f df ·¸ © 1/ 2 ¹ 1/ 2
³1 / 2
S xx f df
Px
R xx 0
[2.135]
with S xx Z and R xx W being, respectively, the spectral density and the autocorrelation function of x. This indicator has the following properties: – 0 d [ x d 1; –[x
1 if and only if S xx f is constant;
– [ x | 0 if and only if the profile of S xx f shows a peak. The flatness indicators [ z and [ y satisfy the following inequality relationship3:
[z ![y
[2.136]
This inequality makes it possible to show that an additive white noise tends to “flatten” the spectral density of the considered process. The least squares estimation of the AR parameters is biased if it is based directly on observations disturbed by a white additive noise. The corresponding poles of the noisy AR process tend to be located closer to the center of the unit circle in the z-plane. This is shown in Figure 2.4.
3 Kay’s demonstration of this inequality relationship will be considered in Appendix E.
84
Modeling, Estimation and Optimal Filtering in Signal Processing
Figure 2.4. Influence of an additive noise on the estimation of the parameters of a 2nd order AR process. The SNR varies from 30 to –5dB in steps of 3dB (u denote the poles)
Least Squares Estimation of Parameters of Linear Models
85
2.2.6.8. A method for alleviating the bias on the estimation of the AR parameters To reduce the bias on the AR parameter estimation, one solution is to avoid the use of rzz 0 , i.e. the autocorrelation function of the noisy AR process for lag equal to 0; we get modified or overdetermined Yule-Walker (MYW) equations4 [6], [11], [12]: rzz ( p 1) ª rzz ( p) « r ( p 1) rzz ( p) « zz « # # « ¬rzz (2 p 1) rzz (2 p 2)
" rzz (1) º ª a1 º « » " rzz ( p 1)»» « a 2 » »« # » % # »« » " rzz ( p ) ¼ ¬«a p ¼»
ª rzz ( p 1) º « r ( p 2) » » « zz » « # » « ¬ rzz (2 p ) ¼
[2.137]
As an alternative, we can use the expanded and so-called overdetermined YuleWalker equations, with q>p: rzz ( p 1) ª rzz ( p ) « r ( p 1) rzz ( p) « zz « # # « ¬rzz ( p q 1) rzz ( p q 2)
" rzz (1) º ª a1 º « » " rzz ( p 1)»» « a 2 » »« # » % # »« » " rzz ( p) ¼ «¬a p »¼
ª rzz ( p 1) º « r ( p 2) » » « zz « » # « » ¬rzz ( p q )¼
[2.138]
This new approach entirely foregoes the use of rzz 0 , and is instead based on a system of q equations with p(< q) unknowns. Yet another alternative approach consists of the introduction of “noisecompensated” Yule-Walker equations which, however, require an estimation of the additive noise’s variance V b2 , labeled Vˆ b2 [17]: ° ° ® ° ° ¯
rzz (1) ª rzz (0) « r (1) rzz (0) « zz « # # « ¬rzz ( p 1) rzz ( p 2)
" rzz (1 p) º " rzz (2 p)»» Vˆ b2 I p » % # » " rzz (0) ¼
½ª a1 º °« » °« a 2 » ¾« » °« # » °«a p » ¿¬ ¼
ª rzz (1) º « r (2) » « zz » « # » « » ¬rzz (q )¼
[2.139]
4 It should be noted that the MYW equations are used in the ivar function of Matlab’s Identification toolbox. In Chapter 7, we will elaborate upon Friedlander’s interpretation of the MYW equations as an instrumental variable technique [11].
86
Modeling, Estimation and Optimal Filtering in Signal Processing
ª a1 º «a » >rzz ( p i 1) rzz ( p i 2) " rzz (i)@ «« #2 »» rzz ( p i) « » «¬a p »¼
[2.140]
with i=1,…, q Given equation [2.139] the AR parameters can be estimated as follows: ( R zz p V b2 I p ) 1 R zz p
Tˆ LSC
[2.141]
The equation above is nonlinear due to the presence of the generally-unknown V b2 . It can be solved using the following two techniques: – the first consists of the iterative and alternative estimation of parameters T and variance V b2 . Examples of this technique are found in the approach proposed by Zheng et al. [30] and Hasan et al. [14]; – the second technique consists of considering equation [2.141] as a generalized eigenvalue decomposition issue [8]. Let us look at these two algorithms in more detail. The estimation of T (or V b2 ) using equation [2.141] and conditional to V b2 (or T ) is a linear task. Zheng et al. [30] therefore present an iterative approach which alternatively estimates T and V b2 . For this purpose, they introduce an extended vector of the AR parameters:
Te
>T
T
@
T
0 .
[2.142]
The corresponding least squares estimation bias of T e satisfies: e E ® Tˆ LS ½¾ T e ¯ ¿
e V b2 R zz
1
Te
[2.143]
e where R zz denotes the (p+1)u(p+1) autocorrelation matrix of the observations.
The least squares estimation of the parameters is thus corrected as follows:
Te
e
Tˆ LS V b2 R zze
1
Te
[2.144]
Least Squares Estimation of Parameters of Linear Models
>0
Multiplying both sides of equation [2.144] by P
P T eLS
P T e P V b2 R zze
1
Te
e P V b2 R zz
1
87
" 0 1@ gives:
Te.
[2.145]
From equation [2.145], it follows that the variance V b2 can be expressed as follows:
V b2
e P Tˆ LS
P R zze
1
[2.146]
Te
Zheng et al. use equalities [2.144] and [2.146] in an iterative manner, to estimate the AR parameters and the variance of the additive noise. This is illustrated in Figure 2.5 below. The iterative process stops when the estimations at iterations i 1 i i 1 and i, denoted, respectively, Tˆ and Tˆ , are close to one another and satisfy: i1
Tˆ Tˆ i Tˆ
where
i 1
G
[2.147]
stands for the standard L2 and G ranges from 10-6 to 10-3 depending on
.
the desired precision. It should be noted that faster versions of this algorithm are presented in [29] [31].
¨
¸ ¨
¨
¸
¸
¨
¸
Figure 2.5. Block-level schematic of the Zheng method [30]
88
Modeling, Estimation and Optimal Filtering in Signal Processing
Hassan et al. present another method, initially developed for the analysis of vectorial AR processes [14]. We will use this method for a scalar AR process. In [14], the authors derive a second relationship between T and V b2 , using the output p
t k of the inverse filter Az
¦ ai z 1
whose input is z k (see Figure 2.6). The
i 0
autocorrelation function rtt W of t k thus satisfies the following equation: rtt 1 E ^ t k t k 1 ` V b2
p
¦ a i ai 1
[2.148]
i 1
Figure 2.6. Block-level scheme used in Hasan’s method [14]
To estimate T and V b2 , the authors use an iterative method which operates as follows: – rˆtt 1 i is first determined by filtering z k with the inverse filter, defined by i
Tˆ ; – the nonlinear equations [2.141]-[2.148] of the unknown V b2 are resolved using
the Newton-Raphson procedure, and the value of variance Vˆ b2
i 1
is updated;
i 1 – the corrected estimation Tˆ of the AR parameters is deduced from [2.141].
Davila et al. see the solution of the noise-compensated Yule-Walker equations [2.141] as a generalized eigenvalue decomposition issue [8]. In fact, the authors start with an overdetermined system by adding q equations to the system in [2.141], leading to: R zz T
LSC
V b2 B T
LSC
,
[2.149]
Least Squares Estimation of Parameters of Linear Models
89
with: rzz (0) rzz (1) ª rzz (1) « r (2) ( 1 ) r ry (0) zz « zz « # # # « rzz ( p 1) rzz ( p 2) « rzz ( p) « rzz ( p 1) rzz ( p ) rzz ( p 1) « # # # « «r ( p q ) r ( p q 1) r ( p q 2) zz zz ¬ zz
Rzz
B
ª0 1 0 «# 0 1 « «# # % « «0 0 " «0 " " « «# «0 " " ¬
" rzz ( p 1) º " rzz ( p 2)»» » % # » rzz (0) » , " rzz (1) » " » % # » rzz (q) »¼ "
" 0º % # »» % 0» » 0 1» " 0» » #» " 0»¼
[2.150]
[2.151]
and:
T
LSC
[1 T TLSC ]T
[2.152]
Checking equation [2.149], the variance V b2 can be seen as a generalized eigenvalue of the matrices R zz and B. The eigen-subspace corresponding to the vector of the AR parameters is found by solving the following equivalent quadratic equation:
§ A V 2 A V 2 ¨ 0 b 1 b ©
with A 0
T R zz R zz , A 1
2
A 2 ·¸ T ¹
LSC
[2.153]
0 p 1
R zzT B B T R zz and A 2
BT B .
The eigenvalues of equation [2.153] are thus complex conjugates. In a noiseless environment, the eigenvalue is theoretically zero because the variance of the additive noise is zero. For a noisy case, the noise variance is added to all the eigenvalues of R zz and the solution of the equation corresponds to the only real value of V b2 . To solve equation [2.153], we need to solve the following generalized eigenvalue decomposition problem:
90
Modeling, Estimation and Optimal Filtering in Signal Processing
P v V b2 Qv
where v
ª T LSC º « 2 », P ¬«V b T LSC ¼»
[2.154] ª A0 «0 ¬ p 1
0 p 1 º and Q I p 1 »¼
ª A1 «I ¬ p 1
A2 º . 0 p 1 »¼
In practice, however, the estimation of the autocorrelation function of y k contains computation errors. Consequently, the eigenvalue we obtain is not always real. Nevertheless, as its imaginary component is much smaller than the imaginary components of the other eigenvalues of R zz , Davila et al. choose the subspace associated with the eigenvalue of the smallest module. Approaches other than those based on noise-compensated Yule-Walker equations have been proposed, the most notable being that of Deriche et al. [9]. In this approach, the AR parameters are estimated using an iterative expectationmaximization (EM) type algorithm. Finally, Hasan et al. model the autocorrelation function of noisy observations using a sum of exponentially-decreasing sinusoidal components (EDS) [15]. The EDS parameters are then used to estimate the AR parameters. Recently, we have proposed a new way to estimate the AR parameters, the variances of the additive noise and the driving process by using the errors-invariables approach. This method was derived in the field of speech enhancement5 and Rayleigh fading channel estimation6. In the following, we will present a comparative study of these methods, based on N observations, using two different tests.
Test 1: the synthesized AR process is characterized by the following six poles in the z-plane: 0.7 expr j 0.2S , 0.8 expr j 0.4S and 0.85 expr j 0.7S . This process is then disturbed by a zero-mean white Gaussian noise such that the signal-to-noise ratio is 10 dB. Five different methods are closely examined: the Yule-Walker equations, the modified Yule-Walker equations, and the three biascorrection algorithms proposed by Zheng [30], Davila [8] and Hasan [14]. As 5 Speech Enhancement Combining Optimal Smoothing and Errors-In-Variables Identification
of Noisy AR Processes W. Bobillet, R. Diversi, E. Grivel, R. Guidorzi, M. Najim and U Soverini, IEEE Trans. on Signal Processing, Dec. 2007, vol. 55, n°.12, pp. 5564-5578. 6 “Errors-In-Variables Based Approach for the Identification of AR Time-Varying Fading Channels”, A. Jamoos, E. Grivel, W. Bobillet and R. Guidorzi, IEEE Signal Processing Letters, Nov. 2007, vol.14 , no.11, pp. 793-796.
Least Squares Estimation of Parameters of Linear Models
91
Figure 2.7 shows, when the number of available samples is high, several thousand for instance, all five methods give accurate estimations of the AR parameters. Test 2: to test the above algorithms in more realistic conditions, we study their behavior when the number of observations is limited. Let us take 300 samples of an AR process characterized by the following six poles: p1,2
0.98 exp r j 0.1S p3,4
,
0.97 exp r j 0.3S and p5,6
0.8 exp r j 0.84S .
As in Test 1, the signal is disturbed by a white, zero-mean Gaussian noise such that the SNR is 10 dB. As Figure 2.8 shows, Zhang and Davila’s methods give AR parameters which can render the system unstable, since the corresponding poles are outside the unit circle in the z-plane.
Expected poles
Obtained spectrum Expected spectrum
Obtained poles
(b)
(c)
92
Modeling, Estimation and Optimal Filtering in Signal Processing
(d)
(e) Figure 2.7. Test 1: high number of observations. Location of the poles in the z-plane and the associated AR spectra for different off-line and recursive approaches: Yule-Walker (a), MYW (b), Zheng (c), Davila (d), Hasan (e)
(a)
(b)
Least Squares Estimation of Parameters of Linear Models
93
(c)
(d)
(e) Figure 2.8. Test 2: limited number of observations. Location of the poles in the z-plane and the associated AR spectra for different off-line and recursive approaches. Noiseless Levinson Yule-Walker (a), MYW (b), Zheng (c), Davila (d), Hasan (e)
2.2.7. Generalized least squares method
As we saw in section 2.2.1, if the noise is correlated to the observation, the estimation of the parameters is biased. Even if the noise is an uncorrelated white noise, the estimation is generally biased because the ARMA structure modifies this measurement noise to a correlated sequence [7]. Taking up the basic noiseless ARMA model described by the recursive equation, where, for the sake of simplicity, we consider orders p and q to be equal: y (k )
p
¦ i 1
a i y (k i )
p
¦ bi u ( k i ) i 0
[2.155]
94
Modeling, Estimation and Optimal Filtering in Signal Processing
First, let us take the case, shown in Figure 2.9, in which the observations z (k ) are disturbed by an additive noise b(k ) : z (k )
y ( k ) b( k )
[2.156]
Figure 2.9. Output-referred noise
As in section 2.2.6.7, by replacing y (k ) by z (k ) b(k ) in equation [2.155], we obtain: z ( k ) b( k )
p
¦
a i >z (k i ) b(k i )@
i 1
p
¦ bi u(k i)
[2.157]
i 0
Equivalently: z (k )
p
p
i 1
i 0
¦ ai z (k i) ¦ bi u(k i) E (k )
[2.158]
where E (k ) is a correlated process such that:
E (k )
p
¦ a i b( k i )
with a 0
1
[2.159]
i 0
Let us consider a second case, shown in Figure 2.10, in which an additive white noise w(k ) acts on the input u (k ) .
Figure 2.10. Input-referred noise
Least Squares Estimation of Parameters of Linear Models
95
Adding this noise to the process u (k ) of equation [2.155], we obtain: y (k )
p
p
i 1
i 0
¦ a i y(k i) ¦ bi >u(k i) w(k i)@
[2.160]
Equivalently: y (k )
p
p
i 1
i 0
¦ ai y(k i) ¦ bi u(k i) Y (k )
[2.161]
where Y (k ) is a correlated noise such that:
Y (k )
p
¦ bi w(k i)
[2.162]
i 0
Thus, irrespective of the case, adding a noise to the input or to the output of the system corresponds to an ARMA model disturbed by a correlated noise. In the rest of this analysis, we will consider the case of a colored disturbance
H(k):
y (k )
p
p
i 1
i 0
¦ ai y(k i) ¦ bi u(k i) H (k )
[2.163]
The z transform of equation [2.163] gives: A(z)Y ( z )
B( z )U ( z ) (( z )
[2.164]
where Y (z ) , U (z ) and ((z ) are, respectively, the z transforms of y (k ) , u (k ) and H (k ). In addition, A(z) and B(z ) are defined as follows: A(z) 1
p
¦ a i z i
[2.165]
i 1
p
B(z)
¦ bi z i i 0
[2.166]
96
Modeling, Estimation and Optimal Filtering in Signal Processing
Let us assume that the residue H (k ) can be modeled by an AR process: p
H ( k ) ¦ c i H ( k i ) e( k )
[2.167]
i 1
where e(k ) is a white noise whose z transform E (z ) is such that: (( z ) E( z)
1 C ( z)
1 p
¦ ci z
[2.168] i
i 0
Taking equation [2.168] into account, we can rewrite equation [2.164] as: A(z)C (z)Y ( z )
B( z )C (z)U ( z ) E ( z )
[2.169]
Equivalently: A(z)>C (z)Y ( z )@ B ( z )>C (z)U ( z )@ E ( z )
[2.170]
or: B ( z )U a ( z ) E ( z )
A(z)Ya ( z )
[2.171]
These equations are equivalent to the following temporal relations:
y a (k )
p
p
i 1
i 0
¦ a i y a (k i ) ¦ bi u a (k i ) e(k )
[2.172a]
p
y a (k )
¦ ci y ( k i )
[2.172b]
i 0 p
u a (k )
¦ c i u (k i )
[2.172c]
i 0
The input u a (k ) (or output y a (k ) ) of this model can be seen as the signal u (k ) (or y (k ) ) filtered by a filter with transfer function C (z) .
Least Squares Estimation of Parameters of Linear Models
97
Let us look at the estimation of the coefficients ci of equation [2.167]. To do so, we can apply the standard least squares method. If we use the measurements carried out from k N p 1 to k , we obtain the following matricial relationship:
H (k ) ª º « » # « » «¬H (k N p 1)»¼
H (k p ) º ª c1 º " ª H (k 1) « »« # » # % # « »« » «¬ H (k N p ) " H (k N 1)»¼ «c p » ¬ ¼ e( k ) º ª » « # « » «¬e(k N p 1)»¼
Writing:
>H (k )
H N p (k )
" H (k N p 1)@ T ,
H p T (k i)
>H (k i)
" H (k i p 1)@
H e, N p ( k )
ª H T (k 1) º p « » # « » « H T (k N p )» ¬ p ¼
and: H (k p) º " ª H (k 1) » « # % # » « «¬ H (k N p) " H (k N 1)»¼
We can estimate the coefficients c i , 1 d i d p , as follows: ª c1 º « » « # » «c p » ¬ ¼
>H
@
1 T T H e , N p ( k )H N p ( k ) e , N p ( k )H e , N p ( k )
[2.173]
However, the terms H (i ), k d i d k N 1 are unknown and have to be estimated. To do this, we use the procedure introduced in [7]. To simplify the equations, we will adopt the following notation hereafter: Y a, N (k )
> y a (k )
" y a (k N 1)@ T
98
Modeling, Estimation and Optimal Filtering in Signal Processing
> y a (k 1)
H a (k )
H a, N (k )
ª H a T (k ) º » « # » « « H T (k N 1)» ¼ ¬ a
> y (k )
Y N (k )
" y (k N 1)@ T
> y (k 1)
H (k )
" y a (k p ) u a (k ) " u a (k p )@ T
" y ( k p ) u ( k ) " u ( k p )@ T
H N (k )
ª H T (k ) º » « # » « « H T (k N 1)» ¼ ¬
E N (k )
>e(k )
" e(k N 1)@ T
and:
T
>a1
" ap
b0
" bp
@T
Equations [2.163] and [2.172a] can be written respectively as follows: H (k ) T T H (k ) and y a (k )
y (k )
H a ( k ) T T e( k )
If we handle N measurements, we can write, in matricial form: Y N (k )
H N (k )T H N (k )
[2.174]
and: Y a , N (k )
H a , N (k )T E N (k )
[2.175]
– first, we estimate vector Tˆ N using yˆ a (i ) and uˆ a (i ) for k d i d k N 1
Tˆ N
>Hˆ
a, N
T
(k ) Hˆ a , N (k )
@
1
Hˆ a , N T (k )Yˆ a , N (k )
Least Squares Estimation of Parameters of Linear Models
99
– next, we estimate the residual vector Hˆ N (k ) using equation [2.174]: Hˆ N (k ) Y N (k ) H N (k )Tˆ N
[2.176]
– then, we can construct Cˆ ( z ) starting from Hˆ N (k ) as follows: ª cˆ1 º « » «# » «cˆ p » ¬ ¼
>Hˆ
@
1 T ˆ N p (k ) Hˆ TN p (k )Hˆ N p (k ) N p ( k )H
– we next filter k N 1 d i d k
y (i ) and u (i )
so as to obtain
[2.173]
yˆ a (i ) and uˆ a (i ) for
– finally, yˆ a (i ) and uˆ a (i ) allow us to determine the new estimation Hˆ a , N (k ) , which is in turn used to determine the new expression for Tˆ N and then we conduct the same operation again starting with the first step. 2.2.8. The extended least squares method
Let there be a linear system with an additive noise acting upon its output or referred to its input. We saw that the structure of the ARMA model generally contains a correlated noise [ (k ) . Thus: p
y (k ) ¦ a i y (k i ) i 1
p
¦ bi u 0 (k i) [ (k )
[2.177]
i 0
with:
[ (k )
p
¦ c e( k i ) i
with c0
1
[2.178]
i 0
Carrying out the z transform on equation [2.177] gives: A(z)Y z
B( z )U ( z ) C ( z ) E ( z )
[2.179]
To start with, let us suppose that e(k ) is known. We can write the above equation as:
100
Modeling, Estimation and Optimal Filtering in Signal Processing
y (k )
H ( k ) T T e( k )
[2.180]
where:
> y(k 1)
H (k )
and T
>a1
" ap
" y(k p) u(k ) " u(k p) e(k 1) " e(k p)@ T b0
" bp
c1 " c p
@T .
If e(k ) is in fact known, we can use the recursive least squares method. If, on the other hand, e(k ) is an unknown, we can proceed as follows: – we first calculate the residue: eˆ(k )
y (k ) Hˆ (k ) T Tˆ(k 1)
[2.181]
where: Hˆ (k )
> y(k 1)
T " y(k p) u(k 1) " u(k p) eˆ(k 1) " eˆ(k p)@
– we then use the least squares estimator: K (k )
T P (k 1) Hˆ (k ) ª1 Hˆ (k ) P (k 1) Hˆ (k )º «¬ »¼
Tˆ(k ) Tˆ(k 1) K (k )eˆ(k ) P(k )
ª I K (k ) Hˆ T (k )º P (k 1) »¼ «¬
1
[2.182] [2.183] [2.184]
This algorithm goes by several names: the Panuska algorithm, the extendedmatrix method, the approximate maximum likelihood (AML) method. It is used in a wide range of applications. 2.3. Selecting the order of the models
The topic of the determination of the order of the models is a difficult one, without any straightforward solution. We will thus raise it here, and illustrate it with some examples.
Least Squares Estimation of Parameters of Linear Models
101
Which order is sufficient to effectively understand the behavior of a signal? This question is an ambiguous one and depends on the definition of “sufficient”. As it was rightly stated by J. Makhoul in [20], the model itself is improved as its order increases. Nevertheless, where do we stop with this increase in the order? If we normalize the coefficients r yy (i, j ) of the covariance matrix by dividing them by ryy (0) , we obtain the “normalized” error: Vp
Ep
[2.185]
r yy (0)
According to Ulrych et al., the order should be smaller than N/2, where N is the number of available samples [27]. If the order is too high, we carry out an “adjustment” whereby the parameters ai tend towards 0 if k o f [13]. The existent criteria for the order estimation use the prediction error. Their disadvantage stems from the fact that this error diminishes as the order increases. However, beyond a certain point, this diminution tapers off. Akaike developed the following two criteria: the Final Prediction Error (FPE) and the Akaike Information Criterion (AIC), in [1] and [2] respectively. They are defined as: N p 1 N p 1
[2.186]
2p N
[2.187]
FPE ( p)
Ep
AIC ( p )
log( E p )
where p is the order which reduces the criteria to a minimum, N is the number of samples, and E p is the sum of all the prediction errors. The original expression that Akaike provided was: AIC ( p )
2 log(Max. likelihood)
2p N
[2.188]
For Gaussian processes, this expression is reduced to the previous one. Since the number of available samples is limited by the width of the analysis window, a correction factor has to be added. This correction, in the case of a Hamming
102
Modeling, Estimation and Optimal Filtering in Signal Processing
window, consists of taking an effective number of samples N e signals, the order is generally overestimated [13].
0.4 N . For noisy
Figure 2.11 shows an example in which the the order of an AR(2) model is estimated. The noise u(k) is a zero-mean white Gaussian noise with a variance of one. The model parameters are a1 0.2 and a 2 0.7 . 1100
7
1000
6.9 6.8 AIC(p)
FPE(p)
900 800
6.7 6.6
700
6.5
600
6.4 6.3
500
0
2
4 6 Ordre (p)
Order (p)
8
10
0
2
4 6 Ordre (p)
8
10
Order (p)
Figure 2.11. Example: estimating the order of the model
It is generally accepted that for speech signals, the order of the model is between 10 and 16. This allows the modeling of the first formants of the speech signal, the resonances of the vocal tract. It was seen in Chapter 1 that the p-th order AR model allows us to account for up to p resonances in the spectrum in the frequency range > f s / 2 f s / 2> . Even if this representation allows for a good approximation of the spectrum’s envelope, a harmonic description of a voiced signal such as an /a/ or an /i/ requires higher orders. In reality, if the speaker is a human being and if the frequency of his pitch is f 0 = 100 Hz, the lowest possible order is f s / f 0 . If f s 8 KHz, this order will theoretically be at least 80. However, the length of the analysis frames is required to be between 20 and 40 ms for the hypothesis of the quasi-stationarity of the signal to hold true. This limits the number of available samples for the estimation of the AR parameters (128 to 256 for a sampling frequency of f s 8 KHz ). A trade-off thus has to be made between the number of parameters (based on the number of available samples) and the fitting of the model.
Least Squares Estimation of Parameters of Linear Models
103
2.4. References [1] H. Akaike, “Fitting Autoregressive Models for Prediction”, Ann. Inst. Statis. Match., vol. no. 21, pp. 243-247, 1969. [2] H. Akaike, “A New Look at the Statistical Model Identification”, IEEE Trans. on Automatic Control, vol. no. AC-19, no. 6, pp. 716-723, 1974. [3] K. J. Åström and P. Eykhoff, “System Identification: a Survey” Automatica, vol. 7, Issue 2, pp. 123-162, March 1971. [4] B. Atal and A. Hanauer, “Speech Analysis and Synthesis by Linear Prediction of the Speech Wave”, J. Acoust. Society of America, JASA, vol. no. 50, pp. 637-655, 1971. [5] G. Carayannis, N. Kalouptsidis and D. Manolakis, “Fast Recursive Algorithms for a Class of Linear Equations”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-30, no. 2, April 1982, pp. 227-239, 1982. [6] Y. T. Chan and R. Langford, “Spectral Estimation via the High-Order Yule-Walker Equations”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-30, pp. 689-698, 1980. [7] D. W. Clarke, “Generalized Least Squares Estimation of the Parameters of a Dynamic Model”, Proceedings of the IFAC, Symposium on Identification, Prague, 1967. Cited in M. Najim, Modélisation et Identification en Traitement du Signal, Edition Masson, 1988. [8] C. E. Davila, “A Subspace Approach to Estimation of Autoregressive Parameters from Noisy Measurements” IEEE Trans. on Signal Processing, vol. 46, no. 2, pp. 531-534, February 1998. [9] M. Deriche, “AR Parameter Estimation From Noisy Data Using the EM Algorithm”, IEEE-ICASSP ‘94, Adelaide, Australia, vol. 4, pp. 69-72, 19-22 April 1994. [10] J. Durbin, “The Fitting of Time Series Models”, Rev. Inst. de Statis., vol. no. 28, no. 3, pp. 233-244, 1960. [11] B. Friedlander, “Instrumental Variable Methods for ARMA Spectral Estimation”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 31, no. 2, pp. 404-415, April 1983. [12] D. Gingras, “Estimation of the Autoregressive Parameter from Observations of a NoiseCorrupted Autoregressive Time Series”, IEEE-ICASSP ‘82, Paris, 3-5, May 1982. [13] D. Graupe, Identification and Adaptive Filtering, Robert E. Krieger Publishing Company Malabar, Florida, 1984. [14] K. Hasan, J. Hossain and Haque A., “Parameter Estimation of Multichannel Autoregressive Processes in Noise”, Signal Processing, vol. 83, no. 3, pp. 603-610, January 2003. [15] K. Hasan and S. A. Fattah, “Identification of Noisy AR Systems Using Damped Sinusoidal Model of Autocorrelation Function”, IEEE Signal Processing Letters, vol. 10, no. 6, pp. 157-160, June 2003.
104
Modeling, Estimation and Optimal Filtering in Signal Processing
[16] T. Kailath, “A View of Three Decades of Linear Filtering Theory”, IEEE Trans. on Information Theory, vol. IT-19, pp. 750-760, 1973. [17] S. M. Kay, “Noise Compensation for Autoregressive Spectral Estimates”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 28, no. 3, pp. 292-303, June 1980. [18] J. Leroux and C. Gueguen, “A Fixed Point Computation of the Partial Correlation Coefficients”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-25, pp. 257-259, 1977. [19] N. Levinson, “The Wiener RMS Error Filter Design and Prediction”, published as an appendix to the work N. Wiener: Extrapolation Interpolation and Smoothing of Stationary Time Series, John Wiley, New York, 1949. [20] J. Makhoul, “Linear Prediction: a Tutorial Review”, Proc. IEEE, vol. no. 63, no. 4, pp. 561-579, Apr. 1975. [21] J. D. Markel and A. H. Gray, “On Autocorrelation Equation Applied to Speech Analysis”, IEEE Trans. on Audio and Electroacoustics, vol. AU-20, pp. 69-79, 1973. [22] R. K. Mehra, “On Line Identification of Linear Dynamic Systems With Applications to Kalman Filtering”, IEEE Trans. on Automatic Control, vol. AC-16, no. 1, pp. 12-21, 1971. [23] J. L. Melsa and J. D. Tomick, “Linear Prediction Coding with Additive Noise for Applications to Speech Digitalisation”, 14th Allerton Conference on Circuits and Systems, USA, September 1976. [24] M. Morf, Fast Algorithms for Multivariable Systems, thesis, Stanford University, Stanford, 1974. Cited in M. Najim, Modélisation et Identification en Traitement du Signal, Edition Masson, 1988. [25] I. Schur, “Uber Potenzreihem, die Innern des Einheitskreiss beschränkt sind”, J. für Reine und Angew. Math, 147, pp. 205-232, 1917. [26] H. W. Sorensen, “Least Squares Estimation: From Gauss to Kalman”, IEEE Spectrum, pp. 63-68, July 1970. [27] T. J. Ulrych and R. N. Clayton, “Time Series Modeling and Maximum Entropy”, Phys. Earth, Planetary Int., vol. no. 12, pp. 188, 1976. [28] R. A. Wiggins and E. A. Robinson, “Recursive Solution of the Multichannel Filtering Problems”, J. Geophys. Res., pp. 1885-1991, 1965. [29] W. X. Zheng, “Unbiased Identification of Autoregressive Signals Observed in Colored Noise”, IEEE-ICASSP ‘98, Seattle, USA, vol. 4, pp. 2329-2332, 12-15 May 1998. [30] W. X. Zheng, “Autoregressive Parameter Estimation from Noisy Data”, IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 1, pp. 71-75, January 2000. [31] W. X. Zheng, “Fast Identification of Autoregressive Signals from Noisy Observations”, IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 52, no. 1, pp. 43-48, January 2005.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Chapter 3
Matched and Wiener Filters
3.1. Introduction Our main purpose in this chapter is to determine the optimal filter needed to extract the signal from the noise. This optimal filter can be defined as follows: it is a mathematical description of the signal processing operations that have to be conducted on the noisy signal. This description should respect the criteria of optimality that will be described in this chapter. As a prelude, the following points should be noted: – the inputs of these filters are either random signals, or combinations of random and deterministic signals; – we will only cover stationary linear systems in this chapter. When the final aim is to obtain a physical implementation, we will also consider the realizability issues of the filter. We will consider two main types of filters: the matched filter and the Wiener filter. These two classes are, respectively, the solution to the following cases: – detecting the desired signal, whose shape is already known, when it is disturbed by a white or colored noise; – extracting the signal when both the signal and the noise are random processes. When designing these filters, the autocorrelation functions and matrices are assumed to be known.
106
Modeling, Estimation and Optimal Filtering in Signal Processing
The chapter is organized as follows. In the first section, we will look at matched filters and treat the two successive cases where the signal is disturbed by a white additive noise and by a colored one. In the second section, we recall the traditional presentation of the Wiener filter, first for continuous-time and then for discrete-time signals. For this latter type of signal, we will extensively quote from the work carried out by Norbert Wiener which, though first presented in a classified report in 1940-41, was only made public in 1949. Wiener was among the first to view signals and noises as realizations of random processes. 3.2. Matched filter 3.2.1. Introduction Let y (t ) be the observation, which corresponds to the sum of a real signal s1 (t ) whose shape is known a priori and a stationary random noise b1 (t ) . This observation is passed through a filter whose impulse response is denoted h(t). Since the filter is linear, we have, at instant t t1 (see Figure 3.1):
h * y t1 h * s1 t1 h * b1 t1
s 2 t1 b2 t1
[3.1]
where: f
s 2 t1
h * s1 t1 ³ f s1 W ht1 W dW
b2 t1
h * b1 t1 ³ f b1 W ht1 W dW
f
[3.2]
[3.3]
Figure 3.1. A filter with an impulse response h(t)
Our purpose is thus to identify the filter which best detects the presence of s1 (t ) . This filter is the best in the sense that it should provide the maximum signal-to-noise
Matched and Wiener Filters
107
ratio R0 , defined as the ratio of the instantaneous power of the signal s 2 (t ) to that of the noise b2 (t ) at time t s 2 (t1 )
2
^
R0 (t1 )
t1 [7] [10]:
`
[3.4]
E b22 (t1 )
Taking equations [3.2] and [3.3] into account, the above relation is changed to: f
³f
R0 t1
f
f
³f ³f
s1 W ht1 W dW
2
rb1b1 W V ht1 W ht1 V dW dV
[3.5]
where rb1b1 W V E^b1 W b1 V ` is the autocorrelation function of the additive
noise b1 t . Equation [3.5] can only be easily solved if this autocorrelation function is known. For this reason, let us first look at the case where b1 t is a white noise. 3.2.2. Matched filter for the case of white noise If noise b1 t is zero-mean, white and stationary, with autocorrelation function
rb1b1 W V b2G W , the denominator of equation [3.5] is modified to: f
f
rb1b1 W V ht1 W ht1 V dW dV
V b2 ³
f
f ³f G W V ht1 W ht1 V dW dV
³f ³f
f 2 f 2 Vb h f
³
[3.6]
t1 W dW
Thus, equation [3.5] becomes: f
R0 t1
³f
s1 W ht1 W dW
f 2 V b2 h f
³
t1 W dW
2
[3.7]
We look at this problem of optimizing the SNR at the filter output in the frequency domain, where it is easy to solve.
108
Modeling, Estimation and Optimal Filtering in Signal Processing
If we introduce H Z , S i Z and Bi Z , i 1, 2, as the Fourier transforms of ht , s i t and bi t respectively, the Fourier transforms (FT) of equations [3.2] and [3.3] are: S 2 Z
H f S1 Z
FT ^ s2 t `
[3.8]
f
³ f s2 t exp jZt dt and:
Sb2 b2 Z
FT rb2 b2
2
H (Z ) V b2
[3.9]
Substituting [3.8] in equation [3.4] modifies the latter to:
R0 (t1 )
s2 (t1 ) E
^
2
`
FT 1^S 2 (Z )` rb2 b2 0
b22 (t1 )
FT
1
2
t
t1
[3.10]
2
^H (Z ) S1 (Z )` t
t1
rb2 b2 0
Applying the Wiener-Khintchine theorem to noise b2 t , we obtain: FT 1^H (Z ) S1 (Z )` R0 (t1 )
FT
1
^Sb b (Z )` t 2 2
2 t
t1
[3.11]
0
Following equation [3.9], the SNR can be expressed as: f
R0 t1
³ f
H Z S1 Z exp jZt1 dZ
V b2 ³
f f
H Z dZ 2
2
[3.12]
In order to maximize the numerator of R0 t1 , we will use the Schwarz inequality which, for any two functions F Z and G Z , can be defined as:
Matched and Wiener Filters f
³ f F Z GZ dZ
2
f f 2 2 d §¨ ³ F Z dZ ·¸§¨ ³ G Z dZ ·¸ © f ¹© f ¹
109
[3.13]
This equality only holds true if F Z and G Z satisfy the following condition:
F Z OG Z , O real .
[3.14]
where G*(Z) is the complex conjugate of G (Z). If we use inequality [3.13] in the numerator of equation [3.12] by taking H Z and G Z S1 Z exp jZt1 , we obtain:
F Z
2
H Z S1Z exp jZt1 dZ d §¨ f ©
³
f
f
³ f
2 H Z dZ ·¸§¨ ¹©
f · 2 ³ f S1Z dZ ¸¹
[3.15]
Thus, R0 t1 is maximized as follows: f
R0 t1 d
³ f
S1 Z
2
dZ
V b2
[3.16]
This maximum of R0 t1 is independent of the impulse response ht , and
depends only on the signal energy and the variance V b2 . Therefore, R0 t1 is maximized when equality [3.14] is respected. The resulting optimal filter has the following transfer function: H opt Z OS1 Z exp jZt1
[3.17]
where S1* Z denotes the complex conjugate of S1 Z . By taking the inverse Fourier transform of equation [3.17] above, the impulse response hopt t of the optimal filter can be expressed as follows:
110
Modeling, Estimation and Optimal Filtering in Signal Processing
hopt t
1 2S
f
³f H opt Z exp jZt dZ
O f * S1 Z exp jZ t1 t dZ 2S ³f * O § f · ¨ ³f S1 Z exp jZ t1 t dZ ¸ 2S © ¹
[3.18]
Os1* t1 t
If s1 t is real: hopt t Os1 t1 t
[3.19]
Thus, the impulse response of the optimal filter depends only on the signal s1 t . Therefore this filter is “matched” to the waveform of the signal, leading to the name “matched filter”. Moreover, the factor O denotes only the gain of the filter and has the same effect on both the signal and the noise. When O = 1, if s1 t is a known function, hopt t is also known. To ensure that the filter hopt t can be implemented in practice, i.e., to ensure that it verifies hopt t 0 for t 0 , we can choose: 0 hopt t ® ¯ s1 t1 t
for t 0
[3.20]
for t t 0
Equation [3.20] means that at instant t1 , signal s1 t has passed through the filter. Therefore: f
s 2 t
f
³ f hopt u s1 t u du ³ 0
hopt u s1 t u du
t
W
hopt t W s1 W dW t u ³f
Inserting expression [3.20] for the impulse response hopt t :
s2 t
f
³0
³
hopt u s1 t u du t
W t u f
f
³0
s1 t1 W t s1 W dW
s1 t1 u s1 t u du
[3.21]
Matched and Wiener Filters
111
This equation can be related to the autocorrelation function of s1 t given below: rs1s1 t
1 T of 2T
T
³ T s1 W s1 W t dW
lim
[3.22]
To highlight the difficulty in choosing t1 , let us consider a simple example: a signal defined as the exponential function of a time constant W such that:
0 for t 0 s1 t ® W exp / for t t 0 a t ¯
[3.23]
the impulse response of the matched filter is then defined as follows: 0 for t t 0 ° exp a t1 t / W @ > ® for t 0 ° 0 ¯
st t hopt t ® 1 ¯ 0
4 3.5
4
s1 t
3.5
2.5 amplitude
a= 3
2.5 amplitude
a= 3
2 1.5
[3.24]
hopt t
2 1.5
1
1
0.5
0.5
0 -2
for t t t1 for 0 t t1 for t 0
-1
0
1
2
3 t
t 1= 4
5
6
7
0 -2
-1
0
1
2
3 t
t 1= 4
5
6
7
Figure 3.2. Representations of a signal and the impulse response of the matched filter
Thus, the smaller the value of t1 with respect to the total duration of the signal, the less exact the approximation of the filter impulse response will be. To obtain the best filter, i.e., the filter with the highest SNR, t1 has to be very high (ideally infinite). Unfortunately, this is not realistic for real-time applications, where t1 is relatively small. Thus, the real filter will be suboptimal.
112
Modeling, Estimation and Optimal Filtering in Signal Processing
3.2.3. Matched filter for the case of colored noise
3.2.3.1. Formulation of problem In this section, we will treat a real signal disturbed by a colored noise with autocorrelation function rb1b1 . To start, let us consider equation [3.5]: R0 t1
s 2 t1
2
^
E b22 t1
` f
³ f s1 W ht1 W dW
2
[3.5]
f f
³ f ³ f rb b W V ht1 W ht1 V dW dV 1 1
^
A maximization of R0 t1 corresponds to a minimization of the noise power
`
E b22 t1 for a given value of s 2 t1 . This can be seen as a problem of optimization under constraints which entails the minimization of the following quantity:
`
^
E b22 t1 P s 2 t1
Q
[3.25]
where P is called the Lagrange1 multiplier. Q can also be expressed using h t [7]: f
f
³ f ³ f rb b W V ht1 W ht1 V dW dV
Q
1 1
P
f
³ f
[3.26]
s1 W ht1 W dW
This search for hopt t characterizing the optimal filter can be conducted using
traditional variational calculation techniques. First, ht in equation [3.26] is replaced by: ht
1 Maximizing
hopt t J Ght
[3.27]
^
R0 t1 is also equivalent to minimizing E b22 t1
` for a given value of
s 2 t1 . Such an approach would also lead to optimization under constraint, with the 2
Lagrange multiplier.
Matched and Wiener Filters
113
where J is a real variable and Ght is an arbitrary increment. This leads to an expression of Q as a function of J , which will be denoted QJ .
Substituting ht of equation [3.27] in the expression for QJ , we obtain: QJ
³f ³f rb b W V >hoptt1 W JGht1 W @ >hoptt1 V JGht1 V @dWdV f
f
1 1
>
[3.28]
@
f P§¨ ³ hoptt1 W JGht1 W s1W dW ·¸ © f ¹
Thus, Q J is a second-degree polynomial of J such that: QJ Q 0 2JA J 2 B .
To minimize Q, the following condition, which guarantees the presence of an extremum, must be respected: wQJ wJ J
0
[3.29]
0
where:
wQ J wJ
f f
³f ³f rb b W V hopt t1 W Ght1 V dWdV 1 1
³
f f
³
f f
³
f f
³
f f
rb1b1 W V hopt t1 V Gh t1 W dWdV
2J rb1b1 W V Gh t1 W Gh t1 V dWdV
[3.30]
f
P ³ s1 W Gh t1 W dW f
If we take into account equation [3.29] and the even property of the autocorrelation function, i.e., rb1b1 W V rb1b1 V W W , V , we see that:
114
Modeling, Estimation and Optimal Filtering in Signal Processing
wQJ wJ J
f f
³f ³f 2rb b W V hopt t1 V Ght1 W dVdW 1 1
0
P
f
³f s1W Ght1 W dW
[3.31]
0
Equivalently: wQJ wJ J
f
ª
º
f
³f Ght1 W «¬³f 2hopt t1 V rb b W V dV Ps1 W »¼dW 1 1
0
[3.32]
0
Relation [3.32] must be satisfied irrespective of the value of the increment Gh . Therefore, we have:
P
f
³f hopt t1 V rb b W V dV
2
1 1
s1 W
for f W t1 when considering a realizable filter. The factor
P 2
changes only the gain of the filter, and thus acts equally on both
P
= 1, hopt t is fully known, barring a constant 2 factor. The generalized equation for a matched filter, in the case of colored noise is thus:
the signal and the noise. If we take
f
³ f hopt t1 V rb1b1 W V dV
s1 W .
[3.33]
The above equation belongs to the Fredholm equation family. The solution of this equation provides the expression of the optimum filter. To make sure that
wQJ wJ J
0 is a sufficient condition, besides being 0
necessary as shown above, the sign of the second derivative has to be verified: w 2 QJ wJ
2
f f
³f ³f 2 Ght1 V Ght1 W rb b W V dVdW 1 1
Matched and Wiener Filters
w 2 QJ
Since rb1b1 is positive by definition,
wJ 2
115
will be positive or zero.
3.2.3.2. Physically unrealizable matched filter If there is no need for a realizable filter, expression [3.33] for the matched filter becomes: f
³f hopt t1 V rb b W V dV
s1 W for f W f
1 1
[3.34]
This convolution equation is simplified when treated in the frequency domain. Applying the Fourier transformation to both sides of equation [3.34] gives: f ª f
º
³f «¬³f hopt t1 V rb b W V dV »¼ exp jZW dW 1 1
f
³f s1 W exp jZW dW
[3.35]
If Sb1b1 Z denotes the PSD, the Fourier transform of the noise autocorrelation function rb1b1 (t ) , the above equation is modified to: f
³f hopt t1 V exp jZV Sb b Z dV 1 1
Substituting x
S1 Z
[3.36]
t1 V , equation [3.36] becomes [7]:
exp^ jZt1`Sb1b1 (Z )
f
³ f hopt ( x) exp jZx dx
S1 (Z )
[3.37]
Equivalently: exp^ jZt1`Sb1b1 (Z ) H opt Z S1 (Z )
[3.38]
Rearranging the terms of this current equation, we obtain: H opt (Z )
S1 (Z ) exp^ jZt1 ` S b1b1 (Z )
[3.39]
116
Modeling, Estimation and Optimal Filtering in Signal Processing
This may constitute an approximate solution when using recorded data, i.e. where the past and the future of the signal with respect to any point in time are known, so that integrating from $-\infty$ to $+\infty$ is legitimate.

3.2.3.3. A matched filter solution using whitening techniques

In practice, the main and delicate challenge is the solution of equation [3.33], where the difficulty arises from the upper limit of the integration being $t_1$ and not $+\infty$. We will present the whitening process often used to solve Fredholm equations. This technique plays an important part in the areas of detection and estimation and in theoretical physics [7]. We will first introduce spectral factorization and then take up the resolution of equation [3.36] itself.

First and foremost, let us suppose that $S_{xx}(\omega)$ is the PSD of the output of a filter with transfer function $G(\omega)$ excited by a zero-mean white noise with variance equal to 1. This case is illustrated in Figure 3.3. We thus have:

$$S_{xx}(\omega) = \left| G(\omega) \right|^2 \qquad [3.40]$$

Figure 3.3. White noise filtering
For a linear, time-invariant filter with lumped constants, $G(\omega)$ can be expressed as:

$$G(\omega) = \frac{a_0 + \dots + a_{n-1}(j\omega)^{n-1} + (j\omega)^n}{b_0 + \dots + b_{p-1}(j\omega)^{p-1} + (j\omega)^p} \qquad [3.41]$$

Thus, the PSD at the filter output, $S_{xx}(\omega) = |G(\omega)|^2 = G(\omega)\, G^{*}(\omega)$, can be written as:

$$S_{xx}(\omega) = \frac{N_n(\omega^2)}{D_p(\omega^2)} \qquad [3.42]$$
Since the process $x$ is real, its autocorrelation function $r_{xx}(\tau)$ is also real. From $r_{xx}(\tau)$ being even, it follows that $S_{xx}(\omega)$ is real. Consequently, $N_n(\omega^2)$ and $D_p(\omega^2)$ are two constant-coefficient polynomials of degrees $n$ and $p$ respectively. It can easily be shown that $S_{xx}(\omega)$ takes the following form [7]:

$$S_{xx}(\omega) = S_{xx}^{+}(\omega)\, S_{xx}^{-}(\omega)$$

where $S_{xx}^{-}(\omega) = \left[ S_{xx}^{+}(\omega) \right]^{*}$. The poles and zeros of $S_{xx}^{-}(\omega)$ are located in the right half and those of $S_{xx}^{+}(\omega)$ in the left half of the complex s-plane. It follows that the filter whose transfer function is $1 / S_{xx}^{+}(\omega)$ is physically realizable.

Let us consider the whitening process, which consists of filtering a random colored process to obtain a white noise.
Figure 3.4. The whitening process
From Figure 3.4, the whitened output must satisfy $S_{xx}(\omega)\, |H(\omega)|^2 = 1$; equivalently:

$$\left[ S_{xx}^{+}(\omega)\, H(\omega) \right] \left[ S_{xx}^{-}(\omega)\, H^{*}(\omega) \right] = 1 \qquad [3.43]$$

Since $S_{xx}^{-}(\omega) = \left[ S_{xx}^{+}(\omega) \right]^{*}$, the above equation is satisfied if $S_{xx}^{+}(\omega)\, H(\omega) = 1$.
Thus, the physically realizable whitening² filter has the following transfer function:

$$H(\omega) = \frac{1}{S_{xx}^{+}(\omega)} \qquad [3.44]$$
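The following is a minimal discrete-time sketch of the whitening idea (not taken from the text): a colored process is generated by an assumed shaping filter $G(z) = 1/(1 - 0.9 z^{-1})$, and the inverse filter plays the role of $1/S_{xx}^{+}$; the output autocorrelation should then be close to a delta. All signal lengths and coefficients are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)            # unit-variance white noise
x = np.zeros_like(w)
for k in range(1, len(w)):                  # x(k) = 0.9 x(k-1) + w(k): colored process
    x[k] = 0.9 * x[k - 1] + w[k]

y = x[1:] - 0.9 * x[:-1]                    # whitening filter: y(k) = x(k) - 0.9 x(k-1)

def autocorr(v, lag):
    return np.mean(v[lag:] * v[:len(v) - lag])

print("colored  r(0), r(1):", autocorr(x, 0), autocorr(x, 1))   # r(1) close to 0.9 r(0)
print("whitened r(0), r(1):", autocorr(y, 0), autocorr(y, 1))   # r(1) close to 0
```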
To solve equation [3.33] in the general case, i.e. when the signal is observed in the presence of a stationary colored noise, the following solution presents itself intuitively: first whiten the colored noise, then treat the problem as the detection of a signal disturbed by a white noise. We can show that the optimal solution is indeed the following: use a first filter, with frequency response $H_1(\omega)$, to whiten the noise, followed by a second filter, with frequency response $H_2(\omega)$, matched to the signal disturbed by a white noise. A diagram of this algorithm is depicted in Figure 3.5.
Figure 3.5. Block-level schematic, colored noise case: the observation $s_1(t) + b_1(t)$ is whitened by $H_1(\omega) = 1/S_{b_1 b_1}^{+}(\omega)$, producing $s'(t) + b'(t)$, which is then processed by $H_2(\omega)$, the matched filter for white noise, giving $s_2(t) + b_2(t)$
According to equation [3.44], we have:

$$S'(\omega) = H_1(\omega)\, S_1(\omega) = \frac{S_1(\omega)}{S_{b_1 b_1}^{+}(\omega)} \qquad [3.45]$$

The second filter is matched to the signal $s'(t)$, as defined in equation [3.17]:

$$H_2(\omega) = S'^{*}(\omega)\, \exp(-j\omega t_1) \qquad [3.46]$$

The transfer function of the matched filter, using equation [3.45], becomes:

2. A new approach based on inner/outer factorization has recently been proposed and derived for Rayleigh fading channel simulation and texture characterization.
$$H_2(\omega) = \left[ \frac{S_1(\omega)}{S_{b_1 b_1}^{+}(\omega)} \right]^{*} \exp(-j\omega t_1) \qquad [3.47]$$
Thus, we obtain an explicit expression of the solution, which leads to the transfer function of the optimal physically realizable filter [4] [7]:

$$H_{opt}(\omega) = \frac{1}{S_{b_1 b_1}^{+}(\omega)} \int_{0}^{+\infty} \left[ \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{S_1(\nu)}{S_{b_1 b_1}^{-}(\nu)}\, \exp\!\big(j\nu(t - t_1)\big)\, d\nu \right] \exp(-j\omega t)\, dt$$
It can also be shown that this expression is a solution to the following equation:

$$\int_{-\infty}^{t_1} h_{opt}(t_1 - \sigma)\, r_{b_1 b_1}(\tau - \sigma)\, d\sigma = s_1(\tau) \qquad [3.48]$$
In this first part, we have introduced matched filtering when the signal is disturbed by a white or a colored additive noise. This filter is used, for instance, in radar processing and telecommunications. In the following section, we will look at Wiener filtering for continuous-time and discrete-time signals.

3.3. The Wiener filter

3.3.1. Introduction
The Wiener filter, described by the Wiener-Hopf equation, has played an important role in the introduction of the notion of optimal filters [9]. The motivation behind describing the Wiener filter in this chapter is twofold: this filter is a non-recursive solution to the optimal filtering problem, and it served as the starting point for the recursive approach of the Kalman filter. Even though the Kalman filter, presented in Chapter 5, is based on an algebraic approach, it is instructive to note that the historical approach introduced by R. E. Kalman reformulates the optimal filtering problem as the recursive solution of a stochastic differential equation, instead of the solution of a Fredholm-type equation. Moreover, in the case of discrete-time signals, we can consider the Wiener filter as a parametric modeling approach based on the least squares method. This filter is also closely related to the stochastic gradient algorithm called least mean squares (LMS), presented in Chapter 4.
3.3.2. Formulation of the problem

Figure 3.6. Linear time-invariant filtering: the observation $y(t) = s_1(t) + b_1(t)$, with $s_1(t)$ and $b_1(t)$ random and real, is filtered by $h(t)$ to produce $\hat{s}_1(t) = s_2(t) + b_2(t)$
The signal $y(t)$ is the sum of the desired signal $s_1(t)$ and an additive noise $b_1(t)$. Both the signal and the noise are random stationary processes. We have to find the linear filter, with impulse response $h(t)$, which gives the best estimation $\hat{s}_1(t)$ of $s_1(t)$. Otherwise stated, we want:

$$\hat{s}_1(t) = s_2(t) + b_2(t) = \int_{-\infty}^{+\infty} h(\tau)\, y(t - \tau)\, d\tau \qquad [3.49]$$

to be as close to $s_1(t)$ as possible. The error is defined as the difference between the estimated signal $\hat{s}_1(t)$ and the desired signal $s_1(t)$:

$$e(t) = s_1(t) - \hat{s}_1(t) \qquad [3.50]$$

The Wiener filter, whose impulse response is denoted by $h_{opt}(t)$, is obtained upon minimizing the mean square error (MSE):

$$J = E\{ e^2(t) \} = E\left\{ \left[ s_1(t) - \hat{s}_1(t) \right]^2 \right\} \qquad [3.51]$$

where $E\{\cdot\}$ is the mathematical expectation. For ergodic signals, equation [3.51] can be rewritten as:

$$J = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} e^2(t)\, dt \qquad [3.52]$$
3.3.3. The Wiener-Hopf equation
Criterion [3.51] is defined from the signal s1 (t ) which is, however, not available. As we will soon see, s1 (t ) does not directly intervene in the derivation of the filter, only through its autocorrelation function. We will also see that a priori knowledge of this function is necessary for obtaining the Wiener filter. Combining equations [3.49] and [3.51], criterion J is expressed as follows:
$$J = E\{ e^2(t) \} = E\left\{ \left( s_1(t) - \int_{-\infty}^{+\infty} h(\tau)\, y(t - \tau)\, d\tau \right)^2 \right\} \qquad [3.53]$$
Making use of the linearity of the mathematical expectation, we obtain:
$$J = E\{ s_1^2(t) \} - 2 E\left\{ \int_{-\infty}^{+\infty} s_1(t)\, h(\tau)\, y(t - \tau)\, d\tau \right\} + E\left\{ \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} h(\sigma)\, h(\tau)\, y(t - \sigma)\, y(t - \tau)\, d\tau\, d\sigma \right\} \qquad [3.54]$$

Equivalently:

$$J = E\{ s_1^2(t) \} - 2 \int_{-\infty}^{+\infty} h(\tau)\, E\{ s_1(t)\, y(t - \tau) \}\, d\tau + \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} h(\sigma)\, h(\tau)\, E\{ y(t - \sigma)\, y(t - \tau) \}\, d\tau\, d\sigma \qquad [3.55]$$
Let us introduce the following three functions: the autocorrelation function of the signal $s_1(t)$, the autocorrelation function of the observation $y(t)$, and the cross-correlation function between $s_1(t)$ and $y(t)$. These are denoted respectively as:

$$r_{s_1 s_1}(\tau) = E\{ s_1(t)\, s_1(t - \tau) \}, \quad r_{yy}(\tau) = E\{ y(t)\, y(t - \tau) \} \quad \text{and} \quad r_{s_1 y}(\tau) = E\{ s_1(t)\, y(t - \tau) \}.$$
Taking into account the symmetry of the autocorrelation function³, i.e. $r_{yy}(\sigma - \tau) = r_{yy}(\tau - \sigma)$, the expression for criterion $J$ becomes:

$$J = E\{ e^2(t) \} = r_{s_1 s_1}(0) - 2 \int_{-\infty}^{+\infty} r_{s_1 y}(\tau)\, h(\tau)\, d\tau + \int_{-\infty}^{+\infty} h(\sigma) \int_{-\infty}^{+\infty} r_{yy}(\sigma - \tau)\, h(\tau)\, d\tau\, d\sigma \qquad [3.56]$$

To obtain the minimum of this criterion, we will use a procedure similar to the one used for the matched filter in section 3.2.3:

$$h(t) = h_{opt}(t) + \gamma\, \delta h(t) \qquad [3.57]$$

$E\{ e^2(t) \}$ then becomes a function of $\gamma$, denoted by $J(\gamma)$. For $J(\gamma)$ to have a minimum, it is necessary that:

$$\left. \frac{\partial J(\gamma)}{\partial \gamma} \right|_{\gamma = 0} = 0. \qquad [3.58]$$
Placing expression [3.57] of $h(t)$ in equation [3.56], we obtain:

$$J(\gamma) = r_{s_1 s_1}(0) - 2 \int_{-\infty}^{+\infty} r_{s_1 y}(\tau) \left[ h_{opt}(\tau) + \gamma\, \delta h(\tau) \right] d\tau + \int_{-\infty}^{+\infty} \left[ h_{opt}(\sigma) + \gamma\, \delta h(\sigma) \right] \int_{-\infty}^{+\infty} r_{yy}(\sigma - \tau) \left[ h_{opt}(\tau) + \gamma\, \delta h(\tau) \right] d\tau\, d\sigma \qquad [3.59]$$

We can express $J(\gamma)$ as a second-degree polynomial of $\gamma$:

$$J(\gamma) = J(0) + 2\gamma C + \gamma^2 D \qquad [3.60]$$

where:

$$D = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} r_{yy}(\sigma - \tau)\, \delta h(\sigma)\, \delta h(\tau)\, d\sigma\, d\tau \qquad [3.61]$$

3. Let $r_{b_1 b_1}(\tau) = E\{ b_1(t)\, b_1(t - \tau) \}$, $r_{s_1 b_1}(\tau) = E\{ s_1(t)\, b_1(t - \tau) \}$ and $r_{s_1 y}(\tau) = E\{ s_1(t)\, y(t - \tau) \}$. Since $y(t) = s_1(t) + b_1(t)$, we have $r_{yy}(\tau) = r_{s_1 s_1}(\tau) + r_{b_1 b_1}(\tau) + r_{s_1 b_1}(\tau) + r_{b_1 s_1}(\tau)$. If $s_1(t)$ and $b_1(t)$ are both stationary and uncorrelated, $r_{yy}(\tau)$ becomes $r_{yy}(\tau) = r_{s_1 s_1}(\tau) + r_{b_1 b_1}(\tau)$ because $r_{s_1 b_1}(\tau) = r_{b_1 s_1}(\tau) = 0$.
and:

$$C = \int_{-\infty}^{+\infty} \delta h(\tau) \left[ \int_{-\infty}^{+\infty} r_{yy}(\tau - \sigma)\, h_{opt}(\sigma)\, d\sigma - r_{s_1 y}(\tau) \right] d\tau = 0 \qquad [3.62]$$
The above relation should be respected irrespective of the value of the increment $\delta h(\tau)$. Thus, it follows that:

$$\int_{-\infty}^{+\infty} r_{yy}(\tau - \sigma)\, h_{opt}(\sigma)\, d\sigma = r_{yy}(\tau) * h_{opt}(\tau) = r_{s_1 y}(\tau) \qquad [3.63]$$

This integral, a Fredholm equation of the first kind, is called the Wiener-Hopf equation. To determine the solution $h_{opt}(t)$ of this integral, we need to know the cross-correlation $r_{s_1 y}(\tau)$ and the autocorrelation $r_{yy}(\tau)$. We will consider the signal and the noise to be uncorrelated.
We notice that the signal and the noise do not directly intervene in equation [3.63]. Equation [3.58] gives the necessary condition for minimizing $J(\gamma)$. As we have:

$$\frac{\partial^2 J(\gamma)}{\partial \gamma^2} = 2D, \qquad [3.64]$$

we can easily check that the sufficient condition is fulfilled to ensure that $J(\gamma)$ is minimized.

So far, we have not discussed whether the filter can be implemented. For real-time processing, where the filter has to be realizable, the lower limit of integral equation [3.63] has to be changed from $-\infty$ to 0. This does not simplify the resolution of the Wiener-Hopf equation, because it excludes the use of the Fourier transform. If the problem of physical realization of the filter is not considered, we can take the Fourier transform of equation [3.63]. The Wiener-Hopf equation then becomes:
$$S_{s_1 y}(\omega) = S_{yy}(\omega)\, H_{opt}(\omega) \qquad [3.65]$$

Rearranging the terms of the above equation, we get:

$$H_{opt}(\omega) = \frac{S_{s_1 y}(\omega)}{S_{yy}(\omega)} = \frac{S_{s_1 s_1}(\omega) + S_{s_1 b_1}(\omega)}{S_{yy}(\omega)} \qquad [3.66]$$

If the signal is uncorrelated with the noise, that is if $S_{s_1 b_1}(\omega) = 0$, the above equation becomes:

$$H_{opt}(\omega) = \frac{S_{s_1 s_1}(\omega)}{S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega)} \qquad [3.67]$$

where $S_{s_1 s_1}(\omega)$ and $S_{b_1 b_1}(\omega)$ denote, respectively, the power spectral densities of the signal $s_1(t)$ and the noise $b_1(t)$.
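The following is a minimal numerical sketch of equation [3.67] evaluated on a DFT grid, assuming the signal and noise PSDs are known. The test signal, noise level and all variable names are illustrative assumptions, not part of the text.

```python
import numpy as np

N = 2048
rng = np.random.default_rng(2)
t = np.arange(N)
s1 = np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.03 * t)
b1 = 0.8 * rng.standard_normal(N)                    # white noise, S_b1b1 = 0.64
y = s1 + b1

Sss = np.abs(np.fft.fft(s1)) ** 2 / N                # idealized signal PSD (assumed known)
Sbb = 0.64 * np.ones(N)                              # noise PSD
H = Sss / (Sss + Sbb)                                # non-causal Wiener filter [3.67]

s1_hat = np.real(np.fft.ifft(H * np.fft.fft(y)))
print("MSE before:", np.mean((y - s1) ** 2), "after:", np.mean((s1_hat - s1) ** 2))
```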
3.3.4. Error calculation in a continuous physically non-realizable Wiener filter
Taking expression [3.56] for the error and writing it for $h(t) = h_{opt}(t)$, we have:

$$J_{min} = E\{ e^2(t) \} = r_{s_1 s_1}(0) - 2 \int_{-\infty}^{+\infty} r_{s_1 y}(\tau)\, h_{opt}(\tau)\, d\tau + \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} r_{yy}(\sigma - \tau)\, h_{opt}(\sigma)\, h_{opt}(\tau)\, d\tau\, d\sigma \qquad [3.56]$$

Using the evenness of the autocorrelation function and the Wiener-Hopf equation [3.63], we can introduce the quantity $r_{s_1 y}$ in the double integral of equation [3.56]:

$$J_{min} = r_{s_1 s_1}(0) - 2 \int_{-\infty}^{+\infty} h_{opt}(\tau)\, r_{s_1 y}(\tau)\, d\tau + \int_{-\infty}^{+\infty} h_{opt}(\tau) \left[ \int_{-\infty}^{+\infty} h_{opt}(\sigma)\, r_{yy}(\sigma - \tau)\, d\sigma \right] d\tau$$
Simplifying the above equation, we obtain:

$$J_{min} = r_{s_1 s_1}(0) - \int_{-\infty}^{+\infty} h_{opt}(\tau)\, r_{s_1 y}(\tau)\, d\tau$$

Replacing $r_{s_1 s_1}(0)$ and $r_{s_1 y}(\tau)$ by their inverse Fourier transforms:

$$r_{s_1 y}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} S_{s_1 y}(\omega)\, e^{j\omega\tau}\, d\omega \qquad [3.68]$$

$$r_{s_1 s_1}(0) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} S_{s_1 s_1}(\omega)\, d\omega \qquad [3.69]$$
we obtain:

$$\begin{aligned} J_{min} &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} S_{s_1 s_1}(\omega)\, d\omega - \int_{-\infty}^{+\infty} h_{opt}(\tau) \left[ \frac{1}{2\pi} \int_{-\infty}^{+\infty} S_{s_1 y}(\omega)\, e^{j\omega\tau}\, d\omega \right] d\tau \\ &= \frac{1}{2\pi} \left\{ \int_{-\infty}^{+\infty} S_{s_1 s_1}(\omega)\, d\omega - \int_{-\infty}^{+\infty} S_{s_1 y}(\omega) \left[ \int_{-\infty}^{+\infty} h_{opt}(\tau)\, e^{j\omega\tau}\, d\tau \right] d\omega \right\} \\ &= \frac{1}{2\pi} \left[ \int_{-\infty}^{+\infty} S_{s_1 s_1}(\omega)\, d\omega - \int_{-\infty}^{+\infty} S_{s_1 y}(\omega)\, H_{opt}^{*}(\omega)\, d\omega \right] \\ &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \left[ S_{s_1 s_1}(\omega) - S_{s_1 y}(\omega)\, H_{opt}^{*}(\omega) \right] d\omega \end{aligned} \qquad [3.70]$$
The above relation is simplified if we use expression [3.66] of H opt (Z ):
$$\begin{aligned} J_{min} &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \left[ S_{s_1 s_1}(\omega) - S_{s_1 y}(\omega)\, \frac{S_{s_1 y}^{*}(\omega)}{S_{yy}^{*}(\omega)} \right] d\omega \\ &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{S_{s_1 s_1}(\omega)\, S_{yy}^{*}(\omega) - S_{s_1 y}(\omega)\, S_{s_1 y}^{*}(\omega)}{S_{yy}^{*}(\omega)}\, d\omega \\ &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{S_{s_1 s_1}(\omega)\, S_{yy}^{*}(\omega) - \left| S_{s_1 y}(\omega) \right|^2}{S_{yy}^{*}(\omega)}\, d\omega \end{aligned} \qquad [3.71]$$
If $s_1(t)$ and $b_1(t)$ are not correlated, we get:

$$S_{yy}(\omega) = S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega) \qquad [3.72]$$

$$S_{s_1 y}(\omega) = S_{s_1 s_1}(\omega). \qquad [3.73]$$

Therefore, equation [3.71] becomes:

$$J_{min} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{S_{s_1 s_1}(\omega)\, S_{b_1 b_1}^{*}(\omega)}{S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega)}\, d\omega \qquad [3.74]$$

The above equation is useful in explaining the results obtained for the Wiener filter. If the PSDs of the signals $s_1(t)$ and $b_1(t)$ do not overlap, the product $S_{s_1 s_1}(\omega)\, S_{b_1 b_1}(\omega)$ is zero, which leads to a zero estimation error. The Wiener filter is then no longer required to separate the signal from the noise: a simple low-pass or high-pass filter will eliminate the noise. This simple case is illustrated in Figure 3.7(a). In practice, however, the signal and noise spectra do overlap, as in Figure 3.7(b). The Wiener filter is then the only solution for obtaining a signal estimation. The product $S_{s_1 s_1}(\omega)\, S_{b_1 b_1}^{*}(\omega)$ in the frequency band of interest is no longer zero and, consequently, the estimation error is always non-zero.
Figure 3.7. The power spectral densities of the signal and the noise
3.3.5. Physically realizable continuous Wiener filter. Rational spectra case
Let us take the Wiener-Hopf equation obtained in section 3.3.3:

$$\int_{-\infty}^{+\infty} r_{yy}(\tau - \sigma)\, h_{opt}(\sigma)\, d\sigma = r_{s_1 y}(\tau) \qquad [3.63]$$

For the filter to be physically realizable, it is necessary that:

$$r_{s_1 y}(\tau) = \int_{0}^{+\infty} h_{opt}(\sigma)\, r_{yy}(\tau - \sigma)\, d\sigma \qquad [3.75]$$

Let us assume the observation to be white, with zero mean and variance $\sigma_y^2$, i.e. $S_{yy}(\omega) = \sigma_y^2$ and $r_{yy}(\tau) = \sigma_y^2\, \delta(\tau)$. The above equation becomes:

$$r_{s_1 y}(\tau) = \sigma_y^2 \int_{0}^{+\infty} h_{opt}(\sigma)\, \delta(\tau - \sigma)\, d\sigma. \qquad [3.76]$$

This leads to the expression of the optimal Wiener filter as a function of $r_{s_1 y}(\tau)$:

$$h_{opt}(t) = \begin{cases} \dfrac{1}{\sigma_y^2}\, r_{s_1 y}(t) & t \ge 0 \\ 0 & t < 0 \end{cases} \qquad [3.77]$$

The above hypothesis of a white observation is not a realistic one. Random stationary signals have a PSD which can be approximated by an even, rational, non-negative function. We thus have to look for a specific solution of the Wiener-Hopf equation for the case where:

$$S_{yy}(\omega) = \frac{N(\omega)}{D(\omega)} \qquad [3.78]$$
We can use the results obtained in section 3.2.3.3 on the whitening filter $h_1(t)$, which transforms the signal $y(t)$ into $y_1(t)$. The signal $y_1(t)$ is a white noise with $S_{y_1 y_1}(\omega) = 1$. The PSD of $y_1(t)$ is then given by:
$$S_{y_1 y_1}(\omega) = \left| H_1(\omega) \right|^2 S_{yy}(\omega) \qquad [3.79]$$

and, since $y_1(t)$ is a white noise with PSD = 1, we obtain:

$$\left| H_1(\omega) \right|^2 = \frac{1}{S_{yy}(\omega)} \qquad [3.80]$$

This signal $y_1(t)$ is then filtered by a physically realizable optimal filter whose impulse response is given by:

$$h_2(t) = \begin{cases} r_{s_1 y_1}(t) & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases} \qquad [3.81]$$

We decompose the filter whose impulse response is $h_2(t)$ into two cascaded filters. The first, whose frequency response is denoted $H_{2a}(\omega)$, compensates for the whitening. The second is defined by a frequency response $H_{2b}(\omega)$. The aim of this decomposition is to avoid the use of the whitening filter. Thus, $H_{2a}(\omega)$ is the inverse of $H_1(\omega)$, and the filter $H_{2b}(\omega)$ is the optimal filter:

$$H_{opt}(\omega) = H_1(\omega)\, H_2(\omega), \qquad H_{2b}(\omega) = \frac{H_2(\omega)}{H_{2a}(\omega)} \qquad [3.82]$$
Figure 3.8. Overall schematic of the process
The PSD of $y(t)$ can be approximated by a rational, even, non-negative function. Therefore:

$$S_{yy}(\omega) = \frac{N(\omega)}{D(\omega)} = \frac{\prod_{i=1}^{n} (j\omega - z_i)(j\omega - z_i^{*})}{\prod_{i=1}^{d} (j\omega - p_i)(j\omega - p_i^{*})} \qquad [3.83]$$

where $z_i$ and $p_i$ are, respectively, the $i$th zero and the $i$th pole of $S_{yy}(\omega)$. We choose the zeros $z_i$ lying in the left half of the s-plane, together with half of the zeros located on the imaginary axis, to form a polynomial $N^{+}(\omega)$. Similarly, we can form the polynomial $D^{+}(\omega)$. The filter $H_1(\omega)$ can thus be expressed as:
$$H_1(\omega) = \frac{D^{+}(\omega)}{N^{+}(\omega)} = \frac{1}{S_{yy}^{+}(\omega)} \qquad [3.84]$$

The expression for $H_{2a}(\omega)$ is then:

$$H_{2a}(\omega) = \frac{1}{H_1(\omega)} = \frac{N^{+}(\omega)}{D^{+}(\omega)} \qquad [3.85]$$
To calculate the impulse response $h_2(t)$ of the filter, we first need to calculate the cross-correlation function $r_{s_1 y_1}(\tau) = E\{ s_1(t + \tau)\, y_1(t) \}$, where:

$$y_1(t) = \int_{-\infty}^{+\infty} h_1(\tau)\, y(t - \tau)\, d\tau = \int_{-\infty}^{+\infty} h_1(u)\, y(t - u)\, du \qquad [3.86]$$

Thus:

$$r_{s_1 y_1}(\tau) = \int_{-\infty}^{+\infty} h_1(u)\, r_{s_1 y}(\tau + u)\, du = h_1(-\tau) * r_{s_1 y}(\tau) \qquad [3.87]$$
Applying the Fourier transform to this product, we obtain:

$$S_{s_1 y_1}(\omega) = \frac{D^{-}(\omega)}{N^{-}(\omega)}\, S_{s_1 y}(\omega) = \frac{S_{s_1 y}(\omega)}{S_{yy}^{-}(\omega)} \qquad [3.88]$$

where the polynomial $N^{-}(\omega)$ comprises the zeros of $S_{yy}(\omega)$ lying in the right half of the complex s-plane, together with half of the zero pairs placed on the imaginary axis. $D^{-}(\omega)$ is constructed in the same way, using the poles. Since the signal $y_1(t)$ is a white noise with PSD = 1, the optimal filter in the time domain is:

$$h_2(t) = \begin{cases} r_{s_1 y_1}(t) & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases} \qquad [3.89]$$

This is a causal filter. The cross-correlation function $r_{s_1 y_1}(t)$ is obtained by taking the causal part of the inverse Fourier transform of the cross-power spectral density $S_{s_1 y_1}(\omega)$.
Moreover, decomposing $S_{s_1 y_1}(\omega)$ gives:

$$S_{s_1 y_1}(\omega) = \left[ S_{s_1 y_1}(\omega) \right]_{+} + \left[ S_{s_1 y_1}(\omega) \right]_{-} \qquad [3.90]$$

where the functions $\left[ S_{s_1 y_1}(\omega) \right]_{+}$ and $\left[ S_{s_1 y_1}(\omega) \right]_{-}$ are determined by combining the terms of the Laurent series of $S_{s_1 y_1}(\omega)$ into two groups:
– $\left[ S_{s_1 y_1}(\omega) \right]_{+}$ corresponds to the poles of the left half-plane; and
– $\left[ S_{s_1 y_1}(\omega) \right]_{-}$ to the poles of the right half-plane.

The inverse Fourier transform of $\left[ S_{s_1 y_1}(\omega) \right]_{+}$ leads to the causal component $r_{s_1 y_1}^{+}(t)$, while the inverse Fourier transform of $\left[ S_{s_1 y_1}(\omega) \right]_{-}$ leads to the anti-causal part $r_{s_1 y_1}^{-}(t)$. $r_{s_1 y_1}^{+}(t)$ thus satisfies the following condition:

$$r_{s_1 y_1}^{+}(t) = \begin{cases} \dfrac{1}{2\pi} \displaystyle\int_{-\infty}^{+\infty} \left[ S_{s_1 y_1}(\omega) \right]_{+} e^{j\omega t}\, d\omega & t \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad [3.91]$$

Using [3.89], we get $h_2(t) = r_{s_1 y_1}^{+}(t)$ which, when translated into the frequency domain, gives:

$$H_2(\omega) = \left[ S_{s_1 y_1}(\omega) \right]_{+} = \left[ \frac{D^{-}(\omega)}{N^{-}(\omega)}\, S_{s_1 y}(\omega) \right]_{+} \qquad [3.92]$$

The optimal filter thus satisfies:

$$H_{opt}(\omega) = H_{2b}(\omega) = \frac{D^{+}(\omega)}{N^{+}(\omega)} \left[ \frac{D^{-}(\omega)}{N^{-}(\omega)}\, S_{s_1 y}(\omega) \right]_{+} \qquad [3.93]$$
Equivalently:

$$H_{opt}(\omega) = \frac{1}{S_{yy}^{+}(\omega)} \left[ \frac{S_{s_1 y}(\omega)}{S_{yy}^{-}(\omega)} \right]_{+} \qquad [3.94]$$

If the signal $s_1(t)$ and the noise $b_1(t)$ are not correlated:

$$S_{s_1 y}(\omega) = S_{s_1 s_1}(\omega) \quad \text{and} \quad S_{yy}(\omega) = S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega).$$

Taking the above two equations into account, equation [3.94] becomes [7]:

$$H_{opt}(\omega) = \frac{1}{\left[ S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega) \right]^{+}} \left[ \frac{S_{s_1 s_1}(\omega)}{\left[ S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega) \right]^{-}} \right]_{+} \qquad [3.95]$$
So far, we have laid emphasis on rational spectra, which can be used to approximate non-rational spectra. The necessary and sufficient condition for factorability is that the integral

$$\int_{-\infty}^{+\infty} \frac{\left| \log S_{yy}(\omega) \right|}{1 + \omega^2}\, d\omega$$

be convergent.

Let us determine the optimal filter knowing that $S_{b_1 b_1}(\omega) = k^2$, i.e. when $b_1$ is a zero-mean white noise, and $S_{s_1 s_1}(\omega) = \dfrac{1}{1 + \omega^2}$. Upon setting $\beta^2 = \dfrac{1 + k^2}{k^2}$, we have:

$$S_{yy}(\omega) = S_{s_1 s_1}(\omega) + S_{b_1 b_1}(\omega) = \frac{1}{1 + \omega^2} + k^2 = k^2\, \frac{\beta^2 + \omega^2}{1 + \omega^2}$$

$$S_{yy}^{+}(\omega) = k\, \frac{\beta + j\omega}{1 + j\omega}, \qquad S_{yy}^{-}(\omega) = k\, \frac{\beta - j\omega}{1 - j\omega}$$

and:

$$S_{s_1 y}(\omega) = S_{s_1 s_1}(\omega) = \frac{1}{1 + \omega^2}$$

Thus:

$$\frac{S_{s_1 y}(\omega)}{S_{yy}^{-}(\omega)} = \frac{1}{k\, (\beta - j\omega)(1 + j\omega)}$$

and:

$$\left[ \frac{S_{s_1 y}(\omega)}{S_{yy}^{-}(\omega)} \right]_{+} = \left[ \frac{1}{k(\beta + 1)} \left( \frac{1}{\beta - j\omega} + \frac{1}{1 + j\omega} \right) \right]_{+} = \frac{1}{k(\beta + 1)}\, \frac{1}{1 + j\omega}$$

Finally:

$$H_{opt}(j\omega) = \frac{1}{k^2\, (\beta + 1)\, (\beta + j\omega)}$$

Having presented the Wiener filter for continuous-time signals, we now look at the filter for discrete-time signals.

3.3.6. Discrete-time Wiener filter
In this section, we will describe a digital filter, with impulse response h(k ) , which filters a signal x(k ) to produce an output dˆ ( k ) which is closest to the desired output d (k ) .
Figure 3.9. Discrete-time Wiener filter
3.3.6.1. Finite impulse response (FIR) Wiener filter

Let us take a digital filter with impulse response parameters $\{h(i)\}_{i = 0, \dots, N-1}$ concatenated in a column vector $H_N$ as follows:

$$H_N^T = \left[ h(0)\ \cdots\ h(N-1) \right] \qquad [3.96]$$

Let us also take $N$ samples of the input signal, denoted $x(k)$, $x(k-1)$, …, $x(k-(N-1))$, stored in a vector $X_N(k)$ as follows:

$$X_N^T(k) = \left[ x(k)\ \cdots\ x(k-(N-1)) \right] \qquad [3.97]$$

The output signal $\hat{d}(k)$ will thus have the following expression:

$$\hat{d}(k) = H_N^T X_N(k) = X_N^T(k)\, H_N = \sum_{i=0}^{N-1} h(i)\, x(k-i).$$

We can define the error signal $e(k)$ between the desired response and the filter output:

$$e(k) = d(k) - \hat{d}(k) = d(k) - H_N^T X_N(k) = d(k) - X_N^T(k)\, H_N \qquad [3.98]$$
The mean square error (MSE), denoted $J(k)$, is defined as follows:

$$J(k) = E\left\{ e^2(k) \right\} = E\left\{ \left[ d(k) - H_N^T X_N(k) \right]^2 \right\} \qquad [3.99]$$

Expanding the above definition and using the linearity of the mathematical expectation, we obtain:

$$J(k) = E\left\{ d^2(k) \right\} - 2 H_N^T E\left\{ d(k)\, X_N(k) \right\} + H_N^T E\left\{ X_N(k)\, X_N^T(k) \right\} H_N = E\left\{ d^2(k) \right\} - 2 H_N^T R_{dx} + H_N^T R_{xx} H_N \qquad [3.100]$$

where $R_{dx} = E\{ d(k)\, X_N(k) \}$ is the $N \times 1$ cross-correlation vector between the desired response $d(k)$ and the input signal, and $R_{xx} = E\{ X_N(k)\, X_N^T(k) \}$ is the
$N \times N$ autocorrelation matrix of the input signal. It should be noted that $J(k)$ is a quadratic form in $H_N$. We can obtain the minimum value of $J(k)$ by setting:

$$\left. \frac{\partial J(k)}{\partial H_N} \right|_{H_N = H_N^{opt}} = 0$$

Equivalently:

$$\left. \left[ \frac{\partial J(k)}{\partial h(0)}\ \cdots\ \frac{\partial J(k)}{\partial h(N-1)} \right]^T \right|_{H_N = H_N^{opt}} = 0$$

leading to:

$$2 R_{xx} H_N^{opt} - 2 R_{dx} = 0$$

or:

$$H_N^{opt} = R_{xx}^{-1} R_{dx} \qquad [3.101]$$
where $H_N^{opt}$ is the optimal Wiener solution. The above equation, which determines the minimum-MSE filter coefficients, is the Wiener-Hopf or normal equation. It is the discrete-time equivalent of the Wiener-Hopf equation of the continuous-time case. The estimation of the signal autocorrelation matrix $R_{xx}$ and of the cross-correlation vector $R_{dx}$ requires a large number of realizations, which is not realistic in practice. We can calculate the minimum error by replacing the filter $H_N^{opt}$ by its expression given in equation [3.101] and inserting it into equation [3.100]:
$$\begin{aligned} J_{min} &= E\left\{ d^2(k) \right\} - 2 \left( R_{xx}^{-1} R_{dx} \right)^T R_{dx} + \left( R_{xx}^{-1} R_{dx} \right)^T R_{xx} \left( R_{xx}^{-1} R_{dx} \right) \\ &= E\left\{ d^2(k) \right\} - 2 R_{dx}^T R_{xx}^{-1} R_{dx} + R_{dx}^T R_{xx}^{-1} R_{dx} \\ &= E\left\{ d^2(k) \right\} - R_{dx}^T R_{xx}^{-1} R_{dx} \\ &= E\left\{ d^2(k) \right\} - R_{dx}^T H_N^{opt} \end{aligned} \qquad [3.102]$$
The error, given by equation [3.100], can be expressed in the following form [8]:
$$J(k) = J_{min} + \left[ H_N - H_N^{opt} \right]^T R_{xx} \left[ H_N - H_N^{opt} \right] \qquad [3.103]$$

Indeed, let us expand equation [3.103]:

$$J(k) = J_{min} + H_N^T R_{xx} H_N + H_N^{opt\,T} R_{xx} H_N^{opt} - H_N^{opt\,T} R_{xx} H_N - H_N^T R_{xx} H_N^{opt} \qquad [3.104]$$
All the terms of equation [3.104] are scalar quantities and thus equal to their transposes. Thus, we notice that the last two terms are equal to one another. Replacing J min by its expression, equation [3.104] is modified to:
$$J(k) = E\left\{ d^2(k) \right\} - R_{dx}^T H_N^{opt} + H_N^T R_{xx} H_N + H_N^{opt\,T} R_{xx} H_N^{opt} - 2 H_N^T R_{xx} H_N^{opt}$$

Similarly, replacing $H_N^{opt}$ by its expression [3.101] gives:

$$J(k) = E\left\{ d^2(k) \right\} - R_{dx}^T R_{xx}^{-1} R_{dx} + H_N^T R_{xx} H_N + \left( R_{xx}^{-1} R_{dx} \right)^T R_{dx} - 2 H_N^T R_{dx} = E\left\{ d^2(k) \right\} + H_N^T R_{xx} H_N - 2 H_N^T R_{dx}$$

If we introduce the term $V_N = H_N - H_N^{opt}$, which is the difference between the filter $H_N$ and the optimal Wiener filter $H_N^{opt}$, we can rewrite equation [3.103] as follows:

$$J(k) = J_{min} + V_N^T R_{xx} V_N \qquad [3.105]$$
$R_{xx}$ is positive by definition, since it is an autocorrelation matrix. Thus, $V_N^T R_{xx} V_N \ge 0$ for all $V_N \ne 0$. Consequently, $J(k)$ is a positive quantity. Considering equation [3.103]:

$$J(k) = J_{min} + \left[ H_N - H_N^{opt} \right]^T R_{xx} \left[ H_N - H_N^{opt} \right] \qquad [3.103]$$

$J(k)$ depends on the autocorrelation matrix $R_{xx}$, specifically on its eigenvalues $\{\lambda_i\}_{i = 1, \dots, N}$ and the corresponding eigenvectors $\{\psi_i\}_{i = 1, \dots, N}$. We recall that:

$$R_{xx}\, \psi_i = \lambda_i\, \psi_i, \quad \forall i$$

and the eigenvalue decomposition of the autocorrelation matrix $R_{xx}$ gives:

$$R_{xx} = P D P^{-1}$$
where $D$ is the diagonal matrix of the eigenvalues and $P$ is the matrix constructed from the orthogonal eigenvectors. We can also choose normalized eigenvectors; this makes $P$ an orthonormal matrix. This last expression is known as the normal form of $R_{xx}$.

Let us consider equation [3.98]:

$$e(k) = d(k) - X_N^T(k)\, H_N$$

Multiplying both sides by $X_N(k)$ and then taking the mathematical expectation, we obtain:

$$X_N(k)\, e(k) = X_N(k)\, d(k) - X_N(k)\, X_N^T(k)\, H_N$$

Thus:

$$E\left[ e(k)\, X_N(k) \right] = R_{dx} - R_{xx} H_N \qquad [3.106]$$
We notice that if the filter takes its optimal value, i.e. if $H_N = H_N^{opt}$, then $E\left[ e(k)\, X_N(k) \right] = 0$. This means that the error is orthogonal to the signal vector $X_N(k)$. For this reason, the system of equations $R_{xx} H_N = R_{dx}$ is called the normal equations.
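The sketch below illustrates the FIR Wiener solution [3.101] on a small identification example: $R_{xx}$ and $R_{dx}$ are estimated by time averaging (a practical substitute for the ensemble averages used in the text), the normal equations are solved, and the orthogonality property [3.106] is checked numerically. The filter order, data length and "true" coefficients are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 4, 20_000
x = rng.standard_normal(L)
true_h = np.array([0.6, -0.3, 0.2, 0.1])
d = np.convolve(x, true_h, mode="full")[:L] + 0.05 * rng.standard_normal(L)

X = np.column_stack([np.roll(x, i) for i in range(N)])   # rows ~ X_N(k)^T
X[:N, :] = 0.0                                           # discard wrap-around rows
Rxx = X.T @ X / L                                        # estimate of R_xx
Rdx = X.T @ d / L                                        # estimate of R_dx
H_opt = np.linalg.solve(Rxx, Rdx)                        # normal equations [3.101]

e = d - X @ H_opt
print("H_opt  :", np.round(H_opt, 3))                    # close to true_h
print("E[e X] :", np.round(X.T @ e / L, 4))              # orthogonality principle [3.106]
```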
3.3.6.2. Infinite impulse response (IIR) Wiener filter

In this section, our purpose is to find a digital filter with infinite impulse response $h_{opt}(k)$. Given the filter input $x(k)$, the filter output $\hat{d}(k)$ satisfies:

$$\hat{d}(k) = \sum_{i=-\infty}^{+\infty} h_{opt}(i)\, x(k-i) \qquad [3.107]$$

Let us express the mean square error $J(k)$ by comparing the output $\hat{d}(k)$ of the filter with the desired output $d(k)$:

$$J(k) = E\left\{ e^2(k) \right\} = E\left\{ \left[ d(k) - \hat{d}(k) \right]^2 \right\} = E\left\{ \left[ d(k) - \sum_{i=-\infty}^{+\infty} h_{opt}(i)\, x(k-i) \right]^2 \right\} \qquad [3.108]$$
J (k )
f
f f
i f
i f j f
rdd (0) 2
E^xk d (k i )` and
¦ hopt (i)rdx (i) ¦ ¦ hopt (i)hopt ( j )rxx (i j )
[3.109]
To minimize J (k ) , we obtain: f
¦ hopt (i)rxx ( j i)
i f
rdx ( j ) j
hopt ( j ) * rxx ( j )
rdx ( j ) j
This is the discrete Wiener-Hopf equation for non-causal filters.
[3.110]
Matched and Wiener Filters
139
If we apply the discrete Fourier transform to both sides of equation [3.110], the transfer function of the optimal filter becomes:
$$H_{opt}(\omega) = \frac{S_{dx}(\omega)}{S_{xx}(\omega)} \qquad [3.111]$$

where $S_{dx}(\omega) = \sum_{k=-\infty}^{+\infty} r_{dx}(k)\, e^{-j\omega k}$ and $S_{xx}(\omega) = \sum_{k=-\infty}^{+\infty} r_{xx}(k)\, e^{-j\omega k}$ are, respectively, the cross-power spectral density and the power spectral density of the filter input signal.
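As a simple illustration of equation [3.111], the sketch below estimates $S_{dx}(\omega)$ and $S_{xx}(\omega)$ by averaging periodograms over segments and forms the non-causal optimal filter as their ratio. The data model, segment length and variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(4)
L, seg = 64_000, 256
d = rng.standard_normal(L)                            # "desired" signal
x = d + 0.7 * rng.standard_normal(L)                  # noisy input to the filter

Sdx = np.zeros(seg, dtype=complex)
Sxx = np.zeros(seg)
for i in range(0, L - seg, seg):
    D = np.fft.fft(d[i:i + seg])
    X = np.fft.fft(x[i:i + seg])
    Sdx += D * np.conj(X)                             # cross-spectrum accumulation
    Sxx += np.abs(X) ** 2                             # input spectrum accumulation

H_opt = Sdx / Sxx                                     # equation [3.111]
print("mean gain:", np.round(np.mean(np.real(H_opt)), 2))   # near 1/(1 + 0.49) ~ 0.67
```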
Based on the above theoretical analysis, let us apply this optimal filtering to speech enhancement in the following section.

3.3.7. Application of the non-causal discrete Wiener filter to speech enhancement
3.3.7.1. Modified filter expression

Let there be a noisy signal, or message, defined by:

$$x(k) = s(k) + b(k) \qquad [3.112]$$

where $s(k)$ is the speech signal and $b(k)$ a zero-mean additive white noise. Let us filter the noisy signal $x(k)$ using a non-causal Wiener filter to enhance the signal $s(k)$. This filter is given by equation [3.111]:

$$H_{opt}(\omega) = \frac{S_{sx}(\omega)}{S_{xx}(\omega)} \qquad [3.113]$$

We suppose that the speech signal and the noise are not correlated:

$$S_{xx}(\omega) = S_{ss}(\omega) + S_{bb}(\omega) \quad \text{and} \quad S_{sx}(\omega) = S_{ss}(\omega) \qquad [3.114]$$

The optimal Wiener filter is thus defined as follows:

$$H_{opt}(\omega) = \frac{S_{ss}(\omega)}{S_{ss}(\omega) + S_{bb}(\omega)} \qquad [3.115]$$
To enhance the speech signal, we first apply a Hanning window to the noisy signal $x(k)$, giving rise to the windowed signal $x_w(k)$. The length of the analysis frames
varies between 10 and 30 ms. The different frames may be totally independent or overlapping; an overlap in the range of 50% to 70% is generally adopted. In Chapter 6, we will provide more details on this matter.

In order to obtain the short-term estimation $\hat{S}_w(\omega)$ of the original spectrum $S_w(\omega)$, we apply the Wiener filter to the noisy windowed signal $x_w(k)$, denoted $X_w(\omega)$ in the frequency domain:

$$\hat{S}_w(\omega) = H_{opt}(\omega)\, X_w(\omega) \qquad [3.116]$$

Finally, an inverse Fourier transform allows us to estimate the windowed signal. The reconstruction of the enhanced signal is done by applying the overlap-add method. The Wiener filter is a zero-phase filter; we thus take the phase of the estimated signal to be the same as the phase of the noisy signal, because the human ear is insensitive to the phase variations of a signal.

To calculate the optimal filter in equation [3.115], we need to estimate the PSD of the speech signal as well as the PSD of the noise. Since the speech signal is non-stationary, we can use $|S_w(\omega)|^2$, where $S_w(\omega)$ is the short-term Fourier transform of the signal $s_w(k)$, instead of the signal's PSD $S_{ss}(\omega)$. The noise PSD $S_{bb}(\omega)$ is approximated by averaging $|B_w(\omega)|^2$ over several "silent" frames. This average is denoted $\overline{|B_w(\omega)|^2}$, where $B_w(\omega)$ is the short-term Fourier transform of the noise $b_w(k)$. Taking the above considerations into account, the expression for the Wiener filter is modified to:

$$H_{opt}(\omega) = \frac{|S_w(\omega)|^2}{|S_w(\omega)|^2 + \overline{|B_w(\omega)|^2}} \qquad [3.117]$$
Matched and Wiener Filters
Sˆ w (Z )
Sˆ w (Z )
141
2
2
Sˆ w (Z ) B w (Z )
X w (Z )
[3.118]
2
The first solution of this equation is the zero solution, while the second is given by: ª 2 2 Sˆ w (Z ) « X w (Z ) B w (Z ) «¬
º » »¼
1/ 2
[3.119]
This expression is similar to that obtained upon the technique called spectral subtraction [2] [3]. To alleviate the musical noise4, a noise-suppression factor D can be considered such that:
Sˆ w (Z )
ª 2 Sˆ w (Z ) « « « 2 2 « Sˆ w (Z ) D B w (Z ) ¬
º » » » » ¼
1/ a
X w (Z )
[3.120]
A variant of the Wiener filter, called the modified Wiener filter, has been developed in [1]. The advantage conferred by this non-iterative approach is that we can reduce the additive noise without introducing any distortion to the speech signal. In that case, the signal’s spectral density is estimated as follows: S ss (Z )
E x Eb S xx (Z ) Ex E x Eb 2 X w (Z ) Ex
[3.121]
where E x and Eb are, respectively, the energies of the noisy signal and of the noise. Equations [3.120] and [3.121] for a = 2 give the following expression for the Wiener filter: 4 A musical noise is any residual noise whose tonal components are “randomly” dispersed in time and frequency.
$$H_{opt}(\omega) = \left[ \frac{|X_w(\omega)|^2}{|X_w(\omega)|^2 + \alpha\, \dfrac{E_x}{E_x - E_b}\, \overline{|B_w(\omega)|^2}} \right]^{1/2} \qquad [3.122]$$

For any given value of $a$, we can also work in an iterative manner [6]. At each iteration $i$, we estimate the optimal filter using the estimation $|\hat{S}_w(\omega, i-1)|^2$:

$$H(\omega, i) = \left[ \frac{|\hat{S}_w(\omega, i-1)|^2}{|\hat{S}_w(\omega, i-1)|^2 + \beta\, \overline{|B_w(\omega)|^2}} \right]^{1/a} \qquad [3.123]$$

We can then deduce the estimate $\hat{S}_w(\omega, i)$ as follows:

$$\hat{S}_w(\omega, i) = H(\omega, i)\, X_w(\omega) \qquad [3.124]$$
Figure 3.10 shows how the speech is enhanced: the noisy frame $x_w(k)$ is transformed by a short-term Fourier transform, attenuated at each frequency according to the suppression rule (with $\overline{|B_w(\omega)|^2}$ estimated during the "silent" intervals), and the enhanced frame $\hat{s}_w(k)$ is recovered by an inverse short-term Fourier transform.

Figure 3.10. Schematic of speech enhancement
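A minimal sketch of the processing chain of Figure 3.10 is given below: Hanning-windowed frames, a Wiener-type attenuation at each frequency built from a noise PSD estimated on a "silent" segment, and overlap-add reconstruction. The frame length, the overlap and the availability of a noise-only segment are assumptions of this illustration, not prescriptions from the text.

```python
import numpy as np

def enhance(x, noise_only, frame=256, hop=128):
    win = np.hanning(frame)
    # average noise PSD over "silent" frames
    nframes = [noise_only[i:i + frame] * win
               for i in range(0, len(noise_only) - frame, hop)]
    Bw2 = np.mean([np.abs(np.fft.fft(f)) ** 2 for f in nframes], axis=0)

    out = np.zeros(len(x))
    for i in range(0, len(x) - frame, hop):
        Xw = np.fft.fft(x[i:i + frame] * win)
        Sw2 = np.maximum(np.abs(Xw) ** 2 - Bw2, 0.0)      # spectral-subtraction estimate
        gain = Sw2 / (Sw2 + Bw2)                          # Wiener-type attenuation [3.117]
        out[i:i + frame] += np.real(np.fft.ifft(gain * Xw)) * win   # overlap-add
    return out
```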
3.3.7.2. Experimental results

Let us consider the example of speech enhancement inside a high-end vehicle traveling at 130 km/h⁵ on the highway. Figure 3.11 shows the PSD of the noise recorded in the vehicle and the time-domain noisy signal; Figure 3.12 shows the Wiener-filtered signal and the original speech signal.
Figure 3.11. Noise spectrum (a) and time-domain representation of the noisy signal (b)
Figure 3.12. Enhanced signal (a) and original speech signal (b)
3.3.7.3. Enhancement using a combination of the AR model and the non-causal Wiener filter

Let the signal be represented by a $p$th-order autoregressive process:

$$s(k) = -\sum_{i=1}^{p} a_i\, s(k-i) + u(k) \qquad [3.125]$$

5. These signals have been kindly provided by the Matra Company.
where $s(k)$ is the speech signal, $\{a_i\}_{i = 1, \dots, p}$ are the prediction coefficients, and $u(k)$ is the white Gaussian zero-mean input with variance $\sigma_u^2$. Moreover, the additive noise $b(k)$ is a white Gaussian zero-mean noise with variance $\sigma_b^2$. Let the signal column vector, the noisy observation vector and the AR parameter vector be defined respectively as follows:

$$s = [ s(N-1)\ \cdots\ s(0) ]^T \qquad [3.126]$$

$$y = [ y(N-1)\ \cdots\ y(0) ]^T \qquad [3.127]$$

$$\theta = [ a_1\ \cdots\ a_p ]^T \qquad [3.128]$$

The method we propose consists of obtaining the maximum a posteriori estimation of the model parameters. The estimation of the coefficient vector $\theta$ should thus maximize the a posteriori probability:

$$\hat{\theta} = \operatorname*{argmax}_{\theta}\ p(\theta \mid y) \qquad [3.129]$$

Lim introduces an iterative method which leads to a suboptimal solution [6]. At each iteration, two steps are followed:

– the coefficients of the AR model are first estimated using the maximum a posteriori technique:

$$\hat{\theta}_n = \operatorname*{argmax}_{\theta}\ p(\theta \mid \hat{s}_{n-1}, y, \hat{\sigma}_{u, n-1}^2) \qquad [3.130]$$

where $\hat{s}_{n-1}$ is the estimation of the speech vector at iteration $n-1$ and $\hat{\sigma}_{u, n-1}^2$ is the estimation of the variance of the process $u(k)$ at iteration $n-1$. In practical cases, the correlation method is used to estimate the coefficients. To update the variance of the signal $u(k)$, we use the Parseval theorem as follows:
$$\hat{\sigma}_{u,n}^2 = \frac{\displaystyle\sum_{k=0}^{N-1} y^2(k) - N \hat{\sigma}_b^2}{\dfrac{N}{2\pi} \displaystyle\int_{-\pi}^{\pi} \frac{d\omega}{\left| 1 + \sum_{i=1}^{p} \hat{a}_{i,n}\, e^{-ji\omega} \right|^2}} \qquad [3.131]$$
where $\hat{\sigma}_b^2$ is the estimation of the additive noise's variance, calculated or updated during the "silent" intervals;

– in the second step, we estimate the speech signal vector:

$$\hat{s}_n = \operatorname*{argmax}_{s}\ p(s \mid \hat{\theta}_n, y, \hat{\sigma}_{u,n}^2)$$

This can be carried out using a non-causal Wiener filter:

$$H(\omega, k) = \frac{\hat{S}_{ss}(\omega, k)}{\hat{S}_{ss}(\omega, k) + \hat{\sigma}_b^2} \qquad [3.132]$$

where the estimation of the short-term PSD of the speech, at iteration $n$, is expressed as follows:

$$\hat{S}_{ss}(\omega, k) = \frac{\hat{\sigma}_{u,n}^2}{\left| 1 + \sum_{i=1}^{p} \hat{a}_{i,n}\, e^{-ji\omega} \right|^2} \qquad [3.133]$$

The schematic of this method is given in Figure 3.13: the noisy signal $x(k) = s(k) + b(k)$ is passed through the Wiener filter, whose coefficients are re-estimated from the enhanced output $\hat{s}(k)$ at each iteration.

Figure 3.13. A non-causal Wiener filter using an autoregressive signal model
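The sketch below is a simplified variant of the two-step iteration of Figure 3.13: AR coefficients are estimated on the current speech estimate by the correlation method, and the noisy frame is then re-filtered with the non-causal Wiener filter of equations [3.132]-[3.133]. The model order, number of iterations and known noise variance are assumptions of this illustration; the coefficients `a` below are prediction coefficients, which differ from the book's $a_i$ by a sign convention, and the MAP/Parseval variance update [3.131] is replaced by the simpler correlation-method estimate.

```python
import numpy as np

def ar_coeffs(s, p):
    r = np.array([np.dot(s[:len(s) - i], s[i:]) / len(s) for i in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])                  # s(k) ~ sum_i a_i s(k-i) (prediction form)
    sigma_u2 = r[0] - np.dot(a, r[1:])             # residual (excitation) variance
    return a, sigma_u2

def iterative_wiener(y, sigma_b2, p=10, n_iter=4):
    s_hat = y.copy()
    Nfft = len(y)
    W = np.exp(-2j * np.pi * np.outer(np.fft.fftfreq(Nfft), np.arange(1, p + 1)))
    for _ in range(n_iter):
        a, sigma_u2 = ar_coeffs(s_hat, p)
        Sss = sigma_u2 / np.abs(1.0 - W @ a) ** 2  # short-term AR spectrum, cf. [3.133]
        H = Sss / (Sss + sigma_b2)                 # non-causal Wiener filter, cf. [3.132]
        s_hat = np.real(np.fft.ifft(H * np.fft.fft(y)))
    return s_hat
```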
Even though this approach does not add any musical noise, it presents two major disadvantages:
– the number of iterations varies according to the nature of the frame being processed (voiced⁶, unvoiced⁷, silence, etc.);
– distortion of the speech in the enhanced signal can appear if the number of iterations is increased beyond a certain limit.

To alleviate the above problems, the following constraints are imposed on the coefficients obtained after each iteration, so as to obtain a stable model:
– the poles must remain close to the unit circle and to each other in the z-plane;
– there should be no abrupt change in the locations of the poles from one speech frame to another, or from one iteration to another.

The latter condition is difficult to satisfy in practice, because it requires the poles to be computed permanently at each point in time. To reduce the computation cost, Hansen and Clements propose the use of line spectrum pairs [5]; see Appendix D for more details. Using the properties of the polynomials P(z) and Q(z), Hansen and Clements propose two types of constraints: inter-frame and intra-frame. The inter-frame constraints consist of smoothing the position coefficients using a triangular window, whose width is chosen according to the type of speech frame (voiced, unvoiced, silence, etc.). The position coefficients are closely related to the positions of the formants. For the first coefficient, the smallest window width is chosen so as not to disturb the perceptual characteristics of the speech. The intra-frame constraints may be imposed on the position coefficients or on the autocorrelation function.

6. A sound is called "voiced" if its production is accompanied by a vibration of the vocal cords. It possesses a quasi-periodic character. Examples of such sounds are vowels.
7. A sound is called "unvoiced" if no vibration of the vocal cords accompanies the production of the sound. Examples are the sounds /p/, /t/ or /k/.

3.4. References

[1] L. Arslan, A. McCree, V. Viswanathan, "New Methods for Adaptive Noise Suppression", IEEE-ICASSP '95, Detroit, Michigan, USA, vol. 1, pp. 812-815, 8-12 May 1995.
[2] M. Berouti, R. Schwartz, J. Makhoul, "Enhancement of Speech Corrupted by Acoustic Noise", IEEE-ICASSP '79, Washington DC, USA, pp. 208-211, 2-4 April 1979.
[3] S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, pp. 113-120, April 1979.
[4] H. L. Van Trees, Detection, Estimation and Modulation Theory, Part I, John Wiley & Sons, Inc., 1968.
[5] J. H. L. Hansen, M. A. Clements, "Iterative Speech Enhancement with Spectral Constraints", IEEE-ICASSP '87, Dallas, Texas, vol. 1, pp. 189-192, 6-9 April 1987.
[6] J. S. Lim, A. V. Oppenheim, "Enhancement and Bandwidth Compression of Noisy Speech", Proceedings of the IEEE, vol. 67, no. 12, pp. 1586-1604, December 1979.
[7] J. B. Thomas, An Introduction to Statistical Signal Processing, John Wiley & Sons, Inc., 1969.
[8] B. Widrow, S. D. Stearns, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1985.
[9] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiley & Sons, New York, 1949.
[10] L. A. Zadeh, J. R. Ragazzini, "Optimum Filters for the Detection of Signals in Noise", Proc. IRE, vol. 40, pp. 1223, 1952.
Chapter 4
Adaptive Filtering
4.1. Introduction

The classification of adaptive algorithms can follow various rules. Nonetheless, all recursive approaches can be written in the following generalized form:
$$\theta(k+1) = \theta(k) + K(k)\, F\big(\theta(k), x(k)\big) \qquad [4.1]$$

where all the parameters combined in the vector $\theta$ are updated using the function $F(\cdot)$. This function is specific to each particular algorithm and generally depends on a state vector $x(k)$. The parameter $K(k)$ is a weighting coefficient. Its expression depends on the particular algorithm that is studied. In addition, $K(k)$ may be used to respect a particular optimization criterion, to ensure the convergence of the algorithm, etc.

This chapter is dedicated to recursive algorithms which require no prior information. These algorithms are versatile: they adjust themselves according to the statistical analysis carried out on the observed signals. For our purposes, an adaptive filter will be defined as a digital filter whose coefficients are updated over time according to appropriate criteria. As shown in Figure 4.1, $X_N(k)$ is the vector which concatenates the last $N$ values, up to the instant $k$, of the input signal; $y(k)$ denotes the filter output and $d(k)$ the desired
response. We have to find the optimal filter that minimizes the error $e(k)$, with respect to appropriate criteria, between the filter output and the desired response [13] [16] [23].

Figure 4.1. Adaptive filter
The least-squares criterion is most often used for this filter because it leads to simple results. The filters can either be finite impulse response (FIR) or infinite impulse response (IIR) filters; their structure is either transversal or lattice. In what follows, we will pay close attention to $N$th-order adaptive FIR filters. The aim is to progressively reduce the difference between the optimal filter $H_N^{opt}$ and the adaptive filter $H_N(k)$ by successive iterations, starting from arbitrary initial values for the adaptive filter coefficients.

Two main approaches can be considered for solving the least squares question:
– the first is based on the recursive least squares (RLS) algorithm, also called the exact least squares algorithm¹;
– the second is based on the stochastic gradient method and is generally known in signal and image processing as the least mean squares (LMS) algorithm.

We will emphasize this second approach and present the following two variants of the LMS:
- the normalized least mean squares (NLMS),
- the affine projection algorithm (APA).

We will also compare the LMS to the exact least squares algorithm.

1. This elaboration is similar to the recursive estimation of the AR parameters presented in Chapter 2. To keep homogenous notations, we briefly take up the RLS estimator here. Moreover, this allows a direct comparison with the performance of the second approach presented above.
The stochastic gradient family of algorithms is very popular thanks to the simplicity of its implementation, even though the underlying approximation is far from straightforward to justify mathematically. It has been analyzed from different probabilistic viewpoints, ranging from martingale theory to the theory of large deviations (the probabilistic analysis of rare events), by relating it to stochastic approximation theory [1] [20] [24] or to ordinary differential equation (ODE) analysis [14] [15]. More recently, researchers at Stanford have shown that the LMS is optimal in the sense of the $H^{\infty}$ norm [11].

We will notice as we go along that the different algorithms are distinguished by their:
– precision;
– numerical complexity;
– convergence speed, i.e. the behavior of the difference between the optimal filter and the adaptive filter constructed from successive iterations.

Moreover, their behavior depends on the type of input and on the presence or absence of additive noise. The use of these algorithms will be illustrated through the estimation of the AR parameters and through speech enhancement.

4.2. Recursive least squares algorithm

We will first present the RLS family of algorithms. Since 1950, a vast amount of work has been carried out on this topic. It is generally accepted that Plackett [19] first introduced them. Since the 1970s, some faster versions of this algorithm have been developed.

4.2.1. Exact RLS method

Throughout this section, we will use the following notation:
$$H_N(k) = \left[ h_0(k)\ \cdots\ h_{N-1}(k) \right]^T \qquad [4.2]$$

$$X_N(i) = \left[ x(i)\ \cdots\ x(i - N + 1) \right]^T \qquad [4.3]$$
In Figure 4.1, we see that the filter output signal $y(i/k)$ at time instant $i$ satisfies:

$$y(i/k) = \sum_{l=0}^{N-1} h_l(k)\, x(i-l) = H_N^T(k)\, X_N(i) = X_N^T(i)\, H_N(k) \qquad [4.4]$$

The least squares criterion $J(k)$ consists of minimizing the sum of the squares of the elementary errors between the desired response and the filter output, i.e.:

$$J(k) = \sum_{i=1}^{k} \left[ d(i) - y(i/k) \right]^2 \qquad [4.5]$$
Combining equations [4.4] and [4.5], we obtain:

$$J(k) = \sum_{i=1}^{k} d^2(i) - 2 \sum_{i=1}^{k} H_N^T(k)\, X_N(i)\, d(i) + \sum_{i=1}^{k} H_N^T(k)\, X_N(i)\, X_N^T(i)\, H_N(k) \qquad [4.6]$$

Using the same approach as previously described in Chapter 2, we can determine the filter $H_N(k)$ which minimizes $J(k)$:

– firstly, $\nabla J(k)$, the gradient of $J(k)$, defined as the column vector of the $N$ partial derivatives of $J(k)$ with respect to the filter coefficients $\{h_j(k)\}_{j = 0, \dots, N-1}$, should be zero:

$$\frac{\partial J(k)}{\partial h_j} = 0, \quad j = 0, \dots, N-1 \qquad [4.7]$$

– moreover, the $N \times N$ Hessian matrix $H_J$, composed of the second partial derivatives $\dfrac{\partial^2 J(k)}{\partial h_i\, \partial h_j}$ of $J(k)$, must be positive definite.
Taking into account equations [4.6] and [4.7], we obtain the following equality²:

$$R_{xx}(k)\, H_N(k) - R_{xd}(k) = 0 \qquad [4.8]$$

where:

$$R_{xx}(k) = \sum_{i=1}^{k} X_N(i)\, X_N^T(i) \qquad [4.9]$$

and:

$$R_{xd}(k) = \sum_{i=1}^{k} X_N(i)\, d(i) \qquad [4.10]$$

If $R_{xx}(k)$ is invertible, we can express $H_N(k)$ as follows:

$$H_N(k) = R_{xx}^{-1}(k)\, R_{xd}(k) \qquad [4.11]$$
[4.12]
where: R xx (k 1)
k 1
¦ X N (i) X N T (i)
[4.13]
i 1
T
R xx (k ) X N (k 1) X N (k 1)
2 In this case, the Hessian matrix of
is therefore positive definite.
J k corresponds to the autocorrelation matrix of x, and
154
Modeling, Estimation and Optimal Filtering in Signal Processing
and: k 1
¦ X N (i)d (i)
R xd (k 1)
[4.14]
i 1
R xd (k ) X N (k 1)d (k 1)
Taking equations [4.13] and [4.14] into account, equation [4.12] becomes: H N (k 1)
>R
xx ( k )
X N (k 1) X N T (k 1)
>R xd (k ) X N (k 1)d (k 1)@
@
1
[4.15]
At this stage, let us introduce the inversion lemma of a matrix. Consider matrix A given by: A
B CD T
[4.16]
whose inverse is given by: A 1
B 1 B 1C I D T B 1C
1
D T B 1
[4.17]
If we substitute: B
R xx (k ) and C
D
X N (k 1)
ª «¬
T
º »¼
applying the inversion lemma to the matrix R xx ( k ) X N ( k 1) X N ( k 1) of equation [4.15] gives: ° 1 R xx 1 (k ) X N (k 1) X N T (k 1) R xx 1 (k ) ½° R ( k ) ¾ ® xx 1 X N T (k 1) R xx 1 (k ) X N (k 1) °¿ °¯ >R xd (k ) X N (k 1)d (k 1)@
H N (k 1)
A straightforward development of the calculation gives:
[4.18]
Adaptive Filtering
H N (k 1)
H N (k )
R xx 1 (k ) X N (k 1) T
155
1
1 X N (k 1) R xx (k ) X N
>d (k 1) X (k 1)
T N
(k 1) H N (k )
@
[4.19]
The above equation can also be written in a compact form by introducing a weighting factor to account for the update brought about by d (k 1) . This weighting factor is called the gain and denoted K N (k 1) : H N (k 1)
>
H N (k ) K N (k 1) d (k 1) X N T (k 1) H N (k )
@
[4.20]
According to the above equation, the estimation of the parameters is updated using only the current measurement d (k 1) and the gain K N (k 1) . This filter, however, is not as efficient in a non-stationary environment. To improve its tracking capabilities for non-stationary signals, two variants of this algorithm have been proposed: the “forgetting factor” RLS method and the “sliding window” method. We will now present the first of these two methods. 4.2.2. Forgetting factor RLS method
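The following is a compact sketch of the RLS recursion just derived, written directly with the forgetting factor $\lambda$ introduced in the next section ($\lambda = 1$ recovers the growing-window version above). The initialization of the inverse correlation matrix with $\delta I$ is a common practical choice, not prescribed by the text, and all variable names are illustrative.

```python
import numpy as np

def rls(x, d, N, lam=0.99, delta=100.0):
    H = np.zeros(N)
    P = delta * np.eye(N)                     # P(k) plays the role of R_xx^{-1}(k)
    for k in range(N, len(x)):
        X = x[k - N + 1:k + 1][::-1]          # X_N(k) = [x(k) ... x(k-N+1)]^T
        K = P @ X / (lam + X @ P @ X)         # gain K_N(k), cf. [4.24]
        e = d[k] - X @ H                      # a priori error
        H = H + K * e                         # coefficient update, cf. [4.20]/[4.23]
        P = (P - np.outer(K, X @ P)) / lam    # update of R_xx^{-1}, cf. [4.25]
    return H
```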
In equation [4.6] defining the error-minimization criterion J, we can see that all the basic errors have the same weight. We can attribute a higher weight to the last measurements by weighting each basic error e 2 (i / k ) by a factor Ok i such that 0 O d 1 : k
J
¦ Ok i >d (i) y(i / k )@ 2
[4.21]
i 1
The impulse response vector of the filter H N (k ) is given by setting: J
0
[4.22]
156
Modeling, Estimation and Optimal Filtering in Signal Processing
Thus, the filter itself is expressed by the following relation:
>
H N (k ) K N (k 1) d (k 1) X N T (k 1) H N (k )
H N (k 1)
@
[4.23]
where: K N (k 1)
R xx 1 (k ) X N (k 1)
O X N T (k 1) R xx 1 (k ) X N (k 1)
[4.24]
and:
R xx 1 k 1
>I K
T N ( k 1) X N k 1
@R
xx
1
k
O
[4.25]
Equation [4.23] corresponds to equation [2.61] for the least-squares recursive estimation of the AR parameters. The coefficients of the adaptive filter thus correspond to the AR parameters to be estimated. 4.3. The least mean squares algorithm
Let us consider criterion J , which aims at minimizing the mean square error between the desired response d (k ) and the output of the adaptive filter y (k ) : J
H EQM k
>
E ® d (k ) H N T (k 1) X N (k ) ¯
@
2½
¾ ¿
d 2 (k ) 2 H N T (k 1) E^ X N (k )d (k )`
^
`
H N T (k 1) E X N (k ) X N T (k ) H N (k 1)
where E^.` is the mathematical expectation.
[4.26]
Adaptive Filtering
157
We can define the optimal filter by proceeding in a manner similar to those adopted in Chapters 2 and 3. The optimal filter H opt N should respect two conditions: – that the gradient of J , J , defined as the column vector of the N partial derivatives of J with respect to the filter coefficients h j (k ) j 1,..., N , is zero, i.e.
^
wJ wh j
0 j 1,..., N or J
`
0
[4.27]
– that the NuN Hessian matrix of J , composed of the second partial derivatives 2
w J of J , is positive definite. whi wh j
Using the two previous conditions, we obtain the following relation for the optimal filter: H opt N
^
`
E X N (k ) X N T (k )
1
E ^ X N (k )d (k )`
R xx 1 R xd
[4.28]
where R xx stands for the autocorrelation matrix of the input signal x(k ) and R xd is the cross-correlation vector between the filter input and the desired response. Equation [4.28] can be related to the Yule-Walker equations obtained in Chapter 2, as well as to the Wiener filter equation in Chapter 3. The adaptive least mean squares algorithm was introduced by Widrow and Hoff in 1959, and it allows us to recursively obtain a suboptimal filter by using the gradient-type optimization method. The “steepest descent” method makes it possible to look for the minimum in the mean square error hyperspace, by going in the opposite direction to that of the gradient (for more details, the reader is referred to Appendix G). The filter coefficient vector H N k is subsequently updated as follows: H N k H N k 1
D 2
J
[4.29]
158
Modeling, Estimation and Optimal Filtering in Signal Processing
As we will see below, the step size D adjusts the convergence of the algorithm. The gradient of the mean square error is given by the following expression: J
^
` (k 1)@ `
2 E^ X N (k )d (k )` 2 E X N (k ) X N T (k ) H N (k 1)
^
>
T
2 E X N (k ) d (k ) X N (k ) H N
[4.30]
By combining the two above equations, and observing that: ek d (k ) H N (k 1) T X
N
(k )
[4.31]
it follows that: H N k H N k 1 DE^X N k ek `
[4.32]
However, this updating of the filter’s coefficients, which corresponds to a “deterministic” gradient algorithm, has a limited potential because it is difficult to evaluate the quantity E ^ X N k e k ` . For this reason, this quantity is replaced by its instantaneous value. The adaptive filter’s coefficients can be updated as follows: H N k H N k 1 D X N k ek
[4.33]
At first sight, this procedure is debatable. However, since the algorithm itself is recursive, it calculates the temporal averaging by successive iterations. The convergence of the LMS algorithm has been the focus of several research efforts, best surveyed and resumed in [16]. From equations [4.31] and [4.33], we notice that the LMS algorithm benefits from a low calculation cost in terms of the number of addition and multiplication operations. This complexity is directly proportional to the filter order. Thus, for a filter order of N, it needs 2N+1 multiplications and 2N additions at each iteration. As opposed to the deterministic gradient, the LMS does not allow us to attain the minimum of the EQM, because of fluctuations induced by the estimated gradient. These fluctuations have a zero mean and bounded variance which is proportional to the step size [3]. In terms of the dynamic range of the optimal filter’s difference, the LMS algorithm is unable to retain the value of the optimal filter on a permanent basis [7].
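A minimal sketch of this recursion (the error of [4.31] followed by the coefficient update of [4.33]) is given below; the step size and filter length are illustrative choices and must respect the stability condition discussed later in this section.

```python
import numpy as np

def lms(x, d, N, alpha=0.01):
    H = np.zeros(N)
    for k in range(N, len(x)):
        X = x[k - N + 1:k + 1][::-1]      # X_N(k) = [x(k) ... x(k-N+1)]^T
        e = d[k] - H @ X                  # instantaneous error, cf. [4.31]
        H = H + alpha * X * e             # stochastic gradient update, cf. [4.33]
    return H
```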
Adaptive Filtering
159
Indeed, taking up equation [4.33] for the filter and replacing the error by its expression [4.31], we obtain:
>
H N k H N k 1 D X N k d (k ) H N (k 1) T X N (k )
Being a scalar quantity, H N (n 1) T X N (n)
@
[4.34]
X N T (n) H N (n 1) . Therefore, the
above equation is modified to: H N k H N k 1 D X N k X N (k ) T H N (k 1) D X N k d (k )
[4.35]
By introducing the optimal filter H opt N , which is the solution for the WienerHopf equation, and setting: H N (k ) H opt N
' H N (k )
[4.36]
equation [4.35] gives rise to: H N ( k ) H opt N
' H N (k )
H N k 1 D X
N
k X N (k ) T H N (k 1) D X N k d (k ) H opt N
[4.37]
Equivalently:
' H N (k )
>I D X >I D X
@H k 1 D X k d (k ) H (k ) X (k ) @' H (k 1) D X (k )>d (k ) X ' H (k 1) D X (k )>d (k ) X (k ) H @ T N (k ) X N (k )
Tk , k 1
N
N
N
T
N
opt N
N
N
N
T N
N
T opt N (k ) H N
@
[4.38]
opt N
with:
Tk , k 1
>I D X
N
(k ) X
N
(k ) T
@
[4.39]
Moreover, bm k d (k ) X TN (k ) H opt N corresponds to the model noise. If the algorithm attains the Wiener solution at iteration k 0 , i.e. H N (k 0 )
H opt N ,
the difference between the optimal filter and the adaptive filter at the next iteration is not zero because:
160
Modeling, Estimation and Optimal Filtering in Signal Processing
' H (k 0 1) DX N (k 0 )bm (k 0 ) z 0
[4.40]
This difference between the adaptive filter and the Wiener filter is illustrated in Figure 4.2:
bm(k) +
H opt N
+
d(k) +
X N (k )
e(k)
-
__ HHNN (k k 1)
Figure 4.2. Block-level description of the LMS adaptive filtering
From the above study of the difference between the LMS and the optimal filters, we can deduce the relation that D should satisfy for the algorithm to converge. By averaging the terms on both sides of equation [4.35], we obtain:
>
E H N ( k ) H opt N
@
I D Rxx E > H N (k 1) H opt N @
if and only if:
>
@
E X N k X TN k H N k 1
>
@
E X N k X TN k E > H N k 1 @ .
If we define D and P as follows: D
diag >O1 " O N @ where ^Oi `i
1,..., N
are the eigenvalues of R xx
and:
P
^ `
ª\ " \ º where \ i N» «¬ 1 ¼
i 1,..., N
are the unit-norm eigenvectors,
[4.41]
Adaptive Filtering
161
equation [4.41] is modified as follows:
>
E H N k H opt N
@
>
P ( I D D) P T E H N k 1 H opt N
@
[4.42]
Multiplying both sides by P T , we obtain:
^
P T E >H N k @ H opt N
`
^
P T P( I D D) P T E >H N k 1 @ H opt N ( I D D) P
T
>
^E>H
N
k 1 @ H
opt N
`
`
[4.43]
@
leads to: Substituting U N k P T E H N k H opt N U N (k )
( I D D)U N (k 1)
[4.44]
The ith component of vector U N (k ) , denoted u i (k ) , should satisfy the relation: u i (k )
(1 DO i )u i (k 1) i [1, N ]
[4.45]
The above is a scalar first-order difference equation. By successive substitutions, we can express u i (k ) starting from u i (0) as follows: u i (k )
(1 DO i ) k u i (0) i [1, N ]
[4.46]
The demonstration of the mean convergence of the LMS algorithm towards the optimal solution H opt N
is thus reduced to a demonstration that the error
ui (k ) converges towards zero i [1, N ] .
The eigenvalues ^Oi `i
1,..., N
of R xx , which are real and positive because R xx is a
real, symmetric positive definite matrix, should satisfy the following convergence condition: 1 DO i 1 i [1, N ]
[4.47]
Rearranging the terms, we obtain: 0D
2
Oi
i [1, N ]
[4.48]
162
Modeling, Estimation and Optimal Filtering in Signal Processing
Thus, the necessary and sufficient condition for the convergence of the LMS algorithm makes use of the largest eigenvalue of the input signal autocorrelation matrix R xx : 0 D
2
[4.49]
O max
We can define a condition for the choice of the step size which would be easiest to handle. Let us first recall the trace of the input signal autocorrelation matrix: trace R xx
¦ Oi tO max
[4.50]
i
Given [4.50], a more restrictive definition of equation [4.49] can be written as follows: 0D
2 2 trace R xx O max
[4.51]
Let us introduce W j as a time-constant term defined as the time needed so that U N (W j)
exp 1^U N (0)` . For each u i (k ) and its associated eigenvalue Oi , we
obtain: u i (W j )
W
(1 DO i ) j u i (0)
exp 1^u i (0)`
i [1, N ]
[4.52]
Thus:
W j log1 DO i 1
[4.53]
if Oi 1 , log1 DO i | DO i and W j DO i | 1 . Therefore:
Wj |
1
DO i
[4.54]
Adaptive Filtering
163
The time constant W j thus depends on the spread of the eigenvalues, and will be driven by the smallest eigenvalue of the signal autocorrelation matrix, i.e. 1 Wj | .
DO min
If signal x(k ) is a white zero-mean sequence with variance V x2 , the above condition is modified to: 0D
2
NV x2
.
Indeed, we have: traceR xx
N
¦ Oi i 1
NR xx 0
>
@
NE x 2 k
NV x2 .
Each eigenvalue is associated with a mode of convergence called an "eigenmode". It should be noted that the eigenmodes associated with the small eigenvalues of $R_{xx}$ converge more slowly than those associated with the large eigenvalues. The convergence speed, proportional to the step size, is inversely proportional to the spread of the eigenvalues and is independent of the initial conditions. Consequently, the speed of convergence is governed by the smallest eigenvalue of $R_{xx}$. In addition, the convergence is not uniform. This is a major drawback of the LMS algorithm.

The choice of the step size $\alpha$ is the basis for defining two families of LMS algorithms: $\alpha$ can have a constant value or can be variable.

Application example: let us estimate the parameters of a second-order AR process defined by $a_1 = \sqrt{2}/2$ and $a_2 = 1/4$, assuming that 1,000 samples of one realization of the signal are available.
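A sketch of this application example is given below: an AR(2) process is simulated and its parameters are identified with the LMS recursion. The coefficient signs follow the prediction form used in the code (they may differ by a sign from the book's convention), and the step size, seed and sample count are illustrative; changing `alpha` reproduces the speed/variance trade-off of Figures 4.3 to 4.5.

```python
import numpy as np

rng = np.random.default_rng(5)
a_true = np.array([0.7071, -0.25])        # stand-in AR(2) parameters (prediction form)
L = 1500
s = np.zeros(L)
for k in range(2, L):
    s[k] = a_true[0] * s[k - 1] + a_true[1] * s[k - 2] + rng.standard_normal()

a_hat = np.zeros(2)
alpha = 1e-2
for k in range(2, L):
    X = np.array([s[k - 1], s[k - 2]])    # regressor of past samples
    e = s[k] - a_hat @ X                  # prediction error
    a_hat = a_hat + alpha * X * e         # LMS update, cf. [4.33]

print("final estimates:", np.round(a_hat, 3))   # close to a_true for a small step size
```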
164
Modeling, Estimation and Optimal Filtering in Signal Processing
4
3
2
1
0
-1
-2
-3
-4
-5
0
500
1000
Result averaged over 100 realizations
1500
Result for one realization
a1
0.8
0.6
0.4
a2
0.2
0
-0.2
0
200
400
600
800
1000
1200
1400
Number of iterations Figure 4.3. LMS-based parameter estimation for a second-order AR model ( D =10-2)
Adaptive Filtering
165
0.8
a1 0.6
0.4
a2
0.2
0
-0.2
0
200
400
600
800
1000
1200
1400
Number of iterations Figure 4.4. LMS-based parameter estimation of a second-order AR model ( D =4.5u10-3)
0.8
a1 0.6
0.4
a2
0.2
0
-0.2
0
200
400
600
800
1000
1200
1400
Number of iterations
Figure 4.5. LMS-based parameter estimation of a second-order AR model ( D =4.5u10-3). Other initial conditions similar to the case presented above
166
Modeling, Estimation and Optimal Filtering in Signal Processing
Figures 4.3 and 4.4 provide an example of the estimation of the parameters of a second-order AR process for two different values of the step size $\alpha$. We note that as $\alpha$ increases, so does the convergence speed; however, this comes at the cost of a higher estimation variance. A trade-off thus has to be made between convergence speed and estimation precision. Moreover, as can be observed in Figure 4.5, the initial conditions do not play a significant role in the recursive estimation of the AR parameters.

Note: LMS-based estimation of a noisy signal's AR parameters

If the AR process is disturbed by an additive zero-mean white noise with variance $\sigma_b^2$, the least-squares estimation of the AR parameters using the noisy observations $z(k)$ becomes biased (see Chapter 2). To solve this bias problem, modified versions of the LMS filter have been developed over the past few years. They will be introduced in the following and are respectively named:
– the γ-LMS algorithm;
– the ρ-LMS algorithm;
– the β-LMS algorithm.

Treichler et al. have proposed the γ-LMS algorithm [22]. This is a recursive resolution of the noise-compensated Yule-Walker equations using a gradient-type algorithm. The ρ-LMS algorithm is based on the updating of the AR parameters with a gradient-type equation [26]; this equation is itself based on a preliminary estimation of the process using the last estimated AR parameters. By means of a statistical study, the authors demonstrate that the variance of the parameters obtained using the ρ-LMS filter is lower than that obtained using the γ-LMS algorithm. Nevertheless, to implement the γ-LMS and ρ-LMS algorithms, the variance of the additive noise must be known beforehand. To alleviate this, Zhang et al. [27] propose the joint estimation of the AR parameters and of the variance $\sigma_n^2$, using the β-LMS algorithm; this leads to unbiased AR parameters. All the above recursive approaches, however, require a large number of samples, in the order of several thousand, to compensate for the effects of the noise.

The convergence speed of the LMS algorithm decreases rapidly in the case of correlated inputs. One alternative is to use least-squares algorithms. Other solutions are based on transformations in the frequency domain [5] [17]. This latter class of solutions brings two improvements: an increase in the convergence speed and a
Adaptive Filtering
167
reduction in complexity. In the same context, we can also mention the use of other transforms such as the Hartley transform, or the wavelet-based decomposition [4] [25]. When the noise happens to be a pulse noise, its influence can be reduced by using a combination of adaptive filters and nonlinear filters such as order filters. This provides a robust estimation of the noisy signal parameters besides eliminating the pulse noise [12] [21]. LMS has a privileged status in the family of recursive algorithms. It is the simplest in terms of implementation and does not require prior information. Usually, it is not considered as a part of the optimal algorithm family, but Kailath et al. demonstrated in 1996 that the LMS algorithm is optimal as regards the norm H f [10] [11]. 4.4. Variants of the LMS algorithm
As we mentioned earlier, the main advantage of the LMS is its low calculation complexity. However, the traditional LMS suffers from the following problems: – the convergence speed depends on the step size D . The smaller the D , the slower the convergence will be. Moreover, the variance of the residual error is also proportional to D . A trade-off thus has to be found while choosing the step size: convergence speed versus precision. Moreover, this requires a priori information about the input signal autocorrelation matrix; – even though the convergence of the LMS is independent of the initial conditions, it is not uniform and depends on the eigenvalue spread. This explains in part the slow speed of the LMS to converge. To reduce the effects of these two shortcomings, several improved versions of the LMS algorithm have been introduced. The normalized least mean squares was presented independently by Naguno and Noda on the one hand, and by Albert and Gardner in 1969 on the other. It was only in 1980 that the name NLMS was proposed by Bitmead and Anderson [2]. The modified least mean squares (MLMS) algorithm is another recursive method for the estimation of the input signal power. Finally, we will mention the affine projection algorithm. 4.4.1. Normalized least mean squares (NLMS)
In 1984, Goodwin and Sin [9] viewed the NLMS algorithm as the resolution of a constrained optimization problem. Stated as a formal definition: given an
168
Modeling, Estimation and Optimal Filtering in Signal Processing
observation vector X N (k ) and the desired response d(k), the aim is to determine the coefficient vector of the adaptive filter at iteration k, while minimizing the Euclidean norm of the difference between H N (k ) and H N (k 1) , and while respecting the following condition: H N T (k ) X N (k )
d (k ) .
We should thus minimize: H N (k ) H N (k 1)
2
>H N (k ) H N (k 1)@ T >H N (k ) H N (k 1)@ [4.55]
¦ >h j (k ) h j (k 1)@ 2 N
j 1
with the following constraint: H N T (k ) X N (k ) d (k )
N
¦ h j (k ) x(k j 1) d (k )
[4.56]
0
j 1
In order to solve this constrained optimization problem, we use the Lagrange multiplier method. This method consists of introducing an unknown scalar P linked to the constraint. This scalar quantity is called the Lagrange multiplier3. We then consider a criterion corresponding to a linear combination of criterion [4.55] and constraint [4.56]. Thus, the quantity that we seek to minimize is: J
N
ª
j 1
¬j 1
º
¦ >h j (k ) h j (k 1)@ 2 P ««¦ h j (k ) x(k j 1) d (k )»»
The filter H N (k ) wJ whi (k )
N
>h1 (k ), " , h N (k )@T
[4.57]
¼
is obtained by solving the equation:
2>hi (k ) hi (k 1)@ Px(k i 1)
0 i >1, N @
[4.58]
3 We have already used a constrained optimization approach in Chapter 3 above, while seeking the expression of the Wiener filter.
Adaptive Filtering
169
or: 2>hi (k ) hi (k 1)@ Px(k i 1) i >1, N @
[4.59]
Multiplying both sides of the above equation by x(k i 1) i >1, N @ , adding the intermediate results, and using equation [4.57], we obtain: 2
P
N
¦
x 2 (k i 1)
i 1
2 X N (k )
2
2 X N (k )
2
N ª º «d (k ) hi (k 1) x(k i 1)» i 1 ¬« ¼»
¦
>d (k ) X
T N
(k ) H N (k 1)
@
[4.60]
e( k )
If we substitute the value of P in equation [4.59], we obtain the following recursive equation: H N k H N k 1
1 X N (k )
2
X
N
k ek
[4.61]
or, by introducing the scalar quantity D , we obtain: H N k H N k 1
D X N (k )
2
X
N
k ek
[4.62]
Finally, by introducing a normalized error e(k) using an Euclidean norm, we get the following form for the algorithm: ek d (k ) X N (k ) T H N (k 1)
H (k )
>X
N
(k ) T X N (k )
@
1
e( k )
H N k H N k 1 D X N k H k
[4.31]
[4.63]
[4.64]
170
Modeling, Estimation and Optimal Filtering in Signal Processing
However, when X N (k ) is small, stability issues have to be considered due to the limited precision of the digital calculations. To alleviate this, we introduce a regularization scalar E . H N is thus updated as follows: H N k H N k 1
D E X N (k )
2
X N k ek
[4.65]
The above equation justifies the term “normalized LMS”. The product of the error ek and the input signal sample vector X N (k ) is normalized with respect to the Euclidean norm of vector X N (k ) . We can study the dynamic range of the difference between X N (k ) and H opt N of the optimal filter and determine the convergence condition of the NLMS algorithm. Using the same procedure as followed in equations [4.36]-[4.51] above, we obtain:
' H N (k )
>
@
ª I D E X (k ) T X (k ) 1 R º ' H (k 1) xx » N N N «¬ ¼
>
D E X N (k ) T X N (k )
@
1
bm (k )
>
@
leads to: Moreover, imposing U N k P T E H N k H opt N U N (k )
>
@
1 ª º T D » U N (k 1) « I D E X N (k ) X N (k ) ¬ ¼
[4.66]
where D is the diagonal eigenvalue matrix of R xx and P is the associated eigenvalue matrix. We can show that the algorithm converges if and only if: 0
D O i 2, i E trace( R xx )
[4.67]
Introducing the largest eigenvalue O max of the correlation matrix R xx , a more restrictive condition is given by: 0D
2> E trace ( R xx )@
O max
[4.68]
Adaptive Filtering
171
In practice, another condition is used which, though much more restrictive, requires less prior knowledge on the signal: 0D 2
[4.69]
Given equations [4.31], [4.64] and [4.66], the NLMS algorithm is slightly more complex than the simple LMS. Indeed, it requires 3N additions, 3N+1 multiplications and one division. This calculation cost can be reduced if we avoid the calculation of the input vector Euclidean norm at each iteration. The above-mentioned simplification is the basis of the modified least mean squares algorithm: H N k
H N k 1
D X
N (k )
2
E
X N k ek
[4.70]
where:
X N (k )
2
X N (k 1)
2
x 2 k x 2 k N
[4.71]
The MLMS algorithm is based on a recursive method of estimating the input signal’s power. In practice, however, we avoid storing the additional value of the input signal required to rigorously calculate, in a recursive manner, the input signal power. Instead, we introduce a factor J , lying between 0 and 1. The coefficients can be updated as follows:
S x (k ) JS x (k 1) 1 J x 2 k H N k
H N k 1
D S x (k ) E
[4.72]
X
N
k ek
[4.73]
1 . The algorithm requires only 2N+3 N multiplications, 2N+2 additions and 1 division. For example, we can choose J
1
172
Modeling, Estimation and Optimal Filtering in Signal Processing
4.4.2. Affine projection algorithm (APA)
Let us consider the NLMS equation: H N (k )
>
ª I D X (k ) X T (k ) X (k ) N N N «¬
>
T
D X N (k ) X N (k ) X N (k )
@
1
@
1
X N T (k )º» H N (k 1) ¼
[4.74]
d (k )
The column vector X N (n) gives rise to a projection matrix on the vector space. This matrix is given by:
>
X N (k ) X N T (k ) X N (k )
@
1
X N T (k ) .
We can define the projection of H N (n 1) on this vector space as follows:
>
X N ( n) X N ( n ) T X N ( n )
@
1
X N (n) T H N (n 1) .
In the NLMS algorithm, the update of the filter’s coefficients can be understood as a one-dimensional affine projection. The APA is a generalization of the NLMS. It consists of considering L observers [18]. To elaborate upon this analogy, let us suppose that the response is no longer a scalar quantity, but a vector d L (k ) storing L consecutive desired responses:
d L (k )
ª d (k ) º » « # » « «¬d (k L 1)»¼
X N , L (k ) is no longer a vector of N consecutive samples of the input signal, but
a matrix with N rows and L columns, defined as follows:
x(k 1) x(k 2) ª x(k ) « x(k 1) x(k 2) x(k 3) « « X N , L (k ) x(k 2) x(k 3) x(k 4) « # # # « « x(k N 1) x(k N ) x(k N 1) ¬
x(k L 1) º x(k L) »» x(k L 1) » " » % # » " x(k L N 2)»¼ " "
[4.75]
Adaptive Filtering
173
As with the NLMS, the APA algorithm can be described by the three following steps: first, we calculate the error vector; then we deduce the normalized error vector; finally, this normalized vector is used to update the filter’s coefficients: e L k d L (k ) X N , L T (k ) H N (k 1)
H L (k )
>X
N , L (k )
T
X N , L (k ) E I L
@
1
[4.76]
[4.77]
e L (k )
H N k H N k 1 D X N , L k H L n
[4.78]
Here H L (k ) is no longer a scalar, but a vector with dimensions Lu1. We can observe the dynamic range of the difference between HN and the optimal filter. Like the LMS, the APA algorithm cannot lock the optimal filter’s values. Thus,
' H N (k ) Tk , k 1 ' H N (k 1)
>
D X N , L (k ) X N , L T (k ) X N , L (k ) E I
@
1
b L, m (k )
[4.79]
with: Tk , k 1
>
T ªI D X N , L (k ) X N , L (k ) X N , L (k ) E I «¬
@
1
X N , L T ( k ) º» ¼
[4.80]
and:
' H N (k )
H N (k ) H opt N
[4.36]
Even though this algorithm corresponds to the optimal Wiener filter at iteration k 0 , the Wiener and the adaptive filters are not the same at the next iteration. The APA does not lock itself but oscillates around a mean value with a certain variance. As we noted earlier, the study of the convergence of adaptive algorithms can be based on the study of the dynamic range of the difference between the optimal filter and the adaptive filter.
174
Modeling, Estimation and Optimal Filtering in Signal Processing
^Oi `i 1,..., L
Let
and
^\ `
i i 1,..., L
denote, respectively, the eigenvalues of
X N , L T (k ) X N , L (k ) and the associated eigenvectors. Let us look at the singular
value decomposition of the data matrix X N , L (k ) : X N , L (k ) U (k )6(k )V T (k )
[4.81]
Matrices U (k ) and V (k ) have dimensions NuN and NuL respectively. Additionally, 6(k ) is a NuN diagonal matrix comprised of N-L zeros and L singular non-zero values arranged in decreasing order. Let Q (k ) be the matrix expressed as follows: Q (k )
X N , L (k )[ X N , L T (k ) X N , L (k ) E I ] 1 X N , L T (k )
[4.82]
Using the eigenvalue decomposition of X N , L T (k ) X N , L (k ) and the singular value decomposition of X N , L (k ) , we obtain: ª O1 «O E « 1 « 0 « Q(k ) U (k ) « # « # « « # « ¬ 0
If
E
0 % % 0 0 "
" %
OL
OL E % 0 "
º " " 0» » 0 0 #» % 0 # »U T (k ) » 0 % # »» % % 0» » " 0 0¼
is considered to be negligible compared to the smallest eigenvalue
min ^Oi ` of X N , L T (k ) X N , L (k ) , it follows that:
i 1,..., L
[4.83]
Adaptive Filtering
ª1 0 " " " 0 º «0 % % 0 0 # » « » «# % 1 % 0 #» T Q (k ) U (k ) « »U ( k ) # % % # 0 0 « » « # 0 0 % % 0» « » ¬0 " " " 0 0¼
175
[4.84]
Consequently, if we take the L=N observation vectors into account, we obtain: X N , L (k )[ X N , L T (k ) X N , L (k ) E I ] 1 X N , L T (k ) | U ( k ) IU T ( k ) | I .
[4.85]
Imposing:
>
@
U N k U T (k ) H N k H opt , N
we obtain: U N (k )
>I D Q(k )@ U N (k 1)
>
D U T (k ) X N , L (k ) X N , L T (k ) X N , L (k ) E I
@
1
b L, m (k )
[4.86]
Moreover, when G is negligible compared to the smallest of the eigenvalues of X N , L T (k ) X N , L (k ) , L=N and the model’s noise vector b L ,m (k ) is zero, we have: U N (k ) | >1 D
@ U N (k 1)
[4.87]
Still taking the model noise to be zero, we can also look at the convergence conditions that the adaptive step size D should respect. Using equations [4.83] and [4.86], we obtain:
176
Modeling, Estimation and Optimal Filtering in Signal Processing
U N k
ª§ «¨1 D «¨© « « U (k ) « « « « « « ¬«
O1 · ¸ O1 E ¸¹
k
0
"
#
% k § ON · ¸ % ¨¨1 D ON E ¸¹ © % 0
# 0
0 "
0 #
%
0 "
º " " 0» » 0 0 #» » T % 0 # »»U (k )U N 0 [4.88] » 0 % #» % % 0» » " 0 0¼»
Using an approach analogous to that used for LMS and NLMS derivation, we can show that the convergence is assured if and only if: 1D
Oi Oi G
1 i >1, N @
[4.89]
Again, taking G to be negligible compared to the smallest eigenvalue of X N , L T (k ) X N , L (k ) , we are led to the following condition: 0D 2.
[4.90]
A fast version of this algorithm has been developed by Gay [8]. It must also be noted that under certain conditions, the APA filter can behave like the RLS filter as far as convergence is concerned. Figure 4.6 depicts the estimation of the coefficients of the FIR filter h(k ) using the APA algorithm. The filter input is a speech signal x(k ) while the desired signal d (k ) corresponds to the sum of the filter output and a zero-mean, white Gaussian noise b(k ) : d (k )
x ( k ) h(k ) b( k ) .
The signal-to-noise ratio is assumed to be equal to 30dB. Moreover, N = 256. Our aim here is to estimate the coefficients of the impulse response h(k ) using the samples of the signals x(k ) and d (k ) . Moreover, we will also evaluate the convergence speed as well as the precision of the adaptive filter being used. To do this, we will look at the difference between the coefficients of the “optimal” filter
Adaptive Filtering
177
and the adaptive filter, for a given number of iterations. The criterion is defined as follows:
J
ª H k H opt T Hˆ k H opt N N N N 10 log 10 « « opt T opt HN HN ¬
º»
[4.91]
» ¼
where H N (k ) is the column vector which combines the N coefficients of the impulse response h(k ) . The simulations are performed in the following conditions:
D= 0.5, N = 16 and L = 1, 2, 10 or 25 “observers”. 5 Erreur paramétrique (dB) Mean parametricmoyenne error (dB)
4000 3000 2000 1000 0
L=1 L=2 L = 10 L = 25
0
-5
-10
-1000 -2000
-15
-3000 -4000 0
50
100 150 Échantillon
200
250
Sample
a) impulse response of the filter
-20
0
5000 10000 Numéro de l'itération (k)
15000
Number of iterations (k)
b) J according to L
Figure 4.6. Application: estimation of FIR filter parameters using the APA algorithm
178
Modeling, Estimation and Optimal Filtering in Signal Processing
4.5. Summary of the properties of the different adaptive filters LMS
MLMS
APA
RLS
1 observer: NLMS
Several observations Faster with increasing number of observers
Slow convergence
Slower than LMS
Slower than LMS
Sensitivity to input signal
Insensitive to input signal statistics
Insensitive to input signal
Quick convergence
Table 4.1. Summary of the convergence of the algorithms: LMS, NLMS, MLMS, APA and RLS algorithms
LMS
0D
2 trace R xx
MLMS
0D 2
APA 1 observer: NLMS
Several observers
0D 2
0D 2
Table 4.2. Summary of the convergence of the different algorithms, according to step size: LMS, NLMS, MLMS, APA and RLS algorithms
Figure 4.7 presents the convergence speed and the precision of the different adaptive filters, as a function of the number of iterations. We select D 1 for the NLMS algorithm, O 0.995 for the RLS algorithm and D 1 for the APA algorithm. We consider L = 10, 25 or 50 observers.
Adaptive Filtering
179
Erreur paramétrique moyenne (dB)
Mean parametric error (dB)
5 APA, L = 10 APA, L = 25 APA, L = 50 NLMS RLS
0 -5 -10 -15 -20 -25 -30 -35
0
5000
10000 Numéro de l'itération (k)
15000
Number of iterations (k)
Figure 4.7. Example: estimation of FIR filter parameters using various algorithms
4.6. Application: noise cancellation
Let there be a noisy signal defined as follows: d (k )
s ( k ) b( k )
[4.92]
where s (k ) is the speech signal and b(k ) is the additive noise.
d(k)=s(k)+b(k)
+
e(k)
+ -
x(k) h(k)
y(k)
Figure 4.8. Block-level representation of a noise-cancellation system
180
Modeling, Estimation and Optimal Filtering in Signal Processing
The adaptive filter in this case provides an estimation of the noise y (k ) which gives an estimation of the additive noise b(k ) . The filter coefficients are updated using the following error: e( k )
d (k ) y (k )
s ( k ) b( k ) y ( k )
Firstly, we will suppose that s (k ) is not correlated to either b(k ) or x(k ) . The simulations are carried out under the following conditions: N = 10, P
10 8 .
The global signal-to-noise ratio is 7.23 dB for the noisy signal d (k ) and 17.80 dB for the enhanced signal e( k ).
Figure 4.9. The noisy speech signal and its spectrogram
Adaptive Filtering
181
Figure 4.10. The enhanced speech signal and its spectrogram
Figure 4.11. The noiseless speech signal and its spectrogram
In this chapter we have presented and illustrated the LMS, NLMS, APA and RLS. In the following chapter, we will introduce the Kalman filter, which will be subsequently illustrated in Chapter 6.
182
Modeling, Estimation and Optimal Filtering in Signal Processing
4.7. References [1] A. Benveniste, M. Métivier and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer Verlag, 1990, Translation of: Algorithmes adaptatifs et approximations stochastique, Masson, Paris 1987. [2] R. R. Bitmead and B. D. O. Anderson, “Performance of Adaptive Estimation in Dependant Random Environment”, IEEE Trans. on Automatic Control, vol. AC 25, 782787. 1980. [3] N. J. Bershad and P. L. Feintuch, “Analysis of the Frequency Domain Adaptive Filter”, Proc. IEEE, vol. 67, pp. 1658-1659, December 1979. [4] P. K. Bondyopadhayay, “Application of Running Hartley Transform in Adaptive Filtering”, Proc. IEEE, vol. 76, pp. 1370-1372, October 1988. [5] M. Dentino, J. McCool and B. Widrow, “Adaptive Filtering in the Frequency Domain”, Proc. IEEE, vol. 66, pp. 1658-1659, December 1978. [6] E. R. Ferrara, “Fast Implementation of LMS Adaptive Filters”, IEEE Trans. on Acoutics, Speech and Signal Processing, vol. ASSP-28, pp. 474-475, April 1980. [7] S. Florian and N. J. Bershad, “A Weighted Normalized Frequency Domain LMS Adaptive Algorithm”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-36, pp. 1002-1007, July 1988. [8] S. L. Gay and S. Tavathia, “The Fast Affine Projection Algorithm”, IEEE-ICASSP ’95, Detroit, Michigan, USA, pp. 3023-3026, 9-12 May 1995. [9] G. C. Goodwin and K. S. Sin, Adaptive Filtering, Prediction, and Control, Prentice Hall, 1984. [10] B. Hassibi and T. Kailath, “Mixed Least-Mean-Squares/H f -Optimal Adaptive Filtering”, Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 425-429, August 1996. [11] B. Hassibi, A. Sayed and T. Kailath, “Hf Optimality of the LMS Algorithm”, IEEE Trans. on Signal Processing, vol. 44, no. 2, pp. 267-280, February 1996. [12] T. I. Haweel and P. M. Clarkson, “A Class of Order Statistic LMS Algorithms”, IEEE Trans. on Signal Processing, vol. SP-40, no. 1, pp. 44-51, January 1992. [13] S. Haykin, Adaptive Filter Theory, Prentice Hall, 1996. [14] L. Ljung, “On Positive Real Transfer Functions and the Convergence of Some Recursions”, IEEE Trans. on Automatic Control, vol AC-22, no. 4, pp. 539-551, August 1977. [15] L. Ljung, “Analysis of Recursive Stochastic Algorithms”, IEEE Trans. on Automatic Control, vol. AC-22, no. 4, pp. 551-575, August 1977. [16] O. Macchi, Adaptive Processing: The Least Mean Square Approach with Applications in Transmission, Wiley, 1995.
Adaptive Filtering
183
[17] P. Mansour and A. H. Gray Jr, “Unconstrained Frequency Domain Adaptive Filter”, IEEE Trans. on Acous. Speech and Signal Processing, vol. ASSP 30, pp. 726-734, October 1982. [18] K. Ozeki and T. Umeda, “An Adaptative Filtering Algorithm Using an Orthogonal Projection to an Affine Subspace and its Properties”, Electronics and Communication in Japan, vol. 67 A, no.5, 1984. [19] R. L. Plackett, “Some Theorems in Least Squares”, Biometrika, 37, pp. 149-157, 1950. [20] H. Robbins and S. Monro, “A Stochastic Approximation Method”, Ann. Math. Stat., vol. 22, pp. 400-407, 1951. [21] R. Settineri, M. Najim and D. Ottaviani, “Order Statistic Fast Kalman Filter”, IEEEISCAS 1996, Chicago, USA, pp. 116-119. [22] J. R. Treichler, “Transient and Convergent Behavior of the Adaptive Line Enhancer”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, no.1, pp. 53-62, February 1979. [23] J. R. Treichler, C. Richard Johnson Jr and M. G. Larimore, Theory and Design of Adaptive Filters, Wiley Intersciences, 1987. [24] Y. Z. Tsypkin, Foundations of the Theory of Learning Systems, Academic Press, 1973. [25] T. N. Wong and C. P. Kwong, “Adaptive Filtering Using Hartley Transform and Overlap Save Method”, IEEE Trans. on Signal Processing, vol. 39, no.7, pp. 1708-1711, July 1991. [26] W-R. Wu and P-C Chen, “Adaptive AR Modeling in White Gaussian Noise”, IEEE Trans. on Signal Processing, vol. 45, no.5, pp. 1184-1191, May 1997. [27] Y. Zhang, C. Wen, Y. and C. Soh, “Unbiased LMS Filtering in the Presence of White Measurement Noise with Unknown Power”, IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 47, no. 9, pp. 968-972, September 2000.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Chapter 5
Kalman Filtering
5.1. Introduction In the previous chapters, we presented different techniques for the estimation of the parameters of linear models. We noted the non-recursive character of the Wiener filter which was applied to identification. This chapter will present the Kalman filter, a recursive alternative to the Wiener filter. It was first introduced in the 1960s [12] [13]. Kalman transformed the integral equation of the continuous-time Wiener filter to differential equations [27]. Other approaches were also used to obtain the Kalman filter: – Sage and Masters’ technique based on the least squares method [23]; – Athans and Tse’s technique based on the Pontriaguin maximum principle [3]. We hope that this chapter will serve as a reader-friendly introduction to the Kalman filter. We present this filter using an algebraic approach which, though it may not be the most elegant, has the advantage of not requiring any prior knowledge. This algebraic description can also be the starting point for more formal descriptions, such as that presented by T. Kailath, A. Sayed and B. Hassibi in [11]. In this description, the Kalman filter is presented with a special emphasis on its utilization in the estimation of model parameters. We also present the so-called “extended” Kalman filter for nonlinear estimation.
186
Modeling, Estimation and Optimal Filtering in Signal Processing
5.2. Derivation of the Kalman filter 5.2.1. Statement of problem Let there be a system which is represented in the state space domain as follows: x(k 1)
y (k )
) (k 1, k ) x(k ) G (k )u (k )
[5.1]
H (k ) x(k ) v(k )
[5.2]
The driving process u (k ) and the measurement noise are both assumed to be zero-mean, so that: E > u (k )@ 0
[5.3]
E > v (k )@ 0
[5.4]
Moreover, u (k ) and v(k ) are independent of each other, white and have covariance matrix1 Q(k ) and variance R(k ) respectively:
>
@
E > u (k )@E v T (l )
>
@
Q(k )
[5.6]
R (k )
[5.7]
E u (k )v T (l )
E u (k )u T (k )
>
E > v ( k )v ( k ) @
@
0 k, l
[5.5]
In addition, both processes satisfy the following conditions:
>
E x 0 (k )u T (k )
E > x 0 ( k )v ( k ) @
@
0 k t 0
0 k t 0
[5.8]
[5.9]
1 Since u (k ) is zero-mean, the covariance matrix of u (k ) is the same as its autocorrelation
matrix.
Kalman Filtering
187
Our task here is to estimate the state vector x(k ) taking into account the information available at time n , which can be before, after or at the instant k . In so doing, we will consider three different cases: – k n : we have to estimate the state vector by taking into account all the observations available at time k; in this case, a filtering is carried out; – k n : we only consider part of the total available measurements, and carry out an interpolation or smoothing; – k ! n : we have to predict the state vector; this corresponds to a prediction or an extrapolation. Irrespective of which of the above cases is being treated, xˆ (k / n) denotes the estimation of the state x(k ) at instant k, considering the information available up to instant n. Our aim is to obtain a recursive estimation of the state vector, i.e., given a new measurement at instant k 1 , to provide a new estimation of the state vector using its previous estimation at instant k. This is commonly known as the “one-step predictor”. We will look closely at the “propagation” and “update” steps which correspond, respectively, to the relationship between xˆ (k 1 / k ) and xˆ (k / k ) and between xˆ (k / k ) and xˆ (k / k 1) . If we consider the estimation errors to propagate through the error covariance matrix, the deduction of the Kalman filter equations is greatly simplified. 5.2.2. Propagation step: relationship between xˆ ( k 1 /k ) and xˆ (k/k ) ; recurrence relationship between the error covariance matrices P ( k 1 /k ) and P (k/k )
Taking into consideration the linearity of equation [5.1], the state vector estimation xˆ (k 1 / k ) is also characterized by the transition matrix ĭ (k 1, k ) . More formally, the estimation xˆ (k 1 / k ) can be defined as follows:
xˆ(k 1/ k )
E > x(k 1) y(1),", y(k )@
From equation [5.1], it follows that:
[5.10]
188
Modeling, Estimation and Optimal Filtering in Signal Processing
xˆ ( k 1 / k )
ĭ ( k 1, k ) E > x ( k ) y (1), " , y ( k ) @
G ( k ) E > u ( k ) y (1), " , y ( k ) @
.
[5.11]
As: E >u( k ) y (1), " , y ( k )@
0,
[5.12]
we obtain: xˆ (k 1 / k ) ĭ (k 1, k ) xˆ (k / k ) .
[5.13]
Let us introduce ~ x (k / k ) as the error in estimating the state vector x(k ) at instant k: ~ x (k / k )
x (k ) xˆ (k / k ) .
[5.14]
The a posteriori2-error correlation matrix at k is defined by:
P(k / k )
^
E > x(k ) xˆ (k / k )@> x(k ) xˆ (k / k )@ T
>
E ~ x (k ) ~ x T (k )
@
`
[5.15]
Using state equations [5.1] and [5.2] and the linear relation [5.13] between xˆ ( k 1 / k ) and xˆ ( k / k ) , we can express the a priori-error correlation matrix. Indeed, by subtracting equation [5.13] from equation [5.1], we get: x(k 1) xˆ (k 1 / k ) ĭ (k 1, k )>x(k ) xˆ (k / k )@ G (k )u (k )
[5.16]
Let P (k 1 / k ) be the a priori2-error correlation matrix defined as follows: P(k 1 / k )
^
E >x(k 1) xˆ (k 1 / k )@ > x(k 1) xˆ (k 1 / k )@ T
and using equation [5.16], it follows that:
2 The terms “a posteriori” and “a priori” will be justified a little further.
`
[5.17]
Kalman Filtering
^
P(k 1 / k ) ĭ(k 1, k ) E >x(k ) xˆ(k / k )@ >x(k ) xˆ(k / k )@ T
>
@
`ĭ
T
(k 1, k )
G(k ) E u(k )u T (k ) G T (k )
189
[5.18]
Equivalently: P (k 1 / k ) ĭ (k 1, k ) P (k / k )ĭ T (k 1, k ) G (k )Q(k )G T (k )
[5.19]
We have thus established a recursive relation between the a priori-error correlation matrix and the a posteriori-error correlation matrix. 5.2.3. Update step: relationship between xˆ (k/k ) and xˆ ( k/k 1) ; recursive relationship between P (k/k ) and P ( k/k 1)
Let us consider the recursive estimation of the state vector. Hereafter, we will adopt the following linear form of the state vector estimation: xˆ (k / k )
>
xˆ (k / k 1) K (k ) y (k ) H (k ) xˆ (k / k 1)
@
[5.20]
This form is inspired by the linear recursive algorithm derived analytically in the section on the recursive least squares estimation in Chapter 2. The a posteriori estimation xˆ (k / k ) of the state vector corresponds to the prediction xˆ (k / k 1) updated by a corrective term, i.e. the weighted difference between the actual measurement and its prediction. This approach is similar to the one we adopted when deriving the least squares method in Chapter 2.
Figure 5.1. Kalman filter
K (k ) is known as the gain of the filter, or the Kalman gain. We would expect K (k ) to be independent of the measurement, to ensure linearity of the algorithm.
190
Modeling, Estimation and Optimal Filtering in Signal Processing
Taking up the expression for the estimation error ~ x (k / k ) and replacing the estimated value xˆ (k / k ) by its expression [5.20], we obtain: ~ x (k / k )
x(k ) xˆ (k / k ) x(k ) xˆ (k / k 1) K (k )> y (k ) H (k ) xˆ (k / k 1)@
x(k ) xˆ (k / k 1) K (k )>H (k ) x(k ) v(k ) H (k ) xˆ (k / k 1)@ >I K (k ) H (k )@ >x(k ) xˆ (k / k 1)@ K (k )v(k )
[5.21]
We can then deduce the expression for the correlation matrix of the a posteriori error at time k, namely P (k / k ) , as a function of P (k / k 1) :
^
T P( k / k ) E ~ x (k / k ) ~ x (k / k )
`
E^ > I K (k ) H (k ) ~ x (k / k 1) K (k )v(k ) @ ~ > I K (k )H (k ) x (k / k 1) K (k )v(k ) @T
`
Equivalently: P(k / k )
>I K (k ) H (k ) @ P (k / k 1) >I H T (k ) K T (k )@
K (k ) R(k ) K T (k )
[5.22]
We have thus established a second recursive relationship between the a posteriori error correlation matrix and the a priori error correlation matrix. 5.2.4. Expression of the Kalman filter gain
To ensure that the algorithm is optimal, the gain K (k ) should be chosen so as to minimize the mean square error on the estimation of the state vector. The criterion J (k ) which needs to be minimized can be defined as follows: J (k )
>
x (k / k ) T ~ x (k / k ) E ~ trace >P (k / k )@
@
>
x (k / k ) ~ x (k / k ) T trace ~
@.
[5.23]
Considering equations [5.23] and [5.15], the Kalman filter is a minimumvariance type of filter. The optimal gain satisfies the following condition: wJ (k ) wK (k )
0
[5.24]
Kalman Filtering
191
In the above equation, we calculate the derivative of a scalar quantity J (k ) with respect to a vector K (k ) . Taking equation [5.22] into account, we obtain:
>
^
@
w trace >I K (k ) H (k )@P(k / k 1) I H T (k ) K T (k ) K (k ) R(k ) K T (k ) wK (k )
`
0
[5.25] This amounts to: P(k / k 1) H T (k )
K (k ) H (k ) P(k / k 1) H T (k ) K (k ) R(k )
[5.26]
Thus:
>
P ( k / k 1) H T ( k ) H ( k ) P ( k / k 1) H T ( k ) R ( k )
K (k )
@
1
[5.27]
Given equation [5.22] above, the relationship between P (k / k ) and P (k / k 1) can be rewritten as follows: P(k / k ) P ( k / k 1) K ( k ) H ( k ) P ( k / k 1) P ( k / k 1) H T ( k ) K T ( k ) T
T
[5.28]
T
K ( k ) H ( k ) P ( k / k 1) H ( k ) K (k ) K (k ) R ( k ) K ( k )
Rearranging the elements of this equation gives: P (k / k ) P ( k / k 1) K ( k ) H ( k ) P ( k / k 1) P ( k / k 1) H T ( k ) K T ( k )
>
T
@
[5.29]
T
K ( k ) H ( k ) P ( k / k 1) H ( k ) R ( k ) K ( k )
Using the expression for the Kalman filter gain, we can simplify the above equation to: P (k / k ) P ( k / k 1) K ( k ) H ( k ) P ( k / k 1) P ( k / k 1) H T ( k ) K T ( k ) T
T
P ( k / k 1) H ( k ) K ( k )
[5.30]
192
Modeling, Estimation and Optimal Filtering in Signal Processing
and consequently: P( k / k )
>I K (k ) H (k )@P(k / k 1)
[5.31]
For a continuous-time state space representation, equation [5.28] has the form of the Ricatti differential equation. By analogy, this equation is called the discrete Ricatti equation. By applying the matrix inversion lemma to equation [5.31], we obtain: P 1 (k / k )
P 1 (k / k 1) P 1 (k / k 1) K (k ) H (k )>1 H (k ) K (k )@1
[5.32]
Replacing the Kalman gain in the above equation by its expression [5.27] gives: P 1 (k / k )
P 1 (k / k 1)
>
@
1 P 1 (k / k 1) ª P (k / k 1) H T (k ) H (k ) P (k / k 1) H T (k ) R (k ) º H ( k ) [5.33] «¬ »¼
>
@
ª1 H (k ) P (k / k 1) H T ( k ) H (k ) P (k / k 1) H T (k ) R( k ) 1 º «¬ »¼
1
Some of the elements of equation [5.33] can be rewritten as follows, subject to the condition that R (k ) is non-zero:
>H (k )P(k / k 1)H
T
(k ) R (k )
>
@
1
@
ª1 H (k ) P (k / k 1) H T (k ) H (k ) P (k / k 1) H T (k ) R (k ) 1 º «¬ »¼
> >H (k ) P(k / k 1)H
T
@
T
(k ) R(k ) H (k ) P(k / k 1) H (k )
@
1
[5.34]
1
R 1 (k )
Equation [5.33] is modified to: P 1 (k / k )
P 1 (k / k 1) H T (k ) R 1 (k ) H (k )
[5.35]
The terms P (k / k ) P 1 (k / k ) and R 1 (k ) R (k ) are equal to identity matrix I. Inserting these values in the gain equation, we obtain:
Kalman Filtering
K (k )
>P(k / k)P
1
@
>
@>
@
(k / k ) P(k / k 1)H T (k ) R1(k )R(k ) H (k )P(k / k 1)H T (k ) R(k )
>
@
P(k / k )P1(k / k )P(k / k 1)H T (k )R1(k ) H (k )P(k / k 1)H T (k )R1(k ) 1
1
193
[5.36]
1
In the above equation, replacing P 1 ( k / k ) by its expression [5.35] gives: K (k )
>
P ( k / k ) H T ( k ) R 1 ( k ) 1 H ( k ) P ( k / k 1) H T ( k ) R 1 ( k )
>H (k ) P(k / k 1) H
T
@
( k ) R 1 ( k ) 1
1
@
[5.37]
Thus: K (k )
P ( k / k ) H T ( k ) R 1 ( k ) .
[5.38]
Considering the equation: xˆ (k / k )
>
xˆ (k / k 1) K (k ) y (k ) H (k ) xˆ (k / k 1)
@
[5.20]
The significance of each term in equation [5.20] can be highlighted by the following qualitative reasoning. Given expression [5.27] of the Kalman gain, we see that: – for constant R (k ): if P ( k / k ) is small, the gain will also be small. The confidence attributed to the estimation obtained from the model is increased. If, however, P ( k / k ) is high, signifying a low degree of confidence in the state vector estimation, the gain will be high. The contribution from the weighted correction to the gain will be higher; – for constant P ( k / k ): if R (k ) is small, the measurements will be slightly noisy. These measurements will be weighted higher because of the gain value. If, on the other hand, R (k ) is high, the gain will be smaller. The importance of the second term in equation [5.20] will be smaller. J. S. Demetry, in [4], has shown that the Kalman filter gain minimizes not only the trace of the error covariance matrix, but also any linear combination of the diagonal elements of this matrix. We can conclude from this result that if the state vector contains physical quantities such as speed, position, etc., we need not preoccupy ourselves with the absence of physical significance of the sum of errors associated with the quantities.
194
Modeling, Estimation and Optimal Filtering in Signal Processing
5.2.5. Implementation of the filter
To make use of the set of recursive equations which characterize the Kalman filter, we should choose the initial conditions of the state vector’s estimation, namely xˆ (0 / 0) , as well as the covariance matrix of the associated error, denoted P (0 / 0) . If we have no prior information whatsoever on the state, we adopt the following initial state for the state vector: xˆ (0 / 0)
E >x (0)@ .
[5.39]
If E >x (0)@ is not known, we take it to be equal to zero. In addition: P ( 0 / 0)
P (0)
^
T E >x(0) xˆ (0 / 0)@>x (0) xˆ (0 / 0)@
`
[5.40]
Applying equations [5.31], [5.38] and [5.27] to P(k / k 1), P(k / k ) and gain K (k ) , we obtain: P (k / k 1) ĭ (k , k 1) P (k 1 / k 1)ĭ T (k , k 1)
[5.41]
G (k 1)Q(k 1)G T (k 1) K (k )
P(k / k )
>
P ( k / k 1) H T ( k ) H ( k ) P ( k / k 1) H T ( k ) R ( k )
@
1
[5.27]
>I K (k ) H (k )@P(k / k 1)
[5.38]
We can evaluate these three quantities independently of the measurements, even before these measurements are treated by the filter. To determine the three quantities, we require the prior knowledge of P (0 / 0) . For k
1 , we calculate P(1 / 0) , K (1) , xˆ (1 / 0) and xˆ (1 / 1) as follows:
P (1 / 0) where P (1 / 0) ĭ (1,0) P(0 / 0)ĭ T (1,0) G (0)Q(0)G T (0) K (1) where K (1)
>
P (1 / 0) H T (1) H (1) P (1 / 0) H T (1) R (1)
xˆ (1 / 0) where xˆ (1 / 0) ĭ(1,0) xˆ (0 / 0) xˆ (1 / 1) where xˆ (1 / 1)
xˆ (1 / 0) K (1)> y (1) H (1) xˆ (1 / 0)@
@
1
Kalman Filtering
195
For k 2 , we proceed in the same manner adopted to update the state vector’s a priori and a posteriori estimations, and so on. The choice of the initial values is a delicate process because it needs to ensure the rapid convergence of the algorithm. For any value of xˆ (0 / 0) , even an arbitrary one in the extreme case, the algorithm processing the observed measurements carries out the necessary corrections. To make up for the lack of information on P (0 / 0) , we adopt P(0 / 0) DI , where I is the identity matrix and D an arbitrary scalar.
Figure 5.2. A flowchart of the Kalman filter
The flowchart in Figure 5.2 depicts how the Kalman filter works. Let us take up the equation relating to P(k / k ) to P(k / k 1) : P(k / k )
>I K (k ) H (k )@P(k / k 1)
[5.38]
We saw earlier that the Kalman filter is based on the introduction of a gain term which minimizes the criterion J (k ) defined as follows: J (k )
trace >P (k / k )@ .
[5.23]
Combining equations [5.38] and [5.23] gives: trace >P(k / k )@ trace >P(k / k 1)@ trace >K (k ) H (k ) P(k / k 1)@
[5.42]
196
Modeling, Estimation and Optimal Filtering in Signal Processing
However, matrix K (k ) H (k ) P (k / k ) is either positive definite or positive semidefinite. Consequently, it follows that all the eigenvalues are either positive or zero. This leads us to the following inequality: trace >P(k / k )@ trace >P (k / k 1)@
[5.43]
Thus, the estimation error decreases as the algorithm proceeds. 5.2.6. The notion of innovation
Let us again consider the equation for the recursive estimator: xˆ (k / k )
xˆ (k / k 1) K (k ) > y (k ) H (k ) xˆ (k / k 1)@
[5.44]
If this prediction is a perfect one, in the absence of measurement noise, the correction is induced by the “innovation” e(k ) , i.e.: e( k )
y (k ) H (k ) xˆ (k / k 1)
0
[5.45]
In a more rigorous approach, when the prediction is perfect and the filter is considered to be optimal, the innovation is a white process. We can also prove that if the filter is optimal, the innovation e(k ) is a white sequence with a zero mean. e(k ) no longer contains any information that may be correlated with the observation, which could improve the update of the state vector [10]. We can verify the degree of optimality and the filter performance by testing the “whiteness” of the innovation. To do this, Mehra has proposed the following statistical test: rˆee ( j ) d
1.95rˆee (0) N
for j ! 0
[5.46]
Here, rˆee ( j ) is the estimation of the innovation’s autocorrelation function, and N is the number of samples available. Alternatively, we could also use the test proposed by Stoica [25]: j
¦ rˆee 2 (i) d ( j +1.65
2 j )rˆee 2 (0)/N
i 1
This test is more reliable than that in [5.46].
[5.47]
Kalman Filtering
Model equation
x ( k 1)
Observation equation
y(k )
A priori information
E ªu ( k ) º «¬ »¼
ĭ ( k 1, k ) x ( k ) G ( k )u( k )
H (k ) x(k ) v(k )
0 and E >v ( k )@ 0
E ªu ( k ) v ( l ) º «¬ »¼
0 k, l
E ª«u( k )u T (l )º» ¼ ¬
Q ( k )G ( k l )
E ªv ( k )v ( l ) º «¬ »¼
R ( k )G ( k l )
E ª« x 0 ( k )u T ( k )º» ¼ ¬ E ª x 0 ( k )v (k )º «¬ »¼
Filter equations
197
xˆ ( k / k 1)
0 k t 0 0 k t 0
ĭ ( k , k 1) xˆ ( k 1 / k 1)
xˆ(k / k) xˆ(k / k 1) K(k)ª y(k) H(k)xˆ(k / k 1)º «¬ »¼ Gain expressions
>
K (k )
P(k / k 1) H T (k ) H (k ) P(k / k 1) H T (k ) R(k )
K (k )
P ( k / k ) H T ( k ) R 1 ( k )
A posteriori covariance matrix
P( k / k )
A priori covariance matrix
P ( k / k 1)
Initial conditions
xˆ (0 / 0)
ª I K ( k ) H ( k )º P( k / k 1) »¼ «¬ ĭ ( k , k 1) P ( k 1 / k 1)ĭ T ( k , k 1) G ( k )Q ( k )G T ( k ) E >x (0)@
^
P(0 / 0) P(0) E >x(0) xˆ(0 / 0)@>x(0) xˆ(0 / 0)@ T Table 5.1. Kalman filter equations
`
@
1
198
Modeling, Estimation and Optimal Filtering in Signal Processing
If the driving process is not Gaussian, the performance of the Kalman filter is degraded and there may be a real risk of divergence. A more robust version has been proposed for this case by Tsai and Kurtz [26]. 5.2.7. Derivation of the Kalman filter for correlated processes
So far, we have considered the driving process and the noise to be uncorrelated. In this section, we will see how the system representation is modified for the case of correlated processes. First of all, let us introduce the following notations: x 1 (k 1) ĭ1 (k 1, k ) x 1 (k ) G1 (k )u (k )
[5.48]
H 1 ( k ) x 1 ( k ) H 2 ( k )v ( k )
[5.49]
y (k )
We saw earlier that processes u and v satisfy:
>
@
Q1 (k )G (k l )
[5.50]
>
@
R1 (k )G (k l ) .
[5.51]
E u (k )u T (l )
E v(k )v T (l )
In some cases, u and v may be correlated. We will consider these two processes to be themselves generated by white sequences K 2 (k ) and K 3 (k ) as follows: u (k 1) ĭ 2 (k 1, k )u (k ) G 2 (k )K 2 (k 1)
[5.52]
v(k 1) ĭ3 (k 1, k )v(k ) G3 (k )K 3 (k 1)
[5.53]
K 2 (k ) and K 3 (k ) have zero-means and the following covariance matrices:
>
E K (k )K 2
T 2
(l )
@
Q 22 (k )G (k l )
[5.54]
Kalman Filtering
>
E K (k )K 3
T 3
(l )
@
Q33 (k )G (k l )
199
[5.55]
Moreover, we suppose that these two noise vectors satisfy:
>
E K (k )K 2
T 3
(l )
@
Q 23 (k )G (k l )
[5.56]
Starting from these assumptions, let us look at the extended vector x (k ) which combines x 1 (k ) , u (k ) and v(k ) as follows :
x(k )
ª x 1 (k )º « u (k ) » « » «¬ v(k ) »¼
[5.57]
This combination allows us to represent the dynamic ranges of x1 (k ) , u (k ) and v(k ) . Moreover, we can also clearly note that the driving processes of u(k ) and v (k ) , which are K ( k ) and K ( k ) respectively, directly intervene in the expression 2
3
of vector x1 ( k ) . Taking into account model [5.48] and expressions [5.52] and [5.53], we can write the model equation as follows: G1 ( k ) 0 º ªĭ1 ( k 1, k ) » x(k ) « x( k 1) « ĭ2 ( k 1, k ) 0 0 » «¬ ĭ3 ( k 1, k )»¼ 0 0 0 º ª 0 ªK ( k 1)º « «G2 ( k ) 0 »» « 2 » «K ( k 1) ¼» «¬ 0 G3 ( k )»¼ ¬ 3
[5.58]
To contract this expression, we can introduce vector w(k): w(k )
ªK 2 (k )º «K (k ) » ¬« 3 ¼»
[5.59]
200
Modeling, Estimation and Optimal Filtering in Signal Processing
which is known as the extended input:
>
E w(k ) w T (l )
@
ª Q 22 (k ) Q23 (k )º «Q T (k ) Q (k ) »G (k l ) . 33 ¬ 23 ¼
[5.60]
The quantities ĭ (k 1, k ) , Q(k ) and G (k ) are defined as follows:
ĭ (k 1, k )
G1 (k ) 0 ªĭ1 (k 1, k ) º « » 0 ĭ ( k 1 , k ) 0 2 « » «¬ 0 0 ĭ 2 (k 1, k )»¼
[5.61]
Q(k )
ª Q 22 (k ) Q 23 (k )º «Q T (k ) Q (k ) » 33 ¬ 23 ¼
[5.62]
G (k )
0 º ª 0 «G (k ) 0 »» « 2 «¬ 0 G3 (k )»¼
[5.63]
and:
where the “0” represent zero matrices of appropriate dimensions. The expanded state-system equation thus becomes: x ( k 1)
ĭ ( k 1, k ) x ( k ) G ( k ) w( k 1)
[5.64]
and the measurement equation can be written as: y (k )
ª H ( k ) 0 H ( k )º x ( k ) 2 «¬ 1 »¼
[5.65]
We see that in the expression of measurement [5.65], there is no explicit term for the additive noise. Nevertheless, we know that the Kalman gain is given by: K (k )
P ( k / k ) H T ( k ) R 1 ( k )
[5.38]
This gain expression is no longer valid because now R(k) = 0. We thus have to start over again with the calculation [24].
Kalman Filtering
201
As we mentioned in Chapter 1, this representation of the system in the state space is called the “perfect measurement” representation, or noiseless measurement. There are several shortcomings of this representation. The main disadvantage is that due to a perfect measurement y(k), the equations for the Kalman filter are those presented in Table 5.1 with R = 0. The state estimation, in this case, can give rise to some numerical difficulties if we can no longer guarantee that matrix HP(k / k 1) H T is invertible for all values of k. In such a situation, there are two solutions to ensure that the Kalman filter functions properly: – the first of these is an ad hoc solution consisting of adding a correction term in ª º the « HP ( k / k 1) H T » matrix so that it is still invertible; ¬ ¼ – the second solution aims at reducing the order of the state model. To do so, a solution based on a change of the basis vectors of the state space has been proposed. It leads to a state vector with observation y(k) as one of its elements. For further details, the reader is referred to [17]. 5.2.8. Relationship between the Kalman filter and the least squares method with forgetting factor
We saw earlier that the a priori error covariance matrix has the following form: P (k / k 1) ĭ (k , k 1) P (k 1 / k 1)ĭ T (k , k 1) G (k 1)Q(k 1)G T (k 1)
[5.41]
To enable a direct comparison between the Kalman filter and the least squares method with forgetting factor, we have to use the same model in the two cases. This will be achieved if the matrices ĭ (k , k 1) and G (k 1) are defined as follows: ĭ (k , k 1) G (k 1)
I
[5.66]
[5.67]
I
This changes equation [5.41] to: P (k / k 1)
P(k 1 / k 1) Q(k 1)
[5.68]
202
Modeling, Estimation and Optimal Filtering in Signal Processing
We also saw that: P(k / k )
>I K (k ) H (k )@P(k / k 1)
[5.31]
If we replace the term P(k / k 1) by its expression [5.68], we obtain: P(k / k )
>I K (k ) H (k )@>P(k 1 / k 1) Q(k 1)@
[5.69]
We established earlier for the least squares method that: P(k )
>I K (k ) H (k )@ P(k 1) O
[5.70]
If we impose the following equality, the two algorithms are identical: P(k 1) Q(k 1)
P(k 1)
O
[5.71]
Rearranging the terms of the above equation gives: Q(k 1)
1 O
O
P (k 1)
[5.72]
After the description of the Kalman filter and its relation to the least squares method, we proceed to use it for the estimation of parameters.
5.3. Application of the Kalman filter to parameter estimation 5.3.1. Estimation of the parameters of an AR model
Let us look at the following AR process: y (k )
p
¦ a i y (k i) u (k )
[5.73]
i 1
Here, y (k ) is the signal at instant k and u (k ) is a zero-mean white Gaussian noise with a unit variance.
Kalman Filtering
203
We adopt a state space representation highlighting the parameters which need to be identified, i.e., the prediction coefficients ^ a i `i>1,..., p @ . We then choose the state
vector x(k ) such that the components ^x i (k )`i>1,..., p @ correspond to the prediction coefficients:
x(k )
ª x1 (k ) º » « « # » « x p (k )» ¼ ¬
ª a1 º « » « # » «a p » ¬ ¼
[5.74]
Moreover, we note that: x(k )
>a1
" ap
@T
T (k )
[5.75]
The observation vector is then constructed by storing the following p values of the observation: H (k )
> y (k 1)
" y ( k p )@
Since the model is assumed to be stationary, parameters
[5.76]
^ ai `i>1,..., p @
can be
considered constant. The state space model can thus be defined by the following two equations: x (k 1) y (k )
[5.77]
x (k )
H (k ) x(k ) u (k )
[5.78]
With respect to the state space representation given in equations [5.1] and [5.2], the above equations lead to the following choice for the driving process covariance matrix and the transition matrix: Q (k )
0 and ĭ (k 1, k )
I.
[5.79]
Since the variance of the additive noise is equal to 1, the recursive least squares algorithm and the Kalman filter are both expressed using the same equations. Let us then estimate the AR parameters of a second-order AR process with parameters defined as follows:
204
Modeling, Estimation and Optimal Filtering in Signal Processing
2 and a 2 2
a1
1 4
[5.80]
2 0 -2 0
100
200
300
400
500
600
recursive estimation of the AR parameters
amplitude
1 0.8 0.6 0.4 0.2 0
0
100
200
300 400 samples
500
600
Figure 5.3. Estimation of AR parameters
The advantage of the Kalman filter over the RLS arises from its ability to track the parameters. For this purpose, let us generate a second-order AR process which will subsequently be subjected to an abrupt change, so that the AR parameters assume the following values: a1
2 and a 2 2
1 for samples 1 to 700 4
a1
1.2135 and a2
a1
0.99 and a2
0.5625 for samples 701 to 1,400
0.9801 for samples 1,401 to 2,100
[5.81]
[5.82]
[5.83]
Kalman Filtering
205
The respective poles associated with these parameters are the following: p1, 2
3S · 1 § exp¨ r j ¸ , p1, 2 4 ¹ 2 ©
S· 3 § exp¨ r j ¸ and p1, 2 4 5¹ ©
§ S· 0.99 exp ¨ r j ¸ . 3¹ ©
In this non-stationary case, a i a i (k ) . The equation to update the state thus takes one of the following two forms: – in the first form, the state vector is updated as follows: x(k 1)
ª a k º x(k ) wk « 1 » wk ¬a 2 k ¼
[5.84]
where wk is a zero-mean white Gaussian noise. Its variance is chosen by the user. If it is chosen too small, the tracking of the variation of the parameters is not assured; if the chosen value is too large, the estimations will have a large variance; – in the second form, the range of the variation is known and verifies the following: x(k 1) ĭ (k 1, k ) x(k ) w(k )
[5.85]
Unfortunately, matrix ĭ (k 1, k ) is rarely known. The determination of both the matrix and of state vector x(k ) is now a nonlinear estimation issue.
206
Modeling, Estimation and Optimal Filtering in Signal Processing
10 0 -10 0
500
0
500
1000 samples
1500
2000
1000 1500 tracking the AR parameters, Q=0
2000
2 1 0 -1 -2
2
1
0
-1
-2
0
200
400
600
800 1000 1200 1400 tracking the AR parameters, Q=0.0001
1600
1800
2000
0
200
400
600
800 1000 1200 1400 tracking the AR parameters, Q=0.001
1600
1800
2000
2
1
0
-1
-2
Figure 5.4. Tracking the AR parameters
Kalman Filtering
207
If we subject the filter to a strong variation of the parameters, we see that the estimation is not followed up [2]. On the other hand, if we inject the noise w(k ) , the transition is closely tracked even for low values of variance. This improvement can be explained by the fact that in the first case ( Q 0 ), the gain tends towards zero after several iterations. The filter consequently loses its adaptability. 5.3.2. Application to speech analysis
The Kalman filter was first applied to speech signals by Matsui et al. [16]. At about the same time, Gueguen and Carayannis also took up the modeling of speech signals using the Kalman filter [9]. Gibson and Melsa presented a comparative study of various recursive algorithms used for estimation [5] [6]. Mack and Jain introduced a modified version of the Kalman filter for better tracking of parameters [15]. Finally, many research efforts have been carried out in the field of communications, and especially in equalizers for digital transmissions. In this context, we can cite two ground-breaking efforts, those of Godard [7] and Morikawa [19]. We will present some results for the application of the Kalman filter to the analysis of speech signals [1]. Let this speech signal be modeled using an AR model: x(k ) a1 x(k 1) " a p x(k p)
u (k )
[5.73]
where x(k ) denotes the kth sample of the speech. To implement the filter, we must set some parameters. We adopt the following analysis conditions and initial conditions: – order of the model: p
12 ;
– R(k ) 100 for voiced sounds and R (k ) 1000 for unvoiced sounds [5] [6]; – P(0 / 0) 10 I . The analysis is carried out over a time equal to the period of the pitch. This pitch can be estimated using the method presented in [6]. We can also use the approaches introduced by Griffin and used in the IMBE coder [8]. Finally, we consider the following variations of the error criteria, first used by Gueguen and Carayannis [9]. To do this, we define the prediction error as follows: e( k )
y (k ) H (k ) xˆ (k )
[5.86]
208
Modeling, Estimation and Optimal Filtering in Signal Processing
and the variation norm of the parameter vector as follows:
G x (k )
T T xˆ (k ) xˆ (k ) xˆ (k 1) xˆ (k 1)
[5.87]
The parameters estimated by the Kalman filter are similar to those obtained by the covariance and autocorrelation methods presented in Chapter 2.
Figure 5.5. Speech analysis without pitch detection; the signal being analyzed, the variation in the norm of the parameter vector, and the associated prediction error
5.4. Nonlinear estimation
In practice, we often encounter signals modeled using nonlinear mathematical equations. For example, this holds true for phase- or frequency-modulation. In all the above sections, we considered the models to be linear, both in the model representation and in the state space representation. In what follows, we will consider nonlinear processes in the state space, and we will develop a nonlinear filter called the extended Kalman filter (EKF). For this, we adopt the following representation for continuous-time systems:
Kalman Filtering
x (t )
f x(t ), t G (t )u (t )
209
[5.88]
where f x (t ), t denotes a nonlinear function of the state and the model parameters. The observation vector may also be nonlinear: y (t )
hx(t ), t v(t )
[5.89]
We assume u(t) and v(t) to be random white processes with the following correlations:
>
E u (t )u T (t W )
@
E > v(t )v(t W )@
Q(W )G (t , W )
R (W )G (t , W )
[5.90]
[5.91]
The new solution consists of the linearization of equations [5.88] and [5.89] around a reference point or a set of reference points, called the reference trajectory. This name comes from the first applications of this solution, which were in the area of aerospace. 5.4.1. Model linearization: linearized Kalman filter
We will linearize the model around a reference trajectory assumed to be known. This trajectory, denoted x r (t ) , is the solution to the following homogenous equation [18]: x r (t )
f x r (t ), t
[5.92]
We associate this trajectory with a nominal measurement: y r (t )
hx r (t ), t
[5.93]
We will establish a “disturbed” equation which defines the difference between the effective state and the state given by the reference trajectory. To start, if we subtract [5.92] from [5.88], we obtain: x (t ) x r (t )
f x(t ), t f x r (t ), t G (t )u (t )
[5.94]
210
Modeling, Estimation and Optimal Filtering in Signal Processing
It would be beyond the scope of this book to describe the integration of this nonlinear stochastic differential equation. However, a more realistic alternative, more consistent with our aim, would be to carry out a Taylor-series development around the reference trajectory x r (t ) : f x(t ), t
wf x(t ), t f x r (t ), t >x(t ) x r (t )@ w x(t )
H r (t )
[5.95]
x (t ) x r (t )
This development is carried out supposing that the difference x(t ) x r (t ) remains small. In addition, H r (t ) contains all the higher-order terms of the Taylor expansion. Limiting ourselves to the first-order approximation, equation [5.94] can be rewritten as: x (t ) x r (t )
wf x(t ), t w x(t )
>x(t ) x r (t )@ G(t )u (t )
[5.96]
x ( t ) x r (t )
Introducing the factor defined as follows:
G x(t )
x(t ) x r (t ) ,
[5.97]
equation [5.96], which estimates this factor, is modified to:
G x (t )
wf x(t ), t w x(t )
G x(t ) G (t )u (t )
[5.98]
x (t ) x r (t )
To simplify the analysis, we will denote: F x r (t ), t
wf x(t ), t w x(t ) x (t )
.
[5.99]
x r (t )
Combining the two equations above, we obtain:
G x (t )
F x r (t ), t G x(t ) G (t )u (t ) .
[5.100]
If we develop the term hx(t ), t of the observation equation using the limited Taylor series expansion, around the reference trajectory and up to the first order, we obtain:
Kalman Filtering
hx(t ), t hx r (t ), t
whx(t ), t w x(t )
>x(t ) x r (t )@ .
211
[5.101]
x (t ) x r ( t )
Denoting: H x r (t ), t
whx(t ), t w x(t ) x (t )
,
[5.102]
x r (t )
the observation equation [5.89] is changed to:
Gy (t )
H x r (t ), t G x(t ) v(t )
[5.103]
The linearization of model equations around the reference trajectory leads to the following:
G x (t )
F x r (t ), t G x(t ) G (t )u (t )
[5.104]
Gy (t )
H x r (t ), t G x(t ) v(t )
[5.105]
The discrete-time model associated with the continuous-time model is formally defined as follows:
G x(k 1) ĭ ( x r (k ) ; k 1, k ) G x(k ) G (k ) u (k )
[5.106]
Gy (k )
[5.107]
H ( x r (k ); k )G x(k ) v(k )
We will take up the calculation of the transition matrix later. Since this model is linear, we can apply the filter expressions we have previously derived for the standard Kalman filter and thus estimate the new state vector Gx(k ) . In this case, the above equations have the following form:
G xˆ (k / k 1) ĭ x r (k 1); k , k 1 G xˆ (k 1 / k 1)
>
G xˆ (k / k ) G xˆ (k / k 1) K x r (k ); k G y(k ) H x r (k ); k G xˆ (k / k 1)
[5.108]
@
[5.109]
212
Modeling, Estimation and Optimal Filtering in Signal Processing
G xˆ (k / k )
xˆ (k / k ) x r (k )
[5.110]
The gain of the filter is expressed as follows: K ( x r ( k ); k )
>
P ( k / k 1) H T ( x r ( k ); k ) H ( x r ( k ); k ) P ( k / k 1) H T ( x r ( k ); k ) R ( k )
@
1
[5.111] The covariance matrices of the a priori and the a posteriori errors are defined respectively as: P (k / k 1) ĭ ( x r (k 1); k , k 1) P (k 1 / k 1)ĭ T ( x r (k 1); k , k 1) G (k 1)Q(k 1)G T (k 1)
>I K(xr (k);k)H (xr (k);k)@P(k / k 1)>I K(xr (k);k)H (xr (k);k)@T
P(k / k )
K ( xr (k ); k )R(k )K T ( xr (k );k )
[5.112]
[5.113]
5.4.2. The extended Kalman filter (EKF)
We can now expect a better estimation by taking the last estimation as the reference trajectory of the state, and linearizing the model around this estimation. This gives rise to the extended Kalman filter (EKF). Let: x r (k )
xˆ (k / k 1)
[5.114]
Equation [5.110] can be rewritten as:
G xˆ (k / k )
xˆ (k / k ) xˆ (k / k 1)
[5.115]
Given the following:
G xˆ (k / k 1) 0 , the filter equations become:
[5.116]
Kalman Filtering
xˆ (k / k )
xˆ (k / k 1) K (k ) > y (k ) H ( xˆ (k / k 1); k ) xˆ (k / k 1)@
[5.117]
P(k / k 1) ĭ( xˆ(k 1/ k 1); k , k 1) P(k 1/ k 1)ĭT ( xˆ(k 1/ k 1); k , k 1) G(k 1)Q(k 1)GT (k 1)
K (k )
P ( k / k 1) H T ( xˆ ( k / k 1); k )
>H ( xˆ (k / k 1); k ) P (k / k 1) H
P(k / k )
T
( xˆ ( k / k 1); k ) R ( k )
213
@
1
[5.118]
[5.119]
>I K (k ) H ( xˆ (k / k 1); k )@P(k / k 1)>I K (k ) H ( xˆ (k / k 1); k )@T K (k ) R(k ) K T (k )
[5.120] We can draw the following summary observations: – the extended Kalman filter is nonlinear; – the Kalman gain cannot be determined a priori in the case of nonlinear models; – the parameters of the filter are changed to random functions which depend on the estimation. The EKF tends to diverge if the transition from the old to the new estimation is outside the limits of the linear zone. In this context, we can cite the work carried out by Ljung on the convergence of the EKF [14]. The estimations obtained using this filter are often biased, and a new variant of the EKF which avoids this bias has been introduced in [28]. A suboptimal technique which foregoes the use of the nonlinear filter has been described by Nelson and Stear [21]: the authors decompose the estimation using two separate and distinct estimators. 5.4.3. Applications of the EKF
5.4.3. Applications of the EKF

The actual, and potential, applications of the EKF cover several distinct estimation problems:
– the joint estimation of the state and the parameters;
– the estimation of noisy signals;
– the estimation of the transition matrices in the case of non-stationary signals.

5.4.3.1. Parameter estimation of a noisy speech signal

We mentioned above that the parameter estimation of noisy signals is biased. For speech signals, this is a case where we have a signal s(k) buried within surrounding noise. We thus have a joint estimation task:
– on the one hand, the estimation of the speech signal;
– on the other hand, the estimation of the parameters [20].

Let us consider the modeling of a speech signal using an AR process:

s(k) = a_1(k)\, s(k-1) + \cdots + a_p(k)\, s(k-p) + u(k)    [5.121]
Let us introduce the vector s(k+1) which combines the last p values of the signal s:

s(k+1) = [s(k) \;\; \cdots \;\; s(k-p+1)]^T    [5.122]

The dynamics of the state are then driven by the following relation:

\begin{bmatrix} s(k) \\ \vdots \\ s(k-p+1) \end{bmatrix} = \begin{bmatrix} a_1(k) & \cdots & a_{p-1}(k) & a_p(k) \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} s(k-1) \\ \vdots \\ s(k-p) \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} u(k)    [5.123]

and the measured signal which makes up the observation is given by:

y(k) = s(k) + v(k)    [5.124]

where v(k) is the measurement noise. In matrix form, the model can be written as follows:

s(k+1) = F(k+1, k)\, s(k) + G\, u(k)    [5.125]

y(k) = G^T s(k+1) + v(k)    [5.126]
where the transition matrix F(k+1, k) is as follows:

F(k+1, k) = \begin{bmatrix} a_1(k) & \cdots & a_{p-1}(k) & a_p(k) \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}    [5.127]

and:

G = [1 \;\; 0 \;\; \cdots \;\; 0]^T    [5.128]
The updates in the prediction coefficients are modeled by:

\theta(k+1) = \theta(k) + \omega(k)    [5.129]

where:

\theta(k) = [a_1(k) \;\; \cdots \;\; a_p(k)]^T    [5.130]

and \omega(k) is a zero-mean white noise. The overall model which describes our task is thus:

\theta(k+1) = \theta(k) + \omega(k)
s(k+1) = F(k+1, k)\, s(k) + G\, u(k)    [5.131]
y(k) = G^T s(k+1) + v(k)

The twin aim is thus to estimate the AR parameters \theta(k) on the one hand and the state s(k) on the other. For this purpose, we can construct the extended state vector:

x(k) = \begin{bmatrix} \theta(k) \\ s(k) \end{bmatrix}    [5.132]
This gives rise to the following model:

x(k+1) = \varphi(x(k); k+1, k) + \Gamma\, \Omega(k)    [5.133]

where:

\Omega(k) = \begin{bmatrix} \omega(k) \\ u(k) \end{bmatrix}    [5.134]

Taking into account the particular form of the model equations, we can write:

\varphi(x(k); k+1, k) = \begin{bmatrix} I_p & 0_{p \times p} \\ 0_{p \times p} & F(k+1, k) \end{bmatrix} \begin{bmatrix} \theta(k) \\ s(k) \end{bmatrix}

where I_p is the identity matrix of order p. \varphi(x(k); k+1, k) can thus be expressed as a function of the state vector:

\varphi(x(k); k+1, k) = A\, x(k) + H^T\, x^T(k)\, B\, x(k)    [5.135]

with:

A = \begin{bmatrix} I_p & 0_{p \times (p-1)} & 0_{p \times 1} \\ 0_{1 \times p} & 0_{1 \times (p-1)} & 0 \\ 0_{(p-1) \times p} & I_{p-1} & 0_{(p-1) \times 1} \end{bmatrix}    [5.136]

H = [\,0_{1 \times p} \;\; 1 \;\; 0_{1 \times (p-1)}\,]    [5.137]

B = \begin{bmatrix} 0_{p \times p} & I_p \\ 0_{p \times p} & 0_{p \times p} \end{bmatrix}.    [5.138]
The observation equation, which links the measurement to the state, is:

y(k) = H\, x(k+1) + v(k)    [5.139]

The term H^T x^T(k) B\, x(k) in the transition equation, which is quadratic in x(k), explicitly shows the nonlinearity of our task. Thus:

\Phi(\hat{x}(k-1/k-1); k, k-1) = \frac{\partial \varphi(x(k); k+1, k)}{\partial x(k)}\Big|_{x(k) = \hat{x}(k-1/k-1)} = \big[ A + H^T x^T(k)\,(B + B^T) \big]\Big|_{x(k) = \hat{x}(k-1/k-1)}

Equivalently:

\Phi(\hat{x}(k-1/k-1); k, k-1) = A + H^T \hat{x}^T(k-1/k-1)\,(B + B^T)    [5.140]

H(\hat{x}(k/k-1); k) = H.    [5.141]
Thus, by applying the results obtained in the previous section, the state vector can be estimated using the following equations:

\hat{x}(k/k-1) = A\, \hat{x}(k-1/k-1) + H^T \hat{x}^T(k-1/k-1)\, B\, \hat{x}(k-1/k-1)    [5.142]

P(k/k-1) = \Phi(\hat{x}(k-1/k-1); k, k-1)\, P(k-1/k-1)\, \Phi^T(\hat{x}(k-1/k-1); k, k-1) + \Gamma\, Q(k-1)\, \Gamma^T    [5.143]

K(k) = P(k/k-1)\, H^T\, [H\, P(k/k-1)\, H^T + R(k)]^{-1}    [5.144]

\hat{x}(k/k) = \hat{x}(k/k-1) + K(k)\, [y(k) - H\, \hat{x}(k/k-1)]    [5.145]

P(k/k) = [I - K(k) H]\, P(k/k-1)\, [I - K(k) H]^T + K(k)\, R(k)\, K^T(k)    [5.146]
where:

E[\Omega(k)\, \Omega^T(l)] = Q(k)\, \delta(k-l)    [5.147]

and:

E[v(k)\, v(l)] = R(k)\, \delta(k-l)    [5.148]
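As an illustration of equations [5.133]–[5.146], the following sketch builds the quadratic transition function and its Jacobian for the extended state x(k) = [θ(k); s(k)] and runs one EKF iteration. It is not taken from the book: the helper names, the use of NumPy, and the way Γ and Q are supplied by the caller are illustrative assumptions.

```python
import numpy as np

def build_model(p):
    """Constant matrices A, H, B of equations [5.136]-[5.138] for an AR(p) signal."""
    n = 2 * p
    A = np.zeros((n, n))
    A[:p, :p] = np.eye(p)                  # theta part: theta(k+1) = theta(k)
    A[p + 1:, p:n - 1] = np.eye(p - 1)     # shift of the past signal samples
    H = np.zeros((1, n)); H[0, p] = 1.0    # picks s(k) out of x(k+1), eq. [5.137]
    B = np.zeros((n, n)); B[:p, p:] = np.eye(p)   # x^T B x = sum_i a_i(k) s(k-i), eq. [5.138]
    return A, H, B

def phi(x, A, H, B):
    """Quadratic transition function of eq. [5.135]."""
    return A @ x + H.flatten() * float(x @ B @ x)

def ekf_iteration(x_post, P_post, y, A, H, B, Gamma, Q, R):
    """One joint state/parameter EKF iteration, eqs. [5.140]-[5.146]."""
    Phi = A + np.outer(H.flatten(), x_post @ (B + B.T))    # eq. [5.140]
    x_prior = phi(x_post, A, H, B)                         # eq. [5.142]
    P_prior = Phi @ P_post @ Phi.T + Gamma @ Q @ Gamma.T   # eq. [5.143]
    S = float(H @ P_prior @ H.T) + R
    K = (P_prior @ H.T) / S                                # eq. [5.144]
    x_new = x_prior + K.flatten() * (y - float(H @ x_prior))          # eq. [5.145]
    I = np.eye(len(x_prior))
    P_new = (I - K @ H) @ P_prior @ (I - K @ H).T + K @ K.T * R       # eq. [5.146]
    return x_new, P_new
```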
We can estimate the parameters of the speech signal model using the extended Kalman filter. The parameters thus determined from the noisy signal are very close to those obtained by the traditional method for a noiseless signal. This highlights two aspects: the filtering of a noisy signal, and modeling using the extended Kalman filter.

5.4.3.2. Application to tracking formant trajectories of speech signals

Once we know the fundamental frequency and the parameters a_i of the LPC model of the speech signal, we can synthesize speech for both limited and unlimited vocabularies. However, this synthesis uses parameters a_i which do not have an overt physical interpretation. The only quantities which have physical significance are the formants, which are the resonance frequencies of the vocal tract, and the bandwidths associated with these formants. The determination of the formants is itself based on LPC modeling. Speech synthesizers based on formant synthesis give high listening quality, as quantified by intelligibility rates.
It is useful to mention another application of the EKF concerning formant trajectories: the tracking of the nonlinear parameters of the model. This application is attributable to Rigoll [22]. Let us take up the model of the speech signal as an AR process:

y(k) = a_1(k)\, y(k-1) + \cdots + a_p(k)\, y(k-p) + u(k)    [5.73]

The transfer function associated with this model is:

H(z, k) = \frac{1}{1 - a_1(k) z^{-1} - \cdots - a_p(k) z^{-p}}    [5.149]

The search for the formants f_i(k) and the bandwidths b_i(k) is carried out by looking for the poles of H(z, k) using second-order resonators:

H(z, k) = \prod_{i=1}^{m} \frac{1}{1 - c_i(k) z^{-1} - d_i(k) z^{-2}}    [5.150]

where:

c_i(k) = 2 \exp[-\pi b_i(k) T_s]\, \cos[2\pi f_i(k) T_s]    [5.151]

and:

d_i(k) = -\exp[-2\pi b_i(k) T_s]    [5.152]

The relation between the LPC model's parameters a_i and the resonators' parameters [f_i(k), b_i(k)] is, unfortunately, nonlinear. The signal can be expressed as follows:

y(k) = g[f_1(k), \ldots, f_m(k), b_1(k), \ldots, b_m(k)] + u(k)    [5.153]

where g is a nonlinear function of the [f_i(k), b_i(k)], which are considered as the components of the following state vector:

x(k) = [f_1(k), \ldots, f_m(k), b_1(k), \ldots, b_m(k)]^T    [5.154]
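As a small numerical illustration of the formant/resonator relation, the sketch below converts formant frequencies and bandwidths into resonator coefficients and into the AR denominator coefficients, assuming the standard second-order resonator form reconstructed in [5.151]–[5.152]. The function names and the use of NumPy are illustrative assumptions, not the book's implementation.

```python
import numpy as np

def resonator_coefficients(f, b, Ts):
    """(c_i, d_i) from formant frequencies f (Hz) and bandwidths b (Hz), sampling period Ts."""
    c = 2.0 * np.exp(-np.pi * b * Ts) * np.cos(2.0 * np.pi * f * Ts)
    d = -np.exp(-2.0 * np.pi * b * Ts)
    return c, d

def ar_coefficients_from_formants(f, b, Ts):
    """AR coefficients a_1..a_p such that the denominator 1 - sum_k a_k z^-k is the
    product of the m resonators, i.e. the nonlinear map underlying g in eq. [5.153]."""
    poly = np.array([1.0])
    c, d = resonator_coefficients(np.asarray(f, float), np.asarray(b, float), Ts)
    for ci, di in zip(c, d):
        # each resonator contributes the factor 1 - c_i z^-1 - d_i z^-2
        poly = np.convolve(poly, np.array([1.0, -ci, -di]))
    return -poly[1:]
```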
Apart from showing the application of the EKF to the characterization of speech signals, this example highlights the search for the poles of a polynomial. Moreover, using equation [5.73], we can detail the function g:

y(k) = H(k)\, \theta(k) + v(k)    [5.155]

where:

H(k) = [y(k-1) \;\; \cdots \;\; y(k-p)]    [5.156]

\theta(k) = [a_1(k) \;\; \cdots \;\; a_p(k)]^T    [5.157]

v(k) = u(k).    [5.158]

The system model thus becomes:

x(k+1) = f(x(k)) + w(k)    [5.159]

y(k) = g(x(k)) + v(k)    [5.160]
For a more in-depth explanation, the reader is referred to [22].

5.5. Conclusion
The stated aim of this chapter was to present the simple and extended versions of the Kalman filter on the one hand, and to illustrate their implementation using classic applications such as parameter estimation. In the following chapter, we will take up the use of Kalman filtering in signal enhancement.
5.6. References

[1] D. Aboutajdine, T. Lakhdar Ghazal and M. Najim, “Speech Analysis Using Kalman Filtering”, Porto Workshop on Signal Processing and its Applications, Porto, July 1982.
[2] D. Aboutajdine and M. Najim, “Adaptive Filter Structures for Deconvolution of Seismic Signals”, IEEE Trans. on Geoscience and Remote Sensing, vol. GE-23, no. 1, pp. 72-73, 1985.
[3] M. Athans and E. Tse, “A Direct Derivation of the Optimal Linear Filter Using the Maximum Principle”, IEEE Trans. on Automatic Control, vol. AC-12, pp. 690-698, 1967.
[4] S. Demetry, “A Note on the Nature of Optimality in the Discrete Kalman Filter”, IEEE Trans. on Automatic Control, vol. AC-15, pp. 603-604, 1970.
[5] J. D. Gibson, J. L. Melsa and S. K. Jones, “Digital Speech Analysis Using Sequential Estimation”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-23, no. 4, pp. 362, August 1975.
[6] J. D. Gibson and J. L. Melsa, “Unified Development of Algorithms Used for Linear Predictive Coding of Speech Signals”, Comp. Elec. Eng., vol. 3, pp. 75-91, 1976.
[7] D. Godard, “Channel Equalization Using Kalman Filter for Fast Transmission”, IBM J. Res. Dev., pp. 267-273, 1974.
[8] D. W. Griffin and J. S. Lim, “Multiband Excitation Vocoder”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 36, no. 8, pp. 1223-1235, August 1988.
[9] C. Gueguen and G. Carayannis, “Analyse de la parole par filtrage optimal de Kalman”, Automatismes, vol. 18, no. 3, pp. 99-105, March 1973.
[10] T. Kailath, “An Innovation Approach to Least-Squares Estimation. Part I: Linear Filtering in Additive White Noise”, IEEE Trans. on Automatic Control, vol. AC-13, pp. 646-655, 1968.
[11] T. Kailath, A. Sayed and B. Hassibi, Linear Estimation, Prentice Hall, 2000.
[12] R. E. Kalman and R. S. Bucy, “New Results in Linear Filtering and Prediction Theory”, Trans. ASME, Series D, Journal of Basic Engineering, vol. 38, pp. 95-101, 1960.
[13] R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems”, Trans. ASME, Series D, Journal of Basic Engineering, vol. 82, pp. 34-45, 1960.
[14] L. Ljung, “Asymptotic Behaviour of the Extended Kalman Filter as a Parameter Estimator for Linear Systems”, IEEE Trans. on Automatic Control, vol. AC-24, pp. 36-51, January 1979.
[15] G. A. Mack and V. K. Jain, “Speech Parameter Estimation by Time-Weighted-Error Kalman Filtering”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-31, no. 5, pp. 1300-1303, 1983.
[16] E. Matsui, T. Nakajima, T. Suzuki and M. Omura, “An Adaptive Method for Speech Analysis Based on the Kalman Filtering Theory”, Electron. Japan, UDC534, pp. 210-219, 1972.
[17] P. S. Maybeck, Stochastic Models, Estimation and Control, vol. I, Academic Press, New York, 1979.
[18] P. S. Maybeck, Stochastic Models, Estimation and Control, vol. II, Academic Press, New York, 1982.
[19] H. Morikawa, “Quantization and its Evaluation of Transmission Characteristics of Fading Channel in the Adaptive Receiving System based Kalman Filter”, Electronics and Com. in Japan, vol. 67B, no. 3, pp. 28-36, 1984.
[20] L. Melsa and J. D. Tomick, “Linear Predictive Coding with Additive Noise for Application to Speech Digitalisation”, 14th Allerton Conference on Circuit and Systems, September 1976, USA.
[21] L. W. Nelson and N. Stear, “The Simultaneous On-Line Estimation of Parameters and State in Linear Systems”, IEEE Trans. on Automatic Control, pp. 94-98, 1976.
[22] G. Rigoll, “A New Algorithm for Estimation of Formant Trajectories Directly from the Speech Signal Based on an Extended Kalman Filter”, IEEE-ICASSP ’86, Tokyo, Japan, 7-11 April 1986.
[23] A. P. Sage and G. W. Masters, “Least Squares Curve Fitting and Discrete Optimum Filtering”, IEEE Trans. on Education, vol. E-10, no. 1, pp. 29-36, 1967.
[24] A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill, 1971.
[25] P. Stoica, “A Test for Whiteness”, IEEE Trans. on Automatic Control, vol. AC-22, pp. 992-993, December 1977.
[26] C. Tsai and L. Kurtz, “An Adaptive Robustizing Approach to Kalman Filtering”, Automatica, vol. 19, no. 3, pp. 278-288, 1983.
[27] N. Wiener, The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, John Wiley, New York, 1949.
[28] T. Yoshimura, K. Konishi and T. Soeda, “A Modified Extended Kalman Filter for Linear Discrete Time Systems with Unknown Parameters”, Automatica, vol. 17, no. 4, pp. 657-660, 1981.
Chapter 6
Application of the Kalman Filter to Signal Enhancement
6.1. Introduction

The Kalman filter, introduced in the previous chapter, is extensively used in signal analysis, for a variety of applications such as biomedical engineering, navigation, guidance, econometrics, etc. It has been the topic of a large amount of research; the reader is referred to [2] [5] [20] [25] and [32], a list which is by no means exhaustive.

This chapter is mainly concerned with the use of Kalman filtering in the following case: given a signal disturbed by an additive noise, how can we enhance the signal when only a single sequence of the noisy signal is available, and when there is no a priori information either on the signal or on the noise? This formulation applies, for example, to the single-channel enhancement of a speech signal disturbed by an additive noise.

We will start out with the case of a signal disturbed by a white noise. Assuming that the speech signal can be modeled by a pth-order autoregressive process, we will look at the state space representation of the system. Thereafter, we will present the Kalman filter, whose conventional form requires prior knowledge of the state space's dynamic parameters and of the variances of both the driving process and the additive noise. We will then review the single-channel enhancement methods based on the Kalman filter. Then, we will propose several alternative approaches which forego the variances of the driving process and the measurement noise, traditionally denoted Q and R respectively. For example, we will present a method which is based
on Mehra's work initially developed in the field of identification [26] [27]. We will then take advantage of the approach used by Carew and Belanger [4], which was earlier presented as an alternative to Mehra's approach in the area of control. Next, we will consider the enhancement as a stochastic realization issue in the domain of identification. For this latter case, these new approaches are based on the subspace methods of identification, also originally proposed in control [35] [36]. To conclude the chapter, we will treat the case of a colored noise and summarily mention the work carried out for signals disturbed by impulse noise, such as [33].

6.2. Enhancement of a speech signal disturbed by a white noise

6.2.1. State space representation of the noisy speech signal

Let us assume that the speech signal s(k) can be modeled by a pth-order AR process:
s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k)    [6.1]

where u(k) is the driving process. Moreover, the observation is the combination of a speech component and an additive zero-mean white noise component with variance R:

y(k) = s(k) + b(k)    [6.2]

As we saw in Chapter 5, the Kalman filter provides a recursive procedure for the state estimation. As our objective here is to estimate the speech signal, we can construct the state vector x(k) by combining the q (\geq p) last values of the speech signal s. As stated in section 1.6.9, these values are known as the state variables:

x(k) = [s(k-q+1) \;\; \cdots \;\; s(k)]^T    [6.3]

Therefore, the state space representation of the system in equations [6.1] and [6.2] is defined as [13] [30]:

x(k+1) = \Phi\, x(k) + G\, u(k)    [6.4]

y(k) = H\, x(k) + b(k)    [6.5]
Here, \Phi is the q \times q transition matrix having the following form:

\Phi = \begin{bmatrix} 0 & 1 & 0 & \cdots & \cdots & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & \cdots & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & \cdots & & & & 0 & 1 \\ 0 & \cdots & 0 & a_p & a_{p-1} & \cdots & a_1 \end{bmatrix}    [6.6]

where the last row contains q-p zeros followed by the p coefficients a_p, a_{p-1}, \ldots, a_1. G and H are the input and observation vectors respectively, defined as follows:

G = H^T = [\underbrace{0 \;\; \cdots \;\; 0}_{q-1} \;\; 1]^T    [6.7]
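As a concrete illustration of equations [6.4]–[6.7], the following sketch builds the triplet (Φ, G, H) from a set of AR coefficients. It is a minimal helper (not from the book), assuming NumPy and the controllable companion form above with q = p:

```python
import numpy as np

def ar_state_space(a):
    """Controllable state space (Phi, G, H) for s(k) = a[0] s(k-1) + ... + a[p-1] s(k-p) + u(k),
    with state x(k) = [s(k-p+1), ..., s(k)]^T (here q = p)."""
    p = len(a)
    Phi = np.zeros((p, p))
    Phi[:-1, 1:] = np.eye(p - 1)           # shift: upper diagonal of ones
    Phi[-1, :] = np.asarray(a)[::-1]       # last row: [a_p, ..., a_1]
    G = np.zeros((p, 1)); G[-1, 0] = 1.0   # driving process enters the newest sample
    H = G.T                                # observation picks s(k), eq. [6.7]
    return Phi, G, H

# Example: a second-order AR model
Phi, G, H = ar_state_space([1.5, -0.7])
```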
As explained in Chapter 1, this is the canonical, controllable state space representation.

6.2.2. Speech enhancement procedure

The different approaches to signal enhancement using the Kalman filter generally operate in the following steps. First, the speech signal is windowed.

Figure 6.1. Frame-by-frame processing of a speech signal
For the sake of simplicity, all the frames have the same length, nominally 30 ms, to fulfill the hypothesis of quasi-stationarity of the signal. In practice, we can take a 50% overlap and a Hanning or Hamming window defined by:

w(k) = a - (1-a)\cos\!\left(2\pi \frac{k}{N}\right), \quad k = 0, \ldots, N-1    [6.8]
where a = 0.5 for a Hanning window and a = 0.54 for a Hamming window, and N denotes the number of samples per frame.

To estimate the signal using the Kalman filter for each frame, we first need to know the corresponding variances Q of the driving process and R of the measurement noise, as well as the dynamic state space matrix triplet [\Phi, G, H]. The enhanced signal is then defined as:

\hat{s}(k) = H\, \hat{x}(k/k)    [6.9]

where \hat{x}(k/k) is the a posteriori estimation of the state vector x(k) based on the k observations {y(1), y(2), ..., y(k)}. Since we have adopted the controllable state space representation, only the prediction coefficients \{a_i\}_{i=1,\ldots,p} need to be estimated. As the process is assumed to be quasi-stationary, these coefficients are constant over the analysis frame.
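The sketch below shows the corresponding per-frame Kalman recursion producing the enhanced samples of equation [6.9]. It is not from the book: the function name, the use of NumPy, the zero initial state and identity initial covariance are illustrative assumptions, and the triplet and variances are assumed to be known.

```python
import numpy as np

def kalman_enhance_frame(y, Phi, G, H, Q, R):
    """Standard Kalman filter over one frame, returning s_hat(k) = H x_hat(k/k)."""
    q = Phi.shape[0]
    x = np.zeros((q, 1))
    P = np.eye(q)
    s_hat = np.zeros(len(y))
    for k, yk in enumerate(y):
        # time update
        x = Phi @ x
        P = Phi @ P @ Phi.T + Q * (G @ G.T)
        # measurement update
        S = float(H @ P @ H.T) + R
        K = P @ H.T / S
        x = x + K * (yk - float(H @ x))
        P = (np.eye(q) - K @ H) @ P
        s_hat[k] = float(H @ x)            # eq. [6.9]
    return s_hat
```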
We can now retrieve the speech signal by applying a Kalman algorithm (filtering or smoothing) on each frame. Finally, we reconstruct the complete enhanced signal using an addition-overlap method [31]. This is illustrated in Figure 6.1.

The use of this method is justified by the following reasons. If we chose to forego the overlap, the final reconstitution of the signal would be based on the division of each enhanced frame by the window. However, this would increase the estimation uncertainty at the beginning and the end of each frame, where the window value is close to zero. A 50% overlap is advantageous because it gives two estimations for each speech sample, which can subsequently be combined.

Let \hat{s}_i(k_i) be the k_i-th sample of the i-th frame, and \hat{s}_{i+1}(k_{i+1}) the k_{i+1}-th sample of the (i+1)-th frame. Both samples correspond to an estimation of the k-th sample s(k) weighted by the window w, as follows:

\hat{s}_i(k_i) = s(k)\, w(k_i)    [6.10]

\hat{s}_{i+1}(k_{i+1}) = s(k)\, w(k_{i+1}).    [6.11]

Considering equations [6.10] and [6.11], we can easily derive s(k) as follows:

s(k) = \frac{\hat{s}_i(k_i) + \hat{s}_{i+1}(k_{i+1})}{w(k_i) + w(k_{i+1})}    [6.12]

If d is the number of overlapping samples between two adjoining frames, we can express k_i as a function of k_{i+1}:

k_i = k_{i+1} + d    [6.13]

Moreover, taking into account the 50% overlap, i.e. d = N/2 if N is even, and the window expression introduced in [6.8], we can simplify equation [6.12]. Indeed:

w(k_i) + w(k_{i+1}) = 2a - (1-a)\left[\cos\!\left(2\pi \frac{k_i}{N}\right) + \cos\!\left(2\pi \frac{k_{i+1}}{N}\right)\right] = 2a - (1-a)\left[-\cos\!\left(2\pi \frac{k_{i+1}}{N}\right) + \cos\!\left(2\pi \frac{k_{i+1}}{N}\right)\right] = 2a    [6.14]
Equation [6.12] thus reduces to:

s(k) = \frac{\hat{s}_i(k_i) + \hat{s}_{i+1}(k_{i+1})}{2a}    [6.15]
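The following sketch illustrates the frame-by-frame processing and the reconstruction rule of equation [6.15]. It is not from the book: the function names, the use of NumPy, and the placeholder enhance_frame (which stands for whichever per-frame Kalman filtering or smoothing step is applied) are illustrative assumptions.

```python
import numpy as np

def enhance_overlap_add(y, N, a=0.54, enhance_frame=lambda frame: frame):
    """Window the noisy signal y into 50%-overlapping frames of N samples (N even),
    enhance each windowed frame, and recombine the two estimates of every sample
    by dividing by 2a as in eq. [6.15]."""
    w = a - (1 - a) * np.cos(2 * np.pi * np.arange(N) / N)   # eq. [6.8]
    hop = N // 2
    out = np.zeros(len(y))
    for start in range(0, len(y) - N + 1, hop):
        frame = y[start:start + N] * w          # windowed noisy frame
        out[start:start + N] += enhance_frame(frame)
    # Note: the first and last half-frames receive only one contribution,
    # so the division by 2a is exact only in the fully overlapped region.
    return out / (2 * a)
```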
The major difference between the various estimation approaches lies in the estimation of the noise variance and of the prediction coefficients \{a_i\}_{i=1,\ldots,p} of the speech signal.

6.2.3. State of the art dedicated to single-channel enhancement methods using Kalman filtering
In this section, we will review existing speech enhancement methods that we have come across so far, notably in [3] [9] [10] [11] [16] [17] [22] [23] [24] and [40].

In the context of signal enhancement, one of the pioneering approaches was introduced by Paliwal and Basu [30]. In it, the signal and noise sequences are both assumed to be known, independently of one another. The model's state space parameters are then estimated using the speech signal, before the signal is contaminated with the noise. As for the variance of the additive noise, it is estimated directly from the noise sequence. The authors then use a delayed version of the Kalman filter to estimate the speech signal. This procedure is depicted in Figure 6.2. It should be noted that this approach is limited to theoretical study and cannot be implemented in real cases.

Later, the method proposed by Gibson et al. provided a sub-optimal solution which is a simplified version of the expectation-maximization (EM) algorithm [13]. More specifically, the prediction coefficients \{a_i\}_{i=1,\ldots,p} of the speech signal and the variance Q of the driving process are estimated from the noisy speech signal. The variance of the additive noise is estimated, or updated, during the periods of silence. This requires the use of a voice activity detector (VAD). To improve the enhancement, the original noisy speech signal is filtered two or three times. At each iteration, the parameters of the speech are estimated using the enhanced signal. This is shown in Figure 6.3.

However, in their respective works, Paliwal and Basu [30] and Gibson et al. [13] do not identify the method used for estimating the signal parameters. The latter implicitly quote the publication of Friedlandler [7], and state that the resolution of modified Yule-Walker equations could be a prospective solution, even though the results obtained could be unsatisfactory, especially for wideband noisy signals, when only a limited number of samples are available.

Figure 6.2. The approach proposed by Paliwal and Basu

Figure 6.3. The approach proposed by Gibson et al.
In the approach used by Oppenheim et al., the state vector has size q = p+1 and contains the p+1 last samples of the signal [29]:

x_s(k) = [s(k-p) \;\; \cdots \;\; s(k)]^T    [6.16]

The AR parameters are estimated by solving the Yule-Walker^1 equations using the estimated values of the signal autocorrelation:

\hat{R}_{ss}(p+1) \begin{bmatrix} -a_p(k) \\ \vdots \\ -a_1(k) \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ Q(k) \end{bmatrix}    [6.17]
where \hat{R}_{ss}(p+1) is an estimation of the (p+1) \times (p+1) autocorrelation matrix of the signal. Indeed, assuming that the signal is ergodic, we can approximate this autocorrelation matrix by:

\hat{R}_{ss}(k) = \frac{\sum_{i=1}^{k} \lambda^{k-i}\, x_s(i)\, x_s^T(i)}{\sum_{i=1}^{k} \lambda^{k-i}}    [6.18]

Here, 0 < \lambda \leq 1 is a forgetting factor which makes it possible to track the non-stationarity of the signal, and the outer product x_s(k) x_s^T(k) is replaced by:

x_s(k)\, x_s^T(k) = \hat{x}_s(k/k)\, \hat{x}_s^T(k/k) + P(k/k)    [6.19]
Moreover, the estimation of the variance of the additive noise is expressed as:

\hat{R}(k) = \frac{1 - \eta}{1 - \eta^{k}}\, \vartheta(k)    [6.20]

where 0 < \eta \leq 1 is a forgetting factor. The element \vartheta(k) is recursively calculated as follows:

\vartheta(k) = \eta\, \vartheta(k-1) + \left[ y^2(k) - 2\, y(k)\, \hat{s}(k/k) + \overline{s^2}(k) \right]    [6.21]

^1 For further details, the reader is referred to Chapter 2, section 2.2.6.
This method can be used to enhance speech recorded in a noisy environment as well as for the active cancellation of noise [41].

The estimation of the signal's AR parameters from noisy observations is one of the main difficulties in the implementation of Kalman-filter based enhancement methods^2. Gabrea proposes using an EM-type algorithm such as the one proposed by Deriche in [6] to estimate the prediction coefficients \{a_i\}_{i=1,\ldots,p}. This algorithm iteratively maximizes the log-likelihood of the parameters to be estimated, assuming that the signal and the driving process are both Gaussian.

Recently, Gannot et al. have taken up this maximum likelihood parameter estimation problem [12]. They propose the use of Kalman smoothing in the EM algorithm to obtain the estimations of the state \hat{x}(k/N) and of the covariance matrix P(k/N). To do this, they first apply Kalman filtering for k \in [1, N] and then use the following backward-recursive algorithm [28]:

\hat{x}(k/N) = \hat{x}(k/k) + A(k)\, [\hat{x}(k+1/N) - \Phi\, \hat{x}(k/k)]    [6.22]

P(k/N) = P(k/k) + A(k)\, [P(k+1/N) - P(k+1/k)]\, A^T(k)    [6.23]

where:

A(k) = P(k/k)\, \Phi^T\, P(k+1/k)^{-1}    [6.24]

\hat{x}(k/N) and P(k/N) are involved in the maximization step, whose purpose is to estimate the prediction coefficients \{a_i\}_{i=1,\ldots,p} and the variances of the driving process and the measurement noise; the corresponding equations are similar to the Yule-Walker equations.

Goh et al. have proposed a Kalman filtering method in which the excitation varies according to the nature of the segment being analyzed [14] [15]. The procedure in this solution is the following. If the frame is unvoiced, the state space representation of the system is similar to that described so far. If, on the other hand,
the frame is voiced, the excitation u is assumed to be periodic, with a period approximately equal to the pitch period. The driving process can then be modeled as follows:

u(k) = \beta(k, p_0)\, u(k - p_0) + d(k)    [6.25]

where d is a stationary zero-mean white noise with covariance Q. In addition, p_0 is the normalized instantaneous pitch period, which needs to be estimated, and \beta(k, p_0) is an indicator of the instantaneous periodicity. The above equation can alternatively be written in the following form:

u(k) = \sum_{l=1}^{p_{0\max}} \beta(k, l)\, u(k-l) + d(k)    [6.26]

Here, p_{0\max} is a constant whose value is equal to the maximum of the normalized instantaneous pitch period. For example, for a signal sampled at 8 kHz, p_{0\max} can be equal to 160, and \beta(k, l) = 0 for l \neq p_0.

The state space dynamics of the system lead to an extended state vector x(k) defined as follows:

x(k) = \left[ s^T(k+1) \;\; u^T(k) \right]^T    [6.27]

with:

u(k) = [u(k - p_{0\max} + 1) \;\; \cdots \;\; u(k)]^T    [6.28]

and:

s(k) = [s(k-p+1) \;\; \cdots \;\; s(k)]^T    [6.29]

This approach is adapted to the type of frame being analyzed. However, it suffers from two disadvantages. Firstly, an ambiguity in terms of the type of modeling is introduced when the frames are mixed. Secondly, an easy solution does not always exist for the pitch extraction and the voiced/unvoiced decision in a noisy environment. These two concerns are not addressed in the work of Goh et al. [14].
6.2.4. Alternative methods based on projection between subspaces
6.2.4.1. Introduction

The Kalman filtering-based approaches that we examined in the above section are centered around a canonical state space representation. The major difficulty in their practical implementation lies in the estimation of the signal's AR parameters. Using subspace-based techniques for the state space identification of the system, we have proposed a new enhancement approach which foregoes the explicit modeling of the signal and the resulting estimation of the model parameters^3.

Subspace methods for identification were first introduced by Van Overschee et al. and originally used in the field of control [35] [36]. They also benefit from a close relationship with the Kalman filter. Their major advantage is that the state space matrices needed to apply Kalman filters are directly obtained from the noisy observations. Moreover, as opposed to the EM-type approaches, no iteration of the process is required to improve the estimation of the speech signal.

Unlike the approaches presented in the preceding section, all the methods presented here are based on an estimation of the set [\Phi_T, H_T, Q_T, R], in a state space basis (see Chapter 1, section 1.6.5.4) associated with the following representation:

x_T(k+1) = \Phi_T\, x_T(k) + G_T\, u(k)    [6.30]

y(k) = H_T\, x_T(k) + b(k)    [6.31]

where \Phi_T is the transition matrix, H_T is the observation vector, Q_T is the correlation matrix of G_T u(k) and R is the variance of b(k).

6.2.4.2. Preliminary observations

Starting with a parameter i, which has to be greater than the size of the state vector, we first define the system's extended observation matrix \Psi_{T,i}:

\Psi_{T,i} = \left[ H_T^T \;\; (H_T \Phi_T)^T \;\; \cdots \;\; (H_T \Phi_T^{i-1})^T \right]^T    [6.32]
234
Modeling, Estimation and Optimal Filtering in Signal Processing
We also concatenate the noisy observations in a i u j Hankel matrix as follows:
Y1 / i
y 2 ª y 1 « y 2 y 3 « « # # « ¬ y i y i 1
" " % "
y j º y j 1 »» » # » y i j 1 ¼
[6.33]
Finally, we introduce the notion of orthogonal projection of one subspace on another: given the two matrices L and M , which have dimensions i L u j and i M u j respectively, we can define the row subspace L / M as the projection of the row subspace of L on the row subspace of M . L/M
>
E j LM T
@ E >MM @ j
T
1
M
[6.34]
The mathematical expectation will henceforth be approximated by a time average: E j >@ .
lim
j of
1 >@. . j
In practice, since only a finite number of observations are available, we will 1 replace E j >@ . by the operator >@ . and take j N i 1 , in equations [6.33] and j [6.39]. N is the number of samples of the analyzed frame. 6.2.4.3. Relation between subspace-based identification methods and the Kalman algorithm The various versions of subspace identification methods [35] [36] [38] [39] have their roots in the pioneering works of Akaike [1] and Kung [21], which deal with the realization issue. Given the observations, the stochastic realization problem consists of determining the dynamic parameters and the noise variances, in this case the set >ĭT , H T , QT , R @ , from the correlation function of the observations. Kung [21] develops an algorithm that provides the state space model from a Hankel matrix containing Markov parameters, which however are rather dfficult to calculate in practice. This Hankel matrix is then expressed as the product of the controllability and observability matrices of the system.
Application of the Kalman Filter to Signal Enhancement
235
To avoid constructing the observation covariance matrix, the subspace methods use orthogonal projections between some row subspaces of the Hankel data matrix. These projections, Yi 1 / 2i / Y1 / i and Yi 2 / 2 i / Y1 / i 1 , can be expressed according to the extended observation matrix and the two sequences X T , i and X T , i 1 : Yi 1 / 2i / Y1 / i
ȌT, i XT, i
Yi 2 / 2i / Y1 / i 1 Ȍ T , i 1 X T , i 1
[6.35]
[6.36]
Van Overschee et al. have put forward an interpretation of these two sequences by establishing a link with the Kalman algorithms [35] [36]. The authors have shown that X T , i and X T , i 1 can be considered the outputs of a bank of Kalman filters at the ith and i+1th iterations. To do this, the authors introduce the following quantities: 6T , i
º ª E N i 1 « X T , i X TT , i » ¼ ¬
[6.37]
and: 6 T , i 1
º ª E N i 1 « X T , i 1 X TT , i 1 » ¼ ¬
[6.38]
where X T , i can be expressed in the following form: XT, i
ªˆ º ˆ «¬ x T (i 1 / y (1),... y (i )) " " x T (i j / y ( j )... y (i j 1)) »¼ [6.39]
where xˆ T ( N 1 / y ( N i 1),..., y ( N )) is the estimation of the state vector x ( N 1) . It takes into account the i previous observations ^y ( N i 1),... y ( N )` , in the state space base associated with the set >ĭT , H T , QT , R @ .
Thus, the subspace-projection identification methods are based on a prediction of the state by the optimal Kalman filter, given the i last observations [35] [36]. The authors of [18] and [19] also propose that these approaches be reformulated to suit speech enhancement.
236
Modeling, Estimation and Optimal Filtering in Signal Processing
6.2.4.4. Signal prediction using the optimal Kalman filter The kth sample of the frame can be estimated thanks to the estimation xˆ T (k ) of
the state vector in the state space base associated to >ĭT , H T , QT , R @ . In fact, we have: s (k )
H T xˆ T (k )
[6.40]
Equation [6.35] can also be written in the following form: XT, i
Ȍ T , i Yi 1 / 2i / Y1 / i
[6.41]
Here, Ȍ T , i denotes the pseudo-inverse of matrix Ȍ T ,i . We can thus obtain the prediction of the state using the optimal Kalman filter based on the i last observations ^y ( k i ),... y ( k 1)` . However, Ȍ T ,i and observation vector H T are not known and thus have to be estimated. According to [6.35], we can estimate Ȍ T ,i by using the singular value decomposition Yi 1 / 2i / Y1 / i , since Yi 1 / 2i / Y1 / i is the product of the extended observability matrix and the state sequence X T , i , according to equation [6.35]: U6V T
Yi 1 / 2i / Y1 / i
[6.42]
and: ȌT, i
U6
1
2
.
[6.43]
Starting from equation [6.32], we can thus extract H T , which is the first row of the Ȍ T ,i matrix. This procedure avoids the direct estimation of the variance of the driving process, the noise variance and the transition matrix. Despite this advantage, this direct reformulation is difficult in terms of implementation because the choice of parameter i is a delicate one. To improve the state estimation and consequently the performance of the enhancement approach, we can foresee a prediction of the kth sample based on the first k observations or N samples of the frame.
Application of the Kalman Filter to Signal Enhancement
237
6.2.4.5. Kalman filtering and/or smoothing combined with subspace identification methods In this section, we propose the use of the subspace methods to obtain the set
>ĭT , H T , QT , R @ . Then, we will use a Kalman filter to estimate the speech signal. As we saw earlier, we can extract H T from the extended observability matrix Ȍ T ,i . The structure of this matrix also allows us to obtain the transition matrix ĭT as follows: Ȍ T , i 1ĭT
Ȍ T , i ĭT ª H ĭ T «¬ T T ȌT, i
H
2 T ĭT
T
"
H
i 1 T ĭT
T
º »¼
T
,
[6.44]
Thus:
ĭT
where Ȍ T , i
ȌT, i ȌT, i
[6.45]
is the pseudo-inverse of the matrix Ȍ T , i .
Moreover, equation [6.36] is equivalent to: X T , i 1 Ȍ T , i 1 Yi 2 / 2i / Y1 / i 1
We thus obtain the pair
>QT , RT @
[6.46]
from the sequences X T , i and X T , i 1 as
follows [35]:
UT
ª X T , i 1 º ª ĭT º «Y »« »XT, i ¬ i 1 / i 1 ¼ ¬ H T ¼
[6.47]
238
Modeling, Estimation and Optimal Filtering in Signal Processing
The covariance pair >QT , RT @ is deduced from the residue UT [6.48]. 1 UT UT T j
ªQ T « 0 ¬
0 º R T »¼
[6.48]
The different algorithms implemented in [18] and [19] require an RQ factorization and a singular value decomposition. The computational complexity of these algorithms is relatively high. Expressions of the state sequences
Yi 2 / 2i / Y1 / i 1
Yi 1/ 2i / Y1/i
X T, i X T , i 1
\T , i Yi 1/ 2i / Y1/ i \ T , i Yi 2 / 2i / Y1/ i 1
\ T , i pseudo-inverse of
UT
\ T, i
\
T , i X T , i 1
Approximation of \ T ,i
\ T, i X T, i
ª T «¬ H T
HT)T T
L
HT)T i Tº»¼
T
\ T,i
ª X T , i 1 º ª )T º «Y » « » X T, i ¬ i 1/ i 1¼ ¬ H T ¼
Extraction of dynamic parameters
and
1 N
UT UTT
ªQT « 0 ¬
0 º R T »¼
Extraction of noise variances
Figure 6.4. Subspsace methods for identification
6.2.4.6. Simulation results In this section, we will carry out the enhancement of the speech signal “Le tribunal va bientôt rendre son jugement”, sampled at 8 kHz. This signal is disturbed by a white Gaussian noise with a signal-to-noise ratio of 15 dB.
Application of the Kalman Filter to Signal Enhancement
Figure 6.5. Time-domain representation of the noisy signal and the enhanced signal
Frequency (Hz)
Frequency (Hz)
Time (s)
Time (s) Original signal
Noisy signal
Frequency (Hz)
Frequency (Hz)
Time (s) Filtered signal
Time (s) Smoothed signal
Figure 6.6. Example of signal enhancement; input SNR: 15 dB
239
240
Modeling, Estimation and Optimal Filtering in Signal Processing
We see in the above figures that a residual noise remains in the enhanced signal. Kalman smoothing can appreciably reduce this noise [28]. 6.2.5. Innovation-based approaches
6.2.5.1. Introduction In this section, we will present approaches which require prior knowledge of neither the covariances of measurement noise and driving process nor of the transition matrix. We will concentrate on parameter estimation in system identification and benefit from the advantages of results obtained by Mehra [26] [27], and Carew and Belanger [4]. The first of the new approaches is inspired by the work that Mehra has performed in the field of identification [26] [27]. Mehra initially developed a method that aims at obtaining unbiased and consistent estimates of the covariances of the driving process and the additive noise. In addition, he showed that the optimal steady-state Kalman gain could always be estimated. Subsequently, Gabrea et al. [8] have proposed a reformulation of his approach in the field of speech signals: the Kalman gain is calculated in an iterative way, as long as the innovation sequence is not white. This procedure requires neither the covariance of the driving process nor the variance of the additive noise and uses an estimation of the autocorrelation function of the innovation. However, the transition matrix and the observation vector are still unknown and thus must be estimated. When choosing the controllable canonical state space representation of the system, the Modified Yule Walker equations can be solved to estimate the prediction cofficients ^a i `i 1,... p . However, this technique may lead to unsatisfactory results for wideband noisy frames. For this reason subspace identification methods can be used to estimate the dynamic parameters. A second approach is based on the work carried out by Carew and Belanger [4]. In this work, the aim is to carry out an iterative procedure for the estimation of the optimal Kalman gain from a suboptimal gain. Lastly, this method proposes the estimation of the variances of the excitation and measurement noises after filtering the signal with a Kalman filter. Once the two variances have been estimated, the speech signal is itself estimated using a standard Kalman filter.
Application of the Kalman Filter to Signal Enhancement
241
6.2.5.2. Kalman-filter based enhancement without direct estimation of variances Q and R Let us take up the state space representation associated with the AR model of equations [6.4] and [6.5]. The state vector has a size p, which is equal to the AR process order. The optimal Kalman gain can be calculated iteratively, without directly using the variances of the driving process and the measurement noise. In fact, the solution consists of calculating the Kalman gain from the innovation’s autocorrelation function ree j . The filtering stage is “global” in the sense that the gain stays constant in an analysis frame. Finally, by verifying that the innovation sequence is white, we can check whether the optimal solution has been attained [26] [27]. The main steps in this approach consist of: – windowing the signal and estimating the corresponding transition matrix and observation vector; – calculating the autocorrelation function of the innovation sequence; – developing an iterative procedure to obtain the optimal Kalman gain. To do so, we first filter the signal with an initial gain K (0) and then calculate the innovation as follows: e( k )
y ( k ) H xˆ ( k / k 1)
[6.49]
The state vector is updated as follows: xˆ (k 1 / k ) ĭ xˆ (k / k 1) ĭK (0)e(k )
[6.50]
and the enhanced signal is deduced as given: sˆ( k )
H xˆ ( k / k )
[6.51]
The innovation sequence thus obtained is used to improve the estimation of the Kalman gain. This gain is expressed, at the j+1th iteration, as follows:
K ( j 1)
Hĭ º ª « H >ĭ ( I - K ( j ) H )@ĭ » » K ( j) « » « # « p -1 » ¬ H >ĭ ( I - K ( j ) H )@ ĭ ¼
1
ª rˆeej (1) º » « j « rˆee (2) » / rˆ j (0) « # » ee » « j ¬«rˆee ( p)¼»
[6.52]
242
Modeling, Estimation and Optimal Filtering in Signal Processing
where rˆeej ( k ) denotes the autocorrelation function of the innovation. It is calculated with the Kalman gain K ( j ) . The way the Kalman gain is updated is explained in Appendix I or detailed in [26] [27]. It should be noted that the above calculation is reiterated as long as the innovation is not a white sequence. To verify whether the sequence is indeed white, the following statistical test proposed by Stoica4 is used: reej (k )
d 1,95
reej (0) N
for k ! 0
[6.53]
The enhanced signal is then reconstructed using addition-overlap.
Noisy speech signal and enhanced speech signal
Figure 6.7. Example of signal enhancement
6.2.5.3. Kalman-filter based enhancement using a suboptimal gain Based on a prediction of the state vector using a suboptimal gain, we propose an alternative to the method proposed by Mehra to calculate the optimal Kalman gain [4]. As in the subsection above, the filtering is global, i.e., the gain is constant over the entire analysis window. This new approach is divided into four successive steps: – windowing the signal and estimating the pair >ĭ , H @ ; 4 P. Stoica, “A Test for Whiteness,” IEEE Trans. on Automatic Control, vol. AC-22, pp. 992993, 1977.
Application of the Kalman Filter to Signal Enhancement
243
– calculating the Kalman gain using the Carew-Belanger approach; – filtering the observations with gain Kˆ j to regain the signal; – reconstituting the overall signal using addition-overlap methods. The Carew-Belanger approach consists of the iterative resolution of a set of three equations linking the optimal case to the suboptimal case [4]. More specifically, these equations determine the exact relationship between the autocorrelation function of the innovation sequence in the optimal case to the same function in the suboptimal case. Let x (k / k 1) be the a priori estimation of x(k ) when the gain K * is suboptimal, and xˆ (k / k 1) the a priori estimation of x(k ) when gain Kˆ is optimal. In the following analyses, matrix P * ( k / k 1) is defined as follows: P (k / k 1)
T E ® xˆ(k / k 1) x (k / k 1) xˆ(k / k 1) x (k / k 1) ½¾ [6.54] ¯ ¿
Moreover, if we assume the filter converges, i.e., if:
lim P * (k / k 1)
k of
P*
[6.55]
then the optimal gain can be obtained iteratively. At the jth iteration, designated by the subscript j hereafter, the calculation of the optimal gain is described as follows: rˆeej (0) = ree * (0) - H P *j H T
Kˆ j
[6.56]
K * - ( I - K * H ) P *j H T / rˆeej (0) Hĭ ª º « H ĭ( I - K * H ) ĭ » » « « » # p -1 » « * ¬ H ĭ( I - K H ) ĭ ¼
>
>
@
@
1
ª ree * (1) º « * » « ree (2) » / rˆ j (0) « # » ee « * » «¬ree ( p )»¼
P*j+1 ĭ( I K *H ) P*j ( I K *H )T ĭT rˆeej (0)ĭ( Kˆ j K * )(Kˆ j K * )T ĭT
[6.57]
[6.58]
244
Modeling, Estimation and Optimal Filtering in Signal Processing
where the autocorrelation function of the innovation, denoted ree * (l ) , is estimated from the samples e * ( k ) with 1 d k d N , as follows: ree * (l )
1 N
N -1
¦ e* (i)e* (i l )
[6.59]
i l
The reader is referred to Appendix J for more details on the derivation of equations [6.56], [6.57] and [6.58]. 6.2.5.4. Alternative approach to Kalman-filter based enhancement, using the estimation of variances Q and R This approach allows the noise variances to be estimated. It is based on the properties of the innovation sequence. We will first take up the estimation of the noise variance, followed by the variance of the driving process. Assuming the noisy signal y(k) to have been filtered with suboptimal initial gain K, we have:
PH T
Hĭ ª º « H >ĭ ( I - KH )@ĭ » » K ree (0) « « » # « p -1 » ¬ H >ĭ ( I - KH )@ ĭ ¼
1
ª ree (1) º « r (2) » « ee » « # » « » ¬ree ( p)¼
[6.60]
However, according to section 5.2.6: e( k )
y k H xˆ (k / k 1)
H~ x (k / k 1) b(k ) .
The variance R of the additive noise b(k ) is thus related to the autocorrelation function of the estimation error by the following equation: R = ree (0)- HPH T
[6.61]
Using the estimated value of transition matrix ĭ , equations [6.60] and [6.61] are respectively modified to:
Application of the Kalman Filter to Signal Enhancement
PH T
º ª Hĭˆ » « H ĭˆ ( I - KH ) ĭˆ » K rˆee (0) « » « # « p -1 » «¬ H ĭˆ ( I - KH ) ĭˆ »¼
>
1
@
>
@
ª rˆee (1) º » «ˆ « ree (2) » « # » » « ¬rˆee ( p )¼
245
[6.62]
and:
Rˆ rˆee (0)- H PH T
[6.63]
The estimation of the variance of the driving process is no longer necessary. We can iteratively calculate the autocorrelation matrix of the prediction error as follows: P (k/k - 1)= ĭ>I-K (k - 1) H @P (k 1/k - 2)>I-K (k - 1) H @T ĭ T
[6.64]
+ ĭK (k - 1) K T (k - 1)ĭ T R GG T Q
While lim P (k/k - 1) k of
P and lim K (k - 1) k of
K are satisfied, equation [6.64] is
changed to [26]: P = ĭ>I - KH @P>I - KH @T ĭ T +ĭKK T ĭ T R GG T Q
[6.65]
Expanding this form of equation [6.61], we obtain: P = ĭPĭT ĭKHPĭT ĭPHT K T ĭT ĭKHPHT K T ĭT+ĭKRKT ĭT GGT Q
>
@
ĭPĭT GGT Q ĭKHPĭT ĭPHT K T ĭT ĭK HPHT R K T ĭT
[6.66]
Using equation [6.61] of the innovation’s autocorrelation function: P = ĭPĭ T GG T Q ĭKHPĭ T ĭPH T K T ĭ T ĭKree 0 K T ĭ T
>
@
ĭPĭ T GG T Q ĭ Kree 0 K T KHP PH T K T ĭ T
[6.67]
If we note: ȍ
ĭ[ Kree (0) K T KHP PH T K T ]ĭ T
[6.68]
246
Modeling, Estimation and Optimal Filtering in Signal Processing
equation [6.65] is changed to: P = ĭPĭ T +GG T Q ȍ
[6.69]
Replacing P of the above equation by its expression and iterating the procedure, we obtain [26]: P = ĭ j P(ĭ j ) T +
j -1
¦
ĭ i GG T (ĭ i ) T Q
i =0
j -1
¦ ĭ i ȍ(ĭ i ) T
j t 1
[6.70]
i=0
Multiplying both sides by H and by (ĭ - j ) T H T , and given that Q is a scalar, we obtain: HP(ĭ - j ) T H T = Hĭ j PH T +
j -1
j -1
i =0
i =0
¦ Hĭ i GG T (ĭ i- j ) T H T Q ¦ Hĭ i ȍ(ĭ i- j ) T H T
[6.71]
Thus, we can obtain the variance Q of the driving noise u(k) by taking into account the symmetry of matrix P and using relationship [6.7] between G and H: HP(ĭ - j ) T H T Hĭ j PH T Q
j -1
¦ Hĭ i ȍ(ĭ i- j ) T H T i =0
j -1
¦ Hĭ H i
T
H (ĭ
i- j T
) H
[6.72] T
i =0
We will now show that the denominator of the above equation is zero for all 0 j p , and that H (ĭ - j ) T H T is zero for 0 j p : – first, given the definition of the observation vector in the context of speech enhancement, the product H(ĭ -j )T H T corresponds to the coefficients p, p of the inverse of matrix ĭ j ; – moreover, given the very peculiar structure of the pup transition matrix ĭ :
Application of the Kalman Filter to Signal Enhancement
ĭ
ª 0 « 0 « « # « « 0 « a p ¬
1
0
0
% % " "
" a p 1
" 0 º % # »» % 0 » » 0 1 » " a1 »¼
247
[6.6]
If we define matrix B as the product ĭ A, we see that its p-1 first rows are equal to the last p-1 rows of matrix A. In the special case where A = ĭ , it can be easily shown that p-1th row of matrix ĭ corresponds to the p-2th row of ĭ 2 , the p-3th row of ĭ 3 and the p-jth row of ĭ j with 0 j p . Consequently, the minor p, p of matrix ĭ j with 0 j p is necessarily zero, as is the p, p element of (ĭ j ) -1 .
Equation [6.72] can thus be used when j is lower than p. If j is chosen to be greater than p, the calculation cost goes up. Using the estimated values of ĭ and PH T , equation [6.72] is changed to:
( PH T ) T (ĭˆ Qˆ
) H T Hĭˆ j PH T
-j T
j -1
¦ Hĭˆ i ȍˆ (ĭˆ i- j ) T H T i=0
j -1
¦
[6.73]
Hĭˆ i H T H (ĭˆ i - j ) T H T
i =0
In this section, we have presented various speech enhancement methods to be used when the additive noise is white. In the next section, we take up the case of colored noise. 6.3. Kalman filter-based enhancement of a signal disturbed by a colored noise
In this section, we will take up a realistic case, i.e. Kalman-based enhancement when the speech signal is disturbed by a colored noise. To accomplish this, we start by modeling both the speech signal and the noise b(k) by autoregressive processes of orders p and q respectively. We will present some Kalman filter-based methods that deal with this issue. The state space representation is that described in Chapter 1, section 1.6.9. This representation, however, leads to a perfect-measurement state space representation, as mentioned in Chapter 5, section 5.2.7.
248
Modeling, Estimation and Optimal Filtering in Signal Processing
We proceed in the following three steps to enhance a speech signal disturbed by a colored noise: – the noisy speech signal is first windowed in frames lasting approximately 30 ms each. A vocal activity detector is implemented to detect whether the windowed signal corresponds to a silent frame or not. These periods of silence allow the driving process’ variance as well as the noise’s dynamic parameters to be estimated; – the transition matrix ĭ s is then estimated from the noisy signal frames; – the speech signal is recovered using Kalman filtering. The enhanced signal is reconstructed using the addition-overlap method. The method presented in the case of white noise in Figure 6.3 was extended by Gibson for colored noise [13]. The noise parameters are estimated during the periods of silence. The signal parameters are obtained iteratively. The method originally developed by Oppenheim et al. [29] has been generalized by Verbout [37]. In that case, the vector is composed of the p+1 last samples of the signal and the q last samples of the additive noise. Lastly, Gannot et al. have also proposed an EM-type solution for colored noise [12]. The estimation of the transition matrix ĭb and ĭ s can be done by using the Yule-Walker equations. Indeed, if we suppose that a set of N samples of the observed signal is available and that this set is associated with a zone of silence, we can write: q
y (k )
b( k )
¦ c j b(k j ) w(k )
for m d k d m N 1
[6.74]
j 1
where m is the index of the first sample of the block. Taking the above equation into account, we can write the autocorrelation of observation y (k ) as: q
r yy ( j )
¦ ci ryy ( j i) WG ( j )
[6.75]
i 1
As this autocorrelation is an even function, the Yule-Walker equations can be expressed as follows:
Application of the Kalman Filter to Signal Enhancement
" ryy (q) º ª 1 º " ryy (q 1)»» «« c1 »» »«#» % # »« » " ryy (0) »¼ «¬cq »¼
ryy (1) ªryy (0) « r (1) ryy (0) « yy « # # « «¬ryy (q) ryy (q 1)
ªW º «0» « ». «#» « » ¬0¼
249
[6.76]
Thus, assuming that the signal observed in this frame is ergodic, we can estimate r yy ( j ) for 0 d j d q as follows: rˆyy ( j )
1 M
M + m -1
¦ y(l ) y(l j) .
[6.77]
l m j
Equation [6.76] can be rewritten as follows: rˆyy (1) ªrˆyy (0) «ˆ rˆyy (0) « ryy (1) « # # « ˆ ˆ «¬ryy (q) ryy (q 1)
rˆyy (q) º ª 1 º » " rˆyy (q 1)» «« cˆ1 »» »«#» % # »« » ˆ " ryy (0) »¼ «¬cˆq »¼ "
ªWˆ º « » «0» «#» « » «¬ 0 »¼
[6.78]
By solving this system of equations using the Levinson algorithm, we can estimate the transition matrix ĭˆ v :
ĭˆ v
ª0 «# « «0 « ¬«cˆ q
1 # 0 cˆ q -1
0º % # »» " 1» » " cˆ1 ¼» "
[6.79]
The estimation of matrix ĭ s is slightly more complicated than ĭˆ v because this matrix has to be estimated from noisy speech signal samples. Since signal s (k ) and noise b(k ) are uncorrelated, we express the autocorrelation of the observations from the autocorrelation functions of the speech signal and the noise; thus, the YuleWalker equations for signal s (k ) make it possible to estimate ĭ s . Let there be a block of N samples of y (k ) : y k s k bk with m M d k d M m N 1
[6.80]
250
Modeling, Estimation and Optimal Filtering in Signal Processing
The autocorrelation of y (k ) is defined by: r yy ( j ) = E{ y (k ) y (k - j )} = E{s (k ) s (k - j )} + E{b(k )b(k - j )}
[6.81]
= rss ( j ) rbb ( j )
Consequently, the autocorrelation functions of the signal, the noise and the noisy observations satisfy the following condition: rss ( j ) r yy ( j ) rbb ( j ) .
[6.82]
The term rbb ( j ) can be estimated using the values of y (k ) over the period of silence ( m d k d M m 1 ) and by assuming the signal to be ergodic: rˆbb ( j )
1 M
M + m -1
¦ y(l )y(l j )
[6.83]
l m j
Taking equations [6.82] and [6.83] into account, we can write that: rˆss ( j )
1 N
m + M + N -1
¦ y(l ) y(l j) - D rˆbb ( j )
[6.84]
l m M j
where the factor D is chosen such that the signal’s autocorrelation matrix stays positive definite. At that stage, the signal’s AR parameters are estimated by solving the YuleWalker equations, using the above equation for rˆss ( j ) . Thus, we have: rˆss (1) ª rˆss (0) «ˆ rˆss (0) « rss (1) « # # « ¬rˆss ( p) rˆss ( p 1)
and:
" rˆss ( p) º ª 1 º »« » " rˆss ( p 1)» « aˆ1 » »« # » % # »« » " rˆss (0) ¼ «¬aˆ p »¼
ªQˆ º « » «0» «#» « » «¬ 0 »¼
[6.85]
Application of the Kalman Filter to Signal Enhancement
ĭˆ s
ª0 « # « «0 « «¬aˆ p
1 # 0 aˆ p -1
0º % # »» " 1» » " aˆ1 »¼
251
"
[6.86]
In the chapters that follow, we will take up other cases of enhancement of signals disturbed by colored noise, notably modeled using MA processes. 6.4. Conclusion
In this chapter, we have described several methods whose aim is to enhance signals disturbed by an additive noise using Kalman filters. We have highlighted the practical difficulties in the implementation of these approaches. These difficulties are linked to the estimation of the model parameters, i.e. the AR parameters, the noise variance, the driving process variance, etc. In the chapters that follow, we will present alternative approaches which aim at alleviating the problems mentioned above. More specifically, we will cover two techniques, namely the instrumental variables and H filtering. 6.5. References [1] H. Akaike, “Markovian Representation of Stochastic Processes by Canonical Variables”, SIAM Journal of Control, vol. 13, pp. 162-173, 1975. [2] J. Benesty, S. Makino and J. Chen, Speech Enhancement, Springer, 2005. [3] H. Cai, E. Grivel and M. Najim, “A Dual Kalman Filter-Based Smoother for Speech Enhancement”, IEEE-ICASSP ’03, Hong Kong, 6-10 April 2003. [4] B. Carew, and P. R. Belanger, “Identification of Optimum Filter Steady-State Gain for Systems with Unknown Noise Covariances”, IEEE Trans. on Automatic Control. vol. AC-18, pp. 582-587, December 1973. [5] C. K. Chui and G. Chen, Kalman Filtering, with Real Time Applications, Springer Series in Information Sciences, Springer Verlag, 1991. [6] M. Deriche, “AR Parameter Estimation from Noisy Data Using the EM Algorithm”, IEEE-ICASSP ’94, Adelaide, Australia, vol. IV, pp. 69-73, 19-22 April 1994. [7] B. Friedlandler and B. Porat, “The Modified Yule Walker Method of ARMA Spectral Estimation”, IEEE Trans. on Aerospace and Electronic Systems, vol. AES-20, no. 2, pp. 158-172, March 1984. [8] M. Gabrea, E. Grivel and M. Najim, “A Single Microphone Kalman Filter-Based Noise Canceler”, IEEE Signal Processing Letters, pp. 55-57, March 1999.
252
Modeling, Estimation and Optimal Filtering in Signal Processing
[9] M. Gabrea, “Adaptive Kalman Filtering-Based Speech Enhancement Algorithm”, Canadian Conference on Electrical and Computer Engineering ’01,13-16 May 2001, vol. 1, pp. 521-526, 2001. [10] M. Gabrea, “Speech Signal Recovery in Colored Noise Using an Adaptive Kalman Filtering”, IEEE-CCECE 2002. Canadian Conference on Electrical and Computer Engineering, vol. 2, pp. 974-979, 12-15 May 2002. [11] M. Gabrea, “Robust Adaptive Kalman Filtering-Based Speech Enhancement Algorithm”, IEEE-ICASSP ’04, Montreal, Canada, vol I, pp. 301-304, 17-21 May 2004. [12] S. Gannot, D. Burchtein and E. Weinstein, “Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms”, IEEE Trans. on Speech and Audio Processing, pp. 373-385, July 1998. [13] J. D. Gibson, B. Koo and S. D. Gray, “Filtering of Colored Noise for Speech Enhancement and Coding”, IEEE Trans. on Signal Processing, vol. 39, no. 8, pp. 17321742, August 1991. [14] Z. Goh, K.-C. Tan and B. T. G. Tan, “Speech Enhancement Based on a VoicedUnvoiced Speech Model”, IEEE-ICASSP ’98, Seattle, Washington, USA, vol. no. 1, pp. 401-404, 12-15 May 1998. [15] Z. Goh, K.-C. Tan and B. T. G. Tan, “Kalman-Filtering Speech Enhancement Method Based on a Voiced-Unvoiced Speech Model”, IEEE Trans. on Speech and Audio Processing, Volume 7, no. 5, pp. 510-524, September 1999. [16] V. Grancharov, J. Samuelsson and B. Kleijn, “Improved Kalman filtering for Speech Enhancement”, IEEE-ICASSP ’05, Philadelphia, PA, USA, 18-23 March 2005. [17] V. Grancharov, J. Samuelsson and B. Kleijn, “On Causal Algorithms for Speech Enhancement”, IEEE Transactions on Speech and Audio Processing, vol. 14, no. 3, pp. 764 – 773, May 2006. [18] E. Grivel, M. Gabrea and M. Najim, “Subspace State Space Model Identification for Speech Enhancement”, IEEE-ICASSP ’99, Phoenix, Arizona, USA, vol. no. 2, pp. 781784, March 1999. [19] E. Grivel, M. Gabrea and M. Najim, “Speech Enhancement as a Realization Issue”, Signal Processing, vol. 82, no. 12, 2002, pp. 1963-1978. [20] S. Haykin, Adaptive Filter Theory, Prentice Hall information and system sciences series, Thomas Kailath, series editor. 1996 [21] S. K. Kung, “A New Low-Order Approximation Algorithm Via Singular Value Decomposition”, 12th Asilomar Conference on Circuits, Systems and Computers, pp. 705-714, 1978. [22] N. Ma, M. Bouchard and R. A. Goubran, “Perceptual Kalman Filtering for Speech Enhancement in Colored Noise”, IEEE-ICASSP ’04, Montreal, Canada, vol. I, vol. 1, pp. 717-20. 17-21 May 2004.
Application of the Kalman Filter to Signal Enhancement
253
[23] N. Ma, M. Bouchard and R. A. Goubran, “A Perceptual Kalman Filtering-Based Approach for Speech Enhancement”, 7th International Symposium on Signal Processing and its applications, vol. 1, pp. 373–376, 1-4 July 2003. [24] N. Ma, M. Bouchard and R. A. Goubran, “Frequency and Time Domain Auditory Masking Threshold Constrained Kalman Filter for Speech Enhancement”, 7th International Conference on Signal Processing ICSP ’04. 2004, vol.3, pp. 2659-2662, 31 August-4 September 2004. [25] P. S. Maybeck, Stochastic Models, Estimation, and Control, Volume 1, Academic Press, Orlando 1979. [26] R. K. Mehra, “On the Identification of Variances and Adaptive Kalman Filtering”, IEEE Trans. on Automatic Control. vol. AC-15, No. 2, pp. 175-184, April 1970. [27] R. K. Mehra, “On-Line Identification of Linear Dynamic Systems with Applications to Kalman Filtering”, IEEE Trans. on Automatic Control. vol. AC-16, no. 1, pp. 12-21, February 1971. [28] J. M. Mendel, Lessons in Estimation Theory for Signal Processing, Communications and Control, Prentice Hall, 1995. [29] A. V. Oppeinheim, E. Weinstein, K.C. Zangi, M. Feder, and D. Gauger, “Single-Sensor Active Noise Cancellation”, IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, April 1994. [30] K. K. Paliwal, and A. Basu, “A Speech Enhancement Method Based on Kalman Filtering”, IEEE-ICASSP ’87, Dallas, USA, pp. 177- 180, 1987, April 1987. [31] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signal, Englewood Cliffs, Prentice Hall, 1978. [32] Special Issue on Applications of Kalman Filtering, IEEE Trans. on Automatic Control, vol. AC-28, no. 3, 1983. [33] R. Settineri, M. Najim and D. Ottaviani, “Order Statistic Fast Kalman Filter”, IEEE-ISCAS 1996, Chicago, USA, pp. 116-119. [34] P. Stoica, “A Test for Whiteness”, IEEE Trans. on Automatic Control, vol. AC-22, pp. 992-993, December 1977. [35] P. Van Overschee and B. de Moor, “Subspace Algorithms for the Stochastic Identification Problem”, Automatica, vol. 29, no. 3, pp. 649-660, 1993. [36] P. Van Overschee and B. de Moor, “N4SID: Susbspace Algorithm for the Identification of Combined Deterministic and Stochastic Systems”, Automatica, vol. 30, no. 1, pp. 7593, 1994. [37] S. M. Verbout, “Signal Enhancement for Automatic Recognition of Noisy Speech”, RLE Technical Report, no. 584, MIT, May 1994. [38] M. Verhaegen and P. Dewilde, “Subspace Model Identification, Part I: The Output-Error State Space Model Class of Algorithms”, Int. Journal. Control, vol. no. 56, no. 5, pp. 1187-1210, 1992.
254
Modeling, Estimation and Optimal Filtering in Signal Processing
[39] M. Verhaegen, “Identification of the Deterministic Part of MIMO State Space Models given in Innovations Form from Input-Output Data”, Automatica, vol. 30, no. 1, pp. 6174, 1994. [40] C. H. You, S. N. Koh and S. Rahardja, “Kalman Filtering Speech Enhancement Incorporating Masking Properties for Mobile Communication in a Car Environment”, IEEE ICME ’04. 2004, 27-30 June 2004, vol. 2, pp. 1343-1346, 2004. [41] K. C. Zangi, “Optimal Feedback Control Formulation of the Active Noise Cancellation Problem: Pointwise and Distributed”, RLE Technical Report no. 583, MIT, May 1994.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Chapter 7
Estimation using the Instrumental Variable Technique
7.1. Introduction In this chapter, we present the instrumental variable (IV) technique, an alternative to the generalized least squares estimation methods. The IV technique provides consistent estimations from noisy observations. It is based on the use of the system’s exogenous variables, i.e. the “instrumental variables”1. Instrumental variable techniques have been used mainly in control engineering, but historically they were derived early in the 1940s in econometrics [19]. The instrumental variables are obtained by processing the input sequence, which is assumed to be known. For this purpose, we can consider a finite impulse response (FIR) filter whose coefficients can themselves be updated using a Kalman filter. Such an approach, described using a block diagram in Figure 7.1, was proposed by Young [29] for identification in control engineering.
1 It should be noted that the performance of the “instrumental” estimator depends on the choice of these instrumental variables.
256
Modeling, Estimation and Optimal Filtering in Signal Processing
Noise b(k ) Known input u (k )
System to be identified
FIR filter
Observation y (k )
Instrumental Variables
IV Estimator System parameters
FIR filter parameters
Kalman filter
Figure 7.1. Block-level description of the identification method proposed by Young [29] in the field of control engineering
However, in the field of signal processing, we usually do not have prior knowledge of the input sequence and the only data available are the noisy observations. Therefore, the above approach has to be modified. Thus, the method proposed by Friedlander, Stoica and Soderström [12] can be considered. They suggest obtaining the instrumental variables by pre-filtering the noisy observations, for instance by carrying out a Kalman filtering (see Figure 7.2). In this chapter, we will first present a review of the IV techniques currently used for AR parameter estimation. Thereafter, we will propose a new approach combining the IV techniques and the Kalman filtering. More specifically, the Kalman filter provides the instrumental variables, i.e. a filtered version of the noisy observations, which are then used for the estimation of AR parameters. However, in order to use a Kalman filter, the AR parameters must be known beforehand. This thus leads to a nonlinear estimation issue, i.e. the estimation of both the signal and its AR parameters from noisy observations. To avoid using the extended Kalman filter, two interactive Kalman filters can be considered [15]. The first of these filters uses the last available estimation of the AR parameters and gives the filtered version of the signal, whereas the second filter makes it possible to update the AR parameters by using the last available version of the filtered signal.
Estimation using the Instrumental Variable Technique
257
Figure 7.2. Block-level description of the proposed estimation method, in the field of signal processing
In the last part of this chapter, we study the relevance of this approach when the additive measurement noise is white, by carrying out a comparative study between the method developed in [15] and the existing methods [5] [7] [13] [25] [27] [30]. 7.2. Introduction to the instrumental variable technique 7.2.1. Principle Let us consider a pth-order AR process y (k ) defined in Chapters 1 and 2 as follows: y (k )
p
¦ a i y(k i) uk
[7.1]
i 1
where u (k ) is a zero-mean white Gaussian process. In the following, the AR parameter vector will be denoted T
> a1
" ap
@T .
Let us assume that only N samples of the following noisy observation are available: z k
y ( k ) b( k )
where b(k ) is a zero-mean white additive Gaussian noise.
[7.2]
258
Modeling, Estimation and Optimal Filtering in Signal Processing
The noisy observation z k respects the following regression relation, as seen in section 2.2.6.7 of Chapter 2: z (k )
p
¦ ai zk i E (k )
[7.3]
i 1
where E ( k )
u (k ) b(k )
p
¦ a i b(k i ) . i 1
The above equation can be written in matrix form as follows:
Z N (k )
where Z p (k 1)
ª Z (k 1) T º E (k ) º ª p « » « » # # « »T « » T « Z (k N ) » «¬ E (k N 1)»¼ p ¬ ¼ < N (k ) T % N (k ) .
>z (k 1)
[7.4]
" y (k p)@T .
When the observations are disturbed by an additive noise, the standard least squares approaches lead to biased estimations of the model parameters. Many studies have been carried out to counteract the effects of the bias2. Instead of using bias compensation estimation techniques3, instrumental variable methods make it possible to weaken this bias. This type of method has been historically designed as an alternative to the traditional LS technique to obtain consistent estimates of the parameters from noisy observations [26]. They consist of using the instrumental variables which are exogenous to the system and are asymptotically uncorrelated to the measurement noise. These variables are usually stored in a matrix, called the instrumental matrix and denoted M (k ) .
A series of estimators Tˆ N has a limit in probability, P lim Tˆ N , if for all real N o f
positive values of H , we have: lim Prob§¨ max Tˆ N T t H ·¸ © N ¹
N o f
0
[7.5]
2 For more details, the reader is referred to section 2.2.6.7 and Appendix E. These sections
describe the bias introduced by an additive noise to the parameter estimation. 3 For more details, the reader is referred to sections 2.2.6.8 and 4.3 where various bias
compensation techniques are presented.
Estimation using the Instrumental Variable Technique
When P lim Tˆ N N o f
259
T , the estimator is said to be consistent.
For the estimation problem defined in equations [7.1]-[7.2], the q u N matrix M (k ) is said to be instrumental if it satisfies the following conditions: – Condition 1: M (k ) §1 · P lim¨ M (k )% N (k ) ¸ N ¹
N o f©
– Condition 2:
is asymptotically uncorrelated to
% N (k ) , i.e.,
0 , for all values of k;
1 M (k )< N (k ) is invertible and its inverse has a limit in N
probability. Pre-multiplying both sides of equation [7.4] by M (k ) , we obtain the following expression:
>M (k )< N (k )@1 M (k )Z N (k )
T >M (k )< N (k )@1 M (k )% N (k )
[7.6]
The IV estimator Tˆ IV of parameters T is thus defined as follows:
Tˆ IV
>M (k )
[7.7]
The error associated with this estimation is:
Tˆ IV T
>M (k )
[7.8]
We show that Tˆ IV T converges in probability towards 0. The estimator is therefore consistent. It should be noted that if we take M (k ) < N (k ) T , estimator [7.7] corresponds to the least squares estimator. The IV method can thus be considered as a generalization of the least squares method. However, Condition 1 above is no longer respected. A careful choice of matrix M (k ) is the key factor in the implementation of an IV method [20]. In the next section, we will review various ways that have been proposed to define M (k ) .
260
Modeling, Estimation and Optimal Filtering in Signal Processing
7.2.2. Review of existing instrumental variable methods for the estimation of AR parameters
In 1967, Wong and Polak published the first article on the use of IV methods to characterize linear systems [26]. Today, these methods are used in many areas, notably in biomedical applications [3], antenna filtering to estimate the direction of arrival (DOA) [8] [21], adaptive estimation of the AR parameters [2], speech enhancement [14], estimation of hidden parameters of Markov models [23], etc. In [22], Stoica et al. analyze how the IV techniques make it possible to “improve” the estimation of the sinusoidal model’s parameters. The link between the IV techniques and the modified Yule-Walker equations4 (MYW) has been established by Friedlander [9]. In fact, if the M (k ) matrix is chosen according to the following definition: M (k )
>Z N (k p 1) Z N (k p 2) " Z N (k p N )@
[7.9]
the estimator of equation [7.7] can be viewed as the autocorrelation function of observation z k when the number of samples is infinite, i.e.
TˆVI
R zz ( p) 1 R zz ( p)
[7.10]
Equivalently:
R zz ( p)
rzz ( p 1) " rzz (1) º ª rzz ( p ) « r ( p 1) rzz ( p) rzz (2) »» « zz « # % # » » « ¬rzz (2 p 1) rzz (2 p 2) " rzz ( p )¼
R zz ( p )
ªrzz ( p 1)º » « # » « «¬ rzz (2 p ) »¼
and:
Equation [7.10] corresponds to the modified Yule-Walker equations presented in Chapter 2. They are used for instance in the function ivar in the System 4 Chapter 2, section 2.2.6.8.
Estimation using the Instrumental Variable Technique
261
Identification Matlab toolbox for the standard implementation of the IV estimates. The performance of this estimator applied to an ARMA process is detailed in [11]. In addition to the block-based approaches, several recursive IV methods have already been proposed [10], notably the recursive method based on the MYW equations [9]. Friedlander et al. have moreover suggested obtaining the instrumental variables by using a pre-filtered version of the noisy observations [12]. To do this, a Kalman filter can be used. However, as this filter requires prior knowledge of the AR parameters, this leads to the so called dual estimation problem [28]. When the state vector in the state space representation of the system is defined as the concatenation of p samples of the signal and the AR parameters, an extended Kalman filter (EKF) is needed to estimate the state vector. A nonlinear solution would consist of using an EKF, but the convergence properties of the EKF are not guaranteed due to the approximations introduced by the linearization of the problem. In the framework of control, Nelson et al. [17] have used two successive Kalman filters. The first aims at estimating the parameters and, once the convergence has been reached, the signal is retrieved by deriving a second infinite-horizon Kalman filter based on the innovation model and the steady-state Kalman gain. In extreme cases, when the signal-to-noise ratio is small, the EKF may even diverge. To avoid this problem, Chui et al. have put forward the use of an EKF whose nominal trajectory is calculated using an auxiliary linear Kalman filter [4]. In the next section, we present a dual approach, using two mutually interactive Kalman filters [15]. 7.3. Kalman filtering and the instrumental variable method
The approach proposed here is illustrated in Figure 7.3. It consists of using two mutually interactive Kalman filters [15]. Each time a new observation is available, the signal is estimated using the latest estimated value of the parameters, and conversely the parameters are estimated using the latest a posteriori signal estimate. In other words, for each time step, one Kalman filter provides the instrumental variables, i.e. the signal estimates, while the second Kalman filter makes it possible to recursively solve equation [7.7]. Both Kalman filters are all the more mutually interactive as the variance of the innovation of the first filtering is used to drive the gain of the second filtering. It should be noted that this method extends the so-called MISP (Mutually Interactive State/Parameter estimation) algorithm, initially developed by Todini et al. [24] for AutoRegressive Moving Average eXogeneous (ARMAX) model identification, and more recently investigated by Mantovan et al. in [16].
262
Modeling, Estimation and Optimal Filtering in Signal Processing
z (k )
z (k 1)
Kalman filter 1: estimation of the state vector xˆ (k 1 / k 1)
Kalman filter 1: estimation of the state vector xˆ (k / k )
Kalman filter 2: estimation of the AR parameters Tˆ(k 1 / k 1)
Kalman filter 2: estimation of the AR parameters Tˆ(k / k )
Figure 7.3. Principle of the dual interactive Kalman filter
7.3.1. Signal estimation using noisy observations
Our purpose is to estimate the signal y (k ) which is modeled by a pth order AR process. We start with the system’s state space representation given by equations [7.1]-[7.2], with the following state vector: x(k )
> y (k )
" y (k p 1)@T
[7.11]
This state vector satisfies the following relations: x(k ) ® ¯ z (k )
) (k , k 1) x(k 1) Gu (k ) H x ( k ) b( k )
[7.12]
where u (k ) is the driving process, considered to be a zero-mean white Gaussian process, with a constant variance Q V u2 . Let us first assume that the estimation of the AR parameters is available. We can then define the transition matrix ) (k , k 1) , the input vector G and the observation vector H as follows:
Estimation using the Instrumental Variable Technique
) (k , k 1)
ª aˆ1 " " aˆ p º » « 0 0 0 » « 1 « 0 % 0 # » » « 0 1 0 ¼ ¬ 0
263
[7.13]
and:
H
ª º «1 0 " 0»
» « p 1 ¬ ¼
GT
[7.14]
The first Kalman filter of Figure 7.3 provides the estimation of the state vector at instant k, given l observations. This estimation is denoted xˆ (k / l ) . In the following, we denote P (k / l ) as the covariance matrix of the error associated with this estimation. The equations for updating the filter are then given by the following expressions: xˆ (k / k 1)
) (k / k 1) xˆ (k 1 / k 1)
[7.15a]
P (k / k 1)
) (k / k 1) P(k 1 / k 1)) (k / k 1) T GQG T
[7.15b]
e( k )
[7.15c]
y (k ) H xˆ (k / k 1)
K (k )
P (k / k 1) H T HP (k / k 1) H T R
1
[7.15d]
xˆ (k / k )
xˆ (k / k 1) K (k )e(k )
[7.15e]
P(k / k )
I K (k ) H P(k / k 1)
[7.15f]
where K (k ) is the Kalman filter gain and e(k ) is the innovation process. When the filter reaches optimality, the innovation e(k ) becomes a random zeromean white process whose variance C (k ) is expressed as follows [1]:
264
Modeling, Estimation and Optimal Filtering in Signal Processing
C (k )
H T P (k / k 1) H T R
[7.16]
Finally, the estimation yˆ (k / k ) is obtained as follows: yˆ (k / k )
H xˆ (k / k )
[7.17]
To estimate the AR parameters, we propose using a second Kalman filter. 7.3.2. Estimation of AR parameters using the filtered signal
When the AR process is considered to be stationary, its AR parameters are constant over time and satisfy:
T (k ) T (k 1) .
[7.18]
To estimate these parameters using the enhanced observations, we can express the observation’s estimation yˆ (k / k ) as a function of T k as follows: yˆ (k / k )
H >) (k , k 1) xˆ (k 1 / k 1) K (k )e(k )@ xˆ (k 1 / k 1) T T (k ) HK (k )e(k ) H T k T (k ) eT (k ).
[7.19]
where H T k xˆ (k 1 / k 1) T is the observation vector. The above equation [7.19] is the key step of this dual-filter method. It points out how the instrumental variables are used to estimate the AR parameters. When the first Kalman filter is optimal, the process eT (k ) HK (k )e(k ) is a zero-mean
random white process uncorrelated to H T k . Thus, equation [7.19] gives a consistent estimation of the AR parameters. Moreover, as we observe in equation [7.19], the variance RT (k ) of process eT (k ) is defined as follows: RT (k )
HK (k )C (k ) K (k ) T H T .
[7.20]
Equations [7.18]-[7.19] thus constitute a state space representation for the Kalman filter-based estimation of AR parameters as given below:
Estimation using the Instrumental Variable Technique
T (k ) T (k 1) ® ¯ yˆ (k / k ) H T (k )T (k ) eT (k ) .
265
[7.21]
If Tˆ(k / l ) is the estimation of the AR parameters at instant k, taking into account l observations, and if PT (k / l ) is the associated covariance matrix, the equations for updating the second Kalman filter are given as follows:
Tˆ(k / k 1) Tˆ(k 1 / k 1)
[7.22a]
PT (k / k 1)
[7.22b]
K T (k )
PT (k 1 / k 1)
PT (k / k 1) H T T (k ) H T (k ) PT (k / k 1) H T T (k ) RT (k )
Tˆ(k / k ) Tˆ(k / k 1) K T (k )eT (k ) PT (k / k )
I K T (k ) H T (k ) PT (k / k 1)
[7.22c]
[7.22d] [7.22e]
The estimation Tˆ(k / k ) is then fed into the first Kalman filter which in turn estimates the AR process at time k+1. However, this method is only applicable if the variances of the two processes u (k ) and b(k ) are known. The following section presents a method for the estimation of these variances. 7.3.3. Estimation of the variances of the driving process and the observation noise
To estimate the variance of the driving process, we use the Riccati equation from equations [7.15b] and [7.15f] above, i.e.: P(k / k )
) (k , k 1) P (k 1 / k 1)) (k , k 1) T *Q(k 1)* T K (k ) HP(k / k 1)
[7.23]
266
Modeling, Estimation and Optimal Filtering in Signal Processing
As P (k / k 1) is a symmetric matrix and given equations [7.15d] and [7.16], Pk / k 1 and C (k ) are related as follows: HP(k / k 1)
C (k ) K (k ) T
[7.24]
Combining the above two equations, the covariance matrix Q of the driving process is expressed as follows: § P(k / k ) ) (k , k 1) P (k 1 / k 1)) (k , k 1) T D¨ ¨ K (k )C (k ) K (k ) T ©
Q
with D
· T ¸D ¸ ¹
[7.25]
ª T º 1 T «¬G G »¼ G being the pseudo-inverse matrix of G .
Since C (k ) is the variance of the innovation e(k ) , we can recursively estimate Q as follows: Qˆ (k ) where L(k )
k 1 ˆ 1 Q(k 1) DL(k ) D T k k
[7.26]
P(k / k ) )(k , k 1) P(k 1 / k 1))(k , k 1) T K (k )e(k ) 2 K (k ) T .
The variance of the additive noise b(k ) , denoted R, is derived from equation [7.16] as follows: Rˆ (k ) with T ( k )
1 k 1 ˆ R(k 1) T (k ) k k
[7.27]
e( k ) 2 HP ( k / k 1) H T .
7.3.4. Concluding observations
In addition to the instrumental variable method that we have detailed above, several other methods use the dual estimation of a process and its parameters. For example, Oppenheim et al. proposed the modeling of undesired noise using an AR process [18] for noise cancellation. They use a Kalman filter to estimate the noise, whereas a LMS filter makes it possible to update the AR parameters. In [7], a Kalman smoothing is used to enhance an AR process disturbed by a noise, which
Estimation using the Instrumental Variable Technique
267
can be additive, AR or impulse. The parameter estimation step is based on the Levinson algorithm or the recursive least square lattice (RLSL) algorithm. However, the authors do not explain how the enhanced signal is used to estimate the AR parameters. All these techniques are inherently IV, but the method we presented in the previous section differs from them in one important aspect: the variance of the innovation is used to define the state space representation of the AR parameters. 7.4. Case study
In this section, we propose to illustrate the performances of the method presented in section 7.3 for the estimation of AR parameters when the noise is white, Gaussian and additive. For this purpose, we will compare it with the block-based methods, the recursive bias-compensation methods. 7.4.1. Preliminary observations
To set the stage for the comparison between the new method of section 7.3 above and the existing methods, some preliminary tests are conducted. The performance of the algorithm based on the mutually interactive Kalman filters is appreciable when the SNR is greater than 0 dB. Moreover, the estimation of the observation noise’s variance R (k ) is a key factor here because this estimation determines the amount of noise to be cancelled. Finally, as is the case for most recursive methods, the timedependent behavior of the estimation T (k ) depends on the driving process and the initial conditions. We used equations [7.26] and [7.27] to estimate, respectively, the matrices Q(k ) and R(k ) . Figures 7.4 and 7.5 show the simulation results obtained for a synthetic AR process with 200 samples. The poles of this process are: p1,2
0.92 exp( r j 0.3S ) .
This process is disturbed by a zero-mean white Gaussian noise, with a SNR of 10 dB. The different methods are compared using: – the mean estimation of the AR parameters for 100 realizations of the additive noise; – the location of the estimated poles in the z-plane; – the study of the corresponding AR spectrum.
268
Modeling, Estimation and Optimal Filtering in Signal Processing
For 100 realizations of the additive noise, the mean estimated poles are: pˆ1, 2
(0.971 r 0.007) e(r j (0.293 r 0.004)S ) .
We note in Figure 7.5 that the new dual Kalman filter-based method cannot discriminate between the variances of the driving process and the additive noise. Thus, in the following, we will assume that R(k ) is estimated in the segments which do not contain the signal. Im(z)
Desired poles and spectra
1
p1 Re(z) 1
p2 Normalized frequency
Im(z) 1
Re(z)
Levinson’s algorithm
1
Normalized frequency
Im(z) 1
Proposed method
Re(z) 1
Normalized frequency
Figure 7.4. Location of the poles in the z-plane, and corresponding spectra for Levinson’s algorithm and the proposed method based on equations [7.26] and [7.27]
Estimation using the Instrumental Variable Technique
269
Figure 7.5. Convergence of the AR parameter estimation and variances Q(k) and R(k), using the new algorithm, for one realization of the signal
This experiment was conducted again, taking into account the variance R which is now known a priori. We see from Figures 7.6 and 7.7 that the AR parameters and the variance Q(k ) converge towards the desired values. However, the processing speed is low for certain combinations of parameters. This occurs, for example, when the signal’s spectrum contains sharp resonances or neighboring resonances, i.e. when the distance between the two resonances is lower than 0.1S. A large number of samples is thus required to ensure the convergence of the algorithm. Biased least square estimations of T (k ) and Q(k ) can be used as the initial values of the algorithm to speed up the convergence of the estimation.
270
Modeling, Estimation and Optimal Filtering in Signal Processing Im(z)
Desired poles and spectra
1
p1 Re(z) 1
p2 Normalized frequency Im(z) 1
Proposed method
Re(z) 1
Normalized frequency
Figure 7.6. Location of the poles in the z-plane and the corresponding AR spectra; for the proposed method using equation [7.26], variance R(k) being known a priori
Figure 7.7. Convergence of the AR parameters’ estimation and the variance Q(k) using Kalman filtering, for one realization, with variance R(k) being known a priori
Estimation using the Instrumental Variable Technique
271
7.4.2. Comparative study. Case 1: white additive noise
In this section, we review the different existing methods for the estimation of AR parameters. Eight methods including four offline methods will be considered: – Davila method [5] offline method, – Zheng method [30] offline method, – Hasan method [13] offline method, – modified Yule-Walker equations [9] offline method, – Ȗ-LMS algorithm [25] online method, – ȡ-LMS algorithm [27] online method, – Doblinger approach [7] online method, – the instrumental variable approach using dual Kalman filtering. When the number of samples is relatively high, i.e. greater than 2,000, all the above methods provide unbiased and/or consistent estimations of the AR parameters. Thus, to distinguish their performances, we will consider the borderline cases and present the results for the following two tests. The first consists of analyzing the performance of the algorithms given only a limited number of samples, i.e. several hundred, which corresponds to the real cases of coding or speech enhancement. In the second test, we consider AR processes with two sharp peaks in their spectrum. Test no. 1: limited number of samples of the noisy observations. For this test, we generate 300 samples of an AR process which is characterized by the following poles:
p1.2
0.98 exp(r j 0.1S ) , p3.4
0.97 exp(r j q0.3S ) , p5.6
0.8 exp(r j q0.7S ) .
This process is then disturbed by an additive zero-mean white Gaussian noise such that the SNR is 10 dB. Test no. 2: AR process with two closely-spaced sharp peaks. Here, we generate 512 samples of an AR process characterized by the following poles: p1.2 0.98 exp(r j 0.2S ) , p3.4 0.98 exp(r j 0.3S ) .
This process is then disturbed by an additive zero-mean white Gaussian noise such that the SNR is 15 dB.
272
Modeling, Estimation and Optimal Filtering in Signal Processing
Method Levinson
Q
a1
15.57 -0.99 (0.73) (0.006)
a2
a3
a4
a5
a6
Rejected cases5
-0.08 (0.01)
0.31 (0.01)
0.01 (0.01)
-0.23 (0.01)
0.16 (0.01)
0
1.39 (1.42) 2.62 Zheng (3.77) 26.71 Hasan (16.17) 5.66 MYW (0.83) 13.92 Ȗ-LMS (1.99) 14.71 ȡ-LMS (1.55) 15.57 Doblinger (0.73)
-1.77 (0.13) -1.20 (0.39) -0.96 (0.10) -1.64 (0.04) -0.85 (0.01) -0.79 (0.01) -1.29 (0.01)
1.31 (0.47) 0.14 (0.71) 0.27 (0.13) 0.67 (0.09) -0.005 (0.02) 0.03 (0.01) 0.26 (0.02)
-0.88 (0.79) -0.01 (0.43) 0.07 (0.09) 0.45 (0.11) 0.31 (0.02) 0.18 (0.01) 0.50 (0.04)
1.46 (0.75) 1.48 (0.87) 0.01 (0.12) -0.08 (0.10) -0.14 (0.02) -0.14 (0.01) -0.08 (0.03)
-1.81 (0.40) -2.29 (0.97) -0.14 (0.11) -0.81 (0.08) -0.32 (0.02) -0.22 (0.01) -0.55 (0.02)
0.94 (0.10) 1.20 (0.40) 0.12 (0.05) 0.65 (0.03) 0.54 (0.01) 0.49 (0.01) 0.44 (0.01)
Proposed Method
4.00 (1.13)
-1.52 (0.01)
0.43 (0.03)
0.69 (0.03)
-0.23 (0.03)
-0.73 (0.02)
0.61 (0.01)
0
Expected Value
1
-2.06
1.84
-0.98
0.80
-0.97
0.58
/
Davila
16 31 30 0 0 0 0
Table 7.1. Test 1, mean values of the estimated AR parameters. The last column presents the number of realizations unaccounted for in the averaging
5 The last column gives the number of realizations unaccounted for while calculating the mean. These are rejected because they lead to an unstable system.
Estimation using the Instrumental Variable Technique
Im(z)
p5
1
Desired poles and spectra
p3
Expected poles
Expected spectrum
p1 Re(z) 1
p2 p6
p4
Normalized frequency
Im(z)
Re(z)
Davila method [5]
Normalized frequency
Im(z)
Re(z)
Zheng method [30]
Normalized frequency
Im(z)
Re(z)
Hasan method [13]
Normalized frequency
Im(z)
MYW equations [9]
273
Re(z)
Normalized frequency
Figure 7.8. Test 1, poles and spectra estimated by different block-based methods
274
Modeling, Estimation and Optimal Filtering in Signal Processing
Im(z)
p5
1
Desired poles and spectra
p3
Expected poles
Expected spectrum
p1 Re(z) 1
p2 p6
p4
Normalized frequency
Im(z)
Re(z)
Ȗ-LMS algorithm [25]
Normalized frequency
Im(z)
Re(z)
ȡ-LMS algorithm [27]
Normalized frequency
Im(z)
Re(z)
Doblinger method [7]
Normalized frequency
Im(z)
Proposed method [15]
Re(z)
Normalized frequency
Figure 7.9. Test 1, poles and spectra estimated by different recursive methods
Estimation using the Instrumental Variable Technique
275
Considering Figures 7.8 and 7.9 and Table 7.1, we see that when the number of samples is limited, the block-based methods are prone to divergence. It should be noted that Diversi et al. [6] confirm the divergence and instability of the iterative Zheng method. The Hasan method gives rise to biased estimations of the AR parameters.
Q
a1
a2
a3
a4
IV Method under study
50.56 (1.50) 2.82 (1.95) 7.5 (7.88) 58.35 (13.64) 12.50 (1.60) 15.92 (3.31) 29.19 (6.27) 50.56 (1.50) 30.00 (2.67)
-1.05 (0.01) -2.76 (0.01) -2.45 (0.08) -0.98 (0.03) -2.72 (0.01) -0.94 (0.01) -0.84 (0.02) -1.37 (0.02) -1.47 (0.02)
0.40 (0.01) 3.83 (0.02) 3.14 (0.17) 0.31 (0.03) 3.72 (0.02) 0.21 (0.01) 0.04 (0.02) 0.82 (0.04) 0.96 (0.04)
0.37 (0.01) -2.71 (0.02) -2.07 (0.16) 0.40 (0.01) -2.61 (0.02) 0.42 (0.01) 0.55 (0.03) 0.10 (0.04) 0.02 (0.03)
-0.05 (0.01) 0.96 (0.01) 0.73 (0.05) -0.01 (0.02) -0.92 (0.01) 0.02 (0.01) -0.02 (0.02) -0.06 (0.02) -0.07 (0.01)
Expected value
1
-2.74
3.75
-2.63
0.92
Method Levinson Davila Zheng Hasan MYW Ȗ-LMS ȡ-LMS Doblinger
Table 7.2. Test 2, mean estimated values of the AR parameters
Rejected cases
0 32 45 15 0 0 0 0 0 /
276
Modeling, Estimation and Optimal Filtering in Signal Processing
The pole estimations obtained from the recursive methods lie in the unit circle in the z-plane. However, the methods based on the LMS algorithm remain in the transitory zone. The Doblinger method and the new dual Kalman filter-based method both allow the detection of the frequencies of the strongest peaks in the process’s spectrum. In the second experiment, the model’s order is known beforehand to be 4. From Figures 7.10 and 7.11 and Table 7.2, we see that the Davila method is highly unstable: 60% of the realizations are not eligible for consideration. We note moreover that not all the methods converge. In this experiment, it is difficult to choose the model’s order according to the traditional criteria6. We thus started with order p=6. The study consists of comparing the location of the poles and the shapes of the AR spectra. As Figures 7.12 and 7.13 show, the block-based methods are still unstable. The Dolbinger method and the new instrumental variable method both allow the determination of the lowfrequency peaks in the spectra. However, the shape of the spectrum is modified in the high-frequency region. The calculation cost of the dual Kalman filter is of the order of p 3 calculations per iteration. This value is comparable to the computational cost of the Doblinger method, but it remains higher than the LMS algorithm-based methods. For an AR process disturbed by a white noise, the dual Kalman filter-based method achieves the best trade-off between stability and estimation accuracy. This comes at the cost of a relatively high calculation cost.
6 For more details on these criteria, the reader is referred to the last section of Chapter 2.
Estimation using the Instrumental Variable Technique
Im(z) 1
Desired poles and spectra
p3 p1
Expected poles
Expected spectrum
Re(z) 1
p2 p4
Normalized frequency
Im(z)
Re(z)
Davila method [5]
Normalized frequency
Im(z)
Re(z)
Zheng method [30]
Normalized frequency
Im(z)
Hasan method [13]
Re(z)
Normalized frequency
Im(z)
MYW equations [9]
Re(z)
Normalized frequency
Figure 7.10. Test 2, p=4, poles and spectra estimated by different block methods
277
278
Modeling, Estimation and Optimal Filtering in Signal Processing
Im(z) 1
Desired poles and spectra
p3 p1
Expected poles
Expected spectrum
Re(z) 1
p2 p4
Normalized frequency
Im(z) 1
Re(z) 1
Ȗ-LMS algorithm [25]
Normalized frequency
Im(z) 1
ȡ-LMS algorithm [27]
Re(z) 1
Normalized frequency
Im(z) 1
Doblinger method [7]
Re(z) 1
Normalized frequency
Im(z) 1
Proposed method [15]
Re(z) 1
Normalized frequency
Figure 7.11. Test 2, p=4, poles and spectra estimated by recursive methods
Estimation using the Instrumental Variable Technique
Im(z) 1
Desired poles and spectra
p3 p1
Expected poles
Expected spectrum
Re(z) 1
p2 p4
Normalized frequency
Im(z) 1
Davila method [5]
Re(z) 1
Normalized frequency
Im(z) 1
Zheng method [30]
Re(z) 1
Normalized frequency
Im(z) 1
Re(z) 1
Hasan method [13]
Normalized frequency
Im(z) 1
MYW equations [9]
Re(z) 1
Normalized frequency
Figure 7.12. Test 2, p=6, poles and spectra estimated by block methods
279
280
Modeling, Estimation and Optimal Filtering in Signal Processing
Im(z) 1
Desired poles and spectra
p3 p1
Expected poles
Expected spectrum
Re(z) 1
p2 p4
Normalized frequency
Im(z) 1
Re(z) 1
Ȗ-LMS algorithm [25]
Normalized frequency
Im(z) 1
Re(z) 1
ȡ-LMS algorithm [27]
Normalized frequency
Im(z) 1
Re(z) 1
Doblinger method [7]
Im(z)
Normalized frequency
1
Proposed method [15]
Re(z) 1
Normalized frequency
Figure 7.13. Test 2, p=6, poles and spectra estimated by recursive methods
Estimation using the Instrumental Variable Technique
281
7.5. Conclusion
In this chapter, we have presented the instrumental variable techniques. These techniques present an alternative to the traditional least squares estimation methods. The key step in the success of these methods is the choice of the instrumental variables. We have also proposed, for the estimation of AR parameters, a new approach in which the instrumental variables are defined using a filtered version of the AR process’s noisy observations. This filtered version is obtained by Kalman filtering. The Kalman filter is optimal only if two very strong assumptions on driving process and observation noise are respected in the state space of the system. These two assumptions are the whiteness and Gaussian nature of the processes. These hypotheses are very limiting for real-life situations. In the next chapter, we will take up Hf estimation techniques which ease these limitations on the processes. 7.6. References [1] B. D. O. Anderson and J. B. Moore, Optimal Filtering, Ed. T. Kailath, Prentice Hall. Chapter 10, 1979. [2] V. Buzenac-Settineri and M. Najim, “OLRIV: A New Fast Algorithm for RectangularBlock Toeplitz Systems”, IEEE Trans. on Signal Processing, vol. 48, no. 9, pp. 25192534, September 2000. [3] M. Chan, J. Aguilar-Martin, P. Celcis and J. P. Marc Vergnes, “Instrumental Variable Techniques in Cerebral Blood Flow Estimation Using Very Few Samples”, IFAC Symp. on Identification and System Parameter Estimation, July 1985. [4] C. K. Chui, G. Chen and H. C. Chui, “Modified Extended Kalman Filtering and a RealTime Parallel Algorithm for System Parameter Identification”, IEEE Trans. on Automatic Control, vol. 35, no. 1, pp. 100-104, January 1990. [5] C. E. Davila, “A Subspace Approach to Estimation of Autoregressive Parameters from Noisy Measurements”, IEEE Trans. on Signal Processing, vol. 46, no. 2, pp. 531-534, February 1998. [6] R. Diversi, U. Soverini and R. Guidorzi, “A New Estimation Approach for AR Models in Presence of Noise”, XVIth IFAC World Congress, Prague, 2005, 3-8 July 2005. [7] G. Doblinger, “Smoothing of Noisy AR Signals Using an Adaptive Kalman Filter”, EURASIP-EUSIPCO, vol. 2, pp. 781-784, September 1998. [8] K. Dogançay, “Bias Compensation for the Bearings-Only Pseudolinear Target Track Estimator”, IEEE Trans. on Signal Processing, vol. 54, no. 1, pp. 59-68, January 2006. [9] B. Friedlander, “Instrumental Variable Methods for ARMA Spectral Estimation”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 31, no. 2, pp. 404-415, April 1983.
282
Modeling, Estimation and Optimal Filtering in Signal Processing
[10] B. Friedlander, “The Overdetermined Recursive Instrumental Variable Method”, IEEE Trans. on Automatic Control,” vol. AC-29, no. 4, pp. 353-356, April 1984. [11] B. Friedlander and K. C. Sharman, “Performance Evaluation of the Modified YuleWalker Estimator”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP33, no. 3, pp. 719-725, June 1985. [12] B. Friedlander, P. Stoica and T. Södertröm, “Instrumental Variable Methods for ARMA Parameter Estimation”, 7th IFAC/IFORS Symposium on Identification and System Parameter Estimation, York, England, pp. 29-36, July 1985. [13] M. K. Hasan, J. Hossain and A. Haque, “Parameter Estimation of Multichannel Autoregressive Processes in Noise”, Signal Processing, vol. 83, no. 3, pp. 603-610, January 2003. [14] D. Labarre, E. Grivel, M. Najim and E. Todini, “Two-Kalman Filters Based Instrumental Variable Techniques for Speech Enhancement”, IEEE-MMSP, Sienna, Italy, 29 September–1 October 2004. [15] D. Labarre, E. Grivel, M. Najim and E. Todini, “Consistent Estimation of Autoregressive Parameters from Noisy Observations based on Two Interacting Kalman Filters”, Signal Processing, vol. 86, no. 10, pp. 2863-2876, October 2006. [16] P. Mantovan, A. Pastore and S. Tonellato, “Recursive Estimation of System Parameter in Environmental Time Series”, M. Vichi and O. Optiz (Eds), Classification and Data Analysis, Springer, pp. 311-318, 1999. [17] L. W. Nelson and E. Stear, “The Simultaneous On-Line Estimation of Parameters and States in Linear Systems”, IEEE Trans. on Automatic Control, vol. 21, no. 2, pp. 94-98, February, 1976. [18] A. V. Oppenheim, E. Weinstein, K. C. Zangi, M. Feder and D. Gauger, “Single-Sensor Active Noise Cancellation”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 2, no. 2, pp. 285-290, April 1994. [19] O. Riersol, “Confluence Analysis by Means of Lag Moments and Other Methods of Confluence Analysis”, Econometrica, vol. 9, no. 1, pp. 1-23, January 1941. [20] T. Södertröm and P. Stoica, “Comparison of some Instrumental Variable Methods – Consistency and Accuracy Aspects”, Automatica, vol. 17, pp. 101-115, January 1981. [21] P. Stoica, M. Cedervall and T. Södertröm, “Adaptive Instrumental Variable Method for Robust Direction-of-Arrival Estimation”, IEE Radar Sonar and Navigation, vol. 14, no. 2, pp. 45-53, April 1995. [22] P. Stoica, B. Friedlander and T. Söderström, “On Instrumental Variable Estimation of Sinusoid Frequencies and the Parsimony Principle”, IEEE Trans. on Automatic Control, vol. AC-31, no. 8, pp. 793-795, August 1986. [23] J. S. Thorne and J. B. Moore, “An Instrumental Variable Approach for Identification of Hidden Markov Models”, 5th International Symposium on Signal Processing and its Applications (ISSPA ’99), Brisbane, Australia, vol. 1, pp. 103-106, 22-25 August 1999.
Estimation using the Instrumental Variable Technique
283
[24] E. Todini, “Mutually Interactive State/Parameter Estimation (MISP) – Application of Kalman Filter to Hydrology, Hydraulics and Water Resources”, AGU Chapman Conf., University of Pittsburg, May 1978. [25] J. R. Treichler, “Transient and Convergent Behavior of the Adaptive Line Enhancer”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, no. 1, pp. 53-62, February 1979. [26] K. Y. Wong and E. Polak, “Identification of Linear Discrete Time Systems Using the Instrumental Variable Method”, IEEE Trans. on Automatic Control, vol. 12, no. 6, pp. 707-718, December 1967. [27] W.-R. Wu and P.-C. Chen, “Adaptive AR Modeling in White Gaussian Noise”, IEEE Trans. on Signal Processing, vol. 45, no. 5, pp. 1184-1191, May 1997. [28] T. Yoshimura, K. Konishi and T. Soeda, “An Extended Kalman Filter for Linear Discrete Time Systems with Unknown Parameters”, Automatica, vol. 17, no. 4, pp. 657-660, July 1981. [29] P.-C. Young, “An Instrumental Variable Method for Real-Time Indentification of Noisy Process”, Automatica, vol. 6, no. 2, pp. 271-287, March 1970. [30] W. X. Zheng, “Autoregressive Parameter Estimation from Noisy Data”, IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 1, pp. 71-75, January 2000.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Chapter 8
H Estimation: an Alternative to Kalman Filtering?
8.1. Introduction In the previous chapter, parametric approaches have proved to be powerful tools for the resolution of many problems in signal processing. Nevertheless, we must guard against over-estimating their usefulness and take proper account of their limitations. Any given model is at best an approximation of the real world and modeling uncertainties always exist. The challenge in this approximation is twofold: choosing the most appropriate representation of the signal, and taking into account the properties of the noise which often disturbs the observations. It should be noted that this noise is itself modeled, leading to additional model uncertainties. Further errors are introduced during the estimation of the model parameters. This estimation heavily depends on strong statistical assumptions. In Kalman filtering, for example, the maximum likelihood estimation of the state vector is obtained if and only if the driving process and the observation noise are both white, Gaussian and uncorrelated. Moreover, the classical algorithms give biased or non-consistent estimations when the observations are disturbed by an additive measurement noise. For further details on this matter; see section 2.2.6.7. In this chapter, we analyze the relevance of the H-based approaches in signal processing. The major advantage of these approaches is that the assumptions required for their implementation are less restrictive than those needed for the Kalman filter.
286
Modeling, Estimation and Optimal Filtering in Signal Processing
In the first section of this chapter, we will introduce the reader to the subject of estimation based on the minimization of the H norm. For more details on this topic, see B. Hassibi, A. Sayed and T. Kailath [15]. The second part of the chapter is dedicated to the estimation of AR parameters using a H filter to retrieve a signal from noisy observations. More particularly, we propose a method based on the combination of two H algorithms. This method allows for the consistent estimation of AR parameters. Finally, we study the use of H estimation techniques in speech enhancement. 8.2. Introduction to H estimation The H norm was introduced by Zames [39] in 1981 in the context of control engineering. This norm provides an appropriate framework for the optimal management of control aims, which are often contradictory to one another. The H filtering can be thought of as a specific case where the only constraints are those placed on the performance. All H estimation techniques aim at reducing the estimation error by accounting for the noises and the modeling uncertainties. To do this, these techniques consider the worst-case scenario1. As stated by Hassibi et al. in [15], H estimation is more robust to uncertainties in the system representation than Kalman filtering-based estimation. No prior assumption is required here, either for the driving process or for the observation noise. Thus, H filtering is similar to the “bounded error” estimation techniques [7]. We will see in the following sections that the H filtering theory can be derived using operator theory, system theory and game theory [15]. This multi-pronged approach to understanding the H theory is best described in the book published, in 1999, by Hassibi, Sayed and Kailath [16]. In this work, the authors use the following quotation from Kimura [17]: It is remarkable that H control allows such a multitude of approaches. It looks entirely different from different viewpoints. This fact certainly implies that H control is quite rich in logical structure and is versatile as an engineering tool. However, the original question of what is the theoretical core of H control remains unanswered. Indeed every fundamental notion mentioned has a method for solving the H control problem associated with it. Unfortunately, however, lengthy chains of reasoning and highly technical manipulations are their common characteristic features.
1 I.e., the case of the worst possible disturbances.
H Estimation
287
We will first define the H norm and then consider H-based filtering. Special attention will be paid to the recursive H filtering based on the Riccati equation. 8.2.1. Definition of the H norm
Input sequence u
T
Output sequence y
Figure 8.1. Transfer operator T
In Figure 8.1, the transfer operator T maps the input sequence u into the output sequence y. Its H norm is defined as follows: T
f
sup u z0
y u
2
[8.1]
2
where u 2 , the l2-norm2 of the causal sequence u(k), can be expressed as follows: u
2 2
¦ u 2 k .
[8.2]
k t0
Based on the above definition, the H norm can be viewed as the maximum energy gain from the input u to the output y. 8.2.2. H filtering Let us consider a signal s (k ) , which is estimated using observations y (k ) disturbed by an additive measurement noise v(k ) . To estimate s (k ) , we assume that it is the output of a system excited with the driving process u (k ) . This scenario
2 The l p norm of a vector combining N successive samples of the input sequence u is defined
by u
N 1 p
¦ uk
k 0
1/ p p
. Moreover, u
f
max u k .
0d k d N 1
288
Modeling, Estimation and Optimal Filtering in Signal Processing
is depicted in Figure 8.2 and can be defined by the following state space representation: x(k ) )(k , k 1) x(k 1) Gu (k ) ° ® y (k ) H x(k ) v(k ) °s (k ) L x(k ) ¯
[8.3]
Here, L is a vector which connects the state vector to the physical quantity that needs to be estimated. In our case, this quantity is the signal3 s (k ) . If sˆ(k ) is an estimation of s (k ) , the estimation error is defined as follows: e( k )
sˆ(k ) s (k )
[8.4]
-
Figure 8.2. Basic schematic representation for H estimation
The H techniques thus aim at minimizing the H norm of the system shown in Figure 8.3 and defined as follows: N 1
¦e(k) 2
Jf
sup u(k ),b(k ),e(0)
1
Q
N 1
¦u(k)
k 0
2
1
R
N 1
k 0
v(k) x(0) xˆ(0) T 301x(0) xˆ(0)
¦
[8.5]
2
k 0
The operation sup . denotes the upper limit and N is the number of available samples. Q and R are two positive weighting scalars to be tuned, in order to adjust the filter behavior. These weighting factors respectively play the same role as the 3 Generally, this quantity is a vector. The matrix L then links the state vector to the vector we wish to estimate.
H Estimation
289
variances of the driving process and the observation noise when using Kalman T filtering [30]. Moreover, the term x(0) xˆ (0) 3 01 x(0) xˆ (0) makes it possible to account for the uncertainty in the initial value of the estimation error of the state vector.
Figure 8.3. Transfer operator between the “disturbances” and the estimation error
As stated by Shaked [27] and Hassibi [15], it is often impossible to directly minimize criterion [8.5]. When it is possible, a closed-form solution of the above H filtering issue does not always exist. Thus, a “suboptimal” solution5 is often sought. This solution consists of making criterion J f smaller than the following upper limit: Jf J 2
[8.6]
Factor J of the above limit is called the “disturbance attenuation level” [2] [18] [28] [40] or “H estimation level” [27]. The choice of J will be further discussed in section 8.2.3. In the context of this chapter, we focus our attention on condition [8.6] which gives the solution of a Riccati-type quadratic equation [5] [11] [23]. The structure of the filter thus obtained is similar to the Kalman filter, thereby easing the comparisons between the Kalman and H filters. Considering equation of the state space representation [8.3], we note that the implementation of an H2-type approach is based on the assumption that the measurement noise, the driving process and the state vector are all random. Their variances are denoted Q and R respectively. If these processes are random, the linear estimator of s (k ) using the noisy observations y (0),..., y (k ) is based on the
5 Condition [8.6] is more a problem of feasiblilty than one of optimization, as stated by Vikalo et al. [34] [35]. There are several solutions. Additional H2 can also be imposed, leading to hybrid H2/ H approaches.
290
Modeling, Estimation and Optimal Filtering in Signal Processing
minimization of the mean of the error energy e(k ) . The resolution of this problem leads to the Kalman filter presented in Chapter 5. In the H approach, the additive noise, the driving process and the initial state vector need not be considered random. No statistical assumptions have thus to be made on the driving process and the observation noise. They are just assumed to have finite energy. From equation [8.5] and the representation of the H norm, the purpose of these methods is to minimize the worst possible effects of the disturbances on the estimation error. As mentioned by Grimble et al. in [14], the H optimal estimation problem consists of designing an estimator that minimizes the peak error power in the frequency domain whereas Kalman filtering aims at minimizing the average error power. The above observations justify the name “disturbance attenuation level” given to factor J . The estimation error’s energy must therefore be smaller than an upper limit which depends on a term with the following structure:
¦ > Q 1 u(k ) 2 R 1 v(k ) 2 @ x(0) xˆ (0) T 3 01 x(0) xˆ (0)
N 1 k 0
The H estimation problem can be formulated from a game theory viewpoint. In fact, given equation [8.6], let the signal processing engineer be the first player, whose task is to minimize the error on the signal estimation. His rival, the “disturbances”, aim at increasing the estimation error. The search for the estimator is thus equivalent to the resolution of the following minmax problem:
¦>
@
ª ° N1 º ½°º ªN1 2 T min« max ® s(k) sˆk J 2 « Q1 u(k) 2 R1v(k) 2 x(0) xˆ(0) 301x(0) xˆ(0) » ¾» sˆk «x0 , uk , vk ° ¬«k 0 ¼» °¿»¼ ¯k 0 ¬ [8.7]
¦
where J 2 is a scalar quantity which defines the influence of disturbances u(k ) , v(k ) and x(0) xˆ (0) 3 01 x(0) xˆ (0) . The solution of the above equation is similar to that obtained by solving a Riccati equation. It should be noted that the works presented by Yaesh in [38], Theodor in [32], Cai in [2], Zhuang in [40], etc. all concern this interpretation. T
According to Li et al. [21], the H filtering can be viewed from different angles. We will not take up frequency-domain analysis, such as the historical approaches of Grimble et al., [12] [13], the interpolation methods [8] or the J-spectral factorization [3] here because they lie outside the scope of this book.
H Estimation
291
There are other approaches for the resolution of equation [8.6]. These are generally based on the resolution of a constrained optimization problem. Thus, in control engineering, criterion [8.6] is reformulated to enable resolution using linear matrix inequality (LMI) techniques. The LMI, sometimes thought of as a stability condition, can be solved using an optimization procedure. This procedure is available to the engineer in the LMI and optimization toolboxes of Matlab. The above methods require in-depth knowledge beyond the scope of this book, and will thus not be developed further. The interested reader is referred to [10] [21] and [24] for further details. 8.2.3. Riccati equation-based recursive solution of H filtering For a given disturbance attenuation level, the estimation sˆ(k ) which respects condition [8.6] can be obtained provided that: P 1 (k 1 / k ) H T H J 2 LT L ! 0
[8.8]
where the matrix P (k 1 / k ) satisfies a Riccati recursion, which can be split into the two following relations: P(k 1 / k )
P(k / k )
) (k 1, k ) P (k / k )) (k 1, k ) T GQG T
>
P(k / k 1) P(k / k 1) H T
>
0 º ªH º ªR T « 0 J 2 » « L » P(k / k 1) H ¬ ¼ ¬ ¼
with M
@
ªH º LT M 1 « » P(k / k 1) ¬L¼
[8.9]
[8.10]
@
LT .
Here, as opposed to the Kalman filter case, matrix P(k 1 / k ) does not denote the covariance of the estimation error. According to Yaesh et al. [38], matrix P (k 1 / k ) corresponds to an upper bound of the Kalman filter error covariance matrix:
^
E x(k ) xˆ (k 1 / k ) x(k ) xˆ (k 1 / k )
T
` d P(k 1 / k )
[8.11]
292
Modeling, Estimation and Optimal Filtering in Signal Processing
Given the similarities between the Kalman and the H filters, this matrix P ( k 1 / k ) corresponds to the covariance matrix of the error in the “worst-case scenario”.
P ( k / k 1)
a priori calculation
K(k)
accounting for the measurement
xˆ ( k / k )
a posteriori calculation
P(k / k )
k=k+1
Figure 8.4. Structure of the Hf filter
If condition [8.8] holds, the a posteriori estimations of the state vector x(k ) and of y (k ) are updated as follows: yˆ (k / k )
L xˆ (k / k )
[8.12]
xˆ (k / k )
xˆ (k / k 1) K (k ) e(k )
[8.13]
xˆ (k / k 1)
where: K (k )
)(k , k 1) xˆ (k 1 / k 1)
P (k / k 1) H R H P (k / k 1) H T
is the filter gain and e( k )
y (k ) H xˆ (k / k 1)
is the innovation.
[8.14]
1
H Estimation
Model equation
x ( k 1)
Observation equation
y(k )
H (k ) x(k ) v(k )
Linear combination of x (k )
s( k )
L x( k )
Parameters to be adjusted
Choice of weighting scalars Q and R Determination of the disturbance attenuation level J
Equation for updating the estimated state vector
xˆ ( k / k 1)
Gain
Riccati recursion
Initial conditions
293
ĭ ( k 1, k ) x ( k ) G ( k )u( k )
ĭ ( k , k 1) xˆ ( k 1 / k 1)
xˆ(k / k) xˆ(k / k 1) K(k)> y(k) H(k)xˆ(k / k 1)@ K (k )
>
P ( k / k 1) H T ( k ) H ( k ) P ( k / k 1) H T ( k ) R ( k )
P(k 1 / k )
) ( k 1, k ) P ( k / k )) ( k 1, k )T GQG T
>
@
ªH º LT M 1 « » P(k ¬L¼
P(k / k )
P(k / k 1) P (k / k 1) H T
with M
0 º ªH º ªR T « 0 J 2 » « L » P( k / k 1) H ¬ ¼ ¬ ¼
>
LT
@
xˆ (0 / 0) and P (0 / 0) Table 8.1. Equations of the Hf filter
The following observations can be made on the equations of the H filter in Table 8.1: – the H has a structure similar to the Kalman filter. More specifically, both filters are defined by Riccati recursive equations. For the H filter, equation [8.10] contains the additional term for the disturbance attenuation level J . Moreover, when J tends towards infinity, the expressions of the H filter correspond to the Kalman filter [15]; – the Riccati equation [8.9-8.10] does not always have a positive definite solution, considering the very definition of matrix M and factor J contained in this
294
Modeling, Estimation and Optimal Filtering in Signal Processing
matrix. Unlike the Kalman filter, the existence of the H filter is subject to condition [8.8]; – the choice of L is crucial here because of the role it plays in the Riccati equation. As we saw in the chapter on Kalman filtering, the estimation of a linear combination of the state vector elements is a linear combination of the estimations of the state variables. Moreover, the optimal Kalman gain minimizes the linear combinations of the diagonal elements of the error’s covariance matrix. For H filtering, we lay most importance on the predefined linear combination L x(k ) of the state variables; – the value of J is of prime importance because it defines the H norm of the system being studied. In this case, this is the same as guaranteeing the following condition: P 1 (k 1 / k ) H T H J 2 LT L ! 0
Thus, while applying the H filter to the estimation of autoregressive parameters, we need to estimate the vector of the AR parameters. L thus corresponds to identity matrix I. At time instance k, the condition for the existence of the H filter is [8.8]: P 1 ( k 1 / k ) H T H J 2 I ! 0
[8.15]
According to the above equation, the eigenvalues of the J 2 I matrix, i.e. J 2 , should be smaller than the eigenvalues of the matrix P (k / k 1) 1 H T H , i.e., J 2
should be greater than the eigenvalues of P (k / k 1) 1 H T H following inequality:
J 2 ! max§¨ eig P (k / k 1) 1 H T H ©
where max eig M
1 ·
1
, leading to the
[8.16]
¸ ¹
is the largest eigenvalue of the M matrix.
Starting from this, the authors of [30] propose a recursive control of parameter J using the following updating equation:
J (k ) D max§¨ eig P(k / k 1) 1 H T H ©
where D is greater than 1.
1 ·
¸ ¹
[8.17]
H Estimation
295
8.2.4. Review of the use of H filtering in signal processing
While the H theory has been widely applied in the field of control [36] [37], its use in signal processing is subject to more and more attention. Over the past few years, the signal processing community has shown an increasing interest. To the best of our knowledge, H norm-based techniques have so far been used in areas such as adaptive noise cancellation [25] [26], filter bank design [33], equalization [6] [16] [40], parallel symbol-channel estimation [2] [18], and implementation of multi-user detectors for communication systems with inter-user interferences [34]. The authors usually justify the choice of the H techniques by the fact that they are well-adapted to situations where the statistical characteristics of the disturbances are either unknown or difficult to model and analyze. The authors also highlight the fact that the H techniques give results which are quite close to the results obtained using H2 techniques6. Several authors, notably Shen et al. [28] [29] [30] and Shimizu et al. [31], propose the use of H filtering for cases where the signal is modeled by an AR process disturbed by an additive noise. The approaches taken by these authors differ from one another in the way the AR parameters are estimated and in the way the weight is controlled. Thus, Shimizu et al. [31] take the iterative approach and propose the use of a total least squares method based on the Gao algorithm [9] for the derivation of the AR parameters from the noisy observations. On the other hand, Shen et al. [2] [30] propose a dual filtering using two H filters connected in series. The first of these two filters gives the estimation of the AR parameters’ vector Tˆ , while the second estimated the AR process using the available noisy observations y (k ) .
[Figure 8.5 shows the series structure: the noisy observations $\{y(k)\}_{k=1\ldots N}$ feed a first H∞ filter, which provides $\hat{\theta}$; this estimate drives a second H∞ filter, which outputs $\{\hat{s}(k)\}_{k=1\ldots N}$.]

Figure 8.5. The Shen et al. method [2] [30]. N is the number of available samples
The Shen approach, shown in Figure 8.5, has been applied in areas such as signal enhancement [30] and wireless communications, for the simultaneous estimation of the transmission channel and the transmitted data [2].

6. In the statistical sense of the term.
In the following section, we will consider several cases which will show that this cascade of two H∞ filters is not necessarily the most relevant solution, and that an alternative approach consists of using a parallel combination of two H∞ filters.

8.3. Estimation of AR parameters using H∞ filtering
In this section, we first test the behavior of H∞ filtering applied to noisy observations for the estimation of autoregressive parameters. This test will also highlight the limitations of this type of filtering. Thereafter, we propose an alternative method consisting of the combination of two mutually interactive H∞ estimators7.

8.3.1. H∞ filtering for the estimation of AR parameters
Among the research efforts towards the use of H∞ filtering for parameter estimation, the work of Grimble et al., described in [14], deals with the identification of a noiseless ARMAX8 process using H∞ filtering. The results obtained by the authors are quite close to those obtained using the Kalman filter.

Let $s(k)$ be a pth-order AR process defined as follows:

$s(k) = -\sum_{i=1}^{p} a_i \, s(k-i) + u(k)$   [8.18]

where $\theta = [a_1 \; \cdots \; a_p]^T$ is the vector of the AR parameters, and $u(k)$ the driving process. Only the following observation of the noisy process is available:

$y(k) = s(k) + b(k)$   [8.19]
The state space representation of the system of equations [8.18]-[8.19] is thus defined as follows:
7. Dual H∞ Algorithms for Signal Processing – Application to Speech Enhancement, D. Labarre, E. Grivel, M. Najim and N. Christov, IEEE Trans. on Signal Processing, vol. 55, no. 11, pp. 5195-5208.
8. For a definition of the ARMAX model, see Chapter 1.
$\begin{cases} \theta(k) = \theta(k-1) \\ y(k) = -Y_p^T(k-1)\,\theta(k) + E(k) \\ \theta(k) = L\,\theta(k) \end{cases}$   [8.20]
where L corresponds to the p × p identity matrix. This representation is sometimes said to be “degenerated” because the driving process is zero. The elements $Y_p(k-1)$ and $E(k)$ are respectively defined as follows:

$Y_p(k-1) = [y(k-1) \; \cdots \; y(k-p)]^T$   [8.21]

and:

$E(k) = u(k) + b(k) + \sum_{i=1}^{p} a_i \, b(k-i)$   [8.22]
An H∞ filter based on representation [8.20] allows us to obtain a recursive estimation of the state vector $\theta(k)$. The process $E(k)$ is colored and no prior information is available on it. Due to this lack of information, the H∞ filter seems more appropriate than the Kalman filter for the estimation of the AR parameters from noisy observations. To analyze the behavior of the H∞ filter, we conduct the following two tests on synthetic AR signals [19].

Experiment 1: a large number of samples are available
We generate 2,000 samples of an AR process characterized by the following poles in the z-plane: $p_{1,2} = 0.75\exp(\pm j 0.2\pi)$, $p_{3,4} = 0.8\exp(\pm j 0.4\pi)$ and $p_{5,6} = 0.85\exp(\pm j 0.7\pi)$.

The driving process $u(k)$ is zero-mean, white and Gaussian, with a variance $\sigma_u^2 = 1$. The AR process is then disturbed by a zero-mean white Gaussian noise sequence. The resulting signal-to-noise ratio is equal to 10 dB.
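As a side illustration (not part of the original text), the “expected value” row of Table 8.2 and one realization of the noisy data can be reproduced with the sketch below; the use of scipy.signal.lfilter and the variable names are our own assumptions.

```python
import numpy as np
from scipy.signal import lfilter

# poles of Experiment 1 in the z-plane
poles = [r * np.exp(s * 1j * t * np.pi)
         for r, t in [(0.75, 0.2), (0.80, 0.4), (0.85, 0.7)]
         for s in (+1, -1)]

# A(z) = prod_i (1 - p_i z^-1): np.poly returns [1, a_1, ..., a_6]
a = np.real(np.poly(poles))
print(np.round(a[1:], 2))            # approx. [-0.71, 0.82, -0.49, 0.61, -0.40, 0.26]

# one realization: s(k) = -sum_i a_i s(k-i) + u(k), then additive noise at 10 dB SNR
N = 2000
u = np.random.randn(N)               # sigma_u^2 = 1
s = lfilter([1.0], a, u)             # AR filtering 1/A(z)
sigma_b = np.sqrt(np.var(s) / 10 ** (10 / 10))
y = s + sigma_b * np.random.randn(N)
```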
Experiment 2: only a small number of samples are available.
For this experiment, we generate 512 samples of an AR process characterized by the following poles: $p_{1,2} = 0.98\exp(\pm j 0.1\pi)$, $p_{3,4} = 0.97\exp(\pm j 0.3\pi)$ and $p_{5,6} = 0.8\exp(\pm j 0.7\pi)$.

The driving process $u(k)$ is again zero-mean, white and Gaussian, with a variance $\sigma_u^2 = 1$. As for Experiment 1, the AR process is then disturbed by a zero-mean, white Gaussian noise sequence. The resulting signal-to-noise ratio is equal to 10 dB.
| Method | $\sigma_u^2$ | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ |
|---|---|---|---|---|---|---|---|
| Kalman filter | 1.42 (0.02) | -0.51 (0.01) | 0.54 (0.02) | -0.17 (0.02) | 0.33 (0.02) | -0.20 (0.02) | 0.14 (0.01) |
| H∞ filter | 1.28 (0.02) | -0.52 (0.01) | 0.55 (0.02) | -0.23 (0.02) | 0.33 (0.02) | -0.17 (0.02) | 0.11 (0.01) |
| Expected value | 1 | -0.71 | 0.82 | -0.49 | 0.61 | -0.40 | 0.26 |

Table 8.2. Experiment 1, mean values and variations of the estimated AR parameters based on 100 realizations of the additive observation noise
Tables 8.2 and 8.3 show that the estimations of the AR parameters are biased in both cases, and that the results obtained are approximately the same for the Kalman and H∞ filters. This similarity can be attributed to the fact that the observation noise $E(k)$ in the state space representation depends on the parameters that we seek to estimate.
| Method | $\sigma_u^2$ | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ |
|---|---|---|---|---|---|---|---|
| Kalman filter | 14.52 (±0.64) | -0.99 (±0.03) | 0.07 (±0.05) | 0.30 (±0.07) | 0.01 (±0.06) | -0.22 (±0.06) | 0.17 (±0.04) |
| H∞ filter | 14.69 (±0.64) | -1.01 (±0.03) | 0.08 (±0.05) | 0.31 (±0.07) | -0.00 (±0.06) | -0.23 (±0.06) | 0.17 (±0.04) |
| Expected value | 1 | -2.06 | 1.84 | -0.98 | 0.80 | -0.97 | 0.58 |

Table 8.3. Experiment 2, mean values and variations of the estimated AR parameters based on 100 realizations of the additive observation noise
The following solution, proposed by Labarre et al. and based on two H∞ filters working in parallel, alleviates the bias problem:
– the first H∞ filter enhances the noisy observations by using the last available estimation of the AR parameters;
– the second filter updates the estimation of the AR parameters, by using the last available estimation of the AR process and the innovation process, both taken from the first estimator.

8.3.2. Dual H∞ estimation of the AR process and its parameters
Let us consider the basic block-level schematic given in Figure 8.6. To estimate the AR process, we first construct the state vector as follows:
$x(k) = [s(k) \; \cdots \; s(k-p+1)]^T$   [8.23]
The state space representation of the system of equations [8.18]-[8.19] is thus defined by the following set of equations:

$\begin{cases} x(k) = \Phi(k,k-1)\,x(k-1) + G\,u(k) \\ y(k) = H\,x(k) + b(k) \\ s(k) = L\,x(k) \end{cases}$   [8.24]

where:

$\Phi(k,k-1) = \begin{bmatrix} -a_1 & \cdots & \cdots & -a_p \\ 1 & 0 & 0 & 0 \\ 0 & \ddots & 0 & \vdots \\ 0 & 0 & 1 & 0 \end{bmatrix}$

and:

$H = L = G^T = [1 \; \underbrace{0 \; \cdots \; 0}_{p-1}]$
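For readers who want to experiment with this companion form, the short sketch below builds the matrices of [8.24] from a parameter vector; the helper name is ours and purely illustrative.

```python
import numpy as np

def ar_state_space(theta):
    """Companion-form matrices of representation [8.24] for theta = [a_1, ..., a_p]."""
    p = len(theta)
    Phi = np.zeros((p, p))
    Phi[0, :] = -np.asarray(theta, dtype=float)   # first row: -a_1 ... -a_p
    Phi[1:, :-1] = np.eye(p - 1)                  # shifted identity below
    G = np.zeros((p, 1)); G[0, 0] = 1.0
    H = G.T                                       # H = L = G^T = [1 0 ... 0]
    return Phi, G, H
```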
Given the attenuation level $\gamma$, an H∞ filter allows us to obtain a filtered estimation of the signal:

$\hat{s}(k/k) = L\,\hat{x}(k/k) = H\,\hat{x}(k/k)$   [8.25]

The AR parameters, however, are not known and have to be estimated. Let us assume them to be constant over an analysis frame, i.e.:

$\theta(k) = \theta(k-1)$   [8.26]

The filtered version of the signal can be expressed in terms of the AR parameters as follows:

$\hat{s}(k/k) = -\hat{x}^T(k-1/k-1)\,\theta(k) + H\,K(k)\,e(k) = H_\theta(k)\,\theta(k) + e_\theta(k)$   [8.27]

where $H_\theta(k) = -\hat{x}^T(k-1/k-1)$ and $e_\theta(k) = H\,K(k)\,e(k)$.
Taking equations [8.26] and [8.27] into account, the state space representation of the AR parameters is written as follows:

$\begin{cases} \theta(k) = \theta(k-1) \\ \hat{s}(k/k) = H_\theta(k)\,\theta(k) + e_\theta(k) \\ \theta(k) = L_\theta\,\theta(k) \end{cases}$   [8.28]

where $L_\theta$ is the p × p identity matrix. If we assume the attenuation level $\gamma_\theta$ of the second H∞ filter to be known, this second filtering operation makes it possible to update the estimation of the AR parameters. A new weight, $R_\theta$, is now introduced.
[Figure 8.6 depicts the dual structure over successive system states: at each time instant, a first H∞ filter processes the new observation $y(k)$ (with weights $Q_k$ and $R_k$) to produce $\hat{x}(k/k)$, and a second H∞ filter (with weight $R_k^\theta$) uses this result to update $\hat{\theta}(k/k)$; the two filters exchange their latest estimates from one time step to the next.]

Figure 8.6. Block-level description of the dual H∞-based estimation
Tuning the weights $R$, $Q$ and $R_\theta$ is a delicate process. In [30], Shen et al. use an EM approach, but do not detail how it is implemented. We therefore propose an alternative. We saw in section 8.2.2 above that these weighting terms play the role of the driving process and observation noise variances in the Kalman filter. Moreover, recalling the similarity between the structures of the Kalman and H∞ filters and the interpretation of the $P(k+1/k)$ matrix, the weighting matrices can be adjusted using a method analogous to the dual approach presented in section 7.3. First, let us assume that the characteristics of the additive observation noise are almost time-invariant. The $R$ matrix can thus be determined or updated in the frames where there is no signal.
To update the weight Q, we propose the following method:

$Q(k) = \frac{k-1}{k}\,Q(k-1) + \frac{1}{k}\,D\,M(k)\,D^T$   [8.29]

where:

$M(k) = K(k)\,e^2(k)\,K^T(k) + P(k/k) - \Phi(k,k-1)\,P(k-1/k-1)\,\Phi^T(k,k-1)$

and D is a constant selection matrix.

In addition, $R_\theta(k)$ is defined as:

$R_\theta(k) = H\,K(k)\,C(k)\,K^T(k)\,H^T$   [8.30]

with $C(k) = H\,P(k/k-1)\,H^T + R$.
In the rest of this section, we test the new approach by applying it to noisy synthetic AR data. We then compare the new method to the dual Kalman filtering presented in section 7.3. The tests conducted are the same as the two described in section 8.3.1 above.
| Method | $\sigma_u^2$ | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ |
|---|---|---|---|---|---|---|---|
| Dual Kalman filtering | 1.03 (0.04) | -0.75 (0.005) | 0.86 (0.01) | -0.55 (0.01) | 0.64 (0.01) | -0.41 (0.01) | 0.26 (0.003) |
| Dual H∞ filtering | 1.03 (0.04) | -0.75 (0.005) | 0.86 (0.01) | -0.55 (0.01) | 0.63 (0.01) | -0.41 (0.01) | 0.26 (0.003) |
| Expected value | 1 | -0.71 | 0.82 | -0.49 | 0.61 | -0.40 | 0.26 |

Table 8.4. Experiment 1, mean values and variations of the estimated AR parameters based on 100 realizations of the additive observation noise
Given the results shown in Tables 8.4 and 8.5, we can draw the following conclusions:
– compared to the direct application of the H∞ filter to the noisy data, the algorithm based on the dual structure reduces the errors in the estimation of the AR parameters;
– similar performances are obtained for the dual methods based either on Kalman filtering or on the H∞ filter. This confirms the statements of [6] [26] [33].
| Method | $\sigma_u^2$ | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ |
|---|---|---|---|---|---|---|---|
| Dual Kalman filtering | 4.40 (0.81) | -1.52 (0.06) | 0.48 (0.13) | 0.60 (0.18) | -0.26 (0.18) | -0.55 (0.14) | 0.49 (0.07) |
| Dual H∞ filtering | 4.38 (0.81) | -1.48 (0.06) | 0.42 (0.13) | 0.61 (0.18) | -0.15 (0.18) | -0.70 (0.14) | 0.56 (0.07) |
| Expected values | 1 | -2.06 | 1.84 | -0.98 | 0.80 | -0.97 | 0.58 |

Table 8.5. Experiment 2, mean values and variations of the estimated AR parameters based on 100 realizations of the additive observation noise
To further improve the estimation of the signal $s(k)$, we can replace the filtering operation by an H∞ smoothing. To implement this smoothing procedure, we must find an estimation $\hat{x}(k/M)$ of the state vector $x(k)$ for all values of $k \le M$, taking the observations $\{y(i)\}_{i=1,\ldots,M}$ into account, where M is a fixed integer. It has previously been shown that the Kalman and H∞ smoothing operations are identical [15]. In [1] [22], the authors concentrate on fixed-delay smoothing. For this type of smoothing, we must find an estimation $\hat{x}(k-m/k)$ of the state vector $x(k)$ for all values of k, taking $\{y(i)\}_{i=1,\ldots,k}$ into account, with m being a fixed integer. To obtain this estimation, we define the state vector as follows:

$x(k) = [s(k) \; \cdots \; s(k-m)]^T$   [8.31]

with $m \ge p$, where p is the order of the AR model. To deduce the smoothed estimation $\hat{s}(k-m/k)$ of the signal from the estimation $\hat{x}(k/k)$, we define the vector L as follows:

$L = [\underbrace{0 \; \cdots \; 0}_{m} \; 1]$   [8.32]
In the following section, we study the applicability of H∞ filtering in real cases, by taking the example of speech signal enhancement.

8.4. Relevance of H∞ filtering to speech enhancement
Speech signals exhibit various spectral characteristics. In fact, there can be noise-like sounds, known as unvoiced sounds, such as the consonants /p/, /m/, etc.; pseudo-periodic sounds, i.e. voiced sounds such as the vowels /a/, /e/, etc.; and mixed sounds, called fricative sounds, which are combinations of voiced and unvoiced sounds, such as the consonants /z/, /v/, etc. Moreover, the additive observation noise can be colored or white, stationary or non-stationary. Speech enhancement therefore covers a wide range of signal processing cases. We consider the following two protocols [19]:

Protocol 1: we first consider the simple textbook case where both the noiseless speech signal $s(k)$ and the observation noise $b(k)$ are assumed to be
available. Once the AR parameters are determined from $s(k)$, we implement an H∞ filter to enhance the speech signal from the noisy observations $y(k) = s(k) + b(k)$. We will also analyze the relevance of the fixed-interval smoothing operation.

Protocol 2: in order to evaluate the H∞ filter with respect to the errors in the estimation of the AR parameters, we first estimate the AR parameters from the noisy observations $y(k)$. Compared to Protocol 1, this estimation introduces additional errors. We then implement the iterative approach described in Figure 8.7. The filtering operation can be implemented using either a Kalman filter or an H∞ filter.
The initial conditions for the experiments are as follows. The signal /WAZIWAZA/9, sampled at 16 kHz, is disturbed by an additive noise. The resulting SNR is equal to 15, 10 or 5 dB. We study three different types of additive noise:
Noise 1: zero-mean white Gaussian noise (100 realizations).
Noise 2: colored moving average (MA) noise generated from the following zeros: $0.8\exp(\pm j 0.1\pi)$, $0.8\exp(\pm j 0.9\pi)$ (100 realizations).
Noise 3: noise recorded in a car travelling at 110 km/h (1 realization).
9. The authors would like to thank the Signal Processing Department of ENST Paris for providing us with the speech signal.
The speech enhancement procedure is based on a frame-by-frame analysis, with a frame overlap of 50%. In addition, a Hamming window is used; for more information, see Chapter 6. The width of each frame is fixed at N = 512 samples. The order of the AR process used to model the speech signal is set to 10. The quality of the enhanced signal is measured by the following three criteria: informal subjective tests (IST), SNR improvement and spectral analysis.
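For concreteness only, the framing step described above can be sketched as follows; the function names and the overlap-add normalization are our own assumptions, and the per-frame enhancement filter itself is not shown.

```python
import numpy as np

def split_frames(y, N=512):
    """Hamming-windowed frames with 50% overlap (hop = N/2)."""
    w, hop = np.hamming(N), N // 2
    return [w * y[s:s + N] for s in range(0, len(y) - N + 1, hop)], w, hop

def overlap_add(frames, w, hop, length):
    """Resynthesize processed frames, dividing out the summed analysis window."""
    out, norm = np.zeros(length), np.zeros(length)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + len(f)] += f
        norm[i * hop:i * hop + len(f)] += w
    return out / np.maximum(norm, 1e-12)
```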
[Figure 8.7 shows the iterative scheme of Protocol 2: the noisy observations $\{y(k)\}_{1:N}$ are used for a first AR parameter estimation $\hat{\theta}_1$ and a first filtering stage yielding $\{\hat{s}_1(k)\}_{1:N}$; the AR parameters are then re-estimated ($\hat{\theta}_2$) and a second filtering stage produces $\{\hat{s}_2(k)\}_{1:N}$.]

Figure 8.7. Block-level presentation of the speech enhancement method used in Protocol 2
When the additive observation noise is colored, it can be modeled by an MA process. In our simulation tests, the model order is set to 4 for “noise 2” and to 6 for “noise 3”. The state space representation required to implement the Kalman filter is described in section 1.6.9.2 of Chapter 1. For white Gaussian noises, the Kalman filter provides a maximum likelihood optimal estimation of the state vector, and the attenuation level $\gamma$ is very high. As the results in the tables below show, the addition of H∞-based filtering or smoothing does not appreciably improve the performance.
Protocol 1

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Kalman filter | 3.88 | 5.24 | 6.74 |
| H∞ filter | 3.87 | 5.24 | 6.73 |
| Kalman smoothing | 7.13 | 8.91 | 10.60 |
| H∞ smoothing | 7.13 | 8.90 | 10.61 |

Protocol 2

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Kalman filter | 3.60 | 4.91 | 6.41 |
| H∞ filter | 3.60 | 4.91 | 6.41 |

Table 8.6. Mean SNR improvement, noise 1
Protocol 1

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Kalman filter | 5.48 | 7.35 | 9.15 |
| H∞ filter | 5.47 | 7.37 | 9.35 |
| Kalman smoothing | 10.3 | 12.69 | 14.98 |
| H∞ smoothing | 10.11 | 12.45 | 14.45 |

Protocol 2

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Kalman filter | 4.40 | 6.04 | 7.66 |
| H∞ filter | 4.34 | 5.91 | 7.18 |

Table 8.7. Mean SNR improvement, noise 2
[Figure 8.8 shows four time-domain panels (amplitude versus time in seconds).]

Figure 8.8. Enhancement of a voiced speech segment (vowel /A/). Protocol 1 and noise 1, a) original signal, b) noisy signal (10 dB), c) signal obtained after H∞ filtering, d) signal obtained after H∞ smoothing
Protocol 1

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Kalman filter | 0.62 | 1.10 | 2.05 |
| H∞ filter | 0.59 | 1.15 | 2.11 |
| Kalman smoothing | 2.06 | 2.87 | 3.84 |
| H∞ smoothing | 2.37 | 3.47 | 4.56 |

Protocol 2

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Kalman filter | 0.59 | 1.02 | 1.76 |
| H∞ filter | 0.59 | 1.02 | 1.75 |

Table 8.8. Mean SNR improvement, noise 3
As reported in Tables 8.7 and 8.8, H2- and H∞-based methods provide similar results when the additive observation noise is colored. According to the IST criterion, the signal quality is almost the same with both methods. The use of H∞ filtering or smoothing is nevertheless advantageous because it does not require a parametric model of the additive noise and thus has a lower calculation cost.
[A first set of panels shows the time-domain representations of the noisy signal, of the signal enhanced by a Kalman filter and of the signal enhanced by an H∞ filter.]

[Figure 8.9 shows the corresponding spectrograms (frequency in kHz versus time in seconds) of the original signal, the noisy signal, the Kalman-filtered signal and the H∞-filtered signal.]

Figure 8.9. Example of speech enhancement. Protocol 1 and noise 1
The mutually interactive H∞ filter-based approach presented in the section above can be tested on the same application and compared to:
– the Shen approach, described in [30], based on two series-connected H∞ filters;
– the mutually interactive Kalman filter-based method, proposed by Labarre et al. in [20].

We notice from Table 8.9 that the methods based on parallel-connected filters give higher SNR gains than the approach proposed by Shen et al. [30]. For the case of white additive noise, similar results are obtained when using mutually interactive Kalman and H∞ filters. For colored noise, the structure based on the two H∞ filters gives gains slightly lower than those obtained by the mutually interactive Kalman filters; however, its calculation complexity is lower because no prior modeling of the measurement noise is needed.
Noise 1

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Interactive Kalman filters [20] | 3.27 | 4.29 | 5.47 |
| Interactive H∞ filters | 3.28 | 4.33 | 5.56 |
| Shen method [30] | 2.67 | 3.33 | 3.94 |

Noise 3

| Input SNR (dB) | 15 | 10 | 5 |
|---|---|---|---|
| Interactive Kalman filters, requiring a modeling of the additive noise [20] | 1.35 | 2.14 | 2.75 |
| Interactive H∞ filters, not requiring noise modeling | 1.30 | 2.08 | 2.65 |
| Shen method [30] | 0.56 | 0.95 | 1.41 |

Table 8.9. SNR improvement, comparing the dual structures, noise 1 and noise 3
8.5. Conclusion
In this chapter we have presented the H∞ techniques. These techniques are an attractive alternative to Kalman filtering because they do not require any assumptions about the driving process and the observation noise in the state space representation. Moreover, they provide estimations of the signals by considering the “worst-case scenario”, which makes them more robust than the Kalman filter with respect to model uncertainties and to the disturbances that the system undergoes.

We first analyzed the relevance of the H∞ filter for the estimation of AR parameters. Thereafter, we proposed a dual approach based on two interactive H∞ filters operating in parallel: the first filter estimates the signal, while the second updates the estimation of the AR parameters. Finally, we studied the use of H∞ filtering and H∞ smoothing at an experimental level, for applications in speech enhancement. For colored observation noises, the H∞-based estimation methods have the advantage of not requiring an additional model to characterize the additive noise.

In the next chapter, we will introduce particle filtering as an alternative to the Kalman filter in nonlinear and/or non-Gaussian cases.
8.6. References [1] P. Bolzern, P. Colaneri and G. De Nicolao, “On Discrete-Time H Fixed-Lag Smoothing”, IEEE Trans. on Signal Processing, vol. 52, no. 1, pp. 132-141, January 2004. [2] J. Cai and X. Shen, J. W. Mark, “Robust Channel Estimation for OFDM Wireless Communication Systems – An H Approach”, IEEE Trans. on Wireless Communications, vol. 3, no. 6, pp. 2060-2071, November 2004. [3] P. Colaneri, M. Maroni and U. Shaked, “H Prediction and Smoothing for Discrete-Time Systems: a J-Spectral Approach”, IEEE-CDC’98 (Conference on Decision and Control), Florida, USA, vol. 3, pp. 2836-2842, December 1998. [4] J. S. Demetry, “A Note on the Nature of Optimality in the Discrete Kalman Filter”, IEEE Trans. On Automatic Control, vol. AC-15, pp. 603-604, October 1970, quoted in M. Najim, Modélisation et Identification en Traitement du Signal, Masson, 1988. [5] J. C. Doyle, K. Glover, P. P. Khargonekar and B. A. Francis, “State Space Solution to Standard H2 and H Control Problem”, IEEE Trans. on Automatic Control, vol. 34, no. 8, pp. 831-847, August 1989. [6] A.T. Erdogan, B. Hassibi and T. Kailath, “On Linear H Equalization of Communication Channels”, IEEE Trans. on Signal Processing, vol. 48, no. 11, pp. 3227-3231, November 2000. [7] E. Fogel and Y. F. Huang, “On the Value of Information in System Identification– Bounded Noise Case”, Automatica, vol. 18, no. 2, pp. 229-238, 1982. [8] M. Fu, “Interpolation Approach to H Optimal Estimation and its Interconnection to Loop Transfer Recovery”, System and Control Letters, vol. 17, no. 1, pp. 29-36, July 1991. [9] K. Gao, M. O. Ahmad and M. N. S. Swamy, “A Constrained Anti Hebbian Learning Algorithm for Total Least Squares Estimation with Application to Adaptive FIR and IIR filtering”, IEEE Trans on Circuit and System II, Analog and Digital signal processing, vol. 41, pp. 718-729, November 1994. [10] J. C. Geromel, “Optimal Linear Filtering Under Parameter Uncertainty”, IEEE Trans. on Signal Processing, vol. 47, no. 1, pp. 168-175, January 1999. [11] K. Glover and J. C. Doyle, “State Space Formulae for Stabilizing Controllers”, Systems and Control Letters, vol. 11, pp. 167-172, 1988. [12] M. J. Grimble, “H Design of Optimal Linear Filters,” Linear Circuits, Systems and Signal Processing: Theory and Applications, C. I. Byrnes, C. F. Martin, R. E. Saeks (Eds.), pp. 533-540, North Holland, Amsterdam, the Netherlands, 1988. [13] M. J. Grimble and A. E. Sayed, “Solution of the H Optimal Linear Filtering Problem for Discrete-Time Systems”, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 38, no. 7, pp. 1092-1104, July 1990. [14] M. J. Grimble, R. Hashim and U. Shaked, “Identification Algorithms Based on Hf StateSpace Filtering Techniques”, IEEE-CDC92 (Conference on Decision and Control), Tucson, Arizona, USA, vol. 2, pp. 2287-2292, 16-18 December 1992.
[15] B. Hassibi, A. H. Sayed and T. Kailath, Indefinite-Quadratic Estimation and Control, a Unified Approach to H2 and H Theories, SIAM Edition, 1999. [16] B. Hassibi, A. T. Erdogan and T. Kailath, “MIMO Linear Equalization with an H Criterion”, IEEE Trans. on Signal Processing, vol. 54, no. 2, pp. 499-511, February 2006. [17] H. Kimura, Chain-Scattering Approach to Hf Control, Birkhauser, Boston, MA, 1997, quoted in B. Hassibi, A. H. Sayed, T. Kailath, Indefinite-Quadratic Estimation and Control, a Unified Approach to H2 and H Theories, SIAM Edition, 1999. [18] H. Kulatunga and V. Kadirkamanathan, “Multiple Hf Filter-Based Deterministic Sequence Estimation in Non-Gaussian Channels”, IEEE Signal Processing Letters, vol. 13, no. 4, pp. 185-188, April 2006. [19] D. Labarre, E. Grivel, M. Najim and N. Christov, “Relevance of H Filtering for Speech Enhancement”, IEEE-ICASSP ’05, vol. 4, pp. 169-172, March 2005. [20] D. Labarre, E. Grivel, Y. Berthoumieu, M. Najim and E. Todini, “Consistent Estimation of Autoregressive Parameters From Noisy Observations Based on Two Interacting Kalman Filters”, Signal Processing, vol. 86, no. 10, pp. 2863-2876, October 2006. [21] H. Li and M. Fu, “A Linear Inequality Approach to Robust H Filtering”, IEEE Trans. on Signal Processing, vol. 45, no. 9, pp. 2338-2350, September 1997. [22] L. Mirkin, “Continuous-Time Fixed-Lag Smoothing in an H Setting”, IEEE-CDC ’01 (Conference on Decision and Control), vol. 4, pp. 3512-3517, 2001. [23] K. M. Nagpal and P. P. Khargonekar, “Filtering and Smoothing in an H Setting”, IEEE Trans. on Automatic Control, vol. 36, no. 2, pp. 152-166, February 1991. [24] R. M. Palhares and P. L. D. Peres, “LMI Approach to the Mixed H2/H Filtering Design for Discrete-Time Uncertain Systems”, IEEE Trans. on Aerospace and Electronics Systems, vol. 37, no. 1, pp. 292-296, January 2001. [25] S. Puthusserypady and T. Ratnarajah, “H Adaptive Filters for Eye Blink Artifact Minimization From Electroencephalogram”, IEEE Signal Processing Letters, vol. 12, no. 12, pp. 816-819, December 2005. [26] B. Sayyarrodsari, B. Hassibi, J. How and A. Carrier, “An H-Optimal Alternative to the FxLMS Algorithm”, IEEE-ACC ’98 (American Control Conference), Philadelphia, USA, vol. 2, pp. 1116-1121, 24-26 June 1998. [27] U. Shaked and Y. Theodor, “H-Optimal Estimation: a Tutorial”, IEEE-CDC ’92 (Conference on Decision and Control), Tucson, Arizona, USA, vol. 2, pp. 2278-2286, 16-18 December 1992. [28] X. Shen, “Discrete H Filter Design with Application to Speech Enhancement”, IEEEICASSP ’95. pp. 1504-1507, Detroit, MI, USA, 9-12 May 1995. [29] X. Shen, L. Deng and A. Yasmin, “H-infinity Filtering for Speech Enhancement”, 4th International Conference on Spoken Language Processing, Philadelphia, PA, USA, 3-6 October 1996.
[30] X. Shen and L. Deng, “A Dynamic System Approach to Speech Enhancement Using the H Filtering Algorithm”, IEEE Trans. on Speech and Audio Processing, vol. 7, no. 4, pp. 391-399, July 1999. [31] J. Shimizu and S. K. Mitra, “H Filtering for Noise Reduction Using a Total Least Squares Estimation Approach”, ICASSP ’98, Seattle, Washington, USA, vol. 3, pp. 1645-1648, 12-15 May 1998. [32] Y. Theodor and U. Shaked, “Game Theory Approach to H-Optimal Discrete-Time Fixed-Point and Fixed-Lag Smoothing”, IEEE Trans. on Automatic Control, vol. 39, no. 9, pp. 1944-1948, September 1994. [33] H. Vikalo, B. Hassibi and T. Kailath, “Mixed H2/H Optimal Signal Reconstruction in Noisy Filter Banks”, IEEE-ICASSP ’00, Istanbul, Turkey, vol. 1, pp. 500-503, June 2000. [34] H. Vikalo, B. Hassibi and T. Kailath, “On Robust Multiuser Detection”, Thirty-Fourth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, pp. 1168-72, October 2000. [35] H. Vikalo, B. Hassibi, A. T. Erdogan and T. Kailath, “On Robust Signal Reconstruction in Noisy Filter Banks”, Signal Processing, vol. 85, pp. 1-14, January 2005. [36] J. Wang, D. A. Wilson and G. D. Halikias, “H Robust-Performances Control of Decoupled Active Suspension Systems Based on LMI Method”, IEEE-ACC ’01 (American Control Conference), Arlington, VA, USA, vol. 4, pp. 2658-2663, 25-27 June 2000. [37] S. L. Xie, X. N. Zhang, J. H. Zhang and L. Yu, “Hf Robust Vibration Control of a Thin Plate Covered with a Controllable Constrained Damping Layer”, Journal of Vibration and Control, vol. 10, no. 1 pp. 115-133, January 2004. [38] I. Yaesh and U. Shaked, “Game Theory Approach to Optimal Linear State Estimation and Its Relation to the Minimum H-Norm Estimation”, IEEE Trans. on Automatic Control, vol. 37, no. 6, pp. 828-831, June 1992. [39] G. Zames, “Feedback and Optimal Sensitivity: Model Reference Transformations, Multiplicative Seminorms, and Approximate Inverses”, IEEE Trans. on Automatic Control, vol. AC-26, no. 2, pp. 301-320, April 1981. [40] W. Zhuang, “Adaptive Hf Channel Equalization for Wireless Personal Communications”, IEEE Trans. on Vehicular Technology, vol. 48, no. 1, pp. 126-136, January 1999.
Chapter 9
Introduction to Particle Filtering
Particle filtering, also known as sequential importance sampling (SIS), has found widespread use over the past 15 years or so as an alternative to Kalman filtering for sequential Bayesian estimation. This filter provides a solution when the estimation problem is nonlinear and/or non-Gaussian. In the past three decades, the following approaches have been used:
– the extended Kalman filter, described in Chapter 5;
– grid-based methods [2].

However, these two approaches suffer from limited accuracy. The particle filter, based on Monte Carlo sampling techniques, benefits from the following three advantages:
– the estimation is no longer based on a Gaussian assumption;
– when the estimation problem is nonlinear, the Monte Carlo methods forego the linearization step, unlike the extended Kalman filter;
– as opposed to the grid-based methods, the particle filter adapts to the dynamics of the process being studied. Furthermore, its computational cost is lower.

This chapter introduces the reader to Monte Carlo estimation techniques, focusing mainly on importance sampling. We present the recursive version of the importance sampling filter and describe ways to implement particle filtering.
9.1. Monte Carlo methods

Monte Carlo methods were first used in the domain of statistical physics during the Second World War, most notably for the design of the atomic bomb. In those days, there was only an incomplete understanding of the phenomena taking place at the atomic scale. Practically usable results were thus obtained by turning to simulation methods which exploited the statistical properties of the phenomena under study.

Monte Carlo methods aim at estimating the statistical properties of a random variable $x$. To do this, the probability density function $p(x)$ of $x$ is represented by a set of random samples independently and identically generated according to the density $p(x)$. These samples are called “particles” and are denoted: $x^i \sim p(x)$ for i = 1, 2,…, M.
At that stage, $p(x)$ is approximated by the following discrete distribution [3]:

$\hat{p}(x) = \frac{1}{M}\sum_{i=1}^{M}\delta(x - x^i)$   [9.1]

where $\delta(\,.\,)$ is the Dirac measure. As M tends towards infinity, $\hat{p}(x)$ converges towards the probability density function $p(x)$, i.e.:

$\hat{p}(x) \xrightarrow[M \to \infty]{} p(x)$   [9.2]

In the first equation, we note that each sample is weighted by a factor $\frac{1}{M}$. As depicted in Figure 9.1, the distribution of the samples reflects the characteristics of the density $p(x)$.
Figure 9.1. Monte Carlo approximation of a probability density
The above approximation is very useful for the estimation of the mathematical expectation of $f(x)$, defined as follows:

$E\{f(x)\} = \int f(x)\,p(x)\,dx$   [9.3]

where $E\{\,.\,\}$ denotes the mathematical expectation and f is an arbitrary function. If we replace the probability density $p(x)$ by its approximation $\hat{p}(x)$ reported in [9.1], we obtain the following estimation:

$E\{f(x)\} \approx \frac{1}{M}\sum_{i=1}^{M} f(x^i)$   [9.4]

where $\{x^i \sim p(x)\}_{i=1:M}$. However, it is usually impossible to sample directly from $p(x)$. We can alleviate this difficulty by using importance sampling, where we consider an additional density function $q(x)$, called the importance density, which achieves the following trade-off:
– $q(x)$ should allow us to easily generate the samples;
– $q(x)$ should be as close to the probability density function $p(x)$ as possible, i.e. the ratio between $p(x)$ and $q(x)$ should remain nearly constant:

$\frac{p(x)}{q(x)} \approx \text{constant}$   [9.5]
Thus, the support of the proposal distribution q must include that of the target distribution p. Given a total of M samples generated according to the importance density function $q(x)$, i.e. $\{x^i \sim q(x)\}_{i=1:M}$, the mathematical expectation [9.3] of $f(x)$ is estimated as follows:

$E\{f(x)\} = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \approx \frac{1}{M}\sum_{i=1}^{M} f(x^i)\,\frac{p(x^i)}{q(x^i)}$   [9.6]
Let us now consider a simple example to illustrate importance sampling. Let us say we wish to estimate the following integral:

$A = \int_0^1 f(x)\,dx$   [9.7]

where the function f is defined as:

$f(x) = \frac{e^x - 1}{e - 1}$   [9.8]

Theoretically, we obtain $A = \frac{e-2}{e-1} \approx 0.418$. We now estimate the quantity A using the importance sampling method. Introducing the importance density $q(x)$, we can estimate the integral A by:

$A \approx \frac{1}{M}\sum_{i=1}^{M} \frac{f(x^i)}{q(x^i)}$   [9.9]

where $\{x^i \sim q(x)\}_{i=1:M}$. Summation [9.9] is thus carried out according to the importance density $q(x)$. Let us now consider the following two particular cases of the importance density:

Case 1: $q(x)$ corresponds to $U(x)$, the following uniform density:

$U(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$   [9.10]
Case 2: $q(x)$ satisfies the following condition:

$q(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$   [9.11]
As depicted in Figure 9.2, the second case is more relevant because the quotient $\frac{f(x)}{2x}$ shows narrower variations than $\frac{f(x)}{U(x)}$ over the [0,1] range. From Figure 9.3, we see that, for the second choice of the importance density, the estimation converges more quickly towards the expected value as the number of samples increases.

The Monte Carlo approximation [9.6] is also valid for a vectorial function f. This technique forms the basis of particle filtering methods, which sequentially estimate the vector $x(k)$ of the following representation:

$\begin{cases} x(k) = f(x(k-1),\,u(k)) \\ y(k) = g(x(k),\,b(k)) \end{cases}$   [9.12]
where f and g are arbitrary functions. The variables $y(k)$, $u(k)$ and $b(k)$ are, respectively, the observation, the driving process and the observation noise, whose probability density functions are assumed to be known. Given equation [9.12], this assumption means that the transition density $p(x(k)\,|\,x(k-1))$ and the likelihood $p(y(k)\,|\,x(k))$ are known. The recursive estimation of $x(k)$ using the importance sampling method is called particle filtering. The principle of this filter is explained in the following section.
[Figure 9.2, panel a): plot of $f(x)$ together with the two importance densities $U(x)$ and $2x$ over [0,1]; panel b): example of the distribution of the particles generated in the two cases.]

Figure 9.2. a) Representation of f(x) and selected importance density functions, b) example giving the distribution, over the range [0,1], of the particles arbitrarily generated in the two cases

[Figure 9.3: estimates of A obtained with $q(x) = U(x)$ and $q(x) = 2x$, plotted against M, the number of generated particles, together with the theoretical value of A.]

Figure 9.3. Example of the estimation of integral function A for different numbers of particles
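The small experiment of Figures 9.2 and 9.3 is easy to reproduce; the sketch below is ours (the seed, the sample size and the inverse-CDF sampling of $q(x)=2x$ are assumptions) and simply compares the two importance densities.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: (np.exp(x) - 1.0) / (np.e - 1.0)   # integrand of [9.7]-[9.8]
A_true = (np.e - 2.0) / (np.e - 1.0)             # about 0.418

M = 10_000
x1 = rng.uniform(0.0, 1.0, M)                    # case 1: q(x) = U(x)
A_uniform = np.mean(f(x1))                       # f/q with q = 1

x2 = np.sqrt(rng.uniform(0.0, 1.0, M))           # case 2: q(x) = 2x, sampled as sqrt(v)
A_linear = np.mean(f(x2) / (2.0 * x2))

print(A_true, A_uniform, A_linear)               # the second estimate fluctuates less
```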
9.2. Sequential importance sampling filter
For the sake of clarity and simplicity, the time argument k will be placed in the subscript of the equations in this section. Thus, the notations of the state vector $x(k)$, the observation $y(k)$ and the likelihood $p(y(k)\,|\,x(k))$ become $x_k$, $y_k$ and $p(y_k\,|\,x_k)$, respectively. Let h be an arbitrary function. The collection of the last k available observations is denoted $Y_k = \{y_i\}_{i=1,\ldots,k}$, while the state process is denoted $X_k = \{x_i\}_{i=1,\ldots,k}$. Particle filtering aims at recursively estimating the a posteriori filtering density $p(x_k\,|\,Y_k)$. This distribution is obtained by marginalizing the density $p(X_k\,|\,Y_k)$. The key idea of particle filters is to take advantage of importance sampling in order to estimate the following mathematical expectation:

$E\{h(x_k)\} = \int h(x_k)\,p(x_k\,|\,Y_k)\,dx_k = \int h(x_k)\,p(X_k\,|\,Y_k)\,dX_k$   [9.13]
For this purpose, the mathematical expectation of $h(x_k)$ can be expressed as follows, for a given importance density function1 $q(x_k\,|\,Y_k)$ [1]:

$E\{h(x_k)\} = \int h(x_k)\,\frac{p(X_k\,|\,Y_k)}{q(X_k\,|\,Y_k)}\,q(X_k\,|\,Y_k)\,dX_k$   [9.14]
By applying the Bayes rule:

$p(X_k\,|\,Y_k) = \frac{p(Y_k\,|\,X_k)\,p(X_k)}{p(Y_k)}$   [9.15]

we can rewrite the mathematical expectation of $h(x_k)$ as follows:

$E\{h(x_k)\} = \frac{1}{p(Y_k)}\int h(x_k)\,w_k(X_k)\,q(X_k\,|\,Y_k)\,dX_k$   [9.16]

where $w_k(X_k) = \dfrac{p(Y_k\,|\,X_k)\,p(X_k)}{q(X_k\,|\,Y_k)}$.
1 The choice of this importance density function will be discussed further in the section
reviewing existing particle filters (see section 9.3.3).
In the above equation, the probability density function $p(Y_k)$ can alternatively be written as follows, by making use of $w_k(X_k)$:

$p(Y_k) = \int p(Y_k\,|\,X_k)\,p(X_k)\,dX_k = \int p(Y_k\,|\,X_k)\,p(X_k)\,\frac{q(X_k\,|\,Y_k)}{q(X_k\,|\,Y_k)}\,dX_k = \int w_k(X_k)\,q(X_k\,|\,Y_k)\,dX_k$   [9.17]
Finally, we obtain the following expression for the mathematical expectation of $h(x_k)$:

$E\{h(x_k)\} = \frac{\int h(x_k)\,w_k(X_k)\,q(X_k\,|\,Y_k)\,dX_k}{\int w_k(X_k)\,q(X_k\,|\,Y_k)\,dX_k} = \frac{E_q\{h(x_k)\,w_k(X_k)\}}{E_q\{w_k(X_k)\}}$   [9.18]
where $E_q\{\,.\,\}$ represents the mathematical expectation calculated using the importance density function $q(X_k\,|\,Y_k)$. Using a Monte Carlo approximation of equation [9.14], we obtain the following estimation:

$E\{h(x_k)\} \approx \frac{1}{M}\sum_{i=1}^{M}\tilde{w}_k(X_k^i)\,h(x_k^i)$   [9.19]

where $\{X_k^i \sim q(X_k\,|\,Y_k)\}_{i=1:M}$ and $\tilde{w}_k(X_k^i) = \dfrac{w_k(X_k^i)}{\frac{1}{M}\sum_{i=1}^{M} w_k(X_k^i)}$ are the normalized weights.
This approach requires the resampling of the whole trajectory $X_k = \{x_i\}_{i=1,\ldots,k}$ at each instant. This can be avoided by using a sequential sampling scheme, which consists of choosing $q(X_k\,|\,Y_k)$ such that $q(X_{k-1}\,|\,Y_{k-1})$ is its
marginal distribution at instant k-1. Mathematically, this can be written as follows:

$q(X_k\,|\,Y_k) = q(x_k\,|\,Y_k, x_{k-1})\,q(X_{k-1}\,|\,Y_{k-1})$

Thus, instead of generating the whole trajectory at each instant, this alternative allows the following propagation of the particles:

$X_k^i = (X_{k-1}^i,\,x_k^i)$   [9.20]

with $x_k^i \sim q(x_k\,|\,Y_k, X_{k-1}^i)$ for $i = 1{:}M$. The previous trajectories are left unchanged.
The weighting factors $\{w_k\}_{i=1:M}$ are known as the importance weights and are calculated using the following recursive relation [1]:

$w_k = w_{k-1}\,\frac{p(y_k\,|\,x_k^i)\,p(x_k^i\,|\,x_{k-1}^i)}{q(x_k^i\,|\,x_{k-1}^i, Y_k)}$   [9.21]
In the above equation, the likelihood $p(y_k\,|\,x_k^i)$, the transition density $p(x_k^i\,|\,x_{k-1}^i)$ and the importance density $q(x_k^i\,|\,x_{k-1}^i, Y_k)$ are known.
The SIS filter is based on equations [9.19] and [9.21]. In this filter, the mathematical expectation $E\{h(x_k)\}$ is estimated in the following steps:
– M offspring particles are sampled using the importance density function $q(x_k^i\,|\,x_{k-1}^i, Y_k)$;
– weights $w_k$ are associated with each particle according to equation [9.21];
– $E\{h(x_k)\}$ is estimated according to equation [9.19].

However, such algorithms are known to suffer from degeneracy after a few iterations: a single particle ends up characterizing the a posteriori density of the state vector. The normalized weight of this particle is then very close to 1 while all the other weights are close to 0, as depicted in Figure 9.4.
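A generic sketch of one SIS iteration is given below; the callable arguments (the importance sampler and the three densities) are placeholders that the user must supply for a specific model, and the sum-to-one normalization is one common convention rather than the book's.

```python
import numpy as np

def sis_step(particles, weights, y_k, sample_q, lik_pdf, trans_pdf, q_pdf):
    """One SIS iteration implementing [9.20]-[9.21] for a cloud of M particles."""
    new_p = np.array([sample_q(xp, y_k) for xp in particles])        # propagation [9.20]
    w = np.array([w0 * lik_pdf(y_k, x) * trans_pdf(x, xp) / q_pdf(x, xp, y_k)
                  for w0, x, xp in zip(weights, new_p, particles)])  # weight update [9.21]
    return new_p, w, w / w.sum()                                     # normalized weights
```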
[Figure 9.4 sketches the a posteriori density $p(x_k\,|\,Y_k)$ together with the particles $x_k^i$: after several iterations, only one particle carries almost all of the weight.]

Figure 9.4. Pictographic illustration of the degeneracy
A suitable measure of the degeneracy is the effective sample size $M_{eff}$, proposed by Kong et al. [9]. We cannot evaluate $M_{eff}$ exactly, but an estimate is given by:

$M_{eff} = \frac{1}{\sum_{i=1}^{M}\left(\tilde{w}_k(x_k^i)\right)^2}$   [9.22]
If all the normalized weights are equal to $\frac{1}{M}$, the effective size of the particle cloud takes its maximal value, M. This is the case in the ideal Monte Carlo sampling of equation [9.4]. If one of the weights has a value of 1 and the others are all 0, $M_{eff}$ assumes its minimal value, equal to 1. Thus, the degeneracy increases as the effective size of the particle cloud decreases. This degeneracy stems from the fact that the variance of the importance weights increases with successive iterations2 [4].

The addition of a resampling step to the algorithm alleviates this problem. Resampling maintains and/or multiplies the particles with high weights and eliminates the particles with low weights. The weights of all the particles $\tilde{x}_k^i$ thus generated are equal to $\frac{1}{M}$. This process is depicted in Figure 9.5.
2 Thus, the probability of one of them having the value 1.
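For illustration, [9.22] and a standard resampling scheme can be written as follows; systematic resampling is only one possible choice (the reference algorithm of [1] and the residual or stratified variants of [10] are alternatives), and the weights are assumed to be normalized to sum to one.

```python
import numpy as np

def effective_sample_size(w_tilde):
    """Estimate [9.22] of M_eff from normalized weights."""
    return 1.0 / np.sum(w_tilde ** 2)

def systematic_resample(particles, w_tilde, rng=np.random.default_rng()):
    """Keep/duplicate particles (ndarray of shape (M, ...)) in proportion to
    their weights; after this step every particle carries the weight 1/M."""
    M = len(particles)
    positions = (rng.uniform() + np.arange(M)) / M
    idx = np.searchsorted(np.cumsum(w_tilde), positions)
    return particles[idx], np.full(M, 1.0 / M)
```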
[Figure 9.5 illustrates the resampling step: from the weighted cloud $\{x_k^i, \tilde{w}_k(x_k^i)\}$, the particles with low weights are eliminated while those with high weights are preserved and multiplied, yielding the equally weighted cloud $\{\tilde{x}_k^i, \frac{1}{M}\}$.]

Figure 9.5. The resampling step
The most popular resampling algorithm remains that presented in [1]. Other alternatives, such as residual or stratified resampling, are presented in [10]. The sequential importance sampling filter with the resampling stage is divided into the following steps:
– M particles are sampled using the importance density function $q(x_k^i\,|\,x_{k-1}^i, Y_k)$;
– weights $w_k(x_k^i)$ are associated with each particle according to equation [9.21];
– $E\{h(x_k)\}$ is estimated using equation [9.19];
– if $M_{eff} < M_{eff}^{threshold}$, the resampling step is used.
326
Modeling, Estimation and Optimal Filtering in Signal Processing
9.3. Review of existing particle filtering techniques
The first particle filtering method, introduced by Gordon et al. in 1993, is based on the following items [7]: – the importance density function is chosen to be equal to the transition density: q( x ik x ik 1 , Yk )
p( x ik x ik 1 )
[9.23]
This modifies the weight updating equation [9.21] to: wk ( x ik )
wk 1 ( x ik 1 ) p( y k x ik )
[9.24]
– a resampling step is used at each iteration. Thus all the weights are reinitialized 1 to , making them directly proportional to the likelihood function: M wk ( x ik ) v p( y k x ik ) .
[9.25]
The algorithm thus obtained, called the sampling importance resampling (SIR) filter, proceeds according to the following four successive steps: – M particles are sampled using the importance density function p( x ik x ik 1 ) ; – weights wk ( x ik )
p ( y k x ik ) are assigned to each particle;
– E^h( x k )` is estimated using equation [9.19]; – a resampling step is applied. These steps are depicted in Figure 9.6.
Introduction to Particle Filtering
p( x ik x ik 1 )
327
Prediction
i 1½ ®x k , ¾ M¿ ¯
p( yk x ik )
Correction
{ x , p( y i k
k
{
yk
x ik )
Resampling
~i 1 ½ ®x k , ¾ M¿ ¯
p( x ik 1 x ik )
Prediction
1½ i ® x k 1 , ¾ M¿ ¯
Figure 9.6. Illustration of one iteration of the SIR filter
The main advantage of this method is the simplicity in its implementation. However, it suffers from two major shortcomings: – the resampling step is conducted on a discrete population of samples. Thus, an excessive use of this step might lead to a loss of diversity in the particle cloud. This phenomenon is accentuated in the presence of a measurement noise or a state noise with low variance; – the importance density function is independent of the new observation y k . Thus, the samples are not necessarily located in regions of high likelihood value of the state space, and the particle filter is especially sensitive to outliers. As Figure 9.7 shows, if the particle cloud does not cover the entire likelihood range, the weights are redundant and do not reflect the a posteriori density. Some particle filtering strategies have been proposed to overcome these limitations.
i
p ( x ik x ik 1 )
p( y k x k )
i
xk Figure 9.7. Difficulty arising from the choice of the importance density: there are no particles in the likelihood range
The regularized particle filter (RPF) [11] improves the fifth step, i.e. the resampling of the SIR filter by using a continuous approximation of the a posteriori density. This approximation is written as follows:
pˆ ( x k Yk )
1 M
¦ w~ k x ik K T x k x ik M
[9.26]
i 1
~ are the normalized weights, K x where w k T k
(det A) 1
A 1 x k
) is a zeroT T nx mean density function with a variance equal to 1, n x is the size of the state K(
vector x k , and T is a scaling factor. Moreover, matrix A satisfies S
AAT ,
where S is the empirical covariance matrix of the particle cloud. The function K ( . ) is chosen in such a way as to minimize the quadratic error between the a posteriori probability density function and the estimated density function given by: E
^³ pˆ x
`
Yk px k Yk d x k . 2
k
[9.27]
Introduction to Particle Filtering
329
When all the weights are equal, the Epanechnikov kernel is the optimal choice for K . [11]. This kernel is defined as follows: K opt x k
nx 2 § ° 2c ¨©1 x k ® nx °0 otherwise, ¯
2
·¸ , if x 1 k ¹
where c nx is the area of the unit circle when n x when n x
[9.28]
2 , the volume of the unit sphere
3 and, more generally, the volume of the hypersphere of dimension n x .
Moreover, for Gaussian vectors u k and v k , the optimal scaling factor Topt has been shown to be equal to:
Topt
ª 1 n x 4 2 S «8 «¬ c nx
nx
1
1
º nx 4 n 4 M x » »¼
[9.29]
Figure 9.8 illustrates the principle of the continuous approximation of density pˆ ( x k Yk ) . The RPF consists of the following successive steps:
– M particles are sampled using importance density function p( x ik x ik 1 ) ; – weight wk ( x ik )
p ( y k x ik ) is applied to each particle;
– the weights are normalized; – E ^h( x k )` is estimated using equation [9.19]; – a resampling technique is implemented based on probability density function pˆ ( x k Yk ) . This smoothed approximation is obtained by convolution of the discrete
particle approximation by the chosen regularization kernel.
330
Modeling, Estimation and Optimal Filtering in Signal Processing
wk x 3k
wk x 2k
wk x1k
1
xk
4 k
k
2
3
4
xk xk
wk x 7k
5 k
k
xk
w x w x wk x 6k
5
6
xk
7
xk xk xk
(a) estimated density pˆ ( x k Yk )
K T x k x ik
1
xk
2
xk
3
4
xk xk
5
6
7
xk xk xk
xk
(b) Figure 9.8. Discrete (a) and continuous (b) representations of the a posteriori density for the resampling step in the RPF
Introduction to Particle Filtering
331
However, when the number of particles tends towards infinity, the estimated continuous density pˆ ( x k Yk ) does not necessarily converge towards the a posteriori
density px k Yk . We notice in the above figure that the second maximum of
pˆ ( x k Yk ) no longer corresponds to the last particle xik . Moreover, this example does
not address the difficulty in choosing the importance density. According to Doucet et al. [4], choosing the following importance density: q( x ik x ik 1 , Yk )
p( x ik x ik 1 , Yk )
[9.30]
has the advantage of minimizing the variance of the weights. In this case, the density is said to be optimal. Unfortunately, an analytical expression of this distribution cannot usually be obtained, making it difficult to simulate. To solve these problems, alternative approaches have been put forward. The solution proposed by Pitt et al. [12] consists of propagating only particles with high predictive likelihood. Therefore, the sampling space is increased so that not only the particle offspring but also the indexes of the particles to be propagated are chosen randomly [1] [13].
p ( x ik x ik 1 )
p( yk xik )
x ik Figure 9.9. The ASIR filter is the same as generating a particle cloud after accounting for the new observation
The ASIR filter entirely avoids having to choose the importance density and benefits, moreover, from the generated particles being naturally close to the likelihood.
332
Modeling, Estimation and Optimal Filtering in Signal Processing
Another alternative consists of constructing the importance density from a bank of EKF or unscented Kalman filters (UKF)3. An EKF or UKF is run for each i particle, yielding a Gaussian estimation N ( xˆ k , Pki ) of the a posteriori probability density function of the state vector. This estimation is taken as the importance density for the corresponding particle. The M importance densities thus obtained are then used in a SIR filter. This is depicted in Figure 9.10. The final filter is known as the EPF or the UPF, depending on whether it uses the EKF or UKF [14]. Its operation is described by the following five steps: – an EKF or UKF gives the mean estimation xˆ ik of the particle x ik and the associated covariance matrix Pki ; i – a particle x ik is generated according to each importance density N ( xˆ k , Pki ) ;
– a weight wk ( x ik ) is applied to each particle x ik according to equation [9.21]; – E^h( x k )` is estimated according to equation [9.19]; – a resampling step is applied.
yk
xˆ 1k 1 Pk11
EKF/ UKF
xˆ 1k 1
Importance density no. 1
Pk11
N ( xˆ 1k , Pk1 )
EKF/ UKF
^xˆ , w ( xˆ )` 1 k
k
1 k
SIR
zk
xˆ kM1 PkM1
yk
xˆ kM1 PkM1
Importance density no. M
N ( xˆ kM , PkM )
Figure 9.10. EPF or UPF filtering
3 For a detailed description of the UKF filter, see Appendix K.
^xˆ
M k
, wk ( xˆ kM )
`
The advantage of the structures of the EPF and UPF filters is that they allow us to set the importance sampling after accounting for the new observation. For the specific Gaussian case, the variance of the weights is minimized. However, the major disadvantage of these filters is their high calculation cost because M parallelconnected Kalman filters are needed. In addition to its principal use for guiding or tracking objects [14], the particle filter has also been applied to various domains including analysis and enhancement of audio and speech signals based on variable parameter AR models [6] [16], the simultaneous estimation of the transmission channel, characterized by an AR model, and its AR parameters, for use in digital communications [8], and the analysis of audio signals through frequency tracking [5] [15], etc. 9.4. References [1] M. S. Arulampalam, S. Maskell, N. Gordon and T. Clapp, “A Tutorial on Particle Filters for Online Nonlinear/ Non-Gaussian Bayesian Tracking”, IEEE Trans. on Signal Processing, vol. 50, no. 2, pp. 174-188, February 2002. [2] N. Bergman, Recursive Bayesian estimation: Navigation and Tracking Applications, PhD thesis, Linkoping University, 1999. [3] O. Cappé and E. Moulines, T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, 2005. [4] A. Doucet, S. Godsill and C. Andrieu, “On Sequential Monte Carlo Sampling Methods for Bayesian Filtering”, Statistics and Computing, vol. 10, no. 3, pp. 197-208, 2000. [5] C. Dubois, M. Davy and J. Idier, “Tracking of Time-Frequency Components Using Particle Filtering”, IEEE-ICASSP ’05, Philadelphia, USA, vol. 4, pp. 9-12, 18-23 March 2005. [6] W. Fong, S. J. Godsill, A. Doucet and M. West, “Monte Carlo Smoothing with Application to Audio Signal Enhancement”, IEEE Trans. on Signal Processing, vol. 50, no. 2, pp. 438-449, February 2002. [7] N. J. Gordon, D. J. Salmond and A. F. M. Smith, “Novel Approach to Nonlinear/NonGaussian Bayesian State Estimation”, IEE Proc.-F, vol. 140, no. 2, pp. 107-113, 1993. [8] Y. Huang and P. M. Djuric, “A Blind Particle Filtering Detector of Signals Transmitted Over Flat Fading Channels”, IEEE Trans. on Signal Processing, vol. 52, no. 7, pp. 1891-1900, July 2004. [9] A. Kong, J. S. Liu and W. H. Wong, “Sequential Imputations and Bayesian Missing Data Problems”, Journal of the American Statistical Association, vol. 89, no. 425, pp. 278-288, 1994. [10] J. S. Liu and R. Chen, “Sequential Monte Carlo Methods for Dynamical Systems”, Journal of the American Statistical Association, vol. 93, pp. 1032-1044, 1998.
[11] C. Musso, N. Oudjane and F. Legland, “Improving Regularised Particle Filters”, in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, N. J. Gordon, eds., New York, Spinger, 2001. [12] M. K. Pitt and N. Shepard, “Filtering via Simulation: Auxiliary Particle Filters”, Journal of the American Statistical Association, vol. 94, no. 446, pp. 590-599, 1999. [13] M. K. Pitt and N. Shepard, “Auxiliary Variable Based Particle Filters”, in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, N. J. Gordon, eds., New York, Springer, 2001. [14] B. Ristic, S. Arulampalam and N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004. [15] Y. Shi and E. Chang, “Spectrogram-Based Formant Tracking via Particle Filters”, IEEEICASSP ’03, Hong Kong, vol. 1, pp. 168-171, April 2003. [16] J. Vermaak, C. Andrieu, A. Doucet and S. J. Godsill, “Particle Methods for Bayesian Modeling and Enhancement of Speech Signals”, IEEE Trans. on Speech and Audio Processing, vol. 10, no. 3, pp. 173-185, March 2002.
Appendix A
Karhunen Loeve Transform
Let us consider N consecutive samples of a random sequence $\{y(k)\}_{k=0,\ldots,N-1}$. Our purpose is to decompose this sequence over a set of independent orthonormal functions $\Psi_i(k)$:

$y(k) = \sum_{i=1}^{N} k_i\,\Psi_i(k)$   [A.1]
As a preamble, let us define the various notations we will use in the rest of this analysis. First, the functions $\Psi_i(k)$ are orthonormal:

$\sum_{k=0}^{N-1} \Psi_j(k)\,\Psi_i^*(k) = \delta_{ij}$   [A.2]

where $\delta_{ij}$ denotes the Kronecker symbol. It is equal to 1 when i = j and 0 otherwise. The above equation can equivalently be written in matrix form as follows:

$\Psi_j^T\,\Psi_i^* = \delta_{ij}$   [A.3]
with:

$\Psi_i = [\Psi_i(0) \; \Psi_i(1) \; \cdots \; \Psi_i(N-1)]^T$   [A.4]

where $\Psi_i^*$ denotes the complex conjugate of $\Psi_i$. Moreover, the projection coefficients $k_i$ satisfy the following relation:

$k_i = \sum_{k=0}^{N-1} y(k)\,\Psi_i^*(k)$   [A.5]

The above equation can be written as follows:

$k_i = \Psi_i^{*T}\,y$   [A.6]

where:

$y = [y(0) \; y(1) \; \cdots \; y(N-1)]^T$   [A.7]
The coefficients $k_i$ are random variables because the sequence $\{y(k)\}_{k=0,\ldots,N-1}$ is itself random, but the functions $\Psi_i(k)$ are deterministic. The coefficients are furthermore required to be mutually uncorrelated:

$E\{k_i\,k_j^*\} = \vartheta^2\,\delta_{ij}$   [A.8]

where $\vartheta \neq 0$.
Combining equations [A.6] and [A.8], we obtain:

$E\{k_i\,k_j^*\} = \Psi_i^{*T}\,E\{y\,y^{*T}\}\,\Psi_j = \Psi_i^{*T}\,R_y\,\Psi_j = \vartheta^2\,\delta_{ij}$   [A.9]

where $R_y$ denotes the N×N autocorrelation matrix of the process y. At that stage, let us introduce the following vector:

$u_j = R_y\,\Psi_j$   [A.10]
Equation [A.9] shows that the vector $u_j$ is orthogonal to all vectors $\Psi_i$ for $i \neq j$. It is therefore necessarily collinear with $\Psi_j$, and thus we have:

$u_j = R_y\,\Psi_j = \lambda_j\,\Psi_j$ for all j = 1,…, N.   [A.11]
The Karhunen-Loeve transform thus decomposes the vector of N samples of the random process over the set of eigenvectors of the autocorrelation matrix of the process vector. In the following, some brief observations on the interpretation of the eigenvalues of the correlation matrix are provided. Let us suppose that $y(k)$ is approximated using a decomposition over M < N functions. The error sequence of such an approximation is defined as follows:

$\varepsilon(k) = y(k) - \hat{y}(k) = y(k) - \sum_{i=1}^{M} k_i\,\Psi_i(k)$   [A.12]
This approximation is based on the minimization of the following error:

$\Xi = E\left\{\sum_{n=0}^{N-1}\varepsilon^2(n)\right\} = E\{\varepsilon^{*T}\varepsilon\} = E\left\{\left(\sum_{i=M+1}^{N} k_i^*\,\Psi_i^{*T}\right)\left(\sum_{j=M+1}^{N} k_j\,\Psi_j\right)\right\} = \sum_{i=M+1}^{N} E\{|k_i|^2\}$   [A.13]
by introducing the column vector y of the N samples of the sequence, expressed as:

$y = \sum_{i=1}^{M} k_i\,\Psi_i + \sum_{i=M+1}^{N} k_i\,\Psi_i = \hat{y} + \varepsilon$   [A.14]

and using equation [A.6], equation [A.13] becomes:

$\Xi = \sum_{i=M+1}^{N} \Psi_i^{*T}\,R_y\,\Psi_i$   [A.15]
To express the mean square error [A.13] in terms of the correlation matrix of the process vector, we use equation [A.11], which stipulates that $\Psi_i$ is an eigenvector of $R_y$:

$\Xi = \sum_{i=M+1}^{N} \Psi_i^{*T}\,R_y\,\Psi_i = \sum_{i=M+1}^{N} \Psi_i^{*T}\,\lambda_i\,\Psi_i = \sum_{i=M+1}^{N} \lambda_i$   [A.16]
Thus, the higher the eigenvalue is, the greater the role played by the associated eigenvector when carrying out the decomposition of the vector.
Application

Let us consider a signal modeled by a complex exponential function $A\exp(j\theta_0 k)$ with normalized angular frequency $\theta_0$. Then, let this signal be disturbed by a zero-mean white Gaussian noise $b(k)$ with variance $\sigma^2$, uncorrelated with the signal:

$y(k) = A\,s(k) + b(k) = A\exp(j\theta_0 k) + b(k)$   [A.17]

with $A = |A|\exp(j\phi)$ and where the phase $\phi$ is uniformly distributed over the range [0, 2π]. The autocorrelation matrix of the associated N×1 column vector satisfies the following:
$R_y = E\left\{\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1)\end{bmatrix}\begin{bmatrix} y(0) & y(1) & \cdots & y(N-1)\end{bmatrix}^*\right\} = E\{|A|^2\}\begin{bmatrix} 1 \\ \exp(j\theta_0) \\ \vdots \\ \exp(j(N-1)\theta_0)\end{bmatrix}\begin{bmatrix} 1 & \exp(j\theta_0) & \cdots & \exp(j(N-1)\theta_0)\end{bmatrix}^* + \sigma^2 I_N$   [A.18]

We can now determine the eigenvalues of the correlation matrix $R_y$. In fact, given that:

$\begin{bmatrix} 1 & \exp(j\theta_0) & \cdots & \exp(j(N-1)\theta_0)\end{bmatrix}^*\begin{bmatrix} 1 \\ \exp(j\theta_0) \\ \vdots \\ \exp(j(N-1)\theta_0)\end{bmatrix} = N$   [A.19]
we obtain:

$R_y \begin{bmatrix} 1 \\ \exp(j\theta_0) \\ \vdots \\ \exp(j(N-1)\theta_0)\end{bmatrix} = \left(E\{|A|^2\}\,N + \sigma^2\right)\begin{bmatrix} 1 \\ \exp(j\theta_0) \\ \vdots \\ \exp(j(N-1)\theta_0)\end{bmatrix}$   [A.20]

This shows that $E\{|A|^2\}\,N + \sigma^2 > \sigma^2 > 0$ is an eigenvalue of $R_y$ associated with the eigenvector $\Psi_1 = [1 \; \exp(j\theta_0) \; \cdots \; \exp(j(N-1)\theta_0)]^T$. This eigenvector defines the signal subspace. All the other eigenvectors $\Psi_j$ of $R_y$ are thus orthogonal to $\Psi_1$.
We thus have:

$R_y\,\Psi_j = \left[E\{|A|^2\}\,\Psi_1\,\Psi_1^{*T} + \sigma^2 I_N\right]\Psi_j = E\{|A|^2\}\,\Psi_1\left(\Psi_1^{*T}\Psi_j\right) + \sigma^2\,\Psi_j = 0 + \sigma^2\,\Psi_j$   [A.21]
From the above equation, we can conclude that the N-1 remaining eigenvalues are identical to one another and equal to the additive noise variance. Therefore, we can decompose the observation space into two subspaces: the “signal” subspace and the “noise” subspace.
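As a quick numerical check of this subspace structure (our own illustration, with arbitrary values of N, θ₀, E{|A|²} and σ²):

```python
import numpy as np

N, theta0, var_A, var_b = 8, 0.3 * np.pi, 2.0, 0.5          # assumed example values
e = np.exp(1j * theta0 * np.arange(N)).reshape(-1, 1)       # [1, e^{j*theta0}, ...]^T
Ry = var_A * (e @ e.conj().T) + var_b * np.eye(N)           # matrix of [A.18]

eigenvalues = np.sort(np.linalg.eigvalsh(Ry))[::-1]
print(np.round(eigenvalues, 3))
# one eigenvalue close to var_A * N + var_b = 16.5 (signal subspace),
# the remaining N-1 close to var_b = 0.5 (noise subspace)
```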
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix B
Subspace Decomposition for Spectral Analysis
M ®T i ¯
Let us consider the case of a random stationary process y(k) defined as a sum of complex exponentials of the normalized angular frequency fi ½ i.e.: 2S ¾ f ech ¿ i 1,..., M M
y (k )
¦ Ai exp jT i k b(k )
[B.1]
i 1
with Ai
Ai exp( jIi ) where the phases Ii are uniformly distributed over the
range [0 2ʌ] and independent of each other. Process b(k ) is a zero-mean stationary white Gaussian noise with a variance V 2 . The autocorrelation of y(k) is thus: r yy (W )
M
¦ M i exp jT iW V 2G (W )
[B.2]
i 1
where M i denotes the variance of Ai . Let us now concatenate p+1 consecutive samples of the process in a vector and define the corresponding autocorrelation matrix as follows:
342
Modeling, Estimation and Optimal Filtering in Signal Processing
R y ( p 1)
ª r yy (0) r yy (1) « r (1) r (0) yy « yy « # % « « # «r yy ( p ) " ¬
" % % % "
r yy ( p )º # »» % # » » % r yy (1) » r yy (1) r yy (0) »¼ "
[B.3]
Defining matrix S i as follows: Si
ª1 exp( jT ) exp( j 2T ) " exp( jpT )º i i i » «¬ ¼
T
.
[B.4]
and combining equations [B.2], [B.3] and [B.4], we can alternatively express the process vector autocorrelation matrix by carrying out an eigenvalue decomposition as follows: M
Ry
¦ M i S i S i H V 2 I p 1 i 1
PDP
1
>< 1
" < N 1
" ªO1 0 «0 % % < N @« « # % O N 1 « 0 ¬0 "
0 º # »» 1 P 0 » » ON ¼
[B.5]
Using the eigenvalues Oi of the process vector autocorrelation matrix and arranging them in decreasing order, the observation space can be split into two subspaces: – the first, associated with the signal, i.e. the complex exponentials, characterized by the eigenvectors related to the highest eigenvalues; – the second, associated with the additive noise, characterized by the eigenvalues equal to the variance of the noise. When studying the signal’s (M+1)u(M+1) correlation matrix, as first proposed by Pisarenko in 1973, the noise subspace is characterized by the eigenvector < M 1
><M 1 (1) 2
" <M 1 ( M 1)@T
corresponding to the
smallest eigenvalue O M 1 V . At that stage, the frequency components of the M complex exponential can be easily characterized because all vectors
Appendices
ª1 exp( jT ) exp( j 2T ) " exp( jpT )º i i i » «¬ ¼ < M 1 . This means that: Si
< M 1 S i
< M 1T S i *
T
343
are necessarily orthogonal to
M
¦ <M 1 §¨© k 1·¸¹ exp( jkT i )
[B.6]
0
k 0
The above condition can alternatively be expressed as follows: M
¦ <M 1 k 1 z k
k 0
[B.7]
0 z exp jT i
Consequently, if we now introduce the so-called “eigenfilter” E M 1 z , whose coefficients correspond to the elements of the eigenvector < M 1 , condition [B.7] consists of finding the roots of the “eigenfilter” on the unit circle in the z-plane. EM 1 z z
M
exp jT i
¦ <M 1k 1 z k
k 0
[B.8]
0 z exp jT i
Thus, we define the “pseudo-spectrum” Pˆ pisarenko §¨ exp( jT ) ·¸ as follows: ¹ ©
Pˆ pisarenko exp( jT )
1 E M 1 exp( jT )
1 2
E M 1 z E M 1* §¨ 1 * ·¸ © z ¹
[B.9] z exp( jT )
This subspace decomposition is the central idea behind Pisarenko’s “harmonic decomposition” method. The results are significantly improved by using the MUSIC method, which was developed by Kumaresan and Tufts who introduced a minimumnorm procedure [1]. The MUSIC procedure consists of using an autocorrelation correlation matrix of the process, with a size N always greater than M+1. The noise subspace is then characterized by eigenvectors associated with eigenvalues O M 1 O M 2 ... O N V 2 whereas the signal subspace is characterized by the M predominant eigenvalues. The “pseudo-spectrum”, denoted PMU , is thus expressed as a function of eigenfilters ^E i z `i M 1,..., N as follows:
344
Modeling, Estimation and Optimal Filtering in Signal Processing
PMU exp( jT )
1 § ¨ ¨ ©i
· E i z E i ¨ 1 * ·¸ ¸¸ © z ¹¹ M 1 N
¦
[B.10]
*§
z exp( jT )
The main difficulty in the implementation of the above methods lies in the determination of the dominant eigenvalues. References [1] R. Kumaresan and D. W. Tufts, “Estimating the Angles of Arrival of Multiple Plane Waves”, IEEE Trans. on Aerospace and Electronic System, AES vol. 19, no. 1, January 1983.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix C
Subspace Decomposition Applied to Speech Enhancement
The subspace method can potentially be used in the field of speech enhancement using a single microphone. In this context, our purpose is to estimate the signal s k using observations disturbed by a white additive noise bk . Instead of directly operating on the correlation matrix, an alternative procedure is to carry out the singular value decomposotion of the noisy observations’ Hankel matrix. This algorithm consists of three steps: 1) First we construct the L u M Hankel matrix H y using the noisy data
^yk `k
1,..., N
Hy
^sk bk `
k 1,..., N
as follows:
y 2 " y M º ª y1 « y 2 # # »» « « # # # » « » y L " y N 1 » « y L 1 «¬ y L y L 1 " y N »¼
[C.1]
All the elements of the anti-diagonal in the Hankel matrix are equal to one another. L and M are such that L M N 1 . Moreover, we choose L !! M .
346
Modeling, Estimation and Optimal Filtering in Signal Processing
2) Then, the least squares estimate of the signal subspace, i.e. H s LS , can be obtained only by considering the K dominant singular values of the observation Hankel matrix H y . The criterion to be considered for this step is:
H sLS
where
of rank K
H
F
2
H y H sLS
min
[C.2]
F
is the Frobenius1 norm of matrix H .
Given that: U6V T
Hy
>U 1,K
ª6 s U K 1, M « 1, K ¬« 0
@
where U R LuM , 6 R M uM V K 1,M R
6
M uM K
ª6 1s, K « «¬ 0
º ª V1, K T º » »« 6 bK 1, M ¼» ¬«V K 1, M T ¼» 0
and V R M uM , and where U 1, K R Lu K
[C.3]
and
and:
0 6 bK 1, M
º » »¼
ªV 1 0 " «0 % % « « # % VK « % «# «# « ¬« 0 " "
"
0 º # »» # » ». % # » % 0 » » 0 V M ¼» "
%
V K 1 % "
[C.4]
we obtain:
H s LS
U 1, K 61s, K V1T, K
[C.5]
We can also obtain a minimum-variance estimation of H s , denoted H s MV [1]. To do this, we search for the best estimation of H s that can be obtained using a 1 If h i , j is the coefficient of the i
th
row and jth column in matrix H of size L u M , the Frobenius norm of matrix H verifies the following relation: 1/ 2
H
F
§ L M · ¨ hi, j 2 ¸ ¦ ¦ ¨i 1 j 1 ¸ © ¹
traceH H T
1/ 2
Appendices
347
linear combination of the observations H y . This approach consists of finding a matrix such that: min
X size M u M
H y X Hs
2
[C.6]
F
Equivalently: min
X size M u M
>traceH X
T
y
H y X H sT H s 2 H y X
T H s @
[C.7]
Considering criterion [C.6] and equation [C.7], this leads to finding X such that: w trace
H X
T
y
H y X H sT H s 2 H y X wX
T H s
0
[C.8]
thus: 2H y T H y X 2H y T H s
[C.9]
0
Thus: X
H
y
T
Hy
1
H yT H s
[C.10]
The minimum-variance estimation H s MV of H s satisfies the following: H s MV
HyX
H y H yT H y
1
H yT H s
[C.11]
It should be noted that this is the orthogonal projection of H s on the column space of H y . Thus, using the decomposition of H y into singular values, we obtain:
H s MV
U 1, K FMV 61s, K V1, K T
[C.12]
348
Modeling, Estimation and Optimal Filtering in Signal Processing
where:
FMV
ª V noise 2 «1 V 12 « « « 0 « « # « « 0 « ¬
0 1
V noise 2 V 22
% "
º » » » » % # » » % 0 2» V 0 1 noise2 » V K »¼ "
0
[C.13]
3) Finally, the averaging of the anti-diagonal elements of matrices H s LS or H s MV is carried out to restore the structure of the signal data’s Hankel matrix and to obtain an estimation of the signal.
The enhanced signals might present “musical” noise in which case perceptual criteria can be added to improve the quality of the enhanced signal [2] [3] [4] [5]. References [1] B. De Moor, “The Singular Value Decomposition and Long Short Spaces of Noisy Matrices”, IEEE Trans. on Signal Processing, vol. 41, no. 9, September 1993. [2] Y. Ephraim and H. L. Van Trees, “A Signal Subspace Approach for Speech Enhancement”, IEEE Trans. Speech Audio Processing, vol. 3, no. 4, pp. 251-266, July 1995. [3] M. Klein and P. Kabal, “Signal Subspace Speech Enhancement with Perceptual Post Filtering”, IEEE-ICASSP ’02, Orlando, Florida, USA, vol. no. 1, pp. 537-540, 13-17 May 2002. [4] U. Mittal and N. Phamdo “Signal/Noise KLT Based Approach for Enhancing Speech Degraded by Colored Noise” IEEE Trans. on Speech Audio Processing, vol. 8, pp. 159167, March 2000. [5] H. Yi and P. C. Loizou, “A Generalized Subspace approach for Enhancing Speech Corrupted by Colored Noise”, IEEE Trans. on Speech and Audio Processing, vol. 11, no. 4, July 2003.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix D
From AR Parameters to Line Spectrum Pair
In this appendix, we recall the alternatives to the AR modelling, such as those based on log area ratio functions. These functions ensure that the system is stable even after quantification. We also recall other alternatives such as the “line spectrum pairs” (LSP) and the “immitance spectrum pairs” (ISP), both of which are widely used in speech coding standards. We also consider the cepstrum coefficients and the linear prediction cepstrum coefficients (LPCC), both essentially used in speech recognition. In the field of speech coding, if the AR parameters obtained using the YuleWalker equations are transmitted, they have to be quantified by a sufficiently high number of bits, generally about ten, to ensure the exact fit of the spectral envelope of the signal under analysis. Nonetheless, the calculation cost of this procedure is quite high. Due to this high cost, an alternative approach, called the Durbin-Levinson algorithm, is often adopted. The PARCOR reflection coefficients are less sensitive to quantification. To further lower this sensitivity, we can also use the log area ratio functions defined as follows: LARi
log 10
1 Ui 1 Ui
[D.1]
These coefficients can be understood as the bilinear transformation of the reflection coefficients U i . It should be noted that the reflection coefficients can be
350
Modeling, Estimation and Optimal Filtering in Signal Processing
expressed in terms of the elements LARi with a hyperbolic tangent mean. Thus, irrespective of the values of the log area functions LARi , the module of U i is always lower than 1: expln 10 LAR i 1 expln 10 LAR i 1
Ui
§ ln 10 LAR i tanh ¨¨ 2 ©
· ¸¸ ¹
[D.2]
When implementing equation [D.2], to avoid the division operation and log calculation, the following approximation is considered: U i if U i 0.675 ° ®sgn U i u 2 U i 0.675 if 0.675 d U i 0.950 ° sgn U u 8 U 6.375 if 0.950 d U d 1 i i i ¯
LARi
[D.3]
The inverse transform that makes it possible to obtain the reflection coefficient from the Log Area Ratio is given by:
Ui
LARi if LARi 0.675 ° [D.4] sgn u 0 . 5 LAR LAR ® i i 0.3375 if 0.675 d LARi 1.225 °sgn LAR u 0.125 LAR 0.796875 if 1.225 d LAR d 1.625 i i i ¯
The LSP, first introduced in 1975, have been used in speech coding. To introduce these pairs, the two following polynomials are defined: Pz
¦ a z ¦ a z p
Az z p 1 A z 1
p
i
i
i
i 0
p
¦ ai z
i
i 0
Q §¨ z ·¸ © ¹
p 1
¦ a p 1 m z
m
m 1
p
¦ ai z i 0
i
p 1
¦ a p 1m z
m 1
i 0
p
m
>
@
1 ¦ a m a p 1 m z m 1
§¨ p 1 ·¸ § · A§¨ z ·¸ z © ¹ A¨ z 1 ¸ © ¹ © ¹
i p 1
p
¦
a i z i
i 0
1
[D.5] m
z
p 1
p
¦ ai z i p 1 i 0
p
¦
m
§
·
ªa a º z m z ¨© p 1¸¹ m p 1 m « »¼ 1¬
[D.6]
On the one hand, polynomials Pz and Qz , one symmetric and the other antisymmetric, satisfy the following condition:
Appendices
P z Q z
Az z p 1 A z 1 Az z p 1 A z 1
2 Az
351
[D.7]
On the other hand, Pz and Qz have the following properties: – Property 1: roots zi of both polynomials are all located on the unit circle in the
z-plane. This means that z i
exp jT i where T i
2S
fi is the normalized angular fs
frequency with respect to the sampling frequency f s . – Property 2: for any stable inverse filter Az , the roots of Pz and Qz are intertwined. – Property 3: for any root of Az located close to the unit circle, there are two roots of Pz and Qz on either side of this root. Finally, we can look at the cepstral2 approach which makes it possible to obtain the spectral envelope of the signal. Let the logarithm function of a complex number z be defined as follows: ln z ln z exp jM ln z jM
[D.8]
The complex cepstrum ~ x k and the cepstrum C k of a signal xk can be defined respectively as follows: 1 2S
~ x k TF 1 ln TF xk
C k TF 1 ln TF xk
S
³ S ln X exp jT exp jkT dT
1 2S
³ S ln X exp jT exp jkT dT S
[D.9]
[D.10]
Two major properties of the cepstrum can be pointed out: – first, the complex cepstrum and the cepstrum are real when signal xk is real; – secondly, they are related by the following equation: C k
1 ~ >x k ~x k @ 2
[D.11]
2 As we will see in the following the names are anagrams of terms used in the frequency
domain. For example, “spectrum” is changed to “cepstrum”, “frequency” to “quefrency”, “filtering” to “liftering”, etc.
352
Modeling, Estimation and Optimal Filtering in Signal Processing
This implies that: C 0 ~ x 0
[D.12]
Let the z-transform of the signal xk be defined as follows:
1 z i z 1 p
X z K
i 1 q
1 p i z
with z i 1 and p i 1
[D.13]
1
i 1
If the unit circle in the z-plane lies within the convergence domain of X z , the
Fourier transform of xk satisfies X f f
~ x k
1 2S
§ ¨ ¨ TF 1 ln¨ K ¨ ¨ ¨ ©
³
§ ¨ ln K S ¨ ¨ © S
1 z i z 1 p
i 1 q
1 pi z 1 i 1
¦ p
z
X z z
exp( jT )
. We can show that:
· ¸ ¸ ¸ ¸ ¸ ¸ exp( jT ) ¹
¦ ln1 p z
ln 1 z i z 1
i 1
q
i
1
i 1
z
· ¸ ¸¸ exp jkT dT exp( jT ) ¹ [D.14]
However, since the development of the log series around 1 gives :
ln 1 z i z 1
¦ z f
m 1
m m i z
m
,
[D.15]
equation [D.14] is modified to:
~ x k TF
1 2S
S
1
§ ¨
§ p ¨ 1 z i z 1 ¨ 1 i ln ¨ K q ¨ 1 ¨ 1 pi z ¨ i 1 © p
f
¦ ³ S ¨¨ ln K ¦ i 1 m 1 ©
z
· ¸ ¸ ¸ ¸ ¸ ¸ exp( j T ) ¹
q z im z m ¦ m i 1
f
¦
m 1
p im z m m
z exp(
· ¸ ¸¸ exp j T k d T jT ) ¹
Appendices
353
Let us then consider the cases where k is respectively zero, positive and negative; The complex cepstrum has the following approximate form: ~ x 0 ln K p °° z ik q p ik ~ ¦ for k ! 0 ® x k ¦ i 1 k i 1 k ° ~ °¯ x k 0 for k 0
[D.16]
We have already seen that:
³1 / 2 ln¨© Ae 1/ 2
§
j 2Sf
2
· ¸df ¹
[D.17]
0
where the zeros of A(z ) lie within the unit circle in the z plane i.e., A(z ) has the following form: Az
1 z i z 1 with p
zi 1
[D.18]
i 1
Given equation [D.10], the above relation is similar to considering the cepstrum to be zero. Thus: S S jT jT ³ S ln Ae dT ³S ln Ae exp( jkT ) k
0
dT
C 0 0
[D.19]
In practice, the cepstral analysis can be used to estimate the pitch of a voiced speech signal. In speech recognition, however, the Mel scale frequency cepstral coefficients (MFCC) are used. They are derived from the outputs of a filter bank uniformly distributed on the Mel scale3, which takes proper account of the psychoacoustic nature of the ear.
3
Mel f
Mel f
1000 § f · ¨1 ¸ where f is expressed in Hz or log2 © 1000 ¹
f · § 2595 log10 ¨1 ¸ © 700 ¹
354
Modeling, Estimation and Optimal Filtering in Signal Processing
Moreover, the calculation of the cepstral coefficients can also be based on the calculation of the signal AR coefficients. The linear prediction cepstral coefficients (LPCC) are thus introduced. If the signal is modeled using a pth order AR process, the AR coefficients and the LPCC coefficients satisfy the following relation suggested by Atal: i 1
k ck a i k for i 1,..., p 1 i
ci a i ¦ k
i 1
ci k
k ck a i k for i ! p i p i
¦
where coefficients c i are equivalent to the complex cepstrum.
[D.20]
[D.21]
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix E
Influence of an Additive White Noise on the Estimation of AR Parameters
Let y k be a pth-order autoregressive process disturbed by an additive noise bk which is independent of the driving process u k i.e.: y (k )
p
¦ a i y (k i) u (k ) i 1
z k
y k bk
[E.1]
In Chapter 2, we saw that the noisy observation z k can itself be considered as a pth-order autoregressive process. In fact, we can easily show that: z (k )
p
p
i 1 p
i 1
¦ ai z k i u(k ) b(k ) ¦ ai bk i [E.2]
¦ ai z k i E (k ) i 1
To understand and analyze the influence of the noise on the estimation of the AR parameters, Kay proposes a comparison between the spectral flatness [ y of the
process y k and that of the observation z k , i.e. [ z [1]. For any process x, the spectral flatness is defined as follows:
356
Modeling, Estimation and Optimal Filtering in Signal Processing 1/ 2 exp§¨ ³ ln S xx f df ·¸ © 1/ 2 ¹
[x
1/ 2
³1 / 2
S xx f df
Px
[E.3]
R xx 0
where S xx Z and R xx W denote, respectively, the power spectral density and the autocorrelation function of the process. When xk is an autoregressive process, it can be seen as a sequence wk filtered by a filter whose transfer function is H z 1 / A( z ) . Since the poles of H z necessarily lie inside the unit circle in the z-plane, the filter with transfer function A( z ) satisfies the following criterion4:
³1 / 2 ln¨© Ae 1/ 2
§
j 2Sf / fs
2
· ¸df ¹
[E.4]
0
This implies that:
Px
ln S xx f df ·¸ 1 / 2 ¹
§ ¨ exp¨ ¨¨ ©
§ ¨ S f ln¨ ww 2 1 / 2 ¨ A e j 2Sf ©
exp§¨ ©
³
§ exp¨ ©
³1 / 2 lnS ww f df ³1 / 2 ln¨© Ae
exp§¨ ©
³1 / 2 lnS ww f df ¸¹
1/ 2
1/ 2
1/ 2
1/ 2
1/ 2
³
§
j 2Sf
2
· ¸ ¸df ¸ ¹
· · ¸df ¸ ¹ ¹
· ¸ ¸ ¸¸ ¹
[E.5]
·
However, since wk is a zero-mean white sequence with variance V w2 , we have
S ww f V w2 . This modifies the above equation to the following:
Px
exp§¨ ©
1/ 2
· V2 w
³1 / 2 lnS ww f df ¸¹
The spectral flatness has the following three properties: – 0 d [x d1; – [x
1 if and only if S xx f is constant;
4 For further details, refer to Appendix D on the cepstrum.
[E.6]
Appendices
357
– [ x | 0 if and only if S xx f presents a sharp peak. Thus, we expect that [ z ! [ y . Let us now take up Kay’s demonstration. To ease the comparison between [ y and [ z , we first find the expressions for
R yy W , R zz W , Py and Pz . Seeing equation [E.1], we can easily show that the
autocorrelation function rzz W of the noisy observation satisfies the following relation: rzz W r yy W rbb W r yy W V b2 G W
[E.7]
The pup autocorrelation matrix of the observation z k can thus be expressed as follows:
R zz ( p)
" rzz ( p 1)º ª rzz (0) » « # % # » « «¬rzz ( p 1) " rzz (0) »¼
R yy ( p ) V b2 I p
[E.8]
Similarly, the autocorrelation vectors of the process and the observations verify the following equality:
R zz ( p)
ª rzz (1) º « # » » « «¬rzz ( p)»¼
ª r yy (1) º » « « # » «r yy ( p )» ¼ ¬
[E.9]
R yy ( p )
The least squares estimation Tˆ z , obtained by applying equation [2.129] to the
noisy observations, satisfies the following condition: R zz ( p)Tˆ z
>R
2 yy ( p ) V b I p
@Tˆ
z
R zz ( p )
R yy ( p )
[E.10]
Re-arranging the above equation, we obtain the following expression:
Tˆ z
>
R yy ( p ) V b2 I p
@
1
R yy ( p)
[E.11]
358
Modeling, Estimation and Optimal Filtering in Signal Processing
However, the least squares estimation Tˆ obtained by applying equation [2.129] to the noiseless observations would verify the following condition: R yy ( p )Tˆ
R yy ( p)
[2.129]
Now, combining equations [E.11] and [2.129], we obtain:
Tˆ z
>R
>R
@
1 2 R yy ( p)Tˆ yy ( p ) V b I p
[E.12]
If we apply the matrix inversion lemma, introduced in section 2.2.2, to
@
1 2 , yy ( p ) V b I p
Tˆ z
we obtain:
>
@
ª R 1 ( p ) V 2 R 1 ( p ) I V 2 R 1 ( p ) 1 R 1 ( p)º R ( p )Tˆ b yy p b yy yy «¬ yy »¼ yy
>
Tˆ V b2 R yy 1 ( p ) R yy ( p) R yy ( p ) V b2 I p
@
1
Tˆ
[E.13]
Tˆ V b2 R zz 1 ( p) Tˆ Taking the expectation of the above relation [E.13] and recalling that the parameter vector is denoted by T [ a 1 " a p ] T , we obtain the following difference:
^ `
E Tˆ z T
V b2 R zz 1 ( p) T
^ `
because in that case, E Tˆ
[E.14]
T.
At that stage, let us carry out the eigenvalue decomposition of the observation autocorrelation matrix:
R yy ( p )
PDP T
PDP 1
where P is constructed using the unitary eigenvectors Vi associated with the eigenvalues Oi . The above break-up simplifies equation [E.12] as follows:
Tˆ z
>PDP V I @ P>D V I @ P 1
1 2 PDP 1 Tˆ b p
1 2 b p
1
PDP 1 Tˆ
>P>D V I @P @ P>D V I @ DP 2 b p
1 2 b p
1 1
PDP 1Tˆ
1
Tˆ
P'P 1 Tˆ
[E.15]
Appendices
where ' is a diagonal matrix whose ith element is equal to
Oi O i V b2
359
.
Using equation [2.74], the minimum square prediction error E p can be expressed as follows: Ep
¦
y 2 (k )
k
p
¦ aˆ i ¦ y(k ) y(k i) i 1
[E.16]
k
Thus: Py
r yy 0
p
¦ aˆ i ryy i
T r yy 0 Tˆ R yy ( p )
[E.17]
i 1
As concerns the minimum quadratic prediction error associated with the noisy observation, we have: Pz
rzz 0 Tˆ z T R zz ( p)
r yy 0 V b2 Tˆ z T R yy ( p )
[E.18]
From equations [2.129], [E.17] and [E.18], we can easily show that: Pz
T T ryy 0 V b2 Tˆ z T R yy ( p) Tˆ R yy ( p) Tˆ R yy ( p)
Tˆ
Tˆ
T Py V b2 Tˆ z Tˆ R yy ( p)
Py V b2
T
z
[E.19]
R yy ( p)Tˆ
Taking into account equation [E.15], which expresses Tˆ z in terms of Tˆ , we obtain: Pz
>
@ P'P I PDP Tˆ I P'P PDP Tˆ
T Py V b2 P'P T I Tˆ R yy ( p )Tˆ
Py V b2 Tˆ Py V b2
T
T Tˆ
T Py V b2 Tˆ /Tˆ
T
T
T
T
T
[E.20]
360
Modeling, Estimation and Optimal Filtering in Signal Processing
Note that /
I P'P PDP T
I P'P PDP T
/
T
T
is a symmetric matrix, and: PI ' DP T
PDP T P'DP T
[E.21]
As ' and D are diagonal matrices, we have:
PI ' DP
T T
/T
Moreover, § · ¨1 O i ¸O i ¨ O V 2 ¸ i b ¹ ©
I ' D
P I ' DP T
/
[E.22]
is a diagonal matrix whose ith element is equal to
O i V b2
/
is a positive definite matrix.
Py V b2 H with H
Tˆ /Tˆ ! 0 . Expressing [ z using
O i V b2
Consequently, Pz
PD T I 'T P T
! 0 . Thus,
T
equation [E.20], we obtain:
Py V b2 H
[z
rzz 0
!
Py V b2 H r yy 0 V b2
r yy 0 [ y V b2 [ y H r yy 0 V b2
[y
r yy 0 [ y V b2 H r yy 0 V b2
H
r yy 0 V b2
[E.23]
![y
The spectral flatness associated with the noisy observations is therefore greater than that associated with the AR process. References [1] S. M. Kay, “Noise Compensation for Autoregressive Spectral Estimates”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 28, no. 3, pp. 292-303, June 1980.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix F
The Schur-Cohn Algorithm
In this appendix, our aim is to present the Schur-Cohn algorithm [1] which is often used as a criterion for testing the stability of bounded-input bounded-output systems [2]. To simplify the description of this algorithm, we first take up the analysis of the stability domain of a 2nd-order transfer function. This particular case leads to a simplification of the stability criteria imposed on the denominator of the transfer function. Unfortunately, it cannot be applied to transfer functions of an order greater than 2. We also present the Schur-Cohn stability algorithm based on the transfer function of an all-pass filter, allowing us to establish equivalence relation between the Schur coefficients and the reflection coefficients. Let there be a second-order transfer function defined as follows: H 2 z
1 a1 z
1
The poles of H 2 z
a2 z
2
[F.1]
1 b1 z 1 b2 z 2
1 p z 1 p z are equal to: 1
p1
N 2 z D2 z
1 b1 z 1 b2 z 2
1
2
1ª a1 a12 4a 2 º and p 2 »¼ 2 «¬
1
1ª a1 a12 4a 2 º »¼ 2 «¬
[F.2]
362
Modeling, Estimation and Optimal Filtering in Signal Processing
and its zeros are defined as follows:
z1
1ª b1 b12 4b2 º and z 2 »¼ 2 «¬
1ª b1 b12 4b2 º »¼ 2 «¬
[F.3]
Depending on the values taken by a1 and a 2 , the poles can be real or complex. For example, when a12 4a 2 , the poles are complex conjugates of each other. Otherwise, they are real. To ensure stability, the poles of the transfer function must be located within the unit circle in the z-plane, i.e.: p1 1 and p2 1
[F.4]
This constraint implies that the following two inequalities are satisfied: a2
p1 p 2 d p1 p 2 1
[F.5]
and: a1 d 1 a 2
[F.6]
Relations [F.5] and [F.6] make it possible to define a triangle in the ( a1 , a 2 ) plane where the filter is stable and which is called the stability triangle. This triangle depicted in Figure F.1. is a simple tool for testing the stability as it is based on the values of the filter’s coefficients.
Appendices
363
Figure F.1. The stability triangle
Application of the stability triangle Let there be a pth-order transfer function defined as follows: H p z
N p z
[F.7]
D p z p
where D p z 1 ¦ a ip z i i 1
p
¦
a0p 1 i 0
a ip z i
1 pi z 1 . p
i 1
The first condition required for the stability is expressed in terms of a pp : a pp
p
1 p p i
1 since p i 1 i 1,.., p
[F.8]
i 1
In the rest of this stability test, we will take k p
a pp and assume that the first
condition [F.8] is satisfied. Let us develop the transfer function of a pth-order all-pass filter using D p z . p
~ H p z
~ D p z D p z
¦ a ppi z i i 0 p
¦ i 0
[F.9] a ip z i
364
Modeling, Estimation and Optimal Filtering in Signal Processing
~ Furthermore, we define H p 1 z as follows: ~ H p 1 z
z
~ H p z k p ~ 1 k p H p z
z
~ H p z a pp ~ 1 a pp H p z
(a pp1 a pp a1p ) (a pp 2 a pp a 2p ) z 1 ... (1 (a pp ) 2 ) z p 1
[F.10]
(1 (a pp ) 2 ) ... (a pp 2 a pp a 2p ) z p 2 (a pp1 a pp a1p ) z p 1
Note 1: ~ We note that H p 1 z is also an all-pass filter of order p-1, and its expression
can be simplified by imposing: a ip 1
(a ip a pp a ppi )
1
(1 (a pp ) 2 )
(1 k 2p )
a ip
kp (1 k p2 )
a ppi i 1,.., p 1
[F.11]
Thus, we have: ~ H p 1 z
a pp11 z 1 ... a1p 1 z p 2 z p 1 1 a1p 1 z 1 ... a pp12 z p 2 a pp11 z p 1
[F.12]
~ From equation [F.10], we note that the poles of H p 1 z , ~ pi , are such that: ~ 1 a pp H p ~ pi 0
[F.13]
Thus, taking equation [F.8] into account, we obtain: ~ H p ~ pi ! 1
[F.14]
~ We now show that satisfying the assertions “ H p z is the transfer function of a ~ stable all-pass filter” and “ k p2 1 ” is equivalent to saying that “ H p 1 z is stable”. ~ To do this, we first show that if “ H p z is the transfer function of a stable all~ pass filter” and “ k p2 1 ”, then “ H p 1 z is the transfer function of a stable all-pass
filter”.
Appendices
365
We can easily show that any given all-pass function G z satisfies the following properties: G z 1 if z ! 1 ° ® G z 1 if z 1 ° G z ! 1 if z 1 ¯
[F.15]
~ Consequently, if H p z is in fact the transfer function of a stable all-pass filter, ~ H p ~ pi ! 1 when ~ pi 1 . However, from equation [F.14], we see that ~ ~ H p ~ p i ! 1 . Therefore, the poles ~ pi of H p 1 z lie inside the unit circle in the ~ z-plane and H p 1 z is stable. ~ Let us now assume that H p 1 z is the transfer function of a stable all-pass filter ~ and that k p2 1 . Let us take up equation [F.10] and express H p z as a function of ~ H p 1 z . We thus obtain:
~ H p z
~ k p z 1 H p 1 z ~ 1 k p z 1 H p 1 z
[F.16]
~ If O0 is a pole of H p z , it must verify: ~ k p H p 1 O0
O 0
[F.17]
As k p2 1 , we get: ~ H p 1 O 0 ! O 0
[F.18]
~ Since H p 1 z is the transfer function of a stable all-pass filter, equation [F.15]
gives us: ~ H p 1 z 1 if z ! 1
[F.19]
366
Modeling, Estimation and Optimal Filtering in Signal Processing
~ H p 1 z
1
[F.20]
~ H p 1 z ! 1 if z 1
[F.21]
1 if z
Condition [F.18] contradicts [F.19], but is in agreement with [F.21]. ~ Consequently, O 0 1 and H p z is stable. ~ Using the same development as presented above, we can define H p 2 z in ~ ~ ~ terms of H p 1 z , then define H p 3 z in terms of H p 2 z , and so on, until we ~ obtain H 0 z 1 . At each successive step, we test the value of k p 1 , then k p 2 , and ~ so on. The Schur stability criterion states that H p z is stable if k i2 1 for all
values of i. Let us now look at the correspondence between Schur coefficients and reflection coefficients. Let us take equation [F.11]: a ip 1
(a ip a pp a ppi ) (1
(a pp ) 2 )
1 (1
k p2 )
a ip
kp (1
k 2p )
a ppi i 1,.., P 1
and write it in a matrix form, taking into account that k p ª a 0p 1 º « » « # » 1 p «a » « p 1 » 0 ¬« ¼»
ª a 0p º « » kp 1 « # » p (1 k p2 ) «« a p 1 »» (1 k 2p ) «¬ a pp »¼
ª a pp º « » « # » p «a » « 1p » ¬« a 0 ¼»
a pp and a 0p
[F.11]
a 0p 1 1 :
[F.22]
Let us compare this matrix with equation [2.90], obtained using the Levinson algorithm, into which we integrate a 0p
a 0p 1 1 :
Appendices ª a 0p º « p » « a1 » « # » « p 1 » «a p » « p » ¬« a p ¼»
>
The
a 0p
ª a 0p 1 º ª 0 º « p 1 » « a p 1 » « a1 » « p 1 » « # » Up« # » « p 1 » « p 1 » « a p 1 » « a1 » « 0 » «¬ 1 »¼ ¼ ¬
following
a1p
"
D p z
367
[F.23]
polynomial
is
@:
associated
with
the
vector
T a pp
a pp1 p
¦ aip z i i 0
>a
Thus, we can associate the following polynomial with the vector p p
a pp1 " a1p
a 0p
@: T
p
¦ a ppi z i
z p D p z 1
i 0
The Schur-Cohn algorithm is written as follows: D p 1 z ª « p 1 D p 1 z 1 «¬ z
º » »¼
kp º ª 1 « » 2 (1 k p2 ) » ª D p z « (1 k p ) « » « z p D z 1 kp 1 p « z z » «¬ 2 2 «¬ (1 k p ) (1 k p ) »¼
º » »¼
[F.24]
while the expression for the Levinson algorithm satisfies: ª D p z º « p 1 » ¬« z D p z »¼
ª 1 « «¬ U p
D p 1 z º » « p 1 » D p 1 z 1 ¼» »¼ ¬« z
U p z 1 º ª z
1
[F.25]
We note that the matrices in equations [F.24] and [F.25] are the inverses of one another. Thus, we have:
368
Modeling, Estimation and Optimal Filtering in Signal Processing
kp º ª 1 « » 2 (1 k p2 ) » ª 1 « (1 k p ) « » «U kp 1 « » «¬ p z z 2 «¬ (1 k p2 ) (1 k p ) »¼
U p z 1 º z
1
» »¼
ª1 0º «0 1 » ¬ ¼
[F.26]
so: kp
Up
[F.27]
This proves the equivalence between the Schur coefficients and the reflection coefficients. References [1] T. Kailath, “A Theorem of I. Schur and its Impact on Modern Signal Processing”, Operator Theory: Advances and Applications (I. Schur Methods in Operator Theory and Signal Processing), 18, pp. 9-30, Birkhauser, 1986. [2] S. K. Mitra, Digital Signal Processing – a Computer Based Approach, 3rd edition, McGraw-Hill, 2006.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix G
The Gradient Method
This appendix is related to Chapter 4. Let us consider the specific case of a single-coefficient gradient based filter: H N k H N k 1
D wJ 2 wH N
[G.1] H N H N k 1
From Figure G.1, we see that the derivative
wJ wH N
is positive. Our H N H N k 1
task now is to update the filter coefficient using the term
Note that if the derivative is negative, value of H N k 1 increases.
D wJ 2 wH N
D wJ 2 wH N
. H N H N k 1
is positive and the H N H N k 1
370
Modeling, Estimation and Optimal Filtering in Signal Processing
Figure G.1. Reduction in the gradient for the case of a filter with one coefficient
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix H
An Alternative Way of Understanding Kalman Filtering
The orthogonal projection approach is an alternative to the algebraic method presented in Chapter 5 to obtain the equations of the Kalman filter. Given
^ e1 , " , e n ` , an orthogonal base of the subspace spanned by the observations ^y (1), " , y (n)` , xˆ (k / n) can be defined as the projection of the state vector x(k ) on the measurement subspace Y ^ y (1), " , y (n)` , see Figure H.1.
Figure H.1. Projection of the state vector on the measurement subspace
xˆ (k / n) is thus expressed as follows: xˆ (k / n)
>
E x(k ) Y
@
>
E x(k ) y (1), " , y (n)
@
[H.1]
372
Modeling, Estimation and Optimal Filtering in Signal Processing
thus: xˆ (k / n)
n
¦ E[ x(k )ei ]E[ei ei ] 1 ei
[H.2]
i 1
Note: let - be orthogonal to Y
>
i 1, 2, ! , n . Moreover, E x(k ) Y , -
^ y (1), " , y (n)`
. Thus, E[- ei ] 0 for
@ E>x(k ) Y @ E>x(k ) - @.
We can also show that xˆ (k/k ) is the maximum likelihood estimation of the state. If we assume the initial state x(0) , the driving process u(k ) and the additive noise v(k ) to be Gaussian distributions, we obtain a maximum a posteriori (MAP) estimation of the state, by maximizing the state’s probability density accounting for the k observations {y(1), y(2), ..., y(k)}. This probability density function is taken here to be the likelihood function. The assumption on the initial state, the driving process and the measurement noise allows us to obtain an optimal filter. Otherwise stated, the Kalman filter is the best linear minimum-variance filter. For further information, the reader is referred to [1] and [2]. References [1] P. S. Maybeck, Stochastic Models, Estimation and Control, vol. I, Academic Press, New York, 1979. [2] A. P. Sage and J.L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill, 1971.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix I
Calculation of the Kalman Gain using the Mehra Approach
Calculation of the autocorrelation function of the innovation sequence Let us first recall the notations, introduced in Chapter 5, which are used in this appendix and in Chapter 6. x (k /k 1) denotes the a priori estimation of the state vector at instant k: x (k /k 1) = x(k ) xˆ ( k / k 1)
[I.1]
and P(k / k 1) denotes the associated autocorrelation matrix:
^
P (k / k 1) E ~ x (k / k 1) ~ x T (k / k 1)
`
[I.2]
Furthermore, the Kalman gain at instant k is given by: K (k )
>
P(k / k 1) H T HP(k / k 1) H T R
@
1
[I.3]
whereas the innovation is defined as follows: e( k )
H~ x (k/k 1) v(k )
[I.4]
Using equation [I.4], the autocorrelation function of this innovation, ree ( j ) E^e(k )e(k j )` , is given by:
374
Modeling, Estimation and Optimal Filtering in Signal Processing
^
`
ree ( j ) = H E ~ x (k / k 1) ~ x T (k j/k j 1) H T E^v(k ) v(k j )` x (k / k 1) v(k j )` H E^~ x (k j/k j 1) v(k )` H E^~
For the particular case j
[I.5]
0 , we can show that:
ree (0) = HP(k/k-1) H T R
[I.6]
To simplify the expression of the innovation’s autocorrelation function [I.5] for j ! 0 , we first show that ~ x ( k / k 1) can be expressed as follows: j ½ x(k / k 1)= ®ĭ(I -K (k -i)H )¾ x(k j /k j 1) ¯ i =1 ¿ j i 1 ½ ¦ ®ĭ(I -K (k -l )H )¾Gu(k i+1) i 1¯l 1 ¿ j i 1 ½ ¦ ®ĭ(I -K (k -l )H )¾ĭK (k i)v(k i) i 1¯l 1 ¿
[I.7]
However, for 1 d i d j , we have:
^ E ^v(k i ) ~ x
`
E u (k i+1) ~ x T (k j/k j 1) T
`
(k j/k j 1)
0
0
[I.8] [I.9]
Taking equations [I.7], [I.8] and [I.9] into account: – the first term in the innovation’s autocorrelation function [I.5] is modified to:
^
`
T HE x (k / k 1) x (k j / k j 1) H T
j ½ =H ® ĭ(I K (k i )H ) ¾ P(k j /k j 1)H T ¯ i =1 ¿
[I.10]
– the third term of equation [I.5] verifies:
j 1 ½ H E ^ x (k / k 1) v(k j )` = H ® ĭ (I K (k i )H ) ¾ ĭK (k j )R ¯i 1 ¿
[I.11]
Appendices
375
x (k j/k j 1) v(k )` of equation [I.5] is zero; – the last term H E^~
– and finally, the second term can be written as follows: E^v(k )v(k j )`= RG ( j ) .
[I.12]
Given the above observations, for j ! 0 , the expression of the innovation’s autocorrelation function is modified to: j ½ ree ( j ) = H ® ĭ ( I K ( k i ) H ) ¾ P ( k j / k j 1) H T =1 i ¯ ¿
[I.13]
j 1 ½ H ® ĭ ( I K (k i ) H ) ¾ ĭK ( k j ) R ¯i1 ¿
thus: ree (j ) j 1 ½ H ® ĭ(I K (k i )H ) ¾ ĭ ª¬(I K (k j )H ) P(k j /k j 1)H T K (k j )R º¼ 1 i ¯ ¿
[I.14]
We can thus reformulate the innovation’s autocorrelation function as follows: j -1 ½ ree ( j ) = H ® ĭ (I K (k i)H ) ¾ ĭ ¯ i =1 ¿ ª P(k j / k j 1)H T K (k j ) HP(k j / k j 1)H T R º ¬ ¼
[I.15]
Iterative method for obtaining the optimal Kalman gain When the Kalman algorithm converges, we have: lim K ( k )
k of
K and lim P (k/k-j ) k of
P for all integer values of j
[I.16]
Using the expression for the value ree (0) of the innovation’s autocorrelation function [I.6], equation [I.15] is expressed as follows: ree ( j ) = H > ĭ (I KH )
@
j -1
ĭ ª¬ PH T Kree (0) º¼ for j ! 0
[I.17]
Combining the expressions ree j for 1 j p , we obtain the following matrix relation:
376
Modeling, Estimation and Optimal Filtering in Signal Processing
ª ree (1) º « r (2) » « ee » « # » « » ¬ ree ( p ) ¼
Hĭ ª º « » 1 « H ĭ (I KH ) ĭ » T « » ª¬ PH Kree (0) º¼ # « » « H ĭ (I KH ) p -1 ĭ » ¬ ¼
[I.18]
The above relation can be written as follows:
K
Hĭ ª º « » 1 « H ĭ (I KH ) ĭ » PH T / ree (0) « » # « » « H ĭ (I KH ) p -1 ĭ » ¬ ¼
1
ª ree (1) º « r (2) » « ee » / r (0) « # » ee « » ¬ ree ( p ) ¼
[I.19]
For the optimal case, the second element on the right hand side of equation [I.19] is zero. For all sub-optimal cases, its value is non-zero and we use an iterative procedure to calculate K [1]:
K (i )
Hĭ ª º « » 1 « H ĭ (I KH ) ĭ » K (i -1) « » # « » « H ĭ (I KH ) p -1 ĭ » ¬ ¼
1
ª ree (1) º « r (2) » « ee » / r (0) « # » ee « » ¬ ree ( p ) ¼
[6.51]
where ree (j) is the autocorrelation of the innovation, obtained with the gain K(i-1). As the entity ree j is an unknown, we estimate it using the values of ek (with 0 d k d N 1 ): rˆee ( j )
1 N
N -1
¦ e(i)e(i j )
[I.20]
i j
and update the estimated gain as follows: ª º Hĭˆ « » « Hĭˆ [I Kˆ (i 1)H ]ĭˆ » Kˆ (i)=Kˆ (i 1) « » # « » « » p -1 ĭˆ » «¬ H ª¬ĭˆ [I Kˆ (i 1)H º¼ ¼
-1
ª rˆee (1) º « rˆ (2) » « ee » / rˆ (0) « # » ee « » ¬ rˆee (p) ¼
[I.21]
Appendices
377
References [1] R.K. Mehra, “On the Identification of Variances and Adaptive Kalman Filtering”, IEEE Trans. on Automatic Control. vol. AC-15, no. 2, pp. 175-184, April 1970. [2] R.K. Mehra, “On-Line Identification of Linear Dynamic Systems with Applications to Kalman Filtering”, IEEE Trans. on Automatic Control. vol. AC-16, no. 1, pp. 12-21, February 1971.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix J
Calculation of the Kalman Gain (the Carew and Belanger Method)
Introduction In this appendix, we present the Carew and Belanger approach [1] as an alternative to the Mehra method to obtain the optimal Kalman gain. This approach is based on the prediction of the state vector using a sub-optimal gain. We first present some basic definitions and notations, used in the rest of the appendix; then, we consider the calculation of the innovation sequence’s autocorrelation function for the sub-optimal case; and finally, using this calculation, we establish the relation between the optimal and sub-optimal cases. Notations and definitions The superscript
denotes the sub-optimal case and the superscript ^ denotes the
optimal case. Thus, for example, x * (k/k - 1) is the a priori estimation of the state vector x(k ) when the gain K * is sub-optimal whereas xˆ (k/k - 1) denotes the a priori estimation of x(k ) for optimal Kalman gain Kˆ . We now introduce the following notations: P * (k/k - 1) denotes the autocorrelation matrix of xˆ (k / k 1)- x * (k / k 1) , i.e.:
P (k / k 1)
E ® xˆ (k / k 1) x * (k/k - 1) xˆ (k / k 1) x * (k/k - 1) ¯
T
½ ¾ ¿
[J.1]
380
Modeling, Estimation and Optimal Filtering in Signal Processing
Furthermore, the innovation sequence e* (k ) in the sub-optimal case satisfies the following relation: e* (k ) y (k ) H x * (k / k 1)
[J.2]
The autocorrelation function of e* (k ) is denoted ree * ( j ) For the optimal case, the innovation satisfies:
^
`
E e * k e * k j .
eˆ(k ) y (k ) H xˆ (k / k 1)
[J.3]
and its autocorrelation is denoted rˆee ( j ) . Using the above expressions for the innovation in the optimal and sub-optimal cases, we get: e* (k ) y (k ) H x * (k / k 1) y (k ) H x * (k / k 1) H xˆ (k / k 1) H xˆ (k / k 1)
>
eˆ(k ) H xˆ (k / k 1) x * (k / k 1)
[J.4]
@
Updating P * (k/k-1) Considering equations [5.13] and [5.20] at instant k, and for a sub-optimal filtering, the state vector can be updated as follows: x* (k/k 1)=ĭ x* (k 1/k 1)
>
ĭ x* (k 1/k 2)+ĭK * y (k 1)- H x* (k 1/k 2)
@
[J.5]
Using equations [J.1] and [J.5], we thus obtain the following updating relation between P * (k/k 1) and P * (k 1/k 2) : xˆ (k/k 1) x * (k/k 1) = ĭ xˆ (k 1/k 2)+ĭKˆ > y (k 1)- H xˆ (k 1/k 2)@
>
ĭ x* (k 1/k 2) ĭK * y (k 1)- H x * (k 1/k 2)
ĭK * >H xˆ (k 1/k 2) H xˆ (k 1/k 2)@
@
[J.6]
Appendices
381
thus: xˆ (k/k 1) x* (k/k 1)
@>
>
@
ĭ I - K * H xˆ (k 1/k 2) x* (k 1/k 2) ĭKˆ > y (k 1)- H xˆ (k 1/k 2)@ *
*
ĭK y (k 1) ĭK H xˆ (k 1/k 2)
[J.7] or: xˆ (k/k 1) x* (k/k 1)
@>
>
@
[J.8]
ĭ I - K * H xˆ (k 1/k 2) x* (k 1/k 2) ĭ Kˆ K * eˆ(k 1)
We can now express matrix P (k / k 1) , defined in [J.1], in terms of P * (k - 1/k - 2) and the value rˆee (0): P * (k/k - 1)
ĭ ( I K * H ) P * (k - 1/k - 2)( I K * H ) T ĭ T rˆ (0)ĭ ( Kˆ K * )( Kˆ K * ) T ĭ T
[J.9]
ee
Calculation of the autocorrelation function of the innovation sequence, for the suboptimal case Starting from equation [J.4] and using equation [J.8], we show recursively that: e* (k ) H (ĭ ( I K * H )) j ( xˆ (k j/k j 1)- x* (k j/k j 1))
>
j
eˆ(k ) ¦ H ĭ ( I K * H ) i=1
@
i-1
[J.10]
ĭ ( Kˆ K * )eˆ(k-i)
However, when the Kalman gain is optimal and simultaneously orthogonal to xˆ (k - j/k - j - 1) and to x * (k - j/k - j - 1) , the sequence e* (k ) is a white sequence. Its
autocorrelation function ree * can thus be expressed as follows, for j ! 0 :
ree* ( j )=
>
H ĭ( I-K * H )
@
j-1
>
ĭ ( I-K * H ) P* (k - j/k - j - 1) H T ( Kˆ K * )rˆee (0)
@
[J.11]
and: ree * (0) = H P * (k/k - 1) H T rˆee (0)
[J.12]
382
Modeling, Estimation and Optimal Filtering in Signal Processing
Iterative method for obtaining the Kalman gain If the filter is assumed to converge, we have: lim P * (k/k - 1)
n of
P*
[J.13]
and equations [J.11] and [J.12] are modified to:
>
>
@
°r * ( j ) = H ĭ ( I - K * H ) j -1 ĭ ( I - K * H ) P * H T ( Kˆ K * )rˆ (0) T ee ® ee ˆree (0) ree * (0) - HP * H T °¯
@
[J.14]
Combining equations [J.14] for 1 d j d p , we obtain the following equality relation: ª ree * (1) º ª Hĭ º « * » « » * « ree (2) » = « H ĭ I - K H ĭ » ( I - K * H ) P * H T ( Kˆ K * )rˆ (0) ee « # » « » # « * » « p 1 » * ĭ¼ «¬ree ( p )»¼ ¬ H ĭ I - K H
>
@
>
@
>
@
[J.15]
This equality can also be expressed as follows: Kˆ K * - ( I - K * H ) P * H T / rˆee (0) Hĭ ª º « H ĭ I - K *H ĭ » » « « » # p -1 » « * ĭ¼ ¬H ĭ I - K H
>
@
>
@
1
ª ree * (1) º « * » « ree (2) » / rˆ (0) « # » ee » « * «¬ree ( p )»¼
[J.16]
We can thus determine the optimal Kalman gain using sub-optimal quantities and the covariance of the innovation for the optimal case. In practice, we proceed as follows. Let P * 0 be the initial value of P * . The optimal gain and the variance of the innovation are first calculated. Then, P * is updated using these new values. At iteration i, we can thus write:
Appendices
ree i (0) = ree * (0) - H Pi* H T Kˆ i
383
[J.17]
K * - ( I - K * H ) Pi * H T / rˆeei (0) Hĭ º ª « H ĭ I - K *H ĭ » » « » « # p -1 » « * ĭ¼ ¬H ĭ I - K H
>
@
>
@
1
ª ree * (1) º « * » « ree (2) » / rˆ (0) « # » eei » « * ¬«ree ( p)¼»
[J.18]
* * * * T T * * T T Pi+ 1 ĭ( I K H ) Pi ( I K H ) ĭ ree i (0)ĭ( K i K )( K i K ) ĭ [J.19]
To estimate ree * ( j ) , we use the values of e* (k ) (for 0 d k d N 1 ): rˆee * ( j )
1 N
N -1
¦ e* (i)e* (i j)
[J.20]
i j
References [1] B. Carew and P. R. Belanger, “Identification of Optimum Filter Steady-State Gain for Systems with Unknown Noise Covariances”, IEEE Trans. on Automatic Control. vol. AC-18, pp. 582-587, December 1973.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Appendix K
The Unscented Kalman Filter (UKF)
The unscented Kalman filter (UKF) is based on the “unscented transformation” (UT). First proposed by Julier et al. [1] the UT allows for the estimation of the mean and the covariance of an arbitrary analytical transformation y f (x) of a random Gaussian vector x with a mean value m x and a covariance matrix Px . If L denotes the size of the vector x , the method put forth by Julier et al. runs in three steps: 1) 2L+1 particles or ı-points [1] are generated as follows: x0
mx
xi
mx
L O P , i ^1,..., L`
[K.2]
xi
mx
L O P , i ^L 1,...,2L`
[K.3]
[K.1]
x
x
i
i
where M i is the iith row or column of matrix M and O D 2 L N L is a scaling factor. Element D is a parameter which allows us to control the dispersion of the ıpoints around the mean m x . N is a secondary scaling factor. 2) The ı-points are transformed using function f: yi
f x i , i ^0,...,2 L` .
[K.4]
386
Modeling, Estimation and Optimal Filtering in Signal Processing
3) The mean y and covariance Py of variable y are estimated as follows: 2L
my
¦ Wmi y i
[K.5]
i 0
2L
Py
¦ Wci ( y i y )( y i y ) *T
[K.6]
i 0
where * denotes the complex conjugate and the weighting factors are defined as follows: W m0
O /(O L)
Wc0 W mi
O /(O L) (1 D 2 E ) Wci
[K.7]
1 /^2(O L)` , i ^1 , ... , 2 L `
Here, parameter E allows us to take into account the higher-order terms in the distribution of the random vector [2]. This approach is similar to the Monte Carlo (MC) sampling methods. As we see from Figure K.1, the density probability of the random vector is characterized by a set of points which travels through function f. Nonetheless, the UT is based on a deterministic construction of the particle cloud and does now allow us to characterize any kind of density distribution. Its biggest advantage is its calculation cost, lower than the MC methods which use hundreds, even thousands, of samples.
Figure K.1. UT in the case of a random second-order vector
x
Appendices
387
Let there be a state space representation expressed as follows: x k ® ¯yk
f ( x k 1 , u k ) g ( x k , bk )
,
[K.8]
where f and g are two arbitrary analytical functions, and x k , y k , uk and bk denote, respectively, the state vector, the observation, the driving process and the observation noise. The UKF allows the recursive estimation of x k . At a given instant k, upon applying the UT on functions f and g twice, we obtain the predictions xˆ k / k 1 and yˆ k / k 1 of the state vector and the observation. The a posteriori estimation of the state vector is then obtained as follows:
xˆ k / k
xˆ k / k 1 K k y k yˆ k / k 1
[K.9]
where K (k ) is the filter gain, calculated using the estimation of the covariance matrices Pkyy/ k 1 of the observation and the intercorrelation matrix Pkxy/ k 1 between the state vector and the observation [1]: K (k )
Pkxy/ k 1 Pkyy/ k 1
1
[K.10]
388
Modeling, Estimation and Optimal Filtering in Signal Processing
Generation of the ı-points
>xˆ
xˆ ek 1 / k 1 Pke1 / k 1
xˆ
e k 1 / k 1
@
T T T vT k 1 / k 1 u diag Pkxx1 / k 1 , P uu , P vv
ª xˆ e «¬ k 1 / k 1
i
>xˆ
i k 1 / k 1
uˆ ki 1 / k 1
i e k / k 1
xˆ ek 1 / k 1 J Pke1 / k 1 º »¼
xˆ ek 1 / k 1 J Pke1 / k 1
A priori estimation
xˆ
f xˆ ik 1 / k 1 , uˆ ki 1 / k 1
vˆ ki 1 / k 1
@
2L
xˆ k / k 1
¦Wmi xˆ ik 1/ k 1
Pkxx/ k 1
¦Wci xˆ ik / k 1 xˆ k / k 1 xˆ ik / k 1 xˆ k / k 1
i 0 2L
T
i 0
yˆ k / k 1
g xˆ ik 1/ k 1 , vˆki 1 / k 1
yˆ k / k 1
¦Wmi yˆ ki 1/ k 1
2L
i 0
Pkyy/ k 1
¦Wci yˆ ki / k 1 yˆ k / k 1 yˆ ki / k 1 yˆ k / k 1 2L
T
i 0
A posteriori estimation Pkxy/ k 1
¦Wci xˆ ik / k 1 xˆ k / k 1 yˆ ki / k 1 yˆ k / k 1 2L
i 0
T
1
K k Pkxy/ k 1 Pkyy/ k 1 xˆ k / k xˆ k / k 1 K k y k yˆ k / k 1 Pkxx/ k
Pkxx/ k 1 K k Pkyy/ k 1 K kT
u and P uu are the mean and the covariance matrix of the driving process v and P vv are the mean and covariance matrix of the additive noise L is the size of the extended state vector
Figure K.2. The UKF algorithm
Appendices
389
References [1] S. J. Julier, J. K. Uhlmann, H. F. Durrant-Whyte, “A New Approach for Filtering Nonlinear Systems”, IEEE-ACC ’95, Seattle, USA, vol. 3, pp. 1628-1632, 21-23 June 1995. [2] E. A. Wan, R. van der Merwe, “The Unscented Kalman Filter”, Chapter 7 in Kalman Filtering and Neural Networks, S. Haykin, ed., John Wiley, 2001.
Modeling, Estimation and Optimal Filtering in Signal Processing Mohamed Najim Copyright 0 2008, ISTE Ltd.
Index
U-LMS 166, 271 E-LMS 166 J-LMS 166, 271
A–B Adaptive filter 138, 149–183 APA (affine projection algorithm) 150, 172– 178 AR (autoregressive) – model 2, 11–19, 20, 21, 34, 60, 67, 68, 71, 102, 143, 164, 202, 241, 303, 333 – parameters 49, 68, 82–93, 151, 156, 204, 230, 256, 264, 275, 296, 298, 302, 349, 355 ARMA (autoregressive moving average) – model 5, 7, 9, 19, 36, 38, 93, 95, 99 – parameters 2 ARMAX process 6, 261, 296 ASIR (auxiliary sampling importance resampling) filter 331 Autocorrelation method 65, 308 Bias 85, 258
C–E Canonical form 225, 233, 240 CELP (Code-Excited Linear Prediction) 2, 22 Cepstrum 349, 351 Controllability 5, 32, 33, 234 Covariance method 63, 65, 77–80 Difference equation 12, 161
Differential equation 27, 28, 30, 151, 185, 192, 210 Disturbance attenuation level 289, 290, 291, 293 Durbin-Levinson algorithm 49, 72, 75, 349 ECG (electrocardiogram) 2 EEG (electroencephalogram) 2, 20 Eigenvalue decomposition 86, 88, 137, 174, 342, 358 Eigenvector 137, 160, 174, 337–343, 358 ESPRIT 24 Extended Kalman filter (EKF) 185, 208, 212, 218, 261
F–I FIR (Finite Impulse Response) filter 9, 176, 177, 179, 255 Forgetting factor 62, 155, 201, 230 Forward 75 Fourier transform 1, 12, 27, 108, 109, 123, 130, 139, 140, 352 Fredholm equation 114, 116, 119, 123 Hf filter 286, 289–313 Hf norm 151, 286–288, 290, 294, 295 Hamming window 226, 305 Hanning window 139, 226 Hessian matrix 53, 152, 153, 157 Identification 1, 20, 33, 77, 224, 233, 255, 261, 296 Innovation 196–198, 240–247, 261, 266, 292, 299, 373–382
150,
115,
240, 263,
392
Modeling, Estimation and Optimal Filtering in Signal Processing
Instrumental variable (IV) technique 255– 283 ISP (immitance spectrum pair) 349
K–M Kalman – filter 119, 185–222, 223–254, 255, 371, 385 – gain 189, 213, 240–243, 261, 294, 373, 379 – smoothing 231, 240, 266, 306, 307 Lattice filter 75 Least squares estimation 49–104, 166, 189, 357 Levinson algorithm 67, 75, 249, 267, 366 Linear prediction 21, 75, 349, 354 LMS (least mean square) 119, 150, 158, 163, 166, 167, 271 LSP (line spectrum pair) 349 MA (moving average) – model 9, 18, 35 – parameters 42 Matched filter 105–119, 122 Matlab 85, 261, 291 Matrix inversion lemma 60, 192, 358 Maximum likelihood 100, 231, 272, 285, 305 MFCC (Mel scale frequency cepstral coefficient) 353 MLMS (modified LMS) 167, 171, 178 Modified Yule-Walker equations 85, 90, 228, 240, 260, 271 MUSIC 24, 343
N–P NLMS (normalized LMS) 167, 171, 178 Nonlinear estimation 185, 205, 208, 256 Norm 151, 160, 167, 168, 169, 170, 171, 208, 286, 290, 294, 346 Normal equation 67, 135, 138 Observability 5, 32, 38, 234, 236 Optimization 52, 112, 149, 157, 167, 291 Parametric model 1–47, 119, 308 PARCOR 63, 349 Particle filtering 315–334 Pisarenko 24, 342
Pitch 102, 207, 232, 353 Projection 172, 233, 234, 336, 347, 371
R–S Random – process 1, 19, 105, 337 – variable 5, 316, 336 – vector 386 Recursive IV (instrumental variable) technique 261 RLS (recursive least squares) 49, 150–156, 176, 178, 204 Reflection coefficients 68, 71, 75, 349, 361, 366, 368 Ricatti equation 192 RQ factorization 238 Schur coefficient 361, 366, 368 Schur-Cohn stability test 17, 361–368 Schwarz inequality 108 Sinusoidal model 21–27, 44, 260 SIR 326, 332 Spectral factorization 116, 290 Speech enhancement 2, 22, 82, 90, 139, 151, 225, 235, 246, 260, 271, 286, 304, 345 Stability 16, 19, 170, 276, 291, 361 State space representation 5, 27, 186, 192, 203, 208, 223, 224, 231, 233, 240, 261, 267 Steepest descent 157 Subspace methods 224, 233, 235, 237, 345
T–Y Toeplitz matrix 66 Transfer function 7, 16, 27, 361 Transversal parameters 6, 11, 71 Uncertainty 227, 289 Unscented transformation (UT) 385 Weights 155, 300, 322 Whitening 116, 127 White noise 39, 49, 83, 93, 107, 116, 127, 132, 139, 166, 215, 223, 232, 248, 276, 355 Wiener filter 105, 157, 160, 173, 185 Wiener Hopf equation 119, 121, 127, 135, 138, 159 Window 101, 139, 146, 155, 226, 242, 305 Yule-Walker equation 49, 67, 70, 85, 90, 157, 230, 248, 250, 349