Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2251
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Yuan Y. Tang Victor Wickerhauser Pong C. Yuen Chun-hung Li (Eds.)
Wavelet Analysis and Its Applications Second International Conference, WAA 2001 Hong Kong, China, December 18-20, 2001 Proceedings
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Yuan Y. Tang Pong C. Yuen Chun-hung Li Hong Kong Baptist University Department of Computer Science Kowloon Tong, Hong Kong E-mail: {yytang/pcyuen/chli}@comp.hkbu.edu.hk Victor Wickerhauser Washington University, Department of Mathematics Campus Box 1146, Cupples I St. Louis, Missouri 63130, USA E-mail:
[email protected]
Cataloging-in-Publication Data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Wavelet analysis and its applications : second international conference ; proceedings / WAA 2001, Hong Kong, China, December 18 - 20, 2001. Yuan Y. Tang ... (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Tokyo : Springer, 2001 (Lecture notes in computer science ; Vol. 2251) ISBN 3-540-43034-2
CR Subject Classification (1998): E.4, H.5, I.4, C.3, I.5 ISSN 0302-9743 ISBN 3-540-43034-2 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2001 Printed in Germany Typesetting: Camera-ready by author, data conversion by DA-TeX Gerd Blumenstein Printed on acid-free paper SPIN 10845973 06/3142 543210
Preface
The first international conference on wavelet analysis and its applications was held in China in 1999. Following the success of the first conference, the second international conference (ICWAA 2001) was held in Hong Kong in December 2001. The objective of this conference is to provide a forum for researchers working on both wavelet theory and its applications. Through idea-sharing and discussions on the state of the art in wavelet theory and applications, ICWAA 2001 aims to stimulate future development, explore novel applications, and exchange ideas for developing robust solutions. By August 2001, we had received 67 full papers submitted from all over the world. To ensure the quality of the conference and proceedings, each paper was reviewed by three reviewers. After a thorough review process, the program committee selected 24 regular papers for oral presentation and 27 short papers for poster presentation. In addition to these 24 oral presentations, there were 3 invited talks delivered by distinguished researchers, namely Prof. John Daugman from Cambridge University, UK, Prof. Bruno Torresani from Inria, France, and Prof. Victor Wickerhauser from Washington University, USA. We must add that the program committee and the reviewers did an excellent job within a tight schedule. We wish to thank all the authors for submitting their work to ICWAA 2001 and all the participants, whether you came as a presenter or an attendee. We hope that there was ample time for discussion and opportunity to make new acquaintances. Finally, we hope that you experienced an interesting and exciting conference and enjoyed your stay in Hong Kong.
October 2001
Yuan Y. Tang, Victor Wickerhauser Pong C. Yuen, C. H. Li
Organization
The Second International Conference on Wavelet Analysis and Its Applications is organized by the Department of Computer Science, Hong Kong Baptist University, and the IEEE Hong Kong Section Computer Chapter.
Organizing Committee
Congress Chair:
Ernest C. M. Lam
General Chairs:
John Daugman Ernest C. M. Lam
Program Chairs:
Yuan Y. Tang Victor Wickerhauser P. C. Yuen
Organizing Chair:
Kelvin C. K. Wong
Local Arrangement Chair:
William K. W. Cheung
Registration & Finance Chair:
K. C. Tsui
Publications Chairs:
C. H. Li M. W. Mak
Workshop Chair:
Samuel P. M. Choi
Publicity Chair:
C. S. Huang
Sponsors
Hong Kong Baptist University Croucher Foundation IEEE Hong Kong Section Computer Chapter
Program Committee

Metin Akay (Dartmouth College)
Akram Aldroubi (Vanderbilt University)
Claudia Angelini (Istituto per Applicazioni della Matematica)
Algirdas Bastys (Vilnius University)
T. D. Bui (Concordia University)
Elvir Causevic (Everest Biomedical Instrument Company)
Mariantonia Cotronei (Università di Messina)
Hans L. Cycon (Fachhochschule für Technik und Wirtschaft Berlin)
Dao-Qing Dai (Zhongshan University)
Wolfgang Dahmen (Technische Hochschule Aachen)
Donggao Deng (Zhongshan University)
T. N. T. Goodman (University of Dundee)
D. Hardin (Vanderbilt University)
Daren Huang (Zhongshan University)
Wen-Liang Hwang (Institute of Information Science)
Rong-Qing Jia (University of Alberta)
P. Jorgensen (University of Iowa)
K. S. Lau (Chinese University of Hong Kong)
Seng-Luan Lee (National University of Singapore)
Jian-Ping Li (Logistical Engineering University)
Wei Lin (Zhongshan University)
Guixing Luan (Shenyang Inst. of Computing Technology)
Hong Ma (Sichuan University)
Peter Oswald (Bell Laboratories, Lucent Technologies)
Lizhong Peng (Peking University)
Valérie Perrier (Domaine Universitaire)
S. D. Riemenschneider (West Virginia University)
Zuowei Shen (National University of Singapore)
Guoxiang Song (XiDian University)
Georges Stamon (University René Descartes)
Chew-Lim Tan (National University of Singapore)
Michael Unser (Bâtiment de Microtechnique)
Jianzhong Wang (Sam Houston State University)
Yueshen Xu (University of North Dakota)
Lihua Yang (Zhongshan University)
Rongmao Zhang (Shenyang Inst. of Computing Technology)
Xingwei Zhou (Nankai University)
Table of Contents
Keynote Presentations Personal Identification in Real-Time by Wavelet Analysis of Iris Patterns . . . . 1 J. Daugman, OBE Hybrid Representations of Audiophonic Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 B. Torresani Singularity Detection from Autocovariance via Wavelet Packets . . . . . . . . . . . . . 3 M. V. Wickerhauser
Image Compression and Coding Empirical Evaluation of Boundary Policies for Wavelet-Based Image Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 C. Schremmer Image-Feature Based Second Generation Watermarking in Wavelet Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 S. Guoxiang and W. Weiwei A Study on Preconditioning Multiwavelet Systems for Image Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 W. Kim and C.-C. Li Reduction of Blocking Artifacts in Both Spatial Domain and Transformed Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 W.-K. Ling and P. K.-S. Tam Simple and Fast Subband De-blocking Technique by Discarding the High Band Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 W.-K. Ling and P. K-S. Tam A Method with Scattered Data Spline and Wavelets for Image Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .49 L. Guan and L. Feng
Video Coding and Processing A Wavelet-Based Preprocessing for Moving Object Segmentation in Video Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 L.-C. Liu, J.-C. Chien, H. Y. Chuang, and C.-C. Li
Embedded Zerotree Wavelet Coding of Image Sequence . . . . . . . . . . . . . . . . . . . . 65 M. Jérôme and N. Ellouze Wavelet-Based Video Compression Using Long-Term Memory Motion-Compensated Prediction and Context-Based Adaptive Arithmetic Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 D. Marpe, T. Wiegand, and H. L. Cycon Wavelets and Fractal Image Compression Based on Their Self-Similarity of the Space-Frequency Plane of Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Y. Ueno
Theory Integration of Multivariate Haar Wavelet Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 S. Heinrich, F. J. Hickernell, and R.-X. Yue An Application of Continuous Wavelet Transform in Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 H.-Z. Qu, C. Xu, and Z. Ruizhen Stability of Biorthogonal Wavelet Bases in L2 (R) . . . . . . . . . . . . . . . . . . . . . . . . . 117 P. F. Curran and G. McDarby Characterization of Dirac Edge with New Wavelet Transform . . . . . . . . . . . . . 129 L. Yang, X. You, R. M. Haralick, I. T. Phillips, and Y. Y. Tang Wavelet Algorithm for the Numerical Solution of Plane Elasticity Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 Y. Shen and W. Lin Three Novel Models of Threshold Estimator for Wavelet Coefficients . . . . . . 145 S. Guoxiang and Z. Ruizhen The PSD of the Wavelet-Packet Modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 M. Li, Q. Peng, and S. Zhong Orthogonal Multiwavelets with Dilation Factor a . . . . . . . . . . . . . . . . . . . . . . . . . 157 S. Yang, Z. Cheng, and H. Wang
Image Processing A Wavelet-Based Image Indexing, Clustering, and Retrieval Technique Based on Edge Feature . . . . . . . . . . . . . . . . . . . . . . . . . . 164 M. Kubo, Z. Aghbari, K. S. Oh, and A. Makinouchi Wavelet Applications in Segmentation of Handwriting in Archival Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 C. L. Tan, R. Cao, and P. Shen
Wavelet Packets for Lighting-Effects Determination . . . . . . . . . . . . . . . . . . . . . . . 188 A. Z. Kouzani, and S. H. Ong Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 K. Ma and X. Tang Text Extraction Based on Nonlinear Frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 Y. Guan and L. Zhang A Wavelet Multiresolution Edge Analysis Method for Recovery of Depth from Defocused Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Q. Wang, W. Hu, J. Hu, and K. Hu Construction of Finite Non-separable Orthogonal Filter Banks with Linear Phase and Its Application in Image Segmentation . . . . . . . . . . . . 223 H. Chen and S. Peng Mixture-State Document Segmentation Using Wavelet-Domain Hidden Markov Tree Models . . . . . . . . . . . . . . . . . . . . . . 230 Y. Y. Tang, Y. Hou, J. Song, and X. Yang Some Experiment Results on Feature Analyses of Stroke Sequence Free Matching Algorithms for On-Line Chinese Character Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 M. L. Tak Automatic Detection Algorithm of Connected Segments for On-line Chinese Character Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 M. L. Tak
Signal Processing
Speech Signal Deconvolution Using Wavelet Filter Banks . . . . . . . . . . . . . . . . . 248 W. Hu and R. Linggard A Proposal of Jitter Analysis Based on a Wavelet Transform . . . . . . . . . . . . . . 257 J. Borgosz and B. Cyganek Skewness of Gabor Wavelets and Source Signal Separation . . . . . . . . . . . . . . . . 269 W. Yu, G. Sommer, and K. Daniilidis The Application of the Wavelet Transform to Polysomnographic Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .284 M. MacCallum and A. E. A. Almaini Wavelet Transform Method of Waveform Estimation for Hilbert Transform of Fractional Stochastic Signals with Noise . . . . . . . . . 296 W. Su, H. Ma, Y. Y. Tang, and M. Umeda
Multiscale Kalman Filtering of Fractal Signals Using Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 J. Zhao, H. Ma, Z.-S. You, and M. Umeda General Analytic Construction for Wavelet Low-Passed Filters . . . . . . . . . . . . 314 J. P. Li and Y. Y. Tang A Design of Automatic Speech Playing System Based on Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 Y. Liu, J. Cen, Q. Sun, and L. Yang General Design of Wavelet High-Pass Filters from Reconstructional Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326 L. Yang, Q. Chen, and Y. Y. Tang Realization of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 W.-K. Ling and P. K.-S. Tam Set of Decimators for Tree Structure Filter Banks . . . . . . . . . . . . . . . . . . . . . . . . . 336 W.-K. Ling and P. K.-S. Tam Set of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341 W.-K. Ling and P. K.-S. Tam
Systems and Applications
Joint Time-Frequency Distributions for Business Cycle Analysis . . . . . . . . . . .347 S. Md. Raihan, Y. Wen, and B. Zeng The Design of Discrete Wavelet Transformation Chip . . . . . . . . . . . . . . . . . . . . . 359 Z. Razak and M. Yaacob On the Performance of Informative Wavelets for Classification and Diagnosis of Machine Faults . . . . . . . . . . . . . . . . . . . . . . . . 369 H. Ahmadi, R. Tafreshi, F. Sassani, and G. Dumont A Wavelet-Based Ammunition Doppler Radar System . . . . . . . . . . . . . . . . . . . . 382 S. H. Ong and A. Z. Kouzani The Application of Wavelet Analysis Method to Civil Infrastructure Health Monitoring . . . . . . . . . . . . . . . . . . . . . 393 J. P. Li, S. A. Yan, and Y. Y. Tang Piecewise Periodized Wavelet Transform and Its Realization, Properties and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398 W.-K. Ling and P. K.-S. Tam Wavelet Transform and Its Application to Decomposition of Gravity Anomalies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404 H. Zunze
Computations of Inverse Problem by Using Wavelet in Multi-layer Soil . . . . 411 B. Wu, S. Liu, and Z. Deng Wavelets Approach in Choosing Adaptive Regularization Parameter . . . . . . 418 F. Lu, Z. Yang, and Y. Li DNA Sequences Classification Based on Wavelet Packet Analysis . . . . . . . . . .424 J. Zhao, X. W. Yang, J. P. Li, and Y. Y. Tang The Application of the Wavelet Transform to the Prediction of Gas Zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430 X. W. Yang, J. Zhao, J. P. Li, J. Liu, and S. P. Zeng Parameterizations of M-Band Biorthogonal Wavelets . . . . . . . . . . . . . . . . . . . . . . 435 Z. Zhang and D. Huang Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .449
Author Index
Aghbari, Z. . . . . . . . . . . . . . . . . . . . . 164 Ahmadi, H. . . . . . . . . . . . . . . . . . . . . 369 Almaini, A. E. A. . . . . . . . . . . . . . . 284 Borgosz, J. . . . . . . . . . . . . . . . . . . . . 257 Cao, R. . . . . . . . . . . . . . . . . . . . . . . . . 176 Cen, J. . . . . . . . . . . . . . . . . . . . . . . . . 321 Chen, H. . . . . . . . . . . . . . . . . . . . . . . 223 Chen, Q. . . . . . . . . . . . . . . . . . . . . . . 326 Cheng, Z. . . . . . . . . . . . . . . . . . . . . . . 157 Chien, J.-C. . . . . . . . . . . . . . . . . . . . . 54 Chuang, H. Y. . . . . . . . . . . . . . . . . . . 54 Curran, P. F. . . . . . . . . . . . . . . . . . . 117 Cycon, H. L. . . . . . . . . . . . . . . . . . . . . 76 Cyganek, B. . . . . . . . . . . . . . . . . . . . 257 Daniilidis, K. . . . . . . . . . . . . . . . . . . 269 Daugman, J. . . . . . . . . . . . . . . . . . . . . . 1 Deng, Z. . . . . . . . . . . . . . . . . . . . . . . . 411 Dumont, G. . . . . . . . . . . . . . . . . . . . 369 Ellouze, N. . . . . . . . . . . . . . . . . . . . . . .65 Feng, L. . . . . . . . . . . . . . . . . . . . . . . . . 49 Guan, L. . . . . . . . . . . . . . . . . . . . . . . . .49 Guan, Y. . . . . . . . . . . . . . . . . . . . . . . 211 Guoxiang, S. . . . . . . . . . . . . . . . 16, 145 Haralick, R. M. . . . . . . . . . . . . . . . . 129 Heinrich, S. . . . . . . . . . . . . . . . . . . . . . 99 Hickernell, F. J. . . . . . . . . . . . . . . . . .99 Hou, Y. . . . . . . . . . . . . . . . . . . . . . . . .230 Hu, J. . . . . . . . . . . . . . . . . . . . . . . . . . 217 Hu, K. . . . . . . . . . . . . . . . . . . . . . . . . . 217 Hu, W. . . . . . . . . . . . . . . . . . . . 217, 248 Huang, D. . . . . . . . . . . . . . . . . . . . . . 435
Jérôme, M. . . . . . . . . . . . . . . . . . . . . . 65 Kim, W. . . . . . . . . . . . . . . . . . . . . . . . . 22 Kouzani, A. Z. . . . . . . . . . . . . 188, 382 Kubo, M. . . . . . . . . . . . . . . . . . . . . . . 164 Li, C.-C. . . . . . . . . . . . . . . . . . . . . 22, 54 Li, J. P. . . . . . . . . . 314, 393, 424, 430 Li, M. . . . . . . . . . . . . . . . . . . . . . . . . . 151 Li, Y. . . . . . . . . . . . . . . . . . . . . . . . . . .418 Lin, W. . . . . . . . . . . . . . . . . . . . . . . . . 139 Ling, W.-K. . . 37, 44, 331, 336, 341, 398 Linggard, R. . . . . . . . . . . . . . . . . . . . 248 Liu, J. . . . . . . . . . . . . . . . . . . . . . . . . . 430 Liu, L.-C. . . . . . . . . . . . . . . . . . . . . . . . 54 Liu, S. . . . . . . . . . . . . . . . . . . . . . . . . . 411 Liu, Y. . . . . . . . . . . . . . . . . . . . . . . . . 321 Lu, F. . . . . . . . . . . . . . . . . . . . . . . . . . 418 Ma, H. . . . . . . . . . . . . . . . . . . . . 296, 305 Ma, K. . . . . . . . . . . . . . . . . . . . . . . . . 200 MacCallum, M. . . . . . . . . . . . . . . . . 284 Makinouchi, A. . . . . . . . . . . . . . . . . 164 Marpe, D. . . . . . . . . . . . . . . . . . . . . . . 76 McDarby, G. . . . . . . . . . . . . . . . . . . .117 Oh, K. S. . . . . . . . . . . . . . . . . . . . . . . 164 Ong, S. H. . . . . . . . . . . . . . . . . 188, 382 Peng, Q. . . . . . . . . . . . . . . . . . . . . . . . 151 Peng, S. . . . . . . . . . . . . . . . . . . . . . . . 223 Phillips, I. T. . . . . . . . . . . . . . . . . . . 129 Qu, H.-Z. . . . . . . . . . . . . . . . . . . . . . . 107 Raihan, S. Md. . . . . . . . . . . . . . . . . 347 Razak, Z. . . . . . . . . . . . . . . . . . . . . . . 359
Ruizhen, Z. . . . . . . . . . . . . . . . 107, 145 Sassani, F. . . . . . . . . . . . . . . . . . . . . . 369 Schremmer, C. . . . . . . . . . . . . . . . . . . . 4 Shen, P. . . . . . . . . . . . . . . . . . . . . . . . 176 Shen, Y. . . . . . . . . . . . . . . . . . . . . . . . 139 Sommer, G. . . . . . . . . . . . . . . . . . . . 269 Song, J. . . . . . . . . . . . . . . . . . . . . . . . 230 Su, W. . . . . . . . . . . . . . . . . . . . . . . . . 296 Sun, Q. . . . . . . . . . . . . . . . . . . . . . . . . 321 Tafreshi, R. . . . . . . . . . . . . . . . . . . . . 369 Tak, M. L. . . . . . . . . . . . . . . . . 237, 242 Tam, P. K.-S. 37, 44, 331, 336, 341, 398 Tan, C. L. . . . . . . . . . . . . . . . . . . . . . 176 Tang, X. . . . . . . . . . . . . . . . . . . . . . . . 200 Tang, Y. Y. 129, 230, 296, 314, 326, 393, 424 Torresani, B. . . . . . . . . . . . . . . . . . . . . . 2 Umeda, M. . . . . . . . . . . . . . . . 296, 305 Ueno, Y. . . . . . . . . . . . . . . . . . . . . . . . . 87 Wang, H. . . . . . . . . . . . . . . . . . . . . . . 157 Wang, Q. . . . . . . . . . . . . . . . . . . . . . . 217 Weiwei, W. . . . . . . . . . . . . . . . . . . . . . 16 Wen, Y. . . . . . . . . . . . . . . . . . . . . . . . 347
Wickerhauser, M. V. . . . . . . . . . . . . . 3 Wiegand, T. . . . . . . . . . . . . . . . . . . . . 76 Wu, B. . . . . . . . . . . . . . . . . . . . . . . . . 411 Xu, C. . . . . . . . . . . . . . . . . . . . . . . . . . 107 Yaacob, M. . . . . . . . . . . . . . . . . . . . . 359 Yan, S. A. . . . . . . . . . . . . . . . . . . . . . 393 Yang, L. . . . . . . . . . . . . . 129, 321, 326 Yang, S. . . . . . . . . . . . . . . . . . . . . . . . 157 Yang, X. . . . . . . . . . . . . . . . . . . . . . . .230 Yang, X. W. . . . . . . . . . . . . . . 424, 430 Yang, Z. . . . . . . . . . . . . . . . . . . . . . . . 418 You, X. . . . . . . . . . . . . . . . . . . . . . . . . 129 You, Z.-S. . . . . . . . . . . . . . . . . . . . . . 305 Yu, W. . . . . . . . . . . . . . . . . . . . . . . . . 269 Yue, R.-X. . . . . . . . . . . . . . . . . . . . . . . 99 Zeng, B. . . . . . . . . . . . . . . . . . . . . . . . 347 Zeng, S. P. . . . . . . . . . . . . . . . . . . . . .430 Zhang, L. . . . . . . . . . . . . . . . . . . . . . . 211 Zhang, Z. . . . . . . . . . . . . . . . . . . . . . . 435 Zhao, J. . . . . . . . . . . . . . . . . . . 424, 430 Zhao, J. . . . . . . . . . . . . . . . . . . . . . . . 305 Zhong, S. . . . . . . . . . . . . . . . . . . . . . . 151 Zunze, H. . . . . . . . . . . . . . . . . . . . . . . 404
Personal Identification in Real-Time by Wavelet Analysis of Iris Patterns John Daugman, OBE The Computer Laboratory, University of Cambridge, UK
Abstract. The central issue in pattern recognition is the relation between within-class variability and between-class variability. These are determined by the various degrees-of-freedom spanned by the patterns themselves, and by the selectivity of the chosen feature encoders. An interesting application of 2D wavelets in computer vision is the automatic recognition of personal identity by encoding and matching the complex patterns visible at a distance in each eye’s iris. Because the iris is a protected, internal, organ whose random texture is highly unique and stable over life, it can serve as a kind of living password or passport that one need not remember but is always in one’s possession. I will describe wavelet demodulation methods that I have developed for this problem over the past 10 years, and which are now installed in all existing commercial systems for iris recognition. The principle that underlies iris recognition is the failure of a test of statistical independence performed on the phase angle sequences of iris patterns. Quadrature 2D Gabor wavelets spanning 3 octaves in scale enable the complex-valued assignment of local phasor coordinates to iris patterns. The combinatorial complexity of these phase sequences spans about 244 independent degrees-of-freedom, and generates binomial distributions for the Hamming Distances (a similarity metric) between different irises. In six public independent field trials conducted so far using these algorithms, involving several millions of iris comparisons, there has never been a single false match recorded. The time required to locate and to encode an iris into quantized wavelet phase sequences is 1 second. Then database searches are performed at a rate of 100,000 irises/second. Data will be presented in this talk from 2.3 million IrisCode comparisons. This wavelet application could be used in a wide range of settings in which persons’ identities must be established or confirmed by large scale database search, without relying upon cards, keys, documents, secrets, passwords or PINs.
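As a rough sketch of the Hamming distance test described above, the following compares two binary codes bit by bit, optionally restricted to bits that both validity masks mark as usable. The code length, the mask handling, and the random test codes are illustrative assumptions, not the actual IrisCode format.

import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of disagreeing bits between two binary codes, counted only
    where both validity masks (if given) mark the bits as usable."""
    a = np.asarray(code_a, dtype=bool)
    b = np.asarray(code_b, dtype=bool)
    valid = np.ones_like(a)
    if mask_a is not None:
        valid &= np.asarray(mask_a, dtype=bool)
    if mask_b is not None:
        valid &= np.asarray(mask_b, dtype=bool)
    return np.count_nonzero(np.logical_xor(a, b) & valid) / np.count_nonzero(valid)

# Two unrelated random codes disagree on roughly half of their bits, which is
# the binomial behaviour of impostor comparisons referred to in the abstract.
rng = np.random.default_rng(0)
a, b = rng.integers(0, 2, 2048), rng.integers(0, 2, 2048)
print(hamming_distance(a, b))   # close to 0.5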
Hybrid Representations of Audiophonic Signals Bruno Torresani LATP, CMI, Universit´e de Provence, France
Abstract. A new approach for modeling audio signal will be presented, in view of efficient encoding. The method is based upon hybrid models featuring transient, tonal and stochastic components in the signal. The three components are estimated and encoded independently using a strategy very much in the spirit of transform coding. The signal models involve nonlinear expansions on local trigonometric bases, and binary trees of wavelet coefficients. Unlike several existing approaches, the method does not rely on any prior segmentation of the signal. The talk is based on joint works with L. Daudet and S. Molla.
Singularity Detection from Autocovariance via Wavelet Packets M. Victor Wickerhauser Department of Mathematics, Washington University, USA
Abstract. We use the eigenvalues of a version of the autocovariance matrix to recognize directions at which the Fourier transform of a function is slowly decreasing, which provides us with a technique to detect singularities in images. In very high dimensions, we show how the wavelet packet best-basis algorithm can be used to compute these eigenvalues approximately, at relatively low computational complexity.
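As a toy illustration of the objects involved (not the wavelet packet best-basis approximation the abstract describes), the sketch below forms an empirical autocovariance matrix of a 1-D signal and computes its eigenvalues directly; the signal, the lag count, and the construction are illustrative assumptions.

import numpy as np

def autocovariance_matrix(signal, max_lag):
    """Toeplitz matrix C[i, j] = c(|i - j|) built from the sample
    autocovariance sequence c(k) of the (mean-removed) signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    c = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag)])
    lags = np.abs(np.arange(max_lag)[:, None] - np.arange(max_lag)[None, :])
    return c[lags]

# A step signal (a simple singularity): form the matrix and inspect its largest
# eigenvalues; this is the raw quantity that the talk proposes to approximate
# cheaply via the wavelet packet best-basis algorithm.
step = np.concatenate([np.zeros(128), np.ones(128)])
eigenvalues = np.linalg.eigvalsh(autocovariance_matrix(step, 32))
print(np.sort(eigenvalues)[::-1][:4])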
Empirical Evaluation of Boundary Policies for Wavelet-Based Image Coding Claudia Schremmer Praktische Informatik IV, Universität Mannheim, 68131 Mannheim, Germany
[email protected]
Abstract. The wavelet transform has become the most interesting new algorithm for still image compression. Yet there are many parameters within a wavelet analysis and synthesis which govern the quality of a decoded image. In this paper, we discuss different image boundary policies and their implications for the decoded image. A pool of gray–scale images has been wavelet–transformed at different settings of the wavelet filter bank and quantization threshold and with three possible boundary policies. Our empirical evaluation is based on three benchmarks: a first judgment regards the perceived quality of the decoded image. The compression rate is a second crucial factor. Finally, the best parameter settings with regard to these two factors are weighted against the cost of implementation. Contrary to the JPEG2000 standard, where mirror padding is implemented, our investigation proposes circular convolution as the boundary treatment. Keywords: Wavelet Analysis, Boundary Policies, Empirical Evaluation
1 Introduction
Due to its outstanding performance in compression, the wavelet transform is the focus of new image coding techniques such as the JPEG2000 standard [8,4]. JPEG2000 proposes a reversible (Daub 5/3–tap) and an irreversible (Daub 9/7–tap) wavelet filter bank. However, since we were interested in how filter length affects the quality of image coding, we investigated the orthogonal and separable wavelet filters developed by Daubechies [2]. These belong to the group of wavelets used most often in image coding applications. They specify a number n0 of vanishing moments: if a wavelet has n0 vanishing moments, then the approximation order of the wavelet transform is also n0. Implementations of the wavelet transform on still images entail other aspects as well: speed, decomposition depth, and boundary treatment policies. Long filters require more computing time than short ones. Furthermore, the (dyadic) wavelet transform incorporates the aspect of iteration: the low–pass filter defines an approximation of the original signal that contains only half as many coefficients. This approximation successively builds the input for the next approximation. For compression purposes, coefficients in the time–scale domain
are discarded and the synthesis quality improves with the number of iterations on the approximation. Finally, the wavelet transform is mathematically defined only within a signal; image applications thus need to solve the boundary problem. Depending on the boundary policy selected, the number of iterations in a wavelet transform might vary with the filter length. Moreover, the longer the filter length, the more important the boundary policy becomes. In this work, we investigate the effects of three different boundary policies in combination with different wavelet filter banks on a number of gray–scale images. A first determining factor is the visual perception of a decoded image. As we will see, although the quality varies strongly with the selected image, for a given image it remains relatively insensitive to the parameter settings. A second crucial factor is therefore the expected compression rate. Finally, these two benchmarks are weighed against the cost of implementation. Our empirical evaluation leads us to recommend circular convolution as the boundary treatment, contrary to JPEG2000, which proposes mirror padding. The article is organized as follows. In Section 2, we cite related work on wavelet filter evaluation. Section 3 reviews the wavelet transform and details the aspects that are important for our survey. In Section 4, we present the technical evaluation of the wavelet transform and detail our results. The article ends in Section 5 with an outlook on future work.
2 Related Work
Villasenor’s group researches wavelet filters for image compression. In [10], the focus is on biorthogonal filters, and the evaluation is based on the information preserved in the reference signal, while [3] focuses on a mathematically optimal quantizer step size. In [1], the evaluation is based on lossless as well as on subjective lossy compression performance, complexity and memory usage. An interpretation of why the observations are made is nevertheless lacking. Strutz has thoroughly researched the dyadic wavelet transform in [9]: the design and construction of different wavelet filters is investigated, as are good Huffman and arithmetic encoding strategies. An investigation of boundary policies, however, is lacking.
3 The Wavelet Transform
A wavelet is an (ideally) compact function, i.e., outside a certain interval it vanishes. Implementations are based on the fast wavelet transform, where a given wavelet (i.e., mother wavelet) is shifted and dilated so as to provide a base in the function space. That is, a one–dimensional function is transformed into a two–dimensional space, where it is approximated by coefficients that depend on time (determined by the translation parameter) and on scale, i.e., frequency (determined by the dilation parameter). The localization of a wavelet in time spread (σt) and frequency spread (σω) has the property σt σω = const. However, the resolution in time and frequency depends on the frequency. This is the so–called
zoom phenomenon of the wavelet transform: it offers high temporal localization for high frequencies while offering good frequency resolution for low frequencies.

3.1 Wavelet Transform and Filter Banks
By introducing multiresolution, Mallat [7] made an important contribution to the application of wavelet theory to multimedia: the transition from mathematical theory to filters. Multiresolution analysis is implemented via high–pass, respectively, band–pass filters (i.e., wavelets) and low–pass filters (i.e., scaling functions): the detail coefficients (resulting from the high–pass, respectively, band–pass filtering) of every iteration step are kept apart, and the iteration starts again with the remaining approximation coefficients (from application of the low–pass filter). This multiresolution theory is ‘per se’ defined only for one–dimensional wavelets on one–dimensional signals. As still images are two–dimensional discrete signals and two–dimensional wavelet filter design remains an active field of research [5][6], current implementations are restricted to separable filters. The successive convolution of filter and signal in both dimensions opens two potential iterations:
– standard: all approximations, even in mixed terms, are iterated, and
– non–standard: only the purely low–pass filtered parts of every approximation enter the iteration.
In this work, we concentrate on the non–standard decomposition.
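A minimal sketch of the non–standard decomposition just described, using the two-tap Haar filter pair (which, as noted in Section 3.2, is the one filter that needs no boundary treatment); the filter choice, helper names, and toy input are illustrative assumptions, not the implementation evaluated in this paper.

import numpy as np

# Haar analysis filters; with only two taps no boundary treatment is needed.
H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (scaling function)
G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass (wavelet)

def analyze_1d(signal):
    """One 1-D analysis step: convolve with both filters and subsample by 2."""
    pairs = signal.reshape(-1, 2)
    return pairs @ H, pairs @ G           # (approximation, detail)

def analyze_2d_nonstandard(image, levels):
    """Non-standard 2-D decomposition: only the purely low-pass (LL) part
    of each level is iterated further."""
    ll = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        # filter the rows ...
        rows_lo, rows_hi = zip(*(analyze_1d(r) for r in ll))
        rows_lo, rows_hi = np.array(rows_lo), np.array(rows_hi)
        # ... then the columns of both intermediate results
        lo_lo, lo_hi = zip(*(analyze_1d(c) for c in rows_lo.T))
        hi_lo, hi_hi = zip(*(analyze_1d(c) for c in rows_hi.T))
        ll = np.array(lo_lo).T
        details.append((np.array(lo_hi).T, np.array(hi_lo).T, np.array(hi_hi).T))
    return ll, details

img = np.arange(64, dtype=float).reshape(8, 8)
approx, details = analyze_2d_nonstandard(img, levels=2)
print(approx.shape, [d[0].shape for d in details])   # (2, 2) [(4, 4), (2, 2)]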
3.2 Image Boundary
A digital filter is applied to a signal by convolution. Convolution, however, is defined only within a signal. In order to result in a reversible wavelet transform, each signal coefficient must enter into (filter length)/2 calculations of convolution (here, the subsampling process by factor 2 is already incorporated). Consequently, every filter longer than two entries, i.e., every filter except Haar, requires a solution for the boundary. Furthermore, images are signals of a relatively short length (in rows and columns), thus the boundary treatment is even more important than, e.g., in audio coding. Two common boundary policies are padding and circular convolution.

Padding Policies. With padding, the signal is extended at either border by (filter length − 2) coefficients. Consequently, each signal coefficient enters into (filter length)/2 calculations of convolution, and the transform is reversible. Many padding policies exist; they all have in common that each iteration step physically increases the storage space in the wavelet domain. In [11], a theoretical solution for the required storage space (depending on the signal, the filter bank, and the iteration level) is presented. Nevertheless, its implementation remains sophisticated.
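A sketch of the two padding variants used in this paper, assuming the (filter length − 2) extension stated above; the exact reflection convention (e.g., whether the border sample itself is repeated) differs between implementations, so the details are illustrative.

import numpy as np

def pad_signal(signal, filter_length, policy):
    """Extend the signal at both borders by (filter_length - 2) samples;
    the values of the padded samples depend on the chosen policy."""
    p = filter_length - 2
    if policy == "zero":
        left, right = np.zeros(p), np.zeros(p)
    elif policy == "mirror":
        left = signal[:p][::-1]      # reflect the first p samples
        right = signal[-p:][::-1]    # reflect the last p samples
    else:
        raise ValueError("unknown padding policy")
    return np.concatenate([left, signal, right])

x = np.array([10.0, 20.0, 30.0, 40.0])
print(pad_signal(x, 4, "zero"))     # [ 0.  0. 10. 20. 30. 40.  0.  0.]
print(pad_signal(x, 4, "mirror"))   # [20. 10. 10. 20. 30. 40. 40. 30.]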
Circular Convolution. The idea of circular convolution is to ‘wrap’ the end of a signal to its beginning or vice versa. In so doing, circular convolution is the only boundary treatment that maintains the number of coefficients for a wavelet transform, thus simplifying storage management.¹ A minor drawback is that the time information contained in the time–scale domain of the wavelet–transformed coefficients ‘blurs’: the coefficients in the time–scale domain that are next to the right border (respectively, left border) also affect signal coefficients that are located on the left (respectively, right). The selected boundary policy also has an impact on the iteration behavior of the wavelet transform. With padding, the decomposition depth does not depend on the filter length. With circular convolution, however, the decomposition depth varies with the filter length: the longer the filter, the fewer the number of decomposition iterations possible. For example, for an image of 256 × 256 pixels, the Daub–2 filter bank with 4 coefficients allows a decomposition depth of 7, while the Daub–20 filter bank with 40 coefficients has reached signal length after only 3 decomposition levels. Thus, the evaluation presented in Tables 1 to 4 is based on a decomposition depth of level 8 for the two padding policies, while the decomposition depth for circular convolution varies from 7 to 3, according to the selected filter length.
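The following sketch illustrates one circular-convolution analysis step and the resulting depth limit. The Daubechies filter with two vanishing moments serves as an example; the alignment of the convolution windows and the stopping rule (halve while the current length is at least the filter length, which reproduces the 256-pixel example above) are assumptions about conventions that vary between implementations.

import numpy as np

# Daubechies filter with two vanishing moments ("Daub-2" above, 4 taps); the
# high-pass filter is the alternating-sign reverse of the low-pass filter.
s3 = np.sqrt(3.0)
LO = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
HI = LO[::-1] * np.array([1.0, -1.0, 1.0, -1.0])

def periodic_analysis(signal, lowpass, highpass):
    """One analysis step with circular convolution: the signal is treated as
    periodic, so no extra coefficients are created at the borders."""
    n = len(signal)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(lowpass))) % n
    windows = signal[idx]                  # each row wraps around the border
    return windows @ lowpass, windows @ highpass

def max_depth_circular(signal_length, filter_length):
    """Halve while the current length is still at least the filter length."""
    depth, n = 0, signal_length
    while n >= filter_length and n % 2 == 0:
        n //= 2
        depth += 1
    return depth

approx, detail = periodic_analysis(np.arange(16.0), LO, HI)
print(approx.shape, detail.shape)          # (8,) (8,): no extra coefficients
# Daub-2 (4 taps) allows 7 levels on a 256-pixel row, Daub-20 (40 taps) only 3.
print(max_depth_circular(256, 4), max_depth_circular(256, 40))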
4 Empirical Evaluation
4.1 Set-Up
Our empirical evaluation sought the best parameter settings for the choice of the wavelet filter bank and for the image boundary policy to be implemented. The performance was evaluated according to the criteria: 1. visual quality, 2. compression rate, and 3. complexity of implementation. The quality was rated based on the peak signal–to–noise ratio (PSNR).² The compression rate was simulated by a simple quantization threshold: the higher the threshold, the more coefficients in the time–scale domain are discarded, and the higher is the compression rate. More precisely, the threshold was applied only to the parts of the image that have been high–pass filtered (respectively, band–pass filtered) at least once. That is, the approximation of the image was excluded from the thresholding due to its importance for the image synthesis.

¹ Storage space, however, expands indirectly: an image can be stored with integers, while the coefficients in the time–scale domain require floats.
² When org(x, y) denotes the pixel value of the original image at position (x, y) and dec(x, y) the pixel value of the decoded image at position (x, y), then

PSNR [dB] = 10 · log₁₀ [ Σ_{x,y} 255² / Σ_{x,y} (org(x, y) − dec(x, y))² ].
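Both quantities used in the evaluation fit in a few lines; a minimal sketch, assuming 8-bit images and the hard-threshold heuristic of the text applied to detail coefficients only (an actual entropy-based bitrate, as mentioned in the conclusion, is not modeled):

import numpy as np

def psnr(original, decoded):
    """PSNR in dB as defined in the footnote above (8-bit images, peak 255)."""
    err = original.astype(float) - decoded.astype(float)
    return 10.0 * np.log10(original.size * 255.0 ** 2 / np.sum(err ** 2))

def discarded_percentage(detail_coefficients, threshold):
    """Compression heuristic: percentage of high-/band-pass coefficients
    whose magnitude falls below the quantization threshold."""
    d = np.abs(np.asarray(detail_coefficients, dtype=float))
    return 100.0 * np.count_nonzero(d < threshold) / d.size

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256))
noisy = np.clip(img + rng.normal(0.0, 5.0, img.shape), 0, 255)
print(round(psnr(img, noisy), 2))                              # roughly 34 dB
print(discarded_percentage(rng.normal(0.0, 30.0, 10000), 45))  # roughly 87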
Our evaluation was set up on the six gray–scale images of size 256 × 256 pixels demonstrated in Figure 1. These test images have been chosen in order to comply with different features:
– contain many small details: Mandrill, Goldhill,
– contain large uniform areas: Brain, Lena, Camera, House,
– be relatively symmetric at the left–right and top–bottom boundaries: Mandrill, Brain,
– be very asymmetric with regard to these boundaries: Lena, Goldhill, House,
– have sharp transitions between regions: Brain, Lena, Camera, House, and
– contain large areas of texture: Mandrill, Lena, Goldhill, House.

4.2 Results
Image-Dependent Analysis. The detailed evaluation results for the six test images are presented in Tables 1 and 2. Some interesting observations made from these two tables and their explanations are as follows:
– For a given image and a given quantization threshold, the PSNR remains astonishingly constant for different filter banks and different boundary policies.
– At high thresholds, Mandrill and Goldhill yield the worst quality. This is due to the large amount of details in both images.
– House produces the overall best quality at a given threshold. This is due to its large uniform areas.
– Due to their symmetry, Mandrill and Brain show good quality results with padding policies.
– The percentage of discarded information at a given threshold is far higher for Brain than for Mandrill. This is due to the uniform black background of Brain, which produces small coefficients in the time–scale domain, compared to the many small details in Mandrill which produce large coefficients and thus do not fall below the threshold.
– With regard to the heuristic for compression, and for a given image and boundary policy, Table 2 reveals that
  • the compression ratio for zero padding increases with increasing filter length,
  • the compression ratio for mirror padding decreases with increasing filter length, and
  • the compression ratio for circular convolution varies, but most often stays almost constant.
The explanation is as follows. Padding an image with zeros, i.e., black pixel values, most often produces a sharp contrast to the original image, thus the sharp transition between the signal and the padding coefficients results in large coefficients in the fine scales, while the coarse scales remain unaffected. This observation, however, is put into a different perspective for longer filters: With longer filters, the constant run of zeros at the boundary does not show
strong variations, and the detail coefficients in the time–scale domain thus remain small. Hence, a given threshold cuts off fewer coefficients when the filter is longer. With mirror padding, the padded coefficients for shorter filters represent a good heuristic for the signal adjacent to the boundary. Increasing filter length and, accordingly, longer padded areas, however, introduce too much ‘false’ detail information into the signal, resulting in many large detail coefficients that ‘survive’ the threshold.

Image-Independent Analysis. The above examples reveal that most phenomena are signal–dependent. As a signal–dependent determination of best–suited parameters remains academic, our further reflections are made on the average image quality and the average amount of discarded information as presented in Tables 3 and 4 and the corresponding Figures 2 and 3.
Figure 2 visualizes the coding quality of the images, averaged over the six test images. The four plots represent the quantization thresholds λ = 10, 20, 45 and 85. In each graphic, the visual quality (quantified via PSNR) is plotted against the filter length of the Daubechies wavelet filters. The three boundary policies zero padding, mirror padding, and circular convolution are regarded separately. The plots obviously reveal that the quality decreases with an increasing threshold. More important are the following statements:
– Within a given threshold, and for a given boundary policy, the PSNR remains almost constant. This means that the quality of the coding process depends hardly or not at all on the selected wavelet filter bank.
– Within a given threshold, mirror padding produces the best results, followed by circular convolution. Zero padding performs worst.
– The gap between the performance of the boundary policies increases with an increasing threshold.
Nevertheless, the differences observed above, with a 0.28 dB maximum gap (at the threshold λ = 85 and the filter length of 40 coefficients), are so marginal that they do not actually influence visual perception. As the visual perception is neither influenced by the choice of filter nor by the boundary policy, the coding performance has been studied as a second benchmark. The following observations are made in Figure 3. With a short filter length (4 to 10 coefficients), the compression ratio is almost identical for the different boundary policies. This is not astonishing as short filters involve only little boundary treatment, and the relative importance of the boundary coefficients with regard to the signal coefficients is negligible. More important for our investigation is that:
– The compression heuristic for each of the three boundary policies is inversely proportional to their quality performance. In other words, mirror padding discards the least number of coefficients at a given quantization threshold, while zero padding discards the most.
– With an increasing threshold, the gap between the compression ratios of the three policies narrows.
In the overall evaluation, we have seen that mirror padding performs best with regard to quality, while it performs worst with regard to compression. Conversely, zero padding performs best with regard to compression and worst with regard to quality. Circular convolution lies in between in both respects. On the other hand, the differences in compression are far larger than the differences in quality. Calling to mind the coding complexity of the padding approaches, compared to the easy implementation of circular convolution (see Section 3.2), we strongly recommend implementing circular convolution as the boundary policy in image coding.
5 Conclusion
We have discussed and evaluated the strengths and weaknesses of different boundary policies in relation to various orthogonal wavelet filter banks. Contrary to the JPEG2000 coding standard, where mirror padding is suggested for boundary treatment, we have shown that circular convolution is superior in the overall combination of quality performance, compression performance, and ease of implementation. In future work, we will improve our heuristic for the compression rate and rely on the calculation of a signal’s entropy as presented in [12] and [9].
References
1. Michael D. Adams and Faouzi Kossentini. Performance Evaluation of Reversible Integer-to-Integer Wavelet Transforms for Image Compression. In Proc. IEEE Data Compression Conference, page 514 ff., Snowbird, Utah, March 1999.
2. Ingrid Daubechies. Ten Lectures on Wavelets, volume 61. SIAM, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
3. Javier Garcia-Frias, Dan Benyamin, and John D. Villasenor. Rate Distortion Optimal Parameter Choice in a Wavelet Image Communication System. In Proc. IEEE International Conference on Image Processing, pages 25–28, Santa Barbara, CA, October 1997.
4. ITU. JPEG2000 Image Coding System. Final Committee Draft Version 1.0 – FCD15444-1. International Telecommunication Union, March 2000.
5. Jelena Kovačević and Wim Sweldens. Wavelet Families of Increasing Order in Arbitrary Dimensions. IEEE Trans. on Image Processing, 9(3):480–496, March 2000.
6. Jelena Kovačević and Martin Vetterli. Nonseparable Two- and Three-Dimensional Wavelets. IEEE Trans. on Signal Processing, 43(5):1269–1273, May 1995.
7. Stéphane Mallat. A Wavelet Tour of Signal Processing. Academic Press, San Diego, CA, 1998.
8. Athanassios N. Skodras, Charilaos A. Christopoulos, and Touradj Ebrahimi. JPEG2000: The Upcoming Still Image Compression Standard. In 11th Portuguese Conference on Pattern Recognition, pages 359–366, Porto, Portugal, May 2000.
9. Tilo Strutz. Untersuchungen zur skalierbaren Kompression von Bildsequenzen bei niedrigen Bitraten unter Verwendung der dyadischen Wavelet-Transformation. PhD thesis, Universität Rostock, Germany, May 1997.
10. John D. Villasenor, Benjamin Belzer, and Judy Liao. Wavelet Filter Evaluation for Image Compression. IEEE Trans. on Image Processing, 2:1053–1060, August 1995.
11. Mladen Victor Wickerhauser. Adapted Wavelet Analysis from Theory to Software. A. K. Peters Ltd., Natick, MA, 1998.
12. Mathias Wien and Claudia Meyer. Adaptive Block Transform for Hybrid Video Coding. In Proc. SPIE Visual Communications and Image Processing, pages 153–162, San Jose, CA, January 2001.
Fig. 1. Test images for the evaluation: (a) Mandrill, (b) Brain, (c) Lena, (d) Camera, (e) Goldhill, (f) House
Table 1. Detailed results of the quality evaluation with the PSNR on the six test images: PSNR [dB] for the Daub–2 to Daub–20 filter banks under zero padding, mirror padding, and circular convolution, at the quantization thresholds 10 (excellent overall quality), 20 (good), 45 (medium), and 85 (poor). The mean values over the images are given in Table 3.
Table 2. Heuristic for the compression rate for the coding parameters of Table 1: the percentage of discarded information in the time–scale domain for each test image, filter bank, boundary policy, and quantization threshold. The higher the percentage of discarded information, the higher the compression ratio. The mean values over the images are given in Table 4.
Table 3. Average quality of the six test images. Figure 2 gives a more ‘readable’ plot of these values

Average image quality — PSNR [dB]

          Threshold λ = 10                Threshold λ = 20
Wavelet   zero      mirror    circular    zero      mirror    circular
          padding   padding   convol.     padding   padding   convol.
Daub–2    17.630    17.602    17.701      15.242    15.252    15.246
Daub–3    17.745    17.752    17.768      15.298    15.330    15.288
Daub–4    17.691    17.711    17.662      15.244    15.284    15.213
Daub–5    17.719    17.701    17.680      15.233    15.270    15.257
Daub–10   17.641    17.615    17.689      15.253    15.290    15.306
Daub–15   17.695    17.675    17.686      15.136    15.185    15.168
Daub–20   17.616    17.654    17.676      15.135    15.207    15.114

          Threshold λ = 45                Threshold λ = 85
Wavelet   zero      mirror    circular    zero      mirror    circular
          padding   padding   convol.     padding   padding   convol.
Daub–2    13.057    13.078    12.942      11.609    11.736    11.681
Daub–3    12.982    13.144    13.016      11.659    11.763    11.643
Daub–4    12.932    13.025    13.002      11.637    11.740    11.651
Daub–5    12.992    13.110    13.016      11.610    11.806    11.626
Daub–10   12.823    13.061    12.935      11.500    11.713    11.666
Daub–15   12.854    12.985    12.911      11.422    11.628    11.538
Daub–20   12.673    12.916    12.788      11.439    11.624    11.718
Fig. 2. Visual quality of the test images at the quantization thresholds λ = 10, 20, 45 and 85: PSNR plotted against the length of the wavelet filter for zero padding, mirror padding, and circular convolution. The values correspond to Table 3
Table 4. Average bitrate heuristic of the six test images. Figure 3 gives a more ‘readable’ plot of these values

Average discarded information — Percentage [%]

          Threshold λ = 10              Threshold λ = 20
Wavelet   zero     mirror   circular   zero     mirror   circular
          padding  padding  convol.    padding  padding  convol.
Daub–2    72.0     72.3     72.0       83.2     84.3     84.0
Daub–3    71.8     72.7     72.5       83.5     84.3     84.3
Daub–4    72.3     72.5     71.8       83.8     84.0     84.0
Daub–5    72.7     72.0     72.0       84.0     83.8     84.2
Daub–10   74.8     67.8     71.0       85.0     79.8     83.7
Daub–15   78.0     63.7     69.8       87.0     76.2     83.3
Daub–20   80.0     58.8     69.7       88.0     71.2     83.5

          Threshold λ = 45              Threshold λ = 85
Wavelet   zero     mirror   circular   zero     mirror   circular
          padding  padding  convol.    padding  padding  convol.
Daub–2    92.7     93.8     93.7       97.0     97.7     97.8
Daub–3    93.0     93.8     93.8       97.2     97.5     97.7
Daub–4    93.3     93.5     94.0       97.2     97.3     97.8
Daub–5    93.5     93.0     94.2       97.3     97.0     98.0
Daub–10   93.5     90.2     94.2       97.3     94.8     98.0
Daub–15   94.3     87.0     94.0       97.5     92.5     98.0
Daub–20   95.2     82.8     94.2       97.8     89.0     98.7
Fig. 3. Average bitrate heuristic of the test images at the quantization thresholds λ = 10, 20, 45 and 85: the percentage of discarded information plotted against the length of the wavelet filter for zero padding, mirror padding, and circular convolution. The values correspond to Table 4
Image-Feature Based Second Generation Watermarking in Wavelet Domain Song Guoxiang and Wang Weiwei School of Science, Xidian University Xi’an, 710071, P.R.China
Abstract. An image-feature based second generation watermarking scheme is proposed in this paper. A host image is first transformed into wavelet coefficients, and features are extracted from the lowest approximation. A watermark sequence is then inserted in all high frequency coefficients corresponding to the extracted featured approximation coefficients. The original host image is not needed for watermark detection, but the positions of the featured approximation coefficients are necessary for robust detection. The correlation between the embedded watermark and all high frequency coefficients of a possibly corrupted watermarked image corresponding to the approximation coefficients at the same positions as the original featured approximation coefficients is calculated and compared to a predefined threshold to see if the watermark is present. Experimental results show the watermark is very robust to common image processing, lossy compression in particular. Keywords: image feature, digital watermarking, wavelet transform
1
Introduction
Lately, multimedia and computer networking have known rapid development and expansion. This created an increasing need for systems that protect the copyright ownership for digital images. Digital watermarking is the embedding of a mark into digital content that can later be, unambiguously, detected to allow assertions about the ownership or provenience of the data. This makes watermarking an emerging technique to prevent digital piracy. To be effective, a watermark must be imperceptible within its host, discrete to prevent unauthorized removal, easily extracted by the owner, and robust to incidental and intentional distortions. Most of the recent work in watermarking can be grouped into two categories: spatial domain methods and frequency domain methods. Kutter et al. [1] refered both the spatial-domain and the transform domain techniques as first generation watermarking schemes and introduced the concept of second generation watermarking schemes which, unlike the first generation watermarking schemes, employ the notion of the data features. For images, features can be edges, corners, textured areas or parts in the image with specific characteristics. Features suitable for watermarking should have three basic properties: First, invariance Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 16–21, 2001. c Springer-Verlag Berlin Heidelberg 2001
Image-Feature Based Second Generation Watermarking in Wavelet Domain
17
to noise (lossy compression, additive, multiplicative noise, ect.) Second, covariance to geometrical transformations (rotation, translation, sub-sampling, change of aspect ratio, etc.) The last, localization (cropping the data should not alter remaining feature points). In this paper, we deal with the wavelet domain image watermarking method with the notion of second generation watermarking scheme. Previous wavelet domain watermarking schemes [2,3,4,5,6,7,8] added a watermark to a selected set of DWT coefficients in chosen subbands. The methods proposed in [2,3,6,8] requires the original image for detection, while the methods in [4,5,7] does not. However, the method [4] needs the embedded position and the corresponding subband label as well as two threshold value. For the method [5], if the watermarked image is tampered, the number of the coefficients that are greater than the larger threshold may not be equal to the size of the embeded watermark, thus there existed a problem for detection in calculating the correlation between the embedded watermark and the coefficients of a possibly modified watermarked image, whose absolute magnitude is above the larger threshold. The method [7] embedded watermarks into all HL and LH coefficients at levels 2 to 4, resulted in poor quality. Based on the concept of second generation watermarking scheme, we propose a wavelet domain watermarking method which embeds watermarks into all high frequency coefficients corresponding to the featured lowest approximation coefficients. First, the host image is transformed using DWT and features are extracted from the lowest approximation using the method in [9]. Then the watermark is embedded into all subband coefficients corresponding to the featured lowest approximate coefficients. Finally, the modified coefficients is inversely transformed to form the watermarked image. In the watermark detection, the original image is not needed, but for more robust detection, the featured lowest approximate coefficients position of the original image is required, which can be encrypted using private key encryption and stored in the image header. The correlation between the embedded watermark and all high frequency coefficients of a possibly corrupted watermarked image corresponding to the lowest approximate coefficients at the same position as the original featured approximation coefficients is calculated and compared to a predefined threshold to see whether the watermark is present or not. Experimental results show that the watermark is very robust to common image processing, lossy compression in particular. Even when the watermarked image is compressed by JPEG with a quality factor of one percent, the watermark is still present.
2
The Proposed Method
The original image is firstly decomposed using DWT with 8 taps Daubechies orthogonal filter [10] until the scale N to obtain multiresolution LHn , HLn , HHn (n = 1, 2, · · ·, N ) and the lowest resolution approximation LLN .There exists a tree structure between the coefficients [11] as shown in Fig.1(for N = 3). The
18
Song Guoxiang and Wang Weiwei
tree relation can be defined as follows: tree(LLN (x, y)) = tree(HLN (x, y)) ∪ tree(LHN (x, y)) ∪ tree(HHN (x, y)) (1) tree(HLn (x, y)) = tree(HLn−1 (2x − 1, 2y − 1)) ∪ tree(HLn−1 (2x, 2y − 1)) (2) ∪ tree(HLn−1 (2x − 1, 2y)) ∪ tree(HLn−1 (2x, 2y)) where n = N, N − 1, · · · , 2. For tree(LHn (x, y)), tree(HHn (x, y))(n = N, N − 1, · · · , 2), the definition is similar to (2). tree(HL1 (x, y)) = HL1 (x, y) tree(LH1 (x, y)) = LH1 (x, y) tree(HH1 (x, y)) = HH1 (x, y) For the experiments reported in this paper, N is taken as N = 4. 2.1
Feature Extraction
We use the method in [9] to extract features of the image. The difference is that we extract features from the lowest approximation components LLN of the DWT of the image, rather than from the original image. Since the size of LLN is 1/(4N ) times that of the original image, the time needed for extracting features is largely reduced.The feature extraction scheme is based on a decomposition of the image using Mexican-Hat wavelets. In two dimensions, the response of the Mexican-Hat mother avelet is defined as: ψ(x, y) = (2 − (x2 + y 2 ))e−(x
2
+y 2 )/2
(3)
The isotropic nature of the Mexican-Hat filter is well suited for detecting pointfeatures. Here we briefly describe the feature-detection procedure as follows: Firstly, define the feature-detection function, Pij (·, ·) as: Pij (k, l) = |Mi (k, l) − γMj (k, l)|
(4)
where Mi (k, l) and Mj (k, l) represent the responses of Mexican-Hat wavelets at the image location (k, l) for scales i and j respectively. For an image A, the wavelet response Mi (k, l) is given by: Mi (k, l) =< (2−i ψ(2−i (k, l))), A >
(5)
where < ·, · > denotes the convolution of its operands. We only consider wavelets on a dyadic scale. Thus, the normalizing constant is given by γ = 2−(i−j) . The operator | · | returns the absolute value of its parameter. Here we take i = 2 and j = 4 as in [9]. Secondly, determine points of local maxima of Pij (·, ·). These maxima correspond to the set of potential feature-points. A circular neighborhood with a radius of 5 points is used to determine the local maxima. Finally, accept a point of local maxima of Pij (·, ·) as a feature-point if the variance of the image-pixels in the neighborhood of the point is higher than a threshold. Here a 7 × 7 neighborhood around the point is used for computing the local variance. A candidate point is accepted as a feature-point if the corresponding local variance is larger than a threshold, which we take as 20.
Image-Feature Based Second Generation Watermarking in Wavelet Domain
2.2
19
Watermark Inserting
The original image I is firstly decomposed using DWT with 8 taps Daubechies orthogonal filter until the scale N = 4 to obtain multiresolution LHn , HLn , HHn (n = 1, 2, · · · , 4) and the lowest resolution approximation LL4 . Then featurepoints are extracted from LL4 using the method in 2.1. If LL4 (x, y) is a featurepoint, then some watermark bits x ∈ X are added to all the children notes of tree(LL4 (x, y)). X stands for a set of watermark x and the elements xl of x are given by the random noise sequence whose probability law has a normal distribution of zero mean and unit variance. Since for every tree(LL4 (x, y))), there are 255 children in all, except for the root, the size of the watermark x, denoted by M , is given by M = 255× the number of feature-points in LL4 ). The specific embedding method is as follows: For every feature-point LL4 (x, y), for every Wl ∈ tree(LL4 (x, y)) and Wl = LL4 (x, y) Wl ← Wl + α|Wl |xl
(6)
where wl and Wl denotes respectively the DWT coefficient of the original and watermarked image,α is a modulating factor, here we take α = 0.2. Finally, inversely transform the modified multiresolution subbands to obtain the watermarked image I . 2.3
Watermark Detection
The original image is not required in the watermark detection, but for more robust detection, the feature-points position of the original image is indeed necessary. Firstly, A possibly corrupted watermarked image I˜ is decomposed as I in ˜ l ∈ tree(LL ˜ 4 (x, y)) ˜ 4 (x, y), all coefficients W 2.2. Then for every feature-point LL ˜ ˜ ˜ ˜ and Wl = LL4 (x, y) are taken out, where LL4 and Wl respectively represents ˜ the lowest resolution approximation and high frequency coefficients of I.We cal˜ culate the correlation z between Wl and all candidates y ∈ X of the embedded watermark x as: M ˜ l yl W z = 1/M (7) l=1
By comparing the correlation with a predefined threshold Sz , which is given in [7] to determine whether a given watermark is present or not. In theory, the threshold Sz is taken as M α |Wl | (8) Sz = 2M l=1
In practice, the watermarked image would be attacked incidentally or intentionally, so for robust detection, the threshold is taken as Sx = r
M α ˜ |Wl |, 0 < r ≤ 1 2M l=1
(9)
20
3
Song Guoxiang and Wang Weiwei
Experimental Results
In order to confirm that the proposed watermarking scheme is effective, we performed some numerical experiments with some gray-scale standard images. Here we describe experimental results for the standard image ”lenna”(512 × 512 pixels, 8 bits/pixel) shown in Fig.2(a). Fig.2(b) shows the watermarked image with parameters α = 0.2, N = 4 and M = 4080. Next, we tested the robustness of the watermark against some common image processing operations on the watermarked image Fig.2(b). Fig.3 is the result of JPEG compression with quality factor of 1. The image after 11 × 11 mean filtering is shown in Fig.4. The image after adding white Gaussian noise of power 40db is shown in Fig.5. Fig.6 is the clipped image with only 25% center data left. Fig.7 shows the result of rotation counter clockwise by 10 degrees. The response of the watermark detector and the corresponding threshold for the untampered and attacked watermarked image are given in Tab.1. The threshold is calculated using the equation (10), where r = 2/3 . As shown in Tab.1, though image degradation is very heavy, the watermark is still easily recovered and the detector response is also well above the threshold. Numerical experiments with the other standard images have also demonstrated similar results.
4
Conclusions
An image-feature based wavelet domain second generation watermarking scheme is proposed in this paper. Experiments show that the watermark is very robust to common image processing, lossy compression and smoothing in particular. Even for the JPEG compressed version of the watermarked image with quality factor of 1%, the feature-points remain salient. Furthermore, we will investigate watermarking method that resistant to geometric attacks.
References 1. M. Kutter, S. K. Bhattacharjee, and T. Ebrahimi, ”Towards second generation watermarking scheme,” Proc. IEEE ICIP’99, Vol.1,1999 16 2. D. Kundur and D. Hatzinakos, ”A robust digital image watermarking method using wavelet-based fusion,” Proc. IEEE ICIP’97, vol.1, 1997, pp.544-547 17 3. X. G. Xia, C. G. Boncelet and G. R. Arce, ”A multiresolution watermark for digital images,” Proc. IEEE ICIP’97, Vol.1,1997, pp.548-551 17 4. H. Inoue, A. Miyazaki, A. Yamamoto, etal., ”A digital watermark bases on the wavelet transform and its robustness on image compression,” Proc. IEEE ICIP’98, Vol.2, 1998, pp.391-423 17 5. R. Dugad, K. Ratakonda and N. Ahuja, ”A new wavelet-based scheme for watermarking image,” Proc. IEEE ICIP’98, vol.2, 1998, pp.419-423 17 6. W. W. Zhu, Z. X. Xiong and Y. Q. Zhang, ”Multiresolution watermarking for images and video: a unified approach,” Proc. IEEE ICIP’98, vol.1, 1998, pp.465468 17
Image-Feature Based Second Generation Watermarking in Wavelet Domain
21
7. H. Inoue, A. Kiomiyazaki and T. Katsura, ”An image watermarking method based on the wavelet transform,” Proc. IEEE ICIP’99, vol.1, 1999, pp.296-300 17, 19 8. J. R. Kim and Y. S. Moon, ”A robust wavelet-based digital watermarking using Level-adaptive thresholding,” Proc. IEEE ICIP’99, vol.2, 1999, pp.226-230 17 9. S. K. Bhattacharjee and M. Kutter, ”Compression tolerant image authentication”, Proc. IEEE ICIP’98, Vol.1,1998 17, 18 10. I. Daubechies, ”Ten Lectures on Wavelets,” CBMS-NSF conference series in applied mathematics, SIAM Ed. 17 11. J. M. Shapiro, ”Embeded image coding using zerotrees of wavelet coefficients,” IEEE trans. On Signal Processing, Vol.41, No.12, 1993, pp.3445-3462 17
A Study on Preconditioning Multiwavelet Systems for Image Compression Wonkoo Kim and Ching-Chung Li University of Pittsburgh, Dept. of Electrical Engineering Pittsburgh, PA 15261, USA
[email protected] [email protected]
Abstract. We present a study on applications of multiwavelet analysis to image compression, where filter coefficients form matrices. As a multiwavelet filter bank has multiple channels of inputs, we investigate the data initialization problem by considering prefilters and postfilters that may give more efficient representations of the decomposed data. The interpolation postfilter and prefilter are formulated, which are capable to provide a better approximate image at each coarser resolution level. A design process is given to obtain both filters having compact supports, if exist. Image compression performances of some multiwavelet systems are studied in comparison to those of single wavelet systems.
1
Nonorthogonal Multiwavelet Subspaces
Let us define a multiresolution analysis of L2 (R) generated by several scaling functions, with an increasing sequence of function subspaces {Vj }j∈Z in L2 (R): {0} ⊂ . . . ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ . . . ⊂ L2 (R).
(1)
Subspaces Vj are generated by a set of scaling functions φ1 , φ2 , . . . , φr (namely, multiscaling functions) such that Vj := closL2 (R) < φm j,k : 1 ≤ m ≤ r, k ∈ Z >,
∀ j ∈ Z,
(2)
2 i.e., Vj is the closure of the linear span of {φm j,k }1≤m≤r, k∈Z in L (R), where j/2 m j φ (2 x − k), φm j,k (x) := 2
∀ x ∈ R.
(3)
Then we have a sequence of multiresolution subspaces {Vj } generated by a set of multiscaling functions, where the resolution gets finer and finer as j increases. ˙ Wj , ∀ j ∈ Z, Let us define inter-spaces Wj ⊂ L2 (R) such that Vj+1 := Vj + ˙ denotes a nonorthogonal direct sum. Wj where the plus sign with a dot (+) is the complement to Vj in Vj+1 , and thus Wj and Wl with j = l are disjoint but may not be orthogonal to each other. If Wj ⊥ Wl , ∀ j = l, we call them Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 22–36, 2001. c Springer-Verlag Berlin Heidelberg 2001
A Study on Preconditioning Multiwavelet Systems for Image Compression
23
semi-orthogonal wavelet spaces [1]. By the nature of construction, subspaces Wj can be generated by r base functions, ψ 1 , ψ 2 , . . . , ψ r that are multiwavelets. The m subspace Wj is the closure of the linear span of {ψj,k }1≤m≤r, k∈Z : m : 1 ≤ m ≤ r, k ∈ Z >, Wj := closL2 (R) < ψj,k
where
m ψj,k (x) := 2j/2 ψ m (2j x − k),
∀ j ∈ Z,
∀ x ∈ R.
(4) (5)
We may express multiscaling functions and multiwavelets as vector functions: 1 1 φ (x) ψ (x) .. .. φ(x) := . , ψ(x) := . , ∀ x ∈ R. (6) φr (x)
ψ r (x)
Also, in vector form, let us define φj,k (x) := 2j/2 φ(2j x − k) and ψ j,k (x) := 2j/2 ψ(2j x − k),
∀ x ∈ R.
(7)
Since the multiscaling functions φm ∈ V0 and the multiwavelets ψ m ∈ W0 1/2 m are all in V1 , and since V1 is generated by {φm φ (2x− k)}1≤m≤r, k∈Z , 1,k (x) = 2 2 there exist two matrix sequences {Hn }n∈Z and {Gn }n∈Z such that we have a two-scale relation for the multiscaling function φ(x): Hn φ(2x − n), x ∈ R, (8) φ(x) = 2 n∈Z
which is also called as a two-scale matrix refinement equation (MRE), and for multiwavelet ψ(x): Gn φ(2x − n), x ∈ R, (9) ψ(x) = 2 n∈Z
where Hn and Gn are r × r square matrices. We are interested in finite sequences of Hn and Gn , namely, FIR (Finite Impulse Response) filter pairs. Using the fractal interpolation, Geronimo, Hardin, and Massopust successfully constructed a very important multiwavelet system [2,3,4] which has two orthogonal multiscaling functions and two orthogonal multiwavelets. Their four matrix coefficients Hn satisfy the MRE for a multiscaling function φ(x): " √ # H0 =
3 10 √
−
2 40
4 2 10 3 − 20
, H1 =
3 0 10 √ 9 2 1 40 2
, H2 =
0
√ 9 2 40
0 0√ 0 , H3 = , 3 − 20 − 402 0
(10)
and other four matrix coefficients Gn generate a multiwavelet ψ(x): " √ # " √ # 9√2 1 √2 2 3 9 2 3 G0 =
− 40 − 20 √ , G1 = 1 − 20 − 3202
40 9 20
− −2 − 40 0 , G2 = 409 3√202 , G3 = 1 0 0 − 20 20 20
(11)
24
2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0
Wonkoo Kim and Ching-Chung Li GHM multiscaling function 1
GHM multiscaling function 2
2
1 0.5 0 -0.5 0
0.5
1
(a) φ1
1.5
2
-1
GHM multiwavelet 1
2 1
1
0.5
0.5
0
0
-0.5
-0.5
-1
-1
-1.5
0
0.5
1
(b) φ2
1.5
2
GHM multiwavelet 2
2 1.5
1.5
1.5
-2
-1.5 0
0.5
1
(c) ψ 1
1.5
2
-2
0
0.5
1
(d) ψ 2
1.5
2
Fig. 1. Geronimo-Hardin-Massopust orthogonal multiscaling functions and multiwavelets 2
Othogonal cardinal 2-balanced multiscaling function 1
2
1.5
1.5
1
1
0.5
0.5
0
0
-0.5
0
1
2
3
1
4
(a) φ
5
Othogonal cardinal 2-balanced multiscaling function 2
-0.5
2
Othogonal cardinal 2-balanced multiwavelet 1
1
1
0.5
0.5
0
0
-0.5
-0.5
-1
-1
1
2
3
2
4
(b) φ
5
-2
Othogonal cardinal 2-balanced multiwavelet 2
-1.5
-1.5 0
2 1.5
1.5
0
1
2
3
(c) ψ
1
4
5
-2
0
1
2
3
(d) ψ 2
4
5
Fig. 2. Cardinal 2-balanced orthogonal multiscaling functions and multiwavelets
The GHM (Geronimo-Hardin-Massopust) orthogonal multiscaling functions are shown in Figure 1(a) and (b), and their corresponding orthogonal multiwavelets are shown in (c) and (d). The GHM multiwavelet system has very remarkable properties: its scaling functions and wavelets are orthogonal, very shortly supported, symmetric or antisymmetric, and it has second order approximation so that locally constant and locally linear functions are in Vj . Another example of orthogonal multiwavelet is shown in Figure 2[5,6,7], where multiscaling functions are shown in figures (a) and (b), and multiwavelet functions are shown in figures (c) and (d), respectively. Two scaling functions in each cardinal balanced multiwavelet system are the same functions up to a half integer shift in time, and also the wavelets are the same up to a half integer shift in time. The approximation orders of the cardinal balanced orthogonal multiwavelet systems are 2 for cardinal 2-balanced, 3 for cardinal 3-balanced, and 4 for cardinal 4-balanced systems. The cardinal 2-balanced orthogonal multiwavelet filters are given by −1 −1 H(z) =
b(z) 0.5z , z −5 b(−1/z) 0.5z −2
G(z) =
−b(z) 0.5z , −z −5 b(−1/z) 0.5z −2
(12)
where b(z) = 0.015625+0.123015364784490z −1 +0.46875z −2 −0.121030729568979z −3+ 0.015625z −4 −0.001984635215512z −5 . For more details on cardinal balanced orthogonal multiwavelets, refer to the paper written by I. Selesnick [6]. We should note that a scalar system with one scaling function cannot combine symmetry, orthogonality, and the second order approximation together. Furthermore, the solution of a scalar refinement equation with four coefficients is supported on the interval [0,3], while multiscaling functions with four matrix coefficients can be supported on a shorter interval.
A Study on Preconditioning Multiwavelet Systems for Image Compression
25
˙ W0 , Since all elements of both φ(2x) and φ(2x − 1) are in V1 and V1 = V0 + ˜ n }n∈Z and {G ˜ n }n∈Z such that there exist two 2 matrix sequences {H T ˜ k−2n ˜ Tk−2n ψ(x − n) , ∀ k ∈ Z, H φ(x − n) + G (13) φ(2x − k) = n∈Z
which is called the decomposition relation of φ and ψ.1 ˜ n }, {G ˜ n }), which are We have two pairs of sequences ({Hn }, {Gn }) and ({H ˙ 0 . A carefully chosen pair of unique due to the direct sum relationship V1 = V0 +W sequences ({Hn }, {Gn }) can generate multiscaling functions and multiwavelets and thus multiwavelet subspaces; hence, they can completely characterize a multiwavelet analysis.
2
Multiwavelet Decomposition and Reconstruction
From the formulas (8), (9), and (13), the following signal decomposition and reconstruction algorithms can be derived. Let vj ∈ Vj and wj ∈ Wj so that vj (x) := cj,k · φ(2j x − k) = cTj,k φ(2j x − k); (14) k∈Z
wj (x) :=
k∈Z j
dj,k · ψ(2 x − k) =
k∈Z
dTj,k ψ(2j x − k),
(15)
k∈Z
where · denotes a dot product between two vectors and ·T denotes the transpose operator. The scale factor 2j/2 is not explicitly shown here for simplicity but ˙ Wj−1 , incorporated into the sequences cj,k and djk . By the relation Vj = Vj−1 + vj (x) := vj−1 (x) + wj−1 (x) cj−1,k · φ(2j−1 x − k) + dj−1,k · ψ(2j−1 x − k), = k∈Z
(16) ∀ j ∈ Z.
k∈Z
Thus we have the following recursive decomposition (analysis) formulas: ˜ n−2k cj,n = ˜ −n cj,2k−n , H H ∀ j ∈ Z; cj−1,k = n
dj−1,k =
n
(17)
n
˜ n−2k cj,n = G
˜ −n cj,2k−n , G
∀ j ∈ Z.
(18)
n
An original data sequence c0 (={c0,k }k ) is decomposed into c1 and d1 data sequences, and the sequence c1 is further decomposed into c2 and d2 sequences, etc.. Keeping this process recursively, the original sequence c0 is decomposed into d1 , d2 , d3 , . . . . Note that this process continuously reduces the data size by half for each decomposed sequence but it conserves the total data size. 1
˜ and G ˜ and reversed indexing We here intentionally transposed the matrices of H instead of 2n − k, for some convenience in representing formulas of dual relationship.
26
Wonkoo Kim and Ching-Chung Li
cj
cj
✲ H ˜−
m 2 ✲ cj−1 ❄
✲ 2m ✻
HT
✲ G ˜−
m ✲ dj−1 2 ❄
✲ 2m ✻
GT
❄
m✲ cj +❤ ×2 ✻
(a) Filterbanks derived from multiwavelet analysis ✲ H m m ˜ 2 ✲ cj−1 ✲ ✻ 2 H∗ ❄ ❄ m✲ +❤ ×2 cj ∗ m ✲ G ✲ ✲ m ✻ ˜ 2 2 d G j−1 ❄ ✻ (b) Multiwavelet filterbanks by reverse indexing
Fig. 3. The multiwavelet transform filter banks. Filters are r × r matrices and data paths are r lines, where r = 2 in our examples. The multiwavelet systems (a) and (b) are equivalent, except that filter indices are all reversed between the two systems Let DK , K ≥ 1, be the subsampling (downsampling) operator defined by (DK x)[n] := x[Kn],
(19)
where K is a subsampling rate and x is a sequence of vector-valued samples. The decomposition formulas can be rewritten in the Z-transform domain as ˜ − (z)cj (z), cj−1 (z) = D2 H ˜ − (z)cj (z), dj−1 (z) = D2 G
(20) (21) T
where the superscript − denotes reverse indexing, i.e., H− := H∗ . From the two-scale relations (8), (9) and from (14), (15), we have the following recursive reconstruction (synthesis) formula:
T Hk−2n (22) cj,k = 2 cj−1,n + GTk−2n dj−1,n . n
Let UK , K ≥ 1, be the upsampling operator defined by n n x[ K ], if K is an integer; (UK x)[n] := 0, otherwise,
(23)
where K is an upsampling rate and x is a sequence of vector-valued samples. Then the reconstruction formula can be rewritten in the Z-transform domain as
cj (z) = 2 HT (z)U2 cj−1 (z) + GT (z)U2 dj−1 (z)
(24)
The decomposition and reconstruction systems implemented by multiwavelet filterbanks are shown in Figure 3, where the system (a) is the exact implementation of our equations derived. If we take reverse indexing for all filters, we have the system (b), and the multiwavelet decomposition formulas become ˜ cj−1 (z) = D2 H(z)c j (z), ˜ dj−1 (z) = D2 G(z)c j (z),
(25) (26)
A Study on Preconditioning Multiwavelet Systems for Image Compression
27
and the reconstruction formula becomes cj (z) = 2 [H∗ (z)U2 cj−1 (z) + G∗ (z)U2 dj−1 (z)] .
(27)
Note that the input data cj is a sequence of vector-valued data, every data path has r lines, and filters are r × r matrices. We restrict r = 2 in this study. Constructing a vector-valued sequence cj from a signal or an image is nontrivial. As an 1-D input signal is vectorized, the direction of filter indexing will affect the reconstructed signal in an undesirable way, if the vectorization scheme does not match with filter indexing. This effect does not happen in a scalar wavelet system, whose filters are not matrices. As we do not take reverse indexing for data sequences, we will take the system (a) of Figure 3 in our implementation. A prefilter for the chosen input scheme will be designed later in Section 5.
3
Biorthogonality and Perfect Reconstruction Condition
From the two-scale dilation equations (8), (9), and the decomposition relation (13), we have the following biorthogonality conditions: ˜ ∗ (z) H(z)H ˜ ∗ (z) H(z)G ˜ ∗ (z) G(z)H ˜ ∗ (z) G(z)G
˜ ∗ (−z) = Ir ; + H(−z)H ˜ ∗ (−z) = 0r ; + H(−z)G ˜ ∗ (−z) = 0r ; + G(−z)H ˜ ∗ (−z) = Ir , + G(−z)G
(28) (29) (30) (31)
which completely characterize the biorthogonality between the analysis filter ˜ G) ˜ and the synthesis filter pair (H, G). (Namely, H ⊥ G ˜ and H ˜ ⊥ G.) pair (H, 2 Let Hm (z) denote the modulation matrix of (H, G) as defined by Hm (z) :=
H(z) H(−z) , G(z) G(−z)
(32)
˜ m (z) denote the modulation matrix of (H, ˜ G) ˜ similarly defined, then the and H above biorthogonality condition becomes ∗ ∗ ˜ ∗m (z) = Hm (z)H
H(z) H(−z) G(z) G(−z)
˜ (z) H ˜ ∗ (−z) H
˜ (z) G Ir 0 = = I2r . ˜ 0 Ir G∗ (−z)
(33)
From the decomposition and reconstruction formulas (20), (21) and (24), we have the following perfect reconstruction (PR) condition: ˜ ∗m (z)Hm (z) = c I2r , H
(34)
where c is a non-zero constant (a scale change in the reconstructed signal is allowed). 2
The modulation matrix is also called as the AC (alias component) matrix[8].
28
Wonkoo Kim and Ching-Chung Li
˜ G), ˜ the modulation Theorem 1. For two matrix filter pairs (H, G) and (H, ˜ matrices Hm (z) and Hm (z) are defined by ˜ ˜ H(z) H(−z) ˜ m (z) := H(z) H(−z) . Hm (z) := , H (35) ˜ ˜ G(z) G(−z) G(z) G(−z) Then
˜ ∗m (z) = H ˜ ∗m (z)Hm (z) = c I2r , Hm (z)H
(36)
where c is a nonzero constant, is the necessary and sufficient condition for the ˜ G) ˜ to be biorthogonal and to ensure the two matrix filter pairs (H, G) and (H, perfect reconstruction. If these filter pairs generate multiscaling functions and multiwavelets, then they are biorthogonal. ˜ = H and G ˜ = G, and then For orthogonal filter pairs, we have H Hm (z)H∗m (z) = H∗m (z)Hm (z) = cI2r .
(37)
Hence, Hm (z) is paraunitary (lossless), i.e., unitary for all z on the unit circle.
4
Construction of Biorthogonal Multiwavelets
Plonka and Strela constructed biorthogonal Hermite cubic (piecewise cubic polynomial) multiscaling functions and multiwavelets using the cofactor method [9,10]. The coefficient matrix −1 2 −1 −1 H(z) =
1 4(1 + z ) −2(1 − z )(1 + z ) 16 3(1 − z −1 )(1 + z −1 ) −1 + 4z −1 − z −2
(38)
generates Hermite cubic multiscaling functions, where det H(z) = (1+z −1 )4 /128. ˜ for dual functions is A possible choice of H −1 −2 −3 −2 −3 1 z − 8 + 18z − 8 + z ˜ H(z) = 2z − 8 + 8z −2 − 2z −3 32
−3z + 12 − 12z + 3z −4z + 8 + 24z −1 + 8z 2 − 4z −3
By the biorthogonality conditions, we have −1 −1 2 z ˜ G(z) = 16
and by cofactor method, −1 G(z) =
−4(1 − z ) 6(1 − z −1 )(1 + z −1 ) −1 −1 −(1 − z )(1 + z ) 1 + 4z −1 + z −2
(39)
(40)
1 1 + 8z + 18z −2 + 8z −3 + z −4 −1 − 4z −1 + 4z −3 + z −4 . −1 −3 −4 6 + 24z − 24z − 6z −4 − 8z −1 + 24z −2 − 8z −3 − 4z −4 32 (41)
The Hermite cubic multiscaling functions and multiwavelets generated by H and G are shown in Figure 4 (a)–(d). Their corresponding biorthogonal multiscaling functions and multiwavelets are shown in Figure 4 (e)–(h).
A Study on Preconditioning Multiwavelet Systems for Image Compression Hermite cubic multiscaling function 1
1
0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 0
0.8 0.6 0.4 0.2 0
0
0.5
1
1.5
(a) φ1
2
Multiscaling function dual to Hermite cubics 1
1 0.5 0 -0.5 -1
-1
-0.5 0
0.5
1
1.5
(e) φ˜1
2
2.5
3
0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 0
Hermite cubic multiscaling function 2
Hermite cubic multiwavelet 1
1.2
1
0.8
0.5 0
0.6
-0.5
0.4
-1
0.2
1
(b) φ2
1.5
2
Hermite cubic multiscaling function 2
0.5
1
1.5
(f) φ˜2
2
0
Hermite cubic multiwavelet 2
2 1.5
1
0.5
29
-1.5
0
1 0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 -1
0.5
1
1.5
(c) ψ 1
2
2.5
3
Multiwavelet dual to Hermite cubics 1
0
0.5
1
1.5
(g) ψ˜1
2
2.5
3
-2
0
0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 -0.05 0
0.5
1
1.5
(d) ψ 2
2
2.5
3
Multiwavelet dual to Hermite cubics 2
0.5
1
1.5
(h) ψ˜2
2
2.5
3
Fig. 4. Hermite cubics and their dual multiwavelets
5
Preconditioning Multiwavelet Systems
In this section we consider multiwavelet systems that analyze discrete data, and investigate how to precondition a multiwavelet system by prefiltering input data, which is not necessary for the case of single (or scalar) wavelet systems. 5.1
Prefilters and Postfilters
Consider the multiwavelet series expansion: fj (t) := cTj,k φ(2j t − k)
(42)
k
From a given 1-D signal x[n], construct a vector-valued sequence x[n] by x[nr] .. x[n] := , r ≥ 1 .
(43)
x[nr + r − 1] Let us define a prefilter Q(z), which maps a vector-valued sequence space onto itself, such that the coefficient vector sequence c0,k is obtained by filtering x[n]: c0 (z) = Q(z)x(z)
(44)
For any j ≤ 0, cj,k is decomposed to {cj−1,k , dj−1,k } by a layer of multiwavelet decomposition. Recursive multiwavelet decompositions down to a resolution level J < 0 give us a set of decomposed data sequences cJ,k and {dj,k }J≤j<0 . Recursive multiwavelet reconstruction from the decomposed data set gives the original coefficient vector c0,k . Then x(z) is reconstructed by applying a postfilter P(z): x(z) = P(z)c0 (z)
(45)
30
Wonkoo Kim and Ching-Chung Li
✲↓ ❧ 2 ✲ z −1 xj [n]
✻ ✲↓ ❧ 2 ✲
✲ c1j [n]
c1j [n] ✲
✲↑ ❧ 2✲+❤✲ xj [n] ✻ P(z) z −1 ✻ ✲↑ ❧ 2 c2j [n] ✲
Q(z)
✲ c2j [n]
(a) Prefilter
(b) Postfilter
Fig. 5. Prefilter and postfilter blocks. A unit delay and downsampling in a prefilter block (a) vectorize the 1-D input data sequence xj [n] to a vector-valued sequence, where the prefilter output [c1j [n] c2j [n]] is the input to multiwavelet decomposition filter banks. A unit delay and upsampling in a postfilter block (b) serialize the two-channel postfilter output vector sequence to the 1-D output signal xj [n], where [c1j [n] c2j [n]] are from the outputs of multiwavelet reconstruction filter banks The postfilter P must be an inverse of the prefilter Q up to some unit delays for the perfect reconstruction: P(z)Q(z) = z −l I,
for some integer l.
We may assume l = 0 (no delay) for convenience. Define x0 (z) := x(z) and xj (z) := P(z)cj (z).
(46)
(47)
Then {xj }j<0 are the projections of x into (discrete-time) multiscaling spaces at lower resolutions. This implies that a postfilter should be applied to a coefficient vector cj if we want to see a decomposed signal at the resolution level j < 0. For an r-channel multiwavelet system, the construction of a vector-valued input sequence from an 1-D signal can be implemented in a prefilter block by serial-to-parallel conversion (vectorization) by using r − 1 unit delays and then downsampling each channel at the rate r. The block diagrams of a prefilter and a postfilter blocks for a 2-channel multiwavelet system are shown in Figure 5. 5.2
Interpolation Prefilter and Postfilter
In the multiwavelet case, in order to avoid the undesirable visual effect, we need a prefilter that computes multiscaling coefficient sequence c0,k from a discretetime input signal before starting the multiwavelet decomposition[11,12,13]. In this section, we develop a process of finding a pair of prefilter and postfilter such that cT0,k φ(t − k) (48) f0 (t) := k
interpolates an original signal x0 [n]. Since we have r scaling functions, a continuous-time signal f0 (t) is sampled at the interval of 1/r at the 0-th resolution level: n n n cT0,k φ( − k) = φ( − k)T c0,k , (49) f0 ( ) = r r r k∈Z
k∈Z
A Study on Preconditioning Multiwavelet Systems for Image Compression
31
and we impose an interpolation property by f0 ( nr ) = x0 [n]. We construct vectorvalued sequences f 0 [n] and x0 [n] from the sampled sequence f0 (n/r) and the 1-D signal x0 [n], respectively: f0 (n) x0 [nr] f0 (n + 1 ) x0 [nr + 1] r x0 [n] := (50) f 0 [n] := , , .. .. . . f0 (n + r−1 x0 [nr + r − 1] r ) then the interpolation condition f0 (n/r) = x0 [n] gives the following relation: f 0 [n] = x0 [n] = Pn−k c0 [k] = Pk c0 [n − k], (51) k∈Z
k∈Z
where Pn is an r × r matrix sequence and defined by φ(n)T φ(n + 1 )T r Pn := . .. . φ(n +
(52)
r−1 T r )
This is an interpolation postfilter that maps the space of scaling coefficients cj [k] to the space of sampled signals f j [n]. At any resolution level j, a decomposed signal can be obtained by filtering scaling coefficients cj [k] by the postfilter Pn : xj [n] = Pn−k cj [k] = Pk cj [n − k]. (53) k∈Z
k∈Z
This relation is expressed in the Z-transform domain as xj (z) = P(z)cj (z),
(54)
where P(z) := n Pn z −n . By (52), Pn is a finite sequence (FIR filter) if the scaling vector function φ is compactly supported. We define a prefilter Q(z) such that Q(z)P(z) = P(z)Q(z) = Ir .
(55)
Then the scaling coefficient cj (z) is obtained by filtering the signal xj (z): cj (z) = Q(z)xj (z).
(56)
To have an FIR solution to the above condition (55), det(P(z)) must have the form of det(P(z)) = αz −l , where α is a constant and l is an integer. For the GHM orthogonal multiwavelet system, an interpolation postfilter P is obtained from the GHM scaling functions (Figure 1(a) & (b)): −1 P(z) =
0 1.73210618015z 1.95965444133 −0.519631854046 − 0.519631854046z −1
(57)
32
Wonkoo Kim and Ching-Chung Li
The corresponding prefilter Q is computed from the condition P(z)Q(z) = I, Q(z) = P−1 (z) =
0.1530923245z + 0.1530923245 0.5103077369 0.5773517497z 0.
(58)
For the cardinal 2-balanced (also 3-balanced or 4-balanced) orthogonal multiwavelet system, we obtain the postfilter and prefilter as √ −2 √ 0 P(z) = √ −1 2z
2z 0
,
Q(z) =
0√ z/ 2 . z2/ 2 0
(59)
The biorthogonal Hermite cubic multiwavelet system does not give a stable prefilter for an interpolation postfilter. In this case, we need to design a different pair of prefilter and postfilter for those systems. One possible solution is to design an orthogonal prefilter. 5.3
Orthogonal Prefilter A prefilter Q(z) := n Qn z −n is said to be orthogonal if
Q ∗ c = c
(60)
for all c ∈ 2 (Z)r , where Q is an impulse response (a sequence of r × r matrices) of Q(z) and ∗ denotes a discrete (matrix) convolution operator. The above condition Q(z)c(z) = c(z) is equivalent to the paraunitary condition of Q(z): Q(z)Q(z −1 )T = I.
(61)
An FIR filter Q(z) is paraunitary if and only if it is of the form Q(z) = Q(1)
N
(I − Pi + Pi z i ),
(62)
i=1
where Q(1) is an orthogonal (unitary) matrix, "i = ±1, and Pi for i = 1, ..., N are orthogonal (unitary) matrices [8]. Higher approximation orders will give quite complex relations, so here we consider a prefilter only up to the approximation order 2. Then, for a minimal filter length (N = 2), we need to find P1 and P2 such that Q(z) = Q(1)(I − P1 + P1 z)(I − P2 + P2 z) satisfies the above orthogonality condition. A delay factor z −2 may be introduced to make Q(z) causal. An example of orthogonal prefilter of approximation order 2 for the GHM orthogonal multiwavelet system is given by Q(z) := Q0 + Q1 z −1 , where 0.11942337067748 0.99158171438258 , Q0 = 0.04967860804828 −0.00598315472909 −0.00598315472909 −0.04967860804828 Q1 = . (63) 0.9915817143825 −0.11942337067748
A Study on Preconditioning Multiwavelet Systems for Image Compression
33
Table 1. Compression performances of wavelet systems
CR 2 4 8 16 32 64 128 256 Prefilter
6
PSNR [dB] Multiwavelets Orthogonal Biorth. GHM (i) GHM (o) CardBal2 H-Cubics 48.929 47.933 48.317 44.262 41.012 40.500 41.327 36.964 36.126 35.717 37.041 32.212 32.259 31.887 32.922 29.004 28.786 28.348 29.296 26.730 26.031 25.590 26.070 23.847 23.379 23.036 23.381 21.414 20.566 20.572 20.785 20.453 Inter. Orth. Inter. Orth.
Single Wavelets Orthogonal Biorth. D4 D6 Bin9-7 47.410 48.232 49.162 39.483 41.233 42.388 34.762 36.874 38.626 30.956 32.568 35.481 27.617 28.810 31.799 24.991 25.532 27.535 22.688 23.106 24.121 20.559 20.672 20.712 N/A
Compression Performances
Multiwavelet systems have been explored for applications to data compression and image processing [5,13,14,15,16]. With the prefilers and postfilters that we have designed for multiwavelet systems, we have investigated the applications of these systems to image compression and examined their compression performances. Experimental studies are performed on the level of compression performances of three multiwavelet systems (two orthogonal multiwavelets, GHM and cardinal balanced, and one biorthogonal multiwavelet, Hermite cubics) in comparison to some single wavelet systems (Daubechies’ D4 and D6 orthogonal wavelets and binary 9-7 biorthogonal wavelet). We consider a simple compression scheme with a uniform quantizer, which removes a certain number of small values from highpassed subimages but keeps the larger values to achieve a specified compression ratio (CR). We used six test images (5125128-bit) of Lena, Airplane, Baboon, Peppers, Sailboat, and Wavy in our experiments. Our experiments suggested that wavelet decomposition up to the 3rd or 4th level would give a reasonably high compression ratio and a good reconstruction. To describe the image fidelity, PSNR (peak signal-to-noise ratio) is defined by M M 1 (64) (f [i, j] − s[i, j])2 , PSNR [dB] := 20 log 255/ M N i=1 j=1 where f is a M × N noisy or distorted image (decompressed or reconstructed image) and s is the M × N original image. The PSNR values shown in Table 1 are the average values taken from the experimental results for the six test images at each given compression ratio. The image compression performances of orthogonal wavelet systems are shown in Figure 6(b) and some biorthogonal wavelet systems in Figure 6(a). In orthoronal wavelet systems, multiwavelet systems perform better than single wavelet systems with comparable support lengths.
34
Wonkoo Kim and Ching-Chung Li
Compression Performance 50
GHM (int-pre) GHM (orth-pre) CardBal2 (int-pre) H-Cubics (orth-pre) D4 D6 Bin9-7
45
PSNR [dB]
40
35
30
25
20 2
4
8
16 32 Compression Ratio
64
128
256
(a) Biorthogonal systems Compression Performance 50
GHM (int-pre) GHM (orth-pre) CardBal2 (int-pre) D4 D6
45
PSNR [dB]
40
35
30 28dB 25
20 2
4
8
16 32 Compression Ratio
64
128
256
(b) Orthogonal systems Fig. 6. Compression performances of wavelet systems
However, the binary 9-7 biorthogonal single wavelet system significantly outperforms other wavelet systems, because it has a higher order of approximation and symmetric functions. The biorthogonal Hermite cubic multiwavelet system with an orthogonal prefilter of approximation order of 2 did not give a desirable compression performance. The reason is that this orthogonal prefilter is not a good approximation to an interpolation prefilter because of its lower approximation order (only 2) while the Hermite cubics have the 4th order approximation. We have yet to find a good biorthogonal multiwavelet filters and prefilters.
A Study on Preconditioning Multiwavelet Systems for Image Compression
7
35
Conclusion
In this paper, multiwavelet systems are applied to image compression. Each line of image data is vectorized for r channel inputs of a multiwavelet system. A general method of prefiltering the inputs has been formulated to provide data to the multiwavelet filter bank, which should enable the reconstruction of the original data after postfiltering. A design process for interpolation prefilter-postfilter, if exist, has been developed, which will provide a better approximation image at each coarser resolution level. These filters must be of the finite impulse response type, or else, an orthogonal prefilter of some approximation order can be designed. The prefilters and postfilters have been designed for 3 multiwavelet systems (GHM, cardinal balanced, and Hermite cubics). Using these filters, image compression performances of orthogonal multiwavelet systems have been shown to be better than those of the scalar orthogonal wavelet systems.
References 1. Chui, C. K.: An Introduction to Wavelets. Volume 1 of Wavelet Analysis and Its Applications. Academic Press (1992) 23 2. Geronimo, J. S., Hardin, D. P., Massopust, P. R.: Fractal functions and wavelet expansions based on several scaling functions. Journal of Approximation Theory 78 (1994) 373–401 23 3. Donovan, G. C., Geronimo, J., Hardin, D. P.: Intertwining multiresolution analyses and the construction of piecewise polynomial wavelets. SIAM Journal of Mathematical Analysis 27 (1996) 1791–1815 23 4. Donovan, G., Geronimo, J. S., Hardin, D. P., Massopust, P. R.: Construction of orthogonal wavelets using fractal interpolation functions. SIAM Journal of Mathematical Analysis 27 (1996) 1158–1192 23 5. Strela, V., Walden, A. T.: Orthogonal and biorthogonal multiwavelets for signal denoising and image compression. SPIE Proc. 3391 AeroSense 98, Orlando, Florida, April 1998 (1998) 24, 33 6. Selesnick, I. W.: Interpolating multiwavelet bases and the sampling theorem. IEEE Trans. on Signal Processing 47 (1999) 1615–1621 24 7. Chui, C. K., Lian, J.: A study on orthonormal multi-wavelets. J. Appl. Numer. Math. 20 (1996) 273–298 24 8. Vaidyanathan, P. P.: Multirate Systems and Filter Banks. Prentice-Hall, New Jersey (1993) 27, 32 9. Strela, V.: Multiwavelets: Theory and Applications. PhD thesis, Massachusetts Institute of Technology, Cambridge, Mass. (1996) 28 10. Plonka, G., Strela, V.: Construction of multiscaling functions with approximation and symmetry. SIAM Journal of Mathematical Analysis 29 (1998) 481–510 28 11. Xia, X. G., Geronimo, J. S., Hardin, D. P., Suter, B. W.: Design of prefilters for discrete multiwavelet transforms. IEEE Trans. on Signal Processing 44 (1996) 25–35 30 12. Xia, X. G.: A new prefilter design for discrete multiwavelet transforms. IEEE Trans. on Signal Processing 46 (1998) 1558–1570 30 13. Miller, J. T., Li, C. C.: Adaptive multiwavelet initialization. IEEE Trans. on Signal Processing 46 (1998) 3282–3291 30, 33
36
Wonkoo Kim and Ching-Chung Li
14. Xia, T., Jiang, Q.: Optimal multifilter banks: design, related symmetric extension transform and application to image compression. IEEE Trans. on Signal Processing 47 (1999) 1878–1889 33 15. Jiang, Q.: On the design of multifilter banks and orthogonal multiwavelet bases. IEEE Trans. on Signal Processing 46 (1998) 3292–3303 33 16. Strela, V., Heller, P., Strang, G., Topiwala, P., Heil, C.: The application of multiwavelet filter banks to image processing. IEEE Trans. on Image Processing 8 (1999) 548–563 33
! " # $%&'( ')**+*',% -.# $%&'( ',*'+%/,0 # 1 233
334
1+ 5 1 4 3 . 1 4 5 1 4 3 6
4 . 1 4 13 " 1 5 5 . 3
!"
# $ % $ !" % " " " & '( '()*+ #+,)*+,- #! " $$ %% % % $ % !" " "" " $ " $ " $ $ " " $ " $ %. " " % / 0 123 " $ 4)5 $ $ % % " $ $% " ! 4+5 $ " % !" " ! % " $$ ! 4-5 $ !" " "" % % " " $ 465 $ " "" 7 " ! $ $ " !"" " "% $ % "" % % " % $ 8 % " ! " % %% $ $ " " $ % " 9 2 % 0923 " % 7 % " " $ $ " $ % %%
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 37-43, 2001. c Springer-Verlag Berlin Heidelberg 2001
38
Wing-kuen Ling and Peter Kwong-shun Tam
7 " % % " $ " $
% !" $ $ ⋅⋅⋅⋅ !" " $ " % $ " % ' " $ .
= ∑ ∑ ⋅ ⋅
0)3
= =
!" " ! % " ! % " ! % $ % ! %!. ⋅ = ⋅ ⋅
⋅
⋅
⋅
⋅ ⋅ ⋅ ⋅
0+3
" $ % % % % !" ⋅⋅ !" " % $ ⋅⋅ !" !"
8 "% " $$ % $" " " $
" % " "% $ " "% $ $ % 4:5 %!. =
0-3
7% " % " $$ "" " " ;2 %% % " $$ " " % % " "% $ $ % 2 ! " % %% 0 3 " "% $
! "
$ " ! % " "% $ $ " !
% " % $ $ " ! % " "% $ $ " ! % " % $ < " $ " " % %% " "% $ " $ % " " % 7 " $ % " " $ % " $ $ # ! " % " "% $ ! " < " "
Reduction of Blocking Artifacts in Both Spatial Domain and Transformed Domain
39
! " $ ! " $ " 7 " % %% " "" % %% % " $ % $ " $ ! "" % # " $ % ! " " "" % " % < " % %% ! " $ ! " ; " " $ ! " ! 0+3 $ ! " % ! %!. ⋅( −)+ ⋅( −)+ = ⋅( −)+ ⋅( −)+
⋅( −)+ ⋅( −)+ + ⋅ ⋅( −)+ ⋅( −)+
⋅
063
!" " ! % " " $ $ " % % ⋅ ⋅ !" " ! % " "" % " " % % " "% $ " ! % " % ! % " 7 % " ! ! % %% 03 % " % % %% 03 " %!. " % ! " ; ! " "" % % " "% $ ! " " " ⋅ ⋅
⋅ ⇒ ⋅ < ) "! " $ % " "
6 4 +1 4 5 " " 9 2 % 0923 " % 03 %!. 7 $ "! " " 92 %% % " $ " % " ! % % " # ! =4 5 ; =4 5 ⋅ 7
$
"!
"
8 " 792 "
⋅
⋅ !" " 92
" 792 $ "! " " " $
40
Wing-kuen Ling and Peter Kwong-shun Tam
% " $ " % % ! !" ! "! % + " % $ ! " " " " $ " " $ %
"
" +1 4 5 " # $ "! " " % %% % " $ " % # ! =4 5 " %% 03 " " $% % 7 $ "! " ! " $ 03 ! $ % ! $ !$ $ $ "! % -
Reduction of Blocking Artifacts in Both Spatial Domain and Transformed Domain
41
%
+1 4 5
#
$ " 92 $ " " $ " " & '(
>%%? >2? >? % " " " %% % " " $ $" < " " $ % " " " % " % " ! $ " % " " $$ " $ % $ " " 777 "% ! " %! " " . 7% " % "$ $ 92 $ " " " $ " ! " " % $ " 0#$%3 % " $ %!. =
⋅ ∑ [ ( ) − ( )] ( )∈
0:3
!" & " ' " ( " !" " % "$ 92 $ & " $ % ( $ ) "! " % " % " $ " 7 $ % $ ) " " $ " " % " " " "! " % 6 "! " " $ " " % " "
42
Wing-kuen Ling and Peter Kwong-shun Tam
%#
! ! "" # $ #
!
.
"#$%&'()*+,"'/(''()*+,0$"$ 1&'()* 0$21'$ 1&'()*
!
! ! .
! , ,.
,
! .
. ! !,
,!
)# 7 1 .
%
& 7 " ! $ $ " !"" " "%
$ "" % " % $ 8 % " ! " % %% ! $ $ " " $ " " % "" $$ " $ % "" "" "
Reduction of Blocking Artifacts in Both Spatial Domain and Transformed Domain
43
$$ " " $ " $ " " " " 9 2 % " % "! " " $ % % <" " ! ! " %% " % " $ % " $ % % %% " % " " % %% ! $
! ' " ! $ " ! $ $ % " # " @ !" $ (AB,C
) D" ;. 7 % % $ %% % 7''' 2 % A " A + E ) 0)BB+3 B)B: + # 9 #. " 1. % !" $ %% 7''' ; " A -F E 6 0)BCB3 ::-::B - G # 2 & . G % $ %% 1 ' A +- E ) 0)BC63 -6-F 6 H # 2 # . 8 %% % & '( $ % 7''' 7 A F E + 0)BBC3 ++B+-6 : D 8. ; " % $ %% 92 % " )BBB 7''' 7 2 072;3 A 6 0)BBB3 6,6B
Simple and Fast Subband De-blocking Technique by Discarding the High Band Signals Wing-kuen Ling and P. K. S. Tam
Department of Electronic and Information Engineering The Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong Hong Kong Special Administrative Region, China Tel: (852) 2766-6238, Fax: (852) 2362-8439 Email:
[email protected]
Abstract. In this paper, we propose a simple and fast post-processing de-blocking technique to reduce blocking artifacts. The block-based coded image is first decomposed into several subbands. Only the low frequency subband signals are retained and the high frequency subband signals are discarded. The remaining subband signals are then reconstructed to obtain a less blocky image. The ideas are demonstrated by a cosine filter bank and a modulated sine filter bank. The simulation result shows that the proposed algorithm is effective in the reduction of blocking artifacts.
1
Introduction Transform codecs, such as those based on the Discrete Cosine Transform (DCT), are simple codecs widely applied in the
industry. However, they usually produce undesirable blocking artifacts at high compression ratios. This is because each block in an image is transformed independently, and the correlation among adjacent blocks is not exploited. Thus, at a high compression ratio, quantization errors lead to blocking artifacts. In order to tackle this problem, the lapped transform before encoding was proposed to capture the correlation information among the adjacent blocks [8]. However, this pre-processing technique requires a decrease of compression ratio and so it is not adopted in the international standard. Some subband de-blocking techniques [2, 3, 4] have also been proposed, but they are too complex in terms of implementation and computation. In this paper, we propose a simple and fast post-processing subband de-blocking technique, which discards some high band signals and retains the remaining low band signals. The algorithm is tested by the cosine filter banks and the modulated sine filter banks. The simulation results show that this algorithm can suppress the blocking artifact effectively in both the quantitative measurement and the qualitative evaluation.
2
De-blocking System Since a block-based transform and a lapped transform can be viewed as a discrete time linear time periodic varying system,
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 44-48, 2001. c Springer-Verlag Berlin Heidelberg 2001
Simple and Fast Subband De-blocking Technique
45
it can be realized by a filter bank structure [1]. Due to the fact that block edges always contain high frequency components [5], we propose to retain the low frequency band signals and discard the high band frequency signals. The more low band signals are retained, the more block boundaries will be captured in the reconstructed image. However, the image details will be destroyed if we only keep a very little subset of the subband signals. We have conducted an intensive simulation and found that the best performance corresponds to retain two subband signals and discard the remaining high band signals. The block diagram of the subband de-blocking system is shown in figure 1. There are many ways to select the analysis filters, h j[n], for j=0,1, …,M, and the synthesis filters, fj[n], for j=0,1,…,M, where the quantizers are designed as Qj(x)=x, for j=0,1, and Q j(x)=0, for j=2,3, …,M. The design of the filters should give a perfect reconstruction system when the quantizers are removed. This is because the error introduced due to the filter bank structure is illuminated in the perfect reconstruction system. In this paper, a cosine filter bank [6] and a modulated sine filter bank [7] are selected to demonstrate this idea. v1 [n] h0 [n]
w0 [n] ↓M
v0 [n]
q0 [n] ↑M
w1 [n] ↓M
h1 [n]
p0 [n] Q0
f 0 [n]
q1 [n]
p1 [n] ↑M
Q1
f M-1[n]
z[n]
x[n]
vM-1[n] hM-1[n]
↓M
pM-1[n]
wM-1[n]
qM-1[n] ↑M
QM-1
f 1 [n]
Fig. 1. Block diagram of subband de-blocking technique
2.1
Cosine Filter Bank
The impulse responses of the synthesis filters, fj[n], for j=0,1, …,7, are the transform basis functions of the DCT and the impulse responses of the analysis filters, h j[n], for j=0,1, …,7, are equal to the time-reversed basis functions [6] as follows: p ⋅ j ⋅ (2 ⋅ n + 1) , h j [n ] = aj ⋅ cos 16
(1)
for j=0,1,º,7 and for n=0,1,º,7, where: aj =
1 8 1 2
; j = 0, ; otherwise,
p ⋅ j ⋅ (15 − 2 ⋅ n ) , f j [n ] = aj ⋅ cos 16
(2)
for j=0,1,º,7 and for n=0,1,º,7.
2.2
Modulated Sine Filter Bank
The modulated sine filter bank is similar to the cosine filter bank except that the impulse responses of the synthesis filters, fj[n], for j=0,1, …,7, are the transform basis functions of the modulated sine transform and the impulse responses of the analysis filters, h j[n], for j=0,1, …,7, are equal to its time-reversed basis functions [7] as follows: h j [n ] =
p 15 1 1 23 p ⋅ sin ⋅ − n ⋅ cos ⋅ j + ⋅ − n , 2 2 2 16 2 8
(3)
46
Wing-kuen Ling and Peter Kwong-shun Tam
for j=0,1,º,7 and for n=0,1,º,7, f j [n ] =
p p 1 1 1 9 , ⋅ sin ⋅ n + ⋅ cos ⋅ j + ⋅ n + 2 2 2 2 8 16
(4)
for j=0,1,º,7 and for n=0,1,º,7.
3
Simulation Results The proposed de-blocking technique is applied to the JPEG-coded image “Cancer” of size 512x512 adaptively. The
effectiveness of the proposed algorithm can be estimated by both the quantitative measurement and the qualitative evaluation. For the quantitative measurement, the blocking artifact is mainly due to the grid noise in the monotone areas. Since the intensity of the monotone areas of most natural image change very slowly, but there is a tendency for the intensity in the blockbased coded image to change abruptly from one block to another, we propose the following methodology to measure this effect: If the four neighbor 8x8 image blocks are all DC blocks, that is, all the pixel values in the individual blocks are constant, then we sum up the error square in these four blocks, and finally we compute the mean square error (MSE) of all these blocks as follows: MSE =
1 , ⋅ ∑ [R(i, j ) − O (i, j )]2 N (i , j )∈Q
(5)
where O is the original image, R is the reconstructed image, Q is the region where there are four neighbor 8x8 DC blocks and N is the total number of pixels in Q. Table 1 shows the comparison of the results of applying existing methods and our proposed de-blocking technique. It can be seen from table 1 that our proposed algorithm gives better quantitative results than that of the existing methods. The qualitative results shown in figure 2 also demonstrates that our proposed algorithm gives a better image quality than that of the existing methods.
JPEG coded image DCT zero-masking technique [5] DCT coefficient weighting technique [5] Cosine de-blocking technique Modulated sine de-blocking technique
Cancer(0.139bpp) 22.6819 19.2036 19.0579 17.9483 18.4175
Table 1. Simulation results calculated by MSE of applying existing methods and our proposed algorithms
4
Concluding Remarks In this paper, we have proposed a simple and fast post-processing subband de-blocking technique, which discards the high
band signals and only retains the lowest two low band signals. This algorithm is tested by a cosine filter bank and a modulated sine filter bank. The simulation results show that our proposed method is very effective. Since it adopts the existing transform codec and do not affect the compression ratio, the proposed algorithm can be applied to the enhancement of very high compression ratio block-based coded images. The given image can be first compressed to a very high compression ratio image through the block-based coder, and then the blocky image is enhanced by the proposed algorithm. Further research work will focus on the finding of the best filter bank that gives the highest coding gain.
Simple and Fast Subband De-blocking Technique
JPEG-coded Image
Original Image
Image Processed by Zero-masking Technique
50
50
50 100
100
100
150
150
150
200
200
200
250
250
250
300
300
300
350
350
350
400
400
400
450
450
450
500
100
200
300
400
500
500
Image Processed by DCT Coefficient Weighting Technique
100
200
300
400
500
Image Processed by Cosine De-blocking Technique
500
50
50
50
100
100
150
150
150
200
200
200
250
250
250
300
300
300
350
350
350
400
400
400
450
450
450
500
100
200
300
400
500
100
200
300
400
500
Image Processed by Modulated Sine De-blocking Technique
100
500
47
500 100
200
300
400
500
100
200
300
400
500
Fig. 2. Simulation results of the comparison of the existing methods and our proposed algorithms
Acknowledgement The work described in this paper was substantially supported by a grant from the Hong Kong Polytechnic University with account number G-V968.
References 1. Malvar H. S.: Extended Lapped Transforms: Properties, Applications, and Fast Algorithms. IEEE Transactions on Signal Processing, Vol. 40, No. 11. (1992) 2703-2714. 2. Sung W. H., Chan Y. H. and Siu W. C.: Subband Adaptive Regularization Method for Removing Blocking Effect. in Proc. ICIP, Vol. 2. (1995) 523-526. 3. Rabiee H. R. and Kashyap R. L.: Image De-blocking with Wavelet-Based Multiresolution Analysis and Spatially Variant OS Filters. in Proc. ICIP, Vol. 1. (1997) 318-321. 4. Hsung T. C., Lun P. K. and Siu W. C.: A Deblocking Technique for JPEG Decoded Image Using Wavelet Transform Modulus Maxima Representation. in Proc. ICIP, Vol. 1. (1996) 561-564. 5. Ling W. K. and Zeng B.: A Novel Method for Blocking Effect Reduction in DCT-Coded Images. in Proc. ISCAS, Vol. 4. (1999) 46-49. 6. Malvar H. S.: Lapped Transforms for Efficient Transform/ Subband Coding. IEEE Transactions on Acoustics, Speech, and
48
Wing-kuen Ling and Peter Kwong-shun Tam
Signal Processing, Vol. 38, No. 6. (1990) 969-978. 7. Malvar H. S.: Efficient Signal Coding with Hierarchical Lapped Transforms. in Proc. ICASSP, Vol. 3. (1990) 1519-1522. 8. Malvar H. S. and Staelin D. H.: The LOT: Transform coding without blocking effects. IEEE Transactions on Acoustics, Speech, and Signal processing, Vol. 37. (1989) 553-559.
A Method with Scattered Data Spline and Wavelets for Image Compression Guan L¨ utai and Lu Feng Department of Scientific Computing and Computer Applications Zhongshan University, Guangzhou 510275, P. R. China
Abstract. This paper presents a method for image compression. First, selecting some scattered data points on some lines of a plane to construct an interpolating spline surface approach to the image, then, one kind of wavelets for this spline function is given. By different codes to spline and wavelets, an image compression finished.
1
Introduction
In [1], we discussed spline-wavelets of plane scattered data for data compression. The basic idea is using spline interpolation first, then, by spline-wavelets for data compression. From [3]-[7], some different multivariate spline interpolation for scattered data were given. Think of image data be in some lines, we can simplify the local support multivariate splines to scattered data in Hilbert space of [1] in this case. In this paper, a method for image compression is presented. To select some scattered data points on some lines to construct an interpolating spline approach surface for the image first, then to give a spline-wavelets decomposition for the spline function using different codes to the spline and wavelets to finish image compression.
2
Polynomial Natural Spline Local Basis Interpolation for Large Scattered Data on Some Lines
Problem I: Given scattered data points on some lines of a plane (x_i, y_{i,j}), i = 1, 2, \ldots, n_0; j = 1, 2, \ldots, m_i, and real numbers z_{i,j}, i = 1, 2, \ldots, n_0; j = 1, 2, \ldots, m_i, find a function f(x, y) \in H^{mn}(R) \cap D_z satisfying

J_1(f) = \min_{u \in H^{mn}(R) \cap D_z} J_1(u)

where H^{mn}(R) = \{ u(x, y) \mid \partial^{m+n} u / \partial x^m \partial y^n \in L^2(R), \ \partial^{\alpha+\beta} u / \partial x^\alpha \partial y^\beta
is an absolutely continuous function, \alpha = 0, \ldots, m-1, \beta = 0, \ldots, n-1; (x, y) \in R = [a, b] \times [c, d] \}, and D_z = \{ u(x, y) \mid u(x_i, y_{ij}) = z_{ij}, i = 1, \ldots, n_0, j = 1, \ldots, m_i \}.

Let u^{(m,n)}(x, y) = \partial^{m+n} u(x, y) / \partial x^m \partial y^n. Then

J_1(u) = \iint_R (u^{(m,n)}(x, y))^2 \, dx \, dy + \int_a^b \sum_{\nu=0}^{n-1} (u^{(m,\nu)}(x, c))^2 \, dx + \int_c^d \sum_{\mu=0}^{m-1} (u^{(\mu,n)}(a, y))^2 \, dy
We call the solution of this Problem I the natural interpolation spline function for scattered data on some lines.

Theorem 1. A natural interpolation spline function for scattered data on some lines f(x, y) has the following explicit and closed-form expression:

f(x, y) = \sum_{i=1}^{n_0} \sum_{j=1}^{m_i} \alpha_{ij} G_1(x, x_i) G_2(y, y_{i,j}) + \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} c_{ij} x^i y^j

where

G_1(x, t) = \frac{(t - x)_+^{2m-1}}{(2m-1)!} + \sum_{\mu=0}^{m-1} \frac{(x - a)^\mu}{\mu!} \left\{ \frac{(a - t)^{2m-\mu-1}}{(2m-\mu-1)!} + (-1)^{m-\mu} \frac{(a - t)^\mu}{\mu!} \right\}

G_2(y, \tau) = \frac{(\tau - y)_+^{2n-1}}{(2n-1)!} + \sum_{\nu=0}^{n-1} \frac{(y - c)^\nu}{\nu!} \left\{ \frac{(c - \tau)^{2n-\nu-1}}{(2n-\nu-1)!} + (-1)^{n-\nu} \frac{(c - \tau)^\nu}{\nu!} \right\}

Let

B_{ij}(y) = \begin{vmatrix} 1 & y_{i,j} & \cdots & y_{i,j}^{2n-1} & G_2(y, y_{i,j}) \\ 1 & y_{i,j+1} & \cdots & y_{i,j+1}^{2n-1} & G_2(y, y_{i,j+1}) \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & y_{i,j+2n} & \cdots & y_{i,j+2n}^{2n-1} & G_2(y, y_{i,j+2n}) \end{vmatrix},
If j + 2n > m_i, then for 0 < \varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_{2n}, let y_{i,m_i+1} = y_{i,m_i} + \varepsilon_1, \ldots, y_{i,m_i+2n} = y_{i,m_i} + \varepsilon_{2n}. We can prove that B_{i,j}(y) is a B-spline with knots y_{i,j}, y_{i,j+1}, \ldots, y_{i,j+2n}.

Theorem 2. A natural interpolation spline function for scattered data on some lines f(x, y) has the following local basis explicit and closed-form expression:

f(x, y) = \sum_{i=1}^{n_0} \sum_{j=1}^{m_i} \alpha_{ij} G_1(x, x_i) B_{ij}(y) + \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} c_{ij} x^i y^j
Theorem 3. The coefficients \{\alpha_{ij}\}_{i=1,\ldots,n_0; j=1,\ldots,m_i} and \{c_{ij}\}_{i=0,\ldots,m-1; j=0,\ldots,n-1} of a natural interpolation spline function for scattered data on some lines f(x, y) can be solved from the following linear system:

\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} \Lambda \\ C \end{pmatrix} = \begin{pmatrix} Z \\ 0 \end{pmatrix}

Here Z is the given vector of real numbers \{z_{ij}\}, i = 1, \ldots, n_0; j = 1, \ldots, m_i; \Lambda is the unknown coefficient vector (\alpha_{ij})_{i=1,\ldots,n_0; j=1,\ldots,m_i} and C = (c_{ij})_{i=0,\ldots,m-1; j=0,\ldots,n-1}. The elements of matrix B are b_{ij,\mu\nu} = x_i^\mu y_{ij}^\nu, \mu = 0, \ldots, m-1; \nu = 0, \ldots, n-1; i = 1, \ldots, n_0; j = 1, \ldots, m_i. The elements of matrix A are a_{\alpha\beta,ij} = G_1(x_\alpha, x_i) B_{ij}(y_{\alpha,\beta}), \alpha, i = 1, \ldots, n_0; \beta = 1, \ldots, m_\alpha; j = 1, \ldots, m_i, and 0 is a zero matrix.
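Theorem 3 reduces the interpolation problem to a symmetric block linear system. The following is a minimal sketch of how that system could be assembled and solved numerically, assuming the matrices A and B and the data vector Z have already been evaluated as described above; the function and variable names are illustrative, and a dense direct solve is used here only for clarity (the algorithm of the next section uses a generalized conjugate gradient iteration and exploits sparsity).

```python
import numpy as np

def solve_natural_spline(A, B, z):
    """Assemble and solve the block system of Theorem 3:
        [ A   B ] [ Lambda ]   [ Z ]
        [ B^T 0 ] [   C    ] = [ 0 ]
    A : (N, N) array with entries G1(x_alpha, x_i) * B_ij(y_alpha_beta)
    B : (N, p) array with entries x_i**mu * y_ij**nu, p = m * n
    z : (N,) array of interpolation values z_ij
    Returns the spline coefficients alpha and the polynomial coefficients c.
    """
    N, p = B.shape
    K = np.block([[A, B],
                  [B.T, np.zeros((p, p))]])
    rhs = np.concatenate([z, np.zeros(p)])
    sol = np.linalg.solve(K, rhs)
    return sol[:N], sol[N:]
```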
3 Algorithm for Polynomial Natural Interpolation Splines on Some Lines
An algorithm for constructing an interpolating spline surface approximation to the image on some lines is given as follows:
1) Select suitable points on some lines: (x_i, y_{ij}), i = 1, \ldots, n_0; j = 1, \ldots, m_i. For an image with k rows and l columns (k < l), we use the even rows as the lines. On every line, we take the image points where the color changes suddenly as our suitable points, and then add one or two points between pairs of image points of similar color.
2) For m = n = 1, or m = 1, n = 2, or m = n = 2, compute the non-zero entries of matrix A and matrix B:

when m = 1, G_1(x_\alpha, x_i) = (x_i - x_\alpha)_+ + a - x_i - 1;

when m = 2, G_1(x_\alpha, x_i) = \frac{(x_i - x_\alpha)_+^3}{6} + \frac{(a - x_i)^3}{6} + 1 + (x_\alpha - a)\left[\frac{(a - x_i)^2}{2} - a + x_i\right].

B_{ij} is a B-spline; when n = 1 it is a piecewise polynomial of degree one, and when n = 2 it is a piecewise polynomial of degree three. We can use the following recursion (a code sketch follows after step 3):

B_{ij}^n(y) = \frac{y - y_{ij}}{y_{i,j+n} - y_{i,j}} B_{ij}^{n-1}(y) + \frac{y_{i,j+n+1} - y}{y_{i,j+n+1} - y_{i,j+1}} B_{i,j+1}^{n-1}(y), \qquad B_{ij}^0(y) = \begin{cases} 1 & y \in [y_{ij}, y_{i,j+1}] \\ 0 & \text{otherwise} \end{cases}
3) Use the generalized conjugate gradient acceleration iteration method to find the solution of the spline interpolation problem (Theorem 3).
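As a complement to step 2), the B-spline recursion quoted above can be evaluated directly. The snippet below is a minimal sketch of that recursion (Cox-de Boor form) under the assumption of a simple one-dimensional knot sequence; the names and the example knot vector are illustrative only.

```python
import numpy as np

def bspline(y, knots, j, degree):
    """Evaluate the B-spline B^degree_j(y) on the knot sequence 'knots'
    using the recursion quoted in step 2)."""
    if degree == 0:
        return 1.0 if knots[j] <= y < knots[j + 1] else 0.0
    left = right = 0.0
    if knots[j + degree] != knots[j]:
        left = ((y - knots[j]) / (knots[j + degree] - knots[j])
                * bspline(y, knots, j, degree - 1))
    if knots[j + degree + 1] != knots[j + 1]:
        right = ((knots[j + degree + 1] - y) / (knots[j + degree + 1] - knots[j + 1])
                 * bspline(y, knots, j + 1, degree - 1))
    return left + right

# Example: a cubic B-spline (the n = 2 case of the paper) on uniform knots.
knots = np.arange(8.0)
samples = [bspline(t, knots, 0, 3) for t in np.linspace(0.0, 4.0, 9)]
```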
4 Wavelets for the Polynomial Natural Interpolation Splines on Some Lines
Just as in the univariate case [2] and the bivariate case [1], we can define a multiresolution analysis for the polynomial natural interpolation spline on some lines. Then
a theorem stating that the polynomial natural interpolation spline basis on some lines is a basis of the scale function space is given, and its dimension is discussed. Let S_{2m,2n} be the spline space with natural spline local basis on some lines of (2m, 2n) order, and let S^0_{2m,2n} be the subspace of S_{2m,2n} with zero conditions at the refinement points on some lines. A theorem similar to [1] is obtained: the (m, n) order differential operator D^{m,n} maps S^0_{2m,2n} onto the wavelet space for the local basis polynomial natural spline on some lines. Then, using the Lagrange interpolation method, a wavelet basis of this wavelet space is constructed.
5 Image Compression Algorithm
Using the local basis polynomial natural splines on some lines and the wavelets for these splines, an algorithm for image compression is given as follows: 1) Using the algorithm of Section 3, select suitable points on some lines and construct the local basis polynomial natural spline interpolant on some lines to approximate the image data. 2) Treat these data by wavelet decomposition; these wavelets are the wavelets for the local basis polynomial natural spline on some lines of Section 4. 3) Use different coding methods to compress these data.
6 Conclusion
Owing to the local support property of the B-splines, the coefficient matrix is sparse, and it contains more zero elements than the matrix in [1].
Acknowledgements This work is supported by the Natural Science Foundation of Guangdong (9902275) and the Foundation of the Zhongshan University Advanced Research Centre.
References
1. Lütai Guan, Spline-wavelet of plane scattered data for data compression, in "ICMI'99 Proceedings", Hongkong Baptist University (1999) VI-127-VI-131 (ISBN 962-85415-2-8).
2. Lütai Guan, Spline-wavelets of free knots for signal processing, in "Proceeding of ICSP'96", eds. Yuan Baozhong & Tang Xiaofang, IEEE Press (1996) 311-314.
3. Chui, C. K. and L. T. Guan, Multivariate polynomial natural splines for interpolation of scattered data and other applications, in "Workshop on Computational Geometry", eds. A. Conte et al., World Scientific (1993) 77-96.
4. Lütai Guan, Bivariate polynomial natural spline for smoothing or generalized interpolation of the scattered data, Chinese J. of Num. Math & Appl. 16:1 (1994) 1-14.
5. Dong Hong, Recent progress on multivariate spline, in "Approximation Theory: in memory of A. K. Varma", eds. N. K. Govil et al., Marcel Dekker Inc., N.Y. (1998) 265-291.
A Wavelet-Based Preprocessing for Moving Object Segmentation in Video Sequences Li-Chang Liu1, Jong-Chih Chien1, Henry Y. Chuang 2, and Ching-Chung Li1 1
Department of Electrical Engineering, University of Pittsburgh Pittsburgh, PA 15261, USA {lilst4,jocst4}@pitt.edu,
[email protected] 2 Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15261, USA
[email protected]
Abstract. A simple preprocessing method for extracting boundary regions of moving objects in a video sequence is presented. We use Chui's oversampled shift-invariant wavelet transform and the multiresolution motion estimation and compensation in the wavelet domain. Dominant prediction errors often appear along the boundary of a moving object. Our algorithm is developed to detect boundary regions at a coarse scale by utilizing the prediction error information provided in all subband images at the coarse resolution. This is taken as our first step toward the video object segmentation for use in the wavelet-based MPEG-4.
1 Introduction
Object-based video coding requires the initial segmentation of objects in video frames. Video objects can be characterized by their shape, texture and motion; the problem of automatic segmentation of objects in motion is a complex and tedious task [1,2,3,6,7,8,9]. In this paper, we present a wavelet-based preprocessing method for coarse extraction of moving object boundaries within the setting of multiresolution block-based motion estimation and motion compensation. The recent work of Al-Mohimeed and Li [4] investigated the motion estimation and compensation in the wavelet domain for video compression based on the Chui-Shi-Chan oversampled frame shift-invariant wavelet transform [5]. Using a minimal oversampling rate of 3, a 1-dimensional signal is interpolated, by a spline interpolation, with two middle points between each pair of successive pixels to give 3 channels of data for wavelet decomposition. At each resolution level, these three channels of decomposition are appropriately combined, resulting in an almost shift-invariant wavelet transform as shown in Fig. 1. With the same interpolation along both horizontal scans and vertical scans and using the tensor product of these 1-dimensional transforms, one obtains an almost shift-invariant 2-dimensional wavelet transform of
an image. Let subband images in the wavelet decomposition at a given resolution level be denoted by LL (scaling image) and LH, HL, and HH (wavelet images) respectively, and let the decomposition be carried out to J resolution levels (j=1, 2, …,J). Because of the shift-invariance, the block matching in each subband image in successive frames yields a reliable block motion estimation and a reduction of prediction errors, leading to an improved performance on video compression. The prediction errors in multiple subband images also readily provide boundary information of objects in motion. This leads to an efficient preprocessing method for moving object segmentation as described in the following sections.
Fig. 1. The Chui-Shi-Chan Oversampled (rate=3) Wavelet Decomposition
2 Boundary Regions of Moving Objects
Let us consider a pair of matching blocks illustrated in Fig. 2(a), where the top block is in the current frame and the bottom block is in the previous frame. If a block contains partially a moving object and partially the background, the object motion may bring in different background in the immediate vicinity of its boundary. Thus, the block matching will yield relatively large absolute values of the prediction error in the immediate neighborhood outside the object boundary. Otherwise, if the block matching is improper as illustrated in Fig. 2(b), the prediction error along and near the boundary will have large absolute values. At a coarse resolution level, such regions will become more pronounced in one or more subband images, and will keep the object boundary of the original resolution within their interiors. This is the model based upon which our preprocessing algorithm is developed for extraction of moving object boundaries.
At the resolution level j, let the four subband images LL, LH, HL and HH be indexed as i = 1, 2, 3 and 4, respectively. Consider first the prediction error in the scaling image LL (i=1) at the resolution level j. If the absolute value of the prediction error e at a point is greater than a chosen threshold Ei, that point is taken as a candidate point in the boundary region at the resolution level j. All the candidate points in the subband image are processed for connected component labeling, resulting in a number of connected regions. With a chosen threshold for the size of each connected region (number of points per region), one detects Q1 portions of the moving object boundaries in this subband image, designated as Rj,i,q (i=1; q = 1, 2,...,Q1). A bounding rectangle can be constructed for each of these subregions. Similar processing is done for each of the wavelet images LH (i=2), HL (i=3) and HH (i=4), giving significant boundary subregions Rj,i,q (i=2,3,4) detected in these subband images. These boundary subregions will be merged with, and complement, those subregions detected in the scaling image. The merging process is done as follows. Consider a larger rectangular neighborhood Nq1 enclosing a bounding rectangle of Rj,1,q detected in the scaling image. Any Rj,i,q (i=2,3,4; q=1,2,...,Qi), detected in the corresponding wavelet images, whose bounding rectangle intersects with any Nq1 will be pooled in the union; those not intersecting with any Nq1 will be eliminated. The union of the retained subregions is then processed by morphological closing operations to give the extracted boundary regions of moving objects at the particular resolution level j.
Fig. 2. Matching Blocks Containing an Object Boundary Produce Large Values of the Prediction Error in the Immediate Vicinity of the Moving Boundary: (a) Object Boundary Moves into Different Background; (b) Improper Matching
3 Moving Boundary Region Detection Algorithm
Our algorithm is described in the following steps: Step 1. Use the shift-invariant wavelet transform to decompose images in a video sequence to three resolution levels. Compute the block-matched motion estimation
A Wavelet-Based Preprocessing for Moving Object Segmentation in Video Sequences
57
(using full-search) and prediction errors in all scaling and wavelet images for three resolution levels.
Step 2. At the coarsest resolution level (j=3), compute the mean (mi) and standard deviation (σi) of the absolute values of the prediction errors in each subband image (i=1,2,3,4) and determine the threshold values Ei as (mi + σi) for detection of candidate points in the coarse boundary region.
Step 3. Apply threshold Ei to detect all candidate points in the boundary regions of moving objects in each subband image at level 3. Label all connected candidate points to determine candidate regions of moving boundaries.
Step 4. Choose a threshold for the size of connected regions, determine the significant boundary subregions Rj,i,q in that subband image, and construct a bounding rectangle for each subregion.
Step 5. Merge object boundary subregions detected in the wavelet images with those detected in the scaling image. Construct a larger rectangular neighborhood Nq1 for each bounding rectangle obtained in the scaling image, and test intersection of any bounding rectangle obtained in the wavelet images with any rectangular neighborhood Nq1 constructed in the scaling image. If intersected, keep the corresponding subregion Rj,i,q; if not, eliminate it. Merge all retained boundary subregions.
Step 6. Perform the morphological closing operation to obtain boundary regions of moving objects at this particular resolution level. (A sketch of Steps 2-6 is given below.)
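The following sketch illustrates Steps 2-6 at the coarsest level, assuming the per-subband prediction-error images are already available as arrays; the size threshold, the neighborhood margin and the use of scipy.ndimage are illustrative choices made for this sketch, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def detect_boundary_regions(pred_errors, min_size=8, margin=2):
    """Sketch of Steps 2-6 for the coarsest level.
    pred_errors: list of four arrays (LL, LH, HL, HH) holding the
    prediction errors of the subband images at level 3.
    Returns a binary mask of coarse moving-boundary regions."""
    boxes = []
    for err in pred_errors:
        mag = np.abs(err)
        thr = mag.mean() + mag.std()                    # Step 2: E_i = m_i + sigma_i
        labeled, _ = ndimage.label(mag > thr)           # Step 3: candidate regions
        band_boxes = []
        for sl in ndimage.find_objects(labeled):        # Step 4: bounding rectangles
            if (labeled[sl] > 0).sum() >= min_size:
                band_boxes.append(sl)
        boxes.append(band_boxes)

    def enlarged(sl):                                   # neighborhood N_q1 of an LL box
        return tuple(slice(max(s.start - margin, 0), s.stop + margin) for s in sl)

    def intersects(a, b):
        return all(x.start < y.stop and y.start < x.stop for x, y in zip(a, b))

    kept = list(boxes[0])                               # Step 5: keep LL subregions and
    for band in boxes[1:]:                              # wavelet-band subregions whose
        for sl in band:                                 # rectangles meet an enlarged LL box
            if any(intersects(sl, enlarged(ll)) for ll in boxes[0]):
                kept.append(sl)

    mask = np.zeros(pred_errors[0].shape, dtype=bool)
    for sl in kept:
        mask[sl] = True
    return ndimage.binary_closing(mask, iterations=2)   # Step 6: morphological closing
```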
4 Experimental Results
Experiments were performed using three video sequences: Clair [Fig. 3 and 4, image size 288x320], Table Tennis [Fig. 5, image size 224x352], and Salesman [Fig. 6, image size 233x320]. An almost shift-invariant wavelet transform using the biorthogonal wavelet BIOR(2,2) and linear interpolation with an oversampling rate of 3 was used. The decomposition filter coefficients are given by \tilde{h} = \frac{1}{8}\{-1, 2, 6, 2, -1\} and \tilde{g} = \frac{1}{2}\{-1, 2, -1\}. Video frames were decomposed to the third resolution level. The
blocksize used for motion vector estimation at the third level was either 4x4 or 2x2. BIOR(2,2) was used in the experiments because of its short support and symmetric property. Fig. 3(a) and 3(b) show two successive frames of the first clip from the Clair sequence. Fig. 3(g) displays the absolute values of the prediction errors in the scaling image (LL) at the resolution level 3, where the locations of large values reflect the moving object boundaries at that resolution level. In this experiment, matching blocksize of 4x4 was used. The prediction errors were thresholded and connected components labeled to yield one significant component of a moving boundary enclosed in a bounding rectangle which is overlaid in the scaling image as shown in Fig. 3(c). Similar processing was done on each of the LH, HL, HH wavelet images at the resolution level 3, the results are shown in Fig. 3(d), (e), and (f); these results were com-
bined with the result from the scaling image, as indicated by four overlaid rectangles shown in Fig. 3(h). After a morphological closing operation on the composite of these detected subregions, we extracted the moving object boundary regions at the resolution level 3, which are given by the bright regions around Clair’s head as shown in Fig. 3(i). Fig. 4 shows the experimental results obtained on the second clip of the Clair sequence obtained at a different time. Fig. 4(d) displays the extracted object boundary regions when the matching block size 4x4 was used, and Fig. 4(f) displays the result obtained when the matching blocksize was 2x2; the latter gave better boundary extractions. Note that the blocksize 2x2 at level 3, although small, corresponds to the blocksize 16x16 at the original level (in our case, with interpolation), which is widely used in the standard block matching. Fig. 5 shows the experimental results obtained on the Table Tennis sequence that contains two separate moving objects. Fig. 5(d) gives the extracted moving boundaries when 4x4 matching blocksize was used, while Fig. 5(f) gives the result when 2x2 matching blocksize was used. Again, the 2x2 blocksize appeared to yield a slightly better result. A different video sequence, Salesman, was also used for the experiment, where the matching blocksize of 4x4 was tested. In this case, the connected components of large prediction errors in the scaling image at the level 3 yielded three significant components of moving boundaries, and the experimental result is shown in Fig. 6.
5 Conclusion
We have presented a simple wavelet-based preprocessing method for the extraction of boundary regions of moving objects in a video sequence. It operates on prediction errors associated with the block-matching motion estimations in the scaling and wavelet images at a coarse resolution level, thus a fast computation can be attained. The preprocessing yields coarse boundary regions of video objects, based on which a refined extraction can be developed. This brings us one step closer to the automatic moving object segmentation for use in MPEG-4. Two different sizes of the matching block have been tested; the smaller blocksize appears to give better results on boundary extraction.
Acknowledgment This work is supported in part by a grant from the Pittsburgh Digital Greenhouse in collaboration with OKI Semiconductor.
References
1. T. Meier and King N. Ngan, "Automatic Segmentation of Moving Objects for Video Object Plane Generation", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 8, No. 5, pp. 525-538, September 1998.
2. D. Wang, "Unsupervised Video Segmentation Based on Watersheds and Temporal Tracking", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 8, No. 5, pp. 539-546, September 1998.
3. M. R. Razaee, P. M. J. van der Zwet, B. P. F. Lelieveldt, R. J. van der Geest, and J. H. C. Reiber, "A Multiresolution Image Segmentation Technique based on Pyramidal Segmentation and Fuzzy Clustering", IEEE Trans. on Image Processing, Vol. 9, No. 7, pp. 1238-1248, July 2000.
4. M. A. Al-Mohimeed and C. C. Li, "Motion estimation and compensation based on almost shift-invariant wavelet transform for image sequence coding", International Journal of Imaging Systems and Technology, Vol. 9, No. 4, pp. 214-229, 1998.
5. C. Chui, X. Shi, and A. Chan, "An oversampled frame algorithm for real-time implementation and applications", Proc. SPIE Conf. on Wavelet Applications, Orlando, FL, April 1994, Vol. 2242, pp. 272-301.
6. L. Zheng, J. C. Liu, A. K. Chan, and W. Smith, "Object-Based Image Segmentation Using DWT/RDWT Multiresolution Markov Random Field", Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ, March 1999, Vol. 6, pp. 3485-3488.
7. I. Kompatsiaris and M. G. Strintzis, "Spatiotemporal Segmentation and Tracking of Objects for Visualization of Videoconference Image Sequences", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 10, pp. 1388-1402, Dec. 2000.
8. I. Koprinska and S. Carrato, "Temporal Video Segmentation: A Survey", Signal Processing: Image Communication, Vol. 16, pp. 477-500, 2001.
9. M. Bagci, I. Yilmaz, M. H. Karci, T. Kolcak, U. Orguner, Y. Yardimci, M. Demirekler, and A. E. Cetin, "Moving Object Detection and Tracking in Video Based on Higher Order Statistics and Kalman Filtering", Proc. (CDROM) 2001 IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing, Baltimore, MD, June 2001.
Fig. 3. Experimental results on clip 1 of the Clair sequence (image size 288x320): (a) previous frame; (b) current frame; (c) a bounding rectangle of the moving boundary subregion obtained in level 3 scaling image LL; (d),(e),(f) boundary bounding rectangles obtained in level 3 wavelet images LH, HL, and HH respectively
Fig. 3. (continued) (g) absolute prediction errors in level 3 LL image; (h) a composite of bounding rectangles of boundary subregions detected from all scaling and wavelet images at level 3; (i) extracted coarse boundary regions of the moving objects (matching blocksize is 4x4)
Fig. 4. Experimental results on clip 2 of the Clair sequence (image size 288x320): (a) previous frame, (b) current frame, (c) a composite of bounding rectangles of boundary subregions detected from all scaling and wavelet images at level 3, (matching blocksize : 4x4), (d) extracted coarse boundary regions of the moving object (matching blocksize: 4x4), (e) a composite of bounding rectangles of boundary subregions detected from all scaling and wavelet images at level 3 (matching blocksize: 2x2), (f) extracted coarse boundary regions of the moving object (matching blocksize: 2x2)
Fig. 5. Experimental results on Table Tennis sequence (image size 224x352): (a) previous frame, (b) current frame, (c) a composite of bounding rectangles of boundary subregions detected from all scaling and wavelet images at level 3, (matching blocksize : 4x4), (d) extracted coarse boundary regions of the moving objects (matching blocksize: 4x4), (e) a composite of bounding rectangles of boundary subregions detected from all scaling and wavelet images at level 3 (matching blocksize: 2x2), (f) extracted coarse boundary regions of the moving objects (matching blocksize: 2x2)
Fig. 6. Experimental results on Salesman sequence (image size 288x320): (a) previous frame, (b) current frame, (c) a composite of bounding rectangles of boundary subregions detected from all scaling and wavelet images at level 3, (d) extracted coarse boundary regions of moving objects (matching blocksize is 4x4)
Embedded Zerotree Wavelet Coding of Image Sequence Mbainaibeye Jérôme and Noureddine Ellouze Laboratoire de Système et Traitement du Signal (LSTS) Ecole Nationale d’Ingénieurs de Tunis BP 37, Tunis le Belvédère 1002, Tél :874 700
[email protected] [email protected] Abstract. In this paper we present an image sequence coding system based on Embedded Zerotree Wavelet algorithm (EZW). Difference between the image in the coder and the reconstructed previous image in the decoder is used as technique for removing the temporal redundancies. The first image is encoded in intra-mode by EZW algorithm and a specific binary codebook CB1. The subsequent images in the sequence are encoded by performing the difference between the reconstructed previous image in the decoder and the current image in the coder; this difference (residual image) is then encoded by EZW algorithm and a specific binary codebook CB2. Simulations are operated on Claire and Alexis sequences. The results show that the system can provides best reconstruction quality as well objectively as subjectively for a minimum given bit rate. Progressive transmission, rate control for constant bit-rate and rate scalability are the main characteristics of this system.
1. Introduction

In multimedia applications, digital image compression is generally used for storage and transmission. MPEG-1, MPEG-2 and H.263 are standards used in moving image coding. MPEG-2 uses the DCT applied to blocks of 8 x 8 pixels, on which motion estimation and compensation are performed. H.263 also uses the DCT for low bit rates. Because the image is split into blocks, these standards produce image quality affected by block effects at low bit rates. Shapiro proposed the Embedded Zerotree Wavelet algorithm (EZW) for image compression [1], which uses dependencies among wavelet subbands [2]-[6]. This coder outperforms today's JPEG standard, ranging from low bit rate to high bit rate. Since then, many developments in image compression using the wavelet transform have been made [7]-[16]; improvements have been obtained by modifications of EZW [10, 12, 16]. JPEG2000 is a new standard based on the wavelet transform [17]. B. J. Kim et al. extended the Set Partitioning in Hierarchical Trees (SPIHT) algorithm to video sequences [18], exploiting the energy clustering property of 3D subband/wavelet coefficients. Despite the realization of the MPEG-4 and MPEG-7 standards, the adoption of wavelets for video coding constitutes a special challenge. One can apply 2D wavelet coding in combination with motion-compensated temporal prediction, or one can consider the sequence as a three-dimensional array of data and perform compression with a 3D wavelet analysis. These approaches present
some difficulties that arise from a fundamental property of the discrete wavelet transform, which is a space-varying operator. In this paper, we present an image sequence coding system based on the EZW algorithm. Image sequences are characterized by great similarities between consecutive images (the term image in this paper is used to denote a frame). These similarities are known as temporal redundancies. Removing these temporal redundancies is the key technique that improves compression performance. In some standards such as MPEG-1/2 and H.263, this is performed by motion estimation and compensation, where displaced blocks are searched and encoded to predict the current image from the previous one. In our approach, temporal redundancy removal is performed by calculating the difference between consecutive images in the sequence. The discrete wavelet transform is applied to the residual images. The EZW algorithm is used for the encoding process. This paper is organized as follows: in section 2, we present a short description of the EZW algorithm; section 3 presents the proposed image sequence coding system. Results and discussions are presented in section 4; the conclusion is finally presented in section 5.
2. Embedded Zerotree Wavelet Coding

The EZW algorithm encodes images in an embedded fashion from their dyadic wavelet representations. The goal of embedded coding is to generate a single encoded bit stream that allows achieving any desired bit rate while giving the best reconstruction quality at that rate. In the wavelet domain, the image is represented by approximation coefficients (called the DC subband) and detail coefficients (called AC subbands). These coefficients are represented in trees. The trees are structured according to a rule such that a parent coefficient in an AC subband is related to four children in the next finer AC subband at the same orientation and same spatial location. Only the parent coefficient in the DC subband is related to three children, one in each of the three coarsest AC subbands. The EZW algorithm encodes these coefficients by using a sequence of thresholds. The initial value of the threshold T_0 is defined such that |C| < 2T_0, where C is the wavelet coefficient of maximum magnitude. A coefficient X_i is significant if |X_i| >= T. Significance mapping consists of scanning the wavelet coefficient matrix to decide whether or not each coefficient is significant, and a significance map is generated in each bit plane. Two passes are performed for each threshold value: the dominant pass and the subordinate pass. The coefficients scanned in the dominant pass are encoded by four symbols, which are ZTR (Zerotree Root), IZ (Isolated Zero), POS (significant Positive) and NEG (significant Negative). The ZTR symbol is generated for an insignificant coefficient which has no significant children. The IZ symbol is generated for an insignificant coefficient which has at least one significant child. POS and NEG are generated for significant coefficients which are positive and negative, respectively. In the finest AC subbands, where the coefficients have no children, IZ and ZTR are merged into the Z (zero) symbol. The subordinate pass refines the quantized coefficients to obtain the best approximation of the wavelet coefficients. The EZW algorithm is particularly interesting for applications such as rate and quality scalability, since the encoder and decoder can terminate the encoding and decoding process at any time to meet a target rate or target distortion.
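As an illustration of the dominant-pass symbol alphabet described above, the following sketch assigns one of the symbols to a coefficient given the current threshold and the magnitudes of its descendants; it is a simplified sketch, not the full EZW scanning procedure.

```python
import numpy as np

def initial_threshold(coeffs):
    """T0 chosen so that |C| < 2*T0 for the largest-magnitude coefficient."""
    return 2.0 ** np.floor(np.log2(np.max(np.abs(coeffs))))

def dominant_symbol(coeff, descendants, T):
    """Assign one dominant-pass symbol to a coefficient, given the
    magnitudes of its descendants and the current threshold T."""
    if abs(coeff) >= T:
        return 'POS' if coeff >= 0 else 'NEG'
    if descendants.size == 0:
        return 'Z'                       # finest bands: no children
    if np.any(np.abs(descendants) >= T):
        return 'IZ'                      # insignificant, but a descendant is significant
    return 'ZTR'                         # zerotree root
```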
3. Coding of Image Sequences

A general structure of an image sequence coding system is composed of an encoder and a decoder (Fig. 1). We shortly describe this system with reference to MPEG-2, where the orthogonal transform is the DCT and the entropy coding is variable length coding (VLC). In the encoder, the blocks of the first image in the sequence are encoded in intra mode without any reference. In fact, the DCT is applied to blocks of 8 x 8 pixels. The quantized coefficients are then encoded using VLC coding to produce the bit stream. The subsequent images are encoded by prediction from the previous images using the motion estimation and compensation technique. The motion estimation process tries to detect the displaced blocks between the current image and the previous image. These blocks are then encoded to predict the current image. Of course, for constant bit rate applications, a bit rate control algorithm is used to prevent buffer underflow or overflow. However, MPEG does not specify the way to search for the displaced blocks; this is a detail that the system designer can choose to implement in one of many possible ways. The same holds for the bit rate control algorithm, where complexity-versus-quality issues can be addressed relative to the individual application. The decoder performs the inverse operations accomplished by the encoder.
Fig. 1. General structure of image sequence coding system: a) Encoder, b) Decoder
3.1 Proposed System

Figure 2 shows the structure of the proposed image sequence coding system. Compared to figure 1, our system differs by the following considerations:
- The image is not split into blocks;
- The wavelet transform is applied to the whole image;
- The temporal redundancy removing process is operated on the whole image and not on blocks;
- The encoding is realized for a limited channel or, in other words, for a given level.
The encoder contains three components:
- The Discrete Wavelet Transform (DWT), which represents the image in the wavelet domain;
- The Embedded Zerotree Wavelet Quantization (EZWQ), which quantizes the wavelet coefficients in an embedded fashion and produces the EZW symbols;
- The binary coding, which encodes the produced EZW symbols by specific defined binary codes [19].
The decoder performs the inverse of the encoder's operations.
Fig. 2. Proposed coding system: a) Encoder, b) Decoder
In [19], we have studied the probability distributions of the EZW symbols for standard images including Lena, Barbara, Mandrill, Goldhill and Peppers. In fact, these images are decomposed in wavelet domain using the Daubechies biorthogonal wavelet 9/7-tap filter bank [22]. Five scales are performed [20, 21]. EZW algorithm is used to generate the different symbols described in the section 2. The probability distributions of these symbols are estimated. From these distributions, we have defined binary codes for each symbol in each subband. A specific codebook which we called CB1 is built. Using CB1 in still image coding, the obtained results outperform the Flexible Zerotree Codec [21]. According to this performance, we have extended the probability distributions analysis of EZW symbols to image sequences. So,
differences between consecutive images in the image sequences are calculated. Several image sequences, including Alexis, Claire, Mother & Daughter and Salesman, are used in the experimentation. These differences are then decomposed in the wavelet domain. Similarly, the EZW algorithm is used to generate the different symbols, and their probability distributions are estimated. From these distributions, we have defined the binary codes for each symbol in each subband. A specific binary codebook for image differences, which we call CB2, is then built. Since the first image in the sequence is considered as a still image, it is encoded without any reference. The subsequent images are encoded by prediction. In our system, the first image is encoded by using CB1 and the subsequent images in the sequence are encoded by CB2.
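The specific binary codes of CB1 and CB2 are those defined in [19] and are not reproduced here. Purely to illustrate the idea of deriving per-subband codes from estimated symbol distributions, the sketch below builds a prefix code from observed EZW symbol frequencies; a Huffman code is used as a stand-in, and the example symbol stream is illustrative.

```python
import heapq
from collections import Counter

def build_codebook(symbols):
    """Derive a per-subband prefix code from observed EZW symbol
    frequencies (a Huffman code is used here only as a stand-in for
    the fixed codes of [19])."""
    heap = [[count, [sym, '']] for sym, count in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heapq.heappop(heap)[1:]}

# e.g. codes for one subband of residual images (symbol stream is illustrative):
cb2_band = build_codebook(['ZTR', 'ZTR', 'IZ', 'POS', 'ZTR', 'NEG', 'ZTR'])
```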
3.2 Encoding Protocol

Two configurations are analysed in terms of objective and subjective reconstruction quality. For the first configuration, the following steps are performed:
1. The first image in the sequence (denoted X_0) is decomposed in the wavelet domain and encoded by using the EZW algorithm and CB1. The produced bit stream is considered as a reference bit stream;
2. The difference between the current image X_n and the previous image X_{n-1} in the encoder frame memory is calculated to remove temporal redundancies;
3. The obtained residual image D_n is decomposed in the wavelet domain and encoded by using EZW and CB2. The bit stream produced in this case is the residual bit stream;
4. The current image is reconstructed by adding the residual image \tilde{D}_n and the previous image \tilde{X}_{n-1} in the decoder.

The following expressions summarize the image difference calculation and the reconstruction process:

D_n = X_n - X_{n-1}   (3.1)

\tilde{X}_n = \tilde{X}_{n-1} + \tilde{D}_n   (3.2)

Expression 3.1 provides the difference D_n between the current image X_n and the previous image X_{n-1}. Expression 3.2 gives the reconstruction of the current image \tilde{X}_n from the residual image \tilde{D}_n and the reconstructed previous image \tilde{X}_{n-1}. For the second configuration, only step 2 is changed: the difference is calculated between the current image and the reconstructed previous image in the decoder. Expression 3.1 becomes:

D_n = X_n - \tilde{X}_{n-1}   (3.3)
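The difference between the two configurations can be summarized by the following sketch of the encoder loop, where encode and decode are placeholders for the wavelet/EZW/binary-coding stages rather than the actual codec; closed_loop=True corresponds to the second configuration (Eq. 3.3), closed_loop=False to the first (Eq. 3.1).

```python
def encode_sequence(frames, encode, decode, closed_loop=True):
    """Sketch of the two prediction configurations of Section 3.2.
    'encode' and 'decode' are placeholders for the DWT + EZW + binary
    coding stages.  closed_loop=True uses D_n = X_n - X~_{n-1} (Eq. 3.3,
    second configuration); closed_loop=False uses D_n = X_n - X_{n-1}
    (Eq. 3.1, first configuration)."""
    bitstreams = [encode(frames[0])]              # intra frame, coded with CB1
    prev_decoded = decode(bitstreams[0])          # X~_0 as seen by the decoder
    prev_original = frames[0]                     # X_0
    for X in frames[1:]:
        D = X - (prev_decoded if closed_loop else prev_original)
        bitstreams.append(encode(D))              # residual frame, coded with CB2
        prev_decoded = prev_decoded + decode(bitstreams[-1])   # Eq. 3.2
        prev_original = X
    return bitstreams
```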
4. Experimental Results and Discussions

Simulations are performed on the Claire and Alexis sequences. The decomposition is performed using the 9/7 filter bank [22]. The image size is rescaled to 256 x 256 pixels before decomposition. To reproduce an image from the received binary symbols, the output bit stream includes seven bytes of header information: four bytes for the horizontal and vertical dimensions of the image, one byte for the filter bank, one byte for the decomposition levels and one byte for the initial threshold. Since the horizontal and vertical dimensions, the filter bank, and the number of decomposition levels are the same for the residual image, only the initial threshold can change; so, a one-byte header is included in the residual bit stream to inform the decoder to update the initial threshold. Figure 3 shows PSNR versus image number for the Claire sequence at 56 Kbits/s, where only the first 36 images are reconstructed. The curve labelled "Serie1" is the result of the second configuration and the curve labelled "Serie2" is the result of the first configuration.
Fig. 3. Claire sequence at 56 Kbits/s and 10 fps (PSNR in dB versus image number). Serie1: second configuration result; Serie2: first configuration result
We observe in figure 3 that the reconstruction quality in the case of the first configuration decreases, because the coding is performed using the difference between the current image X_n and the previous image X_{n-1} (which is assumed to be transmitted without any loss and any quantization error). In reality, the previous image in the decoder is affected by the quantization error. The difference between X_n and X_{n-1} cannot cover the information that actually separates these consecutive images in the decoder. Since the reconstruction is performed using expression 3.2 and the bit rate is limited to 56 Kbits/s, there are not enough bits to improve the reconstruction quality. This is the main reason for the observed quality degradation.
We then repeat the encoding process by using the second configuration. Then Claire and Alexis sequences are encoded at 56 Kbits/s and 10 fps. Figure 4 and figure 5 show PSNR versus image number.
Fig. 4. Claire sequence at 56 Kbits/s and 10 fps (PSNR in dB versus image number). Mean PSNR: 35.98 dB
Fig. 5. Alexis sequence at 56 Kbits/s and 10 fps (PSNR in dB versus image number). Mean PSNR: 38.28 dB
Figures 4 and 5 show that the system provides good objective reconstruction quality, with average PSNR values of 35.98 dB and 38.28 dB reached for the Claire and Alexis sequences, respectively. Figures 6 and 7 show the original and reconstructed images. Figures 6 A and B are respectively the original and reconstructed image 61, and figures 6 C and D are respectively the original and reconstructed image 134 of the Claire sequence. Figures 7 E and F are respectively the original and reconstructed image 20, and figures 7 G and H are respectively the original and reconstructed image 33 of the Alexis sequence. The reconstruction is operated at 56 Kbits/s and 10 fps, so the average compression ratio is 94. Despite this compression ratio, the system provides good subjective reconstruction quality. There are no block effects in the reconstructed images. Since the system keeps the progressive encoding and decoding property of the EZW algorithm, it is robust against the loss of information: if the encoder ceases the encoding process, the decoder can still reconstruct the sequence from the previously received bit stream. Furthermore, it is possible to encode the sequence with maximum quality (lossless compression) by transmitting at a high bit rate.
Fig. 6. Reconstruction results of Claire sequence at 56 Kbits/s and 10 fps. A: original image 61, B: reconstructed image 61; C: original image 134, D: reconstructed image 134
Fig. 7. Reconstruction results of Alexis sequence at 56 Kbits/s and 10 fps E: original image 20, F: reconstructed image 20 G: original image 33, H: reconstructed image 33
5. Conclusion

In this paper, we have presented an image sequence coding system based on the EZW algorithm and binary coding. The difference between the current image in the coder and the reconstructed previous image in the decoder is used as the technique for removing temporal redundancies. The residual image is then decomposed in the wavelet domain and encoded. Specific binary codebooks are built and used in the encoding process. Experimental results show that the system provides good reconstruction quality both objectively and subjectively. The performance of our system is explained by the fact that the temporal redundancies are removed between the current image and the reconstructed previous image in the decoder. This enables the encoder to minimize the overall distortion due to the quantization error and improves the reconstruction quality.
References
[1] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients", IEEE Trans. on Signal Processing, Vol. 41, No. 12, pp. 3445-3462, Dec. 1993.
[2] I. Daubechies, "Orthonormal bases of compactly supported wavelets", Communications on Pure and Applied Mathematics, Vol. 41, pp. 909-996, Nov. 1988.
[3] S. Mallat, "A theory for multi-resolution signal decomposition: the wavelet representation", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 11, pp. 674-693, July 1989.
[4] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.
[5] J. D. Villasenor, B. Belzer, and J. Lio, "Wavelet filter evaluation for image compression", IEEE Trans. on Image Processing, Vol. 4, No. 8, pp. 1053-1060, Aug. 1995.
[6] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996.
[7] A. Zandi, J. D. Allen, E. L. Schwartz, and M. Boliek, "CREW: Compression with Reversible Embedded Wavelets", IEEE Data Compression Conference, pp. 212-221, Snowbird, Mar. 1995.
[8] A. Said and W. A. Pearlman, "An image multi-resolution representation for lossless and lossy compression", IEEE Trans. on Image Processing, Vol. 5, No. 9, pp. 1303-1310, Sep. 1996.
[9] Y. Chen and W. A. Pearlman, "Three-dimensional subband coding of video using the zerotree method", Proc. SPIE, Visual Communications and Image Processing, pp. 1302-1309, Orlando, Mar. 1996.
[10] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, pp. 243-250, Jun. 1996.
[11] S. A. Martucci, I. Sodagar, T. H. Chiang, and Y. Q. Zhang, "A zerotree wavelet coder", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1, pp. 109-118, Feb. 1997.
[12] J. Li, P. Cheng, and C. Kuo, "On the improvement of embedded zerotree wavelet coding", Proc. SPIE, Visual Communications and Image Processing, pp. 1490-1501, Orlando, Apr. 1995.
[13] H. Man, F. Kossentini, and M. Smith, "Robust EZW image coding for noisy channels", IEEE Signal Processing Letters, Vol. 4, No. 8, pp. 227-229, Aug. 1997.
[14] C. D. Creusere, "A new method for robust image compression based on the embedded zerotree wavelet algorithm", IEEE Trans. on Image Processing, Vol. 6, No. 10, pp. 1436-1442, Oct. 1997.
[15] J. K. Rogers and P. C. Cosman, "Wavelet zerotree image compression with packetization", IEEE Signal Processing Letters, Vol. 5, No. 5, pp. 105-107, May 1998.
[16] S. Joo, H. Kikuchi, S. Sasaki, and J. Shin, "Flexible zerotree coding of wavelet coefficients", IEICE Trans. Fundamentals, Vol. E82-A, No. 4, Apr. 1999.
[17] M. W. Marcellin, M. J. Gormish, A. Bilgin, and M. P. Boliek, "An overview of JPEG-2000", Proc. IEEE Data Compression Conference, pp. 523-541, 2000.
[18] B.-J. Kim and W. A. Pearlman, "An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees (SPIHT)", Proc. DCC'97, IEEE Data Compression Conference, pp. 251-260, Snowbird, UT, Mar. 1997.
[19] M. Jérôme, "Optimal Image Coding based on Probability Distribution of Embedded Zerotree Wavelet Symbols", Tunisian-German Conference on Smart Systems and Devices SSD, pp. 666-671, Hammamet, Tunisia, March 27-30, 2001.
[20] M. Jérôme and N. Ellouze, "Etude énergétique de l'analyse multi-résolution d'images par ondelette", Proc. JTEA'2000, Tome 1, pp. 103-109, 24-25 Mar. 2000, Hammamet, Tunisia.
[21] M. Jérôme and N. Ellouze, "Image Wavelet Coefficients Quantization by Embedded Zerotree Wavelet Algorithm", Proc. ACIDCA'2000, International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering and Industrial Applications, pp. 1-5, Monastir, 22-24 March 2000.
[22] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform", IEEE Trans. on Image Processing, Vol. 1, No. 2, pp. 205-220, Apr. 1992.
Wavelet-Based Video Compression Using Long-Term Memory Motion-Compensated Prediction and Context-Based Adaptive Arithmetic Coding Detlev Marpe1 , Thomas Wiegand1 , and Hans L. Cycon2 1
Image Processing Department Heinrich-Hertz-Institute (HHI) for Communication Technology Einsteinufer 37, 10587 Berlin, Germany {marpe,wiegand}@hhi.de 2 University of Applied Sciences (FHTW Berlin) Allee der Kosmonauten 20–22, 10315 Berlin, Germany
[email protected]
Abstract. In this paper, we present a novel design of a wavelet-based video coding algorithm within a conventional hybrid framework of temporal motion-compensated prediction and transform coding. Our proposed algorithm involves the incorporation of multi-frame motion compensation as an effective means of improving the quality of the temporal prediction. In addition, we follow the rate-distortion optimizing strategy of using a Lagrangian cost function to discriminate between different decisions in the video encoding process. Finally, we demonstrate that context-based adaptive arithmetic coding is a key element for fast adaptation and high coding efficiency. The combination of overlapped block motion compensation and frame-based transform coding enables blocking-artifact free and hence subjectively more pleasing video. In comparison with a highly optimized MPEG-4 (Version 2) coder, our proposed scheme provides significant performance gains in objective quality of 2.0– 3.5 dB PSNR.
1 Introduction
Multi-frame prediction [11] and variable block size motion compensation in a rate-distortion optimized motion estimation and mode selection process [12,10] are powerful tools to improve the coding efficiency of today's video coding standards. In this paper, we present the design of a video coder, dubbed DVC, which demonstrates how most elements of the state-of-the-art in video coding as currently implemented in the test model long-term [2] (TML8) of the ITU-T H.26L standardization project can be successfully integrated in a blocking-artifact free video coding environment. In addition, we provide a solution for an efficient macroblock based intra coding mode within a frame-based residual coding method, which is extremely beneficial for improving the subjective quality as well as the error robustness.
We further explain how appropriately designed entropy coding tools, which have already been introduced in some of our previous publications [6,7] and which, in some modified form [5], are now part of TML8, help to improve the efficiency of a wavelet-based residual coder. In our experiments, we compared our proposed wavelet-based DVC coder against an improved MPEG-4 coder [10], where both codecs were operated using a fixed frame rate, fixed quantization step sizes and a search range of ±32 pels. We obtained coding results for various sequences showing that our proposed video coding system yields a coding gain of 2.0–3.5 dB PSNR relative to MPEG-4. Correspondingly, the visual quality provided by the DVC coder compared to that of the block-based coding approach of MPEG-4 is much improved, especially at very low bit rates.
Fig. 1. Block diagram of the proposed coding scheme
2 Overview of the DVC Scheme
Fig. 1 shows a block diagram of the proposed DVC coder. As a hybrid system, it consists of a temporal predictive loop along with a spatial transform coder. Temporal prediction is performed by using a block motion estimation (BME) and an overlapped block motion compensation (OBMC), such that the reference of each predicted block can be obtained from a long-term reference frame memory. Coding of the motion compensated P-frames as well as of the initial intra (I) frame is performed by first applying a discrete wavelet transform (DWT) to an entire frame. Uniform scalar quantization (Q) with a central dead-zone around zero similar to that designed for H.263 is then used to map the dynamic range of the wavelet coefficients to a reduced alphabet of decision levels. Prior to the final arithmetic coding stage, the pre-coder further exploits redundancies of the quantized wavelet coefficients in a 3-stage process of partitioning, aggregation and conditional coding.
Table 1. Macroblock partition modes

Mode   Block Size   Partition
1      16 × 16      Leave MB as a whole
2      16 × 8       Split MB into 2 sub-blocks
3      8 × 16       Split MB into 2 sub-blocks
4      8 × 8        Split MB into 4 sub-blocks

3 Motion-Compensated Prediction

3.1 Motion Model
As already stated above, the motion model we used is very similar to that of the H.26L TML8 design [2]. In essence it relies on a simple model of block displacements with variable block sizes. Given a partition of a frame into macroblocks (MB) of size 16 × 16 pels, each macroblock can be further sub-divided into smaller blocks, where each sub-block has its own displacement vector. Our model supports 4 different partition modes, as shown in Table 1. Each macroblock may use a different reference picture out of a long-term frame memory. In addition to the predictive modes represented by the 4 different MB partition modes in Table 1, we allow for an additional macroblock-based intra coding mode in P-frames. This local intra mode is realized by computing the DC for each 8×8 sub-block of each spectral component (Y,U,V) in a macroblock and by embedding the DC-corrected sub-blocks into the residual frame in a way which is further described in the following section.

3.2 Motion Estimation and Compensation
Block motion estimation is performed by an exhaustive search over all integer-pel positions within a pre-defined search window around the motion vector predictor, which is obtained from previously estimated sub-blocks in the same way as in TML8 [2]. In a number of subsequent steps, the best integer-pel motion vector is refined to the final 1/4-pel accuracy by searching in a 3 × 3 sub-pel window around the refined candidate vector. All search positions are evaluated by using a Lagrangian cost function, which involves a rate and a distortion term coupled by a Lagrangian multiplier. For all fractional-pel displacements, distortion in the transform domain is estimated by using the Walsh-Hadamard transform, while the rate of the motion vector candidates is estimated by using a fixed, pre-calculated table. This search process takes place for each of the 4 macroblock partitions and each reference frame, and the cost of the overall best motion vector candidate(s) of all 4 macroblock modes is finally compared against the cost of the intra mode decision to choose the macroblock mode with minimum cost.
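For illustration, the cost evaluation described above can be sketched as follows; the 4 × 4 Walsh-Hadamard distortion, the rate table and the helper names are assumptions made only for this sketch, not the coder's exact implementation.

```python
import numpy as np

def hadamard4():
    """4x4 Walsh-Hadamard transform matrix."""
    h2 = np.array([[1, 1], [1, -1]])
    return np.kron(h2, h2)

def satd(cur_block, ref_block):
    """Transform-domain distortion: sum of absolute Hadamard-transformed
    differences of a 4x4 block (block size chosen for illustration)."""
    H = hadamard4()
    return np.abs(H @ (cur_block - ref_block) @ H.T).sum()

def lagrangian_cost(cur_block, ref_block, mv, mv_pred, rate_table, lam):
    """D + lambda * R for one candidate displacement.  rate_table is an
    assumed pre-calculated table indexed by the motion-vector difference."""
    D = satd(cur_block, ref_block)
    R = rate_table[mv[0] - mv_pred[0]] + rate_table[mv[1] - mv_pred[1]]
    return D + lam * R
```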
Fig. 2. (a) 1-D profile of 2-D weighting functions along the horizontal or vertical axes of two neighboring overlapping blocks. (b) 2-D weighting function

The prediction error luminance (chrominance) signal is formed by the weighted sum of the differences between all 16 × 16 (8 × 8) overlapping blocks from the current frame and their related overlapping blocks with displaced locations in the reference frame, which have been estimated in the BME stage for the corresponding core blocks. In the case of an intra macroblock, we compute the weighted sum of the differences between the overlapping blocks of the current intra blocks and its related DC-values. As a weighting function w, we used the 'raised cosine', as shown in Fig. 2. For a support of N × N pels, it is given by

w(n, m) = w_n \cdot w_m, \qquad w_n = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N}\right) \quad \text{for } n = 0, \ldots, N.   (1)

In our presented approach, we choose N = 16 (N = 8) for the luminance (chrominance, resp.) in Eq. (1), which results in a 16 × 16 (8 × 8) pixel support centered over a "core" block of size 8 × 8 (4 × 4) pels for the luminance (chrominance, resp.). For the texture interpolation of sub-pel positions, the same filters as specified in TML8 [2] have been used.
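A minimal sketch of the weighting window of Eq. (1) is given below; it simply tabulates the raised-cosine weights for the stated support sizes (the endpoint weights are zero).

```python
import numpy as np

def raised_cosine_window(N):
    """Tabulate w(n, m) = w_n * w_m with w_n = 0.5 * (1 - cos(2*pi*n / N)),
    n = 0..N, as in Eq. (1)."""
    n = np.arange(N + 1)
    w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))
    return np.outer(w, w)

W_luma = raised_cosine_window(16)     # support centred over an 8x8 core block
W_chroma = raised_cosine_window(8)    # support centred over a 4x4 core block
```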
4 Wavelet Transform
In wavelet-based image compression, the so-called 9/7-wavelet with compact support [3] is the most popular choice. Our proposed coding scheme, however, utilizes a class of biorthogonal wavelet bases associated with infinite impulse response (IIR) filters, which was recently constructed by Petukhov [9]. His approach relies on the construction of a dual pair of rational solutions of the matrix equation

M(z) \tilde{M}^T(z^{-1}) = 2I,   (2)

where I is the identity matrix, and

M(z) = \begin{pmatrix} h(z) & h(-z) \\ g(z) & g(-z) \end{pmatrix}, \qquad \tilde{M}(z) = \begin{pmatrix} \tilde{h}(z) & \tilde{h}(-z) \\ \tilde{g}(z) & \tilde{g}(-z) \end{pmatrix}
are so-called 'modulation matrices'. In [9], a one-parametric family of filters h_a, g_a, \tilde{h}_a and \tilde{g}_a satisfying Eq. (2) was constructed:¹

h_a(z) = \frac{1}{\sqrt{2}}(1 + z),   (3)

\tilde{h}_a(z) = \frac{(2+a)(z^{-1} + 3 + 3z + z^2)(z^{-1} + b + z)}{4\sqrt{2}\,(2+b)(z^{-2} + a + z^2)},   (4)

g_a(z) = \frac{(2+a)(z^{-1} - 3 + 3z - z^2)(-z^{-1} + b - z)}{4\sqrt{2}\,(2+b)},   (5)

\tilde{g}_a(z) = \frac{1}{\sqrt{2}} \cdot \frac{1 - z^{-1}}{z^{-2} + a + z^2},   (6)

where b = \frac{4a-8}{6-a}, |a| > 2, a \neq 6.

¹ h_a and g_a denote the low-pass and high-pass filters of the decomposition algorithm, respectively, while \tilde{h}_a and \tilde{g}_a denote the corresponding filters for reconstruction.

To adapt the choice of the wavelet basis to the nature and statistics of the different frame types of intra and inter mode, we performed a numerical simulation on this one-parametric family of IIR filter banks yielding the optimal value of a = 8 for intra frame mode and a = 25 for inter frame mode in Eqs. (3)-(6). Graphs of these optimal basis functions are presented in Fig. 3. Note that the corresponding wavelet transforms are efficiently realized with a composition of recursive filters [9].

Fig. 3. From left to right: scaling function of analysis, analyzing wavelet, scaling function of synthesis, and synthesizing wavelet used for I-frame coding (top row) and P-frame coding (bottom row)
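For reference, the transfer functions (3)-(6) can be evaluated on the unit circle for a chosen family parameter a. The sketch below does only that; it merely reproduces the formulas as reconstructed above and makes no further claim, and the sampling grid is arbitrary.

```python
import numpy as np

def filter_responses(a, omega):
    """Evaluate the transfer functions of Eqs. (3)-(6) on the unit circle
    z = exp(j*omega) for a family parameter a (|a| > 2, a != 6)."""
    b = (4.0 * a - 8.0) / (6.0 - a)
    z = np.exp(1j * omega)
    h = (1.0 + z) / np.sqrt(2.0)
    h_dual = ((2 + a) * (1 / z + 3 + 3 * z + z ** 2) * (1 / z + b + z)
              / (4 * np.sqrt(2) * (2 + b) * (z ** -2 + a + z ** 2)))
    g = ((2 + a) * (1 / z - 3 + 3 * z - z ** 2) * (-1 / z + b - z)
         / (4 * np.sqrt(2) * (2 + b)))
    g_dual = (1.0 - 1.0 / z) / (np.sqrt(2.0) * (z ** -2 + a + z ** 2))
    return h, h_dual, g, g_dual

omega = np.linspace(0.0, np.pi, 512)
intra_basis = filter_responses(8.0, omega)    # a = 8 (I-frames)
inter_basis = filter_responses(25.0, omega)   # a = 25 (P-frames)
```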
5 Pre-coding of Wavelet Coefficients
For encoding the quantized wavelet coefficients, we follow the conceptual ideas initially presented in [6] and later refined in [7]. Next, we give a brief review of the involved techniques. For more details, the readers are referred to [6,7].
Fig. 4. Schematic representation of the pre-coder used for encoding the quantized wavelet coefficients

5.1 Partitioning
As shown in the block diagram of Fig. 4, an initial 'partitioning' stage divides each frame of quantized coefficients into three sub-sources: a significance map, indicating the position of significant coefficients, a magnitude map holding the absolute values of significant coefficients, and a sign map with the phase information of the wavelet coefficients. Note that all three sub-sources inherit the subband structure from the quantized wavelet decomposition, so that there is another partition of each sub-source according to the given subband structure.

5.2 Zerotree Aggregation
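A minimal sketch of the partitioning step for a single subband of quantized coefficients could look as follows; the array names are illustrative.

```python
import numpy as np

def partition_subband(q):
    """Split one subband of quantized coefficients into the three
    sub-sources used by the pre-coder."""
    significance = (q != 0)                  # significance map
    magnitudes = np.abs(q[significance])     # magnitudes of significant coefficients
    signs = np.sign(q[significance])         # phase (sign) information
    return significance, magnitudes, signs
```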
In a second stage, the pre-coder performs an 'aggregation' of insignificant coefficients using a quad-tree related data structure. These so-called zerotrees [4,6] connect insignificant coefficients, which share the same spatial location along the multiresolution pyramid. However, we do not consider zero-tree roots in bands below the maximum decomposition level. In inter-frame mode, coding efficiency is further improved by connecting the zerotree root symbols of all three lowest high-frequency bands to a so-called 'integrated' zerotree root which resides in the LL-band.

5.3 Conditional Coding
The final ‘conditioning’ part of the pre-coding stage supplies the elements of each source with a ‘context’, i.e., an appropriate model for the actual coding process in the arithmetic coder. Fig. 5 (a) shows the prototype template used for conditioning of elements of the significance map. In the first part, it consists of a causal neighborhood of the actual coding event C, which depends on the
82
Detlev Marpe et al.
P
level l+1 C V
level l H
(a)
C
(b)
Fig. 5. (a) Two-scale template (white circles) with an orientation dependent design for conditional coding of an event C of the significance map; V , H: additional element used for vertical and horizontal oriented bands, respectively. (b) 8-neighborhood of significance used for conditioning of a given magnitude C scale and orientation of a given band. Except for the lowest frequency bands, the template uses additional information of the next upper level (lower resolution) represented by the neighbors of the parent P of C, thus allowing a ’prediction’ of the non-causal neighborhood of C. The processing of the lowest frequency band depends on the intra/inter decision. In intra mode, mostly non-zero coefficients are expected in the LL-band, so there is no need for coding a significance map. For P-frames, however, we indicate the significance of a coefficient in the LLband by using the four-element kernel of our prototype template (Fig. 5 (a)), which is extended by the related entry of the significance map belonging to the previous P-frame. The processing of subbands is performed band-wise in the order from lowest to highest frequency bands and the partitioned data of each band is processed such that the significance information is coded (and decoded) first. This allows the construction of special conditioning categories for the coding of magnitudes using the local significance information. Thus, the actual conditioning of magnitudes is performed by classifying magnitudes of significant coefficients according to the local variance estimated by the significance of their 8-neighborhood (cf. Fig. 5 (b)). For the conditional coding of sign maps, we are using a context built of two preceding signs with respect to the orientation of a given band [7]. For coding of the LL-band of I-frames, the proposed scheme uses a DPCM-like procedure with a spatially adaptive predictor and a backward driven classification of the prediction error using a six-state model.
6
Binarization and Adaptive Binary Arithmetic Coding
All symbols generated by the pre-coder are encoded using an adaptive binary arithmetic coder, where non-binary symbols like magnitudes of coefficients or motion vector components are first mapped to a sequence of binary symbols by means of the unary code tree. Each element of the resulting ”intermediate” code-
Wavelet-Based Video Compression
83
word given by this so-called binarization will then be encoded in the subsequent process of binary arithmetic coding. At the beginning of the overall encoding process, the probability models associated with all different contexts are initialized with a pre-computed start distribution. For each symbol to encode the frequency count of the related binary decision is updated, thus providing a new probability estimate for the next coding decision. However, when the total number of occurrences of symbols related to a given model exceeds a pre-defined threshold, the frequency counts will be scaled down. This periodical rescaling exponentially weighs down past observations and helps to adapt to non-stationarities of a source. For intra and inter frame coding we use separate models. Consecutive Pframes as well as consecutive motion vector fields are encoded using the updated related models of the previous P-frame and motion vector field, respectively. The binary arithmetic coding engine used in our presented approach is a straightforward implementation similar to that given in [13].
7 7.1
Experimental Results Test Conditions
To illustrate the effectiveness of our proposed coding scheme, we used an improved MPEG-4 coder [10] as a reference system. This coder follows a ratedistortion (R-D) optimized encoding strategy by using a Lagrangian cost function, and it generates bit-streams compliant with MPEG-4, Version 2 [1]. Most remarkable is the fact that this encoder provides PSNR gains in the range from 1.0–3.0 dB, when compared to the MoMuSys reference encoder (VM17) [10]. For our experiments, we used the following encoding options of the improved MPEG-4 reference coder: – – – – –
1 4 -pel
motion vector accuracy enabled Global motion compensation enabled Search range of ±32 pels 2 B-frames inserted (IBBP BBP . . .) MPEG-2 quantization matrix used
For our proposed scheme, we have chosen the following settings: – 14 -pel motion vector accuracy – Search range of ±32 pels around the motion vector predictor – No B-frames used (IP P P . . .) – Five reference pictures were used for all sequences except for the ’News’sequence (see discussion of results below) Coding experiments were performed by using the test sequences ’Foreman’ and ’News’ both in QCIF resolution and with 100 frames at a frame rate of 10 Hz. Only the first frame was encoded as an I-frame; all subsequent frames were encoded as P-frames or B-frames. For each run of a sequence, a set of quantizer parameters according to the different frame types (I,P,B) was fixed.
84
Detlev Marpe et al.
Fig. 6. Average Y-PSNR against bit-rate using the QCIF test sequence ’Foreman’ at a frame rate of 10 Hz
Fig. 7. Average Y-PSNR against bit-rate using the QCIF test sequence ’News’ at a frame rate of 10 Hz
Wavelet-Based Video Compression
(a)
85
(b)
Fig. 8. Comparison of subjective reconstruction quality: Frame no. 22 of ’Foreman’ at 32 kbit/s. (a) DVC reconstruction (b) MPEG-4 reconstruction. Note that the MPEG-4 reconstruction has been obtained by using a de-blocking filter
7.2
Test Results
Figs. 6–7 show the average PSNR gains obtained by our proposed DVC scheme relative to the MPEG-4 coder for the test sequences ’Foreman’ and ’News’, respectively. For the ’Foreman’-sequence, significant PSNR gains of 2.0–2.5 dB on the luminance component have been achieved (cf. Fig. 6). Figure 8 shows a comparison of the visual quality for a sample reconstruction at 32 kbit/s. The results we obtained for the ”News”-sequence show dramatic PSNR improvements of about 2.5–3.5 dB. To demonstrate the ability of using some a priori knowledge about the scene content, we checked for this particular sequence in addition to the five most recent reference frames one additional reference frame 50 frames back in the past according to the repetition of parts of the scene content. By using the additional reference frame memory for this particular test case, we achieved an additional gain of about 1.5 dB PSNR on the average compared to the case where the reference frame buffer was restricted to the five most recent reference frames only.
8
Conclusions and Future Research
The coding strategy of DVC has proven to be very efficient. PSNR gains of up to 3.5 dB relative to an highly optimized MPEG-4 coder have been achieved. However, it should be noted that in contrast to the MPEG-4 coding system, no B-frames were used in the DVC scheme, although it can be expected that DVC will benefit from the usage of B-frames in the same manner as the MPEG-4 coder, i.e., depending on the test material, additional PSNR improvements of up to 2 dB might be achievable. Another important point to note, when comparing
86
Detlev Marpe et al.
the coding results of our proposed scheme to that of the highly R-D optimized MPEG-4 encoder used for our experiments, is the fact that up to now, we did not incorporate any kind of high-complexity R-D optimization method. We even did not optimize the motion estimation process with respect to the overlapped motion compensation, although it is well known, that conventional block motion estimation is far from being optimal in an OBMC framework [8]. Furthermore, we believe that the performance of our zerotree-based wavelet coder can be further improved by using a R-D cost function for a joint optimization of the quantizer and the zerotree-based encoder. Thus, we expect another significant gain by exploiting the full potential of encoder optimizations inherently present in our DVC design. This topic will be a subject of our future research.
References 1. ISO/IEC JTC1SC29 14496-2 MPEG-4 Visual, Version 2. 83 2. Bjontegaard, G. (ed.): H.26L Test Model Long Term Number 8 (TML8), ITU-T SG 16 Doc. VCEG-N10 (2001) 76, 78, 79 3. Cohen, A., Daubechies, I., Feauveau, J.-C.: Biorthogonal Bases of Compactly Supported Wavelets, Comm. on Pure and Appl. Math., Vol. 45 (1992) 485–560 79 4. Lewis, A., Knowles, G.: Image Compression Using the 2D Wavelet Transform, IEEE Trans. on Image Processing, Vol. 1, No. 2 (1992) 244–250 81 5. Marpe, D., Bl¨ attermann, G., Wiegand, T.: Adaptive Codes for H.26L, ITU-T SG 16 Doc. VCEG-L13 (2001) 77 6. Marpe, D., Cycon, H. L.: Efficient Pre-Coding Techniques for Wavelet-Based Image Compression, Proceedings Picture Coding Symposium 1997, 45–50 77, 80, 81 7. Marpe, D., Cycon, H. L.: Very Low Bit-Rate Video Coding Using Wavelet-Based Techniques, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 1 (1999) 85–94 77, 80, 82 8. Orchard, M. T., Sullivan, G. J.: Overlapped Block Motion Compensation: An Estimation-Theoretic Approach, IEEE Trans. on Image Processing, Vol. 3, No. 5 (1994) 693–699 86 9. Petukhov, A. P.: Recursive Wavelets and Image Compression, Proceedings International Congress of Mathematicians 1998 79, 80 10. Schwarz, H., Wiegand, T.: An Improved MPEG-4 Coder Using Lagrangian Coder Control, ITU-T SG 16 Doc. VCEG-M49 (2001) 76, 77, 83 11. Wiegand, T., Zhang, X., Girod, B.: Long-Term Memory Motion-Compensated Prediction, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 1 (1999) 70–84 76 12. Wiegand, T., Lightstone, M., Mukherjee, D., Campbell, T. G., Mitra, S. K.: RateDistortion Optimized Mode Selection for Very Low Bit Rate Video Coding and the Emerging H.263 Standard, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 2 (1996) 182-190 76 13. Witten, I., Neal, R., Cleary, J.: Arithmetic Coding for Data Compression, Communications of the ACM, Vol. 30 (1987) 520–540 83
Wavelets and Fractal Image Compression Based on Their Self–Similarity of the Space-Frequency Plane of Images Yoshito Ueno Graduate School of Engineering, SOKA University 1-236 Tangi-cho Hachioji-shi Tokyo 192-8577, Japan
[email protected]
Abstract. This paper presents a fusion scheme for Wavelets and Fractal image compression based on the self-similarity of the space-frequency plane of sub-band encoded images. Various kinds of Wavelet transform are examined for the characteristics of their self–similarity and evaluated for the adoption of Fractal modeling. The aim of this paper is to reduce the information of the two sets of blocks involved in the Fractal image compression by using the self-similarity of images. And also, the new video encoder using the fusion method of Wavelets and Fractal adopts the similar manner as the motion compensation technique of MPEG encoder. Experimental results show almost the same PSNR and bits rate as conventional Fractal image encoder depending on the sampled images by computer simulations.
1
Introduction
Ordinary video compression methods based on DCT (Discrete Cosign Transform) have been standardized for N-ISDN networks and mobile communications. However, this DCT transform usually produces so called the block noise and mosquito noise when encoding is performed. Therefore, JPEG 2000 accepts the Wavelet transform instead of DCT to reduce these noises. After applying the Wavelet transform for the images, we can observe the self- similarity into transform-ed images and can introduce the Fractal image coding for these images. This paper presents the fusion method of Wavelet transform and Fractal coding and derives experimental results for still images. Also, I propose the new video compression algorithms to encode the inter-frame image through mapping the range block of N-frame into the same range block of (N+1) frame.
2
Self-Similarity of Wavelet Transformed Images
Generally, images carried out by 3 stages down sampling using multi-resolution analysis are derived into lower frequency domain and higher frequency region using Wavelet transform. Especially, higher frequency components of images have the selfY. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 87-98, 2001. Springer-Verlag Berlin Heidelberg 2001
88
Yoshito Ueno
similarity between each frequency components, horizontally, vertically and diagonally. This self-similarity can be derived from the correlation coefficients between lower frequency components and theirs similar high frequency components. [1] Let’s try the multi-resolution analysis of images (N x N pixels) by applying Wavelet transform. At first, row pixels of images are quantized into space-frequency domain by the multi-resolution analysis. Then, we can assume the lower frequency components of pixels at the resolution level j and position n as written in {
C
( j) n
: n = 1,2, ----N/2}
and the higher components of pixels at the same level and the same position as written in {
d
( j) n
: n = 1, 2, ------ N/2}.
However, the first row pixel is assumed to be {
C
0 k
: k = 1, 2, ---- N}. According to
the multi-resolution analysis, images are decomposed into (j+1) level and executed by the following equation. N
C
( j +1) n
j −1
1 2 = ∑ 2 k =0 N
pC k
( j) k +2n
j −1
1 2 = d n+ N 2 2 ∑ k =0 ( j +1)
q C( ) j
k
k +2n
(1)
Here, j: resolution level (j = 0,1,2, ----) Parameters pk,qk are given by the next reflexive function. m
φ ( x) = ∑ p φ (2 x − k ) k =0
k
m
ϕ ( x) = ∑ q ϕ (2 x − k ) k =0
k
(2)
Here, φ ( x ) ≠ 0, ϕ ( x ) ≠ 0 during 0< k ≤ m interval. However, m is an odd number and in general, the decomposition level is carried out into 3 or 5 octave down sampling. And also, for the column pixels, the same multi-resolution analysis is performed and the space-frequency domains are quntized. Therefore, after the multi-resolution analysis, the correlation parameter ϒr between sub-bands of row pixels is given in the following equation.
Wavelets and Fractal Image Compression
n
γ
r
=
∑ (C j =0
n
( j +1) n
89
( j +1)
− C n)(d n + N − d n + N ) 2
2
( j +1) ∑ ( −C n) X C n j =0
2
( j +1) ∑ (d n + N − d n + N 2 ) n
j =0
2
2
(3) And also, the correlation parameter ϒc between sub-bands of column pixels is obtained by the same manner. 2.1 Self-Similarity of Images after Wavelet Transform Using Haar Basis Filter Images are quantized into down sampling by two-dimensional Wavelet transform using Haar basis filter with orthogonal functions. Each sub-band is assumed as the independent images and performed into horizontal, vertical and diagonal groups as shown in Fig.1.
Fig. 1 Grouping all sub-bands in the horizontal, vertical and diagonal directions
The correlation coefficients between these groups of image components are very high as usual. This means that Wavelet transform by Haar basis orthogonal function carries out the mean value among adjacent pixels can decompose into the space-frequency plane with the lower resolution and repeat into n times reflectively. Exactly, this process repeats the finite different operation after averaging the adjacent pixels. The operation of deriving the correlation coefficients is shown in Fig.2.
90
Yoshito Ueno
Fig. 2 The abstraction method of histogram of correlation coefficients among sub-bands
According to Fig.2, the 4 components of the frequency band SBn and the 16 components of the corresponding frequency band SBn-1 are compared and carried out the correlation coefficients of them and put into the frequency distribution. As a result, the frequency distribution of the correlation coefficients of Lena image (512 X 512 pixels, 8 bps) as shown in Fig.3.
Fig. 3 Self-Similarity among sub-bands of image after transform using Haar basis filter
Therefore, Fractal coding is feasible by utilizing the high correlation coefficients of horizontal, vertical and diagonal components of images respectively. And also, the other Wavelet transform basis filter, such as 4 taps length of Daubechies filter (N = 2) having sharp frequency cut off characteristics and 12 taps length of Coiflet filter (N = 4) having linear phase characteristics for the measurement of the correlation coefficient among sub-bands of images have been examined. However, we observed the zero frequency distribution of the correlation coefficient after applying these filters because of longer taps and dispersing Wavelet parameters. 2.2 Self-Similarity among the Same Bands by Wavelet Packet Transform Wavelet transform can decompose the lower frequency bands of images into spacefrequency domain reflexively. Therefore, when the quantization process of the spacefrequency characteristics of images by the multi-resolution analysis does not match to
Wavelets and Fractal Image Compression
91
the space-frequency characteristics of images, Wavelet transform becomes the inefficient decomposition. On the other hand, Wavelet packet transform can be adaptable to decompose the subbands of images without restricting the lower frequency bands of images. Namely, the adaptive octave decomposition of images has higher correlation coefficients of the same sub-bands of images. For example, four partition of the same sub-bands of images has higher correlation coefficients. When these four divided fork packets maintain the relation to the parent packets in the position, these relation of the location can be encoded into the Fractal form. [1] And also, Wavelet packet transform can use linear orthogonal basis filters and decompose images by rapid process using single tree algorithms.
3
Reflexive Encoder Using Self-Similarity of Images
The image compression method using Fractal encoder has the characteristics of the plenty amount of computation because of searching the self-similarity among subsets of whole images. [2] Therefore, Fractal encoder has been proposed by utilizing the self-similarity among sub-bands of images in the horizontal, vertical and diagonal group after octave decomposition of Wavelet transform respectively. Furthermore, Fractal encoder can be achieved by utilizing the self-similarity of the same sub-bands of images using Wavelet packet transform. 3.1 Domain Decision of Pixels on the Wavelet Space-Frequency Plane For applying Fractal encoder of images, the range blocks of each sub-band on the space-frequency plane after the multi-resolution analysis are divided into the three domains, such as the shade region, the mid-range and the edge regions. Let’s calculate the frequency number F(R) of pixel (R) in the range blocks and classify the domain using the average expected value F(R) and the range width of edge searching width in the sub-band, w. 1.
Shade region: in the case of the equal values of pixels in the range blocks.
F ( R ) = RS × RS
2.
(4) Mid-range region: in the case of the existence of all values of pixels centered around the expected value F (R ) of the frequency number within the range width of the searching edge, w.
R < F ( R) + w
(5) 3. Edge region: in the case of the area without the above conditions. The other division methods of domains can partition into the shade and edge region comparing the size with the threshold level of the variance for the value of pixels, which is calculated inside the range blocks.
92
1. 2.
Yoshito Ueno
In the case of Threshold σ2 ≥, range blocks become to be the shade region. In the case of σ2 >Threshold, range blocks become to be the edge region.
3.2 Fractal Image Encoding In general, Fractal image encoder is similar to the vector quantization and encoded by the repetition function, that is, images are divided into several blocks and approximated by conversion code books which are derived from images themselves. This conversion equation is described into conversion terms and linear terms. Mapping into scale-down on the other blocks with different resolution can derive the information of conversion code. As for the Fractal encoder, we can use the IFS (Iterated Function Systems) method that processes the model of self-similarity images. [2] Applying the affine transform after searching the domain blocks that have the practical self-similarity of images for the range blocks on the Wavelet space can derive the scale-down images. This searching range is called as the domain pool. For the sake of reducing the amount of the computation of IFS encoding, three major parameters, such as range blocks, domain blocks and domain pools are selected by utilizing the correlation among the adjacent sub-bands. The scale-down image can extract the targeted range block Ri,j (RS×RS) and the domain block Di,j (DS×DS): DS = 2RS from the smallest frequency band SBn excepting the sub-band SB4 on the Wavelet space. According to the partial self-similarity of images, four range blocks Ri’+k, j’+1and one domain pool Di,j(4RS×4RS) can be selected from the next frequency band SBn-1 at the condition of a reference position at the lower left block as shown in Fig.4.
Fig.4 The selection method of each block on the wavelet space
Then, the domain block Di’,j’ corresponding to the four range blocks Ri’+k, j’+l are selected from the domain pool. These processes are applied to the next frequency band and searched in the whole image repeatedly. Above process can be described into the following equation.
Wavelets and Fractal Image Compression
R
i, j
93
≅ l × (α × Di , j + g )
(6) Here, R: the value of pixels inside the range cell, l: the mapping transform, α : the brightness scaling, Di,j: the value of pixels inside the domain cell, g: the amount of shift. Then,
R
i, j
⇒
R
i, j
⊃
Ri
'
+k ,
j +l , and Di , j ⇒ '
D
i, j
⊃
Di
'
,
j
'
(7)
Here, Ri,j : the set of range cell Di,j : The domain pool And also, at the next frequency band, the same affine transform can be established.
Ri
'
+k ,
' ' j +l ≅ m × α × s × Di ', j + ∆g
(8) This block selection method can process the IFS encoder that higher sub-band images (SBn) with MRR are encoded at first and the relation of position between the range block and the corresponding domain block at the next sub-band (SBn-1), range blocks and domain pools are determined. At this scheme, the domain pool within higher subbands within the highest decomposition level is assumed for the whole bands. In general, concerning about the searching area of selecting blocks, this block selection method can produce minimum number of domain pools and range blocks to be searched due to the existence of higher frequency components of the Wavelet space on the higher sub-bands. After encoding images, range blocks of each frequency band and transformed domain blocks have higher correlation coefficients. When the information between them are redundant, choosing the one side of information can reduce these overlapped information. The mechanism of reducing the quantity of information by the above cell selection method is shown in Fig.5.
Fig.5
The reduction method of amounts of block information by cell selection
94
Yoshito Ueno
For example, when the domain block No.24 is vanished, the original domain block No.2 or beginning domain block No.0 can be abstracted and reproduced into the necessary information. After all, at the transmitter side, the coded stream can be derived through the entropy coding of parameters about the scale-down images and the position of domain cells for each range cell. However, the lowest frequency band (SB4) with high resolution is coded into 8 bits linear quantization. At the receiver side, we can find the specific domain block that has to process the affine transform, the brightness scaling, the brightness shift and the symmetrical transform for each one range block of any initial images. According to this transform, we can derive the pixel value of the range block from the domain block. When the mean square error between the reproduced images by this process repeatedly. When the begging decoded image is less than threshold, this process has to converge. Then, the final reproduced image can be extracted. 3.3 Experimental Results The simulation conditions are given as follows;
Range block size, RS=2: the same length as the tap length of Haar basis filter. Domain block size, DS=4: its step size is 2. Brightness scaling: select the one value among {0.5, 0.6, 0.7, 0.8, 0.9} and quantize it. Therefore, PSNR is given by the following equation.
PSNR = 10 log 10 Here,
2
255 img −img '
(
)
2
2 γ
img : The original image
(9)
'
img : The reproduced image
(img −img ) γ '
2
2
: The mean square error between the
img and img '
γ : The size of image, γ x γ pixels The reproduced image after 14 times repetition is shown in Fig.6 and PSNR is about 31.9db.
Wavelets and Fractal Image Compression
1) Original image
95
2) Regenerated image
Fig. 6 The Regenerated image after 14 times repetitions (PSNR=31.9db)
Then, the amount of computation about this process was examined. This algorithm can reduce the searching time due to the following reason. Each sub-bands of the same group on the Wavelet space have the self-similarity characteristics, while the high correlation coefficients and the restriction of the searching area within the specific size have range blocks and domain blocks from the adjacent frequency band after creating the domain pool without range blocks. Let’s the range block size to be 2 × 2 pixels, the domain block size to be 4 × 4 pixels, the domain pool size to be 64 × 64 pixels, and the domain block size to be 2 pixels. Then, the domain pool is divided without overlapping each other. Finally, we can calculate the searching times. The number of range blocks inside the domain pool is 210. The number of domain blocks to be searched for one range block becomes 302. When the size of images is 512 × 512 pixels, the number of domain pools inside the image becomes 26. Therefore, the ordinary searching times of domain blocks using the usual methods is reached to 58,982,400 times. On the other hand, for each of horizontal, vertical and diagonal direction on the Wavelet space respectively, the domain blocks with high decomposition level need to search at first. At the decomposition level 3, the size of domain pools of this sub-band is consistent with that of the usual method. Therefore, the number of the searching times about domain blocks becomes 210×302. Then, at the decomposition level 2, the number of domain blocks inside the domain pool 4 range blocks becomes 4 × 49 . This range block group has 210sets and then the number of searching times about domain blocks is 4×49×210. Therefore, about all three directions, the numbers of searching times of domain blocks become to be
3×
{2
10
2
× 30 + 4 × 49 ×
(2
10
12
+2
)}= 5,775,360 .
96
Yoshito Ueno
Finally, the searching times, that is, the amount of computation can be reduced up to 1 10 comparing with the usual IFS methods.
4
Video Compression Method Using Wavelet and Fractal Encoder
Recently, concerning about the Fractal video compression methods, there are several methods being investigated. The video sequence encoder by 3D domain blocks and range cell has been proposed. The inter-frame mapping encoder without the repeating convergence that the previous frame assumes to be the domain pool has been investigated. The previous frame estimation encoder with the circulation is the range cell with the approximation of the domain blocks for the previous frame to be mapped. [3] In general, video images have the feature that the video objects of each frame within the same scene are assumed to be the continuous images corresponding to the motion vector of images. Therefore, after the scene cut detection of video, [4], several frames of these cuts can produce one average frame by Wavelet transform and Fractal encoding. Then, the inter-frame Fractal encoding can be carried out in the same manner as the motion compensation by utilizing the high correlation coefficients among several frames. Finally, Wavelet transform and Fractal encoder are executed by encoding the succeeding frame under the average image of Fractal encoder. The sequence of this process can be shown in Fig.7. According to this figure, the next cut of images has to be processed by Wavelet transform and Fractal encoding similar as the scheme for the still image. The initial frame up to fourth frame becomes to be one group and processed by Fractal encoding after affine transform. Let’s the range block of the k-th frame to be block
R and i
then search the domain
Dα ( ) of the previous circular frame by the next equation. i
R ≅R i
i
{
}
= S × O × Dα (i ) + Oi × C
(10)
Here, αi: the proper position of the domain block C: the constant block that the value of all pixels is 1 O: the orthogonal function operator (Fractal still image encoder) S: the contrast scaling of mapping, -1< S < 1 Oi: D.C component of range block According to the above equation, the initial frame by Wavelet transform is assumed to be the previous frame and then the succeeding frame to be the domain pool needs to be encoded by Fractal scheme.
Wavelets and Fractal Image Compression
97
At the decoding side, after mapping the arbitrary circular four frames can be reproduced. Then, the repetitive regenerated image can be decoded using range blocks and domain blocks through this average regenerated image. This process needs to continue until the regenerated image is converged. Then, the contrast and the luminance of each block are compensated by the parameters, Si and Oi. Furthermore, every parameters of the above equation have to be quantized and assigned to independent bits. And also, the entropy coding, such as Huffman coding can increase the compression ratio of these kinds of encoders. Consequently, Wavelet transform and Fractal encoder can realize for the video compression with the lower bit rate.
Fig.7 The principle of video coding by Wavelet transform and Fractal coding
5
Conclusion
In conclude, this paper presents the video compression scheme with Wavelet transform and Fractal encoder and also examined the images having the high correlation coefficients among adjacent sub-bands after Wavelet transform using Haar basis filter. As this orthogonal filter has the short length of taps, the reproduced image caused some ridges in the edge region of it and was observed some edges at the shade area due to the accumulation of small quantizing errors after the IFS encoding. In future, Johnson filter having the linear phase characteristics should be examined and the better performance by this method could be derived.
98
Yoshito Ueno
Furthermore, using the Wavelet packet transform, the size of range blocks and domain blocks can be variable because the small signal parameters at the higher frequency bands can reduce the distortions that affect the regenerated images. And also, the above-mentioned video compression scheme with the notice of the features of video needs to be investigated experimentally. Especially, the bit rate and coding efficiency using this new algorithm should be concerned about.
References 1. 2. 3. 4.
Stollinitz, E. J., et al.: Wavelets for Computer Graphics: A Premier Pt.1. IEEE Computer Graphics & Application, (May, 1995), 76-84 Jaquine, A. E.: Image Coding based on a Fractal Theory of Iterated Contractive Image Transformations. IEEE Trans. Image Processing, Vol.1, No.1, (June, 1992), 18-30 Kim, C. S., et al.: Fractal Coding of Video Sequence using CPM and NCIM. 1st New Video Media Technology Conference, (March, 1996), 72-76 Tonomura, Y., et al.: Video Handling based on Structured Information for Hypermedia Systems. Proc. of International Conference on Multimedia Information Systems, (1991), 333-344
Integration of Multivariate Haar Wavelet Series Stefan Heinrich1 , Fred J. Hickernell2 , and Rong-Xian Yue3 1
FB Informatik, Universit¨ at Kaiserslautern PF 3049, D-67653 Kaiserslautern, Germany
[email protected] 2 Department of Mathematics, Hong Kong Baptist University Kowloon Tong, Hong Kong SAR, China
[email protected] 3 College of Mathematical Science, Shanghai Normal University 100 Guilin Road, Shanghai 200234, China
[email protected]
Abstract. This article considers the error of integrating multivariate Haar wavelet series by quasi-Monte Carlo rules using scrambled digital nets. Both the worst-case and random-case errors are analyzed. It is shown that scrambled net quadrature has optimal order. Moreover, there is a simple formula for the worst-case error.
1
Introduction
Digital (t, m, s)-nets and (t, s)-sequences are popular low discrepancy point sets used for quasi-Monte Carlo multidimensional quadrature [8,11]. In recent years it has been shown that these sets are especially effective for integrating multivariate Haar wavelet series. The convergence rate depends on the decay rates of the wavelet series coefficients. This article reports recent results by the authors and others. For proofs the reader is referred to the references cited. The following section defines the Hilbert space of multivariate Haar wavelet series, Hwav . Section 3 describes constructions of digital nets and sequences, and Section 4 defines the integration problem to be studied. The main results are described in Section 5.
2
Function Spaces Spanned by Haar Wavelets
The space, Hwav , of multivariate Haar wavelets studied here was defined in [15]. The domain of interest is the unit cube, [0, 1)s , where the dimension s is any
This work was partially supported by a Hong Kong Research Grants Council grant HKBU/2030/99P, by Hong Kong Baptist University grant FRG/97-98/II-99, by Shanghai NSF Grant 00JC14057, and by a Shanghai Higher Education STF Grant.
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 99–106, 2001. c Springer-Verlag Berlin Heidelberg 2001
100
Stefan Heinrich et al.
positive integer. Let b be an integer greater than one, and define the univariate basic wavelet functions as ψγ (x) = b1/2 1bx=γ − b−1/2 1x=0 ,
γ = 0, 1, . . . , b − 1,
where 1{·} denotes the characteristic function, and x denotes the greatest integer less than x. For each subset u of the coordinate axes {1, . . . , s}, let |u| denote the cardinality of u. For each r ∈ u let κr , τr and γr be integers with κr ≥ 0, 0 ≤ τr < bκr and 0 ≤ γr < b. Define the vectors κ = (κr )r∈u , τ = (τr )r∈u , and γ = (γr )r∈u . Let ψuκτ γ be a product over r ∈ u of the dilated and translated wavelets, i.e., (b1bκr +1 xr =bτr +γr − 1bκr xr =τr ), (1) ψuκτ γ (x) := b(|κ|−|u|)/2 r∈u
where |κ| = r∈u κr . For u = ∅ we take by convention ψuκτ γ (x) = ψ∅ (x) = 1. The wavelets defined above are not orthogonal nor linearly independent, but they are nearly so. As observed in [15], b−1
[0,1)s
ψuκτ γ (x) = 0,
∀r ∈ u, ∀u, κ, τ , γ u−{r} ,
γr =0
ψuκτ γ (x)ψu κ τ γ (x) dx = δuu δκκ δτ τ
(δγr γr − b−1 ),
r∈u
where δ is the Kronecker delta function. The space of integrands, Hwav , consists of all series of wavelet functions (1) whose coefficients converge to zero quickly enough. Let and βu = βr ωuκ = βu b−2α|κ| , r∈u
for some α > 0 and βr > 0 for r = 1, . . . , s. Then define the scaled wavelets as 1/2 ω ψuκτ γ (x) = ωuκ ψuκτ γ (x). The space of multivariate Haar wavelets may then be defined as: (x) = ˆf T ψ (x) : fˆω ψ ω Hwav = f (x) = u,κ,τ ,γ
uκτ γ
ˆfω 2 < ∞ &
uκτ γ b−1
ω
ω
ω fˆuκτ γ
= 0, ∀r ∈ u, ∀u, κ, τ , γ u−{r}
.
γr =0 ω where ˆfω is a column vector of the coefficients fˆuκτ γ , and ψ ω is a column vector ω of the basis functions ψuκτ γ . Because the wavelets are not linearly independent, the condition on the sum of the series coefficients is required to insure that the series expression for f ∈ Hwav is unique. The inner product and norm for Hwav are defined in terms of the scalar product and L2 -norm of the coefficient vectors: ˆ ω , f Hwav = ˆfω = (ˆfωT ˆfω )1/2 . ˆ ω = ˆfωT g f, gHwav = ˆfω , g 2
2
Integration of Multivariate Haar Wavelet Series
3
101
Digital Sequences
One important family of low discrepancy sequences is the (t, s)-sequences in base b [11]. Moreover, all general constructions of such sequences [1,10,12,16] use the digital method [8,11,13]. Owen [14] proposed a method for randomly scrambling (t, s)-sequences so that they are still (t, s)-sequences with probability one. This random scrambling has been implemented by [7]. The following definition describes the construction of a randomly scrambled digital sequence in a prime base. A similar construction is possible for prime power bases. Definition 1 [7] Let b ≥ 2 be a prime number, and Zb = {0, 1, . . . , b − 1}. Let the following ∞ × ∞ matrices and ∞ × 1 vectors all have elements in Zb : predetermined generator matrices C1 , . . . , Cs , lower triangular scrambling matrices L1 , . . . , Ls with nonzero diagonal elements, and shift vectors e1 , . . . , es . For any non-negative integer i = · · · i3 i2 i1 (base b), define the ∞ × 1 vector Υ (i) as the vector of its digits, i.e., Υ (i) = (i1 , i2 , . . .)T . For any point z = 0.z1 z2 · · · (base b) ∈ [0, 1), let φ(z) = (z1 , z2 , . . .)T denote the ∞ × 1 vector of the digits of z. Then the scrambled digital sequence in base b is {x0 , x1 , x2 , . . .}, where each xi = (xi1 , . . . , xis ) ∈ [0, 1)s is defined by φ(xir ) = Lr Cr Υ (i) + er ,
r = 1, . . . , s, i = 0, 1, . . . ,
(2)
where all arithmetic operations in the above formula take place using arithmetic modulo b. The basic non-scrambled digital sequence takes L1 = · · · = Ls = I, and e1 = . . . = es = 0. Owen’s randomly scrambled sequence chooses the elements of L1 , . . . , Ls , e1 , . . . , es randomly, independently and uniformly over their possible values. The function φ gives proper b-ary expansions of its arguments, i.e., φ(z) cannot end in an infinite trail of b − 1s. Thus, the right side of (2) should not give a vector ending in an infinite trail of b − 1s almost surely. To insure this, it is assumed that any linear combination of columns of any Cr cannot be a vector ending in an infinite trail of b − 1s. The quality of a digital sequence is often measured by its t-value, which is related to the generator matrices. Smaller values of t imply a better sequence. The lemma below describes how to find the t-value for a digital sequence. Lemma 1. [8,9,11] Let {x0 , x1 , x2 , . . .} be a digital sequence in base b with generator matrices C1 , . . . , Cs . For any positive integer m let cTrmk be the row vector containing the first m columns of the k th row of Cr . Let t be an integer, 0 ≤ t ≤ m, such that for all non-negative integers κ = (κ1 , . . . , κs ) with |κ| = m − t the vectors crmk , k = 1, . . . , κr , r = 1, . . . , s, are linearly independent over Zb . Then for any non-negative integer ν and any λ = 0, . . . , b − 1 with λ ≤ b − (ν mod b), the set {xνbm , . . . , x(ν+λ)bm −1 }, is a (λ, t, m, s)-net in base b. (Note that a (1, t, m, s)-net is the same as a (t, m, s)-net.) If the same value of t holds for all non-negative integers m, then the digital sequence is a (t, s)-sequence.
102
4
Stefan Heinrich et al.
Problem Formulation
The integration problem studied here is integration over the unit cube: I(f ) = f (x)dx. [0,1)s
Quadrature rules to approximate this integral take the form: Q(f ; P, {wi }) =
n−1
wi f (xi )
i=0
for some set of nodes P = {x0 , . . . , xn−1 } ⊂ [0, 1)s and some set of weights {wi } = {w0 , . . . , wn−1 }. Quasi-Monte Carlo quadrature methods choose P to be a set of points evenly distributed over the integration domain and wi = n−1 for all i. The quality of a quadrature rule can be assessed by a worst-case or randomcase analysis [5]. Let Bwav be the unit ball in the Haar wavelet space, i.e., Bwav = {f ∈ Hwav : f Hwav ≤ 1}. The quadrature error for a specific integrand and a specific quadrature rule is given by Err(f ; Q) = I(f ) − Q(f ; P, {wi }). Suppose that Q is random, i.e., the nodes, weights, and number of function evaluations are all chosen randomly. Specifically, let Q be chosen from some sample space, Qn , according to some probability distribution, µ, where the average number of function evaluations is n. (Deterministic quadrature rules are the case where Qn has a single element.) The worst-case and random-case error criteria for the Haar wavelet space are: worst-case
ew (Hwav ; Qn , µ) := rms
sup |Err(f ; Q)| ,
(3a)
random-case:
er (Hwav ; Qn , µ) := sup
rms |Err(f ; Q)| .
(3b)
Q∈Qn f ∈Bwav f ∈Bwav Q∈Qn
The operator rms means root mean square. The worst-case error analysis corresponds to the case where your enemy chooses the worst possible integrand after you have chosen the particular quadrature rule. The random-case error analysis corresponds to the case where your enemy chooses the worst possible integrand after knowing your method for randomly choosing quadrature rules, but before you choose a particular one. The optimal error criteria for the Haar wavelet space are defined as the infima of the above with respect to all possible quadrature rules: ew (Hwav , n) := inf ew (Hwav ; Qn , µ), Qn ,µ
er (Hwav , n) := inf er (Hwav ; Qn , µ). Qn ,µ
A sequence of random quadrature rules (Qnm , µm ), m = 0, 1, 2, . . . is said to be optimal if it has the same asymptotic order as best possible quadrature rules. Specifically, one has worst-case and random-case optimality if there exists some
Integration of Multivariate Haar Wavelet Series
103
nonzero constant C independent of n such that for all n = 1, 2, . . . min ew (Hwav ; Qnm , µm ) ≤ Cew (Hwav , n),
nm ≤n
min er (Hwav ; Qnm , µm ) ≤ Cer (Hwav , n).
nm ≤n
It is possible for a sequence of quadrature rules to be optimal for one of the above criteria and not for the other.
5
Results
A key ingredient in the worst-case and random-case error analyses is the ∞ × ∞ matrix whose elements are the mean square errors of integrating the product of any two wavelet functions by a randomized quadrature. Define
Λ := EQ∈Qn [Err(ψ ω ; Q)][Err(ψ ω ; Q)]T . Then the worst-case and random-case error analyses can be expressed as in the following theorem. Theorem 2. [2,5] Consider the case of random quadrature rules applied to multivariate Haar wavelet series. The error criteria defined in (3) are given by ew (Hwav ; Qn ) = trace(Λ), assuming α > 1/2, er (Hwav ; Qn ) = ρ(Λ), assuming α ≥ 0, where trace denotes the trace, and ρ denotes the spectral radius or largest eigenvalue. The assumption α ≥ 0 is required to insure that the Haar wavelet series are square integrable, so that the random-case error analysis is valid. The assumption α > 1/2 is required to insure that the Haar wavelet series are absolutely summable, so that the worst-case error analysis is valid. From this theorem it can be seen that the worst-case error criterion is never smaller than the random-case error criterion because the trace of a matrix is never smaller than its spectral radius. The relationship in Theorem 2 in fact holds for all Hilbert spaces of functions [5]. The above formulas are difficult to evaluate precisely in general. However, for quasi-Monte Carlo quadrature rules based on scrambled digital nets one can derive a simple formula for ew (Hwav ; Qn ). For any ∞×1 vector φ = (φ1 , φ2 , . . .)T , let ξ(φ) denote the number of zero elements in φ preceding the first nonzero element: ξ(φ) = min{k : φk+1 = 0}.
104
Stefan Heinrich et al.
In other words, the smallest interval of the form [0, b−k ), k = 0, 1, . . . that contains z is [0, b−ξ(φ(z)) ). Next define the function G(ξ; α) as follows: G(ξ; α) ξ = 0, −1, = (b2α−1 − 1)−1 [b2α−1 (b − 1 − b1−(2α−1)ξ ) + b−(2α−1)ξ ], 0 < ξ < ∞, 2α−1 − 1)−1 (b − 1)b2α−1 , ξ = ∞. (b The kernel function, Kwav (x, y) is defined in terms of G as Kwav (x, y) = −1 +
s
[1 + βr G(ξ(φ(xr ) − φ(yr )); α)].
r=1
This is, in fact, the reproducing kernel of the Hilbert space Hwav [17]. Theorem 3. [17] Let {xi } be a basic, non-scrambled digital sequence in a prime power base b as defined in Definition 1. For quasi-Monte Carlo quadrature using any non-scrambled or randomly scrambled digital (λ, t, m, s)-net with n = λbm points it follows that [ew (Hwav ; Qn )]2 1 = n
bm −1 ˜ ı=0
Kwav (x˜ı , 0) +
λ−1 ˆ ı=1
bm −1 2(λ − ˆı) Kwav (xˆıbm +˜ı , 0) . λ ˜ ı=0
Although analogous formulas for ew (Hwav ; Qn ) exist for general reproducing kernel Hilbert spaces of integrands and general quadrature rules, they require O(n2 ) operations to evaluate. Because of the good match between Hwav and digital nets the above formula only requires O(n) operations to evaluate. The asymptotic behaviour of ew (Hwav ; Qn ) and er (Hwav ; Qn ) for scrambled net quadrature may be obtained by looking at the gain coefficients of nets as defined in [15] and analyzed in [6]. Lower bounds on the optimal convergence rates for quadrature rules may be obtained by constructing Haar wavelet series that fool any quadrature rule. Putting these results together leads to the following theorem. Theorem 4. [2] For quasi-Monte Carlo quadrature of Haar wavelet series using scrambled (λ, t, m, s)-nets in base b, the error criteria defined in (3) have the following asymptotic orders: min ew (Hwav ; Qsc,λbm ) ew (Hwav , n) n−α [log n](s−1)/2 ,
λbm ≤n
min er (Hwav ; Qsc,λbm ) er (Hwav , n) n−α−1/2 ,
λbm ≤n
where means “exactly the same asymptotic order”.
α > 1/2, α ≥ 0,
Integration of Multivariate Haar Wavelet Series
6
105
Conclusion
The original reason for investigating the integration of multivariate Haar wavelet series arose from studies of quasi-Monte Carlo quadrature of arbitrary functions. If one uses scrambled nets as the sampling points then this has been shown to be equivalent to integrating Haar wavelet series [4,6]. Thus the results reported above have broader applicability. However, no matter how smooth one assumes the integrand to be, the best convergence one can obtain using scrambled digital nets is O(n−3/2+ ) for the worst-case error. It appears that to handle smoother integrands well one must consider smoother wavelets and different quadrature rules. This is an open problem.
References 1. H. Faure, Discr´epance de suites associ´ees ` a un syst`eme de num´eration (en dimension s), Acta Arith. 41 (1982), 337–351. 101 2. S. Heinrich, F. J. Hickernell, and R. X. Yue, Optimal quadrature for Haar wavelet spaces, 2001, submitted for publication to Math. Comp. 103, 104 3. P. Hellekalek and G. Larcher (eds.), Random and quasi-random point sets, Lecture Notes in Statistics, vol. 138, Springer-Verlag, New York, 1998. 105 4. F. J. Hickernell and H. S. Hong, The asymptotic efficiency of randomized nets for quadrature, Math. Comp. 68 (1999), 767–791. 105 5. F. J. Hickernell and H. Wo´zniakowski, The price of pessimism for multidimensional quadrature, J. Complexity 17 (2001), to appear. 102, 103 6. F. J. Hickernell and R. X. Yue, The mean square discrepancy of scrambled (t, s)sequences, SIAM J. Numer. Anal. 38 (2001), 1089–1112. 104, 105 7. H. S. Hong and F. J. Hickernell, Implementing scrambled digital nets, 2001, submitted for publication to ACM TOMS. 101 8. G. Larcher, Digital point sets: Analysis and applications, In Hellekalek and Larcher [3], pp. 167–222. 99, 101 9. , On the distribution of digital sequences, Monte Carlo and quasi-Monte Carlo methods 1996 (H. Niederreiter, P. Hellekalek, G. Larcher, and P. Zinterhof, eds.), Lecture Notes in Statistics, vol. 127, Springer-Verlag, New York, 1998, pp. 109–123. 101 10. H. Niederreiter, Low discrepancy and low dispersion sequences, J. Number Theory 30 (1988), 51–70. 101 , Random number generation and quasi-Monte Carlo methods, CBMS-NSF 11. Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, 1992. 99, 101 12. H. Niederreiter and C. Xing, Quasirandom points and global function fields, Finite Fields and Applications (S. Cohen and H. Niederreiter, eds.), London Math. Society Lecture Note Series, no. 233, Cambridge University Press, 1996, pp. 269–296. 101 , Nets, (t, s)-sequences and algebraic geometry, In Hellekalek and 13. Larcher [3], pp. 267–302. 101 14. A. B. Owen, Randomly permuted (t, m, s)-nets and (t, s)-sequences, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (H. Niederreiter and P. J.-S. Shiue, eds.), Lecture Notes in Statistics, vol. 106, Springer-Verlag, New York, 1995, pp. 299–317. 101
106
Stefan Heinrich et al.
, Monte Carlo variance of scrambled net quadrature, SIAM J. Numer. Anal. 34 (1997), 1884–1910. 99, 100, 104 16. I. M. Sobol’, Multidimensional quadrature formulas and Haar functions (in Russian), Izdat. “Nauka”, Moscow, 1969. 101 17. R. X. Yue and F. J. Hickernell, The discrepancy of digital nets, 2001, submitted to J. Complexity. 104 15.
An Application of Continuous Wavelet Transform in Differential Equations Qu Han-zhang1 , Xu Chen2 , and Zhao Ruizhen3 1
Xi’an Post and Telecommunications Institute Xi’an, P. R. China 2 Xidian University Xi’an, 710071, P. R.China 3 Shenzhen University 518060, P.R.China
Abstract. The relation btween some differential equations and the integral equations is discussed;the differential equations can be transformed into the integral equations by using the continuous wavelet transform; the differential equations and the integral equations are equivalent not only in the weak topology but also in the strong topology; the discussion on the differential equations can be connected with the discussion on the integral equations.
1
Introduction
Wavelet theory includes the discret wavelet transform and continuous wavelet transform. On the discrete wavelet transform and its applications there are many papers. But on the continuous wavelet transform and its applications there are a few papers. Especially on the application of the continuous wavelet transform there are few papers. Therefore it is necessary to continue to discuss the wavelet transform and its applications. On the continuous wavelet transform there are some results. These results mainly come from ’Ten Lecture on Wavelets’ Wwritten by Ingrid Daubechies. Among those results there is the following result. Lemma 1.1[1] ψ(x) ∈ L2 (R), 0 < Cψ < +∞, then for any f (x) ∈ L2 (R) da −1 f (x) = (2πCψ ) < f, ψa,b > ψa,b db 2 R |a| R The above formula is true not only ine weak topology but also in the strong topology. In this paper we connect some differential equatins with the integral equations by using the continuous wavelet transform, provide a method of the discussing the properties of the differential equations and enlarge the applications of continuous wavelet transform.
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 107–116, 2001. c Springer-Verlag Berlin Heidelberg 2001
108
2
Qu Han-zhang et al.
Some Differential Equations and the Integral Equations
We consider the following equation. n
ak (x)y (k) = b(x)
(1)
k=0
{ak (x); k = 0, 1, · · · , n} ⊂ L∞ (R), y (k) ; k = 0, 1, · · · , n ⊂ L2 (R), b(x) ∈ L2 (R) (k) Take ψ (x); k = 0, 1, · · · , n ⊂ L2 (R),Supp(ψ) ⊂ [−L, L], 0 < Cψ =
R
2 |ψ(η)| dη = (2π)−1 < +∞ |η|
According to lemma 1.1 there is the following. da < y, ψa,b > ψa,b (x)db y(x) = 2 R |a| R
(2)
We differentiate formula (2). If we could commute the order between the differential and the integral, we should get da (k) y (k) (x) = < y, ψa,b > a−k ψa,b (x)db (3) 2 R |a| R {k = 0, 1, · · · , n} In formula (1) we should substitute the expressions of y and y (k) {k = 0, 1, . . . , n} for y and y (k) {k = 0, 1, . . . , n} respectively, and should get the following. R
da |a|2
R
< y, ψa,b >
n k=0
(k)
a−k ak (x)ψa,b (x)db = b(x)
(4)
Because we don’t know whether the order between the differential and the integral can be commuted, we don’t know whether the integral operator in the left of formula (4) exists or not. n k=0
ak (x)y (k) =
R
da |a|2
R
< y, ψa,b >
n k=0
(k)
a−k ak (x)ψa,b (x)db
(5)
If formula (5) is true, we say that differential equation (1) is equivalent to integral equation (4). If formula (5) is true in the weak topology, we say that differential equation (1) is equivalent to integral equation (4) in the weak topology. If formula (5) is true in the strong topology, we say that differential equation (1) is equivalent to integral equation (4) in the strong topology. Define n+1
H = (f, f , · · · , f n ); {f, f , · · · , f n } ⊂ L2 (R) ⊂ L2 (R) × · · · × L2 (R)
An Application of Continuous Wavelet Transform in Differential Equations
109
If formula (5) is true, the integral operator in the left of formula (5) is a bounded linear operator from H to L2 (R). From our assumption we can not know whether the order between the differential the integral can be commuted, so we can not know whether they are equivalent. In this paper we maily discuss whether formula (1) is equivalent to formula (4). Are they equivalent in the weak topology? Are they equivalent in the strong topology?
3
The Relation between Formula (1) and Formula (4) in the Weak Toplogy
In order to discuss whether they are equivalent, firstly we discuss whether they are equivalent in the weak topology. Theorem 3.1 Formula (5) is true in the weak topology. That is, formula (1) is equivalent to formula (4) in the weak topology. Proof: We only need to prove that the operator in the left of formula (1) is equivalent to the operator in the left of formula (4) in the weak topology. In order to do this we only need to prove that in the weak topology for k =
da
(k) < y, ψa,b > a−k ak (x)ψa,b (x)db. 0, 1, · · · , n, ak (x)y (k) is equivalent to R |a| 2 R That is, for any g(x) ∈ L2 (R), there is the following. da (k) (k) < y, ψa,b >< a−k ak ψa,b , g > db (6) < ak y , g >= 2 R |a| R Define H1 = {(f, f , · · · , f n ); {f, f , · · · , f n }
⊂ L2 (R), k = 0, 1, · · · , n, Supp(f (k) ) is compact
For any X = (f, f , · · · , f n ) ∈ H1 , x H1 =
n
12
f (k) 2
.
k=0
Firstly we prove that formula is true for any (y, y , · · · , y n ) ∈ H1 . For any (y, y , · · · , y n ) ∈ H1 , we only need to calculate the inner product in the left of formula (6). < ak y (k) , g >=< y (k) , ak g >=< (y (k)), (ak g) > = 2π R
(y (k))(η)(ak g)d(η)
= 2π
R
(y (k))(η)(ak g)d(η)
R
R
)|2 |ψ(W dW |W | 2 |ψ(aη)| dη |η|
110
Qu Han-zhang et al.
(taking w = aη)
= 2π R
|a|2
da ψ(aη) ψ(aη)(a k g )dη k (η R (iη) y
(according to F ubinis theorem[2] ) da −k ibη a [ y (η)e e−ibη (iaη)k ψ(aη)(a = ψ(aη)dη][ k g )]dη 2 |a| R R R R (according to the property that inverse Fourier transform preserves their inner products[3] ) da −k ibη a [ y (η)e e−ibη (ψ (k))(aη)(ak g)]dη ψ(aη)dη][ = 2 R |a| R R R (according to the property that (ψ (k))(aη) = (iaη)k ψ(aη)) da −k (k) a < y, (ψa,b) >< (ψa,b ), (ak g) > db = 2 |a| R R [4]
(k) (ψ (k)) ) (substituting (ψa,b), (ψa,b ) f or ψ, da −k (k) a < y, ψa,b >< ψa,b , ak g > db = 2 R |a| R
(according to the property that Fourier transform preserves their inner products) da (k) = < y, ψa,b >< a−k ak ψa,b , g > db 2 |a| R R That is, for any (y, y , · · · , y (n) ) ∈ H1 formula (6) is true. Thatis, the integral operator in the left of formula (6) is a bounded linear operator. (n) of H1 For any (g0 , g1 , · · · , gn ) ∈ H 1 , there is a sequence yl , yl , · · · , yl 12 (k) (k) 2 n such that for any m ≥ l, k=0 ym − yl
1 (k) 2 2 n < 2−l , liml→0
g − y
= 0. (k) l k=0 ∞ (k) (k) (k) For k = 0, 1, · · · , n, gk = y1 + (yl+1 − yl ). It is true in the strong l=1
topology. According to the properties of the bounded linear functional we have the follwing. (k)
< ak gk , g >=< gk , ak g >=< y1 +
∞ l=1
= R
da |a|2
R
(k)
(k)
(yl+1 − yl ), ak g > (k)
< y1 , ψa,b >< a−k ψa,b , ak g > db
An Application of Continuous Wavelet Transform in Differential Equations
+ = R
∞ l=1
da |a|2
da |a|2
R
(k)
< yl+1 − yl , ψa,b >< a−k ψa,b , ak g > db
R
R
=
R
< y1 + da |a|2
111
∞ l=1
R
(k)
(yl+1 − yl ), ψa,b >< a−k ψa,b , ak g > db (k)
< g0 , ψa,b >< a−k ψa,b , ak g > db
That is, for any (g0 , g1 , · · · , gn ) ∈ H 1 formula (6) is true in the weak topology. Because H1 ⊂ H ⊂ H 1 , for any (y, y , · · · , y (n) ) ∈ H formula (6) is true in the weak topology. We complete the proof.
4
The Relation between Formula (1) and Formula (4) in the Strong Topology
We prove that formula (1) is equivalent to formula (4) in the strong topology. Theorem 4.1 Formula (1) is equivalent to formula (4) in the strong topology. Proof: We prove that the following formula is true. lim
A1 →0,A2 →∞,B→∞
n k=0
ak (x)y
(k)
(x) −
A1 ≤|a|≤A2 n k=0
da |a|2
|b|≤B
< y, ψa,b >
(k)
a−k ak (x)ψa,b (x)db = 0
According to the triangle inequality and ak ∈ L∞ (R) we only need to prove that for k = 0, 1, · · · , n, da (k) lim
y (k) (x) − < y, ψa,b > a−k ψa,b (x)db = 0 2 A1 →0,A2 →∞,B→∞ |a| A1 ≤|a|≤A2 |b|≤B (7) Firstly we prove that formula (7) is true for any (y, y , · · · , y (n) ) ∈ H1 . According to Riez’s lemma, we have da (k) (k)
y (x) − < y, ψa,b > a−k ψa,b (x)db
2 |a| A1 ≤|a|≤A2 |b|≤B da (k) (k) < y, ψa,b > a−k ψa,b db, g > | = sup | < y − 2 |a| g=1 A1 ≤|a|≤A2 |b|≤B da (k) ≤ sup | < y, ψa,b >< a−k ψa,b db, g > | 2 |a| g=1 A1 ≥|a| R da (k) + sup | < y, ψa,b >< a−k ψa,b db, g > | 2 |a| g=1 A2 ≤|a| R
112
Qu Han-zhang et al.
+ sup | g=1
A1 ≤|a|≤A2
da |a|2
(k)
|b|≥B
< y, ψa,b >< a−k ψa,b db, g > |
(8)
(according to f ormula (6) and the triangle inequation) According to the process of proving theorem 3.1 there is the following result. −k (k) < y, ψa,b >< a ψa,b , g > db = < y (k) , ψa,b >< ψa,b , g > db (9) R
R
Firstly we prove that the first expression in the end of formula (8) converges to zero as A1 → 0, A2 , B → ∞. da (k) sup | < y, ψa,b >< a−k ψa,b db, g > | 2 g=1 A1 ≥|a| |a| R da = sup | < y (k) , ψa,b >< ψa,b , g > db| 2 g=1 A1 ≥|a| |a| R (according f ormula(9)) ≤ sup
g=1
da |a|2
A1 ≥|a|
R
A1 ≥|a|
da |a|2
A1 ≥|a|
da |a|2
≤ sup | g=1
sup g=1
≤ sup | g=1
| < y (k) , ψa,b > || < ψa,b , g > |db
A1 ≥|a|
12
R
12
da |a|2
| < y (k) , ψa,b > |2 db
R
| < ψa,b , g > |2 db
12
R
| < y (k) , ψa,b > |2 db
(according to f ormula(2)) The integral converges to zero as A1 → 0 because its infinite integral converges. The second expression in formula(8) converges to zero as A2 → ∞. Its proving is analogous to the first. Finally we prove that the third expression in formula (8) converges to zero as B → ∞. Take da (k) < y, ψa,b >< aK ψa,b , g > db| M = sup | 2 g=1 A1 ≤|a|≤A2 |a| B≤|b| ≤ sup | g=1
A1 ≤|a|≤A2 and 1>|a|
da |a|2
B≤|b|
(k)
< y, ψa,b >< aK ψa,b , g > db|
An Application of Continuous Wavelet Transform in Differential Equations
+ sup | g=1
da |a|2
A1 ≤|a|≤A2 and 1≤|a|
113
(k)
B≤|b|
< y, ψa,b >< aK ψa,b , g > db|
= M1 + M2 Because for k = 0, 1, · · · , n, Supp(y (k) ) is compact, there is an N1 > 0 such that for k = 0, 1, · · · , n, Supp(y (k) ) ⊂ [−N1 , N1 . If |x| > n1 , y(x) = 0. If we take B > (L + N1 ), |x| ≤ N1 , and |a| < 1, then | x−b a | > L. That is, M1 = 0. da (k) M2 = sup | < y, ψa,b >< aK ψa,b , g > db| 2 g=1 A1 ≤|a|≤A2 and 1≤|a| |a| B≤|b| ≤ sup
g=1
da |a|2
A1 ≤|a|≤A2 and 1≤|a|
≤ sup
g=1
R
da |a|2
sup g=1
R
da |a|2
(k)
B≤|b|
| < y, ψa,b > || < ψa,b , g > |db
12
B≤|b|
| < y, ψa,b > |2 db
R
|<
(k) ψa,b , g
2
> | db
12
As B → ∞ the first expression in the above formula converges to zero because its infinite integral converges.The second is bounded. That is, the third expression in the formula (8) converges to zero as B → ∞. Formula is true for any (y, y , · · · , y (n) ) ∈ H1 . In order to provr that formula (7) is true for any (y, y , · · · , y (n) ) ∈ H 1 , we discuss the following bilinear forms. da (k) TA1 ,A2 ,B ((f, f , · · · , f (n) ), g) = < y, ψa,b >< aK ψa,b , g > db| 2 A1 ≤|a|≤A2 |a| B≤|b| T ((f, f , · · · , f (n) ), g) =
R
da |a|2
R
(k)
< y, ψa,b >< aK ψa,b , g > db|
(f, f , · · · , f (n) ) ∈ H1 , g ∈ L2 (R) According to the definition of the integral we have lim
A1 →0, A2 ,B→∞
TA1 ,A2 ,B ((f, f , · · · , f (n) ), g) = T ((f, f , · · · , f (n) ), g)
According to the properties of the bounded linear operators we can generalize TA1 ,A2 ,B , T from H1 × L2 (R) to H 1 × L2 (R). We fix g ∈ L2 (R). For any x ∈ H 1 , we have limA1 →0, A2 ,B→∞ TA1 ,A2 ,B (x, g) = T (x, g), that is, {TA1 ,A2 ,B (x, g); 0 < A1 < A2 < +∞, 0 < B < +∞} is bounded.
114
Qu Han-zhang et al. (n+1)
H 1 is a closed subspace of L2 (R) × · · · × L2 (R). According to the uniform bounded principle we have that { TA1 ,A2 ,B (g) ; 0 < A1 < A2 < +∞, 0 < B < +∞} is bounded. That is, there is a positive number sup{ TA1 ,A2 ,B (g) ; 0 < A1 < A2 < +∞, 0 < B < +∞} = K(g) such that for any 0 < A1 < A2 < +∞, 0 < B < +∞, TA1 ,A2 ,B (g) ≤ K(g). Because for any g ∈ L2 (R), { TA1 ,A2 ,B (g) ; 0 < A1 < A2 < +∞, 0 < B < +∞} is bounded and TA1 ,A2 ,B (g1 + g2 ) ≤ TA1 ,A2 ,B (g1 ) + TA1 ,A2 ,B (g2 ) ,
TA1 ,A2 ,B (αg) = |α| TA1 ,A2 ,B (g) , L2 (R) is a Hilbert space, according to the uniform bounded principle we have that { TA1 ,A2 ,B ; 0 < A1 < A2 < +∞, 0 < B < +∞} is bounded, that is, there is a positive number K such that sup { TA1 ,A2 ,B ; 0 < A1 < A2 < +∞, 0 < B < +∞} ≤ K. Secondly we prove that formula (7) is true for any (g0 , g1 , · · · , gn ) ∈ H 1 . For any . > 0, there is (y, y , · · · , Y (n) ) ∈ H1 such that n .
gl − y (l) ) < ( 8 l=0
Because for k = 0, 1, · · · , n,
y (k) − lim
A1 →0, A2 ,B→∞
A1 ≤|a|≤A2
da |a|2
(k)
B≤|b|
< y, ψa,b >< a−K ψa,b db = 0
we have δ1 > 0, N2 > 0, NB > 0 such that for k = 0, 1, · · · , n, if A1 < δ1 , A2 > N2 , B > NB , . da (k)
y (k) − < y, ψa,b >< a−K ψa,b db < 2 4 A1 ≤|a|≤A2 |a| B≤|b| If A1 < δ1 , A2 > N2 , B > NB , we have da (k)
g(k) − < g0 , ψa,b >< a−K ψa,b db
2 |a| A1 ≤|a|≤A2 B≤|b| da (k) = sup | < g(k) − < g0 , ψa,b >< a−K ψa,b db, g > | 2 g=1 A1 ≤|a|≤A2 |a| B≤|b| ≤ sup | < gk − y (k) , g > | + sup | < y g=1
(k)
−
A1 ≤|a|≤A2
+ sup g=1
A1 ≤|a|≤A2
< y (k) −
g=1
da |a|2
da |a|2
A1 ≤|a|≤A2
B≤|b|
(k)
< y, ψa,b >< a−K ψa,b db, g > | (k)
B≤|b|
< y − g0 , ψa,b >< a−K ψa,b db, g > |
≤ gk − y (k)
da (k) < y, ψa,b >< a−K ψa,b db
|a|2 B≤|b|
An Application of Continuous Wavelet Transform in Differential Equations
+K
n
y (l) − gl ≤
l=0
115
. . . + + <. 8 4 8
Formula (7) is true for any (g0 , g1 , · · · , gn ) ∈ H 1 . Because H ⊂ H 1 , for any (y, y , · · · , y (n) ) ∈ H,formula (7) is true. We complete the proof.
5
Example and Conclusion
We introduce the following example. Example 5.1 We consider the following differential equation. n
ai (x)y (i) = f (x)
(10)
i=0
{f (x), a0 (x), · · · , an (x)} ⊂ C[−π, π], y (i) (x); i = 0, 1, · · · , n ⊂ L2 (R)
(11)
Require the solution of differential equation (10) that satisfies formula (11). If x ∈ [−π, π], for any i = 0, 1, · · · , N , we define y (i) (x) = 0, ai (x) = 0, f (x) = 0. Then {f (x), a0 (x), · · · , an (x)} ⊂ L∞ (R), {f (x), a0 (x), · · · , an (x), y(x), · · · , y (n) (x)} ⊂ L2 (R). Take cosx x ∈ [−π, π] ψ(x) = 0 x ∈ [−π, π] 1 sin(η + 1)π sin(η − 1)π + ] ψ(η) = √ [ η+1 η−1 2π 0 < Cψ < +∞ According to the above results there is the following. (2π)−1
+∞
−∞
da |a|2
|a|π+x
−|a|π+x
n b−z b − x iπ −i dz] + ) db = f (x) y(z)cos ai (x)a cos( [ a a 2 −|a|π+b i=0
|a|π+b
(x ∈ [−π, π]) In order to solve equation (10) we only need to solve the above integral equation. We connect some differential equations with the integral equations by using the method of continuous wavelet transform. We obtain that formula (4) is equivalent to formula (4) not only in the weak topology but also in the strong toplogy.
116
Qu Han-zhang et al.
References 1. Ingrid Daubechies. Ten Lectures on Wavelets. Philadelphia. Pennsyvania: Society for Industrial and Applied Mathematics. 1992 2. Zheng Wei-xing, Wang Sheng-wang. Outline of real function and functional analysis. Academical Education Press, China. 1991 3. Song Guo-xiang. Numerical Analysis and Introduction to Wavelet. Science and Technology Press of Henan,China. 1993 4. Charles K.Chui. An Introduction to Wavelets. Academic Press. Inc. 1992
Stability of Biorthogonal Wavelet Bases in L2 (R) Paul F. Curran1 and Gary McDarby2 1
Department of Electronic and Electrical Engineering, University College Dublin Belfield, Dublin 4, Ireland
[email protected] 2 Medialab Europe, Crane St., Dublin 8, Ireland
[email protected]
Abstract. For stability of biorthogonal wavelet bases associated with finite filter banks, two related Lawton matrices must have a simple eigenvalue at one and all remaining eigenvalues of modulus less than one. If the filters are perturbed these eigenvalues must be re-calculated to determine the stability of the new bases – a numerically intensive task. We present a simpler stability criterion. Starting with stable biorthogonal wavelet bases we perturb the associated filters while ensuring that the new Lawton matrices continue to have an eigenvalue at one. We show that stability of the new biorthogonal wavelet bases first breaks down, not just when a second eigenvalue attains a modulus of one, but rather when this second eigenvalue actually equals one. Stability is therefore established by counting eigenvalues at one of finite matrices. The new criterion, in conjunction with the lifting scheme, provides an algorithm for the custom design of stable filter banks.
1
Introduction
In 1988 Daubechies [1] discovered a class of compactly supported orthonormal bases for L2 (R) which included the Haar basis as a special case. Mallat [2] established the relationship between wavelet transforms and multi-resolution analyses and showed that a discrete wavelet transform (relative to an orthonormal basis) can be implemented using orthogonal filter bank theory. Whereas orthogonality of the basis is a useful property in the analysis and synthesis of signals, it is not indispensable. In 1992 Cohen, Daubechies and Feauveau [3] introduced the idea of biorthogonal wavelet bases. In this case two distinct bases are employed, one for analysis and one for synthesis. The two bases are not necessarily orthogonal in their own right but are orthogonal to one another. Biorthogonal bases offer increased flexibility in the design of the associated filter bank enabling, for example, the construction of filter banks from linear phase filters. Cohen, Daubechies and Feauveau [3], Cohen and Daubechies [4] and Strang [5] provide necessary and sufficient conditions for a pair of dual filters to generate biorthogonal compactly supported wavelet bases in L2 (R). Sweldens [6] introduced the lifting scheme for designing biorthogonal filter banks. This scheme formally maintains biorthogonality but does not guarantee Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 117–128, 2001. c Springer-Verlag Berlin Heidelberg 2001
118
Paul F. Curran and Gary McDarby
that the filter bank has associated compactly supported wavelet bases in L2 (R). Whereas the lifting scheme in general contains many free parameters we reformulate it in terms of a single parameter. The principle contribution of the present work is the observation that for real, finite filters the single parameter dependent lifting scheme generates biorthogonal filter banks having associated wavelets in L2 (R) provided the parameter lies in an open interval containing zero. We present an algorithm for finding the largest interval of this kind. In conjunction with the lifting scheme, this algorithm provides a method for the custom design of biorthogonal filter banks with associated wavelet bases in L2 (R). While the method is cumbersome for large filters it has been found to be numerically tractable for filters having up to twenty taps. The resulting wavelet bases depend continuously upon the single parameter of the lifting scheme. In principle therefore it is possible, by employing a variety of different optimisation techniques, to select the value of this parameter such that the associated wavelet basis is optimal in some sense.
2
Lawton Matrices
Given h = [h−m , . . . , h−1 , h0 , h1 , . . . , hm ], a real filter of length (2m + 1), as usual we define the z-transform of the filter to be: H(z) =
m
hk z −k .
(1)
k=−m
We say that filter h is balanced if H(1) = 1. We define also an associated real sequence η as ηk = 2 q hq+k hq where the filter coefficients with indices outside range −m to m are defined to be zero. The real Lawton matrix [7] Λ associated with the filter is the (4m + 1) × (4m + 1) matrix: η2m 0 0 ··· 0 0 0 ··· 0 0 0 η2m−2 η2m−1 η2m · · · 0 0 0 · · · 0 0 0 .. .. .. .. .. .. .. .. .. . . . . . . . . . η2m η2m−1 η2m−2 · · · η1 η0 η1 · · · η2m−2 η2m−1 η2m . (2) 0 0 η2m · · · η3 η2 η1 · · · η2m−4 η2m−3 η2m−2 . .. .. .. .. .. .. .. .. .. . . . . . . . . 0 0 0 · · · 0 0 0 · · · η2m η2m−1 η2m−2 0 0 0 ··· 0 0 0 ··· 0 0 η2m We define a pair of dual real finite filters to be a set of two real balanced ˜ of length (2m + 1) and (2m filters (h, h) ˜ + 1) respectively such that
˜ eiθ H (eiθ ) + H ˜ ei(θ+π) H ei(θ+π) = 1 for all θ. H (3) Note that the overbar denotes complex conjugation. The following result is well known [4]:
Stability of Biorthogonal Wavelet Bases in L2 (R)
119
˜ generate biorthogStability Condition: A pair of dual real finite filters, (h, h), onal Riesz bases of compactly supported wavelets iff the Lawton matrix associated with each filter has a simple eigenvalue at one and all remaining eigenvalues have modulus less than one. ˜ generate biorthogonal Riesz Lemma 1. A pair of dual real finite filters, (h, h), bases of compactly supported wavelets only if the sum of the elements in every column of the Lawton matrix associated with each of the filters is one. We call this necessary condition on the Lawton matrix associated with a balanced real filter the column sum condition. It transpires that the column sum condition corresponds to a simple condition on the filter itself [8]: Lemma 2. The Lawton matrix associated with a real balanced filter h of length (2m + 1) satisfies the column sum condition iff H(−1) = 0.
3
Lawton Symmetry
Given a matrix A ∈ C N ×M with coefficients aij let matrix A ∈ C N ×M be defined by [A ]ij = a ¯N +1−i,M+1−j for all i, j. Subject to this definition a matrix A is said to be Lawton symmetric if A = A . It is not difficult to show that the Lawton matrix associated with a real filter of length (2m + 1) is real and Lawton symmetric. We observe also the following result: Lemma 3. A real, (2M + 1) × (2M + 1), following structure: A a L = bT c B a
Lawton symmetric matrix, L, has the B (b )T A
(4)
where A, B ∈ RM×M , a, b ∈ RM and c ∈ R. Employing this result and defining wT = [1, . . . , 1] ∈ R1×M , E ∈ RM×M such that Eij = δM+1−i,j (where δij denotes the Kronecker delta), we obtain the following: Lemma 4. The eigenvalues of a real, (2M + 1) × (2M + 1), Lawton symmetric matrix Λ satisfying the column sum condition may be classified as follows: 1. One of them equals 1. 2. A further M of them are the eigenvalues of the reduced order matrix (A−BE) with one of these being 12 . 3. The remaining M are eigenvalues of the reduced order matrix (A + BE − 2awT ).
120
Paul F. Curran and Gary McDarby
We call the eigenvalue at 1 the symmetric eigenvalue of type (1) and that at the skew-symmetric eigenvalue of type (1). The remaining M − 1 eigenvalues of the second class we call the skew-symmetric eigenvalues of type (2) and the eigenvalues of the third class we call the symmetric eigenvalues of type (2). The terminology is inspired by the readily established facts that symmetric eigenvalues have associated eigenvectors which are symmetric, i.e. have the following form: x1 x0 for some vector x1 ∈ C M×1 and x0 ∈ C (5) Ex1 1 2
and similarly that skew-symmetric eigenvalues have associated eigenvectors which are skew-symmetric, i.e. which have the following form: y1 0 for some vector y1 ∈ C M×1 . (6) −Ey1
4
Non-negativeness
Given any vector v ∈ R2M+1 , v will be said to be non-negative, denoted v ≥ 0, if:
Re V eiθ + Im V eiθ ≥ 0 ∀θ ∈ [0, 2π]
(7)
where V (z) denotes the z-transform of v as above. All subsequent references to non-negative vectors are understood to be in this sense. Kreˆın and Rutman [9] define a convex cone in a finite dimensional, real vector space to be a subset, C, of the vector space having the following properties: 1. 2. 3. 4.
If x ∈ C then αx ∈ C for all scalars α ≥ 0. If x, y ∈ C then x + y ∈ C. If x, y ∈ C then x + y = 0. C is closed relative to the standard Euclidean norm-topology on the vector space. Consider the set of all real, (2M + 1) × 1 non-negative vectors:
K = v ∈ R2M+1 |v ≥ 0 .
(8)
The set K has two properties that prove to be significant in the study of Lawton matrices: Lemma 5. K is a convex cone (in the sense of Kreˆın and Rutman). Lemma 6. K + (−K) = R2M+1 .
Stability of Biorthogonal Wavelet Bases in L2 (R)
121
The previous results permit a number of corollaries. Let L = {v ∈ R2M+1 |v = v }, i.e. L is the set of all real, Lawton symmetric, (2M + 1) × 1 vectors. Corollary 1. 1. 2. 3. 4.
L is a subspace of R2M+1 . A real Lawton symmetric matrix maps L into itself. K ∩ L is a convex cone in L. (K ∩ L) + (−K ∩ L) = L.
Let Z0 = v ∈ R2M+1 | [1, . . . , 1]v = 0, [M, . . . , 1, 0, −1, . . . , −M ]v = 0 .
Corollary 2. 1. Z0 is a subspace of R2M+1 . 2. A real Lawton symmetric matrix that satisfies the column sum condition maps Z0 into itself. 3. (K ∩ Z0 ) is a convex cone in Z0 . 4. (K ∩ Z0 ) + (−K ∩ Z0 ) = Z0 . Corollary 3. 1. Z0 ∩ L is a subspace of R2M+1 . 2. A real Lawton symmetric matrix that satisfies the column sum condition maps Z0 ∩ L into itself. 3. K ∩ Z0 ∩ L is a convex cone in Z0 ∩ L . 4. (K ∩ Z0 ∩ L) + (−K ∩ Z0 ∩ L) = Z0 ∩ L. One further property of non-negative, real vectors that will be required in our subsequent discussion of Lawton matrices may be stated as follows: Lemma 7. There exists no non-zero, real, skew-symmetric, non-negative vector in R2M+1 . Corresponding to the definition of non-negative, real vectors given above we now propose a definition of non-negative, real matrices. Given any matrix L ∈ R(2M+1)×(2M+1) we say that L is non-negative, denoted L ≥ 0, if Lv ≥ 0 for all v ≥ 0 in R2M+1 . A significant feature of Lawton matrices associated with real, finite filters is that they are non-negative. This observation is formally stated as follows: Lemma 8. The Lawton matrix, Λ, associated with a real filter of length (2m+1) is non-negative.
122
Paul F. Curran and Gary McDarby
In terms of the cones introduced previously lemma 8 asserts that a Lawton matrix associated with a real, finite filter of length (2m + 1) defines a linear operator on real vector space R4m+1 which maps the convex cone K (with M = 2m) into itself. There exist some elementary, but important, corollaries to this result: Corollary 4. By restriction, a Lawton matrix associated with a real, finite filter defines linear operators on real vector spaces L, Z0 , Z0 ∩L, which map the convex cones (K ∩L), (K ∩Z0 ), (K ∩Z0 ∩L) (with M = 2m) respectively into themselves.
5
Generalised Frobenius-Perron Theory
In their celebrated treatise, Kreˆın and Rutman [9] present a generalisation of the classical Frobenius-Perron theorem which we may paraphrase as follows: Theorem 1. Let C be a convex cone with non-null interior in a real, finitedimensional vector space; if a linear mapping Q maps C into itself and is not nilpotent, then there is a real, positive eigenvalue λC of Q with an associated eigenvector lying in C, having the property that no other eigenvalue of Q has modulus exceeding λC . By employing the results of section 4 together with theorem 1, we may make a number of assertions concerning Lawton matrices associated with real, finite filters. Lemma 9. Let Λ be a Lawton matrix associated with a real, finite, balanced filter which satisfies the column sum condition, then there exists a real, positive eigenvalue, L, of Λ such that: (i) all remaining eigenvalues of Λ have modulus less than or equal to L, (ii) there exists a real, non-negative eigenvector, v(L) , associated with L. Lemma 10. Let Λ be a Lawton matrix associated with a real, finite, balanced filter which satisfies the column sum condition, then there exists a real, positive symmetric eigenvalue, S, of Λ such that: (i) all remaining symmetric eigenvalues of Λ have modulus less than or equal to S, (ii) there exists a real, non-negative eigenvector, v(S) , associated with S. Lemma 11. Let Λ be a Lawton matrix associated with a real, finite, balanced filter which satisfies the column sum condition, then there exists a real, positive eigenvalue, ρ, of Λ which is either symmetric of type (2) or skew-symmetric of type (2) such that
Stability of Biorthogonal Wavelet Bases in L2 (R)
123
(i) all remaining symmetric and skew-symmetric eigenvalues of type (2) of Λ have modulus less than or equal to ρ, (ii) there exists a real, non-negative eigenvector, v(ρ) , associated with ρ. Lemma 12. Let Λ be a Lawton matrix associated with a real, finite, balanced filter which satisfies the column sum condition, then there exists a real, positive eigenvalue, σ, of Λ which is symmetric of type (2) such that: (i) all remaining symmetric eigenvalues of type (2) of Λ have modulus less than or equal to σ, (ii) there exists a non-negative eigenvector, v(σ) , associated with σ. Lemma 7 permits us to make a number of observations concerning the eigenvalues L, S, ρ and σ of lemmas 9-12. The proof of these observations is included to indicate the utility of lemma 7. Lemma 13. Eigenvalue L is symmetric and equals eigenvalue S. Eigenvalue ρ is symmetric of type (2) and equals eigenvalue σ. Proof. Lemma 9 assures that v(L) is real, non-zero and non-negative. By lemma 7 this vector cannot, therefore, be skew-symmetric. Hence eigenvalue L cannot be skew-symmetric and must, therefore, be symmetric. It is now trivial to show that L = S. Lemma 11 assures that v(ρ) is real, non-zero and non-negative. As above, lemma 7 asserts that this vector cannot be skew-symmetric and, therefore, that eigenvalue ρ cannot be skew-symmetric. Hence ρ must be symmetric of type (2) and it is now trivial to show that ρ = σ. Note that the eigenvalue σ, of lemma 12, is uniquely defined by the Lawton matrix (and hence by the real filter associated with it). We are finally in a position to state and prove the primary result of this investigation: Theorem 2. The Lawton matrix associated with a real, finite, balanced filter satisfying H(−1) = 0 has a simple eigenvalue at one and all remaining eigenvalues have modulus less than one iff the particular eigenvalue σ is less than 1. Proof. The conditions imposed imply that the associated Lawton matrix is real and satisfies the column sum condition. Hence, the division of eigenvalues into symmetric eigenvalues of types (1) and (2) and skew-symmetric eigenvalues of types (1) and (2) is valid. If σ is greater than 1 then the Lawton matrix has a real, symmetric eigenvalue of type (2) greater than 1. It follows that the Lawton matrix does not satisfy the eigenvalue condition stated in the theorem. If σ is equal to 1 then the Lawton matrix has a real, symmetric eigenvalue of type (2) equal to 1. Of course it also has a real, symmetric eigenvalue of type
124
Paul F. Curran and Gary McDarby
(1) equal to 1. Hence the matrix has an eigenvalue at 1 of algebraic multiplicity greater than or equal to 2. It follows the Lawton matrix does not satisfy the eigenvalue condition. If σ is less than 1 then, by lemma 12, all of the symmetric eigenvalues of type (2) of the Lawton matrix have modulus less than or equal to σ, i.e. less than 1. By lemma 13, ρ = σ , hence, by lemma 11, the skew-symmetric eigenvalues of type (2) of the associated Lawton matrix also have modulus less than 1. The skew-symmetric eigenvalue of type (1) equals 12 and clearly has modulus less than 1. Of course the symmetric eigenvalue of type (1) equals 1. Hence the Lawton matrix satisfies the eigenvalue condition. Note: the advantage of theorem 2 is that it permits us to test whether a Lawton matrix has a simple eigenvalue at one and all other eigenvalues of modulus less than one, not by checking all of the eigenvalues, but rather by testing a single eigenvalue σ which is known to be real, non-negative and symmetric of type (2). These known properties of σ significantly simplify the numerical task of finding this eigenvalue.
6
The Lifting Scheme
We outline a single parameter form of the lifting scheme as follows: ˜ i.e. Theorem 3. Take any initial set of real finite, balanced dual filters {h, h}, filters satisfying the biorthogonal constraint (3). Assume that these filters generate biorthogonal Riesz bases of compactly supported wavelets. Define companion filters g and g˜ as follows:
˜ ei(θ+π) , G ˜ eiθ = e−iθ H ei(θ+π) G eiθ = e−iθ H
(9)
then a new set of finite balanced filters {h, ˜hnew }, together with their companion filters {g, g˜new } , are generated as follows:
˜ new eiθ = H ˜ eiθ + τ G ˜ eiθ S (ei2θ ) H
Gnew eiθ = G eiθ − τ H eiθ S ei2θ
(10) iθ
where S e is a real trigonometric polynomial and τ is a real parameter. These new filters also satisfy the biorthogonal constraint, i.e. are dual. The question arises as to whether, for a given real trigonometric polynomial S and real parameter τ the dual filters {h, ˜hnew } generate biorthogonal Riesz bases of compactly supported wavelets. A simple necessary condition [8] is stated as follows: Lemma 14. The dual filters {h, ˜hnew } generate biorthogonal Riesz bases of compactly supported wavelets only if S(1) = 0.
Stability of Biorthogonal Wavelet Bases in L2 (R)
125
The principle contribution of the present work (theorem 2) leads directly to the following result: ˜ new } generate biorthogoTheorem 4. Assuming S(1) = 0 the dual filters {h, h nal Riesz bases of compactly supported wavelets for all real τ in an open interval containing 0. Moreover, this interval is characterised by the facts that it is maximal and that at any boundary points, but at no interior points, the Lawton matrix associated with ˜ hnew has a symmetric eigenvalue of type (2) equal to 1. By reference to lemmas 3 and 4 it is clear that the Lawton matrix associated ˜ new has a symmetric eigenvalue of type (2) equal to 1 iff det(I − (A + with h BE − 2awT )) = 0 where I is the identity matrix. It is elementary to show that the coefficients of the matrix (I − (A + BE − 2awT ) are quadratic polynomials in the variable τ . Consequently the evaluation of values of τ for which this determinant equals zero is a special case of the well-known quadratic eigenvalue problem [10]. By means of the standard method of linearisation [10] this problem may in general be converted to the problem of determining the eigenvalues of a matrix of twice the dimension. Specifically let (I − (A + BE − 2awT ) = I − C0 − τ C1 − τ 2 C2
(11)
for suitable constant, real matrices C0 , C1 , C2 . Then, assuming (I − C0 ) is non-singular, det(I − (A + BE − 2awT )) = 0 for non-zero parameter value τ iff (1/τ ) is an eigenvalue of the higher order matrix 0 I Q= . (12) C2 (I − C0 )−1 C1 (I − C0 )−1 Employing these observations yields a corollary to theorem 4 comprising a more readily tested stability condition. Corollary 5. If S(1) = 0 and if (I − C0 ) is non-singular, dual filters {h, ˜hnew } generate biorthogonal Riesz bases of compactly supported wavelets for all real τ in an open interval containing 0. Moreover, if they exist, the upper bound of this interval equals the reciprocal of the real, positive eigenvalue of Q of greatest modulus and the lower bound of this interval equals the reciprocal of the real, negative eigenvalue of Q of largest modulus. Although corollary 5 calls for inversion of matrix (I − C0 ) and determination of eigenvalues of the potentially large matrix Q, numerical implemenation is facilitated by two observations: (i) (I − C0 ) is in general highly structured so that its inversion requires relatively little numerical effort, (ii) one does not seek all eigenvalues of matrix Q, but rather the largest real positive and largest real negative eigenvalues only.
7
Example
˜ and their To initialise the lifting scheme select the Haar filters h = 0, 12 , 12 = h 1 1 ˜ satisfy companion filters g = 0, − 2 , 2 = g˜ . It is readily shown that filters h, h
126
Paul F. Curran and Gary McDarby
˜ are real, finite and balanced the biorthogonal constraint (3). Note that filters h, h ˜ and that H(−1) = H(−1) = 0. They comprise a dual real finite pair of filters. The Lawton matrix associated with both filters is: 00000 1 12 0 0 0 0 12 1 12 0 . (13) 0 0 0 12 1 00000 It satisfies the column sum condition and has eigenvalues 0, 0, 1, 12 , 12 . One eigenvalue is 1. It is simple and strictly exceeds all other eigenvalues in modulus. It follows from [4] that filters {h, ˜h} generate biorthogonal Riesz bases of compactly supported wavelets. We apply the single parameter form of the lifting scheme using the fixed real trigonometric polynomial:
S eiθ = −eiθ + e−iθ
(14)
which clearly satisfies S(1) = 0. The new filters become: 1 1 1 1 , g˜ = 0, 0, 0, − , , 0, 0 h = 0, 0, 0, , , 0, 0 2 2 2 2 τ 1 1 τ τ τ τ 1 1 τ τ τ new = 0, − , , , , , − = 0, , , − , , − , − , g . (15) 2 2 2 2 2 2 2 2 2 2 2 2
˜ new h
˜ new is, of course, Lawton symThe Lawton matrix associated with filter h metric. As this matrix is 13 × 13 we elect not to write it out in full. However, by comparing with the canonical structure of lemma 3 we can identify the submatrices:
0
−τ 2 0 A= 1+2τ 2 0 −τ
2
B=
0 τ2 2 2 −τ + τ2 2 1 2 +τ −τ 2 1 2 +τ −τ τ2 −τ + 2
0 0 0 0 τ2 2
0 0 −τ
0 0
0 0 0
−τ 2
τ2 2
τ2 2
2
0
0 0 0 2
−τ + τ2
0
2 1 2 +τ −τ 2 1 2 +τ −τ
2 −τ + τ2 2 1 1+2τ 2 +τ −τ 2
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 0
1+2τ 2
(−τ + τ22 ) −τ 2
τ2 2
0 0 0 0 0 0
0
0 0 0 0 0 0
, , a= −τ 2 0
, b=
c = 1 + 2τ 2 .
(16)
0 τ2 2
−τ
0 0 0 0
2 2
−τ + τ2
0
,
(17)
1 τ2 2 +τ − 2
(18)
Stability of Biorthogonal Wavelet Bases in L2 (R)
127
The symmetric eigenvalues of type (2) are the eigenvalues of the reduced order matrix (A + BE − 2awT ) = 0 0 0 0 0 0 τ2 −τ 2 0 0 0 0 2 τ2 0 −τ + τ22 −τ 2 0 0 2 . (19) τ2 1+2τ 2 12 +τ −τ 2 0 −τ + τ22 −τ 2 2 2 2 2 2 2 2 2τ 12 +τ −τ 1+4τ 21 +τ +τ 2τ −τ +3τ 2 −τ 2 −τ + τ2 0 12 +τ − τ22 1+τ 2 21 − τ22 The matrix has a symmetric eigenvalue of type (2) equal to 1 iff det(I − (A + BE − 2awT )) = 0 where I is the identity matrix. With reference to (11) we note that in the present case 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 . (20) I − C0 = 1 −1 − 2 0 1 0 0 1 1 0 − 2 −1 − 2 1 0 0 0 0 − 12 −1 12 Not only is this matrix non-singular, it is also lower triangular and therefore readily invertible. In all cases examined by the authors the matrix (I − C0 ) turns out, not only to be non-singular, but also to be sparse and readily invertible. It is therefore feasible to construct matrix Q of (12) and to determine its eigenvalues. In the present case, however, we do not actually need to employ linearisation. It is feasible to apply theorem 4 directly since det(I −(A+BE −2awT )) is readily shown to equal the polynomial in τ :
1 τ2 1 + τ 2 1 + 2τ − 8τ 2 (1 + τ ) (21) 1− 2 2 √ whose roots are: ± 2, ±i, −1, − 41 , 12 . The maximal real open interval containing 0 with boundary points, but no interior points, in this set is given by − 14 < τ < 12 . Hence, for any value of τ between − 14 and 12 the resulting filters {h, ˜ hnew } generate biorthogonal Riesz bases of compactly supported wavelets.
8
Conclusions
We have formulated a single parameter form of the lifting scheme. We have shown that the scheme generates biorthogonal filter banks having associated wavelets in L2 (R) provided the parameter lies in a certain open interval and have developed a method for finding the largest such interval. Numerically this method is equivalent to a special case of the quadratic eigenvalue problem. For low order filters the method of linearisation is in general appropriate. A single matrix inversion is required in the application of the linearisation method which reduces the problem to a standard eigenvalue problem (or rather to the problem
128
Paul F. Curran and Gary McDarby
of finding the largest and smallest non-zero, real eigenvalues of a matrix). It transpires that the matrix inversion often requires relatively little numerical effort. For high order filters more advanced techniques for solving the quadratic eigenvalue problem (e.g. the Jacobi-Davidson method) would be required. We note that the parameterised lifting scheme, in conjunction with this method, yields a class of biorthogonal filter banks with associated wavelet bases in L2 (R) and that this class is itself parameterised. Clearly one may employ a stochastic algorithm to determine the filter bank in this parameterised class which is optimal with respect to some desirable property (such as maximum energy compaction, desired shape, etc.).
References 1. I. Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets. Comm. Pure Applied Math. 41 (1988) 909–996 117 2. Mallat, S.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Transaction on Pattern Analysis and Machine Intelligence 11 (1989) 674–693 117 3. Cohen, A., Daubechies, I., Feauveau, J. C.: Bi-orthogonal Bases of Compactly Supported Wavelets. Comm. Pure Applied Math. 45 (1992) 485–560 117 4. Cohen, A., Daubechies, I.: A Stability-Criterion for Biorthogonal Wavelet Bases and their Related Subband Coding Scheme. Duke Mathematical Journal 86 (1992) 313–335 117, 118, 126 5. Strang, G.: Eigenvalues of (↓ 2)H and convergence of the cascade algorithm. IEEE transactions on signal processing 44 (1996) 233–238 117 6. Sweldens, W.: The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets. Appl. Comput. Harmon. Analysis 3 (1996) 186–200 117 7. Lawton, W. M.: Necessary and Sufficient Conditions for Constructing Orthonormal Wavelet Bases. Journal Math. Phys. 32 (1991) 57–61 118 8. McDarby, G., Curran, P., Heneghan, C., Celler, B.: Necessary Conditions on the Lifting Scheme for Existence of Wavelets in L2 (R). ICASSP, Istanbul, (2000) 119, 124 9. Kreˆın, M. G., Rutman, M. A.: Linear Operators Leaving Invariant a Cone in a Banach Space. Functional Analysis and Measure Theory. American Mathematical Society, Providence R. I., 10 Translation Series 1 (1962) 199–325 120, 122 10. Gohberg, I., Lancaster, P., Rodman, L.: Matrix Polynomials. Academic Press, New York, (1982) 125
Characterization of Dirac Edge with New Wavelet Transform Lihua Yang1 , Xinge You2 , Robert M. Haralick3, Ihsin T. Phillips4 , and Yuan Y. Tang2 1
Department of Mathematics, Zhongshan University Guangzhou 510275, P. R. China 2 Department of Computer Science, Hong Kong Baptist University Kowloon Tong, Hong Kong {yytang,xyou}@comp.hkbu.edu.hk 3 Department of Computer Science, Graduate Center, City University of New York 365 Fifth Ave., New York, NY 10016, USA
[email protected] 4 Department of Computer Science, Queens College, City University of New York 65-30 Kissena Blvd., Flushing, NY 11367 USA
[email protected]
Abstract. This paper aims at studying the characterization of Diracstructure edges with a novel wavelet transform, and selecting the suitable wavelet functions to detect them. Three significant characteristics of the local maximum modulus of the wavelet transform with respect to the Dirac-structure edges are presented. By utilizing a novel continuous wavelet, it is proven that the local maxima modulus of such continuous wavelet transform of a Dirac-structure edge forms two new curves which are located symmetrically at the two sides of the original one and have the same direction with it and the distance between the two curves is estimated. An algorithm to detect curves in an image by utilizing the above invariants is developed. Several experiments are conducted, and positive results are obtained.
1
Introduction
In our previous paper [7], we presented a novel method based on the quadratic spline wavelet, to identify different structures of edges, and thereafter, to extract the Dirac-structure ones. Furthermore, a very important characterization of the Dirac-structure edges by wavelet transform was provided. Three significant characteristics of the local maximum modulus of the wavelet transform with respect to the Dirac-structure edges were presented, namely: (1) slope invariant: the local maximum modulus of the wavelet transform of a Dirac-structure edge is independent on the slope of the edge. (2) grey-level invariant: the local maximum modulus of the wavelet transform with respect to a Dirac-structure edge takes place at the same points when the images with different grey-levels are to be processed. (3) width light-dependent: for various widths of the Dirac-structure Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 129–138, 2001. c Springer-Verlag Berlin Heidelberg 2001
130
Lihua Yang et al.
edge images, the location of maximum modulus of the wavelet transform varies slightly when the scale s of the wavelet transform is larger than the width d of the Dirac-structure edge images. Based on the characteristics, a novel algorithm to detect the Dirac-structure edges from an image has been developed. Some examples of applying this algorithm to detect the Dirac-structure edge can be found in [7]. However, there are some weaknesses in the method. This paper proposes a great improvement of that work. Noted the foregoing third property in [7], it says “width light-dependent”, does not say “width invariant”. This means that for various widths of the Diracstructure edge images, the location of maximum modulus of the wavelet transform may change. These changes are small. What we want is that the location of maximum modulus does not change, i.e. the location of maximum modulus has the property of width invariant. Let us look at Fig. 1. The first row of Fig. 1 has three circles. The left image is the original one which contains a circle with various width. The middle one is the location of maximum modulus of the wavelet transform with scale s = 6, which depends on the width of the circle in some way. Finally, by utilizing the algorithm proposed in [7], the central line of the circle is extracted and displayed on the right of Fig. 1. We can find that the central line of the circle is broken. The second row of Fig. 1 has trees, where the sizes of the branches vary, some are thick and some are thin. The left image is the original one, and the right of Fig. 1 is the central line extracted utilizing the algorithm proposed in [7]. It is easy to see that some branches of the tree are lost. To overcome such a defect, In this paper, a novel wavelet is utilized, so
Fig. 1. Left: the original images; Middle: the location of maximum modulus of the wavelet transform with s = 6; Right: the central line images extracted by the algorithm of [7]
that the above “width light-dependent” properties can be improved to “width invariant” without losing the “slope invariant” and “grey-level invariant”. Due to this improvement, the detection of curves is more accurate. This paper is organized as follows: Section 2 will be a brief review of the scale wavelet transform followed by the construction of a special wavelet. In Section 3, a characterization of the Dirac-structure edges in an image by wavelet transform
Characterization of Dirac Edge with New Wavelet Transform
131
will be developed. Then, In Section 4, an algorithm to extract the central line of a curve will be presented, and several experiments will be illustrated. At last, some conclusions will be provided in Section 5.
2
Continuous Wavelet Transform with New Wavelet Function
Let L2 (R2 ) be the Hilbert space of all the square-integrable 2-D functions on plane R2 , ψ ∈ L2 (R2 ) is called a wavelet function, if ψ(x, y)dxdy = 0, (1) R
R
For f ∈ L2 (R2 ) and scale s > 0, the scale wavelet transform of f (x, y) is defined by Ws f (x, y) := (f ∗ ψs )(x, y) 1 x−u y−v , )dudv, = f (u, v) 2 ψ( s s s R R
(2)
where * denotes the convolution operator, and ψs (u, v) := s12 ψ( us , vs ). Obviously, the scale wavelet transform described in Eq. (2) is a filter, and since ψ ∈ L2 (R2 ), −i(ξx+ηy) ˆ η) := its Fourier transform can be defined by ψ(ξ, dxdy R R ψ(x, y)e 2 2 ˆ which satisfies the condition of ψ ∈ L (R ). Thus, both functions ψ and ψˆ decrease at infinity. For a general theory of the scale wavelet transform, it can be found in [2,3]. However, the wavelet transform differs from Fourier transform. There is only one basic function in the latter, while there exist many different wavelet functions in the former. Therefore, it is very important to select the one that is as “good” as possible according to its particular applications. ˆ 0) = 0, implies that ψ(x, y) is a band-pass Theoretically, Eq. (1), i.e. ψ(0, filter, but a high-pass one because of the decrease of its Fourier transform at infinity. It is easy to see that the partial derivatives of a low-pass function can become the candidates of the wavelet functions. In this paper, we consider such ∂ ∂ θ(x, y) , ψ 2 (x, y) := ∂y θ(x, y) where θ(u, v) kind of wavelets, i.e., ψ 1 (x, y) := ∂x denotes a real function satisfying: 1)θ(u, v) fast decreases at infinity; 2)θ(u, v) is ˆ 0) = 1. an even function on both u and v; 3)θ(0, 1 For wavelet ψ (x, y) defined above, its scale wavelet transform is Ws1 f (x, y) = s
∂ (f ∗ θs )(x, y) ∂x
where θs (x, y) := s12 θ( xs , ys ). This formula is equivalent to the classical multi-scale edge detection [1,5], if θ(x, y) is set to be a Gaussian. A similar explanation for wavelet ψ 2 (x, y) defined
132
Lihua Yang et al.
above can be made. However, the partial derivative is along the vertical direction instead of the horizontal one. Gaussian function has been employing in image processing. It possesses some excellent properties, such as, the locality in both the time domain and frequency domain, the same widths in both the time-window and frequency-window, and so on. All these properties make it applied extensively and deeply in the area of the filtering, and it already almost becomes the best candidate of low-pass filter in practice. Unfortunately, Gaussian function is not always the best one for all applications. In fact, we have shown that it is not the best candidate for characterizing a Dirac-structure edge [7]. Even the quadratic spline wavelet is better than it, although, the quadratic spline wavelet is still not a perfect one for such applications. In [7] it has been proved that the location of maximum modulus of the wavelet transform with respect to a Dirac-structure edge is not width invariant. It still depends on the width of the edge even though it depends lightly. To avoid such dissatisfaction, a novel wavelet is constructed and used in this paper, and its definition is described below. Let √ √ 2 1+ 1−16x2 1 ψ (x) = − (−8x ln + 2x 1 − 16x2 ) 1 π 4x √ √ 2 3 ψ2 (x) = − π2 (8x ln 3+ 9−16x − 2x 9 − 16x2 ) 4x √ √ 2 ψ3 (x) = − π2 (−4x ln 1+ x1−x + x4 1 − x2 ) Then, the 1-D wavelet ψ(x) is an odd function defined on (0, ∞) by ψ1 (x) + ψ2 (x) + ψ3 (x) x ∈ (0, 14 ) ψ2 (x) + ψ3 (x) x ∈ [ 14 , 34 ) ψ(x) := (3) x ∈ [ 34 , 1) ψ3 (x) 0 x ∈ [1, ∞) x Let φ(x) := 0 ψ(x)dx. Then φ(x) is an even function, compactly supported on [-1, 1], and φ (x) = ψ(x). The smoothness function θ(x, y) is then defined by θ(x, y) := φ( x2 + y 2 ), and the 2-D wavelets are defined by ∂ θ(x, y) = φ ( x2 + y 2 ) √ 2x 2 ψ 1 (x, y) := ∂x x +y (4) ∂ ψ 2 (x, y) := ∂y θ(x, y) = φ ( x2 + y 2 ) √ 2y 2 . x +y
and are illustrated in Fig. 2. The gradient direction and the amplitude of the wavelet transform are denoted respectively by 1 Ws f (x, y) ∇Ws f (x, y) := , (5) Ws2 f (x, y) and |∇Ws f (x, y)| :=
|Ws1 f (x, y)|2 + |Ws2 f (x, y)|2 .
(6)
Characterization of Dirac Edge with New Wavelet Transform
133
0.6 0.4 0.6
0.2 0.4
0
0.2
−0.2
0
−0.4
−0.2 1
−0.4 0.5
1 0.5
1 0.5
0 0
−0.5
−0.5 −1
−1
0
−1 −0.5 −0.5
0 0.5 1
−1
Fig. 2. The graphical descriptions of 2-D wavelet functions: left - function ψ 1 (x, y);
right - function ψ 2 (x, y)
By locating the local maxima of |∇Ws f (x, y)|, we can detect the edges of the images.
3
Characterization of Curves through New Wavelet Transform
In this section, three significant characteristics of the local maximum modulus of the wavelet transform with respect to the Dirac-structure edges in images will be presented, namely: – Grey-level invariant: the local maximum modulus of the wavelet transform with respect to a Dirac-structure edge takes place at the same points when the images with different grey-levels are to be processed. – Slope invariant: the local maximum modulus of the wavelet transform of a Dirac-structure edge is independent on the slope of the edge. – Width invariant: for various widths of the Dirac-structure edges in an image, the location of maximum modulus of the wavelet transform does not vary under certain circumstance. The proof of the above characteristics may be obtained similarly to our previous work [7]. However, it concluded mathematically that the amplitude of of the wavelet transform |∇Ws f (x, y)| reaches the local maximum if and only if the scale s ≥ d. Namely, the local maxima of |∇Ws fld (xρ , yρ )| arrive at both sides of the central line l of ld and the distance from l is 2s , which is independent on the width d. In summary, The above three invariance properties can be rewritten as the following theorem: Theorem 1. Let ld be a Dirac-structure edge with width d and l be its central line. The local maxima modulus of the wavelet transform corresponding to the wavelets of Eq. (4) forms two new lines which are located symmetrically on both sides of the central line, and have the same direction with it. If scale s ≥ d, then the distance between the two new ones equals to s.
134
Lihua Yang et al.
200 100 200
0 0
100 0
10
60
0 50
20
10 40
20 30
30
20
0 10
30
40
20
40
30 50
10
40
50 0
50 60
60
60
Fig. 3. Modulus of wavelet transforms corresponding a segment of straight line and a curve
This theorem describes the property of width-invariance, which is important. It improves our former results in [7]. Namely, for each scale s, the local maximum moduli of the wavelet transforms with respect to the curves of different widths are located at the same positions. A couple of graphical examples are shown in Fig. 3.
4
Algorithm and Experiments
In this section, the algorithm for extracting the Dirac-structure edges will be presented. Several experiments will also be conducted. 4.1
Algorithm
In practice, the wavelet transform should be calculated discretely. We have the following formula: i Ws f (n, m) = f (u, v)ψsi (n − u, m − v)dudv =
f (k, l)
k,l
=
k,l
s,i where ψk,l =
k+1 l+1 k
l
k+1
k
l+1
l
ψsi (n − u, m − v)dudv
s,i f (n − k − 1, m − l − 1)ψk,l ,
ψsi (u, v)dudv =
(k+1)/s (l+1)/s k/s
l/s
(i = 1, 2), ψ i (u, v)dudv,
1, 2). Next, we give the calculating formulae of the coefficients can be found in [4,6,7]. It is deduced easily that s,1 s,2 ψk,l = ψk,l ,
s,1 s,1 ψ−k,l = −ψk−1,l
s,1 s,1 ψk,−l = ψk,l−1 ,
s,i {ψk,l }.
(i =
The details
s,1 s,1 ψ−k,−l = −ψk−1,l−1 .
Characterization of Dirac Edge with New Wavelet Transform
135
s,1 Through further calculating, we have ψk,l = φsl,k+1 + φsl+1,k − φsl+1,k+1 − φsl,k , for 1 all non-negative integers k and l, φsk,l = k √k2 +l2 sl − v 2 − (l/s)2 ψ(v)dv. s On the other hand, it is easy to see that φsk,l = 0 for all integers k, l satisfying k 2 + l2 ≥ s2 due to the compact support [−1, 1] of φ(x). we can calculate
Table 1. The nonzero coefficients {φsk,l } for s = 2 l\k k=0 k=1
l=0 0.2500 0.0497
l=1 0.1250 0.0111
all the coefficients φsk,l numerically for non-negative integers k, l. The possible nonzero items of φsk,l for s = 2, 4, 6, 8 are listed in Tables 1 - 4. Based on the
Table 2. The nonzero coefficients {φsk,l } for s = 4 k k k k
k\l =0 =1 =2 =3
l=0 0.2500 0.1468 0.0497 0.0047
l=1 0.2292 0.1206 0.0366 0.0026
l=2 0.1250 0.0552 0.0111 0.0002
l=3 0.0208 0.0060 0.0003 0
characterization of a straight line in an image developed in Section 3, an algorithm to detect straight lines in an image can be designed. The result is also valid for general curves since a short segment of a curve can be regarded as a straight line approximately. In fact, wavelet transforms are essentially local analysis. Therefore the result of Theorem 1 can be applied to the general curves in an image. Our algorithm to detect curves in an image is designed as follows.
Table 3. The nonzero coefficients {φsk,l } for s = 6 k\l k=0 k=1 k=2 k=3 k=4 k=5
l=0 0.2500 0.1831 0.1106 0.0497 0.0133 0.0011
l=1 0.2438 0.1718 0.1003 0.0436 0.0109 0.0008
l=2 0.2022 0.1333 0.0723 0.0281 0.0056 0.0002
l=3 0.1250 0.0767 0.0367 0.0111 0.0014 0.0000
l=4 0.0478 0.0257 0.0094 0.0017 0.0000 0
l=5 0.0062 0.0025 0.0005 0.0000 0 0
136
Lihua Yang et al.
Table 4. The nonzero coefficients {φsk,l } for s = 8 k\l k=0 k=1 k=2 k=3 k=4 k=5 k=6 k=7
l=0 0.2500 0.2006 0.1468 0.0935 0.0497 0.0199 0.0047 0.0004
l=1 0.2474 0.1950 0.1403 0.0882 0.0462 0.0180 0.0041 0.0003
l=2 0.2292 0.1741 0.1206 0.0733 0.0366 0.0132 0.0026 0.0001
l=3 0.1849 0.1358 0.0902 0.0517 0.0236 0.0072 0.0011 0.0000
l=4 0.1250 0.0884 0.0552 0.0287 0.0111 0.0026 0.0002 0
l=5 0.0651 0.0433 0.0244 0.0107 0.0032 0.0004 0.0000 0
l=6 0.0208 0.0126 0.0060 0.0020 0.0003 0.0000 0 0
l=7 0.0026 0.0013 0.0004 0.0000 0 0 0 0
Algorithm 1 Let f (x, y) be an image containing curves. For a scale s > 0, Step 1 Calculate all the wavelet transforms {Ws1 f (x, y), Ws2 f (x, y)} with respect to the wavelets defined by Eq.(4). Step 2 Calculate the local maxima flocmax of |∇Ws f (x, y)| and the gradient direction fgradient . Step 3 For each point (x, y) with local maximum, search the point whose distance along the gradient direction from (x, y) is s. If it is a point of local maxima, the center point is detected. Step 4 The curves formed by all the points detected in Step 3 are what we need. 4.2
Experiments
Let us turn back to Section 1, and look at Fig. 1. The particular task is that we are required to extract the central line of the circle with various widths. Unfortunately, as we have shown in Section 1, the algorithm based on the spline wavelet in [7] can not work well due to the width dependence of the detection. Fortunately, as described in detail in Section 3, the new method developed in this paper possesses the width invariant, grey-level invariant as well as slope invariant According to these properties, the central line of the circle and tree in Fig. 1 can be extracted. After applying Steps 1 and 2 of the above algorithm to the original image as displayed on the left column of Fig. 1, the local maximum modulus of the wavelet transform with respect to them can be computed and presented on the middle column in Fig. 1. At last, the central lines are extracted using Steps 3 and 4 of the above algorithm, and presented on the right column in Fig. 1. Next, some interesting examples are shown. In Fig. 5, the left image consist of a face with various widths. By carrying out the algorithm of this paper, the central line is extracted, which is shown graphically on the right in Fig. 5. For the Chinese character ”peace”, the original image, the maximum modulus image of the wavelet transform corresponding to s = 2 and the central line extracted by the proposed algorithm are shown respectively from the left to right in Fig. 6.
Characterization of Dirac Edge with New Wavelet Transform
137
Fig. 4. Left: the original image; Middle: the location of maximum modulus of the wavelet transform corresponding to s = 6; Right: the central line extracted by the algorithm in this paper
Fig. 5. Left: the original image; Middle: the maximum modulus image of the wavelet transform corresponding to s = 6; Right: the central line extracted by the proposed algorithm
Fig. 6. Left: the original image; Middle: the location of maximum modulus of the wavelet transform corresponding to s = 2; Right: the central line extracted by the proposed algorithm
138
5
Lihua Yang et al.
Conclusions
We have improved our previous work [7] in this paper. By utilizing a novel wavelet, we have shown three significant characteristics of the local maximum modulus of the wavelet transform with respect to the the Dirac-structure edges, namely: – Slope invariant; the local maximum modulus of the wavelet transform of a Dirac-structure edge is independent on the slope of the edge. – Grey-level invariant: the local maximum modulus of the wavelet transform with respect to a Dirac-structure edge takes place at the same points when the images with different grey-levels are to be processed. – Width invariant. for various widths of the Dirac-structure edge images, the location of maximum modulus of the wavelet transform does not vary when the scale s of the wavelet transform is not less than the width d of the curve. Based on the invariance of the wavelet transform, an algorithm to extract the Dirac-structure edge by wavelet transform has been developed. Then several experiments have been conducted, and positive results have been obtained in this paper.
References 1. J. Canny. “A Computational Approach to Edge Detection”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8:679–698, 1986. 131 2. C. K. Chui. An Introduction to Wavelets. Academic Press, Boston, 1992. 131 3. I. Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathemathics, Philadelphia, 1992. 131 4. S. Mallat and W. L. Hwang. “Singularity Detection and Processing with Wavelets”. IEEE Trans. Information Theory, 38(2):617–643, March 1992. 134 5. D. Marr and E. C. Hildreth. “Theory of Edge Detection”. In Proc. Roy. Soc., pages 187–217, London B 207, 1980. 131 6. Y. Y. Tang, Qi Sun, Lihua Yang, and Li Feng. “Two-Dimensional Overlap-Save Method in Handwriting Recognition”. In 6th International Workshop on Frontiers in Handwriting Recognition(IWFHR’98), pages 627–633, Taejon, Korea, August 12-14 1998. 134 7. Y. Y. Tang, Lihua Yang, and Jiming Liu. “Characterization of Dirac-Structure Edges with Wavelet Transform”. IEEE Trans. Systems, Man, Cybernetics (B), 30(1):93–109, 2000. 129, 130, 132, 133, 134, 136, 138
Wavelet Algorithm for the Numerical Solution of Plane Elasticity Problem Youjian Shen1 1
and Wei Lin2
Department of Mathematics, Zhongshan University and Hainan Normal University Haikou. 571158, P. R. China
[email protected] 2 Department of Mathematics, Zhongshan University Guangzhou, 510275, P. R. China
[email protected]
Abstract. In this paper, we apply Shannon wavelet and Galerkin method to deal with the numerical solution of the natural boundary integral equation of plane elasticity probem in the upper half-plane. The fast algorithm is given and only 3K entries need to be computed for one 4K × 4K stiffness matrix. Keyword: plane elasticity problem, natural Shannon wavelet, Galerkin-wavelet method.
1
integral
equation,
Introduction
The plane elasticity problem arises from the plane strain problem and the plane stress problem which are widely applied in engineering. For the plane elasticity problem in a disc we have obtained the fast algorithm for the numerical solution by the wavelet method [1]. Now we consider the problem in the upper half-plane which has been considersd by Yu in [2], but he did not give the algorithm of the numerical solution. In this paper, as in [1], to reduce the problem into the integral equation we use the natural boundary element method which first introduced by Kang Feng and Dehao Yu [3]. In the last decade, the natural boundary element method has been efficiently used to solve some elliptic problems [1,2,4]. One of the advantages of the natural boundary integral element method is that the energy functional of the original partial differential equation preserves unchanged which results in the unique existense and stability of the solution of the natural boundary integral equation. The natural boundary integral equation possesses the kernel with hypersingularity in Hadamard finite part sense. Nowadays many methods have been developed to deal with the hypersingular integrals [1,2,5]. In this paper, we utilize Galerkin-wavelet method and the Fourier Transform of the singular kernel in the distribution sense to tackle the difficulty of hypersingularity. It is a potential numerical technique for using wavelet to solve partial
Supported in part by NSF of Hainan normal university Supported in part by NSF of Guangdong
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 139–144, 2001. c Springer-Verlag Berlin Heidelberg 2001
140
Youjian Shen and Wei Lin
differential equations and integral equations in recent years([1],[4], [6]-[8]). The wavelet we use is Shannon wavelet that is a important wavelet in signal process which has excellent localization property in frequency [9]. We find that our Galerkin-wavelet method is very suitable to solve the natural boundary equation of the plane elasticity problem in the upper half-plane. As a result, the computational formulae of the stiffness matrices are simple and only 3K entries need to be computed for a 4K × 4Kstiffness matrix. So that our fast algorithm requires less computational cost and the solution error is small in practical computation. We organize this paper as follows. In Section 2, we introduce the Poisson integral formula and the natural integral equation of the plane elasticity in the upper half-plane. In Section 3, we use the Galerkin-wavelet method to solve the natural integral equation and give the computational formulae of the stiffness matrices, and in Section 4, we consider the convergence of the numerical solution. Lastly, the results of numerical experiments are presented in Section 5.
2
Plane Elasticity Problem
We consider the second boundary problem of the plane elasticity equation in the upper half-plane Lu = 0 in Ω := {(x, y)|y > 0} (2.1) βu = g on R. where u = (u1 , u2 )
∂ ∂ a ∂x 2 + b ∂y 2 Lu = 2 ∂ (a − b) ∂x∂y ∂ −b ∂y βu = ∂ (2b − a) ∂x 2
2
∂2 (a − b) ∂x∂y u1 . ∂2 ∂2 u 2 b ∂x2 + a ∂y2 ∂ −b ∂x u1 ∂ u2 −a ∂y y=0
with a = λ + 2µ, b = µ (λ, µ are Lam`e constants), and g = (g1 , g2 ) is a given vector function on R and satisfies the following compatible conditions: ∞ gi (x)dx = 0,
i = 1, 2.
(2.2)
−∞
Set ∂u ∂u u , ∈ L2 (Ω)} , W01 (Ω) = {u| 2 2 2 2 ∂x ∂y 1 + x + y ln(2 + x + y ) From Green formula, it is not difficult to show that for any u, v ∈ W01 (Ω)2 (v · Lu − u · Lv)dxdy = (v · βu − u · βv)dx (2.3) Ω
R
Wavelet Algorithm for the Numerical Solution of Plane Elasticity Problem
141
From this and the Green function of equation (2.1) ([2], Chapter IV) we can get the Poisson formula $u = P * u_0$, where $u_0 = u|_{y=0}$ and, for $y > 0$,
$$P = \begin{pmatrix} \dfrac{y}{\pi(x^2+y^2)} + \dfrac{(a-b)y(x^2-y^2)}{\pi(a+b)(x^2+y^2)^2} & \dfrac{2(a-b)xy^2}{\pi(a+b)(x^2+y^2)^2} \\[2mm] \dfrac{2(a-b)xy^2}{\pi(a+b)(x^2+y^2)^2} & \dfrac{y}{\pi(x^2+y^2)} - \dfrac{(a-b)y(x^2-y^2)}{\pi(a+b)(x^2+y^2)^2} \end{pmatrix}. \tag{2.4}$$
Substituting (2.4) into the boundary condition $\beta u = g$, we obtain the following natural boundary integral equation of problem (2.1):
$$K u_0 = g, \tag{2.5}$$
where
$$K u_0 = \begin{pmatrix} -\dfrac{2ab}{\pi(a+b)x^2} & -\dfrac{2b^2}{a+b}\,\delta'(x) \\[2mm] \dfrac{2b^2}{a+b}\,\delta'(x) & -\dfrac{2ab}{\pi(a+b)x^2} \end{pmatrix} * u_0,$$
and $\delta(x)$ is the Dirac function. It is obvious that the kernel of the natural integral operator $K$ possesses a second-order singularity. On the other hand, if the boundary load $g \in H^{-1/2}(R)^2$ satisfies the compatibility condition (2.2), then the natural boundary integral equation (2.5) has a unique solution in $H^{1/2}(R)^2$ [2]. Introduce the bilinear form
$$\hat{D}(u_0, v_0) = \int_{-\infty}^{\infty} v_0 \cdot K u_0 \, dx$$
and the linear functional
$$\hat{F}(v_0) = \int_{-\infty}^{\infty} g \cdot v_0 \, dx;$$
then the natural boundary integral equation (2.5) is equivalent to the following variational problem:
$$\text{find } u_0 \in H^{1/2}(R)^2 \ \text{ s.t. } \ \hat{D}(u_0, v_0) = \hat{F}(v_0), \quad \forall v_0 \in H^{1/2}(R)^2. \tag{2.6}$$
3
Galerkin-Wavelet Methods
Set
$$\hat{\phi}(\xi) = \chi_{[-\pi,\pi]}(\xi); \tag{3.1}$$
then
$$\phi(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{\phi}(\xi)\, e^{i\xi x}\, d\xi = \frac{\sin \pi x}{\pi x}. \tag{3.2}$$
This is well known as the scaling function of the Shannon wavelet. Now we use the Galerkin method to solve the variational problem (2.6). For $K \in N$, $j \in Z$, let
$$V_j^K = \mathrm{Span}\{\phi_{j,k}(x) \mid \phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k),\ k = -K, -K+1, \cdots, K-1\}.$$
Substituting $(V_j^K)^2$ for $H^{1/2}(R)^2$ in (2.6) leads to the following approximate variational problem:
$$\text{find } u_0^{j,K} \in (V_j^K)^2 \ \text{ s.t. } \ \hat{D}(u_0^{j,K}, v_0^{j,K}) = \hat{F}(v_0^{j,K}), \quad \forall v_0^{j,K} \in (V_j^K)^2. \tag{3.3}$$
We express $u_{01}^{j,K}, u_{02}^{j,K}$ as
$$u_{01}^{j,K} = \sum_{k=-K}^{K-1} \alpha_{j,k}^{1}\, \phi_{j,k}(x), \qquad u_{02}^{j,K} = \sum_{k=-K}^{K-1} \alpha_{j,k}^{2}\, \phi_{j,k}(x).$$
Selecting $v_0^{j,K} = (\phi_{j,m}(x), 0)$ and $v_0^{j,K} = (0, \phi_{j,m}(x))$ $(m = -K, -K+1, \cdots, K-1)$ respectively, we get the following linear algebraic system:
$$\begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix} \begin{pmatrix} \alpha^1 \\ \alpha^2 \end{pmatrix} = \begin{pmatrix} f^1 \\ f^2 \end{pmatrix}, \tag{3.4}$$
where
$$\alpha^i = (\alpha_{j,-K}^i, \alpha_{j,-K+1}^i, \cdots, \alpha_{j,K-1}^i)^T, \quad i = 1, 2,$$
$$Q_{ps} = (q_{mn}^{ps})_{m,n=-K,-K+1,\cdots,K-1}, \qquad q_{mn}^{ps} = \hat{D}\big((\delta_{1,s}, \delta_{2,s})\phi_{j,n}(x),\ (\delta_{1,p}, \delta_{2,p})\phi_{j,m}(x)\big), \quad p, s = 1, 2,$$
$$f^i = (b_{-K}^i, b_{-K+1}^i, \cdots, b_{K-1}^i)^T, \qquad b_m^i = \int_{-\infty}^{\infty} g(x) \cdot (\delta_{1,i}, \delta_{2,i})\, \phi_{j,m}(x)\, dx, \quad i = 1, 2,$$
and $\delta_{ij}$ is the Kronecker symbol.

Theorem 1. The entries of the stiffness matrix of the linear algebraic system can be expressed as
$$q_{mn}^{11} = q_{mn}^{22} = \begin{cases} \dfrac{2^j ab\pi}{a+b}, & r = 0, \\[2mm] \dfrac{2^{j+1} ab}{\pi(a+b)r^2}\big((-1)^r - 1\big), & r \neq 0, \end{cases} \tag{3.5}$$
$$q_{mn}^{21} = q_{mn}^{12} = \begin{cases} 0, & r = 0, \\[2mm] \dfrac{2^{j+1} b^2}{(a+b)r}(-1)^r, & r \neq 0, \end{cases} \tag{3.6}$$
where $r = m - n$. By Theorem 1, only 3K entries need to be computed for one 4K × 4K stiffness matrix.
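The closed-form entries (3.5)-(3.6) make the assembly of the linear system (3.4) straightforward. The following sketch is not part of the original paper (the function name and parameters are ours); it assembles the 4K × 4K stiffness matrix directly from these formulae. Since every entry depends only on $r = m - n$, one could equally well precompute the roughly 3K distinct values mentioned above.

```python
import numpy as np

def stiffness_matrix(j, K, lam=1.0, mu=0.5):
    """Assemble the 4K x 4K stiffness matrix of (3.4) from the entries (3.5)-(3.6).

    a = lambda + 2*mu, b = mu; every entry depends only on r = m - n."""
    a, b = lam + 2.0 * mu, mu
    idx = np.arange(-K, K)                     # m, n = -K, ..., K-1
    Q11 = np.empty((2 * K, 2 * K))
    Q12 = np.empty((2 * K, 2 * K))
    for p, m in enumerate(idx):
        for q, n in enumerate(idx):
            r = m - n
            if r == 0:
                Q11[p, q] = 2.0**j * a * b * np.pi / (a + b)
                Q12[p, q] = 0.0
            else:
                Q11[p, q] = 2.0**(j + 1) * a * b * ((-1)**r - 1) / (np.pi * (a + b) * r**2)
                Q12[p, q] = 2.0**(j + 1) * b**2 * (-1)**r / ((a + b) * r)
    # by (3.5)-(3.6): Q22 = Q11 and Q21 = Q12
    return np.block([[Q11, Q12], [Q12, Q11]])
```

With the right-hand side entries $b_m^i$ computed by numerical quadrature, the coefficient vectors $\alpha^1, \alpha^2$ in (3.4) follow from a standard linear solve.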
4
Convergence of Numerical Solution
For $j \in Z$, $K \in N$, we define $L_j^K : L^2(R) \to V_j^K$ as
$$L_j^K f = \sum_{k=-K}^{K-1} \langle f, \phi_{j,k} \rangle\, \phi_{j,k}.$$

Lemma 1. For all $f \in H^1(R)$, we have
$$\lim_{j\to\infty} \lim_{K\to\infty} \| L_j^K f - f \|_{H^1(R)} = 0. \tag{4.1}$$

Theorem 2. If $u_0 \in H^1(R)^2$, then
$$\lim_{j\to\infty} \lim_{K\to\infty} \| u_0 - u_0^{j,K} \|_{\hat{D}} = 0, \tag{4.2}$$
where $\|\cdot\|_{\hat{D}}$ is the energy norm induced by the bilinear form $\hat{D}(\cdot,\cdot)$, i.e. $\|\cdot\|_{\hat{D}} = \hat{D}(\cdot,\cdot)^{1/2}$.
5
Numerical Results
In this section, we present the numerical results of a test example to illustrate our algorithm for the natural boundary integral equation (2.5) discussed in Section 3.

Example. Consider the problem
$$K u_0 = \left( \frac{3x^3 - x}{10(x^2+1)^3},\ \frac{3x^2 - 1}{10(x^2+1)^3} \right) \quad \text{on } R.$$
Selecting the Lamé constants $\lambda = 1$, $\mu = 0.5$, the exact solution is
$$u_0 = \left( \frac{x(4x^2+1)}{30(x^2+1)^2},\ -\frac{x^2 - 5}{60(x^2+1)^2} \right).$$
Table 1. Numerical results (K = 2^j)

j                                             1             2             3
||u_0^{j,K} - u_0||_{L^2(R)}                  0.14351       0.14349       0.14345
max_{-5<=m<=5} |u_0^{j,K}(m) - u_0(m)|        1.28227e-17   2.666464e-17  2.63968e-17

j                                             4             5             6
||u_0^{j,K} - u_0||_{L^2(R)}                  0.14337       0.14318       0.14265
max_{-5<=m<=5} |u_0^{j,K}(m) - u_0(m)|        4.46704e-17   4.65589e-17   5.11821e-17
The computational results of the above example show that our algorithm provides high accuracy at low computational cost.
References

1. W. Lin, Y. J. Shen: Wavelet solutions to the natural integral equations of the plane elasticity problem. Proceedings of the Second ISAAC Congress, Vol. 2, 1471-1480, Kluwer Academic Publishers (2000).
2. Dehao Yu: Mathematical Theory of Natural Boundary Element Methods. Science Press (in Chinese), Beijing (1993).
3. K. Feng, D. Yu: Canonical integral equations of elliptic boundary value problems and their numerical solutions. Proc. of China-France Symp. on FEM, Science Press, Beijing (1983), 211-252.
4. Wensheng Chen, Wei Lin: Hadamard singular integral equations and its Hermite wavelet. Proc. of the Fifth International Colloquium on Finite or Infinite Dimensional Complex Analysis (Z. Li, S. Wu and L. Yang, Eds.), Beijing, China (1997), 13-22.
5. C.-Y. Hui, D. Shia: Evaluations of hypersingular integrals using Gaussian quadrature. Int. J. Numer. Meth. Engng. 44, 205-214 (1999).
6. R. P. Gilbert, Wei Lin: Wavelet solutions for time harmonic acoustic waves in a finite ocean. Journal of Computational Acoustics, Vol. 1, No. 1 (1993), 31-60.
7. C. A. Micchelli, Y. Xu, Y. Zhao: Wavelet Galerkin methods for second-kind integral equations. J. Comp. Appl. Math. 86 (1997), 251-270.
8. Tobias von Petersdorff, Christoph Schwab: Wavelet approximations for first kind boundary integral equations on polygons. Numer. Math. 74 (1996), 479-519.
9. I. Daubechies: Ten Lectures on Wavelets. Capital City Press, Montpelier, Vermont (1992).
Three Novel Models of Threshold Estimator for Wavelet Coefficients Song Guoxiang and Zhao Ruizhen School of Science, Xidian University Xi’an, 710071, P. R. China
Abstract. The soft-thresholding and the hard-thresholding method to estimate wavelet coefficients in wavelet threshold denoising are firstly discussed. To avoid the discontinuity in the hard-thresholding and biased estimation in the soft-thresholding, three novel models of threshold estimator are presented, which are polynomial interpolating thresholding method, compromising method of hard- and soft-thresholding and modulus square thresholding method respectively. They all overcome the disadvantages of the hard- and soft-thresholding method. Finally, an example is given and the experimental results show that the improved techniques presented in this paper are efficient.
1
Introduction
Wavelet theory has recently become a popular mathematical tool in many research fields. It throws a new light on such applications as image and signal processing. In this paper, we concentrate on the problem of signal denoising. Generally, there are three approaches used to distinguish noise from regular wavelet coefficients. The first is based on the principle of modulus maxima of the wavelet transform, presented by Mallat [1][2]. The second is grounded on the different correlation properties between the wavelet coefficients of the noise and those of the regular signal. The third is the wavelet thresholding technique presented by Donoho [3][4]. In the third case, the idea of the hard-thresholding or soft-thresholding method is to replace the small coefficients by zero and to keep or shrink the large coefficients. However, the Estimated Wavelet Coefficients (EWC) obtained by hard-thresholding are not continuous at the threshold, which may induce oscillation of the reconstructed signal. In the soft-thresholding case, the EWC are mathematically tractable due to their good continuity, but when the wavelet coefficients become larger there are deviations in the EWC, and this error inevitably carries over to the reconstructed signal. The methods discussed in this paper belong to the third case, namely, noise reduction based on thresholding. Combining the hard-thresholding and the soft-thresholding, three improved techniques are presented in this paper to avoid these disadvantages. They are the polynomial interpolating thresholding method, the compromising method of the hard- and soft-thresholding, and the modulus squared thresholding method, respectively. The wavelet coefficients estimated
through two of the methods given in this paper are continuous at the threshold and nearly unbiased when the original coefficients become larger. And all the three methods obtain good results. The correspondence is organized as follows. Section 2 briefly introduces some basic notations of wavelet transform and the hard-thresholding and the soft-thresholding method. And three novel models of threshold estimator for wavelet coefficients are presented in Section 3. Finally, Section 4 gives some experimental results, and a brief conclusion is stated in Section 5.
2
Wavelet Transform and the Thresholding Method
Suppose there is an observed signal
$$f(t) = s(t) + n(t), \tag{1}$$
where $s(t)$ is the original signal and $n(t)$ is Gaussian white noise with mean 0 and variance $\sigma^2$. If $f(t)$ is sampled as an $N$-point discrete signal $f(k)$, then the fast wavelet algorithm is
$$Sf(j+1, k) = Sf(j, k) * h(j, k), \tag{2}$$
$$Wf(j+1, k) = Sf(j, k) * g(j, k), \tag{3}$$
where $Sf(0, k)$ is the original signal $f(k)$, $Sf(j, k)$ are the approximation coefficients and $Wf(j, k)$ the wavelet coefficients; $h$ and $g$ are the low-pass and high-pass filters, respectively. For convenience, we abbreviate $Wf(j, k)$ to $w_{j,k}$. Accordingly, the wavelet reconstruction formula is
$$Sf(j-1, k) = Sf(j, k) * \bar{h}(j, k) + Wf(j, k) * \bar{g}(j, k). \tag{4}$$
Due to the linearity of the wavelet transform, the wavelet coefficients $w_{j,k}$ of the observed data $f(k) = s(k) + n(k)$ consist of two parts: one is $Ws(j,k)$ (abbreviated to $u_{j,k}$) corresponding to $s(k)$, and the other is $Wn(j,k)$ (abbreviated to $v_{j,k}$) corresponding to $n(k)$. The idea of wavelet threshold denoising is:
1. Get the wavelet coefficients $w_{j,k}$ from the noisy signal $f(k)$ by using (2) and (3);
2. Determine the estimated wavelet coefficients $\hat{w}_{j,k}$ from $w_{j,k}$ by a thresholding method such that $\|\hat{w}_{j,k} - u_{j,k}\|$ is as small as possible;
3. Reconstruct the denoised signal $\hat{f}(k)$ from $\hat{w}_{j,k}$ by (4).
Donoho has presented a very concise method to estimate the wavelet coefficients $w_{j,k}$. A proper threshold $\lambda$ is first chosen. Then the coefficients with absolute values smaller than $\lambda$ are replaced by zero, and those larger than $\lambda$ are kept in the hard-thresholding case and shrunk in the soft-thresholding case. The threshold of Donoho and Johnstone [4] is $\lambda = \sigma\sqrt{2\log N}$. Define
$$\hat{w}_{j,k} = \begin{cases} w_{j,k}, & |w_{j,k}| \ge \lambda, \\ 0, & |w_{j,k}| < \lambda. \end{cases} \tag{5}$$
It is called the hard-thresholding estimator. The soft-thresholding estimator is defined as
$$\hat{w}_{j,k} = \begin{cases} \mathrm{sign}(w_{j,k})(|w_{j,k}| - \lambda), & |w_{j,k}| \ge \lambda, \\ 0, & |w_{j,k}| < \lambda. \end{cases} \tag{6}$$
Although these methods are widely used in applications, they have some underlying disadvantages. For instance, the estimated wavelet coefficients $\hat{w}_{j,k}$ given by the hard-thresholding method are not continuous at the threshold $\lambda$, which may lead to oscillation of the reconstructed signal. In the soft-thresholding case, when $|w_{j,k}| > \lambda$ there are deviations between $\hat{w}_{j,k}$ and $w_{j,k}$, which directly influence the accuracy of the reconstructed signal. To overcome the above disadvantages of the hard-thresholding and the soft-thresholding method, we present some improved schemes in Section 3.
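For reference, the two classical estimators (5) and (6), together with the universal threshold of Donoho and Johnstone, can be written as the following minimal NumPy sketch (the function names are ours, not from the paper):

```python
import numpy as np

def universal_threshold(sigma, n):
    """Donoho-Johnstone threshold: lambda = sigma * sqrt(2 log N)."""
    return sigma * np.sqrt(2.0 * np.log(n))

def hard_threshold(w, lam):
    """Hard-thresholding estimator (5): keep coefficients with |w| >= lambda."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) >= lam, w, 0.0)

def soft_threshold(w, lam):
    """Soft-thresholding estimator (6): shrink surviving coefficients toward zero by lambda."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
```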
3 Three Novel Models of Threshold Estimators

3.1 The Polynomial Interpolating Thresholding Method
Because the hard-threshold estimator is not continuous and the soft-threshold estimator has some deviations, the applications of these methods are somewhat limited, so there is room to improve them. A natural approach is to design an estimator for which the estimated wavelet coefficients $\hat{w}_{j,k}$ are continuous at the threshold $\lambda$ and, as $|w_{j,k}|$ increases, little deviation exists in the EWC. For example, we can design an estimator such that for $|w_{j,k}| > t$ $(t > \lambda)$, $\hat{w}_{j,k}$ and $w_{j,k}$ are completely the same. Such an estimator can be realized through polynomial interpolation. The model is as follows:
$$\hat{w}_{j,k} = \begin{cases} w_{j,k}, & |w_{j,k}| > t, \\ \mathrm{sign}(w_{j,k})\, P(|w_{j,k}|), & \lambda \le |w_{j,k}| \le t, \\ 0, & |w_{j,k}| < \lambda, \end{cases} \tag{7}$$
where $P(|w_{j,k}|)$ is an interpolating polynomial. Generally, $P$ can be a quadratic or a cubic polynomial. The corresponding interpolating conditions are
$$P(\lambda) = 0, \quad P(t) = t, \quad P'(t) = 1 \qquad \text{and} \qquad P(\lambda) = 0, \quad P'(\lambda) = 0, \quad P(t) = t, \quad P'(t) = 1, \tag{8}$$
respectively. Very simple derivations lead to the quadratic polynomial
$$P(x) = -\frac{1}{(t-\lambda)^2}\big[\lambda x^2 - (\lambda^2 + t^2)x + \lambda t^2\big], \qquad \lambda \le x \le t, \tag{9}$$
and the cubic polynomial
$$P(x) = -\frac{1}{(t-\lambda)^3}\big[(t+\lambda)x^3 - 2(t^2 + t\lambda + \lambda^2)x^2 + \lambda(4t^2 + t\lambda + \lambda^2)x - 2t^2\lambda^2\big], \qquad \lambda \le x \le t. \tag{10}$$
The estimated wavelet coefficients $\hat{w}_{j,k}$ obtained from the above method are continuous everywhere. Moreover, if $P(x)$ is a cubic polynomial, then $\hat{w}_{j,k}$ is differentiable on the whole domain as well. For $|w_{j,k}| > t$, $\hat{w}_{j,k}$ is an unbiased estimate, which makes up for the shortcoming of the soft-thresholding.
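A sketch of the polynomial interpolating estimator (7) with the quadratic (9) and cubic (10) interpolants is given below; it is our illustration, not code from the paper, and assumes $t > \lambda > 0$.

```python
import numpy as np

def interp_threshold(w, lam, t, cubic=False):
    """Polynomial interpolating estimator (7): zero below lambda, identity above t,
    and the quadratic (9) or cubic (10) interpolant P on lambda <= |w| <= t."""
    w = np.asarray(w, dtype=float)
    x = np.abs(w)
    if cubic:
        p = -((t + lam) * x**3 - 2.0 * (t**2 + t * lam + lam**2) * x**2
              + lam * (4.0 * t**2 + t * lam + lam**2) * x - 2.0 * t**2 * lam**2) / (t - lam)**3
    else:
        p = -(lam * x**2 - (lam**2 + t**2) * x + lam * t**2) / (t - lam)**2
    out = np.where(x > t, w, np.sign(w) * p)
    return np.where(x < lam, 0.0, out)
```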
3.2 The Compromising Method of the Hard- and Soft-Thresholding
Define
$$\hat{w}_{j,k} = \begin{cases} \mathrm{sign}(w_{j,k})(|w_{j,k}| - \alpha\lambda), & |w_{j,k}| \ge \lambda, \\ 0, & |w_{j,k}| < \lambda, \end{cases} \qquad 0 \le \alpha \le 1. \tag{11}$$
This model of estimator for wavelet coefficients is called the compromising method of the hard- and soft-thresholding. In particular, (11) turns into the hard-thresholding (5) if $\alpha$ equals 0 and into the soft-thresholding (6) if $\alpha$ is 1. For $0 < \alpha < 1$, the values $\hat{w}_{j,k}$ given by (11) clearly lie between those given by (5) and (6), hence the name. This method is quite efficient in noise reduction although it is simple and straightforward. This is not surprising if we pay a little attention to the thresholding method itself. In the soft-thresholding case, the absolute value of the estimated coefficient $\hat{w}_{j,k}$ is always smaller than that of $w_{j,k}$ by $\lambda$ (when $|w_{j,k}| \ge \lambda$). Therefore, the deviation should be made as small as possible. However, a zero deviation (corresponding to the hard-thresholding) is not the best choice either, because $|w_{j,k}|$ is larger than $|u_{j,k}|$ in most cases, since $w_{j,k}$ consists of $u_{j,k}$ and $v_{j,k}$. Our aim is to find a proper $\hat{w}_{j,k}$ such that $\|\hat{w}_{j,k} - u_{j,k}\|$ is minimal. Therefore, the value of $|\hat{w}_{j,k}|$ should lie between $|w_{j,k}| - \lambda$ and $|w_{j,k}|$, which makes $\hat{w}_{j,k}$ closer to $u_{j,k}$. Based on this idea, we add a factor $\alpha$ to the soft-thresholding estimator (6) to improve its performance; $\alpha$ is any real number between 0 and 1, and an appropriate $\alpha$ may improve the denoising result. In this correspondence, we choose $\alpha = 0.5$.
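The compromise estimator (11) is a one-line modification of the soft-thresholding rule; a hedged sketch (naming ours):

```python
import numpy as np

def compromise_threshold(w, lam, alpha=0.5):
    """Compromise of hard and soft thresholding (11); alpha = 0 gives (5), alpha = 1 gives (6)."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) >= lam, np.sign(w) * (np.abs(w) - alpha * lam), 0.0)
```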
3.3 The Modulus Squared Thresholding Method
We first consider the case $w_{j,k} > 0$ and then generalize the result to $w_{j,k} < 0$. In the soft-thresholding method, (6) is equivalent to
$$\hat{w}_{j,k} = \begin{cases} \lambda(w_{j,k}/\lambda - 1), & w_{j,k}/\lambda \ge 1, \\ 0, & w_{j,k}/\lambda < 1, \end{cases} \tag{12}$$
when $w_{j,k} > 0$. If we regard $w_{j,k}/\lambda$ as a whole, then (12) means that when $w_{j,k}/\lambda \ge 1$, $w_{j,k}$ can be regarded as a coefficient of the signal and hence is kept; otherwise $w_{j,k}$ should be removed, since it is considered a coefficient of the noise. Although it is equivalent to (6), (12) is easier to extend. We can modify (12) into the following model:
$$\hat{w}_{j,k} = \begin{cases} \lambda\sqrt{(w_{j,k}/\lambda)^2 - 1}, & w_{j,k}/\lambda \ge 1, \\ 0, & w_{j,k}/\lambda < 1. \end{cases} \tag{13}$$
The difference between (13) and (12) is that in (13) $w_{j,k}/\lambda$ appears in its squared form. The advantage of this modification is that if $w_{j,k}/\lambda$ is above 1, its square becomes larger, and if it is below 1, its square becomes smaller. Such a procedure speeds up the separation of noise from signal. (13) holds only for $w_{j,k} > 0$; for the general case we have
$$\hat{w}_{j,k} = \begin{cases} \mathrm{sign}(w_{j,k})\sqrt{(w_{j,k})^2 - \lambda^2}, & |w_{j,k}| \ge \lambda, \\ 0, & |w_{j,k}| < \lambda. \end{cases} \tag{14}$$
It is easy to prove that when $|w_{j,k}| \ge \lambda$,
$$|w_{j,k}| - \lambda \le \sqrt{(w_{j,k})^2 - \lambda^2} \le |w_{j,k}| \tag{15}$$
holds. From (15) we know that the value of $\hat{w}_{j,k}$ estimated by (14) still lies between those given by (5) and (6). When $|w_{j,k}| \ge \lambda$, $\hat{w}_{j,k}$ is a nonlinear function, and $\hat{w}_{j,k}$ becomes closer and closer to $w_{j,k}$ as $|w_{j,k}|$ increases.
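The estimator (14) can be implemented as follows (our sketch; the inner maximum only guards against round-off when $|w_{j,k}|$ is very close to $\lambda$):

```python
import numpy as np

def modulus_square_threshold(w, lam):
    """Modulus squared estimator (14); by (15) it lies between the hard (5) and soft (6) estimates."""
    w = np.asarray(w, dtype=float)
    shrunk = np.sign(w) * np.sqrt(np.maximum(w**2 - lam**2, 0.0))
    return np.where(np.abs(w) >= lam, shrunk, 0.0)
```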
4
Experimental Results
A comparison is made in signal denoising with the threshold methods presented in this paper. Instead of the fixed threshold $\lambda = \sigma\sqrt{2\log N}$ presented by Donoho, we take different thresholds $\lambda_j = \sigma\sqrt{2\log N}/\log(j+1)$ at different scales. A noisy signal is processed by the above five methods. Before denoising, the Signal to Noise Ratio (SNR) is 8.226270. Table 1 compares the SNR and relative mean square error (RMSE) of the reconstructed signal for the above methods. From Table 1, we can see that the compromising method of the hard- and soft-thresholding and the modulus square thresholding method are clearly superior to the hard-thresholding and the soft-thresholding method. The polynomial interpolating thresholding method is only superior to the hard-thresholding method and is equivalent to the soft-thresholding method.

Table 1. Comparison of estimators for wavelet coefficients by SNR and RMSE

Estimator                        SNR         RMSE
Soft-thresholding                15.276322   0.172260
Hard-thresholding                14.331342   0.192058
Square Interpolating             15.152992   0.174723
Cubic Interpolating              15.288729   0.172014
Compromising of Hard and Soft    15.582417   0.166295
Modulus Squared Method           15.367344   0.170464

5 Conclusion
We point out that the thresholds $\lambda_j$ used in this paper are not optimal. If $\lambda_j$ are properly selected, the superiority of our methods will be more remarkable. In addition, for different $\lambda_j$ the experimental results may be slightly different. However, from a mass
of experiments the authors have made, the conclusion can be drawn that the pure hard- or soft-thresholding method has poor stability and depends strongly on $\lambda_j$; moreover, at least one of the two methods cannot reach a satisfactory result. By comparison, the modulus square thresholding method and the polynomial interpolating thresholding method are more stable and obtain nearly the same results as the better of the hard- and soft-thresholding. Finally, whatever $\lambda_j$ is, the compromising method of the hard- and soft-thresholding is clearly superior to the hard- or soft-thresholding method. In addition, we have only made some improvements on the thresholding method itself. There are other problems in wavelet threshold denoising, such as the selection of the threshold $\lambda$ and nonstationary noise, e.g., the Poisson noise case. Some achievements along these lines have been made in [5], [6], [7].
References 1. Mallat S. and Zhong S. Characterization of signals from multiscale edges. IEEE Trans. on PAMI, 1992, 14(7): 710-732 2. Mallat S. and Hwang W. L. Singularity detection and processing with wavelets. IEEE Trans. on IT, 1992, 38(2): 617-643 3. Donoho D. L. De-noising by soft-thresholding. IEEE Trans. on IT., 1995, 41(3):613627 4. Donoho D. L. and Johnstone I. M. Ideal spatial adaption via wavelet shrinkage. Biometrika, 1994, 81:425-455 5. Jansen M. and Bultheel A. Multiple wavelet threshold estimation by generalized cross validation for Images with correlated noise. IEEE Trans. on IP., 1999, 8(7):947-953 6. Nowak R. D. and Baraniuk R. G. Wavelet-domain filtering for photon imaging systems. IEEE Trans. on IP., 1999, 8(5):666-678 7. Ching P. C., So H. C. and Wu S. Q. On wavelet denoising and its applications to time delay estimation. IEEE Trans. on SP., 1999, 47(10):2879-288
The PSD of the Wavelet-Packet Modulation

Mingqi Li (1), Qicong Peng (2), and Shouming Zhong (1)

(1) Applied Mathematics Department, University of Electronic Science and Technology of China, Chengdu, 610054, P. R. China
[email protected]
(2) Institute of Communication & Information Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, P. R. China
Abstract. In the wavelet-packet modulation scheme, wavelet packets are used as carriers: the information to be transmitted is encoded, via an inverse wavelet packet transform, as the coefficients of the wavelet packets. The power spectral density (PSD) of the modulated signals describes the property of the modulation in the frequency domain, which has been discussed by many researchers by simulation. In this paper, the formula for the PSD of the modulated signals is derived. Characteristics of the modulated signals, such as spectrum spreading and spectral flatness, follow from the formula.
1
Introduction
The wavelet and wavelet-packet transforms, with their desirable characteristics such as localization in time and frequency and orthogonality across scale and translation, have brought out many useful properties. Wavelet-packet modulation is one of their important applications, where wavelet packets are used as the waveforms for information transmission. This new kind of modulation scheme generalizes the traditional baseband modulation scheme; in fact, both the rectangular and the sinc pulse are scaling functions, corresponding to the Haar and Meyer wavelets respectively. On the other hand, wavelet-packet modulation can be seen as a new kind of multiplexing because of its time and frequency overlapping, with TDM and FDM as special cases according to [1]. In the wavelet-packet modulation scheme, each user is assigned a set of waveforms in a group of wavelet packets. The information of each user is impressed on the corresponding waveform via the coefficients. At the receiver, the desired signal is recovered by cross-correlation with a known reference signal in the wavelet packet basis. Spread spectrum and spectral flatness are very important in a communication system, especially for channel fading and security. Simulations of wavelet-packet modulation, discussed in many references, show many advantages in communication, including covert and featureless waveforms (see [2],[3],[4],[5]). We find that knowing the PSD of the modulated signals is very important for its application. In this paper, the formula for the PSD of the wavelet-packet modulated
signals is derived (Theorem 1). As a special case of wavelet-packet modulation, the PSD of wavelet modulation (Theorem 2) is obtained as well. Now we briefly summarize the concepts of wavelets and wavelet packets. A multiresolution analysis consists of a collection of embedded subspaces in the space of finite-energy signals $L^2(R)$, that is,
$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots.$$
Each subspace $V_j$ has an orthonormal basis $\{\varphi_{j,k} : \varphi_{j,k} = 2^{j/2}\varphi(2^j t - k),\ k \in Z\}$. $\varphi(t)$ is called the scaling function, satisfying $\varphi(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} h[k]\varphi(2t - k)$. If the functions $\{\varphi_{j,k} : k \in Z\}$ form an orthonormal basis of the space $V_j$, the following orthonormality constraints on $h[n]$ must be satisfied:
$$\sum_{k=-\infty}^{\infty} h[k - 2n]\,h[k - 2m] = \delta_{m,n}, \qquad \sum_{k=-\infty}^{\infty} h[k] = \sqrt{2}. \tag{1}$$
Then we get a quadrature mirror filter (QMF) h[n], g[n] := (−1)n h[1 − n]. ∞ √ The function ψ(t), defined by ψ(t) = 2 g[k]ϕ(2t − k), is called a wavelet k=−∞
function induced by ϕ(t). We define the recursive function sequence {Pn (t)} as Coifman, p2n (t) =
∞ √ 2 h[k]pn (2t − k)
p2n+1 (t) =
∞ √ 2 g[k]pn (2t − k)
k=−∞
(2)
k=−∞
where p0 (t) = ϕ(t), p1 (t) = ψ(t). We need also the following notation: l
φl,m (t) := 2 2 pm (2l t) Ulm := Clos{2
N −l 2
2m
pm (2N −l t − k) : k ∈ Z, m ∈ Z+ }.
(3) (4)
2m+1 0 1 = Ul−1 Ul−1 , VN = UN and WN = UN . In order to Then we get 0 m decompose the subspace VN = UN , the set of subspace Ul may be organized 0 m as a binary tree, where UN is on the top and UN −l is on the (m + 1)th node l
Ulm
of level l. There are 2 nodes in the same level. We can grow or prune the tree in any desired fashion, and the different fashion provides a different set of basis functions.
2
The Modulation and Its PSD
Firstly, we consider a TDM system in which there are $K_{l,m}$ independent binary message signals interlaced with each other. Between two consecutive binary symbols of the same message, there are $K_{l,m} - 1$ other binary symbols: one from each
of the other message signals. The combined sequence forms a composite sequence of binary symbols σl,m [n], where σl,m [n] = ±1. The system we propose here seeks the representation of the binary symbols 1 and −1 by φl,m (t) and −φl,m (t), respectively. Then the modulated signal of the TDM sequence {σl,m [n]}, encoded by φl,m (t − 2l−N n), can be given by ∞
sl,m (t) =
σl,m [k]φl,m (t − 2l−N k).
(5)
k=−∞
Let the set M of (l, m) satisfy (l,m)∈M Ulm = VN . Since all the constituent terminal functions in a given tree structure M are orthogonal to each other, we may employ all of these functions to carry binary data from deferent TDM k Kl,m . Let σl,m [n] represent groups of users. So the total number of users is (l,m)∈M
the information sequence of the kth user while its assigned waveform is φl,m (t − m 2l−N (nKl,m + k)) in UN −l . We get the modulated signal s(t) satisfying
s(t) =
Kl,m
∞
(l,m)∈M k=1 n=−∞
k σl,m [n]φl,m (t − 2l−N (nKl,m + k))
(6)
We make the following reasonable assumptions to simplify the calculation of PSD: k [n]} of user (l, m) is stationary process; 1. Information sequence {σl,m k 2. Different user {σl,m [n]} with different (l, m) are statistically independent and k [n]) = 0. E(σl,m k We denote the correlation coefficient of {σl,m [n] : n ∈ Z} as k k k Rl,m [n] := E(σl,m [n]σl,m [n + h]).
(7)
Then we have ∗
E(s(t + τ )s (t)) = E((
Kl,m
∞
(l,m)∈M k=1 n=−∞
k)))(
k σl,m [n]φ∗l,m (t − 2l−N (nKl,m +
Ka,b
∞
k σa,b [n]φl,m (t + τ − 2a−N (dKa,b + k))))
(a,b)∈M c=1 d=−∞
so,we get E(s(t + τ )s∗ (t)) ==
Kl,m
∞
∞
(l,m)∈M k=1 n=−∞ h=−∞
k Rl,m [h]φ∗l,m (t − 2l−N (nKl,m +
k))φl,m (t + τ − 2l−N ((n + h)Kl,m + k))
Define function g(t): g(t) :=
K l,m
∞
∞
(l,m)∈M k=1 n=−∞ h=−∞
k Rl,m [h]φ∗l,m (t − 2l−N (nKl,m +
k))φl,m (t + τ − 2
l−N
(8)
((n + h)Kl,m + k)).
When Kl,m is a constant K indepent l, m, each of waveforms has the same number of users. Then g(t) is a periodic function. That is g(t) = g(t + K). We know from the assumptions above that stochastic process s(t) is generalized cyclostationary with period T = Kl,m = K. So the correlation function of s(t) can be defined 1 K R(τ ) := g(t)dt (9) K 0 Then the R(τ ) of s(t) is ∞ ∞ K−2l−N (nK+k) K 1 k R(τ ) = Rl,m [h] φ∗l,m (u)φl,m (u+ K l−N (nK+k) n=−∞ −2 (l,m)∈M k=1 h=−∞
=
1 K
τ − 2l−N hK)du ∞ K ∞ k Rl,m [h]2N −l φ∗l,m (u)φl,m (u + τ − 2l−N hK)du
−∞
(l,m)∈M k=1 h=−∞
So we get the PSD of s(t). That is 1 ˆ R(ω) = K =
1 K
∞ K
l−N k Rl,m [h]2N −l |φˆl,m (ω)|2 e−jωh2 K
(l,m)∈M k=1 h=−∞
K
|φˆl,m (ω)|2
(l,m)∈M k=1
∞
l−N
k Rl,m [h]e−jωh2
K
h=−∞
ˆ So R(ω) is arrived, 1 ˆ R(ω) = K
K
k ˆ l,m 2N −l |φˆl,m (ω)|2 R (2l−N ω)
(10)
(l,m)∈M k=1
∞ ˆ k (ω) := Rk [h]e−jωhK . where R l,m l,m h=−∞
Summarizing the description above, we get the following theorem: Theorem 1. If assumption (1) and (2) are satisfied and Kl,m is a constant K independent on l, m, the PSD of s(t) is 1 ˆ R(ω) = K ˆ k (ω) = where R l,m
∞ h=−∞
K
k ˆ l,m 2N −l |φˆl,m (ω)|2 R (2l−N ω)
(l,m)∈M k=1
k Rl,m [h]e−jωhK .
(11)
When we select the waveforms of nodes (l, 1), the wavelet-packet modulation turns into wavelet modulation. We can get easily a similar result. Now, we discuss a wavelet modulation with m0 users and the waveforms l
ψl,n (t) = 2 2 ψ(2l t − n), m1 ≤ l ≤ m0 + m1 , m1 ∈ Z. We get the modulated signal s(t). That is m 0 +m1
s(t) =
∞
σl [n]ψl,n (t)
(12)
l=m1 n=−∞
where σl [n] = ±1 is the nth data of lth user. We denote ˆ l (ω) := Rl (k) := E(sl (n + k)s∗l (n)), R
∞ k=−∞
Rl [k]e−jωK
R(t + τ, t) := E(s(t + τ )s∗ (t))
(13)
Theorem 2. If {σl [n] : n ∈ Z} is a stationary process and {σl [n] : n ∈ Z} are statistically independent with E(σl [n]) = 0, the PSD of s(t) is ˆ R(ω) =
m 0 +m1
ˆ −l ω)|2 R ˆ l (2−l ω) |ψ(2
(14)
l=m1
Proof. The correlation function of s(t) is 1 R(τ ) = R(t + τ, t)dt
(15)
0
Then we have R(τ ) =
m 0 +m1
∞
l=m1 k=−∞
Rl [k]2l
∞ −∞
ψ ∗ (u)ψ(u + 2l τ − k)du.
(16)
So, the PSD of s(t) is ˆ R(ω) =
m 0 +m1
∞
l=m1 k=−∞
ˆ −l ω)|2 e−jω2−l K = Rl [k]|ψ(2
m 0 +m1
ˆ −l ω)|2 R ˆ l (2−l ω). |ψ(2
l=m1
From the theorems above, we see clearly that wavelet-packet (wavelet) modulation spreads the spectrum of the original signals. We get a wider spectrum of the modulated signals with greater $N$ in Theorem 1 and $m_1$ in Theorem 2. Because wavelets, and especially wavelet packets, have excellent time-frequency localization properties, we can select $N$, $l$ and $m_1$ so that the spectrum of the modulated signals lies in the domain we need. That is very important for a communication system in a frequency-selective channel. We also find, from the formula, that the PSD of the modulated signals varies slowly with the frequency $f$. Furthermore, the PSD of the modulated signals becomes flatter with larger $N$ and $m_1$. The resulting featureless waveform is helpful for covertness in a communication system.
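As an illustration of the wavelet-modulation PSD formula of Theorem 2, the following sketch evaluates it numerically. It is not code from the paper: it additionally assumes the Haar wavelet (for which $|\hat{\psi}(\omega)|^2 = \sin^4(\omega/4)/(\omega/4)^2$) and i.i.d. $\pm 1$ symbols for every user, so that each spectral factor $\hat{R}_l$ is identically 1.

```python
import numpy as np

def haar_psi_hat_sq(omega):
    """|psi_hat(omega)|^2 for the Haar wavelet (psi = 1 on [0,1/2), -1 on [1/2,1))."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    x = omega / 4.0
    out = np.zeros_like(x)              # the limit at omega = 0 is 0
    nz = x != 0.0
    out[nz] = np.sin(x[nz]) ** 4 / x[nz] ** 2
    return out

def wavelet_modulation_psd(omega, m1, m0):
    """PSD of Theorem 2 under the assumptions above: sum over l of |psi_hat(2^-l omega)|^2."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    psd = np.zeros_like(omega)
    for l in range(m1, m0 + m1 + 1):
        psd += haar_psi_hat_sq(omega / 2.0 ** l)
    return psd

# example: psd = wavelet_modulation_psd(np.linspace(-50.0, 50.0, 2001), m1=0, m0=3)
```

Plotting such a curve makes the spreading and flattening effects of larger m0 and m1 visible directly from the formula.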
3
Conclusion
The PSD formula for the wavelet-packet and wavelet modulated signals has been derived in this paper. It gives the properties of the modulated signals in the frequency domain and will be helpful in applications, especially in the design of communication systems based on wavelet packets.
References 1. K. M. Wong, Jiangfeng Wu, et. al.: Performance of wavelet packet- division multiplexing in impulsive and gaussian noise, IEEE transactions on comm., Vol. 48, No. 7, pp.1083-1086, July. 2000. 151 2. A. R. Lindsey, J. C. Dill Proc.: A digital transceiver for wavelet-packet modulation, SPIE, Vol. 3391/255-264. 151 3. R. S. Orr, C. Pike, M. J. Lyall: Wavelet transform domain communication systems, Proc. SPIE, Vol. 2491/271-282. 151 4. Prashant P. Gandhi, Sathyanarayan S. Rao, et.al: Wavelets for Waveform Coding of Digital System, IEEE transactions on signal processing, Vol. 45, No. 9, pp.23872390, Sep. 1997. 151 5. R. E. Learned, et al: Wavelet-packet-based multiple access communication, Proc. SPIE, Vol. 2303/246-264. 151
Orthogonal Multiwavelets with Dilation Factor a Shouzhi Yang, Zhengxing Cheng, and Hongyong Wang Department of Mathematics, Xi’an Jiaotong University Xi’an, 710049, P.R.China
[email protected]
Abstract. There are perfect construction formulas for the orthonormal uniwavelet. However, it seems that there is not such a good formula with similar structure for multiwavelets. Especially, construction of multiwavelets with dilation factor a(a ≥ 2, a ∈ Z) lacks effective methods. In this paper, a procedure for constructing compactly supported orthonormal multiscale functions is first given, and then based on the constructed multiscale functions, we propose a method of constructing multiwavelets, which is similar to that of uniwavelet. Finally, we give a specific example illustrating how to use our method to construct multiwavelets.
1
Introduction
Since Geronimo, Hardin and Massopust [1] presented the first example of multiwavelets by using fractal interpolation functions, the study of multiwavelets has drawn many researchers' attention (e.g., see [2],[3] and [4]). Later, more examples were provided in [5] and [6]. As we know, Daubechies [7] obtained perfect construction formulas for the uniwavelet. Since a multiwavelet is a vector-valued function, the construction of multiwavelets is more difficult than that of the uniwavelet. Multiwavelets can simultaneously possess many desirable properties, such as continuity, compact and short support, orthonormality, interpolation, and, very importantly, symmetry or antisymmetry. For the uniwavelet, however, some of these properties are impossible or incompatible. In this respect, applications of multiwavelets are more extensive than those of the uniwavelet. Therefore, finding approaches for the construction of multiwavelets is very significant both in theory and in applications. Donovan, Geronimo, and Hardin [8] discussed the above problem by using fractal interpolation functions, but their construction procedure is very complicated. The main objective of this paper is to give a way of constructing compactly supported multiscale functions and the associated multiwavelets.
2
Multiresolution Analysis
Let $\Phi(x) = (\phi_1, \phi_2, \cdots, \phi_r)^T$, $\phi_1, \phi_2, \cdots, \phi_r \in L^2(R)$, satisfy the following two-scale matrix equation:
$$\Phi(x) = \sum_{k=0}^{M} P_k\, \Phi(ax - k). \tag{1}$$
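In (1), the $P_k$ are the $r \times r$ matrices of the two-scale matrix sequence introduced just below. As a numerical illustration (not part of the paper; the routine, its parameters, and the simple linear-interpolation resampling are our own choices), one can approximate $\Phi$ on its support $[0, M/(a-1)]$ (cf. Lemma 2) by iterating (1) as a vector cascade; whether the iteration converges depends on the spectral properties of $P(1)$ stated in Lemma 2.

```python
import numpy as np

def vector_cascade(P, a, n_iter=30, samples_per_unit=64):
    """Iterate Phi_{n+1}(x) = sum_k P_k Phi_n(a x - k) on a grid over [0, M/(a-1)].

    P is the two-scale matrix sequence P_0, ..., P_M (each an r x r array)."""
    P = [np.asarray(Pk, dtype=float) for Pk in P]
    M, r = len(P) - 1, P[0].shape[0]
    support = M / (a - 1.0)
    x = np.linspace(0.0, support, int(support * samples_per_unit) + 1)
    phi = np.ones((r, x.size))                      # crude initial guess
    for _ in range(n_iter):
        new = np.zeros_like(phi)
        for k, Pk in enumerate(P):
            y = a * x - k                           # arguments of Phi_n(a x - k)
            inside = (y >= 0.0) & (y <= support)
            vals = np.zeros((r, x.size))
            for i in range(r):
                vals[i, inside] = np.interp(y[inside], x, phi[i])
            new += Pk @ vals
        phi = new
    return x, phi
```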
where some r × r matrices {Pk } are called the two-scale matrix sequence. Φ(x) is termed multiscale functions with dilation a(a ≥ 2, a ∈ Z) and multiplicity r . Applying Fourier transformation to (1), we obtain M
ˆ ˆ w ), Φ(w) = P (z)Φ( a
P (z) =
1 iw Pk z k , z = e− a . a
(2)
k=0
P (z) is called the two-scale matrix symbol of matrix sequence {Pk } of Φ. Define subspace Vj = closL2 (R) φ:j,k : 1 ≤ ≤ r, k ∈ Z , j ∈ Z, here and j afterwards, for f ∈ L2 , we will use the notation f:j,k = a 2 f (aj x − k). As usual, Φ(x) in (1) generates a multiresolution analysis {Vj }j∈Z of L2 (R), if {Vj }j∈Z satisfy the nestedness, · · · ⊂ V0 ⊂ V1 ⊂ V2 · · · . Let Wj , j ∈ Z, denote the orthogonal complementary subspace of Vj in Vj+1 , and vectorvalued function Ψ(x) = (ψ1 , ψ2 , · · · , ψ(a−1)r )T , ψ ∈ L2 , = 1, 2, · · · , (a − 1)r, constitutes a Riesz basis for Wj , i.e., Wj = closL2 (R) ψ:j,k : 1 ≤ ≤ (a − 1)r, k ∈ Z , j ∈ Z. It is clear that ψ1 (x), ψ2 (x), · · · , ψ(a−1)r (x) are in W0 ⊂ V1 , Hence there exists a sequence of matrices {Qk }k∈Z such that Ψ(x) =
M
Qk Φ(ax − k).
(3)
k=0
From the two-scale relation (3), we obtain ˆ ˆ w ), Ψ(w) = Q(z)Φ( a
M 1 Q(z) = Qk z k . a
(4)
k=0
For column vector functions Λ and Γ with elements in L2 (R), define Λ, Γ = Λ(x)Γ (x)T dx. We call Φ(x) = (φ1 , φ2 , · · · , φr )T orthogonal multiscaling R function, if Φ(·), Φ(· − n) = δ I , n ∈ Z. Ψ(x) = (ψ , ψ , · · · , ψ )T will
0,n r
1
2
(a−1)r
be said to be orthogonal multiwavelets associated with multiscaling functions Φ, if Ψ(x) satisfy the following equations Φ(·), Ψ(· − n) = Ψ(·), Φ(· − n) = Or×(a−1)r and Ψ(·), Ψ(· − n) = δ0,n I(a−1)r , n ∈ Z, where Or×(a−1)r and I(a−1)r denote the zero matrix and unit matrix, respectively. Lemma 1 Let η = (η , η , · · · , η )T , where η , η , · · · , η ∈ L2 , then 1
2
r
1
2
r
{η (x − k) : 1 ≤ ≤ r, k ∈ Z} is a family of orthogonal functions if and only ηˆ(ω + 2kπ)ˆ η (ω + 2kπ)∗ = Ir , |z| = 1, here and throughout, the asterisk if k∈Z
denotes complex conjugation of transpose. Lemma 2 let Φ(x) be a multiscale function satisfying (1), P (z) be twoscale matrix symbol, then (i) Φ(x) is compactly supported, with supp Φ(x) ⊂ M [0, a−1 ]; (ii) P(1) has eigenvalue 1, and [P (1)]n converges as n → ∞; (iii) the ˆ is an eigenvector corresponding to the eigenvalue 1 of P(1); vector u = Φ(0) Similar to the case of a = 2(See [9]), (i) can be proved analogously. (ii) and (iii) also can be deduced by using the similar method in [2]
Lemma 3 Let Φ be a multiscale function satisfying (1). If both P0 , PM are M not nilpotent, then Supp Φ = [0, a−1 ]. The Lemma 3 can be proved by using the Similar method in [9].
3
Construction of Orthonormal Multiwavelets
Theorem 1 Let Φ(x) be the orthogonal multiscaling functions defined in (1), P (z) be the two -scale matrix symbol, ωj , j = 1, 2, · · · , a be a roots of M P (ωj z)P (ωj z)∗ = Ir , |z| = 1. i.e., equation z a − 1 = 0,then j=1
M
∗ Pi Pi+ak = aδk,0 Ir , |z| = 1.
(5)
i=0
Further, suppose Ψ = (ψ1 , ψ2 , · · · , ψ(a−1)r )T is an orthogonal multiwavelets associated with Φ , Q(z) is two-scale matrix symbol, then M
P (ωj z)Q(ωj z)∗ = O,
j=1
M
Q(ωj z)Q(ωj z)∗ = I(a−1)r .
(6)
j=1
Eqs. (6) are equivalent to the following Eqs.(7), respectively, M i=0
T = O, Pi Qi+ak
M
T = aδ I Qi Qi+ak 0,k (a−1)r .
(7)
i=0
By using Lemma 1, we can easily prove Theorem 1 Analogous to Hermite cardinal spline interpolation, Φ(x) = (φ1 , φ2 , · · · , φr )T with common support is said to be interpolatory, if it satisfies the following condition: (j−1) Φ(j−1) (k + k0 ) = φj (k0 )δk,0 ej , e1 = (1, 0, · · · , 0)T , · · · , er = (0, · · · , 0, 1)T (j−1) φj (k0 ) = 0 (8) Theorem 2 Let Φ(x) be a multiscale function with dilation a and multiM ]),then plicity r as in (1) and satisfy (8) for some positive integer k0 (1 ≤ k0 ≤ [ a−1 we have 1 1 (9) Pak+k0 = δk,0 Pk0 , Pk0 = diag(1, , · · · , r−1 ), k ∈ Z a a proof Taking j − 1 derivatives to (1) and applying the interpolation con1 δk,0 ej , 1 ≤ j ≤ r, which implies (9). dition (8), we have Pak+k0 ej = aj−1 Theorem 3 Let Φ(x) = (φ1 , φ2 , · · · , φr )T be a multiscale function with dilation a as in (1) , P (z) be two-scale matrix symbol, if suppφi = [hi , gi ], 1 ≤ i ≤ r, then
(i) φ2i−1 are symmetric and φ2i antisymmetric for all j in the following sense φi (x) = (−1)i−1 φi (hi + gi − x), 1 ≤ i ≤ r if and only if the entries Pi,j of the matrix P (z) satisfy Pi,j (z) = (−1)i+j z a(hi +gi )−(hj +gj ) Pi,j (z), 1 ≤ i, j ≤ r
(10)
(ii) φ1 , φ2 , · · · , φr1 are symmetric, the remainder φr1 +1 , · · · , φr are antisymmetric in the sense φi (x) = φi (hi + gi − x), i = 1, 2, · · · , r1 , and φi (x) = −φi (hi + gi − x), i = r1 , r1 + 1, · · · , r if and only if the entries Pi,j of the matrix P (z) satisfy a(h +g )−(h +g ) j j Pi,j (z), 1 ≤ i, j ≤ r1 or r1 + 1 ≤ i, j ≤ r z i i (11) Pi,j = −z a(hi +gi )−(hj +gj ) Pi,j (z), 1 ≤ i ≤ r1 and r1 + 1 ≤ j ≤ r or r1 + 1 ≤ i ≤ r and 1 ≤ j ≤ r1 (iii) If a(hi + gi ) − (hj + gj )(1 ≤ i, j ≤ r) strictly is less than zero or isn’t an integer, then Pi,j = 0 Proof If φ1 , φ2 , · · · , φr satify φi (x) = (−1)i−1 φi (hi + gi − x), 1 ≤ i ≤ r, let Sr = diag(1, −1, · · · , (−1)r ), then Φ(x) = (φ1 (x), φ2 (x), · · · , φr (x))T = Sr (φ1 (h1 + g1 − x), φ2 (h2 + g2 − x), · · · , φr (hr + gr − x))T , hence, ˆ ˆ Φ(ω) = Sr Dr (z a )Φ(ω) Dr (z) = diag(z h1 +g1 , z h2 +g2 , · · · , z hr +gr ) ˆ ω ) = Sr Dr (z a )P (z)Dr (z)Sr Φ( ˆ ω ). Since Successively using (2), we obtain P (z)Φ( a a {φ (x − k) : 1 ≤ ≤ r, k ∈ Z} is a Riesz basis of V0 , so P (z) = Sr Dr (z a )P (z) Dr (z)Sr . Or equivalently, Sr P (z)Sr = Dr (z a )P (z)Dr (z), which implies (10) holds. This completes the proof of Theorem 3 M ], then φ2i−1 Corollary 1 If suppφ1 = suppφ2 = · · · = suppφr = [0, a−1 are symmetric and φ2i antisymmetric for all j if and only if Pk = Sr PM−k Sr . M ], a(hi + gi ) − (hj + gj ) ≡ M , we obtain In fact, since suppφi = [0, a−1 M P (z) = z Sr P (z)Sr by (10). Hence, Corollary 1 holds. As we know, for a multiscale function Φ(x), if suppΦ(x) = [0, M ], then T T T suppΦ (x) = [0, M a ], where Φ (x) = [Φ (ax), Φ (ax − 1), · · · , Φ (ax − a + 1)]T . Hence, without loss of generality, we only investigate the construction of multiwavelets with a + 1-coefficient. i.e., ΦT (x) satisfies the following equation Φ(x) =
a
Pk Φ(ax − k)
(12)
k=0
In the applications of multiwavelets, certain special properties is desirable , such as interpolating and symmetry. In the two-scale matrix sequence {Pk }, associated with those multiwavelets with these properties , there must exists some Pi , 0 ≤ i ≤ a such that the matrix (aI − Pi PiT )−1 Pi PiT is a positive definite matrix
Lemma 4 Let Φ(x) be the orthogonal compactly supported multiscale function with dilation a and multiplicity r satisfing (12), Assume that there exists an Pi , 0 ≤ i ≤ a such that the matrix H defined in following equation is a positive definite matrix H 2 = (aIr − Pi PiT )−1 Pi PiT ,
(13)
Let Hs (s = 1, 2, · · · , a − 1) be (a − 1) essentialy different symmetric matrices (s) (s) satisfing (13), define qj = Hs Pj (j = i), and qj = −Hs−1 Pj (j = i), herej = 0, 1, · · · , a; s = 1, 2, · · · , a − 1. then P0 (qa(s) )T = O,
(14)
(s) (s) P0 (q0 )T + P1 (q1 )T + · · · + Pa (qa(s) )T = O
(15)
() (q0 )(qa(s) )T = O, , s = 1, 2, · · · , a − 1
(16)
(s) (s) (s) (s) (q0 )(q0 )T + (q1 )(q1 )T + · · · + (qa(s) )(qa(s) )T = aIr .
(17)
Proof For convenience, let i=1. (14) and (16) can be proved easily by using (6). For (15) and (17), we have from (6) that a =0
(s) P (q )T = P0 P0T Hs − P1 P1T (Hs−1 ) + · · · + Pa PaT Hs
= [P0 P0T + P2 P2T + · · · + Pa PaT ]Hs − P1 P1T (Hs )−1 = [aI − P P T ]H − P P T (H )−1 r
1 1
s
1 1
s
= [(aIr − P1 P1T )(Hs )2 − P1 P1T ](Hs )−1 = O a (s) (s) q (q )T = Hs P0 P0T Hs + (Hs )−1 P1 P1T (Hs )−1 + · · · + Hs Pa PaT Hs =0
= Hs [P0 P0T + P2 P2T + · · · + Pa PaT ]Hs + (Hs )−1 P1 P1T (Hs )−1 = H [aI − P P T ]H + (H )−1 P P T (H )−1 s
r
1 1
s
s
1 1
s
= (Hs ) [(Hs ) (aIr − P1 P1T )(Hs )2 − P1 P1T ](Hs )−1 = (Hs )−1 [(Hs )2 P1 P1T + P1 P1T ](Hs )−1 = (Hs )−1 [(Hs )2 + Ir ]P1 P1T (Hs )−1 = H [P P T + (H )−2 P P T ](H )−1 = H aI (H )−1 = aI −1
s
1 1
2
s
1 1
s
s
r
s
r
This completes the proof of Lemma 4. (s) In the setting of Lemma 4, we can generate a − 1 sequences {qk }, s = 1, 2, · · · , a − 1. We construct the following functions in terms of these sequences, ψs (x) =
a k=0
(s)
qk Φ(ax − k), s = 1, 2, · · · , a − 1.
(18)
Appling Schmidt orthonormalizing to a functions Φ(x), ψs (x), s = 1, 2, · · · , a − 1, and generating a functions Φ(x), Ψs (x), s = 1, 2, · · · , a − 1, we can (s) conclude that there must exist a − 1 sequences {Qk }, s = 1, 2, · · · , a − 1, such that a (s) Qk Φ(ax − k), s = 1, 2, · · · , a − 1. (19) Ψs (x) = k=0
Hence, we have the following theorem: Theorem 4 In the setting of Lemma 4, let Ψs (x), s = 1, 2, · · · , a − 1 be defined as in (19). Define Ψ(x) = [Ψ1 (x)T , Ψ2 (x)T , · · · , Ψa−1 (x)T ]T , then Ψ(x) is compactly supported orthogonal multiwavelets with dilation a associated with Φ(x) , and satisfies the following two-scale matrix equation Ψ(x) =
a k=0
(1) (2) (a−1) T T [(Qk )T , (Qk )T , · · · , (Qk ) ] Φ(ax − k)
(20)
Corollary 2 In the setting of Lemma 4, (i) If dilation factor a = 2, then ψ1 (x) defined in (18) is compactly supported orthogonal multiwavelets with dilation 2 associated with Φ(x); (ii) If dilation factor a = 3, and Ψs (x) = 3 (s) Qk Φ(ax − k), s = 1, 2 . Let Ψ(x) = [Ψ1 (x)T , Ψ2 (x)T ]T , then Ψ(x) is k=0
compactly supported orthogonal multiwavelets with dilation 3 associated with Φ(x), and satisfies (20) in which , (1) (1) Qk = qk 3 (21) (2) (1) (1) , k = 0, 1, 2, 3. 1 (2) Q(2) qh (Qh )T Qk ] k = 2 [qk − h=0
4
Example
We will illustrate by a specific example how to construct orthogonal multiwavelets based on our method. Example (Construction of orthogonal multiwavelets with dilation 3 and multiplicity 3) Let Φ(x) = (φ1 , φ2 , φ3 )T , satisfy Φ(x) = P0 Φ(3x) + P1 Φ(3x − 1) + P2Φ(3x − 2). By Lemma 2 ,suppΦ(x) ⊂ [0, 1]. Suppose both φ1 and φ3 are symmetric and φ2 is antisymmetric, Φ(x) satisfies the interpolatory condition (8) with k0 = 1, then in view of (9), taking i = 1 and using Theorem 4, we obtain √ 1 1 0 0 0 − 2 √ 2 √2 √ √ (1) (1) 182 26 q0 = − 5226 156 q1 = 0 − 326 0√ , 26 , √ 2 0 0 0 0 − 119 2 18 1 1 1 −1 0 0 2 √ 2 √2 √ √2 √ √ (1) (2) 182 182 26 q2 = 5226 156 , −√2626 , q0 = − 5226 156 26 √ 2 2 0 0 0 0 − 18 18
(2)
q1
√ 0 − 2 √ = 0 − 326 0 0
0 0√ ,
11 2 9
(2) q2 =
1 −1 √2 √ 2 26 182 52 156
0
0
0
√
26 −√ 26 . − 182
Finally, we obtain orthogonal multiwavelets by (20) and (21).
References 1. Geronimo,J., Hardin,D. P., Massopust,P.: Fractal Functions and Wavelet Expansions Based on Several Scaling Functions. J. Approx. Theory. 78(1998) 373-401 2. Chui,C. K., Lian,J.: A Study on Orthonormal Multiwavelets. J. Appl. Numer. Math., 20(1996) 273-298 3. Lian,J.: Orthogonal Criteria for Multiscaling Functions. Appl. Comp. Harm. Anal. 5(1998) 277-311 4. Hardin,D. P., Marasovich,J. A.: Biorthogonal Multiwavelets on [-1,1]. Appl. Comp. Harm. Anal. 7(1999) 34-53 5. Goh,S. S., Yap,V. B.: Matrix Extension and Biorthogonal Multiwavelets Construction. Linear Algebra and Applictions. 269(1998) 139-157 6. Marasovich,J.:Biorthogonal Multiwavelets, Dissertation, Vanderbilt University, Nashville, TN, (1996) 7. Daubechies,I.: Ten lectures on wavelets,SIAM, Philadelphia, PA, (1992) 8. Donovan,G. C., Geronimo,J., Hardin,D. P.: Construction of Orthogonal Wavelets Using Fractal Interpolution Functions. SIAM J. Math. Anal. 27(1996) 1158-1192 9. Wang So, Jianzhang Wang, Estimating the Support of a Scaling Vector. SIAM J. Matrix Anal. Appl. 1(1997) 66-73
A Wavelet-Based Image Indexing, Clustering, and Retrieval Technique Based on Edge Feature Masaaki Kubo1 , Zaher Aghbari1 , Kun Seok Oh2 , and Akifumi Makinouchi1 1
Graduate School of Information Science and Electrical Engineering, Department of Intelligent Systems, Kyushu University 6-10-1 Hakozaki, Higashi-ku, Fukuoka-shi 812-8581, Japan {kubo,zaher,akifumi}@db.is.kyushu-u.ac.jp 2 Division of Computer Engineering College of Engineering Chosun University 375 Susuk-dong Dong-gu Kwangju 501-759 Korea
[email protected]
Abstract. This paper proposes a technique for indexing, clustering and retrieving images based on their edge features. In this technique, images are decomposed into several frequency bands using the Haar wavelet transform. From the one-level decomposition sub-bands an edge image is formed. Next, the higher order auto-correlation function is applied on the edge image to extract the edge features. These higher order autocorrelation features are normalized to generate a compact feature vector, which is invariant to shift, image size and gray level. Then, these feature vectors are clustered by a self-organizing map (SOM) based on their edge feature similarity. The performed experiments show the high precision of this technique in clustering and retrieving images in a large image database environment.
1
Introduction
In the past decade, the number of digital images has increased tremendously due to the steady growth of computer power, the decline of storage cost, and the rapid increase in access to the Internet. Therefore, fast and effective methods to organize and search images in large image database environments are essential. In particular, images need to be effectively clustered, and then a fast content-based mechanism is required to retrieve desired images. Currently, two main indexing approaches exist: (1) indexing images based on features from raw image data [1][2], such as pixel intensity, histogram, etc.; (2) indexing images based on coefficients in the transform domain [3][4][5][6], such as the total energy of wavelet coefficients. These extracted features are represented by means of a feature vector, which is a compact representation of the image content. The feature vectors are then organized by a spatial access method (SAM), such as a B-tree, R-tree, etc., or by a clustering method, such as the self-organizing map (SOM) [7]. When a query Q, such as "Find images similar to Q", is issued, Q is compared with the database of feature vectors that represent the image database. As a result, the K images most similar to Q are returned to the user.
This paper presents a Haar wavelet-based technique that extracts edge features, by means of the higher order autocorrelation method, from an edge image generated from the one-level decomposition sub-bands of an image. Since images in large databases come in different sizes and gray levels, it is essential to adapt the extracted features to tolerate such differences, an important property lacking in previous work [3][4]. Thus, in this work, the extracted higher order autocorrelation features are normalized; as a result, they become invariant to shift, image size and gray level. The normalized features of an image are combined into a compact feature vector (25 feature values). Then, the feature vectors of all images are clustered by a SOM method. The system supports query-by-example access to the images. The rest of this paper is organized as follows: the related work is surveyed in Sect. 2. In Sect. 3, we present the system architecture. The indexing and clustering technique of our system is discussed in Sect. 4. Then, the querying method and experimental results are discussed in Sect. 5. Finally, we conclude the paper in Sect. 6.
2
Related Work
An example of indexing based on raw image data is the (QBIC) system [1] of IBM that indexes images on multiple features, such as color histograms, texture, shapes, etc. Although such multiple features provides an effective representation of an image, they are computationally expensive during both the index computation phase and the query processing phase. Another example of indexing based on raw image data is the VisualSEEK system [2] which indexes each image in the database by its salient color regions. For indexing based on the transformed-domain coefficients, Wang et al. [5] have proposed a wavelet-based image indexing and searching (WBIIS) algorithm. In the WBIIS project, Daubechies’ wavelet transform are employed to produce color feature vectors that provide better frequency localization than other traditional color layout coding algorithms, as argued by the authors. Another example of indexing based on the transformed-domain is proposed by Jacobs et al. [8] in which an image searching algorithm that makes use of the multiresolution Haar wavelet decompositions of images is presented. In large image databases, it is essential to organize and/or classify feature vectors into different clusters to speed up the search. This organization and/or clustering of images is based on the similarity of feature vectors of images. Here we introduce some examples that utilize such algorithms. Albuz et al. [3] have proposed an algorithm to cluster the feature vectors, which represent images, in a modified k order B-tree data structure, where k is the maximum number of clusters. This approach have utilized the multiresolution property of the wavelet transform to compute the feature vectors. The problem with this approach is that the number of clusters have to be decided by the user before inserting keys into the B-tree. Oja et al. [9] have introduced the PicSOM system to cluster images based on a Tree Structured Self-Organizing Maps (TS-SOMs). The TS-
SOM is a tree-structured vector quantization algorithm that uses SOMs [7] at each of its hierarchical levels. However, since the SOM algorithm is not scalable to new classes, if a new class of images is to be inserted into the database, the TS-SOM, which is a hierarchy of SOMs, has to undergo a computationally expensive process of retraining at each level of the hierarchy.
3
System Architecture
The basic architecture of our system is shown in Fig. 1. The solid arrows show the sequence of processes of indexing and clustering images and the dashed arrows follow the sequence of processes of querying. As shown in Fig. 1, both the images to be indexed and the query image go through the same sequence of processes. However, in case of indexing and clustering, after the SOM-Based Clustering process the feature vector of an image is added to the corresponding cluster in the database. In case of querying, the images associated with the best matching node (cluster) to the query image are returned to the user. A detailed discussion of these processes is in Sect. 4.
4
Indexing and Clustering
As shown in Fig. 1, the system applies several processes on an image to index and cluster it. In this Sect., we discuss these processes. 4.1
Haar Wavelet Transform
The wavelet transform describes the image in terms of a coarse overall shape, plus some details that range from broad to narrow. The Haar wavelet transform
(Fig. 1 block diagram: images to be indexed and the query image pass through the same pipeline of Haar Wavelet Transform, Edge Image Construction, Feature Vector Generation, Feature Vector Normalization, and SOM-Based Clustering, against a database of clustered feature vectors; a list of matching images is returned for the query.)
Fig. 1. Basic architecture of a system: The solid arrows show the path of image indexing and clustering. The dashed arrows show the path of image querying
(c)
Fig. 2. Wavelet multiresolution property of an image: (a) represents original image, (b) a one-level decomposition produces 4 sub-bands, namely LL, LH, HL and HH, (c) a four-level decomposition produces 13 sub-bands
is applied iteratively on an image to generate multi-level decomposition (see Fig. 2). At level l decomposition, 3l + 1 sub-bands are produced. In a large image database environment, it is essential to represent images by a method that supports the following requirements on a feature vector: (1) Compact, (2) Fast to compute, and (3) Supports similarity retrieval. Therefore, we are using a Haar wavelet transform to decompose images into several frequency bands and then compute a feature vector from these bands. The above requirements are satisfied as follows: 1. Compact: By making use of the wavelet multiresolution property, we can decompose an image and then use only a few coefficients to represent the image content sufficiently. As shown in Fig. 3, the Haar wavelet transform decomposed the original image (see Fig. 3.a) into four sub-bands: LL, LH, HL and HH (see Fig. 3.b). The Haar wavelet coefficients, Haar basis and coefficient details, are computed by Equations 1 and 2, respectively.
Fig. 3. An Example of wavelet decomposition: (a) original image, (b) one-level decomposition
168
Masaaki Kubo et al.
1 F0 (x(n)) = √ (x(n) + x(n + 1)) 2
(1)
1 F1 (x(n)) = √ (x(n) − x(n + 1)) 2
(2)
Where, x(n) and x(n + 1) are the current and next values of an image, respectively. The LH and HL sub-bands are used to generate an edge image (see Subsection 4.2). Then, we use the higher order autocorrelation function to extract the edge features from the edge image (see Subsection 4.3). Only a few (25 coefficients) of the extracted higher order autocorrelation coefficients are used to produce a feature vector. Thus, the feature vector of an image is compact. 2. Fast to Compute: The Haar wavelet basis is the simplest wavelet basis, in terms of implementation, and the fastest to compute [6][8]. From Equation 1, we notice that the Haar wavelet transform is mathematically equivalent to the averaging of color blocks [5]. Because Haar wavelets are fast to compute, they become a key to several applications such as data compression, data transmission, denoising, and edge detection. However, one drawback of Haar basis for lossy compression is that it tends to produce blocky image artifacts for high compression rates [8]. However, in our application, the result of compression is never viewed; therefore, these artifacts do not affect our indexing and querying processes. 3. Supports Similarity Retrieval: Similarity retrieval is preferred in image databases because users can simply select an image that is similar to the wanted image and then issue a query ’Find images that are similar to this query image’. Or, a user can simply make a rough sketch, such as the dominant edges, of a wanted image and issue a query ’Find images that are similar to this sketch’. To achieve this goal, we use only 25 normalized higher order autocorrelation coefficients to represent the image. These 25 coefficients sufficiently approximate the image and provide some margin for similarity retrieval. 4.2
Edge Image Construction
From a signal processing point of view, the wavelet transform is basically a convolution operation, which is equivalent to passing an image through low-pass and high-pass filters. Let the original image be I(w, h); then the LH sub-band represents the vertical edges and the HL sub-band the horizontal edges of I(w, h). Using these properties of the LH and HL sub-bands, we construct an edge image. If an element of the LH sub-band is $v_{m,n}$ and the corresponding element of the HL sub-band is $h_{m,n}$, then the corresponding element $e_{m,n}$ of the edge image is given by Equation 3, where w and h are the width and height, respectively, of the LH and HL sub-bands:
$$e_{m,n} = \sqrt{(v_{m,n})^2 + (h_{m,n})^2}. \tag{3}$$
Here 1 ≤ m ≤ w and 1 ≤ n ≤ h. Fig. 4 shows the edge image constructed from the LH and HL sub-bands of Fig. 3.b using Equation 3. In our system, we use the LH and HL sub-bands of the one-level decomposition because they carry more detailed information about the dominant edges of the original image.
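The following sketch (ours, not the authors' code) performs the one-level Haar decomposition by applying the pair F0, F1 of Equations 1 and 2 along rows and then columns, and builds the edge image of Equation 3. It assumes a grayscale array with even height and width, and the labelling of the two detail bands as LH/HL follows one common convention that may be swapped relative to the paper's figures.

```python
import numpy as np

def haar_level1(img):
    """One-level 2-D Haar decomposition; returns the LL, LH, HL, HH sub-bands."""
    img = np.asarray(img, dtype=float)                   # height and width assumed even
    lo = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2.0)    # F0 along rows
    hi = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2.0)    # F1 along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)      # F0 along columns
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)
    return ll, lh, hl, hh

def edge_image(lh, hl):
    """Edge image of Equation 3: e = sqrt(v^2 + h^2), element by element."""
    return np.sqrt(lh ** 2 + hl ** 2)
```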
Fig. 4. Constructed edge image from the LH and HL sub-bands of Fig. 3.b
4.3
Feature Vector Generation
The higher order autocorrelation features are the primitive edge features that we use to index and retrieve images. Such features are shift-invariant (irrelevant to where the objects are located in the image), which is a useful property in image querying. As defined in [10] and [11], let the image plane be P and let a function I(r) represent the image intensity on the retinal plane P, with r ∈ P; that is, r is the image coordinate vector. A shift (translation) of I(r) within P is represented by I(r + a_i), where a_i is a displacement vector. The Nth-order autocorrelation functions with N displacements a_1, ..., a_N are then defined by
$$R^N(a_1, \ldots, a_N) = \int_P I(r)\, I(r + a_1) \cdots I(r + a_N)\, dr. \tag{4}$$
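A sketch of the feature extraction (ours, not the authors' code): each of the 25 patterns of Fig. 5 is a set of displacements inside a 3 × 3 window, and the corresponding feature is the sum over the image of the product of the edge values at the reference pixel and at the displaced pixels. Only a handful of illustrative masks are listed here, and the normalization follows Sect. 4.4, with the root chosen to match the number of factors in each product (a reading of Equation 6 consistent with the f2 example given there).

```python
import numpy as np

# illustrative 3x3 displacement masks: a few of the 25 patterns of Fig. 5,
# written as lists of (dy, dx) offsets from the reference pixel (assumed layout)
MASKS = [
    [],                      # order 0: the reference pixel itself
    [(1, 0)],                # order 1: one neighbour below
    [(0, 1)],                # order 1: right neighbour
    [(1, 1)],                # order 1: diagonal neighbour
    [(0, -1), (0, 1)],       # order 2: left and right neighbours
]

def autocorrelation_features(edge, masks=MASKS):
    """Higher order autocorrelation features (Equation 4) on an edge image,
    normalized by the image area and by a root compensating the product length."""
    edge = np.asarray(edge, dtype=float)
    h, w = edge.shape
    feats = []
    for offsets in masks:
        prod = edge[1:-1, 1:-1].copy()           # reference pixels, borders excluded
        for dy, dx in offsets:
            prod *= edge[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        value = prod.sum() / (w * h)
        feats.append(value ** (1.0 / (len(offsets) + 1)))
    return np.array(feats)
```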
It is obvious from Equation 4 that the number of autocorrelation functions obtained by the possible combinations of the displacements over the image plane is large. Therefore, it is essential to reduce this large number for practical applications. Here, we limit the order N up to 2 (N = 0, 1, 2). Also, the range of displacements is limited to within a local 3 × 3 window, of which the center is the reference local point. The local mask pattern for extracting higher order autocorrelation features is shown in Fig. 5. The 0th-order autocorrelation function corresponds to the average gray level of the image I(r). By eliminating the displacements that are equivalent by shift, the number of unique patterns is reduced to 25 as shown in Fig. 5. Using these mask patterns, the feature vector f v that contains the higher order autocorrelation functions is defined as follows:
Fig. 5. The 25 Local mask patterns for extracting higher order autocorrelation features, where the order N is limited to 2
fv = f_1, ..., f_25    (5)
Let the position of the mask pattern in the 3 × 3 window be denoted by x and y coordinates, such that I_{x,y} denotes the mask pattern of the 0th-order autocorrelation function f_1. Also, let the width and height of the edge image I be w and h. Thus, each f_i is defined as:

f_1 = Σ_x Σ_y (I_{x,y})
f_2 = Σ_x Σ_y (I_{x,y})(I_{x,y+1})
...
f_5 = Σ_x Σ_y (I_{x,y})(I_{x-1,y-1})
f_6 = Σ_x Σ_y (I_{x,y})(I_{x-1,y})(I_{x+1,y})
...
f_25 = Σ_x Σ_y (I_{x,y})(I_{x-1,y-1})(I_{x+1,y-1})
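To make the form of these sums concrete, the sketch below (our illustration, not the authors' code) evaluates features of this type for a few displacement sets; the full set of 25 masks follows Fig. 5 and is not reproduced here, and the interior-only boundary handling is an assumption.

```python
import numpy as np

def autocorr_feature(I, offsets):
    """Sum over the image of the product of pixels at the given (dy, dx)
    offsets relative to the reference pixel -- the general form of Eq. 4/5.
    `offsets` always includes (0, 0); the order N is len(offsets) - 1."""
    h, w = I.shape
    # restrict to positions where every offset stays inside the image
    acc = np.ones((h - 2, w - 2), dtype=float)
    for dy, dx in offsets:
        acc *= I[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
    return acc.sum()

# three of the 25 mask patterns, as an illustration (the full set follows Fig. 5)
masks = [
    [(0, 0)],                     # f_1: 0th order
    [(0, 0), (0, 1)],             # an order-1 pattern, e.g. f_2
    [(0, 0), (-1, -1), (-1, 1)],  # an order-2 pattern
]
I = np.random.rand(16, 16)        # stand-in for an edge image
fv = np.array([autocorr_feature(I, m) for m in masks])
```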
4.4 Feature Vector Normalization
As mentioned in Sect. 4.3, the extracted higher order autocorrelation features (see Equation 5) are invariant to shift (location of objects in the image). However,
in large collections of images, such as images on the Internet, digital libraries, image archives, etc., images exist in different sizes and different intensities (gray levels). Therefore, it is important to design our features so that the search result includes the wanted image even if the selected query image (during the query-by-example process) is shifted, different in size, or different in gray level compared with the wanted image. In addition to being invariant to shift, we consider the following essential requirements on features for practical image search:
1. Features should be invariant to the size of an image.
2. Features should be invariant to the gray level of an image.
Invariant to image size: For the first requirement, we divide the higher order autocorrelation functions by the width w and height h of the original image. As a result, the feature values will not be proportional to the size of the original image, hence reducing the effect of the size difference between the query image and the wanted image (see Equation 6).
Invariant to gray level: We notice from Equation 5 that the extracted values of the higher order autocorrelation functions grow with the order of autocorrelation. For example, if the sum of the gray-level values of the original image equals S, then when the order N = 0 the value of f_1 equals S^1, and when N = 2 the value of, say, f_5 ≈ S^3. Therefore, we normalize the gray-level values of the extracted higher order autocorrelation features by raising them to the power 1/N, where N is the order of autocorrelation (see Equation 6).

[ (1/(wh)) Σ_{r ∈ P} I(r) I(r + a_1) ··· I(r + a_N) ]^{1/N}    (6)

For example, f_2 = [ (1/(wh)) Σ_x Σ_y (I_{x,y})(I_{x,y+1}) ]^{1/2}. The effect of normalizing the feature vectors is shown in Fig. 6 and Fig. 7. The two images in Fig. 6 are different in size and gray scale. Figures 7.a and 7.b show the feature vectors (higher order autocorrelation features) of the two images before and after the normalizing process, respectively. Hence, the normalization of feature vectors brings similar images that differ in size and gray level closer together, which is a useful property in similarity-based retrieval. After being normalized, the feature vectors are inserted into a SOM to be clustered. The next section briefly introduces the SOM-based clustering process.
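A minimal sketch of this normalisation follows; note that the text states the exponent 1/N while the worked example for f_2 uses 1/2, so the exponent is passed in explicitly rather than derived, and the numbers in the usage line are illustrative only.

```python
def normalise_feature(raw_sum, w, h, exponent):
    """Divide the raw autocorrelation sum by the image area w*h (size
    invariance) and raise the result to the given power (gray-level
    invariance), as in Equation 6."""
    return (raw_sum / float(w * h)) ** exponent

w, h = 64, 64
f2_norm = normalise_feature(1234.5, w, h, 0.5)   # illustrative values only
```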
4.5 SOM-Based Clustering
The Self-Organizing Map (SOM) [7] is an unsupervised neural network that maps high-dimensional input data from R^n (in our case, the normalized higher order autocorrelation features of an image) onto a usually two-dimensional output space while preserving the topological relations (similarities) between the data items. The SOM consists of nodes (neurons) arranged in a two-dimensional rectangular or hexagonal grid. In our system, we simply arranged the SOM nodes in a
Fig. 6. An example to show the effect of normalizing feature vectors: the two images are different in size and gray scale
Fig. 7. Effect of normalizing the feature vectors: (a) feature vectors of the two images in Fig. 6 before the normalizing process, (b) after the normalizing process

two-dimensional rectangular grid. With every node i, a weight vector m_i ∈ R^n is associated. An input vector x ∈ R^n is compared with m_i, and the best-match-node (BMN), which has the smallest angle θ_BMN (see Equation 7), is determined. The input is thus mapped onto the location of the determined BMN.

θ_BMN = arccos( (x · m_i) / (‖x‖ ‖m_i‖) )    (7)

The reason we used the angle θ between vectors as a measure of distance rather than the simple Euclidean distance is illustrated in Fig. 8. The distances d(a, c) and d(b, c) between vectors are more faithfully expressed by the angles θ_ac and θ_bc than by the Euclidean distances ‖a − c‖_2 and ‖b − c‖_2 between the vectors, respectively. The weight vector m_c of the BMN is adapted to match the input vector. That is done by moving m_c towards x by a certain fraction of the angle θ_BMN. Moreover, the weight vectors of nodes in the neighborhood of the BMN are moved towards x, but to a lesser extent than the BMN. This learning process finally leads to a topologically-ordered mapping of the input vectors; that is, the cluster structure within the data and the inter-cluster similarity are represented
Fig. 8. An example to illustrate the effectiveness of using the angle θ between vectors instead of the 2-norm (Euclidean distance) as a measure of dissimilarity (distance)
clearly in the map. The map is called a topological feature map and the weight vector held by a node is called a codebook vector.
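The best-match-node search of Equation 7 reduces to a cosine comparison. The following sketch is ours; the 10 × 10 map size and the brute-force scan over all codebook vectors are assumptions, not details given in the paper.

```python
import numpy as np

def best_match_node(x, codebook):
    """Return the index of the SOM node whose codebook vector makes the
    smallest angle with the input vector x (Equation 7)."""
    x = np.asarray(x, dtype=float)
    angles = []
    for m in codebook:                       # codebook: iterable of weight vectors
        m = np.asarray(m, dtype=float)
        c = np.dot(x, m) / (np.linalg.norm(x) * np.linalg.norm(m))
        angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
    return int(np.argmin(angles))

codebook = np.random.rand(10 * 10, 25)       # e.g. a 10x10 map of 25-dim codebook vectors
query = np.random.rand(25)
bmn = best_match_node(query, codebook)
```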
5 Querying and Results
The programs of our system are written in C++. The database (images, codebook vectors, and clusters of images) is managed by the Jasmine object database management system developed by FUJITSU and Computer Associates. We used Jasmine's Weblink to build a user interface that supports query-by-example queries and to create the HTML retrieval templates that display the results in a web browser. The experiments are performed on 620 images. The system runs on a Sun SPARC Ultra-5/10 workstation (270 MHz, 128 MBytes).
5.1 Querying
Currently, our system supports query-by-example, in which a user selects a query image Q that is most similar to the wanted image(s) from a set of displayed images. Then, Q undergoes the same sequence of processes described in Sect. 4. Briefly, Q is decomposed, an edge image is generated, and the higher order autocorrelation feature vector is extracted, normalized, and compared with the codebook vectors of all nodes of the SOM. Again, the BMN that is most similar to Q (has the smallest θ_BMN, see Equation 7) is determined. Finally, the images associated with the BMN are returned to the user for further manual browsing.
5.2 Results
To provide numerical results, we tested 7 sample queries chosen randomly from the image database. The result of each query Q is the set of images that are associated with the BMN, which is the most similar SOM node to Q. By examining the SOM clusters, we found that the number of images associated with any of the SOM nodes is less than 15, which is small enough for manual browsing.
Fig. 9. Precision of the 7 sample queries and their average precision (dotted line)

Table 1. Average time to determine the BMN and average precision of query results

                               Average value
Time to determine BMN          4.96 seconds
Precision of query results     70.7%
From the result of each sample query, we calculated the precision p of the query results. Since all the images clustered under, or associated with, the BMN are returned to the user as the result of Q, the computed p is also a measure of the precision of the SOM-based clustering method. To compute the precision of the query results, let N_T be the total number of returned images (images associated with the BMN) and N_R be the number of relevant images in N_T. Then, the precision p_i of the result of query q_i is computed as follows:

p_i = N_R / N_T    (8)

Figure 9 shows the precision of the 7 sample queries and their average precision (the dotted line). The average precision p̄ is computed as follows:

p̄ = (1/N_Q) Σ_{i=1}^{N_Q} p_i    (9)
where N_Q is the total number of sample queries, which equals 7 in our test. As shown in Table 1, the average precision of the query results is about 70.7% (it is also a measure of the precision of the SOM-based clustering method). We also measured the average query response time, which is equal to the time it takes to determine the BMN of a query. Table 1 shows that the average query response time is 4.96 seconds. Even though it is difficult to compare with other systems due to differences in computing environments, our average
query response time is comparable to many systems such as [1][5][8] based on the recorded search time in the corresponding papers.
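Equations 8 and 9 amount to a few lines of arithmetic; the numbers in the sketch below are illustrative only and are not the paper's measurements.

```python
def precision(n_relevant, n_returned):
    """Eq. 8: fraction of returned images (those under the BMN) that are relevant."""
    return n_relevant / float(n_returned)

# Eq. 9: average precision over the sample queries (illustrative counts)
per_query = [precision(r, t) for r, t in [(12, 13), (9, 11), (7, 10)]]
avg = sum(per_query) / len(per_query)
```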
6 Conclusion
In this work, we have implemented a wavelet-based indexing and retrieval system that clusters images in a database and provides query-by-example access to the stored images. We showed that the edge feature is important in indexing and retrieving images. Our edge feature vector is compact, fast to compute, and supports similarity retrieval. By normalizing the edge features (higher order autocorrelation features), they become invariant to shift, image size, and gray level, which are essential properties for similarity-based retrieval in large image database environments. Even though the system currently supports only query-by-example querying, it can be easily extended to support querying by a sketch of dominant edges, which is a rough representation of the image. Based on the experimental results, the system shows, on average, a high search precision, which is due to the SOM-based clustering of similar images. Although most of the search time is spent in finding the BMN, the overall search time is comparable to many existing systems.
References 1. M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D, Lee, D. Perkovic, D. Steele, P. Yanker. Query by Image and Video Content: The QBIC System. IEEE Computer Magazine, Sept. 1995. 164, 165, 175 2. J. R.Smith, S. F.Chang. VisualSEEK: A Fully Automated Content-Based Image Query System. ACM Multimedia Conference, Boston, pp.87-98, Nov. 1996. 164, 165 3. E.Albuz. E.Kocalar, A. A.Khokhar. Scalable Image Indexing and Retrieval Using Wavelets. ICASSAP 1999. 164, 165 4. M.Kobayakawa, M.Hoshi, T.Ohmori, T.Terui. Interactive Image Retrieval Based on Wavelet Transform and Its Application to Japanese Historical Image Data. IPSJ Trans. on , Vol.40, No.3, pp.899-911, March 1999. (In Japanese) 164, 165 5. J. Z.Wang, G.Wiederhold, O.Firschein, S. X.Wei. Content-based Image Indexing and Searching Using Daubechies’ Wavelets. Springer-Verlag Int’l Journal on Digital Libraries. Vol.1, pp.311-328, 1997. 164, 165, 168, 175 6. A.Natsev, R.Rastogi, K.Shim. WALRUS: A Similarity Retrieval Algorithm for Image Databases. SIGMOD record, vol.28, no.2, pp.395-406, Philadelphia, PA, 1999. 164, 168 7. T.Kohonen. Self-Organizing Maps. Springer-Verlag, 1997. 2nd extended edition. 164, 166, 171 8. C. E.Jacobs, A.Finkelstein, D. H.Salesin. Fast Multiresolution Image Querying. Proc. of ACM SIGGRAPH, New York, 1995. 165, 168, 175 9. E.Oja, J.Laaksonen, M.Koskela, S.Brandt. Self-Organizing Maps for ContentBased Image Database Retrieval. Published by Elsevier Science B. V., in Kohonen Maps, pp.349-362. 1997. 165 10. T.Kurita, N.Otsu, T.Sato. A Face Recognition Method Using Higher Order Local Autocorrelation And Multivariate Analysis. Prod. of 11th Int’l Conf. on Pattern Reconition, pp.213-216, The Hague, 1992. 169
11. M.Kreutz, B.Volpel, H.Janssen. Scale-Invariant Image Recognition Based on Higher Order Autocorrelation Features. Pattern Recognition, Vol.29, No.1, pp.1926, 1996. 169
Wavelet Applications in Segmentation of Handwriting in Archival Documents Chew Lim Tan, Ruini Cao, and Peiyi Shen School of Computing, National University of Singapore Kent Ridge, Singapore 117543 {tancl,caorn,shenpy}@comp.nus.edu.sg
Abstract. The National Archives of Singapore keeps a large number of double-sided handwritten archival documents. Over long periods of storage, ink seeped through the pages of these documents, resulting in interfering images of handwriting coming from the back of the page. This paper addresses the problem of segmenting handwriting from both sides of a document by means of a wavelet approach. We first match both sides of a document page such that the interfering strokes are mapped to the corresponding strokes originating from the reverse side. This allows the identification of the foreground and interfering strokes. A wavelet reconstruction process then iteratively enhances the foreground strokes and smears the interfering strokes so as to strengthen the discriminating capability of an improved Canny edge detector against the interfering strokes. Experimental results confirm the validity of the wavelet approach.
1 Introduction
Document image analysis is an important research area of image processing and pattern recognition [1]. As an essential step, traditionally, text extraction is the segmentation of text from the background. But this paper introduces a rather different problem, that is, how to extract clear text strings from the seriously seeping, dominating, overlapping, and interfering images originating from the reverse side. This problem is faced by the National Archives of Singapore in restoring the original appearance of these valuable archival documents. Over long periods of storage of these documents, the seeping of ink has resulted in double images as shown in Fig. 1. Our task now is to segment the foreground handwriting from the interfering handwriting originating from the reverse side. Usually, the foreground writing appears darker than the interfering strokes. However, there are cases where the foreground and interfering writings have similar intensities, or worse still, the interfering strokes are more prominent than the foreground. At the request of the National Archives of Singapore, we first look for available methods in the literature [1][2] to solve this problem. The first that came to our attention
was Negishi et al.[3]’s automatic thresholding algorithms which dealt with comparison of old manuscripts with printed matters on the internet. These algorithms, based on Otsu’s [4] method, extract the character bodies from the noisy background. Next, we found Liu and Srihari [5]’s thresholding algorithm which extracts characters from the run-length featured texture background based on the structure-stroke units of text and the distinguishable gray-level ranges between the characters and the background. Similar works were seen in Liang and Ahmadi’s algorithm [6] which adopts a morphological approach to extract text strings from regular periodic overlapping text/background images. White and Rohrer’s [7] method may be more traditional. It is basically an image thresholding technique based on the boundary characteristics to suppress unwanted background patterns. Very similar work can be seen in Don’s work which segments the double-sided images based on the isolated gray-scale range of interfering images and the noise characteristics[8]. Methods surveyed thus far basically assume separable gray scale and/or distinctive features between the foreground and background. Our present problem however violates these assumptions. Valuable work was further found in Lu et al’s contribution[9]. Their method not only enhances the contrast of the edges in the low contrast area but also changes the intensity of the gray level of the edges. Lu also presents another wavelet method by decreasing the edge contrast and smearing the direct components of the edges with its neighboring pixels [10]. This appears to present an exciting avenue for comparing corresponding edges from both sides of a document page. However, though his edge-based wavelet image preprocessing method can handle the change of the feature coefficients (local maxima)[11][12][13], it is found inadequate in meeting the following challenges in our present problem: (1) Due to the anisotropic absorption of the paper materials, the edges could be very different in shape and position between the interfering strokes appearing on the front and their corresponding originating strokes written on reverse side. (2) As a result, any mismatch between the interfering strokes observed on the front and their original strokes on the reverse side will result in a mistaken identity of interfering strokes as foreground edges. In view of the above, we have proposed an improved Canny detection method to suppress unwanted interfering strokes [14]. The orientation information from the canny detector is also used to favor foreground strokes that are predominantly slanting at an angle [15]. This paper further reports a wavelet approach to enhancing (i.e. sharpening) the foreground strokes and weakening (i.e. smearing) the interfering strokes in order to provide an even greater discriminating power of the improved Canny detector between the writings from both sides. Section 2 describes how both sides of a document are matched to identify foreground and interfering strokes as candidates for enhancement and smearing, respectively. Section 3 then provides the details of an iterative wavelet reconstruction process to progressively enhance and smear the respective components. Section 4 discusses how the resultant enhanced and smeared images provide the robustness of the Canny edge detector. Experimental results with images from the National Archives of Singapore are given in section 5 followed by the conclusion and future works in section 6.
Fig. 1. Sample images: (a) front side of sample 1; (b) reverse side of sample 1; (c) front side of sample 2; (d) reverse side of sample 2
2 Image Matching and Overlay
It is observed that the interfering strokes are not as sharp as the normal strokes. Also, it is natural that weak foreground strokes may not necessarily seep into the reverse side (see Fig. 1(d)). On the other hand, interfering strokes must have originated from strong foreground strokes on the reverse side. Thus, we match both images from either side of a page by hand. To facilitate the ensuing wavelet operations, a sub-image of M×N is taken from the whole image. The sub-images are reassembled in the final result. Let F(m, n) denote the k-bits-per-pixel gray-scale front image, and B(m, n) the reverse side image of the same page, where m and n represent the row and the column, respectively. An overlay operation is carried out as follows:
(a) Invert the reverse side image:

    invert(B(m, n)) = 2^k − 1 − B(m, n)    (1)

(b) Flip the inverted image and superimpose it on the front image such that corresponding strokes on either side are matched:

    A(m, n) = flip(invert(B(m, n))) + F(m, n)    (2)

where flip() means flipping the image horizontally, resulting in its mirror image:

    flip(B(m, n)) = B(m, N − n)    (3)

(c) Scale the resultant image:

    C(m, n) = ((A(m, n) − min(A)) / (max(A) − min(A))) · (2^k − 1)    (4)
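A sketch of steps (a)-(c) in NumPy follows; it is our illustration of Equations (1)-(4), assuming 8-bit images and that the horizontal mirroring aligns the two sides (the paper does this matching by hand).

```python
import numpy as np

def overlay(front, back, k=8):
    """Invert the reverse-side image, mirror it horizontally so that strokes
    line up with the front, add it to the front image, and rescale the sum
    back to the k-bit range (Equations 1-4)."""
    front = front.astype(float)
    back = back.astype(float)
    inverted = (2 ** k - 1) - back          # Eq. (1)
    flipped = inverted[:, ::-1]             # Eq. (3): B(m, N - n)
    a = flipped + front                     # Eq. (2)
    c = (a - a.min()) / (a.max() - a.min()) * (2 ** k - 1)   # Eq. (4)
    return c

front = np.random.randint(0, 256, (64, 64))   # stand-ins for matched sub-images
back = np.random.randint(0, 256, (64, 64))
c = overlay(front, back)
```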
Fig. 2 shows the results of overlay processing. Comparing Fig.1 and Fig.2, it can be seen that most of the interfering strokes have been weakened by the overlay process while the majority of the foreground strokes remain intact. These foreground strokes, though somewhat impaired, now serve as seeds to start the following enhancement and smearing processes. The idea now is to detect the foreground strokes on the front and enhance them using wavelets. The detected and binarized strokes [14][15] from the foreground overlay image form what we call the “enhancement feature image”. At the same time, we detect the foreground strokes on the reverse side to locate their corresponding interfering strokes on the front so as to smear these interfering strokes by means of wavelets. The detected and binarized strokes [14][15] from the reverse side overlay image result in what we call the “smearing feature image”. Iterative enhancement and smearing processes are then carried out on the original front image using the enhancement and smearing feature images to identify candidate strokes.
3 Iterative Wavelet Reconstruction
Let f(x,y) be the original image, E(x,y) be the enhancement feature image and S(x,y) be the smearing feature image. The three sub-images have the same dimension M×N. The enhancement and smearing features may be described as follows:
E(x, y) = 0 for background, 255 for a detected stroke;    S(x, y) = 0 for background, 255 for a detected stroke    (5)

The wavelet decomposition of f(x, y) is written as follows [16], where j is the scale number of the wavelet decomposition:
C_j f(m, n) = ( <f(x, y), Φ_{j,m,n}(x, y)> )_{(m,n) ∈ Z^2}
D_j^1 f(m, n) = ( <f(x, y), Ψ^1_{j,m,n}(x, y)> )_{(m,n) ∈ Z^2}
D_j^2 f(m, n) = ( <f(x, y), Ψ^2_{j,m,n}(x, y)> )_{(m,n) ∈ Z^2}
D_j^3 f(m, n) = ( <f(x, y), Ψ^3_{j,m,n}(x, y)> )_{(m,n) ∈ Z^2}    (6)

Fig. 2. Overlay results: (a) front side of sample 1; (b) reverse side of sample 1; (c) front side of sample 2; (d) reverse side of sample 2
A 10-scale wavelet decomposition of the original image f(x, y) may be described as follows:

Wf(x, y) = {C_9 f(x, y), D_0^1 f(x, y), D_0^2 f(x, y), D_0^3 f(x, y), ..., D_9^1 f(x, y), D_9^2 f(x, y), D_9^3 f(x, y)}    (7)
With the image wavelet representation Wf(x, y), the enhancement feature E(x, y), and the smearing feature S(x, y), the iterative wavelet reconstruction may be described as follows. First, the multi-scale decomposition of the foreground is computed. Unlike the traditional wavelet reconstruction, the resultant image in each scale and iteration retains the same size as the original foreground sub-image, i.e. M×N.

Wf(x, y) = {C_9(x, y), D_j^k(x, y), j = 0, ..., 9, k = 1, 2, 3}    (8)
The magnitude of the wavelet coefficients in all the scales is revised by the enhancement coefficient e_j^k and smearing coefficient s_j^k according to the algorithm in equation (9), where e_j^k and s_j^k (e_j^k > 1, 0 < s_j^k < 1, j = 0, ..., 9, k = 1, 2, 3) are set empirically. The enhanced/smeared image f'(x, y) is reconstructed from the modified coefficients.

do {
    if E(x, y) == 255 then D_j^k(x, y) = e_j^k D_j^k(x, y);
    if S(x, y) == 255 then D_j^k(x, y) = s_j^k D_j^k(x, y);
} while (j = 0, ..., 9; k = 1, 2, 3; x = 0, ..., N; y = 0, ..., N)    (9)
f'(x, y) = inverse wavelet transform( {C_9(x, y), D_j^k(x, y), j = 0, ..., 9, k = 1, 2, 3} )    (10)
The wavelet transform is then applied again to the reconstructed image f'(x, y). Note that in obtaining the inverse of the wavelet transform, the revised D_j^k obtained in equation (9) is used again.

Wf'(x, y) = {C'_9(x, y), D'_j^k(x, y), j = 0, ..., 9, k = 1, 2, 3}    (11)

f'(x, y) = inverse wavelet transform( {C'_9(x, y), D_j^k(x, y), j = 0, ..., 9, k = 1, 2, 3} )    (12)
By iteratively applying the wavelet decomposition and reconstruction using equations (11) and (12), we obtain the final enhanced/smeared gray-scale image. The final enhanced/smeared image is clipped using the following function:

f'(x, y) = 0 if f'(x, y) ≤ 0;  255 if f'(x, y) ≥ 255;  f'(x, y) otherwise    (13)
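The coefficient-revision loop of equation (9) and the clipping of equation (13) can be sketched independently of the particular wavelet implementation. In the sketch below, which is ours, the forward and inverse transforms are passed in as functions, every detail sub-band is assumed to be kept at the full image size as in the paper, and the values e = 1.2 and s = 0.6 are placeholders for the empirically chosen e_j^k and s_j^k.

```python
import numpy as np

def revise_details(details, E, S, e=1.2, s=0.6):
    """Eq. (9): boost detail coefficients at enhancement-feature locations and
    damp them at smearing-feature locations.  `details` is a list of detail
    sub-band arrays, each the same size as the feature images E and S."""
    out = []
    for D in details:
        D = D.copy()
        D[E == 255] *= e        # e_j^k > 1
        D[S == 255] *= s        # 0 < s_j^k < 1
        out.append(D)
    return out

def iterate(f, forward, inverse, E, S, n_iter=15):
    """Eqs. (9)-(13): repeat decomposition, coefficient revision and
    reconstruction, then clip to [0, 255].  `forward` must return
    (approximation, [detail arrays]); `inverse` is its inverse."""
    for _ in range(n_iter):
        approx, details = forward(f)
        f = inverse(approx, revise_details(details, E, S))
    return np.clip(f, 0, 255)   # Eq. (13)
```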
In our implementation, we used a wavelet transform with up to 10 scales for images of size 512×512. In the reconstruction process, we set 15 to be the maximum number of
iterations. Fig. 3(a) and (d) show the results after 15 iterations. After the final enhancement and smearing, the resultant image containing both the enhanced and smeared features is then processed by the improved Canny edge detection [14][15]. As the Canny edge detector favors sharper edges over smoother ones, the above enhancement and smearing processes strengthen this discriminating capability of the Canny detector. The detected edges are used as loci to recover gray levels from the original image within a 7×7 window centered at each detected edge point. The recovered gray level images are shown in Fig. 3(b) and (e), respectively. Niblack's threshold [17] is then adapted to binarize the image to give a clear, readable copy for the reader, as shown in Fig. 3(c) and (f).
Fig. 3. Enhancement/smearing and segmentation results: (a) enhanced/smeared front side of sample 1; (b) segmentation results of (a); (c) binarization results of (b); (d) enhanced/smeared front side of sample 2; (e) segmentation results of (d); (f) binarization results of (e)
4 Robust Threshold Decision in Canny Edge Detector
The edge strength images of the original front images and their enhanced/smeared images are shown in Fig. 4(a), (b) and (c), (d), respectively, where the magnitude of the gradient is converted into the gray level value. We can see that the darker the edge is, the larger is the gradient magnitude. It is obvious from Fig. 4(a) and (b) that without the enhancement/smearing processes, the edge strength of strong interfering strokes is similar to that of the foreground strokes. Thus it is difficult to set a universally
valid pair of dual thresholds for the Canny edge detector in conventional methods. This is especially so in view of the great variety of the relative strengths between the foreground and interfering strokes among these archival documents. In fact, it is sometimes even impossible to set one single set of thresholds for the same page due to the variation of stroke intensity across the page. On the other hand, from Fig. 4(c) and (d), it is seen that the enhancement/smearing processes have significantly highlighted the foreground strokes against the interfering strokes.
Fig. 4. Magnitude of gradient of detected edges: (a) front side of sample 1 before enhancement/smearing; (b) front side of sample 2 before enhancement/smearing; (c) front side of sample 1 after enhancement/smearing; (d) front side of sample 2 after enhancement/smearing
The enhancement and smearing processes work with each other to our advantage. Generally, a lower value for the Canny detector's upper threshold is adopted to detect as many features as possible. The enhancement feature image may have erroneously picked up interfering strokes as enhancement features, resulting in a noisy enhancement feature image. However, with the same lower threshold value, more smearing features will also be included in the smearing feature image. Some of the smearing features will be in areas that overlap (partially or fully) with the falsely identified strokes in the enhancement feature image. As a result, these false
alarms will be eventually suppressed by the subsequent smearing process. The novelty of this property is that as long as a smearing feature covers any part of a mistaken enhancement feature, this false positive will eventually be “smeared” away. The unwanted strokes will finally be sifted out by the cancellation effect of the smearing process. This collaborative nature between enhancement and smearing makes our method robust to the threshold setting. In other words, unlike the conventional edge detection, the final detection of the foreground strokes from the enhanced/smeared image using our approach is not so sensitive to the threshold value.
5 Experimental Results
Over 200 scanned images of historical handwritten documents from the National Archives of Singapore were tested in our experiment. These images were scanned at 150 dpi and saved in TIF format without compression. Most of the images are moderately noisy and were satisfactorily cleaned up. To assess the performance of our method, especially for difficult cases, 12 severely interfering images were selected for evaluation. The selected images were visually inspected to assess the readability of the extracted words. Fig. 5 shows all 12 sample images in cut-off strips and the final binary segmentation, while Fig. 6 gives a full view of one of the images and its final result. The well-known Information Retrieval measures, precision and recall (defined below), are used to measure the performance of the proposed method [18].

Precision = No. of Correctly Detected Words / No. of all Words Detected by the System    (14)

Recall = No. of Correctly Detected Words / Total No. of Words Present in the Document    (15)

Precision reflects the performance of removing the interfering strokes and recall reflects the performance of restoring the foreground words. The results in Table 1 show average precision and recall rates of 84% and 96%, respectively.

Table 1. Evaluation of the proposed method
Image no.     1    2    3    4    5    6    7    8    9    10   11   12   Average
Total words   132  124  103  125  125  123  121  128  112  113  114  114
Precision     91%  86%  76%  94%  92%  80%  79%  75%  84%  82%  91%  78%  84%
Recall        98%  100% 90%  94%  98%  94%  89%  97%  97%  98%  96%  96%  96%
Fig. 5. Sample images in Table 1 and their final binarization results
6 Conclusion
The problem of interfering images of handwriting from the reverse side of an archival document due to the seeping of ink has been solved by our wavelet-based segmentation method. The enhancement/smearing algorithm presented in the paper performs well even for cases containing weak foreground strokes among strong interference.
The whole system including the improved Canny edge detector [14][15] is able to segment the foreground writing from the interfering strokes effectively. One problem encountered presently is in getting a perfect manual overlay between the front and reverse side images due to differences between both images caused by factors like document skews, different scales during image capture, and warped surfaces at books' spine areas. A future work for us is to develop a computer-aided overlay process to take over the present manual image matching.
Acknowledgement This project is jointly supported by the National Science and Technology Board and the Ministry of Education, Singapore, under the joint research grant R-252-000-071112/303. The provision of archival documents by the National Archives of Singapore is gratefully acknowledged.
Fig. 6. One whole original page image and its final binarization results
References
1. Nagy, G.: Twenty Years of Document Image Analysis in PAMI. IEEE Trans. PAMI, Vol. 22, No. 1, Jan. 2000, 38-62
2. Casey, R.G., Lecolinet, E.: A Survey of Methods and Strategies in Character Segmentation. IEEE Trans. PAMI, Vol. 20, No. 7, July 1996, 690-706
3. Negishi, H., Kato, J., Hase, H., Watanabe, T.: Character Extraction from Noisy Background for an Automatic Reference System. In: Proc. 5th Int. Conf. Document Analysis and Recognition, Bangalore, India, Sept. 1999, 143-146
4. Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. System, Man, and Cybernetics, Vol. 9, No. 1, 1979, 62-66
5. Liu, Y., Srihari, S.N.: Document Image Binarization Based on Texture Features. IEEE Trans. PAMI, Vol. 19, No. 5, May 1997, 540-544
6. Liang, S., Ahmadi, M.: A Morphological Approach to Text String Extraction from Regular Periodic Overlapping Text/Background Images. Graphical Models and Image Processing, CVGIP, Vol. 56, No. 5, Sept. 1994, 402-413
7. White, J.M., Rohrer, G.D.: Image Thresholding for Optical Character Recognition and Other Applications Requiring Character Image Extraction. IBM J. Res. Dev. 27(4), 1983, 400-410
8. Don, H-S.: A Noise Attribute Thresholding Method for Document Image Binarization. In: Proc. 3rd Int. Conf. Document Analysis and Recognition, 1995, 231-234
9. Lu, J., Healy, D.M., Weaver, J.B.: Contrast Enhancement of Medical Images Using Multi-scale Edge Representation. Optical Engineering, 33(7), 1994, 2151-2161
10. Lu, J.: Image De-blocking via Multi-scale Edge Processing. In: Unser, M.A., Aldroubi, A., Laine, A.F. (eds.): Proc. of SPIE, Wavelet Applications in Signal and Image Processing IV, Vol. 2825, Part two, Denver, Colorado, Aug. 1996, 742-75.
11. Mallat, S., Zhong, S.: Characterization of Signals from Multi-scale Edges. IEEE Trans. PAMI, Vol. 14, No. 7, July 1992, 710-732
12. Hwang, W.L., Chang, F.: Character Extraction from Documents Using Wavelet Maxima. In: Unser, M.A., Aldroubi, A., Laine, A.F. (eds.): Proc. of SPIE, Wavelet Applications in Signal and Image Processing IV, Vol. 2825, Part two, Denver, Colorado, Aug. 1996, 1003-1015
13. Etemad, K., Doerman, D., Chellappa, R.: Multi-scale Segmentation of Unstructured Document Pages Using Soft Decision Integration. IEEE Trans. PAMI, Vol. 19, No. 1, Jan. 1997, 92-96
14. Cao, R., Tan, C.L., Wang, Q., Shen, P.: Segmentation and Analysis of Double-Sided Handwritten Archival Documents. In: Proc. 4th IAPR Int. Workshop on Document Analysis Systems, Rio de Janeiro, Brazil, Dec. 2000, 147-158
15. Tan, C.L., Cao, R., Shen, P., Chee, J., Chang, J.: Removal of Interfering Strokes in Double-Sided Document Images. In: Proc. 5th IEEE Workshop on Applications of Computer Vision, Palm Springs, California, Dec. 2000, 16-21
16. Feng, L., Tang, Y.Y., Yang, L.H.: A Wavelet Approach to Extracting Contours of Document Images. In: Proc. 5th Int. Conf. Document Analysis and Recognition, Bangalore, India, Sept. 1999, 71-74
17. Niblack, W.: An Introduction to Digital Image Processing. Englewood Cliffs, N.J., Prentice Hall (1986) 115-116
18. Junker, M., Hoch, R., Dengel, A.: On the Evaluation of Document Analysis Components by Recall, Precision, and Accuracy. In: Proc. 5th Int. Conf. Document Analysis and Recognition, Bangalore, India, Sept. 1999, 713-716
Wavelet Packets for Lighting-Effects Determination Abbas Z. Kouzani and S. H. Ong School of Engineering and Technology, Deakin University Geelong, Victoria 3217, Australia
Abstract. This paper presents a system to determine lighting effects within face images. The theories of multivariate discriminant analysis and wavelet packets transform are utilised to develop the proposed system. An extensive set of face images of different poses, illuminated from different angles, are used to train the system. The performance of the proposed system is evaluated by conducting experiments on different test sets, and by comparing its results against those of some existing counterparts.
1 Introduction
The appearance of a person is highly dependent on the lighting conditions. Often slight changes in lighting produce large changes in the person’s appearance. Since the face images in the known face database are taken under front-lit lighting, recognition of a face image taken under a different lighting condition becomes difficult. Determining the lighting effects within a face image is therefore the first crucial step of building a lighting invariant face recognition system. While there has been a great deal of literature in computer vision detailing methods for face recognition, few efforts have been devoted to image variations produced by changes in lighting. In general, recognition algorithms have either ignored lighting variation, or dealt with it by measuring some properties or features of the image which are at least insensitive to the variability. Yet, features do not contain sufficient information necessary for recognition. Furthermore, faces often produces inconsistent features under different lighting conditions. In this paper, a hybrid method is proposed based on theories of multivariate discriminant analysis and wavelet packets transform to classify face images based on the lighting effects present in the image. An extensive set of face images of different poses, illuminated from different angles, are used in the training of the system. The paper is organised as follows. In Section 2, the existing work is reviewed. Section 3 presents the lighting-effects determination system. In Section 4, the experimental results are presented and discussed. Finally, the concluding remarks are given in Section 5.
2 Review of Existing Methods
To handle image variations that are due to lighting, three main methods have been used in the literature. These methods, used by object recognition systems as well as by systems that are specific to faces, are explained below. 2.1
Shape from Shading
The shape-from-shading method [1] utilises the gray-level information to determine the 3D shape of the object. Most algorithms which attempt to determine shape-from-shading, are designed for images of arbitrary objects with smooth brightness variations [1]. These algorithms estimate shapes from the limited information contained within an image. However, since the knowledge about the surface of human heads is not used by these algorithms, the estimation of head shapes from the limited information of a 2D image restricts the performance of these algorithms in practical applications, and therefore their use is unsuitable for face recognition. 2.2
Image Representation Models
Ideally, an image representation should be invariant to lighting changes. It has been theoretically shown that a representation which is invariant to lighting does not exist for unconstrained 3D objects [2]. However, for certain classes of objects this limitation does not necessarily apply. Four image representations are explained below.
1. Edge Maps: Intensity edges coincide with gray-level transitions. Gray-level transitions can be due to discontinuities in the surface colour or orientation. Such edges are expected to be insensitive to lighting changes. The advantage of using an edge representation is that it is a relatively compact representation. Such an edge representation is used by several face recognition systems [3].
2. Gabor-Like Filters: Physiological evidence indicates that at the early stages of the human visual system the images are processed by local, multiple, and parallel channels that are sensitive to both spatial frequency and orientation. Several face recognition systems filter the gray-level image by a set of 2D Gabor-like functions before attempting to recognise the faces in the image [3,4]. Convolving the image with 2D Gabor-like filters is often similar to enhancing edge contours, as well as valleys and ridge contours from the image.
3. Derivatives of Gray-Level: Derivatives of the gray-level distribution were used by several face recognition systems [3] to reduce the effects of changes in lighting conditions on face images. The derivatives used include directional and non-directional first- and second-order derivatives. It can be shown analytically that, under certain conditions, changes in ambient light will affect the gray-level image but not its derivatives. However, this is not the
case in natural lighting conditions, where the direction of the light source is also changed.
4. Logarithmic Transformation: Logarithmic transformation is a non-linear transformation of the image intensities used in computer vision [5]. There is physiological evidence that logarithmic transformation approximates the response of cells in the retina of the human eye.
Adini et al. [6] reported that for most image representations considered, the percentage of miss-recognition was above 50 percent. Therefore, the above listed image-representation methods can be used in a lighting-effects determination system.
2.3 Example-Based Models
An example-based method handles image variations that are due to lighting differences by using, as a model, an explicit 3D model or, alternatively, a number of corresponding 2D face images taken under different lighting conditions. A number of 2D images can either be used as independent models or combined into a model-based recognition system such as those described in [7]. In the following, three examples of this method are given. 1. Independent Image Comparison: The face model here consists of a large set of images of the same face containing all possible variations. The recognition process involves the comparison of the distances between an input image and all the images comprising the model. A problem with this approach is that the number of images that the model must contain may be very large. Furthermore, this approach has limited generalisation capacity beyond the parameter values that are sampled and stored. 2. Learning the Lighting Direction: Learning the input/output mapping from examples is a powerful problem-solving mechanism, once a large number of examples is available. Brunelli [8] used one crude 3D head model to generate computer-generated masks for modulating the intensity of 2D front-view face images in order to produce images illuminated from different angles. The produced images are used for training an HyperBF network in which the lighting direction of the light source is associated with a vector of measurements derived from a front-view face image. The images for which the lighting direction must be computed are very constrained - they are front-view faces with a fixed inter-ocular distance [9]. In addition, the calculation and compensation of the lighting direction are done based on a simple lighting model of the light source that does not represent a variety of complicated lighting conditions which exist in practical situations. 3. Fisherfaces: This approach which is reported to perform better than the others, was proposed by Belhumeur et al. [10]. The idea is to produce classes in a low dimensional face image subspace obtained from linearly projecting a high-dimensional image space to the subspace. The multivariate discriminant analysis [11] is used to select most discriminating features. In the most
discriminating feature space, the factors that are not related to classification are discarded or weighted down, and factors that are crucial to classification are emphasised. Belhumeur et al. have conducted experiments on fisherfaces and three standard face recognition methods including the eigenfaces, and have reported lower error rates for the fisherfaces method. A drawback of this approach is that the transformation coefficients of different classes are very close to each other, compared to the other methods. That will cause false recognition [12]. 2.4
Discussions
Among the methods described above, the image representation models improve the accuracy of the recognition, but fail to offer a robust invariance to lighting changes [6]. The example-based methods such as Brunelli’s method [8] are promising and can produce better results than those of the image representation models. However, the performances of the existing example-based methods are still not satisfactory and there is plenty of room for improvement.
3 Proposed System
In the proposed system, the theories of the multivariate discriminant analysis and the wavelet packets transform are combined to form a learning system for determining the lighting-effects in the input face image. This combination is explained in the following. 3.1
Multivariate Discriminant Analysis
Multivariate discriminant analysis performs dimensionality reduction using linear projection [11]. Each image is considered as a sample point in this high-dimensional space. A problem with this method is that the within-class scatter matrix [11] can be singular in the computation of the lighting direction for face images. This stems from the fact that the number of images in the training set is much smaller than the number of pixels in each image. In order to overcome the complication of the singular within-class scatter matrix, the Principal Component Analysis (PCA) [13] is employed. The PCA builds a low-dimensional face space from a high-dimensional image space using example face images. The face space built by the PCA is an approximation of the real face space. But in order to have a reasonable approximation of the real face space, a large number of face images should be presented to the PCA method. If a large number of face images is not available, the PCA builds a face space that poorly approximates the real face space. We propose the utilisation of the wavelet packets projection for reducing the dimensionality of face images. In order to overcome the complication of the singular within-class scatter matrix, the training image set is first projected to a lower-dimensional space using the wavelet packets transform. This projection
reduces the size of the matrix; therefore, the within-class scatter matrix becomes non-singular and invertible. Then, the discriminant analysis projection is performed in the space of the wavelet packets projection.
3.2 Wavelet Packets
The main difference between the wavelet packets transform and the wavelet transform is that, in the wavelet packets, the basic two-channel filter bank can be iterated either over the low-pass branch or the high-pass branch. This provides an arbitrary tree structure, with each tree corresponding to a wavelet packets basis. The decision to split or merge is aimed at achieving minimum distortion. Best Basis Method: The wavelet packets transform offers a choice of optimal bases for the representation of a specific signal [14]. Therefore, it is possible to seek the best basis by a criterion. The chosen basis should carry substantial information about the signal. Since compression is the goal, the basis which minimises the number of significantly non-zero coefficients in the resulting transform is chosen. Entropy is a suitable cost function for compression.
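A minimal sketch of such a split-or-merge decision follows, assuming a Shannon-type entropy of normalised coefficient energies as the cost (the exact entropy variant used by the authors is not spelled out in the text):

```python
import numpy as np

def entropy_cost(coeffs):
    """Shannon-like entropy of normalised coefficient energies, a common cost
    for comparing a parent node with its children in best-basis selection."""
    e = coeffs.astype(float).ravel() ** 2
    p = e / e.sum() if e.sum() > 0 else e
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def keep_split(parent, children):
    """Split only if the children together are cheaper than the parent."""
    return sum(entropy_cost(c) for c in children) < entropy_cost(parent)
```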
3.3 Selection of Best Basis for Face-Image Class
The wavelet packets transform and the best basis selection algorithm find the optimal basis for the representation of a specific signal such as face images. To select the basis for face images, 200 gray-scale front-view 64 × 64 face images are used as the training set. The training set is divided into four groups; each group consists of 50 face images. The following experiment is separately performed on each group of the face images. For each face image the stat-quadtree of entropy values is first created. For each group, 50 stat-quadtrees are obtained. Next, the entropy values of the 50 stat-quadtrees are averaged. This generates four stat-quadtrees, one for each of the groups of the training set. Then, on each stat-quadtree, the best basis selection algorithm is performed to pick out the best basis from all the possible bases. The algorithm minimises the entropy values in the stat-quadtree. After obtaining the best bases for the four groups of the training set, it is found that the four bases are the same. The maximum depth of splitting is chosen as 6 (explained in the following).
3.4 Selection of Best Filter and Best Decomposition Level
In the wavelet transform, the choice of filters is crucial not only for obtaining satisfactory reconstruction of the original signal, but also for determining the shape of the wavelet used for performing the analysis. To achieve the best compression of human face images, the best filter and the best decomposition level in the wavelet packets transform must be chosen. The best filter can be chosen by examining different filters and selecting the one with the highest information packing capability. An experiment is carried out to
select the best filter and the best decomposition level for the face-image class. Four groups of the training set of the face images are used in each experiment. Each group contains 50 gray-scale front-view 64 × 64 face images. Six types of orthonormal quadrature mirror filters (Haar, Beylkin, Coiflet, Daubechies, Symmlet, and Vaidyanathan) are examined. Each one of the six types of filters is used with a specific filter parameter. For instance, the Symmlet filter is used with various number of vanishing moments varying from 4 to 10. In addition, together with each particular type of filter and parameter, different levels of decomposition are used. The applicable range of the decomposition level in this experiment is 2-6. A total of 96 filter variants are constructed using all combinations of filter type, parameter, and decomposition level.
Fig. 1. Best filter and best decomposition level selection results for orthonormal quadrature mirror filters
Each of the above 96 filter variants is applied to each of the four sub-training sets, and the best basis is searched and selected. Compression is then carried out on all the training face images. After compression, the reconstruction is performed on the compressed images. In the reconstruction stage, each face image is reconstructed from 1%, 7.5%, and 15% of the most important information of the transformed coefficients (the coefficients with the highest absolute values). The rest of the coefficients are set to zero before reconstruction. Therefore three images are reconstructed from each compressed image. The errors between the original image and the three reconstructed images are calculated and summed. This is done for all the 50 training face images. The average error is obtained and stored. Figure 1 displays the best filter and the best decomposition level
selection results for the orthonormal quadrature mirror filters. Each entry on the horizontal axis represents the measured error for a particular filter and a certain decomposition level. For instance, entry 79 denotes the measured error for the Coiflet filter with parameter 5 and the decomposition level 6. The results show that the Symmlet filter with 5 vanishing moments and the decomposition level 6 is the best choice for the face image database. The best basis, the best filter, and the best decomposition level selected for the face-images class are employed for reducing the dimensionality of face images. Although the best basis was obtained by using a training set with a limited number of face images, it is experimentally found that adding more face images to the training set does not significantly affect either the structure of the basis or the compression ratio. A better reconstruction of a face image that is not in the training set is possible using the wavelet packets transform than using the PCA.
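The coefficient-truncation test described above (reconstruction from the 1%, 7.5%, and 15% largest coefficients) can be sketched as follows; the error measure and the generic inverse_transform argument are our assumptions, since the text does not specify them.

```python
import numpy as np

def keep_top_fraction(coeffs, fraction):
    """Zero all but the given fraction of coefficients with the largest
    absolute value."""
    flat = coeffs.ravel().copy()
    k = max(1, int(round(fraction * flat.size)))
    thresh = np.sort(np.abs(flat))[-k]
    flat[np.abs(flat) < thresh] = 0.0
    return flat.reshape(coeffs.shape)

def reconstruction_error(original, inverse_transform, coeffs,
                         fractions=(0.01, 0.075, 0.15)):
    """Sum of errors between the original image and its reconstructions from
    the truncated coefficient sets (mean absolute error is a stand-in here)."""
    total = 0.0
    for f in fractions:
        rec = inverse_transform(keep_top_fraction(coeffs, f))
        total += float(np.mean(np.abs(original - rec)))
    return total
```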
3.5 Lighting-Effects Determination System (LEDS)
The LEDS takes an input face image and classifies it into one of the possible lighting-effects classes under examination. The LEDS learns to compute the lighting effects using the multivariate discriminant analysis and the wavelet packets transform. Although the multivariate discriminant analysis has been used by Swets et al. [15] as the most discriminating feature, and later by Belhumeur et al. [10] as the fisherfaces, both the most discriminating feature and the fisherfaces were developed for the purpose of one-step recognition. In the LEDS, however, the utilisation of a combination of the multivariate discriminant analysis and the wavelet packets transform is proposed as an example-based scheme for determining the lighting effects, not for recognising faces.

Algorithm 1 (Lighting-Effects Determination) The lighting-effects determination process is performed in two stages as described in the following.

Training: This stage involves the following operations, which are performed only once.
1. A training set of face images of different subjects is acquired. For each possible lighting effect, one image is taken from each subject.
2. The face images of the training set are grouped into different lighting-effects classes based on the lighting effects that they contain.
3. An image is manually selected from each class and is named the reference image of the class. Although this selection is an arbitrary choice, the employed principle is that the face should be located in the centre of the image.
4. All images of each class are aligned based on the associated reference image using the pixel-based correspondence representation [12].
5. The multivariate discriminant analysis and wavelet packets transform is applied to the training set to obtain a dimensionally reduced lighting space.
6. The set of weights obtained from projecting each face image of the training set onto the lighting space is stored.
Determination: This stage involves the following operations to classify the input face image into one of the lighting-effects classes.
1. The input face image is projected onto the lighting space and a set of weights is calculated.
2. The weight pattern is classified into one of the lighting-effects classes using the stored weight patterns of the face images in the training set.
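A sketch of the determination stage is given below; the nearest-neighbour rule in the weight space, and the argument names, are our assumptions about how the stored weight patterns are used.

```python
import numpy as np

def determine_lighting(query_features, projection, class_weights):
    """Project the (wavelet-packet) feature vector of the query into the
    lighting space and assign it to the class of the nearest stored weight
    pattern.  `projection` is the matrix learned by the discriminant
    analysis; `class_weights` maps each class label to the stored weight
    vectors of its training images."""
    w = projection @ np.asarray(query_features, dtype=float)
    best_label, best_dist = None, np.inf
    for label, vectors in class_weights.items():
        for v in vectors:
            d = np.linalg.norm(w - np.asarray(v, dtype=float))
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```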
3.6 Training Set
A collection of 63 3D head models is used to generate the training database using computer graphics techniques. The head models have been generated from stereo images obtained using the C3D system of the Turing Institute. The training database contains 63 sets of 1331 2D full-face images of various poses within ±45° rotations about the X, Y, and Z directions and with a resolution of 9° (see Figure 2). To quantify the effects of varying lighting, 66 different lighting conditions are considered. For each of the 1331 poses, 66 full-face images are rendered under different lighting conditions. In each image, a specific direction and distance of a single light source are implemented. The longitude and latitude of the light source direction are within 15°–75° of the camera axis. First, the face images are grouped based on the pose of each face. Each group representing a specific pose contains 4158 face images of 63 people. Then, the face images within each
Fig. 2. Face images rendered from a 3D head model of the Turing database
group are divided into 66 different classes. This classification is done based on the lighting direction of each face image. Therefore, each group that represents a specific pose will contain 66 classes of different lighting effects. It should be stated that both the most discriminating feature and the fisherfaces put the face images of one person taken under different lighting conditions into the same class for the purpose of recognition. However, since the aim of this work is to determine the lighting effects, only the face images with similar lighting effects are put into the same class in the face space proposed here.
4 Experimental Results
To evaluate the performance of the LEDS, the results of experiments performed on three different test sets, are presented and discussed below. The test sets used in these experiments are as follows. – Test Set 1 contains 411 face images of the Harvard face database. In each image in this database, a subject holds his head steady while being illuminated by a dominant light source. The space of the light source directions is then sampled in 15◦ increments. Figure 3 illustrates sample face images from Test Set 1. – Test Set 2 consists of 495 images constructed from the Yale face database. The 165 face images of the Yale face database are first copied into the test set. Then, 330 extra images are produced by rotating each image randomly within the range of 10◦ − 90◦ twice in the 2D plane. These images are added to the test set. Figure 4 illustrates sample face images from Test Set 2. – Test Set 3 is constructed by the author and contains 2710 face images. Face images of ten people were used to build this test set. A set of 271 lighting masks are superimposed on each face image to generate 271 images under different lighting conditions. In each mask, specific direction and distance of a single light source are implemented. Figure 5 illustrates sample face images from Test Set 3. Face images of Test Sets 1-3 are aligned using the pixel-based correspondence method [12], and are presented to two different systems. The first system uses the multivariate discriminant analysis and the PCA for classification of the lighting effects in the test face images. The PCA method has been trained on 200 frontview face images. The second system, that is the LEDS, uses the multivariate discriminant analysis and the wavelet packets transform for classification of the lighting effects in the test face images. The wavelet packets transform has also been trained on the 200 front-view face images. Table 1 summarises the results obtained from this experiment. As can be seen from the table, the LEDS achieves a higher correct classification of the lighting effects for all three test sets than that of the method which uses the multivariate discriminant analysis and the PCA. It can be seen that the LEDS achieves a classification rate of 86.7% for Test Set 3 which is not as high as the rate obtained for Test Sets 1-2. The reason for this performance
Wavelet Packets for Lighting-Effects Determination
Fig. 3. Test Set 1 sample face images from the Harvard face database
Fig. 4. Test Set 2 sample face images from Yale face database
Fig. 5. Test Set 3 sample face images
197
198
Abbas Z. Kouzani and S. H. Ong
is that the LEDS is trained on the face images containing real lighting effects, whereas the test images of Test Set 3 contains synthesised lighting effects in which the lighting masks are simply superimposed on face images taken under front-lit lighting. These lighting masks are simple approximations of the real lighting effects. Therefore, the images produced would only be an imitation of a corresponding real illuminated face image. However, training the LEDS on the example images containing synthesised lighting effects can improve the correct classification rate when the system is tested on this kind of face images.
Table 1. Classification of lighting effects for Test Sets 1-3 Method
Test Set Correct Classification Classification Rate 1 411 100% Ideal System 2 495 100% 3 2710 100% Multivariate Discriminant 1 377 91.7% Analysis + PCA 2 411 83.0% 3 2043 75.4% 1 396 96.3% Proposed LEDS 2 458 92.5% 3 2349 86.7%
5
Concluding Remarks
A method has been proposed based on theories of multivariate discriminant analysis and wavelet packets transform to classify face images based on the lighting effects present in the image. An extensive set of face images of different poses, illuminated from different angles, are used in the training of the system. The performance of the system has been evaluated by conducting experiments on different test sets and by comparing its results against those of the existing counterparts. The system improves the performances of the existing counterparts because of the utilisation of the combination of the multivariate discriminant analysis and the wavelet packets transform for determination of the lighting effects, and the utilisation of training face images containing realistic lighting effects. The system may fail to determine an lighting effect in the input face image if the image contains a lighting effect that is not covered by the system or the image contains an extreme lighting effect. The performance of the system can be improved by increasing the number of face images in the lighting-effects classes, and by including more lighting-effects classes in the training sets.
Wavelet Packets for Lighting-Effects Determination
199
References 1. B. K. P. Horn and M. J. Brooks, Eds., Shape from Shading, MIT Press, Cambridge, Mass., 1989. 189 2. Y. Moses and S. Ullman, “Limitation of non-model-based recognition schemes,” in Proc. European Conference on Computer Vision, G. Sandini, Ed., 1992, pp. 820–828. 189 3. R. Brunelli and T. Poggio, “Hyperbf networks for real object recognition,” in Proc. IJCAI, Sydney, Australia, 1991, pp. 1278–1284. 189 4. J. Buhmann, M. Lades, and F. Eeckman, “Asilicon retina for face recognition,” Tech. Rep. 8996-CS, Institute of informatik, University of Bonn, 1993. 189 5. D. Reisfeld and Y. Yeshurun, “Robust detection of facial features by generalised symmetry,” in Proc. International Conference on Pattern Recognition A, 1992, pp. 117–120. 190 6. Y. Adini, Y. Moses, and S. Ullman, “Face recognition: The problem of compensating for changes in illumination direction,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721–732, July 1997. 190, 191 7. P. Hallinan, “A low-dimensional representation of human faces for arbitrary lighting conditions,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, pp. 995–999. 190 8. R. Brunelli, “Estimation of pose and illumination direction for face processing,” Tech. Rep. TR-AI 1499, Massachusetts Institute of Technology, November 1994. 190, 191 9. R. Brunelli and T. Poggio, “Face recognition: Features versus templates,” IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042–1052, 1993. 190 10. P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, July 1997. 190, 194 11. G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley, New York, 1992. 190, 191 12. A. Z. Kouzani, F. He, and K. Sammut, “Towards invariant face recognition,” International Journal of Information Science, vol. 123, no. 1-2, pp. 75–101, 2000. 191, 194, 196 13. A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley and Sons, 2001. 191 14. R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Trans. Infor. Theory, vol. 38, no. 2, pp. 713–718, March 1992. 192 15. D. L. Swets and J. J. Weng, “Shoslif-o: Shoslif for object recognition and image retrieval (phase ii),” Tech. Rep. CPS-95-39, Michigan State University, October 1995. 194
Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform Kun Ma and Xiaoou Tang Department of Information Engineering, The Chinese University of Hong Kong Shatin, Hong Kong
[email protected]
Abstract. In this paper, we conduct a series of experiments to demonstrate the translation invariant property of a set of discrete wavelet features in a face graph. Using local-area power spectrum estimation based on discrete wavelet transform, we compute a feature vector that possesses both an efficient space-frequency structure and the translation invariant property.
1
Introduction
Wavelet transform has been widely studied in many aspects of image processing [2] [3] [4]. Especially, since discrete wavelet transform provides an efficient and nonredundant space-frequency representation of a signal or image, it has been widely studied in image compression and denoising research. However, for pattern recognition study, discrete wavelet transform has not been widely used. The basic requirement on a feature extraction method is translation invariance. That is, when a pattern is translated, its feature descriptors should also be translated, but not modified in its form. Such a property does not apply to the wavelet coefficients generated by fast discrete wavelet transform. The conflict between non-redundant structure and translation invariance is the main obstacle for wavelet application in pattern recognition. In this paper we use a local area spectrum computation to estimate wavelet features for face graph registration. The extracted features are shown to closely approximate translation invariance. Unlike traditional methods which solve the translation invariance problem by restoring the full-density representation [1] [4] [5], our method still uses the efficient computational structure of the discrete wavelet transform.
2
Translation Invariance
A system is time invariant if a time shift in the input signal results in an identical time shift in the output signal. If y (t ) is the output of a continuous-time system given x (t ) as the input, the system is time invariant if Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 200-210, 2001. Springer-Verlag Berlin Heidelberg 2001
Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform
201
x(t − T ) → y ( t − T ) ,
(1)
where T is a time shift. Spatial shift invariance is the two-dimensional analog of time shift invariance. If the input image is shifted relative to its origin, the output image is shifted in the same way. The time shift invariance and spatial shift invariance are called translation invariance in general. In a pattern recognition system, the translation-invariance property is crucial for stable feature estimation. CWT by Biorthogonal wavelet-bior2.2 122
114
114
106
106
98
98
90
90
82
82
74
74
scale: s
scale: s
CWT by Biorthogonal wavelet-bior2.2 122
66 58
66 58
50
50
42
42
34
34
26
26
18
18 10
10 2
50
100
150
200
2
250
50
100
150
200
250
time: u
time: u
(a) CWT of original signal
(b) CWT of shifted signal
Fig. 1. Translation invariance of CWT. The signal in (b) is shifted 2 pixels to the right
It is straightforward to see that both continuous wavelet transform and dyadic wavelet transform are translation invariant. Let fτ ( t ) = f (t − τ ) be a translation of
f (t ) ∈ L2 (R) by τ . The continuous wavelet transform of fτ (t ) is 1 * t −u )dt ψ ( s s +∞ . t '− (u − τ ) 1 = ∫ f (t ') ψ * ( )dt ' (t ' = t − τ ) −∞ s s = Wf (u − τ , s)
Wfτ (u, s) = ∫
+∞
f (t − τ )
−∞
(2)
Since the output is shifted the same way as the input signal, the continuous wavelet transform is translation invariant, as illustrated in Figure 1. The dyadic wavelet transform of fτ (t ) is Wfτ (u,2 j ) = ∫
+∞
−∞
=∫
+∞
−∞
1
f (t − τ )
f (t ')
1 j
2
j
ψ *(
ψ *(
2 = Wf (u − τ , 2 j )
t−u )dt 2j
. t '− (u − τ ) )dt ' (t ' = t − τ ) 2j
(3)
This shows that the dyadic wavelet transform is also translation invariant. A dyadic wavelet transform example is shown in Figure 2.
202
Kun Ma and Xiaoou Tang
1.5 1 0.5 0 -0.5
Dyadic wavelet transform by Biorthogonal-2.2
50
100
150
200
250
j=1
j=2
j=3
j=4
j=5
(a) Orignal signal f (t ) and its dyadic wavelet transform Wf (u, 2 j ) 1.5 1 0.5 0 -0.5
Dyadic wavelet transform by Biorthogonal-2.2
50
100
150
200
250
j=1
j=2
j=3
j=4
j=5
(b) Shifted signal f (t − 2) and its dyadic wavelet transform Wf (u, 2 j ) Fig. 2. Translation invariance of dyadic wavelet transform
Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform
203
However, the discrete orthogonal/biorthogonal wavelet transform is not translation invariant. Let fτ ( t ) = f (t − τ ) be a translation of f (t ) ∈ L2 (R) by τ . The orthogonal wavelets,
ψ j ,n ( t ) =
1 2
j
t − 2 j n , n, j ∈ Z , j 2
ψ
(4)
yield orthogonal wavelet transform coefficients, d j [n ] = Wf (2 j n, 2 j ) = f ,ψ j ,n = ∫
+∞
−∞
f (t )
1 2
ψ *( j
t − 2jn )dt . 2j
(5)
Translating f (t ) by τ gives d ' j [n ] = Wfτ (2 j n, 2 j ) = fτ ,ψ j ,n =∫
+∞
−∞
f (t − τ )
1 2
ψ *( j
t − 2jn )dt 2j
t '− (2 j n − τ ) ψ = ∫ f (t ') ( )dt ' (t ' = t − τ ) −∞ 2j 2j = Wf (2 j n − τ ,2 j ) +∞
1
. (6)
*
From Eq.(5) & (6), only when τ = k ⋅ 2 j , k ∈ Z , Wfτ (2 j n, 2 j ) = Wf (2 j ( n − k ), 2 j ) , i.e. d ' j [n] = d j [n − k ] . This means if the translation is the multiple of 2 j , the orthogonal wavelet coefficients of fτ (t ) is the translation of the coefficients of f (t ) ; otherwise, these coefficients may be very different. Therefore, discrete orthogonal wavelet transform is not translation invariant. For biorthogonal wavelets, a dual wavelet function is used for reconstruction. The above conclusion still holds. There are apparent differences between the dyadic wavelet transform and the discrete wavelet transform. The dyadic wavelet transform is a translation invariant representation because it does not sample the translation factor. But this creates a highly redundant signal representation. On the other hand, the discrete wavelet transform samples the time and scale in a dyadic grid, which is implemented by subband filters with downsampling operation. It has a very efficient computation scheme and a compact data structure. However, in such a multi-rate system, the translation of the input signal do not produce a simple translation of the output, unless the translation is a multiple of the corresponding downsampling factors. We use a simple experiment on a rectangle signal f [n ] to illustrate this problem. The discrete orthogonal wavelet transforms of both the original signal and a shifted signal are shown in Figure 3. Comparing the two transforms we see that the wavelet coefficients d1[n ] in the first layer (downsampling factor is 2) shift 1 unit without changes in value. However, in the other layers (downsampling factor is 4, 8, 16,…) the produced wavelet coefficients d j [n ] ( j = 2,...,5 ) change quite significantly. This conflict between computational efficiency and translation invariance has greatly hindered applications of wavelet transform in patter recognition.
204
Kun Ma and Xiaoou Tang
1.5 1 0.5 0 -0.5 d1[n]
Discrete wavelet transform by Daubechies wavelet db2
50
100
150
200
250
d2[n] d3[n]
d4[n]
d5[n]
a5[n]
(a) Original signal f [n ] and its DWT representation 1.5 1 0.5 0 -0.5 d1[n]
Discrete wavelet transform by Daubechies wavelet db2
50
100
150
200
d2[n]
d3[n]
d4[n]
d5[n]
a5[n]
(b) Shifted signal f [n − 2] and its DWT representation Fig. 3. Discrete wavelet transform and translation variance
250
Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform
3
205
Fast Translation Invariant Feature Extraction
Several approaches have been proposed to solve the translation invariance problem by restoring the full-density representation [1] [4] [5]. The methods either try to reduce computation complexity by introducing more storage complexity or try to reduce storage complexity by introducing more computation complexity. The non-redundant structure and translation invariant property seem incompatible when using fast DWT. For pattern recognition, the extracted features do not need to give a precise representation of the original image. The only requirement is that they can distinguish different patterns. So instead of using the DWT coefficients as image features directly, we estimate the local energy distribution in each subband as the feature values. Let G(ω ) be the power spectrum density of a wide sense stationary signal f (t ) , the time-scale based spectral estimator can be written as the time marginal of the scalogram (squared modulus of the wavelet transform) [1], G (ω j ) =
1 Nj
∑ < f (t ),ψ
j ,n
2 > ,
(7)
n
where N j is the number of wavelet coefficients in scale j. It has been proven that the dyadic sampling grid both in time and frequency do not deteriorate the estimation performance [1]. We can use this property to estimate the local area power spectrum. For a small local area, the spectrum content of the image should remain relatively constant with respect to translation. We now look into this property in a face image matching study. Let A be a small window around a fiducial point p = ( x p , y p ) in the face image,
A( p ) = {I ( x , y ) ( x , y ) − ( x p , y p ) < δ } ,
(8)
where δ defines the size of the neighborhood. In the DWT domain, the wavelet coefficients corresponding to the window A are distributed in all subbands and form a space-frequency tree. Let Rk be the window of the set of related coefficients in the kth subband,
{
l
Rk ( p ) = Wf k (u , v ) ( u, v ) − ( x p , y p ) / 2 <
δ 2
l
}
,
(9)
where Wf k (u, v ) represents the wavelet coefficients in the k-th subband. However, such a space-frequency tree is not suitable for fiducial points matching since the wavelet coefficients are not shift invariant due to the down sampling at each level. To alleviate the problem, we use the local square sum of wavelet coefficients within the small window Rk at each level to estimate the local area power spectrum around the
fiducial point,
206
Kun Ma and Xiaoou Tang
Ω ( Rk ) =
∑ Wf (u, v ) k
2
.
(10)
Rk
We then describe a fiducial point p by the vector, T
J ≡ J ( p ) = [Ω( R1 ), Ω ( R2 ),..., Ω( RK )] ,
(11)
where K is the number of subbands covered by the space-frequency tree. So each element in the vector approximates the energy of a small area of the original image at a particular location and in a particular frequency band. The whole vector can be seen as a power spectrum estimation around each fiducial point in a face image. Given two vectors J and J ' in two face images, their similarity function can be defined as the normalized correlation:
∑Ω ⋅ Ω ' j
S ( J, J ') =
j
,
j
∑Ω ⋅ ∑Ω ' 2 j
j
2 j
(12)
j
where Ω and Ω ' is the element of J and J ' vector. This similarity function gives a measure of whether two fiducial points are similar. The function has a value close to one when the two fiducial points match each other closely. Such a measure is important in face detection study.
4
Translation Invariant Analysis Experiments
In this section, we design a set of experiments to investigate the translation invariant property of the local-area power-spectrum estimation method. Given a face image I0 and its space-frequency tree R0 that centered at a point p, the local power spectrum vector is J0. We shift the original face to Is with displacement s, and extract new power spectrum vector Js corresponding to the space-frequency tree Rs of the shifted point p. To investigate the translation invariance of the power spectrum vector, we measure the similarity function between the two vectors. If the similarity function value is close to one, it shows that the two vectors are similar to each other thus the vector is translation invariant. A face image I0 of size 256x256 is shown in Figure 4(a), where the face portion occupies an area of 128x128. A six level DWT of the face image is displayed in Figure 4 (b). Figure 4 (d) shows the fiducial space-frequency tree for a 64x64 window centered around point p(64, 64). The DWT spectrum vector J0 is computed from the space-frequency tree. Figure 4 (c) is the face area reconstructed from the spacefrequency tree in (d). We now shift the face image to Is with displacement s(7,15), and compute a new spectrum vector Js from the shifted space frequency tree shown in Figure 5. Note that zero expanding is used to remove the boundary effect. If we replace the space-frequency tree of Is with the space-frequency tree in I0, then reconstruct a new face image, as shown in Figure 6, we can see that the reconstructed fiducial area changes dramatically comparing with the original face.
Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform
207
This shows that the shifted space-frequency tree is very different from the original one. So space-frequency tree is not shift invariant. However, the similarity value of the two local spectrum vectors is close to one. This shows that the local spectrum vector does not change much with the shift. To further verify the shift invariant property of the spectrum vector, we shift the face image over every point within an area of 64x64. Then the vector similarity values are computed for all the shift locations. The statistical results are shown in Figure 7 and Table 1. We can see that the similarity values are very close to one, with an average of 0.97. Thus local spectrum vector closely approximates shift invariant.
5
Conclusion
In this paper, we demonstrated the shift invariant property of local power spectrum vector of discrete wavelet transform using a set of experiments. Such a property is crucial for pattern recognition applications. It solves the basic conflict between efficient space-frequency representation and shift invariance. We are currently studying the application of this feature vector in face detection and face recognition research.
Acknowledgments We thank the Computer Vision Center of Purdue University for the face image database. The work described in this paper was fully supported by an AOE in Information Technology grant and a RGC grant (Project no. CUHK 4190/01E) from the Research Grants Council of the Hong Kong Special Administrative Region.
References 1. 2. 3. 4. 5.
Antoniadis and G. Oppenheim, Wavelet and statistics, Springer-Verlag, 1995. K. Chui, An Introduction to Wavelets, Academic Press, Boston, 1992. Daubechies, Ten Lectures on Wavelets, SIAM Publ., Philadelphia, 1992. S. Mallat, A Wavelet Tour of Signal Processing, 2nd Ed., Academic Press, 1999. E. P. Simoncelli, W.T. Freeman, E.H. Adelson, and D.J. Heeger, "Shiftable multiscale transforms", IEEE Trans. on Information Theory, Vol. 38, No. 2, pp. 587-607, Mar. 1992.
208
Kun Ma and Xiaoou Tang
(a) Original image
(c) Reconstruction from (d)
(b) Wavelet decomposition of (a)
(d) A space-frequency tree
Fig. 4. Original image and its space-frequency tree
Translation-Invariant Face Feature Estimation Using Discrete Wavelet Transform
(a) Image shifted by (7, 15)
(b) Wavelet decomposition of (a)
(c) Reconstruction from (d)
(d) A shifted space-frequency tree
209
Fig. 5. Shifted image and its space-frequency tree
Fig. 6. Reconstruction from space-frequency tree of Fig. 4(d) but at the position in Fig. 5(d)
210
Kun Ma and Xiaoou Tang Table 1. Statistics of spectrum vector similarity values
Max 1
S(J0, Js )
Min 0.886
Mean 0.9705
S.T.D. 0.0051
Jet Similarity Distribution
0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0.88
0.9
0.92
0.94
0.96
0.98
1
(a) Probability distribution of the spectrum vector similarity values
(b) Spectrum vector similarity values for all shifted locations Fig. 7. Translation invariance verification
Text Extraction Based on Nonlinear Frame Yujing Guan1 and Lixin Zhang2 1
Jilin University Information Technologies Co. Ltd Qianjin Rd. 95, Changchun, 130012, P. R. China
[email protected] 2 Mathematics Department, Jilin University Changchun, 130012, P. R. China zhang
[email protected]
Abstract. Locating and extracting text in image or video has been studied in recent decade. There is no method robust for all kinds of text, it may be necessary to apply different methods to extract different kinds of text and fuse these results temporarily. So finding new method is important. In this paper, we combine order statistic and frame theory and give a new method, it can extract text of various colors and size once, the experimental result is satisfying.
1
Introduction
In this new era of information explosion, especially because of the development of Multimedia and Internet, a lot of information present themselves as image or video. Problems about how to obtain the information one wants from them become more and more important. Among them, locating and extracting text in image is a very useful and challenging work. The text embedded in image or video usually provide information about the names of people, organization, or about location, subject, date, time and scores, etc. Those texts are powerful resources for indexing, annotation and content-oriented video processing. So a lot of people get to work with this problem in recent decade, many methods are proposed [1,2,3,4,5,10,11,12]. But it seems that each method has its limitation. For example, current optical character recognition(OCR) technology is restricted to finding text printed against clean backgrounds, and can not handle text printed against shaded or textured backgrounds or embedded in images. Even as S. Antani said in [1], none of the proposed text detection and localization methods was robust for detecting all kinds of text, it might be necessary to apply different methods to extract different kinds of text and fuse these results temporarily. This may be induced by the essential complexity of the problem but make it important to provide more methods for people to select according to the problem they face. In [6] and [7] Dr. Ma and Dr. Tang apply order statistic to detecting stepstructure and page segmentation, they get a good result. But their method is only used for binary image and can’t be used for gray image. In this paper we combine order statistic and frame theory and apply them to extracting text in Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 211–216, 2001. c Springer-Verlag Berlin Heidelberg 2001
212
Yujing Guan and Lixin Zhang
complex background and the result is satisfying. As we know, they have not been used in this field up to now. The proposed method first partition the gray image into a number of small adaptive blocks, for example of 16×16 size, and proceed to find text in each block. The value of each pixel in the block is supposed to be a sample observation of a random variable. Sort the samples in ascending order and get an order statistic. According to the text characters represented in the order statistic, if there is text in the block, there will be steps in the values of the order statistic. Frame is used to detect the steps. Of course, it is possible that there is step in the order statistic while there is no text, so at last it is necessary to test whether the step is formed by text by applying text characters.
2
Order Statistic and Text Characters
Definition 1 . Let (X1 , X2 , · · · , Xn ) be a sample. Order statistic is the statistic obtained by replacing (X1 , X2 , · · · , Xn ) in ascending order. They are denoted by (X(1) , X(2) , · · · , X(n) ), which satisfy X(1) ≤ X(2) ≤ · · · ≤ X(n) . In fact, we first partition the gray image into several blocks with suitable area, suppose there are n pixels in a block and sort them in ascending order, then we get order statistics. If there is text in this block, the order statistics would have the following characters: 1. The values in text comprise a subsequence of (x(1) , x(2) , · · · , x(n) ), which we denote (x(b) , x(b+1) , · · · , x(e) ), and its mean is distinct from those of its left subsequence and right subsequence. 2. There are k and K such that k ≤ e − b ≤ K; 3. There exists a positive constant ε > 0 such that x(e) − x(b) ≤ ε; 4. There exists a positive constant σT such that the variance of the text subsequence is smaller than σT2 . 5. All pixels in text form one or more curves.
3
Frame Transform
As we have declared in section 2, the gray values of the text form a subsequence, (x(b) , x(b+1) , · · · , x(e) ), in the order statistic (x(1) , x(2) , · · · , x(n) ) of the whole gray values in a block. In ideal case, The values in this subsequence are almost equal, but the neighboring values on the left or the right of it are significantly smaller or larger. In other words, there exist singularity points at the two endpoints of the subsequence. If we can find the correct singularity points, we are able to continue to separate the text from the background graphics. Wavelet has been successfully and frequently applied to singularity detection, but it is not adaptive here.
Text Extraction Based on Nonlinear Frame
213
Definition 2 The sequence {φn }n∈Λ is called a frame of a Hilbert space H, if there exist two constants A > 0 and B > 0 such that for any f ∈ H, A f 2 ≤ |< f, φn >|2 ≤ B f 2 . n∈Λ
When A = B the frame is said tight. Example 1: Let N = 2n + 1, be a positive odd number, −1 N √N , 0 ≤ t < 2 , ψ N (t) = √1N , N2 ≤ t < N, 0, otherwise. N (t) = and ψj,n
√1 2j
N ψ N ( 2tj − n), then {ψj,n (t)}j,n∈Z is a frame.
i,N N N Let fj,n =< f, ψj,n >, then the frame coefficient fj,n indicates the difference of two means of f at the left side and the right side of somepoint. In the following of this paper, call it Haar-N frame. It is obvious that this kind of frame vanish for constant. Similar to wavelet, a larger absolute value of frame coefficient indicates a larger step. But wavelet coefficient indicates a sharp change of the value of the function at a point and Haar-N frame coefficient indicates a step of means of the function at the left side and right side of a point. So, the Haar-N frame is not sensitive to noise while high frequency wavelet coefficient is very sensitive to noise. This character of Haar-N frame is very adaptive to detect change point in order statistic. To simplify discussion, denote the gray value of text, background, text noise and background noise by X,Y ,W 1,W 2 respectively, suppose W 1 and W 2 are zero mean, the sample of X + W 1 is less than the sample of Y + W 2. Because we do not know any other statistic property of the above statistic than E(X +W 1) is less than E(Y +W 2), so we want to detect E(Y +W 2)−E(X +W 1) to make sure where the samples of text order statistic are and where the samples of background order statistic are. The existence of noise usually makes the difference of adjoining points in the order statistic decrease or vanish, even makes it smooth, thus it is difficult to detect step with wavelet because wavelet coefficient is generally a linear combination of difference of adjoining points in the order statistic. For example, if we use haar wavelet, the wavelet coefficient at the step point is min(Y + W 2) − max(X + W 1), obviously it is very sensitive to noise and different from E(Y ) − E(X). But frame is not sensitive to noise, From the propostion [8, pp.139], we know the variance of noise becomes N1 times. But for our order statistic, we can not get such a good result, because we do not know where the samples of text are and where the samples of background are. On the other hand, though E(X +W 1) and E(Y +W 2) are unobtainable, we can calculate the mean of some samples with greater values for the text and that of some samples with smaller values for the background graphics respectively. Lastly we calculate the difference of these two means and base our detection on it instead of E(Y + W 2) − E(X + W 1). Sometimes, there is error, but it is better than wavelet.
214
Yujing Guan and Lixin Zhang
In fact, theoretically we have E(Y + W 2) − E(X + W 1) i=0,N −1 Yi + W 2i i=0,N −1 Xi + W 1i = lim − . N −→∞ N N So the previous difference of two means is just an approximate estimate of E(Y + W 2)−E(X +W 1) using finite samples, and at the same time it is also the Haar-N frame transform. Since the number of text sample is unknown, we must choose an adaptive N. A large absolute value of the frame coefficient indicates a step, but a step will make one or more coefficients’ absolute values large. Naturally, we should choose the coefficient with greatest absolute value to make sure where the step happens. In wavelet theory, these points are called Maxima points, more details see [9] and [8, ch.6], in this paper we also use this concept. Moreover, we should also notice that there may be large step in the order statistic induced by the complexity of background graphics. Thus we need a threshold τ , to indicate whether a step is large enough, since our supposition that the gray values of text are distinct from those of background graphics has guaranteed that the step at the correct change point should not be trival. By the way, we should notice that our frame coefficients for the order statistic are all nonnegtive in this paper. More precisely, after we get the frame coefficients, we proceed to let those smaller than τ be neglected, and we only test whether those points corresponding to the left frame coefficients are from text using the text characters presented in section 2.
4
Algorithms
After partitioning the gray image into a number of continuity regions with suitable area, such as into squares with m × n pixels, we replace the values in one block in ascending order and get the order statistic (x(1) , x(2) , · · · , x(mn) ). We use the stationarity of the gray values of the text and their distinction from the other values from the background graphics to reduce the separation of text to step points detection. Wavelet is not adaptive here, and we use Haar-N frame. Those step points separate the order statistic into several subsequences, and we will use the text characters to decide which one or several are from text. Here N > 0 is an integer, and let τ be a threshold, if a frame coefficient fi satisfy |fi | < τ then fi must not be a Maxima point formed by any step. Suppose the image has M blocks whose size is m × n, ε is the difference of the maximum and minimum of text sample, σT2 is the maximum variance of text, k and K are minimum and maximum number of text point in one block if there is text in this block. We give our algorithm as follows: Algorithm : For every block do 1. Get order statistic: Get the samples from current block, and sort them in ascending order to get the order statistic, X(0) , X(1) , · · · , X(mn−1) .
Text Extraction Based on Nonlinear Frame
215
2. Calculate frame coefficients: for i=0,1,· · ·,mn-1, calculate N i+ 2 −1 i−1 1 fi = √ X(j) − X(j) . N N j=i j=i−
2
Here when j < 0 , X(j) = X(0) , when j > mn − 1, X(j) = X(mn−1) . 3. Find Maxima points: (a) For i=0,1,· · ·,mn-1, if fi < τ , set fi = 0. (b) Find Maxima points, suppose the number of the Maxima points is a, sort the Maxima points in ascending order and denote them as α1 , α2 , · · · , αa ; So there are a + 1 subsequences of the order statistic as follows: {X(α0 ) , · · · , X(α1 −1) }, {X(α1 ) , · · · , X(α2 −1) }, · · · , {X(αa ) , · · · , X(αa+1 ) }, where we let α0 = 0, αa+1 = mn − 1. 4. For every subsequence of the order statistic, i = 0, 1, · · · , a, decide whether it is text according to the text charecters in setion 2.
Fig. 1. A scanned image and the extracted text image
5
Examples
We applied our new method to some pictures and got a satisfying result. Fig.1 left was a scanned image, the background is a purple flower, Fig.1 right is the result of the extracted text image. The text of Fig.2 left was added by computer, 3 color text, black, blue and red were added, and the image was conversed to gray image, Fig.2 right is the extracted text image from the background.
216
Yujing Guan and Lixin Zhang
Fig. 2. ext image added by computer and the extracted text image
References 1. S. Antani, D. Crandall, A. Narasimhamurthy, V. Y. Mariano, R.Kasturi, Evaluation of Methods for Detection and Location of Text in Video, In Proc. 4th IAPR International workshop on document analysis systems - DAS ’2000, Rio Othon Palace Hotel - Rio de Janevio, 10-13 December 2000. 211 2. A. Antonacopoulos and D. Karatzas, An Anthropocentric Approach to Text Extraction from WWW Images, In Proc. 4th IAPR International workshop on document analysis systems - DAS ’2000, Rio Othon Palace Hotel - Rio de Janevio, pp 515-525, 10-13 December 2000. 211 3. U. Gargi, S. Antani, R. Kastui, Indexing Text Events in Digital Video Databeses, In Proc. International conference on pattern Recognition, Vol. 1, pages 916-918, Aug. 1998. 211 4. Yassin M. Y. Hasan and Lian J.Karam, Morphological Text Extraction from Images, IEEE Transaction on Image Processing, Vol. 9, No. 11, pp 1978-1983, Nov. 2000. 211 5. Huiping Li, David Doermann and Omid Kia, Automatic Text and Tracking in Digital Video, IEEE Transaction on Images processing, Vol. 9, No. 1, pp 147-156, Jan. 2000. 211 6. Hong Ma, Yong Yu, Li Ma, M. Umeda, Detection of Step-Structure Edge Base on Order Statistic Filter, preprint. 211 7. Hong Ma, Zhou Jie, Yuanyang Tang, Nonlinear Stochastic Filtering Methods of Adaptive Page Segmentation, preprint. 211 8. Stephne Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, 1998. 213, 214 9. Stephne Mallat and W. L. Hwang, Singularity detection and processing with wavelets. IEEE trans. on info. theory, (38):617-643, March, 1992. 214 10. Anil K. Jain and Bin Yu, Automatic Text Location in Images and Video Frames, Pattern Recognition, Vol. 31, No. 12, pp 2055-2076, 1998. 211 11. Victor Wu, Raghvan Manmatha, and Edward M. Riseman, TextFinder: An Automatic System to Detect and Recognize Text in Images, IEEE Transaction on Patter Analysis and Machine Intelligence, Vol. 21, No. 11, pp 1224-1229, Nov. 1999. 211 12. Yu Zhong, Hongjiang Zhang, and Anil K. Jain, Automatic Caption Localization in Compressed Video, IEEE Transaction on Patter Analysis and Machine Intelligence, Vol. 22, No. 4, pp 385-392, Apr. 2000. 211
A Wavelet Multiresolution Edge Analysis Method for Recovery of Depth from Defocused Images Wang Qiang 1, Hu Weiping 1, Hu Jianping 2, and Hu Kai 2 1
Dept. of Physics and Electronic Science Guangxi Normal University, Guilin, Guangxi, 541004 2 Dept. Of Computer Science and Engineering, Beijing University of Aeronautics and Astronauts, Beijing 100083
[email protected]
Abstract. A approach of depth recovery from defocused image based on wavelet multiresolution analysis is proposed. The Lipschitz exponent is used to describe the singularity of the edge of an object in image. A curve of relationship between Lipschitz exponent and the distance from interested object to camera is obtained. Experiment proved the effective of the method.
1
Introduction
With the exploitation of industrial automation, It is more and more difficult for human vision to perform the product test task in the large scale producing line. Computer vision is becoming the key technique used to promote the producing efficiency and ensure the products qualities. It is more and more widely adopted. For example, the computer vision system can be used in automatic testing of the mechanical component production and also used in the monitoring or controlling of the conditions in the large scale producing line. For ordinary purpose, a two-dimension grayscale image system can serve the purpose of product testing and monitoring. There are also many application conditions that need three-dimension system to test the objects or products that are interested. In such an application the three-dimension construct must be dealt with and the three-dimension depth measure must be performed. Human eyes are very effective depth measuring system. The depth of interested object surface is obtained and set up by the combination of the two planar images obtained by the different viewpoints of the eyes. Because such kind solid viewing is under the natural light it is belong to the passive measuring[1]. Though the human two eyes viewing is very effective it is very difficult for the computer system to simulate such a solid vision. The first thing to be solved is that the corresponding points between the two planar images must be found. This will cost a lot of time consuming and complex computing work. In addition there may not be the sufficient information which can be used to set up the one to one corresponding relationship at the interested points. Then can we bypass the computing and finding of the corresponding points in a solid scene and get the depth information by analyzing and Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 217-222, 2001. Springer-Verlag Berlin Heidelberg 2001
218
Wang Qiang et al.
computing a single grayscale image? We will answer the question and propose an approach of depth recovery from defocused image based on wavelet multi resolution analysis in this paper. Experiment proved the effective of the method. The edge of object in a scene is very important. It carries a lot of useful information about the object. Usually, an image is processed only by edge finding and binary coding [3]. A lot of useful information is wasted. It has advantages to use the wavelet multi resolution to analyze the singularity of an object and use some parameters to describe the different characters of the objects’ edge in scene [2][5][6]. This will be very helpful to understand the content in a scene. In this paper we try to compute the Lipschitz exponent of an objects’ image by the wavelet transform multiresolution analysis and use the Lipschitz exponent as a criterion to judge the objects’ defocused degrees. Then we can get the relationship between the Lipschitz exponent of the interested object and the distance from it to camera in curve line. This paper is organized as following: The principle of optical imaging of an object in a scene is introduced first. Then the wavelet edge multiresolution analysis and the Lipschitz exponent computing method is described. Some of the experimental result are given in the following part . Finally we give the estimating of advantages of the proposed approach.
2
Object‘s Defocused Imaging
The front-terminal of the computer vision system is sensor part .To satisfy the realtime testing request in the industrial automation system and to reduce the data processing a camera with 256 degree grayscales is usually selected as a sensor. The optical system construction of the sensor can be abstracted as figure 1[4].
Fig. 1. Focused image of a point
Fig. 2. Defocused image of a point
In figure 1, suppose P is a single point at the surface of the interested object. Consider a luminous point P in a 3D scene located on the optical axis of the camera lens as shown in figure 1. Light emits from point P in all directions. The divergent bundle of rays passes through the lens and converges to a point again on the optical axis. If the convergent point lies exactly on the CCD image plane it forms a sharp point (as shown in figure 1). Under such a condition if the point was replaced by a real object the clear image of the object forms on the CCD image plane. If we change the position of point P by increasing the distance between P and the camera lens the image of P point no longer converges on the CCD image plane. This is shown on figure 2. At such condition the point P forms blur circle area instead of a clear focused image point. The radius of the blur circle area is related to the defocused extent
A Wavelet Multiresolution Edge Analysis Method for Recovery of Depth
219
(denoted by ‘r’. If we replace the P point with a real object at this condition then we obtained a blur image on the CCD plane. In the practical application system, when the camera is selected the distance between lens and the CCD plane remains unchanged. If the focal length is fixed then the distance which enable the interested object in scene imaging clearly is fixed. If we change the distance between the object and the camera along the optical axis the blur degree of the object image is also changed. The further the object departs the P point the severer the defocused phenomena becomes (Of course there are two conditions of defocused R>R0 and R
R0) Through above analysis we can derive that if we denote ‘r’ as the radius of defocused blur area and R0 denote the focal length and R denote the distance between the defocused object and the camera then we have following relation. r=K(1/R0-1/R)
(1)
Here K is a coefficient. It relates the distance between the lens and the CCD plane of the camera. It also relates the focal length and the aperture of the camera. When all of these parameters are decided the K remains unchanged. Then from the formula (1) we can derive that the radius of defocused circle area will increase with the distance increasing. The formula (1) can be written as following form: R=R0*K/(K-r*R0)
(2)
This formula express that if the R0 and K are given the distance (denoted by R) between the interested object and the camera lens can be obtain by measuring the defocused radius ‘r’. This principle has been used by a few researchers [4],[7]. In this paper we will introduce an algorithm based on the wavelet multi resolution analysis to implement the principle and introduce the Lipschitz exponent to describe the regularity of the object edge in scene so that to determine the distance from camera to the interested object. If the objects in scene have some evident feature of texture the defocused extent of the texture can also be used to measure the depth of the object [4] In this paper the general condition is considered so we only consider the defocused extent of the edge and contour.
3
Wavelet Multiresolution Analysis and Lipschitz Exponent
From above analysis we can make a conclusion that the distance between the camera and the interested point can be measured by computing the radius of defocused area. The contour of the real object is consisted of countless points so if the interested point is replaced by the real object and all of the points are in the defocused position we will obtain a blur contour. Actually we can measure the depth of an object by analyzing the grayscale gradient (say singularity) of its edge. To do this, an effective way is wavelet multi resolution edge analysis.
220
•
Wang Qiang et al.
Wavelet transform
The basic theory of the 2D image Wavelet transform has been stated in many books and referent materials [5],[6]. We don’t want to describe the theory in detail. For thorough presentation of the wavelet transform, refer to the mathematical books of Meyer [5],[6]. In this paper the Mallat fast algorithm for 2D wavelet transform is used as following: j= 0 while (j<J) w1dj +1 f =s d j f ∗(G j , D ) 2 2 w2 dj+1 f =s d j f ∗( D,G j ) 2 2 d d f = s j +1 s j f ∗( H j, H j ) 2 2 j= j+1 end of while The G,D,H are coefficient of filter. The G is the coefficient for high frequency band, while H is the coefficient for low frequency band and D is Dirac function. To know the value of these coefficients and the method to convolute them with the 2D image signals, refer to the reference [5]. If we use w1 j f ( x, y ) and w2 j f ( x, y ) to denote horizontal and vertical 1D 2
2
wavelet transform respectively. The 2D wavelet transform modulus and direction of the gradient are given respectively by M 2 f ( x, y )= | w12 f ( x, y ) | 2 + | w22 f ( x, y ) | 2 j
j
j
A2 j f ( x, y ) = arctan( •
| w22 j f ( x, y ) | | w12 j f ( x, y ) |
)
(3)
(4)
Lipschitz exponent computing
We perform the dyadic wavelet transform in the image in which the interested object is included. We remain the pixels with the wavelet transform modulus above a given threshold. Then we can obtain the contour of the object. Finding the modulus maximum along the gradient direction of transform modulus and linking these pixels can set up a link of transform modulus maximum. We calculate the average value of the transform modulus in maximum link for each scale and denote it by Mj. We do the same from the evolution across dyadic scales (generally, 3 or 4 scales is enough) It can be proved that the 2D wavelet transform satisfies the following inequality: | w2 j f ( x, y ) |≤ k 2 j s 0
α −1
here s 0 = 2 2 j + σ 2
(5)
A Wavelet Multiresolution Edge Analysis Method for Recovery of Depth
221
We computer the three parameters K,α,and σso that the inequality of (6) is as close as possible to an equality for each dyadic scale. That is to minimize the following: i
∑ (log 2 | M j | − log 2 (k ) − J − j =1
α −1 2
)
log 2 (σ 2 + 2 2 j ) 2
(6)
The input parameters of (7) are average modulus maxima Mj and the dyadic scale j. Theαin (7) is the Lipschitz exponent which can describe regularity of the object edge. The k and σare two other parameters related to the object edge.
4
Experiment
We use the 256 grayscale industrial camera in the experiment. The focal length of the camera is adjusted to minimum (30cm). In order to decrease the effective scene depth of the camera the aperture is set to maximum so that to enhance the sensitivity to distance changing of interested object. Five images of capital character D were taken in our experiment. Figure3 (a) shows that the image of the character ‘D’ which is at the focal plane of the camera (30cm). Figure3 (b) to (e) each increase the distance of 5cm form the former respectively.
(a)
(b)
(c)
(d)
(e)
Fig. 3. Images with different distance
We use the quadratic spline as wavelet. In the experiment we perform the wavelet transform in different dyadic scale 2j (j=1,2,3) and record the modulus and gradient angle computed by formula (3) and (4). Then we threshold these modulus and search the modulus maximal along the direction of grayscale gradient. We obtain the modulus maximal link by linking these pixels. The average of the pixels modulus of the linking line are denoted by M1 M2 M3 respectively. Table 1 shows the data in experiment. The ‘d’ denote distance, ‘M1’,‘M2’,‘M3’ denote average value of the linking line, ‘α’denote singularity of edge, ‘k’ and ‘σ’ are two other parameters . Table 1. The data in experiment
sample #1 #2 #3 #4 #5
d (cm) 30 35 40 45 50
M1 103 93 63 47 34
M2 119 120 96 72 55
M3 135 139 128 115 94
α 0.24 0.31 0.50 0.64 0.65
k 83.8 74.95 46.34 30.50 24.22
σ 0.00 0.00 0.72 0.77 12.83
222
Wang Qiang et al.
We can draw a curve according to the distance d and the responding Lipschitz exponent α shown in figure 4.
Distance between object and lens (cm) Fig. 4. The curve of relation betweenαand distance
If we take image of objects with unknown distance we can measure the distance and the depth from figure 4 by computing the Lipschitz exponent α(in certain range).
5
Conclusion
The validity of the proposed method is proved by the results of the experiment. This new approach has the following advantages: With the new approch above we can get the depth information of different objects in a scene. The equipment is simple(only a set of the grayscale camera system) and the measuring is easy to realize. The method can be used in image segmentation. We can use the mehod above to distinguish the blur extent of the edges so that to realize the image segmentation. Different kind edges and contours can be chosen and segmented according to the prior knowledge about edge of the objects.
References 1. 2. 3. 4. 5. 6. 7.
Zheng Nanning,Computer Vision and Pattern Recognition, Defence and Industry Press,China (1998) PP169-191 Zhang Huobao, Multiresolution Edge Extraction Based on Orthowavelet, Chinese Transaction on Image and Graphics, Vol.3, No.8, 1998, PP651-654 Canny J, A Computational Approach to Edge Detection, IEEE T-PAMI, Vol.8, No.6, 1986, PP679-698. Sridhar R. Kundur, Novel Active Vision-Based Visual Threat Cue for Autonomous Navigation Tasks, Computer Vision and Image Understanding, Vol.73, No.2, 1999, PP169-182. Mallet S , Zhang S, Characterization of Signals from Multiscale Edge, IEEE TPAMI, Vol.14, No7, 1992, PP710-732 Mallet S, Hwang W, Singularity Detection and Processing with Wavelets, IEEE Transaction on Information Theory, Vol.38, No.2, 1992, PP617-643. A. N. Rajagopalan and S.Chaudhuri, An MRF Model-based Approach to Simultaneous Recovery of Depth and Restoration from Defocused Images, IEEE T-PAMI, Vol.21, No.7, 1999, PP578-589.
Construction of Finite Non-separable Orthogonal Filter Banks with Linear Phase and Its Application in Image Segmentation Hanlin Chen1 and Silong Peng2 1
2
Inst. of Math., Academia Sinica 100080, Beijing, PRC [email protected] NADEC, Inst. of Automation, Academia Sinica 100080, Beijing, PRC [email protected]
Abstract. In [7], a large class of bi-variate finite orthogonal wavelet filters was constructed. In this paper, we propose a more general expression of the filter bank with linear phase which is called standard method. Beside this, a non-standard method is also presented. A interesting example is also given. By using this non-separable wavelet filter bank, we present a novel method of segmenting a image into two parts: one part is texture with special property and another part is image of piecewise smooth in some sense.
1
Introduction
Recently, many researchers are working on non-separable wavelets (see [1,2,3,4,7,8] and the references therein). In [7], a large class of bivariate compactly supported orthogonal symmetric wavelet filters (low-pass and high-pass) with arbitrary length are presented in explicit expression. In this paper, we give another two methods of constructing bivariate compactly supported orthogonal symmetric wavelet filters. The standard method in this paper is similar to that of [7], but it’s result is more general. We prove that non-standard method is included in the standard method. The standard method is introduced in next section. The non-standard method will be given in section 3. A simple image segmentation method by using the filters is given in section 4.
2
Standard Method
Let {Vj } be a two dimensional MRA, then there exists a function m0 (ξ, η)(ξ, η ∈ ˆ η), where ϕˆ is the Fourier transform of ϕ, R) such that ϕ(2ξ, ˆ 2η) = m0 (ξ, η)ϕ(ξ, and m0 is called Symbol Function of the scaling function ϕ. The orthogonality of {ϕ(x − j, y − k)}j,k∈ZZ implies that m0 satisfies |m0 (ξ, η)|2 + |m0 (ξ + π, η)|2 + |m0 (ξ, η + π)|2 + |m0 (ξ + π, η + π)|2 = 1. Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 223–229, 2001. c Springer-Verlag Berlin Heidelberg 2001
(1)
224
Hanlin Chen and Silong Peng
If a trigonometric polynomial m0 (ξ, η) satisfies (2.1) and m0 (0, 0) = 1, we call m0 a Orthogonal Lowpass Wavelet Filter. Assume that m(x, y) is a polynomial of x and y with real coefficients. Rewrite m(x, y) into its polyphase form as m(x, y) = f1 (x2 , y 2 ) + xf2 (x2 , y 2 ) + yf3 (x2 , y 2 ) + xyf4 (x2 , y 2 ).
(2)
It is easy to see that m(eiξ , eiη ) satisfies (1) is equivalent to 4 ν=1
|fν (eiξ , eiη )|2 =
1 , 4
ξ, η ∈ R.
(3)
In this paper, all polynomials are with real coefficients in default. In some applications, it is better to use a filter with linear phase than a filter with nonlinear phase. It is well known that in one dimension case, there does not exist a orthogonal filter with linear phase beside Haar filter. But in high dimension case, we can find many filters with linear phase. Definition 1. Given a polynomial m(x, y), if m(eiξ , eiη ) = ±e−iM1 ξ e−iM2 η m(eiξ , eiη ),
(4)
where M1 and M2 are positive integers, then we say that m(eiξ , eiη ) has Linear Phase. To our purpose, we introduce a kind of matrix transform. For a matrix A of size m × m, define AS := Hm AHm , where Hm = (hkl )m k,l=1 is matrix of size m × m, with hkl = 1 when k + l = m + 1 and 0 otherwise. Moreover, denote U2 to be the set of all real unitary matrices with size 4 × 4. The following theorem give a large class of symmetric wavelet filters. Theorem 1.
Let N 1 (Uµ Dµ (x2 , y 2 )UµT ) · V0 , m(x, y) = X · 4 µ=1
(5)
where Uµ ∈ J2 := {U |U ∈ U2 , U = ±U S }, and Dµ (x, y) = diag{1, x, 1, x}, or diag{1, y, 1, y}, for µ = 1, · · · , N , and V0 = (1 1 1 1)T , X = (1 x y xy), then m(eiξ , eiη ) is a symmetric filter and satisfies (1). Proof. The proof is direct. Remark 1. Although the form of (5) is similar to that of [7], but we can see that this form is more general. In fact, the filters given by non-standard method later are included in this form, but not included in the expression of [7].
Construction of Finite Non-separable Orthogonal Filter Banks
3
225
Non-standard Method
In this section, we will introduce a new method to construct finite orthogonal symmetric wavelet filters. Given a polynomial with real coefficients m(x, y), and let (6) m(x, y) = f1 (x2 , y 2 ) + xf2 (x2 , y 2 ) + yf3 (x2 , y 2 ) + xyf4 (x2 , y 2 ). If
m(eiξ , eiη ) = ±e−i(2M1 +1)ξ e−i(2M2 +1)η m(eiξ , eiη ), 2M1 +1 2M2 +1
that is m(x, y) = ±x
y
m( x1 , y1 ),
1 1 f1 (x, y) = ±xM1 y M2 f4 ( , ), x y
(7)
then we obtain
1 1 f2 (x, y) = ±xM1 y M2 f3 ( , ). x y
(8)
If m(eiξ , eiη ) satisfies (1), then f1 , f2 , f3 and f4 satisfy (3). Substitute (8) into (3) to obtain 1 (9) |f1 (eiξ , eiη )|2 + |f2 (eiξ , eiη )|2 = . 8 Conversely, if we have two polynomials f1 and f2 satisfy (9), then (8) will give f3 and f4 , such that we can get a finite orthogonal symmetric wavelet filter. The following theorem give a large class of the solutions of (9). Theorem 2. Let (f1 (x, y) f2 (x, y))T =
N 1 (Aµ Eµ ATµ ) · (1 1)T , 4 µ=1
(10)
where Aµ is any real unitary matrix of size 2 × 2, Eµ = diag(1, x) or diag(1, y), for µ = 1, · · · , N , then f1 and f2 satisfy (17). Proof. Since Aµ ’s are unitary matrices, and Eµ ’s are paraunitary matrices, the conclusion is followed immediately. The non-standard method looks like different from the standard method, but in fact, all filters result from non-standard method can be constructed by standard method, which is the following theorem. Theorem 3. Let m(x, y) = f1 (x2 , y 2 ) + xf2 (x2 , y 2 ) + yf3 (x2 , y 2 ) + xyf4 (x2 , y 2 ), where f1 (x, y) and f2 (x, y) satisfy (10), m(x, y) satisfies (7), then we have m(x, y) =
where Uµ =
Aµ 0 0 ASµ
N 1 X· (Uµ Dµ (x2 , y 2 )UµT ) · V0 , 4 µ=1
, Dµ (x, y) =
Eµ (x, y) 0 0 Eµ (x, y)
(11) .
226
Hanlin Chen and Silong Peng
By using this construction method, we can construct the following nonseparable filter banks. The lowpass filter is: 1 −1 1 1 1 1 1 1 −1 . 8 −1 1 −1 1 1 1 −1 1 The three highpass filters are
1 −1 1 1 −1 1 1 1 1 −1 −1 −1 1 1 1 1 −1 , 1 −1 −1 1 −1 , 1 1 1 −1 1 . 1 −1 −1 −1 1 −1 1 1 1 −1 1 1 8 8 8 −1 −1 1 −1 −1 −1 −1 1 −1 −1 −1 1
4
Image Segmentation with Non-separable Symmetric Filter
The filters given in previous section is very good in some sense: their element are 1 or −1 (if we omit the factor 18 ) which will be useful in computation; they are all with linear phase, two of them are symmetric, the other two are anti-symmetric. These filters have bad regularity in contrast with the well known biorthogonal 9/7 wavelets. These filters act as derivative operators such as Sobel operator. The following examples illustrate this fact.
Fig. 1. Original image
Image segmentation is important in many applications such as image compression and computer vision. In some applications, it will be useful to segment an image into two parts: one part is include regions with dense edges, and the other regions are with few edges. In general case, a sub-area of a image with dense edges maybe texture, which will be difficult to processed in applications such as compression. By using the derivative property of the filters, we present a novel method to do segmentation. In the area which may be texture, the distance between edges are short. In addition, there are always all direction of edges, this means that for each channel
Construction of Finite Non-separable Orthogonal Filter Banks
227
Fig. 2. Filtering result of the first high-pass filter. Left: bigger than 16. Right: smaller than -16
Fig. 3. Filtering result of the second high-pass filter. Left: bigger than 16. Right: smaller than -16 of the above filter, various edges will appear in the texture region. On the other hand, we only have one kind of edge near the edge of piecewise smooth area. Therefore we utilize this fact to segment a image. Here we have three high-pass filters, we call them three channels. In this algorithm, we do not do the down-sampling, just convolute the image with the filters. Suppose B is the currently processed channel. Let BP 1 = B > th1 and BP 2 = B > th2, where th1 > th2 are two positive numbers. In BP 2, we can find all the areas which contain at least one point of BP 1, all these areas put together to obtain BP . Similarly we can get the BM in which every point is negative number in B. Let DBP is a matrix with same size of B, its elements is 0, BP (p) = 0 DBP (p) = d(p, BM ), otherwise where p = (i, j) is a point, d(p, BM ) is the distance of p and the nonzero point in BM . Similarly, we can define DBM . If a point in B is located in the texture area, then at least one of the corresponding value in DBP and DBM are small. Let SB is a matrix which indicates that at each point, both values in DBP and DBM are smaller than a given threshold th. Of course, the selection of th depend on the scale of the texture one prefer, and the segment result depends on the selection of the threshold. Let S1, S2 and S3 be the corresponding segment results of three channels respectively. Then we can segment roughly as S = S1 + S2 + S3 > 1, which
228
Hanlin Chen and Silong Peng
Fig. 4. Filtering result of the third high-pass filter. Left: bigger than 16. Right: smaller than -16 means that, if a point is located in the texture area, then it will appear in at least two of the channels. At last to obtain a true area, we need to do some small dilation and erosion. A segmentation example is given as follows by using the above filters and the well known image.
Fig. 5. Original Image
Fig. 6. Segmentation result (th = 20)
References 1. I. Daubechies, Ten Lectures on Wavelets, CBMS, 61,SIAM, Philadelphia, 1992. 223 2. Wenjie He and Mingjun Lai, Construction of Bivariate Compactly supported Biorthogonal Box Spline Wavelets with Arbitrarily High Regularities, Applied Comput. Harmonic Analysis, 6(1999) 53-74. 223 3. Wenjie He and Mingjun Lai, Examples of Bivariate Nonseparable Compactly Supported Orthonormal Continuous Wavelets, Wavelet Applications in Signal and Image Processing IV, Proceedings of SPIE, 3169(1997) 303-314. 223 4. J. Kovacevic and M. Vetterli, Nonseparable multidimensional perfect reconstruction filter banks and wavelet bases for Rn , IEEE Tran. on Information Theory, 38, 2(1992) 533-555. 223 5. S. Mallat, Review of Multifrequency Channel Decomposition of Images and Wavelet Models, Technical report 412, Robotics Report 178, New York Univ., (1988).
Construction of Finite Non-separable Orthogonal Filter Banks
229
6. Y. Meyer, Principe d’incertitude, Bases hilbertiennes et algebres d’oper-ateurs, Seminaire Bourbaki 662,1985-86, Asterisque (Societe Mathematique de France). 7. Silong Peng, Construction of Two Dimensional Compactly Supported Orthogonal Wavelet Filters with Linear Phase, (to appear in ACTA Mathematica Sinica), (1999). 223, 224 8. Silong Peng, Characterization of Separable Bivariate Orthonormal Compactly Supported Wavelet Basis, (to appear in ACTA Mathematica Sinica), (1999). 223 9. Silong Peng, N dimensional Compactly Supported Orthogonal Wavelet Filters, (to appear in J. of Computational Mathematics), (1999).
Mixture-State Document Segmentation Using Wavelet-Domain Hidden Markov Tree Models Yuan Y. Tang 1, Yuhua Hou 2, Jinping Song 2, and Xiaoyi Yang 2 1
Department of Computer Science, Hong Kong Baptist University Hong Kong [email protected] 2 Department of Mathematics, Henan University Kaifeng, 475001, China [email protected]
Abstract. In this paper we introduce a mixture-state document segmentation method based on wavelet and the hidden Markov tree (HMT) models. First we propose a three-state HMT segmentation method that is similar to those in the reference [1]. Then through comparing the difference weights to the three-density Gaussian mixture distribution of different textures, we find that background, text and image can be well approximated respectively by one-state and two-state and three-state HMT models. Then we get a new segmentation method, mixture-state HMT segmentation. Experiments with scanned document images indicate that the new approach improves the segmentation accuracy over the raw segmentation in [1].
1
1.1
Three-State HMT Segmentation
Two-State HMT Segmentation
The work on document segmentation by wavelet-domain hidden Markov tree (HMT) methods have been considered in many papers, such as [1,2], in which the documents are divided into classification blocks, and decisions are made independently for the class of each block. For example, Hyeokho Choi and Richard Baraniuk [1] proposed a multiscale document segmentation algorithm which divided document into dyadic squares at different scales, the dyadic squares at some scale are obtained simply by recursively dividing the document into four square subdocuments of equal size. In that way, every parent square has four children squares and then all dyadic squares have a convenient quad-tree structure. By using the simplest 2-D Haar wavelet LH
HL
HH
transform, the wavelet coefficient matrices w , w , w at different scales lead naturally to quad-tree structure on the wavelet coefficient in each subband and each wavelet coefficient node corresponds to a related dyadic image square. Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 230-236, 2001. Springer-Verlag Berlin Heidelberg 2001
Mixture-State Document Segmentation
Just as the notations in [1], we denote a dyadic square at scale abstract index enumerating the squares at scale
231
j by d i j (with i an
j ). The two extremes d 00 and d iJ
wi denote a generic wavelet coefficient of some subband tree at a certain scale (sometimes we denote i for simple). Define ρ (i ) tobe the parent of node i . Given a subband, define Ti to be the subtree of wavelet coefficients with root node i . In practice the probability density function pdf f ( wi ) is unknown. The paper [1] are root and leaves of the tree respectively.
associates a discrete hidden state states
Si = r
in (r
Let
Si that takes two values and connects the hidden
a
=
directed Markov-1 probabilistic graph. Given a condition: S , L ), the probability density function pdf f ( wi ) of each wavelet
coefficient is approximated by a two-density Gaussian mixture model
f ( wi ) = ∑ pS i (r ) f ( wi | Si = r ) r =S ,L
where
(1)
f ( wi | Si = r ) ~ N ( µi , r ,σ i2,r ) , and pS i ( S ) + pS i ( L) = 1 .
Θ LH , Θ HL , Θ HH the parameter sets for LH , HL and HH subbands, and M := Θ LH , Θ HL , Θ HH . Define β i (r ) := f (Ti | Si = r , Θ) , and denote Denote
{
}
f (Ti | Θ) = ∑ β i (r ) p ( Si = r | Θ) r = S ,L
(2)
The overall likelihood of some dyadic squares can be computed by
f (d i | M ) = f (Ti LH | Θ LH ) f (Ti HL | Θ HL ) f (Ti HH | Θ HH )
(3)
The raw HMT segmentation method [1] is then given by
ciML := arg max
c∈{1, 2 ,!, N c }
f (d i | M c )
(4)
N c is the number of class labels, and M c indicates the corresponding parameter set M for some class label c .
where
1.2
Three-State HMT Segmentation
In this paper, we associate the discrete hidden state
Si that takes on three values
r = S , M , L with probability mass function pmf pS i (r ) . Condition Si = r , wi is three-density Gaussian mixture model with mean overall pdf is given by
µi , r and variance σ i2,r
. Thus, the
232
Yuan Y. Tang et al.
f ( wi ) =
∑p
r =S ,M ,L
where
Si
(r ) f ( wi | S i = r )
f ( wi | Si = r ) ~ N ( µi , r ,σ i2,r ) ,
each parent-child pair of hidden states
{S
(5)
pS i ( S ) + pS i ( M ) + pS i ( L) = 1 . For ρ (i )
, S i }, the state transition probability
matrix becomes
ε iρ, S( i ), S ρ ( i ), M ε i ,S ε ρ ( i ), L i ,S
ε iρ,M(i ),S ε iρ, M(i ),M ε iρ,M(i ), L
1 − ε iρ,M(i ), S − ε iρ, L( i ), S ε iρ, S( i ),M = ε iρ, S( i ), L
ε iρ, L(i ),S ε iρ, L(i ),M ε iρ, L(i ),L ε iρ, M( i ),S
ε iρ, L(i ),S ε iρ, L(i ),M
1 − ε iρ, S( i ), M − ε iρ, L( i ), M
ε iρ, M( i ), L
(6)
1 − ε iρ, S( i ), L − ε iρ, M( i ), L
Accordingly, the formula (2) turns into
f (Ti | Θ) =
∑ β (r ) p( S i
r =S ,M ,L
i
= r | Θ)
(7)
Similar to [1], we can obtain a three-state HMT segmentation method through Eqs. (3)-(7). Experiments of the scanned document images indicate that the new approach improves somewhat the accuracy of the segmentation over the raw segmentation in paper [1], as shown in Fig. 1. For all results of the segmentation, we use “white”, “gray”, and “black” to represent background, text and image regions respectively.
Fig. 1ab. (a) 512 × 512 document image used for training of the HMT models for text, image, and background textures. (b) Original 512 × 512 document image to be segmented
Mixture-State Document Segmentation
233
8x8 block size
2x2 block size
4x4 block size
Fig. 1c. Segmentation result by two-state HMT segmentation
(d) Fig. 1d. Segmentation result by using the proposed three-state HMT segmentation algorithm
2
Mixture-State HMT Segmentation
During the course of training three-state HMT models by a lot of experiments, we find that the weights, which is the values of pmf p Si ( r ) , to the three-density Gaussian mixture distribution of different textures are stable relatively. This can found in Figs. 2 and 3.
p Si (r ) to the three-density Gaussian HH subband at J scale, the finest scale
Fig. 2. The different weights background, text, image in
mixture distribution of
234
Yuan Y. Tang et al.
p Si (r ) to the three-density Gaussian background, text, image in HH subband at scale of J − 1
Fig. 3. The different weights
mixture distribution of
At the finest scale, and in HH subband, Fig. 2 shows that the three weights 0.8907, 0.0502, 0.0591 for background, the three weights 0.0889, 0.4175, 0.4936 for text, the three weights 0.1627, 0.3032, 0.5341 for image. Similar results could be obtained in other subbands and scales. This reminds us that background, text and image perhaps can be well approximated respectively by onedensity, two-density and three-density Gaussian mixture models. With this idea, we propose a new HMT document segmentation algorithm, the mixture-state HMT segmentation, that approximates background with one-density Gaussian model, text with two-density Gaussian model and image with a multidensity Gaussian model (specially with a three-density Gaussian model in this paper). In fact, we can regard multi-density Gaussian mixture models as a set:
p (r ) f ( w | S = r ) , ∑ S i i i r for some p Si ( r ) = 0 . We can obtain one-density, two-density, three-density Gaussian mixture models and so on. To obtain the segmentation from an original picture, first we train HMT models respectively for background with one-state, text with two-state and image with three-state to achieve parameter sets M. Then the likelihood of the coefficients in subtree Ti can be computed by
f (Ti | Θ) = ∑ β i (r ) p( S i = r | Θ) r
(8)
where r stands for the state. Finally, we can obtain the result of the segmentation by using formulas (3) and (4).
Mixture-State Document Segmentation
235
Experiments with scanned document images, as shown Fig. 4, indicate that the new approach, the mixture-state HMT segmentation, improves the accuracy of the segmentation much batter than the raw segmentation in paper [1] and the three-state HMT segmentation proposed previously by Choi and R. G. Baraniuk. Furthermore, since the new method regard background, text and image as the different density Gaussian mixture distributions, the training and testing processes will become simpler than the multi-state HMT segmentation method. And our algorithm can offer improved segmentation accuracy with lower computational burden compared with the raw segmentation in [1].
Mixture-state
(b) Three-state
Two-state
(a)
Mixture-state
Three-state
Two-state
(c)
Mixture-state
Three-state
Two-state
(d)
(e) Fig. 4. (a) 512 × 512 document image used for training of the HMT models for background, text, and image textures. (b) Original 512 × 512 document image to be segmented. (c) Segmentation in 2 × 2 block size by using two-state HMT segmentation, three-state HMT segmentation and mixture-state HMT segmentation respectively. (d) Segmentation in 4 × 4 block size. (e) Segmentation in 8 × 8 block size
236
Yuan Y. Tang et al.
References 1. 2.
H. Choi and R. G. Baraniuk, Multiscale Document Segmentation using WaveletDomain Hidden Markov Models, Science & Technology , Janu. 2000. M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-Based Statistical Signal Processing using Hidden Markov Models,” IEEE Trans. Signal Proc. 46, April 1998.
Some Experiment Results on Feature Analyses of Stroke Sequence Free Matching Algorithms for On-Line Chinese Character Recognition Tak Ming Law Hong Kong Institute of Vocational Education (Morrison Hill) Department of Computing, 6 Oi Kwan Road, Wan Chai, Hong Kong [email protected]
Abstract. We have built several trial programs (system) to test the assumptions we have made on improving the speed and accuracy of online Chinese character recognition results. This paper describes an online Chinese character database, which is built upon the analysis of the character segments. Therefore, the structure of the database covers only on the features of individual segment and the relations used to distinguish one character from another. On the other hand, the stroke sequence free algorithm checks the stroke segments iterately until all segments are evaluated between the input and reference characters. Therefore, even the users input the characters against the stroke order rules, the system still be able to get the correct result. Since the distance measure algorithm depends only on the features of segments, the system need not to distinguish radical and structure within characters.
1
Introduction
In order to improve the handiness and speed of On-line Chinese character recognition, a segment based Chinese character database (was presented in [1,2]) is applied to achieve the stroke sequence free feature. Our approach tends to put all the important features of each segment of characters into the database, which will shorten the time of character retrieval in the matching stage and solve the problem of incorrect stroke sequence inputting automatically. In other words, a stroke sequence free algorithm checks the segment iterately until all segments are matched. Therefore, the users do not need to input the characters according to stroke order. The accuracy and speed of recognition is increased due to the completeness of features in the dictionary database, and the efficiency of the stroke free matching algorithm. The recognition stage is to filter out the inappropriate candidates as much as possible. The preliminary match stage further reduces the number of candidates for the final detailed match in the recognition stage. This paper is truncated into a very brief summary and becomes insufficiently selfcontented. The remainder of this paper will be organized as follows: in the next Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 237-241, 2001. Springer-Verlag Berlin Heidelberg 2001
238
Tak Ming Law
section, the concept of segment of character will be briefly introduced; in section 3, the performance of stroke sequence free matching algorithm is described; and finally some experimental results and discussion will be concluded in the section 4.
2
Segments
In our database (which was presented in [1]), for simplicity and performance, we only consider five types of stroke, which is the basic stroke. Basic stroke can be divided as five strokes presented in the following table (which was extracted from [3]). Table 1. Five basic types of segment
Stroke Name
Horizontals
Verticals
South-West Slanting
South-East Slanting
Dot
Symbol Stroke Shape
h
V
P
n
d
The above five kinds of basic strokes are in their own unique directions without turning points. In our system, we call the above basic stroke types as segment types and which as the basic elements of the whole Chinese character database. Our system breaks all the compound-segment strokes into segments that used as the elements to represent the entire character. The features of each segment have been analyzed and placed into a single vector. Segments compose a compound-segment stroke. The amount of stroke count is different from segment count for a particular character. For example, The character ( ) is counted as 12 strokes in the regular Chinese database but 16 segments in our system since the character contains 4 compound-segment strokes ( ) each consisting of 8 segments. Our system breaks all the compound-segment strokes into segments that are used as the elements to represent the entire character. The features of each segment have been analyzed and placed into a single vector.
3
Stroke Sequence Free Matching Algorithm
Once the segment features of the input character have been obtained, the stroke sequence free matching algorithm can be started. The solutions were inspired from [4]. The steps are as follows: (1) each one of the segments in the input character will be matched iterately with all the segments in the reference character. (2) each iteration will get one best match pair of the input and reference characters. Best match pair denotes any pair of two segments with the lowest distance (Fig.1). (3) The segments in the input character will be iterately matched with the reference character one by one until all the segments are processed and the best match pairs will have been set up (Fig.2).
Some Experiment Results on Feature Analyses
239
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
3
3
3
3
3
3
3
3
4
4
4
4
4
4
4
4
5
5
5
5
5
5
5
5
6
6
6
6
6
6
6
6
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Sometimes there may be two or more segments on one side matched with only one segment on the other side (Fig.3). This occasion shows there must be an incorrect matching between those matched pairs. The degree of distance between two segments is called match strength. The higher the distance ratio, the lower the match strength. When the above occasion occurs, the system will iterate again and distinguish the best match pairs according to the match strength between segments. Finally, rearrange all the best match pairs within the character as shown in Fig.4.
4
Experiment Results and Analysis
We had built and tested the segment database by using stroke sequence free matching algorithm. We perform the experiment by combining the techniques mentioned in [5], [6], [7] and [8] as a whole system. Some ideas of the testing were extracted from [9]. An over all recognition rates of 98.2% were achieved and the average speed of recognition were less than 1/2 second per character on IBM Compatible PCs. It is a closed result tested by the author himself, with limited cursive writing, in his own laboratory. Although the result is writer dependent, it still shows that an integrated recognition system using the proposed database is very promising. To perform the practical experiment, a database composed of 1100 Chinese characters was constructed as the dictionary, which had been trained for five times during the data learning stage. The segment numbers of the characters range from 1 to 31. Figure 5 indicates that the characters with a large segment number have higher recognition accuracy. All characters in the database had been tested one by one with only one try for each character.
240
Tak Ming Law Accuracy Percentage 100 98 96 94 92 90 88 86 84 82
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Stroke number of character
Fig. 5. The accuracy flow between different segment numbers
4.1
Conclusion
Compared with the other methods, the proposed method has the following advantages: 1.
Save time and computing resources during the recognition stage. Since all the features are in the Chinese character dictionary; the system only performs matching algorithms instead of performing both the calculation and the matching process for all candidates during the recognition stage.
2.
Capable of recognizing the Chinese character without radicals. The recognition stage does not count on the radical detection. The system only considers the features of each individual segment within the Chinese character dictionary. The system is able to recognize all the characters in the dictionary including those without radicals.
3.
Flexible adaptation The system can adapt to the variations related to the stroke relationship.
4.
Cursive writing handling The proposed method can match cursive Chinese character and can separate similar Chinese characters.
5.
Free from stroke-order and stroke-number variations. There is an algorithm mentioned in section 3 that disregards the stroke-order by selecting the high similarity stroke from the template as the result of the specific matching stroke of the input character instead of matching each stroke one by one in sequence between the template and the input character. Besides, the system permits a certain degree of tolerance for stroke-number variations. In this experiment, we set the tolerance for stroke-number variations as |2|.
6.
Training for the user is not required. The stroke free matching algorithm is powerful enough to isolate a small number of probable candidates for the final recognition stage, therefore, the time
Some Experiment Results on Feature Analyses
241
consumed for each character is within tolerable limits, i.e. 1/2 second. The main factor of the mis-recognition is due to the confusion of similar characters. The segment relations of these confusion pairs are not significant enough to discriminate one from another. Although it still remains a lot of problems to be solved, the current results are encouraging and inspire us to put further effort to discover more solutions. Finding out different features to recognize the similar characters will be the direction of our future.
Reference 1. Tak Ming Law, Signal Learning Algorithms and Database Architecture for On-line Chinese Characters Recognition, Proceedings of 2000 International Workshop on Multimedia Data Storage, Retrieval, Integration and Applications, Hong Kong (2000) 68-74. 2 Tak Ming Law, An On-line Chinese Character Recognition, Master of Philosophy thesis, The Chinese University of Hong Kong, Computer Science and Engineering Department, (1996). 3. Chi Chung Zhang, Chinese Recognition Techniques, Chinese Signal Processing, Tsing Hwa University Press, (1992). 4. Sheng-Li n Chou and Wen-Hsiang Tsai, On-Line Chinese Character Recognition through Stroke-Segment Matching using a New Discrete Iteration Scheme, Computer Processing of Chinese and Oriental Languages, Vol.7, No. 1, (1993) 120. 5. Tak-Ming Law, The Decision Path Classification For A Segment-Based On-Line Chinese Character Recognition, Proceedings Of The Conference On Applications Of Automation Science And Technology, Hong Kong (1998) 227-231. 6. Tak-Ming Law, Signal Smoothing, Sampling, Interpolation And Stroke Segmentation Algorithm For On-Line Chinese Character Recognition, Proceedings Of The Second International Conference On Information, Communications & Signal Processing, Singapore (1999),. 7. Tak-Ming Law, Signal Learning Algorithms And Database Architecture For OnLine Chinese Characters Recognition, Proceedings Of The 2000 International Workshop On Multimedia Data Storage, Retrieval, Integration And Applications, Hong Kong (2000) 68-74. 8. Tak-Ming Law, Segmentation Analysis And Similarity Measure For Online Chinese Character Recognition, Proceedings Of The International Conference On Chinese Language Computing, Chicago, Illinois, USA (2000). 9. Mr. Wong, An On-line Chinese Character Recognition, Master of Philosophy thesis, The Chinese University of Hong Kong, Information Engineering Department, (1993).
Automatic Detection Algorithm of Connected Segments for On-line Chinese Character Recognition Tak Ming Law Hong Kong Institute of Vocational Education (Morrison Hill) Department of Computing, 6 Oi Kwan Road, Wan Chai, Hong Kong. Email: [email protected] Abstract. This paper presents a very easy way to detect the improper connected strokes by simply breaking all the strokes into pieces of segments. Once the strokes of the character decomposed into segments, as the basis of recognition, the connected stroke problem is no longer exists anymore.
1 Introduction One of the most popular multimedia devices for people to enter Chinese characters into the system is on-line Chinese character recognition system. There are so many ways to perform on-line Chinese character recognition. For examples, some researchers utilize individual classifiers [1] to derive the best final decision from the statistical point of view [2] and others classify characters by feature extraction [3] or structural [4]. The simplest method used for the recognition is template matching [5]. Some works emphasized on characters searching look up [6]. Relaxation is a well-known matching method, which has been employed for the recognition of Chinese character [7]. Some other methods like attributed string matching by split-and-merge and segment-order free techniques [8] are also applied in on-line characters and numeric digit recognition. Now, some researchers are developing on-line Chinese signature verification by using some advanced character recognition techniques [9]. However, some products in the market do generate some incorrect results. The problems may be due to the inefficiency of database structure and retrieval methods. On-line Chinese character recognition algorithms are usually based on comparing the similarities of the individual stroke segments between the input and reference characters. However, the accuracy of the measurement always hindered by the improper connected strokes caused by running handwriting on the electronic tablet. We have found a very easy way to detect the improper connected strokes by simply breaking all the strokes into pieces of segments. Once the strokes of the character decomposed into segments, as the basis of recognition, the connected stroke problem is no longer exists anymore. Lets start by looking at the foundation that our recognition system based on in section 2. Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 242-247, 2001. c Springer-Verlag Berlin Heidelberg 2001
Automatic Detection Algorithm of Connected Segments
243
2 Basic Stroke Types (Segment Type) In our database, for simplicity and performance, we only consider five types of stroke, which is the basic stroke. Basic stroke can be divided as five strokes presented in the following table [10]. Table 1. Five basic types of segment Stroke Name
Symbol
©
Verticals
û
SouthWest Slanting
South-East Slanting
Dot
h
v
p
n
d
Horizontals
³
Stroke Shape
The above five kinds of basic strokes are in their own unique directions without turning points. In our system, we call the above basic stroke types as segment types and which as the basic elements of the whole Chinese character database. Our system breaks all the compound-segment strokes into segments that used as the elements to represent the entire character. The features of each segment have been analyzed and placed into a single vector. Segments compose a compound-segment stroke. The amount of stroke count is different from segment count for a particular character. For example, The character ( ) is counted as 12 strokes in the regular Chinese database but 16 segments in our system since the character contains 4 compound-segment strokes ( ) each consisting of 8 segments. Our system breaks all the compound-segment strokes into segments that are used as the elements to represent the entire character. The features of each segment have been analyzed and placed into a single vector.
í
3 Connected Segments Handling A stroke segment is measured from pen down to pen up. A connected segment is a segment with freeman code 1, 2, 3, 4, and is located between standard segments. In this system, we can easily detect all segments with the freeman code 1, 2, 3 and 4 from the input characters. The system counts them as the end of the segments; otherwise, they will be counted as connected segments. Fig. 1 shows an example of eliminating hooks.
244
Tak Ming Law
2
1
3
4
Fig. 1 Connected segment detection Algorithm utilizes the Freeman Code 1, 2, 3 and 4 to eliminate hooks. Segments will be ignored if detected as the code within 1, 2, 3 or 4. This algorithm can also be applied in dehooking (eliminating the necessary segments of characters).
If two standard segments without intermediate line segments are connected by tail to head, it is the head-tail type connection. For instance ( ) and ( ), connected as ( ), is a head-tail connection. On the other hand, if two standard segments are connected with an intermediate line segment, it is the backward type connection.
For instance, (
) and (
) with an intermediate (
), connected as
( ), is a backward connection. If a character has several connected segment radicals, but there is no connected segments within, then the writing is running hand writing. Obviously, the key problem for recognizing running hand writing is to solve the problem of connections between segments. The knowledge-based approach is applied for decomposing a connected segment into separate segments. Some rules based upon the knowledge of segment connection are summarized as follows:
@
1. If a connected segment is classified as a standard segments, e.g., ( ) ---> ( ), then the segment is decomposed into line segments for recognizing radicals because the number of segments as well as corresponding directions are the same. 2. When one writes characters, one usually starts writing character by character from left to right, line by line from top to bottom. If a compound-segment segment consists of several line segments, then the direction of the segment is between “h”, “v”, “p”, “n” segment type. Other than that, We can say it is a connected segment and the linkage can be cut off. That is, if the freeman code of that line segment is within 1, 2, 3 and 4, then our system will not consider it as a standard segment and will automatically ignore it. Some cases of connected segments are shown in the following table. (The above explanations are originated from [11])
Automatic Detection Algorithm of Connected Segments
245
Table 2. Examples of connected segments (Cited from [11])
H V P N
h
© û ³
©
û
v
p
³
n
Since our system will ignore those stroke segments with freeman code 1, 2, 3 and 4, around 95% of the connected segments shown on the above table are automatically detected. When people write characters in free hand, the frequency of segment number variation will occur as shown in Fig. 2 This shows a 37.4% frequency of missing segments and the maximum missing segment count is seven at a time. Moreover, there is only 1.4% frequency of additional segments and the maximum number is two segments at a time.
10 20 30 40 50 60 70
Percentage
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
Strokes
Fig. 2 Segment variation for Chinese script writing. (Cited from [10])
The range of segment number variation is from -2 to +2 as usual.
As in
I
section 2, we assume the input character X ( ) and the reference character G ( ) have the same segment number (Fig. 3). The system matches them by calculating the segment distance between the two characters. The result is the same as our assumption. As mentioned before in section 2, each vector in the data dictionary is dedicated for one segment of the character. The segmentation process will break the character into segments during the preprocessing stage. Although the input character X has one head-tail connected segment; it does not affect the amount of the total stroke segment of the character. Thus, the connected segment does not have any influence on the recognition stage. (The above example was inspired from [10])
246
Tak Ming Law
Fig 3 The connected segments do not affect the amount of segment number after segmentation because the reference characters in dictionary are in segment based. (Inspired from [10]).
4 Concluding Remarks The above connected stroke detection algorithm has been tested by implementations and shows satisfactory results on improving the accuracy of overall Chinese character recognition. In order to recognize the contribution of the automatic detection algorithm, we perform the experiment by combining the techniques mentioned in [12], [13], [14] and [15] as a whole system. 4.1
Experiment Results
To perform the practical experiment, a database composed of 1100 Chinese characters was constructed as the database, which had been trained for five times during the signal learning stage. The segment numbers of the characters ranged from 1 to 31. In order to include the variations of segment features in the database, each time of the signal learning was trained with a different writing style. It is a closed result tested by the author himself, with limited cursive writing, in his own laboratory. Although the result is writer dependent, it still shows that an integrated recognition system using the proposed algorithm is very promising.
Reference 1. K.Yamamoto And A. Rosenfeld, Recognition Of Hand-Printed Kanji Characters By Relaxation Method, Proc. 6th ICPR, (1982) 395-398.
Automatic Detection Algorithm of Connected Segments
247
2. Eveline J. Bellegarda, Jerome R. Bellegarda, David Nahamoo, And Krishna S. Nathan, A Fast Statistical Mixture Algorithm For On-Line Handwriting Recognition, IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 16, No. 12, (1994). 3. J.-W. Tai, T.-J. Liu And L.-Q. Zhang, A New Approach For Feature Extraction And Feature Selection Of Handwritten Chinese Character Recognition, From Pixels To Features III: Frontiers In Handwriting Recognition S. Impedovo And J.C. Simon (Eds.) (1992) 479-491. 4. Yih-Tay Tsay And Wen-Hsiang Tsai, Attributed String Matching By Split-And-Merge For On-Line Chinese Character Recognition, IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 15, No.2, (1993) 180-185. 5. M.Nakagawa, K. Aoki, T. Manable, S. Kimura, And N. Takahashi, On-Line Recognition Of Hand-Written Japanese Characters In JOLIS-1, Proc. 6th ICPR,(1982) 776-779. 6. K.S. Leung, Y. Fan And F.Y. Young, A Chinese Dictionary System Based On Fuzzy Logic And Object-Oriented Approach, Computer Processing Of Chinese And Oriental Languages, Vol. 6, No. 2, (1992) 205-219. 7. C. H. Leung, Y. S. Cheung And Y. L. Wang, A Knowledge-Based Stroke Matching Method For Chinese Character Recognition, IEEE Trans. Man Cybern. 17, (1987) 993999. 8. S.-L.Shiau, S.-J. Kung, A.-J. Hsieh, J.-W.Chen And M.-C.Kao, Stroke-Order Free OnLine Chinese Character Recognition By Structural Decomposition Method, From Pixels To Features III:Frontiers In Handwriting Recognition, (1992) 117-127. 9. Ke Jing, Qiao Yi Zheng, A Local Elastic Matching Method For On-Line Chinese Signature Verification, Journal Of Chinese Information Processing, Vol.12, No.1, (1998) 57-63. 10.Chi Chung Zhang, Chinese Recognition Techniques, Chinese Signal Processing, Tsing Hwa University Press, (1992). 11. Y.J.Liu And J.W. Tai, An On-Line Chinese Character Recognition System For Handwritten In Chinese Calligraphy, From Pixels To Features III: Frontiers In Handwriting Recognition, S. Impedovo And J.C. Simon (Eds), (1992) 87-99. 12.Tak-Ming Law, The Decision Path Classification For A Segment-Based On-Line Chinese Character Recognition, Proceedings Of The Conference On Applications Of Automation Science And Technology, Hong Kong (1998) 227-231. 13.Tak-Ming Law, Signal Smoothing, Sampling, Interpolation And Stroke Segmentation Algorithm For On-Line Chinese Character Recognition, Proceedings Of The Second International Conference On Information, Communications & Signal Processing, Singapore (1999),. 14.Tak-Ming Law, Signal Learning Algorithms And Database Architecture For On-Line Chinese Characters Recognition, Proceedings Of The 2000 International Workshop On Multimedia Data Storage, Retrieval, Integration And Applications, Hong Kong (2000) 6874. 15.Tak-Ming Law, Segmentation Analysis And Similarity Measure For Online Chinese Character Recognition, Proceedings Of The International Conference On Chinese Language Computing, Chicago, Illinois, USA (2000).
Speech Signal Deconvolution Using Wavelet Filter Banks Hu Weiping1,2 and Robert Linggard 1 1
Australian Research Centre for Medical Engineering University of Western Australia Nedlands, WA6009 Australia {huwp,bobling}@ee.uwa.edu.au 2 Department of Physics and Electronics, Guangxi Normal University Guilin, 541004, P.R.China [email protected]
Abstract. Cepstral analysis has been used on voiced speech to separate (deconvolve) the vocal tract filtering effect from the excitation produced by vocal fold vibration (voicing). This paper presents a new approach to speech deconvolution via the biorthogonal wavelet decomposition and reconstruction. The results of some experiments using wavelet deconvolution with voiced speech are given, and these results are compared with the cepstral method. They show that the wavelet method has the property of robustness. It is also automatic and easy to implement.
1
Introduction
The objective of deconvolving the speech signal is to separate the filtering effect of the vocal tract from its excitation at the glottis (vocal cord vibration). In speech and speaker recognition, deconvolution is done in order to find the resonant frequencies of the vocal tract (formants) which are phonetically important parameters. However, in the analysis of pathological voicing, the need is to eliminate the filtering effects of the vocal tract in order to focus on the details of the vocal cord vibration. Traditional methods for the deconvolution of speech into vocal tract and glottal components are cepstral separation, and LPC derived inverse filtering [1]. In the past decade, a new method of energy separation has also been used for speech deconvolution [2]. Here, we propose a new method of deconvolution which uses wavelet decomposition and reconstruction in the frequency domain.
2
Cepstral Method
According to the linear model of speech production [3], voiced speech is the convolution of the excitation of the vocal tract system and its impulse response, so that we may assume the following relationship: Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 248-256, 2001. Springer-Verlag Berlin Heidelberg 2001
Speech Signal Deconvolution Using Wavelet Filter Banks
s(t)=e(t)@v(t)
249
(1)
Where s(t) is the speech signal, v(t) is the impulse response of the vocal tract system, and e(t) is the excitation signal which originates at the vocal cords, and @ represents convolution. In the frequency domain, or spectral, domain we can write: S(w)=E(w)*V(w)
(2)
Where S(w), E(w), and V(w) are the Fourier transforms of s(t), e(t), and v(t), respectively, and * represents multiplication. E(w) is the spectrum of the glottal excitation signal, and V(w) is the transfer function of the vocal tract. If we take the logarithm of the equation (2), we transform these multiplicative components into additive ones: log|S(w)|2=2[log|E(w)|+ log|V(w)|] or
P(w)=Pe(w)+Pv(w)
(3)
Where P(w) is the log power spectrum of the speech signal, Pe(w) is the log spectrum of the excitation signal, and Pv(w) is the log magnitude frequency response of the vocal tract. Thus, P(w) consists of the spectrum of the excitation, which being periodic has a harmonic structure, shaped by the low pass filtering characteristic of the vocal tract transfer function. These two components are combined by addition, and since one (harmonic spectrum) varies more rapidly than the other (low pass filter function), they are theoretically capable of being separated. If the two components were in the time domain we should simply use a filter. The method of homomorphic deconvolution is to treat the components as if they were time signals and transform them via the Fourier transform into a so-called queferency domain, where we can separate the components on the basis of their queferency content. If we impose the IFFT (Inverse Fourier transforms) on P(w) again, we can get: Cs(n)=IFFT(P(w))=[IFFT(Pe(w))+IFFT(Pv(w))]
(4)
Cs(n)=Ce(n)+Cv(n)
(5)
Where Cs(n) is called the Cepstrum of the speech signal, and Ce (n) = [IFFT(Pe(w))] and Cv (n) = [IFFT(Pv(w))] Thus, we can use a low-pass filter in the querferency domain, (low-time lifter), to separate Cv (n) from Ce (n). Assuming that they can be separated completely, we can used an FFT again on Cv (n) and Ce (n) respectively to retrieve Ce (n) and Cv (n) [3]. The whole process is illustrated in the Figure 1.
250
Hu Weiping and Robert Linggard
Fig. 1. “Low-time liftering” to separate Ce (n) and Cv (n)
3
Wavelet Method
The Wavelet Transform has been successfully applied to various fields such as image and speech processing. In particular, the discrete dyadic wavelet transform is being used increasingly in speech recognition [4] [5], to implement filter banks [6]. In essence, the dyadic wavelet decomposition is a process that convolves a signal with the low-pass and high-pass decomposition filters associated with the decomposition wavelet and its scaling function at each scale of 2k. The simple introduction of dyadic wavelet decomposition is as follow: Level 0: a0(n) = x(n) Level 1: a1(n) = a0(n) @ h1(n); d1(n) = a0(n) @ g1(n); Level 2: a2(n) = a1(n) @ h2(n); d2(n) = a1(n) @ g2(n); ………… Level k: ak(n) = ak-1(n) @ hk(n); dk(n) = ak-1(n) @ gk(n);
(6)
Where ak(n) and dk(n) are the “ approximations” component and the “ details” component of the original signal in k level wavelet decomposition, respectively. hk(n) is the low-pass filter of k level (scaling function), gk(n) is the high-pass filter of k level(wavelet function), k∈ ∈[1,L]. Where 2L is the maximum scale of decomposition (or L is the maximum level of decomposition) and depends on N, the number of samples of the original signal x (n). The symbol @ represents circular convolution. The whole decomposition processes are shown in the Figure 2a.
Speech Signal Deconvolution Using Wavelet Filter Banks
251
The biorthogonal wavelets can be used if we have a signal or image that needs to be decomposed and then reconstructed. The family of biorthogonal wavelets exhibits the property of linear phase, which is indispensable for if we wish to recover the time waveform of the excitation signal. In practice, the one-dimensional biorthogonal wavelet transform requires four discrete filters, two low-pass filters and two high-pass filters, besides the hk(n) and gk(n) for decomposition, there are another pair of filters called dual Hk(n) (low-pass filters) and dual Gk(n) (high-pass filters) for reconstruction. The corresponding wavelets of these two pair of filters are duals of each other and these two kind of wavelet families are biorthogonal each other. By applying the dual Hk(n) and Gk(n) to the k level decomposition components ak(n) and dk(n), the decomposition component of the (k-1) level can be obtained, and by iterating this process, we can retrieve the original signal. The formula (7) and the block diagram shown in Figure 2b [7] can describe this whole process.
Fig.2a. 4 levels wavelet decomposition
Fig.2b. 4 levels wavelet reconstruction
ak-1(n)= 1/2(ak(n) @ Hk(n)+ dk(n) @ Gk(n))
(7)
So, if impose a k level dyadic wavelet transform to a signal a0(n),in its frequency domain, it is noticed that all the different level of decomposition filters {hk(n), gk(n)} will constitute a filter bank { hk, gk, gk-1, gk-2, …… g3, g2, g1 }, and the decomposition component set { ak, dk, dk-1, dk-2, …… d3, d2, d1 } corresponds to these band-pass filters. This is shown in the Figure 3. Returning now to our main task, in expression (3), we may impose the wavelet transform to the P(w). In its frequency domain, the “ querfrency domain” of original signal s(n), using the wavelet filter bank property, we can separate the Pe(w) and Pv(w) easily by using different decomposition components as follows: { ak, dk, dk-1, dk-2, …… d3, d2, d1 }= DWT(P(w))
(8)
Pv(w)=IDWT{ak, dk, dk-1, dk-2, …… dj,0, …… 0,0}
(9)
Pe(w)=IDWT{ 0, 0, 0, 0, …… 0, dj+1, …… d3, d2, d1}
(10)
Where DWT means wavelet decomposition transform and IDWT means wavelet reconstruction transform.
Hu Weiping and Robert Linggard
252
Fig.3. Wavelet filter banks in the frequency domain k=4; fmax, the highest frequency in a0
Now we are confronted by the problem of how to determine the key point number j, according to the “ frequency domain” property of “ signal” P(w). An automatic separating algorithm can be used as follows: •
Calculate power spectrum of speech signal s(n) P(w)=log10|FFT(s(n)ham(n))|=log10|S(w)|
Where s(n) is the original speech signal, ham(n) is the hamming window function •
Apply k level wavelet decomposition transform to the power spectrum P(w) and k=log2N, where N is the number of samples of the power spectrum signal P(w). (In experiments we just use maximum k=log2N-1) { ak, dk, dk-1, dk-2, …d3, d2, d1 }= DWT(P(w))
•
By using {ak,0},{ak,dk,0},{ ak,dk, dk-1, 0}… to retrieve [Di] respectively in order.
D1 ak D a 2 k D 3 = IDWT a k ... ... D k a k
0 dk dk ... dk
... 0 d k −1 ... d k −1
... ... 0 ... ...
... ... ... ... d2
0 0 0 . ... 0
Where Di=IDWT{ak, dk, dk-1, dk-2, …… dk-I+2, 0, ……,0 } ; (i>1) •
Calculate the different energy between neighbouring band-pass filter b = {b1 , b2 , b3 , b4 ,......bk −1}
Where [ Di +1 ( n ) − Di ( n )]2 N n =1 N
bi = ∑
Speech Signal Deconvolution Using Wavelet Filter Banks
• •
253
Find out the first minimum of b, such as bmin=bj then get the key number j. Use the low-frequency component to retrieve Pv(w) and high-frequency component to retrieve Pe(w) Pv(w)=IDWT{ ak, dk, dk-1, dk-2, …… dj, 0, …… 0,0,0 } Pe(w)=IDWT{ 0, 0, 0, 0, …… 0, dj+1, …… d3, d2, d1}
4
Experiments
We performed experiments to evaluate the performance of wavelet method and contrasted the wavelet method with the ceptrual method; the process and the conclusions are as follows. The sample A is a sustained phonation of vowel /a:/ for three seconds (female), the sampling rate of the speech signal is 8,192Hz, 16bit. The sample B is the same as the sample A except its sampling rate is 12,820Hz. We use the bior6.8, biothogonal dyadic wavelet transform [6], in all the experiments. Figure 4 shows that the wavelet method can perform well for this propose. The envelope component Pv(w) and excitation component Pe(w) have been separated successfully and in this case, the key number j=3, and b3 is close to zero, it means that the components ak, dk, dk-1, dk-2 contain the envelope function almost completely. Figure 5 shows the application of the wavelet method to the same sample A with different length of date frame. We have noticed that with the wavelet method, data frame length N is better when more than, or equal to 1024. The explanation must be that because of the properties of envelope, all the envelope energy in the power spectrum is located in the very low “ frequency” band, and if there is not adequate dateframe length for wavelet transform to deal with, the wavelet filter banks will not have enough fine low-pass filters to perform appropriate separation. Figure 6 shows the comparison between the wavelet and cepstral method. With the same sample A and the same date frame length, and different index n in Cs (n) which we use Cs (1), Cs (2), …… , Cs (n) to retrieve the Pv(w). We noticed that when the n is too small, the envelope curve loses most detail, which indicates where the formant is. When the n is too big, the envelope function will contain more detail than it should, and spurious peaks may mask the true information. Result shows that the optimum index for n in the Cepstral method is about 10. The envelope function from the cepstral method in which n=10 is the best fit to the envelope curve from wavelet method. Finally we use the different sampling rate signal, Sample B, in the experiment to show that the wavelet method also does a good separation, as shown in Figure 7.
5
Conclusion
In this paper we present a wavelet filter banks separation method for speech signal deconvolution. The experiments show that the wavelet method can work very well
254
Hu Weiping and Robert Linggard
with the data frame length more than or equal to 1024. We have very reason to suppose that an improvement would be expected if the wavelet packets technique could take into account even if the data frame length less than 1024. It may also be noted that, in essence, the wavelet method is similar to the cepstral method, but that it overcomes the difficulties of the cepstral separation. It is robust, automatic, and easy to implement.
Acknowledgements We wish to thank the Australian Research Center for Medical Engineering (ARCME) for providing us with the opportunity to carry out this research. In particular, We wish to thank Professor Attikiouzel, director of ARCME. We are also grateful to Mr. Fangwei Zhao and Dr. Christopher J. S. deSilva for their help in this research work.
References 1. 2. 3. 4.
5. 6. 7.
Hanson, H.M.; Maragos, P.; Potamianos, A. A system for finding speech formants and modulations via energy separation, Speech and Audio Processing, IEEE Transactions, Volume: 2 Issue: 3, July 1994, Page(s): 436 -443 Maragos, P.; Kaiser, J.F.; Quatieri, T.F. Energy separation in signal modulations with application to speech analysis, Signal Processing, IEEE Transactions, Volume: 41 Issue: 10, Oct. 1993 Page(s): 3024 -3051 J. R. Deller Jr., J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals. New York: IEEE PRESS, 2000. Gowdy, J.N.; Tufekci, Z. Mel-scaled discrete wavelet coefficients for speech recognition Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference, Volume: 3 , 2000, Page(s): 1351 1354 vol.3 Yu Hao; Xiaoyan Zhu, A new feature in speech recognition based on wavelet transform, Signal Processing Proceedings, 2000. WCCC-ICSP 2000. 5th International Conference, Volume: 3,2000 Page(s): 1526 -1529 Kidae Kim; Dae Hee Youn; Chulhee Lee, Evaluation of wavelet filters for speech recognition, Systems, Man, and Cybernetics, 2000 IEEE International Conference, Volume: 4 , 2000 Page(s): 2891 -2894 Gilbert Strang and Truong Nguyen, Wavelet and Filter Banks, WellesleyCambridge Press, 1997.
Speech Signal Deconvolution Using Wavelet Filter Banks
Fig. 4. Wavelet method deconvolution, data frame length N=1024, Sample A
Fig.7. Wavelet method deconvolution, data frame length N=1024, Sample B
Fig.5. Wavelet method deconvolution, data frame length N=512, 1024, 2048. Sample A
255
256
Hu Weiping and Robert Linggard
Fig.6. Comparison of wavelet and cepstral method, data frame length N=1024, Sample A
A Proposal of Jitter Analysis Based on a Wavelet Transform Jan Borgosz and Boguslaw Cyganek Electronic Engineering and Computer Science Department, Academy of Mining and Metallurgy, Mickiewicza 30, 30-059 Kraków, Poland {borgosz,cyganek}@uci.agh.edu.pl
Abstract. The paper puts forth a proposal for a new jitter measurement method based on a wavelet transform usage. There are many problems associated with the generation jitter free reference clock for measurements. The proposed method does not need a reference clock which is its main advantage over known methods that rely on reference clock usage. Additionally, presented wavelet transform applied to the jitter signals allows for more detailed analysis than offered by other methods. Comparison of classic and wavelet approach is presented. Problems like wavelet function type, order and post processing methods are also indicated.
1 Introduction Estimating the jitter of a transmission clock is an important problem in telecommunication measurements. The classic approach to jitter measurement analysis usually consists of processing steps that use a reference clock [2][6][7]. The most troublesome part of the measurement process is to correlate slopes of the reference and received clocks. The purpose of this paper is to present a totally different wavelet based approach to jitter measurement analysis as compared to the aforementioned methods. Possibility of the usage different post processing methods is also shown (e.g. neural networks, fuzzy logic).
2 Jitter Theory A jitter is an unwanted, spurious transmission clock phase modulation that orginates from the physics of semiconductor device [2][6]. Modeling this phenomenon using a modulation scheme allows us to describe it with a multitone technique [5]. A single tone modulation case can be described:
y( t ) = A ⋅ cos( 2 ⋅ π ⋅ f n ⋅ t + ϕ m ( t )) Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 257-268, 2001. c Springer-Verlag Berlin Heidelberg 2001
(1)
258
Jan Borgosz and Boguslaw Cyganek
where y is the jittered clock signal with amplitude A [V] and base frequency fn [Hz]. Phase modulating function ϕ m ( t ) can be described as follows:
ϕ m ( t ) = 2 ⋅ π ⋅ k sin⋅ ( 2 ⋅ π ⋅ f jitt ⋅ t )
(2)
where fjitt is jitter frequency [Hz], k≥0 ∧ k∈R the jitter amplitude in telecommunication UI units (UI means Unit Interval which is equal to one cycle of transmission clock). Results of more extended simulations are shown in Fig. 1. For view clarity some assumptions were made: simulation time 0.001s, base clock frequency fn =1kHz, jitter amplitude Ajitt =1UI, jitter frequency fjitt =100 Hz.
Fig. 1. Example of jitter simulation results
All calculations will be presented for the sinus function, which easily can be changed to other wave shapes. Note that the sinus waves after the comparator will be square wave with 50% duty cycle – ideal clock signal [3][6][7].
3 Classic Measurement Environment Jitter test equipment is used with Equipment Under Test (EUT). Generated test signals are transmitted over telecom line into Equipment Under Test. EUT retransmits re-
A Proposal of Jitter Analysis Based on a Wavelet Transform
259
ceived data to the meter over telecom line again. Test equipment processes all received information and calculates results. In classic approach, the implementation of the jitter meter is part of the structure shown in Fig. 2. Data received from the telecommunication line is being transformed by line interfaces. Simultaneously to data processing and Bit Error Ratio calculations, transmission clock is recovered and passed to the FPGA meter input. Output from this module is connected to the Digital Signal Processor. DSP may bypass data straight to the host processor or improve processing by additional calculations (e.g. FFT or other filtering methods). Host processor (Fig. 2) helps to visualize measurement results to the end user.
Fig. 2. Jitter meter structure 1) Signal from the telecommunication line 2) Signal from the line interfaces. 3) BER information 4) Jittered clock 5) Jitter measure results 6) Jitter measure results after DSP
Here is example of digital jitter meter. There are three main components of FPGA implementation presented in Fig. 3: jitter-free clock generator, phase comparator and FIR filter [5][6].
Fig. 3. FPGA jitter meter implementation
Digital phase detector forms series of pulses in accordance with the phase differences between jittered clock signal y(t) described by equations (1), (2) and reference clock with frequency fn. This way formed signal (extended by sign bit, that describes phase shift direction) is provided into the FIR filter input. It can be seen, that time resolution (phase measure quantization) depends on sampling clock. An appropriate
260
Jan Borgosz and Boguslaw Cyganek
FIR structure is selected due to measurement type and range. Filtered signal with jitter information d(t) is available at the FIR output. It may be shown, that d(t) is equal to:
d ( t ) = k ⋅ sin( 2 ⋅ π ⋅ f jitt ⋅ t )
(3)
4 Jitter Measurement with Wavelet Transform
4.1 Bessel representation of a single tone modulated signal A single tone modulated signal can be written as follows:
y PM ( t ) = A ⋅ cos( Ω ⋅ t + ∆Θ PM ⋅ sinϖ ⋅ t )
(4)
As shown in [5] this representation can be replaced by a more appropriate form that makes use of a Bessel function:
y PM ( t ) = A ⋅
∞
∑ J n ( ∆Θ PM ) ⋅ cos( Ω + n ⋅ϖ ) ⋅ t
(5)
n = −∞
where Jn is an n-th order Bessel function of the first kind. In this case equations (1) and (2) can be rewritten as follows ( Ω n = 2 ⋅ π ⋅ f n and ϖ j = 2 ⋅ π ⋅ f jitt ): ∞
y( t ) = A ⋅
∑J
n ( 2 ⋅π
⋅ k ) ⋅ cos( Ω n + n ⋅ ϖ j ) ⋅ t
(6)
n = −∞
4.2 Jittered signal integration and RMS calculations A jittered sine signal can be integrated by a circuit with a much higher cut-off frequency than the maximum. In this case, the integrator like that of an accumulator. A jittered signal given by (6) after the integration will be equal to:
A y INT ( t ) = ⋅ ∆T
t + ∆T
∞
t
n = −∞
∫ ∑J
n ( 2 ⋅π
⋅ k ) ⋅ cos( Ω n + n ⋅ ϖ j ) ⋅ t
(7)
where ∆T is the integration period. Another operation that can be performed on the jittered signal is the RMS calculation:
A Proposal of Jitter Analysis Based on a Wavelet Transform
t + ∆T
y RMS ( t ) = A ⋅
∞ J n ( 2 ⋅ π ⋅ k ) ⋅ cos( Ω n + n ⋅ ϖ j ) ⋅ t n = −∞ ∆T
∫ ∑ t
261
2
(8)
Because a direct analysis of (7) and (8) were somewhat cumbersome, therefore some numerical computations were performed and are presented in Fig. 4. The relationship between frequency changes in the jittered signal as well as the amplitude changes in the integrated jittered signal or RMS can be observed in Fig.4b and Fig.4c.
(a)
(b) Fig. 4. a) Set of jittered sinusoids with jitter frequency fjitt=10Hz, carrier frequency fn = 100 Hz for different jitter amplitudes, b) the same sinusoids after integration
262
Jan Borgosz and Boguslaw Cyganek
(c) Fig. 4 - continuation. c) the same sinusoids after RMS calculation
Also numerical computations were performed for square waves – sinus waves after the comparator and are presented in Fig. 5. Note that, there is no need for reference clock usage for jitter detection. Results of practical experiments are shown in Fig. 6. They are confirmation of presented here calculations.
(a) Fig. 5. a) Set of jittered clock signals with jitter frequency fjitt=10Hz, carrier frequency fn = 100 Hz for different jitter amplitudes
A Proposal of Jitter Analysis Based on a Wavelet Transform
263
(b)
(c) Fig. 5 - continuation. b) the same clock signals after integration, c) the same clock signals after RMS calculation
4.3 Wavelet Transform Applied to Jitter Analysis

High-quality jitter measurement involves amplitude, frequency and time - the changes in time of both parameters. In the authors' opinion, the best tool for such an analysis is the Continuous Wavelet Transform (CWT) [1][4], which is represented by the following equation:
Fig. 6. Practical tests. Integrated jittered signals of the E1 standard - 2.048 MHz base clock. Jitter frequency 5 kHz; cursor positions: 1) jitter amplitude 0.5 UI, 2) jitter amplitude 1 UI, 3) jitter amplitude 1.5 UI
C(s,p) = \int_{-\infty}^{\infty} f(t)\, \Psi(s,p,t)\, dt   (9)
where Ψ(s,p,t) is the mother wavelet, s the scale and p the position. Inserting (7) into (9) provides the following formula:

C(s,p) = \int_{-\infty}^{\infty} \left[ \frac{A}{\Delta T} \int_t^{t+\Delta T} \sum_{n=-\infty}^{\infty} J_n(2\pi k) \cos((\Omega_n + n\varpi_j) t)\, dt \right] \Psi(s,p,t)\, dt   (10)
As can be observed in (10), there is no easy way to find a relationship between the jitter parameters and the CWT coefficients for a signal after integration. This problem is the subject of research, as is the selection of a proper mother wavelet function.
During the tests the authors decided to use the following wavelets (Fig. 7): Mexican Hat, Morlet, Coiflets 2-5, and Biorthogonal 2.6, 2.8, 4.4, 5.5, 6.8, because their shapes appeared to be the most appropriate [1][4] for analyzing the signals shown in Fig. 4b and Fig. 4c. As can be seen in Fig. 8, changes of the wavelet coefficients carry information about the jitter. The problem of correlating the wavelet coefficients C(s,p) with the jitter parameters (i.e. its amplitude and frequency) is the subject of further research. Neural networks or fuzzy logic methods seem to be the most appropriate.
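A minimal sketch of this kind of analysis is shown below (our own setup, not the authors' implementation; signal parameters, scales and the Mexican Hat choice are assumptions made for illustration):

```python
# CWT of an integrated jittered signal with a Mexican Hat wavelet, cf. eq. (9) and Fig. 8.
import numpy as np
import pywt

fs, fn, fjitt, k = 10_000.0, 100.0, 10.0, 0.5
t = np.arange(0.0, 1.0, 1 / fs)
y = np.sin(2 * np.pi * fn * t + 2 * np.pi * k * np.sin(2 * np.pi * fjitt * t))

# crude accumulator-like integration over one carrier period, cf. eq. (7)
win = int(fs / fn)
y_int = np.convolve(y, np.ones(win) / win, mode="same")

scales = np.arange(1, 256)
coeffs, freqs = pywt.cwt(y_int, scales, "mexh", sampling_period=1 / fs)
# coeffs[s, p] plays the role of C(s, p); its modulation along p carries the jitter information.
```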
Fig. 7. Examples of the wavelets used: a) Mexican Hat, b) Morlet
Fig. 8. Set of Mexican Hat wavelet coefficients for the integrated jittered signals shown in Fig. 4b: a) jitter amplitude equals 0 UI, b) jitter amplitude equals 0.5 UI
Fig. 8 (continuation). Set of Mexican Hat wavelet coefficients for the integrated jittered signals shown in Fig. 4b: c) jitter amplitude equals 1 UI, d) jitter amplitude equals 1.5 UI
5 Conclusions

A new jitter measurement method using the wavelet transform was developed. In this paper we found a correlation between the jitter in signals and the CWT coefficients of integrated jittered signals. The method provides a novel way of measuring jitter without using a reference clock. The wavelet transform, with all its benefits, together with the lack of a reference clock makes this method very attractive. An analytic description of the method was presented. Furthermore, the selection of wavelet types, the operations performed on the jittered signal (other than integration and RMS), as well as the relations between the wavelet coefficients C(s,p) and the jitter amplitude, given by the analytic equation (10), are under continuous research. This paper is a result of research work registered in KBN, The State Committee for Scientific Research of Poland, under number 7 T11B 072 20, and its publication is sponsored from KBN funds.
References
1. Białasiewicz J.: Falki i aproksymacje (in Polish). WNT (2000)
2. Feher and Engineers of Hewlett-Packard: Telecommunication Measurements, Analysis and Instrumentation. Hewlett-Packard (1991)
3. Glover I. A., Grant P. M.: Digital Communications. Prentice Hall (1991)
4. Prasad L., Iyengar S. S.: Wavelet Analysis. CRC Press (1997)
5. Szabatin J.: Podstawy teorii sygnałów (in Polish). WKL (2000)
6. Trischitta P. R., Varma E. L.: Jitter in Digital Transmission System. Artech House Publishers (1989)
7. Takasaki Y., Personick S. D.: Digital Transmission Design and Jitter Analysis. Artech House Publishers (1991)
Skewness of Gabor Wavelets and Source Signal Separation

Weichuan Yu¹, Gerald Sommer², and Kostas Daniilidis³

¹ Dept. of Diagnostic Radiology, Yale University, [email protected]
² Institut für Informatik, Universität Kiel, [email protected]
³ GRASP Laboratory, University of Pennsylvania, [email protected]
Abstract. Responses of Gabor wavelets in the mid-frequency space build a local spectral representation scheme with optimal properties regarding the time-frequency uncertainty principle. However, when using Gabor wavelets we observe a skewness in the mid-frequency space caused by the spreading effect of Gabor wavelets. Though in most current applications the skewness does not obstruct the sampling of the spectral domain, it affects the identification and separation of source signals from the filter response in the mid-frequency space. In this paper, we present a modification of the original Gabor filter, the skew Gabor filter, which corrects skewness so that the filter response can be described with a sum-of-Gaussians model in the mid-frequency space. The correction further enables us to use higher-order moment information to separate different source signal components. This provides us with an elegant framework to deblur the filter response which is not characterized by the limited spectral resolution of other local spectral representations.
1 Introduction
According to the well known uncertainty principle, the product of the spatial and the spectral support of a filter has a lower bound. Because Gabor filters [1] can achieve such a lower bound they are very useful in many spectral analysis tasks such as image representation (e.g. [2]) and the spatio-temporal analysis of motions in image sequences (e.g. [3,4]). Besides, Gabor filters were shown to approximate biological models of vision (e.g. [5,6,7]). In the spatio-temporal models for motion estimation [3,8], the energy spectrum of a constant translational motion can be characterized as an oriented plane passing through the origin in the spectral domain. Sampling the spectrum with a set of Gabor filters at different frequencies and orientations [4] may help us to estimate the orientation of the spectral plane. Grzywacz and Yuille [9] further argued that the spectral support of a Gabor filter is a measure of uncertainty and the angle between two tangential lines of the support, which pass through the spectral origin, represents the uncertainty of orientation estimation (see figure 1). This angle is desired to be
the same for filters at different frequencies. Thus, the spectral support should be proportional to the distance between the origin and the support center.
Fig. 1. The motivation of applying 2D Gabor wavelets (redrawn from [9]). We represent the spectral support of a 2D Gabor filter with a circle. Applying a set of filters with constant scale may cause larger angular uncertainty at lower frequencies (as shown by the angle between two dashed lines). Thus, the spectral support of filters should be directly proportional to the mid-frequency
In Gabor filters, impulse responses have the same support in low and high frequencies. However, we would prefer the support to be inversely proportional to the mid-frequency. The coupling of the bandwidth with the mid-frequency yields Gabor wavelets which are extensively used in signal analysis and image representation (e.g. [10,2]). In applying Gabor wavelets we observe a positive skewness in the mid-frequency space [9]. This skewness did not draw considerable attention in the computer vision community because most applications of Gabor wavelets are classification tasks. Being aware of the non-symmetric spreading effect of Gabor wavelets in the mid-frequency space, we argue that an isotropic dissemination of the mid-frequency representation of the filter response (we call this local spectral representation the mid-spectrum) may facilitate the deblurring of filter responses so that we no longer suffer from the limited resolution of frequency-based approaches. This is especially useful in source signal separation and multiple spectral orientation analysis. Based on this motivation we design a new filter to correct the skewness effect (section 2). In section 3 we further describe the 1D corrected mid-spectrum with a sum-of-Gaussians model and use higher-order moments to identify different source components. The deblurring of the mid-spectrum is also demonstrated. In section 4 we extend the analysis to 2D spectral orientation analysis. This paper is concluded in section 5.
2 The Skewness of Gabor Wavelets
We first explain the positive skewness of Gabor wavelets. For simplicity we begin with a 1D Gabor filter whose impulse response reads

g_1(x; \omega_0, \sigma_x) := \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-\frac{x^2}{2\sigma_x^2}} e^{j\omega_0 x}.   (1)

Here \omega_0 denotes the mid-frequency and \sigma_x is the scale parameter. The spectrum of g_1(x; \omega_0, \sigma_x) is a Gaussian centered at \omega_0
G_1(\omega; \omega_0, \sigma_x) = e^{-\frac{\sigma_x^2 (\omega - \omega_0)^2}{2}}   (2)
with bandwidth inversely proportional to \sigma_x. In applications, we usually calculate the spatial convolution between g_1(x; \omega_0, \sigma_x) and the input signal i(x):

h_1(x; \omega_0, \sigma_x) := i(x) * g_1(x) = \int_{\xi=-\infty}^{\infty} i(\xi)\, g_1(x - \xi)\, d\xi.   (3)
At a fixed position x_0, the filter response is simplified as an inner product

h_1(x_0; \omega_0, \sigma_x) = \int_{\xi=-\infty}^{\infty} i(\xi)\, g_1(x_0 - \xi)\, d\xi.   (4)
Using the facts that g_1(x_0 - x) = \overline{g_1(x - x_0)} and \overline{G_1(\omega)} = G_1(\omega) (here the bar denotes conjugation), the above inner product can also be represented in the spectral domain according to the Parseval theorem ([11], pp. 113-115) as

h_1(x_0; \omega_0, \sigma_x) = \int_{\omega=-\infty}^{\infty} I(\omega)\, G_1(\omega)\, e^{j\omega x_0}\, d\omega.   (5)
Here I(\omega) is the spectrum of i(x). Thus, for x = x_0 (for simplicity we set x_0 = 0) we obtain a local spectral representation which is a function of the mid-frequency \omega_0 and the scale \sigma_x. We call this representation the mid-spectrum of the signal. The mid-spectrum h_1(\omega_0, \sigma_x) spreads every spectral Dirac component of the source signal into a function of \omega_0. Assume that the spectrum of a source signal is a Dirac function, I(\omega) = \delta(\omega - \omega_i), originating from a complex harmonic. Equation (5) then turns out to be

h_1(\omega_0, \sigma_x) = G_1(\omega_i; \omega_0, \sigma_x) = e^{-\frac{\sigma_x^2 (\omega_0 - \omega_i)^2}{2}}.   (6)
When the parameter \sigma_x is a constant, h_1(\omega_0, \sigma_x) is a Gaussian spreading of \delta(\omega - \omega_i) and there is no skewness. But if the wavelet property is preferred, i.e. if \sigma_x is inversely proportional to \omega_0,

\sigma_x = \frac{C}{\omega_0}   (7)
with C as a constant, then we observe the positive skewness over \omega_0 [9] (see also figure 2):

h_1(\omega_0, C) = e^{-\frac{C^2 (\omega_0 - \omega_i)^2}{2\omega_0^2}}.   (8)
We may straightforwardly extend the above analysis to n-dimensional Gabor wavelets with isotropic envelope. For 2D Gabor wavelets in the spatio-temporal domain we have the following relation

\sigma_x = \sigma_t = \frac{C}{\sqrt{\omega_{x0}^2 + \omega_{t0}^2}}.   (9)
The mid-spectrum of a 2D spectral impulse \delta(\omega_{x0} - \omega_{xi}, \omega_{t0} - \omega_{ti}) reads

h_2(\omega_{x0}, \omega_{t0}, C) = \exp\left\{ -\frac{C^2 \left[ (\omega_{x0} - \omega_{xi})^2 + (\omega_{t0} - \omega_{ti})^2 \right]}{2(\omega_{x0}^2 + \omega_{t0}^2)} \right\}.   (10)
Fig. 2. The skewness of Gabor wavelets. Left: The solid curve denotes h_1(\omega_0, C) and the dotted curve is a Gaussian function centered at \omega_i with the scale parameter \omega_i / C. C = 3.5, \omega_i = \pi/2. Right: 2D skewness h_2(\omega_{x0}, \omega_{t0}, C). C = 3.5, \omega_{xi} = \omega_{ti} = \pi/2
In many Gabor wavelet approaches, this skewness seems harmless because it does not obstruct the description of different signals with a set of samples [12,13]. The main attention was attracted to the efficient covering/sampling of the spectrum as well as the coefficient estimation of the Gabor basis [10,2]. But we should keep in mind that the spreading effect of Gabor wavelet filtering (see equation (8)) really blurs the input signal non-symmetrically in the mid-frequency space. For the sake of source signal identification and separation, we prefer to have a symmetric spreading. In the following we present a new filter to correct this positive skewness.

2.1 Correcting the Skewness
In order to achieve symmetry in the mid-spectrum, we introduce a new skew Gabor filter whose spectral definition reads

SG_1(\omega; \omega_0, C) := \exp\left\{ -\frac{C^2}{2} \left( \frac{\omega - \omega_0}{\omega} \right)^2 \right\}.   (11)
There exists no analytical expression of the skew Gabor filter in the spatial domain because there is no closed-form representation of the inverse Fourier transform of SG_1(\omega). But we may obtain an FIR version of both the real and the imaginary part of the skew Gabor filter sg_1(x) using filter design in the Fourier domain and the discrete Fourier transform. In figure 3 we display one example of the skew Gabor filter. It is similar to a Gabor filter with subtle shape differences inside the Gaussian envelope. Replacing G_1(\omega; \omega_0, \sigma_x) in equation (5) with SG_1(\omega; \omega_0, \sigma_x) yields a mid-spectrum with an ideal Gaussian shape

sh_1(\omega_0, C) = \exp\left\{ -\frac{C^2 (\omega_0 - \omega_i)^2}{2\omega_i^2} \right\}.   (12)
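A short sketch of the Fourier-domain design route mentioned above is given below (an assumed recipe consistent with the text, not the authors' exact procedure; grid size and parameters are illustrative):

```python
# FIR approximation of the skew Gabor filter (11): sample SG1 on a frequency grid,
# then take an inverse DFT to get a complex impulse response (cf. Fig. 3).
import numpy as np

C, w0, N = 3.5, np.pi / 2, 1024
w = np.fft.fftfreq(N, d=1.0) * 2 * np.pi                  # digital frequencies in [-pi, pi)
SG1 = np.exp(-(C ** 2 / 2) * ((w - w0) / np.where(w == 0, 1e-12, w)) ** 2)
SG1[w == 0] = 0.0                                          # (11) vanishes as omega -> 0

sg1 = np.fft.fftshift(np.fft.ifft(SG1))                    # centred complex FIR mask
# np.real(sg1) and np.imag(sg1) correspond to the real/imaginary parts shown in Fig. 3.
```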
Similarly, we may correct the skewness of 2D Gabor wavelets by using a 2D skew Gabor filter

SG_2(\omega_x, \omega_t; \omega_{x0}, \omega_{t0}, C) = \exp\left\{ -\frac{C^2}{2} \left[ \frac{(\omega_x - \omega_{x0})^2 + (\omega_t - \omega_{t0})^2}{\omega_x^2 + \omega_t^2} \right] \right\}.   (13)
Fig. 3. Top: The real parts of a 1D skew Gabor filter (left) and of a Gabor filter (middle) as well as their even-symmetric difference (right). Bottom: The imaginary parts of both filters (left: skew Gabor; middle: Gabor) and their odd-symmetric difference (right). The parameters are C = 3.5 and \omega_0 = \pi/2

The mid-spectrum corresponding to \delta(\omega_{x0} - \omega_{xi}, \omega_{t0} - \omega_{ti}) is then an ideal 2D Gaussian (cf. equation (10))

sh_2(\omega_{x0}, \omega_{t0}, C) = \exp\left\{ -\frac{C^2}{2} \left[ \frac{(\omega_{x0} - \omega_{xi})^2 + (\omega_{t0} - \omega_{ti})^2}{\omega_{xi}^2 + \omega_{ti}^2} \right] \right\}.   (14)
In figure 4 we display a 1D cosine sequence and the correction of the skewness in the mid-spectrum. Here we use only one constant C to keep the Gaussian envelope isotropic. It is also possible to apply two different constants (i.e. C_x ≠ C_t) in order to form a mid-spectrum with an elongated Gaussian shape. But this is beyond the scope of this paper.
3 1D Source Signal Separation
In the following we demonstrate the merit of correcting the positive skewness. We start with 1D source signal separation. We assume that the spectrum of an input signal is composed of two Dirac components

S(\omega) = a_1 \delta(\omega - \mu_1) + a_2 \delta(\omega - \mu_2),   (15)
where their amplitudes (a1 and a2 ) and offsets (µ1 and µ2 ) are unknown. Our goal is to estimate these amplitudes and offsets from the mid-spectrum so that
the source components can be identified and separated. Here we do not discuss the traditional Fourier analysis, but focus on the comparison with Gabor wavelets.

Fig. 4. Left: Cosine sequence f(x,t) = 5\cos(\frac{\pi}{4}(x + 0.5t)). Middle: Mid-spectrum using Gabor wavelets with C = 3.5. Right: Mid-spectrum using 2D skew Gabor filters with the same C. The skewness in the middle image is corrected

If we apply plain Gabor wavelets for filtering, the mid-spectrum is an overlap of two skewness curves (cf. equation (8)). Though iterative algorithms (e.g. [14]) or learning methods (e.g. [15]) may be used to extract the desired parameters, such non-analytic approaches are computationally inefficient and are sensitive to initial values and related parameters in the cost function. Besides, they are susceptible to local minima in the regression procedure. Thus, we prefer to use an analytic framework for parameter regression. The correction of skewness makes this idea possible. Under the same assumption as that in equation (15), the mid-spectrum of skew Gabor filters is then a sum of two differently weighted and shifted Gaussian functions (for simplicity we omit the coefficient term \frac{1}{\sqrt{2\pi}\sigma} of the Gaussian):

g(\omega_0) = g_1(\omega_0) + g_2(\omega_0)   (16)

with

g_1(\omega_0) = a_1 e^{-\frac{(\omega_0 - \mu_1)^2}{2(\mu_1/C)^2}}, \qquad g_2(\omega_0) = a_2 e^{-\frac{(\omega_0 - \mu_2)^2}{2(\mu_2/C)^2}}.   (17)
The scale parameters in the above Gaussians are proportional to the mean values. In figure 5 we demonstrate the mid-spectrum of plain Gabor wavelet filtering as well as the mid-spectrum of skew Gabor filtering. The sum-of-Gaussians model is well studied from the statistical aspect and is widely used in neural network approaches (e.g. [15,14]). One benefit of this model is that we are able to use higher-order moment information to extract parameters. According to Appendix A we obtain the following system of equations
Fig. 5. Left: The mid-spectrum of plain Gabor wavelet filtering. Right: The superposition of two Gaussians after 1D skew Gabor filtering. The scale parameters of these two Gaussians are determined by \mu_1/C and \mu_2/C, respectively
in a_1, a_2, \mu_1, and \mu_2:

a_1\mu_1 + a_2\mu_2 = m_0 \frac{C}{\sqrt{2\pi}} =: b_1,
a_1\mu_1^2 + a_2\mu_2^2 = m_1 b_1 =: b_2,
a_1\mu_1^3 + a_2\mu_2^3 = \frac{1}{\frac{1}{C^2} + 1}\, m_2 b_1 =: b_3,
a_1\mu_1^4 + a_2\mu_2^4 = \frac{1}{\frac{3}{C^2} + 1}\, m_3 b_1 =: b_4.   (18)
Here m_0 denotes the integral of g(\omega_0), and m_1, m_2, and m_3 denote the first three order moments of g(\omega_0)/m_0. Without loss of generality we assume 0 < \mu_1 \le \mu_2. Solving these equations (Appendix B) yields

a_1 = \frac{a(2ab_2 + bb_1 + b_1\sqrt{b^2 - 4ac})}{b^2 - 4ac - b\sqrt{b^2 - 4ac}}, \quad
a_2 = \frac{a(2ab_2 + bb_1 - b_1\sqrt{b^2 - 4ac})}{b^2 - 4ac + b\sqrt{b^2 - 4ac}}, \quad
\mu_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \quad
\mu_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a},   (19)

where b_1, b_2, b_3, and b_4 are defined in (18) and the variables a, b, and c are defined in (B.6) (Appendix B). The term b^2 - 4ac is guaranteed to be no less than zero (see Appendix B). If b^2 - 4ac = 0, there is only one single Gaussian (i.e. \mu_1 = \mu_2) and we can estimate its mean value and amplitude directly using equations (A.1) and (A.2). In figure 6 we display an example of source signal separation. The input signal is composed of two cosine functions

s(x) = 2\cos\left(\frac{\pi}{4} x\right) + \cos\left(\frac{3\pi}{8} x\right)   (20)

with the spectrum S(\omega) = \delta(\omega \pm \frac{\pi}{4}) + \frac{1}{2}\delta(\omega \pm \frac{3\pi}{8}). Now we sample the positive spectral space with Gabor wavelets and skew Gabor filters. We start the mid-frequency at \omega_0 = \frac{\pi}{128} and increase it with a step of \frac{\pi}{128} to get a dense sampling. Here we set the highest mid-frequency as \omega_0 = \frac{7\pi}{8} so that we do not need to consider the boundaries in the mid-spectrum.
Fig. 6. Top: The source signal and its energy spectrum. Bottom: The positive mid-spectra (solid lines) using plain Gabor wavelets (left) and using skew Gabor filters (right). These curves are actually overlapping of the spreading responses of two Dirac functions (shown as crosses)
Using higher-order moments we estimate the amplitudes and the locations of the two positive Dirac components:

a_1 = 0.9976 \approx 1, \quad a_2 = 0.4825 \approx \frac{1}{2}, \quad \mu_1 = 0.8130 \approx \frac{\pi}{4}, \quad \mu_2 = 1.2079 \approx \frac{3\pi}{8}.   (21)

For the negative frequencies, we may perform a similar procedure to extract the desired parameters. Then, we are able to identify the source signal components in spite of the blurring in the mid-spectrum. In other words, this method can "deblur" the mid-spectrum. Taking into account that a lot of effort had to be made in filter design so that the blurring after filtering does not significantly affect the identification of signals or orientations (e.g. [16,17]), this framework provides an elegant solution to increase the spectral resolution.
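The following Python sketch reproduces the moment-based inversion of equations (18), (B.5)-(B.6) on a synthetic sum-of-Gaussians mid-spectrum (the grid, C, and the test amplitudes/offsets are illustrative; the amplitudes are recovered from the linear equations (B.1)-(B.2), which is equivalent to eq. (19)):

```python
# Moment-based two-Gaussian separation, cf. eqs (16)-(19) and Appendix B.
import numpy as np

C = 3.5
w = np.linspace(1e-3, np.pi, 4000)                     # positive mid-frequency grid

def sum_of_gaussians(w, a1, mu1, a2, mu2, C):
    """Mid-spectrum model (16)-(17): the scale of each Gaussian is mu_i / C."""
    return (a1 * np.exp(-(w - mu1) ** 2 / (2 * (mu1 / C) ** 2)) +
            a2 * np.exp(-(w - mu2) ** 2 / (2 * (mu2 / C) ** 2)))

g = sum_of_gaussians(w, 1.0, np.pi / 4, 0.5, 3 * np.pi / 8, C)

# Moments of g and of the normalised density f = g / m0 (Appendix A).
m0 = np.trapz(g, w)
m1 = np.trapz(w * g, w) / m0
m2 = np.trapz(w ** 2 * g, w) / m0
m3 = np.trapz(w ** 3 * g, w) / m0

# Equation system (18).
b1 = m0 * C / np.sqrt(2 * np.pi)
b2 = m1 * b1
b3 = m2 * b1 / (1 + 1 / C ** 2)
b4 = m3 * b1 / (1 + 3 / C ** 2)

# Quadratic (B.5) with coefficients (B.6) gives the offsets.
a_, b_, c_ = b2 ** 2 - b1 * b3, b1 * b4 - b2 * b3, b3 ** 2 - b2 * b4
disc = np.sqrt(b_ ** 2 - 4 * a_ * c_)
mu1_hat, mu2_hat = (-b_ + disc) / (2 * a_), (-b_ - disc) / (2 * a_)

# Amplitudes from the linear equations (B.1)-(B.2) with x_i = a_i * mu_i.
x1, x2 = np.linalg.solve([[1.0, 1.0], [mu1_hat, mu2_hat]], [b1, b2])
a1_hat, a2_hat = x1 / mu1_hat, x2 / mu2_hat
print(a1_hat, mu1_hat, a2_hat, mu2_hat)   # close to 1, pi/4, 0.5, 3*pi/8
```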
4 Orientation Analysis in 2D Spectral Space
In this section, we analyze the appearance of multiple orientations in 2D spectral space using skew Gabor filters. An important application of this analysis is multiple motion analysis in xt-space. According to [18,19,8], both 1D occlusion and transparency may be modeled as multiple lines in the spectral domain, with some distortion in case of occlusion and without distortion in case of transparency. Thus, the problem of motion estimation turns out to be an issue of orientation analysis in the spatio-temporal space. As the angle between two spectral lines
can be arbitrary, eigen-analysis (e.g. [20,21]) cannot properly determine the orientation of multiple lines. Sampling the spectrum with Gabor filters [4] provided a good motivation, but suffered from the limited resolution. Here we prove that this limitation may be overcome using skew Gabor filters. As the energy spectrum of either occlusion or transparency is mainly a superposition of two spectral lines, the corresponding mid-spectrum after skew Gabor filtering is then the sum of differently weighted 2D Gaussians centered on two spectral lines. Along each spectral line, these Gaussians have the same angular uncertainty due to the wavelet property (cf. figure 1). Though the angular distribution sh_a(θ) of a 2D Gaussian is no longer an exact 1D Gaussian, we still can approximate sh_a(θ) using a Gaussian with appropriate parameters, especially if C is adequately large (e.g. C ≥ 3). Due to the space limitation we won't delve into the mathematical derivation, but use an example in figure 7 as an intuitive proof. The reader is referred to [22] for more details. After this approximation, all 2D Gaussians centered on the same spectral line have the same angular mean value and the same angular scale parameter σ_a. Consequently, after polar integration¹ we obtain one 1D Gaussian from all 2D Gaussians centered on the same spectral line, and the angular distribution of the mid-spectrum is the superposition of two 1D Gaussians. Thus, we are able to extract the exact orientation of the spectral lines from the blurred mid-spectrum using the framework introduced in section 3. The only difference here is that the parameter σ_a is no longer proportional to θ_i, but a constant determined by C.
Fig. 7. Left: Polar integration of an isotropic 2D Gaussian centered at (ω_xi, ω_ti) can be approximated by an ideal Gaussian function with mean value θ_i and scale parameter σ_a. The solid circle represents the support of the Gaussian. The pencil of lines passing through the origin denotes the integration paths. The middle point of the intersection between an integration path and the solid circle lies on the dotted circle with a diameter l_0. Middle: The solid curve is the plot of sh_a(θ). For comparison we plot an ideal Gaussian with crosses as well. The scale of this Gaussian is σ_a = sin^{-1}(1/C) with C = 3.5. Right: The maximal difference between the normal Gaussian and sh_a(θ) is less than 2% of max(sh_a(θ_i))
This integration is well known as the Radon transform [23].
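A minimal sketch of this polar (Radon-like) integration is given below (our own discretization, not the authors' code; the synthetic Gaussian stands in for one term of the sum-of-Gaussians mid-spectrum, and grid sizes are illustrative):

```python
# Angular distribution of a 2D mid-spectrum via integration along rays through the origin.
import numpy as np
from scipy.ndimage import map_coordinates

n = 257
wx, wt = np.meshgrid(np.linspace(-np.pi, np.pi, n), np.linspace(-np.pi, np.pi, n))
C, wxi, wti = 3.5, np.pi / 2 * np.cos(0.4), np.pi / 2 * np.sin(0.4)
sigma = np.sqrt(wxi ** 2 + wti ** 2) / C
mid = np.exp(-((wx - wxi) ** 2 + (wt - wti) ** 2) / (2 * sigma ** 2))

spacing = 2 * np.pi / (n - 1)
thetas = np.deg2rad(np.arange(0, 180))
radii = np.linspace(0.0, np.pi, 200)
ang = np.empty_like(thetas)
for i, th in enumerate(thetas):
    # sample the image along the ray of angle th and integrate over the radius
    col = radii * np.cos(th) / spacing + (n - 1) / 2
    row = radii * np.sin(th) / spacing + (n - 1) / 2
    ang[i] = map_coordinates(mid, np.vstack([row, col]), order=1).sum()
# ang(theta) is approximately a 1D Gaussian centred at theta_i = 0.4 rad (cf. Fig. 7).
```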
4.1 Examples
To evaluate the performance of our framework properly, we use synthesized examples. The first example demonstrates the deblurring ability of our framework. We use a 2D signal whose spectrum is composed of two spectral lines passing through the origin (figure 8). The angles between these two lines and the ω_x axis are 15 degrees and 30 degrees, respectively. The mid-spectrum after skew Gabor filtering is strongly blurred due to the spreading effect of filtering and the overlapping of two neighboring Gaussians. The source signal components are hard to observe in this mid-spectrum. Using higher-order moments, we are still able to determine the orientation of the original spectral lines: μ_1 = 13.37° and μ_2 = 30.07°. The relatively large error in μ_1 is caused by the discrete approximation of the polar integration (e.g. at 0 degrees we have more grid points than at 15 degrees). We may reduce this error by increasing the grid density or by interpolation. But we will not enter this topic here.
Fig. 8. Left: The spectrum of a 2D signal is composed of two spectral lines passing through the origin with an angle of 15 degrees between them. Middle: Mid-spectrum using 2D skew Gabor filters with C = 6. The mid-frequency satisfies \pi/16 \le \sqrt{\omega_{x0}^2 + \omega_{t0}^2} \le 3\pi/4. Right: The angular distribution of the mid-spectrum
The second example is to estimate multiple motions in a transparency sequence (figure 9). In this sequence we have one random dot sequence moving at 1.00 [pixel/frame] and one sum-of-cosines sequence moving at 0.40 [pixel/frame], respectively. For clarity of display we arrange the maximal amplitude of the cosine sequence to be twice the maximal amplitude of the random dot sequence so that the corresponding spectral lines have the same amplitudes. The spectrum displays the superposition of the two motions clearly. The mid-spectrum spreads this distribution, which is clearly seen in the plot after the polar integration. As the spectral lines are symmetric with respect to the origin, we only need one half for estimation. Here we use the higher-order moments in the angular space between 90 degrees and 180 degrees to determine the orientation of the spectral lines: μ_1 = 134.43° and μ_2 = 158.62°. The normal vectors of these two lines indicate the velocities: u_1 = \cot(μ_1 - 90°) = 1.02 and u_2 = \cot(μ_2 - 90°) = 0.39.
Fig. 9. Top: The transparency sequence (right) f(x,t) = \sum_{k=1}^{89} A_k \cos(\omega_k (x - 0.4t) + \phi_k) + ran(x - t). In the sum-of-cosines sequence (left) \omega_k varies from \pi/16 to 3\pi/4 with a step of \pi/128. The amplitudes A_k and phase components \phi_k are randomly chosen. The random dot sequence (middle) ran(x - t) moves with 1 [pixel/frame]. Bottom Left: The spectrum of the transparency sequence. Bottom Middle: Mid-spectrum using 2D skew Gabor filters with C = 6. The middle frequency satisfies \pi/16 \le \sqrt{\omega_{x0}^2 + \omega_{t0}^2} \le 3\pi/4. Bottom Right: The angular distribution of the mid-spectrum
5 Discussions
The skewness correction of Gabor wavelets results in a Gaussian spreading of the input signal in the mid-frequency space. After the correction we are able to model the distribution in the parameter space with a sum-of-Gaussians model. Compared with the non-symmetric skewness curve, the benefit of using Gaussian functions for distribution description is obvious: Gaussians have good localization ability and are capable of providing simple yet rich descriptions of signals. From the point of view of probabilistic signal processing and pattern recognition, this correction simplifies the tasks of signal analysis significantly. For example, the analytical framework for source signal separation benefits from the statistical simplicity of Gaussians in calculating higher-order moments. Higher-order moment information is also used in independent component analysis (ICA) approaches [24,25]. In ICA approaches we need a numerical solution (e.g. singular value decomposition (SVD)) because the distribution is unknown. In our framework, however, the sum-of-Gaussians model makes an analytic solution possible. It is also worth mentioning that we need only one superposition of the source signals to separate them (in [25], for example, two linearly independent superpositions are needed to separate source signals).
Another point of our source signal separation framework is that most frequency-based methods suffer from low resolution due to spreading and overlapping. By forcing the spreading to have a Gaussian shape, we can separate two overlapping Gaussians in the mid-frequency space. This enables us to reach very fine resolution in the spectral domain and therefore solve the aliasing problem. In future work the following points are worth studying:
– Extend the framework to 2D multiple motion analysis, where the source signal itself is a sum of 2D Gaussians.
– Reduce the computational load by using elongated filter masks and by studying how sparsely we can sample the spectrum without affecting the parameter regression.
– Develop efficient estimation algorithms for a spectrum with multiple harmonics (more than two Dirac impulses).
– Study the sensitivity to noise.
Appendix A

For convenience we change the variable in equations (16) and (17) to x and normalize g_1(x), g_2(x), and g(x) to obtain the corresponding distribution density functions f_1(x), f_2(x), and f(x):

m_0 = \int_R g(x)\, dx = \frac{\sqrt{2\pi}}{C} (a_1\mu_1 + a_2\mu_2),   (A.1)

f_1(x) = \frac{g_1(x)}{\int_R g_1(x)\, dx} = \frac{C}{\sqrt{2\pi}\,\mu_1} e^{-\frac{(x-\mu_1)^2}{2(\mu_1/C)^2}}, \qquad
f_2(x) = \frac{g_2(x)}{\int_R g_2(x)\, dx} = \frac{C}{\sqrt{2\pi}\,\mu_2} e^{-\frac{(x-\mu_2)^2}{2(\mu_2/C)^2}},

f(x) = \frac{1}{m_0} g(x) = \frac{1}{a_1\mu_1 + a_2\mu_2} \left[ a_1\mu_1 f_1(x) + a_2\mu_2 f_2(x) \right].

The first three order moments of f(x) read

m_1 = \int x f(x)\, dx = \frac{1}{a_1\mu_1 + a_2\mu_2} (a_1\mu_1^2 + a_2\mu_2^2),   (A.2)
m_2 = \int x^2 f(x)\, dx = \frac{C^2 + 1}{C^2} \cdot \frac{1}{a_1\mu_1 + a_2\mu_2} (a_1\mu_1^3 + a_2\mu_2^3),   (A.3)
m_3 = \int x^3 f(x)\, dx = \frac{C^2 + 3}{C^2} \cdot \frac{1}{a_1\mu_1 + a_2\mu_2} (a_1\mu_1^4 + a_2\mu_2^4).   (A.4)
Reformulating equations (A.1), (A.2), (A.3), and (A.4) yields the equation system (18).
Appendix B

After defining x_1 = a_1\mu_1, x_2 = a_2\mu_2, we get an equation system in the variables x_1, x_2, \mu_1, and \mu_2 from (18):

x_1 + x_2 = b_1,   (B.1)
x_1\mu_1 + x_2\mu_2 = b_2,   (B.2)
x_1\mu_1^2 + x_2\mu_2^2 = b_3,   (B.3)
x_1\mu_1^3 + x_2\mu_2^3 = b_4.   (B.4)
From (B.1) and (B.2) we obtain

x_1(\mu_1 - \mu_2) = b_2 - b_1\mu_2,   (B.1-1)
x_2(\mu_1 - \mu_2) = b_1\mu_1 - b_2.   (B.2-1)
We multiply both sides of (B.3) and (B.4) by (\mu_1 - \mu_2) and simplify them as

(b_2 - b_1\mu_1)\mu_2 = b_3 - b_2\mu_1,   (B.3-1)
(b_2 - b_1\mu_1)\mu_2^2 + (b_2 - b_1\mu_1)\mu_1\mu_2 + b_2\mu_1^2 - b_4 = 0.   (B.4-1)
Substituting (B.3-1) into (B.4-1) yields

a\mu_1^2 + b\mu_1 + c = 0   (B.5)

with

a := b_2^2 - b_1 b_3, \quad b := b_1 b_4 - b_2 b_3, \quad c := b_3^2 - b_2 b_4.   (B.6)
This is a standard quadratic equation in one variable whose discriminant reads

b^2 - 4ac = (b_1 b_4 - b_2 b_3)^2 - 4(b_2^2 - b_1 b_3)(b_3^2 - b_2 b_4) = \left[ a_1 a_2 \mu_1 \mu_2 (\mu_1 - \mu_2)^3 \right]^2 \ge 0.   (B.7)

The equality is attainable only when \mu_1 = \mu_2, i.e. when we have only one single Gaussian. Then we only need to use (A.1) and (A.2) directly to extract parameters. In case of b^2 - 4ac > 0, we have two real roots (without loss of generality we assume \mu_1 < \mu_2):

\mu_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad \mu_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.   (B.8)
Here a < 0 (cf. (B.6)). Substituting \mu_1 and \mu_2 into (B.1-1) and (B.2-1), and further taking into account that x_1 = a_1\mu_1, x_2 = a_2\mu_2, we solve for a_1 and a_2:

a_1 = \frac{a(2ab_2 + bb_1 + b_1\sqrt{b^2 - 4ac})}{b^2 - 4ac - b\sqrt{b^2 - 4ac}}, \qquad
a_2 = \frac{a(2ab_2 + bb_1 - b_1\sqrt{b^2 - 4ac})}{b^2 - 4ac + b\sqrt{b^2 - 4ac}}.   (B.9)
References
1. Gabor, D.: Theory of communication. Journal of the IEE 93 (1946) 429-457
2. Lee, T. S.: Image representation using 2D Gabor wavelets. IEEE Trans. Pattern Analysis and Machine Intelligence 18(10) (1996) 959-971
3. Adelson, E. H., Bergen, J. R.: Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America 1(2) (1985) 284-299
4. Heeger, D. J.: Optical flow using spatiotemporal filters. International Journal of Computer Vision 1(4) (1987) 279-302
5. Daugman, J. G.: Uncertainty relation for resolution in space, spatial frequency and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America 2(7) (1985) 1160-1169
6. Koenderink, J., Doorn, A. V.: Representation of local geometry in the vision system. Biological Cybernetics 55 (1987) 367-375
7. Heitger, F., Rosenthaler, L., der Heydt, R. V., Peterhans, E., Kuebler, O.: Simulation of neural contour mechanisms: from simple to end-stopped cells. Vision Research 32(5) (1992) 963-981
8. Beauchemin, S., Barron, J.: The frequency structure of 1D occluding image signals. IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000) 200-206
9. Grzywacz, N., Yuille, A.: A model for the estimate of local image velocity by cells in the visual cortex. Proc. Royal Society of London B 239 (1990) 129-161
10. Bovik, A. C., Clark, M., Geisler, W. S.: Multichannel texture analysis using localized spatial filters. IEEE Trans. Pattern Analysis and Machine Intelligence 12(1) (1990) 55-73
11. Bracewell, R. N.: The Fourier Transform and Its Applications. McGraw-Hill Book Company (1986)
12. Jain, A., Farrokhnia, F.: Unsupervised texture segmentation using Gabor filters. Pattern Recognition 24(12) (1991) 1167-1186
13. Manjunath, B., Ma, W.: Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Analysis and Machine Intelligence 18(8) (1996) 837-842
14. Poggio, T., Girosi, F.: Networks for approximation and learning. Proceedings of the IEEE 78 (1990) 1481-1497
15. Daugman, J.: Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans. Acoustics, Speech, and Signal Processing 36(7) (1988)
16. Simoncelli, E. P., Farid, H.: Steerable wedge filters for local orientation analysis. IEEE Trans. Image Processing 5(9) (1996) 1377-1382
17. Yu, W., Daniilidis, K., Sommer, G.: Approximate orientation steerability based on angular Gaussians. IEEE Trans. Image Processing 10(2) (2001) 193-205
18. Fleet, D. J.: Measurement of Image Velocity. Kluwer Academic Publishers (1992)
19. Fleet, D., Langley, K.: Computational analysis of non-Fourier motion. Vision Research 34 (1994) 3057-3079
20. Shizawa, M., Mase, K.: A unified computational theory for motion transparency and motion boundaries based on eigenenergy analysis. In: IEEE Conf. Computer Vision and Pattern Recognition, Maui, Hawaii, June 3-6 (1991) 289-295
21. Jähne, B.: Spatio-Temporal Image Processing. Springer-Verlag (1993)
22. Yu, W., Sommer, G., Daniilidis, K.: Skewness of Gabor wavelets and source signal separation. Submitted to IEEE Trans. Signal Processing (2001)
23. Radon, J. (translated by P. C. Parks): On the determination of functions from their integral values along certain manifolds. IEEE Trans. Medical Imaging 5(4) (1986) 170-176
24. Cardoso, J.: Source separation using higher order moments. In: IEEE International Conf. on Acoustics, Speech and Signal Processing (1989) 2109-2112
25. Farid, H., Adelson, E. H.: Separating reflections from images using independent components analysis. Journal of the Optical Society of America 16(9) (1999) 2136-2145
The Application of the Wavelet Transform to Polysomnographic Signals

M. MacCallum and A. E. A. Almaini
School of Engineering, Napier University, Edinburgh
[email protected] [email protected]
Abstract. Polysomnographic (sleep) signals are recorded from patients exhibiting symptoms of a suspected sleep disorder such as Obstructive Sleep Apnoea (OSA). These non-stationary signals are characterised by having both quantitative information in the frequency domain and rich, dynamic data in the time domain. The collected data is subsequently analysed by skilled visual evaluation to determine whether arousals are present, an approach which is both time-consuming and subjective. This paper presents a wavelet-based methodology which seeks to alleviate some of the problems of the above method by providing: (1) an automated mechanism by which the appropriate stage of sleep for disorder observation may be extracted from the composite electroencephalograph (EEG) data set and (2) an ensuing technique to assist in the diagnosis of full arousal by correlation of wavelet-extracted information from a number of specific patient data sources (e.g. pulse oximetry, electromyogram [EMG] etc)
1 Introduction
Although sleep encompasses a third of the average person's life, sleep and the disorders of sleep are poorly understood. Research suggests that sleep plays a restorative role in physiologic mechanisms and that the long-term disruption of sleep leads to disease and other degenerative disorders [1]. The most common of these is Obstructive Sleep Apnoea (OSA), which is a progressive, life-threatening condition that affects a large percentage of the population. It is normally only diagnosed following a lengthy evaluation process which involves extensive recording and visual scrutiny of polysomnographic data. This process is both labour-intensive and error-prone [2]. The key objective of this work is to automate the above process by first producing a wavelet-based tool, within MATLAB, which will establish the sleep stage. This is accomplished by establishing the presence of the K-Complex and Sleep Spindle and so confirming stage 2 of sleep, which is the principal region of evaluation. It is intended to then develop this tool to enable the detection of full arousal, the key indicator of OSA.
2 The Nature of Polysomnographic Signals
Polysomnographic, or sleep, signals are part of a composite physiological data set which is characterised by being non-stationary and having information in both the time and frequency domains. The physiological functions occurring during sleep are extensive; they are, in essence, neuro-cardio-respiratory in nature. Such signals are gathered from patients whilst asleep within a controlled environment.
The key signals in question are:
1. EEG (electroencephalogram): neurological data collected from the scalp [3]
2. EMG (electromyogram): muscular data collected from on and beneath the chin
3. EOG (electrooculogram): eye movement data collected from the periphery of the left and right eyes
For the purpose of conventional analysis, the signal of primary importance is the EEG, although it may be correlated with others in the above set for more precise assessment of an event. Before amplification the signal has an amplitude of between 10 and 100 µV and a frequency range of 0.5 to 40 Hz.
3 Sleep Architecture
Sleep is separated into two main regions of categorisation: Rapid-Eye Movement (REM) sleep and Non-Rapid-Eye Movement (NREM) sleep. The sleep process is preceded by Stage W (wakefulness), which is followed by Stages 1, 2, 3 and 4 consecutively, collectively termed NREM sleep. Stage 5 is classified as REM sleep. The cycle of stages 1-5 repeats approximately every 90 minutes in a normal individual. For the purpose of assessment, Stage 2 sleep is of special interest. It is characterised by Sleep Spindles and K-Complexes along with mixed-frequency, low-voltage EEG signal activity. The EEG signal is also normally classified into frequency bands:
• below 4 Hz (δ band)
• between 4 and 8 Hz (θ band)
• between 8 and 12 Hz (α band)
• above 12 Hz (β band)
below 4Hz (δ band) between 4 and 8Hz (θ band) between 8 and 12Hz (α band) above 12Hz (β band)
The Nature of Obstructive Sleep Apnoea (OSA)
Sleep Apnoea Syndrome (SAS), first described in 1965, is a phenomenon characterised by excessive daytime sleepiness. It is classified as Obstructive, Central or Mixed [4]. Obstructive Sleep Apnoea (OSA) is by far the most common form of the ailment and is caused primarily by the collapse of the upper airway, resulting in
diaphragmatic and chest wall movement without airflow. Individuals who have a narrower than average throat due to either genetic factors or obesity are thus more likely to occlude their upper airway during sleep giving rise to the repetitive arousal from sleep and repetitive hypoxaemia. It is this consequent sleep disruption and hypoxaemia which causes the daytime sleepiness and impaired cognitive function.
Fig. 1. The cycle of events in OSA: falling asleep → upper airway obstruction → breathing pause → arousal from sleep
As illustrated in Figure 1, when the patient drifts off to sleep the muscles of the upper airway relax and loosen. Under these conditions, the upper airway can be obstructed completely producing a breathing pause or 'Apnoea' of at least 10 seconds in duration (meaning 'want of breath' from Greek) [5]. During such episodes, the patient will make continued efforts to breathe through the upper airway obstruction, but oxygen levels in the blood will fall. This results in the sufferer becoming hypoxic and the individual may exhibit heavy snoring. After between 10 and 90 seconds, the increased respiratory effort to clear the obstruction or the falling blood oxygen levels alert the brain, and produce a brief awakening, or 'arousal', from sleep. Arousal restores the muscle tone of the upper airway and breathing can subsequently recommence until the next drift into sleep, when the cycle may be repeated. The above cycle of disturbance may occur hundreds of times in the course of a night's sleep. The normal pattern of sleep, progressing through stages of light sleep into deeper and restorative slow wave sleep and then into dreaming sleep, may be grossly disrupted by the frequent arousals to recommence breathing, resulting in the shallow and broken sleep pattern which is a major cause of the daytime problems experienced in OSA. Obstructive sleep apnoea is claimed to be an important cause of premature death and disability. There is increasing pressure to provide sleep services for the treatment of patients with sleep apnoea. Epidemiological evidence suggests that sleep apnoea causes vehicle and other workplace accidents. One of the most dramatic physiological consequences of OSA is the large rise in systemic blood pressure that occurs at the end of each apnoeic episode. Systolic blood pressure can increase by up
to 100 mmHg, 300-400 times each night in severe cases. With regard to treatment, there is a paucity of robust evidence for the clinical and cost effectiveness of continuous positive airways pressure (CPAP) for most patients with sleep apnoea. [6]. In summary, OSA is a serious, progressive and potentially life-threatening breathing disorder. It is among the most common chronic disorders in humans with a prevalence of around 4% of the general population. It is more common in men than women and it is estimated that mild to moderate sleep apnoea goes undetected in some 90% of sufferers. Some studies indicate that it is associated with an increased risk of heart attack and stroke and is also linked to depression, irritability and learning difficulties.
5 Conventional Polysomnographic Signal Analysis
The gold standard for diagnosis and evaluation of sleep apnoea is overnight polysomnography [7]. This is an expensive and labour-intensive procedure which requires the patient to remain overnight in a sleep laboratory. The scoring of the polygraph is based on the unit of the epoch, which is one page of a sleep record. In this case, this unit represents 30 seconds of recording time. For the purpose of scoring, each page is scrutinised in turn and assessed as a whole for its sleep stage. In some situations, the stages in the preceding and/or following pages influence the scoring of that page. The main evaluatory signal is the EEG, but EMG and EOG are also used in certain cases where sufficient uncertainty exists to warrant correlation. The human analyst must thus scan the entire sleep record, manually scoring periods of absent or decreased airflow (apnoeas and hypopneas), whilst correlating discontinuities against the other sampled data, as shown in Figure 2. The repetitiveness and subjectiveness of the task also lead to inaccuracies and low inter-scorer agreement. The growing number of patients being examined for OSA has caused a strain on healthcare personnel and consequently a need has arisen for technological improvements to increase efficiency of diagnosis.
Fig. 2. Visually scored data with arousal shown (the scored arousal is marked)
6 Data Source
The Scottish National Sleep Centre within Edinburgh Royal Infirmary provided the data for this work. The EEG data was gathered from scalp-mounted transducers at C3 and C4 on the scalp as indicated in [3]. The data was recorded on a computerised polysomnographic system (Compumedics Inc., Melbourne, Australia), sampled at 125 Hz and stored on optical disk. The data is encoded in EDF (European Data Format). Data from 11 patients was supplied, ranging from normal through atypical to pathological OSA sufferers. Personnel at the Sleep Centre had manually scored this data, and the tabulated results for both arousal events and sleep stages were supplied. For initial evaluation purposes, it was decided to focus on the data of a severe OSA patient, as the various events within the EEG data set for this individual were much easier to correlate by visual inspection.
7 Automatic Detection Method for K-Complex and Sleep Spindle
7.1 Rationale

As a first step towards an automated mechanism for detecting arousal within OSA sufferers, it is imperative that the appropriate stage of sleep be first established. This is of particular importance as the vast majority of visual scoring is conducted in Stage 2 NREM sleep. A number of automated methods have emerged for the purpose of sleep staging over the past twenty or so years [8]. Any method so created must also take into account the considerable variability of data both intra- and inter-individual and the inherent stochastic nature of the EEG [9]. Thus the key characteristics of this sleep stage are:
1. Presence of Sleep Spindles
2. Presence of K-Complexes
The difficulty in detecting the above parameters is compounded by EEG background activity. Thus some type of filtering technique must be utilised in the first instance. The signal processing method chosen for this investigation was the Wavelet Transform, for the following reasons:
1. It is capable of extracting both time and frequency information
2. Unlike the Short-Term Fourier Transform (STFT), its resolution varies with frequency
The above is especially useful when the signal under scrutiny has short-duration, high-frequency components and long-duration, low-frequency components.

7.2 K-Complex

This signal, which is central vertex in origin, is described as a biphasic wave swinging negative then positive going (or the reverse depending on the location of the probes
on the skull). A typical K-Complex waveform is shown in Figure 3. The voltage measured from peak to peak should exceed 75 microvolts, with a duration of between 0.5 and 5 seconds. It should also be at least twice the amplitude of the preceding one second of sleep activity.
Fig. 3. Typical K-Complex
Particular care must be taken not to confuse K complexes with slow (delta) waves, as shown in Figure 4. As a general rule, K complexes tend to occur in groups and runs. K complexes are often accompanied by a transient increase in EMG activity, and this also helps to discriminate them from slow waves.
Fig. 4. Typical Delta Waves
7.3 Sleep Spindle

A sleep spindle must be more than half a second in duration in order to be scored. Sleep spindles are small 'bursts' of brain activity and are more abundant during, and thus indicative of, Stage 2 sleep. A typical Sleep Spindle waveform is illustrated in Figure 5. The frequency range of sleep spindles is 12-16 Hz [10], and the duration ranges between 0.5 and 2 seconds. Sleep spindles are generated in the thalamus and are generally diffuse, but of highest voltage over the central regions of the head. The amplitude is normally less than 50 µV in the adult. They are one of the identifying EEG features of non-REM stage 2 sleep; they may persist into non-REM stages 3 and 4 but are not seen in REM sleep.
Fig. 5. Typical Sleep Spindle
7.4 Automatic Detection Algorithm Development

The composite algorithm for arousal detection is shown in Figure 6. The wavelet packet transform, within the MATLAB software suite, was employed to extract the relevant frequency band, although high-frequency decomposition of the composite EEG signal was not an intention. The main purpose of using the packet approach was to afford easier graphical visualisation of different frequency bands within the decomposed signal. The frequency bands available for analysis are dependent on the sampling frequency of the signal, which is 125 Hz. Using the wavelet decomposition tree, this means the analysis level is level 5 (for the K-Complex) in order to extract the appropriate (< 4 Hz, δ band) frequency range. It was subsequently discovered that level 4 (10-16 Hz) was better suited for Sleep Spindle detection (i.e. the 12-24 Hz β band). It was decided to use, for initial analysis, the HAAR wavelet as it is the simplest orthogonal system and thus the fastest, especially for signals with rapid transitions. A test epoch (Slst21.mat) was used as a benchmark for initial evaluation as it contains a clear and confirmed K-complex, as well as an additional borderline K-complex.
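As an illustration of this band-extraction step, the sketch below uses Python's PyWavelets as a stand-in for the MATLAB wavelet packet tool described in the text (the random epoch, node choice and reconstruction step are assumptions for demonstration only):

```python
# Haar wavelet packet decomposition of a 125 Hz, 30 s EEG epoch.
import numpy as np
import pywt

fs = 125.0
epoch = np.random.randn(int(30 * fs))            # placeholder for one 30 s EEG epoch (in uV)

wp = pywt.WaveletPacket(data=epoch, wavelet="haar", mode="symmetric", maxlevel=5)
delta_node = wp["aaaaa"]                          # lowest-frequency node at level 5 (~0-2 Hz,
delta_coeffs = delta_node.data                    # inside the delta band; neighbouring nodes
                                                  # can be added to cover the full < 4 Hz range)

# Reconstruct a delta-band-only signal for comparison with the raw epoch.
wp_delta = pywt.WaveletPacket(data=None, wavelet="haar", mode="symmetric", maxlevel=5)
wp_delta["aaaaa"] = delta_coeffs
delta_signal = wp_delta.reconstruct(update=False)
```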
Fig. 6. Data flow between algorithm components: a 30-second epoch feeds the K-complex detection algorithm and the Sleep Spindle detection algorithm, whose outputs feed the arousal detection algorithm, which produces the arousal information
Detection System for K-Complex

This was based on the following criteria:
• Minimum period: 0.5 seconds
• Maximum period: 5 seconds
• Minimum amplitude: 75 µV
• Frequency range: below 4 Hz (δ band)
Figure 3 shows that the key points of interest are the turning points (TPs), the location of which allows the duration and amplitude to be calculated. This results in the process sequence shown in Figure 7.
Fig. 7. Full detection sequence for K-Complex: (1) input signal → wavelet coefficient extraction → (2) δ band → TP detection → (3) turning points → K-complex detection → (4) K-complexes
From the previous flow diagram the K-complex detection block may be created. The main stages (sketched in code after this list) are:
• Check the position of the pointer is valid, i.e. 3 positions from the end of the array.
• Check the type is N, i.e. a negative turning point.
• Set the duration and amplitude.
• Check the duration and amplitude are within the required limits.
• Log the K-complex.
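The following Python sketch is one possible reading of this turning-point/K-complex logic (our interpretation of the flow in Fig. 8, not the authors' M-files); the thresholds follow the stated criteria and the window choice around the biphasic wave is an assumption:

```python
# Turning-point detection and K-complex criteria check on the delta-band signal.
import numpy as np

def turning_points(x):
    """Indices of local extrema, labelled 'N' (trough) or 'P' (peak)."""
    d = np.diff(x)
    idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
    return [(i, "N" if d[i - 1] < 0 else "P") for i in idx]

def detect_k_complexes(delta, fs, min_dur=0.5, max_dur=5.0, min_amp=75.0):
    """delta: delta-band signal in uV; returns (start_time, duration, amplitude) tuples."""
    tps = turning_points(delta)
    found = []
    for j in range(len(tps) - 3):                  # stop 3 records before the end of the array
        i, kind = tps[j]
        if kind != "N":                            # a candidate starts at a negative TP
            continue
        i_end = tps[j + 2][0]                      # trough -> peak -> following TP (assumed span)
        duration = (i_end - i) / fs
        amplitude = delta[i:i_end + 1].max() - delta[i]
        if min_dur <= duration <= max_dur and amplitude >= min_amp:
            found.append((i / fs, duration, amplitude))
    return found
```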
The algorithms for both turning-point and K-Complex detection were encoded within MATLAB using M-files.

Detection System for Sleep Spindle

As illustrated in Figure 6, for full establishment of the sleep stage (i.e. stage 2), it is necessary to ascertain the presence of the Sleep Spindle. This is accomplished in a manner similar to that for the K-Complex. The detection criteria for the Sleep Spindle were taken as (a sketch of these checks follows the list):
• Minimum period: 0.5 seconds
• Maximum period: 2 seconds
• Minimum amplitude: 10 µV
• Maximum amplitude: 50 µV
• Frequency range: 12-16 Hz
• Six or seven distinct waves
Fig. 8. K-Complex detection flow: read the TP data; while the TP pointer is less than length(tp)−3, read the next TP record; if its type is N, set the duration and amplitude; if both lie within the required limits, log a K-complex; finally, display the detected K-complexes
Fig. 9. Full detection sequence for Sleep Spindle: (1) input signal → wavelet coefficient extraction → (2) → TP detection → (3) → Sleep Spindle detection → (4)
The Sleep Spindle Detection sequence, as illustrated in Figure 9, is currently being developed. It is intended that this will merge with the K-Complex algorithm as illustrated in Figure 6, to provide a fully automated detection system for the establishment of stage 2 sleep.
8 Results
Initial results obtained following application of the K-Complex algorithm on the sleep data have been very encouraging, with detection being achieved on both the full and marginal K-Complex within the epochs scrutinised, as shown in Figure 10.
Fig. 10. Epoch Stage 21 with two K-Complexes detected
Figure 11 shows an epoch which features Slow Wave Sleep (SWS or Delta wave). Note that the algorithm has not identified these as K-Complexes.
Fig. 11. Epoch showing slow-wave sleep without K-complexes
It will be necessary to undertake extensive further tests on the sleep data, in order to refine the detection algorithm and thus obtain acceptable correlation across a range of visually-scored patient data.
9 Discussion
Results obtained from initial tests with the K-complex detection algorithm have been encouraging. It is now considered worthwhile to refine this tool and develop the companion model for detection of the Sleep Spindle. This may also require, due to the decomposition band cut-off point, examining for the presence of Spindles across two stage outputs (i.e. 3 and 4).
Additionally, it is important not to consider such events in isolation. The packet sequence of Spindle, K-Complex and Delta wave and their inter-dependency in time must also be considered. This is especially important when extending the analysis to encompass full arousal detection, which is the best indicator of the presence of OSA. An EEG arousal is defined as an abrupt shift in EEG frequency, which may include θ, α and/or frequencies greater than 16 Hz, but not spindles. The parameter that exhibits the least uncertainty is the usage of wavelets with this type of data. Experimentation with different wavelet families is, however, another important area of further investigation: only the Haar wavelet has been tested so far with the K-Complex detection tool. Of particular future interest are those of the Morlet family. Involvement of other polysomnographic signals (e.g. EMG, EOG, pulse oximetry) is viewed as an important development of this work and is hoped to yield more accurate results. In a fashion similar to some automated sleep stage detection methods, a 'jury' system of such wavelet-processed signals could provide an accurate automatic diagnosis of sleep data. Such a jury should involve not only those signals or elements of signals which are expected to exhibit change during a suspected arousal, in either the frequency and/or time domains, but also those that are expected not to show any variation. The above, in time, might necessitate the employment of other more sophisticated methods of classification, e.g. neural networks or genetic algorithms.
References
1. Foresman, B. H.: Sleep and breathing disorders: the genesis of obstructive sleep apnoea. Journal of the American Osteopathic Association, Volume 100, part 8, pp. 1-8, 2000.
2. Drinnan, M. J., Murray, A., Griffiths, C. J., Gibson, G. J.: Interobserver Variability in Recognising Arousal in Respiratory Sleep Disorders. American Journal of Critical Care Medicine, Volume 158, pp. 358-362, 1998.
3. Rechtschaffen, A., Kales, A.: A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects. US Department of Health, Education and Welfare, 1968.
4. Loadsman, J. K.: Lectures on Anaesthesia and Sleep Apnoea. Department of Anaesthetics, Royal Prince Alfred Hospital, Camperdown NSW, Australia, 2001.
5. Coleman, J.: Sleep Studies: Current Techniques and Future Trends. Otolaryngologic Clinics of North America, Volume 32, Number 2, pp. 195-210, 1999.
6. Engleman, H. M., Martin, S. E., Deary, I. J., Douglas, N. J.: Effect of CPAP therapy on daytime function in patients with mild sleep apnoea/hypopnoea syndrome. Thorax, Volume 52, pp. 114-119, 1997.
7. Liam, C. K.: A portable recording system for the assessment of patients with sleep apnoea syndrome. Medical Journal of Malaysia, Volume 51, part 1, pp. 82-88, 1996.
8. Pachero, O. R., Vaz, F.: Integrated Systems for Analysis and Automatic Classification of Sleep EEG. Proceedings of the 24th Annual IEEE Conference in Bioengineering, pp. 15-17, 1998.
9. Marsalek, K., Rozman, J.: Automatic Time and Frequency Domain Detection in Biomedical Signals. Proceedings of the 9th International Czech-Slovak Conference Radioelektronika, pp. 152-155, 1999.
10. De Gennaro, L., Ferrara, M., Bertini, M.: Topographical Distribution of Spindles: Variations Between and Within NREM Sleep Cycles. Sleep Research Online, Volume 3, Part 4, pp. 155-160, 2000.
Wavelet Transform Method of Waveform Estimation for Hilbert Transform of Fractional Stochastic Signals with Noise*

Wei Su¹, Hong Ma¹, Yuan Yan Tang², and Michio Umeda³

¹ Dept. of Mathematics, Sichuan University, Chengdu 610064, China
² Dept. of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
³ Dept. of Information Engineering, Osaka Electro-Communication University, Osaka, Japan
Abstract. In this paper, the useful properties of the Hilbert transform allow the process of taking the wavelet transform after taking the Hilbert transform of the statistically self-similar process FBM B_H(t) to be replaced by another process: first take the Hilbert transform of the wavelet function φ(t), forming a new wavelet function ψ(t), and then take the wavelet transform of B_H(t). Then, we use the optimum threshold to estimate B̂_H(t) embedded in additive white noise. Typical computer simulation results demonstrate the viability and the effectiveness of the Hilbert transform in the waveform estimation of the statistically self-similar process.
1 Introduction
The Hilbert transform has several useful properties; for example, it preserves statistical self-similarity. By using it, a real signal can be described as a complex signal. This not only simplifies the theory but also allows us to study the real signal's instantaneous phase and instantaneous frequency. The Hilbert transform is thus becoming one of the important tools in communication theory. Here we consider the estimation of the original signal's Hilbert transform, i.e. the imaginary part of the complex signal, and not, for the moment, the complex signal itself. This paper is an early-stage result in a series of studies. In this paper, these properties of the Hilbert transform allow the process of taking the wavelet transform after taking the Hilbert transform of the statistically self-similar process FBM B_H(t) to be replaced by another process: first take the Hilbert transform of the wavelet function φ(t), forming a new wavelet function ψ(t), and then take the wavelet transform of B_H(t). These are the principal results in this paper. Then, we use the optimum threshold to estimate B̂_H(t) embedded in additive white noise. Typical computer simulation results demonstrate the viability and the effectiveness of the Hilbert transform in the waveform estimation of the statistically self-similar process.

* Supported by the NNSF of China (No. 19971063)
2
Second-Order Statistics of Hilbert Transform of FBM Wavelet Coefficients
Fractional Brownian motion (FBM) is a Gaussian zero-mean non-stationary stochastic process B H (t ) , indexed by a single scalar parameter 0 < H < 1 . The nonstationary character of FBM is evidenced by its covariance structure:
E[ BH (t ) BH ( s )] =
σ2 2
[t
2H
+s
2H
− t−s
2H
]
(1)
It follows from the above equation that the variance of FBM is of the type
var[ B H (t )] = σ 2 t
2H
(2)
As a non-stationary process, FBM does not admit a spectrum in the usual sense. However, it is possible to attach to it an average spectrum
S BH (ω ) = Let
σ2 ω
2 H +1
(3)
f (t ) is a real signal, so its Hilbert transform fˆ (t ) is defined as following: ∞ 1 f (τ ) fˆ (t ) = f (t ) ∗ = ∫ dτ − ∞ πt π (t − τ )
(4)
Hilbert transform has some splendid characters, and can give beautiful results in the signal estimation.
X (t ) is a statistic self-similarity process, Xˆ (t ) is the Hilbert transform of X (t ) , then Xˆ (t ) is also a statistic self-similarity process.
Theorem 1
If
Proof: Because
X (t ) is a statistic self-similarity process, so
E ( X (t ) ) = a − H E ( X (at ) ) E ( X (t ) X ( s ) ) = a −2 H E ( X (at ) X (as ) ) a > 0
298
Wei Su et al.
for Xˆ (t ) , we have
(
)
E Xˆ (t ) = E ∫ X ( s ) R
[
1 ds = a − H E Xˆ (at ) π (t − s )
]
1 1 E Xˆ (t1 ) Xˆ (t 2 ) = E ∫ X ( s1 ) ds1 ∫ X ( s 2 ) ds 2 π (t1 − s1 ) R π (t 2 − s 2 ) R
[
]
[
= a −2 H E Xˆ (at1 ) Xˆ (at 2 )
]
From above, we know that Xˆ (t ) satisfied the definition of statistic self-similarity process, so the theory is proofed. # FBM is a typical example in statistic self-similarity process, its Hilbert transform
Bˆ H (t ) also has statistic self-similarity. W (t ) is zero-mean Gaussian white noise with variance σ w2 , Wˆ (t ) is the Hilbert transform of W (t ) , then Wˆ (t ) is also the zero-mean Gaussian By the way, if
σ w2
white noise, and has the same spectrum Let for
with
W (t ) .
Bˆ H (t ) is the Hilbert transform of FBM BH (t ) , the self-correlation function
Bˆ H (t ) is
[
E Bˆ H (t ) Bˆ H ( s )
] = σ ∫∫ [τ 2 2
2H 1
+ τ2
2H
− τ1 − τ 2
2H
1
1
R2
It is clear from the previous section that process.
−J
2
∞
∑ a [n]ϕ (2 J
−J
n = −∞
where
φ (t )
Bˆ H (t ) is J
d j [ n] = 2
∞ 2
−j
2
j = −∞
(
)
−j ∫ Bˆ H (t )φ 2 t − n dt a j [n] = 2
−∞
An important character of the wavelet coefficients the serial
∞
) ∑ 2 ∑ d [n]φ (2
t−n +
j
−j
t − n)
n = −∞
is the basic wavelet, ϕ (t ) is scaling function associated with −j
2
2
Bˆ H (t ) is also a non-stationary stochastic
The wavelet mean-square representation of Bˆ H (t ) = 2
]π (t 1− τ ) π (s 1− τ ) dτ dτ
−j
∞ 2
∫ Bˆ
−∞
H
φ (t ) ,
(t )ϕ (2 − j t − n)dt j, n ∈ Z
d j [n] is that, for each scale j,
{d j [n], n ∈ Z } is similar to a stationary serial. Later, we will use it.
Wavelet Transform Method of Waveform Estimation
Theorem 2 Let φˆ(t ) is the Hilbert transform of basic wavelet also the basic wavelet.
299
φ (t ) , then φˆ(t )
is
Proof: From the definition of Hilbert transform, there is
φˆ(t ) = φ (t ) ∗ so the Fourier transform of
φˆ(t )
1 πt
is
F (φˆ(t )) = Φ (ω ) ⋅ H (ω ) and
∫
F (φˆ(t ))
R
So
2
dω = ∫
ω
R
Φ (ω )
ω
2
dω < ∞
φˆ(t )
is the basic wavelet. # For sign’s convenience, let
ψ j ,m (τ ) = 2 ψ (2 − j τ − m) = ∫ 2 φ (2 − j t − m) ⋅ −j
−j
2
2
R
−j
= (−2 2 φ (2 − j τ − m)) ∗
1 dt π (t − τ )
1
(5)
πτ
−j
2 2 ψ (2 − j τ − m) is the basic wavelet. When
From the theorem 2, we know
j = 0 , we get
ψ (τ − m) = −φ (τ − m) ∗ So
1
(6)
πτ
d j [n] can be expressed by another way as following: d j [ n] = 2
−j
∞ 2
∫B
H
(
)
(t )ψ 2 − j t − n dt
−∞
j, n ∈ Z
Now we consider R j [n] , which is the correlation function of
[
]
R j [n] = E d j [m + n]d j [m] = where
σ2 2
d j [n] .
(− ∫ Aψ (1,τ − n) τ R
2H
dτ )(2 j ) 2 H +1
300
Wei Su et al.
Aψ (1,τ ) = ∫ψ (t )ψ (t − τ )dt R
n = 0 , the variance of wavelet coefficients d j [n] is
Especially, when
R j [0] = var(d j [n]) =
σ2 2
Vψ (H )(2 j )
2 H +1
2
= σ c 2γj
where
Vψ ( H ) = − ∫ Aψ (1,τ ) τ dτ 2
R
From above equations, we can see that through comparing the wavelet transform
equations of Bˆ H (t ) with the wavelet transform equations of B H (t ) , all but the basic wavelet are the same. The discuss in this segment tell us, through the Hilbert transform, a new wavelet function ψ (t ) is formed by the original wavelet function φ (t ) , the wavelet
Bˆ H (t ) become another wavelet transform of BH (t ) with the new wavelet function ψ (t ) . At the same time, all the equations in the wavelet transform of Bˆ (t ) and B (t ) have very beautiful similarity. So we can select existing transform of
H
H
methods for the waveform estimation. This is the major result. Now, we give the wavelet mean-square representation of
Bˆ H (t ) which have N 0
sample points: −J Bˆ H (k ) = 2 2
(
N0
2J
∑a n =0
3
) −1 J
J
[ n]ϕ ( 2 k − n) + ∑ 2 −J
j =1
−j
( 2
N0
2j
) −1
∑d
j
[ n]φ (2 − j k − n)
k = 0,......, N 0 − 1
n=0
Optimum Threshold Method of Wavelet Estimation for Hilbert Transform of FBM
Consider the received signal
y (k ) = BH (k ) + W (k ) where
(7)
BH (k ) is FBM, W (k ) is additive Gaussian white noise with zero-mean and
variance
σ w2 . The Hilbert transform of equation (7) is yˆ (k ) = Bˆ H (k ) + Wˆ (k )
(8)
Wavelet Transform Method of Waveform Estimation
We know that
301
Bˆ H (k ) is the statistic self-similarity process and Wˆ (t ) is also the
zero-mean Gaussian white noise with variance
σ w2 .
Bˆ H (k ) from the equation (3.2) is that, setting a threshold to the wavelet coefficients of the signal yˆ ( k ) , using the inverse wavelet transform to combine the signal Bˆ ( k ) only with that marked wavelet coefficients
A simple existing method to estimate
H
beyond the threshold. However, setting a appropriate threshold has a restricted condition: that increasing the threshold to decrease the influence of noise will make the distortion of signal’s estimates increasing, because large threshold restricts the smaller wavelet coefficients which do not be used to the inverse wavelet transform: on the contrary, decreasing the threshold to decrease the noise’s distortion, but make the influence from the noise increasing, because the large noise wavelet coefficients that beyond the threshold will be used to the inverse wavelet transform. So, we need an optimum threshold for the estimation of the signal. Let a 0 k = yˆ (k ) then
[]
yˆ [n] = d j [n] + Wˆ j [n] j = 1,....., J where d j
(9)
[n] and Wˆ j [n] is the wavelet coefficients serial of
Bˆ H (k ) and Wˆ (t ) , and
{W j [n], n ∈ N }, j = 1,......, J is white noise serial with variance σ w2 . When the scale J is abundant large, we have N0
J ~ −j 2 ˆ ( ) BH k = ∑ 2 j =1
−1 2j
∑ d [n]φ (2 j
n =0
−j
k −n
)
k = 1,.....N 0 − 1
In the following, we use the optimum threshold to estimate named the estimation of
~ d j [n] . Let d j [n]
d j [n] , and L the optimum threshold, then
0 ~ d j [ n] = yˆ j [n]
j≤L j>L
(10)
To make the mean-variance estimation of the wavelet coefficients serial of the scale j smallest, namely
~
σ 2j =ˆ E{(d j [n] − d j [n]) 2 } = min( R j (0), σ w2 } σ w2 = R j (0)
σ w2 < R j (0) σ w2 ≥ R j (0)
(11)
302
Wei Su et al.
R j (0) = σ c2 2γj
Since
σ w2 j ≤ log 2 2 γ σc 1
so when
and
σ ≥ R (0) , 2 w
2 j
let
σ w2 L = [ log 2 2 ] , γ σc 1
where [ ]
means reserving the integer part, and we have
j≤L j>L
0 ~ d j [ n] = yˆ j [n]
4
(12)
Simulation
In the signal estimation of statistic self-similarity process, to demonstrate the viability and the effectiveness of the Hilbert transform, we use computer simulation to rehabilitate the 1/f-type fractional signal. Figure 1 and figure 5 indicate Gaussian zero-mean 1/f-type fractional signal, which comes from random Weiestrass function, where the number of sample points is 1000, γ = 1.7 , and bases on Harr wavelet. Figure 2 indicates the received signal in Gaussian white noise, in which the fractional signal has a figure that x=0dB. Figure 3 indicates the Hilbert transform of the received signal. Figure 4 indicates the estimation of by the optimum threshold. Figure 6 indicates the signal by inverse Hilbert transform of the signal in figure 4, where M=2, and the error of signal’s estimation deta=0.5591. So the viability and the effectiveness of the Hilbert transform have been demonstrated. 4 2 0 -2 -4
0
100
200
300
400 500 600 Figure1.1/f Signal
700
800
900
1000
0
100
200
300 400 500 600 700 Figure2.1/f Signal in Noise
800
900
1000
5
0
-5
Wavelet Transform Method of Waveform Estimation
303
5
0
-5
0
100
200
300 400 500 600 700 800 Figure3. Signal through Hilbert Trans form
900
1000
0
100 200 300 400 500 600 700 800 900 Figure4. E stim ated S ignal by the Optim um Thres hold Technique
1000
0
100
900
1000
0
100 200 300 400 500 600 700 800 900 Figure6.E s tim ated Signal through Invers e Hilbert Trans form
1000
6 4 2 0 -2 -4
4 2 0 -2 -4
200
300
400 500 600 Figure5.1/f S ignal
700
800
4 2 0 -2 -4
5
Conclusion
In this paper, those splendid characters of the Hilbert transform let the processes that taking wavelet transform after taking Hilbert transform for the statistic self-similarity processes FBM [ B H (t ) ] become another processes, that firstly taking Hilbert transform for the wavelet function
φ (t )
and forming a new wavelet function
secondly taking the wavelet transform for
ψ (t ) ,
BH (t ) . Then, we use the optimum
threshold to estimate the Bˆ H (t ) embedded in additive white noise. Typical computer simulation results to demonstrate the viability and the effectiveness of the Hilbert transform in the signal’s estimation of the statistic self-similarity process. So this paper is the fundamental work, later we will take part in the estimation of complex signals.
304
Wei Su et al.
References 1. 2. 3. 4. 5. 6. 7. 8.
B. S. Chen and G. W. Lin, “Multiscale Wiener filter for the restoration of fractal signals: Wavelet filter bank approach”, IEEE Trans. Signal Processing, Vol. 42, No. 11, PP. 2972-2982,1994. B. S. Chen and W. S. Hou, “Deconvolution filter design for fractal signal transmission systems: A multiscale Kalman filter bank approach”, IEEE Trans. Signal Processing, Vol. 45, PP. 1395-1364, 1997. P. Flandrin, “On the spectrum of fractional Brownian motion”, IEEE Trans. Information Theotry, Vol. 35, No. 1, PP. 197-199, 1989. P. Flandrin, “Wavelet analysis and synthesis of fractional Brownian motion”, IEEE Trans. Information Theory, Vol. 38, No. 2, PP. 910-917, 1992. B. B. Mandelbrot and J. W. Van Ness, “Fractional Browrian motions, fractonal motions, fractional noises and applications” SIAM Rev., Vol. 10. No. 4, pp. 422437, 1968. G. W. Wornell, “A Karhunen-Loeve-Like Expansion for 1/f Processes Via Wavelets”, IEEE Trans, Information Theoty, V0l. 36, No. 4, PP. 859-861, 1990. Hong Ma, Michio Umeda, Wei Su, “Hilbert Transform of Non-stationary Stochastic Signal and Parameter Estimation”, to appear. Kesu Zhang, Hong Ma, Zhisheng You, Michio Umeda, “Wavelet Estimation of Non-stationary Fractal Stochastic Signals Using Optimum Threshold Technique”, to appear.
Multiscale Kalman Filtering of Fractal Signals Using Wavelet Transform* Juan Zhao1, Hong Ma1, Zhi-sheng You2, and Michio Umeda3 1
Dept of Mathematics, Sichuan University Chengdu 610064,China 2 Dept of Computer Science, Sichuan University Chengdu 610064,China 3 Dept of Information Engineering, Osaka Electro-Communication University Osaka, Japan
Abstract. A filter bank design based on orthonormal wavelets and equipped with a multiscale Kalman filter was recently proposed for signal restoration of fractal signals corrupted by external noise. In this paper, we give the corresponding parameters of the dynamic system and more accurate estimation. Comparisons between Wiener and Kalman filters are given. Typical computer simulation results demonstrate its feasibility and effectiveness.
1
Introduction
The family of 1/f stochastic processes constitutes an important class of models for different signal processing applications. Examples are geophysical and economic time series, biological and speech signals, noise in electronic devices, burst errors in communications, and recently, traffic in computer networks[4]. A typical model for these processes is the fractional Brownian motion (fBm), which is a Gaussian zeromean nonstationary stochastic process B H (t ) indexed by a parameter 0
non-stationary variations; self-similarity.
Recently, wavelet theory is a powerful method based on time-scale considerations, as an adequate tool for analyzing this type of processes, which could really provide a research method for every aspect of fractal signals processing. Many methods for estimating 1/f-type fractal signals embedded in noise have been proposed in order to conquer the traditional lacks, such as wavelet maximum likelihood ratio estimating method proposed by G.W.Worell and A.V.Oppenleim [7], multi-scale Wiener filters and Kalman filters considering the system influence proposed by B.S.Chen etrc.[1],[2] and multi-scale Wiener filters [4]and Kalman *
Supported by the NNSF of China(No.60074017 and No.69732010).
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 305-313, 2001. Springer-Verlag Berlin Heidelberg 2001
306
Juan Zhao et al.
filters [5] considering the effect of the correlation of the sequences of wavelet coefficients in each scale and the approximation coefficients proposed by G.A.Hirchoren and C.E.D’Attellis. In this paper, we consider the estimation of fBm in noise using multiscale Kalman filters. We give the corresponding parameters of the dynamic system and more accurate estimation based on [5] in Section 2. Furthermore, in Section 3, we give some typical computer simulation to demonstrate its feasibility and effectiveness. Finally, we draw some conclusions in Section 4.
2
Multiscale Kalman Filters
Consider a received signal
y (k ) = BH (k ) + w(k ) where B H (k ) is the fBm to be estimated and process with zero mean and variance
σ
2 w
w(k ) is a Gaussian white noise
,which is independent with
BH (k ) .
The sequences of wavelet coefficients of the received signal are
y j [ n ] = d j [ n] + w j [ n] where
j = 1,2, ! , J
d j [n] and w j [n] are the sequences of wavelet coefficients of the fBm and
white noise, respectively. The sequences variance
σ
2 w
and
{d j [n]
{w j [n]
n ∈ N } are white noises with
n ∈ N } are stationary colored processes for any
j [3]. Hence, they can be approached on the basis of the AR model in the time-scale domain as p
d j [n] = ∑ φ i j d j [n − i ] + e j [n] = φ j x j [n − 1] +e j [n]
(1)
i =1
n ∈ N } is a zero-mean white noise, p denotes the order of the AR model and the p -dimensional vectors are defined as
where
{e j [n]
x j [n − 1] = (d j [n − 1], d j [n − 2], ! , d j [n − p]) t φ j = (φ1j , φ 2j ,!, φ pj ) The coefficients of the AR model are given by
φ j = h j R x−1[ n −1] , j
σ e2 = R j (0) − h j R x−1[ n −1] h tj j
where
(2)
Multiscale Kalman Filtering of Fractal Signals Using Wavelet Transform
307
h j = ( R j (1), R j (2),! , R j ( p ))
R x j [ n −1]
and
! R j ( p − 1) R j (1) R j ( 0) ! R j ( p − 2) R j ( 0) R j (1) = " " " " R ( p − 1) R ( p − 2) ! R ( 0 ) j j j
R j (n) is the autocorrelation function of the sequence of wavelet coefficients of
the fBm
BH (k ) .
Based on the definition of
x j [n] ,the state space model x j [n] = F j x j [n − 1] + Ge j [n] y j [n] = Hx j [n] + w j [n]
(3)
can be derived from (1) ,where
φ1j 1 Fj = " 0
φ 2j ! φ pj 0 ! " " ! 1
0 " 0
G = (1,0,!,0) t
H = (1,0, ! ,0)
So we can estimate the sequence of wavelet coefficients {d j [ n]
n ∈ N } using
Kalman filters. xˆ j [ n] denotes the estimate of x j [ n] .
xˆ j [n] = (dˆ j [n, n], dˆ j [n, n − 1],!, dˆ j [n, n − p + 1]) t and
Pnj denotes the mean-square estimation error of xˆ j [n] .
Pnj = E ( xˆ j [n] − x j [n])( xˆ j [n] − x j [n]) t In [5] the author use
dˆ 1j [n] as the estimate of d j [n] where
dˆ 1j [n] = (1,0 ! ,0) xˆ j [ n] = φ j xˆ j [ n − 1] +
p nj / n ( y j [ n] − φ j xˆ j [ n − 1]) p nj / n + σ w2 t
(4)
p nj / n = E (φ j xˆ j [n − 1] − d j [n]) 2 = φ j Pnj−1φ j + σ e2 The minimum of the meansquare esmation error is
308
Juan Zhao et al.
σ 1j [n] = E (dˆ 1j [n] − d j [n]) 2 = (1,0,!,0)Pnj (1,0,!,0)t We call this estimation as Kalman Filters 1. We have found that the last p − 1 elements of
(5)
xˆ j [n] are not identical to the previous
p − 1 elements of xˆ j [n − 1] when p > 1 .In fact we obtain dˆ j [ n, n − i ] = dˆ j [n − 1, n − i ] +
p ij/ n ( y j [ n] − φ j xˆ j [ n − 1]) p + σ w2 j n/n
(6)
i = 1, ! , p − 1 where
pij/ n = E (φ j xˆ j [ n − 1] − d j [n])(dˆ j [n − i ] − d j [ n − i]) = (0, ! ,1, ! ,0) Pnj−1φ
jt
(7)
(0, ! ,1, ! ,0) is an unit vector whose i th element is one. which can be derived from
xˆ j [n] = Fj xˆ j [n −1] + Knj ( y j [n] −φ j xˆ j [n −1]) So we use
dˆ j [n + p − 1, n] as the estimate of d j [n] ,which is denoted by dˆ j [n] ,
i.e.
dˆ j [n] =ˆ dˆ j [n + p − 1, n] = (0, !,0,1) xˆ j [n + p − 1]
(8)
and call it as Kalman Filters 2.The minimum of the mean-square estimation error is
σ 2j [ n] = E ( dˆ j [ n ] − d j [ n]) 2 = (0, ! ,0,1) Pn j+ p −1 (0, ! ,0,1)t
(9)
Then we have the following result. Theorem 1. 2
pij/ n+i >0 σ [ n ] − σ [ n] = ∑ 2 j i =1,! p −1 p n + i / n + i + σ w 1 That is to say, dˆ [ n] is more accurate than dˆ [ n] in the mean-square sense . 2 j
1
2 j
j
Proof: Subtracting
j
d j [n − i ] by the both sides of the equation (6)and then doing
variance operation, yields
Multiscale Kalman Filtering of Fractal Signals Using Wavelet Transform
309
2
p
j n ,i +1
=p
j n −1,i
−2
where
pj + j i / n 2 2 E ( y j [ n] − φ j xˆ j [ n − 1]) 2 ( pn / n + σ w )
pij/ n E (dˆ j [n − 1, n − i ] − d j [ n − i])(φ j xˆ j [ n − 1] − y j [n]) p nj / n + σ w2
p nj,i +1 and p nj−1,i are the i + 1 th element of the diagonal of Pnj and the i th j
element of the diagonal of Pn −1 ,respectively. Substituting y j [ n] = d j [ n] + w j [n] into the above equation and considering the independence between xˆ j [ n − 1] d j [n] and w j [n] , it follows from (4) and(7) that
E ( y j [n] − φ j xˆ j [n − 1]) 2 = p nj / n + σ w2 E (dˆ j [n − 1, n − i] − d j [n − i])(φ j xˆ j [n − 1] − y j [n]) = pij/ n 2
So we obtain
p
j n ,i +1
=p
j n −1,i
pj − j i / n 2 i = 1, ! p − 1 , i.e. the mean-square pn / n + σ w
estimation error of d j [ n − i ] is deceasing. It follows from (5) and (9)that the theorem holds. Now we consider the estimation of the approximation coefficients using the sequence of approximation coefficients of the signal y (k )
y Ja [n] = a J [n] + wJa [n] a
where a J [n] and w J [n] are the sequences of approximation coefficients of the fBm and of the observation white noise process, respectively. We use a simple memoryless estimator a aˆ J [ n] = E{a J [ n] | y J [ n ]} =
Var ( aJ [ n]) a y [ n] 2 J Var (aJ [ n]) + σ w
(10)
which is optimal in the mean-square sense and the estimation error variance is
σ a2 [n] = E (a J [n] − aˆ J [n]) 2 =
Var (a J [n ])σ w2 Var (a J [n]) + σ w2
Then the mean-square error of the estimation of the process
E[( BH ( k ) − Bˆ H ( k ) ] = 1 ( 2
N0
m ( J ) −1
∑σ n=0
2 a
J m ( j ) −1
[n] + ∑ j =1
where m( j ) is the number of samples in the scale
∑σ
2 j
[n ])
(11)
BH (k ) is given by (12)
n =0
j , k = 0 , ...... , N 0 − 1
310
3
Juan Zhao et al.
Simulations
We will estimate a fractal signal embedded in noise with different methods. Considering that Wiener filters use correlation function R j (0), ! , R j ( M − 1) and Kalman filters use R j (0), R j (1), !, R j ( p ) , we let p = M − 1 to guarantee that they use the same information. We consider 1000 samples of a fractal signal
BH (k ) embedded in additive Gaussian white noise with variance σ w2 =1. For the simulations, we have chosen Haar wavelet and used Weiestrass function to generate of the fBm with H = 0.35 and a filter bank corresponding to 3 scales. Kalman filters 2 ( p = 3) and Wiener filters ( M = 4) are used to estimate the fractal signal and the estimate errors are 0.3020 and 0.3044 ,respectively. 3 2 1 0 -1 -2 0
100
200
300 400 500 600 Figure 1. fBm with H=0.35
700
800
900
1000
100
200 300 400 500 600 700 800 900 Figure 2. fBm embedded in additive white noise
1000
6 4 2 0 -2 -4 0
Because the decay of the correlation heavily depends on R, i.e. the number of vanishing moments of wavelet[3], the estimated fractal signal will be more accurate when we use wavelet with large R under the same conditions. Now we make a comparison between Haar wavelet ( R = 1) and Daubechies5 wavelet wavelet ( R ≥ 2)
Multiscale Kalman Filtering of Fractal Signals Using Wavelet Transform
311
4 2 0 -2 -4
0
100
200 300 400 500 600 700 800 Figure 3. estimated fBm by Kalman filters 2
900
1000
0
100
200
900
1000
4 2 0 -2 -4
300 400 500 600 700 800 Figure 4. estimasted fBm by Wiener filters
( R ≥ 2) .We consider 2048 samples of a fractal signal embedded in additive white noise with variance σ w =1 and calculate the theoretical mean square error 2
values of Kalman Filters 1 ( p = 4 ), Kalman Filers 2 ( p = 4 )and Wiener filters( M = 5 ) corresponding to 3 scales. Table 1 Mean-Square Error for Different Values of the Parameter H and Different Methods
Haar wavelet
H 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95
Kalman filter 1
0.5499 0.5551 0.5627 0.5726 0.5849 0.5994 0.6161 0.6345 0.6544
Kalman filter 2
0.5498 0.5543 0.5606 0.5683 0.5770 0.5862 0.5954 0.6040 0.6115
312
Juan Zhao et al.
Table 2 Mean-Square Error for Different Values of the Parameter H and Different Methods Daubechies5 wavelet
H 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95
Kalman filter 1
Kalman filter 2
0.4610 0.4520 0.4442 0.4374 0.4315 0.4262 0.4216 0.4176 0.4141
0.4550 0.4454 0.4370 0.4297 0.4231 0.4173 0.4122 0.4076 0.4035
We observe that we can obtain more accurate estimate by Kalman filters with 2% improvement under the same conditions and the estimation errors in the table II are dramatically lower than the values shown in the table I using wavelet with large R. Furthermore, we consider the choice of the number of scales J and p in the AR model. We shall show some numerical results obtained from simulations. For the simulations, we have considered Kalman filters 2 with Haar wavelet and H=0.75.The results obtained are shown in the Table III and we find that large p can decrease the estimation error effectively. Table 3 Mean-Square Error for Different Values of
p
in the AR Model and the Number of
Scales J
p=1 J=11 p=2 J=11 p=3 J=11
4
0.5879 0.5795 0.5772
p=2 J=3 p=3 J=3 p=4 J=3
0.5805 0.5782 0.5770
Conclusions
A scheme for the estimation of fBm was developed on the basis of a bank of multiscale Kalman filters. It takes into account the correlation of the wavelet coefficients and the approximation coefficients in the wavelet expansion. In this paper, we propose the more accurate estimation based on [5]. Comparisons between Wiener and Kalman filters are given. Numerical results were shown on the wavelet and the minimum mean-square error for different values of the parameter H in the fBm. The theoretical results matched the ones obtained in the numerical simulations. It can be used to estimate signal in noise such as communication in radar, processing of biological signal etc.
Multiscale Kalman Filtering of Fractal Signals Using Wavelet Transform
313
References 1. 2. 3. 4. 5. 6. 7.
B.S. Chen and G.W. Lin, “Multiscale Wiener filter for the restoration of fractal signals: Wavelet filter bank approach,” IEEE Trans. Signal Processing.Vol. 42,N0.11,pp. 2972~2982,1994. B.S. Chen and W.S. Hou, “Deconvolution filter design for fractal signal transmission systems: A multiscale Kalman filter bank approch,”IEEE Trans. Signal Processing, Vol. 45,pp. 1395~1364, 1997. P. Flandrin “Wavelet analysis and synthesis of fractional Brownian motion ”IEEE Trans.Inform . Theory Vol . 38,No.2,pp . 910~917,1992. G. A. Hirchoren and C. E . D‘Attellis “Estimation of fractal signals using wavelets and filter banks,” IEEE Trans. Signal processing, Vol.46,No.6, pp.1624~1630, June 1998. G. A. Hirchoren and C. E . D‘Attellis “Estimation of fractal Brownian Motion with Multiresolution Kalman Filter Banks,“ IEEE Trans Signal Processing, Vol. 47, No.5,pp. 1431~1434, May 1999. B.B.Mandelbrot and J.W.Van Ness, “Fractional Browrian motions, fractional motions, fractional noises and applications,” SIAM Rev., Vol.10. No.4, pp.422~437, 1968. G.W. Wornell and A.V. Oppenheim, “Estimation of fractal systems from noisy measurements using wavelets,” IEEE Trans. Signal Processing, Vol, 40, No.3, pp.611~622, 1992.
General Analytic Construction for Wavelet Low-Passed Filters* Jian Ping Li1 and Yuan Yan Tang2 1
International Centre for Wavelet Analysis and Its Applications, Logistical Engineering University, Chongqing 400016, P. R. China [email protected], [email protected] 2 Department of Computer Science, Hong Kong Baptist University, Hong Kong [email protected]
Abstract. The orthogonal wavelet lowpassed filters coefficients with arbitrary length are constructed in this paper. When N=2k and
N = 2 k − 1 , the general analytic constructions of orthogonal wavelet
filters are put forward, respectively. The famous Daubechies filter and many other wavelet filters are tested by the proposed novel method, which is very useful for wavelet theory research and many applications areas such as pattern recognition.
1
Introduction
The scaling equation is
ϕ(x)=∑ h(n)ϕ(2x-n) , n∈ Z
(1)
where h(n), n∈ Z, is very complex, but also the building function of multiresolution analysis is not easy for finding. Essentially, finding wavelet function or scaling function is equally finding filters coefficients of wavelet[1~12]. The conditions of orthonormal bases of compactly supported wavelets are Eq.(2) to Eq.(5) as following: 2 N −1
∑ h (i ) =
i=0
2,
N −1 2 = h ( 2 i ) ∑ ∑ h ( 2 i + 1) = 2 , i=0 i=0
(2)
N −1
*
(3)
This work was supported by the National Natural Science Foundation of China under the grant number 69903012.
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 314-320, 2001. Springer-Verlag Berlin Heidelberg 2001
General Analytic Construction for Wavelet Low-Passed Filters
h(0)
h(1) h(2) .... h(2N-1) h(0) h(1) h(2) .... h(2N-1) ............................ h(0) h(1) h(2) .... h(2N-1) N × 2 N
315
(4)
is row orthogonal matrix, and 2 N −1
∑ h (i )h (i ) = 1 .
(5)
i=0 2
Analytic Construction of Wavelet Filters when N=1,2
When N=1 the filter has only two coefficients, if we denote the parameter angle as α, from Eq.(5), we can suppose h(0)= cosα; h(1)= sinα. It is clear that, if Eq.(2) and Eq.(3) are satisfied, then α=
π
4
, here the wavelet
function is Haar-Wavelet. On the contrary, for arbitrary parameter angle α, Eq.(4) and Eq.(5) are naturally satisfied, but Eq.(2) and Eq.(3) are not satisfied. We will discuss the case of N=2 below. Theorem 1 When N=2, wavelet filter coefficients {h(0),h(1),h(2),h(3)} are subject to orthogonal wavelet conditions Eq.(2), Eq.(3), Eq.(4) and Eq.(5) if and only if there are parameters angles α,β such that
α+β=
π 4
,
h(0)= cosα cosβ; h(1)= sinαcosβ; h(2)=- sinαsinβ; h(3)= cosαsinβ. Especially, when α=
π
3
, β= −
π
12
,the wavelet coefficients are below
h(0)=0.482963 h(1)=0.836516 h(2)=0.224144 h(3)=-0.129410, which are Daubechies wavelet filter coefficients when N=2. Of course, from above formulae, infinite kinds of filters coefficients will be gotten. The proof of this theorem is shown in [12].
316
3 3.1
Jian Ping Li and Yuan Yan Tang
Analytic Construction of Wavelet Filters when N = 2 k − 1 Commonly Conclusion When
N = 2k − 1
First of all, we define decomposed rules. Rule 1 Strict Sequence Decomposition (SSD) Definition 1 The decompositions of cosine function cos(α+β) and sine function sin(α+β) are called strict sequence decomposition (SSD) if cos(α+β)=cosαcosβ-sinαsinβ, sin(α+β)=sinαcosβ+cosαsinβ . It is clear that, decomposed items of cosine function cos(α+β) form even coefficients such as h0,h2,..., and decomposed items of sine function sin(α+β) form odd coefficients such as h1,h3,...... Definition 2 The decompositions of cosine function cos(α+β) and sine function sin(α+β) are called inverse sequence decomposition (ISD) if cos(α+β)=-sinαsinβ+cosαcosβ, sin(α+β)=cosαsinβ+sinαcosβ . Rule 2 Parameter Angles Sequence Combination (PASC) Definition 3 Decomposition of parameter angles is called back combination way (BCW) if α+β+γ=α+(β+γ), α+β+γ+θ= α+(β+γ+θ) . ... Definition 4 Decomposition of parameter angles is called fore combination way (FCW) if α+β+γ=(α+β)+γ, α+β+γ+θ=(α+β+γ)+θ . ... In following subsection, we decompose cos(α1+α2+.....αk) and sin(α1+α2+ .....αk) in BCW and SSD for k>1, N = 2 k − 1 时,and denote their decomposed items as vectors as following C(k)={h(0),h(2),h(4),...h(2N-2)},
(6)
S(k)={h(1),h(3),h(5)...h(2N-1)},
(7)
respectively. Then the decomposed items for cos(α+α1+α2+.....αk) and sin(α+α1+ α2+ .....αk) , which contain k+1 parameter angles as following vectors C (k+1)={cosα C (k), -sinαS(k)},
(8)
S (k+1)={sinα C (k), cosα S(k)},
(9)
respectively. This result is very simple for us to prove.
General Analytic Construction for Wavelet Low-Passed Filters
317
Lemma 1 For arbitrary set of parameter angles α1, α2, ..... αk (k>1, N = 2 k − 1 ), if cos(α1+α2+.....αk) and sin(α1+α2+.....αk) are expanded in BCW and SSD, then (k)={h(0),h(2),h(4),...h(2N-2)}, (k)={h(1),h(3),h(5)...h(2N-1)}, are subject to the row orthogonal condition Eq.(4). Proof. A mathematics induction method will be used in this proof, the details of the proof is shown in [12]. Theorem 2 For any set parameter angles of α1,α2,....αk (k>1, N = 2 k − 1 ), cos(α1+α2+.....αk), and sin(α1+α2+.....αk) are decomposed in BCW and SSD, i.e., the decomposed items from Eq.(6) to Eq.(9). If the sum of all the parameter angles is
π
4
, then the decomposed items are wavelet filters coefficients which subject to the
orthogonal wavelet basis conditions such as Eq.(2) to Eq.(5). 3.2
Some Special Cases for N = 2 k − 1
Some details are discussed for some k ( such as k=3)parameters in this subsection. Please pay attention to the sequence of decomposition, combination, incorporation. First of all, we construct the wavelet filter coefficients for N = 2 k − 1 , where k=3. Decomposing cos(α+β+γ), sin(α+β+γ) in BCW and SSD, we have cos(α+β+γ)=cos(α+(β+γ)) = cosα cos(β+γ)-sinαsin(β+γ) = cosα cosβ cos γ- cosα sinβ sinγ-sinα sinβ cos γ-sinα cosβ sinγ , sin(α+β+γ) =sin(α+(β+γ)) =sinαcos(β+γ)+cosαsin(β+γ) =sinαcosβcosγ-sinαsinβsinγ+cosαsinβcosγ+cosαcosβsinγ . We denote the decomposed items of cos(α+β+γ) as even items according to their sequence, i.e., h(0)=cosα cosβ cosγ; h(2)=-cosα sinβ sinγ; h(4)=-sinαsinβcosγ; h(6) =-sinα cosβ sinγ. In the same reason, denote the decomposed items of sin(α+β+γ) as odd items according to their sequence, i.e., h(1)=sinαcosβcosγ; h(3)=-sinαsinβsinγ; h(5) = cosαsinβcosγ; h(7)=cosαcosβsinγ. α+β+γ=π/4. Then, h(0),h(1),h(2),h(3),h(4),h(5),h(6),h(7) construct wavelet filter coefficients for N=4.
318
Jian Ping Li and Yuan Yan Tang
Incorporating h(4) in h(2) and h(5) in h(3), respectively, i.e., h(2)=h(2)+h(4), h(3)=h(3)+h(5), we have h(0)=cosαcosβcosγ; h(1)=sinαcosβcosγ;
h(2)=-sin(α+γ)sinβ, h(3)=cos(α+γ)sinβ,
h(4)=-sinαcosβsinγ; h(5)=cosαcosβsinγ.
Then, h(0),h(1),h(2),h(3),h(4),h(5) construct six coefficients for N=3. This is nice a easy to prove. And when α=π/2.66295; β=-π/6.28518; γ=π/2 7792 , we can get Daubechies wavelet filter coefficients forN=3 as following h(0)= .332671, h(1)=.806892, h(2)=.459878, h(3)=-.135011, h(4)=-.085441, h(5)=.0352263. If we set γ=0, and throw off zero elements, we can get wavelet filter coefficients for N=2, i.e. , Theorem 1.
4 4.1
Analytic Construction of General Filters Definition of Decomposed Method
Recursion Decomposed Method: When N=1, cosα and sinα can not be decomposed, i.e., h(0)= cosα, h(1)=sinα.. When N>1, decomposing cos(α1+α2+.....αN) and sin(α1+α2+.....αN), marking their decomposed items as following vectors, respectively, C(N)={h(0),h(2),h(4),...h(2N-2)},
(10)
S(N)={h(1),h(3),h(5)...h(2N-1)}.
(11)
Then, the decomposed items of cos(α+α1+α2+.....αN) and sin(α+α1+ α2 + .....αN) are vectors below, respectively (12) C(N+1)={cosα(C(N),0)-sinα(0,S(N))} S(N+1)={sinα(C(N),0)+cosα(0,S(N))}
(13)
Lemma 2 Denote ∑C(N) and ∑S(N) as sum of C(N) and S(N), respectively, then
4.2
∑C(N)=cos(α1+α2+.....αN),
(14)
∑S(N)=sin(α1+α2+.....αN).
(15)
General Conclusion
Lemma 3 For any set of parameter angles such as α1, α2, ..... αN, the decomposed items of cos(α1+α2+.....αN) and sin(α1+α2+.....αN) in induction method based on Eq.(10) to Eq.(13) as following , respectively,
General Analytic Construction for Wavelet Low-Passed Filters
319
(N)={h(0),h(2),h(4),...h(2N-2)}, (N)={h(1),h(3),h(5),...h(2N-1)}. Then, the decomposed items are subject to row orthogonal condition Eq.(4). Theorem 3 For any set of parameter angles such as α1, α2 ....., αk, if sum of all parameter angles is
π
4
, then decomposed items of cos(α1+α2+.....αk) and
sin(α1+α2+.....αk) based on induction decomposed method from Eq.(10) to Eq.(13) are wavelet filter coefficients which subject to orthogonal conditions such as Eq.(2) to Eq.(5).
5
Conclusions
This paper introduced the novel method of selection and construction of wavelet basis, that is, unified analytic construction of wavelet filters based on trigonometric functions. Many traditional famous wavelet coefficients are special cases of the novel method, and it is more fast, more simple, more efficient, more advantageous than the traditional methods. We have designed a very efficient and advantageous software , which can calculate fast and simply wavelet filters coefficients with arbitrary parameter angles produced by random process. The algorithms and formulae in this paper show that how to adaptively choose wavelet basis or the best wavelet basis for any problems is very easy and fast. Our novel method will influence on studies of wavelet theory and its applications, and it is very useful for application of wavelet to pattern recognition.
References 1. 2. 3. 4. 5. 6.
Daubechies: Orthonormal bases of compactly supported wavelets. Comm. Pure & Appl. Math 41(1988) 909~996 Z. X. Chen: Algorithms and applications on wavelet analysis. Xi’an Jiaotong University Publishing House, Xi’an, P.R.China( 1998) 78~119 Q. Q.Qin, Z. K. Yang: Applied wavelet analysis, Xi’an Electronic Science and Technology University Publishing House, Xi’an:,.R.China( 1994)41~53 Wickerhauser M V: Adapted wavelet analysis from theory to software. New York : SIAM , (1994) 442~462 Vaidyanathan P P, Huong P Q: Lattice structures for optimal design and robust implementation of two channel perfect-reconstruction QMF banks. IEEE Trans. On ASSP,1(1998)81~94 Jian Ping Li: Wavelet analysis & signal processing-----theory, applications & software implementations, Chongqing Publishing House, Chongqing (1997)96~101,282~298
320
7.
Jian Ping Li and Yuan Yan Tang
Jian Ping Li, Yuan Yan Tang. Applications of wavelet analysis method. Chongqing University Publishing House, Chongqing, P.R.China(1999)72~91 8. Jian Ping Li: Studies of the theories and applications of vector product wavelet transforms and wavelet analysis.[Ph.D. Thesis].In: Chongqing University, P.R.China(1998) 9. Yuan Yan Tang, et al: Wavelet theory and its application to pattern recognition. Singapore: World Scientific( 1999) 10. Jian Ping Li, Yuan Yan Tang: A novel method on fast wavelet analysis algorithm(I). Computer Science,5 ( 2001) 11. Jian Ping Li, Yuan Yan Tang: A novel method on fast wavelet analysis algorithm(II). Computer Science, 6(2001) 12. Jian Ping Li, Yuan Yan Tang: Analytic construction of wavelet filters based on trigonometric functions. IEEE Trans. on Information Theory (to appear)
A Design of Automatic Speech Playing System Based on Wavelet Transform Yishu Liu, Jinyu Cen, Qian Sun, and Lihua Yang Department of Scientific Computing and Computer Applications Zhongshan University, Guangzhou 510275, P. R. China
Abstract. This paper introduces a novel approach to store speech words after cutting the signals and decomposing them through Mallat’s decompostion algorithm, and generate a speech phrase by connecting such word data and reconstructing it through Mallat’s reconstruction algorithm. This way, speech signals of good quality can be produced easily from a small library of compressed speech words. Keywords: speech signal processing, wavelet basis, Mallat’s algorithm.
1
Introduction
Automatic Speech Playing plays an important role in speech signal processing. It is widely applied in many modern business automatic processing, such as electronic business, ATA (automatic time announcing) system, banks’ ATM system, IP telephone card service, etc.. An automatic speech playing system consists of a speech library and a speech connecting algorithm. Generally speaking, a speech library must be as small as possible and it must be guaranteed that the library has great ability in generating every kind of phrases and sentences. In the light of different practical application, accordingly we can choose the most basic speech units (in Chinese they are Chinese words’ speech data) to build an speech library. In order to reduce the volume of the library, its data can be compressed provided the result is acceptable. In this paper, we extract single Chinese word speech’s main part, which is then decomposed twice by wavelet decomposition algorithm. An approximation of the original speech obtained this way is made an element of the basic speech library. The library so built is very small. In phrase/sentence generating, the basic speech units are put together in proper order, then form a playable speech signal through wavelet reconstruction algorithm. Wavelet reconstruction makes the signals smoother, so that the audio effect is good. And this algorithm is potent in phrase/sentence generating. As a significant breakthrough of Fourier Analysis, wavelet has attached much attention in many fields from applied mathematics to signal processing. The idea of multi-resolution analysis underlying wavelet theory makes it possible to get the signals at different scales whose lengths are reduced by half successively. By
This work was supported by the Foundation for University Key Teacher by the Ministry of Education of China, NSFC(19871095) and GPNSFC(990227).
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 321–325, 2001. c Springer-Verlag Berlin Heidelberg 2001
322
Yishu Liu et al.
neglecting the details, we can get an approximation of the signal, whose length is greatly shortened but whose main characteristics remain. Some basic facts on wavelets will be stated in the next section. A textbook-like reference on speech signal processing can be found in [4].
2
Some Basic Facts on Wavelets
Wavelet analysis has become an effective mathematical tool to process signals locally on time and frequency, which was developed more than ten years ago. With a wavelet basis, the space L2 (R) can be decomposed into the orthogonal sum of a sequence of closed spaces. That means, any signal with finite energy can be expressed as the sum of a series of local frequency structures. Such a decomposition is convenient for signal compression and smoothing. In this section, some basic facts on wavelet theory used in this paper are stated without proofs. A textbook-like introduction on wavelet theory can be found in [1,2,3]. Let L2 (R) be the space of all the finite energy signals, i.e., ∞ 2 2 |f (t)| dt < ∞ . L (R) = f (t) : −∞
The well-known Mutliresolution Analysis (MRA) is defined as follows. Definition 1 Let φ(x) ∈ L2 (R). The sequence of closed subspaces of L2 (R) which are defined by Vj = {φj,k (x) = 2j/2 φ(2j x − k), k ∈ Z},
(j ∈ Z),
is called an orthonormal Mutliresolution Analysis (MRA) of L2 (R) if the following there conditions are satisfied: 1) Vj ⊆ Vj+1 , (∀j ∈ Z); 2) j∈Z Vj = L2 (R), j∈Z Vj = {0}; 3) {φ(t − k)}k∈Z is an orthonormal basis of V0 . Then φ(t) is said to be the corresponding scaling funtion of the MRA. It can be easily derived that {φj,k (t) = 2j/2 φ(2j t − k)}k∈Z is an orthonormal basis of Vj . For any j ∈ Z, we let Vj = Vj−1 ⊕ Wj−1 , where Wj−1 is the orthogonal complement of Vj−1 in Vj . We can find a wavelet function ψ(t) ∈ W0 such that {ψj,k (t) = 2j/2 ψ(2j t − k)}k∈Z is an orthonormal basis of Wj . Wj+1 ⊕ Wj+2 ⊕ ....... Therefore, for any j ∈ Z, we have L2 (R) = Vj ⊕ Wj ⊕ It can thus be inferred that ∀f (t) ∈ L2 (R), f (t) = Aj + Dn , where n>j
Aj = an (j) · φj,n , an (j) = f, φj,n n Dj = dn (j) · ψj,n , dn (j) = f, ψj,n n
A Design of Automatic Speech Playing System Based on Wavelet Transform
323
Aj is said to be the approximation of f(t) at 2j scale, it reflects the main information of f(x). And Dj is said to be the details of f(t) between scales 2j+1 and 2j . From Vj = Vj−1 ⊕ Wj−1 , we know that φj−1,0 and ψj−1,0 can be expanded in {ϕj,n }n∈Z to obtain scale equations as follows:
h(n)φj,n (t) , ψj−1,0 (t) = g(n)φj,n (t) , φj−1,0 (t) = n
n
where 1 t h(n) = 2− 2 φ( ), φ(t − n) , 2
1 t g(n) = 2− 2 ψ( ), φ(t − n) = (−1)1−n h(1 − n). 2
And therefore, S. Mallat’s decomposition and reconstruction algorithms are obtained: Algorithm 2 Decomposition: ak (j − 1) = h(n − 2k)an (j) n , dk (j − 1) = g(n − 2k)an (j)
(k ∈ Z).
n
Reconstruction:
ajk = h(k − 2n)an (j − 1) + g(k − 2n)dn (j − 1), n
3
(k ∈ Z).
n
The Construction of Basic Speech Library
The authors of the paper recorded some speech words. After our surveying and analyzing their wave forms in MatLab, a conclusion was arrived at: the waves are all composed of smooth parts and steep ones. Such an example is shown in the left of Fig.1. It is the speech data of Chinese pronunciation “jiu” of “nine”. In order to extract the signal’s main information (the steep part), an algorithm is presented as follows: (l)
1. For each speech signal f[i] , where l=0,1,2,......and i=0, 1, 2,..., Nl − 1 (Nl is an integer), do (a) Divide the interval [0, Nl − 1] equally into M parts; (b) Calculate num[j], j=0, 1, 2,..., M-1, where num[j] denotes the number (l) (l) of f[i] which satisfies: i belongs to the j’th part and f[i] ≥ ε; (c) Find the maximum among num[j], j=0, 1,..., M-1. Suppose it is num[maxj]. To select j’s around maxj such that num[j] ≥ δ ×num[maxj] and all the selected j’s are adjoining. Suppose they are j0 , j1 , ..., jk . (l) (l) N (d) Let=g[i] = f[i+ N j ] , i=0, 1, 2,..., M (jk − j0 + 1) − 1. M
0
2. Put all g (l) together in proper order to form a new signal.
324
Yishu Liu et al.
1
1
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0
0
−0.2
−0.2
−0.4
−0.4
−0.6
−0.6
−0.8
−0.8
−1
0
0.5
1
1.5
2
2.5
−1
3
0
0.5
1
1.5
2
2.5
4
3 4
x 10
x 10
Fig. 1. Left: speech signal of “jiu”, Right: the extracted and decomposed signal 1
0.8
0.6
0.4
0.2
0
−0.2
−0.4
−0.6
−0.8
−1
0
0.5
1
1.5
2
2.5
3 4
x 10
Fig. 2. The connected speech signal
The signal extracted still has some redundance. To save speech data efficiently, we further decompose it twice with Mallat’s decomposition algorithm with respect to Daubechies 10. Then the result is stored into the library as a basic element. The right of Fig.1 is an example for Chinese word “jiu”.
4
Connection and Reconstruction of Speech Signals
When the system is assigned to play a phrase such as the Chinese phrase of nine dollars and two cents:“jiu yuan er jiao”, four words will be chosen to connect the phrase which is then reconstructed by Mallat’s algorithm with respect to Daubechies 10. It can be shown in mathematics that such a reconstruction can smooth the connection points of a signal to obtain a better audio effect.
A Design of Automatic Speech Playing System Based on Wavelet Transform
325
The experiment is shown in Fig. 2. It has been played and shown to be audio acceptable.
References 1. C. K. Chui. An Introduction to Wavelets. Academic Press, Boston, 1992. 322 2. I. Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathemathics, Philadelphia, 1992. 322 3. S. Mallat. Wavelet Tour of Signal Processing. Academic Press, San Diego, USA, 2nd edition, 1998. 322 4. L. Rabiner and B. H. Juang. Fundamentals of Speech Recognition. Prentice-Hall International, Inc., 1993. 322
General Design of Wavelet High-Pass Filters from Reconstructional Symbol Lihua Yang1 , Qiuhui Chen1 , and Yuan Y. Tang2 1 2
Department of Scientific Computing and Computer Applications Zhongshan University, Guangzhou 510275, P. R. China Department of Computer Science, Hong Kong Baptist University
Abstract. For given reconstructional low-pass filters, the general solu˜ (ξ)M ∗ (ξ) = I for the construction of orthogtions of matrix equation M onal or biorthogonal wavelet filter banks are presented. Keywords: matrix equation, MRA, wavelets, filter.
1
Introduction
In the theory of wavelet Analysis, it is well-known that the key to construct an orthogonal wavelet base or a pair of biorthogonal wavelet bases from MRA (Multiresolution Analysis) is to design the filter banks {m ˜ µ (ξ), mµ (ξ) |µ ∈ Ed },with Ed being the set of all the vertices of [0, 1]d, such that ˜ (ξ) := (m M ˜ µ (ξ + πν))µ,ν∈Ed , satisfy:
M (ξ) := (mµ (ξ + πν))µ,ν∈Ed ,
˜ (ξ) = M ˜ (ξ)M ∗ (ξ) = I2d M ∗ (ξ)M
a.e. ξ ∈ T d .
(1) (2)
Usually, the question on the solutions of (2) can be described as follows: Question 1 Assume m0 (ξ), the filter function of a MRA, is given. We are ˜ µ (ξ) ∈ L∞ (T ) (µ ∈ Ed \{0}) such needed to construct m ˜ 0 (ξ), mµ (ξ), m that (2) holds. ˜ 0 (ξ), the filter functions of a pair of biorthogQuestion 2 Assume m0 (ξ) and m onal MRAs, are given. We need to construct mµ (ξ), m ˜ µ (ξ) ∈ L∞ (T ) (µ ∈ Ed \{0}) such that (2) holds. It is essentially the problem of matrix extension which can be solved by constructing a or a pair of particular solution(s), or constructing all the possible solutions. Up till now, many results have been developed on the construction of wavelet bases from MRAs mainly by constructing a or a pair of particular solution(s) of (2)(see [1,2,6,3]). In this paper, we present all analytic solutions of (2) based a special solution. We also design an algorithm to get a special solution for the matrix equation (2) and illustrate some examples to verify our results.
This work was supported by the Foundation for University Key Teacher by the Ministry of Education of China, NSFC(19871095) and GPNSFC(990227).
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 326–330, 2001. c Springer-Verlag Berlin Heidelberg 2001
General Design of Wavelet High-Pass Filters
2 2.1
327
Main Results The general solutions of matrix equation (2)
We first restate the Question 1 and 2 by polyphase factorization of filters. We ˜ µ,τ (ξ) the polyphase components of mµ (ξ), m ˜ µ (ξ) if call respectively mµ,τ (ξ), m mµ,τ (2ξ)e−iτ ξ and m ˜ µ (ξ) = m ˜ µ,τ (2ξ)e−iτ ξ . mµ (ξ) = τ ∈Ed
τ ∈Ed
Let (ν0 , · · · , νs ) be a permutation of Ed with s = 2d − 1 and ν0 = 0 ∈ Ed . ˜ For simplicity, we denote m(ξ) := (m ˜ µ,ν (ξ))µ,ν∈Ed , m(ξ) := (mµ,ν (ξ))µ,ν∈Ed ,. Then the filters mµ , m ˜ µ and their polyphase components have the following relations ˜ (ξ) = m(2ξ)E(ξ) ˜ M (ξ) = m(2ξ)E(ξ), M (3) −iν ·(ξ+πν ) s k . It is easy with Vandemonde matrix E(ξ) defined by E(ξ) = e j j,k=0 to conclude that (2) leads to ∗ ˜ (ξ) = I, 2d m(ξ)m
a.e. ξ ∈ T d .
(4)
˜ (ξ), M (ξ) satisfying (2) is equivalent It is an well-known that constructing M d ˜ to constructing 2πZ -periodic matrices m(ξ), m(ξ) satisfying (4). An equivalent discription of Question 1,2 is stated as follows: Question 1 Assume the first row of a matrix m(ξ) is given and satisfies 2 d ν∈Ed |m0ν (ξ)| = 0, a.e. ξ ∈ T . We are needed to construct the other d ˜ rows and the 2πZ − periodic matix m(ξ) such that (4) holds. ˜ Question 2 Assume the first rows of matrixes m(ξ) and m(ξ) are given and satisfy: 2d m ˜ 0,ν (ξ)m ¯ 0,ν (ξ) = 1, a.e. ξ ∈ T d . ν∈Ed
We are needed to construct the other rows such that (4) holds. Let e0 be the column vector whose first element equals to 1 and others being 0s. The following notations will be used in the following theorem. ∆d (ξ) := |m0 (ξ/2 + πν)|2 = 2d |m0,ν (ξ)|2 ; (5) ν∈E d
˜d (ξ) := ∆
ν∈E d 2
|m ˜ 0 (ξ/2 + πν)| = 2
ν∈E d
2d/2 U (ξ)e0 = mt (ξ)e0 ∆d (ξ) 2d/2 ˜ ˜ t (ξ)e0 U(ξ)e m 0 = ˜d (ξ) ∆
d
|m ˜ 0,ν (ξ)|2 .
(6)
ν∈E d
a.e. ξ ∈ T d .
(7)
a.e.ξ ∈ T d
(8)
328
Lihua Yang et al.
Theorem 1. Suppose that 2πZ d -periodic functions {m0,ν (ξ)}ν∈E d satisfy 2 d and U (ξ) is a 2πZ d -periodic unitary maν∈Ed |m0,ν (ξ)| = 0 a.e. ξ ∈ T d ˜ trix satisfying (7). Then the 2πZ -periodic solutions m(ξ) and m(ξ) of (4) can be expressed as: m(ξ) = m0,0 (ξ) · t· · m0,νt s (ξ) , a.e. ξ ∈ T d; (c(ξ), A (ξ))U (ξ) t (ξ)A−1 (ξ) −d/2 1 −c ˜ = √2 U t (ξ), a.e. ξ ∈ T d. m(ξ) ∆d (ξ) 0(2d −1)×1 2−d/2 ∆d (ξ) A−1 (ξ) (9) d -periodic functions { m ˜ (ξ)} d satisfying If there exist 2πZ 0,ν ν∈E ˜ 0,ν (ξ)m ¯ 0,ν (ξ) = 1 a.e. ξ ∈ T d , then the solution of (4) with the 2d ν∈Ed m ˜ first rows of m(ξ) and m(ξ) being (m0,ν (ξ))ν∈Ed and (m ˜ 0,ν (ξ))ν∈Ed respectively can be expressed as follows: m0,0 (ξ) · · · m0,νs (ξ) , a.e. ξ ∈ T d ; m(ξ) = (c(ξ), At (ξ)) U t (ξ) (10) m ˜ 0,0 (ξ) · · · m ˜ 0,νs (ξ) d m(ξ) ˜ = , a.e. ξ ∈ T (0(2d −1)×1 , 2−d A−1 (ξ))U t (ξ) where A(ξ) is a 2πZ d −periodic nonsingular matrix of order 2d − 1 and ¯˜ 0,0 (ξ) m .. c(ξ) = 2d/2 ∆d (ξ) [0, At (ξ)]U t (ξ) . . ¯˜ 0,νs (ξ) m ˜ (ξ) is a 2πZ d -periodic unitary matrix satisfying (8), then Furthermore, if U m(ξ) can be rewritten as follows: m0,0 (ξ) · · · m0,νs (ξ) (11) m(ξ) = ˜ t (ξ) . (0, At (ξ)L(ξ)) U where
¯˜ (ξ) ¯˜ (ξ)e et U t (ξ) U L(ξ) = (0, I2d −1 ) U t (ξ) I2d − ∆d (ξ)∆˜d (ξ)U 0 0
0
I2d −1 (12) ˜ 0,ν (ξ) = m0,ν (ξ) a.e. ξ ∈ is a nonsingular matrix of order 2d −1. Particularly, if m Td (∀ν ∈ Ed ), we have L(ξ) = I2d −1 . 2.2
The Construction of Unitary Matrix U (ξ)
This section focuses on the construction of the 2πZ d - periodic unitary matrix U (ξ) which satisfies (7) for a given nonzero vector m0 (ξ) := (m0,0 (ξ), · · · , m0,νs (ξ))t . For simplicity, we denote |m0ν (ξ)|2 = 2−d/2 ∆d (ξ). (13) m0 (ξ) := ν∈Ed
General Design of Wavelet High-Pass Filters
329
Theorem 2. Let m0 (ξ) = 0 and m00 (ξ) = |m00 (ξ)|e−iθ(ξ) a.e. ξ ∈ T . Then, U (ξ), which is defined by m (ξ) 00 m0ν1 (ξ) 1 U (ξ) = .. m0 (ξ) .
m0ν1 (ξ)
m0νs (ξ) with
··· M1 (ξ)
M1 (ξ) =
e m0 (ξ) + |m00 (ξ)| −m0 (ξ)e
−iθ(ξ)
(14)
m0ν1 (ξ)m0ν1 (ξ) .. .
···
m0ν1 (ξ)m0νs (ξ) .. .
m0νs (ξ)m0ν1 (ξ)
···
m0νs (ξ)m0νs (ξ)
−iθ(ξ)
m0νs (ξ) ,
I2d −1 ,
is a unitary matrix satisfying (7). Furthermore, If there exists constant c > 0 such that m0 (ξ) > c (∀ξ ∈ T d ), then U (ξ) is smooth as m0 (ξ) and θ(ξ). 2.3
Examples
For simplicity, we denote x := e−iξ1 , y := e−iξ2 . It can be verified that the following polynomials 1 1 1 −1 1 y 1 1 2 3 1 −1 1 m0 (ξ1 , ξ2 ) = (1, x, x , x ) 1 1 −1 1 y 2 8 −1 1 1 1 y3 satisfies |m0 (ξ1 , ξ2 )|2 + |m0 (ξ1 + π, ξ2 )|2 + |m0 (ξ1 , ξ2 + π)|2 + |m0 (ξ1 + π, ξ2 + π)|2 = 1. The unitary matrix in Theorem 2.2 is U (ξ1 , ξ2 ) := 1 + x + y − xy 1 1 − x + y + xy 4 1 + x − y + xy −1 + x + y + xy
1 − x + y + xy 1 + x + y − xy 1 − x − y − xy −1 − x + y − xy
1 + x − y + xy 1 − x − y − xy 1 + x + y − xy −1 + x − y − xy
−1 + x + y + xy −1 − x + y − xy −1 + x − y − xy 1 + x + y − xy
330
Lihua Yang et al.
which leads to the following three high-pass filters 1 1 1 −1 1 1 −1 1 1 y m1 (ξ1 , ξ2 ) = 18 (1, x, x2 , x3 ) −1 −1 1 −1 y 2 3 1 −1 −1 −1 y 1 1 −1 1 1 1 −1 −1 −1 y m2 (ξ1 , ξ2 ) = 18 (1, x, x2 , x3 ) 1 1 1 −1 y 2 3 −1 1 −1 −1 y −1 −1 1 −1 1 −1 1 1 1 y 1 2 3 m3 (ξ1 , ξ2 ) = 8 (1, x, x , x ) 1 1 1 −1 y 2 y3 −1 1 −1 −1 Since there are different choices for the unitary matrix A(x, y), we can also construct other high-pass filter for the given low-pass filter m0 (ξ1 , ξ2 ). For example, for the unitary matrix 0 √y2 0, A(x, y) = 0 xy x √ √y 0 − 2 2 √x
2
we can get another three high-pass filters corresponding to the low-pass filter m0 (ξ1 , ξ2 ). We omit the details here.
References 1. A. Cohen,I.Daubechies,and J. C.Feauveau, Biorthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math., 45:485-560,1992. 326 2. K.Grochning, Analyse multi-echelle et bases d’ondelettes, Acad. Sci. Paris., Serie 1,305:13-17,1987. 326 3. R. Q. Jia and C. A. Micchelli, Using the refinement equation for the construction of pre-wavelets V:extensibility of trigonometric polynomials, Computing, Vol. 48, 61-72,1992. 326 4. D. X. Zhou, Construction of real-valued wavelets by symmetry, preprint. 5. S. Mallat, Review of Multifrequency Channel Decomposition of Images and Wavelet Models, Technical report 412, Robotics Report 178, New York Univ., (1988). 6. C. A.Micchelli and Yuesheng Xu, Reconstruction and decomposition algorithms for biorthogonal multiwavelets, Multidimensional Systems and Signal Processing 8, 31-69,1997. 326
Realization of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure Wing-kuen Ling and Peter Kwong-Shun Tam
Department of Electronic and Information Engineering The Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong Hong Kong Special Administrative Region, China Tel: (852) 2766-6238, Fax: (852) 2362-8439 Email: [email protected]
Abstract. It is well known that a tree structure filter bank can be realized via a non-uniform filter bank, and perfect reconstruction is achieved if and only if each branch of the tree structure can provide perfect reconstruction. In this paper, the converse of this problem is studied. We show that a perfect reconstruction non-uniform filter bank with decimation ratio {2,4,4} can be realized via a tree structure and each branch of the tree structure achieves perfect reconstruction.
1
Introduction It is well known that the tree structure filter bank shown in figure 1b can be realized via a non-uniform filter bank shown in
figure 1a, and perfect reconstruction can be achieved if and only if each branch of the tree structure can provide perfect reconstruction [1-4]. However, is the converse true? That is, given any perfect reconstruction non-uniform filter shown in figure 1a, can it be realized via a tree structure shown in figure 1b? In general, a perfect reconstruction non-uniform filter bank cannot be realized by a tree structure [11]. This paper works on this problem. There are some advantages of realizing a non-uniform filter bank via a tree structure, such as reducing the filter length in the filters [5], and improving the computation complexity and implementation speed [5]. In section II, we show how a perfect reconstruction non-uniform filter bank can be converted to a tree structure filter bank. Some illustrative examples are demonstrated in section III. Finally, a conclusion is given in section IV.
2
Realization of Non-uniform Filter Bank Via a Tree Structure
Theorem 1 A non-uniform filter bank with decimation ratio { 2,4,4} achieves perfect reconstruction if and only if it can be realized via a tree structure and each branch of the tree structure achieves perfect reconstruction. Proof: Since the if part was well known [1-4], we only prove the only if part. A non-uniform filter bank shown in figure 1a achieves perfect reconstruction if and only if ∃c∈C and ∃m∈Z such that:
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 331-335, 2001. c Springer-Verlag Berlin Heidelberg 2001
332
Wing-kuen Ling and Peter Kwong-Shun Tam 1 ⋅ H 0 (z ) 2 0 1 2 ⋅ H 0 (z ⋅W ) 2 0
1 ⋅ H (z ) 4 1 1 ⋅ H (z ⋅W ) 4 1 1 ⋅ H1 (z ⋅ W 2 ) 4 1 ⋅ H (z ⋅ W 3 ) 4 1
1 ⋅ H (z ) 4 2 c⋅ zm 1 ⋅ H 2 (z ⋅ W ) G0 (z ) 0 . 4 ⋅ G1 ( z) = 1 ⋅ H (z ⋅ W 2 ) G (z ) 0 2 4 2 0 1 ⋅ H 2 (z ⋅ W 3 ) 4
(1)
This directly implies that: H (z ⋅ W ) H 2 (z ⋅ W ) =0. det 1 3 3 H 1 (z ⋅ W ) H 2 (z ⋅ W )
(2)
H1 (z ) H (z ⋅ W ) H 2 (z ⋅ W ) H 2 ( z) =0, = 0 ⇒ det det 1 2 2 3 3 H 1 (z ⋅ W ) H 2 (z ⋅ W ) H 1 (z ⋅ W ) H 2 (z ⋅ W )
(3)
Since
hence 1 ⋅ H 0 (z ) 2 0 1 2 ⋅ H 0 (z ⋅W ) 2 0
1 ⋅ H (z ) 4 1 1 ⋅ H (z ⋅W ) 4 1 1 ⋅ H1 (z ⋅ W 2 ) 4 1 ⋅ H (z ⋅ W 3 ) 4 1
1 ⋅ H 2 (z ) 4 c⋅ zm 1 ⋅ H 2 (z ⋅ W ) G0 (z ) 0 4 ⋅ G1 ( z) = 1 ⋅ H 2 (z ⋅ W 2 ) G (z ) 0 2 4 0 1 ⋅ H 2 (z ⋅ W 3 ) 4
H1 (z ) H 2 ( z) H 2 ( z) and det H 0 (z ) ≠ 0. ⇒ det ≠ 0 H 0 (z ⋅ W 2 ) H 2 (z ⋅ W 2 ) H1 (z ⋅ W ) H 2 (z ⋅ W )
(4)
The converse is also true. That is if: H1 (z ) H 2 (z ) H 0 (z ) H 2 ( z) , and det H1 (z ⋅ W ) H 2 (z ⋅ W ) = 0 , det ≠ 0 det H (z ⋅ W 2 ) H (z ⋅ W 2 ) ≠ 0 H (z ⋅ W 3 ) H (z ⋅ W 3 ) 2 2 H 1 ( z ⋅W ) H 2 (z ⋅ W ) 0 1
then there exist G0 (z), G1(z) and G2 (z) such that: 1 ⋅ H 0 (z ) 2 0 1 2 ⋅ H 0 (z ⋅W ) 2 0
1 ⋅ H (z ) 4 1 1 ⋅ H 1( z ⋅ W ) 4 1 ⋅ H (z ⋅ W 2 ) 4 1 1 ⋅ H (z ⋅ W 3 ) 4 1
1 ⋅ H (z ) 4 2 c⋅ zm 1 ⋅ H 2 (z ⋅ W ) G0 (z ) 0 . 4 ⋅ G1 ( z) = 1 ⋅ H (z ⋅ W 2 ) G (z ) 0 2 4 2 0 1 ⋅ H (z ⋅ W 3 ) 4 2
(5)
Let H (z ) = z −l ⋅ E (z 4 ) , for i=0,1,2, then from equation (2), there exist R(z), R’ (z) and R”(z) such that: ∑ i i ,l 3
l =0
E1,1 (z4)=R(z4)◊E1,0(z4) and E2,1 (z4)=R(z4)◊E2,0(z4) and
(6)
E1,3 (z4)=R’ (z4)◊E1,2 (z4) and E2,3(z4 )=R’ (z4)◊E2,2(z4 ) and
(7)
{R(z4 )=R’ (z4) or {E1,0(z4 )=R”(z4)◊E1,2(z4 ) and E2,0(z4)=R”(z4 )◊E2,2 (z4)}}.
(8)
But E1,0(z4 )=R”(z4)◊E1,2(z4) and E2,0 (z4)=R”(z4 )◊E2,2 (z4) contradict equation (4). Hence, we have R(z4)=R’ (z4). R(z4 )=R’ (z4), which implies:
Realization of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure H 2 (z ) E 2 ,0 (z 4 ) + z −2 ⋅ E 2, 2 (z 4 ) . = H1 (z ) E1, 0 (z 4 ) + z − 2 ⋅ E1, 2 (z 4 )
333
(9)
Hence, there exist F’ 1(z), F0(z), and F1 (z) such that H1 (z)=F’ 1(z)◊F0(z2 ) and H2 (z)=F’ 1(z)◊F1(z2 ), respectively. And the nonuniform filter bank shown in figure 1a can be realized via a tree structure shown in figure 1b. From equation (4), we have: E1,2 (z4)◊E2,0(z4 )-E1,0 (z4)◊E2,2(z4)π0 and
(10)
{E0,1(z4 )◊E2,0 (z4)-E0,0(z4 )◊E2,1 (z4)π0 or
(11)
E0,3 (z4)◊E2,2(z4 )-E0,2 (z4)◊E2,3(z4)π0 or E0,1 (z4)◊E2,2(z4 )+E0,3(z4)◊E2,0(z4 )-E0,0 (z4)◊E2,3(z4 )-E0,2 (z4 )◊E2,1(z4)π0}.
(12) (13)
Let F1 (z2) be the numerator of H2 (z)/H1(z), and F0 (z2) be the denominator of H2 (z)/H1(z), respectively. We have: F1 (z ) F (z ) , and −1 2 2 2 2 det 0 = 2 ⋅ z ⋅ (E1,2 (z )⋅ E 2, 0 (z ) − E1,0 (z )⋅ E 2, 2 (z )) ≠ 0 F0 (− z) F1 (− z )
(14)
F1′(z ) 2 ⋅ z −1 ⋅ (E 0 ,1 (z 4 ) ⋅ E 2 ,0 (z 4 ) − E 0 ,0 (z 4 )⋅ E 2 ,1 (z 4 )) + 2 ⋅ z −5 ⋅ (E 0 ,3 (z 4 )⋅ E 2 ,2 (z 4 ) − E 0 ,2 (z 4 )⋅ E 2,3 (z 4 )) H (z ) = det 0 4 −2 4 H 0 (− z ) F1′(− z ) E 2 ,0 (z ) + z ⋅ E 2 ,2 (z ) 2 ⋅ z −3 ⋅ (E 0 ,1 (z 4 )⋅ E 2 ,2 (z 4 ) + E 0 ,3 (z 4 )⋅ E 2 ,0 (z 4 ) − E 0 ,0 (z 4 )⋅ E 2,3 (z 4 ) − E 0 ,2 (z 4 )⋅ E 2,1 (z 4 )) + 4 −2 4 E 2 ,0 (z ) + z ⋅ E 2 ,2 (z ) ≠ 0.
Hence, each branch of the tree structure achieves perfect reconstruction.
3
Illustrative Examples
3.1
Non-tree Structure Filter Bank
(15)
Consider an example of H0(z)=1+z -1, H1(z)=1-z-1, and H2(z)=z -2, respectively. Since H1 (z)/H2(z)=z 2◊(1-z-1), there does not exist F’ 1(z), F0(z), and F1(z) such that H1(z)=F’ 1(z)◊F0(z2) and H2(z)=F’ 1(z)◊F1(z2 ), respectively. Hence, this non-uniform filter bank cannot be realized via a tree structure. By theorem 1, this non-uniform filter bank does not achieve perfect reconstruction. It is worth to note that by converting the non-uniform filter bank to a uniform filter bank shown in figure 2 [6-10], perfect reconstruction can be achieved. However, G’ -1(z)πz-2◊G’ 0(z), this implies that the corresponding synthesis filter G0(z) shown in figure 1a is time varying.
3.2
Tree Structure Filter Bank
Consider another example with H0(z)=2◊(1+z -1 +z-2 +z-3), H1(z)=4◊(2+6◊z-1 +4◊z-2+12◊z-3), and H2 (z)=4◊(5+15◊z-1 +7◊z2
+21◊z-3), respectively. Since H1(z)/H2(z)=(2+4◊z-2)/(5+7◊z-2 ), there exists F’ 1(z), F0 (z), and F1(z) such that H1 (z)=F’ 1(z)◊F0 (z2) and
H2(z)=F’ 1(z)◊F1(z2), respectively. Hence, this non-uniform filter bank can be realized via a tree structure. It can be checked easily that each branch in the tree structure achieves perfect reconstruction. Hence, this non-uniform filter bank achieves perfect reconstruction.
334 4
Wing-kuen Ling and Peter Kwong-Shun Tam Conclusion In this paper, we show that a non-uniform filter bank with decimation ratio {2,4,4} achieves perfect reconstruction if and
only if it can be realized via a tree structure and each branch of the tree structure achieves perfect reconstruction. The advantage of realizing a non-uniform filter bank via a tree structure is to reduce the computation complexity and provide a fast implementation for a non-uniform filter bank [5].
Acknowledgement The work described in this paper was substantially supported by a grant from the Hong Kong Polytechnic University with account number G-V968.
References 1.
Vaidyanathan P. P.: Lossless Systems in Wavelet Transforms. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 1. (1991) 116-119.
2.
Soman K. and Vaidyanathan P. P.: Paraunitary Filter Banks and Wavelet Packets. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 4. (1992) 397-400.
3.
Sodagar I., Nayebi K. and Barnwell T. P.: A Class of Time-Varying Wavelet Transforms. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 3. (1993) 201-204.
4.
Soman A. K. and Vaidyanathan P. P.: On Orthonormal Wavelets and Paraunitary Filter Banks. IEEE Transactions on Signal Processing, Vol. 41, No. 3. (1993) 1170-1183.
5.
Vaidyanathan P. P.: Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.
6.
Hoang P. Q. and Vaidyanathan P. P.: Non-Uniform Multirate Filter Banks: Theory and Design. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 1. (1989) 371-374.
7.
Li J., Nguyen T. Q. and Tantaratana S.: A Simple Design Method for Nonuniform Multirate Filter Banks. Conference Record of the Twenty-Eight Asilomar Conference on Signals, Systems and Computers, Vol. 2. (1995) 1015-1019.
8.
Makur A.: BOT’ s Based on Nonuniform Filter Banks. IEEE Transactions on Signal Processing, Vol. 44, No. 8. (1996) 1971-1981.
9.
Li J., Nguyen T. Q. and Tantaratana S.: A Simple Design Method for Near-Perfect-Reconstruction Nonuniform Filter Banks. IEEE Transactions on Signal Processing, Vol. 45, No. 8. (1997) 2105-2109.
10. Omiya N., Nagai T., Ikehara M. and Takahashi S. I.: Organization of Optimal Nonuniform Lapped Biorthogonal Transforms Based on Coding Efficiency. IEEE International Conference on Image Processing, ICIP, Vol. 1. (1999) 624-627. 11. Akkarakaran S. and Vaidyanathan P. P.: New Results and Open Problems on Nonuniform Filter-Banks. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 3. (1999) 1501-1504.
Realization of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure
x[n]
H 0 ( z)
↓2
↑2
G0 (z)
H 1 ( z)
↓4
↑4
G1 ( z)
H 2 (z)
↓4
↑4
G2 (z )
y[n]
(a) H 0 ( z)
F0 ( z)
x[n]
F1′(z )
↑2
G0 (z)
↓2
↑4
G1 ( z)
↓2
↑4
G2 (z )
↓2
y[n]
↓2
F1 (z ) (b)
Fig. 1. (a) Non-uniform filter bank (b) Tree structure filter bank
z 2 ⋅ H0 (z )
↓4
↑4
G−′1 (z )
H 0 ( z)
↓4
↑4
G0′ (z) y[n]
x[n]
H 1 ( z)
↓4
↑4
G1 ( z)
H 2 (z)
↓4
↑4
G2 (z )
Fig. 2. Realization of non-uniform filter bank via a uniform filter bank
335
Set of Decimators for Tree Structure Filter Banks Wing-kuen Ling and Peter Kwong-Shun Tam
Department of Electronic and Information Engineering The Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong Hong Kong Special Administrative Region, China Tel: (852) 2766-6238, Fax: (852) 2362-8439 Email: [email protected]
Abstract. In this paper, we propose a novel method to test if a set of decimators can be generated by a tree structure filter bank. The decimation ratio is first sorted in an ascending order. Then we group the largest decimators with the same decimation ratio together and form a new set of decimators. A set of decimators can be generated by a tree structure filter bank if and only if by repeating the above procedure, all the decimators can be grouped together. Some examples are illustrated to show that the proposed method is simple and easy to implement.
1
Introduction Non-uniform filter banks have taken an important role in this decade and they are widely applied in the area of digital
image compression [3, 6, 7, 9, 13]. By realizing a non-uniform filter bank in a tree structure [1, 2, 4, 5, 10-12], the filter lengths in the filters can be reduced, improving the computation complexity and the implementation speed [15]. However, not all the non-uniform filter banks can be realized via a tree structure [5, 8, 10-12]. This paper is to propose a method to test if a set of decimators can be generated by a tree structure filter bank. In order to tackle this problem, a method to compute the number of combinations of sub-trees is proposed [8]. However, if the number of decimators is large, it is very complicated to compute the number of combinations of sub-trees. Also, this method is order dependent, which will give a wrong result by changing the order of the decimators in the set [8].
2
Proposed Algorithm
Theorem 1
Let the ordered set of decimators {n 0,º,n 0,n 1,º,n 1,º,n N-1,º,n N-1} be D , where n i>n j for i>j, and the multiplicity of n i in D be p i. By grouping the largest decimators with the same decimation ratio together and forming a new set of decimators, a set of decimators can be generated by a tree structure filter bank if and only if by repeating the above procedures, all the decimators can be grouped together. Proof:
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 336-340, 2001. c Springer-Verlag Berlin Heidelberg 2001
Set of Decimators for Tree Structure Filter Banks
337
Consider the only if part first. If D can be generated by a tree structure filter bank, then there should be no branch coming out from the decimators n N-1. Otherwise, n N-1 is not the greatest number in D . Hence, by grouping some or all of the decimators with decimation ratio n N-1 together, the branch corresponding to the grouped decimators is removed. Suppose k N-1 decimators are grouped together, where 2£k N-1£p N-1, then the effective decimation ratio corresponding to the grouped decimators is n N-1/k N-1. And the new set of the decimators become {n 0,º,n 0,n 1,º,n 1,º,n N-2,º,n N-2,n N-1/k N-1,n N-1,º,n N-1}. If D can be generated by a tree structure filter bank, then ∃n i∈D such that n i=nN-1/k N-1. Hence, by repeating the above procedure, all the branches are removed and eventually there is only one decimator left in D , which is 1. And this proves the only if part. For the if part, since ∃n i ∈D such that n i=nN-1/k N-1, we can construct a sub-tree corresponding to those k N-1 channels. By repeating the above procedure, the non-uniform filter bank can be realized in the form of a tree structure. Hence, this proves the
if part and the theorem.
Some problems are: When should we group all of the decimators with decimation ratio n N-1 together, that is k N-1=pN-1? When should we group part of them together, that is k N-1
3
Illustrative Examples It is well known that {2,4,4}, {2,6,6,6} can be generated by a tree structure [14]. By applying our proposed algorithm, it
gives the same conclusion as below: Original Decimators 4 4 2 Original Decimators 6 6 6 2
First Grouping 2
Second Grouping 1
2 First Grouping 2
Second Grouping 1
2
Since all the decimators are grouped together, the above two sets of decimators can be generated by a tree structure. Consider another example as follows:
338
Wing-kuen Ling and Peter Kwong-Shun Tam Original First Second Third Decimators Grouping Grouping Grouping 144 48 144 144 48 48 48 48 32 32 32 16 32 32 32 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 8 8 8 8 4 4 4 4 4 4 4 4
After the third grouping, we have six decimators with the decimation ratio 16 in the set. Since 16/6 is not an integer, we split those six decimators into two sessions. The first session contains four decimators and the other session contains two decimator. The proposed algorithm then gives: Third Fourth Fifth Sixth Seventh Grouping Grouping Grouping Grouping Grouping 16 16 16 16 16 16 8 4 1 16 16 8 8 8 4 4 4 4 4 4 4 4 4 4 4
Since all the decimators are grouped together, this set of decimators can be generated by a tree structured filter bank. There is another way to split those six decimators with decimation ratio 16 into different sessions, which is to split into three sessions, in which each session contains two decimators. And it will give the same conclusion as above. There are some sets of decimators which cannot be generated by a tree structure, such as {2,3,6}, {2,6,10,12,12,30,30} [14]. By applying our proposed algorithm, it gives the same conclusion as below: Original Decimators First Grouping 30 15 30 12 12 12 12 10 10 6 6 2 2
After the first grouping, there is only one decimator with decimation ratio 15 in the set, and we cannot proceed with the proposed algorithm further. Hence, this set of decimators cannot be generated by a tree structure filter bank. Consider the last example with the set of decimators {5,5,5,7,7,35,35,35,35}. Although p N-1π1, there does not exist k N-1, such that 2£k N-1£p N-1. Hence, this set of decimators cannot be generated by a tree structure filter bank.
Set of Decimators for Tree Structure Filter Banks 4
339
Conclusion In this paper, we have proposed a novel method to test if a set of decimators can be generated by a tree structure filter bank.
The proposed method is order independent, simple and easy to implement because there is no need to consider the number of combinations of sub-trees [8]. If all the decimators can be grouped together, then the set of decimators can be generated by a tree structure filter bank.
Acknowledgement The work described in this paper was substantially supported by a grant from the Hong Kong Polytechnic University with account number G-V968.
References 1.
Vaidyanathan P. P.: Lossless Systems in Wavelet Transforms. IEEE International Symposium on Circuits and Systems,
2.
Soman A. K. and Vaidyanathan P. P.: Paraunitary Filter Banks and Wavelet Packets. IEEE International Conference on
3.
Bamberger R. H., Eddins S. L. and Nuri V.: Generalizing Symmetric Extension: Multiple Nonuniform Channels and
ISCAS, Vol. 1. (1991) 116-119.
Acoustics, Speech, and Signal Processing, ICASSP, Vol. 4. (1992) 397-400.
Multidimensional Nonseparable IIR Filter Banks. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 2. (1992) 991-994. 4.
Sodagar I., Nayebi and Barnwell T. P.: A Class of Time-Varying Wavelet Transforms. IEEE International Conference on
5.
Soman A. K. and Vaidyanathan P. P.: On Orthonormal Wavelets and Paraunitary Filter Banks. IEEE Transactions on Signal
Acoustics, Speech, and Signal Processing, ICASSP, Vol. 3. (1993) 201-204.
Processing, Vol. 41, No. 3. (1993) 1170-1183. 6.
Vaidyanathan P. P.: Orthonormal and Biorthonormal Filter Banks as Convolvers, and Convolutional Coding Gain. IEEE
7.
Soman A. K. and Vaidyanathan P. P.: Coding Gain in Paraunitary Analysis/Synthesis Systems. IEEE Transactions on Signal
8.
Kovaèeviæ J. and Vetterli M.: Perfect Reconstruction Filter Banks with Rational Sampling Factors. IEEE Transactions on
9.
Bamberger R. H., Eddins S. L. and Nuri V.: Generalized Symmetric Extension for Size-Limited Multirate Filter Banks.
Transactions on Signal Processing, Vol. 41, No. 6. (1993) 2110-2130.
Processing, Vol. 41, No. 5. (1993) 1824-1835.
Signal Processing, Vol. 41, No. 6. (1993) 2047-2066.
IEEE Transactions on Image Processing, Vol. 3, No. 1. (1994) 82-87. 10. Makur A.: BOT’ s Based on Nonuniform Filter Banks. IEEE Transactions on Signal Processing, Vol. 44, No. 8. (1996) 1971-1981. 11. Li J., Nguyen T. Q. and Tantaratana S.: A Simple Design Method for Near-Perfect-Reconstruction Nonuniform Filter Banks. IEEE Transactions on Signal Processing, Vol. 45, No. 8. (1997) 2105-2109. 12. Akkarakaran S. and Vaidyanathan P. P.: New Results and Open Problems on Nonuniform Filter-Banks. IEEE International
340
Wing-kuen Ling and Peter Kwong-Shun Tam Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 3. (1999) 1501-1504.
13. Omiya N., Nagai T., Ikehara M. and Takahashi S. I.: Organization of Optimal Nonuniform Lapped Biorthogonal Transforms Based on Coding Efficiency. IEEE International Conference on Image Processing, ICIP, Vol. 1. (1999) 624-627. 14. Hoang P. Q. and Vaidyanathan P. P.: Non-uniform Multirate Filter Banks: Theory and Design. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 1. (1989) 371-372. 15. Vaidyanathan P. P.: Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.
Set of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure Wing-kuen Ling and Peter Kwong-Shun Tam
Department of Electronic and Information Engineering The Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong Hong Kong Special Administrative Region, China Tel: (852) 2766-6238, Fax: (852) 2362-8439 Email: [email protected]
Abstract. In this paper, we propose a novel method to test if a non-uniform filter bank can achieve perfect reconstruction via a tree structure. The set of decimators is first sorted in an ascending order. A non-uniform filter bank can achieve perfect reconstruction via a tree structure if and only if some or all of the channels corresponding to the maximum decimation ratio can be grouped into one channel, and the procedure can be repeated until all the channels are grouped together.
1
Introduction Non-uniform filter banks play an important role in this decade and they are widely applied to digital image compression [3,
6, 7, 9, 13]. By realizing a non-uniform filter bank via a tree structure [1, 2, 4, 5, 10-12], the filter length in the filters is reduced, improving the computation complexity and the implementation speed [14]. However, not all the non-uniform filter banks can be realized via a tree structure [5, 8, 10-12]. A method to compute the number of combinations of sub-trees is proposed [8] to test if the decimators in the non-uniform filter bank can be generated by a tree structure. However, even though the decimators can be generated by a tree structure, this does not imply that the non-uniform filter bank can be generated by a tree structure. This is because the analysis filters are ignored in the consideration. In this paper, the necessary and sufficient conditions for a nonuniform filter bank to be realized by a tree structure are addressed. The necessary and sufficient conditions are discussed in section II and illustrative examples are presented in section III. Finally, a conclusion is given in section IV.
2
Necessary and Sufficient Conditions for Realizing a Non-uniform Filter Bank via a Tree Structure Let the ordered set of decimators {n 0,º,n 0,º,n N-1,º,n N-1} be D , where n i>n j for i>j, and the multiplicity of n i in D be p i.
Let the corresponding analysis filters and synthesis filters be
{G (z), L, G 0 ,0
0 , p0 −1
(z ),L, GN −1,0 (z ),L , GN −1 ,p
{
N −1
, respectively. −1 ( z )}
If there exists a set of filters H N′ −1 ( z), H N′ −1 ,k (z ),L , H N′ −1,k 0
K N −1 − 1
{H (z ),L , H 0 ,0
0 , p0 −1
( z), L, H N −1,0 ( z), L, H N −1, p
N −1 − 1
(z )} and
(z )}, where k iŒ[0 p N-1-1] for i=0,1,º,K N-1-1 and KN-1Œ[2 pN-1],
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 341-346, 2001. c Springer-Verlag Berlin Heidelberg 2001
342
Wing-kuen Ling and Peter Kwong-Shun Tam
such that:
n N-1/KN-1ŒZ,
nKN −1 and, H N′ −1 (z ) ⋅ H N′ −1,ki z N −1 = H N −1,k i (z )
(1)
H N′ −1,k 0 (z ) H N′ −1,k 0 (z ⋅ WN −1 ) det M H N′ −1,k 0 z ⋅ W N−1 K N −1 −1
(
where WN −1 = e
−
j⋅ 2 ⋅p K N −1
(2)
H N′ −1,k1 (z )
H N′ −1,k1 (z ⋅ WN −1 ) M K −1 H N′ −1,k1 z ⋅ WN −1 N −1
)
(
H N′ −1,k K N −1 −1 (z )
L
)
H N′ −1,k K N −1 −1 (z ⋅ WN −1 ) M K −1 L H N′ −1,k K −1 z ⋅ W N−1 N −1 L O
(
N −1
, ≠ 0
(3)
)
, then by a proper design of the synthesis filters, those KN-1 channels can be grouped together into one channel
with the analysis filter H N′ −1 (z ) and the decimator ↓n N-1/KN-1. Now, we have a new set of decimators and analysis/synthesis filters. Let the new set of decimators {n’ 0,º,n’ 0,º,n’ N’1
,º,n’ N’ -1} be D ’ and the multiplicity of n’ i in D ’ be p’ i. Let the corresponding analysis/synthesis filters be
{H
new
0 ,0
(z ),L , H new0 , p′ −1 (z ),L , H new N ′−1,0 (z ),L, H new N′−1, p ′
N ′− 1 − 1
0
( z )}
{G
and
new 0 ,0
( z), L, Gnew0 , p′ −1 ( z), L, Gnew N′−1 ,0 (z ),L , G newN′−1, p′
N ′ −1 − 1
0
(z )},
respectively. By repeating the above grouping procedure, if all the channels can be grouped together, and eventually only one channel is left, then the non-uniform filter bank can achieve perfect reconstruction via a tree structure. Theorem 1
A non-uniform filter bank can achieve perfect reconstruction via a tree structure if and only if all the channels can be grouped together by the above grouping procedure. Proof: The if part is proved in the above. Now, let's consider the only if part. Since the non-uniform filter bank can be realized by a tree structure, ∃n i∈D
{H ′
such that n i=nN-1/KN-1, and a set of filters
N −1
(z ), H N′ −1,k (z ),L , H N′ −1,k 0
K N −1 −1
(z )} such that
n N −1 . But do those filters satisfy equation (3)? Or in other words, if some of the analysis filters in a H N′ −1 (z ) ⋅ H N′ −1,ki z KN −1 = H N −1,k i (z )
sub-tree are linearly dependent, does there exist a set of synthesis filters such that the whole system still achieves perfect reconstruction?
Assume
H N′ −1,k 0 (z ) H N′ −1,k 0 (z ⋅ WN −1 ) det M H N′ −1,k 0 z ⋅ W N−1 K N −1 −1
(
H N′ −1,k1 (z) H N′ −1,k1 (z ⋅ WN −1 )
)
M K −1 H N′ −1 ,k1 z ⋅ WN −1 N −1
(
H N′ −1,k K N −1 −1 (z ) , ∃G N −1,k (z ),L , G N −1,k H N′ −1,k K N −1 −1 (z ⋅ WN −1 ) ( z) and a K N −1 − 1 0 =0 O M K −1 L H N′ −1,k K −1 z ⋅ WN −1 N −1 N −1 L L
)
(
)
non-zero transfer function T(z) such that: n H N′ −1 (z ) ⋅ H N′ −1, k z K n Kn K ′ −1 ( z ⋅ W ) ⋅ H N′ −1, k z ⋅ H W N M n Kn K −1 K H N′ −1 z ⋅ W ⋅ H N′ −1, k z ⋅W N −1
N −1
N −1
N −1
N −1
N −1
)
0
N −1
N −1
N −1
N− 1
⋅ (K N −1 −1)
N −1
N −1
N −1
N −1
1
H N′ − 1
(
N −1
)
1
H N′ −1 ( z ) ⋅ H N′ −1, k
L
N −1
1
0
(
n H N′ −1 (z ) ⋅ H N′ −1,k z K n Kn K ⋅W H N′ −1 ( z ⋅ W ) ⋅ H N′ −1, k z M n Kn K −1 K ⋅ H N′ −1, k z ⋅W z ⋅W N −1
N− 1
0
N −1
N −1
N −1
N −1
⋅ (K N −1 −1)
L
H N′ −1 ( z ⋅ W ) ⋅ H N′ −1,k
O L H ′ z ⋅W N −1
(
K
N −1
)⋅ H ′
z
nK z
n N− 1 KN − 1
N −1
K N − 1 −1
M K N − 1 −1
−1
N −1, k K N −1 − 1
nK z
N− 1
⋅W
N −1 N −1
⋅W
n N −1 K N− 1
n N− 1 ⋅( K N − 1 KN − 1
−1)
Set of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure
G N −1, k ( z ) G N −1, k ( z ) = ⋅ M ( ) G z N −1, k 0
1
K N −1 − 1
where W = e
Since
−
T ( z ) 0 , M 0
j⋅ 2⋅ p nN −1
343
(4)
.
H N′ −1,k 0 (z ) H N′ −1,k 0 (z ⋅ WN −1 ) det M H N′ −1,k 0 z ⋅ W N−1 K N −1 −1
(
Kn H N′ −1( z )⋅ H N′ −1, k z n Kn H N′ −1 (z ⋅ W ) ⋅ H N′ −1, k z ⋅W K det M n n H ′ z ⋅ W K −1 ⋅ H ′ ) N −1,k z K ⋅W K N −1( N −1
)
(
H N′ −1,k K N −1 −1 (z )
L
H N′ −1,k K N −1 −1 (z ⋅ WN −1 ) M K −1 L H N′ −1,k K N −1 −1 z ⋅ WN −1 N −1 L O
)
(
Kn H N′ −1( z) ⋅ H N′ −1, k z n nK H N′ −1 (z ⋅ W )⋅ H N′ −1, k z ⋅W K M n −1) n H ′ (z ⋅ W K −1) ⋅ H ′ z K ⋅W K N −1 N −1,k N −1
N− 1
L
N −1
0
1
N −1
N− 1
N −1
N −1
0
N− 1
H N′ −1,k1 (z)
H N′ −1,k1 (z ⋅ WN −1 ) M K −1 H N′ −1 ,k1 z ⋅ WN −1 N −1
N− 1
N −1
N −1
N− 1
N −1
N −1
N− 1
1
⋅( KN −1
0
N −1
N− 1
N− 1
N −1
N −1
N −1
L
⋅ ( K N − 1 −1)
1
n N −1 , by letting z = z KN −1 , we have: = 0
)
H ′N −1( z) ⋅ H ′N −1, k H ′N −1( z ⋅ W ) ⋅ H ′N −1, k
O
L H ′ (z ⋅ W K N −1
N −1
⋅W K
Kn z
⋅W K
nN −1
N −1
K N − 1 −1
)⋅ H ′
N −1
N −1, k K N −1 −1
nK z
N− 1
M N −1 −1
nK z
N− 1
K N −1 − 1
N− 1
N −1
nN −1 N −1
⋅ ( K N −1
. = 0 −1)
(5)
Let the matrix in equation (5) be H. By examining equation (5) and applying Cramer’ s rule to equation (4), we find that the determinants of the matrices by deleting the first row and any columns are zero. By the modulation principle, we find that the determinants of the matrices by deleting the last row and any columns are zero. Let the rank of the matrix by deleting the first row of H be r, and that of the matrix by keeping the first r+1 rows of H be H ′ = [h′ L h′ 0 K
column of H’ and h0,0 are the first r elements of the first row of H’ . Since
] = hh
0 ,0
N − 1 −1
S ,0
h0 ,1 , where h’ is the i th i hS ,1
T (z ) g 0 , where ga is a vector containing the H ′⋅ a = gb M 0
first r synthesis filters, we have h0,0⋅ ga+h0,1⋅ gb=T(z) and hS,0⋅ ga+hS,1⋅ gb=0. This implies that (h0,1-h0,0⋅ hS,0-1⋅ hS,1)⋅ gb=T(z), and
[det([h ′ 0
L hr′−1
hr′ ]) det([h0′ L hr′−1
[
hr′+1 ]) L det( h0′ L hr′−1
]]
hK′ N −1 −1 ) ⋅ g b = T ( z) ⋅ det(hS ,0 ) = 0 , which contradicts the
assumption. Hence, if some of the analysis filters in a sub-tree are linearly dependent, there does not exist a set of synthesis filters such that the whole system achieves perfect reconstruction. This proves the only if part and the theorem.
3
Illustrative Examples
3.1
Uniform Filter Bank
Consider an M-channel uniform filter bank with analysis filters {H0(z),H1(z),º,HM-1(z)}. In this case, N=1 and n 0=p0=K0=M. By selecting H’ 0(z)=1, H’ 0,i(z)=Hi(z) , for i=0,1,º,M-1, this M-channel uniform filter bank can achieve perfect reconstruction via a tree structure if and only if: H ′0 ,0 (z ) H ′0 ,1 (z ) H 0′,1 (z ⋅ W ) H 0′,0 ( z ⋅W ) det M M H ′ (z ⋅ W M −1 ) H ′ (z ⋅ W M −1 ) 0 ,1 0 ,0
L H 0′,M −1 (z ) , L H 0′,M −1 (z ⋅ W ) ≠0 O M −1 L H ′0, M −1 (z ⋅ W M )
(6)
344
Wing-kuen Ling and Peter Kwong-Shun Tam
where W = e
3.2
−
j⋅ 2⋅ p M
[14].
Perfect Reconstruction Dyadic Tree Structure Filter Bank
Consider the non-uniform filter bank shown in figure 1 [1, 2, 5]: F0 ( z) = H 1 ( z)
↓2
↑2
G0 (z)
F1 ( z) = H0 (z ) ⋅ H1 (z 2 )
↓4
↑4
G1 ( z) y[n]
x[n]
F2 ( z) = H 0 (z ) ⋅ H 0 (z 2 )⋅ H1 (z 4 )
↓8
↑8
G2 (z )
F3 (z ) = H 0 (z )⋅ H 0 (z 2 )⋅ H 0 (z 4 )
↓8
↑8
G3 ( z)
Fig. 1. Perfect reconstruction dyadic tree structure filter bank In this case, n 0=2, n 1=4, n 2=8, p 0=1, p 1=1, p 2=2 and N=3. By selecting Kj=2, H ′0 (z ) = 1 , H 1′(z ) = H 0 (z ) , H ′2 (z ) = H 0 (z )⋅ H 0 (z 2 ) , H ′j ,0 (z ) = H1 (z ) , and H ′j ,1 (z ) = H 0 (z ) , for j=0,1,2, this non-uniform filter bank can achieve perfect reconstruction via a tree structure
if and only if det H 0 (z )
H1 (z ) [1, 2, 5]. ≠ 0 H 0 (− z ) H 1 (− z )
3.3
Perfect Reconstruction Tree Structure Filter Bank
Consider the non-uniform filter bank shown in figure 2: H 0 ( z ) = 1 + z −1
↓2
↑2
G0 (z)
H 1 ( z) = 5 − 5 ⋅ z −1 + 2 ⋅ z −2 − 2 ⋅ z −3 + z −6 − z −7 + 2 ⋅ z −8 − 2 ⋅ z −9 + z −10 − z −11
↓6
↑6
G1 ( z) y[n]
x[n]
H 2 (z ) = 2 − 2 ⋅ z −1 + z −2 − z −3 + 2 ⋅ z −6 − 2 ⋅ z −7 + 4 ⋅ z −8 − 4 ⋅ z −9 + 2 ⋅ z −10 − 2 ⋅ z −11
↓6
↑6
G 2 (z )
H 3 (z ) = z −6 − z −7 + 2 ⋅ z −8 − 2 ⋅ z −9 + z −10 − z −11
↓6
↑6
G3 ( z)
Fig. 2. Perfect reconstruction tree structure filter bank In this case, n 0=2, n 1=6, p 0=1, p 1=3 and N=2. By selecting K1=3, H 1′(z ) = 1 − z −1 , H 1′,0 (z ) = 5 + 2 ⋅ z −1 + z −3 + 2 ⋅ z −4 + z −5 , H 1′,1 (z ) = 2 + z −1 + 2 ⋅ z −3 + 4 ⋅ z −4 + 2 ⋅ z −5 , and H 1′,2 (z ) = z −3 + 2 ⋅ z −4 + z −5 [14], we can group the last three channels together into one
channel with the new analysis filter H 1′(z ) = 1 − z −1 and the decimator ↓2. Similarly, by selecting K0=2, H’ 0(z)=1, H’ 0,0(z)=1+z-1, and H’ 0,1(z)=1-z-1, this non-uniform filter bank can achieve perfect reconstruction via a tree structure.
Set of Perfect Reconstruction Non-uniform Filter Banks via a Tree Structure
3.4
345
Not Perfect Reconstruction Tree Structure Filter Bank Due to the Dependent Kernel
Consider the same non-uniform filter bank shown in figure 2 with H0(z) is changed to F(z) ◊(1-z-1), where F(z)=F(-z). The last three channels are grouped together with the same procedure as above, and we have two channels left with decimator ↓2, and the analysis filters are F(z)◊(1-z-1) and (1-z-1), respectively. Since det H 0 (z )
1 − z −1 , we conclude that this non-uniform =0 − 1 + z −1 H ( z ) 0
filter bank cannot achieve perfect reconstruction even through it can be realized via a tree structure.
3.5
Cannot Be Realized Via a Tree Structure Filter Bank Due to Structural Problem
Consider the same non-uniform filter bank shown in figure 2 with H1(z) changed to (1-z-1)◊F1(z) , H2(z) changed to (1-z1
)◊F2(z), H3(z) changed to (1-z-1)◊F3(z), where F1(z)/F2(z) and F2(z)/F3(z) are not rational functions of z2. In this case, the last three
channels cannot be grouped together. Hence, this non-uniform filter bank cannot be realized via a tree structure.
3.6
Incompatible Non-uniform Filter Bank
Consider an incompatible non-uniform filter bank [15] with the set of decimators {2,3,6}. Since p i=1, ∀i, there does not exist KjŒ[2 p j]. Hence, an incompatible non-uniform filter bank cannot be realized via a tree structure [15].
3.7
Compatible Non-uniform Filter Bank, But Cannot Be Realized Via a Tree Structure
Consider a non-uniform filter bank with the set of decimators {5,5,5,7,7,35,35,35,35}. In this case, n 0=5, n 1=7, n 2=35, p 0=3, p 1=2, p 2=4, and N=3. Since there does not exist K2Œ[2 p2] such that n 2/K2∈Z , this non-uniform filter bank cannot be realized via a tree structure.
4
Conclusion In this paper, we propose a novel method to test if a non-uniform filter bank can achieve perfect reconstruction via a tree
structure. The advantage of realizing a non-uniform filter bank via a tree structure is to reduce the computation complexity and provide fast implementation for non-uniform filter bank [14].
Acknowledgement The work described in this paper was substantially supported by a grant from the Hong Kong Polytechnic University with account number G-V968.
346
Wing-kuen Ling and Peter Kwong-Shun Tam
References 1.
Vaidyanathan P. P.: Lossless Systems in Wavelet Transforms. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 1. (1991) 116-119.
2.
Soman A. K. and Vaidyanathan P. P.: Paraunitary Filter Banks and Wavelet Packets. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 4. (1992) 397-400.
3.
Bamberger R. H., Eddins S. L. and Nuri V.: Generalizing Symmetric Extension: Multiple Nonuniform Channels and Multidimensional Nonseparable IIR Filter Banks. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 2. (1992) 991-994.
4.
Sodagar I., Nayebi K. and Barnwell T. P.: A Class of Time-Varying Wavelet Transforms. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 3. (1993) 201-204.
5.
Soman A. K. and Vaidyanathan P. P.: On Orthonormal Wavelets and Paraunitary Filter Banks. IEEE Transactions on Signal Processing, Vol. 41, No. 3. (1993) 1170-1183.
6.
Vaidyanathan P. P.: Orthonormal and Biorthonormal Filter Banks as Convolvers, and Convolutional Coding Gain. IEEE Transactions on Signal Processing, Vol. 41, No. 6. (1993) 2110-2130.
7.
Soman A. K. and Vaidyanathan P. P.: Coding Gain in Paraunitary Analysis/Synthesis Systems. IEEE Transactions on Signal Processing, Vol. 41, No. 5. (1993) 1824-1835.
8.
Kovaèeviæ J. and Vetterli M.: Perfect Reconstruction Filter Banks with Rational Sampling Factors. IEEE Transactions on Signal Processing, Vol. 41, No. 6. (1993) 2047-2066.
9.
Bamberger R. H., Eddins S. L. and Nuri V.: Generalized Symmetric Extension for Size-Limited Multirate Filter Banks. IEEE Transactions on Image Processing, Vol. 3, No. 1. (1994) 82-87.
10. Makur A.: BOT’ s Based on Nonuniform Filter Banks. IEEE Transactions on Signal Processing, Vol. 44, No. 8. (1996) 1971-1981. 11. Li J., Nguyen T. Q. and Tantaratana S.: A Simple Design Method for Near-Perfect-Reconstruction Nonuniform Filter Banks. IEEE Transactions on Signal Processing, Vol. 45, No. 8. (1997) 2105-2109. 12. Akkarakaran S. and Vaidyanathan P. P.: New Results and Open Problems on Nonuniform Filter-Banks. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 3,. (1999) 1501-1504. 13. Omiya N., Nagai T., Ikehara M. and Takahashi S. I.: Organization of Optimal Nonuniform Lapped Biorthogonal Transforms Based on Coding Efficiency. IEEE International Conference on Image Processing, ICIP, Vol. 1. (1999) 624-627. 14. Vaidyanathan P. P.: Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993. 15. Hoang P. Q. and Vaidyanathan P. P.: Non-Uniform Multirate Filter Banks: Theory and Design. IEEE International Symposium on Circuits and Systems, ISCAS, Vol. 1. (1989) 371-374.
Joint Time-Frequency Distributions for Business Cycle Analysis∗ Sharif Md. Raihan 1 , Yi Wen 2 , and Bing Zeng 1 1
Department of Electrical and Electronic Engineering The Hong Kong University of Science and Technology Clear Water Bay, Hong Kong, China 2
Department of Economics Cornell University Ithaca, NY 14853, USA
Abstract: The joint time-frequency analysis (JTFA) is a signal processing technique in which signals are represented in both the time domain and the frequency domain simultaneously. Recently, this analysis technique has become an extremely powerful tool for analyzing nonstationary time series. One basic problem in business-cycle studies is how to deal with nonstationary time series. The market economy is an evolutionary system. Economic time series therefore contain stochastic components that are necessarily time dependent. Traditional methods of business cycle analysis, such as the correlation analysis and the spectral analysis, cannot capture such historical information because they do not take the time-varying characteristics of the business cycles into consideration. In this paper, we introduce and apply a new technique to the studies of the business cycle: the wavelet-based time-frequency analysis that has recently been developed in the field of signal processing. This new method allows us to characterize and understand not only the timing of shocks that trigger the business cycle, but also situations where the frequency of the business cycle shifts in time. Applying this new method to post war US data, we are able to show that 1973 marks a new era for the evolution of the business cycle since World War II. Keywords: Wavelets, time-frequency analysis, business cycle, nonstationary time series, scalogram, and spectrum.
∗
This work has been supported by a grant, HKUST6176/98H, from the Research Grants Council of the Hong Kong Special Administrative Region, China. Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 347-358, 2001. c Springer-Verlag Berlin Heidelberg 2001
348
Sharif Md. Raihan et al.
I.
Introduction
The analysis of nonstationary signal cannot be accomplished by classical time domain representations such as correlation methods, or by frequency domain representations based on the Fourier transform [2]. To analyze business cycles that evolve over time, we need to develop a concept of time-frequency distribution that takes into account jointly and simultaneously the information of time and frequency. The business cycle, one of the most puzzling phenomena in capitalistic, free-market economies, has long been the central focus of macroeconomic researches. The biggest challenge to researchers in this field is to capture business cycle patterns that vary in nature across time. Economic time series contain stochastic components that are necessarily time dependent. Although time-frequency analysis has its origin almost 50 years ago [Gabor, 1946; Ville, 1948], significant advances occurred only in the last 15 years or so. Recently, time-frequency representations have become an extremely powerful tool for analyzing nonstationary signals in many fields: such as engineering, medical sciences, and astronomy, to name just a few. A number of articles have also been published to deal with applications in economics and finance [10]. So far, many alternative transforms have been developed to overcome the problems associated with classical spectral analysis, we introduce in this paper a new technique of time series analysis to business cycle studies: a joint time-frequency distribution based on the wavelet transform. This new technique enables us to capture the evolutionary aspects of the spectral distribution of the business cycle across time. In this paper, we compare the wavelet-based time-frequency analysis to a traditional approach based on the windowed Fourier transform. We show that the wavelet transform has many advantages over the traditional approach in that the wavelet transform has a beautiful property: its window size adjusts itself optimally to longer basis functions at low frequencies and to shorter basis functions at high frequencies. Consequently, it has sharp frequency resolution for low frequency movements and sharp time resolution for high frequency movements. Thus, the new method is capable of capturing simultaneously the time-varying nature of low frequency cycles and the frequency distribution of sudden and abrupt shocks in the original time series. The rest of the sections are organized as follows. Section II describes the windowed Fourier transform and spectrogram. Section III describes the wavelet transform and scalogram. Section IV explains the implementation of the wavelet transform when applying to actual data. Section V uses artificial signals to demonstrate the advantages of wavelet transform over the windowed Fourier transform. Section VI applies the wavelet-based time-frequency analysis to economic data. Finally, we conclude the paper in section VII. II.
The Windowed Fourier Transform and Spectrogram
Fourier transform (FT), most widely used classical representation, is a mathematical technique for transforming a signal from the time domain to the frequency domain.
Joint Time-Frequency Distributions for Business Cycle Analysis
349
However, in the transformation process, the time information of the signal is completely lost. When we look at the FT of a signal, we observe no information about when a particular event took place. For signals in which the time information is not important but the frequency contents are of primary interest, this limitation is of little consequence. Thus, Fourier analysis is useful for analyzing periodic and stationary signals whose moments do not change much over time. However, many interesting and important signals are not stationary and need to be analyzed in both time and frequency domain simultaneously. For many years, the representation of a signal in a joint time-frequency space has been of interest in the signal processing area, especially when one is dealing with timevarying nonstationary signals. Performing a mapping of a one-dimensional signal of time into a two-dimensional function of time and frequency is thus needed in order to extract relevant time-frequency information. We refer to several excellent review papers on distributions for the time-frequency (TF) analysis [3, 6]. A classical linear time-frequency representation, called the windowed Fourier transform (WFT), has been extensively used for nonstationary signal analysis since its introduction by Gabor [5]. The basic idea of WFT is to find the spectrum of a signal x(t ) at a particular time τ by analyzing a small portion of the signal around this time point. Specifically, the signal is multiplied by a window function w(t ) centered at time point τ , and the spectrum of the windowed signal, x(t ) w* (t − τ ), is calculated by ∞
WFTx (τ , ω ) = ∫ x(t ) w* (t − τ )e − jωt dt , −∞
(1)
where ω is the angular frequency and * denotes the complex conjugation. Because multiplication by a relatively short window w(t − τ ) effectively suppresses the signal outside a neighborhood around the analysis time point t = τ , the WFT is a ‘local’ spectrum of the signal x(t ) around τ . Spectrogram is the most familiar representation to obtain the energy distribution of the signal. The spectrogram of a signal x(t ) is defined as the squared magnitudes of the WFT: SPx (τ , ω ) =
∫
∞
−∞
2
x(t ) w* (t − τ )e − jωt dt .
(2)
The WFT has many useful properties [9], including a well-developed theory [1]. It is one of the most efficient methods in computation. But a crucial feature inherent in the WFT method is that the length of the window can be selected arbitrarily, but is fixed exogenously once the selection is made. To enhance the time information, therefore, one must choose a short window; and to enhance the frequency resolution, one must choose a long window, which means that the time information (nonstationarities) occurring within the window interval is smeared. The length of the window is therefore the main issue involved in practice.
350
Sharif Md. Raihan et al.
III.
The Wavelet Transform and Scalogram
In recent years, an alternative representation, called the wavelet transform, has been widely adopted in the literature [4, 7, 14, 17]. One major advantage afforded by wavelet transform is that the windows vary endogenously in an optimal way. With this transform one can process data at different resolutions. In order to isolate signal discontinues, for example, one would like to have some very short basis functions. At the same time, in order to obtain detailed frequency analysis, one would like to have some very long basis functions. A way to achieve this is to have short basis functions for high-frequency movements and long ones for low-frequency movements. This is exactly what can be achieved with the wavelet transform. WT have an infinite set of possible basis functions. Thus, wavelet analysis provides immediate access to information that can be obscured by other time-frequency methods such as Fourier analysis. The wavelet transform is defined as the convolution of a signal x(t ) with a wavelet function Ψ (t ) , called mother wavelet, shifted in time by a translation parameter τ , and dilated by a scale parameter a , as shown by the following equation
WTx (τ , a) =
1
a
∫
∞
−∞
t −τ x(t )Ψ * dt , a
(3)
where Ψ * (.) is the complex conjugate of the basic wavelet function Ψ (t ) , the parameter a is the scaling factor that controls the length of the analyzing wavelet; and τ is the translation parameter. The squared modulus of the wavelet transform, called scalogram, is defined as SCALx (τ , a ) =
t −τ x(t )Ψ dt ∫ −∞ a a
1
∞
*
2
(4)
The wavelet transform of a signal depends on two parameters: scale (or frequency) and time. This leads to a so-called time-scale representation that provides a tool for the analysis of nonstationary signals [7, 14]. There is a dozen of wavelet function available, such as Morlet, Mexican hat, Haar, Shannon, Daubechies wavelet function, etc. The choice of the wavelet function depends on the specific application. With respect to time and frequency localization, the Haar and Shannon wavelets take opposite extremes. Having compact support in time, the Haar wavelet has poor decay in frequency, whereas the Shannon wavelet has compact support in frequency with poor decay in time. Other wavelets typically fall in the middle of these two extremes. In fact, having exponential decay in both the time and frequency domain, the Morlet wavelet has optimal joint time-frequency concentration [16]. The wavelet that is used for analysis of economics fluctuations in this paper is Morlet wavelet, which is a modulated Gaussian function with exponential decay property. It is defined as Ψ (t ) = e
−
t2 2a2
e j 2πft ,
(5)
Joint Time-Frequency Distributions for Business Cycle Analysis
351
where f is the modulation (frequency) parameter. The scale parameter a and the frequency parameter f are related to each other by the relationship: a = f0 / f ,
(6)
where f 0 is the central wavelet frequency. IV.
Implementations
In WFT, the signal is divided into small enough segments, where these segments of the signal can be assumed to be stationary. For this purpose, a window function w is chosen. The width of this window must be equal to the segment of the signal where its stationarity is valid. This window function is first placed at the beginning of the signal and the Fourier transform is performed. Then the window is shifted to a new location and another Fourier transform is computed. This procedure continues until the end of the signal is reached. The spectrogram is computed accordingly as the squared modulus of the windowed Fourier transform. The wavelet transform is done in a similar manner to the WFT. The signal is multiplied by a wavelet function and the wavelet transform is computed according to equation (3) for different values of the scale parameter (a ) at different time location (τ ) . Suppose x(t ) is the signal to be analyzed. The mother wavelet is chosen to serve as a prototype for all wavelets in the process. All the wavelets that are used subsequently are the stretched (or compressed) and shifted versions of the mother wavelet. The computation starts with a value of the scaling factor a = a1 , and the wavelet is placed at the beginning of the signal. Since the wavelet function has only finite time duration, it serves just like a window in the WFT. The constant 1 / a1 is for normalization purpose so that the transformed signal will have the same energy at every scale. Next, with the same scale a = a1 , the wavelet function is shifted to the next sample point, and the wavelet transform is computed again. This procedure is repeated until the wavelet reaches the end of the signal. The result is a sequence of numbers corresponding to the scale a = a1 .
Next, the scale factor is changed to a = a 2 , and the whole procedure described above is repeated. When the process is completed for all desired values of a , the result is an energy distribution of the original signal along the two-dimensional time-frequency space. V.
Applications to Test Signals
To show the effects and the advantages of wavelet-based time-frequency analysis over the traditional WFT based time-frequency analysis, we present scalograms and spectrograms of two test signals. The signals are of length 512 points each. The WFT uses a Hanning window, and the scalogram is obtained with the Morlet wavelet. The horizontal axis is time and the vertical axis is frequency in both scalograms and spectrograms respectively.
352
Sharif Md. Raihan et al.
The first test signal used is composed of two parts: the 1st part is a time-varying low frequency sinusoidal cycle, and the 2nd part is a constant high frequency cycle with some sample points gap in the middle of the signal. The signal is shown in the top window in Figure 1.a, and the power spectrum is shown in the left window in Figure 1.a. The central window in Figure 1.a shows that the scalogram is able to capture not only the frequency location of the time-varying low frequency cycle, but also the exact timing of the missing signals presented in the constant high frequency cycle. There is no energy distribution in the middle of the scalogram due to the missing data points in the high frequency cycle (notice the sharp breaking edges in the middle of the scalogram). WFT, on the other hand, is unable to simultaneously capture all the information adequately. With a short window (Figure 1.b), the time information with respect to the exact timing of the missing data points is captured, but the frequency location of the low frequency cycle is not localized at all along the frequency axis. With a large window (Figure 1.c), on the other hand, the frequency locations of the cycles are well localized along the frequency axis, but the exact location and timing of the missing data points are not very well captured or localized along the time axis. The second test signal showing in Figure 2.a (top window) is composed of sine waves whose frequency shifts periodically across time in the low frequency region. Along the sample, however, there are three sharp transitory impulses. The power spectrum of the test signal is shown in the left window of Figure 2.a. It is seen there that the power spectrum is completely silent about the time-varying nature of the cycle and about the white noise impulses. Instead, it shows that there are simultaneously several major cycles contained in the low frequency region. The central window in Figure 2.a, however, shows how remarkably the scalogram captures not only the time-varying nature of the low frequency cycle, but also the exact timing of the white noise impulses. Notice that the frequency of the shifting cycle is highly localized along the frequency dimension on one hand, and the timing of the frequency shift is also highly localized along the time dimension on the other hand. As a comparison, the spectrogram based on WFT is shown in Figure 2.b and Figure 2.c. We see there that the spectrogram either gives an imprecise frequency localization of the time-varying low frequency cycle when the window size is small enough to adequately capture the timing of the high frequency impulses (Figure 2.b), or misses the impulses entirely when the window size is large enough to capture adequately the frequency location of the time-varying low frequency cycle in the original signal (Figure 2.c). This is so because both the time and the frequency resolutions of WFT are fixed once the window length is fixed. In contrast, scalogram allows good frequency resolution at low frequencies and good time resolution at high frequencies. VI.
Application to Economic Data
Since Second World War, the US economy has experienced several important institutional changes. These institutional changes have likely had important impact on
Joint Time-Frequency Distributions for Business Cycle Analysis
353
the structure of the US economy. The US economy has also experienced several unprecedented shocks that may also have brought deep structural adjustment to the economy. The oil price shock during the early 70s, for example, could have resulted in a fundamental reorganization of the input-output structure in the economy, especially with regard to the energy-intensive industries. It is then of great interest to investigate whether these changes have also brought fundamental changes to the nature of the US business cycle. In particular, it is of great interest to know whether the old business cycles observed by economists almost half century ago are still alive, and whether new business cycles have emerged during those years of social changes and economic development. Applying the wavelet-based time-frequency transform to the growth rate of real GDP (1960:1 - 1996:3), we find that the US business cycle has the following defining features: 1)
Business cycles through out the sample period are concentrated mostly in the frequency region below 10 quarters per cycle. They are triggered mostly by external shocks.
2)
Business cycles become far more active during the 70s and 80s after the oil price shocks in the early 70s. The two most active business cycles occurred around 1974 and 1983, both are triggered apparently by external impulses. The periodicity of the two cycles is about 6 years per cycle.
3)
There exist business cycles that are not triggered by any external shocks to GDP, such as the 1991 business cycle. On the other hand, strong external shocks to GDP do not necessarily trigger business cycles, such as the shocks during 19771978.
Figure 3 shows the contour of energy distribution of the US GDP growth across time and frequency. The time series (top window) reveals very little about the frequency location of the cycles, while the spectrum (left window) reveals nothing about the timing of the different cycles. The scalogram (center), however, shows that there have been three major business cycles since 1960. The first occurred in 1961, triggered by a sharp external impulse during that year. The 1961 cycle has a frequency of 0.1 cycles per quarter (or 10 quarters per cycle) and is short lived (it lasted about one year). The second major cycle took place in 1973, apparently triggered by two impulses during 1972 and 1973, and was greatly intensified by another impulse near 1975. This business cycle lasted about 3-4 years and peaked at the frequency of about 0.04 cycles per quarter (or 25 quarters per cycle). The third major cycle occurred during 1982-1984, apparently triggered by a shock in 1982. This cycle lasted about 3 years and peaked also at a frequency similar to the 1973 cycle. The 1973 cycle and the 1982 cycle dominated all other business cycles since 1960. Notice that the cycle in 1991 is very mild compared to the three major cycles mentioned above. It is apparently not triggered by any external shocks to GDP. The scalogram also reveals that a major shock around 1977-1978 did not trigger any business cycle around that time. In addition, there is a short-lived business cycle in 1966 triggered by an external impulse that is not obvious or noticeable, however, in the original time series (see top window in Figure 3).
354
Sharif Md. Raihan et al.
We think that these findings are of great importance to the business cycle theory. They not only help us identify the important historical shocks that triggered the business cycle, but also provide important information regarding the evolution of the business cycle across time. If the business cycle is unstable over time, for example, then there is the need for finding a common propagation mechanism to explain that instability. Without exception, existing real business cycle models all predict a stable business cycle with the same characteristic frequencies. But the scalogram shows otherwise: business cycles come and go; they emerge at different frequencies and at different times; they are not at all alike. VII.
Conclusions
A new technique of nonstationary time series analysis based on joint time-frequency representation was proposed. Two popular time-frequency distributions, the wavelet transform and the windowed Fourier transform were compared for this purpose. Our analyses showed that the wavelet-based time-frequency analysis is superior to the Fourier transform based time-frequency analysis. Applying the wavelet-based analysis to economic data, we found that business cycles in the US have not been stable over time. In particular, business cycles became far more active since the oil price crisis in the early 70s. References
[1]
Allen J. B., Rabiner L. R., “A unified approach to short-time Fourier analysis and synthesis,” Proceedings of the IEEE, vol. 65, no. 11, 1977, pp. 1558-64. [2] Boashash B., “Theory, implementation and application of time-frequency signal analysis using the Wigner-Ville distribution,” Journal of Electrical and Electronics Engineering, vol. 7, no. 3, 1987, pp. 166-177. [3] Cohen L., “Time-frequency distributions – a review,” Proceedings of the IEEE, vol. 77, no. 7, 1989, pp. 941-981. [4] Daubechies I., “The wavelet transform, time-frequency localization and signal analysis,” IEEE Transactions on Information Theory, vol. 36, no. 5, 1990, pp. 961-1005. [5] Gabor D., “Theory of communication,” J. Inst. Elec. Eng., vol. 93, 1946, pp. 429457. [6] Hlawatsch F., Boudreaux-Bartels G. F., “Linear and quadratic time-frequency signal representation,” IEEE Signal Processing Magazine, 1992, pp. 21-67. [7] Kaiser G., “A friendly guide to wavelets,” Birkhauser, Boston, 1994. [8] Lin Z., “An introduction to time-frequency signal analysis,” Sensor Review, vol. 17, no. 1, 1997, pp. 46-53. [9] Nawab S. N., Quatieri T. F., “Short-time Fourier transform,” In Lim, J. S. and Oppenheim, A. V. (Eds), Advanced Topics in Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1988. [10] Ramsey J., “The contribution of wavelets to the analysis of economic and financial data,” Phil. Trans. R. Soc. Lond. A (forthcoming), 1996. [11] Ramsey J., Zhang Z., “The analysis of foreign exchange rates using waveform dictionaries,” Journal of Empirical Finance, 4, 1997, pp. 341-372. [12] Ramsey J., Usikov D., Zaslavsky G., “An analysis of US stock price behavior using wavelets,” Fractals, vol. 3, no. 2, 1995, pp. 377-389.
Joint Time-Frequency Distributions for Business Cycle Analysis
355
[13] Rioul O., Flandrin P., “Time-scale energy distributions: a general class extending wavelet transforms,” IEEE Transactions on Signal Processing, vol. 40, no. 7, 1992, pp. 1746-57. [14] Rioul O., Vetterli M., “Wavelets and signal processing,” IEEE Signal Processing Magazine, 1991, pp. 14-38. [15] Stankovic L., “An analysis of some time-frequency and time-scale distributions,” Annals of Telecommunications, vol. 49, no. 9-10, 1994, pp. 505-517. [16] Teolis A., “Computational signal processing with wavelets,” 1964. [17] Vetterli M., Harley C., “Wavelets and filter banks: theory and design,” IEEE Transactions on Signal Processing, vol. 40, no. 9, 1992, pp. 2207-2232. [18] Wen Y., Zeng B., “A simple nonlinear filter for economic time series analysis,” Economics Letters, 64, 1999, pp. 151-160.
Figure 1.a: Scalogram contour with signal (top) and spectrum (left).
356
Sharif Md. Raihan et al.
Figure 1.b: Spectrogram contour with signal (top) and spectrum (left) (window = 13).
Figure 1.c: Spectrogram contour with signal (top) and spectrum (left) (window = 27).
Joint Time-Frequency Distributions for Business Cycle Analysis
357
Figure 2.a: Scalogram contour with signal (top) and spectrum (left).
Figure 2.b: Spectrogram contour with signal (top) and spectrum (left) (window = 7).
358
Sharif Md. Raihan et al.
Figure 2.c: Spectrogram contour with signal (top) and spectrum (left) (window = 21).
Figure 3: Scalogram contour with time series (top) and spectrum (left). U.S. GDP growth rate (1960:1 - 1996:3).
The Design of Discrete Wavelet Transformation Chip Zaidi Razak and Mashkuri Yaacob Faculty of Computer Science and Information Technology, University Malaya Kuala Lumpur
Abstract. In this paper, an explanation on the need for a special discrete wavelet transformation hardware is presented. The development processes that have been carried out which includes simulation (both in MATLABTM and SYNOPSYSTM) and synthesis which also used SYNOPSYSTM tools is described together with a discussion on specific design issues in the coordination of the entities.
1 Why Special Hardware? There are a number of reasons that can be quoted that prompted the work on the hardware design of a discrete wavelet transform: (i) (ii)
(iii)
There has been too little research in wavelet transformation hardware because most research activities are centered on software development; Of late, there has been an increased requirement on real-time processing as well as increasing data size. This situation occurs because of the current technology that tries to represent the reality of what we have in our life today. For example, in the 1980’s colours were represented by 16 bits but now colours are represented by 256 bits; The cost effectiveness factor, i.e. cost can be reduced if it involves many more processes. This is due to the fact that if software is used, there are actually two cost factors that must be considered. Firstly, it is the cost of software and secondly, it is the cost of the equipment (e.g. main processor). Eventually, the cost will rise steeply if the processing procedure involves a large data size, where repetitive processing is needed, hence the increase in the software cost. On the same count, if off-the-shelf processors are used, the hardware cost will also increase. So, with this special hardware, the overall cost (including software cost) can be reduced substantially.
At the same time, successful research conducted by Michael L. Hilton, Björn D. Jawert and Ayan Sengupta [1] have managed to achieve data compression using the wavelet tranformation method. Researchers like Ian K.Levy dan Roland Wilson [2] have concentrated on research in 3D wavelet compression for videos. Currently, there are numerous research activities on wavelet application and usage in various fields.
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 359-368, 2001. Springer-Verlag Berlin Heidelberg 2001
360
Zaidi Razak and Mashkuri Yaacob
2 What is Discrete Wavelet Transformation? Discrete Wavelet Transformation (DWT) is a method being used to analyze wavelet, an important characteristic in effective processing. It is produced from the recursive wavelet formula as shown below [3]:
Φ ( sl ) ( x ) = 2
−
s 2
Φ (2− s x − l )
(1)
s and l being the respective scalar and translation multipliers for the particular wavelet. However, for DWT, the wavelet analyzer that is being used is in a discrete form as presented below: M −1
φ ( x) = ∑ ckφ (2 x − k )
(2)
k =0
where the adding ratio is determined by the positive M value & the multiplier C is the wavelet’s constant. This processing technique involves the slicing of the signal such that it will be processed into equal sizes. All of these signals will be processed in a non-dependent way. One of the few specialities of PDW processing is that it can process signals in various resolutions. To enable this feature, a scalar function can be used. The scalar can be shown as below:
W ( x) =
N −2
∑ (−1)
k
k = −1
ck +1Φ (2 x + k )
where W(x) is the scalar function [3,4] for wavelet analyzer
(3)
Φ,
and
ck is the
wavelet coefficient. To enable this function to work well, the coefficient must confirm these linear and quadratic prerequisites: N −1
∑c
k
=2
(4)
k =0
N −1
∑c c k =0
k k + 2l
= 2δ l , 0
(5)
where δ is the delta function and l is the coordination index at the wavelet. The wavelet analyzer that was shown above is processed in analog form. To process it in discrete, the formula below is used: M −1
φ ( x) = ∑ ckφ (2x − k )
(6)
k =0
In the DWT implementation, an algorithm, named the Pyramid Algorithm [5] is used. The implementation of this algorithm must comply with one rule, i.e. the size of the signal must be of a factor of 2.
The Design of Discrete Wavelet Transformation Chip
361
X[n]
t[n]
r[n]
2
2
t[n]
r[n]
2
2
t[n]
r[n]
2
2
.
.
.
Fig. 1. Pyramid Algorithm
where t[n] and r[n] are the respective highpass and lowpass functions which can be expressed below [6]: ai =
bi =
1 2
N
∑c j =1
2i − j +1
f j i=1,….,
1 N ∑ (−1) j+1 c j+2−2i f j 2 j =1
N 2
i=1,….,
(7)
N 2
(8)
where a is the highpass function and b is the lowpass function. All of these theories and equations that have been presented will be translated into basic design forms and later on into an integrated design.
3 Design Stage In the process of transfering the above formulae into hardware design, the method used is to divide the formula into fundamental forms and these are later integrated together to produce the whole process itself. There are 3 phases of development, i.e. the planning of the algorithm, VHDL programming including the simulation process, followed by the synthesis process that
362
Zaidi Razak and Mashkuri Yaacob
involves the optimization of the results. The optimization process is to determine whether or not the design that is being produced can be synthesised into a physical chip.
4 Algorithm Stage One of the algorithms that has been developed is the algorithm for the highpass function which is illustrated below: 1. counter1 = 2 * (i – 1) 2. counter2 = mod (counter1, data size) 3. counter3 = mod (counter1 + min( data size, length(l)) – 1), data size) 4. for n from counter2 until counter3 do 4.1. calculate index to get highpass value, mod(n – counter1, data size) 4.2. calculate multiplication between highpass value with subscript index + 1 and data with subscript n + 1 4.3. store result in variable, b 5. end for 6. Result tally in 4.3
The algorithm below is the algorithm for processing the decomposition. 1. 2. 3.
4. 5.
flag = 0 call function get_h to get highpass value from lowpass value while data size >= 2 3.1 value of highpass and lowpass will be set 3.2 if flag = 0 then 3.2.1 for I := 0 to ( data size /2) do 3.2.1.1 call lowpass, result copy to array d. 3.2.1.2 call highpass, result copy to array h. 3.2.1.3 copy value in d to array temp 3.2.1.4 Set flag = 1 3.2.2 end for. 3.2.3 copy value in array h to final array result w in descending way. 3.3 else 3.3.1 repeat process 3.2.1.1 and 3.2.1.2 but inputs for lowpass and highpass are the elements in array temp 3.3.2 intializing array temp. 3.3.3 repeat process 3.2.1.3. 3.3.4 repeat process 3.2.3 to get the next result of highpass 3.4 end if. 3.5 Data size = ( Data size /2) end while. end.
The Design of Discrete Wavelet Transformation Chip
363
5 The Determination of Design Characteristics The characteristics that are being heavily considered in the determination of the design are as follows: (i) (ii) (iii)
types and the format of the data used; techniques of processing whether parallel or serial; number of inputs and outputs.
For the type of data, the IEEE 754 data format is used. For this standard, the floating data is represented as below: S 0
EEEEEEE 1 7
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 8 31
Fig. 2. IEEE standard for 32-Bit
where
S represents the sign bit E represents the exponential value F represents the mantissa value
In determining the technique of processing, serial processing type is chosen for the main processing component. In situations involving input of data and output of result, the parallel technique is chosen. The serial processing is needed because each existing sub-process needs an output from the previous sub-process. The situation can be clarified below: A
C B
Fig. 3. Data Dependency
In the above figure, process C needs an output from A and B to execute its operation. So, C has to to be placed on a ‘wait’ state, until A and B produce its respective outputs. Because of this ‘wait’ state, this process must be carried on in serial order.
6 Involved Entities The entities involved in producing this special hardware are listed below: (i) 2 memory modules, 16 X 32 bits in size to store input and output data. (ii) one ROM, 8 X 32 bits in size to store wavelet coefficients i.e. the daubechies-4 coefficients.
364
Zaidi Razak and Mashkuri Yaacob
(iii) (iv) (v) (vi) (vii)
one memory, 16 X 32 bits in size used as a temporary storage. one memory, 16 X 32 bits in size that is divided into 2 banks for storage of current value of coefficients. a latch that can allow 32 bits of data. a buffer that can allow the temporary storage of 32 bits of data. one controller that controls the overall work that has been carried out.
A special entity is used to coordinate the data. This entity uses active address to select its output pins. a1 activation pin
a2 a3 a4
Output address (a1 - a8)
a5 a6
Tim ing
a7 a8
Fig. 4. Data Selection Entity
7 The Simulation of Entity After the algorithm and the entity determination process are implemented, a simulation process, or more accurately a test for each of the entity is carried out. This is to ensure that each entity is able to process the input accurately before the integration process takes place. One of the results that can be generated by this simulation is the simulation contrived by the latch. It is shown below:
Fig. 5. Simulation Result for latch
To ensure that each entity will produce accurate results, a detailed knowledge of the entity’s behavior is needed. In the above figure, the latch entity will store data that
The Design of Discrete Wavelet Transformation Chip
365
has been stored when the activating pin is 0 and it will produce data when the activating pin is 1. Both pins produce an effect when the timer is 1. For the integration result of all entities, to produce the DWT, the required data is as below: [0.1708,0.5724,0.0314,0.8033,0.1000,0.1011, 0.1111, 0.9801,0.9999,0.3311,0.8900,0.1231, 0.7651,0.0001,0.5555,0.9999] The data selection is important to determine the process that has been carried out that can cover all the data’s characteristics. But before that, the processing range must be chosen. Below are the rules and conditions of the selection: (i)
Lower boundary input value This value is needed to determine that there will be no error produced if the input process is in the lower boundary, i.e. near to 0. In the research, this value is 0.0001.
(ii)
Upper boundary input value This value must be included in this work so that it can produce a result, particularly to check whether the process experiences any flow. The value is 0.9999.
(iii)
Intermediate value between the upper and lower boundary and the median For this criterion, the value chosen will validate that the process that has been carried out can be executed for all values in the existing value range.
Before these data can be used for simulation, these data must be changed into a floating point format. To ensure that the results produced from the simulation using SYNOPSYSTM are accurate, a simulation using MATLABTM is used. The simulation using MATLABTM is a process of validating the decomposition and the reconstruction of all the data that has been processed. The table below shows an output list derived for the MATLABTM simulation (all values have been changed into floating point values) and the outputs of the simulation by using SYNOPSYSTM. Table 1.
Num 1. 2. 3. 4. 5. 6. 7. 8. 9.
MATLABTM Simulation Output Compared to SYNOPSYSTM
MATLAB Simulation Result (Hexadecimal) 3F711D14 BEB837B4 3E905530 3F0624DC 3E177318 3E013C360 3F11205A 3E3DA510 3DA8C150
SYNOPSYS Simulation Result (Hexadecimal) 3FF1174A BEB821D8 3E906C86 3F062C50 3E178F8A 3EB3D888 3F1122C4 3E3DBAC5 3DA8D808
Output Data (Decimal) 1.883523 -0.359633 0.282078 0.524114 0.148008 0.351261 0.566937 0.185283 0.082443
366
Zaidi Razak and Mashkuri Yaacob
10. 11. 12. 13. 14. 15. 16.
BC498580 BF0DA512 BF0DF27C BE327BB0 3F256042 3EAA3070 3EC594AC
BC481330 BF0DA0E0 BF00EA62 BE3253E4 3F2565C8 3EAA477E 3EC59F2A
-0.012212 -0.553236 -0.503576 -0.174148 0.646084 0.332577 0.385980
The figure below shows how the entities are integrated to produce the above results. ACTUAL COEFF MEMORY
WORK COEFF MEMORY
TEMP WORK MEMORY
ACTUAL INPUT MEMORY
WORK INPUT MEMORY
MULTIPLIER
LATCH_2
ACCUMULATOR
ADDER
LATCH_1
MAIN CONTROLLER
RESULTMEMORY TEMP RESULT MEMORY
RESULR BUFFER
Fig. 6. Block Design for DWT
8 Synthesis of the Entity Each entity that has been produced will be optimized to ensure the production of the DWT chip. However, the validation process for the design must be carried out to ensure that the derived result do not change after the optimization process. Two figures below show the similarity of the outcome before and after the optimization process that took place at the buffer.
The Design of Discrete Wavelet Transformation Chip
367
Fig. 7. Input and Output Data Before Optimization
Fig. 8. Input and Output Data After Optimization
The synthesis process was done after all the results have gone through the validation process. Figure 9 below shows a sample of the logic gate diagram i.e output from the synthesis process.
9 Conclusion The output that has been obtained shows that the process of producing a special DWT hardware can be achieved. Moreover, the design can be programmed into FPGA chips and tested further for full functionality. Currently the existing designs are being improved with the aim of minimising silicon real estate as well as improving the processing speed of the transform functions.
References 1. 2. 3. 4.
Hiton, M.L, Jawerth, B.D, Sengupta, A.N, “Compressing Still And Moving Images With Wavelets,” Multimedia Systems, Vol 2, No. 3, pp. 218 – 227, April 1994. Levy. I.K., Wilson. R.: Three Dimensional Wavelet Transform Video Compression. IEEE International Conference on Multimedia Computing And System, Vol. 2. (1999) 924 – 928. Grasp. A.: An Introduction to Wavelets. IEEE Computational Science and Engineering, Vol.2, No.2. (1995) 2 Coffey. M.A., Etter. D.M.: Image Coding With The Wavelet Transform. IEEE Symposium Circuit And System, Vol. 2. (1995) 1110-1113
368
Zaidi Razak and Mashkuri Yaacob
Fig. 9. Gate Logic Diagram of PDW
5. 6.
Mallat. S.: A Theory for Multiresolution Signal Decompositions, the Wavelet Representation. IEEE Trans. Pattern Analysis And Machine Intelligence, Vol. 2, (1989) 674-693. Edwards. T.: Discrete Wavelet Transforms: Theory And Implementation. Research Report, Stanford University. (September 1991) 4.
On the Performance of Informative Wavelets for Classification and Diagnosis of Machine Faults H. Ahmadi1, R. Tafreshi2, F. Sassani2, and G. Dumont1 1
The Department of Electrical and Computer Engineering The University of British Columbia {ahmadi,guyd}@ppc.ubc.ca 2 The Department of Mechanical Engineering The University of British Columbia {tafreshi,sassani}@mech.ubc.ca
Abstract. This paper deals with an application of wavelets for feature extraction and classification of machine faults in a real-world machine data analysis environment. We have utilized informative wavelet algorithm to generate wavelets and subsequent coefficients that are used as feature variables for classification and diagnosis of machine faults. Informative wavelets are classes of functions generated from a given analyzing wavelet in a wavelet packet decomposition structure in which for the selection of best wavelets, concepts from information theory i.e. mutual information and entropy are utilized. Training data are used to construct probability distributions required for the computation of the entropy and mutual information. In our data analysis, we have used machine data acquired from a single cylinder engine under a series of induced faults in a test environment. The objective of the experiment was to evaluate the performance of the informative wavelet algorithm for the accuracy of classification results using a real-world machine data and to examine to what the extent the results were influenced by different analyzing wavelets chosen for data analysis. Accuracy of classification results as related to the correlation structure of the coefficients is also discussed in the paper.
1
Introduction
The objective of machine data analysis for diagnosis is to extract features that can be used for classification and diagnosis of different machine faults. In general, faults in machine data analysis are attributed to component faults or machine events; they are characterized by nonstationary signals associated with burst of high-energy events such as combustion or closing/opening of a valve. Several approaches have been proposed by several research groups for the analysis and diagnosis of machine faults and some have been applied successfully [4,5,6]. They include methodologies using time-frequency approaches, statistical analysis and application of wavelet-based Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 369-381, 2001. Springer-Verlag Berlin Heidelberg 2001
370
H. Ahmadi et al.
signal processing methods[3,5]. Wavelets are considered to be highly suitable for the analysis of nonstationary transient signals as often observed in machine data. This paper deals with an application of statistical approach for feature extraction namely informative wavelet algorithm for machine diagnosis in which wavelet coefficients are used as feature variables for the classification and diagnosis of different machine faults. The study is aimed at the following. • • • •
2
To utilize informative wavelet algorithm for the analysis of real-world machine data and to evaluate the performance of the algorithm for the accuracy of classification results To analyze the accuracy of results as influenced by different analyzing wavelets used in the algorithm To trace the accuracy of classification results to parameters of training data To examine correlation structure of the informative wavelets and coefficient matrix as determined by different parameters of the algorithm and to identify the manner they influence the classification results
Introductory Remarks about Informative Wavelets and Classification Algorithm
Informative wavelets are classes of functions generated from a given analyzing wavelet in a wavelet packet decomposition structure in which for the selection of ‘best’ wavelets, concepts from information theory i.e. mutual information and entropy are utilized. Entropy is a measure of uncertainty in predicting a given state of a system where a system state refers to different operating conditions such as normal or faulty operation. Computation of entropy requires evaluation of probabilities generated from training data and is supplied as inputs to the algorithm. An iterative process to generate informative wavelets is applied where at each stage, algorithm selects a wavelet from a library of orthogonal wavelets in a given wavelet packet signal decomposition structure which results in a maximal reduction of entropy, i.e. maximal reduction in uncertainty of predicting a given system state. Reduction in uncertainty is expressed in terms of mutual information derived from the joint probability distributions of the training data and coefficients. Following derivation describes the concept. M
H ( S ) = ∑ P(Si) log(P(Si)) i =1
H(S) indicates entropy of system where S1, S2, …, SM are the states of the given system with probability of occurrences given by P(S1), P(S2), …, P(SM). States of the system are observed by a measurement system with N possible outputs {T1, T2, …, TN} of a random variable T with a probability distribution P(T1), P(T2), …, P(TN). Mutual information between the states and measurements is defined as the difference between the uncertainty of predicting S before and after the observation of T:
On the Performance of Informative Wavelets for Classification and Diagnosis M
N
JS (ωγ ) = H ( S ) − H ( S / T ) = ∑∑ P(SiTj) log i =1 j =1
371
P(SiTj) P(Si)P(Sj)
Here H(S/T) and P(SiTj) indicate conditional entropy of state S given measurement T and joint probability distribution of S=Si and T=Tj, respectively. When a given state of a system is independent of a measurements, i.e. Js =0, a change in the state of the machine will not result in any change in the probability P(SiTj). The algorithm selects wavelets that results in a maximal reduction of uncertainty i.e. maximal Js(ω). In informative algorithm, measurements are wavelet coefficients obtained by projecting data onto a selected wavelet. This is done iteratively where at each stage, the residual signal is considered for signal expansion. These wavelets are referred to as informative wavelets. The iterative selection of the informative wavelets is much similar to matching pursue algorithm [2]. Wavelet coefficients are then used as feature variables and as inputs to a neural network classifier for classification and diagnostics (For further details please refer to ref. No 3). Following block diagram illustrates the main stages of the algorithm.
Fig. 1. Block Diagram of Informative Wavelet Algorithm
3
Design of the Experiment
For the experiment, machine data from a single cylinder reciprocating engine were utilized in which two types of faults were considered. We considered engine knock and intake loose valve condition each with varying intensity levels. The engine was a dual mode engine operating both on diesel fuel and natural gas mode. Data presented here mainly belong to diesel mode operation. Engine knock condition was generated by carefully adjusted load changes. Load changes were made in incremental steps of approximately 15% increase at each step. For loose valve condition, set of progressively increasing valve clearances were induced on the valve at each run. Three categories of data were collected simultaneously: 1-cylinder pressure measured through a connecting tube to the cylinder, 2-acceleration measured at a carefully chosen location at the cylinder head, 3- engine RPM. Other data were also collected for engine diagnosis including engine power, peak cylinder pressure, peak pressure angle, etc. At each run, data from sixteen consecutive cycle runs were acquired.
4
Data Used in the Study
Machine acceleration data at the intake valve closing and combustion events were utilized in our data analysis. Data were collected from consecutive cycles and were
372
H. Ahmadi et al.
used as training data. We considered three conditions of valve clearance namely normal, 0.006 in. and 0.012 in. clearances as well as three load conditions i.e. 18, 22, 25 HP. An initial review of data where mean values vs. standard deviation in each training data are examined, indicates that a certain degree of data clustering and class separation can be found in some of the data, though this could not be observed in all of our data runs. Separation of classes was more vivid in training data belonging to valve clearance conditions. Certain number of outliers was also observed in each class.
5
Parameters Chosen for Data Analysis
We considered the following as input parameters in our data analysis. • •
•
• • •
6
We used equal number of training data in all three classes where we had initially 30 training data in each class. At a later stage the number was modified when we examined the effect of changes in the training data. Initially, two classes consisting of normal and one faulty condition were considered. At a later stage, we considered three classes namely healthy and two faulty conditions with two different intensity levels. Most of the results illustrated in this paper belong to the latter. In informative algorithm, “number of informative wavelets” corresponds to the number of feature variables used for classification. In the absence of any á priori knowledge about a suitable number of feature variables, several values ranging from 1 up to 50 were initially considered. At a later stage, the number was confined to a smaller set ranging from 4 to 10. We considered wavelets from orthogonal as well as biorthogonal wavelet families. We considered Daubechies wavelets, mainly Db5, Db20, Db40 and Db45 as well as Coif5, Symlet5, Bior3.1, and Bior6.8. Standard multi layer perceptron was used for a neural network classifier. For a three-class data, five nodes of hidden layer were used for classification. We used 30 levels (bins) in quantification of coefficient and training data for construction of probability distributions.
Observations and Data Analysis Results
Informative wavelet algorithm is mainly a statistical approach for fault classification in which probability distributions of training data are utilized to generate wavelets for signal expansion and subsequently for classification. In this algorithm, wavelet coefficients carry statistical properties best matched to those of the training data. As such, classification results are determined jointly by the statistical properties of the given training data and the analyzing wavelet used for data expansion. Performance of the algorithm for the accuracy of the classification results is highly influenced by the extent of uniformity of data in each class as well as properties that are necessary for class separation. Separation of classes in coefficient domain follows a similar pattern
On the Performance of Informative Wavelets for Classification and Diagnosis
373
as those of training data where statistics of the coefficients are also influenced by the particular wavelet used for the analysis. Following observations were made in our data analysis. They are listed in three sections as described below. 6.1 Classification Results We examined wavelet coefficients generated by different analyzing wavelets as the main output of the algorithm for the classification of three valve clearance conditions and load changes. Mean vs. standard deviations of the coefficients of training data for three classes as well as histogram of the coefficients were also examined. A sample classification error results for several analyzing wavelets and for three load conditions are given (Fig 2). The errors of classification are below 5% of maximum error output of a NN classifier for most of the wavelets and were considered to be small enough and acceptable for an accurate classification of both valve clearances and load conditions. As shown in Fig 2, different errors were obtained using different wavelets. Bior3.1 performed superior to others and Symlet5 exhibited the largest error. Classification errors for load changes and knock detection varied and were influenced to a large extent by the extent of uniformity of the training data in all classes. We traced the differences in classification errors to several parameters of the algorithm as well as input data including correlation structure of coefficient matrix as described in sections below. 25
Classification error for: Db2, Db5, Db20, Db45, Co1, Co5, Bi31
Percentage Error
20 15 10 5 0
1
2
3
4
5
6
7
Fig.2. Classification errors, three classes, three load conditions (healthy +, mild *, sever fault o, total error x). Error values from left to right: Db2, Db5, Db40, Db45, Bior3.1, Bior6.8, Sym5
6.2 Informative Wavelets A varied classification results were obtained in our data analysis using different wavelets. We tried to trace these different results to different input parameters of the algorithm such as training data, number of iterations and more significantly to the intermediate outputs of the algorithm such as informative wavelets as described below.
374
H. Ahmadi et al.
Nonorthogonal Signal Expansion. Informative wavelet algorithm is a nonorthogonal signal decomposition in which informative wavelets are in general correlated and a certain degree of redundancy is always observed in signal decomposition. Nonorthogonality of signal decomposition can be attributed to the iterative process in selecting informative wavelets where at each stage the residual signal is considered for signal expansion much similar to matching pursue algorithm [2]. In our data analysis, we examined deviation from orthogonality for several analyzing wavelets. It was observed that for most of the orthogonal analyzing wavelets, such as Daubechies wavelets or Coiflets, majority of the informative wavelets of the first few iterations generated by the algorithm, were near orthogonal as indicated by the inner product of the wavelets (Fig 3). Deviations from orthogonality increased as we moved to latter stages of the iterations, i.e. informative wavelets generated at the later iterations exhibited a higher correlation than those of the first few iterations. However for biorthogonal wavelets namely Bior3.1, deviations from orthogonality was even high at the initial iterations. A singular value decomposition of matrix composed of informative wavelets was examined to identify the extent of correlation among the informative wavelets along different principal axes (Fig. 3). As it can be seen, for Bior3.1, the first singular value is nearly twice that of Db5. The same observations were also made when covariance of the informative wavelet matrix was examined. We also examined an increase in the number of informative wavelets and the manner the correlation and singular values were influenced by such changes. It was observed that correlation was increased as additional informative wavelets were added. However, the extent of the increase differed for different wavelets. Biorthogonal wavelets showed the largest increase. Non-orthogonal Signal Expansion and Accuracy of Classification Results. The accuracy of the classification results as influenced by the nonorthogonality of the informative wavelets, were studied using correlation and singular values of the coefficient matrix. Following observations were made. • •
•
It was observed that correlation structure and singular values of coefficient matrix followed the same pattern as those of informative matrix. For Db family of orthogonal wavelets, it was observed that singular values of informative matrix of the first few iterations did not differ significantly from each other. It was also observed that for these wavelets, classification results were also not noticeably different from each other. However, singular values of wavelet matrix of higher iterations differed from each other for different analyzing wavelets of Db family. Biorthogonal wavelets namely Bior3.1, showed a higher degree of correlations among the informative wavelets than orthogonal. First singular value of the covariance matrix of the wavelets for Bior3.1 was observed to be nearly ten times that of majority of the orthogonal wavelets such as Db5, Db40. Classification results in majority of the cases considered here indicated a higher accuracy of classification results were obtained for Bior3.1 as compared with other wavelets under identical conditions.
On the Performance of Informative Wavelets for Classification and Diagnosis
375
Orthogonality in Db5 Informative Wavelet 1.5
(1)
1 0.5 0 -0.5
1
2 3 4 5 6 7 Orthogonality in Db40 Informative Wavelet
8
1
2 3 4 5 6 7 SVD of Db5 $ Db40 Informative Wavelet
8
1
2
8
1.5
(2)
1 0.5 0 -0.5 2
(3)
1.5 1 0.5 0
3
4
5
6
7
Fig. 3. Inner product of informative wavelets generated by Db40 (1), Bi3.1 (2), and singular values of corresponding coefficient matrices (3). Shown are first four informative wavelets (‘o’, ‘x’, ‘*’, ‘+’) from left to right and singular values (Db40, ‘o’, Bior3.1, ‘*’)
General Pattern of the Centroid of the Wavelets. A generally distinct patterns and clustering of centroid of informative wavelets were observed in our data analysis. For majority of the cases considered here, centroid of the wavelets of small translation index occurred at the high scale levels (Fig. 4). It was also observed that centroid of majority of wavelets, lie at the first half-length of the given signal as shown in Fig. 4 (top diagram). As a result centroid of wavelets with longer support (higher order) such as db45, were seen to exhibit this pattern more vividly than low order wavelets in identical data analysis situations. This could be confirmed when informative wavelets generated by Db5 was compared with those of Db45. It was also observed that the centroid of wavelets with large oscillation index, lie at higher scale levels. This follows from wavelet packet decomposition where wavelets with high frequency of oscillations occur at high scale levels.
376
H. Ahmadi et al.
Clustering in Translation-Modulation. A more vivid clustering pattern of centroid of wavelets was also observed in translation-modulation plane where centroid of majority of the wavelets lied on or near origin (Fig. 4). Sensitivity of Wavelets to Changes in Training Data. In our data analysis, it was observed that a small changes in training data caused a noticeable change in the informative wavelets. For example a small increase in the number of training data (e.g. simple repetition of data) caused a considerable change in several numbers of the informative wavelets. This could be attributed to the generation of informative wavelets by the algorithm where changes in the probability distribution of the coefficients change mutual information followed by changes in informative wavelets. In our data analysis using eight informative wavelets, often a change in half of the wavelets was observed. Mutual Information and Pattern of Oscillatory Behavior. Mutual information is used as a measure of reduction of uncertainty of the prediction; it contains information that is reflective of those of the input data. While mutual information generally follows a declining pattern with the number of iterations; however, considerable oscillatory patterns were observed in several cases in our data analysis. Oscillatory pattern as well as increasing mutual information (instead of declining), was mostly observed in biorthogonal wavelets such as Bior3.1 and to a lesser degree in Bior6.8. Several cases with mutual information at a constant level of unity during several consecutive iterations were also observed. Increase in Number of Iterations. In the current algorithm, number of informative wavelets (iterations) is chosen á priori as an input to the algorithm. It was observed that increasing number of iterations in a given data analysis does not alter informative wavelets derived from prior iterations. As a result no change was observed in the coefficient values already derived. The additional informative wavelets increased feature variables and thus expanding the dimension of feature space. Increase in Number of Iterations and Accuracy of Classification Results. In our experiments, we have used 5-10 iterations (informative wavelets) though higher iterations were also examined. It was observed that an increase in the number of iterations was not always accompanied by an increase in the accuracy of classification results. This could be traced to the correlation structure of wavelet coefficients and changes made by the added wavelets as discussed in the following section. 6.3 Observation on the Coefficients Wavelet coefficients are the feature variables in informative wavelet algorithm; they contain the necessary information about a given data for classification and fault diagnosis. They are determined by the given data as well as informative wavelets generated by the algorithm. Following observations were made on the coefficients.
On the Performance of Informative Wavelets for Classification and Diagnosis
377
Trans
100 50 0
Modul
100
0 Mutual Info
1
2
3
4
5
6
7
8
0
1
2
3
4
5
6
7
8
0
1
2
3
4
5
6
7
8
50
1 .5 1 0 .5 100 Trans
0
50 0
0
10
20
30 40 M o d ula ti o n
50
60
70
Fig.7. Centroids of informative wavelets. In top three diagrams, horizontal axes are scale levels
Correlation Structure of the Coefficient Matrix. Informative wavelet algorithm is a nonorthogonal signal decomposition; as a result a certain degree of redundancy of signal expansion and correlation will always remain among the coefficient values. We examined correlation structure of coefficient matrix under several analyzing wavelets and different number of iterations. We examined correlation of coefficients for different informative wavelets in a given data. Significant differences were observed in correlation of the coefficients for different wavelets. We used singular value decomposition of the coefficient matrix to identify the manner classification results were influenced by the correlation structure of the coefficients. Singular values of the coefficient matrix for several wavelets are shown in Fig. 5. It was observed that for orthogonal wavelets such as Db wavelets, no significant differences could be observed between first singular values of the coefficient matrix of the first few iterations. However differences were observed in singular values of latter stages of the iterations. We observed the same for Coiflet5, but for Symlet5 singular values were different. For coefficients generated by biorthogonal wavelets such as Bior3.1, often a large first singular value was observed as compared with other wavelets (Fig. 5). Accuracy of Classification. In majority of cases considered here, accuracy of the classification results was higher when singular values of the coefficient matrix were also high. We have shown result of classification errors for several wavelets derived under identical conditions of data analysis(Fig. 2). Classification errors were seen to be approximately the same for orthogonal Db wavelets and were generally lowest for Bior3.1. It was also observed that for wavelets within the same group (such as db family) and for a given data analysis, a higher accuracy of classification results was obtained when correlation between informative wavelets was also large.
378
H. Ahmadi et al.
All Classes
SVD of Coef Matrix, all Classes, Db20 & Bior3.1, Last 4 Info. Wvlts. 5
0
Class 1
2
4 Class 2
1.5
2
2.5
3
3.5
4
1
1.5
2
2.5
3
3.5
4
1
1.5
2
2.5
3
3.5
4
1
1.5
2
2.5
3
3.5
4
1 0
2 0 4
Class 3
1
2 0
Fig. 5. Singular values of coefficient matrix for three classes of load conditions (individual and all classes), Db20 (‘o’), Bior3.1 (‘*’) C oeffs: Bior 3.1 with 28 training signals in each class W1
0.5 0 -0.5
0
10
20
30
40
50
60
70
80
90
-0.5 10
10
20
30
40
50
60
70
80
90
0
10
20
30
40
50
60
70
80
90
0
10
20
30
40
50
60
70
80
90
0
10
80
90
W2
0.5
W3
0
0 -1
W4
0.5 0 -0.5 2 1 0
20 30 40 50 60 70 Sum Squared C oeffs for 8 informative wavelets
Fig. 6. Wavelet coefficients, four informative wavelets W1, …, W4 for Bior3.1. Bottom diagram is coefficient L2 norm of training data for three classes (1:28, 29:56, 57:84)
On the Performance of Informative Wavelets for Classification and Diagnosis
379
Number of Iterations. An increase in the number of iterations was followed by an increase in most of the singular values. For Bior3.1 with highly correlated coefficients, the number of iterations was found to be in general lower than those of other wavelets tried in this experiment for an equal size of classification error. Reduced Size of Coefficient Matrix. Cases were observed in which wavelet coefficients of the signal expansion by different informative wavelets, differed significantly from each other where for some wavelets they were negligibly small across all classes. For such cases, rows of coefficient matrix having large coefficient values were selected and used as feature variables and as input to the classifier resulting in an increased computational efficiency. We also considered singular value decomposition of informative wavelet matrix to reduce the size of coefficient matrix where highly correlated wavelets (and coefficients) could be removed (Fig. 6). Coefficient Values and Separation of Classes. For each of the training data L2 norm of the coefficient values were evaluated (Fig. 6). Mean squared values and standard deviations of different classes were also used as an index for class separation. It was found that classification results using mean squared values of different classes were nearly identical with those obtained by the algorithm and when classification error was small. More accurate classification results could be obtained when standard deviations were small. Sum squared coefficient values were also used to identify outliers in training data as discussed below. Uniform Coefficient Values and Identification of Outliers. Under a given wavelet, a relatively a uniform coefficient values were obtained for signal expansion of different training data in a given class. This was more pronounced when sum squared coefficient values as mentioned above, were examined. A uniform pattern in coefficient values were used for identifying outliers in a given training data where coefficients differed significantly the general pattern (Fig. 6). While coefficients are wavelet dependent, often such outliers could be seen consistently across several analyzing wavelets. This enabled us to remove outliers in the training data and improve the results of the classification.
7
Conclusions
In this paper results of an experimental study for an application of informative wavelet algorithm for classification and diagnosis of machine faults were presented. We have used several wavelets and different set of machine data. Effectiveness of the algorithm for the classification of two categories of faults namely excess valve clearance and knock conditions each with variable intensity levels were examined. Accuracy of results was studied under several parameters of the algorithm. We have used different wavelets from both orthogonal and biorthogonal family of wavelets. Several illustrative examples were presented in this paper. Some of the results of the study were discussed and are summarized as follows. •
In majority of the experimental runs in which we used different analyzing wavelets, satisfactory classification results were obtained when training data were
380
•
•
•
•
•
H. Ahmadi et al.
considered to be sufficiently uniform and sufficient number of training data was used for classification. However different classification errors were obtained for different wavelets. For biorthogonal wavelets often classification errors were found to be lower as compared with orthogonal wavelets. For load changes and knock condition, accuracy of results varied for different training data and different intensity levels of faulty conditions. It was observed that informative wavelets generated by the algorithm are highly sensitive to changes in number as well as minor changes in actual values of training data. While classification results remained almost unchanged by small changes in training data, informative wavelets and coefficient values changed significantly. This was attributed to the structure of the algorithm in which probability distribution of the coefficients can be influenced by minor changes in training data and more significantly during the selection of the informative wavelets. It was often observed that wavelet coefficients of a given signal differed for different informative wavelets. As a result, small coefficient values across all classes under a given informative wavelets, could be removed and a reduced size of feature variables could be obtained without a noticeable effect on the classification results. Different informative wavelets were derived using different analyzing wavelets. This lead to different correlation structure of the coefficient matrix measured by singular values. It was observed that while for orthogonal wavelets of Db family, first few large singular values were not significantly different from each other, often Bior3.1 showed the highest correlation( large singular values) for a given signal analysis. Accuracy of results was also superior under Bior3.1 in majority of cases as compared with other wavelets. In a given data analysis where informative wavelets were highly correlated, it was possible to reduce the number of informative wavelets and as such number of features by removing rows of correlated coefficients without significant changes in the classification results. Different coefficient values were obtained when different analyzing wavelets were used for data analysis. As a result correlation structure of the coefficient matrix was influenced by the use of different wavelets leading to different size of the reduced matrix and different number of feature variables in a classification problem.
Acknowledgement This experiment was made possible through grants supplied by NRC_IRAP and REM Technology of Port Coquitlam, BC. Authors wish to thank Dr. Howard Malm of REM Technology for a continued support during both data collection and data analysis.
On the Performance of Informative Wavelets for Classification and Diagnosis
381
References 1. 2. 3. 4. 5. 6. 7.
Coifman, R.R, Wickerhauser M.V.: Entropy-based Algorithm for Best Basis Selection. IEEE Transactions on Information Theory 38,713-718 (1992) Mallat, S., Zhang, Z: Matching Pursuit with Time Frequency Dictionaries. IEEE Trans. on Signal Processing 41,3397-3415 (1993) Bao Liu, Shih Fu Ling: On the Selection of Informative Wavelets for Machinery Diagnosis. Mechanical Systems and Signal Processing, Vol. 13, No 1 (1999) Samimy B. Rizzoni, G: Mechanical Signature Analysis using Time Frequency Signal Processing: Application to Internal Combustion Engine Knock detection. Proc. of IEEE, Vo. 84 No.9 (Sep. 1996) Zheng G.T, McFadden P.D.: A time-frequency Distribution for Analysis of Signal with Transient Components and its Application to Vibration Analysis. Trans. ASME, Vol 121 (Jul, 1999) Samimy B. et all: Design of Training data–based Quadratic Detectors with Application to Mechanical Systems. Proc. ICASSP-96, May 7-10, Atlanta, GA (1996) Daubechies, I: Ten lectures on Wavelets. Siam, Philadelphia, PA (1992)
A Wavelet-Based Ammunition Doppler Radar System S. H. Ong and A. Z. Kouzani School of Engineering and Technology, Deakin University Geelong, Victoria 3217, Australia
Abstract. Today’s state-of-the-art ammunition Doppler radars use the Fourier spectrogram for the joint time-frequency analysis of ammunition Doppler signals. In this paper, we implement the joint time-frequency analysis of ammunition Doppler signals based on the theory of wavelet packets. The wavelet-based approach is demonstrated on Doppler signals for projectile velocity measurement, projectile inbore velocity measurement and on modulated Doppler signal for projectile spin rate measurement. The wavelet-based representation with its good resolution in time and frequency and reasonable computational complexity as compared to the Fourier spectrogram is a good alternative for the joint time-frequency analysis of ammunition Doppler signals.
1
Introduction
Continuous wave (CW) Doppler radars [1,2] are used for measuring the velocity and spin rate of projectile(s). Joint time-frequency analysis (JTFA) of the Doppler signal is done to extract the velocity-time and/or spin rate-time information of the projectile(s) from the Doppler signal. The Fourier spectrogram (FS) is the current method used in today’s ammunition Doppler radar systems. The requirements of a digital signal processing (DSP) system for the JTFA of ammunition Doppler signals are: 1.
2.
The computation of the velocity or spin rate results has to be done immediately after each round is fired or at a later time. Thus, the DSP algorithm has to be of reasonable computational complexity if the results are needed immediately after a round is fired. The system must be able to give accurate velocity and spin rate results. To satisfy this requirement, the time-frequency representation must have good resolution in time and frequency.
The FS is able to meet the first requirement using algorithms based on the fast Fourier transform. However, the main limitation of the FS is the poor resolution in time or frequency of the representation. The objective of this paper is to implement the JTFA of ammunition Doppler signals using the wavelet packet transform. In Sect. 2, we present the FS and explain its main limitation. In Sect. 3 we review the ammunition Doppler radar system and describe the main applications of ammunition Doppler Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 382-392, 2001. Springer-Verlag Berlin Heidelberg 2001
A Wavelet-Based Ammunition Doppler Radar System
383
radar systems. In Sect. 4, we introduce the best-basis wavelet packet transform (BBWPT). In Sect. 5, we compare the wavelet-based approach with the FS-based approach for the JTFA of ammunition Doppler signals. In Sect. 6, we conclude the paper.
2
The Fourier Spectrogram
To determine the properties of a signal at a particular time, we emphasize the signal at that time and suppress the signal at other times. This is done by multiplying the signal by a window function h(t) centered at time t, to produce the signal,
st (τ ) = s (τ ) h (τ − t ) .
(1)
The Fourier transform of st(τ) will reflect the distribution of frequency around time t,
St (ω ) =
1 2π
∫e
=
1 2π
∫e
− jωτ
st (τ ) dτ
(2)
− jωτ
s (τ ) h (τ − t ) dτ .
(3)
The energy density spectrum at time t is given by 2
PSP ( t , ω ) = St (ω ) =
1 2π
2
∫e
− jωτ
s (τ ) h (τ − t ) dτ .
(4)
At each time instant we get a different spectrum. The totality of these spectra gives the time-frequency representation PSP called the FS. To obtain good time localization a narrow window in the time domain h(t) is used. To obtain good frequency localization a narrow window in the frequency domain H(ω) is used. But both h(t) and H(ω) cannot be made arbitrarily narrow. There is an inherent trade-off between the time and frequency localization in the FS for a particular window. The amount of trade-off depends on the signal, window, time, and, frequency. The uncertainty principle quantifies these trade-off dependencies. The poor resolution in time or frequency is the main limitation of the FS method.
3
Ammunition Doppler Radar System
The ammunition Doppler radar is a versatile instrument used for the testing of ammunition. In this section, we review the ammunition Doppler radar system and state its main applications. A simplified block diagram of an ammunition Doppler radar system is shown in Fig. 1.
384
S. H. Ong and A. Z. Kouzani
The output of the CW transmitter at frequency f0 Hz is routed through a circulator to the antenna. The wave transmitted by the antenna propagates to and is scattered from the moving projectile. The wave is then received back at the antenna. The received wave has a frequency of (f0 - fd) Hz. The wave passes through the circulator to the receiver. At the front end of the receiver, a mixer heterodynes the two signals together to produce a Doppler signal of fd Hz. The A/D converter digitizes the Doppler signal and sends it to a computer whereby the JTFA is done. The subsections below describe three major applications of CW ammunition Doppler radar systems. Circulator Antenna
f0
f0
f0 - fd Projectile
Gun
f0 - fd
CW transmitter
f0 (leakage)
Mixer
fd
ADC
Computer
Fig. 1. Simplified block diagram of an ammunition Doppler radar system
Projectile Velocity Measurement. JTFA of the Doppler signal is done to obtain the instantaneous frequency (single projectile case) or frequencies (multiple projectiles case) from the Doppler signal. The instantaneous velocity v of the projectile is computed from the instantaneous Doppler frequency shift fd using the following relation [3]:
v = 0.5 × f d × λ
(5)
where λ is the wavelength. λ is given by
λ=
c f0
(6)
where c is the velocity of radio waves and f0 is the frequency of the transmitted wave. A 16k byte multicomponent Doppler signal for the velocity measurement of a saboted round (signal no. 1) is shown in Fig. 2. The broadband FS of this signal as shown in Fig. 3 uses a length 256 Hanning window. The narrowband FS of the signal as shown in Fig. 4 uses a length 1k byte Hanning window. The narrowband FS has a better frequency resolution compared to the broadband FS. However, the time resolution is degraded resulting in smearing of the frequency over the length of the window due to the nonstationary nature of the Doppler signal. The poor resolution in time or frequency of the FS affects the accuracy of the velocity results. This is the
A Wavelet-Based Ammunition Doppler Radar System
385
main drawback of the FS when it is used for the JTFA of ammunition Doppler signals. 2000 1500
Voltage (x 0.01 volt)
1000 500 Sabots 0 -500 -1000 -1500 -2000
0
2000
4000
6000
8000 10000 Time (x 8 us)
12000
14000
16000
Fig. 2. Signal no. 1 - Doppler signal for projectile velocity measurement
Projectile In-Bore Velocity Measurement. Projectile in-bore velocity measurement involves the study of the motion of projectile inside a gun barrel. The radar antenna is placed at the side of the gun muzzle. A reflector is placed at a certain distance down range of the gun and in-line with the barrel bore axis. This reflector reflects the transmitted waves from the antenna into the barrel bore. Upon firing, as the projectile moves inside the barrel, the reflected waves scattered from the projectile are reflected by the reflector and received back at the radar antenna. The in-bore velocity of the projectile vib is related to the Doppler frequency shift fd by the following relation:
vib = 0.5 × f d × λ fs × M c
(7)
where λfs is the free space wavelength and Mc is the mode correction factor. The mode correction factor is described by the following equation.
Mc =
λib = λ fs
1 1.841× c 1− π × d × f0
where d is the diameter of the barrel bore.
2
(8)
386
S. H. Ong and A. Z. Kouzani 4
x 10 6
5
Frequency (Hz)
Penetrator 4 Sabots 3
2
1
0
0
0.02
0.04
0.06 0.08 Time (sec)
0.1
0.12
Fig. 3. Broadband FS of signal no. 1 4
x 10 6
5
Frequency (Hz)
Penetrator 4 Sabots 3
2
1
0
0
0.02
0.04
0.06 Time (sec)
0.08
0.1
0.12
Fig. 4. Narrowband FS of signal no. 1
Fig. 5 shows a 4k byte monocomponent Doppler signal (signal no. 2) for projectile in-bore velocity measurement. The FS of this signal as shown in Fig. 6 uses a length 512 Hanning window. Projectile Spin Rate Measurement. The amplitude and phase of the Doppler signal are not affected by the spin of an axially symmetric projectile. If a slot is milled on the base of the projectile to make it axially asymmetric, this will result in modulation of the amplitude and phase of the Doppler signal. To obtain sufficient modulation, the width and depth of the slot have to be at least one quarter the wavelength of the transmitted signal [4]. A 16k byte modulated Doppler signal for projectile spin rate measurement (signal no. 3) is shown in Fig. 7. The broadband FS of this signal as shown in Fig. 8 uses a length 512 Hanning window. The projectile spin rate cannot
A Wavelet-Based Ammunition Doppler Radar System
387
be determined from Fig. 8 as the frequency resolution of the representation is insufficient. The narrowband FS of the signal as shown in Fig. 9 uses a length 1k byte Hanning window. In Fig. 9, we are able to identify three peaks in the FS: a major peak at frequency fd and two minor peaks at frequencies fd ± 2 × fs . fd is the Doppler frequency shift corresponding to the velocity of the projectile. fs is the spin rate of the projectile. 2000 1500
Voltage (x 0.01 usec)
1000 500 0 -500 -1000 -1500 -2000
0
500
1000
1500 2000 2500 Time (x 5 usec)
3000
3500
4000
Fig. 5. Signal no. 2 - Doppler signal for projectile in-bore velocity measurement 4
5
x 10
4.5 4
Frequency (Hz)
3.5 3 2.5 2
Bullet exit from muzzle
1.5 1 0.5 0
0
2
4
6
8 10 Time (sec.)
12
14
16
18 -3
x 10
Fig. 6. FS of the Doppler signal for projectile in-bore velocity measurement
388
S. H. Ong and A. Z. Kouzani
2000 1500
Voltage (x 0.01 volt)
1000 500 0 -500 -1000 -1500 -2000
0
2000
4000
6000 8000 10000 Time (x 5 usec)
12000
14000
16000
Fig. 7. Signal no. 3 - Modulated Doppler signal for projectile spin rate measurement 4
10
x 10
9 8
Frequency (Hz)
7 6 5 4 3 2 1 0
0
0.01
0.02
0.03
0.04 0.05 Time (sec)
0.06
0.07
0.08
Fig. 8. Broadband FS of the modulated Doppler signal for projectile spin rate measurement
4
Best-Basis Wavelet Packet Transform
The discrete wavelet transform (DWT) is implemented by Mallat’s pyramid algorithm [5]. Fig. 10 shows a 1-stage decomposition of a signal in the DWT. h[n] and g[n] are the impulse responses of the analysis low pass and high pass filters, respectively. Sn,k+1 is one of the vector spaces at level k + 1 onto which signals are projected. The outputs of both filters are downsampled by a factor of two. The filtering and downsampling operations split the vector space into two subspaces S2n,k and S2n+1,k. The
A Wavelet-Based Ammunition Doppler Radar System
389
signal is successively decomposed into lower resolution components, while the highfrequency components are not analyzed any further. The wavelet packet transform (WPT) however uses both the low and high frequency components. Fig. 11 shows a wavelet packet tree. There is a large but finite library of bases in the wavelet packet tree. The best basis can be extracted from this library based on some criterion. This is done using the best-basis algorithm [6]. The best-basis algorithm compares the entropy of the children to their parent entropy, starting from the bottom of the tree. If the parent entropy is smaller than the sum of the children entropy, then the parent entropy is retained. Else, the children entropy sum replaces the parent entropy. 4
10
x 10
9 8
Frequency (Hz)
7 Minor peaks
6 5
Major peak
4 3 2 1 0
0
0.01
0.02
0.03
0.04 Time (sec)
0.05
0.06
0.07
Fig. 9. Narrowband FS of the modulated Doppler signal for projectile spin rate measurement
Sn,k+1
g(n) h(n)
↓2 ↓2
S2n+1,k S2n,k
Fig. 10. 1-stage decomposition of a signal
5
Wavelet-Based versus FS-Based Approach
Fig. 11. A wavelet packet tree. The wavelet basis is shown in continuous line

Fig. 12 shows the BB-WPT representation of signal no. 1 for projectile velocity measurement. The Vaidyanathan-Huong 24 (VH24) wavelet [7] is used and the WPT is computed up to resolution level 14. Comparing Fig. 12 with the corresponding FSs shown in Figs. 3 and 4, we can see the good time and frequency resolution of the BB-WPT compared with the FS representation. Fig. 13 shows the BB-WPT representation of signal no. 2 for projectile in-bore velocity measurement. The VH24 wavelet is used and the WPT is computed up to resolution level 12. The BB-WPT representation shown in Fig. 13 is in conformance with the corresponding FS shown in Fig. 6. Fig. 14 shows the BB-WPT representation of signal no. 3 for projectile spin rate measurement. The VH24 wavelet is used and the WPT is computed up to resolution level 14. Due to the good time and frequency resolution of the BB-WPT representation, we are able to identify the main peak and the two minor peaks in Fig. 14. From the two minor peaks, the spin rate of the projectile can be determined. In the corresponding FSs shown in Figs. 8 and 9, the main peak and the two minor peaks can only be identified from the narrowband FS.
Fig. 12. BB-WPT representation of signal no. 1 for projectile velocity measurement (frequency [× 62.5 kHz] versus time [× 0.131072 s]; the penetrator and sabots are marked)
Fig. 13. BB-WPT representation of signal no. 2 for projectile in-bore velocity measurement (frequency [× 100 kHz] versus time [× 0.02048 s])
Fig. 14. BB-WPT of signal no. 3 for projectile spin rate measurement (frequency [× 100 kHz] versus time [× 0.08192 s]; the major peak and the minor peaks are marked)
6 Conclusion
We have demonstrated the JTFA using the wavelet-based approach via the BB-WPT on three types of ammunition Doppler signals:
1. Multicomponent Doppler signal for projectile velocity measurement.
2. Monocomponent Doppler signal for projectile in-bore velocity measurement.
3. Modulated Doppler signal for projectile spin rate measurement.
The main limitation of the FS is the poor resolution in time or frequency of the representation. This affects the accuracy of the velocity and/or spin rate results. On the other hand, due to the unique properties of wavelet packets, the BB-WPT representation exhibits better time and frequency resolution compared to the FS representation. The BB-WPT is suitable for practical implementation as the algorithm is of reasonable computational complexity when implemented with Mallat's pyramid algorithm and the best-basis algorithm. In conclusion, the wavelet-based approach is a good alternative to the FS-based approach for the JTFA of ammunition Doppler signals.
References
1. Skolnik, M. I. (Ed.): Radar Applications. IEEE Press (1987) 443-452
2. Whetton, C. P.: Industrial and Scientific Applications of Doppler Radar. Microwave J., Vol. 18 (Nov 1975) 39-42
3. Levanon, N.: Radar Principles. John Wiley and Sons, Inc. (1988) 1-18
4. Lolck, Jens-Erik: Spin Measurements. Tenth International Symposium on Ballistics, Vol. I, San Diego, California (1987)
5. Mallat, S. G.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Patt. Anal. and Mach. Intell. 11(7) (1989) 674-693
6. Coifman, R. R., Wickerhauser, M. V.: Entropy-Based Algorithms for Best-Basis Selection. IEEE Trans. Inform. Theory 38(2) (Mar 1992) 713-718
7. Vaidyanathan, P. P., Phuong-Quan Huong: Lattice Structures for Optimal Design and Robust Implementation of Two-Channel Perfect-Reconstruction QMF Banks. IEEE Trans. Acoust., Speech, and Signal Proc. 36(1) (Jan 1988) 81-94
The Application of Wavelet Analysis Method to Civil Infrastructure Health Monitoring* Jian Ping Li1, Shang An Yan1, and Yuan Yan Tang2 1
International Centre for Wavelet Analysis and Its Applications, Logistical Engineering University, Chongqing 400016, P. R. China [email protected] 2 Department of Computer Science, Hong Kong Baptist University, Hong Kong [email protected]
Abstract. Wavelet analysis and its applications have become one of the fastest growing research areas in recent years. This is in part attributed to the pioneering work of researchers as well as practitioners in the field of signal processing. Morlet first coined the term wavelet analysis in the early 1980s. Meyer developed a wavelet basis in 1986, which is best known today as the Meyer basis. Later, Mallat and Meyer formulated the theory of multiresolution analysis and subsequently proposed the Mallat algorithm, making the wavelet transform readily implementable on digital computers. In the 1990s, advanced research and development in wavelet analysis found numerous applications in such areas as signal processing, image processing, and pattern recognition, with many encouraging results. Despite this fast growth in theories and applications, the theoretical development of the wavelet transform itself is somewhat lagging behind its applications. Recently, a new method based on wavelet analysis and the wavelet transform has been developed to process nonlinear and nonstationary time series data by Huang [1,2,3,4] and J. P. Li, Y. Y. Tang [5,6]. This novel method consists of the wavelet transform, Hilbert spectral analysis, and empirical mode decomposition. It has been applied to a variety of geophysical and bio-engineering problems. The specific application to civil infrastructure health monitoring has been reported. The basic method and the infrastructure health monitoring application are discussed here.
1 Introduction
To safeguard the safety performance of a bridge, regular inspections are essential. At the present time, the inspection method is primarily visual: A technician has to go *
This work was supported by the National Natural Science Foundation of China under grant number 69903012.
through a bridge to examine each member and certify its safety. This method is subjective and flawed for lack of rigorous and objective standards. For example, for a bridge deteriorating from fatigue or aging, the damage is not clear-cut at any time; therefore, any call is judgmental. Furthermore, it is not feasible to use this method for complicated bridge structures: there might be members located at positions too awkward to access; there might be too many members that would require too much time to inspect; and there might be damage too subtle to be detected visually. Because of these limitations, the visual inspection results are known to be not totally reliable; yet, we are forced to rely on them today. We think that an ideal inspection method will have to satisfy the following conditions:
1. Robust, objective, and reliable;
2. To be able to identify the existence of damages;
3. To be able to locate the damages;
4. To be able to determine the degree of the damages.

2 Wavelet Transform
The wavelet approach is essentially an adjustable window Fourier spectral analysis with the following general definition:
$$W_{a,b}(f(t), \psi) = |a|^{-1/2} \int f(t)\, \psi\!\left(\frac{t-b}{a}\right) dt \qquad (1)$$
in which ψ (t ) is the basic wavelet function that satisfies certain very general conditions, a is the dilation factor and b is the translation of the origin. Although time and frequency do not appear explicitly in the transformed result, the variable 1/a gives the frequency scale and b, the temporal location of an event. An intuitive physical explanation of equation above is very simple: Wa ,b ( f (t ),ψ ) is the ‘energy’ of f(t) of scale a at t=b. Because of this basic form of at+b involved in the transformation, it is also known as affine wavelet analysis. For specific applications, the basic wavelet function, ψ (t ) , can be modified according to special needs, but the form has to be given before the analysis. In most common applications, however, the Morlet wavelet is defined as Gaussian enveloped sine and cosine wave groups with 5.5 waves. Generally, ψ (t ) is not orthogonal for different a for continuous wavelets. Although one can make the wavelet orthogonal by selecting a discrete set of a, this discrete wavelet analysis will miss physical signals having scale different from the selected discrete set of a. Continuous or discrete, the wavelet analysis is basically a linear analysis. A very appealing feature of the wavelet analysis is that it provides a uniform resolution for all the scales. Limited by the size of the basic wavelet function, the downside of the uniform resolution is uniformly poor resolution. Although wavelet analysis has been available only in the last ten years or so, it has become extremely popular. Indeed, it is very useful in analyzing data with gradual frequency changes. Since it has an analytic form for the result, it has attracted
extensive attention from applied mathematicians. Most of its applications have been in edge detection and image compression. Limited applications have also been made to the time-frequency distribution of time series and two-dimensional images. Versatile as the wavelet analysis is, we find that there are some problems with its applications. If these problems can be solved completely, we believe that the wavelet transform will enrich its theory, gain new substantial content in signal representation and reconstruction, direction choosing and optimization, and open up some prospective applied areas of wavelet analysis.
1. The first problem with the most commonly used Morlet wavelet is its leakage, generated by the limited length of the basic wavelet function, which makes the quantitative definition of the energy-frequency-time distribution difficult.
2. The second problem is that the interpretation of the wavelet can be counterintuitive. For example, to define a change occurring locally, one must look for the result in the high-frequency range, for the higher the frequency the more localized the basic wavelet will be. If a local event occurs only in the low-frequency range, one will still be forced to look for its effects in the high-frequency range. Such interpretation will be difficult if it is possible at all.
3. The third problem is the non-adaptive nature of the wavelet analysis. Once the basic wavelet is selected, one will have to use it to analyze all the data. Since the most commonly used Morlet wavelet is Fourier based, it also suffers many shortcomings of Fourier spectral analysis: it can only give a physically meaningful interpretation to linear phenomena; it can resolve the interwave frequency modulation provided the frequency variation is gradual, but it cannot resolve the intrawave frequency modulation because the basic wavelet has a length of 5.5 waves.
In spite of all these problems, wavelet analysis is still the best available nonstationary data analysis method so far, therefore, we will use it in this paper as a reference to establish the validity and the calibration of the Hilbert spectrum.
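To make the adjustable-window transform of Eq. (1) concrete, the following small numerical sketch (not from the paper) evaluates it on a sampled signal with a real Gaussian-enveloped cosine of roughly 5.5 cycles. The sampling step, scale grid, and window normalization are illustrative assumptions.

```python
import numpy as np

def morlet_like(t, n_cycles=5.5):
    """Real Gaussian-enveloped cosine with roughly n_cycles oscillations."""
    sigma = n_cycles / (2.0 * np.pi)
    return np.cos(2 * np.pi * t) * np.exp(-0.5 * (t / sigma) ** 2)

def cwt(f, dt, scales, psi=morlet_like):
    """Discretization of W_{a,b}(f, psi) = |a|^(-1/2) * integral f(t) psi((t-b)/a) dt."""
    n = len(f)
    t = np.arange(n) * dt
    out = np.empty((len(scales), n))
    for i, a in enumerate(scales):
        for j, b in enumerate(t):
            out[i, j] = np.sum(f * psi((t - b) / a)) * dt / np.sqrt(abs(a))
    return out

# Usage: a tone whose frequency changes gradually.
dt = 1.0 / 200.0
t = np.arange(0, 4.0, dt)
signal = np.sin(2 * np.pi * (2.0 + 1.0 * t) * t)
scales = np.linspace(0.05, 1.0, 40)
W = cwt(signal, dt, scales)
print(W.shape)  # (scales, time); |W| is large where 1/a matches the local frequency
```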
3 The State-of-the-Art Review
The approach of using dynamic response and vibration characteristics for structural damage identification is the theoretical foundation of instrumental safety inspection methods. It has also been a mainstream of research for more than thirty years. Doebling et al. have reviewed the available literature of this approach. The practical problems associated with this approach have also been reviewed by Farrar and Doebling and by Felber. In principle, each structure should have its proper frequency of vibration under dynamic loading. The value of this proper frequency can be computed based on the wavelet transform and the elasticity properties of the structure. Sound as this argument is, instrument inspection has never worked successfully. The reasons are many:
1. First, there is the lack of precision sensors to measure the detailed dynamic response of the structure under loading.
2. Secondly, there is a lack of existing data of bridges to be used as a reference state.
3. Thirdly, there is a lack of proper data processing methods to process the structural response.
4. Finally, there is a lack of sensitivity of the structure in response to the local damage, because of the large built-in safety factor.
Damage of up to 50% of the cross-section can only result in a change of a few percent in the vibration frequency. Such a small frequency shift, when processed with conventional methods, would be totally lost in the noise that is inevitable in all real situations. In the final analysis, many of the difficulties can be alleviated if the data processing method can be made more versatile to handle highly transient and nonlinear vibration data. The wavelet analysis and wavelet transform theory will provide good health monitoring for civil infrastructure.
4 Conclusion
Based on our analysis, we can conclude the following:
1. We can indeed extract the bridge vibration signal from the noisy load test condition. The test condition was more complicated than our numerical model, but it is also realistic. Our analysis has extracted the structurally and dynamically significant components without any difficulties.
2. We have also established that a realistic bridge is sensitive to a live transient load as a test of the structural integrity. The bridge vibrations in different directions, at different instants and under a transient load, are indeed sensitive enough to serve as indicators for the structural integrity test.
3. We have also determined the structurally weak spot based on the Wavelet Transform (WT), the Empirical Mode Decomposition (EMD), and the Hilbert Spectral Analysis (HSA), and thus established the feasibility of the Nondestructive Instrument Bridge Safety Inspection System.
The data analysis method based on WT, EMD, and HSA is unique and at the forefront of research in data analysis [5,6]. It utilizes not only the nonlinear characteristics of the response to determine the damage, but also the transient properties of the load to determine the damage location. Then, the free vibration frequency can be used to determine the extent of the damage. Considering the low number of sensors required and the efficient way of utilizing the data, the HHT presents a new data analysis alternative for bridge damage identification.
References
1. Huang, N. E.: US Provisional Application Serial Number 60/023,411, filed August 14 (1996), and Serial No. 60/023,822, filed August 12 (1996). Patent allowed March 1999
2. Huang, N. E., Shen, Z., Long, S. R.: Adv. Appl. Mech., 32 (1996) 59-117
3. Huang, N. E. et al.: Proc. Roy. Soc. Lond., A454 (1998) 903-995
4. Huang, K.: US Patent Application Serial No. 09-210693, filed December 14 (1998)
5. Jian Ping Li: Wavelet Analysis & Signal Processing: Theory, Applications and Software Implementations. Chongqing Publishing House, Chongqing (2001)
6. Jian Ping Li, Yuan Yan Tang: The Applications of Wavelet Analysis Method. Chongqing University Publishing House, Chongqing (2001)
Piecewise Periodized Wavelet Transform and Its Realization
Wing-kuen Ling and Peter Kwung-Shun Tam
Wavelet Transform and Its Application to Decomposition of Gravity Anomalies Hou Zunze Chinese People's Armed Police Force Academy, Langfang 065000 China
Abstract. The gravity anomalies obtained by surveys reflect the inhomogeneity of the lithosphere density. The gravity values are suitable for studying the basements and structure of the earth. However, since the gravity anomalies include contributions from the whole lithosphere and upper mantle, the decomposition of the gravity field is important to study. Based upon the theory of wavelets and multi-scale analysis, we studied the method of decomposition of gravity anomalies and then decomposed the gravity anomalies of China, the East China Sea, etc. by using the two-dimensional wavelet decomposition technique. Results show that wavelet multi-scale analysis is a powerful tool for the decomposition of gravity anomalies. Keywords: Wavelet transform, Multi-scale analysis, Multiple decomposition of gravity field, Bouguer gravity anomalies, Free-air gravity anomalies.
1 Introduction
Gravity anomalies are important fundamental data for studying geological and lithospheric structures. According to the specific geological problem, it is expected to obtain the corresponding gravity anomalies of the geological bodies [1], then invert and interpret them to get their densities, forms, depths, etc. It is clear that decomposition of the gravity anomalies is an important stage. To get these anomalies, the survey anomalies need to be decomposed; trend analysis, analytical continuation, circle averaging, and matched filtering used to be applied, but the effect is poor. The decomposition of the gravity field is a difficult job. The recent development of the wavelet transform increases its application to signal processing, seismic exploration, image analysis, and other geophysical data processing. The wavelet transform introduces the idea of multi-scale analysis, which enjoys desirable properties of localization in both the space and the frequency domains. The wavelet transform can decompose a signal f(x) into several components with different frequencies or scales and focus it into many specified details via dilation and shifting of the wavelet functions [2-3]. Because the wavelet transform is a powerful tool and is looked upon as a mathematical microscope, it should be a new effective technique for
geophysical data analysis. By using the wavelet transform to decompose the gravity anomalies, we can obtain divided anomalies with different scales. Based upon the theory of wavelet and the principle of multi-scale analysis, we develop some new procedures for wavelet multi-scale analysis of gravity anomalies[4] and some algorithms for two-dimensional multi-scale analysis which are then applied to decompose the Bouguer gravity anomalies of China and to decompose the free-air gravity anomalies of the East China Sea. The decomposition produces a series of gravity component maps corresponding to different geological objects with different scales and buried depths.
2 The Method of Wavelet Transform and Multi-scale Analysis [5-9]

Let a function be $f(x) \in L^2(R)$, and define its wavelet transform as
$$W_f(a,b) = \langle f, \psi_{a,b} \rangle = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(x)\, \Psi\!\left(\frac{x-b}{a}\right) dx, \qquad (1)$$
where $\psi(x) \in L^2(R)$ is the wavelet function,
$$\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{x-b}{a}\right), \quad a, b \in R,\ a \neq 0, \qquad (2)$$
and $\psi(x)$ satisfies
$$\int_{-\infty}^{\infty} \psi(x)\, dx = 0. \qquad (3)$$

We call a set of subspaces $\{V_j\}_{j \in Z}$ and a function $\varphi(x)$ an orthogonal multi-scale analysis if the following conditions are satisfied:
(1) $V_j \subset V_{j-1}$, $\forall j \in Z$;
(2) $\bigcap_{j \in Z} V_j = \{0\}$, $\overline{\bigcup_{j \in Z} V_j} = L^2(R)$;
(3) $\varphi(x) \in V_0$, and $\{\varphi(x-n)\}_{n \in Z}$ is the normalized orthogonal basis of $V_0$;
(4) $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j-1}$.
Based on conditions (3) and (4), there is an equivalent relationship $f(x) \in V_0 \Leftrightarrow f(2^{-j}x) \in V_j$, and the function family $\{2^{-j/2} \varphi(2^{-j}x - n)\}_{n \in Z}$ constructs a set of normalized orthogonal bases of the space $V_j$. The function $\varphi(x)$ is called the scale function of the multi-scale analysis, and $\phi(x)$, constructed from $\varphi(x)$, is the wavelet function.

Let $\{V_j\}_{j \in Z}$ be a given multi-scale analysis, and let $\varphi(x)$ and $\phi(x)$ be the corresponding scale and wavelet functions, respectively. For a given $J_1 \in Z$ and a function $f(x) \in V_{J_1}$, there is the decomposition
$$f(x) = A_{J_1} f(x) = \sum_k C_{J_1,k}\, \varphi_{J_1,k}(x). \qquad (4)$$
Denote
$$\langle \varphi_{J_1,k}, \varphi_{J_1+1,m} \rangle = h_{k-2m}, \qquad (5)$$
$$\langle \phi_{J_1,k}, \phi_{J_1+1,m} \rangle = g_{k-2m}. \qquad (6)$$
For an integer $J_2 > J_1$, there is
$$f(x) = A_{J_1} f(x) = A_{J_1+1} f(x) + D_{J_1+1} f(x) = A_{J_1+2} f(x) + D_{J_1+2} f(x) + D_{J_1+1} f(x) = \cdots = A_{J_2} f(x) + \sum_{j=J_1+1}^{J_2} D_j f(x), \qquad (7)$$
where
$$A_j f(x) = \sum_{m=-\infty}^{\infty} C_{j,m}\, \varphi_{j,m}, \qquad (8)$$
$$D_j f(x) = \sum_{m=-\infty}^{\infty} d_{j,m}\, \phi_{j,m}. \qquad (9)$$
On the other hand,
$$C_{j,m} = \sum_{k=-\infty}^{\infty} h_{k-2m}\, C_{j-1,k}, \qquad (10)$$
$$d_{j,m} = \sum_{k=-\infty}^{\infty} g_{k-2m}\, C_{j-1,k}, \qquad (11)$$
with $j = J_1+1, J_1+2, \ldots, J_2$.
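The recursions (10)-(11) translate directly into a filter-and-downsample loop. The sketch below is only an illustration (not the authors' software); the Haar filter pair stands in for the h and g defined by Eqs. (5)-(6), and boundary terms are simply dropped.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # h_k of Eq. (5); Haar is an assumed example
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # g_k of Eq. (6)

def analysis_step(c, filt):
    """out[m] = sum_k filt[k-2m] * c[k], i.e. Eqs. (10)-(11) with boundary terms dropped."""
    return np.correlate(c, filt, mode='valid')[::2]

def decompose(c0, levels):
    """Iterate the pyramid: returns the A_{J2}f coefficients and the D_j f coefficients of Eq. (7)."""
    approx, details = np.asarray(c0, dtype=float), []
    for _ in range(levels):
        details.append(analysis_step(approx, g))   # d_{j,m}
        approx = analysis_step(approx, h)          # C_{j,m}
    return approx, details

c0 = np.sin(np.linspace(0.0, 8 * np.pi, 256))      # C_{J1,k}: a sampled profile
A, D = decompose(c0, levels=4)
print(len(A), [len(d) for d in D])
```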
3 Method of Wavelet Multi-scale Decomposition of Gravity Anomalies

Let $\{V_j\}_{j \in Z}$ be a one-dimensional multi-scale analysis with scale function denoted by $\phi$ and wavelet function by $\psi$. Denote $V_j^2 = V_j \otimes V_j$; then $\{V_j^2\}_{j \in Z}$ form a two-dimensional multi-scale analysis. The 2-D scale function is defined by
$$\Phi(x,y) = \phi(x)\phi(y), \qquad (12)$$
and the 2-D wavelet functions are defined by
$$\Psi^1(x,y) = \phi(x)\psi(y), \qquad (13)$$
$$\Psi^2(x,y) = \psi(x)\phi(y), \qquad (14)$$
$$\Psi^3(x,y) = \psi(x)\psi(y). \qquad (15)$$
Let $f(x,y) \in V_{J_1}^2$. Following the principle of multi-scale analysis, we have
$$f(x,y) = A_{J_1} f(x,y) = A_{J_1+1} f(x,y) + \sum_{\varepsilon=1}^{3} D^{\varepsilon}_{J_1+1} f(x,y), \qquad (16)$$
where
$$A_{J_1+1} f(x,y) = \sum_{m_1, m_2 \in Z} c_{J_1+1, m_1, m_2}\, \Phi_{J_1+1, m_1, m_2}, \qquad (17)$$
$$D^{\varepsilon}_{J_1+1} f(x,y) = \sum_{m_1, m_2 \in Z} d^{\varepsilon}_{J_1+1, m_1, m_2}\, \Psi^{\varepsilon}_{J_1+1, m_1, m_2}, \qquad (18)$$
with
$$c_{J_1+1, m_1, m_2} = \sum_{k_1, k_2 \in Z} h_{k_1-2m_1} h_{k_2-2m_2}\, c_{J_1, k_1, k_2}, \qquad (19)$$
$$d^1_{J_1+1, m_1, m_2} = \sum_{k_1, k_2 \in Z} h_{k_1-2m_1} g_{k_2-2m_2}\, c_{J_1, k_1, k_2}, \qquad (20)$$
$$d^2_{J_1+1, m_1, m_2} = \sum_{k_1, k_2 \in Z} g_{k_1-2m_1} h_{k_2-2m_2}\, c_{J_1, k_1, k_2}, \qquad (21)$$
$$d^3_{J_1+1, m_1, m_2} = \sum_{k_1, k_2 \in Z} g_{k_1-2m_1} g_{k_2-2m_2}\, c_{J_1, k_1, k_2}, \qquad (22)$$
where
$$h_k = \frac{1}{2} \int_{-\infty}^{+\infty} \phi\!\left(\frac{x}{2}\right) \phi(x-k)\, dx, \qquad (23)$$
$$g_k = (-1)^{k-1}\, h_{1-k}. \qquad (24)$$
Equation (16) can be further decomposed to the step of $J_2 - J_1$ as
$$f(x,y) = A_{J_2} f(x,y) + \sum_{j=J_1+1}^{J_2} \sum_{\varepsilon=1}^{3} D^{\varepsilon}_j f(x,y), \qquad (25)$$
where
$$A_j f(x,y) = \sum_{m_1, m_2 \in Z} c_{j, m_1, m_2}\, \Phi_{j, m_1, m_2}, \qquad (26)$$
$$D^{\varepsilon}_j f(x,y) = \sum_{m_1, m_2 \in Z} d^{\varepsilon}_{j, m_1, m_2}\, \Psi^{\varepsilon}_{j, m_1, m_2}, \qquad (27)$$
for $j = J_1+1, \ldots, J_2$. By letting $\Delta g(x,y) = f(x,y)$, we have the shortened decomposition expression
$$\Delta g = A_J G + D_1 G + D_2 G + \cdots + D_J G, \qquad (28)$$
where D1G denotes the first-order wavelet detail of the gravity anomalies, D2G the second-order wavelet detail, and DJG the J-th order wavelet detail, while AJG denotes the J-th order approximation of the gravity anomalies.
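A minimal sketch (not the authors' code) of Eqs. (19)-(22) and (28): rows and columns of a gridded anomaly are filtered separately with h and g, producing the approximation AJG and three detail channels per level. The Haar filters and the synthetic random grid are assumptions made purely for illustration.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

def step_1d(c, filt):
    """1-D analysis step: sum_k filt[k-2m] c[k], boundary terms dropped."""
    return np.correlate(c, filt, mode='valid')[::2]

def step_2d(c, filt_x, filt_y):
    """Separable 2-D step: filt_x acts along axis 0 (k1), filt_y along axis 1 (k2)."""
    along_y = np.array([step_1d(row, filt_y) for row in c])
    return np.array([step_1d(col, filt_x) for col in along_y.T]).T

def decompose2d(grid, levels):
    """Return A_JG and the per-level details (D^1, D^2, D^3) of Eqs. (25)/(28)."""
    approx, details = np.asarray(grid, dtype=float), []
    for _ in range(levels):
        d1 = step_2d(approx, h, g)       # Eq. (20)
        d2 = step_2d(approx, g, h)       # Eq. (21)
        d3 = step_2d(approx, g, g)       # Eq. (22)
        approx = step_2d(approx, h, h)   # Eq. (19)
        details.append((d1, d2, d3))
    return approx, details

dg = np.random.default_rng(1).standard_normal((128, 128))  # synthetic gridded anomaly
AJ, Ds = decompose2d(dg, levels=4)
print(AJ.shape, [d[0].shape for d in Ds])
```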
4 Application of the Wavelet Multi-scale Decomposition [10-12]
The method mentioned above is applied to decomposition of the Bouguer gravity field of China and the free-air gravity anomalies of the East China Sea. The data of Bouguer gravity field of China are picked from the Bouguer gravity map of China with scale 1:4,000,000, compiled by the Institute of Geophysical and Geochemical Exploration of Ministry of Geology and Mineral Resources. This map shows the latest regional gravity measurements using a grid of 40×40km. The first order wavelet detail of the gravity mainly reflects the density inhomogeneity of the
upper crust. From this map, one can see the difference in the upper crust between the eastern and the western parts, with the boundary running from Helan mountain to Qionglai mountain. The western part shows strong inhomogeneity of density striking west to east, while in the eastern part the inhomogeneity is weak and disperse. The second wavelet detail reflects density inhomogeneity of both the upper and the middle crust and so looks similar to the first order. One can see the differences in density between the Yangtze and Huanan terrains and between the Northeast and Huabei terrains. The third wavelet detail of the gravity anomalies mainly reflects density variation in the lower crust. The fourth order wavelet detail of the gravity anomalies mainly reflects the density of the uppermost mantle in the eastern part of China. The fourth order wavelet approximation of the Chinese gravity field shows the trend of Moho fluctuation and density variation at the largest scales. The gravity anomalies of the East China Sea cover 694 thousand square kilometers, spanning the East China shelf basin, the Tiaoyu I. folded doming-up belt, the Okinawa trough basin, the Ryukyu folded doming-up area and the Ryukyu trench, etc., from west to east. According to previous studies, the Moho surface rises step by step from 28 km to 16 km from west to east. The East China Sea shelf basin consists of a lot of sags; the Xihu sag, famed in the world, is one of them. For such a large and complex area, we apply the wavelet multi-scale analysis to decompose the free-air gravity anomalies. The first order transform detail shows that small circles with diameters of 10 km or so are widely distributed in the shelf and to its west; the field value changes between (-5~10)×10-5 m/s2. The second order transform detail shows that the number of contour circles is smaller than in the first order transform detail and their size is larger; the field value changes between (-5~5)×10-5 m/s2. The scope of the gravity anomalies of the first and second order is small; these are small-scale gravity anomalies. According to the theory of the analysis and the results, they should be related to the inhomogeneity of rock density in the shallow stratum and to some survey errors. The third order transform detail shows that the range of a contour circle is a few hundred square kilometers; the high contour value is 6×10-5 m/s2 and the low is -4×10-5 m/s2 in the shelf and its west. It is supposed to be related to the sediment thickness when compared with the sediment of the seismic interpretation: the sediment thickness in the high value area is getting thin, that of the low value area is getting thick, and the lowest value indicates the center of sedimentation. In the middle part of the fourth order detail map, a high value belt trends north-northeast; the contours of (6~10)×10-5 m/s2 are distributed like a string of beads in the belt. The form and range of the whole high value area are in accord with the Tiaoyu I. folded doming-up belt. To the west of the high value belt is the East China Sea shelf basin; its field is (-6~6)×10-5 m/s2 and the area of high and low value circles is up to 1800 km2; the form and range of the -2×10-5 m/s2 contour are consistent with the seismic sedimentary form and range. To the east of the high value belt and parallel with it, there is a low value belt; the contours of (-6~-8)×10-5 m/s2 are distributed like a string of beads. The form and range of the whole low value area are consistent with the Okinawa trough basin. Viewing the whole map, high values reflect lifts and low values reflect sags; the relations are marked in the map.
As described above, the fourth order transform detail mainly reflects the lift of the sedimentary basement in East China Sea and adjacent regions. The fourth order wavelet approximation reflects the Moho surface in the area.
5 Conclusion
As a new mathematical tool, the wavelet transform enjoys many properties that other conventional mathematical methods cannot have. Wavelet transform is a powerful tool for multiple decomposition of gravity field. Wavelet multi-scale analysis technique has been successful in gravity anomalies decomposition of China and the East China Sea, etc.
References
1. Hou Zunze: Calculation of gravity anomalies for multi-layer density interface. Computing Techniques for Geophysical and Geochemical Exploration (in Chinese), 10 (1988) 129-132
2. Li Shixiong, Liu Jiaqi: Wavelet Transform and Foundation of Math (in Chinese). Geology Press, Beijing (1994)
3. Liu Guizhong, Di Shuangliang: Wavelet Analysis and Its Application (in Chinese). Xi'an Electronics University Press, Xi'an (1992)
4. Hou Zunze, Yang Wencai: An operational research on the wavelet analysis. Computing Techniques for Geophysical and Geochemical Exploration (in Chinese), 17 (1995) 1-9
5. Daubechies, I.: The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. on Information Theory, 36 (1990) 961-1006
6. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Math., XII (1988) 909-996
7. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Math., Philadelphia, Pennsylvania (1992)
8. Mallat, S., Hwang, W. L.: Singularity detection and processing with wavelets. IEEE Trans. on Information Theory, 38 (1992) 617-643
9. Mallat, S.: Multifrequency channel decompositions of images and wavelet models. IEEE Trans. on Acoustics, Speech and Signal Processing, 37 (1989) 2091-2110
10. Hou Zunze, Yang Wencai: Two-dimensional wavelet transform and multiscale analysis of the gravity field of China. Chinese J. Geophysics (in Chinese), 40 (1997) 85-95
11. Hou Zunze, Yang Wencai: Decomposition of crustal gravity anomalies in China by wavelet transform. 30th International Geological Congress, Beijing, China (1996)
12. Hou Zunze, Yang Wencai, Liu Jiaqi: Multi-scale inversion of density distribution of the Chinese crust. Chinese J. Geophysics (in Chinese), 41 (1998) 642-651
Computations of Inverse Problem by Using Wavelet in Multi-layer Soil Wu Boying1, Liu Shaohui1, and Deng Zhongxing2 1
Mathematics Department of Harbin Institute of Technology 150001, Harbin, the People’s Republic of China [email protected] 2 College of Applied Science Harbin University of Science and Technology 150080, Harbin, the People’s Republic of China
Abstract. In this paper we study the use of wavelets in the inverse problem of multi-layer soil. We put forward a function and prove that it is a wavelet function. Then we give a detailed theoretical analysis of its application to computing soil parameters. At the same time, we carry out numerical experiments with two- and three-level soil structures. The results indicate the validity of the method.
1 Introduction
Along with the development of electric power system capacity, the value of the fault current flowing into the ground has increased greatly, so the grounding system is very important to keep devices and workmen safe. In the design of a substation grounding system, the estimation of many main parameters is closely related to the soil structure. In earlier years, the design of grounding systems was based on considering the soil as a uniform medium and on simplified formulas, but this is impractical. Subsequently, along with the development of computer technology, F. P. Dawalibi [1]-[2] and Takehiko Takahashi [3] studied the multi-layer soil in depth. F. P. Dawalibi's model paid more attention to the application of mathematical methods and the accuracy of calculation than to physical sense. Takehiko Takahashi utilized the concept of the template in geognosy. Templates are finite and grounding parameters change according to location, hence approximate parameters can be obtained. In general, research on multi-layer soil is based on constant current field theory. When current flows in the soil, each point satisfies the Laplace equation. Solving the equation, we can get the representation of the electric potential, and then the representation of the apparent resistivity ρ(r). If we expand it by the Taylor expansion method, we get the series representation. Furthermore, we can obtain the parameters of the multi-layer soil structure from observational data by the least squares method. But that method has drawbacks, such as the complexity of the representation of ρ(r), the convergence of the series, and so on. The author improved on that method in [4]. He made use of the Simpson formula in calculating the representation of ρ(r), and transformed the representation of ρ(r) in the computation of parameters in
multi-layer soil. The apparent resistivity ρ(r) is related to the kernel function B(λ) in the analysis of the soil structure in the following fashion [3]:
$$\rho(r) = \rho_1 + 4\rho_1 r \int_0^{+\infty} \left( B(\lambda) - B\!\left(\tfrac{1}{2}\lambda\right) \right) J_0(r\lambda)\, d\lambda \qquad (1)$$
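As a worked illustration of Eq. (1) only (not the inversion method of [4]), the apparent resistivity can be approximated by truncating the Bessel integral and integrating numerically. The upper limit lam_max, the grid size, and in particular the smooth decaying kernel B below are assumptions made to exercise the quadrature; the real layered-earth kernel must be supplied instead.

```python
import numpy as np
from scipy.special import j0   # Bessel function J0 of the first kind

def apparent_resistivity(r, B, rho1, lam_max=50.0, n=20000):
    """Trapezoidal approximation of Eq. (1):
    rho(r) = rho1 + 4*rho1*r * int_0^inf (B(lam) - B(lam/2)) * J0(r*lam) dlam."""
    lam = np.linspace(1e-6, lam_max, n)
    integrand = (B(lam) - B(0.5 * lam)) * j0(r * lam)
    return rho1 + 4.0 * rho1 * r * np.trapz(integrand, lam)

# Placeholder kernel standing in for the layered-earth kernel B(lambda).
B = lambda lam: 0.3 * np.exp(-10.0 * lam)

for r in (1.0, 5.0, 20.0):
    print(r, round(apparent_resistivity(r, B, rho1=1000.0), 2))
```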
The rest of this paper is organized as follows. First, we clarify the terminology used for wavelet analysis. Secondly, we introduce a function and prove that it is a wavelet. Next, we apply the wavelet to the multi-layer soil structure. Finally, we carry out experiments to verify the feasibility of our method.
2 The Proof of Wavelet
Definition 2.1 Let
ψ ∈ L2 ∩ L1
and
ψ ( 0) = 0 ,
then defining the set of function
{ψ a ,b } as −
1
ψ a ,b ( x ) = a 2 ψ (
x−b ), b ∈ R, a ∈ R − {0} a
We call them continuous wavelets generated by wavelet wavelet satisfying ψˆ (0)
ψ . Sometimes ones call
= 0 as base wavelet.
Definition 2.2 Let ψ be a base wavelet, 2.1. For
(2)
ψ a,b
is the continuous wavelet in definition
2
f ∈ L , wavelet transformation the signal or function f is defined as Wf (a, b) =< f ,ψ a ,b >= a
−
1 2
∫
R
For the sake of existence inverse transformation, admissibility condition, namely
f ( x)ψ (
x−b )dx a
ψ ∈ L2 ∩ L1
(3)
must be agree with
2
ψˆ (ω ) Cψ = 2π ∫ dω < ∞ R ω then
ψ is
(4)
admissible. According reference [6], one hand if admissibility condition
ψˆ (ω ) = 0 also holds, on the other hand, ψ (ω ) ≤ C (1 + ω ) −1−α , then ψ must be admissible.
holds, then
Definition 2.3 Suppose
if
ψˆ (ω ) = 0
f ( x) ∈ L2 [0,+∞) , wavelet ψ is defined as
holds, and
Computations of Inverse Problem by Using Wavelet in Multi-layer Soil
413
xe − x cos( x), when x ≥ 0 0, when x < 0
(5)
ψ ( x) = Definition 2.4 Defining
{ψ a ,b } as −
1 2
ψ a ,b ( x ) = a ψ (
x−b ), b ∈ R, a ∈ [0,+∞) a
(6)
where ψ is defined in Definition 2.3. Our next work is to prove ψ be a wavelet. It is obvious that ψˆ (0) = 0 .That we prove ψ be a wavelet is equivalent to prove ψ agree with admissibility condition by term of discussion above. Propersition 2.1 Existing constan C and α > 0 make ψ ( x ) satisfy
ψ ( x) ≤ C (1 + x ) −1−α Definition 2.5 If function f { f } ⊆ [0,+∞), {ψ a ,b } is defined by (6)
Wf (a, b) =< f ,ψ a ,b >= a
−
1 2
∫
R
f ( x)ψ (
Theorem 2.1 According definition above, for all +∞
+∞
0
−∞
∫ ∫
Wf (a, b)Wg (a, b)
x−b )dx ,where α > 0 a
(7)
f , g ∈ L2 [0,+∞) We have
da db = Cψ < f , g > a2
(8)
Meantime, have inverse formula
f ( x) =
where
Cψ = 2π ∫
+∞
0
1 Cψ
+∞
+∞
0
−∞
∫ ∫
Wf (a, b)ψ a ,b ( x)
dadb a2
2
ψˆ (ω ) dω ω
Note1 we evaluate the value of fourier transform of wavelet and constant well known know
∫
+∞
0
(9)
xe − ax cos(bx)dx =
2
2
Cψ .It is
a −b (a > 0) . So ones are easy to (a 2 + b 2 ) 2
414
Wu Boying et al.
1
ψˆ (ω ) =
(ω 10 + 4ω 8 + 8ω 6 + 32ω 4 + 16ω 2 + 64) 2 ×ω (ω 4 + 4) 2 2π 1
ω ω2 +4 = 4 2π (ω + 4) 1
Note that
1 (π + 1) . 16
Cψ =
2 −
α
γ
Theorem 2.2 If ψ satisfies C1: ψˆ (ω ) ≤ C ω (1 + ω ) 2 , α > 0, γ > α + 1 , [7]
where
C
∑ ψˆ (2
−k
is
a
constant,
and
if,
2
for
ω ) ≥ α > 0 , then there must be b0
all
ω ≠ 0 ,ψˆ satisfies
C2:
making ψ k ,n ( x ) constitute frame
k =Z
2
of L [0,+∞ ) . So In order to prove wavelet defined in this paper constituting frame , we only need to verify the condition C1.First of all, the condition C2 is obvious. Next ,we verify condition C1. It is easy to show
there
exist
ω ω2 + 4 know ψˆ (ω ) = ,so only to 4 2π ω + 4 constant C > 0, α > 0 and γ > α + 1 and 1
γ
1 ω ω2 +4 α 2 − ≤ C ω (1 + ω ) 2 holds. And this is obvious Those 4 2π ω + 4 constants exist indeed, for example, C = 1, α = 1, γ = 2.1 > α + 1 . The last issue is to compute the dual wavelet in practical application. We adopt the method as Daubechies’s[6]. For convenience of discussion, we only select the first term in approximation. We have calculated the frames, and the specific values are 0.359 and 0.375.
3
Application in Multi-layer Soil
In introduction , we can transform the formulas into evaluating integral of
∫
∞
0
ρ (r ) J 0 (λr )dr .
Now, we evaluate this integration by the wavelet in
Computations of Inverse Problem by Using Wavelet in Multi-layer Soil
section
∫
∞
0
ρ (r ) J 0 (λr )dr . ρ (r ) =
section 2.Suppose
Now, we evaluate this integration by the wavelet in
∑ρ
m , n∈Z
Theorem 3.1 The definitions of
∫
∞
0
=
ψ mn (r ) ,then we have:
mn
ρ ,ψ
as above, then :
ρ (r ) J 0 (λr )dr
∑ ρ mn 2
m −1 2
m , n∈Z
+ e n (1−i ) −
415
∑
[(1 − i)
ρ mn 2
n (1+i ) 1+ i e (1 + i ) 2 + (2 m λz ) 2
[
1− i 2
m −1 2
+ ( 2 m λz ) 2
∫
m
2 n
0
m , n∈Z n ≥0
]
3
2
− ne n (1−i )
]
3
2
[(1 − i)
− ne n (1+i )
[(1 + i)
1 2
+ (2 m λ ) 2
]
1
1 2
+ (2 m λ ) 2
2
ψ mn (r ) J 0 (λr )dr
holds. Proof: Because the space is limited, We only explain the main thought.
∫
∞
0
∞
ρ (r )J 0 (λr )dr = ∑ ρ mn ∫ ψ mn (r ) J 0 (λr )dr 0
mn
We devide the right term of this formulation into two parts, then computing them respectively. In computational process, the key is to compute
∫
∞
0
ψ mn (r ) J 0 (λr )dr .
Subsequently, we compute it. According to the lipschitz quadrature formula of Bessel function, namely
∫
∞
0
∫
∞
0
e −ar J 0 (λr )dr =
re −ar J 0 (λr )dr =
a ( a 2 + λ2 )
3 2
1
a + λ2 2
, we can obtain
. Then we use it and get the theorem 3.1.
Note 2. In the theorem although including complex numbers, it is obvious that the complex is conjecture ,therefore, the resulting is a real number. Note 3. Because we know only values at those discrete points, and its interval is [0,200]. This is a bounded function by expertise knowledge, but its value is large. And wavelet diminishes at infinite together with applied goal, we think instead of thinking, namely,transforming the interval [0,200] into interval [0,2] makes the approximation more exact.
416
4
Wu Boying et al.
Numerical Experiment and Conclusion
In order to verify our method , we do experiments with three-layer soil structure. The soil’s structure is three layers, we select h1 = 5m, h2 = 15m,
ρ1 = 1000Ωm, ρ 2 = 2000Ωm, ρ 3 = 1000Ωm,
the Figure 2,3and 4 show the
numerical results, unit is 100 Ωm . In this paper we construct a function and prove it be wavelet. Then we apply it in inverse problem of soil structure and make theory analysis in details. In the last, we do experiments to verify our method, the results indicate our method is feasible and valid. The design of substation grounding system is based on the analysis of soil structure, and it’s the degree of analysis is the key issue. But wavelet’s applications in this research area are very rare. So there are many open issues to study and research, we hope our work can advance this aspect research work.
Fig. 1. The apparent resistivity in three levels
Fig. 2. The numerical solution
Fig. 3. The relative error of three levels
References 1. 2. 3.
F. Dawalibi, C. J. Blattner. Earth Resistivity Measurement interpretation techniques. IEEE T-PAS.103(1984) 374-384 F. Dawalibi, N. Barbeito. Measurement and Computations of the performance of grounding system buried in multiplayer soil. IEEE Transactions on power Delivery. 6(1991) 1483-1490 T. Takahashi, T. Kawase. Analysis of apparent resistivity in a multi-layer earth structure. IEEE. T-PWRD. 5(1990) 604-612
Computations of Inverse Problem by Using Wavelet in Multi-layer Soil
4. 5. 6. 7.
417
Jiang Gao, the inverse problem of multi-layer soil structure, dissertation of master’s degree in Peking University 2000 Li Zhongxin, simulative computation of substation grounding based on complex image method. Dissertation of doctor’s degree in Tsinghua University, 1999 1-24 I. Daubechies. Ten lectures on wavelets. SIAM, 1992 53-107 C. K. Chui. An introduction to wavelets, Academic press, 1992:86-98
Wavelets Approach in Choosing Adaptive Regularization Parameter Feng Lu, Zhaoxia Yang, and Yuesheng Li Department of Scientific Computing and Computer Applications Zhongshan University, Guangzhou 510275, P. R. China
Abstract. In noise removal by the regularization approach, the regularization parameter is global. We construct the variational model $\min_g \|f - g\|^2_{L^2(R)} + \alpha R(g)$, where g is in some wavelet space. Through the wavelet pyramidal decomposition and the different time-frequency properties of noise and signal, the regularization parameter is chosen adaptively: a different parameter is chosen at each level for adaptive noise removal. Keywords: Sobolev space, wavelet, noise, adaptive.
1 Wavelets and Discrete Equivalent Norm of Sobolev Space
The model of noisy image is: f = f0 + η
(1)
where f0 is original clean image,η is Guassian noise. Our task is to restore the original image f0 as possible. The regularization approach is always adopted to solve these problems, we consider the variational problems of the form: min f − g2L2 (R2 ) + αR(g) g
(2)
where g ∈ X; X ⊂ L2 (R2 ) X can be chosen as Sobolev space, Besov space ,Lipschitz space and so on, the sobolev space is chosen as X in this paper. α is regularization parameter that determines the trade-off between goodness the fit to the measured data, and the amount of regularization done to the measured image. In (2), the parameter is global, that the regularization parameter is the same number everywhere. In reference [4], the regularization parameter is chosen as a changeable number with the different gradient in some image. In [2] and [3], to choose the proper parameter, the Besov spaces of minimal smoothness can be embedded in L2 (R), and can get the discrete wavelets equivalent norm.
This work is supported by Natural Science Foundation of Guangdong (9902275), Foundation of Zhongshan University Advanced Research Centre.
we can easily construct the two dimensional wavelets from one dimensional wavelets ψ and scale function φ by setting for x := (x1 , x2 ) ∈ R, ψ (1) (x1 , x2 ) := ψ(x1 )φ(x2 ); ψ (2) (x1 , x2 ) := φ(x1 )ψ(x2 ); ψ (3) (x1 , x2 ) := ψ(x1 )ψ(x2 ); If we let Ψ := {ψ (1) , ψ (2) , ψ (3) }, then the set of functions ψj,k (x) := 2k ψ(2k x − j)ψ∈Ψ,k∈Z,j∈Z 2 forms an orthonormal basis for L2 (R2 ), that is, for every f ∈ L2 (R2 ), there are coefficients cj,k,ψ := R2 f (x)ψj,k (x)dx such that f= cj,k,ψ ψj,k j∈Z 2 ,k∈Z,ψ∈Ψ
f 2L2(R2 ) =
c2i,k,ψ
(3)
j∈Z 2 ,k∈Z,ψ∈Ψ
In reference [2], the discrete equivalent norm of Sobolev Space is: 22βk |cj,k,ψ |2 f 2W β (L2 (R2 )) ≈
(4)
k≥0 j∈Z 2 ψ∈Ψ
where β is the smoothness order of the Sobolev Space. It is an excellent property that a Space Norm can be expressed by the discrete sequence, especially the wavelets coefficient sequence, it makes many problems easier largely.
2 Variational Model and Its Wavelets Solution
From previous work of regularization approach, we can choose the model as follow: (5) min{f − g2L2 (R) + αg2W 2 (L2 (D)) } g
2
where α > 0,W SobolevSpace with two-order smoothness. (L2 (D) represents Let: f = j,k,ψ cj,k,ψ Ψj,k , g = j,k,ψ dj,k,ψ Ψj,k , From (4),(5) can be expanded as: (|cj,k,ψ − dj,k,ψ |2 + α · 24k |dj,k,ψ |2 )) (6) j,k,ψ
In reference [6],Donoho points out that for the spectrum analysis of a noisy real image, the spectrum corresponding with the noise is quite small, while the spectrum corresponding with the original image is quite large. (See Figure 1) It means that the ”energy” of the noisy image is always ”concentrate” on the original image. Because of the wavelets’ better property of Locality in both time and frequency domain, the wavelets can concentrate the energy, that is, in wavelets transform domain, the energy of original image concentrate on some highlight
lines, while almost zeros else where. But for the noise, it is quite different. The wavelets coefficients corresponding with noise is always small, even almost zeros, in every level in wavelets transform domain, and its distribution is quite uniform in all levels. So it is a new way to choose the regularization parameter not as a constant, but changeable with the wavelets coefficients.
Fig. 1. Left:Original Image,
Right:Wavelets Coefficients
We can construct the new variational model with changeable parameter: (|cj,k,ψ − dj,k,ψ |2 + α(cj,k,ψ ) · 24k |dj,k,ψ |2 )) (7) j,k,ψ
where α(t) > 0,t ∈ {cj,k,ψ } is the wavelets coefficient, W 2 (L2 (D) represents SobolevSpace with two-order smoothness. Here, the regularization is not a constant, but a changeable variable with wavelets coefficients. From this model, we can handle different level with wavelets decomposition with different regularization, when the wavelets coefficient is large, choosing the regularization parameter small for containing more original image , when the wavelets coefficient is small, choosing the parameter large for removing the noise much. So, we can get the regularization image adaptively which containing the information of original image more and removing the noise as well. Hence, two conditions must be satisfied for choosing regularization parameter: (1) lim α(t) = 0 t→∞
(2) lim α(t) = 1 t→0
In practice, because the wavelets coefficients corresponding with the noise is quite small, we choose function α(t) with decaying rapidly. For example: α(t) := e−t , α(t) := 2
Fig. 2. Left:α(t) := e−t
2
1 (1 + t2 )
Right:α(t) :=
1 (1+t2 )
In reference [7], the formula of window size of decaying function is: ∞ 1 { x2 |α(x)|2 dx}
α := w2 −∞ Let α(t; m, s) := mα( st ), to meet the practices, we can change the Support Set and Amplitude through choosing the proper m, s. For every j, k, ψ, each term of (7) |cj,k,ψ − dj,k,ψ |2 + α(cj,k,ψ ) · 24k |dj,k,ψ |2 ≥ 0
(8)
Hence, one minimizes (7) just by minimizing separately over dj,k,ψ : |cj,k,ψ − dj,k,ψ |2 + α(cj,k,ψ ) · 24k |dj,k,ψ |2 for each j, k and ψ. Let:s := cj,k,ψ , v := dj,k,ψ ,and supposing v ≤ s, (8) can be reduced to: F (v) := |s − v|2 + α(s) · 24k v 2
(9)
Calculating the derivation of F (v) for v, we can get the minimizer of (9): v=
s 1 + α(s) · 24k
(10)
After calculating (10) for all the wavelets coefficients of all levels, we can get the new wavelets coefficients from regularization processing. Hence, we can get the restored image by wavelets reconstruction.
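The per-coefficient rule of Eq. (10) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' image pipeline: it uses a manual 1-D Haar transform instead of the 2-D wavelet basis, takes the level index k to start at 1 for the finest scale, and uses alpha(t; m, s) = m*exp(-(t/s)^2) with the m = 0.8, s = 10 values quoted in the experiments.

```python
import numpy as np

def haar_analysis(c):
    """One Haar step: returns (approximation, detail)."""
    c = c[: len(c) // 2 * 2]
    return (c[0::2] + c[1::2]) / np.sqrt(2.0), (c[0::2] - c[1::2]) / np.sqrt(2.0)

def haar_synthesis(a, d):
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2.0)
    out[1::2] = (a - d) / np.sqrt(2.0)
    return out

def alpha(t, m=0.8, s0=10.0):
    """Rapidly decaying regularization function alpha(t; m, s) = m*exp(-(t/s)^2)."""
    return m * np.exp(-(t / s0) ** 2)

def denoise(f, levels=3):
    """Apply Eq. (10): each detail coefficient s -> s / (1 + alpha(s) * 2**(4k))."""
    approx, details = f.astype(float), []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    for k, d in enumerate(details, start=1):        # level indexing is a bookkeeping assumption
        details[k - 1] = d / (1.0 + alpha(d) * 2.0 ** (4 * k))
    for d in reversed(details):
        approx = haar_synthesis(approx, d)
    return approx

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 6 * np.pi, 512))
noisy = clean + 0.3 * rng.standard_normal(512)
print(float(np.mean((denoise(noisy) - clean) ** 2)))  # should fall well below the noise variance 0.09
```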
3 Experiments
An image Bird.bmp is adopted in the experiment, we choose the Haar wavelets 2 and α(t; m, s) := mα( st ) = me−(t/s) .
Fig. 3. Left:Original Image of Bird.bmp, white noise, variance δ 2 = 18
Right:Nosiy image with Gaussian
Fig. 4. Left:Restored image with removing two first level of wavelets coefficients, Right:Restored image with adaptive approach, where m = 0.8; s = 10
4 Conclusion
Using the approach of an adaptive, changeable regularization parameter in image restoration makes the choice of the model more flexible. We can choose spaces with a higher smoothness order, which have a powerful ability in noise removal, and at the same time choose a changeable regularization function so as to retain more details while removing more noise.
References
1. Tikhonov, A. N., Arsenin, V. Y.: Solution of Ill-Posed Problems. V. H. Winston & Sons Press (1997)
2. Devore, R. A.: Fast wavelet techniques for near-optimal image processing. IEEE Military Communications Conference Record (1992) 1129-1135
3. Devore, R. A.: Image compression through wavelet transform coding. IEEE Transactions on Information Theory, vol. 38 (1992) 719-746
4. Adaptive regularized constrained least squares image restoration. IEEE Trans. on Image Processing (1999) 1191-1203
5. Daubechies, I.: Ten Lectures on Wavelets. CBMS-NSF Series in Applied Math #61, SIAM, Philadelphia (1992)
6. Donoho, D. L.: De-noising by soft-thresholding. IEEE Trans. on Information Theory, 41(3) (1993)
7. Chui, C. K.: An Introduction to Wavelets. Xi'an Jiaotong Univ. Press (1994) (in Chinese)
DNA Sequences Classification Based on Wavelet Packet Analysis* Jing Zhao1, Xiu Wen Yang1, Jian Ping Li1, and Yuan Yan Tang2 1
International Centre for Wavelet Analysis and Its Applications, Logistical Engineering University, Chongqing 400016, P. R. China [email protected] 2 Department of Computer Science, Hong Kong Baptist University, Hong Kong [email protected]
Abstract. The classification of two types of DNA sequences is studied in this paper. 20 sample artificial DNA sequences whose types have been known are given to recognize the types of other DNA sequences. Wavelet packet analysis is used to extract the features of the sample DNA sequences.
1 Introduction
Each DNA sequence is a permutation of 4 codes: a, t, c and g. Studying the structural characteristics of DNA sequences is one of the most important problems in bioinformatics. In this paper, the classification of two types of DNA sequences, exons and introns, is studied by means of wavelet packet analysis. We have 20 artificial DNA sequence samples whose types are known, in which No. 1-10 are exons (type A) and No. 11-20 are introns (type B). The lengths of these 20 samples are all about 110. Wavelet packet analysis is used to extract the features of the sample DNA sequences and to recognize the types of other DNA sequences.
2 Changing DNA Sequence to Number Sequence
In order to study a DNA sequence with the wavelet packet decomposition, we make every code of a DNA sequence correspond to one number as follows:
$$y_i = \begin{cases} 0.25, & x_i = \text{'a'} \\ 0.5, & x_i = \text{'g'} \\ 0.75, & x_i = \text{'c'} \\ 1, & x_i = \text{'t'} \end{cases} \qquad (1)$$
where $x_i$ is the i-th code of the DNA sequence. In this way, a DNA sequence x is changed to a number sequence y, and the number sequence y can be seen as a one-dimensional signal.

* This work was supported by the National Natural Science Foundation of China under grant numbers 69903012 and 69682011.
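A short sketch of the mapping in Eq. (1) together with the tenfold refinement described in Sect. 3 (illustrative only, not the authors' code): the codes a, g, c, t are mapped to 0.25, 0.5, 0.75, 1 and the resulting sequence is linearly interpolated at steps of 0.1.

```python
import numpy as np

CODE_VALUE = {'a': 0.25, 'g': 0.5, 'c': 0.75, 't': 1.0}   # Eq. (1)

def dna_to_signal(seq, upsample=10):
    """Map a DNA string to the number sequence y and refine it by linear interpolation."""
    y = np.array([CODE_VALUE[ch] for ch in seq.lower()])
    i = np.arange(len(y))
    fine = np.arange(0, len(y) - 1 + 1e-9, 1.0 / upsample)   # sample positions 0, 0.1, 0.2, ...
    return np.interp(fine, i, y)

x = "aggcacggaaaaacgggaataacgg"     # a fragment of the example sequence given in Sect. 4
y = dna_to_signal(x)
print(len(x), len(y))                # 25 codes -> roughly 10 times as many samples
```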
3 Performing Wavelet Packet Decomposition
The wavelet packet analysis is a generalization of wavelet decomposition that offers a richer range of possibilities for signal analysis. In wavelet packet analysis, the details as well as the approximations can be split. It is easy to generate wavelet packets by using an orthogonal wavelet. We start with the two filters of length 2N, denoted h(n) and g(n), corresponding to the wavelet. They are respectively the reversed versions of the low-pass decomposition filter and the high-pass decomposition filter divided by $\sqrt{2}$. Now we define the sequence of wavelet packets $W_n(x)$ ($n = 0, 1, 2, \ldots$) by
$$W_{2n}(x) = \sqrt{2} \sum_{k=0}^{2N-1} h(k)\, W_n(2x-k),$$
$$W_{2n+1}(x) = \sqrt{2} \sum_{k=0}^{2N-1} g(k)\, W_n(2x-k), \qquad (2)$$
where $W_0(x) = \phi(x)$ is the scaling function and $W_1(x) = \psi(x)$ is the wavelet
function. Here, for the corresponding number sequence y of each sample DNA sequence x, we compute its wavelet packet decomposition with the Daubechies-3 wavelet at level 3. Because the sampling number 110 of the sequence y is small, we increase its sampling number to 10 times the original by computing linear interpolation at every 0.1 before performing the wavelet packet decomposition.
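Assuming the PyWavelets package as a stand-in for whatever toolbox the authors used, the level-3 wavelet packet decomposition of the interpolated sequence can be sketched as follows; the node paths 'aaa' through 'ddd' correspond to the AAA3 through DDD3 bands shown in Fig. 1.

```python
import numpy as np
import pywt   # PyWavelets, an assumed substitute for the authors' toolbox

def packet_leaves(y, wavelet="db3", level=3):
    """Return the eight level-3 wavelet packet coefficient arrays, keyed by node path."""
    wp = pywt.WaveletPacket(data=y, wavelet=wavelet, mode="symmetric", maxlevel=level)
    return {node.path: node.data for node in wp.get_level(level, order="natural")}

y = np.sin(np.linspace(0, 20, 1100))    # stands in for an interpolated DNA number sequence
leaves = packet_leaves(y)
print(sorted(leaves.keys()))            # ['aaa', 'aad', 'ada', 'add', 'daa', 'dad', 'dda', 'ddd']
```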
4 Reconstructing Wavelet Packet Coefficients
Now we compute the reconstructed signals of the wavelet packet coefficients obtained by performing the wavelet packet decomposition.
Fig. 1. The number sequence y and its reconstruct signals of the wavelet packet coefficients
For example, the corresponding number sequence y of the DNA sequence x = 'aggcacggaaaaacgggaataacggaggaggacttggcacggcattacacggaggacgaggtaaaggaggcttgtctacggccggaagtgaagggggatatgaccgcttgg' and the reconstructed signals of its wavelet packet coefficients are shown in Figure 1, where y30, y31, y32, y33, y34, y35, y36, y37 respectively represent the reconstructed signals of AAA3, DAA3, ADA3, DDA3, AAD3, DAD3, ADD3, DDD3.
5 Computing the Total Energy of Every Reconstructed Signal
The corresponding total energy of the signal $y_{3j}$ ($j = 0, 1, 2, \ldots, 7$) is as follows:
$$E_{3j} = \int \left| y_{3j}(t) \right|^2 dt = \sum_{k=1}^{n} \left| y_{jk} \right|^2, \qquad j = 0, 1, 2, \ldots, 7, \qquad (3)$$
where $y_{jk}$ ($j = 0, 1, 2, \ldots, 7$; $k = 1, 2, \ldots, n$) represent the values of the reconstructed signal $y_{3j}$ at discrete points. The total energies of the corresponding number sequences of the 20 sample DNA sequences are shown in Table 1.
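Continuing the earlier sketch, the band energies E30-E37 of Eq. (3) can be approximated directly from the packet coefficients; for an orthogonal wavelet the energy of a reconstructed band is essentially the energy of its coefficients. Taking this shortcut instead of the paper's reconstruct-then-sum procedure is an assumption of the sketch.

```python
import numpy as np
import pywt

def band_energies(y, wavelet="db3", level=3):
    """E_{3j} of Eq. (3), approximated as the sum of squared level-3 packet coefficients."""
    wp = pywt.WaveletPacket(data=y, wavelet=wavelet, mode="symmetric", maxlevel=level)
    return {node.path: float(np.sum(node.data ** 2))
            for node in wp.get_level(level, order="natural")}

y = np.sin(np.linspace(0, 20, 1100))    # placeholder for an interpolated DNA number sequence
E = band_energies(y)
print(max(E, key=E.get))                 # 'aaa': the low-frequency band dominates, as E30 does in Table 1
```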
Table 1. The total energies of the corresponding number sequences of the 20 sample DNA sequences
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
E30 19.0235 0.1638 19.4844 0.2220 18.0249 0.1842 20.6853 0.1878 19.4273 0.2244 18.2110 0.1364 19.1787 0.1594 20.1857 0.1504 20.3459 0.1328 20.4854 0.2717 24.1353 0.2135 24.3336 0.1871 25.4134 0.1970 24.7997 0.2023 26.7044 0.1987 23.7204 0.2103 21.5479 0.2070 24.9260 0.2133 26.5463 0.2083 26.8730 0.1912
E31 1.3277
E32 0.2574
E33 E34 E35 E36 0.0913 0.1065 0.3539
E37 0.1683
1.2462
0.3148
0.4194
0.1206
0.0982
0.1786
1.3403
0.2709
0.3714
0.1009
0.0982
0.1640
1.6786
0.2926
0.4110
0.1032
0.1220
0.1878
1.1339
0.3232
0.4199
0.1219
0.1056
0.1890
1.1456
0.2268
0.3093
0.0733
0.0923
0.1364
1.3396
0.2692
0.3605
0.0903
0.1102
0.1717
1.2496
0.2498
0.3361
0.0861
0.1046
0.1641
0.9324
0.2327
0.3030
0.0715
0.0968
0.1461
1.0851
0.3876
0.5042
0.1482
0.1314
0.2016
1.7968
0.3697
0.5483
0.1481
0.2076
0.2174
1.4035
0.2676
0.4055
0.1032
0.1164
0.1726
1.4337
0.3379
0.4696
0.1231
0.1771
0.1975
1.5487
0.3036
0.4183
0.1082
0.1234
0.1850
1.4602
0.2979
0.4144
0.1058
0.1242
0.1789
1.7926
0.3871
0.5460
0.1506
0.2172
0.2256
1.6151
0.3338
0.4828
0.1291
0.1733
0.1990
1.8311
0.3747
0.5506
0.1501
0.2079
0.2200
1.6933
0.3057
0.4361
0.1165
0.1267
0.1939
1.5491
0.3545
0.5023
0.1419
0.2078
0.2034
6 Extracting Features of the Sample DNA Sequences
The sample DNA sequences 1-10 belong to type A and 11-20 belong to type B. From Table 1 we can see the following features of the sample DNA sequences:
1. Energy E30 contains the main energy of the corresponding number signal of the DNA sequence.
2. Energy E30 of type A and type B shows an outstanding difference.
For type A, the mean of E30 is 19.5052 and the maximum number is 20.6853. For type B, the mean of E30 is 24.9000 and the minimum number is 21.5479. So E30 of type A is obviously smaller than that of type B. Let AEmax represent the maximum number of E30 of the sample DNA sequence of type A, BEmin represent the minimum number of E30 of the sample DNA sequence of type B, YE30 represent E30 of the corresponding number signal of a DNA sequence X whose type is unknown. From above discussion, we get the classification regulation: X belongs to type A, if YE30 ≤ AEmax; X belongs to type B, if YE30 ≥ BEmin; X belongs to type A, if AEmax ≤ YE30 ≤ BEmin, and YE30-AEmax ≤ BEmin-YE30; X belongs to type B, if AEmax ≤ YE30 ≤ BEmin, and YE30-AEmax ≥ BEmin-YE30.
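The rule can be written directly as a small function. This is only a restatement of the rule above in Python; the names are illustrative.

```python
def classify(ye30: float, ae_max: float, be_min: float) -> str:
    """Classify a DNA sequence by the E30 energy rule above."""
    if ye30 <= ae_max:
        return 'A'
    if ye30 >= be_min:
        return 'B'
    # AEmax < YE30 < BEmin: assign to the nearer side.
    return 'A' if ye30 - ae_max <= be_min - ye30 else 'B'
```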
7  Experiments
Here we have another 20 artificial DNA sequences and 182 natural DNA sequences whose types are known. We now try to recognize the types of these DNA sequences using the 20 given sample DNA sequences and the classification rule.

7.1 Classification of 20 Artificial DNA Sequences

The lengths of the 20 given artificial DNA sequences, whose serial numbers run from 21 to 40, are about 110, almost the same as those of the 20 sample DNA sequences. So, just as for the 20 sample DNA sequences, for the corresponding number signal of each DNA sequence we increase the number of samples tenfold by linear interpolation at steps of 0.1 before performing the wavelet packet decomposition. Since AEmax = 20.6853 and BEmin = 21.5479, we can recognize the types of the 20 artificial DNA sequences. Among the 20 DNA sequences, only one is recognized as the wrong type. So the success rate of the classification rule for artificial DNA sequences whose lengths are about the same as those of the sample DNA sequences is 95%.
7.2 Classification of 182 Natural DNA Sequences

The lengths of the 182 given natural DNA sequences range from 1061 to 21246, so these natural DNA sequences are much longer than the sample DNA sequences. In order to compare their energies, the sequences must have about the same lengths. Therefore, to recognize the type of each natural DNA sequence, we first apply linear interpolation to the corresponding number signal of every sample DNA sequence so that its length becomes the same as that of the natural DNA sequence. Secondly, we perform the wavelet packet decomposition of this natural DNA sequence and of all the sample DNA sequences to obtain AEmax, BEmin and E30. Finally, we recognize the type of the natural DNA sequence by the classification rule. Of the 182 given natural DNA sequences, 47 are recognized as the wrong type and 135 as the right type; the success rate is 74%. To find out why the success rate for natural DNA sequences is lower than that for artificial DNA sequences, we analyzed the results of the 182 natural DNA sequences. We observe that when the length of a DNA sequence exceeds 8000, the success rate decreases markedly. This may be explained by the fact that when a DNA sequence is much longer than the sample DNA sequences, the information in the sample DNA sequences is no longer sufficient to recognize the given DNA sequence.
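A sketch of this length-matching step: each sample number signal is linearly resampled to the length of the natural sequence before its E30 is recomputed. Function and variable names are illustrative, and the sketch reuses the classify and packet_energies helpers sketched earlier.

```python
import numpy as np

def resample_to(y, target_len):
    """Linearly interpolate the number signal y onto target_len points."""
    t_old = np.linspace(0.0, 1.0, num=len(y))
    t_new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(t_new, t_old, y)

def classify_natural(natural_y, sample_ys, lowpass_path='aaa'):
    """sample_ys: the 20 sample number signals, indices 0-9 of type A,
    10-19 of type B (as in Table 1)."""
    n = len(natural_y)
    e30_samples = [packet_energies(resample_to(y, n))[lowpass_path]
                   for y in sample_ys]
    ae_max = max(e30_samples[:10])
    be_min = min(e30_samples[10:])
    ye30 = packet_energies(natural_y)[lowpass_path]
    return classify(ye30, ae_max, be_min)
```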
References
1. Yuan Yan Tang, Lihua Yang, Jiming Liu, Hong Ma: Wavelet Theory and Its Application to Pattern Recognition, World Scientific Publishing Co. Pte. Ltd., Singapore (2000)
2. Dazhi Meng: Construction and Simplified Model of DNA Sequences, Mathematics in Practice and Theory, 1 (2001) 54-58
The Application of the Wavelet Transform to the Prediction of Gas Zones*

Xiu Wen Yang1, Jing Zhao1, Jian Ping Li1, Jing Liu2, and Shun Peng Zeng2

1 International Centre for Wavelet Analysis and Its Applications, Logistical Engineering University, Chongqing 400016, P.R. China
[email protected]
2 Chongqing Petroleum College, Chongqing 400042, P.R. China

Abstract. A method for accurately evaluating the number and positions of gas zones is put forward in this paper. It provides a reliable basis for developing natural gas by jointly analyzing the results of applying wavelet de-noising and wavelet packet de-noising to the density porosity curve and the neutron porosity curve. If there is natural gas in the pores of an underground reservoir, it increases the density well-logging porosity φD and decreases the compensated neutron porosity φCNL. By overlaying these two porosity curves we can directly identify a zone satisfying φCNL < φD as a gas-bearing zone. However, since noise is contained in most well logging traces, small saw teeth caused by various factors appear in the curves. Although this phenomenon is independent of the character of the zone, the interpretation of either a single curve or the two overlaid curves may be hindered, which makes the evaluation inaccurate. It is therefore important to suppress the noise in the well logging traces. Although low-frequency filtering of well logging traces has existed for a long time, it reduces the resolution of the curves, so gas zones cannot be interpreted effectively. Wavelet analysis, which has found wide application in signal analysis and image processing, is a branch of mathematics newly developed in recent years that has achieved notable success. In this paper, one-dimensional wavelet de-noising and wavelet packet de-noising are used to remove the noise from φCNL and φD for the purpose of predicting the parameters of gas zones.
* This work was supported by the National Natural Science Foundation of China under grant numbers 69903012 and 69682011.
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 430-434, 2001. Springer-Verlag Berlin Heidelberg 2001
1  The Principle of Wavelet Transform
The de-noising of a one-dimensional signal is one of the important applications of wavelet and wavelet packet analysis. The basic principle is as follows. A basic model of a signal s_i containing noise is

s_i = f_i + \sigma z_i, \qquad i = 0, 1, \ldots, n - 1,

where f_i is the true signal, z_i is the noise, usually modeled as Gaussian white noise N(0, 1), and σ is the noise level. The purpose of de-noising is to suppress the noise part and recover the true signal f_i.

The steps of wavelet de-noising are:
1. Decompose the one-dimensional signal. Choose the Sym8 wavelet, set the number of decomposition levels to N = 5, and decompose the signal to level N.
2. Quantitatively determine the threshold of the high-frequency (detail) coefficients. We adopt the minimax principle to choose the threshold, which minimizes the maximum mean square error, and apply soft thresholding to the detail coefficients of levels one to five.
3. Reconstruct the one-dimensional signal. Using the approximation coefficients of level five and the modified detail coefficients of levels one to five, compute the wavelet reconstruction of the signal.

The idea of de-noising with the wavelet packet is nearly the same as that of wavelet de-noising. The only difference, which makes the wavelet packet analysis more complicated but also more flexible, is that the wavelet packet subdivides and thresholds both the low-frequency and the high-frequency parts simultaneously. The steps of wavelet packet de-noising are:
1. Decompose the signal with the one-dimensional wavelet packet. Choose the Sym8 wavelet, set the decomposition level to N = 5, and compute the level-N wavelet packet decomposition of the signal.
2. Compute the optimal tree (that is, determine the optimal wavelet packet basis). The optimal tree is computed with the minimum-entropy criterion.
3. Quantitatively determine the threshold of the wavelet packet decomposition coefficients. We adopt the minimax principle to choose the threshold and quantitatively threshold each wavelet packet decomposition coefficient, especially the low-frequency decomposition coefficients.
4. Reconstruct the wavelet packet.
Using the level-five wavelet packet decomposition coefficients together with the quantitatively thresholded coefficients, the signal is reconstructed from the wavelet packet.
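A minimal sketch of the wavelet de-noising steps in Python with PyWavelets, assuming a minimax-style threshold of the form used in MATLAB's thselect and a MAD estimate of the noise level; these formulas and the helper name are assumptions, not taken from the paper. (PyWavelets has no built-in best-basis selection, so the wavelet packet variant is not shown here.)

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet='sym8', level=5):
    """De-noise a 1-D well-logging curve: Sym8, 5 levels, soft thresholding
    of the detail coefficients with a minimax-style threshold."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients (MAD estimator).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    n = len(signal)
    # Minimax threshold (Donoho-Johnstone approximation, as in MATLAB's thselect).
    thr = sigma * (0.3936 + 0.1829 * np.log2(n)) if n > 32 else 0.0
    denoised = [coeffs[0]]  # keep the level-5 approximation coefficients
    denoised += [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)
```

The wavelet packet variant additionally selects a best basis by the minimum-entropy criterion and thresholds the low-frequency packet coefficients as well, as described in the steps above.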
2  An Example

We take a well logging trace of a carbonate section in an oil field as the example.
1. Collect the well logging trace materials, such as the microlog, compensated neutron log, density log, spontaneous potential log, gamma-ray reading, well diameter, etc.
2. Calculate the density porosity from the density curve material as

\varphi_D = \frac{\rho_{ma} - \rho_b}{\rho_{ma} - \rho_f},

where ρ_b is the bulk density reading of the formation, ρ_ma is the density of the matrix and ρ_f is the density of the fluid in the pores (a small worked sketch follows this list).
3. Plot the overlay of the two original curves, the density porosity φD and the neutron porosity φCNL (Fig. 1). From the curve overlay and the permeable interval in the corresponding section, because of the noise in the curves we can only find one gas zone, called zone A, at the position from 2678 to 2690 meters, and the exact positions of the top and bottom interfaces of this gas zone cannot be determined. Moreover, two intervals, one from 2635 to 2640 meters and the other from 2660 to 2675 meters, cannot easily be judged to be gas zones or not. Note: in Fig. 1, Fig. 2 and Fig. 3, the blue curve indicates the density porosity curve and the red curve indicates the neutron porosity curve.
4. Starting from Fig. 1 (the original curves), after applying wavelet de-noising to the density porosity curve and the neutron porosity curve (Fig. 2, the wavelet de-noised curves), we can directly find two zones. The position of the top interface of zone A is still unclear; the other zone, called B, lies from 2659 to 2672 meters, but whether the interval from about 2635 to 2640 meters is a gas zone is still hard to determine.
5. Starting again from Fig. 1, after applying wavelet packet de-noising to the density porosity curve and the neutron porosity curve (Fig. 3, the wavelet packet de-noised curves), we can clearly find three gas zones: A from 2677 to 2691 meters, B from 2659 to 2672 meters, and C from 2636 to 2641 meters. The results for zone B from wavelet de-noising and from wavelet packet de-noising agree very well. If the interfaces of one or two of the three zones differ slightly between the two methods, we can use their average to make the result more accurate and reasonable.
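The worked sketch referred to in item 2: the numerical values below (a limestone matrix density of 2.71 g/cm3, a fluid density of 1.0 g/cm3 and an illustrative bulk-density reading) are assumptions for illustration only, not readings from the example well.

```python
def density_porosity(rho_b, rho_ma=2.71, rho_f=1.0):
    """Density porosity phi_D = (rho_ma - rho_b) / (rho_ma - rho_f)."""
    return (rho_ma - rho_b) / (rho_ma - rho_f)

# Illustrative reading: rho_b = 2.45 g/cm3 gives phi_D ~ 0.152.
phi_d = density_porosity(2.45)
phi_cnl = 0.12                       # assumed neutron porosity at the same depth
is_gas_candidate = phi_cnl < phi_d   # the overlay criterion of item 3
```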
3  Conclusions
The influence of noise can cause inaccuracy in predicting the top and bottom interfaces when the original well logging trace (Fig. 1), which contains many "small saw teeth" and other noise independent of the character of the zone, is used directly; some zones may even be missed. Wavelet de-noising keeps more of the main formation information in the original curve, but it cannot predict the gas zones accurately. Only by analyzing the results of wavelet packet de-noising together with those of wavelet de-noising can we distinctly identify the positions of the three zones and obtain a result that agrees with practice.
2690
2680
2670
2660
2650
2640
2630
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
Fig. 1. original curve
2700 2700
2690 2690
2680 2680
2670
2670
2660
2660
2650
2650
2640
2630
2640
0
0.05
0.1
0.15
0.2
Fig. 2. wavelet denoising curve
0.25
2630
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
Fig. 3. wavelet package denoising
References
1. Jian Ping Li: Wavelet Analysis and Signal Processing: Theory, Applications and Software Implementations, Chongqing Publishing House, Chongqing (2001)
2. Hu Canghua: Base on Matlab Systematic Analysis & Design - Wavelet Analysis, Xian University of Electronic Science and Technology Publishing House, Xian, China (2001)
3. Ding Ciqian: Geophysical Well Logging, Oil University Publishing House, Beijing, China (1989)
PARAMETERIZATIONS OF M-BAND BIORTHOGONAL WAVELETS

Zeyin Zhang and Daren Huang
Abstract. In this paper we consider the structure of compactly supported wavelets. We prove that any wavelet matrix (the polyphase matrix of the scaling filter and the wavelet filters) can be factored as the product of fundamental biorthogonal matrices and a constant-valued matrix.
1. Introduction

Fix an integer $m \ge 2$. A compactly supported function $\varphi \in L^2(\mathbb{R})$ is an m-band scaling function if there exists a finite-length sequence $\{h^0_k\}$ such that
$$\varphi(x) = \sum_k h^0_k \varphi(mx - k);$$
the z-transform
$$\sum_k h^0_k z^{-k}$$
is a Laurent polynomial, which is called the scaling filter of $\varphi$. Let $\tilde\varphi(x) \in L^2(\mathbb{R})$ be another compactly supported scaling function with Laurent polynomial scaling filter
$$\sum_k g^0_k z^{-k}.$$
The pair $\varphi$ and $\tilde\varphi$ is said to be a biorthogonal pair if
$$\int_{\mathbb{R}} \varphi(x)\,\tilde\varphi(x-k)\,dx = \delta_{0,k}$$
for $k \in \mathbb{Z}$, where $\delta_{0,0} = 1$ and $\delta_{0,k} = 0$ if $k \in \mathbb{Z}\setminus\{0\}$. Corresponding to the biorthogonal scaling functions, there exist compactly supported wavelets
$$\psi_i(x) = \sum_k h^i_k \varphi(mx - k)$$
1991 Mathematics Subject Classification. Primary 42C15, 46A35, 46E15. Key words and phrases. Wavelet, polyphase matrix, Parameterizations, Filter bank.
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 435-447, 2001. © Springer-Verlag Berlin Heidelberg 2001
with finite-length coefficient sequences $\{h^i_k\}$ for $1 \le i \le m-1$, and wavelets
$$\tilde\psi_i(x) = \sum_k g^i_k \tilde\varphi(mx - k)$$
with finite-length sequences $\{g^i_k\}$ for $1 \le i \le m-1$, such that the family $\{m^{j/2}\psi_i(m^j\cdot -k),\ j,k \in \mathbb{Z},\ 1 \le i \le m-1\}$ and the family $\{m^{j/2}\tilde\psi_i(m^j\cdot -k),\ j,k \in \mathbb{Z},\ 1 \le i \le m-1\}$ are biorthogonal bases of $L^2(\mathbb{R})$.

Now we introduce the polyphase Laurent polynomials
$$(1.1)\qquad H_{i,j}(z) = \sum_k h^j_{mk+i} z^{-k},$$
$$(1.2)\qquad G_{i,j}(z) = \sum_k g^j_{mk+i} z^{-k},$$
for $0 \le i, j \le m-1$. Let
$$(1.3)\qquad H(z) = (H_{i,j})_{0\le i,j\le m-1}, \qquad G(z) = (G_{i,j})_{0\le i,j\le m-1}.$$
By the biorthogonality we get
$$(1.4)\qquad G^*(z^{-1})H(z) = mI_m,$$
and the first column vectors of $H(1)$ and $G(1)$ are $(1, 1, \ldots, 1)^*$. Here and hereafter $G^*(z^{-1}) = G(\bar z^{-1})^*$; for a matrix or vector $A$, $A^*$ denotes the Hermitian transpose of $A$, and $I_m$ is the $m\times m$ identity matrix.

The theoretical work on orthogonal wavelets was done in the late eighties [1, 2, 4-6, 11, 15] and the framework of biorthogonal wavelets was established in the early nineties [3, 8, 10]. The invention of the polyphase decomposition is one of the reasons why multirate filter bank processing became practically attractive. It is valuable not only in the practical design and actual implementation of filter banks, but also in theoretical study [14]. Using the polyphase decomposition, P. P. Vaidyanathan and his colleagues [9, 13] derived factorizations of paraunitary matrices and applied such factorizations to the design of quadrature mirror filter (QMF) banks for digital signal processing problems. P. N. Heller, H. L. Resnikoff, and R. O. Wells, Jr. [7, 12] used the polyphase decomposition to develop a parametrization theory of compactly supported orthonormal wavelets. The purpose of this paper is to factorize a pair of matrices $H(z)$ and $G(z)$ satisfying (1.4) into simple building blocks. The building blocks used in this paper are of the form $I_m - P + Pz^{\pm 1}$, where $P$ is a rank-one idempotent matrix.
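As a quick illustration of (1.1)-(1.4), the sketch below builds the 2-band (m = 2) polyphase matrix of the Haar pair, for which H(z) = G(z) is constant, and checks the identity G^*(z^{-1})H(z) = mI_m numerically. The normalization (scaling filter summing to m) follows the refinement equation as written above; this is only an illustrative check, not code from the paper.

```python
import numpy as np

m = 2
# Haar scaling filter h^0 and wavelet filter h^1, normalized so sum(h^0) = m.
h0 = np.array([1.0, 1.0])
h1 = np.array([1.0, -1.0])

# Polyphase components (1.1): H[i, j] = sum_k h^j_{mk+i} z^{-k}.
# For two-tap filters there is only the k = 0 term, so H(z) is constant.
H = np.array([[h0[0], h1[0]],
              [h0[1], h1[1]]])

# Orthogonal case: G(z) = H(z), so (1.4) reads H^T H = m I_m.
assert np.allclose(H.T @ H, m * np.eye(m))
# First column of H(1) is (1, 1)^T, as required.
assert np.allclose(H[:, 0], np.ones(m))
```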
This paper is organized as follows. In Section 2 we give some definitions and lemmas for later use. We then discuss parameterizations of dual pairs of Laurent polynomial vectors (Section 3) and derive a parametric decomposition of biorthogonal wavelet matrices (Section 4). At last, some final remarks are given (Section 5).

2. Some lemmas

For convenience, in the following we give some definitions.

Definition 1. The pair $(H(z), G(z))$ of matrices, consisting of the polyphase Laurent polynomials (1.1) and (1.2) of the scaling filter and wavelet filters as defined in (1.3), is said to be a biorthogonal wavelet matrix pair; $H(z)$ and $G(z)$ are said to be biorthogonal wavelet matrices.

Now we consider a pair of Laurent polynomial vectors $\alpha(z)$ and $\beta(z)$ with vector-valued coefficients,
$$\alpha(z) = \alpha_s z^{-s} + \alpha_{s+1} z^{-s-1} + \cdots + \alpha_k z^{-k},$$
$$\beta(z) = \beta_p z^{-p} + \beta_{p+1} z^{-p-1} + \cdots + \beta_q z^{-q},$$
with $\alpha_i, \beta_j \in \mathbb{R}^m$ for $s \le i \le k$, $p \le j \le q$. Let $V_\alpha$ be the subspace of $\mathbb{R}^m$ spanned by $\{\alpha_i,\ s \le i \le k\}$ and $V_\beta$ the subspace of $\mathbb{R}^m$ spanned by $\{\beta_j,\ p \le j \le q\}$.

Definition 2. We say the pair $(\alpha(z), \beta(z))$ of Laurent polynomial vectors is a dual pair if $\alpha^*(z^{-1})\beta(z) = m$, where $\alpha^*(z) = \alpha(\bar z)^*$.

Now if we rewrite (1.3) as
$$(2.1)\qquad H(z) = (\alpha_0(z), \alpha_1(z), \ldots, \alpha_{m-1}(z))$$
and
$$(2.2)\qquad G(z) = (\beta_0(z), \beta_1(z), \ldots, \beta_{m-1}(z)),$$
then
$$(2.3)\qquad \alpha_i^*(z^{-1})\beta_j(z) = m\delta_{i,j}, \quad 0 \le i, j \le m-1,$$
by (1.4), so $(\alpha_i(z), \beta_i(z))$, $0 \le i \le m-1$, are m dual pairs of Laurent polynomial vectors.

Lemma 1. Let $U = V_\alpha \cap V_\beta^\perp$ and $W = V_\beta \cap V_\alpha^\perp$. If $(\alpha(z), \beta(z))$ is a dual pair of Laurent polynomial vectors, then the difference spaces $V_\alpha \ominus U$ and $V_\beta \ominus W$ are adjoint.

Here and hereafter we say two subspaces $V_1, V_2 \subseteq \mathbb{R}^m$ are adjoint if there exist a basis $\{\alpha_i\}_1^k$ of $V_1$ and a basis $\{\beta_i\}_1^k$ of $V_2$ such that
$$\alpha_i^*\beta_j = \delta_{i,j}, \quad 1 \le i, j \le k.$$
Proof. Let $k$ be the rank of the matrix $(\alpha_i^*\beta_j)_{s\le i\le k,\,p\le j\le q}$. Then there exists an invertible $k\times k$ block $(\alpha_i'^*\beta_j')_{1\le i,j\le k}$ of the matrix $(\alpha_i^*\beta_j)$ such that
$$(2.4)\qquad \alpha_i^*\beta_j = \sum_{m=1}^{k} a_{m,j}\,\alpha_i^*\beta_m', \qquad p \le j \le q,$$
and
$$(2.5)\qquad \alpha_i^*\beta_j = \sum_{m=1}^{k} b_{i,m}\,\alpha_m'^*\beta_j, \qquad s \le i \le k.$$
By (2.4) and (2.5) it is easy to verify that $\{\alpha_i - \sum_{l=1}^{k} b_{i,l}\alpha_l',\ s \le i \le k\} \subset U$ and $\{\beta_j - \sum_l a_{l,j}\beta_l',\ p \le j \le q\} \subset W$; therefore $V_\alpha \ominus U$ is the subspace spanned by $\{\alpha_i'\}_{i=1}^k$ and $V_\beta \ominus W$ is the subspace spanned by $\{\beta_i'\}_{i=1}^k$. To prove that $V_\alpha \ominus U$ and $V_\beta \ominus W$ are adjoint, we need to prove that there are two bases of the two subspaces which are biorthogonal. In fact, let $\{\bar\beta_j'\}_1^k$ be defined by
$$(\bar\beta_1', \bar\beta_2', \cdots, \bar\beta_k') = (\beta_1', \beta_2', \cdots, \beta_k')A^{-1},$$
where $A = (\alpha_i'^*\beta_j')_{1\le i,j\le k}$. Then $\{\alpha_i'\}_{i=1}^k$ and $\{\bar\beta_j'\}_1^k$ are bases of $V_\alpha \ominus U$ and $V_\beta \ominus W$ respectively, and they are biorthogonal. The proof is completed.

Definition 3. Under the conditions of Lemma 1, the dual order of the dual pair $(\alpha(z), \beta(z))$ of Laurent polynomial vectors is defined as the dimension of $V_\alpha \ominus U$.

For a subspace $V \subseteq \mathbb{R}^m$, define $V^\perp = \{\alpha \in \mathbb{R}^m;\ \alpha^*\beta = 0,\ \forall \beta \in V\}$. By the argument in Lemma 1, we see that the dual order of $(\alpha(z), \beta(z))$ is equal to the rank of the matrix $(\alpha_i^*\beta_j)_{s\le i\le k,\,p\le j\le q}$. By the result of Lemma 1, we have

Lemma 2. Let the dual order of the dual pair $(\alpha(z), \beta(z))$ of Laurent polynomial vectors be $k$. Then there exist biorthogonal bases $\alpha_1', \ldots, \alpha_k' \in V_\alpha$ and $\beta_1', \ldots, \beta_k' \in V_\beta$ satisfying
$$\alpha_i'^*\beta_j' = \delta_{i,j}, \quad 1 \le i, j \le k,$$
such that
$$(2.6)\qquad \alpha(z) = \sum_{i=1}^{k} H_i(z)\alpha_i' + \sum_i \tilde\alpha_i z^{i}, \qquad \beta(z) = \sum_{i=1}^{k} G_i(z)\beta_i' + \sum_j \tilde\beta_j z^{j},$$
where $\tilde\alpha_i \in V_\alpha \cap V_\beta^\perp$, $\tilde\beta_j \in V_\beta \cap V_\alpha^\perp$, and $H_i, G_i$ are Laurent polynomials.
In Lemma 2, if $k = 1$, then $H_1(z) = cz^{-n}$ and $G_1(z) = \frac{m}{c}z^{-n}$ for a nonzero constant $c$ and an integer $n$. In particular, if $(\alpha(z), \alpha(z))$ is a dual pair of Laurent polynomial vectors, then by the fact $V_\alpha \cap V_\alpha^\perp = \{0\}$ we have

Lemma 3. Let $k$ be the dual order of $(\alpha(z), \alpha(z))$. Then there exist $\alpha_1, \ldots, \alpha_k \in V_\alpha$ such that
$$\alpha_i^*\alpha_j = \delta_{i,j}, \quad 1 \le i, j \le k,$$
and
$$\alpha(z) = \sum_{i=1}^{k} H_i(z)\alpha_i,$$
where $H_i$, $1 \le i \le k$, are Laurent polynomials.

A matrix $P \in \mathbb{R}^{m\times m}$ is said to be idempotent if $P^2 = P$. For a given matrix $Q \in \mathbb{R}^{m\times m}$ and a subspace $V \subseteq \mathbb{R}^m$, define $QV = \{Q\alpha;\ \alpha \in V\}$. For a subspace $V \subseteq \mathbb{R}^m$, $Q \in \mathbb{R}^{m\times m}$ is said to be an annihilator on $V$ if $QV = \{0\}$. Denote by $N(V)$ the set of all annihilators on $V$.

3. Parameterizations of dual pairs of Laurent polynomial vectors with rank-one idempotent matrices

Theorem 1. If $(\alpha(z), \beta(z))$ is a dual pair of Laurent polynomial vectors, then there exist rank-one idempotent matrices $P_1, P_2, \ldots, P_d$ with $P_i \in N(V_\beta^\perp)$, $P_i^* \in N(V_\alpha^\perp)$, $1 \le i \le d$, such that
$$\alpha(z) = V_d(z)V_{d-1}(z)\cdots V_1(z)\delta(z),$$
$$\beta(z) = V_d^*(z)V_{d-1}^*(z)\cdots V_1^*(z)\gamma(z),$$
where $(\delta(z), \gamma(z))$ is a dual pair of Laurent polynomial vectors of order one, $V_\delta \subseteq V_\alpha$, $V_\gamma \subseteq V_\beta$, and $V_i(z) = I_m - P_i + P_i z^{-1}$, $1 \le i \le d$.

Let $P$ be a rank-one idempotent matrix, that is, there exist $u, v \in \mathbb{R}^m$ with $u^*v = 1$ such that $P = uv^*$. Define
$$(3.1)\qquad V(z) = I_m - P + Pz^{-\tau}, \qquad \tau \in \{-1, +1\};$$
then $V^*(z) = I_m - P^* + P^*z^{-\tau}$, so
$$(3.2)\qquad V(z)V(z^{-1}) = I_m, \qquad \det(V(z)) = z^{-\tau}.$$
We will refer to a matrix $V(z)$ of the form (3.1) as a primitive biorthogonal matrix.
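A small numerical check of (3.1)-(3.2): with a concrete rank-one idempotent P = uv^* (u^*v = 1), the matrix V(z) = I_m - P + Pz^{-1} satisfies V(z)V(z^{-1}) = I_m and det V(z) = z^{-1} at any sample point z. The specific u, v below are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

m = 3
u = np.array([1.0, 2.0, 0.0])
v = np.array([1.0, 0.0, 3.0])
v = v / (u @ v)                  # normalize so that u^T v = 1
P = np.outer(u, v)               # rank-one idempotent: P @ P == P
assert np.allclose(P @ P, P)

def V(z, tau=1):
    """Primitive biorthogonal matrix V(z) = I - P + P z^(-tau)."""
    return np.eye(m) - P + P * z ** (-tau)

for z in (0.7, -1.3, 2.0):       # sample points of the Laurent variable
    assert np.allclose(V(z) @ V(1.0 / z), np.eye(m))      # (3.2), first identity
    assert np.isclose(np.linalg.det(V(z)), z ** (-1))     # (3.2), determinant
```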
Proof of Theorem 1. Let $k$ be the dual order of $(\alpha(z), \beta(z))$. By Lemma 2 we can represent $\alpha(z)$ and $\beta(z)$ in the form (2.6) with coefficient Laurent polynomials $H_i, G_i$, $1 \le i \le k$. Now write
$$(3.3)\qquad (H_1(z), H_2(z), \ldots, H_k(z))^T = z^{-n}\sum_{i=0}^{r} \eta_i z^{-i};$$
here and hereafter, for a vector $\alpha$, $\alpha^T$ denotes the transpose of $\alpha$; $n$ is an integer and $\eta_i \in \mathbb{R}^k$, $i = 0, 1, \ldots, r$. The scheme of the proof is to decrease the length $r+1$ to 1 recursively. If $r = 0$, the length of (3.3) is already one and there is nothing to do. Assume $r \ge 1$.

Case 1: $\eta_0, \eta_r$ are independent. Let
$$(b_1, b_2, \ldots, b_k)^T = \eta_r \quad\text{and}\quad (a_1, a_2, \ldots, a_k)^T = \frac{u}{\eta_r^* u},$$
where $u = (\eta_0^*\eta_0)\eta_r - (\eta_0^*\eta_r)\eta_0$. Define
$$\beta = \sum_{i=1}^{k} a_i\beta_i', \qquad \alpha = \sum_{i=1}^{k} b_i\alpha_i';$$
then $\beta \in V_\beta$ and $\alpha \in V_\alpha$. Let $P = \alpha\beta^*$; then $P$ is a rank-one idempotent matrix with $P \in N(V_\beta^\perp)$ and $P^* \in N(V_\alpha^\perp)$. Define $V(z) = I_m - P + Pz^{-1}$; then $V(z)$ is a primitive biorthogonal wavelet matrix. And define
$$\alpha'(z) = V(z^{-1})\alpha(z), \qquad \beta'(z) = V^*(z^{-1})\beta(z);$$
it follows that $(\alpha'(z), \beta'(z))$ is a dual pair of Laurent polynomial vectors, $V_{\alpha'} \subseteq V_\alpha$, $V_{\beta'} \subseteq V_\beta$, and
$$\alpha(z) = V(z)\alpha'(z), \qquad \beta(z) = V^*(z)\beta'(z).$$
Note that
$$\alpha'(z) = \sum_{i=1}^{k} H_i'(z)\alpha_i' + \sum_i \tilde\alpha_i z^{-i},$$
where
$$(H_1'(z), H_2'(z), \ldots, H_k'(z))^T = z^{-n}\Big(\big(\eta_0 + \tfrac{u^*\eta_1}{u^*\eta_r}\eta_r\big) + \cdots + \big(\eta_{r-1} + \eta_r - \tfrac{u^*\eta_{r-1}}{u^*\eta_r}\eta_r\big)z^{-r+1}\Big).$$
Thus the length of $(H_1'(z), H_2'(z), \ldots, H_k'(z))^T$ is decreased by 1.
Case 2: $\eta_0, \eta_r$ are dependent. Write the Laurent polynomials $G_i$, $1 \le i \le k$, of (2.6) as
$$(G_1(z), \ldots, G_k(z))^T = z^{-n_1}\sum_{i=0}^{s} \gamma_i z^{-i}.$$
Then either $\eta_0^*\gamma_s = \eta_r^*\gamma_s = 0$ or $\eta_0^*\gamma_0 = \eta_r^*\gamma_0 = 0$; we only consider the case $\eta_0^*\gamma_s = \eta_r^*\gamma_s = 0$, for the other case is similar. There exists an $l$ such that $\eta_i^*\gamma_s = 0$ for $0 \le i \le l-1$ and $\eta_l^*\gamma_s \ne 0$. Let $\eta_l = (c_1, c_2, \ldots, c_k)^T$ and $\gamma_s = (d_1, d_2, \ldots, d_k)^T$, and define
$$\alpha = \sum_{i=1}^{k} c_i\alpha_i', \qquad \beta = \sum_{i=1}^{k} d_i\beta_i';$$
then $\alpha \in V_\alpha$ and $\beta \in V_\beta$. Now if we set $P = \frac{1}{\gamma_s^*\eta_l}\alpha\beta^*$, then $P$ is a rank-one idempotent matrix, $P \in N(V_\beta^\perp)$, $P^* \in N(V_\alpha^\perp)$. Define
$$V(z) = I_m - P + Pz^{-l} = (I_m - P + Pz^{-1})^l;$$
then it is a power of a primitive biorthogonal wavelet matrix. Define
$$\alpha'(z) = V(z^{-1})\alpha(z), \qquad \beta'(z) = V^*(z^{-1})\beta(z).$$
It follows that $(\alpha'(z), \beta'(z))$ is a dual pair of Laurent polynomial vectors, $V_{\alpha'} \subseteq V_\alpha$, $V_{\beta'} \subseteq V_\beta$, and
$$\alpha(z) = V(z)\alpha'(z), \qquad \beta(z) = V^*(z)\beta'(z).$$
Note that
$$\alpha'(z) = \sum_{i=1}^{k} H_i'(z)\alpha_i' + \sum_i \tilde\alpha_i z^{-i},$$
where
$$(H_1'(z), H_2'(z), \ldots, H_k'(z))^T = z^{-n}\big((\eta_0 + \eta_l) + \eta_1 z^{-1} + \cdots + \eta_r z^{-r}\big);$$
thus the length of $(H_1'(z), H_2'(z), \ldots, H_k'(z))^T$ is the same as the length of (3.3), but $\eta_0 + \eta_l$ and $\eta_r$ are independent, which reduces this case to the condition of Case 1.

Recursively proceeding in this fashion, we decrease the length of (3.3) to 1, that is,
$$\alpha(z) = V_d(z)V_{d-1}(z)\cdots V_1(z)\delta(z),$$
$$\beta(z) = V_d^*(z)V_{d-1}^*(z)\cdots V_1^*(z)\gamma(z),$$
where
$$\delta(z) = z^{-n}\sum_{i=1}^{k} c_i\alpha_i + \sum_i \tilde\alpha_i z^{-i}$$
and
$$\gamma(z) = \sum_{i=1}^{k} G_i'(z)\beta_i + \sum_i \tilde\beta_i z^{-i};$$
therefore $(\delta(z), \gamma(z))$ is a dual pair of Laurent polynomial vectors of order one, and $V_\delta \subseteq V_\alpha$, $V_\gamma \subseteq V_\beta$. The proof is completed.

In the following we consider the parameterization of dual pairs of Laurent polynomial vectors of order one.

Theorem 2. Let $(\alpha(z), \beta(z))$ be a dual pair of Laurent polynomial vectors. If the dual order is one, then there exist rank-one idempotent matrices $P_i$, $1 \le i \le d$, with $P_i \in N(V_\beta^\perp)$, $P_i^* \in N(V_\alpha^\perp)$, $1 \le i \le d$, such that
$$\alpha(z) = z^{-k}V_d(z)V_{d-1}(z)\cdots V_1(z)\alpha(1),$$
$$\beta(z) = z^{-k}V_d^*(z)V_{d-1}^*(z)\cdots V_1^*(z)\beta(1),$$
where $k$ is an integer and
$$V_i(z) = I_m - P_i + P_i z^{-\tau_i}, \qquad \tau_i \in \{1, -1\},\ i = 1, 2, \cdots, d.$$

Proof. Since $(\alpha(z), \beta(z))$ is a dual pair of Laurent polynomial vectors with dual order one, by Lemma 2 we have
$$\alpha(z) = \alpha_{k-r}z^{-k+r} + \cdots + \alpha_k z^{-k} + \cdots + \alpha_{k+s}z^{-k-s},$$
$$\beta(z) = \beta_{k-p}z^{-k+p} + \cdots + \beta_k z^{-k} + \cdots + \beta_{k+q}z^{-k-q},$$
where $\alpha_i^*\beta_j = m\delta_{i,k}\delta_{j,k}$ for $k-r \le i \le k+s$, $k-p \le j \le k+q$. Define
$$V_i(z) = I_m - P_i + P_i z^{-1}, \qquad U_j(z) = I_m - Q_j + Q_j z^{-1}$$
as primitive biorthogonal wavelet matrices, where
$$P_i = \frac{1}{m}\Big(\sum_{l=k-i}^{k}\alpha_l\Big)\Big(\sum_{l=k-i}^{k}\beta_l\Big)^*, \qquad 0 \le i \le r-1,$$
and
$$Q_j = \frac{1}{m}\Big(\sum_{l=k-r}^{k+j}\alpha_l\Big)\Big(\sum_{l=k-r}^{k+j}\beta_l\Big)^*, \qquad 1 \le j \le s-1.$$
Then $P_i, Q_j \in N(V_\beta^\perp)$ and $P_i^*, Q_j^* \in N(V_\alpha^\perp)$ for $0 \le i \le r-1$, $1 \le j \le s-1$. Define
$$\tilde\alpha(z) = \prod_{j=1}^{s-1}U_j(z)\,\big(V_r(z)\big)^r\prod_{i=r-1}^{0}V_i(z^{-1})\,\alpha(z)$$
and
$$\tilde\beta(z) = \prod_{i=0}^{r-1}V_i^*(z^{-1})\,\big(V_r^*(z)\big)^r\prod_{j=s-1}^{1}U_j^*(z)\,\beta(z).$$
It follows that $(\tilde\alpha(z), \tilde\beta(z))$ is a dual pair, $V_{\tilde\alpha} \subseteq V_\alpha$, $V_{\tilde\beta} \subseteq V_\beta$, and
$$\alpha(z) = \prod_{i=0}^{r-1}V_i(z)\,\big(V_r(z^{-1})\big)^r\prod_{j=s-1}^{1}U_j(z^{-1})\,\tilde\alpha(z),$$
$$\beta(z) = \prod_{j=1}^{s-1}U_j^*(z^{-1})\,\big(V_r^*(z^{-1})\big)^r\prod_{i=0}^{r-1}V_i^*(z)\,\tilde\beta(z).$$
Note that
$$\tilde\alpha(z) = \alpha_n' z^{-n},$$
where $n = k+s$ and $\alpha_n' = \sum_{i=k-r}^{k+s}\alpha_i$, and
$$\tilde\beta(z) = \beta_{n-r}' z^{r-n} + \cdots + \beta_n' z^{-n} + \cdots + \beta_{n+s}' z^{-n-s},$$
with $\alpha_n'^*\beta_i' = m\delta_{n,i}$.

The next step is to factor $\tilde\beta(z)$. Define
$$V_i'(z) = I_m - P_i' + P_i' z^{-1}, \qquad U_j'(z) = I_m - Q_j' + Q_j' z^{-1}$$
as primitive biorthogonal wavelet matrices, where
$$P_i' = \frac{1}{m}\,\alpha_n'\Big(\sum_{j=n-i}^{n}\beta_j'\Big)^*, \qquad 0 \le i \le r,$$
and
$$Q_j' = \frac{1}{m}\,\alpha_n'\Big(\sum_{l=n-r}^{n+j}\beta_l'\Big)^*, \qquad 1 \le j \le s-1.$$
It follows that $P_i', Q_j' \in N(V_\beta^\perp)$ and $P_i'^*, Q_j'^* \in N(V_\alpha^\perp)$ for $0 \le i \le r$, $1 \le j \le s-1$, and $P_i', Q_j'$ are rank-one idempotent matrices. Note that
$$\prod_{j=s-1}^{1}U_j'^*(z)\,\big(V_r'^*(z)\big)^r\prod_{i=r-1}^{0}V_i'^*(z^{-1})\,\tilde\beta(z) = \tilde\beta_{n+s} z^{-n-s},$$
where $\tilde\beta_{n+s} = \sum_j \beta_j'$, and
$$\prod_{i=0}^{r-1}V_i'(z^{-1})\,\big(V_r'(z)\big)^r\prod_{j=1}^{s-1}U_j'(z)\,\tilde\alpha(z) = \tilde\alpha_n z^{-n-s}.$$
By the property (3.2) of biorthogonal matrices, we obtain
$$\alpha(z) = z^{-n-s}V_d(z)V_{d-1}(z)\cdots V_1(z)\gamma,$$
$$\beta(z) = z^{-n-s}V_d^*(z)V_{d-1}^*(z)\cdots V_1^*(z)\delta.$$
At last, letting $z = 1$ above, we get $\gamma = \alpha(1)$ and $\delta = \beta(1)$.

Together with Theorem 1 and Theorem 2, we get
4. Parameterizations of biorthogonal wavelet matrix Theroem 4. If H(z) and G(z) is a pair of biorthogonal wavelet matrices, then there exist one rank idempotent matrices Pi and integers ki , i = 1, 2, · · · , m such that H(z) = Vd (z)Vd−1 (z) · · · V2 (z)V1 (z)diag(z −k1 , z −k2 , · · · , z −km )H(1), and ∗ (z) · · · V2∗ (z)V1∗ (z)diag(z −k1 , z −k2 , · · · , z −km )G(1), G(z) = Vd∗ (z)Vd−1
where Vi (z) = Im − Pi + Pi z τi , τi ∈ {1 , −1 }, 1 ≤ i ≤ d. Proof: Writing H(z), G(z) in the form as (2.1)and (2.2) respectively, then (2.3)holds, (αi , βi ), 0 ≤ i ≤ m − 1 are dual pairs of Laurent polynomial vectors. By theorem 3, for the dual pair (α0 (z), β0 (z)) of Laurent polynomial vectors, there exist primitive biorthogonal matrices V0,1 (z), V0,2 (z), · · · , V0,d1 (z) and an integer k such that α0 (z) = z −k1 V0,d1 (z)V0,d1 −1 (z) · · · V0,1 (z)α0 (1), ∗ ∗ ∗ β0 (z) = z −k1 V0,d (z)V0,d (z) · · · V0,1 (z)β0 (1) 1 1 −1 where k1 and d1 are non-negative integers. Define
H1 (z) = V1 (z −1 )V2 (z −1 ) · · · Vd1 (z −1 )H(z) and
G1 (z) = V1∗ (z −1 )V2∗ (z −1 ) · · · Vd∗1 (z −1 )G(z), then (H1 (z), G1 (z)) is a pair of biorthogonal wavelet matrices and H(z) = Vd1 (z)Vd1 −1 (z) · · · V1 (z)H1 (z),
Parameterizations of M-Band Biorthogonal Wavelets
445
G(z) = Vd∗1 (z)Vd∗1 −1 (z) · · · V1∗ (z)G1 (z). It follows that
$$H_1(z) = \big(z^{-k_1}\alpha_0(1), \alpha_{1,1}(z), \cdots, \alpha_{1,m-1}(z)\big)$$
and
$$G_1(z) = \big(z^{-k_1}\beta_0(1), \beta_{1,1}(z), \cdots, \beta_{1,m-1}(z)\big).$$
By the biorthogonality we get
$$(4.1)\qquad \alpha_0(1) \in V_{\beta_{1,k}}^\perp, \quad \beta_0(1) \in V_{\alpha_{1,k}}^\perp, \quad 1 \le k \le m-1,$$
and
$$\alpha_{1,i}^*(z^{-1})\beta_{1,j}(z) = \begin{cases} m, & \text{if } i = j,\\ 0, & \text{if } i \ne j;\end{cases}$$
therefore $(\alpha_{1,1}(z), \beta_{1,1}(z))$ is a dual pair of Laurent polynomial vectors. By Theorem 3 there exist primitive biorthogonal matrices $V_{1,1}(z), V_{1,2}(z), \cdots, V_{1,d_2}(z)$ such that
$$\alpha_{1,1}(z) = z^{-k_2}V_{1,d_2}(z)V_{1,d_2-1}(z)\cdots V_{1,1}(z)\alpha_{1,1}(1),$$
$$\beta_{1,1}(z) = z^{-k_2}V_{1,d_2}^*(z)V_{1,d_2-1}^*(z)\cdots V_{1,1}^*(z)\beta_{1,1}(1),$$
where $k_2$ and $d_2$ are non-negative integers. Define
$$H_2(z) = V_{1,1}(z^{-1})V_{1,2}(z^{-1})\cdots V_{1,d_2}(z^{-1})H_1(z)$$
and
$$G_2(z) = V_{1,1}^*(z^{-1})V_{1,2}^*(z^{-1})\cdots V_{1,d_2}^*(z^{-1})G_1(z);$$
then $(H_2(z), G_2(z))$ is a pair of biorthogonal wavelet matrices and
$$H_1(z) = V_{1,d_2}(z)V_{1,d_2-1}(z)\cdots V_{1,1}(z)H_2(z),$$
$$G_1(z) = V_{1,d_2}^*(z)V_{1,d_2-1}^*(z)\cdots V_{1,1}^*(z)G_2(z).$$
Noting the fact (4.1) and Theorem 3, we get
$$H_2(z) = \big(z^{-k_1}\alpha_0(1), z^{-k_2}\alpha_{1,1}(1), \cdots, \alpha_{1,m-1}(z)\big)$$
and
$$G_2(z) = \big(z^{-k_1}\beta_0(1), z^{-k_2}\beta_{1,1}(1), \cdots, \beta_{1,m-1}(z)\big).$$
Proceeding in the same fashion, we get primitive biorthogonal matrices
$$V_{i,j}(z), \quad j = 1, 2, \cdots, d_i, \quad 1 \le i \le r,$$
with non-negative integers $d_i$, $1 \le i \le r$, and integers $k_i$, $1 \le i \le m$, such that
$$H(z) = V_{r-1,d_r}(z)V_{r-1,d_r-1}(z)\cdots V_{r-1,1}(z)\cdots V_{1,d_2}(z)V_{1,d_2-1}(z)\cdots V_{1,1}(z)V_{d_1}(z)V_{d_1-1}(z)\cdots V_1(z)\,\mathrm{diag}(z^{-k_1}, z^{-k_2}, \cdots, z^{-k_m})\,U$$
and
$$G(z) = V_{r-1,d_r}^*(z)V_{r-1,d_r-1}^*(z)\cdots V_{r-1,1}^*(z)\cdots V_{1,d_2}^*(z)V_{1,d_2-1}^*(z)\cdots V_{1,1}^*(z)V_{d_1}^*(z)V_{d_1-1}^*(z)\cdots V_1^*(z)\,\mathrm{diag}(z^{-k_1}, z^{-k_2}, \cdots, z^{-k_m})\,W.$$
By taking $z = 1$ above we get $U = H(1)$ and $W = G(1)$; therefore $U, W$ is a pair of constant-valued biorthogonal wavelet matrices.

Especially, for orthogonal wavelet matrices, by using Lemma 2 and a procedure similar to the above, we have

Theorem 5. If $H(z)$ is an orthogonal wavelet matrix, then there exist symmetric rank-one idempotent matrices $P_i$ and integers $k_i$, $i = 1, 2, \cdots, m$, such that
$$H(z) = V_d(z)V_{d-1}(z)\cdots V_2(z)V_1(z)\,\mathrm{diag}(z^{-k_1}, z^{-k_2}, \cdots, z^{-k_m})\,U,$$
where $U$ is a constant-valued orthogonal wavelet matrix and $V_i = I_m - P_i + P_i z^{-1}$, $1 \le i \le d$.
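A numerical illustration of Theorem 5 for m = 2, not taken from the paper: the polyphase wavelet matrix of the 4-tap Daubechies filter has degree one in z^{-1}, so it should factor as a single primitive matrix times the constant matrix U = H(1). The sketch below builds H(z) = E_0 + E_1 z^{-1} from PyWavelets' db2 filters, rescaled so that the scaling filter sums to m as in the refinement equation of Section 1, recovers P from E_1 and U, and checks that P is a symmetric rank-one projection with H(z) = (I - P + P z^{-1}) U.

```python
import numpy as np
import pywt

m = 2
w = pywt.Wavelet('db2')
# Rescale so that the scaling filter sums to m, matching the refinement
# equation phi(x) = sum_k h_k^0 phi(2x - k) used in this paper.
h0 = np.sqrt(2.0) * np.array(w.rec_lo)   # scaling filter, 4 taps
h1 = np.sqrt(2.0) * np.array(w.rec_hi)   # wavelet filter, 4 taps

# Polyphase coefficients (1.1): H(z) = E0 + E1 z^{-1}, rows = phases, cols = filters.
E0 = np.array([[h0[0], h1[0]], [h0[1], h1[1]]])
E1 = np.array([[h0[2], h1[2]], [h0[3], h1[3]]])

U = E0 + E1                    # U = H(1), a constant orthogonal wavelet matrix
P = E1 @ np.linalg.inv(U)      # candidate rank-one projection

assert np.allclose(P @ P, P)                  # idempotent
assert np.allclose(P, P.T)                    # symmetric, as in Theorem 5
assert np.linalg.matrix_rank(P) == 1
assert np.allclose(E0, (np.eye(m) - P) @ U)   # H(z) = (I - P + P z^{-1}) U
assert np.allclose(U.T @ U, m * np.eye(m))    # U satisfies (1.4) with G = H
```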
5. Final remark

1. In [12], H. L. Resnikoff, J. Tian and R. O. Wells, Jr. discussed parameterizations of the biorthogonal wavelet space; they proved that any biorthogonal wavelet matrix pair can be decomposed into four components: an orthogonal component, a pseudo-identity matrix pair, an invertible matrix and a constant matrix. That result is modified into Theorem 3 of this paper: any biorthogonal wavelet matrix pair can be decomposed into two parts, a biorthogonal component V(z) and a constant matrix H.
2. It was proved in [12] that any constant matrix in Theorem 4 can be decomposed as
$$H = \tilde H\begin{pmatrix} 1 & 0 \\ 0 & U \end{pmatrix},$$
where $\tilde H = (\gamma_0, \gamma_1, \ldots, \gamma_{m-1})$ with $\gamma_0 = (1, 1, \ldots, 1)^T$ and
$$\gamma_i = \sqrt{\frac{m}{(m-i)(m-i+1)}}\,(\underbrace{0, \cdots, 0}_{i-1\ \text{terms}}, -m+i, \underbrace{1, \cdots, 1}_{m-i\ \text{terms}})^T$$
for $1 \le i \le m-1$, and $U$ is an $(m-1)\times(m-1)$ nonsingular constant-valued matrix.

References
[1] Bi N., Dai X. and Sun Q., Construction of compactly supported M-band wavelets, Appl. Comput. Harmonic Anal. 6 (1999), pp. 113-131.
[2] Chui C. K. and Lian J., Construction of compactly supported symmetric and antisymmetric orthogonal wavelets with scale = 3, Appl. Comput. Harmonic Anal. 2 (1995), pp. 21-51.
[3] Cohen A., Daubechies I. and Feauveau J. C., Biorthogonal bases of compactly supported wavelets, Commun. Pure Appl. Math. 45(5) (1992), pp. 485-560.
[4] Daubechies I., Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.
[5] Han B., Symmetric orthogonal scaling functions and wavelets with dilation factor 4, Adv. Comput. Math. 8 (1998), pp. 221-247.
[6] Heller P. N., Rank M wavelets with N vanishing moments, SIAM J. Matrix Anal. 16(2) (1994), pp. 502-519.
[7] Heller P. N., Resnikoff H. L. and Wells R. O. Jr., Wavelet matrices and the representation of discrete functions, in Wavelets - A Tutorial in Theory and Applications, C. K. Chui (ed.), Academic Press, Inc. (1992), pp. 15-50.
[8] Ji H. and Shen Z., Compactly supported (bi)orthogonal wavelets generated by interpolatory refinable functions, Adv. Comput. Math. 11 (1999), pp. 81-104.
[9] Soman A. K., Vaidyanathan P. P. and Nguyen T. Q., Linear phase paraunitary filter banks: theory, factorization and designs, IEEE Trans. Signal Processing 41 (1993), pp. 3480-3496.
[10] Soardi P., Biorthogonal M-channel compactly supported wavelets, Constr. Approx. 16 (2000), pp. 283-311.
[11] Sun Q. and Zhang Z., M-band scaling functions with filter having vanishing moments two and minimal length, J. Math. Anal. 222 (1998), pp. 225-243.
[12] Resnikoff H. L., Tian J. and Wells R. O. Jr., An algebraic structure of orthogonal wavelet space, Appl. Comput. Harmon. Anal. 8 (2000), pp. 223-248.
[13] Vaidyanathan P. P., Multirate Systems and Filter Banks, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[14] Vetterli M. and Herley C., Wavelets and filter banks: Theory and design, IEEE Trans. Acoust. Speech Signal Processing 40 (1992), pp. 2207-2232.
[15] Welland G. V. and Lundberg M., Construction of compact p-wavelets, Constr. Approx. 9 (1993), pp. 347-370.