PROCEEDINGS IWISP '96 4-7 November 1996 Manchester, U.K.
PROCEEDINGS IWISP '96 4-7 November 1996, Manchester, United Kingdom Third International Workshop on Image and Signal Processing on the Theme of Advances in Computational Intelligence
Edited by
B.G. MERTZIOS Automatic Control Lab., Dept. of Electrical & Comp. Engineering, Democritus University of Thrace, GR-67 100 Xanthi, GREECE
P. LIATSIS Control Systems Centre, Dept. of Electrical Engineering & Electronics, UMIST, Sackville Street, P.O. Box 88, Manchester M60 1QD, United Kingdom
ELSEVIER AMSTERDAM - LAUSANNE - NEW YORK - OXFORD - SHANNON - TOKYO
1996
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ISBN: 0 444 82587 8
© 1996 Elsevier Science B.V. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands. Special regulations for readers in the U.S.A.- This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified. No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. This book is printed on acid-free paper. Printed in The Netherlands.
Preface

The papers that are included in this volume have been presented at the 3rd International Workshop on Image/Signal Processing (IWISP): Advances in Computational Intelligence, which was held at UMIST, Manchester, UK on 4-7 November, 1996. The 3rd IWISP was organised by the Control Systems Centre, UMIST in association with IEEE Region 8 and co-sponsored by the Institute of Electrical Engineers, the Institute of Measurement and Control, the IEEE Signal Processing Society and the Control Technology Transfer Network, under the General Chairmanship of Prof. Peter E. Wellstead and the Programme Chairmanship of Prof. Basil G. Mertzios.
Evidently, a Workshop cannot cover the intensively developed area of Image and Signal Processing. The trend of the 3rd IWISP is emphasized by its theme: 'Advances in Computational Intelligence', referring to computational efficiency and complexity in Image and Signal Processing. In particular, the Workshop focuses on the most modern and critical aspects of Image and Signal Processing and their related areas that have a significant impact on our society. Specifically, the articles presented in the 3rd IWISP may be categorized in the following four major parts:
- Coding and Compression (image coding, image subband, wavelet coding and representation, video coding, motion estimation and multimedia);
- Image Processing and Pattern Recognition (image analysis, edge detection, segmentation, image enhancement and restoration, adaptive systems, colour processing, pattern and object recognition and classification);
- Fast Processing Techniques (computational methods, VLSI DSP architectures);
- Theory and Applications (identification and modelling, multirate filter banks, wavelets in image and signal processing, biomedical and industrial applications).

The proposals
from each category were then reviewed by the members of the International Programme Committee and numerous other reviewers. We are sincerely grateful to the reviewers and to the volunteers who acted as invited session organisers and helped us to attract high quality contributions. In the review process, about three fifths of the submitted papers were accepted. The final programme consisted of 24 oral sessions, giving a total of 152 high quality papers. The authors of the papers presented in IWISP-96 form an exceptionally interesting and wide international group coming from the five continents and representing the following 33 countries:
Argentina, Armenia, Australia, Belgium, Brazil, Canada, China, Croatia, Czech Republic, Finland, France, Germany, Greece, Hong Kong, India, Iran, Israel, Italy, Japan, Korea, Mexico, The Netherlands, Poland, Russia, Slovakia, Slovenia, Spain, Sweden, Taiwan, Turkey, UK, USA and Yugoslavia.

The first and second IWISP have been held in Budapest under the chairmanship of Prof. Kalman Fazekas.
The transition of the 3rd IWISP to Manchester signifies a true internationalisation and strengthens the interdisciplinary cross-fertilisation of theory and applications, and the strong interest guarantees a successful future. The next Workshops will be organised by an International Steering Committee and will focus on the areas of Signal Processing and Systems, where there is a great potential. Amongst others, typical cases include lossless techniques, multiresolution analysis and wavelets, adaptive systems and filters, linear prediction and orthogonal systems, model order and data reduction, 2D control systems, learning theory and applications, computational complexity and non-linear dynamics.

Acknowledgements and appreciation are due to all the contributors who submitted their proposals for review to IWISP'96. Needless to say, we could not have such a high quality technical programme without their contributions. We also wish to sincerely thank the members of the International Programme Committee, the reviewers and all those that helped in the organisation of the Workshop.
Basil G. Mertzios
Panos Liatsis
IWISP '96 ORGANIZING COMMITTEE
P.E. Wellstead, UMIST, UK (General Chair)
M. Domanski, TU Poznan, Poland (Tutorials Chair)
K. Fazekas, TU Budapest, Hungary (Financial Chair)
P. Liatsis, UMIST, UK (Proceedings/Publicity Chair)
B.G. Mertzios, Democritus Univ. of Thrace, Greece (Program Chair)
INTERNATIONAL PROGRAMME COMMITTEE
I. Antoniou, Solvay Inst., Belgium
J. Biemond, TU Delft, The Netherlands
Z. Bojkovic, Belgrade Univ., Yugoslavia
I. Boutalis, Democritus Univ. of Thrace, Greece
M. Brady, Univ. of Oxford, UK
V. Cappellini, Florence Univ., Italy
G. Caragiannis, NTUA, Greece
A.C. Constantinides, Imperial College, UK
T. Cooklev, Univ. of Toronto, Canada
J. Cornelis, Vrije Universiteit Brussel, Belgium
A. Davies, King's College London, UK
I. Erenyi, KFKI Research Inst., Hungary
G. Fettweis, Ruhr Univ. Bochum, Germany
M. Ghanbari, Univ. of Essex, UK
S. van Huffel, KU Leuven, Belgium
G. Istefanopoulos, Bosporous Univ., Turkey
V.V. Ivanov, JINR, Russia
M. Karny, UTIA, Academy of Sciences, Czech Republic
T. Kida, Tokyo Inst. of Technology, Japan
J. Kittler, Univ. of Surrey, UK
S. Kollias, NTUA, Greece
M. Kunt, University of Lausanne, Switzerland
C.L. Nikias, Univ. of Southern California, USA
T. Nossek, TU Munchen, Germany
D. van Ormondt, TU Delft, The Netherlands
K.K. Parhi, Univ. of Minnesota, USA
M. Petrou, Univ. of Surrey, UK
D.T. Pham, Univ. of Wales Cardiff, UK
M. Sablatash, McMaster Univ., Canada
D.G. Sampson, Democritus Univ. of Thrace, Greece
W. Schempp, Siegen Univ., Germany
M. Strintzis, Aristotle Univ. of Thessaloniki, Greece
J. Turan, TU Kosice, Slovak Republic
G.J. Vachtsevanos, Georgia Inst. of Tech., USA
A. Venetsanopoulos, Toronto Univ., Canada
Contents
Session A: Image Coding I: Vector Quantisation, Fractal and Segmented Coding Joint optimization of multidimensional SOFM codebooks with QAM modulations for vector quantized image transmission O. Aitsab, R. Pyndiah and B. Solaiman Visual vector quantization for image compression based on Laplacian pyramid structure Z. He, G. Qiu and S. Chen Kohonen's self-organizing feature maps with variable learning rate: Application to image compression A. Cziho, B. Solaiman, G. Cazuguel, C. Roux and I. Lovany
11
An efficient training algorithm design for general competitive neural networks J. Jian and D. Butler
15
Architecture design for polynomial approximation coding of image compression C.-Y. Lu and K.-A. Wen
19
Application of shape recognition to fractal based image compression S. Morgan and A. Bouridane
23
Chrominance vector quantization for coding of images and video at very low bitrates M. Bartkowiak, M. Domanski and P. Gerken
27
Region-of-interest based compression of magnetic resonance imaging data N.G. Panagiotidis and S.D. Kollias
31
Scalable parallel vector quantization for image coding applications D.G. Sampson, A. Cuhadar and A.C. Downton
37
Session B: Wavelets in Image/Signal Processing Real time image compression methods incorporating wavelet transforms D.T. Morris and M.D. Edwards
43
Custom wavelet packet image compression design M.V. Wickerhauser
47
Two-dimensional directional wavelets in image processing J.-P. Antoine
53
The importance of the phase of the symmetric Daubechies wavelets representation of signals J.-M. Lina
61
Contrast enhancement in images using the 2D continuous Wavelet transform J.-P. Antoine and P. Vandergheynst
65
Wavelets and differential-dilation equations T. Cooklev, G. Berbecel and A.N. Venetsanopoulos
69
Wavelets in high resolution radar imaging and clinical magnetic resonance imaging W. Schempp
73
Wavelet transform based information extraction from 1-D and 2-D signals A. Dabrowski
81
Invited Session C: General techniques and algorithms Computational methods and tools for simulation and analysis of complex processes V.V. Ivanov
89
Rare events selection on a background of dominated processes applying multilayer perceptron V.V. Ivanov and P.V. Zrelov
97
Cellular automaton and elastic neural network application for event reconstruction in high energy physics I. Kisel, E. Konotopskaya and V. Kovalenko
101
Recognition of tracks detected by drift tubes in a magnetic field S.A. Baginyan and G.A. Ososkov
105
Session D: Adaptive Systems I: Identification and Modeling A unified connective representation for linear and nonlinear discrete-time system identification J. Fantini
111
Predicting a chaotic time series using a dynamical recurrent neural network R. Teran, J-P. Draye and D. Pavisic
115
A new neural network structure for modelling non-linear dynamical systems A. Hussain, J.J. Soraghan, T.S. Durrani and D.C. Campell
119
A neural network for moving light display trajectory prediction H.M. Lakany and G.M. Hayes
123
Recognizing flow pattern of gas/liquid two-component flow using fuzzy logical neural network P. Lihui, Z. Baofen, Y. Danya and X. Zhijie
127
Adaptive algorithm to solve the mixture problem with a neural networks methodology A.M. Perez, P. Martinez, J. Moreno, A. Silva and P.L. Aguilar
133
Process trend analysis and fuzzy reasoning in fermentation control S. Kivikunnas, K. Ibatici and E. Juusso
137
Higher order cumulant maximisation using non-linear Hebbian and anti-Hebbian learning for adaptive blind separation of source signals M. Girolami and C. Fyfe
141
Session E: Pattern/Object Recognition A robot vision system for object recognition and work piece location W. Min, D. Qizhi and W. Jun
147
Recognition of objects and their direction of moving based on sequence of two-dimensional frames B. Potochik and D. Zazula
151
Innovative techniques for the recognition of faces based on multiresolution analysis and morphological filtering A. Doulamis, N. Tsapatsoulis and S. Kollias
155
Partial curve identification in 2-D space and its application to robot assembly F.-H. Yao, G.-F. Shao, A. Tamaki and K. Kato
161
A fast active contour algorithm for object tracking in complex background C.L. Lam and S.Y. Yuen
165
The 2-point combinatorial probabilistic Hough transform for circle detection J.Y. Goulermas and P. Liatsis
169
Modified rapid transform features in information symbols recognition system J. Turan, K. Fazekas, L. Ovsenik and M. Kovesi
173
Image data processing in flying object velocity optoelectronic measuring device J. Mikulec and V. Ricny
177
Session F: Texture Analysis Rotation invariant texture classification schemes using GMRFs and Wavelets R. Porter and N. Canagarajah
183
A new method for describing texture D.T. Pham and B. Cetiner
187
Texture discrimination for quality control using wavelet and neural network techniques D.A. Karras, S.A. Karkanis and B.G. Mertzios
191
A region oriented CFAR approach to the detection of extensive targets in textured images C. Alberola-Lopez, J.R. Casar-Corredera and J. Ruiz-Alzola
195
Generating stable structure of a color texture image using scale-space analysis with nonuniform Gaussian kernels S. Morita and M. Tanaka
199
Session G: Image Coding II: Transform, Subband and Wavelet Coding Approximation of bidimensional Karhunen-Loeve expansions by means of monodimensional Karhunen-Loeve expansions, applied to Image Compression N. Balossino and D. Cavagnino
205
Blockness distortion evaluation in block-coded pictures M. Cireddu, F.G.B. De Natale, D.D. Giusto and P. Pes
209
A new distortion measure for the assessment of decoded images adapted to human perception F. Bock, H. Walter and M. Wilde
215
Image compression with interpolation and coding of interpolation errors J. Yi and F. Arp
219
Matrix to vector transformation for image processing D. Ait-Boudaoud
223
A speech coding algorithm based on wavelet transform X. Wu, Y. Li and H. Chen
227
Automatic determination of region importance and JPEG codec reflecting human sense R. Hayasaka, J. Zhao, Y. Shimazu, K. Ohta and Y. Matsushita
231
Directional image coding on wavelet transform domain D.W. Kang
235
Session H: Video Coding I: MPEG An universal MPEG decoder with scalable picture size R. Prabhakar and W. Li
241
The influence of impairments from digital compression of video signal on perceived picture quality S. Bauer, B. Zovko-Cihlar and M. Grgic
245
On scalable coding of image sequences L. Erwan
249
Image transmission problems between IP and ATM networks V.S. Mkrttchian, A.V. Eranosian and H.L. Karamyan
253
A scalable video coding scheme based on adaptive infield/inframe DCT and adaptive frame interpolation M. Asada and K. Sawada
257
Rate conversion of compressed video for matching bandwidth constraints in ATM networks* P. Assuncao and M. Ghanbari
Session I: Image Subband, Wavelet Coding and Representation Unified image compression using reversible and fast biorthogonal wavelet transforms H. Kim and C.C. Li
263
Subband image coding using adaptive fuzzy quantization step controller P. Planinsic, F. Jurkovic, Z. Cucej and D. Donlagic
267
EZW algorithm using visual weighting in the decomposition and DPCM L. Lecornu and C. Jedrzejek
271
Efficient 3-D subband coding of color video M. Domanski and R. Swierczynski
277
Adaptive wavelet packet image coding with zerotree structure T. Otake, K. Fukuda and A. Kawanaka
281
Efficiency of the image morphological pyramid decomposition D. Sandic and D. Milovanovic
285
Optimal vector pyramidal decompositions for the coding of multichannel images D. Tzovaras and M.G. Strintzis
289
* Due to unavoidable circumstances this paper has been placed at the end of the book on page 701.
Session J: Segmentation Multilingual character segmentation using matching rate K.-A. Moon, S.-Y. Chi, J.-W. Park and W.-G. Oh
295
Architecture of an object-based tracking system using colour segmentation R. Garcia-Campos, J. Battle and R. Bischoff
299
Segmentation of retinal images guided by the wavelet transform T. Morris and Z. Newell
303
An adaptive fuzzy clustering algorithm for image segmentation Y.A. Tolias and S.M. Panas
307
Hy2: A hybrid segmentation method F. Marino and G. Mastronardi
311
Session K: Image Enhancement/Restoration Efficient computation of the 2-dimensional RGB vector median filter S.J. Sangwine and A.J. Bardos
317
Image restoration for millimeter wave images by Hopfield neural network K. Yuasa, H. SawN, K. Watabe, K. Mizuno and M. Yoneyama
321
Image restoration of medical diffraction tomography using filtered MEM K. Hamamoto, T. Shiina and T. Nishimura
325
Directionally adaptive image restoration X. Neyt, M. Acheroy and I. Lemanhieu
329
Optimal matching of images at low photon level M. Guillaume, T. Amoroux and P. Refregier
333
A method for controlling the enhancement of image features by unsharp masking filters E. Cernadas, L. Gomez, A. Casas, P.G. Rodriguez and R.G. Carrion
337
Image noise reduction based on local classification and iterated conditional models K. Haris, S.N. Efstratiadis, N. Maglaveras and C. Pappas
341
Session L: Adaptive Systems II: Classification A neural approach to invariant character recognition I.M. Spiliotis, P. Liatsis, B.G. Mertzios and Y.P. Goulermas
347
Image segmentation based on boundary constraint neural network F. Kurugollu, S. Birecik, M. Sezgin and B. Sankur
353
A high performance neural multiclassifier system for generic pattern recognition applications D. Mitzias and B.G. Mertzios
357
Application of a neural network for multifont Farsi character recognition using fuzzified pseudo-Zernike moments M. Namazi and K. Faez
361
Integrating LANDSAT and SPOT images to improve landcover classification accuracy A. Chiuderi
365
Classification of bottle rims using neural networks-an LMS approach C. Teoh and J.B. Levy
369
INVITED SESSION M: Wavelets and Filter Banks in Communications Data compression, data fusion and Kalman filtering in wavelet transform Q. Jin, K.M. Wong, Z.M. Luo and E. Bosse
377
Performance of wavelet packet division multiplexing in timing errors and flat fading channels J. Wu, K.M. Wong and Q. Jin
381
Time-varying wavelet-packet division multiplexing T.N. Davidson and K.M. Wong
385
Co-channel interference mitigation in the time-scale domain: the CIMTS algorithm S. Heidari and C.L. Nikias
389
Design and performance of DS/SS signals defined by arbitrary orthonormal functions W.W. Jones and J.C. Dill
393
COFDM, MC-CDMA and wavelet-based MC-CDMA K. Chang and X. Lin
397
Signal denoising through multifractality W. Kinsner and A. Langi
405
Application of multirate filter bank to the co-existence problem of DS-CDMA and TDMA systems S. Hara, T. Matsuda and N. Morinaga
409
Session N: Edge Detection Multiscale edges detection by wavelet transform for model of face recognition F. Yang, M. Paindavoine and H. Abdi
415
Edge detection by rank functional approximation of grey levels J.P. Asselin de Beauville, D. Bi and F.Z. Kettaf
419
Fuzzy logic edge detection algorithm S. Murtovaara, E. Juuso and R. Sutinen
423
Topological edge finding M. Mertens, H. Sahli and J. Cornelis
427
Session O: Video Coding II: Motion Estimation Automatic parallelization of full 2D block matching for real-time motion compensation and mapping into special purpose architectures N. Koziris, G. Papakonstantinou and P. Tsanakas
433
New search region prediction method for motion estimation D.H. Ryu, C.R. Kim, T.W. Choi and J.C. Kim
439
Motion estimation by direct minimisation of the energy function of the Hopfield neural network L. Cieplinski and C. Jedrzejek
443
A modified MAP-MRF motion-based segmentation algorithm for image sequence coding D. Gatica-Perez, F. Garcia-Ugalde and V. Garcia-Garduno
447
Unsupervised motion segmentation of image sequences using adaptive filtering O. Pichler, A. Teuner and B.J. Hosticka
451
Development of a motion compensated coding system for an enhanced wide screen TV T. Hamada and S. Matsumoto
455
Session P: Biomedical Applications Brain evoked potentials mapping using the diffuse interpolation D. Bouattoura, P. Gaillard, P. Villon and F. Langevin
461
Computer-aided diagnosis: detection of masses on digital mammograms A.J. Mendez, P.G. Tahoces, M.J. Lado, M. Souto and J.J. Vidal
465
Model order determination of ECG beats using rational function approximations J.S. Paul, V. Jagadeesh Kumar and M.R.S. Reddy
469
Computation of the ejection rate of the ventricle from echocardiographic image sequences A. Teuner, O. Pichler and B.J. Hosticka
475
Contour detection of the left ventricle in echocardiographic images S.G. dos Santos, F. Bortolozzi and J. Facon
479
Identification of a stochastic system involving neuroelectric signals A.G. Rigas
483
Invited Session Q: Signal Processing Theory and Applications Design of m-band linear phase FIR filter banks with high attenuation in stop bands T. Kida and Y. Kida
489
Robustness of filter banks F.N. Kouboulis, M.G. Scarpetis and B.G. Mertzios
493
Design and learning algorithm of neural networks for pattern recognition H. Takahashi and M. Nakajima
497
Statistical comparison of minimum cross entropy spectral estimators R.C. Papademetriou
501
Generalized optimum approximation minimizing various measure of error at the same time T. Kida
507
Determination of optimal coefficients of high-order error feedback upon Chebyshev criteria A. Djebbari, Al. Djebbari, M.F. Belbachir and J.M. Rouvaen
511
Invited Session R: VLSI DSP Architectures Dynamic codelength reduction for VLIW instruction set architectures in digital signal processors M. Weiss and G. Fettweis
517
Implementation aspects of FIR filtering in a wavelet compression scheme G. Lafruit, B. Vanhoof, J. Bormans, M. Engels and I. Bolsens
521
Recursive approximate realisation of image transforms with orthonormal rotations G.J. Hekstra, E.F. Deprettere, M. Monari and R. Heusdens
525
Radix distributed arithmetic: algorithms and architectures M.K. Ibrahim
531
Order-configurable programmable power-efficient FIR filters C. Xu, C.-Y. Wang and K.K. Parhi
535
Session S: Video Coding III: Multimedia On speech compression standards in multimedia videoconferencing: Implementation aspects M. Markovic and Z. Bojkovic
541
Multimedia communication graphical user interface design principles for the teleeducation J. Turan, K. Fazekas, L. Ovsenik and M. Kovesi
545
Image and video compression for multimedia applications D.G. Sampson, E. da Silva and M. Ghanbari
549
A multilayer image coding and browsing system G. Qiu
553
Switched segmented image coding-JPEG schemes for progressive image transmission C.A. Christopoulos, A.N. Skodras, W. Philips and J. Cornelis
557
Low bit rate coding of image sequences using regions of interest and neural networks N. Doulamis, A. Tsiodras, A. Doulamis and S. Kollias
561
Session T: Image Analysis I Iterated function systems for still image processing J.-L. Dugelay, E. Polidori and S. Roche
569
Sensing Surface Discontinuities via Coloured Spots C.J. Davis and M.S. Nixon
573
Image analysis and synthesis by learning from examples S.G. Brunetta and N. Ancona
577
A stabilized multiscale zero-crossing image representation for image processing tasks at the level of the early vision S. Watanabe, T. Komatsu and T. Saito
581
Finding geometric and structural information from 2D image frames R. Jaitly and D.A. Fraser
585
Detection of small changes in intensity on images corrupted by signal-dependent noise by using the wavelet transform Y. Chitti and P. Gogan
589
Deterioration detection in a sequence of large images O. Buisson, B. Besserer, S. Boukir and L. Joyeux
593
Invited Session U: Color Processing Segmentation of multi-spectral images based on the physics of reflection N. Kroupnova
599
Using color correlation to improve restoration of colour images D. Keren, A. Gotlib and H. Hel-Or
603
Colour eigenfaces G.D. Finlayson, J. Dueck, B.V. Funt and M.S. Drew
607
Colour quantification for industrial inspection M. Petrou and C. Boukouvalas
611
Colour object recognition using phase correlation of log-polar transformed Fourier spectra A.L. Thornton and S.J. Sangwine
615
SIIAC: Interpretation system of aerial color images S. Mouhoub, M. Lamure and N. Nicoloyannis
619
Session V: Industrial Applications Nodular quantification in metallurgy using image processing V.L. Ballarin, E. Moler, F. Pessana, S. Torres and M. Gonzalez
625
Image processing in the measurement of trash content and grades in cotton B.D. Farah
629
Automated visual inspection based on fermat number transform J. Harrington and A. Bouridane
633
Segmentation of birch wood board images D.T. Pham and R.J. Alcock
637
Techniques for classifying sugar crystallization images based on spectral analysis and the use of neural networks E.S. Gonzalez-Palenzuela and P.I. Vega-Cruz
641
Large-scale tomographic sensing system to study mixing phenomena M. Wang, R. Mann, F.J. Dickin and T. Dyakowski
647
Session W: Image Analysis lI Structural indexing of infra-red images using statistical histogram comparison B. Huet and E. Hancock
653
A model-based approach for the detection of airport transportation networks in sequences of aerial images D. Sarantis and C.S. Xydeas
657
Context driven matching in structural pattern recognition S. Gautama and J.P.E D'Haeyer
661
An efficient box-counting fractal dimension approach for experimental image variation characterization A. Conci and C.F.J. Campos
665
An identification tool to build physical models for virtual reality J. Louchet and L. Jiang
669
Cue based camera calibration and its application to digital moving image production Y. Nakazawa, T. Komatsu and T. Saito
673
Session X: Signal Processing II A novel approach to phoneme recognition using speech image (spectrogram) M. Ahmadi, N.J. Bailey and B.S. Hoyle
679
Modified NLMS algorithms for acoustic echo cancellation M. Medvecky
683
Matrix polynomial computations using the reconfigurable systolic torus T.H. Kaskalis and K.G. Margaritis
687
Real-time connected component labelling on one-dimensional array processors based on content-addressable memory: optimisation and implementation E. Mozef, S. Weber, J. Jabar and E. Tisserand
691
A 2-D window processor for modular image processing applications and its VLSI implementation P. Tzionas, C. Mizas and A. Thanailakis
695
Session A: IMAGE CODING I: VECTOR QUANTISATION, FRACTAL AND SEGMENTED CODING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Joint optimization of multi-dimensional SOFM codebooks with QAM modulations for vector quantized image transmission
O. AITSAB*, R. PYNDIAH* & B. SOLAIMAN**
TELECOM BRETAGNE, B.P. 832, 29285 Brest Cedex, France. (Tel: (33) 98 00 10 70, Fax: (33) 98 00 10 98)
*Dept. S.C., **Dept. I.T.I. Email: omar.aitsab@enst-bretagne.fr
Abstract Traditionally, source coding and channel modulation characteristics are optimized separately. Source coding reduces the redundancy in an input signal (information compression), while the modulation adapts the information to the transmission channel characteristics in order to be noise resistant. In this paper, the internal structure of the source coding scheme (a self organized feature map, vector quantizer) is trained in conjunction with a QAM modulation type, in order to increase the tolerance of transmission error effects. Results obtained using the standard Lenna image are extremely encouraging.
I - Introduction

The requirements of digital transmission systems are now becoming so severe that it is no longer possible to optimize different functions in the system independently. Today, most transmission systems use the concept of coded-modulation [1] (TCM) which leads to a better spectral efficiency through the global optimization of channel coding and modulation. On the other hand, powerful source coding techniques are used to increase the number of sources transmitted in a given frequency bandwidth. However, the quality of the transmitted sources using these source coding techniques usually depends on the channel bit error rate. To go one step further, one would expect the subjective quality of the transmitted sources (image or speech) to remain acceptable even at a very low channel signal to noise ratio, as in an analogue transmission system. In this paper, the joint optimization of image coding (using vector quantization) and modulation is considered in order to minimize the effect of transmission errors on the subjective quality of the received/reconstructed images.
II - Image source coding

Recently, vector quantization (VQ) has emerged as an effective tool for image compression (source coding) [2]. In VQ, a data vector X (or a sub-image) to be encoded is represented as one of a finite set of M symbols. Associated with each symbol "i" is a reference vector (sub-image) "Ci" called a codeword. The complete set of M codewords is called the codebook. The codebook C = {Ci, i=1,2,...,M} is usually obtained through a training process using a large set of training data that is statistically representative of the data encountered in practice. In this study, the determination of the codebook is conducted using the Self Organizing Feature Map (SOFM) proposed by T. Kohonen [3]. This model builds up a mapping from the N-dimensional vector space of real numbers R^N to a two-dimensional array "S" of cells. Each cell is given a virtual position in R^N. This position (given by the synaptic weights connecting this cell to the input vector) is in fact the codeword.
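As a sketch of the VQ operations just described (an illustration, not the paper's code; function names are my own), encoding is a nearest-codeword search and decoding is a table lookup:

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Index of the codeword (row of `codebook`) closest to x in
    Euclidean distance; this is the VQ encoding rule."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def vq_encode(blocks, codebook):
    """Represent each data vector (sub-image) by a codeword index."""
    return [nearest_codeword(b, codebook) for b in blocks]

def vq_decode(indices, codebook):
    """Reconstruction is a simple lookup of the stored codewords."""
    return codebook[np.asarray(indices)]
```

With M = 256 codewords and 3x3 sub-images, as in the simulations below, each 9-pixel block is reduced to a single 8-bit index.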
The purpose of the self-organization process is to find the position vectors such that the resulting mapping (correspondence between an input vector X and the cell which lies nearest in R^N) is a topology-preserving mapping (adjacent vectors in R^N are mapped on adjacent, or identical, cells in the array "S"). The learning algorithm that forms feature maps selects the best matching (or winning) cell according to the minimum Euclidean distance between its position and the input vector X. All position vectors in the neighborhood of the winning cell are adjusted in order to make them more responsive to the current input. The quantized Lenna image using a 16x16 SOFM is given in figure 2 (Image 1). The codebook trained by the SOFM algorithm presents an internal order, which means that the Euclidean distance between codewords increases with the topological distance in the codebook (see figure 1); this order can be employed to increase error tolerance. In the next section, each codeword will be referenced by its topological position (i,j) on the SOFM.

III - Image transmission
In the case of a vector quantized image, the image transmission is done by transmitting the coordinates (i,j) of the different codewords representing the image. At the receiver end, the codewords corresponding to the received coordinates are used to reconstruct the transmitted image. It is clear that the received codeword can be different from the transmitted one when the received coordinates are subject to transmission errors. Furthermore, if we do not take any precautions, these codewords can be completely different, that is, a white block may be transformed into a black one and vice-versa ("salt and pepper" noise). This can lead to a very bad subjective quality of the received image, with black dots in white zones and vice-versa, as illustrated by Image 3 in figure 2. To reduce the effect of transmission errors on the received image, the probability of a transition between two codewords must be a decreasing function of the Euclidean distance between them. To obtain this characteristic, the internal order of the bi-dimensional (16x16) codebook obtained with the SOFM algorithm was used in conjunction with a 256QAM modulation. In this particular case, each codeword is associated with one specific point in the 256QAM constellation (see figure 1). This means that the topology of the SOFM is preserved in the modulation space. Thus, and since the symbol error probability is a decreasing function of the Euclidean distance between the constellation points, the transition probability between two codewords will be a decreasing function of the Euclidean distance between them. The performance of this approach is illustrated by Image 2 in figure 2. We observe that the subjective quality of the reconstructed image is very good for a bit error rate of 10^-2.
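The internal order exploited here is produced during SOFM training by the neighbourhood update: the winning cell and its grid neighbours are all pulled toward each input, so cells that are close on the map end up with similar codewords. A minimal sketch, with my own simplified decay schedule rather than the authors' exact algorithm:

```python
import numpy as np

def train_sofm(data, grid=(16, 16), epochs=10, lr0=0.5, sigma0=4.0, seed=0):
    """Minimal Kohonen SOFM: cells on a 2-D grid, each holding a weight
    (codeword) in R^N; winner = nearest cell, neighbours pulled toward
    the input with a Gaussian neighbourhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    H, W = grid
    n = data.shape[1]
    weights = rng.standard_normal((H, W, n)) * data.std() + data.mean()
    gy, gx = np.mgrid[0:H, 0:W]          # grid coordinates of every cell
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            # winning cell: minimum Euclidean distance to the input
            d = np.linalg.norm(weights - x, axis=2)
            wy, wx = np.unravel_index(np.argmin(d), (H, W))
            # learning rate and neighbourhood radius decay linearly
            frac = 1.0 - t / steps
            lr, sigma = lr0 * frac, max(sigma0 * frac, 0.5)
            h = np.exp(-((gy - wy) ** 2 + (gx - wx) ** 2) / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            t += 1
    return weights.reshape(H * W, n)     # the trained codebook
```

Because neighbouring cells are updated together, Euclidean distance between codewords grows with distance on the (i,j) grid, which is exactly the property the constellation mapping relies on.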
Figure 1: Mapping of the bi-dimensional (16x16) SOFM codebook onto the 256QAM constellation
However, the 256QAM modulation is rarely used in practical transmission systems. We therefore propose to transmit the codeword coordinates using a QAM modulation with a smaller number of states, for example 16QAM modulation. In this case, each coordinate is represented by 4 bits and associated with a specific point in the 16QAM constellation by using a Gray mapping. The resulting reconstructed image is shown in figure 2 (Image 3). The degradation of the image is severe because the bi-dimensional codebook is not adapted to 16QAM modulation. In order to improve the quality of the received image, we have adapted the SOFM codebook topology to the type of modulation without increasing the complexity of the modulation and source coding [4]. The main idea is to minimize the transmission error effects, so two adjacent codewords must have adjacent points in the QAM constellation. In the best case, the number of codewords is equal to the number of modulation states. This was the case with the 256QAM modulation, and the reconstructed image presented good subjective quality even at a BER of 10^-2. However, when the number of codewords is greater than the number of modulation states, the SOFM topology must be adapted to the modulation. For 16QAM modulation, a four-dimensional codebook is required, and each codeword has 4 coordinates. Each coordinate takes 4 values, and each specific constellation point is associated with two coordinates. Thus, the four-dimensional codebook is trained for 16QAM modulation. Image 4 in figure 2 shows the image reconstructed using this ordered codebook for a BER of 10^-2. We clearly observe an improvement in the subjective quality: the PSNR is 5.7 dB higher than for the unordered codebook.

IV - Simulation results
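The Gray mapping mentioned above can be sketched as follows. The binary-reflected Gray code and the amplitude levels (-3, -1, 1, 3) are common conventions assumed here, not taken from the paper; the useful property is that adjacent amplitude levels differ in exactly one bit, so the most likely symbol errors, those landing on a neighbouring constellation point, corrupt only a single bit:

```python
def gray(n):
    """Binary-reflected Gray code."""
    return n ^ (n >> 1)

# 16QAM as a 4x4 grid: two Gray-coded bits per axis, amplitude
# levels -3, -1, 1, 3 (an assumed, common labelling).
LEVELS = (-3, -1, 1, 3)
POS = {gray(k): k for k in range(4)}   # Gray label -> grid position

def qam16_point(sym):
    """Map a 4-bit symbol to its (I, Q) constellation point."""
    return LEVELS[POS[(sym >> 2) & 0b11]], LEVELS[POS[sym & 0b11]]
```

For example, `qam16_point(0)` lands on the corner point (-3, -3), and walking along one axis of the grid changes exactly one bit per step.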
We simulated the effects of transmission errors and their compensation by joint optimization of the SOFM codebook and QAM modulation in image compression [5][6], using codebooks consisting of 256 codewords for 3 by 3 pixel subimages. The codebooks were trained using two images (boat and bridge) and were tested on the Lenna image. All the images were 512 by 512 pixels, with 256 grey levels. Distortion in the decoded images was measured using a peak signal-to-noise ratio (PSNR) defined as:
$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \; \mathrm{dB},$$

where MSE is the mean square error.
V - Conclusion

The optimal association of a two-dimensional codebook containing 16x16 elements with a 256QAM modulation is very robust to transmission errors. When using a 16QAM modulation, the overall performance of the system can be improved by using a 4-dimensional codebook specifically trained for 16QAM modulation. However, we obtain lower performance than with the 256QAM constellation. This is due to the fact that in a 4-dimensional codebook of 256 elements, each codeword has 8 closest neighbors instead of 4. In this case it is difficult to minimize the VQ distortion and reduce the transmission error effect.
Figure 2: The reconstructed VQ image after transmission through a Gaussian noisy channel. Image 1: the reconstructed image without transmission errors, PSNR = 30 dB. Image 2: the reconstructed image with ordered codebook for 256QAM modulation (BER = 10^-2), PSNR = 29.1 dB. Image 3: the reconstructed image with unordered codebook for 16QAM modulation (BER = 10^-2), PSNR = 21.12 dB. Image 4: the reconstructed image with ordered codebook for 16QAM modulation (BER = 10^-2), PSNR = 26.82 dB.
References
[1] G. Ungerboeck, "Channel Coding With Multilevel/Phase Signals", IEEE Trans. on Information Theory, vol. IT-28, 1982, pp 55-67.
[2] R.M. Gray, "Vector quantization", IEEE Acoustics, Speech and Signal Processing Magazine, vol. 1, pp 4-29, Apr. 1984.
[3] T. Kohonen, Self-Organization and Associative Memory, New York, Springer-Verlag, 1984.
[4] J. Kangas, "Increasing the Error Tolerance in Transmission of Vector Quantized Images by Self-Organizing Map", ICANN 95, pp 287-291, Paris.
[5] J. Kangas and T. Kohonen, "Developments and applications of the Self-Organizing Map and related algorithms", in Proc. IMACS Int. Symp. on Signal Processing, Robotics and Neural Networks, pp 19-22, 1994.
[6] D.S. Bradburn, "Reducing transmission error effects using a self-organizing network", in Proc. IJCNN'89, Int. Joint Conf. on Neural Networks, vol. II, pp 531-537, Piscataway, NJ, 1989.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Visual Vector Quantization For Image Compression Based on Laplacian Pyramid Structure
Z. He†, G. Qiu‡ and S. Chen†
†University of Portsmouth, U.K.  ‡University of Derby, U.K.
Abstract In this paper, we propose a new image coding scheme based on the Laplacian pyramid structure (LPS) and visual vector quantization (VVQ). In this new scheme, the LPS is used to generate the residual image sequence, and the VVQ is used to code these residual images. Compared with other block-based coding methods, the new scheme produces much less blocking effect on the reconstructed image, since coding is performed on the basis of hierarchical multiresolution blocks. The new scheme also has the additional advantage of a much lower computational cost than traditional vector quantization (VQ) techniques, since encoding and decoding are based on much smaller dimensional 'visual vectors'. Experimental results show that the new scheme can achieve rate-distortion performance comparable to that of traditional VQ techniques, while its computational complexity is only a fraction of theirs.
1 Introduction
In recent years, the demand for image transmission and storage has increased dramatically, and research into efficient techniques for image compression has attracted extensive interest. Among many coding techniques, the LPS [1] and the VVQ [2] are two efficient coding techniques in terms of compression ratio, fidelity and computational expense. In this paper, we propose a new image coding scheme combining the LPS and the VVQ, which inherits the advantages of both techniques. In this new scheme, the LPS is employed to generate the residual image sequence and the VVQ is used to code these residual images. Experimental results show that the new scheme can achieve rate-distortion performance comparable to that of traditional VQ techniques, while its computational cost is much lower, since the encoding and decoding are based on much smaller dimensional 'visual vectors'. Because the coding operation is performed on the basis of hierarchical multiresolution blocks, the new scheme produces much less blocking effect on the reconstructed image than traditional VQ techniques. The remainder of the paper is organized as follows. Section 2 summarizes the LPS, and the VVQ system for coding Laplacian residual images is described in section 3. Section 4 discusses the image reconstruction. Section 5 presents experimental results and section 6 gives some concluding remarks.
2 The Pyramid Structure
The generation of the pyramid structure includes the generation of the Gaussian pyramid and the generation of the Laplacian pyramid. The process is illustrated in Fig.1.
Gaussian Pyramid Generation The original image G_0 of size M x N pixels becomes level 0 of the Gaussian pyramid. Upper-level images are generated by applying the reduction function R(.) [1], defined in (1), iteratively:

$$G_l(i,j) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m,n)\, G_{l-1}(2i+m,\, 2j+n), \qquad 0 < l \le L,\; 0 \le i < M_l,\; 0 \le j < N_l. \tag{1}$$
L is the number of levels in the pyramid, M_l and N_l are the dimensions of the l-th level, and w(m,n) are weighting kernels. Fig.2 shows a 5-level Gaussian pyramid of "Lena".
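A direct implementation of the reduction step (1) might look as follows. The 5-tap separable kernel with a = 0.4 is an assumption (the paper does not list w(m,n)); border replication is likewise one of several possible boundary treatments:

```python
import numpy as np

# Assumed 5-tap Burt-Adelson kernel with a = 0.4; the paper does not list w(m,n).
w1 = np.array([0.05, 0.25, 0.4, 0.25, 0.05])
W = np.outer(w1, w1)            # separable w(m, n), sums to 1

def reduce_(G):
    """One REDUCE step of Eqn. (1): weight a 5x5 neighbourhood, subsample by 2."""
    M, N = G.shape
    Gp = np.pad(G, 2, mode="edge")          # replicate borders
    out = np.empty((M // 2, N // 2))
    for i in range(M // 2):
        for j in range(N // 2):
            # padded indices 2i+m+2 for m = -2..2  ->  slice [2i : 2i+5]
            out[i, j] = (W * Gp[2 * i: 2 * i + 5, 2 * j: 2 * j + 5]).sum()
    return out
```

Because the kernel weights sum to one, a constant image reduces to the same constant, which is a quick sanity check on the indexing.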
Laplacian Pyramid Generation The reverse of the reduction function R(.) is the expansion function E(.) [1], defined in (2). Let G_{l,n} be the result of expanding G_l n times. Then
Figure 1: Pyramid Structure Generation
Figure 2: 5-Level Gaussian Pyramid of "Lena"
$$G_{l,n}(i,j) = 4 \sum_{m=-2}^{2} \sum_{p=-2}^{2} w(m,p)\, G_{l,n-1}\!\left(\frac{i-m}{2},\, \frac{j-p}{2}\right), \qquad 0 < l \le L,\; 0 < n \le l,\; 0 \le i < M_{l-n},\; 0 \le j < N_{l-n}, \tag{2}$$

where only terms for which (i-m)/2 and (j-p)/2 are integers are included in the sum.
The Laplacian pyramid is a sequence of residual images I_0, I_1, ..., I_L, each being the difference of two adjacent levels of the Gaussian pyramid. Thus, for 0 <= l <= L-1,

$$I_l = G_l - G_{l+1,1}, \qquad I_L = G_L. \tag{3}$$

Fig.3 shows a 5-level Laplacian pyramid of the "Lena" image generated by Eqn.(3).
Figure 4: VVQ Image Coding System
Figure 3: 5-Level Laplacian Pyramid of "Lena"
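The expansion step (2) and the residual of Eqn. (3) can be sketched as below. The same assumed a = 0.4 kernel is used as for REDUCE, and only terms where (i-m)/2 and (j-p)/2 are integers contribute:

```python
import numpy as np

w1 = np.array([0.05, 0.25, 0.4, 0.25, 0.05])   # assumed 5-tap kernel (a = 0.4)
W = np.outer(w1, w1)

def expand(G):
    """One EXPAND step (Eqn. 2): upsample by 2, interpolating with w."""
    M, N = G.shape
    Gp = np.pad(G, 2, mode="edge")
    out = np.zeros((2 * M, 2 * N))
    for i in range(2 * M):
        for j in range(2 * N):
            s = 0.0
            for m in range(-2, 3):
                for p in range(-2, 3):
                    # only integer-valued (i-m)/2 and (j-p)/2 contribute
                    if (i - m) % 2 == 0 and (j - p) % 2 == 0:
                        s += W[m + 2, p + 2] * Gp[(i - m) // 2 + 2, (j - p) // 2 + 2]
            out[i, j] = 4.0 * s       # factor 4 restores the mean level
    return out

# Residual of Eqn. (3): I_l = G_l - expand(G_{l+1})
```

Since the surviving kernel taps along each axis sum to 0.5 for both even and odd output positions, the factor 4 makes a constant image expand to the same constant.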
3 Visual Vector Quantization
The design of the VVQ coding system consists of the design of the coding-book used in the coding phase and the design of the decoding-book used in the decoding phase.

Design of Coding-book The residual image I_l of size M_l x N_l is divided into P_l x Q_l blocks of size m_l x n_l, where P_l = M_l/m_l, Q_l = N_l/n_l. The horizontal and vertical derivatives [2] of each block are calculated as
$$D_h(p,q) = \frac{1}{16} \sum_{i=0}^{3} \sum_{j=0}^{3} I_l(4p+i,\, 4q+j)\, g_h(i,j), \qquad 0 \le p < P_l,\; 0 \le q < Q_l \tag{4}$$

$$D_v(p,q) = \frac{1}{16} \sum_{i=0}^{3} \sum_{j=0}^{3} I_l(4p+i,\, 4q+j)\, g_v(i,j), \qquad 0 \le p < P_l,\; 0 \le q < Q_l \tag{5}$$
where the values of the kernels g_h and g_v can be written collectively in matrix form as

$$G_h = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \end{bmatrix}, \qquad G_v = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ -1 & -1 & -1 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix}$$
The horizontal and vertical derivatives of a block are used to form a "visual vector", (D_h, D_v), to represent the block. The visual vectors of all blocks of residual image I_l are partitioned into N_c clusters using competitive learning [3], and these cluster centers comprise the coding-book for residual image I_l.
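A sketch of Eqns. (4)-(5) for 4x4 blocks follows. The step-edge kernel matrices and the 1/16 normalisation are assumptions where the source is garbled, so treat the exact values as illustrative:

```python
import numpy as np

# Assumed edge kernels: GH responds to left/right contrast, GV to top/bottom.
GH = np.array([[1, 1, -1, -1]] * 4, dtype=float)
GV = GH.T.copy()

def visual_vectors(I, block=4):
    """Per-block (D_h, D_v) visual vectors (Eqns. 4-5), normalised by 16."""
    P, Q = I.shape[0] // block, I.shape[1] // block
    V = np.empty((P, Q, 2))
    for p in range(P):
        for q in range(Q):
            b = I[block * p: block * (p + 1), block * q: block * (q + 1)]
            V[p, q] = (b * GH).sum() / 16.0, (b * GV).sum() / 16.0
    return V
```

A uniform block maps to (0, 0), while a block with a bright left half produces a large D_h and zero D_v, which is the behaviour the clustering relies on.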
Design of Decoding-book A multilayer neural network is trained using backpropagation [4] to reproduce the residual blocks of I_l when the corresponding visual vectors are presented at the input. The decoding-book is obtained from the output of the trained network by feeding the cluster centers to the input. The network has 2 input neurons and m_l x n_l output neurons. The number of hidden-layer neurons is decided by experiment. The whole VVQ image coding system is illustrated in Fig.4.
4 Image Reconstruction
The reconstruction of the original image G_0 from the decoded residual image sequence I_0, I_1, ..., I_L is achieved by reversing the operations of the Laplacian pyramid generation as follows:

$$G_L = I_L, \qquad G_l = G_{l+1,1} + I_l \;\; (l = L-1, \ldots, 1, 0), \qquad \text{with } G_{l+1,1} = E(G_{l+1}), \tag{6}$$

where E(.) is defined as in (2).
5 Experimental Results
Two 512x512 monochrome images, "Lena" and "Peppers", with 8 bits per pixel, were used to evaluate the proposed new scheme. The "Lena" image was used to train the system and the "Peppers" image was used to test it. A 5-level pyramid structure and a 4-level pyramid structure were investigated separately. The highest-level residual image of the pyramid structure was coded directly using 8 bits per pixel. All other lower-level residual images were coded using VVQ with a 4x4 block size for levels 0 and 1 and a 2x2 block size for levels 2 and 3, respectively. The coding-book size was chosen as N_c = 9. A 3-layer neural network with 2 input neurons, 10 hidden neurons and 16 or 4 output neurons was used to generate the decoding-book, depending on whether the 4x4 or 2x2 block size was used. In the test, it was found that a large number of blocks fell into the class with the least significant edge content. Hence a variable bit rate coding strategy was used: one bit was used to code the blocks belonging to this class, and 4 bits for each of the other 8 classes. Fig.5 and Fig.7 are the images reconstructed by the proposed coding scheme using 5-level and 4-level pyramids, respectively. As a comparison, Fig.6 and Fig.8 show the images reconstructed by the traditional VQ technique LBG [5] using codebooks of size 8 and 16, respectively, with a block size of 4x4. The performance and the computational cost of the proposed new scheme and the LBG are summarized in Table 1. Experimental results show that the proposed new scheme has much less blocking effect on the reconstructed image compared with the conventional VQ technique. This results in a smoother reconstructed image, as is evident in Figs.5-8. From Table 1, it can be seen that, at similar bit rate and peak signal-to-noise ratio (PSNR), the computational requirements of the new scheme are only a small fraction of those of the LBG scheme.
6 Conclusions
A new image coding scheme has been proposed based on a combined LPS and VVQ approach. Since the coding is performed on the basis of hierarchical multiresolution blocks, the proposed scheme has much less blocking effect on the reconstructed image compared with other block-based techniques. The computational cost of the proposed scheme is also much lower than that of traditional VQ techniques, because the new scheme uses much smaller dimensional visual vectors to represent image blocks.
References [1] P. J. Burt and E. H. Adelson, " The Laplacian pyramid as a compact image code," IEEE Trans. Comm., Vol. COM-31, No. 4, pp532-540, April 1983. [2] G. Qiu, M. R. Varley and T. J. Terrell, "Image coding based on visual vector quantization," Image Processing and Its Applications, IEE Conference Publication No.410, pp301-305, July 1995 [3] T. Kohonen, Self-Organization and Associative Memory. Second Edition, Berlin, Springer-Verlag, 1988.
[4] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning internal representations by error propagation," Chapter 8 in Parallel Distributed Processing, Vol. 1, Cambridge, MA, MIT Press, 1986. [5] Y. Linde, A. Buzo and R. Gray, "An algorithm for vector quantizer design," IEEE Trans. Comm., Vol. COM-28, No. 1, pp 84-95, January 1980.

Table 1: Comparison of Proposed Scheme and LBG Coding Scheme

Scheme    | Parameters       | bitrate (bpp) | PSNR (dB) | Additions  | Multiplications
Proposed  | 5-L Pyramid      | 0.18          | 24.71     | 153,600    | 25,600
LBG       | codebook size 8  | 0.19          | 25.60     | 8,388,608  | 8,388,608
Proposed  | 4-L Pyramid      | 0.27          | 27.49     | 137,216    | 24,576
LBG       | codebook size 16 | 0.25          | 26.63     | 16,777,216 | 16,777,216
Figure 5: New scheme reconstruction (5-L pyramid). Figure 6: LBG reconstruction (codebook size 8).
Figure 7: New scheme reconstruction (4-L pyramid). Figure 8: LBG reconstruction (codebook size 16).
Kohonen's Self Organizing Feature Maps with variable learning rate. Application to image compression

A. Czihó*#, B. Solaiman*, G. Cazuguel*, C. Roux* and I. Loványi#
*Ecole Nationale Supérieure des Télécommunications de Bretagne, FRANCE, Dépt. Image et Traitement de l'Information (I.T.I.), B.P. 832, 29285 Brest Cedex
#Technical University of Budapest, HUNGARY, Department of Process Control, Budapest XI, Muegyetem rkp. 9, 1521

I. Introduction The most important task encountered in digital image transmission systems is image compression, which aims at reducing the amount of information to be transmitted. The overall goal is to represent an image with the smallest possible number of bits and thereby to speed up transmission and minimize storage requirements. In the past decade, a promising compression approach using vector quantization (VQ) [1-2] has received great attention. In this approach, images to be encoded are first divided into small n x n blocks. Each block is considered as an N-dimensional vector (with N = n^2) in the ℜ^N vector space. VQ is a mapping from ℜ^N onto a finite subset Ω of ℜ^N, where Ω = {w_1, ..., w_i, ..., w_M} is a set of prototype vectors. The set Ω is generally called the VQ codebook, and each vector w_i in Ω is called a codeword or codevector. For each source block x, a codeword w_i in Ω is selected as the representation of x, and only the address (or index) of this codeword in the codebook has to be transmitted, instead of the whole source block x in ℜ^N. The effectiveness of VQ is mainly determined by the set of codevectors. Therefore, the codebook design is a key question in this approach. It is generally performed by applying a learning method to a training set. This data base is formed by a number of blocks issued from images that are supposed to be representative of the images to be encoded.
Recently, the use of neural networks for codebook design has been investigated [3]. Kohonen's Self Organizing Feature Map (SOFM) [4] is one of the most promising neural networks for this type of application. This is mainly due to its ability to form ordered topological feature maps in a self-organizing fashion. In this paper, a codebook design approach based on a Kohonen learning algorithm with a variable learning rate is proposed. It will be referred to as the Distance-Dependent Learning Rate (DDLR) model. Simulation results show that the proposed approach is extremely promising.
II. Kohonen network and codebook design The Self Organizing Feature Map (SOFM) introduced by T. Kohonen [4] is one of the most successful vector quantization neural models for forming clusters in the input space using an unsupervised learning approach. This model builds up a mapping from the N-dimensional input vector space of real numbers ℜ^N to a two-dimensional array "S" of cells. Each cell is associated with a synaptic weight vector connecting the cell to the input vector (i.e. the codeword) in ℜ^N. The purpose of the self-organization process is to find the weight vectors such that the resulting mapping (the correspondence between an input vector x and the cell which lies nearest to it in ℜ^N) is a topology-preserving mapping (adjacent vectors in ℜ^N are mapped onto adjacent, or identical, cells in the array "S"). The basic idea behind this model is to move the weights toward the centroids of the learning set by updating the weights on each input value. The learning algorithm selects the best matching cell according to the minimum Euclidean distance between its weight vector w_k and the input vector x. This cell is referred to as the winning cell. All the weight vectors in the neighborhood of the winning cell are adjusted in order to make them more responsive to the current input. The neighborhood decreases in size with time. The updating rule is:

$$w_i(t) = w_i(t-1) + \beta(t)\,\big(x - w_i(t-1)\big), \qquad \forall i \in N_k(t) \tag{1}$$
where w_i is the weight vector of the i-th neuron, β is the learning rate that decreases with time, x is the input pattern, N_k is the neighborhood of the winning neuron and t is the time (the number of accomplished iterations).
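One update of rule (1) can be sketched as follows. The square neighbourhood of a given radius on the map grid is one common choice; the neighbourhood shape and its shrinking schedule are design decisions not fixed by the text:

```python
import numpy as np

def sofm_step(W, grid, x, beta, radius):
    """Apply Eqn. (1): move the winner and its map neighbours toward x.

    W: (M, N) weight vectors; grid: (M, 2) cell positions on the 2-D map.
    """
    k = int(((W - x) ** 2).sum(axis=1).argmin())         # winning cell
    nbhd = np.abs(grid - grid[k]).max(axis=1) <= radius  # N_k(t) on the map
    W[nbhd] += beta * (x - W[nbhd])
    return k
```

Calling this repeatedly while shrinking `radius` and decaying `beta` reproduces the usual SOFM training schedule.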
This model has shown promising results in image compression. In this case the training set consists of sub-image blocks issued from some learning images, and at the end of the learning process the synaptic weights of each cell represent a reproduction block (codeword). The topology-preserving property makes it possible to exploit the map in the so-called finite-state VQ scheme. This feature allows fast codeword searching as well. The purpose of the SOFM algorithm described above is to find position vectors that approximate the probability density function (p.d.f.) of the training set while preserving a topology. However, some blocks in a typical training set are much more frequent than others. For example, homogeneous blocks typically occupy a large portion of the image area, while blocks with large variation, such as edges, are few. Consequently, in a training set created from such images, the p.d.f. in the area of homogeneous blocks is much more important than around blocks representing edges. This produces a quite unbalanced codebook: the SOFM creates too many homogeneous codewords and not enough edge blocks. However, edges, and other parts of images occupying a relatively small portion of the image area, are visually very important. This suggests that the SOFM learning algorithm in VQ codebook generation is not visually optimal. We propose to introduce a modification to the updating rule (1) by varying the learning rate according to the distance between the input pattern and the winning cell. Rather than approximating the p.d.f. in the input space, we try to avoid creating many similar blocks while finding codewords that are visually very important, even if they are ill-represented in the training set. This idea is applied by inserting a parameter α in the updating rule as follows:

$$w_i(t) = w_i(t-1) + \alpha(x, w_k)\,\beta(t)\,\big[x - w_i(t-1)\big], \qquad \forall i \in N_k(t) \tag{2}$$

This parameter must be low if the current pattern of the training set is near the winning neuron. If the distance is large, α becomes higher. One possibility is to set α to 0 or 1 depending on a distance threshold. However, the following choice seems more adequate:

$$\alpha(x, w_k) = f\!\left( d(x, w_k) \,/\, d_{max}^{t-1} \right) \tag{3}$$

where d(.) is the Euclidean distance, d_max^{t-1} is the maximum value of the Euclidean distances between each input sample and the corresponding winning neuron, and f(.) is the function:
$$f(y) = \begin{cases} y & \text{if } y \le 1 \\ 1 & \text{if } y > 1 \end{cases}$$
With this definition the parameter α always has a value between 0 and 1. The closer the presented vector is to the winning cell, the smaller the modification applied to the weights of the winning cell and its neighborhood. However, if the training vector is far from the winning cell, a greater modification is applied. In this way, the weight updating rule is guided so as to better represent the whole training set. Including d_max^{t-1} in the parameter definition provides a simple normalization and allows the DDLR rule to adapt to the training set.
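A sketch of the DDLR rule (2)-(3); here d_max stands for the previous iteration's maximum winner distance and is supplied by the caller, and the square map neighbourhood is an assumption carried over from the classical rule:

```python
import numpy as np

def ddlr_step(W, grid, x, beta, radius, d_max):
    """Apply Eqns. (2)-(3): scale the update by alpha = f(d(x, w_k) / d_max)."""
    d = np.linalg.norm(W - x, axis=1)
    k = int(d.argmin())                      # winning cell
    alpha = min(d[k] / d_max, 1.0)           # Eqn. (3): f clips the ratio at 1
    nbhd = np.abs(grid - grid[k]).max(axis=1) <= radius
    W[nbhd] += alpha * beta * (x - W[nbhd])  # Eqn. (2)
    return k, alpha
```

Patterns already well represented (small winner distance) barely move the map, while rare, distant patterns such as edge blocks receive the full learning rate.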
III. Simulation results In order to justify the proposed DDLR model, two tests are proposed. First, the approach is tested using an artificial training set of two-dimensional vectors given in Fig.1a, where 12 Gaussian clusters containing 200 points and 4 clusters containing only 20 points are created. Figs.1b and 1c show the synaptic weights of a 4x4 SOFM using the classical and the DDLR models, respectively. It is clearly shown that in the second case the map has also found the four poorly represented clusters.
Fig. 1. Simulation results with the artificial training set
The second simulation was done using medical images. Training vectors were issued from four ultrasonographic endoscopy images. The image of this type presented in Fig.2 was subsequently compressed. The VQ was done using both the classical and DDLR approaches. Since we used a 4x4 block size and 16x16 maps, the codebooks contain 256 16-dimensional vectors. This corresponds to a bit rate of 0.5 bit/pixel, without applying any entropy code. A comparison of the two constructed SOFMs with quantitative data is given in Table I. The resulting codebooks and the reconstructed images are presented in Fig.3 and Fig.4, respectively.
Fig. 2. Original image
The effect of our proposed modification is clearly shown in Fig.3. The classical learning rule provides unnecessarily many completely dark and homogeneous codewords, which is due to the large proportion of this type of block in the training set. The DDLR model avoids this problem and permits obtaining light blocks as well as large-variation codewords.
Fig. 3. Generated codebook provided by the classical as well as the DDLR rules
          Classical model   DDLR model
Max wins  1241              2838
Min ED    0.38              4.48
Max ED    164.45            124.10

Table I. Comparison of the two SOFMs
The way the created SOFMs represent the training set is also illustrated in Table I through some numeric data. These data are: the number of wins of the most active cell (i.e. the cardinality of the cluster containing the most of the 13464 training vectors, Max wins), and the minimum and maximum Euclidean distances between each training block and its winning codeword (Min ED and Max ED). These results show that in the DDLR case the most frequent codeword represents a cluster that contains many more training blocks. In fact, this codeword is the completely black one. Representing many similar black blocks with one codeword permits the creation of small but visually important clusters. This also means that the worst case (maximum distortion) is better represented, at the cost of degrading the quality of reproduction in the better cases. The compressed images are shown in Fig.4 and a comparison in terms of objective measures is given in Table II. Even though our aim was to improve visual compression quality, an objective distortion criterion such as peak signal-to-noise ratio (PSNR) is also moderately increased. Its definition is:
$$PSNR = 10 \log_{10} \frac{255^2}{\frac{1}{T}\sum_{i=1}^{T} (x_i - x'_i)^2} \; \mathrm{dB}$$

where 255 is the peak signal value, T denotes the total number of pixels, and x_i and x'_i denote the original and reproduction pixels, respectively. The subjective image quality is enhanced when applying the DDLR method (see Fig.4): the blocking artifact is reduced (see the smallest circle in the middle or the contour of the oesophagus wall) and the details are less masked in the visually important areas. However, it is also visible that the quality of the second and fourth circles is slightly degraded. This is because these parts are quite dark and therefore these blocks are clustered with other dark blocks. However, these circles do not belong to the oesophagus and therefore this degradation is an acceptable cost considering the quality improvement in other important areas of the image. Another interesting effect of our model is that the block entropy of the compressed images is decreased (see Table II). The entropy is defined as

$$E = -\sum_{i=1}^{256} p_i \log p_i$$

where p_i is the occurrence probability of the i-th block. The smaller entropy indicates that, by applying an entropy coding method (e.g. Huffman coding) on the block indexes, a greater compression ratio could be obtained while using the DDLR approach.

          Classical model   DDLR model
PSNR      27.78             27.91
Entropy   7.39              6.25

Table II. Image compression quantitative results
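The block entropy can be computed directly from the stream of transmitted block indexes. This sketch uses log base 2 (bits per index), an assumption consistent with the Huffman-coding context:

```python
import numpy as np

def block_entropy(indices, n_codewords=256):
    """First-order entropy of the block indexes, in bits per index."""
    counts = np.bincount(np.asarray(indices), minlength=n_codewords)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A uniform distribution over 256 codewords gives the maximum of 8 bits per index; the more unbalanced the index usage, the lower the entropy and the more an entropy coder can save.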
Fig. 4. Reconstructed images using the classical (a) and the DDLR (b) codebooks
IV. Conclusions In this paper we proposed a modification of Kohonen's learning algorithm in order to improve the visual quality of VQ codebooks. Using the Distance-Dependent Learning Rate, the codebook becomes more balanced and contains a larger variety of codewords. This was shown through an artificial as well as a real training set. The main effect of the proposed approach is the improvement of the visual quality of compressed images while decreasing the entropy, which was illustrated by the application to medical images.
[1] N.M. Nasrabadi and R.A. King, "Image coding using vector quantization: a review", IEEE Trans. Com., Vol. 36, pp. 957-971, Aug. 1988.
[2] R.M. Gray, "Vector Quantization", IEEE ASSP Mag., pp. 4-29, Apr. 1984.
[3] R.D. Dony and S. Haykin, "Neural network approaches to image compression", Proceedings of the IEEE, Vol. 83, No. 2, February 1995.
[4] T. Kohonen, Self-Organization and Associative Memory, 3rd ed., Springer-Verlag, 1989.
An Efficient Training Algorithm Design For General Competitive Learning Neural Networks

J. Jiang, Department of Computer Studies, Loughborough University, United Kingdom
D. Butler, School of Engineering, Bolton Institute, United Kingdom

Abstract:
This paper presents an efficient algorithm design for all competitive learning based image compression neural networks. Like conventional vector quantization algorithms, this type of LVQ neural network computes a Euclidean distance to select a winner each time a block of image pixels is processed. The proposed algorithm introduces a simplified distance definition as a pre-test to exclude most of the neurons that are unlikely to be the winner throughout the training cycle in the competitive learning process. This provides a significant efficiency improvement over the standard algorithm.
Keywords: Neural networks, image compression and algorithm design.
1. Introduction Vector quantization has been used as a major technique in image coding and compression for many years. It attracts numerous publications and research interest each year, and one area of research is the design of fast algorithms to improve its computing efficiency and running speed in constructing the optimised code-book. A straightforward implementation of a VQ algorithm uses the full-search method, which involves an exhaustive search of the distances to all available centroids. In this way, the complexity of processing each input vector grows exponentially with its dimension N and the overall bit rate. Hence, numerous ideas and methods have been developed to address the issue under the principle of the so-called nearest neighbour search [1-4]. In summary, all previous work can be classified into two basic groups. One is to seek a sub-optimal solution which is almost as good as the full search in terms of mean square error (MSE), instead of solving the nearest neighbour search problem itself. The other is to use a tree-structured code-book search to divide the full search into a number of stages, where each stage excludes a substantial subset of the candidate vectors with a relatively small number of operations. Based on the general competitive learning algorithm, a family of LVQ neural networks has been developed for direct image vector quantization to achieve data compression [5-7]. The basic idea of the neural network image compression system is illustrated in Fig. 1. Input images are split into blocks of 4x4, 8x8 or 16x16 pixels which constitute the input vectors for the neural networks. A number of sample images are often used to train the network and obtain the best possible code-book, represented by the neuron coupling weights {w_ij: i = 1, 2, ..., N; j = 1, 2, ..., M}, where N is the dimension of each codeword and M is the code-book size. The code-book can also be described by M codewords of N-dimensional vectors {W_j: j = 1, 2, ..., M}.
No matter how differently each individual neural network is developed, the basic training algorithm can be summarised as follows:

Step 1 Initialisation of Neuron Weights: The M neuron coupling weights are initialised as the starting code-book: W_j(0); j = 1, 2, ..., M.

Step 2 Competition: Compute the distances D_ij = d(X_i, W_j(t)), j = 1, 2, ..., M. The winning neuron k is selected with D_ik = min_j D_ij.

Step 3 Learning and Updating: W_k(t+1) = W_k(t) + α(t) I_k(t) [X_i - W_k(t)], where α(t) is the learning rate at iteration t, and I_k(t) is a scaling function specifying the sign and magnitude of the difference vector being updated for the winning neuron k.

Step 4 Termination: Repeat steps 2-3 until the terminating criterion is met.

Figure 1: Image compression neural network.
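Steps 1-4 can be sketched as a minimal competitive learning loop. Here I_k(t) = 1, a linearly decaying learning rate, and a fixed epoch count as the terminating criterion are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_clvq(X, M, epochs=20, lr0=0.5):
    """Basic competitive learning: Steps 1-4 of the summarised algorithm."""
    W = X[rng.choice(len(X), M, replace=False)].astype(float)  # Step 1: init
    T, t = epochs * len(X), 0
    for _ in range(epochs):
        for x in X:
            d = ((W - x) ** 2).sum(axis=1)           # Step 2: distances D_ij
            k = int(d.argmin())                      # winning neuron k
            W[k] += lr0 * (1 - t / T) * (x - W[k])   # Step 3: update winner
            t += 1
    return W                                         # Step 4: fixed epoch count
```

On data drawn from well-separated clusters, the returned weights settle near the cluster centres, i.e. the code-book of the quantizer.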
Without an efficient algorithm, the training often takes a long computing time. As a fundamental problem for all VQ algorithms, whether conventional VQ or LVQ neural networks, the Euclidean distance calculation or its modified version can always be identified as the major bottleneck:

$$d(X_i, W_j) = \sum_{n=1}^{N} (x_{ni} - w_{nj})^2 \tag{1}$$
According to the standard definition of the Euclidean distance given in (1), its computation requires N multiplications and (2N-1) additions/subtractions. Hence, for each input vector, the distance calculation in training the neural network takes M x N multiplications and M x (2N-1) additions/subtractions. For one image sample of size 256 x 256 with N = 4 x 4, the number of input vectors will be 4096. Considering that the training of neural networks often takes a group of T images, the total computation cost will require at minimum 4096 x T x M x N multiplications and 4096 x T x M x (2N-1) additions, without considering repeated training cycles. It is clear that efficiency in neural network training is an important issue. Although various fast searching algorithms [1-4] have been developed for conventional vector quantization as discussed above, direct introduction of those techniques often makes the whole training algorithm more sophisticated, as all the neurons have to be organised into tree structures. In this paper, we propose a substantially simplified distance calculation as the solution to the problem, without involving any tree structures or reorganising the neurons. The rest of the paper is arranged as follows: section 2 describes the new algorithm design, and section 3 reports the experimental results and gives conclusions.
2. Algorithm Design

To improve the efficiency of neural network training, we expand equation (1) into:

d(X_i, W_j) = Σ_{n=1}^{N} x_ni² + Σ_{n=1}^{N} w_nj² - 2 Σ_{n=1}^{N} x_ni w_nj = ||X_i||² + ||W_j||² - 2 Σ_{n=1}^{N} x_ni w_nj    (2)
where X_i = [x_1i, x_2i, ..., x_Ni] and W_j = [w_1j, w_2j, ..., w_Nj] are N-dimensional vectors corresponding to the input image block and the neuron coupling weights. The first term in the above equation, ||X_i||², is derived from the input vector. It is a constant and has nothing to do with the competitive learning operation. Hence the equation can be further arranged as:
d(X_i, W_j) - ||X_i||² = ||W_j||² - 2 Σ_{n=1}^{N} x_ni w_nj    (3)
To select a winning neuron, we only need to calculate equation (3) rather than (1). Further analysis shows that the term ||W_j||² can be pre-calculated and stored in the network prior to the competitive learning search, since it is only related to the neurons. The last term in equation (3), Σ_{n=1}^{N} x_ni w_nj, relates to both the input vector and the neuron weights. It plays an important role in selecting the winner at each iteration. However, when all the vector elements x_ni and w_nj are positive, we have:

Σ_{n=1}^{N} x_ni w_nj ≤ X_max Σ_{n=1}^{N} w_nj    (4)
where X_max corresponds to the maximum element in the input vector X_i. This can also be pre-selected before the training starts. Therefore, if we define:

D(X_i, W_j) = d(X_i, W_j) - ||X_i||² = ||W_j||² - 2 Σ_{n=1}^{N} x_ni w_nj    (5)

as the modified distance between X_i and W_j, then equation (4) can be used to define an approximate distance D̄(X_i, W_j) as follows:

D̄(X_i, W_j) = ||W_j||² - 2 X_max Σ_{n=1}^{N} w_nj ≤ D(X_i, W_j)    (6)
Since both terms, ||W_j||² and Σ_{n=1}^{N} w_nj, can be pre-calculated and stored in the network, equation (6) can be further simplified as:

D̄(X_i, W_j) = A_j - X_max B_j    (7)

where A_j = ||W_j||² and B_j = 2 Σ_{n=1}^{N} w_nj. These terms only relate to the neurons in the network, and hence are constants as far as the competitive learning is concerned. After all these modifications, an efficient algorithm can be designed to complete step 2 of the general competitive learning procedure given in the first section:
Step 2:
    min_distance = A_1 - X_max B_1;  winner = 1;
    for (j = 2; j <= M; j++) {
        distance = A_j - X_max B_j;
        if (distance < min_distance) {
            distance = A_j - 2 Σ_{n=1}^{N} x_ni w_nj;
            if (distance < min_distance) {
                winner = j;
                min_distance = distance;
            }
        }
    }
For most of the neurons, only the approximate distance D̄(X_i, W_j) is calculated, which requires 1 multiplication and 1 subtraction. The real distance D(X_i, W_j) is not needed until the condition D̄(X_i, W_j) < min_distance is satisfied. Under this circumstance, the calculation takes N+1 multiplications and N additions/subtractions. In the real distance calculation, further efficiency can be achieved when the partial distortion technique is incorporated [8]. In step 3, learning and updating occur to W_j as well as A_j and B_j. Since this updating only concerns one winning neuron, the overall training algorithm is more efficient than the standard one. Finally, to make equation (3) correct, all the elements in X and W are required to be positive. The simplest solution is to add a positive offset to the two vectors [3]. Hence the new vectors become:
x'_ni = x_ni + |p|,    w'_nj = w_nj + |p|,    n = 1, 2, ..., N

It is not difficult to prove that this modification will not affect the calculation of distances between X and W, namely:

d(X'_i, W'_j) = Σ_{n=1}^{N} (x'_ni - w'_nj)² = d(X_i, W_j)    (8)
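As an illustration, the training-step search described above can be sketched in NumPy; the function and variable names are ours, not the paper's, and here the first neuron's exact modified distance seeds the search, so the result provably matches an exhaustive Euclidean search whenever all elements are positive:

```python
import numpy as np

def fast_winner(x, W, A, B):
    """Winner search using the approximate distance D_bar_j = A_j - x_max * B_j
    (Eq. (7)) to reject most neurons before the full dot product is computed.
    Requires x >= 0 and W >= 0, with A_j = ||W_j||^2 and B_j = 2 * sum_n w_nj."""
    x_max = x.max()
    winner = 0
    min_dist = A[0] - 2.0 * np.dot(x, W[0])   # exact modified distance for neuron 0
    for j in range(1, W.shape[0]):
        approx = A[j] - x_max * B[j]          # 1 multiplication, 1 subtraction
        if approx < min_dist:                 # only now pay for the full product
            real = A[j] - 2.0 * np.dot(x, W[j])
            if real < min_dist:
                winner, min_dist = j, real
    return winner

rng = np.random.default_rng(0)
M, N = 128, 16
W = rng.random((M, N))                        # neuron weights, all positive
A = (W ** 2).sum(axis=1)                      # A_j = ||W_j||^2, precomputed
B = 2.0 * W.sum(axis=1)                       # B_j, updated only for the winner
x = rng.random(N)                             # input block, all positive

# agrees with the exhaustive Euclidean search
assert fast_winner(x, W, A, B) == int(np.argmin(((W - x) ** 2).sum(axis=1)))
```

Because D̄_j ≤ D_j, a neuron whose approximate distance already exceeds the current minimum can never be the winner, so skipping it is safe.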
3. Experimental Results and Conclusions

To assess the efficiency improvement, we tested the proposed algorithm by training the general competitive learning neural network with Lena (256 x 256), as shown in Fig. 2. When M = 128, the total numbers of multiplications and additions/subtractions obtained are given in Table I. The experimental results are compared with the straightforward implementation using the full Euclidean distance calculation, given as the standard algorithm in Table I. The efficiency improvement is expressed as the percentage obtained by dividing the figures for the proposed algorithm by those for the standard algorithm. It is clear that the advantage of the proposed algorithm over the full Euclidean distance one is significant.
Table I. Experimental results

    Computing cost    Proposed algorithm    Standard algorithm    Efficiency improvement
    multiplication    3119513               8388608               37%
    addition          5714739               16252928              35%
In this paper, we have designed an efficient algorithm for training competitive learning neural networks, with image compression examples. The efficiency improvement is significant in comparison with the straightforward full-search implementation, without affecting the optimisation of the final code-book. The basic idea introduced is to define an approximate version of the Euclidean distance and use this approximation to exclude most of the neurons from being the possible winner in the training process. The approximate Euclidean distance only requires one multiplication and one subtraction, which is substantially simplified in comparison with both the standard definition and other simplified versions [4].
Figure 2. Image sample

References
[1] Linde Y., Buzo A. and Gray R., 'An algorithm for vector quantizer design', IEEE Trans. on Communications, Vol. COM-28, No. 1, pp. 84-95, 1980.
[2] Lee C.H. and Chen L.H., 'Fast closest codeword search algorithm for vector quantization', IEE Proceedings - Vision, Image and Signal Processing, Vol. 141, No. 3, pp. 143-148, June 1994.
[3] Katsavounidis I., Kuo C.C.J. and Zhang Z., 'Fast tree-structured nearest neighbor encoding for vector quantization', IEEE Trans. on Image Processing, Vol. 5, No. 2, pp. 398-404, February 1996.
[4] Torres L. and Huguet J., 'An improvement on codebook search for vector quantization', IEEE Trans. on Communications, Vol. 42, No. 2/3/4, pp. 208-210, 1994.
[5] Ahalt S.C., Krishnamurthy A.K. et al., 'Competitive learning algorithms for vector quantization', Neural Networks, Vol. 3, pp. 277-290, 1990.
[6] Chung F.L. and Lee T., 'Fuzzy competitive learning', Neural Networks, Vol. 7, No. 3, pp. 539-551, 1994.
[7] Fang W.C., Sheu B.J. et al., 'A VLSI neural processor for image data compression using self-organization networks', IEEE Trans. on Neural Networks, Vol. 3, No. 3, pp. 506-519, 1992.
[8] Bei C.D. and Gray R.M., 'An improvement of the minimum distortion encoding algorithm for vector quantization', IEEE Trans. on Communications, Vol. COM-33, October 1985.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Architecture Design for Polynomial Approximation Coding of Image Compression

Chung-Yen Lu and Kuei-Ann Wen

Abstract

Polynomial approximation coding (PAC) is well-known for image compression. A 2-D natural polynomial is chosen to approximate the shape of block images, and the polynomial coefficients are computed by a regressive method. A fast method for calculating the regressive coefficients for blocks of size 8x8 is derived, so that the encoder design for the PAC algorithm can achieve not only small-area but also high-speed advantages. The architecture of PAC is presented in this paper together with its performance analysis. It will be shown that PAC is well suited for very low bit rate transmission.
I. Introduction

Polynomial approximation coding (PAC) is derived from the polynomial regression technique. A set of polynomial coefficients is obtained by regression from a set of image data, and the coefficients are then used to represent that set of image data. In comparison, the bits for the coefficients are fewer than the bits for the original image. Although polynomial regression (PR) [1]-[2] is a well-known technique, it is not widely applied in image compression. This is because the high frequency components of block images are abandoned automatically by the PR method, so the coding quality of PR is worse than that of transform-based algorithms, e.g. JPEG [3]. However, for low bit-rate compression applications, most of the high frequency components are quantized to zero in transform-based algorithms as well, and the performance degradation of transform-based algorithms brings them close to PR. Since a simplified computation method for PR is derived here, the architecture of PR is much simpler than most fast DCT algorithms. Hence, a high-speed and low-cost encoding process can be provided by PAC.
II. PAC system

There are three processes in the PAC encoding system: the regression coefficients estimator (RCE), quantization and variable-length coding. An overall system model of PAC is illustrated in Fig. 1.

Fig. 1 The overall system of PAC

At the input to the encoder, source image samples are grouped into square blocks of size 8x8 and fed into the regression coefficient estimator (RCE). At the end of the decoder, the regression surface generator (RSG) outputs 8x8 sample blocks to form the reconstructed image. Each 8x8 block of source image samples is effectively a 64-point discrete signal, which is approximated by a function of two spatial dimensions x and y. The RCE takes such a signal as its input and obtains the parameters of the surface by the regression method. The output of the RCE is a set of regression coefficients whose values are uniquely determined by the particular 64-point input signal.
A 2-D polynomial regression equation in PAC is expressed as:

p(x,y) = β0 + β1 x + β2 y + β3 x² + β4 y² + β5 xy + β6 x²y + β7 xy² + β8 x²y²    (1)

The regressive model is expressed in matrix form as the following equation:

F = XBY' + e    (2)

where

F = [ f(0,0) f(0,1) ... f(0,7) ; f(1,0) f(1,1) ... f(1,7) ; ... ; f(7,0) f(7,1) ... f(7,7) ] is the image data of size 8x8,

B is the 3x3 matrix of the polynomial coefficients β0, ..., β8, arranged so that p(x,y) = [1 x x²] B [1 y y²]',

X = [ 1 x0 x0² ; 1 x1 x1² ; ... ; 1 x7 x7² ] and Y = [ 1 y0 y0² ; 1 y1 y1² ; ... ; 1 y7 y7² ] are the position matrices, and

e = [ e00 e01 ... e07 ; e10 e11 ... e17 ; ... ; e70 e71 ... e77 ] is the matrix of error terms.

The least squares normal equations for the linear regression model are

X'FY = (X'X) B (Y'Y)    (3)

and the least squares estimate of the polynomial coefficients is given by

B = [(X'X)⁻¹ X'] F [Y (Y'Y)⁻¹]    (4)
Let x_i = y_i = i - 3.5, i = 0, 1, ..., 7; then we observe X = Y. Letting G = (X'X)⁻¹ X', we obtain the simple form from Eq. (4):

B = G F G'    (5)

where G is the generator matrix which is used to compute the 1-D polynomial coefficients. From Eq. (5), the 2-D polynomial coefficients can be computed by the row-column decomposition method. The block diagram is illustrated in Fig. 2: the block image data pass through a 1-D regression (row operation), a transpose buffer, and a 1-D regression (column operation), yielding the 2-D polynomial coefficients.

Fig. 2 The row-column architecture of RCE

Each row vector of the generator matrix G is used to compute one of the 1-D polynomial coefficients; the rows are equivalent to products of scaling factors S_i and kernel vectors W_i, i = 0, 1, 2:
G = [ S0 W0 ; S1 W1 ; S2 W2 ]    (6)

where

S0 = 1/32,   W0 = [-3  3  7  9  9  7  3 -3]
S1 = 1/84,   W1 = [-7 -5 -3 -1  1  3  5  7]
S2 = 1/168,  W2 = [ 7  1 -3 -5 -5 -3  1  7]
Since the scaling operations can be merged with the quantization process, only the computation of the W kernels needs to be implemented. The weighting coefficients of W are easily implemented with addition and shift operations. The architecture of the row computation for the RCE is illustrated in Fig. 3.
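As an illustrative sketch (not the paper's hardware implementation), the generator matrix G and the row-column computation B = GFG' can be reproduced in NumPy; the check below verifies that a surface lying in the degree-2 model's span is recovered exactly, and that the linear-term row of G reduces to an integer shift/add kernel:

```python
import numpy as np

# position vectors: x_i = y_i = i - 3.5, i = 0..7, so X = Y
xs = np.arange(8) - 3.5
X = np.stack([np.ones(8), xs, xs ** 2], axis=1)   # 8x3 design matrix [1, x, x^2]
G = np.linalg.inv(X.T @ X) @ X.T                  # 3x8 generator matrix

def rce(F):
    """Row-column regression coefficient estimator: B = G F G' (Eq. (5))."""
    return G @ F @ G.T

# sanity check: a surface the model represents exactly is recovered
xg, yg = np.meshgrid(xs, xs, indexing="ij")
F = 3 + 0.5 * xg - 1.25 * yg ** 2 + 0.1 * xg ** 2 * yg ** 2
B = rce(F)
F_hat = X @ B @ X.T                               # regression surface generator
assert np.allclose(F_hat, F)

# the linear-term row of G is an integer kernel scaled by 1/84
assert np.allclose(G[1] * 84, 2 * xs)             # [-7 -5 -3 -1 1 3 5 7] / 84
```

The same least-squares construction yields the constant and quadratic rows as integer kernels scaled by 1/32 and 1/168 respectively, which is what makes a multiplier-free, add-and-shift row computation possible.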
Fig. 3 The pipelined architecture for the row computation of the polynomial coefficients

III. Performance and architecture complexity of PAC
PAC is an approximation technique for image representation. It is not the same as an orthogonal transform, e.g. the DCT, which can preserve high-frequency coefficients. For low bit-rate compression, however, the coding performance is comparable to transform coding. We illustrate the coding performance in terms of the peak signal-to-noise ratio (PSNR) in Table 1, where

PSNR = 10 log10 ( 255² / [ (1/(MN)) Σ_x Σ_y ( f(x,y) - f̂(x,y) )² ] )

and f(x,y), f̂(x,y) are the original image and the coded image respectively.

Table 1. The coding performance of LENA by JPEG and PAC

    Rate (bits/pixel)    0.5 bpp    0.4 bpp    0.25 bpp    0.2 bpp    0.16 bpp
    JPEG images          34.8 dB    33.1 dB    30.4 dB     28.1 dB    26.3 dB
    PAC images           31.2 dB    30.7 dB    29.5 dB     28.2 dB    26.7 dB
It is shown in Table 1 that PAC is a good candidate to replace JPEG in very low bit rate applications. First, when low bit-rate compression is required, the performance of PAC is close to that of JPEG. Second, the simple and fast computation process provides high-speed and low-area implementations. The architecture complexity of PAC is compared with DCT-based algorithms. The key issue is the number of operations in the DCT and the RCE. For a 1-D 8-point DCT, there are at least 12 multiplications and 26 additions, but for the 1-D 8-point RCE there are only 22 additions. The numbers of multiplications and additions in several DCT algorithms and in the RCE are given in Table 2.

Table 2. The comparisons of architecture complexity in DCT and RCE (1-D 8-point)

                          Chen's [4]    Lee's [5]    Chan's [6]    RCE
    No. of multipliers    16            12           12            0
    No. of adders         26            29           29            22
IV. Conclusions

PAC is well suited for low bit-rate image compression, because its performance is close to that of the JPEG standard under high compression ratios. Besides, the complexity of PAC is much lower than that of DCT-based algorithms. A high-speed and small-area encoder can therefore be provided by PAC.

References
[1] M. Eden, M. Unser, and R. Leonardi, "Polynomial representation of pictures," Signal Processing, Vol. 10, No. 4, 1986, pp. 385-393.
[2] M. Kocher and R. Leonardi, "Adaptive region growing technique using polynomial functions for image approximation," Signal Processing, Vol. 11, No. 1, July 1986, pp. 47-60.
[3] G. K. Wallace, "The JPEG still picture compression standard," IEEE Trans. Consumer Electronics, Vol. 38, No. 1, Feb. 1992, pp. xviii-xxxiv.
[4] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., Vol. COM-25, pp. 1004-1009, Nov. 1977.
[5] B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-32, pp. 1243-1245, Dec. 1984.
[6] Y. Chan and W. Siu, "A cyclic correlated structure for the realization of discrete cosine transform," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, Vol. 39, No. 2, pp. 109-113, Feb. 1992.
Application of Shape Recognition to Fractal Based Image Compression S. Morgan, A. Bouridane Department of Computer Science The Queen's University of Belfast Belfast BT7 1NN Northern Ireland
Keywords: fractal, image compression, shape recognition, Sobel operators.

ABSTRACT

This paper describes the application of fractal geometry and shape recognition to still-frame image compression. The technique is based on edge detection using Sobel operators to identify the high frequency components of an image. Using this information, image pieces are classified according to detail, and the Partitioned Iterated Function System (PIFS) is computed by searching the appropriate class. Further exploitation of redundant image data is achieved by identifying shapes within each range block and compressing, using only a subsample of that range, when the block contrast is low. The result is a decrease in encoding time while maintaining image fidelity. The technique has been implemented successfully using a set of test images of various contrast levels.
INTRODUCTION

With the advent of technologies such as multimedia systems and the Internet, there is a practical need for image compression tools. These tools can improve effective bandwidth by providing rapid transmission speeds and reduced storage requirements for digital images. There are many still-frame compression algorithms available; however, methods such as JPEG and wavelets rely on storing only the low frequency components of a signal and eliminate the high frequency components. This results in the loss of sharp edges, causing a blurring effect when applied to high contrast images [1]. Fractal compression algorithms overcome this problem by applying the concepts of Iterated Function Systems (IFS) theory and focusing on the self-similarity in real-world images [2]. Images are viewed as a collage of self-similar parts that can be mapped onto each other using concepts of IFS theory. An IFS consists of a set of N contractive affine transformations, denoted by { a1, a2, ..., aN }, on a subset of points in the plane R² (i.e. an image F). This set of N transformations defines a map A(F) which approximates the original image:

F ≈ A(F) = ∪_{i=1}^{N} a_i(F),  where a_i : R² → R²    (1)

Each transformation a_i is contractive, thus A is also contractive, determining the structure of the unique attractor [4]. The encoding process involves partitioning an image into a collection of range blocks r_i ∈ R and domains d_i ∈ D, with certain restrictions applied [3]. Then, for each range, the IFS code of the domain block that most resembles it is computed. The problem is finding a minimum metric distance between each range and domain in an acceptable time period, known as the "inverse problem". Current fractal compression algorithms consist of a partitioning scheme coupled with some classification method. Though effective, these approaches do not exploit the redundancy within range blocks, which are limited in size by the partition scheme. A technique is required to identify and take advantage of this redundancy by using only a subset of the range block information, thus improving encoding times. In this paper we describe an image block classification routine based on shape recognition, using the cumulative measure of a block's gradient magnitude obtained by applying Sobel operators. This provides a measure of the block contrast along with the degree of redundancy. In the case of low contrast range blocks, a subsample of the pixels can be used for compressing the image block. This improves compression speeds, as the number of computations is reduced.
THE TECHNIQUE IN DETAIL

When determining the PIFS codes to encode an image, the problem is to find, within a reasonable timespan, the range-domain block combination with a minimum or acceptable metric distance d_metric, where

d_metric( F ∩ (r_i ∈ R), a_i(F) ) ≤ L    (2)
The variable L can be a fixed constant, a minimum value relative to the problem domain, or a combination of both. The mapping of domain to range blocks is achieved using contractive affine transformations. The inference that the set of maps { a1, a2, ..., aN } is contractive means there is a single fixed point solution x_w which is independent of the original image, as proved by the Contractive Mapping Fixed-Point Theorem:

x_w = A(x_w) = lim_{n→∞} A^n(x)    (3)

During decompression the image A(F) is successfully reconstructed with the help of the Collage Theorem, as defined in equation (4):

h(S, x_w) ≤ (1 / (1 - s)) h(S, A(S))    (4)
given a metric space (E, h) with contractivity factor s and fixed point x_w, such that S ∈ E and A(S) is the collage of the image [3]. This ensures that the contractive affine mappings for all range blocks, when recursively applied to any arbitrary-valued initial image, will piece together forming A(F) = F. The process of finding A(F) is computationally intensive, and improvements are constantly being sought to address this problem. The essence of the proposed technique is the reduction in size of the set of domain blocks D by the application of an edge or "shape" based classification scheme using Sobel gradient operators and multiple thresholds [5]. The basic algorithm consists of a quadtree partitioning scheme with a maximum block size of 2⁶ and a minimum size of 2² pixels, with a domain block spacing of 2 pixels. The compression process involves convolving each domain block with both horizontal and vertical Sobel operators. The combined gradients ∇f provide a single measure of the strength of the edge and are accumulated over all edges within each block.
∇f = |Gx| + |Gy|    (5)

where

Gx = [ -1 0 1 ; -2 0 2 ; -1 0 1 ]   and   Gy = [ -1 -2 -1 ; 0 0 0 ; 1 2 1 ]
The cumulative gradient values of all blocks of a particular size are then sorted in order of gradient magnitude and divided into N equal classes. For the purpose of this paper, N was chosen to be four. The class thresholds t_i ∈ T are the cumulative gradient values at positions 0.25, 0.5 and 0.75 in the ordered list of gradients. The thresholds are sensitive to the image being compressed and, more importantly, to the other image blocks of the same size. This ensures automated classification relative to the image in question. The computational overhead of sorting the gradients is minimal when using a quicksort routine and is justified later, when the size of the set D being searched is reduced by a factor of N. As each range r_i is being compressed, it is first analysed to determine whether it contains any objects, using edge detection. If there are objects, its cumulative gradient value is checked against the table of thresholds t_i ∈ T, the range r_i is classified, and it is then evaluated against all d_i within the same class and size. Each domain being evaluated undergoes 8 transformations, consisting of 4 rotations, a flip and a further 4 rotations. If a satisfactory match cannot be found, the range is divided into 4 and each quadrant evaluated once more. If, however, there are no objects, then the block is of low contrast and only a subsample of the range block pixels is used for compression. The ratio of the block size to the subsample size is 4:1, providing the reduction is not below the minimum acceptable partition size (i.e. 4x4), in which case the reduction defaults to the minimum size. Next the range is classified and the domain pool searched as described previously. The size of the domains being searched is that of the subsample size and not the original block size. This helps improve compression times for low contrast images. After finding the best range-domain match, the affine map is written to file.
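The classification step described above can be sketched in Python/NumPy (function names and the test data are ours, not the paper's): each block's cumulative |Gx| + |Gy| gradient is accumulated over 3x3 windows, and the sorted values are cut at the quarter, half and three-quarter positions into four classes:

```python
import numpy as np

GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

def cumulative_gradient(block):
    """Sum of |Gx| + |Gy| over all interior 3x3 windows of a block."""
    h, w = block.shape
    total = 0.0
    for r in range(h - 2):
        for c in range(w - 2):
            win = block[r:r + 3, c:c + 3]
            total += abs((win * GX).sum()) + abs((win * GY).sum())
    return total

def classify(blocks, n_classes=4):
    """Class thresholds at the .25/.5/.75 positions of the sorted gradients."""
    grads = np.array([cumulative_gradient(b) for b in blocks])
    ordered = np.sort(grads)
    cuts = [ordered[len(ordered) * k // n_classes] for k in range(1, n_classes)]
    return np.searchsorted(cuts, grads, side="right"), cuts

rng = np.random.default_rng(1)
blocks = [rng.integers(0, 256, (8, 8)).astype(float) for _ in range(32)]
labels, cuts = classify(blocks)
assert set(labels) <= {0, 1, 2, 3}
assert len(cuts) == 3
```

A range block is then compared only against domain blocks carrying the same class label, which is what shrinks the searched pool by roughly a factor of N.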
Each mapping consists of the domain position Xi, brightness Si, contrast Oi and finally the transformation code Ri. The partition structure is coded by writing a single bit to file, where 1 signifies that the block mapping follows and 0 that the block was divided. Both Si and Oi are quantised to 5 bits and 7 bits respectively, as recommended by Fisher [5].
RESULTS

Evaluating the similarity of the original and reconstructed images required a quantifiable measure of success, and this was achieved using objective fidelity criteria which allow data loss to be represented as a function of both images. The metrics used are the RMSE (root mean square error) and the PSNR (peak signal-to-noise ratio). Both are referred to in the evaluation table below. The RMSE is an induced metric of the l2 norm: given two images of size MxN, an original image F(x,y) and a reconstructed image G(x,y), the distance or error e(x,y) between them at any point (x,y) is
e(x,y) = G(x,y) - F(x,y)    (6)

Thus, the RMSE can be calculated as the square root of the squared error averaged over all the pixels (MxN) and is defined as

e_rms = sqrt( (1/(MN)) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [G(x,y) - F(x,y)]² )    (7)

The PSNR, in decibel units (dB), gives the ratio of the peak signal and the difference between the two images. It includes the RMS metric and is defined as

PSNR = 20 log10 ( max / e_rms )    (8)
where max is the maximum gray-level value. To determine the effectiveness of this technique, extensive experiments were performed using a number of 256x256 gray-scale PGM image files with varying contrast levels. The experiments involved varying the number of image classes; for illustration purposes only two pictures are shown. Figures 1, 2 and 3 overleaf show the reconstructed Goldhill image using 4 classes, 16 classes and the original image. Figures 4, 5 and 6 show the reconstructed Peppers image using 4 classes, 16 classes and the original image.
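For reference, the RMSE and PSNR metrics of equations (6)-(8) can be written directly (a minimal sketch, assuming 8-bit images with peak value 255):

```python
import numpy as np

def rmse(f, g):
    """Root-mean-square error between two images, Eq. (7)."""
    return np.sqrt(np.mean((g.astype(float) - f.astype(float)) ** 2))

def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB, Eq. (8)."""
    return 20.0 * np.log10(peak / rmse(f, g))

f = np.full((4, 4), 100.0)
g = f + 5.0                     # constant error of 5 gray levels
assert rmse(f, g) == 5.0
assert abs(psnr(f, g) - 20.0 * np.log10(255.0 / 5.0)) < 1e-12
```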
Table 1: Evaluation figures
From the evaluation figures it is evident that increasing the number of classes has a negative effect on the compression ratios. The greater the number of classes, the lower the probability of finding a good match, which results in an image block being divided and each quadrant evaluated once more. The RMSE values were higher than expected, and this is due to the thresholds being fixed once they are initially calculated. A more robust method would define a subset of domains to be searched using a standard deviation based on the current range block gradient (e.g. evaluate the 25% of the domain blocks whose cumulative gradient values lie either side of the range gradient). Some sharp edges of each image are lost because of the domain averaging and also the limited number of domains per class. In addition, spurious pixels show up at sharp edges. This is caused by the least squares regression, where the minimum of the combined brightness and contrast variables is found. This is not the true minimum of each variable independently, but rather of both together. Finally, this research was carried out cross-platform using Sun SPARC stations and PCs. For this reason encoding times are not included at this stage but will be disseminated in due course.
CONCLUSION AND FURTHER WORK
This paper describes a novel technique based on shape recognition, which reduces the encoding time of fractal based image compression. A classification scheme using Sobel gradient operators is proposed and the concept of shape
recognition is applied to exploit the redundancy within low contrast range blocks. The technique has been successfully applied to images of varying contrasts. Work is currently under way to extend this technique allowing automatic determination of image block size based on object recognition. Also block classification based on entropy values is being investigated.
REFERENCES

[1] Gonzalez, R.C. and Woods, R.E., "Digital Image Processing", Addison-Wesley, 1992.
[2] Peitgen, H.-O., Jürgens, H. and Saupe, D., "Chaos and Fractals: New Frontiers of Science", Springer-Verlag, 1992.
[3] Fisher, Y., "Fractal Image Compression", Springer-Verlag, 1995.
[4] Barnsley, M.F., "Fractals Everywhere", 2nd Edition, Academic Press Professional, 1993.
[5] Morgan, S., "Fractal based Coding of Still-Frame Video", MSc Thesis, Faculty of Engineering, The Queen's University of Belfast, UK, 1995.
Figure 1: Goldhill 256x256, 4 classes
Figure 2: Goldhill 256x256, 16 classes
Figure 3: Goldhill 256x256, original
Figure 4: Peppers 256x256, 4 classes
Figure 5: Peppers 256x256, 16 classes
Figure 6: Peppers 256x256, original
Chrominance Vector Quantization for Coding of Images and Video at Very Low Bitrates

Maciej Bartkowiak*, Marek Domański* and Peter Gerken°

* Politechnika Poznańska, Instytut Elektroniki i Telekomunikacji, Poznań, Poland
° Universität Hannover, Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, Hannover, Germany
Abstract
The paper describes vector quantization in the two-dimensional space of the chrominance coordinates U and V. This vector quantization results in one scalar signal which is then processed independently of the luminance in any coder, e.g. a JPEG or H.263 coder. The coder processes only one chrominance component instead of two. Nevertheless, this scalar chrominance signal usually exhibits a broader spectrum than each of the two input chrominance signals U and V. Proper ordering of the codebook decreases the frequency band of this scalar chrominance signal. The paper also describes an efficient and noniterative technique to generate the codebook which is then used to encode the chrominance pairs.
1. INTRODUCTION

Vector quantization [1] is well-known as a powerful technique for image and video data compression. There are many practical possibilities to create vectors in an image. Vector quantization where the vector components are the luminances of some neighboring pixels is often used. Another approach is to use the color coordinates of a pixel as the vector value which is then quantized. Here, we deal with this approach, which in some other contexts has already been considered in [4-7,14]. A color video sequence is in fact a vector signal where each pixel of an individual frame is represented by a three-element vector. This representation is highly redundant, because usually only a small part of all possible combinations is present in an individual frame and even in a whole sequence [8,9]. The commonly used techniques process both chrominance channels almost separately, taking no advantage of their mutual dependencies. Our idea is to use vector quantization of chrominance for image and video coders where very high compression ratios are required. Before input to the source coder (for example a DCT-based still image or video coder [12,13], or an object-based analysis-synthesis video coder [3,11]), an image or a video sequence is preprocessed such that the two chrominance components of the sequence are converted into a scalar signal being a stream of chrominance labels. The technique includes automatic codebook design and, in the case of video coding, codebook update according to changes of the frame content. The assumption for this work is that there is as little interaction as possible with the following coder, so that any type of source coder can be used. Similarly, at the decoder side only some postprocessing is performed for recalculation of the actual color coordinates.

2. CODING STRATEGY

The approach is based on two-dimensional vector quantization with nearest neighbor search using the Euclidean distance in the UV plane.
The vector quantizer transforms the two input chrominance components into a scalar signal being a stream of labels of (U,V) pairs. At first, a basic codebook is designed, and then it is enlarged by inserting interpolated entries. The size of the basic codebook has been fixed to 32, because experimental results show that for many natural images and sequences this is a reasonable value that does not lead to visible degradation of the picture quality. Note that the codebook entries represent chrominance values only and that, even with this small number of entries, there is still the possibility to generate lots of colors in combination with the individual luminance values. In the case of video sequences, the basic codebook is computed for each frame and it has to be transmitted at least for the first frame. For the next frames, all the entries are compared with those from the previous frame. A new codebook is sent if a dramatic change of the scene is detected. Otherwise, the same codebook is used for consecutive frames. The codebook entries are losslessly encoded and transmitted as side information. The 32 basic codebook entries are ordered and mapped onto the range from 16 to 240 in order to be compliant with standard video input data formats. The order of the entries is very important, because it deeply affects the performance of the system. The differences between the values representing consecutive codebook entries are proportional to their distance in the chrominance plane. The intermediate values represent chrominances which can be calculated by linear interpolation between two neighboring codebook entries. These intermediate values together with the basic codebook entries form the enlarged codebook. The pixel chrominances are mapped, using nearest neighbor search, to chrominance labels being integers which range from 16 to 240.
Due to quantization in the video coder, the values of the chrominance labels can be changed. The decoder assigns to the decoded chrominance labels those pairs of interpolated chrominance values which are represented by them. For this purpose, the basic codebook is interpolated at the decoder side in the same way as at the coder side.
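A minimal sketch of the label mapping described above (function names and the toy four-entry codebook are ours): basic entries are mapped onto 16..240 with spacing proportional to their UV distance, intermediate labels are filled by linear interpolation, and (U,V) pairs are quantized by nearest-neighbour search over the enlarged codebook:

```python
import numpy as np

def enlarge(basic):
    """Map ordered basic entries onto labels 16..240 with spacing proportional
    to the Euclidean distance between consecutive (U,V) entries, then fill the
    intermediate labels by linear interpolation."""
    steps = np.linalg.norm(np.diff(basic, axis=0), axis=1)
    pos = np.concatenate([[0.0], np.cumsum(steps)])
    labels = 16 + np.round(pos / pos[-1] * (240 - 16)).astype(int)
    table = np.zeros((241, 2))
    for k in range(len(basic) - 1):
        a, b = labels[k], labels[k + 1]
        t = np.linspace(0.0, 1.0, b - a + 1)[:, None]
        table[a:b + 1] = (1 - t) * basic[k] + t * basic[k + 1]
    return labels, table[16:]                     # rows 0..224 <-> labels 16..240

def quantize(uv, enlarged):
    """Nearest-neighbour label (16..240) for each (U,V) pair."""
    d = np.linalg.norm(uv[:, None, :] - enlarged[None, :, :], axis=2)
    return 16 + np.argmin(d, axis=1)

basic = np.array([[30.0, 200.0], [60.0, 180.0], [120.0, 130.0], [200.0, 60.0]])
labels, enlarged = enlarge(basic)
assert labels[0] == 16 and labels[-1] == 240
assert list(quantize(basic, enlarged)) == list(labels)  # entries keep their labels
```

Because label spacing mirrors UV distance, a small quantization error on a label decoded by the video coder maps back to a nearby chrominance, which is what keeps the scalar signal well-behaved.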
Fig. 1. General structure of the transmitter and receiver (block diagram: basic codebook design, codebook ordering and basic codebook interpolation feed the vector quantizer and the image or video coder at the transmitter; the receiver comprises the video decoder, an entropy decoder for the basic codebook, codebook interpolation and the chrominance decoder).
3. DESIGN OF BASIC CODEBOOK
There exist several techniques to design codebooks in color spaces [5-7,14]. A common approach is to start with a relatively poor (in the sense of total square error) codebook and improve it using the Linde-Buzo-Gray (LBG) algorithm [10]. In contrast to those techniques, some algorithms based on splitting of the color space have been developed [5]. The method proposed here is a binary-split technique. As a measure of vector distance, the Euclidean norm in the chrominance plane is used. At the beginning, the codebook has only one entry, a vector whose components are the mean values of all chrominances in a picture. In the first step, the set of all chrominances is optimally divided into two subsets according to the rule described in [8]. This procedure is repeated recursively. At each step, only the set of chrominances with the highest total square error of its vector representation is divided into two subsets. In general, n steps of this algorithm result in an (n+1)-element basic codebook (Fig. 2). Consecutive frames of video sequences usually show similar chrominance histograms and therefore produce codebooks with similar tree structure. This property makes the frame-to-frame comparison of the codebooks easy. This comparison is necessary to decide whether the codebook (or optionally a part of it) must be retransmitted. This algorithm results in very good codebooks in the sense of total square error (cf. Fig. 3).
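The recursive splitting loop can be sketched as follows. The optimal split rule of [8] is not reproduced here; as a stand-in this sketch splits along the axis of largest variance at the median, which is only an illustrative choice.

```python
import numpy as np

def binary_split_codebook(chroma, n_entries=32):
    """Binary-split codebook design sketch. `chroma` is an (N, 2) array
    of (U, V) pairs; entries are the subset means, as in the paper."""
    subsets = [chroma]
    while len(subsets) < n_entries:
        # pick the subset with the highest total square error
        errs = [((s - s.mean(axis=0)) ** 2).sum() for s in subsets]
        worst = subsets.pop(int(np.argmax(errs)))
        # illustrative split rule: median cut along the dominant axis
        axis = int(worst.var(axis=0).argmax())
        median = np.median(worst[:, axis])
        a = worst[worst[:, axis] <= median]
        b = worst[worst[:, axis] > median]
        if len(a) == 0 or len(b) == 0:   # subset cannot be split further
            subsets.append(worst)
            break
        subsets += [a, b]
    return np.array([s.mean(axis=0) for s in subsets])
```

With n_entries = 32 this performs 31 splits, matching the (n+1)-entry property stated above.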
Fig. 2. Binary splitting example (chrominance samples with the starting point and the division lines for consecutive steps).
4. BASIC CODEBOOK ORDERING
Random ordering of the basic codebook would lead to a very broad power spectrum of the signal fed into the coder. This signal is an image whose pixel values are the labels of the codebook entries. The goal of proper ordering is to keep this image as low-frequency as possible. In our approach we use a strategy of simultaneous basic codebook generation and ordering. At each step, one codebook entry is replaced by two new entries as described above. The set of codebook entries is already ordered, and the two new entries are put in the place of the removed one. Of the two possible orderings of the two new entries, the one that minimizes the distances to the neighboring entries is chosen. In practice, this strategy leads to relatively good results (Fig. 4).

5. CODEBOOK INTERPOLATION

In order to obtain finer quantization, the basic codebook is augmented with interpolated codebook entries. The ordered set of the basic codebook entries is mapped onto the set of integers in the range of 16 to 240. The integers assigned to the codebook entries are henceforth called labels. The label difference of two consecutive basic codebook entries is set proportional to their distance in the chrominance plane. All other integers from the above-mentioned range are assigned to the interpolated codebook entries. Therefore, the longer the distance between two
consecutive codebook entries, the more interpolated entries are inserted between them. The chrominances corresponding to the interpolated codebook entries are evenly distributed along the straight lines between consecutive basic codebook entries (cf. Fig. 5). Note that only integer values of the U and V coordinates are allowed.
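The label assignment and interpolation described above can be sketched as follows; the function name and the dict return type are assumptions for illustration.

```python
import numpy as np

def interpolate_codebook(basic_entries, lo=16, hi=240):
    """Enlarge an ordered basic codebook: label gaps are proportional to
    the Euclidean distance in the (U, V) plane, and intermediate labels
    receive linearly interpolated chrominances (rounded to integers)."""
    basic = np.asarray(basic_entries, dtype=float)
    d = np.linalg.norm(np.diff(basic, axis=0), axis=1)
    # cumulative arc length along the ordered entries, mapped onto [lo, hi]
    pos = np.concatenate([[0.0], np.cumsum(d)])
    base_labels = np.rint(lo + (hi - lo) * pos / pos[-1]).astype(int)
    enlarged = {}
    for i in range(len(basic) - 1):
        l0, l1 = base_labels[i], base_labels[i + 1]
        for lab in range(l0, l1):
            t = (lab - l0) / (l1 - l0)
            chroma = basic[i] * (1 - t) + basic[i + 1] * t
            enlarged[lab] = tuple(np.rint(chroma).astype(int))
    enlarged[base_labels[-1]] = tuple(np.rint(basic[-1]).astype(int))
    return enlarged
```

Because this rule is deterministic, the decoder can reconstruct the same enlarged codebook from the basic entries alone, which is why only the basic codebook is transmitted.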
Fig. 3. Basic codebook entries (black dots) of the first frame of the test sequence "CLAIRE" shown on the set of all chrominances present in this frame (grey points).

Fig. 4. Ordered entries of the whole codebook of the first frame of the sequence "CLAIRE".
The above rule is known at the receiver side, therefore there is no need to transmit information about the interpolated codebook entries. Only in the case of video coding do frame-to-frame updates of the basic codebook entries have to be transmitted as side information, if needed.

6. APPLICATION TO IMAGE CODING

We use our technique as a preprocessing stage for DCT-based image coding in the range of very high compression ratios. In our experiments, the standard JPEG coder is used as the image coder both within our coding scheme and as the reference for comparison. Fig. 5 shows some results for the test images LENA and CLOWN. Note that for very high compression (0.02 bpp and less) we achieve a significant gain in SNR. Very strong color distortions are introduced by the JPEG coder, even when optimized quantization coefficients are applied. With our scheme we achieve a visible improvement in the subjective quality of the decoded images.
Fig. 5. Signal-to-noise ratio versus compression [bpp] for still test images LENA and CLOWN (JPEG vs. our scheme).

Fig. 6. Signal-to-noise ratio versus compression [bpp] for average frames of test sequences CLAIRE and MISSA (JPEG vs. our scheme).
Similar experiments were performed with single frames from the standard video sequences MISSA and CLAIRE in QCIF format (176 x 144 pels). In this case much less correlation is observed in the pictures, which results in lower compression. Nevertheless, our approach gives even better performance (cf. Fig. 6).
7. VERY LOW BITRATE CODING OF VIDEO

For the verification of our technique in the field of video sequence coding at very low bitrates, we performed a series of experiments with a standard H.263 video coder. In order to obtain the desired output bitrate, a control mechanism has to be applied in the stage of quantizing the DCT coefficients. For this task we scale the quantization factor for the interframe mode. The control loop keeps the scaling factor at a level which yields a bitstream similar to that of typical operation of the H.263 coder, i.e. below 64 kbps. The output sequences from our system and from the standard coding scheme are compared in terms of SNR averaged over 50 frames of the sequence. The results are promising.
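The paper does not specify its control law; one minimal form such a loop could take is a proportional update of the interframe quantization scale against the per-frame bit budget. The function name, gain and clamping range below are illustrative assumptions (the 1..31 range follows the usual H.263 quantizer parameter range).

```python
def update_quantizer(qp, bits_produced, bits_target, gain=0.1, qp_min=1, qp_max=31):
    """Illustrative proportional rate-control step: raise the quantization
    scale when the frame overshoots the bit budget, lower it when the
    frame undershoots, and clamp to the legal range."""
    error = (bits_produced - bits_target) / bits_target
    qp = qp * (1.0 + gain * error)
    return max(qp_min, min(qp_max, qp))
```

Running this once per frame keeps the average output rate near the target, e.g. a 64 kbps budget split over the frame rate.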
ACKNOWLEDGEMENTS

The work has been supported under the KBN Research Grant No. 8 S504 002 06 and the NATO Linkage Grant HTECH.LG 941338. Some of the computations were performed using the resources of the Poznań Supercomputing and Networking Centre.
REFERENCES
[1] H. Abut (ed.), Vector Quantization, IEEE Press, 1990.
[2] G. Wyszecki, W. Stiles, Color Science, Wiley, 1982.
[3] H.G. Musmann, Object based analysis synthesis coding, IEEE Int. Symposium on Circuits and Systems, Tutorials, eds. C. Toumazou et al., London, 1994.
[4] J. Barilleaux, R. Hinkle, S. Wells, Efficient vector quantization for color image encoding, Proc. ICASSP 1987, vol. 2, pp. 740-743.
[5] R.S. Gentile, E. Walowit, J.P. Allebach, Quantization and multilevel halftoning of color images for near original image quality, J. Opt. Soc. Amer. A, vol. 7, no. 6, 1990.
[6] M.T. Orchard, C.A. Bouman, Color quantization of images, IEEE Transactions on Signal Processing, December 1991.
[7] M. Domański, M. Bartkowiak, Color image archivization for medical purposes, Journal on Communications, vol. XLV, pp. 66-68, July-August 1994.
[8] M. Bartkowiak, M. Domański, Palette representation for data compression of color video, Proc. XVIII Nat. Conf. Circuit Theory Elec. Syst., pp. 473-478, Polana Zgorzelisko, 1995.
[9] M. Bartkowiak, M. Domański, Color statistics in image sequences and their implementations for VLBC, 2nd Int. Workshop on Image Processing, pp. 68-72, Budapest, 1995.
[10] Y. Linde, A. Buzo, R. Gray, An algorithm for vector quantizer design, IEEE Trans. on Commun., vol. COM-28, pp. 84-95, 1980.
[11] P. Gerken, Object-based analysis-synthesis coding of image sequences at very low bit rates, IEEE Trans. Circuits Syst. Video Techn., vol. 4, pp. 228-235, 1994.
[12] ISO International Standard 10918, Digital compression and coding of continuous-tone still images, Geneva, 1994.
[13] ITU, Draft recommendation H.263, Video coding for narrow telecommunication channels at < 64 kbit/s, April 1995.
[14] E. Roytman, C. Gotsman, Dynamic color quantization of video sequences, IEEE Trans. Visualization and Computer Graphics, vol. 1, pp. 274-286, 1995.
[15] I.H. Godlove, Improved color-difference formula, with applications to perceptibility and acceptability of fadings, J. Opt. Soc. Am., vol. 41, no. 11, pp. 760-772, November 1951.
Region-of-Interest Based Compression of Magnetic Resonance Imaging Data

Nikos G. Panagiotidis and Stefanos D. Kollias
National Technical University of Athens, Department of Electrical and Computer Engineering, Computer Science Division, Zografou 15773, Athens, Greece.
Tel: +30-1-7722491, 7722488, Fax: +30-1-7722459, 7722492
e-mail: [email protected], [email protected]
1. Introduction

Current picture processing, archiving and communication systems in hospitals and medical care centres deal with large volumes of information, obtained every day from a variety of disciplines such as chest, breast or bone X-rays, magnetic resonance imaging, tomography or other radiology, pathology and cardiology examinations. This gives rise to the need for efficient coding and compression of this information. Until recently, the requirement for retaining every detail of the encoded medical images has restricted interest to lossless compression techniques, which allow exact recovery of the original image from its compressed version. However, these methods achieve compression ratios of only approximately 3:1. In this paper we propose an efficient coding scheme which takes advantage of the difference in visual importance between areas of the same image, classifies them into distinct categories and reproduces the image with variable spatial reconstruction quality. The scheme is based on the fact that most medical images consist of areas of minimal contribution to the perceived information and of regions which are of extreme interest (regions of interest, ROI) to medical experts. Depending on the ratio of ROI to low-importance (background) regions, substantial savings can be achieved both for storage and transmission. It is shown in the paper that progressive DCT coding can be smoothly interwoven with the use of regions of interest so as to increase the compression ratios obtained in a variety of cases, while causing user-acceptable degradation in image quality. A ROI-JPEG coder is implemented, which provides the means for encoding regions of low/high interest in the image by differentiating the quantisation tables among these areas. This goal is achieved by using different quantisation quality factors (QF), as defined in the baseline JPEG algorithm, for each category of image regions.
Thus, while blocks belonging to important regions are coded with high quality quantisation tables, a substantial gain in bitstream volume is achieved by quantising the rest of the image blocks with low quality quantisation tables. Further reduction in the volume of information transmitted or stored may be achieved by further filtering of the low importance regions. This is achieved through the DCT transform, and does not affect the perceived quality of these regions, since coarse quantisation already incorporates a low-pass filtering process. Additionally, for off-line storage applications, visually optimal quantisation tables on a bits/pixel target rate basis can be computed for both the high interest and background regions of the image. The classification of image regions into ROI and background regions can be implemented either through unsupervised automated procedures or by user interactivity.
2. Regions of Interest in Medical Imaging

Medical images can be segmented into the following two discrete categories of regions:
- Regions of Interest (ROI), which are areas containing parts or features of the image having a maximal contribution to the perceived information. These regions are typically rich in high-frequency content and thus require a fine quantisation process in order to maintain an acceptably high reconstruction quality. If coding is extended to moving image sequences, ROI usually correspond to moving parts of the image. As a rule of thumb, if only the areas of the image corresponding to ROI were to be transmitted or stored, at least 70% of the image information could be perceived.
- Background Regions (BR), which contain information of reduced importance, as is the case of statistically uniform image background or texture. These regions contribute to the perceived information by acting as placeholders or boundaries for the ROI, especially by depicting information concerning the relative location of ROI within the image frame. In the case of moving image sequences, BR present minimal or no motion. Consequently, in a
multiple-quality coding scheme, BR can be coded at medium fidelity, thus yielding a significant reduction in the volume of information to be transmitted or stored. The aforementioned image modelling allows the implementation of particularly attractive coding schemes. More specifically, particularly high compression ratios with simultaneously high image quality can be achieved by using lossy coders to encode regions of interest with high fidelity, while reducing the representation fidelity of background regions. The classification of image regions into regions of interest and background can be achieved either interactively through user intervention or by the use of an automated classification procedure. In the simplest case, an expert end user chooses the regions of interest in every image through a graphical user interface (GUI). The information gathered by this process, as well as the actual image data, are fed to the coding unit that performs the actual processing. A more sophisticated approach consists of utilising an automated classification system based on appropriate feature extraction (e.g. edge detection, contour following), statistical processing (e.g. presence of certain coefficients or couplings in the frequency domain), or non-linear artificial neural networks. Neural network classifiers, even though requiring a complex and computationally demanding supervised training process, can yield the most satisfactory results, since neural network architectures are able to classify images even in adverse or noisy environments with particularly high success ratios. Regions of interest can be represented in two different modes: the first uses a set of points and vertices corresponding to one or more polygons that closely follow the shape of the ROI in the image. Even though this representation is the most precise, it presents the following disadvantages.
First, non-uniform convex polygon regions are difficult to represent and, additionally, if the image is to be processed by any block-based coding scheme, a preprocessing step transforming edges and vertices to blocks is required. The second mode initially segments the image into square blocks of 8x8 or 16x16 pixels; these are subsequently tagged as blocks corresponding to ROIs or background regions. The result of this process is a bitmap image of size N/8 x N/8 or N/16 x N/16 pixels (where N x N is the size of the original image). This bitmap image will be referred to, in the following, as a classification map (CM) and will be coded using run-length coding in addition to the original image, at a generally negligible coding cost. Each white pixel of the classification map marks a ROI block in the corresponding region of the original image.
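The second mode can be sketched briefly. The block-tagging rule below (a block is a ROI block if any of its pixels falls in a ROI) and the run-length format are illustrative assumptions; the paper leaves both to the implementer.

```python
import numpy as np

def classification_map(roi_mask, block=8):
    """Build a classification map: one bit per `block` x `block` image
    block, set if any pixel of the block lies inside a ROI. `roi_mask`
    is an N x N boolean array with N a multiple of `block`."""
    n = roi_mask.shape[0]
    cm = roi_mask.reshape(n // block, block, n // block, block).any(axis=(1, 3))
    return cm.astype(np.uint8)

def run_length_encode(cm):
    """Simple run-length code of the flattened binary map, transmitted
    as side information at negligible cost."""
    flat = cm.ravel()
    runs, count, cur = [], 0, flat[0]
    for bit in flat:
        if bit == cur:
            count += 1
        else:
            runs.append((int(cur), count))
            cur, count = bit, 1
    runs.append((int(cur), count))
    return runs
```

For a 256x256 image with 8x8 blocks the map is 32x32 bits, so even uncoded it adds only 128 bytes to the stream.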
3. Incorporation of ROI in DCT-based Coding Schemes

The JPEG standard for coding of still, grey-scale or colour images is based on the Discrete Cosine Transform (DCT). Colour images represented in the R,G,B colour space are transformed to the Y,Cr,Cb luminance-chrominance colour space and are subsampled into a 4:1:1 format prior to coding. According to the baseline encoder model, the input image is divided into blocks of 8x8 pixels which are transformed into the DCT domain. The coefficients obtained are then quantised using either standard JPEG quantisation tables or user-defined quantisation tables. In the latter case it is possible to improve the quality of the encoded picture in specific cases by selecting appropriate quantisation tables that take into account image-dependent information. Subsequently, the quantised coefficients go through a lossless coding procedure, using either the standard Huffman coding method or an arithmetic coder. All of the above procedures are, however, applied on a global basis, since quantisation matrices are fixed within the whole image. Therefore, one cannot exploit advantageous properties that appear locally in certain regions of the image. In contrast, the ROI-DCT based coder presented in this paper provides the means for encoding regions of low/high interest in the image by differentiating the quantisation tables among these areas. This goal is achieved by using different quality factors (QF), as defined, for example, in the baseline JPEG algorithm, for each category of image regions. Thus, while blocks belonging to important regions are coded with high-quality quantisation tables, a substantial gain in bitstream volume is achieved by quantising the rest of the image blocks with low-quality quantisation tables. The image is coded on a per-block basis, as in the case of, say, baseline JPEG.
Horizontal and vertical sampling factors are defined for each colour component, specifying the number of samples of the component compared to the other colour components; all sampling factors are equal to one in the case of grey-scale images. Blocks of different colour components, which correspond to the same physical area of the image, are grouped into minimum coded units (MCU), which are very similar to the macroblock (MB) entity defined in the MPEG and H.261 coding standards. On a higher syntax level, MCUs are grouped into slices. Each MCU may contain as many as 10 blocks, belonging to any colour component, depending on the colour sampling factors. By default, for a YUV 4:2:0 image, an MCU consists of 4 luminance blocks and two chrominance blocks. Given the architecture described above and the pipeline-like operation of the JPEG model, both coder and decoder may only be aware of the total number of blocks already coded/decoded, irrespective of the colour component to which these blocks belong. It is, however, imperative for the proposed ROI coding scheme to establish whether the currently decoded block corresponds to a ROI or not. This
information is obtained through the classification map (CM); high-quality quantisation tables are used for each block matching a white pixel in the classification map, while coarse quantisation is applied to the remaining areas. Progressive or hierarchical implementations of the JPEG standard are considered within the ROI-JPEG coder proposed in the next section, for further increasing the achieved compression ratios, while introducing imperceptible reconstruction errors.
4. A Progressive ROI-JPEG Coding Scheme

A ROI-JPEG coder is described next, which provides the means for encoding regions of low/high interest in the image by differentiating the quantisation tables among these areas. This goal is achieved by using different quantisation quality factors (QF), as defined in the baseline JPEG algorithm, for each category of image regions. Thus, while blocks belonging to important regions are coded with high-quality quantisation tables, a substantial gain in bitstream volume is achieved by quantising the rest of the image blocks with low-quality quantisation tables. Further reduction in the volume of information transmitted or stored may be achieved by further filtering of the low-importance regions. This is achieved through the DCT transform, and does not affect the perceived quality of these regions, since coarse quantisation already incorporates a low-pass filtering process. Additionally, for off-line storage applications, visually optimal quantisation tables on a bits/pixel target-rate basis can be computed for both the high-interest and background regions of the image. The proposed ROI-JPEG procedure includes the typical components of the JPEG system, i.e., the DCT transform, the quantiser and the entropy coder (Huffman or arithmetic), applied to each block of the image. A decision step is added, which classifies each image block either into the ROI category, requiring high reconstruction quality, or into the category of relatively low importance. Using more than two categories is possible; however, in most applications of interest, two categories seem to be enough for achieving high compression ratios. Let us assume first that the decision is based on information which an expert interactively gives to the system; as already mentioned, this can, for example, be performed by marking the important areas on the captured image before applying the compression procedure to it.
The selection of regions of interest results in a classification map of the image blocks; this map uses one bit (when classifying blocks into two categories of high/low importance) per block to denote whether it belongs to a ROI or not. The encoder stores this classification map in the image header bit stream according to the JPEG standard, so that the decoder is capable of recognising the category of each decoded block. The quantisation tables which are used for coding and reconstructing blocks belonging to ROIs are also stored in the image header, according to the JPEG specifications. Different quantisation tables can be defined by letting the user specify a quality factor value (QF) for low-importance regions, and a window quality factor (WQF) for high-quality regions (ROI); the use of more than two categories of regions is possible if respective quality factors are defined for each category. The properties of QF and WQF are similar to those of the standard JPEG QF; both are used for the derivation of quantisation tables from the standard templates incorporated into the JPEG baseline. In general QF and WQF lie in the intervals [30,~i0] and [70,85] respectively. Progressive, or hierarchical, implementations of the JPEG standard are of great importance for transmission, storage and retrieval of medical images. Such implementations can be produced by further filtering, i.e., separating the DCT coefficients of each image block into groups, which are subsequently processed sequentially, using conventional zig-zag scanning. A frequently met case is to generate three groups of coefficients, corresponding to low, medium and high frequency content of the image block. The boundaries of each group can be adapted so as to describe a corresponding frequency band. In general, coefficients from the latter groups, which correspond to higher frequencies, can be set to zero, yielding imperceptible errors while achieving a significant increase in the compression ratios.
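One common way to derive a quantisation table from a quality factor is the IJG/libjpeg scaling convention; the paper only says tables are derived from the standard JPEG templates, so the exact formula below is an assumption borrowed from that convention.

```python
def scale_quant_table(base_table, qf):
    """Scale a base JPEG quantisation table by quality factor `qf`
    (1..100), using the IJG/libjpeg convention: higher QF -> finer
    quantisation. Entries are clamped to the legal range 1..255."""
    s = 5000 / qf if qf < 50 else 200 - 2 * qf
    return [int(max(1, min(255, (q * s + 50) // 100))) for q in base_table]
```

In the proposed scheme, one table scaled with WQF (e.g. around 80) would serve the ROI blocks and another scaled with the lower QF would serve the background blocks.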
5. Medical Applications - MRI

The proposed ROI-based coding scheme can be applied to a variety of medical applications, where regions of interest can be effectively defined and reconstructed with very good quality. Such applications include X-rays, where specific parts of the chest, breast, bones or skulls of patients are of major importance for medical diagnosis or monitoring, pathology imaging, radiology examinations, as well as ultrasonography or angiography images used for cardiosurgical applications. In the following we illustrate the performance of the proposed methods by applying the proposed coding schemes to MRI data. Magnetic Resonance Imaging (MRI) is a non-invasive imaging technique based on a combination of a magnetic field and an RF (radio frequency) excitation field. Under these circumstances, certain nuclei behave in a manner that provides information about their chemical nature and environment in the tissues of the human body in vivo. In principle, magnetic resonance imaging consists of submitting the region of the body to a broadband RF-magnetic excitation. This results in a situation where the protons of the nuclei in the body tissues absorb energy, which is
radiated in the form of electromagnetic waves later, when the external RF-magnetic excitation is terminated. The transformed values of the spectrum of the resulting electromagnetic waves, expressed as integer values, correspond to 256 grey levels (8-bit depth), composing the resulting MR image. Typical sizes of such images are in the range of 2^n x 2^n pixels, where n = 5,...,10. The MRI field is particularly suited for imaging sensitive regions of the human body such as the brain and the spinal cord. Additionally, the ability to generate sagittal, coronal, oblique and transverse views, as well as excellent soft-tissue contrast representation, makes MRI a perfect complement to both anatomical and physiological diagnostic tools.
6. Simulation Results - Conclusions

Pictures of size 256x256 pixels were encoded by the proposed ROI-JPEG coder using quality factors in the ranges [50,75] and [75,95] for background regions and ROI respectively. A wide variety of cases was examined first for determining the percentage of regions that are important, from a medical point of view, in the source images: in the worst case, ROI represented 43% of the image area, whereas in the simplest case only 4% of the image was of any particular interest. It should be noted that the above observations stand under the condition that no pre-processing is performed on the original images. Such a measure could be advantageous in MRI images, where a black-coloured background always surrounds the important image data. In contrast, in pathology imaging, the whole of the image consists of pixels contributing to the perceived information. The available image data set consisted of 50 MR images and 16 frames taken from pathology examinations.
These results indicate that the proposed approach constitutes a powerful tool for compressing images in medical experiments, provided that the expert performing the experiment, or receiving the visual information, defines the regions of interest in the image before compressing and storing it. The proposed algorithm can be accommodated into modern PACS as well as used as an extension to the widely accepted DICOM 3.0 standard. We are currently extending our approach for compressing medical video, obtained from ultrasonic measurements in a hospital environment. The MPEG-1 coding scheme, which is also DCT-based, is combined with the proposed ROI and classification map definitions, for effective capturing, storage, retrieval and visualisation of the medical video information.
35
7. Acknowledgement The authors wish to thank PHILIPS Medical Systems, Greece for providing the MR images, as well as expert consultation on raw data.
SCALABLE PARALLEL VECTOR QUANTIZATION FOR IMAGE CODING APPLICATIONS
D. G. SAMPSON
A. C. UHADAR
A. C. DOWNTON
Democritus University of Thrace, GREECE
University of Gaziantep, TURKEY
University of Essex, ENGLAND
Abstract

In this paper, we show that the encoding complexity of vector quantization can be conveniently distributed as sub-codebooks over general-purpose MIMD parallel processors, to provide almost linearly scalable throughput and flexible configurability. A particular advantage of this approach is that it makes feasible the use of higher-dimensional image blocks and/or larger codebooks, leading to improved coding performance with no penalty in execution speed compared with the original sequential implementation. As an example, we show that an implementation with 32 transputers using 8x8 blocks and 4096 codebook entries reduces the bit-rate by a factor of 2.625 and runs 79% faster than a sequential implementation based upon 4x4 blocks and 256 codebook entries, while producing a similar PSNR.
1. Introduction

Vector quantization (VQ) has been extensively investigated for audio, speech, image and video coding applications [1]. VQ offers a simple decoding process, where the index of the selected code vector is used to produce the output vector through a look-up table operation. On the other hand, the selection of the best-matched code vector typically involves expensive computations. The encoding complexity of full-codebook search VQ increases exponentially with the vector dimension and the coding rate. The main drawback of VQ is the fact that the complexity of the encoder imposes restrictions on the size of the codebook that can be used in practice. This can restrict the efficiency of VQ-based compression systems for two main reasons: (i) only blocks of small dimension (typically 4x4) can be used, although operating on vectors of larger size (e.g. 8x8) can result in higher compression ratios due to the fact that the dependencies between neighbouring vectors can be exploited; (ii) moreover, a large codebook is essential for applications where high-quality coding (e.g. super-high-definition images) is required, or in image sequence coding, where the VQ codebook should be able to respond to changes in the input statistics. Different methods have been suggested to reduce the encoding complexity at the expense of suboptimal coding performance. Typically, these techniques involve imposing a certain structure on the VQ codebook, so that unconfined access to all effective code vectors is restricted [1]. An alternative approach reported in the literature has been to exploit parallelism in special-purpose VLSI implementations of VQ [2]. The approach described in this paper is to employ general-purpose Multi-Instruction Multi-Data (MIMD) parallel processors in a pipeline-processor farm (PPF) configuration [3] which utilises a form of VQ codebook parallelism.
The advantage of using general purpose processors is that they perform the encoding task of full-codebook-search VQ, so that a high throughput optimal vector quantizer can be realised, but at the same time they provide the flexibility to allow any desired trade-off to be made between algorithm speedup, PSNR and bit rate. Furthermore, it is relatively straightforward to apply fast codebook search algorithms to processor farms (which exhibit automatic load balancing between processors) to achieve further speedups, whereas this is often impractical for synchronised dedicated VLSI implementations. Parts of the work described in this paper have been published in references [4] and [5].
2. Approach adopted for parallelisation 2.1 The Pipeline Processor Farm (PPF) design model. The PPF design model is part of a parallel design methodology which can be used to decompose existing sequential applications onto any type of Multi-Instruction Multi-Data (MIMD) parallel processor network. The design model emerges from the observation that embedded signal processing applications with continuous data flow may be decomposed into a series of independent stages. The sequential application algorithm is then mapped onto a generalised multiprocessor architecture based upon a
pipeline of stages with well-defined communication patterns between them. The parallelism within each stage is exploited in the most appropriate way: for example, data parallelism or algorithm parallelism can be applied at various levels, or temporal multiplexing can be applied to complete input data sets, or a combination of these approaches can be implemented as appropriate. In a homogeneous MIMD processor implementation, processor farming is used to implement all these forms of parallelism, because it allows indefinite incremental scaling, provides automatic load balancing and results in a single tractable design model. 2.2 Parallelisation schemes for VQ. The design strategy for the parallelisation of the VQ algorithm should be capable of meeting the requirements for execution speed-up as well as efficient codebook storage. Two different schemes are possible for parallelising the VQ encoding algorithm:
• Applying image data parallelism, the entire image is partitioned into a number of sub-images which are distributed over the worker processors. Each worker processor then needs to perform an exhaustive search of the entire codebook to select the best-matched available code vector.
• Applying codebook parallelism, each worker processor can perform the encoding process on its own portion of the codebook. Upon receiving the same image block, each worker processor then needs to search only a smaller part of the entire codebook to select the closest codevector in the corresponding sub-codebook. However, the partial encoding results from the worker processors need to be compared at a final stage, where the best-matched available codevector is computed according to the minimum distortion criterion. Figure 1 illustrates this scheme assuming four worker processors.
Figure 1: Block diagram of the codebook-parallel approach
The first approach is straightforward to apply, since there is no need to further process the encoding results received from the worker processors, but it has the disadvantage that the entire codebook needs to be stored at each worker processor. This can impose a limitation on the size of the codebook that can be employed for the particular application. In order to alleviate this drawback, we have implemented the second parallelisation scheme. To achieve further speed-up, the selection of the final codevector (through comparison of the intermediate encoding results) is assigned to a separate processor, referred to as the collector. This offers the advantage that the encoding task of the next input vector is overlapped with the final comparison process of the current input vector. Hence, the parallelisation of the VQ encoding algorithm comprises three processes, which are mapped onto a 3-stage pipeline configuration as follows:
• Distributor. This process partitions the input image into rectangular blocks of m × n size (e.g. 4×4 or 8×8) and sends each block (input vector) to every worker processor.
• Worker. This process performs the encoding task on the received input vector using its own sub-codebook and sends the index of the selected codevector and the corresponding distortion value for the particular input vector to the collector process. The worker process is duplicated S−2 times, where S is the total number of processors in the configuration.
• Collector. This process receives the indices and the corresponding distortion values for the particular image block from the worker processors and compares the partial results to find the best-matched coding index according to the minimum distortion criterion.
3. Experimental results
The VQ encoder was parallelised in two steps. In the first step, the sequential Sparc2 implementation was ported to a single T800 transputer, running on a Meiko Computing Surface.
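The codebook-parallel decomposition above can be simulated in a single process. This is a hedged sketch: the real workers run concurrently on separate transputers, and the names and the 4-worker split here are illustrative only. The key correctness property is that the collector's minimum over the partial results equals a full-codebook search:

```python
import numpy as np

def collector_encode(block, sub_codebooks):
    """Simulate the 3-stage pipeline on one block: each 'worker' searches
    its own sub-codebook; the 'collector' keeps the global minimum."""
    best_index, best_dist = -1, np.inf
    offset = 0
    for sub in sub_codebooks:                   # one iteration per worker
        d = np.sum((sub - block) ** 2, axis=1)  # worker's partial search
        i = int(np.argmin(d))
        if d[i] < best_dist:                    # collector's comparison
            best_index, best_dist = offset + i, float(d[i])
        offset += len(sub)
    return best_index

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 codevectors of dimension 4 (k=2x2)
block = rng.normal(size=4)
subs = np.split(codebook, 4)          # distribute over 4 hypothetical workers
# The codebook-parallel result matches an exhaustive full search:
full = int(np.argmin(np.sum((codebook - block) ** 2, axis=1)))
assert collector_encode(block, subs) == full
```

Because each worker only ever stores its own sub-codebook, total codebook storage is spread across the farm, which is what makes the large N=4096 codebooks practical.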
Then, the implementation was decomposed into three different processes as outlined in the previous section. The parallel application is designed such that the number of processors in the configuration is defined by the user as a runtime argument. Hence, the user does not need to modify the application as the size of the transputer network is altered. Although the method described for the parallelisation of VQ is applicable to any data compression application that employs vector quantization, the results reported in this paper are based on the encoding of still images. The
spatial resolution of the test images used was 512×512 pixels. For our experiments, three different codebook populations, namely N=256, 1024 and 4096, for vector dimensions of 4×4 and 8×8, were used to evaluate the performance of the parallel implementation. Figure 2 illustrates the speed-up performance of the algorithm as the number of worker processors is increased when the vector dimension is set at 4×4. As can be seen, the performance of the implementation increases fairly linearly up to the point where the communication links become saturated. Saturation occurs when there are 10 workers and 20 workers in the parallel configuration for codebook populations of 256 and 1024 codevectors, respectively. Since the communication requirements are fixed, but computations increase linearly with the codebook size (for the same vector dimension), as the codebook size is increased the load at the worker processors becomes larger and hence better speed-ups are obtained. The maximum speed-up achieved with the codebook population of 4096 is 25.6. Further increases in execution speed could be achieved for this codebook if more transputers were available, as the implementation does not yet have saturated communication links.
Figure 2: Speed-up graphs for k=4×4
Figure 3: Speed-up graphs for k=8×8
When the vector dimensions are increased to 8×8, the corresponding speed-up performance of the implementation as the number of worker processors is increased for different codebook populations is shown in Figure 3. The graphs exhibit similar characteristics to the case of the 4×4 block size; however, better speed-up figures are obtained for the 8×8 block size due to the increased work load. As the task size increases, the execution time required to perform the sub-codebook search by the workers increases, whereas the cost of transmitting the intermediate results to the collector remains the same. For a codebook population of 4096 codevectors, a maximum speed-up of 27.75 is obtained.
Figure 4: Execution timings for k=4×4
Figure 5: Execution timings for k=8×8
Figures 4 and 5 illustrate the execution timings obtained by the parallel implementation for the 4×4 and 8×8 block size cases, respectively. By selecting the points where the execution time is a minimum for the particular codebook population, the effect of increased codebook population on the execution time of both the sequential and parallel implementations can be examined. Figure 6 shows the execution time of the encoding process as the codebook population is increased for the sequential and the 32-processor parallel implementation. It can be seen that the execution time of the parallel encoding process, even for the largest (e.g. N=4096) codebook population, is still well below the execution time of the sequential implementation with the smallest (e.g. N=256) codebook population. In general, a larger VQ codebook population results in better quality of the compressed image at the expense of extra bit rate. There are applications, such as super high definition TV and medical imaging, where perceptually transparent quality is essential. In our experiments, for the test image LENA 512×512×8 and vector dimension k=4×4, using N=4096 rather than N=256 codevectors provided a peak signal-to-noise ratio (PSNR) of 33.78 dB instead of 30.11 dB.
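The PSNR figures quoted above follow the standard definition for 8-bit images, PSNR = 10 log10(255²/MSE). A small illustrative implementation (not taken from the paper):

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB: PSNR = 10*log10(peak^2 / MSE)."""
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110          # one pixel off by 10, so MSE = 100/64 = 1.5625
# psnr(a, b) is about 46.19 dB; identical images would give infinite PSNR
```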
Figure 6: Comparison between the sequential and the parallel implementation for k=4×4 and k=8×8
Finally, Table 1 illustrates the advantage of using large-dimensional blocks in low bit rate coding. In the table, two vector quantisers operating on blocks of different size are compared in terms of PSNR, compression ratio and execution time. It can be seen that the vector quantiser which operates on 8x8 blocks and N=4096 codevectors gives similar PSNR results to the one operating on 4x4 blocks and N=256 codevectors. However, the former leads to a compression ratio of 42:1 rather than the 16:1 of the latter. This corresponds to a reduction of 2.625 in the total amount of data required to represent the compressed image. Yet, although the sequential implementation of VQ8x8 is 15.49 times slower than that of VQ4x4, the parallel VQ8x8 is 1.79 times faster than the sequential VQ4x4. Hence we can conclude that parallel processing can be used to enhance the overall performance of VQ-based compression systems, as well as to speed up their execution, and that by trading off between image compression, PSNR and speedup, improvements to all three parameters can be achieved simultaneously.

                                      k=4x4      k=8x8
  Codebook population                 N=256      N=4096
  PSNR (dB)                           30.114     29.105
  Bit rate (bits per pixel)           0.50       0.1875
  Compression ratio                   16:1       42:1
  Execution time (sec), sequential    591.36     9164.80
  Execution time (sec), parallel      88.32      328.96

Table 1: Performance evaluation of parallel VQ for still image coding
4. Conclusions Parallelising the VQ encoder aims to alleviate the encoding complexity and to allow the practical implementation of vector quantisers which operate on large block sizes and/or codebook populations. We have presented a scalable parallel approach to vector quantization. A three-stage pipeline implementation of the VQ encoder, which offers the advantage of both increased execution speeds and efficient storage of large codebooks, was described.
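The bit rates and compression ratios in Table 1 follow directly from bpp = log2(N)/(m·n) for an 8-bit source, since only the codevector index is transmitted per block. A short check (illustrative code, not the paper's; note the exact 8x8/N=4096 ratio is 42.67:1, which the table presumably rounds to 42:1):

```python
import math

def vq_rates(N, m, n, source_bpp=8):
    """Bit rate (bits/pixel) and compression ratio of a VQ with an
    N-entry codebook on m x n blocks of an 8-bit-per-pixel image."""
    bpp = math.log2(N) / (m * n)   # index bits spread over the block's pixels
    return bpp, source_bpp / bpp

# k = 4x4, N = 256: 8 index bits per 16 pixels -> 0.5 bpp, 16:1
bpp1, cr1 = vq_rates(256, 4, 4)
# k = 8x8, N = 4096: 12 index bits per 64 pixels -> 0.1875 bpp, ~42.7:1
bpp2, cr2 = vq_rates(4096, 8, 8)
```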
Simulation results for still image coding applications demonstrated that a parallel implementation of a vector quantiser operating on large codebooks (e.g. N=4096) and large vector dimensions (e.g. k=8x8) can be faster than the sequential VQ for smaller codebooks (e.g. N=256) and block sizes (e.g. k=4x4). This is very encouraging, since it indicates that parallelising VQ can offer an improvement to the overall efficiency of VQ-based coding systems.
5. References
[1] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, New York, USA, 1991.
[2] K. Dezhgosha, M.M. Jamali and S.C. Kwatra, "A VLSI architecture for real-time image coding using a vector quantization based algorithm," IEEE Trans. on Signal Processing, vol. 40, no. 1, pp. 181-189, January 1992.
[3] A.C. Downton, R.W. Tregidgo and A. Cuhadar, "Generalised parallelism for embedded vision applications," in Parallel Computations: Paradigms and Applications, A. Zomaya, Editor, Chapman and Hall, 1995.
[4] A. Cuhadar, D.G. Sampson and A.C. Downton, "A scalable parallel approach to vector quantization," to appear in Journal of Real-Time Imaging, Academic Press, 1996.
[5] A. Cuhadar, D.G. Sampson and A.C. Downton, "Scalable vector quantization architecture for image compression," IEEE Second International Conference on Algorithms and Architectures for Parallel Processing, Singapore, 11-13 June 1996.
Session: B
WAVELETS IN IMAGE/SIGNAL PROCESSING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Real-Time Image Compression Methods Incorporating Wavelet Transforms
D. T. Morris and M. D. Edwards
Department of Computation, UMIST, PO Box 88, Manchester, M60 1QD, United Kingdom
Abstract The aims of the work described here are to develop and implement new generic image compression methods, based on the use of wavelet transforms and vector quantisation techniques, to achieve very high compression results whilst maintaining satisfactory image quality. A major problem with current wavelet-based compression methods is the large amount of computation required to process a single image. This problem is exacerbated with the requirement to process video sequences in real-time. We intend to use previously developed hardware/software co-design techniques to partition the compression and decompression algorithms between hardware and software implementations in order to maximise the compression performance at acceptable cost. The paper describes the different software and hardware architectures we propose to investigate.
Introduction There is an increasing use of multimedia computing techniques in a wide range of diverse application areas, for example, manufacturing and commerce, publishing, education, and leisure services. Central to these techniques is the processing, storage, and transmission of both still photographs and video sequences. It is well known that the volume of data required to describe such images in their raw form makes storage prohibitively expensive and greatly slows transmission. For example, a standard 35 mm digitised photograph requires nearly 20 Mbytes of storage, and a one-second high resolution video sequence about 30 Mbytes, giving a required data transmission rate of at least 240 Mbits/sec. It is, therefore, evident that the information contained in images must be compressed in some way: usually by eliminating redundant information and encoding the remaining entropy. The goal is to reduce the bit rate for storage and transmission whilst maintaining acceptable quality when the images are subsequently decompressed. It is possible to achieve relatively high compression ratios using current international standard techniques, for example, JPEG [1] and MPEG [2]. These standards are normally used in commercial multimedia applications with medium resolution images, for example, 352 by 288 pixel video sequences at 25 frames per second. Unfortunately, very high compression ratios, in excess of 100, can only be achieved at the expense of significant losses in image quality. In addition, the compression/decompression of video information in real-time can only be realistically performed using expensive special-purpose video processing components [3]. The major objective of the work described here is to develop and implement new generic image compression methods, based on the use of wavelet transforms and vector quantisation techniques, which will achieve very high compression results whilst maintaining satisfactory image quality.
A major problem with current wavelet-based compression methods is the large amount of computation required to compress/decompress single images; this is especially true when processing video sequences in real-time. We believe a key issue in the realisation of feasible compression methods is the partitioning of the algorithms between hardware and software implementations in order to maximise compression performance at acceptable cost. Previous research in the area of hardware/software codesign has indicated that "software acceleration" methods [4, 5] can be used to enhance the performance of software-based systems, using relatively inexpensive programmable hardware as a special-purpose coprocessor. We intend to investigate the implementation of these compression algorithms as coprocessors for conventional microprocessor systems. This will give us a range of implementation alternatives for image compression systems with differing cost/performance characteristics. This work represents the first stage of our ongoing research into the design and implementation of distributed multimedia systems using novel technologies and methods. It is hoped that the results from this work will allow high-quality images to be stored in a computer system using less disk memory, and will permit images to be transmitted on computer networks using cheap Ethernet-based communications media.
Previous Work The discrete wavelet transform has recently received considerable attention in the context of sub-band coding in image compression. An image is decomposed into a set of sub-images with different resolutions, corresponding to different frequency ranges in the original image. A number of researchers [6, 7, 8] have used wavelet transforms and vector quantisation techniques to compress still images. They identify two key areas of research: (i) the choice of suitable wavelet filters for image compression, and (ii) methods for encoding wavelet coefficients using scalar and vector quantisation techniques [9]. Both of these areas will be addressed in our research. There have been software implementations of MPEG decoders; for example, 160 by 120 video sequences were processed in real-time on a RISC-based workstation [10]. It is estimated that 320 by 240 sequences could be processed at 10-15 frames per second on a powerful state-of-the-art workstation. Wavelet techniques can also be employed to decompose a video frame into multiple layers with different resolutions and frequency bands [11]. A motion estimation scheme can then be used to track motion activities at the different layers across multiple frames. Good compression results at real-time frame rates were achieved using different variations of this motion-compensated wavelet video compression system. We intend to investigate the use of 3-D wavelet transformations to assist with the compression of video sequences. A range of VLSI devices have been developed for performing JPEG, MPEG-I, and MPEG-II image compression tasks [12]. It is possible to generate a range of hardwired and programmable architectures, which provide a range of cost/performance trade-offs. A video co-processor for a conventional microprocessor can prove to be a viable option. Our proposed use of programmable hardware (FPGAs) to act as a co-processor will allow us to explore different implementations in an efficient manner.
Some researchers have developed effective VLSI architectures [13] for implementing the discrete wavelet transform. We intend to take account of this work in the design and implementation of our coprocessors.
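The sub-band decomposition described above can be illustrated with a one-level 2-D Haar transform, the simplest wavelet filter pair; the cited work uses longer filters, so this fragment is only a sketch of the principle:

```python
import numpy as np

def haar2d_level(img):
    """One level of a 2-D Haar wavelet transform: split an image (with even
    dimensions) into LL (coarse), LH, HL and HH (detail) sub-bands."""
    a = img.astype(float)
    # pairwise rows: low-pass = average, high-pass = half-difference
    lo_r = (a[0::2] + a[1::2]) / 2
    hi_r = (a[0::2] - a[1::2]) / 2
    # then the same filtering on pairwise columns of each result
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2
    return ll, lh, hl, hh

# Each sub-band is a quarter-size image; for a smooth (here constant) image
# nearly all the energy lands in LL and the detail bands are zero.
ll, lh, hl, hh = haar2d_level(np.full((4, 4), 7.0))
```

Repeating the split on the LL band gives the multiresolution pyramid; it is the many near-zero detail coefficients that scalar or vector quantisation then exploits.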
Proposed Work Our ultimate aim is to encode full resolution colour video data at real-time frame rates, reduced by a factor that allows transmission over public networks whilst allowing high quality decoded data to be derived. We anticipate that encoding still images and video sequences using the 2-D discrete wavelet transform will have the same relationship as the JPEG and MPEG compression methods using the 2-D discrete cosine transform. Therefore, we shall initially investigate using the wavelet transform to compress still images in a manner similar to that employed in the JPEG method: that is, processing the image by transforming sub-blocks and encoding the wavelet coefficients. We intend to investigate the interplay between various coding parameters, for example, the size of the sub-block and the choice of wavelet filter, together with the quality and degree of compression that may be achieved. Whilst we realise that much of this work has already been performed, it is our intention to implement these new algorithms on a Pentium-based workstation to obtain cost/performance benchmarks for future implementations of the algorithms using special-purpose programmable co-processors. Subsequently, we will investigate methods of encoding sequences of related images. The MPEG standard suggests that a sequence of images is encoded essentially by interpolating between JPEG encoded frames. The MPEG standard specifies that the JPEG versions of a number of frames are derived (I-frames), together with 'predicted' frames, known as P-frames and B-frames, which allow the motion of objects between I-frames to be taken into account. This approach could be used in our scheme simply by replacing the JPEG encoded frames by wavelet encoded ones. However, Hilton et al. [14] suggest that temporal redundancy is better exploited by encoding the difference between two sequential frames, as shown in Figure 1.
Since images in a sequence are highly correlated, the difference between any adjacent pair of images will contain much less information than either of the original images. Encoding these difference images will, therefore, be highly efficient. The 'support analysis' operation is concerned with the identification of those wavelet coefficients that are required by the inverse wavelet transform to reconstruct an approximation of the i-th difference image. By altering the threshold value, different compression ratios can be achieved. The decoder can rapidly reconstruct the difference image by computing the inverse wavelet transform for only those pixels that are influenced by the coefficients sent by the encoder.
frame i+1 − frame i = Δframe i → encoder
Figure 1: Video compression using frame differencing
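A minimal sketch of the frame-differencing idea of Figure 1, with a plain threshold standing in for the 'support analysis' step. The names and threshold value are illustrative only; the scheme described above thresholds wavelet coefficients of the difference image, whereas this fragment thresholds raw pixel differences to show the principle:

```python
import numpy as np

def encode_difference(prev_frame, next_frame, threshold):
    """Keep only the significant part of the inter-frame difference:
    entries below the threshold are zeroed, and so compress very well."""
    diff = next_frame.astype(float) - prev_frame.astype(float)
    significant = np.abs(diff) >= threshold
    return np.where(significant, diff, 0.0)

prev = np.zeros((4, 4))
nxt = prev.copy()
nxt[1, 1] = 50.0   # one large change the decoder must see
nxt[2, 2] = 0.5    # one tiny change that may be discarded
delta = encode_difference(prev, nxt, threshold=1.0)
```

Raising the threshold zeroes more coefficients, trading reconstruction quality for bit rate, which is exactly the knob the support-analysis threshold provides.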
Prior to this stage of the project, we have been concerned with achieving efficient image sequence encoding/decoding algorithms (efficient with respect to compression rates and quality of decoded images); their execution times have been of secondary interest. We do not anticipate that a purely software implementation of the wavelet transforms will achieve our goal of real-time operation. We shall, therefore, examine methods for speeding up the sequential execution of the compression/decompression algorithms using special-purpose co-processors. At the two extremes of performance are software and hardware implementations of video encoding. A pure software execution has been shown to be cheap but slow, whilst a pure hardware implementation is fast but can be prohibitively expensive. We aim to investigate compromises between pure hardware and pure software solutions. Previous work [4, 5] has indicated that, by identifying the performance-critical regions of a sequential algorithm and transferring them to a special-purpose hardware implementation, a speedup of about three times over the software-only implementation can be achieved. We envisage that such a performance enhancement will allow us to achieve close to real-time performance. The system hardware architecture will take the form of a conventional Pentium-based workstation and includes the 'software acceleration architecture', which is interfaced to the system's PCI bus, as shown in Figure 2.
Figure 2: System hardware architecture (Pentium processor with cache, bridge, DRAM, frame store with video input/output, and the software acceleration architecture attached to the PCI bus)
In the software acceleration architecture, the processor (P) will execute the main components of the compression and decompression algorithms. The associated code and data will reside in the memory (M). The FPGA will execute the identified time-critical software components in hardware. By having programmable hardware (the FPGA), it will be possible to experiment with different hardware/software trade-offs in the implementations of the algorithms in order to optimise the overall system performance. Finally, we plan to perform an extensive series of experiments in which we shall compare various wavelet transformation and vector quantisation methods with the industry-standard JPEG and MPEG schemes. This will permit us to determine the best choices with respect to execution speed, volume of compressed data, quality of decompressed data, and cost. We shall further be able to report on the cost/benefit of various hardware and software compromises. Conclusions
In this paper we have proposed a plan of work which will allow us to develop and evaluate new generic image compression methods, based on the use of wavelet transforms and vector quantisation techniques, which will achieve very high compression results whilst maintaining satisfactory image quality. The novel aspects of our work include the ability to explore trade-offs between hardware and software implementations in order to maximise performance and achieve close to real-time processing of video sequences at acceptable cost. Future work will include the implementation of these new algorithms using a multicomputer system. We will investigate methods for partitioning the algorithms, based on an analysis of 'spatial' and 'temporal' parallelism requirements, for execution on a network of T9000 transputers. We shall also develop special-purpose hardware that will be integrated into the T9000 network and thereby speed up the execution of the algorithms as necessary. References
[1] G. Wallace, "The JPEG Still Image Data Compression Standard", Communications of the ACM, vol. 34, no. 4, pp. 30-44, 1991.
[2] D. LeGall, "MPEG: A Video Compression Standard for Multimedia Applications", Communications of the ACM, vol. 34, no. 4, pp. 45-68, 1991.
[3] CL450 MPEG Video Decoder User's Manual, C-Cube Microsystems, Milpitas, USA, 1995.
[4] M. D. Edwards and J. Forrest, "Software Acceleration using Programmable Hardware Devices", IEE Proceedings - Computers and Digital Techniques, vol. 143, no. 1, pp. 55-63, 1996.
[5] M. D. Edwards and J. Forrest, "A Practical Hardware Architecture to Support Software Acceleration", Microprocessors and Microsystems, vol. 20, no. 3, pp. 167-174, 1996.
[6] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image Coding Using Wavelet Transform", IEEE Transactions on Image Processing, vol. 1, no. 2, pp. 205-220, 1992.
[7] J. D. Villasenor, B. Belzer, and J. Liao, "Wavelet Filter Evaluation for Image Compression", IEEE Transactions on Image Processing, vol. 4, no. 8, pp. 4-15, 1995.
[8] A. Averbuch, D. Lazar, and M. Israeli, "Image Compression Using Wavelet Transform and Multiresolution Decomposition", IEEE Transactions on Image Processing, vol. 5, no. 1, pp. 4-15, 1996.
[9] W. Li and Y. Q. Zhang, "Vector-Based Signal Processing and Quantization for Image and Video Compression", Proceedings of the IEEE, vol. 83, no. 2, pp. 317-335, 1995.
[10] K. Patel, B. C. Smith, and L. A. Rowe, "Performance of a Software MPEG Video Decoder", Computer Science Division, University of California at Berkeley, USA, 1993.
[11] Y. Q. Zhang and S. Zafar, "Motion-Compensated Wavelet Transform Coding for Color Video Compression", IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 3, pp. 285-296, 1992.
[12] P. Pirsch, N. Demassieux, and W. Gehrke, "VLSI Architectures for Video Compression - A Survey", Proceedings of the IEEE, vol. 83, no. 2, pp. 220-246, 1995.
[13] K. K. Parhi and T. Nishitani, "VLSI Architectures for Discrete Wavelet Transforms", IEEE Transactions on Very Large Scale Integration Systems, vol. 1, no. 2, pp. 191-202, 1993.
[14] M. L. Hilton, B. D. Jawerth, and A. Sengupta, "Compressing Still and Moving Images Using Wavelets", Multimedia Systems, vol. 2, no. 3, 1994.
Custom Wavelet Packet Image Compression Design*
Mladen Victor Wickerhauser†
July 11, 1996
Abstract
This tutorial paper presents a meta-algorithm for designing a transform coding image compression algorithm specific to a given application. The goal is to select a decorrelating transform which performs best on a given collection of data. It consists of conducting experimental trials with adapted wavelet transforms and the best basis algorithm, evaluating the basis choices made for a training set of images, then selecting a transform that, on average, delivers the best compression for the data set. A crude version of the method was used to design the WSQ fingerprint image compression algorithm.
1 Introduction
No single image compression algorithm can be expected to work well for all classes of digital images. The sampling rates, frequency content, and pixel quantization all influence the compressibility of the original data. Subsequent machine or human analyses of the compressed data, or its presentation at various magnifications, all influence the nature and visibility of distortion and artifacts. Thus compression standards like those of the JPEG committee [1], established for "natural" images intended to be viewed by humans, do not satisfy the requirements for compressing fingerprint images intended to be scanned by machines. In that particular example, it was necessary to develop a new algorithm, WSQ [2]. Both JPEG and WSQ are examples of transform coding image compression algorithms. That class provides a rich selection from which custom compression algorithms may be chosen. This paper presents a meta-algorithm for rationally and automatically choosing one of them to suit a particular application. It focuses on the transform portion of the compression algorithm: the best basis method is used to optimize it to provide the best average compression of a representative set of images, subject to speed constraints. A crude version of the method was used to design the WSQ fingerprint image compression algorithm.
2 Transform coding image compression
The generic transform coding compression scheme is depicted in Figure 1. It consists of three pieces:
• Transform: Apply a function, invertible (lossless) in exact arithmetic, which should decorrelate the pixels in the image. It does this by decomposing the image into a superposition of independent patterns; it produces a sequence of floating-point amplitudes which are the intensities of the new components.
• Quantize: Replace the transform amplitudes with (small) integer approximations. This is the lossy, or non-invertible, part of the algorithm, where all the distortion is introduced.
• Code: Rewrite the integer stream of quantized transform coefficients into a more efficient alphabet, so as to approach the information-theoretic minimum bit rate. This operation is akin to a table look-up, and is invertible.
These three steps are depicted in Figure 1.
*Research partially supported by NSF, AFOSR, and Southwestern Bell Corporation
†Department of Mathematics, Washington University, St. Louis, Missouri, 63130 USA
Scanned image
Transform
Quantize
Code
Figure 1: Generic transform coding image compression device.
Decode
Unquantize
Untransform
.•Restored image
Figure 2: Inverse of the generic transform coder: the decoder.
To recover an image from the coded, stored data, the steps in Figure 1 are inverted as shown in Figure 2. The first and third blocks of the compression algorithm are exactly invertible in exact arithmetic, but the Unquantize block does not in general produce the same amplitudes that were given to the Quantize block during compression. The errors thus introduced can be controlled both by the fineness of the quantization (which limits the maximum size of the error) and by favoritism (which tries to reduce the errors for certain amplitudes at the expense of greater errors for others). The compression ratio produced by such an algorithm is computed by dividing the size of the input file by the size of the output file. It thus takes into account all of the side information stored with the output file that is needed for reconstruction. Roughly speaking, if the coding step is perfectly efficient, the compression ratio is maximized for a given distortion when the transform and quantize steps produce a sequence with minimal entropy. However, since minimal entropy is hard to characterize and harder to achieve, it is better to aim at a broader target: a sequence with almost all of the values being zero. Such a sequence will have a low, if not minimal, entropy, since its value distribution will be highly peaked at zero. This paper concentrates on the Transform operation. The goal is to choose, from a large family of wavelet, wavelet packet, and local trigonometric transforms, the one which can be expected to yield the largest fraction of negligible amplitudes on data represented by a training set. Those will be quantized to zero in exchange for a given degree of distortion, yielding the biggest peak at zero in the value distribution and resulting in the best compression. It will be assumed that the transforms are orthogonal or nearly orthogonal, so that their condition number is close to 1 and they introduce no significant redundancy.
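The "peaked at zero" target can be made concrete by comparing the first-order entropy of a quantized sequence dominated by zeros against a uniform one. A small illustration (the distributions are invented for the demo, not taken from the paper):

```python
import numpy as np

def empirical_entropy_bits(seq):
    """First-order entropy (bits/symbol) of an integer sequence: the
    information-theoretic lower bound that an ideal coder approaches."""
    _, counts = np.unique(seq, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# A sequence sharply peaked at zero (70% zeros) vs. a uniform one.
peaked = rng.choice([0, 0, 0, 0, 0, 0, 0, 1, -1, 2], size=10000)
uniform = rng.integers(-5, 5, size=10000)

h_peaked = empirical_entropy_bits(peaked)
h_uniform = empirical_entropy_bits(uniform)
```

The peaked sequence needs far fewer bits per symbol, which is why a transform that zeroes most amplitudes yields high compression ratios even with a generic coder.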
3 Custom transforms
There are two fast ways to decompose images at the transform step: splitting into small blocks of pixels and then applying some fast transform to the blocks, or splitting the whole image into frequency subbands by convolving with short filters. Both methods cost O(P log P) operations for a P-pixel image. Detailed formulas and a proof of the complexity statement can be found in Reference [5], so only a brief summary will be presented here.

In the pixel splitting scheme, the image is cut into blocks, either of fixed or variable size, but small enough so that the intensities of all pixels contained within a block are correlated. This cutting is depicted in Figure 3. Then decorrelation is performed by applying the two-dimensional discrete cosine transform (DCT) to the blocks. This method is used in the JPEG still picture image compression standard [3]. The resulting amplitudes represent spatial frequency components in the blocks. Because digitized images are often limited in their spectral content, most of the amplitudes in each block will be negligible. To maximize the proportion of negligible amplitudes, the blocks should be chosen as large as possible, subject to the constraints that (1) only a few spatial frequencies are present in each block, and (2) describing the block boundaries does not create too much side information.

Figure 3: Division of a 128 x 128 pixel image into 8 x 8 blocks, as in JPEG, or into blocks varying from 4 x 4 to 32 x 32.

In the subband splitting scheme, a low-pass and a high-pass filter are used along rows and columns to split the image into four subimages characterized by restricted frequency content. This process is repeated on the subimages, down to some maximum depth of decomposition, resulting in a segmentation of frequency space into subbands. Two such segmentations are depicted in Figure 4; the one on the right is used in the WSQ fingerprint image compression algorithm [2]. The resulting amplitudes again represent spatial frequency components, computed over portions of the picture determined by the depth of the subband and the location of the amplitude in its subband. Again, for images of limited spectral content, most of these amplitudes will be negligible. The two example subband decompositions are approximately radial with respect to the "origin" in the upper left-hand corner; this works well for isotropic images, i.e., where no direction is favored over any other.
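The block-DCT pixel-splitting scheme can be sketched in a few lines. The following Python fragment (a minimal illustration with fixed 8 x 8 blocks, not the variable-size version) applies an orthonormal 2-D DCT to each block of a smooth synthetic image and checks that the energy concentrates in a few amplitudes:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * j + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def block_dct2(img, bs=8):
    """Cut the image into bs x bs blocks and apply the 2-D DCT to each,
    as in the JPEG pixel-splitting scheme."""
    d = dct_matrix(bs)
    out = np.empty_like(img, dtype=float)
    for i in range(0, img.shape[0], bs):
        for j in range(0, img.shape[1], bs):
            out[i:i+bs, j:j+bs] = d @ img[i:i+bs, j:j+bs] @ d.T
    return out

# A smooth synthetic image: the amplitudes should concentrate in a few
# low-frequency coefficients of each block.
x = np.linspace(0, 1, 32)
img = np.outer(np.sin(np.pi * x), np.cos(np.pi * x))
amps = block_dct2(img)
flat = np.sort(np.abs(amps).ravel())[::-1]
top_energy = float((flat[:flat.size // 10] ** 2).sum() / (flat ** 2).sum())
```

For spectrally limited images like this one, the largest 10% of amplitudes carry essentially all the energy, i.e. most amplitudes are negligible, as the text asserts.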
4 The joint best basis
Both splitting schemes can be organized as quadtrees to a specified depth, with the selected transform determined by the leaves of a subtree like the one depicted in Figure 5. To choose the subtree and thus the transform, each member of a representative training set of images is decomposed into the complete quadtree of amplitudes. Then the squares of these amplitudes are summed into a sum-of-squares quadtree. Using an information cost function such as "number of nonnegligible amplitudes", the sum-of-squares quadtree is searched for its best basis, which is the one that minimizes this cost ([5], p. 282). Figure 6 depicts this algorithm. The best basis for the sum-of-squares quadtree is the joint best basis for the training set of images 1, 2, ..., N. That is the transform which produces, on average, the largest number of negligible output coefficients. To find the best basis requires examining each coefficient in the quadtree and examining each subband or pixel block at most twice, which means that the complexity is O(P log P) for P-pixel images. To find the joint best basis requires building the sum-of-squares tree first, which dominates the total complexity with its O(NP log P) cost for a training set of N P-pixel images.

Figure 4: Division of an image into orthogonal wavelet subbands to level 5, or into the WSQ subbands. Frequencies increase down and to the right.

Figure 5: Splitting schemes produce quadtrees; custom bases are determined by the leaves of a subtree such as the one shown here, shaded for emphasis.

Figure 6: A joint best basis from a class of splitting algorithms is determined by a sample set of N images.

Of course, the joint best basis transform is only optimal within its own class, and the class is determined by the technical details and mathematical properties of the splitting algorithm. If these constraints were removed and the search performed over all orthonormal transforms, then the joint best basis would be the Karhunen-Loève (KL) or principal orthogonal basis [4], which is known to be the minimizer of the number of nonnegligible amplitudes. With the constraints, whose purpose is to speed things up, the chosen transform is just an approximation to KL.

Figure 7: A meta-algorithm for deciding which splitting algorithm to use with a particular class of images. [Schematic: the training set feeds the subband-splitting classes 1, ..., n and the pixel-splitting classes 1, ..., n; each path ends in the cost of that class's joint best basis, and the least cost determines the winning transform.]
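The best-basis search itself is a bottom-up comparison of a parent node's cost against the sum of its children's best costs. A 1-D Python sketch with Haar splits follows (the 2-D quadtree version of the paper replaces each pair of children with four; the cost function and threshold here are illustrative):

```python
import numpy as np

def haar_split(v):
    """One orthonormal Haar split of a vector into low/high halves."""
    s = 1.0 / np.sqrt(2.0)
    return (v[0::2] + v[1::2]) * s, (v[0::2] - v[1::2]) * s

def cost(v, eps=1e-2):
    """Information cost: number of nonnegligible amplitudes."""
    return int(np.count_nonzero(np.abs(v) > eps))

def best_basis(v, depth):
    """Keep a node if its cost beats the best total cost of its children;
    otherwise recurse, in the Coifman-Wickerhauser best-basis style."""
    if depth == 0 or v.size < 2:
        return cost(v), [v]
    lo, hi = haar_split(v)
    c_lo, b_lo = best_basis(lo, depth - 1)
    c_hi, b_hi = best_basis(hi, depth - 1)
    c_here = cost(v)
    if c_here <= c_lo + c_hi:
        return c_here, [v]
    return c_lo + c_hi, b_lo + b_hi

# A piecewise-constant signal compresses well under Haar splitting:
# 16 nonnegligible samples reduce to 2 nonnegligible coefficients.
sig = np.concatenate([np.ones(8), -np.ones(8)])
c, basis = best_basis(sig, depth=3)
```

The joint best basis of the paper runs exactly this search, but on the tree of summed squares over the whole training set rather than on a single signal.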
5 Choosing the best transform from multiple classes
There is a meta-algorithm for relaxing the constraints a bit while preserving the speed. Namely, a custom transform can be chosen by checking many classes of splitting algorithms in order to further increase the expected number of negligible coefficients. This scheme was first proposed by Yves Meyer, and is depicted in Figure 7. At the end of each path is a cost figure, the expected number of nonnegligible coefficients for the training set of images. The path that leads to the lowest cost determines which algorithm should be used to find the custom transform for compressing the images represented by the training set. Examples of different classes are the different subband splitting schemes associated to different conjugate quadrature filters ([5], Chapter 5 and Appendix C), or the adapted local trigonometric bases determined by different windows ([5], Chapters 3 and 4).
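In the same spirit, the meta-algorithm reduces to auditioning each class of splitting algorithms on the training set and keeping the cheapest. A toy Python sketch with two hypothetical "classes" (a Haar split versus no transform at all; real classes would be different filter banks or window families):

```python
import numpy as np

def count_big(coeffs, eps=1e-2):
    """Cost figure: expected number of nonnegligible coefficients."""
    return int(np.count_nonzero(np.abs(coeffs) > eps))

def haar_coeffs(v):
    s = 1.0 / np.sqrt(2.0)
    return np.concatenate([(v[0::2] + v[1::2]) * s, (v[0::2] - v[1::2]) * s])

def identity_coeffs(v):
    return v.copy()

# Hypothetical "classes": each entry is one splitting scheme to audition.
classes = {"haar": haar_coeffs, "none": identity_coeffs}

training = [np.concatenate([np.full(8, a), np.full(8, -a)])
            for a in (1.0, 2.0, 3.0)]

costs = {name: sum(count_big(t(x)) for x in training)
         for name, t in classes.items()}
winner = min(costs, key=costs.get)
```

The path with the lowest total cost over the training set determines the transform to use, just as in Figure 7.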
6 Conclusion
Given a training set of images, a transform coding image compression algorithm may be rationally chosen from a class of fast splitting algorithms. The choice criterion is a cost function that, when low, yields high compression ratios for transform coding image compression. The method works for wavelet packet and local trigonometric transforms and thus produces well-conditioned compression and decompression methods of complexity O(P log P) for P-pixel images. Searching for the best choice itself costs O(NP log P), where N is the number of training images.
References

[1] ISO/IEC JTC1 Draft International Standard 10918-1. Digital compression and coding of continuous-tone still images, part 1: Requirements and guidelines. Available from ANSI Sales, (212) 642-4900, November 1991. ISO/IEC CD 10918-1 (alternate number SC2 N2215).

[2] IAFIS-IC-0110v2. WSQ gray-scale fingerprint image compression specification. Version 2, US Department of Justice, Federal Bureau of Investigation, 16 February 1993.

[3] Gregory K. Wallace. The JPEG still picture compression standard. Communications of the ACM, 34:30-44, April 1991.

[4] Mladen Victor Wickerhauser. Fast approximate factor analysis. In Martine J. Silbermann and Hemant D. Tagare, editors, Curves and Surfaces in Computer Vision and Graphics II, volume 1610 of SPIE Proceedings, pages 23-32, Boston, October 1991. SPIE.

[5] Mladen Victor Wickerhauser. Adapted Wavelet Analysis from Theory to Software. A K Peters, Ltd., Wellesley, Massachusetts, 9 May 1994. With optional diskette.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Two-dimensional directional wavelets and image processing

Jean-Pierre Antoine
Institut de Physique Théorique, Université Catholique de Louvain
B-1348 Louvain-la-Neuve, Belgium
E-mail: [email protected]

Abstract

The two-dimensional continuous wavelet transform (CWT) is characterized by a rotation parameter, in addition to the usual translations and dilations. This enables it to detect edges and directions in images, provided a directional wavelet is used. In this paper, we review the general properties of the 2-D CWT, with special emphasis on the directional aspects. We discuss, in particular, the problem of wavelet calibration and we present several applications of directional wavelets.

1. The Continuous Wavelet Transform in two dimensions
Both in 1-D (signal analysis) and 2-D (image processing), the wavelet transform (WT) has become by now a standard tool (see [1]-[4] for a review). Although the discrete version, based on multiresolution analysis, is probably better known, the continuous WT (CWT) plays a crucial role for the detection and analysis of particular features in a signal, and we will focus here on the latter, with particular emphasis on the directional aspects. Indeed the CWT is a very efficient tool for detecting oriented features in a signal, provided one uses a directional wavelet, that is, a wavelet which has itself an intrinsic orientation. We refer the reader to [5, 6] for a detailed analysis.
1.1. Mathematical properties

By an image, we mean a 2-D signal of finite energy, represented by a function $s \in L^2(\mathbb{R}^2, d^2\vec{x})$. In practice, a black and white image will be represented by a bounded non-negative function: $0 \le s(\vec{x}) \le M$, $\forall \vec{x} \in \mathbb{R}^2$ ($M > 0$), the discrete values of $s(\vec{x})$ corresponding to the level of gray of each pixel. A wavelet is a function $\psi \in L^2(\mathbb{R}^2)$ which is admissible, that is:

$$c_\psi \equiv (2\pi)^2 \int \frac{d^2\vec{k}}{|\vec{k}|^2}\, |\hat\psi(\vec{k})|^2 < \infty. \tag{1.1}$$

If $\psi$ is regular enough, the admissibility condition simply means that the wavelet has zero mean:

$$\hat\psi(\vec{0}) = 0 \iff \int d^2\vec{x}\ \psi(\vec{x}) = 0. \tag{1.2}$$

In practice, both $\psi$ and its Fourier transform $\hat\psi$ are supposed to be well localized, and, in addition, the wavelet $\psi$ is often required to have a few vanishing moments, as in the 1-D case [2]. This condition improves the capacity of the WT to detect singularities.

Let now $s \in L^2(\mathbb{R}^2, d^2\vec{x})$ be an image. Its continuous wavelet transform with respect to the fixed wavelet $\psi$, $S \equiv W_\psi s$, is the scalar product of $s$ with the transformed wavelet $\psi_{a,\theta,\vec{b}}$, considered as a function of $(a, \theta, \vec{b})$ (for simplicity, we assume $\psi$ to be normalized by $c_\psi = 1$):

$$S(a, \theta, \vec{b}) = a^{-1} \int \overline{\psi\big(a^{-1} r_{-\theta}(\vec{x} - \vec{b})\big)}\, s(\vec{x})\, d^2\vec{x} = a \int e^{i\vec{b}\cdot\vec{k}}\, \overline{\hat\psi\big(a\, r_{-\theta}(\vec{k})\big)}\, \hat{s}(\vec{k})\, d^2\vec{k}. \tag{1.3}$$
In these relations, $\vec{b} \in \mathbb{R}^2$ is a translation, $a > 0$ a dilation, and $r_{-\theta}$ ($0 \le \theta < 2\pi$) denotes the usual $2 \times 2$ rotation matrix. The parameter space $G = \{(a, \theta, \vec{b})\}$ is in fact the similitude group of $\mathbb{R}^2$. Indeed the CWT, including the admissibility condition (1.1), originates from group theory, namely the natural representation of $G$ in the Hilbert space $L^2(\mathbb{R}^2, d^2\vec{x})$. The main properties of the wavelet transform $W_\psi : s \mapsto S$ may be summarized as follows [5, 6]:

• Since the wavelet $\psi$ is required to have zero mean, $W_\psi$ provides a filtering effect, exactly as in 1-D, i.e. the analysis is local in all four parameters $a, \theta, \vec{b}$, and it is particularly efficient at detecting discontinuities in images.

• $W_\psi$ is linear, contrary, for instance, to the Wigner-Ville transform, which is bilinear.

• $W_\psi$ is covariant under translations, dilations and rotations.

• $W_\psi$ conserves energy:

$$\iiint \frac{da}{a^3}\, d\theta\, d^2\vec{b}\; |S(a, \theta, \vec{b})|^2 = \int d^2\vec{x}\; |s(\vec{x})|^2, \tag{1.4}$$

i.e. it is an isometry from the space of signals into the space of transforms, which is a closed subspace of $L^2(G, dg)$, where $dg = a^{-3}\, da\, d\theta\, d^2\vec{b}$ is the natural invariant measure on $G$.

• As a consequence, $W_\psi$ is invertible on its range and the inverse transformation is simply the adjoint of $W_\psi$. Thus one has an exact reconstruction formula:

$$s(\vec{x}) = \iiint \frac{da}{a^3}\, d\theta\, d^2\vec{b}\; \psi_{a,\theta,\vec{b}}(\vec{x})\, S(a, \theta, \vec{b}). \tag{1.5}$$

In other words, the 2-D CWT provides a decomposition of the signal in terms of the analyzing wavelets $\psi_{a,\theta,\vec{b}}$, with coefficients $S(a, \theta, \vec{b})$.

• The projection from $L^2(G, dg)$ onto the range of $W_\psi$ is an integral operator, whose kernel $K$ is the autocorrelation function of $\psi$ (also called reproducing kernel):

$$K(a', \theta', \vec{b}'\,|\,a, \theta, \vec{b}) = \langle \psi_{a',\theta',\vec{b}'} \mid \psi_{a,\theta,\vec{b}} \rangle. \tag{1.6}$$

Therefore, transforms satisfy the reproduction property:

$$S(a', \theta', \vec{b}') = \iiint \frac{da}{a^3}\, d\theta\, d^2\vec{b}\; K(a', \theta', \vec{b}'\,|\,a, \theta, \vec{b})\, S(a, \theta, \vec{b}). \tag{1.7}$$
1.2. Interpretation and implementation: the various representations

The first problem one faces in practice is one of visualization. Indeed $S(a, \theta, \vec{b})$ is a function of 4 variables: two position variables $\vec{b} \in \mathbb{R}^2$, and the pair $(a, \theta) \in \mathbb{R}_+^* \times [0, 2\pi)$. This splitting has an intrinsic geometrical meaning [5, 6]. Indeed, the pair $(a^{-1}, \theta)$ plays the role of spatial frequency, expressed in polar coordinates, exactly as $a^{-1}$ defines the frequency scale in the 1-D case [7, 8]. Thus the full 4-D parameter space of the 2-D WT may be interpreted as a phase space, in the sense of classical mechanics. Now, to compute and visualize the full CWT in all four variables is hardly possible. Therefore, in order to obtain a manageable tool, one must restrict oneself to a section of the parameter space $\{a, \theta, b_x, b_y\}$. The geometrical considerations made above indicate that two of them are more natural: either $(a, \theta)$ or $(b_x, b_y)$ are fixed, and the WT is treated as a function of the two remaining variables. The corresponding representations have the following characteristics.

(i) The position representation, where $a$ and $\theta$ are fixed and the CWT is considered as a function of position $\vec{b}$ alone. This is the standard representation, and it is useful for the general purposes of image processing: detection of position, shape and contours of objects; pattern recognition; image enhancement by resynthesis after elimination of unwanted features (such as noise). Alternatively, one may use polar coordinates, in which case the variables are interpreted as range $|\vec{b}|$ and perception angle $\alpha$, another familiar representation of images.
(ii) The scale-angle representation: for fixed $\vec{b}$, the CWT is considered as a function of scale and angle $(a, \theta)$, i.e. of spatial frequency. In other words, one looks at the full CWT from $\vec{b}$, and observes all scales and all directions at once. The scale-angle representation will be interesting whenever scaling behavior (as in fractals) or angular selection is important, in particular when directional wavelets are used.

In addition to these two familiar representations, there are four other ones, corresponding to two-dimensional sections. Among these, the angle-angle representation might be useful for applications [10]. Here one fixes the range $|\vec{b}|$ and the scale $a$ and considers the CWT at all perception angles $\alpha$ and all anisotropy angles $\theta$.

For the numerical evaluation, discretization of the WT in any of these representations, together with systematic use of the FFT algorithm, leads to a numerical complexity of $3N_1N_2\log(N_1N_2)$, where $N_1$, $N_2$ denote the number of sampling points in the two remaining variables. The natural discretization is linear for the position variables $(b_x, b_y)$ and the angles $\theta, \alpha$, but logarithmic in the scale variable $a$.
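The FFT evaluation of the position representation is only a few lines. A minimal Python sketch of (1.3) at fixed $(a, \theta)$ follows (normalization constants are ignored and scales are in pixel units; the Laplacian-of-Gaussian wavelet, $|\vec{k}|^2 e^{-|\vec{k}|^2/2}$ in frequency, is used purely as an example):

```python
import numpy as np

def cwt2d(signal, psi_hat, a, theta):
    """2-D CWT at fixed (a, theta) for all positions b at once, using the
    Fourier-side form of (1.3) (up to the discretization constant).
    `psi_hat(kx, ky)` returns the wavelet's Fourier transform."""
    n = signal.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    c, s = np.cos(theta), np.sin(theta)
    kxr = a * (c * kx + s * ky)          # a * r_{-theta} applied to k
    kyr = a * (-s * kx + c * ky)
    return a * np.fft.ifft2(np.conj(psi_hat(kxr, kyr)) * np.fft.fft2(signal))

def log_hat(kx, ky):
    """Fourier transform of a Laplacian-of-Gaussian wavelet, up to a constant."""
    k2 = kx**2 + ky**2
    return k2 * np.exp(-k2 / 2)

# The WT of a square should vanish inside and outside and light up at edges.
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
S = cwt2d(img, log_hat, a=1.0, theta=0.0)
```

Since the wavelet has zero mean, a constant signal gives a zero transform, and the response concentrates near the square's contour: the filtering effect described above.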
2. Choice of the analyzing wavelet

The next step is to choose an analyzing wavelet $\psi$. At this point, there are two possibilities, depending on the problem at hand, namely isotropic or directional wavelets.
2.1. Isotropic wavelets

If one wants to perform a pointwise analysis, that is, when no oriented features are present or relevant in the signal, one may choose an analyzing wavelet $\psi$ which is invariant under rotation. Then the $\theta$ dependence drops out, for instance, in the reconstruction formula (1.5). A typical example is the isotropic 2-D mexican hat wavelet, which is simply the Laplacian of a Gaussian:

$$\psi_H(\vec{x}) = (2 - |\vec{x}|^2)\, \exp\!\big(-\tfrac{1}{2}|\vec{x}|^2\big). \tag{2.1}$$
This is a real, rotation invariant wavelet, introduced by Marr [11]. The anisotropic version is obtained by replacing $\vec{x}$ in (2.1) by $A\vec{x}$, where $A = \mathrm{diag}[\epsilon^{-1/2}, 1]$, $\epsilon > 1$, is a $2 \times 2$ anisotropy matrix. However, this wavelet is of little use in practice, because it still acts as a second order operator and detects singularities in all directions. Indeed it is not a directional wavelet, in the technical sense defined below. Hence the mexican hat will be efficient for a fine pointwise analysis, but not for detecting directions.
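As a quick numerical sanity check (an illustration, not from the paper), one can sample (2.1) on a grid and verify the zero-mean admissibility condition (1.2), together with the location of the wavelet's minimum at $|\vec{x}| = 2$:

```python
import numpy as np

# Sample the isotropic mexican hat (2.1) and check the zero-mean
# admissibility condition (1.2) numerically.
x = np.linspace(-8, 8, 257)
X, Y = np.meshgrid(x, x)
r2 = X**2 + Y**2
psi_h = (2.0 - r2) * np.exp(-r2 / 2.0)

dx = x[1] - x[0]
mean = float(psi_h.sum()) * dx * dx   # approximates the integral of psi_h
min_val = float(psi_h.min())          # attained at |x| = 2: -2 e^{-2}
```

The integral vanishes because $\int (2 - r^2) e^{-r^2/2}\, d^2\vec{x} = 2 \cdot 2\pi - 4\pi = 0$, confirming (1.2) for this wavelet.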
2.2. Directional wavelets

When the aim is to detect oriented features (segments, edges, vector fields, ...) in an image, for instance to perform directional filtering, one has to use a wavelet which is not rotation invariant. The best angular selectivity will be obtained if $\psi$ is directional. By this we mean that the effective support of its Fourier transform $\hat\psi$ is contained in a convex cone in spatial frequency space $\{\vec{k}\}$, with apex at the origin, or a finite union of disjoint such cones (in that case, one will usually call $\psi$ multidirectional). According to this definition, the anisotropic mexican hat is not directional, since the support of $\hat\psi_H$ is centered at the origin, no matter how big its anisotropy is, and, indeed, detailed tests confirm its poor performance in selecting directions [5]. Typical directional wavelets are the 2-D Morlet wavelet and the Cauchy wavelets [6].
2.2.1. The 2-D Morlet wavelet

This is the prototype of a directional wavelet:

$$\psi_M(\vec{x}) = \exp(i\vec{k}_0 \cdot \vec{x})\, \exp\!\big(-\tfrac{1}{2}|A\vec{x}|^2\big), \tag{2.2}$$

$$\hat\psi_M(\vec{k}) = \sqrt{\epsilon}\, \exp\!\big(-\tfrac{1}{2}[\epsilon k_x^2 + (k_y - k_0)^2]\big). \tag{2.3}$$
The parameter $\vec{k}_0$ is the wave vector, and $A$ the anisotropy matrix as above. As in 1-D, we should add a correction term to (2.2) and (2.3) to enforce the admissibility condition $\hat\psi_M(\vec{0}) = 0$. However, since it is numerically negligible for $|\vec{k}_0| \ge 5.6$, we have dropped it altogether. The modulus of the (truncated) wavelet $\psi_M$ is a Gaussian, elongated in the $x$ direction if $\epsilon > 1$, and its phase is constant along the direction orthogonal to $\vec{k}_0$. Thus the wavelet $\psi_M$ smoothes the signal in all directions, but detects the sharp transitions in the direction perpendicular to $\vec{k}_0$. The angular selectivity increases with $|\vec{k}_0|$ and with the anisotropy $\epsilon$. The best selectivity will be obtained by combining the two effects, i.e. by taking $\vec{k}_0 = (0, k_0)$. The effective support of $\hat\psi_M$ is centered at $\vec{k}_0$ and is contained in a convex cone, which becomes narrower as $\epsilon$ increases.
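The angular selectivity claim can be checked directly on (2.3): evaluating $|\hat\psi_M|$ along the ring $|\vec{k}| = k_0$ shows how quickly the response drops as the direction rotates away from $\vec{k}_0$. A small numerical illustration (the values $k_0 = 6$, $\epsilon = 5$ are just representative choices):

```python
import numpy as np

def morlet_hat(kx, ky, k0=6.0, eps=5.0):
    """Fourier transform (2.3) of the truncated 2-D Morlet wavelet, with
    wave vector (0, k0) and anisotropy eps."""
    return np.sqrt(eps) * np.exp(-0.5 * (eps * kx**2 + (ky - k0)**2))

# Response on the ring |k| = k0 as the direction rotates away from k_0.
angles = np.deg2rad(np.arange(0, 91, 5))
resp = morlet_hat(6.0 * np.sin(angles), 6.0 * np.cos(angles))
```

The response peaks at perfect alignment and falls by more than two orders of magnitude within 15 degrees, which is the quantitative content of "the cone becomes narrower as $\epsilon$ increases".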
2.2.2. The Cauchy wavelet

Let $\mathcal{C} \equiv \mathcal{C}(\alpha, \beta) = \{\vec{k} \in \mathbb{R}^2 \mid \alpha \le \arg\vec{k} \le \beta\}$ be the convex cone determined by the directions $\alpha$ and $\beta$. The dual cone $\tilde{\mathcal{C}} = \tilde{\mathcal{C}}(\tilde\alpha, \tilde\beta) = \{\vec{x} \in \mathbb{R}^2 \mid \vec{x} \cdot \vec{k} > 0,\ \forall \vec{k} \in \mathcal{C}(\alpha, \beta)\}$ is also convex. Given a fixed vector $\vec\eta \in \tilde{\mathcal{C}}$, we define the Cauchy wavelet in spatial frequency variables [6]:

$$\hat\psi^{(\mathcal{C})}_{lm}(\vec{k}) = \begin{cases} (\vec{k} \cdot \vec{e}_\alpha)^l\, (\vec{k} \cdot \vec{e}_\beta)^m\, e^{-\vec{k}\cdot\vec\eta}, & \vec{k} \in \mathcal{C}(\alpha, \beta) \\ 0, & \text{otherwise}, \end{cases} \tag{2.4}$$

where $\vec{e}_\alpha$ (resp. $\vec{e}_\beta$) denotes the unit vector in the direction $\alpha$ (resp. $\beta$). The Cauchy wavelet $\hat\psi^{(\mathcal{C})}_{lm}$ is strictly supported in the cone $\mathcal{C}(\alpha, \beta)$, and the parameters $l, m \in \mathbb{N}^*$ give the number of vanishing moments on the edges of the cone. An explicit calculation then yields the following result:

$$\psi^{(\mathcal{C})}_{lm}(\vec{x}) = \mathrm{const.}\ (\vec{z} \cdot \vec{e}_\alpha)^{-l-1}\, (\vec{z} \cdot \vec{e}_\beta)^{-m-1}, \tag{2.5}$$

where we have introduced the complex variable $\vec{z} = \vec{x} + i\vec\eta \in \mathbb{R}^2 + i\tilde{\mathcal{C}}$. We show in Figure 1 the wavelet $\hat\psi^{(\mathcal{C})}_4$ for $\mathcal{C} = \mathcal{C}(-20°, 20°)$; this is manifestly a highly directional filter.
Figure 1: Two directional wavelets, in spatial frequency space: (left) the Morlet wavelet ($\epsilon = 5$, $\theta = 45°$); (right) the Cauchy wavelet ($\alpha = 20°$).
3. Evaluation of the performances of the CWT
Given a wavelet, what is its resolving power, in particular what is its angular and scale selectivity? What is the minimal discretization grid for the reconstruction formula (1.5) that guarantees that no information is lost? The answer to both questions resides in a quantitative knowledge of the properties of the wavelet at hand; that is, the tool must be calibrated. To that effect, one takes the WT of particular, standard signals. Three such tests have proven useful [5], and in each case the outcome may be viewed either at fixed $(a, \theta)$ (position representation) or at fixed $\vec{b}$ (scale-angle representation).

• Point signal: for a snapshot of the wavelet itself, one takes as signal a delta function, i.e. one evaluates the impulse response of the filter:

$$\langle \delta \mid \psi_{a,\theta,\vec{b}} \rangle = a^{-1}\, \psi\big(a^{-1} r_{-\theta}(-\vec{b})\big). \tag{3.1}$$
This yields the effective support of $\psi$, and from there one may define the resolving power of $\psi$.

• Reproducing kernel: taking as signal the wavelet $\psi$ itself, one obtains the reproducing kernel $K$, which measures the correlation length in each variable $a, \theta, \vec{b}$:

$$K(a, \theta, \vec{b} \mid 1, 0, \vec{0}) = \langle \psi_{a,\theta,\vec{b}} \mid \psi \rangle = a^{-1} \int d^2\vec{x}\ \overline{\psi\big(a^{-1} r_{-\theta}(\vec{x} - \vec{b})\big)}\, \psi(\vec{x}). \tag{3.2}$$

A detailed analysis of $K$ also yields the resolving power of the wavelet $\psi$ in each variable.

• Benchmark signals: for testing particular properties of the wavelet, such as its ability to detect a discontinuity or its angular selectivity in detecting a particular direction, one may use appropriate 'benchmark' signals.
3.1. The scale and angle resolving power

Suppose the wavelet $\hat\psi$ has its effective support in spatial frequency in a vertical cone of aperture $\Delta\varphi$, corresponding to $\vec{k}_0 = (0, k_0)$, and that the width of $\hat\psi$ in the $k_x$ and $k_y$ directions is given by $2w_x$, resp. $2w_y$. Then the wavelet $\hat\psi$ is concentrated in an ellipse of semi-axes $w_x$, $w_y$, and its radial support is $k_0 - w_y \le \rho \le k_0 + w_y$. Thus the scale width or scale resolving power (SRP) of $\psi$ is defined as:

$$\mathrm{SRP}(\psi) = \frac{k_0 + w_y}{k_0 - w_y}. \tag{3.4}$$

In the same way, one defines the angular width or angular resolving power (ARP) by considering the tangents to that ellipse. Then a straightforward calculation yields:

$$\mathrm{ARP}(\psi) = 2 \cot^{-1} \frac{\sqrt{k_0^2 - w_y^2}}{w_x} \equiv \Delta\varphi. \tag{3.5}$$

For instance, if $\psi$ is the (truncated) Morlet wavelet (2.2), one obtains:

$$\mathrm{SRP}(\psi_M) = \frac{k_0\sqrt{2} + 1}{k_0\sqrt{2} - 1}, \qquad \mathrm{ARP}(\psi_M) = 2 \cot^{-1} \sqrt{\epsilon(2k_0^2 - 1)}, \tag{3.6}$$

and, for $k_0 \gg 1$:

$$\mathrm{ARP}(\psi_M) \simeq 2 \cot^{-1}\!\big(k_0\sqrt{2\epsilon}\big). \tag{3.7}$$

This last expression coincides with the empirical result of [5]: the angular sensitivity of $\psi_M$ depends only on the product $k_0\sqrt{\epsilon}$. Notice also that the SRP is independent of the anisotropy factor $\epsilon$. If $\psi$ is the Cauchy wavelet (2.4) with support in the cone $\mathcal{C}(-\alpha, \alpha)$, the ARP is simply the opening angle $2\alpha$ of the supporting cone.

3.2. The reproducing kernel and the resolving power of the wavelet

A natural way of testing the correlation length of the wavelet is to analyze systematically its reproducing kernel. Let the effective support of the wavelet $\hat\psi$ in spatial frequency be, in polar coordinates, $\Delta\rho$ and $\Delta\varphi$. Then an easy calculation [6] shows that the effective support of $K$ is given by $a_{\min} = (\Delta\rho)^{-1} \le a \le a_{\max} = \Delta\rho$ for the scale variable, and $-\Delta\varphi \le \theta \le \Delta\varphi$ for the angular variable. Thus we may define the wavelet parameters (or resolving power) $\Delta\rho$, $\Delta\varphi$ in terms of the parameters $\Delta a$, $\Delta\theta$ of $K$, as:

• scale resolving power (SRP): $\Delta\rho = \sqrt{a_{\max}/a_{\min}}$;

• angular resolving power (ARP): $\Delta\varphi = \tfrac{1}{2}\Delta\theta$.

This result may be exploited for determining the minimal discretization grid needed for the numerical evaluation of the reconstruction integral (1.5). In particular, one may design a wavelet filter bank $\{\hat\psi_j(\vec{k})\}$ which yields a complete tiling of the spatial frequency plane in polar coordinates [6, 9]. Clearly this analysis is only possible within the scale-angle representation. Thus it requires the use of the CWT, and it is outside of the scope of the DWT, which is essentially limited to a Cartesian geometry.
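Assuming the reconstruction of (3.6)-(3.7) above, the resolving powers are easy to tabulate numerically; a small Python helper (using $\cot^{-1}(x) = \arctan(1/x)$ for $x > 0$):

```python
import numpy as np

def srp_morlet(k0):
    """Scale resolving power of the Morlet wavelet, per (3.6);
    note that it does not depend on the anisotropy eps."""
    return (np.sqrt(2) * k0 + 1) / (np.sqrt(2) * k0 - 1)

def arp_morlet(k0, eps):
    """Angular resolving power of the Morlet wavelet, per (3.6)."""
    return 2 * np.arctan(1 / np.sqrt(eps * (2 * k0**2 - 1)))

srp = srp_morlet(6.0)
arp_iso = arp_morlet(6.0, 1.0)     # isotropic envelope
arp_aniso = arp_morlet(6.0, 5.0)   # eps = 5: sharper angular selectivity
approx = 2 * np.arctan(1 / (6.0 * np.sqrt(2 * 5.0)))   # large-k0 form (3.7)
```

For $k_0 = 6$ the exact ARP and the large-$k_0$ approximation (3.7) already agree to within about one percent, and increasing $\epsilon$ shrinks the ARP while leaving the SRP untouched, as the text states.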
3.3. Calibration of a wavelet with benchmark signals

The capacity of the wavelet at detecting a discontinuity may be measured on a benchmark signal consisting of an infinite rod (see [5] for the full discussion). The result is that both the mexican hat and the Morlet wavelet are efficient in this respect. For testing the angular selectivity of a wavelet, one computes the WT of a segment, as a function of the difference in orientation, $\Delta\phi$, between the wavelet and the segment. The conclusion is that the Morlet wavelet is highly sensitive to orientation, but the mexican hat is not. For an eccentricity $\epsilon = 5$, $\psi_M$ detects the orientation of a segment with a precision of the order of 5°. That is, the WT reproduces the segment if $\Delta\phi < 5°$, but the latter becomes essentially invisible for $\Delta\phi > 15°$, except for the tips. In the end, the image of the segment reduces to two peaks corresponding to the two endpoints. The same test performed with an anisotropic mexican hat gives a result almost independent of $\Delta\phi$. Another way of comparing the angular selectivity of the two wavelets is to analyze a directional signal in the angle-angle representation $(\alpha, \theta)$ described above. The result confirms the previous one.
4. Application 1: Directional filtering

As a consequence of its good directional selectivity, the Morlet wavelet is quite efficient for directional filtering. A good illustration is the analysis of a pattern made of rods in many different directions [6]. Applying the CWT with a fixed direction selects all those rods with roughly the same direction, whereas the other ones, which are misaligned, yield only a faint signal corresponding to their tips. The same two operations are then repeated with various successive orientations of the wavelet. In this way, one can count the number of objects that lie in any particular direction. This method yields an elegant solution to a standard problem in fluid dynamics, namely, to measure the velocity field of a 2-D turbulent flow around an obstacle [12].

The directional selectivity of a wavelet may also be used for evaluating the symmetry of a given object. Let $S(a, \theta, \vec{b})$ be the wavelet transform of such an object with respect to the Cauchy wavelet. Define the following positive valued function, called the angular measure of the signal:

$$\mu_s(a, \theta) = \int d^2\vec{b}\ |S(a, \theta, \vec{b})|^2.$$

This is different from using the scale-angle representation, where the position parameter $\vec{b}$ is fixed [6]. Here, on the contrary, $\mu_s$ averages over all points in the plane, thus eliminating the dependence on the point of observation. For any signal of finite energy, it is clear that $\mu_s$ is a continuous bounded function of $a$ and $\theta$. Let us fix the scale, $a = a_0$, and consider $\mu_s(a_0, \theta)$ as a function of the rotation angle only. In general, it is a $2\pi$-periodic function of $\theta$. But when the analyzed object has rotational symmetry $n$, that is, it is invariant under a rotation of angle $\frac{2\pi}{n}$, the angular measure is in fact $\frac{2\pi}{n}$-periodic. To give a simple example, we consider three geometrical figures [13]: a square, a rectangle and a regular hexagon.
The square has symmetry $n = 4$; its angular measure $\mu_s(a_0, \theta)$ is thus $\frac{\pi}{2}$-periodic and shows four identical peaks at $\theta = 0°, 90°, 180°, 270°$. The width of these peaks is simply the aperture of the cone defined in (2.4). The rectangle has symmetry $n = 2$, and indeed its angular measure has two large peaks corresponding to the long edges and two smaller peaks corresponding to the short ones. Finally the hexagon has symmetry $n = 6$, and its angular measure shows six equal peaks. The same technique also allows one to identify the symmetry of a lattice or a quasi-lattice.
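The angular measure is straightforward to compute with the FFT. A sketch follows, using a Morlet-type wavelet as a stand-in for the paper's Cauchy wavelet, on a rectangle (the pixel-unit values $k_0 = 2.5$, $\epsilon = 10$ are illustrative choices, and the strict admissibility correction is omitted):

```python
import numpy as np

def angular_measure(img, theta, a=1.0, k0=2.5, eps=10.0):
    """mu_s(a, theta) = integral over b of |S(a, theta, b)|^2, computed via
    the FFT with a Morlet-type directional wavelet (a stand-in for the
    paper's Cauchy wavelet)."""
    n = img.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    c, s = np.cos(theta), np.sin(theta)
    kxr = a * (c * kx + s * ky)          # r_{-theta} rotation, then dilation
    kyr = a * (-s * kx + c * ky)
    psi_hat = np.exp(-0.5 * (eps * kxr**2 + (kyr - k0)**2))
    S = np.fft.ifft2(np.conj(psi_hat) * np.fft.fft2(img))
    return float(np.sum(np.abs(S) ** 2))

# A rectangle has symmetry n = 2: mu_s is pi-periodic in theta, and the
# two peaks from the long edges exceed those from the short edges.
img = np.zeros((64, 64))
img[24:40, 8:56] = 1.0
thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
mus = [angular_measure(img, t) for t in thetas]
```

Note that for a real image and this modulus-squared measure, $\pi$-periodicity also follows directly from $\hat{s}(-\vec{k}) = \overline{\hat{s}(\vec{k})}$; the rectangle's $n = 2$ symmetry shows up in the long-edge orientation dominating the short-edge one.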
5. Application 2: Disentangling of a wave train

A second example concerns the disentangling of a wave train, i.e. a linear superposition of damped plane waves. The problem originates from underwater acoustics: when a point source emits a sound wave above the surface of water, the wave hitting the surface splits into several components of very different characteristics, and the goal is to measure the parameters of all components. In the 2-D case [6], the underwater wave train is represented by the following signal:

$$f(\vec{x}) = \sum_{n=1}^{N} c_n\, e^{i\vec{k}_n \cdot \vec{x}}\, e^{-\vec{l}_n \cdot \vec{x}}, \tag{5.1}$$
where, for each component, $\vec{k}_n$ is the wave vector, $\vec{l}_n$ is the damping vector, and $c_n$ a complex amplitude. The method proceeds in three steps. First one computes the CWT of the signal (5.1) with a Morlet wavelet. Of course, by linearity, the result is the linear superposition of the contributions of the various components. Now we go to the scale-angle representation and write the WT, for fixed $\vec{b}$, as:
$$F(a, \theta, \vec{b}) = \sum_{n=1}^{N} c_n\, F_n(a, \theta). \tag{5.2}$$
We notice that each term $F_n(a, \theta)$ in this superposition admits a unique local maximum. Suppose that these local maxima are well separated. Then, barring some interference effects (which may often be alleviated by increasing the selectivity of the wavelet), one may write:

$$|F(a, \theta, \vec{b})| \simeq \sum_{n=1}^{N} |c_n|\, |F_n(a, \theta)|. \tag{5.3}$$
One then reverts to the position representation, choosing for $(a, \theta)$ successively each of the maxima. Then the filtering effect of the CWT essentially eliminates all components except the $n$-th one, which is then easy to treat. In this way, one is able to measure easily all the $6N$ parameters of the signal [6].
6. Application 3: Character recognition

Exactly as in the 1-D case, the WT is especially useful to detect discontinuities in images, for instance the contour or the edges of an object [5]. Here an isotropic wavelet may be chosen, e.g. the radial mexican hat $\psi_H$. In that case the effect of the WT consists in smoothing the signal with a Gaussian and taking the Laplacian of the result. Thus large values of the amplitude will appear at the location of the discontinuities, in particular the contour of objects (which is a discontinuity in luminosity). In order to test this property, we compute the WT of a set with the shape of a thick letter 'A', represented by its characteristic function [5]. For large values of the scale parameter $a$, the WT sees only the object as a whole, thus allowing the determination of its position in the plane. When $a$ decreases, increasingly finer details appear. The WT vanishes both inside and outside the contour, since the signal is constant there; thus only the contour remains, and it is perfectly seen at $a = 0.075$ (Figure 2).
Figure 2: Detecting the contour of the letter A with the radial mexican hat: (left) the CWT at $a = 0.075$, in level curves; (right) the same, in 3D perspective.

Of course, when $a$ gets too small, numerical artefacts (aliasing) appear and spoil the result. The corners of the figure are highlighted in the WT by sharp peaks. The amplitude is larger at these points, since
the signal is singular there in two directions, as opposed to the edges. In addition the WT detects the convexity of each corner. The six convex corners give rise to positive peaks, the concave ones yield negative peaks, because we use a real wavelet and plot the WT itself, not its modulus.

This exercise leads to an algorithm for automatic character recognition [10]. The letter 'A', for instance, is entirely characterized by the succession of its 12 corners and a logical flag (concavity or convexity) for each of them. The algorithm consists in locating the local maxima of the CWT and eliminating everything else by thresholding, and it is able to detect an 'A' unambiguously. Actually, since only the corners are needed, we may as well use a wavelet that sees only the corners, not the edges: typically, a directional wavelet (when misaligned), or a real wavelet such as the gradient wavelets $\partial_x \exp(-|\vec{x}|^2)$ or $\partial_y \exp(-|\vec{x}|^2)$. This simple technique may be further improved by adding some denoising and the inclusion of a second wavelet capable of dealing with letters of arbitrary shape (for instance, a ring-shaped wavelet sensitive to circular shapes). In addition, the automatic recognition device will need some training. An elegant solution would then be to use the simple wavelet treatment as a preprocessing for some sort of 'intelligent' device, such as a neural network.

References

[1] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992
[2] Y. Meyer, Wavelets: Algorithms and Applications, SIAM, Philadelphia, PA, 1993
[3] Y. Meyer (ed.), Wavelets and Applications, Springer-Verlag, Berlin, and Masson, Paris, 1991
[4] Y. Meyer and S. Roques (eds.), Progress in Wavelet Analysis and Applications, Ed. Frontières, Gif-sur-Yvette, 1993
[5] J-P. Antoine, P. Carrette, R. Murenzi and B. Piette, Image analysis with 2-D continuous wavelet transform, Signal Proc. 31 (1993) 241-272
[6] J-P. Antoine and R. Murenzi, Two-dimensional directional wavelets and the scale-angle representation, Signal Proc. 53 (1996) (to appear)
[7] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 674-693
[8] I. Daubechies, The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. Inform. Theory 36 (1990) 961-1005
[9] J-P. Antoine and R. Murenzi, The continuous wavelet transform, from 1 to 3 dimensions, in Subband and Wavelet Transforms: Design and Applications, pp. 149-187; A.N. Akansu and M. Smith (eds.), Kluwer, Dordrecht, 1995
[10] J-P. Antoine, P. Vandergheynst, K. Bouyoucef and R. Murenzi, Alternative representations of an image via the 2D wavelet transform. Application to character recognition, Proc. Conf. "Visual Information Processing IV", SPIE's 1995 Symposium on Optical Engineering/Aerospace Sensing and Dual Use Photonics, 2488 (1995) 486-497
[11] D. Marr, Vision, Freeman, San Francisco, 1982
[12] W. Wisnoe, P. Gajan, A. Strzelecki, C. Lempereur and J-M. Mathé, The use of the two-dimensional wavelet transform in flow visualization processing, in [4], pp. 455-458
[13] J-P. Antoine and P. Vandergheynst, 2-D Cauchy wavelets and symmetries in images, Proc. 1996 IEEE Intern. Conf. on Image Processing (ICIP-96), Lausanne (to appear)
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors). © 1996 Elsevier Science B.V. All rights reserved.
The Importance of the Phase of the Symmetric Daubechies Wavelets Representation of Signals

J.-M. Lina¹,²* and P. Drouilly¹
1: Centre de Recherches Mathématiques, Univ. de Montréal, C.P. 6128 Succ. Centre-Ville, Montréal (Québec), H3C 3J7, Canada
2: Atlantic Nuclear Services Ltd., Fredericton, New Brunswick, E3B 5C8, Canada
Abstract
The multiscale representations of signals based on Symmetric Daubechies Wavelets (SDW) are complex-valued. This work investigates the role of the phase in this representation and describes an iterative algorithm that restores the signals from the phase information only. We then discuss some applications in signal processing based on the information encoded in the phase of the complex wavelet coefficients.

1. Introduction
The title of this article deliberately refers to some early works related to complex representations of signals. In the early eighties, Oppenheim, Lim and Hayes investigated the "importance of the phase in signal processing" using the usual Fourier representation of signals [1]. More recently, the "Marseille group" also promoted the importance of the phases in the continuous wavelet representation of analytic signals [2]. Their so-called "ridge and skeleton" representation, based on both modulus and phase of the complex wavelet modes, is by now an important tool for the analysis of nonstationary signals. The recently studied Symmetric Daubechies Wavelets (SDW) [3] also exhibit complex-valued wavelets; the present work investigates the role of the phase in such an orthogonal basis. The paper is organized as follows. Section 2 briefly describes the SDW wavelet basis; more details and results can be found in Refs. [3,4]. In Section 3, the "phase reconstruction" algorithm is described and commented on. The "importance of the phases" and the applications it suggests are finally discussed in the conclusion.

2. The SDW basis

The SDW multiresolution basis is endowed with the usual Daubechies properties: the scaling function φ(x) and the wavelet ψ(x) are compactly supported inside the interval [−J, J+1] for some integer J, the set {ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k)} is an orthonormal basis of L²(ℝ), and ψ has J vanishing moments. The symmetry condition forces J to be even, and the so-called SDW solutions are complex-valued but not in quadrature. In general, a field with finite energy will be "empirically known" at some scale and expanded in some approximation space V_{j_max} spanned by the suitably scaled scaling functions φ_{j_max,k}(x) = 2^{j_max/2} φ(2^{j_max} x − k):

f(x) = Σ_k c_k^{j_max} φ_{j_max,k}(x)    (1)

The discrete multiresolution analysis of f then consists in the computation of the coefficients of the expansion

f(x) = Σ_k c_k^{j_0} φ_{j_0,k}(x) + Σ_{j=j_0}^{j_max−1} Σ_k d_k^j ψ_{j,k}(x)    (2)

where j_0 is a given low resolution scale and

c_k^j = ⟨φ_{j,k}|f⟩,    d_k^j = ⟨ψ_{j,k}|f⟩    (3)

*e-mail: [email protected]
are the orthogonal projection components of f(x) onto the multiresolution basis. The change of basis is done through the well-known fast wavelet decomposition algorithm (FWT), composed of the low-pass complex filter a_k and the corresponding high-pass complex filter b_k = (−1)^k ā_{1−k}:

c_n^{j−1} = √2 Σ_k ā_{k−2n} c_k^j    and    d_n^{j−1} = √2 Σ_k b̄_{k−2n} c_k^j    (4)

Conversely, the reconstruction is expressed by the inverse FWT:

c_n^j = √2 Σ_k a_{n−2k} c_k^{j−1} + √2 Σ_k b_{n−2k} d_k^{j−1}    (5)
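One FWT stage (4) and its inverse (5) can be sketched with a periodized implementation and a generic orthonormal filter. The alternating-flip high-pass below is a common variant of the convention b_k = (−1)^k ā_{1−k} used in the text, and the real Haar filter stands in for a complex SDW filter; this is an illustrative sketch, not the authors' code:

```python
import numpy as np

def analysis(c0, h):
    """One FWT step (cf. Eq. (4)): periodized filtering and downsampling by 2.
    h is the low-pass filter; g is its alternating flip (high-pass)."""
    N, L = len(c0), len(h)
    g = np.array([(-1) ** k * np.conj(h[L - 1 - k]) for k in range(L)])
    c1 = np.array([sum(np.conj(h[k]) * c0[(2 * n + k) % N] for k in range(L))
                   for n in range(N // 2)])
    d1 = np.array([sum(np.conj(g[k]) * c0[(2 * n + k) % N] for k in range(L))
                   for n in range(N // 2)])
    return c1, d1

def synthesis(c1, d1, h):
    """Inverse FWT step (cf. Eq. (5)): upsampling by 2 and filtering."""
    N, L = 2 * len(c1), len(h)
    g = np.array([(-1) ** k * np.conj(h[L - 1 - k]) for k in range(L)])
    rec = np.zeros(N, dtype=complex)
    for n in range(N // 2):
        for k in range(L):
            rec[(2 * n + k) % N] += h[k] * c1[n] + g[k] * d1[n]
    return rec

# Perfect reconstruction, checked here with the (real) Haar filter:
x = np.arange(8, dtype=complex)
h = np.array([1.0, 1.0]) / np.sqrt(2)
assert np.allclose(synthesis(*analysis(x, h), h), x)
```

Any orthonormal filter of this family gives perfect reconstruction; only the coefficient values change when a complex SDW filter is substituted.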
The real and imaginary parts of the complex scaling function and wavelet are endowed with many interesting properties, studied in Ref. [4]. Figures 1 and 2 show examples of those functions for J = 2 and J = 4.

Figure 1: SDW for J = 2. Left: φ, right: ψ.

Figure 2: SDW for J = 4. Left: φ, right: ψ. The corresponding filter coefficients √2 a_k are given in the following table:
  k        J = 2                      J = 4
  1    0.662912 + 0.171163i       0.643003 + 0.182852i
  2    0.110485 − 0.085581i       0.151379 − 0.094223i
  3    0.066291 − 0.085581i      −0.080639 − 0.117947i
  4    0.000000                   0.017128 + 0.008728i
  5    0.000000                   0.010492 + 0.020590i
It is worth mentioning that the second centered moment of the real part of the scaling function always vanishes. As a consequence, assuming that f is a real function sampled at the scale 2^{−j_max}, i.e. x_k = (2k−1)/2^{j_max+1}, the coefficients of the expansion (1) are given by

c_k^{j_max} = f(x_k) + O(2^{−4 j_max}) + i O(2^{−2 j_max})    (6)
The scaling coefficients are thus well approximated by the values of the function at the regular sampling points.

3. Phase and POCS

We first define two projectors, P_R and P_Γ. The projector P_R extracts the first order approximation of the scaling coefficients of the expansion (1) at the finest resolution (ℜ denotes the real part):

P_R(c^{j_max}) = ℜ(c^{j_max})    (7)

Let us now consider the wavelet expansion (2) of a given field f_0 and define the phases of the wavelet coefficients, θ_k^j = Arg(d_k^j). We observe that the new set of functions Ψ_{j,k}(x) = e^{iθ_k^j} ψ_{j,k}(x) is also an orthonormal basis of L²(ℝ): this "local rotation" of the wavelet basis leads to a multiwavelet basis adapted to the signal. Indeed, we define the isophase space Γ as the set of all expansions

f(x) = Σ_k c_k^{j_0} φ_{j_0,k}(x) + Σ_{j=j_0}^{j_max−1} Σ_k r_k^j Ψ_{j,k}(x)    (8)

where the coefficients r_k^j are now positive real numbers. P_Γ is the orthogonal projector onto this space; it depends on the phases of the wavelet coefficients of the original field we start with. Given an arbitrary wavelet expansion of the form (2) with d_k^j = w_k^j + i v_k^j, the projection onto the isophase space is defined by the closest point on Γ, i.e.

P_Γ(f) = Σ_k ⟨φ_{j_0,k}|f_0⟩ φ_{j_0,k} + Σ_{j=j_0}^{j_max−1} Σ_k r_k^j Ψ_{j,k},  with  r_k^j = cos θ_k^j w_k^j + sin θ_k^j v_k^j if this quantity is positive, and r_k^j = 0 otherwise.    (9)

We further observe that both P_R and P_Γ project onto convex spaces (POCS). Considering an arbitrary point f̃_0 in Γ, a well-known theorem [1] states that the sequence of alternate projections

f_n = (P_R P_Γ)^n P_R(f̃_0)    (10)

converges. In the present case, the limit point is the original real signal f_0 from which we defined Γ. The two-dimensional generalization of this algorithm is straightforward, using the usual cross-product of the 1-D multiresolution bases. To illustrate the "phase reconstruction algorithm", Fig. 3 displays the original picture (f_0), the initial point P_R(f̃_0) (obtained by setting to zero all the moduli of the wavelet coefficients of the four-level decomposition, i.e. j_0 = j_max − 4, with SDW J = 2) and the POCS reconstructions f_{n=100} and f_{n=1000}.

Figure 3: From left to right: f_0, P_R(f̃_0), f_{100} and f_{1000}.

We first notice that the POCS gradually restores the details of the image from coarse to fine. As we can see in Eq. (9), the projector P_Γ "shrinks", even to 0, the moduli of the wavelet coefficients. It is worth recalling that shrinkage techniques are nowadays an efficient tool for denoising [5]. Phases thus encode the "coherent" structures of the signal, and the POCS algorithm reconstructs the original image through the coherency of the encoded information. In order to quantify this process, Fig. 4(a) shows the evolution of the theoretical dimension (exponential of the entropy) and Fig. 4(b) displays the ratio of phases effectively used at each iteration of POCS. The restoration of the moduli of the wavelet coefficients is illustrated in Fig. 4(c) for the level j = j_0 = j_max − 4 and in Fig. 4(d) for j = j_max − 1, both for f_{n=1000}. We can observe the resulting shrinkage of the wavelet coefficients, which depends on the scale of the details.
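The alternating-projection iteration (10) can be sketched with the Fourier basis standing in for the SDW basis (the isophase projector then has exactly the per-coefficient form of Eq. (9)). Since both projectors fix f_0 and project onto convex sets, the error ‖f_n − f_0‖ is non-increasing; this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def proj_real(f):
    # P_R: orthogonal projection onto real-valued signals (cf. Eq. (7)).
    return np.real(f)

def proj_isophase(f, theta):
    # P_Gamma: closest signal whose Fourier coefficients carry the phases
    # `theta` with non-negative moduli -- the per-coefficient rule of Eq. (9):
    # r = max(0, cos(theta) w + sin(theta) v) = max(0, Re(F e^{-i theta})).
    F = np.fft.fft(f)
    r = np.maximum(np.real(F * np.exp(-1j * theta)), 0.0)
    return np.fft.ifft(r * np.exp(1j * theta))

rng = np.random.default_rng(0)
f0 = rng.standard_normal(64)
theta = np.angle(np.fft.fft(f0))                 # the retained phase information

f = proj_real(np.fft.ifft(np.exp(1j * theta)))   # phase-only starting point
errors = []
for _ in range(20):                              # f_{n+1} = P_R P_Gamma f_n, cf. Eq. (10)
    f = proj_real(proj_isophase(f, theta))
    errors.append(np.linalg.norm(f - f0))

# Projections onto convex sets containing f0 are non-expansive:
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(errors, errors[1:]))
```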
Figure 4: (a) theoretical dimension vs. iteration; (b) phases vs. iteration (%); (c) coarse scale wavelet modulus of f_{1000} vs. original wavelet modulus; (d) finer scale wavelet modulus of f_{1000} vs. original wavelet modulus.

Let us further mention two points: (i) the significant speed-up of the POCS algorithm obtained by adaptively fixing a relaxation parameter in the isophase projector and, last but not least, (ii) the alternative possibilities in choosing the Symmetric Daubechies Wavelet basis since, as shown in Ref. [4], there exist several such solutions for a given number of vanishing moments J.

4. Conclusion

The present work emphasizes the role of the phase of the wavelet modes in the rather new context of Symmetric Daubechies Wavelet analyses. An immediate application is an iterative process for denoising signals for which little information about the noise content is available. Another application presently under development is edge enhancement by modifying the phases of the wavelet coefficients. This technique, which relies on the phases of the complex scaling coefficients at each scale of the decomposition [4], can be coupled with the shrinkage of the wavelet coefficients, leading to simultaneous denoising. Most of the material presented here is detailed in Ref. [6]. This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

REFERENCES
1. A.V. Oppenheim and J.S. Lim, "The importance of phase in signals", Proc. IEEE, vol. 69, pp. 529-541, 1981; M. Hayes, "The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform", IEEE Trans. ASSP, vol. 30, pp. 140-154, 1982; D.C. Youla and H. Webb, "Image restoration by the method of convex projections", IEEE Trans. on Medical Imaging, vol. 1(2), pp. 81-94, 1982.
2. A. Grossmann, R. Kronland-Martinet and J. Morlet, "Reading and Understanding Continuous Wavelet Transforms", in Wavelets, Time-Frequency Methods and Phase Space, J.M. Combes et al., Eds., Springer-Verlag, 1989.
3. J.M. Lina and M. Mayrand, "Complex Daubechies Wavelets", Appl. Comp. Harmonic Anal., vol. 2, pp. 219-229, 1995; W. Lawton, "Applications of Complex Valued Wavelet Transforms to Subband Decomposition", IEEE Trans. on Signal Proc., vol. 41, pp. 3566-3568, 1993.
4. J.M. Lina, "Image Processing with Complex Daubechies Wavelets", CRM preprint 2335, to appear in the special wavelet issue of the Jour. of Math. Imaging and Vision, 1996.
5. R. De Vore and B.J. Lucier, "Fast wavelet techniques for near-optimal image processing", Proc. 1992 IEEE Military Commun. Conf., IEEE Communications Soc., NY, 1992; D. Donoho and I. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage", to be published in J. Amer. Statist. Assoc., 1995 (and references therein).
6. P. Drouilly, M.Sc. Thesis, Physics Dept. and CRM, Univ. of Montreal, 1996.
Contrast enhancement in images using the two-dimensional continuous wavelet transform

Jean-Pierre Antoine and Pierre Vandergheynst*
Institut de Physique Théorique, Université Catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium
E-mail: [email protected]

Abstract

The 2-D Continuous Wavelet Transform (CWT) is now a standard tool in image analysis. In this paper we recall the definition of local contrast, introduced by M. Duval-Destin [3], and show how it can be used to obtain a discrete reconstruction formula.

1 The 2-D Continuous Wavelet Transform
Let us first recall the basic properties of the 2-D CWT. We represent an image by a function s ∈ L²(ℝ², d²x). In practice it is often a positive-valued function whose discrete values correspond to the gray level of each pixel. Then a wavelet is just a function ψ ∈ L²(ℝ², d²x) that satisfies the following admissibility condition:

c_ψ ≡ (2π)² ∫ |ψ̂(k)|² / |k|²  d²k < ∞    (1.1)

For a sufficiently regular function, this simply means that the wavelet has zero mean:

ψ̂(0) = 0  ⟺  ∫ d²x ψ(x) = 0.    (1.2)

The continuous wavelet transform of the image s with respect to the wavelet ψ is defined as the scalar product of s with the transformed wavelet ψ_{a,θ,b}:

S(a, θ, b) = ⟨ψ_{a,θ,b}|s⟩ = a⁻² ∫ ψ̄(a⁻¹ r_{−θ}(x − b)) s(x) d²x = ∫ e^{ib·k} ψ̂(a r_{−θ} k)‾ ŝ(k) d²k,    (1.3)

where we assumed ψ to be normalized (c_ψ = 1). This transform has nice natural properties (see [1] for a quick review); in particular, it is invertible:

s(x) = ∫∫∫ (da/a) dθ d²b  ψ_{a,θ,b}(x) S(a, θ, b).    (1.4)
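The zero-mean condition (1.2) can be checked numerically: implementing the CWT at a fixed scale in the Fourier domain with an isotropic Mexican hat, the transform of a constant image vanishes up to round-off. A minimal sketch (grid size and scale are arbitrary choices, not values from the paper):

```python
import numpy as np

# Isotropic Mexican hat in the Fourier domain: psi_hat(k) ~ |k|^2 exp(-|k|^2/2),
# so psi_hat(0) = 0, i.e. the wavelet has zero mean (Eq. (1.2)).
N, a = 64, 2.0
k = 2 * np.pi * np.fft.fftfreq(N)
kx, ky = np.meshgrid(k, k)
k2 = kx ** 2 + ky ** 2
psi_hat = (a ** 2 * k2) * np.exp(-(a ** 2 * k2) / 2)

s = np.full((N, N), 3.0)                              # constant "image"
S = np.fft.ifft2(np.fft.fft2(s) * np.conj(psi_hat))   # CWT at scale a, angle 0
assert np.max(np.abs(S)) < 1e-9                       # zero response to a constant
```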
2 Local Contrast

Psychophysicists usually think that our visual system is contrast sensitive, that is, it reacts to relative variations of the image intensity. On the other hand, it is well known that the 2-D CWT is particularly good at detecting absolute variations of the intensity, for example discontinuities [2]. Following this idea, it might be useful to introduce an adaptive normalization of the CWT which takes care of the surrounding luminance of each pixel. More precisely, let h ∈ L¹(ℝ²) ∩ L²(ℝ²) be a real, positive-valued function.* We also take h and the wavelet ψ isotropic, in order to get rid of any directional sensitivity. We define the luminance level of the image s around position b as

M_s(a, b) = ||h||₁⁻¹ ∫_{ℝ²} d²x s(x) h_{a,b}(x),    (2.1)

where h_{a,b}(x) = a⁻² h(a⁻¹(x − b)). Since the image and the normalization function are positive, M_s is also positive. M_s plays the role of a local mean of s, and the overall luminance of the image does not depend on a, the scale parameter fixing the size of the region taken into account. Introduce now the following functional:

C_s(a, b) = S(a, 0, b) / M_s(a, b).    (2.2)

C_s(a, b) is well defined for all a > 0 and b ∈ ℝ², if and only if the essential support of the wavelet is included in the corresponding support of the normalization function h. Then, by positivity of h and s,

M_s(a, b) = 0  ⟹  C_s(a, b) = 0.

C_s(a, b) is called the local contrast of the image s around b at scale a [3]. One easily verifies from (2.2) that C_s takes its largest values in regions of low luminance. A nice simplification arises when the wavelet ψ is taken as a difference of two positive functions,

ψ(x) = α⁻² h(α⁻¹ x) − h(x)    (0 < α < 1).

The function ψ satisfies (1.1) if the first moment of h vanishes at the origin. Taking the same function to compute the luminance, we have

C_s(a, b) = ⟨ψ_{a,b}|s⟩ / ⟨h_{a,b}|s⟩ = ⟨h_{αa,b}|s⟩ / ⟨h_{a,b}|s⟩ − 1.    (2.3)

*Boursier F.R.I.A., Belgium.
The support condition imposed on the wavelet now turns into a constraint on h alone:

supp{α⁻² h(α⁻¹ ·)} ⊆ supp h,

which means that the support of h is a star-shaped domain around the origin. Notice that it suffices that h decays radially for C_s to be bounded. In the real world, the approximation of an image at a fixed resolution is the only accessible data. This can be viewed as an estimation of the luminance of the image at a given scale. In this case, using difference wavelets, local contrast yields a simple reconstruction scheme. Let a₀ be the finest resolution (scale), that is, M_s(a₀, b) is the original dataset, and let M_s(a, b) be a low resolution approximation of the image, with a₀ = αⁿ a, α < 1. We have

M_s(αa, b) = M_s(a, b) · (C_s(a, b) + 1),
M_s(α²a, b) = M_s(αa, b) · (C_s(αa, b) + 1) = M_s(a, b) · (C_s(a, b) + 1) · (C_s(αa, b) + 1).

By recurrence, we find a multiplicative reconstruction formula:

M_s(a₀, b) = M_s(a, b) ∏_{l=0}^{n−1} (C_s(αˡ a, b) + 1).    (2.4)

That is, one reconstructs the original image at full resolution by starting with a low resolution approximation and adding successive details. This clearly mimics the usual multiresolution analysis scheme. Figure 1 shows an example of a local contrast chart of an image, and its reconstruction using (2.4).
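The recurrence above holds exactly for any positive smoothing kernel, so the multiplicative formula (2.4) can be verified directly. Here is a 1-D sketch with a Gaussian standing in for the isotropic 2-D function h (signal and scales are illustrative choices):

```python
import numpy as np

def luminance(s, a):
    # M_s(a, .): smoothing of the positive image s by a unit-mass Gaussian
    # of width a -- a 1-D stand-in for Eq. (2.1).
    x = np.arange(-int(4 * a), int(4 * a) + 1, dtype=float)
    h = np.exp(-(x / a) ** 2 / 2)
    return np.convolve(s, h / h.sum(), mode="same")

alpha = 0.5
s = np.abs(np.sin(np.linspace(0, 6, 128))) + 0.1      # positive "image"
scales = [8.0 * alpha ** l for l in range(4)]          # a, alpha a, alpha^2 a, ...
M = [luminance(s, a) for a in scales]
C = [M[l + 1] / M[l] - 1 for l in range(3)]            # local contrast, Eq. (2.3)

# Multiplicative reconstruction, Eq. (2.4): finest = coarsest * prod(1 + C)
rec = M[0] * np.prod([1 + c for c in C], axis=0)
assert np.allclose(rec, M[-1])
```

The check is exact by construction, since the factors (1 + C) telescope into the ratio of finest to coarsest luminance.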
Figure 1: An image (a) and its CWT using a Mexican hat wavelet with a = 0.125 (b). Local contrast of the same image (c) and reconstruction over five dyadic scales using a Gaussian normalization function (d).
3 Infinitesimal multiresolution analysis
In this section we briefly show how to generalize the previous formalism; the interested reader should refer to [4, 5] for more details. It is well known that the continuous reconstruction formula (1.4) allows one to use two different wavelets, one for decomposition and the other for reconstruction, provided they satisfy a cross admissibility condition:

(2π)² ∫_{ℝ²} ψ̂(k)‾ χ̂(k) / |k|²  d²k = 1.    (3.1)

This yields the so-called bilinear scheme¹ of the CWT [4]. If the analyzing wavelet ψ is regular enough, one can choose as reconstructing wavelet a Dirac delta measure, getting a simpler reconstruction formula, which is just the sum of the wavelet coefficients over all scales:

s(x) = ∫_{ℝ₊} (da/a) ⟨ψ_{a,x}|s⟩.    (3.2)

This is the so-called linear scheme, and we will use it in the sequel (although everything extends to the bilinear formalism). Let ρ be a positive-valued smoothing function, that is, ρ̂(0) = 1 and ρ̂ is continuous and derivable. We define the infinitesimal wavelet associated to ρ as

ψ̂(k) = −|k| (d/d|k|) ρ̂(k).

Then we introduce the approximation of s at scale (resolution) a,

(σ_a s)(b) = ⟨ρ_{a,b}|s⟩,

and the details of s at scale a,

(d_a s)(b) = ⟨ψ_{a,b}|s⟩.

The original image can be expressed as

s = lim_{a→0} σ_a s = σ_{a₀} s + ∫_0^{a₀} (da/a) (d_a s).

¹Bilinear with respect to the wavelet.

Now let us choose an arbitrary decreasing sequence of scales {a_j}_{j=0}^{J}, a_{j+1} < a_j < a_{j−1}, where a_J (resp. a₀) stands for the finest (resp. coarsest) resolution. We call wavelet packets the following integrated filters:

Ψʲ(x) = ∫_{a_{j+1}}^{a_j} (da/a) ψ_a(x).    (3.3)

These wavelet packets allow one to compute the approximation of the image at resolution a_{j+1} from the coarser approximation at resolution a_j, yielding a discrete reconstruction formula similar to that of ordinary multiresolution analysis:

s(x) = σ_{a₀}s(x) + Σ_{j=0}^{J} ⟨Ψʲ_x|s⟩.    (3.4)

Now, if we introduce the following contrast coefficients,

C_j = ⟨Ψʲ|s⟩ / σ_{a_j}s,

a multiplicative reconstruction formula similar to (2.4) naturally comes in:

s(x) = σ_{a₀}s(x) ∏_{j=0}^{J−1} (1 + C_j(x)).    (3.5)

Thus again, contrast coefficients form sufficient information for the characterization of the analyzed signal.

Remarks:
- If (a_j) is a geometric sequence, a_j = λʲ a₀, then the wavelet packets are simply wavelets in the usual sense, since they are generated from a unique function ψ by dilations by powers of λ. The choice a_j = 2⁻ʲ yields the familiar dyadic wavelet analysis [6].
- Wavelet packets do not form an orthogonal basis, but they characterize the signal without loss of information.
- Anisotropic wavelet packets can be used; this simply results in a sum over all possible orientations in (3.4).
- Fast pyramidal algorithms are available [5], just like in the usual discrete wavelet transform scheme.
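Since each wavelet packet integrates the details between two consecutive scales, the additive formula (3.4) is a telescoping sum of differences of approximations. A 1-D numerical sketch, with Gaussian smoothing standing in for σ_a and illustrative scales:

```python
import numpy as np

def sigma(s, a):
    # sigma_a s: approximation of s at scale a (Gaussian smoothing, 1-D sketch).
    x = np.arange(-int(4 * a), int(4 * a) + 1, dtype=float)
    g = np.exp(-(x / a) ** 2 / 2)
    return np.convolve(s, g / g.sum(), mode="same")

s = np.sin(np.linspace(0, 6, 256)) + 2.0
a = [16.0, 8.0, 4.0, 2.0, 1.0]                   # a_0 coarsest ... a_J finest
details = [sigma(s, a[j + 1]) - sigma(s, a[j])   # wavelet-packet details <Psi^j|s>
           for j in range(len(a) - 1)]

rec = sigma(s, a[0]) + sum(details)              # discrete reconstruction, Eq. (3.4)
assert np.allclose(rec, sigma(s, a[-1]))         # telescoping is exact
```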
References
[1] J.-P. Antoine, Two-dimensional directional wavelets and image processing, this volume.
[2] J.-P. Antoine, P. Carrette, R. Murenzi and B. Piette, Image analysis with two-dimensional wavelet transform, Signal Processing 31 (1993) 241-272.
[3] J.-P. Antoine, R. Murenzi, B. Piette and M. Duval-Destin, Image analysis with 2-D continuous wavelet transform: detection of position, orientation and visual contrast of simple objects, in Wavelets and Applications (Proc. Marseille 1989), pp. 144-159, Y. Meyer (ed.), Masson, Springer-Verlag, 1992.
[4] M. Duval-Destin, M.A. Muschietti and B. Torresani, Continuous wavelet decompositions, multiresolution and contrast analysis, SIAM J. Math. Anal. 24 (1993) 739-755.
[5] M.A. Muschietti and B. Torresani, Pyramidal algorithms for Littlewood-Paley decompositions, SIAM J. Math. Anal. 26 (1995) 925-943.
[6] S. Mallat and W.L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inform. Theory 38 (1992) 617-643.
WAVELETS AND DIFFERENTIAL-DILATION EQUATIONS

T. Cooklev†, G. Berbecel‡, and A. N. Venetsanopoulos†

†Dept. Electr. & Comp. Eng., University of Toronto, 10 King's College Rd., Toronto, ON M5S 3G4, Canada
‡Genesis Microchip Inc., 200 Town Centre Boulevard, Suite 400, Markham, ON L3R 8G5, Canada
ABSTRACT
In this paper a wavelet is constructed starting from a differential-dilation equation. It has compact support and excellent time-domain and frequency-domain localization properties. The wavelet is infinitely differentiable and therefore cannot be obtained using digital filter banks. New sampling and differentiation techniques are also introduced.
1
INTRODUCTION
Wavelets are an important tool in signal processing. There are three types of wavelet transforms: the continuous-time wavelet transform (CWT), the discrete-time wavelet transform, and the discrete wavelet transform (DWT). The CWT is defined as [1]

X(a, b) = ⟨x(t), ψ_{a,b}(t)⟩ = (1/√a) ∫_ℝ x(t) ψ̄((t − b)/a) dt.    (1)

The continuous-time wavelet transform depends on two parameters: dilation a and shift b. The CWT is, in principle, invertible, provided the wavelet is admissible (i.e., it has sufficient decay). The wavelet transform involves basis functions which do not have a constant length: very short basis functions are used to achieve good time resolution, while longer basis functions can be used to obtain fine frequency analysis. When a and b are continuous, the set of basis functions does not constitute an orthonormal basis, i.e., the representation is redundant. The discrete-time wavelet transform can be obtained by discretizing a and b. The basis functions become ψ_{m,n}(t) = a₀^{−m/2} ψ(a₀^{−m} t − n b₀), where a = a₀^m and b = n b₀ a₀^m. The case where a₀ = 2 and b₀ = 1 is the most common, and the corresponding grid is called dyadic. The discrete wavelet transform (DWT) corresponds to a filter bank iterated along the lowpass channel. In this paper we shall be concerned only with the CWT. A very important question in wavelet analysis is choosing the basis function, and this is the focus of our concern in this paper. We are looking for a continuous-time wavelet ψ(t) that is infinitely differentiable, has compact support and provides good frequency localization. A wavelet with these three properties has not been used in signal processing. The Mexican hat and Morlet wavelets are infinitely differentiable, but do not have compact support. Wavelets generated from filter banks can have compact support, but cannot be infinitely differentiable.
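As an illustration of Eq. (1): the CWT of a pure sinusoid of frequency ω computed with a Mexican hat wavelet peaks at the scale a ≈ √2/ω, so a lower-frequency tone is matched by a larger scale. A minimal Fourier-domain sketch (the Mexican hat here is a stand-in, not the wavelet constructed in this paper; signal lengths and scale grid are arbitrary):

```python
import numpy as np

def cwt_mexhat(x, scales):
    # CWT of Eq. (1) evaluated in the Fourier domain with the Mexican hat,
    # whose Fourier transform is proportional to w^2 exp(-w^2/2).
    X = np.fft.fft(x)
    w = 2 * np.pi * np.fft.fftfreq(len(x))
    rows = []
    for a in scales:
        psi_hat = (a * w) ** 2 * np.exp(-((a * w) ** 2) / 2)
        rows.append(np.fft.ifft(X * psi_hat))
    return np.array(rows)

t = np.arange(1024)
scales = np.geomspace(0.5, 40.0, 80)
e_low = np.mean(np.abs(cwt_mexhat(np.sin(0.2 * t), scales)) ** 2, axis=1)
e_high = np.mean(np.abs(cwt_mexhat(np.sin(0.8 * t), scales)) ** 2, axis=1)

# |Psi(a w)|^2 is maximal at a w = sqrt(2): the low tone peaks at a larger scale.
assert scales[np.argmax(e_low)] > scales[np.argmax(e_high)]
```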
2 THE BASIC CONSTRUCTION

Iteration of a digital filter followed by downsampling leads to a limit function, provided the filter satisfies certain constraints [1]. Downsampling and upsampling are discrete-time multirate operations. We are going to find it useful to define a continuous-time decimator (Fig. 1(a)). While the term "continuous-time decimator" may not be ideally appropriate, the idea is clear: the support of the function f(t) shrinks by a factor of two, f(t) → f(2t). Note that the block in Fig. 1(a) is purely a mathematical tool that is only conceptually similar to the discrete-time decimator. Suppose now that the blocks of continuous-time filtering and decimation are cascaded and iterated (Fig. 1(b)). The impulse and frequency responses of the resulting system after two stages will

Figure 1: (a) Continuous-time decimator and (b) infinite iteration of a continuous-time system followed by decimator.
be φ₂(t) = 4h(4t) * 2h(2t) and Φ₂(ω) = H(ω/4) H(ω/2). The functions φ_i satisfy a dilation equation Φ_i(ω) = H(ω/2) Φ_{i−1}(ω/2). If we continue the iteration to infinity and assume convergence, the impulse and frequency responses of the system will be

φ(t) = lim_{i→∞} φ_i(t) = lim_{i→∞} [2^i h(2^i t) * 2^{i−1} h(2^{i−1} t) * ... * 2 h(2t)],    (2)

Φ(ω) = lim_{i→∞} Φ_i(ω) = ∏_{i=1}^{∞} H(ω/2^i).    (3)

Note that the support of φ(t) is equal to the support of h(t). The iterations make sense only if they converge. The simplest case is when

h(t) = 1/2 for −1 ≤ t ≤ 1, and 0 otherwise.    (4)
which can be written as O(w) -
- sin(w/2) 0a/2 ~ ( 2 ) ,
eJw/2 _ e-jw/2 9 2j~/2
or jw@(w)
de(t) The last equation in the time domain is ~ - 2 [r
+ 1) - r
-
~(0)-
(e j~12
-
1.
e-j 12)
- 1)].
(5)
(6)
(7)
The above equation is a differential-dilation equation, a new type of equation for a scaling function. The support of the function φ(t) is [−1, 1]. There is no analytic expression for φ(t), but there is an elegant formula for its Fourier transform:

φ(t) = (1/2π) ∫_{−∞}^{∞} e^{jωt} ∏_{i=1}^{∞} sinc(ω/2^i) dω    (8)
     = (1/π) ∫_0^{∞} cos(ωt) ∏_{i=1}^{∞} sinc(ω/2^i) dω.    (9)
Proposition 1. The function Φ(ω) = ∏_{i=1}^{∞} sinc(ω/2^i) is a well-defined function and is continuous.

The proof is based on the result that the Fourier transform of a compactly-supported function is continuous. It can be proven that ∫_{−∞}^{∞} φ(t) dt = 1, because the conditions of the theorem of Fubini are satisfied and the integral of the convolution is equal to the product of the integrals. In general, the function φ(t) has properties between polynomials (which are solutions of differential equations) and scaling functions (which are solutions of dilation equations): it is smoother than scaling functions, but less smooth than polynomials. The odd-indexed moments of the function φ(t) are zero, since in the expansion

Φ(ω) = Σ_{k=0}^{∞} c_k ω^k    (10)

we have

c_{2k+1} = Φ^{(2k+1)}(0) / (2k+1)! = ((−1)^k / (2k+1)!) ∫_{−∞}^{∞} t^{2k+1} φ(t) dt = 0.    (11)

The function φ(t) has a smaller second-order moment than B-splines, which means that it has better localization properties:

∫_{−∞}^{∞} t² φ(t) dt = − d²Φ(ω)/dω² |_{ω=0} = ... = 1/9.    (12)

For the B-spline function of order N,

β̂_N(ω) = sinc^N(ω/2),    N ≥ 1,    (13)

and it can be shown that

∫ t² β_N(t) dt = N/12,    N ≥ 1.    (14)
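The area and the second moment just quoted can be checked numerically by truncating the infinite convolution (2) of unit-area boxes (the grid spacing and truncation depth below are arbitrary choices; the second moment is the sum of the box variances, Σ_{i≥1} (2⁻ⁱ)²/3 = 1/9):

```python
import numpy as np

dt = 1.0 / 2048
phi = np.array([1.0 / dt])              # discrete delta of unit area
for i in range(1, 12):                  # truncate the infinite convolution (2)
    n = int(round(2.0 ** (-i) / dt))    # 2^i h(2^i t): box on [-2^-i, 2^-i]
    box = np.ones(2 * n + 1)
    box /= box.sum() * dt               # unit area on the grid
    phi = np.convolve(phi, box) * dt

t = (np.arange(len(phi)) - (len(phi) - 1) / 2) * dt
assert abs(phi.sum() * dt - 1.0) < 1e-9               # integral of phi is 1
assert abs((phi * t ** 2).sum() * dt - 1 / 9) < 1e-3  # second moment approx 1/9
assert t[0] >= -1.0 and t[-1] <= 1.0                  # support within [-1, 1]
```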
Figure 2: (a) The wavelet ψ(t) and (b) the CWT of a signal that is two sinusoids and two pulses.

2.1 The corresponding wavelet

The function ψ(t) = φ′(t) is a mother wavelet, since it satisfies the admissibility condition:

∫_0^{∞} |Ψ(ω)|² / ω  dω < ∞.    (15)
The wavelet ψ(t) has compact support, excellent localization characteristics and is infinitely differentiable. This wavelet has been used in signal analysis with very good results. A simple example is given in Fig. 2. The CWT gives a picture of how the frequency content of a signal varies with time. We hope that the function ψ(t) will be a useful "benchmark" wavelet in addition to the Morlet and Mexican hat wavelets.

2.2 Approximation of polynomials using the function φ(t)

An important property is that any polynomial can be represented by dilations and translations of the function φ(t). This is convenient, because polynomials are not square-integrable functions. If we start from

Σ_k φ(t − k) = 1,    (16)
then, by integrating the partition of unity (16) and repeatedly applying the differential-dilation equation (7), which expresses φ′(t) through the difference φ(2t + 1) − φ(2t − 1), one derives step by step an expansion of t in dilated and translated copies of φ (equations (17)-(21)); a further integration, involving the moments c₀ and c₁ of φ, then yields the corresponding expansion of t² (equations (22)-(25)).
It is plain to see that we can continue in a similar way for higher powers of t. The formulae, as well as other results, cannot be given here due to the space limitation.
2.3 Reconstruction of a continuous-time function from its samples using the function φ(t)

Consider now the space V = span{φ(t − k); k ∈ ℤ} and suppose f(t) ∈ V: f(t) = Σ_k c_k φ(t − k). We are looking for a function u(t) ∈ V which satisfies

f(t) = Σ_k f(k) u(t − k).    (26)

We want the samples of the function f(t) to be the coefficients in the expansion. From

f(k) = Σ_l c_l φ(k − l) = c_k    (27)

it follows that the function u(t) must be exactly φ(t). In the space V the interpolation function is φ itself. The space V can be written as V = V_odd ⊕ V_even, where V_odd = span{φ[2t − (2k + 1)], k ∈ ℤ} and V_even = span{φ(2t − 2k), k ∈ ℤ}. The functions {φ(2t − 2k − 1), k ∈ ℤ} and {φ(2t − 2k), k ∈ ℤ} are orthonormal bases for the spaces V_odd and V_even respectively.

2.4 A differentiation technique

We can derive an efficient algorithm for differentiation. If

f(t) = Σ_k c_k φ(t − k),    (28)

then

f′(t) = Σ_k 2 c_k [φ(2(t − k) + 1) − φ(2(t − k) − 1)] = Σ_k 2 (c_{k−1} − c_k) φ(2(t − k) − 1).    (29)

If d₁(k) is defined as

d₁(k) = 2[δ(k − 1) − δ(k)],    (30)

then

f′(t) = Σ_k [d₁ * c](k) φ(2(t − k) − 1).    (31)

For the second derivative,

f″(t) = 8 Σ_k (c_{k−1} − 2c_k + c_{k+1}) φ(4(t − k)),    (32)

and if we define d₂(k) in a similar way,

d₂(k) = 8[δ(k − 1) − 2δ(k) + δ(k + 1)],    (33)

then

f″(t) = Σ_k [d₂ * c](k) φ(4(t − k)).    (34)
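The kernels d₁ and d₂ act on the expansion coefficients by ordinary discrete convolution. With hypothetical coefficients c_k = k², the second-derivative coefficients come out constant, as expected for a parabola (kernel orientation and boundary samples follow NumPy's convolution conventions, so interior samples carry the shifted differences):

```python
import numpy as np

c = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # hypothetical coefficients c_k = k^2
d1 = np.array([-2.0, 2.0])                 # d1(k) = 2[delta(k-1) - delta(k)]
d2 = np.array([8.0, -16.0, 8.0])           # d2(k) = 8[delta(k-1) - 2 delta(k) + delta(k+1)]

first = np.convolve(c, d1)    # interior samples: 2 (c_{k-1} - c_k)
second = np.convolve(c, d2)   # interior samples: a scaled second difference of c

assert np.isclose(first[2], 2 * (c[1] - c[2]))                  # = -6
assert np.isclose(second[2], 16.0) and np.isclose(second[3], 16.0)
```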
3
CONCLUSIONS
In this paper a new continuous-time C^∞ wavelet with useful properties was constructed, starting from a differential-dilation equation. Useful sampling and differentiation schemes were also discussed. Applications of the constructed wavelet ψ(t) in image analysis will be discussed in another publication. Many opportunities exist for generalizing the results obtained here. In a similar way, as shown in Fig. 1, other functions with useful properties can be obtained. The two-dimensional non-separable case also looks interesting (and difficult). It is to be noted that systems described by differential, difference and, recently, dilation equations have been studied, while systems described by differential-dilation equations have not previously been studied and used.
References [1] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conf. in Appl. Math., vol. 61, SIAM, Philadelphia, PA, 1992.
WAVELETS IN HIGH RESOLUTION RADAR IMAGING AND CLINICAL MAGNETIC RESONANCE IMAGING

Walter Schempp
Abstract. Coherent wavelets form a unified basis of the multichannel perfect reconstruction analysis-synthesis filter bank of high resolution radar imaging and clinical magnetic resonance imaging (MRI). The filter bank construction is performed by the Kepplerian spatiotemporal Hilbert bundle strategy which allows for the stroboscopic and synchronous cross sectional quadrature filtering of phase histories in local frequency encoding channels with respect to the rotating coordinate frame of quadrature reference. The Kepplerian strategy of dynamic physical astronomy and the associated filter bank construction take place in symplectic affine planes and are implemented by the polarized symbol calculus of the Heisenberg nilpotent Lie group G which extends Fourier analysis. Thus the pathway of this paper leads from Keppler to Heisenberg, and to the fascinating aspects of electronic engineering concerned with the implementation of the symmetries inherent to the semi-classical approach to clinical magnetic resonance tomography by large scale integrated (LSI) circuit technology. Where the telescope ends, the microscope begins, which of the two has the grander view? - VICTOR HUGO (1862)
A radar system employs a directional antenna that radiates energy within a narrow beam in a known direction. The radar antenna senses the return scattered by the target, and the receiver amplifies the echo and translates its energy band to the intermediate frequency of the radar. The intermediate frequency signal wavelet is operated on linearly by the predetection filter. Finally, the output of the predetection filtering is coherently detected by a linear phase-sensitive processing configuration. One unique feature of the synthetic aperture radar (SAR) imaging modality is that its spatial resolution capability is independent of the platform altitude over the subplatform nadir track. This is a result of the fact that the SAR image is formed by simultaneously storing the phase histories and the differential time delays in local frequency encoding subbands of wideband radar, none of which is a function of the range from the radar sensor to the scene. It is this unique capability which allows the acquisition of high resolution images from satellite altitude as long as the received echo response has sufficient strength above the noise level. The Kepplerian spatiotemporal Hilbert bundle strategy of dynamic physical astronomy centered on the sun is derived from the quadrature conchoid trajectory construction and the second fundamental law of planetary motion analysis ([1], [16]), as displayed in Johannes Keppler's famous Mars commentaries of 1609 entitled Astronomia nova seu physica coelestis. It allows for the stroboscopic and synchronous cross sectional quadrature filtering of phase histories in local frequency encoding channels with respect to the rotating coordinate frame of quadrature reference, and provides the implementation of a matched filter bank by orbit stratification in a symplectic affine plane. An application of this procedure leads to the landmark observation of the earliest SAR pioneer, Carl A.
Wiley, that motion is the solution of the high resolution radar imagery and phased array antenna problem of holographic recording. Whereas the Kepplerian spatiotemporal strategy is realized in SAR imaging by the range Doppler principle ([6], [10], [11], [12]), it is the Lauterbur encoding principle ([15]) which takes place in clinical MRI ([3], [4]). At the background of both high resolution imaging techniques lies the construction of a multichannel coherent wavelet perfect reconstruction analysis-synthesis filter bank of matched filter type ([7]). Beyond these applications to local frequency encoding subbands, the Kepplerian spatiotemporal Hilbert bundle strategy leads to the concept of Feynman path integral or summation over phase histories.
74
As approved by quantum electrodynamics, nilpotent Fourier analysis allows for a semi-classical approach to the interference pattern of quantum holography ([15]). Indeed, the unitary dual \(\hat{G}\) of the Heisenberg nilpotent Lie group G, consisting of the equivalence classes of irreducible unitary linear representations of G ([18]), allows for a coadjoint orbit fibration by symplectic affine planes \(O_\nu\) (\(\nu \neq 0\)).
• The hierarchy of energetic strata \(O_\nu\) (\(\nu \neq 0\)) is spatially located as a stack of tomographic slices inside the vector space dual Lie(G)* of the real Heisenberg Lie algebra Lie(G). This fact is a consequence of the Kirillov homeomorphism ([13])
\[
\hat{G} \cong \mathrm{Lie}(G)^* / \mathrm{CoAd}(G).
\]
In terms of standard coordinates, the Heisenberg nilpotent Lie group G consists of the set of unipotent matrices
\[
\left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \;\middle|\; x, y, z \in \mathbf{R} \right\}
\]
under the non-commutative matrix multiplication law
\[
\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & x' & z' \\ 0 & 1 & y' \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & x + x' & z + z' + x y' \\ 0 & 1 & y + y' \\ 0 & 0 & 1 \end{pmatrix},
\]
and Lebesgue measure \(dx \otimes dy \otimes dz\) as a Haar measure. If the unipotent matrices \(\{P, Q, I\}\) denote the canonical basis of the three-dimensional real vector space Lie(G), where
\[
\exp_G P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\exp_G Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\exp_G I = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
holds under the matrix exponential diffeomorphism \(\exp_G : \mathrm{Lie}(G) \to G\), the coadjoint action \(\mathrm{CoAd}_G\) of G on Lie(G)* reads in terms of the coordinates \(\{\alpha, \beta, \nu\}\) with respect to the dual basis \(\{P^*, Q^*, I^*\}\) of the real vector space dual Lie(G)* as follows:
\[
\mathrm{CoAd}_G \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
(\alpha P^* + \beta Q^* + \nu I^*) = (\alpha - \nu y) P^* + (\beta + \nu x) Q^* + \nu I^*.
\]
The linear varieties
\[
O_\nu = \mathrm{CoAd}_G(G)(\nu I^*) = \mathbf{R} P^* + \mathbf{R} Q^* + \nu I^* \qquad (\nu \neq 0)
\]
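The group law above can be checked numerically. The following sketch (NumPy; the helper name `heis` is an illustrative choice, not from the paper) multiplies two unipotent matrices, confirms the composition rule for the (x, y, z) coordinates, and shows that the commutator of two group elements is a central z-translation.

```python
import numpy as np

def heis(x, y, z):
    """Unipotent upper-triangular matrix of the Heisenberg group G."""
    return np.array([[1.0, x, z],
                     [0.0, 1.0, y],
                     [0.0, 0.0, 1.0]])

g1, g2 = heis(1.0, 2.0, 3.0), heis(0.5, -1.0, 2.0)
prod = g1 @ g2

# The product is again unipotent with coordinates
# (x + x', y + y', z + z' + x * y').
expected = heis(1.0 + 0.5, 2.0 + (-1.0), 3.0 + 2.0 + 1.0 * (-1.0))
assert np.allclose(prod, expected)

# Non-commutativity: the commutator is central (x = y = 0, z != 0).
comm = np.linalg.inv(g2) @ np.linalg.inv(g1) @ g2 @ g1
print(comm)
```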
actually are symplectic affine planes in the sense that they are in a natural way compatibly endowed with both the structure of an affine plane and a symplectic structure. Therefore the symplectic affine planes \(O_\nu\) (\(\nu \neq 0\)) in Lie(G)* are the predestined planar mathematical structures to implement the Kepplerian spatiotemporal strategy of Hilbert bundles sitting over the bi-infinite phase coordinate line R, and to carry quantum holograms acting as multichannel perfect reconstruction analysis-synthesis filter banks ([14]).
• In radar imaging, \(\nu \neq 0\) denotes the center frequency of the transmitted pulse train, whereas in clinical MRI the center frequency \(\nu\) is the frequency of the rotating coordinate frame of quadrature reference defined by tomographic slice selection. The stationary singular plane \(\nu = 0\) in Lie(G)* consists of the single point orbits or focal points
75
corresponding to the one-dimensional representations of G. As the reconstruction plane it plays a fundamental role in the coherent optical processing of radar data ([6]), morphological MRI, and functional MRI recording of synchronized neural activities in the brain ([15]). From this classification of the coadjoint orbits of G in Lie(G)* follows the highly remarkable fact that there exists no finite dimensional irreducible unitary linear representation of G having dimension > 1. Hence the irreducible unitary linear representations of G which are not unitary characters are infinite dimensional and unitarily induced. Their coefficient cross sections for the Hilbert bundle sitting over the bi-infinite phase coordinate line R define the holographic transforms. Let C denote the one-dimensional center of G transversal to the plane carrying the quantum holograms. Then \(C = \mathbf{R} \cdot \exp_G I\) is spanned by the central transvection \(\exp_G I\), and aligned parallel to the bore of the magnet. In coordinate-free terms, G forms the non-split central group extension
\[
C \hookrightarrow G \to G/C
\]
where the plane G/C is transversal to the line C. The irreducible unitary linear representations of G associated to the coadjoint orbit \(O_\nu\) are unitarily induced in stages by the unitary characters of closed normal abelian subgroups which provide a fibration of G sitting over the base line R. If the elements \(w \in O_\nu\) are represented by complex numbers of the form
\[
w = x + Jy \;\longleftrightarrow\; \begin{pmatrix} x & -y \\ y & x \end{pmatrix},
\]
including the differential phase x and the local frequency y as real coordinates with respect to the frame of quadrature reference rotating with center frequency \(\nu \neq 0\), and the Weyl matrix
\[
J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\]
as imaginary unit, it becomes obvious that the multiplication law of the quadrature cell matrices reads
\[
\begin{pmatrix} x & -y \\ y & x \end{pmatrix}
\begin{pmatrix} x' & -y' \\ y' & x' \end{pmatrix}
=
\begin{pmatrix} x x' - y y' & -(y x' + x y') \\ y x' + x y' & x x' - y y' \end{pmatrix}.
\]
The conjugation identity
\[
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\begin{pmatrix} x & -y \\ y & x \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}^{-1}
=
\begin{pmatrix} x & y \\ -y & x \end{pmatrix}
\]
yields the area law \(|w|^2 = \det w\) for \(w \in \mathbf{C}\). Hence, for \(w \neq 0\), the inverse of the associated quadrature cell matrix reads in terms of the complex conjugate
\[
\begin{pmatrix} x & -y \\ y & x \end{pmatrix}^{-1}
= \frac{1}{x^2 + y^2} \begin{pmatrix} x & y \\ -y & x \end{pmatrix}.
\]
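The identification of complex numbers with quadrature cell matrices, the area law \(|w|^2 = \det w\), and the conjugate-based inverse are easy to verify numerically. The sketch below uses NumPy; the helper name `cell` is illustrative.

```python
import numpy as np

def cell(w: complex) -> np.ndarray:
    """Quadrature cell matrix representing w = x + iy."""
    x, y = w.real, w.imag
    return np.array([[x, -y],
                     [y,  x]])

w1, w2 = 3.0 + 4.0j, 1.0 - 2.0j

# Matrix multiplication mirrors complex multiplication.
assert np.allclose(cell(w1) @ cell(w2), cell(w1 * w2))

# Area law: |w|^2 = det(cell(w)).
assert np.isclose(abs(w1) ** 2, np.linalg.det(cell(w1)))

# Inverse via the complex conjugate, scaled by 1/(x^2 + y^2).
assert np.allclose(np.linalg.inv(cell(w1)),
                   cell(w1.conjugate() / abs(w1) ** 2))
print(np.linalg.det(cell(w1)))
```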
• In spin echo protocols, refocussing of nuclear spin angular momenta is performed by conjugation with respect to the rotating coordinate frame of quadrature reference. The rotational curvature form of the coadjoint orbit \(O_\nu\) is exactly the standard symplectic form of \(\mathbf{R} \oplus \mathbf{R}\), dilated by the center frequency \(\nu \neq 0\). The bundle-theoretic interpretation of the inducing mechanism gives rise to the pair of isomorphic irreducible unitary linear representations
\[
(U^\nu, V^\nu)
\]
of G unitarily induced in quadrature by the unitary characters of the associated closed normal abelian subgroups of G. Then the commutation relations
76
\[
U^\nu \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
\circ
V^\nu \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
=
e^{4\pi i \nu x y}\,
V^\nu \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
\circ
U^\nu \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
\]
are satisfied for all triples \((x, y, z) \in \mathbf{R}^3\) and \(\nu \neq 0\). For non-zero center frequencies \(\nu \neq \nu'\), the irreducible unitary linear representations \(U^\nu\) and \(U^{\nu'}\) of G are non-isomorphic, and the same holds for \(V^\nu\) and \(V^{\nu'}\). The metaplectic representation of the commutator group of G lifts the Weyl quadrature matrix J to the square root of the symmetric phase factor
\[
(x, y) \mapsto e^{4\pi i \nu x y}
\]
occurring in the commutation relations supra. Hence the Fourier transform gives rise to an intertwining operator of the isomorphic irreducible unitary linear representations \(U^\nu\) and \(V^\nu\) of G acting on the standard complex Hilbert space \(L^2(\mathbf{R})\) and admitting the same central unitary character
\[
\chi_\nu = U^\nu | C = V^\nu | C
\]
of the form
\[
\chi_\nu : z \mapsto e^{2\pi i \nu z}
\]
with center frequency \(\nu\). An infinitesimal version of the unitary representation \(U^\nu\) is given by its differentiated form. The differentiated form of \(U^\nu\) provides by evaluation at
\[
P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in \mathrm{Lie}(G)
\]
the temporal derivative \(dU^\nu(P) = \frac{d}{dt}\) on the bi-infinite real time scale, in accordance with the fact that the Kepplerian dynamic physical astronomy centered on the sun is in terms of magnetic forces which control local frequencies in orbital planes, not acceleration. Thus Kepplerian forces are not analogs of the Newtonian forces which are controlled by gravitation ([16]). Moreover, it provides by evaluation at
\[
Q = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \in \mathrm{Lie}(G)
\]
the multiplication \(dU^\nu(Q) = 2\pi i \nu t \times\) with the imaginary time scale, and finally by evaluation at
\[
I = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in \mathrm{Lie}(G)
\]
the multiplication \(dU^\nu(I) = 2\pi i \nu \times\) with the imaginary angular frequency, with the usual maximal domains as skew-adjoint Hilbert space operators.
• The rotating coordinate frame of quadrature reference of the coadjoint orbit \(O_\nu\) (\(\nu \neq 0\)) of G is defined by the polarized cross section G/C.
• The rotating coordinate frame of quadrature reference implements in the coadjoint orbit \(O_\nu\) (\(\nu \neq 0\)) of G the second Kepplerian fundamental law of planetary motion analysis, the area law.
Similarly, an application of the Weyl quadrature matrix J yields for the differentiated form of the unitary representation \(V^\nu\) of G
\[
dV^\nu(P) = 2\pi i \nu t \times, \qquad dV^\nu(Q) = \frac{d}{dt}, \qquad dV^\nu(I) = 2\pi i \nu \times
\]
when evaluated at the canonical basis \(\{P, Q, I\}\) of the Heisenberg Lie algebra Lie(G). The induced Hilbert bundle sitting over the bi-infinite phase coordinate line R is G-homogeneous in the sense that the Heisenberg group G moves its fibers around by linear transformations. Therefore Johannes Keppler needed Tycho Brahe's data base of observations of the planets in all different configurations, far surpassing any data base that had ever before been accumulated, in accuracy and, equally important, in quantity. Previous astronomers had a more restricted program, chiefly concerned with critical moments of the planets' motions
77
such as oppositions to the sun. The linear representation \(U^\nu\) of G and its swapped copy \(V^\nu\) are globally square integrable mod C. Indeed, it is well known that a coadjoint orbit is a linear variety if and only if one (and hence all) of the corresponding irreducible unitary linear representations is globally square integrable modulo its kernel. It is reasonable to regard global square integrability as an essential part of the Stone-von Neumann theorem of quantum mechanics, because a representation of a nilpotent Lie group is determined by its central unitary character \(\chi_\nu\) if and only if it is globally square integrable modulo center. Thus \(\chi_\nu\) allows for selection of the tomographic slice \(O_\nu\) with rotating coordinate frame of quadrature reference of center frequency \(\nu \neq 0\). The corresponding equivalence classes of irreducible, unitarily induced, linear representations \(U^\nu\) of G acting on the complex Hilbert space of globally square integrable cross sections for the Hilbert bundle sitting over the bi-infinite phase coordinate line R are infinite dimensional and can be realized as Hilbert-Schmidt integral operators with kernels \(K^\nu \in L^2(\mathbf{R} \oplus \mathbf{R})\) ([13]).
• The kernel function \(K^\nu \in L^2(\mathbf{R} \oplus \mathbf{R})\) associated to the irreducible unitary linear representation \(U^\nu\) of central unitary character \(\chi_\nu = U^\nu | C\) implements a multichannel coherent wavelet perfect reconstruction analysis-synthesis filter bank of matched filter type. The reconstruction of the phase histories in local frequency encoding subbands of \(K^\nu\) is performed by the symplectically reformatted two-dimensional Fourier transform.
• Application of the symplectic Fourier transform which is inherent to the polarized symbol with respect to the rotating coordinate frame of quadrature reference precludes the ability to directly relate signal intensities to the number of excited protons within the selected tomographic slice \(O_\nu\) (\(\nu \neq 0\)).
The Heisenberg nilpotent Lie group approach leads to the non-locality phenomenon of quantum mechanics displayed by the double-slit interference experiment ([14]), to a quantum vacuum radiation explanation of the sonoluminescence phenomenon where a light pulse is emitted during every cycle of the sound wave with extremely small jitter ([8], [2]), and to major application areas of pulsed signal recovery methods: the corner turn algorithm in the digital processing of high resolution SAR data ([17]), the spin-warp procedure in clinical MRI, and finally the variants of the ultra-high-speed echo-planar imaging technique of functional MRI recording of synchronized neural activities in the brain ([5], [15]). Combined with multi-slice acquisition, it is the spin-warp version of Fourier transform MRI which is used almost exclusively in current routine clinical examinations to acquire tomographic images from the distribution of nuclear spin angular momenta of protons. Switching from the Schrödinger representations \(U^\nu\), \(V^\nu\) of G acting on complex Hilbert spaces of equivalence classes of globally square integrable cross sections, to their alternative realizations, which are given by the Bargmann-Fock model of creation and annihilation operators acting on complex Hilbert spaces of holomorphic functions on symplectic affine planes ([13]), provides a spatial localization of the excited proton clusters. Because holomorphic functions allow for direct point evaluations, the positions of these clusters can be projected onto the stationary singular plane \(\nu = 0\) of focal points, and therefore provide a quantum mechanical localization approach to the functional MRI recording of synchronized neural activities in the brain.
• In terms of the symbolic calculus on the Heisenberg nilpotent Lie group G, the transition from morphological MRI of soft tissues to functional MRI recording of synchronized neural activities in the brain is performed by transition from the polarized symbol with respect to the rotating coordinate frame of quadrature reference to the isotropic symbol associated to the plane G/C transversal to the line C.
• Morphological MRI requires a Fourier transform evaluation procedure; the quantum mechanical localization requires a statistical evaluation procedure.
The speed with which clinical MRI spread throughout the world as a diagnostic imaging tool was phenomenal. In the early 1980s, it burst onto the scene with even more intensity than X-ray computed tomography (CT) in the 1970s. Whereas at the end of 1981 there were only three working MRI scanners available in the United States, presently there are more than 4,000 imagers performing, in a non-invasive manner, more than 8.5 million examinations per year. At the Division of MRI in the Department of Radiology of Johns Hopkins Medical Institutions, for instance, there are 5 MRI scanners but only one X-ray CT imager currently working. This clearly illustrates the trend towards MRI in the field of clinical diagnostic radiology: there are only a few circumstances under which X-ray CT still plays a role. The speed of growth is a testimony to the clinical significance of the technique. Today MRI, the fastest growing imaging modality in radiodiagnostics, is firmly established as a core diagnostic tool in the fields of neuroradiology and musculoskeletal imaging ([9]), routinely used in all medical centers in Western Europe and the United States.
78
Acknowledgment. The author is grateful to Professor George L. Farre (Georgetown University) for his generous advice as well as the Austrian Society for Cybernetic Studies (Vienna) for continuing support of this work.
Figure 1: Phased array MRI - long spine imaging
79
Figure 2: High resolution cranial MRI - sagittal and coronal tomographic slices
80

References

[1] E.J. Aiton, The elliptical orbit and the area law. In: Kepler - Four hundred years. Proceedings of conferences held in honour of Johannes Kepler. A. Beer and P. Beer, Editors, Vistas in Astronomy, Vol. 18, pp. 573-583, Pergamon Press, Oxford, New York, Toronto 1975
[2] G. Barton, C. Eberlein, On quantum radiation from a moving body with finite refractive index. Ann. Phys. 227 (1993), 222-274
[3] J. Beltran, Editor, Current Review of MRI. First edition, Current Medicine, Philadelphia 1995
[4] M.A. Brown, R.C. Semelka, MRI: Basic Principles and Applications. Wiley-Liss, New York, Chichester, Brisbane 1995
[5]
M.S. Cohen, Rapid MRI and functional applications. In: Brain Mapping - The Methods, A.W. Toga, J.C. Mazziotta, editors, pp. 223-255, Academic Press, San Diego, New York, Boston 1996
[6]
L.J. Cutrona, E.M. Leith, L.J. Porcello, and W.E. Vivian, On the application of coherent optical processing techniques to synthetic-aperture radar. Proc. IEEE 54, 1026-1032 (1966)
[7]
E.R. Davies, Electronics, Noise and Signal Recovery. Academic Press, London, San Diego, New York 1993
[8]
C. Eberlein, Sonoluminescence as quantum vacuum radiation. Phys. Rev. Lett. 76, 3842 - 3845 (1996)
[9]
R.R. Edelman, J.R. Hesselink, and M.B. Zlatkin, Clinical Magnetic Resonance Imaging. Two volumes, second edition, W.B. Saunders Company, Philadelphia, London, Toronto 1996
[10]
M. King, Fourier optics and radar signal processing. In: Applications of Optical Fourier Transforms. H. Stark, editor, pp. 209-251, Academic Press, Orlando, San Diego, San Francisco 1982
[11]
E.N. Leith, Synthetic aperture radar. In: Optical Data Processing, D. Casasent, editor, pp. 89-117, Topics in Applied Physics, Vol. 23, Springer-Verlag, Berlin, Heidelberg, New York 1978
[12]
E.N. Leith, Optical processing of synthetic aperture radar data. In: Photonic Aspects of Modern Radar, H. Zmuda, E.N. Toughlian, editors, pp. 381-401, Artech House, Boston, London 1994
[13]
W. Schempp, Harmonic Analysis on the Heisenberg Nilpotent Lie Group, with Applications to Signal Theory. Pitman Research Notes in Mathematics Series, Vol. 147, Longman Scientific and Technical, London 1986
[14]
W. Schempp, Geometric analysis: The double-slit interference experiment and magnetic resonance imaging. Cybernetics and Systems '96, Vol. 1, pp. 179-183, Austrian Society for Cybernetic Studies, Vienna 1996
[15]
W. Schempp, The Structure-Function Problem of Fourier Transform Magnetic Resonance Imaging: Coherent Wavelet Filter Banks, and Spatiotemporally Encoded Synchronized Neural Networks. John Wiley & Sons, New York, Chichester, Brisbane (in print)
[16]
B. Stephenson, Kepler's Physical Astronomy. Princeton University Press, Princeton, NJ 1994
[17] D.R. Wehner, High Resolution Radar. Artech House, Norwood, MA 1987
[18] A. Weil, Sur certains groupes d'opérateurs unitaires. Acta Math. 111 (1964), 143-211. Also in: Œuvres Scientifiques, Collected Papers, Volume III (1964-1978), pp. 1-69, Springer-Verlag, New York, Heidelberg, Berlin 1980
Walter Schempp, Lehrstuhl für Mathematik I, Universität Siegen, D-57068 Siegen, Germany
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
81

Wavelet Transform based Information Extraction from 1-D and 2-D Signals

Adam Dąbrowski
Poznań University of Technology
Institute of Electronics and Telecommunication
ul. Piotrowo 3a, PL-60 965 Poznań, Poland
[email protected]
Abstract

Information extraction from signals requires sufficiently high resolution in both the time (space) domain and the frequency domain. This is, however, impossible in a single analysis because of a general principle, the so-called uncertainty principle, which states the impossibility of satisfactorily good simultaneous signal representations in both time (space) and frequency. The difficulty following from the uncertainty principle can be overcome using the multiresolution signal representation, a concept resulting in the so-called wavelet transformation. The multiresolution information extraction procedure consists of at least two steps: information is first only roughly extracted from the lowest resolution signal. Then the interesting signal parts or components are analyzed precisely in the highest resolution signal in order to get satisfactorily exact information [4]. In this paper, this idea is applied to efficient edge detection in images (a 2-D example) and to the detection of DTMF (dual tone multifrequency) signals used for signaling in touch-tone telephones [8, 9] (a 1-D example).
1 Introduction
The classical way for signal analysis, say of a signal x(t), is by its spectral representation \(X(\omega)\), i.e. by the Fourier transform
\[
X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt . \qquad (1a)
\]
Thus
\[
x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega . \qquad (1b)
\]
Signals which are short in time (or narrow in space) yield a wide spectrum and vice versa. This property is well known as the uncertainty principle [1] and can formally be formulated as
\[
\sigma_t \sigma_\omega \ge \frac{1}{2} \qquad (2a)
\]
where
\[
\sigma_t^2 = \frac{\int_{-\infty}^{\infty} (t - \langle t \rangle)^2\, |x(t)|^2\, dt}{\int_{-\infty}^{\infty} |x(t)|^2\, dt}
\quad \text{and} \quad
\sigma_\omega^2 = \frac{\int_{-\infty}^{\infty} (\omega - \langle \omega \rangle)^2\, |X(\omega)|^2\, d\omega}{\int_{-\infty}^{\infty} |X(\omega)|^2\, d\omega} , \qquad (2b)
\]
moreover
\[
\langle t \rangle = \int_{-\infty}^{\infty} t\, |x(t)|^2\, dt
\quad \text{and} \quad
\langle \omega \rangle = \int_{-\infty}^{\infty} \omega\, |X(\omega)|^2\, d\omega . \qquad (2c)
\]
The uncertainty principle states that the time (or space) waveform width \(\sigma_t\) and the frequency spectrum width \(\sigma_\omega\) cannot both be arbitrarily small simultaneously. For extraction of information from signals, this
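The lower bound in (2a) can be illustrated numerically. The sketch below evaluates \(\sigma_t\) and \(\sigma_\omega\) for a sampled Gaussian pulse, for which the product \(\sigma_t \sigma_\omega\) attains the bound 1/2 up to discretization error; the function name `widths` is an illustrative choice.

```python
import numpy as np

def widths(t, x):
    """Return (sigma_t, sigma_w), computed as in (2b)/(2c) via Riemann sums."""
    dt = t[1] - t[0]
    p = np.abs(x) ** 2
    e = p.sum() * dt
    t_mean = (t * p).sum() * dt / e
    var_t = ((t - t_mean) ** 2 * p).sum() * dt / e

    # Spectrum on an angular-frequency grid via the FFT.
    X = np.fft.fftshift(np.fft.fft(x)) * dt
    w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(len(t), d=dt))
    dw = w[1] - w[0]
    pw = np.abs(X) ** 2
    ew = pw.sum() * dw
    w_mean = (w * pw).sum() * dw / ew
    var_w = ((w - w_mean) ** 2 * pw).sum() * dw / ew
    return np.sqrt(var_t), np.sqrt(var_w)

t = np.linspace(-20, 20, 4096)
s_t, s_w = widths(t, np.exp(-t ** 2 / 2))   # Gaussian pulse
print(s_t, s_w, s_t * s_w)                  # product close to the bound 0.5
assert abs(s_t * s_w - 0.5) < 1e-3
```

For non-Gaussian pulses the same routine returns a product strictly above 1/2, which is exactly the content of the uncertainty principle.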
82
means that for high frequency resolution, a long time (or space) analysis is necessary, and vice versa [3]. Therefore, in order to reduce the analysis time, a hierarchical, i.e. multiresolution analysis should be made: first the signal is only roughly analyzed and then its interesting parts are analyzed precisely
[4]. Two particular illustrative examples of this approach to the extraction of information from signals are presented:
• edge detection in images serves as a 2-D example,
• DTMF (dual tone multifrequency) signaling detection, consisting in the detection of two sinusoidal components of equal magnitude, is a 1-D example.
2 Wavelet transform
The continuous wavelet transform of a signal x(t) is defined by
\[
X(a, \tau) = \frac{1}{\sqrt{a}} \int x(t)\, \Psi\!\left( \frac{t - \tau}{a} \right) dt \qquad (3)
\]
where \(\Psi(t)\) is the so-called mother or basic wavelet and \(\sqrt{a^{-1}}\, \Psi((t - \tau)/a)\) are the baby wavelets determined by the mother wavelet using shift \(\tau\) and scale \(a\); \(\sqrt{a^{-1}}\) being the energy correction factor [2]. The discrete wavelet transform is given by
\[
X(m, n) = \int x(t)\, \Psi_{mn}(t)\, dt \qquad (4)
\]
where
\[
\Psi_{mn}(t) = 2^{m/2}\, \Psi(2^m t - n) .
\]
The most important case is the orthonormal wavelet basis. We can write
\[
x(t) = \sum_{m,n=-\infty}^{\infty} a_{mn}\, \Psi_{mn}(t) . \qquad (5)
\]
In order to introduce the multiresolution concept we say that the \(m_0\)th order resolution signal \(x_{m_0}(t)\) approximating x(t) is the part of the sum (5) composed of terms with \(m < m_0\). Introducing the scaling functions \(\Phi_{mn}(t) = 2^{m/2}\, \Phi(2^m t - n)\), where the function \(\Phi(t)\) is frequently called the father wavelet, we obtain
\[
x_m(t) = \sum_{n=-\infty}^{\infty} c_{mn}\, \Phi_{mn}(t) \qquad (6)
\]
and
\[
x_m(t) = x_{m-1}(t) + w_{m-1}(t)
= \sum_{n=-\infty}^{\infty} c_{(m-1)n}\, \Phi_{(m-1)n}(t) + \sum_{n=-\infty}^{\infty} d_{(m-1)n}\, \Psi_{(m-1)n}(t) . \qquad (7)
\]
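The splitting \(x_m = x_{m-1} + w_{m-1}\) is easy to see in the discrete Haar case: one analysis step separates the signal into coarse (scaling) and detail (wavelet) coefficients, and the synthesis step restores the original samples exactly. A minimal sketch (my own Haar instance, not taken from the paper):

```python
import numpy as np

def haar_split(c):
    """One analysis step: scaling coeffs c -> (coarse c1, detail d1)."""
    c = np.asarray(c, dtype=float)
    c1 = (c[0::2] + c[1::2]) / np.sqrt(2)   # low-pass (L) branch
    d1 = (c[0::2] - c[1::2]) / np.sqrt(2)   # high-pass (H) branch
    return c1, d1

def haar_merge(c1, d1):
    """Synthesis step: perfect reconstruction of the finer-level coeffs."""
    c = np.empty(2 * len(c1))
    c[0::2] = (c1 + d1) / np.sqrt(2)
    c[1::2] = (c1 - d1) / np.sqrt(2)
    return c

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 2.0, 0.0])
c1, d1 = haar_split(x)
assert np.allclose(haar_merge(c1, d1), x)   # x_m = x_{m-1} + w_{m-1}
print(c1, d1)
```

Because the Haar basis is orthonormal, the energy of the signal is preserved across the split, which is the discrete counterpart of expansion (5).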
3 Edge detection
Discrete signals \(c_{(m-1)n}\) and \(d_{(m-1)n}\) in (7) can be considered as outputs of a low-pass filter (L) and a high-pass filter (H), respectively, excited with the signal \(c_{mn}\). These filters form a QMF filter bank. In order to represent 2-D signals (images), we can use separable scaling functions and separable wavelets defined as follows
\[
\Phi^{LL}_{mn_1n_2}(t_1, t_2) = \Phi_{mn_1}(t_1)\, \Phi_{mn_2}(t_2) , \quad
\Psi^{LH}_{mn_1n_2}(t_1, t_2) = \Phi_{mn_1}(t_1)\, \Psi_{mn_2}(t_2) ,
\]
\[
\Psi^{HL}_{mn_1n_2}(t_1, t_2) = \Psi_{mn_1}(t_1)\, \Phi_{mn_2}(t_2) , \quad
\Psi^{HH}_{mn_1n_2}(t_1, t_2) = \Psi_{mn_1}(t_1)\, \Psi_{mn_2}(t_2) . \qquad (8)
\]
83
Figure 1: Original image boats
Figure 2: Edge detection using FDG masks: (a) (−1, 0, 1), (b) (−1, −15, 15, 1)

For edge detection, or generally, for information extraction, we have more freedom in the choice of wavelet and scaling functions than for other wavelet transform applications, such as subband coding. This means that these functions no longer need to form a QMF filter bank, because in our case we do not need to reconstruct the signal after it has been split into subbands by the wavelet decomposition. The scaling function can therefore be as simple as possible. It can, e.g., be defined by a (1/2, 1/2) or even a (1, 0) mask. The latter has been chosen in the example presented below. Wavelet filtering is, however, more complicated because it is used for the edge detection itself. Two alternative mask classes were tested: the first derivative of Gaussian (FDG) type mask and the second derivative of Gaussian (i.e. separable Laplacian) type mask (the LOG type mask for short). The latter is, in fact, the most common type of mask for this application. As an edge detection criterion, the extremum search was used in both cases (for LOG type masks we have, however, to take absolute values), and the zero-crossing search for LOG type masks only. Extremum search was realized very simply by an appropriate threshold. Results of experiments with the image boats (Fig. 1) are shown in Fig. 2a and b for the FDG type masks (−1, 0, 1) and (−1, −15, 15, 1), respectively, with threshold edge detection, and in Fig. 3a and b for the LOG type mask (−1, −4, −13, 36, −13, −4, −1) with the zero-crossing and the threshold criteria, respectively.
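The FDG-based detection with the threshold criterion reduces, per image row, to a convolution with the mask followed by an absolute-value threshold. A minimal sketch (pure NumPy; the fixed threshold value is an illustrative assumption, since the paper chooses it adaptively):

```python
import numpy as np

def fdg_edges(img, mask=(-1, 0, 1), thresh=40.0):
    """Row-wise FDG filtering followed by an absolute-value threshold."""
    img = np.asarray(img, dtype=float)
    kernel = np.array(mask[::-1], dtype=float)  # correlate = convolve reversed
    resp = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img)
    return np.abs(resp) > thresh                # extremum search by threshold

# Synthetic image with a vertical step edge between columns 3 and 4.
img = np.zeros((5, 8))
img[:, 4:] = 100.0
edges = fdg_edges(img)
print(edges.astype(int))   # responses flank the step
# (the last column may also fire due to the filter's border effect)
assert edges[:, 3:5].all() and not edges[:, :3].any()
```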
84
Figure 3: Edge detection using LOG mask (−1, −4, −13, 36, −13, −4, −1) with (a) zero-crossing and (b) threshold criterion

4 DTMF detection
In DTMF signaling, each signal consists of a couple of sinusoidal signals with proper frequencies. They belong to two separate frequency groups: the low frequency group: 697, 770, 852, 941 Hz and the high frequency group: 1209, 1336, 1477, 1633 Hz. Signal detection can be interpreted as filtering with a nonuniform filter bank. For our goal, we discretize equation (3) in the following way
\[
X(a, N_\tau, k) = \frac{1}{\sqrt{a}} \sum_{\nu} x(\nu)\, \Psi_k(\nu - N_\tau) . \qquad (9)
\]
Assuming the sampling rate \(F = 1/T\), the correspondence between the continuous time t and the discrete time \(\nu\) is \(\nu = \lfloor t/(aT) \rfloor\), where by \(\lfloor r \rfloor\) the greatest integer less than or equal to r is denoted. Thus we can also write \(N_\tau = \lfloor \tau/(aT) \rfloor\). For the detection of the kth DTMF component, a set of mother wavelet functions of the form
\[
\Psi_k(\nu) = w_k(\nu)\, W_{N_k}^{k\nu} \qquad (10)
\]
is assumed in equation (9), where
\[
W_{N_k} = e^{-j(2\pi/N_k)}
\]
and \(w_k(\nu)\) is a smoothing window function, e.g., the Blackman window
\[
w_k(\nu) = 0.42 - 0.5 \cos\!\left( \frac{2\pi\nu}{N_k - 1} \right) + 0.08 \cos\!\left( \frac{4\pi\nu}{N_k - 1} \right)
\]
where \(\nu = 0, 1, \ldots, N_k - 1\). Furthermore, we assume \(N_\tau = m N_k\), m being any integer, provided that the block size \(N_k\) is optimally chosen for minimization of the passband center frequency error \(\delta\). Thus, without any lack of generality, we shall henceforth assume that m = 0. The proposed algorithm is a modified Goertzel algorithm [6] for computation of the wavelet transform X(a, k) (i.e. for \(N_\tau = 0\)). It follows from equation (9) and the following manipulations
\[
X(a, k) = \frac{1}{\sqrt{a}} \sum_{\nu=0}^{N_k - 1} w_k(\nu)\, x(\nu)\, W_{N_k}^{k\nu}
= \frac{1}{\sqrt{a}}\, W_{N_k}^{-k N_k} \sum_{\nu=0}^{N_k - 1} w_k(\nu)\, x(\nu)\, W_{N_k}^{-k(N_k - \nu)} . \qquad (11)
\]

85

Equation (11) is a convolution of the signals \(w_k(\nu) x(\nu)\) and \(\frac{1}{\sqrt{a}} W_{N_k}^{-k\nu}\), \(\nu = 0, 1, \ldots, N_k - 1\). The latter can be interpreted as the impulse response of an IIR filter with the following transfer function
\[
H_k(z) = \frac{1}{\sqrt{a}} \cdot \frac{1}{1 - W_{N_k}^{-k} z^{-1}} . \qquad (12)
\]
Thus, the value X(a, k) in equation (11) is the \((N_k - 1)\)th output sample of this filter, i.e.
\[
X(a, k) = y_k(N_k - 1) , \qquad (13)
\]
with
\[
y_k(n) = \frac{1}{\sqrt{a}} \sum_{\nu=0}^{n} w_k(\nu)\, x(\nu)\, W_{N_k}^{-k(n - \nu)} , \qquad (14)
\]
where \(n = 0, 1, \ldots, N_k - 1\). Transfer function (12) can be expanded to the form
\[
H_k(z) = \frac{1}{\sqrt{a}} \cdot \frac{1 - W_{N_k}^{k} z^{-1}}{1 - \gamma_k z^{-1} + z^{-2}} , \qquad (15)
\]
with
\[
\gamma_k = 2 \cos\!\left( \frac{2\pi}{N_k}\, k \right) . \qquad (16)
\]
Thus, the basic algorithm step, given by
\[
q_k(n) = \gamma_k\, q_k(n-1) - q_k(n-2) + w_k(n)\, x(n) , \qquad (17)
\]
consists exclusively of real computations. Next, the signal energy, but only once for a block, i.e. for \(n = N_k - 1\), must be computed, given by
\[
E_k = a^{-1} \left( [q_k(N_k - 1) - q_k(N_k - 2)]^2 - 2\, q_k(N_k - 1)\, q_k(N_k - 2) \left( \cos \frac{2\pi k}{N_k} - 1 \right) \right) \qquad (18a)
\]
or
\[
E_k = a^{-1} \left( [q_k(N_k - 1) + q_k(N_k - 2)]^2 - 2\, q_k(N_k - 1)\, q_k(N_k - 2) \left( \cos \frac{2\pi k}{N_k} + 1 \right) \right) . \qquad (18b)
\]
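The recursion (17) together with the block-energy evaluation (18a) already gives a complete single-frequency detector. A sketch (assuming a = 1, the Blackman window, and a fixed block size; in the paper the block size is tuned per frequency, and all names here are illustrative):

```python
import numpy as np

def goertzel_energy(x, k, n_k):
    """Windowed Goertzel: recursion (17) plus energy formula (18a), a = 1."""
    nu = np.arange(n_k)
    w = (0.42 - 0.5 * np.cos(2 * np.pi * nu / (n_k - 1))
             + 0.08 * np.cos(4 * np.pi * nu / (n_k - 1)))   # Blackman window
    gamma = 2.0 * np.cos(2.0 * np.pi * k / n_k)
    q1 = q2 = 0.0
    for n in range(n_k):
        q0 = gamma * q1 - q2 + w[n] * x[n]                  # step (17)
        q2, q1 = q1, q0
    c = np.cos(2.0 * np.pi * k / n_k)
    return (q1 - q2) ** 2 - 2.0 * q1 * q2 * (c - 1.0)       # energy (18a)

fs, n_k = 8000, 110
t = np.arange(n_k) / fs
tone = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)  # DTMF "1"

k697 = round(697 * n_k / fs)   # bin closest to 697 Hz
k941 = round(941 * n_k / fs)   # 941 Hz is absent from the tone
e_present = goertzel_energy(tone, k697, n_k)
e_absent = goertzel_energy(tone, k941, n_k)
print(e_present, e_absent)
assert e_present > 10 * e_absent
```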
In the experimental program, equation (18a) was used. Now we shall determine the necessary block size \(N_k\). This may be done by analysis of the required frequency resolution or of the permissible relative error \(\delta\) of the filter passband center frequencies. As the required frequency resolution we can define the minimum difference between the DTMF frequencies, i.e. 73 Hz = (770 − 697) Hz. Thus, we get N ≥ 110. The relative error \(\delta\) of the passband center frequency \(f_p\) for the DTMF frequency \(f_{DTMF}\) is defined as
\[
\delta = \frac{f_p - f_{DTMF}}{f_{DTMF}} \cdot 100\% . \qquad (19)
\]
Since DTMF frequencies are generated with a relative error less than ±1.8%, approximately the same bounds can be postulated for \(\delta\). Assuming some margin of, say, 45%, we get the bounds ±1.8 × 1.45% ≈ ±2.6%, and then we obtain N ≥ 104, i.e., approximately the same value as before from the frequency resolution requirement.
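The block-size choice can be illustrated by scanning candidate lengths near N = 110 and evaluating the center-frequency error of equation (19) for each DTMF frequency (a sampling rate of 8 kHz and the scanned range are my assumptions):

```python
FS = 8000.0
DTMF = (697, 770, 852, 941, 1209, 1336, 1477, 1633)

def center_error(f, n):
    """Relative error (19), in percent, of the Goertzel bin nearest to f."""
    k = round(f * n / FS)
    fp = k * FS / n          # passband center frequency of bin k
    return (fp - f) / f * 100.0

# For each frequency, pick the block size with the smallest |delta|.
best = {f: min(range(104, 117), key=lambda n: abs(center_error(f, n)))
        for f in DTMF}
for f, n in best.items():
    print(f, n, round(center_error(f, n), 3))

# Every frequency admits a block size in the scanned range within the bound.
assert all(abs(center_error(f, n)) <= 2.6 for f, n in best.items())
```

This mirrors the idea of the paper's modified algorithm: varying the block length by only a few percent per frequency keeps every passband center within the postulated ±2.6% tolerance.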
86
5 Conclusions

A multiresolution approach has been proposed for information extraction from signals and was applied to edge detection in images. By this means pseudoedges can easily be removed while the proper edges are located precisely. Two mask types, the FDG type and the LOG type, and also two edge-detection criteria, extremum search and zero-crossing, have been considered. FDG type masks surprisingly turned out to be better in the considered application than the commonly used LOG type masks. This is because even very simple FDG masks like (−1, 0, 1) (see Fig. 2a) yield quite satisfactory results and the threshold can be adaptively chosen for edges with the prescribed thickness. On the other hand, LOG type masks are very noise sensitive and the threshold criterion works very badly with them (see Fig. 3b), so it cannot practically be used for them. It should be stressed that the threshold operation is much easier than the zero-crossing detection.

The second example considered in this contribution is the DTMF detection. The main idea proposed for that application is a modification of the classical Goertzel algorithm. This modification consists in varying the analysed block length (to some extent only, say by 5-6%) according to the frequency to be detected. This leads to a massive reduction of the average block size and to a reduction of the center frequency errors \(\delta\).
References

[1] L. Cohen, Time-Frequency Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[2] Y. T. Chan, Wavelet Basics, Kluwer Academic Publ., Boston, 1995.
[3] S. G. Mallat, Multifrequency channel decompositions of images and wavelet models, IEEE Trans. Acoustics, Speech, a. Signal Proc., vol. 37, no. 12, Dec. 1989, pp. 2091-2110.
[4] A. Dąbrowski, A. Franc, A. Czajka, Realization of wavelet transform for Windows with application to edge detection in images, First International Symposium "Mathematical Models in Automation and Robotics", Międzyzdroje, Sep. 1-3, 1994.
[5] A. Dąbrowski, Wavelet transform-based modification of Goertzel algorithm for detection of DTMF signals, The International Conference "Signal Processing Applications & Technology", Boston, MA, 1995.
[6] G. Goertzel, "An algorithm for the evaluation of finite trigonometric series", The American Math. Monthly, vol. 65, pp. 34-35, Jan. 1958.
[7] A. Dąbrowski, "Nonuniform digital filter bank for DTMF receiver", Proc. Workshop on Multirate Systems, Filter Banks and Wavelet Analysis, ETH Zurich, Oct. 26, 1992, pp. 10-13.
[8] A. Dąbrowski, W. Kabaciński, "Experiences with DTMF receivers and tone senders in Poland using DSP's", Proc. Int. Conf. Signal Process. Appl. & Technology, ICSPAT'93, Santa Clara, USA, Sep. 1993, pp. 193-198.
[9] S. Bagchi and S. K. Mitra, "An efficient algorithm for DTMF decoding using the subband NDFT", Proc. ISCAS'95, Seattle, USA, April 1995, pp. 1936-1939.

Work supported in part by the grant KBN 0452/P4/94/07 and in part by the project DPB 44-443.
Invited Session C:
GENERAL TECHNIQUES AND ALGORITHMS
89
Computational Methods and Tools for Simulation and Analysis of Complex Processes
(applying \(\omega_n^k\) criteria, ANN and CA)

V.V. Ivanov
Laboratory of Computing Techniques and Automation
Joint Institute for Nuclear Research, 141980 Dubna, Russia
FAX: 007-096-21-65145; E-mail: ivanov

Abstract

The tutorial is devoted to computational methods and tools for simulation and analysis of different complex processes in physics, in medicine and in social life. There are considered: 1) multivariate data analysis methods based on \(\omega_n^k\) criteria and artificial neural networks, 2) neural network applications for solving problems of data classification and one-dimensional function approximation, and 3) cellular automata usage in pattern recognition and complex system simulation. These methods and tools are developed in the Laboratory of Computing Techniques and Automation of the Joint Institute for Nuclear Research (Dubna, Russia) in collaboration with the International Solvay Institutes for Physics and Chemistry (Brussels, Belgium).
1. Multivariate data analysis based on \(\omega_n^k\) criteria and ANN

The primary goal of experimental data processing consists in the identification of useful events among all events obtained in the experiment. Under an event we mean the set of features characterizing the analyzed pattern. The classification of events in the one-dimensional case is carried out with the help of a simple cut on a feature variable. When an event is characterized by more than one variable, the procedure for constructing a multivariate classifier is not trivial. In paper [1] we have suggested and investigated a class of new nonparametric \(\omega_n^k\) statistics
ω_n^k = n^{k/2}/(k+1) · Σ_{i=1}^{n} { [i/n − F(x_i)]^{k+1} − [(i−1)/n − F(x_i)]^{k+1} },
where F(x) is the theoretical distribution function of x, x_1 < x_2 < ... < x_n is an ordered sample, and n is the sample size. On their basis, corresponding goodness-of-fit criteria were constructed, which are usually applied for testing the correspondence of each sample event to an a priori known distribution. On the basis of the ω_n^k criteria, a method was developed for extracting low-probability multivariate events from a background of predominant processes [1]; it was successfully applied in several experiments for the selection of rare events [2, 3]. Recently, the use of artificial neural networks (ANN) in multi-dimensional data analysis has become widespread [4]. One such problem consists in classifying individual events, represented by empirical samples of finite volume, as pertaining to one of the different partial distributions composing the analyzed distribution. A feed-forward multilayer network, the multilayer perceptron (MLP), is a convenient tool for constructing multivariate classifiers, although its learning speed and power of recognition depend critically on the choice of input data [5].
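Using the formula above as reconstructed here, the statistic can be evaluated directly from a sample; a minimal sketch (the sample and F are illustrative), noting that for k = 2 the expression reduces to the classical Cramér-von Mises nω² statistic:

```python
def omega_n_k(sample, F, k=2):
    """omega_n^k = n^(k/2)/(k+1) * sum_i [ (i/n - F(x_i))^(k+1)
                                         - ((i-1)/n - F(x_i))^(k+1) ],
    where F is the hypothesized distribution function and the sample
    is ordered internally."""
    xs = sorted(sample)
    n = len(xs)
    s = 0.0
    for i, x in enumerate(xs, start=1):
        u = F(x)
        s += (i / n - u) ** (k + 1) - ((i - 1) / n - u) ** (k + 1)
    return n ** (k / 2) / (k + 1) * s
```

For k = 2 the result agrees with the usual computing formula 1/(12n) + Σ[(2i−1)/(2n) − F(x_i)]², which provides a convenient cross-check of the implementation.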
Such a network involves an input layer corresponding to the processed data, an output layer dealing with the results and, also, hidden layers. A network architecture is presented in Fig. 1.
Figure 1: Architecture of a multilayer perceptron with one hidden layer

Here x_k, h_j and y_i denote the input, hidden and output neurons, respectively; w_jk are the weights of the connections between the input neurons and the hidden layer, and w_ij are the weights of the connections between the hidden and the output neurons. The signals a_j = Σ_k w_jk x_k and a_i = Σ_j w_ij h_j are fed to the inputs of the hidden and output neurons, respectively. The output signals from these neurons are determined by the expressions h_j = g[(a_j + θ_j)/T] and y_i = g[(a_i + θ_i)/T], where g(a, T) is a transfer function, T is the "temperature" determining its slope, and θ is the threshold of the corresponding node. Typically, g(a, T) is a sigmoid, for example of the form g(a, T) = tanh(a/T). The tuning of the MLP to the problem being solved (a procedure known as ANN learning) consists in minimizing the following error functional with respect to the weights:

E = (1/2) Σ_p [y^(p) − t^(p)]²,
where p = 1, 2, ..., N_train indexes the training patterns, and t^(p) is the desired value of the output signal. A comparative study of multidimensional classifiers based on the ω_n^k goodness-of-fit criteria and multilayer perceptrons (MLP) was carried out in work [6]. It was shown that the MLP exhibits an "instantaneous" learning effect and a power close to the limit when the input data are represented in the form of variational series. The reasons underlying these effects are analyzed, and recommendations for joint usage of the ω_n^k criteria and of the MLP are given [6]. The identification of rare events on a background of dominant processes is an important problem of applied mathematical statistics. The practical impossibility of ANN training on data with significantly different contributions of the separated classes strongly restricts the wide adoption of neural computational methods in this field. A method for solving this problem was developed in work [7]. It is based on the application of an MLP with a single layer of
hidden neurons having a step-like transition function. The procedure includes two stages: 1) network learning on data with identical contributions of each separated class, and 2) transformation of the calculated bias matrix. It is shown that the developed approach allows neural networks to be used for the identification of rare events with a contribution of the order of 0.1%.

2. ANN in data classification and function approximation

In recent years artificial neural networks have acquired widespread application in the natural sciences, in medicine, etc. Here we present some examples of ANN usage for solving problems of data classification and pattern recognition, and for function approximation. A two-level trigger was developed for suppression of the background and for effective selection of events involving short-lived A-, E- and C-particles in the experiment DISTO. The first-level trigger is intended for selection of events by their multiplicity: only four-prong events are selected. Events accepted by the first-level trigger are then examined with the help of the second-level trigger, which is applied for track recognition, for searching for a secondary vertex, and for identifying the secondary particles. It is based: 1) on recognition of straight tracks applying a specialized cellular automaton (see details in the next section), 2) on momentum variables permitting effective selection of events containing a secondary vertex, and 3) on identification of the secondary charged particles applying an MLP [8]. A simple and efficient algorithm for identifying events with a secondary vertex making use of an MLP was developed in paper [9]. The differences R_x, R_y (in the XOZ and YOZ projections, respectively) between the largest and the smallest impact parameters¹ D_i (i = 1, 2, ..., n) of all tracks belonging to each of the analyzed events were used in establishing the identification criteria of signal and background events.
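The multilayer perceptron used throughout this section (Fig. 1, transfer function g(a, T) = tanh(a/T), error functional E) can be sketched as follows; the layer sizes, weights and data here are hypothetical placeholders, not values from the paper:

```python
import math

def mlp_forward(x, W1, theta1, W2, theta2, T=1.0):
    """Forward pass of a one-hidden-layer MLP with g(a, T) = tanh(a/T).
    W1[j][k] = w_jk (input -> hidden), W2[i][j] = w_ij (hidden -> output)."""
    h = [math.tanh((sum(w * xk for w, xk in zip(row, x)) + th) / T)
         for row, th in zip(W1, theta1)]
    return [math.tanh((sum(w * hj for w, hj in zip(row, h)) + th) / T)
            for row, th in zip(W2, theta2)]

def error_functional(data, net):
    """E = 1/2 * sum_p [y^(p) - t^(p)]^2 over training patterns (x, t)."""
    return 0.5 * sum((mlp_forward(x, *net)[0] - t) ** 2 for x, t in data)
```

Training (minimizing E with respect to the weights) would be layered on top of this, e.g. by gradient descent; the sketch only fixes the forward computation and the functional defined in the text.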
An effective method for identifying the tracks associated with a particular secondary vertex in an event was also developed. The method is based on the differences between the asymmetries exhibited by the sets D_i of individual signal events and background events. A procedure for recognition of features in the ECG of one heart beat, and from a single channel, using an MLP was developed in work [10]. The main idea of the method is to present to the network not the raw data but transformed data. We believe that a system of polynomials orthogonal on a set of uniformly spaced points is the adequate formalism for the analysis of electrocardiograms, as measurements are taken at equal time intervals and all points can be denoted 0, 1, 2, ..., n. The above-mentioned polynomials P_{k,n}(x), k = 0, 1, 2, ..., m ≤ n are related by the following recurrence equation (see, for instance, [11]):
(m+1)(n−m)/(2(2m+1)) · P_{m+1,n}(x) + (x − n/2) · P_{m,n}(x) + m(n+m+1)/(2(2m+1)) · P_{m−1,n}(x) = 0,  1 ≤ m < n,

P_{0,n}(x) = 1,  P_{1,n}(x) = 1 − 2x/n.
The polynomial P_m(x) approximating the function f(x) in this case is

P_m(x) = c_0 P_{0,n}(x) + c_1 P_{1,n}(x) + ... + c_m P_{m,n}(x),
¹The impact parameter of a track is defined in the plane passing through the center of the target and perpendicular to the beam.
where

c_i = (2i+1) n^{(i)} / (i+n+1)^{(i+1)} · Σ_{k=0}^{n} f(k) P_{i,n}(k),  i = 0, 1, 2, ..., m,

and a^{(s)} is the factorial polynomial of the form a^{(s)} = a(a−1)···(a−s+1).
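A sketch of the expansion described above, using the recurrence relation and coefficient formula as reconstructed here (the helper names are ours, and the data is an arbitrary stand-in for an ECG trace):

```python
def gram_poly(m, n, x):
    """P_{m,n}(x) via the recurrence
    (m+1)(n-m) P_{m+1} = (2m+1)(n-2x) P_m - m(n+m+1) P_{m-1},
    with P_0 = 1 and P_1 = 1 - 2x/n."""
    p_prev, p = 1.0, 1.0 - 2.0 * x / n
    if m == 0:
        return p_prev
    for j in range(1, m):
        p_prev, p = p, ((2 * j + 1) * (n - 2 * x) * p
                        - j * (n + j + 1) * p_prev) / ((j + 1) * (n - j))
    return p

def falling(a, s):
    """Factorial polynomial a^{(s)} = a(a-1)...(a-s+1)."""
    r = 1.0
    for t in range(s):
        r *= a - t
    return r

def fit(values, m):
    """Coefficients c_i of f(k) ~ sum_i c_i P_{i,n}(k) for k = 0..n."""
    n = len(values) - 1
    cs = []
    for i in range(m + 1):
        norm = (2 * i + 1) * falling(n, i) / falling(n + i + 1, i + 1)
        cs.append(norm * sum(f * gram_poly(i, n, k)
                             for k, f in enumerate(values)))
    return cs

def evaluate(cs, n, x):
    return sum(c * gram_poly(i, n, x) for i, c in enumerate(cs))
```

As a sanity check, data that is exactly a quadratic in k is reproduced exactly by an m = 2 expansion, while keeping only m + 1 coefficients instead of n + 1 samples illustrates the compression mentioned in the text.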
The proposed transformation provides a significantly simpler data structure, and stability to noise and to other accidental factors. The method was tested on data generalizing features of normal and modified ECGs and provided a high level of recognition, unveiling barely noticeable pathologies. A by-product of the method is compression of the raw data and reduction of its amount; the compression coefficient has a value of 5-10 and can be improved. The procedure adopted for the parametrization of functions defined on a finite set of argument values plays an essential role in the problem of experimental data processing. Diverse methods have been developed and are widely applied for constructing approximating functions in the form of algebraic or trigonometric polynomials. In our work [12], a nontraditional approach to the interpolation of one-dimensional functions is presented. It is based on the application of a specialized feed-forward neural network, which realizes an expansion in the set of orthogonal Chebyshev polynomials of the first kind. This approach permits the expansion coefficients to be calculated during the network training process, for which arbitrary points (for instance, measured in experiments) from the function's domain are used. The neural network provides an accuracy of function approximation practically coinciding with the accuracy that can be achieved within the traditional approach, when the values of the function at the nodal points are known.

3. CA in pattern recognition and complex system simulation

Cellular automata arose from numerous attempts to create a simple mathematical model describing complex biological structures and processes [13]. A cellular automaton is one of the simplest discrete dynamical systems, whose behavior is totally dependent upon the local interconnections between its elementary parts [14].
This model turned out to be very productive and has been widely and successfully applied in describing various complex structures and processes in physics, biology, chemistry, etc. A typical cellular automaton is constructed in accordance with the following algorithm:

1. cells and their possible discrete states are defined; usually, each cell may assume one of two states, 0 or 1; however, there may be cellular automata with more states;
2. interconnections between cells are defined; usually, each cell can only communicate with neighbouring cells;
3. rules determining the evolution of the cellular automaton are fixed; they depend on the actual problem considered and usually have a simple functional form;
4. the cellular automaton is a timed system, in which all cells change states simultaneously.

A model of the cellular automaton for recognition of straight tracks was developed in paper [15]. In this case a cell was identified with the straight-line segment connecting two hits in neighbouring coordinate detectors. To take into account the inefficiency of the detectors one must also consider the segments connecting hits that skip one detector.
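The algorithm above, specialized to the track-finding model of [15], can be sketched as follows. This is a simplified 2D stand-in, not the published implementation: hits are y-coordinates in equally spaced layers, a cell is a segment between hits in neighbouring layers, and the slope difference of adjacent segments replaces the angle cut; all names and tolerances are hypothetical:

```python
def build_segments(hits):
    """Cells: one segment per pair of hits in neighbouring layers."""
    return [(l, i, j) for l in range(len(hits) - 1)
            for i in range(len(hits[l])) for j in range(len(hits[l + 1]))]

def slope(hits, seg):
    l, i, j = seg
    return hits[l + 1][j] - hits[l][i]  # layers at unit spacing

def evolve(hits, slope_tol=0.2, steps=5):
    """A segment keeps state 1 while it has a continuation on each
    existing side whose direction differs by less than slope_tol."""
    state = {seg: 1 for seg in build_segments(hits)}
    for _ in range(steps):
        new = {}
        for seg in state:
            l, i, j = seg
            # neighbours: live segments sharing an endpoint with this one
            left = [s for s in state if s[0] == l - 1 and s[2] == i and state[s]]
            right = [s for s in state if s[0] == l + 1 and s[1] == j and state[s]]
            ok_l = any(abs(slope(hits, s) - slope(hits, seg)) < slope_tol
                       for s in left)
            ok_r = any(abs(slope(hits, s) - slope(hits, seg)) < slope_tol
                       for s in right)
            keep = (l == 0 or ok_l) and (l == len(hits) - 2 or ok_r)
            new[seg] = 1 if (state[seg] and keep) else 0
        state = new  # synchronous update: all cells switch together
    return [s for s, v in state.items() if v]
```

On a toy event with one straight track plus two noise hits, only the two segments of the real track survive the evolution; segments built from noise hits have no angle-compatible continuation and die out.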
Clearly, only those segments can be considered neighbours which have a common point serving as the end of one segment and the beginning of the other. At each step a cell can assume one of two possible states: 1, if the segment can be considered part of a track, and 0 otherwise. The angle between two adjacent segments was taken as the criterion for assigning segments to a track. Owing to the discrete structure of the coordinate detectors and to multiple scattering in the material of the experimental apparatus, the angles between track segments in the real experiment are not zero, but an upper limit can be imposed. Upon completion of the work of the cellular automaton, additional testing of the quality of the reconstructed tracks (for instance, for the presence of at least two hits belonging only to each individual track) is carried out. This permits rejecting "phantom" tracks, accidentally constructed from hits belonging to different tracks.
Figure 2: Initial configuration of the cellular automaton for a typical Monte-Carlo event in the spectrometer DISTO
Figure 3: Resultant configuration of the cellular automaton for the event presented in the previous figure
The program realization of the described approach has shown high efficiency and speed on simulated data for the experiment DISTO. Its working speed provides for the processing of approximately 1000 events/sec using a 50 MIPS RISC processor. This makes it suitable for track recognition in the second-level trigger of the DISTO spectrometer. In paper [16] the implementation of Probabilistic Cellular Automata in the study of multispecies agent groups is investigated. As a first step we consider the communication between two species governed by a probabilistic rather than a deterministic process. In this way we implement the kind of coupling suggested in [17], following the spirit of probabilistic control for unstable systems. Here, as the controller one can consider the population of agents following a specific pattern, and as the unstable system the population which tends to cover all available space in an ergodic-like fashion. From the variety of all possible realizations of the above idea we start our investigations, for the sake of clarity and simplicity (helping us fix ideas by simple examples), by considering first a two-species system, one of the species being represented by a single agent and the other by a small group of agents (50-100).
Acknowledgments This work has been partly supported by the Commission of the European Community within the framework of the EU-RUSSIA Collaboration, in accordance with the project ESPRIT 21042: Computational Tools and Industrial Applications of Complexity.
References

[1] V.V. Ivanov and P.V. Zrelov:
"Nonparametric Integral Statistics ω_n^k: Main Properties and Applications", Int. Journal "Computers & Mathematics with Applications" (in press); JINR Communication P10-92-461, 1992 (in Russian).
[2] P.V. Zrelov and V.V. Ivanov:
"The Relativistic Charged Particles Identification Method Based on the Goodness-of-Fit ω_n^3-Criterion", Nucl. Instr. and Meth. in Phys. Res., A310 (1991) 623-630.
[3] P.V. Zrelov, V.V. Ivanov, V.I. Komarov, A.I. Puzynin and A.S. Khrykin:
"Modelling of Experiment on Investigation of Processes of Subthreshold K⁺-Mesons Production". JINR Preprint P10-92-369, Dubna, 1992; "Mathematical Modelling", v. 4, No. 11, 1993, p. 56-74 (in Russian).
[4] B. Denby: "Tutorial on Neural Networks Applications in High Energy Physics: The 1992 Perspective". In: Proc. of II Int. Workshop on "Software Engineering, Artificial Intelligence and Expert Systems in High Energy Physics"; New Comp. Tech. in Phys. Res. II, edited by D. Perret-Gallix, World Scientific, 1992, p. 287.
[5] A.Yu. Bonushkina, V.V. Ivanov and P.V. Zrelov: "Input Data for a Multilayer Perceptron in the Form of Variational Series". In: Proc. of the Fourth Int. Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, April 3-8, 1995, Pisa, Italy; "New Computing Techniques in Physics Research IV", edited by B. Denby & D. Perret-Gallix, World Scientific, 1995, p. 751.
[6] V.V. Ivanov: "Multidimensional Data Analysis Based on the ω_n^k Criteria and Multilayer Perceptron". In: Proc. of the Fourth Int. Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, April 3-8, 1995, Pisa, Italy; "New Computing Techniques in Physics Research IV", edited by B. Denby & D. Perret-Gallix, World Scientific, 1995, p. 765. A.Yu. Bonushkina, V.V. Ivanov and P.V. Zrelov: "Multivariate Data Analysis Based on the ω_n^k Criteria and Multilayer Perceptron", Int. Journal "Computers & Mathematics with Applications" (in press).
[7] V.V. Ivanov and P.V. Zrelov: "Rare Events Selection on a Background of Dominated Processes Applying Multilayer Perceptron". Report at this conference.
[8] M.P. Bussa, L. Fava, L. Ferrero, A. Grasso, V.V. Ivanov, I.V. Kisel, E.V. Konotopskaya, G.B. Pontecorvo: "On a Possible Second-Level Trigger for the Experiment DISTO", Nuovo Cimento, vol. 109A, 1996, p. 327.
[9] A.Yu. Bonushkina, V.V. Ivanov, Yu.K. Potrebenikov, T.B. Progulova and G.T. Tatishvili: "Identification of Events with Secondary Vertex in the Experiment EXCHARM". JINR Communications, P1-96-56, Dubna, 1996 (in Russian).
[10] A. Babloyantz, V.V. Ivanov, P.V. Zrelov and P. Maurer: "A New Approach to ECG's Analysis Involving Neural Network", Neural Networks Letters (in press).
[11] I.S. Berezin and N.P. Zhydkov: "Computing Methods", vol. 1, Moscow, 1959 (in Russian).
[12] V. Basios, A.Yu. Bonushkina and V.V. Ivanov: "On a Method for Approximating One-Dimensional Functions", Int. Journal "Computers & Mathematics with Applications" (in press).
[13] S. Wolfram (ed.): "Theory and Applications of Cellular Automata". World Scientific, 1986.
[14] T. Toffoli and N. Margolus: "Cellular Automata Machines: A New Environment for Modelling". MIT Press, Cambridge, Mass., 1987.
[15] M.P. Bussa, L. Fava, L. Ferrero, A. Grasso, V.V. Ivanov, I.V. Kisel, E.V. Konotopskaya, G.B. Pontecorvo: "Application of a Cellular Automaton for Recognition of Straight Tracks in the Spectrometer DISTO", Int. Journal "Computers & Mathematics with Applications" (in press).
[16] V. Basios, F. Bosco, V.V. Ivanov and I.V. Kisel: "From Individual Interactions to a Collective Behaviour of Autonomous Agents Group". Report at this conference.
[17] "Probabilistic Control of Chaos: Chaotic Maps Under Control", to appear in: The Int. Journal "Computers & Mathematics with Applications", Special Issue, Eds. I. Prigogine, I. Antoniou, et al. (in Press).
RARE EVENTS SELECTION ON A BACKGROUND OF DOMINATED PROCESSES APPLYING MULTILAYER PERCEPTRON Ivanov V.V. and Zrelov P.V. Joint Institute for Nuclear Research, Dubna, Russia Abstract
Rare events identification on a background of dominant processes is an important problem of applied mathematical statistics. The practical impossibility of training a neural network on data with significantly different contributions of the separated classes strongly restricts the wide adoption of neural computational methods. In this work a uniform approach for solving the mentioned problem is developed. Our approach is based on the application of one type of neural network, a multilayer perceptron with a single layer of hidden neurons having a step transfer function.

1. Introduction

The present work deals with problems involving the application of neural-network classifiers for identifying rare events on a background of dominant processes. By an event we mean the set of features characterizing the analyzed pattern. The main difficulty in applying neural networks to the indicated problem is connected with the "reluctance" of the network to learn when samples corresponding to different classes with strongly differing a priori probabilities are supplied to its input. The network effectively does not "observe" patterns presented in relatively small quantities. In this case the source of errors is the investigator's intent to train the network on the basis of equal probabilities (P(ω1) = P(ω2)), and then to apply it for the separation of classes with unequal contributions (P(ω1) ≠ P(ω2)). We call this procedure the "approximate" Bayesian classification. The investigator usually does not take into account the fact that the separating boundaries for these two cases can differ significantly, and this can lead to incorrect classification. All problems considered in the paper correspond to Bayesian classification with a minimal level of error [1]. The results of this work correspond to the case when the Bayes limit exists and the separating boundary can be found.

2. Criteria of Identification of Rare Events

In the theory of pattern recognition, a key criterion characterizing the quality of the obtained result is the so-called level of recognition R. It represents the fraction of correctly identified events out of the whole number of events presented for classification and can be written in the form
R = [1 − α + m(1 − β)] / (1 + m),   (1)
where m = N2/N1 and N1, N2 are the numbers of events of the first and the second class, respectively, while α and β are the fractions of misclassified events of the first and the second class. However, it must be noted that in the problem of identification of rare events the value R cannot be the basic or, at least, the only criterion, because in such problems it is necessary to extract useful events with minimum losses and to suppress the background to a level at which the examined signal is displayed quite well. The fraction of signal events in the whole number of selected events can be used as a convenient criterion. It can be represented in the form:

η = (1 − α) / (1 − α + βm).   (2)
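Criteria (1) and (2) are straightforward to compute; a minimal sketch (function names are ours):

```python
def recognition_level(alpha, beta, m):
    """Level of recognition, eq. (1): R = [1 - alpha + m(1 - beta)] / (1 + m),
    with alpha, beta the misclassification fractions and m = N2/N1."""
    return (1 - alpha + m * (1 - beta)) / (1 + m)

def signal_fraction(alpha, beta, m):
    """Signal fraction among selected events, eq. (2):
    eta = (1 - alpha) / (1 - alpha + beta * m)."""
    return (1 - alpha) / (1 - alpha + beta * m)
```

The second criterion makes the rare-event difficulty explicit: with a 99:1 background-to-signal ratio, even a 1% background acceptance dilutes the selected sample below a 50% signal fraction, although R remains close to 1.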
Depending on the subject field and on the concrete problem, the roles of the parameters R and η can change. It can also be convenient to consider some modification of the criterion η.

3. Bayesian classification of distributions with different contributions

Let us carry out a quantitative consideration of the indexes of Bayesian classification on the example of the classification of multidimensional Gaussian distributions with diagonal covariance matrices Σ_j = σ_j² I, where j = 1, 2, and I is the unit matrix. Let σ1 ≠ σ2. The boundary separating the classes is a hypersphere with radius r and center at a point b:

r = { σ1²σ2²/(σ1² − σ2²)² · Σ_{i=1}^{n} (μ1i − μ2i)² + 2σ1²σ2²/(σ1² − σ2²) · ln[(σ1/σ2)^n · P(ω2)/P(ω1)] }^{1/2},

b_i = (σ1² μ2i − σ2² μ1i)/(σ1² − σ2²),   (3)
i = 1, 2, ..., n; here μ_j is the vector of mean values, P(ω_j) is the a priori probability of events ω_j, j = 1, 2, and n is the dimension of the space. It can be shown that in the general case μ1 ≠ μ2 (for definiteness σ2 < σ1, μ1 = 0), a good approximation for the value α is given by the expression

α ≈ I( (1/2) · (n+2a)/(n+3a) · [ (r/σ1)² + a²/(n+3a) ], f/2 ),   (4)
where I is the incomplete gamma function, a = Σ_{i=1}^{n} (b_i/σ1)², and f = (n+2a)³/(n+3a)². Similarly, the approximate expression for β has the form

β ≈ 1 − I( (1/2) · (n+2a')/(n+3a') · [ (r/σ2)² + a'²/(n+3a') ], f'/2 ),   (5)
where a' = Σ_{i=1}^{n} ((b_i − μ2i)/σ2)², and f' = (n+2a')³/(n+3a')². Expressions (1)-(5) connect the values σ1, σ2, μ1, μ2 with the variables R and η characterizing the quality of classification.

4. Rare events classification applying a neural network

The general scheme of the method involves a two-step procedure. At the first stage the training of a network is performed for a ratio of 1:1 between the classes being separated, while at the second stage a correction is carried out on a certain group of network parameters termed shifts. In a number of simple cases the correction permits simple analytic transformations, and in more complicated cases it requires minimization of a functional in the space of shifts for given weights of the neural connections. In the general case a change of
Fig. 1. The efficiency of the method for two relations between the contributions of the events: 1) P(ω1) = 0.1, P(ω2) = 0.9 (two top charts); 2) P(ω1) = 0.001, P(ω2) = 0.999 (two bottom charts).
the relation between P(ω1) and P(ω2) leads to some transformation of the separating surface, and in a special case it is determined by a relation of similarity. In the process of constructing the separating boundary with the help of a multilayer perceptron with one layer of hidden neurons having a step transfer function, fitting of the parameters approximating this boundary is carried out. A change of the relation between P(ω1) and P(ω2) corresponds to a parallel translation of each hyperplane. The value of this shift is determined by the threshold of the network. The method is best considered using the example of the previous chapter. In this case a change in the relationship between P(ω1) and P(ω2) only results in a change of the radius of the separating hypersphere, i.e. it is determined by similarity relations. It can be readily shown that, for known radii r and r' of the Bayes hyperspheres, the
shift θ_j of each neuron of the hidden layer should be recalculated by the formula

θ'_j = θ_j + (r' − r) · ( Σ_{i=1}^{n} w_ij² )^{1/2},   (6)
where wij is the weight connecting the j-th neuron of the hidden layer with the i-th input neuron.
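Assuming the reconstruction of formula (6) above (including its sign convention, which we could not verify against the original typesetting), the shift correction is a one-line recalculation per hidden neuron:

```python
import math

def recalc_shifts(thetas, W, r_old, r_new):
    """Eq. (6) as reconstructed: theta'_j = theta_j + (r' - r) * ||w_j||,
    translating each hidden-layer hyperplane along its normal as the
    separating hypersphere radius changes from r to r'.
    W[j] is the weight vector of hidden neuron j over the inputs."""
    return [th + (r_new - r_old) * math.sqrt(sum(w * w for w in wj))
            for th, wj in zip(thetas, W)]
```

The weight matrix is left untouched, which is the point of the method: only the thresholds (shifts) are corrected after the 1:1 training stage.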
For the practical realization of the method, a multilayer perceptron simulator from the package JETNET-3 [2] was used. Two versions of the method were considered: 1) with recalculation of the values of the shifts after network training for a ratio P(ω1) = P(ω2), and 2) with determination of the indicated values by means of minimization of the network functional in the process of its repeated training. In the second case the same data were supplied to the input of the neural network with a fixed weight matrix. The problem of separating two Gaussians with diagonal matrices Σ_j = σ_j² I, j = 1, 2, in a space of dimension n = 5 and with σ1 = 1, σ2 = 0.3 was considered. In Fig. 1 the values of the variables R and η are presented for the case μ1 = μ2 = 0 and P(ω1) = 0.1, P(ω2) = 0.9 (figures 1a and 1b) and for the case |Δμ_i| = 1, i = 1, 2, ..., 5 and P(ω1) = 0.001, P(ω2) = 0.999 (figures 1c and 1d). The presented values are marked by asterisks for the case of minimization of the functional, and by squares for the case of recalculating the shifts using formula (6). Contour notations concern the training pattern, shaded ones the control sample. The results concerning the approximate Bayesian classification (denoted by circles) are presented for comparison. Moreover, the results of training and testing of the network after the first stage of the correction procedure, concerning a ratio P(ω1) = P(ω2) = 50% (denoted by triangles), as well as curves concerning Bayesian classification, are presented. Some deviations from theory are caused by insufficiently well carried out training at the first stage of the correction procedure, which characterizes the accuracy of the method.

5. Conclusion

The method for solving the problem of small-probability events classification was developed. It is based on the application of one type of neural network, a multilayer perceptron with a single layer of
The developed approach allows effectively use the neural network for identification of rare events, which contribution does not exceed 0.1%. This method is mostly actual in the case of small sample sizes n< 10+20. Acknowledments This work has been partly supported by the Commission of the European Community within the framework of the E U - R U S S I A Collaboration, in accordance with the project ESPRIT 21042: Computational Tools and Industrial Applications of Complexity.
Bibliography

[1] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[2] C. Peterson, T. Rögnvaldsson and L. Lönnblad: "JETNET 3.0 - A Versatile Artificial Neural Network Package", CERN-TH.7135/94, December 1993.
Cellular Automaton and Elastic Neural Network Application for Event Reconstruction in High Energy Physics

I. Kisel, E. Konotopskaya, V. Kovalenko
Joint Institute for Nuclear Research, Dubna, 141980 Russia
Abstract

We use a cellular automaton for filtering data and an elastic net for geometrical reconstruction of events in high energy physics. The advantages of the methods are the simplicity of the algorithms, fast and stable convergence, and a reconstruction efficiency close to 100%. These methods were tested with success on simulated events and real data obtained in the experiments NEMO (Modane, France) and MI0 (PSI, Switzerland).
1 Introduction
The rapid development during the last 10-15 years of various theories of artificial neural networks [1] was a reflection of an attempt to overcome the gulf between the huge amount of factual material relating to the biological mechanisms of brain operation accumulated in neurophysiology and the inadequate existing mathematical formalism and computational means for technical realization of that formalism. The principal advantages of the brain in fulfilling logical, recognition, and computational functions, using capabilities that are essentially parallel, nonlinear, and nonlocal, did not match the prevailing principle of sequential calculations with orientation of the mathematical formalism toward locality, linearity, and stationarity of the descriptions. Included among these are problems whose solution is complicated precisely by nonlinearity, nonlocality, discreteness, and, often, nonstationarity of the situation: for instance, problems of pattern recognition, construction of associative memory, and optimization. Essentially, the theory of artificial neural networks is part of the general theory of dynamical systems, in which particular attention is devoted to the investigation of the complicated collective behavior of a very large number of comparatively simple logical objects. Having significance in their own right, cellular automata can be regarded as a local discrete form of neural networks. They are used in high energy physics particularly for data filtering and track searching. Here we describe an application of a cellular automaton for searching for tracks and an elastic neural net for fitting tracks in the NEMO experiment [2], and for searching for a vertex in the MI0 experiment [3].
2 NEMO experiment
The goal of the NEMO collaboration¹ is to study ββ decays of ¹⁰⁰Mo and other nuclei to probe the effective Majorana neutrino mass down to 0.1 eV. The collaboration is building the NEMO-3 detector to realize this. A prototype detector, NEMO-2, designed for ββ studies, has already provided some measurements and is presently running in the Fréjus Underground Laboratory. The detector is a 1 m³ volume made of tracking chambers composed of drift cells operating in the Geiger mode and two arrays of plastic scintillators for energy and time-of-flight measurements. A typical event in this experiment has a small number of tracks, usually well separated in space. But this situation is complicated by the essential effect of multiple scattering and even hard scattering on wires.

2.1 Cellular automaton for track searching
Searching for tracks in the presence of the left-right ambiguity of drift tubes and significant effects of multiple scattering in the gas, and even hard scattering on wires, is a task lying outside the typical problems of event reconstruction in high energy physics. Therefore the method of cellular automata was chosen, being a flexible one that has recommended itself well in nonstandard situations. Cellular automata are dynamical systems that evolve in discrete, usually two-dimensional, spaces consisting of cells. Each cell can take several values; in the simplest case one has a single-bit cell: 0 and 1.

¹http://nuweb.jinr.dubna.su/LNP/NEMO

The laws of
evolution are local, i.e., the dynamics of the system is determined by an unchanging set of rules (for example, a table) in accordance with which the new state of a cell is calculated on the basis of the states of the nearest neighbours surrounding it. It is important that this change of states is made simultaneously and in parallel, with time proceeding discretely. Cellular automata became particularly popular in the 1970s through the publication by M. Gardner in Scientific American devoted to Conway's game, Life. Specific features of the experiment make preferable the segment model of a cellular automaton, where the elementary unit (cell) is a segment connecting two fired wires in neighbouring layers. To construct a cellular automaton for track searching in NEMO-2 data, one proceeds following the logic of cellular automata. First, note that the cellular automaton is three-dimensional. A cell is identified with a straight-line segment connecting two fired wires in neighbouring drift tube layers, making the cellular automaton essentially local. To take into account Geiger tube inefficiencies one must also include segments connecting wires which skip one layer. At each step an individual cell has two possible states: 1, if a segment can be a part of a track, and 0 otherwise. Second, in establishing the criterion for assigning segments to a track, it is obvious that only segments with a common extremity can be considered as neighbours. Owing to the discrete structure of the coordinate detectors and to multiple scattering in the material of the experimental apparatus, the angles between track segments in the real experiment are not zero, but an upper limit φ_max can be imposed. Third, all cells are initialized with state 1, and at each step of the evolution they look at their neighbours and change state to 0 if there are no neighbours with state 1 on both sides. Neighbouring segments forming a small angle are preferred during the evolution.
Fourth, the definition of time is as usual: it proceeds discretely, and all cells change their states simultaneously. Compared with the previous track finder, the cellular automaton has the following features: a 9% increase in tracking efficiency; a factor of 35 increase in processing speed; operation in 3D space; good reconstruction of tracks with hard scattering on wires; reconstruction of short tracks; and simplicity of modification.
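The survival rule described above can be sketched as a toy two-dimensional version. The segment representation, the neighbor definitions, and the angle cut phi_max are illustrative assumptions; hits are taken as (layer, position) points, not the actual NEMO-2 geometry.

```python
import math

def seg_angle(s, t):
    """Angle between the direction vectors of two segments sharing an endpoint."""
    v1 = (s[1][0] - s[0][0], s[1][1] - s[0][1])
    v2 = (t[1][0] - t[0][0], t[1][1] - t[0][1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n = math.hypot(*v1) * math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / n)))

def evolve(segments, phi_max=0.3, max_steps=10):
    """Segment-model cellular automaton: a cell dies when a neighboring
    segment exists on some side but no live neighbor forms a small angle."""
    state = {s: 1 for s in segments}
    for _ in range(max_steps):
        new = {}
        for s in segments:
            left_cand = [t for t in segments if t[1] == s[0]]
            right_cand = [t for t in segments if t[0] == s[1]]
            ok_left = (not left_cand) or any(
                state[t] and seg_angle(t, s) < phi_max for t in left_cand)
            ok_right = (not right_cand) or any(
                state[t] and seg_angle(s, t) < phi_max for t in right_cand)
            new[s] = 1 if (ok_left and ok_right) else 0
        if new == state:
            break
        state = new  # synchronous update: all cells change at once
    return [s for s in segments if state[s]]
```

After a few synchronous steps, isolated or badly bent segments die out and only chains of well-aligned segments survive.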
2.2 Elastic net for tracking
Let us consider only single straight tracks outside the magnetic field. We assume there is no noise, which removes the need for track searching, and no missing wires, which slightly simplifies the algorithm. It is natural to define a track with multiple scattering as the smoothest line touching all circles around fired wires and crossing all layers. Let us try to construct the elastic net as a line which is deformed under the influence of two types of force: 1. the first pulls it to the edges of the circles; 2. the second smooths out the line. In the case of the left-right ambiguity of drift tubes, the task can be considered as the minimization of a function of many variables with many local minima. To solve this problem we propose to start from two points surrounding the global minimum and covering the whole physical region of the parameters to be found. These points are not independent but attract each other, so they pull each other out of all local minima until the global minimum is reached. According to this idea we start from two bounding tracks which restrict the geometrical area of the real track. One of them touches the circles at their upper sides, the other at their lower sides. Then we introduce a third type of force: 3. attraction between these bounding tracks, which squeezes the geometrical area onto the real track. This method allows one to find an optimal trajectory corresponding to our model of a track. The elastic net can easily be modified to reconstruct broken tracks: we have only to switch off the track smoothing at the break point, which has to be found during a preliminary analysis. An example of the evolution of the elastic net for a multiply scattered track is presented in Fig. 1. Layers are numbered from left to right and iterations go from top to bottom. There are two starting tracks, the upper and the lower one. One can see smoothing of the upper track at the first layer after the first iteration; then this track becomes almost linear at the left group of layers and is stable, being attracted to the neighboring edges of the circles. It is in a local minimum and moves down only under the pressure of the third type of force, the attraction to the lower track.
The middle part of the upper track is smoothed at the beginning and then evolves mainly due to the attraction to the lower track. The right part of both tracks is in an equilibrium state in the middle of the evolution and reaches the global minimum only due to smoothing.
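The three forces can be sketched in a minimal one-dimensional version, assuming one fired wire per layer and a single transverse coordinate per layer point. The gains k_edge, k_smooth, k_attr and the fixed iteration count are illustrative choices, not parameters from the paper.

```python
import numpy as np

def relax(wires, radii, n_iter=200, k_edge=0.5, k_smooth=0.3, k_attr=0.05):
    """Toy elastic net: wires[i], radii[i] give the anode position and
    drift radius in layer i; returns the converged common trajectory."""
    up = wires + radii          # upper bounding track touches circles from above
    dn = wires - radii          # lower bounding track touches circles from below
    for _ in range(n_iter):
        for y, sign in ((up, +1), (dn, -1)):
            edge = wires + sign * radii                         # circle edge on this side
            y += k_edge * (edge - y)                            # force 1: pull to edges
            y[1:-1] += k_smooth * (y[:-2] - 2*y[1:-1] + y[2:])  # force 2: smoothing
        mid = 0.5 * (up + dn)
        up += k_attr * (mid - up)   # force 3: mutual attraction of the
        dn += k_attr * (mid - dn)   # bounding tracks squeezes the corridor
    return 0.5 * (up + dn)
```

By symmetry of the two bounding tracks, the returned mean trajectory stays inside the drift-radius corridor around the true track.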
Figure 1: Example of the elastic net evolution for a track with multiple scattering.
Figure 2: Number of iterations needed for convergence.
The elastic net method converges quickly to the track within a few iterations (see Fig. 2) and has a reconstruction efficiency close to 100%.
3 MM̄ experiment
This is a new experimental search for the lepton-number-violating process M = (μ+, e-) → M̄ = (μ-, e+), a spontaneous conversion of muonium (M) into antimuonium (M̄). This process is forbidden in the standard electroweak theory but allowed in some modern theories beyond the standard model. The experiment is performed at the proton cyclotron of the Paul Scherrer Institute (PSI) in Villigen, Switzerland. The detector consists mainly of two parts. The first is a magnetic spectrometer with a large solid angle, consisting of five cylindrical multiwire proportional chambers and one scintillator hodoscope built up of 64 strips surrounding the chambers. The target is located at the center of the detector. The second part, the positron detector, consists of a position-sensitive microchannel plate and 12 segmented CsI crystals surrounding it.
3.1 Elastic net for vertex search
We use the elastic net for vertex search in the case of arc tracks. This kind of task appears in many experiments, so the algorithm can be applied widely. The main problem here is caused by the large target (~10 cm in diameter) used in the experiment, so we have no good initial approximation for the least squares method usually applied to such tasks. Another feature of the experiment is a varying number of tracks per event, up to 10, so the algorithm must search for the vertex in events with any number of tracks. Let us define the vertex as the geometrical point with the maximum density of tracks. We construct the algorithm on the basis of an elastic ring, introducing only two types of force: 1. attraction of the ring to all tracks, which places it at the condensation area of the tracks; 2. attraction to the nearest tracks, which localizes the vertex region. An example of testing the algorithm on simulated events is presented in Fig. 3. Here one sees 3 tracks crossing the large target. The iterational procedure is also shown in the picture by circles converging to the vertex. A comparison of the elastic net method with the fast vertex search method [4] based on the Chebyshev metric was also made. The good correlation between the vertex errors of the elastic net method and of the Chebyshev-metric method (Fig. 4) shows that the method works reliably.
Figure 3: Vertex search in a 3-track event. The iterational procedure is shown by circles converging to the vertex.
Figure 4: Correlation between the vertex errors of the elastic net method and of the method based on the Chebyshev metric.

4 Conclusion
The results of testing on simulated and real NEMO tracks and on simulated and real MM̄ events demonstrate the reliable operation of the cellular automaton and the elastic net methods. The advantages of the methods are: simplicity of the algorithms; fast and stable convergence; high reconstruction efficiency. This work is partially supported by the Commission of the European Community within the framework of the EU-RUSSIA Collaboration under the ESPRIT contract P9282-ACTCS.
References
[1] I. Kisel, V. Neskoromnyi and G. Ososkov, Applications of Neural Networks in Experimental Physics. Phys. Part. Nucl. 24 (6), November-December 1993, p. 657.
[2] R. Arnold et al. (NEMO Collaboration), Performance of a Prototype Tracking Detector for Double Beta Decay Measurements. Nucl. Instr. and Meth. A354 (1995) 338.
[3] W. Bertl et al. (MM̄ Collaboration), Searching for Muonium-Antimuonium Oscillations. Proc. IV Int. Symp. on Weak and Electromagnetic Interactions in Nuclei (WEIN'95), Osaka, ed. H. Ejiri et al., World Scientific Publ., Nov. 1995.
[4] N. Chernov, A. Glazov, I. Kisel, E. Konotopskaya, S. Korenchenko and G. Ososkov, Track and Vertex Reconstruction in Discrete Detectors Using Chebyshev Metrics. Comp. Phys. Commun. 74 (1993) 217.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
RECOGNITION OF TRACKS DETECTED BY DRIFT TUBES IN A MAGNETIC FIELD
Baginyan S.A., Ososkov G.A.
Joint Institute for Nuclear Research, Dubna, Russia.
Abstract
An algorithm of track recognition in a uniform magnetic field is proposed for a drift straw tube detecting system of solenoidal geometry. The problem is solved in the (x,y) plane perpendicular to the magnetic field. Our algorithm is elaborated on the basis of (1) the sequential histogramming method, which is in fact a modification of the Hough transform, and (2) a modification of the deformable template method followed by a special procedure of parameter correction. Tested on simulated events, our method shows satisfactory efficiency and accuracy in the determination of particle momenta.
Introduction
The efficiency of a track reconstruction algorithm depends on the suitability of the clustering method applied to group measured data points into track candidates. As examples of such reasonable algorithms one can point out well-known methods like variable-slope histogramming or stringing (track following) methods [1], as well as relatively new approaches like Hopfield neural networks [2]. One of the detector systems widely used in modern high energy physics experiments (ATLAS, EVA/E850) is the drift straw tube detector (DSTD). Each time a passing particle track hits a tube, the tube registers two data: its own center coordinate and the drift radius, i.e. the drift distance between the particle track and the anode wire situated in the center of this tube. The main problem, which hinders applications of the above-mentioned conventional track recognition methods, is the so-called left-right ambiguity of the drift radii: they do not contain the information about which side of the anode wire the track passed. In this report an algorithm of track recognition in a uniform magnetic field is proposed for a DSTD system of solenoidal geometry. The problem is solved in the (x,y) plane perpendicular to the magnetic field and to the anodes of the drift straw tubes. Our algorithm is elaborated on the basis of modifications of the Hough transform and deformable template methods. However, the main features of the proposed algorithm are of a general character and are independent of the experimental setup geometry.
2 Formulation of the Problem
The set S = {x_i, y_i; r_i, i = 1,...,N}, where (x_i, y_i) are the coordinates of the hit tube centers and r_i are the drift radii, is the result of the event measurements. Geometrically the set S can be considered as a set of circles on the plane with centers (x_i, y_i) and radii r_i. Thus the mathematical formulation of the problem is to draw the track line as a circle (a, b, R) tangential to the maximum number of these little circles from S. Let us introduce, as a measure of the tangency of two circles on the plane, the minimum distance between the crossing points of these circles with the straight line joining the centers of both circles. If two circles are tangential, their tangency measure is obviously equal to zero. Then our problem can be reformulated as follows: to find such a circle
(a, b, R) that minimizes the sum of its tangency measures with all circles from the set S. Let us denote by D_i(a, b, R) the distance from the center of the circle (x_i, y_i; r_i) to the circle (a, b, R). This variable can take both positive and negative values. Therefore the squared tangency measure of the two circles (x_i, y_i; r_i) and (a, b; R) is twofold: if D_i(a, b; R) > 0, then d_i^- = (D_i(a, b; R) - r_i)^2, otherwise d_i^+ = (D_i(a, b; R) + r_i)^2. As in [3] we define the two-dimensional vector s_i = (s_i^+, s_i^-) with admissible values (1, 0), (0, 1), (0, 0). Here s_i = (0, 0) means that the i-th tube is a noise tube, and the combination s_i = (1, 1) is forbidden. Let us denote by Δ the measurement error of the drift radius and define a functional L depending on the five parameters (a, b, R, s_i^-, s_i^+):

L = Σ_{i=1}^{N} [ s_i^+ d_i^+ + s_i^- d_i^- + Δ^2 (1 - s_i^+ - s_i^-) ]     (1)
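The tangency measures and the functional above can be sketched numerically. This is a minimal sketch: the tube list, the assignment vectors s_i and the value of Δ are caller-supplied assumptions for illustration.

```python
import math

def tangency_terms(circle, tube):
    """Return (d_minus, d_plus) for a drift tube (x, y, r) and a track circle (a, b, R)."""
    a, b, R = circle
    x, y, r = tube
    D = math.hypot(x - a, y - b) - R   # signed distance from tube centre to the circle
    return (D - r) ** 2, (D + r) ** 2

def functional_L(circle, tubes, assignments, delta):
    """L = sum_i [ s+ d+ + s- d- + delta^2 (1 - s+ - s-) ] over all tubes."""
    total = 0.0
    for tube, (sp, sm) in zip(tubes, assignments):
        dm, dp = tangency_terms(circle, tube)
        total += sp * dp + sm * dm + delta ** 2 * (1 - sp - sm)
    return total
```

A tube tangent to the track from outside gives d_i^- = 0, so assigning it s_i = (0, 1) contributes nothing to L; a noise tube assigned (0, 0) contributes the fixed penalty Δ².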
Thus to recognize a track one has to: (1) from the set of all measurements extract a subset S which contains, as far as possible, all the data for one of the tracks; (2) find the global minimum of L (although it would be enough to reach its close vicinity). To solve the first problem we modify the Hough transform method [4], which, following [5], we call the method of sequential histogramming by parameters (SHPM). Besides extracting a subset S, SHPM also provides the starting values of the circle (a_0, b_0; R_0) needed to solve the problem at the next step. The second problem is solved by the deformable template method (DTM) with a special correction of the parameters of the obtained tracks.
3 Sequential histogramming method
Let Ω = {X_i, Y_i, i = 1,...,N} be a set of coordinates X_i, Y_i measured in the process of registering an event. The so-called sequential histogramming method [5] gives the following algorithm for finding the initial track parameters: 1. Circles are drawn through all admissible point triplets. Then the first coordinate a_j of each circle is histogrammed, and the value a_m corresponding to the maximum of this histogram is obtained. 2. With a_m fixed, circles are drawn through all admissible pairs of points from Ω. Then the second coordinate b_j of each circle is histogrammed, and the value b_m corresponding to the maximum of this second histogram is obtained.
3. With the coordinates of the center a_m, b_m fixed, all admissible radii R_j of the set are histogrammed, and the value R_m corresponding to the maximum of this third histogram is obtained. The obtained parameters (a_m, b_m; R_m) are then subjected to more sophisticated tests and refinements. If the results are positive, i.e. the parameters (a_m, b_m; R_m) are accepted as a true track, all measurements corresponding to it are eliminated from the set Ω and the whole procedure is repeated starting from step 1. If the circle (a_m, b_m; R_m) is rejected by the testing, the next combination of parameters is selected. In order to apply SHPM the results of the measurements must have the format of the Ω-set, i.e. be a set of track point coordinates. However, we have instead the set S of little circles {x_i, y_i; r_i, i = 1,...,N}, so we have to determine on each of these circles a point associated
with one of the tracks. Supposing the vertex area, from which all tracks of the given event emanate, is known, one can roughly determine such a point as the tangent point of the tangent line drawn to each little circle (x_i, y_i; r_i) from the center of the vertex area. So we have two possible track points per circle. This does not prevent us from applying the SHPM, but it should be kept in mind that the left-right uncertainty doubles the number of elements of the set Ω = {X_i, Y_i, i = 1,...,2N} in comparison with the number of elements in the original set S = {x_i, y_i; r_i, i = 1,...,N}.
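The three histogramming steps can be sketched as follows. This is an illustrative version only: it assumes plain track points (no left-right doubling), and the bin count and the circumcircle helper are choices made here, not prescribed by the paper.

```python
import numpy as np
from itertools import combinations

def circumcenter(p, q, r):
    """Centre of the circle through three points (None if collinear)."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d
    return ux, uy

def hist_mode(vals, nbins=50):
    h, edges = np.histogram(vals, bins=nbins)
    k = int(np.argmax(h))
    return 0.5 * (edges[k] + edges[k + 1])

def shpm(points, nbins=50):
    """Sequential histogramming: first a, then b with a fixed, then R."""
    pts = [tuple(p) for p in points]
    a_m = hist_mode([c[0] for t in combinations(pts, 3)
                     if (c := circumcenter(*t)) is not None], nbins)
    # with a fixed, requiring the centre (a_m, b) to be equidistant
    # from a point pair determines b
    bs = [((x1 - a_m)**2 + y1**2 - (x2 - a_m)**2 - y2**2) / (2 * (y1 - y2))
          for (x1, y1), (x2, y2) in combinations(pts, 2) if abs(y1 - y2) > 1e-12]
    b_m = hist_mode(bs, nbins)
    R_m = hist_mode([np.hypot(x - a_m, y - b_m) for x, y in pts], nbins)
    return a_m, b_m, R_m
```

Each stage collapses one parameter dimension, which is the point of the method: three 1D histograms instead of one expensive 3D accumulator.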
4 Deformable template method
After obtaining by SHPM the initial values of the track parameters and choosing an area where this track could lie, we proceed to look for the global minimum of the functional L (1). One of the main problems here is how to avoid local minima of L provoked by the stepwise character of the behaviour of the vector s_i = (s_i^+, s_i^-). One known way to avoid this obstacle is the standard mean field theory (MFT) approach, which leads to the simulated annealing schedule [6]. As shown in [3], the parameters s_i^+ and s_i^- of the functional L with fixed (a, b; R) can be calculated by formulae in which the stepwise behaviour of the vector s_i is in fact replaced by a sigmoidal one. The global minimum of L is calculated according to the following scheme: 1. Three temperature values are taken: high, middle and a temperature in the vicinity of zero, as well as three noise levels corresponding to them [3, 6]. 2. According to the simulated annealing schedule our scheme starts from the high temperature. With the initial circle values (a_0, b_0; R_0) the parameters s_i^+, s_i^- are calculated. 3. For the obtained s_i^+, s_i^- new circle parameters a, b; R are calculated by the standard gradient descent method. 4. The ending rule is standard. 5. If the conditions of step 4 are not satisfied, then with the new circle parameters (a_{k+1}, b_{k+1}, R_{k+1}) the next values of s_i^+, s_i^- are calculated and we go back to step 3. 6. After the process converges at the given temperature, the temperature is changed (the system is cooled), the values of (a, b, R) achieved at the previous temperature are taken as starting values, and we go to step 2 again. 7. At each temperature value, after completing step 5, the condition L < L_cut is tested. If it is satisfied, our scheme is completed and the algorithm proceeds to the next stage of correcting the obtained track parameters (a, b, R). Otherwise, if at the temperature in the vicinity of zero we obtain L > L_cut, a diagnostic is issued that the track finding scheme has failed.
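The MFT replacement of the stepwise s_i by sigmoidal values can be sketched as a Potts-style soft assignment for one tube. This follows the general elastic-arms recipe of the mean field approach rather than a formula quoted verbatim in the text; the penalty lam (playing the role of the Δ²-related noise level) is an assumed parameter.

```python
import math

def soft_assign(d_plus, d_minus, lam, T):
    """Mean-field assignment for one tube at temperature T.
    Returns (s_plus, s_minus); the implicit third state (noise)
    gets the remaining probability mass via its penalty lam."""
    wp = math.exp(-d_plus / T)
    wm = math.exp(-d_minus / T)
    w0 = math.exp(-lam / T)      # Boltzmann weight of the noise state s = (0, 0)
    z = wp + wm + w0
    return wp / z, wm / z
```

At high temperature all three states are nearly equally weighted, so the fit explores freely; as T is lowered toward zero the assignment hardens into the stepwise (1,0)/(0,1)/(0,0) decision, which is exactly what the annealing schedule exploits.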
5 Procedure of the track parameter correction
The deformable template method provides us with the track parameters (a, b; R). However, these parameters could turn out to be rather far from the global minimum of L. Therefore we have to elaborate an extra stage for the track parameter correction.
On each circle of the set S = {x_i, y_i; r_i, i = 1,...,N}, taking into account the corresponding values of s_i, the point nearest to the track candidate is found. Then all these points are approximated by a circle and a χ² value is calculated as a criterion of their smoothness and fit quality. If χ² < χ²_cut holds, the approximating parameters (a_c, b_c; R_c) are accepted as true. Otherwise the track candidate is rejected.
6 Results
The proposed algorithm for finding tracks detected by a DSTD system in a magnetic field was tested on simulated events. 990 tracks were modelled as circle arcs with radii in the range from 40 cm to 5 m, emanating from a target under various angles. 955 tracks out of 990 were recognized correctly, which means an algorithm efficiency of 96.4%. The distribution of the relative radius error shows that its mean and RMS are of the order of 10^-2 of the radius, which is satisfactory for the considered experimental setup.
References
[1] H. Grote, Pattern Recognition in High Energy Physics, CERN 81-03, 1981.
[2] C. Peterson, B. Söderberg, Int. J. of Neural Syst. 1, 3 (1989).
[3] S. Baginyan et al., Application of Deformable Templates for Recognizing Tracks Detected with High Pressure Drift Tubes, JINR Commun. E10-94-328, Dubna, 1994.
[4] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[5] Yu. A. Yatsunenko, Nucl. Instr. and Meth. A287 (1990) 422-430.
[6] M. Ohlsson et al., Track Finding with Deformable Templates - The Elastic Arms Approach, LU TP 91-27.
Session D: ADAPTIVE SYSTEMS I: IDENTIFICATION AND MODELING
A UNIFIED CONNECTIVE REPRESENTATION FOR LINEAR AND NONLINEAR DISCRETE-TIME SYSTEM IDENTIFICATION
Jacques FANTINI
L.A.M.A., Université d'Orléans, E.S.E.M., rue Léonard de Vinci, 45072 ORLEANS Cedex 2
Abstract
System identification is the subject of much research, and many articles propose often complex computation algorithms. Moreover, the models and identification methods used differ according to whether the real system is linear or nonlinear. This paper presents an identification methodology based on a single model deduced from neural network theory. It measures and analyses the degree of precision obtained, and defines the parameters that influence the convergence of the network errors.
1 INTRODUCTION
In order to regulate and control a real system, a mathematical representation is required which provides a satisfactory estimation of the process and which can be obtained by identification. The particularity of nonlinear systems lies in the fact that the principle of superposition is not applicable. Thus the identification algorithms currently proposed are based either on approximation principles or on methods which cannot be generalized.
2 DISCRETE-TIME REPRESENTATION AND CONNECTIVE MODELISATION
2.1 Linear systems
Let [Y] = [F][U] be the representation by transfer matrix of a multivariable system, defined as controllable and observable, with [Y] the output vector of dimension ny, [U] the command vector of dimension nu, and
f_ij = (Σ_{l=1}^{m} b_l z^{-l}) / (Σ_{l=0}^{n} a_l z^{-l}), m ≤ n, the element ij of the matrix F (ny × nu). The output y^i is linked to the command u^j by the following characteristic polynomial of degree n:

y^i_k = Σ_{l=1}^{n} a_l y^i_{k-l} + Σ_{l=1}^{m} b_l u^j_{k-l}     (1)

where y^i_r and u^j_r are the values of y^i and u^j in the time range [rΔ, (r+1)Δ[, with Δ the sample period. The determination of all the coefficients a_l and b_l of [F] and of the degree n of each characteristic polynomial is a necessary and sufficient condition for defining a satisfactory representation of the system. Consider the neural network of figure 2.1, with p = 2n, q = 2m, and the transformation of variables (T)

Y^i_r = (1/N) exp(y^i_r) / (exp(y^i_r) + 1),   U^j_r = (1/N) exp(u^j_r) / (exp(u^j_r) + 1),

bijections of the set of real numbers into [0, 1/N[. The activation function of each neuron is f_act(x) = sh(x), an odd, increasing, differentiable function with the property

f_act(x) = Σ_{n≥0} x^{2n+1} / (2n+1)! = x + xε,  with ε → 0 when x → 0.

The output expression of neuron r is O_r = f(I_r) = sh(w_{r,t} E^i_{k-t} + w_{r,t+1} E^i_{k-t+1}). Given hypothesis (H1): sh(x) ≈ x, whose conditions and limits of validity are set out in section 4,

O_r = w_{r,t} E^i_{k-t} + w_{r,t+1} E^i_{k-t+1}     (2)

so that the network output is

y^i_k = Σ_{l=1}^{n} hy_l (wy_{l,2l-1} Y^i_{k-(2l-1)} + wy_{l,2l} Y^i_{k-2l}) + Σ_{l=1}^{m} hu_l (wu_{l,2l-1} U^j_{k-(2l-1)} + wu_{l,2l} U^j_{k-2l}).

From the identity (1) = (2) it follows that:

a_r = hy_{δ(r)} wy_{δ(r),r} Y^i_{k-r} / y^i_{k-r},   b_r = hu_{δ(r)} wu_{δ(r),r} U^j_{k-r} / u^j_{k-r},   r ∈ [1, m],  δ(r) = whole part of (r+1)/2.
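The characteristic polynomial (1) can be simulated directly once its coefficients are known. A minimal sketch, with a holding a_1,...,a_n and b holding b_1,...,b_m (zero initial conditions and m ≤ n assumed):

```python
def simulate(a, b, u):
    """Simulate y_k = sum_l a[l] y_{k-l} + sum_l b[l] u_{k-l} for a command sequence u.
    a[0] stands for a_1, b[0] for b_1; the first len(a) outputs are taken as zero."""
    n, m = len(a), len(b)
    y = [0.0] * n
    for k in range(n, len(u)):
        y.append(sum(a[l] * y[k - 1 - l] for l in range(n)) +
                 sum(b[l] * u[k - 1 - l] for l in range(m)))
    return y
```

Such a recursion is what the trained network has to reproduce: identifying the system amounts to recovering the a_l and b_l from input/output samples.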
2.2 Nonlinear systems
Let e(t) = e_0 sin(ωt) be applied to the input e of a nonlinear system. The output y is a non-sinusoidal periodic function decomposable into a Fourier series y(t) = Σ_{i=1}^{∞} s_i sin(iωt + ψ_i). Its discrete-time representation is:
S(z^-1) = Σ_{i=1}^{∞} s_i [sin ψ_i + z^-1 sin(iωT - ψ_i)] / (1 - 2z^-1 cos(iωT) + z^-2),  and  E(z^-1) = e_0 z^-1 sin(ωT) / (1 - 2z^-1 cos(ωT) + z^-2).

figure 2.1: connective network for a linear system (input layer, hidden layer, output).

The transfer function F(z^-1) = S(z^-1)/E(z^-1) of the nonlinear system becomes

F(z^-1) = Σ_{i=1}^{∞} (1/(e_0 sin ωT)) (Σ_{j=1} b_{j,i} z^-j) / (Σ_{j=0} a_j z^-j).

The output y is linked to the command e by the following characteristic polynomial:

y_k = Σ_i (1/(e_0 sin ωT)) Σ_j b_{j,i} e_{k-j} + Σ_j a_j y_{k-j}     (3)

For a multivariable system, each output y^i is linked to the input e^j by the same polynomial statement.

figure 2.2: connective network for a nonlinear system (input layer, hidden layers, output).

Given the neural network in figure 2.2, with the same definitions as stated in 2.1, the output expression of neuron r is O_r = f(I_r) = sh(w_{r,t} E^i_{k-t} + w_{r,t+1} E^i_{k-t+1}). Hypothesis (H1) implies

O_r = w_{r,t} E^i_{k-t} + w_{r,t+1} E^i_{k-t+1}     (4)

From the identity (3) = (4) it follows that:

a_r = hy_{δ(r)} wy_{δ(r),r} Y^i_{k-r} / y^i_{k-r},   b_r = e_0 sin(ωT) he_{δ(r)} we_{δ(r),r} E^i_{k-r} / e^i_{k-r}.
2.3 Identification methodology
The set of data y^i_k, the discrete-time responses of a system subjected to the commands u^j_k (e^j_k), is known and defines the information vectors of the neural network. Thus, for all k > 0 and bounded, the training sample is defined by the s couples (X^i_k, y^i_k), with X^i_k = {Y^i_l, E^j_l, l ∈ [k-1, k-m]} the input vector of the network and y^i_k the output.
The weights wy_{l,r}, wu_{m,r}, we_{l,r}, hy_l, hu_l and he_l determined by the training stage allow direct calculation of the coefficients of the characteristic polynomials. However, the quality of the identification performed will depend respectively on the behaviour of the learning error and on the incidence of the error generated by the approximation hypothesis.
3 PROPAGATION OF THE APPROXIMATION ERROR
3.1 Expression of the approximation error of the activation function
f_act(x) = Σ_{n≥0} x^{2n+1} / (2n+1)! = x + xε, with ε < exp(x) - 1 and ε → 0 for x → 0, x being any variable treated by the neural network. The application of Y_k and E_k to the inputs of the neural network makes it possible to reduce the numeric values of the variables of the system without modifying the identification results, through the transformation (T), with N a scale factor defining the adjustment parameters of all variables x of the network, and ε sufficiently weak to result in a satisfactory identification of the system.
3.2 Expression of the approximation error in feedforward propagation
For a neuron j of the hidden layer: I_j = w_{j,r} E_{k-r} + w_{j,r+1} E_{k-(r+1)}, and O_j = I_j + I_j ε_j, where ω_j = I_j ε_j is the approximation error on O_j, with ε_j < exp(I_j) - 1. For the output neuron: I = Σ_{j=1}^{p+q} w_j O_j = Σ_{j=1}^{p+q} w_j (I_j + I_j ε_j) and S^i_k = I + Iε with ε < exp(I) - 1, so

|Iε| < Σ_{j=1}^{p+q} w_j I_j (exp(I_j) - 1),  with L → 0 when I_j → 0,

the approximation error on S^i_k.
Therefore, the approximation error propagated through the neural network to the output S^i_k tends towards 0 for variables of small dimensions.
3.3 Expression of the approximation error in back-propagation
The back-propagation algorithm results in the following equation modifying the weights:

w_j(r) = w_j(r-1) - 2ε(r)(S^i - y^i) f'_act(I) O_j,

with ε(r) the gradient step at stage r, S^i the output calculated by the network, y^i the desired output, f'_act the derivative of the activation function, and I = Σ_{j=1}^{p+q} w_j O_j. Given that the approximations obtained are respectively O_j = I_j and I = Σ_{j=1}^{p+q} w_j I_j, the approximation error on the update of w_j becomes:

δw_j = (I_j + I_j ε_j) ch(W + V) - I_j ch(W),  with W = Σ_{j=1}^{p+q} w_j I_j and V = Σ_{j=1}^{p+q} w_j I_j ε_j, hence
|δw_j| < I_j exp(W)[exp(L) - 1],  with L → 0 when I_j → 0.

Therefore, the approximation error propagated during the training stage, for the updating of the weights w_j, tends towards 0 for variables of small dimensions.
3.4 Generalization
Hypothesis H1, the approximation of the activation function, generates an error ε such that its property of tending towards 0 is maintained both during feedforward propagation and during the training stage. Thus, the error generated is linked to a controlled condition: the size of the values y_k and u_k treated. The following section analyses the evolution and the incidence of this error on the identification of a real system.
4 APPLICATION AND CONCLUSION
4.1 Presentation
Identification of a real system by neural networks is the search for the coefficients a_j and b_{j,i} of the polynomial

y_k = Σ_i (1/(e_0 sin ωT)) Σ_j b_{j,i} e_{k-j} + Σ_j a_j y_{k-j}.

The analysis of the effect of the approximation error on the precision of the identification result involves the study of two parameters: the scale factor N, and the dimension i of the network. For each of these parameters, the end condition for the training stage relies on a satisfaction index (the quadratic sum of the errors of each output neuron) or, if this is not reached, on a maximum number of iterations.
4.2 Scale factor influence
Given N ∈ [1, ∞[ and the dimension i fixed, the effect of the approximation error and the behaviour of the neural network are analysed using the satisfaction index of the training stage (with randomly initialised weights). The difference Δ between the response of the real system and that of the identified process subjected to the same commands is equal to the quadratic sum of the errors of each sample, and is studied in order to test the quality of the identification. Examination of these two parameters brings out a minimum for a specific value of the scale factor N = N_min (figure 4.1). The satisfactory result of the identification performed is confirmed by the value of Δ obtained at N_min, and by the precision of the coefficients of the identified system calculated by the network compared to those of the real process (< 1%).
4.3 Network dimension influence
Given N = N_min, the satisfaction index and the difference Δ between the real and identified process are analysed in terms of the dimension i of the network. Each parameter tends towards 0 as the dimension increases (figure 4.2); therefore the precision of the identification increases with i.
The convergence of these parameters towards 0 is linked to a growing dimension of the training vector samples, and to a richer topology in terms of neurons and network links, as the dimension i increases. However, if the dimension is too high, the model obtained will not be very easy to use, and it will be difficult to evaluate the most significant parameter of the system.
4.4 Conclusion
Identification of processes using this model produces a simple computation method which can be extended to linear and nonlinear systems, and which obtains a satisfactory representation of the system. Moreover, this method allows for the evaluation of the precision of the model according to the dimension chosen, which is linked to a growing dimension of the training vector samples and a richer topology in terms of neurons and network links as the degree n increases. Finally, the use of this computation model simplifies the identification mechanism for multivariable systems, as it represents the extension of the identification of a monovariable sub-system by duplication of neural networks.
Predicting a Chaotic Time Series Using a Dynamical Recurrent Neural Network
Roberto TERAN, Jean-Philippe DRAYE, Gustavo CALDERON, Davor PAVISIC, Gaëtan LIBERT
Parallel Information Processing Laboratory, Faculté Polytechnique de Mons, B-7000 Mons (BELGIUM)
Abstract. In this paper we present two kinds of recurrent neural networks for time series forecasting. Both associate a time constant to each neuron. The first recurrent network is able to adapt its time constants (Dynamical Recurrent Neural Network), and the other one keeps all its time constants at a fixed value (Static Recurrent Neural Network). Our results, using a chaotic time series, show that adapting the time constants decreases the training time and improves the prediction capability compared with the static recurrent neural network.
1 Introduction
Feedforward networks are able to perform many tasks well, and have several practical implementations in control, recognition problems and time series forecasting ([4], [11], [14]). However, they have their limitations for predicting nonlinear time series. Recurrent neural networks have appeared, showing better performance compared with traditional or feedforward networks ([2], [3], [7]); they are able to learn attractor dynamics, and they can store information for later use. Moreover, Dynamic Recurrent Neural Networks (DRNN) are able to enhance the capabilities of Static Recurrent Neural Networks (SRNN), especially when handling time-dependent problems or temporal tasks. The main difference between a DRNN and an SRNN is the fact that a DRNN uses an adaptive time constant associated to each neuron. These time constants act as a linear filter, and we can consider a DRNN as a FIR network (see [13]), but with recurrent connections. SRNN have limited storage capabilities and may be inappropriate for dealing with complex time series.
DRNN are well known for their ability to handle temporal processing and nonlinear problems (see [1], [2], [5], [6], [7], [9]); they can capture the underlying law governing the system through a system of nonlinear differential equations, and a chaotic system can be approximated by a set of nonlinear differential equations (see [9]). A DRNN has more parameters than an SRNN; hence, in order to implement dynamical systems with chaotic behaviour, we must train the network using a proper algorithm. So we will train such a network using a modified version of the standard backpropagation algorithm called Time Dependent Recurrent Backpropagation.
2 Network Dynamics
We consider the dynamics of our network to be governed by the following equations:
T_i dy_i/dt = -y_i + F(x_i) + I_i          (a)

and

x_i = Σ_j w_ij y_j          (b)          (1)
where y_i represents the output or activation level of the ith neuron, F is the transfer function defined by F(x) = 1/(1 + e^{-x}), x_i represents the total input to neuron i, I_i is an external dynamical input to neuron i, w_ij is the weight between neurons j and i, n is the total number of neurons, and T_i is a time constant that acts like a relaxation process. Equations 1(a) and 1(b) define a general dynamic system. Using the appropriate training procedures, the system will exhibit asymptotic behaviour (i.e. fixed points, temporal behaviour); such behaviour is desired for handling chaotic systems. Few training procedures can reach this convergence; we have chosen the Time Dependent Recurrent Backpropagation algorithm, which is explained in the next section.
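As a concrete illustration, the dynamics of Equation 1 can be integrated numerically with a forward Euler scheme. This is a minimal sketch, not the authors' implementation: the network size, random weights, unit time constants and zero external input are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_drnn(w, T, I, y0, dt=0.01, steps=1000):
    """Forward Euler integration of Eq. 1: T_i dy_i/dt = -y_i + F(x_i) + I_i."""
    y = y0.copy()
    traj = [y.copy()]
    for _ in range(steps):
        x = w @ y                        # Eq. 1(b): x_i = sum_j w_ij y_j
        y = y + dt * (-y + sigmoid(x) + I) / T
        traj.append(y.copy())
    return np.array(traj)

rng = np.random.default_rng(0)
n = 20                                   # 20 neurons, as in Section 5
w = rng.normal(scale=0.5, size=(n, n))   # assumed random recurrent weights
T = np.ones(n)                           # assumed unit time constants
I = np.zeros(n)                          # no external input in this sketch
traj = simulate_drnn(w, T, I, y0=rng.uniform(-0.1, 0.1, size=n))
```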
3 Time Dependent Recurrent Backpropagation
The Time Dependent Recurrent Backpropagation (TDRB) algorithm is an extension of standard backpropagation, modified to yield a powerful algorithm that can be applied to dynamical systems in an efficient way (see [10]). This algorithm allows a DRNN to learn non-fixed-point attractors, produce desired temporal behaviour, and reach a stable state quickly.
Let us consider the following cost function which measures the deviation of the actual output y(t) from the desired output d(t):
E = (1/2) ∫_{t0}^{t1} (y(t) - d(t))² dt          (2)
where the values t0 and t1 limit the time interval during which the correction process takes place. Let us now consider
e_i(t) = ∂E/∂y_i(t)          (3)
measuring the influence of an infinitesimal change of the output y_i at time t on the cost function, if everything else is left unchanged. The training algorithm is obtained by differentiating Equation 1 with respect to the various parameters. First, however, we introduce additional variables z_i defined by the equation:
dz_i/dt = (1/T_i) z_i - e_i - Σ_j (1/T_j) w_ji F'(x_j) z_j          (4)
where we use the boundary condition z_i(t1) = 0. The parameter correction process is then carried out using the following equations:
Δw_ij = -p_o ∫_{t0}^{t1} (1/T_i) y_j F'(x_i) z_i dt   and   ΔT_i = -(p_1/T_i) ∫_{t0}^{t1} z_i (dy_i/dt) dt          (5)

such that

Δw_ij = -p_o ∂E/∂w_ij   and   ΔT_i = -p_1 ∂E/∂T_i          (6)
where p_o and p_1 are constants which act as learning rates. This algorithm modifies not only the weights (Δw_ij) but also the time constants (ΔT_i) associated with each neuron. The time constants improve the memory effect of the time delays and the non-linearity effect of the sigmoid function. To speed up convergence to the desired function, we have used a momentum term and associated a learning rate with each neuron. A complete presentation of this algorithm can be found in [6].
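The forward/backward structure of TDRB can be sketched in discretized form as follows. This is a hedged illustration, not the authors' code: the Euler discretization, target signal, sizes and learning rates are all assumptions, and the momentum term and per-neuron learning rates are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tdrb_update(w, T, I, y0, d, dt, p_w, p_T):
    """One discretized TDRB update: forward pass, backward adjoint pass,
    then the weight and time-constant corrections of Equations 5-6."""
    steps, n = d.shape[0], y0.size
    ys, xs = [y0.copy()], []
    y = y0.copy()
    for k in range(steps):               # forward pass (Eq. 1)
        x = w @ y
        xs.append(x)
        y = y + dt * (-y + sigmoid(x) + I) / T
        ys.append(y.copy())
    ys, xs = np.array(ys), np.array(xs)
    e = ys[1:] - d                       # e_i(t) = dE/dy_i (Eq. 3)
    z = np.zeros(n)                      # boundary condition z_i(t1) = 0
    dw, dT = np.zeros_like(w), np.zeros_like(T)
    for k in range(steps - 1, -1, -1):   # backward pass (Eq. 4)
        fp = dsigmoid(xs[k])
        dw += dt * np.outer(fp * z / T, ys[k])   # integrand of dE/dw_ij
        dT += z * (ys[k + 1] - ys[k]) / T        # integrand of dE/dT_i
        z = z - dt * (z / T - e[k] - w.T @ (fp * z / T))
    return w - p_w * dw, T - p_T * dT            # corrections of Eq. 6

rng = np.random.default_rng(0)
n, steps, dt = 5, 50, 0.05
w0 = rng.normal(scale=0.5, size=(n, n))
T0, I = np.ones(n), np.zeros(n)
d = 0.5 + 0.1 * rng.standard_normal((steps, n))  # illustrative target signal
w1, T1 = tdrb_update(w0, T0, I, np.zeros(n), d, dt, p_w=0.01, p_T=0.01)
```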
Figure 1: 1000 time points of the chaotic laser data.
4 The Data
This series was obtained in a physics laboratory experiment and shows the chaotic intensity pulsations of an NH3 laser (see [12]). The series was distributed in 1991-1992 for the Santa Fe Institute Time Series Prediction and Analysis Competition. It presents a complicated behaviour and is regarded as a low-dimensional chaotic dynamical system. It exhibits pulsating cycles, during which the amplitude of the periods grows larger, and collapsing cycles, during which the amplitude of the periods shrinks. We do not know exactly when a collapsing cycle will occur (see Figure 1).
5 Experimental Results
We have used two fully connected recurrent neural networks, each with 20 neurons, and we have associated a time constant with each neuron. We use only one neuron as input and another as output. All the neurons receive the input signal y(t), except the output neuron. We have then carried out two different experiments. In the first case, we allow the algorithm to modify the time constants of the recurrent neural network (DRNN). In the second case, we do not allow the algorithm to change the time constant values (SRNN), keeping them all constant. In both cases we use the 500 initial points of the data set as training set, and a maximum of 2000 iterations (epochs) for the training process. The training process consists of adapting the network parameters to produce a signal that approximates y(t+Δ) (the best results were obtained using Δ=6). This means that we introduce the first value of the series as input, and the desired output is the 6th value of the series; then we introduce the second value of the series, and the desired output is the 7th value, and so on. To estimate the total error value, we use the sum of errors as cost function: E_T = Σ_n E_n, where E is given by Equation 2 and n ranges over the training set patterns (500 in our case). Thus, our goal is to minimize the value of E_T. We stop the training process either when we have reached the maximum number of epochs (2000), or when the error value (E_T) is small enough to already give good predictions. Afterwards, we freeze the weights and, for the DRNN, the time constant values of the network, and we use the next 500 points of the data set as the validation set. The results are summarized in Table 1. Figure 2 compares the desired output with the predicted output of the best trained network.
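The construction of the training patterns described above (input y(t), target y(t+Δ) with Δ = 6, over the first 500 points) can be sketched as follows; the series used here is a synthetic stand-in for the laser data.

```python
import numpy as np

def make_pairs(series, delta=6, n_train=500):
    """Pair each training input series[k] with the target series[k + delta]."""
    return [(series[k], series[k + delta]) for k in range(n_train - delta)]

series = np.sin(0.1 * np.arange(600))   # synthetic stand-in for the laser data
pairs = make_pairs(series)
# the total error E_T is then the sum of the per-pattern errors E_n (Eq. 2)
```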
Note that the ideal output in both cases should be a straight line with a slope of 45°. We can see that subfigure 2(a) (DRNN) matches this straight line better than subfigure 2(b) (SRNN).
Neural Network   Averaged Error   Best Error   Averaged Epochs
DRNN             3.84             3.42         1010
SRNN             4.73             4.11         1530
Table 1: The first column gives the kind of neural network used. The second column is the averaged total error value, and the third column the minimum total error value, over 50 runs (both errors are normalized). The fourth column gives the average number of epochs over the 50 experiments before attaining convergence.
Figure 2: Actual predicted values ŷ(t + Δ) versus the desired output y(t + Δ). Note that the ideal curve is a straight line with a slope of 45°.
6 Conclusions
In this paper we have shown that adapting the time constants increases the prediction ability of a Dynamical Recurrent Neural Network (DRNN), requiring less training time before attaining convergence without losing quality in the predicted values. Even though a Static Recurrent Neural Network (SRNN), which keeps its time constants fixed, reaches good results in our tests, adapting the time constants outperforms these results. Both DRNN and SRNN can store and reuse past outputs as processing information for the network. Nevertheless, adapting the time constants improves the internal memory of the network and enhances the non-linearity of the sigmoid functions. In this way we do not lose the information produced by the network, enhancing its ability to handle temporal tasks and allowing the network to better capture the dynamics of the chaotic series. Consequently, the network identifies the chaotic behaviour of the series faster, and the quality of prediction is better.
References
[1] A. Aussem, F. Murtagh and M. Sarazin. Dynamical Recurrent Neural Networks - Towards Environmental Time Series Prediction. Technical report, European Southern Observatory, 1994.
[2] Y. Bengio, M. Gori, G. Soda and P. Frasconi. Recurrent Neural Networks for Adaptive Temporal Processing. Proc. of the 6th Italian Workshop on Parallel Architectures and Neural Networks, pages 85-117, 1993.
[3] Y. Bengio, P. Simard and P. Frasconi. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Trans. on Neural Networks, 5(2):157-166, 1994. (Special Issue on Dynamic and Recurrent Neural Networks.)
[4] B. Widrow, D.E. Rumelhart and M.A. Lehr. Neural networks: Applications in industry, business and science. Communications of the ACM, 37(3), March 1994.
[5] J.T. Connor and L.E. Atlas. Recurrent Neural Networks and Time Series Prediction. IEEE Transactions on Neural Networks, 1994.
[6] J.-P. Draye and G. Libert. Dynamic Recurrent Neural Networks: Theoretical aspects and optimization. Neural Network World, June 1993.
[7] J.F. Kolen. Exploring the Computational Capabilities of Recurrent Neural Networks. PhD thesis, Ohio State University, 1994.
[8] A. Medio. Chaotic Dynamics. Cambridge University Press, 1992.
[9] H. Mori and T. Ogasawara. A Recurrent Neural Network for Short-Term Load Forecasting. IEEE Transactions on Neural Networks, 1993.
[10] B.A. Pearlmutter. Dynamic Recurrent Neural Networks. PhD thesis, Carnegie Mellon University, 1990.
[11] Z. Tang. Feed-forward Neural Nets as Models for Time Series Forecasting. Technical Report TR91-008, University of Florida, 1991.
[12] U. Huebner, N.B. Abraham and C.O. Weiss. Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser. Physical Review A, 1989.
[13] E.A. Wan. Finite Impulse Response Neural Networks for Autoregressive Time Series Prediction. Proceedings of the NATO Advanced Workshop on Time Series Prediction and Analysis, May 1992.
[14] E.A. Wan. Time Series Prediction by Using a Connectionist Network with Internal Delays. In Time Series Prediction: Forecasting the Future and Understanding the Past, 1993.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
A New Neural Network Structure For Modeling Non-linear Dynamical Systems
Amir Hussain, John J. Soraghan*, Tariq S. Durrani* and Douglas R. Campbell
Department of Electronic Engineering and Physics, University of Paisley, High St., Paisley, U.K. PA1 2BE
*Signal Processing Division, University of Strathclyde, George St., Glasgow, U.K. G1 1XW
Abstract: In this paper a new two-layer linear-in-the-parameters feedforward network is presented, termed the Functionally Expanded Neural Network (FENN). Its output error surface is shown to be uni-modal, allowing high-speed single-run learning. It employs a least-squares-based learning algorithm for updating its output layer weights, thereby alleviating the non-linear learning difficulties associated with conventional multi-layered neural networks. New non-linear basis functions emulating other universal approximators, namely the sigmoidal, Gaussian and polynomial-subset basis functions, are proposed for inclusion in the network's single hidden layer. Both simulated chaotic (Mackey-Glass time series) and real-world noisy, highly non-stationary (sunspot) time series data are used to illustrate the superior modeling and prediction performance of the FENN compared with other recently reported, more complex feedforward and recurrent neural network based predictor models.
1. Introduction
Over the past decade there has been increasing interest in the use of Artificial Neural Networks (ANNs) for solving complex real-world problems [1-11]. This is mainly due to their ability to deal effectively with non-linearity, non-stationarity and non-Gaussianity [1]. The modeling and analysis of so-called chaotic processes has also recently attracted the attention of many researchers [5-12]. Deterministic chaos is characterized by an exponential divergence of nearby trajectories [8].
For the problem of time series prediction, which is synonymous with modeling the underlying physical mechanism responsible for its generation, there are two related consequences. Firstly, since the uncertainty of the prediction increases exponentially with time, chaos precludes any long-term predictability. Secondly, it allows for short-term predictability: a random-looking time series might have been produced by a deterministic system and actually be predictable in the short run. A prediction algorithm for chaotic systems thus has to capture the short-term structure of the time series [9]. The short-range structure of chaotic behaviour can be captured by expressing the present value of the chaotic time series sample as a function of the previous d values of the time series: y(k) = f(y(k-1), ..., y(k-d)), where the vector (y(k-1), ..., y(k-d)) lies in the d-dimensional state space [9]. An efficient method of fitting the non-linear function f(.) is to use a feedforward neural network predictor with a single output [6-9][11], the inputs to the network being the observation vector (y(k-1), ..., y(k-d)). In real-world chaotic time-series processes, intrinsic noise will be present, and the task of the neural network predictor will be to reconstruct f(.) without modeling the noise. Two well-known feedforward ANNs are the MLP and the RBF networks, both of which have been shown to be capable of forming an arbitrarily close approximation to any continuous non-linear mapping [3]. Consequently, both have to date been successfully employed for approximating f(.) above [6-10][12]. However, the MLP has a highly non-linear-in-the-parameters structure, and requires computationally expensive non-linear learning algorithms (such as back-propagation) which are very slow and can converge to local minimum solutions [3][5].
On the other hand, the RBF network has a linear-in-the-parameters structure, giving the relative advantages of ease of analysis and rapid learning. However, it suffers the drawback of requiring a prohibitively large number of basis functions to cover high-dimensional input spaces [4]. The topology of the RBF network can be considered very similar to that of a two-layered MLP. The primary difference between the two structures lies in the nature of their basis functions: the hidden layer nodes in the MLP employ sigmoidal basis functions (which are non-zero over an infinitely large region of the input space), whereas the basis functions in the RBF network cover only small localized regions. Hush [3] has recently shown that some problems, such as functional approximation, can be solved more efficiently with sigmoidal basis functions, while others, such as classification problems, are more amenable to localized (e.g. Gaussian) basis functions. This paper describes an interesting case which combines both these types of basis functions within a single neural network layer, so that the distinct universal approximating capabilities of both the MLP and the RBF networks can be employed. The idea is developed to yield a new linear-in-the-parameters feedforward neural network termed the Functionally Expanded Neural Network (FENN). Its output error surface is shown to be uni-modal, allowing high-speed single-run learning. A least-squares-based learning algorithm is employed for updating its output layer weights, and a general design strategy is also presented for specifying the type and number of basis functions within the network's single hidden layer, for an arbitrary number of network inputs. The new structure is shown to be highly efficient in the modeling of both simulated chaotic and real-world noisy time-series processes, and its performance is compared with other recently reported neural network models [10].
A simple pruning strategy, based on an iterative pruning-retraining scheme coupled with model validity tests, has been used to optimise the size of the new network, and is shown to result in parsimonious FENN predictor models that consistently outperform the other techniques, both in terms of non-linear prediction ability and relative computational requirements. Two simulation examples are presented, using the chaotic Mackey-Glass equation and real-world sunspot data.
2. The FENN Structure
The complete two-layer FENN is shown in Figure 1. It comprises an input functional expander within its single hidden layer, and an output layer. The FENN functional expander performs a non-linear transformation which maps the input
space onto a new non-linear hidden space of increased dimension. The choice of basis functions to be employed in the functional expander has been discussed in [14] and is outlined in the design strategy below. The output layer of the FENN comprises a set of linear combiners. It is interesting to note that the RBF network with fixed non-linear hidden layer basis functions or centres (and widths) can be regarded as a FENN. The linear-in-the-parameters Volterra Neural Network (VNN), which employs a purely polynomial expansion of its inputs [16], can also be considered a special case of the FENN, in which the number of polynomial expansion terms grows exponentially with increasing input dimension. Hussain [14] has shown, for a variety of non-linear dynamical system modeling applications, that the non-linear approximation ability of the FENN is significantly enhanced by employing a combination of non-linear basis functions emulating other universal approximators, such as the squashing-type sigmoidal, Gaussian and polynomial-subset activation functions.
2.1 Design Strategy
For any number n of FENN inputs (all normalized to within the range (+1, -1)), expand the input vector [x1(k) ... xn(k)] using the following expansion model F(k), the sum of the following (linear and non-linear) N components:
1. zero-order (dc) term (resulting in 1 term).
2. original input terms x1, ..., xn (resulting in n terms). These terms also enable the modeling of linear systems.
3. sine expansion of the n inputs, comprising sin(xi), sin(2xi) and sin(3xi) terms for i = 1, ..., n (resulting in a total of 3n terms). These terms emulate the squashing-type sigmoidal basis functions.
4. cosine expansion of the n inputs, comprising cos(xi), cos(2xi) and cos(3xi) terms for i = 1, ..., n (resulting in a total of 3n terms). These functions emulate Gaussian-like basis functions of various widths.
5. product of each input with the sine and cosine functions of the other inputs, comprising xi sin(xj) and xi cos(xj) terms (for i ≠ j; i, j = 1, ..., n), giving a total of 2n(n-1) terms. These cross terms emulate quadratic and sigmoidal-type basis functions respectively.
6. outer-product expansion of the n inputs, resulting in a total of (P_2^n + P_3^n + ... + P_{n-1}^n + 1) terms for n greater than two inputs, with P_m^n = n!/((n-m)! m!), where ! denotes factorial. (Note that for n = 2 inputs the outer-product expansion results in 1 term, and for n = 1 there are no outer-product terms.) These higher-order outer-product terms can be considered a polynomial expansion of the inputs without the q-th powers of the individual inputs [16].
Hence, in general, for n inputs the FENN functional expansion model F(k) will comprise a total of N = 1 + 2n² + 4n + Σ_{i=1}^{n} P_i^n terms; that is, for n=1, N=8; n=2, N=20; n=3, N=38; n=4, N=64, and so on. Note that the design procedure presented above provides a useful starting point.
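The design strategy above can be checked mechanically. The sketch below builds the expansion vector F(k) for an input vector of length n and verifies the term counts N = 8, 20, 38 and 64 for n = 1, ..., 4; it is an illustrative reconstruction of the listed components, not a reference implementation.

```python
import math
import numpy as np
from itertools import combinations

def fenn_expand(x):
    """Build the expansion vector F(k) for inputs x (values in (-1, +1))."""
    n = len(x)
    terms = [1.0]                                   # 1. zero-order (dc) term
    terms += list(x)                                # 2. original input terms
    for m in (1, 2, 3):                             # 3. sine expansion
        terms += [math.sin(m * xi) for xi in x]
    for m in (1, 2, 3):                             # 4. cosine expansion
        terms += [math.cos(m * xi) for xi in x]
    for i in range(n):                              # 5. cross terms, i != j
        for j in range(n):
            if i != j:
                terms += [x[i] * math.sin(x[j]), x[i] * math.cos(x[j])]
    # 6. outer products of 2..n distinct inputs (no powers of a single input)
    for r in range(2, n + 1):
        for idx in combinations(range(n), r):
            terms.append(math.prod(x[i] for i in idx))
    return np.array(terms)

counts = [fenn_expand([0.1] * n).size for n in (1, 2, 3, 4)]
```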
Nevertheless, the input expansion model of the FENN is extremely flexible in that virtually any function of the input, such as tanh(.), exp(.), etc., could also be employed. In practice, some physical knowledge of the non-linear system to be identified can also be incorporated into the input expansion model. If no a priori system knowledge is available and a more enhanced FENN approximation to the underlying system is required (than that provided by the above expansion models), then additional higher-order polynomial terms from the Volterra series expansion of the inputs can be included in the FENN input expansion model. Thus, the overall FENN structure can perform non-linear approximation by virtue of the input non-linear functional expander, and yet learning of its output layer weights is a linear problem. It is this latter characteristic of the FENN that provides the real motivation for exploiting its use in complex real-world non-linear dynamical system modeling applications.
2.2 Learning Algorithm
(1) Compute the i = 1, ..., m FENN outputs at time k as y_i(k) = F^T(k) W_i(k-1), where F(k) is the [N×1] hidden layer vector comprising the non-linear functional terms, and W_i(k-1) is the [N×1] weight vector of the i-th output node.
(2) The output prediction error for each FENN output is e_i(k) = d_i(k) - y_i(k), where d_i(k) is the i-th desired output. The Mean Squared Error (MSE) is therefore (where E(.) denotes the expectation operator and T denotes matrix transpose):
E(e_i(k)²) = E(d_i(k)²) - 2 W_i(k-1)^T E(d_i(k)F(k)) + W_i(k-1)^T E(F(k)F(k)^T) W_i(k-1)
The corresponding minimum MSE (MMSE) for the FENN can thus be readily written as [15] (with superscript -1 denoting matrix inverse):
MMSE = E(d_i(k)²) - E(d_i(k)F(k))^T E(F(k)F(k)^T)^{-1} E(d_i(k)F(k))
which includes the best linear (Wiener) MMSE for the case F(k) = input vector [x1(k) ... xn(k)], that is, without a non-linear functional expansion of the inputs. The advantage of this particular FENN structure is that linear adaptive filter theory can be readily applied for on-line adaptation. The quadratic form of the above MSE expression shows that there will be no local minima, and so fast and certain convergence may be obtained in practice by use of the following recursive weight updates. Update the FENN weights for each of the m outputs using the exponentially weighted Recursive Least Squares (RLS) algorithm as follows:
(3) Update the inverse of the correlation matrix of the input expansion vector:
P(k) = (1/λ) [ P(k-1) - P(k-1)F(k)F(k)^T P(k-1) / { λ + F(k)^T P(k-1)F(k) } ]
where λ is the forgetting factor (< 1), which introduces exponential weighting of past data.
(4) Update the output layer weights for each output as: W_i(k) = W_i(k-1) + P(k) F(k) e_i(k)
Numerically robust versions of the RLS can be used instead of the above [13][15].
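Steps (1)-(4) can be sketched for a single output as follows. This is a generic exponentially weighted RLS illustration with an arbitrary random feature vector standing in for F(k); the sizes, forgetting factor and initialisation are assumptions.

```python
import numpy as np

def rls_step(P, W, F, d, lam=0.99):
    """One exponentially weighted RLS update of an output weight vector."""
    e = d - F @ W                    # steps (1)-(2): output and its error
    PF = P @ F
    k = PF / (lam + F @ PF)          # gain vector
    P = (P - np.outer(k, PF)) / lam  # step (3): inverse correlation update
    W = W + k * e                    # step (4): weight update
    return P, W, e

rng = np.random.default_rng(1)
N = 5
W_true = rng.normal(size=N)          # weights the combiner should recover
P = 1e3 * np.eye(N)                  # large initial P (weak prior)
W = np.zeros(N)
for _ in range(200):
    F = rng.normal(size=N)           # stand-in for the expansion vector F(k)
    P, W, _ = rls_step(P, W, F, F @ W_true)
```

On noiseless linearly generated data like this, the weights converge to the generating values, reflecting the unimodal (quadratic) error surface discussed above.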
The simpler Least Mean Squares (LMS) algorithm, which is a stochastic gradient algorithm, can also be used for updating the output layer weights as follows: W_i(k) = W_i(k-1) + μ e_i(k) F(k), where μ controls the convergence rate. However, the rate of convergence of
the LMS algorithm is dependent on the spread of the eigenvalues of the input expansion model's autocorrelation matrix E(F(k)F(k)^T), with a large eigenvalue spread dictating a significantly slower convergence rate [13]. On the other hand, the least-squares-criterion-based RLS algorithm will converge more rapidly than the LMS, but at the expense of an increased computational complexity, O(N²) compared to O(N). Various Fast RLS (FRLS) algorithms have also been proposed recently [15] to reduce the complexity of the RLS from O(N²) to O(N), and can also be readily applied to train the above FENN. Thus, once the full expansion model at the input layer of the FENN has been specified, the exponentially weighted RLS algorithm can be used to provide an efficient means of real-time adaptation of the network weights. This gives the FENN a significant advantage over multi-layered neural network structures such as the MLP in recursive identification applications.
3. Simulation Results
3.1 Modeling of Real Sunspots
Following Tong [11], Weigend [9], Svaver [12] and McDonnel [10], we first trained a fully expanded (2,20) FENN (two inputs expanded into twenty functional terms) on the sunspot series for the years 1700-1920, and then evaluated the one-step predictions of the evolved (pruned) and trained (2,14) FENN network model on the sunspot series for the test years 1921-1979. The FENN was pruned by employing an iterative pruning-retraining scheme to successively prune off the insignificant basis functions. Output error autocorrelation and chi-squared statistic based model validation tests were employed at each stage in order to validate the pruned FENN model. The average relative variance (arv) [10] achieved by the final non-linear FENN one-step predictor model on the test data is compared in Table 1 with other published results, where TAR denotes a Threshold Auto-Regressive model and SLP a Single-Layered Infinite Impulse Response (IIR) Perceptron.

Model                       No. of Parameters   arv (1921-79)
Our new FENN (pruned)       14                  0.288
Tong's [11] TAR model       16                  0.377
Weigend's [9] MLP model     43                  0.436
Svaver's [12] MLP model     16                  0.432
McDonnel's [10] SLP model   23                  0.467

Table 1: Test performance comparison of various single-step predictor models on the sunspot test data (1921-1979)

The optimally pruned and trained (2,14) FENN (14-term) one-step predictor model of the sunspot series (which satisfied all the correlation and chi-squared model validity tests) is illustrated below:
ŷ(k) = 1.01y(k-1) + 1.155y(k-2) + 1.231sin(y(k-2)) - 2.06sin(2y(k-1)) + 1.984sin(2y(k-2)) - 1.139cos(y(k-2)) - 1.49cos(2y(k-1)) + 1.539sin(3y(k-1)) - 1.096sin(3y(k-2)) - 0.662y(k-1)sin(y(k-2)) - 0.111y(k-1)cos(y(k-2)) - 2.778y(k-2)sin(y(k-1)) - 3.809y(k-2)cos(y(k-2)) + 2.693
where ŷ(k) is the one-step FENN prediction of the current sunspot sample y(k), based on just the previous two sunspot time series samples [y(k-1) y(k-2)].
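For reference, the arv figure quoted in Table 1 is assumed here to be the mean squared prediction error normalised by the variance of the target series, as in Weigend's work; a minimal sketch on synthetic data:

```python
import numpy as np

def arv(predicted, target):
    """Mean squared error normalised by the variance of the target series."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((predicted - target) ** 2) / np.var(target)

y = np.sin(0.3 * np.arange(100.0))            # synthetic target series
arv_perfect = arv(y, y)                       # a perfect predictor
arv_mean = arv(np.full_like(y, y.mean()), y)  # the trivial mean predictor
```

Under this definition a perfect predictor scores 0 and predicting the series mean scores 1, so the values in Table 1 (all below 1) indicate genuine predictive structure being captured.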
The FENN predictor model above also illustrates the relative contributions of the various proposed non-linear basis functions, which are primarily responsible for the superior FENN performance over all the other recently reported, more complex neural network models (all of which required information from at least the last 6 sunspot observations).
3.2 Modeling of the simulated chaotic Mackey-Glass equation [10]:

dy(k)/dk = 0.2 y(k-30) / (1 + y^10(k-30)) - 0.1 y(k)
Model                      New pruned FENN   McDonnel's [10] Recurrent IIR Perceptron
Total no. of parameters    4                 17
arv (1-step predictions)   0.0012            0.0025
arv (2-step predictions)   0.0016            0.0070

Table 2: arv performance comparison of one-step and two-step non-linear predictor models on a 500-sample test set (both models trained on a different 500-sample set).

The final evolved one-step (2,4) FENN predictor model of the Mackey-Glass time series is illustrated below:
ŷ(k) = 1.99y(k-1) - 0.88sin(y(k-2)) + 0.74y(k-1)sin(y(k-2)) + 0.05y(k-2)sin(y(k-1))
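The Mackey-Glass series itself can be generated by crude Euler integration of the delay equation above; the unit time step and the constant initial history used here are illustrative assumptions, not the test conditions of [10].

```python
import numpy as np

def mackey_glass(n_points, tau=30, dt=1.0, y_init=1.2):
    """Euler integration of dy/dt = 0.2 y(t-tau)/(1 + y(t-tau)^10) - 0.1 y(t)."""
    delay = int(round(tau / dt))             # delay expressed in time steps
    history = [y_init] * (delay + 1)         # assumed constant initial history
    for _ in range(n_points):
        y_tau = history[-(delay + 1)]        # delayed value y(t - tau)
        y = history[-1]
        history.append(y + dt * (0.2 * y_tau / (1.0 + y_tau ** 10) - 0.1 * y))
    return np.array(history[delay + 1:])

series = mackey_glass(500)
```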
4. Conclusions
In this paper a new feedforward two-layer neural network, termed the FENN, was presented. It can be considered a hybrid neural network incorporating, to a variable extent, the combined modeling capabilities of the conventional MLP, RBF and VNN structures. A general design strategy was also presented. The linear-in-the-parameters structure of the FENN enables the use of fast least-squares-based learning in the output layer. The use of an iterative pruning-retraining strategy coupled with model validation tests resulted in parsimonious FENN predictor models (comprising the most significant of the proposed non-linear basis functions). The final evolved FENN models outperformed other recently reported, more complex neural network models in the modeling of simulated chaotic and real-world noisy, non-stationary time series data, both in terms of non-linear prediction ability and relative computational requirements.
The use of the design strategy presented above for the FENN structure, together with the least-squares-based learning algorithm and pruning strategy, has also consistently resulted [14] in highly efficient FENN predictor models for a variety of other complex, simulated and real-world non-linear time series processes, including the chaotic logistic map, the Henon map, Non-linear Auto-Regressive (NAR) time series, Single-Input Single-Output (SISO) and Multi-Input Multi-Output (MIMO) NAR with eXogenous inputs (NARX) processes, and real-world stock market data, real laser time series and actual speech signals. An added benefit of the new FENN is that the structures of the corresponding FENN predictor models may also provide useful insights into the physics of the underlying unknown non-linear system dynamics.
5. References
[1] S. Haykin, Neural Networks expand Signal Processing's Horizons, IEEE Signal Processing Magazine, pp.25-49, Mar. 1996.
[2] A. Hussain, J.J. Soraghan, T.S. Durrani, Artificial Neural Networks for Array Processing, Proc. of IEE-IEEE Intern. Workshop on Natural Algorithms in Signal Processing, Vol.1, Chelmsford, Essex, Nov. 1993.
[3] D.R. Hush, B.G. Horne, Progress in Supervised Neural Networks: What's new since Lippmann, IEEE Signal Processing Magazine, pp.9-39, Jan. 1993.
[4] D.A. White and D.A. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, New York: Van Nostrand Reinhold, 1992.
[5] O.E.B. Nielsen, J.L. Jensen, W.S. Kendall (Eds.), Networks and Chaos - Statistical and Probabilistic Aspects, Chapman and Hall, 1994.
[6] A. Lapedes and R. Farber, Non-linear signal processing using neural networks: prediction and system modeling, Technical Report LA-UR-87-2662, Los Alamos National Laboratory, 1987.
[7] M. Casdagli, Non-linear Prediction of chaotic time series, Physica D, Vol.35, pp.335-356, 1989.
[8] D. Lowe and A.R. Webb, Time series prediction by adaptive networks: A dynamical systems perspective, IEE Proc. F, Vol.138, pp.17-25, 1991.
[9] A.S. Weigend, D.E. Rumelhart and B.A. Huberman, Predicting the future: A Connectionist approach, Technical Report Stanford-PDP-90-01, 1990.
[10] J.R. McDonnell and D. Waagen, Evolving Recurrent Perceptrons for Time-Series Modeling, IEEE Transactions on Neural Networks, Vol.5, No.1, pp.24-38, 1994.
[11] M.B. Priestley, Non-linear and non-stationary time series analysis, Academic Press, 1988.
[12] C. Svaver, L.K. Hansen, J. Larsen, On design and development of tapped delay neural network architectures, IEEE Int. Conf. on Neural Networks, San Francisco, 1992.
[13] B. Mulgrew, C.F.N. Cowan, Adaptive Filters and Equalizers, Kluwer Academic Pub., 1988.
[14] A. Hussain, New Artificial Neural Network Architectures and Algorithms for Non-linear Dynamical System Modeling and Digital Communications Applications, Research Report, Signal Processing Division, University of Strathclyde, Glasgow, 1996.
[15] H. Schutze, Z. Ren, Numerical characteristics of FRLS transversal adaptive algorithms - a comparative study, Signal Processing, pp.317-332, 1992.
[16] P.J.W. Rayner, M.R. Lynch, A new connectionist model based on a non-linear adaptive filter, Proc. ICASSP, Glasgow, 1989.
Figure 1: The Functionally Expanded Neural Network (FENN), showing the n inputs x1, ..., xn, the hidden-layer functional expansion model, the output-layer weights, and the m outputs y1, ..., ym.
A Neural Network for Moving Light Display Trajectory Prediction
H. M. Lakany and G. M. Hayes
Department of Artificial Intelligence, University of Edinburgh, 5 Forrest Hill, Edinburgh EH1 2QL, Scotland, U.K. {hebaal, gmh}@aifh.ed.ac.uk
Abstract
In this paper, a radial basis function (RBF) neural network is trained to estimate the positions of markers placed on the joints of a subject. The network is also used to predict the positions of markers that were self-occluded. Results show that using an RBF network in this application is successful.
1 Introduction
Moving Light Displays (MLDs) are image sequences containing only selected points of a 3-D object in each frame. This technique has been used for studying human motion by placing passive markers or light bulbs on the joints of a subject (see Fig. 1). MLDs are also used for studying human motion perception, subject recognition and human gait analysis; see for example [4], [6] and [1], respectively.
Figure 1: A single frame of a walker having MLDs on joints - front view
A computer system has been built for recognising walking persons based on MLD sequences [5]. As shown in Fig. 2, in some frames one or more markers may be hidden behind another part
of the body. Self-occlusion makes it difficult to correctly plot the trajectories of the motion of a joint and hence correctly analyse gait. In gait analysis labs, where one cannot afford to miss the location of a marker at any instant, the problem of self-occlusion is overcome by placing a number of cameras (four or five) in such a way that at each instant each marker is seen by at least two cameras, which is evidently an expensive way to solve the problem. In [7], Taylor et al. present an extrapolation technique based on a 2-D linear least-squares approximation to predict the position of each joint/marker in the frame, and use a nearest-neighbour criterion to choose among different possibilities if more than one marker is in the neighbourhood of the predicted position. In this paper, we propose a neural network trained to predict the position of the joint markers at different instants.
Figure 2: A sequence of frames of a walker having MLDs on joints - side view (sagittal plane) showing self-occlusions: (a) arm occludes the right hip marker, (b) hip and knee markers are visible, (c) left knee marker is occluded
2 Algorithm
In the algorithm presented in this paper, a radial basis function neural network [2], [3] is trained to interpolate between successive frames, and the trained network is used to predict the positions of self-occluded markers. A pair of networks is trained for each joint - one for the x- and one for the y-coordinate motion. The inputs to the network are the time and the corresponding relative coordinates of the joint in a sequence of frames (coordinates are measured relative to a fixed marker on the body of the subject, e.g. the shoulder marker). Several walks are recorded for the subject while walking at his/her normal speed. The training data of the network consists of a set of coordinates of the marker of a particular joint
and the corresponding time over several gait cycles from different walks. A gait cycle is defined as the time interval between two successive occurrences of one of the repetitive events of walking, e.g. between two right-heel strikes [8]. The testing data is a new set of coordinates from other gait cycles with missing joint coordinates in one or more of the frames; it is the network's job to predict the position of the joint under consideration.
3 Results
A radial basis function network was trained to find the best fit of the trajectory for each marker. The trained network was used to predict the coordinates of the joint marker. Some points were intentionally occluded to test the robustness of the algorithm. The mean squared error (MSE) between the actual locations and the predicted positions was calculated. Figure 3 shows the results of a typical experiment predicting the location of the hip marker, which was occasionally occluded due to the arm movement in the sagittal (side view) plane. The positions of the markers at ~70% and ~80% of the gait cycle were intentionally occluded, yet as shown in the figure the network managed to predict the x- and y-coordinate positions with only one pixel difference (average relative error for the x-coordinate = 7.56 × 10⁻³ and for the y-coordinate = 2.37 × 10⁻³). The network was trained on 15 cycles and tested on 10 cycles. Other networks for predicting the locations of the rest of the markers, for both side and frontal views, were also trained and tested and showed similar results.
4 Conclusion
In this paper, we have shown that the use of a radial basis function neural network is a promising method for predicting the positions of the joints of a walking subject, and hence it can be used to estimate the positions of joints that are self-occluded during motion.
Acknowledgement Thanks to Mark Orr at the Centre of Cognitive Science - University of Edinburgh for his useful comments.
References
[1] Wilhelm Braune and Otto Fischer. Der Gang des Menschen/The Human Gait. Springer Verlag, 1895-1904. Translated edition by P. Maquet and R. Furlong, 1987.
[2] Tomaso Poggio and Federico Girosi. A theory of networks for approximation and learning. A.I. Memo 1140, Massachusetts Institute of Technology - AI Laboratory, July 1989.
[3] Don R. Hush and Bill G. Horne. Progress in supervised neural networks. IEEE Signal Processing Magazine, pages 8-39, January 1993.
[4] Gunnar Johansson. Visual motion perception. Scientific American, pages 76-88, November 1976.
[Figure 3 (plots): Right hip x-motion and y-motion trajectories against gait cycle (%). Legend: o = actual marker position, + = missing marker position (due to occlusion), x = predicted marker position.]
Figure 3: Actual and predicted positions of the right hip marker of a walking subject

[5] H. M. Lakany, A. Birbilis, and G. M. Hayes. Recognising walkers using moving light displays. RP-811, Department of Artificial Intelligence, Univ. of Edinburgh, June 1996.
[6] Richard F. Rashid. Towards a system for the interpretation of moving light displays. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(6):576-581, November 1980.
[7] K. D. Taylor, F. M. Mottier, D. W. Simmons, W. Cohen, R. Pavlak, D. P. Cornell, and G. B. Hankins. An automated motion measurement system for clinical gait analysis. Journal of Biomechanics, 15(7):505-516, 1982.
[8] Michael Whittle. Gait Analysis: An Introduction. Butterworth-Heinemann Ltd, 1991.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Recognizing Flow Pattern of Gas/Liquid Two-component Flow Using Fuzzy Logical Neural Network

Peng Lihui, Zhang Baofen, Yao Danya, Xiong Zhijie
Department of Automation, Tsinghua University, Beijing, China, 100084

Abstract. This paper describes a new method based on a fuzzy neural network which is used to recognize two-component flow patterns. The paper discusses the structure of the fuzzy neural network, including the selection of the fuzzy logical rule and the training sets. An accelerated learning algorithm (adaptive backward propagation) is used to train the neural network to shorten its learning time. Computer simulations show that this new method can recognize the four typical flow patterns existing in gas/liquid two-component flow: stratified flow, annular flow, slug flow and bubble flow. Finally, some results useful for future work are also presented.
Keywords: two-component flow, process tomography, flow pattern recognition, fuzzy neural network

1. Introduction
Two-component flow is very common in many industrial areas such as power plants, steel factories and chemical manufacturing, and the measurement of its parameters is very important to the related research work. Because of its complex flow states and properties, it is very difficult to measure two-component flow using traditional detection methods, and the measurement accuracy is usually much lower. This is unfavorable to industrial practice. Since the 1980s, a new detection technique has been developed: process tomography, a technique for extracting spatial information about process parameters by using multiple sensors fixed around the process of interest. Process tomography can provide cross-section images which are valuable for the assessment of equipment designs and for the on-line monitoring of industrial processes, so it is very suitable for two-component flow measurement [1][2][3].
In the past few years, process tomography techniques have developed very rapidly. Many process tomography systems using different sensing techniques, such as capacitance sensors, γ-rays, X-rays and acoustic methods, have been built and tested for various applications. Among the many process tomography systems, electrical capacitance tomography (ECT) was the first to be developed because it is simple, non-intrusive, robust and cheap. Many process tomography research groups have now developed their own ECT systems. The ECT system built at UMIST has been used successfully for monitoring oil pipelines and pneumatic conveying pipelines [4]. Although much progress has been made in research on ECT systems, there are still many shortcomings, one of which is image distortion caused by the "soft-field" effect of the capacitance transducer. When applying an ECT system to two-component flow measurement, it is found that the flow patterns inside the pipelines affect image reconstruction, so knowing the flow patterns is very useful for improving image quality [5]. How to recognize the flow patterns of two-component flow has always been difficult. Many experts have done much work using methods from statistics and fuzzy mathematics, but each method has its limitations when applied to process tomography. A fuzzy logical neural network was chosen for its simplicity and high speed in our research work.

2. Principle
The neural network is a technique which is mainly used to solve nonlinear problems. Since their appearance, neural network techniques have developed significantly and have been applied in many areas such as signal processing, pattern recognition, control theory and so on. Recently, some experts have also tried to use neural network techniques in the research area of process tomography [6][7]. In our research project, an 8-electrode ECT system is used to monitor an oil/gas pipeline, and a fuzzy logical neural network is chosen to recognize the flow pattern inside the pipeline.
There are four typical flow patterns existing in oil/gas two-component flow: stratified flow, annular flow, slug flow and bubble flow. Figure 1 illustrates the cross-section images of these four flow patterns.
[Figure 1 (diagrams): cross sections (a)-(d); shaded regions denote the continuous phase, unshaded regions the discrete phase.]
Figure 1. The four typical flow patterns existing in an oil/gas pipeline: (a) stratified flow (b) annular flow (c) slug flow (d) bubble flow

Among the many neural network types, the feedforward type is chosen for its simple structure in our research work. Figure 2 shows the structure of our fuzzy logical neural network.
Figure 2. The structure of the fuzzy logical neural network

In Figure 2, D1 is the input data obtained from the transducer; for an 8-electrode capacitance transducer there are 28 (N(N-1)/2, where N is the electrode number) independent measured capacitance values. The function of the module P1 is to fuzzify the input data. The vector C = (c1, c2, ..., c28) is constructed from the 28 measured capacitance values, and the vector X = (x1, x2, ..., x28) stands for the fuzzy value of C; the fuzzy logical rule is as follows:
129
$$c_{ni} = \frac{c_i - c_{ie}}{c_{if} - c_{ie}}, \qquad
f(x) = \begin{cases}
0.1 & x < 0.1 \\
0.25 & 0.1 \le x < 0.4 \\
0.5 & 0.4 \le x < 0.6 \\
0.75 & 0.6 \le x < 0.9 \\
0.9 & x \ge 0.9
\end{cases} \qquad (1)$$
where $c_{ni}$ is the normalized capacitance, and $c_{if}$ and $c_{ie}$ denote the capacitance values of the full pipe (the pipe is full of oil) and the empty pipe (the pipe is full of gas), respectively, for the i-th capacitance measurement; f(x) is the fuzzy logical function. The purpose of fuzzifying the input data is to strengthen the pattern features. In general, the fuzzy logical function divides the input data into "very small", "small", "middle", "big" and "very big". The fuzzy data is input into a feedforward neural network. The node number of the input layer is equal to the number of independent measured data, and the node number of the output layer is the same as the number of flow patterns. If needed, a hidden layer can be added between the input layer and the output layer, and its node number can be chosen through experiment; in our network this number is equal to the electrode number of the transducer. The network output is sent to the procedure P2; according to the maximum likelihood criterion, the flow pattern whose corresponding node output is the largest is the recognition result D2. The mapping relationship between the input nodes and the output nodes of the neural network is shown in equation (2):
$$\mathrm{net}_j = \sum_{i=1}^{m} w_{ij}\, x_i, \qquad y_j = S(\mathrm{net}_j), \qquad S(\mathrm{net}_j) = \frac{1}{1 + e^{-\mathrm{net}_j}} \qquad (2)$$
where $w_{ij}$ is the linking weight between the i-th input node and the j-th output node. Before the neural network can work properly, it must be trained. For this purpose, a training set must be chosen. In general, the training set should include enough typical samples. If the number of samples is not sufficient, the recognition ability of the network after training is limited. On the other hand, if the number of samples is too large, the learning period will be very long. So, a suitable number of samples is necessary so that the network has good recognition ability and the learning procedure is not very long. Figure 3 shows the training set chosen in our research work. It includes 18 typical flow pattern samples.
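The fuzzification rule of equation (1) and the feedforward pass plus maximum-selection step P2 of equation (2) can be sketched together as follows. All numeric values here (capacitances, weights) are hypothetical placeholders; real weights would come from training.

```python
import numpy as np

def fuzzify(c, c_empty, c_full):
    """Normalise the 28 measured capacitances (eq. 1) and map them to the
    five fuzzy levels 0.1/0.25/0.5/0.75/0.9 ("very small" ... "very big")."""
    x = (c - c_empty) / (c_full - c_empty)
    # np.select tries the conditions in order, so each value gets the first
    # matching band; anything >= 0.9 falls through to the default level.
    return np.select([x < 0.1, x < 0.4, x < 0.6, x < 0.9],
                     [0.1, 0.25, 0.5, 0.75],
                     default=0.9)

def recognise(x, W):
    """Single-layer feedforward pass (eq. 2) followed by the P2 step:
    pick the flow pattern whose output node responds most strongly."""
    net = W @ x                        # net_j = sum_i w_ij * x_i
    y = 1.0 / (1.0 + np.exp(-net))     # logistic activation S(net_j)
    return int(np.argmax(y))           # index of the winning flow pattern

# Hypothetical example: 28 inputs, 4 outputs (stratified/annular/slug/bubble)
rng = np.random.default_rng(0)
c_empty, c_full = np.zeros(28), np.ones(28)
x = fuzzify(rng.uniform(0, 1, 28), c_empty, c_full)
W = rng.normal(size=(4, 28))           # stand-in for trained weights
pattern = recognise(x, W)
```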
Figure 3. The training set

Usually, the neural network learning procedure is very time-consuming, so a good learning algorithm, one which ensures that the network is convergent and at the same time has a high speed, must be chosen [8][9][10]. The BP (backward propagation) learning algorithm is frequently used for its simplicity. An accelerated algorithm, the adaptive backward propagation algorithm, is used to train the network in our research work in order to shorten the learning period [11]. Figure 4 indicates the learning procedure of our neural network.
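The paper does not reproduce the details of the adaptive algorithm of [11]. As an illustration of the general idea of adapting the learning rate during training, the sketch below uses a generic "bold driver" style rule on a single-layer sigmoid network: accepted steps grow the learning rate, rejected steps are undone and shrink it. The factors 1.05 and 0.5 and the toy task are assumptions, not taken from the paper.

```python
import numpy as np

def train_adaptive_bp(X, T, lr=0.5, up=1.05, down=0.5, epochs=300, seed=0):
    """Gradient descent with an adaptive step size: if a step lowers the
    error it is accepted and the rate is increased; otherwise the step is
    rejected and the rate is decreased."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(T.shape[1], X.shape[1]))

    def forward(W):
        Y = 1.0 / (1.0 + np.exp(-(X @ W.T)))       # sigmoid outputs
        return float(np.mean((Y - T) ** 2)), Y      # network output error

    err, Y = forward(W)
    history = [err]
    for _ in range(epochs):
        grad = ((Y - T) * Y * (1.0 - Y)).T @ X / len(X)  # delta-rule gradient
        W_new = W - lr * grad
        err_new, Y_new = forward(W_new)
        if err_new <= err:                 # accept the step and accelerate
            W, Y, err, lr = W_new, Y_new, err_new, lr * up
        else:                              # reject the step and back off
            lr *= down
        history.append(err)
    return W, history

# Toy task: map four one-hot input patterns to themselves
X = np.eye(4)
T = np.eye(4)
W, history = train_adaptive_bp(X, T)
```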
Figure 4. Learning procedure

In figure 4, the horizontal coordinate denotes the iteration count, with each epoch comprising 10 iterations. The vertical coordinate stands for the network output error.

3. Simulation results and discussion
After the network training is completed, several flow patterns are used to test the recognition ability of the fuzzy logical neural network. It is found that the network is capable of recognizing the four typical flow patterns existing in gas/liquid two-component flow: stratified flow, annular flow, slug flow and bubble flow. Tables 1-4 illustrate some results.
The fuzzy logical neural network succeeds in recognizing the flow patterns inside oil/gas two-component flow pipelines, and this is very helpful for parameter measurement of oil/gas two-component flow. If the recognition ability of the network is not satisfactory, more samples can be added to the training set.
References
1. M.S. Beck and R.A. Williams, "Process tomography: a European innovation and its applications", Meas. Sci. Technol., Vol. 7, No. 3, pp. 215-224, 1996.
2. T. Dyakowski, "Process tomography applied to multi-phase flow measurement", Meas. Sci. Technol., Vol. 7, No. 3, pp. 343-353, 1996.
3. R. Abdul Rahim, R.G. Green, et al., "Further development of a tomographic imaging system using optical fibers for pneumatic conveyors", Meas. Sci. Technol., Vol. 7, No. 3, pp. 419-422, 1996.
4. W.Q. Yang, A.L. Stott and M.S. Beck, "Development of capacitance tomographic imaging systems for oil pipeline measurements", Rev. Sci. Instrum., Vol. 66, No. 8, pp. 4326-4332, 1995.
5. Øyvind Isaksen, "A review of reconstruction techniques for capacitance tomography", Meas. Sci. Technol., Vol. 7, pp. 325-337, 1996.
6. A.Y. Nooralahiyan, B.S. Hoyle and N.J. Bailey, "Pattern Association and Feature Extraction in Electrical Capacitance Tomography", Proc. ECAPT, pp. 266-275, 1993.
7. D. Wetzlar, "Neural Network Solving the Inverse Problem of Electrical Impedance Tomography", Proc. ECAPT, pp. 275-284, 1993.
8. Y. Bengio, P. Simard and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, Vol. 5, No. 2, pp. 157-166, 1994.
9. O. Nerrand, P. Rouseel-Ragot, D. Rubani, L. Personnaz and G. Dreyfus, "Training recurrent neural networks: Why and how? An illustration in dynamical process modeling", IEEE Transactions on Neural Networks, Vol. 5, No. 2, pp. 178-184, 1994.
10. O. Olurotimi, "Recurrent neural network training with feedforward complexity", IEEE Transactions on Neural Networks, Vol. 5, No. 2, pp. 185-197, 1994.
11. A.G. Parlos, B. Fernandez and A.F. Atiya, "An accelerated learning algorithm for multilayer perceptron networks", IEEE Transactions on Neural Networks, Vol. 5, No. 3, pp. 493-497, 1994.
ADAPTIVE ALGORITHM TO SOLVE THE MIXTURE PROBLEM WITH A NEURAL NETWORKS METHODOLOGY
Pérez R.M., Martinez P., Moreno J., Silva A., Aguilar P.L.
Dpto. de Informática, Escuela Politécnica, Universidad de Extremadura, Avenida de la Universidad s/n, 10071 Cáceres, SPAIN
Tel: +34-27-257183 Fax: +34-27-257202 Email:
[email protected]

ABSTRACT. In this paper we present the development of a robust method for determining and quantifying the components in a composite spectrum, assuming the patterns of the individual spectra belonging to it are known in advance. The general solution method proposed in the present work is supported by a linear recurrent neural network based on the Hopfield model (HRNN). The neural model guarantees the convergence of this problem using the gradient method for minimizing errors. The HRNN has a reasonable computational cost, and one of its most remarkable properties is its robustness against increasing levels of noise in the composite spectrum. Another interesting property is the possibility of implementing VLSI structures of low complexity amenable to supporting the algorithm, since only multiplications and additions are required for programming an inversion process.
1. INTRODUCTION. We consider the following problem (the Mixture Problem): assuming we are given the spectra of a number (K) of elements (Basic References), we must determine the unknown composition of a cocktail of the mentioned elements using a radiation spectrum obtained from this mixture. The goal of this paper is to explore the possibility of using a neural networks methodology to give a reliable, robust and efficient solution to this problem, based on the inherent parallelism of neural networks. Adams et al. [1] studied this problem in the context of Surface Mineralogy Prospection, and Lawton [2] proposed conventional digital algorithms for its solution. These approaches are fairly slow because serial computation is implied. A method based on an Optical Neural Network was presented by Barnard and Cassasent [3]: to find the composition of a mixture knowing its spectrum and the spectra of the possible components, a quadratic cost function is minimized to find the optimal composition. The snag with this solution is that it includes the constraint that all fractions sum to unity. The possibility of using Multiple Regression Theory, granting an optimal solution in terms of uniqueness, has been developed by Diaz et al. [4]. This approach also explores the possibilities for improving the robustness of the proposed method, which consists of the use of the Pseudo-Inverse Matrix, supported by a Linear Associative Memory built using Pyle's algorithm. The general solution method proposed here is based on the Hopfield Recurrent Neural Network (HRNN). It is a flexible, efficient and robust approach to solving the problem. The Gradient Method for minimizing errors is used to assure the convergence of the algorithm. The use of this model is fully justified since spectrum formation in the Mixture Problem is essentially a linear process [5].
Other interesting properties, deduced from the algorithmic structure of this method, are related to the possibility of implementing VLSI structures of low complexity amenable to supporting the algorithm, since only multiplications and additions are required for programming an inversion process.
2. ADAPTIVE ALGORITHM. In order to describe the adaptive algorithm suggested in the present work, it must be taken into account that a Composite Spectrum may be seen as an N-dimensional vector, built by sorting the emission intensities associated with each energy channel against the channel number, where N is the total number of energy channels:
$$\mathbf{x} = \left[x_0, x_1, \ldots, x_{N-1}\right]^T, \qquad x_n \ge 0, \quad 0 \le n \le N-1 \qquad (1)$$
being $x_n$ the intensity, measured as the number of photons whose energy is comprised in the n-th energy channel's interval. In this way, a Reference Spectrum is a spectrum of the same nature, but produced by an Individual Source. We denote these spectra as $\mathbf{r}_k$, with $0 \le k \le K-1$.
In a general sense, the set of Composite Spectra is composed of all possible spectra that may be produced by a linear combination of all elements belonging to the Reference Set. When the Reference Set is composed of K linearly independent vectors, this results in a K-dimensional vector space, integrated by all the vectors y given by:

$$\mathbf{y} = R\mathbf{c} = \sum_{i=0}^{K-1} c_i \mathbf{r}_i \qquad (3)$$

where c is the Contribution Vector, defined as:

$$\mathbf{c} = \left[c_0, c_1, \ldots, c_{K-1}\right]^T, \qquad c_k \ge 0, \quad 0 \le k \le K-1 \qquad (4)$$

and where every contribution $c_i$ is a function of the relative intensities of the Composite and Reference Spectra. Our goal is to estimate c assuming that R and y are known. For the mixture described by an estimation of c, called c', the difference between the measured spectrum vector y and its re-constructed version y' is:

$$\boldsymbol{\delta} = \mathbf{y} - \mathbf{y}' = \mathbf{y} - \sum_{i=0}^{K-1} c'_i \mathbf{r}_i \qquad (5)$$
$\boldsymbol{\delta}$ is called the Estimation Error, which gives us a measure of how well the estimation of c has been accomplished. We use this error to optimize the estimation procedure by means of a Least Mean Square (LMS) minimization procedure. In relation to c, this procedure amounts to minimizing the Measure Function F(c), defined as:

$$F(\mathbf{c}) = \|\boldsymbol{\delta}\|^2 = \|\mathbf{y} - R\mathbf{c}'\|^2 \qquad (6)$$
To solve this problem we use an iterative process, supported by the Linear Hopfield Minimization Procedure, that basically is a progressive refinement of the Contribution Vector:

$$\mathbf{c}_{j+1} = \mathbf{c}_j - \frac{\lambda}{2}\nabla F(\mathbf{c}_j) \qquad (7)$$

where we have used an estimation of the gradient of F(c) with respect to c, given as:

$$\nabla F(\mathbf{c}) = -2R^T\left[\mathbf{y} - R\mathbf{c}\right] \qquad (8)$$

In compact notation, we may express (7) as $\mathbf{c}_{j+1} = \lambda\mathbf{q} + P\mathbf{c}_j$, where:

$$P = I - \lambda R^T R, \qquad \mathbf{q} = R^T\mathbf{y}, \qquad
p_{ij} = -\lambda\sum_{p=0}^{N-1} r_{pi}\, r_{pj} \;\;(i \ne j), \qquad
p_{ii} = 1 - \lambda\sum_{p=0}^{N-1} r_{pi}^2, \qquad
q_i = \sum_{p=0}^{N-1} r_{pi}\, y_p \qquad (9)$$
being $\lambda$ a parameter dependent on the trace of $R^TR$, which controls the speed of convergence [6], and $p_{ij}$ the weight from the i-th node to the j-th one. This method requires only multiplication and addition operations to solve the Mixture Problem. The adaptive algorithm proposed may be described as follows:

1. Initialization step: read R; evaluate $R^TR$; adjust $\lambda$; evaluate $P = I - \lambda R^TR$.
2. New spectrum to be decomposed: read y; evaluate $\mathbf{q} = R^T\mathbf{y}$.
3. Repeat until $|\mathbf{c}_j - \mathbf{c}_{j+1}| < 10^{-5}$: evaluate $\mathbf{c}_{j+1} = P\mathbf{c}_j + \lambda\mathbf{q}$.
3. EXPERIMENTAL RESULTS. The algorithm stated above has been extensively simulated using a set of 1024-dimensional composite spectra obtained from 10 individual sources generated by Gaussian compositions. Simulation results show that the HRNN algorithm is able to resolve multicomponent mixtures at a reasonable computational cost [7,8]:

$$\text{Computational Cost} = (2K^2 + 2K)(N+1) + K^2 + N \qquad (10)$$
To evaluate the algorithm performance we use the Quadratic Relative Error (QRE), defined as:

$$\mathrm{QRE} = \frac{\sqrt{\sum_{i=0}^{K-1}\left(c_i - c'_i\right)^2}}{\min_{c_i \ne 0} c_i} \qquad (11)$$
where $c_i$ and $c'_i$ are the known and the calculated contributions of the i-th component, respectively. Three sets of experiments were performed to measure the influence of different parameters:
- Level of noise in the mixture spectra.
- Proportion of elements in the mixture.
- Correlation between components.
The first set of experiments concentrated on measuring the noise effects on the behaviour of the method. Each spectrum was contaminated with noise by adding a uniformly distributed random number to each energy channel. This produced a random variation in each energy bin value of at most n% of its original value (the Noise Level ratio). Values used for n were comprised within the 2.5%-35% interval. Fig. 1 shows that, in general, the QRE increases with increasing NL ratios, as could be expected. The iteration number used in each experiment does not show a clear dependency on the NL ratio. The results indicate a good performance in the presence of additive noise (when 35% noise was added to the spectra, no noticeable degradation in the precision occurred).
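The contamination procedure just described can be sketched as below. Whether the perturbation is signed (up to ±n%) or positive only is not stated in the text, so the symmetric version here is an assumption.

```python
import numpy as np

def add_channel_noise(spectrum, n_percent, rng):
    """Perturb each energy channel by a uniformly distributed random amount
    of at most n% of its original value (the Noise Level ratio)."""
    frac = n_percent / 100.0
    noise = rng.uniform(-frac, frac, size=spectrum.shape)
    return spectrum * (1.0 + noise)

rng = np.random.default_rng(1)
y = np.exp(-(np.arange(64) - 32) ** 2 / 40.0)   # toy composite spectrum
y_noisy = add_channel_noise(y, 35, rng)          # heaviest level tested (35%)
```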
[Fig. 1: Influence of the noise level (%) on the QRE and on the iteration number.]
The second set of experiments was intended to measure the behaviour of the algorithm in recognizing spectra of mixtures of two individual sources in different proportions. It may be seen that the procedure works reasonably well with ratios of 1:1000, completely irresolvable to the naked eye. Fig. 2 shows the degradation of the precision as a function of the relative proportions between the components of mixtures of two components. In the worst cases a QRE of 1.2% was detected at a ratio of 1:1000.
[Fig. 2: Influence of the proportion between components on the QRE.]
[Fig. 3: Influence of the correlation between components on the QRE.]
The third set of experiments studied the ability of the method to distinguish between two different spectra as a function of their relative correlation coefficient. The corresponding results may be seen in Fig. 3. It seems that the QRE tends to increase slightly with higher correlation coefficients, although this tendency is not completely uniform.
4. SUMMARY AND CONCLUSIONS. In this work, a recursive neural network based on the Hopfield Model has been introduced to solve the Mixture Problem. This model finds the composition of the mixture given the spectra of the Reference Set. The method seems to be more reliable and robust than other traditional methods based on peak analysis, which fail dramatically in these cases [1,2]. Simulation results show that the HRNN has a reasonable (cubic) computational cost. One of the most remarkable properties of the algorithm is its robustness against increasing levels of noise in the composite spectrum. This property is related to the iterative nature of the algorithm, which acts by accumulating average values for the magnitudes of interest, especially the decomposition error, thus filtering noisy components uncorrelated with the reference spectra. Another interesting property is its ability to resolve quite uneven mixtures, with disproportions as high as 1:1000, where the naked eye cannot detect any evidence of the spectrum present in a low proportion. A further interesting property, deduced from the algorithmic structure of the method, is the possibility of implementing VLSI structures of low complexity amenable to supporting the algorithm, since only multiplications and additions are required for programming an inversion process. The study leading to such an implementation is already under way. This method may be applied to many other problems in Spectroscopy, such as IR and Visible Spectrum Decomposition, with applications in Colorimetry, Remote Sensing of the Earth Surface, Environmental Control, and many others [9].
REFERENCES
[1] Adams J., Johnson P., Smith M. and George T. "A Semi-Empirical Method for Analysis of the Reflectance Spectra of Binary Mineral Mixtures". Journal of Geophysical Research, vol. 88, 1983, pp. 3557-3561.
[2] Lawton W. and Martin M. "The Advanced Mixture Problem - Principles and Algorithms". Technical Report 10M384, Jet Propulsion Laboratory, 1985.
[3] Barnard E. and Cassasent P. "Optical Neural Net for Classifying Imaging Spectrometer". Applied Optics, vol. 18, no. 15, pp. 3129-3133, August 1989.
[4] Diaz J.C., Aguayo P., Gómez P., Rodellar V. and Olmos P. "An Associative Memory to Solve the Mixture Problem in Composite Spectra". Proc. of the 35th Midwest Symposium on Circuits and Systems, Washington DC, pp. 422-428, August 1992.
[5] Pérez R.M., Martinez P., Silva A. and Aguilar P.L. "Influence of the Fixed Point Format on the Accuracy of the Neural Network Solution to the Mixture Problem". Proc. of the 3rd Advanced Training Course: Mixed Design of Integrated Circuits and Systems, pp. 413-418, Lodz (Poland), May 1996.
[6] Rodellar V., Hermida M., Diaz A., Lorenzo A., Gómez P., Aguayo P., Diaz J.C. and Newcomb R.W. "A VLSI Arithmetic Unit for a Signal Processing Neural Network". Proc. of the 35th Midwest Symposium on Circuits and Systems, Washington DC, pp. 891-894, August 1992.
[7] Pérez R.M., Martinez P., Martinez L., Diaz J.C., Rodellar V. and Gómez P. "A Hopfield Neural Network to Solve the Mixture Problem". Proc. of the VI Spanish Symposium on Pattern Recognition and Image Analysis, Córdoba, pp. 744-745, April 1995.
[8] Pérez R.M. and Martinez P. "Validación de una Metodología Neuronal para una Cuantificación de Firmas Espectrales" [Validation of a Neural Methodology for Quantification of Spectral Signatures]. Proc. of the V Reunión Científica de la Asociación Española de Teledetección, Valladolid, pp. 146-147, September 1995.
[9] Pérez R.M., Martinez P., Silva A. and Aguilar P.L. "An Adaptive Solution to the Mixture Problem with Drift Spectra". Proc. of the Third International Conference on Signal Processing, Beijing, China, October 1996.
PROCESS TREND ANALYSIS AND FUZZY REASONING IN FERMENTATION CONTROL
S. Kivikunnas, K. Ibatici and E. Juuso
Department of Process Engineering, University of Oulu, PO Box 444, FIN-90571 Oulu, Finland
Physics Department, University of Bologna, Via Irnerio 46, I-40126 Bologna, Italy
ABSTRACT Manual supervision of fermentation processes relies heavily on visual detection of characteristic patterns of temporal changes in process variables. Automating this experience-based reasoning conducted by operators would allow the redirection of their contribution to more profitable tasks, e.g. planning and scheduling of plant operations. In this paper, two different prototype applications for analysing temporal patterns in the trends of continuous fermentation process variables are described. The first is designed to operate with tuneable time windows, utilising fuzzy reasoning to produce an early indication of changes in a process variable. The second operates by comparing the first and second derivatives of the filtered process variable to patterns saved in an extendible shape library describing the trends at a symbolic level. Tests suggested that both methods could be applicable to bioprocess control. However, the capabilities of present reasoning tools were found to be inadequate for combining symbolic and numeric information.
INTRODUCTION The lack of appropriate on-line measurements for some important variables is a serious practical problem in controlling fermentation processes. The normal solution to this class of problems is to estimate the interesting variables from the on-line measurements and analysis data available. Point estimates in a fermentation environment, produced by any feasible technique, are normally noisy and very uncertain in nature. For slow processes like fermentation, temporal reasoning could become a very valuable tool to diagnose and control the process. Manual supervision relies heavily on visual monitoring of characteristic shapes of changes in process variables, especially their trends. Systems that are able to detect and analyse temporal shapes of trends from measured or estimated data could boost the performance of fermentation control systems remarkably. The pattern recognition approach to biotechnological processes has been seen as an emulation of the human view of processes [3] and is thereby suitable to serve in expert systems applied to process control. Of special interest are methods that are capable of reasoning about the recent process history [2]. Although computational efficiency and practical implementation aspects are considered in papers dealing with trend analysis, we want to stress that methodological research and industrial case studies are still needed to bring pattern recognition-based methods into industrial practice. A good starting point for applications could be simpler approaches where linear regression and moving
averages [4], or a trend indicator based on the difference of two moving averages of a process variable [5], are used for reasoning. In this paper, two applications for analysing temporal shapes of fermentation process variables are described. The first is designed to operate with tuneable time windows producing two index values as inputs to a fuzzy logic system, which gives the type of change in the process variable as output, e.g. "started to wind up" or "is constantly increasing". The second operates with the first and second derivatives of the FIR-Median Hybrid filtered [1] process variable and compares the pattern of the recent process history to given templates.
PROTOTYPE DESIGN AND IMPLEMENTATION
The first application was designed to serve as a trend operator in fuzzy control blocks. The method consists of calculating two index values from process data to be applied as inputs to a fuzzy logic block. The index values are calculated as the difference between short and long moving averages of recent process measurements and as the slope of the long time window. The principle of the time window division is shown in Fig. 1. Different lengths for the short and long time windows can be selected depending on the process characteristics and the usage of the system. The output of the rule-based system indicates whether the process variable is constant, changing linearly or exhibiting accelerating behaviour. Five rules and a conventional fuzzy logic construction with trapezoidal membership functions for input and output variables were implemented for testing.
Fig. 1. The division of the recent process measurement history into two time windows of different lengths.
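A minimal sketch of the two index values described above (our illustration, not the authors' code; the window lengths of 5 and 20 samples are arbitrary assumptions):

```python
import numpy as np

def trend_indices(x, short_len=5, long_len=20):
    """Two index values from the recent history of a process variable:
    the difference between the short and long moving averages, and the
    slope of a least-squares line fitted over the long time window."""
    x = np.asarray(x, dtype=float)
    long_win = x[-long_len:]
    index1 = x[-short_len:].mean() - long_win.mean()
    slope = np.polyfit(np.arange(long_len), long_win, 1)[0]
    return index1, slope
```

On a steadily increasing variable the slope index equals the rate of increase and the moving-average difference is positive; on a constant variable both indexes are near zero, matching the "constant / changing linearly / accelerating" categories of the rule base.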
The second, more complicated procedure is based on a priori knowledge about meaningful shapes in recent process measurement history. In experimenting with the method proposed by [2], we found the behaviour of its function approximation stage unsatisfactory with noisy or corrupted data. In the search for a more robust smoothing method we found the FIR-Median Hybrid filtering (FMH) technique introduced by [1] a good candidate for pretreating process measurements before the template matching
procedure. Median filtering has the benefits of preserving sharp changes in signals and being effective in removing impulsive noise. In FMH filtering, the sorting intrinsic to median search routines is replaced with linear averaging substructures, which gives a reduction in computational load. Both systems were implemented in the MATLAB environment and tested off-line with data obtained from continuous lactic acid fermentation experiments. The variable studied was the pH-control base consumption of the continuous fermentation process. This variable is very rich in information because it promptly indicates changes in the productivity of the process.
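A minimal sketch of the basic FMH idea (our illustration of the scheme in [1], not the authors' implementation; the sub-filter length k is an assumed parameter): each output sample is the median of a backward FIR average, the current sample and a forward FIR average.

```python
import numpy as np

def fmh_filter(x, k=3):
    """Basic FIR-Median Hybrid filter: the median of a backward average,
    the current sample and a forward average replaces a full median sort."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for n in range(k, len(x) - k):
        backward = x[n - k:n].mean()          # FIR average of the k past samples
        forward = x[n + 1:n + k + 1].mean()   # FIR average of the k future samples
        y[n] = np.median([backward, x[n], forward])
    return y
```

An isolated impulse is removed while an ideal step edge passes unchanged, which is exactly the edge-preserving behaviour exploited for the base-consumption trends.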
RESULTS AND DISCUSSION The performance of the rule-based trend operator was evaluated by running the system off-line against real fermentation data and letting the process expert judge the reasoning results. By tuning the lengths of the two time windows applied and by manipulating the fuzzy sets defined for the indexes, it was possible to find an appropriate detection sensitivity for the process in question. Because the system is not a stand-alone controller but an operator that could replace the derivative term of a PI-type fuzzy controller in certain cases, this simple test gave only insight into tuning possibilities and difficulties. For the final evaluation the operator will be implemented on-line, connected to a conversion control system of the process. In the shape-analysing procedure the FMH filter was used in a two-pass manner, and the filtered data was used directly for shape analysis. Over a time window of approximately sixteen hours, nine symbols proved to be a reasonable number of derivative signs to describe the real data without losing relevant shape information. However, other time windows can be used, too. An example of the trend analysis procedure is shown in Fig. 2. The resulting shape information with its degree of compatibility was utilised, together with the estimated conversion of the process, in a fuzzy rule-based diagnostic system. The use of a plain fuzzy logic system for diagnostic purposes, when both symbolic and numeric information served as antecedents, was quite cumbersome.
Fig. 2. Example of the prototype system in use. Original data ('o'), two-pass FIR-Median Hybrid filtered data ('x'), and the temporal pattern found with its degree of compatibility (dc) are shown. The variable considered here is the base consumption of a continuous fermentation process.
CONCLUSIONS In this paper we have presented two prototype applications for analysing temporal shapes of process variables. The methods were implemented with real-time control requirements in mind and tested off-line with data obtained from continuous fermentation experiments. Test runs suggested that both could be applied to slow processes where sophisticated trend information processing is needed. The first application is a simple rule-based trend-indicating procedure. This procedure is aimed to be used as a trend operator substructure in fuzzy logic controllers. Because additional inputs to fuzzy logic controllers make them more difficult to maintain, there has to be a specified benefit from utilising trend information. The shape-analysing procedure looks especially promising for diagnosing abnormal process situations for which a priori knowledge is available from process specialists. In future studies, the performance of both systems will be demonstrated on-line in conjunction with a PC-based control system of a fermentation pilot plant. Great emphasis will be put on developing intelligent control methods that can combine heterogeneous information in a feasible and maintainable way.
REFERENCES
[1] P. Heinonen and Y. Neuvo, "FIR-median hybrid filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, pp. 832-838, 1987. [2] K. B. Konstantinov and T. Yoshida, "Real-time qualitative analysis of the temporal shapes of (bio)process variables," AIChE Journal, vol. 38, pp. 1703-1715, 1992. [3] G. Locher, B. Sonnleitner, and A. Fiechter, "Pattern recognition: A useful tool in technological processes," Bioprocess Engineering, vol. 5, pp. 181-187, 1990. [4] P. J. Poirier and J. A. Meech, "Using fuzzy logic for on-line trend analysis," in Proc. 2nd IEEE Conf. on Control Applications, Vancouver, B.C., 1993, pp. 83-86. [5] F. Y. Thomasson, "Improved control of drum level for boilers with 'shrink' and 'swell' problems," Tappi Journal, vol. 71, pp. 65-71, 1988.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Higher Order Cumulant Maximisation using Non-linear Hebbian and Anti-Hebbian Learning for Adaptive Blind Separation of Source Signals
Mark Girolami and Colin Fyfe Department of Computing and Information Systems, University of Paisley, United Kingdom. email:
[email protected],
[email protected] Abstract
We propose a novel nonlinear self-organising network which employs only computationally simple hebbian and anti-hebbian learning in approximating a linear independent component analysis (ICA). The learning algorithms diagonalise the input data covariance matrix and approximate an orthogonal rotation which maximises the sum of fourth order cumulants, thus providing invariant separation of the input into its individual sub-components. We apply this network to linear mixtures of speech data, which is inherently non-stationary and positively kurtotic; there is no prior requirement for spatially whitened data. We show that the proposed network is capable of separating mixtures of speech, noise and signals with both platykurtic and leptokurtic distributions. Simulations are run on mixtures of three signals with differing higher order statistics, and on mixtures of five voice traces; complete source separation is seen in all cases. 1. Introduction
The problem of multi-channel blind separation of sources and blind deconvolution occurs in many application areas of signal processing such as speech, radar, medical instrumentation, mobile telephone communication, and hearing aid devices. The problem is defined as the recovery of the original source signals from a sensor output when the sensor receives an unknown mixture of the source signals. The 'Cocktail Party' problem is an example of blind separation, where a person can single out a specific speaker from a group speaking simultaneously. Another biologically inspired example is the ability of the olfactory bulb to discriminate a single scent from a received mixture. From a signal processing viewpoint, the reconstruction of the original signal when the received signal is the output of an unknown filter is an example of blind deconvolution. Blind separation of sources is an underdetermined problem, and as such traditional adaptive techniques are unsuitable because the source signal statistics, as well as the mixing and transfer channels, are unknown. Techniques have been developed based on information theoretic criteria and higher order statistics (HOS); if a signal has independent components then the product of the marginal probability densities is equal to the signal probability density. Using the Kullback-Leibler divergence as a measure of independence, Comon [1] develops a series of contrast functions based on an Edgeworth expansion of the marginal densities, and batch methods are used in their maximisation. Cardoso [2] utilises the invariant properties of cumulants under orthogonal transformations; he develops series updating algorithms based on the maximisation of the sum of the squares of fourth order cumulants. Jutten and Herault [3] were the first to develop a neural architecture and learning algorithm for blind separation; since then a number of variants on this architecture have appeared in the literature, Cichocki et al [4].
Bell and Sejnowski [5] developed a feedforward network and learning rule which minimises the mutual information at the output nodes; this yields excellent results for leptokurtic signals such as speech. However, the matrix inversion required is a computational bottleneck and unrealistic from a DSP implementation viewpoint. Recently, Cichocki et al [4] have used the natural gradient descent algorithm, which removes the matrix inversion requirement of Bell and Sejnowski's algorithm. The simple multiply and accumulate operations of hebbian and anti-hebbian learning are attractive for DSP hardware implementations; Karhunen and Joutsensalo [6] develop a number of nonlinear variants of principal component analysis (PCA) learning and show their utility in sinusoidal frequency estimation. All the networks and learning paradigms listed, with the exception of Bell and Sejnowski's network, require the input data to be spatially whitened, that is, normalisation of the second order statistics, which requires initial pre-processing [1], [2], [3], [4], [6]. Bell and Sejnowski [9] report improved speed of convergence of their algorithm when the incoming data has identity covariance. Suppression of lower order statistics is required for the algorithms to respond to HOS; data whitening is discussed in [7] within the context of exploratory projection pursuit (EPP).
2. Independent Component Analysis Let x be a variable in R^N with a probability density function (pdf) p_x(u). If the vector x has mutually independent components we can then write

p_x(u) = Π_{i=1}^{N} p_{x_i}(u_i)    (1)
The Kullback-Leibler divergence gives a measure of the mutual information between the components of x. Approximating the marginal densities using an Edgeworth expansion (up to cumulant order 4) yields a measure of the mutual information or contrast between the components [1].

I(p_z) ≈ J(p_y) - (1/48) Σ_{i=1}^{N} {4K_{i3}² + K_{i4}² + 7K_{i3}⁴ - 6K_{i3}²K_{i4}}    (2)
The term on the left-hand side is the mutual information of the components of the vector z, where y = Mz, M being an orthogonal rotation. The first term on the right-hand side is the negentropy of y, and is fully defined in [1]. The second term is the sum of squares of third and fourth order cumulants of y; it is clear that maximisation of this term will minimise the mutual information between the vector components and as such can be used as a contrast. Further simplifying assumptions [1], based on the pdf symmetry and the multilinearity of cumulants, reduce the contrast to the sum of squares of the fourth order marginal cumulants

Ψ(y) = Σ_{i=1}^{N} (K_{iiii})²    (3)

It is noted that the sum of squares of all fourth order cumulants is invariant under a linear orthogonal rotation, and so for a whitened two dimensional vector (N = 2) we can write

Σ_{ijkl} (K_{ijkl})² = Σ_{i=1}^{2} (K_{iiii})² + Σ_{ijkl≠iiii} (K_{ijkl})² = const.
By then maximising (3), under orthogonal constraints, we can see that this will minimise the cross cumulant terms and so yield an approximation to independent components.
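A small numerical illustration of this point (ours, not from the paper): for a whitened orthogonal mixture of two independent unit-variance sources, the sum of squared fourth order marginal cumulants, as in (3), peaks at the rotation that recovers the sources. The 30-degree mixing angle and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def kurt(v):
    # fourth order marginal cumulant of a zero-mean, unit-variance signal
    return np.mean(v**4) - 3.0

def contrast(y):
    # sum of squares of the fourth order marginal cumulants, as in (3)
    return sum(kurt(row)**2 for row in y)

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# two independent unit-variance sources: uniform (negative kurtosis)
# and Laplacian (positive kurtosis)
n = 200_000
s = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), n),
               rng.laplace(0.0, 1.0 / np.sqrt(2), n)])

x = rot(np.deg2rad(30)) @ s   # orthogonal mixing keeps the data whitened

# scan candidate un-mixing rotations; the contrast peaks near 30 degrees
angles = np.arange(0, 90)
vals = [contrast(rot(np.deg2rad(a)).T @ x) for a in angles]
best = angles[int(np.argmax(vals))]
```

Mixing pushes each marginal cumulant towards zero (the mixtures look more Gaussian), so the contrast at 0 degrees is well below its value at the separating rotation.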
3. Network Architecture and Learning Figure 1 shows the network under consideration; it has full lateral and feedforward/feedback connections. The input lateral connections are a variant of Foldiak's second linear model [10]; the output lateral connections are similar, however the neuron activation is nonlinear. The feedforward section is an exploratory projection pursuit network [7] and has been used with limited success for ICA in [8].
W
V
x~
Y~
X~
Y2
X3
i
-
Y3
Figure 1: Fully Connected Network
Consider zero mean source signals s mixed by an unknown linear matrix A; the received signals are then, in matrix form, x = As. The output of the first layer of neurons is given as z, and so with the linear lateral connections at the input

z = [I + U]x = U_I x    (4)
We wish the input z to have an identity covariance, which is in line with the derivation of ICA in [1] and also allows the feedforward section to respond to the higher order moments.
We derive the following learning rule for the input weights by minimising the distance of the data covariance matrix from identity:

ΔU ∝ I - zz^T    (5)

(I + U) = C_x^{-1/2}    (6)
It can be shown that (5) will cause the weight matrix U to converge to the expression given in (6), which requires a positive definite input data covariance matrix C_x. The linear summation of the feedforward weights is r = WU_I x and the final output of the network is given as

y = V_I f(r) = V_I W U_I x - V_I φ(W U_I x)    (7)

where f(r) = r - φ(r) and φ(r) = tanh(r). (7) can be approximated by y = V_I f(r) ≈ k V_I W U_I x, as the tanh nonlinearity will saturate to a value of 1 outwith the approximate linear region. The hebbian learning for the
feedforward weights has been detailed fully in [7] and can be considered as the approximate stochastic maximisation of an objective function under orthogonal constraints. The expression for the learning algorithm is given in (8).
ΔW = η_t ψ'(s){U_I x - WW^T U_I x} = η_t ψ'(s){z - WW^T z}    (8)
The term ψ'(s) is the derivative of the objective function to be maximised, which in this case will be the value of the fourth order marginal cumulant of the network output. Girolami and Fyfe [8] use an EPP network for ICA; however, the stochastic maximisation (8) does not generate sufficient HOS to ensure mutual information minimisation at the outputs. The addition of anti-hebbian lateral connections at the outputs of the network will yield the following, using the learning of (5):

ΔV = μ(I - yy^T) = μ(I - (V_I[C_rr + φ(r)φ(r)^T]V_I^T - V_I[φ(r)r^T + rφ(r)^T]V_I^T))

As ⟨ΔV⟩ → 0 and V_I ≈ I, then

⟨φ(r)r^T + rφ(r)^T - φ(r)φ(r)^T⟩ ≈ 0

as C_rr = I due to (5) and WW^T = I. Tanh is an odd function, and so taking a Taylor expansion with coefficients φ_{2k+1}

Σ_k Σ_m φ_{2k+1} φ_{2m+1} ⟨r_i^{2k+1} r_j^{2m+1}⟩ = 0    (9)

as C_rr = I ⇒ ⟨r_i r_j⟩ = 0 ∀ i ≠ j,
which is simply the minimisation of all cross cumulants of order four. The stochastic marginal cumulant maximisation of (8), under the cross cumulant minimisation constraint, yields a more powerful ICA than standard EPP learning. 4. Simulations
The first simulation considers an unknown linear mixture of five speakers; the signals are presented to a 5x5 network. We show the development of the contrast during learning, along with the original, mixed and recovered signals. The mixing causes the value of the normalised fourth order cumulant of each signal to be reduced.
Figure 2: Contrast Development During Learning and Signal Traces The input weight learning removes all second order correlations; this continues for ten passes through the data. Once the second order terms are removed, the feedforward weights start responding to the higher order statistics in the data, as can be seen from Figure 2. The feedforward weight changes are effectively rotating the input data in weight space to maximise the contrast defined in (3), with the maximal converged value being 98% of the original unmixed signal contrast. It is interesting to note that the output lateral weights change little during learning; however, the additional constraint of this higher order de-correlation at the output ensures a high degree of contrast maximisation and therefore independent separation.
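As a toy counterpart to the input-weight stage of this simulation, the following batch sketch (ours, not the authors' code; the 2x2 dimensions, learning rate and iteration count are illustrative) shows rule (5) driving the covariance of z = (I + U)x to identity:

```python
import numpy as np

rng = np.random.default_rng(1)

# a 2x2 toy version of the mixing stage: x = As with zero-mean sources
n = 50_000
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0.0, 1.0, n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

# batch form of the input lateral rule (5): dU proportional to I - <zz^T>
U = np.zeros((2, 2))
mu = 0.01
for _ in range(3000):
    z = (np.eye(2) + U) @ x
    C = z @ z.T / n
    U += mu * (np.eye(2) - C)

z = (np.eye(2) + U) @ x
C = z @ z.T / n   # should now be close to the identity matrix
```

In the full network the update is applied stochastically per sample; the batch average above is just the expected step, included to make the second-order decorrelation visible.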
The second simulation considers a mixture of signals which have fourth order cumulants of differing sign, that is, signals with uni-modal and bi-modal distributions. As the cumulants of these distributions will have positive and negative values, the network nonlinearity used will be r_i ± tanh(r_i), with the choice of sign depending on the HOS of the original signals. We use the a priori knowledge that one of the sources is naturally occurring speech, which will have positive kurtosis. The other signals are white noise and a fundamental tone, which both have negative kurtosis. This suggests that the network activation should be chosen to match the sign of the original signals' kurtosis. The results are similar, with the contrast of (3) being stochastically maximised. Figure 3 shows the original distributions of the signals and the distributions of the mixed signals. It is apparent that the mixing causes the signal distributions to become more normal and as such removes the structure in the data. Complete separation of the outputs is achieved.
Figure 3: Signal Distributions and Traces 5. Conclusions and Further Work We have introduced a neural implementation of Comon's ICA algorithm; the algorithm has been successfully applied to linear mixtures of speech as well as to mixtures of speech, noise and low kurtosis signals. We are currently working on the separation of convolved mixtures of signals and on developing a method of dispensing with the a priori knowledge of the signal statistics required for the choice of nonlinearity. References
[1] Comon, P. Independent component analysis, a new concept? Signal Processing, 36, 287-314, 1994. [2] Cardoso, J.-F. On the performance of orthogonal source separation algorithms. EUSIPCO-94, Edinburgh, 1994. [3] Jutten, C. and Herault, J. Blind separation of sources, Part 1: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24, 1991. [4] Cichocki, A., Amari, S. and Yang, H. Recurrent neural networks for blind separation of sources. International Symposium on Nonlinear Theory and Applications, Vol. 1, 37-42, 1995. [5] Bell, A. and Sejnowski, T. An information maximisation approach to blind separation and blind deconvolution. Neural Computation, 7, 1129-1159, 1995. [6] Karhunen, J., Wang, L. and Joutsensalo, J. Neural estimation of basis vectors in independent component analysis. International Conference on Artificial Neural Networks, Vol. 1, 317-322, 1995. [7] Fyfe, C. and Baddeley, R. Non-linear data structure extraction using simple hebbian networks. Biological Cybernetics, 72(6):533-541, 1995. [8] Girolami, M. and Fyfe, C. Blind separation of sources using exploratory projection pursuit networks. International Conference on the Engineering Applications of Neural Networks, ISBN 952-90-7517-0, 249-252, 1996. [9] Bell, A. and Sejnowski, T. Fast blind separation based on information theory. International Symposium on Nonlinear Theory and Applications, Vol. 1, 43-47, 1995. [10] Foldiak, P. Adaptive network for optimal linear feature extraction. IEEE/INNS International Joint Conference on Neural Networks, 1, 401-405, 1989.
Session E:
PATTERN/OBJECT RECOGNITION
A Robot Vision System for Object Recognition and Work Piece Location* Wang Min, Dai Qizhi, Wang Jun Department of Automatic Control Engineering Huazhong University of Science & Technology Wuhan, Hubei 430074 P. R. China
Abstract This paper presents a practical robot vision system which gathers TV signals for robot assembly systems. Using 2D binary-state image processing, the invariant features of the object are calculated and the object is recognised and located. The experiment results show that this vision system can provide the necessary information for industrial robot assembly tasks. 1. Introduction With the development of robot techniques, robot vision has improved very fast. It is an especially important part of industrial robot assembly applications. A typical practical task is part recognition and location. For example, grasping randomly placed parts from the conveyer and assembling them with robot manipulators requires telling the robot the position, orientation and species of the objects and controlling the robot hand to finish these tasks. In order to realise object recognition and work piece location we present a robot scene vision system. In this system the video signals are taken from an industrial camera device first; the TV signals are then converted into binary signals by sampling and into digital signals by an A/D converter. A single-chip microcomputer system performs the pre-processing, including extracting the invariant features from the object image and taking the feature parameters. The object recognition, using a model feature parameter library, and the work piece location calculation are performed in a PC system which can provide the necessary information on the object position and orientation to the robot control system.
2. Principle and Structure In industrial applications an object is usually recognised by its features, such as the edges, the geometrical centre position and the configuration of the object. The first step is usually to classify the objects. We select a set of typical objects as models. The feature variables of the selected objects compose an n-dimensional vector space, where n is the number of feature variables. A set of feature variable data of an object composes a feature vector which represents a point in the feature space [1]. Based on the model feature vector positions located in the vector space, the feature space can be divided into several areas corresponding to different species. Each area is called a sub-space and represents one species of objects. The features of an object are extracted, its feature vector position is calculated, and it is judged in which sub-space the vector is located; the species represented by this area is the species of the object. This recognition method depends on the selection of the models, so the models should be representative and the number of models should be big enough to distribute the sub-spaces equally over the whole feature space. The total structure of the system is shown in Figure 1. The complete TV signals are converted to binary-state signals first. Then noise elimination is applied and the feature parameters are extracted. Finally, pattern matching is carried out. A single-chip microcomputer performs the image sampling and signal preprocessing, which reduces the load on the higher level computer, so that the total speed of the system is higher. The higher level computer system, the PC system, then accomplishes recognition and location of the objects.
Figure 1 The total structure chart of the vision system
3. Image Pre-processing The image pre-processing contains two parts: one is to improve the image quality; the other is to calculate the geometric parameters of the work piece. Both of these parts are the basis of the following recognition. * This paper is supported by the Chinese National Science Fund (CNSF).
3.1 Image Smoothing There are three kinds of noises in the object image. The first is the stochastic discrete noise, or so-called salt-like noise; the second is the spot-noise due to unequal luminance or reflection; the third is line-noise caused by power disturbance. There are several ways to eliminate these noises, such as the super-quadrant smoothing method, the multi-image equal method, the Boolean algorithm and so on. In this system we adopted the mathematical morphology method [2] and used erosion and expansion algorithms to eliminate noises. As to a binary-state image A, selecting a structure element B, if A is first eroded and then expanded by B, that is an open algorithm operating on A by B. The result is called A opened by B and we define the open algorithm as

A ∘ B = (A ⊖ B) ⊕ B    (1)

By the open algorithm, we can eliminate the noises of the distributed points and burrs. If A is first expanded and then eroded by B, that is a close algorithm operating on A by B. The result is called A closed by B and we define the close algorithm as

A • B = (A ⊕ B) ⊖ B    (2)

By the close algorithm, we can connect two adjacent objects and fill the white spots in a "black" object. The final result can be written as

C = (((A ⊖ B1) ⊕ B1) ⊕ B2) ⊖ B2    (3)

where A is the original image, C is the smoothed image, and B1 and B2 are the structure elements of the open and close algorithms shown in Figure 2.
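A pure-NumPy sketch of the open-close smoothing of equation (3) (our illustration, not the paper's implementation; a simple 3 x 3 cross stands in for both B1 and B2, whereas the paper's elements are larger cross-over structures):

```python
import numpy as np

def erode(img, se):
    """Binary erosion: output is 1 where every se==1 position covers a 1 in img."""
    h, w = se.shape
    p = np.pad(img, ((h // 2, h // 2), (w // 2, w // 2)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.all(p[i:i + h, j:j + w][se == 1] == 1)
    return out

def dilate(img, se):
    """Binary dilation: output is 1 where any se==1 position covers a 1 in img."""
    h, w = se.shape
    p = np.pad(img, ((h // 2, h // 2), (w // 2, w // 2)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.any(p[i:i + h, j:j + w][se == 1] == 1)
    return out

def open_close(img, b1, b2):
    """Equation (3): open by B1 (remove specks and burrs), close by B2 (fill spots)."""
    opened = dilate(erode(img, b1), b1)
    return erode(dilate(opened, b2), b2)
```

An isolated noise pixel disappears under the opening, while the body of the object survives the whole open-close sequence.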
Figure 2 The structure elements Using the cross-over structure, the four directions of up, down, left and right of the image can be smoothed, burrs with sizes up to 5 x 5 image elements can be eliminated and white spots with sizes up to 9 x 7 image elements can be filled. 3.2 Feature parameter extracting In a binary-state image the pixel value f(x,y) is "0" or "1"; "1" represents the object and "0" the background. The smoothed image can then be edge-tracked and the image contour extracted, and we calculate the following parameters: (1) the circumference P, that is, the number of points in the image contour; (2) the area S, that is, the number of points whose pixel value is "1"; (3) the quadrature of every rank, mpq; (4) the shape centre (x̄, ȳ); (5)
the maximum radius Rmax, the minimum radius Rmin, and the average radius Ravg; (6) recognising whether a hole exists or not. In the last calculation we shrink the object image first. If the final result is a point it means that there is no hole in the object. If the final result is a circle it means that there is a hole. In the case of an existing hole, we can obtain the hole parameters by steps (1) to (5) as mentioned above. 4. Model Recognition, Position and Calibration
4.1 Recognition We need to solve three questions for the model recognition: (1) selecting features suitable for classifying the object; (2) acquiring the features of the models in advance; (3) making an effective method by which we can judge the area of the feature vector in the space. According to references [3] and [4], we select the following features in our system: (1) the first invariant quadrature, H1 = η20 + η02; (2) the second invariant quadrature, H2 = (η20 - η02)² + 4η11²; (3) the third invariant quadrature, H3 = (η30 - 3η12)² + (3η21 - η03)²; (4) the fourth invariant quadrature, H4 = (η30 + η12)² + (η21 + η03)²; (5) the complexity C = P² / S; (6) the ratio of the maximum and minimum radius, R1 = Rmax / Rmin; (7) the ratio of the average and minimum radius, R2 = Ravg / Rmin; (8) the flag representing the hole state, K = 1 (a hole exists). In fact,
149
ηpq = μpq / μ00^γ    (4)

γ = (p + q) / 2 + 1    (5)

μpq = ∫∫ (x - x̄)^p (y - ȳ)^q f(x, y) dx dy    (6)

mpq = ∫∫ x^p y^q f(x, y) dx dy

x̄ = m10 / m00;  ȳ = m01 / m00    (7)

where mpq represents the origin quadrature of rank p+q, μpq represents the centre quadrature of rank p+q, and ηpq represents the normalised centre quadrature of rank p+q. After discretisation we have

μpq = Σ_x Σ_y (x - x̄)^p (y - ȳ)^q f(x, y)    (8)

mpq = Σ_x Σ_y x^p y^q f(x, y)    (9)
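The normalised quadratures and the features H1-H4 can be computed directly from a binary image. The sketch below (ours, using the discrete forms (8) and (9)) also checks their invariance under translation and 90-degree rotation:

```python
import numpy as np

def features(img):
    """H1..H4 from the normalised central quadratures eta_pq of a binary image."""
    ys, xs = np.nonzero(img)
    m00 = float(len(xs))               # area: equation (9) with p = q = 0
    xb, yb = xs.mean(), ys.mean()      # shape centre, equation (7)

    def eta(p, q):                     # equations (8), then (4)-(5)
        mu = ((xs - xb)**p * (ys - yb)**q).sum()
        return mu / m00**((p + q) / 2.0 + 1.0)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2))**2 + 4.0 * eta(1, 1)**2
    h3 = (eta(3, 0) - 3 * eta(1, 2))**2 + (3 * eta(2, 1) - eta(0, 3))**2
    h4 = (eta(3, 0) + eta(1, 2))**2 + (eta(2, 1) + eta(0, 3))**2
    return np.array([h1, h2, h3, h4])
```

Translating the object leaves the central quadratures unchanged, and a 90-degree grid rotation permutes the η terms in a way that leaves each H feature fixed.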
x y We suppose that there is one hole at most in the object to be recognised. If the hole exists the complexity C, the radius ratio R 1 and R2 are all relative to the hole panel. In this case we add an extra feature to the system: (9) the distance from the hole centre to the shape centre D. Selecting the above features is based on the consideration of the invariance of the rotation, translation and proportion and can be able to distinguish itself from others. The process acquiring the features of the model is the exercising process. The visual system extracts several groups of features when placing the model on different positions and orientations. The feature vector consisting of the average values of these group features can represent the model. A set of the selected model feature vectors compose the model feature lab. The feature vector of the object is compared with the model features first and then calculate the weightsummed Euclidean distances between the object and the models. The species of the model which has the smallest distance is the object species according to the nearest-classified principle. The distance is dm2 =
~~ =
P,,,j - P oj
wj
(10)
P mj
where Pmj is the jth feature of the model, Poj is the jth feature of the object, n is the dimension of the feature vector, wj is the weight of the jth feature and dm represents the distance between the object and the mth model.
4.2 Position Position includes the determination of the centre position and the rotary orientation of the object. The centre position can be calculated according to equation (7). The rotary orientation can be represented by the angle θ between the inertial main axis and the x axis, as shown in Figure 3 (the counter clockwise rotation is positive). The value of the angle θ can be solved as follows:

r² + ((μ20 - μ02) / μ11) r - 1 = 0    (11)

where r = tg θ. When μ11 ≠ 0, the above equation has two roots:

r1 = (-b + √(b² + 4)) / 2,  r2 = (-b - √(b² + 4)) / 2

where b = (μ20 - μ02) / μ11. When μ11 > 0, let tg θ = r1; when μ11 < 0, let tg θ = r2.

Fig.3 The rotary orientation of the object

When μ11 = 0, equation (11) is not applicable, which means that there are two or more symmetric axes in the object: (1) When there are more than two axes, μ02 = μ20; the shapes are the square, circle, polygon and so on. Considering that the directive information should be applied to the robot hand, we choose the normal direction of the shortest radius vector as the robot grasping direction. (2) When there are two axes, μ02 ≠ μ20; we choose the direction of the longest radius vector as the rotary orientation.
4.3 Calibration The calibration task is to determine the geometric relation between the camera and the robot coordinates. For a 2D image the calibration is performed in 2D coordinates. Suppose the vision sensor frame is xv-ov-yv and the robot frame is xr-or-yr; a point (xv,yv) in the xv-ov-yv frame can be represented by (xr,yr) in the xr-or-yr frame. If the origin of the vision frame is (x0,y0) in the robot frame, then we have
[xr]   [cos φ  -sin φ] [px xv]   [x0]
[yr] = [sin φ   cos φ] [py yv] + [y0]

where φ is the rotation angle of the sensor frame relative to the robot frame in the counter clockwise direction, and px and py represent the unit length of a pixel in the xv and yv orientations. We can get three matrix equations by substituting three different points; then the parameters px, py, x0, y0 and φ can be solved from the equations. 5. Experiment Results The smoothing effects of the two different smoothing methods, the super-quadrant smoothing and the open-close algorithm, are compared in Figure 4. As shown in Figure 4, (a) is the original digital image, on which there are a lot of salt-like noises and spot-noises because of the unequal reflection. (b) is the result processed by the super-quadrant smoothing method. The small random noises are eliminated, but the bigger spot noise still exists. At the same time the image edges near the bigger spot are destroyed and a gap appears. (c) is the result processed by the open-close algorithm smoothing method. All kinds of noises are eliminated while the details of the image are kept well. From the above we can see that the effect of the open-close algorithm method is better than that of the super-quadrant method in binary-state image smoothing. However, the former scans the image four times, while the latter needs to scan the image just once. The experiment results showed that the correct rate of recognition is over 95%, the accuracy of location is ±2 mm and the accuracy of the orientation angle is ±2 degrees.
Figure 4: Comparison of the image smoothing results.
6. Conclusion
This paper presents an object recognition method based on feature parameters, determines the invariant features of the models, and composes a robot vision system which integrates binary-state image sampling, recognition and location. The system can be used for scene vision in robot assembly tasks. The experiment results show that the system accomplishes object recognition and work piece location reliably, and that it is low-cost, simple in structure and easy to realize. The system still has much to be improved; for example, the image smoothing and feature extraction methods need to be investigated more deeply. It is also possible to adopt dedicated hardware or an image processing chip to speed up the system, which may satisfy the real-time control requirement of high-speed assembly tasks.
References:
[1] B.K.P. Horn. Robot Vision. The MIT Press, McGraw-Hill Book Company, 1986.
[2] Tang Chengqing. The Method and Application of the Mathematical Morphology. The Science Press, 1990.
[3] Yang Jingan, Zhang Daincheng. The Vision System Based on the Model Recognising the Complex Object. Pattern Recognition and Artificial Intelligence, Vol. 3, No. 2, 1990.
[4] Zhou Ruiyu, Wang Dapei, Li Quanyi. A Simple Robot Assembly Experiment System Guided by Vision. The Robot, Vol. 3, No. 2, 1989.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Recognition of Objects and Their Direction of Moving Based on Sequence of Two-Dimensional Frames
Božidar Potočnik, Damjan Zazula
Faculty of Electrical Engineering and Computer Science, Smetanova 17, 2000 Maribor, Slovenia
{bozo.potocnik, zazula}@uni-mb.si
Abstract We developed a new algorithm that belongs to the so-called differential methods; however, it represents a significant extension of the known analysis approaches. Our algorithm can perceive four types of object shifts: translational shifts, object shifts along the optical axis inwards and outwards, object rotations about the optical axis and, eventually, the appearance of additive noise. We introduce a new approach to the analysis of objects which are in occlusion (analytical optimization with respect to the MSE). The algorithm is very fast (its time complexity is of order O(n²)). It is a framework that can easily be adapted to the needs of real applications.
1. Introduction In our work, we deal with digital processing of a sequence of images from which we try to determine a moving object and the trajectory of its movement. Recently, a few methods for movement analysis have been published. Sonka [5] described basic steps for movement analysis on an optical flow basis and on a significant point basis, Jähne [3] attempted movement analysis with the assistance of space-time images, etc. Because the result of these methods is a vector or matrix (movement field or displacement vector field), there is no possibility for accurate reconstruction of the trajectory of the moving object. Various methods of movement analysis have been gathered in [5] and classified into different groups according to the algorithms used. Basic steps of the algorithms may also be employed in the determination of the movement trajectory. We developed a new algorithm that belongs to the so-called differential methods; however, it represents a significant extension of the known analysis approaches. The paper is organized as follows. In Section 2, we describe the algorithm developed for movement analysis in detail, while the results and discussion follow in Section 3. Section 4 concludes the paper.
2. Analysis algorithm With our algorithm, we can analyse the movement of one moving object in a sequence of gray value images. It can perceive four types of object shifts: translational shifts, object shifts along the optical axis inwards and outwards, object rotations about the optical axis and, eventually, the appearance of additive noise. Our algorithm consists of the following steps: 1. The first step of the algorithm is binarisation of the sequence of images. Every image from the sequence is binarized with a threshold operation using a global threshold. We determine the threshold for every image from the sequence separately, i.e., as the mean between the minimum and maximum gray value in that image. The type of binarisation (or other preprocessing operations) can also be selected with respect to the image sequence to be analysed (ultrasound, MR, CT or SAR images, etc.). 2. Given a sequence of n binary images, the static background background(i,j) is established as follows:
\[
\mathrm{background}(i, j) = \Big( \sum_{k=1}^{n} b_k(i, j) \Big) \;\mathrm{div}\; n, \qquad (1)
\]
where b_k is the k-th binary image from the sequence and div stands for integer division. Equation (1) gives only an estimate of the real background. It is evident that longer sequences produce better estimates. In sequences where the object is, in at least one image, in no occlusion with any static part of the scene, this estimate corresponds to the actual (binary) image of the background. 3. Then, the static background obtained (Equation (1)) is subtracted from every image, producing a sequence of dynamic images. Dynamic images comprise white areas where changes in gray values appear along subsequent images in the sequence. This feature is used as a criterion for recognition of the moving object in the following steps. 4. The moving object is defined as the object with the largest surface area in the dynamic images. This one is afterwards used as a praform (template). A polar histogram is constructed for subsequent comparisons. This criterion proved robust; nevertheless, it fails in the case of only slight movement throughout the entire sequence. 5. Now, all the frames with dynamic images are processed in order to find successive appearances of the moving object. This matching or searching is divided with respect to whether the object is partially hidden by another object or not. When there is no occlusion in the images, the procedure is straightforward. However, occlusions introduce several problems [1, 6], like incomplete or faulty object identification. We divided the search for the moving object position into two variants, each composed of several steps: a. The object is in no occlusion with any static part of the scene: A polar histogram is constructed for it (the number of elements of the polar histogram is selected in advance). Individual components are taken into quotients with components of the praform's histogram:
\[
\mathrm{quot}[i] = \frac{\mathrm{histogram}_{\mathrm{praform}}[(i + \mathrm{rotation}) \bmod m]}{\mathrm{histogram}_{\mathrm{object}}[i]} \qquad (2)
\]
where m is the number of elements of the polar histogram, rotation is the shift index, and mod stands for the modulus operation. From Equation (2), it is obvious that the vector of quotients has to be calculated for every single rotation (the number of rotations is equal to m). For the vector of quotients obtained, the mean and the variance are calculated. These two values play an important role in the determination of the type of shift and the rotation of the object. Rotating the praform, the position with minimum variance points out the most probable rotation of the processed object. At the same time, the mean of the quotients corresponds to the object's scaling.
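The quotient matching of Equation (2) and the variance criterion of step 5a can be sketched as follows, assuming NumPy; the function name and the histogram representation are our assumptions, and ties in the variance are not handled:

```python
import numpy as np

def match_rotation_and_scale(hist_praform, hist_object):
    """Evaluate Equation (2) for every cyclic shift of the praform's
    polar histogram.  The shift with minimum variance of the quotient
    vector is the most probable rotation; the mean of that quotient
    vector reflects the object's scaling."""
    m = len(hist_praform)
    best_var, best_rot, best_mean = None, None, None
    for rotation in range(m):
        quot = np.array([hist_praform[(i + rotation) % m] / hist_object[i]
                         for i in range(m)])
        if best_var is None or quot.var() < best_var:
            best_var, best_rot, best_mean = quot.var(), rotation, quot.mean()
    return best_rot, best_mean
```

If the object is exactly the praform rotated by r bins and scaled by s, the quotient vector at rotation r is constant, so its variance is zero and its mean encodes the scale.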
b. The object is partially hidden: An extended area is formed in a separate frame containing the visible part of the object (from the dynamic image) and the static component, i.e., the occlusion. This newly composed region (composed object) is the basis for the analysis in the following steps. The centre of gravity is found for this composed area, and a partial polar histogram is constructed for the uncovered part of the object. The calculated centre of gravity is the first estimate for our partially hidden object; in the case of very high occlusions, this estimate becomes rather unreliable. A partial vector of quotients is also computed for every single rotation of the praform (the number of elements of the polar histogram is not m anymore, but correspondingly lower). In every rotational position, an analytical optimization with respect to the MSE for the differences of successive quotients is applied in order to reposition the centre of gravity. With this optimization we determine the final centre of gravity of the moving object. The centre of gravity calculated in the previous step is the basis for the subsequent analysis. The partial variance of the quotients, recalculated with the new centre of gravity, is minimum at the most probable orientation of the object under the occlusion. The quotient mean is equivalent to the object scaling. 6. The centres of gravity discovered either way are finally bound into a trajectory of the moving object. The data on the object rotation and shifts along the optical axis are also available. Besides, if the minimum variance of the quotients at a certain frame exceeds a preselected threshold, the object is declared corrupted by additive noise.
3. Results and discussion
In Section 2, we described a new algorithm for the analysis of movement in a sequence of gray value images. This algorithm was also implemented in C++ for Windows and tested. An example is shown in Figure 1. In what follows, the processing results for the image sequence from Figure 1 are shown as generated by our algorithm. First, we binarise every image to get a binary-image sequence (Figure 2). This sequence is used in the determination of the static background (Figure 3), obtained with Equation (1). Then, the static background is subtracted from every image, producing a sequence of dynamic images (Figure 4). From this sequence we recognize the moving object with the heuristic criterion (Figure 5). In Figure 6, we can see the final result of the processing: an image of the trajectory reconstructed for the moving object.
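The preprocessing chain just described (per-image mid-range thresholding, the background of Equation (1), and background subtraction) might be sketched as follows, assuming NumPy; the function names are ours and the subtraction is realised as a pixelwise comparison:

```python
import numpy as np

def binarise(img):
    # Global threshold chosen per image as the mean of its minimum
    # and maximum gray value.
    t = (int(img.min()) + int(img.max())) // 2
    return (img > t).astype(np.uint8)

def static_background(binary_seq):
    # Equation (1): pixelwise sum over the n binary images, then
    # integer division by n.
    return np.sum(binary_seq, axis=0) // len(binary_seq)

def dynamic_image(binary_img, background):
    # Changed pixels appear white after background subtraction.
    return (binary_img != background).astype(np.uint8)
```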
Figure 1: Test gray-value image sequence. The images, of dimensions 256x256 pixels, have 256 gray-value levels. In this sequence, all object shifts which the algorithm can perceive (translational shifts, object shifts along the optical axis inwards and outwards, object rotations about the optical axis and the appearance of additive noise) are present.
Figure 2: Binary-image sequence.
In the above example, we analysed a synthetic image sequence for which our algorithm gives completely correct results. But this is not the case for every image sequence. Our algorithm has particularly big problems with sequences where the occlusion is very high. Experimenting, we also realized that if the first estimate of the centre of gravity (step 5b in Section 2) was not close enough to the right value, then the optimization with respect to the MSE did correct the position of the centre of gravity, but the position was still faulty. A completely different problem arises in sequences where the object moves very slowly through the subsequent images. In these cases we misidentify the moving object (step 4 in Section 2). This problem can be solved in many ways, e.g., by a coarse-to-fine strategy: we consider only every fifth image from the image sequence.
Figure 3: Static background image.
Figure 4: Dynamic-image sequence.
Figure 5: Image of moving object.
Figure 6: Image of reconstructed trajectory.
4. Conclusion In our work we presented a new algorithm for movement analysis in image sequences. The algorithm is an extension of the differential methods of movement analysis. In its basic version, the algorithm is very simple and thus very fast (its time complexity is of order O(n²)). It can easily be extended for concrete real applications.
References
[1] E. Charniak and D. McDermott, Introduction to artificial intelligence. Massachusetts: Addison Wesley, 1985, pp. 87-167.
[2] F. van der Heijden, Image based measurement systems. London: J. Wiley and Sons, 1994.
[3] B. Jähne, Digital image processing. Berlin: Springer-Verlag, 1993.
[4] J.C. Russ, The image processing handbook. London: CRC Press, 1995.
[5] M. Sonka, V. Hlavac, R. Boyle, Image processing, analysis and machine vision. London: Chapman and Hall, 1994.
[6] P.H. Winston, Artificial intelligence. Massachusetts: Addison Wesley, 1984, pp. 335-384.
Innovative Techniques for Recognition of Faces Based on Multiresolution Analysis and Morphological Filtering
Anastasios Doulamis, Nicolas Tsapatsoulis, Nikolaos Doulamis and Stefanos Kollias
Department of Electrical and Computer Engineering, National Technical University of Athens, Heroon Polytechneiou 9, Zographou, Greece
Tel.: +301 772-2491
e-mail: [email protected]
Abstract In this paper, we introduce two new methods for face recognition from frontal images. The methods combine the well-known Karhunen-Loeve (KL) transform with morphological and subband analysis, respectively. The use of this kind of analysis contributes to better discrimination between different images. The morphological and subband approaches are compared, the former being a nonlinear method and the latter a linear one. The results, obtained using 100 test images, show that both approaches are quite efficient. However, the morphological technique seems to lead to slightly better results (5% and 12% error, respectively), while the subband technique has the advantage of decreasing the complexity of the task.
1. Introduction The main purpose of a face recognition system is to find a person within a large database of faces (e.g. in a police database). Such a system typically returns a list of the most likely people in the database. However, there are applications in which we want to identify a particular person (e.g. in a security monitoring system) or to allow access to a group of people and deny access to all others (e.g. access to a computer). Some other applications, like speech recognition, better man-machine interaction or visual communication over telephone and other low-bandwidth lines, use face identification as an auxiliary tool. So far, the best results for two-dimensional face recognition have been obtained by techniques based on either template matching [1] or matching eigenfaces [2]. The latter uses the KL-transform and has the advantage of not requiring specialised hardware. Since this transform achieves the optimal energy compression, faces can be represented in a low-dimensional space as a weighted linear combination of the eigenvectors of the autocorrelation matrix of face images. This enforces the mean square error between the representation of the face image and the original image to be minimal. This representation, although optimal in discriminating physical categories of faces, e.g. sex and race, is not optimal for recognising faces, because the details which are necessary to discriminate different faces are lost [3]. In addition, there is no accurate method that verifies the results of the identification algorithm in order to avoid false alarms (see Section 4). Two alternative techniques are proposed in this article, so as to increase the efficiency of the discrimination task and to obtain more reliable results. These two techniques combine the KL-transform with subband decomposition and morphological filtering, respectively. Subband decomposition separates the original images into complementary frequency bands (e.g. Low-Low (LL), Low-High (LH) and so on), for each of which we create a different KL base. Since the LL band contains the largest amount of information, we use the projection of a test image on this band to find a list of the most likely face images in the database. The higher bands are used for verification if the confidence of the decision made on the LL base is poor. Thus it is feasible to achieve a correct identification using the details kept in the higher bands. Using morphological filtering we are able to transform an image into another one with lower frequencies than the original (for example by morphological opening or closing). Therefore we use the result of these filters in the same way that we use the lower band of the subband analysis. The structuring element of the morphological operator was chosen after measurements on various test images. The difference between the original image and the filtered one, projected on the respective base, is used to verify our results.
2. Subband decomposition In this section we describe the first approach, which is based on a multiresolution scheme proposed in [4] (Fig. 1). An image of resolution (M×N) is decomposed into four frequency-complementary images of resolution (M/2 × N/2). Using this scheme we can create four different databases from the original face database. The KL transform on each of these databases is used to produce four different instances for each face image in the database. Actually, in our approach, only two instances of each image are used: the instances related to the LL KL-base and the LH KL-base. In the XLL image, which is the image containing the Low-Low spatial frequencies of the original X(m,n) image, most of the energy is accumulated. The respective LL KL-transform converges faster than the KL-transform taken on the original images. In addition, the complexity of the computation is lower, since the autocorrelation matrix of the LL images is of dimensions (M/2 · N/2) × (M/2 · N/2) instead of (M·N) × (M·N) for the original images. The LH KL-transform converges more slowly than the original one, so more KL coefficients must be kept. Since the LH images are images of details, they are used only in the verification step. The proposed algorithm is described below.
Decomposition step
Given an image Y(m,n) of dimensions (M×N), we create the images YLL, YLH, YHL, YHH using the subband decomposition scheme shown in Fig. 1.
Projection step
The YLL and YLH images are projected on the LL KL-base and LH KL-base respectively, and k and l coefficients are retained in each case. The numbers k, l were chosen after many simulations (see Section 4.1, Table II). As a result of this step, two vectors related to image Y(m,n), of sizes k and l, are created: y_l, y_h.
MSE calculation step
For each LL instance x_{il} in the database we calculate the MSE
\[
e_i = (x_{il} - y_l)^T (x_{il} - y_l)
\]
and e_{min} = \min_i (e_i).
As potential instances of the image Y(m,n) in the database, we consider the instances whose MSE lies in the interval [e_min, 2·e_min]. If the MSE of only one instance lies in this interval, the confidence of the decision is high, and the instance with the minimal MSE is considered to be the prototype of the image Y(m,n) in the database. On the other hand, if more than 10% of the total instances in the database have an MSE which lies within the interval, the confidence of the decision is considered inadequate and the image Y(m,n) is discarded without verification. If neither of these extreme cases occurs, the verification step is needed.
Verification step
For the instances selected in the previous stage, the error m_i = (x_{ih} - y_h)^T (x_{ih} - y_h) is calculated (x_{ih} are the LH instances of the database). If the minimal error is lower than a threshold T, equal to 0.9·max(error of images which have a prototype in the database), then the instance with the minimal error is considered as the prototype of the image Y(m,n) in the database; otherwise the image Y(m,n) is considered to have no prototype.
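The decision rule on the LL base (the interval [e_min, 2·e_min] and the 10% cutoff) could be coded as follows; a sketch with hypothetical names, not the authors' implementation:

```python
import numpy as np

def classify_confidence(errors, frac_cutoff=0.10):
    """Apply the interval rule: collect instances whose MSE lies in
    [e_min, 2*e_min].  A single candidate means a high-confidence
    decision, too many candidates mean inadequate confidence, and
    anything in between is passed on to the verification step."""
    errors = np.asarray(errors, dtype=float)
    candidates = np.where(errors <= 2.0 * errors.min())[0]
    if len(candidates) == 1:
        return 'high', int(candidates[0])
    if len(candidates) > frac_cutoff * len(errors):
        return 'inadequate', None
    return 'verify', candidates.tolist()
```

In the 'verify' case, the candidates would then be re-ranked by the LH-base errors m_i and accepted only below the threshold T described above.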
Figure 1. Subband decomposition scheme used to split an image X(m,n) into four frequency-complementary images XLL, XLH, XHL, XHH (G, H: perfect reconstruction mirror filters; after each filtering, one row or one column out of two is kept).
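One level of the four-band split of Figure 1 can be sketched as follows. Note that we substitute the simple Haar pair for the perfect-reconstruction mirror filters G, H of [4], so this is only an illustration of the structure, assuming NumPy and even image dimensions:

```python
import numpy as np

def analysis_1d(x, axis):
    # Haar analysis pair: low-pass (a+b)/2 and high-pass (a-b)/2,
    # combined with the "keep one sample out of two" subsampling.
    a = np.take(x, range(0, x.shape[axis] - 1, 2), axis=axis)
    b = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (a + b) / 2.0, (a - b) / 2.0

def subband_decompose(X):
    # One level of the four-band split: rows first, then columns.
    L, H = analysis_1d(X, axis=0)
    LL, LH = analysis_1d(L, axis=1)
    HL, HH = analysis_1d(H, axis=1)
    return LL, LH, HL, HH
```

A smooth image concentrates its energy in XLL, which is why the LL KL-base converges faster, while the detail bands carry the high-frequency information used for verification.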
3. Morphological Analysis The goal of this section is to briefly describe the morphological tools of interest for the face identification algorithm. A complete description of mathematical morphology can be found in [7]. Let f(x) denote an input signal and M_n a window or flat structuring element of size n. The erosion and dilation by this flat element are given by
\[
\varepsilon_n(f)(x) = \min\{ f(x + y),\; y \in M_n \} \quad \text{and} \quad \delta_n(f)(x) = \max\{ f(x - y),\; y \in M_n \}.
\]
Two morphological filters can be defined from the above operators, namely opening and closing. A morphological opening (closing) simplifies the original signal by removing the bright (dark) components that do not fit within the structuring element [7]. If it is required to remove both bright and dark elements, an opening-closing or closing-opening should be used. We also define the difference between the original signal and the signal after the morphological opening (closing). This difference should not be confused with the morphological gradient, which is given by subtracting the erosion from the dilation with a structuring element M_n of size 1. Based on the above morphological filters, we propose an innovative algorithm both for identification and verification. Fig. 2 illustrates the mechanism used. As can be seen in Fig. 2, we first apply a morphological operator on each image of the database. Thus a new database is created which contains the filtered images. From this database we calculate the "opening KL-base", on which the filtered images are projected. Since the new images consist of lower frequencies, it is expected that more energy will accumulate in the first coefficients of the KL transform. Moreover, for each face we calculate the difference between the original and the filtered image, and we also create the "difference KL-base". However, the images of this database contain higher frequencies, and thus more coefficients are needed to accumulate the same energy as for the original one. As a result, this database can be used only for verification purposes. If the confidence of the decision is poor (there are many faces in the list after the projection of the test image on the opening KL-base), we use the verification based on the difference base.
Figure 2. Face recognition scheme based on morphological filtering.
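The flat erosion, dilation, opening and closing above can be illustrated on a 1-D signal. This is a minimal sketch assuming NumPy, with the window simply clipped at the signal borders (a boundary choice the paper does not discuss):

```python
import numpy as np

def erode(f, n):
    # Flat erosion: minimum of f over a centred window of size 2n+1.
    return np.array([f[max(0, i - n):i + n + 1].min() for i in range(len(f))])

def dilate(f, n):
    # Flat dilation: maximum of f over the same window.
    return np.array([f[max(0, i - n):i + n + 1].max() for i in range(len(f))])

def opening(f, n):
    # Erosion followed by dilation: removes bright components that
    # do not fit within the structuring element.
    return dilate(erode(f, n), n)

def closing(f, n):
    # Dilation followed by erosion: removes dark components.
    return erode(dilate(f, n), n)
```

A bright peak narrower than the structuring element disappears under opening, while wider structures are preserved; this is exactly the low-pass behaviour exploited by the "opening KL-base".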
4. Results We have used the male face database of the University of Essex in our experiments. To build the KL bases, we have chosen 100 different frontal faces, with no facial expressions, centred in the image and with small scale and decline variations (let us call these images prototypes). As test images we have selected 90 face images which have a prototype in the database, with variations in scale, decline, orientation and facial expressions. We have also used 10 face images with no prototype in the database. Given a test image, the task was to recognise the respective prototype, if there was one, or to discard the image because there was no prototype. Two kinds of errors emerge: false alarms (a face which has no prototype in the base is recognised as one which has) and false discriminations (a face which has a prototype is discarded or is recognised as a false one).
4.1 Results obtained by the subband algorithm
[Table I reports the total percentage error for each combination of retained LL KL-base coefficients (16, 25, 36, 49) and LH KL-base coefficients (16, 25, 36, 49); several combinations were not simulated. The cell layout could not be recovered.]
Table I: Total percentage error for various simulations of the subband based algorithm.
In Table I, the total percentage error (discrimination error + false alarms) is shown for various simulations. For example, retaining 16 coefficients from the projection on the LL KL-base and 25 from the projection on the LH KL-base, the total error is 8%. Increasing the number of retained coefficients of the LL KL-base decreases the total error. However, increasing the number of retained coefficients of the LH KL-base does not decrease the total error substantially. Note also that the total error consists mainly of false alarms. These cannot easily be reduced, because they depend on the chosen threshold T.
Table II: Performance of the subband based algorithm retaining 9 and 16 coefficients of the projection on the LL KL-base and LH KL-base respectively. Of the 10 faces with no prototype in the database, 0 had a high confidence of decision, 5 inadequate confidence and 5 low confidence, giving 3 false alarms. [The corresponding row for the 90 faces with a prototype in the database, including the discrimination errors at high and low confidence, could not be recovered.]
In Table II, the results of a simulation retaining 9 and 16 coefficients of the projection on the LL KL-base and LH KL-base, respectively, are shown. Comparisons with the results of the morphological algorithm, shown in Table III, can be drawn.
4.2 Results obtained by the morphological algorithm
Fig. 3 presents the discrimination error obtained with the above test images. The results were taken for different structuring element sizes (5, 10, 15, 20, 25) and for different numbers of coefficients for each base. The number of coefficients kept for the opening base is the same as for the difference base (9 coefficients in these results). It is observed that a structuring element of size 15 gives the best results. This is quite logical, since a small structuring element yields good recognition on the opening base but poor verification on the difference base, while a large structuring element yields good verification on the difference base but poor recognition on the opening base. It should also be mentioned that the opening base keeps the significant information well and, as a result, gives very good identification, despite the fact that the filtered images of the database (prototypes) are not easily recognisable by humans.
Figure 3: Discrimination error for different structuring element sizes.
Figure 4: Number of coefficients of the KL transform for each base and the verification used.
Fig. 4 shows the discrimination error for each base and for the verification (in this case we have kept the same number of coefficients for the opening and difference bases). As the number of coefficients increases, the total error decreases significantly, especially for the verification and the opening base. We choose the same number of coefficients for verification because the results turn out to be very satisfactory without keeping a large number of coefficients for the difference base. One exception is presented in Table III, in order to allow a comparison with the subband based algorithm. It should also be mentioned that in the verification procedure the major proportion of faces (about 70%) gives the right result without the use of the difference base, and as a result the computational time is reduced significantly.
Table III: Performance of the morphological algorithm retaining 9 and 16 coefficients of the projection on the opening KL-base and difference KL-base respectively (size of structuring element 15). Of the 90 faces with a prototype in the database, 69 had a high confidence of decision, 0 inadequate confidence and 21 low confidence; the discrimination error of the faces with a high confidence of decision was 0. Of the 10 faces with no prototype in the database, 0 had a high confidence of decision, 6 inadequate confidence and 4 low confidence, giving 2 false alarms.
5. Conclusions In this paper we have presented two innovative techniques for face recognition. With the morphology based approach the results are more promising, since the verification step increases the efficiency of the algorithm. On the other hand, the subband based approach is more attractive computationally. Due to the perfect reconstruction filters used in this approach, the LH KL-base converges slowly, and consequently the verification step does not improve the efficiency of the algorithm significantly.
References
[1] R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, Oct. 1993.
[2] M. Turk and A. Pentland, "Eigenfaces for Recognition," J. Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[3] A. O'Toole, H. Abdi, K.A. Deffenbacher and D. Valentin, "Low-dimensional representation of faces in higher dimensions of the face space," J. Opt. Soc. Am. A, vol. 10, no. 3, pp. 405-411, March 1993.
[4] S.G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
[5] A. Tirakis, A. Delopoulos and S. Kollias, "Two-dimensional filter bank design for optimal reconstruction using limited subband information," IEEE Trans. on Image Processing, vol. 4, no. 8, August 1995.
[6] L. Vincent, "Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms," IEEE Trans. on Image Processing, vol. 2, no. 2, pp. 176-200, April 1993.
[7] J. Serra, Image Analysis and Mathematical Morphology, New York: Academic Press, 1982.
PARTIAL CURVE IDENTIFICATION IN 2-D SPACE AND ITS APPLICATION TO ROBOT ASSEMBLY
Feng-Hui Yao*, Gui-Feng Shao**, Akikazu Tamaki*, Kiyoshi Kato*
*Dept. of Electric, Electronic and Computer Engineering, Faculty of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata-ku, Kitakyushu 804, Japan. Phone (+81)093-884-3255 (Direct), Fax (+81)093-871-5835, E-mail: [email protected]
**Dept. of Commercial Science, Seinan Gakuin University, 6-2-92 Nishijin, Sawara-ku, Fukuoka 814, Japan. Phone (+81)092-841-1311 (Ext. 262), E-mail: [email protected]
ABSTRACT
This paper describes an algorithm to identify the partial curves of planar objects in 2-D space and its application to robot assembly. For the given boundary curves of objects, the dominant points of every boundary curve are detected. Then, by considering the dominant points as separation points, the corresponding boundary curve is segmented into partial boundary curves, which are called curve segments. The curve segments belonging to the boundary curve of one object are then translated and rotated to match those of another object, to obtain the matched curve segments. From these matched curve segments, the longest consecutive matched curve is detected. Finally, the effectiveness of this algorithm is shown by the experiment results.
1. Introduction The shape of an object plays a very important role in object recognition, analysis and classification. Research in this field can be roughly classified into (1) edge detection; (2) dominant point detection on the boundary curve; and (3) shape recognition. Research on edge detection focuses on edges or contours [1]-[2]; research on dominant point detection focuses on points of high curvature [3]-[4]; and research on shape recognition pays attention to the entire shape of the boundary curve and identifies the objects [5]. These studies seldom address the problem of object connection relationships, i.e., determining whether a part of one object can be connected with a part of another. This problem is very important in robot assembly systems and can be thought of as a partial curve identification problem. This paper focuses on this problem and proposes an algorithm to identify the partial curves of planar objects. In this algorithm, firstly, the boundary curves of the objects are extracted from the input image after binarization, and dominant points with high curvature are detected. Then, each boundary curve is segmented into partial boundary curves, called curve segments, by taking the dominant points as separation points. Then curve segment matching is performed, and the partial curve is identified based on the matching errors. In the following, Section 2 describes the algorithm for partial curve identification; Section 3 relates its digital implementation; Section 4 shows its application and experiment results. Finally, the effectiveness of this algorithm is discussed and future work is given.
2. Algorithm to Identify the Partial Curve of a Planar Object
In the following explanation, a boundary curve is simply called a curve unless otherwise specified.
2.1 Dominant Point Extraction
For a given object, let γ(s) represent its boundary curve. γ(s) is expressed parametrically by its coordinate functions x(s) and y(s), where s is a path length variable along the curve. If the second derivatives of x(s) and y(s) exist, the curvature at (x, y) is computed by

C(x, y) = (x'y'' − y'x'') / (x'² + y'²)^(3/2)    (1)

To express the curvature at varying levels of detail, both boundary coordinate functions x(s) and y(s) are convolved with the Gaussian function g(s, σ) defined by

g(s, σ) = exp(−s²/(2σ²)) / (σ√(2π))    (2)

where σ is the standard deviation of the distribution. The Gaussian function decreases smoothly with distance and is differentiable and integrable. Let us assume that σ of the Gaussian function is small compared with the total length of the curve γ(s). The Gaussian-smoothed coordinate functions X(s, σ) and Y(s, σ) are defined as x(s) ⊗ g(s, σ) and y(s) ⊗ g(s, σ), respectively, where "⊗" denotes convolution. Because both X(s, σ) and Y(s, σ) are smooth functions and their first and second derivatives exist, the curvature C(s, σ) of the curve γ(s) smoothed by the Gaussian function is readily given by applying X'(s, σ), Y'(s, σ), X''(s, σ) and Y''(s, σ) to equation (1). For a given scale σ, the corresponding curvature C(s, σ) can be obtained according to the procedure related above. A searching process is applied to detect the local maximum of absolute curvature within the region of support given
by the sequence {|Cl|, ..., |Ci−1|, |Ci|, |Ci+1|, ..., |Cr|}, where Ci is the curvature of the point in question, and Cl and Cr are the curvatures at the leftmost and rightmost points of the local region of support, respectively. The region of support for each point i is the largest possible window containing i in which |C| to both the left and right of i is strictly decreasing. The points with locally maximal absolute curvature are considered the dominant points.
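As a rough sketch (our illustration, not the authors' code), the dominant point test can be written as below; it simplifies the region-of-support criterion to a strict local maximum of |C| against the immediate neighbours, which is what the full region-of-support search reduces to on a strictly unimodal window. The function name `dominant_points` is ours:

```python
def dominant_points(curvature):
    """Detect dominant points as strict local maxima of |C| on a closed
    curve (indices modulo n).  This simplifies the region-of-support test
    of the paper to the immediate neighbours of each point."""
    n = len(curvature)
    absc = [abs(c) for c in curvature]
    return [i for i in range(n)
            if absc[(i - 1) % n] < absc[i] and absc[(i + 1) % n] < absc[i]]
```

In practice this would be run on the Gaussian-smoothed curvature C(s, σ) of equation (1) rather than on raw differences.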
2.2 Curve Segmentation
For any two objects A and B, let us assume that their boundary curves are represented by α(s) and β(s), respectively, and that their dominant points are denoted by P(α) = {pα0, pα1, ..., pα,M−1} and P(β) = {pβ0, pβ1, ..., pβ,N−1}, correspondingly, where M is the number of dominant points of the curve α(s) and N that of the curve β(s). Dominant points are numbered clockwise and are considered the separation points. Therefore, the curves α(s) and β(s) can be split up into curve segments. Let Sα and Sβ denote these two sets of curve segments, i.e.
Sα = {α0,1, α1,2, ..., αM−1,0} (modulo M),
Sβ = {β0,1, β1,2, ..., βN−1,0} (modulo N)    (3)
where αi,j (i, j = 0, 1, ..., M−1, modulo M) and βu,v (u, v = 0, 1, ..., N−1, modulo N) are the curve segments of the curves α(s) and β(s), respectively. In this notation, dominant point i is the start of αi,j and j its end, and the dominant points u, v have the same meanings for βu,v.
Fig.1 Partial curve βj+1,j−1 is translated so that the dominant points i and j overlap.
Fig.2 Input image after binarization, which includes two objects.
2.3 Partial Curve Matching
The partial curve matching includes the extraction of candidates of the longest consecutive matched curve (abbreviated as LCMC) and the decision of the LCMC.
2.3.1 LCMC Candidates Extraction
For the dominant point i on curve α(s), the curve segment αi−1,i terminates at i and αi,i+1 starts from i, clockwise, where αi−1,i, αi,i+1 ∈ Sα (i = 0, 1, ..., M−1). Similarly, for the dominant point j on curve β(s), the curve segment βj+1,j terminates at j and βj,j−1 starts from j, counterclockwise, where βj+1,j, βj,j−1 ∈ Sβ (j = 0, 1, ..., N−1). For simplicity, these two pairs of curve segments are denoted as αi−1,i+1 and βj+1,j−1 and are called partial curves. Then let us consider the matching of αi−1,i+1 and βj+1,j−1. To perform the matching of these two partial curves, βj+1,j−1 is translated so that the dominant point j included in βj+1,j−1 overlaps the dominant point i included in αi−1,i+1 (see Fig. 1). The displacement along the X-axis is the difference of the x-coordinates of the dominant point i on α(s) and j on β(s). Likewise, the displacement along the Y-axis is obtained from their y-coordinates. Next, βj+1,j−1 is rotated around the dominant point j, clockwise, from 0° to 360° by 1° per step. Let E(αi−1,i, βj+1,j)θ express the matching error when βj+1,j−1 is rotated θ°, which is defined by:
E(αi−1,i, βj+1,j)θ = ∬D1 dx dy + ∬D2 dx dy    (4)
where D1 is the region enclosed between the arcs αi,i+1 and βj,j−1, and D2 is the region enclosed between the arcs αi−1,i and βj+1,j, as shown in Fig. 1. When βj+1,j−1 is rotated from 0° to 360°, the minimal value of E(αi−1,i, βj+1,j)θ is called the minimal matching error between αi−1,i+1 and βj+1,j−1, and is denoted by E(αi−1,i+1, βj+1,j−1)min. The corresponding rotation angle is denoted by θ(αi−1,i+1, βj+1,j−1)min. In the following, if there is no confusion, they are simply written as Emin and θmin. Emin is simply obtained as follows
Emin = min{E0, E1, ..., E359}    (5)
If Emin is small compared with the threshold value TE1, the partial curves αi−1,i+1 and βj+1,j−1 are said to be "matched". Then the clockwise neighbor of αi−1,i+1, i.e., the curve segment αi+1,i+2, is added to the end of αi−1,i+1, the counterclockwise neighbor of βj+1,j−1, i.e., βj−1,j−2, is added to the end of βj+1,j−1, and the matching procedure related
above is performed again. Note here that the threshold value is dynamically increased by TE1, i.e., the threshold value is set at 2TE1. If E(αi−1,i+2, βj+1,j−2)min is smaller than 2TE1, and the absolute value of the difference of θ(αi−1,i+1, βj+1,j−1)min and θ(αi−1,i+2, βj+1,j−2)min is smaller than the threshold value Tθ/2, the partial curves αi+1,i+2 and βj−1,j−2 are said to be "matched". This repetition continues until "unmatched" curve segments are encountered. Likewise, this procedure is also applied to the counterclockwise neighbors of αi−1,i+1 and the clockwise neighbors of βj+1,j−1. Here, it is worth noting that these new curve segments are added to the beginning of αi−1,i+1 and βj+1,j−1. The repetition stops when "unmatched" curve segments are encountered. These consecutive curve segments form an LCMC candidate. The above procedure is applied to all curve segments in Sα and Sβ. LCMC candidates whose numbers of curve segments are greater than the threshold value TL are passed to the next step for the decision of the LCMC.
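The exhaustive 1°-step rotation search of equation (5) can be sketched as follows. Here `error_at` stands for any routine evaluating the area-based matching error of equation (4) at a given rotation angle; it is a hypothetical callback of our own, not something defined in the paper:

```python
def min_matching_error(error_at):
    """E_min = min{E_0, E_1, ..., E_359} (eq. 5): rotate the partial curve
    clockwise from 0 to 359 degrees in 1-degree steps, evaluate the matching
    error at each angle, and return the smallest error with its angle."""
    errors = [error_at(theta) for theta in range(360)]
    e_min = min(errors)
    return e_min, errors.index(e_min)
```

The returned angle plays the role of θmin when the candidate is later grown segment by segment.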
2.3.2 LCMC Decision
For the k-th LCMC candidate (k = 0, 1, ..., K, where K is the total number of LCMC candidates), its minimal matching error is recalculated by overlapping the centers of the corresponding consecutive curve segments and rotating the curve segments belonging to Sβ from θmin − Tθ to θmin + Tθ by 1° per step. The LCMC candidate whose minimal matching error is smallest is considered the LCMC, at which the two curves match optimally.
3. Digital Implementation
To implement the above algorithm, it is necessary to define the digital curve, digital curvature and digital matching error. In Cartesian coordinates, the coordinate functions x(s) and y(s) of a closed curve are digitally expressed by a set of Cartesian grid samples {xi, yi} for i = 1, 2, ..., N (modulo N). The digital curvature at point i on the curve can be calculated by
ci = Δxi Δ²yi − Δyi Δ²xi
(6)
where Δ is the difference operator and Δ² is the second-order difference [3]. The digital Gaussian function in [6] with a window size of K = 3 is employed here to generate smoothing functions at various values of σ, and it is given by
h[0] = 0.2261
h[1] = 0.5478
h[2] = 0.2261
(7)
where h[1] is the center value and Σk h[k] = 1 (k = 0, 1, 2). This digital function has been mentioned in [7] and [8] as the best approximation of the Gaussian distribution. For digital functions with higher values of σ, the above K = 3 function is used in a repeated convolution process: a digital smoothing function with 2(j+1)+1 taps is created by repeating the self-convolution j times. Note here that the digital Gaussian smoothing function for the largest σ must have a window size no larger than the perimeter arc length N of the curve. A multiscale representation of the digital boundary curve from σ = 0 to σmax can be constructed by the digital function defined above. Therefore, the multiscale digital curvature can be obtained according to equation (6). Then, for each point i, a searching procedure is applied to detect the local maximum of absolute curvature. Points on the curve with local maxima of absolute curvature are considered dominant points. For any two objects A and B, let α and β represent their digital boundary curves. Then α and β can be expressed by the sets of digital points on the boundary curves, i.e., α = {(x0, y0), (x1, y1), ..., (xM−1, yM−1)} and β = {(x0, y0), (x1, y1), ..., (xN−1, yN−1)}. Their dominant points can be obtained by the method related just above. Their segmentation can be performed according to the method related in section 2.2, and the digital curve segments are also expressed by equation (3). Hereafter, if there is no confusion, the digital curve segments are also simply called curve segments. Next, the matching procedure is applied to these digital curve segments. The matching error shown in equation (4) is digitally computed by
E(αi−1,i, βj+1,j)θ = Σ (p=0, q=0 to max{P,Q}) [SΔ(p,p+1,q) + SΔ(q,q+1,p+1)] + Σ (u=0, v=0 to max{U,V}) [SΔ(u,u+1,v) + SΔ(v,v+1,u+1)]    (8)
where P, Q, U and V are the numbers of digital points of the curve segments βj+1,j, αi−1,i, βj,j−1 and αi,i+1, respectively. As shown in Fig. 1, SΔ(p,p+1,q) is the area of the triangle formed by the points p, p+1 and q. Similarly, SΔ(q,q+1,p+1), SΔ(u,u+1,v) and SΔ(v,v+1,u+1) have the same meanings. Here, it is worth noting that if the number of digital points included in a curve segment is less than that of the curve segment it is compared with, its start point or terminal point is employed to correspond to the remaining points of the other curve segment, so that the calculation of equation (8) can continue. Which of them is used is decided by the tracing direction along the curve segment (clockwise or counterclockwise). For example, in the region D2 of Fig. 1, the digital matching error is calculated, starting from the overlapped dominant point i (or j), by taking out one point from each of the curve segments αi−1,i and βj+1,j and putting them into the first term of
equation (8). Because the number of points included in the curve segment αi−1,i is less than that in βj+1,j, the start point of αi−1,i, i.e., the point i−1, is employed to continue the calculation for the remaining points of βj+1,j. This calculation stops at the terminal point of βj+1,j, i.e., the point j+1. The same procedure is also applied to the region D1. The partial digital curve βj+1,j−1 is rotated from 0° to 360° by 1° per step. After each rotation, the matching error is computed. The minimal matching error can then be obtained according to equation (5).
Table 1. LCMC candidates obtained
No.   Curve segments of the object on left   Curve segments of the object on right   Overlapped dominant points
0     6-5-4-3-2                              17-18-19-0-1                            3 left and 1 right
1     6-5-4-3-2-1-0                          0-1-2-3-4-5-6                           2 left and 5 right
2     6-5-4-3-2-1-0                          0-1-2-3-4-5-6                           1 left and 6 right
3     9-8-7-6-5-4-3-2                        1-2-3-4-5-6-7-8                         3 left and 2 right

4. Application and Experiment
The application model assumed for this algorithm is that of a robot mounted with a camera assembling machine parts. The experiment is performed with a real image.
Fig.3 Extracted boundary curves, detected dominant points and the final LCMC.
Fig.2 shows the input image after binarization, which includes two objects. Fig.3 shows the extracted digital boundary curves and the detected dominant points (marked by small symbols), numbered clockwise. The four LCMC candidates are listed in Table 1. The first LCMC candidate is decided as the LCMC and is shown in Fig.3 by the thicker lines. Fig.4 shows the assembled result after the object on the left is translated 162 dots along the X-axis and −32 dots along the Y-axis, and rotated 90° clockwise. The values of TE1, Tθ and TL are 80, 30° and 4, respectively.
5. Conclusions and Future Works
This paper proposed an algorithm for partial curve identification in 2-D space. The application model assumes that a robot mounted with a camera assembles machine parts, for which the connection relationships among the parts are necessary. The problem of object connection relationships can be simplified to the problem of partial curve identification. Real images were employed to test this algorithm. From the experimental results, it is clear that the algorithm is effective.
Fig. 4 Assembled result.
This experiment employed images of objects without texture. If the objects have some texture, boundary curve detection will become more difficult. Moreover, if the input image includes more than three objects, a partial curve of one object may match the partial curves of multiple objects. In this case, it is necessary to employ the image values near the matched curves to decide the optimally matched partial curve. Further, in a vision-based robot assembly system, this alone is not enough; it must be combined with other 3-D information. All of these are left for future work.
REFERENCES
[1] R. M. Haralick, "Digital step edges from zero crossing of second directional derivatives," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-6, no. 1, pp. 58-68, Jan. 1984.
[2] R. Mehrotra, K. R. Namuduri and N. Ranganathan, "Gabor filter-based edge detection," Pattern Recognition, vol. 25, no. 12, pp. 1479-1494, 1992.
[3] A. Rattarangsi and R. T. Chin, "Scale-based detection of corners of planar curves," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-14, no. 4, pp. 432-449, Apr. 1992.
[4] P. Zhu and P. M. Chirlian, "On critical point detection of digital shapes," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-17, no. 8, pp. 737-748, Aug. 1995.
[5] I. Sekita, T. Kurita and N. Otsu, "Complex autoregressive model for shape recognition," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-14, no. 4, pp. 489-496, Apr. 1992.
[6] P. J. Burt, "Fast filter transforms for image processing," Comput. Vision, Graphics & Image Processing, vol. 16, pp. 20-51, 1981.
[7] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. COM-31, no. 4, Apr. 1983.
[8] P. Meer, E. S. Baugher and A. Rosenfeld, "Frequency domain analysis and synthesis of image pyramid generating kernels," IEEE Trans. Patt. Anal. Mach. Intell., vol. PAMI-9, no. 4, pp. 512-522, Apr. 1988.
A fast active contour algorithm for object tracking in complex background
Chun Leung Lam, Shiu Yin Yuen
E-mail: [email protected], [email protected]
Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Abstract- The active contour is a powerful tool for object tracking. However, the existing models are only applicable to tracking on simple images. Based on the idea of the original greedy algorithm, we present a fast greedy tracking algorithm to face the problem of tracking on complex real images. We demonstrate the algorithm on tracking complex-shaped objects on complex backgrounds.
1. Introduction
2-D object tracking is a hot research topic in dynamic scene analysis. Different methods can be used: (1) image region based tracking algorithms [1]; (2) feature point based tracking algorithms [2]; and (3) line segment based tracking algorithms [3]. In general, these methods require an explicit definition of a dynamic model of the moving objects. Many objects cannot be described by simple geometric shapes (e.g. circle, ellipse) but need to be represented with complex contours. In order to model complex natural shape contours, Kass et al. [4] introduced the idea of the active contour (deformable contour). Active contour models have been successfully applied in computer vision problems such as optimal contour detection [5,6,7] and simple shape object tracking on a uniform background [7,8]. D. J. Williams and M. Shah [9] proposed a greedy active contour algorithm which is fast and stable. In section 2, tracking results using their greedy algorithm are shown, which is useful in summarizing the difficulties of tracking by active contour. In section 3, a new "greedy tracking algorithm" is proposed and the results of using the proposed algorithm to track objects with complex shapes in complex background are given. Finally, a conclusion is given in section 4.
2.
Object tracking by greedy algorithm
Suppose the contour Ct of a moving object M at time t is known. Ct can be used as an approximate contour of the target object at time t+1, provided that there is only a slight change in the target object. In order to find the best description Ct+1 from Ct, an adjustment process is necessary to fine-tune the shape of the contour using the information available in image frame It+1.
2.1. Classical active contour approach
The snake equations provide flexible tracking mechanisms that are driven by simulated forces derived from time-varying images. Let the contour be represented by v(s) = (x(s), y(s)). The classical active contour approach involves minimizing an energy function defined by
Esnake = ∫01 [Eint(v(s)) + Eext(v(s))] ds    (1)
for the active contour to move onto the object border. The internal energy is written as

Eint = (1/2) (α(s)|v'(s)|² + β(s)|v''(s)|²)    (2)

which serves as a smoothness constraint. The external energy Eext consists of the external constraints and the image force.
2.2. Greedy algorithm
The greedy algorithm is a fast active contour algorithm proposed by D. J. Williams and M. Shah in 1991 [9]. Since the algorithm is both stable and fast, it is suitable as an adjustment process for object tracking. The quantity being minimized by the greedy algorithm is
E = ∫ [α(s) nor(Econt) + β(s) nor(Ecurv) + γ(s) nor(Eimage)] ds    (3)
and the energy terms are defined by
Econt,i = | d̄ − |vi − vi−1| |    (4)
Ecurv,i = |vi−1 − 2vi + vi+1|²    (5)
where d̄ represents the average distance between contour points in the previous iteration cycle, and nor(E) represents a normalizing function with respect to the energy values of the neighboring pixels. The values of Econt, Ecurv and Eimage are all normalized to values between 0 and 1, and α = 1, β = 0 or 1 (depending on whether a corner is assumed at that location) and γ = 1.2 in the greedy algorithm.
(a) original image (b) result of greedy algorithm, requiring 0.16 s (c) result of greedy tracking algorithm, requiring 0.21 s
Figure 1. Translated square and circle. (20 points used, with window size 3×3)
Figure 2. 6 degree/frame rotating cup. (31 points used, with window size 3×3)
The complexity of the greedy algorithm is O(nm²) for a contour having n points, which allows the active contour to move to any point in a neighborhood window of size m×m at each iteration. (The full greedy algorithm will not be listed in this paper; for more information, please refer to [9].) The results of using the greedy algorithm for object tracking are given in Fig. 1b, 2b and 3b. In Fig. 1b, the circle and the square are both moved slightly to the right and downward. We find that two of the contour points on the right edge of the square are attracted by the border of the circle when the contour in Fig. 1a is used as the initial contour of Fig. 1b. In Fig. 2b, the cup has been rotated. The upper and lower portions of the arm contour are attracted by the internal structure and the background structural noise, respectively. Although the rough shape of the body of the cup can be successfully extracted, the extracted border on the arm of the cup is not satisfactory. Fig. 3b is the result of tracking the human body silhouette in two consecutive images using the greedy algorithm. We can see that only the regions near the shoulders and the left foot are extracted correctly. The results show that the active contour model is sensitive to both internal structure and background structural noise. Therefore, the active contour model can only be applied to track a simple object moving on a uniform background.
Figure 3. 0.5 frame/s walking man. (68 points used, with window size 5×5)
3. Greedy tracking algorithm
To incorporate more shape information into the model and to reduce both the influence of internal structure and background structural noise when the method is applied to object tracking problems, we propose a "greedy tracking algorithm". The proposed model aims to maintain the shape and gray level values along the contour, in order to minimize the influence of both the internal structure and the background structural noise, due to the complexity of the target object and the background. The structure of the proposed algorithm is similar to the original greedy algorithm.
The algorithm is an iterative process. In each turn, each contour point is allowed to move to the neighboring location which has the lowest energy level, and the computational complexity is O(nm²). However, the definition of the energy function being minimized is different from that of the greedy algorithm. This gives the model more desirable behavior when applied to the object tracking problem. The form of the energy function being minimized in the proposed algorithm is similar to equation (3). The internal energy of the contour is contributed by the sum of a continuity force and a curvature force. Let the contour be represented by {vi} = {x(i), y(i)}, where i = 0, 1, ..., n−1 and x(i), y(i) are pixel coordinates. The continuity energy is redefined as

Econt,i = | |ui| / |ui+1| − |uti| / |uti+1| |,  where ui = vi − vi−1 in image It+1 and uti = vti − vti−1 in image It    (6)

(Note that all index arithmetic is modulo n.) The internal continuity energy is so defined since we allow the points to be unevenly distributed on the contour and we only want to maintain the approximate distribution of the contour points on a newer image frame. The internal curvature energy is redefined as

Ecurv,i = | Ci − Cti |    (7)

with Ci defined from the unit vectors ûi and ûi+1 of ui and ui+1.
Again, the curvature at each point is maintained by minimizing the curvature energy. The curvature vector Ci at point i has a magnitude equal to the square of the difference of the unit vectors ûi+1 and ûi, and a direction parallel to the vector ûi × ûi+1. The continuity and curvature energies are so defined since it is assumed that the shape of the contour does not change much in a short time gap; the result of minimizing the continuity and the curvature energy together is that the approximate shape of the contour across any two consecutive frames is maintained. Note that the originally assigned contour points can be of any shape (including low and high curvature points); this is a desirable property since many real objects have sharp curvature points, like corners. In the original active contour model, Econt,i (equation (4)) and Ecurv,i will be zero when the contour points are equally spaced and the curvature is zero. Thus the original active contour model is biased towards i) equally spaced contour points and ii) low curvature. Moreover, corners have to be specified using the special method of setting β = 0. This is undesirable since during a motion, i) maintaining equally spaced feature points may not be the best strategy to represent a shape most faithfully (compactly); ii) an a priori assumption of low curvature is not particularly realistic in a shape representation; iii) this problem is even more pronounced since the motion and view changes may continuously produce points of sharp curvature as new occluding contours come into view. On the contrary, our method does not suffer from such anomalies. Econt,i (equation (6)) and Ecurv,i (equation (7)) will be zero merely when i) the spacing ratio of consecutive contour points and ii) the curvature do not change between frames.
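The frame-to-frame internal energies can be sketched as below. This is our reading of equations (6) and (7); in particular, reducing Ecurv to the change of the curvature magnitude |Ci| between frames is an assumption, and the helper names are ours:

```python
import math

def _unit(p, q):
    # Unit vector from point p to point q.
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    return (dx / d, dy / d)

def cont_energy(prev, cur, i):
    """E_cont,i (eq. 6): change of the spacing ratio |u_i|/|u_{i+1}|
    between the previous and the current contour (indices modulo n)."""
    def ratio(pts):
        n = len(pts)
        a = math.dist(pts[(i - 1) % n], pts[i % n])
        b = math.dist(pts[i % n], pts[(i + 1) % n])
        return a / b
    return abs(ratio(cur) - ratio(prev))

def curv_energy(prev, cur, i):
    """E_curv,i (eq. 7), taken here as the change of |C_i|, where |C_i| is
    the squared difference of the unit vectors û_i and û_{i+1}."""
    def mag(pts):
        n = len(pts)
        u = _unit(pts[(i - 1) % n], pts[i % n])
        w = _unit(pts[i % n], pts[(i + 1) % n])
        return (w[0] - u[0]) ** 2 + (w[1] - u[1]) ** 2
    return abs(mag(cur) - mag(prev))
```

Both terms vanish for a rigid (or uniformly scaled) motion of the contour, which is exactly the invariance argued for above.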
Also, from equation (7), it is clear that the method does not have to take special care of corner contour points, and the appearance or disappearance of a corner point can be gradually accounted for by the equation. On the other hand, the external energy is defined as

Eext = |GbIt(v) − GbIt+1(v)| − |∇It+1(v)|²    (8)

where GbI is the Gaussian-blurred image of I. Minimizing the external energy causes the contour point to move to a new location where the approximate gray level value is maintained and the contrast is high. The proposed algorithm is listed below:
Greedy tracking algorithm
Input:   images It, It+1, contour Ct of image It
Output:  adjusted contour Ct+1 of image It+1

α = β = 1, γ = 1.2, ptsmoved = 0;
do {
    for i = 0 to n {                    // note: all index arithmetic is modulo n;
                                        // the first point is processed twice
        for j = 0 to m−1
            for k = 0 to m−1 {
                calculate Econt,i(j,k), Ecurv,i(j,k), Eimage,i(j,k);
                nor(Econt,i(j,k)) = Econt,i(j,k) / MAX(Econt,i(j,k));
                nor(Ecurv,i(j,k)) = Ecurv,i(j,k) / MAX(Ecurv,i(j,k));
                nor(Eimage,i(j,k)) = Eimage,i(j,k) / MAX(Eimage,i(j,k));
            }
        for j = 0 to m−1                // m×m = size of neighborhood
            for k = 0 to m−1
                Ei(j,k) = α nor(Econt,i(j,k)) + β nor(Ecurv,i(j,k)) + γ nor(Eimage,i(j,k));
        locate smallest Ei(j,k);
        move vi to the location with smallest Ei(j,k);
        ptsmoved += 1;
    }
} while ptsmoved < threshold;

Note that the first contour point v0 is processed twice (as in the greedy algorithm), since the point vn−1 has not yet been updated when v0 is processed. Reprocessing the point v0 helps to make its behavior more like that of the other points. Results of using the greedy tracking algorithm for object tracking are given in Fig. 1c, 2c, 3c and 4. (Note that we use the same weight settings as in the greedy algorithm, α = β = 1, γ = 1.2. In contrast to the original greedy algorithm, we have no need to set β = 0 for corner points.) We use gray level images of size 256×256 pixels. The processing time, the number of points and the window size used for each image (using a PC 486DX33) are listed under each picture. Fig. 2c, 4a and 4b are the results of tracking a rotating cup at time frames 2, 5 and 10, respectively, which demonstrates that the proposed algorithm is successful in tracking rigid objects against a complex background provided that the motion is slow. Fig. 3c is the result of tracking the human body silhouette in two consecutive images. The upper portion of the body is correctly extracted, which shows that the model is applicable to tracking a complex-shaped non-rigid body. However, the right foot is lost, because the proposed algorithm intends to maintain the shape of the contour across two consecutive image frames. This demonstrates that the algorithm only allows a small change of the shape of the contour across different frames.
Figure 4. 9° rotating cup image sequence: Fig. 2a(I1) → 2c(I2) → 4a(I5) → 4b(I10). (31 points used, with window size 3×3)
4. Conclusions
A fast "greedy tracking algorithm" is proposed in this paper. The proposed model aims to maintain the shape and gray level values along the contour, in order to minimize the influence of the internal structure and background structural noise due to either the surface texture complexity of the target object or the background. The proposed algorithm has been applied to tracking objects in complex real images, and the results show that the model is quite successful in tracking rigid or non-rigid objects provided the changes are slight. Also, the tracking results are satisfactory even when the shape of the object is complex. On the other hand, although maintaining the shape of the contour is helpful in tracking complex objects, it limits the flexibility of the model since it only allows slight changes to occur. Equivalently, the method requires that successive frames be closely spaced in time. This is a compromise which we have to make in our approach.
References
1. D. S. Kalivas, A. Sawchuk, "A Region Matching Motion Estimation Algorithm", CVGIP: Image Understanding, Vol. 54(2), 275-288, 1991.
2. S. K. Sethi, R. Jain, "Finding Trajectories of Feature Points in a Monocular Image Sequence", IEEE Trans. PAMI, Vol. 9(1), 56-73, 1987.
3. R. Deriche, O. Faugeras, "Tracking Line Segments", Image and Vision Computing, Vol. 8(4), 261-270, 1990.
4. M. Kass, A. Witkin, D. Terzopoulos, "Snakes: Active Contour Models", Proc. Int. Conf. Comp. Vis., 259-268, 1987.
5. A. A. Amini, T. E. Weymouth, R. C. Jain, "Using Dynamic Programming for Solving Variational Problems in Vision", IEEE Trans. PAMI, Vol. 12(9), 855-867, 1990.
6. C. A. Davatzikos, J. L. Prince, "An Active Contour Model for Mapping the Cortex", IEEE Trans. Medical Imaging, Vol. 14(1), 65-80, 1995.
7. D. Geiger, A. Gupta, L. A. Costa, J. Vlontzos, "Dynamic Programming for Detecting, Tracking, and Matching Deformable Contours", IEEE Trans. PAMI, Vol. 17(3), 294-302, 1995.
8. F. Leymarie, M. D. Levine, "Tracking Deformable Objects in the Plane Using an Active Contour Model", IEEE Trans. PAMI, Vol. 15(6), 1993.
9. D. J. Williams, M. Shah, "A Fast Algorithm for Active Contours and Curvature Estimation", CVGIP: Image Understanding, Vol. 55(1), 14-26, 1992.
The Two-Point Combinatorial Probabilistic Hough Transform for Circle Detection (C2PHT)
J. Y. Goulermas and P. Liatsis
Control Systems Centre, Dept. of EE&E, UMIST, PO Box 88, Manchester M60 1QD, UK
e-mail: {goulerma/panos}@csc.umist.ac.uk
A novel Hough Transform (HT) for circle detection, the C2PHT, is presented. While other Combinatorial Probabilistic HTs reduce the generation of redundant evidence by sampling point-triples, the C2PHT achieves a much higher reduction in two ways. Firstly, by using the edge gradient information, it allows point-tuples to define circles and consequently decreases the sampling complexity from O(N³) to O(N²). Secondly, the transformation is conditional, that is, not all tuples are eligible to vote. The evidence is gathered in a very sparse parameter space, so that peak recovery is readily despatched. The result is high speed, increased accuracy and very low memory requirements.
INTRODUCTION
The Hough Transform (1,2,3) (HT) is a well-known robust generic method for detecting patterns of points in binary image data whose number of instances, size and spatial positions are unknown. It exhibits considerable immunity to problematic object boundaries, such as ones partially obliterated by occlusion, overlapping effects and breakages, as well as boundaries distorted by interference noise, ambient illumination and object motion. Many circle-HT variations have been proposed in the past (4,5,6,7,8,9) to extend the standard scheme. However, all generate substantial redundant evidence, as every single feature point is assumed to (potentially) belong to a circle instance. This results in slowing down the transformation process and obstructing peak recovery, without a significant reduction of the memory resources.
The Combinatorial Probabilistic HTs (CPHTs) (10,11,12,13,14,15,16) are a distinct class of HTs which attempt to reduce the generation of redundant evidence via transforming minimal subsets of points, that is, the least number of points (3 in the case of circle detection) required to define a shape instance. In this way, they force feature data to vote for the most probable, as opposed to all possible, shape instances, so that the cast votes accumulate densely in areas associated with the highest probabilities of instances. Nevertheless, the probabilistic nature of the CPHTs, in combination with the high combinatorial complexity of sampling, may cause a severe detection inefficiency. From all the possible triples of points, if valid ones (i.e. triples that define a real instance) are not quickly sampled, then the algorithm falters and accumulates ineffectual evidence that does not reflect objectively the instances depicted in the feature space F. This effect becomes more dramatic when the amount of noise in F or the number of points of the non-circular objects in the scene is high. Also, because no gradient information is used, it is very likely that false evidence is accumulated by coincidental co-circularities of points that do not reside on true circular arcs. In an attempt towards preserving the advantages of the standard CPHTs, while reducing dramatically their combinatorial sampling requirements, we have developed the C2PHT (2-point CPHT). This algorithm employs the gradient orientation to define potential circles using point-tuples. In this way it samples F², instead of F³, and transforms the sampled tuples to very small sets of parameter vectors in the parameter space P.
THE TUPLE-BASED TRANSFORMATION SCHEME
Let A ≡ (xA, yA) be a point in F with gradient orientation φA and trigonometric measures cos(φA) and sin(φA), denoted by cA and sA respectively.
Assuming that A belongs to a circle, then its centre resides on the line segment L_A, as shown in Fig. 1, defined parametrically by:

L_A ≡ L_A(r_j) = (x_A − r_j·c_A, y_A − r_j·s_A),  ∀ r_j ∈ [r_min, r_max]  (1)
where [r_min, r_max] is the predefined radii range of the sought circle instances. For another point B (defined similarly to A) to belong to the same circle as A, L_A and L_B have to intersect at the same centre point. Therefore, there should be a value r_j = R that simultaneously satisfies the parametric constraints of the two line segments. However, in order to take into account inaccuracies in the estimation of the gradient orientation, we assume that φ_A and φ_B are subject to an error of ±ε. Then, candidate centres suggested by A and B reside within the two triangular shaded areas of Fig. 1 and are restricted to lie along the perpendicular bisector P of AB. Each point suggests centres corresponding to different segments of P. Specifically, A suggests centres comprising the
segment A1A2, while B suggests centres on B1B2. The common part of the two triangular areas is the polygon STUV and its intersection with P is the line segment A1B1. Therefore, the centres of probable circles that contain both A and B are the points of A1B1. This segment may need to be truncated, so that the related radii are within the predefined range [r_min, r_max].
Figure 1: The tuple-based transformation scheme.

The endpoints of A1A2 and B1B2 can be calculated as follows. P is described by:

y − y_M = m · (x − x_M)  (2)

where M = (x_M, y_M) = ((x_A + x_B)/2, (y_A + y_B)/2) is the foot of P and m = −Δx/Δy its slope. We define L_A± and L_B± as the four line segments passing through A and B with orientations φ_A ± ε and φ_B ± ε, where c_A±, s_A± (and c_B±, s_B±) are their trigonometric measures, respectively, as shown in Fig. 1. L_A± and L_B± are parametrically defined similarly to L_A in Eq. 1, with r_j being the free variable. The next step is to find the four intersection points A1, A2, B1 and B2 of the four line segments L_A± and L_B± with P. A1 is, for instance, the intersection point L_A+(r_j) ∩ P. Then A1 is expressed in terms of a radius value R_A+, being its distance from A at angle φ_A + ε. Hence, by substituting Eq. 1 into Eq. 2, we obtain:

y_A − R_A+ · s_A+ − y_M = m · (x_A − R_A+ · c_A+ − x_M)  ⇒  R_A+ = (m · Δx − Δy) / (2 · (m · c_A+ − s_A+))  (3)
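For concreteness, Eqs. 2-3 can be transcribed directly. The sketch below is ours, not the paper's: the helper name and interface are assumptions, and we assume y_A ≠ y_B so that the bisector slope m is finite.

```python
import math

def candidate_radii(A, B, phi, eps):
    """Radii R (Eq. 3) at which the two perturbed gradient lines through
    point A (orientations phi - eps and phi + eps) intersect the
    perpendicular bisector P of the chord AB (Eq. 2)."""
    (xA, yA), (xB, yB) = A, B
    dx, dy = xA - xB, yA - yB
    m = -dx / dy                         # slope of the bisector P (Eq. 2)
    radii = []
    for s in (-eps, +eps):
        c, sn = math.cos(phi + s), math.sin(phi + s)
        # R = (m*dx - dy) / (2*(m*c - sn)), from substituting Eq. 1 into Eq. 2
        radii.append((m * dx - dy) / (2.0 * (m * c - sn)))
    return tuple(radii)
```

Applying the same helper to B with φ_B gives R_B±; ordering the four radii locates the common segment A1B1.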
Thus, the coordinates of A1, A2, B1 and B2 can be calculated from their corresponding radius values R_A± and R_B±. By ordering these values, we can readily locate the common segment A1B1 of A1A2 and B1B2, if any. A predicate Γ is defined to test the validity of a tuple (A,B) as:

Γ(A,B) = true ⇔ {A1B1 = A1A2 ∩ B1B2 ∩ P1P2 ≠ ∅}  (4)

where P1, P2 ∈ P, d(A,P1) = max(d(A,P), r_min), d(A,P2) = r_max, with d(·) being the Euclidean metric distance. It is obvious that there is no need to select points such that d(A,B) > 2·r_max. The HT of a tuple (A,B) is then conditionally defined as the set of cells T(A,B), for which Γ(A,B) is true, as:

T(A,B) = {(a,b,r) ∈ P, ∀(a,b) ∈ A1B1 : r = d(A,(a,b))}  (5)

where (a,b) and r are the centre and radius of each suggested circle. The number of circles voted for by a valid tuple (A,B) equals the discrete length of A1B1, which in turn depends on the predefined angular error ε.

The C2PHT Algorithm
The accumulator S is implemented as a dynamically allocated 3-D a-b-r sparse array [17], where a set of linked lists for the three parameters stores the cell coordinates and their vote counts. The incrementation strategy employed is adaptive and depends on the gradient magnitude [18]. This enables strong edge primitives to outbalance noisy ones, which usually have lower edge magnitudes, thus resulting in a reduction of noise
in S. Let v(A,B) be the vote value cast by (A,B), bounded by the predefined values V_min and V_max in order to avoid extreme vote counts. To prevent weaker edges from being completely masked by stronger ones, the scaling of the votes is done exponentially (with exponent 1/2). Then, the gradient magnitudes of the participating points are combined as:

v(A,B) = V_min + (V_max − V_min) · [ (√(G(A)·G(B)) − G_min) / (G_max − G_min) ]^(1/2)  (6)
where G_min and G_max are the minimum and maximum gradient magnitude values in F, and G(A) and G(B) are the gradient magnitude values of A and B. The implemented C2PHT circle detection mechanism works as follows. Initially, a small percentage of n (= 5%·N) points is uniformly randomly selected from the N points of F and stored in a list L. Next, all possible K = n·(n−1)/2 tuple combinations from L are generated and, from these, all K' tuples (A,B) which enable the predicate Γ, together with their produced votes T(A,B), are recorded in S. Once the K' valid tuples are transformed, the instance suggested by the highest peak in S is template-matched against F. If there is a "hit", i.e. a circle instance is detected, the corresponding peak cell is set to zero. Then the points constituting the detected instance are removed from F and L. Following that, the algorithm template-matches the instance suggested by the second highest peak in S and proceeds accordingly. In the case of a "miss", i.e. the peak is false, a predefined number of points t (= 20%·n) are removed from L. These are selected to be the ones which participated most frequently in invalid tuples; then L is refreshed with an equal number of t new points from F. This heuristic operates efficiently as a penalising mechanism against any likely deficiencies in the initial sampling. To economise processing time, the entire L does not have to be reaccumulated. Instead, the removed points are "de-transformed" from S. This means that each removed point is re-evaluated via Γ and T with each of the other points in L and negative votes are cast to S. Following refreshing, only the tuples that contain at least one newly inserted point (potentially) record evidence in S, which is added to the already existing evidence. The algorithm reiterates in the same way until three consecutive misses occur; it is then assumed that no more circle instances exist in the image.
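Equation 6 and the detection loop above can be sketched together as follows. This is a simplified skeleton under stated assumptions: all names are ours, `transform` stands in for the conditional tuple transform of Eqs. 4-5, `template_match` for the template-matching step, and the miss-handling is reduced to discarding the false peak rather than refreshing L.

```python
import math
import random
from collections import Counter
from itertools import combinations

def vote_value(gA, gB, g_min, g_max, v_min=0.0, v_max=500.0):
    """Adaptive vote of Eq. 6: the geometric mean of the two gradient
    magnitudes, normalised to [0, 1] and square-rooted so that weaker
    edges are not completely masked by stronger ones."""
    t = (math.sqrt(gA * gB) - g_min) / (g_max - g_min)
    return v_min + (v_max - v_min) * math.sqrt(max(t, 0.0))

def c2pht_detect(points, transform, template_match,
                 sample_frac=0.05, max_misses=3, seed=0):
    """Skeleton of the C2PHT loop. `transform(A, B)` returns the (a, b, r)
    cells voted for by a valid tuple, or [] when the predicate rejects it;
    `template_match(peak, F)` verifies a candidate circle against F."""
    rng = random.Random(seed)
    F = list(points)
    n = max(2, int(sample_frac * len(F)))
    L = rng.sample(F, n)                 # uniform random sampling list
    S = Counter()                        # sparse 3-D accumulator
    for A, B in combinations(L, 2):
        for cell in transform(A, B):
            S[cell] += 1                 # conditional voting
    detected, misses = [], 0
    while misses < max_misses and S:
        peak, count = S.most_common(1)[0]
        if count <= 0:
            break                        # no evidence left
        del S[peak]
        if template_match(peak, F):
            detected.append(peak)        # "hit": record the instance
        else:
            misses += 1                  # "miss": penalise and continue
    return detected
```

In the full algorithm each vote would be weighted by `vote_value` and a miss would also refresh 20% of L, de-transforming the removed points from S.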
Figure 2: Artificial image (A) and underwater bubble image (B), with detected circular objects.

RESULTS
For all experiments we used parametrisations of radius range r_min = 5 and r_max = 45 (inclusive), a predefined angular error ε, and vote boundaries of V_min = 0 and V_max = 500. Fig. 2.A illustrates a 300×240 artificial grey-scale test image (which produces a feature space F of size N = 8,374 pixels, via edge detection and thresholding) with the 19 detected circles superimposed on the original scene. The dramatic reduction in evidence generation achieved by the C2PHT is manifested by the following. The total number of tuples in F is D = 35,057,751 and out of these the predicate Γ enables only D_Γ = 299,443. This clearly shows that the proposed conditional tuple/gradient-based voting generates votes from at most a fraction of 0.85% of the D total elements in F^2. Comparing this to the complexity of a no-gradient CPHT, we can see that the total number of triples would be 97,834,497,124, which far exceeds the transformation elements of the C2PHT; in addition, every single triple is potentially capable of generating votes. As described before, a complete enumeration of all D tuples is unnecessary. The sampling list L is of size n = 418 and this produces K = 87,153 tuples in total. Since the sampling is spatially uniform, we assume
that L generates votes from K' ≈ 0.85%·K ≈ 740 valid tuples at each transformation cycle. The average length of the voting pattern T(A,B) of a valid tuple (A,B), for the given r_min, r_max and ε, was found to be ≈3.6 cells, which gives at most ≈2,664 non-zero accumulator cells (in practice this number is smaller, since votes tend to overlap and converge to give rise to accumulator peaks). Fig. 2.B shows the detection results in a real-world, sharply illuminated underwater bubble image, with 29 detected bubbles.

CONCLUSIONS
A new combinatorial probabilistic Hough Transform, the C2PHT, is proposed for circular object detection. The novelty of the algorithm is that it employs gradient information and makes use of point-tuples as minimal point subsets for defining circles, thus reducing the combinatorial complexity from O(N^3) to O(N^2). In addition, it introduces the concept of conditional voting, whereby a higher proportion of relevant evidence is generated by allowing only the "valid" elements of the feature space to vote. The C2PHT easily incorporates gradient error estimates and is based on computationally simple equations to perform fast evaluation and vote generation of the sampled tuples. The produced transform space is very sparse and hence only simple accumulator structures with very small memory requirements are needed. The algorithm was tested with synthetic and real-world scenes of circular objects and yielded fast and accurate results. Overall, the C2PHT manages to balance very well the trade-off between memory demands, simultaneous circle detection, high reduction of generated evidence and detection speed.
REFERENCES
1 P. V. C. Hough, "Methods and means for recognising complex patterns", US Patent 3069654, 1962.
2 J. Illingworth and J. Kittler, "A survey of the Hough transform", Computer Vision Graphics and Image Processing, pp. 87-116, vol. 44, 1988.
3 D. H. Ballard, "Generalising the Hough transform to detect arbitrary shapes", Pattern Recognition, pp. 111-122, vol. 13, no. 2, 1981.
4 C. Kimme, D. H. Ballard and J. Slansky, "Finding circles by an array of accumulators", Communications of the ACM, pp. 120-122, vol. 18, no. 2, 1975.
5 G. Gerig, "Linking image-space and accumulator space: a new approach for object-recognition", 1st Int. Conf. Computer Vision, London, pp. 112-117, 1987.
6 H. K. Yuen, J. Princen, J. Illingworth and J. Kittler, "Comparative study of Hough transform methods for circle finding", Image and Vision Computing, pp. 71-77, vol. 8, no. 1, 1990.
7 A. N. Jain and D. B. Krig, "A robust Hough technique for machine vision", Proc. Vision 86, Detroit, Michigan, pp. 475-487, 1986.
8 J. Illingworth, J. Kittler and J. Princen, "Shape detection using the adaptive Hough transform", NATO ASI Series, Real-time Object Measurement and Classification, ed. A. K. Jain, Springer-Verlag, Berlin Heidelberg, pp. 119-142, vol. F42, 1988.
9 Z. Li, M. A. Lavin and R. J. LeMaster, "Fast Hough transform: a hierarchical approach", Computer Vision Graphics and Image Processing, pp. 139-161, vol. 36, 1986.
10 L. Xu, E. Oja and P. Kultanen, "A new curve detection method: Randomised Hough transform (RHT)", Pattern Recognition Letters, pp. 331-338, vol. 11, 1990.
11 V. F. Leavers, D. Ben-Tzvi and M. B. Sandler, "A dynamic combinatorial Hough transform for straight lines and circles", 5th Alvey Vision Conf., Reading, pp. 163-168, 1989.
12 V. F. Leavers, "The dynamic generalised Hough transform", 1st ECCV Conf., Antibes, France, 1990.
13 V. F. Leavers, "The dynamic generalised Hough transform: its relationship to the probabilistic Hough transforms and an application to the concurrent detection of circles and ellipses", Computer Vision Graphics and Image Processing, pp. 381-398, vol. 56, no. 3, 1992.
14 N. Kiryati, Y. Eldar and A. Bruckstein, "A probabilistic Hough transform", Pattern Recognition, vol. 24, no. 4, pp. 303-316, 1991.
15 J. R. Bergen and H. Shvaytser, "A probabilistic algorithm for computing Hough transforms", Journal of Algorithms, pp. 639-656, vol. 12, no. 4, 1991.
16 A. Califano, R. M. Bolle and R. W. Taylor, "Generalised neighbourhoods: a new approach to complex parameter feature extraction", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 192-199, 1989.
17 M. Brown, "Peak-finding with limited hierarchical memory", 7th Int. Conf. Pattern Recognition, pp. 246-249, Montreal, Canada, 1984.
18 J. Y. Goulermas, P. Liatsis and M. Johnson, "Real-time intelligent vision systems for process control", Proc. 4th IChemE Conf. Advances in Process Control, York, pp. 69-76, Sep. 27-28, 1995.
Modified Rapid Transform Features in an Information Symbols Recognition System
J. Turan, *K. Fazekas, L. Kovesi and M. Kovesi
Department of Radioelectronics, Technical University of Kosice, Park Komenskeho 13, 04021 Kosice, Slovakia
Tel./Fax: +42 95 6335692, E-mail: TURAN@CCSUN.TUKE.SK
*Department of Microwave Telecommunications, Technical University of Budapest, Goldmann Ter 3, 1111 Budapest, Hungary
Tel./Fax: +36 12 043289, E-mail: T-FAZEKAS@NOV.MHT.BME.HU
Abstract
Various transformations have been suggested as a solution to the problem of the high dimensionality of the feature vector and long computation time. Transforms which do not change with cyclic shifts in the sequence are called translation invariant. Fast translation invariant transforms are a valuable tool for pure shape-specific feature extraction in pattern recognition problems [1]. In the field of pattern recognition, and also in scene analysis, the class of fast translation invariant transforms known as certain transforms (CT) [2], based on the original rapid transform (RT) [3], is well known. More recently, the modified rapid transform (MRT) [5] was introduced, which can distinguish many more patterns from one another than the original RT can. The MRT was presented to break undesired invariances of the RT which lead to a loss of information about the original pattern. In this paper, the application of the fast translation invariant modified rapid transform (MRT) in the feature extraction stage of an Information Symbols recognition system is described. Experimental results are given of applying the proposed recognition system to the recognition of Airport Passenger Orientation Symbols and Meteorological Symbols, including the dependence of the recognition efficiency on the number of selected features and on noise.
1. Introduction
Transformation methods can be used to obtain alternative descriptions of signals. These alternative descriptions have many uses, such as classification, redundancy reduction, coding, etc., because some of these tasks can be better performed in the transform domain [1]. Various transformations have been suggested as a solution to the problem of the high dimensionality of the feature vector and long computation time. More recently, the modified rapid transform (MRT) [5] was presented to break undesired invariances of the rapid transform (RT) [3]. In this paper, a new method of recognising Information Symbols using the MRT will be presented. We apply the MRT in the feature extraction stage of the Information Symbols recognition process. Some properties of the RT and MRT will first be reviewed, then the new method of recognition of Information Symbols will be presented. Finally, experimental results will be given on applying the proposed pattern recognition method to the recognition of Airport Passenger Orientation Symbols and Meteorological Symbols, including the dependence of the recognition efficiency on the number of selected features and on noise.
2. Modified rapid transform
Transforms which do not change with cyclic shifts in the sequence are called translation invariant. Fast translation invariant transforms are a valuable tool for pure shape-specific feature extraction in pattern recognition problems. The transforms may be used to extract features of one- or two-dimensional patterns which are invariant under cyclic permutations, to characterise objects independently of their position. In the field of pattern recognition, and also in scene analysis, the class of fast translation invariant transforms known as certain transforms (CT) [2] is well known; they are based on the original rapid transform (RT) [3] but with other choices of pairs of simple commutative operators. The RT results from a minor modification of the Walsh-Hadamard transform (WHT). The signal flow graph for the RT is identical to that of the WHT, except that the absolute value of the output of each stage of the iteration is taken before feeding it to the next stage. This is not an orthogonal transform, as no inverse exists. With the help of additional data, however, the signal can be recovered from the transform sequence, i.e. an inverse rapid transform can be defined [4]. The RT has some interesting properties, such as invariance to cyclic shift, to reflection of the data sequence, and to slight rotation of a two-dimensional pattern. It is applicable to both binary and analogue inputs and it can be extended to multiple dimensions. More recently, the modified rapid transform (MRT) [5] was introduced, which can distinguish many more patterns from one another than the original RT can. The MRT was presented to break undesired invariances of the RT which lead to a loss of information about the original pattern. This is achieved by combining the RT with preprocessing steps using an
asymmetric neighbour operator ϑ. This operator is used to break the undesirable invariances while keeping the shift invariance of the MRT. Using symbolic notation, the MRT can be introduced as shown in Fig. 1.

Fig. 1: Signal graph of the MRT (k preprocessing steps f₀(X(i), X(i+1), X(i+2)) followed by the RT butterfly stages with operators f₁, f₂).

The signal graph of the MRT (Fig. 1) results from the signal graph of the RT by adding, in general, k preprocessing steps x' = ϑx. Each step maps the element x(i) of the input vector x to the element x'(i) of the vector x' by working on the elements x(i), x(i+1) and x(i+2):
x'(i) = f₀(x(i), x(i+1), x(i+2))  (1)
It is important that the operator f₀ be asymmetric, because we want to destroy the invariance of the RT under reflection. The operator f₀ may be realised in the following simple manner:
x'(i) = f₀(x(i), x(i+1), x(i+2)) = x(i) + |x(i+1) − x(i+2)|  (2)
The transform process of the MRT (Fig. 1), identical to the transform process of the RT, requires N = 2^n input pixels, where n is a positive integer. Each column of the transform process in Fig. 1 corresponds to a particular computational step; n steps are required. In general, the variables x^(r) in any column (r) are calculated from the variables x^(r−1) in the preceding column (r−1) by
x^(r)(i + 2jS) = f₁(x^(r−1)(i + 2jS), x^(r−1)(i + (2j+1)S))
x^(r)(i + (2j+1)S) = f₂(x^(r−1)(i + 2jS), x^(r−1)(i + (2j+1)S))  (3)
where the operators f₁, f₂ for the MRT (or RT) are

f₁(a,b) = a + b;  f₂(a,b) = |a − b|  (4)
and S = 2^(n−r); t = 2^(r−1); i = 0, …, S−1; j = 0, …, t−1, where x ≡ x^(0) are the input data (pixels) and x^(n) ≡ ξ = MRT{x} are the spectral coefficients of the MRT. The MRT can be applied in all areas where the RT (or any transform from the class CT) can be used. Some undesired invariances of the RT can be destroyed by applying only one preprocessing step. Experiments with the use of the MRT [5,6] in character recognition showed that the MRT can distinguish many more patterns from one another than the RT or the Fourier power spectrum.
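The complete MRT of Eqs. 1-4 is small enough to state directly. Below is a sketch in Python with our own naming (one cyclic preprocessing step by default, per Eq. 2; the butterfly indexing follows Eq. 3):

```python
def preprocess(x):
    """One asymmetric preprocessing step (Eq. 2), indices taken cyclically:
    x'(i) = x(i) + |x(i+1) - x(i+2)|."""
    n = len(x)
    return [x[i] + abs(x[(i + 1) % n] - x[(i + 2) % n]) for i in range(n)]

def rapid_transform(x):
    """Rapid transform (RT): the WHT butterfly of Eqs. 3-4 with the
    absolute value taken after each stage; len(x) must be a power of 2."""
    x = list(x)
    N = len(x)
    assert N and N & (N - 1) == 0, "input length must be 2**n"
    span = N // 2                          # S halves at every stage
    while span >= 1:
        y = [0] * N
        for j in range(0, N, 2 * span):    # blocks of 2*S elements
            for i in range(span):
                a, b = x[j + i], x[j + i + span]
                y[j + i] = a + b              # f1(a, b) = a + b
                y[j + i + span] = abs(a - b)  # f2(a, b) = |a - b|
        x = y
        span //= 2
    return x

def mrt(x, k=1):
    """Modified rapid transform: k preprocessing steps, then the RT."""
    for _ in range(k):
        x = preprocess(x)
    return rapid_transform(x)
```

On x = [1, 2, 3, 4, 5, 0, 0, 0], the MRT of any cyclic shift of x is identical, while the reflection of x yields a different spectrum — exactly the invariance the preprocessing step is designed to break.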
3. The Information Symbols Recognition System Model
The recognition system is simulated on a digital computer using the program package CT-CAD [7]. It contains the following sub-systems (Fig. 2):
1. The original digital picture preprocessing system CSPO-III is used to accept the physical input picture and transduce it into a measurable matrix. CSPO-III divides a visual pattern into small elements and, after suitable preprocessing, produces an N×N matrix over the binary field; an element becomes 1 or 0 depending upon whether it is black or white.
2. The MRT processor, according to its function, may also be called a feature extractor. A 2-D MRT of all binary prototypes is taken in this stage. Then feature selection is carried out in the MRT "spectral" domain on various bases (maximum value of spectral coefficients, variance zonal sampling and interclass standard deviation).
3. The selected MRT features of the binary pictures (symbols) are fed into the memory during the teaching process. Thus the memory unit learns the a priori knowledge of each class before the system can be used to make any decision. In the recognition process the selected MRT features are fed into the classifier, which discriminates each pattern
(symbol) and assigns a category (a class) to it by some decision rule. We use a simple classifier based on the cross responses d_kl between two different patterns from classes k and l, defined in the next section.
Fig. 2: The MRT recognition system

4. Recognition of Information Symbols
The proposed Information Symbols recognition system was tested on two classes of selected symbols:
1. Airport Passenger Orientation Symbols (the class consists of M = 11 independent symbols) (Fig. 3).
Fig. 3: The Airport Passenger Orientation Symbols (Z1-Z11)

2. Meteorological Symbols (the class consists of M = 16 independent symbols) (Fig. 4).
Fig. 4: The Meteorological Symbols (M1-M16)

We implemented feature extraction with the MRT for both sets of Information Symbols. In general, the efficiency of the feature extraction can be assessed by the system confusion matrix D = {d_kl; k, l = 1, …, M}, where the d_kl are cross responses (the distances between any two different symbols k, l in the feature space) and M is the number of classes, i.e. the number of different symbols. The confusion matrix can be calculated in two steps, as follows:
A. All M prototypes of Information Symbols, each represented by a binary N×N matrix x_k(i,j), with i, j = 1, …, N; k = 1, …, M and M = 11 or M = 16, are transformed to the MRT transform domain:

ξ_k(i,j) = τ{x_k(i,j)},  where τ ≡ MRT  (5)

B. The cross response d_kl^(1) between two different symbols from classes k and l is defined as follows:
d_kl^(1) = Σ_{i,j=1}^{N} |ξ_k(i,j) − ξ_l(i,j)|  (6)
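Equation 6 transcribes directly over precomputed MRT spectra. The sketch below uses our own function names and represents each 2-D spectrum as nested lists:

```python
def cross_response(xi_k, xi_l):
    """Cross response d_kl of Eq. 6: city-block distance between the
    2-D MRT spectra of two symbol prototypes (nested lists)."""
    return sum(abs(a - b)
               for row_k, row_l in zip(xi_k, xi_l)
               for a, b in zip(row_k, row_l))

def confusion_matrix(spectra):
    """System confusion matrix D = {d_kl} over all M prototype spectra."""
    M = len(spectra)
    return [[cross_response(spectra[k], spectra[l]) for l in range(M)]
            for k in range(M)]
```

The diagonal of D is zero; large off-diagonal entries indicate well-separated symbol classes.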
The results of the experiments on the dependence of the recognition efficiency on the number of selected features and on the influence of noise are shown in Tab. 1. A set of 165 symbols was used for testing and teaching purposes for the Airport Passenger Orientation Symbols and a set of 240 symbols was used for testing and teaching purposes for the Meteorological Symbols; the testing set used in Tab. 1 contains 5 noised symbols for each Airport Passenger Orientation Symbol and for each Meteorological Symbol.

Tab. 1: Recognition of Airport Passenger Orientation Symbols and Meteorological Symbols
The results of the experiments may be summarised as follows:
A. Only one preprocessing step in the MRT signal graph is sufficient to destroy the undesired invariances and to significantly improve the capability of the MRT to distinguish many more patterns from one another than the original RT can.
B. Even though a very simple classifier was used, a recognition efficiency of 96%-100% can be obtained by selecting only a couple of features (0.1%-5% of the number of MRT coefficients) in the MRT spectral domain, even if the symbols are corrupted by (1%-4%) noise.
5. Conclusion
We applied the MRT in the feature extraction stage of an Information Symbols recognition system. Experiments with the recognition of two classes of symbols (Airport Passenger Orientation Symbols and Meteorological Symbols) demonstrate that, even though a very simple classifier was used, a very high recognition efficiency can be obtained by selecting only a couple of features in the MRT spectral domain, even if the symbols are corrupted by noise.
References
[1] Chmurny, J. - Turan, J.: Two-dimensional Fast Translation Invariant Transforms and Their Use in Robotics. Electronic Horizon, Vol. 15, No. 5, 1984, 211-220.
[2] Wagh, M. D. - Kanetkar, S. V.: A Class of Translation Invariant Transforms. IEEE Trans. on Acoustics, Speech and Signal Proc., Vol. ASSP-25, No. 3, 1977, 203-205.
[3] Reitboeck, H. - Brody, T. P.: A Transformation with Invariance Under Cyclic Permutation for Application in Pattern Recognition. Inf. and Control, Vol. 15, 1969, 130-154.
[4] Turan, J. - Chmurny, J.: Two-dimensional Inverse Rapid Transform. Computers and Art. Intelligence, Vol. 2, No. 5, 1983, 473-477.
[5] Fang, M. - Hausler, G.: Modified Rapid Transform. Applied Optics, Vol. 28, No. 6, 1989, 1257-1262.
[6] Turan, J.: Recognition of Printed Berber Characters Using Modified Rapid Transform. Journal on Communications, Vol. XLV, 1994, 24-27.
[7] Turan, J. - Kovesi, L. - Kovesi, M.: CAD System for Pattern Recognition and DSP with Use of Fast Translation Invariant Transform. Journal on Communications, Vol. XLV, 1994, 85-89.
Image Data Processing in a Flying Object Velocity Optoelectronic Measuring Device
Jan MIKULEC, Vaclav RICNY, Technical University Brno, Czech Republic
Abstract: This paper aims at the simulative verification of a new optoelectronic method for the measurement of flying objects' track velocity. The paper also shows the block diagram of the adapter which enables simulations using the mentioned algorithms on PC computers.
1. Introduction
The trajectory of an aircraft, or of other flying means, in space is not a simple one. Due to the influence of the motion of air masses with velocity v_W, the direction of the flight is not identical with the aircraft axis. Sufficiently accurate determination of the so-called track velocity v_T and of the track angle is a very demanding procedure. At present it is performed by methods exploiting terrestrial or orbital devices. The advantages of the described optoelectronic method should be the relatively low price of the measuring device with respect to the attainable measuring accuracy, as well as the fact that it is an autonomous and radio-passive method.
2. Measurement of the track velocity vector (TVV) using light-sensitive CCD sensors
The total time derivative of the two-dimensional implicit brightness function of the earth's surface B(x,y,t), projected onto the image plane of the sensor (see Fig. 1), is expressed by the relation
dB(x,y,t)/dt = [∂B(x,y,t)/∂x] · dx/dt + [∂B(x,y,t)/∂y] · dy/dt = [∂B(x,y,t)/∂x] · v_Tx + [∂B(x,y,t)/∂y] · v_Ty  (1)

It can be seen from Fig. 1 that the TVV components v_Tx and v_Ty are in some relation (due to the geometry of the applied optical system) with the components v_Ox and v_Oy in the image plane, according to the relations

v_Ox = −(F_o/h) · v_Tx  and  v_Oy = −(F_o/h) · v_Ty,  (2)
where the meaning of the symbols h and F_o is evident from Fig. 1.

Fig. 1: Principle of the method

If we succeed in determining the time derivative dB/dt and both the directional derivatives ∂B/∂x and ∂B/∂y, there remain two unknown variables in the equation of the total differential, namely the searched TVV components v_Ox and v_Oy. It is evident therefrom that these components can be found by solving a system of two independent equations of total differentials of the earth's surface brightness function, i.e. of the brightness function at two different places of the earth's surface. A measuring point (MP) represents an arbitrary geometrical arrangement of the photosensitive layer (of pixels) of the CCD sensor (a two-line sensor or part of an area sensor),
from which it is possible to approximate, by a suitable algorithm, the directional derivatives in two different directions. It is possible to determine the approximation of the time derivative from the change of the magnitude of the voltage signal samples of the pixels at different time instants. Generally, the position of an MP in space is totally arbitrary, but it is necessary to know its rotation angle with respect to the aircraft's axis or to another relative coordinate. For accurate determination of the TVV, two strategies are offered. The first one consists of an as accurate as possible approximation of the discretised brightness function, followed by calculation of the time and two directional derivatives of the continuous total differentials by means of the discrete values of the pixel signals in the measuring point. Then it could be sufficient, for the determination of the TVV components at a given instant, to measure in only two MPs. The second variant supposes a simple approximation of the time and directional derivatives in a great number of MPs with a simple arrangement of pixels. By a great number of combinations of equation-system solutions in different pairs of MPs, it is possible to obtain sets of not very accurate values of the TVV components, which are consequently processed by a suitable statistical method. The first variant seems to be less suitable due to the stochastic character of the brightness function and with respect to the machine time of computation of sufficiently accurate algorithms. Moreover, the resulting accuracy is influenced by the quantisation error of the A/D conversion. Therefore, the second variant of TVV determination has been chosen and it was verified by computer simulation. Fig. 2 shows a representation of the part of the CCD sensor moving over the discretised brightness function B(x,y). In correspondence with the chosen strategy of TVV determination, the time and both directional derivatives have been approximated by the simplest relations
dB(x,y)/dt ≈ (U₁(n+1,m) − U₁(n,m)) / τ,  (3)

∂B(x,y)/∂x ≈ (U₁(n,m+1) − U₁(n,m)) / δ,  (4)

∂B(x,y)/∂y ≈ (U₂(n,m) − U₁(n,m)) / δ,  (5)

Fig. 2: Shift of the MP after period τ
where n = 0, 1, 2, … is the serial number of the measurement, m = 1, 2, 3, … is the serial number of the pixel in the CCD structure, and δ is the size of the quadratic pixel. The time and directional derivatives in relation (1) are replaced by the time and directional differences (3) to (5). An MP in this case contains only three pixels, with output signals U₁(n,m), U₁(n,m+1) and U₂(n,m).
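Differences (3)-(5) and the two-MP system from Eq. 1 can be sketched as follows. The helper names are ours; the image-plane components v_Ox and v_Oy are obtained with Cramer's rule, assuming the two MPs yield linearly independent gradients:

```python
def mp_derivatives(U1_t0, U1_t1, U2_t0, m, tau, delta):
    """Finite-difference approximations at pixel m of one measuring point:
    U1_t0/U1_t1 are the first CCD line at consecutive instants, U2_t0 the
    second line, tau the sampling period, delta the quadratic pixel size."""
    dB_dt = (U1_t1[m] - U1_t0[m]) / tau        # Eq. 3
    dB_dx = (U1_t0[m + 1] - U1_t0[m]) / delta  # Eq. 4
    dB_dy = (U2_t0[m] - U1_t0[m]) / delta      # Eq. 5
    return dB_dt, dB_dx, dB_dy

def solve_tvv(d1, d2):
    """Solve dB/dt = (dB/dx)*vOx + (dB/dy)*vOy for two measuring points;
    d1, d2 are (dB_dt, dB_dx, dB_dy) triples."""
    t1, x1, y1 = d1
    t2, x2, y2 = d2
    det = x1 * y2 - x2 * y1                    # must be non-zero
    vOx = (t1 * y2 - t2 * y1) / det
    vOy = (x1 * t2 - x2 * t1) / det
    return vOx, vOy
```

In the second strategy described above, `solve_tvv` would be evaluated over many MP pairs and the resulting set of component velocities passed to the statistical (dynamic filtration) stage.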
3. Computer simulation of the measuring system activity
For the verification of the features of the designed system and for the determination of the achievable accuracy of the measurement, computer simulation has been used, which models the optoelectronic transformation of the brightness distribution B(x,y) in the CCD sensors into the voltage samples U₁(n,m) and U₂(n,m). After the simulation of the A/D conversion of these samples, all the calculations are performed in numerical form in such a manner that the computing algorithms used could be exploited in a real measuring device.
Especially important is the choice of computing algorithms for the elimination of results of partial measurements with great deviation, caused by the choice of the second strategy of TVV determination, according to chapter 2. The application of dynamic filtration by means of state estimation of the measured object was shown to be the best.

4. Dynamic filtration of the set of component velocities
The chosen filtration exploits the fact that the aircraft's motion is inertial due to its great motive power and therefore, despite ignorance of all the influences acting on its motion, it is possible to estimate its next state. The kernel of the applied dynamic filter consists of two models, namely the aircraft model and that of the measuring system. The aircraft model performs its own inertial motion and it is possible to compute, by the known equation of that motion, a more or less accurate estimation of the instantaneous quantities of the aircraft's state vectors S_Ox and S_Oy. The task of the model of the measuring system lies in the transformation of the aircraft's state estimation into an estimation of the measured quantities (the TVV components), which is consequently compared with the values measured by the measuring device on the real object (aircraft). The obtained deviation is then used, after amplification, for the correction of the motion of the aircraft model. If the amplification factor is chosen appropriately, the accuracy of the aircraft's state estimation will be improved. However, the parameters of the aircraft model depend upon the estimated values and they have to approximate the values of the real object (aircraft). The filtration is performed on the set of component velocities v_Oxi and v_Oyi. All the elements of that set have been obtained by the same system of measurement using the evaluation of the brightness distribution of different parts of the earth's surface.
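The paper gives no explicit filter equations, so the sketch below reduces the described predict/compare/correct cycle to a constant-gain scalar filter over one component velocity. This is an assumption standing in for the authors' implementation, not a transcription of it:

```python
def dynamic_filter(measured, gain=0.2, tau=1.0):
    """Constant-gain predictor-corrector over one TVV component: the model
    performs inertial (constant-velocity) motion, the amplified measured
    deviation corrects the model velocity, and the position estimate s
    accumulates the filtered velocity."""
    s, v = 0.0, measured[0]           # initial state estimate
    estimates = []
    for z in measured:
        s += v * tau                  # predict: inertial motion of the model
        v += gain * (z - v)           # correct: amplified measured deviation
        estimates.append(v)
    return estimates, s
```

With the gain chosen appropriately, the filtered velocities scatter far less than the raw partial measurements, which is the behaviour reported in Fig. 3.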
Fig. 3: Dependence of relative errors of measured values upon the serial number of measurement n

Fig. 3 represents the dependences of the relative error δ(Voy) and of the relative error of the estimate of the position vector component after the dynamic filtration,

δ(Sox) = (1/N) Σ_{n=0}^{N} (δ_n(Vox) + 1) − 1.   (6)
5. Hardware for the simulation of the measuring system function
Computer simulation for verification of the developed measuring system and its attainable results has been performed not only with image data generated by computer, but also using digitized video signals of a camera with a special double-line CCD sensor (2 x 128 pixels). The camera is able to scan moving photographs of the real earth's surface.

Fig. 4: Block diagram of the sensing unit and PC add-on card
The block diagram of the sensing unit and PC add-on card is shown in Fig. 4. This adapter enables the amplification, A/D conversion and storage of both output video signals (video 1 and video 2) of the sensor (sample frequency approx. 1 MHz, 8-bit representation) into the RAM or hard disk of a standard PC-compatible computer. The data can then be processed using the algorithms mentioned above.

6. Conclusion
Computer simulation of the designed optoelectronic method for measuring the aircraft's TVV demonstrated that high measurement accuracy can be obtained and that a real measuring device can feasibly be designed. The simulation enables the parameters of the algorithms applied for TVV determination to be optimized and the machine time for the computations to be quantified and minimized.

References
[1] RICNY, V., MIKULEC, J.: Measuring Flying Object Velocity with CCD Sensors. IEEE Aerospace and Electronic Systems, Vol. 9, No. 6, June 1994, pp. 3-6.
[2] JURIK, R.: PC Add-on Card for the Double-line CCD Sensor. Proceedings of the 6th National Scientific Conference "Radioelektronika 96". Faculty of Electrical Engineering and Computer Science, TU Brno, 1996, pp. 95-96.
Session F: TEXTURE ANALYSIS
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Rotation Invariant Texture Classification Schemes using GMRFs and Wavelets
Robert Porter* and Nishan Canagarajah*
Image Communications Group, Centre for Communications Research, University of Bristol, UK.

Abstract
Many texture classification schemes suffer from a number of drawbacks. They require an excessively large image area for texture analysis, use a large number of features to represent each texture and are often computationally very demanding. Furthermore, few classification schemes have the ability to maintain a high classification rate for textures that have undergone a rotation. In this paper, we present two new rotation invariant texture classification schemes based on Gaussian Markov random fields and the wavelet transform. These schemes offer a high classification performance on textures at any orientation using significantly fewer features and a smaller area of analysis than most existing schemes.
1. Introduction
Texture classification is a difficult but important area of image analysis with a wide variety of applications ranging from remote sensing and crop classification to medical diagnosis. A number of approaches to this problem have been proposed over recent years including stochastic models such as Gaussian Markov Random Fields (GMRFs) [1] and autoregression [2, 3], statistical analysis methods [4] and spatial frequency based techniques [5, 6] amongst many others. However, many of the existing methods require a large number of features to describe each texture which can lead to an unmanageable size of feature space [4]. Furthermore, the feature extraction techniques employed are often computationally very demanding [4] and require an excessively large image area for the analysis [4, 6]. This is clearly undesirable if only small texture samples are available or if the features are to be applied to a segmentation problem requiring high resolution. Another drawback of the majority of classification schemes is their inability to maintain a high classification rate when the textures for classification have undergone a rotation [5]. Here, two new classification schemes are proposed, employing features extracted using either wavelet analysis or Gaussian Markov random field modelling on a small area of the image. It is shown that these schemes require significantly fewer features than most others and provide high performance rotation invariant texture classification.
2. Proposed Schemes
2.1 The Wavelet Transform
The first approach derives features from a 3-level wavelet decomposition of a small area (16x16) of the image. Fig. 1(a) shows the 10 main wavelet channels resulting from such a decomposition. A feature vector made up of the average energies within these channels was successfully employed in segmenting textured images in [7]. However, the HH channels in each level of decomposition tend to contain the majority of noise in the image and were found to degrade the performance when used for texture classification. Therefore, only the remaining seven channels were chosen to provide features for texture classification (the numbered channels in Fig. 1(b)). The energy in each of the chosen wavelet channels is calculated to create a seven-dimensional feature vector for texture classification. The energy of a wavelet channel is given simply by the mean magnitude of its wavelet coefficients, i.e. e_cn, the energy in the nth channel, is given by:

e_cn = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} |x(i,j)|,   (1)
where the channel is of dimensions M by N, i and j are the rows and columns of the channel and x is a wavelet coefficient within the channel. Unfortunately, these features are not rotation invariant, since different features are used to represent the texture's horizontal and vertical frequency components. Rotation invariance can be achieved by combining the horizontal and vertical frequency components to form single features. Hence, the pairs of diagonally opposite LH and HL channels in each level of decomposition are grouped together to produce four main frequency or scale bands in the proposed scheme, as illustrated in Fig. 1(c). The energy in each of the four chosen bands is calculated (using equation 1) to create a four-dimensional feature vector which is then used in the classification algorithm. This approach is thus based entirely on the composition of spatial frequencies within the texture and is not heavily dependent on the texture's directionality. Although this can have disadvantages in distinguishing between textures of very similar spatial frequency, it provides a robust rotation invariant set of features for texture classification.

* e-mail: [email protected]
* e-mail: [email protected]
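The four-band feature extraction described above can be sketched as follows; the unnormalized averaging Haar filter and the pooling of the LH/HL energies by averaging are assumptions made for illustration (the paper does not state which wavelet basis is used):

```python
import numpy as np

def haar2(x):
    """One level of a 2-D Haar-style decomposition (illustrative sketch)."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail (discarded as noisy)
    return ll, lh, hl, hh

def energy(c):
    return float(np.mean(np.abs(c)))      # equation (1): mean coefficient magnitude

def rotation_invariant_features(patch):
    """Four-band feature vector from a 3-level decomposition of a 16x16 patch:
    LH and HL energies are pooled at each level, HH is discarded, and the
    final low-pass band supplies the fourth feature."""
    feats = []
    ll = np.asarray(patch, dtype=float)
    for _ in range(3):
        ll, lh, hl, _hh = haar2(ll)
        feats.append((energy(lh) + energy(hl)) / 2.0)
    feats.append(energy(ll))
    return feats
```

A constant patch yields zero detail energies and only the low-pass feature, as expected for a texture with no spatial-frequency content.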
Figure 1 - (a) Ten main channels of a 3-level wavelet decomposition of an image; (b) Wavelet channels used to produce features for texture classification; (c) Grouping of wavelet channels to form the 4 bands used to produce rotation invariant features.
2.2 Gaussian Markov Random Fields
GMRFs have been shown to perform well both in texture classification [1] and image segmentation. Here, the texture can be represented as a set of zero mean observations,

y(s), s ∈ Ω, Ω = {s = (i, j): 0 ≤ i, j ≤ M − 1},   (2)

for an M x M lattice. The GMRF model assumes the observations obey the following equation [1]:

y(s) = Σ_{r ∈ Ns} θ_r y(s + r) + e(s),   (3)

where Ns is the neighbour set, θ_r is the GMRF parameter for neighbour r and e(s) is a stationary Gaussian noise sequence. The neighbour set is assumed to be symmetric:

θ_r = θ_{−r}, for all r ∈ Ns.   (4)
The GMRF parameters and the variance, v, of the noise source can be estimated for a given texture using the least squares approach [1] and are often successfully employed as features for texture classification. However, these features are not rotation invariant, since each pair of neighbours can only represent the texture in a single direction. It was found that in order to achieve rotation invariance, the neighbour set should be circularly symmetric, so that each GMRF parameter depends on neighbours in all directions. The neighbour sets for the 1st, 2nd and 3rd order circular GMRFs are shown in Fig. 2. The grey levels of neighbours which do not fall exactly in the centre of pixels can be estimated by interpolation. This model is the GMRF equivalent of the autoregressive models in [2] and [3], but was found to give a high classification performance without the need for multiresolution analysis [3] and is thus more computationally efficient. For the third order circular GMRF, just three parameters exist for the three sets of circularly symmetric neighbours. The features used for texture classification comprise these three parameters and the variance parameter, extracted using the least squares approach from a 16x16 area of the image. The third order GMRF is chosen to balance a high performance with a small number of features.
Figure 2 - Neighbour sets for 1st, 2nd and 3rd order circular GMRFs.
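A least-squares fit of the circular model can be sketched as below; approximating the three circular rings by integer-grid neighbour sets, instead of the interpolated off-pixel positions the paper describes, is a simplification made for illustration:

```python
import numpy as np

# Illustrative least-squares estimation of a 3rd order circular GMRF.
# Each regressor is the sum over one circularly symmetric ring of neighbours;
# integer-grid rings stand in for the interpolated circular rings.
RINGS = [
    [(-1, 0), (1, 0), (0, -1), (0, 1)],     # 1st ring (nearest neighbours)
    [(-1, -1), (-1, 1), (1, -1), (1, 1)],   # 2nd ring (diagonals)
    [(-2, 0), (2, 0), (0, -2), (0, 2)],     # 3rd ring
]

def circular_gmrf_features(patch):
    """Return [theta_1, theta_2, theta_3, v] for an image patch."""
    y = np.asarray(patch, dtype=float)
    y = y - y.mean()                         # zero-mean observations, eq. (2)
    h, w = y.shape
    rows, targets = [], []
    for i in range(2, h - 2):                # interior pixels only
        for j in range(2, w - 2):
            rows.append([sum(y[i + di, j + dj] for di, dj in ring)
                         for ring in RINGS])
            targets.append(y[i, j])
    x = np.array(rows)
    t = np.array(targets)
    theta, *_ = np.linalg.lstsq(x, t, rcond=None)  # least squares fit of eq. (3)
    v = float(np.mean((t - x @ theta) ** 2))       # noise variance estimate
    return [*theta, v]
```

Applied to a 16x16 area this yields the four-dimensional feature vector (three ring parameters plus the variance) used by the proposed scheme.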
3. Classification Results
Sixteen 256x256 Brodatz textures [8] were used to test the performance of the features. One sample image of each texture was used to provide several 16x16 sub-images with which to train the classification algorithm. A further 7 sample images of each texture were presented to the algorithm in a random order as unknown textures for classification. A minimum distance classifier was employed (using the Mahalanobis distance [6]) to perform the actual classification. Training and classification were first performed on the original textures, producing the first column of results in Table 1. The training set was then presented at angles of 0, 30, 45 and 60 degrees and the textures for classification at 20, 70, 90, 120, 135 and 150 degrees, yielding the second column of results in Table 1. The classification results for the two proposed rotation invariant schemes were compared to those using features from the traditional 3rd order GMRF and from the wavelet transform without the combination of channels. Table 1 summarises the results. Although the third order GMRF parameters give 100% correct classification when the textures are presented at their original orientation, they perform very poorly on the rotated textures, classifying only 45.8% of the samples correctly (see confusion matrix in Fig. 3a). This is due to the strong directional dependence of the parameters in the traditional GMRF model. The proposed circular GMRF model uses a circularly symmetric neighbour set to remove this directional dependence, resulting in a high classification performance both for the textures at their original orientations (93.8%) and for the rotated textures (95.1%). The confusion matrix in Fig. 3(b) illustrates this performance for the rotated textures. Misclassifications tend to occur either for visually very similar textures (e.g. paper and sand) or for textures with a high level of directionality which cannot be identified using a circular model (e.g. wood).
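The minimum distance classification step can be sketched as follows; summarising each class by its training mean and using a single pooled covariance matrix for the Mahalanobis distance is an assumption (the paper does not state how the covariance is estimated):

```python
import numpy as np

# Sketch of a minimum-distance classifier with the Mahalanobis metric.
# Each class is represented by the mean of its training feature vectors;
# a pooled covariance over all classes is an illustrative assumption.

def train(features_by_class):
    """features_by_class: {label: (num_samples, num_features) array}."""
    means = {c: np.mean(f, axis=0) for c, f in features_by_class.items()}
    centred = np.vstack([f - means[c] for c, f in features_by_class.items()])
    pooled = np.atleast_2d(np.cov(centred.T))
    return means, np.linalg.inv(pooled)

def classify(x, means, inv_cov):
    """Assign x to the class with the smallest Mahalanobis distance."""
    def d2(m):
        diff = np.asarray(x, dtype=float) - m
        return float(diff @ inv_cov @ diff)
    return min(means, key=lambda c: d2(means[c]))
```

In the experiments above, x would be the four- or seven-dimensional feature vector extracted from a 16x16 sub-image.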
The wavelet-based features using seven channels of the wavelet transform also have a strong directional dependence. These features give a high classification performance for the original textures (99.1%), but a mediocre performance for the rotated textures (86.5%, see Fig. 3c). By combining the directionally dependent wavelet channels, as in the proposed scheme, a high level of rotation invariance is achieved giving a correct classification rate of 95.5% for the original textures and 95.8% for the rotated textures. The scheme's performance for the rotated textures is illustrated in the confusion matrix in Fig. 3(d). The misclassifications occur only on the highly directional textures such as wood and raffia. This is because the directional information is lost when the wavelet channels are combined. For each of the proposed schemes, there is a slight degradation in their performance on the original textures compared to the non-rotation invariant approaches. This is due to the loss in directional information on making the schemes rotation invariant.
4. Conclusion
Two novel texture classification schemes have been proposed, the first using the wavelet transform and the second using Gaussian Markov random fields. These schemes exhibit comparable performances to existing methods but both use a significantly smaller feature space. Furthermore, the features are robust and computationally inexpensive (both methods are amenable to fast implementation) and only a small analysis area for feature extraction is required, as desirable for texture segmentation applications. In addition, unlike most existing techniques, the proposed schemes are invariant to rotations of the textures to be classified, attaining the same high classification performance on the textures at all orientations. The traditional GMRF approach or the non-rotation invariant wavelet method are obviously preferable if the textures are guaranteed to occur only at the orientation they have been trained at. However, the proposed schemes are far superior when the rotation of the texture is not known a priori, as is often the case in real applications. The wavelet-based approach is especially favourable, since it gives a higher performance, is computationally more efficient and its features are easily derivable from its non-rotation invariant counterpart.
Method                                                 | Original Textures | Rotated Textures
3rd order GMRFs (7 features)                           | 100.0%            | 45.8%
3rd order Circular GMRFs (4 features)                  | 93.8%             | 95.1%
Wavelet-Based Features (7 features)                    | 99.1%             | 86.5%
Rotation Invariant Wavelet-Based Features (4 features) | 95.5%             | 95.8%

Table 1 - Texture Classification Performance Results
References
[1] R. Chellappa and S. Chatterjee, "Classification of Textures Using Gaussian Markov Random Fields," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 33, no. 4, pp. 959-963, Aug. 1985.
[2] R.L. Kashyap and A. Khotanzad, "A Model-Based Method for Rotation Invariant Texture Classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, no. 4, July 1986.
[3] J. Mao and A.K. Jain, "Texture Classification and Segmentation Using Multiresolution Simultaneous Autoregressive Models," Pattern Recognition, vol. 25, no. 2, pp. 173-188, Feb. 1992.
[4] Y.Q. Chen, M.S. Nixon and D.W. Thomas, "Statistical Geometrical Features for Texture Classification," Pattern Recognition, vol. 28, no. 4, pp. 537-552, Apr. 1995.
[5] K. Etemad and R. Chellappa, "Separability Based Tree Structured Local Basis Selection for Texture Classification," Proc. International Conference on Image Processing 1995, pp. 441-445.
[6] T. Chang and C.-C.J. Kuo, "Texture Analysis and Classification with Tree-Structured Wavelet Transform," IEEE Trans. Image Processing, vol. 2, no. 4, pp. 429-441, Oct. 1993.
[7] R. Porter and C.N. Canagarajah, "A Robust Automatic Clustering Scheme for Image Segmentation using Wavelets," IEEE Trans. Image Processing, vol. 5, no. 4, pp. 662-665, Apr. 1996.
[8] P. Brodatz, Textures: A Photographic Album for Artists and Designers. Dover: New York, 1966.
Figure 3 - Confusion matrices for classification results of rotated textures using: (a) GMRF features; (b) circular GMRF features; (c) wavelet-based features; (d) rotation invariant wavelet-based features.
A NEW METHOD FOR DESCRIBING TEXTURE
D. T. Pham* and B. G. Çetiner+
*Intelligent Systems Laboratory, School of Engineering, University of Wales, Cardiff, PO Box 917, Newport Road, Cardiff, CF2 1XH, UK.
+Istanbul Technical University, Faculty of Aeronautics Engineering, Maslak, Istanbul, Turkey.
ABSTRACT A new method is presented for obtaining feature vectors for describing texture. The method uses grey level difference matrices that are reminiscent of co-occurrence matrices but are much simpler to compute. Textural feature vectors are classified using artificial neural networks (ANNs). Comparative results for the new method and the standard Spatial Grey Level Dependence (SGLD) method are provided. Key words: Texture Analysis, Texture Classification, Neural Networks.
1 INTRODUCTION Texture is a fundamental stimulus for visual perception. Natural image analysis systems, such as the human visual system, use texture as an aid in segmentation and interpretation of scenes. Despite its importance, there is no generally accepted definition of texture and no agreement on how to measure it. This paper describes a new second-order statistics method for computing textural features and provides the results of using neural networks to recognise textures based on those features. Comparative results for the Spatial Grey Level Dependence (SGLD) or co-occurrence matrix method [Haralick et al., 1973] are also presented.
2 GREY LEVEL DIFFERENCE (GLD) METHOD FOR TEXTURE ANALYSIS
The method involves computing GLD matrices, each element of which is the sum of scaled grey level differences between neighbouring pixels. Grey levels are quantised into groups to reduce the dimensions of the matrix, the number of groups being the number of rows/columns in the matrix. For each interpixel distance d and direction θ, a matrix can be computed. The concepts of interpixel distance and direction are similar to those adopted in the SGLD method. For example, with d=1, pixels that are immediately next to the pixel of interest are considered, and with d=2, pixels that are separated by one pixel from the pixel of interest are used. There is a maximum of 8 directions, namely θ = 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. These define the position of a neighbouring pixel relative to the pixel of interest. For instance, the 0° and 180° neighbours of a pixel are the pixels to its right and to its left respectively. The GLD matrix for a given d and θ is computed as follows:
(i) Quantise the grey levels into n groups. This fixes the dimensions of the GLD matrices to n x n.
(ii) Initialise all elements of the GLD matrix to zero.
(iii) Select the pixel to be processed in the image window. Call this pixel 1.
(iv) Find the neighbour of pixel 1 at the specified interpixel distance and in the specified direction. Call this pixel 2.
(v) Calculate the scaled grey levels of pixels 1 and 2, namely:
P1 = p1 / Ng,    P2 = p2 / Ng,

where p1 and p2 are the raw grey levels of pixels 1 and 2, P1 and P2 range between 0 and 1 and Ng is the number of grey levels in the image.
(vi) Calculate the scaled grey level difference between pixels 1 and 2:

GLD = |P1 − P2| + 1.
Thus, GLD is a number between 1 and 2. GLD is equal to 1 when P1 and P2 are the same, and to 2 when P1 is 1 and P2 is 0 or vice versa. GLD is arranged to be between 1 and 2 so that elements representing zero grey level differences are distinguished from ordinary (initialised) zero elements in the GLD matrix.
(vii) Determine the GLD matrix element, corresponding to the scaled grey levels P1 and P2, that is to be updated. The position (i, j) of the element is calculated as follows: i = INT[n * P1]; j = INT[n * P2], where INT is a function that converts the real numbers n * P1 and n * P2 into the nearest integers.
(viii) Update the GLD matrix element found in the previous step by adding to it the GLD value obtained in step (vi), that is: new_GLD(i, j) = old_GLD(i, j) + GLD.
(ix) If all neighbouring pixels of pixel 1 have been processed then go to step (x). Otherwise, go to step (iv).
(x) If all pixels in the image window have been processed then STOP. Otherwise, go to step (iii).
As an example, consider the image window in Figure 1(a). The numbers of grey levels and grey level groups are 64 and 5 respectively. Let pixel 1 (with grey level equal to 48) be element (3, 3) and pixel 2 (with grey level equal to 35) be element (3, 4) in the image window. The scaled grey levels for these pixels are P1 = 48/64 = 0.75 and P2 = 35/64 = 0.547. The GLD value for these pixels is |P1 − P2| + 1 = 1.203. The GLD matrix element corresponding to P1 and P2 is element (4, 3). Thus, it is updated from its initial zero value to 1.203. Similarly, let pixel 1 be element (4, 1) and pixel 2 be element (4, 2). The scaled grey levels for these pixels are P1 = 0.75 and P2 = 0.5. The GLD value for these pixels is 1.25. Again, the GLD matrix element corresponding to P1 and P2 is element (4, 3). Thus, that element now becomes 2.453. The GLD matrix for the entire image window corresponding to an interpixel distance of 1 and a neighbouring pixel direction of 0° is shown in Figure 1(b).
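Steps (i) to (viii) can be sketched for d=1 and the 0° direction as below; the (n+1) x (n+1) matrix size and the round-half-up integer conversion are assumptions made so that the indices match the worked example:

```python
import numpy as np

# Sketch of GLD matrix construction for one interpixel distance and direction
# (d=1, 0 degrees, i.e. the right-hand neighbour). Indices can reach n under
# nearest-integer scaling, so an (n+1) x (n+1) matrix is used here; halves are
# rounded up, matching the worked example in the text.

def gld_matrix(window, n_groups, n_grey):
    w = np.asarray(window, dtype=float)
    m = np.zeros((n_groups + 1, n_groups + 1))
    for i in range(w.shape[0]):
        for j in range(w.shape[1] - 1):       # pixel 2 is the 0-degree neighbour
            p1 = w[i, j] / n_grey             # scaled grey levels, step (v)
            p2 = w[i, j + 1] / n_grey
            gld = abs(p1 - p2) + 1.0          # scaled difference in [1, 2], step (vi)
            r = int(n_groups * p1 + 0.5)      # nearest-integer bins, step (vii)
            c = int(n_groups * p2 + 0.5)
            m[r, c] += gld                    # accumulate, step (viii)
    return m
```

Running this on the pixel pairs (48, 35) and (48, 32) from the example, with 64 grey levels and 5 groups, reproduces the contributions 1.203 and 1.25 to element (4, 3).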
Figure 1. (a) Image window; (b) GLD matrix calculated from the image window

3 CLASSIFICATION OF GLD MATRICES
GLD matrices were constructed for the 16 texture images from the Brodatz album [Brodatz, 1968]. These were 128x128 images of natural objects or scenes (for instance, reptile skin, grass lawn and beach pebbles). Each image was divided into 32x32 non-overlapping windows. This yielded a total of 256 patterns. The number of grey levels was 256. Eight grey level groups were employed, giving GLD matrices of size 8x8. In addition to individual GLD matrices for the eight directions, direction invariant matrices were also computed by adding the corresponding elements in the individual matrices. Interpixel distances of 1 to 5 were adopted. This gave a total of 45 data sets, each with 256 patterns. A GLD matrix was obtained for each pattern. The matrix elements were used directly as features. Half of the feature vectors were selected randomly and employed as training examples. The remainder were used to test the classification accuracy of the trained classifiers. Thus, there were 45x128 feature vectors for training and the same number for testing.
The LVQ2 neural network with a conscience mechanism [Pham and Oztemel, 1994, 1996] was adopted as the tool for classifying the feature vectors into the correct texture class. That network was chosen after comparing its performance with the popular Multi-layer Perceptron classifier [Pham and Liu, 1995] on an experimental group of 9 data sets. The network had 64 inputs (the elements of the GLD matrix), 16 outputs (the texture classes) and 96 hidden Kohonen neurons. The number of Kohonen neurons was chosen empirically. To compare the proposed texture description method against the popular SGLD method, SGLD features were obtained for the same directions as for the proposed method. A feature vector of five components (energy, entropy, correlation, local homogeneity and inertia) was computed for each direction. An LVQ2 network was also employed for classifying the feature vectors. The network had 5 inputs (the elements of a feature vector), 16 outputs (the texture classes) and 32 Kohonen neurons. Again, the number of Kohonen neurons was found empirically.

4 RESULTS AND DISCUSSION
Table 1 gives the results for all 45 data sets. It can be observed that the classification accuracy using GLD matrices is superior to that using SGLD features for all interpixel distances and directions. The table also shows that with both methods the best accuracies were obtained for an interpixel distance of 1.
Note that, although the dimension of the feature vectors in the SGLD method is smaller than that for the proposed method, the computation required to obtain the SGLD feature vectors [Haralick et al., 1973] is much more demanding. Additionally, the time required to train the LVQ classifiers to recognise the information-rich GLD feature vectors was comparable to that for the SGLD feature vectors.
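The prototype-update principle behind the LVQ classifiers used above can be sketched with the basic LVQ1 rule; the LVQ2 variant with a conscience mechanism refines this rule, so the code below is only an illustration, not the authors' classifier:

```python
import numpy as np

# Minimal LVQ sketch (basic LVQ1 rule): the nearest prototype is pulled
# towards a correctly labelled sample and pushed away otherwise. LVQ2 with
# a conscience mechanism, as used in the paper, elaborates on this update.

def train_lvq(samples, labels, prototypes, proto_labels, lr=0.1, epochs=10):
    protos = np.array(prototypes, dtype=float)
    for _ in range(epochs):
        for x, y in zip(np.asarray(samples, dtype=float), labels):
            k = int(np.argmin(((protos - x) ** 2).sum(axis=1)))  # nearest prototype
            if proto_labels[k] == y:
                protos[k] += lr * (x - protos[k])   # pull towards the sample
            else:
                protos[k] -= lr * (x - protos[k])   # push away from the sample
    return protos

def predict(x, protos, proto_labels):
    k = int(np.argmin(((protos - np.asarray(x, dtype=float)) ** 2).sum(axis=1)))
    return proto_labels[k]
```

In the experiments described above, the samples would be the 64-element GLD matrices (or the 5-element SGLD vectors) and the prototypes the Kohonen neurons.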
Table 1. Number of misclassifications for each data set and average classification accuracies.

5 CONCLUSION
A new texture analysis method based on grey level difference statistics has been described and its results have been compared with those of the SGLD method. The new method gave much better texture discrimination accuracies than the SGLD method on the natural texture images chosen from the Brodatz album.
References
Brodatz P. (1968) "Textures: A Photographic Album for Artists and Designers", Van Nostrand Reinhold, New York.
Haralick R. M., Shanmugam K. and Dinstein I. (1973) "Textural Features for Image Classification", IEEE Trans. Syst., Man, Cybern., Vol. SMC-3, No. 6, November, pp. 610-621.
Pham D. T. and Liu X. (1995) "Neural Networks for Identification, Prediction and Control", Springer-Verlag, London and Berlin, pp. 4-7.
Pham D. T. and Oztemel E. (1994) "Control Chart Pattern Recognition Using Learning Vector Quantization Networks", Int. J. Production Research, 32(3), pp. 721-729.
Pham D. T. and Oztemel E. (1996) "Intelligent Quality Systems", Springer-Verlag, London and Berlin.
Texture Discrimination for Quality Control Using Wavelet and Neural Network Techniques
D.A. Karras 1 and S.A. Karkanis 2 and B.G. Mertzios 3
1University of Ioannina, Department of Informatics, Ioannina 45110, Greece, [email protected]
2NRCPS "Democritos", Inst. of Nuclear Technology, Aghia Paraskevi, 15310 Athens, Greece, [email protected]
3Democritus Univ. of Thrace, Dept. of Electr. and Comp. Eng., 67 100 Xanthi, Greece, [email protected]
Abstract
This paper investigates a novel solution to the problem of defect recognition from images, which can find applications in building robust vision-based quality control systems. Such applications can be found in the production lines of textiles, integrated circuits, machinery, etc. The proposed solution focuses on detecting defects from their textural properties. More specifically, a novel methodology is investigated for discriminating defects in textile images by applying a supervised neural classification technique, employing a multilayer perceptron (MLP) trained with the online backpropagation algorithm, to innovative wavelet based feature vectors. These vectors are extracted from the original image using the cooccurrence matrices framework and SVD analysis. The results of the proposed methodology are illustrated on a defective textile image, where the defective area is recognized with 98.48% accuracy.
I. Introduction
Defect recognition from images is becoming increasingly significant in a variety of applications, since quality control plays a very important role in contemporary manufacturing of virtually every product. Despite this considerable interest, little work has been done in this field, since the classification problem presents many difficulties. However, the resurgence of interest in neural network research has revealed the existence of powerful classifiers. In addition, the emergence of the 2-D wavelet transform [5],[6] as a popular tool in image processing offers the ability of robust feature extraction in images. Combinations of both techniques have been used with success in various applications [10]. Therefore, it is worth investigating whether they can jointly offer a viable solution to the defect recognition problem. To this end, we propose a novel methodology for detecting defective areas in images by examining the discrimination abilities of their textural properties. Besides neural network classifiers and the 2-D wavelet transform, the tools utilized in such an analysis are cooccurrence matrix based textural feature extraction [4] and SVD analysis. The problem at hand can be viewed as an image segmentation one where, unlike the conventional formulation, the image should be segmented into defective and non-defective areas only. Concerning the classical segmentation problem, that is, dividing an image into homogeneous regions, the discovery of a generally effective scheme remains a challenge. To this end, many interesting techniques have been suggested so far, including spatial frequency techniques [9] and related ones such as texture clustering in the wavelet domain [9]. Most of these methodologies use very simple features, like the energy of the wavelet channels [9] or the variance of the wavelet coefficients [3]. Our approach stems from this line of research.
However, much more sophisticated feature extraction methods are needed if one wants to solve the segmentation problem in its defect recognition incarnation, taking into account the high accuracy required. Following this reasoning, we propose to incorporate cooccurrence matrix analysis into these research efforts, since it offers a very accurate tool for describing image characteristics and especially texture [4]. It provides second order information about pixel intensities, which the majority of other feature extraction techniques do not exploit at all. The suggested system has two main stages, namely, optimal feature selection in the wavelet domain (optimal in terms of the information these features carry) and neural network based classification. The viability of the concepts and methods employed in the proposed approach is illustrated in the experimental section of the paper, where it is clearly shown that, by achieving a 98.48% defective area classification accuracy, our methodology is very promising for use in the quality control field.
II. Stage A: Optimal feature selection in the wavelet domain
The problem of texture discrimination, aiming at segmenting the defective areas in images, is considered in the wavelet domain, since it has been demonstrated that the discrete wavelet transform (DWT) can lead to better texture modeling [1]. Also, in this way we can better exploit the well known local information extraction properties of wavelet signal decomposition, as well as the well known features of wavelet denoising procedures [7]. We use the popular 2-D discrete wavelet transform scheme ([5],[6] etc.) in order to obtain the wavelet analysis of the original images containing defects. The images considered in the wavelet domain are expected to be smooth, but due to the well known time-frequency localization properties of the wavelet transform, the defective areas, whose statistics vary from those of the image background, should more or less clearly emerge from the background. We have experimented with the standard 2-D wavelet transform using nearly all the well known wavelet bases, like Haar, Daubechies, Coiflet, Symmlet etc., as well as with Meyer's and Kolaczyk's 2-D wavelet transforms [6]. However, and this is very interesting, only the 2-D Haar wavelet transform has exhibited the expected and desired properties. All the
other orthonormal, continuous and compactly supported wavelet bases smoothed the images so much that the defective areas did not appear in the subbands. We have performed a one-level wavelet decomposition of the images, thus resulting in four main wavelet channels. Among the three channels 2, 3, 4 (frequency index) we have selected for further processing the one whose histogram presents the maximum variance. A lot of experimentation has shown that this is the channel corresponding to the clearest appearance of the defective areas. The subsequent step in the proposed methodology is to raster scan the image obtained from the selected wavelet channel with sliding windows of M x M dimensions. We have experimented with 256 x 256 images and have found that M=8 is a good size for the sliding window. For each such window we perform two types of analysis in order to obtain features optimal in terms of information content. First, we use the information that comes from the cooccurrence matrices [4]. These matrices represent the spatial distribution and the dependence of the gray levels within a local area. Each (i,j)-th entry of the matrices represents the probability of going from one pixel with gray level (i) to another with gray level (j) under a predefined distance and angle. Matrices are formed for specific spatial distances and predefined angles. From these matrices, sets of statistical measures (called feature vectors) are computed for building different texture models. We have considered four angles, namely 0°, 45°, 90° and 135°, as well as a predefined distance of one pixel, in the formation of the cooccurrence matrices. Therefore, we have formed four cooccurrence matrices. Due to computational complexity issues regarding cooccurrence matrix analysis, we have quantized the image obtained from the selected wavelet channel into 16 gray levels instead of the usual 256 levels, without adverse effects on defective area recognition accuracy.
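The one-level decomposition and channel-selection step can be sketched as follows; using the variance of the channel coefficients directly as a stand-in for the histogram variance is a simplification:

```python
import numpy as np

# Sketch of the channel-selection step: a one-level 2-D Haar decomposition
# yields four channels, and among the three detail channels the one with the
# largest variance is kept for further processing. The coefficient variance
# is used here as a stand-in for the histogram variance.

def select_detail_channel(img):
    x = np.asarray(img, dtype=float)
    a = (x[0::2] + x[1::2]) / 2.0             # row averages
    d = (x[0::2] - x[1::2]) / 2.0             # row differences
    channels = [                              # the three Haar detail channels
        (a[:, 0::2] - a[:, 1::2]) / 2.0,      # horizontal detail
        (d[:, 0::2] + d[:, 1::2]) / 2.0,      # vertical detail
        (d[:, 0::2] - d[:, 1::2]) / 2.0,      # diagonal detail
    ]
    variances = [c.var() for c in channels]
    return channels[int(np.argmax(variances))]
```

The selected half-resolution channel is then raster scanned with the 8 x 8 sliding windows described above.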
This quantization also renders the on-line implementation of the proposed system highly feasible. Among the 14 statistical measures originally proposed by Haralick [4] that are derived from each cooccurrence matrix, we have considered only four: angular second moment, correlation, inverse difference moment and entropy.
• Energy (Angular Second Moment):  f1 = Σ_i Σ_j p(i, j)²

• Correlation:  f2 = [ Σ_{i=1..Ng} Σ_{j=1..Ng} (i · j) p(i, j) − μx μy ] / (σx σy)

• Inverse Difference Moment:  f3 = Σ_i Σ_j p(i, j) / (1 + (i − j)²)

• Entropy:  f4 = − Σ_i Σ_j p(i, j) log p(i, j)

Here p(i, j) is the (i, j)-th entry of the normalized cooccurrence matrix, Ng is the number of gray levels, and μx, μy, σx, σy are the means and standard deviations of the row and column marginal distributions of p.
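The cooccurrence matrices themselves can be formed as in the following sketch (ours, in Python/NumPy; function names are illustrative). It quantizes a window to 16 gray levels and counts gray-level transitions at distance one for a given angle, as described in the text; the paper does not say whether the matrix is symmetrized, so this version counts one direction only.

```python
import numpy as np

# Offsets (drow, dcol) for angles 0, 45, 90, 135 degrees at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def quantize(window, levels=16):
    """Map a window onto `levels` gray levels (16 instead of 256)."""
    w = np.asarray(window, dtype=float)
    w = w - w.min()
    rng = w.max()
    if rng == 0:
        return np.zeros(w.shape, dtype=int)
    return np.minimum((w / rng * levels).astype(int), levels - 1)

def cooccurrence(window, angle, levels=16):
    """Normalized cooccurrence matrix for one angle at distance 1."""
    q = quantize(window, levels)
    dr, dc = OFFSETS[angle]
    P = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[q[r, c], q[r2, c2]] += 1
    total = P.sum()
    return P / total if total else P
```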
We have experimentally found that these measures provide high discrimination accuracy, which can be only marginally increased by adding more measures to the feature vector. Thus, using the above-mentioned four cooccurrence matrices we have obtained 16 features describing the spatial distribution in each 8 x 8 sliding window in the wavelet domain. In addition, we have formed another set of 8 features for each such window by extracting the singular values of the matrix corresponding to this window. SVD analysis has recently been successfully related to invariant pattern recognition [8]. It is therefore reasonable to expect that it provides a meaningful means of characterizing each sliding window, preserving first-order information about the window, while the cooccurrence matrix analysis extracts second-order information. We have thus formed, for each sliding window, a feature vector containing 24 features that uniquely characterizes it. These feature vectors feed the neural classifier of the subsequent stage of the suggested methodology, described next.
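A sketch of the 24-element feature vector (the four Haralick measures from each of the four cooccurrence matrices, plus the 8 singular values of the window). This is our Python/NumPy illustration, assuming the normalized cooccurrence matrices have already been computed; function names are our own.

```python
import numpy as np

def haralick4(P):
    """The four measures kept in the text: energy (ASM), correlation,
    inverse difference moment and entropy, for a normalized
    cooccurrence matrix P."""
    n = P.shape[0]
    i, j = np.indices((n, n))
    asm = np.sum(P ** 2)                                     # f1
    px, py = P.sum(axis=1), P.sum(axis=0)
    g = np.arange(n)
    mx, my = np.sum(g * px), np.sum(g * py)
    sx = np.sqrt(np.sum((g - mx) ** 2 * px))
    sy = np.sqrt(np.sum((g - my) ** 2 * py))
    corr = (np.sum(i * j * P) - mx * my) / (sx * sy) if sx * sy else 0.0  # f2
    idm = np.sum(P / (1.0 + (i - j) ** 2))                   # f3
    ent = -np.sum(P[P > 0] * np.log(P[P > 0]))               # f4
    return np.array([asm, corr, idm, ent])

def window_features(window, coocc_mats):
    """24-feature vector: 4 measures x 4 cooccurrence matrices (16)
    plus the 8 singular values of the 8 x 8 window itself."""
    texture = np.concatenate([haralick4(P) for P in coocc_mats])
    sv = np.linalg.svd(np.asarray(window, dtype=float), compute_uv=False)
    return np.concatenate([texture, sv])
```

For a uniform 16 x 16 matrix, energy is 1/256, correlation is zero and entropy is log 256, which gives a quick sanity check of the formulas.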
III. Stage B: Neural network based segmentation of defective areas

After obtaining information about the textural structure and other characteristics of each image using the methodology described above, we employ a supervised neural network of the multilayer feedforward type (MLP), trained with the online backpropagation algorithm, whose goal is to decide whether a texture region belongs to a defective part or not. The inputs to the network are the 24 features of the feature vector extracted from each sliding window. The best network architecture tested in our experiments is 24-35-35-1. The desired outputs during training are determined by the corresponding sliding window location: if a sliding window belongs to a defective area, the desired output of the network is one; otherwise it is zero. During the MLP training phase, we have defined a sliding window to belong to a defective area if any of the pixels in the 4 x 4 central window inside the original 8 x 8 sliding window belongs to the defect. The reasoning underlying this definition is that the decision about whether a window belongs to a defective area should come from neighborhood information, thus preserving the 2-D structure of the problem, and not from information associated with only one pixel (e.g., the central pixel). In addition, and probably more significantly, by defining the two classes in such a way we obtain many more training patterns for the class corresponding to the defective area, since defects normally cover only a small portion of the original image. For effective neural network classifier learning it is important to have enough training patterns for each of the two classes but, on the other hand, to preserve as much as possible the a priori probability distribution of the problem. We have experimentally found that a proportion of 1:3 for the training patterns belonging to defective and non-defective areas, respectively, achieves both goals.
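The training-label rule above (a window is "defective" if any pixel of its 4 x 4 central area lies on a defect) can be written as a small helper. This is our illustration; `defect_mask` and the function name are hypothetical.

```python
import numpy as np

def window_label(defect_mask, top, left, win=8, core=4):
    """Desired MLP output for the sliding window whose top-left corner
    is (top, left): 1 if any pixel of the central core x core area
    belongs to the defect mask, else 0."""
    off = (win - core) // 2                       # 2 for an 8x8 window
    center = defect_mask[top + off: top + off + core,
                         left + off: left + off + core]
    return 1 if center.any() else 0
```

Note that a defect pixel lying on the window border but outside the 4 x 4 core does not make the window positive, which is exactly the behavior described in the text.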
IV. Results and Discussion
The efficiency of our approach in recognizing defects in automated inspection images, based on utilizing texture information, is illustrated with the textile image shown in fig. 1, which contains a very thin and long defect on its upper side as well as some smaller defects elsewhere. This image is 256 x 256, while the four wavelet channels obtained by applying the 2-D Haar wavelet transform are 128 x 128. These wavelet channels are shown in fig. 2. Fig. 3 shows the selected wavelet channel 3, which has the maximum histogram variance. There are 14641 sliding windows of 8 x 8 size in this wavelet channel. The neural network has been trained with a training set containing 1009 patterns extracted from these sliding windows as described above; 280 of the 1009 patterns belong to the long and thin defective area on the upper side only, while the rest belong to the class of non-defective areas. The learning rate coefficient was 0.3 and the momentum coefficient 0.4. The neural network has been tested on all 14641 patterns coming from the sliding windows of the third wavelet channel. The results are shown in fig. 4. Note that the network based on the suggested methodology was able to generalize and also find some other minor defects, while another network of the same type, trained with the 64 pixel values of the sliding windows under exactly the same conditions, was able to find only the long and thin defect. This demonstrates the efficiency of our feature extraction methodology based on textural and SVD features. Finally, in terms of classification accuracy we have achieved an overall 98.48%. The evolution of the training error and of the generalization ability for the class corresponding to defects is shown in figs. 5 and 6, respectively.
Figure 1. Original textile image containing a defect
Figure 3. QMF Channel No.3
Figure 2. Wavelet transformation of the original image
Figure 4. Resulting image - white regions represent the defects
Figure 5. Learning Error Evolution
Figure 6. Generalization Performance Evolution
V. Conclusions
We have proposed a novel methodology for detecting defects in automated inspection images, based on wavelet and neural network segmentation methods, which exploits information coming from textural analysis and SVD in the wavelet channels of the 2-D Haar wavelet transform of the original images. The efficiency of this approach is illustrated on textile images, and the classification accuracy obtained is 98.48%. Our methodology clearly deserves further evaluation in vision-based quality control systems.
References
[1] Ryan, T. W., Sanders, D., Fisher, H. D. and Iverson, A. E., "Image Compression by Texture Modeling in the Wavelet Domain", IEEE Trans. Image Processing, Vol. 5, No. 1, pp. 26-36, 1996.
[2] Antonini, M., Barlaud, M., Mathieu, P. and Daubechies, I., "Image Coding Using Wavelet Transform", IEEE Trans. Image Processing, Vol. 1, pp. 205-220, 1992.
[3] Unser, M., "Texture Classification and Segmentation Using Wavelet Frames", IEEE Trans. Image Processing, Vol. 4, No. 11, pp. 1549-1560, 1995.
[4] Haralick, R. M., Shanmugam, K. and Dinstein, I., "Textural Features for Image Classification", IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-3, No. 6, pp. 610-621, 1973.
[5] Meyer, Y., "Wavelets: Algorithms and Applications", SIAM, Philadelphia, 1993.
[6] Kolaczyk, E., "WVD Solution of Inverse Problems", Doctoral Dissertation, Dept. of Statistics, Stanford University, 1994.
[7] Donoho, D. L. and Johnstone, I. M., "Ideal Time-Frequency Denoising", Technical Report, Dept. of Statistics, Stanford University.
[8] Al-Shaykh, O. K. and Doherty, J. E., "Invariant Image Analysis based on Radon Transform and SVD", IEEE Trans. Circuits and Systems, Vol. 43, No. 2, pp. 123-133, Feb. 1996.
[9] Porter, R. and Canagarajah, N., "A Robust Automatic Clustering Scheme for Image Segmentation Using Wavelets", IEEE Trans. Image Processing, Vol. 5, No. 4, pp. 662-665, April 1996.
[10] Lee, C. S., et al., "Feature Extraction Algorithm based on Adaptive Wavelet Packet for Surface Defect Classification", to be presented at ICIP 96, 16-19 Sept. 1996, Lausanne, Switzerland.
A Region Oriented CFAR Approach to the Detection of Extensive Targets in Textured Images

Carlos Alberola-López, José Ramón Casar-Corredera* and Juan Ruiz-Alzola**

Depto. Teoría de la Señal y Comunicaciones e Ingeniería Telemática, ETSI Telecomunicación, Universidad de Valladolid, Spain. C/ Real de Burgos s/n, 47011 Valladolid. e-mail: carlos@tel.uva.es
* Depto. Señales, Sistemas y Radiocomunicaciones, ETSI Telecomunicación - UPM, Ciudad Universitaria s/n, 28040 Madrid, Spain
** Depto. de Señal y Comunicaciones, EUIT Telecomunicación, Campus de Tafira s/n, 35017 Las Palmas de Gran Canaria, Spain

Abstract

In this contribution we address the problem of locating arbitrarily-shaped extensive objects in textured images. To that end, we propose to introduce spatial constraints into the detection framework by means of a recursive search of connected components of the target to be extracted. With this procedure, every target within the image is ideally detected with a single threshold, so the problem of placing the reference of estimation of the detector parameters with respect to the pixel under test is bypassed. Our experiments show that extensive targets are properly detected, regardless of their shape and extension. In addition, false alarms are easily cancelled, since they show up as isolated point-like random detections.
1 Introduction
Well-known CFAR approaches [5] to target detection in images strive to maximize the probability of detection while keeping the false alarm rate low and constant throughout a non-stationary background, by estimating its local statistics to calculate the appropriate threshold at every pixel. However, they are either directed at detecting very small targets [4] or they make use of some a priori knowledge about the target to be extracted, for instance a search template from which the target features can be estimated [6]. On the other hand, if a general-purpose extensive-target detection scheme is sought, template matching is not the solution, since it would require a large number of candidate templates, unnecessarily increasing the computational complexity of the detector. Additionally, because targets typically encountered in real-world applications are extensive at practical resolutions, pixel-level detectors might not be the most efficient solution: decisions are made independently of each other, so the raw output of the detector will often have no spatial coherence. This makes a postprocessing stage compulsory, in which detections from the target boundaries must be connected and false alarms cancelled. These pixel-oriented detectors are quite easy to implement in a real-time scheme, but the postprocessing might overload the processor. Additionally, when using a CFAR detector for extensive target extraction, care must be taken to properly place the reference of estimation of the detector parameters; if this point is not taken into account, some parts of the target can easily lead the detector to miss other portions of it, since the parameters will be biased by the presence of target pixels within the reference of estimation. In this contribution we propose a CFAR detection scheme that incorporates region constraints within the detection framework.
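As a concrete instance of the pixel-level CFAR principle just described (local statistics estimated from a reference window set the per-pixel threshold), here is a minimal cell-averaging sketch. It is a generic CA-CFAR of our own, not the specific detector of [1], and all names and parameter values are illustrative.

```python
import numpy as np

def ca_cfar_threshold(image, r, c, guard=1, ref=3, scale=3.0):
    """Cell-averaging CFAR threshold at pixel (r, c): the mean of a ring
    of reference cells around a guard area, times a scale factor that
    would be fixed by the desired false-alarm rate."""
    rows, cols = image.shape
    lo_r, hi_r = max(0, r - ref), min(rows, r + ref + 1)
    lo_c, hi_c = max(0, c - ref), min(cols, c + ref + 1)
    block = image[lo_r:hi_r, lo_c:hi_c]
    # Exclude the guard area (and the cell under test) from the estimate.
    mask = np.ones(block.shape, dtype=bool)
    mask[max(0, r - guard) - lo_r: min(rows, r + guard + 1) - lo_r,
         max(0, c - guard) - lo_c: min(cols, c + guard + 1) - lo_c] = False
    return scale * block[mask].mean()

def cfar_detect(image, r, c, **kw):
    """Pixel-level CFAR test: compare the cell under test to its
    locally estimated threshold."""
    return image[r, c] > ca_cfar_threshold(image, r, c, **kw)
```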
The potential of our procedure stems from the fact that, in the target area, the image statistics will be quite different from those of the background, and will also have a certain homogeneity, even though the target is fluctuating, that allows us to extract the target as a whole by means of a single local threshold. This way, we benefit from using pixel-level and region-level information simultaneously in the detection stage, and since ideally a single threshold is needed for a given target, we also minimize the above-mentioned effect of target shadowing by its own pixels.
2 CFAR Detection of Extensive Targets

2.1 A Pixel-Oriented Approach
As mentioned in the introduction, few proposals of CFAR detectors in images address the problem of locating arbitrarily shaped and extensive objects; the solutions most often encountered incorporate some knowledge of the object to be extracted. We have developed [1] a pixel-oriented CFAR detector that extracts the outer edges of an extensive target, regardless of its extension and shape, in a gamma distributed textured background. The key of the proposal lies in the use of the phase of the estimated gradient at the pixel under test: the reference of estimation of the detector parameters is placed orthogonally to the gradient vector, which reduces the possibility of pixels from the target falling into the cells of the reference of estimation. However, this philosophy and, generally speaking, all techniques that make decisions on a pixel-by-pixel basis without taking into account the decisions in their surroundings, bring about spotty results, in which a number of unconnected edge elements are extracted together with a number of false alarms. Thus, a second stage is needed in which edge elements are connected and false alarms cancelled. To that end, optimization techniques have proved useful, although computationally involved [2][3].

2.2 A Region-Oriented Approach
If an extensive target is sought, and the image statistics remain approximately constant through the body of the target, a single threshold might be sufficient to properly detect and extract the target as a whole. That is, regardless of the shape of the object, it could be detected by a guided recursive search of its components, using as starting point in the recursion a detection obtained by means of a pixel-level detector. We have applied this idea to build a detection algorithm in which decisions are dependent on each other, so the detector can be regarded as a region-level detector. We proceed as follows: the detection process is started at pixel level but, if a detection is encountered, a region-level detection procedure is triggered, which initiates a recursive search in the 8-neighborhood of this pixel; every neighbor is compared to the threshold that triggered the first detection. All the neighbors that result in detections are then recursively examined, using the only threshold calculated so far, expanding the tree of neighbors one more level. The process continues until the search reaches the opposite boundary of the target (opposite with respect to the direction of the search), since all the decisions that do not exceed the threshold are labelled as 'background' and no further search is invoked in undetected pixels. This process can be expressed in pseudocode as follows:

1. Label all pixels as unvisited
2. For every unvisited pixel:
   (a) Decide pixel as target/background by any CFAR detector (see, for instance, [1])
   (b) If the pixel is detected:
       i. For every undetected neighbor:
          A. Decide the neighbor as target/background with the threshold from (a)
          B. If the neighbor is detected, label it as detected and go to i; otherwise label it as visited
       ii. Otherwise label the pixel as visited

This algorithm benefits from the fact that the recursive procedure captures the whole body of the target accurately: both the outer boundary and the inner details are captured, since the detection threshold has been calculated from data outside the target area, and no further threshold calculations are needed. Additionally, recursive algorithms are fast and efficient, and the code that implements this algorithm is surprisingly short and thus easy to store. The main drawback of this procedure is the condition for halting the search: at the present stage we conclude the expansion of the tree of neighbors when no more detections are encountered. Therefore, in cases where the target lies in a rapidly changing background, the threshold on the opposite side of the target might not be able to stop the search, and noisy results would be obtained.
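The pseudocode above can be sketched as follows (our Python/NumPy reading, with the recursion unrolled into a queue to avoid stack limits; `pixel_threshold` stands in for any CFAR threshold estimator such as the one in [1]):

```python
from collections import deque
import numpy as np

def region_cfar_detect(image, pixel_threshold):
    """Region-oriented CFAR detection.

    pixel_threshold(r, c) returns the CFAR threshold for the pixel
    under test. When a pixel exceeds its threshold, its 8-neighbors
    are searched using the SAME threshold that triggered the first
    detection, so ideally one threshold extracts the whole target.
    """
    rows, cols = image.shape
    detected = np.zeros((rows, cols), dtype=bool)
    visited = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            if visited[r, c]:
                continue
            visited[r, c] = True
            thr = pixel_threshold(r, c)        # pixel-level CFAR test
            if image[r, c] <= thr:
                continue
            detected[r, c] = True
            queue = deque([(r, c)])            # region-level search
            while queue:
                rr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = rr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not visited[nr, nc]):
                            visited[nr, nc] = True
                            if image[nr, nc] > thr:   # single threshold
                                detected[nr, nc] = True
                                queue.append((nr, nc))
    return detected
```

Undetected neighbors are labelled visited, which is what halts the expansion at the far boundary of the target, exactly as described in the text.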
3 Results
In this section we show two examples of our detector's capabilities. First, an artificial non-stationary background is represented in figure 1a), which has been synthesized by a 2-dimensional autoregressive filter driven by white Gaussian noise, whose output has been warped to obtain a gamma probability density function. We have let the parameters of the distribution vary during the synthesis process to obtain a non-uniform illumination pattern, as can be seen in this figure. Three targets have been superimposed on the texture, whose brightness content overlaps considerably with that of the background (especially in the two lower circles), but whose textural pattern is different; therefore the detection process has been carried out at the output of an adaptive whitening filter (with an assumed quarter-plane support). We show this output process in figure 1b). Note the evident presence of the three targets in the background (three noisy spikes along the diagonal of the figure). The pixel-oriented detector output is shown in figure 1c) for Pfa = 10^-3. Note that the target boundaries are visible, but detections are mainly isolated; the reason is that the output of the whitening filter fluctuates strongly in the surroundings of the targets, and therefore the estimation of the gradient (for the placement of the reference of estimation) is noisy as well. This leads to an inaccurate placement of the reference of estimation and, as a consequence, to a low detection performance. However, as figure 1d) shows, the proposed detection philosophy, due to its inherent functionality, is able to extract much of the body of the target, which makes any further processing directed at target recognition much easier. This figure also highlights that, due to the filtering process, part of the target power is smeared out of its boundaries, and therefore the detections extend beyond the original target in the filtering direction.
Figure 1: Detection in a whitened domain, Pfa = 10^-3. a) Original image b) Squared output of an adaptive whitening filter with QP support c) Boundary detection in b) d) Region-oriented CFAR detection in b).

The second example is an image of a jacket on which four pins have been superimposed (figure 2a). The Pfa is set to 10^-3 in each band (the original is a three-band image; only one band is shown here), and decisions are fused according to the logical OR function. Figure 2b) shows the result of the iterative search: the four pins are correctly detected, and most of the details in them are also visible. False alarms can be easily removed by a very simple postprocessing step, since their extension is much smaller than that of the real targets.
Figure 2: Detection in a natural background, Pfa = 10^-3 in each band, fused by logical OR. a) Original image b) Region-oriented CFAR detection in a).
4 Conclusions
In this contribution we have proposed an algorithm for incorporating region constraints into the operation of a CFAR detector for object extraction in a textured background. Our procedure scans the image under analysis on a pixel-by-pixel basis until a detection is encountered; the detection triggers a recursive search of target components among the neighbors of the detection. This search is continued until the object is compactly extracted. Our results show that the algorithm performs satisfactorily in slowly changing backgrounds, since targets are properly detected and false alarms are controlled according to the level of the detector. However, we have highlighted the fact that this procedure is sensitive to sudden changes in the image statistics. Our future efforts will be directed at diminishing this sensitivity by conceiving more robust stopping criteria.
References

[1] C. Alberola, J. R. Casar, J. Ruiz, "A Comparison of CFAR Strategies for Blob Detection in Textured Images", Proc. of the VIII European Signal Processing Conf., EUSIPCO-96, September 1996 (to be held).
[2] A. Martelli, "An Application of Heuristic Search Methods to Edge and Contour Detection", Communications of the ACM, Vol. 19, No. 2, pp. 73-83, February 1976.
[3] U. Montanari, "On the Optimal Detection of Curves in Noisy Pictures", Communications of the ACM, Vol. 14, No. 5, pp. 335-345, May 1971.
[4] T. Soni, J. R. Zeidler, W. H. Ku, "Performance Evaluation of 2-D Adaptive Prediction Filters for Detection of Small Objects in Textured Backgrounds", IEEE Trans. Image Processing, Vol. 2, No. 3, pp. 327-339, July 1993.
[5] C. W. Therrien, T. F. Quatieri, D. E. Dudgeon, "Statistical Model-Based Algorithms for Image Analysis", Proceedings of the IEEE, Vol. 74, No. 4, pp. 532-551, April 1986.
[6] X. Yu, I. S. Reed, A. D. Stocker, "Comparative Performance Analysis of Adaptive Multispectral Detectors", IEEE Trans. Signal Processing, Vol. 41, No. 8, pp. 2639-2656, August 1993.
Generating the Stable Structure of a Color Texture Image using Scale-space Analysis with Non-uniform Gaussian Kernels

Satoru MORITA and Minoru TANAKA
Faculty of Engineering, Yamaguchi University, Ube 755, Japan
Abstract
Coarseness and directionality provide important sources of information for color texture image recognition. In particular, it is important to distinguish between textures and to understand the characteristics of similar color textures. We therefore propose a new scale-space analysis generated by non-uniform Gaussian kernels, in order to find an image that is stable with respect to coarseness and directionality. We analyze zero-crossing surfaces to generate a non-uniform Gaussian scale-space from a limited number of observations. Singular points, where the topology of the zero-crossing surfaces changes, are plotted in the new scale-space. The filter parameter of the largest chunk enclosed by a topology change surface is selected as the optimal parameter for a pixel. The optimal filter and the image description are calculated by this approach for natural color images. We show that this method is well suited to color texture image recognition.
2 Introduction
Recently, many researchers have studied color images in the field of computer vision. The segmentation of color images using competitive learning has been studied [1]. On the other hand, the segmentation of a color image using multiresolution analysis has been proposed [2], but no consideration was given to the texture in a color image. Coarseness and directionality provide important sources of information for texture image recognition; in particular, it is important to distinguish between textures and to understand the characteristics of similar textures. The importance of interpreting an image at various scales was noted by Marr [7]. Scale-space analysis has been proposed using the zero-crossing points of a signal observed at various scales [6]. The uniqueness of scale-space based on uniform Gaussian kernels has been analyzed [10]. Scale-space analysis using non-uniform kernels is useful for texture analysis and edge detection [8][9]. Image segmentation using a Gabor filter [4] with various directions has been studied for texture analysis [5]. Witkin proposed a method that selects the optimal scale corresponding to the maximum width of an interval in order to generate a stable one-dimensional signal [6]. We therefore extend the interval tree of a one-dimensional signal to a two-dimensional color image, using non-uniform Gaussian kernels, in order to select filter parameters with consideration of coarseness and directionality. In section 2 we define scale-space filtering with non-uniform Gaussian kernels; in particular, we classify the zero-crossing surfaces of a color image and clarify their properties. In section 3, using non-uniform Gaussian scale-space analysis, we present the algorithm generating a stable color image free of the effects of noise, coarseness and directionality. We extract stable color images from some real images and show the effectiveness of the approach through matching experiments using the structure of the stable images.
3 Scale-space Analysis with Non-uniform Gaussian Kernels for a Color Texture Image

3.1 Scale-space Filtering with Non-uniform Gaussian Kernels
In order to generate an image that is stable with respect to the coarseness and directionality of texture, we propose scale-space analysis with non-uniform Gaussian kernels and an algorithm generating the structure of a stable image. In this section, traditional scale-space analysis with uniform Gaussian kernels is extended to scale-space analysis with non-uniform Gaussian kernels.
∂L/∂t = (1/2) ∇²L = (1/2) (∂²/∂x² + ∂²/∂y²) L

L satisfies the above diffusion equation and is obtained by convolution with the Gaussian kernel g:

L(x; t) = ∫_{Rⁿ} g(a; t) f(x − a) da
The non-uniform Gaussian kernel used in the scale-space analysis is defined as

g(x, y; σx, σy) = (1 / (2π σx σy)) exp{ −(1/2) (x²/σx² + y²/σy²) }

This equation is rewritten as

g(x, y; Ψ, Γ, θ) = (1 / (2π |M|)) exp( −(x̃² + ỹ²) / 2 ),

where (x̃, ỹ)ᵀ = M⁻¹ R(θ) (x, y)ᵀ, R(θ) = [cos θ, sin θ; −sin θ, cos θ] is a rotation matrix and M = diag(σx, σy), with the distortion Ψ = σx/σy and the size Γ = σx σy (so that σx = √(ΨΓ) and σy = √(Γ/Ψ)).
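Under this parametrization (distortion Ψ = σx/σy, size Γ = σxσy, direction θ, which is our reading of the partly unreadable original), a discrete non-uniform Gaussian kernel can be built as follows. This is an illustrative Python/NumPy sketch, not the authors' implementation.

```python
import numpy as np

def nonuniform_gaussian(size, psi, gamma, theta):
    """Oriented non-uniform Gaussian kernel on a size x size grid.

    Assumes sx = sqrt(psi * gamma) and sy = sqrt(gamma / psi), i.e.
    distortion psi = sx / sy and size gamma = sx * sy.
    """
    sx, sy = np.sqrt(psi * gamma), np.sqrt(gamma / psi)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates by theta, then scale by the two deviations.
    xr = (x * np.cos(theta) + y * np.sin(theta)) / sx
    yr = (-x * np.sin(theta) + y * np.cos(theta)) / sy
    g = np.exp(-0.5 * (xr ** 2 + yr ** 2)) / (2.0 * np.pi * sx * sy)
    return g / g.sum()        # renormalize after discrete truncation
```

With psi = 1 the kernel is isotropic; psi > 1 elongates it along the direction theta, which is how the filter bank encodes directionality.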
3.2 Zero-crossing Surfaces
With the directional vectors that maximize and minimize the curvature at a point p denoted by (u, v) = (ξ1, η1) and (ξ2, η2), the maximum curvature κ1, the minimum curvature κ2, the mean curvature H and the Gaussian curvature K are defined as follows:
a) the maximum curvature at p: κ1 = λ(ξ1, η1)
b) the minimum curvature at p: κ2 = λ(ξ2, η2)
c) the mean curvature at p: H = (κ1 + κ2)/2
d) the Gaussian curvature at p: K = κ1 κ2
e) H0 contours: H = 0
f) K0 contours: K = 0
An image is divided into elements according to the signs of the Gaussian curvature K and the mean curvature H, and the relationships between the elements are described. In this paper, K = 0 and H = 0 are called zero-crossing contours, and the surfaces composed of zero-crossing contours in (x, y, t) space are called zero-crossing surfaces, where x and y are the image coordinates and t is the scale. An image divided into elements by the signs of K and H is called a KH-image.
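The KH-image classification can be sketched as follows; the paper gives no explicit formulas for H and K, so this uses the standard curvatures of the graph z = L(x, y) computed by finite differences (our assumption, in Python/NumPy).

```python
import numpy as np

def kh_image(L):
    """Classify each pixel by the signs of the mean curvature H and the
    Gaussian curvature K of the intensity surface z = L(x, y).
    Returns integer labels 0..3 encoding the four sign combinations,
    together with H and K."""
    Ly, Lx = np.gradient(np.asarray(L, dtype=float))
    Lxy, Lxx = np.gradient(Lx)          # derivatives of Lx along y, x
    Lyy, _ = np.gradient(Ly)            # derivative of Ly along y
    denom = 1.0 + Lx ** 2 + Ly ** 2
    # Standard curvature formulas for a graph surface.
    H = ((1 + Lx ** 2) * Lyy - 2 * Lx * Ly * Lxy + (1 + Ly ** 2) * Lxx) \
        / (2.0 * denom ** 1.5)
    K = (Lxx * Lyy - Lxy ** 2) / denom ** 2
    labels = (H > 0).astype(int) * 2 + (K > 0).astype(int)
    return labels, H, K
```

A bowl-shaped surface has H > 0 and K > 0 at its center, while a saddle has K < 0, which is the kind of sign pattern the KH-image records.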
3.3 Scale-space with Non-uniform Gaussian Kernels for a Color Texture Image
A color image is described by three color planes: the red plane (R), the green plane (G) and the blue plane (B). A pixel in a color image has 24-bit data; a pixel in a plane has 8-bit data and 256 densities. Thus a color image I(x, y) is described by three planes IR(x, y), IG(x, y) and IB(x, y). Next, we define a non-uniform Gaussian scale-space for a color texture image. The coordinates of the zero-crossing contours of IR(x, y)*G(x, y; Ψ, θ, Γ), IG(x, y)*G(x, y; Ψ, θ, Γ) and IB(x, y)*G(x, y; Ψ, θ, Γ) are plotted in a five-dimensional space (x, y, Ψ, θ, Γ). The properties of the filter G(Ψ, θ, Γ) are decided by the distortion Ψ, the direction θ and the size Γ. The zero-crossing surfaces in the non-uniform Gaussian scale-space are three kinds of manifold, S(x, y, Ψ, θ, Γ)_IR, S(x, y, Ψ, θ, Γ)_IG and S(x, y, Ψ, θ, Γ)_IB, in the five-dimensional space (x, y, Ψ, θ, Γ).
3.4 Three Kinds of Non-uniform Gaussian Scale-space
Three kinds of zero-crossing surfaces are extracted from these three manifolds. Suppose (Γ, Ψ) are constant; the coordinates of the zero-crossing points of I(x, y)*G(Ψ, θ, Γ) are plotted in a three-dimensional space (x, y, θ). This scale-space has cylindrical coordinates in which x and y lie in a plane and θ extends circularly. Zero-crossing surfaces S(Γ, Ψ; θ, x, y) are plotted in this scale-space. Suppose (Γ, θ) are constant; the coordinates of the zero-crossing points of I(x, y)*G(Ψ, θ, Γ) are plotted in a three-dimensional space (x, y, Ψ). This scale-space has rectangular coordinates with three axes x, y and Ψ. Zero-crossing surfaces S(Γ, θ; Ψ, x, y) are plotted in this scale-space. Suppose (Ψ, θ) are constant; the coordinates of the zero-crossing points of I(x, y)*G(Ψ, θ, Γ) are plotted in a three-dimensional space (x, y, Γ). This scale-space has rectangular coordinates with three axes x, y and Γ. Zero-crossing surfaces S(Ψ, θ; Γ, x, y) are plotted in this scale-space. The singular points where the three kinds of zero-crossing contour topologies change as θ, Ψ and Γ increase are plotted in three kinds of scale-spaces, one each for the red, green and blue planes.
Figure 1: A sample color image (left) and its color planes (right; R, G, B from left to right).
Figure 2: Filters (top: Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015; bottom: Ψ = 0.125, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015)

3.5 Topology Change Surfaces
We analyze the scale-space with non-uniform Gaussian kernels at a fixed point (x1, y1) to decide the optimal filter for that point in the image. Suppose (x, y) are constant; the singular points where the topology of the zero-crossing surfaces changes in the three kinds of scale-space are plotted in a three-dimensional space (Γ, θ, Ψ). This scale-space has conical coordinates in which Γ and θ lie in a plane and Ψ extends perpendicularly upwards, tapering to a cone whose intersection at constant Ψ is a circle. Topology change surfaces W(x, y; Γ, θ, Ψ)_IR, W(x, y; Γ, θ, Ψ)_IG and W(x, y; Γ, θ, Ψ)_IB are composed of the sets of topology change points obtained from the three color planes R, G and B. We try to find the maximum-size chunk enclosed by a topology change surface; the topology of the image does not change within such a region. We use log2|Ψ| instead of Ψ in the calculation. Three kinds of optimal filter parameters at a point (x1, y1) in an image, corresponding to the color planes R, G and B, are thus decided. These processes are executed for all pixels of the image. This approach is the extension of the interval tree for a one-dimensional signal.
4 The Algorithm Generating a Stable Color Texture Image
We now give the algorithm generating a stable color texture image.
• Color image I(x, y) is described using three planes IR(x, y), IG(x, y) and IB(x, y), each with 8-bit data. (2.3)
• Convolve the three color planes IR(x, y), IG(x, y) and IB(x, y) with the filters of parameters Γ (n = 1, ..., 5), Ψ (n = 1, ..., 5) and θ = 2nπ/8 (n = 1, ..., 8). (2.1)
• Classify each filtered plane into regions by the K and H parameters; execute the same process for the planes IG(x, y) and IB(x, y). (2.2)
• Generate the three kinds of scale-space, in which (Γ, Ψ) are constant, (θ, Γ) are constant, and (Ψ, θ) are constant, using the three color planes IR(x, y), IG(x, y) and IB(x, y). (2.4)
• Interpolate between the limited number of zero-crossing points in the scale-space based on x, y and θ; execute the same process for the scale-spaces based on x, y and Ψ and on x, y and Γ. Find the singular points where the topology of the zero-crossing contours changes, and plot them in a scale-space based on θ, Ψ and Γ. The set of singular points for a plane is called a topology change surface. (2.5)
Figure 3: Filtered images (filter Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015) (left: top R, middle G, bottom B) and KH-images (same filter parameters) (right: top R, middle G, bottom B)
Figure 4: A segment image.

• Select the maximum-size chunk enclosed by the topology change surfaces generated from each plane; its parameters are the optimal filter parameters. (2.5)
• Plot the limited number of optimal filter parameters (Ψ, Γ, θ) in scale-spaces based on the Ψ, Γ and θ parameters for the three color planes. An optimal filter surface is composed of the set of optimal filter parameters. Extract the discontinuities from the optimal filter surfaces using the technique of cluster analysis [3].
• Describe the neighbor relations between image elements using a graph representation: the discontinuities correspond to arcs and the image elements correspond to nodes of the graph.
• Convolve each plane with the Gaussian filter of the optimal parameter obtained for each pixel; the value of the filtered image becomes the pixel value of the plane. Execute these processes for all planes and all pixels. Thus all pixel values of the stable image are decided.

This algorithm has been applied to some real color images. Figure 1 shows a sample color image and its three color planes. Figure 2 shows non-uniform Gaussian kernels with filter parameters Ψ = 0.015, 0.125, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015. Figure 3 shows the filtered images and KH-images for the three color planes with filter parameters Ψ = 0.015, θ = 2nπ/8 (n = 1, ..., 8), Γ = 0.015. Figure 4 shows the segment images generated using the algorithm; the boundaries between different gray values mark the discontinuities of the optimal filter surfaces. It is confirmed that a stable color image, free of the effects of noise, coarseness and directionality, is generated.
5 Conclusions
We extended the interval-tree approach for one-dimensional signals to two-dimensional color images using scale-space analysis with a non-uniform Gaussian kernel, in order to select filter parameters with consideration of coarseness and directionality. Both the selection of optimal filters and the segmentation of an image are executed at the same time by analyzing optimal filter parameter surfaces. The proposed algorithm was applied to some real color images, and matching experiments using the structure of a stable image confirmed that this approach is useful for noisy color images.
References
[1] T. Uchiyama and M. A. Arbib, "Color Image Segmentation Using Competitive Learning," IEEE Trans. Pattern Anal. & Machine Intell., vol. 16, no. 12, pp. 1197-1206, 1994.
[2] J. Liu and Y. Yang, "Multiresolution Color Image Segmentation," IEEE Trans. Pattern Anal. & Machine Intell., vol. 16, no. 7, pp. 689-699, 1994.
[3] D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cognitive Sci., vol. 9, pp. 75-112, 1985.
[4] D. Gabor, "Theory of communication," J. Inst. Elect. Engr., vol. 93, pt. III, pp. 429-459, 1946.
[5] A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," Pattern Recognition, vol. 24, pp. 1167-1186, 1991.
[6] A. Witkin, "Scale-space filtering," Proc. Int. Joint Conf. Artificial Intelligence, Karlsruhe, West Germany, pp. 1019-1022, 1983.
[7] D. Marr, "Vision," W. H. Freeman, San Francisco, 1982.
[8] P. Perona and J. Malik, "Steerable-scalable kernels for edge detection and junction analysis," in Proc. 2nd European Conf. on Computer Vision, pp. 3-18, 1992.
[9] M. Michaelis and G. Sommer, "Junction classification by multiple orientation detection," in Proc. 3rd European Conf. on Computer Vision, pp. 101-108, 1994.
[10] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, "Uniqueness of the Gaussian kernel for scale-space filtering," IEEE Trans. Pattern Anal. & Machine Intell., vol. 8, no. 1, pp. 26-33, 1986.
Session G: IMAGE CODING II: TRANSFORM, SUBBAND AND WAVELET CODING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
APPROXIMATION OF BIDIMENSIONAL KARHUNEN LOEVE EXPANSIONS BY MEANS OF MONODIMENSIONAL KARHUNEN LOEVE EXPANSIONS, APPLIED TO IMAGE COMPRESSION
Nello Balossino and Davide Cavagnino
Dipartimento di Informatica - Università di Torino
C.so Svizzera 185 - 10149 TORINO - Italy
E-mail: {nello, davide}@di.unito.it
Abstract
The paper treats image compression based on Karhunen Loeve expansions approximated by monodimensional expansions. The results prove that the described method leads to a huge reduction of computational complexity and required time. A comparison with the Discrete Cosine Transform is also reported.
Introduction
In many applications a capability to compress images is required, so compression algorithms are frequently embedded in software. In order to evaluate an algorithm used to compress images, the compression ratio C is defined as C = n_o/n_c, where n_c is the number of bits that encode the compressed image and n_o is the number of bits in the original image. As is well known, compression algorithms are classed as reversible or irreversible, depending on whether the decompressed image is, or is not, identical to the original one. A class of reversible compression algorithms is based on bidimensional transformations that perform a spectral analysis of parts of the image (subimages) by means of an orthonormal basis:
F(u, v) = Σ_{x,y} A(u, v, x, y) f(x, y)
where f(x,y) represents the original bidimensional image, F(u,v) are the transformed coefficients and A is the kernel of the transformation (A is often called the set of basis images). In order to reproduce the original image it is sufficient to use the following transformation:
f(x, y) = Σ_{u,v} B(u, v, x, y) F(u, v)
where B is the inverse of the kernel. A bidimensional transformation is said to be separable if and only if we can write
A(u, v, x, y) = A1(u, x) A2(v, y)

If we quantize the coefficients F(u,v) or discard some of them before applying the inverse transformation, we expect an information loss in the reconstructed image (in our work we only discard coefficients and round the remaining ones to two-byte integers); in this way, the compression algorithms become irreversible. In this paper we concentrate on the Karhunen Loeve (KL) expansion (used also with hybrid encodings in recent works [2]) and the Discrete Cosine Transform (DCT), the latter being used as the core of the JPEG standard (see [3, 6]). Given an image of size N×N, we partition it into non-overlapping subimages of size n×n, which we interpret as a random field [7] with mean m; the autocorrelation matrix K (of size n²×n²) is computed from the centered subimages (given a subimage x, the centered subimage is x−m). The kernel of the KL transform is made up of the eigenvectors of the matrix K. The eigenvalue associated with each eigenvector is the variance of the spectral coefficients belonging to the eigenvector; we can then sort the eigenvectors in descending order with respect to their eigenvalues. If we arrange the eigenvectors by rows in a matrix A, then we can write the KL transform in the following way: y = A(x−m), where x, y and m are n×n subimages in column form (see [1, 4]) and, given that the eigenvectors constitute an orthonormal basis, we have the inverse transformation x = A′y + m (the symbol ′ meaning matrix transposition). To effect a compression we can discard the coefficients with smaller variances, keeping only the first l eigenvectors, from which we obtain (A_l are the first l rows of A)

y_l = A_l (x − m)
(1)
and

x̂ = A_l′ y_l + m
(2)
where x̂ is an approximation of x. The KL transform has the property of being optimal, with respect to all others, in the least-square-error sense when considering the same number of coefficients. KL is thus adapted to the image from which the eigenvectors are computed, and this is the informal proof of its coding efficiency. This method has the drawback that with subimages of n×n pixels we need to calculate eigenvectors and eigenvalues of a symmetric matrix of dimension n²×n², so the complexity of the problem grows very rapidly (see, for example, [5]) with increasing size of the subimages. However, this increase should allow discarding a relatively greater number of coefficients to obtain larger compression ratios; this advantage has to be balanced with the increased length of the eigenvectors to be transferred to the decompression phase.
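As a concrete illustration, the bidimensional KL transform of equations (1) and (2) can be sketched in a few lines. This is a minimal sketch with our own function names; the covariance of the centered subimages is used as the matrix K.

```python
import numpy as np

def kl_basis(subimages):
    """KL basis from a set of flattened n*n subimages.

    Returns the mean subimage m and the matrix A whose rows are the
    eigenvectors of the covariance of the centered subimages, sorted
    by decreasing eigenvalue.
    """
    X = np.asarray(subimages, dtype=float)      # shape (N_sub, n*n)
    m = X.mean(axis=0)
    K = np.cov(X - m, rowvar=False)             # n^2 x n^2 symmetric matrix
    w, V = np.linalg.eigh(K)                    # eigenvalues in ascending order
    order = np.argsort(w)[::-1]
    return m, V[:, order].T                     # rows = eigenvectors

def compress(x, A, m, l):
    """Equation (1): y_l = A_l (x - m), keeping the first l coefficients."""
    return A[:l] @ (x - m)

def reconstruct(y, A, m, l):
    """Equation (2): x_hat = A_l' y_l + m."""
    return A[:l].T @ y + m
```

With l = n² the reconstruction is exact (up to rounding); smaller l trades error for compression.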
Method
Our goal was a set of basis images (of extension n×n) having the desirable characteristics of the KL ones, but lighter in computational complexity. Thus we considered row and column vectors of dimensionality n by subdividing the image into row and column vectors; we calculated separately a KL orthonormal basis (of size n) for the rows {r_1, ..., r_n}, with mean of all the rows r_M, and for the columns {c_1, ..., c_n}, with mean of all the columns c_M. Computing the eigenvectors involved the inversion of two n×n matrices, one for rows and one for columns. Afterwards, to obtain an orthonormal basis of size n² of basis images n×n, we multiplied every column vector by every row vector (tensor product): c_i r_j′. What is obtained is an orthonormal basis of n² subimages; in fact, by hypothesis
c_i′ c_j = δ_ij,    r_k r_l′ = δ_kl

If ℛ is the operator that produces a row vector starting from a matrix, we can write

ℛ(c_i r_j) [ℛ(c_k r_l)]′ = (c_i′ c_k)(r_j r_l′) = δ_ik δ_jl

while

ℛ(c_i r_j) [ℛ(c_i r_j)]′ = (c_i′ c_i)(r_j r_j′) = δ_ii δ_jj = 1
where δ_ij is the Kronecker delta. To obtain an ordering for the significance of the obtained basis images, we multiply the corresponding eigenvalues, obtaining a fictitious eigenvalue for each basis image. The mean to use when applying equations (1) and (2) can be either the mean of the n×n subimages or the mean of the mean vectors r_M and c_M, calculated in this way:
m_ij = (r_Mi + c_Mj) / 2    (3)
where r_Mi is the i-th pixel of r_M and c_Mj is the j-th pixel of c_M. We obtain a new separable transformation, derived from KL, that requires less overhead information transfer (only 2n vectors of dimensionality n plus their eigenvalues) and has a slower complexity growth when increasing the subimage dimensions with respect to bidimensional KL, but has the drawback of lower accuracy when using the same number of coefficients. We compared this method with the DCT, and we noted (in our preliminary tests) that when we used only 8% of the coefficients for subimages of size 8×8, the proposed method performed better than the DCT with respect to the mean square error (4) and the relative m.s.e. (5):
mean square error = ( Σ_{all pixels} (f′(x, y) − f(x, y))² ) / #all_pixels    (4)

relative mean square error = ( Σ_{all pixels} [ (f′(x, y) − f(x, y)) / f(x, y) ]² ) / #all_pixels    (5)
where f′(x, y) is the reconstructed and quantized (pixel values converted to integers) image. To compare both methods, one should determine the distortion functions (m.s.e. and relative m.s.e.) for equal bit rates. This comparison is not possible in a precise sense, since the Huffman source coding of the same number of coefficients can vary in run length, and therefore in bit rate. We thus base our comparison on an equal number of coefficients, all of which should however be sufficiently well represented in the two-byte integer format we used. Moreover, when the image was oversampled (i.e. a pixel was set equal to three of its neighbours), the proposed method performed better than the DCT whatever number of coefficients was used when n=8, and in almost all cases when n=16. This can be explained by noting that the DCT exploits general characteristics of images (our eyes are not very sensitive to high-frequency distortions), while the proposed method is optimized for high performance on the image under examination: what is needed and obtained is a lower complexity, with respect to bidimensional KL, in calculating eigenvalues and eigenvectors. In addition, if images with high spectral components are examined, the proposed method will perform better than the DCT because of its adaptivity to the image it examines. Another important aspect is that to obtain higher compression ratios it is necessary to use larger subimages (i.e. increasing n), and the proposed method is faster than bidimensional KL, especially with large n (n = 16, 24, ...). The testing of the method was performed using MATLAB® [8], a software package that allows fast prototyping of mathematical models.
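The tensor-product construction described above can be sketched as follows; the splitting of the image into n-pixel row and column segments and the significance ordering by fictitious eigenvalues follow the text, while the function names are ours.

```python
import numpy as np

def kl_1d(vectors):
    """1-D KL basis: eigen-decomposition of the covariance of n-vectors."""
    X = np.asarray(vectors, dtype=float)
    mean = X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    order = np.argsort(w)[::-1]
    return mean, w[order], V[:, order].T        # rows = eigenvectors

def separable_basis(image, n):
    """Basis images c_i r_j' built from the row and column KL bases."""
    rows = image.reshape(-1, n)                 # n-pixel row segments
    cols = image.T.reshape(-1, n)               # n-pixel column segments
    r_mean, r_vals, R = kl_1d(rows)
    c_mean, c_vals, C = kl_1d(cols)
    basis, fictitious = [], []
    for i in range(n):
        for j in range(n):
            basis.append(np.outer(C[i], R[j]))  # n x n basis image (tensor product)
            fictitious.append(c_vals[i] * r_vals[j])
    order = np.argsort(fictitious)[::-1]        # significance ordering
    return [basis[k] for k in order], r_mean, c_mean
```

Since the row and column bases are each orthonormal, the n² outer products form an orthonormal basis of the space of n×n subimages, as proved above.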
Results
We present some results obtained applying the proposed method and the DCT to images of size 512×512 with 256 grey levels. The subimages are of size 8×8 and 16×16. In Figure 1(a) and (b) the behaviour of the m.s.e. versus the number of retained coefficients is reported when the transformations are based on subimages of size 8×8 (i.e. n=8). In Figure 2 the same variables are shown for subimages of dimension 16×16 (i.e. n=16). Note that in these figures errors were computed without rounding the coefficients, in order to analyze the capability of the methods to compact the energy into few coefficients. If the coefficients were rounded, the error would be slightly increased and the corresponding compression ratios would be those reported in Table 1 and Table 2. Obviously the compression ratio is the same both for the KL-based method and the DCT method (not taking into account, for KL, the little overhead due to the eigenvectors, eigenvalues and mean subimage). If we fix the error, then the KL method (in Figures 1(b) and 2(b), for example) will use fewer coefficients and so will have a higher compression ratio.

Table 1: Compression ratio with n=8
No. of coefficients:   2    4    8   10   16
Compression ratio:    16    8    4  3.2    2

Table 2: Compression ratio with n=16
No. of coefficients:   2    4    8   10   16
Compression ratio:    64   32   16 12.8    8
The first image considered is the classical Boat. The second image is a Nuclear Magnetic Resonance image of size 256×256 enlarged to 512×512 by means of pixel replication. We note in the graphs that the behaviour of the errors of the two transformations is similar (in Figures 1(a) and 2(a)), and better for the KL-based transform (in Figures 1(b) and 2(b)). Compatible qualitative results are obtained by visually inspecting the reconstructed images as the number of retained coefficients is reduced. We performed a time test of the classical KL transform versus the monodimensional KL transform using the tic & toc functions of MATLAB®. The test was performed on a 120 MHz Pentium running Windows 95. For 8×8 subimages the classical method computed the basis images in 10.93 seconds (average value) while the new method computed them in 5.1 seconds (average value). For 16×16 subimages the classical method computed the basis images in 55.91 seconds (average value) while the new method computed them in 7.9 seconds (average value).
Figure 1: The m.s.e. values in reconstructing the Boat (a) and NMR (b) images using subimages of dimension 8×8.
Figure 2: The m.s.e. values in reconstructing the Boat (a) and NMR (b) images using subimages of dimension 16×16.
Acknowledgements This work has been supported by the national project of MURST "Sviluppo di una workstation multimediale ad architettura parallela". The authors thank prof. A. Werbrouck for critical comments and textual suggestions.
References
[1] R. C. Gonzalez and P. Wintz. Digital Image Processing. Addison-Wesley, 1987.
[2] F. G. Horowitz, D. Bone and P. Veldkamp. Karhunen-Loeve based Iterated Function System encodings. In International Picture Coding Symposium, Melbourne, March 1996.
[3] K. R. Rao and P. Yip. Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press, Inc., San Diego, 1990.
[4] A. Rosenfeld and A. C. Kak. Digital Picture Processing, vol. 1, 2nd ed. Academic Press, New York, 1982.
[5] C. A. L. Szuberla. Discrete Karhunen-Loève Transform. http://foo.gi.alaska.edu/-cas, DRAFT.
[6] G. K. Wallace. The JPEG still picture compression standard. Communications of the ACM, 34(4), 1991.
[7] A. M. Yaglom. An Introduction to the Theory of Stationary Random Functions. Prentice Hall, 1962.
[8] The MathWorks. MATLAB Reference Guide. The MathWorks, Inc., Natick, MA, 1992.
BLOCKNESS DISTORTION EVALUATION IN BLOCK-CODED PICTURES
M. Cireddu, F.G.B. De Natale, D.D. Giusto, and P. Pes
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d'Armi, Cagliari 09123, Italy. [email protected]

Abstract
In this paper, some of the most significant image quality indexes are reviewed and compared with a new method for block distortion evaluation. At first, a survey is given of classical measures based on numerical differences between original and reconstructed image data (e.g., MSE and SNR), as well as advanced methods aiming at considering the perceptive aspects of image degradation (e.g., Hosaka plots, HVS-based methods). Then, four innovative methods for blockness distortion evaluation are described, based on DCT analysis or on the use of gradient operators.

1. Objective Distortion Measures
The most classical distortion measure is the Mean Square Error (MSE) between the original image and the decoded one. It measures punctual variations of the image intensity by averaging the squared differences between couples of corresponding pixels:

MSE = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} [f(i, j) − f_r(i, j)]²
The Signal-to-Noise Ratio (SNR) and the Peak Signal-to-Noise Ratio (PSNR) can be directly derived from the MSE by using the following equations, which treat the distortion introduced by the coding-decoding operation as a kind of noise:

SNR = σ_x² / MSE,    PSNR = (2^b)² / MSE

with

σ_x² = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} (f(i, j) − f̄)²,    f̄ = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} f(i, j)
where f(i, j) is the original grey level of the (i, j)-th pixel, f_r(i, j) is the reconstructed grey level, b is the number of bits per pixel, and m, n are the image dimensions. These measures provide a global estimation of the image distortion after the coding-decoding process.

2. Advanced Methods
In this section, three of the most interesting image distortion measures are briefly reviewed, which differ from the above in the sense that human perception parameters are taken into account.
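The objective measures of Section 1 are straightforward to compute; the following sketch expresses PSNR in decibels against the usual peak value 2^b − 1, a common variant of the ratio defined above.

```python
import numpy as np

def mse(f, fr):
    """Mean square error between original f and reconstruction fr."""
    f, fr = np.asarray(f, float), np.asarray(fr, float)
    return np.mean((f - fr) ** 2)

def snr(f, fr):
    """SNR = variance of the original signal over the MSE."""
    f = np.asarray(f, float)
    return f.var() / mse(f, fr)

def psnr(f, fr, b=8):
    """PSNR in dB against the peak of a b-bit image (2^b - 1)."""
    return 10 * np.log10((2 ** b - 1) ** 2 / mse(f, fr))
```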
2.1 Hosaka Plots
The evaluation process consists of first segmenting (splitting) the N×N blocks of the original image into k classes. The initial block size N is usually chosen as 16, thus leading to 5 classes: all blocks of size k = 1, 2, 4, 8, 16 form the k-th class. From each class, two feature vectors are calculated, respectively based on the average standard deviation and on the weighted mean
where the elements marked with '*' refer to the reconstructed images. The error diagram, or H-plot, is constructed by plotting the corresponding features dS_k and dM_k in polar coordinates. The area of the H-plot is proportional to the image degradation; in particular, the presence of noise and blurring effects is put in evidence by looking at the left and right sides of the plot.

2.2 Information Content (IC)
This method is based on the evaluation of the perceptual distortion and therefore takes into account the characteristics of the human visual system (HVS) model. It consists of five stages: (i) the original image is re-mapped by a non-linear transformation; (ii) a linear transformation in the DCT domain is
applied to 8×8 image blocks; (iii) a matrix of coefficients is calculated at fixed resolution; (iv) the DCT coefficients are multiplied by the weights; (v) IC is determined by summing the coefficient magnitudes.

2.3 Perceptual distortion measure
The perceptual distortion measure is based on an empirical model of the human perception of spatial patterns. The model consists of four stages: (i) front-end linear filtering, (ii) squaring, (iii) normalization, and (iv) detection. A steerable pyramid transform decomposes the image locally into several spatial frequency levels; each level is further subdivided into a set of orientation bands θ ∈ (0, 45, 90, 135) degrees. The front-end linear transform yields a set of coefficients A_θ for every image region. The squared normalized output is computed, and a simple squared-error norm is adopted as detection mechanism:

R_θ = k A_θ² / ( Σ_{φ ∈ (0, 45, 90, 135)} A_φ² + σ² )

where k is a scaling constant, σ a saturation value, and R_ref, R_dist are the response vectors of the original and distorted images.
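The squaring-plus-normalization stage and the squared-error detection can be sketched minimally as follows; function names and default constants are illustrative, not the authors'.

```python
import numpy as np

def normalized_responses(A, k=1.0, sigma=0.1):
    """Divisive normalization: R_theta = k * A_theta^2 / (sum_phi A_phi^2 + sigma^2)."""
    A = np.asarray(A, float)
    return k * A ** 2 / (np.sum(A ** 2) + sigma ** 2)

def detection(R_ref, R_dist):
    """Squared-error detection mechanism between response vectors."""
    return np.sum((np.asarray(R_ref) - np.asarray(R_dist)) ** 2)
```

The saturation constant σ prevents division by zero in flat regions and models response saturation.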
3. Blockness distortion measures
Block distortion, or tiling effect, is typical of any kind of block-based coding system. It consists of an annoying visual mosaic effect produced by the imperfect matching of neighboring approximated blocks. Some image coding approaches reduce this drawback by using appropriate overlapping or interleaving techniques, but most of the common methods (including the current standards) prefer to ignore the problem for the sake of simplicity. The methods presented hereafter evaluate the amount of this particular but very common image degradation.
3.1 Methods based on DCT analysis
Two block distortion measures based on DCT analysis are considered here. Both are targeted at a particular kind of distortion appearing as a step of the luminance function in the horizontal or vertical direction, and consequently analyse the DCT features looking for this phenomenon. In our tests we considered blocks of size 8×8 at 8 bpp and their DCT coefficient matrices. A block characterised by a horizontal or vertical luminance step presents, in the corresponding coefficient matrix, a predominance in the first column or row. A block that has a double step, horizontal and vertical, has in the corresponding DCT matrix null elements (magnitude < 10⁻⁶, thus negligible) on the odd rows and columns (excluding the first ones). A particular case is a block that presents a double step (horizontal and vertical) given by the sum of single steps: because of the linearity of the transformation, the corresponding DCT matrix is the sum of the single-step DCT matrices. To apply these measures, we divide the image into blocks of size 8×8, called reference blocks. Then, the blocks are shifted by half the block dimension from the reference position in the two spatial directions separately. First, we consider the horizontally shifted blocks and their DCT matrices: since the goal is to detect vertical discontinuities, we calculate the sum of the first row's squared entries except the DC coefficient, weighted by a factor proportional to the coefficient position. Then, we consider the vertically shifted blocks, searching for horizontal steps with the same approach. The previous results are integrated in the following expression:

Σ1 = Σ_{(i,j)∈Ξh} (c_ij^DCT)² (i⁴ + j⁴) + Σ_{(i,j)∈Ξv} (c_ij^DCT)² (i⁴ + j⁴)
where Ξh denotes the first row of the horizontally shifted blocks and Ξv the first column of the vertically shifted blocks (both excluding the DC element). The reference blocks are then considered, and the sum of the first row's and column's squared elements is computed, weighted by a factor proportional to the coefficient position, obtaining a second term:
Σ2 = Σ_{(i,j)∈Ξr} (c_ij^DCT)² (i⁴ + j⁴)
where Ξr includes the first row and column of the reference block, excluding the DC element. Finally, the first DCT-based quality measure can be calculated as:

D1 = Σ1 / (Σ1 + Σ2)
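A sketch of this first DCT-based measure follows. The garbled source leaves the exact bookkeeping ambiguous, so two assumptions are made explicit here: coefficient indices are 0-based with the DC term at (0, 0), and a vertical step (seen in the horizontally shifted blocks) excites the first row of the DCT matrix.

```python
import numpy as np
from scipy.fft import dctn

def _blocks(img, n, dy=0, dx=0):
    """Non-overlapping n x n blocks of img, optionally shifted by (dy, dx)."""
    h, w = img.shape
    for y in range(dy, h - n + 1, n):
        for x in range(dx, w - n + 1, n):
            yield img[y:y + n, x:x + n]

def d1_index(img, n=8):
    """First DCT-based blockness measure, D1 = S1 / (S1 + S2)."""
    img = np.asarray(img, float)
    wgt = np.arange(n) ** 4                 # weight ~ (coefficient position)^4
    s1 = s2 = 0.0
    for b in _blocks(img, n, dx=n // 2):    # horizontally shifted: vertical steps
        c = dctn(b, norm='ortho')
        s1 += np.sum(c[0, 1:] ** 2 * wgt[1:])
    for b in _blocks(img, n, dy=n // 2):    # vertically shifted: horizontal steps
        c = dctn(b, norm='ortho')
        s1 += np.sum(c[1:, 0] ** 2 * wgt[1:])
    for b in _blocks(img, n):               # reference blocks: first row + column
        c = dctn(b, norm='ortho')
        s2 += np.sum(c[0, 1:] ** 2 * wgt[1:]) + np.sum(c[1:, 0] ** 2 * wgt[1:])
    return s1 / (s1 + s2) if s1 + s2 > 0 else 0.0
```

On a heavily tiled image the shifted blocks straddle the block boundaries, Σ1 dominates and D1 approaches 1; on a smooth image the shifted and reference blocks look alike and D1 stays near 0.5 or below.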
The second quality measure considers the 8×8 blocks shifted both horizontally and vertically. For each DCT-transformed block, the difference between the mean of the squared coefficients in odd positions Ξo (first row and column excluded) and the mean of the remaining squared coefficients Ξe is computed, both weighted by a factor proportional to the coefficient position, namely:

Σ3 = (1/|Ξo|) Σ_{(i,j)∈Ξo} (c_ij^DCT)² (i⁵ + j⁵) − (1/|Ξe|) Σ_{(i,j)∈Ξe} (c_ij^DCT)² (i⁵ + j⁵)
The second quality measure D2 is obtained by dividing Σ3 by the second term of the previous difference.

3.2 Methods based on Sobel operator
Blockness distortion in block-coded pictures is visually perceivable as a tiling effect. It can also be viewed as the superposition of false contours on the reconstructed image. To identify and measure such an effect, the Sobel operator can be successfully adopted. It is a quadratic filter based on the convolution of the image with two directional gradient masks:

Dx = [ -1  0  1 ]        Dy = [ -1 -2 -1 ]
     [ -2  0  2 ]             [  0  0  0 ]
     [ -1  0  1 ]             [  1  2  1 ]
The first experimented method consists of three steps: (i) subdivide the image into N×N blocks (N typically ranges from 4 to 32), (ii) apply the Sobel operator to the whole picture, and (iii) apply the Sobel operator only to the block boundary pixels. Fig. 1a shows a matrix representation of an image to which the method is to be applied.
Fig. 1. (a) Image considered, where M is the number of columns and N the number of rows of the matrix representation of the image. (b) Representation of the border pels of the blocks of dimension 4×4 contained in the image.
If we denote with Dsx(i, j) and Dsy(i, j) the convolutions of the picture with the two directional gradients, the Sobel operator's global magnitude can be computed as:

Σ_{(i,j)∈P} √( Dsx²(i, j) + Dsy²(i, j) )

where P is the set of all the image pels. Then, we consider a generic block of dimension 4×4 (see Fig. 1b) and apply the Sobel operator only to the pixels belonging to the block boundaries. The result of step (iii) is expressed as

Σ_{(i,j)∈PB} √( Dsx²(i, j) + Dsy²(i, j) )

where PB contains all the block border pixels. By dividing the latter result by the former, we obtain the first block distortion index SD:

SD = Σ_{(i,j)∈PB} √( Dsx²(i, j) + Dsy²(i, j) ) / Σ_{(i,j)∈P} √( Dsx²(i, j) + Dsy²(i, j) )

The value of SD is in the range [0, 1], being 1 when all the blocks are uniform. A variation of the previous method lies in treating separately the pixels of the vertical block borders and those of the horizontal block borders. The new blockness distortion index is:
SDD = ( Σ_{(i,j)∈Pv} Dsx(i, j) + Σ_{(i,j)∈Po} Dsy(i, j) ) / Σ_{(i,j)∈P} √( Dsx²(i, j) + Dsy²(i, j) )

where Pv contains all the pixels of the vertical block borders, while Po contains all the pixels of the horizontal block borders.
4. Results and Discussion
Several tests were performed on various test images. The following Table I refers to the Lenna image (512×512 pels, 8 bpp). We considered 12 decoded images at different quality factors (the relevant bitrate is given in the table), a low-pass filtered version (5×5 moving average), and a median filtered version (3×3 kernel); the noisy version was obtained by adding Gaussian noise (μ = 0, σ = 5). The remaining three versions were obtained by using a mosaic filter. The PDM and IC indexes are successful in evaluating the subjective distortion but cannot distinguish among different kinds of distortion (blurring, noise, tiling, etc.). The Sobel-based and DCT-based indexes are far better in block distortion evaluation; they are sufficiently low in pictures that do not present blockness distortion (low-pass filtered, median filtered, noise-added image) but have PSNR values lower than those of JPEG-coded pictures. This confirms how these indexes are efficient for the evaluation of the presence of tiling without being sensitive to other types of degradation.

Table I (entries that could not be recovered from the original are marked '—')

Process        RMSE    PSNR(dB)  IC     PDM   SD    SDD   D1    D2    Quality level
Original       0       ∞         23.73  0     0.75  0.63  0.50  0.54  reference
JPEG 1.75bpp   0.760   50.51     23.81  0.17  0.75  0.63  0.50  0.57  very good
JPEG 1.11bpp   1.020   47.97     24.64  0.28  0.75  0.63  0.51  0.58  very good
JPEG 0.85bpp   1.250   46.19     24.95  0.28  0.75  0.63  0.53  0.60  very good
JPEG 0.70bpp   2.520   40.10     25.20  0.54  0.75  0.63  0.55  0.62  very good
JPEG 0.60bpp   3.830   36.47     25.23  1.53  0.75  0.64  0.57  0.65  good
JPEG 0.52bpp   3.930   36.24     25.80  2.06  0.76  0.64  0.58  0.71  good
JPEG 0.43bpp   4.370   35.32     26.28  2.16  0.76  0.65  0.62  0.75  fairly good
JPEG 0.38bpp   4.920   34.30     26.16  2.34  0.76  0.66  0.65  0.77  acceptable
JPEG 0.32bpp   5.260   33.70     25.87  2.49  0.77  0.66  0.67  0.83  acceptable
JPEG 0.27bpp   6.030   32.52     26.36  2.69  0.78  0.68  0.72  0.86  poor
JPEG 0.20bpp   7.330   30.83     27.17  2.91  0.79  0.70  0.80  0.92  very poor
JPEG 0.16bpp   10.53   27.68     27.43  3.29  0.81  0.75  0.91  0.98  —
LP filter      17.99   23.03     34.03  2.41  0.70  0.52  0.04  0.53  fairly good
Add-noise      5.000   34.14     15.26  1.80  0.75  0.63  0.50  0.51  very good
Median         17.45   23.29     31.29  2.23  0.70  0.58  0.16  0.52  good
Mosaic(2×2)    7.150   31.04     30.20  1.81  0.75  0.63  0.51  0.75  acceptable
Mosaic(4×4)    10.50   27.71     35.56  2.88  0.99  1.08  0.51  0.99  poor
Mosaic(8×8)    14.62   24.83     —      —     —     —     —     —     very poor

References
[1] K. Hosaka, "A new picture quality evaluation method", PCS'86, Tokyo, Japan, April 1986.
[2] S. A. Karunasekera, N. Kingsbury, "A distortion measure for blocking artifacts in images based on human visual sensitivity", IEEE Trans. Image Processing, vol. 4, no. 6, pp. 713-724, June 1995.
[3] X. Ran, N. Farvardin, "A perceptually motivated three-component image model - Part I: Description of the model", IEEE Trans. Image Processing, vol. 4, no. 4, pp. 401-415, April 1995.
[4] J. A. Saghri, P. S. Cheatham, A. Habibi, "Image quality measure based on a human visual system model", Optical Engineering, vol. 28, pp. 813-818, 1989.
[5] P. Teo, D. Heeger, "Perceptual image distortion", Proc. 1st IEEE Conference on Image Processing, vol. 2, pp. 982-986, November 1994.
A NEW DISTORTION MEASURE FOR THE ASSESSMENT OF DECODED IMAGES ADAPTED TO HUMAN PERCEPTION
F. Bock, H. Walter, M. Wilde
Darmstadt University of Technology
Institute of Network- and Signal Theory
Merckstraße 25, D-64283 Darmstadt, Germany
E-mail: [email protected]
Abstract
The development of a new distortion measure for the assessment of decoded gray-scale images adapted to human perception is described. The errors are categorized into classes and locally assessed according to a human visual system model. Additionally, the summed-up errors of each class are globally weighted according to the significance of the distortion for a human observer. The combination of these weighted error sums leads to the image distortion measure adapted to human perception (DMHP).
Psychological phenomena of human perception like knowledge-based recognition and image understanding are not incorporated in this quality metric.

1 INTRODUCTION
The growth of data-intensive digital image applications, as in multimedia or video-telephony, makes image and video compression a central problem in digital communication and signal-storage technology. This has led to a wealth of lossy signal compression algorithms based on all sorts of data processing. The primary objective is high compression while maintaining sufficient signal quality. In order to compare different compression algorithms, a quality metric for decoded images is required. The most common quantitative measure is the mean square error (MSE), but it treats all spatial frequencies and brightness levels in the image uniformly, which is not necessarily meaningful, especially when there is a human receiver. A qualitative measure is the rating by trained photo interpreters, obviously a very costly possibility. Desirable would be a tool for automatic quality assessment of imagery which comes to the same results as a human observer.

2 THE HUMAN VISUAL SYSTEM MODEL
For an image distortion measure adapted to human perception we first developed a model of the human visual system [1]. Therefore, various physiological phenomena of the human visual system have been considered like the sensitivity to a background illumination level and to spatial frequencies [2], [3].
Figure 1: Block diagram of the DMHP system.

The physiological phenomena of the human visual system cause different impressions of errors in an image depending on the contents of the image [4]. This is considered in our model by separating the image into three characteristic classes. Figure 1 shows a block diagram of the system. The decoded image is subtracted from the original image, yielding an error image. The original image is separated into three characteristic classes, namely edges, textures and flat regions, to design masks for subdividing the error image. Then the errors within these classes can be assessed individually, depending on physiological aspects of the human visual system. In addition, the errors of each class can be globally weighted according to human perception. The mean square of all assessed and weighted errors results in the new distortion measure DMHP (distortion measure adapted to human perception). The process can be split up into three problems: separation of an image into characteristic classes, assessment of errors within these classes, and globally
weighting of the error sums of each class. The following paragraphs will briefly introduce our solutions.

2.1 Separation of an Image into Characteristic Classes
The level of objection for a human observer caused by errors in a decoded image depends on the region in the image where the error occurs [4]. It is easily understood that an error placed within a wide region of the same gray-scale values will give a different impression than the same error occurring at an edge. Therefore, in our distortion measure system three different masks are made out of the original image for assigning the errors to the respective classes. For the edge mask, the image is first filtered with a 3×3 median filter to emphasize significant edges. Afterwards, difference recursive filtering¹ [5] and thresholding are applied for edge detection. These edges are widened with morphologic operations to make sure that all relevant pixels are included. Detection of texture is gained by Laplacian operators and thresholding [6]. 9×9 median filtering is applied to determine connected regions and to suppress noise. From the texture mask, all pixels which already belong to the edge mask are subtracted in order to consider every pixel just once. All remaining pixels are assigned to the mask for flat regions. For a better understanding, the masks for the image "Lena" are shown in figure 2.
Figure 2: Masks for image "Lena". Top left: original image. Top right: mask for edges. Lower left: mask for textures. Lower right: remaining flat regions.
¹The algorithm was written by the team of Professor Serge Castan. Copyright © 1993, 1994, 1995, Khoral Research, Inc.
2.2 Assessment of Errors According to the Classes
After separating an image into the characteristic classes, the errors occurring within the classes must be assessed according to human perception. All assessment values (multiplication factors applied to the error values) were calculated on the original image and lie within 0...2 (0 = does not disturb at all, 2 = very disturbing). Edges are the most important features of an image because they represent the visual separation of objects. Obviously, the kind of error at edges plays an important role; e.g. smoothed edges may be disturbing, but they will not change the impression as severely as newly arising edges or even new structures. In order to achieve a differentiated assessment adapted to the human visual system, the edge detection is also performed on the decoded image. By concatenating the two edge masks, errors at edges can be additionally categorized into lost edges, i.e. edges which were present in the original image and are missing in the decoded image, changed edges and new edges. Consequently, we can sensitize our measure to new and highly disturbing edges, e.g. blocking structures which arise in JPEG coded pictures at high compression ratios. Finally, a limit of tolerance is set for distinguishing between perceivable and non-perceivable errors in order to neglect the non-perceivable ones [8]. In textures, the correlation between neighbouring pixels is more important than the absolute pixel value. For example, errors which occur in the feather of the "Lena" image will not cause any objection as long as the overall impression of the feather remains the same. Obviously, this effect also depends on the kind of texture: if the same error values that were invisible in the feather occurred in parts of the likewise textured hat, they would be recognized by everyone. A simple but sensible parameter for the assessment of an error value is the local variance².
To deal with a wide range of possible textures, the local variance has to be thresholded and inversely scaled to obtain assessment values between 0.5 (high variance) and 1 (low variance), i.e. errors in parts of an image with a high local variance count less than those in parts with a low local variance. In flat regions, human perception is sensitive to changes in gray scales, but a threshold for visibility can be found. Again, the local variance is used for assessment, i.e. errors which are smaller than the standard deviation³ will be neglected. Furthermore, the local variance increases for a region containing edges, which leads to a lower assessment or even neglect, i.e. the assessment with the local variance evokes the spatial masking of human perception⁴ [8]. It is additionally taken into account that the human sensitivity to noise depends on the background illumination level [9]. In figure 3, the final error assessment values for flat regions of the image "peppers" can be seen.
²The local variance is calculated over a region of 9 × 9 pixels of the original image.
³Standard deviation: σ = √variance.
⁴This means reduced visibility of, e.g., noise on both sides
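The variance-based assessment can be sketched as below; the 9 × 9 window follows footnote 2, while the texture-variance thresholds `lo` and `hi` are hypothetical values chosen only to map onto the stated range 0.5...1:

```python
import numpy as np

def local_variance(img, win=9):
    """Local variance over a win x win neighbourhood (footnote 2 uses 9 x 9)."""
    pad = win // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    # A box filter via cumulative sums would be faster; a loop keeps it clear.
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + win, j:j + win].var()
    return out

def texture_assessment(var, lo=100.0, hi=2000.0):
    """Map local variance inversely onto [0.5, 1]: high variance -> 0.5.

    lo and hi are invented example thresholds, not the paper's values."""
    v = np.clip(var, lo, hi)
    return 1.0 - 0.5 * (v - lo) / (hi - lo)

def flat_assessment(err, var):
    """Neglect errors below the local standard deviation in flat regions."""
    return np.where(np.abs(err) < np.sqrt(var), 0.0, 1.0)
```

Errors below the local standard deviation thus receive assessment value 0 (neglected), and high-variance neighbourhoods count only half as much as low-variance ones.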
Figure 3: The error assessment values for flat regions of the image "peppers" (bright: high assessment value, black: error ignored).
2.3 Global Weighting of the Assessed Error Sums
In figure 4 the percental unvalued error distribution⁵ over the three characteristic classes of a JPEG series of the image "Golden Hill" is depicted. For an increasing compression ratio, the percental amount of detected errors belonging to edges increases exponentially. This corresponds to the arising blocking structures, which are indeed clearly visible and disturbing. Even a simple linear combination of these three unvalued error sums, in which the edges are weighted more strongly (weighting factor 2) than the other two classes (weighting factor 1), yields a clearly better distortion measure than the traditional MSE⁶. The global weighting factors are normalized in such a way that the effective number of pixel values remains unchanged. This normalization assures that the results for different weighting factors are still related to the MSE, so that the DMHP remains comparable with the MSE.
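A compact sketch of the normalized global weighting; the reduction to the MSE for unit weights follows footnote 6, and the per-class inputs are assumed to be the already-assessed squared-error sums:

```python
import numpy as np

def dmhp(err_sq_sums, pixel_counts, weights=(2.0, 1.0, 1.0)):
    """Globally weighted mean of per-class squared-error sums.

    err_sq_sums / pixel_counts hold, per class (edges, textures, flat
    regions), the sum of assessed squared errors and the number of pixels.
    The weights are normalised so that the effective pixel count stays
    unchanged; with unit weights and unit assessment this reduces to the
    ordinary MSE (cf. footnote 6).
    """
    w = np.asarray(weights, float)
    s = np.asarray(err_sq_sums, float)
    n = np.asarray(pixel_counts, float)
    return float((w * s).sum() / (w * n).sum())
```

With weights (1, 1, 1) the value equals the plain MSE of all pixels; with the edge weight 2 used in the text, edge errors dominate without changing the effective pixel count.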
Figure 4: Percental unvalued error distribution of the image "Golden Hill" belonging to the different classes, depending on the JPEG compression ratio.

3 RESULTS AND CONCLUSIONS
The image distortion measure was tested on various images including "Lena", "Golden Hill", "Mandrill" and "Peppers" (figure 5), which were compressed with different coding techniques (e.g. JPEG, wavelet coder). In addition, artificially generated test images were used to verify the assessment adapted to human perception. The separation of images into three characteristic classes worked successfully even for extremely different images. In figure 6 the amount of pixels allocated
of a large change in the background luminance.
⁵Error distribution without assessment within the classes.
⁶The MSE is equal to the sum of the three unvalued and equally weighted error sums!
to the three characteristic classes is given for some of the original test images.
Figure 5: The images "Golden Hill", "Mandrill" and "Peppers".
Figure 6: Pixel allocation for the three characteristic classes: edges, texture (pre_texture: texture before subtracting the edge pixels) and flat regions.

To compare the results of the new DMHP with the MSE, a JPEG compression series of the image "Golden Hill" is shown in figure 7. For a small compression ratio, when the quality of the image is not visibly decreased, the DMHP yields even smaller values than the MSE, while it yields clearly higher values when the distortion becomes visible. In conclusion, one can say that a clear adaptation to the subjective impression of a human observer has already been achieved, even without an optimization of the assessment and weighting parameters. In figure 8
edges and edges within textures should result in a still higher performance.
References
[1] M. B. Barlow, "Understanding natural vision", in Physical and Biological Processing of Images, O. J. Braddick and A. C. Sleigh, eds., Springer-Verlag, 1983.
Figure 7: DMHP and MSE for the JPEG compressed image "Golden Hill".
two images with the same MSE are shown: at the top the image degraded by JPEG compression, and at the bottom the image degraded by added white Gaussian noise. Obviously, the results of the DMHP are better adapted to a human observer. In the case of the JPEG compressed image, the highly disturbing block structures lead to a high value of the DMHP (even higher than the normal MSE). In contrast, for errors hardly perceivable even to the alert eye of a human observer, the value of the DMHP is low and remains clearly below the MSE.
[2] N. Jayant, J. Johnston and R. Safranek, "Signal Compression Based on Models of Human Perception", Proceedings of the IEEE, Vol. 81, No. 10, pp. 1383-1422, 1993.
[3] J. A. Saghri, "Image quality measure based on a human visual system model", Optical Engineering, Vol. 28, No. 7, July 1989.
[4] W. Xu and G. Hauske, "Perceptually relevant error classification in the context of picture coding", Image Processing and Its Applications, Conference Publication No. 410, pp. 589-593, 1995.
[5] Retro-Manual of Khoros 2.0.2, Khoral Research Inc., 1995.
[6] H. Ernst: Die digitale Bildverarbeitung. Franzis-Verlag: München, 1991.
[7] J. S. Goodman and D. E. Pearson, "Multidimensional scaling of multiply-impaired television pictures", IEEE Trans. Syst., Man, Cybern., Vol. 9, 1979.
[8] A. N. Netravali and B. G. Haskell: Digital Pictures. New York: Plenum Press, 1995.
[9] H.-M. Hang and J. W. Woods: Handbook of Visual Communications. Academic Press: San Diego, 1995.
[10] J. L. Mannos and D. J. Sakrison, "The Effects of a Visual Fidelity Criterion on the Encoding of Images", IEEE Transactions on Information Theory, Vol. IT-20, No. 4, April 1974.
Figure 8: Two images with the same MSE of 59. Top: JPEG compressed; here the DMHP yields 72. Bottom: image with added white Gaussian noise and a DMHP of 45.

For future work, an optimization of the assessment and weighting parameters has to be performed by comparing the results with those of different human observers. In addition, a more detailed analysis of texture and a better distinction between significant
[11] N. B. Nill, "A Visual Model Weighted Cosine Transform for Image Compression and Quality Assessment", IEEE Transactions on Communications, Vol. COM-33, No. 6, June 1985.
[12] X. Ran and N. Farvardin, "A Perceptually Motivated Three-Component Image Model: Parts I & II", IEEE Trans. Image Proc., Vol. 4, No. 4, pp. 401-415 and 430-447, April 1995.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Image Compression with Interpolation and Coding of Interpolation Errors
Jian Yi and Ferdinand Arp
Department of Electrical Engineering, University of Wuppertal, D-42097 Wuppertal, Germany

Abstract
This paper presents a new coding scheme for still image compression which works in the spatial domain with consideration of the human visual system (HVS). The method transmits a subsampled image, restores it by interpolation and corrects residual visible interpolation errors by transmission of additional information. Further redundancy and irrelevancy reduction is achieved by combining the above method with the BIGCHAIR technique. The results achieve reduction rates which are comparable to those of transform coding, but the implementation is much simpler.
1 Introduction
Image coding with classical DPCM has the disadvantage that the HVS is insufficiently considered. Thus its data reduction rate remains far behind that of transform coding techniques, although these do not consider redundancy reduction. On the other hand, DPCM has the inherent advantage of a very simple implementation. The coding scheme presented in this paper is designed on the basis of DPCM techniques, including an extensive consideration of the HVS. This scheme works on the following principles: Image areas in which the intensity is constant or varies only slowly can be sufficiently represented by locally sparsely distributed pixels. These pixels can be uniformly subsampled from the image matrix. The omitted pixels can subsequently be interpolated by the receiver. The generated interpolation errors remain invisible. Around edges and other areas where the image intensity changes rapidly, interpolation introduces large and partly visible errors. These should be corrected to achieve satisfactory reconstructions. Fortunately, the HVS is much less sensitive to errors in high activity areas than in smooth areas. This makes it possible to correct visible errors in high activity areas with only limited accuracy, and thus with few bits, in order to make them invisible. Thus we can represent an image with fewer bits in slowly varying areas, where interpolation does a good job, and correct the residual visible interpolation errors in high activity areas, where the human eye is less sensitive, with a few additional bits. An improved irrelevancy reduction can be introduced to DPCM by using the BIGCHAIR technique extension [1][2][3]. This technique has been successfully used for data reduction of still as well as moving images [4][5]. It consists of a blurring filter before and an inversely deblurring one after a standard closed-loop DPCM codec. The blurring filter is a non-ideal low-pass which attenuates small details of the image but does not remove them completely.
The deblurring filter works inversely to its blurring counterpart. The function of these filters is to raise the high spatial frequencies of the DPCM quantization errors in the decoded image. In this way the quantization noise is better matched to the HVS and becomes irrelevant. Moreover, this procedure allows access to the blurred image, which enables extended visually relevant processing. We adopt this pair of filters in our system so that all reconstruction errors generated in the coding process can be spectrally shaped.
2 System Description
These principles lead to the image coding system depicted in Fig. 1. The original image is first filtered by the multidimensionally acting blurring filter so that the details of the image content are attenuated. Subsequently, it is uniformly subsampled with ratio D = 2 in both the horizontal and vertical direction. A DPCM encoder is applied to the subsampled and blurred image in order to reduce redundancy as well as irrelevancy. The DPCM encoded differential signal is transmitted to the receiver, which DPCM decodes the received signal and interpolates the omitted pixels by means of a linear filter. The procedure of uniform subsampling and interpolation of the whole blurred image does not
distinguish between its smooth and high activity areas.

Fig. 1 Block diagram of the coding system (encoder: blurring filter, subsampler D:1 and DPCM encoder, with a local DPCM decoder, interpolator 1:D and encoder for the blurred error image; decoder: DPCM decoder, interpolator 1:D, addition of the correction information and deblurring filter).

The described scheme is easily implemented, but some visible interpolation errors remain in high activity areas of the inversely filtered reconstruction. This vestige of visible interpolation errors must be corrected in order to make them visually irrelevant in the final reconstruction. To do so, we first generate a blurred error image at the transmitter side. It is the difference between the blurred original image and its blurred interpolated replica. The correction information is obtained from this error image and is also transmitted to the receiver. The receiver adds this correction information to the interpolated image. Finally, it is inversely filtered to obtain the deblurred and visually relevant reconstruction. Several methods can be used to deduce the correction information from the error image. In this paper we studied the following coding methods of the error image for the elimination of the residual relevant reconstruction errors:
• DPCM coding. We studied the correlation characteristics of the blurred error image with respect to its matched DPCM coding, designed the predictor of the codec and experimentally adjusted the quantization characteristic of the encoder.
• Subsampling and interpolation. The blurred error image is also uniformly subsampled and non-linearly quantized before it is transmitted. The receiver interpolates the blurred and subsampled error image by means of a linear filter. This method halves the total amount of sampling points.
• Simple quantization. The blurred error image is directly quantized with an experimentally chosen quantization characteristic, without any further processing.
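A toy sketch of the subsample/interpolate/error-image chain (blurring and DPCM omitted); zero-order-hold upsampling is a crude stand-in for the optimised linear interpolator derived in section 2.2:

```python
import numpy as np

D = 2  # subsampling ratio in both directions, as in the paper

def subsample(img, d=D):
    """Uniform subsampling of the (blurred) image."""
    return img[::d, ::d]

def interpolate_zoh(sub, d=D):
    """Zero-order-hold upsampling; a stand-in for the Wiener interpolator."""
    return np.repeat(np.repeat(sub, d, axis=0), d, axis=1)

rng = np.random.default_rng(1)
img = rng.uniform(0, 255, (64, 64))
rec = interpolate_zoh(subsample(img))
error_image = img - rec       # correction information is derived from this
```

At the transmitted sample positions the error image is exactly zero; the remaining (interpolated) positions carry the residual errors that the correction step must make visually irrelevant.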
2.1 Blurring and deblurring filters
The blurring and deblurring filters are identical to those in the BIGCHAIR-DPCM system described in [1]. The blurring filter is a recursive one with the transfer function, written in one dimension for simplicity,

$$U(z) = K\,(1 - a(z))^{-1},$$

where $a(z)$ is a transversal filter with identical coefficients $A_k = A$ for all $k$ ($0 < k \le M$), so that its transfer function becomes $a(z) = \sum_{k=1}^{M} A z^{-k}$. This so-called equiweighting aperture is the result of a derivation to achieve the minimum reconstruction error power [1]. The factor $K$ is chosen in such a way that any constant input signal will not be attenuated by the blurring filter. This requires the condition $A = (1 - K)/M$. On the other hand, the high
frequency amplitudes of the input signal are approximately attenuated by the factor $K$ ($0 < K < 1$). The deblurring filter exactly realises the reciprocal transfer function of the blurring filter, so that the original signal can be completely recovered when the error-free blurred signal is passed through the deblurring filter.

2.2 Subsampling and interpolation of the blurred image
The subsampling scheme used for the blurred image is depicted in Fig. 2. The linear interpolator for recovering the omitted samples is determined by using the correlation functions of both the blurred image and the superimposed quantization noise of the DPCM encoder as a priori information. The optimisation of the interpolator is carried out using the LMS error criterion. It turns out to be a classical Wiener filter extended to the interpolation of a signal from noisy samples. The following normal equation system can be derived for the determination of the interpolator coefficients $\{l_{kD+n}\}$:

$$\sum_{k=-\infty}^{\infty} l_{kD+n}\left(R_{(l-k)D} + Q_{(l-k)D}\right) = R_{lD+n}, \qquad 0 < n < D, \text{ for all } l,$$

where $R_{lD+n}$ is the covariance function of the blurred signal, while $R_{(l-k)D}$ and $Q_{(l-k)D}$ are the covariance functions of the subsampled signal and the superimposed quantization noise, respectively. The z-transform of the above equation results in

$$i(z) = \frac{r(z)}{r_D(z^D) + q_D(z^D)},$$

where $i(z)$ is the transfer function of the interpolator, $r(z)$ is the spectral power density of the blurred image, and $r_D(z^D)$ and $q_D(z^D)$ are the spectral power densities of the subsampled signal and the additive noise, respectively.

Fig. 2 Subsampling of the blurred image (• subsampling positions of the blurred original image).
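In one dimension, the blurring/deblurring pair of section 2.1 can be sketched as follows; K = 0.5 and M = 4 are example values, not the authors' settings:

```python
import numpy as np

def blur(x, K=0.5, M=4):
    """Recursive blurring filter U(z) = K (1 - a(z))^-1 with the
    equiweighting aperture a(z) = sum_{k=1..M} A z^-k, A = (1 - K)/M,
    and zero initial state."""
    A = (1.0 - K) / M
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = K * x[n] + A * sum(y[n - k] for k in range(1, M + 1) if n - k >= 0)
    return y

def deblur(y, K=0.5, M=4):
    """Exact reciprocal (1 - a(z)) / K of the blurring filter."""
    A = (1.0 - K) / M
    x = np.empty(len(y))
    for n in range(len(y)):
        x[n] = (y[n] - A * sum(y[n - k] for k in range(1, M + 1) if n - k >= 0)) / K
    return x

step = np.concatenate([np.zeros(8), np.ones(32)])  # a step edge
blurred = blur(step)
recovered = deblur(blurred)   # error-free case: exact recovery
```

The cascade is the identity in the error-free case, and a constant input settles to gain 1, which is exactly the condition A = (1 − K)/M; the step transition itself is smeared, i.e. high frequencies are attenuated by roughly the factor K.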
2.3 Cyclostationarity of the interpolation error image
On the assumption that the input signal is a stationary process, the interpolation error will be a cyclostationary process. This means that its covariance function depends on the shift $n$ of the starting point of the interpolated subsequence:

$$E\{\Delta S_{iD+n}\,\Delta S_{(i-j)D+n-m}\} \neq E\{\Delta S_{iD+n'}\,\Delta S_{(i-j)D+n'-m}\}, \quad \text{for } n \neq n',$$

where $\{\Delta S_{iD+n}\}$ is the interpolation error sequence. On the other hand, this covariance is a periodic function of the starting point $n$ with period $D$:

$$E\{\Delta S_{iD+n}\,\Delta S_{(i-j)D+n-m}\} = E\{\Delta S_{(i+k)D+n}\,\Delta S_{(i+k-j)D+n-m}\}, \quad \text{for any } k.$$

Each subsequence $\{\Delta S_{iD+n}\}$ for any fixed starting point $n$ ($0 < n < D$) is stationary:

$$E\{\Delta S_{iD+n}\,\Delta S_{(i-j)D+n}\} = E\{\Delta S_{(i+k)D+n}\,\Delta S_{(i+k-j)D+n}\}.$$

The covariance function of the interpolation error sequence can be determined from the covariance function $R_{jD+m}$ of the blurred signal and the interpolator coefficients $\{l_{kD+n}\}$ by application of the normal equations:

$$E\{\Delta S_{iD+n}\,\Delta S_{(i-j)D+n-m}\} = R_{jD+m} - \sum_{k=-\infty}^{\infty} l_{kD+n}\,R_{(k-j)D+n-m}, \qquad 0 \le m < n.$$
2.4 DPCM coding of the blurred interpolation error image
In the following sections we discuss operations applied to the difference image between the blurred original image and the DPCM decoded and subsequently interpolated blurred image. First we describe DPCM coding of the interpolation error image, which is used to reduce its redundancy as well as its irrelevancy. Due to the cyclostationary nature of the interpolation error, different predictors for DPCM coding of the interpolation error image must be designed for each of the interpolation error subsequences. There are four such subsequences in our case, with subsampling ratios D = 2 in both the horizontal and vertical direction. Three of them are essential concerning the correlation of the interpolated pixel values. The predictors for these three subsequences are designed according to the LMS error criterion using the knowledge of the covariance functions of the interpolation error as well as of the DPCM quantization noise.
2.5 Subsampling and interpolation of the blurred interpolation error image
In this section we discuss subsampling, quantization and interpolation of the interpolation error image. The subsampling scheme of the error image is depicted in Fig. 3. Different interpolators must be designed for different subsequences to match the cyclostationary interpolation error sequence. The error subsequence at the existing subsampling positions of the blurred image is not needed. The remaining two interpolators are designed according to the LMS error criterion with the covariance functions of the interpolation error and its quantization noise. This interpolation only incompletely restores the whole error image. However, experimental results show that this erroneously interpolated error image suffices for the visually relevant correction of the interpolation error in the interpolated blurred image. The reason is that the HVS is not sensitive to errors in high activity areas.
Fig. 3 Subsampling of the error image (• subsampling positions of the blurred original image, ▲ subsampling positions of the blurred error image).
3 Results and Conclusions
Computer simulations of the described coding procedures have been carried out. The smallest transmission rate is achieved when applying the method of subsampling and interpolation to the error image (section 2.5). The test image PLAYBOY can be encoded with a transmission rate of 0.93 bits/pel without visible errors. Applying DPCM coding to the error image (section 2.4) results in a transmission rate of 1.1 bits/pel. When using simple quantization of the error image, the whole image is encoded with 1.2 bits/pel without visible errors. Generally, the achieved coding efficiency is comparable to that of transform coding techniques. Our implementation, however, is much simpler.
References:
[1] F. Arp, "BIGCHAIR-DPCM, a new method for visually irrelevant coding of pictorial information", in Proceedings of the 1988 IEEE International Symposium on Circuits and Systems, Espoo (Finland), June 7-9, 1988, pp. 231-234.
[2] F. Arp, "System properties of BIGCHAIR-DPCM compared with other coding schemes for data reduction of visual information", in Proceedings of the IEEE Workshop on Visual Signal Processing and Communications, Raleigh (NC, USA), Sept. 2-3, 1995, pp. 183-187.
[3] F. Arp, "DPCM extension considering the human visual system", in Proceedings of the IEEE Workshop on Visual Signal Processing and Communications, Rutgers University, Piscataway (NJ, USA), Sept. 19-20, 1994, pp. 120-125.
[4] F. Arp and J. Wassermann, "Irrelevant data reduction by BIGCHAIR-DPCM", in Proceedings of the 1991 Picture Coding Symposium, Tokyo, Sept. 2-4, 1991, pp. 147-150.
[5] J. Wassermann, DPCM-Kodierung von Bildsequenzen mit Irrelevanzreduktion durch Unscharffilterung. VDI Fortschrittberichte, Reihe 10, Informatik/Kommunikationstechnik Nr. 219, VDI Verlag, Düsseldorf, 1992.
Matrix to vector transformation for image compression

Djamel Ait-Boudaoud
Department of Electronics, Bournemouth University, Talbot Campus, Fern Barrow, Poole, Dorset BH12 5BB, United Kingdom
Email: dj [email protected]

Abstract
This paper presents a new algorithm for approximating a rectangular matrix (Z) with a new matrix (P) derived from the product of a column vector (X) and a row vector (Y). The result of this transformation enables the compression of images from N² to 2N values, where N is the size of the sub-image block. The problem is also solved using neural networks, and a comparative analysis of both solutions is provided.
1 Introduction
The large amount of data used for the representation of digital images is still a significant concern in many applications as large channel bandwidths and high capacity storage devices are required for their transmission or storage in their original form (raw data). Techniques for reducing inherent data redundancy existing in many digital pictures are being continuously researched and developed. The proposed method is based on a well-known mathematical concept of constructing a rectangular matrix from the product of equal length column and row vectors. The approach introduced in this paper considers the reverse process, i.e. starting from a rectangular matrix (Z), construct the approximate column vector (X) and row vector (Y) to closely restore the original matrix (X.Y~Z). This method, in many instances, will lead to an over-determined problem as the number of known data far exceeds the number of unknown variables. Hence approximation techniques are used to solve the problem. Section 2 presents a brief overview of image compression techniques. Section 3 details the mathematical approximation solution. Results and experiments using the proposed algorithm are provided in section 4. Section 5 contains neural nets solution together with results and comparative analysis. Conclusions are provided in section 6.
2 Image Compression Techniques
A number of image compression schemes have been developed; examples include transform coding [1], Differential Pulse Code Modulation [2], hierarchical image decomposition and vector quantisation [3], etc. The primary objective of compression techniques is to reduce the average bits per pixel for transmission or storage, whilst minimising the distortion of information and preserving the quality of the reconstructed images. In general, the techniques used in compression are performed in two phases. The first phase consists of mathematically transforming images into new representations suitable for compression. The second phase is concerned with encoding the new representations. Typical examples of mathematical transformation include the conversion from the spatial domain to the frequency domain (the DCT is a typical example), whilst encoding methods include Huffman and run-length encoding.
3 Proposed mathematical algorithm
The proposed algorithm falls into the first phase of the compression process. Essentially, the algorithm is based on the assumption that each square sub-block in an image represents a square matrix (Z) whose values can be computed from the product of a column vector (X) and a row vector (Y) such that,
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} (y_1 \ \cdots \ y_m) = \begin{pmatrix} z_{11} & \cdots & z_{1m} \\ \vdots & & \vdots \\ z_{n1} & \cdots & z_{nm} \end{pmatrix} \qquad (1)$$
Assume that the best approximate fit is measured by minimising the sum of the linear separations for all elements of XY − Z. For this, we compute the sums of the columns and rows,

$$c_i = \sum_{j=1}^{n} z_{ji}, \qquad r_i = \sum_{j=1}^{m} z_{ij}. \qquad (2)$$

Then the $c_i$ and $r_i$ are made to be in the same ratio to each other as the corresponding $y_i$ and $x_i$, such that

$$\frac{c_1}{c_2} = \frac{y_1}{y_2},\ \frac{c_1}{c_3} = \frac{y_1}{y_3},\ \ldots,\ \frac{c_1}{c_m} = \frac{y_1}{y_m}, \qquad (3)$$

$$\frac{r_1}{r_2} = \frac{x_1}{x_2},\ \frac{r_1}{r_3} = \frac{x_1}{x_3},\ \ldots,\ \frac{r_1}{r_n} = \frac{x_1}{x_n}. \qquad (4)$$
The next stage consists of selecting any non-zero $c_i$ and $r_i$ (say $c_1$ and $r_1$) and expressing all members of X and Y as multiples of $x_1$ and $y_1$ as follows,

$$X^t = (x_1, x_2, \ldots, x_n) = \left(x_1,\ x_1\frac{r_2}{r_1},\ \ldots,\ x_1\frac{r_n}{r_1}\right), \qquad (5)$$

$$Y = (y_1, y_2, \ldots, y_m) = \left(y_1,\ y_1\frac{c_2}{c_1},\ \ldots,\ y_1\frac{c_m}{c_1}\right). \qquad (6)$$
Then the error on each matrix element can be expressed by

$$e_{ij} = w\,\frac{r_i c_j}{r_1 c_1} - z_{ij}, \qquad (7)$$

where $w$ is the value of $x_1 y_1$ that makes $e_{ij} = 0$. For $n = m = N$, the total error for each $w$ is evaluated by

$$MSE(w) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} (e_{ij})^2. \qquad (8)$$

It can be seen that there will be $N^2$ values of $w$ and $MSE(w)$. The value of $w$ that gives the minimum MSE is the best solution to the problem. Once $w$ is chosen, $x_1$ and $y_1$ are derived such that $w = x_1 y_1$, and the remaining terms of the vectors X and Y are obtained using equations (5) and (6).
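The algorithm of equations (2)-(8) can be sketched for one block as follows (a sketch under the assumption of strictly positive pixel values, so that all row and column sums are non-zero):

```python
import numpy as np

def decompose_block(Z):
    """Approximate an N x N block Z by an outer product x.y^T.

    Column/row sums fix the ratios within Y and X (eqs. 3-6); the scalar
    w = x1*y1 is chosen among the N^2 candidates that each zero one
    element's error (eq. 7), minimising the MSE of eq. (8).
    """
    Z = np.asarray(Z, float)
    c = Z.sum(axis=0)                         # column sums c_j
    r = Z.sum(axis=1)                         # row sums r_i
    rc = np.outer(r, c) / (r[0] * c[0])       # (r_i c_j) / (r_1 c_1)
    candidates = (Z / rc).ravel()             # w that makes e_ij = 0
    mses = [np.mean((w * rc - Z) ** 2) for w in candidates]
    w = candidates[int(np.argmin(mses))]
    y1 = np.sqrt(abs(w))                      # any split of w into x1*y1 works
    x1 = w / y1
    return x1 * r / r[0], y1 * c / c[0]       # eqs. (5) and (6)

# For a genuinely rank-1 block the reconstruction is exact.
Z = np.outer([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.5, 1.5])
x, y = decompose_block(Z)
```

For general blocks the chosen w minimises eq. (8) over the N² candidates, and the 2N vector entries replace the N² matrix entries for transmission.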
4 Experiments and results
This method has been applied to the compression of grey scale images. Several experiments were carried out to analyse performance criteria such as the best block size, compression ratio, Peak Signal to Noise Ratio (PSNR) and subjective quality. A typical test image is presented in Figure (1). The reconstructed image at a ratio of 8:2 is shown in Figure (2). It should be noted that the ratio given is only concerned with phase 1 of the compression process. Subjective analysis of this reconstructed image indicates a good reconstruction using the proposed method.
Figure 1: Original image (256×256, 8 bpp)
Figure 2: Reconstructed image (cr = 8:2)
The compression ratio is highly dependent on the block size. Increasing the block size, however, results in too many equations for few variables (an over-determined problem). Consequently, the reconstructed image begins to show blocking effects near the edges (see Figure 3). The Peak Signal to Noise Ratios (PSNR) of the reconstructed images for block sizes varying from 2 to 32 were analysed, and the best block size with respect to the PSNR was found to be 4 (see Figure 4). This is also confirmed by the subjective analysis of the reconstructed images. This block size achieves a compression ratio of 8:2 with a high PSNR of 41.84 dB. Note that the vector elements are coded with half the number of bits used for the matrix elements.
Figure 3: Reconstructed image (cr=8:1)
Figure 4: Selection of the block size

5 Neural nets solution
Neural networks have been used extensively in image processing. It is also known that, given a sufficient number of hidden layers, a multilayer feedforward neural net can be used as a universal approximator [4]. Consequently, an alternative approach to the one developed above, based on neural networks, is proposed to solve our problem. A simple multilayer perceptron architecture is adopted with a single hidden layer. The input and output layers of the network consist of a fixed number of neurons to reflect the chosen block size, while the number of neurons in the hidden layer is flexible and determined by an iterative process. The network was trained on known vectors, using backpropagation as the training algorithm. These vectors were arranged to represent the output training image, and the matrix formed by the product of the vectors is used as the training input image. Figure 5 illustrates the architecture of the neural net and the training images.
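The described network can be imitated with a tiny single-hidden-layer perceptron trained by backpropagation; the hidden-layer width, learning rate and synthetic rank-1 training blocks below are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                # block size (the paper's best choice)
n_in, n_hid, n_out = N * N, 32, 2 * N   # hidden width is an assumed value

def make_batch(size):
    """Synthetic training data: random rank-1 blocks and their vectors."""
    x = rng.uniform(0.2, 1.0, (size, N))
    y = rng.uniform(0.2, 1.0, (size, N))
    blocks = (x[:, :, None] * y[:, None, :]).reshape(size, n_in)
    return blocks, np.concatenate([x, y], axis=1)

W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_out)); b2 = np.zeros(n_out)

blocks, targets = make_batch(256)
lr, losses = 0.05, []
for epoch in range(200):             # the paper reports convergence at 200 epochs
    h = np.tanh(blocks @ W1 + b1)    # hidden layer
    out = h @ W2 + b2                # linear output layer
    err = out - targets
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation: plain batch gradient descent on the MSE loss.
    g_out = 2 * err / len(blocks)
    g_h = (g_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * h.T @ g_out; b2 -= lr * g_out.sum(0)
    W1 -= lr * blocks.T @ g_h; b1 -= lr * g_h.sum(0)
```

On this synthetic set the mean squared error drops below its initial value within the 200 epochs, mirroring the convergence behaviour described in the text.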
Figure 5: Multilayer Perceptron Network with the training images

Using a single training set, the network converged to a steady state after 200 epochs. This network was then simulated using real images. Figure 6 highlights the encoding process. Upon completion of encoding, the resulting vectors are rearranged in the correct order. The decoding process simply performs a vector multiplication operation. The result of the decoding process is depicted in Figure 7. A comparison of both the analytical and neural solutions was undertaken to evaluate the efficiency of the encoding process and the quality of the reconstructed images. This comparison is summarised by the following points:
• The analytical solution does not require training.
• The quality of the reconstructed images is perceivably better using the analytical process.
• Reconstructed images using the neural network solution showed some artefacts at the edges.
• Encoding using neural networks is much faster than the analytical method.
Figure 6: Encoding process of 'Lenna'
Figure 7: Reconstructed 'Lenna' image
6 Conclusions
In this paper, a new method for image compression based on the decomposition of matrices into column and row vectors has been presented. The proposed solution is based on approximation techniques, because the problem lends itself to solving over-determined simultaneous equations. The assumption of preserving the ratios of the column and row sums has proved to be efficient in speeding up the computation. The second solution, using a simple neural network architecture, also proved adequate. The promise shown by this method suggests further investigation of the neural approach with respect to optimising the network architecture and improving the performance of the method. Factors such as the number of hidden layers and their associated number of nodes are critical to avoid loss of memory as well as overtraining, and these will be considered together with other neural network architectures. The major advantage of both proposed solutions is the simplicity of the decoding stage, as the decoding process consists simply of a vector multiplication. This is an important factor particularly for consumer electronics, where the size and the cost of the decoding stage must be kept to a minimum, and it is also critical for real-time encoding and decoding schemes. A further improvement of the compression ratio has been achieved using a pyramid encoding/decoding scheme. However, subjective analysis must be performed to identify the appropriate depth of the pyramid, as the accumulation of errors could lead to unacceptable quality of the reconstructed images. Needless to say, any reduction in the approximation errors may have an effect on the pyramid scheme.
References
[1] Clarke, R.J. (1985): 'Transform coding of images', Academic Press.
[2] Jayant, N.S., Noll, P. (1984): 'Digital coding of waveforms: Principles and applications to speech and video', Prentice Hall.
[3] Gersho, A. and Gray, R.M. (1992): 'Vector quantization and signal compression', Kluwer Academic Publishers, Boston.
[4] Hornik, K., Stinchcombe, M., White, H. (1989): 'Multi-layer feedforward networks are universal approximators', Neural Networks, 2, pp. 359-368.
A SPEECH CODING ALGORITHM BASED ON WAVELET TRANSFORM
Xiaodong Wu, Yongming Li, Hongyi Chen
Institute of Microelectronics, Tsinghua University, Beijing 100084, China
Abstract: Though the wavelet transform has begun to be used in audio compression and speech parameter extraction, it has not been used in speech compression. In this paper, we give a speech coding algorithm based on the wavelet transform. It has the features of high compression gain, good quality, wide applicability and no artifacts. It is effective not only for speech signals but also for music signals, so it will help to implement low bit rate audio compression.
1. Introduction
In recent years, the wavelet transform has been widely used as a new analytical tool in many research areas [1][2]. Having a good time-frequency resolution, the wavelet transform is especially suitable for processing time-varying signals. Research in image compression and audio compression shows that the wavelet transform does help with these tasks [3][4]. In this paper we give a speech coding algorithm based on the wavelet transform. The paper is organized as follows. First we introduce the wavelet transform. Then a speech signal is decomposed into several bands with the wavelet transform, centering on how to choose a wavelet function. To compress the signal, a dynamic bit allocation algorithm is given to assign bits to each band. At last, some discussions on the results are given.
2. Wavelet transform
Wavelets are a new family of basis functions for the space of square integrable signals [1][2]. In this paper we consider only orthonormal wavelets. Given a wavelet function ψ(x), its dyadic dilates and integer translates
generate a Hilbert basis for L²(R). Decomposing a given signal f(x) ∈ L²(R) on this basis, we get the wavelet coefficients

W_{j,k} = ∫_{-∞}^{+∞} f(x) ψ_{j,k}(x) dx,  j, k ∈ Z,

which provide a multiresolution analysis of the signal, and from these coefficients we can reconstruct the original signal:

f(x) = Σ_j Σ_k W_{j,k} ψ_{j,k}(x).

ψ(x) is not arbitrarily chosen; it must satisfy some conditions. For details, see [1][2].
3. Speech coding algorithm based on wavelet transform
A speech signal is often sampled at no more than 10kHz, with a bandwidth of about 4kHz. Using the wavelet transform one can decompose the speech signal into L+1 bands, where L is the number of levels used in the wavelet decomposition. We use the generalized wavelet transform, the wavelet packet transform, to decompose the signal into up to 2^L bands. Since there are about 16 critical bands in the 4kHz bandwidth, L need not be large, and we take L to be 5. The width of each band is then 125Hz, which approaches the narrowest width of the critical bands. One problem in the process is which kind of wavelet function should be chosen, because wavelet theory gives us too many bases. At first we used the adapted wavelets with a finite support length as in [4], but in experiments we found that wavelets of a given support and a maximal number of vanishing moments give about the same results. This is mainly because of the coding method we use in this algorithm. To get a high frequency resolution, we should increase the support length of the wavelets, but this will result in a large amount of computation and a low time resolution. So it is a good trade-off to use a Daubechies wavelet with a support length of more than 16, and we take 20 in our algorithm. In order to compress the signal, there are two ways. One is to use the coefficients in some bands while throwing away the others, as A. H. Tewfik et al. did in [4]. Obviously most information will be lost this way. The other way is to use a bit allocation algorithm, with quantization done according to the bits allocated. This technique is more often used in audio compression [5][6]. We use the average energy in each band as a criterion, and calculate the number of bits assigned to each band dynamically. The block diagram of the encoder is shown in Fig. 1.
Fig. 1. The block diagram of the encoder (PCM samples, framing, bit allocation, bit-stream formatting, encoded bit-stream).
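The dynamic bit allocation from per-band average energies can be sketched as below. The paper does not give its exact formula, so this uses the classic rule b_i = B/M + 0.5·log2(E_i / geometric mean(E)) as an illustrative assumption:

```python
import numpy as np

# Sketch of energy-driven dynamic bit allocation: bands with more
# average energy receive more bits.  The specific formula below is a
# standard textbook rule, assumed here for illustration.
def allocate_bits(energies, total_bits):
    energies = np.asarray(energies, dtype=float)
    m = energies.size
    gm = np.exp(np.mean(np.log(energies)))          # geometric mean
    b = total_bits / m + 0.5 * np.log2(energies / gm)
    return np.maximum(np.round(b), 0).astype(int)   # no negative bits

bits = allocate_bits([16.0, 4.0, 1.0, 1.0], total_bits=16)
```

Rounding means the grand total can differ slightly from the bit budget; a real coder would redistribute the remainder.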
4. Conclusions
We use the algorithm to compress different kinds of speech and music signals, with compression gains of 4, 8, 16 and 32 respectively. Some of the results are given in Fig. 2.
Fig. 2. (a)(b) A segment of speech signal and its spectrum. (c)(d) The reconstructed signal of (a) and its spectrum with this algorithm. (e)(f) The reconstructed signal of (a) and its spectrum with LPC-10e. The compression gain is 32.
Subjective tests show that it is very promising to compress speech signals with the wavelet transform. Of course, the quality of the reconstructed signal degrades with the increase of the compression gain. But even at a low bit rate of less than 3kbps (including the side information), the reconstructed signals still have intelligibility and naturalness, and their quality is better than that of the signal reconstructed with LPC-10e. The algorithm is robust to the speech of young and old, men and women, and is effective for music signals. There are no artifacts in the reconstructed signals. The shortcoming of the algorithm is that, because of the high compression gain, too much of the high frequency part of the signal is lost. Therefore, the reconstructed speech sounds low and deep.
From the frequency spectra of the signals, we can draw the same conclusions. The spectrum of the reconstructed speech matches that of the original speech better than that of LPC-10e at places where the amplitude of the spectrum is high (often the low frequency part). At places where the amplitude of the spectrum is low (often the high frequency part), too much of the spectrum is thrown away, which results in the reconstructed speech sounding low. Because this algorithm belongs in fact to the waveform coding methods of the transform domain, the use of perceptual effects is necessary, and this is sure to improve the quality of the reconstructed signals greatly. More research work is being done.
REFERENCES:
[1] I. Daubechies, "Orthonormal bases of compactly supported wavelets", Commun. Pure Appl. Math., vol. 41, Nov. 1988, pp. 909-996.
[2] S. Mallat, "Multiresolution approximations and wavelet orthonormal bases of L²(R)", Trans. Amer. Math. Soc., vol. 315, Nov. 1989, pp. 69-87.
[3] D. Sinha, A. H. Tewfik, "Low bit rate transparent audio compression using adapted wavelets", IEEE Trans. Signal Processing, vol. 41, No. 12, Dec. 1993, pp. 3463-3479.
[4] A. H. Tewfik, D. Sinha, P. Jorgensen, "On the optimal choice of a wavelet for signal representation", IEEE Trans. Information Theory, vol. 38, No. 2, Mar. 1992, pp. 747-765.
[5] ISO CD 11172-3, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 3: Audio.
[6] Digital Audio Compression (AC-3) Standard.
Automatic Determination of Region Importance and JPEG Codec Reflecting Human Sense
Rina Hayasaka, Jiying Zhao, Yoshihisa Shimazu, Koji Ohta and Yutaka Matsushita
Matsushita Lab., Dept. of Instrumentation Engg., Faculty of Science and Technology, Keio University, Japan
1. Introduction
Looking at a picture, human beings pay attention only to important objects, while they may ignore less important ones in the same picture. Many image processing techniques have been proposed to date, but very few consider the local importance of regions within an image, so it may be said that such techniques do not take the human sense into account. Presented here is a technique which determines the importance of regions automatically. If we can obtain the importance of regions, many image processing techniques may be improved and computers may get closer to human beings. We also developed a new compression method, an importance-adaptive baseline JPEG, which compresses important parts at higher quality while lowering the quality of the unimportant parts and the bit rate as a whole, and which surely reflects the sense of human beings.
2. System Outline
The outline of this system is shown in Fig. 1. The method takes these three steps:
- segment out regions from an image
- determine the importance of each region
- compress the image using the importances

Figure 1: System Outline
The following sections explain each of these steps.
3. Segmentation of an Image
First, the original image is transformed from RGB into the CIE L*a*b* representation, and segmented into clusters by clustering in CIE L*a*b* space [1]. Then the clusters in CIE L*a*b* space are written back into the original image. Considering the contiguity relationship and the color difference between the connected regions, fuzzy reasoning is used to merge regions and obtain the final regions.
4. Determination of Region Importance
After segmentation, each region's features are calculated, and its importance is automatically determined through fuzzy reasoning.
4.1 Importance Determinative Region Features
Objects may be considered visually important when they are outstanding to the eyes of human beings, or when they are meaningful or attractive to human beings. Whether a region is outstanding ("pop-out" [2]) to human beings depends on the features of the region. We found through experiment that the following features contribute to the importance:

- Area Ratio: the percentage of the whole image area that the region occupies.

  Arearatio_m = Σpixel_m / (Width × Height)

  where Σpixel_m is the total number of pixels belonging to region m.

- Position: describes how far the center of gravity of region m is from the center of the image.

  Position_m = √((m_x − C_x)² + (m_y − C_y)²)

  where (C_x, C_y) is the coordinate of the image center, and (m_x, m_y) is the coordinate of the center of gravity of region m.

- Compactness:

  Compact_m = 4π × area_m / perimeter_m²

  where area_m and perimeter_m are respectively the area and perimeter of region m. The compactness shows how compact region m is; it equals 1 when the region is round, and becomes smaller as the boundary of the region gets more complicated.

- Border Connection:

  Border_m = Σconnect_m / (2 × (Width + Height))

  where Σconnect_m is the number of pixels which are on the boundary of region m and connect with the image border.

- Region Color: the mean value of each CIE L*a*b* color component of the region, used separately.

- Outstandingness: describes how outstanding the region is compared with its neighbor regions.

  Outstand_m = Σ_{k=1, k≠m} ||color_m − color_k||² × (1 − distance_mk) × Arearatio_k

  where

  distance_mk = √((m_x − k_x)² + (m_y − k_y)²) / √(Width² + Height²)

  and ||color_m − color_k|| is the Euclidean distance in CIE L*a*b* space.
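The four geometric features above can be computed from a binary region mask as sketched below; the perimeter here is estimated by counting exposed pixel edges, which is an approximation of the true contour length:

```python
import numpy as np

# Sketch of the geometric region features: area ratio, position,
# compactness, and border connection, computed from a binary mask.
def region_features(mask):
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    area = ys.size
    area_ratio = area / (w * h)
    cy, cx = ys.mean(), xs.mean()            # centre of gravity
    position = np.hypot(cx - w / 2, cy - h / 2)
    padded = np.pad(mask, 1)
    # perimeter: number of region-pixel edges exposed to the background
    perimeter = (np.count_nonzero(mask & ~padded[:-2, 1:-1])
                 + np.count_nonzero(mask & ~padded[2:, 1:-1])
                 + np.count_nonzero(mask & ~padded[1:-1, :-2])
                 + np.count_nonzero(mask & ~padded[1:-1, 2:]))
    compactness = 4 * np.pi * area / perimeter ** 2
    border = (np.count_nonzero(mask[0]) + np.count_nonzero(mask[-1])
              + np.count_nonzero(mask[:, 0]) + np.count_nonzero(mask[:, -1]))
    border_connection = border / (2 * (w + h))
    return area_ratio, position, compactness, border_connection

mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                        # a 4x4 square region
ar, pos, comp, bc = region_features(mask)
```

With this edge-counting perimeter a square scores π/4; a disc would score close to 1, matching the definition above.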
4.2 Automatic Tuning of Fuzzy Reasoning Rules
Fuzzy reasoning is used to determine the importance of regions, using the above 8 features as input. Although fuzzy logic can encode expert knowledge directly and easily using rules with linguistic labels, it usually takes a lot of time to design and tune the membership functions which quantitatively define these linguistic labels. So, we chose the automatic tuning method proposed by Nomura [3]. In this method, neural network learning techniques automate the tuning of the rules and substantially reduce development time and cost while maintaining performance [4]. To reflect human knowledge, the reasoning rules are tuned by learning from a large amount of data from a subjective assessment experiment. The subjects are shown both an original image (Fig. 1(a)) and a segmented image (Fig. 1(b)), and they rate the importance of each region on 3 levels. The experiment was carried out with 15 subjects and 20 frames of images, and the data were averaged to form the input/output data. Using this dataset, we obtain rules reflecting human knowledge, which give the importance of each region. One evaluated image is shown in Fig. 1(c), which presents the importance level of each region as whiteness: the whiter, the more important.
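The flavour of delta-rule tuning of fuzzy rules can be sketched as follows. This is a minimal stand-in, not Nomura's full method [3]: it assumes a zero-order Sugeno model with fixed Gaussian memberships over a single synthetic feature, and tunes only the rule consequents by gradient descent; all constants and data are illustrative:

```python
import numpy as np

# Delta-rule tuning of rule consequents in a tiny fuzzy system
# (fixed Gaussian memberships, one input feature, five rules).
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 5)
sigma = 0.15
consequents = np.zeros(5)                   # the tunable rule outputs

def infer(x):
    mu = np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
    return mu @ consequents / mu.sum(), mu

# synthetic training data: "importance" grows with the feature value
xs = rng.uniform(0, 1, 200)
ys = xs ** 2

eta = 0.3
for _ in range(200):                        # delta-rule epochs
    for x, y in zip(xs, ys):
        out, mu = infer(x)
        w = mu / mu.sum()
        consequents += eta * (y - out) * w  # gradient of squared error

err = np.mean([(infer(x)[0] - y) ** 2 for x, y in zip(xs, ys)])
```

After tuning, the inference output tracks the target rating closely; the real system does this over the eight region features and the averaged subjective ratings.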
5. The Importance Adaptive Codec Scheme
Now we have the importance information for each region. Using it, the image can be compressed keeping the important parts at high quality, while the compressed file size does not grow much because the unimportant parts are reduced to low quality. First, we have to add the importance level of each region to the encoded image data, because the decoder needs it to decode the compressed stream. For this we employed a data structure called the "MDU-Map", a two-dimensional array. Each element of it stands for one MDU, defines the quantization table scale for all the blocks of that MDU, and indirectly describes the importance level of the MDU. An MDU (Minimum Data Unit) is the smallest group of interleaved data units. A 0 in the MDU-Map denotes background, i.e. not important. In region or object based compression, the source model is no longer the fixed-size square block; it has become the arbitrarily shaped region. It then becomes necessary to code not only the region content but also the region contour/shape description. In our scheme, regions are represented as a boundary described by Freeman chain codes, chosen for their minimal storage requirements, good curvature description, and simplicity. Our encoding scheme is based on the JPEG coding system, which has been widely used in diverse applications of still image compression. In the JPEG data structure there is a part called the application data segment which can be ignored by standard JPEG decoders. To remain in accordance with the JPEG standard [5], we use this segment to store the region description. In the baseline JPEG syntax, each component of the image is first grouped into 8 × 8 pixel blocks. Each block is then independently transformed by an 8 × 8 Forward Discrete Cosine Transform (FDCT), and each of the 64 DCT coefficients is uniformly quantized in conjunction with a 64-element Quantization Table, which must be specified by the application (or user) as an input to the encoder.
Each element can be any integer value from 1 to 255, which specifies the step size of the quantizer for its corresponding DCT coefficient. The purpose of quantization is to discard information which is not visually significant. The Quantization Table enables us to compress important parts at higher quality and unimportant parts at lower quality. The encoder goes through each block of the original image, checking by reference to the MDU-Map whether the block is important or not and, if important, what the scale for the quantization table is. Blocks of different levels are compressed using different quantization scales. It is the encoder's responsibility to translate the MDU-Map into a boundary description using chain codes, and to put it into the application data segment of the data stream. All important regions of an image are described as non-zero elements in the MDU-Map. The decoder takes the compressed data stream as input and gets the region boundary descriptions from the application data segment; through region filling it reproduces each region, and through pasting forms an MDU-Map that is exactly the same as the original one. By referring to the map, the decoder can decode each block accordingly.
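The MDU-Map-driven quantization can be sketched as below. The base table values and the level-to-scale mapping are illustrative assumptions, not the paper's numbers:

```python
import numpy as np

# Importance-adaptive quantization driven by an MDU-Map: each map
# entry selects a quantization-table scale for all blocks of its MDU
# (0 = background).  Table and scales are stand-in values.
base_qtable = np.full((8, 8), 16, dtype=np.int32)

def quantize_block(dct_block, importance_level):
    # higher importance -> smaller scale -> finer quantization
    scale = {0: 4.0, 1: 2.0, 2: 1.0, 3: 0.5}[importance_level]
    q = np.maximum(np.rint(base_qtable * scale), 1)
    return np.rint(dct_block / q).astype(np.int32)

mdu_map = np.array([[0, 0, 1],
                    [0, 3, 2]])              # 2x3 grid of MDUs

block = np.full((8, 8), 40.0)                # a stand-in DCT block
coarse = quantize_block(block, mdu_map[0, 0])  # background MDU
fine = quantize_block(block, mdu_map[1, 1])    # most important MDU
```

The same map, recovered by the decoder from the application data segment, selects the matching dequantization scale per block.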
(a) JPEG
(b) our method
Figure 2: result images
6. Experiment Result and Conclusion
An experiment result is shown in Fig. 2: (b) is the image compressed by our method, and (a) is the one compressed uniformly by JPEG. They have almost the same data size. Compared with the image uniformly compressed by JPEG, we can easily see that the background part has obviously lower quality, while the other, important parts keep good quality. This method allocates higher quality only to the important regions, reduces file size by further compressing the unimportant parts, and ensures the best visual quality at a given compression ratio. Besides compression, many image processing techniques can be expected to be improved using the region importance concept.
References
[1] A. Khotanzad and A. Bouarfa. Image segmentation by a parallel, non-parametric histogram based clustering algorithm. Pattern Recogn., Vol. 23, No. 9, pp. 961-973, 1990.
[2] J. Davidoff. Cognition through Color. MIT Press, 1991.
[3] H. Nomura, I. Hayashi, and N. Wakami. A self-tuning method of fuzzy reasoning by delta rule and its application to a moving obstacle avoidance. J. of Japan Society for Fuzzy Theory and Systems, Vol. 4, No. 2, pp. 379-388, April 1992.
[4] K. Asakawa and H. Takagi. Neural networks in Japan. Comm. ACM, Vol. 37, No. 3, pp. 106-112, March 1994.
[5] ISO/IEC 10918-1. Information technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines. 1994.
Directional Image Coding on Wavelet Transform Domain
Dong Wook Kang
Kookmin University
Abstract A novel method of directional image coding on the wavelet transform domain is devised for efficient compression of image data: Directionally decomposed (=filtered and decimated) versions of an image are obtained by manipulating the wavelet transform coefficients, and then the coefficients of each version are segmented into vectors of reasonable dimension, and finally the segmented vectors are quantized based on the gain-shape vector quantization. The proposed method yields excellent quality of reconstructed images at very low bit rates, much better than JPEG or other transform domain vector quantization algorithms.
Background Recently, vector quantization techniques have been widely studied by many researchers to efficiently encode images. One of them is transform-domain vector quantization, which utilizes the transformation of the image to compact most of the energy of the signal into a few almost-but-not-perfectly decorrelated coefficients, and then vector quantization (VQ) to exploit the remaining correlations to the utmost. The typical procedure of transform-domain vector quantization is as follows: It first transforms an image or a sub-block of it with a two-dimensional kernel. After that, it segments the corresponding coefficients into a small number of vectors of reasonable dimension. And finally it quantizes the vectors by replacing them with building blocks or codewords pre-designed with a training sequence. For example, the classified vector quantization on the discrete cosine transform domain uses a block of size 8 × 8 with the 2-dimensional DCT kernels, adaptively segments the DCT coefficients based on the classification of that block, and finally quantizes each and every segment with the best-matched codeword in its corresponding VQ codebook. Another example is the wavelet domain vector quantization: It first decomposes an image frame into a set of subband images with the wavelet kernels. It then segments the coefficients based on the affinity of the coefficients in the corresponding spatial domain, and finally quantizes the segments with VQ codebooks. As expected from information theory, transform domain VQ techniques reveal better performance than transform domain scalar quantization techniques in the sense of the optimum achievable performance. The performance of a transform domain VQ scheme mainly depends on the vector configuration and the methodology for constructing codebooks. Directional image coding has been known to be very good at very low bit rate image coding.
It exploits the characteristics of the human vision system that contains direction-sensitive neurons in the visual cortex [1]. Therefore, the reconstructed images of the directional image coding are of better
subjective quality than those of other first-generation image coding algorithms. In this paper, in order to inherit the advantages of directional image coding, which are outstanding when an image is encoded at a very low bit rate, we introduce the directional decomposition of images using the wavelet coefficients, and a gain-shape VQ technique to encode the decomposed vectors based on the threshold coding principle.
Fig. 1. Directional decomposition of an image and the vectorization of decomposed versions.
Encoding Algorithm
Fig. 1 shows the procedure used to construct the directionally decomposed versions from an original image. First, we construct 64 subband images by filtering and decimating the input image three times. At each stage, all the subband images, high-band as well as low-band, are filtered and decimated 2:1 horizontally and vertically. Fig. 1(b) shows the decomposed subband images. After that, the 64 coefficients located at the same position in each of the 64 subband images are gathered to construct an 8x8 block which we call a basicblock. Fig. 1(c) shows the segmented basicblocks. Next is the directional decomposition of a basicblock. It is necessary that the resulting subvectors not only are of reasonable dimension but also convey key information about one of the directional images. A basicblock is decomposed into 17 subvectors, of which 15 are representatives of directional images and the other two are for the low-pass and high-pass images. Fig. 1(d) shows the windows for directional decomposition of a basicblock. Since each subvector conveys one of the directional images, it can be independently encoded by the threshold coding method: each subvector is tested as to whether it deserves transmission, and it is quantized if and only if it is significant enough. In this case, both the directions and the VQ indices of the significant subvectors are transmitted. The test of significance is accomplished by quantizing the gain of a vector with a deadzone: if the quantized gain is not zero, then the vector is automatically considered significant, and the shape of it is quantized. To improve the efficiency of the encoder, the directions of the subvectors are variable-length encoded. In addition, to equalize the quantization distortion from each vector, different sizes of shape codebooks are allotted according to the statistical characteristics of the vectors.
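The gain-shape quantization with a deadzone test can be sketched as follows; the tiny codebook, step size, and deadzone threshold are illustrative assumptions (real shape codebooks are trained offline):

```python
import numpy as np

# Gain-shape VQ with a deadzone on the gain, as in the threshold
# coding step above: insignificant subvectors are simply not sent.
shape_codebook = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.5, 0.5, 0.5, 0.5],
                           [0.7, -0.7, 0.0, 0.0]])
shape_codebook /= np.linalg.norm(shape_codebook, axis=1, keepdims=True)

def encode_subvector(v, gain_step=1.0, deadzone=2.0):
    gain = np.linalg.norm(v)
    # deadzone quantizer on the gain
    qgain = 0 if gain < deadzone else int(round((gain - deadzone) / gain_step)) + 1
    if qgain == 0:
        return None                               # insignificant
    shape = v / gain
    idx = int(np.argmax(shape_codebook @ shape))  # best correlation
    return qgain, idx

skipped = encode_subvector(np.array([0.1, 0.2, 0.0, 0.1]))
code = encode_subvector(np.array([3.0, 3.0, 3.0, 3.0]))
```

Because unit-norm shapes are compared by inner product, the nearest codeword is the one with maximum correlation.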
Simulation Results
We applied the proposed directional vector quantization technique to encode the test image, Lena, which is outside the training set of the shape codebooks. We compared the PSNR performance of the proposed scheme with those of the DCT domain classified vector quantization (DCT-CVQ) [2], the DCT domain directional vector quantization (DCT-DVQ) [3], and Shapiro's embedded zerotree wavelet algorithm (WT-EZT) [4]. The simulation results show that the proposed scheme yields the best performance among them at almost every bit rate below 0.5 bpp. For example, at about 0.25 bpp it produces 33.4 dB, while both JPEG and the DCT-CVQ produce 31 dB, and the WT-EZT 33.2 dB. Fig. 2 shows the results. The subjective quality of the reconstructed images is also significantly improved by the proposed scheme. Fig. 3 shows magnified versions of the reconstructed images. The reason for the high subjective quality is that the proposed scheme reproduces the edges and contours of the image even at very low bit rate encoding.
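The PSNR figures quoted above follow the usual definition for 8-bit images, which can be computed as:

```python
import numpy as np

# PSNR for 8-bit images: 10 * log10(255^2 / MSE).
def psnr(original, reconstructed):
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 5, dtype=np.uint8)   # uniform error of 5 grey levels
value = psnr(a, b)                        # about 34.15 dB
```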
Conclusions
We proposed a new directional image coding technique in the wavelet transform domain. The scheme consists of directional decomposition of an image with the wavelet coefficients, and threshold coding of the decomposed vectors based on gain-shape vector quantization. Simulation results show that the proposed scheme yields excellent encoding performance in the objective as well as the subjective sense. In addition, the new scheme has several advantages. First, it is very practical because it can easily encode an image at various bit rates according to the budget of the encoder. Second, since it uses fixed decomposition windows and a single set of VQ codebooks regardless of the bit rate, the complexity of the encoder is much reduced in comparison with conventional VQ schemes.
References
1. M. Kunt, A. Ikonomopoulos, and M. Kocher, "Second generation image coding techniques," Proc. IEEE, vol. 73, pp. 549-574, Apr. 1985.
2. J. W. Kim and S. U. Lee, "A transform domain classified vector quantizer for image coding," IEEE Trans. Circuits Syst. Video Technology, vol. CASVT-2, pp. 3-14, March 1992.
3. D. W. Kang, J. S. Song, H. B. Park, and C. W. Lee, "Sequential vector quantization of directionally decomposed DCT coefficients," Proc. IEEE ICIP-94, pp. 114-118, Austin, TX, Nov. 1994.
4. J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, pp. 3445-3462, Dec. 1993.
Fig. 2. The PSNR performances.
Fig. 3. The magnified versions of the reconstructed image: (a) 0.126 bpp, 30.46 dB; (b) 0.369 bpp, 35.04 dB.
Session H: VIDEO CODING I: MPEG
A Universal MPEG Decoder with Scalable Picture Size
Ram Prabhakar
Cirrus Logic, Inc.
3100 W. Warren Ave., Fremont, CA 94538
knicks@corp.cirrus.com
Wei Li
Logitech, Inc.
6505 Kaiser Dr., Fremont, CA 94555
Wei_Li@logitech.com
1. Introduction
In recent years, MPEG has become a video compression standard widely accepted by both hardware and software multimedia compression/decompression professionals. Current hardware MPEG decoders (e.g. GD 5518) decode a compressed bitstream into a single format, such as the 4:2:0, 4:2:2 or 4:4:4 format specified on the encoder side. In practice, however, the display could be either a VGA terminal (mostly 4:2:2 format) or a D-1 recorder (4:4:4 format). In order to display the decoded video at the appropriate resolution, the problem of down-scaling and up-scaling the chrominance components (Cb and Cr) must be addressed. This paper proposes a unified hardware decoding solution to the image scaling, which covers the whole MPEG market from broadcasting down to CD-ROM video games. The decoder is programmable to output video images in 4:2:0, 4:2:2 or 4:4:4 format. The widely practiced methods for image up-scaling and down-scaling operate in the spatial domain. When scaling is implemented in the spatial domain, it requires a large memory and a large latency. A new scaling approach in the DCT domain [2][3] has recently been reported. It performs the scaling by manipulating the DCT coefficients according to the format one would like to have. The following figure shows a simplified MPEG decoder.
Fig. 1: Simplified MPEG decoder (coded data, variable length decoding, inverse scan, inverse quantisation, inverse DCT, motion compensation with frame store memory, decoded data).
The functionality of the highlighted blocks in Fig. 1 is affected when scaling is performed in the DCT domain. In the following paragraphs, the functionality of each of the highlighted blocks is discussed.
2. Inverse DCT
There are two different types of IDCT. The MPEG committee suggests the type-2 IDCT. The one-dimensional type-2 IDCT is given as
x(n) = (2/N) Σ_{m=0}^{N-1} C(m) X(m) cos((2n+1)mπ/2N)    (1)

where C(m) = 1/√2 for m = 0, and C(m) = 1 for m = 1, 2, ..., N-1.
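Equation (1) can be checked directly in code. The sketch below pairs it with the matching forward DCT so that the round trip recovers the input; numpy is used purely for illustration:

```python
import numpy as np

def dct2(x):
    # Forward type-2 DCT matching Eq. (1):
    # X(m) = C(m) * sum_n x(n) cos((2n+1) m pi / 2N)
    N = x.size
    n = np.arange(N)
    C = np.where(n == 0, 1 / np.sqrt(2), 1.0)
    cos = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    return C * (cos @ x)

def idct2(X):
    # Type-2 IDCT of Eq. (1):
    # x(n) = (2/N) * sum_m C(m) X(m) cos((2n+1) m pi / 2N)
    N = X.size
    n = np.arange(N)
    C = np.where(n == 0, 1 / np.sqrt(2), 1.0)
    cos = np.cos((2 * n[:, None] + 1) * n[None, :] * np.pi / (2 * N))
    return (2.0 / N) * (cos @ (C * X))

x = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])
xr = idct2(dct2(x))     # round trip recovers x
```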
The type-2 IDCT for 4:2:0 to 4:2:0 is performed on an 8x8 block, first horizontally and then vertically, which amounts to 64 coefficients. The type-2 IDCT for 4:2:0 to 4:4:4 is instead performed on a 16x16 block (up-scaling), or for 4:4:4 to 4:2:0 on a 4x4 block (down-scaling), which amounts to 256 coefficients. The following figure shows the modified IDCT block diagram.
Fig. 2: Modified IDCT block diagram for up-scaling/down-scaling (the desired picture format selects the up/down-sampling, the anti-aliasing/anti-imaging filter coefficients, and the IDCT size applied to the inverse-quantised DCT coefficients; the output goes to motion compensation).
2.1 Changes for Up-Scaling
The up-scaling/down-scaling of images only affects the chrominance components (Cb, Cr). For example, to up-scale a 4:2:0 format to a 4:2:2 format, the chrominance components are doubled in the vertical direction only. This is accomplished by first upsampling an 8x8 inverse-quantised 4:2:0 block in the vertical direction and then multiplying the upsampled block with the anti-imaging low-pass filter coefficients for interpolation. The IDCT is then applied to the 16x8 interpolated block.
Manipulation of Pixel Values in the DCT Domain
The DCT coefficients are manipulated using the following equation, where the subscript ii stands for the type-2 IDCT:

X_u(m) = (X_ii(m) - X_ii(N-m))/√2,  m = 0, 1, ..., N-1    (2)

Anti-Imaging Low-Pass Filter
The anti-imaging low-pass filter is an even-length symmetric filter with a cut-off frequency of π/2. It is designed using the Remez exchange algorithm. Let the low-pass filter be h(n), n = -L/2, ..., 0, ..., L/2-1. Then the filter right-half is defined by

h_r(n) = h(n) for n = 0, 1, 2, ..., L/2-1;  h_r(n) = 0 for n = L/2, ..., N-1    (3)

where L is the number of filter coefficients of h(n) and N is the block size of the IDCT. In order to apply this filter in the DCT domain, the filter transform coefficients in the DCT domain are computed using

H_r(m) = 2 Σ_{n=0}^{N-1} h_r(n) cos(πm(n+1/2)/N),  m = 0, 1, ..., N-1    (4)

Inverse DCT
The manipulated coefficients are multiplied with the filter coefficients, and the resulting coefficients are IDCT transformed to obtain the interpolated block.
2.2 Changes for Down-Scaling
To down-scale images, a similar procedure is used. For example, to down-scale a 4:4:4 format to a 4:2:0 format, the chrominance components are halved in both the horizontal and vertical directions. The 8x8 inverse-quantised block is first multiplied with the anti-aliasing low-pass filter coefficients and then down-sampled in the horizontal and vertical directions. The IDCT is applied to the 4x4 decimated block.
Anti-Aliasing Low-Pass Filter
The inverse-quantised DCT coefficients are low-pass filtered in the DCT domain using

Y(m) = H_r(m) X(m) for m = 0, 1, ..., N-1;  Y(m) = 0 for m = N    (5)

Manipulation of the Coefficients in the DCT Domain
The filtered coefficients are manipulated as follows:

Y_d(m) = (Y_ii(m) - Y_ii(N-m))/√2,  m = 0, 1, ..., N-1    (6)

Inverse DCT
The inverse DCT of the above equation results in the down-sampled spatial block.
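The spirit of DCT-domain resizing can be illustrated with the simpler zero-padding variant sketched below; this is not the paper's scheme (it omits the anti-imaging filter of Eqs. (3)-(4) and the coefficient manipulation of Eq. (2)), but it shows how taking a longer IDCT over the DCT coefficients yields an interpolated block:

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal type-2 DCT matrix.
    n = np.arange(N)
    M = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    M *= np.sqrt(2.0 / N)
    M[0] /= np.sqrt(2)
    return M

def upsample2_dct(x):
    # Interpolate x by 2: transform, zero-pad the DCT spectrum,
    # and take the inverse transform at twice the length.
    N = x.size
    X = dct_matrix(N) @ x
    Xp = np.concatenate([X, np.zeros(N)])
    return np.sqrt(2) * (dct_matrix(2 * N).T @ Xp)

x = np.full(8, 3.0)
y = upsample2_dct(x)      # length 16; a constant stays constant
```

The √2 factor compensates for the length change so that amplitudes are preserved; the paper's filtered scheme plays the analogous role for general signals.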
3. Motion Compensation
Motion estimation is performed on the luminance and chrominance components in the MPEG encoder, but the motion vectors are computed only for the luminance components. Image scaling affects only the motion compensation of the chrominance components. When motion compensation is performed in the MPEG decoder, the motion vectors for the chrominance components are derived from those of the luminance components, based on the scaling factor. For example, when an image is up-scaled from 4:2:0 to 4:4:4, the motion vectors for Cb and Cr are the same as those for Y.
4. Frame Store Memory
Because of the scaling of the images, the frame store memory has to be large enough to store the biggest format, i.e. 4:4:4. For a frame size of 352x240 at 4:4:4 resolution (24 bits/pixel), one would need a buffer size of 253 kbytes.
5. Simulation Results
Upon examining the 512x512 interpolated image Lena (the original image is of size 256x256) in the spatial domain and the interpolated image in the DCT domain, the DCT domain interpolation outperforms the spatial domain interpolation (see Fig. 3). For symmetric convolution, the maximum number of filter coefficients can be twice the DCT block size. The longer the filter, the sharper its frequency response, and thus the smoother the resized image. Since using the longest filter does not significantly add to the number of operations, for a DCT block size of 16x16 we can use a 32-tap filter for symmetric convolution, with the filter right half defined as in Eq. (3). Note that by using DCT domain interpolation, the longest possible filter can be used without any extra hardware or latency, because of which the interpolated image in the DCT domain will outperform the 7-tap spatial domain interpolation. On the contrary, increasing the spatial domain interpolation filter to 32 taps would significantly add to the hardware cost.
Although we have discussed interpolation by a factor of two in each direction, decimation and other interpolation factors are possible. The following is an approximate comparison of the number of operations required to interpolate a 176x120 SIF picture using DCT domain interpolation (32-tap filter) and spatial domain interpolation (7-tap filter).
1. Interpolation in the spatial domain, 4:2:0 to 4:2:2: Interpolation is performed on a 4:2:0 SIF picture, whose chrominance size is 176x120, by filtering the pixels with a 7-tap filter whose coefficients are [-12 0 140 256 140 0 -12]/256 (the ISO recommended SIF to CCIR 601 interpolation filter) [6]. Interpolating one pixel takes 17 operations, assuming 3 multiplies and 2 adds, with each multiply counted as 3 shifts and 2 adds. Interpolating 176x120 = 21120 pixels therefore takes 359040 basic operations per chrominance component. For both chrominance components, spatial domain interpolation from 4:2:0 to 4:2:2 takes approximately 720000 operations.
2. Interpolation in the DCT domain, 4:2:0 to 4:2:2: After manipulating the inverse quantized DCT coefficients in the DCT domain, we perform a type-2 IDCT on a 16x8 block. The basic operations per 16x8 block are 160 multiplies and 864 adds, which translates to 1724 basic operations per block, assuming one multiply as 4 shifts and 3 adds. For a chrominance size of 176x120, the number of blocks after manipulating the DCT coefficients is 330. Interpolating 330 blocks in the DCT domain takes about 570000 basic operations per chrominance component, or approximately 1140000 operations for both chrominance components.
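The operation counts above can be reproduced directly from the per-pixel and per-block costs quoted in the text (a sketch; the totals in the text are rounded):

```python
# 1. Spatial domain: 7-tap filter, 3 multiplies + 2 adds per pixel,
#    each multiply counted as 3 shifts + 2 adds.
ops_per_pixel = 3 * (3 + 2) + 2          # 17 basic operations
spatial = ops_per_pixel * 176 * 120      # per chrominance component
print(spatial, 2 * spatial)              # 359040 and 718080 (~720000)

# 2. DCT domain: 1724 basic operations per 16x8 block, 330 blocks.
dct = 1724 * 330                         # per chrominance component
print(dct, 2 * dct)                      # 568920 per component, 1137840 for both
```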
Figure 3. (a) Interpolation by spatial filtering; (b) interpolation in the DCT domain. Note that (b) is sharper and more visually pleasing than (a).
Conclusions
In the rapidly evolving multimedia technology, there is a need for higher resolution picture quality. We have shown that interpolation in the DCT domain is programmable, simply by changing the IDCT coefficients and the block size. Even though the number of operations in the DCT domain increases significantly over the spatial domain, the extra hardware needed to perform equivalent interpolation in the spatial domain far exceeds that required by DCT domain interpolation. The image resized in the DCT domain has better resolution than the image resized in the spatial domain. Unlike spatial domain interpolation, DCT domain image resizing can upscale and downscale between different MPEG picture formats without changing the underlying hardware. This unique architecture can process MPEG bitstreams encoded in any format and decode them into any desired format.
References
1. Coding of Moving Pictures and Associated Audio, ISO/IEC JTC1/SC29/WG11 N0702, March 1994.
2. Stephen A. Martucci, "Image Resizing in the Discrete Cosine Transform Domain," in Proc. 1995 International Conference on Image Processing, pp. 244-247, Washington, Oct. 1995.
3. Balas K. Natarajan and Vasudev Bhaskaran, "A Fast Approximate Algorithm for Scaling Down Digital Images in the DCT Domain," in Proc. 1995 International Conference on Image Processing, pp. xxx-xxx, Washington, Oct. 1995.
4. Vasudev Bhaskaran and Konstantinos Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Kluwer Academic Publishers, 1995, Chapter 10, "Architectures for the DCT".
5. K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, 1990.
6. ISO-IEC/JTC1/SC29/WG11, Coded Representation of Picture and Audio Information: Test Model 5, p. 19, April 1993.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
The Influence of Impairments from Digital Compression of Video Signal on Perceived Picture Quality
Sonja Bauer, Branka Zovko-Cihlar, Mislav Grgić
University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, CROATIA, E-mail: [email protected]
Abstract - Picture impairments in digital video systems are different from those that occur in analogue systems and depend on the methods of coding and redundancy reduction employed. With the increasing application of MPEG coding, the assessment of MPEG coding impairments becomes very important. Subjective methods for conventional television picture quality and impairment assessment given in ITU-R Rec. BT.500-4 were modified and applied to an MPEG codec. The aim of this paper is to present results of picture quality assessment in relation to impairments from MPEG-1 coding for workstation and personal computer applications, where pictures are displayed within a window on a monitor. Statistical analysis of the test results was performed; based on this analysis, observations and conclusions are given.
1. Introduction
With the increasing application of digitally compressed video signals for transmission, storage and processing, the assessment of coding impairments is of growing importance [1]. Generally, the performance of a video compression system can be evaluated objectively or subjectively [2]. The objective methods are based on computable distortion measures such as mean squared error or signal-to-noise ratio (SNR). Objective measurements [8] have limited effectiveness in predicting the quality of compressed images as seen by observers. On the other hand, subjective assessment is geared directly toward the properties of the human visual system. Subjective assessments are controlled psycho-physical experiments designed to find out how observers would judge picture quality. This is why subjective assessment is the most effective method for determining the influence of a video compression method on picture quality [9]. Video compression algorithms [4] are designed to reduce the bit rate of the original source as much as possible while still maintaining the subjective picture quality required by the application. Bit rate reduction can be obtained by removing spatial (intraframe) and temporal (interframe) redundancies of the pictures. The ISO/IEC Moving Picture Experts Group (MPEG) standardized a video compression algorithm for digital storage media [3] (so-called MPEG-1) which uses both interframe and intraframe coding to reduce temporal and spatial redundancy and achieve high quality full motion video at low bit rates. MPEG-1 was originally designed for storage applications, but it covers a variety of other applications, especially in multimedia services. MPEG-1 video compression is now increasingly accepted as a tool that brings digital video and computers together, because MPEG-1 compression cards (hardware) and software are available for integrating digital video into workstations and personal computers.
With the increasing application of MPEG coding, the assessment of MPEG coding impairments becomes very important. Subjective methods for conventional television picture quality and impairment assessment given in ITU-R Rec. BT.500 [5] can be applied to an MPEG codec, but the test sequences should be chosen very carefully, with different picture contents and scene types.
2. Problem of Choosing Test Material
The fundamental difficulty in designing subjective evaluations is knowing which pictures or sequences to use. The scene content being viewed influences the perception of quality irrespective of the technical parameters of the system. Normally, a series of pictures is selected which are average in terms of how difficult they are for the system being evaluated. The guideline is that the pictures or sequences chosen should be "critical but not unduly so" [5]. This general philosophy worked well for many years with analogue TV systems and with low compression digital television systems. It reaches the limit of its usefulness with high compression digital systems, because the quality available from high compression systems depends very much on the content of the picture. To obtain a balance of critical and moderately critical material, four kinds of test material were identified:
S1. One person facing the camera directly and talking (a video with little motion), Fig. 1(a),
S2. The same as S1, but a title appears at the bottom of the picture, Fig. 1(b),
S3. Two persons talking (a video in which the motion of the speakers is relatively large), Fig. 1(c),
S4. A group of people (a video in which the motion of the people is large, with many details), Fig. 1(d).
These sequences may be considered a good approximation of the typical test materials identified in [6], where a random sampling procedure was developed to collect representative material from TV programmes.
3. Video Capture and Compression System
The Sun Video system [10] was used to provide the test sequences with multiple compression ratios (bit rates) and various levels of video quality. Sun Video is a real-time video capture and compression system that consists of a Sun Video card for the Sun SPARCstation, which provides on-board hardware video compression and supports several video compression techniques including MPEG-1. The Sun Video card and its supporting software capture, digitize and compress unmodulated NTSC, PAL or component Y/C video (S-Video) signals from video sources such as video cameras, VCRs and videodisks. The Sun Video system is designed to work closely with the XIL 1.1 Imaging Library [11]. By itself, XIL provides functions for image processing and software-based image compression and decompression. When used in conjunction with the Sun Video system, XIL provides functions to access and control the video capture and compression facilities of the Sun Video card. In our program, the COMPRESSOR_BITS_PER_SECOND attribute was used to tell the encoder how many bits it can use to encode one second's worth of pictures. This attribute controls the output data rate of the MPEG bit stream. Similarly, the COMPRESSOR_PATTERN attribute was used to specify the pattern of picture types employed by the compression; in our example, intraframe and predictive coded pictures were used (IP). For the compressor bit rate of 1152000 bit/s, the decompression attribute DECOMPRESSOR_QUALITY was varied. The value of this attribute provides the trade-off between the quality of the reconstructed pictures and the speed of decoding: the MPEG decompressor increases speed by decreasing the number of quantized coefficients that it uses in reconstruction. The valid values for this attribute are integers in the range 1 to 100.
A value of 100 is a request that the decoder produce the highest quality pictures possible, and a value of 1 is a request that the decompressor decode pictures as fast as possible. DECOMPRESSOR_QUALITY was set to three values: 1, 50 and 100 (Q1, Q50, Q100). The Sun Video card digitizes the video signal as specified in ITU-R Rec. BT.601 [7]. This gives a full picture resolution of 768x576 pixels (PAL). The frame size is reduced to the standard interchange format (SIF) of 384x288 pixels, which is the input picture format for MPEG-1 compression [3]. Each test sequence was 300 frames long.
Figure 1. Test sequences (a) S1, (b) S2, (c) S3, (d) S4.
4. Test Method
The testing methodology was the double-stimulus impairment scale method with a five-grade impairment scale (the "EBU method") described in ITU-R Rec. BT.500-4 [5]. In [9] it was shown to be a very suitable method when impairments are small. In our trials, MPEG-1 codec performance was examined in terms of basic decoded picture quality and the impairment associated with the coding process. The test sequences were made using the Sun Video system. The level of impairment in a test sequence depends on the allowed bit rate; the impairment associated with the decoding process depends on the number of quantized coefficients used in picture reconstruction. All the test sequences were recorded directly on videotape using an S-VHS video camera and played using an S-VHS videotape recorder. The component Y/C output signal from the videotape recorder was delivered to the Sun Video card. The compressed video sequences were stored on hard disk, decompressed using software-based decompression and displayed within a window on a workstation. The size of the window was 15x20 cm. The viewing distance was chosen to be 60 cm (4H, H = window height). A total of 32 observers participated in the tests. They were non-experts with normal visual acuity. A test session contained alternating 12-second (300 frames x 40 ms) presentations of the reference and test sequences divided by a mid-gray interval (3 s). The original source sequence without compression was used as the reference. Assessors were asked to grade the test sequence during the mid-gray interval (10 s) that comes after the test sequence. A test session comprised 44 presentations. Three types of test session with different orders of presentation were arranged to balance out effects of tiredness or adaptation from session to session. 11 presentations were shown twice within the same session to check coherence, which means that 33 different presentations were used in every test session.
5. Test Results and Conclusions
Statistical analysis of the test results was performed. The mean opinion score (MOS) and variance were computed for every combination of test condition and test sequence. The analysis of the data confirmed that the observers performed within accepted limits of consistency. The coherence of the results was checked by examining the grades given by the same observer to the same picture in the same test session. The grand mean score (the average value of all grades) is 3.193. This indicates that the test material was chosen carefully, so that all grades were used by the majority of observers.

Figure 2. Mean Opinion Score for Test Sequence S1 (MOS grades 1-5 vs. bit rate, 10 kbit/s to 10 Mbit/s).

Figure 3. Mean Opinion Score for Test Sequence S3 (MOS grades 1-5 vs. bit rate, 10 kbit/s to 10 Mbit/s).
The MOS of test sequences with a low level of activity and test sequences with a high level of activity, coded at different bit rates, shows that the MOS increases with increasing bit rate and tends to saturate at higher bit rates, Figs. 2 and 3. Grades measured on the y-axis are expressed as values between 1 and 5, which correspond to the MOS for each test sequence. The statistical results show that sequence S1, with a low level of activity, receives higher grades than sequence S3, with a high level of activity, at the same bit rates. This is more obvious in the low bit rate region. The content of sequence S3 is more difficult for the coder to handle because it contains less redundant information than S1. When the MPEG-1 coder has extracted the redundant information and this is still not enough to reduce the bit rate to the required level, it makes a series of approximations to the picture until the bit rate comes down to the value needed. The result is typically noise and blocking in the picture when a given "bit rate threshold" (determined by redundancy) is exceeded. The bit rate threshold is the bit rate achieved when all redundant data have been extracted. The sequence with more redundant data (S1) has a lower bit rate threshold than the sequence with less redundant data (S3). Beyond this threshold, quality gets worse because more approximations need to be made. This means that the codec works better for pictures without details and subject movement.
The grades probability distributions for the different bit rates and different test sequences show that the distributions for the sequences which contain less redundant information (S3 and S4) are shifted towards the low grades region for bit rates beyond the bit rate threshold. Fig. 4 shows the grades probability distributions at bit rates of 64 kb/s and 2 Mb/s for the test sequences S1 and S3. The distribution for sequence S3 at 64 kb/s is shifted to the low grades region. This is a consequence of the many approximations the coder has to make in sequence S3 to meet the required low bit rate. The grades distributions at 2 Mb/s have a similar shape for both sequences because their bit rate thresholds are not exceeded and the coder need not make approximations. The same conclusions can be drawn from Table 1, which shows the grades of test sequences decompressed with different decompression qualities (Q1, Q50, Q100). For the lowest decompression quality, sequence S4 (many details and subject movement) has the lowest MOS: this sequence can be reconstructed only with many approximations, which result in very low picture quality. For the highest decompression quality, all sequences have a MOS larger than 4. For all quality levels, sequence S2, with the most redundant data, has the largest MOS. This confirms that the codec works better for sequences with more redundant data. The evaluation of quality for digital systems is a complex affair, and it is a mistake to believe that a few simple quality grades characterize a system. However, it may be helpful to evaluate how valuable picture quality is to the users of multimedia window applications.
Figure 4. Grades Probability Distribution at Bit Rates of (a) 64 kb/s and (b) 2 Mb/s for the S1 and S3 sequences.

Table 1. Mean Opinion Scores for Various Decompression Quality Factors

Test sequence    Q1       Q50      Q100
S2               2.553    3.553    4.547
S3               2.056    3.324    4.361
S4               2.013    3.272    4.278
REFERENCES
[1] M. Drury, "Picture Quality Issues in Digital Video Compression," IBC'95, Conference Publication No. 413, Amsterdam, 1995, pp. 13-18.
[2] M. Ardito, M. Visca, "Correlation between Objective and Subjective Measurements for Video Compressed Systems," IBC'95, Conference Publication No. 413, Amsterdam, 1995, pp. 7-12.
[3] ISO/IEC IS 11172-2, Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s: Video, Aug. 1993.
[4] N. Jayant, P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, 1984.
[5] ITU-R Rec. BT.500-4, Methods for the Subjective Assessment of the Quality of TV Pictures, ITU, Geneva, 1993.
[6] Y. Zou, K. Ellsworth, J. A. Kutzner, P. J. Hearty, "Subjective Testing of Broadcast-Quality Compressed Video," SMPTE Journal, pp. 789-800, Dec. 1994.
[7] ITU-R Rec. BT.601, Encoding Parameters of Digital Television for Studios, ITU, Geneva, 1993.
[8] ITU-R Rec. BT.813, Methods for Objective Picture Quality Assessment in Relation to Impairments from Digital Coding of Television Signals, ITU, Geneva, 1993.
[9] N. Narita, "Subjective-Evaluation Methods for Quality of Coded Images," IEEE Trans. on Broadcasting, vol. 40, no. 1, pp. 7-13, March 1994.
[10] "Sun Video 1.0 User's Guide," Sun Microsystems Publication, Oct. 1993.
[11] "Solaris XIL 1.1 Imaging Library Programmer's Guide," Sun Microsystems Publication, Nov. 1993.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
On scalable coding of image sequences
Erwan LAUNAY, Thomson Multimedia R&D France
Abstract: It is now widely accepted that the only interesting way to perform scalable coding of TV or HDTV image sequences relies on the concepts and techniques introduced in the MPEG2 scalable extensions [1]. However, the performance achievable by such systems, and the conditions for an efficient use of scalability, still remain unclear. The focus of this work is to try to shed some light on the mechanisms and domains of validity of scalable coding.
Introduction
Scalable coding of image sequences transmits data in two distinct bitstreams, corresponding to two transmission layers: the base layer and the enhancement layer. These two layers can be decoded together, yielding maximum quality, or the base layer can be decoded separately, yielding a lower quality or lower resolution image sequence. The idea behind this partitioning is twofold:
- When transmitting data through a noisy channel, it enables us to distribute the information between two separate bitstreams according to its importance in the final decoding process. We then obtain graceful degradation of quality with channel noise power by providing higher protection to the base layer through channel coding.
- It eases interworking of video services and compatibility with existing standards. For example, depending on the connection used, the type of receiver or the fees paid, each consumer will be able to decode both layers or just the base layer, and will thus have access either to the baseline service (Digital Television) or to a high quality service (HDTV).
However, as underlined in [2], scalability has a cost. A scalable decoder, decoding both layers, will be more complex than a simple decoder intended for non-scalable transmission and yielding the same quality. Furthermore, the transmission bit rate necessary for achieving a given full resolution image quality will be higher when splitting the bitstream into two scalable streams than when transmitting one single non-scalable bitstream. As we will see in the next section, the question of which scalabilities are valuable to implement, and how to optimize their implementation, has been extensively addressed in the past and led to the "scalable extensions" in the recent MPEG2 ISO norm [11]. However, with this coding norm a new problem arises, on which this work will focus. Since the MPEG2 norm gives manufacturers the possibility to implement some form of scalability in their decoders, it is important to know for which services and in which cases these "scalable extensions" are helpful. Until now the literature has given a very incomplete answer to this question: most articles only gave isolated performance measures, based on PSNR vs. rate evaluations on one sequence, and most of these studies were only concerned with contribution coding. Recently, one article tried to analyze the performance of scalability on several sequences with some theoretical support [2]. However, it failed to take the influence of motion compensation into account, thus leading to a somewhat biased evaluation of scalability. This contribution is intended to complete and shed a new light on the results of [2] by providing a new analysis of scalability focused on the concept of "refinement".
Baseline coding scheme
There are several kinds of scalable coding [1], but here we will mainly focus on what is called "spatial scalability". In this technique the base layer represents the original sequence at a lower (quarter-) resolution, and the enhancement layer contains the additional information necessary to reconstruct the original full-resolution sequence. This scalability is the most interesting to study for two reasons. First, as underlined in [2], among all those retained by MPEG2, it is the only one to really require some additional complexity in the decoder. Secondly, a careful study shows that results obtained on the complex problem of spatial scalability can easily be extended to the other main scalable extensions defined in MPEG2: SNR scalability and data partitioning. The question of spatial scalable coding was first addressed in [8], [9], and after attempts to develop customized coding schemes for scalable coding, such as in-band motion compensation [6], [10], [14] and 3D subband coding [3], [4], [5], it was concluded that only a simple extension of hybrid coding schemes could yield valuable implementations of scalable coding. The problem was then to optimize the implementation of these extensions. A lot of work was done on this subject, and it led to several interesting results:
- Among the several possible ways to share information between the two layers, Bosveld [13] showed, basing his demonstration on rate-distortion theory, that the best sharing is obtained when the
information associated with the low resolution is transmitted in two steps: first with a rough quantization in the base layer, and secondly the base-layer quantization error is quantized with a finer step and transmitted in the enhancement layer¹.
- If one wants to implement a "drift-free" base layer decoder, one has to use two embedded motion compensation loops in the scalable coder, each of them corresponding to one resolution.
- The transformation used [12], [9] has to enable easy high-quality reconstruction of the lower resolutions using only part of the transform coefficients. A very interesting transformation was provided by [12], yielding subband PQMF filters with complexity similar to the DCT but splitting the frequency space in a more adequate way than the DCT for hierarchical coding.
- The quantizers [3] used in the two layers are linked by certain constraints: in the case of uniform quantization, the quantization step of the base layer has to be a power of two times that of the enhancement layer.
Based on these considerations, and in order to have as precise an analysis of scalable coding as possible, we used as a baseline for our experiments a scheme that significantly deviates from the MPEG2 specifications. In this scheme, depicted in Figure 1, we use a PQMF subband transformation instead of an 8x8 DCT, and the quantization step of the scalar quantization (SQ) is constrained to be a power of two. The rest of the scheme, however, conforms to the MPEG2 specifications: it uses hierarchical block matching, an IBP GOP structure and, most importantly, takes practical implementation constraints into account (such as the limited precision of number representation, real VLC encoding and costs based on the construction of a structured bitstream). For implementation purposes we also had to restrict ourselves to the study of TV and ¼ TV spatial scalability.
Though it would have been more accurate to work directly on TV and HDTV, our conclusions on TV and ¼ TV can easily be generalized to that case.
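The two-step quantization and the power-of-two constraint described above can be sketched as follows (an illustrative toy example in my own notation, not the authors' coder; `base_step` and `ratio` are arbitrary choices):

```python
def quantize(x, step):
    """Uniform scalar quantizer: round to the nearest multiple of step."""
    return step * round(x / step)

def two_layer(x, base_step=16, ratio=4):
    """Base layer codes x coarsely; the enhancement layer re-quantizes the
    base-layer error with a step a power of two finer (ratio = 2**k)."""
    base = quantize(x, base_step)                # coarse base-layer decode
    enh = quantize(x - base, base_step / ratio)  # refinement of the error
    return base, base + enh                      # base-only and full decode

print(two_layer(37.5))                           # (32, 36.0)
```

The enhancement layer thus only carries the residual of the base-layer quantization, which is what makes the base layer independently decodable.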
Scalable coding scheme
The scheme in Figure 1 could be called a "simulcast" coding scheme, since it codes each resolution separately. We used this scheme as a reference for our study, since simulcast coding is the only way to transmit several resolutions with non-scalable schemes. Another reference is the "standalone" scheme, which uses the total bit rate for coding only the full resolution. Starting from the simulcast scheme, spatial scalability can be introduced as an extra refinement of the high resolution temporal prediction using the decoded low resolution sequence. We first wanted to study the influence of this "refinement" on scalable performance, so we designed several spatial scalable schemes using several refinement modes:
1. The first is a frequency scalability scheme: instead of choosing between spatial and temporal prediction in the spatial domain, this choice is made in the transform domain. This technique is not only simpler (no interpolation required), it is also more accurate, since PQMF filtering leads to perfect correspondence between low and high resolution transform coefficients [12].
2. The second is the MPEG-like scheme of [1], in which the decoded base layer is interpolated in the spatial domain.
3. A third possibility investigated is to directly predict the high resolution residual from the low resolution coded residual [15]. This approach has a simple implementation but suffers from the mismatch of the low resolution and high resolution temporal predictions. It tends to decorrelate the residuals and makes the refinement less efficient.
These three schemes were tested on several sequences². As expected, the more we try to have a spatial prediction coherent with the high resolution predicted signal, the more efficient the scalable scheme is. So, as can be concluded from the results in Table 1, the first scheme performed better than the second and much better than the third. However, as shown by the performance on MOBCAL given in Table 1, the performance of spatial scalable coding is very dependent on the content of each sequence.
The repartition of the scalable coding gain, shown in Table 2, illustrates the influence of the motion content of each sequence on scalability. Sequences with high motion like HORSE lead to poor motion estimation and thus a high scalable coding gain, while sequences with less motion like MOBCAL yield good motion estimation, so that the only scalable coding gain is provided by the prediction of intra-coded pictures. We also notice that no gain is achieved on the chrominance planes, which are too poorly coded in the base layer.
Table 1: Entropy gained by spatial scalable coding vs. base layer bit rate (base layer = 2 Mb/s, total = 4 Mb/s)

Scheme                    MOBILE   FLOWER   HORSE
Frequency scalability     14%      25%      15%
MPEG-like interpolation   12.5%    14%      12%
Residual prediction       12%      8%       10%
Table 2: Repartition of the entropy gained by spatial scalable coding between each type of image (base layer = 2 Mb/s, full resolution = 4 Mb/s; each triple totals 100%)

                          MOBILE CALENDAR         HORSE
Scheme                    I (%)   P (%)   B (%)   I (%)   P (%)   B (%)
MPEG-like interpolation   108     5       -12     31      38      30.5
Frequency scalability     115     5       -20     57.5    33      10
Residual prediction       125     -6      -19     94      13      -7
We then investigated the influence of the coding rate on scalable performance. As shown in Figure 2, the efficiency of scalability tends to saturate when the lower layer bit rate is too low (in this case the only gain comes from intra pictures), and it is also a non-increasing function of the total bit rate. The reason is that in these cases the spatial prediction obtained from the lower layer is unable to compete with the high resolution, high quality temporal prediction. We also notice that the maximum gain for FLOWER seems to saturate around 35%, even when coding the lower layer more precisely than the full resolution². This loss is linked to the partition of frequencies between the layers in scalable coding, and it proves the importance of the high frequencies even for estimating the low frequencies of the next image.
Conclusion
In this paper we studied spatial scalability and showed that its performance is mainly determined by the quality of the motion compensation in the main layer compared to the coding quality of the base layer. We also showed that, in opposition to what is conjectured in [2], the performance of spatial scalability is not a non-increasing function of the base layer bit rate. However, the analysis in [2] completes our approach by taking the rate-distortion function of hybrid coding into account, and the combination of our results with those of [2] gives a very pertinent analysis of the general problem of scalable coding.
References
[1] J. Delameilleure, S. Pallavicini, "Scalability in MPEG2," Proc. HAMLET RACE 2110 Workshop, pp. 69-75, February 27-28, Rennes, France, 1996.
[2] J. Delameilleure, S. Pallavicini, Signal Processing: Image Communication, vol. 4, pp. 245-262, 1992.
[3] J. F. Vial, "Multiresolution coding schemes with layered bitrate regulation," Proc. VIth International Workshop on HDTV, Ottawa, Canada, 1993.
Figure 1: Baseline scheme.
Figure 2: Efficiency of scalability as a function of rate (FLOWER).
1. The work of Bosveld confirms the statement of [2] that SNR scalability is in general more interesting than data partitioning.
2. We evaluate the spatial scalability gain as the entropy spared by adding spatial prediction, based on the decoded base layer, to the simulcast coder. Thus we evaluate this gain as a rate gain, the quality being kept constant.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
IMAGE TRANSMISSION PROBLEMS BETWEEN IP AND ATM NETWORKS
Prof. Mkrttel/V~., Ass. Prof. Eramlan JLV., Shdmt i w ~ a n LL, Republic of Armenia
Abstract: There is a need everywhere for fast data communication in public and private networks. Asynchronous Transfer Mode (ATM) has been chosen by CCITT as the target switching and multiplexing technique for the B-ISDN. Traditional Local Area Networks (LANs) like Ethernet, Token Ring and Token Bus are limited in speed (10 Mbit/s) and thus are limited to particular types of (mainly data) applications. For multimedia applications the bandwidth requirement is high, the information is a combination of voice, video and data, and it requires a transfer mode capable of transporting and switching these different types of information. ATM satisfies this requirement for LANs. ATM allows transmission capacities up to 622 Mbit/s, which is enough for most LAN applications. ATM carries all types of information - voice, data and video - in a common cell (packet) with a standard format of 53 bytes (48 bytes of user information and 5 bytes of control information). ATM also offers increased bandwidth and greater flexibility and manageability. On the other hand, nowadays most of us are surrounded by powerful computer systems with graphics-oriented input and output, including the entire spectrum of PCs, professional workstations and supercomputers. The existing local area networks are primarily based on shared media interconnections, which are likely to become potential bottlenecks, not only because of new multimedia applications but also because of the rapid growth of services employing simple data transfers. ATM, a switching and multiplexing standard for broadband integrated networks, is viewed as an emerging technology capable of removing this bottleneck. But its success as a LAN technology depends on its ability to provide LAN-like services compatible with existing protocols and applications.
This approach takes the idea of LAN emulation, and allows ATM switches to be transparently interconnected with shared media legacy LANs running the IEEE 802 family of LAN protocols. This article aims to resolve a number of problems arising when transmitting images between IP and ATM networks. The problems are resolved at the network layer.
INTRODUCTION
Most existing LANs are based on shared media interconnections and employ the IEEE 802 family of LAN protocols, which includes Ethernet and Token Ring. On the other hand we have ATM, which is connection-oriented. Given the current interest in ATM technology, it is likely that ATM switches and interfaces will become faster and cheaper than the shared medium technologies, so the problem of configuring IP over ATM is growing. The goal of this article is to allow compatible and interoperable implementations for transmitting IP datagrams and ATM Address Resolution Protocol (ATMARP) requests and replies over ATM Adaptation Layer 5 (AAL5) [1]. Any reference to virtual connections, permanent virtual connections or switched virtual connections applies only to virtual channel connections used to support IP and address resolution over ATM, and thus is assumed to be using AAL5. This article describes the initial deployment of ATM within "classical" IP networks as a direct replacement for local area networks (Ethernets) and for IP links which interconnect routers, either within or between administrative domains. The "classical" model here refers to the treatment of the ATM host adapter as a networking interface to the IP protocol stack operating in a LAN-based paradigm. Characteristics of the classical model are:
- The same maximum transmission unit (MTU) size is used for all VCs in a LIS [2].
- Default LLC/SNAP encapsulation of IP packets.
- IP addresses are resolved to ATM addresses by use of an ATMARP service within the LIS.
- ATMARPs stay within the LIS.
From a client's perspective, the ATMARP architecture follows the model presented in [1].
MAIN BODY
The deployment of ATM into the Internet community is just beginning and will take many years to complete. During the early part of this period, we expect deployment to follow traditional IP subnet boundaries. Initial deployment of ATM provides a LAN segment replacement for Local Area Networks (e.g., Ethernets, Token Rings and FDDI). In such cases local IP routers with one or more ATM interfaces will be able to connect islands of ATM networks. Characteristics and features of ATM networks are different from those found in LANs:
1) ATM provides a Virtual Connection (VC) switched environment. VC set-up may be done on either a Permanent Virtual Connection (PVC) or dynamic Switched Virtual Connection (SVC) basis.
2) Data to be passed over a VC is segmented into 53-octet quantities called cells (5 octets of ATM header and 48 octets of data). With respect to IP and other network layer protocols, ATM can be configured either as a direct network interface or as a MAC (medium access control) protocol below the LLC (logical link control) [3].
The latter approach is the key idea behind LAN emulation, and allows ATM switches to be transparently interconnected with shared media legacy LANs running the IEEE 802 family of LAN protocols. To clarify the above, I must explain that LAN emulation simply means that the point-to-point ATM switch should give the appearance of a virtual shared medium. Also, although ATM is connection-oriented, the broadcast feature can be emulated in an ATM network using dedicated servers. Each ATM host is assigned an ATM address that can be based either on a hierarchical 8-byte-long ISDN telephone number or on a 20-byte address proposed by the ATM Forum [2]. That is why implementing IP directly over ATM requires translating an IP address to an ATM address. Thus we need to maintain a server referred to as the IP-ATM-ARP Server. This server needs to maintain tables that can translate an IP address to an ATM address.
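The 53-octet cell format described above (a 5-octet header plus 48 octets of payload) can be illustrated with a minimal sketch. This is a simplified toy, not the real UNI format: the actual header also carries GFC, PTI, CLP and HEC fields, and real AAL5 adds a trailer and CRC before segmentation.

```python
def segment_into_cells(payload, vpi, vci):
    """Simplified sketch of ATM segmentation: split data into 48-octet
    payloads and prepend a 5-octet header. The header layout here is
    illustrative only (VPI/VCI packed into toy positions)."""
    CELL_PAYLOAD = 48
    cells = []
    for i in range(0, len(payload), CELL_PAYLOAD):
        # Pad the final chunk to a full 48-octet cell payload.
        chunk = payload[i:i + CELL_PAYLOAD].ljust(CELL_PAYLOAD, b'\x00')
        header = bytes([vpi & 0xFF,
                        (vci >> 8) & 0xFF, vci & 0xFF,
                        0x00,   # placeholder for PTI/CLP bits
                        0x00])  # placeholder for HEC octet
        cells.append(header + chunk)
    return cells

cells = segment_into_cells(b'x' * 100, vpi=1, vci=42)
assert all(len(c) == 53 for c in cells)  # standard 53-octet cells
assert len(cells) == 3                   # 100 octets -> 3 padded cells
```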
The interaction between the hosts and the IP-ATM-ARP Server can be implemented using a simple query/response protocol. The host interface will be the same as that in the case of LAN Emulation: to send a packet to a destination IP host, the host obtains the corresponding ATM address from the address cache and passes the IP packet and the ATM address to the processing entity that performs the connection management function.
In the LIS scenario, each separate administrative entity configures its hosts and routers within a closed IP subnetwork. Each LIS operates and communicates independently of other LISs on the same ATM network. Hosts connected to ATM communicate directly with other hosts within the same LIS. Communication to hosts outside of the local LIS is provided via an IP router. This router is an ATM endpoint attached to the ATM network that is configured as a member of one or more LISs. This configuration may result in a number of disjoint LISs operating over the same ATM network. Hosts of differing IP subnets MUST communicate via an intermediate IP router even though it may be possible to open a direct VC between the two IP members over the ATM network. The requirements for IP members (hosts, routers) operating in an ATM LIS configuration are:
- all members have the same IP network/subnet number and address mask;
- all members within a LIS are directly connected to the ATM network;
- all members outside of the LIS are accessed via a router;
- all members of a LIS must have a mechanism for resolving IP addresses to ATM addresses via ATMARP and vice versa via InATMARP (based on [4]);
- all members within a LIS MUST be able to communicate via ATM with all other members in the same LIS; i.e., the connection topology underlying the intercommunication among the members is fully meshed.
The following list identifies a set of ATM-specific parameters that must be implemented in each IP station connected to the ATM network:
- ATM Hardware Address.
The ATM address of the individual IP station;
- ATMARP Request Address, that is, the ATM address of an individual ATMARP server located within the LIS.
The default MTU size for IP members operating over the ATM network shall be 9180 octets. IP members must register their ATM endpoint address with their ATMARP server using the ATM address structure appropriate for their ATM network connection.
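The query/response interaction with the ATMARP server can be sketched as follows. This is a hypothetical illustration: the class and method names are invented, and `query_server` stands in for the actual ATMARP request/reply exchange over AAL5.

```python
class ATMARPClient:
    """Toy sketch of a host-side ATMARP address cache; not a real stack."""

    def __init__(self, query_server):
        # query_server: callable mapping an IP address to an ATM address,
        # standing in for the ATMARP exchange with the LIS server.
        self.query_server = query_server
        self.cache = {}  # IP address -> ATM address

    def resolve(self, ip_addr):
        """Return the ATM address for ip_addr, querying the ATMARP
        server only on a cache miss."""
        if ip_addr not in self.cache:
            self.cache[ip_addr] = self.query_server(ip_addr)
        return self.cache[ip_addr]

# Usage: a fake server table stands in for the real ATMARP service
# (both addresses below are hypothetical).
table = {"192.0.2.1": "47.0005.80ff.e100"}
queries = []
def fake_server(ip):
    queries.append(ip)
    return table[ip]

client = ATMARPClient(fake_server)
assert client.resolve("192.0.2.1") == "47.0005.80ff.e100"
assert client.resolve("192.0.2.1") == "47.0005.80ff.e100"
assert queries == ["192.0.2.1"]  # second lookup served from the cache
```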
In an SVC environment, ATMARP requests are sent to this address for the resolution of target protocol addresses to target ATM addresses. That server must have authoritative responsibility for resolving ATMARP requests of all IP members within the LIS. I must also note that if the LIS is operating with PVCs only, then this parameter may be set to null and the IP station is not required to send ATMARP requests to the ATMARP server. ATM does not support broadcast addressing; therefore there are no mappings available from IP broadcast addresses to ATM broadcast services. ATM does not support multicast address services; therefore there are no mappings available from IP multicast addresses to ATM multicast services. As to ATM switching, it is also known as fast packet switching. An ATM switching node transports cells from incoming links to outgoing links using the routing information contained in the cell header and information stored at each switching node by the connection set-up procedure. Two functions are performed at each switching node by the connection set-up procedure:
1. the unique connection identifier at the incoming link and the unique connection identifier at the outgoing link are defined for each connection;
2. routing tables at each switching node are set up to provide an association between the incoming and outgoing links for each connection.
VPI and VCI are the two connection identifiers used in ATM cells. Thus the basic functions of an ATM switch can be stated as follows:
- routing (space switching), which indicates how the information is internally routed from the inlet to the outlet;
- queuing, which is used in solving contention problems if two or more logical channels contend for the same output;
- header translation: all cells which have a header equal to some value on an incoming link are switched to an outlet and their header is translated to a new value, say k.
There are also many functions involved in the traffic control of ATM networks [5].
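The routing and header-translation functions above can be sketched as a table lookup. This is an illustrative model only; the table layout and names are invented, not taken from any ATM switch implementation.

```python
def make_switch():
    """Toy ATM switch: the connection set-up procedure fills a routing
    table keyed by (input port, incoming VPI/VCI); forwarding looks up
    the table, translates the header, and emits the cell on the chosen
    output port."""
    table = {}

    def set_up(in_port, in_vpivci, out_port, out_vpivci):
        # Performed once per connection at call set-up time.
        table[(in_port, in_vpivci)] = (out_port, out_vpivci)

    def forward(in_port, cell):
        # cell = (VPI/VCI header value, payload)
        vpivci, payload = cell
        out_port, new_vpivci = table[(in_port, vpivci)]
        return out_port, (new_vpivci, payload)  # header translated

    return set_up, forward

set_up, forward = make_switch()
set_up(in_port=0, in_vpivci=(1, 42), out_port=3, out_vpivci=(7, 99))
out_port, cell = forward(0, ((1, 42), b"data"))
assert out_port == 3 and cell == ((7, 99), b"data")
```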
For example, Connection Admission Control. This can be defined as a set of actions taken by the network during the call set-up phase to establish whether a VC/VP connection can be made. A connection request for a given call can only be accepted if sufficient network resources are available to establish the end-to-end connection, maintaining its required quality of service and not affecting the quality of service of existing connections in the network by this new connection [5].
CONCLUSIONS
In this article we have addressed the issues that are involved in implementing IP in ATM LANs. In LAN emulation, ATM is configured as an IEEE 802 MAC protocol below the LLC. This allows ATM switches to be transparently integrated with the IEEE 802 family of LAN protocols. The key issues in this configuration are resolving IP addresses to ATM addresses and providing broadcast transparency. ATM can provide the full functionality of the network layer and data link protocols, and as a result, it is possible to implement transport layer protocols such as TCP directly over ATM. This resolves the IP/ATM image processing and transmission difficulties at the network layer.
REFERENCES
[1] J. Heinanen, Multiprotocol Encapsulation over ATM Adaptation Layer 5, RFC 1483, Telecom Finland, July 1993.
[2] ATM Forum, ATM User-Network Interface Specification Version 3.0, Prentice Hall, 1993.
[3] M. Laubach, Classical IP and ARP over ATM, RFC 1577, Information Sciences, January 1994.
[4] D. Plummer, An Ethernet Address Resolution Protocol or Converting Network Addresses for Transmission on Ethernet Hardware, STD 37, RFC 826, MIT, November 1982.
[5] V. S. Mkrttchian, A. V. Eranosian and H. L. Karamyan, Resolving the Problem of IP on ATM Local Area Networks, The Problems of the Efficiency Improvement of the Control Systems of Technological Processes (in Russian), AACC, vol. 4, 1995.
Proceedings IWISP '96; 4- 7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) 9 1996 Elsevier Science B.V. All rights reserved.
A Scalable Video Coding Scheme Based on Adaptive Infield/Inframe DCT and Adaptive Frame Interpolation
Masatoshi Asada and Katsutoshi Sawada
Department of Information Network Engineering, Aichi Institute of Technology, Yakusa-cho, Toyota-shi, Aichi-ken 470-03, Japan
ABSTRACT
This paper describes a spatio-temporal resolution scalable coding scheme. Resolution scalability means a coding property where lower partial resolution pictures can be obtained by decoding only subsets of the total coded bit stream, while the full resolution picture is reconstructed by decoding the total bit stream. This scheme employs frame subsampling associated with adaptive interpolation for temporal scalability and adaptive infield/inframe DCT for spatial scalability. The proposed scheme provides four different spatio-temporal resolutions of a video sequence -- two temporal resolutions, each consisting of two spatial resolutions. It can be applied effectively to interlaced video sequences. Computer simulation results have demonstrated that this scheme has better coding performance compared to conventional non-adaptive methods.
1. INTRODUCTION
In resolution scalable coding [1]-[6], lower resolution pictures can be obtained by decoding only subsets of the total coded bit stream, while the full resolution picture is reconstructed by decoding the total bit stream. This has some important applications such as compatibility [2] between different resolution video systems, progressive resolution transmission and image database browsing. It is also useful for error resilient transmission in the case of digital television terrestrial broadcasting [3] and ATM video transmission. There are two kinds of resolution scalability -- spatial scalability and temporal scalability. A spatio-temporal scalable coding scheme based on the subband technique is presented in [4]. The spatio-temporal scalable coding scheme proposed in this paper employs frame subsampling associated with adaptive interpolation for temporal scalability and adaptive infield/inframe DCT for spatial scalability. This scheme can be applied effectively to interlaced video sequences.
Section 2 describes the outline of the scheme, and sections 3 and 4 discuss the details of temporal and spatial scalable coding, respectively. Section 5 presents computer simulation experimental results.
2. OUTLINE OF THE SPATIO-TEMPORAL SCALABLE CODING SCHEME
Fig. 1 shows the block diagram of the proposed spatio-temporal scalable video coding scheme. An input interlaced video sequence is first separated into odd frames and even frames. DCT-based spatial scalable coding is performed on each frame. At the decoder, temporally low resolution pictures are constructed by decoding only odd frames and interpolating even frames, while temporally full resolution pictures are reconstructed by decoding both odd frames and even frames. Spatially low resolution pictures are obtained by decoding only low frequency DCT coefficients, while spatially full resolution pictures are reconstructed by decoding both low and high frequency DCT coefficients. Thus, this scheme provides four different spatio-temporal resolutions of a video sequence -- two temporal resolutions, each consisting of two spatial resolutions. Table 1 shows the four kinds of spatio-temporal resolutions and the corresponding coded data.
[Fig. 1: block diagram of the spatio-temporal scalable coding scheme. Legend: o: odd frame; e: even frame; L: spatially low component; H: spatially high component.]

Table 1 Reconstructed pictures and corresponding coded data.

  Spatial resolution   Temporal resolution   Corresponding coded data
  Low                  Low                   Lo
  Full                 Low                   Lo, Ho
  Low                  Full                  Lo, Le
  Full                 Full                  Lo, Ho, Le, He

[Fig. 2 panels: (a) stationary portion, (b) moving portion. O: decoded pixel, x: interpolated pixel.]
Fig. 2 Adaptive interpolation of even frames.
3. ADAPTIVE INTERPOLATION IN TEMPORAL SCALABLE CODING
In this scheme, odd frames and even frames are used as temporal resolution base layer pictures and enhancement layer pictures, respectively. Temporally low resolution pictures are obtained at the decoder by decoding only the odd frame data and interpolating the missing even frames from adjacent odd frames. This scheme uses two different interpolation methods adaptively for moving and stationary portions, as shown in Fig. 2. Each even frame is first segmented into moving portions and stationary portions based on the interframe difference values between the previous and the next odd frames. For the stationary portions, simple frame interpolation is applied. That is, the 1st and the 2nd fields of the missing even frame are interpolated from the 1st and the 2nd fields of the previous odd frame, respectively. In this case, the spatial resolution can be preserved in the stationary portions. This conventional frame interpolation, however, causes an annoying "backward and forward motion" artifact in the moving portions. In order to avoid this degradation, the 1st and 2nd fields of the missing even frame are interpolated from the 2nd field of the previous odd frame and the 1st field of the next odd frame, respectively.
4. SPATIAL SCALABLE CODING BASED ON ADAPTIVE INFIELD/INFRAME DCT
4.1 Spatial scalable coding based on DCT and MC prediction
This scheme employs spatial scalable coding based on DCT [1],[3],[5] and motion compensation (MC) prediction as shown in Fig. 3. The coder configurations for odd frames and even frames are different, because the MC prediction methods are different in these two cases. For both cases, the DCT is first performed on an input picture, then the DCT coefficients are separated into low frequency (L) and high frequency (H) components. MC prediction coding is carried out in each DCT coefficient domain.
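The L/H separation of DCT coefficients just described can be sketched as follows. The 8x8 block and 4x4 low-frequency corner are illustrative choices; the paper does not specify the split size, and the naive DCT here is only for demonstration.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II (naive O(N^2) version, for illustration)."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

def dct_2d(block):
    # Separable 2-D DCT: transform rows, then columns.
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def split_lh(coeffs, low=4):
    """Separate DCT coefficients into a low-frequency component L (the
    low x low corner) and a high-frequency component H (the rest)."""
    L = [row[:low] for row in coeffs[:low]]
    H = [[0.0 if (i < low and j < low) else coeffs[i][j]
          for j in range(len(coeffs))] for i in range(len(coeffs))]
    return L, H

block = [[(i + j) % 8 for j in range(8)] for i in range(8)]
L, H = split_lh(dct_2d(block))
# A spatially low resolution picture is decoded from L only; the full
# resolution picture needs both L and H.
assert len(L) == 4 and len(L[0]) == 4
assert H[0][0] == 0.0  # low corner excluded from H
```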
Fig. 3(a) shows the configuration for the odd frames, where forward MC prediction using the previous odd frame is employed. Because the MC prediction is carried out in the picture domain, an inverse DCT (IDCT) and a DCT are positioned before and after the MC prediction, respectively. In order to improve the MC prediction performance for the H component, not only the decoded H component but also the decoded L component is fed to the IDCT [5] in the MC prediction loop of the H component. That is, the MC prediction for the H component is carried out in the full resolution picture domain. Fig. 3(b) shows the configuration for the even frames, where bidirectional MC prediction using the previous and next odd frames is employed. At the decoder, spatially low resolution pictures are obtained by decoding only the L component data. On the other hand, spatially full resolution pictures are reconstructed by decoding both the L and H component data.
4.2 Adaptive infield/inframe DCT
This scheme employs an adaptive infield/inframe DCT [6], where the DCT block construction is switched adaptively between field-based blocks and frame-based ones. Fig. 4 shows the block construction method for the adaptive DCT. An input picture frame is first divided into frame-based blocks of 8x16 size. Then each block is again separated into either two field-based blocks of 8x8 or two frame-based blocks of 8x8. The field-base or frame-base decision is made according to the intrafield and interfield vertical absolute difference values, respectively. This adaptive DCT can improve the coding performance, especially the low resolution picture quality in stationary portions.
Fig. 3 Spatial scalable coder configuration based on DCT and MC prediction: (a) coding of odd frame pictures; (b) coding of even frame pictures.
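The field/frame decision just described can be sketched as follows. The activity measure is a plausible reading of "intrafield and interfield vertical difference absolute values"; the function name and tie-breaking rule are invented, not taken from the paper.

```python
import numpy as np

def split_block(block):
    """Sketch of the field/frame decision for an 8(wide) x 16(high) block
    given as a (16, 8) array of lines. Returns two 8x8 blocks, either
    field-based (same-parity lines) or frame-based (consecutive lines),
    whichever shows smaller vertical difference activity."""
    b = block.astype(int)
    # Interfield activity: differences between consecutive lines,
    # which belong to opposite fields in an interlaced frame.
    interfield = np.abs(np.diff(b, axis=0)).sum()
    # Intrafield activity: differences between same-field lines (stride 2).
    intrafield = np.abs(b[2:] - b[:-2]).sum()
    if intrafield <= interfield:
        # Field-based 8x8 blocks: 1st-field lines, then 2nd-field lines.
        return b[0::2], b[1::2]
    # Frame-based 8x8 blocks: top half and bottom half.
    return b[:8], b[8:]

# A vertical gradient (stationary-like content) favors frame-based blocks;
# an interlaced "comb" (moving content) favors field-based blocks.
grad = np.repeat(np.arange(16), 8).reshape(16, 8)
top, bottom = split_block(grad)
assert (top == grad[:8]).all()  # frame-based split chosen

comb = np.zeros((16, 8), dtype=int)
comb[1::2] = 50
f1, f2 = split_block(comb)
assert (f1 == 0).all() and (f2 == 50).all()  # field-based split chosen
```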
5. SIMULATION EXPERIMENTS
Computer simulation experiments were carried out in order to evaluate the performance of this scalable coding scheme. It was confirmed that four different resolution pictures were obtained by the proposed coding scheme. Concerning the temporally low (and spatially low or full) resolution pictures, adaptive interpolation showed better picture quality than the conventional frame interpolation for moving portions. No "backward and forward motion" was observed. It also showed better picture quality than field interpolation for the stationary portions. The coding performance of the adaptive infield/inframe DCT was compared to those of infield DCT and inframe DCT. The results are shown in Fig. 5 and Fig. 6. For the spatially full resolution pictures, there were no significant performance differences. However, the adaptive DCT showed better performance for spatially low resolution pictures.
6. CONCLUSION
This paper has described a spatio-temporal scalable video coding scheme which employs frame subsampling associated with adaptive interpolation for temporal scalability and adaptive infield/inframe DCT for spatial scalability. The proposed scheme provides four different spatio-temporal resolutions of an interlaced video sequence. Computer simulation results have shown that this scheme has better coding performance compared to the conventional non-adaptive schemes.
Fig. 4 Block construction of adaptive infield/inframe DCT.
Fig. 5 Experimental results for temporally-full/spatially-full resolution pictures.
Fig. 6 Experimental results for temporally-full/spatially-low resolution pictures.
REFERENCES
[1] C. Gonzales and E. Viscito, "Flexibly Scalable Digital Video Coding," Signal Processing: Image Communication, vol. 5, nos. 1-2, pp. 5-20, 1993.
[2] T. Chiang, H. Sun and J. W. Zdepski, "Spatial Scalable HDTV Coding," Proc. ICIP'95, vol. 2, pp. 571-574, 1995.
[3] G. Schamel, "Graceful Degradation and Scalability in Digital Coding for Terrestrial Transmission," Proc. HDTV 1992, vol. 2, pp. 72/1-72/9, 1992.
[4] G. Lilienfield and J. W. Woods, "Scalable High-Definition Video Coding," Proc. ICIP'95, vol. 2, pp. 567-570, 1995.
[5] M. Nakamura and K. Sawada, "Scalable Coding Schemes based on DCT and MC Prediction," Proc. ICIP'95, vol. 2, pp. 575-578, 1995.
[6] M. Asada and K. Sawada, "MC-DCT Scalable Coding for Interlaced Image," Proc. Tokai-section Joint Conference, no. 553, 1995 (in Japanese).
Session I:
IMAGE SUBBAND, WAVELET CODING AND REPRESENTATION
Unified Image Compression Using Reversible and Fast Biorthogonal Wavelet Transform
Hyung Jun Kim and C. C. Li
Department of Electrical Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
Abstract
We present a fast image compressor using biorthogonal wavelet transforms which gives high computational speed and excellent compression performance. Special spline biorthogonal wavelets are used whose filter coefficients are dyadic rational numbers, so that convolutions with these filters can be performed using only integer arithmetic shifting and addition operations. Following the transform, Hilbert scanning is used for encoding to gain additional compression.

1 Introduction
Wavelet transforms are known to have excellent energy compaction characteristics and, therefore, are ideal for signal and image compression. Although this approach has been vigorously developed in recent years, general orthogonal (or biorthogonal) wavelet transform filters and most subband coding filters have many taps and require many floating point multiplications. That is one of the reasons why wavelet transform/subband coding often takes longer processing time in comparison to JPEG. Another problem is that an additional step must be taken to implement lossless compression, even though the wavelet transform can achieve perfect reconstruction in theory [1]. In the proposed compressor, special spline biorthogonal wavelets are used whose filter coefficients are dyadic rational numbers, so that convolutions with these filters can be performed using only arithmetic shifting and addition operations without any multiplication. Wavelet-based image coding prefers smooth filters of relatively short support with some degree of regularity. If some quality degradation is acceptable after choosing a short-support wavelet filter, our main concern will be the support length for fast processing. The other concern is the regularity, especially that of the synthesis filters, as it influences the quality of the reconstructed images. In biorthogonal filter banks with H_0(z) and H_1(z) as lowpass and highpass
analysis filters, and G_0(z) and G_1(z) as lowpass and highpass synthesis filters respectively, perfect reconstruction with FIR filters means that

    H_0(z)G_0(z) + H_0(-z)G_0(-z) = 2.                        (1)

We choose

    H_0(z) = (1/(4\sqrt{2})) (-z^2 + 2z + 6 + 2z^{-1} - z^{-2}),   (2)
    H_1(z) = (1/(2\sqrt{2})) (-z + 2 - z^{-1}),                    (3)
    G_0(z) = (1/(2\sqrt{2})) (z + 2 + z^{-1}),                     (4)
    G_1(z) = (1/(4\sqrt{2})) (-z^2 - 2z + 6 - 2z^{-1} - z^{-2}).   (5)

We avoid the factors 1/\sqrt{2} in the expressions by multiplying by 1/\sqrt{2} in the analysis parts and by \sqrt{2} in the synthesis parts so as to make filter banks with dyadic rational coefficients.

2 Shift and superposition operations
Let us first consider a data sequence for reconstruction given by {.., a, b, c, d, e, f, ..} and a 5-tap 1-D filter with dyadic rational coefficients {1/8, 1/4, 1/2, 1/4, 1/8}, as illustrated in Figure 1. The first step of reconstruction is interpolation of the data sequence, i.e., putting zeros in between two successive data points, resulting in a new data sequence {.., a, 0, b, 0, c, 0, d, 0, e, 0, f, 0, ..} for convolution with the 5-tap filter. The computation can be done by shift and superposition operations as follows. We fetch one data point, for example b, which is aligned with the center of the filter, and then scale it using either one (for 1/2), two (for 1/4) or three (for 1/8) shift-right operations to obtain the first intermediate data set {b/8, b/4, b/2, b/4, b/8}. As illustrated in Figure 1, it looks as if the center value is spreading out toward both ends of the filter, weighted by the respective filter coefficients. At the next clock cycle, we do not need to calculate an intermediate data set, since the input is zero. At the third clock cycle, we do the similar shift operations, resulting in another data set {c/8, c/4, c/2, c/4, c/8}. After 5 clock cycles, we add up all the intermediate data sets (excluding the two zero sets) to obtain the value b/8 + c/2 + d/8 at the position of the input data point c, which is the same as what we would get from the usual convolution operation. At the next clock cycle, one gets another output, c/4 + d/4. Repeat
the same procedure until we cover all the input data points in the sequence. The benefit of shift and superposition operations is a gain in processing speed. In the usual convolution computation, even when many data points are zero, such as in the highpassed data of the wavelet decomposition, we still have to perform the convolution operation, which is obviously a waste of time. In the shift and superposition case, on the other hand, if the center value is zero, we can skip the whole set of shift and addition operations. This saving will be more pronounced in the 2-D case, and will be very useful in accelerating the reconstruction process.
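A minimal sketch of this upsample-then-shift-and-superpose reconstruction follows; the helper name is invented, and right shifts implement the dyadic taps 1/8, 1/4, 1/2, 1/4, 1/8 exactly as in the text (exact for inputs divisible by 8).

```python
# Dyadic filter taps expressed as right-shift amounts: 1/8, 1/4, 1/2, 1/4, 1/8.
SHIFTS = [3, 2, 1, 2, 3]

def reconstruct(coarse):
    """Upsample `coarse` by 2 and convolve with the 5-tap dyadic filter,
    using only shift and add operations: each nonzero center value
    spreads out to its neighbors, scaled by right shifts."""
    n = 2 * len(coarse)
    out = [0] * (n + 4)  # padded by the filter half-length on both sides
    for i, v in enumerate(coarse):
        if v == 0:
            continue  # zero centers contribute nothing -- the speed gain
        center = 2 * i + 2  # position in the padded, upsampled sequence
        for k, s in enumerate(SHIFTS):
            out[center + k - 2] += v >> s
    return out[2:n + 2]  # drop the padding

# With {.., b, c, d, ..} = {8, 16, 24}, the output at c's position is
# b/8 + c/2 + d/8 and the next sample is c/4 + d/4, as in the text.
y = reconstruct([8, 16, 24])
assert y[2] == 8 // 8 + 16 // 2 + 24 // 8   # = 12
assert y[3] == 16 // 4 + 24 // 4            # = 10
```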
3 Reconstruction using 2-D masks
In 2-D processing, if the wavelet transform is performed first horizontally and then vertically, the inverse transform must be performed vertically first and then horizontally. We may use the tensor product of two separate 1-D filters for decomposition, but for reconstruction we have developed a fast reconstruction algorithm using 2-D filter masks to take advantage of the characteristics of the wavelet transformed data. Although processing by 2-D filters generally takes more time than processing by two separable 1-D directional filters, our special algorithm based on 2-D masks and shift-addition operations provides a fast reconstruction process. In order to minimize the data fetch operations and to process only the nonzero-valued pixels in the decomposed images, we perform the inverse wavelet transform using 2-D filter masks as shown in Table 1, instead of using two 1-D (vertical and horizontal) processes. These masks are the four tensor products of the two 1-D filters {1/2, 1, 1/2} and {-1/4, -1/2, 3/2, -1/2, -1/4}, which correspond to \sqrt{2} G_0(z) and \sqrt{2} G_1(z), respectively. Analogous to the 1-D case, the center pixel value in the mask spreads out to the neighboring pixels via shift and superposition operations, as illustrated in Figure 2. For illustration, let us consider a 3 x 3 input image for reconstruction having pixel values {a, b, c; d, e, f; g, h, i} and a reconstruction filter (LL-filter) as shown in Table 1(a). The input image is first interpolated by putting zeros in between two neighboring pixels to obtain a 7 x 7 fine scale image for convolution with the LL-filter. The center pixel value is spread out to the eight neighbors (for the LL-filter case) after being weighted by the mask values. In the figure, the dark squares represent pixel values at the coarser level and the white squares show the interpolated pixels at the finer level.
In the case of the center pixel value e, the center weight of the LL-filter mask is 1, therefore the center value remains "e" without any change. The weights of north, south, west, and east are all 1/2, so the e value itself shifts right once and becomes "e/2". Similarly, the weights of northwest, northeast, southwest, and southeast are all 1/4, thus the center pixel value e shifts right twice and becomes "e/4". Repeat the same process for the next center pixel until we finish the shift-and-superposition processing for the whole input image. Similar processing is performed using the LH (3x5), HL (5x3), and HH (5x5) filters and the corresponding input images until we complete the processing at one reconstruction level; we then go to the next level. Since there are many zero-valued pixels in the three high-passed subimages at each level, the reconstruction process is carried out only for the nonzero values. Thus the whole reconstruction process can be significantly accelerated compared to the usual tensor product of two 1-D processings. Because all four subimages are obtained simultaneously with the four masks, this method can be easily adapted to progressive reconstruction.

4 Hilbert scanning
An additional improvement in compression can be achieved by Hilbert scanning of the wavelet transformed data [2]. Scanning while maximizing correlation is an important consideration in image compression, since the higher the correlation we obtain at the preprocessing stage, the more efficient the data compression will be. The main advantage of Hilbert scanning is that the scanning curve remains in a region as long as possible before moving to a neighboring area. The Hilbert scan is built recursively within an image using the basic definition on one scale level and the recursive definition on consecutive levels. Let us denote the upper left, upper right, lower left, and lower right quadrants by (0,0), (0,1), (1,0), and (1,1), respectively. We define four basic scans:
    R : (0,0) -> (0,1) -> (1,1) -> (1,0),
    D : (0,0) -> (1,0) -> (1,1) -> (0,1),
    L : (1,1) -> (1,0) -> (0,0) -> (0,1),
    U : (1,1) -> (0,1) -> (0,0) -> (1,0).      (6)

The four recursive scans are defined by the following:

    R : D -> R -> R -> U,
    D : R -> D -> D -> L,
    L : U -> L -> L -> D,
    U : L -> U -> U -> R.                      (7)
An image is divided into four quadrants, each of which is further divided into four quadrants, and so on, until subimages of size 2 x 2 are formed. The Hilbert scan of a 2-D surface, instead of a linear scan, has the advantage of exploiting more efficiently the two-dimensional correlation existing on the surface. To take advantage of this scanning, a run-length coder is used prior to Huffman coding, since the performance of the latter is not affected by the sequence of the data.

5 Experiments
A unified compression, which was not achieved by JPEG, was developed and reported in [1, 3]. Its performance on lossless compression was compared with that of JPEG and is shown in Table 2; a 10% improvement was obtained. We have studied three cases for lossy compression. The first method (Lossy1) involves just tensor products of regular 1-D decompositions and of regular 1-D reconstructions. The second method (Lossy2) uses the ordinary tensor products in decomposition processing, but uses 2-D masks in reconstruction to accelerate the reconstruction process. The third one (Lossy3) is the same as Lossy2 except that it includes the Hilbert scanning after the wavelet decomposition to improve the compression ratio. Obviously, there is no difference among Lossy1, Lossy2, and Lossy3 in reconstructed image quality, since Lossy2 only speeds up the reconstruction process. However, Lossy3 can improve the compression ratio over Lossy1 and Lossy2 at the expense of a little more processing time, since it exploits the correlation in the wavelet decomposed data. The performances of all three methods are compared with that of JPEG [4]. In Table 3, we compare the processing time of the proposed compressor with that of JPEG for the same reconstruction quality in terms of peak-to-peak signal-to-noise ratio (PSNR of about 30 dB). The test was performed on a SUN Microsystems SPARC-10 station using the standard gray-scale images Lena, Barbara, Peppers, and Airplane (512 x 512 x 8). The compression/decompression time means the total processing time from the beginning to the end, and W.Dec./W.Rec. means wavelet decomposition/reconstruction time only. Lossy compression performances are shown in Table 4, and a graph of compression ratio versus PSNR for the Lena image is shown in Figure 3. An example of reconstructed images (about 30 dB) using JPEG and the proposed method is also shown in Figure 4.
References
[1] A. Zandi, J. Allen, E. Schwartz, and M. Boliek, "CREW: Compression with Reversible Embedded Wavelets," Proceedings of IEEE Data Compression Conf., March 1995, pp. 212-221.
[2] S. N. Efstratiadis, B. Rouchouze, and M. Kunt, "Image Compression Using Subband/Wavelet Transform and Adaptive Multiple Distribution Entropy Coding," Proc. SPIE, v. 1818, Visual Communications and Image Processing '92, 1992, pp. 753-764.
[3] H. Kim and C. C. Li, "A Fast Reversible Wavelet Compressor," Proc. SPIE, v. 2825, Mathematical Imaging: Wavelet Applications in Signal and Image Processing IV, August 1996.
[4] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
Table 1: 2-D filter masks for fast reconstruction

(a) LL-filter
1/4  1/2  1/4
1/2   1   1/2
1/4  1/2  1/4

(b) LH-filter
-1/8  -1/4  3/4  -1/4  -1/8
-1/4  -1/2  3/2  -1/2  -1/4
-1/8  -1/4  3/4  -1/4  -1/8

(c) HL-filter (transpose of the LH-filter)
-1/8  -1/4  -1/8
-1/4  -1/2  -1/4
 3/4   3/2   3/4
-1/4  -1/2  -1/4
-1/8  -1/4  -1/8

(d) HH-filter
6 Conclusions
The compression performance of the proposed method is superior to that of JPEG, and the visual quality of the reconstructed images is better even when the PSNR values are the same. We have significantly improved the processing time of the wavelet transform, but the overall processing time is still longer than JPEG. This is because JPEG divides an image into small subimages and then processes them using diagonal zigzag scanning and the fast 1-D DCT. Therefore, JPEG is effectively 1-D processing, while the wavelet transform method is 2-D processing. Although JPEG processing speed and compression ratio are good, there is a noticeable blocking artifact at high compression. In contrast, there is no blocking effect at all in images reconstructed by wavelet-based methods. In fact, the performance of the proposed method is far superior to that of JPEG at high compression ratios. The processing speed may be improved further in a hardware implementation, since the proposed method uses only integer shift and addition operations.
1/16   1/8  -3/8   1/8  1/16
 1/8   1/4  -3/4   1/4   1/8
-3/8  -3/4   9/4  -3/4  -3/8
 1/8   1/4  -3/4   1/4   1/8
1/16   1/8  -3/8   1/8  1/16

Table 2: Lossless compression ratio of test images
Image      Lossless & Huff.   Lossless & Arith.   Lossless JPEG   Baseline JPEG
Lena       1.67:1             1.68:1              1.51:1          1.48:1
Barbara    1.54:1             1.55:1              1.38:1          1.45:1
Peppers    1.68:1             1.69:1              1.46:1          1.51:1
Airplane   1.85:1             1.87:1              1.46:1          1.51:1
Table 3: Processing time (seconds) for test images

           Prop.   JPEG
Com.       1.10    0.50
Decom.     1.00    0.40
W.Dec.     0.35    -
W.Rec.     0.25    -
Table 4: Lossy compression ratio (CR and bpp) of test images (PSNR ≈ 30 dB)

Image      Lossy1&2         Lossy3           JPEG
Lena       33.34:1 (0.24)   38.64:1 (0.21)   27.89:1 (0.29)
Barbara    14.91:1 (0.54)   18.22:1 (0.49)   11.39:1 (0.70)
Peppers    40.10:1 (0.20)   46.25:1 (0.17)   32.19:1 (0.25)
Airplane   34.23:1 (0.23)   39.83:1 (0.20)   29.88:1 (0.27)

Figure 1: An example of shift and superposition operations for a 5-tap 1-D filter
Figure 2: An example of 2-D shift and superposition operations for the LL-filter (3x3 2-D filter mask)
Figure 3: Compression ratio vs. PSNR for the Lena image
Figure 4: Reconstructed images (PSNR ≈ 30 dB) using JPEG (top) and the proposed method (bottom)
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Subband Image Coding Using Adaptive Fuzzy Quantization Step Controller Peter Planinsic, Franc Jurkovic, Zarko Cucej, Dali Donlagic Faculty of Electrical Engineering and Computer Science Smetanova 17, 2000 Maribor, Slovenia E-mail:[email protected]
Abstract
An adaptive fuzzy quantization step controller for achieving a desired picture quality in PSNR is described. The possibility of constructing other picture quality measures and of fuzzy bit allocation is also proposed.
1. INTRODUCTION
Subband coding using multirate filter banks is an effective method to achieve data compression in image signal processing, Safranek et al., 1988 [1], Planinsic, 1993 [2]. The compression rate in bpp is determined by the choice of the quantization step Q. The larger the quantization step Q, the larger the compression factor and the quantization error. This relationship differs from picture to picture and depends, as does the quality of the decompressed picture, on the picture statistics. The goal of digital image compression is the representation of a source digital image with as few bits as possible, while still maintaining adequate fidelity for the particular application. The most commonly used quality measure of a compressed image is the mean square error (MSE) per sample, or reconstruction error, which is defined by:
MSE = (1/N) · (x - x̂)^T · (x - x̂)
(1)
where x is the grey-level representation of the original image and x̂ that of the reconstructed image. The peak signal-to-noise ratio (PSNR) for 8-bit resolution is defined as:
PSNR = 10 · log10(255^2 / MSE)
(2)
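As a sketch (our own helper function, assuming the standard 8-bit definition above and images given as flat pixel lists):

```python
import math

def psnr_8bit(original, reconstructed):
    """PSNR for 8-bit data: 10*log10(255^2 / MSE), with the MSE of eq. (1)
    computed per sample over flattened pixel lists."""
    n = len(original)
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n
    return 10.0 * math.log10(255.0 ** 2 / mse)
```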
This measure is tractable, and it usually, but not always, correlates reasonably well with subjective criteria. Since the human observer is the final receiver of most of the transmitted image information, an objective measure based on visual perception would be very useful to predict picture quality. Finding such a measure still remains an area for future research. There have been several attempts in the past to derive visual models to predict picture quality by means of an objective measure. We suggest the use of fuzzy quantization step control to achieve a desired picture quality. For the sake of simplicity we used PSNR as the picture quality measure.
2. SUBBAND IMAGE CODING
In the one-dimensional pyramid subband decomposition scheme with a two-band QMF we used a linear-phase FIR filter H0(z). We get the solution:
Ĥ(z) = H0(z),   Ĝ(z) = Ĥ(-z) = H0(-z);   H(z) = 2·H0(z),   G(z) = -2·H0(-z)
(3)
The amount of amplitude distortion introduced by a nontrivial linear-phase FIR filter in the QMF can be minimised by optimising the FIR filter coefficients. For the filter bank the QM-filters designed by Johnston [6] have been employed (Johnston's 16b).
In Figure 1 a transform image coding scheme is presented. In our image coding application with a separable multiresolution transformation, we simply applied the elementary step of the one-dimensional pyramid scheme, first to the rows and then to the columns of the image matrix (Figure 2).
Figure 1: Transform image coding scheme
Figure 2: Two-dimensional multiresolution transformation (filtering of the rows, then of the columns)
The quantizer we used is an equidistant quantizer. The discretization and reconstruction mappings are defined as:
t_Q : R -> Z,   x |-> floor(x/Q + 1/2)
(4a)
r_Q : Z -> R,   x |-> x · Q
(4b)
respectively. In our method, only integer quantization distances are admissible.
Figure 3: The disposition and scanning of the transformed subimages
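The mappings of eqs. (4a) and (4b) can be sketched directly (our own function names):

```python
import math

def quantize(x, Q):
    """Discretization mapping of eq. (4a): x -> floor(x/Q + 1/2), i.e.
    rounding x to the nearest multiple of the integer step Q."""
    return math.floor(x / Q + 0.5)

def reconstruct(k, Q):
    """Reconstruction mapping of eq. (4b): k -> k*Q."""
    return k * Q
```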
Each subimage T_i (i = 1, ..., K) of the transformed image T is assigned a quantization distance (factor) Q_i, and the amplitudes of the subimage are quantized according to the mapping t_{Q_i}. A much simpler alternative is to use a unique quantization distance over the whole transform domain, i.e.:
Q_1 = Q_2 = ... = Q_K = Q
(5)
The quantized transform coefficients of the detail subimages and the approximation image at r resolution levels are scanned or vectorized in the manner shown in Figure 3. We have applied an adaptive coding scheme, the so-called recency rank coder, Elias, 1987 [3]. Reconstruction is performed using decoding and the synthesis filter bank. 3. ADAPTIVE FUZZY CONTROLLER The block scheme of the adaptive fuzzy controller is shown in Figure 4. A self-tuning controller is used [7]. With this approach we obtain a constant desired picture quality at the output. For identification, the method of W. Pedrycz [8] is used. The relation matrix, as the process model, is the result of the identification. The controller design is based on the method of M. Togai [9].
Figure 4: Fuzzy controller
The basic task of the fuzzy controller is to determine the quantization factor Q for the desired PSNR. The control error is:
e(n) = PSNR_SET - PSNR(n)
(6)
The fuzzy controller consists of three parts: the fuzzification part, the fuzzy inference machine based on fuzzy rules, and the defuzzification part. The goal of using the adaptive fuzzy controller is adaptation to different non-linear processes. We can prescribe the desired controller behaviour, e.g. fast convergence with no overshoot.
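A toy version of such a controller might look as follows (the membership shapes, error ranges, and maximum step change are our assumptions, not taken from the paper):

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_step_update(Q, psnr_set, psnr, dq_max=4.0):
    """One iteration of a toy fuzzy controller for the quantization step Q.
    Positive error (picture quality too low) -> decrease Q (finer steps);
    negative error -> increase Q. Centroid defuzzification over three rules."""
    e = psnr_set - psnr                      # control error, eq. (6)
    mu_neg = tri(e, -20.0, -10.0, 0.0)       # error is negative
    mu_zero = tri(e, -10.0, 0.0, 10.0)       # error is about zero
    mu_pos = tri(e, 0.0, 10.0, 20.0)         # error is positive
    # rule consequents: the change of Q proposed by each fuzzy error class
    num = mu_neg * dq_max + mu_zero * 0.0 + mu_pos * (-dq_max)
    den = mu_neg + mu_zero + mu_pos
    dq = num / den if den > 0 else 0.0
    return max(1.0, Q + dq)                  # Q must stay a positive step
```

An adaptive version would additionally tune the membership ranges from the identified process model, as the paper does via the relation-matrix identification.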
4. RESULTS OF SIMULATIONS We made simulations with different pictures, CIRCLE (black), LENA, BABOON, RANDOM (white random noise), for different initial values. The comparisons of the results for different pictures are shown in Figure 5.
Figure 5: Comparison of simulations for different pictures (Q_ini = 10): a) PSNR as a function of the iteration, b) relation Q - PSNR
5. CONCLUSIONS
The use of an adaptive fuzzy controller to control picture quality in terms of PSNR has been described. With adaptation, acceptable solutions for different pictures can be achieved. With fuzzy logic, other quality criteria can be constructed. One of the advantages of using a fuzzy controller is that we do not need complex models of the process. Process identification enables the construction of adaptive bit allocation. The bit allocation algorithm can be made with fuzzy rules, too. The state of signal processing technology enables the implementation of such algorithms in real time. It is also possible to regulate the compression rate in bpp. This method can be applied in any other transform coding compression method.
6. REFERENCES
[1] Safranek, R. J., MacKay, K., Jayant, N. S., and Kim, T. (1988): Image Coding Based on Selective Quantization of the Reconstruction Noise in the Dominant Sub-band. Proc. 1988 IEEE ICASSP, pp. 765-768, April 1988.
[2] P. Planinsic, J. Mohorko, Z. Cucej, D. Donlagic, P. Filip: Image compression based on the discrete wavelet transform. Proceedings of the 15th International Conference on Information Technology Interfaces ITI'93, Pula, Croatia, June 15-18, pp. 477-481, 1993.
[3] P. Elias: Interval and Recency Rank Source Coding: Two On-line Adaptive Variable-Length Schemes. IEEE Trans. Inform. Theory, vol. IT-33, pp. 3-10, January 1987.
[4] Chuen Chien Lee: Fuzzy Logic in Control Systems: Fuzzy Logic Controller, Part I, II. IEEE Trans. on Systems, Man, and Cybernetics, vol. 20, no. 2, pp. 404-418, pp. 419-435, 1990.
[5] Eiichi Tsuboka and Jun'ichi Naka: On the Fuzzy Vector Quantization based Hidden Markov Model. Advanced Program ICASSP'94, Adelaide, South Australia, April 1994.
[6] J. D. Johnston: A Filter Family Designed for Use in Quadrature Mirror Filter Banks. Proc. of IEEE ICASSP, pp. 291-294, 1980.
[7] K. J. Astrom: Theory and Applications of Adaptive Control. A Survey. Automatica, vol. 19, no. 5, pp. 471-486, 1983.
[8] W. Pedrycz: An identification algorithm in fuzzy relational systems. Fuzzy Sets and Systems, vol. 13, pp. 153-167, 1984.
[9] M. Togai, P. P. Wang: Analysis of a fuzzy dynamic system and synthesis of its controller. International J. Man-Machine Studies, vol. 22, pp. 355-363, 1985.
EZW Algorithm Using Visual Weighting in the Decomposition and DPCM
Laurent Lecornu and Czeslaw Jedrzejek (1)
EFP - The Franco-Polish School of New Information and Communication Technologies, ul. Mansfelda 4, 60-854 Poznan, Poland
Abstract
Some modifications of the EZW algorithm are described in this paper. First, EZW algorithms using bi-orthogonal wavelet decomposition and quincunx wavelet decomposition are compared. After studying the statistics of the wavelet coefficients, a DPCM quantization is used for coding the low-pass component. In the EZW algorithm using the bi-orthogonal wavelet transform, the coefficients are also weighted using a visual weighting function. The visual perception of the result seems a little better, but the PSNR is a little worse. In conclusion, this modification adds some complexity to the algorithm without a very significant result. However, the modifications of the standard Shapiro EZW algorithm shed light on the excellent performance of the recent Said and Pearlman EZW coder.
1. Introduction
Embedded zerotree wavelet (EZW) coding, introduced by Shapiro [1], is one of the best techniques for image compression. The principle of this algorithm is the exploitation of self-similarity across different scales of a wavelet-transformed image and the ordering of the magnitudes of the wavelet coefficients. The coding can be terminated when a target rate or a target distortion is reached. The EZW algorithm starts with a discrete wavelet transform followed by embedded coding using successive-approximation quantization. Two kinds of wavelet decomposition are used and compared, namely the bi-orthogonal wavelet transform and the quincunx wavelet transform. The main interest is in the pre-processing performed before the embedded coding. A modification of the EZW algorithm using the bi-orthogonal wavelet transform is introduced using a visual weighting function. Also, DPCM is employed for the quantization of the lowest subband. All comparisons are made for the Lena 512x512, 8-bit image.
2. EZW algorithm
One can divide the EZW into three steps: - wavelet decomposition, - building the tree and the quantisation of the coefficients, - encoding of the tree and constructing the compressed file. The modified EZW also contains weighting of the wavelet-decomposed subbands (Fig. 1). The building of the tree and the encoding part of the EZW algorithm consist of:
I. Computation of the maximum value Val_max of the absolute values of the wavelet coefficients
II. Threshold (T) determination: T = Val_max/2
III. Dominant pass
IV. Subordinate pass
V. T = T/2
VI. Go to III.
Each symbol of the significance map is compressed using arithmetic coding. To the compressed file, a header containing the image size, the maximum absolute value and the number of levels of the decomposition is added.
Figure 1: Scheme of the modified EZW algorithm (wavelet decomposition -> weighting of the decomposition -> quantization and zerotree building -> encoding of the tree)
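A skeleton of the successive-approximation loop (steps I-VI), without the zerotree symbol coding and the arithmetic coder, might look as follows (our own simplification, not Shapiro's full algorithm):

```python
def ezw_passes(coeffs, n_iter=4):
    """Successive-approximation quantization: coefficients become significant
    when their magnitude reaches the current threshold T (dominant pass) and
    already-significant magnitudes are refined by T/2 (subordinate pass);
    T is halved after each round."""
    val_max = max(abs(c) for c in coeffs)        # step I
    T = val_max / 2.0                            # step II
    significant = {}                             # index -> reconstructed value
    for _ in range(n_iter):
        # dominant pass (step III): detect newly significant coefficients
        for i, c in enumerate(coeffs):
            if i not in significant and abs(c) >= T:
                significant[i] = 1.5 * T * (1 if c >= 0 else -1)
        # subordinate pass (step IV): refine known values by +/- T/2
        for i, r in significant.items():
            c, s = coeffs[i], (1 if coeffs[i] >= 0 else -1)
            significant[i] = r + (T / 2) * s if abs(c) >= abs(r) else r - (T / 2) * s
        T /= 2.0                                 # step V; the loop is step VI
    return significant
```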
(1) This work was partially done while the author visited the NPAC, Syracuse University, Syracuse, NY 13244, USA.
D: Dominant pass
S: Subordinate pass
Figure 2: Example of a compressed file after 4 iterations with a bit rate constraint
3. Wavelet decomposition
a- bi-orthogonal separable wavelet transform
One of the best and most often used filters is the 9/7-tap filter [2] with the following coefficients:

n       0          1          2           3           4
h(n)    0.602949   0.266864   -0.078223   -0.016864   0.026749
h*(n)   0.557543   0.295636   -0.028772   -0.045636   0

Table 1: Filter coefficients (bi-orthogonal wavelets)
g_n = (-1)^n h*_{-n+1},    g*_n = (-1)^n h_{-n+1}
(1)
The signal is convolved with √2·h(n) and √2·g(n) for the decomposition, and with √2·h*(n)/2 and √2·g*(n)/2 for the reconstruction. There exist reports that varying the size of the filters at the various levels of the wavelet decomposition leads to better results [3]. At the first levels of decomposition a long filter is preferable, while at the later stages a shorter filter is advantageous. The problems are the transmission of the filter coefficients and the increase in complexity.
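Equation (1) amounts to a modulation of the mirrored complementary low-pass filter. With filters stored as {index: coefficient} dictionaries (our own convention, indices symmetric about 0), a sketch is:

```python
def highpass_from_lowpass(h):
    """Build the high-pass filter from the complementary low-pass filter via
    eq. (1): g[n] = (-1)**n * h[1 - n]."""
    return {1 - m: ((-1) ** (1 - m)) * c for m, c in h.items()}
```

Applying it to the 9/7 analysis filter of Table 1 yields a filter whose coefficients sum to (numerically) zero, as a high-pass filter should.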
b- quincunx wavelet transform
In general, one expects non-separable filters to have better compression properties than separable ones. Venkataraman and Levy [4] proposed a non-separable filter with a smaller product of spatial and frequency localisations compared to the 9/7-tap filter, which, in particular, should have better edge performance. In [5], it has been suggested (but not explicitly demonstrated) that the quincunx wavelet transform, being non-separable and non-oriented, gives better results than the bi-orthogonal wavelet transform. In another work, Jee and Haddad [6] designed optimal filters of a given class that provide minimum reconstruction error for a vector-quantized M-channel subband codec. Surprisingly (they do not comment on this), separable paraunitary and bi-orthogonal filters produce a smaller mean-square error (MSE) at a bit rate of 0.36 bits/pixel than a non-separable paraunitary filter at a bit rate of 0.50 bits/pixel. The quincunx transform decomposes the original image with a multiresolution scale factor of √2. This means that the analysis will be twice as fine as a dyadic multiresolution analysis. The scaling function is defined by the following formula:
φ_{m,n}(x, y) = 2^{-m} φ(L^{-m}(x, y) - n),    n ∈ Z^2, n = (n_x, n_y)
(2)
where L(x, y) = (x + y, x - y) is a linear transform, with m ∈ Z and L^m = L ∘ L ∘ ... ∘ L (m times). This isotropic multiresolution analysis provides, at each resolution level, only one wavelet coefficient sub-image and one low-resolution sub-image.

       h(n)        h*(n)
a      0.001671    -
b      -0.002108   -0.005704
c      -0.019555   -0.007192
d      0.139756    0.164931
e      0.687859    0.586315
f      0.006687    -
g      -0.006324   -0.017113
i      -0.052486   -0.014385
j      0.010030    -

Table 2: Filter coefficients (quincunx wavelets); the letters indicate the positions of the coefficients in the 2-D filter mask.
273
Figure 3 shows the parent-children dependencies in the case of the quincunx wavelet transform. The scanning order is: LL7, HH6, HH5, HH4, ... This wavelet transform organises the information of the original image into several resolutions. The image is "split" into M images of wavelet coefficients (resolutions 0 to M-1) and an image at the lowest resolution.
Figure 3: Parent-child dependencies (quincunx wavelet)
Our result is as follows: the mathematical quality (PSNR) of the EZW using the quincunx wavelet transform with 5-12 decomposition levels (on average the best result is for the 6-level decomposition) is 0.2-0.3 dB lower than the PSNR results for the bi-orthogonal wavelet transform with a 6-level decomposition, for bit rates of 0.1-2 bits/pixel.
c- statistics of the coefficient subimages
In [2], the statistics of the coefficient subimages were modelled by the generalised Gaussian, a particular case of which is the Laplacian function. Following [2], we use this Laplacian approximation to study the performance of the quantizer. The statistics of the coefficients serve to provide an optimal partitioning of bits between the various subbands, in this case LL and the remaining subbands. In the original Shapiro work [1] the zerotree is built starting from the lowest LL subband. However, although the statistics of the subimage at the lowest resolution are similar to the statistics of the original image, there is less similarity between the lowest resolution and the higher subbands. This suggests another technique for the coding of the lowest-resolution subband. This has been partially implemented in the highly successful Said and Pearlman modification of the EZW [7], which constructs the zerotree in such a way that only a fraction of the pixels in the lowest subband are roots, while the rest are scalar quantised. In [5], a DPCM technique is used to encode the lowest subband. In this work the DPCM is introduced into the original EZW algorithm. In this way the LL component can be encoded more precisely than the rest of the coefficients. However, again the PSNR results for EZW with DPCM encoding of the lowest subband are consistently lower, by up to 0.2 dB for bit rates of 0.1-2 bits/pixel, compared with the standard Shapiro algorithm.
4. Results for the modified EZW algorithm
The main modification studied in this work is the use of a weighting function based on visual perception. The assignment of the weights is based on the fact that the human eye is not equally sensitive to signals at all spatial frequencies. Following [2], on the basis of contrast sensitivity data, to obtain a controlled degree of noise shaping across the subimages one considers a function B_{m,d} such that:
B_{m,d} = γ · log(σ²_{m,d})
(3)
where σ_{m,d} is the standard deviation corresponding to a subimage (m, d). The values of γ and B_{m,d} are chosen experimentally in order to match human vision. The wavelet coefficients are multiplied by the function B_{m,d} before the embedded coding. Our results indicate that decoded images obtained for a given compression rate have the same (or slightly better) visual perception, but the PSNR is between 1 and 2 dB smaller. These weights affect the positions of the significant coefficients in each subband after the coding. Mazzari and Leonardi [8] presented an algorithm to estimate the
normalising coefficient for perceptual purposes based on a quantisation noise-like criterion. Then they proceeded to consider a method based on a steepest descent algorithm that was not related to visual perception but assigned arbitrary weights to subbands in order to minimise the total distortion. Due to numerical difficulties they resorted to an ad-hoc minimisation. As a consequence they used similar techniques to improve visual quality or PSNR, but not both of them. The question arises to what degree the original filter properties are lost through such a procedure. In general, it is only necessary that the transform be invertible. Orthogonality and other filter properties are not required for image coding [9,10].
5. Conclusions
The results obtained with the EZW method using the bi-orthogonal wavelet transform are better than for the quincunx wavelet transform, and we do not have an easy explanation for this fact. Using a visual weighting function gives the same (or slightly better) visual perception but a worse PSNR. Recently Said and Pearlman [7] demonstrated the excellent performance of their modified EZW coder. Although a lot remains to be understood, there are two apparent differences between their coder and the original Shapiro coder. One is that the Said and Pearlman coder uses scalar quantisation for a portion of the LL band. Then every root has 4 children, contrary to the Shapiro coder, in which the root in LL has only three children. The main reason for the much better performance of the Said and Pearlman coder is not the difference in coding the LL band, but rather that they code descendants immediately after coding the significant coefficients. This allows them to save bits when all descendants are insignificant. In addition to having better quality, the Said and Pearlman coder [7] is also much faster than not only the original Shapiro coder but also the recent Shapiro coder, which employs a technique [11] that builds a zerotree map before encoding. Further speed-up, important for video coding with wavelet-coded key frames, can be achieved with much shorter all-pass filters [12]. There exists a possibility of improving the Said and Pearlman coder by using conditioning rather than prediction for the coding, similarly to what has been done by Algazi et al. [13] with regard to the Shapiro coder [1]. For small (CIF and QCIF size) images, outside the realm of wavelet packets only the vector wavelet transform [14] with lattice vector quantization is potentially more promising than the Said and Pearlman coder, since zerotree quantization loses its effectiveness for a small number of decompositions. This work has been supported by the Polish Scientific Committee (KBN) grant 8 T11E 035 10. C.
Jedrzejek also acknowledges the support of the USAF Rome Laboratory (AFSC) Collaboration and Interactive Visualisation grant F30602-95-C-0273-01.
References
[1] J.M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients", IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, Dec. 1993.
[2] M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, "Image Coding Using Wavelet Transform", IEEE Transactions on Image Processing, vol. 1, no. 2, pp. 205-220, April 1992.
[3] S. A. Martucci and I. Sodagar, "Zerotree entropy coding of wavelet coefficients for very low bit rate coding", IEEE Conf. on Image Proc. ICIP-96, Lausanne, 1996 (to appear).
[4] S. Venkataraman and B.C. Levy, "Nonseparable orthogonal linear phase perfect reconstruction filter banks and their application to image compression", Proc. IEEE Conf. on Image Proc. ICIP-94, Austin, TX, vol. 3, pp. 334-338, 1994.
[5] M. Barlaud, P. Sole, T. Gaidon, M. Antonini, and P. Mathieu, "Pyramidal lattice vector quantization for multiscale image coding", IEEE Trans. on Image Processing, vol. 3, no. 4, pp. 367, July 1994.
[6] I. Jee and R. A. Haddad, "Modeling and design of multidimensional vector-quantized M-channel subband codecs", Proc. IEEE Conf. on Image Proc. ICIP-95, Washington, vol. 3, pp. 85-88, 1995.
[7] A. Said and W.A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees", IEEE Trans. Circ. and Syst. Video Tech., vol. 6, pp. 243-250, June 1996.
[8] A. Mazzari and R. Leonardi, "Perceptual embedded image coding using wavelet transforms", Proceedings ICIP-95, vol. 1, pp. 586-589, Washington, Oct. 1995.
[9] E. H. Adelson and E. Simoncelli, "Orthogonal pyramidal transforms for image coding", SPIE, vol. 845, pp. 50-58, 1987.
[10] J.P. Andrew, P.O. Ogunbona, and F.J. Paoloni, "Coding gain and spatial localisation properties of discrete wavelet transform filters for image coding", IEE Proc.-Vis. Image Signal Process., vol. 142, no. 3, pp. 133-139, June 1995.
[11] J.M. Shapiro, "Techniques for fast implementation of the Embedded Zerotree Wavelet (EZW) algorithm", Proc. ICASSP'96, vol. 3, pp. 1455-1458, Atlanta, GA, May 1996.
[12] C.D. Creusere and S.K. Mitra, "Image coding using wavelets based on perfect reconstruction IIR filter banks", IEEE Trans. Circ. and Syst. Video Tech. (to appear).
[13] V. Ralph Algazi and R. R. Estes, Jr., "Analysis based coding of image transform and subband coefficients", IEEE Trans. Circ. and Syst. Video Tech. (to appear).
[14] W. Li and Y.-Q. Zhang, "A study of wavelet transform coding of subband-decomposed images", IEEE Trans. Circ. and Syst. Video Tech., vol. 4, no. 4, pp. 383-391, 1994.
Efficient 3-D Subband Coding of Colour Video
Marek Domanski and Roger Swierczynski
Poznan University of Technology, Institute of Electronics and Telecommunication, Poznan, Poland
ABSTRACT
This paper presents a simple and fast technique for coding video sequences at bit rates of about 150 kbps. The technique is intended for low-priced video applications. It consists of two basic elements: the well-known 3-D subband analysis and synthesis, and a novel subband coding technique. For the inter-frame (temporal) subband analysis and synthesis the simple Haar wavelets are good enough. On the other hand, for the intra-frame (spatial) analysis and synthesis we use highly efficient recursive filter banks. The proposed subband coding technique uses the base subband (low-low frequency) of the low temporal-frequency subsequence in a simple detector of the moving areas of the scene. Information in the other subbands is coded only in the areas related to significant movements. The most important feature of the proposed technique is its simplicity. We need approximately 1.5 seconds per input QCIF frame on a PC DX2 66 MHz machine using non-optimised software.
1. INTRODUCTION
The rapid development of image communication and multimedia technology has stimulated great interest in video coding for low-bitrate channels, i.e., for channels of about 100 kbps. Some multimedia applications need simple techniques which can be implemented in low-cost hardware. As highly efficient modern video data compression techniques usually need large computational power, this paper deals with a relatively simple and efficient technique which is very easy to implement on cheap hardware. We accept that the relation between compression ratio and reconstructed image quality may be worse than with more sophisticated techniques.
In order to avoid the problems faced by the application of the hitherto popular block-based DCT techniques, techniques which are free of blocking effects are being examined extensively. Among them, the object-based methods [3] as well as the methods based on extensions of the well-known image subband coding (SBC) (e.g., [4,5]) are of particular interest for video coding. The first approach to subband coding of video consists in replacing the DCT block-based coding inside the prediction loop that implements interframe prediction with motion compensation [4,6,8]. The prediction error is encoded using subband decomposition. A disadvantage of this approach is that the regions of significant prediction error tend to be unnecessarily extended due to the downsampling [9]. Another approach, recently more and more popular, consists in the application of three-dimensional (3-D) subband coding, where the three-dimensional spatio-temporal frequency band of the input video sequence is split into low- and high-frequency bands in the temporal frequency domain, and then each of these channels is split into subbands in the domain of the spatial frequencies [10-14]. The proposals combine 3-D SBC with classic motion-compensated prediction [13,14] or geometrical vector quantization [11,12]. The paper [10] describes encoding of the channels obtained by a 3-D subband analysis using a hybrid technique where the base temporal subband is coded with a DCT block-based coder while vector quantization is applied in the other subbands. 2. SUBBAND ANALYSIS AND SYNTHESIS In this paper, we use the 3-D subband coding approach based on the application of spatial recursive filter banks in reversive arrangements [2,4,15] and short temporal FIR filters. The input colour video sequence is processed componentwise, i.e., the luminance as well as the chrominance components are processed independently.
The chrominance components (usually denoted as U and V) are decimated prior to coding. Each component is analysed by temporal, horizontal and vertical filter banks. In contrast to the papers [10-14], for the spatial analysis, both horizontal and vertical, separable recursive half-band filters with polyphase implementations are used because of their simplicity [1,2,4,15]. Elliptic 5th-order bireciprocal (power-symmetric) filters with a minimum stopband attenuation of 43 dB are found to be close to optimum for spatial analysis [2]. Nevertheless, in the temporal analysis, we follow the suggestions of [11,13], using very simple linear-phase two-tap Haar wavelets:
H(z) = 0.5(1 ± z^-1),
(1)
where "+" and "-" correspond to the low- and high-pass filters, respectively. This filter bank has a very simple implementation and exhibits a small group delay (half of the sampling period), resulting in small system response times, which are very critical for videophone and videoconferencing applications. It enables perfect reconstruction in the very practical DFT arrangement.
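The Haar pair of eq. (1), applied pixelwise to two consecutive frames, gives a two-line analysis/synthesis sketch (our own helpers, with frames as flat pixel lists):

```python
def haar_analysis(frame_a, frame_b):
    """Temporal Haar filters of eq. (1): low = 0.5*(a + b), high = 0.5*(a - b)."""
    low = [0.5 * (a + b) for a, b in zip(frame_a, frame_b)]
    high = [0.5 * (a - b) for a, b in zip(frame_a, frame_b)]
    return low, high

def haar_synthesis(low, high):
    """Perfect reconstruction: a = low + high, b = low - high."""
    a = [l + h for l, h in zip(low, high)]
    b = [l - h for l, h in zip(low, high)]
    return a, b
```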
3. SUBBAND CODING TECHNIQUE
The basic idea of the proposed technique for encoding channel information is very simple (see Fig. 1). We exploit the lower spatial resolution of the human visual system for the high temporal-frequency channel. Therefore the high temporal-frequency subbands are decimated, i.e., only spatial subband 0 is encoded and transmitted. However, four spatial subbands are encoded and transmitted in the low temporal-frequency subband. Encoding of subband 0 (low spatial frequencies) from the high temporal-frequency channel as well as subbands 1-3 from the low temporal-frequency channel is controlled by the signal DL. The signal DL is the difference of the two consecutive frames from subband 0 in the low temporal-frequency channel; i.e., in order to avoid the "dirty window" effect, a frame of the DL signal is created from the information originating from four consecutive frames (see Fig. 1). Moreover, the signal DL is quantized using a quantizer with a dead zone. The width of the dead zone is set as a parameter, usually small (typically equal to 1 or 2). A pixel is active if the respective value of DL is positive. Only active pixels from the four above-mentioned channels are transmitted. The other pixels from these channels are reconstructed in the decoder from the previous frames. In contrast to these four channels, subband 0 from the low temporal-frequency subband is reconstructed by differential decoding, that is, by adding the samples of the signal DL to the samples of the previous frame. At the output of the coder, Huffman coding is applied to all the signals. The compression gain can be increased by increasing the number of subbands in the spatial domain.
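The DL-based movement detector described above can be sketched as follows (our own reading: a pixel counts as active when its dead-zone-quantized DL value is non-zero; the function names are ours):

```python
def dead_zone_quantize(d, dead_zone):
    """Dead-zone quantizer sketch: values within +/-dead_zone map to 0."""
    if abs(d) <= dead_zone:
        return 0
    return d - dead_zone if d > 0 else d + dead_zone

def active_mask(prev_ll, curr_ll, dead_zone=1):
    """DL is the difference of two consecutive frames of spatial subband 0 in
    the low temporal-frequency channel; mark pixels with non-zero quantized
    DL as active (to be transmitted)."""
    return [dead_zone_quantize(c - p, dead_zone) != 0
            for p, c in zip(prev_ll, curr_ll)]
```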
Fig. 1. Channel encoding principle.
4. EXPERIMENTAL RESULTS
The goal of the experiments is to verify the proposed coding technique and the used combination of nonlinear-phase spatial filters with temporal short FIR filters. The technique has been examined using standard videophone test sequences like "Salesman" and "Miss America". In this paper we present only results for the luminance channel; the two chrominance channels are coded independently in the same way. Their bit rates do not exceed 10% of the luminance bit rate for each component, when we tolerate small colour distortions. The input luminance sequences are in the CIF format, i.e., 288 lines and 352 columns with 10 frames per second. At the first step, the sequence is decimated horizontally and vertically to the size of 144 x 176. Then the above-described technique is applied. The signal is reconstructed using the same filter banks used for the analysis. Nevertheless, the spatial filters process the data in the opposite direction to that used in the coder. Therefore the spatial phase shifts are compensated and the nonlinear-phase characteristics of the spatial recursive filters have no influence on the reconstructed sequence [1,2,15]. The reconstructed sequence is interpolated to the CIF format for visualisation. The obtained bit rates are 145 kbps for the sequence "Miss America" and 167 kbps for "Salesman", as shown in figures 2 and 3, and tables 1 and 2. (Please remember that one has only 5 frames per second for the LT and HT sequences, if the input rate is 10 fps.) The subjective quality is satisfactory, as shown in figure 4 (at the end of this paper), which is obtained for not the easiest frames from the relatively difficult sequence "Salesman".
The experimental results show that application of the colour transformation from the YUV to the Lab space slightly improves the coding efficiency.
Fig 2. Bytes used to encode the individual frames after temporal analysis, the sequence: "Miss America"; LT and HT denote the low and high temporal frequency channel, respectively.

Table 1. Bitrates (in bytes per frame) for the sequence "Miss America".

Frame:  2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19
Total:  3195  2081  3475  2374  2705  2417  3043  3004  3414  3956  3830  4991  5550  5233  4437  4564  4445  4357
LT:     2761  1699  2935  2007  2230  1915  2540  2456  2698  2989  3114  3654  4118  3953  3403  3437  3423  3399
HT:     434   382   540   367   475   502   367   548   716   967   716   1337  1432  1280  1034  1127  1022  958
Fig 3. Bytes used to encode the individual frames after temporal analysis, the sequence: "Salesman".

Table 2. Exact bitrates (in bytes per frame) for the sequence "Salesman".

Frame:  2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19
Total:  4861  5998  5042  2996  4301  3723  4029  4061  4656  5716  5022  3226  1977  2531  2023  2724  3779  5017
LT:     3841  4844  4480  2565  3651  3218  3369  3559  3279  4662  4346  2940  1743  2241  1782  2371  3157  4011
HT:     1020  1154  562   431   650   505   660   502   927   1055  676   286   234   290   241   353   622   1006
5. CONCLUSIONS AND FUTURE AIMS
A very fast technique has been proposed. The average encoding and decoding time for a QCIF frame is about 1.5 seconds on a PC 486 DX2 66 MHz machine. The high processing speed has been obtained by using recursive filters for spatial analysis as well as a simple subband encoding method. The most important future work is increasing the quality-compression ratio. In order to achieve this, we employ switchable recursive filter banks together with the switching strategy described in [7]. This will improve the subjective quality of the decoded image at the same bit rate. The corresponding results, as well as this technique, will be presented at the conference.
Fig. 4. Two consecutive frames from the reconstructed sequence.

REFERENCES
[1] M. Domański, R. Świerczyński, Subband coding of images using hierarchical quantization, Signal Proc. VII, 1994, pp. 1218-1221.
[2] M. Domański, R. Świerczyński, Design of nonlinear-phase filter banks for subband coding of images, Proc. IEEE Int. Conf. Image Proc., Austin TX, 1994, pp. 893-897.
[3] H.G. Musmann, Object-based analysis-synthesis coding, Proc. Int. Symp. Circuits Syst., IEEE 1994.
[4] J.W. Woods (ed.), Subband image coding, Kluwer 1991.
[5] M. Domański, R. Świerczyński, Subband coding in the YUV space, XVIIth Nat. Conf. Circuit Theory and Electronic Networks, Wrocław - Polanica Zdrój, 1994, pp. 221-226.
[6] P.H. Westerink, Subband coding of images, PhD thesis, Delft Univ. of Technology, 1989.
[7] S. Aase, Image subband coding artifacts: analysis and remedies, PhD Thesis, Trondheim 1993.
[8] D. Qian, A motion compensated subband coder for very low bit rates, Image Communication, 1995.
[9] H.G. Musmann, private communication, 1995.
[10] K. Ngan, W. Chooi, Very low bit rate video coding using 3D subband approach, IEEE Trans. Circ. Syst. Video Techn., vol. 4, 1994, pp. 309-316.
[11] C. Podilchuk et al., Three-dimensional subband coding of video, IEEE Trans. Image Proc., vol. 4, 1995, pp. 125-139.
[12] C. Podilchuk, Low bit rate subband video coding, Proc. IEEE Int. Conf. Image Proc., Austin TX, 1994, pp. III 280-284.
[13] J. Ohm, Three-dimensional subband coding with motion compensation, IEEE Trans. Image Proc., vol. 3, 1994, pp. 559-571.
[14] D. Taubman, A. Zakhor, Multirate 3-D subband coding of video, IEEE Trans. Image Proc., vol. 3, 1994, pp. 572-588.
[15] M. Bleja, M. Domański, Image data compression using subband coding, Annales des Telecommunications, vol. 45, 1990, pp. 477-486.
This work has been supported by KBN grant no. 8S504 002 06. Roger Swierczynski is a fellow of The Foundation for Polish Science.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Adaptive Wavelet Packet Image Coding with Zerotree Structure
Tsuyoshi Otake, Kouichi Fukuda and Akira Kawanaka
Faculty of Science and Technology, Sophia University, Japan

Abstract
In this paper, we propose an image compression method combining a wavelet packet and a zerotree coding scheme. The wavelet packet is employed to reduce the correlation of the image pixels. To encode the coefficients, we apply the zerotree structure, which can reduce the correlation across the decomposed component images. In this case, the decomposition criterion for each decomposition level is computed by a simplified local zerotree structure. If a target bit rate or distortion is given, we can obtain the most suitable decompositions with less computation. In the simulation, the proposed method shows a coding efficiency better than JPEG by about 3 dB and better than the octave decomposition by about 1 dB in PSNR at 0.4 bpp with the standard image "Barbara."
I Introduction
Image compression is essential for applications such as transmission and storage in databases. The wavelet transform, which makes possible a multiresolution analysis, is employed in order to suppress the blocking effects which appear in image coding schemes using a block transform such as the DCT. Ordinarily, the decomposition has been applied only to the lower frequency component image recursively. However, several studies suggest that the decomposition of the high frequency components is effective for images containing many sharp edges. On the other hand, it causes a loss of coding efficiency for some images. Therefore, an adaptive decomposition of the high frequency bands is required. The wavelet packet [1][2] is one such adaptive method, introduced as a generalized wavelet decomposition. In this paper, we show that the suitable decompositions are determined by the coding gain, simply calculated by means of the variances of the decomposed frequency components and the coefficients of the wavelet basis. To encode the adaptive wavelet packet coefficients, we apply the zerotree structure [3], which can reduce the correlation across the decomposed frequency components by exploiting the self-similarity in the image. In this case, when a lower bit rate is desired, it is not necessary to decompose a component image whose coefficients are all smaller than the smallest threshold of the zerotree structure. We accordingly derive a decomposition criterion corresponding to the desired coding rate or distortion. In this method, the total data amount and the reconstruction distortion for each component image are computed by a simplified local zerotree structure. If a target bit rate or a target distortion is given, we can obtain suitable decompositions which require less computation in the reconstruction process.
II Adaptive Wavelet Packet
In this section, we consider the frequency decomposition which gives the higher coding efficiency. For example, the short time Fourier transform (STFT) gives an unchanged tiling over the whole space and frequency domain. The wavelet octave decomposition provides a fine signal representation localized in both the space and frequency domains; however, it causes a loss of coding efficiency for images containing many sharp edges. This can be suppressed by the wavelet packet algorithm introduced in [1][2] as a generalized wavelet decomposition. The wavelet packet decomposes an image adaptively. An example of a two-dimensional frequency decomposition is shown in Fig. 1. To obtain the coding efficiency of the frequency decomposition, we introduce the coding gain function. The coding gain G is defined as

G = 10 \log_{10} \prod_{k=0}^{K-1} \left( \frac{1}{A_k B_k} \right)^{\alpha_k} = \sum_{k=0}^{K-1} G_k, \qquad G_k = -10\,\alpha_k \log_{10}(A_k B_k),

where A_k is the ratio of the variance of the k-th decomposed component image to that of the original image, B_k is the ratio of the variance of the quantization error for each component image to that of the reconstruction error, \alpha_k is the ratio of the number of pixels in each component image to that in the original image, and K is the number of decomposed component images. The decomposition criterion on each decomposition level is whether the higher coding gain is obtained or not. This coding gain criterion can be applied to biorthogonal filters, which have a linear phase property.
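The coding gain above can be computed directly from the per-band ratios. The sketch below is illustrative only; the toy values of A_k, B_k and alpha_k are not from the paper, and the reconstructed form of the formula is assumed.

```python
import math

def coding_gain(A, B, alpha):
    """Coding gain sketch: G = sum_k G_k with
    G_k = -10 * alpha_k * log10(A_k * B_k)  (assumed reconstruction)."""
    Gk = [-10.0 * a * math.log10(Ak * Bk) for Ak, Bk, a in zip(A, B, alpha)]
    return sum(Gk), Gk

# Toy numbers (hypothetical): four equal-size subbands, most of the
# energy concentrated in the first band, no extra quantization penalty.
G, Gk = coding_gain(A=[0.7, 0.2, 0.06, 0.04],
                    B=[1.0, 1.0, 1.0, 1.0],
                    alpha=[0.25] * 4)
```

A decomposition step is accepted when it raises G; a band whose A_k B_k product exceeds its size share contributes a negative G_k.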
Fig. 1: 2-D frequency decomposition.
III Zerotree Coding of Wavelet Packet Coefficients
In this section, we consider the entropy coding of the wavelet packet coefficients. The wavelet packet coefficients, decomposed at arbitrarily fine space and frequency resolutions, have less correlation between neighboring pixels; however, correlation across the component images remains. To remove this redundancy, we apply the zerotree structure [3] to the entropy coding. Every wavelet coefficient at a given scale can be related to a set of coefficients at the next finer scale of similar orientation. The coefficient at the coarse scale is called the parent, and all coefficients corresponding to the same spatial location at the next finer scale of similar orientation are called children. The parent-children relation is shown in Fig. 2 (a). All parents have four children except for the lowest frequency band, where each parent has three children. In the adaptive wavelet packet case, this parent-children relationship is shown in Fig. 2 (b). Given a threshold T, a coefficient x is called a zerotree root if it and all of its descendants are smaller than T, in which case they are called insignificant. The significance map can then be represented as a string of symbols from a 4-symbol alphabet. The four symbols used are zerotree root (ZTR), isolated zero (IZ), which means that the coefficient is insignificant but has some significant descendants, positive significant (POS) and negative significant (NEG). Each time a coefficient is encoded as significant, its magnitude is appended to the subordinate list. This quantization using the threshold T is called successive approximation quantization (SAQ). The SAQ sequentially applies a sequence of thresholds T_0, ..., T_{N-1} to determine significance, where the thresholds are chosen so that T_i = T_{i-1}/2. The initial threshold T_0 is chosen so that |x_j| < 2T_0 for all coefficients x_j.
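The threshold selection and the 4-symbol significance classification described above can be sketched as follows. This is a simplified, hypothetical sketch: it ignores the scanning order and the subordinate (refinement) pass of the full zerotree coder.

```python
def initial_threshold(coeffs):
    """Pick T0 as the largest power of two with |x| < 2*T0 for all x
    (one common convention; assumed here)."""
    m = max(abs(c) for c in coeffs)
    T0 = 1
    while 2 * T0 <= m:
        T0 *= 2
    return T0

def classify(x, descendants, T):
    """4-symbol significance map of the zerotree scheme (sketch)."""
    if abs(x) >= T:
        return "POS" if x >= 0 else "NEG"
    # insignificant coefficient: ZTR if every descendant is also
    # insignificant, otherwise an isolated zero (IZ)
    if all(abs(d) < T for d in descendants):
        return "ZTR"
    return "IZ"
```

Successive passes would then halve the threshold (T_i = T_{i-1}/2) and re-classify the coefficients not yet found significant.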
Fig.2: Parent-child dependencies.
IV Adaptive Decomposition with Local Zerotree Structure
To obtain a new decomposition criterion adequate for the zerotree entropy coding, we introduce the simplified local zerotree structure. When the coefficients which have the same parent are all insignificant, it is not necessary to generate their symbols, since their ancestor becomes a ZTR symbol. The local zerotree symbols are as follows.
POS  if |x| >= T_i and x >= 0,
NEG  if |x| >= T_i and x < 0,
ZERO if |x| < T_i and, among the coefficients which have the same parent, there is at least one significant coefficient,

where x is the value of the input coefficient and T_i is a given threshold. To generate these local zerotree symbols, the SAQ is applied. The flow chart for encoding a coefficient of the local zerotree is shown in Fig. 3. The data amount of the symbols R is simply calculated as

R = \sum_{i=0}^{N-1} \big\{ -N_{POS}(T_i)\log_2 P_{POS}(T_i) - N_{NEG}(T_i)\log_2 P_{NEG}(T_i) - N_{ZERO}(T_i)\log_2 P_{ZERO}(T_i) + \text{additional data of nonzero values} \big\},

where N(T_i) and P(T_i) denote the number of each symbol, indicated by its subscript, and its probability. The first three terms in the sum represent the data amount of the significance map and the fourth term represents the data amount of the nonzero values. Then, the cost of the symbols can be expressed as

Cost = \log D + \lambda R,

where D is the distortion of the reconstructed image with regard to the threshold T_i and \lambda is the proportional multiplier. The decomposition criterion is whether the lower cost is obtained by the decomposition or not.
Fig. 3: Flow chart for local zerotree coding.
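The entropy part of the rate estimate R can be sketched as below. This is illustrative only: it measures the significance-map bits from the empirical symbol probabilities at each threshold pass and omits the additional magnitude data of the nonzero values.

```python
import math
from collections import Counter

def symbol_rate(symbol_stream_per_threshold):
    """Entropy estimate of the significance map across SAQ passes:
    bits = sum over passes i and symbols s of -N_s(T_i)*log2 P_s(T_i).
    (Sketch: the magnitude data term of R is left out.)"""
    bits = 0.0
    for symbols in symbol_stream_per_threshold:   # one list per T_i
        counts = Counter(symbols)
        total = len(symbols)
        for n in counts.values():
            p = n / total
            bits += -n * math.log2(p)
    return bits

# Toy single pass (hypothetical data): mostly ZERO symbols, so the
# significance map compresses well.
R = symbol_rate([["ZERO"] * 6 + ["POS", "NEG"]])
```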
V Simulation Results
The proposed coding scheme was applied to the standard black and white 8 bpp test image "Barbara" (512x512 pixels). Two-dimensional separable length-9 quadrature mirror filters (QMF) [4] were used. The performance of the proposed method was compared with the wavelet octave method and with JPEG. First, we decide the depth of the octave decomposition adapted to the characteristics of the image. The depth criterion for the desired coding rate or distortion is considered at each depth of the octave decomposition. The relation between the rate and the distortion at each depth is shown in Fig. 4. It is clear that the depth-4 octave decomposition is enough to encode the image "Barbara." Second, the higher frequency components are decomposed adaptively and we obtain the zerotree symbols. Finally, all strings of symbols are entropy coded using an adaptive arithmetic coder. Fig. 5 shows the frequency tilings of the proposed method. It can be seen that the high frequency bands are decomposed adaptively and the decomposition characteristics change with the target bit rate or distortion. In these simulations, the depth of the zerotree structure is fixed at 6, and a parent pixel and its children pixels are related in the same way as in the depth-6 octave decomposition. The coding efficiency was evaluated in terms of PSNR versus the average bpp, as shown in Fig. 6. As can be seen, the proposed methods perform better than JPEG by about 3 dB in PSNR. They also have a coding efficiency better than the wavelet octave case by about 1 dB. Fig. 7 shows parts of the reconstructed images. "Barbara" was encoded using the proposed method at 0.21 bpp with 27.65 dB in PSNR. The wavelet octave method was then applied to "Barbara" at 0.21 bpp, and the resulting PSNR was 26.30 dB. The reconstructed image of the proposed method has a better PSNR at a lower bit rate than that of the wavelet octave method. The high frequency stripes of "Barbara" are improved in the local zerotree case.
Obviously, there are noticeable blocking artifacts in the JPEG version.
Fig. 4: Rate-Distortion curves for each depth of the octave decomposition.
Fig. 6: Simulation results for various image coding schemes.
Fig. 5: Frequency tiling of "Barbara."
Fig. 7: Parts of the reconstructed images.
VI Conclusion
An image coding scheme combining the adaptive wavelet packet with the zerotree structure was described. The wavelet packet localizes most of the energy, and the zerotree coding reduces the correlation among the decomposed component images which appears as self-similarity in the image. If a target bit rate or a target distortion is given, the proposed method achieves a better coding efficiency with less computation in the reconstruction process.
References
[1] R. Coifman, Y. Meyer, S. Quake and V. Wickerhauser, "Signal processing and compression with wave packets," Numerical Algorithms Research Group, New Haven, 1990.
[2] K. Ramchandran and M. Vetterli, "Best Wavelet Packet Bases in a Rate-Distortion Sense," IEEE Trans. Image Processing, vol. 2, no. 2, pp. 160-175, April 1993.
[3] J. M. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, Dec. 1993.
[4] E. H. Adelson and E. Simoncelli, "Orthogonal pyramid transforms for image coding," SPIE Conf. Visual Commun., Image Process., vol. 845, pp. 50-58, 1987.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
EFFICIENCY OF THE IMAGE MORPHOLOGICAL PYRAMID DECOMPOSITION
Dragana Sandic, Institute for Telecommunications and Electronics IRITEL, Beograd, Yugoslavia
Dragorad Milovanovic, Branimir Reljin, Faculty of Electrical Engineering, Beograd, Yugoslavia

Abstract - In order to measure the objective efficiency of image decomposition by the morphological pyramid, coding and entropy gains are computed. Coding gain is a measure of the energy compaction of the image in the frequency domain, which is a basic condition for efficient compression. Also, a decomposition will be considered more effective if the entropy gain, as a measure of entropy reduction, is higher while maintaining the same fidelity measure in the compressed data. To allow comparison with linear filters, coding and entropy gains for linear wavelet filters are also shown. Simulation results show that the energy compaction measured by the coding gain is somewhat lower using morphological filters, while the bit rate reduction measured by the entropy gain is higher. The choice of the optimal decomposition method also depends on the optimal coding of the subbands and on subjective image quality.
1. INTRODUCTION
The main goal of image compression is to reduce the number of bits for image storage and transmission while preserving subjective image quality. Image data compression methods take advantage of the intrinsic features of images, as well as of their relation to the final human observer, to eliminate redundancy. Digital compression algorithms include a discrete image transformation, quantization and entropy coding. The key to efficient data compression is the image decomposition. Multiresolution image decomposition schemes typically apply linear filters to generate a sequence of subimages with progressively decreasing resolution. Then the subband images can be ranked and processed independently. When linear filters are not efficient enough, the alternative is nonlinear filters. Among the nonlinear filter classes, the class of morphological filters is becoming increasingly popular. The complexity of morphological filter design and implementation is low. This paper describes a subband image decomposition method for monochrome images using morphological filters. In the first part, mathematical morphology operations are reviewed. The pyramidal image decomposition method is described in the second part. The third part contains details about the decomposition efficiency measures. Simulation results are presented in the fourth part and the last part is a summary.
2. MORPHOLOGICAL FILTERS
The foundation of morphology is set theory. Mathematical morphology represents signals as sets, and a morphological operation consists of a set transformation which transforms one set into another [1]. Mathematical morphology examines the geometrical structure of an image by probing its microstructure with certain elementary forms, so-called structuring elements (SE). Both binary and gray-level images can be processed effectively by morphological operations. Let the image X(x) be represented as a function of the coordinates x and let B be a structuring element.
The analytical definitions of the four basic operations (dilation, erosion, opening and closing) [2] are, respectively:

(X \oplus B)(x) = \max_{b \in B} [X(x-b) + B(b)]    (1)
(X \ominus B)(x) = \min_{b \in B} [X(x+b) - B(b)]    (2)
X \circ B = (X \ominus B) \oplus B    (3)
X \bullet B = (X \oplus B) \ominus B    (4)
3. MORPHOLOGICAL PYRAMID IMAGE DECOMPOSITION
The pyramidal image structure has been commonly used for image coding, computer vision applications and progressive image transmission. The pyramid approach is attractive due to its low computational complexity and simple parallel implementation. The advantages of morphological filters are their ability to preserve geometric structure, their direct geometric interpretation, and their simplicity and efficiency in hardware implementation [1,2,3]. An efficient pyramidal image decomposition method based on morphological filters is analyzed. The original image is decomposed into four subband images. Each subband is downsampled, so that the sum of the pixels in the subbands equals the number of pixels in the original image. In a tree-structured decomposition each low-pass subband is decomposed further, whereas the high-pass subbands are not processed any further. If the sequence of images is viewed as a multilevel data structure with the original image at the bottom level and the lowest-resolution image at the top level, the structure resembles a pyramid [4,5].
The pyramidal image structure provides a means for processing images at multiple resolutions. The pyramid representation of an image consists of a sequence of images of decreasing spatial resolution derived from the original image [3]. The subband decomposition is performed by a morphological filter bank [4,5]. The analysis and synthesis filter banks consist of the one-dimensional filters, respectively:

low-pass:  H_0(X) = closing[opening(X)],  F_0(Y) = dilation(Y),
high-pass: H_1(X) = X - closing[opening(X)]    (5),    F_1(Y) = Y - dilation(Y)    (6)

Filter banks for 2-D analysis and synthesis in a four-subband splitting and reconstruction are designed as a separable product of the above 1-D horizontal and vertical morphological filters (Fig. 1). The 2-D morphological analysis and synthesis filters are, respectively:

H_{00} = H_0^v[H_0^h(X)],  H_{01} = H_0^v[H_1^h(X)],  H_{10} = H_1^v[H_0^h(X)],  H_{11} = H_1^v[H_1^h(X)]    (7)
F_{00} = F_0^v[F_0^h(Y)],  F_{01} = F_0^v[F_1^h(Y)],  F_{10} = F_1^v[F_0^h(Y)],  F_{11} = F_1^v[F_1^h(Y)]    (8)

where H_i^h, F_i^h and H_i^v, F_i^v for i = 0,1 are the 1-D horizontal and vertical low- and high-pass filters, respectively.

4. DECOMPOSITION EFFICIENCY
In order to compare the efficiency of morphological and linear filtering, the quantitative measures coding gain and entropy gain are analyzed [6,7]. A decomposition will be considered more effective if the coding and entropy gains are higher while maintaining the same compression ratio and fidelity measure (MSE) in the compressed data, respectively.

4.1 Coding gain
Coding gain (CG) is the factor by which the MSE reconstruction error is decreased by applying separate coding to the subband images instead of fullband coding, at equal bit rates. For M subbands, a fixed total number of bits and optimal bit allocation, assuming equal probability density functions of the subbands and the original image, CG is:

CG = \frac{\frac{1}{M}\sum_{k=1}^{M}\sigma_k^2}{\prod_{k=1}^{M}(\sigma_k^2)^{r_k}}, \qquad \sigma_k^2 = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij}-\bar{x}_k)^2    (9)

where r_k is the ratio of the number of samples in the k-th subband to the number of fullband image samples, \sigma_k^2 is the k-th subband variance, \bar{x}_k is the mean of all pixels in the k-th subband, and mn is the total number of pixels in that subband. The maximal value of CG is obtained when the energy distribution among the subbands differs the most. A good filter has a high coding gain at each stage [6]. The compression efficiency increases with a higher decomposition stage, but the growth of CG decreases with the stage order, and after a certain stage it is not worth decomposing any further.

4.2 Entropy gain
Entropy gain is a measure of the bit rate reduction after image decomposition and compression. The main theoretical result is that the zero-order entropy of the subband images is less than the zero-order entropy of the fullband image. Therefore, a decomposition will be considered more effective for a given image if the zero-order entropy is lower while maintaining the same MSE measure in the coded image [7]. The entropy gain EG is the difference between the entropy of the fullband image and the mean entropy of the subband images:

EG = H_{orig} - H_J,
H_J = \frac{1}{M}\sum_{k=1}^{M} H_k, \qquad H_k = \sum_i (-p_i)\log_2(p_i), \qquad p_i = \frac{n_i}{N}    (10)

where H_{orig} is the fullband image entropy; H_J is the mean entropy of the subbands; H_k is the entropy of the k-th subband; p_i is the probability density of gray level i, n_i is the number of pixels with value i, and N is the total number of pixels.
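The two efficiency measures can be sketched as follows. This is an illustrative reconstruction of eqs. (9) and (10): the unweighted arithmetic mean in the CG numerator and in the mean subband entropy are assumptions, and the toy inputs are hypothetical.

```python
import numpy as np

def coding_gain(subbands):
    """CG sketch of eq. (9): arithmetic mean of subband variances over
    their size-weighted geometric mean (assumed reconstruction)."""
    total = sum(s.size for s in subbands)
    var = np.array([s.var() for s in subbands])          # sigma_k^2
    r = np.array([s.size / total for s in subbands])     # sample ratios r_k
    return var.mean() / np.prod(var ** r)

def entropy(img, levels=256):
    """Zero-order entropy: H = sum_i -p_i * log2(p_i) over gray levels."""
    p = np.bincount(img.ravel(), minlength=levels) / img.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_gain(fullband, subbands):
    """EG sketch of eq. (10): fullband entropy minus mean subband entropy."""
    return entropy(fullband) - np.mean([entropy(s) for s in subbands])

# Toy example: a two-level image split into two constant subbands,
# each of which has zero entropy.
img = np.array([[0, 1], [0, 1]], dtype=np.int64)
subs = [np.zeros((1, 2), dtype=np.int64), np.ones((1, 2), dtype=np.int64)]
EG = entropy_gain(img, subs)
```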
5. SIMULATION RESULTS
For comparison, the test image is processed both with morphological and with linear filters. The monochrome test image "Lena" (256x256 pixels with 256 levels) is processed. The simulation is performed in Matlab for Windows. At the first decomposition stage the image is decomposed by morphological filters into four subbands according to equations (7) and the system diagram in Fig. 2. The decomposed subimages using a structuring element of size 3 are shown in Fig. 3(1). The lowest subband analysis image X00(m,n) contains the most information and gives the general brightness of the picture. The horizontal and vertical subband analysis images X10(m,n) and X01(m,n) have successfully extracted the vertical and horizontal sharp edges, respectively, and the diagonal subband analysis image X11(m,n) contains the diagonal edges of the image.
Fig. 1. Separable four-subband decomposition.
Fig.2 System diagram of four-band splitting using morphological analysis filter bank.
Fig. 3. Scaled decomposed subimages by (1) morphology and (2) wavelet DWT9-7 filters: (a) X00(m,n), (b) X10(m,n), (c) X01(m,n), (d) X11(m,n).
Fig. 4. (a) Coding gain CG and (b) entropy gain EG in five decomposition stages with M = 4, 7, 10, 13, 16 subbands and three sizes of structuring element SE = 3, 5, 7.
Fig. 5. (a) Coding gain CG and (b) entropy gain EG obtained with linear filters DWT 9-3, DWT 9-7, and DWT 5-7 for M = 4, 7, 16 subbands.
Among the available linear filters, wavelet filters are analysed. The Discrete Wavelet Transform is calculated using a hierarchical octave filter bank (Fig. 1) with regular 1-dimensional FIR (Finite Impulse Response) filters [10]. In our simulation biorthogonal filters are used (9 and 5 taps in the analysis stage; 3 and 7 taps in the synthesis stage) [7]. The subimages decomposed by the wavelet filters DWT9-7 are shown in Fig. 3(2). For five morphological pyramidal decomposition stages of the test image and three sizes of structuring elements (SE), the quantitative measures coding gain CG and entropy gain EG are computed according to eqs. (9) and (10) and shown in Fig. 4. The coding and entropy gains for the linear wavelet filters DWT 9-3, DWT 9-7, and DWT 5-7 are shown in Fig. 5.
6. CONCLUSIONS
The computer simulation shows that the decomposed subband images extract vertical, horizontal and diagonal edges more successfully using morphological than linear wavelet filters, but the energy compaction measured by the coding gain is somewhat lower using morphological filters. However, the entropy gain EG is somewhat higher using morphological filters, which indicates the possibility of a higher bit rate reduction. The highest values of CG and EG are obtained with a structuring element of size SE = 3. The decomposition efficiency increases with the decomposition stage. However, the number of decomposition levels is bounded since an image has finite dimensions. Increasing the number of subbands beyond sixteen does not cause a further significant increase in the decomposition efficiency. The simulation results give a comparison of the considered morphological and linear filters. Relatively high coding and entropy gains are necessary, but not sufficient, for good image coding performance. The choice of the decomposition/compression method also depends on the optimal coding of the subbands and on subjective image quality, which is a subject for further research.
REFERENCES
[1] P. Maragos, R. Schafer, "Morphological filters - Part I, II", IEEE Trans. on ASSP, Vol. 35, No. 8, August 1987, pp. 1153-1184.
[2] R.M. Haralick, S.R. Sternberg, X. Zhuang, "Image analysis using mathematical morphology", IEEE Trans. PAMI, Vol. 9, No. 4, 1987.
[3] A. Toet, "A morphological pyramidal image decomposition", Pattern Recognition Letters, No. 9, 1989, pp. 255-261.
[4] S.C. Pei, F.C. Chen, "Subband decomposition of monochrome and color images by mathematical morphology", Optical Engineering, Vol. 30, No. 7, July 1991, pp. 921-933.
[5] Z. Fazekas, K. Fazekas, "Morphological filters for image processing," Journal on Communications, Vol. 45, May-June 1994.
[6] O. Egger, W. Li, M. Kunt, "High Compression Image Coding Using an Adaptive Morphological Subband Decomposition", Proceedings of the IEEE, Vol. 83, No. 2, February 1995, pp. 272-287.
[7] D. Milovanovic, Z. Bojkovic, A. Samcovic, "On objective performance measures in image compression by subband coding", Proceedings of IEEE Workshop NSIP 1995, pp. 202-205.
[8] L. Overturf, M. Comer, E. Delp, "Color Image Coding Using Morphological Pyramid Decomposition," IEEE Trans. on Image Processing, Vol. 40, No. 2, 1995.
[9] R.J. Chen, B.C. Chieu, "Three-dimensional morphological pyramid and its application to color image sequence coding", Signal Processing, 44, 1995, pp. 163-180.
[10] J.P. Andrew, P.O. Ogunbona, F.J. Paolini, "Coding gain and spatial localization properties of discrete wavelet transform filters for image coding", IEE Proc. Vision, Image and Signal Processing, Vol. 142, No. 3, pp. 133-140, June 1995.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
OPTIMAL VECTOR PYRAMIDAL DECOMPOSITIONS FOR THE CODING OF MULTICHANNEL IMAGES *
Dimitrios Tzovaras and Michael G. Strintzis
Information Processing Laboratory, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki 540 06, Greece
Tel.: +3031.996359, Fax: +3031.996398, e-mail: tzovaras@dion.ee.auth.gr

Abstract
In the present paper we determine two families of analysis and synthesis vector filters which achieve an optimal construction of multiresolution vector sequences by minimizing the variance of the error signals between successive pyramid levels. A measure of the entropy reduction achieved by the pyramid is in this way maximized. The effect of this is to ensure that the lower-resolution image produced by the primary subband bears maximum resemblance to the input image. Furthermore, it is assumed that additive transmission noise corrupts the downsampled signal prior to the synthesis stage. It is seen that under noiseless or lossless transmission conditions, the two above families of optimal analysis and synthesis filters coincide. The results are evaluated experimentally for the vector coding of color images.
I Introduction
Subband analysis/synthesis techniques have been extensively studied for image and video coding applications [1]. According to the subband coding technique the image is decomposed by a filter bank into several sub-images in terms of different frequency bands and these sub-images are coded instead of the original image. Pyramidal image coding has also been studied [2, 3] and optimal construction of the pyramid sequence was sought by minimizing the variance of the error image for each level of the pyramid. In this way, a measure of the entropy reduction achieved by the pyramid is maximized. If the pyramid is to be used for the scalable or progressive coding of the sequence, this construction also ensures the production of a same-size copy of the signal or image which at a lower resolution bears as much resemblance to the original as possible. In a typical scalable coding application this copy may be transmitted via a slower communication channel, while the original is perfectly reconstructed from the entire pyramid. Along with scalar processing, vector processing has attracted particular interest in the signal and image processing community recently [4]. Vector transform coding techniques have recently been used for image coding applications [5] to remove the inter-vector correlation. In the present paper the results of [2, 3, 6] are first generalized and the problem of the optimal design of a vector pyramidal coding scheme is addressed. Furthermore, the results are generalized to the case where transmission noise corrupts the downsampled signal prior to the synthesis stage. In the examined scheme the analysis or the synthesis part is considered fixed (i.e. the analysis or the synthesis filters are fixed) and the statistics of the quantization part involved for transform coefficient coding are considered known. 
The problem then is to determine the optimal synthesis (analysis) vector filters that minimize the distortion due to the quantization of the pyramid vector transform coefficients. Thus, specific knowledge about the power spectra of the original signal and the quantization noise can be incorporated to optimally design the vector filter bank so as to minimize the quantization distortion.
II. Optimal Vector Pyramidal and Subband Decompositions
A multiresolution data representation consists of a sequence of linear transformations of the data with successively reduced resolution. If the vector sequence x[m] represents the original data, the construction of the multiresolution sequence begins with the computation of the predicted value u[m] of each x[m] as a local weighted average:

u[m] = \sum_{i=-N}^{N} h[i]\, x[m-i] = (h * x)[m],   (1)
where the asterisk * denotes convolution. Interpolation is then used to revert to the original image size:

v[m] = w[m] \sum_{i} h[i]\, x[m-i] = w[m]\, u[m],   (2)
*This work was supported in part by the ACTS PANORAMA project 092 and the Greek Secretariat for Research and Technology projects NIKA and IHIS.
where w[m] = (1 + (-1)^m)/2.
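As a concrete illustration, the following is a minimal NumPy sketch of one analysis level of the pyramid in the scalar-filter special case (a scalar h[i] applied per channel; the filter taps and the signal shape are illustrative placeholders, not the paper's vector filters):

```python
import numpy as np

def pyramid_level(x, h):
    """One pyramid analysis level in the scalar-filter special case.

    x : (M, C) array, each row x[m] a C-channel sample vector
    h : (2N+1,) symmetric scalar filter standing in for h[i] in eq. (1)
    Returns u (prediction), v (re-expanded signal) and the
    reduced-resolution sequence x'[m] = u[2m].
    """
    M, C = x.shape
    # u[m] = sum_i h[i] x[m-i]  -- eq. (1), applied per channel
    u = np.stack([np.convolve(x[:, c], h, mode="same") for c in range(C)],
                 axis=1)
    # w[m] = (1 + (-1)^m)/2 retains only even-indexed samples -- eq. (2)
    w = ((1 + (-1) ** np.arange(M)) // 2)[:, None]
    v = w * u
    return u, v, u[::2]
```

Repeating the last step on x'[m] = u[2m] produces the next, coarser pyramid level.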
If the pyramid is used in signal or image transmission, the downsampled signal is subject to corruption by noise.

Figure 1: The hierarchical multichannel image coding scheme.

If additive noise n[m] is assumed,

y[m] = \sum_{i=-N}^{N} g[i]\, (v[m-i] + n[m-i]).   (3)

The error image is

e[m] = x[m] - y[m],   (4)

and the total error variance is

E = \mathbf{E}\{e^T[m]\, e[m]\}.   (5)
The process is repeated for the reduction in resolution of the sequence x'[m] = u[2m]. An optimal construction of the pyramid vector sequence may be sought by minimizing, for each level of the pyramid, the variance (5) of the error image. In this way, a measure of the entropy reduction achieved by the pyramid is maximized. The vector sequence u[m] is not wide-sense stationary; however, the time averages

R_{ux}[p] = \lim_{K \to \infty} \frac{1}{2K+1} \sum_{m=-K}^{K} u[m]\, x^T[m-p], \quad
R_{v}[p] = \lim_{K \to \infty} \frac{1}{2K+1} \sum_{m=-K}^{K} v[m]\, v^T[m-p]   (6)
are seen to exist, under nonrestrictive conditions on x[.]. It can be seen that

\Phi_{vx}(z) = \frac{1}{2} H(z)\, \Phi_x(z),   (7)

\Phi_{v}(z) = \frac{1}{2} \left( H(z)\Phi_x(z)H^T(z^{-1}) + H(-z)\Phi_x(-z)H^T(-z^{-1}) \right) = \frac{1}{2} P(z).   (8)
Likewise, the output y of the interpolating filter is seen as before to possess cross- and autocorrelation functions defined as in (6). Their Z-transforms are related by

\Phi_{yx}(z) = G(z)\, \Phi_{sx}(z), \quad \Phi_{y}(z) = G(z)\, \Phi_{s}(z)\, G^T(z^{-1}).   (9)
From (5), the error variance E = \mathrm{tr}[\mathbf{E}\{e[m]\, e^T[m]\}] is found from

2\pi j\, E = \mathrm{tr}\left[ \oint \Phi_e(z)\, z^{-1}\, dz \right],   (10)
where tr[F] is the trace of the matrix F and \Phi_e(z) is the power spectrum of the error e[m]. Clearly \Phi_e(z) = \Phi_x(z) - \Phi_{xy}(z) - \Phi_{yx}(z) + \Phi_y(z), and hence from (9)-(10):

2\pi j\, E = \mathrm{tr}\left[ \oint \left( \Phi_x(z) - 2\, G(z)\, \Phi_{sx}(z) \right) z^{-1} dz \right] + \mathrm{tr}\left[ \oint G(z)\, \Phi_s(z)\, G^T(z^{-1})\, z^{-1} dz \right].   (11)

Thus, the design of the pyramidal decomposition scheme should aim at the minimization of the error variance (11).
III. Optimal FIR and IIR Vector Filters
With arbitrary given h[i], the optimum FIR filter g[i] in (3) will minimize the error variance (5) if the well-known orthogonality condition holds:

\mathbf{E}\left\{ \left( x[m] - \sum_{i=-N}^{N} g[i]\, s[m-i] \right) s^T[m-l] \right\} = 0,

for l = -N, ..., N. This implies

R_{xs}[l] = \sum_{i=-N}^{N} g[i]\, R_s[l-i], \quad l = -N, \ldots, N.
This system may be separated into two sets of equations for the identification, respectively, of the even- and odd-indexed coefficient matrices g[i]:

R_{xs}[2 l_1] = \sum_{i_1} g[2 i_1]\, R_s[2 l_1 - 2 i_1], \quad
R_{xs}[2 l_2 + 1] = \sum_{i_2} g[2 i_2 + 1]\, R_s[2 l_2 - 2 i_2],

which fully define g[i], i = -N, ..., N. The optimal IIR filters are found by direct minimization of (11). We shall consider the noiseless case first.
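In the scalar special case, the normal equations above reduce to a Toeplitz linear system that can be solved directly. A small sketch, where the correlation functions are illustrative placeholders (an AR(1)-like autocorrelation, not statistics from the paper):

```python
import numpy as np

def optimal_fir(Rxs, Rs, N):
    """Solve R_xs[l] = sum_{i=-N}^{N} g[i] R_s[l-i], l = -N..N,
    in the scalar special case of the matrix normal equations.

    Rxs, Rs : callables returning the correlations at an integer lag
    Returns g[i] as an array indexed by i = -N..N.
    """
    lags = np.arange(-N, N + 1)
    A = np.array([[Rs(l - i) for i in lags] for l in lags])
    b = np.array([Rxs(l) for l in lags])
    return np.linalg.solve(A, b)

# Self-consistency check: synthesize R_xs from a known filter g_true,
# then recover g_true from the normal equations.
Rs = lambda l: 0.5 ** abs(l)
g_true = np.array([0.1, 0.3, 0.5, 0.3, 0.1])   # i = -2..2
Rxs = lambda l: sum(g_true[i + 2] * Rs(l - i) for i in range(-2, 3))
g = optimal_fir(Rxs, Rs, 2)
```

The even/odd separation above would let the same solve be performed on two half-size systems.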
III.1 Noiseless Case

In this case s = v, and hence

\Phi_{sx}(z) = \Phi_{vx}(z) = \frac{1}{2} H(z)\, \Phi_x(z), \quad
\Phi_{s}(z) = \Phi_{v}(z) = \frac{1}{2} P(z),   (12)
with P(z) given by (8). The error variance is found to be

2\pi j\, E = \mathrm{tr}\left[ \oint \left( \Phi_x(z) - H(z)\Phi_x(z)G(z) \right) z^{-1} dz \right] + \mathrm{tr}\left[ \oint H(z)\, \Phi_x(z)\, H^T(z^{-1})\, Q(z)\, z^{-1} dz \right],   (13)

where

Q(z) = \frac{1}{2} \left( G^T(z^{-1})\, G(z) + G^T(-z^{-1})\, G(-z) \right).   (14)
The optimal pyramidal decomposition is obtained by minimizing the error-variance expression (13). Assuming first that the analysis filter H(z) is fixed and given, the optimum corresponding synthesis filter minimizing (13) is

G(z) = \Phi_x(z)\, H^T(z^{-1})\, P^{-1}(z).   (15)

Conversely, if the synthesis filter G(z) is fixed, the optimum analysis filter is

H(z) = Q^{-1}(z)\, G^T(z^{-1}),   (16)

where Q(z) is given by (14). The globally optimum filter pair (H(z), G(z)) is found by either (15) or (16) and the minimization of the resulting expression in (13). The minima found either way are easily seen to be identical.
III.2 Noisy Case

With considerable insight into the structure of the optimal pyramids and filter banks gained from the noiseless case, the noisy case may now be considered. Again, if the analysis filter is fixed, the optimum synthesis filter in the pyramidal configuration is found by minimizing (11). It can be shown that it is given by

G(z) = \Phi_{sx}^T(z^{-1})\, \Phi_s^{-1}(z).   (17)
This is a completely general expression for the optimal synthesis filter, which can be further analyzed under additional simplifying assumptions. For example, the additive noise may be assumed to be uncorrelated with the input:

\Phi_{vn}(z) = 0.   (18)

This assumption is reasonable in the instance of transmission noise and is justified for a large class of practical quantizers, including fine and dithered quantizers [1]. In this case

\Phi_{sx}(z) = \Phi_{vx}(z) = \frac{1}{2} H(z)\, \Phi_x(z), \quad
\Phi_{s}(z) = \Phi_{v}(z) + \Phi_{n}(z) = \frac{1}{2} P(z) + \Phi_{n}(z),   (19)

where \Phi_n(z) is the noise power spectral density. Note also that

\Phi_n(z) = \Phi_n(-z).   (20)

From (17), the optimal G(z) is

G(z) = \Phi_x^T(z^{-1})\, H^T(z^{-1})\, \left[ P(z) + 2\, \Phi_n(z) \right]^{-1}.   (21)
IV. Experimental Results
The proposed vector pyramidal coding method was tested for the coding of multichannel images. The results are evaluated in the coding of the color RGB image "Peppers" of size 256 x 256. In all the cases examined, uniform quantization of the transform coefficients was applied. For the definition of the two families of analysis/synthesis filters, the conversion between the standard Red-Green-Blue (RGB) format and the YUV format,

x_{rgb} = A\, x_{yuv}, \quad x_{yuv} = B\, x_{rgb},

was used, where A and B are the conversion matrices. For the first family of filters, a good choice for the synthesis vector filter is G(z) = A\, \Lambda(z), where \Lambda(z) = \mathrm{diag}[\Lambda_1(z), \Lambda_2(z), \Lambda_3(z)] and the \Lambda_i(z), i = 1, ..., 3, are FIR low-pass filters. In this case the analysis filter is given by

H(z) = \left( \Lambda^T(z^{-1}) A^T A\, \Lambda(z) + \Lambda^T(-z^{-1}) A^T A\, \Lambda(-z) \right)^{-1} \Lambda^T(z^{-1})\, A^T.

In Figure 2 the proposed technique is compared, in terms of PSNR versus bit rate, with scalar pyramidal coding using the same type of filters. As seen, vector coding considerably improves the reconstruction quality, compared to scalar coding, at the same bit rate. The RGB-to-YUV conversion may also be used for the definition of the second family of analysis/synthesis filters. In this case a good choice for the analysis filters is H(z) = B\, \Lambda(z)\, \Phi_x(z), while the synthesis filters are given by (15), under the assumption that the input signal is modeled by a multichannel AR model. Simulations were also performed using the second family of vector filters, and the results were comparable with those obtained with the first choice of filters.
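The paper does not give numeric values for the conversion matrices; assuming the common ITU-R BT.601 YUV definition, the pair (A, B) can be sketched as:

```python
import numpy as np

# x_yuv = B x_rgb with ITU-R BT.601 coefficients (an assumption: the
# paper leaves A and B unspecified); A is simply the inverse of B.
B = np.array([[ 0.299,  0.587,  0.114],
              [-0.147, -0.289,  0.436],
              [ 0.615, -0.515, -0.100]])
A = np.linalg.inv(B)

# For a white pixel, Y = 1 while the chrominance components vanish.
white_yuv = B @ np.array([1.0, 1.0, 1.0])
```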
Figure 2: Bit rate versus PSNR performance of the vector pyramidal coding technique (VP), compared to the scalar pyramidal coding (SP) of each vector component.
V. Conclusions
In the present paper two families of analysis and synthesis vector filters were determined which achieve optimal construction of multiresolution vector sequences by minimizing the variance of the error signals between successive pyramid levels. Experimental results were given demonstrating the superiority of vector pyramidal coding when compared to scalar pyramidal coding.
References

[1] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice-Hall, 1993.
[2] M. G. Strintzis, "Optimal Filters for the Generation of Multiresolution Sequences," Signal Processing, vol. 39, no. 2, pp. 55-68, June 1994.
[3] M. G. Strintzis, "Optimal Biorthogonal Wavelet Bases for Signal Representation," IEEE Trans. Signal Processing, vol. 44, no. 6, pp. 1406-1418, June 1996.
[4] W. Li and Y.-Q. Zhang, "A Study of Vector Transform Coding of Subband-Decomposed Images," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, pp. 383-391, Aug. 1994.
[5] J. Wus and W. Li, "Vector Subband Coding of High Resolution Images," in Picture Coding Symposium (PCS'94), Sacramento, pp. 123-125, Sep. 1994.
[6] S. N. Efstratiadis, D. Tzovaras and M. G. Strintzis, "Hierarchical Partition Priority Wavelet Image Compression," IEEE Trans. Image Processing, vol. 5, no. 7, pp. 1111-1124, July 1996.
[7] X.-G. Xia and B. W. Suter, "Vector-Valued Wavelets and Vector Filter Banks," IEEE Trans. on Signal Processing, vol. 44, no. 3, Mar. 1996.
Session J: SEGMENTATION
MULTILINGUAL CHARACTER SEGMENTATION USING MATCHING RATE
Kyung-Ae Moon†, Su-young Chi†, Jong-Won Park‡ and Weon-Geun Oh†
†Systems Engineering Research Institute, Taejon, KOREA
‡Department of Information Communications Engineering, ChungNam National University, Taejon, KOREA
ABSTRACT
Character segmentation, which affects the performance of an optical character recognition (OCR) system, is very difficult, especially when one character splits into two or three components or several characters touch each other. Some methods have been proposed to solve these problems. However, heuristically driven character pitch information is not sufficient for solving the splitting and touching problems that occur in documents with various character sizes and styles. A multistage graph search algorithm using dynamic programming can improve the segmentation results, but it needs combinatorially increasing computing time. This paper describes a character segmentation method using the matching rate between an input character and two finally selected candidate characters in documents consisting of alphanumeric, symbol, Korean and Chinese characters. The method determines the exact cutting and merging points character by character, and consequently needs little computing time. The experimental results prove that the proposed method is efficient and accurate enough to enhance the performance of a document recognition system.
1. INTRODUCTION
Extraction of characters from text lines is the final step in the document segmentation process and is one of the keys to putting OCR technology to practical use. However, the existence of touching and separated characters, which occur frequently in oriental compositions, makes it difficult to design an effective segmentation algorithm because of the ambiguity in determining suitable cutting or merging positions. To solve this character segmentation problem, many studies have been carried out in universities and industry [1-11]. Akiyama [8] and Hung [9] use character pitch and size information (rectangular areas of pixels) to merge separated components. This information is very effective for Chinese characters, which are almost rectangular in shape, but it does not extend to multilingual compositions mixing alphanumeric, Korean and Chinese characters, because of their diversity in shapes. Most alphabetic characters consist of a single connected component, but with varying pitch, and many Korean characters are very similar to alphabetic letters in pitch. So heuristically driven character pitch information does not solve the split and touch problems occurring in documents with two languages. Ariyoshi [10] and Fujisawa [11] introduce a multistage graph search algorithm using dynamic programming (a multiple hypothesis and verification method) to solve these problems. It is remarkable that they use character pattern information for determining the cutting points and for character recognition, and can improve the segmentation results, but this method needs combinatorially increasing computing time. In this paper, we present a new approach to character segmentation based on the concept of matching rate. The matching rate expresses the similarity between two character features, an input feature and a reference feature, and is
calculated simply from mesh patterns; as a matter of course, a correctly segmented character has a very high matching rate, and vice versa. We apply two kinds of matching rate in our OCR system: one for reducing candidates and the other for finding a suitable border line between two character patterns. The proposed approach has been tested on a large number of documents and proves efficient and accurate enough to enhance the performance of a document recognition system. The paper is organized as follows. In Section 2, we explain the matching-rate concept in more detail, and in Section 3 the character segmentation algorithm is described. Finally, experimental results, evaluations and conclusions are discussed in Sections 4 and 5, respectively.
2. MATCHING RATE
In this section, mesh information of a character is used to calculate the matching rate of characters. Mesh information is very simple, but it is widely used as a feature even for script recognition because of the flexibility in its size and data form. After an extracted character image is normalized to a 48x48 image size, the mesh feature vector, of size 16x16, is generated from this normalized image. Then, we compute two matching rates: M_r is the similarity between the input and a reference character, and M_x is the similarity between the input character and the exclusive-OR (⊕) data acquired from the two finally selected candidate characters:

M_r = d(V_i, V_r)
M_x = d(V_i, (V_cand1 ⊕ V_cand2) × W)

where
V_i : the feature vector of the input pattern
V_r : the reference feature vectors of the final candidate patterns
W : a weight heuristically assigned based on the mesh condition

M_r is used for reducing or rejecting candidates, and M_x is used for finding the exact cutting and merging point. As M_x is acquired only from the difference between the two candidates, we can overcome overlapping of the characters.
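A sketch of the mesh feature and the two matching rates follows. The paper does not specify the similarity function d or the weight W; here d is taken as the fraction of agreeing mesh cells and W defaults to uniform, both assumptions:

```python
import numpy as np

def mesh_feature(img48, thresh=0.5):
    """16x16 binary mesh from a 48x48 normalized character image:
    each mesh cell is 1 if its 3x3 pixel block is mostly ink."""
    blocks = img48.reshape(16, 3, 16, 3).mean(axis=(1, 3))
    return (blocks > thresh).astype(np.uint8)

def matching_rate(v_in, v_ref, w=None):
    """d(V_i, V_ref): weighted fraction of agreeing mesh cells
    (the exact d is not given in the paper; this is an assumption)."""
    if w is None:
        w = np.ones_like(v_in, dtype=float)
    return float(((v_in == v_ref) * w).sum() / w.sum())

def m_x(v_in, v_cand1, v_cand2, w=None):
    """M_x: match the input against the exclusive-OR of the two final
    candidates, so only the cells where the candidates differ count."""
    return matching_rate(v_in, np.bitwise_xor(v_cand1, v_cand2), w)
```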
3. CHARACTER SEGMENTATION
Character segmentation is a critical part because incorrectly segmented characters are not likely to be correctly recognized. Kahan et al. [3] suggested that a document recognition system is required to read texts accurately with at least a 99.9% recognition rate in practical applications. However, it is not simple to recognize ill-formed and ill-spaced printed characters. In particular, segmentation of separated and touching characters in multilingual documents including Korean, Chinese and alphanumeric characters is a very difficult problem. The proposed method follows these steps for efficient character segmentation.
STEP 1. The First Simple Character Segmentation

First, text blocks are extracted from a document image through a document analysis process, and text lines are extracted from the text block image. After that, individual characters, including separated and touching characters, are extracted from the text lines using vertical projection. Fig. 1(b) shows the individual characters, containing separated characters and a touching character, obtained as the result of this step. In order to decide whether individual characters are separated or touching characters, the character rectangle's pitch information, such as character height, width, gap and interval, is calculated.
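The vertical-projection pass of STEP 1 can be sketched as follows (a minimal version operating on a binary text-line image; the box representation is an illustrative choice):

```python
import numpy as np

def vertical_projection_boxes(line_img):
    """Split a binary text-line image (1 = ink) into character boxes
    at columns whose vertical projection is zero -- the simple
    first-pass segmentation of STEP 1."""
    ink = line_img.sum(axis=0) > 0
    boxes, start = [], None
    for x, on in enumerate(ink):
        if on and start is None:
            start = x                  # a run of ink columns begins
        elif not on and start is not None:
            boxes.append((start, x))   # run ends: one character box
            start = None
    if start is not None:
        boxes.append((start, len(ink)))
    return boxes
```

Boxes produced this way are then classified as separated or touching using the pitch information described above.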
STEP 2. Merging Separated Characters

Some Korean and Chinese characters are composed of two or more character rectangles. Among the 990 frequently used Korean characters, 8% are composed of two character rectangles; among 5401 Chinese characters, 13% are composed of two or more character rectangles. Because of this feature of Korean and Chinese characters, an additional merging process is required. However, in multilingual document processing it is ambiguous to distinguish one or more alphanumeric characters from a single Korean or Chinese character using only the character rectangle's pitch information: for example, the two alphanumeric characters 'or' versus one Korean character, and the three alphanumeric characters '111' versus the one Chinese character '川' in Fig. 2. In this paper, we calculate the matching rate for separated characters and then decide whether two or more separated characters should be merged according to the matching rate.
Fig. 1. Example of character segmentation. (a) Original image. (b) Simple character segmentation using vertical projection. (c) Results after merging the separated characters. (d) Results after segmenting the touching character. (e) Candidate splitting points (p0, p1) for the touching character.
Fig. 2. Example of ambiguous characters to be merged. (a) Incorrectly merged character 'or'. (b) Numeric '111'. (c) Chinese character '川'.
STEP 3. Segmentation of Touching Characters

In general document images, image degradation generates touching characters because of printer quality and scanner resolution. For splitting a touching character, several candidate splitting points are determined from the vertical-direction histogram, as in Fig. 1(e), and the appropriate splitting points among them are selected according to the matching rate of the character rectangles produced by each candidate splitting point.

4. EXPERIMENTAL RESULTS
We tested the proposed character segmentation method on more than one hundred documents including various fonts and sizes. The hardware environment for the experiments was a PC-486 and a Microtek image scanner with 300 DPI resolution. Table 1 shows the character segmentation rate at each step. In STEP 1, using only the character's pitch information without recognition results, the character segmentation error is about 10% for each document. After STEP 2 and STEP 3, the character segmentation error is reduced to less than 1%, excepting ambiguous cases that cannot be solved without contextual knowledge. Hence, the final character segmentation rate is more than 99% on our sample documents.
                        Text A    Text B    Text C
Result after STEP 1      88%       90%       89%
Result after STEP 2     99.8%      98%       96%
Result after STEP 3     100%      99.6%     99.4%

Table 1. Character segmentation rate at each step.
These experimental results prove that the method is efficient for printed Korean, Chinese and alphanumeric characters with touching and separated characters.

5. CONCLUSIONS
We have described a method of character segmentation which uses two matching rates. The matching rates are simply calculated between input and reference characters, and between an input character and the exclusive-OR data acquired from the two finally selected candidate characters. After extracting the rectangles of pixels from a text line using vertical projection, the merging or splitting of incorrectly segmented characters is decided based on these matching rates. Consequently, the proposed method needs little computing time and is free from character overlapping, because one of the matching rates is acquired only from the difference between the two candidate character patterns.
REFERENCES

[1] S. Liang, M. Shridhar and M. Ahmadi, "Segmentation of Touching Characters in Printed Document Recognition," Pattern Recognition, Vol. 27, No. 6, pp. 825-840, June 1994.
[2] Y. Lu, "On the Segmentation of Touching Characters," Proc. 2nd International Conference on Document Analysis and Recognition, Tsukuba, Japan, pp. 440-443, Oct. 1993.
[3] S. Kahan, T. Pavlidis and H. S. Baird, "On the Recognition of Printed Characters of any Font and Size," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 2, pp. 274-288, March 1987.
[4] T. Bayer and U. Kresel, "Cut Classification for Segmentation," Proc. 2nd International Conference on Document Analysis and Recognition, Tsukuba, Japan, pp. 565-568, Oct. 1993.
[5] S. W. Lam and S. N. Srihari, "Multi-Domain Document Layout Understanding," 1st International Conference on Document Analysis and Recognition, Saint Malo, France, pp. 112-120, 1991.
[6] K. Y. Wong, R. G. Casey and F. M. Wahl, "Document Analysis System," IBM J. Res. Develop., Vol. 26, No. 6, pp. 647-656, 1982.
[7] D. Wang and S. N. Srihari, "Classification of newspaper image blocks using texture analysis," Computer Vision, Graphics, and Image Processing, Vol. 47, pp. 327-352, 1989.
[8] T. Akiyama, S. Saito and I. Masuda, "A Method of Character Extraction from Printed Documents Guided by Positions of Non-Overlapping Characters," Institute of Electronics, Information and Communications Engineers, Vol. J61-D, No. 10, pp. 1194-1201, 1984.
[9] Yea-Shuan Hung and Wen-Wen Lin, "Field Segmentation and Character Isolation Method In Free-Format Chinese Printed Document," International Conference on Computer Processing of Chinese and Oriental Languages, pp. 151-155, Aug. 1988.
[10] Shunji Ariyoshi, "A Character Segmentation Method for Japanese Printed Documents Coping with Touching Character Problems," Proc. 11th ICPR, pp. 313-316, 1992.
[11] Hiromichi Fujisawa, Yasuaki Nakano and Kiyomichi Kurino, "Segmentation Methods for Character Recognition: from Segmentation to Document Structure Analysis," Proceedings of the IEEE, Vol.80, No.7, pp. 1079-1092, July 1992.
Architecture of an Object-based Tracking System Using Colour Segmentation

Rafael Garcia-Campos, Joan Batlle, and Rainer Bischoff*
Computer Vision and Robotics Group, Dep. of Electronics, Computer Science and Automation, University of Girona, Av. Lluis Santaló, s/n, 17003 Girona (Spain)
Tel. +34.72.418400, Fax +34.72.418399, email: {rafa, jbatlle}@ei.udg.es
*Institute of Measurement Science, Federal Armed Forces University Munich (Germany), email: [email protected]
Abstract This paper presents a robust specialised architecture for real time tracking, using colour segmentation. Most of the existing object-tracking algorithms extract the features which constitute the object (usually called tokens), and track them from one frame to another. We propose a new architecture to solve the tracking problem at object level. The first step is the extraction of the object from the scene. The proposed architecture uses the discriminatory properties of three colour attributes: Hue, Saturation and Intensity, in order to segment the object. The second step consists of the computation of the centroid of the object by using the enhanced image from the segmentation module. As a last step, tracking of this centroid is achieved in a sequence of images. The processor presented here can provide the sequence of centroids at video rate.
Introduction

Tracking moving objects over time is a complex problem in computer vision and has been an important research subject over the last few years [1], [2], [3]. Impressive tracking systems have been developed for some specific applications [4], [5]. We assume, initially, that we want to track just one object in the scene, although other moving objects may be present. Some of the methods presented in the literature have a serious matching problem in this case, and often mistake one object for another. In such a situation, when the moving objects have colour properties different from those of the object to be tracked, our system does not suffer from the influence of the other objects. Kinematics-based algorithms may not work properly when the objects abruptly change their motion from one frame to the next [6], [7]; such situations can be due to collisions or sharp turns. Furthermore, these algorithms detect motion in a sequence of frames by extracting tokens such as edges, corners, interest points, etc. [8], [9]. Methods extracting two-dimensional models mistake objects when there is a change in their 2-dimensional shape [10], [11]. This paper presents a new system which can cope with the problems explained above for some special applications, because we are not tracking tokens but the object itself (or, more accurately, the centroid of the object). The basic condition imposed by our method is that the colour features of the object be maintained throughout a sequence of images. In the first part of the paper, we describe the general architecture of our tracking processor: the different modules of the system and their interconnections are shown, and then a detailed description of each module is provided. Finally, some results and experiments are presented.
General Overview

The simple architecture proposed takes advantage of the robustness of combining the Saturation and Hue properties of the object. Furthermore, the dynamic range of the camera is increased by performing real-time re-scaling of the RGB channels as shown in [12]. First, a colour camera takes an image, and its sampled RGB signal feeds the Colour Conversion module. RGB is a widely known representation of colour, but it is not well suited to colour vision applications. Our system transforms the RGB signal into the Hue, Saturation and Intensity (HSI) model as shown in [13]. The Colour Conversion module performs the HSI conversion in real time, as stated in equations (1), (2) and (3), by means of three Look-Up Tables (LUTs): one for Hue, one for Saturation, and one for Intensity.
H = \cos^{-1}\left[ \frac{\frac{1}{2}\left[(R-G) + (R-B)\right]}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \right]   (1)

S = 1 - \frac{3\, \min(R, G, B)}{R+G+B}   (2)

I = \frac{R+G+B}{3}   (3)
Every component of RGB is sampled into 8 bits. The 5 most significant bits are taken from each component, generating a 15-bit bus. This bus simultaneously addresses the three 32Kb LUTs, which are programmed through the system bus using (1), (2) and (3) by means of specific software. As the LUTs are programmed dynamically, it is possible to reconfigure them during execution, loading, for example, a different colour model as shown in [14]. The block diagram of the system is shown in Figure 1.
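The LUT construction can be sketched in software as follows. The exact 15-bit address layout is not stated in the paper, so the bit packing below (R in the top 5 bits) is an assumption consistent with the 32K table size:

```python
import numpy as np

def build_hsi_luts():
    """32K-entry LUTs for H, S and I, indexed by a 15-bit address
    formed from the 5 MSBs of R, G and B (eqs. (1)-(3))."""
    lv = np.arange(32) / 31.0
    R, G, B = np.meshgrid(lv, lv, lv, indexing="ij")
    eps = 1e-9
    I = (R + G + B) / 3.0                                    # eq. (3)
    S = 1.0 - 3.0 * np.minimum(np.minimum(R, G), B) / (R + G + B + eps)  # eq. (2)
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps
    H = np.arccos(np.clip(num / den, -1.0, 1.0))             # eq. (1)
    H = np.where(B > G, 2.0 * np.pi - H, H)  # resolve arccos ambiguity
    return H.ravel(), S.ravel(), I.ravel()

def lut_address(r8, g8, b8):
    """15-bit LUT address from the 5 MSBs of each 8-bit channel
    (bit layout assumed: R in the top 5 bits)."""
    return ((r8 >> 3) << 10) | ((g8 >> 3) << 5) | (b8 >> 3)
```

A pixel's H, S and I values are then three table lookups, which is what makes the conversion feasible at video rate in hardware.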
Figure 1: Block Diagram of the System

Once the HSI conversion has been performed, the image can be thresholded by choosing the preferred minimum and maximum values for Hue, Saturation and Intensity, depending on the colour characteristics of the object and the illumination conditions of the scene. For the same comparator (see Figure 2) we can define multiple intervals, allowing an object to be represented by a certain number of colour attributes. This operation is performed in the Colour Segmentation module, which is inside the FPGA. The upper and lower limits of Hue, Saturation and Intensity are set by specific software running on the PC, and these values are multiplexed in the FPGA in order to save I/O pins. For every pixel sampled in the A/D converter, we obtain its Hue component. The time for this computation is equal to the access time of the Hue LUT (approx. 50 ns). This value is passed to the FPGA, where it is compared to the minimum and maximum levels allowed for the Hue, and the comparator generates a signal indicating whether or not the value lies within the limits. The same idea applies to Saturation and Intensity. All the circuitry in the FPGA has been developed using parameterised VHDL descriptions; in this way, future improvements of the system, such as increasing the number of objects to track, can be easily made.

We define the region R as the set of pixels which are part of the object in the segmented image. This image is used to compute the centroid of the object to be tracked. The computation is performed in two independent phases, which can be pipelined: a first stage where the data from the thresholded image is acquired and accumulated (Pre-Detection module), and a second one which actually computes the centroid of the object. The pre-detection consists of accumulating the x-coordinates of all the pixels in the image which are part of the region R.
The same idea is applied to the y-coordinate, as well as counting the number of pixels in this region. So x_c(R) and y_c(R) are given by (4) and (5), that is

x_c(R) = \frac{\sum_{(x,y) \in R} x}{A(R)},   (4)

y_c(R) = \frac{\sum_{(x,y) \in R} y}{A(R)},   (5)

where A(R) is the area of the region R (the number of pixels in this region). Robustness in the computation of x_c(R) and y_c(R) is achieved because (4) and (5) actually act as noise filters. This computation is carried out by the PC microprocessor, because the hardware implementation of a divider requires too much space in the FPGA. Once the coordinates of the centroid of region R have been found, the centroid can be expressed as

c(R) = (x_c(R), y_c(R)).   (6)
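Equations (4)-(6) amount to the following, sketched here on a boolean mask of the segmented image:

```python
import numpy as np

def centroid(mask):
    """c(R) = (x_c(R), y_c(R)) of the segmented region R (eqs. 4-6).
    mask : 2-D boolean array, True where the pixel belongs to R."""
    ys, xs = np.nonzero(mask)
    area = xs.size          # A(R), the number of pixels in R
    if area == 0:
        return None         # no object pixels found in this frame
    return (xs.sum() / area, ys.sum() / area)
```

Averaging over the whole region is what gives the noise-filtering behaviour noted above: isolated misclassified pixels shift the centroid only slightly.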
Figure 2 shows the block diagram of the internal structure of the FPGA.
Figure 2: Internal Structure of the FPGA

So, the processor is organised as a two-stage pipeline. The first stage is dedicated to the acquisition of data from the scene. While the data for frame k (\sum_{(x,y) \in R} x, \sum_{(x,y) \in R} y, A(R)) is being acquired and accumulated, the second stage is processing the previous data from frame k-1. In this way, new data may be acquired while the previous data is being processed, which allows the processor to compute c(R) at video rate. Because of the pipeline, the time available to compute the centroid of the object in one scene is a whole frame period (20 ms in the CCIR standard); the centroid of frame k is obtained 20 ms later, so the tracking is performed with a delay of one frame. This compares favourably with systems using a frame-grabber, which normally have to wait for the acquisition of the frame into memory before starting to process it: our processor has already pre-processed the image while normal frame-grabbers are still storing it. Eventually, a sequence of centroids is provided by the microprocessor in real time in order to track the object in a 2-dimensional approach, and the marked object is displayed on a monitor.
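The two-stage pipeline can be mimicked in software as a generator that accumulates frame k while emitting the centroid of frame k-1 (a behavioural sketch only, not the FPGA implementation):

```python
import numpy as np

def tracking_pipeline(frames):
    """Yield one centroid per frame, delayed by one frame, mirroring
    the two-stage pipeline: stage 1 accumulates (sum x, sum y, A(R))
    for frame k while stage 2 divides the accumulators of frame k-1
    (the division is done on the PC in the paper's design)."""
    pending = None
    for mask in frames:
        if pending is not None:
            sx, sy, a = pending
            yield (sx / a, sy / a) if a else None
        ys, xs = np.nonzero(mask)
        pending = (xs.sum(), ys.sum(), xs.size)
    if pending is not None:          # flush the final frame
        sx, sy, a = pending
        yield (sx / a, sy / a) if a else None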
Experiments and Results

Two experiments have been carried out. In the first test we used a video sequence of motorbikes on a circuit where visually similar colours with different characteristics were present. We could keep track of them although the perspective and form of the bikes and pilots changed. In the second experiment we tracked a painted mobile robot. The robot was painted in two colours, one for the front part and another for the back; the centroid of each colour was computed, yielding the direction vector of the vehicle. The only condition imposed in our experiments is a difference in colour properties between the object to be tracked and the other objects and the background, respectively. Although this condition may look too restrictive, experience has shown that very few objects have the same Hue and the same Saturation in the tested images. Tests with the Intensity property showed that it is too susceptible to lighting conditions, and only in very few cases can it help in the segmentation process.
Conclusions We have briefly presented a simple but robust architecture which allows the 2-dimensional tracking of an object even when there are other moving objects in the same scene. The system is robust enough to keep track of both rigid and non-rigid objects even if there are changes in their perspective. Since we do not use any heuristics, we are not affected by abrupt changes in the motion and/or occlusions, as kinematics-based systems are. Our approach just computes the centroid of the object, while other approaches, like token-based tracking, characterise every token (feature) by its position and orientation [15]. The orientation is not provided by our system but, on the other hand, we need less computation to determine the position. Moreover, tracking is achieved without the need of a memory for storing and scanning the image, saving time and costs. Further Work This system can easily be extended due to its scalable architecture. It allows tracking of more than one moving object in the scene by adding more comparators and accumulators to the FPGA. The two-stage pipeline leaves enough time for the computation of multiple centroids. The accuracy of the colour segmentation module will be improved by increasing the number of bits for the HSI conversion from 5 to 7/8 per channel. A second prototype is being developed where the divisions of equations (4) and (5) are performed in the FPGA, instead of passing the values to be divided over the system bus to the CPU. The use of parameterised VHDL descriptions helps rapid testing and prototyping. Future research work will extend to multiple camera configurations in order to extract depth information. Our real-time tracking processor could then be used in a wide range of applications, such as object grasping by assembly robots, visual feedback for automated navigation, etc.
References [1] Lowe, D.G., "Robust model-based motion tracking through the integration of search and estimation," International Journal of Computer Vision, vol. 8:2, pp. 113-122, 1992. [2] Coombs, D., and Brown, C., "Real-time smooth pursuit tracking for a moving binocular robot," Proc. IEEE, pp. 23-28, 1992. [3] Huttenlocher, D.P., Noh, J.J., and Rucklidge, W.J., "Tracking non-rigid objects in complex scenes," Proc. IEEE, pp. 93-101, 1993. [4] Dickmanns, E.D., Graefe, V., "Applications of dynamic monocular machine vision," Machine Vision and Applications, vol. 1, pp. 241-261, 1988. [5] Frau, J., Casas, S., Balcells, Ll., "A dedicated pipeline processor for target tracking applications," Proc. IEEE International Conference on Robotics and Automation, pp. 599-604, 1992. [6] Shariat, H., Price, K.E., "Motion estimation with more than two frames," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12:5, pp. 417-432, 1990. [7] Horn, B.K.P., Schunck, B., "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981. [8] Zhuang, X., Huang, T.S., Ahuja, N., Haralick, R.M., "A simplified linear optic flow-motion algorithm," Computer Vision, Graphics and Image Processing, vol. 42, pp. 334-344, 1988. [9] Davis, L.S., Wu, Z., and Sun, H., "Contour-based motion estimation," Computer Vision, Graphics and Image Processing, pp. 313-326, 1982. [10] Bergevin, R., and Levine, M.D., "Extraction of line drawing features for object recognition," Pattern Recognition, vol. 25:3, pp. 319-334, 1992. [11] Cédras, C., and Shah, M., "Motion-based recognition: a survey," Image and Vision Computing, vol. 13:2, pp. 129-155, 1995. [12] Regincós, J., and Batlle, J., "A system to reduce the effect of CCDs saturation," Proc. International Conference on Image Processing, 1996 (to appear). [13] Gonzalez, R.C., and Woods, R.E., Digital Image Processing, Addison-Wesley Publishing Company, 1992.
[14] Pujas, Ph., and Aldon, M.J., "Robust colour image segmentation," Proc. International Conference on Advanced Robotics, pp. 145-155, 1995. [15] Deriche, R., and Faugeras, O., "Tracking line segments," Proc. European Conference on Computer Vision, pp. 259-268, 1990.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Segmentation of Retinal Images Guided by the Wavelet Transform Dr. T. Morris, Mrs. Z. Newell Department of Computation, UMIST, P.O. Box 88, Manchester, M60 1QD, U.K. Abstract Glaucoma is one of the major causes of preventable blindness in the world. It induces damage to the optic nerve head (that region of the retina where nerve fibres and blood vessels pass through the eye) via increased pressure in the ocular fluid. It is presently detected either by regular inspection of the retina, by measurement of the intra-ocular pressure (IOP) or by a loss of vision. It has been observed that nerve damage precedes the latter two events and that direct observation of the nerve head could therefore be the best method of detecting glaucoma, if the observations could be made reliably. This paper describes our work in isolating the optic nerve head in images of the retina: we describe previous attempts that have been made using simple image processing techniques and the current multiresolution approaches we are taking, and we present a sample of our initial results. Once the nerve head has been located, its shape will be quantified using measurements that have already been shown to be effective.
Introduction. The neuroretinal rim forms the outer boundary of the optic nerve head: that region of the retina where blood vessels and nerve fibres pass out of the eye. It is normally a circular structure, but is known to change shape due to nerve damage in glaucoma. It has been suggested that the nerve damage occurs before the intra-ocular pressure (IOP) increases. Since measuring the IOP is the primary screening test for glaucoma, damage to the eye will have occurred by the time the disease is diagnosed. The progress of glaucoma and its treatment is assessed by further measurement of IOP and by changes in the shape of the neuroretinal rim. At present, the shape of the rim is assessed manually, either subjectively by direct inspection or by tracing it from photographs of the optic disk. Both of these methods have been shown to be unreliable [1]. In this project we are concerned with automatically locating the neuroretinal rim using a multiresolution algorithm based on the wavelet transform. In doing this we shall eventually provide ophthalmologists with an additional tool for diagnosing and assessing the treatment of glaucoma. A second strand in this work is the investigation of multiresolution segmentation techniques. What may be termed classical image segmentation algorithms take a single-resolution viewpoint: the data is captured and segmented at the same, highest possible, resolution. Whilst this is the resolution at which results are required, the approach results in unfocused processing, since effort is expended in examining regions that are clearly either object or background (and could be identified as such by other techniques) rather than concentrating on the more problematical boundary regions. Pyramid and hierarchical algorithms achieve focused processing by firstly examining a low resolution version of the data and coarsely segmenting the image.
The segmentation is progressively refined by increasing the resolution at which the data is examined until the original resolution is regained. We view the wavelet transform in this light: a wavelet-transformed image will contain representations of the image at varying resolutions. This information may be used to initiate a segmentation and progressively refine the boundaries between object and background (cf. [2]). The algorithm is more efficient than the traditional technique as regions that may be clearly labelled are
identified early and excluded from further processing, which concentrates on the boundary regions.
Background. Automatically identifying the neuroretinal rim is not an easy task. The images are taken under low light conditions and are therefore noisy. (The data used in the present investigation was collected by attaching a video camera to the eyepiece of a Zeiss fundus camera. The retina was illuminated with a photographic flash and the video image grabbed. Cox and Wood [3] describe the instrumentation in more detail.) The structure itself is indistinct and partially obscured by blood vessels. In fact, if a simple edge detector is applied to these images, it is the blood vessel boundaries that give the strongest response; the required features respond very weakly, if at all. Figure 1 shows a normal retina; portions of the required boundaries are clearly visible. A significant amount of work has been reported on attempts at automatically detecting the neuroretinal rim, though none has met with significant success. The earliest was by an ophthalmologic equipment manufacturer who simply thresholded an image of the optic disk. The thresholded region was approximated by an ellipse and thus characterised by the ellipse's properties. Cox and Wood [3] presented a semi-automated method: an observer indicated extremal points on the boundary which were automatically connected by tracing along the boundary. They showed how important it was for the same observer to perform all of the measurements, since the inter-observer variability was similar to the difference between the normal and abnormal classes [1]. Morris and Wood [4] initially presented a completely automatic method which traced between points on the boundary identified automatically by their grey level gradient properties. They have latterly returned to semi-automated methods, which are proving to be more reliable [5]. Lee and Brady [6], and Donnison and Morris [7] have both investigated using active contour (snake) methods to locate the boundary.
Both sets of authors have highlighted the importance of pre-processing the images to emphasise the difference between the retinal and optic disk regions of the image before searching for the boundary. Donnison and Morris appear to have had more success in their implementation, probably due to their formulation of the active contour. Multiresolution methods have long been seen as an attractive method of segmenting images. From a theoretical viewpoint, they mimic the human visual system; pragmatically, they allow us to make early decisions as to the approximate locations of image features and focus attention on just those areas for further processing. A number of approaches have been followed. Image pyramids [e.g. 8] are generated by progressively reducing the size of an image from the base (the original full resolution image) to the top (a single pixel whose value is the average grey value of the image). They seem to have been used most often in edge detection applications. Scale spaces [e.g. 9] may be generated by applying a feature detector at varying resolutions to the original image; a volume is generated: two axes coincide with the spatial axes of the original image, the third represents scale. As the scale is varied, the size of the detected feature changes. In a typical application, Marr's edge detector could be used [10]; the scale parameter would equate to the σ of this operator. The common theme in all of these approaches to segmentation is that information extracted at a lower resolution is used to guide the extraction of information at the next level of the hierarchy. In this project we are using the wavelet transform as a means of generating the hierarchical representation of the image data and thus to guide its segmentation.
Materials and Methods. The wavelet transform generates an image which can be divided into four quadrants. Three represent horizontally oriented, vertically oriented and corner features at some scale.
The fourth quadrant repeats this basic structure at a smaller scale. It is our intention to use this structure to guide the interpretation of the original image. Suppressing the coefficients corresponding to small scale image features effectively enhances the gross image features we are seeking, but it does not allow their boundaries to be accurately delineated. Figure 2 shows a retinal image which has been filtered using the Daubechies wavelet: the fourth stage coefficients have been set to zero and the data inverse transformed. It is apparent that gross features of the image remain and small scale features (the blood vessels) have been removed. We have thus enhanced (portions of) the boundary between the optic nerve head and the retina and may therefore suggest the approximate location of this boundary. More importantly, we may suggest regions of the image that are definitely nerve head and regions that are definitely retina. The suggested boundary may be refined by considering the information that is regained by reinstating coefficients from lower stages of the transform. Ultimately we would use the information contained in the image at its original resolution.
Figure 1. Original Fundus Camera Image.
Figure 2. Wavelet Filtered Image.
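The quadrant structure and the coefficient-suppression idea described above can be illustrated with a one-level 2-D Haar transform. The paper uses a Daubechies wavelet; Haar is chosen here only to keep the sketch self-contained in numpy, and all names are ours:

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar-style transform on an even-sized image:
    returns the four quadrants (approximation LL and details LH, HL, HH)."""
    a = (img[0::2] + img[1::2]) / 2.0            # row averages
    d = (img[0::2] - img[1::2]) / 2.0            # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    out = np.empty((2 * a.shape[0], a.shape[1]))
    out[0::2], out[1::2] = a + d, a - d
    return out

# Zeroing the finest-scale detail quadrants before inverting suppresses
# small features (e.g. blood vessels) while keeping the gross structure:
def suppress_fine_detail(img):
    ll, lh, hl, hh = haar2d(img)
    return ihaar2d(ll, np.zeros_like(lh), np.zeros_like(hl), np.zeros_like(hh))
```

Cascading `haar2d` on the LL quadrant reproduces the repeated quadrant structure at progressively smaller scales.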
Having derived an algorithm for locating the neuroretinal rim, its shape will be characterised using measures known to be clinically relevant. Essentially these are measures of the vertical eccentricity of the structure, since it is known that nerve damage occurs most readily in these areas, which coincide with the blood vessels. We shall perform a retrospective study on data collected in previous work. This data consists of approximately 500 images collected from normal and abnormal classes. These will be analysed blind by one of the authors and the results validated by the other. If the results are satisfactory, we shall then perform a prospective study in conjunction with a local eye hospital. If these trials are successful we shall have developed a tool which could be used in the diagnosis of glaucoma, and certainly in assessing its treatment.
Conclusions.
In this paper we have discussed the importance of quantifying the shape of the optic nerve head and described previous attempts at doing this. We have outlined the use of the wavelet transform in guiding the analysis of images of the retina, specifically to segment the optic disk from the retinal background. We have also described how we intend to validate the algorithm using clinical data. Finally, we have shown how we have progressed towards our goal of deriving a useful tool in the diagnosis and treatment of glaucoma.
References.
1. M.J. Cox, I.C.J. Wood. "Inter- And Intra-Image Variability In Computer-Assisted Optic Nerve Head Assessment." Ophthal. Physiol. Opt., vol. 11, 1991, pp. 36-43.
2. R. Machiraju, A. Gaddipati, R. Yagel. "Wavelet Based Feature Driven Identification And Enhancement Of Medical Images." Technical Report OSU-CISRC-2/96-TR09, Ohio State University.
3. M.J. Cox, I.C.J. Wood. "Computer-Assisted Optic Nerve Head Assessment." Ophthal. Physiol. Opt., vol. 11, 1991, pp. 27-35.
4. D.T. Morris, M.J. Cox, I.C.J. Wood. "Automated Extraction Of The Optic Nerve Head Rim." American Association of Optometrists Annual Conference, Boston, Dec. 1993.
5. T. Morris, I. Wood. "The Automatic Extraction Of The Optic Nerve Head." American Academy of Optometrists, biennial European meeting, Amsterdam, May 1994.
6. S. Lee, J.M. Brady. "Integrating Stereo And Photometric Stereo To Monitor The Development Of Glaucoma." Proceedings of the British Machine Vision Conference, 1990, pp. 193-198.
7. C. Donnison, T. Morris. "Identifying The Neuroretinal Rim Boundary Using Dynamic Contours." Submitted to Image and Vision Computing.
8. A. Rosenfeld. "Pyramids: Multiresolution Image Analysis." Proc. Third Scandinavian Conference on Image Analysis, Copenhagen, July 1983, pp. 23-28.
9. T. Lindeberg. "Scale-Space Theory: A Basic Tool For Analysing Structures At Different Scales." J. of Applied Statistics, 21(2), 1994, pp. 224-270.
10. D. Marr, E. Hildreth. "Theory Of Edge Detection." Proc. Roy. Soc., vol. B207, 1980, pp. 187-217.
An Adaptive Fuzzy Clustering Algorithm for Image Segmentation
Yannis A. Tolias, Ph.D. Student, and Stavros M. Panas, Assoc. Prof.
Dept. of Electrical and Computer Engineering, Telecommunications Division, Aristotle University of Thessaloniki, Thessaloniki, GR-54006 Greece. e-mail: {tolias,panas}@psyche.ee.auth.gr Abstract-In this paper we present a novel adaptive fuzzy clustering scheme for image segmentation. In the proposed method, the non-stationary nature of images is taken into account by modifying the prototype vectors as functions of the sample location in the image, and the inherent high inter-pixel correlation is modelled using neighbourhood information. A multi-resolution model is utilised for estimating the spatially varying prototype vectors for different window sizes. The fuzzy segmentations at different resolutions are combined using a data fusion process in order to compute the final fuzzy partition matrix. The results provide segmentations having lower fuzzy entropy when compared to the Possibilistic C-Means algorithm, while maintaining the image's main characteristics. In addition, due to the neighbourhood model, the effects of noise in the form of single-pixel regions are minimised. 1. Introduction Many algorithms have been presented in the literature dealing with the fuzzy segmentation of images. Some of them rely on finding the best threshold, given an image of some special nature, by utilising fuzzy entropy and index of fuzziness [1],[2]; some use the Fuzzy C-Means clustering algorithm [3],[4]; and others try to re-formulate various random field models using the fuzzy approach [5],[6]. All these clustering schemes except the last one [6] do not adapt to the local characteristics of the available data set. Some early results of adapting the clustering schemes to specific data properties, e.g. the high correlation of image pixels belonging to the same neighbourhood, have been presented by the authors in their previous work [9]. The basic problem of clustering is to separate a set of objects O = {o1, o2, ..., ok} into C self-similar groups (clusters) according to a similarity criterion and using the available data X = {x1, x2, ..., xn}. A vector β = [β1, ..., βC] denotes the most valid prototypes, chosen from a prototype family.
A real C × n matrix U can be used to represent the result of cluster analysis of X by interpreting [u_ik] as the degree to which x_k belongs to cluster i. Both the FCM (Bezdek [7]) and PCM (Krishnapuram and Keller [8]) algorithms attempt to find good cluster structure descriptors (U*, β*) as minimisers of a particular member of the family of objective functions J_m(U, β, η) = Σ_{i=1}^{C} Σ_{k=1}^{n} u_ik^m D(x_k, β_i) + Σ_{i=1}^{C} η_i Σ_{k=1}^{n} (1 − u_ik)^m, where U is the fuzzy partition matrix, β is the vector of the prototypes, η_i is a penalty term for the i-th cluster [3], m > 1 is the degree of fuzzification, C ≥ 2 is the number of clusters, n is the number of data samples and D(·) is the deviation of the data vector x_k from the i-th cluster prototype. One may notice in this formulation that the cluster prototypes β are fixed for the entire range of the data set. This may give good results if the data are stationary, but in the context of image segmentation this is not the case. Images are highly non-stationary signals, and the implicit assumption of stationarity made by fixing the values of β to constants throughout the image does not result in good segmentations in terms of index of fuzziness and fuzzy entropy. This paper presents an algorithm specially optimised for image segmentation that incorporates, via a fuzzy multiresolution approach, the non-stationarity and the neighbourhood correlation features inherently present in all non-trivial images. In Section 2.1 we present the multiresolution, spatially constrained model for the adaptive segmentation of images; in Section 2.2 we discuss the non-stationary estimation of the cluster prototypes; in Section 2.3 we analyse the inter-pixel correlation model. Finally, in Section 3 we discuss the results of the proposed scheme and the effects of different parameters on the segmentation results.
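For reference, the baseline (non-adaptive) FCM that minimises the first term of this family can be sketched in a few lines of numpy. Variable names are ours; this is the textbook alternating update on 1-D intensity data, not the authors' adaptive scheme:

```python
import numpy as np

def fcm(x, c=2, m=2.0, iters=50, seed=0):
    """Standard Fuzzy C-Means on a 1-D data set x (e.g. pixel intensities).
    Returns the (c x n) fuzzy partition matrix U and the c prototypes beta."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                          # columns of U sum to one
    for _ in range(iters):
        um = u ** m
        beta = um @ x / um.sum(axis=1)          # prototype (centre) update
        d = (x[None, :] - beta[:, None]) ** 2 + 1e-12  # squared deviations D
        u = 1.0 / d ** (1.0 / (m - 1))          # membership update ...
        u /= u.sum(axis=0)                      # ... normalised over clusters
    return u, beta
```

The alternation between estimating β and updating U is exactly the iteration the proposed method retains, with the crucial difference that here β is one constant vector for the whole data set.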
2. Analysis of the Algorithm
2.1 The multiresolution non-stationary image segmentation model The key element of the proposed family of algorithms is that the final segmentation should incorporate all the available segmentation information calculated at the various resolution levels r, having utilised the non-stationary modelling of the cluster prototypes and the spatial constraints. If we assume that the segmentation U^r performed at each resolution level is correct, then each segmentation result should contribute -to some degree- to the calculation of the final fuzzy partition matrix; however, the interpretation of the results of segmentation at each resolution level and the restrictions imposed by the various resolutions should be considered. Let X denote the image data, having values that typically range from 0 to 255. Let x_k denote the intensity of a pixel at location k, with k ∈ [0, M×N−1], M,N being the image dimensions. The fuzzy segmentation of the image into c regions (clusters) is obtained by finding the fuzzy partition matrix U = [u_ik]. In the proposed model, the prototype vectors β vary
with the location k, i.e. β = β_i(k). Like both FCM and PCM, our approach iterates between estimating β and updating the partition matrix U using the calculated estimates of β_i(k). The prototype values are estimated using a hierarchical approach. We construct a pyramid of images X^r at different resolutions r, having dimensions M^r × N^r, starting from the highest resolution image (r=0) by ideal low-pass filtering and decimating by two. Let β_{i,W}^r(k) denote the estimated cluster prototype for cluster i out of c, at a resolution level r, using a window of size W. Let also U^r denote the fuzzy partition matrix for a certain resolution level r, having dimensions c × (2^{-r}M × 2^{-r}N). At the lowest resolution image, typically of dimensions 32×32 (r=3), either the FCM or the PCM algorithm is applied, its results being an initial segmentation: U^3 = PCM(X^3) or U^3 = FCM(X^3). For each resolution level, the following calculations take place. The values of β_i are calculated (as described in Sec. 2.2) in a window of size W_size that is equal to half the image size; then, the fuzzy partition matrix U^r is calculated in the following manner:

u_i^r(k) = 1 / ( 1 + ( D(x_k, β_{i,W}^r(k)) / η_i )^{1/(m-1)} )    (1)
The η_i values, which define the inter-cluster distance, are calculated as the standard deviations between the prototype values estimated within the window W and the image data. Finally, the spatial constraints are taken into account, thus modifying U^r as described in Sec. 2.3. When the calculation of U^r has converged for a certain window size, the window size is reduced by a factor of two and the whole process is repeated until a minimum window size W_min = 8 is reached. The calculation of U^r has converged for a window size if the number of changes in the fuzzy partition matrix is lower than a specified threshold; a good threshold was found to be 5% of the last number of changes. Typically, 3 to 5 iterations are adequate. When the algorithm has converged for the minimum window size at a resolution level, we have the segmented image for that resolution. The values of β_{i,W}^r obtained are expanded by a factor of 2, and the process of re-estimating β_{i,W}^r and updating the fuzzy partition matrix is repeated for the next resolution level, until the original resolution level is reached. The convergence of U^r at the original resolution is followed by a data fusion procedure that utilises all the segmentation information obtained for the different resolutions to calculate the final segmentation. If we assume a multiresolution quad-tree structure for the segmented pixels, then each segmented pixel at resolution r has four children at resolution r+1. We define an information gain metric (IGM) for measuring the knowledge that the calculation of U at the higher resolution has provided for a cluster i, as the difference between the parent's possibility of belonging to class i and the average of the children's class assignments, that is:
IGM_i^r[k,l] = u_i^r[k,l] − (1/4) Σ_{[m,n] ∈ Children(k,l)} u_i^{r+1}[m,n]    (2)

If IGM_i^r[k,l] is close to zero, then the existence of a homogeneous region is implied and the updated partition matrix results for cluster i are correct with possibility 1 − IGM_i^r[k,l]; otherwise, details must have emerged and the cluster assignments of the lower resolution segmentation are correct with a lower possibility. If U_MN^r denotes the results of segmentation at resolution r expanded to dimensions M×N, the final fuzzy partition matrix U is calculated in the following manner:

U = (1/K) Σ_{r=r_min}^{0} (1 − IGM^r) · U_MN^r    (3)

where K is a normalising constant. The factor 1 − IGM^r removes the bias towards the results of lower resolution segmentation only when details emerge in higher resolutions, providing consistent segmentation of homogeneous regions.
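The quad-tree parent-minus-children average of Eq. (2) is a small array operation. This numpy sketch (our own naming) operates on one cluster's membership map at a time and assumes the finer matrix is exactly twice the size of the coarser one:

```python
import numpy as np

def igm(u_coarse, u_fine):
    """Information gain metric, Eq. (2)-style: each coarse pixel's membership
    minus the mean of its four children in the finer partition matrix."""
    m, n = u_coarse.shape
    # view the fine map as (m, 2, n, 2) so each 2x2 child block can be averaged
    child_mean = u_fine.reshape(m, 2, n, 2).mean(axis=(1, 3))
    return u_coarse - child_mean
```

A value near zero signals a homogeneous region; a large magnitude signals detail emerging at the finer level, which is exactly what the fusion weight 1 − IGM^r penalises.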
2.2 Non-stationary estimation of the cluster prototypes The estimation of the non-stationary cluster prototypes is one of the key elements for the performance of the proposed algorithm. We assume that there exists an ordering such that:

β_1(k) < β_2(k) < ... < β_c(k)    (4)
i.e., the cluster prototypes are ordered before their localised estimation begins. That is a meaningful assumption, especially when trying to assign linguistic variables to the segmentation («dark» objects are always darker than «bright» ones). The ordering is performed after the initial application of FCM or PCM to the lowest resolution image. One can easily observe that, for a given window size W, the following relation holds:

min_{k∈W} {x^r(k)} < β_{1,W}^r < ... < β_{c,W}^r < max_{k∈W} {x^r(k)}    (5)

A good estimation of β_{i,W}^r(k) for a specific window size W may be achieved if we let

β_{1,W}^r(k) = min_{k∈W} {x^r(k)},    β_{c,W}^r(k) = max_{k∈W} {x^r(k)}    (6)

and calculate the rest by bilinear interpolation. The calculation of β_{i,W}^r(k) for a specific resolution r and window size W is performed for a grid of points spaced equal to half the window size (50% overlap); the rest of the values are calculated by bilinear interpolation. In order to maintain reliable estimations for the cluster prototypes, we assign the values resulting from Eq. (6) to them only if the number of pixels in the window assigned to each cluster with possibility greater than 0.5 is greater than the window size; otherwise, the prototype value for each cluster is assigned the value calculated with double the window size. This method of estimation of the non-stationary cluster prototypes is robust and reliable. When the range of image values within a window is small, only the dominant cluster for that window is affected. The results of the adaptive estimation of the cluster prototypes are shown in Fig. 1.

Fig. 1. Adaptive estimation of the cluster prototypes for image <>, c=2, m=2.0. The profile of the image centreline is shown in solid line, while the dashed and dotted lines show the localised estimation of the cluster prototypes 1 (bright region) and 0 (dark region), respectively.
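A simplified sketch of the Eq. (6)-style estimation follows, using non-overlapping windows and piecewise-constant prototypes instead of the paper's 50%-overlap grid with bilinear interpolation (those refinements are straightforward additions); all names are ours:

```python
import numpy as np

def local_prototypes(img, w):
    """Estimate the extreme cluster prototypes beta_1 (dark) and beta_c
    (bright) as the local min / max over w x w windows, spread back over
    the image as piecewise-constant maps."""
    m, n = img.shape
    dark = np.empty((m, n), dtype=float)
    bright = np.empty((m, n), dtype=float)
    for i in range(0, m, w):
        for j in range(0, n, w):
            block = img[i:i + w, j:j + w]
            dark[i:i + w, j:j + w] = block.min()    # beta_1 estimate, Eq. (6)
            bright[i:i + w, j:j + w] = block.max()  # beta_c estimate, Eq. (6)
    return dark, bright
```

By construction the two maps bracket the image everywhere, which is the windowed form of the ordering relation (5).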
2.3 Refinement of the Fuzzy Partition Matrix using spatial constraints The inter-pixel correlation properties of the image are taken into account by finding, for each pixel's 8-connected neighbourhood N_8, the cluster assignment y that has the greatest possibility of being correct:
∀s ∈ N_8:  y_s = argmax_{i=1,...,c} {u_i^r(s)}    (7)

Then, the fuzzy partition matrix is updated using the rule:

u_ik^r = u_ik^r + e if y_k = y_s, and u_ik^r = u_ik^r − e if y_k ≠ y_s, ∀i    (8)

Fig. 2. Segmentation results for the <> image. (a) c=2, m=2.0, (b) c=2, m=2.5, (c) c=4, m=2.5, (d) c=4, m=2.5.
e is a positive constant that regulates our a priori knowledge of the neighbourhood strength, i.e., the inter-pixel correlation. When e is high, the region sizes increase and their boundaries are smoothed. In our implementation, a fixed value of 1/8 and a decreasing sequence using the rule e(r) = 2^{4−r} / 256 were used. The latter smoothes the regions in the low resolution levels of segmentation while leaving the higher resolution images unchanged.
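One plausible reading of rules (7)-(8) in numpy — our interpretation, since the printed notation leaves the neighbourhood bookkeeping implicit — nudges every membership by ±e according to whether its cluster wins the 8-connected neighbourhood vote:

```python
import numpy as np

def spatial_refine(u, e=1/8):
    """Sketch of the spatial-constraint update: u has shape (c, M, N).
    For each pixel, the cluster that wins most often over the 8-connected
    neighbourhood gains +e, all other clusters lose e (boundary pixels use
    the valid part of the neighbourhood). Memberships are clipped to [0,1]."""
    c, m, n = u.shape
    hard = u.argmax(axis=0)                       # per-pixel winning cluster
    out = u.copy()
    for i in range(m):
        for j in range(n):
            wins = [hard[i + di, j + dj]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0)
                    and 0 <= i + di < m and 0 <= j + dj < n]
            dominant = max(set(wins), key=wins.count)
            out[:, i, j] -= e                     # penalise all clusters ...
            out[dominant, i, j] += 2 * e          # ... net +e for the winner
    return np.clip(out, 0.0, 1.0)
```

As the text notes, a large e grows regions and smooths their boundaries, because isolated pixels are repeatedly pulled towards their neighbourhood's dominant cluster.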
3. Results and Discussion In this section, the results of the application of the proposed scheme are presented. In general, it can be stated that the proposed scheme results in a simple representation of the image while preserving its basic characteristics. In addition, there are no single-pixel regions (a direct consequence of the presence of noise) in the segmented images. In Fig. 2 the segmentation results for the image <> are presented for different values of the number of clusters c and the fuzzifier m.
The way m affects the final segmentation results is evident. For low m values, the fuzzy partition matrix values are high, approaching a hard partition of the data for almost all data vectors. This results in an inconsistent calculation of the prototype vectors at small window sizes, because almost every data vector contributes to their calculation. On the contrary, when m is greater than 2.0, only the vectors that are most similar to the prototypes are used for the prototype updates. Good segmentations have resulted by setting m = 2.0. In Fig. 3, the effects of e on the final segmentation are presented. It is evident that when e is kept constant at 1/8, the results are biased towards the low resolution segmentations. The opposite holds when e varies according to Section 2.3.
Fig. 3. Segmentation results for the <> image using different values of e. In both cases, c=2, m=2.0. (a) e = 1/8, (b) e(r) = 2^{4−r} / 256.
TABLE I. COMPARISON OF THE CLUSTER'S FUZZY ENTROPY FOR DIFFERENT SEGMENTATIONS OF THE
Another interesting property of the proposed algorithm is the reduction of the fuzzy entropy of the cluster assignments. We have computed the fuzzy entropy of each cluster using the definition provided by Pal & Pal [2]:

H(A_k) = K Σ_{i=1}^{n} [ u_ki e^{1−u_ki} + (1 − u_ki) e^{u_ki} ]    (9)
where n is the number of data vectors, K = 1/n, and A_k is the fuzzy set represented by each cluster assignment. The fuzzy entropies calculated for different segmentations are shown in Table I, which shows a reduction of fuzzy entropy that ranges between 1 and 20%, with the greater values resulting when the number of clusters increases. Lower fuzzy entropy implies better segmentation results in terms of interpretation, a result that should be expected because of the local adaptation of the cluster prototypes. 4. Conclusions We have presented a hierarchical fuzzy algorithm specially optimised for image segmentation that takes into account the inherent image properties, namely the non-stationarity and the high inter-pixel correlation. We have proposed a new approach for the estimation of the cluster prototypes, the modelling of the inter-pixel correlation and, finally, a data fusion method for the incorporation of the segmentation results of different resolutions into the final segmentation. The results are better in terms of fuzzy entropy compared to the PCM algorithm. Further research should include better modelling of the pixels' correlation and enhancement of the data fusion method.
References [1] S.K. Pal, R.A. King and A.A. Hashim, "Automatic grey level thresholding through index of fuzziness and entropy," Pattern Recognition Letters, vol. 1, no. 3, pp. 141-146, March 1983. [2] N.R. Pal and S.K. Pal, "Object background segmentation using new definitions of entropy," IEE Proceedings E (Computers and Digital Techniques), vol. 136, no. 4, pp. 284-295, July 1989. [3] R.L. Cannon, J.C. Dave and J.C. Bezdek, "Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm," IEEE Trans. on Geoscience and Remote Sensing, vol. 24, no. 3, pp. 400-408, May 1986. [4] F. Wang, "Fuzzy classification of remote sensing images," IEEE Trans. on Geoscience and Remote Sensing, vol. 28, no. 2, pp. 194-201, March 1990. [5] H. Caillol, A. Hillion and W. Pieczynski, "Fuzzy random fields and unsupervised image segmentation," IEEE Trans. on Geoscience and Remote Sensing, vol. 31, no. 4, pp. 801-810, July 1993. [6] B.B. Devi and V.V.S. Sarma, "Fuzzy approximation scheme for sequential learning in pattern recognition," IEEE Trans. Syst., Man, Cybern., vol. 16, no. 5, pp. 668-679, Sept.-Oct. 1986. [7] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981. [8] R. Krishnapuram and J.M. Keller, "A possibilistic approach to clustering," IEEE Trans. Fuzzy Systems, vol. 1, no. 2, pp. 85-110, 1993. [9] Yannis A. Tolias and Stavros M. Panas, "An enhanced fuzzy possibilistic clustering algorithm for image segmentation," in Proc. of the 8th Intern. Symp. on Theoretical Electrical Engineering, ISTET '95, Thessaloniki, Sept. 22-23, 1995, pp. 200-202.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Hy2: A Hybrid Segmentation Method
F. Marino, G. Mastronardi
DEE, Politecnico di Bari, ITALY
Abstract
Two parametrical methods (Hyperbole and Hysteresis) are presented and compared in order to derive another one (Hy2) that merges the merits of both. The goal is to solve some typical problems of segmentation algorithms with a fixed threshold, which is unfit for the whole image. The benchmark was an image with superimposed Gaussian noise. The same image, without noise, was binarized and used as a reference in order to derive the parameters characterizing the method as a function of the measured quality.
Introduction
Segmentation is a preliminary operation in many image processing procedures and is fundamental in the analysis of edges. After the choice of a threshold T (global, local or locally adaptive), segmentation consists of reducing the grey levels of an image in order to obtain a synthetic representation of some of its characteristics, such as object edges or significant luminance changes [1]. The problem is the suitable choice of the threshold T: it can be chosen arbitrarily or obtained from the histogram of the source image. A global threshold does not guarantee quality comparable to that of locally adaptive thresholds; nevertheless, problems arise in the presence of non-prefiltered noise. The algorithm that we introduce here is a match between two others and performs well directly on noisy images.
The Hyperbole Law
We start from the simple idea that a single light (dark) pixel in a dark (light) region is, with high probability, due to noise. So, if we consider Av, the average of the 8 pixels adjacent to the one being processed, and Gain, a value that parameterizes the method (Q is the quantization level of the input image, commonly 256), a hyperbolic local threshold T may be introduced by (1). It evaluates pixels as a function of the brightness of the adjacent ones and, because of the inverse proportionality, it treats them as noise when they clash with their neighbourhood:

    T = (Gain · Q) / (Av + 1)    (1)
In the formula, divisions by zero are avoided and a certain flexibility is granted by the Gain: obviously, this parameter depends on the histogram of the input image [2]. The law needs 1 division and 8 additions for each pixel to be segmented but may, of course, be approximated by computing a single value of T for each part of a pre-partitioned image. In every case, it reaches its goal (to protect the segmentation from noise) well enough.
Experimental Results: Although segmentation is commonly used on images for medical, biological or mechanical applications, in order to have a more valid test with which to evaluate and compare the efficiency of the different algorithms, we preferred to use a pictorial image: the former, in fact, are often represented by a restricted number of grey levels and exhibit a partial segmentation from the outset; moreover, pictorial images present less marked shading. The benchmark was an image with superimposed Gaussian noise (Fig. 1b). The source image (Fig. 1a), before the noise superimposition, was segmented (Fig. 2a) and used as a reference in order to derive the parameters characterizing the methods as a function of the measured quality. This segmentation was obtained with a global threshold equal to the median of the source image. First, the image was tested with different Gain values to detect the optimal one. For the examined image (and in general for pictorial ones) the best performance is reached with a value of the Gain equal to 70-80% of the median. The percentage errors with respect to the reference image are plotted in Fig. 5a. A qualitative analysis is aided by Fig. 3a and Fig. 4a, which show the best performance of the algorithm and the related error. Of course, segmentation by hyperbole is less noisy than that by a global threshold (Fig. 2b). Nevertheless, the hyperbole law generates a marked erosion (see Fig. 4a) in the proximity of edges: this erosion is proportional to the Gain and to the brightness variation. In order both to reduce this erosion and to remove strongly noisy pixels (provided they are isolated), the hysteresis principle may be adopted.
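A minimal NumPy sketch of the hyperbole law follows, assuming Eq. (1) reads T = Gain·Q/(Av + 1) (the printed formula is partly illegible, so the exact numerator is our assumption) and that a pixel is set "high" when its grey level exceeds its local threshold T. The function name and the test values of Gain are ours.

```python
import numpy as np

def hyperbole_threshold(img, gain, Q=256):
    """Hyperbolic local thresholding, assumed form of Eq. (1):
    T = gain * Q / (Av + 1), where Av is the mean of the 8
    neighbours of each pixel (edges handled by replication).
    Returns a binary image: pixel high iff its grey level > T."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, 1, mode='edge')
    # sum of each 3x3 window, minus the centre pixel, divided by 8 -> Av
    win = sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
              for dy in range(3) for dx in range(3))
    av = (win - img) / 8.0
    T = gain * Q / (av + 1.0)
    return img > T
```

With this rule an isolated bright pixel in a dark region faces a very high local threshold (small Av) and is suppressed, while pixels inside a uniformly bright region pass, which is exactly the behaviour described above.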
The Hysteresis Principle
This second algorithm adopts two thresholds, T1 (cut-threshold) and T2 (security-threshold, higher than the first), and is based on the principle that a pixel (with brightness higher than the cut-threshold T1) may be considered "high" only if it is connected to a seed pixel, i.e. one with brightness higher than the security-threshold T2. The algorithm recovers points with luminance greater than T1 if they are contiguous to a seed pixel but, above all, by a suitable choice of T2, it removes isolated noise. For these reasons it solves the problems typical of the hyperbole law, but, as we will show in the next paragraph, it performs very badly in homogeneous areas. It is no accident, in fact, that it was introduced in [4] expressly for edge maps.
Experimental Results: Because of the dependence on the parameters, a test with different pairs of values (T1, T2) was also necessary for the hysteresis algorithm. The percentage errors with respect to the reference image have been computed; from these, a set of curves may be plotted, as shown in Fig. 5b. The graph leads to the conclusion that different values of T1 (provided T1 is lower than the average of the grey levels of the source image) produce no significant differences; on the contrary, T2 must be carefully chosen. With respect to the reference image, the least error was obtained with T2 equal to 110-120% of the median. Fig. 3b and Fig. 4b show the merits (bounded erosion, removal of isolated noise) and the limits (bad performance in homogeneous areas) of hysteresis. Fortunately, these are complementary to those of the hyperbole, so a match between the two algorithms may usefully be considered in order to derive an efficient novel hybrid algorithm.
The Hybrid Hy2 Method
The Hysteresis-Hyperbole (Hy2) algorithm aims to retain the merits both of the local hyperbolic thresholding and of the hysteresis principle in order to directly threshold images affected by noise.
The implementation of Hy2 is similar to that of Hysteresis, but it is characterized by the local computation of T2 using the hyperbolic law. The resulting segmentation will foreseeably improve on that of hysteresis, for the same reason that a local threshold performs better than a global one. Moreover, to make the algorithm fully local, a consideration helps to generate T1 from T2. The idea is that the narrowness of the hysteresis gap may be assumed to be proportional to the local homogeneity: a wide gap, in fact, is advisable only in boundary areas, to correctly process pixels belonging to an edge. Therefore, we define the homogeneity h as:
    h = ( |a − b − c + d| + |a + b − c − d| + |a − b + c − d| ) / (3Q)    (2)
where A, B, C and D are the corner pixels of the 3x3 kernel centered on X, the pixel to be thresholded; a, b, c and d are the grey values of the corresponding pixels; and Q has the same meaning as in (1) and is used to normalize the ratio to 1. So, if we compute the cut-threshold T1 by (3):
the hysteresis gap will be wide in the boundary areas and narrow in the homogeneous ones. Nevertheless, to make the processing less expensive, the computation of (3) is enabled only when required: in fact, the quality of the segmented image does not depend strongly on T1 variations, provided T1 lies within a certain range (see Fig. 5b).
Experimental Results: In order to compare Hy2 with the algorithms shown previously, the same experimental steps were carried out. A set of tests (see Fig. 5a) provided us with the percentage errors with respect to the reference image as a function of the Gain. These errors may easily be compared with those of Hyperbole: both algorithms, in fact, are strongly related to the same single parameter, the Gain. A similar behaviour is evident, but two differences are easily remarked: 1) the two curves are shifted with respect to the Gain axis, and 2) the performances around the minimum error are different: Hy2 performs better than Hyperbole; moreover, Hy2 strongly improves on the minimum error of Hysteresis. The first phenomenon comes from the different use, in the two algorithms, of the hyperbolic Gain. Hy2, in fact, needs two thresholds and adopts the Gain to compute the higher one (T2): so, because the cut is made by T1, it is easily explained why P is obtained with a higher Gain than R. Nevertheless, the second point is the most remarkable: it demonstrates that the goal of directly thresholding images affected by noise has been reached by suitably improving, through the hysteresis principle, a noise-removing local thresholding algorithm. From a qualitative point of view, as shown in Fig. 4c, some noisy pixels of course remain, but they are strongly bounded in homogeneous areas; moreover, the edges are sufficiently well defined.
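The Hy2 scheme can be sketched as follows. T2 is computed by the hyperbolic law (assumed here to be T2 = Gain·Q/(Av + 1)); h follows Eq. (2); Eq. (3), which derives T1 from T2 and h, did not survive in our copy of the paper, so the relation T1 = T2·(1 − h) used below is our own placeholder that merely reproduces the stated behaviour (wide gap where h is large, at boundaries; narrow gap in homogeneous areas). Seed growing uses 8-connectivity; all names are ours.

```python
import numpy as np
from collections import deque

def hy2(img, gain, Q=256):
    """Sketch of the hybrid Hy2 thresholding described above."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    p = np.pad(img, 1, mode='edge')
    # T2: hyperbolic local threshold (assumed form of Eq. (1))
    av = (sum(p[dy:dy + H, dx:dx + W]
              for dy in range(3) for dx in range(3)) - img) / 8.0
    T2 = gain * Q / (av + 1.0)
    # h: corner homogeneity of Eq. (2); a, b, c, d are the corner pixels
    a, b, c, d = p[:-2, :-2], p[:-2, 2:], p[2:, :-2], p[2:, 2:]
    h = (np.abs(a - b - c + d) + np.abs(a + b - c - d)
         + np.abs(a - b + c - d)) / (3.0 * Q)
    T1 = T2 * (1.0 - h)           # placeholder for the lost Eq. (3)
    seeds = img > T2
    cand = img > T1
    out = seeds.copy()
    q = deque(zip(*np.nonzero(seeds)))
    while q:                      # grow seeds through candidate pixels
        y, x = q.popleft()
        for ny in range(max(0, y - 1), min(H, y + 2)):
            for nx in range(max(0, x - 1), min(W, x + 2)):
                if cand[ny, nx] and not out[ny, nx]:
                    out[ny, nx] = True
                    q.append((ny, nx))
    return out
```

As in the text, an isolated noisy pixel fails to reach T2, is not rescued by T1, and is removed, while a genuine bright region survives through its seed pixels.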
In conclusion, the Hy2 thresholding algorithm has suitably both joined the merits and removed the limits of Hyperbole and Hysteresis: as a consequence of this merge, it is able to generate an acceptable segmentation directly from a noisy image. Moreover, in order to reduce the processing time, the algorithm can easily be implemented in parallel by partitioning the data of the input image.
References
[1] V.S. Nalva, T.O. Binford, "On Detecting Edges", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, 1986, pp. 699-714
[2] S. Brofferio, L. Carnimeo, D. Comunale, G. Mastronardi, "A Background Updating Algorithm for Moving Object Scenes", Time-Varying Processing and Moving Object Recognition, Vol. 2, edited by V. Cappellini, Elsevier Science Publishers B.V., 1990, pp. 297-307
[3] G. Mastronardi, S. Sangiovanni, "An Adaptive Filter for Image Processing in occam2", Proc. of ISMM Int. Symp. on Industrial, Vehicular and Space Applications of Microcomputers, New York, 1990, pp. 69-72
[4] Z. Perkovic, T. Ilakovac, "Thresholding of Edge Map Using Hysteresis", Proc. of MIPRO'94, Rijeka, 1994, pp. 3.62-3.67
Fig. 5a: Hyperbole vs Hy2 method: Gain vs % error curves.
Fig. 5b: Hysteresis method: (T1, T2) vs % error curves.
Fig. 1: (a) original image; (b) with superimposed Gaussian noise; (c) noise distribution. Fig. 2: (a) reference image for the error computation, generated directly from Fig. 1a; (b) segmentation of Fig. 1b by global threshold T = Median = 198; (c) Fig. 2b - Fig. 2a error (= 13.99%). Segmentation by: (3a) Hyperbole (Gain = 145); (3b) Hysteresis (T2 = 220, T1 = 170); (3c) Hy2 (Gain = 155). Errors: (4a) Fig. 3a - Fig. 2a (= 10.57%); (4b) Fig. 3b - Fig. 2a (= 10.17%); (4c) Fig. 3c - Fig. 2a (= 8.73%).
Session K: IMAGE ENHANCEMENT/RESTORATION
Efficient computation of the 2-dimensional RGB vector median filter
Stephen J. Sangwine and Anthony J. Bardos
Department of Engineering, The University of Reading, UK
Abstract
The vector median filter may be used for filtering colour images, with all the advantages of the scalar median filter for monochrome images, such as preservation of edges. Vector median filtering in two dimensions is computationally demanding because of the need to compute distance metrics between all pairs of pixels within a neighbourhood. A straightforward naive implementation of the filter using a 3 × 3 neighbourhood on RGB images of 512 × 512 pixels using integer arithmetic requires about 15 seconds of computation on a Pentium 100 MHz PC. The authors have studied the efficient implementation of this filter and have achieved a threefold reduction in execution time, but have encountered the problem of diminishing returns. Execution time below 5 seconds has been achieved, but at the expense of generality, making the coding absolutely specific to the case of a 3 × 3 square neighbourhood, for example. Surprisingly, it appears that significant speedup of the computation of this filter is not possible without parallel processing or hardware implementation, and a fast parameterised software implementation which would work for, say, a 5 × 5 neighbourhood has eluded us.
1. Background
The median filter is well known in monochrome image processing. It selects the pixel with median intensity from among the pixels in a neighbourhood and has the useful property of removing impulsive noise without blurring edges [1]. The vector median filter [2, 3] may be viewed as a generalisation of the median filter to vector-valued signals and images. Colour images may be processed using vector filtering so that pixel values are treated as vector quantities rather than as separate components. In this paper, we consider vector median filtering of RGB colour images, but the conclusions apply also to, say, HSI or CIELAB images, only the distance metric in colour space being different in these other cases. The advantage of vector filtering is obvious: impulsive noise, for example, will be removed by a vector median filter, replacing the noisy pixel with one of its neighbours, thus preserving the local colour properties; whereas median filtering of the RGB image components separately may replace, say, the noisy red component of a pixel with one of its neighbours, leaving the noise-free green and blue components unaltered, achieving only a change of colour of the noise-corrupted pixel rather than its removal. The vector median filter may be evaluated as follows, assuming an RGB image and a simple Pythagorean distance metric in RGB space. We assume a neighbourhood such as a 3 × 3 pixel square, although other shapes and sizes of neighbourhood may be used, just as with conventional median filtering [1]. For each pixel in the chosen neighbourhood the following sum must be computed [2]:

    S_i = Σ_{j=1}^{N} || x_i − x_j ||,    i = 1, ..., N
where x_i is the ith pixel in the neighbourhood, and N = 9 for a 3 × 3 neighbourhood. The pixel which yields the minimum value of S is selected as the vector median and used at the corresponding pixel position in the filtered image. In rectangular RGB space the norm || x_i − x_j || is the Pythagorean distance between the two pixels; thus
    || x_i − x_j || = sqrt( (r_i − r_j)² + (g_i − g_j)² + (b_i − b_j)² )
where r_i is the red component of pixel x_i, and so on. Thus S is simply the sum of the distances in RGB space from the pixel to all the other pixels in the neighbourhood. The selection of the pixel with minimum S may readily be visualized as finding the pixel nearest the "centre" of the pixels within the neighbourhood, viewed as a cluster in RGB space. In this sense the vector median filter can be seen as a simple generalisation of the scalar case. It is not, of course, necessary to compute square roots, since the ordering of the squared norms is the same as the ordering of the norms themselves. It may be noted that the use of other distance metrics would lead to other filters with properties similar to the vector median filter, but requiring simpler computation without multiplication, which could be significant in a fast hardware implementation. The norm used above is the Euclidean distance measure in RGB space.
Other possibilities include the city block distance or the square distance [4]. None of these alternatives yields the same 'ordering' of pixels as the Euclidean distance.
2. Naive implementation
An obvious straightforward implementation of the vector median filter is to iterate over all pixels in the image (excluding edge pixels) and to compute S for each pixel in the 3 × 3 neighbourhood, calculating the distances between pixels afresh each time they are required. The nine values of S generated can be compared as they are computed, retaining only the minimum value found so far and the corresponding pixel value (or its location). Once all nine values of S have been computed, the minimum is known and the corresponding pixel can be output to the filtered image. All calculations may be done in integer arithmetic. As soon as the partial value of S exceeds the best minimum value found so far, further summation may be abandoned for the pixel in question, since the resulting value of S cannot be the minimum. This implementation is very simply coded and easily modified for a different neighbourhood shape. (Change of neighbourhood size should be readily achieved through parameterization.) There are several obvious inefficiencies which can be eliminated. A trivial matter is to eliminate calculation of the norm || x_i − x_j || for the case i = j, because the norm is zero. A more significant inefficiency is the recalculation of norms within a neighbourhood. Since || x_i − x_j || = || x_j − x_i ||, some saving in computation can be made. We refer to this as norm symmetry and discuss it in the next section. An apparently more important issue is that, when the processing of one neighbourhood is complete and the next neighbourhood (around an adjacent pixel) is processed, many of the norms previously evaluated are required again, and similarly when processing moves to the next line in the image.
Avoidance of this recalculation requires some form of caching of norms and we discuss this also in the next section.
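The naive filter described above can be sketched in a few lines of Python/NumPy (the authors' actual implementation is in Ada 95; this is an illustration only, without the early-abandonment optimisation, and the function name is ours). Note that the square root is kept here: summing squared norms instead of norms can, in general, change which pixel minimises the sum.

```python
import numpy as np

def vector_median_3x3(img):
    """Naive 3x3 RGB vector median: for each interior pixel, compute
    S_i = sum of Euclidean RGB distances from neighbourhood pixel x_i
    to the other eight, and output the pixel with minimum S_i.
    Edge pixels are copied through unfiltered."""
    img = np.asarray(img, dtype=np.int64)
    H, W, _ = img.shape
    out = img.copy()
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            nb = img[y-1:y+2, x-1:x+2].reshape(9, 3).astype(float)
            diff = nb[:, None, :] - nb[None, :, :]       # pairwise differences
            S = np.sqrt((diff * diff).sum(axis=2)).sum(axis=1)
            out[y, x] = nb[np.argmin(S)]
    return out
```

On an image corrupted by a single impulsive colour pixel, the filter replaces the outlier by one of its neighbours, preserving local colour, exactly as argued in Section 1.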
3. Norm symmetry and caching
In theory, major computational savings are possible by caching calculated norms used in the evaluation of S, both to exploit symmetry within a neighbourhood and to exploit neighbourhood overlap between adjacent pixels on the same line and between neighbouring pixels on consecutive lines. To quantify the savings we consider a 3 × 3 neighbourhood as shown in Figure 1. The naive implementation outlined in the previous section would evaluate distances (norms) between the pairs of pixels shown in Table 1 (assuming that the trivial step of eliminating distances from a pixel to itself has been taken). Table 1 shows that 72 norms are evaluated by the naive implementation. Clearly this can be reduced to 36 by not evaluating the entries in the upper triangle, if a scheme is used to save the values in the lower triangle. In principle, further savings may be made by caching norms calculated along the whole of the current line and the two previous lines in the image. Once the cache has been filled (after three complete lines have been processed) each new neighbourhood introduces only one new pixel, as shown in Figure 2, requiring only eight new norms to be calculated. Thus, to a good approximation, the number of norms evaluated over the whole image would be reduced to eight times the number of pixels in the image, rather than 36 times when exploiting symmetry alone. As we discuss below, achievement of this level of efficiency is prevented by the overheads associated with caching the norms and then looking up the cached norms. (In short, array address calculations take time comparable to the time taken to recompute the norms from scratch.) We have therefore resorted to a simple scheme which evaluates the norms, storing the values in 36 variables (not arrays, because of the time taken up by address calculations).
With this scheme, caching of 15 of the 36 norms is easily and efficiently implemented by a sequence of variable assignments each time the neighbourhood advances to the next pixel on a line. At the start of a new line, all 36 norms are calculated. The price paid for efficiency, however, is a loss of generality, as the code cannot be parameterised for other sizes of neighbourhood.
4. Caching implementation
There are 36 norms to be evaluated for each pixel position in the image (neglecting edge pixels, for a 3 × 3 neighbourhood). These norms are stored in 36 variables named according to the identifiers shown in the lower triangular part of Table 1. All 36 must be evaluated for the first pixel position of a new line. When processing the next pixel on a line, however, all the norms involving pixels b, c, e, f, h and i can be cached and not recalculated. Norms involving pixels c, f and i of the new neighbourhood must be calculated. These number 21. Table 2 shows the 21 norms of a given neighbourhood (in italic type) which are not available from the previous neighbourhood to the left. The 15 norms shown in normal type are cached by a sequence of 15 variable assignments from the corresponding norms of the previous neighbourhood. Our attempts to achieve better gains in efficiency have been thwarted by increased execution time due to array address calculations, since any larger caching scheme requires the use of arrays to store the cached norms for three entire lines of the image.
5. Theoretical performance gains for larger neighbourhoods
We take the case of a 3 × 3 neighbourhood as our base and assume that only 36 norms are needed per pixel in the image, exploiting symmetry and avoiding zero norms ('distances' from a pixel to itself). Clearly this figure of 36 norms is (9² − 9)/2. The number of norms to be calculated for a 5 × 5 neighbourhood would be (25² − 25)/2, or 300. The number of norms which can be cached for a 3 × 3 neighbourhood is the number of norms between 6 pixels (omitting the column of three which are not part of the next neighbourhood), which is 5 + 4 + 3 + 2 + 1 = 15. Clearly, then, the number of norms which can be cached for a 5 × 5 neighbourhood is the number of norms between 20 pixels (omitting the column of five which is not part of the next neighbourhood), which is:

    Σ_{n=1}^{19} n = 190
The saving in computation using caching for a 5 × 5 neighbourhood is thus about 63%, although efficient computation using scalar variables rather than arrays would be somewhat tedious to code.
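The pair counts used in this section are easy to verify mechanically; the helper names below are ours. Note that the sum over n = 1..19 is 190, so a 5 × 5 window must compute 300 − 190 = 110 norms afresh, a saving of about 63%.

```python
def norms_per_window(n):
    """Distinct pixel pairs in an n x n window: (n^2 choose 2)."""
    return (n * n) * (n * n - 1) // 2

def cacheable_norms(n):
    """Norms among the n*(n-1) pixels shared with the next window on
    the same line (one column of n pixels drops out of the window)."""
    m = n * (n - 1)
    return m * (m - 1) // 2
```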
6. Actual performance comparisons for a 3 × 3 neighbourhood
The algorithm has been implemented in Ada 95 and compiled with the GNAT compiler under Windows 95. The figures in Table 3 were obtained with optimisation on a 100 MHz Pentium PC. The naive version was as described in Section 2, without exploitation of norm symmetry but using integer arithmetic throughout. The cached version used the scheme described in Section 4. The final version incorporates a 256 × 256 look-up table for calculating (a − b)². The look-up table gives a reduction in execution time of about 1 second. Instrumentation of the program reveals that initialisation of the look-up table requires 60 ms, which is included in the figure shown in Table 3. As can be seen, a reduction in execution time of about 75% is achieved from the naive version to the final version, but even on a fast PC the processing of a 512 × 512 RGB image still requires over 4 seconds. It is clear from our work that further reduction in execution time cannot be achieved easily on a serial processor, a result which we have found both surprising and frustrating for what appears to be a simple filter, which is required only to select one pixel out of a small neighbourhood.
References
1. M. Sonka, V. Hlavac and R. Boyle, Image Processing, Analysis and Machine Vision, Chapman and Hall, 1993.
2. J. Astola, P. Haavisto and Y. Neuvo, 'Vector Median Filters', Proc. IEEE, 78, (4), April 1990, 678-689.
3. J. Zheng, K.P. Valavanis and J.M. Gauch, 'Noise Removal from Color Images', J. of Intelligent and Robotic Systems, 7, June 1993, 257-285.
4. R. Beale and T. Jackson, Neural Computing: an Introduction, Institute of Physics Publishing, 1990.
       a    b    c    d    e    f    g    h    i
  a    -    ab   ac   ad   ae   af   ag   ah   ai
  b    ba   -    bc   bd   be   bf   bg   bh   bi
  c    ca   cb   -    cd   ce   cf   cg   ch   ci
  d    da   db   dc   -    de   df   dg   dh   di
  e    ea   eb   ec   ed   -    ef   eg   eh   ei
  f    fa   fb   fc   fd   fe   -    fg   fh   fi
  g    ga   gb   gc   gd   ge   gf   -    gh   gi
  h    ha   hb   hc   hd   he   hf   hg   -    hi
  i    ia   ib   ic   id   ie   if   ig   ih   -

Table 1: Pixel pairs within a 3 × 3 neighbourhood
  ab   ac*  ad   ae   af*  ag   ah   ai*
       bc*  bd   be   bf*  bg   bh   bi*
            cd*  ce*  cf*  cg*  ch*  ci*
                 de   df*  dg   dh   di*
                      ef*  eg   eh   ei*
                           fg*  fh*  fi*
                                gh   gi*
                                     hi*

Table 2: Norms within a 3 × 3 neighbourhood which are not part of the previous neighbourhood and which must therefore be calculated (the 21 marked *; shown in italic type in the original); the remaining 15 are cached.
  naive    14.8
  cached    5.2
  final     4.2

Table 3: Performance results (execution time in seconds for a 512 × 512 RGB image)
  a  b  c
  d  e  f
  g  h  i

Figure 1: 3 × 3 neighbourhood
Figure 2: Single new pixel introduced by new neighbourhood, assuming caching of previous lines.
Image Restoration for Millimeter Wave Images by Hopfield Neural Network
Kenichiro Yuasa 1, Hidefumi Sawai 2, Kenichi Watabe 3, Koji Mizuno 3, Masahide Yoneyama 1
1 Dept. of Information & Computer Science, Toyo University
2 Kansai Advanced Research Center, Communications Research Laboratory, Ministry of Posts and Telecommunications
3 Research Institute of Electrical Communication, Tohoku University
ABSTRACT
A discrete-type Hopfield neural network is proposed for use as a post-processor for an imaging radar using a 60 GHz millimeter wave. The millimeter wave image obtained by the imaging radar is
generally very degraded because of design limitations of the imaging system. The role of the post-processor is to restore the degraded millimeter wave images. The Hopfield neural network is capable of recalling the correct original images from the degraded input image by means of its associative memory effect. Some experimental results using two-dimensional objects are reported.
1.
INTRODUCTION
A vision system is a key technology for developing intelligent robots that work in extremely severe environments in place of human workers. Such a robot vision system is required to reconstruct an object's image and telecommunicate this image to an operator. To create such a vision system, many technologies have already been proposed and researched. In an earlier project, one of the authors proposed a system in which ultrasonic imaging and a feedforward-type neural network were used [1]-[4]. As the ultrasonic wave propagates with much attenuation, it is impossible to measure the backscattered wave from an object far from the receiver array. However, a robot vision system is generally required to obtain information about objects located far from the receiver array. Here, we propose another type of robot vision system which combines millimeter wave imaging and a Hopfield neural network. The authors have already proposed a system using millimeter wave imaging and a neural network [6], [7].
This paper presents recent advances in this work. It is possible to obtain information about an object located at a comparatively long distance from the receiver array by using millimeter waves. By using a Hopfield-type neural network instead of a feedforward type, it becomes easy to teach the network in cases where teaching samples of degraded images cannot be obtained in advance. Image restoration of very degraded millimeter wave images can be performed by means of the associative memory effect of the Hopfield neural network.
2.
MILLIMETER WAVE IMAGING SYSTEM
The arrangement used to measure the backscattered wave from an object is shown in Fig. 1. An object is illuminated by an electromagnetic millimeter wave emitted by a transmitting antenna. As shown in Fig. 1, a virtual plane S(x, y), parallel to the object plane O(x', y'), is assumed to be placed at a distance z from the origin of O(x', y'). The scalar potential φ at any point r(x, y) on the virtual plane S(x, y) is expressed by equation (1). In this equation, ξ(x', y') denotes the reflection coefficient and ζ(x', y') the surface function of the object.
    φ(x, y) = j ∫∫ (exp(jkr)/r) A(r) ξ(x', y') exp(jV·r') dx' dy'    (1)

where
    r = |r|
    V = (vx, vy, vz)
    vx = −k(x/r − sinθ)
    vz = −k(z/r + cosθ)
    r' = (x', y', ζ(x', y'))
    ξ(x', y') : reflection coefficient
    z' = ζ(x', y') : surface function
Figure 1. Arrangement for measuring the backscattered millimeter wave.
In the case of a 2-dimensional object (ζ(x', y') = 0), the shape of the object is equal to its reflection coefficient ξ(x', y'). By setting a dielectric lens at the position of the virtual plane, the scalar potential of the wave produced at the backward focal plane of the lens is proportional to the reflection coefficient pattern of the object, as shown by equation (2):

    φ(xf, yf) = a · ξ(−z·xf/f, −z·yf/f)    (2)
where a is a proportionality coefficient. By distributing many receiving antennas at the focal plane of the lens, it is possible to measure a discrete millimeter wave image ξ(−z·xf/f, −z·yf/f) which is similar to the object shape. In the experiment, we used a receiver array consisting of 10 antennas arranged in a vertical line. This vertical antenna line array is scanned horizontally in 10 steps to make an image of 10 x 10 pixels. However, this millimeter wave image is usually very degraded, owing to the loss of object information caused by design limitations of the imaging system. For example, the small number of receivers and the limited size of the lens are important factors degrading the image quality. Thus, it is almost impossible to recognize the object's shape ξ(x', y') from its electrical field image. The approach of our research is to use a Hopfield neural network as a post-processor connected behind the millimeter wave imaging system. The associative memory effect of the Hopfield neural network makes it possible to recall the correct images from their degraded millimeter wave images.
3.
DISCRETE HOPFIELD NEURAL NETWORK AS AN ASSOCIATIVE MEMORY
As we used 2-dimensional objects in the experiment, only binary-valued data need be treated. Although the millimeter wave images measured by the receiver array take analog values, those values are converted to binary values using a threshold, because the analog intensity values of those images carry no information in the case of 2-dimensional objects. Therefore, the discrete type is the better choice for the Hopfield neural network. The Hopfield neural network used in the experiment has 100 neurons. The neuron allocation is shown in Fig. 2: 10 x 10 neurons are located in a square. The neuron allocation corresponds to the allocation of the receivers of the receiver array, and it is also the same as the pixel allocation of the millimeter wave image. In the teaching mode, the connecting weights Wij are determined by the following equation:
    Wij = (1/N) Σ_{α=1}^{P} Σ_{β=1}^{P} Xαi Xβj (C⁻¹)αβ    (3)
where N is the number of neurons, P is the number of teaching patterns, Xαi is the input value from the ith neuron of the αth teaching pattern, Xβj is the input value from the jth neuron of the βth teaching pattern, and (C⁻¹)αβ is the αβth element of the inverse correlation matrix. In the recalling mode for memorized patterns, the stable state of the network is searched for using the following equation:
    Xi(t+1) = f [ Σ_{j=1}^{N} Wij Xj(t) ]    (4)

where N is the number of neurons and

    f(u) = +1 (u > 0); −1 (u ≤ 0)
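Equations (3) and (4) can be sketched directly in NumPy; the function names are ours, and the stored patterns are bipolar (±1) vectors (of length 100 in the paper; length 10 here, for brevity). With the inverse-correlation-matrix rule, each stored pattern is an exact fixed point of the recall dynamics.

```python
import numpy as np

def train_weights(patterns):
    """Eq. (3): W = (1/N) X^T C^{-1} X, with X the P x N matrix of
    bipolar teaching patterns and C the P x P correlation matrix
    C_ab = x_a . x_b / N.  This inverse-correlation-matrix rule
    makes every stored pattern an exact fixed point of Eq. (4)."""
    X = np.asarray(patterns, dtype=float)
    N = X.shape[1]
    C = X @ X.T / N
    return X.T @ np.linalg.inv(C) @ X / N

def recall(W, x, max_steps=50):
    """Eq. (4): synchronous update X(t+1) = f(W X(t)), with
    f(u) = +1 for u > 0 and -1 for u <= 0, iterated to stability."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_steps):
        new = np.where(W @ x > 0, 1.0, -1.0)
        if np.array_equal(new, x):   # stable state reached
            break
        x = new
    return x
```

Flipping a few bits of a stored pattern and running the recall dynamics recovers the original pattern, which is the associative memory effect exploited above.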
Figure 2. Neuron allocation
4. EXPERIMENTAL RESULTS
Two samples of the teaching patterns for the neural network used in the simulation experiment are shown in Fig. 3.
Figure 3. Teaching patterns
These teaching patterns have binary values and consist of 10 x 10 pixels. They were made according to the shapes of the objects. Two samples of binary degraded millimeter wave images, measured from objects shaped as the capital letters C and D, are shown in Fig. 4(a) and Fig. 4(b) respectively.
Figure 4. Samples of experimental results
The restored image recalled from the degraded image of Fig. 4(a) using the Hopfield neural network is shown in Fig. 4(a)'. Fig. 4(b)' is the restored image recalled from the degraded image of Fig. 4(b).
5. CONCLUSION
The rate of success in recalling the correct images is strongly related to the following two factors: one is the degree of degradation of the input image and the other is the number of images memorized by the network. In the experiment, as we used heavily degraded millimeter wave images as the input patterns to the neural network, the rate of success in recalling the correct images was a value between 50
To improve the restoration capability of this system, it may be effective to increase the number of neurons of the network. Increasing the number of neurons raises the network capacity but, at the same time, makes the convergence time of the computation much longer.
REFERENCES
[1] S. Watanabe and M. Yoneyama, "Ultrasonic Robot Eyes Using Neural Networks," IEEE Trans. UFFC, vol. 37, no. 2, pp. 141-147, 1990.
[2] S. Watanabe and M. Yoneyama, "An Ultrasonic Robot Eyes Using Neural Network," Acoustical Imaging, vol. 18, pp. 83-95, Plenum Press, New York, 1991.
[3] S. Watanabe and M. Yoneyama, "An Ultrasonic Visual Sensor for Three-Dimensional Object Recognition Using Neural Networks," IEEE Transactions on Robotics and Automation, vol. 8, no. 4, p. 4049, April 1992.
[4] S. Watanabe and M. Yoneyama, "A 3D Object Classification Method Combining Acoustical Imaging With Probability Competition Neural Networks," Acoustical Imaging, vol. 20, pp. 65-72.
[5] K. Watabe, K. Yamada, K. Kobayashi, K. Suzuki, M. Yoneyama, K. Mizuno, "Millimeter Wave Imaging Arrays," 1994 Asia-Pacific Microwave Conference Workshops Digest, pp. 103-105, Dec. 1994.
[6] K. Yamada, K. Watabe, K. Kobayashi, T. Suzuki, K. Mizuno, M. Yoneyama, "Neural Network Post Processor for Object Recognition by Millimeter-Wave Imaging," EUFIT'95 (European Congress on Intelligent Technologies and Soft Computing), Aachen, August 1995.
[7] C. Guilhou, H. Sawai, M. Yoneyama, K. Watabe and K. Mizuno, "Reconstructing Millimeter-Wave Image Using Neural Networks With Associative Memory," NOLTA'95 (1995 International Symposium on Nonlinear Theory and its Applications), Las Vegas, December 1995.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
IMAGE RESTORATION OF MEDICAL DIFFRACTION TOMOGRAPHY USING FILTERED MEM
Kazuhiko Hamamoto, Tsuyoshi Shiina¹ and Toshihiro Nishimura²
Dept. of Communications Engineering, Tokai University, Hiratsuka, 259-12 Japan.
¹Inst. of Infor. Sciences and Electron., University of Tsukuba, Tsukuba, 305 Japan.
²Dept. of Electrical Engineering, Oita University, Oita, 870-11 Japan.
I. INTRODUCTION
Diffraction tomography is an ultrasonic imaging technique which can take the diffraction effect into account in the reconstruction calculation [1]. It is expected to reconstruct, quantitatively, a higher-resolution image than pulse-echo ultrasonic imaging. In clinical use, however, the range of insonifying angles is often limited, and reflection mode, which is more appropriate for clinical use than transmission mode, suffers from a lack of low-frequency information. To address these problems we have proposed a restoration method called "MEM+IR" [2,3,7]. The method is a fast algorithm combining IR (Iterative Revisions [4]) and the FFT with the MEM applied in radio astronomy [5]. In this image processing, maximization of the image entropy acts as a smoothing of the image. Therefore, when the number of pieces of frequency information obtained from the measured scattered field is not sufficient to restore an image, the edges in the image become blunted, and as a result the quantitative accuracy of the image declines. In this paper, a filtered maximum entropy image restoration method is proposed. In this technique a high-pass filter is applied within MEM+IR to emphasize edges and prevent a decline in the quantitative accuracy of the reconstructed image. Simulation and experimental results show that the image quality can be improved beyond that of MEM+IR.
II. DIFFRACTION TOMOGRAPHY
(a) transmission mode diffraction tomography
The theory of diffraction tomography is based on an inverse scattering problem for the wave equation: the problem of obtaining an object function which represents the refractive index distribution. Fundamental to diffraction tomography is the Fourier diffraction theorem, which is valid when the Born or Rytov approximation is applicable. It is illustrated in Fig. 1.
Fig. 1. The Fourier diffraction theorem in the case of transmission mode.
According to this theorem, by illuminating an object from many different directions and measuring the diffracted data, the frequency domain can be filled with samples of the Fourier transform of the object over an ensemble of circular arcs.
(b) reflection mode diffraction tomography
In reflection mode, ultrasound is irradiated differently: the object is insonified by a pulsed plane wave which contains many different temporal frequency components. The Fourier diffraction theorem in the case of monochromatic plane wave insonification is shown in Fig.2. As Fig.2 shows, reflection mode lacks low-frequency information.
Fig.2. The Fourier diffraction theorem in the case of reflection mode.
(c) limited angle diffraction tomography
Under the limited-angle condition, a lack of frequency information occurs, and the image reconstruction problem therefore becomes ill-posed.
III. RESTORATION METHOD
The relationship relating the reconstructed image vector f = (f_1, ..., f_N)^T (N denotes the number of pixels), the frequency-information vector y = (y_1, ..., y_M)^T obtained from the received signals (M denotes the number of pieces of frequency information), and the M x N Fourier transform matrix F_{M \times N} is expressed by (1).

y = F_{M \times N} f    (1)

We define the image entropy in (2) [5].
H = \sum_{t=1}^{N} \log f_t    (2)
A reconstructed-image energy function J(f) is defined from (1) and (2) in (3).
J(f) = -H + \eta \sum_{t=1}^{M} F_t \left( y_t - (F_{M \times N} f)_t \right)^2    (3)
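An energy of the form J(f) = -H + η Σ F_t (y_t - (Ff)_t)² can be minimized, for illustration, by plain gradient descent on a positive image (a sketch only: the weight array Fw, the step size, the iteration count and the function name are assumptions, and the authors' actual MEM+IR method is a faster FFT/IR scheme):

```python
import numpy as np

def filtered_mem_restore(y, freqs, Fw, N, eta=5.0, lr=1e-3, steps=2000):
    """Gradient-descent sketch of minimizing
        J(f) = -sum_t log f_t + eta * sum_k Fw_k |y_k - (E f)_k|^2,
    where E is the DFT restricted to the measured frequencies `freqs`."""
    E = np.exp(-2j * np.pi * np.outer(freqs, np.arange(N)) / N)
    f = np.full(N, max(np.abs(y).mean() / N, 1e-6))   # positive start
    for _ in range(steps):
        r = y - E @ f                                  # frequency-domain residual
        grad = -1.0 / f - 2 * eta * np.real(E.conj().T @ (Fw * r))
        f = np.clip(f - lr * grad, 1e-8, None)         # entropy needs f > 0
    return f
```

Note how the entropy term pulls every pixel away from zero, which is exactly the smoothing behaviour described in the introduction.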
F_t is a high-pass filter, expressed by (4). The image restoration problem then becomes one of finding the reconstructed image vector that minimizes the energy function J(f), where \eta is a Lagrange multiplier.
IV. RESULTS OF TRANSMISSION MODE DIFFRACTION TOMOGRAPHY
(a) simulation results
Reconstructed and restored images are shown in Fig.3. The simulation model consists of four cylinders, with a maximum radius of 56 mm. The sampling period is 1 mm and N is 65536. The insonifying plane wave has a wavelength of 1 mm, and the interval between insonifying angles is 2°. Fig.3 shows that Filtered MEM+IR is superior to MEM+IR.
(b) experimental results
Experimental images are shown in Fig.4. The gelatin object, whose refractive index is 1.024, is 40 mm in diameter and perforated with four holes of 9 mm diameter [8]. An RF burst plane wave with a frequency of 803 kHz is used as the insonifying plane wave, and the scattered wave is received by a hydrophone [6,8]. The other conditions are the same as those of the simulation. Fig.4 also shows the merit of Filtered MEM+IR.
Fig.3. Reconstructed and restored images, and numerical comparison of true (solid line) and reconstructed (dashed line) values on the centerline in simulation, in the case where the range of insonifying angles is 90°. (a): inverse Fourier transform, error = 40%. (b): MEM+IR, error = 37.9%. (c): Filtered MEM+IR, error = 32.2%. In (c) the edge is emphasized and the quantitative values do not decline compared with (b).
Fig.4. Experimental images. (a): reconstruction from complete data, error = 33.0%. (b): IFFT (limited angle: 90°), error = 59.3%. (c): MEM+IR, error = 47.0%. (d): Filtered MEM+IR, error = 45.6%. The cross section is on the centerline of a hole which is below and to the right in the image.
V. RESULTS OF REFLECTION MODE DIFFRACTION TOMOGRAPHY
In clinical applications, backscattered waves are easier to measure and more desirable than forward-scattered waves, both to avoid shadowing by bones and to decrease the size of the equipment. Therefore, reflection mode is more appropriate than transmission mode. The reconstruction problem, however, becomes ill-posed since it suffers from a lack of low-frequency information. Reconstructed images are shown in Fig. 5. The insonifying angle is 100°. The insonifying pulse wave has a bandwidth of 312.5 kHz (the highest frequency component: 375 kHz; the lowest frequency component: 62.5 kHz). Fig. 5 shows that Filtered MEM+IR is superior to MEM+IR, but the difference is not as clear as in the case of transmission mode. This shows that the influence of the lack of low-frequency information is serious.
VI. CONCLUSIONS
We proposed "Filtered MEM+IR" to prevent a decline in the quantitative accuracy of limited-angle diffraction tomography. Simulation and experimental results show that it can improve the image quality beyond that of MEM+IR. However, a more effective reconstruction method for reflection mode will have to be developed in the future.
Fig.5. Reconstructed images and numerical comparison of true (solid line) and reconstructed (dashed line) values on the centerline in simulation, in the case of reflection mode. The insonifying angle is 100°. (a): inverse Fourier transform, error = 96.5%. (b): MEM+IR, error = 35.1%. (c): Filtered MEM+IR, error = 33.8%.
REFERENCES
[1] A.C. Kak and M. Slaney, "Tomographic imaging with diffracting sources," in Principles of Computerized Tomographic Imaging. New York: IEEE Press, 1987.
[2] K. Hamamoto and T. Shiina, "Investigation on maximum entropy image restoration of limited angle diffraction tomography," Keisoku-Jidoseigyo Gakkai Ronbunsyu, vol. 29, no. 8, pp. 867-875, Aug. 1993. [in Japanese]
[3] K. Hamamoto, T. Shiina and M. Ito, "An Investigation on Practicability of Reflection Mode Diffraction Tomography," Jpn. J. Appl. Phys., vol. 33, Part 1, no. 5B, pp. 3181-3186, 1994.
[4] C.Q. Lan, K.K. Xu and G. Wade, "Limited angle diffraction tomography and its application to planar scanning system," IEEE Trans. Sonics and Ultrason., vol. SU-32, pp. 9-16, 1985.
[5] S.J. Wernecke and L.R. D'Addario, "Maximum entropy image reconstruction," IEEE Trans. Computers, vol. C-26, pp. 351-364, 1977.
[6] A. Yamada and K. Kurahashi, "Experimental image quality estimation of ultrasonic diffraction tomography," Jpn. J. Appl. Phys., vol. 32, Part 1, no. 5B, pp. 2507-2509, 1993.
[7] K. Hamamoto, T. Shiina, T. Nishimura and M. Saito, "Study on maximum entropy image restoration of limited angle diffraction tomography," in Computer Simulations in Biomedicine, ed. H. Power and R.T. Hart, Computational Mechanics Publications, pp. 453-460, 1995.
[8] K. Hamamoto and T. Shiina, "Experimental investigation on maximum entropy image restoration of limited angle diffraction tomography," Denki-gakkai Ronbunshi C, vol. 115-C, no. 12, pp. 1384-1389, Dec. 1995. [in Japanese]
Directionally adaptive image restoration
Xavier Neyt¹, Marc Acheroy², Ignace Lemahieu³
Electrical Engineering Dept. (SIC) of the Royal Military Academy of Belgium
30 av de la Renaissance, B-1000 Brussels, Belgium
Abstract
A directionally selective restoration method is described. The adaptivity is based on the estimation of the local SNR of the ideal image. To achieve directional adaptivity, a directionally selective transform, based on a discrete short-time Fourier transform, is presented. The results obtained are compared with a non-directionally selective restoration method.
I. INTRODUCTION
Image restoration aims at removing blur in the presence of observation noise. The inversion problem being ill-posed, regularisation procedures are commonly used to make the deblurring well-behaved. Many restoration methods have been proposed [1], [2], but most of these consider space-invariant restoration. Adaptive restoration methods, which consist in computing an inversion filter depending on the local properties of the image [3], [4], have only recently been proposed. These methods often involve local image descriptions using a development in basis functions with local support, such as wavelets, windowed polynomials or Gabor functions. In [3], adaptivity is attained by estimating the local SNR of the original image and selecting the restoration filter based on its value. Hence, this adaptive scheme always treats the image as isotropic, which is obviously not the case in the presence of an edge, for instance. This results in a well-deblurred edge transition, but much noise is introduced along the edge. This can be solved by introducing directional adaptivity, i.e. estimating the SNR locally in various directions, making it possible to capture the anisotropies in the image. To carry out this directional selectivity, a directionally selective transform has to be devised. A modified version of the STFT was selected; it is detailed in section 2. The core of the restoration method, i.e. the computation of the restoration filters, is explained in section 3, while section 4 shows how these filters can be used to perform an adaptive restoration. Finally, section 5 shows some results.
II. DIRECTIONAL SELECTIVE TRANSFORM
Since directional selectivity is of great importance, separable filters, which confuse patterns in directions a and -a, cannot be used. Good candidate transforms are the STFT and the Gabor transform.
These two transforms are widely used in pattern recognition, where they serve to detect and classify patterns according to their response in the space-frequency plane. Considering discrete signals, let V(i,j) be the localisation window; the signal L(i,j) localised at location (k\Delta i, l\Delta j) equals L(i,j)V(i - k\Delta i, j - l\Delta j). If we consider a window function of finite dimension (N, M), for instance a binomial function

V^2(i,j) = \binom{N-1}{i + N/2} \binom{M-1}{j + M/2},    (1)

the windowed signals are also of finite length and can be decomposed using the DFT. The discrete Fourier transform consists in decomposing the original signal in a basis of plane waves e^{-2j\pi(kr/N + ls/M)}, where k and l determine the frequency and the orientation of the waves. But since the original signal is real, some simplifications can be made, leading to a transform that closely resembles a decomposition in Fourier series. The basis functions of this decomposition are \cos 2\pi(kr/N + ls/M) and \sin 2\pi(kr/N + ls/M). Due to the parity properties of the sine and cosine functions, the functions corresponding to (k, l) and to (-k, -l) are not independent. Moreover, the sine function corresponding to (k, l) = (0, 0) is also useless. For convenience of notation, these basis functions will be denoted by C_{n,m}(r,s), where odd n corresponds to functions with k > 0, even n corresponds to functions with k < 0, and n = 0 corresponds to k = 0. Something similar holds for m, where odd m corresponds to cosine basis functions and even m to sine basis functions. Hence, up to a constant multiplicative factor, the functions C_{n,m} define an orthonormal basis on R^{N \times M}. Defining D_{n,m}(i,j) = V(-i,-j)C_{n,m}(-i,-j), the decomposition of the localised original signal in the basis functions C_{n,m} can be carried out by sampling the original signal filtered with the filters D_{n,m}
L_{n,m}(k\Delta i, l\Delta j) = [L(i,j) * D_{n,m}(i,j)]_{(k\Delta i, l\Delta j)}    (2)
Similarly, defining P_{n,m}(i,j) = \frac{C_{n,m}(i,j)V(i,j)}{W(i,j)}, where W(i,j) = \sum_{k,l} V^2(i - k\Delta i, j - l\Delta j), and provided W(i,j) is non-zero for all i, j, the reconstruction of the original signal can be carried out by interpolating the coefficients L_{n,m}(k\Delta i, l\Delta j) using the pattern functions P_{n,m}

L(i,j) = \sum_{k,l} \sum_{n,m} P_{n,m}(i - k\Delta i, j - l\Delta j) L_{n,m}(k\Delta i, l\Delta j)    (3)
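A one-dimensional sketch of this analysis/synthesis scheme (binomial window, DFT basis, overlap-add reconstruction normalised by the accumulated squared window; the window length and hop are illustrative choices):

```python
import numpy as np
from math import comb

def binomial_window(N):
    """Window V(i) with V^2(i) proportional to binomial coefficients (cf. eq. 1)."""
    v2 = np.array([comb(N - 1, i) for i in range(N)], dtype=float)
    return np.sqrt(v2 / v2.max())

def stft_analyze(signal, win, hop):
    """Windowed-DFT coefficients of each localised segment (cf. eq. 2)."""
    N = len(win)
    frames = [signal[k:k + N] * win
              for k in range(0, len(signal) - N + 1, hop)]
    return np.array([np.fft.fft(f) for f in frames])

def stft_synthesize(coeffs, win, hop, length):
    """Overlap-add reconstruction divided by W = sum of shifted V^2 (cf. eq. 3)."""
    out = np.zeros(length)
    W = np.zeros(length)
    for idx, c in enumerate(coeffs):
        k = idx * hop
        out[k:k + len(win)] += np.real(np.fft.ifft(c)) * win
        W[k:k + len(win)] += win ** 2
    return out / np.where(W > 0, W, 1.0)
```

Wherever the shifted squared windows cover the signal (W > 0), the reconstruction is exact, which is the role of the normalisation by W in the pattern functions.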
¹ X. Neyt is holder of an IBM grant from the NFWO (Belgian National Fund for Scientific Research), [email protected]
² M. Acheroy is with the EE Dept. (SIC) of the Royal Military Academy of Belgium, Marc.Acheroy@elec.rma.ac.be
³ I. Lemahieu is with the EE Dept. (ELIS) of the University of Ghent, Belgium, Ignace.Lemahieu@rug.ac.be
Fig. 1. D_{n,m} filter functions for n, m = 0, ..., 6.
Fig. 2. |D_{n,m}| transfer functions for n, m = 0, ..., 6.
The analysis functions D_{n,m} are depicted in figure 1, together with their transmittances in figure 2. The directional selectivity of these filters is clearly illustrated in the latter figure. It should be noticed that, while showing some similarities with the DCT, the described decomposition exhibits true directional selectivity, which is not the case for the DCT. Moreover, due to the effect of the window function V(i,j), the DC component of the even (i.e. cosine) filters is not zero.
III. COMPUTATION OF THE RESTORATION FILTERS
The degradation model considered is a linear blur filter with additive white noise. Hence, if L_b denotes the blurred signal and L the ideal signal, we have L_b(i,j) = L(i,j) * B(i,j) + Q(i,j), where B(i,j) is the blurring filter and Q(i,j) represents the noise. Image restoration is obtained by computing an estimate of the coefficients of the transform of the ideal image, starting from the degraded image. We assume that these estimates can be obtained from the blurred image using filters H_{n,m} in an expression similar to (2). Those estimates are then used to reconstruct an estimate of the ideal signal using (3). The coefficients of the filters H_{n,m} are obtained by minimising the mean squared error (MSE) between the unknown coefficients and their estimates \hat{L}_{n,m}. If the noise is assumed to be zero-mean, white and uncorrelated with the ideal signal, the only noise characteristic of interest is its variance E(Q(i,j)^2) = s^2_Q. Moreover, the autocorrelation function of the ideal signal in the neighbourhood of (k\Delta i, l\Delta j) is given by R_{k,l}(p, q; r, s) = E(L(p,q)L(r,s)), and if the signal is assumed to be stationary in the same neighbourhood, i.e. if the statistical characteristics of the signal are shift-invariant there, the autocorrelation becomes R_{k,l}(p, q; r, s) = R_{k,l}(p - r, q - s) = R_{k,l}(r - p, s - q), and the MSE can be rewritten as
\varepsilon^2_{n,m} = s^2_Q \sum_{i,j} H^2_{n,m}(i,j) + [D_{n,m}(-p,-q) * D_{n,m}(p,q) * R_{k,l}(p,q)]_{(0,0)}
    - 2 \sum_{i,j} H_{n,m}(i,j) [D_{n,m}(p,q) * B(-p,-q) * R_{k,l}(p,q)]_{(i,j)}
    + \sum_{i,j} \sum_{i',j'} H_{n,m}(i,j) H_{n,m}(i',j') [B(-p,-q) * B(p,q) * R_{k,l}(p,q)]_{(i-i',j-j')}    (4)
At the minimum of this error, the partial derivative of \varepsilon^2_{n,m} with respect to H_{n,m}(i,j) equals zero, which yields

s^2_Q H_{n,m}(i,j) + \sum_{i',j'} H_{n,m}(i',j') [B(-p,-q) * B(p,q) * R_{k,l}(p,q)]_{(i-i',j-j')} = [D_{n,m}(p,q) * B(-p,-q) * R_{k,l}(p,q)]_{(i,j)}    (5)
Since the latter equality must be satisfied for all values of (i, j), a system of linear equations is obtained whose solution gives the coefficients of the desired filters H_{n,m}. It should be noticed that the matrix on the left-hand side of eq. (5) is independent of the order (n, m) of the filter; hence only one matrix inversion is required to compute all filters H_{n,m}. To simplify these equations, the support of the signal autocorrelation is assumed to be much smaller than the blurring kernel B(i,j), i.e. B(i,j) * R_{k,l}(p,q) = s^2_{k,l} B(p,q) + m^2_{k,l} \sum_{r,s} B(r,s), where s^2_{k,l} is the local signal variance and m_{k,l} its local mean. Even though this hypothesis is not realistic for the whole image,
it remains a good approximation where the deblurring will have the largest effect, i.e. where there are large intensity changes and hence a small correlation length.
If the normalisation condition \sum_{i',j'} H_{n,m}(i',j') = \frac{\sum_{p,q} D_{n,m}(p,q)}{\sum_{p,q} B(p,q)} is enforced, equation (5) finally becomes

\sum_{i',j'} H_{n,m}(i',j') \left[ \frac{s^2_Q}{s^2_{k,l}} \delta_{ii'} \delta_{jj'} + [B(p,q) * B(-p,-q)]_{(i-i',j-j')} \right] = [D_{n,m}(p,q) * B(-p,-q)]_{(i,j)}    (6)
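A one-dimensional sketch of solving a system of this shape (a regularised normal-equation solve: the centred-kernel convention, the function names and the single-variance parameter `snr` = s²_{k,l}/s²_Q are assumptions for illustration):

```python
import numpy as np

def blur_matrix(B, n):
    """Convolution matrix of a centred 1-D kernel: (Bf)[i] = sum_j B[i-j+c] f[j]."""
    c = len(B) // 2
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if 0 <= i - j + c < len(B):
                M[i, j] = B[i - j + c]
    return M

def restoration_filter(B, D, snr):
    """Sketch of eq. (6): the system matrix (I/snr + Bm Bm^T) is shared by
    all orders (n,m); only the right-hand side Bm^T D depends on the
    analysis filter D."""
    n = len(D)
    Bm = blur_matrix(B, n)
    A = np.eye(n) / snr + Bm @ Bm.T        # noise term + blur autocorrelation
    return np.linalg.solve(A, Bm.T @ D)
```

Because A does not depend on D, a single factorisation of A can serve all filter orders, which is the computational point made above about eq. (5).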
The size of the system to be solved depends on the support size of the filter kernel H_{n,m}(i,j), which in turn depends on the width of the localisation window and on the support size of the blurring kernel. Due to the presence of noise, it will not be possible to accurately estimate the coefficients L_{n,m} of high order.
IV. ADAPTIVE RESTORATION
The filters H_{n,m} described in the previous section depend on the local signal variance s^2_{k,l} of the image. Non-adaptive restoration consists in taking a constant value for the variance used when computing the filters H_{n,m}; this constant value results from a compromise between good restoration where possible and little noise amplification in constant areas of the image. By locally adapting the restoration filter H_{n,m} to the local signal variance, optimal restoration can be achieved, taking into account the difference in information content at different places in the image. In order to evaluate what can be gained from an adaptive restoration, let us first compute the estimation error on the coefficients L_{n,m} as a function of the local SNR, for a given restoration filter H_{n,m}. From (4), taking into account the small correlation length of the signal with respect to the width of the impulse responses of B and D_{n,m}, introducing the relations (6) together with the normalisation condition, and defining the following amplification factors

A_Q(n,m) = \sum_{i,j} H^2_{n,m}(i,j)

A_F(n,m) = \sum_{i,j} H_{n,m}(i,j) [D_{n,m}(p,q) * B(-p,-q)]_{(i,j)}    (7)

A_S(n,m) = [D_{n,m}(-p,-q) * D_{n,m}(p,q)]_{(0,0)} - A_F(n,m) - \frac{s^2_Q}{s^2} A_Q(n,m)

where s^2 is the value of the signal variance that was used when computing the filters H_{n,m}, we obtain

\frac{\varepsilon^2_{n,m}}{s^2_Q} = A_Q(n,m) + \frac{s^2_{k,l}}{s^2_Q} A_S(n,m)    (8)
This equation gives the reduced estimation error on the coefficient L_{n,m}(k\Delta i, l\Delta j) as a function of the local SNR, when using restoration filters computed with a given signal variance s^2. From this equation it can be seen that a filter optimised for a high SNR will only give an improvement with respect to a filter optimised for a low SNR if the local SNR (s^2_{k,l}/s^2_Q) exceeds a certain threshold. This threshold increases with the order. Moreover, for coefficients of low order this threshold will be very small; hence low-order coefficients will always be computed using filters optimised for high SNR. Conversely, for high-order coefficients, the filters optimised for low SNR will always be used. For intermediate orders, we switch between the two filters based on the local SNR. An estimate of the local SNR of the ideal signal can be obtained from the estimates of the coefficients of the decomposition. Indeed, from the decomposition equation giving \hat{L}_{n,m}, taking the hypothesis of the small autocorrelation length together with the normalisation condition into account, and introducing the amplification factors defined in (7), we finally obtain an expression (9) giving a way to compute an estimate of the local signal variance of the ideal signal from an estimate of the coefficients of the decomposition of the ideal image. The term in m^2_{k,l} in eq. (9) vanishes if the DC component of the corresponding filter D_{n,m} is zero. For simplicity, we will assume that the above equation is only used when this is verified, i.e. for the sine-based filters. Summing eq. (9) over the (n,m) corresponding to filters D_{n,m} with similar directional selectivity a, we obtain an estimate of the signal variance in the corresponding direction
s^2_a = \frac{\sum_{(n,m) \in a} \hat{L}^2_{n,m}}{\sum_{(n,m) \in a} \gamma(n,m)}    (10)
The influence of the local SNR in the computation of the restoration filters H_{n,m} lies in the extent to which the high frequencies of the ideal signal will be recovered. For low SNR, the high frequencies of the blurred signal are considered as being covered by noise, while for high SNR these high frequencies are recovered by the restoration filters. In other words, filters optimised for a low SNR will always return zero HF coefficients (high-order coefficients), while giving the same low-order coefficients as the filters optimised for high SNR. Hence the adaptive restoration method is as follows: compute a family of restoration filters optimised for a high SNR (30 dB for instance) and compute an estimate of the low-order coefficients in all directions and for all window positions. These estimates are then used to compute an estimate of the local signal variance, and at the window positions where the local signal variance in certain directions exceeds a threshold, compute the higher-order coefficients corresponding to that direction and location. All estimated coefficients are then used to reconstruct an estimate of the ideal image before degradation.
V. RESULTS
The blurred image (fig. 3) used as test image was taken with an out-of-focus CCD camera. It should be noticed that the blur obtained is quite severe and that the CCD camera used was very noisy. The blurring filter, which must be known in order to apply this algorithm, was determined starting from a parametrised model of an out-of-focus camera; the parameters of the model were determined by trial and error. Figure 4 presents the
Fig. 3. Degraded image
Fig. 4. Simple adaptive method
Fig. 5. Directional adaptive method
same picture restored using the restoration scheme described in [3], where local but non-directional adaptivity with respect to the local image SNR is considered. Although the noise in the constant areas of the image is smoothed out, artifacts appear near large intensity transitions. Figure 5 presents a restoration conducted using the directional adaptive method described in this paper. No precautions were taken to avoid the border effects visible in figure 5.
VI. CONCLUSIONS
A directional adaptive restoration method has been presented. This method is based on the estimation of the local SNR of the ideal image. To achieve directional selectivity, a directionally selective decomposition scheme, based on a variant of the discrete short-time Fourier transform, was developed. The major improvement of this method with respect to non-directional adaptive restoration methods is particularly visible along edges, where the noise in the direction of the edge is greatly reduced.
REFERENCES
[1] G. Demoment, "Image Reconstruction and Restoration: Overview of Common Estimation Structures and Problems," IEEE Trans. ASSP, vol. 37, no. 12, pp. 2024-2036, Dec. 1989.
[2] J. Biemond, R.L. Lagendijk, R.M. Mersereau, "Iterative Methods for Image Deblurring," Proc. of the IEEE, vol. 78, no. 5, pp. 856-883, May 1990.
[3] J.B. Martens, "Deblurring Digital Images by Means of Polynomial Transforms," Computer Vision, Graphics and Image Processing, vol. 50, pp. 157-176, 1990.
[4] M.K. Özkan, A.M. Tekalp, M.I. Sezan, "POCS-Based Restoration of Space-varying Blurred Images," IEEE Trans. on Image Processing, vol. 3, no. 4, pp. 450-454, July 1994.
[5] J.B. Martens, "The Hermite Transform -- Theory," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 9, pp. 1595-1606, September 1990.
Optimal Matching of Images at Low Photon Level
Mireille Guillaume, Thierry Amouroux and Philippe Réfrégier
Laboratoire Signal et Image, ENSPM, Domaine universitaire de Saint-Jérôme, 13397 Marseille cedex 20, France
Bruno Milliard and Antoine Llebaria
Laboratoire d'Astronomie Spatiale, CNRS, BP 8, traverse du Siphon, 13376 Marseille cedex 12, France
Abstract
We consider the problem of matching astronomical images with a very low photon level. We analyze the performance of the optimal technique when the noise of the shifted images is no longer Gaussian and the classical linear intercorrelation method fails.
A classical problem in image processing consists in determining the translation from one noisy image to a reference image [1,2]. For the astronomical application that we consider, the image detector is fastened to a shuttle whose motion blurs the observed images. In order to preserve the quality of the image, the exposure time of the detector is very small, and as a consequence very few photons may be present in each observation. All the observed images s_p are matched and added together in order to build a final image r. At low photon levels, the observed images s_p are mainly corrupted by the discrete nature of the photons, which is described by Poisson statistics. In this paper, we propose to analyze the performance of the optimal technique for the estimation of the translation of the observed image in the presence of such Poisson noise, in comparison with the classical linear intercorrelation method. Let s(i) denote the intensity of the observed image at pixel i, with i \in [1, N], where N denotes the number of pixels of the image. Let r(i) denote the intensity of the reference image and \tau the amplitude of the translation between the reference image and the observed image. The probability of observing s(i) photons at pixel i is then, according to the Poisson law:
P[s(i) | \tau] = \exp[-\mu(i)] \frac{[\mu(i)]^{s(i)}}{s(i)!}    (1)
where \mu(i) is the average number of photons at pixel i. This value is proportional to the intensity of the reference translated by \tau and to the exposure time: thus, we can write \mu(i) = \lambda r(i - \tau), with \lambda proportional to the exposure time. In the framework of classical parameter estimation theory [3], when no a priori information is available, the optimal choice of \tau is the one which maximizes the log-likelihood of the hypothesis. Furthermore, if we assume cyclic boundary conditions for the reference, one can show that the optimal estimate of \tau is obtained by maximizing:
l(\tau) = \sum_{i=1}^{N} s(i) \cdot \ln[r(i - \tau)]    (2)
Let us recall that under the assumption of additive Gaussian noise, the optimal translation estimate would be obtained by maximizing the classical linear intercorrelation between s(i) and r(i).
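Criterion (2) is itself a cross-correlation, of the observed counts with the logarithm of the reference, so it can be evaluated for all cyclic shifts at once with an FFT (a 1-D sketch; the function name is illustrative and the reference must be strictly positive):

```python
import numpy as np

def poisson_match(s, r):
    """Estimate the cyclic shift tau maximizing eq. (2),
    l(tau) = sum_i s(i) ln r(i - tau), via a circular
    cross-correlation of the counts s with ln r."""
    log_r = np.log(r)                       # requires r > 0 everywhere
    # c[tau] = sum_i s[i] * log_r[(i - tau) mod N]
    c = np.real(np.fft.ifft(np.fft.fft(s) * np.conj(np.fft.fft(log_r))))
    return int(np.argmax(c))
```

Replacing `log_r` by `r` in the same routine gives the classical linear intercorrelation estimator that the paper compares against.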
The optimal estimation, in the presence of Poisson noise, thus consists in maximizing the intercorrelation between s(i) and \ln[r(i)]. Let us now analyze the gain in performance between the classical linear intercorrelation algorithm and the optimal method. For that purpose, we have numerically determined, for each method, the probability of correct estimation of the translation between the observed image and the reference image. This study has been done with the simulated reference image of fig. 1a, as a function of the normalised exposure time \lambda. We show in figs. 1b and 1c different realizations of s(i), i = 1, 2, ..., N, for two values of \lambda:
fig. 1a; fig. 1b, \lambda = 10^{-3}; fig. 1c, \lambda = 3 \cdot 10^{-4}. Simulated image of the sky used in the test as the reference image, and examples of numerical images having the statistical distribution expressed in Eq. 1, for different values of \lambda.
One can note that although the optimal processor expressed in Eq. 2 does not depend on \lambda, the probability of correct estimation depends on it, and thus on the exposure time of the detector. We can see in Fig. 2a that the optimal method is appreciably more efficient than the classical linear intercorrelation. The gain vanishes at high photon levels (i.e. high values of \lambda). This behavior can easily be understood, since Poisson distributions converge to Gaussian distributions at high \lambda values. It is interesting to analyze the conditions under which one can expect a large (or a small) difference between the linear intercorrelation and the optimal method. For astronomical problems, the minimum value of r(i) is equal to a positive constant, the mean brightness of the sky in the regions devoid of stars. We have studied the influence of this value on the probability of correct estimation of the translation of the image s(i). Let r_max (resp. r_min) be the maximum (resp. minimum) value of the image. For a given value of r_max, we have analysed the influence of r_min, which is equivalent to modifying the contrast ratio. Indeed, a multiplicative factor of luminance can be considered as a modification of \lambda. One can observe in Figs. 2a and 2b, which correspond to different values of r_min, that the difference between the two methods increases as r_min decreases, and that the difference in performance is significant for small \lambda values. It is equally interesting to take into account the shortening of the exposure time that can be expected, because it permits a reduction of the blurring effect.
The algorithm of Eq. 2 has been applied to real images of the sky at the Spatial Astronomy Laboratory of Marseille, associated with a fast correlation algorithm, and a new image r has been rebuilt. The reference image was a previous one, rebuilt with matching from the classical correlation method. The stars in the new image have a PSF (point spread function) 3.9 pixels wide, close to the optical resolution, which is 3.7 pixels wide.
fig. 2a, r_min = 0.1; fig. 2b, r_min = 8. Probability of correct estimation of the translation vector as a function of \lambda, with \lambda varying between 0.0 and 0.003. Different values of r_min are used, with r_max = 256.
In this paper, we have analyzed the problem of the optimal estimation of the translation between an observed image corrupted by Poisson noise and a nonnoisy reference image. We show that the nonlinear intercorrelation becomes more effective than the linear one when there are regions of the observed image with very few photons, whose statistical properties are very different from the Gaussian distribution. A substantial improvement can then be obtained at the price of a negligible increase in the complexity of the algorithm. These results assume knowledge of a nonnoisy reference image. Of course, in many applications this ideal situation does not occur, and we have to estimate the reference from a noisy image sequence. In the framework of Bayesian theory, the maximum-likelihood estimate of the reference image r is sought. One must then consider a set S of P noisy and shifted images, S = (s_1, s_2, s_3, ..., s_P). Let \tau_p be the translation of the image s_p and T = (\tau_1, \tau_2, \tau_3, ..., \tau_P). The maximum-likelihood estimates of r and T are r^{MV} and T^{MV}:

(r^{MV}, T^{MV}) = \arg\max_{r,T} P(r, T | S)
A simple calculation leads to:

r^{MV}(i) = \frac{1}{P} \sum_{p=1}^{P} s_p(i + \tau_p)    (3)
So the maximum likelihood image is the mean of the images. And:
\tau_p^{MV} = \arg\max_{\tau_p} \sum_i r^{MV}(i) \log r^{MV}(i)    (4)
It is striking that, when the noise present in the images has a Poisson distribution, the criterion to take into account for an optimal matching of the images is the entropy of the mean image. The optimal estimate of the shift \tau_p is the one that minimizes this entropy. This can be compared with the likelihood estimate of the reference r when the noise is additive and Gaussian:

\tau_p^{MV} = \arg\max_{\tau_p} \sum_i | r^{MV}(i) |^2    (5)
In this case the likelihood estimate is the one that maximizes the energy of the mean image. Thus, when the reference image is unknown, the optimal processor is no longer the one given in Eq. 2. This is currently under investigation in our laboratories and will be discussed during the presentation.
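The optimal processor of Eq. 2 is not reproduced legibly in this excerpt; the sketch below assumes the standard Poisson maximum-likelihood form, in which the observed counts are correlated with the logarithm of the reference rather than with the reference itself. Function names and the 1-D test signal are illustrative only.

```python
import numpy as np

def estimate_shift(s, r, use_log=True):
    # Exhaustive search over cyclic shifts t of the criterion
    #   C(t) = sum_i s(i) * ref(i - t)
    # with ref = log(r) (assumed Poisson ML form of Eq. 2)
    # or ref = r (classical linear intercorrelation).
    ref = np.log(r) if use_log else r
    C = np.real(np.fft.ifft(np.fft.fft(s) * np.conj(np.fft.fft(ref))))
    return int(np.argmax(C))

# Illustrative 1-D "sky": sparse bright stars on a faint uniform background
rng = np.random.default_rng(0)
r = 0.1 + 20.0 * (rng.random(256) < 0.05)
s = rng.poisson(np.roll(r, 37))      # photon-limited observation, shifted by 37
print(estimate_shift(s, r, use_log=True))
```

At high photon levels the two criteria agree, as noted in the text; the difference appears when the background term rmin is small and the counts are sparse.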
References
[1] A.K. Jain, Fundamentals of Digital Image Processing (Prentice Hall Information and System Sciences Series, New Jersey, 1989), pp. 400-406.
[2] B.V.K. Vijaya Kumar, F.M. Dickey, and J.M. DeLaurentis,
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) (c) 1996 Elsevier Science B.V. All rights reserved.
A Method for Controlling the Enhancement of Image Features by Unsharp Masking Filters

E. Cernadas, L. Gomez, A. Casas, P. G. Rodriguez and R. G. Carrion
Laboratorio de Imagen Digital, Departamento de Electronica y Computacion, Facultad de Fisica, Campus Universitario Sur, 15706 Santiago de Compostela, SPAIN. E-mail: [email protected].

Abstract: Unsharp masking is a well-known and useful image enhancement technique. This paper describes how to design unsharp masking filters by relating the MTF of the filter to the DFT of the original image.

Introduction. Unsharp masking is an edge-enhancement technique that has proved to be more effective than other kinds of filters in numerous image processing applications, ranging from clinical diagnosis [2, 4, 7] to the detection of star clusters [1]. Although unsharp masking filters have been extensively explained in the image processing literature [5, 6], their parameters have in almost all practical cases been determined by trial and error or by ROC (Receiver Operating Characteristic) methods. On the basis of a general analysis of the MTF (Modulation Transfer Function) of these filters, this work describes how to design filters for image enhancement by examining the DFT (Discrete Fourier Transform) of the image and the image features it is desired to enhance.

Theoretical foundations. Unsharp masking consists in depressing the low frequencies of an input image I_inp by subtracting from it a smoothed version of itself, I_s. In the case of images represented by a discrete matrix, the output image I_out is thus given by

I_out(n_x, n_y) = k_0 I_inp(n_x, n_y) - k_1 I_s(n_x, n_y)    (1)
where k 0 and k 1 are real constants. One of the several ways of calculating the "unsharp" mask I s [3] is by convolution of the input image with a two-dimensional discrete square pulse p(x,y):
{~ Ixl<-~
p (x, y) =
otherwise
'
lyl<---2t-x
(2)
where L is an odd integer. The smoothed image is then given by

I_s(n_x, n_y) = [I_inp * p](n_x, n_y)    (3)
Substitution of (3) in (1) yields

I_out(n_x, n_y) = k_0 [I_inp * \delta](n_x, n_y) - k_1 [I_inp * p](n_x, n_y) = [I_inp * h](n_x, n_y)    (4)
where \delta is the two-dimensional discrete Dirac function and h, the point spread function of the filter, is given by

h(n_x, n_y) = k_0 \delta(n_x, n_y) - k_1 p(n_x, n_y)    (5)
If the size of the image matrix is NxN, the discrete Fourier transform of h is defined on the discrete set {(m_x, m_y) : m_x, m_y \in [-N/2, N/2]} by

H(m_x, m_y) = k_0 - \frac{k_1}{L^2} \sum_{n_x=-(L-1)/2}^{(L-1)/2} \sum_{n_y=-(L-1)/2}^{(L-1)/2} W_N^{m_x n_x + m_y n_y}    (6)
where W_N = exp(-2\pi j / N). For practical purposes, the cut-off frequency of the filter may be defined as the lowest frequency, along either coordinate direction, for which |H| = k_0. A simple calculation from equation (6) shows that m_cut, the value of m corresponding to this frequency, is given by

m_cut = N / L    (7)
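Equation (6) can be checked numerically: building the point spread function h of Eq. (5) on an NxN grid and taking its 2-D DFT shows that |H| returns to k_0 at m = N/L, as Eq. (7) states. The sketch below uses illustrative values N = 135, L = 5 (chosen so that N/L is an integer and the zero of the sum is exact).

```python
import numpy as np

N, L, k0, k1 = 135, 5, 1.0, 0.5   # illustrative sizes, not the paper's 512x512 case
# Point spread function of Eq. (5): h = k0*delta - k1*p, with p an L x L
# box of height 1/L^2 centred at the origin (cyclic indexing).
h = np.zeros((N, N))
h[0, 0] += k0
half = L // 2
for dx in range(-half, half + 1):
    for dy in range(-half, half + 1):
        h[dx % N, dy % N] -= k1 / L**2
H = np.fft.fft2(h)
m_cut = N // L
print(abs(H[0, 0]), abs(H[m_cut, 0]))   # DC gain is k0 - k1; gain at the cut-off is k0
```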
Filter design. Appropriate choice of the parameters L, k_0 and k_1 depends on both a priori considerations and the nature of the image to be processed.
1) To avoid overshooting the allowed dynamic range while minimising compression of the dynamic range of the image, the gain of the filter at frequencies above cut-off should be unity. In other words, if there are no sub-cutoff frequencies in the image, then I_out should be equal to I_inp. Examination of equation (1) shows that this condition implies k_0 = 1. Note that if I_out = I_inp at high frequencies, the unsharp masking filter does not emphasise noise in the original image.
2) For a constant signal (I_inp(n_x, n_y) = const.), I_s = I_inp and I_out = (k_0 - k_1) I_inp. Thus, to prevent the output signal from having a negative grey level, k_0 - k_1 must be non-negative, and as k_0 = 1, k_1 must be less than or equal to 1. At the same time, however, k_1 must be positive if the filter is to have the desired effect of depressing the low frequencies present in the smoothed function (see equation (1)), i.e. k_1 > 0. From these conditions, k_1 must fall in the interval (0, 1].
3) The value of L depends on the desired cut-off frequency, which in turn depends on the frequencies present in the image to be processed and the image features it is wished to enhance. The cut-off frequency may be known a priori from the size of the image features to be enhanced, or its selection may be assisted by examination of the discrete Fourier transform of the image (see results and discussion). Once m_cut has been decided on, L is given by equation (7).

Results and discussion. We illustrate the above theory by discussing the digitised 512x512 pixel image I_inp shown in figure 1.a. Choosing k_1 in the permissible range (0, 1] amounts to specifying a compromise between improving contrast (high k_1; almost total suppression of low frequencies) and maintaining dynamic range (low k_1).
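The design rules above can be sketched in a few lines (function and variable names are ours; border handling by edge replication is a choice the paper does not specify):

```python
import numpy as np

def unsharp_mask(img, L, k0=1.0, k1=0.5):
    # Eq. (1): I_out = k0*I_inp - k1*I_s, where I_s is the input convolved
    # with an L x L box filter of height 1/L^2 (Eqs. (2)-(3)), L odd.
    assert L % 2 == 1, "mask size L must be odd"
    pad = L // 2
    padded = np.pad(np.asarray(img, float), pad, mode="edge")
    kernel = np.ones(L) / L
    # Separable box filter: running mean along rows, then along columns
    smooth = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    smooth = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, smooth)
    return k0 * np.asarray(img, float) - k1 * smooth

def mask_size_for_cutoff(N, m_cut):
    # Eq. (7): m_cut = N / L  =>  L = N / m_cut, rounded to an odd integer
    L = int(round(N / m_cut))
    return L if L % 2 == 1 else L + 1

print(mask_size_for_cutoff(512, 17), mask_size_for_cutoff(512, 102))  # -> 31 5
```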
The optimal compromise will depend on the kind of image to be processed. For the example of this paper we take k_1 = 0.5. The choice of cut-off frequency depends on the image features which it is desired to enhance in each application. Its value can be suggested by analysis of the DFT of the original image. The DFT of a two-dimensional image is difficult to interpret (figure 1.b shows an image of intensity log(1 + |F(u,v)|), where F(u,v) is the Fourier transform of figure 1.a), but it can be analysed by taking sections through its midpoint. Examination of the morphology of these sections suggests that the frequency range of interest may be the medium frequencies between the central peak and about m = 100 pixels. For illustrative purposes, we consider the effect of two extreme cut-offs: one (m_cut1 = 17) depressing only the broadest wavelengths in the original image, and the other (m_cut2 = 102) depressing all but the finer details. By equation (7), these values of the cut-off parameter correspond to mask sizes L_1 = 31 (512/17 = 30.1, but the nearest odd integer must be used) and, in the same way, L_2 = 5. The filters specified above reduce and shift the dynamic range of the original image (figure 2). To expand the dynamic range of the output from the unsharp masking filter I_out, we use the simple piecewise linear function that is also shown in figure 2. The two points at which the three linear segments of this function intersect are fixed where a threshold line (0.05% of all the pixels of the image; 131 pixels for a 512x512 image matrix) intersects the I_out histogram, approaching from the lowest and the highest values of I_out (also shown in figure 2). For a more complex kind of expansion function, see ref. [7]. Figure 3 shows the results of processing figure 1.a using a mask of size 31 (figure 3.a) or 5 (figure 3.b) and the dynamic range expansion function and k_i values discussed above.
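The breakpoints of the expansion function are located where a 0.05% threshold line meets the I_out histogram from each end; the sketch below approximates this with the corresponding quantiles (an assumption, since the paper works with histogram bins), then stretches linearly and clips:

```python
import numpy as np

def expand_dynamic_range(out_img, frac=0.0005, lo=0.0, hi=255.0):
    # Piecewise-linear expansion: grey levels below the low breakpoint map to
    # `lo`, above the high breakpoint to `hi`, linear in between.
    # Breakpoints approximated by the `frac` and (1 - frac) quantiles
    # (frac = 0.0005, i.e. 0.05% = 131 pixels for a 512x512 image).
    flat = np.sort(np.asarray(out_img, float).ravel())
    n = flat.size
    t_low = flat[int(frac * n)]
    t_high = flat[int((1 - frac) * n) - 1]
    stretched = (out_img - t_low) / max(t_high - t_low, 1e-12) * (hi - lo) + lo
    return np.clip(stretched, lo, hi)
```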
In the image produced with the smaller smoothing mask it is the fine details that stand out (particularly the hair),
whereas the larger smoothing mask enhances features such as the chairback. Similar results were obtained at 256x256 pixel resolution using smoothing masks that were half the size of those used for 512x512 resolution.
Figure 1.- a) Original image after digitization to a 512x512 pixel matrix, and b) log(1 + |F(u,v)|), where F(u,v) is the discrete Fourier transform of figure 1.a.
Figure 2.- Histograms of I_out and I_inp, and the dynamic range expansion function used to obtain figure 3.a.
Figure 3.- Figure 1.a after unsharp masking with k_0 = 1, k_1 = 0.5 and a mask size L of a) 31 pixels or b) 5 pixels, followed by dynamic range expansion by means of the function shown in figure 2 (figure 3.a) or an analogous function (figure 3.b).
Conclusions. We have developed a method for the objective design of unsharp masking filters in accordance with those features of the original image which it is desired to enhance. The filters so designed do not emphasise the noise of the original image. This approach is an advance on previous methodologies, which have been based on subjective evaluation of the final appearance of the image and/or on ROC studies.

References
[1] BENEDICT, G. F., JEFFERYS, W. H., DUNCOMBE, R., HEMENWAY, P. D., SHELUS, P. J., WHIPPLE, A. L., NELAN, E., STORY, D., MCARTHUR, B., MCCARTNEY, J., FRANZ, O. G., FREDRICK, L. W., and VAN ALTENA, WM. F.: "NGC 4314. II. Hubble Space Telescope I-band surface photometry of the nuclear region", The Astronomical Journal, April 1993, 105, (4), pp. 1369-1377
[2] CHAN, H., VYBORNY, C. J., MACMAHON, H., METZ, C. E., DOI, K., and SICKLES, E. A.: "Digital mammography: ROC studies of the effects of pixel size and unsharp-mask filtering on the detection of subtle microcalcifications", Investigative Radiology, Jul. 1987, 22, (7), pp. 581-589
[3] JAHNE, B.: Digital Image Processing (Springer-Verlag, 1991)
[4] LAINE, A., FAN, J., and YANG, W.: "Wavelets for contrast enhancement of digital mammography", IEEE Engineering in Medicine and Biology, Sep/Oct 1995, pp. 536-550
[5] LIM, J. S.: Two-dimensional Signal and Image Processing (Prentice Hall, 1990)
[6] PRATT, W. K.: Digital Image Processing, 2nd ed. (Wiley-Interscience, 1991)
[7] TAHOCES, P. G., CORREA, J., SOUTO, M., GONZALEZ, C., GOMEZ, L., and VIDAL, J. J.: "Enhancement of chest and breast radiographs by automatic spatial filtering", IEEE Trans. on Medical Imaging, September 1991, 10, (3), pp. 330-335. Also in Yearbook of Medical Informatics 92, pp. 223-229
Image Noise Reduction Based on Local Classification and Iterated Conditional Modes*

K. Haris 1,3, S.N. Efstratiadis 1,2, N. Maglaveras 1, and C. Pappas 1
1 Lab. of Medical Informatics, Faculty of Medicine, Aristotle University, Thessaloniki 54006, Greece; e-mail: [email protected]
2 Information Processing Lab., Dept. of Electrical & Computer Eng., Aristotle University, Thessaloniki 54006, Greece
3 Dept. of Informatics, School of Technological Applications, Technological Educational Institute of Thessaloniki, Sindos 54101, Greece

Abstract
In this paper, we propose a novel method for smoothing multidimensional piecewise constant images corrupted by Gaussian white noise. The method applies local hypothesis testing to decide between the presence or absence of image intensity discontinuities (structure) in each pixel neighborhood. In the first case, the local data are considered a sample of a mixture of two Gaussian distributions, and the mixture parameters are estimated using the moment estimation method, which provides closed-form estimates. The estimated parameters allow the local classification of the data and the estimation of the true value of the current neighborhood's central pixel. The latter result is used as input to the Iterated Conditional Modes (ICM) algorithm, which further improves the estimation results significantly. The proposed method is compared with a number of other image smoothing methods based on their noise reduction and edge preservation properties. Experimental results using synthetic and real 2D images are presented.

Keywords: image smoothing, hypothesis testing, local classification, Markov random field.
1 Introduction
The reduction of the noise that corrupts almost all acquired images, e.g. due to sensor imperfections, is one of the most important tasks in image processing and computer vision. Most higher-level computer vision tasks are directly affected by errors in the images, which introduce uncertainties and lead to wrong decisions [1]. Since linear filtering methods do not preserve intensity discontinuities and distort fine detail features, numerous nonlinear methods have been proposed [2]. The approaches based on Markov random field models allow the formulation of the image smoothing problem as an optimization problem using principles such as the Maximum A Posteriori principle, the Maximization of Posterior Marginals principle [3], and the iterative maximization of the conditional marginals given an initial image estimate [4]. Other approaches take into account the underlying image structure (adaptive smoothing techniques) using various adaptation methods based, for example, on Anisotropic Diffusion [5, 6]. An efficient nonlinear adaptive noise reduction approach using statistical local classification was proposed in [7, 8]. In this paper, extending the work in [7, 8], we propose a two-stage adaptive smoothing algorithm based on the assumption that the true image is piecewise constant and is contaminated with white Gaussian noise. More specifically, let L = {0, 1, ..., L_m} be the possible intensity values and G = {(i,j) : 1 <= i <= N_r, 1 <= j <= N_c} the spatial pixel coordinates of an N_r-row by N_c-column image. The m x n neighborhood of pixel (i,j) is defined as N_{mxn}(i,j) = {(k,l) \in G : |i-k| <= m, |j-l| <= n}. The observed image J is given by
J(i,j) = I(i,j) + n(i,j),   (i,j) \in G,
where n(i,j) is zero-mean Gaussian noise with standard deviation \sigma. It is also assumed that a) the true image is piecewise constant, and b) small pixel neighborhoods contain either one (homogeneous area) or
*This work was funded in part under the project I4C of the Health Telematics programme of the CEC.
two (heterogeneous area) regions [7, 8]. During the first stage of the algorithm, a hypothesis test is applied to each pixel neighborhood in order to decide on its homogeneity or heterogeneity. In the latter case, the neighborhood data is considered a sample of a two-component Gaussian mixture, the parameters of which are estimated by the direct (non-iterative) method of moments. The estimated parameters are used for the classification of the central pixel to one of the two populations. During the second stage, the available estimate of the true image is used as the initial estimate of the iterated conditional modes (ICM) algorithm, which improves the estimation results. In Section 2 the statistical smoothing algorithm is presented, and in Section 3 the ICM algorithm is described. In Section 4, the proposed algorithm is compared with other smoothing techniques when applied to synthetic and real 2D Magnetic Resonance images.
2 Initial Noise Reduction by Local Classification
Homogeneity Testing: For each pixel (i,j), the presence or absence of homogeneity is decided based on its n x n neighborhood N_{nxn}(i,j), where n is odd. A homogeneous N_{nxn}(i,j) is considered a sample of size N = n x n of a Gaussian random variable with mean \mu and variance \sigma^2. A heterogeneous N_{nxn}(i,j) is considered a sample of size N of a random variable following the distribution of a mixture of two Gaussian distributions with prior probabilities P_0, P_1, mean values \mu_0, \mu_1 and common variance \sigma^2. According to the above formulation, the maximum likelihood (ML) ratio test [9] gives: N_{nxn}(i,j) is homogeneous if

S^2 <= (1 + C) \sigma^2,    (1)
where S^2 is the sample variance of N_{nxn}(i,j). Parameter C is determined by the significance level of the test (i.e. the probability of wrongly accepting homogeneity), based on the fact that the random variable N S^2 / \sigma^2 is distributed according to \chi^2_{N-1} under the homogeneity hypothesis. The power of test (1) is defined as the probability of correctly accepting heterogeneity. Under the hypothesis of heterogeneity, and according to our model, the random variable N S^2 / \sigma^2 is distributed according to the non-central \chi^2_{N-1} with non-centrality parameter \lambda = N P_0 P_1 \rho^2, where \rho = |\mu_0 - \mu_1| / \sigma is the signal-to-noise ratio [8, 9]. Therefore, the power of test (1) can be calculated as a function of N, P_0, P_1 and \rho. When \rho > 2, the power of the test is large.

Fast Mixture Decomposition: If N_{nxn}(i,j) is decided to be homogeneous, then the true value of pixel (i,j) is estimated by the sample mean of N_{nxn}(i,j), which is the best estimator (unbiased and of minimum variance) for the case of Gaussian noise. If the decision is that N_{nxn}(i,j) is heterogeneous, then the unknown parameters of the mixture P_0, P_1, \mu_0, \mu_1 are estimated by the method of moments using the first three sample moments [8, 7]. The closed-form estimators are

\hat{\mu}_0 = \frac{\beta - \sqrt{\beta^2 - 4\gamma}}{2},  \hat{\mu}_1 = \frac{\beta + \sqrt{\beta^2 - 4\gamma}}{2},  \hat{P}_1 = \frac{c_1 - \hat{\mu}_0}{\hat{\mu}_1 - \hat{\mu}_0},  \hat{P}_0 = 1 - \hat{P}_1,

where

\beta = \frac{c_3 - c_1 c_2}{c_2 - c_1^2},  \gamma = \frac{c_1 c_3 - c_2^2}{c_2 - c_1^2},  c_1 = m_1,  c_2 = m_2 - \sigma^2,  c_3 = m_3 - 3 m_1 \sigma^2,

and m_l, for l = 1, 2, 3, is the l-order sample moment of N_{nxn}(i,j).

Local Classification: The estimates \hat{\mu}_0, \hat{\mu}_1, \hat{P}_0 and \hat{P}_1 are used in the calculation of the threshold T that separates the two classes, that is

T = \frac{\hat{\mu}_0 + \hat{\mu}_1}{2} + \frac{\sigma^2}{\hat{\mu}_1 - \hat{\mu}_0} \ln\frac{\hat{P}_0}{\hat{P}_1}.

The value of T is used for the classification of the pixel (i,j) as follows:

\hat{I}_{LC}(i,j) = \hat{\mu}_1 if J(i,j) > T; \hat{\mu}_0 otherwise.
The above algorithm reduces the noise quite efficiently while preserving intensity discontinuities very well, especially when the signal-to-noise ratio is above a certain level (\rho > 2).
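The closed-form stage above translates directly into code. This is a sketch under our naming, not the authors' implementation; the constant C is passed in as a parameter, since its value follows from the chosen significance level of the chi-square test:

```python
import numpy as np

def decompose_mixture(data, sigma):
    # Method-of-moments decomposition of a two-component Gaussian mixture
    # with common known variance sigma^2 (closed-form estimators of Section 2).
    data = np.asarray(data, float)
    m1, m2, m3 = (np.mean(data**g) for g in (1, 2, 3))
    c1, c2, c3 = m1, m2 - sigma**2, m3 - 3 * m1 * sigma**2
    denom = c2 - c1**2
    beta = (c3 - c1 * c2) / denom
    gamma = (c1 * c3 - c2**2) / denom
    D = np.sqrt(max(beta**2 - 4 * gamma, 0.0))
    mu0, mu1 = (beta - D) / 2, (beta + D) / 2
    P1 = (c1 - mu0) / (mu1 - mu0)
    return mu0, mu1, 1 - P1, P1

def classify_pixel(J_ij, neigh, sigma, C):
    # First stage for one pixel: homogeneity test (Eq. 1) then, if
    # heterogeneous, threshold classification of the central pixel.
    if np.var(neigh) <= (1 + C) * sigma**2:      # homogeneous: sample mean
        return float(np.mean(neigh))
    mu0, mu1, P0, P1 = decompose_mixture(neigh, sigma)
    T = (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * np.log(P0 / P1)
    return mu1 if J_ij > T else mu0
```

Note that np.var uses the 1/N normalization, matching the sample variance S^2 of the test.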
3 Iterated Conditional Modes (ICM) Algorithm
The estimate of the true image resulting from the first stage, I_LC, may contain errors due to: a) wrong decisions at the hypothesis testing phase, and b) inaccurate parameter estimates and, consequently, inaccurate data classification. This initial estimate may be further improved by exploiting the spatial smoothness of the image. One of the simplest approaches to capture the spatial smoothness of the true image is to consider it as a pairwise interaction Markov random field [4, 10] with probability distribution

P(I) \propto \exp\{-U(I)\},  where  U(I) = \beta \sum_{(i,j) \in G} \sum_{(k,l) \in N_{3x3}(i,j) - \{(i,j)\}} \delta(I(i,j), I(k,l)),
and \delta(a,b) is equal to -1 if a = b, and 0 otherwise. Given the observed image J, the posterior distribution of I is P(I|J) \propto \exp\{-U(I|J)\}, where

U(I|J) = \sum_{(i,j) \in G} \left\{ \frac{1}{2} \ln(\sigma^2) + \frac{[J(i,j) - I(i,j)]^2}{2\sigma^2} + \beta \sum_{(k,l) \in N_{3x3}(i,j) - \{(i,j)\}} \delta(I(i,j), I(k,l)) \right\}.
One approach to estimating I is to maximize the following pixel conditional probability at each pixel [4], that is,

P(I(i,j) | J, G_{ij}) \propto P(J(i,j) | I(i,j)) P(I(i,j) | N'_{3x3}(i,j)),    (2)

where G_{ij} and N'_{3x3}(i,j) are the supports G and N_{3x3}(i,j) without pixel (i,j). The ICM algorithm requires a good initial estimate of the true image, which is provided by the first stage. The entire image is processed iteratively, each time in raster scan order. The intensity value at each pixel is selected to maximize Eq. (2), and parameter \beta needs to be adjusted. Our experiments have shown that the algorithm converges in fewer than 15 iterations and that the best results are achieved when \beta \in [1.0, 2.0].
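A compact sketch of the ICM stage for a two-level image (our naming; the label set, \beta and the 8-neighbourhood follow the model above, while the toy image is illustrative):

```python
import numpy as np

def icm(J, init, labels, sigma, beta=1.5, n_iter=15):
    # Raster-scan updates: each pixel takes the label v minimizing the local
    # energy (J - v)^2 / (2 sigma^2) + beta * sum_neighbours delta(v, I(k,l)),
    # with delta(a,b) = -1 when the labels agree, i.e. maximizing Eq. (2).
    I = init.copy()
    H, W = I.shape
    labels = np.asarray(labels, float)
    for _ in range(n_iter):
        for i in range(H):
            for j in range(W):
                neigh = I[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
                # neighbours equal to each candidate label (centre excluded)
                same = np.array([(neigh == v).sum() - (I[i, j] == v) for v in labels])
                energy = (J[i, j] - labels) ** 2 / (2 * sigma ** 2) - beta * same
                I[i, j] = labels[np.argmin(energy)]
    return I

rng = np.random.default_rng(2)
true = np.full((8, 8), 80.0); true[:, 4:] = 110.0   # toy piecewise constant image
J = true + rng.normal(0.0, 5.0, true.shape)
init = np.where(J > 95.0, 110.0, 80.0)              # crude initial estimate
restored = icm(J, init, labels=[80.0, 110.0], sigma=5.0)
```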
4 Experimental Results
In order to quantitatively measure the noise reduction ability of the presented algorithm, we propose the following performance criteria. The set of pixels of a noise-free piecewise constant image is partitioned into two sets D_l, for l = 0, 1. D_0 contains all pixels having a homogeneous neighborhood, that is, all neighborhood pixels have equal intensity values. D_1 contains the remaining pixels, which have a heterogeneous neighborhood, that is, the neighborhood contains at least one pixel with an intensity value different from the rest. The noise reduction ability in either the homogeneous (l = 0) or heterogeneous (l = 1) image areas is defined by

F_l = \frac{E_l}{\hat{E}_l},  where  \hat{E}_l = \frac{1}{|D_l|} \sum_{(i,j) \in D_l} [I(i,j) - \hat{I}(i,j)]^2,
and |D_l| is the size of D_l. Table 1 presents a comparison of various algorithms based on F_0 and F_1 on a synthetic piecewise constant image with Gaussian noise of standard deviation \sigma = 5, 10 and 15. The synthetic image contains several different objects with intensity equal to 110 and background intensity equal to 80. It is clear that the proposed algorithm (Local Classification & ICM) outperforms the rest of the methods with respect to both measures F_0, F_1. In addition, the results of the application of the proposed algorithm (neighborhood size 9 x 9) to a real MR brain image are presented. Figure 1 shows a section of the original image, the initial estimate and the final estimate. The significant image quality improvement due to the ICM algorithm, and the ability of the overall smoothing algorithm to effectively reduce noise while preserving intensity discontinuities, are clearly demonstrated.
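The printed definition of the performance measure is damaged in the scan; the sketch below implements one plausible reading (an assumption on our part): F as the ratio of the mean squared error of the noisy observation to that of the estimate over a pixel set D, so that larger F means stronger noise reduction.

```python
import numpy as np

def noise_reduction_measure(I_true, J_noisy, I_est, D_mask):
    # F = E / E_hat over the pixel set D (D_mask is boolean):
    # MSE of the noisy image divided by MSE of the estimate.
    D = np.asarray(D_mask, bool)
    E = np.mean((I_true[D] - J_noisy[D]) ** 2)
    E_hat = np.mean((I_true[D] - I_est[D]) ** 2)
    return E / E_hat

I_true = np.zeros((4, 4))
J_noisy = np.ones((4, 4))        # unit error everywhere
I_est = np.full((4, 4), 0.5)     # estimate halves the error amplitude
D = np.ones((4, 4), bool)
print(noise_reduction_measure(I_true, J_noisy, I_est, D))  # halved amplitude -> F = 4
```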
5 Conclusions
An efficient two-stage noise reduction method was presented, suitable for approximately piecewise constant images corrupted by stationary Gaussian noise, which preserves image structure. The initial estimate obtained by local classification is improved by applying the ICM algorithm. Based on our experimental comparison, we conclude that the noise reduction and edge preservation properties of the proposed algorithm are superior to those of other well-known smoothing algorithms. Finally, it is noted that the performance of the local classification stage decreases when the noise level is high, and that the application of ICM significantly improves the initial estimation results.
References
[1] K. Haris, S.N. Efstratiadis, N. Maglaveras, and C. Pappas. "Hybrid Image Segmentation Using Watersheds". Volume 2727, pages 1140-51, Orlando, FL, April 1996.
[2] I. Pitas and A.N. Venetsanopoulos. Nonlinear Digital Filters: Principles and Applications. Kluwer Academic Publishers, 1990.
[3] J. Marroquin, S. Mitter, and T. Poggio. "Probabilistic Solution of Ill-Posed Problems in Computational Vision". Journal of the American Statistical Association, 82(397):76-89, March 1987.
[4] J. Besag. "On the Statistical Analysis of Dirty Pictures". J. R. Statist. Soc. B, 48(3):259-302, 1986.
[5] P. Saint-Marc, J. Chen, and G. Medioni. "Adaptive Smoothing: A General Tool for Early Vision". IEEE Trans. on Pattern Anal. and Mach. Intell., 13(6):514-529, June 1991.
[6] P. Perona and J. Malik. "Scale-Space and Edge Detection Using Anisotropic Diffusion". IEEE Trans. on Pattern Anal. and Mach. Intell., 12(7):629-639, July 1990.
[7] K. Haris. A Hybrid Algorithm for the Segmentation of 2D and 3D Images. Master's thesis, University of Crete, Greece, 1994.
[8] K. Haris, G. Tziritas, and S. Orphanoudakis. "Smoothing 2-D or 3-D Images Using Local Classification". In Proceedings of EUSIPCO'94, Edinburgh, September 1994.
[9] Z. Wu. "Homogeneity Testing for Unlabeled Data: A Performance Evaluation". CVGIP: Graphical Models and Image Processing, 55(5):370-380, September 1993.
[10] R. Dubes and A. Jain. "Random Field Models in Image Analysis". Journal of Applied Statistics, 16(2):131, 1989.
Table 1: Comparison of various noise reduction methods on a synthetic image. The table reports the noise reduction measures F_0 and F_1 at noise levels \sigma = 5, 10 and 15 for the following methods, each at window sizes 7x7, 9x9 and 11x11 (or the corresponding iteration numbers): initial Local Classification; Local Classification & ICM; Local Classification & Median; Neighborhood Averaging; Median Filtering; Gradient Inverse Filtering; and Anisotropic Diffusion. [The numerical entries of the table are not recoverable from the scanned original.]
Figure 1: Left: A section of the observed noisy MR image. Middle: The result of the initial noise reduction stage. Right: The result after the application of the ICM algorithm to the initial estimate.
Session L: ADAPTIVE SYSTEMS II: CLASSIFICATION
A Neural Approach to Invariant Character Recognition

Iraklis M. Spiliotis 1, Panagiotis Liatsis 2, Basil G. Mertzios 1 and Yannis P. Goulermas 2
1 Department of Electrical & Computer Engineering, Democritus University of Thrace, GR-67100 Xanthi, Greece
2 Control Systems Centre, UMIST, Manchester M60 1QD, United Kingdom
Abstract
Geometric transformations constitute a difficulty in optical character recognition (OCR) systems. This work describes the development of an intelligent OCR system based on higher-order neural networks (HONNs). These networks can be designed such that their outputs remain invariant to certain geometric distortions, such as translation, rotation and scale. The main obstacle to the practical application of HONNs is the explosion in the number of weights due to the number of input combinations. This problem is tackled using an efficient object representation scheme called image block representation (IBR).
I. Introduction
Invariant object recognition is a major research area in computer vision. A number of approaches have been proposed to address the issue of image correspondence once geometric transformations are applied [1]-[5]. The question remains how to select the set of image features which will ensure an acceptable recognition rate. Higher-order neural networks have been developed over the past decade as an alternative to traditional object recognition approaches [6]-[10]. The basic concept of HONNs is the expansion of the input representation space using higher-order combinations of the input terms, such that the mapping from the input to the output space becomes more readily obtainable. This idea has a certain appeal in object recognition systems, since geometric feature extraction mechanisms can be incorporated within the structure of the HONN. For instance, it has been shown that features such as distances and line slopes defined by point pairs, and angles of similar triangles defined by point triplets, are respectively invariant to translation-rotation, translation-scale, and translation-rotation-scale transformations. As mentioned above, these invariants can be obtained by enriching the input representation space with all possible point combinations (up to a certain order). Clearly, this constitutes a serious limitation for the application of HONNs to invariant image recognition, where images may have a spatial resolution of 512x512 pixels. In particular, for an MxN image and n-order point combinations, the number of input terms is augmented by (MxN)! / ((MxN-n)! n!). To allow the application of HONNs to object recognition problems, the technique of coarse coding has been proposed. This decomposes the image into a set of non-overlapping, offset images of coarse resolution, such that the number of input combinations is reasonably bounded. However, coarse coding does not ensure lossless image representation and thus does not allow perfect image reconstruction [11].
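The combinatorial growth quoted above is just the binomial coefficient; comparing a full-resolution image with a small critical-point set makes the motivation concrete (the figure of ~30 critical points is illustrative):

```python
from math import comb

# Number of unordered n-point combinations: (M*N)! / ((M*N - n)! * n!)
pairs_full = comb(512 * 512, 2)   # second-order terms for a 512x512 image
pairs_cp = comb(30, 2)            # second-order terms for ~30 critical points
print(pairs_full, pairs_cp)
```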
This research proposes the use of a simple yet effective object representation scheme called image block representation [12]-[14]. This method decomposes the object into a set of non-overlapping rectangular regions, which can then be used to extract the so-called critical points, i.e. points of interest on the object. The number of critical points is relatively small (when compared to the number of object pixels) and subsequently they can be used as direct inputs to the HONN architecture. The performance of the system is evaluated in the case of binary character recognition.
II. Higher-Order Neural Networks
A major criticism of single-layered perceptrons [15] was that they were unable to perform non-linear separation, an example being the XOR problem, due to the simplicity of the resulting decision boundaries. One way of dealing with this problem was to generalise the perceptron architecture so that it accommodates intervening layers of neurons, capable of extracting abstract features, thereby resulting in networks that can solve reasonably well any given input-output problem [16]. An alternative approach, based on recent studies of information processing in biological neural networks [17] as well as on Group Method of Data Handling (GMDH) algorithms [18], was the expansion of the input representation space using multi-linear terms. This gave rise to a family of neural networks collectively known as higher-order neural networks. In general, the output of a first-order neural network is defined by [9]
y_i = f( \sum_n T_n^{hid}(i) ) = f(T_0^{hid}(i) + T_1^{hid}(i)),  for hidden nodes
y_i = f( \sum_n T_n^{out}(i) ) = f(T_0^{out}(i) + T_1^{out}(i)),  for output nodes    (1)
where f(net) is a nonlinear threshold function, such as the sigmoid function, and T_0^k(i) is the bias term for output i of the k-th layer (where k takes the values hid and out), given by

T_0^k(i) = w_{i0}^k    (2)

and T_1^k(i) are the first-order terms for the i-th output unit of the k-th layer,

T_1^k(i) = \sum_j w_{ij}^k x_j^{k-1}    (3)

where w_{ij}^k are the interconnection weights between each input x_j of layer (k-1) and output node i of layer k. Generalising to mixed n-th order networks gives [10]

y_i = f( \sum_n T_n^{hid}(i) ) = f(T_0^{hid}(i) + T_1^{hid}(i) + T_2^{hid}(i) + ... + T_n^{hid}(i))    (4)
for the nodes of the hidden layer, where T_2^{hid}(i) and T_n^{hid}(i) are given by

T_2^{hid}(i) = \sum_k \sum_j w_{ijk}^{hid} x_j x_k  (second-order terms)
T_n^{hid}(i) = \sum ... \sum w_{ij...k}^{hid} x_j ... x_k  (nth-order terms)    (5)
Consider an object and any two non-identical points A, B on the object. Next, an arbitrary translation and/or rotation of the object within the image is applied, and points A, B become A' and B'. Since the invariant under translation and/or rotation is the relative distance between any two points on the object, the output of the HONN can be handcrafted to be invariant to this set of transformations by considering only the second-order terms [9],

y_i = f( \sum_k \sum_j w_{ijk}^{hid} x_j x_k )    (6)

and by constraining the input-hidden weights to satisfy

w_{iAB}^{hid} = w_{iA'B'}^{hid},  if  d_{AB} = d_{A'B'},    (7)
where d_{AB} and d_{A'B'} are the Euclidean distances between points A, B and A', B' respectively. The learning rule for higher-order neural networks is the backpropagation algorithm, appropriately modified to accommodate the inclusion of the higher-order terms in the hidden layer. The updating rule for the weights of the hidden layer is then given by [10]
\Delta w_{ijk} = \eta \delta_i \sum_{j,k} x_j x_k,    (8)
where \eta is the learning rate, the \delta's are calculated as in classical backpropagation, and k, j take values which satisfy the invariance constraints.

III. Image Block Representation
A bilevel digital image is represented by a binary 2-D array. Without loss of generality, we consider that the object pixels are assigned to level 1 and the background pixels to level 0. Due to this kind of representation, there are rectangular areas of object value 1 in each image. These rectangles, called blocks, have their edges parallel to the image axes and contain an integer number of image pixels. Consider a set that contains as members all the non-overlapping blocks of a specific binary image, in such a way that no other block can be extracted from the image (or, equivalently, each pixel with object level belongs to only one block). It is always feasible to represent a binary image with a set of all the non-overlapping blocks with object level, and this information-lossless representation is called Image Block Representation (IBR) [12]. Given a specific binary image, different sets of different blocks can be formed. Actually, the non-unique block representation does not have any implications for the implementation of any operation on a block represented image.
The IBR concept leads to a simple and fast algorithm, which requires just one pass over the image and a simple bookkeeping process. In fact, considering an N1 x N2 binary image f(x,y), x = 0,1,...,N1-1, y = 0,1,...,N2-1, the block extraction process requires a pass over each line y of the image. In this pass all object-level intervals are extracted and compared with the previously extracted blocks. As a result, a set of all the rectangular areas of level 1 that form the object is obtained. A block-represented image is denoted as

f(x,y) = {b_i : i = 0,1,...,k-1}    (9)

where k is the number of blocks. Each block is described by four integers, the coordinates of the upper-left and lower-right corner in the vertical and horizontal axes. The block extraction process is implemented easily with low computational complexity, since it is a pixel-checking process without numerical operations. Fig. 1 illustrates the blocks that represent an image of the character d.
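As a concrete illustration, the one-pass block extraction described above can be sketched as follows. This is a hypothetical Python implementation, not the authors' code; in particular, the rule that a line interval extends an open block only when it spans exactly the same columns is our reading of the text.

```python
def extract_blocks(img):
    """One-pass block extraction for a binary image (hypothetical sketch).
    img is a list of rows of 0/1 values; returns blocks as (x1, y1, x2, y2),
    with (x1, y1) the upper-left and (x2, y2) the lower-right corner."""
    blocks = []          # finished blocks
    open_blocks = []     # blocks whose lower edge may still grow: [x1, y1, x2, y2]
    for y, row in enumerate(img):
        # extract object-level intervals [x_start, x_end] on this line
        intervals, x = [], 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                intervals.append((start, x - 1))
            else:
                x += 1
        # compare intervals with previously extracted (still open) blocks
        still_open, matched = [], set()
        for b in open_blocks:
            for i, (s, e) in enumerate(intervals):
                if i not in matched and b[0] == s and b[2] == e:
                    b[3] = y            # extend the block downwards
                    matched.add(i)
                    still_open.append(b)
                    break
            else:
                blocks.append(tuple(b))  # block terminated on the previous line
        for i, (s, e) in enumerate(intervals):
            if i not in matched:
                still_open.append([s, y, e, y])  # start a new block
        open_blocks = still_open
    blocks.extend(tuple(b) for b in open_blocks)
    return blocks
```

Note that, as the text points out, only comparisons and bookkeeping are involved; no numerical operations on pixel values are needed.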
Figure 1. Image of the character d and the blocks.
IV. Critical Points Extraction

An object normalization procedure is first executed in order to facilitate rotation-invariant descriptions of the objects. Specifically, the maximal axis of the object is found and the whole object is rotated so that the maximal axis is vertical and the upper half of the image contains most of the object's mass. At this point, it is necessary to give the following definitions [14]:
1. A group is an ordered set of connected blocks, such that each of its intermediate blocks is connected with two other blocks, while the first and last blocks are connected with only one block.
2. A junction point is a point that is connected with two other points.
3. An end point is a point that is connected with only one other point.
4. A tree point is a point that is connected with more than two other points.
5. A critical point is a junction, end or tree point.
In this research, a fast non-iterative critical point detection method for block-represented binary images is presented. The method has low computational complexity, extracts only critical points, and to a degree appears to be immune to locality problems. This is achieved by the use of a suitable neighbourhood in each case. Specifically, groups of connected blocks are formed. Each group is terminated when no adjacent block exists for its continuation, or when two or more blocks exist for the continuation of the group.
Figure 2. (a) Image of the character B. (b) The extracted blocks. (c) The groups of blocks. (d) The critical points.

Each group defines a local neighborhood and all the necessary processing takes place in this neighborhood. Using a few simple rules, the groups are checked and labeled into certain categories:
- Vertical elongated groups. The absolute value of the angle of these groups with the horizontal axis is usually greater than 30°. The width of each block of a vertical elongated group should not exceed a threshold value. The connections among the blocks result in junction points, which belong to the thinned line that results from the group. For each pair of connected blocks, one junction point (the central point of the common line segment of the two connected blocks) is extracted. For each block we check whether the distance between its junction points and its extremities (i.e. the central points of the edges of the small dimension of the block) exceeds a threshold value.
- Horizontal elongated groups. The absolute value of the angle of these groups with the horizontal axis is smaller than 30°. The width of a horizontal elongated group is significantly greater than its height, and its height shows small variation. For the extraction of the junction points, the algorithm starts from the left end of
a horizontal elongated group and moves to the right with constant-width steps. At each step a junction point is extracted at the middle of the height of the group at this vertical position.
- Angle groups. The angle groups are connected with two other groups that lie on the same vertical or horizontal side of the angle group. The width and the height of an angle group are usually small. An angle group should not be connected to a noisy group: if a group has been labeled as an angle group and it is connected with a noisy group, then the label "angle" is replaced by the horizontal elongated or the vertical elongated label. Three junction points are extracted from an angle group: two due to the connections with the two groups, and another one for the formation of the angle.
- Noisy groups. These are small and spurious branches of the object. The noisy groups have width and height less than a threshold and they are connected to only one group, which is not an angle group. In most cases, the noisy groups are connected from the left or right side to vertical elongated groups, or from the top or bottom side to horizontal elongated groups. In these cases the extraction of junction points from the noisy groups is not acceptable, as a noisy end point would otherwise be created. Junction points are extracted from a noisy group if and only if the noisy group is connected at the end of an elongated group.
Fig. 2 demonstrates (a) an image of the character B, (b) the extracted blocks, (c) the groups of blocks and (d) the critical points.
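The labeling rules above might be sketched as follows. This is a rough, hypothetical paraphrase: the thresholds WIDTH_T and SIZE_T, the rule ordering, and the summary of a group by its bounding box and angle are all our assumptions, since the paper does not give exact values.

```python
def label_group(width, height, angle_deg, is_terminal_branch,
                WIDTH_T=6, SIZE_T=4):
    """Hypothetical labeling rules for a group of connected blocks.
    width/height describe the group's bounding box, angle_deg its angle
    with the horizontal axis, is_terminal_branch whether it is a small
    branch connected to only one (non-angle) group."""
    if width < SIZE_T and height < SIZE_T and is_terminal_branch:
        return "noisy"           # small spurious branch of the object
    if abs(angle_deg) > 30 and width <= WIDTH_T:
        return "vertical elongated"
    if abs(angle_deg) < 30 and width > height:
        return "horizontal elongated"
    if width < SIZE_T and height < SIZE_T:
        return "angle"           # small group joining two other groups
    return "unlabeled"
```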
V. Results

In this work, we examine the application of the IBR and HONN techniques to the problem of recognising typed characters. The binary data consisted of the 26 Latin letters (A-Z) and 10 digits (0-9) with a spatial resolution of 64x64 pixels. Since the digits '6' and '9' are rotationally equivalent, they were considered as the same pattern. The font style selected for training the OCR system was 'Times New Roman'. Next, the techniques were applied to each of the 35 characters presented in 5 random translations and rotations, giving a total of 175 training patterns. The first stage of the system was the pre-processing, which resulted in the critical point extraction. Here, the rotation normalisation procedure is applied to ensure the success of the IBR scheme. Due to the discrete nature of the image grid, some noise was introduced to the characters when their maximal axis was set to the vertical position. Next, each of the characters was decomposed into its resulting blocks, and the groups were labeled into vertical/horizontal elongated, angle and noisy. Finally, the critical points were found, using the procedure described in the last section. The second stage of the system was the classifier. Here, the critical points were fed into a second-order neural network, which had a built-in feature extraction mechanism. This provided invariant classification with respect to translation and rotation. The input layer of the higher-order neural network had 256 inputs. This number was selected to correspond to the maximum number of critical points extracted from any one of the training images. Since a binary representation encoding was employed in the output layer, there were only 8 output units. The number of units in the hidden layer was determined using a genetic optimisation scheme [10], which provides the minimal-optimal network topology for a given classification problem.
The network was able to learn to discriminate between the 35 types of printed characters after 500 epochs. To evaluate the performance of the trained system, two testing procedures were applied. We first tested the system with a set of 700 patterns, obtained by applying 20 random translations and rotations to each of the 35 characters. It was still able to distinguish between all translated and rotated versions of the characters with 100% accuracy. Next, the characters were corrupted using variable percentages of binary salt-and-pepper noise. It was observed that the system was able to discriminate with 100% accuracy for additive noise of up to 10%, and still had a satisfactory performance (recognition accuracy > 70%) for noise levels of 25%.

VI. Conclusions

A new approach to the problem of Optical Character Recognition was presented. The proposed system uses an efficient object representation scheme called image block representation, which decomposes the characters into nonoverlapping rectangular regions, which are then used to find the critical points. Next, the critical points are fed into a higher-order neural network with invariances to translation and rotation. This alleviates the problem of the combinatorial explosion of the higher-order terms associated with the use of HONNs. The optimal number of hidden units for solving the character recognition problem was determined using a Genetic Algorithm (GA) scheme. The structure of the neural network was selected to be 256 inputs - 5 hidden - 8 outputs. The system was able to identify the translated/rotated patterns with 100% recognition accuracy, while it demonstrated robustness to additive noise.
Future work will investigate the performance of the system using a number of font styles as well as handwritten characters. Another interesting application for this type of system is visual inspection, in particular the detection of blemishes in industrial workpieces, where the only discriminating feature between the two classes is the presence (or absence) of the defect.
Acknowledgments The authors wish to acknowledge the British Council and the Greek General Secretariat for Research & Development for providing financial support for this research.
References
[1] M.W. Roberts, M. Koch and D.R. Brown, 'A multilayered neural network to determine the orientation of an object', Proc. Int. Joint Conf. Neural Networks, Vol. 2, pp. 421-424, 1990.
[2] K. Fukushima, 'A hierarchical neural network model for associative memory', Biol. Cybern., Vol. 50, pp. 105-113, 1984.
[3] S.E. Troxel, S.K. Rogers and M. Kabrisky, 'The use of neural networks in PSRI target recognition', Proc. IEEE Int. Conf. Neural Networks, Vol. 1, pp. 569-576, 1988.
[4] E. Barnard and D. Casasent, 'Invariance and neural nets', IEEE Trans. Neural Networks, Vol. 2, No. 5, pp. 498-508, 1991.
[5] N. Papamarkos, I.M. Spiliotis and A. Zoumadakis, 'Character recognition by signature approximation', Int. Jour. Patt. Rec. Art. Intell., Vol. 8, No. 5, pp. 1171-1187, 1994.
[6] T. Maxwell, C.L. Giles, Y.C. Lee and H.H. Chen, 'Nonlinear dynamics of artificial neural systems', in Neural Networks for Computing, AIP Conf. 151, UT, pp. 299-304, 1986.
[7] C.L. Giles and T. Maxwell, 'Learning, invariance, and generalisation in higher-order neural networks', Applied Optics, Vol. 26, No. 23, pp. 4972-4978, 1987.
[8] M.B. Reid, L. Spirkovska and E. Ochoa, 'Simultaneous position, scale and rotation invariant pattern classification using third-order neural networks', Neural Networks, Vol. 1, No. 3, pp. 154-159, 1989.
[9] P. Liatsis, P.E. Wellstead, M.B. Zarrop and T. Prendergast, 'A versatile visual inspection tool for the manufacturing process', Proc. CCA'94, Vol. 3, pp. 1505-1510, 1994.
[10] P. Liatsis and Y.J.P. Goulermas, 'Minimal optimal topologies for invariant higher-order neural architectures using genetic algorithms', Proc. ISIE'95, Vol. 2, pp. 792-797, 1995.
[11] J. Sullins, 'Value cell encoding strategies', Tech. Rep. 165, CS Dept., Rochester Univ., New York, August 1985.
[12] I.M. Spiliotis and B.G. Mertzios, 'Real-time computation of two-dimensional moments on binary images using image block representation', accepted for publication in IEEE Trans. Image Process.
[13] I.M. Spiliotis and B.G. Mertzios, 'Fast algorithms for basic processing and analysis operations on block represented binary images', submitted to Patt. Rec. Letters.
[14] B.G. Mertzios, I.M. Spiliotis and N. Papamarkos, 'Image block representation and its applications to manufacturing and automation', accepted in 5th Int. Work. Time-Varying Image Process. and Moving Object Recognition, Florence, Italy, September 5-6, 1996.
[15] M.L. Minsky and S. Papert, Perceptrons, Cambridge, MA: MIT Press, 1969.
[16] H. White, Artificial Neural Networks: Approximation and Learning Theory, Oxford: Blackwell, 1992.
[17] D.A. Baylor, T.D. Lamb and K.W. Yau, 'Responses of retinal rods to single photons', J. Physiol., No. 288, pp. 613-634, 1979.
[18] S.J. Farlow (ed.), Self-Organising Methods in Modeling: GMDH Algorithms, New York: Marcel Dekker Inc., 1984.
IMAGE SEGMENTATION BASED ON BOUNDARY CONSTRAINT NEURAL NETWORK F.KURUGOLLU, S. BIRECIK, M. SEZGIN, B. SANKUR TUBITAK MARMARA RESEARCH CENTER, INFORMATION TECHNOLOGIES INSTITUTE, P.O. BOX 21 41470 GEBZE KOCAELI-TURKEY E-MAIL: [email protected]
ABSTRACT

Recently, artificial neural network based image segmentation methods have gained acceptance over other methods, due to their distributed architectures which allow real-time implementation. Another important advantage of neural networks is their robustness to unexpected behavior of the input image, such as noise. On the other hand, their disadvantages are that the learning phase can be long and that the resulting segmentation can have noisy boundaries. In this study, a neural network based image segmentation method called the Constraint Satisfaction Neural Network is investigated, and a modification of it is proposed to alleviate both problems. It has been observed that when the edge field is brought in as a constraint, the convergence improves and the boundary noise is reduced.
1. INTRODUCTION: THE SEGMENTATION PROBLEM

Image segmentation, an important step in image analysis, aims to divide an image into uniform and homogeneous segments, hopefully reflecting the semantic content. Image segmentation methods can be divided into three main categories: region based methods, edge based methods and pixel classification based methods [1]. Despite the plethora of segmentation algorithms in the literature, the quest for new innovative methods continues, mainly to:
- reduce the computational complexity for real-time applications,
- achieve robustness in handling as large a variety of scenes as possible,
- match algorithmic results to semantic content.
In this respect, it is believed that neural networks, judiciously used with image information constraints, are a promising approach not only for computational speed but also for robust results. Recently, a number of neural network based schemes have been advanced for image segmentation purposes. Some of these algorithms use measurement-space information, while the others use spatial information. The former usually use the histogram of the image and try to determine its peaks. The latter use spatial information such as gray value, local average, local variance, etc. The main advantages of neural networks are that they have a distributed architecture, that they can be implemented in hardware to meet real-time demands [2], and that they can handle the nonlinear relationships between measurement-space and spatial information.
2. PREVIOUS WORK: CONSTRAINT SATISFACTION NEURAL NETWORK

One such neural network based image segmentation method is the Constraint Satisfaction Neural Network (CSNN) proposed by Lin et al. [3]. In this method, image segmentation is cast as a constraint satisfaction problem. The principle of the method is to assign segment labels to the pixels under certain spatial constraints. The method uses the network topology shown in Figure 1 and constraints defined between a pixel and its neighbors to accomplish the segmentation. The network topology (Figure 1) consists of m layers, one for each segment. There are nxn (the image dimension) neurons in each layer, each neuron representing an image pixel. The neurons with the same index across the layers hold the probabilities that the corresponding pixel belongs to the segment represented by each layer. Connections between a pixel and its neighbors are depicted in Figure 2. In this example, 8-neighborhood connectivity is chosen; the neighborhood connectivity order may vary depending on the application. The weights of these connections represent the constraints in this topology. After an initialization, the CSNN converges through a parallel and iterative process to a segmentation result which satisfies all the constraints. Whenever the CSNN converges to a result, the neuron in the correct layer approaches 1 while the neurons in the same column in the other layers reduce to 0. The label of the layer which approaches 1 is assigned to the corresponding pixel. The gray value distribution of the input image is used as the initial condition: the gray values are classified into a number of segment categories by means of a Kohonen self-organizing neural network. These categories constitute the initial labels (i.e., probabilities of belonging to a segment). The network weights are determined in a heuristic manner.
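The iterative relaxation just described can be sketched as follows. This is a schematic illustration of a CSNN-style update, not the exact formulation of [3]: the support term, the clipping to non-negative values and the per-pixel renormalization are our assumptions.

```python
def csnn_step(U, weights, neighbors, eta=0.1):
    """One relaxation step of a CSNN-style update (schematic sketch).
    U[l][i][j] is the probability that pixel (i, j) belongs to segment l;
    weights[l][m2] is the constraint between labels l and m2 (positive =
    excitatory, negative = inhibitory); neighbors(i, j, n) yields the
    neighboring pixel coordinates."""
    m_labels = len(U)
    n = len(U[0])
    new_U = [[[0.0] * n for _ in range(n)] for _ in range(m_labels)]
    for l in range(m_labels):
        for i in range(n):
            for j in range(n):
                # support gathered from the neighborhood under the constraints
                support = sum(weights[l][m2] * U[m2][ni][nj]
                              for (ni, nj) in neighbors(i, j, n)
                              for m2 in range(m_labels))
                new_U[l][i][j] = max(0.0, U[l][i][j] + eta * support)
    # renormalize so each pixel's label probabilities sum to 1
    for i in range(n):
        for j in range(n):
            s = sum(new_U[l][i][j] for l in range(m_labels)) or 1.0
            for l in range(m_labels):
                new_U[l][i][j] /= s
    return new_U

def neighbors8(i, j, n):
    # 8-neighborhood connectivity, as in the example of Figure 2
    return [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di or dj) and 0 <= i + di < n and 0 <= j + dj < n]
```

Iterating this step drives each pixel's probability vector toward a single dominant label, which is then assigned to the pixel.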
These weights are determined so that a neuron excites those neurons that represent the same label, and inhibits those that represent significantly different labels.
The method used in the determination of the weights and the mathematical construction of the CSNN can be found in [3]. In summary, the major advantage of the CSNN is that segmentation is performed using image information constraints in a parallel manner.
Figure 1. The topology of the CSNN. Each layer represents a segment. The (i,j)th neuron in each layer holds the probability that the (i,j)th pixel belongs to the segment represented by the layer.
Figure 2. Connections between a neuron and its neighbors. The weights of these connections are interpreted as constraints. These weights are determined heuristically so as to excite the neurons with similar intensities and inhibit those with different intensities.
3. IMPROVEMENTS ON THE CSNN METHOD: BOUNDARY CONSTRAINT SATISFACTION NEURAL NETWORK

Convergence of the CSNN to a meaningful segmentation is time-consuming, and the error at convergence does not decrease beyond a certain value. This effect is shown in Figure 3: the convergence error is still of the order of 10 after 50 iterations for a 256 by 256 image. Actually, the algorithm lets the segments grow rapidly, but it hesitates in the assignment of pixels around the segment boundaries. This problem causes a lot of futile iterations. On the other hand, a neuron allocation problem arises for images such as 256 by 256: for a 256 by 256 image with 8 potential segments, 524288 (256x256x8) neurons are necessary. If the image size is reduced to meet the real-time constraints and to alleviate the capacity problem, then undesired smoothing in the segments becomes apparent, because small segments are absorbed by strong segments; the segment borders are also noisy in this case. To solve these problems, an algorithm called Boundary-Constraint Satisfaction Neural Network (B_CSNN) is proposed in this study. Since boundary uncertainties were a major handicap, a coarse boundary map of the input image is used. The weights between the boundary pixels and their neighbors are set to 0, so that contributions of the boundary pixels to their neighbors are precluded, as depicted in Figure 4. Consequently, the B_CSNN is observed to accomplish the segmentation process more consistently. The flow diagram of the B_CSNN algorithm is shown in Figure 6. While only image pixels are processed in the CSNN algorithm, the boundary constraints are also used in processing the image pixels in the B_CSNN algorithm. Therefore, the number of iterations and the convergence error of the B_CSNN are both reduced significantly (see Figure 3). The segmentation results for 256 by 256 images and 4 segments are compared in Figure 7.
One can notice that the segmentation boundary noise resulting from the CSNN is removed by the B_CSNN algorithm.
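The boundary-constraint modification itself is a simple adjustment of the connection weights, and might be realized as in the following sketch (a hypothetical data layout: `weight_map[(i, j)]` holding the outgoing connection weight of pixel (i, j) is our assumption, not the authors' implementation):

```python
def zero_boundary_weights(weight_map, edge_map):
    """Set to 0 the outgoing connection weights of boundary pixels, so that
    their contributions to their neighbors are precluded (the B_CSNN
    modification depicted in Figure 4)."""
    for (i, j) in list(weight_map):
        if edge_map[i][j]:          # (i, j) lies on the coarse boundary map
            weight_map[(i, j)] = 0.0
    return weight_map
```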
Figure 3. Convergence error with respect to the number of iterations of both algorithms for a 256 by 256 image. At the 23rd iteration step, the convergence error of the B_CSNN is under 1, while that of the CSNN is still about 10.
Figure 4. Adjustment of the connection weights using the edge map of the input image. The black dots indicate the edge pixels. The contributions of the boundary pixels to their neighbors are precluded by assigning 0 to their connection weights.
For 64 by 64 images, the segmentation results of both algorithms are compared in Figure 8. Notice the segment absorption and the noisy segment border problems evident in the CSNN algorithm, while these problems are mostly fixed by the B_CSNN. The convergence error with respect to the number of iterations of both algorithms is compared in Figure 5. After the improvements brought in by the B_CSNN, we will investigate extending the CSNN to multiresolution or pyramidal decompositions in order to include the spatial information content of the image.
Figure 5. Convergence error with respect to the number of iterations of both algorithms for a 64 by 64 image. At the 13th iteration step, the convergence error of the B_CSNN reaches 0, while that of the CSNN is still 2.
Figure 6. Flow diagram of the B_CSNN algorithm. Besides the original image, coarse boundary information is used to determine the segment boundaries.
REFERENCES:
1. R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Vol. 1, Addison-Wesley, 1992.
2. N.R. Pal, S.K. Pal, 'A Review on Image Segmentation Techniques', Pattern Recognition, Vol. 26, No. 9, pp. 1277-1294, 1993.
3. W.C. Lin et al., 'Constraint Satisfaction Neural Networks for Image Segmentation', Pattern Recognition, Vol. 25, No. 7, pp. 679-693, 1992.
A HIGH PERFORMANCE NEURAL MULTICLASSIFIER SYSTEM FOR GENERIC PATTERN RECOGNITION APPLICATIONS

Dimitris A. Mitzias and Basil G. Mertzios
Automatic Control Systems Laboratory, Department of Electrical and Computer Engineering, Democritus University of Thrace, 67 100 Xanthi, HELLAS
e-mail: mitzias/[email protected]

Abstract

A high performance NEural MUlticlassifier System (NEMUS) is proposed, which is characterized by a great degree of modularity and flexibility, and is very efficient for demanding and generic pattern recognition applications. The NEMUS is composed of two stages. The first stage comprises several classifiers that operate in parallel, while the second stage is a Decision-Making Network (DM-Net) that performs the final classification task, combining the outputs of all the classifiers of the first stage. In general, the inputs of each classifier are the features extracted by different Feature Extraction Methods and correspond to various levels of importance. The performance of the proposed NEMUS is demonstrated on a shape recognition task of 2-D digitized objects, considering various levels of shape distortion. Three different kinds of features, which characterize a digitized object, are used: (a) geometric features, (b) 1-D scaled normalized central moments and (c) the angles of a fast polygon approximation method.

1. Introduction

Pattern recognition applications are usually executed in two basic stages. In the first stage, each pattern is described by a set of features, using a Feature Extraction Method (FEM), according to the requirements of each particular task. In the second stage, the pattern is recognized using a classification procedure, which requires a set of input data that is usually expressed in the form of an input feature vector. A significant number of classifiers is available in the literature, based on deterministic and statistical techniques (e.g.
Euclidean distance, least mean square error, cross correlation, nearest neighbour rule and the leave-one-out algorithm) [1],[2] or on distributed processing techniques, where pretrained systems serve as classifiers (e.g. neural networks) [3]. The selection of the appropriate FEM depends on the specific conditions and requirements, in order to achieve the highest classification efficiency. To this end, it is essential in demanding applications to use a combination of different FEMs. The underlying idea is that multiple FEMs contribute to the classification procedure different features of the same pattern, which correspond to considerably different levels of importance and carry different and complementary information. Therefore, the actual contribution of each FEM cannot be explicitly determined. Thus, for multiclassifier applications, an answer should be given to the following questions: (a) how many and which FEMs should be used, (b) which kind of classifier presents the best performance for each particular FEM, and (c) what is the contribution of each classifier in a multiclassification scheme. A neural multiclassifier system, which is characterized by a great degree of modularity and flexibility, and is very efficient for demanding and generic pattern recognition applications, is proposed. The NEMUS is composed of two stages. The first stage comprises several classifiers that operate in parallel, while the second stage is a decision-making network that performs the final classification task, combining the outputs of all the classifiers of the first stage, so that the whole discrimination efficiency is optimized. The NEMUS gives a satisfactory answer to the above questions by the automatic selection of the contribution of each classifier's output to the multiclassification procedure. This practically means that a classifier is partially accepted or rejected in proportion to its ability to contribute to a particular task.
Thus, prior knowledge concerning the type and the number of the classifiers which should be used in a particular task is not required. The performance of the proposed NEMUS is demonstrated on a shape recognition task of 2-D digitized objects, considering various levels of shape distortion. Three different kinds of features, which characterize a digitized object, are used: geometric features, 1-D scaled normalized central moments and the angles of a fast polygon approximation method.

2. The Neural Multiclassifier System (NEMUS)

The proposed NEMUS is composed of a number (S) of classifiers at the first stage and a DM-Net at the second stage, as shown in Fig. 1. The kth classifier, k = 1,2,...,S, in the first stage operates with input the feature vector X_k, which is produced by the kth FEM. The second layer (DM-Net) of the
NEMUS performs the fusion of the outputs O_k of all the classifiers, so that the discrimination efficiency is optimized. The DM-Net is a neural network and consists of simple Neural Elements (NEs), which operate in parallel [4],[5]. The elements of the final output vector Y = [y_1, y_2, ..., y_m] of the multiclassifier are given by:

y_j = f(a_j),   a_j = Σ_{k=1}^{S} w_{k,j} o_{k,j} + θ_j    (1)
where f(.) is the sigmoid function, w_{k,j} is the connection weight between the jth output of the kth classifier and the jth NE of the DM-Net, and θ_j is the internal threshold value of the jth NE of the DM-Net.
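The fusion of Eq. (1) can be sketched directly; the following is a minimal hypothetical implementation (the data layout, with `classifier_outputs[k][j]` holding the jth output of the kth classifier, is our assumption):

```python
import math

def dm_net_output(classifier_outputs, W, theta):
    """Compute the DM-Net outputs of Eq. (1):
    y_j = f(sum_k w_{k,j} * o_{k,j} + theta_j), with f the sigmoid.
    W[k][j] is the weight from the jth output of the kth classifier to
    the jth NE; theta[j] is the internal threshold of the jth NE."""
    S = len(classifier_outputs)
    m = len(theta)
    f = lambda a: 1.0 / (1.0 + math.exp(-a))  # sigmoid activation
    return [f(sum(W[k][j] * classifier_outputs[k][j] for k in range(S))
              + theta[j])
            for j in range(m)]
```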
Figure 1. The schematic diagram of the NEMUS.

The training of the DM-Net is defined as the calculation of the connection weights w_{k,j}, k = 1,2,...,S, j = 1,2,...,m, of the DM-Net from the outputs of the classifiers of the first stage. These weights represent a measure of the discrimination efficiency of each classifier's output. After the training phase of the classifiers has been completed, the DM-Net is trained using an adaptation algorithm according to the type of the selected neural elements of the DM-Net. The classification efficiency, as well as the contribution of each classifier, depends on their ability to discriminate under the various conditions resulting from pattern variations, which usually appear in each particular pattern recognition application. Thus, the training of the DM-Net is achieved by presenting to the classifiers a set of distorted patterns, which are called Training Patterns and should represent a sufficient sample of pattern variations among the possible variations appearing in each particular task. Specifically, the jth NE operates with the jth output of each classifier as its inputs, and its weights w_{k,j}, θ_j are automatically determined by the adaptation algorithm, using the set of training patterns. Also, the NEs are independent of each other and they are trained in different numbers of adaptation cycles. The DM-Net is trained until a measure of the total output error of the DM-Net becomes smaller than a specified value. This measure can simply be given by the Mean Square Error (MSE) as:

MSE = (1 / (T m)) Σ_{t=1}^{T} Σ_{j=1}^{m} [d_j(t) - y_j(t)]^2    (2)

In (2), the term [d_j(t) - y_j(t)] represents the classification error of the jth output of the DM-Net, when the tth training pattern is presented to the NEMUS. The adaptation algorithms converge in a few adaptation cycles, and the computation time for each cycle is significantly low, due to the simple architecture of the DM-Net.
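The stopping criterion, the mean square error over T training patterns and m outputs, is a straightforward computation; a minimal sketch:

```python
def dm_net_mse(desired, actual):
    """Mean square error of Eq. (2):
    MSE = (1/(T*m)) * sum_t sum_j (d_j(t) - y_j(t))**2,
    where desired[t][j] = d_j(t) and actual[t][j] = y_j(t)."""
    T = len(desired)
    m = len(desired[0])
    return sum((desired[t][j] - actual[t][j]) ** 2
               for t in range(T) for j in range(m)) / (T * m)
```

Training would simply repeat adaptation cycles until this value drops below the specified threshold.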
3. Experimental Results

In this Section, we present a pattern recognition application in order to demonstrate the efficiency of the proposed multiclassification technique. The NEMUS is employed to discriminate between ten 2-D digitized objects, considering various levels of shape distortion. It is assumed that the shape variations are those resulting from three quantized artificial types of bounded distortions, which produce a finite number (10560) of shape variations. Three different 1-D typical FEMs are used, which are implemented on the region boundaries and are invariant to position, size and orientation of the shape. They also provide
different types of features corresponding to various levels of importance, and they are sensitive to different levels of shape distortion.
(a) Geometric features. The geometric features of a shape are extensively used in pattern classification applications [6],[7]. These features are useful in applications where the patterns are not very complicated and the desired discrimination efficiency is not very high. Three geometric features are used to produce a feature vector: the Normalized Inverse Compactness C̄, the Normalized Area Ā and the Normalized Length L̄ of the shape, which are determined as:

C̄ = 4πA / P^2,   Ā = A / (π d_m^2),   L̄ = ḡ / d_m    (3)
where P is the length of the shape's perimeter, A is the area of the shape, d_m is the maximum distance of the shape and ḡ is the mean of the distances from the boundary to the centroid of the shape.
(b) The Scaled Normalized Central (SNC) set of moments [8],[9]. Statistical moment-based methods are used in a large number of image analysis and pattern recognition applications. The features resulting from a moment-based method usually provide a good description of a shape and therefore have good classification properties. In this application, a special case of the 1-D SNC moments is used. The considered 1-D SNC moment of order k is defined as follows:

h̄_k = c_k h_k / m_0^β    (4)
where h_k are the 1-D central moments, m_0 is the zero-order geometric moment, c_k is the scaling factor corresponding to the moment h_k, which is used in order to avoid the exponential increase of the high-order moments, and β is the normalization factor.
(c) The Step by Step Polygon Approximation (SSPA) technique [10]. Polygon approximation techniques are often used in shape analysis and data reduction applications. The SSPA technique gives a satisfactory solution to the problem of the direct selection of a fixed number of vertices of the polygon which approximates the contour of a shape, especially in cases where time is a critical factor. Thus, a region boundary may be approximated by a polygon with a prespecified number of vertices. The angles of the extracted polygon are used as discrimination features.
In the considered application, a suitable version of the NEMUS is used as a classifier, having three Neural Networks (NNs) in the first stage, while three simple neural elements form the DM-Net. The following three feature vectors are used as inputs of the NNs of the first stage of the NEMUS:

G = [C̄, Ā, L̄],   H = [h̄_2, h̄_3, h̄_4],   A = [ā_1, ā_2, ..., ā_7]    (5)

where C̄, Ā and L̄ are the three geometric features, h̄_k are the 1-D SNC moments and ā_k are the normalized internal angles of the polygon approximating the contour of a shape. The NNs of the first stage are selected to be three-layer perceptrons, which are trained using the back-propagation algorithm to discriminate among the ten prototypes. The number of inputs of each NN equals the dimension of the corresponding feature vector G, H or A, while the number of outputs equals the number (10) of the original prototypes. Table 2 demonstrates the classification efficiency of the three independent classifiers (corresponding to the three different FEMs: geometric features, 1-D SNC moments and polygon angles), using a sample of 1000 randomly selected patterns.
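The three geometric features are simple to compute once the perimeter, area, maximum distance and mean boundary-to-centroid distance of the shape are available. A minimal sketch, with the caveat that the exact normalizations of the garbled original formulas are inferred from the surrounding text:

```python
import math

def geometric_features(P, A, d_max, g_mean):
    """Compute the three normalized geometric features:
    P      - length of the shape's perimeter
    A      - area of the shape
    d_max  - maximum distance of the shape
    g_mean - mean distance from the boundary to the centroid.
    The normalizations are our reading of the (garbled) original text."""
    C = 4 * math.pi * A / P ** 2        # normalized inverse compactness
    A_bar = A / (math.pi * d_max ** 2)  # normalized area
    L = g_mean / d_max                  # normalized length
    return C, A_bar, L
```

For a disk of radius 1 (P = 2π, A = π, d_max = 1), the inverse compactness evaluates to 1, its maximum, which is consistent with the usual compactness measure.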
After the training phase of the NNs has been completed, the DM-Net is trained with a set of training patterns using the back-propagation algorithm. A sample of 500 training patterns, randomly selected from the set of 10560 possible different versions produced by the ten prototypes, is used and the adaptation algorithm is applied for only 1000 iterations. For comparison purposes, the classification procedure is demonstrated using three different operations for the DM-Net:
(a). Each NN contributes to the multiclassification procedure with the same level of importance, i.e. the DM-Net performs only a linear summation of the outputs of the NNs;
(b). Each classifier contributes to the multiclassification procedure in proportion to its ability to discriminate among the set of training patterns. This is achieved by determining statistically the contribution of each NN; and
(c). The DM-Net is a Neural Network itself and determines the contribution of each single output of the NNs of the first stage by performing a fusion of the classifiers' outputs, using the set of training patterns.
Table 3 demonstrates the classification efficiency of the NEMUS (considered with the three versions of the DM-Net), for various combinations of the independent FEMs, in terms of the percentage of correctly classified patterns and of the classification error, using the sample of 1000 randomly selected patterns.
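The first two DM-Net operating modes can be sketched directly; mode (c) would replace the weighted sum with a small trained network. The output vectors below and the use of the Table 2 accuracies as statistical weights are illustrative assumptions, not the paper's exact weighting rule.

```python
import numpy as np

def fuse_equal(outputs):
    """(a) plain summation: every classifier contributes equally."""
    return np.sum(outputs, axis=0)

def fuse_weighted(outputs, accuracies):
    """(b) each classifier weighted in proportion to its measured
    discrimination ability on the training set (here, its accuracy)."""
    w = np.asarray(accuracies, dtype=float)
    return (w / w.sum()) @ np.asarray(outputs, dtype=float)

# Hypothetical output vectors of the three first-stage NNs for one
# pattern (10 classes in the paper; 3 here to keep the sketch short).
outputs = [[0.6, 0.3, 0.1],   # NN on geometric features G
           [0.2, 0.7, 0.1],   # NN on 1-D SNC moments H
           [0.1, 0.8, 0.1]]   # NN on polygon angles A
decision = int(np.argmax(fuse_weighted(outputs, [0.554, 0.841, 0.756])))
```

Weighting by the Table 2 accuracies lets the more reliable moment- and angle-based classifiers dominate the weaker geometric one.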
TABLE 2: The classification efficiency of the independent FEMs

Independent FEM       | Correctly classified (%) | Classification Error (MSE)
Geometric Features G  | 55.4                     | 0.0816
1-D SNC Moments H     | 84.1                     | 0.0302
Polygon Angles A      | 75.6                     | 0.0550

TABLE 3: The classification efficiency of NEMUS using several combinations of the three FEMs

Combined  | Correctly classified (%)  | Classification Error (MSE)
FEMs      | (a)    (b)    (c)         | (a)      (b)      (c)
G, H      | 81.3   93.8   96.0        | 0.0372   0.0234   0.0176
G, A      | 77.6   86.2   88.7        | 0.0413   0.0337   0.0273
H, A      | 93.6   93.0   97.4        | 0.0248   0.0369   0.0129
G, H, A   | 95.5   97.1   98.3        | 0.0274   0.0192   0.0098

4. Conclusions
A NEural MUlticlassifier System (NEMUS) is proposed for high-performance classification applications. The proposed multiclassifier is composed of two stages. The first stage is comprised of several classifiers that operate in parallel, while the second stage is a Decision-Making Network (DM-Net) that performs the discrimination task using a fusion of the outputs of all the classifiers of the first stage, so that the discrimination efficiency is optimized. The NEMUS is applied to a shape recognition task of 2D digitized objects under various levels of shape distortion. Three different kinds of features, which characterise a digitized object, are used: geometric features, 1-D scaled normalized central moments and the angles resulting from a fast polygon approximation method. NEMUS is suitable for generic classification applications, such as shape discrimination, signal detection and remote sensing, and has the following advantages and characteristics:
* Different types of classifiers may be used simultaneously. Thus, for each independent FEM, the most appropriate type of classifier may be used, in order to achieve the highest discrimination efficiency.
* The contribution of each FEM is stored as the synaptic weights and biases of the DM-Net, which are automatically determined using a set of training patterns. Thus, prior knowledge concerning the type and number of classifiers which should be used in a particular application is not required.

5. References
[1] K. S. Fu, Syntactic Pattern Recognition and Applications, Englewood Cliffs, NJ: Prentice-Hall, 1982.
[2] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, London, U.K.: Prentice-Hall, 1982.
[3] R. J. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, New York: John Wiley and Sons, 1992.
[4] R. P. Lippmann, "An introduction to computing with neural nets," IEEE Acoustics, Speech and Signal Processing Magazine, pp. 4-22, April 1987.
[5] D. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing, Cambridge, MA: MIT Press, 1986.
[6] L. Shen, R. M. Rangayyan and J. E. Leo Desautels, "Application of shape analysis to mammographic calcifications," IEEE Trans. on Medical Imaging, vol. 13, no. 2, pp. 263-274, 1994.
[7] R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd ed., Reading, MA: Addison-Wesley, 1987.
[8] B. G. Mertzios, "Shape discrimination in robotic vision using scaled normalized central moments," Proceedings of the IFAC Workshop on Mutual Impact of Computing Power and Control Theory, pp. 281-287, Prague, Czechoslovakia, September 1-2, 1992.
[9] B. G. Mertzios and D. A. Mitzias, "Fast shape discrimination with a system of neural networks based on scaled normalized central moments," Proceedings of the International Conference on Image Processing: Theory and Applications, pp. 219-223, San Remo, Italy, June 10-12, 1993.
[10] D. A. Mitzias and B. G. Mertzios, "Shape recognition with a neural classifier based on a fast polygon approximation technique," Pattern Recognition, vol. 27, no. 5, pp. 627-636, 1994.
Proceedings IWISP '96, 4-7 November 1996, Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Application of a Neural Network for multifont Farsi character recognition using fuzzified Pseudo-Zernike moments

Mehdi Namazi, M.S. Student, and Karim Faez, Associate Professor
E.E. Dept., Amirkabir University of Technology, Tehran, Iran, 15914
Email: [email protected]

Abstract

In this paper an algorithm is developed for the recognition of printed Farsi characters with various fonts, irrespective of size, rotation and stroke. The system operates on an image of characters that can be obtained from a standard digital scanner (e.g. 300 dpi). The approach consists of four main stages: pre-processing, feature extraction, fuzzification and classification. In the pre-processing stage, the image is first binarized and then passed through a noise-reducing filter [1]. The image is finally thinned to compensate for any difference in stroke width (e.g. bold characters). In the feature extraction stage, the selected features are derived from a set of moments known as Pseudo-Zernike Moments (PZM). These moments offer several advantages over the regular moments that have been used in pattern recognition problems. In the third stage, the moments are fuzzified to a set of objects with a continuum of grades of membership. In the last step, input characters are classified by a feedforward neural network using the backpropagation learning method. In this paper, we have also compared the classification results using fuzzified moments and nonfuzzy moments as network input. The learning patterns are printed characters in one group of fonts, and the test patterns are another printed character group with fonts different from the learning group. This comparison shows that a neural network with fuzzy inputs (FNN) has better performance on unclear inputs.

Key words: Multifont character recognition, Farsi/Arabic characters, Fuzzified Pseudo-Zernike Moments (PZM), Fuzzy Neural Network.
I - Introduction

Dealing with uncertainty is a common pattern recognition problem, and fuzzy set theory has proved to be significantly important in pattern recognition problems [6]. Feedforward neural networks are usually trained with examples to learn the rules and functions of a real-world system through the error back-propagation algorithm. Fuzzy-neural networks (FNN) combine the learning capability of neural networks with the knowledge-representation capability of fuzzy sets. When the input pattern is clear, a NN gives a reliable answer. From the point of view of uncertainty, however, when the input pattern is not clear enough, the NN results are unreliable, whereas a FNN can easily classify these patterns and can learn the difference between similar patterns. The network learns the patterns using the backpropagation learning method. Note that we could use the feature moments directly as network inputs, but using fuzzified moments as network inputs has two advantages: a decrease in network learning convergence time and a decrease in system error. The structure of this paper is as follows: Section II discusses the properties of Farsi characters and introduces the selected character classes. Section III discusses Pseudo-Zernike Moments (PZM). In section IV the assigned fuzzy sets are introduced. Section V presents the experimental results. The conclusion is presented in section VI.
II - Farsi characters

In the Farsi language, there are 32 main characters. Depending on the position of each character in a word, it may have 2 to 4 different shapes (fig.1). We have only considered the main form of each character. Dots play an important role in Farsi characters. For example, as seen in fig.2, there are three different characters, but graphically they differ only in the number of dots below and above the character. To simplify, we neglect these dots and consider the characters in their main form without dots. Thus the number of classes reduces to 18 (see fig.3).
fig.2 Various Farsi characters having different dots
fig.1 Typical different forms of a Farsi character depending on its position in a word
fig.3 Simplified character classes

III - Pseudo-Zernike Moments

Zernike polynomials were first introduced in 1934 [2], and were later derived from the requirements of orthogonality and invariance. Zernike polynomials, being invariant with respect to rotations of the axes about the origin, are polynomials in the x and y variables. A related orthogonal set of polynomials in x, y and r, derived in [3], has properties analogous to those of the Zernike polynomials and is called the pseudo-Zernike polynomials; it differs from the Zernike set in that the real-valued radial polynomials are defined as:
    R_nl(r) = sum_{s=0}^{n-|l|} (-1)^s * (2n+1-s)! / [ s! (n-|l|-s)! (n+|l|+1-s)! ] * r^(n-s)
            = sum_{k=|l|}^{n} S_n|l|k * r^k
Here n = 0, 1, 2, ... and l takes on positive and negative integer values subject to |l| <= n only. By simple enumeration, this set of pseudo-Zernike polynomials contains (n+1)^2 linearly independent polynomials of degree <= n. In this research we have used PZM of order 5 [5],[4].

IV - Fuzzy sets

Pseudo-Zernike moments are normalised to vary between -1 and 1. We define five classes of objects with the membership functions shown in fig.4. These classes are:

Negative (N): for the numbers equal to or less than -0.5.
Negative Zero (NZ): for the numbers between -1 and 0.
Zero (Z): for the numbers between -0.5 and +0.5.
Positive Zero (PZ): for the numbers between 0 and +1.
Positive (P): for the numbers equal to or greater than +0.5.
It means that each moment is presented as a five-element row vector containing the membership values of the moment in the five sets. We name this vector FPZM (Fuzzy PZM). FPZM is defined as follows:

    FPZM = [N, NZ, Z, PZ, P]
These fuzzified moments are fed directly to the network inputs. For example, if a PZM moment is 0.885, the related FPZM is [0, 0, 0, 0.23, 0.77], so the number of network input nodes is five times that of a network with nonfuzzy inputs. Note that this set of fuzzy classes is not the only solution; some other fuzzy sets may give the system a better performance.
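The two computational steps of sections III and IV can be sketched together. The radial polynomial follows the factorial definition above (so R_nl(1) = 1 and the order-5 set contains (5+1)² = 36 polynomials). The triangular membership shapes, centred at -1, -0.5, 0, 0.5 and 1 with half-width 0.5, are an assumption chosen to reproduce the worked example (0.885 maps to [0, 0, 0, 0.23, 0.77]) and may differ from the exact curves of fig.4.

```python
from math import factorial

def pzm_radial(n, l, r):
    """Real-valued pseudo-Zernike radial polynomial R_{nl}(r), |l| <= n."""
    l = abs(l)
    return sum((-1) ** s * factorial(2 * n + 1 - s)
               / (factorial(s) * factorial(n - l - s) * factorial(n + l + 1 - s))
               * r ** (n - s)
               for s in range(n - l + 1))

def fuzzify(m):
    """Map a normalised moment in [-1, 1] to the FPZM vector [N, NZ, Z, PZ, P].
    Triangular membership shapes are an assumption (see lead-in)."""
    return [max(0.0, 1.0 - abs(m - c) / 0.5) for c in (-1.0, -0.5, 0.0, 0.5, 1.0)]
```

With order-5 moments this yields 36 PZM values per character and 36 × 5 = 180 fuzzified network inputs, matching the network size reported in section V.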
fig.4 Membership functions of the assigned fuzzy sets: a) Class "N" b) Class "NZ" c) Class "Z" d) Class "PZ" e) Class "P" f) Overlapping classes

V - Experimental results

The fuzzy sets are used as the inputs of a feedforward neural network with 180 inputs, 50 nodes in the hidden layer and 18 output nodes. We have used two groups of fonts (one sample for each character of each font):
Group 1 (4 fonts): ZAR, LOTUS, TRAFFIC, BADR
Group 2 (2 fonts): SADEH, B-LOTUS
(These are Farsi font names.) All of the images are passed through a noise-reducing system and then thinned to remove differences in stroke width. Two experiments were done. In the first experiment, group 1 is used as learning data and group 2 as test patterns. In the second experiment, group 2 is used as learning data and group 1 as test patterns. The results can be seen in table 1. The system is compared with a similar system without fuzzified moments (see fig.5).
fig.5 Fuzzy system used for classification (top) and nonfuzzy system used for comparison (bottom)

Table 1: Experimental results

Input             | Learning data | Test data | Number of errors | Number of epochs | Convergence time
Fuzzy moments     | group 1       | group 2   | 1/32             | 1588             | 3014
Numerical moments | group 1       | group 2   | 2/32             | 5185             | 4224
Fuzzy moments     | group 2       | group 1   | 10/64            | 382              | 510
Numerical moments | group 2       | group 1   | 18/64            | 913              | 401

Considering the table contents, it is clear that when the network is trained with more samples, there is a small decrease in system error with respect to the regular network, but the main gain is a smaller learning time and
fewer iterations (see fig.6). On the other hand, when the network is trained with few samples, the relative system error is high. In this case, the fuzzy neural network needs more convergence time but makes fewer errors than the regular one. In the highly trained system, the fuzzy system shows a 3% decrease in system error and a 28.6% decrease in convergence time, and in poorly trained systems it shows a 12% decrease in system error.
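The quoted percentages follow directly from the Table 1 entries; the arithmetic below uses the error counts and convergence times as read here (the column assignment in the reconstructed table is itself an assumption).

```python
# Relative improvements of the fuzzy network over the regular one.
err_fuzzy_hi, err_reg_hi = 1 / 32, 2 / 32        # highly trained system
time_fuzzy_hi, time_reg_hi = 3014, 4224
err_fuzzy_lo, err_reg_lo = 10 / 64, 18 / 64      # poorly trained system

error_drop_hi = 100 * (err_reg_hi - err_fuzzy_hi)                  # about 3%
time_drop_hi = 100 * (time_reg_hi - time_fuzzy_hi) / time_reg_hi   # about 28.6%
error_drop_lo = 100 * (err_reg_lo - err_fuzzy_lo)                  # about 12%
```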
Fig.6: Sum-square error vs. epoch for the fuzzy neural net (top) and the regular net (bottom)

VI - Conclusion

In this paper a set of fuzzy features based on Pseudo-Zernike moments has been used for size, rotation and translation invariant recognition of printed Farsi characters with various fonts. A fuzzy neural network is used as the classifier. It is shown that using fuzzy features as inputs for neural networks improves the system accuracy for uncertain inputs.

References
[1] R. Haralick and L. Shapiro, Computer and Robot Vision, vol. 1, Addison-Wesley, 1992.
[2] F. Zernike, Physica, vol. 1, p. 689, 1934.
[3] A. B. Bhatia and E. Wolf, Proc. Camb. Phil. Soc., vol. 50, pp. 40-48, 1954.
[4] M. H. S. Shahreza, K. Faez and A. Khotanzad, "Recognition of handwritten Farsi numerals by Zernike moments features and a set of class-specific neural network classifiers", ICSPAT, Dallas, Texas, USA, Oct. 18-21, 1994.
[5] C. H. Teh and R. T. Chin, "On image analysis by the methods of moments", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 10, no. 4, July 1988.
[6] S. Sasi and J. S. Bedi, "Fuzzy-neuro identification system for hand-written characters", ACCV'95 Second Asian Conference on Computer Vision, December 5-8, Singapore, pp. II-753-757.
[7] H. K. Kwan and Y. Cai, "A fuzzy neural network and its application to pattern recognition", IEEE Trans. on Fuzzy Systems, vol. 2, no. 3, Aug. 1994, pp. 185-193.
[8] S. Sasi and J. S. Bedi, "Handwritten character recognition using fuzzy logic", 37th MWSCAS, Lafayette, USA, 1994, pp. 1-11.
Integrating LANDSAT and SPOT images to improve land cover classification accuracy

Alessandra Chiuderi
Space Applications Institute, Agriculture Information Systems, Joint Research Centre - 21020 Ispra (Va), Italy
In this paper the use of multi-sensor and multi-temporal data for land cover classification purposes is investigated. In particular, a multilayer Neural Network trained by means of the Back Propagation algorithm is employed for classification experiments on remotely sensed images. The data set employed is composed of two co-registered images of the agricultural area surrounding the city of Valladolid (Spain), acquired on two different dates, June and July, by two different satellites, SPOT and LANDSAT respectively, having different spectral and spatial resolutions; ground truth data were acquired during an in situ campaign carried out by the Spanish Ministry of Agriculture in 1993. Three different data sets, i) SPOT data, ii) LANDSAT data and iii) (SPOT+LANDSAT) data, are employed here for the same land cover classification task in order to investigate the role of data integration and to compare the results.
1. Introduction
Remote sensing (RS) is defined as the science of acquiring information on a given object without any physical contact with the object itself [1]. Even if this extremely wide definition includes every kind of means which allows long-distance information acquisition, in the present context we shall be concerned only with remotely sensed images acquired by space satellites. RS constitutes an extremely interesting application and research field as far as image processing is concerned: firstly, we are asked to deal with real data; secondly, the amount of data available is enormous; thirdly, there is a growing interest in RS image processing, as it can be considered one of the most powerful tools for Earth observation and change monitoring; and, last but not least, the amount of data concerning the same area acquired by different sensors, together with future developments in sensor technology, makes it mandatory to develop techniques which can both deal with different sources and select, among all the available data, the ones carrying more information for a given task [2]. The work presented here has been carried out within the MARS project of the Joint Research Centre of the European Commission [3], one of the widest projects in terms of RS applications. The aim of the MARS (Monitoring Agriculture with Remote Sensing) project is to provide decision support to the Commission as far as agricultural policies are concerned. Within this context, LANDSAT and SPOT images of 60 selected European sites acquired throughout the growing season, together with ground surveys, are available at the Joint Research Centre at Ispra.
2. Integrating different data sets

The acquisition of different images throughout the growing season allows crop monitoring and early estimation of land cover type, thereby increasing class separability. As a matter of fact, different cultures usually have different growing cycles, and the comparison between two successive images may make it possible to separate crops that are indistinguishable within a single image. It might be believed that class separability increases as the season advances, so that earlier images always carry less information and therefore lead to less accurate classifications; this is generally true, but not always.
In a mid-summer image, for instance, permanent cultures such as pastures or forages may be confused with maize, but if an earlier image of the same site is available, a simple comparison of the two should allow class separation, as maize has not yet appeared (the field would give a response similar to bare soil) while forages will instead show the typical strong reflectance in the mid-infrared. In this paper two images of Valladolid (Spain), acquired by two different satellites, SPOT and LANDSAT, at the beginning of June and in mid July respectively, are employed. The advantages offered by this data set are the following: SPOT, having 20 m spatial resolution, is more suitable for agriculture monitoring in a country such as Spain, whose landscape is characterised (as most of Europe) by small fields; LANDSAT, on the contrary, despite the coarser spatial resolution (30 m), also acquires data in the mid-infrared region of the electromagnetic spectrum, which is particularly important for vegetation response. The use of Neural Networks (NN) in remote sensing image processing is not new: starting from the late eighties, several authors have employed this technique as a useful and suitable processing tool, in particular for image classification, as illustrated in the interesting review paper [4]. It has also been shown that NN, not requiring any hypothesis on the data distribution, represent an extremely powerful tool when dealing with the integration of different data sets [5]. In this study, the same area was classified by employing only SPOT data, only LANDSAT data and successively both SPOT and LANDSAT data sets together. The results reported show how the integration of the two data sets increased the classification accuracy.

4. The data

The SPOT image, acquired on June 1st, 1993, comprises three bands: (0.50-0.59 µm), (0.61-0.68 µm) and (0.79-0.89 µm). The image was geo-corrected by means of the GRIPS software [6] and a subscene of 2000 lines and 2000 pixels was employed for these experiments. The LANDSAT TM image, composed of 6 bands ranging from the visible to the near- and mid-infrared, was acquired on July 17th, 1993. This image was coregistered to the SPOT image, resampled in order to have the same ground resolution, geo-corrected and successively a subset of 2000 lines and 2000 pixels was extracted for bands 2 (0.52-0.60 µm), 3 (0.63-0.69 µm), 4 (0.76-0.90 µm) and 5 (1.55-1.75 µm).
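Once the two images are coregistered and resampled to a common grid, the (SPOT+LANDSAT) integration amounts to stacking the bands of each pixel into a single feature vector for the network; a sketch (array names and the toy grid size are illustrative assumptions, the paper uses 2000 x 2000 subscenes):

```python
import numpy as np

# Hypothetical coregistered subscenes on a common grid, shape
# (rows, cols, bands); a small grid keeps the sketch light.
rows, cols = 64, 64
spot = np.zeros((rows, cols, 3), dtype=np.float32)     # SPOT bands 1-3
landsat = np.zeros((rows, cols, 4), dtype=np.float32)  # TM bands 2, 3, 4, 5

# Each pixel's stacked bands become the network input:
# 3 + 4 = 7 input neurons per pixel for the combined experiment.
stacked = np.concatenate([spot, landsat], axis=-1)
features = stacked.reshape(-1, stacked.shape[-1])
```

The SPOT-only and LANDSAT-only experiments use the same pipeline with 3 and 4 input neurons respectively.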
Figure 1: A Multi Layer Neural Network (input nodes, first and second hidden layers, connected by weights)
Ground truth data were acquired during an on-site campaign carried out by the Ministry of Agriculture in Spain in 1993, independently of the MARS project. The data set employed for the experiments reported in the next section counted 27080 pixels representing 13 different land cover classes; 18055 were used to train the NN, whereas 9025 were employed exclusively to evaluate the results obtained.

5. Experiments and Results
Three sets of experiments were performed on the data available: data extracted from the SPOT, LANDSAT and (SPOT+LANDSAT) images respectively were used to train a multilayer Neural Network (as shown in Figure 1) by means of the Error Back Propagation algorithm [7]. In each case the network was therefore constituted by a variable number of input neurons (according to the data set employed), a variable number of hidden nodes arranged into 1 or 2 layers and 13 output neurons corresponding to the 13 land cover classes listed in Table 1. Establishing which is the best architecture for a given task is not easy, as different architectures usually have different performances in terms of single per-class accuracy; the results reported here therefore refer to the "best" architecture in terms of average omission precision. In Table 1 the omission and commission accuracies on the test set obtained for the three different data sets are summarised.

Table 1: Classification accuracies (%) on the different data sets

               |      SPOT       |     LANDSAT     |  SPOT+LANDSAT
Class          | Omis.  | Comm.  | Omis.  | Comm.  | Omis.  | Comm.
Cereals        | 94.80  | 87.67  | 95.18  | 86.88  | 97.67  | 97.13
Sunflower      | 79.12  | 50.79  | 80.04  | 70.37  | 92.22  | 86.30
Potatoes       | 17.14  | 54.55  | 74.29  | 96.30  | 81.25  | 83.87
Sugar Beet     |  6.78  | 100.0  | 84.75  | 87.72  | 86.36  | 91.94
Forage         | 12.06  | 43.64  | 42.21  | 71.19  | 60.80  | 76.10
Set aside      | 36.56  | 55.63  | 83.07  | 95.27  | 93.08  | 88.74
Permanent      |  4.29  | 31.58  | 37.14  | 50.00  | 61.31  | 50.30
Woods          | 68.08  | 82.39  | 92.73  | 67.24  | 75.85  | 91.04
Built          | 46.09  | 48.30  | 58.97  | 80.00  | 74.31  | 83.40
Lands          | 27.87  | 34.96  | 24.83  | 65.92  | 72.27  | 57.03
Dry Pulses     | 75.95  | 75.95  | 29.11  | 53.49  | 79.01  | 98.46
Other Cereals  | 18.89  | 47.22  | 25.56  | 85.19  | 65.52  | 85.07
Water          | 100.0  | 100.0  | 81.82  | 100.0  | 100.0  | 100.0
Overall        |      71.75      |      80.70      |      88.66
It can be noticed that, generally speaking, the accuracies on the SPOT data set are lower than the corresponding ones for the other two data sets. In particular, for class 4 (sugar beet) and class 7 (permanent cultures, such as vineyards or olives) practically all pixels are classified into class 2 (sunflower), which therefore shows a quite low commission accuracy. This mis-classification could partially be due to the great difference in the number of pixels for each of the 3 classes: class 2, being so numerous (1092 samples versus 59 for class 4 and 140 for class 7), might over-train the network in its favour. It must also be said that, due to climatic conditions, sunflowers (class 2) at the beginning of June in southern Europe are usually not yet in bloom, giving therefore a quite confused signal (bare soil, spontaneous vegetation and sunflowers) which, combined with the high number of samples, can lead to high commission errors as far as class 2 is concerned, and low omission precision for all other classes.
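Assuming the usual remote-sensing conventions, omission accuracy is the producer's accuracy of a class and commission accuracy its user's accuracy; both follow from the confusion matrix:

```python
import numpy as np

def omission_commission(cm):
    """Per-class accuracies (in %) from a confusion matrix with true
    classes on rows and predicted classes on columns. Reading
    omission accuracy as producer's accuracy and commission accuracy
    as user's accuracy is the assumed convention here."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    omission = 100.0 * diag / cm.sum(axis=1)    # share of true pixels recovered
    commission = 100.0 * diag / cm.sum(axis=0)  # share of labelled pixels correct
    overall = 100.0 * diag.sum() / cm.sum()
    return omission, commission, overall

# Toy 2-class example: 90 of 100 true 'cereals' pixels are found,
# but 30 'sunflower' pixels are also labelled cereals.
om, com, acc = omission_commission([[90, 10], [30, 70]])
```

A class that absorbs pixels from other classes (like sunflower in the SPOT experiment) keeps a high omission accuracy while its commission accuracy drops.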
6. Conclusions

In this paper, the use of a multi-layer neural network for land cover classification purposes has been investigated. In particular, two data sets have been selected for the experiments reported here, all data referring to the same area, the agricultural region surrounding the city of Valladolid (Spain). The purpose of this paper was to evaluate the contribution of data integration as far as classification accuracy is concerned, and therefore to compare the results obtained on the three data sets outlined above. It is common opinion that neural networks represent a suitable tool for classification problems, especially when the mathematical modelling of the input data is difficult: as a matter of fact, such techniques, not requiring any hypothesis on the data distribution, are particularly useful in applications such as the one presented here. The results reported in the previous section highlight the importance of data integration and, in particular, of the use of multi-temporal data: the classification based on SPOT led to very poor results, both in terms of overall accuracy and in terms of average omission and commission precision, 45.2% and 62.51% respectively; the experiments carried out on LANDSAT data showed an omission precision of 62.28% and a commission precision of 77.67%, certainly due to the increased amount of information concerning the mid-infrared region of the electromagnetic spectrum; the (SPOT+LANDSAT) data set, on the contrary, scored an overall accuracy of 88.65% and omission and commission accuracies of 79.97% and 83.79% respectively. This dramatic difference cannot be due just to the higher number of components of the third data set.
As a matter of fact, it has been shown in [8] that high correlation between input channels decreases classification accuracy, and SPOT bands 1, 2 and 3 overlap with LANDSAT bands 2, 3 and 4; the increase in accuracy between the results on LANDSAT and on (SPOT+LANDSAT) should therefore definitely be due to the difference between the acquisition dates of the two images.

ACKNOWLEDGEMENTS

The author would like to thank Professor Vito Cappellini of the University of Florence for the helpful and encouraging discussions during the preparation of this paper.
REFERENCES
[1] Manual of Remote Sensing, 1983, R. N. Colwell (Ed.), American Society of Photogrammetry, Falls Church, VA.
[2] G. Wilkinson, A. Chiuderi: "Il telerilevamento alla fine del ventesimo secolo: una nuova sfida nel campo dell'informatica" - Proc. of the workshop Il telerilevamento ed i sistemi informativi territoriali nella gestione delle risorse ambientali - Trento, October 27, 1994, published by the Office for Official Publications of the European Communities - Luxembourg.
[3] F. J. Gallego, J. Delincé, C. Rueda: "Crop area estimates through remote sensing: stability of the regression correction", Int. J. Remote Sensing, 1993, Vol. 14, N. 18, pp. 3433-3445.
[4] J. D. Paola, R. A. Schowengerdt: "A review and analysis of backpropagation neural networks for classification of remotely-sensed multi-spectral imagery", Int. J. Remote Sensing, 1995, Vol. 16, N. 16, pp. 3033-3058.
[5] A. Chiuderi, S. Fini, V. Cappellini: "An Application of Data Fusion to Land Cover Classification of Remote Sensing Imagery: a Neural Network Approach" - Proc. of IEEE International Conference on Multisensor Fusion and Integration Systems - pp. 756-762 (1994).
[6] C. Casteras, G. Doyon, E. Martin, V. Rodriguez: "Corrections Geometrique et Atmospherique - Manuel Utilisateur", CISI-Geo Design, CCR (Ispra), RSO/DOT-CGA-MU, Ed. 2 (1994).
[7] P. D. Wasserman: Neural Computing, Theory and Practice, Van Nostrand Reinhold, 1989, New York.
[8] A. Chiuderi: "Improving the Counterpropagation network performances", Neural Processing Letters, 1995, Vol. 2, N. 2, pp. 27-30.
Classification of bottle rims using Neural Networks - an LMS approach

C. Teoh and J. Braham Levy
Control Systems Centre, UMIST, P.O. Box 88, Manchester M60 1QD, United Kingdom

Abstract

In the glass making industry, the quality of bottle rims is of great importance. Uneven or 'wavy' rims on the bottles can lead to the deterioration of the contents held in the bottles due to air ingress. Depending on what is to be held in the bottles, different limits of waviness can be tolerated. It is important, therefore, that bottles can be accurately classified according to their rim quality. Generally, the containers' quality is measured by applying pressure to a sealed container and observing the pressure profile as the air seeps out of the uneven rim. A means of using such data to classify bottles already exists through the use of a multi-layer perceptron (MLP) neural network with 100 inputs, a single 3-neuron hidden layer and a single-neuron output layer. A classification success rate of 95% had been recorded. However, because of the large number of inputs [2] and the complex algorithm used [4], this required neural networks that were impractical for real-time analysis. Applying such a neural network to classify bottles in real time may cause problems because of the amount of calculation involved. This paper describes a way of simplifying the classification neural network by first finding an LMS filter to model the pressure profile. By feeding the weights of the resulting filter (which are unique for each pressure profile) into the classification neural network, it was found that a much simpler neural network results. By choosing the length of the filter and the number of hidden neurons carefully, a 100% success rate for bottle classification can be achieved. Typically, this involved an MLP neural network with just 4 inputs, a 2-neuron hidden layer and a 6-neuron output layer.
Introduction A Dual Head Gauger (DHG) machine is usually used to measure the quality of glass bottle rims. At the heart of a DHG is a plunger which can move up and down at various speeds. A hole in the centre of the plunger allows air to be pumped through it. A bottle whose rim quality is to be measured will be placed below the plunger with its rim facing upwards. At the start of the DHG measurement cycle, the plunger will descend towards the bottle rim. Air will be pumped through the plunger at the same time. When the plunger comes into contact with the bottle rim, it will stop moving and the air which is now pumped into the bottle will cause the pressure within the bottle to build up. This increasing pressure will be measured by a pressure sensor within the DHG machine. After a fixed amount of time, the plunger will move up and away from the bottle rim. The air which was trapped in the bottle will be released and the measured pressure reduces rapidly.
The result is that for each bottle, the DHG machine will produce a pressure profile. Depending on the quality of the bottle measured, the pressure profile will be different. The quality of bottle rims is then determined by examining the pressure profiles. Bottles classified according to their rim quality are given classifications like wavy-40, wavy-50 etc., where wavy-40 indicates a waviness of 0.4 mm. In this study, we are required to classify each bottle into one of 6 different categories: good bottles (i.e. less than wavy-40) and wavy-40, -45, -50, -55 and -65 bottles.
The Pressure Profiles
Six different bottles, one from each classification, were used to obtain the pressure profiles used here. Using a DHG simulator, 30 separate pressure profiles were obtained from each bottle [3]. This was done by making the DHG perform 30 consecutive measurement cycles on each bottle and storing each pressure profile obtained in a separate file. Altogether, 180 pressure profiles were obtained (30 profiles from each of the 6 bottle types). The pressure profiles of the different bottles can be seen in Figure 1.
Figure 1: Plot of Pressure Profiles from the 6 Bottle Types
Each pressure profile is made up of 200 evenly spaced sampled points. The first point is obtained when the plunger starts its descent toward the bottle, and the last one is obtained soon after the plunger starts its ascent back to its home position. In each of the pressure profiles obtained, the DHG cycle time was set to 0.5 seconds. This makes the time between sample points approximately 2.5 ms. 10 of the profiles from each bottle type were used to train the classification neural network, and the remaining 20 profiles were used to test it.
It can be seen from the plots of the pressure profiles that although there were distinct boundaries between the classes good, wavy-40 and wavy-55, the classes wavy-45, wavy-50 and wavy-65 were very close together. It was anticipated, therefore, that any problems in the classification process would come from the latter 3 classes.
Review of Past Results
It had been reported [2] that by using a logistic-sigmoid classification neural network with 100 input points, 1 hidden layer of 3 neurons and a single-neuron output layer, a 95% success rate was obtained in the classification of the different bottles. The 100 input points were obtained from pressure readings around the peak of each pressure profile. The single output indicated the bottle classification according to its output value (between 0 and 1): the larger the output value, the less wavy the bottle. Although reasonably successful in classifying the bottles, this method required a large number of input points, and hence large amounts of computation. It would be useful if a small number of parameters could be used to represent each pressure profile and be fed into a similar MLP neural network to classify the bottles. In order to reduce the computational complexity, a method was sought whereby a set of parameters unique to the problem could be utilised. Initial success was found by using Principal Component Analysis. This reduced the input space from 100 points to 5 [4], with a consequent reduction in neural network complexity. The determination of the principal components, however, still required considerable computational effort. In an attempt to reduce this further, an approach utilising the LMS algorithm [6] was used. This produces an on-line estimate of a set of parameters for each pressure curve using the tap weights of the LMS filter.
The LMS Method
The LMS method of classifying bottles using their pressure profiles first requires a pre-processing LMS filter stage. In this pre-processing stage, each of the 100 sampled points of a pressure profile around its peak is passed through an LMS filter. The length of the filter determines the number of weights present in the filter. The idea is to adjust the weights in the filter so that when a consecutive time sequence of pressure samples is presented to the filter, it is able to "predict" the next pressure value in that sequence. All the weights of the LMS filter were initially set to zero. Using a fixed learning rate of 0.1, the weights were updated using the Widrow-Hoff rule as each set of inputs was presented to the filter. When the last set of inputs was reached, the weights would have converged to a value which was unique for each pressure profile. In a way, the 200 points of the pressure profile would have been "reduced" to just a handful. Once the weights of the filter have been obtained, they are fed into the neural network for classification.
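The pre-processing stage above can be sketched as follows. This is a minimal illustration using the zero initialisation, 4-tap length and fixed learning rate 0.1 stated in the text; the function name and the synthetic profile are ours, not the paper's.

```python
import math

def lms_profile_weights(profile, n_taps=4, mu=0.1):
    """Adapt an LMS one-step predictor over a pressure profile and return
    the final tap weights, which act as a compact signature of the curve."""
    w = [0.0] * n_taps
    for n in range(n_taps, len(profile)):
        x = profile[n - n_taps:n]                               # most recent samples
        e = profile[n] - sum(wi * xi for wi, xi in zip(w, x))   # prediction error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]          # Widrow-Hoff update
    return w

# A smooth synthetic "profile": for a slowly varying curve the four weights
# settle near 0.25 each, with small deviations characterising the curve,
# which is consistent with the narrow ranges reported in Table 1.
weights = lms_profile_weights([0.5 + 0.1 * math.sin(0.03 * n) for n in range(200)])
```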
A 4-input LMS filter was used. 10 pressure profiles from each of the 6 bottle types were presented to the filter. The resulting range within which each of the 4 weights fell is summarized in the table below.
              w1              w2              w3              w4
Good bottle   0.2517-0.2526   0.2407-0.2416   0.2300-0.2309   0.2197-0.2206
Wavy-40       0.2545-0.2550   0.2431-0.2436   0.2319-0.2324   0.2211-0.2217
Wavy-45       0.2546-0.2552   0.2418-0.2423   0.2292-0.2298   0.2170-0.2176
Wavy-50       0.2548-0.2552   0.2421-0.2426   0.2298-0.2304   0.2179-0.2186
Wavy-55       0.2544-0.2549   0.2427-0.2431   0.2312-0.2316   0.2201-0.2206
Wavy-65       0.2500-0.2508   0.2378-0.2385   0.2259-0.2265   0.2142-0.2149
Table 1: Weight Space Analysis

It can be seen that each type of bottle occupies its own characteristic weight space range. These ranges are observed to be quite narrow. It is also observed that there is considerable overlap in the weight 1 (w1) ranges among the different bottle types. The overlap in the other weights is much less.
The Classification Neural Network
Once the LMS filter weights had been found for a pressure profile, they were fed into a multilayer perceptron (MLP) neural network. The MLP used here had a single hidden layer with 2 neurons and an output layer with 6 neurons. The network is trained so that each of the 6 output neurons corresponds to a particular bottle type. Each bottle class would cause one of the outputs to be set to "1" while the rest of the outputs would be "0". Because of the continuous nature of the quality of bottle rims, however, it would not be realistic to expect an unclassified bottle to produce such a well-defined output. It was, therefore, envisaged that after the network had been trained, a post-processing stage consisting of a competitive transfer function would be used to "shape" the output. 10 curves from each bottle type were used for the training, while the remaining 20 were used to verify the effectiveness of the network. Training was done using the Levenberg-Marquardt algorithm from the neural network toolbox for MATLAB. The weights for the first layer of the neural network were initialized using the Nguyen-Widrow technique [5]. When the trained neural network was verified, it was found that all the pressure profiles not used in the training were classified correctly.
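The competitive post-processing stage can be sketched as a winner-take-all function: whichever of the 6 network outputs is largest is forced to "1" and the rest to "0". A minimal sketch (the function name is ours):

```python
def compete(outputs):
    """Winner-take-all "shaping" of the MLP outputs: the largest
    activation becomes 1.0, every other output becomes 0.0."""
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1.0 if i == winner else 0.0 for i in range(len(outputs))]

# Six soft network outputs, one per bottle class:
shaped = compete([0.1, 0.7, 0.2, 0.05, 0.1, 0.3])
```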
Conclusion
Using the LMS filter as a pre-processor to a bottle-classifying neural network, it was possible to reduce the number of inputs to the neural network from 100 to just 4. It was found that the resulting MLP neural network was able to classify 100% of the profiles introduced to it.
While this method incurred a pre-processing stage as an overhead, this was more than compensated for by a much simpler neural network. The pre-processing is easily implemented and can be performed in real time.
References
1. Markatos P., Industrial bottle inspection machine: Control and monitoring, Design Exercise, Control Systems Centre, UMIST, 1995
2. Levy J. B. and Markatos P., A Classifier for Sealing Rims of Glass Containers, 2nd International Workshop on Image and Signal Processing, Budapest, November 1995
3. Veryard M., Acquisition and Analysis of Data from a Bottling Machine, Design Exercise, Control Systems Centre, UMIST, 1996
4. Levy J. B. and Stefanis E., Signal Analysis for a Bottle Inspection Machine using Neural Networks, IEE Control96, Exeter, Sept 1996
5. Nguyen D. and Widrow B., Improving the Learning Speed by Choosing Initial Values of the Adaptive Weights, Intl. Conference on Neural Networks, vol. 3, pp. 21-26, 1990
6. Widrow B., Adaptive Filters, in Aspects of Network and System Theory, (eds.) N. DeClaris and R. E. Kalman, Holt, Rinehart and Winston, pp. 563-590, 1970
Invited Session M: WAVELETS AND FILTER BANKS IN COMMUNICATIONS
DATA COMPRESSION, DATA FUSION AND KALMAN FILTERING IN WAVELET TRANSFORM
Q. Jin†, K.M. Wong††, Z.Q. Luo†† and E. Bosse†††
† Mitel Corp., Ottawa, Ont., Canada K2K 1X3
†† CRL, McMaster University, Hamilton, Ont., Canada L8S 4K1
††† DREV, Courcelette, Quebec, Canada
ABSTRACT
In this paper, we propose an optimum fusion and Kalman filtering algorithm using wavelet packet decomposition. The performance is evaluated and simulation results are presented.

1. INTRODUCTION
Let the system equation for the dynamic variable x(n) be given by

x(n) = A(n)x(n − 1) + v(n),  n = 0, 1, ...,   (1)
where A(n) is the N × N matrix, and v(n) is the N × 1 noise vector of zero mean whose correlation function is E[v(n)v†(m)] = R_v(n)δ_mn, with δ_mn being the Kronecker delta and † being the conjugate transpose. Suppose that there are K sensors, and the measurement vector at the nth discrete instant of the kth sensor is represented by

y_k(n) = C_k(n)x(n) + η_k(n),  k = 1, 2, ..., K,   (2)

where C_k(n) is an M × N matrix (M ≤ N), and η_k(n) is an M × 1 noise vector of zero mean whose correlation function is E[η_k(n)η_k†(m)] = R_{η_k}(n)δ_mn. It was proved [1,2] that for the above signal model, the optimum fusion algorithm for multi-sensor Kalman filtering is equivalent to fusing the measurement vectors y_k(n) first and then applying the Kalman filtering algorithm. The fused measurement vector is
ỹ(n) = [Σ_{k=1}^{K} W_k(n)C_k(n)] x(n) + Σ_{k=1}^{K} W_k(n)η_k(n),   (3)
where the optimum weighting matrices are [2]

W_k(n) = Φ_K^{−1}(n) (C_k(n)C_1^#(n))† R_{η_k}^{−1}(n),  with  Φ_K(n) = Σ_{k=1}^{K} (C_k(n)C_1^#(n))† R_{η_k}^{−1}(n) (C_k(n)C_1^#(n)),   (4)
where # represents the pseudo-inverse of a matrix. In this paper, we apply the optimum fusion algorithm in the wavelet transform domain. At each sensor location, the observed data y_k(n) are first decomposed using wavelet packet decomposition [3]. In most cases, the signal energy is concentrated in the low-pass band, which contains most of the information and is therefore transmitted. For the other bands, only the coefficients above a threshold are transmitted. In this way, we may save communication bandwidth. At the fusion centre, we first fuse the information in the low-pass band and then apply the Kalman filtering algorithm. Due to the down-sampling procedure, the sampling rate is reduced at each decomposition level. The Kalman filtering can now be implemented at a low rate, and the system complexity is reduced. As for the other bands, since most of the expansion coefficients are zero, we fuse the coefficients in the corresponding band using the optimum fusion algorithm with no Kalman filtering being applied. These fused coefficients are combined with the outputs from Kalman filtering in the low-pass band to form the estimated output by the wavelet packet reconstruction algorithm. A general system diagram is shown in Fig. 1.

2. SYSTEM MODELING
Let the sampled signal be denoted by s(n), (n = 0, ±1, ...). Then we can compute the sequences s₀(n) = Σ_k f(2n − k)s(k) and s₁(n) = Σ_k g(2n − k)s(k), where {f(k)} and {g(k)} are wavelet coefficients [3]. If the signals s₀(n) and s₁(n) are further decomposed using the same relation as above, we obtain four sequences s₀₀(n), s₀₁(n), s₁₀(n), s₁₁(n). If this decomposition is carried on iteratively, we obtain the components of the original signal s(n) decomposed at various levels, with the general component denoted by s_(b_m)(n), where b_m is a binary number having m digits corresponding to the mth level of decomposition.
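The per-band thresholding that saves communication bandwidth can be sketched as follows. This is a minimal illustration with a single scalar threshold per band; the names are ours, not the paper's.

```python
def compress_band(coeffs, gamma):
    """Zero out subband coefficients whose magnitude falls below the
    threshold gamma; only the surviving coefficients would be sent to
    the fusion centre. Also report the fraction of coefficients dropped,
    i.e. the bandwidth saved for this band."""
    kept = [c if abs(c) >= gamma else 0.0 for c in coeffs]
    saved = sum(1 for c in kept if c == 0.0) / len(coeffs)
    return kept, saved

kept, saved = compress_band([0.9, 0.05, -0.4, 0.01], gamma=0.1)
```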
In order to carry out the Kalman filtering, the system dynamic model and the measurement model must be known at the low-resolution decomposition level.
Assuming b_m = 000, x_(000)(n) is related to x(n) as

x_(000)(n) = Σ_i f_l(8n − i) x(i),   (5)

where the z-transform of f_l(n) is related to that of f(n) as F_l(z) = F(z)F(z²)F(z⁴). Using (1), the dynamic model for x_(000)(n) can be approximated as
x_(000)(n) = [A(8n)]⁸ x_(000)(n − 1) + v_(000)(n).   (6)
The noise for the system dynamic model is v_(000)(n), which is zero mean, and its covariance matrix R_{v(000)}(n, m) is approximately Σ_{i₁,i₂=0} [A(8n)]^{i₁} R_v(8n) [A(8n)]^{i₂} R_{f_l}(i₂ − i₁) δ_mn (where R_{f_l} is the correlation function of f_l). The measurement model for sensor k at the low-resolution space can be obtained from (2) as
y_{k(000)}(n) = C_k(8n) x_(000)(n) + η_{k(000)}(n),   (7)
where the measurement noise is zero mean with covariance matrix R_{η_k(000)}(n, m) = R_{η_k}(8n)δ_mn. With the signal dynamic model of (6) and the measurement model of (7), the Kalman filtering algorithm can be applied to estimate the signal x_(000)(n) from the observations of sensor k. Similar relations can also be derived for the other b_m.

3. OPTIMUM FUSION
In this paper, we assume that a three-level wavelet packet decomposition is applied, giving eight packet outputs, and that the Kalman filtering is only applied to the low-pass band y_{k(000)}(n). Based on the assumption that most of the expansion coefficients y_{k(b)}(n) (b ≠ 000) are very small and can be ignored, we may set a threshold γ_{k(b)} such that an element of y_{k(b)}(n) is set to zero when it is smaller than the corresponding element of γ_{k(b)} (b ≠ 000). In this way, we may save communication bandwidth by transmitting only those elements of y_{k(b)}(n) whose values are larger than γ_{k(b)}. At the receiver, we have a set of coefficients y_{k(b)}(n) (k = 1, ..., K; b = 000, 001, ..., 111), most of which are zero for b ≠ 000. Before the Kalman filtering and wavelet packet reconstruction algorithm, the measurements y_{k(b)} of the different sensors in the same decomposition band should be fused into a single measurement with a set of weighting matrices W_{k(b)}(n). In order to derive the optimum weighting matrix W_{k(b)}, we consider the bands b = 000 and b ≠ 000 separately. For b = 000, with the measurement model of (7), the optimum weighting matrices W_{k(000)}(n) can be obtained using the same relations as (4) and (3). With these weighting matrices, we can obtain the optimum fused measurement ỹ_(000)(n). Now, with the fused measurement model and the system dynamic model of (6), the standard Kalman filtering can be applied to calculate the estimated values x̂_(000)(n).
For b ≠ 000, most of the coefficients y_{k(b)}(n) are small enough to be ignored, and we have a set of new observations received at sensor k after the data compression is applied with a threshold γ_{k(b)}. The approximate optimum fused vector for b ≠ 000 is obtained as

z_(b)(n) = Σ_{k=1}^{K} W_k(8n) ([|y_{k(b)}(n)| − γ_{k(b)}]⁺ ⊗ y_{k(b)}(n)),   (8)

where ⊗ represents the direct (element-wise) product of two vectors or matrices. For an N-element vector x, the ith element of [x]⁺ is defined as

[x]_i⁺ = 1 if x_i ≥ 0,  [x]_i⁺ = 0 if x_i < 0.   (9)
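The fused estimate of Eq. (8), with its Eq. (9) indicator mask, can be sketched as follows. For simplicity this sketch uses a scalar weight per sensor, whereas the paper uses the weighting matrices W_k(8n); all names are illustrative.

```python
def fuse_band(ys, Ws, gammas):
    """Fuse one subband across sensors: mask each sensor's coefficients
    with the indicator [|y| - gamma]+ of Eq. (9) (1 where |y| >= gamma,
    0 otherwise), apply the sensor weight, and sum over sensors."""
    n = len(ys[0])
    z = [0.0] * n
    for y, W, g in zip(ys, Ws, gammas):
        for i in range(n):
            keep = 1.0 if abs(y[i]) - g >= 0 else 0.0   # Eq. (9) indicator
            z[i] += W * keep * y[i]                     # weighted, masked sum
    return z

# Two sensors, two coefficients each; the second coefficient of sensor 1
# falls below its threshold and is dropped from the fusion.
z = fuse_band(ys=[[0.8, 0.02], [0.6, 0.5]], Ws=[0.5, 0.5], gammas=[0.1, 0.1])
```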
After the thresholding, most of the z_(b)(n) are zero for b ≠ 000, and it is not efficient to apply Kalman filtering in these bands. The estimated values x̂_(b)(n) for b ≠ 000 can be approximately obtained as

x̂_(b)(n) ≈ [Σ_{k=1}^{K} W_k(8n) C_k(8n)]^# z_(b)(n).   (10)

With x̂_(b)(n) from all bands, we may apply the wavelet packet reconstruction algorithm to obtain x̂(n).
4. PERFORMANCE ANALYSIS
The estimation performance includes the estimation bias β_x(n) and the variance V_x(n). In the following, we keep the assumption that the Kalman filtering is applied only to the low-pass band ỹ_(000)(n). The estimation bias at each wavelet decomposition band is then derived as

β_{x(b)}(n) = 0 for b = 000;  β_{x(b)}(n) = x_(b)(n) − E[x̂_(b)(n)] for b ≠ 000.   (11)
For b ≠ 000, β_{x(b)}(n) is

β_{x(b)}(n) = [Σ_{k=1}^{K} W_k(8n)C_k(8n)]^# Σ_{k=1}^{K} W_k(8n) ([γ_{k(b)} − |C_k(8n)x_(b)(n)|]⁺ ⊗ C_k(8n)x_(b)(n)).
The average bias β̄_x can be obtained by the wavelet packet reconstruction algorithm based on β_{x(b)}(n). For the bands where data are compressed and no Kalman filtering is applied, the major error introduced is the bias due to the data compression. The error due to the measurement noise is very small. This is because the measurement noise is evenly distributed over the coefficients if the original measurement noise η_k(n) is independent and identically distributed. Since most of the coefficients (generally, more than 95%) are cut to zero by the compression, the remaining noise power is very small, and it is further reduced by the fusion algorithm in the subband. Therefore, the total contribution of the measurement noise to the final variance in the estimation of x̂(n) is very small in comparison with the one introduced by the low-pass band. For the low-pass band (b = 000), because the Kalman filtering is asymptotically unbiased, the major error will be the variance, especially when n is large. This variance can be obtained from the Kalman iterative algorithm [1,2], and is denoted V_x(000). The average variance for the signal estimation is V̄_x = V_x(000)/8.

5. SIMULATION RESULTS
We assume that the target is moving with a constant angular velocity ω₀ = 2π/300 in a perfect circle of radius r₀ = 50 centred at the origin, with abscissa and ordinate represented by the co-ordinates u₁ and u₂. The equations of motion of the target in discrete time are given by u₁(n) = r₀ cos(ω₀n), u₂(n) = r₀ sin(ω₀n). The state equation for the above movement can be found in [2]. We use 3 sensors to carry out the multi-sensor Kalman filtering, with the co-ordinate systems [u₁, u₂, α] being [0, 0, 0°], [100, 0, 30°] and [100, 100, 60°] respectively (α is the angle with respect to the reference co-ordinates). The noise covariance matrix is R_v(n) = σ_v² diag(1, .01, 1, .01).
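The target trajectory of the simulation example can be reproduced directly from the stated equations of motion; a short sketch (variable names are ours):

```python
import math

# Constant angular velocity w0 = 2*pi/300 on a circle of radius r0 = 50
# centred at the origin, as in the simulation setup.
w0, r0 = 2 * math.pi / 300, 50.0
u1 = [r0 * math.cos(w0 * n) for n in range(300)]  # abscissa
u2 = [r0 * math.sin(w0 * n) for n in range(300)]  # ordinate
```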
In each of the examples, we compare the performance of target tracking employing the multi-sensor Kalman filtering using the optimum measurement fusion developed in [2] with that employing the algorithm derived in this paper for data compression. The measurement noise vectors in all the sensors have covariance matrix given by R_{η_k} = 10 diag(1, 1). For the target tracking, we are interested only in the position of the target and not its velocity. Therefore, we only estimate x_s = [u₁, u₂]. The wavelet we use is the Daubechies orthogonal wavelet with length L = 6 [3]. In the first example, the noise vector in the state equation has a covariance matrix with σ_v² = 0.4. Fig. 2(a) shows the target trajectory and tracking trace employing the algorithm developed in this paper, and Fig. 2(b) shows the corresponding total mean-square error at each sampling instant. The mean-square error using the optimum algorithm developed in [2] and the theoretical estimation error are also plotted for comparison. It can be found that the algorithm developed in [2] converges to the theoretical optimal solution, while the algorithm developed in this paper offers a better performance than both of them. We also calculate the estimation bias and variance respectively as β̄_x = 0.72 and V̄_x = 0.78, which gives the total mean-square error ε̄_x = 1.50. This is almost the same as the measurement in Fig. 2(b), which shows that our analysis and approximation in the performance derivation are valid. It can be observed that in the above example, the algorithm developed in this paper offers even better performance than the optimum solution. The reason is that in this example, the target moves smoothly with a small variance σ_v². Therefore, in the wavelet packet decomposition, most of the signal energy is concentrated in the low-pass band x_(000)(n). When we use thresholds to cut most of the coefficients in the other bands to zero, we introduce a small bias β̄_x. At the same time, we also suppress 7/8 of the noise power. Therefore, with a small increase in the estimation bias, the estimation variance is greatly reduced. As a result, the total mean-square error is decreased. In our second example, all the simulation parameters are the same as in the first example except that the noise variance in the state equation is increased to σ_v² = 10. Fig. 2(c) shows both the target trajectory and the tracking trajectory, and Fig. 2(d) shows the mean-square error. Once again, the algorithm developed in [2] converges to the theoretical optimum solution. However, for this example, the performance of the algorithm developed in this paper is not as good as the optimum solution. This is because in this example the target moves much more irregularly due to the high noise variance σ_v² in the state equation. After the data compression, we have a relatively large bias β̄_x = 10.46, and the variance V̄_x = 0.83 is almost negligible compared with the bias. As a result, the total mean-square error ε̄_x = 11.29 is larger than the optimum value of 5.4.
Fig. 1: General system structure. Fig. 2: Simulation results.

6. CONCLUSIONS
In this paper, we have examined the problem of multi-sensor Kalman filtering using wavelet packet decomposition. After wavelet packet decomposition, most of the expansion coefficients in the high-frequency bands can be ignored. As a result, the data transmission rate is reduced and communication bandwidth is saved. The optimum fusion and the corresponding Kalman filtering algorithms in the transform domain are developed. Due to the reduced sampling rate in the transform domain, the computation load for Kalman filtering at the fusion centre is eased. The performance of this algorithm is analysed, and simulation results are shown for both the algorithm developed in this paper and the optimum fusion algorithm proposed in [2]. The performance varies with different target movements. However, the algorithm in this paper shows obvious advantages in both computation rate and communication bandwidth.

REFERENCES
[1] Willner, D., Chang, C.B. and Dunn, K.P., "Kalman Filtering Configurations for Multiple Radar Systems", Massachusetts Institute of Technology Lincoln Laboratory, Technical Note 1976-21, April 14, 1976.
[2] Jin, Q., Li, J.Y., Luo, Z.Q., Wong, K.M. and Yip, P., "Data Fusion and Data Compression for Multi-Sensor Kalman Filtering", CRL Report No. 296, McMaster University, March 1995.
[3] Wickerhauser, M.V., "INRIA Lectures on Wavelet Packet Algorithms", Numerical Algorithms Research Group, Dept. of Mathematics, Yale University, 1991.
Performance of wavelet packet division multiplexing in timing errors and flat fading channels
J. Wu, K. M. Wong, and Q. Jin
Communications Research Laboratory, McMaster University, Hamilton, Ontario, L8S 4K1, Canada.
Abstract Wavelet Packet Division Multiplexing (WPDM) is a multiple signal transmission scheme based on the orthogonality of wavelet packet basis functions. We investigate the performance of WPDM under two types of interference: (1) timing errors between the transmitter and the receiver; (2) the presence of fast flat fading. Pilot Symbol Assisted Modulation (PSAM) is applied to combat the irreducible error due to flat fading. The probability of error of a WPDM system with PSAM in flat fading channels is analysed. A performance comparison between WPDM and FDM is presented. It is shown that the pilot spacing for WPDM can be larger than the spacing for FDM. Therefore, WPDM wastes less energy in pilot symbols and gains capacity when compared with the FDM scheme.
1. Introduction
Wavelets and wavelet packets have attracted considerable attention recently among researchers in many fields. In the area of communications, wavelet packets are applied, among others, in multiplexing [1], spreading codes in CDMA [2], and coding waveforms in binary transmission [3]. In this paper, we apply wavelet packets to multiplexing and develop a scheme called Wavelet Packet Division Multiplexing (WPDM). From a scaling function φ₀₁(t), a wavelet packet can be formed [4] and represented as a binary tree. Let (ℓ, m) represent the mth node at the ℓth level. The basis functions at node (ℓ, m) are given by the following iterative formulae:
φ_{ℓ,2m−1}(t) = Σ_n h[n] φ_{ℓ−1,m}(t − nT_{ℓ−1}),
φ_{ℓ,2m}(t) = Σ_n g[n] φ_{ℓ−1,m}(t − nT_{ℓ−1}).
By choosing h[n] and g[n] such that φ₀₁(t) satisfies ⟨φ₀₁(t), φ₀₁(t − nT₀)⟩ = δ[n] (where δ[n] is the Kronecker delta), all φ_{ℓm}(t) in the leaf nodes of a complete tree, together with their translated versions, are orthogonal; i.e., ⟨φ_{ℓm}(t), φ_{λμ}(t − 2^ℓ kT₀)⟩ = δ[ℓ − λ]δ[m − μ]δ[k]. These orthogonal functions can be used to code the data sequences a_{ℓm}[n] of different users, even though the functions may overlap in both time and frequency. The multiplexing model is shown in Fig. 1.
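As an illustration of multiplexing on orthogonal wavelet packet functions, the following sketch performs one stage of discrete wavelet packet reconstruction and decomposition for two users. It uses Haar filters h = [1/√2, 1/√2], g = [1/√2, −1/√2] purely for simplicity (the paper's examples use Daubechies wavelets); the round trip recovers each user's symbols exactly.

```python
import math

C = 1.0 / math.sqrt(2.0)  # Haar filter coefficient

def wpdm_multiplex(a11, a12):
    """Combine two users' symbol streams into the root-level sequence
    a01[n] via one reconstruction stage (upsample by 2, filter, sum)."""
    s = []
    for x, y in zip(a11, a12):
        s.append(C * (x + y))   # even sample: h path plus g path
        s.append(C * (x - y))   # odd sample: h path minus g path
    return s

def wpdm_demultiplex(s):
    """Reverse operation: correlate with h and g, down-sample by 2."""
    a11 = [C * (s[2 * n] + s[2 * n + 1]) for n in range(len(s) // 2)]
    a12 = [C * (s[2 * n] - s[2 * n + 1]) for n in range(len(s) // 2)]
    return a11, a12

a11, a12 = [1.0, -1.0, 1.0, 1.0], [-1.0, -1.0, 1.0, -1.0]
r11, r12 = wpdm_demultiplex(wpdm_multiplex(a11, a12))
```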
Figure 1: The system model of WPDM.
Figure 2: BER vs. E_b/N₀ with the timing error = 3/20.
Let L denote the set of levels containing the terminals of a given tree, and M_ℓ denote the set of indices of the terminals at the ℓth level. The multiplexed signal in Fig. 1 can be expressed as s(t) = Σ_{ℓ∈L} Σ_{m∈M_ℓ} Σ_n a_{ℓm}[n] φ_{ℓm}(t − nT_ℓ). An equivalent expression for s(t) is

s(t) = Σ_n a₀₁[n] φ₀₁(t − nT₀),   (1)
where a₀₁[n] is obtained from a_{ℓm}[n] by wavelet packet reconstruction. Eq. (1) corresponds to an equivalent multiplexing model. At the receiver, the information sequence a_{ℓm}[n] can be recovered by correlating s(t) with the subcarrier function φ_{ℓm}(t). This is simply the reverse operation of multiplexing. Similarly, by the reverse operation of Eq. (1), i.e., correlating the corrupted s(t) with φ₀₁(t − nT₀), sampling the result at the correct instants, and passing the sample sequence through the wavelet packet decomposer, we can also recover the information sequences for each user. The scheme of assigning channels to different users based on waveform orthogonality in both dilation and translation is WPDM. In WPDM, there is considerable flexibility in choosing the wavelet packet functions. Advantages of WPDM include greater bandwidth efficiency and higher security compared to TDM and FDM systems. However, before WPDM can be promoted as a practical multiplexing technique, the performance of WPDM in various channels has to be fully investigated. In this paper, we analyse the performance of WPDM under two types of interference: (1) timing errors between the transmitter and receiver; (2) the presence of fading.
2. Timing error
In practice, the synchronization between the transmitter and the receiver is imperfect. This may result in intersymbol interference (ISI) and cross-talk in a multiplexing system. In this paper, the effects of timing error in a WPDM system are briefly summarized; for the detailed analysis, please refer to [1, 5]. The transmitted signal is s(t) in Eq. (1). Assume that at the receiver the correlating signal is φ₀₁(t − kT₀ − Δ), where Δ is the time discrepancy between the transmitter and the receiver. Then the output of the correlator is

ã₀₁[n] = Σ_k a₀₁[k] R_φ(kT₀ − nT₀ + Δ) + v[n],   (2)

where R_φ(τ) is the autocorrelation function of φ₀₁(t), and v[n] is due to Gaussian noise. We assume that the first-order derivative of R_φ(t) exists, and expand Eq. (2) by Taylor's formula with remainder,

ã₀₁[n] = a₀₁[n] + Δ Σ_k a₀₁[k] R′_φ(kT₀ − nT₀) + O(Δ²) + v[n].

Denote I₀₁[n] = Δ Σ_k a₀₁[k] R′_φ(kT₀ − nT₀). Observing I₀₁[n] = Δ (a₀₁[n] ∗ R′_φ(−nT₀)), we find that a wavelet minimizing Σ_n |R′_φ(−nT₀)|² gives the minimum interference energy. Therefore, good performance can be expected from such an optimum wavelet. The signal ã_{ℓm}[n] is obtained by passing ã₀₁[n] through the filter f_{ℓm}[n] and down-sampling by 2^ℓ; i.e.,
Denote I01[n] = A ~ k a~162 kT~ - nTo). Observing I[n] = A (a01[n]. R~c(-nTo)), we find that a wavelet minimizing ~"]~, IR~c(-nTo)l 2 gives the minimum interference energy. Therefore good performance can be expected from such an optimum wavelet. The signal #t,n[n] is obtained by passing ~01[n] through the filter ftm[n] and down-sampling by 2t; i.e.,
~[~] = y ~ f ~ [ k
-
2~]~o~[k] = ~ [ ~ ]
+ h~[~] + ~ [ ~ ] ,
(3)
k
where O(A 2) is ignored and Itm[n] = A ~-~xe:., ue.'.x ~']
[2tn - 2xJ] with Ji~[n] being a function of ft,.,.,[n], fxu[n], and R~(kTo). The term with Jf~,[n] represents the ISI and the remaining terms are cross-talk. In (3), Itm[n] and vtm[n] are independent additive noise, so the probability of error can be derived.
Fig. 2 illustrates the performance of the WPDM in the presence of timing errors when the 14th-order Daubechies wavelet and the corresponding optimum wavelet are in use, and for each user ±1 is transmitted. Both the theoretical derivation and the simulation results showed that the optimum wavelet has a lower probability of error than the Daubechies wavelet.
3. Flat fading channels
In mobile communications, fast flat fading degrades the system performance and introduces an irreducible bit error rate, or error floor. In order to suppress the error floors caused by fading, channel sounding techniques can be employed to compensate for the fading distortion. One of the commonly used channel sounding techniques is Pilot Symbol Assisted Modulation (PSAM) [6]. In PSAM, prespecified symbols are inserted periodically into the information sequence prior to modulation. Both pilot symbols and data symbols experience the same fading distortion as they pass through the channel. At the receiver, the received signal is split into two streams. One stream consists of the faded pilot symbols. Since the pilot symbols are known, an estimate of the channel distortion to the pilot signals can be extracted. This estimate may then be interpolated to form an estimate of the channel state, provided the channel does not change too fast. With this channel estimate, the channel distortion to the data stream can be compensated and better performance can be expected. Flat fading effects can be modelled as a multiplicative noise process on the transmitted signal. Furthermore, we assume the multiplicative noise has a Rayleigh-distributed amplitude and a uniformly distributed phase. Denote s_c(t) = s(t)cos ω₀t as the transmitted signal, where ω₀ is the carrier frequency, and s(t) is as in (1) with pilot symbols inserted in a₀₁[n] at n = kM. The low-pass equivalent of the output of a flat fading channel is given by
r(t) = u(t)s(t) + v_c(t),

where v_c(t) is AWGN with power spectral density N₀ in both real and imaginary components, and u(t) represents the fading, which is a complex zero-mean Gaussian process characterized by its power spectrum W(f) = σ_u² / (2π√(f_D² − f²)), with σ_u² being the variance and f_D the Doppler spread. At the receiver, passing r(t) through a filter matched to the transmitted pulse and sampling the output of the matched filter at the exact instants kT₀, we have r_s[k] = u[k]a₀₁[k] + v[k], where u[k] are samples of the fading process u(t) at t = kT₀ and v[k] = ∫ v_c(t)s(t − kT₀)dt. To carry out fading estimation at k = mM, r_s[mM] is simply divided by the known symbol a₀₁[mM]. The fading estimates at k = mM are therefore

ũ[mM] = r_s[mM]/a₀₁[mM] = u[mM] + v[mM]/a₀₁[mM].
Since the fading process is bandlimited, the fading distortion on the data symbols can be interpolated from the K nearest pilot samples ũ[mM]. In [6], a method was proposed using a Wiener filter to obtain estimates û[k] for −⌊LM/2⌋ < k < ⌊LM/2⌋ such that

û[k] = Σ_{i=−⌊K/2⌋}^{⌊K/2⌋} b*[i, k] ũ[iM],

where the coefficients b*[i, k] can be determined by minimizing the variance of the estimation error e[k] = u[k] − û[k]. Denote b[k] = [... b[−M, k] b[0, k] b[M, k] ...]ᵀ. An optimum estimate is achieved when the coefficient vectors b[k] satisfy the normal equation R b[k] = w[k], where the elements of R and w are given by

R_ik = E(ũ[iM]ũ*[kM]) = R_u((iM − kM)T₀) + N₀δ[i − k]/E[|a₀₁[iM]|²],
w_i[k] = E(u*[k]ũ[iM]) = R_u((iM − k)T₀),

and R_u(τ) is the autocorrelation function of u(t). Fading compensation is carried out by dividing each received symbol r_s[k] by the corresponding fade estimate û[k]. The compensated samples are given by

r̃[k] = r_s[k]/û[k].   (4)
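The PSAM pipeline around Eq. (4) can be sketched as follows. This is a minimal real-valued illustration, with simple linear interpolation standing in for the Wiener interpolator of [6], pilots assumed at both ends of the block, and all names being ours.

```python
def psam_compensate(r, pilot_idx, pilot_symbol=1.0):
    """Estimate the fading at each pilot position by dividing by the
    known pilot symbol, linearly interpolate the channel estimate
    between pilots, then divide every received sample by the estimate
    (the compensation step of Eq. (4))."""
    u_p = [r[i] / pilot_symbol for i in pilot_idx]   # fade estimates at pilots
    out = []
    for k in range(len(r)):
        # find the pilot interval containing k and interpolate linearly
        for j in range(len(pilot_idx) - 1):
            lo, hi = pilot_idx[j], pilot_idx[j + 1]
            if lo <= k <= hi:
                t = (k - lo) / (hi - lo)
                u_hat = u_p[j] + t * (u_p[j + 1] - u_p[j])
                break
        out.append(r[k] / u_hat)                     # compensated sample
    return out

# Linearly drifting (real) fade, pilots (+1) every 5 symbols:
fade = [0.5 + 0.05 * k for k in range(16)]
data = [1.0 if k in (0, 5, 10, 15) else (-1.0) ** k for k in range(16)]
received = [fade[k] * data[k] for k in range(16)]
compensated = psam_compensate(received, [0, 5, 10, 15])
```

Because the fade drifts linearly here, the interpolation is exact and the data symbols are recovered perfectly; with a band-limited Rayleigh fade the Wiener interpolator of the paper would be used instead.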
The above procedure can be used in any digital communications system. Here we analyse the performance of a WPDM system with φ₁₁(t) and φ₁₂(t) being employed by two users to transmit ±1 sequences. Passing r̃[k] through the wavelet packet decomposition filters, we obtain the estimates ã₁₁[n] and ã₁₂[n],

ã₁₁[n] = Σ_k r̃[k] h[k − 2n],  ã₁₂[n] = Σ_k r̃[k] g[k − 2n].   (5)
The analysis of the probability of error of ã₁₂[n] is similar to that of ã₁₁[n]; we will concentrate on ã₁₁[n]. Substitution of Eq. (4) into (5) results in

ã₁₁[n] = Σ_k [(u[k]a₀₁[k] + v[k])/û[k]] h[k − 2n] = a₁₁[n] + Σ_k [(e[k]a₀₁[k] + v[k])/û[k]] h[k − 2n].   (6)
Denote C[k] = (e[k]a₀₁[k] + v[k])/û[k], Z[k] = e[k]a₀₁[k] + v[k], and D[k] = 1/|û[k]|. Because of the circular symmetry of the random variables e[k], v[k] and û[k], C[k] can be expressed as C[k] = Z[k]D[k] without changing the distribution property of C[k]. As found in [7], the probability density functions (pdfs) of Z[k] and D[k] are

p_Z(z) = (1/(2πσ²)) exp(−|z|²/(2σ²))  and  p_D(x) = (1/(σ_û² x³)) exp(−1/(2σ_û² x²)),

respectively. Since Z[k] and D[k] are independent, we can obtain the pdf of C[k] as

p_C(y) = ∫₀^∞ (1/x²) p_Z(y/x) p_D(x) dx = ∫₀^∞ (1/(2πσ²σ_û² x⁵)) exp(−|y|²/(2σ²x²)) exp(−1/(2σ_û²x²)) dx,   (7)

where the substitution σ² = σ_v² + |a₀₁[k]|²σ_e² has been used. From Eq. (6), we find that it is the real part of C[k] that is of interest. Therefore, |y|² in (7) can be replaced by y², and p_C(y) can then be simplified as

p_C(y) = σ²σ_û / (2(σ_û²y² + σ²)^{3/2}).   (8)
Recall that the interference to ã₁₁[n] due to flat fading and additive Gaussian noise is I₁[n] = Σ_k C[k]h[k − 2n], where C[k] is a random variable with the pdf of Eq. (8). Although C[k] is related to a₀₁[k] through σ² = σ_v² + |a₀₁[k]|²σ_e², a₀₁[k] itself is a random variable. We may assume C[k] is a stationary process; by this assumption, I₁[n] is also stationary. Furthermore, the C[k] at different instants are independent. Therefore, the interference to each bit of ã₁₁[n] has the pdf

p_{I₁}(y) = (1/|h[0]|) p_C(y/h[0]) ∗ (1/|h[1]|) p_C(y/h[1]) ∗ ⋯,   (9)

where ∗ means convolution. If the orthogonal wavelet is used, i.e., g[n] = (−1)ⁿ h[1 − n], the interference to ã₁₂[n] due to fading and Gaussian noise has the same pdf as Eq. (9). A general closed-form expression for the pdf of I₁[n] cannot be obtained, since it depends on h[n]. The evaluation of the probability of error

P_e = Pr(ã₁₁[n] > 0 | a₁₁[n] = −1) = Pr(I₁[n] > 1) = ∫₁^∞ p_{I₁}(y) dy,   (10)
however, can be carried out numerically. Following the same procedure as with WPDM, we can derive the probability of error for an FDM system in flat fading channels. To compare the performance of the WPDM and FDM systems, we present an example in which only two users are present in either system. The wavelet packet used in WPDM is generated from the Daubechies scaling function of order 14. Let us assume f_D T = 0.05, where 1/T is the transmission rate of one user. For WPDM, T0 = T/2; therefore, f_D T0 = 0.025. Fig. 3 shows the theoretical and simulation results of BER vs. the pilot insertion rate M with SNR (signal to additive white Gaussian noise ratio) 20 dB. The theoretical results closely agree with the simulation results in both FDM and WPDM systems. It is also clear that for the same BER the pilot spacing for WPDM can be larger than the spacing for FDM. In other words, WPDM needs less frequent pilot symbols; therefore WPDM wastes less energy in pilot symbols and gains capacity when compared with the FDM scheme. In the given example, we assume M = 8 in FDM and M = 16 in WPDM for reliable communications. Then it is easily calculated that 12.5% of the capacity is used by pilot symbols in FDM, while only 6.25% is used in WPDM. With more users in the WPDM and FDM systems, f_D T0 is much smaller than f_D T; thus, a smaller percentage of capacity is spent on pilot symbols. However, T0 has to be sufficiently large to maintain the validity of the flat fading assumption, so f_D T0 cannot be decreased indefinitely.
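The numerical evaluation of Eqs. (9)-(10) can be organized as repeated convolution of scaled copies of the per-sample pdf on a grid, followed by a tail integration. The heavy-tailed `p_c` below and the helper names are illustrative assumptions standing in for Eq. (8); the single-tap case has a closed-form tail probability that checks the discretization:

```python
import numpy as np

def p_c(y, lam):
    """Heavy-tailed per-sample interference pdf (a Student-t-like form,
    assumed here as a stand-in for the paper's Eq. (8))."""
    return lam**2 / (2.0 * (y**2 + lam**2)**1.5)

def error_prob(h, lam, y_max=200.0, n=400001):
    """Build the pdf of I1[n] = sum_k C[k] h[k-2n] numerically as the
    convolution of scaled copies of p_c (Eq. (9)) and integrate the
    tail beyond 1 (Eq. (10))."""
    y = np.linspace(-y_max, y_max, n)
    dy = y[1] - y[0]
    pdf = None
    for hk in h:
        if abs(hk) < 1e-12:
            continue
        comp = p_c(y / abs(hk), lam) / abs(hk)   # pdf of C[k] * h[k]
        pdf = comp if pdf is None else np.convolve(pdf, comp, 'same') * dy
    return np.sum(pdf[y > 1.0]) * dy

# sanity check against the single-tap closed form
lam = 1.0
pe = error_prob([1.0], lam)
exact = 0.5 * (1.0 - 1.0 / np.sqrt(1.0 + lam**2))
print(abs(pe - exact) < 1e-3)  # True
```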
Figure 3: Probability of error vs. different pilot spacing M for WPDM and FDM. Eb/N0 = 20 dB.
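The capacity figures quoted in the example above follow directly from the insertion rate: with one pilot in every frame of M symbols, a fraction 1/M of the transmitted symbols is overhead.

```python
# One pilot symbol per frame of M symbols => a fraction 1/M of capacity
# is spent on pilots (12.5% for FDM with M = 8, 6.25% for WPDM with M = 16).
def pilot_overhead(M):
    return 1.0 / M

print(pilot_overhead(8), pilot_overhead(16))  # 0.125 0.0625
```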
4. Conclusion
The system model of WPDM was reviewed and the main analysis results under timing error were summarized. The ISI and cross-talk due to timing error were modelled as signals passing through interference filters which are functions of both the timing error and the wavelets in use. Based on the interference models, optimum wavelets minimizing the interference have been designed and the probability of error formulae derived. Both the theoretical derivation and simulation results showed that the optimum wavelets have a lower probability of error than Daubechies wavelets. To combat the irreducible error due to fast flat fading, PSAM was applied. The performance of a WPDM system in flat fading channels with PSAM was derived. For comparison, the performance of an FDM system in the same channel environment was also evaluated. The calculations and corresponding simulation results illustrated that the pilot spacing for WPDM can be larger than the spacing for FDM. In other words, WPDM needs less frequent pilot symbols and therefore gains capacity in flat fading channels compared to the FDM scheme.
References
[1] J. Wu, Q. Jin, and K. M. Wong, "Multiplexing based on wavelet packets," in Wavelet Applications II (H. H. Szu, ed.), vol. 2491 of Proceedings of the SPIE, pp. 315-326, Apr. 1995.
[2] R. E. Learned, H. Krim, B. Claus, A. S. Willsky, and W. C. Karl, "Wavelet-packet-based multiple access communication," in Wavelet Applications in Signal and Image Processing II (A. F. Laine and M. A. Unser, eds.), vol. 2303 of Proceedings of the SPIE, pp. 246-259, Oct. 1994.
[3] P. P. Ghandi, S. S. Rao, and R. S. Pappu, "On waveform coding using wavelets," in Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers, (Pacific Grove, CA), pp. 901-905, 1993.
[4] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Transactions on Information Theory, vol. 38, pp. 713-718, Feb. 1992.
[5] J. Wu, K. M. Wong, Q. Jin, and T. N. Davidson, "Wavelet packet division multiplexing and wavelet packet design under timing error effects," submitted to IEEE Transactions on Signal Processing.
[6] J. K. Cavers, "An analysis of pilot symbol assisted modulation for Rayleigh fading channels," IEEE Transactions on Vehicular Technology, vol. VT-40, pp. 686-693, Nov. 1991.
[7] M. G. Shayesteh and A. Aghamohammadi, "On the error probability of linearly modulated signals on frequency-flat Ricean, Rayleigh, and AWGN channels," IEEE Transactions on Communications, vol. COM-43, pp. 1454-1466, Feb./Mar./Apr. 1995.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Time-Varying Wavelet Packet Division Multiplexing
T. N. Davidson and K. M. Wong, Communications Research Laboratory, McMaster University, Hamilton, Ontario, L8S 4K1, Canada. [email protected]
Abstract
Wavelet Packet Division Multiplexing (WPDM) is an emerging multiplexing scheme in which the properties of wavelet packet basis functions and their close relationships with perfect reconstruction filterbanks are exploited to provide higher capacity, flexibility and robustness to several adverse channel environments. Recently, a parameter 'hopping' scheme was incorporated into the WPDM scheme, offering potential performance and security improvements analogous to those of frequency hopped communication schemes. Whilst that Wavelet Packet Hopping (WPH) scheme provided a general hopping framework, it required intricate implementation. In this paper, we show that by giving up a little of the generality of the WPH scheme we can avoid these technical difficulties, whilst retaining the fundamental benefits of wavelet packet hopping.
1 Introduction
Wavelet Packet Division Multiplexing (WPDM) [1, 2] is an emerging multiplexing scheme in which the self- and mutual-orthogonality properties of wavelet packet basis functions [3] are exploited for multiplexing purposes. In contrast to the conventional Time Division Multiplexing (TDM) and Frequency Division Multiplexing (FDM) schemes, the waveforms used to represent the data symbols of each user overlap in both time and frequency. However, they are intrinsically orthogonal (they form a wavelet packet), so the symbols can be recovered using a simple correlator receiver. The fact that the waveforms overlap in time and frequency provides an increase in capacity over TDM and FDM, and substantial robustness to adverse channel environments [4, 5], whilst their close relationships with multi-rate filter banks provide particularly simple transmitter and receiver structures. In frequency-hopped communication schemes, the carrier frequency is 'hopped' between several frequencies in a pattern which is known by the receiver. In addition to offering the potential for improved (average) performance over that of the underlying time-invariant scheme, a frequency-hopped scheme also offers greater security. (An interceptor requires knowledge of the hopping pattern in order to decode the message signal.) In a previous paper, we examined a framework for 'hopping' the parameters of a WPDM scheme without any reduction in the data rate [6]. The resulting Wavelet Packet Hopping (WPH) scheme offered the potential for analogous (average) performance and security improvements over the WPDM scheme. Whilst the WPH framework provides general hopping patterns, it may require buffering at the receiver, with a consequent delay, increased computational resources in the so-called 'transition zones,' and intricate pulse shaping.
In this paper, we show that by giving up a little of the generality of the WPH scheme (we only allow 'branch hopping') we can avoid these technical difficulties, whilst retaining the fundamental benefits of wavelet packet hopping.
2 Wavelet Packet Division Multiplexing
In this section, we briefly review WPDM,¹ emphasising a notation that leads to a simple extension to the time-varying case. Given the impulse response, g0[n], of a unit-energy FIR filter of length L = 2K which is orthogonal to its even translates and satisfies some additional mild technical conditions [7], we can use one of a number of algorithms [7, 8] to find a (finite-duration) scaling function

φ(t) = Σ_k g0[k] φ(2t − kT0),

which is self-orthogonal at integer multiples of T0. Furthermore, we can form a (conjugate) quadrature mirror filter g1[n] = (−1)ⁿ g0[L − 1 − n], and subsequently define a family of functions φ_ℓm(t), ℓ ≥ 0, 1 ≤ m ≤ 2^ℓ, in

¹Connections with related communication schemes are mentioned elsewhere [2, 6].
the following tree-structured manner [3]:

φ_{ℓ+1,2m−1}(t) = Σ_k g0[k] φ_{ℓm}(t − kT_ℓ),    φ_{ℓ+1,2m}(t) = Σ_k g1[k] φ_{ℓm}(t − kT_ℓ),

where T_ℓ = 2^ℓ T0. For any given tree structure, the (finite-duration) functions at the 'leaves' or terminals of the tree form a wavelet packet. They satisfy

⟨ φ_{ℓm}(t − nT_ℓ), φ_{λμ}(t − kT_λ) ⟩ = δ[ℓ − λ] δ[m − μ] δ[n − k],
where ⟨·,·⟩ denotes the L2 inner product, and hence are a natural choice for multiplexing applications. WPDM is based on the use of such functions for waveform coding of the data streams (which may have different data rates) and the exploitation of close relationships with perfect reconstruction filter banks to obtain particularly simple realizations of the transmitter and the receiver. In the simple case where there are only two users, using φ11(t − nT1) and φ12(t − nT1) as coding waveforms for the data streams a11[n] and a12[n], respectively, the transmitted signal can be written as

s_c(t) = σ11ᵀ φ11(t) + σ12ᵀ φ12(t) = σ01ᵀ φ01(t),  (1)
where the nth elements of the (infinite-dimensional) vectors σ_ℓm and φ_ℓm(t) are a_ℓm[n] and φ_ℓm(t − nT_ℓ), respectively. That means that we can replace the two waveform coders employing φ11(t) and φ12(t) by a multirate filter bank and a single waveform coder employing φ(t) and operating at twice the rate. The relationships between the components in Equation 1 can be written as

σ01 = G P [σ11ᵀ, σ12ᵀ]ᵀ   and   [φ11ᵀ(t), φ12ᵀ(t)]ᵀ = P Gᵀ φ01(t),

respectively, where G is a doubly infinite block Toeplitz matrix with a transpose of the form [8]

       ⎡  ⋱                                 ⎤
Gᵀ  =  ⎢  ⋯  S_{K−1}  ⋯  S_1  S_0   0   ⋯  ⎥
       ⎢  ⋯   0   S_{K−1}  ⋯  S_1  S_0  ⋯  ⎥
       ⎣                 ⋱              ⋱  ⎦

with filters of length L = 2K,

S_i = ⎡ g0[2i]     g1[2i]   ⎤
      ⎣ g0[2i+1]   g1[2i+1] ⎦ ,

and P = Pᵀ = P⁻¹ is an interlacing operator. The receiver performs the correlations ν01[n] = ⟨r_c(t), φ(t − nT0)⟩, where r_c(t) is the received signal, which we stack in the vector ν01. The data is then recovered using

[σ̂11ᵀ, σ̂12ᵀ]ᵀ = P Gᵀ ν01.
Since GᵀG = I, if ν01 = σ01 then σ̂1m = σ1m, m = 1, 2, and the data is exactly recovered. The key step in the current extension to time-varying WPDM is to augment the unitary operator G by a simple time-varying switching structure, as we will now show.

3 Time-Varying Wavelet Packet Division Multiplexing
In the previous section, we highlighted the central role of a tree-structured orthonormal filter bank (transmultiplexer) in the WPDM scheme. Once the filter g0[n] and the tree-structure are chosen, all the coding waveforms elm(t) are defined. Therefore, we can 'hop' the parameters of the WPDM scheme, without compromising the data rate, if we can 'hop' the parameters of the filter bank in an appropriate way. In previous work [6] we provided a general framework to achieve 'hopping' of both the tree structure and the filter coefficients, based on a 'transition filter' approach to time-varying filter banks [9, 10]. In this paper, we focus on hopping just the branch structure of the tree, and hence we will refer to the current scheme as Branch-Hopped WPDM (BHWPDM). This provides a framework which retains the fundamental features of the general WPH scheme, but is far simpler to implement. As a simple motivating example, consider the three-user system in Figure 1, in which one user has twice the data rate of the other two. The dashed boxes represent 'switching' units which provide
Figure 1: A three-user BH-WPDM transmitter system in which user 3 has twice the data rate of users 1 and 2.
either a 'parallel' or 'cross' connection at each instant. The Fourier transforms of the waveforms assigned to each user for certain switch settings are given in Figure 2. If we flip the switches in a pattern which is known to the receiver, we obtain a more secure communication scheme. We may also be able to obtain an average performance improvement over any one of the underlying time-invariant systems. Moreover, since the switching units are memoryless, the switching is achieved without compromising the data rate.

Figure 2: The Fourier transforms of the equivalent waveforms in Figure 1 for Daubechies length-14 filters.

We can construct a model for the system in Figure 1 using a simple augmentation of the notation developed in Section 2. Let ũ_i be the data vector from user i, i = 1, 2, 3, and let φ̃_i(t) denote the vector of waveforms onto which the data from the ith user is coded. Then the transmitted signal can be written as
s_c(t) = Σ_{i=1}^{3} ũ_iᵀ φ̃_i(t) = σ01ᵀ φ01(t),

where

[φ̃1ᵀ(t), φ̃2ᵀ(t)]ᵀ = P A11 Gᵀ [I  0] P A01 Gᵀ φ01(t)   and   φ̃3(t) = [0  I] P A01 Gᵀ φ01(t).  (2)
Here G and P are the operators defined in Section 2, and A_ℓm is a (doubly infinite) block diagonal matrix with 2 × 2 blocks. Each block represents the state of the switch at the (ℓ, m)th node at a given instant, with

⎡ 1  0 ⎤                                                ⎡ 0  1 ⎤
⎣ 0  1 ⎦  representing a 'parallel' connection, and     ⎣ 1  0 ⎦  a 'cross' connection.

The receiver performs the correlations ν01[n] = ⟨r_c(t), φ(t − nT0)⟩, where r_c(t) is the received signal, which we stack in the vector ν01. The data is then recovered using

[ũ̂1ᵀ, ũ̂2ᵀ, ũ̂3ᵀ]ᵀ = ⎡ P A11 Gᵀ   0 ⎤ P A01 Gᵀ ν01.  (3)
                      ⎣     0      I ⎦
Since A_ℓmᵀ Gᵀ G A_ℓm = I, if ν01 = σ01 then ũ̂_i = ũ_i, i = 1, 2, 3, and the data is exactly recovered.
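A minimal numerical sketch of the switching idea (the pattern and names below are illustrative assumptions): each 2 × 2 block is its own inverse, so re-applying the same known pattern at the receiver undoes the hop, while a wrong pattern does not:

```python
import numpy as np

PARALLEL = np.eye(2)
CROSS = np.array([[0.0, 1.0], [1.0, 0.0]])

def switch(pairs, pattern):
    """Apply the 2x2 switching unit blockwise: one 'parallel' or
    'cross' connection per instant, as in the blocks of A_lm."""
    return np.array([(PARALLEL if p == 'p' else CROSS) @ x
                     for x, p in zip(pairs, pattern)])

pairs = np.array([[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0],
                  [1.0, -1.0], [-1.0, -1.0], [1.0, -1.0]])
pattern = ['p', 'c', 'c', 'p', 'c', 'p']

hopped = switch(pairs, pattern)
# each block satisfies A^T A = I, so re-applying the known pattern
# at the receiver recovers the data exactly
print(np.allclose(switch(hopped, pattern), pairs))    # True
# guessing all-parallel (the wrong pattern) fails to recover it
print(np.allclose(switch(hopped, ['p'] * 6), pairs))  # False
```

Because the units are memoryless, no buffering or extra computation is introduced, which is the point of restricting WPH to branch hops.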
These techniques can be extended to deeper trees by simply cascading the operators in a manner which matches the shape of the tree (as we have shown above in a simple case). Thus, the discrete-time component of a branch hopping scheme can be implemented simply by adding a time-varying 2 × 2 switching unit to the upsamplers and filters attached to each node of the tree, as shown in Figure 1.² Note that in contrast to the general WPH framework developed previously [6], there are no buffering requirements, and there is no increase in the computational load of the scheme. The implementation of the continuous-time component of the BH-WPDM scheme is also simpler than in the general WPH case. In fact, the pulse shaping requirements are exactly the same as those of the underlying time-invariant scheme, since the elements of φ01(t) are simply translations of φ(t) by integer multiples of T0. These shapes can be computed 'off-line' and implemented using standard pulse shaping techniques, such as tapped-delay line filtering. The equivalent waveforms at other nodes can be simply calculated from φ01(t) using an equation of the form in Equation 2. In contrast, the general WPH scheme requires intricate design procedures to ensure desirable properties of the waveforms, and time-varying pulse-shaping techniques to implement them.
4 Conclusion
Incorporating parameter 'hopping' into a Wavelet Packet Division Multiplexing (WPDM) scheme, without compromising the data rate, offers greater security and the potential for improved (average) performance over the underlying time-invariant scheme, in a way analogous to that of frequency-hopped communication schemes over conventional frequency division multiplexing. However, in its general form, Wavelet Packet Hopping (WPH) is fraught with implementation difficulties [6]. In this paper, we have proposed a simple branch-hopped WPDM (BH-WPDM) scheme in which some of the generality of WPH is given up (we only allow 'branch' hops) in order to retain some of the desirable implementation attributes of the underlying WPDM schemes, such as computational efficiency, no buffering requirements and straightforward waveform construction. The quantification of the potential performance improvements of both the WPH and the BH-WPDM schemes is currently under investigation and will be reported in due course.
References
[1] J. Wu, Q. Jin, and K. M. Wong, "Multiplexing based on wavelet packets", in Szu [13], pp. 315-326.
[2] J. Wu, K. M. Wong, Q. Jin, and T. N. Davidson, "Wavelet packet division multiplexing and wavelet packet design under timing error effects", Preprint, Communications Research Laboratory, McMaster University, Hamilton, Ontario, Canada, June 1996.
[3] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection", IEEE Transactions on Information Theory, vol. 38, no. 2-Part II, pp. 713-718, Feb. 1992.
[4] J. Wu, K. M. Wong, Q. Jin, and T. N. Davidson, "Performance of wavelet packet division multiplexing in impulsive and Gaussian noise channels", in Unser et al. [14].
[5] J. Wu, K. M. Wong, and Q. Jin, "Wavelet packet division multiplexing", in Proceedings of the 3rd International Workshop in Signal and Image Processing, Manchester, England, Nov. 1996.
[6] T. N. Davidson and K. M. Wong, "Wavelet packet hopping", in Unser et al. [14].
[7] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[8] M. Vetterli and C. Herley, "Wavelets and filter banks: Theory and design", IEEE Transactions on Signal Processing, vol. 40, no. 9, pp. 2207-2232, Sept. 1992.
[9] C. Herley, J. Kovačević, K. Ramchandran, and M. Vetterli, "Tilings of the time-frequency plane: Construction of arbitrary orthogonal bases and fast tiling algorithms", IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3341-3359, Dec. 1993.
[10] C. Herley and M. Vetterli, "Orthogonal time-varying filter banks and wavelet packets", IEEE Transactions on Signal Processing, vol. 42, no. 10, pp. 2650-2663, Oct. 1994.
[11] J. Mau, J. Valot, and D. Minaud, "Time-varying orthogonal filter banks without transition filters", in Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, May 1995, vol. 2, pp. 1328-1331.
[12] R. A. Gopinath and C. S. Burrus, "Factorization approach to unitary time-varying filter bank trees and wavelets", IEEE Transactions on Signal Processing, vol. 43, no. 3, pp. 666-680, Mar. 1995.
[13] H. H. Szu, Ed., Wavelet Applications II, vol. 2491 of Proceedings of the SPIE, Apr. 1995.
[14] M. A. Unser, A. Aldroubi, and A. F. Laine, Eds., Wavelet Applications in Signal and Image Processing IV, vol. 2825 of Proceedings of the SPIE, 1996.

²Some closely related filter bank structures have been developed for signal analysis applications [11, 12], but the simplicity of the switching approach is particularly appealing in multiplexing applications.
CO-CHANNEL INTERFERENCE MITIGATION IN THE TIME-SCALE DOMAIN: THE CIMTS ALGORITHM

Sam Heidari* and Chrysostomos L. Nikias†
*Torrey Science Corporation, 10065 Barnes Canyon Road, San Diego, CA 92121. sheidari
†Signal and Image Processing Institute, University of Southern California, 3740 McClintock Avenue, EEB 400B, Los Angeles, CA 90089-2564
Abstract
In many communication systems, the problem of co-channel interference is encountered when, along with the signal of interest (SOI), one or more interfering signals are present in a common receiver. The SOI and the interference(s), which are correlated, possess similar characteristics and power, and share the same region of support in both the time and frequency domains. In this paper, we present the Co-channel Interference Mitigation in the Time-Scale domain (CIMTS) algorithm, which estimates the signal of interest (SOI) and the interfering signal from their superposition in the presence of additive noise. This method is inspired by the reconstruction of the interference from the null space of the SOI in the time-scale domain. Once the null space of the SOI is determined, the interfering signal is reconstructed via a set of linear operations. The SOI is then estimated by a simple subtraction of the estimated interference from the observations.
1 Introduction
The subject of co-channel interference mitigation has received much attention for many years [1, 2]. In such systems as mobile communications, radio networks and radar, the problem of co-channel interference is encountered when, along with the signal of interest (SOI), one or more interfering signals are present in a common receiver. Applications may include cellular technology and wireless multi-media. A particular application of interest is military communication systems operating in an intentionally hostile interference environment. It is known that co-channel interference often degrades the SOI more severely than additive noise or intersymbol interference. The performance of interference reduction systems which treat the interference as white additive noise will degrade significantly in the presence of co-channel interference. The conventional approach, where each signal (the SOI and the interfering signals) is demodulated as if it were the only one present, is not optimum in terms of error probability. This method is not suitable when the SOI and the interference are wideband. The maximum-likelihood and maximum a posteriori symbol detection methods [1] achieve a higher performance; however, these methods are computationally very expensive. The utilization of digital filter banks is not new in communications and digital signal processing applications [3, 4]. The new algorithm presented in this paper utilizes the Wavelet Transform (WT) [5] to estimate the SOI and the interfering signal from their superposition in the presence of additive noise. The WT has several properties which make it attractive for this particular problem. Two of the most important are linearity and low complexity (for the discrete wavelet transform). In this paper, we present and analyze the Co-channel Interference Mitigation in the Time-Scale domain (CIMTS) Algorithm [2] for MPSK signals. This method is inspired by the reconstruction of the interference from the null space of the SOI in the time-scale domain. The CIMTS algorithm enjoys the simplicity of the matched filter; however, it demonstrates a higher
performance. Furthermore, unlike the matched filter, the new method is near-far resistant, i.e. it is blind to dissimilarity of the signal energies. Section 2 is primarily concerned with the problem definition. In Section 3, the CIMTS algorithm is presented and analyzed. The algorithmic assessment is presented in Section 4.
2 Problem Definition
Usually, we are only interested in recovering the SOI; however, in certain applications, such as multi-access communication systems [1], it may be required to recover all the signals. The objective of this research is to recover the SOI and the interfering signal(s) from the observation of their superposition, which is embedded in additive noise. In this paper, it is assumed that only one interfering signal is present; however, the results may easily be extended to the case of many interfering signals. The problem is formulated as

r(t) = S(t) + I(t) + w(t)  (1)
where r(t) is the observation, S(t) is the SOI, I(t) is the interference, and w(t) is the white Gaussian noise. The SOI and the interference are both MPSK signals and given respectively by,
N-1 S(t) -- E Al,k exp(jOk )X[kTl,(k+l)Tx](t -+""rl) k=O
N~_I
I(t) - E
k=0
A2,k exp(--j(27ra ft +
Ck))XtkT~,(k+~)T~J(t+ r2)
(2)
(3)
where the signal S(t) is baseband and the signal I(t) has a very small modulation frequency δf, with 1/δf ≫ T2. The received energies of the SOI and the interfering signal for the kth time slot are A_{1,k} and A_{2,k}, respectively. It is assumed that the symbol durations of the signals S(t) and I(t) are given and are equal to T1 and T2, respectively. Furthermore, it is assumed that T2 = T1 + δT, where δT is small, i.e., 0 < |δT| ≪ T1. There are no other restrictions on T1 and T2; however, as shown below, if T1/T2 = N/M where N and M are coprime numbers, then the complexity will reduce significantly. In the following derivations, it is assumed that τ1 = τ2 = 0; however, the results can easily be extended to the cases where τ1 and τ2 are not equal to zero. The algorithm will fail when any individual member of the function set {χ_{[kT2,(k+1)T2]}(t + τ2)}, k = 0, ..., N − 1, is linearly dependent on the function set {χ_{[lT1,(l+1)T1]}(t + τ1)}, l = 0, ..., N − 1, since the interfering signal would then be zero in the null space of the SOI.
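A sampled version of the model in Eqs. (1)-(3) can be generated directly; the sampling rate, the QPSK phase alphabet, and the zero frequency offset used below are illustrative assumptions:

```python
import numpy as np

fs = 1000.0                       # samples per unit time (illustrative)
T1, dT = 1.0, 0.004
T2 = T1 + dT                      # slightly longer interferer symbols
N = 8
t = np.arange(0.0, N * T1, 1.0 / fs)

rng = np.random.default_rng(6)
theta = rng.integers(0, 4, N) * np.pi / 2        # SOI QPSK phases
phi = rng.integers(0, 4, N + 1) * np.pi / 2      # interferer phases

# piecewise-constant MPSK waveforms built from the indicator functions
S = np.exp(1j * theta[np.minimum((t / T1).astype(int), N - 1)])
I = 0.8 * np.exp(-1j * phi[np.minimum((t / T2).astype(int), N)])  # df taken as 0
w = 0.05 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
r = S + I + w                                    # Eq. (1): the observation

print(r.shape == t.shape)
```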
3 The CIMTS Algorithm
The idea of the algorithm lies in the reconstruction of the signal I(t) from the null space of S(t) in the time-scale domain. The following theorem can be utilized to identify the null space of a baseband MPSK signal in the time-scale domain. This is a fundamental theorem as it serves as the basis of the new CIMTS algorithm.

Fundamental Theorem: If ψ(t) is an arbitrary bounded-support wavelet function (i.e. for some c, ψ(t) = 0 if t ∉ [0, c]), then the Bounded Support Discrete Wavelet Transform (BSDWT) of a baseband MPSK signal will be zero at scale b = T1/cL,

WT_{S(t)}(cn, T1/cL) = ∫ S(t) ψ( (cL/T1) t − cn ) dt = 0  (4)

where S(t) is a baseband MPSK signal with symbol duration T1, and n and L are any integers. As a result of the Fundamental Theorem, we can establish the following fundamental proposition, which relates the BSDWT of the observed signal r(t), in the null space of the SOI S(t), to the interfering signal I(t), sampled at every T2 duration.
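The theorem is easy to verify numerically for the Haar wavelet (support [0, 1), so c = 1): each scaled translate ψ((L/T1)t − n) lies inside a single symbol interval, where the MPSK signal is constant, and the zero-mean wavelet integrates it away. The grid sizes below are illustrative assumptions:

```python
import numpy as np

def haar(x):
    """Haar mother wavelet, supported on [0, 1) (so c = 1)."""
    return np.where((x >= 0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

T1, L, N = 1.0, 4, 8
rng = np.random.default_rng(3)
theta = rng.integers(0, 4, N) * np.pi / 2            # QPSK symbol phases
t = np.linspace(0.0, N * T1, 80001)
dt = t[1] - t[0]
S = np.exp(1j * theta[np.minimum((t / T1).astype(int), N - 1)])

# Eq. (4): W_S(cn, T1/cL) = int S(t) psi((L/T1) t - n) dt, which vanishes
# because each wavelet sits inside one symbol where S(t) is constant
coeffs = np.array([np.sum(S * haar(L / T1 * t - n)) * dt
                   for n in range(N * L)])
print(np.max(np.abs(coeffs)) < 1e-3)  # True: the SOI is nulled at this scale
```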
Fundamental Proposition: There exists a linear relation between the sampled interfering signal and the BSDWT of the observation at scale b = T1/cL,

WT_{r(t)}(T1/cL) = Ā I + n,  (5)

where T1 is the symbol duration of the SOI, L is an arbitrary integer, I is the vector of the sampled interference signal,

I = [ I(0.5T2)  I(1.5T2)  ...  I((N′ − 0.5)T2) ]ᵀ  (6)

and

WT_{r(t)}(T1/cL) = [ WT_{r(t)}(0, T1/cL)  WT_{r(t)}(c, T1/cL)  ...  WT_{r(t)}(c(N − 1), T1/cL) ]ᵀ.  (7)
Since the frequency offset δf is unknown, the matrix Ā is only an approximation. However, if the modulation frequency of the interfering signal is zero, then the results in the fundamental proposition are exact. For the CIMTS algorithm, the element of the matrix Ā at the kth row and nth column is expressed as

ā_{k,n} = ∫ χ_{[kT2,(k+1)T2]}(t) ψ( (cL/T1) t − cn ) dt.  (8)
Using Eq. (5), the interfering signal is reconstructed via the Singular Value Decomposition (SVD),

Î = Ā† WT_{r(t)}(T1/cL)  (9)

where the matrix Ā† is the SVD inverse of the matrix Ā. The SVD method is a simple, numerically stable way of finding a generalized solution. In summary, the CIMTS algorithm can be decomposed into two steps. Step one is the projection of the received signal onto the null space of the SOI in the time-scale domain (separation of the interfering signal from the SOI). Step two is the reconstruction of the interfering signal from the null space of the SOI (recovering the interfering signal in the presence of ISI). If the processing is done for fixed block-size data, T1 and T2 are time-invariant, and T1/T2 = N/M where N and M are co-prime numbers, then the matrix Ā and its pseudoinverse are calculated only once. Hence, by combining these two steps, the complexity of the algorithm will reduce significantly. Denote the pseudoinverse of the matrix Ā by Λ = Ā†,

Λ = ⎡ λ_{0,0}     λ_{0,1}     ⋯   λ_{0,N−1}   ⎤
    ⎢ λ_{1,0}     λ_{1,1}     ⋯   λ_{1,N−1}   ⎥
    ⎢    ⋮           ⋮               ⋮       ⎥
    ⎣ λ_{N−1,0}   λ_{N−1,1}   ⋯   λ_{N−1,N−1} ⎦ ;  (10)

then, the ith element of the vector Î is given as
Î_i = ∫ r(t) Σ_{n=0}^{N−1} λ_{i−1,n} ψ( (cL/T1) t − cn ) dt = ∫ r(t) h_R^{i−1}(t) dt = ∫ ( I(t) + w(t) ) h_R^{i−1}(t) dt,  (11)

where h_R^{i−1}(t) = Σ_{n=0}^{N−1} λ_{i−1,n} ψ( (cL/T1) t − cn ). Note that h_R^{i−1}(t) for each i is calculated only once. Using the Haar Wavelet Transform and the matrix Ā derived in [2], the receiver function h_R^{i−1}(t) is given as

h_R^{i−1}(t) = ( ψ( (L/T1) t − (i − 1) ) − ψ( (L/T1) t − i ) ) / (8 δT).  (12)
Therefore, the estimate of the ith bit of the interference is given as Î_i = ±A_{2,i} + n. Furthermore, the variance of the random noise, n = ∫ w(t) h_R^{i−1}(t) dt, is given as σ_n² = (N0/2) ∫ ( h_R^{i−1}(t) )² dt, where δT = T2 − T1. Assuming that the interference is a BPSK signal, the probability of error of the ith transmitted bit is given as follows:

P{E_i} = Q( A_{2,i} / σ_n )  (13)

where Q(a) = (1/√(2π)) ∫_a^∞ exp(−v²/2) dv.
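The reconstruction step of Eq. (9) can be sketched in a few lines; the matrix below is a generic full-rank stand-in for Ā (an assumption), and the sizes and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 16
# stand-in for the BSDWT matrix of Eq. (8): any well-conditioned
# square matrix illustrates the pseudoinverse step
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))

I_true = rng.choice([-1.0, 1.0], N)              # sampled BPSK interference
w = A @ I_true + 0.01 * rng.standard_normal(N)   # Eq. (5): WT of the observation

# Eq. (9): reconstruct via the SVD pseudoinverse, a numerically stable
# generalized solution even when A is ill-conditioned or rectangular
I_hat = np.linalg.pinv(A) @ w
print(np.allclose(np.sign(I_hat), I_true))  # True: all interference bits recovered
```

When the block structure is fixed, `np.linalg.pinv(A)` is computed once and reused, which is exactly the complexity saving noted above.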
Figure 1: The BEP as a function of SIR.
4 The Algorithmic Assessment
In all the reported cases, the Bit Error Probability (BEP) was calculated via 500 Monte-Carlo runs, processing the data in blocks,

BEP = (Number of bit errors) / (Number of bits).  (14)

It is assumed that the means of the signals are known. Fig. 1 shows the error probability distribution for each symbol of the processed data block. It is clear that the probability of error is higher at the ends of the block, where the signature waveforms of the interfering signal and the SOI are more correlated. The BEP of the algorithm is examined as a function of the Signal-to-Interference Ratio (SIR), and a comparison is made to the performance of the matched filter. The algorithm is studied using the WT at scale T1. The symbol durations of the SOI and the interference are equal to T1 = 1 and T2 = 1.004, respectively. As shown in Fig. 1, the CIMTS algorithm is near-far resistant and its performance is not a function of the energy ratio of the signals. This could have been intuitively concluded by examining Eq. 5, where the energy of the signal is recovered along with the transmitted data.
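The Monte-Carlo estimate in Eq. (14) is the usual error-counting loop. The sketch below applies it to a plain BPSK matched-filter detector in AWGN (an assumption, chosen because its BEP has the known closed form Q(1/σ) to check against):

```python
import numpy as np
from math import erfc, sqrt

def monte_carlo_bep(detect, snr_db, n_runs=500, block=64, seed=0):
    """Estimate the bit error probability as (number of bit errors) /
    (number of bits) over independent Monte-Carlo blocks (Eq. (14))."""
    rng = np.random.default_rng(seed)
    sigma = 10.0 ** (-snr_db / 20.0)
    errors = bits = 0
    for _ in range(n_runs):
        tx = rng.choice([-1.0, 1.0], block)
        rx = tx + sigma * rng.standard_normal(block)
        errors += np.sum(detect(rx) != tx)
        bits += block
    return errors / bits

# sanity check against the BPSK matched-filter bound Q(1/sigma)
bep = monte_carlo_bep(np.sign, snr_db=6.0)
q = 0.5 * erfc(10.0 ** (6.0 / 20.0) / sqrt(2.0))
print(abs(bep - q) < 5e-3)  # close to theory
```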
References
[1] S. Verdu, "Minimum probability of error for asynchronous Gaussian multiple-access channels," IEEE Trans. Inform. Theory, vol. 32, pp. 85-96, 1986.
[2] S. Heidari and C. L. Nikias, "Co-channel interference mitigation in the time-scale domain: The CIMTS algorithm," to appear in IEEE Trans. on Signal Processing.
[3] D. L. Donoho, "Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data," Proceedings of Symposia in Applied Mathematics, pp. 173-205, 1993.
[4] T. Q. Nguyen, "Partial spectrum reconstruction using digital filter banks," IEEE Trans. on Signal Processing, vol. 41, 1993.
[5] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, Penn: SIAM, 1992.
Design and Performance of DS/SS Signals Defined by Arbitrary Orthonormal Functions
Jeffrey C. Dill, Ph.D., Dept. of Electrical and Computer Engr., Ohio University, Athens, OH 45701. Phone: 614-593-1585, email: [email protected]
William W. Jones, Ph.D., STM Wireless, Inc., One Mauchly, Irvine, CA 92718. Phone: 714-789-2681, [email protected]
Abstract A unified framework is presented for the design and performance of anti-jam DS/SS signals defined by arbitrary orthonormal functions with various wavelet functions being of most interest. In this framework, the receiver is generically viewed as a projection operation followed by a filter weighting strategy. A theoretical analysis is presented comparing the optimum weighting with conventional uniform and excision weighting. Based on these results, we are led to conclude that the DS/SS signal should be designed such that the receiver processing localizes the interference and then excises those dimensions deemed corrupted. Finally, wavelet-based waveforms are examined which localize in time, frequency and both time and frequency. Based on our theoretical conclusions, a waveform characterized by time and frequency dimensionality appears to be an effective solution against disparate interference types. I. Introduction In recent years, a number of new spread spectrum modulation formats have been defined which can be characterized by a set of orthonormal functions upon which the PN sequence modulates. For example, in multicarrier modulation (MCM), the PN sequence modulates a set of complex exponentials with proper frequency separation to insure orthogonality [ 1]. Using M-band wavelets, a waveform similar to MCM is given by
m(t) = \sqrt{\frac{E}{M}} \sum_{j} a_j \left[ c_0\, \phi_0\!\left(\frac{t}{T} - j\right) + \sum_{n=1}^{M-1} c_n\, \psi_n\!\left(\frac{t}{T} - j\right) \right] \qquad (1)
where {a_j} is the data sequence with symbol energy E and duration T, {c_n} is the chip sequence, M is the dimensionality, and φ_0 and {ψ_n} are the M-band scaling and wavelet functions, respectively. This waveform has been designated spread spectrum M-band wavelet modulation (SS-MWM) [2]. In spread spectrum multiscale modulation (SS-MSM) [2], the orthonormal functions are the scaling function and dyadic scalings of the wavelet function, specifically,
m(t) = \sqrt{\frac{E}{M}} \sum_{j} a_j \left[ c_0\, \phi\!\left(\frac{t}{T} - j\right) + \sum_{n=1}^{\log_2 M} 2^{\frac{n-1}{2}} \sum_{k=0}^{2^{n-1}-1} c_{2^{n-1}+k}\, \psi\!\left(2^{n-1}\Big(\frac{t}{T} - j\Big) - k\right) \right] \qquad (2)
where φ and ψ are now the dyadic scaling and wavelet functions, respectively. Indeed, classical DS/SS can be placed into this general framework. Namely, the orthonormal functions are time translations of the ISI-free chip pulse, as seen in the following
m(t) = \sqrt{\frac{E}{M}} \sum_{j} a_j \sum_{k=0}^{M-1} c_k\, \phi\!\left(M\Big(\frac{t}{T} - j\Big) - k\right) \qquad (3)
Traditionally φ is an NRZ chip pulse, but more generally, it can be any scaling function. Each of these signals can be viewed as having the following generic definition
m(t) = \sqrt{\frac{E}{M}} \sum_{j} a_j \sum_{k=0}^{M-1} c_k\, \beta_k\!\left(\frac{t}{T} - j\right) \qquad (4)
where β_k identifies the k-th orthonormal function defining the waveform. Although (4) is a convenient analytical representation, the physical manner in which dimensionality, and hence processing gain, is achieved is quite different between the waveforms. As a consequence, their performance against different interference models is quite different.
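As a concrete instance of the generic definition (4), the following sketch builds a classical DS/SS symbol from M time-shifted rectangular chip pulses and recovers it by projection. The sample counts, the PN sequence, and the omission of the sqrt(E/M) scaling are illustrative simplifications, not taken from the paper.

```python
import numpy as np

# Classical DS/SS as an instance of (4): the orthonormal functions beta_k
# are M time-shifted, unit-energy rectangular (NRZ) chip pulses.
# M, the samples-per-chip count and the PN sequence are illustrative.
M = 8                 # waveform dimensionality (chips per symbol)
ns = 16               # samples per chip
rng = np.random.default_rng(0)
chips = rng.choice([-1.0, 1.0], size=M)   # PN chip sequence {c_k}
a = 1.0                                   # one data symbol a_j

# Build the orthonormal basis beta_k and check <beta_j, beta_k> = delta_jk.
basis = np.zeros((M, M * ns))
for k in range(M):
    basis[k, k * ns:(k + 1) * ns] = 1.0 / np.sqrt(ns)
gram = basis @ basis.T
assert np.allclose(gram, np.eye(M))

# Transmit m = a * sum_k c_k beta_k, then project back onto the basis
# (the "sampled matched filtering" of Section II) and remove the PN.
m = a * chips @ basis
proj = basis @ m                          # projection coefficients
soft = np.dot(chips, proj)                # uniform weighting, w = U
print(soft)                               # a * M = 8.0
```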
The purpose of the present paper, then, is to make a general examination of DS/SS signals defined on arbitrary orthonormal functions as given in (4). This unified viewpoint will allow us to draw general conclusions concerning the design and performance of DS/SS signals under different interference conditions. Once this is accomplished, a more detailed examination of each of the waveforms defined in (1)-(3) will be provided.
II. Receiver Processing
We begin by generically viewing the receiver as a projection operation onto the orthonormal functions defining the particular waveform, and then develop an optimal filter which operates on the coefficients of these projections. The received signal can thus be written as
r(t) = m(t) + j(t) + n(t) \qquad (5)
where m(t) can be any of the waveforms defined in (1)-(3), j(t) is the additive interference and n(t) is the AWGN. The receiver projects the received signal upon the basis functions defining the signal and then multiplies these coefficients with the appropriate chip. This projection operation is equivalent to sampled matched filtering. The projection coefficients associated with the a_j symbol after PN removal can be expressed as the M-dimensional column vector

\mathbf{r} = a_j \sqrt{\frac{E}{M}}\, \mathbf{U} + \mathbf{J} + \mathbf{N} \qquad (6)
where U is the ones vector. Applying a weighting to this vector, the soft decision output is
h_j = \mathbf{w}^T \mathbf{r} \qquad (7)
Note, in the classical DS/SS system employing the waveform in (3), the weighting is uniform, i.e., W = U . Since BER performance is proportional to SNR, we will choose the weights which achieve the maximum SNR. This problem was solved in the field of sensor arrays [3]. In particular the optimum weights are
\mathbf{w}_{\mathrm{opt}} = \mu\, R_u^{-1} \mathbf{U} \qquad (8)
where μ is an arbitrary constant and R_u = R_J + R_N is the correlation matrix of the combined disturbance in (6). The optimum SNR is given by
\mathrm{SNR}_{\mathrm{opt}} = \frac{E}{M}\, \mathbf{U}^T R_u^{-1} \mathbf{U} \qquad (9)

when the weighting is given by (8).
III. Performance Analysis
Assuming the thermal noise is AWGN, its correlation matrix is given by R_N = (N_0/2) I, where I is the identity matrix. To characterize the interference, we assume the interference and the PN sequence are statistically independent. Assuming further that the PN sequence decorrelates the components in J, R_J is a diagonal matrix whose k-th entry along the diagonal is α_k E_J, where α_k is the fractional interference energy in the k-th dimension and E_J = P_J T is the total interference energy in a data symbol period. With these results, the optimum weights become
\mathbf{w}_{\mathrm{opt}} = \mu \left[ \frac{1}{\alpha_0 P_J T + N_0/2},\ \ldots,\ \frac{1}{\alpha_{M-1} P_J T + N_0/2} \right]^T \qquad (10)
Under typical interference conditions, the interference is confined to a fraction ρ = K/M of the available waveform dimensions. Letting N_K denote the set of indices corresponding to where interference energy is present, the optimum SNR becomes

\mathrm{SNR}_{\mathrm{opt}} = \frac{E}{M} \sum_{k \in N_K} \frac{1}{\alpha_k P_J T + N_0/2} + \frac{2E}{N_0}(1-\rho) \qquad (11)
When the interference power is equal among the dimensions, i.e., α_k = 1/M, the optimum weights correspond to a uniform weighting. This is easily verified with an appropriate selection of μ. In general, the SNR with uniform weighting is given by

\mathrm{SNR}_U = \frac{1}{\dfrac{P_J T}{E M} + \dfrac{N_0}{2E}} \qquad (12)
Another suboptimum strategy is termed excision. Considering (10), if the interference is very large in any particular dimension, then the weight associated with that dimension approaches zero; the filter weighting is then effectively excising this dimension. In this case, the SNR becomes

\mathrm{SNR}_{\mathrm{exc}} = \frac{2E}{N_0}(1-\rho) \qquad (13)
Comparing (11) and (13), we see that these two weighting strategies yield comparable performance when the interference is localized to a minimum number of waveform dimensions, independent of the interference level. To understand what level of localization is required, we compare (11)-(13) as a function of the fraction of corrupted waveform dimensions for the case where M = 1000 and E/N_0 = 10 dB. Figure 1 illustrates the detection SNR under a low interference condition, that is, an interference-to-signal power ratio (ISR) of 10 dB. From these results, we see that the uniform weighting strategy approaches the optimum when a large number of dimensions are corrupted. But when the interference is localized to less than 18% of the total dimensions, the excision weighting yields superior performance, even approaching the optimum as the fraction diminishes. This behavior is even more pronounced at a moderate level of interference (25 dB), as illustrated in Figure 2. Here, as much as 50% of the dimensions can be removed while maintaining excellent performance. Consequently, we are led to conclude that the DS/SS waveform should be designed such that the interference is concentrated in a minimum number of waveform dimensions, and then to apply an excision strategy. Further performance results, including bit error probability, can be found in [2].
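The tradeoff described above can be reproduced numerically from (11)-(13). The 25 dB ISR value below and the assumption that the interference energy is split evenly over the K corrupted dimensions (α_k = 1/K) are illustrative choices, not specified by the paper.

```python
import numpy as np

# Detection SNR of the three weighting strategies, per (11)-(13),
# with E/N0 = 10 dB and M = 1000 as in the text. The 25 dB ISR and the
# even split alpha_k = 1/K over corrupted dimensions are assumptions.
M = 1000
E = 1.0
N0 = E / 10.0                  # E/N0 = 10 dB
Ej = E * 10 ** (25.0 / 10)     # interference energy E_J = P_J T (ISR = 25 dB)

def snrs(rho):
    """Return (SNR_opt, SNR_uniform, SNR_excision) for corrupted fraction rho."""
    K = max(int(rho * M), 1)
    opt = (E / M) * K / (Ej / K + N0 / 2) + (2 * E / N0) * (1 - K / M)
    uni = 1.0 / (Ej / (E * M) + N0 / (2 * E))
    exc = (2 * E / N0) * (1 - K / M)
    return opt, uni, exc

for rho in (0.05, 0.5, 0.9):
    o, u, e = snrs(rho)
    print(f"rho={rho:.2f}: opt={10*np.log10(o):.1f} dB, "
          f"uni={10*np.log10(u):.1f} dB, exc={10*np.log10(e):.1f} dB")
# Excision beats uniform weighting when the interference is localized
# (small rho); uniform weighting wins when most dimensions are corrupted.
```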
IV. Discussion
As examples, we consider the projection of two contrasting types of interference on the functions defining classical DS/SS, SS-MSM and SS-MWM. The interference types are the time-domain impulse and the single-frequency tone. A simple tiling diagram [2] clearly demonstrates that classical DS/SS localizes the impulse, but the tone is spread across all coordinates. Conversely, SS-MWM localizes the tone, but the impulse is distributed across all of its coordinates. With SS-MSM, both the impulse and the tone are localized, as expected from dyadic wavelet theory. Thus, based on our theoretical conclusions, SS-MSM with time-frequency excision appears to be an effective solution against disparate interference conditions. Finally, recent developments in wavelet packets, which can be placed into the framework presented in this paper, indicate that superior interference localization with super-symbol tuning can be achieved [4]. But this improved localization comes with certain practical drawbacks. Namely, the channel conditions must be known, which is questionable in a dynamic hostile environment. Further, this knowledge must be made available synchronously to both the transmitter and receiver, which greatly complicates the potential for adaptive processing. In conclusion, it appears that SS-MSM not only represents a good tradeoff in localizing diverse interference sources, but is also a good tradeoff in transceiver complexity, since the waveform itself is not required to be adaptive to provide effective interference mitigation.
V. References
[1] S. Kondo and L. Milstein, "On the use of multicarrier direct sequence spread spectrum systems," Proc. 1993 IEEE MILCOM Conference, October 11-14, Boston, MA, pp. 52-56.
[2] W. Jones, "A unified approach to orthogonally multiplexed communication using wavelet bases and digital filter banks," Ph.D. Dissertation, Ohio University, August 1994.
[3] R. Compton, Jr., Adaptive Antennas: Concepts and Performance, Prentice Hall, 1988.
[4] A. Lindsey, "Generalized orthogonally multiplexed communication via wavelet packet bases," Ph.D. Dissertation, Ohio University, June 1995.
Figure 1. Detection SNR versus fraction of dimensions corrupted, ISR = 10 dB, E/N_0 = 10 dB.

Figure 2. Detection SNR versus fraction of dimensions corrupted, ISR = 25 dB, E/N_0 = 10 dB.
COFDM, MC-CDMA, and Wavelet-Based MC-CDMA

KyungHi Chang and XuDuan Lin

ETRI, Taejon 305-600, KOREA
[email protected]
Abstract--This tutorial paper addresses various aspects of multicarrier modulation (MCM) techniques and compares their performances. Parameters for the conceptual design of orthogonal frequency division multiplexing (OFDM) and multicarrier CDMA (MC-CDMA) systems are given based on communication-link performance. The evolution of these conventional MCM systems results in wavelet-based MC-CDMA systems, which are proposed in this paper. With all the advantages of the conventional MC-CDMA, the wavelet-based MC-CDMA systems provide not only higher bandwidth efficiency than the conventional MC-CDMA systems, but also new dimensions for anti-fading and interference immunity through the suitable choice of the wavelet functions and the wavelet frequency bands. By these results, wavelet-based MC-CDMA can be a feasible candidate for applications in future public land mobile telecommunication systems (FPLMTS)/international mobile telecommunications (IMT)-2000 and mobile multimedia.
1 Introduction
For the third-generation mobile communication system, i.e., FPLMTS/IMT-2000, handling high data rates in wireless radio environments is one of the key issues. The near-term target data rate for FPLMTS/IMT-2000 is 2 Mbps, but for the mobile multimedia environment the rate should be above 2 Mbps, e.g., eventually up to 20 Mbps. A common air interface (CAI) standard based on single-carrier transmission may be troublesome due to the complex equalizers required for high-rate data under multipath propagation. Moreover, a single-carrier transmission system with an adaptive time-domain equalizer does not perform well against intersymbol interference (ISI) on channels with a very long impulse response, due to the physical limitation on the number of filter taps. Technologies similar to the current IS-95 DS-CDMA cannot be good candidates either, due to the small spreading gain and the ultra-high-speed requirement at the demodulation stage for the transmission and reception of high-rate data, respectively. The receiver of the DS-CDMA system, which utilizes time diversity, needs high-speed rake processing to increase the resolution of the reception. Usual MCM systems enjoy the advantage of narrowband communication, which enables the transmission of high-rate data [1]. The modulator and the entire bank of correlators in the demodulator of MCM systems are nothing but a single IFFT and FFT block, respectively. The IFFT and FFT are computed once per N samples, i.e., once per symbol interval, where N is the number of carriers. Due to the remarkable achievements in VLSI technology, the FFT has been the favorite choice for the implementation of MCM; today, even 0.18 um fabrication technology has been developed. In this paper, starting from the conceptual design of the OFDM and MC-CDMA systems, three novel MC-CDMA systems, based on wavelet orthogonality and other properties, are proposed.
The basic advantages and performances of the three novel wavelet-based MC-CDMA systems (wavelet-based MC/BPSK-CDMA, wavelet-based MC/QPSK-CDMA, and fractal MC-CDMA) are investigated and compared with the conventional MCM systems. Their potential applications in broadband wireless communications to improve bandwidth efficiency and combat fading and narrowband interference are also discussed.
2 Coded Orthogonal Frequency Division Multiplexing
OFDM, which is one variety of the MCM technique, has been known since [2] and [3], and it was used in military HF communication systems. The OFDM signal at the m-th transmitter is represented as
s_m(t) = b_m \cdot p_{T_b}(t - kT_b) \cdot \cos(2\pi f_c t + \varphi), \qquad (1)
where b_m is obtained from the input a_m, p_{T_b} designates the transmit filter impulse response on the interval [0, T_b], f_c is a carrier frequency, and φ is a carrier phase. Here, the k-th output sample of the j-th IDFT block, b_m[k, j], becomes

b_m[k, j] = \frac{1}{N} \sum_{i=0}^{N-1} a_m[i, j]\, e^{j 2\pi k i / N}, \qquad k = 0, 1, \ldots, N-1, \qquad (2)
where a_m[i, j] represents the i-th input symbol of the j-th IDFT block. Then the resulting real and imaginary parts of (2) are concatenated and transmitted, or only the real part of (2) is transmitted [4]. In the latter case, the receiver needs to sample twice as fast as the transmitter and must perform a 2N-point DFT operation; thus, a tradeoff between efficient channel usage and receiver complexity does exist. The basic principles of the OFDM rely on transforming the serial bit stream to parallel bits and on splitting the information to be transmitted over a moderate number of RF carriers. This serial-to-parallel conversion can make the pulse duration larger than the time delay spread T_d of the channel, sometimes with the help of a guard interval in the time domain, thus decreasing ISI. That is, the bandwidth of a transmitted narrowband signal is smaller than the coherence bandwidth BW_c of the channel, which results in a flat fading channel and so alleviates the channel equalization problem in the time domain. The conversion also makes possible the high data rates of multimedia transmission and increased robustness against impulse noise in the time domain: time-domain impulse interference is averaged out over the entire FFT block. An additional degree of freedom in the OFDM system is the independently selected signal constellations used for the different carriers, in accordance with the channel attenuation, interference and impulse noise at the corresponding frequencies. Due to its nature, however, the OFDM system is vulnerable to frequency-domain impulse interference or tone interference. Moreover, different data symbols are transmitted on different subcarriers in the OFDM. Thus, powerful channel coding is essential in the OFDM. Channel coding with frequency interleaving allows the link to transmit the original data on separated carriers, so that a carrier under fading can be recovered with the help of other unfaded carriers.
That is, an implicit form of frequency diversity exists. Time interleaving is also utilized in the OFDM system. Concatenated coding can be adopted to obtain high coding gains with moderate decoding complexity. For the two levels of forward error correction (FEC), the inner code uses a convolutional code or trellis-coded modulation (TCM), and a Reed-Solomon (RS) code is usually employed as the outer code [5]. TCM coding only increases the constellation size and uses this additional redundancy to trellis-code the signal without bandwidth expansion. Here, one role of the outer code is to handle the burst errors generated by the inner code. Besides, the turbo code, which is an all-convolutional concatenated code, offers good performance and reasonable decoding complexity [6]. Adaptive loading of each carrier is desired so that the bit error rates (BER) in all the sub-channels are equal, to achieve the maximum transmission rate. A simplified block diagram of the OFDM transmitter employing concatenated FEC coding is shown in Fig. 1. With a good choice of the channel coding, the coded OFDM (COFDM) system can also enjoy implicit time-domain diversity under a channel of appropriate time dispersion, e.g., 1.5 MHz of BW_c for outdoor propagation in the European digital audio broadcasting (DAB) project [7]. For digital video broadcasting (DVB), a real-time implementation of a TV COFDM system supports a bit rate of 21 Mbps in a frequency-selective channel [6], [8]. Under the name of discrete multitone (DMT), the OFDM has also become a standard multiple access scheme for asymmetric digital subscriber line (ADSL) services [8]. In contrast to the nice behavior of the COFDM under the time-dispersive channel, the COFDM system under a frequency-dispersive channel, caused by rapid time variations of the channel, easily loses the orthogonality of subcarriers, which results in increased BER. Other frequency-nonlinear
Figure 1: Block diagram of the OFDM transmitter employing concatenated FEC coding.
characteristics, including instability of oscillators and nonlinear amplification, also cause interchannel interference (ICI). The linearity requirement on the amplifier is tighter than in other MCM systems. Therefore, for each branch of the COFDM receiver, an equalizer against the frequency-dispersive flat fading channel is necessary [9], which increases the overall complexity of the receiver. The COFDM system with the frequency-domain equalizer can handle channels with a larger impulse response than the single-carrier transmission system can, and the frequency-domain equalizer takes the form of a complex multiplier bank at the FFT output in the receiver. To avoid ICI, even though the carrier spectra overlap, the carrier spacing is selected so that the carriers are located at the zero-crossing points, achieving frequency-domain orthogonality. This situation is just the frequency-domain correspondent of ISI cancellation in the time domain. The design of orthogonal waveforms on each subcarrier [6] and the design of interdependent subcarrier waveforms to minimize the peak-to-average power ratio of the total signal [10] may be conflicting goals. Nonlinear distortion and carrier synchronization issues, inherent in the OFDM system, are the most difficult to handle; in particular, the carrier synchronization problem is not solved at all by frequency-domain equalization. Multiplexing/multiple access in the COFDM system takes the form of OFDM/FDMA, so the COFDM is more adequate for broadcasting purposes. For cellular use, techniques such as dynamic channel allocation are required; dynamic channel allocation alleviates the need for frequency planning. There is another approach to utilizing the OFDM technique in the cellular network, namely the MC/DS-CDMA system [11]. It is a mixed version of the OFDM and the DS-CDMA, but it may lose the inherent advantage of the OFDM in the time domain.
Moreover, its validity for application in the mobile multimedia environment, which needs the transmission of high-rate data, becomes weak. Even though the MC/DS-CDMA approach seems to increase the capacity of a wireless cellular network by the use of spreading codes, the overall performance increase of the cellular network, compared to the COFDM, is not so promising due to the mentioned drawbacks. Variants of the above concept are also introduced in [12]-[14] with comparisons to the DS-CDMA. At the least, the above structures outperform the DS-CDMA, especially for large numbers of users.
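The IDFT modulator of (2) and the frequency-domain equalizer discussed in this section (a complex multiplier bank at the FFT output, enabled by a guard interval) can be sketched together. N, the prefix length, the BPSK-like symbols, and the channel taps below are illustrative assumptions.

```python
import numpy as np

# With a cyclic-prefix guard interval longer than the channel impulse
# response, linear convolution becomes circular, so the channel reduces
# to one complex multiplication per subcarrier at the FFT output
# (zero-forcing equalization on a noiseless channel).
N, cp = 64, 8
rng = np.random.default_rng(2)
a = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)
h = np.array([1.0, 0.5, 0.2])            # 3-tap multipath channel (< cp)

x = np.fft.ifft(a)                       # OFDM modulator: one IDFT block
x_cp = np.concatenate([x[-cp:], x])      # prepend guard interval
y = np.convolve(x_cp, h)[cp:cp + N]      # channel; receiver drops prefix

H = np.fft.fft(h, N)                     # per-subcarrier channel response
a_hat = np.fft.fft(y) / H                # the complex multiplier bank

assert np.allclose(a_hat, a)             # noiseless channel: exact recovery
```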
3 Multi-Carrier CDMA
The MC-CDMA is a digital modulation technique where a single data symbol is transmitted on multiple narrowband subcarriers, with each subcarrier encoded with a phase offset of 0 or π based on a spreading sequence. The code rate of the spreading sequence in the MC-CDMA system is the same as the rate of the incoming bit stream, so there is no actual spreading, except by the subcarriers. So the concept of MC-CDMA is not the same as that of MC/DS-CDMA. This modulation scheme is also a multiple access technique in the sense that different users use the same set of subcarriers but with a different spreading sequence that is orthogonal to the sequences of all other users [15]. That is, there exists a double orthogonality, by the spreading code and by the multicarrier. Therefore, cochannel interference (CCI) is reduced by the use of the spreading code, and ICI by the multicarrier. The same data on different subcarriers in the MC-CDMA system guarantees an explicit form of frequency diversity, hence a frequency-domain rake. Thus, there is no strict requirement of a linear amplifier; instead, a power-efficient broadband amplifier may be sufficient. Multiplexing/multiple access in the MC-CDMA system
takes the form of MC-CDM/CDMA; that is, the MC-CDMA is more suitable for use in cellular applications than the COFDM. The MC-CDMA signal at the m-th transmitter can be represented as

s_m(t) = \sum_{i=0}^{N-1} c_m[i]\, a_m[k]\, p_{T_b}(t - kT_b) \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right), \qquad (3)
where c_m[i] is a chip from the m-th spreading sequence of length N, a_m[k] is the k-th input data symbol for the m-th user, p_{T_b} is a unit-amplitude pulse that is non-zero on the interval [0, T_b], f_c is a carrier frequency, and F describes the spacing between subcarrier frequencies. With F = 1, the structure of the signal is exactly that of the OFDM. The spreading codes employed so far are ordinary PN codes or Walsh-Hadamard codes. A desirable code for the MC-CDMA system should have a large number of sequences in its class, with a sequence length equal to the number of subcarriers N. Compared to the COFDM, the distinguishing advantage is the explicit form of frequency diversity, but the main drawback is the bandwidth efficiency. The current MC-CDMA imposes less hardware burden than the COFDM, since no frequency-domain equalizer is necessary; instead, only a very simple form of optimum gain combining is needed. From the ISI point of view, there is a compromise between the channel code rate r in the COFDM and the number of subcarriers N in the MC-CDMA. Hence, except for the bandwidth efficiency, the overall performance of the MC-CDMA is better than that of the COFDM in time- and frequency-dispersive channels. To combat frequency-selective fading, the DS-CDMA and the MC-CDMA employ time and frequency diversity, respectively. For a flat fading channel of moderate bandwidth, the MC-CDMA may still show good performance, whereas the DS-CDMA may be unable to cope, owing to the choice of the preset threshold values for power control in the microcontroller. Compared to the COFDM, the only disadvantage of the MC-CDMA, namely bandwidth efficiency, can be alleviated by the use of wavelets. The excellent time and frequency locality of wavelets can be devoted to increasing the bandwidth efficiency of the MC-CDMA system. An additional degree of freedom to combat the effect of the fading channel is another advantage.
The tradeoff is an increase in the complexity of the modulator/demodulator. Therefore, it can be a feasible approach for applications such as FPLMTS/IMT-2000 and mobile multimedia.
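The double orthogonality described in this section (orthogonal subcarriers plus orthogonal spreading codes across users) can be sketched in discrete baseband. The Walsh-Hadamard codes, two users, and N = 8 subcarriers below are illustrative assumptions.

```python
import numpy as np

# Discrete-baseband MC-CDMA sketch of (3): each user repeats ONE data
# symbol on all N subcarriers, phase-coding subcarrier i with chip c_m[i].
N = 8
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H = H2
while H.shape[0] < N:            # Sylvester construction of Walsh-Hadamard
    H = np.kron(H, H2)
codes = H[:2]                    # spreading sequences for 2 users
data = np.array([1.0, -1.0])     # one BPSK symbol per user

# Transmit: sum both users on each subcarrier, then one IFFT.
freq = codes.T @ data            # subcarrier amplitudes
tx = np.fft.ifft(freq)

# Receiver of user 0: FFT back to subcarriers, despread with its code.
rx = np.fft.fft(tx).real
d0 = codes[0] @ rx / N           # code orthogonality cancels user 1
assert abs(d0 - data[0]) < 1e-9
```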
4 Wavelet-Based MC-CDMA
Wavelets have been a very hot topic in recent years. Their applications range from function approximation, multiresolution signal representation, and image compression to signal processing and other fields. The popularity of wavelets is primarily due to the interesting structure they provide based on dilation and translation. A few investigators have begun to exploit those features of wavelets that suggest their application in communications [16]-[21].

4.1 Wavelets and Related Properties
Let ψ(t) be a mother wavelet waveform; then a complete orthogonal set of daughter wavelets ψ_{a,b}(t) can be generated from ψ(t) by dilation by a factor a = 2^j and shift by an amount b = 2^j k. We denote

\psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(\frac{t - 2^j k}{2^j}\right). \qquad (4)

It can be shown that the dilated and translated wavelets are orthogonal to each other:

\langle \psi_{j,k}(t), \psi_{m,n}(t) \rangle = \int_{-\infty}^{\infty} \psi_{j,k}(t)\, \psi_{m,n}(t)\, dt = \delta_{j-m}\, \delta_{k-n}. \qquad (5)
For any given wavelet ψ_{j,k}(t), there exists a corresponding scaling function φ_{j,k}(t), also generated from a mother scaling function φ(t). The scaling functions satisfy the following relations [18]:

\langle \phi_{j,k}(t), \phi_{j,n}(t) \rangle = \delta_{k-n} \qquad (6)
401
Figure 2: The m-th transmitter model for the wavelet-based MC/BPSK-CDMA system.

For any j < m,
\langle \phi_{j,k}(t), \psi_{m,n}(t) \rangle = 0. \qquad (7)
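The orthonormality relation (5) can be verified numerically with discrete Haar wavelets. Haar is used here only because it can be written down exactly, and the dyadic sampling grid below is an illustrative discretization, not part of the paper.

```python
import numpy as np

# Discrete Haar daughters psi_{j,k}(t) = 2^{-j/2} psi((t - 2^j k)/2^j)
# on a dyadic grid of L samples; the Gram matrix of the whole family
# should be the identity, matching the orthonormality relation (5).
L = 64

def haar_psi(j, k):
    u = (np.arange(L) - 2**j * k) / 2**j
    out = np.where((u >= 0) & (u < 0.5), 1.0,
          np.where((u >= 0.5) & (u < 1.0), -1.0, 0.0))
    return out / np.sqrt(2**j)           # unit energy at every scale

fam = np.array([haar_psi(j, k) for j in (1, 2, 3) for k in range(L // 2**j)])
gram = fam @ fam.T                       # all pairwise inner products
assert np.allclose(gram, np.eye(len(fam)))
```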
These relations are the basis of the wavelet transform applications in communications. There exist many families of wavelets and scaling functions. In communications applications, it is usually required that the wavelet be smoother than the simplest Haar wavelet and provide better temporal as well as spectral localization.

4.2 Wavelet-Based MC-CDMA
By using the self- and cross-orthogonality of the scaling functions φ and the wavelet functions ψ, we now propose novel wavelet-based MC-CDMA systems. In our wavelet-based MC-CDMA systems, there exist three levels of orthogonality: the subcarrier frequencies are orthogonal to each other, the wavelets and scaling functions are orthogonal to each other, and the spreading sequences are also orthogonal to each other. The wavelet-based MC/BPSK-CDMA signal for the m-th transmitter can be described as follows:
s_m(t) = \sum_{i=0}^{N-1} \left\{ \frac{c_m[i]\, a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t - kT_b}{T_b}\right) + \frac{c_m[i]\, b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t - kT_b}{T_b}\right) \right\} \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right), \qquad (8)
where T_b is a power of 2, and a_m[k] and b_m[k] are two independent data symbols at the k-th bit interval. Shown in Fig. 2 is a model of the wavelet-based MC/BPSK-CDMA transmitter for the m-th user. At the receiver, assuming there are M active users and the channel is noiseless, the received signal is

r(t) = \sum_{m=0}^{M-1} \sum_{i=0}^{N-1} \left\{ \frac{c_m[i]\, a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t - kT_b}{T_b}\right) + \frac{c_m[i]\, b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t - kT_b}{T_b}\right) \right\} \cos\!\left(2\pi f_c t + 2\pi i \frac{F}{T_b} t\right). \qquad (9)

Assume that m = 0 corresponds to the desired signal. In the 0-th receiver, there are N passband filters, with the i-th one centered at the frequency f_c + iF/T_b, so the received signal r(t) is first converted back to baseband in each i-th branch of the receiver:

r_i(t) = \sum_{m=0}^{M-1} \left\{ \frac{c_m[i]\, a_m[k]}{\sqrt{T_b}}\, \phi\!\left(\frac{t - kT_b}{T_b}\right) + \frac{c_m[i]\, b_m[k]}{\sqrt{T_b}}\, \psi\!\left(\frac{t - kT_b}{T_b}\right) \right\}. \qquad (10)

Now the signal r_i(t) is filtered separately by two matched filters with impulse responses T_b^{-1/2} φ((JT_b - t)/T_b) and T_b^{-1/2} ψ((JT_b - t)/T_b), respectively, where T = JT_b is the duration of φ and ψ, and the filter outputs are sampled at t = nT_b, which results in the following variables:
y_i(nT_b) = r_i(t) * T_b^{-1/2}\, \phi\!\left(\frac{JT_b - t}{T_b}\right)\Big|_{t=nT_b} = \sum_{m=0}^{M-1} c_m[i]\, a_m[n-J], \qquad (11)
and

z_i(nT_b) = r_i(t) * T_b^{-1/2}\, \psi\!\left(\frac{JT_b - t}{T_b}\right)\Big|_{t=nT_b} = \sum_{m=0}^{M-1} c_m[i]\, b_m[n-J]. \qquad (12)
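The separation performed by the two matched-filter branches of (11) and (12) rests only on the orthogonality of φ and ψ. A minimal single-subcarrier sketch with a discrete Haar φ/ψ pair (an illustrative choice, not the paper's preferred smooth wavelets) is:

```python
import numpy as np

# Two independent symbols ride on the scaling function phi and the
# wavelet psi of the same interval; the receiver separates them by the
# orthogonality <phi, psi> = 0 (discrete Haar pair, unit energy).
phi = np.array([1.0, 1.0]) / np.sqrt(2)   # scaling function
psi = np.array([1.0, -1.0]) / np.sqrt(2)  # wavelet, orthogonal to phi
a0, b0 = 1.0, -1.0                        # the two data symbols

baseband = a0 * phi + b0 * psi            # per-branch baseband signal
y = np.dot(phi, baseband)                 # phi-matched filter, as in (11)
z = np.dot(psi, baseband)                 # psi-matched filter, as in (12)
assert abs(y - a0) < 1e-9 and abs(z - b0) < 1e-9
```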
Then y_i(nT_b) is multiplied by c_0[i], and summing over i gives

u(n) = \sum_{i=0}^{N-1} c_0[i]\, y_i(nT_b) = a_0[n-J]. \qquad (13)
Similarly, we have

v(n) = \sum_{i=0}^{N-1} c_0[i]\, z_i(nT_b) = b_0[n-J]. \qquad (14)
Therefore, we recover the data symbols a_0[n-J] and b_0[n-J] for n = 0, ±1, .... Now we generalize the wavelet-based MC/BPSK-CDMA system to the following wavelet-based MC/QPSK-CDMA system. The transmitted signal of the m-th user in the wavelet-based MC/QPSK-CDMA system is:
s_m(t) = \sum_{i=0}^{N-1} \Big\{ \Big[ \frac{c_m[i]\, a_m[k]}{\sqrt{T_b}}\, \phi\!\Big(\frac{t-kT_b}{T_b}\Big) + \frac{c_m[i]\, b_m[k]}{\sqrt{T_b}}\, \psi\!\Big(\frac{t-kT_b}{T_b}\Big) \Big] \cos\!\Big(2\pi f_c t + 2\pi i \frac{F}{T_b} t\Big) + \Big[ \frac{c_m[i]\, a'_m[k]}{\sqrt{T_b}}\, \phi\!\Big(\frac{t-kT_b}{T_b}\Big) + \frac{c_m[i]\, b'_m[k]}{\sqrt{T_b}}\, \psi\!\Big(\frac{t-kT_b}{T_b}\Big) \Big] \sin\!\Big(2\pi f_c t + 2\pi i \frac{F}{T_b} t\Big) \Big\}, \qquad (15)
where the sequences {a_m[k]}, {b_m[k]}, {a'_m[k]} and {b'_m[k]} are four independent data symbol sequences, usually taking the values ±√E. At the receiver, the in-phase and quadrature signals are first separated by the orthogonality of cos(2πf_c t + 2πi(F/T_b)t) and sin(2πf_c t + 2πi(F/T_b)t) for i = 0, 1, ..., N-1; then the separated in-phase and quadrature signals s_{I,i}(t) and s_{Q,i}(t) can be demodulated by separate matched filters with φ and ψ as the impulse responses, followed by sampling and hard-decision devices. Assuming T_b = 2^j, it can easily be seen that the above wavelet-based MC-CDMA systems use only a single wavelet frequency band, corresponding to the j in ψ_{j,k}(t) and φ_{j,k}(t) for k = 0, ±1, .... So in each branch i we can form 'near-baseband' signals by summing several single-frequency-band wavelet-modulated signals, and we use the resulting 'near-baseband' signals to replace the corresponding baseband signals in the wavelet-based MC/QPSK-CDMA system. We thus obtain the following fractal MC-CDMA system:
s_m(t) = \sum_{i=0}^{N-1} \sum_{j \in U} \Big\{ \big[ c_m[i]\, a_{m,j}[k]\, \phi_{j,k}(t) + c_m[i]\, b_{m,j}[k]\, \psi_{j,k}(t) \big] \cos\!\Big(2\pi f_c t + 2\pi i \frac{F}{T_b} t\Big) + \big[ c_m[i]\, a'_{m,j}[k]\, \phi_{j,k}(t) + c_m[i]\, b'_{m,j}[k]\, \psi_{j,k}(t) \big] \sin\!\Big(2\pi f_c t + 2\pi i \frac{F}{T_b} t\Big) \Big\}, \qquad (16)
where {a_{m,j}[k]}, {b_{m,j}[k]}, {a'_{m,j}[k]} and {b'_{m,j}[k]} are four independent data symbol sequences for the j-th band. U is a subset of integers, such as U = {1 - M, 2 - M, ..., 0}, and it can be chosen according to the channel characteristics.
Table 1: Variation of bandwidth efficiency with different wavelet waveforms.

    Waveform                           BE (bits/sec/Hz)
    Full-width rectangular pulse       0.57 n
    Daubechies wavelet (order 4)       0.65 n
    Daubechies wavelet (order 6)       0.71 n
    Daubechies wavelet (order 8)       1.43 n
    Daubechies wavelet (order 10)      1.48 n
    Battle-Lemarie wavelet             1.74 n
4.3 Performance Analysis
In this section, we discuss the advantages and performance of our wavelet-based MC-CDMA systems compared with the conventional MC-CDMA used in wireless communication systems. As in the conventional MC-CDMA system [15], the wavelet-based MC-CDMA systems address the issue of how to spread the signal bandwidth without increasing the adverse effect of the delay spread. A wavelet-based MC-CDMA or fractal MC-CDMA signal is composed of N narrowband subcarrier signals, each of which has a symbol duration much larger than the delay spread T_d, so it will not experience the increased susceptibility to delay spread and ISI that the DS-CDMA system does. Since the parameter F can be chosen to determine the spacing between subcarrier frequencies, a smaller spreading factor N than that required by the DS-CDMA can be used, so that not all of the subcarriers are located in a deep fade in frequency. Then frequency diversity is achieved. In addition, the mother wavelet function and the set of wavelet frequency bands U can be chosen according to the characteristics of the channel. Thus, two new dimensions to improve the system performance are obtained. If the effects of the channel are included in ρ_{m,i} and θ_{m,i}, and n(t) is AWGN, the received signal for the wavelet-based MC/QPSK-CDMA can be represented as follows:

r(t) = \sum_{m=0}^{M-1} \sum_{i=0}^{N-1} \rho_{m,i} \Big\{ \Big[ \frac{c_m[i]\, a_m[k]}{\sqrt{T_b}}\, \phi\!\Big(\frac{t-kT_b}{T_b}\Big) + \frac{c_m[i]\, b_m[k]}{\sqrt{T_b}}\, \psi\!\Big(\frac{t-kT_b}{T_b}\Big) \Big] \cos\!\Big(2\pi f_c t + 2\pi i \frac{F}{T_b} t + \theta_{m,i}\Big) + \Big[ \frac{c_m[i]\, a'_m[k]}{\sqrt{T_b}}\, \phi\!\Big(\frac{t-kT_b}{T_b}\Big) + \frac{c_m[i]\, b'_m[k]}{\sqrt{T_b}}\, \psi\!\Big(\frac{t-kT_b}{T_b}\Big) \Big] \sin\!\Big(2\pi f_c t + 2\pi i \frac{F}{T_b} t + \theta_{m,i}\Big) \Big\} + n(t) \qquad (17)
Then, by comparing the wavelet-based MC-CDMA demodulation processes with the conventional MC-CDMA demodulation processes [15], it can be shown that both the wavelet-based MC-CDMA system and the conventional MC-CDMA system possess the same BER under the above channel condition. For other fading channels, however, a suitable choice of wavelets provides another way to combat the distortion of the transmitted signals and improve the system performance. Under the assumption of an AWGN channel, the BERs of the wavelet-based MC/BPSK-CDMA and the fractal MC-CDMA systems can also be shown to equal the BERs of the corresponding conventional BPSK and QPSK systems, respectively. The bandwidth efficiency (BE) of a modulation system is defined as
BE = Total bit rate / Bandwidth   (bits/sec/Hz).    (18)
Assuming a 99% power bandwidth, based on the results in [18], the variation of BE with several different wavelet waveforms is shown in Table 1. Here, n = 1 and n = 2 correspond to BPSK and QPSK, respectively. Consequently, for the wavelet-based MC-CDMA systems, significantly higher bandwidth efficiencies can be obtained, compared with the conventional MC-CDMA system, by introducing compactly supported orthogonal wavelets.
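As a quick illustration of Eq. (18), the sketch below computes BE from a total bit rate and an occupied bandwidth; all numeric values are invented for illustration and are not the values of Table 1 or the paper.

```python
def bandwidth_efficiency(total_bit_rate_bps, bandwidth_hz):
    """BE as defined in Eq. (18), in bits/sec/Hz."""
    return total_bit_rate_bps / bandwidth_hz

# Hypothetical system: N subcarriers, n bits per symbol of duration Tb,
# transmitted in an assumed 99% power bandwidth B (all values invented).
N, n, Tb, B = 64, 2, 1e-4, 1.5e6
total_rate = N * n / Tb            # total bit rate in bits/sec
be = bandwidth_efficiency(total_rate, B)
```

A more compact wavelet waveform shrinks the 99% power bandwidth B for the same bit rate, which is exactly how the compactly supported wavelets raise BE.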
5 Conclusions
In this tutorial paper, we compare the performance of various MCM techniques, such as OFDM and MC-CDMA, with an emphasis on the proposed wavelet-based MC-CDMA systems. The proposed wavelet-based MC-CDMA systems possess all the desirable characteristics of the conventional MC-CDMA system, e.g., frequency diversity and small ISI. In addition to those advantages, the wavelet-based MC-CDMA systems provide not only higher bandwidth efficiency than the MC-CDMA systems, but also new dimensions for anti-fading and interference immunity through the suitable choice of the wavelet functions and the wavelet frequency bands. Based on these results, the wavelet-based MC-CDMA systems are a feasible candidate multiplexing/multiple access technique for use in FPLMTS/IMT-2000 and mobile multimedia applications.
References [1] J.A.C. Bingham, "Multicarrier modulation for data transmission: An idea whose time has come," IEEE Commun. Magazine, pp. 5-14, May 1990. [2] M.L. Doelz, E.T. Helad, and D.L. Martin, "Binary data transmission techniques for linear systems," Proc. IRE, vol. 45, pp. 656-661, May 1957. [3] H.F. Harmuth, "On the transmission of information by orthogonal time functions," AIEE Trans. Commun. Electron., vol. 79, pp. 248-255, July 1960. [4] S.B. Weinstein and P.M. Ebert, "Data transmission by frequency-division multiplexing using the discrete Fourier transform," IEEE Trans. Commun. Tech., vol. 19, pp. 628-634, Oct. 1971. [5] Y. Wu and B. Caron, "Digital television terrestrial broadcasting," IEEE Commun. Magazine, pp. 46-52, May 1994. [6] B. Le Floch, M. Alard, and C. Berrou, "Coded orthogonal frequency division multiplex," Proc. IEEE, vol. 83, pp. 982-996, June 1995. [7] M. Alard and R. Lassalle, "Principles of modulation and channel coding for digital broadcasting for mobile receivers," EBU Technical Review, no. 224, pp. 168-190, Aug. 1987. [8] H. Sari, G. Karam, and I. Jeanclaude, "Transmission techniques for digital terrestrial TV broadcasting," IEEE Commun. Magazine, pp. 100-109, Feb. 1995. [9] L.J. Cimini, Jr., "Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing," IEEE Trans. Commun., vol. 33, pp. 665-675, July 1985. [10] A.E. Jones, T.A. Wilkinson, and S.K. Barton, "Block coding scheme for reduction of peak to mean envelop power ratio of multicarrier transmission schemes," Electronics Letters, vol. 30, pp. 2098-2099, Dec. 1994. [11] L. Vandendorpe, "Multitone spread spectrum communication systems in a multipath Rician fading channel," in Proc. IZSDC, Mar. 1994, pp. 440-451. [12] S. Kaiser, "OFDM-CDMA versus DS-CDMA: Performance evaluation for fading channels," in Proc. IEEE ICC, June 1995, pp. 1722-1726. [13] S. Kondo and L.B. Milstein, "Performance of multicarrier DS CDMA Systems," IEEE Trans. 
Commun., vol. 44, pp. 238-246, Feb. 1996. [14] E.A. Sourour and M. Nakagawa, "Performance of orthogonal multicarrier CDMA in a multipath fading channel," IEEE Trans. Commun., vol. 44, pp. 356-367, Mar. 1996. [15] N. Yee and J.P. Linnartz, "Multi-carrier CDMA in an indoor wireless radio channel," Memo. No. UCB/ERL M94/6, Electronics Research Lab., UC-Berkeley, Feb. 1994. [16] M.A. Tzannes and M.C. Tzannes, "Bit-by-bit channel coding using wavelets," in Proc. IEEE GLOBECOM, Dec. 1992, pp. 684-688. [17] R. Orr, C. Pike, and M. Bates, "Covert communications employing wavelet technology," in Proc. IEEE Asilomar Conf. on Signals, Systems and Computers, Nov. 1993, pp. 523-527. [18] P.P. Gandhi, S.S. Rao, and R.S. Pappu, "On waveform coding using wavelets," in Proc. IEEE Asilomar Conf. on Signals, Systems and Computers, Nov. 1993, pp. 901-905. [19] M. Medley, G. Saulnier, and P.K. Das, "Applications of wavelet transform in spread spectrum communications systems," in SPIE Proc. Wavelet Applications, vol. 2242, pp. 54-68, Apr. 1994. [20] K.H. Chang, X.D. Lin, and H.J. Li, "Wavelet-based multi-carrier CDMA for PCS," in Proc. IEEE ICASSP, May 1996, pp. 1443-1446. [21] K.H. Chang, X.D. Lin, and M.G. Kyeong, "Performance analysis of wavelet-based MC-CDMA for FPLMTS/IMT-2000," in Proc. IEEE ISSSTA, Sep. 1996, to be published.
SIGNAL DENOISING THROUGH MULTIFRACTALITY W. Kinsner and A. Langi Department of Electrical and Computer Engineering, Signal and Data Compression Laboratory, University of Manitoba, Winnipeg, Manitoba, Canada R3T-5V6, email: {kinsner|langi}@ee.umanitoba.ca and TRLabs (Telecommunications Research Laboratories) 10-75 Scurfield Boulevard, Winnipeg, Manitoba, Canada R3Y-1P6 ABSTRACT This paper presents a new framework for signal denoising based on multifractality, and demonstrates its practicality with several examples. Signal denoising is concerned with the separation of noise from a signal, and then with reducing the noise without altering the signal significantly. This paper demonstrates that a multifractal measure can be used to guide the process of noise reduction so that the fractal spectrum is preserved in the signal. INTRODUCTION
Denoising is critical in many signal applications in which noise contamination reduces the performance of signal processing. For example, signal analysis often results in incorrect characterization due to noise [ScGr91]. In signal compression, contaminated signals are often difficult to compress because their entropy values are very high [LaKi96a]. Unfortunately, proper denoising is difficult because neither the signal nor the noise is known. Although the concept of denoising is not new theoretically, it is now entering a practical phase due to several recent developments in the areas of wavelets, contextual prediction, and multifractality. Current denoising algorithms are based on preserving selected characteristics of signals that do not occur in noise, such as regularity, smoothness, predictability, power spectrum density, and linearity [Dono92], [KoSc93], [CoMW92]. Although such algorithms perform well for classes of relatively smooth signals, they fail to apply well to noise-like signals (i.e., signals with a noise-like appearance) such as image textures or speech consonants. We have developed an approach based on singularity preservation as the denoising criterion for regular as well as noise-like signals. This approach was prompted by previous work on singularity characterization using wavelets, indicating that singularities can represent regular and noise-like signals faithfully, i.e., signals reconstructed from wavelet-detected singularities are perceptually indistinguishable from the original ones [MaHw92]. In particular, multifractal measures of signals (e.g., a spectrum of singularities or the Rényi generalized dimensions [Kins94]) can be used to characterize singularities [LaKi96b], [FKPG96], [Lang96]. Hence, denoising schemes should preserve signal multifractality. Furthermore, the removed parts must have the multifractal characteristics of noise. EXAMPLES OF DENOISED IMAGES
This paper shows examples of applying the measures in various image denoising schemes (i.e., wavelet shrinkage [Dono92] and prediction [KoSc93]), as well as some high-quality high-bit-rate image compression schemes (e.g., the Joint Photographic Experts Group, JPEG, standard), to
Fig. 1. Comparison of (a) a 512x512 aerial ortho image and (b) the denoised image, using wavelet shrinkage at a level suitable for a 2.03:1 lossless compression ratio.

demonstrate the relation between the measure and the perceptual reconstruction quality, as well as the practicality of the framework. In one example, we have denoised an aerial ortho image to enable a compression ratio (CR) of at least 2:1. The importance of this example is that the image was almost incompressible (1.06:1) from Shannon's entropy point of view. This was achieved by wavelet shrinkage, in which an image is first transformed into a wavelet domain, the wavelet coefficient values are then shrunk according to a soft thresholding, and the image is reconstructed from the shrunk coefficients. Increasing the thresholding level results in an increase in the lossless compression ratio of the denoised image. Figure 1 compares the original and the denoised images at a thresholding level of 0.011 for a 2.03:1 CR. Although the reconstructed image is smoother (with a 35.5 dB peak signal-to-noise ratio, PSNR), all sharp edges are still preserved, which makes denoising superior to classical filtering techniques that tend to blur edges (i.e., the high-frequency parts of the image are altered). In another example, we have used prediction for denoising [LaKi96a], as shown in Fig. 2.
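The shrinkage pipeline just described (transform, soft-threshold the detail coefficients, reconstruct) can be sketched with a single-level 2-D Haar transform standing in for the paper's multi-level wavelet transform; the Haar choice and the threshold value are our assumptions, not the paper's.

```python
import numpy as np

def soft_threshold(x, t):
    """Donoho-style soft thresholding: shrink coefficients toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def haar_shrink(img, t):
    """One-level 2-D Haar wavelet shrinkage. The three detail subbands are
    soft-thresholded; the approximation subband is kept intact. With t = 0
    the transform/inverse pair reconstructs the image exactly."""
    a = (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 2
    h = (img[0::2, 0::2] - img[1::2, 0::2] + img[0::2, 1::2] - img[1::2, 1::2]) / 2
    v = (img[0::2, 0::2] + img[1::2, 0::2] - img[0::2, 1::2] - img[1::2, 1::2]) / 2
    d = (img[0::2, 0::2] - img[1::2, 0::2] - img[0::2, 1::2] + img[1::2, 1::2]) / 2
    h, v, d = (soft_threshold(c, t) for c in (h, v, d))
    out = np.empty_like(img, dtype=float)
    out[0::2, 0::2] = (a + h + v + d) / 2
    out[1::2, 0::2] = (a - h + v - d) / 2
    out[0::2, 1::2] = (a + h - v - d) / 2
    out[1::2, 1::2] = (a - h - v + d) / 2
    return out
```

Raising t removes more of the small (noise-dominated) coefficients, which is why the lossless compression ratio of the denoised image grows with the thresholding level.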
Fig. 2. Comparison of (a) a 256x256 aerial ortho image and (b) a denoised image using prediction suitable for a 2.22:1 lossless compression ratio, and (c) the residual image (enhanced for visual presentation).
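The multifractal measures used in the evaluation below (the Rényi generalized dimensions Dq and, via a Legendre transform, the singularity spectrum f(α)) can be sketched as follows. This crude box-counting estimator is our own illustration and is not the accurate computation schemes of [Kins94].

```python
import numpy as np

def renyi_dimensions(p_image, qs, scales=(2, 4, 8, 16)):
    """Rough box-counting estimate of the Renyi generalized dimensions D_q
    of a normalized 2-D measure (pixel intensities summing to 1):
    D_q is the slope of log(sum_i p_i^q)/(q-1) versus log(eps)."""
    dims = []
    for q in qs:
        logs_eps, logs_sum = [], []
        for n in scales:
            h, w = p_image.shape
            # mass captured by each (h/n x w/n) box at this scale
            p = p_image.reshape(n, h // n, n, w // n).sum(axis=(1, 3)).ravel()
            p = p[p > 0]
            logs_eps.append(np.log(1.0 / n))
            if abs(q - 1.0) < 1e-9:               # information dimension limit
                logs_sum.append(np.sum(p * np.log(p)))
            else:
                logs_sum.append(np.log(np.sum(p ** q)) / (q - 1.0))
        dims.append(np.polyfit(logs_eps, logs_sum, 1)[0])
    return np.array(dims)

def legendre_spectrum(qs, Dq):
    """Singularity spectrum f(alpha) from D_q via the Legendre transform of
    tau(q) = (q - 1) D_q: alpha = dtau/dq, f(alpha) = q*alpha - tau(q)."""
    tau = (qs - 1.0) * Dq
    alpha = np.gradient(tau, qs)
    return alpha, qs * alpha - tau
```

For a monofractal measure Dq is constant and the Legendre transform collapses f(α) to a single point, which is exactly the signature the paper uses to identify the residual image as noise-like.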
This contextual predictive scheme removes noise while preserving image predictability. The approach results in a PSNR of more than 49.9 dB at a 2.22:1 CR and preserves image perceptual quality (i.e., the original and denoised images are perceptually indistinguishable). The removed part of the original image (called the residual image) has noise characteristics, as demonstrated in Fig. 2c, which is amplified to the maximum range from 0 to 255. It is seen that the enhanced image contains no trace of the original image. We have verified experimentally that the prediction-based denoising preserves image multifractality (as measured by the Rényi generalized dimension), while high-quality lossy compression schemes such as JPEG do not. This constitutes the novelty of this paper. Figure 3a compares the Rényi generalized dimensions Dq of the original, denoised, and residual images, as well as the JPEG 1 (CR of 2.08:1) and JPEG 2 (1.87:1) images [Brad94]. The Dq plots of the original and denoised images coincide, while those of the JPEG schemes deviate at low q. Using a Legendre transform, we can also calculate the singularity spectra f(α), with similar results (see Fig. 3b). The f(α) curves of the original and denoised images also coincide, while those of the JPEG images deviate at high singularities. This indicates that the JPEG schemes
Fig. 3. Multifractal measures of the original and various denoised images (2.22:1 CR prediction, 2.08:1 CR JPEG 1, and 1.87:1 CR JPEG 2 schemes): (a) the Rényi generalized dimensions Dq, (b) spectra of singularities f(α), and (c) a zoomed-in region of the f(α), showing that while the singularity spectra of the original and denoised images coincide, those of the JPEG images deviate at high singularities α.
alter high singularity components of the original image. Figure 3c shows the discrepancy clearly in a zoomed-in plot at a high singularity region. It is important to notice that the multifractal measure is a clear indicator of the noise-like nature of the residual image, which has a single fractal dimension, as demonstrated by either the flat dashed Dq line in Fig. 3a or, alternatively, a single point on the f(α) curve in Fig. 3b. The high performance of the prediction-based denoising has prompted us to implement it in a commercial application (compressing otherwise incompressible aerial ortho images, each 25 Mbytes in size) [LaKi96a]. CONCLUSIONS Denoising of signals appears to be a very important development in signal preprocessing for compression and other feature extraction procedures. Multifractality provides a framework for denoising through a multifractal measure of denoising quality. Such a framework can cover both regular and noise-like signals. The approach has become practical through our accurate schemes to compute the Rényi generalized dimension and the spectra of singularities. This framework can be extended to other signal processing applications. REFERENCES
[Brad94] J. Bradley, XV v.3.10a (a Unix program). Available at [email protected], 1994. [CoMW92] R. R. Coifman, Y. Meyer and M. V. Wickerhauser, "Wavelet analysis and signal processing" in Wavelet and Their Applications, M. Ruskai (ed.), Boston: Jones and Bartlett, pp. 153-178, 1992. [Dono92] D. L. Donoho, "De-noising via soft-thresholding", Technical Report, Department of Statistics, Stanford University, 1992, 37p. (Available through ftp from: ftp://playfair.stanford.edu/pub/donoho) [FKPG96] M. Farge, N. Kevlahan, V. Perrier, and E. Goirand, "Wavelets and turbulence," Proceedings of the IEEE, vol. 84, no. 4, pp. 639-669, April 1996. [Kins94] W. Kinsner, "Fractal dimensions: Morphological, entropy, spectrum, and variance classes" Technical Report, DEL94-4, Department of Electrical and Computer Engineering, University of Manitoba, 146 pp, April 1994. [KoSc93] E. J. Kostelich and T. Schreiber, "Noise reduction in chaotic time-series data: A survey of common methods" Physical Review E, vol. 48, no. 3, pp. 1752-1763, September 1993. [LaKi96a] A. Langi and W. Kinsner, "Compression of aerial ortho images based on image denoising" in Proc. NASA/Industry Data Compression Workshop 1996, (Snowbird, Utah; 4 April 1996), A.B. Kiely and R.L. Renner (eds), pp. 81-90. (Available from the Jet Propulsion Laboratory, California Institute of Technology, as JPL Publication 96-11. Contact: Dr. Aaron B. Kiely, [email protected]) [LaKi96b] A. Langi and W. Kinsner, "Singularity processing of nonstationary signals" in Proc. IEEE Canadian Conf. Elect. and Comp. Eng., ISBN 0-7803-3143-5 (Calgary, Alberta; 26-29 May, 1996) pp. 687-691. [Lang96] A. Langi, "Wavelet and fractal processing of nonstationary signals" Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Manitoba, 1996, 456 pp. [MaHw92] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets" IEEE Trans. Inform. Theory, vol. 38, no. 2, pp. 617-643, 1992. [ScGr91] T. Schreiber and P. 
Grassberger, "A simple noise-reduction method for real data," Physics Letters A, vol. 160, pp. 411-418, 1991.
Application of Multirate Filter Bank to the Co-Existence Problem of DS-CDMA and TDMA Systems Shinsuke Hara, Takahiro Matsuda and Norihiko Morinaga Graduate School of Engineering, Osaka University, Osaka, Japan E-Mail: [email protected]
Abstract - In this paper, we discuss the co-existence problem of DS-CDMA and TDMA systems where both systems share the same frequency band to improve the spectral efficiency. We propose a complex multirate filter bank (CMRFB) based adaptive notch filtering technique for the DS-CDMA systems, which can observe the received signal with different frequency resolutions at the same time, and easily form the most suitable notch filter for rejecting the TDMA signal.
I. Introduction
The DS-CDMA (Direct Sequence-Code Division Multiple Access) system has the attractive feature of being able to share a frequency band with narrowband communication systems without intolerable degradation of either system's performance. A DS-CDMA overlay has been suggested to improve the
spectral efficiency as well as to share the frequency band with existing narrowband systems [1]. The spread spectrum signal causes little damage to the narrowband signal due to its low spectral profile. On the other hand, it is inherently resistant to narrowband interference, because the despreading operation has the effect of spreading the narrowband energy over a wide bandwidth. However, it has been demonstrated that the performance of a spread spectrum system in the presence of a narrowband signal can be enhanced significantly through the use of active narrowband interference suppression prior to despreading [2]. The Fast Fourier Transform (FFT) based adaptive notch filtering technique [3] first observes the received signal, composed of a desired spread spectrum signal and some undesired narrowband interference, in the frequency domain through the FFT, and then rejects the frequency band containing the interference component by forming a notch filter. Among the narrowband interference rejection techniques, this technique is attractive in terms of hardware complexity; however, it has to divide the whole received frequency band into many narrow bands with the same bandwidth. This can increase the computational cost and distort the spread spectrum signal. We do not have to observe and divide the frequency band where there is no narrowband interference. In this paper, we propose a complex multirate filter bank (CMRFB) based adaptive notch filtering technique to solve the co-existence problem of CDMA and TDMA systems. We show the principle of the CMRFB based adaptive notch filtering technique, and discuss the bit error rate (BER) performance for both CDMA and TDMA systems.
II. Complex Multirate Filter Bank
Fig. 1(a) shows a complex multirate filter bank (CMRFB) in a DS-CDMA receiver, which is composed of an analysis filter bank and a synthesis filter bank.
At the first stage, a down-converted discrete-time received signal r(n) is passed through a pair of FIR digital filters (analysis filters A0(z) and A1(z)) with frequency responses as shown in Fig. 1(e). The filtered signals can be decimated by two, because they are approximately band-limited (lowpass and highpass, respectively). The analysis filters can be used recursively at any filter output. Fig. 1(f) shows the frequency response after the fourth stage in Fig. 1(a), where we can see four types of bandpass filters with different pass bandwidths. The decimated subband signals are recombined in the corresponding synthesis filter bank, composed of expanders and synthesis filters S0(z) and S1(z). Since multirate systems have mainly been discussed with real filters [4], they can deal only with the positive frequency component of the input signal. In quasi-coherent detection systems, however, the down-converted signal processed in the baseband has positive and negative frequency components.
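A real-coefficient Haar pair can stand in for the complex filters A0, A1, S0, S1 to illustrate the filter-decimate / expand-recombine structure; the Haar choice is our assumption (the paper's filters are 32- or 12-tap complex designs).

```python
import numpy as np

def analysis(x):
    """Two-channel analysis stage: Haar lowpass/highpass followed by
    decimation by two (filter outputs are approximately band-limited)."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

def synthesis(lo, hi):
    """Expand by two and recombine so that synthesis(*analysis(x)) == x,
    i.e. the pair satisfies perfect reconstruction."""
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x
```

Applying `analysis` recursively to any of its outputs builds the tree of Fig. 1(a), producing subbands with different pass bandwidths, as in Fig. 1(f).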
Fig. 1 Complex multirate filter bank and adaptive notch filtering

Therefore, we need to design the multirate filter bank with complex filters. In this case, the perfect reconstruction condition is written as

S0(z) = -jA0(z),    (1)
S1(z) = jA1(z) = jA0(-z),    (2)

where A0(z), A1(z), S0(z) and S1(z) are the frequency responses in terms of the z-transform.
III. Adaptive Notch Filtering Technique
When the received signal is composed of (wideband) DS-CDMA and (narrowband) TDMA signals as shown in Fig. 1(b), the hatched filter output in Fig. 1(a) contains mainly the (undesired) TDMA signal component. Therefore, by setting the corresponding synthesis filter input to zero in the synthesis filter bank, we can easily reject the narrowband interference (the TDMA signal) (see Fig. 1(c)). The CMRFB based notch filtering technique does not divide the frequency band where there is no narrowband interference, and it can easily form the most suitable notch filter for rejecting the narrowband interference. This results in less distortion of the wideband DS-CDMA signal and a lower computational cost in forming the adaptive notch filter (it also saves mobile battery energy). The CMRFB technique is applicable to the DS-CDMA receiver in both the base station and the mobile terminal, and furthermore, it is also effective for the TDMA group demodulator in the base station. Because of the phase linearity of the complex multirate filter bank, we can directly use the rejected analysis filter output to demodulate the (phase-modulated) TDMA signal (see Fig. 1(d)). When 1 DS-CDMA system and N frequency-multiplexed TDMA systems share the same frequency band, in order to support all the systems we usually require 1 base station for the DS-CDMA system and N base stations for the TDMA systems, each of which can handle a single multiplexed signal.
However, employing the complex multirate filter bank based technique, we can integrate the N+1 base stations into one intelligent base station which can handle both of the multiplexed signals simultaneously. This system can be a solution to the co-existence problem of DS-CDMA and TDMA systems.
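The notch operation itself — split the band, zero the subband holding the interference, resynthesize — can be sketched with the same kind of real Haar tree. This is a stand-in for the CMRFB: the full-tree split and the leaf indexing are our simplifications.

```python
import numpy as np

def notch_via_subband(x, depth, target):
    """Split the band `depth` times into 2**depth leaf subbands, zero the
    leaf with index `target` (the one holding the narrowband interference),
    and resynthesize. With `target` outside 0..2**depth-1, nothing is
    zeroed and the input is perfectly reconstructed."""
    def analysis(v):
        return (v[0::2] + v[1::2]) / np.sqrt(2), (v[0::2] - v[1::2]) / np.sqrt(2)

    def synthesis(lo, hi):
        v = np.empty(2 * len(lo))
        v[0::2] = (lo + hi) / np.sqrt(2)
        v[1::2] = (lo - hi) / np.sqrt(2)
        return v

    def recurse(v, d, idx):
        if d == 0:
            return np.zeros_like(v) if idx == target else v
        lo, hi = analysis(v)
        return synthesis(recurse(lo, d - 1, 2 * idx),
                         recurse(hi, d - 1, 2 * idx + 1))

    return recurse(x, depth, 0)
```

Zeroing one of the 2^depth leaves removes a band of width 1/2^depth of the input bandwidth, mirroring the 1/2^K notch obtainable from a K-stage CMRFB discussed below.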
IV. Numerical Results and Discussions
A. System Model
Fig. 2 shows the co-existence problem of CDMA and TDMA systems discussed in this paper. The TDMA system is based on a QPSK/coherent demodulation scheme, and a root Nyquist filter with a roll-off factor of 0.5 is used for baseband pulse shaping in the transmitter and receiver. The same modulation/demodulation scheme and Nyquist filter are used in the CDMA system, and Gold codes with a processing gain of 31 are used for spectrum spreading. The complex multirate filter bank in the CDMA receiver is constructed with a polyphase-implemented [4] 32-tap or 12-tap complex filter obtained by modifying the real filters in [5]. Fig. 1(e) shows the frequency response of the 32-tap complex filter. We assume an additive white Gaussian noise (AWGN) channel, and define E(C/T) and E(T/C) as the CDMA-to-TDMA and TDMA-to-CDMA signal energy ratios, respectively, and B(C/T) and B(T/C) as the CDMA-to-TDMA and TDMA-to-CDMA bandwidth ratios, respectively.

Fig. 2 Co-existence of CDMA and TDMA systems
B. Bit Error Rate of CDMA System with TDMA Signal
Fig. 2 shows the power spectrum of the received signal composed of some CDMA components and 1 TDMA component. The center frequency of the TDMA signal is located at 27/128 Hz, which corresponds to that of a notch filter formed by the 6-stage CMRFB. Therefore, the 6-stage CMRFB can perfectly reject the TDMA signal with B(C/T)=64. Fig. 3 shows the BER of the CDMA system for E(C/T)=-5dB. Without the notch filtering, the BERs are almost the same for different values of B(C/T). This means that the BER depends not on B(C/T) but on E(C/T). The notch filtering drastically improves the BER. The 4-stage CMRFB can perfectly reject the TDMA signal with B(C/T)=16, the 5-stage CMRFB the TDMA signal with B(C/T)=32, and the 6-stage CMRFB the TDMA signal with B(C/T)=64.

Fig. 3 Bit error rate of CDMA system with 1 TDMA signal
Note that it is desirable to form a notch filter as narrow as possible, yet wide enough to reject the narrowband interference, because the notch filter rejects a part of the energy of the CDMA signal as well as the narrowband interference (the loss of energy is proportional to the bandwidth of the notch filter). When we use a K-stage CMRFB, we can form notch filters with bandwidths as narrow as 1/2^K of the received frequency bandwidth.

Fig. 4 Bit error rate of CDMA system without notch filtering

Therefore, the BER
improves as the number of stages increases. Figs. 4 and 5 show the BERs of the CDMA system for 1, 4, 8 and 16 users without and with the notch filtering for 1 TDMA signal, respectively, where we assume B(C/T)=64 and E(C/T)=-5dB. When there is 1 TDMA signal in the received frequency band, without the notch filtering, the BER severely degrades as the number of CDMA users increases. On the other hand, with the notch filtering, the BER performance can be improved.

Fig. 5 Bit error rate of CDMA system with notch filtering

C. Bit Error Rate of TDMA System with CDMA Signal
Fig. 6 shows the BER of the TDMA system when the received signal is composed of 1 TDMA component and 1 CDMA component (see Fig. 3(b)). As E(T/C) decreases, the BER degrades. The simulation result for the energy penalty agrees well with the result calculated using a Gaussian approximation for the CDMA signal. This means that, from the viewpoint of the TDMA system, we can indeed treat the CDMA signal as Gaussian noise.

V. Conclusions
In this paper, we have discussed the co-existence problem of CDMA and TDMA systems, and proposed a complex multirate filter bank (CMRFB) based adaptive notch filtering technique for the CDMA system. We have shown the principle of the CMRFB based adaptive notch filtering technique, and discussed the bit error rate performance for both CDMA and TDMA systems with and without the proposed technique. The CMRFB based technique can observe the received signal, composed of a desired wideband signal and an undesired narrowband interference, with different frequency resolutions at the same time, and easily form the most suitable notch filter for rejecting the interference.

Fig. 6 Bit error rate of TDMA system with 1 CDMA signal

References
[1] L. B. Milstein et al., "On the Feasibility of a CDMA Overlay for Personal Communications Networks," IEEE Jour. on Sel. Areas in Commun., vol. 10, pp. 655-668, May 1992. [2] H. V. Poor and L. A. Rusch, "Narrowband Interference Suppression in Spread Spectrum CDMA," IEEE Personal Communications, vol. 1, no. 3, pp. 14-27, Third Quarter 1994. [3] L. B. Milstein, "Interference Rejection Techniques in Spread Spectrum Communications," Proc. of the IEEE, vol. 76, pp. 657-671, June 1988. [4] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice-Hall, 1993. [5] V. K. Jain and R. E. Crochiere, "Quadrature Mirror Filter Design in the Time Domain," IEEE Trans. on Acoust. Speech Signal Proc., vol. 32, pp. 353-361, Apr. 1984.
Session N: EDGE DETECTION
Multiscale Edges Detection by Wavelet Transform for Model of Face Recognition *Fan YANG, *Michel PAINDAVOINE, **Hervé ABDI *University of Burgundy, LIESIB, 6 Boulevard Gabriel, 21000 DIJON, FRANCE, email: [email protected] **University of Texas, U.S.A.
Abstract One way to store and recall face images uses the linear auto-associative memory. This connectionist model is used in conjunction with a pixel-based coding of the faces. Image processing using the Wavelet transform can be applied to multiscale edge detection. In this paper, we describe a learning technique for the auto-associator based on the Wavelet transform; a 17% improvement in face recognition performance has been obtained in comparison with the standard learning.
1 Introduction
As noted, the linear auto-associator is a particular case of the linear associator. The goal of this network is to associate a set of stimuli with itself; it can be used to store and retrieve face images, and it can also be applied as a pre-processing device to simulate some psychological tasks, such as categorizing faces according to their gender [1]. The auto-associator functions as a pattern recognition and pattern completion device in that it is able to reconstruct learned patterns when noisy or incomplete versions of the learned input patterns are used as "stimuli". A learning technique based on the Wavelet transform can improve recognition capability when the pattern images are very noisy. In the second part, the basic features of the classical auto-associative memory are briefly described. In the third part, we propose a learning technique for the auto-associator using the multiscale edges of face images, and a comparison is made between the results of different edge detection operators. The experimental results concerning face recognition of different types are presented in the fourth part.
2 Model description
First, the faces to be stored are coded as vectors of pixel intensities, digitizing each face to form a pixel image and concatenating the rows of the image to form an I*1 vector Xk. Each element of Xk represents the gray level of the corresponding pixel. Then, each element of the face vector Xk is used as input to a cell of the auto-associative memory. The number of cells of the memory is equal to the dimension of the vector Xk. Each cell in the memory is connected to every other cell. The output of a given cell for a given face is simply the sum of its inputs weighted by the connection strengths between itself and all of the other cells. The intensity of the connections is represented by an I*I matrix W. In order to improve the performance of the auto-associator, the Widrow-Hoff learning rule is used, which corrects the difference between the response of the system and the expected response by iteratively changing the weights in W as follows:

W(t+1) = W(t) + η(Xk - W(t)Xk)Xk^T

where η is a small learning constant and k is randomly chosen. The Widrow-Hoff learning rule can be analyzed in terms of the eigenvectors and eigenvalues of the matrix of stimuli X (the set of K faces) [2]:
W(t) = P{I - (I - ηΛ)^t}P^T

with: Λ: diagonal matrix of the eigenvalues of XX^T; P: matrix of the eigenvectors of XX^T.

With η smaller than 2λmax^(-1) (λmax being the largest eigenvalue), this procedure converges toward:
W(∞) = PP^T. The notation in terms of eigenvectors and eigenvalues makes it possible to work with matrices of small dimension. Thus, the matrix W of dimension I*I can be computed as W = PP^T, with the matrix P of dimension I*L (L being the number of eigenvectors with a non-zero eigenvalue, L ≤ min{I, K}). For example, we have used an auto-associator for face recognition in which I is equal to 33975 and L is equal to 40 or 200.
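A toy-sized sketch of both results above — the iterative Widrow-Hoff rule and its closed-form limit W(∞) = PP^T — with assumed dimensions (I = 16, K = 3) far smaller than the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
I, K = 16, 3                          # assumed toy sizes, not the paper's I=33975
X = rng.standard_normal((I, K))       # K stored "face" vectors as columns

# Widrow-Hoff updates: W <- W + eta * (x_k - W x_k) x_k^T, k chosen at random
W = np.zeros((I, I))
eta = 0.02                            # must satisfy eta < 2 / lambda_max
for _ in range(2000):
    x = X[:, rng.integers(K)]
    W += eta * np.outer(x - W @ x, x)

# Closed-form limit W(inf) = P P^T, with P the eigenvectors of X X^T having
# nonzero eigenvalue, obtained here from the thin SVD of X
U, s, _ = np.linalg.svd(X, full_matrices=False)
P = U[:, s > 1e-10]
W_inf = P @ P.T                       # projector onto the span of the faces
```

Computing P from the K-column matrix X rather than from the I*I matrix XX^T is what makes the paper's I = 33975 case tractable with only L = 40 or 200 eigenvectors.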
3 New technique of learning using the multiscale edges
The standard learning for the auto-associative memory consists in presenting a series of face images to the input of the model as stored patterns. The auto-associator trained with this method does not give satisfactory results in the case of noisier stimuli. The contour gives the first strong impression for recognition [3]. We have therefore introduced the edges of the face images into the auto-associator during learning. In the domain of image processing, many algorithms have been proposed to extract edges; they come in two classes: gradient operators and optimal detectors. The Sobel operator uses a [3*3] mask, which gives satisfactory results for images without noise. The Canny-Deriche filter is an optimal detector whose implementation can be realized in a second-order recursive form. The Wavelet transform allows the detection of multiscale edges and is used to detect all the details in an image by modifying the scale. We choose here the optimized Canny-Deriche filter (third-order recursive) as the Wavelet function for edge detection:

f(x) = ksx e^(-ms|x|) + e^(-ms|x|) - e^(-s|x|)

with k = 0.564 and m = 0.215. A method allowing a direct implementation of the Wavelet transform has been applied, using a convolution between the image and the edge detection filter for different scales (s = 2^j) to obtain the edge images [4]. During the learning of the auto-associator, for each face, a pre-processing step has been applied to extract the edges of the face image. Then, not only the face image but also the edge images have been presented to the input of the auto-associator as patterns. Fig. 1 displays the responses of the memories trained in the different ways. The top panels present: 1a) a stimulus corrupted with additive random noise, 1b) the response of the model trained only with the face images, and 1c) the desired response. The bottom panels show: 1d) the response of the model trained with the addition of edge images from the Sobel operator, 1e) the response of the model trained with the addition of edge images from the Canny-Deriche filter, and 1f) the response of the model trained with the addition of multiscale edge images from the Wavelet transform (scales s = 1, 2, 4, 8).
Figure 1: Response of the models
Figure 2: Correlation of the models
Clearly, the standard method gives poor results for this noisy stimulus. Using the edge images detected with the different techniques, from the Sobel operator to the wavelet transform, the quality of recognition improves gradually. This quality can be measured by computing the cosine (correlation) of the angle between the vector Ok (response of the model) and Tk (desired response). Fig. 2 shows the correlations of the auto-associators trained in the different manners.
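The correlation measure used in Fig. 2 is straightforward to compute; a small sketch (the function name is ours):

```python
import numpy as np

def recognition_quality(response, target):
    """Cosine of the angle between the model response Ok and the
    desired response Tk, used as the recognition-quality measure."""
    o = np.ravel(response).astype(float)
    t = np.ravel(target).astype(float)
    return float(np.dot(o, t) / (np.linalg.norm(o) * np.linalg.norm(t)))

clean = np.array([1.0, 2.0, 3.0, 4.0])
noisy = clean + np.array([0.3, -0.2, 0.1, -0.4])
print(recognition_quality(clean, clean))  # 1.0 for a perfect response
print(recognition_quality(noisy, clean))  # below 1.0 for a noisy response
```

Being a cosine, the measure is invariant to the overall intensity scale of the response, so it compares pattern shape rather than amplitude.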
Experimental results
We have applied this new learning technique using multiscale edge images to store a set of 40 Caucasian faces (20 males and 20 females). Fig. 3 displays the responses of two memories, one trained with the standard learning and the other trained with the new wavelet-transform technique. The stimuli are corrupted by additive Gaussian noise, (from left to right) 1) signal-to-noise ratio SNR = 1, 2) SNR = 3/5, 3) SNR = 3/8, and 4) SNR = 3/13.
Figure 3: The top panels show 4 stimuli, the middle panels the responses produced by the auto-associator trained with the new learning technique, and the bottom panels the responses of the auto-associator trained with the standard learning.
Figure 4: Stimuli and responses of the models.
Fig. 4 shows the results of these two memories for new faces (from top to bottom): 1) a new face similar to the set of learned faces (a Caucasian face), and 2) a new face different from the set of learned faces (a Japanese face). The auto-associator trained with the standard learning is not able to give distinguishable responses. Better results are obtained with the model trained with the new technique. Fig. 5 displays the mean correlation functions of these two memories: (5a) with 10 Caucasian faces whose noise-free versions were learned, (5b) with 10 new faces similar to the learned faces, and (5c) with 10 new Japanese faces (- - new technique, — standard method).
Figure 5: Mean correlation functions (panels 5a, 5b and 5c; x-axis: noise magnitude, 0-100; y-axis: correlation, approximately 0.55-0.95).
5 Conclusion
We have proposed a learning technique based on the wavelet transform for auto-associative memories which improves the performance of face recognition when the stimuli are noisy. The greater the noise, the greater the improvement: a 17% improvement of the correlation in comparison with the standard learning was obtained for the noisiest faces. Considering the amount of computation required, we will implement this auto-associator in parallel on several DSPs (TMS320C40). We also hope to apply this technique to other applications such as character recognition.
References
[1] D. Valentin, H. Abdi and A.J. O'Toole, "Categorization and identification of human face images by neural networks: A review of the linear autoassociative and principal component approaches," Journal of Biological Systems, 2, 1994.
[2] H. Abdi, Les Réseaux de neurones, Presses Universitaires de Grenoble, Grenoble, 1994.
[3] X. Jia and M.S. Nixon, "Extending the feature vector for automatic face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, December 1995.
[4] S. Mallat and S. Zhong, "Characterization of signals from multiscale edges," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, July 1992.
Edge Detection by Rank Functional Approximation of Grey Levels
J.P. ASSELIN de BEAUVILLE, D. BI, F.Z. KETTAF
Laboratoire d'Informatique - Université de Tours, E.3.I. - Ecole d'Ingénieurs en Informatique pour l'Industrie, 64 avenue Jean Portalis, Technopôle Boîte N°4, 37913 Tours Cedex 9 - France. E-mail: asselin@univ-tours.fr
Abstract: In this paper, a new method of edge detection based on rank functional approximation is proposed. This approach regards the edge as a local discontinuity of the grey levels, and this discontinuity is extracted by approximating the local grey levels with a linear rank function. The proposed method is robust against noise and can adapt to many edge models (step edge, ramp edge, roof edge, ...). In addition, a new method for selecting the edge position is also proposed, which reduces the thickness of the detected edge to only 1 pixel.
Key words: Edge detection, image analysis, pattern recognition, rank statistics, median filter.
I. Introduction
In light intensity images, edges are usually regarded as discontinuities of the grey levels, and edge detection is often implemented in two steps. The first step extracts the discontinuities of the grey levels and the second step thresholds the amplitude of the discontinuities so as to decide the correct edge position. In traditional methods, the discontinuity is extracted by differentiating the grey levels in certain directions. Some examples of these methods are the Sobel gradient, the Prewitt gradient, etc. These gradients are easily calculated, but they are too sensitive to noise and their responses are not the same for different edge directions. In addition, these methods do not consider the choice of the threshold. Departing from the traditional methods, Marr and Hildreth [1] proposed the zero-crossing of the second derivative of a Gaussian filter. This operator can precisely detect the edge at different scales and can minimize the errors of the edge positions in both the spatial and frequency domains. In this method, the image is first smoothed by a Gaussian filter with a given scale; then the second derivative of the Gaussian filter is used to find the position of the edge from the zero-crossing output of the filter.
The threshold method proposed by Marr and Hildreth consists in accumulating the detected edge positions at different scales. This method is robust against noise, but it often produces false edges, especially at the corners of objects. Canny [2] first formulated three criteria, leading to many new mathematical schemes, such as the Deriche scheme [3], the Shen scheme [4], the Kittler scheme [5], etc. These approaches regard the discontinuities as different profile models, such as the ideal step edge model, the ramp edge model, the roof edge model, etc. The operators for detecting edges are obtained by optimizing the three criteria of Canny for the different edge models. Edge detection is implemented by first filtering the image and then detecting the discontinuities with the derivative of the operators; the edge position is decided by the nonmaximum suppression and hysteresis thresholding proposed by Canny. The mathematical schemes often give better results, because they combine the advantage of multiscale edge detection, like that of Marr and Hildreth, with precise edge positions owing to nonmaximum suppression and hysteresis thresholding. The problems with these methods are that they require too many calculations and that they consider the edge models in only one dimension. Another type of edge detection approach is functional approximation, such as the functional approximation in two directions proposed by T. Pavlidis [6], the facet model functional approximation proposed by Haralick [7], the surface functional fitting method proposed by Nalwa [8], and full-plane functional fitting methods such as that of Zhou [9]. In this type of method, the edge is regarded as the discontinuity of a surface. With this conception, the edges correspond to special distributions of grey levels in two dimensions. Owing to the approximation of the surface with a function, these methods are robust against noise and can adapt to different edge models.
Considering that all the preceding functional approximation methods use a two-dimensional function or a one-dimensional function in two or more directions [6], their calculations are complicated. For this reason, we have proposed a new approach which uses only a one-dimensional linear function. In our method, we first choose a window of the desired size; next we arrange the pixels in increasing order according to their grey levels; then we use a one-dimensional linear function to approximate the distribution of the rank-ordered grey levels; and finally we decide the position of the edge by using a local and a global threshold. In a wxw (= K) window, if we arrange the pixels in increasing order according to their grey levels, we obtain a distribution of the type shown in Figure 1, where K is the number of pixels in the window. The rank of each pixel is obtained by ordering the grey levels: rank 1 corresponds to the pixel with the minimum grey level and rank K to the pixel with the maximum grey level. A linear function with two parameters can be used to fit the distribution of the grey levels by their ranks. It is evident that the slope of the straight line represents the rate of change of the grey levels in the window. Intuitively, if there is an edge in the window, the changes of intensity will be significant and the slope will be large. So the slope can represent the discontinuity of the grey levels in the window. One advantage of this method is that it reduces the two-dimensional problem to one dimension. Within the considered window, the profile of the edge may be a step edge, a ramp edge, or a line edge, but the distribution of the rank-ordered grey levels is always increasing. For all edge
types, the slope of the function is invariant to the edge direction; this is another advantage of the rank functional approximation. Because the discontinuity detected by this method is invariant to the edge direction, it can be regarded as an isotropic edge detection operator. The edge position is selected by thresholding the discontinuities. The threshold method proposed in this paper consists of two parts: the first part is a local threshold, calculated by using edge geometries in a 3x3 window; this threshold gives a very thin edge (one pixel wide). The second part is a global threshold, chosen empirically, which controls the number of edges to be detected. In the next section, we give the description of the functional approximation. Then we describe the thresholding method and the implementation of the algorithm in sections III and IV respectively. For visual comparison, the results of the proposed algorithm and those of Canny's and Deriche's methods on the same images are given in section V.
II. Rank functional approximation
In the literature, rank-ordered grey levels are often used to filter an image, as in the median filter, the Wilcoxon filter, etc., but for image analysis only little research has been done with rank-ordered grey levels. Zamperoni [10] uses the difference between the distributions of rank-ordered grey levels of two regions to detect the edges of textured images. Bovik [12] also used rank-ordered grey levels to filter images and to detect edges, but he used order statistics. These methods model the edge as a step edge and detect it by calculating the differences of the ordered grey levels between two regions in a window. Because the relative positions of the two regions in the window may be horizontal, vertical, or diagonal, this leads to many masks to be considered and to a large amount of calculation.
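The direction-invariance noted above is easy to check numerically: rank-ordering a window's grey levels discards the spatial arrangement, so any statistic of the ordered values is the same for all edge orientations. A small sketch with illustrative values:

```python
import numpy as np

def ranked(window):
    """Grey levels of a window sorted into increasing order (ranks 1..K)."""
    return np.sort(window, axis=None)

# A step edge between grey levels 1 and 10, in two orientations.
horizontal = np.array([[1, 1, 1],
                       [10, 10, 10],
                       [10, 10, 10]])
vertical = horizontal.T

# Rank ordering yields the same distribution for both orientations, so
# any function fitted to the ranked values is direction-invariant.
print(np.array_equal(ranked(horizontal), ranked(vertical)))  # True
```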
Unlike them, Kim [11] proposed a method to detect edges by subtracting the minimum rank-ordered grey level from the maximum rank-ordered grey level in a K-neighbourhood. This method is very simple but very sensitive to noise, because it does not consider the contributions of the non-extremal rank-ordered grey levels, and it does not discuss how to decide the position of the edge. As noted in the first section, we arrange the pixels in the window in increasing order according to their grey levels; this projects all the edge models onto a single one-dimensional rank-ordered grey-level distribution. So in our method, we only need to consider one mask. By using functional approximation to detect the discontinuities of the rank-ordered grey levels, we obtain an algorithm that is robust against noise. Supposing that the size of the window is wxw = K pixels, the grey levels of the K pixels in the window are y(i), i = 1...K, where i is the pixel number. After arranging the K pixels in increasing order, we get a vector Y = (y_1, y_2, ..., y_K)^T such that y_1 ≤ y_2 ≤ ... ≤ y_K.
Replacing Y by aR + bI, where R = (1, 2, ..., K)^T and I = (1, 1, ..., 1)^T, and setting the derivatives of the squared fitting error to zero:

∂[(aR + bI - Y)^T (aR + bI - Y)]/∂a = 0
∂[(aR + bI - Y)^T (aR + bI - Y)]/∂b = 0

we obtain:

a = [ Σ_{i=1..K} (i - (K+1)/2) y_i ] / [ (K³ - K)/12 ]    (2)

b = (1/K) Σ_{i=1..K} y_i - a (K+1)/2    (3)

Because the denominator in (2) depends only on the size of the window, we may rewrite a as

a = a'/C,  where  a' = Σ_{i=1..K} (i - (K+1)/2) y_i = Σ_{i=1..K} C_i y_i  and  C = (K³ - K)/12.

The values of C_i = i - (K+1)/2 are symmetric around i = (K+1)/2, so a' is a function of the differences of the symmetrically ordered grey levels on the two sides of the median rank (K+1)/2. This means that a' is a good operator to detect the edge in every direction in the window. Figure 2 shows a step edge with different directions in a 3x3 window; the value of a' is invariant to the direction of the edge. From (3), we can see that b is equal to the mean of the grey levels in the window minus [(K+1)/2]a. This means that b can smooth the noise (by averaging the grey levels in the window) while preserving the edge (by subtracting [(K+1)/2]a). In order to exploit this property of b, the variance of b, denoted Var(b), is selected owing
to its isotropic property and its sensitivity to the discontinuity of the grey levels. Finally, we use the product of a and Var(b) as our isotropic edge detection operator.
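The closed-form fit a = a'/C, b = mean(y) - a(K+1)/2 can be checked against a generic least-squares line fit; a sketch (function names are ours):

```python
import numpy as np

def rank_fit(window):
    """Fit y_i ~ a*i + b to the rank-ordered grey levels of a window,
    using the closed forms a = a'/C and b = mean(y) - a*(K+1)/2."""
    y = np.sort(window, axis=None).astype(float)
    K = y.size
    i = np.arange(1, K + 1)
    a_prime = np.sum((i - (K + 1) / 2.0) * y)   # a' = sum of C_i * y_i
    C = (K**3 - K) / 12.0
    a = a_prime / C
    b = y.mean() - a * (K + 1) / 2.0
    return a, b

window = np.array([[3, 7, 7], [3, 3, 7], [7, 7, 3]])
a, b = rank_fit(window)
# The closed form agrees with an ordinary least-squares line fit:
slope, intercept = np.polyfit(np.arange(1, 10), np.sort(window, axis=None), 1)
print(round(a - slope, 10), round(b - intercept, 10))  # 0.0 0.0
```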
[Figure 2 content: step edges between a black and a white region in different orientations, together with the corresponding distributions of rank-ordered grey levels. In the K = 9 example, a' is the same for all edge directions: a' = 4*(10-1) + 3*(10-1) + 2*(10-1) + 1*(10-1).]
Figure 2. Illustration of the edge with different directions
III. The thresholding
We suppose that the ideal output of the isotropic edge detection operator (a*Var(b)) produces an image which consists of roof edges. The location of an edge is the roof position in this image. For each pixel, we select a 3x3 window around it; the basic edge geometries in the window are shown in Figure 3, and the other edge geometries can be obtained by rotating the basic geometries in steps of 45°. In all cases, only 3 pixels are located at the roof of the edge image, and these 3 pixels (in the edge position) have to be the 3 largest discontinuities in the window. To decide whether a pixel is in an edge position, we compare its discontinuity with the 3 largest discontinuities in the window. If the discontinuity of the pixel under consideration (the centre pixel) is greater than or equal to the third largest discontinuity in the window, it is regarded as an edge pixel; if not, it is eliminated. This is similar to the nonmaximum suppression method, but it is very easily implemented. To control the number of edges, we also use a global threshold, chosen empirically; a pixel is finally chosen as an edge pixel if its discontinuity is greater than both the local and the global thresholds.
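The local-threshold decision can be sketched as follows (the ≥ comparison with the third largest value is our reading of the text):

```python
import numpy as np

def is_edge_pixel(disc_window, global_threshold):
    """Keep the centre pixel of a 3x3 discontinuity window only if its
    discontinuity is among the 3 largest in the window (the roof of the
    edge) and exceeds the empirically chosen global threshold."""
    d = np.asarray(disc_window, dtype=float)
    centre = d[1, 1]
    third_max = np.sort(d, axis=None)[-3]   # 3rd largest value in the window
    return bool(centre >= third_max and centre > global_threshold)

w = np.array([[0.1, 0.9, 0.2],
              [0.2, 0.8, 0.1],
              [0.1, 0.7, 0.1]])
print(is_edge_pixel(w, global_threshold=0.5))   # True: centre is 2nd largest
print(is_edge_pixel(w, global_threshold=0.85))  # False: below global threshold
```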
Figure 1. Rank ordered grey levels
Figure 3. The edge geometries in a 3x3 window
IV. Implementation of the algorithm
Our method is a parallel edge detection approach. It is implemented in two steps. The first step calculates the parameters (a, b) of the rank function for each pixel in a given sized window (K--wxw). K corresponds to what scale of the edge we want to detect. The second step calculates the discontinuity in each pixel position by multipling a with the variance of b. The last step locates the edges with the help of the two thresholds. The algorithm is given as follows: Step 0 Initialization
Step 1 Step 2
Step 3
V. Results
image size => Number of lines, Number of columns. size of window for calculating a, b, a*Var(b), => (K=wxw). size of window for calculating local thresthold => (Ka=sxs). percent of the discontinuities average Sp. Calculation of aij, bij, for all the pixels (i,j) in the image
Calculationof discontinuity: @ Calculate Var(bij) for all pixels in the image with a wxw window (l) Calculate and record aiflVar(bij) for all pixels in the image Calculate the average(noted by E) of aiflVar(bij) for all pixels. | Calculate the global threshold SAb(=Sp*E). Localization of the edge: For i = 1 to NbLine do For j= 1 to NbColumn do 9 Find the third maximum discontinuities Sr in a Ka window. (~ Ifaij*Var(bij) > Sc and aij*Var(bij) >Sabthen rij=255. else rij=0; Endlf Record edge information ri,i. End do End do
Figure 4 shows two real scene images of 256x256 pixels with 256 grey levels. The first image mainly contains step edges and ramp edges, whereas the second contains step and roof ones. Figure 5 shows the results of our algorithm and those of Canny's and Deriche's methods for visual comparison. Note that our algorithm gives thinner edges (for example, the edges of the woman's arms) and also straighter lines (for example, the borders of the table in the office). The algorithm is implemented in the C language on a SUN SPARCstation. For wxw = 3x3, the calculation time is 15 seconds per image. For wxw = 5x5, the calculation time is 50 seconds.
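As a rough illustration of the whole pipeline (steps 0-3 of section IV), here is a compact, unoptimised Python sketch; the parameter names and the border handling (border pixels are simply skipped) are our choices, not the paper's:

```python
import numpy as np

def rank_edge_detect(image, w=3, s=3, Sp=1.0):
    """Sketch of the algorithm: rank-fit (a, b) per pixel, use a*Var(b)
    as the discontinuity, then apply the local (3rd-max in an sxs
    window) and global (Sp * mean discontinuity) thresholds."""
    img = np.asarray(image, dtype=float)
    n, m = img.shape
    h, K = w // 2, w * w
    ranks = np.arange(1, K + 1)
    C = (K**3 - K) / 12.0
    a = np.zeros_like(img)
    b = np.zeros_like(img)
    for r in range(h, n - h):                    # Step 1: a_ij, b_ij
        for c in range(h, m - h):
            y = np.sort(img[r-h:r+h+1, c-h:c+h+1], axis=None)
            a[r, c] = np.sum((ranks - (K + 1) / 2.0) * y) / C
            b[r, c] = y.mean() - a[r, c] * (K + 1) / 2.0
    disc = np.zeros_like(img)
    for r in range(h, n - h):                    # Step 2: a * Var(b)
        for c in range(h, m - h):
            disc[r, c] = a[r, c] * np.var(b[r-h:r+h+1, c-h:c+h+1])
    S_ab = Sp * disc.mean()                      # global threshold
    hs = s // 2
    edges = np.zeros_like(img, dtype=np.uint8)
    for r in range(hs, n - hs):                  # Step 3: localization
        for c in range(hs, m - hs):
            local = np.sort(disc[r-hs:r+hs+1, c-hs:c+hs+1], axis=None)
            if disc[r, c] >= local[-3] and disc[r, c] > S_ab:
                edges[r, c] = 255
    return edges

# A vertical step edge: detected pixels cluster on the transition columns.
img = np.zeros((12, 12))
img[:, 6:] = 100.0
out = rank_edge_detect(img)
print(sorted(set(int(c) for c in np.nonzero(out)[1])))  # edge columns near the step
```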
Figure 4. Original images
Figure 5. Results of edge detection References
[1] D. MARR and E. HILDRETH, "Theory of edge detection," Proc. Roy. Soc. London, 1980, pp. 187-207.
[2] J.F. CANNY, "A computational approach to edge detection," IEEE Trans. PAMI 8, 1986, pp. 679-698.
[3] R. DERICHE, "Using Canny's criteria to derive an optimal edge detector recursively implemented," Int. J. Comput. Vision, 1987.
[4] Jun SHEN and Serge CASTAN, "An optimal linear operator for step edge detection," Computer Vision, Graphics and Image Processing, Vol. 54, No. 1, 1992, pp. 112-133.
[5] M. PETROU and J. KITTLER, "Optimal edge detectors for ramp edges," IEEE Trans. PAMI 13, 1991, pp. 483-491.
[6] T. PAVLIDIS, "Segmentation of pictures and maps through functional approximation," Computer Graphics and Image Processing, 1972, pp. 360-372.
[7] R.M. HARALICK, "Digital step edges from zero crossing of second directional derivatives," IEEE Trans. PAMI 6, 1984, pp. 58-68.
[8] V.S. NALWA and T.O. BINFORD, "On detecting edges," IEEE Trans. PAMI 8, 1986, pp. 699-714.
[9] Y.T. ZHOU, V. VENKATESWAR and R. CHELLAPPA, "Edge detection and feature extraction using a 2-D random field model," IEEE Trans. PAMI 11, 1989, pp. 84-95.
[10] P. ZAMPERONI, "Feature extraction by rank-vector filtering for image segmentation," Int. Journal of Pattern Recognition and Artificial Intelligence, Vol. 2, 1988, pp. 301-319.
[11] W. KIM and L. YAROSLAVSKII, "Rank algorithms for picture processing," Computer Vision, Graphics and Image Processing, 35, 1986, pp. 234-258.
[12] A.C. BOVIK, T.S. HUANG and D.C. MUNSON, "Edge-sensitive image restoration using order-constrained least squares methods," IEEE Trans. ASSP 33, 1985, pp. 1253-1263.
Fuzzy Logic Edge Detection Algorithm
Sakari Murtovaara 1), Esko Juuso 1) and Raimo Sutinen 2)
1) Control Engineering Laboratory, University of Oulu, Linnanmaa, FIN-90570 Oulu, Finland. Phone: +358 81 553 1011, Fax: +358 81 553 2304, E-mail: {sakari.murtovaara|esko.juuso}@oulu.fi
2) ABB Industry Oy, Tyrnäväntie 14, FIN-90400 Oulu, Finland. Phone: +358 81 374 555, Fax: +358 81 374 486
Abstract
In this project, fuzzy logic is applied to edge detection. The performance of a recovery boiler is strongly affected by the geometry of the char bed, and therefore the operation of the boiler can be improved if this geometry is known. With infrared fire-room cameras, not only can the bed be displayed to the operator, but from the image it is also possible to calculate the geometry parameters of the char bed once the edges of the bed have been detected. The system utilises the information coming from the recovery boiler. The image processing analysis tries to find the contour of the bed. The image of the contour may contain pseudo pixels and gaps; e.g. caked liquor solids on the walls may cause erroneous pixels to appear in the contour. The aim of this project is to further improve the edge detection and the image processing. A new algorithm searching for the contour of the char bed has been developed. The present algorithm is based on membership functions of the contour obtained from history data; it filters out fast changes of the contour. The extended algorithm takes the neighbouring points into account by examining the new contour: if the distance between the forecasted pixel and the pixel obtained from image processing exceeds tunable limits, the algorithm removes these pixels. This further improves the efficiency of the algorithm and gives more accuracy. This project is part of the national technology programme financed by TEKES (Adaptive and Intelligent Systems Applications) and is done in co-operation with ABB Industry Oy.
Keywords: Fuzzy logic, Image processing, Edge detection and Recovery boiler.
Introduction
The behaviour of the char bed in a recovery boiler is extremely difficult to monitor using conventional instrumentation. The char bed height depends on operating variables such as liquor temperature and the primary/secondary air ratio, as well as air pressure. Digital image processing offers techniques to expand and improve the supervision and control of the burning. [1] The shape and position of the char bed in the recovery boiler, as well as the temperature distribution of the bed, are important control objects when boiler efficiency is to be maximised and emissions minimised. Visibility in the visible light region is limited. By using infrared fire-room cameras, the char bed can be displayed to the operator. The effect of changing operating variables (liquor temperatures, air pressure, air flow, etc.) can be seen on the monitor. The effects of any plugging of the liquor nozzles and slagging of the air ports can also be detected. A camera gives the most immediate information about the burning process, and a clear image can help the operator to identify the beginning of transients much earlier than by other means. [2, 3] In this paper, we discuss a new edge detection algorithm which will further improve the recognition of the char bed. By using fuzzy logic we can simplify this process and increase flexibility in the supervisory control of the burning process.
Image processing
The image processing is divided into two main parts: processing of the incoming image and analysis of the pre-processed image. In this context, analysis means searching for the contour of the bed and calculating the numerical information describing the bed. The image processing part digitises the camera image and performs different kinds of neighbourhood operations: 10 consecutive frames are averaged to reduce noise and decrease the influence of instantaneous disturbances, dirt around the camera opening is masked away, edges in the image are enhanced by differentiation, and the image is thresholded so that only the enhanced edges remain. The result of the digital image processing is another, "improved" image. [4] The analysis section takes this pre-processed image and searches for the pixels that form the contour of the bed. First, a search window is fixed to speed up the calculations. Within the defined search area, contour pixels are searched for according to the following principles:
• non-zero pixels are searched for downwards from the search window boundary in each column;
• if at least two pixels are found on top of each other, or a contour pixel was found in nearby previous columns, the pixel is assumed to belong to the contour;
• the locations of the contour pixels are stored in a table, which then represents the instantaneous contour of the char bed. [5]
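The contour-search principles above can be sketched as follows; the exact neighbourhood tolerances are not specified in the text, so the values used here are assumptions:

```python
import numpy as np

def find_contour(binary, top=0):
    """Scan each column downwards from the search-window boundary and
    accept the first non-zero pixel that is either on top of another
    non-zero pixel or supported by a contour pixel already found in a
    nearby column (tolerances here are illustrative assumptions)."""
    rows, cols = binary.shape
    contour = {}                      # column -> contour row
    for c in range(cols):
        for r in range(top, rows - 1):
            if binary[r, c] == 0:
                continue
            stacked = binary[r + 1, c] != 0
            nearby = any(abs(contour[cc] - r) <= 1
                         for cc in (c - 1, c - 2) if cc in contour)
            if stacked or nearby:
                contour[c] = r
                break
    return contour

img = np.zeros((6, 5), dtype=int)
img[3, :] = 1
img[4, :] = 1        # a horizontal, two-pixel-thick edge band
img[1, 2] = 1        # an isolated erroneous pixel
print(find_contour(img))  # noise pixel rejected; contour found at row 3
```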
In the recovery boiler application, the following features are analysed: the instantaneous contour of the bed, the height of the bed, the horizontal position of the top of the bed, the cross-sectional area of the bed, and figure parameters describing the shape of the bed.
Fuzzy logic in edge detection
The aim of this project is to further improve the edge detection and the image processing. A new algorithm searching for the contour of the char bed has been developed. It generates and updates membership functions for each contour point on the basis of history data (Fig. 1). Then it defuzzifies the resulting fuzzy numbers into a new contour (Fig. 2). Defuzzification is based on the centre of average of the membership functions (Fig. 1). According to the tests, this algorithm filters out fast changes of the contour.
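A minimal sketch of centre-of-average defuzzification over the contour history; approximating the membership degree of each observed position by its relative frequency is our assumption, not necessarily the paper's construction:

```python
import numpy as np

def defuzzify_contour(history):
    """Centre-of-average defuzzification per column: each contour row
    observed in the history votes with its membership degree, here
    approximated by its relative frequency (an assumption)."""
    history = np.asarray(history, dtype=float)  # shape: (n_frames, n_columns)
    new_contour = []
    for col in history.T:
        values, counts = np.unique(col, return_counts=True)
        mu = counts / counts.sum()              # membership degrees
        new_contour.append(float(np.sum(mu * values)))  # centre of average
    return new_contour

# 10 history contours for 3 columns; one frame contains a fast jump.
hist = [[20, 22, 21]] * 9 + [[20, 40, 21]]
print(defuzzify_contour(hist))  # the jump in column 1 is damped
```

The single outlier in column 1 moves the defuzzified contour only slightly, which is the fast-change filtering effect described above.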
Fig. 1. The centre of average calculation.
Fig. 2. Calculation of new contour.
By extending the algorithm we can further improve its efficiency. The extended algorithm takes the neighbouring points into account by examining the new contour: if the distance between the forecasted pixel and the pixel obtained from image processing exceeds tunable limits, the algorithm removes these pixels. The evaluation of the method will be continued with a very large set of material in the Matlab environment, and after those tests the implementation will be transferred to the application software. A suitable number of contours in the history data is 10, since the effective changes in the state of the burning process are slow; a typical time constant may be of the order of minutes. By changing the number of contours in the history data we can affect how quickly the system adapts to movements in the char bed. The reliability of the search for the contour of the char bed can be improved by developing a fuzzy method for image thresholding (adaptivity), changing the thresholding parameters according to the intensity of the image. A fuzzy control method has been outlined for the image thresholding to stabilise the image processing conditions.
Conclusions
According to the tests, the present algorithm filters out fast changes of the contour of the char bed. In the recovery boiler, the changes are very slow, and therefore the algorithm improves the search for the contour. The present algorithm already increases the flexibility of the supervisory control, and the extended algorithm can further improve its efficiency. The adaptation of the system can be tuned by changing the number of contours in the history data. In digital image processing, the dynamics of the phenomenon can also be utilised on the basis of successive images.
References
[1] R. Lilja, "Pattern recognition in analysis of furnace camera pictures," in Pattern Recognition Applications, The Soviet-Finnish Symposium, Tbilisi, USSR, September 27 - October 2, 1987, 12 p.
[2] S. Murtovaara and E. Juuso, "Fuzzy logic in digital image processing for recovery boiler control," in Proc. of TOOLMET'96 - Tool Environments and Development Methods for Intelligent Systems, April 1-2, 1996, Oulu, Finland, Report A No. 4, May 1996, Univ. of Oulu, Control Eng. Lab., pp. 199-204.
[3] M. Ollus, R. Lilja, J. Hirvonen, R. Sutinen and S. Kallo, "Burning process analyzing by using image processing technique," in IFAC 3rd MMS Conference, June 14-16, 1988, Vol. 1, Oulu, Finland, 1988, pp. 274-281.
[4] R. Sutinen, R. Huttunen, M. Ollus and R. Lilja, "A new analyzer for recovery boiler control," Pulp & Paper Canada, pp. T83-T86, 1992 (1991).
[5] T. Hosti, "Digital image processing for recovery boiler control," Master's thesis, Univ. of Oulu, Dept. of Process Eng., Oulu, Finland, 1992, 57 p.
Topological Edge Finding
Mark Mertens*, Hichem Sahli and Jan Cornelis
Vrije Universiteit Brussel (VUB), Dept. ETRO-IRIS, Pleinlaan 2, B-1050 Brussels, Belgium
Abstract
In this paper we describe a new automatic approach which calculates a polygonal image model for arbitrary images. It is part of a framework for image modeling [1]. To cope with the wide range of images, the method has to be topological, avoiding a high sensitivity to the exact pixel values. Part of this requirement can be fulfilled by using a distribution-free, nonparametric estimator as the gain function. This gain function is the subject of the paper. We found that it results in very accurate edge representations and that it is robust against noise.
Introduction
We describe an edge detection approach in which edge finding and the polygonalisation of curves are tackled jointly in one optimisation framework. This is achieved by formulating a gain function which evaluates the quality of postulated lines, assumed to be coincident with the edges in the image. Edges are detected by finding the lines that maximise the gain function, which measures the dissimilarity between the regions on opposite sides of the postulated line. We represent a postulated line segment by the internal data structure of an agent in a multi-agent framework [1]. The agents find the line features in the image by moving towards them in representation space, maximising their value of the gain function. The result is an emergent configuration of line segments, globally coinciding with the edges in the image. The advantages for feature extraction are (1) the robustness and accuracy of the proposed gain function, (2) the merging of detection and representation of features, and (3) the easy interpretation of the extracted line features (e.g. for object-based progressive transmission). This edge finding approach is highly homogeneous, which facilitates its incorporation in different image processing applications.
Our new edge finding method
The problem with classical edge finding, when trying to determine the correct amounts of differentiation and smoothing [2],[3], has always been the choice of window size. We cannot use information that is too local, since we need a clear identification of the regions on both sides of the edge - in our case we use line segments instead of points around which we construct our windows - but we cannot use a global method either, since an edge is a localised characteristic and global methods will tend to merge different edge parts. In particular, the use of a fixed window for the whole image is not a good choice, since some parts of the image need a coarse and others a fine detection. This can be described by the Heisenberg uncertainty principle [4],[5]. The optimal solution is the use of windows which are adapted to the actual shapes of the objects appearing in the image. This seems to be circular reasoning, since we want to use these windows to find the objects in the first place. Starting from basic modeling principles, we state the problem as a prediction-verification problem. Can we determine the descriptive parameters of the boundary and verify its existence? We can calculate both if we recognise the fact that in a discrete image a boundary can always be faithfully represented as a chain of line segments, solving a first topological problem. We can then postulate (predict) and verify (fig. 1) the existence of a particular line segment with 4 descriptive parameters, namely the coordinates of its starting point (x0, y0), a length l and a slope α, in the object boundary. This strategy can be automated for most types of image. We propose a one-step solution for all the classical edge-finding problems of calculating the likelihood that a pixel is part of an edge, thinning, linking, and polygonalisation for representation.
[Fig. 1 block diagram: the input image and postulated lines (prediction) feed a dissimilarity gain function (verification); an optimisation procedure maximises the gain, yielding maximum-gain line segments coinciding with the image edge segments.]
Fig. 1: Block diagram of our prediction/verification optimisation approach to edge finding and representation. We use our edge verification criterion as a gain function which has to be maximised by moving a postulated line segment through the image. When the gain is maximum, the postulated line segment coincides with the edge segment with the same parameters (x0, y0, l, α) in a boundary. The details of the developed prediction/verification approach (fig. 1) will not be elaborated in this paper; they are described in [1]. For clearness and conciseness we will focus upon our edge definition and the resulting verification gain function, its properties and its relation to classical approaches.
* The research of Mark Mertens was sponsored by the IWT.
Definition of edges and regions
We define regions as connected sets of pixels having a particular, a priori unknown, statistical distribution of numerical values (e.g. colours, grey values, texture measures...), which we shall simply call "colours". An edge is defined as the 8-connected, single-pixel-width set of pixels "optimally" separating two regions with different distributions. We conjecture that the exact distributions are irrelevant and that we only need to establish a first-order difference criterion to determine an edge, so we have a distribution-free and nonparametric method. Notice that we define the edge as a locus of change, but a change between regions and not between numerical "intensity" values. The problem now is to extract a sufficient number of pixels from the regions on both sides of the edge, so that the separability, expressed by the gain function, can always be considered reliable (fig. 2). This issue is not raised in classical edge finding techniques. Theoretically the maximum gain value will be obtained when the postulated line segment coincides exactly with the edge segment of the object. Practically the maximum could shift a little due to numerical errors or a large amount of noise. We will show however that our method is inherently noise insensitive while also giving good localisation, in contrast with methods based on differentiation.
Fig. 2: Rectangular window with fixed width w, associated to the postulated line segment S(x0, y0, l, α).
Gain function
For each postulated line segment, we sample the pixels in its associated window, which gives the possibility to select an optimum set of representative pixels on both sides, and evaluate the following verification gain function:

G(S) = (1 / 2lw) Σ_{i ∈ N} | C_i^{R1}(S) − C_i^{R2}(S) |    (1)

The gain G is calculated for a postulated line segment S (shown dashed in fig. 2) as the sum, over all colours i ∈ N in the space of possible colours, of the absolute values of the difference of the number of pixels with colour i in R1 (namely C_i^{R1}(S)) and in R2 (namely C_i^{R2}(S)), normalised by the window size 2lw. Hence we obtain a first-order topological dissimilarity measure for the two regions. When the two "colour" distributions have almost no overlap and the postulated line segment coincides with an edge-line segment in the image, the G-value (eq. 1) will be approximately one, corresponding to the maximum possible gain value GM = 1. So all G-values above 1 − ε (where ε is a threshold value that is determined from the noise in the image or specified by the wishes of the user) will be retained as valid object-representing line segments.
Results obtained with our gain function
To illustrate the detection accuracy and the robustness of the function G(S), an image representing part of a rectangle of grey value 96 on a background of grey value 160, with and without added uniformly distributed noise between plus and minus 64, is used (fig. 3).
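As a small numerical illustration, the gain function (1) can be evaluated on a synthetic two-region image. This is only a sketch: the vertical-line window geometry, the sampling of w pixels on each side, and all helper names are our assumptions, not the authors' code.

```python
import numpy as np

def gain(img, x0, y0, length, w):
    """Topological dissimilarity gain of eq. (1) for a vertical postulated
    line segment starting at (x0, y0), sampling w pixels on each side
    (window geometry and 1/(2*l*w) normalisation are our reading)."""
    r1 = img[y0:y0+length, x0-w:x0]      # region left of the line
    r2 = img[y0:y0+length, x0:x0+w]      # region right of the line
    # first-order "colour" histograms of the two half-windows
    colours = np.union1d(np.unique(r1), np.unique(r2))
    c1 = np.array([(r1 == c).sum() for c in colours])
    c2 = np.array([(r2 == c).sum() for c in colours])
    return np.abs(c1 - c2).sum() / (2 * length * w)

# synthetic image: grey value 96 region next to a 160 background
img = np.full((40, 40), 160, dtype=np.uint8)
img[:, :20] = 96
print(gain(img, 20, 5, 20, 2))   # line exactly on the edge -> 1.0
print(gain(img, 10, 5, 20, 2))   # line inside one region   -> 0.0
```

Sliding the line one pixel off the edge gives a gain of 0.5 here, consistent with the linear fall-off G(d) = 1 − d/w discussed for the x-scan.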
Fig. 3: Part of a rectangle with (3B) and without (3A) added uniform noise, for examination of the gain estimator (1). A typical 1D section through the 4D representation manifold (G as a function of x0, y0, l, α) will look like the curve in fig. 4. As shown in fig. 4, four characteristics are of interest to us. Let GM be the maximum theoretically achievable gain (GM = 1), GN the value for the "noise", G0 the optimum gain value, which occurs when the postulated line S coincides with an image edge-line in the noisy image, u the parameter (x0, y0, l, α) value where G0 occurs and T the true parameter value of the image edge-line. We then define:
- The clearness C, which is the difference between G0 and GN.
- δ = GM − G0.
- The accuracy error A = |u − T|.
- σ, the width of the high-value peak.
Note that δ depends on the noise and should not be too big, or the correct detection of the edge-line becomes questionable.
Fig. 4: A typical 1D section through the 4D representation manifold. The parameter is x, y, l or α.
a) Gain cross section obtained by varying the parameter x. Figure 5 shows a scan over x of a vertical line of length l = 20 and associated width w = 2 across the edge. From theoretical arguments we would expect a linear function, since with each step l pixels are moved to the other side (cf. fig. 3) of the postulated line segment. Theoretically, the gain variation G(d) for 0 ≤ d ≤ w is

G(d)|_{0 ≤ d ≤ w} = 2l(w − d) / 2lw = 1 − d/w
Fig. 5: Scan of a vertical line (l = 20, w = 2) across the edge in the images of fig. 3A and fig. 3B. The accuracy error A is zero in both cases and δ is only 5%, which means that despite the large amount of noise we have a good detection (accurate and robust). A Canny edge detector [2] with a sigma of 5, applied on the image of fig. 3B, could also find the edge, but only with an intensity of 0.54 and an accuracy error A of 1 pixel. Furthermore, due to the isotropic Gaussian filtering it behaves badly near the corner, making it round. b) Gain cross section obtained by varying the parameter l.
Fig. 6: Incrementation of the length l of the postulated line (l = 10-60) in images 3A & B (w = 2). When we make the postulated line longer than the image edge-line, we see the gain going down hyperbolically (fig. 6A), according to the formula G(l)|_{l > λ} = 2λw / 2lw = λ/l. In our case the length λ of the edge is 15 pixels. In the noisy image we see the same tendency, but the gain value starts lower, since the noise creates some equally coloured pixels in both regions, even when the postulated line segment accurately represents the edge-line. The accuracy error A in the noisy case, which we can define as the point where the gain starts to drop hyperbolically, is approximately 4 pixels, as seen from fig. 6B. c) Gain cross section obtained by varying the parameter α
Fig. 7: Rotation of the postulated line from 0 degrees (horizontal) to 100 degrees in images 3A & B (l = 40, w = 2). For angles, which are essentially variables in the continuous plane, the maximum achievable accuracy depends on the length of the postulated line. From fig. 7A, we see that for a length of 40 pixels we can clearly estimate the angle within an accuracy of 1 degree. The accuracy error A is zero degrees for the noisy version, since the maximum is found at the same place as for the noiseless image; δ is 22.5% and C is still large enough (0.2) to reveal a similar trend in G(α) as in the ideal case (fig. 7A). In a continuous edge image and for a continuous sampling window, the function G(α) can be derived in closed form:
Fig. 8: 2D section through 4D manifold for an image area around the corner in fig. 3A
Fig. 9: Result of applying G to a noisy image of the Arc de Triomphe.
Fig. 8 gives us a clearer view of the shape of the 4D manifold. Here we scan over angles of 0 to 100 degrees for different orthogonal distances x from the corner (fig. 3). We point out that the optimum α0(x) shifts from 90 to almost 60 degrees when the location x is wrong, but there is only one clear maximum with a gain equal to one. Remark that on the left side of the figure we see the growing gain corresponding to the horizontal side of the corner. Fig. 9 shows that for a low-quality real image, we still find e.g. the first sloping side of the Arc with a δ of only 7%.
Conclusion
In this paper we propose a method in which there is no distinction between edge detection and representation, immediately yielding intermediate-level features. This leads to high uniformity, which makes the method easy to incorporate in higher-level strategies. The approach also recasts the fundamental theory behind edge finding. We find that a fundamentally different detection strategy, linked to a different definition of edges, finds the same edges as a Canny edge finder does, even a little more accurately. We plan further experiments with the same gain function (1) on texture segmentation, for which Canny's edge detector is not designed.
References:
1) M. Mertens: A Topological Edge Finding Gain Function. Technical Report IRIS-TR-0039, Electronics Department, Vrije Universiteit Brussel, Belgium, 1996.
2) J. Canny: A Computational Approach to Edge Detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 8, no. 6, pp. 679-698, 1986.
3) R. Klette & P. Zamperoni: Handbook of Image Processing Operators. John Wiley and Sons, 1996, p. 238.
4) R. Wilson & G. H. Granlund: The Uncertainty Principle in Image Processing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6, no. 6, pp. 758-767, 1984.
5) R. Wilson and M. Spann: Image Segmentation and Uncertainty. Research Studies Press Ltd, 1988.
Session O: VIDEO CODING II: MOTION ESTIMATION
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Automatic Parallelization of Full 2-D Block Matching for Real Time Motion Compensation and Mapping into Special Purpose Architectures
Nectarios Koziris, George Papakonstantinou and Panayotis Tsanakas
National Technical University of Athens, Dept. of Electrical and Computer Engineering, Computer Science Division, Zografou Campus, Zografou 15773, Greece
e-mail: papakon@cs.ece.ntua.gr, tel: +301-7722494, fax: +301-7722496
Abstract: The most important issue in video encoding is motion compensation within a frame sequence. Block matching techniques are used by various algorithms [5], [6] to estimate the motion in successive frames. Full-search 2-D block matching is the most widely used algorithm for video encoding in all standards (H.261, MPEG 1-2, HDTV etc.). It provides the best SNR, since it uses exhaustive search to find the best matching candidate block in terms of MAD or MSE. Its main disadvantage is that it requires a large number of computations, which must be performed for every candidate frame. Consequently, real-time application of the 2DFS algorithm for video compression requires the use of parallel architectures to handle the large amount of computation. This paper presents the application of automatic loop parallelization techniques to derive scalable or fixed systolic arrays with optimal performance in terms of total computation time. We transform the BMA algorithm into an equivalent form, in order to apply the automatic parallelization method. Starting from the dependence index space, we map the matching algorithm to 2-D or 3-D systolic arrays and we propose an alternative mapping for fixed-size architectures.
I. BLOCK MATCHING MOTION ESTIMATION CONCEPTS
Block-matching motion estimation/compensation is used to remove the temporal redundancy within a frame sequence, thus resulting in significant bit-rate reductions for digital image encoding methods. Nevertheless, it requires a large amount of computation and heavy memory bandwidth.
In block-matching techniques, the candidate frame for encoding is partitioned into square blocks of n×n pixels. These blocks, resulting from the segmentation of the current and previous frames, are called current and previous blocks, respectively. For each current block, the best matching previous block is sought within a search area surrounding the position of the reference block:
Figure 1. Block Matching Concepts. The search is limited to a maximum displacement m/2 in both directions around the position of the reference block (search area). The number of candidate blocks is (m+1)².
The 2DFS algorithm is the following:

MAD_(k,l)(x,y) = S(x,y) = Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} | F_n(k+i, l+j) − F_{n−1}(k+x+i, l+y+j) |

v(k,l) = (x,y) | min MAD_(k,l)(x,y),  −m/2 ≤ x,y ≤ m/2
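The 2DFS criterion above can be sketched directly as an exhaustive search; this is an illustrative, unoptimised implementation, with the block size N, search range m and border handling chosen by us for the example.

```python
import numpy as np

def full_search_bma(cur, prev, k, l, N=8, m=14):
    """Full-search block matching (2DFS) for the N x N current block at
    (k, l): evaluate the MAD of every displacement within +/- m/2 and
    return the minimising motion vector (candidates falling outside the
    previous frame are simply skipped in this sketch)."""
    block = cur[k:k+N, l:l+N].astype(np.int32)
    best, vec = np.inf, (0, 0)
    for x in range(-m//2, m//2 + 1):
        for y in range(-m//2, m//2 + 1):
            r, c = k + x, l + y
            if r < 0 or c < 0 or r + N > prev.shape[0] or c + N > prev.shape[1]:
                continue   # candidate block outside the frame
            cand = prev[r:r+N, c:c+N].astype(np.int32)
            mad = np.abs(block - cand).mean()
            if mad < best:
                best, vec = mad, (x, y)
    return vec, best

# toy frames: a bright square moves 2 pixels to the right between frames
prev = np.zeros((32, 32), dtype=np.uint8); prev[8:16, 8:16] = 200
cur = np.zeros((32, 32), dtype=np.uint8);  cur[8:16, 10:18] = 200
vec, mad = full_search_bma(cur, prev, 8, 10)
print(vec, mad)   # -> (0, -2) 0.0
```

The minimising displacement (0, −2) points back to where the block came from in the previous frame, with a MAD of zero for this noiseless example.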
In order to apply the automatic methodology of [7], we transform the above summation into a four-dimensional loop, where the dependencies between the different loop iterations/statements are better shown. This transformation reveals all possible parallelism and therefore facilitates the optimal parallelization of the BMA algorithm:

For x = 0 ... m
  For y = 0 ... m
    For i = 0 to M−1
      For j = 0 to N−1
        MAD_(k,l)(x,y,i,j) = MAD_(k,l)(x,y,i,j−1) + |F_n(k+i, l+j) − F_{n−1}(k+x+i, l+y+j)|
      end j
      MAD_(k,l)(x,y,i,N−1) = MAD_(k,l)(x,y,i,N−1) + MAD_(k,l)(x,y,i−1,N−1)
    end i
    MAD_(k,l)(x,y,M−1,N−1) = MAD_(k,l)(x,y,M−1,N−1) + MAD_(k,l)(x,y−1,M−1,N−1)
  end y
  MAD_(k,l)(x,m,M−1,N−1) = MAD_(k,l)(x,m,M−1,N−1) + MAD_(k,l)(x−1,m,M−1,N−1)
end x

It can easily be noticed that the computation of the MADs for all candidate blocks in the previous frame F_{n−1} is a four-dimensional nested loop over the indices y, x, i, j. In order to exploit all the inherent parallelism of this nested for-loop, we will apply methods for automatic parallelization of loops used in parallel compilers [3], [11], [12]. These techniques will subsequently map the parallel algorithm into a systolic structure, according to the methodology and the respective tool presented in [7].
II. AUTOMATIC TECHNIQUES FOR MAPPING THE FULL SEARCH BLOCK MATCHING ALGORITHM ONTO SYSTOLIC ARRAYS
Systematic methods for parallelization of for-loops were proposed by Moldovan [11], Shang [12] and Andronikos et al. [1]. These methods are based on the decomposition of the algorithm into basic modules containing single assignment statements. Since the nested loop is assumed to have depth n, every indexed variable in any assignment statement is n-dimensional. Since n indices are used, the algorithm is defined over an n-D index space. Every point inside this integer index space corresponds to an instance (i_1, ..., i_n) of the nested loop. This simple algorithmic model can be graphically displayed by an n-D index space graph of computation nodes and data dependence arcs.
From such a representation of the computation space, various architectures can be derived.
a) n-D Index Spaces:
The simplest and most straightforward implementation is the assignment of every computation node of the n-D index space to a distinct PE (Processing Element). This leads to time-optimal execution of the nested loop, but requires too many processors, leading to poor processor utilization. It is equivalent to using a distinct processor for every loop instance. A more efficient design with higher processor utilization can be achieved if each PE executes the operations of multiple computation nodes. The BMA (Block-Matching Algorithm) is defined over a four-dimensional index space due to its four indices x, y, i, j. We propose various decomposition techniques according to the total number of cells and the dimension of the target systolic array. It is obvious that the BMA nested loop can be decomposed into two parts, which are defined over two-dimensional index spaces. The first one is spanned by the indices i and j and consists of the accumulation of the sum MAD(x,y). In the second, which is defined over x and y, the minimum search and the selection of the displacement vector are performed. Note: for simplicity we consider block (0,0) and M = N = 5.
[Figure: the 2-D index space of the i,j loop (M = N = 5), showing the computation nodes, the dependence arcs and the optimal time hyperplane drawn across the nodes.]
In the 2-D index space the dependence vectors are d1 = [0 1] and d2 = [1 0]. After applying the algorithm proposed in [2], the optimal time hyperplane is Π = [1 1].
[Figure: the 3-D index space of the i,j,y loop, with the optimal time hyperplane Π = [1 1 1] drawn across the computation nodes.]
In the 3-D index space the dependence vectors are d1 = [0 1 0], d2 = [1 0 0] and d3 = [0 0 1]. After applying the algorithm proposed in [2], the optimal time hyperplane is Π = [1 1 1].
b) Resulting Architectures:
As the authors propose in [1], the resulting architectures are (n−1)-dimensional if the initial nested loop has n dimensions. In section IIa, two representative index spaces were presented; the corresponding array architectures obtained by applying the method of [11], [7] are 1-D and 2-D. Note that the proposed architectures are optimal in terms of total parallel execution time, since the automatic technique chooses, for each index space, the time hyperplane Π which minimizes

t_parallel = min_Π ⌈ (max Π·j1 − min Π·j2 + 1) / disp(Π) ⌉

where j1, j2 are points of the index space.
Fig 2. The proposed Systolic Array implementing the i,j loop. Fig 3. The proposed Systolic Array implementing the i,j,y loop.
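The schedule length above can be evaluated by brute force over a small index space; in this sketch we take disp(Π) to be the minimum of Π·d over the dependence vectors d (the usual definition of the hyperplane displacement), which is our reading rather than a quotation of the tool in [7].

```python
import itertools, math

def parallel_time(shape, hyperplane, deps):
    """Number of time steps of the linear schedule t(j) = round of Pi.j
    for a rectangular index space of the given shape, i.e. the
    t_parallel expression above evaluated for one candidate Pi."""
    pts = list(itertools.product(*[range(s) for s in shape]))
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    disp = min(dot(hyperplane, d) for d in deps)
    assert disp > 0, "hyperplane must strictly respect every dependence"
    hi = max(dot(hyperplane, j) for j in pts)
    lo = min(dot(hyperplane, j) for j in pts)
    return math.ceil((hi - lo + 1) / disp)

# 2-D i,j space (M = N = 5) with d1 = [0 1], d2 = [1 0] from the text
print(parallel_time((5, 5), (1, 1), [(0, 1), (1, 0)]))          # -> 9
# 3-D i,j,y space with d1 = [0 1 0], d2 = [1 0 0], d3 = [0 0 1]
print(parallel_time((5, 5, 5), (1, 1, 1), [(0, 1, 0), (1, 0, 0), (0, 0, 1)]))  # -> 13
```

For the optimal hyperplanes Π = [1 1] and Π = [1 1 1] this gives 9 and 13 time steps respectively for the M = N = 5 example.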
In Fig. 2, the architecture derived from the inner i,j nested loops is shown. If we want to serially perform the outer x,y loops, we can simply add another cell, which successively performs the x,y additions, and a minimum cell, as in Fig. 4.
Fig 4. The proposed Systolic Array implementing the i,j loop in parallel and the x,y outer loops sequentially (the added cell accumulates MAD_(k,l)(x,y) and feeds a minimum cell).
In Fig. 3, the architecture derived from the inner i,j,y nested loops is shown. If we want to serially perform the outer x loop, we can simply add another cell, which successively performs the x additions (summations over y,i,j), and respective minimum cells for the x loop, as in Fig. 5.
Fig 5. The proposed Systolic Array implementing the i,j,y loop in parallel and the x outer loop sequentially.
Finally, a 3-D array which can perform all loops in parallel, as quickly as possible, can be derived by extending the above techniques. Mathematical performance evaluation proves the efficiency and real-time response of the proposed architectures.
III. CONCLUSIONS
In this paper we have proposed systematic methods for mapping the BMA algorithm into special purpose parallel architectures. We transformed the BMA into an equivalent form and applied automatic loop parallelization techniques. The resulting architectures are optimal in both time and space requirements. This method enables the automatic production of special purpose hardware for real time motion compensation applications.
REFERENCES:
[1] T. Andronikos, N. Koziris, Z. Tsiatsoulis, G. Papakonstantinou and P. Tsanakas, "Lower Time and Processor Bounds for Efficient Mapping of Uniform Dependence Algorithms into Systolic Arrays," to appear in the Journal of Parallel Algorithms and Applications, vol. 12, 1-2, 1996.
[2] T. Andronikos, N. Koziris, Z. Tsiatsoulis, G. Papakonstantinou and P. Tsanakas, "Lower Time Processor Bounds for Uniform Dependence Algorithms," First ECPD International Conference on Advanced Robotics and Intelligent Automation, Jan. 1995.
[3] A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
[4] M.J. Chen, L.G. Chen, T.Z. Chiueh, "One-Dimensional Full Search Motion Estimation Algorithm For Video Coding," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 5, pp. 504-509, Oct. 1994.
[5] H. Gharavi and M. Mills, "Blockmatching Motion Estimation Algorithms - New Results," IEEE Trans. on Circuits and Systems, vol. 37, no. 5, pp. 649-651, May 1990.
[6] H.M. Jong, L.G. Chen, T.D. Chiueh, "Parallel Architectures for 3-Step Hierarchical Search Block-Matching Algorithm," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 4, pp. 407-415, Aug. 1994.
[7] N. Koziris, G. Papakonstantinou and P. Tsanakas, "Automatic Loop Mapping and Partitioning into Systolic Architectures," 5th Panhellenic Conference on Informatics, Athens, 1995.
[8] T. Komarek and P. Pirsch, "Array Architectures for Block Matching Algorithms," IEEE Trans. Circuits and Systems, vol. 36, no. 10, pp. 1301-1308, Oct. 1989.
[9] P.-Z. Lee and Z.M. Kedem, "Synthesizing Linear Array Algorithms from Nested For Loop Algorithms," IEEE Trans. Comp., vol. 37, no. 12, pp. 1578-1598, Dec. 1988.
[10] B. Liu and A. Zaccarin, "New Fast Algorithms for the Estimation of Block Motion Vectors," IEEE Trans. on Circuits and Systems for Video Technology, vol. 3, no. 2, pp. 148-157, Apr. 1993.
[11] D.I. Moldovan and J.A.B. Fortes, "Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays," IEEE Trans. Comput., vol. C-35, no. 1, pp. 1-11, Jan. 1986.
[12] W. Shang and J.A.B. Fortes, "Time Optimal Linear Schedules for Algorithms with Uniform Dependencies," IEEE Trans. Comput., vol. 40, no. 6, pp. 723-742, June 1991.
[13] L. de Vos and M. Stegherr, "Parameterizable VLSI Architectures for the Full-Search Block Matching Algorithm," IEEE Trans. on Circuits and Systems, vol. 36, no. 10, pp. 1309-1316, Oct. 1989.
[14] K.M. Yang, M.T. Sun, L. Wu, "A Family of VLSI Designs for the Motion Compensation Block-Matching Algorithm," IEEE Trans. on Circuits and Systems, vol. 36, no. 10, pp. 1317-1325, Oct. 1989.
New Search Region Prediction Method For Motion Estimation D. H. Ryu*, C. R. Kim*, T. W. Choi**, J. C. Kim** * ETRI, Korea ** Dept. of Electronics Engineering, Pusan National University, Korea
Abstract
This paper presents a new search region prediction method for motion estimation, using neural-network vector quantization (VQ). A major advantage of formulating VQ with neural networks is that the large number of adaptive training algorithms developed for neural networks can be applied to VQ. The proposed method reduces the computation, because the number of search points is smaller than in conventional methods, and reduces the bits required to represent motion vectors. The results of computer simulation show that the proposed method provides better PSNR than other block matching algorithms.
I. INTRODUCTION
Techniques for compressing motion pictures play a very important role in video conferencing, video phone and high-definition television (HDTV). Since the temporal correlation as well as the spatial correlation is very high in a moving picture, a high compression ratio can be achieved using motion compensated coding (MCC) technology. Motion compensated coding consists of a motion compensating part, based on precise motion estimation, and a prediction error encoding part [1][2]. Estimation of the motion information is an important problem in image sequence coding, and much research exists. Motion estimation techniques can be roughly classified into the pel recursive algorithm (PRA) and the block matching algorithm (BMA). For the BMA-based motion compensated prediction coding method, the amount of information for motion vectors and prediction error must be as small as possible. The amount of motion vector information differs according to the coding technique and transmission rate [3]. In this paper, we propose a method for estimating motion vectors in a video sequence. The proposed method predicts the search region by a vector quantization using neural networks and evaluates the distortion only for the predicted points. The remainder of the paper is organized as follows: Section II reviews the conventional block matching algorithms and their problems. Section III describes the proposed method, which predicts the motion region using neural network vector quantization and detects the motion vector. Section IV presents the results of the computer simulation for a sequence of images; it also discusses comparisons with other algorithms. Finally, Section V addresses the conclusions of this paper.
II. MOTION VECTOR ESTIMATION USING BLOCK MATCHING ALGORITHM
Block matching algorithms are utilized to estimate the motion of a block of pels, say of size M×N, in the present frame in relation to pels in the previous frame. This block of pels is compared with a corresponding block within a search region in the previous frame (Fig. 1). The process of the BMA can be described as follows. First, an image is divided into fixed-size subimages. A best match in the previous frame can then be sought by maximizing the cross correlation. We define a function D(.) to evaluate these measures for locating the best match:

D(i,j) = (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} G( U(m,n) − Ur(m+i, n+j) ),  −p ≤ i,j ≤ p    (1)

where G(.) is a nonlinear function to evaluate the error power, U is the block of size M×N in the current frame, Ur is a search area of size (M+2p)(N+2p) in the previous frame, and p is the maximum displacement allowed. The displacement is the (i,j) minimizing D(i,j). Though motion vector detection schemes using the BMA have been widely utilized, they have many problems. For instance, the BMA assumes that all the pels within a block have a uniform motion, because the motion vector is detected block-by-block. This assumption is reasonably satisfied only for a small block (8×8 or 16×16). However, a smaller block size increases the number of blocks and yields a higher transmission rate by increasing the number of motion vectors to be transmitted [4][5].
Fig. 1. Motion detection by block matching algorithm
Fig. 2. Structure of FSCL VQ
III. MOTION VECTOR ESTIMATION USING NEURAL NETWORKS
The performance of motion vector detection can be increased because motion vectors usually have high spatiotemporal correlation. In the sections below, we propose a new motion vector estimation technique using these correlations.
A. Search Region Prediction Using VQ
Vector quantization is a quantization technique which capitalizes on any underlying structure in the data being quantized. The space of the vectors to be quantized is divided into a number of regions, and a reproduction vector is calculated for each region. Given any data vector to be quantized, the region in which it lies is determined and the vector is represented by the reproduction vector for that region. More formally, vector quantization is defined as the mapping of arbitrary data vectors to an index m. Thus, VQ is a mapping of a k-dimensional vector space to a finite set of symbols M:

VQ: x = (x1, x2, ..., xk) → m    (2)

where m ∈ M and the set M has size M. Assuming a noiseless transmission or storage channel, m is decoded as x. The collection of all possible reproduction vectors is called the codebook. In general, constructing the codebook requires knowing the probability distribution of the input data. Typically, however, this distribution is not known, and the codebook is constructed through a process called training: a set of data vectors that is representative of the data that will be encountered in practice is used to determine an optimal codebook [6].
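The mapping (2) can be sketched as a nearest-codeword lookup; the squared Euclidean distortion and the small 4-entry codebook below are illustrative choices of ours, not the paper's design.

```python
import numpy as np

def vq_encode(x, codebook):
    """Map a k-dimensional data vector x to the index m of its nearest
    codeword (the VQ mapping of eq. 2, here with squared Euclidean
    distortion as the distance measure)."""
    d = ((codebook - x) ** 2).sum(axis=1)
    return int(d.argmin())

# hypothetical 4-codeword codebook in 2-D
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
m = vq_encode(np.array([0.9, 0.2]), codebook)
print(m, codebook[m])   # -> 1 [1. 0.]
```

Only the index m is transmitted; the decoder looks the reproduction vector up in its copy of the codebook.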
B. Quantization of Motion Vectors Using Neural Networks
As can be seen from the discussion above, the training and encoding processes are computationally expensive. Moreover, most of the algorithms currently used for VQ design, e.g. the LBG algorithm, are batch-mode algorithms and need access to the entire training data set during the training process. Also, in many communication applications, changes in the communication channel mean that a codebook designed under one condition is inappropriate for use in another condition. Under these circumstances, it is much more appropriate to work with adaptive VQ design methods, even if they are suboptimal in a theoretical sense. Another benefit of formulating vector quantization using neural networks is that a number of neural network training algorithms, such as competitive learning (CL), the Kohonen self-organizing feature map (KSFM) and frequency-sensitive competitive learning (FSCL), can be applied to VQ. In this paper, we use FSCL networks, which overcome a drawback of CL. Assume that the neural network VQ is to be trained on a large set of training data, and that the weight vectors w_i(n) are initialized with random values. The algorithm for updating the weight vectors is as follows. The input vector is presented to all of the neural units and each unit computes the distortion between its weight and the input vector. The unit with the
smallest distortion is designated as the winner, and its weight vector is adjusted towards the input vector. Let w_i(n) be the weight vector of neural unit i before the input x is presented, and define

z_i = 1 if d(x, w_i(n)) < d(x, w_j(n)) for all j ≠ i, else z_i = 0,  i = 1, ..., M    (3)

The new weight vectors w_i(n+1) are computed as

w_i(n+1) = w_i(n) + ε (x − w_i(n)) z_i    (4)
In the above equation, the parameter ε is the learning rate, and it is typically reduced monotonically to zero as the learning progresses. A problem with this training procedure is that it sometimes leads to neural units which are underutilized. To overcome this problem, the FSCL algorithm has been suggested. In the FSCL network, each unit keeps a count of the number of times it has been the winner, and a modified distortion measure for the training process is defined as follows:

d'(x, w_i) = d(x, w_i(n)) · u_i(n)    (5)

where u_i(n) is the total number of times that neural unit i has been the winner during the training. The winning neural unit at each step of the training process is the unit with the minimum d'. The architecture of FSCL is shown in Fig. 2 [7].
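The FSCL update rules (3)-(5) can be sketched as follows. The initialisation from random data points, the constant learning rate, the number of epochs and the synthetic training vectors are all our assumptions for the example, not the paper's settings.

```python
import numpy as np

def fscl_train(data, M, lr=0.05, epochs=10, seed=0):
    """Frequency-sensitive competitive learning: the winner is the unit
    minimising d(x, w_i) * u_i, where u_i counts past wins (eq. 5), and
    only the winner's weight moves towards the input (eq. 4)."""
    rng = np.random.default_rng(seed)
    w = data[rng.choice(len(data), M, replace=False)].astype(float)
    u = np.ones(M)                              # win counts
    for _ in range(epochs):
        for x in data:
            d = ((w - x) ** 2).sum(axis=1)      # distortion d(x, w_i)
            i = int((d * u).argmin())           # fairness-modified winner (eq. 5)
            w[i] += lr * (x - w[i])             # move winner towards x (eq. 4)
            u[i] += 1
    return w

# train a 4-codeword codebook on synthetic 2-D "motion vector" clusters
data = np.vstack([np.random.default_rng(1).normal(c, 0.1, (50, 2))
                  for c in [(-2, 0), (2, 0), (0, 2), (0, -2)]])
print(np.round(fscl_train(data, 4), 1))
```

The win-count factor u_i penalises frequently winning units, pulling underutilized codewords towards the data, which is exactly the drawback of plain competitive learning that FSCL addresses.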
Fig. 3. Block diagram of suggested motion vector estimation
Fig. 4. Initial and output value of codebook
C. Motion Vector Estimation by Search Region Prediction
Fig. 3 shows the block diagram of the proposed motion vector estimation method using the neural-network vector quantizer. First, we find motion vectors using the full search method on the training images, and then train the codebook of the neural-network vector quantizer using these motion vectors. Second, a motion vector can be estimated using the codebook as the motion prediction region: the codewords in the codebook represent the motion vectors for the input image sequences. Since the codebook is used as the search region for estimating the motion vectors, the number of search points and the computation can be reduced compared with the full search BMA. In addition, the information required to transmit the motion vectors can be reduced. For example, the number of possible motion vectors for BMA with a ±7 search range is 225, which requires about 8 bits per motion vector for fixed-length encoding. Therefore, we are compressing the number of motion vectors from 225 to 25, or from 8 bits to about 5 bits per vector. The computational cost is also improved because the number of search points is reduced. We design the codebook with the neural-network vector quantizer utilizing the FSCL learning algorithm, using the above motion vectors as the training input data. Fig. 4 shows the initial codebook and the output codebook, which has 25 codewords.
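The resulting estimation step can be sketched as follows: the MAD is evaluated only at the displacements stored in the trained codebook, and the winning codeword index is what would be transmitted. The 4-entry codebook and toy frames below are hypothetical stand-ins for the trained 25-codeword codebook.

```python
import numpy as np

def codebook_motion_search(cur, prev, k, l, codebook, N=8):
    """Estimate the motion vector for the N x N block at (k, l) by
    evaluating the MAD only at the codebook displacements (a sketch of
    the proposed search-region prediction; out-of-frame candidates are
    skipped)."""
    block = cur[k:k+N, l:l+N].astype(np.int32)
    best, idx = np.inf, 0
    for m, (dx, dy) in enumerate(codebook):
        r, c = k + dx, l + dy
        if r < 0 or c < 0 or r + N > prev.shape[0] or c + N > prev.shape[1]:
            continue
        mad = np.abs(block - prev[r:r+N, c:c+N].astype(np.int32)).mean()
        if mad < best:
            best, idx = mad, m
    return idx, best

# hypothetical 4-entry codebook of frequent displacements
codebook = [(0, 0), (0, -2), (2, 0), (-1, 1)]
prev = np.zeros((32, 32), dtype=np.uint8); prev[8:16, 8:16] = 200
cur = np.zeros((32, 32), dtype=np.uint8);  cur[8:16, 10:18] = 200
idx, mad = codebook_motion_search(cur, prev, 8, 10, codebook)
print(idx, codebook[idx], mad)   # -> 1 (0, -2) 0.0
```

Only len(codebook) matches are computed per block, instead of (2p+1)² for the full search, and the transmitted index needs only ⌈log2 len(codebook)⌉ bits.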
IV. EXPERIMENTAL RESULTS AND DISCUSSIONS
As an experiment, the SIF version of the Flower Garden sequence was used as the test sequence. The size of a SIF sequence is half that of its CCIR 601 version in both dimensions. The block size for the BMA was 8×8. Since MPEG recommends a search region of ±15 pels in both the horizontal and vertical directions, we choose a search region of ±7 pels in both spatial directions, because the size of a SIF sequence is half that of its CCIR 601 version in both dimensions. We choose a codebook size of 64 motion vectors. As shown in Table 1, the number of possible motion vectors for the three-step search (TSS), which is known to show better performance, is 27, and the number of possible motion vectors for the BMA is 225, which requires about 8 bits per motion vector for fixed-length encoding.
As mentioned in the previous section, the proposed method reduces not only the number of search points but also the computation. For BMA with a ±2 search region, the number of matches required is 25. Since the codebook size of the proposed method is 25, the number of computations required is almost equal to that of BMA with a ±2 search region. In this section, we compare the performance of the proposed method with that of BMA with a ±2 search region. In this experiment, we used the peak signal to noise ratio (PSNR) as the objective quality measure, defined as follows:

PSNR = 10 \log_{10} \frac{255^2}{MSE}    (6)

MSE = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} [F(m,n) - F'(m,n)]^2    (7)
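The two quality measures can be computed directly from their definitions (Eqs. 6-7); a small Python sketch for 8-bit frames:

```python
import math

def mse(F, G):
    # mean squared error between two equally sized frames (Eq. 7)
    M, N = len(F), len(F[0])
    return sum((F[m][n] - G[m][n]) ** 2
               for m in range(M) for n in range(N)) / (M * N)

def psnr(F, G):
    # peak signal-to-noise ratio for 8-bit imagery (Eq. 6)
    e = mse(F, G)
    return float('inf') if e == 0 else 10.0 * math.log10(255.0 ** 2 / e)
```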
Table 1 shows the average PSNR over 30-frame image sequences for the proposed method, the full search BMA and the TSS. The proposed method does not perform better in PSNR than BMA with a ±7 search region, but it is about 1.5 dB better than BMA with a ±2 search region, which requires a similar fixed length rate for the motion vectors. Fig. 5 shows the PSNR graph for the 30-frame image sequences.

Table 1. Performance comparison
Fig. 5. PSNR
V. CONCLUSION

The motion estimation method plays an important role in moving image transmission systems. System performance depends on how accurately the motion vectors are estimated. Though a number of motion estimation methods have been suggested, to detect the motion vectors more accurately the full search method, which matches all points in the search area, must be used, but it requires much computation and hardware complexity. In this paper, we found motion vectors using the full search BMA from the initial image sequences, and trained FSCL neural networks to design the codebook using these motion vectors. We used this codebook as the motion estimation region. This method exploits the spatial correlation of motion vectors in image sequences, and therefore reduces the search area and the bits required to transmit the motion vectors, and increases the compression rate. The computer simulations show that the proposed method is superior to the TSS by about 1.5 dB. The proposed method is also robust to noise, because it has a motion vector smoothing effect during the vector quantization process.
REFERENCES
[1] A. K. Jain, "Image data compression: A review," Proc. IEEE, vol. 69, no. 3, pp. 384-389, Mar. 1981.
[2] A. N. Netravali and J. O. Limb, "Picture coding: A review," Proc. IEEE, vol. 68, no. 3, pp. 366-406, Mar. 1980.
[3] A. N. Netravali and J. D. Robbins, "Motion compensated television coding: Part I," Bell Syst. Tech. J., vol. 58, no. 3, pp. 631-670, Mar. 1979.
[4] K. Iinuma, T. Koga, K. Niwa, and Y. Iijima, "A motion-compensated interframe codec," in Proc. Image Coding, SPIE, vol. 594, pp. 194-201, 1985.
[5] Y. Y. Lee and J. W. Woods, "Motion vector quantization for video coding," IEEE Trans. Image Processing, vol. 4, no. 3, Mar. 1995.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer, 1991.
[7] S. C. Ahalt, A. K. Krishnamurthy, P. Chen and D. E. Melton, "Competitive learning algorithms for vector quantization," Neural Networks, vol. 3, no. 3, pp. 277-290, 1990.
Motion Estimation by Direct Minimisation of the Energy Function of the Hopfield Neural Network

Leszek Ciepliński† and Czesław Jędrzejek

Franco-Polish School of New Information and Communication Technologies (EFP), P.O. Box 31, 60-854 Poznań, Poland
Abstract

We present a modification of generalised motion estimation based on the Hopfield neural network. It builds on the approach of Skrzypkowiak and Jain, in which they extend block based motion estimation by allowing for more than one motion vector and use the Hopfield neural network to minimise the error. We use direct minimisation of the error function instead of the standard iterative algorithm. This leads to much better motion estimation without an increase (and, usually, with a significant decrease) of the computational effort for sequential versions of both algorithms.

1 Introduction
Motion estimation and compensation is a very important part of any video compression method (see e.g. [1]). It allows for a substantial reduction of the bitrate required for encoding a video sequence by exploiting the temporal redundancy existing between consecutive frames. The most popular and effective approach to motion estimation is based on the so-called block matching algorithms (BMA), and it is used in the ISO and ITU-T video coding standards [2, 3]. In this paper we present an approach to generalised motion estimation where the predicted block is a superposition of several blocks from the previous frame. It is based on the work of Skrzypkowiak & Jain [4, 5, 6], in which they use a modified Hopfield neural network to minimise the motion compensation error. We start from the formula for the motion compensation error (multiplied by 1/2 for convenience)
E = \frac{1}{2} \|F - vG\|^2 = \frac{1}{2} \sum_{x=1}^{L} \sum_{y=1}^{L} \Big( f_{xy} - \sum_{i=1}^{D} v_i g^i_{xy} \Big)^2    (1)
where F is the present block, G is a vector of blocks from the previous frame, and v is a vector of coefficients. L is the block size in pixels and D is the number of neurons in the net (equal to the number of blocks from the previous frame considered as candidates for the motion estimation). By comparing this to the energy function of the Hopfield neural network E_{HNN}

E_{HNN} = -\frac{1}{2} \sum_{i=1}^{D} \sum_{j=1}^{D} T_{ij} v_i v_j - \sum_{i=1}^{D} I_i v_i    (2)
(which is a simplified but equivalent version of the network from [4], as explained in [7]) we obtain the following formulas for the weights

T_{ij} = -\sum_{x=1}^{L} \sum_{y=1}^{L} g^i_{xy} g^j_{xy}    (3)

and biases

I_i = \sum_{x=1}^{L} \sum_{y=1}^{L} f_{xy} g^i_{xy}    (4)
of the Hopfield network. In the formulas above, f_{xy} is the (x, y) pixel intensity from the current block and g^i_{xy} is the (x, y) pixel intensity from the i-th block of the previous frame. As explained in [4, 7], the connection weights T_{ij} are non-zero for i = j, so that the energy function given by Eq. 2 is not guaranteed to converge to a local minimum. This problem is solved by checking the energy change after each attempted step and accepting only those steps leading to an energy decrease. Such a network was used by Skrzypkowiak & Jain first as an alternative to the block matching algorithm [4] and then to approximate a block in the present frame with a linear combination of blocks from the previous frame [5, 6]. It has been shown that this approach can give a quality improvement from 2 to 4 dB [5] and can also be used for rate-constrained motion estimation [7]. In the following, we first discuss some of the drawbacks of this approach and then apply the straightforward minimisation of the motion compensation error in order to compare it with the Hopfield neural network approach.

†Currently with the VSSP Group, Department of Electronic and Electrical Engineering, University of Surrey, Guildford, Surrey, GU2 5XH, United Kingdom.
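Under the sign convention of Eqs. (2)-(4) as printed above, the weights and biases can be assembled as in this illustrative sketch (the block layout and function names are assumptions, not the authors' code):

```python
def build_network(F, G):
    # Hopfield weights and biases from the current block F (L x L) and
    # the D candidate blocks G[i] (each L x L), following Eqs. (3)-(4):
    # T[i][j] = -sum_xy g^i * g^j and I[i] = sum_xy f * g^i.
    # Note the non-zero self-connections T[i][i].
    L = len(F)
    def dot(A, B):
        return sum(A[x][y] * B[x][y] for x in range(L) for y in range(L))
    T = [[-dot(gi, gj) for gj in G] for gi in G]
    I = [dot(F, gi) for gi in G]
    return T, I
```

T is symmetric by construction, which is what the direct minimisation in the next section relies on.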
2 Method and Results
Two main problems with the Hopfield neural network algorithms for motion estimation are the dependence of the results of the network evolution on the model parameters (more precisely, the step size) and the relatively poor performance for large search ranges. The first drawback was demonstrated in [7], and the second is illustrated in Table 1.

range      0       1       2       3       4       6       8       10
PSNR - ES  34.198  36.005  36.223  36.327  36.336  36.352  36.363  36.369
PSNR - NN  34.171  37.688  38.254  38.739  38.893  38.999  39.091  39.191

Table 1: Motion estimation results for exhaustive search (ES) and neural network (NN) with different search ranges for "missa"

All the simulation results are for two frames selected from the "missa" and "salesman" sequences at CIF resolution¹. The fifth frame (previous frame) is always used for prediction of the sixth frame (present frame). The neural network parameters are the same as given in [7], where they were found sufficient to obtain convergence. It is seen that increasing the search range above some threshold (about 3-4) results in a very small increase in motion estimation quality. This applies to both the exhaustive search and the Hopfield neural network. In the following we describe a method of direct minimisation of the energy function of the neural network and compare the results obtained to those presented above. To find the global minimum of the energy function of the Hopfield network given in Eq. 2 we use the fact that it is a quadratic function of the vector variable v and that T_{ij} is symmetric in the indices i, j. We look for the minimum by first differentiating the formula with respect to v_k
\frac{\partial E}{\partial v_k} = -\frac{1}{2} \sum_{i=1}^{D} \sum_{j=1}^{D} T_{ij} \Big( \frac{\partial v_i}{\partial v_k} v_j + v_i \frac{\partial v_j}{\partial v_k} \Big) - \sum_{i=1}^{D} I_i \frac{\partial v_i}{\partial v_k},    (5)

which gives

\frac{\partial E}{\partial v_k} = -\sum_{i=1}^{D} T_{ik} v_i - I_k,    (6)

and then comparing the derivative to zero

-\sum_{i=1}^{D} T_{ik} v_i - I_k = 0.    (7)

As a result we obtain the following formula for v^{min}

v^{min} = -T^{-1} I,    (8)
where T^{-1} is the matrix inverse of T. This means that instead of performing a simulation of the neural network we can find the minimum of its energy function by multiplying the inverse of the weight matrix by the bias term. In practice, there exist effective methods for solving the linear equation 7 without calculating the matrix inverse. In Figs. 1 and 2 we present the results obtained with LU decomposition [8] and compare them to the Hopfield neural network and block matching with exhaustive search. It is seen that the new method performs much better in terms of estimation quality than the neural network, especially for large search ranges. Another advantage of the direct minimisation of the energy function is that there are no arbitrary parameters in this approach to be tuned. This is very different from the "classical" network evolution, where the final result heavily depends on the parameters and especially on the step size, as demonstrated in [7]. To obtain reliable results a small step size is necessary, which leads to very long execution times as compared to the direct solution

¹The sequences were obtained from the ftp site ftp.ipl.rpi.edu.
Figure 1: Comparison of exhaustive search, neural network, and matrix inversion results for "missa"
Figure 2: Comparison of exhaustive search, neural network, and matrix inversion results for "salesman"

of Eq. 7. More precisely, the solution of Eq. 7 requires O(D^3) multiply/add operations [8] for each block, where D is the size of the search region, i.e. has the same meaning as in Eq. 1. The complexity of the iterative neural network evolution may be estimated by O(ID(D + a)), where I is the number of iterations needed for convergence and a is the number of operations required by the energy change calculation. Typically the number of iterations needed to reach a minimum of the energy function is of the order of hundreds, and for many blocks even thousands. This is much more than D for reasonable search ranges, which means that the network evolution is more computationally expensive than matrix inversion. These considerations are illustrated by the sample execution times presented in Table 2 on the next page. The first column (total) shows the total time used by the program including initialisation, and the second one (solve) is the part spent on solving the equations². It should be noted that both algorithms are much more computationally expensive than the exhaustive search for a single motion vector, which requires O(D) operations.

²It is seen that the conjecture made in [7] that most of the time is spent on minimisation of the energy function is no longer true for the network parameters used in the presented simulations. It was based on simulations with the first parameter set from [7].
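The direct solution of Eq. 7 can be sketched with Gaussian elimination, a stand-in for the LU decomposition of [8] (partial pivoting added for numerical stability; this is an illustration, not the authors' code):

```python
def minimise_energy(T, I):
    # Solve sum_i T[i][k] v[i] = -I[k] (Eq. 7), i.e. v_min = -T^{-1} I
    # (Eq. 8), without forming the inverse explicitly.
    D = len(T)
    # augmented system [T | -I]; T is symmetric, so rows and columns agree
    A = [list(T[k]) + [-I[k]] for k in range(D)]
    for col in range(D):
        # partial pivoting: bring the largest remaining entry to the diagonal
        piv = max(range(col, D), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, D):
            f = A[r][col] / A[col][col]
            for c in range(col, D + 1):
                A[r][c] -= f * A[col][c]
    # back substitution
    v = [0.0] * D
    for k in range(D - 1, -1, -1):
        v[k] = (A[k][D] - sum(A[k][c] * v[c] for c in range(k + 1, D))) / A[k][k]
    return v
```

This costs O(D^3) operations per block, matching the complexity estimate quoted above.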
range    time (s) - NN        time (s) - MI
         total      solve     total     solve
         24.95      17.02     2.54      0.01
         150.48     118.09    13.69     0.41
         371.12     310.56    31.80     1.77
         1007.73    880.79    89.13     8.90
         2562.10    2126.14   414.71    78.19
Table 2: Execution times for the Hopfield neural network (NN) and matrix inversion (MI) for "missa"

The main disadvantage of the presented approach is that it is more difficult to control the number of significant motion vectors than in the case of the standard neural network. It is not possible to extend the algorithm from [7], because the solution obtained by matrix inversion can contain negative coefficients. It also seems that the solution is much more sensitive to any modifications of the energy function. We are currently investigating other ways of dealing with this problem.

3 Conclusions
The presented method performs much better than both the block matching algorithm with exhaustive search and the Hopfield neural network. With similar computational complexity it gives better quality, and it performs much better than the neural network when the number of candidate blocks for motion estimation is increased. Further work is currently in progress towards a rate-constrained version of this method.

This work has been partially supported by the European Community ACTS AC077 grant "Scalar" and by the Polish KBN grant 8 T11E 035 10.

References
[1] Tziritas G. and Labit C., Motion Analysis for Image Sequence Coding, Elsevier, Amsterdam, 1994.
[2] Bhaskaran V. and Konstantinides K., Image and Video Compression Standards, Kluwer Academic Publishers, Boston, 1996.
[3] Video Coding for Narrow Telecommunication Channels at < 64 kbit/s, Draft ITU-T Recommendation H.263, July 1995.
[4] Skrzypkowiak S.S. and Jain V.K., Neural Networks Based Motion Vector Computation and Application to MPEG Coding, in: Proc. Int. Conf. on Image Proc. 2, 1994, 985-990.
[5] Skrzypkowiak S.S. and Jain V.K., Affinity-Weighted Neural Network Motion Estimation for MPEG Coding, Digital Signal Processing 5, 1995, 149-159.
[6] Skrzypkowiak S.S. and Jain V.K., Affine Motion Estimation Using a Neural Network, in: Proc. Int. Conf. on Image Proc. 1, 1995, 418-421.
[7] Ciepliński L. and Jędrzejek C., Block-Based Rate-Constrained Motion Estimation Using Hopfield Neural Network, Neural Network World 6, 1996, 285-297.
[8] Press W.H., Teukolsky S.A., Vetterling W.T., and Flannery B.P., Numerical Recipes, Cambridge University Press, Cambridge, 1994.
A modified MAP-MRF motion-based segmentation algorithm for image sequence coding

Daniel Gatica-Pérez, Francisco García-Ugalde and Víctor García-Garduño. Universidad Nacional Autónoma de México. Facultad de Ingeniería. División de Posgrado. Departamento de Ingeniería Eléctrica. Apdo. Postal 70-256. México, D.F., 04510. México. e-mail: [email protected], [email protected], [email protected].
Abstract

We present a motion-based segmentation algorithm using a markovian-bayesian framework. The goal is to assign each pixel in the image to one out of several regions characterized by a motion model parameter vector. The algorithm uses a dense displacement vector field estimated by a pel-recursive method as the main information to guide the pixel fusion process, but includes other data sources to enhance the segmentation field estimation. Markov Random Field (MRF) theory provides a framework in which all this information, along with the physical properties of the desired solution, can be incorporated. Optimization of the solution is performed with Iterated Conditional Modes (ICM) procedures. The algorithm is planned to be the first stage of a region-based sequence coding system. Results on a real digital TV sequence are shown.

1 Introduction.
Motion estimation and segmentation have been regarded as two key domains in image sequence processing and computer vision, and have found applications in a variety of fields, including scene interpretation [4] and digital video coding [5]. In particular, motion-based segmentation (partition of an image describing a viewed scene into regions that undergo different motion, where each pixel is assigned to the region that represents its motion most adequately) represents a basic task for extracting high level information from the time-varying intensity of a sequence and for enhancing the motion measurement process, and has been studied by several authors [3], [4], [5], [8]. In this paper, we present a probabilistic approach to segment a sequence from a previously estimated motion vector field, based on Markov Random Field (MRF) theory and bayesian estimation using the maximum a posteriori (MAP) criterion. This algorithm is planned to be the first stage of a region-based sequence coding algorithm. Motion-based segmentation is known to be a problem in which simple spatial clustering techniques do not
work well (for example, in the presence of pure divergent motion). In this case, a model Θ_M of both the motion and the structure of the regions in the scene has to be introduced. Thus, when an estimated displacement or velocity vector field is available, the goal of the segmentation process is to assign each pixel in the image to one out of several regions (characterized by a motion parameter vector), depending on the agreement between each estimated motion vector and the assumed model. This represents a qualitative change from a local motion description to a regional one. The obtained regions can then be associated to different regions of the same object, or to different objects in the scene. In addition, motion estimation and segmentation are two interdependent problems, in the sense that we need one in order to obtain the other with accuracy (estimation-segmentation ambiguity). Due to the ill-posedness of motion estimation, an exact displacement field does not exist, and errors usually occur at or around motion boundaries. Better results in a motion-based segmentation can be attained if an indicator of the quality of the vector field is known.
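The pixel-to-region assignment just described can be sketched as follows; the four-parameter model form used here is an assumed standard translation/divergence/rotation parameterisation for illustration only, not necessarily the authors' exact model:

```python
def model_vector(theta, x, y):
    # displacement predicted at (x, y) by a simplified linear motion
    # model theta = (tx, ty, k, th) with gravity center (xg, yg);
    # k models divergence, th models rotation (assumed standard form)
    tx, ty, k, th, xg, yg = theta
    return (tx + k * (x - xg) - th * (y - yg),
            ty + th * (x - xg) + k * (y - yg))

def assign_pixel(d, x, y, models):
    # label the pixel with the region whose motion model best explains
    # the estimated displacement d = (dx, dy) at (x, y)
    def err(theta):
        mx, my = model_vector(theta, x, y)
        return (d[0] - mx) ** 2 + (d[1] - my) ** 2
    return min(range(len(models)), key=lambda r: err(models[r]))
```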
2 The model.
We use a MAP-MRF framework for the solution of the motion segmentation problem. This approach provides a way of introducing different sources of information and of characterizing their mutual interactions in one single model, thus allowing for the incorporation of the physical properties of the desired solution. It also enables us to develop formal solutions through the formulation of specific energy functions in terms of an optimality principle (MAP criterion). MRF image modeling, combined with bayesian estimation, has been shown to be useful in a broad number of problems in computer vision, including image restoration [6], edge detection, monochrome and color image segmentation [7], and motion estimation and segmentation [3], [8]. Our proposed method is an extension of the scene segmentation algorithm using global optimization proposed by Murray and Buxton [8]. It uses a dense displacement vector field estimated by a Wiener-based pel-recursive method [2] as the main information to guide the motion vector fusion process, but includes other data sources (image intensity, non-compensated pixels, intensity edges) as additional observations that enhance the final segmentation.

We formulate the motion-based segmentation as an estimation problem: simultaneously find the label fields (\hat{e}, \hat{l}) that maximize the a posteriori probability density function (pdf) of the labels given the observed data

(\hat{e}, \hat{l}) = \arg\max_{e,l} p(e, l / d_x, d_y, i, p, g)    (1)

where:

• e is the desired segmentation label field, which has associated a four-parameter simplified linear motion model Θ_{MLS} = (t_x, t_y, k, θ) that can describe combined translational, rotational, and divergent motions of planar surfaces parallel to the image plane,

d_x(x, y) = t_x + k (x - x_g) - θ (y - y_g)
d_y(x, y) = t_y + θ (x - x_g) + k (y - y_g)    (2)

where (x_g, y_g) is the gravity center of each surface.

• l is a binary motion discontinuity line field, an auxiliary field introduced to improve the segmentation process (motion boundaries).

• d_x, d_y are the observed horizontal and vertical components of the displacement field. They are estimated using a Wiener-based pel-recursive motion estimation algorithm, which has been used in the past for predictive motion-compensated image sequence coding.

• p is a binary non-compensated pixel field, which is a useful by-product of any pel-recursive motion estimation algorithm. A non-compensated pixel is one in which the convergence criterion (minimization of the reconstruction error (DFD)) is not satisfied. In the case of pel-recursive methods, non-compensated pixels usually appear in noisy regions and around motion boundaries [1], so they can be considered as a simple partial confidence measure of the motion estimation quality. Field p represents a way of switching between displacement and more reliable information (gray level) when the motion field is not accurately estimated.

• i is the image intensity field (gray-level information).

• g is the binary intensity edges field, and favors the occurrence of motion boundaries only when strong spatial gradients are present.

Fields e, d_x, d_y, p and i are defined over a lattice S of pixel sites s, and fields l and g are defined over a lattice S_l of line (interpixel) sites s_l (figure 1).

From this coupled MRF formulation, some previously reported algorithms can be derived:

• Murray and Buxton include in their algorithm e, l and a velocity field v [8].

• Chang et al. only consider e, d, and i [3].

Figure 1: (a) Definition of pixel sites s and line sites s_l. MRF neighborhood system including pixel and line sites: (b) vertical; (c) horizontal.

Using the Bayes rule, the last equation can be expressed as

(\hat{e}, \hat{l}) = \arg\max_{e,l} p(d_x, d_y, i / e, l, p, g) p(e, l / p, g)    (3)

because the denominator in Bayes' expression does not depend on the desired labels. The term p(d_x, d_y, i / e, l, p, g) is the conditional pdf of the observed data given the segmentation, and p(e, l / p, g) is the a priori pdf of the segmentation. Now, if we introduce some physical assumptions about the relation between labels and observations, the MAP estimate can be restated as

(\hat{e}, \hat{l}) = \arg\max_{e,l} p(d_x, d_y / e, p) p(i / e, p) p(e / l) p(l / g)    (4)

If we model the two first probability terms on the right side of equation (4) (corresponding to the observations) assuming conditionally independent gaussian random fields, and characterize the other two terms (corresponding to the labels) by Gibbs distributions [6], it can
6 References.
[1] N. Baaziz and C. Labit. Multiconstraint Wiener-based motion compensation using wavelet pyramids. IEEE Transactions on Image Processing, Vol. 3, No. 5, pp. 688-692. September 1994.
[2] J. Biemond, L. Looijenga, D.E. Boekee and R. Plompen. A pel-recursive Wiener-based displacement estimation algorithm. Signal Processing, Vol. 13, No. 4, pp. 399-412. December 1987.
[3] M. M. Chang, A. M. Tekalp and M. I. Sezan. Motion-field segmentation using an adaptive MAP criterion. Proc. of the ICASSP, 1993, Vol. 5, pp. 33-36.
[4] E. François. Interprétation qualitative du mouvement à partir d'une séquence d'images. Ph.D. Thesis, Université de Rennes I, France, June 1991.
[5] V. García Garduño. Une approche de compression orientée-objets par suivi de segmentation basée mouvement pour le codage de séquences d'images numériques. Ph.D. Thesis, Université de Rennes I, France, May 1995.
[6] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-6, No. 6, pp. 721-741. November 1984.
[7] S. Z. Li. Markov Random Field Modeling in Computer Vision. Tokyo, Springer-Verlag, 1995.
[8] D.W. Murray and B.F. Buxton. Scene segmentation from visual motion using global optimization. IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-9, No. 2, pp. 220-228. March 1987.
Figure 2: Wiener-based pel-recursive estimated displacement field d.
Figure 3: MAP motion-based segmentation algorithm. (a) Segmentation field e; final number of regions: 28. (b) Superimposition of frame 7 of Interview and segmentation field e. (c) Synthetic displacement field obtained from the segmentation mask.
be proved that maximizing the a posteriori pdf is equal to minimizing a so-called energy function U(e, l, d_x, d_y, i, p, g), which has the form
U(e, l, d_x, d_y, i, p, g) = α U_d(d_x, d_y, e, p) + β U_i(i, e, p) + γ U_e(e, l) + σ U_l(l, g)    (5)
where α, β, γ and σ are weighting terms. In [3], the DFD information is used for determining the weighting factors of the energy terms in the markovian model. As we have shown, we include it in one of the input data fields of our model (p), so the weighting terms are determined in a different way. Finally, we want to estimate
(\hat{e}, \hat{l}) = \arg\min_{e,l} U(e, l, d_x, d_y, i, p, g)    (6)
The definition of each energy term reflects the knowledge about the problem and the characteristics desired for the segmentation:

• U_d(d_x, d_y, e, p) is a measure of the errors between the motion model Θ_{MLS}, the displacement vector field and the segmentation. When motion has not been correctly estimated (according to field p), this energy term is not considered. In this way, we reduce the introduction of noisy data into the segmentation.

• U_i(i, e, p) promotes a local spatial coherence property in the segmentation. It is used just when the motion estimates are not reliable.

• U_e(e, l) is responsible for modeling the interactions between the segmentation and line fields (segmentation continuity except at motion boundaries).

• U_l(l, g) penalizes the introduction of discontinuity lines, and shapes the motion boundaries (geometrical properties of the segmentation).
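As an illustration of how field p gates the data term, the following sketch evaluates a U_d-like term for purely translational region models (a deliberate simplification of the four-parameter model; all names are illustrative):

```python
def data_energy(d, e, p, models):
    # U_d-style term: squared error between the estimated displacement
    # field d[y][x] = (dx, dy) and the prediction of the region model
    # models[e[y][x]] = (tx, ty); pixels flagged non-compensated
    # (p[y][x] = 1) are skipped, so unreliable motion estimates do not
    # pollute the segmentation
    total = 0.0
    for y in range(len(d)):
        for x in range(len(d[0])):
            if p[y][x]:          # convergence failed here; skip
                continue
            mx, my = models[e[y][x]]
            dx, dy = d[y][x]
            total += (dx - mx) ** 2 + (dy - my) ** 2
    return total
```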
3 The segmentation algorithm.
The global optimization of the solution uses an iterative deterministic relaxation procedure: a modified Iterated Conditional Modes (ICM) method based on an instability table [4] is employed to overcome the great computational cost required by simulated annealing. ICM methods minimize the local energy ΔU_s at each pixel
s = (x, y) of the image. Our minimization scheme considers two phases in each iteration [5]: one for the optimization of the segmentation field, and the other for the optimization of the line field. The complete motion-based segmentation algorithm includes four stages: (a) initialization, (b) numbering and labeling of each region in the image, (c) motion model parameter estimation in each region, and (d) optimization of the label fields. These steps are repeated until the method reaches the maximum number of iterations allowed, or until the segmentation becomes stable (convergence). An advantage of our algorithm is that the number of regions in the image is not fixed through the segmentation process.
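The ICM relaxation loop can be sketched as below; the Potts-style local energy is a toy stand-in for the full energy of Eq. (5), used only to make the sweep-until-stable structure concrete:

```python
def icm_segment(local_energy, labels, num_regions, max_iter=10):
    # ICM: visit every pixel, give it the label with minimum local
    # energy given its current neighbours, and stop when a full sweep
    # changes nothing (convergence) or max_iter is reached
    H, W = len(labels), len(labels[0])
    for _ in range(max_iter):
        changed = False
        for y in range(H):
            for x in range(W):
                best = min(range(num_regions),
                           key=lambda r: local_energy(labels, x, y, r))
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels

def make_potts_energy(obs, beta=0.6):
    # toy local energy: a data term (disagreement with an observed label
    # field) plus a Potts smoothness term over the 4-neighbourhood
    H, W = len(obs), len(obs[0])
    def energy(labels, x, y, r):
        data = 0 if r == obs[y][x] else 1
        smooth = sum(r != labels[ny][nx]
                     for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                     if 0 <= nx < W and 0 <= ny < H)
        return data + beta * smooth
    return energy
```

With this toy energy, an isolated noisy label is flipped to agree with its neighbours, which mirrors the smoothing role of the prior terms.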
4 Results.
Results obtained on a real digital television sequence (Interview¹) are presented in figures 2 and 3. This sequence includes global motion (camera pan) and an object with different motion (the woman to the right). The displacement field estimated with the Wiener-based algorithm is shown in figure 2. The segmentation obtained using this field and the proposed algorithm is shown in figures 3a and 3b. It can be seen that the woman in motion is correctly segmented, even if there are some small inner regions that do not disappear. Some other tiny remaining regions can be fused by further processing. The estimated motion discontinuities have been well fitted to the real ones. In figure 3c, we show the displacement field synthesized from the segmentation mask and the assumed motion model. This field appears to be ideal for coding purposes. The algorithm has been employed with other sequences (synthetic and real), and good results have been obtained. It converges after approximately 5 iterations in all cases.
5 Conclusion.
We have derived a method to segment image sequences from a coupled MRF formulation that includes some previously reported algorithms. Our proposed approach produced satisfactory results. Nevertheless we think that, due to the motion estimation-segmentation interdependence, the results can be improved if an iterative scheme (pel-recursive motion estimation and MRF motion-based segmentation) is applied. This strategy could be combined with a different minimization method.

¹Courtesy of CCETT-France
UNSUPERVISED MOTION SEGMENTATION OF IMAGE SEQUENCES USING ADAPTIVE FILTERING
Olaf Pichler (1), Andreas Teuner (2), and Bedrich J. Hosticka (2)
(1) Chair of Microelectronic Systems, Dept. of El. Eng., University of Duisburg, 47057 Duisburg, Germany
(2) Fraunhofer Institute of Microelectronic Circuits and Systems, 47057 Duisburg, Germany
Abstract

In this paper we present a new method for feature extraction in unsupervised motion segmentation of image sequences. The method is based on the multichannel filtering of the input image sequence using adapted Gabor filters. We use the histogram of a quantised two-dimensional parameter space resulting from a three-dimensional orientation analysis to select the main Gabor filter parameters, namely the azimuthal and elevational angles. The three-dimensional orientation analysis consists essentially of the eigenvalue and eigenvector computations of inertia tensors in the three-dimensional frequency space. The feature images are obtained from the complex valued filtered sequences using a simple magnitude computation and are subsequently evaluated employing a multichannel segmentation algorithm. The performance of the algorithm is demonstrated using two segmentation examples that are based on artificial as well as real image sequences.
Motion representation in the frequency domain

For an image s(x, y) of an object moving at constant velocity \vec{v} = (v_x, v_y) it can be written that

s(x, y, t) = s(x - v_x t, y - v_y t).    (1)

Fourier transformation of Eq. (1) yields

S(ω_x, ω_y, ω_t) = S(ω_x, ω_y) \cdot δ(ω_x v_x + ω_y v_y + ω_t)    (2)

with S(ω_x, ω_y) being the Fourier transform of s(x, y). Eq. (2) implies that the spectrum S(ω_x, ω_y, ω_t) is equal to zero except in the plane defined by the argument of the Dirac function. The plane equation is then defined by

\vec{ω} \cdot \vec{n} = 0,  with \vec{ω} = (ω_x, ω_y, ω_t) and \vec{n} = (v_x, v_y, 1).    (3)

It can be seen that the three-dimensional spectrum S(ω_x, ω_y, ω_t) is obtained by parallel projection of the two-dimensional spectrum S(ω_x, ω_y) into the Dirac plane defined by the velocity vector \vec{v}. Hence, velocity is not a coordinate in the three-dimensional frequency domain but an orientation.
3-D orientation analysis based on inertia tensors

Considering the three-dimensional spectrum S(\vec{ω}) as a mass density distribution, it becomes obvious that the normal vector \vec{n} of the plane can be determined by computing the rotation axis intersecting the origin of coordinates for which the moment of inertia, defined as

J = \int_{-\infty}^{\infty} d^2(\vec{ω}, \vec{ω}_n) |S(\vec{ω})|^2 d\vec{ω},    (4)

is at its maximum (\vec{ω}_n: vector in the direction of the rotation axis; d(\vec{ω}, \vec{ω}_n): distance between \vec{ω} and the rotation axis). For spectra that are not an exact plane, \vec{ω}_n^{max} with J(\vec{ω}_n^{max}) = max is the best approximation for the non-ideal orientation. The moment of inertia of a body with respect to an arbitrary rotation axis \vec{ω}_n can be written using the inertia tensor \mathbf{J} as [1, 2]

J = \vec{ω}_n^T \mathbf{J} \vec{ω}_n,    (5)
which has the following elements:

J_{ii} = \sum_{j \neq i} \int_{-\infty}^{\infty} ω_j^2 |S(\vec{ω})|^2 d\vec{ω}    (6)

J_{ij} = - \int_{-\infty}^{\infty} ω_i ω_j |S(\vec{ω})|^2 d\vec{ω},  i \neq j.    (7)
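A discrete sketch of these tensor elements, together with a small Jacobi eigensolver and the confidence measure discussed below (all names and the sampling scheme are illustrative, not the authors' implementation):

```python
import math

def inertia_tensor(samples):
    # discrete stand-in for Eqs. (6)-(7): samples are (omega, power)
    # pairs with omega = (w_x, w_y, w_t); the diagonal sums the squared
    # *other* coordinates, the off-diagonal is -w_i * w_j times power
    J = [[0.0] * 3 for _ in range(3)]
    for w, p in samples:
        for i in range(3):
            for j in range(3):
                if i == j:
                    J[i][i] += (w[(i + 1) % 3] ** 2 + w[(i + 2) % 3] ** 2) * p
                else:
                    J[i][j] -= w[i] * w[j] * p
    return J

def eigenvalues_sym3(A, sweeps=30):
    # cyclic Jacobi rotations for a symmetric 3x3 matrix
    A = [row[:] for row in A]
    for _ in range(sweeps):
        for p in range(3):
            for q in range(p + 1, 3):
                if abs(A[p][q]) < 1e-12:
                    continue
                th = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(th), math.sin(th)
                G = [[float(i == j) for j in range(3)] for i in range(3)]
                G[p][p] = G[q][q] = c
                G[p][q], G[q][p] = s, -s
                # A <- G^T A G zeroes the (p, q) entry
                GtA = [[sum(G[k][i] * A[k][j] for k in range(3))
                        for j in range(3)] for i in range(3)]
                A = [[sum(GtA[i][k] * G[k][j] for k in range(3))
                      for j in range(3)] for i in range(3)]
    return sorted(A[i][i] for i in range(3))

def orientation_confidence(J):
    # conf = J_max / (J_mid + J_min); equals 1 for an ideal plane
    jmin, jmid, jmax = eigenvalues_sym3(J)
    return jmax / (jmid + jmin)
```

For a spectrum confined exactly to the plane ω_t = 0, the largest eigenvalue equals the sum of the other two (perpendicular axis theorem), so the confidence reaches its upper bound.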
Based on Parseval's theorem it can be shown that for small windowed areas of the input sequence these elements can be easily computed from the partial derivatives in the space-time domain followed by lowpass filtering [1]. The eigenvalues of \mathbf{J} are the moments of inertia with respect to the principal axes and, hence, the eigenvector with the greatest eigenvalue is \vec{ω}_n^{max}. A measure of confidence for the orientation is given by

conf = \frac{J_{max}}{J_{mid} + J_{min}},  0.5 < conf < 1,    (8)

where J_{max}, J_{mid}, and J_{min} are the maximum, middle, and minimum eigenvalues of \mathbf{J}, respectively. Thus the inertia tensor model yields the orientation of a local surrounding of the input sequence as well as a measure of confidence for it.

3-D Gabor filters
The transfer function H(f_x, f_y, f_t) of a three-dimensional Gabor filter is given by

H(f_x, f_y, f_t) = 4\sqrt{w_x w_y w_t} \cdot \exp[-π ((w_x (f'_x - F))^2 + (w_y f'_y)^2 + (w_t f'_t)^2)]    (9)

where w_x, w_y, and w_t are the widths in the x, y, and t direction, respectively, and F is the center frequency. The apostrophe denotes the coordinate rotation by φ_a in the azimuthal and by φ_e in the elevational direction. Following the two-dimensional case [3], the radial bandwidth B is defined as

B = \log_2 \frac{w_x F + \sqrt{\ln 2 / π}}{w_x F - \sqrt{\ln 2 / π}},    (10)

and the azimuthal and elevational bandwidths Ω_a and Ω_e are

Ω_a = 2 \arctan\Big( \frac{\sqrt{\ln 2 / π}}{w_y F} \Big)  and  Ω_e = 2 \arctan\Big( \frac{\sqrt{\ln 2 / π}}{w_t F} \Big).    (11)
The radial bandwidth is measured in octaves, whereas the azimuthal and elevational bandwidths are measured in degrees.

Filter parameter selection
In order to extract motion features for image sequence analysis using Gabor filters the filters must match the planes in Fourier space that are created by objects moving at different velocities in a single scene. Hence, it is clear that these filters have to be 'flat' in elevational direction in order to reduce crosstalk between filter channels adapted to different velocities. Since the filter characteristics may be degraded due to quantisation effects if they are chosen too 'flat' we set the elevational bandwidth to ~e = 1.5 o for eight frame long sequences. The choice of the azimuthal bandwidth is much less critical and simulations show that f2a = 45 o is a good choice in many cases with respect to crosstalk and discriminative power of the resulting features. Since all velocity planes are planes intersecting the origin of coordinates 0Eq. (3)) their intersections are straight lines likewise through the origin. Thus, this suggests the use of filters with linearly increasing width for increasing center frequency, or in other words, with constant bandwidths, as defined by Eqs. (10) and (11). To reduce crosstalk resulting from overlapping transfer functions near the origin of coordinates we have chosen B = 0.75 octaves and F = 0.75 FNyquist (FNyquis t denotes the minimum of the two Nyquist frequencies in direction to the spatial coordinates). In order to extract motion features from an image sequence in an unsupervised manner we need the main Gabor filter parameters which are the angles of azimuthal and elevational rotation q~, and q~e. To obtain these parameters first we carry out a 3-D orientation analysis, as described above, and then compute the azimuthal and elevational angles for each eigenvector corresponding to a maximum eigenvalue. Afterwards this two-dimensional parameter space is quantised at predefined resolutions in both directions and its histogram is evaluated and sorted.
The filter parameters φ_a and φ_e are chosen from the sorted histogram in descending order with the following restrictions: 1) For the evaluation of the histogram, only those eigenvectors are taken into consideration whose measure of confidence is greater than 0.95. 2) The histogram classes including the elevational angles of 0° and 90° remain unconsidered, because 0° is physically impossible and must, therefore, result from a virtual velocity, and 90° is equivalent to a zero velocity and, hence, the direction has no physical meaning. 3) All selected filter parameters φ_a and φ_e must have a user-defined distance to each other in order to keep the redundancy of the extracted features small. Feature extraction itself is performed simply by filtering the sequence to be analysed using the adapted Gabor filters, followed by computation of the magnitude of the complex output. The resulting feature image stack is finally used as input data for the multichannel segmentation algorithm proposed in [4].
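The restricted histogram selection described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, the quantisation resolution, and the minimum mutual distance values are assumptions:

```python
import numpy as np

def select_filter_angles(phi_a, phi_e, conf, n_filters=3, res=10.0,
                         min_dist=15.0, conf_thresh=0.95):
    """Pick (azimuth, elevation) Gabor parameters from orientation estimates."""
    # 1) keep only eigenvectors with a high confidence measure
    keep = conf > conf_thresh
    # 2) discard the physically meaningless elevations 0 and 90 degrees
    keep &= (phi_e > 0.0) & (phi_e < 90.0)
    a, e = phi_a[keep], phi_e[keep]
    # quantise the 2-D (azimuth, elevation) parameter space and build a histogram
    bins = {}
    for ka, ke in zip((a // res).astype(int), (e // res).astype(int)):
        bins[(ka, ke)] = bins.get((ka, ke), 0) + 1
    # 3) walk the histogram in descending order, enforcing a minimum distance
    chosen = []
    for (ka, ke), _ in sorted(bins.items(), key=lambda kv: -kv[1]):
        ca, ce = (ka + 0.5) * res, (ke + 0.5) * res   # class centre
        if all(np.hypot(ca - pa, ce - pe) >= min_dist for pa, pe in chosen):
            chosen.append((ca, ce))
        if len(chosen) == n_filters:
            break
    return chosen
```

With two clusters of high-confidence orientation estimates, the two most populated histogram classes are returned as filter angles.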
Experimental results

The performance of the proposed algorithm has been verified using two segmentation examples of eight-frame-long image sequences. The first sequence is an artificial one and contains three objects with the same surface structure moving at velocities of identical magnitude (1 pixel per frame) but in different directions, namely to the left, to the right, and downwards. The horizontally moving objects overlap one another (Figure 1(a)). From Figures 1(b) to (d) it can be seen that the first three Gabor filters adapted by the proposed algorithm show excellent object matching, i.e. each filter is individually matched to one of the moving objects. Hence, the resulting segmentation, as depicted for the fourth frame in Figure 1(e), is nearly perfect. The second example is an eight-frame-long crossroads scene shot out of a standing car, featuring two cars and a bike crossing the road (Figure 2(a)). Although the camera was slightly vibrating during shooting and the brightness of the frames was not exactly constant, the segmentation into three object classes yields a good figure-ground separation as well as the differentiation between the slower bicycle and the faster cars (Figure 2(b)). In addition, the flashing left-turn indicator of the waiting car in front of the camera car has been correctly suppressed. Figures 2(c) and (d) show the masking of the fourth frame by the pixels belonging to the grey and black labeled object classes, respectively. The shapes of the segmented regions appear extended in the direction of motion, which is a direct consequence of the uncertainty principle between the space-time and the three-dimensional Fourier domain. Figure 2(e) shows the confidence measure stemming from the 3-D orientation analysis, wherein conf = 0.5 corresponds to black and conf = 1.0 to white. Binarisation of this image with the threshold conf = 0.95 yields Figure 2(f).
Only the eigenvectors belonging to white pixels in this image are considered valid for further analysis. It is important to realize that a velocity analysis using the 3-D orientation analysis alone is impossible because of the high percentage of non-valid eigenvectors; the total number of valid eigenvectors is merely sufficient to select the Gabor filter parameters. This finding is supported by the images of the azimuthal and elevational angles derived from the 3-D orientation analysis in Figures 2(g) and 2(h), which are noisy to such an extent that they are useless without the aid of the confidence measure. The examples clearly demonstrate that the proposed algorithm is able to perform unsupervised analysis of image sequences containing objects moving at velocities that differ in magnitude and/or direction.
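The 3-D orientation analysis used throughout — the inertia tensor built from space-time partial derivatives (Eqs. (6) and (7) via Parseval's theorem), its eigen-decomposition, and the confidence measure of Eq. (8) — can be sketched as below. This is a simplified, whole-sequence illustration under assumed array names; the paper computes it per windowed region with lowpass filtering:

```python
import numpy as np

def orientation_confidence(seq):
    """seq: (t, y, x) image sequence; returns (conf, axis of maximum inertia)."""
    gt, gy, gx = np.gradient(seq.astype(float))   # space-time partial derivatives
    g = [gt, gy, gx]
    # scatter matrix S_ij = sum over the region of (d_i s)(d_j s)
    S = np.array([[np.sum(gi * gj) for gj in g] for gi in g])
    # inertia tensor (up to a constant factor, irrelevant for the ratio below):
    # J_ii = sum_{j != i} S_jj and J_ij = -S_ij, i.e. J = trace(S) I - S
    J = np.trace(S) * np.eye(3) - S
    evals, evecs = np.linalg.eigh(J)              # ascending eigenvalues
    conf = evals[2] / (evals[1] + evals[0])       # Eq. (8)
    return conf, evecs[:, 2]                      # eigenvector of J_max
```

For a texture translating at 1 pixel per frame, the spectrum is (nearly) a plane, so the confidence approaches 1 and the principal axis couples the temporal and horizontal directions.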
Figure 1: (a) Fourth frame out of an eight-frame-long image sequence containing three objects moving at velocities of identical magnitude (1 pixel per frame) but different directions (to the left, to the right, and downwards). (b) to (d) Feature images of the fourth frame for the first three adapted Gabor filters, where each filter matches one of the moving objects, as found by the proposed algorithm. (e) Segmentation result for the fourth frame with four object classes.
Figure 2: (a) Fourth frame out of an eight-frame-long crossroads scene shot out of a standing car, with two cars and a bike crossing. (b) Segmentation result for the fourth frame containing three object classes, obtained by using the proposed algorithm. (c) Masking of the fourth frame by the pixels belonging to the grey labeled object class of (b). (d) Masking of the fourth frame by the pixels belonging to the black labeled object class of (b). (e) Image of the confidence measure stemming from the 3-D orientation analysis (conf = 0.5 = black, conf = 1.0 = white). (f) Binarisation of (e) with threshold conf = 0.95. (g) Image of the azimuthal angle derived by 3-D orientation analysis. (h) Image of the elevational angle derived by 3-D orientation analysis (white = low velocity, black = high velocity).

References
[1] B. Jähne: Digital Image Processing, Springer, 1994.
[2] H. Goldstein: Klassische Mechanik, 8th Edition, Aula, 1985.
[3] A. C. Bovik, M. A. Clark, and W. Geisler: "Multichannel Texture Analysis Using Localized Spatial Filters", IEEE Transactions on PAMI, Vol. 12, No. 1, pp. 55-73, January 1990.
[4] O. Pichler, A. Teuner, and B. J. Hosticka: "A Multichannel Algorithm for Image Segmentation with Iterative Feedback", Fifth International Conference on Image Processing and its Applications, 4-6 July 1995, Edinburgh (Scotland).
[5] A. Teuner, O. Pichler, and B. J. Hosticka: "Unsupervised Texture Segmentation of Images Using Tuned Matched Gabor Filters", IEEE Transactions on Image Processing, Vol. 4, No. 6, pp. 863-870, June 1995.
[6] A. Teuner, O. Pichler, and B. J. Hosticka: "Unüberwachte Selektion und Abstimmung von dyadischen Gaborfiltern zur Textursegmentierung", DAGM-Symp. Mustererkennung '94, 21-23 September 1994, Vienna (Austria).
[7] J. K. Aggarwal and N. Nandhakumar: "On the Computation of Motion from Sequences of Images - A Review", Proceedings of the IEEE, Vol. 76, No. 8, pp. 917-935, August 1988.
[8] D. J. Heeger: "Model for the extraction of image flow", J. Opt. Soc. Am. A, Vol. 4, No. 8, pp. 1455-1471, August 1987.
DEVELOPMENT OF A MOTION COMPENSATED CODING SYSTEM FOR AN ENHANCED WIDE SCREEN TV

Takahiro HAMADA and Shuichi MATSUMOTO
KDD R&D Laboratories, Visual Communications Group, Japan
2-1-15 Ohara Kamifukuoka, Saitama 356, Japan
TEL(FAX): +81-492-78-7428(7439) E-mail: [email protected]

Abstract

EDTV II is a Japanese standard system for enhanced wide-screen NTSC broadcasting. We developed a motion compensated EDTV II coding system for the purpose of transmitting an EDTV II signal of contribution quality to television studios. The algorithm for this system has been recommended as the Japanese standard EDTV II coding scheme.

1. Introduction

There is no doubt that television is one of the leading information media throughout the world. Television broadcast standards unique to different areas were developed and have been used for more than 30 years during the spread of television, as shown in Table 1. As television continually widened its role, not only in journalism but also in entertainment fields such as sports, viewers started to demand higher picture quality on a wider monitor than was available on existing TV receivers. Taking viewer demands into account, intensive and far-ranging efforts were made to provide standardized higher-quality television at a world-wide level, and several TV broadcast standards were established in Europe, the U.S. and Japan. Two approaches were taken: one compatible and one not compatible with current TV standards. HDTV is designed to be totally free from backward compatibility with conventional TV standards, so that much higher picture quality can be obtained on a wide-screen monitor with approximately twice the number of pixels and scanning lines. The other approach is a wide-screen TV, mainly designed to widen the aspect ratio from 4:3 to 16:9 with backward compatibility.
This means that programs of a wide-screen TV can be enjoyed on current existing receivers, and enjoyed even more than before on specific receivers with a wide screen, whereas HDTV programs require viewers to completely replace their current receivers. A current key issue for HDTV and wide-screen TV is how to transmit these TV signals from events, such as the Olympic Games, to TV studios, in particular by using digital compression technology (high-efficiency coding). In these transmissions, very high quality must be achieved, at the grade of contribution quality, for the purpose of post-processing and editing at TV studios. Much research and development of high-efficiency coding has been conducted, and DCT (Discrete Cosine Transform) based schemes such as ITU-T J.81 and MPEG-2 have been established. These schemes, however, are designed on the premise that the input signal is a component TV signal in which the luminance and chrominance signals can be handled separately, like an HDTV signal. A wide-screen TV, however, uses the same composite signals as the current NTSC or PAL systems. A color encoding-decoding process is therefore required for applying these schemes to wide-screen TV, but this process fatally affects the picture quality. To resolve this problem, we developed a WHT (Walsh Hadamard Transform) based coding system for the Japanese standard wide-screen TV (EDTV II). In this system, a composite TV signal can be directly coded without a color encoding-decoding process, and sophisticated coding control of the EDTV II signal is achieved.

Table 1 Television standards

                             | U.S.              | Japan                     | Europe
Current TV                   | NTSC (Analog AM)  | NTSC (Analog AM)          | PAL, SECAM (Analog AM)
HDTV (Broadcasting)          | ATV (MPEG-2+VSB)  | Hivision (MUSE+Analog FM) | DVB (MPEG-2+QPSK, COFDM)
Wide screen TV (Broadcasting)| -                 | EDTV II (Analog AM)       | PALplus (Analog AM)

2. Current status of digital compression

A digital compression scheme has been standardized as MPEG-2. A block diagram of this DCT-based coding scheme is shown in Fig. 1. In this scheme, the current frame to be coded is subtracted from neighboring frames (predictive frames) on a block basis, and the motions for each block in the predictive frames are detected and compensated (Motion Compensation). DCT is then applied to the difference signal, which is subjected to quantization, and the quantizer outputs are then coded and sent. There are two applications for digital compression of TV signals. One application is transmission
from an event arena to a TV studio, and the other application is broadcasting from the TV studio to the end users (Fig. 2).

[Fig. 1 Block diagram of DCT-based MPEG-2 (DCT: Discrete Cosine Transform; Q: Quantization; FM: Frame memory; MC: Motion Compensation; VLC: Variable Length Coding)]
[Fig. 2 Transmission to and broadcasting from a TV studio]

In MPEG-2, the coding parameters for each application are defined by a 4:2:2 profile and a main profile, as shown in Table 2.

Table 2 MPEG-2 parameters

Application        |            | Transmission  | Broadcasting
MPEG-2 profile     |            | 4:2:2 profile | Simple/Main profile
Chroma sampling    |            | 4:2:2         | 4:2:0
H/V/T+             | High       | Under study   | 1920/1152/60
                   | High-1440  | Under study   | 1440/1152/60
                   | Main       | 720/608/30    | 720/576/30
Bit-rate++         | High       | Under study   | 80 Mbps
                   | High-1440  | Under study   | 60 Mbps
                   | Main       | 50 Mbps       | 15 Mbps

+ : H, V, T mean maximum horizontal pels, vertical lines and frame numbers
++ : Maximum bit-rate

Both DVB and ATV have adopted fully digital broadcasting using the MPEG-2 main profile and will be in service from 1997. Hivision in Japan had already started broadcasting service in 1991, using MUSE+analog FM before MPEG-2 was standardized, and several proprietary systems such as HDC 45 [1] were used for transmission applications. An MPEG-2 high profile will be employed for all HDTV standards in transmission applications. In both PALplus and EDTV II, broadcasting is conducted by an analog AM ground wave, the same as with the current PAL and NTSC, respectively. For digital compression, however, a specific coding scheme must be designed, since MPEG-2 can hardly be used for these signals. We therefore developed the motion compensated EDTV II coding scheme for a transmission application, as described in section 4.
3. EDTV II signal [2]

Two types of wide-screen TV, PALplus and EDTV II, are compared in Table 3.

Table 3 Comparison of PALplus and EDTV II

                                       | PALplus              | EDTV II
Aspect ratio                           | 16:9                 | 16:9
Compatibility to a current TV          | Letter box           | Letter box
Active image area lines                | 432                  | 360
Black area lines                       | 144                  | 120
Horizontal resolution                  | 0-5.5 MHz            | 0-4.2 MHz
Vertical resolution (active image area)| 0-430 lines/height   | 0-360 lines/height
Vertical high resolution helper (VH)   | 430-576 lines/height | 360-480 lines/height
Vertical-Temporal helper (VT)          | No                   | 180-360 lines/height
Horizontal helper (HH)                 | No                   | 4.2-6.0 MHz
Scanning                               | 625 interlace        | 525 interlace / 525 progressive

[Fig. 3 EDTV II signal format]

These two signals have many common formats, but PALplus has only a vertical high resolution helper signal (VH). This is because a PAL signal originally has a higher resolution, of up to 5.5 MHz, than does NTSC at 4.2 MHz. Fig. 3 shows the EDTV II signal format, focusing on three types of helper signals. The 180-line active image area in each field is a 16:9 aspect ratio signal maintaining compatibility with a conventional 4:3 NTSC monitor (Fig. 4). Two types of helper signals, VT and VH, are multiplexed in the remaining 30 lines above and below it. Here, VT is a Vertical-Temporal helper signal, which converts the interlaced active image area into a progressive scanning format, and VH is a Vertical high resolution helper signal, which enhances the vertical frequency of the active image area from 360 lph to 480 lph. A third type, HH, is a horizontal helper signal, which is multiplexed into the "Fukinuki hole" present in the active image area and which enhances the horizontal frequency of this area from 4.2 MHz to 6 MHz. These enhancement processes are conducted in EDTV II decoding, by which a higher quality wide-screen TV signal is obtained.
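The line budget behind the EDTV II letter-box format can be checked with a few lines of arithmetic. This is a sketch; the only input is the 480 visible NTSC lines per frame:

```python
# A 16:9 image letter-boxed onto a 4:3 raster occupies (4/3)/(16/9) = 3/4
# of the picture height, so of the 480 visible NTSC lines per frame:
visible = 480
active = visible * 3 // 4        # 360 lines carry the 16:9 image
black = visible - active         # 120 lines are left for helper signals
per_field = active // 2          # 180 active lines per interlaced field
helper_per_field = black // 2    # 60 helper lines per field (30 above + 30 below)
print(active, black, per_field, helper_per_field)
```

These values match the EDTV II column of Table 3 and the 180-line active area per field described above.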
4. EDTV II Coding System

4.1 System configuration

Fig. 4 shows the system configuration. A total of 768 pixels × 496 lines is extracted from the EDTV II signal in a pre-processing unit as the coding target area. An 8×8 intraframe Hadamard transform is used as the transform scheme, and motion compensated interframe DPCM is implemented in the Hadamard transform domain.

[Fig. 4 System configuration]
4.2 Composite Motion Compensation [3]

Fig. 5 shows the general concept of composite motion compensation. In this figure, let the color sub-carrier phase of the coding block be a reference (0°), expressed as (4). The phase is then shifted from this reference by 90°, 180° and 270°, respectively, as the position of the reference block is moved by one sample (3), two samples (1) and three samples (2). Movement by four samples returns the phase back to 0°. These four types of phase shift along the two-dimensional (MVx, MVy) axes are also shown in the same figure. The figure also shows phase compensation by permutations and polarity changes of the two Hadamard coefficients located in the center of the 8×8 WHT block, for every phase shift (1), (2) and (3) in the figure. This technique greatly improves the correlation between the coding and reference blocks.

4.3 Adaptive Coding Control

The four types of stripes on the EDTV-II frame are shown in Fig. 6. The following adaptive coding (quantization) control is performed on each stripe to minimize degradation of the decoded EDTV-II picture quality. 1) ID and control signal area: nearly lossless coding (quantization step size = 1). 2) Boundary area: high SNR (step size = 3), because an imperfect junction of the above and below black areas is easily recognized as a streaky noise in the center of the EDTV-II decoded image. 3) Active image area: adaptive NTSC coding control considering human visual perception.
4) Black area: uses the average step size of the active image area.

[Fig. 5 Composite motion compensation]
[Fig. 6 Stripe classifications for adaptive control]
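The composite motion compensation of section 4.2 can be sketched as follows. With the NTSC color sub-carrier sampled at four samples per cycle, a horizontal displacement of one sample corresponds to a 90° phase shift; the compensation then rotates the quadrature pair of central Hadamard coefficients by permutation and polarity change only. The coefficient mapping below is a plausible illustration of such a rotation, not the exact table from the paper:

```python
def subcarrier_phase_shift(mv_x):
    """Phase shift (degrees) of the NTSC color sub-carrier caused by a
    horizontal motion vector of mv_x samples (4 samples per sub-carrier cycle)."""
    return (mv_x % 4) * 90

def compensate_pair(c1, c2, phase):
    """Rotate the quadrature pair (c1, c2) of central WHT coefficients using
    only permutations and polarity changes -- no multiplications needed."""
    return {0:   (c1, c2),
            90:  (c2, -c1),
            180: (-c1, -c2),
            270: (-c2, c1)}[phase]
```

A motion vector of five samples, for example, needs the same compensation as one sample (5 mod 4 = 1, i.e. a 90° shift), and applying the 90° step four times returns the coefficients unchanged.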
4.4 Coding performance

We evaluated the picture quality of several EDTV-II test sequences. The bit rate was fixed at 20 Mbps, and the objective/subjective evaluation results obtained at this bit rate are as follows. The SNR (Signal to Noise Ratio) of the control signal and boundary areas is constant and more than 50 dB, which shows the effect of the coding control of section 4.3. The SNR of the main and black areas depends on the sequence, and even the lowest SNR is higher than 45 dB. In the subjective evaluations, original sequences and system outputs were decoded by an EDTV II decoder and displayed on a 16:9 wide-screen TV monitor, as shown in Fig. 4. System outputs gave steady EDTV II decoding because of the constantly high SNR of the control signal, and degradations were difficult to find even in the lowest-SNR sequence. This demonstrates that this system can transmit an EDTV II program with contribution quality at the bit rate mentioned above (equivalent to one-half of a 36 MHz transponder).

5. Conclusion
We have developed a WHT-based EDTV II coding system in which composite motion compensation and a stripe-based, four-type adaptive coding control are achieved. Utilizing this technology will permit transmission of an EDTV II signal of contribution quality at a bit-rate of 20 Mbps (two EDTV II programs can be transmitted on one 36 MHz transponder). The algorithm for this system has been recommended as the Japanese standard EDTV II coding scheme. The stripe-based adaptive coding control will probably also prove effective for PALplus. For motion compensation, however, more than two coefficients will be required on the WHT in order to compensate the phase shifts of the PAL color sub-carrier, but the performance will still probably be greater than in the case of DCT. We intend to make further studies of efficient color sub-carrier compensation algorithms for motion compensated PALplus coding.

References:
[1] S. Matsumoto et al.: "Development of High Compression HDTV Digital Codec", Fifth IEE Conference on Telecommunications, 26-29 March 1995, pp. 100-104.
[2] "Enhanced Wide-Screen NTSC TV Transmission System", Question ITU-R 42-2/11.
[3] T. Hamada et al.: "NTSC Interframe Direct Coding Scheme by Composite Motion Compensation on Hadamard Transform", IEICE Transactions B-I, Vol. J77-B-I, No. 7, pp. 475-482, July 1994.
Session P: BIOMEDICAL APPLICATIONS
BRAIN EVOKED POTENTIALS MAPPING USING THE DIFFUSE INTERPOLATION
Djaaffar Bouattoura
Paul Gaillard
Université de Technologie de Compiègne, Département Génie Informatique, BP 529, F-60205 Compiègne cedex, France. Tel. (33) 44 23 44 23 - Fax. (33) 44 23 44 77. Email: Dj.Bouattoura@utc.fr
Pierre Villon
Université de Technologie de Troyes, Laboratoire de Modélisation et Sûreté des Systèmes
François Langevin
Université de Technologie de Compiègne, U.R.A. CNRS 1505 LG2MS
Université de Technologie de Compiègne, U.R.A. CNRS 858 BIM
ABSTRACT

In this paper, an interpolation method based on the diffuse approximation is applied to represent the evoked potentials distribution over the skull. This method retains most of the attractive features of the finite-element method but does not require explicit elements. The human head is assumed here to be a single-layer sphere with homogeneous conductivity, with the Ary eccentricity transformation applied in order to approximate the more realistic 3-shell model. The patterns shown in the computed maps, in both simulation and application tests, suggest the ability of the proposed method to extract coherent information from measured data. In the application protocol, visual evoked potentials are used to test this method.

Keywords: Evoked potentials, topographic mapping, diffuse approximation.

1. INTRODUCTION

Topographic mapping is a widely used tool in neural electrophysiology to obtain a representation of the brain electrical activity over the skull [1, 2]. In evoked potentials (EP's) or electroencephalogram (EEG) mapping, the distribution of the potential can be displayed as maps using isopotential lines (contours) or flats (surfaces). Yan [3] analysed the human head by the finite-element method, investigating the effects of the eye orbit structure. Abboud [4] proposed a method based on finite-volume discretization for calculating the potential distribution in a 4-layer spherical volume conductor due to a dipole current source. The reasons for the success of the finite-element method are well known: the local character of the approximations, the ability to deal with complex geometrical domains, and the existence of a large set of approximation schemes adapted to various problems but embedded in a unified formulation.
However, this method presents two main drawbacks: first, the approximate solutions provided by this approach have limited regularity; second, generating adequate discretization meshes is a difficult task, in particular for complex tridimensional domains. In the present paper, we describe a new method based on the diffuse approximation [5] for calculating the potential distribution (mapping) on a human head model due to brain evoked activity. This method retains most of the attractive features of the finite-element method but provides smoother approximations and requires only sets of discretization points (nodes), without explicit elements.

2. HUMAN HEAD MODELLING

The human head is generally approximated by a sphere in which the electrical properties are represented by concentric shells of different conductivities [6, 7, 8]. The most frequently used model is the one which assumes the human head to be a homogeneous sphere. Moreover, Ary [9] showed that the potential distribution due to a source in a homogeneous sphere was similar to that of a source in a 3-shell sphere with a correction in the eccentricity of the source. In the present paper, we consider the human head as a homogeneous sphere with radius ρ and conductivity σ. The surface is represented by a mesh of quadrangular patches [10] formed by joining adjacent points with straight lines. In order to simulate the potential distribution, we consider an eccentric dipolar source inside the spherical model [11].

3. DIFFUSE INTERPOLATION

The basic idea of the diffuse approximation is to give an approximate value of a given function s, at each point p_e (evaluation point) of the considered surface, starting from the knowledge over N measurement nodes
Figure 1: Electrode configuration used in the visual evoked potentials recording procedure according to the standard 10-20 system (N = 18).
(electrodes) (Fig. 1). This evaluation is based on the measurements obtained from the $k$ ($k < N$) nearest nodes $n_i$ ($i = 1, \ldots, k$) of the evaluation point $p_e$, in which the contribution of each node is given by a weighting function. At any evaluation point $p_e$, the approximate value of $s$ is given by

$$\hat{s}_{p_e} = \begin{bmatrix} 1 & x_{p_e} & y_{p_e} & z_{p_e} \end{bmatrix} a,$$   (1)

where the coefficient column vector $a$ is obtained by a local weighted least squares procedure. It consists in minimizing the following expression:

$$J(a) = \frac{1}{2} \sum_{i=1}^{k} w(n_i, p_e) \left\{ s_{n_i} - \begin{bmatrix} 1 & x_{n_i} & y_{n_i} & z_{n_i} \end{bmatrix} a \right\}^2,$$   (2)

where $s_{n_i}$ represents the value of the function $s$ at each node $n_i$, and $w(n_i, p_e)$ is a positive weighting function which quickly decreases as the distance between the node $n_i$ and the evaluation point $p_e$ increases. This function can be defined as

$$w(n_i, p_e) = w_{ref}(d_i),$$   (3)

such that

1. $w_{ref}(0) = 1$,
2. $\mathrm{supp}(w_{ref}) = [-1, 1]$,
3. $0 \le w_{ref} \le 1$,
4. $w_{ref} \in C^m(-1, 1)$,

and where $d_i$ represents the distance between node $n_i$ and the evaluation point $p_e$, normalized by the attenuation radius $d_{k+1}$, which is given by the distance between $p_e$ and its $(k+1)$th nearest node. $m$ is the length of vector $a$ ($m = 4$ in our case). Let $P_i = \begin{bmatrix} 1 & x_{n_i} & y_{n_i} & z_{n_i} \end{bmatrix}$; equation (2) then becomes

$$J(a) = \frac{1}{2} \sum_{i=1}^{k} w(n_i, p_e) \left( s_{n_i} - P_i a \right)^2.$$   (4)
The estimate of $a$ is obtained by

$$\frac{\partial J(a)}{\partial a} = A a - b = 0.$$   (5)

Thus

$$a = A^{-1} b,$$   (6)

with

$$A = \sum_{i=1}^{k} w(n_i, p_e)\, P_i^T P_i \quad \text{and} \quad b = \sum_{i=1}^{k} w(n_i, p_e)\, s_{n_i} P_i^T.$$

The rank and conditioning of matrix $A$ depend only on the number of nodes belonging to the neighbourhood of $p_e$. A necessary condition to get a non-singular matrix $A$ is the existence of at least $m$ nodes in the neighbourhood of $p_e$ ($N \ge k \ge m$). Without additional hypotheses, the diffuse approximation does not exactly satisfy the interpolation property

$$\hat{s}_{n_i} = \begin{bmatrix} 1 & x_{n_i} & y_{n_i} & z_{n_i} \end{bmatrix} a = s_{n_i}.$$   (7)

In practice, the diffuse approximation can locally be made an exact interpolation at each node $n_i$ by suitably modifying the weighting function. For that, we select weighting functions which are infinite at the nodes $n_i$, for example

$$\tilde{w}_{ref}(d) = \frac{w_{ref}(d)}{1 - w_{ref}(d)},$$   (8)

such that
1. $\tilde{w}_{ref}(0) \to \infty$,
2. $\mathrm{supp}(\tilde{w}_{ref}) = [-1, 1]$,
3. $\tilde{w}_{ref} \in C^m(-1, 1)$.

The definition of $\tilde{w}_{ref}$ is equivalent to the introduction of an interpolation constraint in the minimization problem. The displayed results were computed by using the following weighting window:

$$w_{ref}(d) = \begin{cases} \frac{1}{2}\left[1 + \cos(\pi d)\right] & \text{if } -1 < d < 1, \\ 0 & \text{elsewhere.} \end{cases}$$   (9)

Figure 2: Comparison between the reference map and its interpolated forms in 18, 25 and 35 electrode configurations. The dipole orientations are radial (a) and tangential (b), both with 80% eccentricity.
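The whole evaluation procedure — Eqs. (1) to (6) with the cosine window of Eq. (9) — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function names and the choice k = 8 are assumptions:

```python
import numpy as np

def w_ref(d):
    """Cosine weighting window of Eq. (9)."""
    return np.where(np.abs(d) < 1, 0.5 * (1 + np.cos(np.pi * d)), 0.0)

def diffuse_approx(nodes, s, pe, k=8):
    """Diffuse approximation of s at evaluation point pe.

    nodes: (N, 3) coordinates of the measurement nodes (electrodes)
    s:     (N,) measured values at the nodes
    pe:    (3,) evaluation point
    """
    d = np.linalg.norm(nodes - pe, axis=1)      # distances to all nodes
    order = np.argsort(d)
    near = order[:k]                            # the k nearest nodes
    w = w_ref(d[near] / d[order[k]])            # normalize by (k+1)th distance
    P = np.hstack([np.ones((k, 1)), nodes[near]])   # rows P_i = [1 x y z]
    A = (P * w[:, None]).T @ P                  # A = sum w P_i^T P_i, Eq. (6)
    b = (P * w[:, None]).T @ s[near]            # b = sum w s_i P_i^T
    a = np.linalg.solve(A, b)
    return np.array([1.0, *pe]) @ a             # Eq. (1)
```

Because the basis [1 x y z] contains all linear functions, any function that is exactly linear in the coordinates is reproduced exactly by the weighted least squares fit.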
4. EXPERIMENTAL PROCEDURE

4.1. Simulation

Many maps were generated with configurations based on 18, 25 and 35 electrodes. The study consists in the comparison of a reference map, corresponding to a dipolar source potential distribution, with a map built by interpolating the extracted sample set. We consider the correlation coefficient $T_r$ (resemblance rate) as a comparison criterion:

$$T_r = \left( 1 - \frac{\sigma_e}{\sigma_r} \right) \times 100\%,$$   (10)
where $\sigma_e$ and $\sigma_r$ represent respectively the standard deviations of the interpolation error and of the reference map. The interpolation error is given by the deviation between simulated and interpolated values. We have modified the dipole orientation from the radial orientation (Fig. 2.a) to the tangential one (Fig. 2.b). The correlation coefficient increases gradually, with the minimal value corresponding to the radial position. This is due to the fact that, in the tangential orientation, the potential distribution is more widespread. From the three selected configurations, we can observe that maps based on 25 and 35 electrodes are better than those with 18 electrodes, in both the radial and tangential orientations. We therefore propose the use of configurations including between 25 and 35 electrodes.

4.2. Application

Visual evoked potentials (VEP's) were recorded on healthy subjects (aged 20-25). Single sweep recordings were performed via 18 channels (electrodes) whose impedances were less than 1 kΩ. VEP's were elicited via a checkerboard of 10 × 10 black and white squares at 100% contrast, 60 cd/m² luminance and a 2 Hz reversal frequency. Each single sweep was amplified with a gain of 3·10⁴ and band-pass filtered (1-100 Hz). A/D conversion was performed at a 1 kHz sampling frequency. After conventional averaging over 200 sweeps, the average responses of the 18 channels were used to build maps via the diffuse interpolation algorithm. Figure 3.a gives the time modality with selected latencies, while figure 3.b gives the space modality. The electrical activity of the salient polarity is highly concentrated over the occipital cortex, as is well known. This kind of potential distribution, whose morphology is in some way simpler, represents a good test for the proposed method in order to evaluate its behaviour in regions where the VEP distribution is almost non-existent (frontal cortex).
As shown, the resulting maps give a satisfactory description of the potential distribution over the head model.
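The conventional averaging step used above relies on the evoked response being time-locked to the stimulus while the background EEG is approximately zero-mean across sweeps, so its amplitude shrinks like 1/√N. A minimal sketch with synthetic data (the waveform shape and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
n_sweeps, n_samples = 200, 500                     # 200 sweeps, 500 ms at 1 kHz
t = np.arange(n_samples) / 1000.0
evoked = 5.0 * np.exp(-((t - 0.1) / 0.02) ** 2)    # stimulus-locked component
# each single sweep = evoked response + independent background "EEG" noise
sweeps = evoked + 10.0 * rng.standard_normal((n_sweeps, n_samples))
average = sweeps.mean(axis=0)                      # conventional averaging
residual = average - evoked                        # leftover background noise
# residual std should be roughly 10 / sqrt(200) ~ 0.7 of the raw noise's 10
print(round(residual.std(), 2))
```

With 200 sweeps, the single-sweep noise (here ten times the peak is buried under) is attenuated by a factor of about 14, which is why the averaged 18-channel responses are clean enough to interpolate.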
Figure 3: (a) Multiple raw traces of the average VEP responses (18 channels) according to the standard 10-20 system. Dotted vertical lines point out the selected latencies at which maps were generated. (b) Topographic maps at the selected latencies. The most important latencies, N75 and P100, are respectively at 91 ms and 132 ms. The lower part of the patterns represents the occipital cortex.
5. C O N C L U S I O N The proposed method has shown to be able to give a satisfactory dynamic topographic description of the brain visual evoked potential distribution from responses obtained via conventional averaging. In order to simulate potential distribution, we have considered a single eccentric dipolar source inside a homogeneous sphere with radius p and conductivity or. We have observed that maps with configurations based on 25 and 35 electrodes are better than that with 18 electrodes both in the radial and tangential orientations. In the application stage, only 18 leads were used to perform acquisition procedure. The patterns shown in the obtained maps suggest the ability of the proposed method to extract coherent information, from data obtained via different electrodes, even if the used configuration did not allow to show correctly the efficiency of this method. 6. R E F E R E N C E S [1] F. H. DUFFY, J. L. BURCHFIEL,and C. T. LOMBROSO. Brain electrical activity mapping (beam): a method for extending the clinical utility of eeg and evoked potential data. Ann. Neurol., 5:309-321, I979. [2] B. N. CUFFIN and D. COHEN. Comparison of the magnethoencephalogram and electroencephalogram. Electroenceph. Clin. Neurophysiol., 47:132-146, 1979. [3] Y. YAN, P. L. NUNEZ, and R. T. HART. Finite-element model of the head: Scalp potentials due to dipole sources. Med. Biol. Eng. Comput., 29:475--481, 1991. [4] S. ABBOUD, Y. ESHEL, S. LEVY, and M. I~OSENFELD. Numerical calculation of the potential distribution due to dipole sources in a spherical model of the head. Comput. Biomed. Resea., 27:441-455, 1994. [5] B. NAYROLES, G. TOUZOT, P. VILLON, and A. RICARD. Generalizing the finite element method: diffuse approximation and diffuse elements. Computational Mechanic, 10(5):1-12, 1992. [6] R. N. KAVANAGH, T. M. DARCEY, D. LEHMANN, and D. H. FENDER. Evaluation of methods for threedimensional localization of electrical sources in the human brain. 
IEEE Trans. Biomed. Eng., BME-25:421-429, 1978. [7] B. J. ROTH, M. BALISH, A. GORBACH, and S. SATO. How well does a three-sphere model predict positions of dipoles in realistically shaped heads? Electroenceph. Clin. Neurophysiol., 87:175-184, 1993. [8] B. N. CUFFIN. Effects of head shape on EEGs and MEGs. IEEE Trans. Biomed. Eng., BME-37:44-52, 1990. [9] J. P. ARY, S. A. KLEIN, and D. H. FENDER. Location of sources of evoked scalp potentials: correction for skull and scalp thicknesses. IEEE Trans. Biomed. Eng., BME-28:447-452, 1981. [10] F. ZANOW and M. J. PETERS. Individually shaped volume conductor models of the head in EEG source localisation. Med. Biol. Eng. Comput., 33:582-588, 1995. [11] F. N. WILSON and R. BAYLEY. The electric field of an eccentric dipole in a homogeneous spherical conducting medium. Circulation, 1:84-92, 1950.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
COMPUTER-AIDED DIAGNOSIS: DETECTION OF MASSES ON DIGITAL MAMMOGRAMS
Arturo J. Méndez (1), Pablo G. Tahoces (2), María J. Lado (1), Miguel Souto (1), Juan J. Vidal (1) (1) Department of Radiology, University of Santiago de Compostela, Spain (Complejo Hospitalario Universitario de Santiago) (2) Department of Electronics and Computing, University of Santiago de Compostela, Spain
Technical area: Medical Applications
Contact author: Arturo J. Méndez, San Francisco 1, Departamento de Radiología, Facultad de Medicina, 15704 Santiago de Compostela, Spain. Telephone: (81)-570982. FAX: (81)-583345. e-mail: mrarthur@uscmail.usc.es
This work has been supported by the Fondo de Investigaciones Sanitarias de la Seguridad Social (Spain) under Grant no. 94/1708 and by the Consellería de Sanidade da Xunta de Galicia (Proyecto de Investigación del Programa Gallego de Detección Precoz del Cáncer de Mama).
I. INTRODUCTION
While breast cancer is a major cause of death for women, adequate screening programs and independent double reading may reduce these rates, thus improving breast cancer detection [1, 2, 3]. However, the large volume of mammograms generated in a screening program and the need for each mammogram to be read by two radiologists [4] encouraged several investigators to point out the possibility of using computers as a second reader [5, 6]. Two general approaches have been explored in mammographic mass detection: single-image segmentation and bilateral image subtraction. In the first case, several techniques that incorporate knowledge about the lesions have been employed [6]. The second approach uses bilateral subtraction of corresponding left-right matched image pairs and is based on the symmetry generally found between both images, asymmetries indicating possible masses [7, 8]. We present a computerized method to automatically detect masses in digital mammograms by employing bilateral subtraction. Size and eccentricity tests were used to eliminate false-positives. Detection performance was evaluated using free-response receiver operating characteristic (FROC) analysis [9, 10].
II. MATERIALS AND METHODS
A. Acquisition of digital mammograms
76 pairs of mammograms with a biopsy-proven mass were digitized with a Konica laser film scanner (Konica Corp, Tokyo, Japan) at a resolution of 2000 x 2600 pixels (87.5 µm/pixel) with 10 bits precision. After digitization, each digital mammogram was subsampled by a factor of 5 to obtain an image of 400 x 520 pixels. A DEC VAX 4000 computer (Digital Equipment Corp, Maynard, MA) running the VMS operating system was used for all calculations. The computer programs were written in IDL (Research Systems, Inc., Boulder, CO).
B. Detection algorithm
An original algorithm was designed to segment the breast region and to detect the nipple [11]. To detect the breast border, both a thresholded and a smoothed version of the original image were calculated. The breast was divided into three regions (Figure 1), and a tracking algorithm was applied to the mammogram to detect the border. There is a dependency between each region (I, II and III) and the tracking process: in region I the algorithm searches for the breast border from left to right; in region II the algorithm searches for the border from top to bottom; and finally, in region III, the algorithm searches for the border from right to left. To detect the nipple we have used a method that combines the maximum height of the breast and the maximum of the gradient in the direction of a straight line connecting each point of the border with a point inside the breast.
Figure 1.- Breast divided into three regions.
Once the breast border was detected, the images were corrected to avoid differences in brightness between left and right mammograms due to the recording procedure. Given the breast image f(x,y), the corrected breast image is n(x,y) = (f(x,y) - μ)/σ, where μ is the mean gray level and σ the standard deviation of the grey levels belonging to the breast. To align right and left breast
images, the left image was both displaced and rotated, the latter to compensate for possible misalignments due to mammogram acquisition. The correlation coefficient between the right image and the rotated left image was calculated for angles ranging between -5 and 5 degrees. The maximum value of the correlation coefficient corresponds to the best alignment. After the corrected images were aligned, bilateral subtraction and linear stretching to a full contrast of 1024 gray-levels were performed, and a threshold was applied to obtain a binary image with the information of suspicious areas. The asymmetries were extracted by region growing. Size and eccentricity tests were applied to eliminate false-positives.
C. Evaluation of Performance
The effect of the subtlety of the masses on the performance was evaluated. An expert radiologist was asked to record one of five levels of subtlety of the mammographic appearance of these masses: level 1, obvious mass; level 2, relatively obvious mass; level 3, subtle mass; level 4, very subtle mass; and level 5, extremely subtle mass. Detection performance was also evaluated using free-response receiver operating characteristic (FROC) analysis [9, 10]. The receiver operating characteristic (ROC) curve shows the trade-off between true-positive and false-positive fractions as the observer varies the decision threshold. The area under the ROC curve, Az, is usually used as an objective measure of scheme performance to compare CAD schemes. The correct diagnosis of a mass requires its correct localization, and multiple observer responses per image must be allowed. ROC analysis cannot take location information into account and does not allow multiple responses per image, so FROC analysis was used. FROC analysis permits an arbitrary number of abnormalities per image, and the observer can indicate both the rating levels and the locations of the abnormalities.
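The brightness correction and the alignment-by-correlation step can be sketched in a few lines of NumPy. This is an illustrative rendering, not the authors' code: the function names, the boolean breast mask input, and the threshold value are our own choices, and the search here is over small integer displacements only (the paper additionally searches rotations between -5 and 5 degrees):

```python
import numpy as np

def normalize_breast(image, mask):
    """Brightness correction n = (f - mu) / sigma over the breast region."""
    vals = image[mask]
    out = np.zeros_like(image, dtype=float)
    out[mask] = (vals - vals.mean()) / vals.std()
    return out

def align_and_subtract(right, left, max_shift=3, thresh=2.0):
    """Search small integer displacements of the left image, keep the one with
    the highest correlation to the right image, then subtract and threshold
    to obtain a binary map of suspicious (asymmetric) areas."""
    best_c, best_img = -np.inf, left
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(left, dy, axis=0), dx, axis=1)
            c = np.corrcoef(right.ravel(), shifted.ravel())[0, 1]
            if c > best_c:
                best_c, best_img = c, shifted
    return (right - best_img) > thresh
```

In practice the binary map would then be passed to region growing and the size and eccentricity tests described above.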
The area under the AFROC (alternative FROC) curve, A1, is the natural index of performance for measuring FROC observer performance.
III. RESULTS
Figure 2.a shows the FROC curve. Each true-positive rate was determined after application of the size and eccentricity tests previously described. The mean number of false-positive detections per image was calculated by dividing the total number of false-positive detections by the total number of mammograms. The fitted AFROC curve yielded an A1 value of 0.527 (st dev = 0.059).
"
i
I
!
I
~o Z
Z 0,8
8
Z
;9
~
0,6
60
~ (st d e v = 0 . 0 5 9 )
A1=0.527
0,4
~I 0,2
~
40
~
20
.
T"
..
I 0
, 0
MEAN
,
,
i 2
NUMBER
,
i OF
,
i 4
,
FALSE
,
,
i 6
,
,
POSITIVES
i
I 8
i
PER
i
OBVIOUS
i
..IF"
.
I R. OBVIOUS
-,.
I SUBTLE
"V"
I
6
~
2
~
0
~
:
,,-
I
VERY SUBTLE EXT.SUBTLE
10 IMAGE
SUBTLETY
Figure 2.- (a) FROC curve. (b) True-positive fractions and mean numbers of false-positive detections per image as a function of mass subtlety (obvious, relatively obvious, subtle, very subtle, extremely subtle).
The effect of the subtlety of the masses on the performance of the method was also analyzed (Figure 2.b). The true-positive fractions tend to be lower in the more subtle categories. An 80% true-positive rate at an average of 2.4 false-positive responses per image was obtained on the whole data set. Examples of detected masses are shown in Figure 3.
Figure 3.- Examples of detected masses (arrows): (a) an obvious mass, and (b) a subtle mass with three false-positives.
IV. DISCUSSION
Although the true-positive rate needs to be improved, we have obtained a low average number of false-positives per image, indicating that our method may continue to improve to a point where clinical implementation becomes feasible. Masses were missed for various reasons: merging with normal tissues, inaccurate alignment, or inaccurate mammogram acquisition. A nonlinear subtraction method, which involves linking multiple subtracted images obtained from different threshold values, together with border and size tests to eliminate false-positive detections, was used by Yin et al. [8]. Lau et al. determined asymmetry between breasts using a combination of several asymmetry measures, and suspicious areas were selected by thresholding (percentile method) and an area test (two-stage thresholding) [7]. We have characterized the asymmetries by using a thresholding (after the brightness correction and subtraction of mammograms) and eccentricity and size tests to eliminate false-positives, instead of the more complex methods of linking multiple subtracted images (used by Yin et al. [8]) or asymmetry measurements (used by Lau et al. [7]). Our results suggest that a method to detect masses on the basis of bilateral subtraction could be potentially useful to aid radiologists in mammographic screening.
REFERENCES
[1] C. C. Boring, T. S. Squires, T. Tong and S. Montgomery, "Cancer statistics, 1994," Ca-A Cancer Journal for Clinicians 44, 7-26 (1994). [2] S. A. Feig, "Decreased breast cancer mortality through mammographic screening: Results of clinical trials," Radiology 167, 659-665 (1988). [3] E. L. Thurfjell, K. A. Lernevall and A. A. S. Taube, "Benefit of independent double reading in a population-based mammography screening program," Radiology 191, 241-244 (1994). [4] R. E. Bird, T. W. Wallace and B. C. Yankaskas, "Analysis of cancers missed at screening mammography," Radiology 184, 613-617 (1992). [5] P. G. Tahoces, J. Correa, M. Souto, L. Gómez and J. J. Vidal, "Computer-assisted diagnosis: The classification of mammographic breast parenchymal patterns," Phys. Med. Biol. 40, 103-117 (1995). [6] W. P. Kegelmeyer, J. M. Pruneda, P. D. Bourland, A. Hillis, M. W. Riggs and M. L. Nipper, "Computer-aided mammographic screening for spiculated lesions," Radiology 191, 331-337 (1994). [7] T. K. Lau and W. Bischof, "Automated detection of breast tumors using the asymmetry approach," Computers and Biomedical Research 24, 273-295 (1991). [8] F. F. Yin, K. Doi, M. L. Giger, C. J. Vyborny and R. A. Schmidt, "Comparison of bilateral-subtraction and single-image processing techniques in the computerized detection of mammographic masses," Invest. Radiol. 28, 473-481 (1993). [9] D. P. Chakraborty and L. H. L. Winter, "Free-response methodology: Alternate analysis and a new observer-performance experiment," Radiology 174, 873-881 (1990). [10] D. P. Chakraborty, "Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data," Med. Phys. 16, 561-568 (1989). [11] A. J. Méndez, P. G. Tahoces, M. J. Lado, M. Souto, J. L. Correa and J. J. Vidal, "Automatic detection of breast border and nipple in digital mammograms," Computer Methods and Programs in Biomedicine 49, 253-262 (1996).
Model order determination of ECG signals using rational function approximations

Joseph S. Paul*, V. Jagadeesh Kumar*, M. R. S. Reddy+
Indian Institute of Technology Madras, INDIA

Abstract
In this paper, a new pole-zero order determination procedure is proposed for ECG signals. The method is based upon the singular value decomposition of the reduced rank approximation of the data matrix obtained by linear filtering of the DCT coefficients of the ECG signal. The energy characteristics of the reduced rank components are found to be highly order dependent. In this paper, we have made use of this property to identify the order of the underlying model. The NMSE of the waveforms reconstructed with the orders estimated using the proposed method is found to be lower than that obtained using statistical methods such as the AIC.
1 Introduction
Modelling a general signal with a fixed order rational function is possible only when the original signal happens to be representable as a linear combination of p exponentials. This representation in terms of damped sinusoids is valid only for those signals which possess the minimum phase property. It is observed that, for ECG signals, a large proportion of the energy is concentrated in the QRS and T waves, which are not present at the beginning of the beat. This feature renders the ECG waveform nonminimum phase. The algorithms that can model such nonminimum phase signals are very few [1]. However, the discrete cosine transform (DCT) coefficients of an ECG beat consist of damped oscillations and so can be conveniently represented as a combination of damped sinusoids [2, 3]. In addition, the DCT coefficients exhibit spectral features with distinct peaks and valleys indicative of the system poles and zeroes, in contrast to the spectrum of the direct time domain ECG beat, which appears to be a monotonically decreasing featureless function [4]. In the present work, we have considered N DCT coefficients of a single ECG beat to be the impulse response of an LTI system. It is observed that the DCT coefficients of an ECG beat decay to zero for n > N/2. Further, since the rational function approximations for ECG signals are built on the assumption that the component waves are composed of a group of second order biphasic fractions [2], with the numerator and denominator polynomials of the same order, we have considered the case in which the impulse response data is characterised by a set of 2p coefficients. In this paper, we have considered a linear algebraic approach based on the singular value decomposition (SVD) of a reduced rank approximation of the data matrix to arrive at an optimal value of the parameter p characterising the impulse response function. In section 2 we develop the theory, and the simulation results that illustrate the performance of the new method are discussed in section 3 of the paper.

* The authors are with the Dept. of Electrical Engg. + The author is with the Division of Biomedical Engg. (corresponding author).
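For reference, the DCT coefficients that play the role of an impulse response here can be computed directly from a sampled beat. The following naive O(N^2) NumPy sketch of an unscaled type-II DCT is our own helper (the paper does not specify its normalisation convention):

```python
import numpy as np

def dct2e(y):
    """Unscaled type-II DCT: C(k) = sum_n y(n) cos(pi k (2n+1) / (2N)),
    producing the damped oscillatory sequence the rational model is fit to."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    n = np.arange(N)
    return np.array([np.sum(y * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])
```

An FFT-based DCT would of course be used for long records; the direct sum keeps the definition visible.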
2 Theory
Assuming the system to be of p'th order, the input-output equation is given by

y(n) = -\sum_{i=1}^{p} a_i\, y(n-i) + \sum_{j=0}^{p} b_j\, x(n-j), \quad 0 \le n \le N-1 \qquad (1)

where y(n) denotes the output and x(n) represents the input to the system. The equivalent transfer function representation is given by

H(z) = \frac{b_0 z^p + \sum_{i=1}^{p} b_i z^{p-i}}{z^p + \sum_{i=1}^{p} a_i z^{p-i}} \qquad (2)
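With a unit impulse as input, the difference equation yields the model's impulse response directly. A minimal NumPy sketch (the function name and test coefficients are ours), using the sign convention in which the denominator of the transfer function is z^p + a_1 z^(p-1) + ... + a_p:

```python
import numpy as np

def impulse_response(a, b, N):
    """y(n) = -sum_{i=1..p} a[i-1] y(n-i) + b[n] for n <= p, with x the unit
    impulse; a = [a_1..a_p], b = [b_0..b_p]."""
    p = len(a)
    y = np.zeros(N)
    for n in range(N):
        acc = b[n] if n <= p else 0.0   # sum_j b_j x(n-j), x = unit impulse
        for i in range(1, min(p, n) + 1):
            acc -= a[i - 1] * y[n - i]
        y[n] = acc
    return y
```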
With x(n) as the unit impulse and y(n) representing the N point DCT coefficients of an ECG beat, the system difference equationcan be recast into a matrix form as
Yp where, y-p
--[Yl,Y2,
...yp]t
Yp
(3)
bp
ON-p-lXp
and yp = [Yp+l,Yp+2, ...Y(N-I)] t
for
1 < j _< N -
yj,
1. The matrices
?p and Yp are given by
?p
=
The vectors
Y0
0
...
0
Yl
y0
...
0
Yp-1
Yp-2
...
YO
.
.
.
.
and
pXp
Yr =
Yp
Yp-1
...
Yl
Yp+l
Yp
...
Y2
YN-2
YN-3
...
.
.
.
.
YN-p-1
N - - p - 1Xp
ap and bp represent the coefficients of the numerator and the denominator polynomials
of the rational function representation H(z) with ap = [ a l , a 2 , . . . , % ]
~
and bp = [ b l , b 2 , . . . , b p ]
t.
From
equation (3), we have Ypap = - y p and bp - y~p + lYpap. Since N > p + 1, equation (3) represents an overdetermined system of linear equations and hence in general has an infinite number of solutions.The usual approach is to find a so that the least square error IYpap + ypl 2 = (Ypap + yp)t (Ypap + yp) is minimised. This minimisation yields ap = _ ( y p y p ) - i (y;yp)
(4)
Thus to build up an order recursion, we rewrite the equation (3) as Yp+~ap.k = - y p + ~
(5)
However, since the DCT coefficients of an ECG beat are zero for n > N/2, the above equation can be conveniently represented in terms of Y_p as

-y_{p+k} = Y_{p,k}\, a_{p+k} = (y_{p,k-1},\, y_{p,k-2},\, \ldots,\, y_{p,0},\, Y_p)\, a_{p+k} \qquad (6)

We have used the notation y_{j,k} to denote a vector whose first entry is y_{j+k+1} and which has N - j - 1 elements. Hence, for a given p and k, the data matrix Y_{p,k} = (y_{p,k-1}, y_{p,k-2}, \ldots, y_{p,0}, Y_p) is an (N-p-1) \times (k+p) matrix. This data matrix has full column rank and hence k+p nonzero singular values. The characteristics of this data matrix are investigated with the help of an example in which we have constructed the matrix using the DCT coefficients of the typical ECG waveform illustrated in Fig. 1. Fig. 2 shows the profile of the singular values of the matrix Y_{p,k} for three different combinations of p and k. The curves show an identical variation for all combinations of p and k, and are practically unaffected by the values of either p or k. The data matrix Y_{p,k} therefore does not yield any useful information from which a criterion for order determination could be devised in the framework of SVD. In order to overcome this problem, we have constructed a reduced rank approximation of the data matrix by computing a sequence of filtered estimates for the vectors y_{p,k} as

\tilde{y}_{p+1} = Y_p\, Y_p^{\dagger}\, y_{p+1}, \qquad \tilde{y}_{p+k} = (\tilde{y}_{p+k-1}, \ldots, \tilde{y}_{p+1}, Y_p)\,(\tilde{y}_{p+k-1}, \ldots, \tilde{y}_{p+1}, Y_p)^{\dagger}\, y_{p+k} \qquad (7)

so that the reduced rank matrix \tilde{Y}_{p,k} = (\tilde{y}_{p+k}, \tilde{y}_{p+k-1}, \ldots, \tilde{y}_{p+1}, Y_p) is of rank p only. The SVDs of the data matrices yield

Y_{p,k} = U_{p,k}\, \Sigma_{p,k}\, V_{p,k}^t \qquad (8)

and

\tilde{Y}_{p,k} = \tilde{U}_{p,k}\, \tilde{\Sigma}_{p,k}\, \tilde{V}_{p,k}^t \qquad (9)

where \Sigma_{p,k} = \mathrm{diag}(\sigma_{1,k}, \sigma_{2,k}, \ldots, \sigma_{p+k,k}) and \tilde{\Sigma}_{p,k} = \mathrm{diag}(\tilde{\sigma}_{1,k}, \tilde{\sigma}_{2,k}, \ldots, \tilde{\sigma}_{p,k}, 0, \ldots, 0). The last k entries of \tilde{\Sigma}_{p,k} are zeroes. Fig. 3 shows the profile of the singular values of the matrix \tilde{Y}_{p,k} for the same combinations of p and k as in Fig. 2. Unlike in Fig. 2, the curves in Fig. 3 show a marked dependence on the values of p and k. This indicates that the reduced rank matrix serves as the more appropriate tool for solving the order identification problem in the framework of SVD. In order to extract the order information from the singular values of the reduced rank matrix, we define the matrices \Sigma_p and \tilde{\Sigma}_p, the j'th columns of which are given by \sigma_{p,j} and \tilde{\sigma}_{p,j} respectively, where \sigma_{p,j} = (\sigma_{1,j}, \sigma_{2,j}, \ldots, \sigma_{p+k,j})^t and \tilde{\sigma}_{p,j} = (\tilde{\sigma}_{1,j}, \tilde{\sigma}_{2,j}, \ldots, \tilde{\sigma}_{p+k,j})^t. The average energy of the i'th component in the j'th column of \Sigma_p and \tilde{\Sigma}_p is given by

E_p(i,j) = \frac{\sigma_{i,j}^2}{\sum_{l=1}^{p+k} \sigma_{l,j}^2}, \quad \text{for } 1 \le i \le p+k \qquad (10)

and

\tilde{E}_p(i,j) = \frac{\tilde{\sigma}_{i,j}^2}{\sum_{l=1}^{p+k} \tilde{\sigma}_{l,j}^2}, \quad \text{for } 1 \le i \le p \qquad (11)

For a given p and j, it is observed that \tilde{E}_p(i,j) shows identical patterns for all i, 1 \le i \le p. At the true model order, the variation of \tilde{E}_p(i,j) is a maximum. To identify the model order, it is therefore enough to observe \tilde{E}_p(i,j) for any i in the range 1 \le i \le p. Fig. 4 shows the variation of \tilde{E}_p for p = 6. The true order of the model is observed to be at j = 10 with a scale factor of 2, for which the functional \tilde{E}_p has a maximum. In contrast to the reduced rank matrix, the average energy function of the full rank data matrix for p = 6, as shown in Fig. 5, has a featureless variation without any maxima.
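The key property behind the reduced-rank construction is that projecting each appended column onto the column space of the matrix built so far keeps the rank at p, so the augmented matrix acquires exactly k zero singular values. The following NumPy fragment is our own simplified rendering of that idea, not the paper's code:

```python
import numpy as np

def reduced_rank_augment(Yp, extra_cols):
    """Append filtered (projected) columns in the spirit of eq. (7): each new
    column is replaced by its orthogonal projection onto the column space of
    the matrix built so far, so the rank never grows beyond rank(Yp)."""
    M = Yp
    for y in extra_cols:
        y_filt = M @ np.linalg.pinv(M) @ y   # projection onto span(M)
        M = np.column_stack([y_filt, M])
    return M
```

An SVD of the result then exposes k (numerically) zero singular values, which is what the average-energy criterion exploits.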
Fig. 1(a): A typical ECG beat. Fig. 1(b): DCT coefficients for the beat in Fig. 1(a).
Fig. 2: Profile of the singular values for the full rank data matrix.
Fig. 3: Profile of the singular values for the reduced rank data matrix.
Fig. 4: Variation of the average energy in the principal component for the reduced rank data matrix.
Fig. 5: Variation of the average energy in the principal component for the full rank data matrix.
3 Results and Observations
We have carried out the order determination of a set of fifty different ECG waveforms using the proposed method and compared the results with the AIC. To illustrate the salient features of the proposed method, we have considered three representative ECG beats shown in Figs. 6(a), (b) and (c). The average energy curves obtained from the reduced rank matrix corresponding to the ECG's in Fig. 6 are depicted in Figs. 7(a), (b) and (c) respectively. For comparison with the results obtained using our method, the AIC measures for the three ECG beats of Fig. 6 are shown in Figs. 8(a), (b) and (c). The table below shows the NMSE's for the three waveforms evaluated with the model orders estimated using the algebraic and the statistical (AIC) methods. It is seen that the statistical techniques predict a lower model order than that indicated by the algebraic technique. Further, it is also observed that the algebraic method yields a lower NMSE.

Table 1: Comparison of the results obtained using the algebraic and statistical methods

Figure      Order        Order          NMSE         NMSE
reference   (algebraic)  (statistical)  (algebraic)  (statistical)
(a)         12           11             0.0804       0.0848
(b)         9            8              0.0849       0.1127
(c)         14           6              0.1010       0.4239
Fig. 6(a), (b), (c): Typical ECG beats.
Fig. 7(a), (b), (c): Variation of the average energy in the principal component for the reduced rank data matrix, for the beats of Fig. 6.
Fig. 8(a), (b), (c): AIC measures plotted against the model order, for the beats of Fig. 6.

4 Conclusion
In this paper, we have presented an algebraic approach for determining the model orders of ECG beats. The proposed method rests on the assumption that the DCT coefficients computed from the samples of an ECG beat can be approximated using rational functions having an equal number of coefficients in the denominator and the numerator polynomials. We have shown that the data matrices constructed using the DCT coefficients cannot be directly employed to establish a criterion for order determination in the framework of SVD. A reduced rank approximation of the data matrix obtained by linear filtering of the DCT coefficients, however, is found to possess a well ordered structure for the singular values. Each of these singular values is attributed to a rank one signal, the average energy variation of which is found to exhibit identical patterns for a given size of the data matrix. Further, it is shown that the multiplicity of the zero singular values associated with the reduced rank data matrix represents the order of the underlying model. The true order is then identified as the one which maximises the average energy functional.
References
[1] I. S. N. Murthy, M. R. Rangaraj, J. K. Udupa and A. K. Goyal, "Homomorphic analysis and modelling of ECG signals," IEEE Trans. Biomed. Eng., vol. BME-26, pp. 330-344, 1979.
[2] I. S. N. Murthy and G. S. S. Durga Prasad, "Analysis of ECG from pole-zero models," IEEE Trans. Biomed. Eng., vol. 39, pp. 740-751, July 1992.
[3] M. Ramasubba Reddy, "Digital models for analysis and synthesis of ECG," Ph.D. Thesis, IISc, Bangalore, August 1990.
[4] N. V. Thakor, J. G. Webster and W. J. Tompkins, "Estimation of QRS complex power spectra for design of a QRS filter," IEEE Trans. Biomed. Eng., vol. BME-31, pp. 702-706, 1984.
COMPUTATION OF THE EJECTION RATIO OF THE VENTRICLE FROM ECHOCARDIOGRAPHIC IMAGE SEQUENCES Andreas Teuner Fraunhofer Institute of Microelectronic Circuits and Systems, Finkenstr. 61, 47057 Duisburg, Germany Olaf Pichler Department of Electrical Engineering, University of Duisburg, 47057 Duisburg, Germany Bedrich J. Hosticka Fraunhofer Institute of Microelectronic Circuits and Systems, Finkenstr. 61, 47057 Duisburg, Germany
Abstract
Our contribution deals with the automatic computation of the ratio between the ventricle volumes of the human heart at the end-systolic and the end-diastolic phases, based on echocardiographic image sequences. The proposed method includes the use of an application-specific median filter for noise reduction in echocardiographic images, the detection of the end-systolic and end-diastolic phases based on the analysis of motion vector fields, and the segmentation of the ventricle using a region growing method. The presented simulation results have been compared with manually obtained measurements of the ejection ratio.
1. Introduction

The computation of the ejection ratio of the human heart from an echocardiographic image sequence is a frequently employed procedure in medical diagnosis that enables a preliminary statement about a valvular defect. Currently, this task is usually performed by a diagnostician who manually examines the recorded frames that contain these phases in an echocardiographic image sequence of the ventricle. The criterion used to identify the frames corresponding to the end-systolic and the end-diastolic phases is the position of the mitral valve. Having identified both relevant frames, the contours of the ventricle are traced and its main axes are determined to compute the volumes and, from them, the ejection ratio. For that purpose, the ventricle is assumed to be an ellipsoidal body. In contrast to this established procedure, which is subjective and time consuming, we propose a new technique based on several different image processing algorithms, including image motion analysis. The primary steps of this novel method are summarized in Fig. 1. They embrace the reduction of noise in the ultrasound images using an application-specific 1-D median filter and nonlinear cluster merge filtering, the identification of the end-systolic and end-diastolic phases of the mitral valve based on the analysis of motion vector fields obtained from a template matching algorithm, and the segmentation of the ventricle using edge detection and region growing. Finally, the ratio between the volumes of the ventricle at these two detected states is computed. The performance of our approach will be demonstrated by execution on a real echocardiographic image sequence and will be compared with results manually traced by a diagnostician. In the following sections each step of the proposed method will be explained and experimental results will be presented.
Fig. 1. Basic steps to determine the volumes of the end-diastolic and end-systolic phases.
2. Analysis of Echocardiographic Image Sequences

A. Radial Median Filtering

Due to the recording principle of echocardiographic images [1], these are usually very noisy and appear smeared. It was shown in [2] that 2-D median filters represent an excellent tool to eliminate the background noise, which is caused mainly by beam deflections at frontier bounds of the pericardium. Because this noise is correlated in the tangential orientation, we have applied a 1-D median filter of size 2N+1 to each frame, instead of a 2-D filter, to sharpen the contours in the images. As shown in Fig. 2a, the direction of the filter is locally adapted to the position of the pixel to be analyzed, in a radial orientation. Fig. 2d illustrates that excellent noise suppression in the radial direction has been obtained without the filter result being affected by the correlated noise signal. The background noise has been reduced successfully without destroying the ventricular contours. The order of the median filter applied to the images has been chosen as N = 2. Our simulations have shown that increasing the filter order does not enhance the quality of the images for postprocessing.
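The orientation-adaptive filtering can be sketched as follows. This is a deliberately simple, unoptimised rendering of our own: we assume the transducer apex (the origin of the radial rays) is known, and use nearest-neighbour sampling along each ray:

```python
import numpy as np

def radial_median(img, origin, N=2):
    """1-D median of size 2N+1 taken along the ray from `origin` (the
    transducer apex) through each pixel, i.e. in the radial direction."""
    H, W = img.shape
    out = img.astype(float).copy()
    oy, ox = origin
    for y in range(H):
        for x in range(W):
            dy, dx = y - oy, x - ox
            norm = np.hypot(dy, dx)
            if norm == 0:
                continue  # undefined direction at the apex itself
            dy, dx = dy / norm, dx / norm
            samples = []
            for t in range(-N, N + 1):
                yy = int(round(y + t * dy))
                xx = int(round(x + t * dx))
                if 0 <= yy < H and 0 <= xx < W:
                    samples.append(img[yy, xx])
            out[y, x] = np.median(samples)
    return out
```

A production version would vectorise the sampling, but the per-pixel loop makes the locally adapted filter direction explicit.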
B. Application of a Cluster Merge Filter

To emphasize the ventricular contours, especially the contours of the mitral valve, we further applied a cluster merging filter to the images, as described in [3]. Compared with other 2-D non-linear filter methods, this algorithm, which was introduced to smooth noise in x-ray images, shows a better preservation of thin lines. Fig. 2e illustrates the filter result applied to Fig. 2d. We have chosen a window size of 7 × 7 pixels. The preset cluster number was PCN = 3.
Fig. 2. a) Orientation adaptive 1-D median filter with N = 2. b) Frame indicating the end-diastolic phase of the original echocardiographic sequence in inverted representation. c) Detail of the frame before filtering. d) Detail of the frame after median filtering. e) Detail of the frame after cluster merge filtering of d).
C. Computation of the Motion Vector Field

The computation of the motion vector field is the most critical and time consuming step in our approach. From the preprocessed sequence, motion vector fields have been computed between two consecutive binarized frames using the template matching algorithm proposed by Kirchner [4]. Compared with alternative methods for the computation of motion vector fields, e.g. standard algorithms based on the optical flow [5], template matching also yields good results in cases where an object moves by more than one pixel per sampling interval. Due to the abrupt motion of the mitral valve with respect to the sampling rate of 25 Hz, object tracking with subpixel accuracy cannot be guaranteed. Thus, template matching offers a robust determination of motion vectors. Moreover, given the known sampling rate and the roughly known velocity of the mitral valve, the sizes of both the operator window and the search window necessary for the template matching algorithm can be well estimated.
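The displacement of one block can be estimated with an exhaustive sum-of-absolute-differences (SAD) search over the search window. This brute-force sketch is only illustrative of template matching in general; Kirchner's algorithm [4], per its title, uses a column- and row-oriented optimisation rather than a full search:

```python
import numpy as np

def match_block(prev, curr, top, left, bsize, search):
    """Motion vector (dy, dx) of the block at (top, left) in `prev`,
    found by minimising the SAD over a (2*search+1)^2 window in `curr`."""
    block = prev[top:top + bsize, left:left + bsize]
    best, vec = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if (y0 < 0 or x0 < 0 or
                    y0 + bsize > curr.shape[0] or x0 + bsize > curr.shape[1]):
                continue  # candidate block falls outside the frame
            sad = np.abs(curr[y0:y0 + bsize, x0:x0 + bsize] - block).sum()
            if sad < best:
                best, vec = sad, (dy, dx)
    return vec
```

The magnitudes and orientations of these vectors, accumulated over the frame, give the 1-D feature series used in the next subsection.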
D. Detection of the End-systolic and End-diastolic Frames

The determination of the frames which represent the end-diastolic and end-systolic phases has been carried out by analyzing the magnitudes of the motion vectors obtained from the template matching algorithm. Both the magnitudes of the motion vectors with orientation between 0° and 180° and those with orientation between 180° and 360° have been accumulated to obtain 1-D data series which serve as features for the ensuing identification process. Fig. 3 displays an example of a feature vector that can be used for the detection of the end-systolic and end-diastolic frames. Due to the periodicity of the ventricle motion, feature vectors to identify both phases can be extracted from these time series. In our investigations, we defined two feature vectors containing 4 components and identified the phases which involve the increase, resp. the decrease, of the sum of the motion vector magnitudes with orientation 180°-360°. However, it should be noted that this pattern recognition task is sensitive to parameter variations and must be further enhanced by carrying out extensive test series and applying modern learning algorithms, e.g. neural networks.
Fig. 3. Plot of the sum of magnitudes of the motion vectors obtained from the template matching algorithm with orientation between 180° and 360°. The frame pairs 1-2 and 7-8 indicate the end-systolic phases, the frame pairs 5-6 and 11-12 the end-diastolic phases.
E. Segmentation of the Cardiac Valve

The last step of our approach is the segmentation of the ventricle and the determination of its main axes. The segmentation has been carried out using an edge detection algorithm based on exponential filtering, as implemented in [6], and simple region growing of the area indicating the ventricle. Since the position of the ventricle in an echocardiographic image is roughly known, an a priori defined centre pixel serves as the seed of a binary region growing algorithm that computes the area covered by the ventricle. Now, the volume V and the ejection ratio r of the ventricle can be computed from the longitudinal axis l and the area A, which is proportional to the number of labeled pixels, as follows:

V = \frac{8 A^2}{3 \pi l}, \qquad r = \frac{V_{end\text{-}diast.} - V_{end\text{-}syst.}}{V_{end\text{-}diast.}}
For simplicity, the longitudinal axis has been defined by the largest vertical line of the segmented area.
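Both formulas translate directly into code; a small sketch (function names are ours) of the area-length ellipsoid volume and the ejection ratio:

```python
import math

def ventricle_volume(area, length):
    """Single-plane area-length (ellipsoid) volume: V = 8 A^2 / (3 pi l)."""
    return 8.0 * area ** 2 / (3.0 * math.pi * length)

def ejection_ratio(v_diast, v_syst):
    """r = (V_end-diastolic - V_end-systolic) / V_end-diastolic."""
    return (v_diast - v_syst) / v_diast
```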
3. Simulation Results
The echocardiographic image sequence used in our simulations, as shown in Fig. 2e, was recorded using a Hewlett-Packard echograph HP SONOS 1000 and contains 14 frames from 2 cycles. The images have been digitized to 8 bit accuracy and have a size of 636 × 477 pixels. To reduce the computational effort, the orientation markers displayed in the original frames (see Figs. 2a and b) were used to mask out the background of the ultrasound images. Fig. 4 shows the identified frames after ventricle segmentation. Both images are identical with the frames which were manually identified by a diagnostician. The volumes of the ventricle have been computed as 48.7 cm³ for the end-systolic phase and as 101.3 cm³ for the end-diastolic phase, which results in an ejection ratio of 51%. The manually determined volumes were 63.3 cm³ and 138 cm³, respectively, which yields an ejection ratio of 54%. The systematic deviation between the automatically and the manually obtained volumes is caused by the simplification which assumes that the longitudinal axis of the ellipsoidal body is oriented in the vertical direction. Nevertheless, the automatically computed ejection ratio compares closely with the manually obtained ejection ratio.
Fig. 4. Segmented areas of the identified a) end-systolic and b) end-diastolic frames.

4. Conclusion
In our contribution we have presented a system which is capable of detecting the end-systolic and end-diastolic frames of an echocardiographic image sequence and of computing the ejection ratio of the ventricle from these two phases. It has been shown that the inclusion of motion analysis algorithms to support the analysis of echocardiographic image sequences is a promising idea and can be highly useful for computer-aided medical diagnosis.
References
[1] Z.-H. Cho, J. J. Jones, and M. Singh: Foundations of Medical Imaging. New York: John Wiley & Sons, 1993.
[2] C. Lamberti and F. Sgallari: "A workstation-based system for 2-D echocardiography visualization and image processing," IEEE Trans. Biomedical Engineering, Vol. 37, No. 8, Aug. 1990, pp. 796-801.
[3] B. J. Lui: "A cluster merging filter for noise smoothing," IEEE Engineering in Medicine and Biology, Vol. 13, No. 2, April/May 1994, pp. 259-262.
[4] H. Kirchner: "Blockmatching with column and row oriented optimization," Proc. 3rd Int. Workshop on Time Varying Image Processing and Moving Object Recognition, Florence (Italy), May 1989, pp. 29-31.
[5] B. Jähne: Digital Image Processing. Berlin: Springer-Verlag, 1993.
[6] J. R. Rasure and K. Konstantinides: "The Khoros software development environment for image and signal processing," IEEE Trans. Image Processing, Vol. 3, No. 3, May 1994, pp. 243-252.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Contour Detection of the Left Ventricle on Echocardiographic Images
SIMONE GARÇON - SECOM/UFPA (BRAZIL), FLAVIO BORTOLOZZI - DAINF/CEFETPR (BRAZIL) AND JACQUES FACON - CPGEI/CEFETPR (BRAZIL)
Abstract -- Cardiological diagnosis has been improved in precision by several methods of automatic edge detection. Left ventricle functional performance must be estimated through quantitative analyses, which provide essential parameters in the echocardiographic diagnosis. This procedure typically requires endocardium detection. Thus, this work proposes a simplified automatic left ventricle extraction methodology for echocardiographic images based on mathematical morphology. In order to validate this methodology, the systolic and diastolic areas obtained are compared with the ranges evaluated by experts.

I. INTRODUCTION
Since 1954, echocardiography has contributed a large amount of relevant information to the diagnosis of cardiac performance, such as volume, ejection fraction, measurements of myocardial function, etc. Researchers have been attempting to find solutions as effective as possible in the digital image processing field, searching for better reproducibility of the wall contour, so that a considerable number of approaches have been investigated lately. This research is based on the need for better accuracy of data quantification in parameter analyses for cardiac diagnosis. The automatic detection of the cardiac walls certainly makes a relevant contribution in terms of speed and objectivity, as one can calculate the systolic and diastolic areas from different cutting planes of the endocardium contour and, consequently, several parameters essential to the echocardiographic diagnosis. Bidimensional echocardiography is a very common technique within clinical cardiology, being a real-time, non-invasive method, and therefore painless for the patient. Nevertheless, due to image artifacts, among other factors, the poor quality of the generated images related to the ultrasonic beam leads to a subjective evaluation, depending on the expert's experience. In this context, this work proposes a methodology that simplifies the methods of [1], [2], [4], [5], [10], [11], [16], which generally require a post-processing step after the application of an edge detection operator, whose algorithms demand considerable computational effort. Segmentation of echocardiographic images by the application of morphological operators seems to be the most suitable approach, as such operators act directly on the geometric structure of the objects in the image, besides keeping their spatial integrity.
II. METHODOLOGY
As a solution, the automatic detection of the left ventricle serves these basic purposes: speeding up and simplifying the reproduction of outlines on echocardiographic images, and using these outlines to derive a range of functional and structural cardiac performance parameters. The manual process in use certainly does not define an accurate standard for these parameters. Thus, it would be desirable to make the process systematic, through an automatic methodology which would reproduce the measurements of area, volume, ejection fraction, etc. Accordingly, this paper aims to contribute accuracy and efficiency to the analysis requirements of the echocardiographic diagnosis parameters. The main goal is to define a simplified method for left ventricle endocardium detection whose main points are:
• a completely closed and isolated contour, one pixel in thickness;
• elimination of inner artefacts of the left ventricle cavity;
• a more accurate visual representation in terms of the anatomic data.
In order to achieve this goal, the Khoros software [8] is used as a modelling and prototyping research resource, as well as the morphological machine MMach [3]. Mathematical morphology [15] achieves this goal through its efficiency in the treatment of artefacts and contour gaps, and even in computational effort.
A. Image Acquisition
Depending on the conditions under which the images were acquired, the acquisition process can seriously impair all the rest of the processing, as mentioned in [14]. This is particularly true in the case of echocardiographic images, where several factors directly affect image quality: non-uniform reflection of the ultrasonic beam, reflections from other organs or muscles, the predominance of speckle noise, and distortions related to the ultrasound system. Besides these factors, which lead to the noise distinctive of this kind of image, there are those related to the patient's condition (obesity, cartilage calcification, etc.), which make the image poorer and more confused. The noise introduced when transferring images from the echocardiographic system to the computational system also contributes to poor image quality. The ideal acquisition would be one in which the computer and the echocardiographic system were directly connected. Even so, it is necessary to pay attention to signal interference and to the echocardiographic system adjustments for good visualisation of the endocardium lateral walls, mainly on the 2-chamber, 4-chamber and transversal views, verifying contrast improvements in these regions. For this work, the image acquisition was performed with the following configuration:
• Echographic system SIM7000 CFM Challenge - Real Time Ultrasonic Image Flow Analysis System; the images generated by the internal digital converter are 512x512 with 256 grey levels;
• Targa Plus Truevision board;
• IBM-PC microcomputer.
The selected views were the most common ones used in the definition of diagnostic parameters (2-chamber, 4-chamber and transversal). The motions detected in the frames were at end-systole and end-diastole in a young patient of average, ideal weight. A freeze is performed on the visualised image to acquire the frame with the desired motion. When this operation is performed, the last 15 frames are saved and the desired ones can be selected.
B. Methodology Description
After input, the images were transferred to a SUN Sparc Station and then processed. Basically, the processing applied to these images comprises three steps: pre-processing, contour definition and artefact elimination.
B.1 Pre-processing
The first step attempts to eliminate noise through image smoothing, using a median filter with a 3x3 mask. Histogram equalisation is then applied to the smoothed image for more accurate contrast. The median/equalisation algorithm was selected instead of the top-hat technique used by [9] and [12] mainly because the results obtained were more satisfactory in terms of contrast and computational effort. Fig. 2 shows the resulting image.
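A minimal sketch of this pre-processing chain (3x3 median smoothing followed by histogram equalisation) in plain NumPy; the helper names are ours and stand in for the paper's Khoros/MMach implementation.

```python
import numpy as np

def median_filter_3x3(img):
    # 3x3 median smoothing via edge-padded sliding windows.
    padded = np.pad(img, 1, mode='edge')
    windows = [padded[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)

def equalize_histogram(img, levels=256):
    # Map grey levels through the normalised cumulative histogram.
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    return (cdf[img] * (levels - 1)).astype(np.uint8)

noisy = np.full((8, 8), 100, dtype=np.uint8)
noisy[4, 4] = 255                     # isolated speckle
smoothed = median_filter_3x3(noisy)   # speckle removed by the median
enhanced = equalize_histogram(smoothed)
```

The median filter suppresses isolated speckle without blurring edges as strongly as linear smoothing would, which is why it suits this kind of ultrasound noise.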
B.2 Contour Definition
After pre-processing, the image is thresholded on the basis of the grey level ranges shown in the histogram. The threshold can be obtained automatically or by manual adjustment, without necessarily abandoning the automatic nature of the approach. In general, for transversal images the proper threshold varies between 100 and 170, and for two- and four-chamber images it is fixed at 200. To eliminate gaps in the cavity contour, one would theoretically think of using a closing process, which joins objects, bridges gaps and smooths the contour from the outside. However, the shape produced by this process is poorer in detail than the original and, depending on the number of structuring element iterations applied, this can be greatly aggravated. In order to obtain the benefits of a closing operation without impairing contour accuracy, a "dismemberment" of the closing into two iterations is applied, using the elementary operator property (dilation and erosion) in such a way that two different types of structuring element are interleaved: cross and rhombus. The cross structuring element has a basic form and allows more flexibility; the rhombus structuring element, on the other hand, removes the angular effect caused by the cross. Thus, the image is transformed by two dilations (each with a different structuring element) and soon afterwards by two erosions (also with different structuring elements). The result can be observed in fig. 3. From this segmented image (fig. 3), the contour representation is then obtained by eroding it and subtracting the result of this erosion from it (fig. 4).
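The alternating closing and the erosion-based contour step can be sketched as follows. The 3x3 cross and the 5x5 rhombus sizes are assumptions (the paper does not give element sizes), and these pure-NumPy operators stand in for the MMach primitives.

```python
import numpy as np

CROSS = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)
# Hypothetical 5x5 diamond-shaped "rhombus" element.
RHOMBUS = np.array([[0, 0, 1, 0, 0],
                    [0, 1, 1, 1, 0],
                    [1, 1, 1, 1, 1],
                    [0, 1, 1, 1, 0],
                    [0, 0, 1, 0, 0]], dtype=bool)

def dilate(img, se):
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode='constant')
    out = np.zeros_like(img, dtype=bool)
    for i, j in zip(*np.nonzero(se)):          # OR over shifted copies
        out |= padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def erode(img, se):
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode='constant')
    out = np.ones_like(img, dtype=bool)
    for i, j in zip(*np.nonzero(se)):          # AND over shifted copies
        out &= padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def mixed_closing(img):
    # Two dilations with different elements, then two erosions,
    # approximating the paper's "dismembered" two-iteration closing.
    d = dilate(dilate(img, CROSS), RHOMBUS)
    return erode(erode(d, RHOMBUS), CROSS)

def contour(img):
    # One-pixel contour: the image minus its own erosion.
    return img & ~erode(img, CROSS)

cavity = np.zeros((20, 20), dtype=bool)
cavity[5:15, 5:15] = True
cavity[10, 10] = False        # a small gap inside the region
closed = mixed_closing(cavity)
edge = contour(closed)
```

The closing fills the one-pixel gap while leaving the region's outline intact, after which the erosion-subtraction yields the one-pixel-thick contour.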
B.3 Artifact Elimination
Also from the image of fig. 3, a location mask (fig. 5) is built for the removal of artifacts in the inner ventricular cavity (papillary muscle): objects are dilated, opened, eroded and dilated again using a square structuring element, so that all the elements in the black region of the mask are preserved and those in the white region are eliminated. These artifacts are then removed by adding the location mask to the inverted contour image ("bouchage de trous" [13]). In fig. 6 the artifacts of fig. 4 have disappeared and the left ventricle contour image is completely closed.

III. RESULTS
The stability of the methodology in reproducing the left ventricle contour can be observed through the convergence of the results shown in the processed images, which are considered satisfactory in terms of performance and reliability.
In order to compare the automatic method with the manual process, the left ventricle cavity area (region of interest) is calculated automatically in each of the images by an image analysis tool developed by [7]. Area figures are given in pixels and subsequently multiplied by a conversion factor to obtain the areas in square centimetres, according to the image resolution (300 dpi) and the conversion scale used by the ultrasound system. The similarity between the manual and automatic quantitative results is presented in Table I, where the inclusion of the automatically obtained figures in the ranges established by experts is clearly noticed. For these figures it is necessary to take into account, among other factors, the patient's age and sex.

TABLE I: LEFT VENTRICLE AREA MEASUREMENTS

VIEW/SECTION    AUTOMATIC AREA (pixels)    AUTOMATIC AREA (cm2)    MANUAL AREA (cm2)
cross/sist      6598                       7.49                    4 - 11.6
cross/diast     12586                      10.34                   9.5 - 22.3
apic4/sist      10247                      9.33                    7.2 - 20.1
apic4/diast     18462                      12.52                   10.5 - 31
apic2/sist      12266                      10.21                   7.5 - 22.5
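The pixel-to-area conversion described above amounts to multiplying the pixel count by the physical area of one pixel at 300 dpi and by the square of the ultrasound display scale; the scale value below is back-computed from the first table row, purely as an illustration (the paper does not state it).

```python
DPI = 300.0
CM_PER_INCH = 2.54
PIXEL_AREA_CM2 = (CM_PER_INCH / DPI) ** 2   # physical area of one 300 dpi pixel

def area_cm2(n_pixels, scale):
    # scale: assumed linear magnification of the ultrasound display.
    return n_pixels * PIXEL_AREA_CM2 * scale ** 2

# Back-compute the scale that maps 6598 px to 7.49 cm2 (first row of Table I):
scale = (7.49 / (6598 * PIXEL_AREA_CM2)) ** 0.5
```

Since the zoom of the ultrasound display may differ per view, the conversion factor would in practice be taken per acquisition rather than fixed once.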
IV. CONCLUSION
Judging from the presented results, it is concluded that transformations by morphological operations showed much greater efficiency, even for images of very poor quality such as those used in the assessment tests, after application of the median/equalisation step. This is the major advantage of the methodology described above, mainly when the medical examination for obtaining a patient's diagnosis is considered technically difficult. In relation to other methods with similar purposes, the proposed methodology showed:
• advantages in terms of efficiency in capturing the cavity form and size;
• elimination of the post-processing step related to contour interpolation through algorithms of high computational effort;
• a higher level of detail of the left ventricle form and size in relation to the manual contour; often it is hard to draw accurate manual contours for obtaining measurements, even for an experienced expert.
Taking into account the good computational performance shown by the elementary binary transformations applied in the methodology, parallel processing could be explored as a way of extending the implementation to real time, which could allow direct use of the software on the echocardiographic system, reducing the noise introduced during image acquisition.

ACKNOWLEDGMENTS
The authors wish to thank Dr. Bianca Ávila (São Lucas Hospital - PR) for her help with the ultrasonic images and Ms. Laura Vieira for the text review. This work was supported in part by CAPES (Brazil).

REFERENCES
[1] D. Adam, O. Hareuveni, S. Samuel, "Semiautomated Border Tracking of Cine Echocardiographic Ventricular Images", IEEE Transactions on Medical Imaging, vol. 6, no. 3, pp. 266-271, Sep. 1987.
[2] N. Ayache, I. L. Herlin, "A New Methodology to Analyze Time Sequences of Ultrasound Images", Rapports de Recherche - Robotique, Image et Vision, INRIA, no. 1390, France, Jan. 1991.
[3] G. J. F. Banon, J. Barrera, "Bases da Morfologia Matemática para Análise de Imagens Binárias", IX Escola de Computação, Recife, Brazil, 1994.
[4] S. M. Collins et al., "Computer-Assisted Edge Detection in Two-Dimensional Echocardiography: Comparison with Anatomic Data", American Journal of Cardiology, vol. 53, pp. 1380-1387, May 1994.
[5] C. H. Chu, E. J. Delp, A. J. Buda, "Detecting Left Ventricular Endocardial and Epicardial Boundaries by Digital Two-Dimensional Echocardiography", IEEE Transactions on Medical Imaging, vol. 7, no. 2, pp. 181-194, Mar. 1988.
[6] H. Feigenbaum, "Echocardiography", Lea & Febiger, 4th ed., Philadelphia, 1988.
[7] S. B. Filho, "Um Quantificador da Ploidia Tumoral Através da Citofotometria", Master's Dissertation, CEFET, Curitiba, Paraná, Brazil, 1994.
[8] Khoros Group, "Khoros Manual - User's Manual", University of New Mexico, vol. I, 1991.
Source of the reference ranges in Table I: Echocardiographic Laboratory of Stanford University (1988).
[9] J. W. Klinger, C. L. Vaughan, T. D. Fraker, L. T. Andrews, "Segmentation of Echocardiographic Images Using Mathematical Morphology", IEEE Transactions on Biomedical Engineering, vol. 35, no. 11, pp. 925-934, Nov. 1988.
[10] C. Lamberti, F. Sgallari, "A Workstation-Based System for 2-D Echocardiography Visualization and Image Processing", IEEE Transactions on Biomedical Engineering, vol. 37, no. 8, pp. 796-801, Aug. 1990.
[11] L. Maes, B. Bijnens, P. Suetens, F. v. de Werf, "Automated Contour Detection of the Ventricle in Short Axis View in 2D Echocardiograms", Machine Vision and Applications, no. 6, pp. 1-9, 1993.
[12] G. K. Matsopoulos, S. Marshall, "Use of Morphological Image Processing Techniques for the Measurement of a Fetal Head from Ultrasound Images", Pattern Recognition, vol. 27, no. 10, pp. 1317-1324, 1994.
[13] M. Roussel, "Analyse et Interprétation d'Images Appliquées aux Algues Microscopiques", Doctoral Thesis, Compiègne, France, 1993.
[14] S. G. Santos, "Metodologia para Detecção de Contornos do Ventrículo Esquerdo em Imagens Ecocardiográficas", Master's Dissertation, CEFET, Curitiba, Paraná, Brazil, 1995.
[15] J. Serra, "Image Analysis and Mathematical Morphology", Academic Press, 1982.
[16] L. Zhang, E. A. Geiser, "An Effective Algorithm for Extracting Serial Endocardial Borders From 2-Dimensional Echocardiograms", IEEE Transactions on Biomedical Engineering, vol. 31, no. 6, pp. 441-447, Jun. 1984.
fig. 1 - Original Image
fig. 2 - Median/Equalization Result
fig. 3 - Closing Contour Image
fig. 4 - Contour Image
fig. 5 - Location Mask
fig. 6 - Contour Image without Artefacts
Identification of a Stochastic System Involving Neuroelectric Signals
A. G. Rigas
Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece.
Abstract
In this work we use a Volterra-type stochastic model to identify a neurophysiological system (the muscle spindle) involving two neuroelectric signals as inputs and one as output. The inputs consist of series of nerve pulses produced by an alpha and a gamma motoneuron, applied to the muscle spindle; they modify its response, which is transferred to the spinal cord by the axons of sensory neurons. The parameters of the proposed model are estimated by using spectral analysis techniques for stationary point processes. It is shown that the effect of the gamma motoneuron on the muscle spindle is restricted to low frequencies (0-20 Hz), whereas the effect of the alpha motoneuron occurs at middle and higher frequencies (20-100 Hz).

Introduction
The muscle spindle is an element of the neuromuscular system and is thought to play a critical role in the initiation of movement and in the maintenance of posture. The function of the muscle spindle is regulated by two kinds of motoneurons lying in the spinal cord: an alpha motoneuron, whose long axon makes synaptic contact with the external fibers of the «parent» muscle, of which the muscle spindle is part, and so affects the muscle spindle indirectly; and a gamma motoneuron, whose long axon affects the muscle spindle directly by making synaptic contact with its internal fibers. The axons of the sensory neurons, which form spirals around the fibers of the muscle spindle, transfer the information to the spinal cord and from there to higher levels in the central nervous system. More details about the muscle spindle and its functional role are given in [1].
In this work we intend to examine the simultaneous effect of a gamma and an alpha motoneuron on the muscle spindle. The method used for the analysis of the available data, recorded from the axons of the two motoneurons and from the sensory axon which carries the response of the muscle spindle, is based on the spectral analysis of stationary point processes. Estimates of the coherence function provide measures of the linear relationship, in the frequency domain, between the response of the muscle spindle and the two inputs. The parameters of the frequency domain can also be used for the identification of the muscle spindle, which is adequately described by the stochastic model

E{dM3(t) | M1, M2} = { a0 + ∫ a1(t-u) dM1(u) + ∫ a2(t-v) dM2(v) } dt ,   (1)

where the quantity E{dM3(t) | M1, M2} can be interpreted as

Prob{event of type 3 in (t, t+h] | event of type 1 at t-u and event of type 2 at t-v},

for small h. The stochastic model (1) is an extension of a model discussed in [2] and [3].
Spectral analysis of stationary point processes
We assume that M(t) = {M1(t), M2(t), M3(t)} is a vector-valued point process defined on the interval [0, T]. Formally, a point process is a collection of usually interrelated random variables, each labelled by a point t on the positive line, such that Ma(t) - Ma(s) = Ma(s, t] is the number of events of type a which occur in the interval (s, t] (0 ≤ s < t ≤ T). The periodogram of the point process, computed on each of L disjoint segments of length R, is

I_ab^(R)(λ, j) = (1 / 2πR) d_a^(R)(λ, j) conj{d_b^(R)(λ, j)},   j = 1, ..., L   (a, b = 1, 2, 3),   (2)

where d_a^(R)(λ) = ∫_0^R e^{-iλt} dMa(t) is the Fourier-Stieltjes transform of the increments dMa(t) = Ma(t, t+dt]. An estimate of the spectral density function is given by

f̂_ab^(LR)(λ_k) = (1 / (2p+1)) Σ_{r=-p}^{p} f_ab^(LR)(λ_{k+r}),   (3)

where

f_ab^(LR)(λ_k) = (1/L) Σ_{j=1}^{L} I_ab^(R)(λ_k, j),   λ_k = 2πk / R,   k = 1, 2, ..., (R-1)/2.

We can now obtain an estimate of the coherence function between the components of the point process from the expression

|R̂_ab^(LR)(λ)|² = |f̂_ab^(LR)(λ)|² / ( f̂_aa^(LR)(λ) f̂_bb^(LR)(λ) ),   (a ≠ b = 1, 2, 3),   (4)

where f̂_aa^(LR)(λ) and f̂_bb^(LR)(λ) are the estimates of the power spectra of the components Ma(t) and Mb(t), respectively. A 95% point of |R̂_ab(λ)|² can be computed from the relation

z = 1 - α^(1/(s-1)),   (5)

where α = 0.05 and s = (2p+1)L.
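A rough sketch of the coherence estimate (2)-(4): segment Fourier-Stieltjes transforms are formed on L disjoint segments, their periodograms averaged, and the coherence taken as the normalised cross-spectrum. The additional frequency smoothing over 2p+1 neighbours in (3) is omitted for brevity, and all function names are ours.

```python
import numpy as np

def segment_fts(times, L, R, freqs):
    # d^(R)(lambda, j) = sum over events in segment j of exp(-i*lambda*t).
    d = np.zeros((L, len(freqs)), dtype=complex)
    for j in range(L):
        lo, hi = j * R, (j + 1) * R
        seg = times[(times >= lo) & (times < hi)] - lo
        if seg.size:
            d[j] = np.exp(-1j * np.outer(seg, freqs)).sum(axis=0)
    return d

def coherence(t_a, t_b, T, L, freqs):
    R = T / L
    da = segment_fts(t_a, L, R, freqs)
    db = segment_fts(t_b, L, R, freqs)
    # Cross- and auto-spectral estimates: periodograms averaged over segments.
    f_ab = (da * np.conj(db)).mean(axis=0) / (2 * np.pi * R)
    f_aa = (np.abs(da) ** 2).mean(axis=0) / (2 * np.pi * R)
    f_bb = (np.abs(db) ** 2).mean(axis=0) / (2 * np.pi * R)
    return np.abs(f_ab) ** 2 / (f_aa * f_bb)

rng = np.random.default_rng(0)
base = np.sort(rng.uniform(0, 1000, 400))
# Two strongly dependent trains: the second is the first delayed by 0.5.
coh = coherence(base, base + 0.5, 1000, 10, np.linspace(0.1, 2.0, 8))
```

For a process compared with a shifted copy of itself the coherence is close to 1 at all frequencies; for independent trains it fluctuates near zero, below the threshold of (5).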
Fig. 1 presents the estimates of the coherence functions. The broken lines correspond to the 95% point of |R̂_ab(λ)|² calculated from (5) for L = 5 and p = 9. Values of the estimates below this point indicate that the components of the point process are uncorrelated. Fig. 1A shows that the effect of the gamma motoneuron on the muscle spindle occurs at low frequencies (0-20 Hz), whereas Fig. 1B shows that the effect of the alpha motoneuron occurs mainly at middle and higher frequencies (20-100 Hz).

Identification of the muscle spindle
In this section we discuss how we can identify the muscle spindle by solving equation (1) with respect to a0, a1(u) and a2(v). The parameters a1(u) and a2(v) are called the impulse response functions, whereas a0 is a constant representing the mean value of M3 when M1 and M2 are inactive. The impulse response function is useful in predicting
Fig. 1 Estimates of the coherence function.
Fig. 2 Estimates of the impulse response function.
whether there will be an output event when an input event has already occurred. In order to solve equation (1) we define the one-sided Fourier transforms of s_m(u) (m = 1, 2) as follows:

S_m(λ) = ∫_0^∞ s_m(u) exp{-iλu} du,   -∞ < λ < +∞   (m = 1, 2).   (6)
By using the arguments of [2] we find

p3 = a0 + S1(0) p1 + S2(0) p2,   (7)

f31(λ) = S1(λ) f11(λ)   and   f32(λ) = S2(λ) f22(λ),   (8)
since the two components M1 and M2 are taken to be independent. By p_a we denote the mean intensity of M_a (a = 1, 2, 3). Estimates of the parameters S_m(λ) and s_m(u) are obtained using (8) and the inverse transform of (6), that is,

Ŝ_m(λ) = f̂_3m^(LR)(λ) / f̂_mm^(LR)(λ)   (m = 1, 2),   (9)

ŝ_m(u) = Q^(-1) Σ_{j=-(Q-1)/2}^{(Q-1)/2} K_R(λ_j) Ŝ_m(λ_j) exp(iλ_j u),   (10)

where K_R(λ) is a convergence factor (see [5]). The use of the convergence factor improves the properties of ŝ_m(u). It can be proved that the distribution of ŝ_m(u) is asymptotically Normal with mean s_m(u) and variance given by

Var[ŝ_m(u)] ≈ [(2p+1) L Q]^(-1) ∫ f_33(λ) f_mm(λ)^(-1) |K_R(λ)|² dλ.   (11)
Fig. 2 presents the estimates of the impulse response functions. The broken lines in the middle correspond to the hypothesis that the inputs and the output are independent, whereas the solid lines are the 95% confidence limits. Values of the estimates outside the confidence limits indicate that the components are correlated. Fig. 2A shows that the presence of the gamma motoneuron makes the system respond after 15 ms. Fig. 2B shows that the presence of the alpha motoneuron blocks the system for about 20 ms, after which it responds again.
References
[1] Matthews, P. B. C. (1981). Review lecture: Evolving views on the internal operation and functional role of the muscle spindle. J. Physiol., 320, 1-30.
[2] Brillinger, D. R. (1974). Fourier analysis of stationary processes. Proc. IEEE, 62, 1628-43.
[3] Rigas, A. G. (1996). Stochastic modelling of a complex physiological system. In Differential Equations and Applications to Biology and to Industry, Eds. M. Martelli et al., pp. 409-416. Singapore: World Scientific.
[4] Thompson, W. A., Jr. (1988). Point Process Models with Applications to Safety and Reliability. New York: Chapman and Hall.
[5] Candy, J. V. (1988). Signal Processing: The Modern Approach. New York: McGraw-Hill.
Invited Session Q: SIGNAL PROCESSING THEORY AND APPLICATIONS
DESIGN OF M-BAND LINEAR PHASE FIR FILTER BANKS WITH HIGH ATTENUATION IN STOP BANDS
Takuro KIDA and Yuichi KIDA
Department of Information Processing, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama-shi, 227 JAPAN. e-mail: [email protected]
Information Engineering Course, Kogakuin University, 1-24-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, 163-91 JAPAN

Abstract
In the literature [2], [3], a systematic treatment is presented of the optimum interpolatory approximation of multi-dimensional signals. In particular, one of the authors discusses an interesting reciprocal property of the approximation in [2]. However, most of the discussion is limited to theoretical treatment, and the design of higher order linear phase FIR filter banks, for example, is not discussed [3]. In this paper, we give a concise explanation of the theoretical basis of the proposed approximation. Further, we present a linear phase M-channel FIR filter bank with a high attenuation characteristic in each stop band, designed by cosine-sine modulation and by iterative approximation based on the reciprocal property.
1. GENERALIZED INTERPOLATION

Let L²_F be the set of functions F(ω) satisfying ∫ |F(ω)|² dω < +∞ and F(ω) = conj{F(-ω)}, where conj{z} means the complex conjugate of a complex number z. Further, let L²_f be the set of signals f(t) satisfying f(t) = (1/2π) ∫ F(ω) e^{jωt} dω, F(ω) ∈ L²_F. Note that f(t) has real values. For simplicity we express by f(t) ↔ F(ω) a pair of functions f(t) and F(ω) satisfying the above relations. Suppose that W(ω) is an arbitrary function in L²_F. Then we denote by Θ_A the set of F(ω) in L²_F satisfying the following inequality with respect to a given positive number A:

∫_{-∞}^{∞} |F(ω)|² / |W(ω)|² dω ≤ A.   (1)
Moreover, let Γ be the set of signals f(t) in L²_f expressed as the inverse Fourier transform of F(ω) in Θ_A. Consider that we input f(t) into M linear time-invariant filters having the transfer functions H_m(ω) (m = 0, 1, ..., M-1). We denote by f_m(t) (m = 0, 1, ..., M-1) the corresponding outputs of the filters H_m(ω). Let F_m(ω) = H_m(ω) F(ω) be the Fourier spectrum of f_m(t), where m = 0, 1, ..., M-1. Further, we assume

∫_{-∞}^{∞} |W(ω)|² |H_m(ω)|² dω < +∞.

From the Schwarz inequalities for ∫ W(ω) {F(ω)/W(ω)} dω and ∫ {W(ω)H_m(ω)} {F(ω)/W(ω)} dω, we can easily prove that F(ω) and F_m(ω) are absolutely integrable. Hence, f(t) and f_m(t) are continuous. Consider the equally spaced sample points S = {nT} (T > 0) (n = 0, ±1, ±2, ...). Further, let f_m(nT) be the sample values of f_m(t) at the sample points nT (n = 0, ±1, ±2, ...). Now, we consider the problem of approximating f(t - d) from the above sample values, where d is a given delay time. Let

g(t) = Σ_{m=0}^{M-1} Σ_{n=-∞}^{∞} f_m(nT) φ_mn(t)   (2)

be the corresponding approximation of f(t - d). The functions φ_mn(t) are prescribed bounded real functions called the generalized interpolation functions or, simply, interpolation functions. We assume that these interpolation functions satisfy

|φ_mn(t)| = 0 (t < nT),   |φ_mn(t)| = 0 (t > nT + Δ),   (3)
where Δ > 0 (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...). The approximation error between f(t - d) and g(t) is defined by e(t) = |f(t - d) - g(t)|. Further, let E_max(t) be the upper limit of e(t) obtained by fixing all the interpolation functions φ_mn(t) and letting f(t) range over all the signals f(t) in Γ:

E_max(t) = sup_{f(t) ∈ Γ} {e(t)}.   (4)

Then we obtain the following theorem [2].

Theorem 1.

E_max(t) = √A [ ∫_{-∞}^{∞} |W(ω)|² | e^{j(t-d)ω} - Σ_{m=0}^{M-1} Σ_{n=-∞}^{∞} φ_mn(t) H_m(ω) e^{jnTω} |² dω ]^{1/2}.   (5)

The proof is omitted. Let φ_mn(t) (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...) be the optimum interpolation functions which minimize E_max(t). Then it can be proved that there exists a set of functions φ_m(t) (m = 0, 1, ..., M-1) satisfying

φ_mn(t) = φ_m(t - nT)   (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...).   (6)

Further, E_max(t) = E_max(t + T) holds for the E_max(t) which uses these optimum interpolation functions. Hence, g(t) is expressed by

g(t) = Σ_{m=0}^{M-1} Σ_{n=-∞}^{∞} f_m(nT) φ_m(t - nT).   (7)

Then, Eq. (3) can be expressed equivalently by |φ_m(t)| = 0 (t < 0, t > Δ) (m = 0, 1, ..., M-1). Now, we assume that the above quantity Δ satisfies

Δ = NT + τ   (N = a non-negative integer; 0 ≤ τ < T).   (8)
Let t be an arbitrary time and let R be the integer satisfying RT ≤ t < (R+1)T. Precisely, R should be written R_t, but in this paper we use R instead of R_t. Then we consider the following two intervals:

I_1 = {t | RT ≤ t < RT + τ},   I_2 = {t | RT + τ ≤ t < (R+1)T}.   (9)

Further, we define the integers Q and L as follows:
(i) If t ∈ I_1: Q = N, L = M(N+1) = M(Q+1).
(ii) If t ∈ I_2: Q = N - 1, L = MN = M(Q+1).
In the following, we consider the one-to-one correspondence between an integer p (= 1, 2, ..., L) and a pair of integers (m, n) (m = 0, 1, ..., M-1; n = R - Q, R - (Q-1), R - (Q-2), ..., R):

p = ξ_1(m, n)   (t ∈ I_1),   p = ξ_2(m, n)   (t ∈ I_2),   (10)

where the number of pairs (m, n) is L = M(Q+1) and is identical with the number of values of p. For simplicity, we sometimes write p = ξ(m, n) instead of p = ξ_1(m, n) or p = ξ_2(m, n).
As a direct consequence of Eq. (3), we obtain φ_mn(t) = φ_m(t - nT) = 0 (n < R - Q or n > R) for any t and m satisfying t ∈ I_k (k = 1, 2) and 0 ≤ m ≤ M-1. Hence, for t ∈ I_k (k = 1, 2) we have

E_max(t) = √A [ ∫_{-∞}^{∞} |W(ω)|² | e^{j(t-d)ω} - Σ_{m=0}^{M-1} Σ_{n=R-Q}^{R} φ_m(t - nT) H_m(ω) e^{jnTω} |² dω ]^{1/2},   (11), (12)

where (11) and (12) refer to t ∈ I_1 and t ∈ I_2, respectively. Let Ω_t be the set of pairs (m, n) composed of m and n satisfying m = 0, 1, ..., M-1 and n = R - Q, R - (Q-1), R - (Q-2), ..., R, where Q is related to t implicitly. As is shown in [2], minimizing Eq. (5) is straightforward. Firstly, we expand E_max(t)² with respect to the φ_mn(t) under consideration, then differentiate E_max(t)² with respect to the complex conjugates of the interpolation functions φ_mn(t) which actually contribute to the approximation at the prescribed t, and set the resultant formulas to zero, that is, ∂E_max(t)²/∂φ_mn(t) = 0, where m = 0, 1, ..., M-1 and n = R - Q, R - (Q-1), R - (Q-2), ..., R. The resultant set of equations can be expressed in the matrix form [φ(t)] = [C]^(-1) [V(t)], where [φ(t)] is the vector whose elements are the interpolation functions actually contributing to the approximation at the prescribed t, [C] is a constant matrix, and [V(t)] is a vector depending on the variable t. This matrix and vector are determined by Γ, W(ω), H_m(ω) and Ω_t.
If we perform the above operation for t satisfying 0 ≤ t < T, we can obtain all the functional forms of the interpolation functions using the relation φ_mn(t) = φ_m(t - nT). As pointed out earlier, although it is assumed initially that the interpolation functions have different functional forms, the resultant optimum interpolation functions are expressed as parallel shifts of a finite number of functions. In [3], the set of signals is slightly generalized, and the theory includes round-off type linear quantization of the sample values f_m(nT). However, it should be noted that the functional forms of the optimum interpolation functions are identical with those obtained in the present discussion. Further, consider another measure of error defined by E_E = ∫ E_max(t)² dt. Then, as is shown in [2], E_E is expressed by

E_E = Σ_{n=-∞}^{∞} ∫_{-∞}^{∞} |T_n(ω)|² dω,   (13)

T_0(ω) = |W(ω)| e^{-jdω} - (1/T) Σ_{m=0}^{M-1} H_m(ω) Ψ_m(ω),   T_n(ω) = (1/T) Σ_{m=0}^{M-1} |W(ω)| H_m(ω) Ψ_m(ω - 2πn/T)   (n ≠ 0),   (14)

where Ψ_m(ω) = |W(ω)| Φ_m(ω) and Φ_m(ω) is the Fourier transform of φ_m(t).
Moreover, E E remains invariant if the replacements between ~I,,,(w) --, Bin(w) and Hm(w) "-* ~m(W), or their converse, are performed, where B,,(w) = 1 W ( w ) I H m ( w ) and ~m(W) =l W(w) Ir
BE = ~
I z . ( . , ) I ~ d~ oo
z0(w) =l w(,o) i e- ~ a ~ -
'~.,(w)Bm(w)
(15)
n---oo
, Z.(w)- ~
m--0
IW(w) l,~.,(~,)B.,
- ~
(t6)
m--0
In this discussion, we assume that the inverse Fourier transforms of |W(w)|Φ_m(w) and |W(w)|H_m(w) are time-limited. When W(w) is band-limited and equal to 1 in a given domain −w_0 ≤ w ≤ w_0 (w_0 > 0), for example, we have B_m(w) = H_m(w) and Ψ_m(w) = Φ_m(w). Therefore, we soon notice that Z_0(w)F(w) = { 1 − (1/T) Σ_m Φ_m(w) H_m(w) } F(w) is proportional to the distortion of the filter bank obtained by ignoring the aliasing components in the frequency domain, while Z_n(w)F(w − 2πn/T) = (1/T) Σ_m Φ_m(w) H_m(w − 2πn/T) F(w − 2πn/T) corresponds to the aliasing components.
If W(w) is not constant, a similar reciprocity is valid for the following two systems: (i) a system obtained by connecting W(w) in tandem at the input port of a filter bank using H_m(w) and Φ_m(w) as the analysis and the synthesis filters, respectively; (ii) another system obtained by connecting W(w) in tandem at the output port of a filter bank using Ψ_m(w) and H_m(w) as the analysis and the synthesis filters, respectively.
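For the simplest case W(w) = 1, the interpretation of Z_0 as the distortion and Z_n as the aliasing can be checked numerically. The sketch below is an assumption-laden illustration: it uses a hypothetical 2-channel Haar analysis/synthesis pair (not a bank from the paper) with T = 2, and verifies that the distortion term is a pure delay while the aliasing term vanishes.

```python
import numpy as np

# Hypothetical 2-channel Haar analysis/synthesis pair (illustration only, not a
# bank from the paper), used to check the interpretation of Z_0 and Z_n above
# for W(w) = 1 and T = 2.
h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # analysis lowpass
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # analysis highpass
f0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # synthesis lowpass
f1 = np.array([-1.0, 1.0]) / np.sqrt(2.0)  # synthesis highpass

def freq(h, x):
    """Frequency response H(w) = sum_n h[n] e^{-jwn} evaluated at w = x."""
    return np.sum(h * np.exp(-1j * x * np.arange(len(h))))

w = np.linspace(-np.pi, np.pi, 257)
# distortion term (1/T) sum_m F_m(w) H_m(w): should be a pure delay e^{-jw}
T0 = np.array([0.5 * (freq(f0, x) * freq(h0, x) + freq(f1, x) * freq(h1, x)) for x in w])
# aliasing term (1/T) sum_m F_m(w) H_m(w - pi): should vanish
T1 = np.array([0.5 * (freq(f0, x) * freq(h0, x - np.pi) + freq(f1, x) * freq(h1, x - np.pi)) for x in w])

assert np.allclose(np.abs(T0), 1.0)      # no amplitude distortion
assert np.allclose(T0, np.exp(-1j * w))  # pure one-sample delay (d = 1)
assert np.allclose(T1, 0.0, atol=1e-9)   # no aliasing
```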
2 ITERATIVE DESIGN OF LINEAR PHASE FILTER BANKS
In this section, we present a linear phase M-channel FIR filter bank having N + 1 tap coefficients with high attenuation in each stop band. The filter bank is designed by cosine-sine modulation and by iterative approximation based on the above reciprocity. For simplicity, we treat only the example with M = 32 and N + 1 = 512, but the method obviously extends to the more general case. The proposed algorithm can be expressed as follows.

(a) Make an initial prototype lowpass filter P^0(z) = Σ_{n=0}^{256} p_n^0 z^{−n} of size 257, having the coefficients

p_n^0 = [ sin{ π f_c (n − 128) / 256 } / { π (n − 128) } ] · [ a + (1 − a) cos{ π (n − 128) / 128 } ]^η
(n = 0, 1, ..., 256; 0 < a < 1, f_c = 32ζ with ζ > 1, η > 1).

(b) Derive the second prototype lowpass filter P'(z) = Σ_{n=0}^{512} p'_n z^{−n} by

p'_n = Σ_{k=0}^{256} p^0_{n−k} p^0_k   (n ≥ k; n = 0, 1, ..., 512)      (17)
(c) Make the new prototype lowpass filter P*(z) = Σ_{n=0}^{511} p*_n z^{−n} by

p*_n = ( p'_n + p'_{n+1} ) / 2 ;  n = 0, 1, 2, ..., 511      (19)
(d) Construct the analysis filter bank H_m(z) = Σ_{n=0}^{511} h_{m,n} z^{−n} using the cosine modulation. The coefficients of the analysis filters are given by h_{m,n} = 2.0 · p*_n · cos{ (π/32)(m + 0.5)(n − 511/2.0) + (−1)^m π/4 } (m = 0, 1, 2, ..., M − 1).

(e) Obtain the synthesis filter bank by applying the presented optimization. If the attenuation of the synthesis filters is not sufficient, make q_n = q_n^0 ( b + (1 − b) cos{ π(n − 255.5)/(256 r_g) } ) (n = 0, 1, 2, ..., 511) for appropriate 0 < b < 1 and r_g > 1,
where q_n^0 and q_n are the coefficients of the initial and the derived synthesis filter, respectively.

(f) Optimize the analysis and the synthesis filter banks iteratively by the reciprocal relation. That is, the replacements (Φ_m(z))_{i−1} → (H_m(z))_i and (H_m(z))_{i−1} → (Φ_m(z))_i are performed iteratively, where (·)_i (i = 1, 2, 3, ...) denotes the i-th stage of the iteration for a pair of H_m(z) and Φ_m(z). Let H̄_m(z) = Σ_{n=0}^{511} h̄_{m,n} z^{−n} be the resultant analysis filters.

(g) Derive the coefficients of the new prototype filter from the relation p*_n = h̄_{0,n} / [ 2.0 cos{ (π/32)(0.5)(n − 511/2.0) + π/4.0 } ] (n = 0, 1, 2, ..., N), where h̄_{0,n} (n = 0, 1, 2, ..., 511) are the coefficients of the optimized lowpass analysis filter H̄_0(z).
(h) Make the linear phase analysis filter bank H^b_m(z) (m = 0, 1, 2, ..., M − 1) having the coefficients h^b_{m,n} = 2.0 · p*_n · cos{ (π/32)(m + 0.5)(n − 511/2.0) } (m = 0, 2, 4, ..., 30) and h^b_{m,n} = 2.0 · p*_n · sin{ (π/32)(m + 0.5)(n − 511/2.0) } (m = 1, 3, 5, ..., 31), where n = 0, 1, 2, ..., 511.

(i) Derive the synthesis filter bank by the presented optimization. Then, as a direct consequence of the symmetrical arrangement of the coefficients of the analysis filters, we can easily prove that all the resultant synthesis filters are linear phase. Let Φ_m(z) = Σ_{n=0}^{N} φ_{m,n} z^{−n} (m = 0, 1, 2, ..., 31) be the transfer functions of the resultant synthesis filters.
(j) Make new analysis filters defined by H_m(z) = α_m · H^b_m(z) + (1.0 − α_m) · Φ_m(z) (m = 0, 1, 2, ..., 31; n = 0, 1, 2, ..., 511), where α_m (m = 0, 1, 2, ..., 31) are appropriate scaling factors satisfying 0 < α_m < 1. These analysis filters are linear phase as well.

(k) Derive the synthesis filter bank by the presented optimization. Then, by the symmetrical arrangement of the coefficients of these analysis filters, it is shown that all the resultant synthesis filters are also linear phase.

(l) Optimize the analysis and the synthesis filter banks iteratively by the reciprocal relation.

We can obtain an example of a linear phase filter bank with M = 32 channels and size N + 1 = 512. Before we derive this linear phase filter bank, we perform 10 iterations. In this example, d = 256 is used. Although the effect of the parameters f_c, η and so on is critical, in this example we use approximately f_c = 47, η = 1.0, a = 0.52, r_g = 1.4 and b = 0.5. The stopband attenuation of each analysis and synthesis filter is 99 or 100 dB.
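The pipeline of steps (a)-(d) can be sketched numerically. The window of step (a) is only partly legible in the source, so the sketch substitutes an ordinary raised-cosine-windowed sinc with illustrative parameter values (a = 0.52, a hypothetical normalized cutoff fc); only the sequence prototype, self-convolution (17), adjacent averaging (19), and cosine modulation follows the text.

```python
import numpy as np

# Sketch of steps (a)-(d). The step-(a) window is substituted (assumption):
# a raised-cosine-windowed sinc with illustrative parameters.
M = 32                       # number of channels
a, fc = 0.52, 1.0 / 64.0     # window mix and hypothetical normalized cutoff

# (a) initial prototype of size 257, centered at n = 128
n = np.arange(257) - 128
p0 = np.sinc(2 * fc * n) * (a + (1 - a) * np.cos(np.pi * n / 128))

# (b) second prototype: self-convolution (eq. (17)), size 513
p1 = np.convolve(p0, p0)

# (c) new prototype of size 512: average of adjacent coefficients (eq. (19))
ps = 0.5 * (p1[:-1] + p1[1:])

# (d) cosine-modulated analysis bank:
#     h_{m,n} = 2 p*_n cos{(pi/32)(m+0.5)(n - 511/2) + (-1)^m pi/4}
nn = np.arange(512)
H = np.array([2.0 * ps * np.cos((np.pi / M) * (m + 0.5) * (nn - 511 / 2.0)
                                + ((-1) ** m) * np.pi / 4.0)
              for m in range(M)])

assert H.shape == (32, 512)
assert np.allclose(ps, ps[::-1])  # the averaged prototype is symmetric (linear phase)
```

The symmetry check on ps illustrates why the averaging step (c) yields a linear phase prototype of even length.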
3 CONCLUSION
Although the details are omitted, it should be noted that the proposed generalized interpolatory approximation has the minimum measure of error, in a certain sense, among all the linear and nonlinear approximations using the same sample values of the signal. The presented design gives a simple way to obtain the optimum analysis/synthesis filter banks. Finally, we would like to express our sincere thanks to Professor B. G. Mertzios, Democritus University, Greece.
References
[1] P. P. Vaidyanathan: Multirate Systems and Filter Banks, PTR Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
[2] T. Kida, L. P. Yoshioka, S. Takahashi and H. Kaneda: Theory on Extended Form of Interpolatory Approximation of Multi-dimensional Waves, Elec. and Comm. Japan, Part 3, Vol. 75, No. 4, pp. 26-34, 1992. Also, Trans. IEICE, Japan, Vol. 74-A, No. 6, pp. 829-839, 1991 (in Japanese).
[3] Takuro Kida: The Optimum Approximation of Multi-dimensional Signals Based on the Quantized Sample Values of Transformed Signals, submitted to IEICE Trans., Vol. E77-A, 1994.
Robustness of Multirate Filter Banks
F. N. Koumboulis 1, M. G. Skarpetis 2, and B. G. Mertzios 3

1 University of Thessaly, School of Technological Sciences, Dept. of Mechanical & Industrial Engineering, Volos, Greece. Mailing address: 53 Aftokratoros Irakliou St., 15122 Athens, Greece. Tel. +30-1-8023050, e-mail: [email protected]
2 National Technical University of Athens, Dept. of Electrical and Computer Engineering, Division of Electroscience, Greece. E-mail: [email protected]
3 Democritus University of Thrace, Dept. of Electrical and Computer Engineering, 67100 Xanthi, Greece. Fax: +30-541-26947 or 26473, e-mail: [email protected]
Abstract

The problem of designing nonmaximally decimated multirate filter banks is studied for the case of uncertain dynamic channels. The necessary and sufficient condition for the existence of an appropriate FIR and anticausal synthesis filter bank yielding perfect signal reconstruction, in spite of the channel's uncertainties, is established. The condition depends entirely upon the polyphase analysis matrix and the z-transform of the dynamic description of the uncertain channel. The general analytic expression of all polyphase synthesis matrices solving the problem is derived.

1. Introduction

The problem of designing multirate filter banks is an important signal processing design problem from both the theoretical and the practical point of view [1],[2]. The problem has attracted considerable attention and has been studied for different types of analysis and synthesis banks (see e.g. [1]-[5]), using maximally decimated [6],[7] as well as nonmaximally decimated filter banks [8]. Here, we are interested in one of the main objectives of the problem, namely that of perfectly reconstructing the input signal [6],[7]. Motivated by the many practical cases where the channel's behavior is not ideal, the case where the channel is described as a dynamic uncertain system is studied. The filter bank is considered to be nonmaximally decimated (number of channels greater than the decimation ratio). In particular, the necessary and sufficient condition for the existence of an appropriate FIR and anticausal synthesis filter bank achieving perfect reconstruction of the input signal, in spite of the channel's uncertainties, is established. The condition depends entirely upon the polyphase analysis matrix and the z-transform of the dynamic description of the uncertain channel. The general class of all polyphase synthesis matrices solving the problem, independent of the channel's uncertainties, is derived.

2. Problem Formulation

Consider the nonmaximally decimated filter bank presented in Figure 1.
Fig. 1: Nonmaximally decimated filter bank

In Fig. 1, p > M, where M is the decimation ratio and p is the number of channels. H_j(z) (j = 0, ..., p − 1) are the analysis filters and F_j(z) (j = 0, ..., p − 1) are the synthesis filters. The signal x(n) is the input signal while x̂(n) is the output signal. The design objective is to find an appropriate synthesis bank, namely appropriate filters F_0(z), F_1(z), ..., F_{p−1}(z), such that x̂(n) = x(n) (perfect reconstruction). Using the polyphase representation (Fig. 2) of the filter bank, the design objective is translated as follows: find an appropriate polyphase matrix R(z) of the synthesis bank such that R(z)E(z) = I_M, where E(z) is the polyphase matrix of the analysis bank. R(z) and E(z) are of dimensions M × p and p × M, respectively. Even though the most standard type of filter bank is the maximally decimated one, i.e. p = M, nonmaximally decimated filter banks appear to have many applications, especially in convolutional codes [8]. Here, nonmaximally decimated filter banks are used in order to compensate the errors appearing in the filter bank output x̂(n) due to uncertainties of the transmission channels. The perfect transmission of a signal via a channel is an ideal situation which facilitates the solution of the respective filtering problem. The behavior of a channel is determined by the characteristics of the medium, the properties of the signal, as well as external events. Similarly to any other physical system, a channel can be considered to have dynamic behavior. For example, consider a wire high-frequency transmission line. If the length of the transmission line is much less than the wavelength of the signal, the transmission line is described as a static system [9]. If the length of the line is about equal to or k times greater (with k small) than the wavelength of
Fig. 2: Polyphase representation of a nonmaximally decimated filter bank

the signal, the channel behaves as a dynamical system [9]. For a sufficiently long transmission line, the channel behaves as a distributed-parameter system [9]. The values of the parameters of the three types of systems described above, i.e. the values of the parameters of the dynamics of a channel, depend upon other physical parameters, e.g. temperature and magnetization. In many cases these physical parameters are not known with full accuracy. So they can be considered as uncertainties, and consequently the channel is described by a dynamic uncertain model. In general, the parameters of the dynamic model are nonlinear functions of the uncertainties (e.g. the dependence of resistance upon temperature). In this paper, the problem of designing multirate filter banks is studied for the case where the channel has dynamic uncertain behavior. To preserve generality, the channel is assumed to be affected by l uncertainties q_1, ..., q_l, while the dynamics of the i-th channel are assumed to be described by the transfer function d_i(z, q) (with q = [q_1, ..., q_l] ∈ Q, the uncertainty domain). In particular, the problem is oriented by the nonmaximally decimated filter bank with uncertain dynamic channels, presented in Fig. 3,
Fig. 3: Nonmaximally decimated filter bank with uncertain dynamic channels

or equivalently by the polyphase representation given in Fig. 4:
Fig. 4: Polyphase representation of a nonmaximally decimated filter bank with uncertain dynamic channels

The design objective is to find a polyphase synthesis matrix R(z) which will eliminate not only the influence of the polyphase analysis matrix E(z) on the output signal x̂(n) but also the influence of the dynamics of the uncertain channel. Hence, the problem consists in finding an R(z) such that

R(z) diagonal_{j=0,...,p−1}{ d_j(z, q) } E(z) = I_M      (2.2)
The dynamics of different channels are considered to be, in general, different. This can easily be understood after recalling that different signals travel through the channels (linearized dynamics), as well as that in many practical cases (e.g. encryption [10]) different media with different characteristics are often used.
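Condition (2.2) can be illustrated on a deliberately tiny static instance. The setup below is an assumption for illustration only (M = 1, p = 3 channels with gains d_j(q) = 1 + q·c_j and a single scalar uncertainty q; these numbers are not from the paper): because p > M, the synthesis vector has enough degrees of freedom to cancel the uncertainty for every q.

```python
import numpy as np

# Toy static instance of condition (2.2), assumed for illustration (not the
# paper's model): M = 1, p = 3 channels with gains d_j(q) = 1 + q*c_j and a
# single scalar uncertainty q. Nonmaximal decimation (p > M) leaves enough
# freedom in the synthesis vector r to cancel the uncertainty.
e = np.array([1.0, 2.0, 1.0])    # polyphase analysis "matrix" E (here 3 x 1)
c = np.array([1.0, -1.0, 2.0])   # uncertainty structure of the channels

# R diag(d_j(q)) E = 1 for all q  <=>  sum_j r_j e_j = 1 and sum_j r_j e_j c_j = 0
A = np.vstack([e, e * c])        # 2 equations, 3 unknowns
r, *_ = np.linalg.lstsq(A, np.array([1.0, 0.0]), rcond=None)

for q in (-0.7, 0.0, 0.3, 1.5):  # perfect reconstruction for every uncertainty value
    assert np.isclose(np.sum(r * (1.0 + q * c) * e), 1.0)
```

The two linear equations here are the toy counterpart of the q-independent algebraic system (3.7) derived in the next section.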
With regard to E(z), or equivalently with regard to the analysis filters H_j(z), j = 0, ..., p − 1, no limitations are imposed except that of causality. The polyphase matrix R(z) is considered to be anticausal and FIR ([6],[7]), thus corresponding to anticausal FIR filters F_j(z), j = 0, ..., p − 1.

3. Solution of the Problem

Define

B(z, q) = diagonal_{j=0,...,p−1}{ d_j(z, q) } E(z)      (3.1)

Based upon the above definition, equation (2.2) takes on the form

R(z) B(z, q) = I_M      (3.2)

As already mentioned, E(z) is causal. The channel (a deterministic system) is considered to be causal. So the rational matrix B(z, q) (rational with respect to z) is causal, and thus it can be expressed in polynomial ratio form as follows:

B(z, q) = [ B_n(q) z^n + B_{n−1}(q) z^{n−1} + ... + B_0(q) z^0 ] / [ z^n + b_{n−1}(q) z^{n−1} + ... + b_0(q) z^0 ]      (3.3)

where B_j(q) ∈ [ℱ(q)]^{p×M} and b_j(q) ∈ ℱ(q) are nonlinear functions of the uncertainty vector q (with ℱ(q) the set of real functions of q). The integer n represents an upper bound of the realization degree of B(z, q). As already mentioned, the polyphase synthesis matrix is considered to be FIR and anticausal, i.e. to be of the form

R(z) = R_0 z^0 + R_1 z^1 + ... + R_m z^m      (3.4)

where m is the maximum number of advances. Substitution of (3.3) and (3.4) into (3.2) yields

[ R_0 z^0 + R_1 z^1 + ... + R_m z^m ][ B_n(q) z^n + B_{n−1}(q) z^{n−1} + ... + B_0(q) z^0 ] = [ z^n + b_{n−1}(q) z^{n−1} + ... + b_0(q) z^0 ] I_M      (3.5)

Equating like powers of z on both sides of equation (3.5), defining B_j(q) = 0 for j < 0, and defining the block Toeplitz matrix

B_E(q) =
[ B_n(q)   B_{n−1}(q)  ...  B_0(q)   0        ...  0      ]
[ 0        B_n(q)      ...  B_1(q)   B_0(q)   ...  0      ]
[ ...                  ...                    ...         ]
[ 0        0           ...  B_n(q)   ...      ...  B_0(q) ]      (3.6a)

B_R(q) = [ 0  0  ...  0  I_M  b_{n−1}(q) I_M  ...  b_0(q) I_M ]      (3.6b)

R_E = [ R_m  R_{m−1}  ...  R_1  R_0 ]      (3.6c)

equation (3.5) can be expressed more compactly as the following algebraic equation:

R_E B_E(q) = B_R(q)      (3.7)

Equation (3.7) is linear, with known matrices depending upon the uncertainties and an unknown R_E which does not depend upon the uncertainties. According to the Appendix (see relations (A.6)-(A.8)), equation (3.7) is solvable if and only if
rank_ℜ [ B_E(q)
         B_R(q) ] = rank_ℜ [ B_E(q) ]      (3.8)
If the condition (3.8) is satisfied, then according to (A.8) the general solution of equation (3.7) is

R_E = T [ B_E(q) ]_ℜ^⊥ + ( B_R(q) \ B_E(q) )_ℜ      (3.9)

where T is an arbitrary matrix. From (3.8) and (3.9) the following theorems are derived.

Theorem 3.1: For the multirate filter bank of Fig. 4, there exists an anticausal and FIR polyphase synthesis matrix R(z) of order m, yielding perfect reconstruction of x(n), i.e. x̂(n) = x(n), in spite of the channel's uncertain dynamics, if and only if the condition (3.8) is satisfied.

Theorem 3.2: For the multirate filter bank of Fig. 4, the general form of the anticausal FIR polyphase synthesis matrix R(z) of order m, yielding perfect reconstruction of x(n) in spite of the channel's uncertain dynamics, is

R(z) = T R_a(z) + R_b(z)      (3.11)

where R_b(z) = R_{b0} z^0 + ... + R_{bm} z^m and R_a(z) = R_{a0} z^0 + ... + R_{am} z^m, and where [ R_{bm} ... R_{b0} ] = ( B_R(q) \ B_E(q) )_ℜ and [ R_{am} ... R_{a0} ] = [ B_E(q) ]_ℜ^⊥. The matrix T is an arbitrary matrix.

Based upon Theorem 3.2 and the relation between the polyphase synthesis matrix and the respective synthesis filter bank [1],[2], the general form of the synthesis filters F_j(z) (j = 0, ..., p − 1) can easily be derived.

4. Conclusions

The problem of designing nonmaximally decimated multirate filter banks has been studied for the case of uncertain dynamic channels. The necessary and sufficient condition for the existence of an appropriate FIR and anticausal synthesis filter bank yielding perfect signal reconstruction, in spite of the channel's uncertainties, has been established (Theorem 3.1). The general analytic expression of all fixed-order polyphase synthesis matrices solving the problem has been derived (Theorem 3.2). Many aspects of the problem remain to be solved, e.g. the minimal order of the polyphase synthesis matrix solving the problem. The application of the present results to the case of high-frequency transmission lines, yielding an RLGC channel model, is currently under completion.

Appendix
Here, some useful mathematical definitions and properties, introduced in [11], are presented. Consider the row vector set {w_1(q), ..., w_v(q)}, where w_i(q) ∈ [ℱ(q)]^{1×p} (i = 1, ..., v) is a nonlinear vector map Q → [ℱ(q)]^{1×p}. The vectors w_i(q) (i = 1, ..., v) are said to be linearly dependent among themselves over ℜ if there exist x_i ∈ ℜ (i = 1, ..., v) with (x_1, ..., x_v) ≠ 0 such that x_1 w_1(q) + ... + x_v w_v(q) = 0, ∀q ∈ Q. If the vectors w_i(q) are not dependent over ℜ, they are called independent over ℜ. Consider the subset N(q) ⊆ [ℱ(q)]^{1×p}, where N(q) = { w(q) ∈ [ℱ(q)]^{1×p} :
w(q) = x_1 w_1(q) + ... + x_v w_v(q), ∀q ∈ Q, x_i ∈ ℜ (i = 1, ..., v) }. It can readily be shown that N(q) is a finite-dimensional vector space over the field of real numbers ℜ. Consider the matrix W(q) = [ [w_1(q)]^T, ..., [w_v(q)]^T ]^T. The image of W(q) over ℜ is defined to be Im_ℜ{W(q)} = N(q). Let w_{f_1}(q), ..., w_{f_μ}(q) be the linearly independent (over ℜ) vectors of {w_1(q), ..., w_v(q)}. The rest of the vectors, say w_{o_1}(q), ..., w_{o_{v−μ}}(q), are linearly dependent (over ℜ) upon the vectors {w_{f_1}(q), ..., w_{f_μ}(q)}. Thus, {w_{f_1}(q), ..., w_{f_μ}(q)} is a base of Im_ℜ{W(q)}, and the dimension dim{Im_ℜ{W(q)}} of the space Im_ℜ{W(q)} is equal to μ. The rank (over the field of real numbers) of W(q) is defined as follows:

rank_ℜ{W(q)} = dim{N(q)} = dim{Im_ℜ{W(q)}}      (A.1)

Consider the following subset of ℜ^v: Ξ = { z = [z_1, ..., z_v] ∈ ℜ^v : z_1 w_1(q) + ... + z_v w_v(q) = 0, ∀q ∈ Q }. This subset is a subspace of ℜ^v. The kernel of W(q) over ℜ is defined to be Ker_ℜ{W(q)} = Ξ. Note that

dim{Ker_ℜ{W(q)}} + dim{Im_ℜ{W(q)}} = v      (A.2)

To derive the q-independent matrix corresponding to Ker_ℜ{W(q)}, define {z_1^w, ..., z_{v−μ}^w} to be a base of Ker_ℜ{W(q)} (z_i^w ∈ ℜ^v). Then, the matrix corresponding to Ker_ℜ{W(q)} is
[ W(q) ]_ℜ^⊥ = [ (z_1^w)^T, ..., (z_{v−μ}^w)^T ]^T      (A.3)
Let e(q) ∈ Im_ℜ{W(q)}. Thus, e(q) ∈ Im_ℜ{W_1(q)}, where W_1(q) = [ [w_{f_1}(q)]^T, ..., [w_{f_μ}(q)]^T ]^T. Since the rows of W_1(q) are linearly independent over ℜ, there exists a unique vector x^+ ∈ ℜ^{1×μ} such that e(q) = x^+ W_1(q). The elements of x^+ are the components of e(q) in Im_ℜ{W(q)} with respect to the base {w_{f_1}(q), ..., w_{f_μ}(q)}. Augment x^+ with zero elements placed at positions corresponding to the linearly dependent (over ℜ) rows of W(q). Based on this augmentation, the following generalization of the components of e(q) in Im_ℜ{W(q)} is derived:

( e(q) \ W(q) )_ℜ = [ χ_1, ..., χ_v ]      (A.4)

where χ_k = x_r^+ if k = f_r ∈ {f_1, ..., f_μ} and χ_k = 0 if k = o_r ∈ {o_1, ..., o_{v−μ}} (k = 1, ..., v), and where x_r^+ is the r-th element of x^+. If the vectors w_{f_1}(q), ..., w_{f_μ}(q) are selected by searching the vectors w_1(q), ..., w_v(q) from the first to the last, then the matrix W_1(q) and the vector ( e(q) \ W(q) )_ℜ
are uniquely determined. Let E(q) = [ [e_1(q)]^T, ..., [e_{v*}(q)]^T ]^T be a v* × p matrix with e_i(q) ∈ Im_ℜ{W(q)}, i = 1, ..., v*. Then (A.4) can be generalized as follows:

( E(q) \ W(q) )_ℜ = [ ( e_1(q) \ W(q) )_ℜ^T, ..., ( e_{v*}(q) \ W(q) )_ℜ^T ]^T      (A.5)
Numerical algorithms for the computation of all the above quantities can be found in [12]. In what follows, the solution of a linear nonhomogeneous algebraic matrix equation with data in ℱ(q) and unknowns in ℜ, derived in [11], is presented. Consider the equation

X W(q) = E(q),  X ∈ ℜ^{v*×v}      (A.6)

The matrices E(q) and W(q) are known nonlinear maps of q. The problem consists in finding X such that (A.6) is satisfied. Clearly, the problem is solvable if and only if the rows of E(q) belong to Im_ℜ{W(q)}, or equivalently (from (A.1)) if and only if
rank_ℜ [ W(q)
         E(q) ] = rank_ℜ [ W(q) ]      (A.7)

If condition (A.7) is satisfied, then (according to (A.3) and (A.4)) the general solution of (A.6) is

X = T [ W(q) ]_ℜ^⊥ + ( E(q) \ W(q) )_ℜ      (A.8)

where T is an arbitrary matrix. Note that T, ( E(q) \ W(q) )_ℜ and [ W(q) ]_ℜ^⊥ are independent of q.
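Equation (A.6) can be approached numerically in a simple way that is not the paper's algorithm but illustrates the idea: sample q over the uncertainty domain, stack the sampled systems side by side, and solve one least-squares problem for the constant X; consistency of the stacked system plays the role of the rank condition (A.7). The maps W and E below are toy assumptions.

```python
import numpy as np

# Numerical sketch of X W(q) = E(q) with a constant unknown X (toy maps, not
# from the paper): sample q, stack the systems, solve by least squares, and
# check consistency (the numerical stand-in for rank condition (A.7)).
def solve_constant_X(W_of_q, E_of_q, q_samples):
    Ws = np.hstack([W_of_q(q) for q in q_samples])   # v  x (p * #samples)
    Es = np.hstack([E_of_q(q) for q in q_samples])   # v* x (p * #samples)
    Xt, *_ = np.linalg.lstsq(Ws.T, Es.T, rcond=None)
    X = Xt.T
    return X, bool(np.allclose(X @ Ws, Es))          # (X, solvable?)

W = lambda q: np.array([[1.0, q], [q, 1.0]])         # v = 2 rows, p = 2 columns
E = lambda q: np.array([[1.0 + q, 1.0 + q]])         # equals [1, 1] @ W(q) for every q

X, ok = solve_constant_X(W, E, q_samples=np.linspace(-1.0, 1.0, 7))
assert ok and np.allclose(X, [[1.0, 1.0]])
```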
References
[1] Vaidyanathan, P. P., 1993, Multirate Systems and Filter Banks, Englewood Cliffs, NJ: Prentice-Hall.
[2] Crochiere, R. E. and Rabiner, L. R., 1983, Multirate Digital Signal Processing, Englewood Cliffs, NJ: Prentice-Hall.
[3] Vaidyanathan, P. P., 1990, Multirate digital filters, filter banks, polyphase networks, and applications: A tutorial, Proc. IEEE, vol. 78, pp. 56-93.
[4] Vetterli, M., 1987, A theory of multirate filter banks, IEEE Trans. Acoust. Speech Signal Processing, vol. 35, pp. 356-372.
[5] Smith, M. J. T. and Barnwell, T. P., III, 1987, A new filter-bank theory for time-frequency representation, IEEE Trans. Acoust. Speech Signal Processing, vol. 35, pp. 314-327.
[6] Vaidyanathan, P. P. and Chen, T., 1995, Role of anticausal inverses in multirate filter banks, Part I: System-theoretic fundamentals, IEEE Trans. Signal Processing, vol. 43, pp. 1090-1102.
[7] Vaidyanathan, P. P. and Chen, T., 1995, Role of anticausal inverses in multirate filter banks, Part II: The FIR case, factorizations, and biorthogonal lapped transforms, IEEE Trans. Signal Processing, vol. 43, pp. 1103-1115.
[8] Forney, G. D., Jr., 1970, Convolutional codes I: Algebraic structure, IEEE Trans. Info. Theory, vol. 16, pp. 720-738.
[9] Combes, P. F., 1990, Microwave Transmission for Telecommunications, New York: Wiley.
[10] Schneier, B., 1994, Applied Cryptography, New York: Wiley.
[11] Koumboulis, F. N. and Skarpetis, M. G., Input-output decoupling for systems with nonlinear uncertain structure, J. Franklin Inst., in press.
[12] Koumboulis, F. N. and Skarpetis, M. G., Robust triangular decoupling with application to 4WS cars, submitted.
Designing and Learning Algorithm of Neural Networks for Pattern Recognition

Hiroki TAKAHASHI, Masayuki NAKAJIMA
Graduate School of Information Science & Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo 152, Japan

Abstract

In the case of pattern recognition using neural networks, it is very difficult for researchers or users to design them. In this paper, a method of learning and designing feedforward neural networks is discussed. In the proposed method, a neural network is regarded as one individual, and neural networks with the same structure are regarded as one species. These networks are evaluated by their grade of training, and they evolve according to an evolution rule proposed in this paper. The designing and training of neural networks which perform handwritten KATAKANA recognition are described, and the efficiency of the proposed method is discussed.
1 Introduction

There are many studies on neural network models which have the function of learning. However, it is not clear how the signal processing is performed in neural networks, because the nonlinear units operate in parallel. In the case of designing neural networks which perform pattern recognition, researchers design the network structures and learning parameters, such as the learning rate and the coefficient of the momentum term, by trial and error based on their knowledge and experience. Especially in the case of character recognition with neural networks, it is very difficult for researchers to design by trial and error, because the network is large and it takes a long time to confirm its performance. There are many studies on designing neural networks [1][2][3][4][5]. These studies are classified into two kinds of approaches. One is the direct encoding method [3] and the other is the grammatical encoding method [1][2]. The direct encoding method has some restrictions on neural network structures, because the network connectivities are encoded into a matrix directly. The grammatical encoding method is more flexible than the direct method; however, it is difficult to obtain an optimal network structure. Moreover, a structural evolution method is proposed in [4]. The method makes it possible to generate any kind of neural network structure, but the connections have only three kinds of connection weights. Therefore, it is difficult to generate networks for complex pattern recognition. The authors proposed a method of designing optimal neural network structures using GA (Genetic Algorithms) [6][7]. Moreover, we also designed and trained neural network structures which classified some simple patterns [8]. In this paper, a method of learning and designing feedforward neural network structures based on an evolutional method is discussed. In the proposed method, a neural network is regarded as one individual, and neural networks with the same structure are regarded as one species. The decision of network structures and the training of the neural networks are estimated based on the fitness values of individuals and species. The designing and training of neural networks which perform handwritten character recognition are described, and the efficiency of the proposed method is discussed.
2 Genotype coding

Table 1 shows the genotype codings employed in the proposed method.

Table 1: Genotype codings of neural network.
  genotype1: N                     number of neural network layers
  genotype2: n = (n_1, ..., n_N)   unit numbers of each layer
  genotype3: η                     learning rate
  genotype4: w = (w_1, ..., w_l)   connection weights
Genotype1 and 2 present the structure of the neural network. The length of genotype4 is based on the number of connections l, which is restricted by the network structure given in genotype1 and 2. The length of genotype4 is given by the following formula:

l = Σ_{k=1}^{N−1} (n_k + 1) n_{k+1}      (1)

where l gives the number of connection weights (the +1 accounting for the bias input of each unit).
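Formula (1) can be written directly as a one-line function (hypothetical helper name, for illustration):

```python
def genotype4_length(units):
    """Number of connection weights for a feedforward net with layer sizes
    units = (n_1, ..., n_N); the (+1) term accounts for each unit's bias input."""
    return sum((units[k] + 1) * units[k + 1] for k in range(len(units) - 1))

# e.g. the (2, 2, 1) structure of Fig. 1: (2+1)*2 + (2+1)*1 = 9 weights
assert genotype4_length((2, 2, 1)) == 9
```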
3 Definition of species

The training of neural networks is a minimum search problem in the error-weights space. If the structures of the neural networks are different, the shapes of the error spaces become different; therefore, it is difficult to compare the search positions in neural networks of different structures. In this paper, the individuals whose neural network structures are the same are defined as the same species. That is, individuals with the same phenotypes, represented by genotype1 and 2 shown in Table 1, are regarded as the same species.
4 Definition of evolution rule of individuals and evaluation

In this section, operations between the individuals in the same species are described. The operation described here is performed every 10 epochs; in the other epochs, the weights represented by genotype4 are changed according to the direction of gradient descent in weight space.

1) Evaluation
The fitness value f(I_i) of individual I_i is defined by the M.S.E. (mean square error) calculated from the output values of the network and its target values. Therefore, the smaller the fitness value is, the more superior the individual is.

f(I_i) = Σ_{p=1}^{N_p} Σ_{k=1}^{n_N} ( o_{pk}^N − t_{pk} )²      (2)

N: number of layers;  N_p: number of patterns;  n_N: unit number of the N-th layer;  o_{pk}^N: output value of the k-th unit of the N-th layer for pattern p;  t_{pk}: target value of the k-th unit for pattern p.
2) Selection
In the same species, individuals with large fitness values are removed and new individuals are created according to a selection ratio P_s.

3) Crossover and Multiplication
In this method, two kinds of generating operations are defined. One is the crossover operation and the other is the multiplication operation. The crossover operation generates a new individual from two different individuals; the new individual inherits the features of its parents. Therefore, we employ a one-point crossover operation at crossover ratio P_c. The multiplication operation generates a new individual from one individual. In the multiplication operation, new individuals are multiplied according to the following formulas, in order to create them distributed near the superior individual I_s, at multiplication ratio P_i.
w_new = w_s + 0.4 × r(w_s)      (3)

η_new = η_s + 0.4 × r(η_s)      (4)

w_s: connection weights of individual I_s;  η_s: learning parameter of individual I_s;  w_new: connection weights of individual I_new;  η_new: learning parameter of individual I_new. The function r(a) produces random numbers x in the range −|a| < x ≤ |a|. (1 − P_c − P_i)N_s individuals are created when the number of individuals in the same species is N_s, where P_c + P_i ≤ 1.
4) Mutation
There are two kinds of mutation operations, as follows:
a) A genotype is initialized at the mutation ratio P_m/2.
b) A genotype is evolved according to formula (3) or formula (4) at the mutation ratio P_m/2.
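One generation of steps 1)-4) can be sketched as follows. The individual encoding (a flat weight list plus a learning rate) and the concrete ratio values are assumptions for illustration; only the operations, ranking by fitness, selection, one-point crossover, multiplication near the best individual via formulas (3)-(4), and mutation, follow the text.

```python
import random

# Minimal sketch of one generation within a single species (steps 1-4 above).
# Individual = (weight_list, learning_rate); ratios Ps, Pc, Pi, Pm hypothetical.
def evolve_species(population, fitness, Ps=0.2, Pc=0.3, Pi=0.3, Pm=0.1):
    # 1) evaluation + 2) selection: smaller fitness is better; drop the worst Ps
    ranked = sorted(population, key=fitness)
    survivors = ranked[:max(2, int(len(ranked) * (1 - Ps)))]
    best_w, best_eta = survivors[0]
    children = []
    # 3) crossover: one-point crossover between two survivors
    for _ in range(int(len(population) * Pc)):
        (wa, ea), (wb, eb) = random.sample(survivors, 2)
        cut = random.randrange(1, len(wa))
        children.append((wa[:cut] + wb[cut:], random.choice([ea, eb])))
    # 3) multiplication: offspring near the best individual, formulas (3)-(4)
    for _ in range(int(len(population) * Pi)):
        children.append(([w + 0.4 * random.uniform(-abs(w), abs(w)) for w in best_w],
                         best_eta + 0.4 * random.uniform(-abs(best_eta), abs(best_eta))))
    out = survivors + children
    # 4) mutation: with probability Pm/2 reinitialize, with Pm/2 perturb
    for i in range(len(out)):
        u = random.random()
        if u < Pm / 2:
            out[i] = ([random.uniform(-1, 1) for _ in out[i][0]], random.uniform(0.01, 1.0))
        elif u < Pm:
            w, eta = out[i]
            out[i] = ([x + 0.4 * random.uniform(-abs(x), abs(x)) for x in w],
                      eta + 0.4 * random.uniform(-abs(eta), abs(eta)))
    return out

random.seed(1)
pop = [([random.uniform(-1, 1) for _ in range(4)], 0.5) for _ in range(10)]
sse = lambda ind: sum(x * x for x in ind[0])   # toy fitness standing in for eq. (2)
nxt = evolve_species(pop, sse)
assert len(nxt) == 8 + 3 + 3 and all(len(ind[0]) == 4 for ind in nxt)
```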
5 Definition of evolution rule of species and evaluation

In this section, we make use of the characteristics of feedforward neural network structures to define an evolution rule of species.

5.1 Investigation of feedforward neural network structures

An N-layered feedforward neural network is represented by the unit number of each layer as n = (n_1, n_2, ..., n_N). We discuss a network which has 2 units in the input layer and 1 unit in the output layer. Fig. 1 shows structures of neural networks classified by the number of layers and the unit number of each layer. It follows from Fig. 1 that the following two kinds of rules are enough to produce any kind of neural network structure.
1. The number of layers increases by 1 and one unit is added just before the output layer. For example, this corresponds to changes such as (a) → (b) → (f) or (c) → (g).
2. The number of units just before the output layer increases by 1. For example, this corresponds to changes such as (b) → (c) → (d) → (e) or (g) → (k) → (n).
5.2 Evolution rule of species

The features of feedforward neural network structures mentioned in the previous subsection are employed in our evolution rule of species. The change rule is shown in Fig. 2. The change rule produces at most three kinds of neural networks from one kind of structure. It is given by the following formula (5):

(n_1, ..., n_{N−1}, n_N) →
  (n_1, ..., n_{N−1} + 1, n_N)
  (n_1, ..., n_{N−1}, 1, n_N)
  (n_1, ..., n_{N−1} − 1, n_N)  if n_{N−1} ≥ 2
  (n_1, ..., n_{N−2}, n_N)      if n_{N−1} = 1      (5)
Fig. 1: Structures of neural networks, classified by the number of layers and the unit numbers of each layer (e.g. (2,1); (2,1,1), (2,2,1), (2,3,1), (2,4,1); (2,1,1,1), (2,2,1,1), (2,1,2,1), ...).

Fig. 2: Changes of neural network structures.

5.3 Evaluation and multiplication rule of species
1) Evaluation
In the evaluation of species, there are two kinds of criteria:
a) The fitness value of the most superior individual in the species. The species containing the most superior individual among all individuals is regarded as the superior one.
b) The change of the fitness value of the most superior individual in the species. The evolution of a species is assumed to be delayed when there are few changes in the fitness value of its superior individual over some generations.

2) Multiplication
a) Among the species ranked by evaluation a), the top 25% of species increase their number of individuals. When the number of individuals in a species increases up to a certain number, the evolution of the species occurs according to formula (5). On the other hand, the bottom 25% of species decrease their number of individuals.
b) By evaluation b), when the evolution of a species is delayed, its number of individuals decreases; when the evolution proceeds, its number of individuals increases.
c) When a species is superior but its evolution is delayed, its number of individuals does not decrease below a certain number. On the other hand, when a species is inferior, it becomes extinct. In that case, the evolution of species occurs according to formula (5).
6 Experimental results
The neural network structures recognize the handwritten characters illustrated in Fig.3. These images are 17 x 17 pixels. We use 10 kinds of training patterns and 10 kinds of recognition patterns, respectively. Initially, ten kinds of species are generated randomly, each containing ten kinds of individuals. The state of evolution of the species and the changes of the fitness values of the individuals are shown in Fig.4.
Fig.3 Experimental Handwritten characters.
Fig.4 Experimental results.

7 Conclusions
In this paper, an evolution rule for feedforward neural network structures has been proposed. Moreover, we exploit the characteristics of the optimal search problem to train neural networks using an evolutionary method.
References
[1] KITANO, H.: "Designing Neural Networks using Genetic Algorithms with Graph Generation System", Complex Systems, 4, 4, pp.461-476 (1991).
[2] KOZA, J. R.: Genetic Programming, The MIT Press (1992).
[3] MILLER, G. and TODD, P.: "Designing neural networks using Genetic Algorithms", Proc. 3rd International Conference on Genetic Algorithms, pp.379-384 (1989).
[4] NAGAO, T., AGUI, T. and NAGAHASHI, H.: "Structural Evolution of Neural Networks Having Arbitrary Connections by a Genetic Method", IEICE Trans., E76-D, 6, pp.689-697 (1993).
[5] NAGAO, T., AGUI, T. and NAGAHASHI, H.: "An Automatic GA-Based Construction of Neural Networks for Motion Control of Virtual Life", IEICE Trans., J78-D-II, 7, pp.1150-1152 (1995).
[6] TAKAHASHI, H., AGUI, T. and NAGAHASHI, H.: "Designing adaptive neural network architectures and their learning parameters using genetic algorithms", SPIE Aerospace and Remote Sensing '93 (1993).
[7] TAKAHASHI, H., AGUI, T. and NAGAHASHI, H.: "Designing neural networks using several kinds of activation functions by genetic algorithms", 3rd International Conference on Computer Graphics and Image Processing GKPO'94, pp.393-402 (1994).
[8] TAKAHASHI, H. and NAKAJIMA, M.: "Designing and Learning Feedforward Neural Networks using Structure Evolution Rule", Digital Image Computing: Techniques and Applications, pp.625-630 (1995).
STATISTICAL COMPARISON OF MINIMUM CROSS ENTROPY SPECTRAL ESTIMATORS

R C Papademetriou, Member, IEEE
Department of Electrical and Electronic Engineering, University of Portsmouth, Portsmouth PO1 3DJ, England

ABSTRACT

The performance of two computationally intensive non-linear power spectral estimation methods is studied from the viewpoint of resolution. The methods, which explicitly include a prior estimate of the unknown true power spectrum, are the cross-entropy (CE) method and the spectral cross-entropy (SCE) method. The data model used comprises two sinusoidal signals of equal amplitudes immersed in 1/f noise. CE appears to have the overall edge in this study.
1. INTRODUCTION
In minimum cross (or relative) entropy spectral analysis (MCESA), two methods are in use: the cross-entropy (CE) method [1] and the spectral cross-entropy (SCE) method [2, 3, 4]. Their common characteristic is the explicit inclusion of a prior estimate of the true power spectrum. Their conceptual difference is that in the CE method the underlying random variables are the coefficients of a Fourier series model and the spectral powers are expected values, whereas in the SCE method the spectral power, properly normalized, is used as a probability density function (pdf) and the underlying random variable is the frequency. A few efforts have been made to study and compare the properties of CE and SCE [5, 6, 7]. This paper attempts a comparative statistical study of the two MCE spectrum estimators through numerical examples. Performance is judged on the basis of such factors as peak location, minimum peak separation for resolution, and high resolution probability.

2. COMPARISON OF POWER SPECTRA
The data model used here comprises two equal-strength sinusoids in background 1/f noise. More specifically, for the two equal-amplitude, different-frequency sinusoids, N samples (here N = 200) were generated from the series

y_j = 2 sin(2π f₁ t_j + Φ₁) + 2 sin(2π f₂ t_j + Φ₂)   (1)

where t_j = 0, 1, 2, ..., N-1 and Φ₁, Φ₂ are random initial phases (the factor of two accounts for negative frequencies). Realizations of the coloured (1/f) noise process were easily generated by passing Gaussian white noise (GWN) through a finite impulse response (FIR) filter with a frequency response of the form 1/√f over the normalized frequency range 0 < f ≤ 0.5 [8]. From the time-domain samples
z_j = y_j + n_j   (2)

with n_j realization samples of the 1/f noise, the autocorrelations R_r, r = 0, 1, ..., 5, were computed by means of the biased estimate

R_r = (1/N) Σ_{j=1}^{N-r} z_j z_{j+r}   (3)

(which guarantees positive-definiteness), and then the CE and SCE spectra were estimated from these autocorrelations, always assuming the background noise as the prior power spectrum.
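The data model of Eqs. (1)-(3) can be simulated as follows. This is a minimal sketch: the FFT-based 1/f noise generator stands in for the FIR filtering of Gaussian white noise used in the paper, and no SNR scaling is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
N, f1, f2 = 200, 0.165, 0.315
t = np.arange(N)
phi1, phi2 = rng.uniform(0, 2 * np.pi, 2)
# Eq. (1): two equal-amplitude sinusoids with random phases.
y = 2 * np.sin(2 * np.pi * f1 * t + phi1) + 2 * np.sin(2 * np.pi * f2 * t + phi2)

# 1/f noise: shape white Gaussian noise by 1/sqrt(f) in the frequency domain.
w = rng.standard_normal(N)
W = np.fft.rfft(w)
f = np.fft.rfftfreq(N)
f[0] = f[1]                      # avoid division by zero at DC
n = np.fft.irfft(W / np.sqrt(f), N)

z = y + n                        # Eq. (2)

def biased_autocorr(z, max_lag):
    """Biased estimate R_r = (1/N) sum_j z_j z_{j+r}, Eq. (3)."""
    Nz = len(z)
    return np.array([np.dot(z[:Nz - r], z[r:]) / Nz for r in range(max_lag + 1)])

R = biased_autocorr(z, 5)        # R_0 ... R_5, as used in the paper
```

The CE and SCE estimators themselves (not shown) would then be fitted to these six autocorrelation values with the noise spectrum as prior.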
Figure 1 - CE vs SCE estimates for different SNRs
The number of realizations used here was fifty. Figure 1 gives the (average) CE and SCE spectra for different SNR values. The two sinusoids were arbitrarily located at f₁ = 0.165 and f₂ = 0.315. The CE method can resolve the two components even at low SNRs (e.g. -8 dB), while the SCE method is comparable only at higher SNRs.
3. RESOLUTION CAPABILITY
In order to compare the resolution capability of the two estimators, the resolution boundary concept [9] is used here. For each fixed SNR, this may be defined as the minimum frequency separation, Δf, necessary for resolution. In other words, the resolution boundary is a curve dividing the Δf-SNR plane into two regions: above the boundary two sinusoids may be resolved; below the boundary they will not be resolved. Values of Δf for different SNRs were found (using the same model as in Sec. 2) by bringing the second sinusoid closer to the first, which was kept at a fixed frequency (arbitrarily chosen to be 0.165), until resolution was lost. Figure 2 shows that the minimum distance Δf for the peaks to be resolved in the CE and SCE estimates decreases with increasing S/N ratio, with the CE method consistently outperforming the SCE method (i.e., the CE resolution boundary lies below the SCE one). It must be pointed out, however, that, because the resolution boundary is peak-location dependent for 1/f noise, the resolution measure Δf should not be considered an absolute measure but only an indicator of the relative performance.

Figure 2 - Resolution Boundary Curves
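The resolution-boundary search at a fixed SNR can be sketched generically as below. The peak-dip criterion, the coarse Δf grid, and the plain-periodogram estimator used for testing are assumptions introduced for illustration; the paper's CE or SCE estimator would be supplied as `estimate_spectrum`.

```python
import numpy as np

def two_peaks_resolved(spectrum, idx1, idx2):
    """Two peaks count as resolved if the spectrum dips between them."""
    lo, hi = sorted((idx1, idx2))
    valley = spectrum[lo:hi + 1].min()
    return valley < 0.9 * min(spectrum[lo], spectrum[hi])

def min_separation(estimate_spectrum, f_fixed=0.165, df_grid=None):
    """Smallest df (from a coarse grid) at which both peaks stay resolved.

    estimate_spectrum(fa, fb) must return (freqs, spectrum) for one
    realization containing sinusoids at fa and fb.
    """
    if df_grid is None:
        df_grid = np.arange(0.15, 0.005, -0.005)   # shrink the separation
    resolved_df = None
    for df in df_grid:
        freqs, spec = estimate_spectrum(f_fixed, f_fixed + df)
        i1 = np.argmin(np.abs(freqs - f_fixed))
        i2 = np.argmin(np.abs(freqs - (f_fixed + df)))
        if two_peaks_resolved(spec, i1, i2):
            resolved_df = df
        else:
            break                 # resolution lost; boundary reached
    return resolved_df
```

Repeating this for each SNR traces out one resolution-boundary curve of Figure 2.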
4. ACCURACY OF PEAK LOCATION
For the comparison of the location of spectral peak (LOSP) error in the two methods, the two sinusoids were fixed at the arbitrarily chosen frequencies f₁ = 0.165 and f₂ = 0.315. For each SNR value the error was calculated (in both methods) by the following quantitative measure:

LOSP error = |f'₁ - f₁| + |f'₂ - f₂|   (4)

where f'₁ and f'₂ represent the estimated frequency locations. The variation of the LOSP error (frequency bias) with the SNR for the two methods is shown in Figure 3. The error increases when the SNR is decreased, but always remains smaller for the CE method, which shows evident superiority. The rate of change of the bias appears to be higher for SNRs smaller than a certain value (in this case -5 dB) for both methods.
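The measure of Eq. (4) is straightforward to compute once the peak locations have been estimated; the estimated values below are hypothetical, chosen only to exercise the formula.

```python
import numpy as np

def losp_error(f_est, f_true):
    """LOSP error of Eq. (4): sum of absolute peak-location errors.

    f_est, f_true : sequences of estimated / true peak frequencies,
    matched in order (f'_1 with f_1, f'_2 with f_2).
    """
    return float(np.sum(np.abs(np.asarray(f_est) - np.asarray(f_true))))

# Hypothetical estimated locations for the true pair (0.165, 0.315):
err = losp_error([0.162, 0.320], [0.165, 0.315])   # ~0.008
```

Averaging this quantity over realizations at each SNR produces one point of the curves in Figure 3.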
Figure 3 - LOSP Error Curves

5. RESOLUTION PROBABILITY
The comparison of the two estimators is completed by considering the variation with SNR of another measure of resolvability: the high resolution probability (HRPr), defined [10] as follows. Given two frequencies f₁ and f₂, let Δ₀ = |f₂ - f₁|/2. The two frequencies are separable with probability a (i.e. the HRPr) by an estimation method if a = min(a₁, a₂), where

P{ |f'ᵢ - fᵢ| < Δ₀ } = aᵢ,   i = 1, 2   (5)

and f'₁, f'₂ are the frequency estimates obtained by the estimation method. The HRPr curves for CE and SCE, shown in Figure 4, were derived by averaging over 100 realizations for the same frequency locations as in Section 4. They reveal that the CE method, having higher probability values than the SCE method for all SNRs below zero, is the better high resolution estimation method. This conclusion is drawn from many simulation experiments with different frequency locations f₁ and f₂. The only expected difference is that, as these frequencies move towards the higher-power part of the 1/f spectrum, the HRPr curves converge to one at larger SNRs.

Figure 4 - High Resolution Probability Curves
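A Monte-Carlo evaluation of the HRPr of Eq. (5) can be sketched as follows. The estimator interface and trial count are illustrative assumptions; the paper's CE or SCE estimator would be supplied as `estimate_peaks`.

```python
import numpy as np

def high_resolution_probability(estimate_peaks, f1, f2, n_trials=100, seed=0):
    """Monte-Carlo estimate of the HRPr of Eq. (5).

    estimate_peaks(rng) must return a pair of estimated frequencies
    (f'_1, f'_2) for one noisy realization; it stands in for the
    CE or SCE estimator of the paper.
    """
    rng = np.random.default_rng(seed)
    delta0 = abs(f2 - f1) / 2
    hits1 = hits2 = 0
    for _ in range(n_trials):
        fe1, fe2 = estimate_peaks(rng)
        hits1 += abs(fe1 - f1) < delta0
        hits2 += abs(fe2 - f2) < delta0
    # a = min(a1, a2): both peaks must individually stay within delta0.
    return min(hits1, hits2) / n_trials
```

Sweeping the SNR of the underlying data model and repeating this estimate yields one HRPr curve of Figure 4.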
6. CONCLUSIONS
This paper has been concerned with studying the properties of CE and SCE estimators through numerical examples. The simulation experiments performed were for data sets consisting of sinusoids in 1/f noise. In brief, the above comparative analysis leads to the conclusion that the CE method shows evident superiority.
REFERENCES

[1] J E Shore, "Minimum Cross-Entropy Spectral Analysis", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, No. 2, pp. 230-237, Apr. 1981.
[2] L Vergara-Dominguez and A R Figueiras-Vidal, "A Minimum Cross Flatness Spectral Estimator and Some Related Problems", Proc. Portugal Workshop on Signal Proc. and Its Applications, pp. A2/2/1-9, Sept. 30-Oct. 1, 1982.
[3] R W Johnson and J E Shore, "Which is the Better Entropy Expression for Speech Processing: -S log S or log S?", IEEE Trans. Acoust., Speech, Signal Proc., ASSP-32, No. 1, pp. 129-137, Feb. 1984.
[4] M A Tzannes, D Politis and N S Tzannes, "A General Method of Minimum Cross Entropy Spectral Estimation", IEEE Trans. Acoust., Speech, Signal Proc., ASSP-33, No. 3, pp. 748-752, June 1985.
[5] N A Katsakos-Mavromichalis, M A Tzannes and N S Tzannes, "Frequency Resolution: A Comparative Study of Four Entropy Methods", Kybernetes, vol. 15, pp. 25-32, 1986.
[6] R C Papademetriou and A G Constantinides, "Minimum Relative Entropy Spectral Estimation with Uncertainty in Constraints and Prior Knowledge", in Digital Signal Processing '87, V Cappellini and A G Constantinides (Eds), Elsevier Science Publishers B.V. (North-Holland), pp. 84-88, 1987.
[7] R C Papademetriou, "On the Robustness of Minimum Cross Entropy Spectral Estimators", Proc. IEEE Int. Conf. Signal Proc., Circuits and Systems (IEEE SICSPCS '95), pp. 157-162, Singapore, 3-7 July 1995.
[8] J H McClellan, T W Parks, and L R Rabiner, "FIR Linear Phase Filter Design Program", in Programs for Digital Signal Processing, IEEE Press, 1979.
[9] M Quirk and B Liu, "On the Resolution of Autoregressive Spectral Estimation", Proc. IEEE Int. Conf. Acoust., Speech, Signal Proc. (ICASSP '83), pp. 1095-1098, Boston, 1983.
[10] S G Oh and R L Kashyap, "A Robust Approach for High Resolution Frequency Estimation", IEEE Trans. Signal Proc., vol. 39, No. 3, pp. 627-643, March 1991.
GENERALIZED OPTIMUM APPROXIMATION MINIMIZING VARIOUS MEASURES OF ERROR AT THE SAME TIME

Takuro KIDA
Department of Information Processing, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama-shi, 227 JAPAN. e-mail: [email protected]

Abstract
There are many contributions to the interpolation of signals relating to the problem of suppressing the immanent redundancy contained in a signal without vitiating the quality of the resultant approximation. Here, minimization of the approximation error becomes important. In this paper, we present the optimum time-limited interpolation functions which simultaneously minimize a wide variety of measures of approximation error. It is assumed that the Fourier spectrum of the signal has a weighted L² norm smaller than a given positive number. The sample values are selected from the output signals of the given analysis filter bank. It should be noted that the proposed approximation is superior, in some sense, to all the linear and nonlinear approximations using the same set of signals, measure of error and sample values.
1 GENERALIZED INTERPOLATION
Let L̂² be the set of functions F(w) satisfying ∫_{-∞}^{+∞} |F(w)|² dw < +∞ and F(w) = F̄(-w), where z̄ denotes the complex conjugate of a complex number z. Further, let L² be the set of signals f(t) satisfying f(t) = (1/2π) ∫_{-∞}^{+∞} F(w) e^{jwt} dw, F(w) ∈ L̂². Note that f(t) has real values. For simplicity we write f(t) ←→ F(w) for a pair of functions f(t) and F(w) satisfying the above relations. Suppose that W(w) is an arbitrary function in L̂². Then, we denote by Θ_A the set of F(w) in L̂² satisfying the following inequality with respect to a given positive number A:

(1/2π) ∫_{-∞}^{+∞} |F(w)|² / |W(w)|² dw ≤ A   (1)
Moreover, let Γ be the set of signals f(t) in L² expressed as the inverse Fourier transform of F(w) in Θ_A. Consider inputting f(t) into M linear time-invariant filters with transfer functions H_m(w) (m = 0, 1, ..., M-1). We denote by f_m(t) the corresponding output of the filter H_m(w), and let F_m(w) = H_m(w)F(w) be the Fourier spectrum of f_m(t), where m = 0, 1, ..., M-1. Further, we assume ∫_{-∞}^{+∞} |W(w)|² |H_m(w)|² dw < +∞. From the Schwarz inequalities for ∫_{-∞}^{+∞} W(w){F(w)/W(w)} dw and ∫_{-∞}^{+∞} {W(w)H_m(w)}{F(w)/W(w)} dw, we can easily prove that F(w) and F_m(w) are absolutely integrable. Hence f(t) and f_m(t) are continuous. Consider the equally spaced sample points S = {nT} (T > 0) (n = 0, ±1, ±2, ...). Further, let y_m(nT) be the sample values of f_m(t) at the sample points nT. Now, we consider the problem of approximating f(t - d) from the above sample values, where d is a given delay time. Let

g(t) = Σ_{m=0}^{M-1} Σ_{n=-∞}^{+∞} y_m(nT) φ_{m,n}(t)   (2)
be the corresponding approximation of f(t - d). The functions φ_{m,n}(t) are prescribed bounded real functions called the generalized interpolation functions or, simply, interpolation functions. We assume that these interpolation functions satisfy

|φ_{m,n}(t)| = 0 (t < nT),   |φ_{m,n}(t)| = 0 (t > nT + A)   (3)

where A > 0 (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...). Now, we assume that the above quantity A satisfies

A = NT + r   (N a non-negative integer, 0 ≤ r < T)   (4)
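The synthesis formula (2) with a kernel time-limited as in (3) can be illustrated numerically. This is only a sketch: it uses a single channel (M = 1) and a triangular kernel, which is an arbitrary illustrative choice, not the optimum interpolation function derived in the paper.

```python
import numpy as np

T = 1.0
A = 2 * T                      # support length: N = 2, r = 0 in Eq. (4)

def phi(t):
    """Triangular kernel supported on [0, A], as required by Eq. (3)."""
    return np.where((t >= 0) & (t <= A), 1 - np.abs(t - T) / T, 0.0)

def synthesize(samples, t):
    """g(t) = sum_n y(nT) * phi(t - nT): a one-channel case of Eq. (2)."""
    return sum(y * phi(t - n * T) for n, y in enumerate(samples))

# With this kernel, g(t) linearly interpolates the samples with delay d = T:
g = synthesize([0.0, 1.0, 2.0], np.array([1.5]))   # midway between samples 0 and 1
```

Because the hat is centred at T, each sample influences g only on [nT, nT + A], i.e. the approximation is causal with delay d = T, matching the time-limited structure the paper imposes.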
For a while, we consider that t is a parameter and τ is the ordinary time variable extending from -∞ to +∞. For convenience, we restate the constraints imposed on the interpolation functions using the variable τ:

|φ_{m,n}(τ)| = 0 (τ < nT),   |φ_{m,n}(τ)| = 0 (τ > nT + A)   (5)

where A > 0 (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...). Further, let R_t be the integer satisfying R_t T ≤ t < (R_t + 1)T. Then, we consider the following two intervals:

I¹_t = { t | R_t T ≤ t < R_t T + r },   I²_t = { t | R_t T + r ≤ t < (R_t + 1)T }   (6)
Moreover, we define the integers Q_t and L_t as follows: (i) if t ∈ I¹_t, Q_t = N and L_t = M(Q_t + 1) = M(N + 1); (ii) if t ∈ I²_t, Q_t = N - 1 and L_t = M(Q_t + 1) = MN. In the following, we consider the one-to-one correspondence between an integer p (= 1, 2, ..., L_t) and a pair of integers (m, n) (m = 0, 1, ..., M-1; n = R_t - Q_t, R_t - (Q_t - 1), R_t - (Q_t - 2), ..., R_t):

p = ξ¹(m, n) (t ∈ I¹_t),   p = ξ²(m, n) (t ∈ I²_t)   (7)

where the number of pairs (m, n) is L_t = M(Q_t + 1) and is identical with the number of values of p. For simplicity, we sometimes write p = ξ(m, n) instead of p = ξ¹(m, n) or p = ξ²(m, n). The approximation error between f(τ - d) and g(τ) is defined by e(τ) = f(τ - d) - g(τ). Further, let E_max(τ) be the upper limit of |e(τ)| obtained by fixing all the interpolation functions φ_{m,n}(τ) and letting f(τ) range over all the signals in Γ:

E_max(τ) = sup_{f(τ)∈Γ} { |e(τ)| }   (8)

In the following, we frequently consider a time or frequency function h(τ, t) ←→ H(u, t) with a parameter t, where ←→ denotes a Fourier transform pair with respect to τ and u. Further, we denote by γ the operator defined by h(t, t) = γh(τ, t). Now, we consider new functions φ_{m,n}(τ, t) (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...) with a variable τ and a parameter t. We assume that

φ_{m,n}(τ, t) = 0   (n < R_t - Q_t or n > R_t)   (9)

Moreover, we assume that the interpolation functions satisfy φ_{m,n}(t) = γφ_{m,n}(τ, t). From Eq.(9), when n does not satisfy R_t - Q_t ≤ n ≤ R_t, the interpolation functions φ_{m,n}(t) = φ_{m,n}(t, t) vanish. Hence, it is necessary to confirm that this condition does not contradict the constraint shown in Eq.(5). Now, recall that R_t T ≤ t < (R_t + 1)T. Further, we consider the range of t satisfying R_t - Q_t ≤ n ≤ R_t for a given integer n. If t = nT, then R_t = n holds, and this gives the minimum value of t. If nT + NT ≤ t < nT + NT + r, then R_t = n + N = n + Q_t holds and t ∈ I¹_t = {t | R_t T ≤ t < R_t T + r} for this R_t. In this case, n = R_t - Q_t holds, which gives the supremum value t = nT + NT + r. When t is in the range nT + (N - 1)T + r ≤ t < nT + NT, R_t = n + (N - 1) = n + Q_t holds and t ∈ I²_t = {t | R_t T + r ≤ t < (R_t + 1)T} for this R_t. In this case, n = R_t - Q_t also holds. But the supremum value of t, that is, (R_t + 1)T = (n + N)T, is not larger than the previous supremum value t = nT + NT + r. In conclusion, the interpolation functions φ_{m,n}(t) = φ_{m,n}(t, t) have meaningful values in nT ≤ t ≤ nT + NT + r, which does not contradict the constraint of Eq.(5). As shown in [2], for a given t, we have
E_max(t) = √A [ (1/2π) ∫_{-∞}^{+∞} |W(w)|² | e^{jw(t-d)} - Σ_{m=0}^{M-1} Σ_{n=R_t-Q_t}^{R_t} φ_{m,n}(t) H_m(w) e^{jwnT} |² dw ]^{1/2}   (10)

(t ∈ I^k_t)   (k = 1 or 2)   (11)
Let Ω_t be the set of pairs (m, n) composed of m and n satisfying m = 0, 1, ..., M-1 and n = R_t - Q_t, R_t - (Q_t - 1), R_t - (Q_t - 2), ..., R_t. Minimizing Eq.(11) is straightforward, as shown in [2]. Firstly, we expand E_max(t)² with respect to the φ_{m,n}(t) under consideration, differentiate E_max(t)² with respect to the complex conjugates of the interpolation functions φ_{m,n}(t) which actually contribute to the approximation at the prescribed t, and set the resulting formulas to zero, that is, ∂E_max(t)²/∂φ̄_{m,n}(t) = 0, where m = 0, 1, ..., M-1 and n = R_t - Q_t, R_t - (Q_t - 1), R_t - (Q_t - 2), ..., R_t. Further, let φ_{m,n}(t) (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...) be the optimum interpolation functions which minimize E_max(t). Then it is proved that there exists a set of functions φ_m(t) (m = 0, 1, ..., M-1) satisfying φ_{m,n}(t) = φ_m(t - nT) (m = 0, 1, ..., M-1; n = 0, ±1, ±2, ...). Moreover, E_max(t) = E_max(t + T) holds for the E_max(t) which uses these optimum interpolation functions. Hence, g(t) is expressed by g(t) = Σ_{m=0}^{M-1} Σ_{n=-∞}^{+∞} f_m(nT) φ_m(t - nT). Then, Eq.(5) can be expressed equivalently by |φ_m(t)| = 0 (t < 0, t > A) (m = 0, 1, ..., M-1). Using these relations, if we perform the above operation for t satisfying 0 ≤ t < T, we can obtain all the functional forms of the interpolation functions.
2 THE OPTIMUM APPROXIMATION
Let τ and u be a pair of time and frequency variables. Now, we extend the above discussion. Let ĝ(τ) = v[{f_m(nT)}; τ] be a linear/nonlinear approximation of f(τ). We assume that ĝ(τ) uses the sample values f_m(nT) (m = 0, 1, ..., M-1; n = R_t - Q_t, R_t - (Q_t - 1), R_t - (Q_t - 2), ..., R_t) when τ is equal to t. We assume that v[{f_m(nT)}; τ] vanishes when τ = t holds and all the f_m(nT) (m = 0, 1, ..., M-1; n = R_t - Q_t, R_t - (Q_t - 1), ..., R_t) are zero. For arbitrary f(τ) in Γ, we assume that there exist f(τ, t) and g(τ, t) satisfying f(t) = γf(τ, t) and g(t) = γg(τ, t). Since the error ê(τ) = f(τ - d) - ĝ(τ) depends on the signal f(τ, t), we express the error as a functional of f(τ, t). We denote by d[ê(τ)] a function/a functional/an operator of ê(τ). We assume that d[ê(τ)] has a non-negative value. Moreover, let Θ be a subset of the set of signals Γ. Then, consider the following measure of error E_Θ(τ) for a signal f(τ) in Θ.
E_Θ(τ) = sup_{f(τ)∈Θ} { d[ê(τ)] }

With respect to E_Θ(τ), we naturally assume that E_{Θ₁}(τ) ≤ E_Θ(τ) holds for every set of signals Θ₁ satisfying Θ₁ ⊆ Θ. Further, let E(τ) = E_Γ(τ) be the objective measure of error to be minimized. We consider a new inner product and norm such as

(B(u), C(u)) = (1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} B(u) C̄(u) du,   ||B(u)|| = [ (1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} |B(u)|² du ]^{1/2}

respectively, where B(u) and C(u) are arbitrary functions satisfying ||B(u)|| < +∞ and ||C(u)|| < +∞. Further, we assume that all the functions H_m(u)e^{junT} ((m, n) ∈ Ω_t) are independent of each other. Then, applying the Schmidt orthogonalization algorithm to the set of functions H_m(u)e^{junT} ((m, n) ∈ Ω_t), we can derive a set of orthonormal bases {v_p(u, t)} (p = 1, 2, ..., L_t). Now, we consider that
v_p(u, t) = Σ_{q=1}^{L_t} a^t_{p,q} H_i(u) e^{jukT},   H_m(u) e^{junT} = Σ_{q=1}^{L_t} b^t_{p,q} v_q(u, t)

are the corresponding orthonormal bases and expansions, where a^t_{p,q} and b^t_{p,q} are complex coefficients with the parameter t, p = ξ(m, n) (p = 1, 2, ..., L_t; (m, n) ∈ Ω_t) and q = ξ(i, k) (q = 1, 2, ..., L_t; (i, k) ∈ Ω_t). Further, let us temporarily consider the functions Ψ_{m,n}(τ, t) constructed from these bases. Moreover, let (m, n) and (r, s) ∈ Ω_t, and let q = ξ(m, n) and l = ξ(r, s). Then the following relations hold:

Σ_{q=1}^{L_t} b^t_{l,q} a^t_{q,p} = 1 (l = p, that is, (r, s) = (m, n)),   = 0 (l ≠ p, that is, (r, s) ≠ (m, n))

Now, consider the following function:

z(τ, t) = Σ_{p=1}^{L_t} f_m(nT) Ψ_{m,n}(τ, t),   p = ξ(m, n)
Let Z(u, t) be the Fourier spectrum of z(τ, t) with respect to τ and u. Then, it is proved that Z(u, t) is contained in the set Θ_A defined initially. Now, we define e(τ, t) = f(τ - d) - z(τ, t) and E(u, t) = F(u) - Z(u, t). Obviously, E(u, t) is the Fourier spectrum of e(τ, t) with respect to τ and u. Then, we obtain the following two theorems.
Theorem 1. For any f(τ) in Γ, suppose that f(τ) ←→ F(u). Then, for any (m, n) ∈ Ω_t, we have

f_m(nT) = (1/2π) ∫_{-∞}^{+∞} H_m(u) F(u) e^{junT} du = (1/2π) ∫_{-∞}^{+∞} H_m(u) Z(u, t) e^{junT} du,

that is, (1/2π) ∫_{-∞}^{+∞} H_m(u) E(u, t) e^{junT} du = 0, and

(1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} |F(u)|² du = (1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} |Z(u, t)|² du + (1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} |E(u, t)|² du   (18)

Proof: The first equation is obvious. Further, expanding H_m(u)e^{junT} in the orthonormal bases {v_q(u, t)} and using the fact that {v_p(u, t)} is a set of orthonormal functions for the previous inner product, we can derive

(1/2π) ∫_{-∞}^{+∞} H_m(u) Z(u, t) e^{junT} du = (1/2π) ∫_{-∞}^{+∞} F(v) H_m(v) e^{jvnT} dv

and, similarly,

(1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} Z(u, t) F̄(u) du = (1/2π) ∫_{-∞}^{+∞} |W(u)|^{-2} Z(u, t) Z̄(u, t) du.
Hence, we can easily derive Eq.(18). (QED). We define Γ and Γ₁ as the set of f(τ, t) for all values of t (-∞ < t < ∞) and the set of f(τ, t) with respect to which all the f_m(nT, t) ((m, n) ∈ Ω_t) are zero for every t (-∞ < t < ∞), respectively. We assume f(τ, t) ∈ Γ for each t. Recall that f(τ) is not necessarily band-limited. Hence, Γ₁ is not empty in general. Further, let Γ₀ be the set of the functions e₀(τ, t) = f(τ - d) - z(τ, t). We adopt z(t) = z(t, t) = γz(τ, t) as the presented approximation. Then, the approximation error is e₀(t) = f(t - d) - z(t) = f(t - d) - z(t, t) = γe₀(τ, t). We denote by E₀(t) the corresponding E_Γ(t) for e₀(t). Then, from the above theorems, if we fix t and consider e₀(τ, t) as the input signal, we can easily recognize that the following three conditions hold: (g) Γ₀ ⊆ Γ₁ ⊆ Γ, (h) e₀(τ, t) = γ[e₀(τ, t)], (i) z(τ, t) = 0 if all the f_m(nT, t) ((m, n) ∈ Ω_t) are zero. Hence, we have
E(t) = E_Γ(t) = sup_{f(τ,t)∈Γ} { d[ê(t)] } = sup_{f(τ,t)∈Γ} { d[γê(τ, t)] } ≥ sup_{f(τ,t)∈Γ₁} { d[γê(τ, t)] } = sup_{f(τ,t)∈Γ₁} { d[γ{f(τ - d, t - d) - ĝ(τ, t)}] } = sup_{f(τ,t)∈Γ₁} { d[f(t - d)] }

E₀(t) = sup_{f(τ,t)∈Γ} { d[e₀(t)] } = sup_{f(τ,t)∈Γ} { d[γe₀(τ, t)] } = sup_{e₀(τ,t)∈Γ₀} { d[γe₀(τ, t)] } = sup_{e₀(τ,t)∈Γ₀} { d[γ[e₀(τ, t)]] } ≤ sup_{f(τ,t)∈Γ₁} { d[f(t - d)] }   (19)
Hence, z(t) gives the minimum E₀(t) among all the E(t) under consideration. This analysis includes the discussion for the measure of error e_max(t). Hence, the concrete functional derivation of the interpolation functions is the same as in the previous discussion, and the functional forms of the interpolation functions are also the same as the previous ones.
3 CONCLUSION
It should be noted that the proposed generalized interpolatory approximation has the minimum measure of error, in a certain sense, among all the linear/nonlinear approximations using the same sample values of the signal. The presented design gives a simple way to obtain the optimum analysis/synthesis filter banks. Finally, we would like to express our sincere thanks to Professor B. G. Mertzios, Democritus University of Thrace, Greece.
References
[1] P. P. Vaidyanathan: Multirate Systems and Filter Banks, PTR Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1993.
[2] T. Kida, L. P. Yoshioka, S. Takahashi and H. Kaneda: "Theory on Extended Form of Interpolatory Approximation of Multi-dimensional Waves", Elec. and Comm. Japan, Part 3, 75, No. 4, pp. 26-34, 1992. Also, Trans. IEICE, Japan, Vol. 74-A, No. 6, pp. 829-839, 1991 (in Japanese).
[3] Takuro Kida: "The Optimum Approximation of Multi-dimensional Signals Based on the Quantized Sample Values of Transformed Signals", submitted to IEICE Trans. E77-A, 1994.
DETERMINATION OF OPTIMAL COEFFICIENTS OF HIGH ORDER ERROR FEEDBACK UPON CHEBYSHEV CRITERIA

A. Djebbari(a), Al. Djebbari(b), M. F. Belbachir(c), and J. M. Rouvaen(a)
(a) IEMN-Dept. OAE UMR 9929 CNRS, 59304 Valenciennes, France
(b) Signal and Systems Lab., Univ. of Sidi-Bel-Abbes, 22000, Algeria
(c) Signal and Systems Lab., USTO BP1505 Oran, Algeria

Abstract

An efficient design method is proposed for error feedback digital filters, to reduce quantization noise in direct-form IIR filter realisations. The method is based on minimising a weighted Chebyshev error between the noise-free desired filter and the designed one using a Remez loop. The noise power of the designed filter is lower than the initial one.

1 Introduction

Error feedback is a general method used to reduce the parasitic effects due to finite word length in internal digital filter computations. This technique has been applied with success to infinite impulse response (IIR) filters using fixed-point arithmetic, particularly for implementing low-pass filters with poles near the unit circle [1]. Error feedback is performed by extracting the error signal directly after each product term and re-injecting it through a simple finite impulse response (FIR) filter [2]. This process does not modify the filter specifications, the transfer function being unchanged: it acts only on the noise component of the output signal. In a recent work, Laakso proposed optimal and suboptimal error feedback filters designed by minimizing the mean squared error with LMS-type algorithms, applying this process to reduce the quantization noise in high-order direct-form (type I) IIR filters [2]. In this paper, we propose a new noise reduction method based on the determination of optimal error feedback filter coefficients via a Chebyshev criterion. We then give the main results for the noise power reduction obtained on a particular example.

2 Optimal error feedback

Let us consider an IIR filter of order N with some kind of non-linear behaviour (rounding, truncation of the absolute value, truncation of two's complement values after each addition, ...), whose transfer function G(z) is stable.
2-1 Formulation of the problem

An error feedback of order K is applied as shown in figure 1. The output signal y(n) is given by:

y(n) = G(z).x(n) + B(z).G(z).e(n)   (1)

where

B(z) = 1 + β₁ z⁻¹ + β₂ z⁻² + ... + β_K z⁻ᴷ   (2)

B(z) and e(n) being, respectively, the error feedback filter transfer function and the quantization error. The FIR filter B(ω) may exhibit a symmetric or antisymmetric impulse response, and the number K of its coefficients may be even or odd, which leads to the four classical cases [3] for FIR filters considered in the following. The error feedback filter gain is given by:

|B(ω)| = Q(ω).P(ω)   (3)

P(ω) = Σ_{n=0}^{J-1} α_n cos(nω)   (4)

and the coefficients β in equation (2) are bound to those α in equation (4) by the following relations:

• symmetric filter and K = 2M+1 odd (case 1): Q(ω) = 1, J = M+1; β_M = α₀, 2β_{M-k} = α_k ; k = 1, ..., M   (5)
Fig. 1 Structure for error feedback.

• symmetric filter and K = 2L even (case 2): Q(ω) = cos(ω/2), J = L; 2β_{L-1} = α₀ + α₁/2, 2β_{L-k} = (α_{k-1} + α_k)/2 ; k = 2, ..., L-1, 2β₀ = α_{L-1}/2   (6)

• antisymmetric filter and K odd (case 3): Q(ω) = sin(ω), J = M; 2β_{M-1} = α₀ - α₂/2, 2β_{M-k} = (α_{k-1} - α_{k+1})/2 ; k = 2, ..., M-2, 2β₁ = α_{M-2}/2, 2β₀ = α_{M-1}/2   (7)

• antisymmetric filter and K even (case 4): Q(ω) = sin(ω/2), J = L; 2β_{L-1} = α₀ - α₁/2, 2β_{L-k} = (α_{k-1} - α_k)/2 ; k = 2, ..., L-1, 2β₀ = α_{L-1}/2   (8)

Equation (2) gives the supplementary condition β₀ = 1. Our goal is to compensate for the error e(n) by reducing the modulus of the error transfer function B(z).G(z) in equation (1) to unity, the corresponding residual error being written as 1 - |G(ω)|.|B(ω)|. Such a problem may be solved by a simple modification of the classical Parks-McClellan program, based on Remez's algorithm, which is classically used for optimal FIR filter design. For this purpose, a weighted Chebyshev error is used, given by:

E(ω) = W(ω).[ D(ω) - P(ω) ]   (9)

with D(ω) = 1 / [ Q(ω).|G(ω)| ] the desired function and W(ω) the weighting function.
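The desired function D(ω) = 1/[Q(ω).|G(ω)|] of Eq. (9) can be evaluated on a frequency grid as sketched below, here for case 1 where Q(ω) = 1. The first-order IIR filter used is only a toy example, not the paper's filter; `freq_response` is an assumed helper.

```python
import numpy as np

def freq_response(b, a, w):
    """G(e^{jw}) = B(e^{jw}) / A(e^{jw}) for coefficient vectors b, a."""
    z = np.exp(-1j * np.outer(w, np.arange(max(len(b), len(a)))))
    num = z[:, :len(b)] @ np.asarray(b)
    den = z[:, :len(a)] @ np.asarray(a)
    return num / den

w = np.linspace(1e-3, np.pi, 512)     # frequency grid over (0, pi]
b, a = [1.0], [1.0, -0.9]             # toy low-pass IIR: pole at z = 0.9
G = freq_response(b, a, w)
Q = np.ones_like(w)                   # case 1 (symmetric, K odd): Q(w) = 1
D = 1.0 / (Q * np.abs(G))             # desired function of Eq. (9)
```

For a low-pass G, D grows towards the stop-band, so the Remez loop naturally pushes the error-feedback gain |B(ω)| to be large exactly where |G(ω)| is small.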
2-2 Algorithm description

Optimal coefficients for B(z) are obtained using the following Remez-type algorithm:
• Read the IIR filter coefficients, the type of error feedback FIR filter and its order.
• Define the desired and weighting functions.
• Select extrema over the interval [0, π]: Ω = { ω₁, ω₂, ..., ω_J }.
• Solve the system of equations: E(ω_j) = W(ω_j).[ D(ω_j) - P(ω_j) ] = (-1)ʲ.δ with δ = max_{ω∈Ω} |E(ω)| and j = 1, ..., J.
• Search over [0, π] for the J local extrema of E(ω) with greatest absolute values, with the condition that these maxima must alternate. Save the abscissas of these extrema into Ω' = { ω'₁, ω'₂, ..., ω'_J }.
• If |ω_j - ω'_j| < ε for all j = 1, 2, ..., J, proceed to the next step; else set Ω = Ω' and return to the fourth step.
• Compute the coefficients β_j using the relations given in equations (5) to (8).
3 Numerical results.
Let us consider as an example the sharp low-pass filter H1 with transfer function:
H1(z) = [1.0000 + 0.7409 z^-1 + 2.1045 z^-2 + 1.5635 z^-3 + 2.1045 z^-4 + 0.7409 z^-5 + 1.0000 z^-6] / [1.0000 − 4.1139 z^-1 + 8.1026 z^-2 + 9.4512 z^-3 + 6.8370 z^-4 + 2.9064 z^-5 + 0.5739 z^-6]   (10)
This filter has a noise power of 43.06 dB and its power spectrum is shown in figure 2 by the curve marked with the symbol 0. The coefficients of the optimal error feedback FIR filters with orders from 2 to 10 have been determined using the algorithm presented above (and for the four types of FIR filters). We have also computed the noise power, defined as:
σ² = (1/2π) · ∫ |H1(ω)|² · |B(ω)|² dω   (11)
Our results show a significant reduction in the noise power. For filter H1, we get figures of 5.24 dB and 5.29 dB for error feedback filters that are, respectively, symmetric of the 10th order:
β_0 ... β_5 = 1; −4.5652; 8.5235; −6.5957; −1.7839; 6.8569, with β_{10−j} = β_j ; j = 0, ..., 4
and antisymmetric of the 8th order:
β_0 ... β_4 = 1; −4.5881; 8.9127; −8.0078; 0, with β_{8−j} = −β_j ; j = 0, ..., 3
The error signal power spectrum is given in figures 2 and 3, where the reduction of the noise in the pass-band of filter H1, obtained by increasing the error feedback filter order, can be observed.
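The noise power integral of equation (11) lends itself to direct numerical evaluation. The sketch below, with hypothetical helper names `freq_response` and `noise_power`, approximates the integral over one period by a rectangle rule; the test checks a simple first-order filter whose noise power is known in closed form (4/3), rather than the paper's H1.

```python
import cmath
import math

def freq_response(b, a, w):
    """H(e^{jw}) for H(z) = (sum_k b_k z^-k) / (sum_k a_k z^-k)."""
    z = cmath.exp(-1j * w)
    num = sum(bk * z**k for k, bk in enumerate(b))
    den = sum(ak * z**k for k, ak in enumerate(a))
    return num / den

def noise_power(b, a, beta, npts=4096):
    """sigma^2 = (1/2pi) * integral of |H(w)|^2 |B(w)|^2 over one period,
    where B is the FIR error feedback filter with coefficients beta.
    Evaluated by the rectangle rule on npts equispaced frequencies."""
    acc = 0.0
    for i in range(npts):
        w = 2 * math.pi * i / npts
        H = freq_response(b, a, w)
        B = freq_response(beta, [1.0], w)
        acc += abs(H) ** 2 * abs(B) ** 2
    return acc / npts
```

For H(z) = 1/(1 − 0.5 z^-1) and B(z) = 1, Parseval's relation gives σ² = Σ 0.25^n = 4/3, which the rectangle rule reproduces.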
Fig. 2 Noise spectrum of filter H1 for symmetric error feedback filters with orders 2, 5, 6. f: reduced numeric frequency; 0 => no error feedback filter.
Fig. 3 Noise spectrum of filter H1 for antisymmetric error feedback filters with orders 3, 5, 8. f: reduced numeric frequency; 0 => no error feedback filter.
The filter H1 is only one of several examples (low-pass, band-pass, high-pass) that have been studied using our algorithm. In all cases, we have been able to design efficient error feedback filters and to attain very significant reductions in the noise power.
Comparison with the LMS optimization method. The method based on LMS optimization [2] gives error feedback filters with coefficients very similar to those (of the same order) given by our method, and therefore nearly identical performance in noise power reduction. This has been verified on all the examples considered in ref. [2].
However, in the LMS method the order K of the error feedback filter is limited by the order N of the IIR filter to be corrected. In our case, this limit may be overcome, with the consequence of a further (and sometimes significant) improvement in noise power reduction. The price to be paid is naturally an increase in the error feedback filter complexity in terms of the number of multipliers and delay cells (so the overall filter delay and the physical size increase too).
Conclusion. A new error feedback filter design method has been proposed, with the goal of reducing the quantization noise in high order recursive filters. This method is based on a Chebyshev criterion. A sharp low-pass filter has been studied as an example. The noise power attained after correction clearly shows the effectiveness of the method.
References
[1] L.B. Jackson, "Round-off analysis for fixed-point digital filters realised in cascade or parallel form", I.E.E.E. Trans. Audio Electro-Acoust., Vol. AU-18, pp. 107-122, June 1970.
[2] T. Laakso and I. Hartimo, "Noise reduction in recursive digital filters using high order error feedback", I.E.E.E. Trans. Signal Proc., Vol. 40, pp. 1096-1107, May 1992.
[3] L.R. Rabiner and B. Gold, "Theory and application of digital signal processing", Prentice Hall, New York, 1975.
Invited Session R: VLSI DSP ARCHITECTURES
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Dynamic Codewidth Reduction for VLIW Instruction Set Architectures in Digital Signal Processors
Matthias H. Weiss and Gerhard P. Fettweis
Mobile Communications Systems, Dresden University of Technology, 01062 Dresden, Germany
{weissm,fettweis}@ifn.et.tu-dresden.de
Abstract - The design of an instruction set architecture (ISA) plays an important role for both exploiting processor resources and providing a common software interface. Three main classes of ISAs can be distinguished: CISC (Complex Instruction Set Computer), RISC (Reduced Instruction Set Computer), and VLIW (Very Long Instruction Word). They differ mainly in assembler and compiler support, pipeline control, hardware requirements, and code density. Comparing these architectures for use in a DSP, the VLIW architecture proves to be very advantageous. However, its main disadvantage in many applications is code size explosion. To reduce code size, a method called tagged VLIW (TVLIW) is presented. Dividing the instruction set into control/move and arithmetic instructions, a different usage of functional units can be observed. The first set only requires the parallel execution of a limited number of functional units. The second set, though requiring several functional units in parallel, is mostly used inside loops. With our proposed method, the instruction word is dynamically assembled using low-complexity, highly regular decoding hardware. Inside loops, the full VLIW functionality is supported by cache methods.
1. INTRODUCTION
Three main classes of ISAs are applied in microprocessors: CISC, RISC, and VLIW. They mainly differ in assembler and compiler support, pipeline control, hardware requirements, and code density. Comparing these architectures for use in a DSP, the VLIW architecture proves to be very advantageous in terms of processing performance. However, DSPs often contain a CISC ISA, mainly for code density and compatibility reasons. Recently, RISC ISAs have also been applied, e.g. in the Hx24 by Hitachi or the Lode by TCSI. This paper presents a first step towards an efficient use of the VLIW ISA. By employing a tagged VLIW ISA, the advantages in processing power are retained while code explosion is avoided. Besides code compactness, CISC ISAs provide the assembly programmer with a wide variety of instructions. Since programming is done at instruction level (see explanation in Fig. 1), the hardware architecture does not have to be known in full detail at implementation time, allowing different hardware implementations to be object code compatible. However, for the same reason hardware resources cannot be fully exploited. Furthermore, CISC ISAs only support a decoding pipeline but no deep execution pipeline, since the instructions are too heterogeneous. The RISC ISA, on the other hand, consists of homogeneous instructions in terms of pipeline properties. This can be achieved by splitting complex instructions into several small instructions. Therefore, the hardware decoding complexity can be reduced to achieve a cycle-per-instruction (CPI) count close to 1 [HePa90]. Superscalar architectures lead to an increase in decoding hardware again, e.g. for hazard resolution or scoreboarding for out-of-order execution. Hence, code execution speedup is carried out mostly by hardware support and not by compiler optimizations.
Fig. 1: Instruction vs. cycle level in pipelined architectures
To support compiler optimizations for superscalar architectures, a horizontal or VLIW ISA can be applied. Due to time-stationary pipelining (i.e. pipeline control at cycle level [Kogg81]), both the programmer and the compiler are given full control over the pipeline, at the cost of a code size increase, e.g. a 128 bit code width in the VIPER architecture [Gray93]. This prohibits this type of ISA in the main application field of fixed-point DSPs. The tagged VLIW scheme proposed in this paper reduces the code size requirements by assembling the VLIW dynamically. This method is based on the distinction between in-line and in-loop code. While the former requires only limited parallelism, the latter can be supported by a simple cache. Thus, the properties of VLIW can be exploited without code size explosion. This paper is organized as follows. In Section 2, the properties of the VLIW architecture in DSPs are explained in more detail. In Section 3, the dynamic instruction word coding using the TVLIW scheme is described. In Section 4, to demonstrate its applicability, the scheme is applied to the AT&T DSP16 architecture.
2. APPLYING VLIW TO DSP ARCHITECTURES
A VLIW architecture consists of several independent functional units (FU) controlled by one instruction word (IW) and connected by a fairly complex bus system (Fig. 2). In a DSP architecture these FUs are the Program Control Unit (PCU), Address Generation Units (AGU), Datapath Units (DPU), I/O-Units (IOU) etc. [Vanh92]. In some floating-point DSPs, VLIW architectures are already applied [Madi95]. These DSPs are usually used for high-performance applications, which require a high degree of flexibility. On the other hand, in high volume and low power products, e.g. in mobile communications, fixed-point DSPs are employed.
Since they are typically programmed in assembly languages, and code density is an important issue, they often contain a CISC ISA with either data-stationary (e.g. TI C54x, Motorola DSP5630x) or time-stationary (e.g. AT&T DSP16, NEC 7701x) pipeline control. However, with the advance of processing power requirements, concurrent processing must be supported. Most DSP algorithms have inherent parallelism [Kung88], which can be exploited by replicating arithmetic units. This is not necessarily restricted by limited memory bandwidth. Due to the locality of the algorithms, duplicating arithmetic units does not necessarily lead to multiport memory architectures and thus can be applied in fixed-point DSPs as well, as shown in [Fett96]. Furthermore, given the demand for stronger compiler support [Zivo95], an ISA combining flexibility (offered by a VLIW ISA) and code density (offered by a CISC ISA) is required. The main drawback of employing a VLIW ISA is, besides complicated assembly coding, the code size increase. The main reason for the code size increase is the independent control of concurrent FUs. For maximum flexibility, a VLIW ISA must support all permutations of all FUs.
Fig. 2: Example of a VLIW DSP-Architecture
Fig. 3: Tagged VLIW Instruction Decoder
However, not all instructions can exploit the VLIW ISA's full functionality. Program control and move instructions, for instance, require only a limited number of FUs at one time. These instructions are mainly applied in in-line code. Thus, for in-line code the full VLIW functionality is not required. In-loop code, on the other hand, mainly consists of arithmetic and logic instructions, which typically require several FUs concurrently and thus VLIW's full functionality. On the DSP architecture's side, loops are already supported, e.g. by zero-cycle hardware loop counters and cache mechanisms. Furthermore, compilers support loops as well by applying techniques such as loop unrolling, software pipelining, and trace scheduling, especially developed for VLIW architectures [HePa90]. Hence, the VLIW ISA's full functionality must be enabled within loops. This can be achieved by employing our TVLIW scheme.
3. DYNAMIC INSTRUCTION WORD CODING BY THE TVLIW SCHEME
TVLIW supports the different requirements of in-line and in-loop instructions by assembling the VLIW dynamically. As shown in Fig. 2, the very long instruction word (VLIW) consists of a number of functional unit instruction words (FIW). Each FIW controls the associated FU independently from the remaining FIWs. Thus, the whole VLIW can control several FUs concurrently. The idea of the TVLIW scheme is to assemble the actual VLIW out of a limited number of FIWs (Fig. 3). If the full functionality of the VLIW is required, this assembly may require several cycles. However, the instructions that require the full parallelism of the VLIW mainly occur within loops. With the help of a loop cache, this overhead is only necessary during the first iteration. The TVLIW scheme is based on two assumptions:
• Equal FIW width: All FUs require a common instruction word width. This can be achieved by designing the FIW for a given TVLIW width, since FIWs can be fully decoded if necessary.
• Limited parallelism in in-line code: If parallelism could be fully exploited all the time, this scheme would not be applicable. However, as shown below, this is usually not the case for in-line instructions. In-loop instructions, on the other hand, are supported by cache mechanisms.
While the first assumption is verified by the case study in section 4, the second assumption is checked in more detail by the following examination of a DSP's instruction set.
A. Classification of the Instruction Set
While the in-loop code mainly consists of arithmetical/logical (AL) instructions, including memory accesses, the in-line code mainly consists of move instructions, including register-register and register-memory transfers, and program control instructions, including jumps, branches, calls etc. Program control/move instructions do not require all FUs at the same time. To show this in more detail, the FU usage of some program control and move instructions is given in table 1. It can be seen that at instruction level these instructions typically use only one or two FUs. Note that besides FUs, immediate fields also need to be considered. Due to time-stationary pipeline control, the usage of FUs at cycle level must be considered. As an example, a pipelined machine with a one-cycle memory latency is assumed. In table 2, the first four instructions of table 1 are assumed to appear in sequential order. Thus, each column represents the actual usage of FUs at cycle level. As on instruction level, on cycle level only one or two FUs are used at one time. By applying the same method to AL instructions (table 3 and table 4), several FUs are used on both instruction and cycle level. Taking this behavior into account, our current implementation of TVLIW discussed below supports the independent control of two FUs at one time without assembly loss.
B. Overview
The block diagram of the TVLIW decoder is shown in Fig. 3. The TVLIW consists of a class field (IWC), mainly indicating how many TVLIWs the actual VLIW has to be assembled from, and two tag fields (F#), indicating which FU should be controlled by the associated FIW. The output is the actual VLIW, which consists of the coded FIWs and nops otherwise. For instructions requiring the full VLIW's processing power, multiple TVLIW instructions are collected to build one VLIW. During the first iteration of a loop, the actual VLIW is stored in a wide cache, from which the instructions can be read during the next iterations.
C. Functional Description
During the programming process, a VLIW is assumed. The resulting intermediate object code consists of a set of VLIWs, each containing a number of independent FIWs. The main task of the following assembler pass is to reduce each VLIW to one or more TVLIWs by using the different instruction classes supported by the TVLIW: single IW, multiple IW, insert IW, and end IW. The single IW class indicates that the current VLIW uses two FUs at the most, except for the following case. If the current VLIW contains the same FIWs as the following one, the current VLIW is a subset of the following. Thus, the current VLIW will be executed and also stored to be used by the next VLIW. This is indicated by the insert IW class. If the current VLIW uses more than two FUs, the VLIW must be assembled sequentially. Therefore, the current VLIW is divided into a set of TVLIWs, each containing two FIWs at the most. If the preceding TVLIW was an insert IW, this TVLIW is removed from the set. All remaining TVLIWs except for the last are indicated by the multiple IW class, while
Table 1: FU usage by program control and move instructions (a. examples are written in a C-like notation)

  Instruction type           Example (a)           Usage of FUs
  program flow               return, icall, nop    PCU
  argumented program flow    call, branch, loop    PCU, IM
  memory <- register         *Xr1++ = Reg1         DP1, AG1
  register <- memory         Reg1 = *Yr1++         AG2, DP1
  register <- register       Acc = Reg1            DP1
  register <- constant       Reg1 = 7              DP1, IM

Table 2: FU usage over time for the program control and move instructions of table 1

Table 3: FU usage by arithmetic/logic (AL) instructions (a. PMA: Parallel Memory Access)

  Instruction type                            Example                     Usage of FUs
  AL instruction with PMA (a)                 Acc += Reg1 * *Xr1++        AG1, DP1
  AL instruction with 2 PMAs                  Acc += *Xr1++ * *Yr1++      AG1, AG2, DP1
  AL instruction with a constant and 1 PMA    Acc = Const * *Xr1++        AG1, IM, DP1
  parallel AL instructions with 2 PMAs        Acc1 += *Xr1++ * *Yr1++     AG1, AG2,
                                              || Reg2 = *Xr1++            DP1, DP2
                                              || Acc2 += *Xr1++ * Reg2

Table 4: FU usage for the arithmetic/logic instructions of table 3
the last is indicated by the end IW class. The end IW class is necessary to clear all previously stored FIWs. The insert IW class is introduced especially to support the coding of unrolled loops. In unrolled loops, previous instructions are often expanded by one or two further FIWs to yield the current instruction. In this case, previous instructions can be reused by the following ones.
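The core of this assembler pass can be sketched as follows. This is a minimal illustration under our own representation (a VLIW as a mapping from FU tag to FIW opcode); the insert IW optimization for unrolled loops is omitted for brevity.

```python
def split_vliw(vliw, max_fiws=2):
    """Split one VLIW (dict: FU tag -> FIW opcode) into a sequence of
    (class, fiws) TVLIWs, each carrying at most max_fiws FIWs.
    A VLIW using at most max_fiws FUs becomes a single IW; a wider
    VLIW becomes a run of multiple IWs closed by an end IW, which
    releases (and then clears) the assembled VLIW in the decoder."""
    items = sorted(vliw.items())
    if len(items) <= max_fiws:
        return [("single", dict(items))]
    chunks = [items[i:i + max_fiws] for i in range(0, len(items), max_fiws)]
    out = [("multiple", dict(c)) for c in chunks[:-1]]
    out.append(("end", dict(chunks[-1])))
    return out
```

For example, a VLIW controlling five FUs is emitted as two multiple IWs followed by one end IW, i.e. it costs three cycles on the first encounter (and one cycle afterwards, once held in the loop cache).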
D. Hardware Description The hardware requirements for a TVLIW decoder are inexpensive and highly regular. As shown in Fig. 4, the hardware structure can be divided into three parts. The IW itself consists of a two bit wide class field, an m-bit wide tag field for determining one out of 2^m FUs, and two n-bit wide FIWs controlling two independent FUs. In the first step, the control signals QCX, QAX, and QBX are generated from the decoded class and both tag fields, respectively. The tag signals control the crossbar unit, in which the n-bit wide FA- and FB-fields are routed to the appropriate intermediate busses F'x, x ∈ {0, ..., 2^m−1}. A nop is switched to the remaining 2^m − 2 intermediate busses, if the particular FU is not selected by QAX or QBX, respectively. In parallel, the class signals are used to determine the way of assembling the VLIW. The first multiplexer controls an n-bit wide register, storing the intermediate F'x in the multiple or insert IW case or being cleared in the end IW case. The final multiplexer switches either the intermediate F'x, the content of the register, or a nop to the actual Fx. The complete set of all Fx represents the actual VLIW. Thus, the hardware expenses are only one 2:4 decoder, two m:2^m decoders, 2^m n-bit wide registers, and 2^m n-bit wide 3:1 and 2:1 multiplexers. However, the crossbar unit may require some (though highly regular) wiring.
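A behavioral sketch of the decoder's assembly rules (not of the gate-level structure in Fig. 4) might look like the following; the nop encoding and the stream representation are our own assumptions.

```python
NOP = 0x00  # assumed nop encoding

def tvliw_decode(stream, num_fus=4):
    """Simulate the TVLIW decoder. stream is a list of (class, fiws)
    words, fiws being a dict FU index -> FIW opcode. 'multiple' words
    accumulate FIWs in the decoder registers; 'single' and 'end' words
    complete and emit one full VLIW (nops on unselected FUs), after
    which the stored FIWs are cleared; 'insert' words execute the
    current subset and keep it stored for the next VLIW."""
    regs = [NOP] * num_fus
    vliws = []
    for cls, fiws in stream:
        for fu, fiw in fiws.items():
            regs[fu] = fiw
        if cls in ("single", "end"):
            vliws.append(list(regs))
            regs = [NOP] * num_fus  # end IW clears the stored FIWs
        elif cls == "insert":
            vliws.append(list(regs))  # executes and stays stored
    return vliws
```

Note how `insert` matches the text above: the emitted VLIW remains in the registers, so the next word only has to supply the one or two FIWs by which it extends the previous instruction.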
E. Further Remarks To reduce both the IW width and the hardware cost of the TVLIW decoder, particularly of the crossbar switch, the combinations of FUs within one TVLIW can be limited. For instance, the same FU cannot be used twice in one TVLIW, and of the two permutations F1:F2 and F2:F1 only one has to be supported. Thus, the combinations Fn:Fm with n ≥ m do not have to be supported. If the separate tag fields are combined into one field, the sum Σ_{i=1}^{2^m} i of these combinations can be removed at the expense of a slightly more complex tag decoder. At the same time, restrictions of this kind can be used to simplify the crossbar switch. In the event of an interrupt, the contents of the decoder registers must be saved. This is necessary for restoring the current state if the interrupt occurs during a multiple or insert instruction. Thus, the interrupt service routine can use the whole VLIW as well. 4. CASE STUDY: APPLYING TVLIW TO AT&T's DSP16 To demonstrate the TVLIW scheme on a real-world example, we chose the DSP16 of AT&T. This DSP contains a well structured register set, a small 16 bit wide instruction set, a simple bus architecture with only one read/write bus and, above all, a time-stationary pipeline organization. By orthogonalizing the instruction set into separate FIWs, the dynamic instruction coding scheme can be applied.
A. Overview As can be seen from table 5, a FIW width of n = 8 bit is sufficient to support the DSP16's functionality.

Table 5: Permutations for all FUs of the AT&T DSP16

  Functional Unit    Permutations
  PCU-, X-Unit       90 + 3 < 2^8
  Load/Store         208 < 2^8
  Y-Unit             208 + 48 < 2^8
  DPU                4 + 224 < 2^8

Fig. 4: Hardware Structure of TVLIW Decoder

Additionally, m = 5 distinct FUs are required: a program control unit (PCU) including instructions for the X address generation unit (XAU),
a YAU, a load/store unit (LSU), a data path unit (DPU), and one short immediate field, used by all FUs. The load/store unit also requires a long immediate word. This is supported by dividing the long immediate word into two short words, one for the low and one for the high byte. To support the same functionality as the original instruction set, the TVLIW word should contain two FIWs. Thus, the TVLIW consists of two 8 bit wide FIWs, two 3 bit wide tag fields and a 2 bit wide class field, resulting in a width of 24 bit.
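The 24-bit budget (2 + 3 + 8 + 3 + 8 bits) can be checked with a small pack/unpack sketch. The field order used here, class | tag A | FIW A | tag B | FIW B from the most significant bit down, is our own assumption; the paper does not fix a bit ordering.

```python
def pack_tvliw(cls, tag_a, fiw_a, tag_b, fiw_b):
    """Pack the 24-bit TVLIW of the DSP16 case study:
    2-bit class | 3-bit tag A | 8-bit FIW A | 3-bit tag B | 8-bit FIW B."""
    assert 0 <= cls < 4 and 0 <= tag_a < 8 and 0 <= tag_b < 8
    assert 0 <= fiw_a < 256 and 0 <= fiw_b < 256
    return (cls << 22) | (tag_a << 19) | (fiw_a << 11) | (tag_b << 8) | fiw_b

def unpack_tvliw(w):
    """Inverse of pack_tvliw: recover (class, tag A, FIW A, tag B, FIW B)."""
    return (w >> 22, (w >> 19) & 0x7, (w >> 11) & 0xFF, (w >> 8) & 0x7, w & 0xFF)
```

Any field combination round-trips through the 24-bit word, confirming that the widths add up without overlap.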
B. Results This case study shows that the assumption of a common FIW width can be met. The resulting TVLIW ISA requires 8 additional bits to provide the same functionality as the original 16 bit ISA. However, by employing our scheme several FUs can be used independently, which is necessary for architectural enhancements. Using the TVLIW scheme, the architecture can be expanded, for instance, by a further bus requiring a further AGU, or even by another DPU, e.g. for supporting Galois Field arithmetic [Dres96], without modifying the TVLIW ISA. 5. CONCLUSIONS AND FUTURE WORK We presented a tagged VLIW (TVLIW) scheme, which dynamically assembles the full parallel VLIW. Thus, the advantages of VLIW architectures can be gained, while the main drawback, the code size explosion, can be avoided. We showed that the hardware cost of a TVLIW decoder is low. Hence, it is a good candidate for high-end fixed-point DSPs. By dividing the code into program control/move and arithmetic instructions, it was shown that only the latter instructions require the full parallel functionality of VLIW. Thus, TVLIW supports instructions which require only a limited number of functional units, to be executed within one cycle. Arithmetic instructions, on the other hand, are mainly used inside loops, where the TVLIW scheme can be supported by a simple VLIW cache. In future work, we will be analyzing more complex algorithms found in digital signal processing to gain detailed insight into the trade-off optimization between processing power and code size. Our current research concentrates on the impact of time-stationary coding on both the compiler and the hardware architecture, in particular on cache architectures which efficiently support TVLIW. 6. REFERENCES
[AT&T89] AT&T Inc., DSP16 and DSP16A User's Manual, 1989.
[Dres96] W. Drescher, "VLSI Architecture for Multiplication in GF(2^m) for Application Tailored Signal Processors", 1996 IEEE Workshop on VLSI Digital Signal Processing, 1996.
[Fett96] G. Fettweis et al., "Strategies in a Cost-Effective Implementation of the PDC Half-Rate Codec for Wireless Communications", VTC '96, vol. 1, pp. 203-207.
[Gray93] J. Gray, A. Naylor, A. Abnous and N. Bagherzadeh, "VIPER: a VLIW integer microprocessor", IEEE Journal of Solid-State Circuits, vol. 28, pp. 967-979, 1993.
[HePa90] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers Inc., 1990.
[Kung88] S.Y. Kung, VLSI Array Processors, Prentice Hall, 1988.
[Kogg81] P.M. Kogge, The Architecture of Pipelined Computers, 1981.
[Madi95] V.L. Madisetti, Digital Signal Processors, Butterworth-Heinemann, 1995.
[Vanh92] J. Vanhoof et al., High-Level Synthesis for Real-Time Digital Signal Processing: the Cathedral-II Silicon Compiler, Kluwer Academic Publishers, 1992.
[Zivo95] V. Zivojnovic, "Compilers for digital signal processors", DSP and Multimedia Technology, vol. 4, no. 5, pp. 27-45, July/August 1995.
Implementation Aspects of FIR filtering in a Wavelet Compression Scheme
G. Lafruit, B. Vanhoof, J. Bormans, M. Engels and I. Bolsens
IMEC, Kapeldreef 75, 3001 Leuven, Belgium
1. Abstract This paper analyzes the implications of some FIR filter implementation choices on the VLSI cost of the Wavelet transform. Because the number of multiplications involved in the Wavelet decomposition represents a serious bottleneck, we compare a number of techniques for reducing this number of multiplications. Traversing the search space along the minimal implementation cost path leads to the use of Sweldens' Lifting Scheme, applied to Wavelet filter banks with rational coefficients having integer numerators and power-of-two denominators.
2. Introduction The 2D Fast Wavelet Transform of an image represents the original image by a hierarchy of Wavelet Images (Detail Images and Average Images), corresponding to different quality or resolution levels. The image pyramid structure is generated by repeatedly filtering and subsampling the preceding image level, starting from the input image. The 2D filtering for each level is performed by combining a 1D lowpass filter L(n) of length M0+1 and a highpass filter H(n) of length M1+1, first horizontally and then vertically, as shown in fig. 1.
[Figure 1 diagram: each level applies Stage 1 (horizontal filtering, with subsampling on rows: keep one sample out of 2 in each row) followed by Stage 2 (vertical filtering, with subsampling on columns: keep one sample out of 2 in each column), producing the Average Image and the Detail Images of levels 1, 2, ...]
Figure 1: Data flow graph of the 2D-DWT with separable filters. The optimal implementation style for these 2D filtering modules is determined by the overall algorithmic specifications (e.g. the quantization model, fixed or variable filter coefficients). We therefore define a number of guidelines which are particularly useful in image compression systems using the Wavelet Transform. To satisfy the Area/Performance/Power constraints in ASIC design, one must take special care to avoid the use of area consuming hardware building blocks. Typically, a 16x16 bit multiplier and a 16 bit delay element represent respectively 2k and 160 equivalent NAND gates, while a 16 bit adder only represents a VLSI area cost of 120 equivalent NAND gates. RAM modules also carry a high area cost. For instance, a 32x112 bit single RAM module already represents 0.51 mm2 in a 0.5 µm 3-metal-layer MIETEC CMOS technology. For a dual-port RAM this grows to 2.56 mm2. Obviously, memory, register cells and hardwired variable multipliers should be avoided whenever possible.
3. Reduction of the register cost Two main styles exist for the implementation of algorithms with successive FIR filtering. In the folded implementation style, the calculation of all wavelet levels is mapped onto one processor, while in a digit-serial architecture each level has its dedicated processor with its own digit size, adapted to the multi-rate characteristics of the Wavelet Transform. The digit-serial implementations overcome the drawback of the less than 100 % hardware utilization of folded implementations. However, because of the decreasing digit sizes in the successive levels of digit-serial implementations, an inter-level data converter must transform the α-byte output format of each output sample into an α/2-byte input format for the next level, leading to a high number of additional area-consuming register cells [1]. The smaller number of registers required in a folded implementation largely compensates for the lack of 100% hardware utilization. As a consequence, a folded implementation is clearly preferred.
4. Filtering by subconvolutions Any vertical filtering involved in the Wavelet decomposition can be started only when max(M0, M1) lines from the horizontal filtering stage are available. For large images and/or Wavelet filter sizes, the VLSI area and power consumption cost of these delay line memories is high. In order to reduce this memory cost, the full image should be subdivided into smaller entities, which are processed separately. The filtering of a large image is thus subdivided into subconvolutions over smaller subimages of width L, by means of the overlap-save or overlap-add method [2], as shown in fig. 2. The classical mask-moving convolution technique, applied to the full image, is thus replaced by a block-based convolution technique. Within each block (subimage), the mask-moving convolution technique can still be used. The delay line memory cost for these subimages, processed in successive time slots, is clearly reduced.
Figure 2: The Overlap-Save method (a) uses memory in the input space of the convolution, while the Overlap-Add method (b) uses memory in its output space.
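As a concrete 1D illustration of such block-based convolution, the following sketch implements the overlap-save method for a FIR filter and checks it against a direct convolution; the subsampling and the 2D extension of the wavelet case are omitted, and the function names are our own.

```python
def direct_conv(x, h):
    """Full linear convolution, used as a reference."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_save(x, h, L):
    """Overlap-save block convolution: each output block of length L is
    computed from the L new input samples plus the len(h)-1 previously
    'saved' samples, so blocks share input memory but never output memory."""
    M = len(h) - 1
    xp = [0.0] * M + list(x)           # initial zero history
    y = []
    for s in range(0, len(x), L):
        block = xp[s:s + L + M]        # L new samples + M saved samples
        for n in range(min(L, len(x) - s)):
            y.append(sum(h[j] * block[M + n - j] for j in range(M + 1)))
    return y
```

Each subimage (here, block) only needs M saved samples of overlap, which is exactly the delay-line memory saving the section describes.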
5. The quantization model and the convolution implementation Some quantization models (e.g. Shapiro's Zero Tree Coding [3]) used in Wavelet image compression algorithms exploit the correlations between the different levels of the Wavelet pyramid (there exists a high probability that an edge in a detail image corresponds to an edge in the corresponding detail images of the next higher levels), leading to a possibly larger compression ratio gain. These quantization models introduce two additional constraints:
• The filters must be symmetric (linear phase).
• The calculation of the Wavelet pyramid should be performed "vertically" with reducing subimage sizes (see fig. 3) and not level by level with fixed subimage sizes (horizontal calculation scheme), in order to avoid multiple memory accesses for the quantization coding.
Figure 3: The vertical and horizontal calculation schemes, related to the quantization coding. For the Shapiro Zero Tree coding, vertical interrelations are exploited and the vertical calculation scheme must be adopted. These additional constraints have an impact on the choice of the block-based convolution method to be used, with regard to the number of multiplications. Indeed, for non-symmetric filters, the overlap-save and overlap-add methods exhibit the same performance, if we take care to avoid the "dummy" multiplications introduced by the zero-padding of the data blocks (subimages) in the overlap-add method of fig. 2(b). For symmetric filters, the number of multiplications can be reduced, but not by the same amount for both methods. In the overlap-save method, a reduction by a factor 2 is obtained. However, in the overlap-add method, the symmetry cannot be fully exploited, as a consequence of the zero-padding. The reduction R of the number of multiplications in the overlap-add method between the non-symmetric and symmetric filters is always smaller than 2 and depends on the ratio of the filter length M and subimage width L:
R = 2 / (1 + M/(2L))   (1)
The number of multiplications of both methods applied to symmetric filters is similar for L >> M/2. In the "vertical" calculation scheme, the subimage width necessarily decreases from one level to the next higher level of the pyramid, so that the constraint L >> M/2 is possibly not satisfied for all levels. In this case, the overlap-save method is preferred.
6. Reduction of the multiplication cost In [4] it is shown that to minimize the round-off noise of a FIR filtering of an 8-bit/sample input image, using 16-bit filter coefficients with an absolute value smaller than one, a 16-bit internal fixed-point representation is sufficient. If the filter coefficients are represented by rational values with an 8-bit integer numerator and a power-of-two denominator, a 12-bit internal representation, as proposed in [5], is sufficient, reducing the VLSI area cost by a factor 1.3. To further reduce the VLSI area cost, special techniques can be applied to lower the number of variable multiplications. Several methods have been compared. For moderate filter lengths up to 15 taps, we have found that the number of multiplications can be reduced by a factor ranging between 1.3 and 1.5 with Mou's diagonalization technique [6] and by around a factor 1.8 using the Chinese Remainder Theorem [7] or Sweldens' Lifting scheme [8, 9]. Also note that the number of delay line elements using Mou's technique increases dramatically, by a factor 2 to 3, and counteracts the marginal gain in the number of multiplications with regard to the VLSI area cost. Mou's technique has, however, the advantage that the clock rate can be reduced, enabling low power implementations. Finally, Sweldens' Lifting scheme has a more regular structure than that obtained with the Chinese Remainder Theorem, so that, in general, the former requires a smaller number of register cells. When using fixed filter coefficients, the above techniques can still be applied, together with the expansion of the constant multiplications into shift-adds.
However, since the reduction of the number of algorithmic multiplications, using for instance the Chinese Remainder Theorem, always comes at the expense of an increasing number of additions, the overall number of adder modules, as well as the corresponding VLSI area cost, is not necessarily reduced for fixed filter coefficients having a small number of non-zero bits in their binary or Canonical-Signed-Digit (CSD) [10] representation. Simulations show that for fixed filter coefficients, the highest VLSI area gains are obtained by heavily optimized hardware sharing between the Lowpass and Highpass filter operations [11, 12]. A similar scheme is naturally obtained in Sweldens' Lifting technique. Clearly, Sweldens' Lifting scheme is often an adequate choice. It is worth noting that hardware sharing between different sets of High- and Lowpass filters can introduce a substantial multiplexing overhead. Experience shows that one should choose between the implementation of up to 2 or 3 fixed Wavelet filter banks on the one hand, and an implementation with programmable filter coefficients using variable multipliers on the other.
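To make the shift-add expansion concrete: a constant multiplication costs one shift-add (or shift-subtract) per non-zero digit in the coefficient's CSD representation [10]. The sketch below (our own illustration; function names are ours) computes the CSD digits of an integer coefficient and uses them as a shift-add network:

```python
def csd_digits(n):
    """Canonical-Signed-Digit digits of a positive integer, LSB first.
    Each digit is -1, 0 or +1, and no two non-zero digits are adjacent."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)      # pick +1 or -1 so the next bit clears
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def const_mult(x, c):
    """Multiply x by the constant c using one shift-add (or shift-subtract)
    per non-zero CSD digit of c."""
    acc = 0
    for k, d in enumerate(csd_digits(c)):
        if d:
            acc += d * (x << k)   # one adder module per non-zero digit
    return acc
```

For example, 7 = 8 - 1 has two non-zero CSD digits instead of three binary ones, so multiplication by 7 takes one shift-subtract rather than two shift-adds.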
7. Area estimations
For calculating the Wavelet decomposition of a 1024x1024 8-bit pixel image within 1 s, a pixel rate of 1 MHz is required. If f is the clock frequency of the chip, expressed in MHz, then the number of cycles available per input pixel is f. An upper bound for f is determined by the RAM access time. For a RAM with an access time of 100 ns, f cannot be larger than 10 MHz. Each input pixel must then be processed within 10 cycles. Table 1 gives some estimations of the cycle budget and VLSI area for symmetric Wavelet filter banks in different configurations:
• Different numbers of taps for the Lowpass (M0) and Highpass (M1) filters
• Overlap-save (S) or overlap-add (A) convolution method
• Direct form implementation (D) or with Sweldens' Lifting Scheme (L)
• Varying number N of hardwired multipliers in a folded implementation
The VLSI area estimations are performed using the 0.5 μm 3-metal-layer MIETEC CMOS technology and include the routing overhead. The area is only provided for those configurations in which each input pixel can be processed within 10 cycles.
Table 1: Estimated cycle budget and VLSI area for different implementation styles of symmetric Wavelet filter banks
8. Conclusion
We have shown that the VLSI implementation of a FIR filter in a Wavelet decomposition scheme used for image compression should use a folded architecture based on Sweldens' Lifting Scheme, applied to Wavelet filter banks with rational coefficients having integer numerators and power-of-two denominators. Simulation results suggest that the VLSI area cost can be reduced by approximately a factor of 3 when implementing fixed, instead of programmable, filter coefficients. For large images and/or large filters, successive subimage convolutions should be applied using the overlap-add or overlap-save method, with some preference for the latter.
9. Acknowledgment
This research was supported by the SCADES-3 program of the ESA (European Space Agency), by a grant of the Flemish institute for the promotion of scientific-technological research in the industry (IWT) to J. Bormans, and by a grant to M. Engels as Senior Research Assistant of the Belgian National Fund for Scientific Research (NFWO). The authors would like to thank Lode Nachtergaele (IMEC), Martin Janssen (IMEC) and Peter Schelkens (Vrije Universiteit Brussel - Free University of Brussels - ETRO) for their contributions to this work.
10. References
[1] T. Denk, K. Parhi, "Architectures for lattice structure based orthonormal Discrete Wavelet Transform," Proceedings of the International Conference on Application Specific Array Processors, San Francisco, CA, pp. 259-271, August 1994.
[2] R.E. Blahut, "Fast Algorithms for Digital Signal Processing," Addison-Wesley Publishing Company Inc., New York, April 1985.
[3] J.M. Shapiro, "Embedded Image Coding using zerotrees of wavelet coefficients," IEEE Transactions on Signal Processing, Vol. 41, No. 12, pp. 3445-3462, December 1993.
[4] J.I. Artigas, L.A. Barragan, J.R. Beltran, E. Laloya, J.C. Moreno, D. Navarro, A. Roy, "Word length considerations on the hardware implementation of two-dimensional Mallat's wavelet transform," Optical Engineering, Vol. 35, No. 4, pp. 1198-1212, April 1996.
[5] A.Y. Wu, K.J.R. Liu, Z. Zhang, K. Nakajima, A. Raghupathy, S-C. Liu, "Algorithm-based low-power DSP system design: methodology and verification," VLSI Signal Processing VIII, edited by T. Nishitani and K.K. Parhi, IEEE Signal Processing Society, pp. 277-286, 1995.
[6] Z-J. Mou, P. Duhamel, "Short-Length FIR filters and their use in Fast Nonrecursive Filtering," IEEE Transactions on Signal Processing, Vol. 39, No. 6, pp. 1322-1332, June 1991.
[7] R.C. Agarwal, J.W. Cooley, "New Algorithms for Digital Convolution," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 25, No. 5, pp. 392-410, October 1977.
[8] W. Sweldens, "The Lifting Scheme: A custom-design construction of biorthogonal wavelets," technical report from ftp.math.sc.edu/pub/imi_94.
[9] W. Sweldens, "The Lifting Scheme: A New Philosophy in Biorthogonal Wavelet Constructions," Proceedings of the SPIE conference, Vol. 2569, pp. 68-79, 1995.
[10] G.W. Reitwiesner, "Binary Arithmetic," Advances in Computers, Academic, Vol. 1, pp. 231-308, 1966.
[11] M. Janssen, F. Catthoor, H. De Man, "A Specification Invariant Technique for Operation Cost Minimisation in Flow-graphs," Proceedings of the 7th International Symposium on High-Level Synthesis, pp. 146-151, Niagara-on-the-Lake, Ontario, Canada, May 1994.
[12] M. Janssen, F. Catthoor, H. De Man, "A Specification Invariant Technique for Regularity Improvement between Flow-Graph Clusters," Proceedings of the European Design and Test Conference, pp. 138-143, Paris, France, March 1996.
Recursive approximate realization of image transforms with orthonormal rotations
Gerben J. Hekstra and Ed F. Deprettere, Department of Electrical Engineering, Delft University of Technology, Delft, The Netherlands, email: [email protected], [email protected]
Monica Monari, Department of Electrical Engineering, Bologna University, Bologna, Italy
Richard Heusdens, Digital Signal Processing Group, Philips Research Labs, Eindhoven, The Netherlands
Abstract
Image transforms, such as the LOT and its various modifications, and also the DCT, which are all commonly used in transform coding for data compression, can be recursively decomposed, yielding a sequence of orthogonal matrices of decreasing order. The basis functions on which the transform is built can be approximated to any order of accuracy by realizing the set of orthogonal matrices in its decomposition by means of so-called fast rotations, which are orthonormal within the range of the required accuracy. For the approximation to be optimal, all orthogonal matrices in the decomposition must be simultaneously expressed in terms of fast rotations. This paper presents a procedure to compute the optimal solutions: either the solution of minimum cost for a given lower bound on the accuracy, or the solution with the highest accuracy for a given upper bound on the cost.
1. Introduction
Data compression of images (such as X-ray image sequences) for storage purposes is heavily constrained by the requirement that the reconstructed images should not reveal coding artefacts. Compression techniques using discrete cosine transforms (DCT) [1] or conventional lapped orthogonal transforms (LOT) [2] fail to meet these requirements at high compression ratios. The modified lapped transform (MLT) overcomes some of these problems, but it is not orthogonal, which is a disadvantage from the point of view of implementation. One of the authors [3] has designed a new LOT which is orthogonal and does not introduce any blocking artefacts when applied to medical image compression. This new LOT was designed taking the following constraints into account.
• From the viewpoint of coding complexity: critical sampling (minimum amount of data), perfect reconstruction, good frequency-discriminating properties of the analysis filters.
• From the viewpoint of coding efficiency: the analysis filters have zeros at z = 1, except for the low-pass filter.
• From the viewpoint of perception: linear-phase overall transfer functions, linear-phase synthesis filters (symmetric sensitivity), synthesis impulse responses that decay smoothly to zero (no blocking artefacts), short synthesis filters (no ringing effect), a sufficiently large number of filters (simple noise shaping).
• From the viewpoint of implementation cost: orthonormality (minimal error blow-up), para-unitarity (analysis and synthesis operators have the same structure), critical sampling (minimum sample rate).
The new LOT obeys all the above criteria, but does not allow a realization in terms of DCT or DST operations only. The question arises, of course, whether this is a drawback. The answer is no, and this has to do with the fact that the arguments that usually plead in favor of DCT-based operations are questionable. Indeed, the N log N argument stems from a computational complexity measure in terms of the number of operations. This number being low does not imply that the implementation is fast and small: large wordlengths are needed to preserve the orthonormality of the basis functions when implemented with traditional multiply-add operations. We have shown in [4] that exploiting the orthonormality and known structure of the basis functions can bring down the complexity of the realization of the transforms. The approach leans on the important property that if an isometry is decomposed into orthogonal operations, then the sensitivity with respect to perturbations in the orthogonal operations' arguments is low. As a result, these arguments can be so perturbed that the orthogonal operations can be implemented at very low cost without deteriorating the global isometry significantly.
More specifically, we have shown the following.
• The LOT, and in fact many other commonly used orthogonal transforms including the DCT, can be recursively decomposed, yielding a set of square orthogonal matrices of decreasing order.
• There exists a set of matrices R_i = [c_i, -s_i; s_i, c_i] which are orthogonal within the range of the required accuracy, and which form a complete set in the sense that any orthogonal matrix U of any reasonable size can be factorized into a sequence of these planar rotations R_i, where the approximation is again within the range of the required accuracy.
• The matrices R_i, called orthonormal μ-rotations, or fast rotations, can be implemented with only a few shift and add operations.
The recursive decomposition of a transform and its VLSI implementation are elaborated upon in [4]. The present paper deals with the approach and procedure to find the optimum approximations of the orthogonal matrices which characterize the recursively decomposed transforms. The basic problem of finding a cost-effective solution is a P-parameter optimization problem, in which the P orthogonal matrices originating from the transform's recursive decomposition have to be approximated simultaneously. The optimization program employs a 2^P-tree branch-and-bound search which exploits the empirically verified (close-to) monotonic behavior of the cost and accuracy functions, and is capable of finding either the solution with the best accuracy for a prescribed maximum cost or the minimum-cost solution for a prescribed lower bound on the accuracy.
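To make the "few shift and add operations" concrete, here is one standard construction (a sketch under our own assumptions, not necessarily the exact factorization of [4, 6]): a planar rotation can be factored into three lifting (shear) steps, and when the two step coefficients are quantized to a few fractional bits, i.e. short sums of signed powers of two, each step reduces to a short shift-add chain while the map remains exactly invertible:

```python
import math

def fast_rotation(x, y, theta, frac_bits=8):
    """Planar rotation by theta realized as three lifting (shear) steps,
    x' = x cos(theta) - y sin(theta), y' = x sin(theta) + y cos(theta),
    with coefficients t = -tan(theta/2) and s = sin(theta) quantized to
    frac_bits fractional bits (so each step is a short shift-add chain)."""
    scale = 1 << frac_bits
    t = round(-math.tan(theta / 2) * scale) / scale
    s = round(math.sin(theta) * scale) / scale
    x = x + t * y      # first shear
    y = y + s * x      # second shear
    x = x + t * y      # third shear
    return x, y
```

With exact (unquantized) coefficients the three shears compose to the exact rotation; quantization perturbs the angle and norm only slightly, which is precisely the low-sensitivity property exploited above.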
2. The LOT and its recursive decomposition
The compression and encoding of images is carried out in a transformed domain using a transform operator A(A), which is an upper triangular block-bounded Toeplitz matrix with as block entries the N x 2N matrix

A = P_A [ A11   A11 J ]
        [ A21  -A21 J ]     (1)
where P_A is a permutation matrix and A21 = B_A A11, with B_A orthogonal. The matrix B_A is the first orthogonal matrix in the sequence of orthogonal matrices which emerge from a recursive decomposition of A. The decomposition goes as follows. Put Ae = (1/2)(A11 + A11 J) and Ao = (1/2)(A11 - A11 J), and let U0 = [Ae^T Ao^T]^T. The matrix A can then be written as

A = P_A [ I   0  ] [ Ae  Ao ] [ I   I ]
        [ 0  B_A ] [ Ao  Ae ] [ I  -I ]     (2)

Now, since U0 = [Ae^T Ao^T]^T with Ae and Ao even- and odd-symmetric, respectively, this matrix has a similar decomposition to A. It turns out that, in fact, the entire decomposition of U0 can be written out recursively as

U_k = [ I   0     ] [ E_k+1  O_k+1 ] [ I   I ]
      [ 0   B_Uk  ] [ O_k+1  E_k+1 ] [ I  -I ],    with U_k+1 = [ E_k+1^T  O_k+1^T ]^T     (3)

where the B_Uk are orthogonal and decreasing in size: N x N, N/2 x N/2, ..., 1 x 1. This recursion is a remarkable property and will in general not exist, that is, the decomposition will in general terminate after the decomposition of U0 [5]. However, most transforms do have this property, and those that do not can still be approximated using the procedure to be described. We have tacitly assumed that similar decomposition and approximation results apply when considering the inverse transform (after decoding) S(S). In fact, for most of the transforms S = A, and the decomposition and approximation of A then also yields the approximation of S. It is important to note that the approximations Â(Â) and Ŝ(Ŝ) are exact inverses of each other.
3. The approximation concept and approach
The approximation is based on the following principle. Let y = Ax, with A an m x n matrix, n > m. If A A^T ≤ I, then there exists an orthogonal matrix Q of size (m+n) x (m+n) such that y = [I 0] Q [0 I]^T x. The matrix Q is an orthogonal embedding of the matrix G = [A_c A], where A_c satisfies A_c A_c^T + A A^T = I. Now Q, and hence A, is approximated by an orthogonal matrix Q̂ = Q_T which is a product of essentially T 2x2 fast rotations, see [4, 6]. The approximation must be such that ŷ = [I 0] Q̂ [0 I]^T x is equal to y within the range of the required accuracy. Moreover, T must be sufficiently small to allow a (single chip) VLSI implementation that is cost-effective and fast enough for real-time compression and coding of, say, image sequences of size 1024 x 1024 at a rate of up to 50 images per second. The factorization of Q is done by choosing the optimal fast rotation q_t according to a greedy criterion at each step t, such that Q_T = q_T ... q_2 q_1 converges rapidly to Q, see [4] for details. The rotation q_t, determined by an index pair (i, j)_t and a rotation angle α_t, is the embedding of a 2x2 fast rotation into the i-th and j-th rows and columns of an (n+m) x (n+m) identity matrix. Each q_t has a certain cost associated with it, which is the number of shift-add-pair operations needed to implement the rotation over the angle α_t, see [6] for details. The approach used is quite general. It is also applicable in cases where A is itself an isometry (as, for example, the LOT) or even an orthogonal matrix (as, for example, the DCT).
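In outline, the greedy factorization can be sketched as follows. This is a toy version under our own assumptions: candidate rotations are drawn from a quantized angle grid rather than from the true fast-rotation set of [6], the residual is measured in the Frobenius norm, and the function name is ours:

```python
import numpy as np

def greedy_factorization(Q, T, angle_bits=6):
    """Approximate the orthogonal matrix Q by a product of T embedded
    2x2 rotations, each chosen greedily to minimize ||Q - Q_t||_F."""
    n = Q.shape[0]
    grid = [np.pi * k / 2**angle_bits for k in range(-2**angle_bits, 2**angle_bits)]
    Qhat = np.eye(n)
    for _ in range(T):
        best_err, best_G = None, None
        for i in range(n):
            for j in range(i + 1, n):       # index pair (i, j)_t
                for a in grid:              # candidate rotation angle
                    G = np.eye(n)
                    c, s = np.cos(a), np.sin(a)
                    G[i, i] = G[j, j] = c
                    G[i, j], G[j, i] = -s, s
                    err = np.linalg.norm(Q - G @ Qhat)
                    if best_err is None or err < best_err:
                        best_err, best_G = err, G
    
        Qhat = best_G @ Qhat                # Q_t = q_t Q_{t-1}
    return Qhat
```

In the real procedure the candidate set would instead contain fast-rotation angles whose tangents are short sums of powers of two, each carrying its own shift-add cost.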
4. Recursive approximate realization of the LOT
A property of the matrices B_A, B_U0, B_U1, ... in the recursive network of the LOT is that all are orthogonal and already mainly diagonal. This property is essential for an even faster convergence of the approximations of these matrices. For the full recursion of the proposed 16 x 32 LOT, we approximate each of the matrices with the matrices B̂_A, B̂_U0, B̂_U1, B̂_U2, using the technique described in the previous section, with respectively T_A, T_U0, T_U1, T_U2 steps in the approximations. The approximation Â follows from the reconstruction using these approximates. For completeness, we have to mention that the recursive decomposition ends with the matrices B_U3 and U4, and that these too are used in the reconstruction. Since both are trivial 1 x 1 matrices (B_U3 = [1]) with an exact realization of no real cost, they need not be approximated. The number of steps
used in the respective approximations of the matrices form an index i = (T_A, T_U0, T_U1, T_U2) to a P-parameter approximation (in this case, P = 4). We call the corresponding approximation Â = fmat(i) the solution belonging to this index, where fmat(i) is the reconstruction function in terms of the index i. Clearly, this function depends on a given factorization of the matrices. We also define the functions fcost(i) and facc(i), both in terms of the index i, as the overall cost function and the overall accuracy of the solution, respectively. We measure the overall cost as the total number of shift-add operations in the resulting network, which is a weighted sum of the cost of the realization of the submatrices and of the introduced butterfly operations. We measure the accuracy of the solution as the norm of the difference between original and approximation, for which we take facc(i) = -log2(||A - Â||), where the approximation Â is given by Â = fmat(i). Analysis of the cost and accuracy functions reveals that the cost function fcost(i) is monotonic throughout the solution space. Hence we can write, for any index i and any increment δ ≥ 0:

fcost(i + δ) ≥ fcost(i),    ∀i, ∀δ ≥ 0     (4)
The accuracy function facc(i) is close to monotonic; that is, monotonicity holds for most indices i and increments δ > 0, but not for all. The disturbance of the monotonicity is small and very local in nature, so that we can set an empirically determined error bound ε > 0 such that we can write:

facc(i + δ) + ε ≥ facc(i),    ∀i, ∀δ ≥ 0     (5)
Further analysis reveals that, when choosing an increment δ in only one dimension, the accuracy function exhibits saturation. This means that an increase of the accuracy in only one of the matrices is cost-effective only up to a certain level, at which the combined accuracies of the other matrices start to play a role and saturation sets in. This is a clear indication that, in order to find cost-effective solutions, a simultaneous approximation of the matrices must be made.
5. The search for optimal solutions
For the search for a cost-effective solution to the approximation problem, let us first define the target accuracy a_target and target cost c_target for the search, and state that any solution i must satisfy both facc(i) ≥ a_target and fcost(i) ≤ c_target. Furthermore, we define a cell C(p, q) in the solution space as the collection of points lying between the bounding indices p and q, with p ≤ q: p ≤ i ≤ q. Hence we can formulate the discriminating property of a cell C(p, q): it contains no cost-effective solutions if facc(q) + ε < a_target or fcost(p) > c_target. We have implemented a heuristic 2^P-tree branch-and-bound search algorithm that is capable of finding cost-effective solutions, either by finding the best solution s for a given target accuracy a_target, such that facc(s) ≥ a_target and fcost(s) is minimal, or by finding the best solution s for a given target cost c_target, such that fcost(s) ≤ c_target and facc(s) is maximal. To explain its operation, we take the first case, with a given target accuracy a_target and an initial target cost of c_target = ∞, and search the entire solution space as follows. First, we factorize each of the matrices B_A, B_U0, ... in the recursive decomposition of the LOT independently, until they reach a sufficient level of (maximum) accuracy, thus setting the bounds of the index to the solution as the number of steps required to reach this maximum accuracy. For the 16 x 32 LOT, the upper bounds are (128, 128, 27, 5), leading to a solution space of size 2.2 x 10^6. For the 32 x 64 LOT, this results in upper bounds of (511, 511, 128, 27, 5) and a solution space of size 4.3 x 10^9. Next, a given cell (the root of the search is the entire solution space) is split into at most 2^P subcells, and each of these is tested for whether it could contain any solutions. If a cell may contain solutions, it is split and checked recursively. If not, the corresponding branch of the search tree is cut off.
If a solution is found during the search, it is used to set the new target cost c_target dynamically, so that fewer cells need to be examined. The result of the search is a solution that satisfies the constraints and has guaranteed minimum cost. We have made the interface between the search program and the objective functions facc(i), fcost(i) such that it can be used for other transforms. We have used it successfully for approximated networks for the MLT, DCT, and wavelet transforms.
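The cell test and recursive splitting can be sketched like this (a toy model, not the authors' program; the monotone cost/accuracy functions passed in are hypothetical stand-ins for fcost and facc):

```python
from itertools import product

def search(p, q, fcost, facc, a_target, eps, best):
    """Branch-and-bound over the cell C(p, q) = {i : p <= i <= q}.
    best = [best_cost_so_far, best_index]; fcost is monotone and facc
    is close-to-monotone up to eps, as in eqs. (4) and (5)."""
    # cell test: no feasible solution, or none cheaper than the best so far
    if facc(q) + eps < a_target or fcost(p) >= best[0]:
        return
    if p == q:
        if facc(p) >= a_target:
            best[0], best[1] = fcost(p), p   # dynamically tightens c_target
        return
    # split into at most 2^P subcells and recurse
    mid = tuple((a + b) // 2 for a, b in zip(p, q))
    for corner in product((0, 1), repeat=len(p)):
        lo = tuple(pk if c == 0 else mk + 1 for pk, mk, c in zip(p, mid, corner))
        hi = tuple(mk if c == 0 else qk for mk, qk, c in zip(mid, q, corner))
        if all(l <= h for l, h in zip(lo, hi)):
            search(lo, hi, fcost, facc, a_target, eps, best)
```

With both objective functions taken as the plain sum of the index (monotone, so ε = 0), the search returns the guaranteed minimum-cost feasible index, and the dynamic cost bound prunes most of the 2^P-tree.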
6. Results and Conclusions
Table 1 shows the results for approximate realizations of the 16 x 32 LOT, of increasing accuracy. The accuracy is shown here as ||A - Â||. Our method shows a rapid convergence: solution number 16 in the table, with a cost of only 776 shift-add-pair operations, is already visually indistinguishable from the original, both in the smoothness of the basis functions and in the frequency responses. As a comparison, a direct implementation would require 512 high-accuracy multiply-add operations (roughly 10,000 additions), without the desirable properties that orthogonal implementations like ours have. Of course, a multiplier implementation following the DCT decomposition is cheaper, but it fails for many transforms, such as our new LOT.

solution   index             target acc.   actual acc.   cost
4          (13 8 3 1)        0.4210        0.4163        314
8          (25 18 5 2)       0.1606        0.1578        494
12         (36 32 7 2)       0.0613        0.0602        644
16         (45 44 11 3)      0.0234        0.0231        776
20         (64 56 11 3)      0.0089        0.0088        900
24         (82 77 13 3)      0.0034        0.0033        1046
28         (90 85 18 4)      0.0013        0.0013        1122
32         (103 96 22 5)     0.0005        0.0005        1194
36         (116 115 24 5)    0.0002        0.0002        1266
40         (127 124 26 5)    0.0001        0.0001        1314

Table 1: Optimal solutions for the 16 x 32 LOT with full recursion depth and varying target accuracy.

We have also tested our search program on partial-depth recursive decompositions of the LOT. Table 2 shows the results from full-depth (level = 5) down to direct (level = 0) implementations. The solutions are targeted at the accuracy of solution 16 of Table 1.

recursion depth   approximated matrices                  index             actual acc.   cost
5                 B_A, B_U0, B_U1, B_U2, (B_U3, U4)      (45 44 11 3)      0.0231        776
4                 B_A, B_U0, B_U1, B_U2, U3              (45 45 11 3 3)    0.0230        780
3                 B_A, B_U0, B_U1, U2                    (49 49 9 17)      0.0231        948
2                 B_A, B_U0, U1                          (49 45 66)        0.0211        1428
1                 B_A, U0                                (56 260)          0.0230        1988
0                 A                                      (691)             0.0226        3808

Table 2: Optimal solutions for the 16 x 32 LOT with different levels of partial recursion and a fixed target accuracy of 0.0233.

The results clearly show that the full-depth recursive decomposition of the LOT, with simultaneous approximation of the submatrices, leads to the best results.
Acknowledgements This research was supported in part by the Dutch National Technology Foundation STW under contract DEL55.3621, and also in part by a grant from the EU in the Erasmus Program.
References
[1] R. Veldhuis and M. Breeuwer. An Introduction to Source Coding. Prentice Hall, New York, 1993.
[2] H.S. Malvar and D.H. Staelin. The LOT: Transform coding without blocking effects. IEEE Trans. on ASSP, 37:553-559, 1989.
[3] Richard Heusdens. Design of lapped orthogonal transforms. IEEE Trans. on Image Processing, to appear.
[4] Gerben J. Hekstra, Ed F. Deprettere, Richard Heusdens, and Zhiqiang Zeng. Efficient orthogonal realization of image transforms. In 1996 IEEE Workshop on VLSI Signal Processing, San Francisco, November 1996.
[5] Richard Heusdens. Overlapped Transform Coding of Images: Theory, Application, and Realization. PhD thesis, Delft University of Technology, 1996.
[6] J. Götze and G. Hekstra. An algorithm and architecture based on orthonormal μ-rotations for computing the symmetric EVD. Integration, the VLSI Journal, 20:21-39, 1995.
Radix Distributed Arithmetic: Algorithms and Architectures
Mohammad K. Ibrahim, De Montfort University, Leicester, UK
Invited Paper
Abstract
In this paper, the concept of radix distributed arithmetic is presented for the first time. The radix approach can be used to describe the arithmetic functionality of the Algebraic Mapping Networks (AlMa-Net), which is a fine grain soft description of Signal Processing (SP) systems architecture. The advantage of using the radix approach is that it results in a wide range of architectures with different trade-offs, one for each radix. Conventional distributed arithmetic is seen as an end point of this spectrum, for radix = 2.
1 Introduction
With advances in VLSI, application designers have a wide range of SP implementation styles with different design trade-offs, such as RISC, programmable DSPs, cores, FPGAs, and ASICs. Furthermore, due to advances in signal processing algorithms, arithmetic, and architectures, system designers have a range of possible algorithms and architectures to choose from for each signal processing function, each with a different trade-off. This has made the design of SP systems much more complex. It involves the evaluation of alternative algorithms as well as hardware and software solutions. In the design of SP systems, Large Grain Data Flow programming languages (LG-DF) are currently used as an interface between system designers and implementers for the following reasons [1]. LG-DF languages are a convenient simulation environment because they are equivalent to using mathematical equations and block diagrams, which is the natural way of describing SP algorithms [1]. For implementers, LG-DF graphs are popular since they do not specify the implementation style, and hence any implementation can be used so long as it maintains the integrity of the dataflow graph [2]. Also, LG-DF graphs show the interdependency between the data streams and hence can be used to exploit large grain parallelism, scheduling and partitioning [1]. Synthesis and design automation tools for SP systems have received a great deal of research attention in recent years, with the aim of translating the LG-DF specification into the final hardware and/or software implementation. This generally involves specifying the target execution platform first, in order that the nodes of the LG-DF description can be written or compiled into the semantics of the host. The host semantics could be C or its variants (for programmable processors), a hardware description language (for dedicated chips), etc.
It is worth noting that in signal processing, dataflow graphs are not used for fine grain specification of the LG nodes, since these SP systems are implemented on machines of a control-flow type. It is also very important to note that LG-DF graphs as a programming language have nothing to do with fine grain data flow architectures or machines [2]. Despite advances in the synthesis tools, however, it is becoming clear that the initial specification of the system at the algorithm level, which is usually given in terms of "algebraic operators", has a considerable influence on the choice of the final implementation. Furthermore, in the majority of cases, those who develop functions or systems at the algorithm level are not fully aware of the implications that their chosen algorithms will have on the final fine grain implementation. The algorithms are selected primarily for their performance with respect to accuracy. There is a great need for a fine grain description of signal processing systems which can be manipulated by those who are involved in the development of functions and systems at the algorithm level, as well as by system implementers. Since modern SP systems use both custom hardware and software running on programmable
CPUs, this description must be applicable to all implementation styles without being specific to a targeted execution platform. This fine grain description will have several advantages, as explained in the next section.
2 Algebraic-Mapping Network (AlMa-Net)
Algebraic-Mapping Network (AlMa-Net) is a generic fine grain description of SP systems currently being developed by the author with colleagues in the DSP Systems and Applications Group at De Montfort University and colleagues at other organizations. The generic "algebraic" nature of AlMa-Net will enable a quick and systematic manipulation and exploration of the different architectural styles that are available, without the need to use their corresponding fine grain semantics. This has several advantages, including:
• SP system designers at the algorithm level can take an active role in the design and implementation of their systems, and can develop a greater understanding of the implications of algorithm selection and implementation style on the final implementation.
• SP system implementers will be able to evaluate different styles of implementation or execution platforms using a generic fine grain description, which avoids the need to first acquire the hardware and software development tools of each execution platform for evaluation.
The AlMa-Net consists of functional nodes, data storage nodes, and edges for communications. These are all described using algebraic expressions. More details are given in section 4. One aspect of AlMa-Net is that, for specific implementation styles, algorithms need to be specified using Radix Algebraic Processors (RaAP). RaAP has been developed in the last few years as a fine grain algebraic description of signal processing functions and systems at the sub-word level [3-6]. The radix methodology has been used extensively in the design of digit-serial architectures [3-6]. In the following sections, its application to the design of distributed-arithmetic algorithms and architectures is reported for the first time.
3 Radix Distributed Arithmetic
Distributed arithmetic is generally used to find the inner vector product with one of the vectors being a constant. Given two vectors with M elements, U = [u1 ... uM] and V = [v1 ... vM], the inner vector product W is given by:

W = Σ_{i=1}^{M} u(i) v(i)     (1)
We can write the elements u(i), i = 1, ..., M, in terms of radix-2^n arithmetic as follows:

u(i) = Σ_{j=0}^{N} u_j(i) 2^{jn}     (2)
where u_j(i) is the j-th digit of u(i), and n is the digit size in terms of the number of bits. Substituting for u(i) in the first equation, and after manipulation, we have

W_j = W_{j-1} + 2^{jn} P_j,   for j = 0, ..., N,   where W = W_N, and     (3)
P_j = Σ_{i=1}^{M} u_j(i) v(i)     (4)
The above two equations completely describe the radix distributed arithmetic algorithm. For a constant vector V, equation 4 describes what is implemented using memory: it is the inner vector product between the vector V and the digit-vector U_j = [u_j(1), ..., u_j(M)]^T. Equation 3 describes how the inner digit-vector products P_j, j = 0, ..., N, are added together. It is interesting to note that when the radix = 2 (n = 1), the resulting algorithm is the conventional distributed arithmetic, which means that the vector V is multiplied by the bit-vector of the input data. In the conventional implementation, the computation of equation 3 is performed in a bit-serial fashion, where the bits of the elements u(i), i = 1, ..., M, are fed serially to the memory to calculate P_j, starting with the LSB. Clearly, the speed of this implementation is limited by the bit-serial computation of the inner vector product. Radix distributed arithmetic generalizes the basic distributed arithmetic concept in that equations 3 and 4 are generic for all radices and do not represent a specific realization. For each radix, however, the equations result in a different structure. As a result, a wide range of distributed arithmetic structures and, hence, trade-offs become available, one for each radix. In the next section, the radix distributed arithmetic architecture is briefly described using AlMa-Net.
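A behavioral model of equations (3) and (4) can check the arithmetic (a sketch for non-negative integer data, not the hardware architecture of the next section; names are ours):

```python
from itertools import product

def radix_da(u, v, n, N):
    """Inner product W = sum_i u(i) v(i) computed digit-serially in
    radix 2^n: P_j is read from a look-up table addressed by the
    digit-vector U_j (eq. 4) and accumulated as in eq. 3."""
    M, radix = len(u), 1 << n
    # precompute the memory contents: one entry per possible digit-vector
    lut = {U: sum(U[i] * v[i] for i in range(M))
           for U in product(range(radix), repeat=M)}
    W = 0
    for j in range(N + 1):                       # least significant digit first
        Uj = tuple((u[i] >> (j * n)) & (radix - 1) for i in range(M))
        W += lut[Uj] << (j * n)                  # W_j = W_{j-1} + 2^(jn) P_j
    return W
```

Setting n = 1 gives conventional bit-serial distributed arithmetic; larger n trades a bigger memory (radix^M entries) for fewer serial steps, which is exactly the radix-dependent trade-off described above.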
4 Radix Distributed Arithmetic Architecture using AlMa-Net
The architecture of the radix distributed arithmetic using AlMa-Net is shown in figure 1. The most significant advantage of the radix and AlMa-Net approach is that the architecture in figure 1 is generic and results in a different structure for each radix; in other words, the architecture in figure 1 is effectively a "soft" architecture. In an AlMa-Net, the functional nodes are denoted as squares. For example, the functional unit in figure 1 is an adder used to perform the addition in equation 3. In an AlMa-Net, the edges are represented using a transmission matrix, which can be considered a space-switching matrix. Also, in AlMa-Net, all data storage nodes (denoted as circles) are described by F(D, N_D, A, RW, E), where the variables can be scalars, vectors or matrices (they should all be of the same type) and each "element" corresponds to a port: D is the data, N_D is the wordlength of the data, A is the address for the ports, RW is the read/write indicator, and E is the enable for the ports. The interesting property of the AlMa-Net function F(D, N_D, A, RW, E) is that it is a programmable mathematical expression which (i) is only activated when the enable parameter(s) is triggered, and (ii) has variables that can be inputs or outputs depending on the corresponding read/write indicators. It can also represent parallelism through the use of vector and matrix parameters. These properties can be extended to nodes other than those that correspond to memory.
where: V = F(Dv, Nv, Av, RWv, Ev), L = F(Dl, Nl, Al, RWl, El), T = F(Dt, Nt, At, RWt, Et), S = F(Ds, Ns, As, RWs, Es), R = F(Dr, Nr, Ar, RWr, Er)

Figure 1: AlMa-Net for the Radix Distributed Arithmetic Architecture
For example, L denotes a latch which has two ports, one input port and one output port. Therefore, Dl = (Wj, Wj-1), Al = (j mod 1, j mod 1), RWl = (W, R), El = (j mod 1, j mod 1). Another example is S, a sampler latch with, again, one input port and one output port; in this case, Ds = (Wj, W), As = (j mod 1, j mod 1), RWs = (W, R), Es = (δ(j mod (M+1)), j mod 1), where δ(k) = 1 for k = 0 and zero otherwise. In the case of R, which is a RAM (possibly a ROM) with one port, Dr = (Pj), Ar = (Uj), RWr = (R), Er = (j mod 1). Similar expressions can be written for V and T, but they are more involved since they are multi-port memory elements. Note that in all cases, the wordlength can also be specified for each port. In figure 1, the transmission matrices of all the edges are equal to the identity matrix. Obviously, as in the conventional case, when the memory size required in R becomes large, address repartitioning methods can be used in the calculation of Pj, e.g.
Pj = Σ_{i=1}^{M/2} uj(i) v(i) + Σ_{i=M/2+1}^{M} uj(i) v(i)
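To make the repartitioning concrete, here is a small Python sketch (our own illustration, not from the paper) of a radix-2 digit-vector inner product computed with two half-size lookup tables instead of one table of size 2^M; the function names and bit ordering are illustrative assumptions.

```python
def build_lut(v):
    """Precompute the inner product of v with every possible bit-vector:
    a table of 2**len(v) partial sums, indexed by the bits as an address."""
    m = len(v)
    return [sum(v[i] for i in range(m) if (addr >> i) & 1)
            for addr in range(2 ** m)]

def inner_product_partitioned(v, u_bits):
    """Pj = sum_i uj(i) v(i), computed as two half-address LUT lookups.

    Two tables of size 2**(M/2) replace one table of size 2**M,
    at the cost of one extra adder."""
    half = len(v) // 2
    lut_lo, lut_hi = build_lut(v[:half]), build_lut(v[half:])
    addr_lo = sum(b << i for i, b in enumerate(u_bits[:half]))
    addr_hi = sum(b << i for i, b in enumerate(u_bits[half:]))
    return lut_lo[addr_lo] + lut_hi[addr_hi]
```

For M = 16 coefficients this replaces a 65536-entry memory with two 256-entry memories plus one adder, which is exactly the memory-versus-logic tradeoff that address repartitioning targets.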
Furthermore, the operation in the dashed box of figure 1 can be implemented using a variety of ways.
5
Summary:
In conclusion, radix distributed algorithms and the AlMa-Net description of the corresponding architecture are described in this paper. The main advantages of the radix approach and the AlMa-Net are that (i) they are generic, fine-grain descriptions, and (ii) they represent soft architectures, because each describes a family of structures, one for each specific value of the parameters.
Acknowledgment: The author would like to thank the following colleagues for their valuable discussions on the radix approach and the AlMa-Net: Dr Amar Aggoun, Dr Akil Bashagha, Christine Hillyar, and Dilip Chauhan of De Montfort University, Dr Kamran Kordi of GEC Hirst Research, and Ahmed Ashur, Mujahed Mekallalati, Dr Leon Harrision of Nottingham University. The author would also like to acknowledge the funding support of De Montfort University through the Vice-Chancellor's Research Initiative.
References:
[1] E. A. Lee and D. G. Messerschmitt, "Static scheduling of synchronous data flow programs for digital signal processing," IEEE Trans. on Computers, vol. 36, no. 1, pp. 24-35, 1987.
[2] E. A. Lee and T. M. Parks, "Dataflow process networks," Proc. IEEE, vol. 83, no. 5, pp. 773-799, 1995.
[3] A. Aggoun, A. Ashur and M. K. Ibrahim, "A Novel Cell Architecture for High Performance Digit-Serial Computation," Electronics Letters, vol. 29, pp. 938-940, 1993.
[4] M. K. Ibrahim, "Radix Multiplier Structures: A Structured Design Methodology," IEE Proceedings Part E, vol. 140, pp. 185-190, 1993.
[5] A. E. Bashagha and M. K. Ibrahim, "High radix digit-serial division," accepted for publication in IEE Proceedings on Circuits, Systems, and Devices.
[6] A. Aggoun, M. K. Ibrahim and A. Ashur, "Bit-level pipelined digit-serial processors," accepted for publication in IEEE Transactions on Circuits and Systems.
Order-Configurable Programmable Power-Efficient FIR Filters*
Chong Xu, Ching-Yi Wang and Keshab K. Parhi
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455
Abstract
We present a novel VLSI implementation of an order-configurable, coefficient-programmable, and power-efficient FIR filter architecture. This single-chip architecture contains 4 multiply-add functional units, and each functional unit can have up to 8 multiply-add operations time-multiplexed (or folded) onto it. Thus one chip can be used to realize FIR filters with lengths ranging from 1 to 32, and multiple chips can be cascaded for higher order filters. To achieve power-efficiency, an on-chip phase locked loop (PLL) is used to automatically generate the minimum voltage level needed to achieve the required sample rate. Within the PLL, a novel programmable divider and a voltage level shifter are used in conjunction with the clock rate to control the internal supply voltage. Simulations show that this chip can be operated at a maximum clock rate of 100 MHz (folding factor of 1 or filter length of 4). When operated at 10 MHz, this chip consumes only 27.45 mW using an automatically set internal supply voltage of 2V. For comparison, when the chip is operated at 10 MHz and 5V, it consumes 109.24 mW. At 100 MHz, the chip consumes 891 mW with a 4.5V supply that is automatically generated by the PLL. This design has been implemented using Mentor Graphics tools for an 8-bit word-length and 1.2 µm CMOS technology.
1
Introduction
With the recent explosion of portable and wireless real-time digital signal processing applications, the demand for low-power circuits has increased tremendously [1]-[3]. This demand has been satisfied by utilizing application specific integrated circuits, or ASICs; however, ASICs allow for very little reconfigurability. Another new trend is the need to minimize the design cycle time. Therefore many programmable logic devices (PLDs) (e.g., field-programmable gate arrays) are being utilized for prototyping and even production designs [4]. The main disadvantage of these PLDs is that they suffer from slow performance because their architectures have been optimized for random logic and not for digital signal processing implementations. In this paper, a solution for the implementation of high-speed, low-power, and order-configurable finite impulse response (FIR) filters is presented. This architecture was designed by applying the folding and retiming transformations, and the filter order can vary from 1 to 31 using one chip. Multiple chips can be cascaded to achieve higher order FIR filters. This new architecture consists of two parts: a configurable processor array (CPA) [5] and a phase locked loop (PLL). The CPA contains the multiply-add functional units, and the PLL is designed to automatically vary the internal voltage to match the desired throughput rate and minimize the peak power dissipated by the CPA. We utilize a novel programmable divider and a voltage level shifter in conjunction with the clock to control the internal supply voltage. The CPA portion contains folded multiply-add (FMA) units which operate in two phases: the configuration phase, where the processor array is programmed for a specific sample rate and filter order, and the execution phase, where the processor array performs the desired filtering operation. We also implemented novel programmable subcircuits that provide the order configurability of the architecture.
This design has been implemented using Mentor Graphics tools and 1.2 µm CMOS technology. In section 2, we briefly describe how the CPA is derived and the design parameters. In section 3, the design of the CPA components is described in more detail, and section 4 describes the PLL components. Simulation results are provided in section 5 to demonstrate the effectiveness of the design and the power savings.
2
Background
Consider the transpose-form architecture of a 6-tap FIR filter that realizes the function y(n) = a0 x(n) + a1 x(n-1) + a2 x(n-2) + a3 x(n-3) + a4 x(n-4) + a5 x(n-5). If we implement this 6-tap filter using 2 multiply-add functional units, which corresponds to using a folding factor of 3 [6] (i.e., 3 multiply-add
*This research was supported by the Advanced Research Projects Agency and the Solid State Electronics Directorate, Wright-Patterson AFB, under contract number AF/F33615-93-C-1309.
operations are folded onto the same functional unit), we will have the folded architecture shown in Fig. 1. This architecture consists of folded multiply-add (FMA) units. The inputs and outputs (x(n) and y(n)) to each FMA will hold the same sample data for three clock cycles before changing to the next sample.

Figure 1: The folded architecture of the 6-tap FIR filter (folding factor = 3).

To completely pipeline the folded architecture, additional delays are introduced at the input (x(n)) by using the retiming transformation [7] along with pipelining. This modified structure is now periodic with a period of three clock cycles (or 3-periodic). This technique can be applied to any N-tap FIR filter for any folding factor p. To achieve programmability and the CPA architecture, we convert the fixed number of registers in Fig. 1 into programmable delays that are constrained by a maximum folding factor Pmax, as shown in Fig. 2. To implement an N-tap filter using this architecture, a total of M (where M = ⌈N/p⌉) FMA modules are required. This CPA architecture is a periodic system with period p; therefore it is designed to produce filter outputs from module FMA0 in clock cycles (t mod p) = 0 (where t = time in clock cycles) and hold them for p cycles. Note that mux4 in Fig. 2 is only required for module FMA0, to hold the filter output data for p clock cycles, and is redundant in the other FMAj modules (j ≠ 0). These other multiplexers can be replaced by a single delay along with sharing of the (p - 1) registers in the feedback accumulation path. The switching times of all of the programmable multiplexers are summarized in Table 1.
Figure 2: A configurable processor array (CPA) for N-tap FIR filters which is p-periodic.

mux#   mux definition
1      a_i in clock cycle ((p - 1)(j + 1) + i) mod p
2      I in clock cycle ((p - 1)(j + 1) - 1) mod p
3      I in clock cycle ((p - 1)(j + 1) - 1) mod p
4      I in clock cycle ((p - 1)(j + 1)) mod p

Table 1: Multiplexer definitions
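The folding schedule can be checked functionally in software. The sketch below is our own model, not the paper's hardware: it ignores pipelining, retiming, and the multiplexer timing of Table 1, and only verifies the arithmetic of folding an N-tap FIR onto M = ⌈N/p⌉ multiply-add units, each performing p operations per output sample.

```python
import math

def folded_fir(x, coeffs, p):
    """Model an N-tap FIR folded onto M = ceil(N/p) multiply-add units.

    Unit j handles taps j*p .. j*p + p - 1, i.e. p folded operations
    per output sample."""
    n_taps = len(coeffs)
    m_units = math.ceil(n_taps / p)
    y = []
    for n in range(len(x)):
        acc = 0
        for j in range(m_units):        # one pass per FMA module
            for k in range(p):          # p time-multiplexed ops per module
                tap = j * p + k
                if tap < n_taps and n - tap >= 0:
                    acc += coeffs[tap] * x[n - tap]
        y.append(acc)
    return y
```

With six coefficients and p = 3 this uses M = 2 units, matching the 6-tap example of Fig. 1; any p gives the same output sequence, differing only in how the work is scheduled.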
Before implementing this general structure, we had to set values for Nmax and Pmax. We chose to set Nmax (the maximum number of taps) to 32 because an FIR filter will provide good performance for filter lengths around 32. We set Pmax (the maximum folding factor) to 8 because we wanted Pmax to be a power of 2 and desired greater flexibility with minimal control overhead. With Nmax = 32 and Pmax = 8, a total of 4 FMA modules needed to be integrated onto a single chip.
3
Configurable Processor Array
The 8-bit parallel multiplier is a key part of the CPA module because it determines the critical path of the system. We chose to utilize the Baugh-Wooley algorithm for the multiplier because the control overhead is smaller than other algorithms (e.g., Booth recoding) and the full-adders are not wasted on sign extensions. This algorithm generates a matrix of partial product bits and a fast multi-operand adder [8] was employed
to accumulate these partial products. To minimize the critical path in the accumulation path, we used the Wallace tree approach [9]. In the CPA design of Fig. 2, we see that the feedback accumulation path requires p - 1 synchronization registers. Because p is a programmable parameter, p - 1 can range from 0 to 7 (Pmax - 1), so we implemented these registers as a programmable delay line, as shown in Fig. 3. Each delay line contains seven 8-bit registers, seven 8-bit multiplexers, and one control unit. The control unit is a simple decoder that converts p into seven control bits; each control bit directs the data through or around a delay.
Figure 3: p - 1 programmable delay line.

The multiplexers mux2, mux3 and mux4 shown in Fig. 2 are 2-to-1 p-periodic multiplexers. Their function is to select input I in one of p clock cycles. These multiplexers use a 3-bit (⌈log2(Pmax)⌉-bit) binary counter with asynchronous reset and synchronous parallel load. In addition, two 3-bit registers and a comparator are used in the control circuitry of each multiplexer. One register holds p and the second holds a programmed clock cycle value ranging from 0 to p - 1. When the counter output equals the held clock cycle value, the controller allows the data on I to pass to the output. The final multiplexer in Fig. 2, mux1, is a programmable p-to-1 p-periodic multiplexer which consists of one 8-bit 8-to-1 multiplexer and one control unit. At each counter state one of p control lines will be high to activate the p-to-1 multiplexer.
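Behaviorally, a 2-to-1 p-periodic multiplexer passes input I in exactly the one clock cycle per period where the mod-p counter matches the programmed value, and passes the other input otherwise. A short Python model of that behavior (the function and parameter names are our own):

```python
def p_periodic_mux(p, select_cycle, stream_i, stream_other):
    """2-to-1 p-periodic multiplexer: output I when the mod-p counter
    equals the programmed select_cycle (in 0 .. p-1), else the other input."""
    return [a if t % p == select_cycle else b
            for t, (a, b) in enumerate(zip(stream_i, stream_other))]
```

With p = 3 and select_cycle = 1, input I appears on the output only in cycles 1, 4, 7, ..., matching the one-in-p selection described above.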
4
Phase Locked Loop
Reducing the supply voltage of VLSI chips is commonly used to save power; however, it also slows down the critical path of the circuit. If the supply voltage is reduced too much, the critical path will become too slow to assure correct functionality of the design. Therefore we designed a phase locked loop (PLL) circuit that automatically controls the internal supply voltage to provide the lowest voltage allowable while still achieving the throughput required for the application [10]. The PLL consists of a phase detector, a charge pump with a loop filter, a voltage controlled oscillator (VCO), a programmable divider, and a voltage level shifter. All of these components form a feedback circuit that automatically adjusts the voltage level as required by the programmed parameters and the clock speed. The schematic of the programmable divider used in the PLL is shown in Fig. 4. To achieve a 50% duty cycle, we had to accommodate three possible cases of p. If p is 1, the input clock simply passes through the divider without any change. For even p, the divider toggles its output every p/2 input clock cycles by using a programmable counter. When p is odd (p > 1), the divider must alter the output every (p - 1)/2 + 1/2 input clock cycles. This means the output may toggle at either the rising edge or the falling edge of the input clock. To detect the edge where the divider should toggle its output, we utilize two programmable counters: one to detect rising edges, and the other to detect falling edges. These counters generate a series of pulses representing edges, and an OR gate combines them into a single pulse. Finally, the Toggle component alters the output according to the pulses generated by the OR gate. The two multiplexers in Fig. 4 select the appropriate clock output from the three cases depending on the value of p.
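The three cases (p = 1, even p, odd p) collapse into a single rule if time is counted in half-cycles of the input clock: the divider output toggles every p half-cycles, which is p/2 full cycles for even p and (p - 1)/2 + 1/2 cycles for odd p. The following is a behavioral sketch of that rule (our own model, not the gate-level circuit of Fig. 4):

```python
def divide_clock(p, n_input_cycles):
    """Divide-by-p with 50% duty cycle, modeled at half-cycle resolution.

    The output toggles on every p-th input half-cycle, which covers even
    and odd p uniformly (and p = 1, where toggling every half-cycle
    reproduces the input clock)."""
    out, level = [], 0
    for half in range(2 * n_input_cycles):
        if half > 0 and half % p == 0:
            level ^= 1                  # toggle on every p-th half-cycle
        out.append(level)
    return out                          # two samples per input clock cycle
```

For odd p the toggles alternate between rising-edge and falling-edge positions of the input clock, which is exactly why the hardware needs both a rising-edge and a falling-edge counter.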
Figure 4: Programmable divider

The function of the voltage level shifter (VLS) is to raise the output voltage of the loop filter to a level usable in the CPA. By sizing transistors in the VLS, we can adjust the amount of voltage that will be shifted (known as the voltage shift level). However, the power consumption of the voltage level shifter will increase with an increase in the voltage shift level. So there is a tradeoff between power consumption and the voltage
shift level. Our experiments have shown that a shift of 0.6V provided enough internal voltage to safely operate the CPA within the design specifications while minimizing the power consumption.
5
Simulation
Using Mentor Graphics tools, simulations determined the critical path of the design to be 7 ns at the schematic level, which means that it is safe to operate the architecture at up to 100 MHz. The CPA was designed to be operated with sample rates in the range of 10 MHz to 100 MHz, which corresponds to an internal clock rate of 1.125 MHz (with p = 8) to 100 MHz (with p = 1). This range of frequencies corresponds to an internal power supply range of 4.5V to 2.0V. Efficient power consumption is one of the important features of our design, and Table 2 shows the power consumption in mW for each CPA component at different frequencies and power supplies.

Component     5V, 100MHz   4.5V, 100MHz   5V, 10MHz   2.0V, 10MHz
multiplier    140.6        112.5          14.23       1.98
pmux(p-1)     5.17         3.85           1.14        0.050
adder         18.8         16.52          2.18        0.28
pldelay       60           43.2           6.03        0.77
pmux(2-1)     11.6         9.5            0.65        0.063
ldelay        8            5.63           0.9         0.099
FIR(digital)  1101.48      863.32         109.24      13.87

Table 2: Power consumption for digital parts of FIR filter in mW

From Table 2, we can see that at 100 MHz, the CPA without the PLL and using a 5V supply voltage will consume 1101.48 mW. By utilizing the PLL-generated supply voltage for 100 MHz (4.5V), the power consumption can be reduced to 863.32 mW. At 10 MHz, we can save 95.37 mW by using the PLL supply voltage automatically generated for 10 MHz versus a 5V supply. Of course the PLL will consume some power of its own, and the results of power consumption simulations for the various components of the PLL are listed in Table 3. From Table 3, we can see that even if we include the power consumption of the PLL, we will still save 210.06 mW at 100 MHz, and 81.79 mW at 10 MHz.
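These savings roughly follow the first-order CMOS dynamic-power model P ∝ V²·f. The helper below is our own illustration of that scaling; it slightly overestimates the simulated figures (e.g. it predicts about 892 mW at 4.5V/100MHz versus the simulated 863.32 mW), since the simple model ignores second-order effects.

```python
def dynamic_power_scale(p_ref_mw, v_ref, f_ref, v_new, f_new):
    """First-order CMOS dynamic power estimate: P = C * V^2 * f, so a
    reference measurement scales by (V_new/V_ref)^2 * (f_new/f_ref)."""
    return p_ref_mw * (v_new / v_ref) ** 2 * (f_new / f_ref)
```

For example, scaling the 5V/10MHz figure of 109.24 mW down to the PLL-chosen 2.0V gives an estimate of about 17.5 mW, in the same range as the simulated 13.87 mW.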
[Table values: power of the phase detector, charge pump, loop filter, VCO, level shifter, and divider, and the total, at 100 MHz and at 10 MHz]

Table 3: Power consumption for PLL parts in mW
References
[1] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proceedings of the IEEE, vol. 83, pp. 498-523, April 1995.
[2] D. Singh, J. M. Rabaey, M. Pedram, F. Catthoor, S. Rajgopal, N. Sehgal, and T. J. Mozdzen, "Power conscious CAD tools and methodologies: A perspective," Proceedings of the IEEE, vol. 83, pp. 570-, April 1995.
[3] A. P. Chandrakasan and R. W. Brodersen, "Design of portable systems," in IEEE Custom Integrated Circuits Conference, (San Diego, CA), pp. 259-266, May 1994.
[4] S. D. Brown, "An overview of technology, architecture and CAD tools for programmable logic devices," in IEEE Custom Integrated Circuits Conference, (San Diego, CA), pp. 69-76, May 1994.
[5] V. Visvanathan and S. Ramanathan, "Synthesis of Energy-Efficient Configurable Processor Arrays," in International Workshop on Parallel Processing, 1994.
[6] K. Parhi, C. Wang, and A. P. Brown, "Synthesis of control circuits in folded pipelined architectures," IEEE J. Solid-State Circuits, vol. 27, pp. 29-43, Jan. 1992.
[7] C. E. Leiserson and J. Saxe, "Optimizing synchronous systems," in VLSI and Computer Systems, pp. 41-67, 1983.
[8] I. Koren, Computer Arithmetic Algorithms. Prentice-Hall, 1993.
[9] C. S. Wallace, "A suggestion for a fast multiplier," Computer Arithmetic, vol. 1, pp. 114-117, 1990.
[10] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective. Addison-Wesley, 2nd ed., 1993.
Session S: VIDEO CODING III: MULTIMEDIA
ON SPEECH COMPRESSION STANDARDS IN MULTIMEDIA VIDEOCONFERENCING: IMPLEMENTATION ASPECTS
Milan Marković¹, Zoran Bojković²
¹Institute of Applied Mathematics and Electronics, Kneza Miloša 37, 11000 Belgrade, Yugoslavia, e-mail: [email protected]
²Faculty of Transport and Traffic Engineering, Vojvode Stepe 305, 11000 Belgrade, Yugoslavia

Abstract: In this paper, standard algorithms for coding of narrowband 3.2 kHz and wideband 7 kHz speech for N-ISDN multimedia videoconferencing (a part of the overall ITU-T H.320 family of standards), as well as for very low bit rate multimedia communications (a part of the overall ITU-T H.324 family of standards), are considered. The possibilities of real-time implementations of the considered algorithms on a hardware module with a single digital signal processor are considered too.

1. Introduction
Speech compression has advanced rapidly in recent years, spurred on by cost-effective digital technology and diverse commercial applications. The surprising growth of activity in the relatively old subject of speech compression is driven by the insatiable demand for voice communication, by the new generation of technology for cost-effective implementation of digital signal processing algorithms, by the need to conserve bandwidth in both wired and wireless telecommunication networks, and by the need to conserve disk space in voice storage systems. Most of this effort is focused on the usual telephone bandwidth of roughly 3.2 kHz (200 Hz to 3.4 kHz). Interest in wideband (7 kHz) speech for audio in videoconferencing has also increased in recent years. Within the wired network the requirements on speech compression are rather tight, with strong restrictions on quality, delay, and complexity. Since standards are essential for compatibility of terminals in voice communication systems, standardization of speech compression algorithms has lately become of central importance to industry and government.
In this paper, standard algorithms for compression of narrowband 3.2 kHz and wideband 7 kHz speech for "network" applications, such as multimedia videoconferencing for basic (2B+D) or primary (30B+D) N-ISDN access, are considered. In these applications, speech compression is used in connection with the ITU-T H.261 p×64 kb/s (p = 1,...,30) video compression standard [1]. As the most popular speech compression algorithms in these applications, ITU-T G.728 for narrowband and G.722 for wideband speech (as a part of the overall ITU-T H.320 family of standards) are used [2]. Also, the ITU-T G.723 dual rate speech coding standard algorithm [3] (as a part of the overall ITU-T H.324 family of standards) for very low bit rate multimedia communication over wireless and PSTN systems is considered too. The possibilities of real-time implementations of the considered standard algorithms on a hardware module with a single digital signal processor are elaborated.
2. Speech compression standards for multimedia videoconferencing
After the adoption of the earlier speech compression standards, ITU-T G.711 PCM 64 kb/s [4] and G.721 ADPCM 32 kb/s [5], it can be concluded that wired telephone network speech quality is achievable by using the ITU-T G.728 LD-CELP 16 kb/s compression standard with less than 2 ms coding delay [6]. In other words, after the establishment of the G.728 standard, there is relatively little remaining interest in these applications and this bit rate [7]. Namely, for many applications, especially when echo cancellation is involved, the time delay introduced by speech coding into the communications link is a critical factor in overall system performance. Typical delays of 60 to 100 ms, and occasionally even higher, are common in speech coders. Also, algorithms which include error correcting codes and bit interleaving to combat high channel error rates can incur a substantial additional delay. In 1988, the ITU-T established a maximum delay requirement of 5 ms, with a desired objective of only 2 ms, for a 16 kb/s standard algorithm. This culminated in the adoption of the LD-CELP G.728 algorithm in 1992. The G.728 speech compression algorithm, shown in Fig. 1, achieves a one-way coding delay of less than 2 ms by making both the LPC predictor and the excitation gain backward adaptive, and by using a small excitation vector size of five samples. The pitch predictor is not used, due to its sensitivity to channel errors, and the resulting performance loss is compensated for by increasing the LPC predictor order from 10 to 50. The excitation gain is updated by a 10th-order adaptive linear predictor based on the logarithmic gains of previously quantized and scaled excitation vectors. The LPC predictor and the gain predictor are updated by performing LPC analysis on previously coded speech and the previous log-gain sequence, respectively, with the autocorrelation coefficients calculated by a novel hybrid windowing method.
The excitation codebook is closed-loop optimized, and its index is Gray-coded for better robustness to channel errors. An adaptive postfilter is used at the decoder to improve coder performance. The official ITU-T laboratory tests revealed that the speech quality of this 16 kb/s LD-CELP coder is either equivalent to or better than that of the ITU-T G.721 ADPCM 32 kb/s standard coder for almost all conditions tested. Recently, the ITU-T has been conducting a standardization study of medium-delay coders where the delay requirements allow a total codec delay of at most 32 ms [8,9].
Figure 1: Block diagram of the ITU-T G.728 LD-CELP 16 kb/s narrowband speech compression algorithm: (a) encoder; (b) decoder.

The ITU-T G.722 ADPCM 64, 56, or 48 kb/s standard algorithm for 7 kHz wideband speech coding is based on a two-band subband coder, with ADPCM coding of each subband, as shown in Fig. 2 [10,11,12]. The transmit part of the G.722 coder converts a digital signal, coded using 14-bit uniform PCM with 16 kHz sampling, into a 64 kb/s bit stream by using the subband ADPCM technique. The decoder performs the reverse operation, noting that the effective bit rate at the input of the decoder can be 64, 56, or 48 kb/s depending on the mode of operation. Namely, the use of an embedded coding technique for ADPCM permits operation of the low-frequency subband at three quantizing rates (i.e., 6, 5, or 4 bits per sample), with corresponding bit rates of 64 (mode 1), 56 (mode 2), or 48 kb/s (mode 3). Modes 2 and 3 enable an auxiliary data channel, with capacities of 8 and 16 kb/s respectively, for simultaneous data transmission (such as text, fax, telewriting) over the 64 kb/s (B channel) basic rate channel [13]. The filter banks that are used for analysis and synthesis produce a communication delay of about 3 ms. Recently, there has been considerable interest in wideband speech coding for ISDN and videoconferencing applications. Effective coding schemes, often based on CELP, that achieve at 32 kb/s the same quality as the ITU-T G.722 algorithm at 64 kb/s have been developed [14,15,16].
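The mode arithmetic described above is easy to tabulate. In the sketch below, the dictionary layout is our own, but the numbers come from the text: each subband is sampled at 8 kHz, so the lower subband's rate is its bits-per-sample times 8 kHz, the upper subband is fixed at 2 bits per sample (16 kb/s), and speech bits plus auxiliary data always fill the 64 kb/s B channel.

```python
# G.722 operating modes: lower-subband bits/sample, speech bit rate,
# and auxiliary data capacity (values as given in the text above).
G722_MODES = {
    1: {"low_band_bits": 6, "speech_kbps": 64, "aux_kbps": 0},
    2: {"low_band_bits": 5, "speech_kbps": 56, "aux_kbps": 8},
    3: {"low_band_bits": 4, "speech_kbps": 48, "aux_kbps": 16},
}

def channel_total_kbps(mode):
    """Speech plus auxiliary data: always the full 64 kb/s B channel."""
    m = G722_MODES[mode]
    return m["speech_kbps"] + m["aux_kbps"]
```

The consistency check low_band_bits × 8 kHz + 16 kb/s = speech rate holds for all three modes, which is exactly what "embedded coding" of the lower subband means here: dropping the least significant 1 or 2 bits per sample frees 8 or 16 kb/s for data.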
Figure 2: Block diagram of the ITU-T G.722 ADPCM (64, 56, 48 kb/s) 7 kHz wideband speech compression algorithm

The ITU-T G.723 standard [3] specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate (for wireless and PSTN multimedia communications). This coder has two bit rates associated with it, 5.3 and 6.3 kb/s. The higher bit rate has greater
quality. The lower bit rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder, and it is possible to switch between the two rates at any 30 ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also possible. The coder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The excitation signal for the high rate coder is Multipulse Maximum Likelihood Quantization (MP-MLQ), and for the low rate coder it is Algebraic Code-Excited Linear Prediction (ACELP). The frame size is 30 ms and there is an additional look-ahead of 7.5 ms, resulting in a total algorithmic delay of 37.5 ms. All additional delays in this coder are due to processing delays of the implementation, transmission delays in the communication link, and buffering delays of the multiplexing protocol. The block diagram of the encoder is shown in Figure 3.
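The delay and payload figures quoted above follow from simple arithmetic, sketched here (the helper names are ours): the algorithmic delay is the 30 ms frame plus the 7.5 ms look-ahead, and each frame carries rate × 30 ms bits.

```python
def g723_algorithmic_delay_ms(frame_ms=30.0, lookahead_ms=7.5):
    """Total algorithmic delay = frame buffering + look-ahead (37.5 ms)."""
    return frame_ms + lookahead_ms

def g723_frame_bits(rate_kbps, frame_ms=30.0):
    """Bits produced per frame: kb/s times ms gives bits directly."""
    return rate_kbps * frame_ms
```

At 6.3 kb/s each 30 ms frame carries 189 bits; at 5.3 kb/s, 159 bits. Rate switching at frame boundaries therefore changes the frame payload but not the 37.5 ms algorithmic delay.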
Figure 3: Block diagram of the ITU-T G.723 speech coder
3. Implementation aspects
A key requirement for most speech coders is that their computational requirements fall well within the range of modern digital signal processing (DSP) chips, so that the coders can be implemented both inexpensively and efficiently. Lists of computational requirements, measured in millions of instructions per second (MIPS), for telephone bandwidth coders, wideband speech coders, and audio coders are presented in [17,18]. As for the multimedia N-ISDN videoconferencing applications, the complexities of the ITU-T G.728 and G.722 standard algorithms are about 19 MIPS and 9 MIPS, respectively. Based on the computational capacities of modern signal processors, as presented in Table 1, it can be concluded that the realization of the mentioned standard algorithms on a single digital signal processor is possible.

Table 1: Basic characteristics of modern DSPs

DSP        Frequency (MHz)  Word length (bits)  Point     MIPS (max)
TMS320C25  50               16                  fixed     12.5
TMS320C30  33/40            32                  floating  20
TMS320C31  27/33/40         32                  floating  20
TMS320C40  40/50            40                  floating  25
DSP32C     50               32                  floating  12.5
ADSP2100   40               16                  fixed     10
ADSP2101   12.5             16                  fixed     12.5
ADSP21020  20/25            40                  floating  20
DSP56001   20/27/33         24                  fixed     16.5
DSP56156   40/60            16                  fixed     30
DSP96002   33/40            32                  floating  20
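Pairing the complexities quoted above (about 19 MIPS for G.728 and 9 MIPS for G.722) with the MIPS column of Table 1 gives a crude feasibility check. The helper below is our own simplification: it compares raw MIPS budgets only, ignoring memory, I/O, and instruction-set fit.

```python
def fits_on_dsp(algorithm_mips, dsp_mips_max, full_duplex=False):
    """Does the DSP's peak MIPS budget cover the algorithm?  A
    full-duplex codec (encoder + decoder) roughly doubles the need."""
    needed = algorithm_mips * (2 if full_duplex else 1)
    return needed <= dsp_mips_max
```

By this rough measure, full-duplex G.722 (2 × 9 = 18 MIPS) fits on a DSP56156 (30 MIPS), consistent with the single-chip implementation cited below, while G.728 (about 19 MIPS one way) exceeds a single DSP32C (12.5 MIPS), consistent with the two-chip implementation in [6].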
There are examples of real-time implementations of the considered standard compression algorithms. A full-duplex real-time implementation of the G.722 algorithm on a single Motorola DSP56156 is shown in [19]. On the other
side, as shown in [6], the complete G.728 algorithm is implemented on two AT&T DSP32C signal processors (one for the coder and another for the decoder), while in [20], a full-duplex implementation of the G.728 algorithm, except for the adaptive postfilter in the decoder, is realized on a single TI TMS320C31 signal processor. Besides, there are examples of software implementations of the full-duplex G.728 standard algorithm on a single ADSP21020 signal processor [21]. As for the very low bit rate multimedia applications, the ITU-T G.723 coder was optimized to represent speech with high quality at the above rates using a limited amount of complexity. Fixed and floating point C code realizations of this coder are specified in [3] (the floating point version is specified in Annex B to Rec. G.723) and are available from the ITU-T. Having in mind that the G.723 coder belongs to the class of analysis-by-synthesis coding algorithms, its computational requirements are approximately similar to, or less than, the requirements of the G.728 standard algorithm.

4. Conclusion
This paper is dedicated to the consideration of the speech compression standard algorithms for use in multimedia videoconferencing for N-ISDN, as well as for PSTN and wireless multimedia communication systems (as parts of the H.320 and H.324 families of standards). The basic characteristics of the ITU-T G.728, G.722, and G.723 speech coding standards, and the possibilities of real-time implementations of these algorithms on a hardware module with a single modern digital signal processor, are elaborated in the paper. Based on the computational capacities of modern signal processors and some examples of real-time implementations, it can be concluded that the full-duplex realization of the mentioned standard algorithms on a single digital signal processor is possible.

Acknowledgements: This paper was supported by the Ministry of Sciences and Technology of the Republic of Serbia, projects No. 04M02 and Telecommunications No.
448, through Institute of Mathematics SANU Belgrade amd Faculty of Electrical Engineering Belgrade, respectively. References [11 M.Liou, 'Overview of the px64 kbit/s Video Coding Standard," Communicationsof the ACM, Vol. 34, No. 4, Apr. 1991. [21 J.Dampz, R.Klotsche, and M.Weiss, 'Multimedia Terminals: Advantages, Technology, Networking," Electronic Communications, Alcatel, 4th Quart. 1993, pp. 387-393. [31 ITU-T, 'Draft Recommendation G.723 - Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 & 6.3 kbit/s, October 17, 1995. [41 CcITr, 'Recommendation G.711 - Pulse Code Modulation (PCM) of Voice Frequencies," CC17T Red Book, vol. III, fasc. 111.3,VllIth Plenary Assembly, Malaga,-Torremolinos, Spain, Oct. 8-19, 1984. [51 CCITT, "Annex 1 to Report of Working Party XVIII/2 - Report of the Work of the Ad Hoc Group on 32-kb/s ADPCM," COMM, XVIII-R 28-E, 1984. [61 J-H.Chen, 1LV. Cox, Y.-C.Lin, N.Jayant, M.J.Melchner, '~k low delay CELP coder for the CCITr 16 kb/s SlXxx:hcoding standard," 1EEEJ. Sel. Areas Commun., vol. 10, pp. 830-849, June 1992. [71 A.Gersho, "Advances in Speech and Audio Compression," Proceed. of the 1EEE, No. 6, June 1994, pp.900-918. ISl S.Hayashi, M.Taka, 'Standardization activities on 8 kbit/s speech coding in CCITI' SGXV," in Proc. IEEE Int. Conf. on Wireless Communications, Vancouver, Canada, June 1992, pp. 188-191. [91 R.Salami, C.Laflamme, J-P.Adoul, A.Kataoka, S.Hayashi, C.Lamblin, D.Massaloux, S.Proust, P.Kroon, Y.Shoham, 'Description of the ITU-T 8 kb/s Speech Coding Algorithm," in Proc. 1EEE Workshop on Speech Coding for Telecomunications, Annapolis, USA, 1995. [101 CCITt, 'Recommendation G.722; 7 kHz audio coding within 64 kbit/s," SG XVIII Rep. R26 (C), Aug. 1986. [1 l lX.Maitre, '7 kHz Audio Coding Within 64 kbit/s," 1EEE Journal on Selected Areas in Commun., Vol. 6, No. 2, February 1988, pp. 283-298. 
[12] P. Mermelstein, "G.722, a new CCITT coding standard for digital transmission of wideband audio signals," IEEE Communications Magazine, Vol. 26, No. 1, pp. 8-15, Jan. 1988.
[13] M. Taka, S. Shimada, T. Aoyama, "Multimedia Multipoint Teleconference System Using the 7 kHz Audio Coding Standard at 64 kbit/s," IEEE Journal on Selected Areas in Commun., Vol. 6, No. 2, February 1988, pp. 299-306.
[14] A. Fuldseth, E. Harborg, F.T. Johansen, J.E. Knudsen, "Wideband speech coding at 16 kbit/s for a videophone application," Speech Commun., Vol. 11, No. 2-3, June 1992, pp. 139-148.
[15] C. Laflamme, J.-P. Adoul, R. Salami, S. Morissette, P. Mabileau, "16 kbps wideband speech coding technique based on algebraic CELP," in Proc. IEEE ICASSP '91, Toronto, Ontario, Canada, 1991, pp. 13-16.
[16] E. Ordentlich, Y. Shoham, "Low-delay code-excited linear predictive coding of wideband speech at 32 kbps," in Proc. IEEE ICASSP '91, Toronto, Canada, 1991, pp. 9-12.
[17] L.R. Rabiner, "Applications of Voice Processing to Telecommunications," tutorial at ISRC of University of Brisbane, October 1994.
[18] A. Spanias, "Speech Coding: A Tutorial Review," Proceedings of the IEEE, Vol. 82, No. 10, October 1994, pp. 1539-1580.
[19] P. Atherton, "G.722 Audio Processing on the DSP56100 Microprocessor Family," Application Note APR404/D, Motorola Ltd., 1992.
[20] J. Hongbing, W. Yue, T. Kun, F. Chongxi, "Hardware implementation of CCITT G.728," in Proc. of ICCT '94, Shanghai, China.
[21] IEEE Signal Processing Magazine, October 1994.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Multimedia Communication Graphical User Interface Design Principles for Teleeducation

J. Turan, K. Fazekas*, L. Ovsenik, M. Kovesi
Department of Radioelectronics, Technical University of Kosice, Park Komenskeho 13, 04021 Kosice, Slovakia
Tel./Fax: +42 95 6335692, E-mail: [email protected]
*Department of Microwave Telecommunications, Technical University of Budapest, Goldmann Ter 3, 1111 Budapest, Hungary
Tel./Fax: +36 12 043289, E-mail: [email protected]
Abstract
In this paper a systematic approach to multimedia communication graphical user interface design is evaluated. The proposed application is in the field of radioengineering teleeducation. The design starts with a standard task analysis using task knowledge structures as the basis of the task model. Then a description of the media resources available to the system and their access is evaluated. Finally, the task information is elaborated by attaching dialogue acts to specify the desired communicative effects for each task step. This method could serve as a framework for developing a software tool.
1. Introduction
Multimedia graphical user interfaces are currently created by intuition. They are usually designed and developed without exact analysis of multimedia information presentation. Most present-day multimedia applications showcase the possibilities of the technical means but do not respect a user-centred approach. Multimedia interfaces developed in this way cannot achieve the maximum effect [1,2,3]. In this paper a systematic approach to multimedia communication graphical user interface design is evaluated. The proposed application is in the field of radioengineering teleeducation. The design starts with a standard task analysis using task knowledge structures as the basis of the task model. Then a description of the media resources available to the system and their access is evaluated. Finally, the task information is elaborated by attaching dialogue acts to specify the desired communicative effects for each task step. This method could serve as a framework for developing a software tool.
2. Definition of multimedia and basic ideas of multimedia interface design
Exact definitions of human-computer interface terminology and user-centred design are very important for understanding multimedia design problems. We must view multimedia interfaces from an appropriate perspective. Multimedia is one of the most innovative ways of using a telecommunications network to achieve effective communication between people and for access to information. As pointed out in [3,4,5,6], the multimedia approach can be viewed either from a technological perspective or from a user-centred perspective. The technological perspective is defined through lists of technical characteristics of systems claiming to be multimedia systems, such as multidimensional presentation techniques, multimodal interaction or hypermedia techniques. The user-centred perspective focuses on the possibilities offered by the technology. A user-centred definition characterises multimedia systems as systems enabling the usage of multiple sensory modalities and multiple channels of the same or different modality, enabling the user to perform several tasks at the same time [3,7]. Understanding these two definitions is important for addressing the key question of multimedia interface design: when to use which media, and in what combination, to achieve the maximum effect.
3. Multimedia characteristics from technological and user perspectives
Multimedia communications can be defined from the technological perspective as a combination of four key ingredients:
a./ two or more of the five media of communication (audio, data, fax, image and video)
b./ interactive capabilities between the communicating parties
c./ communications with human users
d./ synchronisation
The types of interaction in multimedia are:
a./ search and browse
b./ interactive buttons
c./ windowing
d./ fast forward, rewind, pause and skip
e./ conversational
Operating systems such as Windows or OS/2 offer these types of interaction and enable their use for each medium.
There are three types of interactive communications, corresponding to the CCITT groups of interactive communications:
a./ conversational
b./ retrieval
c./ messaging
To be successful, different media must be combined and coordinated in a natural way, otherwise there will be a risk of information overload. In human-computer interaction we use three sense channels (visual, auditory and tactile), often in combination, and there is only a limited amount of research available to guide us in media combination. For example, it is known that there is some interaction between voice output and concurrent visual tracking, and that voice response places a greater load on working memory than manual responses. Multimedia is relevant to a wide spectrum of applications, ranging from computer-integrated telephony and texts with voice annotation to cooperative teleworking on documents (including video). In the discussion here, multimedia applications will be confined to broadband multimedia communication with the inclusion of at least one of the following information types:
- high-speed data
- still images and documents with high resolution (or browsing capabilities)
- moving pictures (video, animated graphics).
4. Requirements at the communications and network level
There are several alternatives for classifying broadband multimedia communications: according to information types, communication types (dialogue, messaging, retrieval, distribution), organisation (tree-structured media, hypermedia), functions (e.g., interactive, link navigation, cooperative teleworking), and other criteria. One suitable alternative classifies the utilisation of information types with respect to the user or terminal. This is particularly advantageous for the definition of telecom services to effectively support multimedia communication. The targets of future multimedia communication are: to efficiently support manifold applications; to provide worldwide communication capabilities to as large a user community as possible; unproblematic and cost-effective operation and utilisation on the basis of standardised components with high production volumes; and simple interworking [1,4,5]. There is, at present, no all-encompassing model for distributed applications and the handling of multimedia information within a heterogeneous broadband network environment. Multimedia application and communication models are under discussion at a number of organisations such as BERKOM (Berlin Communication System), RACE (Research and Development in Advanced Communications Technologies in Europe), and at international standardisation bodies for communication and information technology (CCITT, ISO, etc.). This approach aims at fully supporting the increasing variety of applications by conceiving suitable, reusable technical building blocks.
Fig.1. Co-operative modern teleeducation
From the network's point of view, the most important requirements of multimedia communications are:
- high speed and changing bit rates
- several virtual connections over the same access (including flexible/dynamic channel allocation)
- synchronisation of different information types
- suitable standardised services and supplementary services supporting multimedia applications (comprising connection-oriented and connectionless bearer services and teleservices)
- various service qualities
- adaptability to evolving multimedia needs and progress in information processing, storage, and presentation.
A modern teleeducation system must be thought of in terms of a networked organisation (Fig.1). The objective of cooperative teleworking among students and teachers (with simultaneous use of databases by a learner) is the provision of some degree of "telepresence" for geographically distributed persons and teaching materials, in a quality comparable to that of a real-world lecture (conference). Cooperative teleworking enables a group of distant participants to jointly view, discuss, and edit multimedia documents while at the same time using communication and computing resources. This can be considered an extension of conventional audio/video conferencing, with shared access and collaborative work assistance. A desktop multimedia workstation allows the student to create, retrieve, and manipulate multimedia documents and to activate a "hotline" to a teacher (central specialist). Cooperative teleworking represents a case of complex and dynamic communication which encompasses a number of participants, connections, information types, systems, and functions [8,9].
[Fig.2 diagram: recoverable labels include Domain Description, Users, Task Analysis, Task Model, Available Media, Task-Information Analysis, Task-Information Sequence, Media Selection, Presentation Script, Final Presentation Script, Implementation, Source, Destination]
Fig.2. Systematic method for GUI design

The agenda of issues which a method must address was, first, the creation of a task model incorporating specification of information requirements and presentational effects, accompanied by a resources model describing the information media available to the designer. The method should advise on selecting appropriate media for the information needs and on scripting a coherent presentation for a task context. The design must assist in directing the user's attention to extracting the required information from a given presentation and focusing on the correct level of detail. In addition, the design method should guide the designer through the cognitive issues underlying a multimedia presentation, such as selective attention, persistence of information, concurrency, and limited cognitive resources such as working memory. Fig.2 gives an overview of a systematic method for teleeducation graphical user interface design based on the method's task components. It is based on the following components: model, information flow, process, source, destination [10]. An example of a GUI specification is given in Tab. 1.
Tab. 1. Example of GUI Specification (Teleeducation, Computer Aided Learning).

Student interface:
1. Program out
   - login
   - password
   - general program information
2. Courses selection
   - courses menu
   - alphabetical ordering
3. Courses structure
   - theory
   - examples
   - tests
   - consultation
4. Course flow control (examples and tests control)
   - panel control window
   - manual and interactive (scroll, animation and video control, etc.)
   - automatic (programmed control of multimedia documents)
5. Video and audio conferences
   - interactive consultation with teacher
   - time negotiation
   - conference mode selection (audio, video, whiteboarding)
   - E-mail messages
6. Other features
   - multimedia documents utilisation statistics (date and time used, control questions and test results, etc.)
   - courses total time limitation
   - courses price calculation
   - courses level selection (beginners, experts, etc.)

Teacher interface:
1. Program out
   - login
   - password
   - general program information
2. Courses selection
   - courses menu
   - alphabetical ordering
3. Courses structure
   - theory
   - examples
   - tests
   - consultations
4. Multimedia documents creation
   - courses structure specification
   - timeschedule specification
5. Video and audio conferences
   - interactive consultation
   - time reservation confirmation
   - multimedia documents presentation from courses database with video or audio commentary
   - conference mode selection (audio, video, whiteboarding)
   - E-mail messages
6. Courses statistics and economics
5. Conclusions
This method proved useful as a means of exploring the issues involved in multimedia teleeducation graphical user interface design (Tab.1). The diagram (Fig.2) provides a tool for thinking about presentation issues concerning what information is required and when. The proposed method was implemented on a "big" Pentium-based PC multimedia platform for an interactive multimedia course about the rapid transform and its applications in pattern recognition and DSP [10].
References
[1] Alty, J.L., Bergan, M.: The design of multimedia interfaces for process control. Proc. 5th IFIP/IFAC/IFORS/IEA Conf. on Man-Machine Systems, 1992, 249-255.
[2] Alty, J.L., McCartney, C.D.: Design of a Multi-Media Presentation System for a Process Control Environment. In: Proc. Eurographics Workshop, Stockholm, 1991.
[3] Wright, D.J.: Broadband: Business Services, Technologies, and Strategic Impact. Boston/London: Artech House, 1993.
[4] Barbosa, L.O., Georganas, N.D.: Multimedia Services and Applications. Europ. Trans. Telecommun., Vol. 2, No. 1, 1991, 5-19.
[5] Armbruster, H., Wimmer, K.: Broadband Multimedia Applications Using ATM Networks: High-Performance Computing, High-Capacity Storage, and High-Speed Communication. IEEE JSAC, Vol. 10, No. 9, 1992, 1382-1396.
[6] Rosenberg, J. et al.: Multimedia Communications for Users. IEEE Commun. Mag., May 1992, 20-36.
[7] Maybury, M.: Planning Multimedia Explanations Using Communicative Acts. In: Proc. 9th National Conf. on Artificial Intelligence, 1991, 61-66.
[8] Faraday, P.M., Sutcliffe, A.G.: A Method for Multimedia Interface Design. In: J.L. Alty, D. Diaper (eds), People & Computers VIII, Cambridge Univ. Press, 1993.
[9] Sutcliffe, A., Faraday, P.: Systematic Design for Task Related Multimedia Interfaces. Information and Software Technology, Vol. 36, No. 4, 1994, 225-234.
[10] Turan, J., Kovesi, L., Kovesi, M.: CAD System for Pattern Recognition and DSP with Use of Fast Transformation Invariant Transforms. Journal on Communications, Vol. XLV, 1994, 85-89.
IMAGE AND VIDEO COMPRESSION FOR MULTIMEDIA APPLICATIONS

D.G. SAMPSON, Democritus University of Thrace, GREECE
E.A.B. da SILVA, Federal University of Rio de Janeiro, BRAZIL
M. GHANBARI, University of Essex, ENGLAND
Abstract
In this paper we discuss a coding method that is suitable for multimedia applications. This method is based on the efficient coding of image wavelet coefficients using zerotree multi-stage lattice vector quantization. We refer to this method as Successive Approximation Wavelet Vector Quantization (SA-W-VQ). The basic idea in SA-W-VQ is that the original blocks of wavelet coefficients are successively refined using vectors of progressively decreasing magnitude and a finite set of prototype orientations. Block zero-tree prediction and adaptive arithmetic coding are incorporated to improve the efficiency of the codec. It is shown that this coding scheme achieves high compression ratios with good picture quality while maintaining a very simple implementation. Simulation results are provided to evaluate the coding performance of the described coding scheme for still image and low bit rate video coding. Comparison with both the standard JPEG coder and the RM8 implementation of the standard H.261 video codec shows that the presented codec provides improvements in both peak signal-to-noise ratio and picture quality.

1. Introduction
Multimedia communications is the field referring to the representation, storage, retrieval, dissemination of, and collaborative work with, electronic documents composed of multiple "media" such as text, voice, graphics, images, audio and video [1]. Multimedia applications involve vast amounts of visual data in the form of libraries of video sequences and catalogues of high quality still images. This points to the need for efficient compression techniques which reduce the amount of data stored or transmitted. However, apart from the requirement for efficient compression, there are other important features which an image/video compression algorithm needs to incorporate in order to meet the design considerations of multimedia applications [2]. Such are:
- browsing capabilities
- progressive transmission
- efficient decoding at various levels of quality
The wavelet transform has recently emerged as a promising tool for representing images and video [3]. The multiresolution/multifrequency feature of the wavelet transform is important in representing images/video at progressively increasing spatial and/or temporal resolutions. Also, wavelet based image decomposition offers the advantage of excellent energy compaction characteristics (ideal for compression) [4]. We have developed a novel compression scheme based on wavelet transform decomposition and embedded lattice vector quantization, referred to as successive approximation wavelet lattice vector quantization (SA-W-LVQ) [5-7]. The basic idea in SA-W-LVQ is that the original blocks of wavelet coefficients are successively refined using vectors of progressively decreasing magnitude and a finite set of prototype orientations. Block zero-tree prediction and adaptive arithmetic coding are incorporated to improve the efficiency of the codec.
The main characteristics of this scheme are:
- a single compressed bitstream which can provide video/images at multiple resolutions, each available at a range of bitrates
- information is packed in the bitstream in a way which allows the most important image data to always be coded with priority.
2. Successive Approximation Wavelet Lattice Vector Quantization
The basic principles of the coding algorithm are summarised here; a more detailed description can be found in [5]. According to this algorithm, the mean value of the input image is extracted and an M-stage wavelet transform is employed. Each sub-image Bi, where B can be horizontally (H), vertically (V) or diagonally (D) oriented and i = 1, 2, ..., M, is then partitioned into m x n blocks of wavelet coefficients. In order to form the input vectors, a different scanning is used according to the orientation of the band. After scanning, the algorithm proceeds as follows. The initial value of the magnitude threshold is set to T1 = a * ||X||max, where a is selected according to the theta_max of the orientation lattice codebook and ||X||max denotes the maximum magnitude of the wavelet coefficient vectors [5]. All vectors in every band are scanned, and the ones with magnitude less than T1 are marked as zero (zero blocks). The rest (non-zero or coded blocks) are represented by their closest orientation codevector y scaled with T1, that is, Q1(x) = T1 * y. The location of the zero vectors is transmitted using 3 symbols: zero block (Z), zero-tree root (ZT) and coded block (C). Block zerotree roots exploit the similarities among the bands of the same orientation by producing a single symbol to indicate that a block of wavelet coefficients and all its corresponding ones in the higher bands of the same orientation are zero. The arithmetic coder with an adaptive model described in [8] is used to code the string generated by the three symbols (ZT, Z and C). The orientation codevectors for each coded (C) block are also encoded with an arithmetic coder. The magnitude threshold is then updated by multiplying it by a. The non-zero blocks are further refined by coding the residual error between the original and the reconstructed C blocks with their closest orientation codevector and the new threshold. The indices of the new orientation codevectors are encoded into the bitstream via the arithmetic coder. In the next pass, all zero blocks are scanned again and their magnitudes are compared against the new threshold. A new string of the three symbols is encoded in the bitstream to provide information about the location and the status of the blocks at this stage. As in the previous pass, the indices of the C vectors are coded and the entire process is repeated until a certain bit rate is achieved. The wavelet coefficient vectors are scanned according to their reconstructed values, the higher energies first, as in [4]. This guarantees that the most important information is always coded first, which is very desirable in video coding.
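The pass structure described above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the orientation codebook and the reduction factor a are placeholders, and the zero-tree (ZT) symbol is omitted for brevity, so each pass emits only Z and C symbols.

```python
import numpy as np

def sa_passes(blocks, codebook, a, n_passes):
    """Toy successive-approximation refinement of wavelet coefficient vectors.

    blocks   : (K, N) array, one N-dimensional coefficient vector per block
    codebook : (C, N) array of unit-norm orientation codevectors
    a        : threshold reduction factor, 0 < a < 1
    """
    residual = blocks.astype(float).copy()
    recon = np.zeros_like(residual)
    T = a * np.max(np.linalg.norm(residual, axis=1))   # T1 = a * ||X||max
    symbols = []                                       # per-pass Z/C strings
    for _ in range(n_passes):
        mags = np.linalg.norm(residual, axis=1)
        coded = mags >= T                              # C blocks; the rest are Z
        symbols.append(['C' if c else 'Z' for c in coded])
        # closest orientation codevector for each coded block, scaled by T
        idx = np.argmax(residual[coded] @ codebook.T, axis=1)
        update = T * codebook[idx]
        recon[coded] += update                         # refine reconstruction
        residual[coded] -= update                      # shrink residual error
        T *= a                                         # tighten the threshold
    return recon, symbols
```

In the real codec the Z/C (and ZT) strings and the codevector indices would be fed through the adaptive arithmetic coder, and the loop would stop when the bit budget is exhausted rather than after a fixed number of passes.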
Indeed, an important advantage of the proposed compression scheme for video coding applications is that a constant bit rate can be achieved by allocating a fixed number of bits to each frame. This eliminates the need for a buffer to smooth out bit rate variations. Coding the most important coefficients (in terms of energy) first guarantees that the bit rate budget will be efficiently used for coding those image data that would result in maximum distortion. Successive refinement of the wavelet coefficients offers control over the maximum level of quantization error made in each coefficient. This can be important for reducing unpleasant artefacts in picture quality, since the error introduced by a poorly quantized single wavelet coefficient can be spread over an area of the reconstructed image [5]. Furthermore, it provides the means to guarantee that an arbitrary level of average distortion for each band is met. This could be convenient for performing bit allocation among the wavelet coefficients taking into consideration the human visual system (HVS) sensitivity to each frequency band [7]. The advantages of this method over conventional techniques and the existing standards are:
- higher compression rates can be achieved
- better picture quality is obtained at very low bitrates
- browsing capabilities and progressive transmission are easily achieved
- fast and efficient decoding at various levels of quality.
3. Simulation results
The SA-W-LVQ coding algorithm has been applied to:
- compression of grey scale and colour still images [5]
- perceptually transparent coding of super high definition images [7]
- low bitrate coding of image sequences [6]
Table 1 summarises the Peak Signal-to-Noise Ratio (PSNR) results obtained by these methods for coding the test image LENA at resolutions 256x256x8 and 512x512x8, and a set of five ISO/CCITT test images at resolution 720x576x8, at a bit rate of 0.4 bit/pixel. In this table, the orientation codebooks are built from the first spherical shell of the regular lattices D4, E8 and L16. In general, although there are no substantial differences in the performance of the three codebooks, higher dimensional codebooks result in better PSNR values, and the best performance is always achieved by using the L16-based codebook. These simulation results also demonstrate that the proposed coding scheme achieves considerably better R-D performance as compared with the JPEG coder. The improvement in PSNR over the JPEG-coded images is consistently around 2.50 dB for L16 and 1.50 dB for the D4 codebook. Finally, comparisons with the EZW image codec [4], which is a very efficient scheme employing zerotree scalar quantization of wavelet coefficients, are in favour of the zerotree lattice vector quantiser.
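The PSNR figures quoted here and in the tables follow the usual definition for 8-bit imagery; as a reminder:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    err = original.astype(float) - reconstructed.astype(float)
    mse = np.mean(err ** 2)                 # mean squared error
    return 10.0 * np.log10(peak ** 2 / mse)
```

A uniform error of one grey level, for example, corresponds to about 48.1 dB.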
Test Image    D4      E8      L16     EZW     JPEG
BARBARA       29.36   30.60   30.90   29.03   27.27
BOATS         34.19   34.78   35.24   34.29   32.63
GIRL          35.27   35.91   36.12   35.14   33.98
GOLD          31.01   32.76   32.61   32.48   31.38
ZELDA         38.43   39.36   39.44   39.08   37.16
LENA 256      30.13   30.15   30.29   30.06   28.07
LENA 512      35.17   35.86   36.09   35.02   33.42

Table 1: PSNR performance of SA-W-VQ for several test images at 0.4 bpp compared with EZW and JPEG

Test Sequence   Average PSNR (dB)   Coding Scheme
Miss America    36.64               H.261-RM8
Miss America    39.59               OBM-SAWLVQ
Claire          31.90               H.261-RM8
Claire          34.05               OBM-SAWLVQ
Salesman        40.33               H.261-RM8
Salesman        41.98               OBM-SAWLVQ

Table 2: Average luminance PSNR performance comparison for the test image sequences, CIF 10 Hz, at 64 kbit/sec.

The performance of the SA-W-LVQ coding algorithm for low bitrate video coding applications in the range of px64 kbit/sec is evaluated and compared to the RM8 implementation of the standard H.261 video codec using a set of test CIF image sequences. In these experiments, overlapped block matching (OBM) motion compensation is employed [6]. Table 2 summarises the average PSNR obtained by the
OBM-SAWLVQ over the RM8 simulations. This improvement is also reflected in the picture quality of the reconstructed frames, which are free of the annoying blocking effects of the H.261 coded images. The efficiency of the OBM motion compensation for the SA-W-LVQ video codec is demonstrated in Figure 1.
Figure 1. Performance comparison between Overlapped and Conventional Block Matching: (a) MISS AMERICA, CIF 10 Hz, 64 kbit/sec; (b) CLAIRE, CIF 10 Hz, 64 kbit/sec

4. Conclusions
We have discussed an efficient method for image and video data compression suitable for multimedia applications. In this technique, referred to as Successive Approximation Wavelet Vector Quantization (SA-W-VQ), the most important vectors of wavelet coefficients are successively coded by a series of vectors of decreasing magnitudes. Moreover, the structural similarities among the bands of the same orientation are exploited by incorporating a block zero-tree structure. In image sequence compression, the overlapped block matching motion compensation (OBM-MC) significantly increased the efficiency of the wavelet transform coder by eliminating the blocking artefacts in the prediction error image introduced by conventional block matching. The OBM-SAWVQ video codec offers a constant bit rate, with no need for a buffer; yet, remarkably, the PSNR fluctuations from frame to frame are reasonably small for the image sequences that have been tested. This is due to the fact that the SAWVQ always codes the most important image data first. Moreover, there is small quantization error accumulation as the image sequence advances to higher order frames. Simulation results illustrate that OBM-SAWVQ achieves promising performance at 64 kbit/sec.

5. References
[1] B. Furht, S. Smoliar and H. Zhang, Video and Image Processing in Multimedia Systems, Kluwer Academic Publishers, 1995.
[2] J.D. Gibson, T. Berger, R. Baker and T. Lookabaugh, Multimedia Compression: Applications and Standards, Morgan Kaufmann Publishers, 1996.
[3] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice Hall, 1995.
[4] J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Transactions on Signal Processing, vol. 41, pp. 3445-3462, December 1993.
[5] E.A.B. da Silva, D.G. Sampson, and M.
Ghanbari, "A successive approximation vector quantizer for wavelet transform image coding," IEEE Transactions on Image Processing, Special Issue on Vector Quantization, vol. 5, no. 2, pp. 299-310, February 1996.
[6] D.G. Sampson, E.A.B. da Silva and M. Ghanbari,
A Multilayer Image Coding and Browsing System

Guoping Qiu
University of Derby, School of Computing & Mathematics, Derby DE22 1GB, United Kingdom
email: [email protected]
Abstract: In this paper, a multilayer image coding and browsing system is described. The implementation is based on the well-known Laplacian pyramid image data structure and the JPEG image compression standard. Different coding strategies are used at different levels (layers) to achieve efficient bandwidth reduction. A classified vector quantization scheme suitable for coding Laplacian residual image data is also described. The emphasis of this paper is on the implementation strategy rather than on the implementation details.
I Introduction
We describe a layer-structured image coding and browsing system which forms part of a universal image coding and browsing system that tries to meet the various demands for visual image storage, communication, and management. Some of the goals that the system tries to achieve include:
Flexibility: the system will be capable of achieving different compression ratios according to the requirements of the users, ranging from lossy to loss-less compression.
Efficiency: the system will use the most appropriate technologies to implement the coding operations according to the nature (statistics) of the images, in order to achieve the best compression performance.
Scalability: the system will be able to compress a given image to different dimensional sizes according to the requirements of the applications.
Progressive Transmission: the system will be able to perform progressive transmission, allowing image quality levels from coarse to fine to be built up progressively.
Fast and Flexible Browsing: the system will enable fast browsing of compressed images at different dimensional scales and quality levels.
In this paper, we will report the design and implementation of some aspects of the system. In particular, a three-layer subsystem as shown in Fig. 1 will be described.
[Fig. 1 diagram: the Original Input Image passes through REDUCE stages and ENCODE/DECODE blocks over a channel, producing First-layer, Second-layer and Third-layer Output Images]
Fig. 1 A three-layer image coding/browsing system
The system uses a pyramidal data structure. The first layer is designed to compress/browse the image at its original dimensional size. The quality level of this layer can be varied, depending on the coding strategy adopted in Q and Q^-1 and on the users' needs. In the second layer, the dimensional size of the image is reduced by a factor of 2 in both of its co-ordinates. Again, the compression/browsing quality levels of this layer depend on the coding and decoding strategies Q and Q^-1 implemented in this layer. In the third layer, the dimensional size of the image is further reduced by a factor of 2 in both of its co-ordinates. At this layer, baseline JPEG [1] is used for compression. This system is capable of compressing/browsing images at three different dimensional sizes, each of which can have varied quality levels, depending on the bit rate and quality level requirements. In the following, we shall briefly describe each block of the system.
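The factor-of-2 reduction between layers can be realised with a filter-and-subsample REDUCE and a matching EXPAND, as in Burt and Adelson's pyramid [2]. A minimal NumPy sketch follows; the 5-tap generating kernel is the standard one from [2], while the edge-replication border handling is an illustrative assumption:

```python
import numpy as np

W = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0   # Burt-Adelson kernel

def _filt(img, kernel):
    """Separable low-pass filtering with edge replication at the borders."""
    pad = len(kernel) // 2
    out = np.pad(img.astype(float), pad, mode='edge')
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, out)
    return out[pad:-pad, pad:-pad]

def reduce_(img):
    """REDUCE: low-pass filter, then subsample by 2 in each dimension."""
    return _filt(img, W)[::2, ::2]

def expand(img):
    """EXPAND: upsample by 2, then interpolate with the (rescaled) kernel."""
    up = np.zeros((2 * img.shape[0], 2 * img.shape[1]))
    up[::2, ::2] = img
    return _filt(up, 2.0 * W)   # kernel doubled per axis to restore the gain

# A Laplacian residual for one layer is then img - expand(reduce_(img)).
```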
II Implementation Strategies
A well known and efficient data structure suitable for image data compression is the Laplacian pyramid proposed by Burt and Adelson [2]. There are two fundamental functions in this structure, REDUCE and EXPAND. These two functions are used to generate the Gaussian and Laplacian pyramids and are described in detail by Burt and Adelson [2]. The JPEG standard provides a hierarchical framework for encoding and decoding images at different spatial resolutions. In hierarchical mode, the JPEG standard proposes a pyramidal structure to implement this option. However, the standard does not specify how the pyramid is constructed; Burt and Adelson's Laplacian pyramid structure is a good candidate for this application. The Laplacian images contain mainly sparse, high frequency data. The pixel values in the flat areas will be very small; only in the edge areas will the pixel values be relatively high. Baseline JPEG is not an efficient technique to compress these images. Alternative coding strategies, such as the one described below, are more suitable. To implement the coding strategies for Q and Q^-1 in Figure 1, different encoding and decoding algorithms can be used to compress the Laplacian images. For loss-less
compression, DPCM-based techniques can be used to implement the encoding and decoding operations. For lossy compression, a classified vector quantisation technique is implemented. In this scheme, the differential image is divided into blocks of N = m x n pixels, each forming an N-dimensional vector. Let X(k) = (x_1(k), x_2(k), ..., x_N(k)) be the vector from the k-th block. This vector is classified into one of several classes according to the following procedure. (1) Calculate the following quantities for all training images:
e^2(k) = (1/N) * sum_{i=1}^{N} x_i^2(k)

sigma^2(k) = (1/N) * sum_{i=1}^{N} (x_i(k) - m(k))^2, where m(k) = (1/N) * sum_{i=1}^{N} x_i(k)
(2) A K-means type algorithm is used to cluster the e^2(k) values into three classes, corresponding to small (S), medium (M) and large (L) values. The same process is performed on sigma^2(k). (3) Each image block is classified into one of the nine classes shown in Fig. 2.
e^2 \ sigma^2 |  S  |  M  |  L
      S       | SS  | SM  | SL
      M       | MS  | MM  | ML
      L       | LS  | LM  | LL
Fig. 2: Image block classes. For each class, a corresponding codebook is designed, and a different number of bits is used to code each class according to its nature. For example, class SS requires the fewest bits and class LL requires the most.
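The per-block statistics and the nine-class decision of Fig. 2 can be sketched as follows. The S/M/L boundaries are passed in as parameters here; in the scheme itself they come from the K-means clustering step.

```python
import numpy as np

def block_stats(x):
    # x = (x_1(k), ..., x_N(k)): the vector from the k-th m x n block
    N = x.size
    e2 = np.sum(x ** 2) / N           # mean energy e^2(k)
    m = np.sum(x) / N                 # block mean m(k)
    var = np.sum((x - m) ** 2) / N    # variance sigma^2(k)
    return e2, var

def _label(v, t_small, t_large):
    # S / M / L according to two class boundaries (assumed to be
    # obtained from the K-means clustering of the training data)
    return "S" if v < t_small else ("M" if v < t_large else "L")

def classify_block(x, e2_bounds, var_bounds):
    # map a block to one of the nine classes SS, SM, ..., LL of Fig. 2
    e2, var = block_stats(x)
    return _label(e2, *e2_bounds) + _label(var, *var_bounds)
```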
III. Summary
At this point, the complete implementation of the system has yet to be performed. To show the feasibility of the idea, Figure 3 shows the Lena image compressed by baseline JPEG (implemented using the Independent JPEG Group's JPEG software). Figure 4 shows the Lena image obtained by first applying REDUCE to the original image, compressing the reduced image with baseline JPEG, then decoding the compressed image and applying EXPAND to it. It is clearly seen that the visual quality of the latter is better.
IV. References
1. W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
2. P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code", IEEE Trans. Comm., Vol. COM-31, pp. 532-540, 1983.
Figure 3, Baseline JPEG compression, file size = 4379 bytes
Figure 4, Layered compression, file size = 4091 bytes
Switched Segmented Image Coding - JPEG Schemes for Progressive Image Transmission C. A. Christopoulos 1, A. N. Skodras 2, W. Philips 3, J. Cornelis 4 1Ericsson Telecom AB, HF/ETX/MN, S-126 25 Stockholm, Sweden 2University of Patras, Electronics Lab., Patras 26110, Greece 3ELIS, University of Gent, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium 4Vrije Universiteit Brussel, VUB - ETRO (IRIS), Pleinlaan 2, 1050 Brussels, Belgium E-mail: [email protected]
Abstract -- This paper describes two schemes that combine JPEG and Segmented Image Coding (SIC) for progressive image transmission. Compared to JPEG-based progressive transmission schemes, the new schemes produce reconstructed images of better quality during all the stages of transmission and never transmit more bits. Also, the computational complexity of the new schemes is lower than that of the SIC-only approach.
1. Introduction
Progressive image transmission (PIT) involves the gradual improvement of the quality of an image as more information is transmitted. In recent years, PIT has been proposed as a means of providing the user as soon as possible with an interpretable image in the specific situation where he/she is interactively interrogating an image database over a low-capacity transmission channel, such as the Public Switched Telephone Network (PSTN). This allows the user to decide whether to wait for a more detailed reconstruction or to abort the transmission. Access to large image databases, such as those emerging in the medical world, will benefit from progressive coding. PIT can be achieved using JPEG [1]. However, even though the quality of the images in the last stages of the transmission (i.e., for low compression ratios) is very good, this is not the case in the first stages (i.e., for high compression ratios), because of the blocking artifacts which appear in these stages. Recently published research [2-4] indicates that Segmented Image Coding (SIC) is much better suited for PIT than JPEG in the initial stages of transmission, i.e., at high compression ratios [4]; however, in the last stages of PIT, the image quality of SIC is not better than that of JPEG. As JPEG has a much lower complexity than SIC, JPEG is therefore preferred in the last stages of transmission. We therefore propose two switching schemes that combine SIC and JPEG to achieve both a good image quality in all transmission stages and a lower computational complexity than SIC. In SIC, the image intensity f(x, y) of a region is approximated by a weighted sum of orthogonal base functions [3-6]. In this paper we use the weakly separable (WS) base functions described in [5,6], because these have much lower computational and memory requirements than the base functions traditionally used in SIC [3], while they produce images of comparable quality [5,6].
In SIC, the base functions do not have to be transmitted, because they are completely determined by the region's shape, which is known to the receiver; instead, the receiver computes the base functions from the region's shape using the same algorithm as the transmitter. As the set of base functions has to be recomputed for each region, the computational complexity of both the full-SIC coder and the full-SIC decoder is much higher than that of the JPEG coder/decoder. Note that, in contrast to JPEG, not all the coefficients have to be computed at once in SIC; instead, new base functions are calculated when the corresponding coefficients are required. In the SIC part of the new schemes, the number of coefficients N_c for a region with n_p pixels is determined as N_c = min(α * n_p, N_bmax), where 0 < α < 1 and N_bmax are user-specified parameters. This strategy assigns more coefficients to larger regions, because generally more degrees of freedom are required to represent large regions with the same accuracy as small regions. Furthermore, the strategy also limits the maximum number of base functions in a region, and therefore the computational and memory requirements.
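The coefficient-allocation rule can be written directly; this is an illustrative sketch (the function name is ours, not from the paper):

```python
def num_coefficients(n_p, alpha, nb_max):
    # N_c = min(alpha * n_p, N_bmax): proportionally more coefficients
    # for larger regions, capped to bound computation and memory
    return min(int(alpha * n_p), nb_max)
```

For example, with alpha = 0.2 a region of 100 pixels gets 20 coefficients, while a region of 1000 pixels is capped at N_bmax.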
2. A new switched SIC-JPEG scheme for PIT
This new scheme uses SIC in the first stages of transmission and then switches to JPEG. The SIC coding part consists of the following steps: (a) segment the image into a number of regions; code and transmit the contour image (and possibly the mean value of the pixels in each region); (b) calculate a few (more) base functions; (c) calculate the corresponding texture coefficients; (d) quantise, code and transmit the coefficients; (e) if extra information is required by the decoder, go to step (b); else stop the transmission.
If, in a particular stage of the transmission, the results achieved by SIC are not significantly better than those achieved by JPEG, then the switched scheme should switch from SIC mode to JPEG mode. However, detecting this would require running JPEG and SIC in parallel and would therefore be inefficient from a computational point of view. We therefore adopt the following more practical but suboptimal approach: we switch to JPEG after computing a fixed number of SIC coefficients (the precise number is determined by the parameter α, which is 0.2 in our experiments). Our experiments show that this suboptimal approach is a reasonable compromise. Finally, note that in the JPEG mode we do not code the original image with JPEG (such an approach would require more bits than a simple JPEG coder). Instead, the following approach is used: the difference between the original and the SIC reconstructed image at that stage is calculated. Then the value +128 is added to each pixel of the difference image, and the resulting values are clipped to the range [0,255]. The image is then compressed with JPEG at a compression factor such that the total number of bits (JPEG plus SIC) does not exceed the number of bits required by the JPEG-only PIT scheme at the same compression ratio. The receiver reconstructs the JPEG-compressed difference image, subtracts the value 128 from each pixel and adds the result to the SIC reconstructed image (which is of course available at the receiver).
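The level-shift-and-clip step that makes the signed difference image JPEG-compatible, and its inverse at the receiver, can be sketched as follows (the helper names are ours, and the JPEG codec itself is omitted):

```python
import numpy as np

def to_difference_image(original, sic_reconstruction):
    # shift the signed difference by +128 and clip to [0, 255] so that
    # it can be fed to a standard (unsigned 8-bit) JPEG coder
    d = original.astype(int) - sic_reconstruction.astype(int) + 128
    return np.clip(d, 0, 255).astype(np.uint8)

def add_difference_image(sic_reconstruction, decoded_difference):
    # receiver side: subtract 128 and add back to the SIC reconstruction
    d = decoded_difference.astype(int) - 128
    return np.clip(sic_reconstruction.astype(int) + d, 0, 255).astype(np.uint8)
```

Note that the clipping makes the round trip lossy only for differences outside [-128, 127], which are rare in practice.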
3. Hybrid SIC/DCT and JPEG switching for PIT
One disadvantage of using polynomials for the reconstruction of the texture in a region is that polynomials reconstruct the image very slowly, i.e., a significant number of base functions is needed to obtain a clear improvement in image quality. This is because large regions are preferred in SIC (to limit the number of bits assigned to contour coding) and because accurately reconstructing texture on a large region requires relatively many base functions. In order to eliminate this disadvantage, we propose a second new scheme which employs a hybrid SIC/DCT scheme instead of SIC in the first stages of transmission. The hybrid scheme divides the segmented image into rectangular blocks. Those blocks which are fully contained within a segment are encoded using DCT base functions. The remaining, non-rectangular parts of each segment are grouped and encoded using SIC as a single segment (see Fig. 2). Note that this way of splitting into rectangular blocks differs from the one in [7], which inevitably results in blocking artifacts at high compression ratios, as well as in edge destruction (because parts of the edge regions are coded independently). Also note that in the hybrid SIC scheme we used blocks of size 16x16 pixels instead of the 8x8 blocks of JPEG (this produces better results because the rectangular blocks in the hybrid scheme never contain edges). The advantages of the hybrid scheme are that (a) edge information is still usefully exploited and (b) in the rectangular blocks the Discrete Cosine Transform (DCT) may be used, which results in faster computation, especially when using one of the many efficient algorithms for computing the DCT [1].
The computational complexity and the memory requirements of the hybrid compression scheme are significantly lower than those of SIC, because the SIC base functions are computed on smaller regions and because there is no need to compute base functions in the rectangular parts of the regions. In practice, the computational complexity of the second switched scheme is about 30% lower than that of the full SIC scheme. Furthermore, the scheme leads to faster reconstruction of the texture inside large regions, because of the extra division into rectangles. Note that there is no need to transmit the contour points representing the rectangles in the regions, since they can be constructed by the decoder without any extra information.
4. Results and discussion
The results obtained with the proposed switched schemes on the "cameraman" image (see Fig. 1) are shown in Figures 2 to 8. Fig. 2 shows the segmented image (the rectangular blocks are used only in the hybrid SIC/DCT scheme) obtained with the segmentation algorithm presented in [2]. Fig. 3 is the SIC image after 38:1 compression (assuming that coding the contours requires 1.6 bits per contour pixel; this is a conservative estimate, as the method in [8] claims to require only 1.2-1.3 bits per contour pixel). Fig. 4 shows the JPEG reconstructed image at the same compression ratio. Figures 3 and 4 clearly demonstrate that SIC produces better images at high compression ratios (in this case 38:1). Figures 5 and 6 display images compressed by a factor of 30:1 using SIC and the hybrid SIC/DCT method, respectively. As expected, the image in Fig. 5 is better than the one in Fig. 3 because of the lower compression ratio. However, the hybrid SIC/DCT image in Fig. 6 is even better. Indeed, both methods reconstruct edge regions equally well, but the hybrid SIC/DCT method reconstructs texture within large regions better. Fig. 7 shows the result of the switched SIC-JPEG scheme after a compression factor of 10:1 (the difference image is compressed with JPEG by a factor of 15:1) and Fig. 8 shows the result of the switched hybrid SIC/DCT-JPEG scheme after a compression factor of 10:1. These figures show that all three methods result in images of similar quality at low
compression ratios (i.e., ratios at or below 10:1). Furthermore, our experiments have shown that this quality is similar to the quality of JPEG at 10:1.
Fig. 1. The original image
Fig. 2. The segmented image (the rectangular blocks are used only in the hybrid SIC/DCT scheme)
Fig. 3. SIC reconstructed after 38:1
Fig. 4. JPEG reconstructed after 38:1
Fig. 5. SIC reconstructed after 30:1
Fig. 6. Hybrid SIC/DCT reconstructed image after 30:1 compression
Fig. 7. Switched SIC-JPEG reconstructed after 10:1
Fig. 8. Switched hybrid SIC/DCT-JPEG reconstructed after 10:1
5. Conclusions Two schemes which combine Segmented Image Coding and JPEG for progressive image transmission were described. The results in the paper show that, compared to progressive JPEG, the new schemes offer images of better quality during all the stages of the transmission. Also, their computational complexity is lower than that of the SIC-only method. The total number of bits transmitted is not more than what is required by using only JPEG. The schemes are independent of the segmentation algorithm and the way the intensity of the region is approximated.
Acknowledgments
This work was financially supported by the Belgian National Fund for Scientific Research (NFWO) through a mandate of "postdoctoral research fellow" and through the projects 39.0051.93 and 31.5831.95, by the Flemish Institute for the Advancement of Scientific-Technological Research in Industry (IWT) through the projects Tele-Visie (IWT project 950202) and Samset (IWT 950204), by the EC ACTS project SCALAR (AC077) and by the HCM project ERBCHRXCT930382.
References
[1] W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
[2] C. A. Christopoulos, A. N. Skodras, W. Philips, J. Cornelis, and A. G. Constantinides, "Progressive very low bit rate image coding", Proceedings of the International Conference on Digital Signal Processing, Limassol, Cyprus, June 26-28, 1995, pp. 433-438.
[3] M. Gilge, T. Engelhardt, R. Mehlan, "Coding of arbitrarily shaped image segments based on a generalized orthogonal transform", Signal Processing: Image Communication, Vol. 1, No. 2, October 1989, pp. 153-180.
[4] M. Kunt, M. Benard, R. Leonardi, "Recent results in high-compression image coding", IEEE Trans. on Circuits and Systems, Vol. 34, November 1987, pp. 1306-1336.
[5] W. Philips, C. A. Christopoulos, "Fast segmented image coding using weakly separable bases", Proceedings of ICASSP 94, Adelaide, Australia, April 19-22, 1994, Vol. V, pp. 345-348.
[6] W. Philips, "Fast coding of arbitrarily shaped image segments using weakly separable bases", Optical Engineering, Vol. 35, pp. 177-186, January 1996. Special section on Visual Communications and Image Processing.
[7] T. Sikora and B. Makai, "Shape-adaptive DCT for generic coding of video", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 5, No. 1, Feb. 1995, pp. 59-62.
[8] M. Eden and K. Kocher, "On the performance of a contour coding algorithm in the context of image coding. Part I: contour segment coding", Signal Processing, Vol. 8, No. 4, July 1985, pp. 331-386.
Low Bit Rate Coding of Image Sequences using Regions of Interest and Neural Networks Nikolaos Doulamis, Athanasios Tsiodras, Anastasios Doulamis and Stefanos Kollias Department of Electrical and Computer Engineering, National Technical University of Athens, Heroon Polytechneiou 9, Zographou, Greece Tel: +30 1 7722491 E-mail: [email protected]
Abstract
In this article we study the transmission of video signals at very low bitrates and we propose a new coding scheme that improves the quality of image sequences at bitrates below 64 Kbits/sec. Our technique relies on extracting the moving object or objects in a frame using a neural network and quantising these regions with a finer quantiser step size than the remaining part of the image. The inputs of the neural network are groups of motion vectors, which are calculated with a block-based estimation scheme. The output of the network defines the regions of interest in the image; this output is then extended in space and in time to compute the final masks. The results show a significant improvement in the quality of the frames at low bitrates.
1. Introduction
Several coding techniques have been proposed in the past in order to reduce the spatial-temporal redundancy of image sequences. International standardisation efforts related to different applications have been made. Examples include the FCC proposals for digital transmission of HDTV aiming at a bitrate of 20 Mbits/sec [1], MPEG-2 for broadcast television at bitrates of 10 Mbits/sec [2], MPEG-1 for digital video at a bitrate of 1.5 Mbits/sec and H.261 at p x 64 Kbits/sec for videophone and video applications. Current efforts are concentrated on MPEG-4 for transmitting video signals below 64 Kbits/sec. Coding at very low bitrates (below 64 Kbits/sec) supports many applications, such as video-phone, remote sensing, telemedicine, video-text, communication aids for deaf people and so on. Since the public switched telephone networks (TN) were designed for the transmission of speech, and the bandwidth of video signals exceeds the available capacity of these networks many times over, it is very difficult to achieve transmission of visual data through the TN with satisfying quality of the decoded sequence. For instance, the CIF (Common Intermediate Format) standard requires approximately 73 Mbits/sec for transmission without any compression. Therefore, if it is assumed that the channel capacity of the TN is extended to 16 Kbits/sec, the compression ratio of this video signal should be higher than 4500! However, in many applications, such as video-phone, the full resolution provided by CIF or broadcast TV is not necessary. Instead, the format used is QCIF (with a resolution of 144x176 for luminance and 72x90 for chrominance). The frame rate is also reduced to 10 Hz (or even 6 Hz in some applications), since the key is to convey the emotional dimension. However, the compression ratio still remains high. Many methods have been developed for transmitting video signals through the TN. In the model-based approach, the coding algorithm relies on a 3-D or 2-D model.
These techniques use some a priori knowledge about the kind of image sequence that is about to be transmitted (for example, in video-phone applications, a model of a human head and shoulders). Unfortunately, these approaches fail when other objects, which have never been modelled, appear in the scene. They also demand a large amount of computation that cannot be delivered by current VLSI architectures. Consequently, it is very difficult to use the above method in video-phone applications, which need to be implemented in real time. Other approaches try to segment the image into objects and then find the motion of these objects. However, the computational time of these methods is also high. Block-based techniques have been implemented in VLSI architectures and can be applied in real-time applications. Nevertheless, the quality of the block-based approach at very low bitrate coding is not acceptable. Therefore, a new mechanism is required that tries to optimise the block-based techniques according to the perception of the human eye. All video coding algorithms which have been proposed in the past process all the parts of a frame equivalently. However, it has been shown [3] by several psycho-physical experiments that the human eye does not perceive all parts of an image in the same way (e.g., in a video-phone application the human eye focuses more on the head and shoulders of the speaker than on the background). As a result, it is anticipated that an efficient video coding algorithm should compress the significant parts of an image (according to human perception) less and
the unimportant ones more. The regions which are significant to the perception of the human eye are named Regions of Interest (ROI). In this article, we propose a new method for low bitrate coding, where for each frame a mechanism extracts those parts of the image that should be compressed with high quality. In order to mark the important and unimportant parts of a frame, a mask is transmitted from the encoder to the decoder. Since the ROI of a frame differ from those of the following frame, a mask is required to accompany each frame. Nevertheless, these masks do not increase the total bitrate by much, owing to the binary information that they carry and to the fact that this information can be compressed efficiently using a combination of run-length and Huffman coding. An artificial neural network is used as the ROI extraction mechanism. The choice of a neural network has been made because it can be easily implemented in VLSI and because it gives better results (due to its non-linear operation) as far as the ROI are concerned.
2. ROI based encoder
Although classic block-based algorithms (MPEG) can be used in real-time applications, when they are applied in very low bitrate schemes they produce many problems in the quality of the coded image sequences. A technique without motion compensation (M-JPEG) is unable to achieve rates below 64 Kbits/sec without significant deterioration of video quality (block artefacts). On the other hand, if motion estimation is used, video quality is improved (through the reduction of temporal redundancy) at the same bitrate. However, a low bitrate can be obtained only if the motion estimation error is not transmitted to the decoder or, if it is transmitted, the quantisation factor is high. In the latter case, the reconstruction error accumulates from frame to frame, causing serious distortion of the image. To solve this problem, we propose a new scheme, illustrated in Fig. 1, which relies on the introduction of ROI in the image and improves the quality of the image while keeping the total bitrate constant. After the ROI are extracted, the coding scheme quantises the ROI and non-ROI with different quantisation factors QH and QL (higher for non-ROI and lower for ROI). If the ROI extraction mechanism works properly, this technique enhances the video quality and simultaneously achieves the desired low bitrate. Furthermore, an appropriate choice of ROI can also restrain the accumulation of error from frame to frame and minimises the need for intra (I) frames, which refresh the whole image sequence but increase the bitrate.
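The dual-quantiser step at the heart of the scheme can be sketched as follows. This is a simplified uniform quantiser; the factor values 8 and 32 are the ones used in the experiments of Section 5, while the function names are our own.

```python
import numpy as np

def quantise_block(dct_block, in_roi, q_roi=8, q_non_roi=32):
    # finer quantiser step for ROI blocks, coarser for the rest
    q = q_roi if in_roi else q_non_roi
    return np.round(dct_block / q).astype(int), q

def dequantise_block(levels, q):
    # inverse quantisation at the decoder
    return levels * q
```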
[Figure 1: block diagram of the ROI encoder. Legend: DCT (IDCT): calculates the DCT (IDCT) of an 8x8 block; Run Length: run-length coding; FD: frame delay; MC: motion compensation; QH/QL: quantisation of the DCT with a high/low quantisation factor; LVQ: learning vector quantisation; Ext Sp/T: extends the mask in space and time; Gr M: groups of motion vectors.]
Figure 1: The mechanism of the ROI encoder.
The appropriate choice of ROI significantly affects the efficiency of the video coding algorithm. After the motion vectors of the 8x8 or 16x16 blocks are computed, a motion compensation scheme transmits these vectors and the DCT of the reconstruction error. Generally, it is anticipated that when the motion vectors are small (or zero), the respective DCT reconstruction error will be small too. Therefore, if we quantise the AC coefficients of the reconstruction error with QH, the result will only slightly affect the quality of the image. Consequently, these regions can be characterised as non-interest regions (nonROI). Moreover, not all blocks with large motion vectors can be characterised as ROI, since their values do not always correspond to real moving objects on which the human eye concentrates. Fig. 3 shows the motion vectors of frame 2 of the Claire sequence (the motion vectors have been calculated for 8x8 blocks within a 16x16 search area). It is observed that the values of the motion vectors in the background are large, although there is no motion in this area (or this motion is not perceivable by the human eye). This phenomenon occurs due to changes in the image luminosity or to the presence of noise. Hence, a simple algorithm that depends on the value of a single motion vector to find the motion regions will fail, and a more sophisticated technique is needed to extract the moving objects of an image. Since the accumulation of reconstruction error is due to large motion vector values, and since the human eye concentrates mostly on moving objects, we propose a mechanism relying on neural networks which selects as ROI those areas that contain a moving object or objects and marks the other areas as nonROI.
3. Moving object extraction using neural networks.
In Fig. 3 it can be seen that blocks which belong to moving objects have two significant properties:
- The majority of the blocks adjacent to the current one have motion vectors of the same (or almost the same) value and orientation as the motion vector of the current block.
- The absolute values of both the x and y co-ordinates of the examined motion vectors are not very small, i.e., the values of the motion vectors of the current block, as well as of its adjacent ones, are not close to zero. Otherwise, the region should be classified as nonROI.
According to these properties, two 2-D parameters can be examined to decide whether a block with a large motion vector should be selected as ROI: the mean and the variance of the region centred at the current block together with its four or eight adjacent blocks (4- or 8-connectivity). An algorithm that uses the mean and the variance of a group of blocks to decide whether a block belongs to an ROI would have to compute these parameters and compare them with a threshold T, which cannot be constant for all kinds of images. This disadvantage can be removed if a neural network is used that is able to classify the blocks of the image correctly into ROI or nonROI using an appropriate training set. Another advantage of the network is that it can separate the space nonlinearly and therefore find the optimal partitioning surface.
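The mean/variance test on a block's neighbourhood, which the network replaces, can be sketched as follows (illustrative only; it assumes the motion vectors are stored in an array of shape (rows, cols, 2)):

```python
import numpy as np

def neighbourhood_stats(mv, i, j):
    # per-component mean and variance of the motion vectors in the
    # window centred on block (i, j), including its 8-connected
    # neighbours; mv has shape (rows, cols, 2)
    patch = mv[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].reshape(-1, 2)
    mean = patch.mean(axis=0)
    var = ((patch - mean) ** 2).mean(axis=0)
    return mean, var
```

A threshold-based classifier would compare these statistics against a fixed T, which is exactly the step that cannot be tuned once for all image types.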
Figure 2: The block diagram of Learning Vector Quantisation (LVQ)
Figure 3: Motion vectors of frame 2 of the Claire sequence.
The learning vector quantisation (LVQ) algorithm has been chosen for the neural network due to its simplicity and its efficiency in classifying the data quickly and correctly. A block diagram of the classifier is shown in Fig. 2 for the case of two-element input vectors classified into two target classes. A preliminary classification of the input vectors into subclasses (four in this example) constitutes the first layer of the LVQ. The second layer is a simple linear layer that joins the subclasses to form the final target classes. As a consequence, LVQ can classify input vectors into target classes which are not linearly separable in the input vector space. The neural network has been trained using as input vectors groups of 3x3 blocks, each block consisting of 8x8 pixels, from some frames of the Claire sequence. Fig. 4 presents the output of the LVQ network on frames with which the network has not been trained. Since the motion of an object in a film sequence differs from frame to frame, these results show that the network generalises properly. The output of the network is presented (in black) together with the motion vectors (white lines on the black blocks).
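A single LVQ1-style update, the core of the competitive first layer, can be sketched as follows. This is a simplified sketch of the standard LVQ1 rule, not the exact training procedure of the paper:

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, y, lr=0.1):
    # one LVQ1 update: the winning (nearest) prototype is pulled towards
    # the input x when its class matches the target y, pushed away otherwise
    k = int(np.argmin(((prototypes - x) ** 2).sum(axis=1)))
    sign = 1.0 if proto_labels[k] == y else -1.0
    prototypes[k] += sign * lr * (x - prototypes[k])
    return k
```

Iterating this rule over a labelled training set moves the subclass prototypes so that their nearest-neighbour regions approximate the (possibly non-linear) class boundaries.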
4. Extending the network mask in space and time
Since the orientations and values of the motion vectors within a background region are not always random, it is necessary to train the neural network with a strict criterion to avoid taking as ROI areas that belong to the background. However, with this strict criterion, the network output does not take the boundaries of moving objects into account. This is because a motion vector belonging to the boundary of a moving object does not have the majority of its adjacent vectors following the same orientation and magnitude. To solve this problem, which affects the quality of the video sequence, we extend the mask (the output of the network) to those blocks that are adjacent to the masked ones and have motion vectors of significant value. The results of this process are shown in Fig. 4, where the extended blocks are illustrated in dark grey. The extension in space improves the quality of the image (see Fig. 5) without increasing the size of the mask by much; thus, the output rate remains low. The same extension is also necessary in the time domain. In video-phone applications there are many frames where motion is absent (more than 50%). This is not a disadvantage, for two main reasons:
- No motion means a small DCT error and hence no deterioration in image quality when coarse quantisation is used.
- The LVQ mask does not contain many blocks marked as ROI, and therefore few regions are quantised with high quality. Hence the total bitrate remains low while the quality does not deteriorate significantly.
Nevertheless, if high motion is found in a frame, the following frame may not contain any ROI at all (lack of motion), although some blocks belonging to the area of the previous mask (the moving object) remain. If these blocks are quantised with QH, the reconstruction error will keep accumulating over the following frames, causing degradation of the image quality.
On the other hand, if these blocks are quantised with QL the total bitrate is not
affected very much (these blocks are not many), while the image quality improves. The extension of the mask in the time domain is illustrated in Fig. 4 (light grey).
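The spatial extension of the mask can be sketched as follows (the motion-magnitude threshold is an assumption of ours; the paper does not give a numeric value):

```python
import numpy as np

def extend_mask_space(mask, mv, mv_threshold=1.0):
    # mark as ROI any non-ROI block that is adjacent to an ROI block
    # and has a motion vector of significant magnitude; mask is a
    # boolean (rows, cols) array, mv has shape (rows, cols, 2)
    out = mask.copy()
    rows, cols = mask.shape
    for i in range(rows):
        for j in range(cols):
            if mask[i, j]:
                continue
            neigh = mask[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if neigh.any() and np.hypot(mv[i, j, 0], mv[i, j, 1]) >= mv_threshold:
                out[i, j] = True
    return out
```

The temporal extension works analogously, carrying the previous frame's mask forward for blocks that still show significant motion.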
5. Results
In this section we compare our video-encoding algorithm with the known MC-DCT approach, using a scheme similar to H.261. The first frame is coded without motion compensation (I frame), and all other frames are coded based on motion estimation from the previous decoded frame (P frames). Fig. 5a presents frame 21 of the Claire sequence (which consists of 150 frames) coded with the known MC-DCT algorithm using a quantisation factor of 32, and Fig. 5c illustrates a zoomed detail of this figure. Figures 5b and 5d show the same frame coded with the ROI-based MC-DCT scheme. It is observed, especially by comparing Figs. 5c and 5d, that the quality of the image in the ROI is better than with the plain MC-DCT algorithm. The quantisation factor of the ROI is 8 and that of the nonROI areas is 32. It should be mentioned that many frames do not contain ROI, as there is no motion in them. In this case the quantisation factor remains at the high level (32) and the quality of the whole image is therefore reduced. However, due to the small reconstruction error, this quantisation does not affect the total quality significantly.
Figure 4: The masks of the Claire sequence at frames 3, 15 and 16.
Figure 5: Frame 21 of the Claire sequence coded with MC-DCT and with the new proposed algorithm (together with zoomed details).
Conclusion
In this paper we describe a new efficient mechanism based on ROI for low bitrate coding. An LVQ neural network is used for extracting the moving objects within a frame. The whole scheme improves the quality of the image sequence in the regions on which human eyes concentrate, without increasing the total bitrate significantly. The ROI occupy a small part of the image (approximately 20% of each frame), and many frames (about 30-40%) in video-phone applications have no ROI, since there is no motion in them; thus the total proportion of ROI is extremely small. Furthermore, since the proposed method relies on block-based coding, it can be implemented in real time using current VLSI architectures (this is the main advantage of the ROI-based MC-DCT encoder). The method of extracting moving objects based on an LVQ neural network can be used in many other applications apart from low bitrate coding, and this constitutes a topic for further research.
References
[1] "Special Report: Federal Communication Commission Advanced Television System Recommendation," IEEE Trans. Broadcasting, vol. 39, no. 1, Mar. 1993.
[2] Special Issue on Video Coding for 10 Mb/s, Signal Processing: Image Communication, vol. 5, no. 1-2, Feb. 1993.
[3] M. Argyle and M. Cook, Gaze and Mutual Gaze, Cambridge Univ. Press, 1976.
[4] CCITT Recommendation H.261, "Video Codec for Audio-Visual Services at p×64 kbit/s," Geneva, 1990.
[5] T. Ebrahimi, E. Reusens and W. Li, "New Trends in Very Low Bitrate Coding," Proc. of the IEEE, vol. 83, no. 6, June 1995.
[6] K. Aizawa and T. Huang, "Model-Based Image Coding: Advanced Video-Coding Techniques for Very Low Bit-Rate Applications," Proc. of the IEEE, vol. 83, no. 2, pp. 259-271, Feb. 1995.
[7] D. Kalogeras, "Adaptive Techniques in Coding and Recognition of Scenes," Ph.D. Thesis, Dept. of ECE, NTUA, Mar. 1996.
[8] E. Nguyen and C. Labit, "Adaptive Region-Based Quantisation in Subband Coding Using A Priori Levels of Interest," Proc. of the Picture Coding Symposium PCS'94, September 1994.
Session T: IMAGE ANALYSIS I
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
ITERATED FUNCTION SYSTEMS FOR STILL IMAGE PROCESSING
J.-L. Dugelay, E. Polidori and S. Roche
Institut EURECOM, MultiMedia Communications Dept., 2229, route des Crêtes, B.P. 193, F-06904 Sophia Antipolis Cedex
E-mail: {dugelay,polidori,roche}@eurecom.fr
URL: http://www.eurecom.fr/~image
Abstract
Iterated function systems (IFS) have recently been applied to the field of image coding. This paper exploits the fractal properties of the coding and decoding schemes in order to add useful tools for image processing. The first contribution is an improvement of the classical fractal zoom which allows, with a single code, the image resolution to be increased without loss of sharpness. The second, in addition to compression, aims at enhanced security of still images through the protection of a few code parameter bits.
1 Introduction
The publication of Arnaud Jacquin's article on still image coding using Iterated Function Systems (IFS) [2] stimulated much research on IFS for image processing and coding [7]. Current work on this topic can be classified into four main categories, depending on the problem considered: basic theory, implementation, extension and functionality. The category "basic theory" mainly covers work linked to contractivity constraint notions, general formulation and basic aspects of the algorithm. Although some theoretical problems in using IFS for image compression remain [3], the majority of studies currently available in the literature deal with implementation aspects such as segmentation, domain block classification, reduction of computational complexity, and combination of IFS with other techniques such as the DCT. Several relevant papers have proposed improvements on Jacquin's algorithm [3]. By extending the basic algorithm, adapted for still grey-level images, some authors also work on a possible design of the method for video, colour, and multispectral images. After a brief review of Jacquin's algorithm in section 2, we focus on the last category of studies. In particular, this paper deals with a possible implementation of some tools such as zoom [4], described in section 3, and some security functionalities such as access control [6], described in section 4, as part of a coding scheme based on the IFS technique.
2 Brief review of image compression by IFS
Given an original image μ, the goal is to build a lossy representation of this image via a transform τ. The reconstructed image μ_a is obtained by the iterative process of recursively applying the transformation τ [1]:

μ_a = lim_{n→∞} μ_n, with μ_n = τ(μ_{n−1}),

from an arbitrary initial image μ_0. Note that μ_a is called the attractor of the transformation τ. In order to obtain μ_a ≈ μ, the encoding stage consists of selecting the τ that minimizes the collage error ε_c = d(μ, τ(μ)), where d is the distance measure. By the collage theorem [1], the reconstruction error ε_r = d(μ, μ_a) is upper bounded by:

ε_r ≤ ε_c / (1 − s)   (1)
where s is the contractivity factor of the transform τ. In order to reduce the coding complexity, the image μ is divided into N non-overlapping blocks [2], called the range blocks. Each range block R_i, for i ∈ {1,…,N}, is coded independently by matching it with a bigger block D_i in the image μ, called a domain block. This match defines a transformation τ_i, and the global fractal code is then given by the union τ = ∪_i τ_i of local transforms. Moreover, each local code τ_i is restricted to consist of a reduction, a discrete isometry and an affine transformation on the luminance. Hence, τ_i can be modeled by:
    ( x' )   ( a_i  b_i  0   ) ( x )   ( t_{i,1} )
    ( y' ) = ( c_i  d_i  0   ) ( y ) + ( t_{i,2} )   (2)
    ( z' )   ( 0    0    s_i ) ( z )   ( o_i     )
where a_i, b_i, c_i, d_i, t_{i,1}, t_{i,2} represent the geometric transform and s_i, o_i the grey-level transform; x, y are the pixel coordinates and z the corresponding luminance value.
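As an illustration, one local transform τ_i of Eq. (2) can be sketched as follows; the block values, the isometry indexing and the pixel-averaging reduction are illustrative assumptions, not the authors' implementation:

```python
# Sketch of one local transform tau_i from Eq. (2): spatial reduction of the
# domain block, one of the eight discrete isometries, then the affine
# luminance map z -> s_i*z + o_i. All concrete values are illustrative.
import numpy as np

def apply_local_transform(domain_block, isometry, s_i, o_i):
    # 2x2 averaging implements the spatial reduction (domain twice range size)
    d = domain_block
    reduced = 0.25 * (d[0::2, 0::2] + d[1::2, 0::2] + d[0::2, 1::2] + d[1::2, 1::2])
    # isometry index 0..7: four rotations, optionally preceded by a flip
    if isometry >= 4:
        reduced = np.fliplr(reduced)
    reduced = np.rot90(reduced, isometry % 4)
    return s_i * reduced + o_i   # grey-level (luminance) transform

D = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 domain block
R = apply_local_transform(D, isometry=0, s_i=0.75, o_i=10.0)
assert R.shape == (2, 2)   # range block is half the domain size
```

For contractivity of the luminance part, |s_i| < 1 is required, in line with Eq. (1).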
3 Zooming using IFS
By virtue of the iterative decoding process which uses a "fractal" transform τ, the corresponding attractor μ_a is a fractal object. Although the original image μ has a fixed size defined by its number of pixels, the code τ, built by taking advantage of image self-similarities, has no intrinsic size and is theoretically scale-independent. Thus, by applying the transformation τ to an initial image μ_0, we may obtain a reconstruction μ_a of the original image with the same resolution as μ_0 (see Fig. 1). Hence, thanks to this coding scheme, we eliminate the fixed-resolution aspect of digitized images.
The fractal zoom is mainly based on this remark. But the result of such a zoom is visually rather poor (see Fig. 2(c)), although there is no "pixelization" such as that due to pixel duplication (see Fig. 2(a)). Indeed, the fractal zoom causes an
important "blocking effect" due to the independent and lossy coding of the range blocks. To obtain a good visual quality, some improvements have been made, such as the use of overlapping range blocks [4]. In this case, the coding becomes redundant in overlapping regions of the image μ; we then average these parts in order to smooth the block effect and to reduce the collage error by choosing among several values for each considered pixel. This yields a zoomed image (see Fig. 2(d)) with a sharper quality than the classical linearly interpolated image (see Fig. 2(b)), obtained using a luminance continuity hypothesis. Unfortunately, this improvement currently comes at the price of degraded compression performance. However, this method allows the image to be displayed, from a single code, at different levels of resolution according to the application requirements.

4 Hierarchical access control using IFS
The advent of multimedia applications has brought new requirements, especially in the security field [5]. Here we propose hierarchical access control, a system that allows different levels of quality according to the access fee paid. All the receivers of a broadcast channel can display an image, but only at a low quality with no commercial value. This message, received through the public channel, remains partially readable in order to attract potential customers who would apply for the commercial service to get the higher-quality image. The IFS method is suitable for this because it offers the possibility to control the image quality during the iterated reconstruction process. More precisely, the contractivity parameter s (see Eq. (1)) is modified through the luminance scale parameters s_i (see Eq. (2)) according to the desired access level. By partially hiding the values of s_i through encryption, several access levels can be obtained. For instance, if s_i is quantized with 8 bits, all of these bits could be left readable (see Fig. 3(a)); in this case the classical reconstructed image with the highest quality is obtained (see Fig. 5(d)). At the other extreme, encrypting all 8 bits of s_i (see Fig. 3(g)) leads to the unreadable image of Fig. 5(a). Between these two configurations, encrypting s_i to intermediate degrees leads to intermediate levels of visualization quality (see Figs. 5(b), 5(c)).
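The progressive masking of Fig. 3 can be sketched as follows; for illustration the encrypted bits are simply blanked rather than actually encrypted, and the bit values are invented:

```python
# Sketch of the hierarchical masking of the 8-bit quantized s_i: only the
# k most significant bits are left readable, the rest being protected
# (modelled here, for illustration only, by zeroing instead of encrypting).
def mask_s(s_bits, readable_msbs):
    """Keep the top `readable_msbs` bits of an 8-bit value, blank the rest."""
    assert 0 <= readable_msbs <= 8
    mask = (0xFF << (8 - readable_msbs)) & 0xFF
    return s_bits & mask

s_i = 0b10011111
assert mask_s(s_i, 8) == s_i          # configuration (a): full quality
assert mask_s(s_i, 5) == 0b10011000   # an intermediate access level
assert mask_s(s_i, 0) == 0            # configuration (g): no access
```

Decoding with the masked s_i then yields the degraded reconstructions of Fig. 5(a)-(c).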
[Figure 3 shows the masks as rows of 8 bits (MSB to LSB), with the encrypted bits marked "x".]
Figure 3: s_i parameter masking, from (a) no encryption to (g) full encryption.
Nevertheless, we note that any domain block can be viewed as a set of range blocks. Each range block of the decoded image μ_a is thus highly dependent on the block mappings performed in a pyramidal fashion during the previous iterations (see Fig. 4). Hence the s_i values, associated with each of the blocks involved in these mappings, strongly affect the final decoded image. The s_i thus appear to be the key parameters for providing access control.
We therefore propose a hierarchical access control scheme which provides both compression and security functions within a single algorithm. Note that the multi-resolution access can be achieved without any degradation of compression performance. The security evaluation of this scheme is performed in [6].
5 Concluding remarks
Although IFS is not a fully understood technique, fractal image coding has been used successfully to encode still grey-level images. Until now, efforts have mainly focused on the compression aspect of IFS. Nevertheless, with the growth of multimedia applications and communications, some future coding schemes, such as MPEG-4, are in progress; these schemes consider functionalities and tools in addition to compression. It is accepted that IFS is a new and interesting technique for image coding. In addition, in this paper, we have tried to demonstrate that IFS may also be a very useful technique for simultaneously performing image coding and processing. Two methods have been described: the first provides a way to zoom a picture and the second a way to control access to it. These methods exploit particular properties inherent in fractal signal processing (scale independence) and more specifically in the IFS technique. In practice, zoom and security could be combined in a single framework to control the image resolution. One future direction for this work could consist in extending the use of these tools and functionalities from still images to video. Moreover, this study may also indirectly contribute to improving the general understanding of the use of IFS in the field of image coding.
6 Acknowledgments
This work is supported in part by AEROSPATIALE (service Télédétection & Traitement d'images, établissement de Cannes) and DGA/DRET (groupe Télécommunications & Détection).
References
[1] M. Barnsley & L. Hurd, Fractal Image Compression, AK Peters, Wellesley, 1993.
[2] A. E. Jacquin, "Image Coding Based on a Fractal Theory of Iterated Contractive Image Transformations," IEEE Trans. on Image Processing, Vol. 1, No. 1, pp. 18-30, Jan. 1992.
[3] Y. Fisher, Fractal Image Compression - Theory and Application, Springer-Verlag, New York, 1994.
[4] E. Polidori & J.-L. Dugelay, "Zooming using Iterated Function Systems," NATO ASI Conf. Fractal Image Encoding and Analysis, Trondheim, Norway, July 1995. To appear in a special issue of Fractals.
[5] B. Macq & J.-J. Quisquater, "Digital Images Multiresolution Encryption," IMA Intellectual Property Project Proceedings, Vol. 1, Jan. 1994.
[6] S. Roche, J.-L. Dugelay & R. Molva, "Multiresolution Access Control Algorithm based on Fractal Coding," IEEE ICIP'96, Lausanne, Switzerland, September 16-19, 1996.
[7] D. Saupe & R. Hamzaoui, "A Guided Tour of the Fractal Image Compression Literature," ACM SIGGRAPH'94 Course Notes, 1994.
Sensing Surface Discontinuities via Coloured Spots
Colin J. Davies* and Mark S. Nixon**
*Digi Media Vision Ltd., Gamma House, Enterprise Road, Chilworth, Hampshire, SO16 7NS, United Kingdom. Email: [email protected]
**ISIS Research Group, Department of Electronics and Computer Science, University of Southampton, SO17 1BJ, United Kingdom. Email: [email protected]
Abstract: Discontinuities in 3D surfaces are often the features of interest for object recognition, positioning and measurement. 3D sensing via a pattern of coloured spots has advantages over structured light systems based on other pattern elements in the presence of 3D step discontinuities. When such surface features bisect the projected pattern elements, incomplete shapes are imaged; for squares or stripes this can lead to ambiguity and erroneous measurements. For 3D sensing via coloured spots the incomplete imaged pattern elements can be reliably detected, facilitating the accurate location of important surface step discontinuities.
Introduction Structured light systems for three dimensional (3D) sensing that are based on a single projected pattern can capture the 3D data for dynamic scenes at video-rate. Previously, video-rate structured light systems have used projected patterns of stripes [2][8], grids [5][6] or square elements [7][9][10]. Using coloured spots retains the video-rate advantage but allows for measurement of more than depth, with inherent practical advantages [3][4]. In this paper, sensing via coloured spots is shown to have the potential to detect directly 3D surface step discontinuities. Blake and Zisserman [1] argued that marking such discontinuities in the visible surfaces is the main purpose of producing a depth map.
3D Sensing via Coloured Spots Figure 1 shows the system geometry for 3D sensing via coloured spots. A projector illuminates the 3D scene with a spatially-efficient hexagonally tessellated array of spots; a camera images the scene from a 3D location offset from the projector. The projected pattern of spots is spatially encoded using colour to enable the imaged spots to be matched with those projected. The imaged position of each spot's centre gives, via triangulation, a measure of the 3D location of the illuminated surface. If the illuminated surface is planar,
Figure 1: The system geometry for 3D sensing via coloured spots.
Figure 2: Example results for a single frame from a video-rate sequence of a face during speech.
each spot is imaged as an ellipse. From the parameters describing the imaged ellipse shape, a measure is made of the normal direction to the illuminated surface. Thus, a measure is made of the 3D location and orientation of the illuminated surface for each imaged spot. A specially developed formulation of the Hough Transform (HT) is used to extract each imaged spot; the imaged ellipse is described by only three implicit parameters instead of the conventional five. The parameters are the centre position along its epipolar line and two which describe the variation in shape. Constraints based on colour and brightness reduce the number of uncorroborated votes cast in the HT accumulator array and limit the chances of false peaks arising. Since several peaks can arise in the HT accumulator array formed to extract each spot, a decoding algorithm is used to select the correct spot parameters. Figure 2 shows a single frame from a video-rate sequence of a face during speech, the spots extracted via the 3D HT, and a profile view of the reconstructed 3D surface. Currently, partial spots are removed during decoding as they can lead to erroneous 3D measurements.
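The triangulation step can be illustrated with a minimal 2-D sketch; the ray geometry below is invented and ignores the colour decoding and ellipse fitting described above:

```python
# Illustrative 2-D triangulation: the imaged position of a spot centre fixes a
# camera ray, the known projected pattern fixes a projector ray; their
# intersection gives the illuminated surface point. Geometry is invented.
import numpy as np

def triangulate(cam_origin, cam_dir, proj_origin, proj_dir):
    # solve cam_origin + t*cam_dir = proj_origin + u*proj_dir for (t, u)
    A = np.column_stack([cam_dir, -proj_dir])
    t, _u = np.linalg.solve(A, proj_origin - cam_origin)
    return cam_origin + t * cam_dir

cam = np.array([0.0, 0.0])
proj = np.array([1.0, 0.0])          # projector offset from the camera
p = triangulate(cam, np.array([0.5, 1.0]), proj, np.array([-0.5, 1.0]))
assert np.allclose(p, [0.5, 1.0])    # rays meet at the illuminated point
```

In the real system the same principle is applied in 3-D, once the spatial colour code has matched each imaged spot with its projected counterpart.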
Sensing Surface Step Discontinuities
The use of spots offers distinct advantages over other pattern elements in the presence of 3D surface step discontinuities, with particular advantage arising from their smooth shape. Often 3D step discontinuities bisect the projected pattern elements in such a way that incomplete shapes are imaged. For pattern elements based upon stripes or square elements, the incomplete imaged shapes are not uniquely identifiable. A square projected onto an inclined surface can be imaged as a rectangle. When the projected square intersects a 3D surface step discontinuity the imaged shape may still be rectangular, but there is no evidence concerning the lost information. Accordingly, erroneous depth measurements may be made. In 3D sensing via coloured spots, the projected spots that illuminate smooth surfaces are imaged as ellipses, with smooth edge outlines. The projected spots that intersect a surface step discontinuity are imaged as incomplete ellipses, whose edge outlines contain corners. The corners, or discontinuities in the normal direction of the edge outline, can be detected by a corner operator. Reliable detection of the incomplete imaged spots facilitates the accurate location of surface step discontinuities. Figure 3a shows part of an image of an illuminated surface with a significant step discontinuity; the corresponding edge image is shown in Fig. 3b. The output of a simple corner operator, based on the rate of change of the edge normal direction, is shown in Fig. 3c; the value output for the sharp corners is much greater than for the outlines of the smooth ellipse shapes. The corners of the incomplete spots have been isolated in Fig. 3d by thresholding the output of the corner operator. Unwanted edges, that do not form the borders of the imaged spots, produce unwanted corner outputs. This suggests that the corner detection needs to be incorporated within a process that uses higher-order information.
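A minimal sketch of such a corner measure, assuming the edge outline has already been traced and its direction sampled (the contours below are synthetic):

```python
# Sketch of the simple corner measure described above: the rate of change of
# the edge direction along an outline, thresholded to isolate corners.
import numpy as np

def corner_response(angles):
    """Absolute change of edge direction between successive outline points,
    with angle wrap-around handled."""
    d = np.diff(angles)
    return np.abs((d + np.pi) % (2 * np.pi) - np.pi)

smooth = np.linspace(0, np.pi / 2, 50)              # slowly turning ellipse arc
sharp = np.array([0.0, 0.0, np.pi / 2, np.pi / 2])  # a 90-degree corner

assert corner_response(smooth).max() < 0.1   # smooth outline: low response
assert corner_response(sharp).max() > 1.0    # corner stands out; threshold here
```

Thresholding this response, as in Fig. 3d, isolates the corners of the incomplete spots.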
Figure 3: Part of a real 3D step image; (a) the input image, (b) the edge image, (c) the rate of change of the edge normal direction, (d) the previous image thresholded.
It could be possible for a surface step discontinuity to intersect the projected pattern of spots such that no incomplete spots were imaged. This is less likely to occur for the hexagonally tessellated pattern of spots than for a pattern based on a square tessellation, due to the greater percentage of surface area illuminated by the spots in the hexagonal tessellation. Were such a surface discontinuity to occur, it could be simply detected from the 3D surface location and normal direction measures calculated from the imaged spots adjacent to it. If the surface step discontinuity was known to pertain to a parametrically defined shape, such as a circular hole in the surface or a straight edge, then the parameters for the discontinuity could be captured directly by conventional feature extraction techniques applied to the output of the corner operator. For example, the surface discontinuity of Fig. 3 has a straight edge which can be extracted by applying a HT formulated for straight lines to the image of Fig. 3d.
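A straight-line HT of the kind suggested here can be sketched as follows; the accumulator resolution and the synthetic corner points are illustrative:

```python
# Sketch of a straight-line Hough transform applied to a binary corner map:
# each feature point votes for the (theta, rho) lines through it; the
# accumulator peak recovers the discontinuity edge. Data are synthetic.
import numpy as np

def hough_lines(points, n_theta=180, rho_max=64):
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, 2 * rho_max))
    for (x, y) in points:
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(n_theta), rho + rho_max] += 1
    return thetas, acc

# corner responses lying on the vertical line x = 20
pts = [(20, y) for y in range(0, 30, 3)]
thetas, acc = hough_lines(pts)
t_idx, r_idx = np.unravel_index(np.argmax(acc), acc.shape)
assert abs(thetas[t_idx]) < 0.02 and (r_idx - 64) == 20  # x*cos(0) = 20
```

The peak at (θ ≈ 0, ρ = 20) recovers the vertical edge on which the corner responses lie.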
Conclusions Using a projected pattern of circular spots provides advantages over patterns based on other shapes. 3D sensing via coloured spots has the potential to detect directly 3D surface step discontinuities and to be immune to spatial error, due to the projection of a smooth shape. The direct detection of 3D surface step discontinuities via the coloured spots can benefit techniques for 3D object recognition, positioning and measurement by providing accurate data for these important 3D object features.
References
[1] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, 1987.
[2] K.L. Boyer and A.C. Kak. Color-encoded structured light for rapid active ranging. IEEE Trans. Pattern Analysis and Machine Intelligence, 9(1):14-28, 1987.
[3] C.J. Davies. Three dimensional sensing via coloured spots. PhD thesis, Department of Electronics and Computer Science, University of Southampton, 1996.
[4] C.J. Davies and M.S. Nixon. Feature extraction for video rate three dimensional imaging via coloured spots. In IEEE Canadian Conference on Electrical and Computer Engineering, volume 2, pages 916-919, 1995.
[5] S.M. Dunn, R.L. Keizer, and J. Yu. Measuring the area and volume of the human body with structured light. IEEE Trans. Systems, Man, and Cybernetics, 19(6):1350-1364, 1989.
[6] G. Hu and G. Stockman. 3D surface solution using structured light and constraint propagation. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(4):390-402, 1989.
[7] M. Ito and A. Ishii. A three-level checkerboard pattern (TCP) projection method for curved surface measurement. Pattern Recognition, 28(1):27-40, 1995.
[8] T.P. Monks. Measuring the shape of time-varying objects. PhD thesis, Department of Electronics and Computer Science, University of Southampton, 1995.
[9] P. Vuylsteke and A. Oosterlinck. Range image acquisition with a single binary-encoded light pattern. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(2):148-164, 1990.
[10] S.R. Yee and P.M. Griffin. 3-Dimensional imaging system. Optical Engineering, 33(6):2070-2075, 1994.
IMAGE ANALYSIS AND SYNTHESIS BY LEARNING FROM EXAMPLES
Sandra Brunetta¹ and Nicola Ancona
Robotics and Automation Laboratory - Tecnopolis CSATA Novus Ortus, Strada Prov.le per Casamassima km 3, 70010 Valenzano (BA), Italy. E-mail: [email protected] - [email protected]
Abstract
This paper describes a system for the analysis and synthesis of a sequence of images. The system relies on motion analysis and learning techniques and is able to extract from an image sequence some control parameters describing the motion contained in the sequence, and to reconstruct the original images of the sequence by using only such control parameters and a small set of the input images. The realized system has many practical applications in the following domains: computer graphics, data compression, teleoperation and object recognition systems, and virtual reality.
1. Introduction
This paper describes a system for realizing image analysis and synthesis of an image sequence representing a moving object in front of a camera. The aim is to extract from a set of time-varying 2D images some useful information regarding both the object shape and the motion parameters, and then to use such information to reconstruct the original image sequence. The realized system has many practical applications that range from image processing problems such as video compression (only the extracted information has to be transmitted, allowing greatly reduced transmission costs and times), to passive machine vision problems such as inspection tasks, to active perception applications in which a robot must explore its environment, for instance in order to identify the shape of a manipulation target and to determine the best strategy to reach and grasp it for teleoperation tasks. The information to be extracted is application dependent. For example, in the case of teleconferencing, the meaningful parameters can be the expression and pose of face images (e.g. degree of smiling, the in-plane and off-plane rotation of the head and so on), while for a manipulation task the extracted parameters can be the rotation and/or translation of a moving target with respect to a fixed coordinate system, together with a scale factor. Two main aspects have to be pointed out:
- the analysis problem, which involves the estimation of a vector of control parameters (pose parameters) describing the meaningful motion underlying the observed scene, from a sequence of images given as input, and
- the synthesis problem, which is the inverse problem and involves the generation of novel images corresponding to an appropriate set of control parameters, starting from a set of example images of a certain object.
The approach presented in this paper for solving these two problems relies on the explicit computation of the motion field between the images and on the assumption that there exists a mapping between the computed motion field and a vector of control parameters for the analysis problem, and the inverse mapping for the synthesis problem. These two non-linear relationships can be reconstructed through approximation techniques by using a small set of examples, i.e. pairs (motion field, control parameters) for the analysis problem and pairs (control parameters, motion field) for the synthesis problem. These example pairs are used to train two neural networks based on Radial Basis Functions that are able to generalize and reconstruct the mappings also for novel images and novel values of the control parameters. The two obtained networks are called the Analysis Network and the Synthesis Network, respectively. The main contribution of this paper is that the proposed approach directly refers to images that are used as examples for training a network. No preliminary assumption about the shape of the object has to be made and no three-dimensional or physical model is required, unlike in traditional approaches (see for example [6] for face animation). A second contribution is that the method is independent of the image contents, owing to a preliminary operation of motion field computation; the system can be applied to any type of 3D object. In the following sections the method for motion field computation together with the analysis and synthesis networks will be described. Experimental results will be shown in the last section.
¹ Supported by a grant from the Italian Space Agency.

2. Motion Field Computation
Given an image sequence representing an object rigidly moving in front of a camera, the computation of the motion field allows the detection of the overall motion underlying the images by estimating the optical flow field, that is, an
approximation of the perspective projection of the real 3D motion of the object on the image plane [9]. There exist several methods for estimating the motion field. They can be divided into methods that compute the motion only for sparse image features with a semantic meaning (such as object corners, edges or boundaries) [5, 7] and methods that compute the motion for pixel-level features related to the local grey-level structure of the images [1, 8, 11]. Methods based on motion computation of point or corner features are the simplest, but they produce poorly localised features. Methods based on edge detection are more advanced than isolated point detection, but give poor results for densely textured images. On the contrary, methods based on motion computation for pixel-level features allow correspondences to be established among all points in the images, but they generally require a small interframe motion. In this paper a method based on motion computation at pixel level has been chosen, since it gives an overall view of the motion present in the images. In this case, motion information is not limited to chosen features as in the former class of methods. Moreover, the requirement of small interframe motion can be overcome by an appropriate choice of the motion computation technique through the joint use of a multiresolution image representation. The problem of computing a dense (pixel-level) motion field between two images is a common problem in computer vision that can be solved by using standard techniques for optical flow measurement [1, 8, 11]. Optical flow computation permits features in two frames to be matched using just the local grey-level structure of the images; no other information regarding the semantics has to be supplied. The correspondence is found for any pixel, not only for a particular set of features with a semantic meaning, for example the corners of an object.
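A minimal window-based least-squares flow estimate of the kind developed below (a single window, no pyramid, no eigenvalue test) might look as follows; the synthetic pattern and window size are illustrative:

```python
# Minimal window-based least-squares optical flow for a single window S,
# solving sum_S (grad(E).v + E_t)^2 -> min. The translating synthetic
# pattern below is illustrative.
import numpy as np

def flow_in_window(I1, I2):
    Ey, Ex = np.gradient(I1)              # spatial derivatives (rows = y)
    Et = I2 - I1                          # temporal derivative
    A = np.column_stack([Ex.ravel(), Ey.ravel()])
    b = -Et.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v                              # (v_x, v_y)

x, y = np.meshgrid(np.arange(16, dtype=float), np.arange(16, dtype=float))
I1 = np.sin(0.5 * x) + np.cos(0.5 * y)           # textured pattern
I2 = np.sin(0.5 * (x - 1.0)) + np.cos(0.5 * y)   # shifted right by 1 pixel

vx, vy = flow_in_window(I1, I2)
assert 0.5 < vx < 1.5 and abs(vy) < 0.3  # recovers the approximate translation
```

The full method adds an eigenvalue check of the normal matrix to reject unreliable windows and a coarse-to-fine pyramid to handle large displacements, as described next.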
In the work presented in this paper the optical flow computation is performed via a coarse-to-fine gradient-based technique. The starting point is the well-known optical flow constraint equation, or constant brightness equation (see [9]), that holds for each image point:

∇E · v + E_t = 0   (1)

where ∇E and E_t are respectively the spatial and temporal derivatives of the image brightness function E, and v = (v_x, v_y) is the optical flow vector. Equation (1) is obtained assuming that the changes in the image brightness are due only to translation of the local image intensity (i.e. are induced purely by motion). The methods for optical flow computation based on the solution of equation (1) are also known as differential or gradient techniques, since they involve the derivatives of image intensity over space and time. The particular technique chosen to compute the motion field allows the two limitations of the differential formulation of the optical flow problem to be overcome:
- the aperture problem: from equation (1) only the flow component in the direction of the brightness gradient can be determined;
- the pixel displacements between the images have to be small, due to the first-order approximation embodied in the truncated Taylor expansion.
To solve the aperture problem (for each image point there is one equation and two unknowns) a local smoothness constraint can be added, so regularizing the problem. In fact, making the additional assumption that a small neighborhood of the point p = (x, y) (a window S) has a uniform displacement, a system of equations can be obtained by imposing the minimization of the error term

Err = Σ_S (∇E · v + E_t)²

over the window S. Such a system, generally over-determined, can be solved by the least squares method. Unreliable estimates of optical flow are identified (through eigenvalue analysis) and discarded in order to obtain accurate and meaningful results. The joint use of a differential technique with a multiresolution image representation allows the requirement of small interframe motion to be overcome. Such a representation, also known as a coarse-to-fine strategy, starts by computing a Gaussian pyramid representation of the images [1, 4]. This consists of a set of progressively reduced resolution versions of the original images produced by repeated low-pass filtering and subsampling. At each step, this reduction in resolution also reduces the size of the frame-to-frame displacement. Consequently any displacement in the original images will be reduced, through repeated subsampling, to the range (1-1.5 pixels) within which the optical flow constraint equation can be used. More details on this particular implementation are reported in [1]. The described technique for optical flow computation allows the automatic computation of a dense motion field revealing the overall motion present in a scene, requiring just two views and with subpixel accuracy.

3. Analysis and Synthesis Networks
The idea underlying the Analysis Network is to reconstruct the mapping from an image to the appropriate vector of control parameters describing that image by using a set of examples (motion fields and associated parameters) and a Regularization Network for learning from them. The problem is to approximate the vector field F: X → Y² from a set of sparse data, the N example pairs (x_i, y_i) for which F(x_i) = y_i holds and which are used as known instances of the function. In this case F stands for the non-
² Where X is the space of the motion fields and Y is the space of the control parameters.
579 linear mapping we want to learn, x; are the motion fields and y;are the control parameters. Once the function has been reconstructed, it will be possible to obtain an estimate of the control parameters also for novel motion fields. In this way the analysis problem, that is the problem to estimate a set of pose parameters from a sequence of images, becomes the classical problem of interpolating of multivariate function. There are several methods to solve this problem. In this work a method based on neural networks has been chosen for several reasons. First of all neural networks are very suitable to describe non-linear functions, are more robust with respect to noise and have good generalization properties. Moreover, the chosen approximation scheme, based on Radial Basis Function [ 10], has the property of best approximation of continuous function unlike the classical multflayer networks used in backpropagation, and so are well suited for solving multivariate approximation problems. Moreover, it allows parallel implementations, simply handles n-dimensional problems and is characterized by algorithm simplicity. The Radial Basis Function method [ 10] consists in choosing the approximating function F' as linear combination of N functions depending on the examples x,.. Thus, the approximation is: N
F'(x) = Σ_{i=1}^{N} c_i G(‖x − x_i‖)
with G being the chosen Green function, which may be a Gaussian, a linear function or a spline, and ‖·‖ the Euclidean norm. The coefficients c_i of the linear combination can be 'learned' from the examples by solving the following linear system:
A C = Y

where (Y)_i = F(x_i) is the vector containing the control parameters, (C)_i is the vector of the coefficients and (A)_{ij} = G(‖x_i − x_j‖), i, j = 1, ..., N, is the matrix of the chosen Green function evaluated at the examples. The linear system is obtained by imposing the N conditions on the examples, i.e. F(x_i) = y_i, i = 1, ..., N.
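As a minimal sketch of this learning step (not the authors' implementation), the system A C = Y can be assembled and solved with a Gaussian Green function, one of the choices mentioned above; the example data and the width parameter sigma are illustrative assumptions:

```python
import numpy as np

def rbf_fit(X, Y, sigma=1.0):
    """Learn the coefficients C by solving A C = Y, where
    A[i, j] = G(||x_i - x_j||) with a Gaussian Green function G."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    A = np.exp(-(d / sigma) ** 2)                              # Green function matrix
    return np.linalg.solve(A, Y)                               # N conditions F(x_i) = y_i

def rbf_eval(Xtr, C, x, sigma=1.0):
    """Evaluate F'(x) = sum_i c_i G(||x - x_i||) at a new input x."""
    g = np.exp(-(np.linalg.norm(Xtr - x, axis=1) / sigma) ** 2)
    return g @ C

# Illustrative 1-D example: interpolate three (x_i, y_i) pairs.
X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([0.0, 1.0, 4.0])
C = rbf_fit(X, Y)
```

By construction the approximant interpolates the examples, so evaluating F' at a training input returns the corresponding training output.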
Once the coefficients c_i have been computed, the approximation of the unknown function F has been achieved. It suffices to
evaluate the approximating function F' at a new input value (a motion field) to obtain the corresponding estimate of the output value. The idea underlying the Synthesis Network is to reconstruct the mapping from a vector of control parameters to the associated grey-level images by using a set of examples and a regularization network for learning from them. The synthesis network can be thought of as the dual of the analysis network described above. In this case, the function to reconstruct is F*: Y → X, F*(y_i) = x_i, which relates the vector of pose parameters to the motion field. In this case y_i is the vector of pose parameters and x_i is the optical flow field that, once applied to a reference image, allows a novel image to be generated. Such a function can be reconstructed by using an appropriate set of sparse data, the N example pairs (y_i, x_i), for which F*(y_i) = x_i holds. Also in this case a regularization network based on Radial Basis Functions has been used.

4. Experimental results

The analysis and synthesis networks have been tested on different kinds of images representing a single rigid object moving in front of a camera. In this section the experimental results of a particular practical application will be described. The simulated situation is one in which a camera mounted on the gripper of a robot end effector has to track an object moving on a plane in order to keep its attitude constant with respect to the position and orientation of the object. In this case the object motion consists of a translation on a plane parallel to the image plane and a rotation around an axis normal to the image plane and passing through the center of the image. This motion can be modeled by an affine transformation, and the needed affine parameters are (θ, t_x, t_y), where θ stands for the rotation angle of the object around the axis normal to the image plane and (t_x, t_y) represents the components of the translation vector. Experiments on human faces are reported in [3]. Figs. 1-2 show the results obtained by testing the analysis and synthesis networks on some test images representing a book translating and rotating on a table. Given the test image³ shown in Fig. 1, the analysis network is able to derive the corresponding affine parameters that describe the motion of the book on the table with respect to a given reference image. The real motion underlying the test image and the reference image is highlighted by showing the superimposed edges of the two images, produced by using an edge detector with the same parameter setting. Fig. 2 shows the results obtained by giving as input to the synthesis network⁴ the affine parameters estimated by the analysis network. The obtained synthesized image is shown in order to check for the absence of visible deformations. Moreover, to verify that no information is lost, the superimposition of the edges of the reconstructed image and the
³ More properly, the motion field computed between a reference image and the test images. ⁴ Trained on the same example set as the analysis network.
reference image is displayed. The two images coincide almost exactly, showing that all the motion has been recovered.
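The affine motion model used in this experiment — a rotation θ about the image center plus a translation (t_x, t_y) — yields a dense motion field that can be sketched as follows; this is an illustrative reconstruction of the model, not the authors' code:

```python
import numpy as np

def affine_motion_field(h, w, theta, tx, ty):
    """Dense motion field (u, v) for a rotation theta (radians) about the
    image center plus an in-plane translation (tx, ty)."""
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xr, yr = x - xc, y - yc                  # coordinates relative to the center
    c, s = np.cos(theta), np.sin(theta)
    u = (c * xr - s * yr) - xr + tx          # horizontal displacement
    v = (s * xr + c * yr) - yr + ty          # vertical displacement
    return u, v
```

With θ = 0 the field reduces to a pure translation, matching the role of (t_x, t_y) in the text.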
Fig. 1 The analysis network.
Fig. 2 The synthesis network.

The obtained results show that the reconstruction of the images is very good. This means that 1) the estimate of the affine parameters by the analysis network is good (the extracted parameters are meaningful and completely describe the motion content of the images) and that 2) the reconstruction of the images by the synthesis network does not cause deformations.

5. Conclusions

In this paper two networks for realizing image analysis and synthesis of a sequence of images have been presented. Both networks have several practical applications. The analysis network can be seen as a generalized interface that learns the motion from an image sequence and synthesizes it in a vector of control parameters, while the synthesis network generates novel images starting from a small set of example images, without any 3D model, just by assigning new values to the control parameters. The analysis and synthesis networks put together can be used as a system for data compression or low-bandwidth teleconferencing, with the transmitter equipped with an analysis network and the receiver provided with a synthesis network. Moreover, by training the synthesis network on different example pairs from those of the analysis network, the whole system can be used for object recognition (novel views of an object can be created by replicating the motion of another object) or as a method for creating cartoon animation driven by an actor (see [2] for other applications).

References
[1] J.R. Bergen and R. Hingorani. Hierarchical motion-based frame rate conversion. David Sarnoff Research Center, Princeton. Internal Report. April 1990.
[2] D. Beymer, A. Shashua and T. Poggio. Example based image analysis and synthesis. AI-MEMO 1431, Artificial Intelligence Laboratory, MIT. December 1993.
[3] S. Brunetta and N. Ancona. Apprendimento da Esempi: Analisi di Immagini [Learning from Examples: Image Analysis]. Proceedings of the 39th ANIPLA Conference (Automation 95). Bari (I). November 1995.
[4] P.J. Burt and E.H. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, COM-31, N. 4:532-540. April 1983.
[5] R. Deriche and O.D. Faugeras. Tracking line segments. Proceedings of the 1st ECCV, Antibes (FR). Lecture Notes in Computer Science, Vol. 427. Springer-Verlag. 1990.
[6] I. Essa. Analysis, interpretation and synthesis of facial expressions. TR 303, MIT Media Lab. February 1995.
[7] O.D. Faugeras, F. Lustman and G. Toscani. Motion and structure from motion from point and line matches. Proceedings of the 1st ICCV, London (UK). Pages 25-34. IEEE Computer Society Press. 1987.
[8] B.K.P. Horn and B.G. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203. 1981.
[9] B.K.P. Horn. Robot Vision. MIT Press, Cambridge. 1986.
[10] T. Poggio and F. Girosi. A theory of networks for approximation and learning. AI-MEMO 1140, Artificial Intelligence Laboratory, MIT. 1989.
[11] A. Shashua. Correspondence and affine shape from two orthographic views: Motion and recognition. AI-MEMO 1327, Artificial Intelligence Laboratory, MIT. December 1991.
A STABILIZED MULTISCALE ZERO-CROSSING IMAGE REPRESENTATION FOR IMAGE PROCESSING TASKS AT THE LEVEL OF THE EARLY VISION
Shinji WATANABE, Takashi KOMATSU and Takahiro SAITO
Department of Electrical Engineering, Kanagawa University, JAPAN
ABSTRACT
We present a new stabilized multiscale zero-crossing image representation at the level of the early vision, and develop an iterative algorithm for image reconstruction from the stabilized zero-crossing image representation. The algorithm provides a reconstructed image with subjectively high fidelity after some dozens of iterations. Moreover, we introduce a threshold operation based on edge intensity to reduce the amount of information in the stabilized zero-crossing image representation, and experimentally demonstrate that the threshold operation works well.

1. INTRODUCTION
Many previous studies on the early vision have shown that multiscale zero-crossing representations are well adapted for extracting locally important features such as edges from images [1]. The justification of multiscale zero-crossing representations originates in Logan's theorem, which proves that if a signal does not share any zero-crossings with its Hilbert transform, then it is uniquely characterized by its zero-crossings [2]. The multiscale zero-crossing representation is complete, but not stable, in the sense that a small perturbation of the representation may correspond to an arbitrarily large perturbation of the original signal. In order to stabilize the reconstruction of a signal from its zero-crossings, Mallat recently developed a stabilized waveform representation based on both the zero-crossings of a dyadic multiscale wavelet transform with the property of a local second-derivative operation and additional information such as the value of the wavelet transform integral between two zero-crossings, and he conjectured that the stabilized zero-crossing waveform representation might be complete and stable [3]. In addition, he formulated an algorithm for reconstructing signals from the stabilized zero-crossing representation.
Mallat's signal reconstruction algorithm, which is based on the POCS (projections onto convex sets) formulation, iterates a nonexpansive projection onto a convex set and an orthogonal projection onto a Hilbert subspace, and hence its convergence is guaranteed. Since then, many studies have described variants of the stabilized zero-crossing waveform representation. For instance, some studies have analyzed the variant which uses, as complementary information, the position and amplitude of each local extremum, defined as the maximum absolute value lying between two consecutive zero-crossings [4]. For almost all the variants presented so far, the signal reconstruction problem has been formulated as a non-linear optimization problem, and the resultant signal reconstruction algorithms are based on the POCS formulation, like Mallat's reconstruction algorithm. With the POCS-based alternating projection algorithms we can reconstruct a signal waveform from the stabilized zero-crossing representation with high fidelity. These types of zero-crossing representation are considered almost complete, but they have common drawbacks. The first drawback is that it is neither straightforward nor trivial to extend their signal reconstruction algorithms from one-dimensional signals to two-dimensional signals. The difficulty of the extension results from the non-linearity of the reconstruction problems. The second drawback is that for almost complete signal reconstruction the positions of zero-crossings and/or local extrema should be represented with fractional sampling-interval accuracy, which makes it difficult to apply the stabilized zero-crossing representations to various practical image processing problems at the level of the early vision. The third drawback is that the POCS-based alternating projection algorithms for signal reconstruction involve a large amount of computational effort. To cope with the above-mentioned drawbacks, in this paper we present a new stabilized zero-crossing representation in the wavelet transform domain. In the new stabilized zero-crossing representation, we represent the position of each zero-crossing as a certain sampling point, and to stabilize the representation we add complementary information defined as the inner product between the original signal and an integrated basis function of the dilated and shifted basic wavelet function at each zero-crossing point.
The new stabilized zero-crossing representation has the salient feature that the problem of how to reconstruct signals from it reduces to a typical minimum-norm optimization problem, the solution of which is formulated as a linear simultaneous equation [5]. In this case, however, the dimension of the resultant linear simultaneous equation is very large, and hence we employ an iterative relaxation method for solving the
simultaneous equation. The theory of linear operators guarantees the convergence property of the iterative signal reconstruction algorithm. Furthermore, in this paper, we extend the new stabilized zero-crossing representation and the iterative signal reconstruction algorithm to the two-dimensional case, that is to say, a multiscale image representation at the level of the early vision. Experimental simulations demonstrate that the two-dimensional iterative signal reconstruction algorithm, that is to say, the iterative image reconstruction algorithm, can reconstruct an original image from its stabilized multiscale zero-crossing image representation almost perfectly.

2. DYADIC MULTISCALE WAVELET TRANSFORM
The wavelet transform is performed by applying dilation and translation to a basic wavelet function ψ(x) and then evaluating the inner product between a given input signal f(x) and the dilated and translated wavelet function:
W_a(b) = (1/√a) ∫ ψ*((x − b)/a) f(x) dx    (1)
When we set a = 2^j, b = n, where j and n are integers, W_a(b) reduces to a discrete wavelet transform with the shift-invariance property. We refer to this type of wavelet transform as the dyadic multiscale wavelet transform. The dyadic multiscale wavelet transform is more redundant than the usual standard discrete wavelet transform, where a = 2^j, b = n × 2^j. The dyadic multiscale wavelet transform has the perfect reconstruction property that an original signal is perfectly reconstructed from the wavelet transforms at all the scales j = 1, 2, ..., but it is not feasible to use the wavelet transforms at all the scales. Instead, we limit the scale j within the range j = 1, 2, ..., M, and reconstruct an original signal from both the wavelet transforms at the scales j = 1, 2, ..., M and the smoothed signal at the coarsest scale j = M. In this paper we use a basic wavelet function ψ(x) which is defined as the second derivative of a short-length smoothing function and forms a bi-orthogonal basis with the properties of linear phase and local second-derivative operation. In this case, the detection of zero-crossing points corresponds to the extraction of edges. In order to reduce the detection of false zero-crossing points caused by ripples of the basic wavelet function, we employ a short-length smoothing function derived from the B-spline function [4]. The unit impulse responses h_L(n), h_H(n) of the band-splitting low-pass and high-pass filters used for the dyadic multiscale wavelet transform with the second-derivative operation are as follows:
(a) Low-pass analysis filter: { h_L(1) = 0.25, h_L(0) = 0.50, h_L(-1) = 0.25 }
(b) High-pass analysis filter: { h_H(1) = -0.25, h_H(0) = 0.50, h_H(-1) = -0.25 }
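A shift-invariant (undecimated) analysis pass with these 3-tap filters can be sketched as follows; spacing the taps 2^(j-1) samples apart at scale j ("à trous" filtering) and clamping at the borders are assumed implementation details, not specifics given in the paper:

```python
import numpy as np

def dyadic_wavelet(signal, M=3):
    """Undecimated dyadic wavelet analysis with the paper's 3-tap filters.
    Returns the detail signals at scales j = 1..M and the coarsest smooth."""
    hL = {-1: 0.25, 0: 0.50, 1: 0.25}    # low-pass analysis filter
    hH = {-1: -0.25, 0: 0.50, 1: -0.25}  # high-pass (second-derivative) filter
    s = np.asarray(signal, float)
    n = len(s)
    details, smooth = [], s
    for j in range(1, M + 1):
        step = 2 ** (j - 1)              # tap spacing doubles per scale
        W = np.zeros(n)
        S = np.zeros(n)
        for i in range(n):
            for k in (-1, 0, 1):
                idx = min(max(i + k * step, 0), n - 1)  # clamp at borders
                W[i] += hH[k] * smooth[idx]
                S[i] += hL[k] * smooth[idx]
        details.append(W)                # zero-crossings of W mark edges at scale j
        smooth = S
    return details, smooth
```

On a constant signal the high-pass taps cancel exactly (−0.25 + 0.50 − 0.25 = 0), so every detail signal is zero and the smooth signal is preserved, consistent with a second-derivative operator.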
3. STABILIZED ZERO-CROSSING REPRESENTATION
In the new stabilized zero-crossing representation, we represent the position Z_i of each zero-crossing point with integral sampling-interval accuracy, that is to say, for the true position of each zero-crossing point we substitute the sampling point which is nearest to the true position. In order to stabilize the reconstruction of a signal from its zero-crossings, we add complementary information defined as the inner product P_i between the original signal f(x) and an integrated function φ_j(x; Z_i) of the dilated and shifted basic wavelet function ψ((x − Z_i)/2^j) at the position Z_i of each zero-crossing point:

P_i = (1/N) ( f(x), φ_j(x; Z_i) )    (2)
where N is the normalization factor. When we use a basic wavelet function ψ(x) with the second-derivative property, the inner product defined by Eq. 2 corresponds to applying a first-derivative operation to the original signal f(x). Moreover, since the integrated basis function φ_j(x; Z_i) is obtained by applying the wavelet transformation to a unit step function U(x), the function φ_j(x; Z_i) serves as an edge model function and the value of the inner product P_i corresponds to the intensity of the edge model function included in the original signal f(x).

4. SIGNAL RECONSTRUCTION
The problem of how to reconstruct signals from the new stabilized zero-crossing representation reduces to a typical minimum-norm optimization problem, where the vector with minimum norm is selected as the optimal solution vector under the constraint that the inner products of the solution vector with the multiple basis vectors are given. The solution of the minimum-norm optimization problem is easily formulated as a linear simultaneous equation. In the case of the signal reconstruction problem, however, the dimension of the resultant linear simultaneous equation is equal to the number of detected zero-crossings, and is too large to solve the equation directly, non-iteratively. Instead, we employ an iterative relaxation method for solving the equation. The mathematical theory of linear operators guarantees that the iterative signal reconstruction algorithm has the convergence property. The iterative signal reconstruction algorithm is as follows.
[ Iterative Signal Reconstruction Algorithm ]
(1) Adopt a smoothed signal at the coarsest scale j = M as an initial function of a reconstruction signal
R(x). (2) Apply the procedures of steps (2-1) and (2-2) to all the zero-crossings at all the scales. (2-1) For each zero-crossing Z_i at the scale j, compute
an inner product Q_i between the present reconstruction signal R(x) and the integrated basis function φ_j(x; Z_i). (2-2) Update the reconstruction signal R(x) as follows:

R(x) ← R(x) + (P_i − Q_i) · φ_j(x; Z_i)    (4)

(3) Repeat the above operations of step (2) until convergence.

5. EXTENSION TO IMAGE REPRESENTATION
We extend the new stabilized multiscale zero-crossing representation and the iterative signal reconstruction algorithm to the two-dimensional case, that is to say, a multiscale image representation at the level of the early vision. For the extension, we employ the multiscale pyramidal wavelet transform as the two-dimensional multiscale wavelet transform. First we decompose an input signal into four multiscale dyadic wavelet transforms W_LL(x, y), W_LH(x, y), W_HL(x, y), W_HH(x, y) by performing the horizontal wavelet transform and the vertical wavelet transform sequentially, and then we compose the objective multiscale pyramidal wavelet transform W_H(x, y) by reconstructing it from only the three wavelet transforms W_LH(x, y), W_HL(x, y), W_HH(x, y). The multiscale pyramidal wavelet transform has the same data structure as the Laplacian pyramid image representation [6], and has the property of perfect reconstruction. For a given two-dimensional signal, the zero-crossings of its multiscale pyramidal wavelet transforms form zero-crossing lines. We represent the zero-crossing lines with integral sampling accuracy, that is to say, for the true zero-crossing line we substitute the sequence of sampling points which best approximates the true position. Moreover, as complementary information, at the position Z_i = (Z_i,x, Z_i,y)' of each approximate zero-crossing point on the zero-crossing line at the scale j, we employ the inner product S_i between the original two-dimensional signal f(x, y) and the basis function ρ_j(x, y; Z_i), which works as a first-derivative operator.
S_i = ( f(x, y), ρ_j(x, y; Z_i) )    (5)
As for the basis function ρ_j(x, y; Z_i), we prepare four different candidate functions in advance, and then we select the proper function from the four candidates according to the connection of the neighboring approximate zero-crossing points in the vicinity of the position Z_i on the zero-crossing line. We define the four different candidate functions γ_j(x, y; Z_i; n_1, n_2) with the two parameters n_1, n_2:
γ_j(x, y; Z_i; n_1, n_2) = (1/N_i,j) { η_j(x, y; Z_i,x, Z_i,y) n_1 + η_j(y, x; Z_i,y, Z_i,x) n_2 }    (6)

(n_1, n_2) = (1, 0), (0, 1), (1, 1), (1, −1)
N_i,j : normalization coefficient
η_j(u_1, u_2; Z_i,1, Z_i,2) = ∫_{−∞}^{u_1} ψ((v − Z_i,1)/2^j) dv · exp( −(u_2 − Z_i,2)² / 2^(2j−2) )
where we choose the proper set of values for the two parameters n_1, n_2 from the four possible sets according to the connection of the neighboring approximate zero-crossing points, as follows:
(a) Horizontal connection: (n_1, n_2) = (1, 0)
(b) Vertical connection: (n_1, n_2) = (0, 1)
(c) Diagonal connection with the upper-right direction: (n_1, n_2) = (1, 1)
(d) Diagonal connection with the lower-right direction: (n_1, n_2) = (1, -1)
(e) Vague connection: we use the two different basis functions defined by the two sets of parameter values (n_1, n_2) = (1, 0) and (n_1, n_2) = (0, 1), and compute the two different inner products with the chosen two basis functions.
The function η_j(u_1, u_2; Z_i,1, Z_i,2) appearing in the definition of the candidate function γ_j(·) is defined by Eq. 6. The function η_j(u_1, u_2; Z_i,1, Z_i,2) works as a first-derivative operator along the Z_i,1-axis, whereas it behaves as a Gaussian smoothing operator along the Z_i,2-axis. We introduce the Gaussian smoothing operator to suppress the noticeable blockwise artifacts in the reconstructed image. The two-dimensional iterative signal reconstruction algorithm, that is to say, the iterative image reconstruction algorithm, differs from the one-dimensional algorithm in that we choose the proper basis function for the inner product according to the connection of the neighboring approximate zero-crossing points on the zero-crossing line and, additionally, in that in the case of vague connection we use two different basis functions and their two corresponding inner products, but it does not differ in the general outline of the iterative procedure. The two-dimensional signal reconstruction algorithm, that is to say, the iterative image reconstruction algorithm, provides an almost perfectly reconstructed image.
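The general outline of the iterative procedure — repeatedly enforcing each inner-product constraint by adding the residual times the basis function, as in step (2-2) of Section 4 — amounts to a Kaczmarz-style relaxation sweep. A minimal 1-D sketch, with generic unit-norm basis vectors standing in for the integrated basis functions (the step size assumes unit-norm bases, an illustrative simplification):

```python
import numpy as np

def reconstruct(basis, P, R0, sweeps=50):
    """Relaxation sweeps enforcing <R, phi_i> = P_i via
    R <- R + (P_i - Q_i) * phi_i, with unit-norm phi_i."""
    R = R0.copy()
    for _ in range(sweeps):
        for phi, p in zip(basis, P):
            q = R @ phi              # current inner product Q_i
            R += (p - q) * phi       # pull R toward the i-th constraint
    return R
```

After convergence the reconstruction satisfies every given inner-product constraint; when the basis vectors are mutually orthogonal a single sweep already suffices.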
If we measure the fidelity of image reconstruction with the rms signal-to-noise ratio (SNR) computed with respect to the original image, the two-dimensional algorithm achieves image reconstruction with extremely high fidelity, typically 60 dB or more. Fig. 1 gives the value of the SNR of the reconstructed image after n iterations, where the number of scales M is set to 4. High-fidelity image reconstruction with an SNR
of 55 dB or more involves a large number of iterations, but the algorithm gives a reconstructed image with subjectively high picture quality even after some dozens of iterations, which is an especially preferable characteristic for its practical applications to various image processing tasks at the level of the early vision.

6. DATA COMPRESSION OF THE MULTISCALE ZERO-CROSSING IMAGE REPRESENTATION
To reduce the amount of the complementary information in the stabilized multiscale zero-crossing image representation, we introduce a threshold operation based on edge intensity, which is defined as the absolute value of the foregoing inner product S_i at each approximate zero-crossing point. We eliminate the complementary information for the approximate zero-crossing point whose edge intensity is below the threshold value:

if |S_i| < T_r · δ_j, then eliminate that zero-crossing point Z_i    (8)

where the threshold value depends on the scale j, and δ_j is the possible maximum of the absolute value of the inner product S_i at that scale. The threshold operation makes much of coarse edges with high edge intensity and preserves them, whereas it makes little of fine edges with weak edge intensity and eliminates them. Fig. 2 gives the value of the SNR of the image reconstructed from the compressed multiscale zero-crossing image representation after n iterations, where the number of scales M is set to 4 and the values of T_r and their corresponding data compression ratios C_r are as follows: T_r = 0.05 (C_r = 51.6 %), T_r = 0.20 (C_r = 21.7 %), T_r = 0.50 (C_r = 10.4 %), T_r = 0.80 (C_r = 6.0 %). As we increase the value of T_r, visible smear artifacts resulting from image reconstruction errors grow wider and clearer in the reconstructed image, and from the standpoint of picture quality the value of T_r should be set below 0.2.

7. CONCLUSIONS
This paper presents a new stabilized multiscale zero-crossing image representation at the level of the early vision and an iterative image reconstruction algorithm. With the iterative image reconstruction algorithm we can almost perfectly reconstruct an original image from the stabilized multiscale zero-crossing image representation, and after some dozens of iterations the algorithm provides a reconstructed image with subjectively high picture quality. Furthermore, we introduce a threshold operation based on edge intensity to reduce the amount of the complementary information in the stabilized multiscale zero-crossing image representation, and experimentally demonstrate that the threshold operation works well.

Figure 1 SNR versus the number n of iterations for image reconstruction of the test image "Lady".
Figure 2 SNR versus the number n of iterations for image reconstruction from the compressed multiscale zero-crossing representation of the test image "Lady".

REFERENCES
[1] D. Marr and E. Hildreth, "Theory of edge detection", Proc. Roy. Soc. Lond., vol. 207, pp. 187-217, 1980.
[2] B. Logan, "Information in the zero-crossings of bandpass signals", Bell Syst. Tech. J., vol. 56, pp. 487-510, 1977.
[3] S. Mallat, "Zero-crossings of a wavelet transform", IEEE Trans. Inform. Theory, vol. 37, pp. 1019-1033, 1991.
[4] C.K. Cheong, K. Aizawa, T. Saito, and M. Hatori, "Image reconstruction based on zero-crossing representations of wavelet transform", IEICE Trans., vol. J77-A, pp. 992-1005, 1994.
[5] D.G. Luenberger, "Optimization by Vector Space Methods", John Wiley & Sons, Inc., 1969.
[6] P.J. Burt and E.H. Adelson, "The Laplacian pyramid as a compact image code", IEEE Trans. Commun., vol. 31, pp. 532-540, 1983.
FINDING GEOMETRIC AND STRUCTURAL INFORMATION FROM 2D IMAGE FRAMES
R. Jaitly and D. A. Fraser
Vision & Robotics Laboratory, Dept. of Electronic & Electrical Engineering, King's College, Strand, London, WC2R 2LS, UK
e-mail: [email protected]

Abstract
This paper describes the process of obtaining geometric and structural information, in the form of relational tables, from two-dimensional intensity image frames. The process is divided into two subprocesses: the heuristic edge follower and the structure finding algorithm. The first subprocess searches an intensity image to locate and extract potential edges satisfying certain criteria and fits linear equations to them using linear regression analysis. The second subprocess finds valid intersections between these linear equations and hence obtains the structural relationships between these intersections. The outcome of the process is a relational table, containing all vertices found and the relationships between them, for the given intensity image. The process has been tested on a number of rigid, planar-faced three-dimensional objects, producing accurate relational tables from the given views of the objects. The process is described here with reference to one such object, and the performance in terms of processing time is reported.
I. Introduction
An important part of solving feature-based correspondence problems [1], [2], [3], [6] is obtaining accurate feature and structural information from the 2D image frames used in the correspondence process. Many methods for extracting features such as corners, edges and lines [4], [5], [7], [8], using a wide variety of processes, have been described in the literature, but very few deal explicitly with structural information. Despite the large amount of research in this area, effective extraction of linear boundaries and their corresponding structural connectivity has remained a difficult and error-prone problem in many image domains. This paper presents a novel approach to processing 2D images to find geometric and structural information using simple heuristics. The technique presented here was motivated by the need for an extraction method which can: find straight lines with a minimum specified length, find the intersections of these lines, and find the relationships between these lines and their intersections (structural information). The performance of the method, in terms of processing time, is given with respect to a test object (see figure 2).
II. The Heuristic Edge Follower
Edges are usually defined as local discontinuities or rapid changes in a particular image feature, such as texture. These changes are detected by a local operator, usually of small spatial extent with respect to the image, that measures the magnitude of the change and its orientation. Lines are commonly defined as collections of local edges that are contiguous in the image. Thus many algorithms rely on a two-step process for line extraction: detection of local edges that are then aggregated into more globally defined lines on the basis of various grouping criteria. One observation about many line extraction algorithms is that they relegate information about edge direction to a secondary role in the processing. In most edge and line extraction algorithms, the gradient magnitude of the intensity change is used in some manner as a measure of the importance of the local edge. While gradient direction information may be used to modulate the grouping process applied to the strong edges, the gradient magnitude usually has the central and dominating influence. The method described in this paper for finding edges uses a single step for line extraction. This process groups together edge pixels that satisfy criteria involving the gradient magnitude and direction data equally. Edge information is extracted using a Sobel-based heuristic technique. The technique involves calculating the Sobel gradient magnitude and gradient direction data for a given intensity image and using this information to raster-scan the image to find all possible edges. A transputer-based frame-grabber with a high-speed Plessey 2D convolver
unit, using a 3x3 Sobel mask, produces two gradient buffers: one for the x-gradient and one for the y-gradient of the given image. The gradient magnitude data is calculated by adding the positive values of these two gradient buffers, while a look-up table, using the two gradient buffers as indexes to the table, is used to calculate the direction data. Note: the images used are 512x512, 8-bit grey-scale images captured using a CCD camera. The edge follower raster-scans the gradient magnitude buffer, and flowchart 1 explains the process of finding and accepting edges.
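The gradient computation can be sketched in software as follows. This is a stand-in for the hardware convolver and the direction look-up table: `arctan2` replaces the table, and summing absolute values is our reading of "adding the positive values":

```python
import numpy as np

def sobel_gradients(img):
    """3x3 Sobel x- and y-gradient buffers, plus magnitude and direction."""
    img = np.asarray(img, float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # x-gradient mask
    ky = kx.T                                            # y-gradient mask
    h, w = img.shape
    p = np.pad(img, 1, mode='edge')                      # replicate borders
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            win = p[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * win
            gy += ky[dy, dx] * win
    mag = np.abs(gx) + np.abs(gy)            # magnitude buffer
    direc = np.degrees(np.arctan2(gy, gx))   # direction buffer, in degrees
    return mag, direc
```

For a vertical step edge the y-gradient cancels, giving a direction of 0° and a nonzero magnitude only near the step.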
Once an edge pixel of a potential edge has been found that is greater than a threshold T1 (A), a check is made to discover whether the edge pixel has already been processed. This is achieved by checking to see if the edge pixel is in the edge data buffer buf_chk. If the edge pixel has already been processed, the pixel is discarded and the process continues raster-scanning the image. If the edge pixel has not previously been processed, the edge pixel is added to buf_chk and a mask, used to find the next edge pixel, is chosen depending on the current edge pixel's direction (B). Flowchart 1 shows the positive and negative masks and their corresponding angle thresholds T_pos and T_neg. The edge pixel angle is stored so that all following edge pixels can be checked to see if they are out of range of T_ang. Therefore, the algorithm uses the mask to move along the edge, checking that the magnitude and direction of the edge remain within the set criteria (C). If an edge changes direction abruptly, the edge pixel and its direction are noted and stored in a deviation buffer buf_dev. If buf_dev becomes greater than a threshold T2, the algorithm assumes that the edge has changed direction and hence stores the first edge pixel to deviate from the original direction as a possible intersection of two edges (D). If the number of current edge pixels is above a threshold T3, the average of the gradient direction values of these edge pixels is used to tighten the constraint on deviation from T_ang (E). The next connected edge pixel is found using the aforementioned mask. From either mask in flowchart 1, if X represents the current edge pixel, the next edge pixel to be chosen as X (i.e. to follow the edge) is determined by comparing the A, B, and C positions in
the mask to find the next edge pixel with a high gradient magnitude and very similar gradient direction. If the gradient magnitude of the new edge pixel is above a threshold T4, the edge pixel is positioned at X in the mask (F), and the process loops back to check the criteria set in steps (A), (B), (C), and (D) for this new edge pixel. Once all edge pixels for a given edge have been found, a threshold T5 determines whether the number of edge pixels representing the edge (the edge pixel set) is valid or not (G). If the edge is valid, the edge pixels are sent to a linear regression analysis algorithm that fits a straight line to the edge pixel set (H). If the edge is invalid, it is stored as noise (I) in the noise buffer buf_noise. If the end of the image has not been reached, the algorithm continues to raster-scan the image to find other potential edges. The outcome of applying the heuristic edge follower to the image is a set of linear equations E_edge representing all the edges found. The noise buffer buf_noise is subtracted from the edge buffer buf_edge. The edge buffer is then thinned using a parallel implementation of a multipass thinning algorithm with a limit parameter. After the thinning process, the buffer is passed through a white speckle removing algorithm. After this last step, the linear equations E_edge can be checked against the edges they represent by superimposing each equation onto the image edges and searching for edge pixels using a five-pixel-wide window. Once this has been achieved, linear regression analysis is again used to form a new linear equation E_new, and this equation is compared to E_edge to find the difference. In practice, the linear equation set E_edge accurately represents the image edges, and hence superimposition and equation checking provide negligible improvement.
III. The Structure Finding Algorithm
There are a number of line drawing algorithms that are mainly based on the use of local features of images instead of placing emphasis on using a priori information of object models [9], [10]. This results in unsatisfactory extraction of line drawings which describe the geometrical and topological structures of objects. Most current methods using a priori information tend to search the whole image space, which increases processing time [11], [12]. In the subprocess described here, the search is concentrated on local areas centred around potential junctions, and thus search time is significantly reduced. The a priori information used in this subprocess is that each of the edge lines in a view of a rigid, planar-faced object is a straight line connecting two junctions, and that around each junction the directions of the edges change significantly. The subprocess is described below. The edge equations E_edge are compared to each other to find all possible intersections and hence all valid vertices. To speed up the checking process, the equations are distributed over the transputer network (four transputers), with each T805 transputer containing a copy of the edge buffer. The following steps are carried out to establish success or failure in finding a true intersection. If an intersection lies outside the permitted edge buffer space, the comparison is terminated. An angle validity check is made: if the angle between the two lines is less than 10 degrees or greater than 170 degrees, the comparison is terminated. A line following algorithm checks to see if both current lines converge to the given intersection. The rules for accepting lines converging to intersections, or being present near intersections, are described with reference to figure 1. The intersection point (i_xy) is located and the lines intersecting at this point are checked. Firstly, line 1 (L1) is checked.
A search window from the intersection point moves downwards (L1_down) a preset distance, and then moves from the intersection point upwards (L1_up), again a preset distance. If the window finds the line in either direction, the rules shown in figure 1 are applied. As shown in the rules in figure 1, if the line is present, or not present, in both directions from the intersection point, the comparison is terminated. Otherwise, the same procedure is carried out for line 2 (L2). In figure 1, L1_down and L2_down are true, so the intersection of this line pair is valid and hence placed into an intersection list containing the equation pair and the position (i_xy) of their intersection. Once all possible intersections have been processed, the intersections are checked and verified against those found using the edge follower, and then the relationship between vertices is found. Vertex relationships (connectedness) are found by processing the table containing the equations and intersections list and linking intersections sharing the same equations.
Figure 1. - The magnified corner of an object
IV. Results
The two aforementioned subprocesses have been tested on a variety of rigid, planar-faced 3D objects. Figure 2 shows one such object. Figure 3 shows the contents of the edge buffer after the edge following algorithm has been used. Figure 4 shows the edge buffer after the noise buffer has been subtracted, the buffer data thinned, and then white speckles removed. Figure 5 shows the geometric and structural information overlaid onto the original object shown in figure 2. Nine vertices V1 to V9 have been recovered from the object. The white lines represent the line equations fitted to the edge pixels. Table 1 shows that the processing time (in seconds) of the parallel implementation of the two subprocesses compared favourably to their sequential equivalents. Table 2 is the relational list formed from figure 5.
Figure 2. - Original Image
Figure 3. - Edge Buffer
Figure 4. - Thinned Image
Figure 5. - Structure of Object
Table 1. - Processing Time in seconds

Process             Sequential Time    Parallel Time
Edge Follower       15.571458          2.747836
Structure Finding   2.606776           0.436754

Table 2. - Relationships

Vertex    Relationship
V1        V2, V4, V9
V2        V1, V3
V3        V2, V4
V4        V1, V3, V8
V5        V6
V6        V5, V7
V7        V4, V6, V8
V8        V4, V7, V9
V9        V1, V8

V. Conclusions
A method for obtaining geometric and structural information, in the form of relational tables, from two-dimensional intensity image frames has been presented in this paper. The method is divided into two subprocesses: the heuristic edge follower and the structure finding algorithm. The first subprocess uses a set of heuristics to fit linear equations to extracted edges acquired from an intensity image. The second subprocess obtains all valid intersections and structural relationships between these linear equations. The outcome of the process is a relational table, containing all vertices found and the relationships between them, for the given intensity image. The process has been tested on a number of rigid, planar-faced three-dimensional objects, producing accurate relational tables from the given views of the objects. Table 1 shows that the parallel implementation of the subprocesses provides a fast, reliable method.
References
[1] R. Shapira, "A technique for the reconstruction of a straight-edge, wire-frame object from two or more central projections", Computer Graphics and Image Processing, Vol. 3, No. 4, pp. 318-326, December 1974.
[2] T.O. Binford, "Visual perception by computer", in Proc. IEEE Conf. on Systems and Control, Miami, December 1971.
[3] D. Marr and T. Poggio, "Cooperative computation of stereo disparity", Science, Vol. 194, pp. 283-287, 1976.
[4] J. Cooper, S. Venkatesh and L. Kitchen, "Early Jump-Out Corner Detectors", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 8, pp. 823-828, August 1993.
[5] J. Matas and J. Kittler, "Junction detection using probabilistic relaxation", Image and Vision Computing, Vol. 11, No. 4, pp. 197-202, May 1993.
[6] R. Jaitly and D.A. Fraser, "Automated 3D Object Recognition and Library Entry System", Neural Network World, Vol. 6, No. 2, pp. 173-183, April 1996.
[7] L.S. Davis, "A Survey of Edge Detection Techniques", Computer Graphics and Image Processing, Vol. 4, No. 3, pp. 248-270, 1975.
[8] V. Torre and T.A. Poggio, "On Edge Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 2, pp. 147-163, March 1986.
[9] A. Martelli, "Edge detection using heuristic search methods", Computer Graphics and Image Processing, Vol. 1, pp. 169-172, 1972.
[10] M. Nagao, S. Hashimoto and T. Sakai, "Automatic model generation and recognition of simple three-dimensional bodies", Computer Graphics and Image Processing, Vol. 2, pp. 272-277, 1973.
[11] A.K. Griffith, "Edge detection in simple scenes using a priori information", IEEE Transactions on Computing, Vol. C-22, pp. 371-375, April 1973.
[12] F. O'Gorman and M.B. Clowes, "Finding picture edges through collinearity of feature points", IEEE Transactions on Computing, Vol. C-25, pp. 449-455, April 1976.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Detection of small changes in intensity on images corrupted by signal-dependent noise by using the wavelet transform
Yasmina CHITTI and Paul GOGAN
CNRS, UPR 9041, Marseille, FRANCE
1 Introduction
Cell imagery is generally based on the use of molecular fluorescent probes [1]. These probes bind to the external membrane of the cell, and their optical properties (birefringence, intensity, spectra, ...) vary with one of the electro-chemical properties of the cell, such as calcium concentration or electrical membrane potential. By using a voltage-sensitive probe, we can image the spatial changes in the electrical potential on the membrane of an excited neuron [2]. Due to the low transduction power of such probes, the changes in potential induce very small changes in intensity. The purpose of this work is to detect such changes between two charge-coupled device (CCD) images corrupted by signal-dependent noise and with nonuniform illumination.
2 Noise analysis
A typical CCD image provided by our imaging system is shown in Fig. 1,A. CCD images can be significantly affected by noise arising from the random thermal motion of the electrons in resistive circuit elements. This thermal noise is modeled as an additive Gaussian process with zero mean [3]. To reduce this noise, the CCD chip is cooled with liquid nitrogen. Images can also be significantly affected by the photoelectronic noise resulting from the quantal nature of light and the photoelectronic conversion process. This noise is well approximated by a random Poisson process [4]. While the first noise is pixel independent, the second one is not. For each pixel, the variance produced by the photoelectronic noise depends on the number of incident photons and thus varies spatially, since the intensities are not uniform. This implies that the image processing has to be local, taking into account the properties of every pixel. In these conditions, an image where the studied phenomena induce changes in intensity is compared to a reference image of the same field where no change occurs. The image comparison is based on a computation of the pixel-to-pixel relative variation of intensity between the two images:
r(x, y) = (g(x, y) - f(x, y)) / f(x, y)    (1)

where
• (x, y) are the pixel coordinates,
• f(x, y) is the intensity of the reference image at the coordinates (x, y),
• g(x, y) is the intensity of the image where changes occur at the coordinates (x, y),
• r(x, y) is the intensity of the result of the relative variation between f and g at the coordinates (x, y).
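The pixel-to-pixel comparison of Eq. (1) is straightforward to sketch. This is an illustration only; images are represented as nested lists and the intensities of the reference image are assumed nonzero, since f divides.

```python
def relative_variation(f, g):
    """Pixel-to-pixel relative intensity variation r = (g - f) / f, Eq. (1).

    f: reference image, g: image where changes occur.
    Nonzero reference intensities are assumed.
    """
    return [[(g[y][x] - f[y][x]) / f[y][x] for x in range(len(f[0]))]
            for y in range(len(f))]
```

For example, a pixel moving from 100 to 101 counts gives a relative variation of +1%.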
Although this method has often been applied in biology to detect large changes in the electrical potential of neurons [5], it increases the noise and impairs the detection of small changes in the pixels of the image resulting from the comparison. The variance σr²(x, y) of the relative variation r(x, y) is expressed as:

σr²(x, y) = (1 / f²(x, y)) [σg²(x, y) + (g²(x, y) / f²(x, y)) σf²(x, y)]    (2)

where
• σf²(x, y) is the variance of f(x, y) at the coordinates (x, y),
• σg²(x, y) is the variance of g(x, y) at the coordinates (x, y).

We show a 2D map of the variance computed from Eq. (2) in Fig. 1,B. In our images, σr²(x, y) is inversely proportional to the intensity f(x, y). The consequence is that the noise of r(x, y) will be larger on the background than on the objects. Furthermore, the spatial variation of the variance is larger in the background than in the objects and impairs the detection of areas where intensity changes occurred. We show that it is possible to adapt a filtering algorithm to remove the noise and efficiently detect small changes in intensity. Due to the signal dependence of the noise, the filtering and the detection have to take into account the local properties of the pixels. In the following paragraphs, we propose an efficient method based on the wavelet transform to detect such changes in the noisy r(x, y) image, with removal of the spatial noise arising from the background pixels.
3 Filtering by wavelet transform
Assuming that the changes consist of hierarchical structures, we can use the wavelet transform [6] to filter the r(x, y) image. First, the image is transformed into several planes corresponding to different scales using the 'à trous' algorithm [7]. In this algorithm, every coefficient c(k, x, y) at the scale k (k = 1, ..., N) corresponds to the following scalar product:

c(k, x, y) = < r(n, m), φ((n - x) / 2^k, (m - y) / 2^k) >    (3)
where φ is a scaling function satisfying the dilation equation [8]. We get the recursive relation:
c(k, x, y) = Σ_{n,m} h(n, m) c(k - 1, x + 2^{k-1} n, y + 2^{k-1} m)    (4)
where h is the wavelet filter corresponding to the φ function. The scaling function is usually a cubic B-spline, but many other functions can be adapted. Then, the wavelet coefficients w(k, x, y) at scale k (k = 1, ..., N) correspond to the difference between two successive approximations c(k - 1, x, y) and c(k, x, y) of the image r(x, y):
w(k, x, y) = c(k - 1, x, y) - c(k, x, y),   k = 1, ..., N    (5)
where c(0, x, y) is equal to the image r(x, y). The image restoration is based on the analysis of the statistical significance of the w(k, x, y) coefficients [9]. Taking into account the distribution law of the coefficients w(k, x, y), non-significant coefficients are rejected. In the case of our images, it is not possible to obtain a single variance value
for each wavelet plane since the variance is spatially non-uniform in the image. We have modified the algorithm to take the heterogeneous variance into account. If the w(k, x, y) coefficients of the plane k are related to the coefficients c(0, x, y) of the plane 0 (i.e. to r(x, y) itself) by a filter g such that:
w(k, x, y) = Σ_{n,m} g(k, n, m) c(0, x + n, y + m)    (6)

then the variance σw²(k, x, y) of the w(k, x, y) coefficients can be easily obtained from the variance of r(x, y):

σw²(k, x, y) = Σ_{n,m} g²(k, n, m) σr²(x + n, y + m)    (7)
Every coefficient w(k, x, y) in each plane k is then tested against its standard deviation σw(k, x, y) obtained from Eq. (7). It is considered significant if it is larger than n·σw(k, x, y), where n depends on the chosen significance probability. The reconstruction of the restored image is obtained by adding together the planes of only the significant w(k, x, y) coefficients and the last smoothed plane, c(N, x, y).
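The decomposition of Eqs. (4) and (5) and the additive reconstruction can be sketched in one dimension (the 2D transform applies the same filter separably). This is a sketch under stated assumptions: the B3-spline filter is the customary choice for the 'à trous' algorithm, and mirror boundary handling is assumed since the paper does not specify borders.

```python
# B3-spline scaling filter commonly used with the 'a trous' algorithm.
H = [1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16]

def a_trous(signal, n_scales):
    """1D 'a trous' wavelet transform (the 2D version is separable).

    The holes between filter taps double at each scale, as in Eq. (4),
    and each wavelet plane is w(k) = c(k-1) - c(k), as in Eq. (5).
    """
    c = list(signal)
    n = len(c)
    planes = []
    for k in range(1, n_scales + 1):
        step = 2 ** (k - 1)
        smoothed = []
        for x in range(n):
            acc = 0.0
            for j, h in enumerate(H):
                idx = abs(x + (j - 2) * step)   # mirror at the left border
                if idx >= n:
                    idx = 2 * n - 2 - idx       # mirror at the right border
                acc += h * c[idx]
            smoothed.append(acc)
        planes.append([a - b for a, b in zip(c, smoothed)])  # w(k)
        c = smoothed
    return planes, c  # wavelet planes w(1..N), last smoothed plane c(N)

def reconstruct(planes, last):
    """Sum the (possibly thresholded) wavelet planes and c(N)."""
    out = list(last)
    for w in planes:
        out = [o + wi for o, wi in zip(out, w)]
    return out
```

Because the w(k) planes are simple differences of successive approximations, summing all planes and the last smoothed plane recovers the original signal exactly.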
4 Experimental results
This method was applied to detect local electric fields on the membrane of an excited neuron stained with a voltage-sensitive dye; the biological methods and results were published in detail elsewhere [2]. In Fig. 1, we can see examples of the detection obtained by using the wavelet transform, which removes the spatial noise generated by the non-biological background. When the neuron under study is excited (Fig. 1, C and E), we detect pixel clusters corresponding to the operation of groups of ionic channels in the membrane of the active neuron (Fig. 1, C). When the neuron under study is not stimulated (Fig. 1, D and F), the small number of clusters detected in the control images (Fig. 1, F) corresponds to spontaneous activities of the biological membranes in the field.
Figure 1: (A) Image of the microscopic field showing the fluorescent neurons. Only the neuron shown by the arrow is excited. (B) 2D map of the variance in image A. (C, D) Relative variation image before filtering. Image C corresponds to a relative variation image when the neuron is excited. Image D corresponds to a control relative variation image when the neuron is at rest. (E, F) Filtering of C and D, respectively, by using the wavelet transform. The top scale corresponds to B (variance): full scale is 3·10⁻³. The bottom scale corresponds to C, D, E and F (relative variation images): full scale is 3.2%.
5 Statistical significance
First, the use of the wavelet transform makes it possible to evaluate the significance, at the single pixel level, from the distribution law of the w(k, x, y) coefficients. Assuming that the w-law is well approximated by a normal law with zero mean, the significance of a detected pixel will depend on the n factor (see section 3). The thresholding of planes with n = 2 gives a confidence better than 95%, and with n = 3 a confidence better than 99%. Then, to evaluate the significance of the results and the resolution in intensity changes that our system provides, the lowest significant distance (i.e. the best resolution in intensity) between two successive samples of intensity must be computed. The first step is to classify the pixels of the filtered relative variation image into N samples with the same chosen intensity range. Then, we compute the mean and the variance in each sample of pixels from the raw pixel intensities and variances of the unfiltered relative variation image. Using the Bienaymé-Chebyshev theorem, we compute the limit of the probability that two successive samples have different means [10]. If this probability is greater than 80%, the two compared samples are taken as significantly different. Filtering in wavelet space detects changes in intensity with a resolution of 0.3% and a confidence greater than 95%.
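The per-pixel significance test described above (keep a coefficient only if it exceeds n times its own, spatially varying, standard deviation) reduces to a few lines. This sketch assumes the wavelet plane and its standard-deviation map are given as nested lists; n = 2 corresponds to >95% confidence and n = 3 to >99% under the zero-mean normal approximation of the text.

```python
def threshold_plane(w_plane, sigma_plane, n=3):
    """Keep only significant coefficients: |w(k, x, y)| > n * sigma_w(k, x, y).

    Non-significant coefficients are set to zero, so the restored image
    can be formed by summing thresholded planes and the smoothed plane.
    """
    return [[w if abs(w) > n * s else 0.0
             for w, s in zip(wrow, srow)]
            for wrow, srow in zip(w_plane, sigma_plane)]
```

Because sigma is a full map rather than a single number per plane, the heterogeneous variance of Eq. (7) is honoured pixel by pixel.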
6 Conclusion
This method is a complete image processing tool that gives significant results, but it is unfortunately limited by its computational cost, both in time and in computer storage space. However, the algorithm preserves the photometry and provides a calibration in size and intensity of active sites by extraction of significant structures.
References
[1] L.M. Loew, S. Scully, L. Simpson, and A.S. Waggoner. Evidence for a charge-shift electrochromic mechanism in a probe of membrane potential. Nature, 281:497-499, 1979.
[2] P. Gogan, I. Schmiedel-Jakob, Y. Chitti, and S. Tyč-Dumont. Fluorescence imaging of local electric fields during the excitation of single neurons in culture. Biophysical J., 69:299-310, 1995.
[3] J.A. Jamieson. Infrared Physics and Engineering. McGraw-Hill, New York, 1963.
[4] M. Gasden. Some statistical properties of pulses from photomultipliers. Applied Optics, 4:1446-1452, 1965.
[5] A. Grinvald, R.D. Frostig, E. Lieke, and R. Hildesheim. Optical imaging of neuronal activity. Physiol. Rev., 68:1285-1366, 1988.
[6] J. Morlet, G. Arens, E. Fourgeau, and D. Giard. Wave propagation and sampling theory - I and II. Geophysics, 47:203-236, 1982.
[7] A. Grossmann, R. Kronland-Martinet, and J. Morlet. Reading and Understanding Continuous Wavelet Transforms. In Wavelets: Time-Frequency Methods and Phase Space (J.M. Combes et al., Eds), Springer Verlag, Berlin, 1989.
[8] G. Strang. Wavelets and Dilation Equations: A Brief Introduction. SIAM Review, 31:614-627, 1989.
[9] J.L. Starck, A. Bijaoui, and F. Murtagh. Multiresolution Support Applied to Image Filtering and Restoration. Graphical Models and Image Processing, 57:420-431, 1995.
[10] W. Feller. An Introduction to Probability Theory and its Applications I and II. Wiley, New York, 1964.
Deterioration Detection in a Sequence of Large Images
O. Buisson, B. Besserer, S. Boukir & L. Joyeux
obuisson@gi.univ-lr.fr, bbessere@gi.univ-lr.fr, sboukir@gi.univ-lr.fr, ljoyeux@gi.univ-lr.fr
Laboratoire d'Informatique et d'Imagerie Industrielle (L3i)
Université de La Rochelle, avenue Marillac, F-17042 La Rochelle cedex 1
Abstract
This paper presents a robust technique to detect local deteriorations in old cinematographic films. The method relies on spatio-temporal information and combines two different detectors: a morphological detector, which uses the spatial properties of the deteriorations to detect them, and a dynamic detector based on motion estimation techniques. Our deterioration detector has been validated on several film sequences and has turned out to be a powerful tool for digital film restoration.
1. Introduction
Most of the techniques in use today for cinematographic film restoration are based on chemical and mechanical manipulations. Applying digital techniques to the field of film restoration lets us expect results beyond today's limitations, such as automated processing, correction of photographed dust spots and scratches (i.e. after film duplication), the removal of large defects, etc. Our research institute is involved, alongside the Laboratoires Neyrac Films company, in the European LIMELIGHT project, which aims at designing a complete digital processing chain suitable for restoring old films (film scanner, processing workstation, imaging device). Our main work concerns software development for the automatic detection of defects like dust spots, hair and small scratches. Because the processed picture will be imaged back to film, preserving the visual quality in the software process is essential. Thus, the scanning provides high-resolution images (2200 x 1640 pixels or 4000 x 3000 pixels). Of course, these resolutions are uncommon in classical computer vision problems. This involves great difficulties, especially when financial viability is aimed at by users of the LIMELIGHT chain. Keeping the processing time short is a significant problem which requires very fast algorithms. Many approaches to defect restoration can be found in previous papers [4], [6]. In these works, the authors consider the "blobs" as impulse distortions or noise. Thus, deteriorations are restored using filtering techniques. These "blind" filters are applied to the entire image, removing deteriorations, but also deteriorating the regions which are not corrupted. A solution to cope with this problem consists in first isolating the regions with defects and then treating only these regions [7]. The following sections describe our detection algorithms.
2. Dust and scratch detection based on a single image
What are the origins of a dust spot or hair that is visible on an image? Mainly, it is a dust particle on the film which shades light during a film-to-film copy operation or during film scanning. With a specific high-tech film scanning device ensuring a high resolution (less than 10 µm, approximately the film grain size), the digital "signature" left by a dust particle is slightly different from photographed detail of the image, even a sharp, well-defined one (light dispersion within the sensitive layers). Overall, the characteristics of the defects tend to be:
• Small surface (varying from 1 to 50 pixels, which is small in a 2200 x 1640 image),
• Edges with strong gradients.

2.1. Gray scale morphology
The four fundamental binary morphological transformations (erosion, dilatation, closing and opening) all extend to gray scale morphology, without thresholding, via the use of local maximum and minimum operations. Given a gray scale image I and a structuring element (SE) B, the following neighbourhood operators ⊕ and ⊖ form the basis of classical mathematical morphology [8], [9], [11]:

I(x, y) ⊕ B = MAX_{(u,v) ∈ B(x,y)} (I(u, v) + B(u, v))    Dilatation
I(x, y) ⊖ B = MIN_{(u,v) ∈ B(x,y)} (I(u, v) - B(u, v))    Erosion
(I(x, y)) • B = (I(x, y) ⊕ B) ⊖ B    Closing
(I(x, y)) ∘ B = (I(x, y) ⊖ B) ⊕ B    Opening
2.2. A morphological detector of local deteriorations
The closing operator has the attractive property of deleting local minima. Therefore, we can use it to detect black deteriorations. Similarly, the opening operator appears well suited to the detection of white deteriorations. Both morphological detectors of black and white deteriorations are then expressed as a simple difference between successive closing operations (or successive opening operations) and the original image:
D_black(I(x, y), B0, Bn) = ((((I(x, y) ⊕ B0) ⊕ Bn) ⊖ Bn) ⊖ B0) - I(x, y)
D_white(I(x, y), B0, Bn) = I(x, y) - ((((I(x, y) ⊖ B0) ⊖ Bn) ⊕ Bn) ⊕ B0)

where the SE B0 and Bn are defined as:

B0 =  0 0 0        Bn =  2n 2n 2n 2n 2n
      0 0 0              2n  n  n  n 2n
      0 0 0              2n  n  0  n 2n
                         2n  n  n  n 2n
                         2n 2n 2n 2n 2n

The use of the two SE B0 and Bn permits taking into account the slope n of the image curve gradients. Indeed, defects are generally characterized by very strong gradients. On the other hand, B0 allows the detection of defects having smoother gradients. For example, figure 2 (left) shows the result of the deterioration detection using n = 0 on the image depicted in figure 1. We can notice that for n = 0, i.e. without integrating curve gradient properties, the defect profiles are hardly distinguishable from their neighbourhood profiles. On the contrary, using n = 30, no ambiguity remains between peaks corresponding to "real" defects and other peaks (see figure 2, right). This result is very satisfactory and demonstrates the robustness of our morphological defects detector.

Figure 1: An ambiguous image part
Figure 2: Defects detection using n = 0 (left) and n = 30 (right)
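The black-defect detector (closing minus original) can be illustrated with a simplified sketch. This is not the paper's detector: it uses a single flat 3x3 SE rather than the successive B0/Bn closings, and border pixels simply use the in-bounds part of the neighbourhood. It shows the key property that an isolated dark pixel (a local minimum) produces a strong response while flat regions produce zero.

```python
B0 = {(dy, dx): 0 for dy in (-1, 0, 1) for dx in (-1, 0, 1)}  # flat 3x3 SE

def _apply(img, se, pick, sign):
    """Gray-scale neighbourhood operator: pick (max or min) of I +/- B."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[y + dy][x + dx] + sign * b
                    for (dy, dx), b in se.items()
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            row.append(pick(vals))
        out.append(row)
    return out

def dilate(img, se):
    return _apply(img, se, max, +1)   # local maximum of I + B

def erode(img, se):
    return _apply(img, se, min, -1)   # local minimum of I - B

def closing(img, se):
    return erode(dilate(img, se), se)

def d_black(img, se):
    """Black-defect response: closing minus original, so local minima pop out."""
    c = closing(img, se)
    return [[c[y][x] - img[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
```

On a flat background of 10 with a single dark pixel of 2, the closing fills the minimum back to 10, so the detector responds with 8 at the defect and 0 elsewhere.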
3. A dynamic detector of local deteriorations
Working on a digitized film sequence gives us a great advantage, because we can use the information in the preceding and following frames. Our second defect detection algorithm uses this spatio-temporal information. Unlike long linear scratches, dust particles appear in a random manner. However, we can't use simple frame subtraction or "XORing" to detect them because, within a sequence, the camera and actors or scene elements move around, and objects may overlap other objects and/or background details (so-called occlusions or disocclusions). Our dynamic detector relies on both motion-flow estimation and grey-level conservation. There are two main methods to estimate the optical flow of a "noisy" sequence of images:
• pre-filter the "noisy" sequence and use a classical motion estimation (block matching, regression, etc.);
• develop a motion-estimation algorithm which is robust to noise or image alteration.
We have chosen the first solution for two reasons:
• It is difficult to know the real sensitivity of a motion estimator to noise or image alteration.
• In a high resolution image sequence, motion induces large displacements, up to 200 or 300 pixels. One of the best solutions to quickly estimate such motions is to use a hierarchical structure (image pyramid) [2], [3]. The filtering process is then included in the creation of this hierarchical structure.
Having organized the image information in a hierarchical manner, a recursive block matching technique is used to estimate the optical flow [5], [10].
3.1. Hierarchical structure
The basic idea is to create a pyramidal representation of an image [1] using the following algorithm"
if (x mod 2 = 0) and (y mod 2 = 0) then I^{l+1}(x/2, y/2, t) = (f * I^l)(x, y, t), where * denotes the convolution operator and f is a given filter. I^l is interpreted as a family of images, where l indicates the level of resolution (or scale). The larger l is, the more blurred the original image I becomes, finally showing only the larger structures in the image. Our hierarchical image structure is built using a low-pass filter such that film grain and deteriorations disappear at higher levels of the pyramid. Indeed, such high spatial frequencies disturb the motion estimation process.
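The filter-then-subsample rule above can be sketched as follows. This is an illustration only: a 3x3 box filter stands in for the unspecified low-pass filter f, and borders average over the in-bounds part of the neighbourhood.

```python
def box3(img):
    """3x3 box low-pass filter (a stand-in for the unspecified filter f)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = sum(vals) / len(vals)
    return out

def build_pyramid(img, levels):
    """Low-pass filter, then keep the even-indexed samples:
    I^{l+1}(x/2, y/2) = (f * I^l)(x, y) for even (x, y)."""
    pyr = [img]
    for _ in range(levels):
        f = box3(pyr[-1])
        pyr.append([[f[y][x] for x in range(0, len(f[0]), 2)]
                    for y in range(0, len(f), 2)])
    return pyr
```

Each level halves the image size, so single-pixel defects (high spatial frequencies) vanish after a level or two while large structures survive.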
3.2. Hierarchical motion estimation
Our method combines the principle of hierarchical motion estimation with a block matching algorithm. In the first step, the global motion is estimated, allowing only a coarse velocity field to result, and, at lower hierarchical levels, the details of the vector field are calculated as relatively small updates around the estimated vector resulting from the previous higher level. At each level, displacements are estimated using a recursive block matching algorithm [10]. For each pixel of the current grid, we search for the displacement vector which yields the minimum value of a common criterion based on the so-called displaced frame difference (DFD):
E(p, d, t) = Σ_{p_i ∈ W_p} (DFD(p_i, d, t))²   with   DFD(p, d, t) = I(p, t) - I(p - d, t - dt)

where W_p represents a neighbouring window of n x n pixels centred at pixel p, and d is the displacement of p from time t to t - dt. More formally, this recursive search consists of the following steps. First, the estimated displacement from the higher level is used as the prediction of the present location: d_0^l(p) = d^{l+1}(p) x 2. To economize the computational effort, rather than doing a full block matching search, we check only 5 vectors (around the predicted position) in the first step and at most 3 vectors in the following steps. Figure 3 illustrates this procedure. Our algorithm first selects the best displacement candidate δ_1 ∈ {(0,0), (-1,0), (1,0), (0,1), (0,-1)} according to the criterion E(p, d_0^l(p) + δ_1, t). The current displacement is then updated: d_1^l = d_0^l + δ_1. In the next search steps, 3 new candidates are evaluated. Their position depends on the best previous candidate:

δ_{i-1} = (1,0)  ⟹ δ_i ∈ {(0,0), (1,0), (0,1), (0,-1)}
Figure 3: Adaptive block matching search
δ_{i-1} = (-1,0) ⟹ δ_i ∈ {(0,0), (-1,0), (0,1), (0,-1)}
δ_{i-1} = (0,-1) ⟹ δ_i ∈ {(0,0), (1,0), (-1,0), (0,-1)}
δ_{i-1} = (0,1)  ⟹ δ_i ∈ {(0,0), (1,0), (-1,0), (0,1)}

where i denotes the search iteration number, and displacement (0,0) is related to the best previously selected candidate. Notice that candidates that have already been checked do not need further evaluation. The current displacement is then updated with the best candidate: d_i^l = d_{i-1}^l + δ_i. The updating process is stopped when the update falls under a threshold, or when the previously selected candidate remains the best (local minimum), or after a fixed number of iterations. Finally, we designed an adaptive search strategy that avoids checking all possible vectors and thus provides a fast block matching search. For a maximum displacement magnitude of ±3 pixels, this method checks only 20 candidates, while an exhaustive method checks 49. So, the processing speed increases by almost a factor of 2.5.
3.3. Detection of local deteriorations
Once the optical motion flow is correctly estimated, the next frame can be rebuilt without any deteriorations. The absolute value of the DFD is considered as a measure of the quality of the estimated motion. Outliers, usually corresponding to deteriorations, occlusions or disocclusions, are detected when this criterion is higher than a threshold S. These outliers are potential deteriorations.
To deal with occlusions and disocclusions, we use a third image in our estimation scheme. The same process as described above is performed between the image at time t and the image at time t + dt. Common spurious points from the two independent motion estimation and comparison processes are selected as deteriorations (fig. 4).
4. Combination of the two previous detectors
A very good detection rate can be achieved by combining the morphological and the dynamic detectors. The main problems of these detectors - false detections and threshold tuning - are bypassed with the double evidence provided by "ANDing" the results of the two detectors. Therefore, the thresholds are fixed at low values in order to detect every deteriorated pixel, but this also increases the number of wrong detections. However, these are not the same for the first and the second detector, and the double evidence eliminates them.
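The "ANDing" step reduces to a pixel-wise conjunction of the two binary detection masks. A minimal sketch, assuming both detectors have already been thresholded into boolean masks of equal size:

```python
def combine(morph_mask, dyn_mask):
    """Double evidence: a pixel is flagged as a defect only if BOTH the
    morphological and the dynamic detector flagged it ('ANDing'),
    which cancels each detector's independent false alarms."""
    return [[m and d for m, d in zip(mr, dr)]
            for mr, dr in zip(morph_mask, dyn_mask)]
```

A pixel flagged by only one detector (a likely false alarm) is dropped, while a pixel flagged by both survives.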
Figure 4: Frame I(t) of a film sequence (La belle et la bête, 1946), and defect detection on I(t)
5.
Summary and Conclusions
We have presented an efficient detector of local deteriorations in old films. It combines two different detectors: a morphological detector and a dynamic one. Using a standard criterion in the motion estimation step, we obtain a rate of 3% false detections and 5% undetected deteriorations. Defect detection is achieved in about 230 sec. per 2200 x 1640 frame on a standard workstation: 15 sec. for the morphological detection and about 215 sec. for the dynamic detection (which uses 3 images) in an early, unoptimized version. Future work will concern detection of oversized defects, intensity distortions, image instability and, of course, optimization and possibly parallelization of our algorithms.
6.
Acknowledgements
We thank François HELT for helpful assistance. Image reproduction by courtesy of NEYRAC FILM.
7.
References
[1] ANANDAN P. A computational framework and an algorithm for the measurement of visual motion, Int. Journal of Computer Vision, 2:283-310, 1989.
[2] BAAZIZ N. Approches d'estimation et de compensation de mouvement multirésolutions pour le codage de séquences d'images, PhD thesis, Université de Rennes I, October 1991.
[3] BURT P.J. Fast filter transform for image processing, CVGIP, 16:20-51, 1981.
[4] GEMAN S., GEMAN D. and McCLURE D.E. A nonlinear filter for film restoration and other problems in image processing, Graphical Models and Image Processing, 4, 1992.
[5] HAAN G. Motion estimation and compensation, PhD thesis, Delft University of Technology, Dept. of EE, Delft, the Netherlands, Sept. 1992.
[6] KLEIHORST R.P. Noise filtering image sequences, PhD thesis, University of Delft, 1994.
[7] KOKARAM A.C. Motion picture restoration, PhD thesis, University of Cambridge, May 1993.
[8] MUELLER S. and NICKOLAY B. Morphological image processing for the recognition of surface defects, Proceedings of the SPIE, 2249-58:298-307, 1994.
[9] SERRA J. Image analysis and mathematical morphology, Academic Press, 1982.
[10] SRINIVASAN R. and RAO K.R. Predictive coding based on efficient motion estimation, IEEE Trans. on Communications, COM-33(8):888-896, August 1985.
[11] STERNBERG S. Grayscale morphology, Computer Graphics and Image Processing, 35:333-335, 1986.
Invited Session U: COLOR PROCESSING
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Segmentation of multi-spectral images based on the physics of reflection

N.H. Kroupnova
Department of Electrical Engineering, University of Twente, The Netherlands

Abstract
The paper describes an algorithm for multi-spectral image segmentation that takes into account the shape of the clusters formed by the pixels of the same object in the spectral space. The expected shape of the clusters is based on the Dichromatic reflection model ([1]) and its extension ([2]) for optically homogeneous materials. Further, the influence of the illumination and of image formation by a color CCD camera is considered. Based on the expected shape of the clusters we propose a similarity/homogeneity criterion for the extended region merging algorithm. This criterion works successfully for objects of arbitrary shape illuminated by one or several sources of the same spectrum.
1
Introduction
To develop segmentation algorithms, it is important to understand how the reflection of light by different materials causes the changes of color and intensity in color images. The shape of the color clusters was also considered for segmentation purposes in [3], but the resulting algorithm was constructed for the case of one point-like light source and a scene composed of objects made from inhomogeneous materials. We consider the process of light reflection for a scene composed of several objects made from different materials, as is often the case for real images. We also analyze how image formation and the interactions of objects influence the shape of the color clusters. Based on the structure of the clusters in a color space we propose a similarity/homogeneity criterion for the region merging (RM) algorithm. The criterion works in 2D color spaces obtained by two different kinds of projections, which allow the influence of either highlights, or shadows and shape variations, to be eliminated from the segmentation results. The algorithm works successfully for objects of arbitrary shape illuminated by one or more sources of the same spectrum.
2
Expected and real shape of color clusters
2.1
Theoretically expected shape of color clusters
The expected shape of the clusters is based on the Dichromatic reflection model ([1]) and its extension ([2]) for optically homogeneous materials. According to this model, the reflected light can be described as a sum of two vectors, one accounting for the body reflection and one for the surface reflection. Both the specular and the body reflection are decomposed into two factors: an "intensity factor", which depends on geometry, and a "spectral factor", which depends on wavelength. So the power of light reflected by the surface towards the camera is given by

I(λ) = L(λ)(m_s(g)c_s(λ) + m_b(g)c_b(λ))    (1)

where L(λ) is the spectral power distribution of the incident light, g indicates dependence on the geometry, λ is the wavelength, m_s(g) and m_b(g) are the geometry-dependent factors, and c_s(λ) and c_b(λ) are the spectral factors of reflectance for the surface and body components respectively. Equation (1) works for both optically homogeneous and inhomogeneous materials, but the behavior of c_s(λ) and c_b(λ) differs. Metals have no body reflection component, so c_b for them is equal to zero. For dielectrics and most metals c_s(λ) is approximately constant over the visible wavelength range, so the surface reflection component is a vector in the direction of the incident light. The exception is colored metals like copper or gold, for which c_s(λ) varies considerably over the visible wavelength range, causing a color different from silver-grey. A color camera transforms the spectrum of the incoming light into a color space, for instance 3D RGB (red, green, blue) space. This process is called spectral integration [3]. The output of every sensor s_i with response function f_i can be written as
s_i = ∫ f_i(λ)I(λ)dλ    (2)
Or, substituting I(λ) from (1),

s_i = m_s(g) ∫ f_i(λ)L(λ)c_s(λ)dλ + m_b(g) ∫ f_i(λ)L(λ)c_b(λ)dλ    (3)

So, the output vector is a linear combination of two vectors: one is a scaled light-source vector in the basis f_i (if c_s(λ) is constant over the visible wavelength range) and the other is a scaled product of the spectral power distributions of the light and the body reflectance of the object in the same basis. In the ideal case (a single point-like light source, no noise or imaging artifacts) the color clusters for inhomogeneous materials consist of two lines, the matte line and the highlight line, and have the shape of a skewed "T" or "L", as described, a.o., in [3]. Because c_s(λ) for dielectrics and white metals is approximately constant in the visible wavelength range, highlight lines go in the direction of the illumination color. Metals have no matte line, since they have no body reflection. Colored metals such as copper have highlight lines in a different direction, determined also by c_s(λ), which varies considerably over the visible wavelength range. It should be noticed that, depending on the shape of the object and the illumination geometry, a color cluster can have several highlight lines or even "loops" instead of highlight lines. Diffuse illumination "spreads" the highlights, giving clusters that look more like an area in the dichromatic plane. A very rough object surface has the same effect.
2.2
Distortions of the theoretical shape
We consider theoretically and experimentally the influence of the illumination and of the image formation process on the shape of the clusters. It can be summarized as follows. In an ideal case of no noise and one or more light sources of the same spectrum, color clusters can have different shapes, varying from a line or skewed "T" to an area in the dichromatic plane, but the points of one cluster lie in the same dichromatic plane. Noise of the CCD camera makes this plane "thicker". The point spread function (PSF) of the camera, chromatic aberrations, and inter-reflections can cause small parts of the cluster to lie outside the dichromatic plane. Inter-reflections and the PSF cause "bridges" between different clusters.
3
Segmentation
3.1
Normalization on the white image
After subtraction of the dark current and white balancing, images are normalized on the white image. Since the white balance is performed so as to be correct in the middle of the field, while the gains of the sensors are set for the whole image, the normalization compensates for the site-dependent scaling due to the non-uniformity of illumination and for the non-uniformity caused by the beam splitter and fixed pattern noise.
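A minimal sketch of this normalization step, assuming a dark-current frame and a white-reference frame are available (function name and frames are ours):

```python
# Normalization on the white image (sketch): after dark subtraction, divide
# each site by the dark-corrected white reference, which cancels any
# site-dependent gain (uneven illumination, beam splitter, fixed pattern).
import numpy as np

def normalize_on_white(img, dark, white, eps=1e-6):
    num = img.astype(float) - dark
    den = np.maximum(white.astype(float) - dark, eps)  # guard against zeros
    return num / den
```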
3.2
Projections of the color clusters (2D spaces)
We want to design an algorithm for color image segmentation that takes into account the shape of the color clusters. The shape of the clusters is simplified by projecting the RGB space into a 2D color space. We consider two kinds of projections here, both onto the plane through (1,0,0), (0,1,0) and (0,0,1) (Fig. 1). M1 and M2 are two orthogonal axes perpendicular to the intensity axis; M1 goes through (1,0,0).
Figure 1: Projections of the color clusters

One projection is a parallel projection in the direction of the light source, as also described in [4]. In the coordinates M1, M2 the highlight lines project into points on the matte line, and the matte line projects into a line going to the light-source projection, or to (0,0) for normalized images. So in this projection the highlights are actually eliminated and do not disturb the segmentation process. However, the shadows and the shape variations of the objects still play a role.
The other projection is a perspective projection with its center in (0,0,0), the same as implemented when transforming an image into HSV color space. The matte line is projected into one point and the highlight line is projected into a line going to the light-source projection or, if images are normalized, to (0,0). In this projection the highlights still play a role, but the influence of the shadows and shape variations on the matte color vector is eliminated. One can see that these two kinds of projections are "complementary" in the sense of eliminating different influences on the segmentation result. It should be noticed that in both kinds of projections we use Cartesian rather than polar coordinates, to deal better with objects of low saturation.
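The two projections can be sketched as follows for normalized images, so the illuminant direction is assumed to be (1,1,1); the M1, M2 axes follow the construction above (plane through (1,0,0), (0,1,0), (0,0,1), M1 through the red corner):

```python
# The two complementary projections of the paper (sketch, white illuminant
# assumed): parallel projection along (1,1,1) removes highlights, central
# projection from the origin removes intensity (shading / shadows).
import numpy as np

M1 = np.array([2.0, -1.0, -1.0]) / np.sqrt(6.0)   # through the red corner
M2 = np.array([0.0, 1.0, -1.0]) / np.sqrt(2.0)    # orthogonal, in the plane

def parallel_proj(rgb):
    """Projection along (1,1,1): adding a white highlight leaves it unchanged."""
    return np.stack([rgb @ M1, rgb @ M2], axis=-1)

def perspective_proj(rgb):
    """Central projection from the origin: intensity/shading is divided out."""
    s = rgb.sum(axis=-1, keepdims=True)
    return parallel_proj(rgb / s)

matte = np.array([0.6, 0.3, 0.1])
shaded = 0.4 * matte                    # same colour, darker (shape/shadow)
highlight = matte + 0.5 * np.ones(3)    # matte colour plus white specular term
```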
3.3
Region merging using 2D color space criteria
We perform the segmentation by an RM algorithm using a quad-tree structure [6], as described in [5]. A distinction is made in the size of the regions with regard to the criterion used. When both regions are relatively large, so that we can speak about the distribution of feature vectors, the criterion should reflect a kind of distance between the two distributions. The regions R1 and R2 are merged when (see Fig. 2):

(μ1 − μ2)² / (σ1 + σ2) < threshold

where
μ1 is the average feature vector (M1, M2) of the first region,
μ2 is the average feature vector (M1, M2) of the second region,
σ1 is the standard deviation of R1 in the direction to μ2,
σ2 is the standard deviation of R2 in the direction to μ1.

Figure 2: Distance between two distributions
The measure used here ranges from zero, indicating definite merging, to infinity, indicating no merging. When one region is small and one is large, we take the Mahalanobis distance of the mean of the small region to the large region as the criterion of whether the two regions should be merged. When both regions are small, similarity and homogeneity criteria are applied and the results are combined with a logical 'and': R1 and R2 are merged if (μ1 − μ2)² < threshold_μ and σ_{R1∪R2} < threshold_σ. To calculate σ_{R1∪R2}, first the largest eigenvalue λ1 of the covariance matrix of R1 ∪ R2 is calculated, and then σ_{R1∪R2} is defined as the square root of λ1. It gives the largest variance of the resulting region. The RM is implemented using gradual relaxation of the merging criteria, which gives a hierarchical sequence of segmentations. This considerably decreases the dependence on the order of merging, but it also gives possibilities for the interpretation of images, since the algorithm first merges regions with strongly "overlapping" distributions, then with less and less "overlapping" ones. RM on the parallel projection tends to give shadows and parts with different orientation as separate segments; RM on the perspective projection tends to distinguish highlights. Depending on the application, the results obtained on both projections can be combined, giving a segmentation independent of either shadows and orientation, or highlights, or both. The common problem of the two projections is the difficulty of dealing with achromatic objects of different value, like black and white. To distinguish them, the intensity value has to be used.
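The three merging tests can be sketched as below. Note that the exact form of the large-region criterion is partly reconstructed from a garbled original, so the expression used here, (μ1 − μ2)² / (σ1 + σ2), is an assumption; function and threshold names are ours.

```python
# Sketch of the three merging tests (large-large, small-large, small-small);
# feature vectors are the 2-D projected colours.
import numpy as np

def merge_large_large(mu1, s1, mu2, s2, thr):
    """s1, s2: std of each region in the direction towards the other mean."""
    d = np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2)
    return d / (s1 + s2) < thr          # 0 = certain merge, infinity = never

def merge_small_large(mu_small, mu_large, cov_large, thr):
    diff = mu_small - mu_large
    maha = np.sqrt(diff @ np.linalg.inv(cov_large) @ diff)
    return maha < thr                   # Mahalanobis distance to the large region

def merge_small_small(pts1, pts2, thr_mu, thr_sigma):
    mu1, mu2 = pts1.mean(0), pts2.mean(0)
    both = np.vstack([pts1, pts2])
    lam1 = np.linalg.eigvalsh(np.cov(both.T)).max()   # largest eigenvalue
    return (np.sum((mu1 - mu2) ** 2) < thr_mu ** 2) and (np.sqrt(lam1) < thr_sigma)
```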
3.4
Segmentation example
Fig. 3 shows an image of several objects made from different materials: aluminum and copper cylinders, a blue plastic duck, and red and blue plastic caps on a red and yellow background. Figs. 4 and 5 show histograms in M1, M2 coordinates for both kinds of projections, reflecting how complex the cluster shapes are even for a comparatively simple scene. Figs. 6 and 7 show results of RM for two different threshold values. Note the difference in the segmentations. Fig. 8 shows the combination of the results to get a segmentation independent of the shadows, orientation and highlights.
4
Concluding remarks
In this paper we propose an algorithm for multi-spectral image segmentation that takes into account the shape of the clusters formed by the points of the same object in a color space. We provide a physical foundation for the algorithm, analyzing the influence of the image formation process, the illumination and the interactions of objects on the shape of the clusters. The proposed algorithm is RM with a similarity/homogeneity criterion that works on two different kinds of projections, allowing the influence of either highlights, or shadows and shape variations, on the segmentation result to be eliminated. The algorithm hierarchically merges less and less "overlapping"
Figure 3: Image of several objects of different materials
Figure 4: Parallel projection histogram
Figure 5: Perspective projection histogram
Figure 6: RM on parallel projection by different threshold values
Figure 7: RM on perspective projection by different threshold values
Figure 8: Projections combination
distributions of color vectors, thus first finding very dense clusters corresponding to uniform parts of objects and then less dense clusters formed by the parts of objects where the color is influenced by some factors. One future research topic is to investigate the possibilities for interpretation of the images that can be derived from the hierarchical sequence of segmentations. Another interesting topic is to use the differences and correspondences between the results of RM on the two kinds of projections for image interpretation.
References
[1] S. Shafer, "Using color to separate reflection components", Color Research and Application, Vol. 10, pp. 210-218, 1985.
[2] G. Healey, "Using color for geometry-insensitive segmentation", J. Opt. Soc. Am., Vol. 6, pp. 920-937, 1991.
[3] G. Klinker, S.A. Shafer, T. Kanade, "A physical approach to color image understanding", Int. Journal of Computer Vision, Vol. 4, pp. 7-38, 1990.
[4] S. Tominaga, "Surface identification using the dichromatic reflection model", IEEE Trans. PAMI, Vol. 13, pp. 658-670, 1991.
[5] N. Gorte-Kroupnova, B. Gorte, "Method for multi-spectral image segmentation in case of partially available spectral characteristics of objects", Proceedings of "Machine Vision Applications in Industrial Inspection IV" (IS&T/SPIE Symposium on Electronic Imaging), 28 January - 2 February 1996, San Jose, CA, USA.
[6] S.L. Horowitz, T. Pavlidis, "Picture segmentation by a tree traversal algorithm", J. ACM, Vol. 23, pp. 368-388, 1976.
Using Color Correlation To Improve Restoration Of Color Images

Daniel Keren, Anna Gotlib
Department of Mathematics and Computer Science, The University of Haifa, Haifa 31905, Israel
[email protected]

Hagit Hel-Or
Department of Psychology, Jordan Hall, Stanford University, CA 94305, USA
[email protected]

1
Abstract
The problem addressed in this work is restoration of images that have a few channels of information. We have studied color images so far, but hopefully the ideas presented here apply to other types of images with more than one channel. The suggested method is to use a probabilistic scheme which proved rather useful for image restoration, and incorporate into it an additional term, which results in a better correlation between the three color bands in the restored image. Initial results are good; typically, there's a reduction of 30% in the RMS error, compared to standard restoration carried out separately on each color band.
2
Introduction
A rather general formulation of the restoration problem is the following: given some partial information D on an image F, find the best restoration of F. Obviously, there are many possible ways in which to define "best". One way, which proved quite successful for a wide variety of applications, is probabilistic in nature: given D, one seeks the restoration which maximizes the probability Pr(F/D). Following Bayes' rule, this is equal to

Pr(D/F)Pr(F) / Pr(D).

The denominator is a constant once D is measured; Pr(D/F) is usually easy to compute. Pr(F) is more interesting, and more difficult to define. Good results have been obtained by following the physical model of the Boltzmann distribution, according to which the probability of a physical system being in a certain state is proportional to the exponent of the negative of the energy of that state; that is, low-energy, or "ordered", states are assigned higher probability than high-energy, or "disordered", states [3, 7]. It is common to define the energy of a signal by its "smoothness"; the energy of a one-dimensional signal F is often defined by ∫ F_xx² dx, etc. Such integrals are usually called "smoothing terms", as they force the resulting restoration to be smooth [5, 8, 4, 6]. Note that here "smooth" does not mean "infinitely differentiable", but "slowly changing".
3
Main Body
To see how the probabilistic approach naturally leads to restoration by so-called "smoothing", or regularization, let us look at the problem of reconstructing a two-dimensional image from sparse samples which are corrupted by additive noise. Suppose the image is sampled at the points {x_i, y_i}, the sample values are z_i, and the measurement noise is Gaussian with variance σ². Then

Pr(D/F) ∝ exp(−Σ_{i=1}^{n} [F(x_i, y_i) − z_i]² / 2σ²)

and, based on the idea of the Boltzmann distribution, one can define Pr(F) as being proportional to

exp(−λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv)

so the overall probability to maximize is

exp(−(Σ_{i=1}^{n} [F(x_i, y_i) − z_i]² / 2σ² + λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv))

which is, of course, equivalent to minimizing

Σ_{i=1}^{n} [F(x_i, y_i) − z_i]² / 2σ² + λ ∫∫ (F_uu² + 2F_uv² + F_vv²) du dv    (1)
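As an aside, a one-dimensional analogue of functional (1) can be minimized with plain gradient descent (the multigrid solver mentioned next is beyond this sketch); the grid size, sample layout and step size below are illustrative assumptions.

```python
# 1-D analogue of functional (1), minimized by gradient descent: a data
# term at the sample points plus a discrete second-derivative smoothness
# penalty.  Parameters are chosen for stability of the toy example.
import numpy as np

def restore_1d(n, idx, z, sigma=0.5, lam=1.0, steps=8000, lr=0.02):
    F = np.zeros(n)
    for _ in range(steps):
        g = np.zeros(n)
        g[idx] += (F[idx] - z) / sigma**2           # gradient of the data term
        d2 = F[:-2] - 2 * F[1:-1] + F[2:]           # discrete F''
        g[:-2] += 2 * lam * d2                      # gradient of sum(d2**2)
        g[1:-1] += -4 * lam * d2
        g[2:] += 2 * lam * d2
        F -= lr * g
    return F
```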
This leads, via calculus of variations, to a partial differential equation, which can be effectively solved using multigrid methods. Other problems, such as deblurring, can be posed similarly. First, let us look at the problem of deblurring a single-channel image (for instance, a gray-level image). One is given a gray-level image D, which is a corrupted version of the true image F, and the goal is to reconstruct this F. Typically, one assumes that F was blurred by convolution with a kernel H and corrupted by additive noise, which results in the mathematical model D = F ∗ H + N, where ∗ stands for the convolution operator and N is
additive noise. Proceeding as in the paradigm described above, one searches for the F which minimizes ||D − F ∗ H||² plus a smoothing term.
Let us proceed to describe briefly how this idea is extended to restoring multi-channel images. Suppose we are given a color image, with RGB channels, that underwent degradation by convolution with H (for simplicity's sake, assume it is the same H for all channels, although it does not have to be so in the general case). One obvious way to reconstruct the image is to run the deblurring algorithm described above for each of the separate channels, and combine the restored channels into a color image. Such an approach, however, does not work well in general. Usually, the resulting image is still quite blurry, and contaminated by false colors; that is, certain areas contain streaks of colors which do not exist in the original image. This problem is more acute in highly textured areas. The proposed solution to these problems is to incorporate into the probabilistic scheme a "correlation term", which results in a better correlation between the RGB channels. Formally, if C_{x,y} is the covariance matrix of the RGB values at a pixel (x, y), the probability of the combination of colors (R(x, y), G(x, y), B(x, y)) is proportional to

exp(−½ (R(x, y), G(x, y), B(x, y)) C_{x,y}^{−1} (R(x, y), G(x, y), B(x, y))^t).

Multiplying over all the pixels results in adding these terms in the exponent's power. Exactly as in the interpolation problem above, this exponential term combines with the other exponential terms, and we get a combined exponential that has to be maximized; therefore, we have to minimize the negative of the power, which simply results in adding the "correlation term"

∫∫ (R(x, y), G(x, y), B(x, y)) C_{x,y}^{−1} (R(x, y), G(x, y), B(x, y))^t dx dy

to the expression of Eq. 1 (after subtracting the averages of the RGB channels). In effect, this term makes use of the fact that, in natural and synthetic images, the RGB channels are usually highly correlated. The "correlation term" penalizes deviations from this correlation, thus "pushing" the restored image towards one whose channels are "correctly correlated". Therefore, the combined expression to minimize is

||D − F ∗ H||² + λ1 ∫∫ (R_xx² + 2R_xy² + R_yy²) dx dy + (similar smoothing terms for G and B) + λ2 ∫∫ (R, G, B) C_{x,y}^{−1} (R, G, B)^t dx dy
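The effect of the correlation term alone can be illustrated numerically: with the channel means subtracted and C the covariance of typical colour combinations, the term grows sharply for "false colour" combinations. The statistics below are synthetic assumptions, not derived from real images.

```python
# Sketch of the "correlation term": sum over pixels of
# (p - mean)^T C^{-1} (p - mean), which penalizes colour combinations
# that violate the usual inter-channel correlation.
import numpy as np

def correlation_term(img, mean, C):
    """img: (h, w, 3) RGB image; mean, C: colour statistics, shapes (3,), (3, 3)."""
    d = img.reshape(-1, 3) - mean
    Cinv = np.linalg.inv(C)
    return np.einsum('ij,jk,ik->', d, Cinv, d)   # sum_i d_i^T Cinv d_i
```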
We have implemented a simple iterative scheme for minimizing this functional. A substantial improvement was obtained using the "correlation term". A color photograph was blurred, and restored with and without the correlation term. When using this term, the resulting restoration is sharper, and contains fewer "false colors". Comparing it against the original image shows that the RMS error is about 30% smaller than when restoring each channel separately. We have also used the "correlation term" to solve the "demosaicing" problem, in which one has to reconstruct a color image given only one color band at each pixel [1, 2]. This was accomplished by incorporating the "correlation term" into the solution of the interpolation problem described above; usually, this also resulted in a reduction of about 30% in the RMS error.
4
Summary
An algorithm was suggested for restoring multi-channel images; it uses the correlation between the different channels to improve results. The algorithm was applied to color images and usually resulted in an improvement of about 30% in the RMS error as compared to standard restoration applied separately to each channel.
References
[1] D.H. Brainard. Bayesian method for reconstructing color images from trichromatic samples. In Proceedings of the IS&T Annual Meeting, 1994.
[2] W.T. Freeman and D.H. Brainard. Bayesian decision theory, the maximum local mass principle, and color constancy. In International Conference on Computer Vision, pages 210-217, Boston, 1995.
[3] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6:721-741, June 1984.
[4] B.K.P. Horn and B.G. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203, 1981.
[5] D. Keren and M. Werman. Probabilistic analysis of regularization. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15:982-995, October 1993.
[6] J. Skilling. Fundamentals of MaxEnt in data analysis. In Maximum Entropy in Action, edited by B. Buck and V.A. Macaulay. Clarendon Press, Oxford, 1991.
[7] R. Szeliski. Bayesian Modeling of Uncertainty in Low-Level Vision. Kluwer, 1989.
[8] D. Terzopoulos. Multi-level surface reconstruction. In A. Rosenfeld, editor, Multiresolution Image Processing and Analysis. Springer-Verlag, 1984.
Colour Eigenfaces

Graham D. Finlayson†, Janet Dueck*, Brian V. Funt*, and Mark S. Drew*

† Department of Computer Science, University of York, York YO1 5DD, email [email protected]
* School of Computing Science, Simon Fraser University, Vancouver, Canada. {janet,funt,mark}@cs.sfu.ca
Abstract
Images of the same face viewed under different lighting conditions look different. It is no surprise, then, that face recognition systems based on image comparisons can fail when the lighting conditions vary. In this paper we address this failure by designing a new lighting-condition-independent face matching technique. We begin by demonstrating that the colour image of a face viewed under any lighting conditions is a linear transform from the image of the same face viewed under complex (3 lights at 3 locations) conditions. Our new matching technique solves for the best linear transform relating pairs of face images prior to calculating the image difference. For a database of 15 (complexly illuminated) faces and 45 test face images the new matching method delivers perfect recognition. In comparison, matching without accounting for lighting conditions fails 25% of the time.

I. INTRODUCTION
One of the most successful and widely used techniques for face recognition is the eigenface method of Turk and Pentland [9], [8]. The basic idea of that method is that the greyscale images of the same face seen in different circumstances should be quite similar. Recognition takes place by comparing the image of an unknown face with the face images stored in a database. The closest database image identifies the face. Because, in general, images are very large, image matching is very expensive. In order to reduce the matching cost, Turk and Pentland approximated each face image as a linear combination of a small set of basis faces called eigenfaces.

Unfortunately, images of the same face viewed under different lighting conditions rarely look the same, i.e. their shading fields will differ. This problem can be mitigated by viewing each face under a variety of lighting conditions and storing this variation in the face database [1], [3], [5]. The multiple image approach succeeds because each separate image encodes a certain amount of information about the shape of the face; that is, at an implicit level, the multiple image approach is concerned with matching shape. However, it is not clear how the notion of shape can be made explicit. We certainly do not want to solve for shape since, although this can be done [10], highly specialized calibrated conditions are needed. In this paper we show that shape information is easily obtained so long as face recognition is based on colour images. Specifically we show that: the implicit notion of shape is explicitly captured in a single 3-band colour image. This result follows from Petrov's [6] seminal work on the relationship between illumination, reflectance, shape and colour images, in which he demonstrated that, so long as a Lambertian surface is viewed under a complex illumination field (at least 3 spectrally distinct light sources at different locations), the rgb pixel triplets in an image are a linear transform from the scene surface normals: colour is a linear transform from shape. In our method each database face image is created with respect to a complex illumination field. Face recognition simply involves matching the image of an unknown face to the face database. Each database face image is first transformed (by a linear transform) to best match the image colours of the unknown face. Thereafter, the residual difference is calculated. The database face with the smallest residual difference overall identifies the face.
In line with Turk and Pentland, the cost of matching is reduced by approximating face images using a small number of colour eigenfaces.

II. FACE RECOGNITION USING EIGENFACES

Let us represent an n × n greyscale image by the function I such that I(x, y) denotes the grey-level at location x, y. Suppose we have a database M of m face images: M = {I_1, I_2, ..., I_m}. Face recognition is all about finding the image I_c in M which is closest to some unknown face image I_u. Mathematically we might define a function Φ which takes I_u and M as parameters and returns the closest match I_c:

Φ(I_u, M) = I_c : I_c ∈ M & ||I_c − I_u||_d ≤ ||I_i − I_u||_d  (i = 1, 2, ..., c−1, c+1, ..., m)    (1)

where ||.||_d is a distance measure (usually Euclidean) which quantifies the similarity of two images. To reduce computation a face image I can be represented (approximately) by a linear combination of basis faces (which
Turk and Pentland call eigenfaces):

I ≈ Σ_{i=1}^{n} β_i B_i    (2)

Here B_i is the ith (of n) eigenface and the β_i are weighting coefficients chosen to minimize

||I − Σ_{i=1}^{n} β_i B_i||_d    (3)

Clearly the error in the approximation defined in (3) depends on the set of eigenfaces used. In general the eigenfaces are selected to minimize the expected residual difference in (3). This is done using standard statistical techniques (e.g. principal component analysis [4]). However, eigenfaces based on other error criteria are sometimes used [7]. Turk and Pentland have shown that a small number of eigenfaces (just 7) renders the error in (2) reasonably small. Denoting eigenface approximations with the superscript ', the function Φ' is defined as:

Φ'(I_u, M) = I_c' : I_c' ∈ M' & ||I_c' − I_u'||_d ≤ ||I_i' − I_u'||_d  (i = 1, 2, ..., c−1, c+1, ..., m)    (4)

Because each of I' and I_u' is defined by just n numbers (the coefficients β in (2)), it is straightforward to show that the cost of each image comparison is proportional to n. Usually n is much smaller than the number of pixels in an image, so matching is very fast. Turk and Pentland [8] have shown that the function Φ' suffices for face recognition so long as the illumination conditions are not allowed to vary too much.
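A minimal numerical sketch of this eigenface machinery (PCA via the SVD): database images are approximated by a few basis faces, and matching compares only the coefficient vectors. The database size, image size and eigenface count are made-up assumptions.

```python
# Eigenface sketch: build a PCA basis from the (flattened) face database,
# store each face's coefficients, then match an unknown face by projecting
# it onto the basis and taking the nearest coefficient vector.
import numpy as np

def build_eigenfaces(db, n_e):
    """db: (m, pixels) matrix of flattened face images."""
    mean = db.mean(0)
    U, s, Vt = np.linalg.svd(db - mean, full_matrices=False)
    B = Vt[:n_e]                       # eigenfaces (principal directions)
    return mean, B, (db - mean) @ B.T  # database coefficient vectors

def match(query, mean, B, coeffs):
    q = (query - mean) @ B.T           # project the unknown face
    return int(np.argmin(np.linalg.norm(coeffs - q, axis=1)))
```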
III. COLOUR AND SHAPE
The light reflected from a surface depends on the spectral properties of the surface reflectance and of the illumination incident on the surface. In the case of Lambertian surfaces (these are the only kind we consider here), this light is simply the product of the spectral power distribution of the light source with the percent spectral reflectance of the surface. Illumination, surface reflection and sensor function, combine together in forming a sensor response"
P-~= e-'n=/w S=(A)E(A)R---(A)dA
(5)
where A is wavelength, _p is the 3-vector of sensor responses (rybpixel value) __Ris the 3-vector of response functions (red-, green and blue- sensitive), E (assumed constant across the scene) is the incident illumination and S ~ is the surface reflectance function at location x on the surface which is projected onto location ~ on the sensor array. The relative orientation of surface and light is taken in account by the dot-product of the surface normal vector n_~ with the light source direction _e (both these vectors have unit length). Let us denote fw S=(A)E(A)R_(A)dA as q=. It follows that (5) can be rewritten as:
p^x̄ = q e^t n^x   (6)

where t denotes vector transpose (e · n^x = e^t n^x). Now consider that a scene is illuminated by two spectrally distinct light sources at distinct locations. If we denote illumination dependence using the subscripts 1 and 2, then equation (6) becomes:

p^x̄ = [q_1 e_1^t + q_2 e_2^t] n^x   (7)
Assuming k lights incident at x:

p^x̄ = [Σ_{i=1}^{k} q_i e_i^t] n^x   (8)
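Equation (8) states that, for a single surface colour, the image colour is a fixed 3 × 3 linear map of the surface normal. A small numerical check (the light colours q_i and directions e_i below are invented for illustration, not taken from the paper) is:

```python
import numpy as np

# Three spectrally distinct lights from distinct directions (illustrative values):
# q_i is the sensor response to light i off the (single) surface colour,
# e_i is the unit direction of light i.
q = [np.array([0.9, 0.2, 0.1]),
     np.array([0.1, 0.8, 0.2]),
     np.array([0.2, 0.1, 0.9])]
e = [np.array([1.0, 0.0, 0.0]),
     np.array([0.0, 1.0, 0.0]),
     np.array([0.0, 0.0, 1.0])]

# Eq. (8): p = [sum_i q_i e_i^t] n, a single 3x3 matrix M.
M = sum(np.outer(qi, ei) for qi, ei in zip(q, e))
print(np.linalg.matrix_rank(M))               # 3: full rank with k = 3 lights
n = np.array([0.0, 0.6, 0.8])                 # a unit surface normal
p = M @ n                                     # the image colour at that point
print(np.allclose(np.linalg.solve(M, p), n))  # True: normals recoverable
```

With fewer than three lights the matrix is rank-deficient and the one-to-one correspondence claimed below breaks down.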
So long as k ≥ 3, the term [Σ_{i=1}^{k} q_i e_i^t] will define a 3 × 3 matrix of full rank. In this case there is a one-to-one correspondence between the colours in an image and the normal field of a scene: shape and colour are inexorably intertwined. It is important to note that the relationship between surface normal and camera response depends on the reflective properties of the observed surface and the particular set of illuminants incident at a point. Changing the reflectance or the illumination field changes the relationship between surface normal and image colour. Henceforth we will assume that faces are composed of a single colour and are illuminated by a homogeneous illumination field, so that a single 3 × 3 matrix relates all surface normals and image colours.

IV. FACE RECOGNITION USING COLOUR EIGENFACES
Let us represent an n × n colour image by the vector function I such that I(x, y) denotes the (r, g, b) vector at location (x, y) and records how red, green and blue a pixel appears. As before, let us suppose we have a database M of m images: M = {I_1, I_2, ..., I_m}. Crucially, we assume that each database face image is created with respect to a complex illumination field and is thus a linear transform from the corresponding normal field. This relationship is made explicit in (9), where N_i(x, y) is a vector function which returns the surface normal corresponding to I_i(x, y). The 3 × 3 matrix relating the normal field to image colours is denoted T_i:
I_i(x, y) = T_i N_i(x, y) ,  (i = 1, 2, ..., m)
(9)
Suppose I_u denotes the image of an unknown face viewed under arbitrary lighting conditions. Clearly,
I_u(x, y) = T_u N_u(x, y)
(10)
Suppose that I_j is an image of the same face (in M). It is unlikely that T_j will equal T_u. However, it is immediate from (9) and (10) that I_j must be a linear transform from I_u:
T_u T_j^{-1} I_j = I_u .
(11)
where ^{-1} denotes the matrix inverse. It follows that a reasonable measure of the distance between a database image I_i and I_u can be calculated as:
||T(I_i, I_u) I_i − I_u||   (12)
where T(I_i, I_u) is the 3 × 3 matrix which best maps I_i to I_u. In the experiments reported later, T() returns the matrix which minimizes the sum of squared errors and is readily computed using standard techniques [2]. Relative to (12), a closeness function Φ for colour face images can be defined as:
Φ(I_u, M) = I_c : I_c ∈ M, ||T(I_u, I_c) I_c − I_u|| < ||T(I_u, I_i) I_i − I_u||  (i = 1, 2, ..., c−1, c+1, ..., m)   (13)
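The matrix T(I_i, I_u) used in (12) and (13) minimizes the sum of squared errors; assuming each image is stored as an N × 3 array of rgb pixel values (our own layout choice, not specified in the paper), it can be computed with a standard least-squares solve:

```python
import numpy as np

def best_transform(I_i, I_u):
    """3x3 matrix T minimizing sum over pixels of ||T p_i - p_u||^2.

    I_i, I_u: (N, 3) arrays whose rows are corresponding rgb pixels.
    Stacked over pixels the problem reads I_i @ T.T = I_u, solved by lstsq.
    """
    T_t, *_ = np.linalg.lstsq(I_i, I_u, rcond=None)
    return T_t.T

# Synthetic check: build I_u as an exact 3x3 transform of I_i.
rng = np.random.default_rng(1)
I_i = rng.random((100, 3))
T_true = rng.random((3, 3))
I_u = I_i @ T_true.T
T_est = best_transform(I_i, I_u)
print(np.allclose(T_est, T_true))   # True
```

On real image pairs the fit is not exact, and the residual of this fit is precisely the distance (12).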
To reduce the computational cost of computing (13), we represent (in a similar way to the greyscale method) each band of a colour image as a linear combination of basis vectors:
I^a ≈ Σ_{i=1}^{n} β_i^a B_i^a ,  (a = r, g, b)   (14)
where r, g and b denote the red, green and blue colour bands, and the coefficients β^a (a = r, g, b) are chosen to minimize the approximation error. To derive the eigenfaces used in (14), a training set of colour face images is compiled. Each image is split into its 3 component band images, and principal component analysis is then performed on the entire band image set. Denoting colour eigenface approximations with the superscript ', the function Φ' is defined as:
Φ'(I_u', M) = I_c' : I_c' ∈ M', ||T(I_u', I_c') I_c' − I_u'|| < ||T(I_u', I_i') I_i' − I_u'||  (i = 1, 2, ..., c−1, c+1, ..., m)   (15)
It can be shown that the cost of calculating (15) is bounded by the square of the number of eigenfaces used: matching costs O(n^2) (instead of O(n) for black and white faces).

V. RESULTS
The colour images of 15 people (see Figure 1) viewed under 3 complex illuminations provide a training set for eigenface analysis. We found that 8 eigenfaces provide a reasonable basis set (the approximation in (14) is fairly good). The eigen approximations of the 15 faces viewed under one of the complex illuminations comprise the face database. A further 45 test images were taken: the same faces under 3 more, non-complex, illuminations.
Fig. 1. Colour Face Images

Each test image was compared with each database image using equation (15). The closest database image defines the identity of the face in the test image. We found that all 45 faces were correctly identified (a 100% recognition rate). Importantly, we found that faces were matched with good confidence; on average the second closest database face was at least twice as far from the test image as the correct answer.

We reran the face matching experiment in greyscale using Turk and Pentland's original eigenface method. Greyscale images were created from the colour images (described above) by summing the colour bands together (greyscale = red + green + blue). We found that 7 eigenfaces are sufficient to approximate the training set. As before, the face database comprises eigen approximations of each of the 15 faces viewed under a single illuminant. Test images were compared with the face database using (4). We found that only 32 of the faces were correctly identified (a recognition rate of 73%). This is quite poor given that the face database is quite small.

VI. CONCLUSION

Shape and colour in images are inexorably intertwined. A single-coloured Lambertian surface viewed under complex illumination conditions is a linear transform from the surface normal field. It follows that the image of a face observed under any lighting conditions is a linear transform from the same face viewed under a complex illumination field. We use this result in a new system for face recognition. Database faces are represented by colour images taken with respect to a complex illumination field. Matching takes place by finding the linear transform which takes each database face as close as possible to a query image. The closest face overall identifies the face in the query image. To speed computation, all faces are represented as a linear combination of a small number of eigenfaces.
Experiments demonstrated that the colour eigenface method delivers excellent recognition. Importantly, recognition performance, by construction, is unaffected by the lighting conditions under which faces are viewed. This is quite significant, since existing methods [9] require the lighting conditions to be held fixed (and fail when this requirement is not met).

REFERENCES
[1] Russell Epstein, Peter J. Hallinan, and Alan L. Yuille. 5±2 eigenfaces suffice: an empirical investigation of low-dimensional lighting models. In Workshop on Physics-Based Modelling, ICCV95, pages 108-116, 1995.
[2] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins U. Press, 1983.
[3] Peter W. Hallinan. A low-dimensional representation of human faces for arbitrary lighting conditions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 995-999, 1994.
[4] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.
[5] Shree K. Nayar and Hiroshi Murase. Dimensionality of illumination manifolds in eigenspace. Technical report, Columbia University, 1995.
[6] A.P. Petrov. On obtaining shape from color shading. COLOR research and application, 18(4):236-240, 1993.
[7] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs fisherfaces: recognition using class specific linear projection. In The Fourth European Conference on Computer Vision (Vol I), pages 45-58. European Vision Society, 1996.
[8] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, March 1991.
[9] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 586-591, 1991.
[10] R.J. Woodham. Photometric method for determining surface orientation from multiple images. Optical Engineering, 19:139-144, 1980.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Colour quantification for industrial inspection Maria Petrou and Constantinos Boukouvalas
Department of Electronic and Electrical Engineering, University of Surrey, Guildford, GU2 5XH, United Kingdom
Abstract

In this paper we discuss the application of some of the most recent advances in the Psychophysics of colour to the development of a colour grading system capable of replacing the human expert inspector in colour-based quality control of manufactured products. In particular, we discuss the problem of replacing the spectral sensitivity of the electronic sensor with that of the human visual system, so that agreement to sub-grey-level accuracy between the recordings of the electronic and the human sensors can be achieved. We demonstrate our methodology by automatically grading coloured ceramic tiles previously graded by human experts operating at the threshold of human colour perception.

1. Introduction

The greatest success of vision research has been in developing vision systems that perform a specific task, because by narrowing the field of operation, the quality of the performance can be greatly improved. Visual industrial inspection, however, has not yet become a matter of routine. Several industrial tasks have already been fully automated, but the aspect which seems to present most resistance to the process of automation is that of final product quality control. The reason is that automatic inspection, in order to be acceptable to the manufacturer, has to be at the level performed by trained human inspectors at the peak of their performance. Part of the inspection of the final product is the inspection of colour and in particular the categorisation of the products into grades of colour, i.e. into "lots" or "batches". To achieve this automatically, one has to overcome a series of problems:

- The distortion caused to the recorded colour by the temporal variation of the illumination. Indeed, experiments have shown [1] that while the colour differences one has to detect are of the order of half a grey level (in a full scale of 0 to 255), the temporal variation of illumination, even when it is controlled, could be several grey levels from one inspected object to the next.
- The distortion caused by the spatial variation of illumination. On a flat surface, like a tile, the illumination can vary by as much as 10 grey levels from one end of the object to the other [2].
- The thermal noise in the image capturing device, which can be random with a variance of several grey levels.
- The non-linear response of the sensors over the range of colours that might be present on the same object.
- The spectral response of each sensor, which is not a pure delta function and which is clearly different from the spectral response of the sensors used by the human observer whom an automatic system is expected to replace [4].
We have presented elsewhere the methodology for coping with the variations of the illumination, the non-linear responses of the camera and the thermal noise of the devices. Here we present a methodology that uses established results of the Psychophysics of Vision to cope with the demands of an industrial application, and allows the identification of colour grades that correspond to the threshold of human colour perception and are discriminated by human inspectors working at the peak of their performance. To achieve this, the proposed methodology had to be able to measure colour differences at least one order of magnitude smaller than the various types of noise involved in the process of colour recording. We shall demonstrate our methodology for the particular application of ceramic tile colour grading.

2. Colour Grading and the sensors' responses

The visible part of the electro-magnetic spectrum can be discretized and represented by the values at n equidistant wavelengths. Then the true spectral reflectance of a tile is given by a set of n (unknown) numbers, one for each sample wavelength chosen:

R(λ) = (r_1, r_2, ..., r_n)   (1)
where r_i is the reflectance at wavelength λ_i. Similarly, the spectrum of the illumination used can be represented by:

A(λ) = (a_1, a_2, ..., a_n)   (2)
Assume also that we have three sensors with known spectral sensitivities:

Q_i(λ) = (Q_i1, Q_i2, ..., Q_in) ,  i = 1, 2, 3   (3)
The three sensors will record the following values:

q_1 = r_1 Q_11 a_1 + r_2 Q_12 a_2 + ... + r_n Q_1n a_n
q_2 = r_1 Q_21 a_1 + r_2 Q_22 a_2 + ... + r_n Q_2n a_n
q_3 = r_1 Q_31 a_1 + r_2 Q_32 a_2 + ... + r_n Q_3n a_n   (4)
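Equation (4) is a discretized spectral integration. A sketch with n = 31 samples (the reflectance, illumination and Gaussian sensor curves below are invented for illustration) makes the under-determination concrete: three recordings constrain thirty-one unknowns.

```python
import numpy as np

n = 31                                    # sample wavelengths across the visible band
lam = np.linspace(400, 700, n)            # nm (illustrative range)

r = np.full(n, 0.5)                       # "unknown" tile reflectance (illustrative)
a = np.ones(n)                            # illumination spectrum A (illustrative)
# Three Gaussian sensor sensitivities Q1, Q2, Q3 (illustrative shapes).
Q = np.stack([np.exp(-((lam - c) / 40.0) ** 2) for c in (450.0, 550.0, 650.0)])

# Eq. (4): q_i = sum_j r_j * Q_ij * a_j for i = 1, 2, 3.
q = Q @ (r * a)
print(q.shape)   # (3,) -- only 3 recordings for n = 31 unknown reflectances
```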
In the above expressions, q_i, Q_ij and a_i are known and the r_i are unknown. As we only know the recordings of the three colour sensors, we have n − 3 degrees of freedom. Ideally, we would like to solve for the unknown reflectances and then blend them again using the sensitivities of the retina cones to work out what intensities the human sensor would have recorded from the same surface. A straightforward solution to this problem is not possible because it is under-determined, as typically n = 31. We make, however, the following assumption: the transformation between the three intensities recorded by the electronic sensors and the three intensities the human sensors would have recorded is affine. This assumption may not hold for the whole 28-dimensional space. However, as we are interested in surfaces which are very similar to each other, we are really concerned with a very small subspace of the colour space. No matter how complicated the relationship between the electronic and the human recordings is, locally it can always be approximated by a linear transformation. With the help of a spectrometer, we measured the reflectances of some typical tiles at the 31 wavelengths of interest. We then randomly chose hundreds of 31-tuples of intrinsic reflectances that complied with the restrictions of the sensor recordings and were confined to the colour subspace of interest as indicated by the spectrometer. For each one of them we found the signals expected
to be recorded by the electronic and by the human sensors. Thus, we created a large number of corresponding triplets of recordings. We then identified the elements of the affine transformation between the two sets of recordings in the least-square-error sense. This transformation can then be used to predict what the human sensor would have recorded, given what the electronic sensors have recorded. Knowing, however, what the human sensor records is not equivalent to knowing what the human brain sees. There is an extra non-linear process which converts the sensory recordings to perceptions. In Lab coordinates we know that the Euclidean distance between any two points is proportional to the perceived colour difference between the two colours represented by these two points. Thus, after the data have been spectrally corrected and the effects of the spatial and temporal variations of the illumination have been removed as described in [3], they are finally converted into the perceptually uniform colour space Lab, where the colour grading is performed by clustering.

3. Experimental results and Conclusions

The above process was applied to several series of tiles graded by human experts. Figures 1 and 2 illustrate the grading of two sets of uniformly coloured tiles. For the purpose of presentation, tiles classified to the same grade by the human observer are represented by the same symbol. Each tile is represented by its mean values in the Lab system. In panels a we show the tiles without the spectral correction proposed here, and in panels b after the proposed correction. In both panels the orientation of the axes is the same, and we can see that after the proposed correction the clusters identified by the humans become more distinct. This conclusion was confirmed by similar experiments with other sets of tiles.
We conclude by stressing that when the vision system developed has to replace human inspectors operating at the threshold of their vision ability, effects like the one discussed in this paper become significant and have to be taken into account.
Figure 1: Colour Shade Grading of Linz tiles. Tiles represented by the same symbol were classified to the same colour class by human experts.
Figure 2: Colour Shade Grading of Koala tiles.

Acknowledgements

This work was carried out under the ASSIST project, BRITE-EURAM II 5638. We also want to thank Dr. K. Homewood for his help in taking the spectrophotometric measurements.
References
[1] Boukouvalas, C., Kittler, J., Marik, R. and Petrou, M. (1994). "Automatic grading of ceramic tiles using machine vision". Proceedings of the 1994 IEEE International Symposium on Industrial Electronics, Santiago, Chile, pp. 13-18.
[2] C. Boukouvalas, J. Kittler, R. Marik & M. Petrou, "Automatic Colour Grading of Ceramic Tiles Using Machine Vision", to appear in IEEE Transactions on Industrial Electronics, February 1997.
[3] C. Boukouvalas, J. Kittler, R. Marik & M. Petrou, "Automatic Grading of Textured Ceramic Tiles", Machine Vision Applications in Industrial Inspection, SPIE 2423, San Jose, 1995.
[4] Wyszecki, G. & Stiles, W. S., "Color Science", 2nd Edition, Wiley, New York, 1982.
COLOUR OBJECT RECOGNITION USING PHASE CORRELATION OF LOG-POLAR TRANSFORMED FOURIER SPECTRA

A. L. Thornton, S. J. Sangwine
The University of Reading, England.
Abstract

The knowledge of the rotation, scaling and translation of an object in comparison with a reference object is important in the recognition process. The work which is described below uses Fourier transforms, log-polar coordinate transformations and phase correlation, together with a complex number representation for colour, to determine these variances and recognise a coloured object.
Introduction

Much research has been conducted into the recognition of objects in monochrome images using frequency domain processing. This, however, ignores the useful information that can be contained in colour representations. The work which is described in this paper uses a novel colour representation together with a new combination of Fourier and Log-Polar transforms to make possible colour object recognition with invariance to translation, scaling and rotation. The importance of phase in signals has been shown [1] and this has led to the idea of using phase to locate coloured objects. An established frequency domain technique for locating objects, Phase Correlation, is described and the advantages of combining the colour representation, the Log-Polar Transform and the phase correlation technique are demonstrated.
Overview

The Fourier transform can be thought of as a translation invariant algorithm, but it will not overcome problems associated with the scaling and rotation of an object in an image. One method to remove these variations is the use of the Fourier-Mellin Transform, which has been well documented in the literature [2]. This procedure consists of an FFT followed by a log-polar transform followed by another FFT. The first FFT removes translation variance since the spectrum of an object will be similar no matter where the object is located in the image. The Log-Polar Transform, [3,4], reduces rotation and scaling to translations which are then made invariant by the second FFT. To achieve recognition, the result is then correlated with another image which has undergone the same process. However, a disadvantage with this process is that it does not make the best use of the information available, as the result will only determine whether there is a similar object in both images. It would be more useful to be able to quantify the scaling, rotation and translation. The process of Phase Correlation, which is described below, has therefore been introduced and this is the main novel feature of the work reported in this paper. This processing is inspired by the Fourier-Mellin transform, but is able to quantify the rotation, scaling and translation and recognise object colour. The block diagram of the system is shown in figure 1 and will be discussed later.
Complex Log-Polar Transformation

The log-polar coordinate transformation is a method of sampling an image in such a way that if an object is rotated this causes the log-polar transformed image to move up or down in comparison with a reference image. In a similar manner, if an object is scaled this causes the log-polar transformed image to move right or left in comparison with a reference image. The amount of shift on either axis is indicative of the amount of scaling or rotation undergone by the object of interest. A constraint on the complex log-polar transform procedure is that the object of interest must be near the centre of the image. However, if a Fourier transform is calculated before the coordinate transformation this constraint is overcome, since the data is inherently centred in the spectrum. Thus, by applying a Log-Polar transformation to the spectrum, we avoid the need to locate the centre of the object of interest.
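A minimal nearest-neighbour sketch of the log-polar sampling described above (the grid sizes and the sampling scheme are our own choices, not the authors'):

```python
import numpy as np

def log_polar(img, n_rho=64, n_theta=64):
    """Sample a square image on a log-polar grid centred on the image centre.

    Rotation of the input becomes a shift along the theta axis; scaling
    becomes a shift along the log-radius axis. Nearest-neighbour sampling.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Radii spaced uniformly in log(r), from 1 pixel out to the image border.
    rho = np.exp(np.linspace(0.0, np.log(min(cy, cx)), n_rho))
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    ys = cy + rho[None, :] * np.sin(theta[:, None])
    xs = cx + rho[None, :] * np.cos(theta[:, None])
    return img[np.round(ys).astype(int), np.round(xs).astype(int)]

img = np.zeros((65, 65))
img[20:45, 20:45] = 1.0          # a centred square test pattern
print(log_polar(img).shape)      # (64, 64)
```

Applying this to a magnitude spectrum rather than the spatial image gives the centring-free behaviour described in the text.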
(It should be remembered that the rotation and scaling of an object in an image causes rotation and inverse scaling of the components of the spectrum due to the object.)

[Diagram: each of the two input images passes through an FFT and a Log-Polar transformation; phase correlation of the transformed pair yields a rotation and scaling peak, used to rotate and scale the object image; a second phase correlation then yields the translation peak.]
Figure 1. Translation, Scaling and Rotation Invariance System Diagram

Phase Correlation

If object recognition is to take place, the location of the object in the image must be found. Phase correlation, [5,6], is a method for determining the translation of an object between one image and another. The result of the computation produces a peak corresponding to the spatial displacement of the object, which can be used to locate the object in an image. A reference image is compared with another, which we call the object image, by multiplying the FFT of one (G1) by the complex conjugate of the FFT of the other (G2*). The normalised cross-power spectrum is obtained, and from this the phase correlation surface (P) is calculated by taking the inverse Fourier transform (F^-1) of the spectrum. Assuming that an object is contained in both the reference and object image, the result is an intensity peak in the phase correlation surface (P) whose position can be used to determine the displacement between the reference and object image. The calculation is shown in eqn. 1.
P = F^-1 [ (G1 G2*) / |G1 G2*| ]   (1)
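Eqn. 1 translates directly into a few lines of FFT code. The sketch below (with an invented random test image, and a small epsilon added for numerical stability, both our own choices) recovers a known shift:

```python
import numpy as np

def phase_correlate(ref, obj):
    """Phase correlation surface P of eqn. 1; its peak gives the displacement."""
    G1 = np.fft.fft2(obj)
    G2 = np.fft.fft2(ref)
    cross = G1 * np.conj(G2)
    # Normalised cross-power spectrum; eps guards against division by zero.
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + 1e-12)))

rng = np.random.default_rng(0)
ref = rng.random((64, 64))                          # reference image (random texture)
obj = np.roll(np.roll(ref, 5, axis=0), 9, axis=1)   # same content shifted by (5, 9)
P = phase_correlate(ref, obj)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(P), P.shape))
print(peak)   # (5, 9)
```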
The same method that is used for the phase correlation of intensity images can be used for the phase correlation of colour images by using the IZ colour representation. This method, which has been more thoroughly discussed in [7], uses a complex number, Z, to represent the colour information, where hue is the argument of Z and a value related to saturation is the modulus of the complex number. The intensity, I, is represented separately. Because a complex number containing colour information is used to represent the image, the result of the phase correlation can discriminate between the different colours of similarly shaped objects. The argument of the displacement peak (which is complex) gives an angle whose value corresponds to the difference in colour between the object in the reference image and that in the object image. The advantage of using the complex colour representation is that the colour of the displaced object is calculated as part of the location procedure with no extra processing. If a monochrome image were used in the location procedure, the object would be found but it would not be possible to estimate its colour. Therefore, extra information is gained for no extra processing than a monochrome image would require.
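A sketch of this complex coding is given below. The exact hue/saturation definitions of [7] are not reproduced in this paper, so an HSV conversion is used here as a stand-in; the arithmetic of comparing two colours via the argument of a complex ratio is the point being illustrated.

```python
import numpy as np
import colorsys

def iz_encode(r, g, b):
    """IZ-style coding of one rgb pixel (sketch): intensity-like value I plus a
    complex Z whose argument is the hue and whose modulus relates to saturation."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return v, s * np.exp(2j * np.pi * h)

I1, Z1 = iz_encode(1.0, 0.0, 0.0)   # red
I2, Z2 = iz_encode(0.0, 1.0, 0.0)   # green
# The angle between the two colours is the argument of the ratio Z1/Z2,
# analogous to reading colour off the argument of the correlation peak.
print(round(float(abs(np.angle(Z1 / Z2))), 3))   # 2.094 (i.e. 2*pi/3 of hue difference)
```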
Translation, Scaling and Rotation Invariance
The amount of scaling and rotation between the two images can be determined by combining phase correlation and the log-polar coordinate transformation. The block diagram of figure 1 illustrates the processes involved; the letters within circles in the diagram indicate images which appear as outputs
in figure 2. Some of the processing required to implement the block diagram can be performed before the capture of the object image. The reference image can be captured and stored in advance, and its FFT, Log-Polar transform and the FFT required for phase correlation can be calculated. This will save processing time when object images are to be compared to a reference image. As can be seen in figure 1, after each of the two images has undergone a Fourier transform and log-polar coordinate transformation, phase correlation of these two transformed images is calculated. Information about the position of the phase correlation peak can then be used to alter the object image so that scaling and rotation variance is removed. This is a more precise method than iteratively rotating the spectrum by small angles and altering the scaling until the result is found to match the spectrum of the reference image, [8]. Once the scaling and rotation differences have been removed, the translation of the object can be found by phase correlation between the original reference image and the corrected object image and, as discussed above, the colour of the object found. The outputs of these processes are shown in figure 2, where each letter indicates at which point in figure 1 the output was obtained.
Figure 2. Outputs from the processes described in figure 1
Figures 2a and 2c show example inputs to the system. Each spatial image is Fourier transformed and a log-polar coordinate transform applied, the outputs of which are shown in figures 2b and 2d. These outputs are then phase correlated so that the rotation and scaling difference between one image and the next can be found from the correlation peak (figure 2e). In this case the peak occurs at (14,18), which corresponds to a rotation of 25° and a scale change of about 0.78. Using this information, one of the spatial images is corrected for rotation and scaling (figure 2f) and the result of this is phase correlated with the untouched spatial input. The resultant correlation peak (figure 2g) indicates the translation of the reference object relative to the object in the second spatial image. In addition, the colour of the second object is found by calculating the argument of the complex correlation peak.
Conclusion

The research presented in this paper enables the quantification of rotation, scaling and translation between a reference object and another arbitrary image containing the object. The results show that it is possible to do this without having to perform iterative calculations to determine these values. It is also possible to determine whether the object is of the desired colour, due to the colour representation which is used.

References
1. Oppenheim A V, Lim J S, 1981, 'The importance of phase in signals', IEEE Proceedings, 69(5), 529-541.
2. Li Y, 1992, 'Reforming the theory of invariant moments for pattern recognition', Pattern Recognition, 25, 723-730.
3. Wilson J C, and Hodgson R M, 1992, 'A pattern recognition system based on models of aspects of the human visual system', IEE 4th Int. conf. on image processing and its applications, 258-261.
4. Reitbock H J, and Altmann J, 1984, 'A model for size and rotation invariant pattern processing in the visual system', Biological Cybernetics, 51, 113-121.
5. Kuglin C D, and Hines D C, 1975, 'The phase correlation image alignment method', Proc. IEEE conf. on cybernetics and society, 163-165.
6. Pearson J J, Hines Jr. D C, Golosman S, Kuglin C D, 1977, 'Video-rate image correlation processor', Proc. SPIE conf. on application of digital signal processing (IOCC), 119, 197-204.
7. Thornton A L, and Sangwine S J, 1995, 'Colour object location using complex coding in the frequency domain', IEE 5th Int. conf. on image processing and its applications, Heriot-Watt University, Edinburgh, UK, July 4-6 1995, 410, 820-824, Institution of Electrical Engineers, London 1995.
8. Lee D-J, Krile T F, Mitra S, 1988, 'Power cepstrum and spectrum techniques applied to image registration', Applied Optics, 27(6), 1099-1106.

Acknowledgment

This research is supported by The University of Reading Research Endowment Trust Fund.
SIIAC: Interpretation System of Aerial Color Images

Salim Mouhoub, Michel Lamure and Nicolas Nicoloyannis
URA 934 - CNRS, Université Claude Bernard Lyon 1, 43, bd du 11 Novembre 1918, 69622 Villeurbanne, France

1. Introduction
In this paper, a general methodology is presented to solve the problem of the interpretation of aerial color images. This problem must be divided into several levels of abstraction corresponding to different classes of methods, or low- and high-level algorithms. In our work, we are particularly interested in the high-level part.

2. General presentation
Given the diversity of the information (color, texture, geometry) used to identify the different objects contained in an image, and the importance of some types of information, such as color knowledge, which demand a particular process, we preferred to adopt a strategy based on the blackboard structure [1]. We thus associated with every type of information a knowledge source (KS) or "specialist". These specialists cooperate around a common facts base, called the blackboard, which contains all the data concerning the image. In SIIAC, the control is realized by the blackboard's monitor. SIIAC consists of three main parts: the knowledge sources, the blackboard and the control. Its general architecture is presented in Fig. 1.
[Diagram: the KS "Relaxation", KS "Color" and KS "Texture" arranged around the blackboard, together with the control module; control flow and data flow are indicated by arrows.]
Fig. 1 SIIAC architecture diagram.

2.1 The control
In SIIAC, the identification of an area is realized by the cooperation of the different KSs. These KSs are distributed into two classes:
- The first class contains the low-level KSs (KS "Color," KS "Texture" and KS "Geometry"). These KSs assign labels to the different areas of the image according to their low-level features, taking into account the geometry, the color and the texture of every area under consideration.
- The second class is represented by the KS "Relaxation". This KS uses the spatial arrangements of the different areas of the image in order to reduce the number of labels assigned to every area. The KS "Relaxation" is based on a system of constraint propagation (discrete relaxation) which allows a consistent labeling between the areas to be constructed.

2.2 The knowledge sources
The knowledge sources contain two parts: a condition part and an action part. The condition part specifies the conditions of application of the KS, and the action part contains the knowledge of the abstraction level for which this source is intended. The knowledge sources read and write information in the blackboard; they do not communicate directly
between them, but only through the common facts base. The SIIAC system consists of four knowledge sources: the KS "Color," the KS "Texture," the KS "Geometry" and the KS "Relaxation". In the following, we detail the KS "Color" and the KS "Relaxation", which are the most important KSs in SIIAC.

2.2.1 The KS "Color"

To define the color, we took four radiometric parameters: the minimum, the maximum, the average and the variance of the gray levels in the three bands R, G, B (min_ng, max_ng, ave_ng and var_ng). In order to construct the color recognition rules, we used 600 representative area samples distributed into six groups. These groups have the following denominations: "clear roof," "dark roof," "brown roof," "tarmac," "vegetation," "shadow." Each group contains 100 areas. We notice that every group is identifiable by a color and, vice versa, every color corresponds to a group. We have therefore, in all, six different colors. For every area we calculated the parameters previously cited in the three bands. We then extracted, for every color, the confidence intervals corresponding to the three basis components R, G, B. For example, the 12 confidence intervals corresponding to one of the colors are:
Band   Ave_ng         Min_ng       Max_ng       Var_ng
R      [51., 66.4]    [41., 54.]   [61., 86.]   [0.00543, 0.123]
G      [60., 69.]     [44., 55.]   [76., 92.]   [0.079, 0.1616]
B      [52.5, 63.]    [41., 55.]   [63., 78.]   [0.05825, 0.124]
where Ave_ng, Min_ng, Max_ng and Var_ng are respectively the average, the minimum, the maximum and the variance of the gray levels. Note that to every parameter correspond three intervals; these constitute a parallelepiped in the R3 representation space. In order to construct our recognition rules, we adopted a strategy based on production rules, bearing in mind that for every parameter (average, minimum, maximum and variance of the gray levels) a color is represented by a point in the R,G,B representation space. We have therefore associated with every parameter, and for every color, three rules (corresponding to the three basis components R, G and B). The following is an example of the rules (using only the average of the gray levels) which permit the green color recognition:

If R_ave_ng of the area >= 51. and R_ave_ng of the area <= 66.4
Then the area is R_Green

If G_ave_ng of the area >= 60. and G_ave_ng of the area <= 69.
Then the area is G_Green

If B_ave_ng of the area >= 52.5 and B_ave_ng of the area <= 63.
Then the area is B_Green
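A minimal sketch of these interval rules, using only the ave_ng intervals of the green color; an area receives the label when at least a threshold number of band conditions hold (the satisfaction threshold discussed in the text). All function and variable names are illustrative, not taken from the SIIAC implementation.

```python
# Interval rules for one color (green) over the ave_ng parameter in the
# three bands.  An area is labeled when at least `threshold` conditions hold.

GREEN_AVE_NG = {           # band -> (low, high) confidence interval
    "R": (51.0, 66.4),
    "G": (60.0, 69.0),
    "B": (52.5, 63.0),
}

def satisfied_conditions(ave_ng, intervals):
    """Count how many per-band interval conditions the area satisfies."""
    return sum(1 for band, (lo, hi) in intervals.items()
               if lo <= ave_ng[band] <= hi)

def has_label(ave_ng, intervals, threshold):
    """Assign the label when the satisfaction threshold is reached."""
    return satisfied_conditions(ave_ng, intervals) >= threshold

area = {"R": 58.0, "G": 65.0, "B": 70.0}     # the B average is out of range
print(satisfied_conditions(area, GREEN_AVE_NG))      # 2 of the 3 conditions
print(has_label(area, GREEN_AVE_NG, threshold=2))    # True
```

In the full system each color carries 12 conditions (4 parameters in 3 bands) and the threshold is fixed experimentally.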
R_Green, G_Green and B_Green are logical variables. After extracting the confidence intervals corresponding to all the colors, we constructed a set of 72 rules (24 rules per band). An area of the image receives the label of a given color if all the conditions of the 12 rules associated with this color are satisfied at the same time (i.e., if the 12 logical variables are all true at once). This constraint is difficult to achieve, and for this reason we defined a threshold called the satisfaction threshold. This threshold determines, for every color, how many condition parts must be satisfied for the selected area to receive the label of the corresponding color. We will see below how to fix this satisfaction threshold. Satisfaction threshold:
In order to fix the value of the satisfaction threshold experimentally, we applied our color recognition rules to a learning sample of 600 areas (100 areas of every type of object). We carried out several tests on this sample, varying the threshold each time. The satisfaction threshold chosen is the largest threshold t which verifies the condition ΔG.R.R(t) < 1%, where

ΔG.R.R(t) = [G.R.R(t) − G.R.R(t−1)] / G.R.R(t−1)
ΔG.R.R(t) is the relative difference of the good recognition rates, and G.R.R(t) is the Good Recognition Rate corresponding to the threshold t. The results are summarized in the following table.

[Table: good recognition rate, non-recognition rate and error rate (%) as a function of the satisfaction threshold, from 12 down to 3. The good recognition rate increases from 86.33% to 87.66% as the threshold decreases, and each column totals 100%.]
We note that: - An area is recognized if the system assigns it exactly one label, and the correct one (good recognition). - An area is not recognized if the system assigns it several labels, including the correct one (non-recognition). - We have a labeling error if the system assigns to the area one or several labels, none of which is the correct one. We remark that the good recognition rate increases when the satisfaction threshold decreases. In our case the value of the satisfaction threshold is fixed to eight, because it is the largest threshold which verifies the above condition. An area receives the label of a given color if at least eight condition parts among the twelve are satisfied at the same time. The detailed results corresponding to a satisfaction threshold of eight are given below:
[Fig. 12: Detailed results of the test, per object class (clear roof, dark roof, gray, dark chestnut, black, ...), for a satisfaction threshold of eight. Out of 300 test areas, 258 (86%) are well recognized, 23 (7.67%) are not recognized and 19 (6.33%) are labeling errors.]

In this article we do not present the KSs "Texture" and "Geometry", because their principle of construction is identical to that of the KS "Color". We present instead the KS "Relaxation," which allows the reduction of the multiple labeling.
2.2.2 The KS "Relaxation"
Every area may possess one or several labels (the unrecognized areas). The problem is therefore to reduce the multiple labeling by discarding the labels which are contradictory between adjacent areas. We are thus confronted with a constraint satisfaction problem. In our system, we chose discrete relaxation as the constraint propagation method because it is inexpensive and belongs to the family of parallel techniques. We present its principle below. We have a constraint graph in which every node (variable) is associated with a set of labels (the possible values of the variable) and whose edges are the (binary) constraints. The principle of discrete relaxation consists of finding a consistent, unambiguous labeling, which we call a labeling solution, in which every node has exactly one label. We define our problem as follows:
- We have an adjacency graph of areas consisting of n nodes representing the different areas of the image, R = {R1, ..., Rn}. - The nodes representing areas with a common boundary are connected by an edge which carries the boundary's length and the link's type. - To every node is associated a set L of labels representing the possible interpretations of the area, L = {L1, ..., L6}. - A set of local constraints C, where the constraint Ci is defined by Ci = (Lm, Ln) with m ≠ n. Example of adjacency constraint:
A shadow cannot be adjacent to a 3D object in the direction of the sun rays (our system must therefore exploit some global data on the image: the position of the sun, the position of the plane, etc.). The shadow plays a very important role in the detection of 3D objects (buildings, houses, hangars, etc.). Example of inclusion constraint:
A tree cannot be included in a house roof. After the construction of the constraint graph, the second stage of the problem consists of finding the largest consistent labeling. For this, we used the algorithm AC4 of R. Mohr and Henderson [2] to eliminate the inconsistent labels by local constraint propagation. 3 Conclusions
In this paper, we presented our methodology for the interpretation of aerial color images. In our work, we are particularly interested in the color information because it brings very important extra knowledge. The KS "Color" facilitated the recognition of some areas (vegetation, tiled roofs, tarmac) which, in grey-scale images, require much greater use of geometric features and contextual information, which complicates the interpretation considerably. To reduce the problem of the multiple labeling, we used discrete relaxation rather than a production rule-based system, because methods based on production rules are often sequential and of exponential complexity, whereas discrete relaxation belongs to the family of parallel techniques and is therefore considerably more efficient. 4 References [1] R. Engelmore and T. Morgan, "Blackboard Systems", Addison-Wesley, 1988. [2] R. Mohr and T.C. Henderson, "Arc and path consistency revisited", Artificial Intelligence, 28: 225-233, 1986.
Session V:
INDUSTRIAL APPLICATIONS
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K.
B.G. Mertzios and P. Liatsis (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Nodular Quantification in Metallurgy using Image Processing
Virginia Ballarin* - Emilce Moler* - Franco Pessana* - Sebastián Torres* - Manuel González*
*LPMS, Electronics Dept., Engineering School, University of Mar del Plata, J. B. Justo 4302 - (7600) Mar del Plata - Fax: +54-23-810046
e-mail: [email protected]
1. Introduction
In the process of classification and analysis of different alloys in metallurgy, the extraction of metric properties of the involved shapes is a very important step [1]. The analysis and the later classification depend on the accuracy of the obtained measures. The present work describes some feature extraction techniques in Digital Image Processing which give the final user a useful tool for alloy classification in metallurgy. Metric properties of certain particles are obtained and analysed to get relevant statistical distributions, in order to correlate them with some mechanical properties of the system [2]. Measurements obtained with higher accuracy allow for better results in elaborating an optimum analysis of the material. Therefore, precision is a very important issue in obtaining metric properties such as area, perimeter, inertial moments, shape factor and svelte factor. The images to process are photographs of different metallurgical alloys obtained through a microscope. The particles being quantified are called spheroids. The first step is the pre-processing of the images in order to isolate the particles of interest; to accomplish this task we used a morphological filter for noise suppression [3]. We describe the theoretical concepts, the computational difficulties, and the errors in measuring metric properties which are caused by the discrete nature of the digital image. We analyse in detail the main metric properties and show that they are a better choice than a set of commonly-used analytic properties [4].
2. Extraction of Features
Once the images have been filtered using the morphological filter discussed in the paper by Ballarin et al. (1995) [3], we are able to segment them into two regions: the background and the objects of interest. Thus the resulting images have only two gray levels, i.e., we work with binary images. Once the image is segmented, we choose one specific particle and extract the most significant features of the spheroids; that is to say, we are only interested in those features that allow us to achieve a later material classification. The main metric properties of the spheroids we want to measure are the area, perimeter, inertial moments, shape factor and svelte factor.
2.1. Metric Properties
Metric properties are based on the distance δ(V1, V2) between two points V1 and V2 in the image plane. This distance must be a real function of the co-ordinates (xi, yi). For this work we chose the Euclidean distance because it is the one that achieves the minimum error when going from continuous to discrete mathematics. Analytic properties, on the other hand, merely deal with the spatial content of the image without concern for any of the involved shapes. These properties consider the image as an n-dimensional vector whose elements are the invariant moments. Because of the analysis we want to perform on the image, the choice of the metric properties is straightforward.
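To make the choice of distance concrete, the following sketch compares the Euclidean distance used here with two other common digital-image distances (city-block and chessboard); the pixel pair is an illustrative example.

```python
import math

# Three distances between pixel centres v1 and v2: the Euclidean distance
# chosen in the text, and the city-block and chessboard alternatives, which
# deviate more from the continuous geometry for diagonal displacements.

def euclidean(v1, v2):
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])

def city_block(v1, v2):
    return abs(v1[0] - v2[0]) + abs(v1[1] - v2[1])

def chessboard(v1, v2):
    return max(abs(v1[0] - v2[0]), abs(v1[1] - v2[1]))

v1, v2 = (0, 0), (3, 4)
print(euclidean(v1, v2), city_block(v1, v2), chessboard(v1, v2))   # 5.0 7 4
```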
2.2. Area and Perimeter
As a first step of the characterisation of the spheroids in the image, we consider their areas and perimeters. These features allow a later classification of the particles in different alloys. As we have already mentioned, we first applied a morphological filter in order to remove the spurious particles. This filter was shown to preserve the shapes of the spheroids, allowing for a precise measurement of the area and perimeter of these particles. The area and perimeter can be calculated by the following continuous expressions:

A = ∬_{R(x,y)} dx dy,   P = ∫ √((dx/dt)² + (dy/dt)²) dt    (1)

To evaluate the metrics of the particles, the former expressions must be formulated in the discrete domain:

A = Σ_m Σ_n Pixels, (m,n) ∈ Shape    (2)

P = Σ_m Σ_n Pixels, (m,n) ∈ Contour    (3)
That is to say, both the area and the perimeter of the shape are numbers of pixels. This is the first difference between continuous and discrete mathematics: in the discrete domain, area and perimeter have the same units.
2.3. Svelte Factor
We relate the area and the perimeter of each object in the image by defining a coefficient called the svelte factor, fs:

fs = 4π · Area / (Perimeter)²    (4)

This coefficient gives an idea of the kind of shape we are working with, i.e., how thick or thin the particle is. fs is maximum for circles (fs = 1) and minimum for segments (fs = 0). This factor has the useful property of being invariant under linear transformations such as rotation, translation and scaling. The svelte factor is highly correlated with the nature of the spheroid under consideration; its invariance is therefore useful when dealing with images obtained from different alloy taps.
2.4. Form Factor
The form factor is another important metric property relevant to a later shape classification in metallurgy images. Some related expressions must be defined before we can introduce this feature.
2.4.1. Mass Centre
The mass centre of an arbitrary shape is defined as the point where all the mass of the form can be considered to be concentrated and where the resultant of all the forces applied to the particle is exerted. In the continuous domain it is defined as:

Xm = (1/A) ∬_{R(x,y)} x dx dy,   Yn = (1/A) ∬_{R(x,y)} y dx dy,   where A = ∬_{R(x,y)} dx dy    (5)

The equivalent discrete version of Eq. 5 is:

Xm = (1/A) Σ_m Σ_n m,   Yn = (1/A) Σ_m Σ_n n,   where A = Σ_m Σ_n Pixels    (6)
2.4.2. Inertial Moment
The inertial moment can be defined as the quantity that plays, in the equations of angular motion, the role that mass plays in the equations of linear motion. For a system of n particles, referred to a reference axis, it is:

I = Σ_{i=1}^{n} m_i r_i²    (7)

where I is the inertial moment, m_i is the mass of the i-th element and r_i is the vector radius of the i-th particle referred to the considered axis. If there are infinitely many particles, the sums become integrals. Furthermore, working with a constant density equal to one, the measure of the mass is equivalent to the measure of the area. The inertial moment can be generalised to any order. In the discrete domain we define the inertial moment of order pq as:

μ_{p,q} = Σ_m Σ_n (m − Xm)^p (n − Yn)^q,   (m,n) ∈ Form    (8)
2.4.3. Surface Orientation
The surface orientation can be considered as the principal orientation of a region or surface; it is the angle θ that minimizes the inertial moment. Consequently, the derivative of the inertial moment with respect to θ must be equated to zero in order to obtain this angle. As seen in Fig. 1, the inertial moment of the shape with respect to θ is:

I(θ) = Σ_m Σ_n D²(θ),   where D(θ) = (n − Yn) cos θ − (m − Xm) sin θ    (9)

Differentiating Eq. 9 with respect to θ and equating to zero, the following expression is obtained:

θ = (1/2) arctan( 2 μ_{1,1} / (μ_{2,0} − μ_{0,2}) )    (10)

where μ_{p,q} is obtained from Eq. 8.
2.4.4. Circumscribed Rectangle of a Shape
The circumscribed rectangle of an irregular form is the smallest rectangle that encloses the form and that is oriented in the principal direction θ of the object. Fig. 2 shows the circumscribed rectangle of a shape. A change of co-ordinates is needed in order to calculate the lengths of the sides of the rectangle: θ is calculated using Eq. 10, and the following transformation is applied to the points of the object:

x̂ = x cos θ + y sin θ,   ŷ = −x sin θ + y cos θ    (11)

Finally, we define the form factor as the quotient between the sides of the circumscribed rectangle of the shape, i.e.,

ff = Lrc / lrc    (12)

where Lrc and lrc are the lengths of the two sides of the rectangle.
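The chain of Eqs. 6, 8 and 10-12 can be sketched as follows. The pixel set (a 6×2 horizontal bar) is an assumed example whose principal orientation should be zero and whose side ratio is 5:1; atan2 is used for Eq. 10 (an assumption, to handle the quadrant and the μ20 = μ02 case robustly).

```python
import math

# Mass centre (Eq. 6), central moments (Eq. 8), principal orientation
# (Eq. 10), rotation into the principal frame (Eq. 11) and form factor
# from the circumscribed rectangle (Eq. 12).

def central_moment(shape, xm, yn, p, q):
    """mu_pq of Eq. 8 about the mass centre (xm, yn)."""
    return sum((m - xm) ** p * (n - yn) ** q for m, n in shape)

def orientation_and_form_factor(shape):
    area = len(shape)
    xm = sum(m for m, _ in shape) / area          # Eq. 6
    yn = sum(n for _, n in shape) / area
    mu11 = central_moment(shape, xm, yn, 1, 1)
    mu20 = central_moment(shape, xm, yn, 2, 0)
    mu02 = central_moment(shape, xm, yn, 0, 2)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)   # Eq. 10
    # Eq. 11: rotate the pixels, then take the sides of the bounding rectangle
    rot = [(m * math.cos(theta) + n * math.sin(theta),
            -m * math.sin(theta) + n * math.cos(theta)) for m, n in shape]
    xs, ys = zip(*rot)
    sides = sorted((max(xs) - min(xs), max(ys) - min(ys)), reverse=True)
    return theta, sides[0] / sides[1]                 # Eq. 12

bar = {(m, n) for m in range(6) for n in range(2)}
theta, ff = orientation_and_form_factor(bar)
print(round(theta, 4), round(ff, 2))   # 0.0 5.0
```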
Fig. 1 Distance to the principal axis for the computation of the inertial moment.
Fig. 2 Circumscribed rectangle of a shape.
3. Example of Application
Fig. 3 shows a pre-processed image of a metallurgy alloy in which the spheroid particles can be seen. In Table 1, the values of the form factor and the svelte factor of some particles have been calculated. Ideally, both the svelte factor and the form factor must be equal to one for perfect spheroid particles. However, some differences can be seen due to the discretization of the image; they are analysed in the following section.
Table 1: Calculated svelte and form factors for 5 spheroids.

Spheroid   Svelte factor    Form factor
1          fe1 = 0.710210   ff1 = 1.083333
2          fe2 = 0.697099   ff2 = 1.083333
3          fe3 = 0.687223   ff3 = 1.000000
4          fe4 = 0.740874   ff4 = 1.111111
5          fe5 = 0.753692   ff5 = 1.076923

Fig. 3 Metallurgy image.
4. Error Analysis
The errors generated in measuring the metric properties originate in the discrete nature of the images. We also have to take into account that the particles are not geometrically perfect. However, neither consideration affects the material classification, because rather than the exact measures, a comparative analysis of the samples is performed.
4.1. Errors in the Svelte Factor
When going from continuous to discrete mathematics, it is unavoidable to incur errors. In a digital image, acquired either with a scanner or with a CCD, the metrics of an irregular shape must be calculated with a finite resolution. The resolution thus limits the accuracy in the measurement of these features. In Fig. 4a a circle with radius R = 3.5u is shown; Figs. 4b and 4c depict the same circle after digitization. From these figures the continuous and the discrete svelte factors are obtained:

fe_c = 4π A / P² = 4π (πR²) / (2πR)² = 1,   fe1_d = 4π A / P² = 4π · 37 / (16)² = 1.82    (13)

e = |fe_d − fe_c| / fe_c    (14)

From Eqs. 13 and 14 we calculate the error, obtaining e = 82%. If we double the resolution of the digitized circle, as in Fig. 4c, the svelte factor is

fe2_d = 4π · 148 / (44)² = 0.96    (15)
Now calculating the error by using Eq. 14, we get e = 3.93%. Comparing Eqs. 14 and 15, we see that doubling the resolution at the acquisition stage considerably decreases the error. In the limit, when the resolution goes to infinity (or becomes very high), the error approaches zero.
Fig. 4 Circles with different resolutions.
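The low-resolution case can be reproduced numerically. The 4-connected contour definition below is an assumption (the paper does not spell out its contour convention), but it yields the counts A = 37 and P = 16 quoted for the R = 3.5u circle.

```python
import math

# Digitise a circle of radius R = 3.5 pixel units on an integer grid and
# compute the discrete svelte factor from pixel counts (Eqs. 2-4, 13-14).

def digitise_circle(radius):
    r = math.ceil(radius)
    return {(x, y) for x in range(-r, r + 1) for y in range(-r, r + 1)
            if x * x + y * y <= radius * radius}

def discrete_svelte(shape):
    area = len(shape)                                  # Eq. 2: shape pixels
    contour = {(x, y) for (x, y) in shape              # Eq. 3: pixels with an
               if any((x + dx, y + dy) not in shape    # outside 4-neighbour
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))}
    perimeter = len(contour)
    return area, perimeter, 4 * math.pi * area / perimeter ** 2   # Eq. 4

a, p, fs = discrete_svelte(digitise_circle(3.5))
print(a, p, round(fs, 2))   # 37 16 1.82 -> an 82% error against the ideal fs = 1
```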
4.2. Errors in the Form Factor
Before we can calculate the form factor, we need to know the principal orientation of the shape. In order to evaluate this metric it is necessary to measure the centre of mass and the inertial moment. Here too we work with digital images, so we incur errors on account of the discretization. In Fig. 5 the principal orientation of the shape, computed using Eq. 10, is θd = 42.69°. For this shape, the real principal orientation is 45 degrees, so the error is 5.13%. After tracing the boundary and changing co-ordinates as in Eq. 11, we obtain the maximum and minimum values of the shape: Mmax = 6.09u, Mmin = −3.92u, Nmax = 3.10u, Nmin = −2.74u. With these values the form factor becomes

ff = (Mmax − Mmin) / (Nmax − Nmin) ≈ 1.71    (17)

Fig. 5 Principal orientation.
The error in the angle is always present because of the discretization. As in the former section, the error can be reduced by increasing the resolution of the images at the acquisition stage.
5. Conclusions
Digital image techniques allow for feature quantification of irregular shapes. Measures performed manually, i.e., without the use of DIP techniques, show large errors. The application of these techniques, while it does not eliminate the errors, considerably reduces them; working with a resolution of 600 dpi, the errors can be neglected. Obtaining the metric properties accurately allows for better results in elaborating an optimum material analysis. We especially discussed the form factor and the svelte factor, because they are invariant to linear transformations like rotation, translation and scaling. These properties make the two factors useful tools in material classification. We also discussed the unsuitability of the analytic properties, given the nature of the analysis to be performed. We developed the algorithms in C++ and obtained from them the quantitative information of the image for later material analysis. Many metric properties can be obtained with these techniques, but most of them are based on the two factors discussed here. This work is only a starting point towards the use of more sophisticated properties specific to each material under analysis.
6. References
[1] F.B. Pickering, "Basis of Quantitative Metallography", London: Institute of Materials Publication, 1994.
[2] F. Pessana, V. Ballarin, E. Moler, S. Torres, M. González, "Nodular Quantification by Metric Properties using Digital Image Processing", Proc. 3rd Internat. Computación Aplicada a la Industria de Procesos, Villa María, Argentina, 12-15 November 1996.
[3] V. Ballarin, E. Moler, M. González, S. Torres, "Noise Suppression by Morphological Filters", Proceedings of the 2nd International Workshop on Image and Signal Processing: Theory, Methodology, Systems and Applications, Budapest, Hungary, 8-10 November 1995, Vol. 1, pp. 128-132.
[4] M.K. Hu, "Visual Pattern Recognition by Moment Invariants", IRE Trans. Info. Theory, Vol. IT-8, pp. 179-187, August 1962.
[5] A.K. Jain, "Fundamentals of Digital Image Processing", Englewood Cliffs, NJ: Prentice-Hall, 1989, pp. 392-394.
Image Processing in the Measurement of Trash Contents and Grades in Cotton Boshra D. Farah School of Electrical Engineering, The University of New South Wales, Sydney 2052, Australia. (Phone: +61 (2)9385 5375; Fax: +61 (2)9385 5993, E-mail: [email protected])
ABSTRACT
This paper presents the application of digital image processing and analysis to the measurement of trash contents and grades in cotton fibre assemblies. A measuring system was constructed using digital image processing and analysis techniques. Three modes of measurement were investigated, namely transmitted light, reflected light and compound light (a combination of the two). The paper discusses the different modes and the optimum conditions for the measurements. MATLAB and its image processing toolbox were used for thresholding, segmentation, histogram analysis and the determination of the trash content by area. Mathematical expressions were derived to translate the trash content by area into the trash content by mass. The correlations and regressions between the trash content by mass and the different measurement modes were determined and discussed.
1. INTRODUCTION
The existing standardised methods and means for the measurement of trash content in cotton are mainly based on mechanical separation of the trash from the cotton fibres using the Shirley Analyzer [1] and weighing the trash relative to the total mass of the cotton sample. Currently, the determination of cotton grades still depends on the subjective assessment of cotton classers. These methods are time-consuming, labour-intensive and therefore expensive. Furthermore, they are not suited to on-line or field measurements. Attempts were made by Lieberman and Patil [2], using computer vision and pattern recognition techniques, to discriminate among three trash categories: bark, stick and leaf/pepper. However, their approach was not a feasible way of assessing the trash content and accordingly was not successful in assessing the grades of cotton.
2. MEASURING SYSTEM
This paper presents the application of digital image processing and analysis to the measurement of trash contents and grades in cotton fibre assemblies. The measurements may be carried out in the laboratory on selected samples, on-line for automatic quality control of fibre cleaning processes (e.g. ginning, blowroom processes, carding or combing), or in the field (e.g. for testing the cotton crop during harvesting). Initially a measuring system based on the patent [3] (the present author is the main inventor of this patent) was constructed. Fig. 1 shows a schematic diagram of the measuring system. Cotton specimens of known mass (2 g) were prepared and evenly distributed over a pre-assigned area (175 mm × 125 mm). Similarly, control samples for the background fibres and for the grading standards were prepared and tested. Three modes of measurement were investigated, namely measurements with transmitted light, reflected light and compound light, i.e. a combination of the two. An image was taken of the sample using a video camera interfaced to a computer through a video capture card. The image was digitised into pixels (256×256 or 512×512) and for each pixel the brightness level, from 0 (00H) for black to 255 (FFH) for white, was stored in a binary file. The hypothesis is that, for the control sample, which consists of clean cotton fibres (neither black nor white), the histogram of the brightness level will have a single peak, as shown in Fig. 2. On the other hand, the histograms of the actual samples, which include the fibres along with the trash, will
have more than one peak. The threshold is defined as the brightness level which discriminates between the trash and the background fibres. It is determined as the brightness level at which the histogram of the background fibres (control sample) and that of the trash are separated. If the two histograms are completely separated (as in Fig. 2), the threshold is easily determined and can be any brightness level lying between the two histograms. In the case of small or medium overlap (Figs. 3 and 4), it is still possible to determine the threshold, this time as the brightness level at the minimum number of pixels between the two histograms. If the two histograms overlap heavily (the likely case for the transmitted light mode, as seen later), the threshold cannot be determined in the same way; it is then taken as the brightness level which results in the highest correlation coefficient between the measured and the actual values of the trash contents. The trash content by area (Ta) can be estimated as the ratio between the area of the histogram representing the trash (i.e. the number of pixels occupied by the trash) and the total area of the scanned sample (i.e. 256×256 = 64k pixels or 512×512 = 256k pixels). The trash content by mass (Tm) can then be determined as:

Tm = Kp Ta / [1 + Ta (Kp − 1)]    (1)

where Kp is a calibration factor which can be determined experimentally by carrying out a calibration test and using:

Kp = Tm (1 − Ta) / [Ta (1 − Tm)]    (2)

If Kp ≈ 1 (i.e. Ta(Kp − 1) ≈ 0 in Eq. 1), then:

Tm ≈ Kp Ta    (3)

Through the correlations and the regression lines between Tm and Ta, the values of Kp can be determined for the three modes of measurement.
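Eqs. 1-3 can be sketched directly; the calibration values below are illustrative, not measured data from the paper.

```python
# Converting the trash content by area (Ta) into the trash content by mass
# (Tm) via the calibration factor Kp.

def kp_from_calibration(tm, ta):
    """Eq. 2: calibration factor from one known (Tm, Ta) pair."""
    return tm * (1 - ta) / (ta * (1 - tm))

def tm_from_ta(ta, kp):
    """Eq. 1: trash content by mass from the trash content by area."""
    return kp * ta / (1 + ta * (kp - 1))

kp = kp_from_calibration(tm=0.018, ta=0.022)   # calibrate on a known sample
print(round(tm_from_ta(0.022, kp), 3))          # 0.018: Eq. 1 inverts Eq. 2
# and for Kp close to 1, Tm is approximately Kp * Ta (Eq. 3)
```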
Fig. 1 Measurement System.
Fig. 3 Small Overlapping Histograms
Fig. 2 Histograms of the Brightness Level
Fig. 4 Medium Overlapping Histograms
3. EXPERIMENTAL RESULTS AND DISCUSSIONS
Extensive experimental work was carried out; the following is only a part of the results obtained recently. Within the course of Nissyrios' work [4] (a B.E. thesis under the supervision of the present author), tests were carried out on 8 samples: a control sample, 5 standard cotton grades - Good Middling (GM), Strict Middling (SM), Middling (M), Strict Low Middling Plus (SLMP) and Strict Low Middling (SLM) - and 2 actual samples, B1 and B2. Images of each of the 8 samples were taken under the three measuring modes, i.e. the transmitted, reflected and compound light modes. Using MATLAB and its image processing toolbox, the histogram of the brightness level for each image was produced and processed, and the trash content by area, Ta%, was determined.

Table 1: Measurements under the different modes

Grade       Ta% Comp   Ta% Refl   Ta% Trans   Threshold   Tm% Mass
Correl. r   0.8291     0.6586     0.7559      -           (1.000)
GM          0.4889     1.5459     0.3939      106.2       0.7921
SM          0.4427     0.6077     0.4547      105.5       0.468
M           0.7795     1.1024     0.6115      106.2       0.5934
SLMP        1.6285     1.1211     1.3222      98.7        1.1203
SLM         1.5137     1.4038     1.1183      106.4       1.7085
B1          1.9489     1.5358     1.993       105.5       2.179
B2          2.8774     1.3376     3.4164      105.5       1.799
Fig. 5 Transmitted Mode Histogram for B2
Fig. 6 Different Modes Compared with Tm%Mass
Fig. 8 Trash % from Ta% Reflected
Fig. 7 Trash% from Ta% Compound
Fig. 9 Trash % from Ta% Transmitted
The optimum threshold for both compound and reflected light mode measurements, at the brightness level of 158, was determined as the brightness level at which 99.9% of the control sample was brighter than the threshold. For the transmitted light mode measurements (see Fig. 5), due to the irregularity of the fibrous layer, the background had dark areas irrespective of the trash areas; the resulting histogram had multiple peaks and was shifted towards the low-brightness side. Therefore, the threshold for the transmitted light mode measurements was determined by the procedure explained previously (see Figs. 2, 3 and 4). Table 1 and Fig. 6 summarise the results obtained for the 5 cotton grades (GM, SM, M, SLMP and SLM) and the two actual samples (B1 and B2). The measurements included the trash content by area (Ta%) for the three modes, compared with the manual measurements of the trash content by mass (Tm%). The first row in Table 1 shows the correlation coefficient (r) between the Tm% by mass and the Ta% by area. Keeping the same conditions for the measuring environments, the compound mode measurement correlated best with the mass measurement (r = 0.83). Although the reflected mode measurement had a lower correlation coefficient (r = 0.66) than the transmitted mode (r = 0.76), the former may be more desirable than the latter in practice (e.g. for on-line and field measurements) because of its simplicity, needing fewer requirements in preparing the samples. Figs. 7, 8 and 9 show the regression lines and the coefficients of determination (R²) between the Ta% by area for each mode and the Tm% by mass: R² ≈ 0.6, 0.4 and 0.2 for the compound, reflected and transmitted modes respectively. Kp was determined from the linear regression equation in each case (y = Kp·x); accordingly, Kp = 0.82, 1.02 and 0.76 for the compound, reflected and transmitted modes respectively.
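The determination of Kp can be checked numerically: a least-squares fit of y = Kp·x through the origin gives Kp = Σxy / Σx². Using the compound-mode Ta% column of Table 1 against the Tm% mass measurements:

```python
# Regression through the origin, y = Kp * x, with x = Ta% (compound mode)
# and y = Tm% (mass); the closed-form least-squares slope is sum(xy)/sum(xx).

ta_comp = [0.4889, 0.4427, 0.7795, 1.6285, 1.5137, 1.9489, 2.8774]
tm_mass = [0.7921, 0.468, 0.5934, 1.1203, 1.7085, 2.179, 1.799]

kp = sum(x * y for x, y in zip(ta_comp, tm_mass)) / sum(x * x for x in ta_comp)
print(round(kp, 2))   # 0.82, the value reported for the compound mode
```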
The regression lines can be utilised to translate the trash content into the corresponding cotton grade. The cotton grade can thus be determined objectively, easily, more accurately and cost-effectively.
4. CONCLUSIONS
The measurements of the trash contents and the corresponding grades of cotton samples were successfully carried out by applying the thresholding technique to their brightness-level histograms. The compound light mode measurement correlated best with the trash content by mass (R² = 0.6). The reflected light mode had a lower coefficient of determination (R² = 0.4) but may be more desirable than the other modes for its simplicity and its fewer requirements in terms of sample preparation. The transmitted light mode measurement was possible upon selecting an appropriate threshold depending on the brightness-level histogram; however, it had the lowest coefficient of determination (R² = 0.2).
5. ACKNOWLEDGMENT Thanks to the Special Research Grant Committee, Faculty of Engineering, UNSW and also to the School of Electrical Engineering, UNSW, for their financial support.
6. REFERENCES
[1] D. S. Hamby, American Cotton Handbook. New York: Interscience Publishers, 1965.
[2] M. A. Lieberman and R. B. Patil, "Non-lint Material Identification Using Computer Vision and Pattern Recognition," SPIE Vol. 1836, Optics in Agriculture and Forestry, pp. 142-152, 1992.
[3] B. D. Farah, J. L. Woo, and D. H. Mee, "Measurement of Foreign Matter in Fibre Assemblies," Australian Patent Application No. 66403/86 - Australia 10/12/1986 (PH03871 13/12/1985), European Patent Application No. 86309563.4 - 9/12/86 and U.S. Patent Application No. 06/940,590.
[4] Nissyrios, "Microprocessor Based Measurements of Trash Contents and Grades in Cotton", B.E. (Electrical) Thesis, University of New South Wales, Sydney, Australia, 1995.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Automated Visual Inspection Based On Fermat Number Transform
J. Harrington, A. Bouridane
Department of Computer Science, The Queen's University of Belfast, Belfast BT7 1NN, Northern Ireland
Keywords: Image Processing, Pattern Recognition, Number Theoretic Transforms.
ABSTRACT
This paper describes the application of the Fermat Number Transform to the problem of automated visual inspection of 2-D images. The transform is defined over a field of integers, so all the computations involved are exact, eliminating errors due to rounding and/or truncation. The main features of the transform are that it has a periodic structure and can be computed using only bit-shift and add operations. This makes it well suited for use in pattern recognition problems. The technique has been implemented and tested successfully on a variety of image flaw types, with promising results.
INTRODUCTION
Automated visual inspection is a well-established subject [1,2]. Early work concentrated on locating and identifying objects by describing their shape and orientation. Yet it is still one of the fastest growing scientific areas, with applications ranging from printed circuit patterns, automobile parts and food products to microcircuit photomasks and printing industry products. The tasks of an automated visual inspection system are basically to remove the need for a trained operator to perform the recognition/inspection, or to enable recognition/inspection that would otherwise be impossible. Typically, this task involves repeatedly checking the same type of object to detect anomalies and hence to treat them. It normally consists of several distinct, computationally intensive processes. One of the first processes usually employed is the pre-processing of the image, so as to enhance the relevant features and aid the detection of defects. This operation may include filtering, segmentation and feature extraction. Analysis of the image is then carried out using previously established models to provide strategies and standards for the inspection process. Many techniques are currently in use for automatic visual inspection. Current inspection applications are characterised by stringent requirements in terms of speed of operation, reliability and conformity; thus the need arises for highly efficient algorithms for finding relevant regions of images [3]. The apparently straightforward issue of speed of operation, coupled with the current state of concurrent system design, indicates that algorithms which lend themselves to parallel implementation are promising candidates for future inspection and image processing system design. This paper is concerned with the development of a novel technique for the inspection of 2-D images, based on the Fermat Number Transform (FNT).
The FNT has a very regular structure and is thus highly suited for use in many pattern recognition applications. The transform has properties similar to those of the Discrete Fourier Transform (DFT) but is defined over a finite field of integers, thus providing exact computations. Moreover, its computation is very simple, consisting only of additions and bit shifts, and thus does not require costly multiplications. Finally, the technique is very sensitive in detecting small irregularities.
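The shift-and-add arithmetic can be illustrated for a modulus of the form 2^B + 1: since 2^B ≡ -1 (mod F), multiplying by a power of two reduces to a shift, a split and one subtraction. The following is our own minimal sketch (the function name is an illustration, not from the paper):

```python
B = 8
F = (1 << B) + 1  # third Fermat number, F3 = 257

def mul_pow2_mod(x, s):
    """Compute x * 2**s mod F using only shifts and adds/subtracts.

    Because 2**B = F - 1 = -1 (mod F), a product wider than B bits
    can be folded back by subtracting the high half from the low half."""
    s %= 2 * B                      # 2**(2B) = 1 (mod F)
    if s >= B:                      # 2**(B+r) = -2**r (mod F)
        return (F - mul_pow2_mod(x, s - B)) % F
    y = x << s                      # at most 2B bits wide
    hi, lo = y >> B, y & ((1 << B) - 1)
    return (lo - hi) % F            # y = hi*2**B + lo = lo - hi (mod F)
```

For example, `mul_pow2_mod(200, 5)` equals 200·32 mod 257 without any general multiplication being performed.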
FERMAT NUMBER TRANSFORM: AN OVERVIEW
The two-dimensional FNT of a sequence x(m,n) of size N×N for a Fermat number Ft is given by [4]:
X(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} x(m,n) α^{mk} α^{nl} mod F_t

where F_t = 2^{2^t} + 1, and

x(m,n) = N^{-1} Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} X(k,l) α^{-mk} α^{-nl} mod F_t

It can be shown that if F_t is chosen such that α^N ≡ 1 (mod F_t) and N^{-1} exists over the range (0..F_t - 1), then the above pair of equations has the DFT-like structure. Furthermore, if α is selected to be a power of 2, say α = 2^β (β integer), then N = 2^{t+1}/β, which allows for a very simple implementation using left and right shifts instead of costly multiplications. For values of β that are powers of 2, N is also a power of two and the Fast Fourier Transform algorithm is applicable. All computations are carried out over rings of integers, and as such the problem of rounding and/or truncation inherent in DFT computations is eliminated, providing exact results. The high sensitivity of FNTs arises because changes in x(m,n) are multiplied by a number 2^{mk}·2^{nl} (if α = 2) that can rise to be almost equal to the modulus F_t. We have shown that if an image x(m,n) is periodic with period Tr×Tc (along rows and columns, respectively), then the transformed image contains only a small number of non-zero pixel elements. In addition, these pixels are very regularly distributed along rows and columns and can be efficiently computed using a simple equation that takes Tr and Tc as inputs:

X(k,l) = (N/Tr)(N/Tc) Σ_{m=0}^{Tr-1} Σ_{n=0}^{Tc-1} x(m,n) α^{mk} α^{nl} mod F_t

with k = i·N/Tr, i = 0, 1, 2, ..., Tr-1 and l = j·N/Tc, j = 0, 1, 2, ..., Tc-1.
This pattern can be clearly seen as a regular and periodic distribution of simple patterns on a display device. If, however, this periodicity is broken (by, say, varying some data along rows and/or columns of the input image), then the above-mentioned structured pattern is also broken and can clearly be seen as a random distribution of patterns. Because of these characteristics of the FNT, it is possible to combine images and periodicity judiciously to obtain an efficient algorithm for flaw detection: if a defect-free image is compared with itself, a perfect periodicity is always shown. Table x-1 shows a 16×16 periodic array with Tr×Tc = 4×8 and Table x-2 illustrates its transform using F3 and α = 2. The results clearly exhibit the above periodicity. If, however, a single array value (i.e., a single defect) is altered in the array (see Table x-3), the periodic structure is destroyed, thus automatically highlighting the defect(s) (see Table x-4).
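The behaviour of Tables x-1 to x-4 can be reproduced with a direct reference implementation of the 2-D FNT over F3 = 257 with α = 2 (so N = 16). This is our own slow O(N⁴) sketch, assuming a randomly generated periodic test image, not the authors' code or data:

```python
import random

F = 257        # third Fermat number, F3 = 2**8 + 1
ALPHA = 2      # alpha = 2 gives transform length N = 16 (2**16 = 1 mod 257)
N = 16

def fnt2d(x):
    """Direct 2-D Fermat Number Transform (slow reference version)."""
    return [[sum(x[m][n] * pow(ALPHA, m * k + n * l, F)
                 for m in range(N) for n in range(N)) % F
             for l in range(N)] for k in range(N)]

# Build a 16x16 image that is periodic with Tr x Tc = 4 x 8.
random.seed(0)
block = [[random.randrange(256) for _ in range(8)] for _ in range(4)]
img = [[block[m % 4][n % 8] for n in range(N)] for m in range(N)]

X = fnt2d(img)
# Non-zero entries appear only at k = i*N/Tr (multiples of 4) and
# l = j*N/Tc (multiples of 2): the regular structure of Table x-2.
on_grid = all(k % 4 == 0 and l % 2 == 0
              for k in range(N) for l in range(N) if X[k][l])

# A single flawed pixel destroys the regularity (cf. Tables x-3 and x-4).
img[5][5] = (img[5][5] + 1) % 256
Xf = fnt2d(img)
off_grid = any(Xf[k][l] and not (k % 4 == 0 and l % 2 == 0)
               for k in range(N) for l in range(N))
```

Here `on_grid` is True for the periodic image, and `off_grid` becomes True once the flaw is introduced, since the change propagates a non-zero term to every transform coefficient.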
[Table x-1: Simulated 16×16 image pattern with Tr×Tc = 4×8.]
[Table x-2: Two-dimensional FNT of Table x-1; the regular pattern can be clearly seen.]
[Table x-3: Simulated 16×16 image pattern with Tr×Tc = 4×8, with a single flawed (noise) pixel.]
[Table x-4: Two-dimensional FNT of Table x-3; the regular pattern is destroyed.]
These equations may be further generalised for any α and any modulus satisfying the criteria mentioned earlier, allowing similar results for the more general Number Theoretic Transforms (NTTs). In this way, similar behaviour can be investigated using NTTs such as the Mersenne Number Transform (MNT). The modulus used with the FNT determines the maximum number of samples N that may be considered in one image, and it is advantageous to choose a modulus that is a prime number. Only the first five Fermat numbers are prime and it will be necessary to investigate the suitability of these primes for detection. However, the error detection and location property is common to all Fermat numbers, and it makes sense to investigate transforms other than the FNT to see if they have especially useful properties for the purpose in hand. Obviously, the choice is dictated by the resolution and size of the images used. The sequence length N may be varied by selecting appropriate values for α and t, as described in Table x-5. In this fashion, a wide variety of sequence lengths may be selected, and various image formats accommodated. In this work, both F3 and F4 were used in the analysis, with α = 2 and α = √2.

F_t        N (α = 2)   N (α = √2)
2^8 + 1       16           32
2^16 + 1      32           64
2^32 + 1      64          128
2^64 + 1     128          256

Table x-5 - Sequence lengths for different Fermat moduli
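The α = 2 column of Table x-5 can be checked by computing the multiplicative order of 2 modulo F_t. A short sketch (our own helper, for illustration):

```python
def order(a, mod):
    """Smallest k > 0 with a**k = 1 (mod mod)."""
    k, x = 1, a % mod
    while x != 1:
        x = x * a % mod
        k += 1
    return k

# For F_t = 2**(2**t) + 1 and alpha = 2, the transform length is 2**(t+1),
# because 2**(2**t) = -1 (mod F_t), so squaring once more gives 1.
assert order(2, 2**8 + 1) == 16     # F3
assert order(2, 2**16 + 1) == 32    # F4
```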
RESULTS
To gauge the effectiveness of the technique, extensive experiments were carried out using a number of images, all with a resolution of 512×484 8-bit pixels. The tests carried out included real grey-level images taken directly via a CCD camera. Simple flaws such as rectangles, squares, etc. were manually introduced into the images. The technique was then tested and flaws were successfully detected, assessed individually and classified according to their sizes (i.e., area and compactness). Different values of α can be selected in order to increase the sensitivity of the technique while retaining the applicability of the FFT algorithm for fast computation (i.e., α = 2^β with β = 1, 2, 4, 8).
For the sake of illustration, the following two sets of figures show the results achieved using simulated images of 16×16 and 32×32 pixels with 256 shades of grey, respectively. The analysis was carried out using the third Fermat number (F3) with α = 2 and α = √2, respectively. In both cases, it can clearly be seen that the regular pattern of the template (defect-free) object image (or the pattern achieved by merging an image with a template) produces and highlights a regular structure in the transform domain (Fig 1(a)-(b) and Fig 2(a)-(b)). If a defect(s) is (are) introduced (Fig 1(c) and Fig 2(c)), the regularity of the FNT of the respective patterns is destroyed, thus indicating the presence of flaw(s) (Fig 1(d) and Fig 2(d)). Finally, the defect(s) can be easily recovered by computing the inverse FNT of the difference of the patterns of Fig 1(b) and Fig 1(d), and Fig 2(b) and Fig 2(d), respectively. This clearly demonstrates the ability of the method to isolate flaw pixels in a test image. In these examples the reader will see both single-pixel flaws and multi-pixel flaws. It is possible to classify multi-pixel flaws by further assessing their shapes and sizes using various methods. Work is currently being carried out in that direction and results will be disseminated.
Fig l(a). Template image = A
Fig 1(b). FNT(A) The regular structure of the transform is clearly visible.
Fig 2(a). Template image = A
Fig 2(b). FNT(A) Again the regular pattern is obvious.
Fig 1(c). Flawed image = B
Fig l(d). FNT(B) The regular structure of the transform domain is destroyed by the single pixel flaw.
Fig 2(c). Flawed image = B
Fig 2(d). FNT(B) Regular structure destroyed.
Fig l(e). FNT(B) - FNT(A) = C
Fig l(f). Inverse FNT(C)
Fig 2(e). FNT(B) - FNT(A) = C
Fig 2(f). Inverse FNT(C)
16×16 images
32×32 images
CONCLUSION AND FURTHER WORK
This paper describes a novel technique, based on FNTs, for use in the automated inspection of flaws in 2-D images. The principle involves firstly applying the FNT to a combination of an error-free image (master) and a flawed image (test). The errors detected are then assessed individually to ascertain the extent of the flaw(s). The technique has been successfully applied to a number of both simulated and real images. Work is currently underway to further enhance the results already obtained and to develop a practical automated visual inspection demonstrator. A comparison with existing techniques will also be carried out with a view to assessing the relative merits of the technique.
ACKNOWLEDGMENTS The authors would like to acknowledge the financial support of the Nuffield Foundation under grant SCI/180/95/204/G.
REFERENCES
[1] Bouridane, A. and Curtis, M.K., "A Parallel Processing Engine for Multi-Gray Level Flaw Detection", Parallel Computing, North-Holland, pp. 577-584, 1992.
[2] Vernon, D., "Machine Vision: Automated Visual Inspection and Robot Vision", IFS Publications Ltd., 1991.
[3] Torres, C., "Computer Vision - Theory and Industrial Applications", Springer, Berlin, 1992.
[4] Bouridane, A., et al., "CMOS VLSI circuits of pipeline sections for 32 and 64-point Fermat number transformers", Integration, the VLSI Journal, 8 (1989), Elsevier Science Publishers, pp. 51-64.
SEGMENTATION OF BIRCH WOOD BOARD IMAGES
D. T. Pham and R. J. Alcock
Intelligent Systems Laboratory, School of Engineering, University of Wales Cardiff, PO Box 917, Newport Road, Cardiff, CF2 1XH, U.K.

ABSTRACT
This paper describes a segmentation system designed to automatically detect defects on birch wood boards. A modular approach is adopted, with each module dedicated to detecting different defect types. The modules are global thresholding, row-by-row adaptive thresholding, multi-level thresholding and vertical profiling. Results are given for segmenting a large number of birch wood boards.

Key words: Segmentation, Automated Visual Inspection, Wood Inspection, Computer Vision, Quality Control.

1 INTRODUCTION
Birch wood veneer boards have a variety of uses including furniture, flooring and vehicle sides. A veneer board is made up of layers of wood sheets. For economic and conservation reasons, it is important for veneer board manufacturers to produce the maximum number of high-quality wood sheets from a given quantity of raw material. The quality of a sheet depends upon the number, type and size of the defects which it contains. To produce boards of different qualities, the wood sheets which comprise the boards are manually graded into quality categories. Due to the stress of the task, human graders do not achieve a high degree of accuracy. To increase accuracy, attempts are being made to replace manual inspection by Automated Visual Inspection (AVI), which employs a camera and image processing routines. The operation of an AVI system for wood inspection can be decomposed into a sequence of processing stages: image acquisition, image segmentation, feature extraction and classification (Fig. 1). First, an image of the sheet is acquired. Second, it is segmented into two areas: clear wood and defects. Third, features are extracted from each located defective region. Fourth, each region is classified into a type using the extracted features and a classifier.
Finally, the sheet is given a grade. Integral to the process of AVI is segmentation, which is defined as "the process which subdivides an image into constituent parts or objects" [Gonzalez and Woods, 1992]. The defects which need to be found by the segmentation module are: coloured streaks, hard rot, holes, pin knots, rotten knots, sound knots, splits, streaks and worm holes. These are illustrated in Fig. 2 together with an example of clear wood.

2 SEGMENTATION TECHNIQUES
Many different techniques are available for segmentation. The most commonly used include edge detection, region growing and thresholding [Gonzalez and Woods, 1992]. Edge detection was tried on birch wood images but two problems were encountered. First, the edges were often found to be incomplete, and then some method of edge linking was required to join the parts of the edges. Second, many erroneous edges were generated, and it was not trivial to determine that these did not represent object boundaries. To implement region growing, it is necessary to decide upon appropriate seeds and similarity criteria for the task. One of the simplest criteria is to join pixels according to the closeness of their grey-levels. However, separate regions can overflow into one another and extra processing is needed to prevent this. The simplest and fastest of segmentation methods is thresholding, and this has been used widely in the field of defect detection on wood boards [Cho et al, 1990; Conners et al, 1989; Kim and Koivo, 1994; Kothari et al, 1991]. Thresholding applied to images of wood boards is based on the idea that defects are either significantly darker or lighter than clear wood areas. The technique is very fast and simple but there are two major problems. First, it is often difficult to determine, automatically or even manually, the points at which the thresholds should be placed.
Second, from practical experiments it was discovered that not all defects differ significantly in grey-level from clear wood. Therefore, even the best automatic threshold selection algorithm would only be suitable for segmenting certain defect types.

3 SEGMENTATION MODULES
It was found that different defects require different segmentation techniques, so four modules were designed to segment the images. Global adaptive thresholding was employed for hard rot; multi-level thresholding for holes, rotten knots and splits; row-by-row adaptive thresholding for coloured streaks, pin knots, sound knots and worm holes; and vertical profiling for streaks. The techniques are explained more fully in [Pham and Alcock, 1996; Alcock, 1996].
3.1 Global Adaptive Thresholding
For very large dark defects, such as hard rot, it is possible to apply the technique of global thresholding for segmentation. The threshold can be determined from the grey-level histogram of the image. There are two basic types of histogram: unimodal and multimodal. A unimodal histogram has one central peak, whereas a multimodal histogram has more than one peak. A special case of the multimodal histogram is the bimodal histogram, which has two peaks. In the case of a unimodal histogram, it would normally be expected that the clear wood pixels are represented by the central part of the peak, with dark defects represented by the tail on the left-hand side and brighter defects by the right-hand tail. An entirely defect-free board should also be represented by a unimodal histogram. With a bimodal histogram, the right peak should represent clear wood pixels and the left peak should represent dark defects. Very bright defects, such as splits and holes, would be denoted by a peak on the far right of a multimodal histogram. To determine the threshold, it is necessary to find the number of peaks contained within the histogram, because the threshold is determined differently depending upon whether the histogram is unimodal or multimodal. Finding the threshold from bimodal histograms has been widely studied, and researchers have reported that the threshold should be placed at the valley in the histogram in order to segment the image [Gonzalez and Woods, 1992]. As previously mentioned, in an image of wood, if the histogram is bimodal then the rightmost peak should represent the clear wood. If the histogram is multimodal then only the two leftmost peaks should be considered; peaks at the right of the histogram represent open defects such as splits or holes. Determining the best threshold from unimodal histograms is a more difficult task and no individual technique is recommended by all researchers.
In the case of the images of birch wood, it was noticed that, for an image containing solely clear wood, the histogram approximately forms a normal distribution. From statistical theory it is noted that in a normal distribution, 95% of values will fall within a range of ±2 standard deviations from the mean. Using this information, the threshold was set at the mean minus two standard deviations.
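The mean-minus-two-standard-deviations rule above can be sketched as follows. The function names are ours; the paper gives no code:

```python
import numpy as np

def global_adaptive_threshold(image):
    """Threshold at mean - 2*std of the grey-level image (unimodal case)."""
    return float(image.mean() - 2.0 * image.std())

def segment_dark_defects(image):
    """Boolean mask: True where a pixel is darker than the clear-wood range."""
    return image < global_adaptive_threshold(image)
```

For a clear-wood histogram that is approximately normal, roughly 2.5% of clear pixels fall below this threshold, while large dark defects such as hard rot fall well below it.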
3.2 Multi-level Thresholding
The second segmentation module used was multi-level thresholding. As well as clear wood pixels forming an approximately normal distribution, it was also observed experimentally that they fall within a range of grey-levels in the central part of the histogram. It would be extremely unlikely for a clear wood pixel to have a very small or very large grey-level; therefore, it can reasonably be assumed that such pixels represent defects. Very dark pixels denote defects such as pin knots, sound knot centres and rotten knots. Very bright pixels, obtained when a back-light is used, represent mainly splits and holes. The advantage of this method of thresholding is that it is extremely fast in hardware. It may be argued that using fixed thresholds could potentially produce incorrect results because clear wood could have grey-levels above or below the chosen thresholds. However, if the thresholds are set at sufficiently high and low values, respectively, then no problems should arise. Indeed, if clear wood pixels do fall outside these thresholds then it is likely that a failure has occurred in a part of the system, such as the lighting or the camera. In this case, it is unlikely that any method of segmentation would be able to segment the image reliably, and so automatic grading should be stopped until the defective part of the system has been repaired or replaced.
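The fixed-threshold scheme of Section 3.2 can be sketched as follows. The cut-off values here are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

# Illustrative fixed cut-offs for 8-bit images (assumed, not the paper's).
T_DARK, T_BRIGHT = 40, 215

def multilevel_threshold(image):
    """Classify each pixel as dark defect, clear wood or bright (open) defect."""
    dark = image < T_DARK        # e.g. pin knots, knot centres, rotten knots
    bright = image > T_BRIGHT    # e.g. splits and holes under back-lighting
    clear = ~dark & ~bright
    return dark, clear, bright
```

Because each pixel is only compared against constants, the scheme maps directly onto fast hardware comparators, which is the advantage noted above.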
3.3 Row-by-Row Adaptive Thresholding
The third segmentation module used was based on a row-by-row approach. In a real-time system, images from a conveyor belt are usually captured by a line-scan camera, so valuable processing time would be saved if this information could be processed immediately as it is scanned. Practical work was carried out with an area-scan camera with an image size of 512 x 512 pixels but, to simulate the operation of a line-scan camera, the image was split into 512 horizontal lines and these were processed in order from top to bottom. The technique operates by taking the grey-levels for the row in the image which is currently being scanned. First, very high and low grey-levels in the row are replaced with more intermediate values. Then, the grey-levels are smoothed by averaging. Finally, the difference between adjacent pixels is reduced to below a pre-specified value. This technique gives expected values of pixels, which can be compared with the actual grey-levels of the pixels. If the calculated expected value of a pixel differs widely from its actual grey-level, this gives evidence that the pixel represents a defect: the more a pixel's expected grey-level differs from its actual grey-level, the more likely it is that the pixel represents a defect. This method finds defects in the horizontal direction, but if there is a long horizontal defect then the expected values in the horizontal direction will not differ significantly from the actual values. For this reason, pixels were also compared with previous rows to check if they differed significantly.

3.4 Vertical Profiling
The fourth segmentation module used was called vertical profiling. The method is used to detect streaks, which are a defect caused in production by a notch in the peeling knife. The streak will be parallel to the direction of motion of the sheet on the conveyor belt.
Because the sheet is thin, if a strong backlight is used then variations in sheet thickness cause differences in brightness. Therefore, streaks show up on the image as a dark vertical line. Vertical profiling operates by summing the grey-levels of each vertical line in the image and creating an array of these values. A dark vertical line in an image can then be found by searching for a valley in the profile.

4 RESULTS AND DISCUSSION
Tests were carried out using 75 images of birch wood boards. These were grey-scale images of size 512 x 512 and were obtained using combined illumination. The method of image acquisition is shown schematically in Fig. 3. The accuracy of the segmentation system in locating the nine different defect categories is given in Table 1. The results show that 93% of the defects in the images were correctly located. Less than 0.1% of the clear wood in the images was detected as objects. The two categories which caused the most difficulty were sound knots and hard rot. It is suggested that an X-ray scanner could be used to detect these defect types.

5 CONCLUSION
To segment images of birch wood boards, considering the disadvantages of commonly used segmentation methods, a new modular segmentation system was developed. Tests showed a segmentation accuracy of 93%.

ACKNOWLEDGEMENTS
The authors would like to thank the EPSRC (Total Technology studentship no. 92311168) and the European Commission (BRITE/EURAM contract BRE2 CT92 0251) for funding this research.

REFERENCES
Alcock, R.J. (1996) Techniques for Automated Visual Inspection of Birch Wood Boards. PhD Thesis, School of Engineering, University of Wales, Cardiff.
Cho, T.H., Conners, R.W. and Araman, P.A. (1990) A Computer Vision System for Automated Grading of Rough Hardwood Lumber Using A Knowledge-Based Approach. Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, Cambridge, MA., pp 345-350.
Conners, R.W., Ng, C.T., Cho, T.H. and McMillin, C.W. (1989) Computer Vision System for Locating and Identifying Defects in Hardwood Lumber. SPIE Vol. 1095, Applications of Artificial Intelligence VII, Columbia, SC., pp 48-63.
Gonzalez, R.C. and Woods, R.E.
(1992) Digital Image Processing (3rd Edn.). Addison-Wesley, Reading, MA.
Kim, C.W. and Koivo, A.J. (1994) Hierarchical Classification of Surface Defects on Dusty Wood Boards. Pattern Recognition Letters, Vol. 15, No. 7, pp 713-721.
Kothari, R. and Huber, H.A. (1991) A Neural Network Based Histogramic Procedure for Fast Image Segmentation. Proc. 23rd Sym. on System Theory, Columbia, S.C., pp 203-206.
Pham, D.T. and Alcock, R.J. (1996) Automatic Detection of Defects on Birch Wood Boards. Proc. Instn. Mech. Engrs, Vol. 210, pp 45-52.
[Fig. 1 - Automated Visual Inspection system framework for wood inspection: Wood Board → Image Acquisition → Image → Segmentation → Objects → Feature Extraction → Features → Classification → Defects → Grading → Grade]
Fig. 2 - Examples of the defects and clear wood
Fig. 3 - Schematic diagram of image acquisition

TYPE            NUMBER   FOUND   MISSED   %
Pin Knot            72      72       0   100
Sound Knot          15      11       4    73
Rotten Knot         14      14       0   100
Hole/Bark           39      39       0   100
Split               19      19       0   100
Worm Hole            8       8       0   100
Discoloration      150     138      12    92
Hard Rot            19      12       7    63
Streak              17      16       1    94
TOTAL              353     329      24    93

Table 1 - The results of segmentation
Techniques for Classifying Sugar Crystallization Images Based on Spectral Analysis and the Use of Neural Networks
Eloisa Susana González Palenzuela, Pastora I. Vega Cruz
Dpt. of Systems Engineering and Automatic Control, Faculty of Sciences, University of Valladolid, Spain. e-mail: [email protected]
Dpt. of Applied Physics, Faculty of Sciences, University of Salamanca, C/ Plaza de la Merced s/n, 37008 Salamanca, Spain. e-mail: [email protected]

Keywords: Image Analysis, Spectral Processing, Feature Extraction, Image Understanding, Neural Networks.

Abstract. This paper presents an image processing system for the automation of the visual supervision task carried out by operators in the sugar crystallization process. The system combines classical techniques of image pre-processing in the spatial domain, image recognition using spectral analysis of the image, and neural networks to form an intelligent sensor for the process. The paper focuses on the feature extraction which allows situations to be classified using a neural net, based on images taken from the real process. Two new techniques are presented in which the form of, and parameters derived from, the frequency spectrum of the image are employed as elements of classification. Some real examples are included, showing the system's reliability and suitability for the complete automation of supervision in the industrial process.

1. Introduction.
Sugar crystallization is one of the most significant stages within sugar manufacture, because it is there that the grain is obtained. At present, in most sugar factories, the different phases from the introduction of the syrup into the tank until the extraction of the full mass of sugar are automated.
During this process, a visual inspection carried out by the operator allows the detection of specific problems associated with the process; once any problem is detected, the process is put into manual and the necessary actions are executed to correct it. This task is needed in order to guarantee the adequate quality of the final product fixed by legal specifications (adequate size and homogeneous grains). It is therefore of interest to automate this part of the process, which is the aim of the present work. Vision techniques have been defined as a process of recognising objects of interest, which pre-supposes the understanding of the images [2], [3], [6]. A digital image processing system is divided into several stages. Segmentation, description and recognition are the most important: the first divides the image into its constituent objects; in the second, the characteristics are obtained in order to differentiate types of objects; and the third allows these objects to be labelled. In the extraction of the crystallization image features, it is more interesting to obtain a global understanding through some general image characteristics than to carry out a detailed description and recognition of them. This paper proposes a system, based on crystallization image processing together with the use of neural networks, that allows a qualitative analysis of the images to be carried out. Two techniques are presented in order to recognize the general image characteristics, both based on spectral analysis of the image. Given that the image's frequency spectrum is different in each of the situations presented, the spectrum form or spectrum-derived parameters can be employed as inputs to a neural network for the recognition of these situations. The utilization of neural networks in the processing and understanding of images has become extensive, especially associative learning networks [6], [7], [8].
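The kind of spectrum-derived parameters alluded to here can be illustrated with a short sketch. This is our own example, not the authors' method: the 2-D FFT magnitude is summarised as a radial energy profile, which could serve as a compact input vector to a classifier such as an LVQ network.

```python
import numpy as np

def radial_spectrum_features(image, n_bins=8):
    """Summarise the 2-D frequency spectrum of a grey-level image as a
    radial energy profile (fraction of spectral energy per frequency band)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = spec.shape
    cy, cx = h // 2, w // 2
    y, x = np.ogrid[:h, :w]
    r = np.hypot(y - cy, x - cx)
    r_max = r.max()
    features = np.empty(n_bins)
    for i in range(n_bins):
        mask = (r >= i * r_max / n_bins) & (r < (i + 1) * r_max / n_bins)
        features[i] = spec[mask].sum()
    return features / features.sum()
```

Intuitively, an image dominated by large homogeneous grains concentrates its energy in the low-frequency bins, whereas a fine, dense grain population spreads energy into the higher bins, giving distinguishable feature vectors for the situations described below.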
A Learning Vector Quantization neural network is used here in the classification of crystallization images. The paper is organized into six sections. First, a description of the industrial process and a statement of the problem to be solved are given. In the next section, a set of images is presented and a spectral analysis of them is carried out, from which are derived the features that allow the classification to be effected. Next, a brief explanation is given of the neural network employed and of the two techniques used for classification. Finally, some conclusions are given.

2. Industrial process description and problem presentation.
Sugar crystallization is produced in tanks named vacuum pans, where the sugar is separated from the syrup in the form of homogeneous and colourless crystals. In order to produce sugar, the vacuum pan is loaded with subsaturated syrup. This is concentrated by heating it under vacuum until conditions of oversaturation are reached. Then, sugar powder is fed into the
tank to seed the crystals, which are then made to grow by introducing more syrup and maintaining the conditions of oversaturation that allow the solution to yield the saccharose in excess over the seeded powder. When the maximum level in the pan has been reached and the crystals have a good shape, the tank is discharged and cleaned [1]. In addition to the instruments which measure the state of the vacuum pans, the process is also monitored through the properties of the mass within the tank, either by taking samples or by observing the growth of the crystals through a microscope located in the wall of the tank. The image acquisition is carried out through this microscope. Some vision indicators, which allow the development of the crystallization to be followed, have been established, and based on those indicators the following situations to be recognized in the images have been defined: a) to establish the initial density of sugar crystals, i.e. to indicate whether there is an adequate population of sugar grains at the beginning of the process. An initial excess of grains in the vacuum pans leads to a slow growth of the grains and consequently a very small size of sugar grains at the end of the process. b) to establish the distribution of grain sizes (i.e. a coarse size distribution) present at the beginning of the growth. At the end of the process, grains as homogeneous as possible are desired. In order to achieve this, similar grain sizes should be maintained from the beginning of the growth stage. c) to establish whether there are groups of crystals between the sugar grains. When the crystals are formed or grow side-by-side in a small space, neighboring faces of several crystals can join together, giving rise to groupings of crystals.
Once these composite grains have formed, they will grow like a normal crystal, although they are deformed, and at the end of the growth period they will have a greater size than the remainder of the grains, making the separation between the sugar and the syrup, in the last stage of sugar production, very difficult. It should be pointed out that the real images obtained in this process are complicated by the effect of non-uniform illumination, which produces zones of shade where the object and the background are not easily distinguished or where the contrast between the objects is very soft. It is assumed, nevertheless, that the best possible segmentation of the image has been obtained, which allows the recognition process proposed in this paper to proceed.
3. Spectral analysis of the images. A sample of real images showing the situations mentioned above, with the corresponding segmented images, is presented in Figs. 1 and 2. These images have been taken from a sugar factory located in Benavente (Spain). The images of Fig. 1 correspond to case a), i.e., the crystal population at the beginning of the process is adequate in Fig. 1a but too high in Fig. 1b. The first three images of Fig. 2 correspond to case b): the crystals present a quite uniform size in Fig. 2a, but in Fig. 2b and Fig. 2c several sizes of crystals can be observed. The fourth image, Fig. 2d, corresponds to case c), i.e., crystals with a very different form to the previous cases can be observed. Cases b) and c) can occur simultaneously during crystallization and are thus detected at the same time.
In order to extract the general image characteristics, a spectral analysis of these images is carried out. Given that the image's frequency spectrum is different in each of the situations presented, the spectrum form or spectrum-derived parameters (spectral maximum, high-energy bandwidth, total spectral energy, etc.) can be employed to distinguish them. For this purpose, the 2-D power spectral density functions, defined in [4], [5], are obtained from the Fourier transforms of the images. The power spectra of the images shown in Fig. 1 can be seen in Fig. 3. In general, there is a great difference in the form of the spectra. The image of Fig. 1a has less energy than the image of Fig. 1b, since the crystal quantity has been increased. Additionally, the power spectrum bandwidth is smaller for the former than for the latter. Figure 1: Population at the beginning of the process. In Fig. 4, the power spectra of the images corresponding to Fig. 2 are shown. Note that the maximum value of the power spectrum is higher for those images containing objects of bigger sizes.
Figure 2: Size distribution and group of crystals.
Figure 3: 2-D power spectra of Fig. 1 images. For these images, it can also be observed that the power spectrum concentrates more energy in a smaller number of frequencies (see Fig. 4c and Fig. 4a). The spectral energy is also higher for images with objects of bigger size.
Figure 4: 2-D power spectra of Fig. 2 images. The values of the spectrum-derived parameters for the images of Fig. 1 and Fig. 2 are given in Table 1. In particular, these parameters retain the image features and simplify the way the information contained in the spectrum is processed.

Table 1: Values of the spectrum-derived parameters.

Images                 Spectral    Right lim. of       Right lim. of       Spectral
                       maximum     frequency band w1   frequency band w2   energy
Adequate population      1.8602      2.86 cm^-1          3.64 cm^-1         147.645
Excess of population     7.9538      7.8 cm^-1           8.32 cm^-1         921.3818
Homogeneous size        28.1786      4.42 cm^-1          5.72 cm^-1         896.6836
Two sizes              193.3685     16.12 cm^-1         15.34 cm^-1        2596.9
Three or more sizes    158.5024     16.38 cm^-1         16.38 cm^-1        3555.4
Deformed crystals      546.8036     14.30 cm^-1         16.38 cm^-1        4033.0
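To illustrate how such spectrum-derived parameters might be computed, the following sketch (our own illustration, not code from the paper; the 95% energy threshold defining the high-energy band limit is an assumption, since the paper does not state its band definition) obtains the 2-D power spectrum of a grey-scale image and extracts a spectral maximum, a band limit and the total spectral energy:

```python
import numpy as np

def spectrum_parameters(image, pixel_size_cm=1.0, energy_fraction=0.95):
    """Sketch of spectrum-derived feature extraction for a grey-scale image.
    Returns (spectral maximum, right limit of the high-energy frequency
    band in cycles/cm, total spectral energy)."""
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    psd = np.abs(f) ** 2 / image.size                 # 2-D power spectrum

    # radial frequency (cycles per cm) of every spectral sample
    ny, nx = image.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny, d=pixel_size_cm))
    fx = np.fft.fftshift(np.fft.fftfreq(nx, d=pixel_size_cm))
    r = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))

    spectral_max = psd.max()
    total_energy = psd.sum()

    # band limit: smallest radius enclosing energy_fraction of the energy
    order = np.argsort(r.ravel())
    cum = np.cumsum(psd.ravel()[order])
    idx = min(np.searchsorted(cum, energy_fraction * total_energy), cum.size - 1)
    band_limit = r.ravel()[order][idx]
    return spectral_max, band_limit, total_energy
```

The three returned values correspond to the "Spectral maximum", band-limit and "Spectral energy" columns of Table 1, up to the (unstated) normalisation used by the authors.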
4. Image classification using a Learning Vector Quantization network. In this section the proposed image classification techniques are described. Both of them use a Learning Vector Quantization (LVQ) neural network, which consists of two layers: a competitive layer, where the neurons, distributed in the input space, recognize input vectors and classify them into subclasses, and a linear layer, which transforms the previous associations into target classifications defined by the user [10]. The main difference between the proposed techniques is the inputs selected for the net.
4.1. Classification based on the spectrum form. In this case, a representative region of the image spectrum is directly selected and used for training. The network architecture is similar to the one shown in Fig. 5, which has been implemented particularly for the case of crystal size distribution. The numbers of neurons in the hidden and output layers must be chosen adequately for each classification problem, and the number of inputs should be selected to include the distinguishing information of the image spectrum.
In order to illustrate the use of this method, the images shown in Fig. 2 were considered, the aim being to obtain the size distribution and to detect the presence of crystal grouping. In this example, the low frequency band, ±1 cm^-1, was selected because it included all the data of interest. Given that the power spectrum is symmetric, only half of it was used. This resulted in 1681 input parameters. The hidden (competitive) layer and output layer have four neurons, because this is the number of target situations.
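The LVQ training described above can be sketched as follows (our own NumPy illustration; the paper used the MATLAB Neural Network Toolbox [9], and the learning rate, epoch count and prototype initialisation here are our assumptions). The prototype vectors play the role of the competitive layer, and the fixed prototype-to-class assignment plays the role of the linear layer:

```python
import numpy as np

def train_lvq(X, y, n_classes, protos_per_class=1, lr=0.1, epochs=50, seed=0):
    """LVQ1 sketch: move the winning prototype towards inputs of its own
    class and away from inputs of other classes."""
    rng = np.random.default_rng(seed)
    W, labels = [], []
    for c in range(n_classes):
        Xc = X[y == c]
        W.append(Xc[rng.choice(len(Xc), protos_per_class, replace=False)])
        labels += [c] * protos_per_class
    W, labels = np.vstack(W).astype(float), np.array(labels)
    for epoch in range(epochs):
        a = lr * (1.0 - epoch / epochs)                    # decaying learning rate
        for xi, yi in zip(X, y):
            w = np.argmin(np.linalg.norm(W - xi, axis=1))  # competitive layer
            step = a * (xi - W[w])
            W[w] += step if labels[w] == yi else -step
    return W, labels

def predict_lvq(W, labels, X):
    """Linear layer: report the class of the nearest prototype."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]
```

For the spectrum-form technique, each row of X would hold the 1681 selected spectral samples and n_classes would be four, one per target situation.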
Fig. 5: Architecture of the LVQ network for classification with the spectrum form.

Table 2: Answers of the LVQ networks.

Images   Classification results
2b       Two sizes
2d       Deformed crystals
2a       Homogeneous size
2c       Three or more sizes
Some experimental results are presented in Table 2, illustrating the effectiveness of this technique which, in this case, was able to classify all the situations correctly.
4.2. Classification based on spectrum-derived parameters. A possible variation for carrying out the classification tasks is to use a set of parameters derived directly from the power spectrum. As mentioned before, this significantly reduces the dimension of the input vector.
The network architecture is shown in Fig. 6; in this particular case it has been selected to establish the initial density of sugar crystals (Fig. 1). The number of inputs is the same for all the situations (a, b, c), although the number of neurons in the hidden and output layers varies from one to another according to the objectives, as in the previous technique. In this application, two neurons were used to distinguish between an adequate initial population and an excess of crystal population. Given the reduction obtained in the dimension of the network, this variation produces the results much faster. Fig. 6: Architecture of the LVQ network for classification with the spectrum-derived parameters.
Table 3: Answers of the LVQ networks.

Images   Classification results
1b       Population excess
1a       Adequate population

Some experimental results are presented in Table 3, illustrating the effectiveness of this technique which, in this case, was able to classify all the situations correctly. 5. Conclusions.
This paper is focused on the solution of a real industrial problem, the automation of the monitoring task in the sugar crystallization process. In the current system, the supervision task is carried out by the operators through simple visual inspection. The paper presents two techniques that allow automatic supervision to be carried out, acting as an intelligent sensor of this process. The system was developed by combining classical techniques of digital image preprocessing with neural networks, and the proposed techniques were applied to real images taken from a sugar factory located in Benavente (Spain), showing their effectiveness and suitability for real implementation. The basis of the system is the extraction of global image features. This allows the detection of problems associated with sugar crystal growth, guaranteeing the correct performance of the system and, consequently, the final product quality. 6. References.
[1] Corujo, J. and García, A., "Crystallization: Theory and Practice". SGAE, Factory of Benavente, Spain.
[2] Marr, D., 1985, Vision.
[3] González, R. C. and Woods, R. E., 1992, Digital Image Processing. Addison-Wesley Publishing Company, Inc.
[4] Kay, S. M., 1988, Modern Spectral Estimation: Theory and Application. Englewood Cliffs: Prentice-Hall.
[5] Lynn, P. A., 1989, An Introduction to the Analysis and Processing of Signals, 3rd ed., Macmillan Education.
[6] Kulkarni, A. D., 1994, Artificial Neural Networks for Image Understanding. Van Nostrand Reinhold.
[7] Martínez, M., Cardeñoso, V. and Natowicz, R., Nov. 1994, "Level Flock Digital Image Segmentation Using Kohonen's Self-Organizing Feature Maps", in Proceedings of the AMCA/IEEE International Workshop on Neural Networks Applied to Control and Image Processing, NNACIP'94, pp. 53-61.
[8] González, E. S. and Prada, C., Mar. 1996, "Algorithm for the object position determination in an image based on associative learning networks". 5th International Congress of Informatic Technology News in Havana, Informatic'96, Cuba.
[9] Demuth, H. and Beale, M., Jun. 1994, Neural Network Toolbox for Use with Matlab: User's and Reference Guides. Natick, Mass: The MathWorks, Inc.
Large-scale Electrical Tomography Sensing System to Study Mixing Phenomena M. Wang 1, R. Mann 1, F. J. Dickin 2, T. Dyakowski 1.
1Department of Chemical Engineering, UMIST, Manchester M60 1QD (UK) 2Department of Electrical & Electronics Engineering, UMIST, Manchester M60 1QD (UK)
Abstract A large-scale electrical tomography system has been designed and recently installed at UMIST to better understand imperfect mixing and to improve the design of stirred vessels and their impeller configuration at plant scale [1]. The sensing system was constructed with 8 planes of sensing rings, each containing 16 electrodes, and installed in a 2.7 m3 polypropylene vessel fitted with a standard Rushton turbine. The signal processing components, together with an example of an application, are presented in this paper.
Introduction Over the last two years, UMIST has designed and constructed an ERT data acquisition system (DAS) [2] for applications monitoring the conductivity distribution inside process vessels and pipelines [3]. Resistance tomography can detect local changes in conductivity [4], so the technique can be used, for example, to look at the mixing of a strong salt solution into a weaker background brine. The basic feasibility of using ERT to measure pulse tracer tests and semi-batch addition for miscible single-phase fluid mixing has already been established at the 30 dm3 semi-tech scale [5]. However, the complexity of mixing processes, and as a consequence the difficulty in scaling up, requires the design and development of a tomographic sensing system to interrogate mixing processes at full plant scale. Two parts of the initial studies, the sensor system and its application to mixing, are summarised in the following sections.
Electrical resistance tomographic sensing system Unlike medical ERT systems, in which the sensors are in contact with human skin, the sensors in process ERT must be in continuous electrical contact with the electrolyte inside the process vessel. The effects of electrode size, the object's 3-D orientation, ac common voltage and dc electrode potentials at the electrode-electrolyte interface are all significant in applications of ERT for process engineering. The size and the material of the electrodes are both important in producing and sensitively measuring the electric field distribution. The electrode size effects were simulated by grouping a number of boundary nodes using a 2-D grouped-node finite element method (FEM), with a low conductivity object (0.0011 mS/cm) placed in a 0.11 mS/cm background FEM mesh [6]. The results obtained from the simulation, using different numbers of grouped nodes from one to four, indicate that the smaller the size of the electrode, the higher the sensitivity of the voltage measurement at the measurement electrodes and the higher the common voltage at the current-driven electrodes, when measurement and current excitation use the same electrode system [1]. The electrode angle was selected as 8.2°, as used in previous laboratory-scale experiments, which for the 1.5 m vessel diameter gives 10 cm wide and 4 cm high plates made of stainless steel. The boundary measurements of the electric field from a single plane of sensors are not governed solely by the properties in the 2-D cross section. To obtain a 3-D field distribution, an 8-plane set of sensing rings, each containing 16 electrodes, has been designed and constructed for the 2.7 m3 plant-scale
stirred vessel of principal dimension 1.5 m (Figure 1). An image of three 1.5 litre cone-flask phantoms, positioned differently inside the vessel, was obtained by the sensing system. The resultant image clearly shows their positions inside the stirred vessel when a 2-D modified sensitivity coefficient algorithm (Equation 1) was used to reconstruct these images (Figure 2).
where the relative conductivity value, σ_m, at pixel m is represented by a column matrix P, [η]_n represents the relative value of the boundary measurement, M is the number of pixels, N is the number of boundary measurements, and σ₀ is the conductivity in the homogeneous case used for the reference measurement. The normalized sensitivity matrix [κ]_{m,n} is given by equation (2), using the sensitivity coefficients adopted from Kotre's method
[7]:

κ_{m,n} = a_{m,n} / a_m        (2)

Figure 1: Sensor construction
Figure 2: Detection of three phantoms
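Equation (1) is not fully legible in this reproduction, but a sensitivity-weighted backprojection of the kind described can be sketched as follows (our own illustration only: the per-pixel normalisation of the sensitivity matrix below is an assumption standing in for equation (2)):

```python
import numpy as np

def backproject(eta, S, sigma0=1.0):
    """Sensitivity-weighted backprojection sketch (in the spirit of Kotre [7]).
    eta    : (N,) relative boundary measurements
    S      : (M, N) sensitivity coefficients (pixel m vs. measurement n)
    sigma0 : reference conductivity of the homogeneous case
    Returns the (M,) vector of relative pixel conductivities."""
    K = S / S.sum(axis=1, keepdims=True)   # normalised sensitivity matrix
    return sigma0 * (K @ eta)
```

With this normalisation, a uniform set of boundary measurements backprojects to a uniform conductivity image, which is the behaviour expected of the reference (homogeneous) case.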
An electrode used in ERT is a transducer which converts the electric current in a wire into an ionic current in an electrolyte. The behaviour at the electrode-electrolyte interface is predominantly an electrochemical reaction. If a metal electrode is immersed in an electrolyte in process resistance tomography, ions will diffuse into and out of the metal electrode. An equilibrium will be established which gives rise to a dc electrode potential whose magnitude depends on both the nature of the metal and the electrolyte. The values of electrode potentials for some commonly used metals, measured with reference to a standard hydrogen electrode [8], are listed below: Iron −440 mV, Lead −126 mV, Copper +337 mV, Platinum +1190 mV.
Figure 3: Schematic equivalent circuit for the electrode-electrolyte interface
It might be thought that, as two electrodes are used in the ERT sensing strategy, the electrode potentials should cancel, but in practice the cancellation is not perfect, and the residual can range from several mV to several hundred mV. The differential voltages of the dc electrode potentials can be very much bigger than the amplitude of the ac measurement signals. The reasons for this are, firstly, that the two electrodes are not identical and, secondly, that the electrode potentials change with time [8]. A simplified equivalent circuit for an electrode-electrolyte interface, ignoring the ion diffusion effect in the majority of applications, has been suggested by Hill and Dolan [9], as shown in Figure 3, where V represents the dc electrode potential, and R_c and C_H are the charge transfer resistance and the double layer capacitance. In practice, both the charge transfer resistance and the double layer capacitance can be ignored and the bulk resistance plays the main role at high frequencies, when a current injection measurement strategy is adopted. A simple buffered ac coupling circuit was designed to cancel the electrode dc potentials and speed up the data acquisition. The front-end circuit for measurement is shown in Figure 4. A total of 1024 such switches are used for a sensing system with 128 electrodes. Because hundreds of metres of coaxial cable are used to lead the electrodes, a screen-driven technique was adopted to reduce the cable stray capacitance to less than 5% of the inherent value. Only two measurement channels are driven at any one time, in order to minimise the power consumed and reduce cross-talk. All channels can be independently operated by software control, which leaves the sensing strategies flexible to adapt to different measurement strategies. To reduce the common voltage, a current source pair and a grounded floating measurement (GFM) are adopted [2].
Figure 4: Front-end circuit

A data acquisition speed of 40 ms per frame at a 9.6 kHz signal frequency was achieved using this circuit. The total data acquisition time for the 8-plane sensing system is 0.32 s at a signal frequency of 9.6 kHz.
Application to miscible fluid mixing A sequence of pulse injection mixing images was obtained for a standard Rushton turbine stirring at 75 rpm, when 10 litres of concentrated salt solution (12.4 mS/cm) was injected into the stirred vessel filled with mains tap water (0.1 mS/cm), as shown in Figure 5. Images were reconstructed with the algorithm given in Equation (1), which gives an absolute conductivity with errors of less than ±5% in the homogeneous case. The value obtained after the mixing was completed was used as a mixing index, presenting the mixing as a sequence of isosurface solid-body images interpolated from the eight 2-D image slices. During the first 14 s of the mixing pulse test, a strong radial outflow and a strong swirl in an anticlockwise sense are demonstrated in Figure 5. At 24 s, more of the brine has been mixed and swirled asymmetrically into the lower part of the vessel, and back to the upper part of the vessel at 34 s. Finally, after 44 s the mixing is close to completion. The final conductivity, measured using a CIBA-CORNING conductivity meter after the mixing was completed, was 0.1625 mS/cm.
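The step of assembling the eight tomogram planes into a solid-body isosurface can be sketched as follows (our own illustration: simple linear interpolation along the vessel axis followed by thresholding at the mixing-index value; the interpolation density z_factor is an assumption, as the paper does not specify its interpolation scheme):

```python
import numpy as np

def isosurface_mask(slices, iso=0.155, z_factor=4):
    """slices : (n_planes, H, W) conductivity tomograms in mS/cm.
    Linearly interpolates between adjacent sensing planes and thresholds at
    the mixing-index isosurface value, returning a boolean 3-D volume."""
    n = slices.shape[0]
    zi = np.linspace(0.0, n - 1.0, (n - 1) * z_factor + 1)
    lo = np.floor(zi).astype(int)          # plane below each sample height
    hi = np.minimum(lo + 1, n - 1)         # plane above each sample height
    t = (zi - lo)[:, None, None]           # interpolation weight
    volume = (1.0 - t) * slices[lo] + t * slices[hi]
    return volume >= iso
```

Applied to the brine-pulse data, the True region of the mask would correspond to the solid bodies shown in Figure 5 at the 0.155 mS/cm isosurface value.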
Figure 5: Brine tracer pulse mixing for a standard Rushton turbine at 75 rpm (10 litres of brine (12.4 mS/cm) was injected into a 0.1 mS/cm background. The mixing index/isosurface value for these solid-body images was 0.155 mS/cm)
Conclusions A large-scale electrical tomography system for monitoring mixing processes has been successfully constructed in a 2.7 m3 plant-scale stirred vessel using 8 planes of sensing rings, each containing 16 electrodes. The 3-D electrical resistance distribution can be assembled in 0.3 s using a set of eight tomograms obtained by the sensing system. A typical application to miscible fluid mixing demonstrates that the system can be used as a tool to validate data from chemical engineering CFD modelling and also to monitor mixing processes at an industrial scale, although so far only qualitative images have been reconstructed.
References
[1] Mann, R., Wang, M., Dickin, F.J., Dyakowski, T., Forrest, A.E., Holden, P.J. and Edwards, R.B., 1996, Resistance tomography imaging of stirred vessel mixing at plant scale, in Fluid Mixing 5, IChemE Symposium Series No. 140, pp. 155-166.
[2] Wang, M., Dickin, F.J. and Beck, M.S., 1993, Improved electrical impedance tomography data collection system and measurement protocols, in Tomographic Techniques for Process Design and Operation, edited by Beck, M.S., Campogrande, E., Morris, M., Williams, R.A. and Waterfall, R.C., Computational Mechanics Publications, pp. 75-88.
[3] Dickin, F.J. and Wang, M., 1996, Electrical resistance tomography for process applications, Meas. Sci. Technol., 7, pp. 247-260.
[4] Barber, C.D., Brown, B.H. and Freeston, I.L., 1983, Imaging spatial distributions of resistivity using applied potential tomography, Electron. Lett., 19, pp. 933-935.
[5] Williams, R.A., Mann, R., Dickin, F.J., Ilyas, O.M., Ying, P., Edwards, R.B. and Rushton, S., 1993, Application of electrical impedance tomography to mixing in stirred vessels, A.I.Ch.E. Symp. Series, 293, 8.
[6] Wang, M., Dickin, F.J. and Williams, R.A., 1995, Group-node technique as a means of handling large electrode surfaces, Physiol. Meas., 16, Suppl. 3A, pp. 219-226.
[7] Kotre, C.J., 1994, EIT image reconstruction using sensitivity weighted filtered backprojection, Physiol. Meas., 15, 2A, pp. 125-136.
[8] Brown, B.H. and Smallwood, R.H., 1981, Medical Physics and Physiological Measurement, Blackwell Scientific Publications, Oxford.
[9] Hill, D.W. and Dolan, A.M., 1982, Intensive Care Instrumentation, Academic Press, London.
Session W: IMAGE ANALYSIS II
Structural Indexing of Infra-red Images using Statistical Histogram Comparison Benoit Huet¹ and Edwin R. Hancock² Department of Computer Science, University of York, York YO1 5DD, UK Abstract This paper aims to develop simple statistical methods for indexing into an aerial image database using a cartographic model. The images contain severe distortions due to the acquisition process and the data cannot be recovered by applying a simple Euclidean transform to the model. The underlying representation of the infra-red image map within the database is based on histograms of line-segment pair relative angles. We investigate several alternative methods for histogram comparison and conclude that statistical distance measures (such as Bhattacharyya, Matusita and Divergence) provide significant performance improvements over the standard L1 and L2 norms.

1. Introduction
This paper aims to evaluate the effectiveness of various statistical distance measures in the indexing of images using pairwise geometric histograms. The idea of image indexing using attribute histograms was first exploited in the colour domain by Swain and Ballard [9]. It has since been shown to be effective for indexing according to geometric attributes. For instance, Dorai and Jain [1] have exploited the idea to index range images using a histogram of surface normal orientation. Despite providing an effective means of indexing, these techniques all rely on comparing histograms using a simple L2 norm. Our aim in this paper is to compare the effectiveness of five different probabilistic distance measures in gauging histogram similarity. In particular, we compare the L1 norm, the L2 norm, the Bhattacharyya distance, the Matusita distance and the divergence.
2. The Application Domain
The application vehicle for our study is the problem of indexing into a database of remotely sensed images using a digital map. The problem of accurately and rapidly performing content-based retrieval of images is very complex and is currently an active area of research [5, 6, 2, 8, 7, 4]. The image database used in our study consists of 22 infra-red line scan images. These images are of both rural and urban areas. The images are formed by a line-scan process in the horizontal direction and by aircraft motion in the vertical direction. The main features are man-made road structures which radiate strongly at night time in the infra-red band. These features present themselves as intensity ridges in the infra-red images and are extracted using a relaxational line-finder [3]. Straight-line segments extracted from the infra-red images are used to compute a histogram of pairwise angle differences (see figure 1). We compare the data histograms with those extracted from the cartographic information for road networks in the digital map. Because of the line-scan process used to generate the infra-red images, there is a significant barrel distortion in the horizontal direction. In other words, there is distortion of the data histograms with respect to the model. Some images in the database (figure 3) cover the same area as the digital map (sout60) but are taken at a different aircraft altitude (sout170). Furthermore, images sout60_07.0, sout60_10.0, ..., and sout60_25.0 belong to the same original infra-red image (sout60) but different feature extraction processes have been used to obtain their model data. This gives rise to significant variation in the structure of the line-set.
3. The experiment
The main examples included here illustrate the effectiveness of alternative distance measures when compared to the more common distance norms. The normalised histograms P_D(i) and P_M(i) are composed of 18 distinct bins, each representing the frequency of occurrence of an angle between the lines within each line-set. The smallest angle created by the intersection of two lines ranges between 0 (for collinear lines) and π/2. Since our histogram contains 18 bins, each bin i covers an angle range of π/36.

¹huetb@minster.york.ac.uk   ²erh@minster.york.ac.uk
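The construction of this pairwise-angle histogram can be sketched as follows (our own illustration; the representation of segments as endpoint coordinates is an assumption):

```python
import numpy as np

def pairwise_angle_histogram(segments, n_bins=18):
    """segments: array (K, 4) of line segments (x1, y1, x2, y2).
    Returns the normalised n_bins-bin histogram of the smallest angle between
    every pair of segments, binned over [0, pi/2] in steps of pi/(2*n_bins)."""
    d = segments[:, 2:] - segments[:, :2]
    theta = np.arctan2(d[:, 1], d[:, 0])        # segment orientations
    i, j = np.triu_indices(len(theta), k=1)     # all unordered pairs
    diff = np.abs(theta[i] - theta[j]) % np.pi
    diff = np.minimum(diff, np.pi - diff)       # smallest angle, in [0, pi/2]
    hist, _ = np.histogram(diff, bins=n_bins, range=(0.0, np.pi / 2))
    return hist / hist.sum()
```

Because only relative angles between segment pairs are used, the resulting histogram is invariant to rotation and scaling of the line-set, which is the property exploited for indexing.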
• L1 Norm:                L1(P_D, P_M) = Σ_i |P_D(i) − P_M(i)|

• L2 Norm:                L2(P_D, P_M) = √( Σ_i (P_D(i) − P_M(i))² )

• Bhattacharyya Distance: B(P_D, P_M) = −ln Σ_i √(P_D(i) × P_M(i))

• Matusita Distance:      M(P_D, P_M) = √( Σ_i (√P_D(i) − √P_M(i))² )

• Divergence:             D(P_D, P_M) = Σ_i (P_D(i) − P_M(i)) ln( P_D(i) / P_M(i) )
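For reference, the five measures above can be written directly in Python/NumPy (a sketch; the epsilon guarding empty histogram bins in the divergence is our addition, since the formula is undefined when a bin is zero):

```python
import numpy as np

def l1_norm(p, q):
    return np.sum(np.abs(p - q))

def l2_norm(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def bhattacharyya(p, q):
    return -np.log(np.sum(np.sqrt(p * q)))

def matusita(p, q):
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def divergence(p, q, eps=1e-12):
    # guard empty bins so the logarithm stays finite
    return np.sum((p - q) * np.log((p + eps) / (q + eps)))
```

All five take a pair of normalised 18-bin histograms and return zero (up to floating-point error) when the histograms are identical.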
Figure 1: A typical infra-red image going through the processing steps leading to the histogram representation. To provide some illustrative examples of our methodology, Figure 1 shows the sequence of processing steps from an infra-red radar map image leading to the extraction of the histogram. Figure 1(a) is the raw image. Figure 1(b) is the result of applying straight-line detection to the output of a probabilistic relaxation line detector. Finally, Figure 1(c) shows the computed histogram based on line-pair angles. Table 1 presents the response of the various distance measures stated above to a digital map image (see figure 2(a)) corresponding to infra-red images sout60, sout60_07.0, sout60_10.0, ..., sout60_25.0 and sout170 (with some viewpoint variations). At first sight, since all the techniques identify sout60 as having the closest histogram similarity to the model data, we might conclude that they deliver comparable performance. However, it should also be noted from these results that the L1 and L2 norms do not perform as well as the other metrics. In particular, the L1 and L2 norms are the only measures providing an incorrect classification of sout170. A graphical representation of the classification made by the Bhattacharyya measure on the database can be seen in figure 3 (only the first 20 best matches, excluding the multiple representations of sout60). The images are ordered according to their goodness of match with the query image (orig60, figure 2(a)). In figure 3, the images are ordered from left to right, top to bottom, with the top-left image having the most similar content to the digital map. The system was also asked to find the most similar infra-red images compared to infra-red image sout90 (figure 3). All distance methods responded with the correct first two matches, sout90 and sout190 (higher aircraft altitude), except the L1 and L2 norms, which found a correct best match (sout90) but failed to provide the true second best match.
Figure 2: Typical images used by the system

4. Conclusion and future work
We have described a way of indexing infra-red aerial images using a digital map based on structure histograms. We have demonstrated that the L1 and L2 norms are not particularly well suited to our histogram matching problem. However, the Bhattacharyya, Matusita and Divergence distance measures are able to correctly index the database of infra-red images using either digital data (digital maps) or real data (infra-red images) as the query. We have shown that a histogram constructed from line-pair angle measurements can be used, together with suitable distance measures, to retrieve by similarity from an image database with invariance to scale and rotation. Furthermore, this technique is not affected by the output of the line extraction process, since images processed using different line extraction parameters have similar rankings at retrieval. Among the future developments of the system, we are currently investigating the use of multidimensional histograms. This would allow the system to compare images based on other invariant structural measurements.

Table 1: Distance measure responses to the query image depicted in figure 2(a). Each column lists the database images ranked by the corresponding measure, with the measured value alongside.

L1 norm: sout20 0.6760, sout100 0.6701, sout10 0.6700, sout200 0.6629, sout210 0.6407, sout140 0.6258, sout70 0.6151, sout190 0.5784, sout40 0.5626, sout80 0.5620, sout220 0.5562, sout110 0.5519, sout120 0.5446, sout30 0.5063, sout160 0.5011, sout150 0.4997, sout50 0.4958, sout90 0.4885, sout180 0.4745, sout60_10.0 0.4685

L2 norm: sout210 0.2302, sout100 0.2263, sout200 0.2258, sout10 0.2109, sout20 0.2025, sout140 0.1944, sout120 0.1903, sout190 0.1893, sout30 0.1799, sout40 0.1759, sout70 0.1732, sout110 0.1703, sout80 0.1660, sout160 0.0544, sout220 0.1596, sout170 0.159(?), sout50 0.1578, sout90 0.1562, sout180 0.1515, sout130 0.1505

Bhattacharyya: sout10 0.1271, sout20 0.1154, sout210 0.0914, sout100 0.0795, sout140 0.0757, sout200 0.0748, sout40 0.0748, sout70 0.0680, sout190 0.0677, sout110 0.0600, sout80 0.0584, sout120 0.0573, sout50 0.0565, sout160 0.3256, sout220 0.0543, sout30 0.0491, sout150 0.0475, sout90 0.0474, sout130 0.0459, sout180 0.0437

Matusita: sout10 0.4886, sout20 0.4669, sout210 0.4180, sout100 0.3911, sout140 0.3819, sout200 0.3797, sout40 0.3797, sout70 0.3627, sout190 0.3618, sout110 0.3414, sout80 0.3369, sout120 0.3338, sout50 0.3314, sout160 0.4342, sout220 0.3253, sout30 0.3096, sout150 0.3046, sout90 0.3045, sout130 0.2997, sout180 0.2924

Divergence: sout10 0.9948, sout210 0.7366, sout20 0.6982, sout100 0.6272, sout40 0.6093, sout140 0.6037, sout200 0.5894, sout70 0.5460, sout190 0.5381, sout110 0.4807, sout80 0.4647, sout50 0.4569, sout120 0.4550, sout160 0.1649, sout220 0.4317, sout30 0.3913, sout150 0.3785, sout90 0.3778, sout130 0.3703, sout180 0.3491
References
[1] C. Dorai and A.K. Jain. View organisation and matching of free-form objects. IEEE Computer Society International Symposium on Computer Vision, pages 25-30, 1995.
[2] T. Gevers and A.W.M. Smeulders. Enigma: An image retrieval system. International Conference on Pattern Recognition (ICPR) 1992, pages 697-700, 1992.
[3] Edwin R. Hancock. Resolving edge-line ambiguities using probabilistic relaxation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'93), pages 300-306, 1993.
[4] A.K. Jain and A. Vailaya. Image retrieval using color and shape. Pattern Recognition, 29(8), pages 1233-1244, August 1996.
[5] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin. The QBIC project: Querying images by content using color, texture and shape. Image and Vision Storage and Retrieval, 1993.
[6] A.P. Pentland, R.W. Picard, and S. Sclaroff. Photobook: tools for content-based manipulation of image databases. Storage and Retrieval for Image and Video Databases II, pages 34-47, February 1994. San Jose, California.
[7] R.W. Picard. Light-years from Lena: Video and image libraries of the future. IEEE International Conference on Image Processing, 1:310-313, 1995.
[8] M.J. Swain. Interactive indexing into image databases. Image and Vision Storage and Retrieval, 1993.
[9] M.J. Swain and D.H. Ballard. Indexing via colour histograms. Third International Conference on Computer Vision, pages 390-393, 1990.
Figure 3: Result of querying the Infra-Red Radar Map Database with the Digital Map (see Figure 2(a)) using the Bhattacharyya distance measure
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
A MODEL-BASED APPROACH FOR THE DETECTION OF AIRPORT TRANSPORTATION NETWORKS IN SEQUENCES OF AERIAL IMAGES
D. Sarantis and C.S. Xydeas
Division of Electrical Engineering, School of Engineering, University of Manchester, Dover St., Manchester M13 9PL, UK
tel.: +44 (161) 275 4511, fax: +44 (161) 275 4528
e-mail: [email protected]

ABSTRACT
The detection of Airport Transportation Networks (ATNs) in sequences of aerial images is an important task in applications related to the autonomous navigation and landing of aircraft. It is also a complex and demanding problem, due to "variability" in the structure and characteristics of ATNs and to the various forms of "noise" which may corrupt the input sequences. In addition, the complexity of the task increases considerably when perspective effects are present in the images. In order to solve this problem effectively, additional information, in terms of both the aircraft navigation parameters and the airport area, should be "fused" together with the input visual information. This paper describes such an approach for detecting ATNs in airport aerial image sequences obtained by a forward looking airborne camera. Computer simulation results are presented for the system operating on real airport aerial image sequences, and demonstrate clearly the advantage of fusing information obtained from other image independent data sources with the input images. System performance depends on the quality of the input image and the image independent data. At a range of 3.4 to 2.9 km the scheme detects and tracks main ATN elements, such as the main runway, with a 0.36° mean absolute orientation error and a relative length error of 0.2%.
1. INTRODUCTION Currently, advanced work on Avionic imaging systems is focused on solutions that provide the pilot and possibly other aircraft systems, with accurate information on ground objects and structures of specific interest present in aerial video sequences. This is a complex and demanding problem due to (i) the variability in the characteristics of these structures and (ii) the various types of "noise" which usually corrupt the input visual information. Furthermore, the complexity of the problem increases considerably when the input aerial images exhibit strong perspective effects. Schell and Dickmanns [1] proposed an aircraft related computer vision system which locates "runway" image regions in synthetic or real images. These regions were defined initially, using a detailed runway geometric model and were then tracked throughout the video sequence using an Extended Kalman Filter (EKF) approach. Another model-based technique which employed a-priori object geometric model information and camera position data, as provided by the aircraft's instruments, is described in [2]. In this case image regions, in semi-synthetic Passive Millimeter-Wave image sequences, are examined using features extracted from these regions, in order to identify runway instances. Furthermore, recent reports [3,4] on similar systems which are able to cope with perspective images but operate on still aerial images, also suggest the use of other "image independent" information sources. Within this theme, a model-based approach for detecting and tracking Airport Transportation Network (ATN) elements in sequences of obliquely captured airport aerial images is presented in this paper together with experimental results from tests using real data. 
The proposed approach combines information, that is extracted from grey-level CCD image sequences, with that obtained from the following sources: (i) Information related to the imaging system (CCDCD) and specified in terms of the CCD camera focal length, its position on the aircraft, its orientation relative to the ground, and the intrinsic parameters of
the camera/video cassette/digitiser system. (ii) The aircraft position and orientation, which is supplied by the Inertial Navigation System (INS) and the Global Positioning System (GPS) of the aircraft. (iii) A Digital Terrain Elevation Map (DTEM) of the viewed area. (iv) A simple Target Model (TM), which consists of a list of objects that appear in a large-scale map of the airport area. (v) General ATN structural characteristics, which include general design information associated with ATN international standards [5,6]. The architecture of the proposed system is described in the following section, which explains how the above image independent information is used in assisting the image analysis and understanding procedures. The final section of the paper provides computer simulation results, which highlight the performance of this system when it is tested with real airport aerial video sequences.

2. SYSTEM DESCRIPTION
ATN structures, in this work, include runways and cross-roads and can be described adequately by pairs of Straight Line Segments (SLSs). The Hough Transform (HT) approach [7] may be used to group edge pixels into straight lines, in a way which effectively exploits its robust operation when noise and discontinuities are present in the image. However, in the current application, the complexity of the image data precludes the use of a standard HT technique, and led to the development of a more powerful HT scheme that utilises the local properties of the structures of interest [8]. Thus, randomly selected pairs of edgels, which satisfy certain similarity and proximity constraints, are used within the HT framework in order to define possible ATN related line directions. In this way "noise", usually present in the HT parameter space (HS), is significantly reduced, which in turn allows image features to be detected more accurately.
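The pairwise edgel-voting idea described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name, the proximity threshold `max_dist`, the edge-direction tolerance `angle_tol` and the accumulator bin counts are all assumed values.

```python
import math
import random
from collections import Counter

def pairwise_hough(edgels, n_pairs=5000, max_dist=40.0, angle_tol=0.2,
                   theta_bins=180, rho_bins=200, rho_max=512.0, seed=0):
    """Vote for (theta, rho) line parameters using randomly drawn edgel
    pairs, keeping only pairs that satisfy proximity and edge-direction
    similarity constraints.  Each edgel is (x, y, gradient_angle)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_pairs):
        (x1, y1, a1), (x2, y2, a2) = rng.sample(edgels, 2)
        dx, dy = x2 - x1, y2 - y1
        if math.hypot(dx, dy) > max_dist:              # proximity constraint
            continue
        if abs(math.atan2(math.sin(a1 - a2), math.cos(a1 - a2))) > angle_tol:
            continue                                    # similarity constraint
        theta = math.atan2(dy, dx) + math.pi / 2        # normal of the joining line
        rho = x1 * math.cos(theta) + y1 * math.sin(theta)
        t_bin = int((theta % math.pi) / math.pi * theta_bins) % theta_bins
        r_bin = int((rho + rho_max) / (2 * rho_max) * rho_bins)
        if 0 <= r_bin < rho_bins:
            votes[(t_bin, r_bin)] += 1
    return votes  # peaks in this accumulator are candidate line directions
```

Because only constrained pairs vote, accumulator peaks are far less cluttered than with the standard one-pixel-one-curve HT, which is the effect the text describes.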
Specifically, line directions which are likely to correspond to parts of ATN elements are selected on the basis of (i) their distance from the projection of the ATN Target Model (ATN-TM) into the HS and (ii) their "strength" in this parameter space. Thus, ATN-TM elements are described parametrically by pairs of SLSs, with a degree of uncertainty that depends on the accuracy of the large-scale airport site map information. This information is effectively projected to HS, using INS/GPS data, camera related parameters and the transformation functions defined in [8] and [9]. The distance between the line directions of the parts of the m-th ATN-TM element v_m,TM and a line direction v_HS identified by a HS peak takes the form of the Mahalanobis distance measure [10]:

d(v_m,TM, v_HS) = (v_m,TM − v_HS)^T Σ⁻¹_(v_m,TM − v_HS) (v_m,TM − v_HS)    (1)
where Σ⁻¹_(v_m,TM − v_HS) is the inverse covariance matrix required in evaluating this measure. This covariance matrix is estimated by projecting TM related location uncertainties, as originally defined in a world co-ordinate system, into the above HS. The projection is achieved via transformation functions (see [8,9]). Only line directions satisfying:
A_m(D) = { v_HS ∈ HS : d(v_m,TM, v_HS) < D }    (2)

for the m-th ATN-TM element are considered. From this set of qualified line directions, instances v_m,c that satisfy:

B_m = { v_m,c : v_m,c^s > v_i,HS^s, ∀ i ∈ A_m(D) }    (3)
are defined as those corresponding to parts of the m-th ATN-TM element in the HS. The index s in Equation (3) denotes the "strength" of the qualified line direction in the HS domain. Furthermore, a SLS is defined for each of the selected line directions according to (i) the projection in the image of the associated ATN-TM element line, and (ii) SLS length information formulated in this specific direction during the HT "voting" process. ATN-TM knowledge is also instrumental in producing ATN related pairs of SLSs. Those SLSs corresponding to parts of a specific ATN-TM element are then coupled together, forming pairs of SLSs which are likely to correspond to ATN elements. This is possible since ATN elements are described by parallel SLSs whose lengths, and the distances between them, are predetermined according to international airport design standards [5,6]. Also, SLSs in the image domain should have opposite edge directions. These spatial length, width and edge direction relationships represent general ATN structural characteristics and manifest themselves in terms of system constraints. However, because these constraints are violated when perspective
effects are present [3,4], the system applies them in a Cartesian world co-ordinate system. This is achieved after "backprojecting", with the aid of INS/GPS, DTEM and CCDCD data, extracted SLSs from the image to this co-ordinate system. Specifically, SLS pairs are tested for compatibility with the available general ATN structural characteristics as defined in both the world co-ordinate system and in the image. Thus, a number of constraint rules are used by the proposed system:
• In the world co-ordinate system: (i) SLSs are parallel, (ii) there is an overlap in the projections of SLSs to a common direction, (iii) their lengths are within a predefined range, (iv) the distance defined between the directions of the two SLSs is within a certain range, (v) the aspect ratio of a hypothetical parallelogram that is formed by the two SLSs is much greater than 1.
• In the image domain: (vi) the two SLSs have opposite edge directions.
The system then proceeds with qualified pairs of SLSs which correspond to the same ATN-TM element and consistently appear in consecutive image frames. These are examined in order to select the "best" pair that will finally represent a particular ATN-TM element. A distortion measure is defined for this purpose over f successive image frames using n qualified pairs P_k of SLSs, which takes into account the structural characteristics, in the world co-ordinate system, of a candidate pair P_i of SLSs:
DM(P_i) = (1 / (4·n)) · Σ_{k=1..n} [ Σ_{j=1..3} w_j · |p_i^j − p_k^j| + w_4 · d(p_i^4, p_k^4) ]    (4)
Index j in Equation (4) denotes the length (j=1), width (j=2), orientation (j=3) and the position of the centroid (j=4), in the world co-ordinate system, of the previously mentioned parallelogram; d(p_i^4, p_k^4) is the Euclidean distance between the i-th and k-th pair centroids, and w_j are weights associated with the above structural characteristics. The pair of SLSs in a given frame which results in the minimum distortion measure is thus selected to represent the ATN SLSs in all the f frames of the input video sequence.

3. COMPUTER SIMULATION RESULTS AND CONCLUSIONS
The proposed model-based approach has been tested using real airport aerial image sequences, each containing three ATN elements (MR, CRA, CRB) of different contrast and importance in terms of landing an aircraft. Detection and False Alarm rates measured in these cases on a "per frame" basis, i.e. without utilising the distortion measure defined by Equation (4), are illustrated in Figure 1.a. The system has the ability to locate ATN elements even in cases where the quality of the images is particularly poor and where inexperienced observers have difficulties in correctly identifying these elements, i.e. all ATN elements in the Airport02 sequence and the CRB element of the Airport04 sequence. Notice that the overall system performance depends on the accuracy of the image independent information, particularly that of the INS/GPS data and the CCD camera parameters. The robustness of the system with respect to the above data has been examined with the Airport02 sequence, where both the INS/GPS and camera parameters were highly corrupted. In this case the system can still identify the main runway (MR) for most of the time and provides low False Alarm rates. Figure 1.b illustrates ATN element detection rates for the more accurate aerial video sequence case of the Airport04 sequence, as a function of the number of frames f used in the verification process. This multi-frame scheme offers zero False Alarms and even higher detection rates, when compared to the "per frame" case. Medium and high contrast ATN elements, such as the MR and CRA, result in reasonably high detection rates when the number of frames f used by the multi-frame scheme is f > 5. However, in this case the detection of low contrast ATN elements, like CRB, is poor due to inconsistencies in detecting parts of this structure throughout a large number of successive frames.
In addition to the above typical ATN detection performance, experiments were also carried out in order to determine the accuracy of the system in estimating correctly the structural characteristics of detected ATN elements. Thus the Mean Absolute Differences (MADs) were measured for ATN elements which are aligned with (i.e. the MR ATN element) or which are perpendicular to the flight path of the aircraft (i.e. the CRA ATN element). The figures quoted below were measured with the aircraft being within the range of 3420 to 2960 meters from the airfield's reference point. In the Airport04 sequence the minimum MAD orientation
FIGURE 1. (a) ATN elements detection and False Alarms rates measured in the "per frame" case, for three ATN elements in two input video sequences. (b) ATN elements detection rates of the proposed multi-frame system, for the same ATN elements in the Airport04 sequence, as a function of the number of frames f used. No False Alarms are observed in this case.
observed was 0.36° (MR case) and the maximum 3.6° (CRA case). A MAD length of 3.5 meters was measured for the MR case, which corresponds to a 0.2% relative length error. For the CRA element, the MAD length was 15.5 meters, which corresponds to a 1.2% relative length error. The maximum absolute width difference observed was 28 meters, at a distance of 3380 meters from the reference point, and the MR and CRA MAD widths were 5.3 and 2.6 meters, resulting in relative width errors of 11.4% and 5.6%, respectively. These performance characteristics are typical of the system operating with corrupted image independent information. Notice that the fidelity of this information is the enabling factor that allows the system to operate at the maximum required performance in critical applications, such as the autonomous landing of aircraft.
ACKNOWLEDGEMENTS This work was supported by the Military Aircraft Division of British Aerospace (Defence) Ltd. and the Engineering and Physical Sciences Research Council (EPSRC).
REFERENCES
[1] Schell, F.R. and Dickmanns, E.D. "Autonomous Landing of Airplanes by Dynamic Machine Vision", Proc. IEEE Workshop on Applications of Computer Vision, Nov./Dec. 1992.
[2] Tang, Y.L., Devadiga, S., Kasturi, R. and Harris Sr., R.L. "Model-Based Approach for Detection of Objects in Low Resolution Passive Millimeter-Wave Images", Proc. SPIE: Image and Video Processing II, Vol. 2182, Feb. 1994, pp. 320-330.
[3] McGlone, J.C. and Shufelt, J.A. "Projective and Object Space Geometry for Monocular Building Extraction", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 1994, pp. 54-61.
[4] Jaynes, C., Stolle, F. and Collins, R. "Task Driven Perceptual Organization for Extraction of Rooftop Polygons", Proc. 23rd ARPA Image Understanding Workshop, Vol. I, Nov. 1994, pp. 359-368.
[5] Horonjeff, R. and McKelvey, F.X. "Planning and Design of Airports", 4th ed., McGraw-Hill Inc., 1994.
[6] International Civil Aviation Organisation "Aerodrome Design Manual, Part 1: Runways", 2nd ed., Canada, 1984.
[7] Leavers, V.F. "Which Hough Transform?", CVGIP: Image Understanding, Vol. 58, No. 2, Sept. 1993, pp. 250-264.
[8] Sarantis, D. and Xydeas, C.S. "A Methodology for Detecting Man-Made Structures in Sequences of Airport Aerial Images", Proc. Int. Conf. on Digital Signal Processing, Cyprus, Vol. 2, June 1995, pp. 565-570.
[9] Bryson, N.F. "FuseNTS - Fusion of Navigation, Terrain, and Sensor Data: Phase I: Work Package W2 - Model-Based Feature Analysis", Technical Report, School of Engineering, Division of Electrical Engineering, University of Manchester, UK, May 1993.
[10] Mahalanobis, P.C. "On the Generalized Distance in Statistics", Proc. National Inst. of Science of India, Vol. II, No. 1, April 1936, pp. 49-55.
[11] Smith, R.C. and Cheesman, P. "On the Representation and Estimation of Spatial Uncertainty", Int. Journal of Robotics Research, Vol. 5, No. 4, Winter 1986, pp. 56-68.
CONTEXT DRIVEN MATCHING IN STRUCTURAL PATTERN RECOGNITION

S. Gautama and J.P.F. D'Haeyer
Vakgroep Telecommunicatie en Informatieverwerking, Universiteit Gent

ABSTRACT
In this paper we examine the problem of structural pattern recognition using graph structures. To speed up the correspondence problem, we propose a histogram technique which characterizes the context of a primitive within a pattern and allows indexing in the model database with polynomial complexity.

INTRODUCTION
The current research on image understanding in real-world applications is dominated by knowledge-based systems, where knowledge from low-level image processing procedures to high-level image interpretation is gathered and programmed into an expert system [1,2]. The disadvantage of such a system is that it becomes highly application dependent, making the redesign of an existing system for a new application impractical. In environments where expert knowledge is hard to formalize, or where the size of the problem can benefit from automation, efficient use could be made of automated learning tools during the design. In this paper we examine a representation, generated using basic primitives, and an iterative matching technique which can make efficient use of this representation to guide the recognition process. As it applies to probabilistic graph structures, the method serves as a basis for the incorporation of incremental learning. The object models and the scene that needs to be interpreted are described by primitives and relationships between these primitives. They are mathematically represented by a (probabilistic) hypergraph structure in which n-ary relations are represented by a hyperedge connecting the n primitives in the argument list. These hyperedges, encoding topological and geometrical relations, contain important information that is needed to constrain the large space of possible mappings between primitives.
To restrict the number of relations that are generated, a neighbourhood system is imposed on the scene. In this way only relations between a primitive and its nearest neighbours are allowed, reducing the size of the scene hypergraph. Within a neighbourhood, relations are measured, after which they are passed through a quantiser, generating a discrete set of 'relation labels'. Thus, after preprocessing the scene, each scene hyperedge carries a single label. No use is made of unary measurements on the primitives, other than the midposition used to determine the neighbourhood. Object models are induced from object instances which are generalised into a probabilistic hypergraph. Model hyperedges contain a probability distribution of labels, to capture the variability in shape of the object instances. The model primitives contain the mean midposition of the corresponding instance primitives, defining the neighbourhood system over the set of model primitives. To solve the correspondence problem, the notion of a context histogram is introduced [3]. This histogram, calculated for each primitive, gathers the occurrence frequencies of the quantised relation labels in the support set of a target primitive. The support set is the set of relations (hyperedges) that contain the target primitive in their argument list. The characterisation by means of the support set bears resemblance to the Q-coefficient used in probabilistic relaxation techniques [4]. A histogram, however, while increasing the memory requirements, does allow a more detailed characterisation than a single coefficient, meaning more complex similarity measures can be used.
CONTEXT DRIVEN MATCHING
In this section, definitions and mathematics are introduced that form the base of the recognition process. Attributed hypergraphs are used as the representation for higher-order structural patterns. An attributed hypergraph I, defined on a set of labels Λ, consists of two parts: 1) H, which denotes the structure of hyperedges, and 2) A: H → Λ, which describes the attribute values of the hyperedge set. A hyperedge of order v with index k is denoted as I_k^v. Primitives in the hypergraph correspond to hyperedges of order 0 and are denoted by I_k, dropping the superscript to ease the notation.
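The attributed first-order hypergraph and its support sets might be organised as in the following sketch. The class and method names (`AttributedHypergraph`, `add_relation`, `support`) are illustrative choices, not names from the paper.

```python
from collections import defaultdict

class AttributedHypergraph:
    """Primitives are hyperedges of order 0; an order-1 hyperedge links two
    primitives and carries a discrete relation label."""

    def __init__(self):
        self.primitives = set()
        self.edges = {}                   # (i, j) -> relation label
        self._support = defaultdict(set)  # primitive -> hyperedges containing it

    def add_primitive(self, k):
        self.primitives.add(k)

    def add_relation(self, i, j, label):
        """Add an order-1 hyperedge (i, j) with attribute `label`."""
        self.primitives.update((i, j))
        self.edges[(i, j)] = label
        self._support[i].add((i, j))
        self._support[j].add((i, j))

    def support(self, k):
        """S(I_k): the hyperedges whose argument list contains primitive k."""
        return self._support[k]

g = AttributedHypergraph()
g.add_relation('a', 'b', 3)
g.add_relation('a', 'c', 5)
print(len(g.support('a')))  # → 2
```

Keeping the support sets indexed as relations are added makes the later context-histogram computation a single pass over `support(k)` rather than a scan of all hyperedges.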
A random hypergraph M represents a random family of attributed hypergraphs, thereby serving as a model description which captures the variability present within a class. It consists of two parts: 1) H, which denotes its structure, and 2) P: H × Λ → [0,1], which describes the random elements. Associated with each possible outcome I of M and graph correspondence T: I → M there is a probability P(I ≺ M_T) of I being an instance of M through T. Correspondence between a scene primitive I_k and a model primitive M_Tk proceeds by comparing the support sets of both primitives. The support set S of a primitive I_k is defined as the set of hyperedges that contain I_k in their argument list: S(I_k) = { I^v : I_k ∈ Arg(I^v) }, where Arg(I^v) denotes the argument list of the hyperedge I^v. Built over the support set is the context histogram, which is used to characterize scene and model primitives. For a scene primitive I_k and label α, the context histogram gathers the occurrence frequency of the label α in the support set of I_k and is defined as:

C(I_k, α) = |{ I^v ∈ S(I_k) : Λ(I^v) = α }| / |S(I_k)|
The denominator normalises the total mass of the context histogram to unity. Calculated on a random hypergraph, a context histogram is defined as containing the expected occurrence frequencies of the labels, modified by a hedge function F which encodes prior knowledge of the correspondence between scene and model primitive:

C(I_k ≺ M_Tk, α) = Σ_{M¹ ∈ S(M_Tk)} P(Λ(M¹) = α) · F(I_k ≺ M_Tk, α, M¹) / |S(M_Tk)|
The hedge function weights the contribution of each hyperedge within the support set of the model primitive, by taking into account the support that the primitives in the argument list of the hyperedge receive. This is modelled after the Q-coefficient in probabilistic relaxation. For binary relations this coefficient is expressed as:

Q(I_k ≺ M_Tk) = Π_{I¹_{k,l} ∈ S(I_k)} Σ_{M¹_{Tk,Tl} ∈ S(M_Tk)} P(Λ(M¹_{k,l}) = Λ(I¹_{k,l})) · P(I_l ≺ M_Tl)
where the subscripts in the first order hyperedge I¹_{k,l} denote its arguments. This function can be viewed as calculating the probability of occurrence of the context vector { Λ(I¹_{k,l}) : I¹_{k,l} ∈ S(I_k) } in the support set of the model segment M_Tk, where the scene graph is taken as an AND-graph, while the model graph is taken as an OR-graph with independent and mutually exclusive primitives. Each occurrence probability P(Λ(M¹_{k,l}) = Λ(I¹_{k,l})) is additionally weighted with the support of its argument P(I_l ≺ M_Tl). For first order hypergraphs, the hedge function F is taken as:
F(I_k ≺ M_Tk, α, M¹_{Tk,Tl}) = max_{I¹_{k,l} ∈ S(I_k)} P(I_l ≺ M_Tl)
Similarity between a scene primitive I_k and a model primitive M_Tk is defined as:

S(I_k, M_Tk) = Σ_α min( C(I_k, α), C(I_k ≺ M_Tk, α) )

which can be used again as a prior estimate, thereby establishing an iterative recognition scheme. Figure 1 illustrates the basic elements of the process.
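A minimal sketch of the scene-side context histogram and the histogram-intersection similarity defined above. The helper names are hypothetical, and the model-side histogram with the hedge function is omitted; a plain label list stands in for the support set.

```python
from collections import Counter

def context_histogram(support_labels):
    """Normalised occurrence frequencies of relation labels over the
    support set of a primitive: C(I_k, a)."""
    n = len(support_labels)
    counts = Counter(support_labels)
    return {label: c / n for label, c in counts.items()}

def similarity(scene_hist, model_hist):
    """Histogram intersection: S = sum_a min(C_scene(a), C_model(a))."""
    labels = set(scene_hist) | set(model_hist)
    return sum(min(scene_hist.get(a, 0.0), model_hist.get(a, 0.0))
               for a in labels)

h1 = context_histogram([0, 0, 1, 2])   # {0: 0.5, 1: 0.25, 2: 0.25}
h2 = context_histogram([0, 1, 1, 3])
print(round(similarity(h1, h2), 3))    # → 0.5
```

Since both histograms have unit mass, the intersection lies in [0, 1] and equals 1 only for identical label distributions, which is what makes it usable directly as a match probability in the iterative scheme.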
Figure 1 Illustration of the construction of the context histogram for scene and model primitive
To illustrate the technique, we examine the recognition of crossroad structures within a digital image. Fig. 2a presents part of the city of Ghent, generated using CorelDraw, after which it has been segmented into line segments. The line segments form the basic primitives of the representation. Binary relations are generated using the relative angle between line segments, resulting in a first order hypergraph. The neighbourhood of a primitive is set to a region within radius 25 (image size 512x512) and the quantisation level is set to 8, resulting in 8 discrete relation labels. No use is made of unary measurements to characterize a line segment, as the matching process relies solely on the information offered by the angle relations. The model is an extract from the scene which has to be localized. Model and scene graph representations are generated independently of each other. Recognition is more complex than a 2-class problem (i.e. structure and scene noise), since each primitive within a structure needs to be correctly mapped onto the corresponding primitive. After two iterations, placing a threshold of 50% on the match probability to suppress scene noise and retaining the best match, the results are summarised in table 1 (exp. 1). The scene primitives that pass the threshold (i.e. that are recognized as being model primitives) are highlighted in fig. 2a. Fig. 2b shows the unthresholded correspondence map after two iterations, which presents the match probabilities of the scene primitives (horizontal axis) onto the model primitives (vertical axis).
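The relation quantiser and neighbourhood test used in this experiment might look as follows. The function names are hypothetical, and folding the relative angle into [0, π/2] before binning is our assumption about how the orientation difference between undirected segments is measured.

```python
import math

def relation_label(angle1, angle2, levels=8):
    """Quantise the relative angle between two line segments (orientations
    in radians, taken modulo pi) into `levels` discrete relation labels."""
    rel = abs(angle1 - angle2) % math.pi   # orientation difference
    rel = min(rel, math.pi - rel)          # fold into [0, pi/2]
    bin_width = (math.pi / 2) / levels
    return min(int(rel / bin_width), levels - 1)

def in_neighbourhood(mid1, mid2, radius=25.0):
    """Two segments interact only if their midpoints lie within `radius`
    (radius 25 on a 512x512 image in the experiment above)."""
    return math.dist(mid1, mid2) <= radius

print(relation_label(0.0, math.pi / 2))  # → 7 (perpendicular segments)
print(relation_label(0.2, 0.25))         # → 0 (nearly parallel segments)
```

Using only relative angles makes the labels invariant to translation, rotation and scale of the structure, which is why no unary measurements are needed.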
Figure 3 (a) scene with recognized primitives highlighted, (b) correspondence map, (c) model 1, (d) model 2, (e) original image
Fig. 3e presents a digitized image of a roadmap of Bologna. The result after initial segmentation into line segments is shown in fig. 3a, containing 214 segments. Two structures need to be identified in the image, fig. 3b and 3c, containing respectively 24 and 20 segments. The same conditions hold as for experiment 1 and the results are summarised in table 1 (exp. 2).

                       Experiment 1        Experiment 2
                       Total     %         Total     %
model segments:
  correct mapping        27     62.8         30     68.2
  wrong mapping           1      2.3          1      2.3
  missing segments       15     34.9         13     29.5
scene noise:
  false alarm             1      0.5          0      0
  suppressed noise      204     99.5        213    100

Table 1: Summary results of structural matching
CONCLUSIONS
We have presented a new iterative matching technique based on a histogram of structural context information. Experiments show a good noise suppressing ability while retaining adequate recognition results with minimal false alarms. Since scene primitives are structurally mapped onto the model, orientation and scale can be hypothesised from a match, whereas the model can be used to direct a search for missing information, thereby improving or rejecting the match. This will be the subject of further work.

[1] V. Hwang, L. Davis, T. Matsuyama, "Hypothesis integration in image understanding systems," Computer Vision, Graphics and Image Processing, CVGIP 36, 1986, pp. 321-371.
[2] J. Van Cleynenbreugel, "Tapping multiple knowledge sources to delineate road networks on high resolution satellite images," Master's thesis, KUL, 1992.
[3] S. Gautama, J.P.F. D'Haeyer, "Automatic induction of relational models," in Hybrid Image and Signal Processing V, Proc. SPIE vol. 2751, 1996, pp. 253-263.
[4] W.J. Christmas, J. Kittler, M. Petrou, "Structural matching in computer vision using probabilistic relaxation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 8, 1995, pp. 749-764.
An Efficient Box-Counting Fractal Dimension Approach for Experimental Image Variation Characterization

Aura Conci 1,2 and Claudenize F. J. Campos 2
1 Comp. Apl. e Automação - CAA - Pós-Grad. Eng. Mecânica - PGMEC - UFF, CEP 24210-240, Niterói, RJ, Brazil - [email protected]
2 Dep. Eng. Mecânica - Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, CEP 22953-900, Rio de Janeiro, RJ, Brazil - [email protected]
Abstract
Many applications of fractal concepts rely on the ability to estimate the fractal dimension (FD) of objects. FD is an attempt to quantify how densely a fractal occupies the space in which it lies. This characteristic has been used in texture classification, segmentation and other problems. An efficient algorithm to estimate the FD of images is proposed in this paper. We suggest its use to identify on-line image deviation from a standard pattern. We report on some experiments on textile failings and a comparison with four other methods.

Introduction
The FD of a set A in Euclidean n-space can be derived from

1 = N_r · r^FD, or FD = log(N_r) / log(1/r)    (1)

where N_r is the number of non-overlapping copies of A, scaled down by a ratio r, whose union covers A. However, it is difficult to compute FD using these equations directly. Peleg et al. [1] extended FD to images, which can be viewed as a terrain surface whose height is proportional to the image gray value (figure 1). The reticular cell counting estimator has been proposed by Gangepain and Roques-Carmes [2], but this estimator cannot be used when the range of the actual FD of an image is 2.0-2.5. Keller et al. [3] proposed an approach which presents satisfactory results up to FD=2.75. Pentland [4] suggested a method of estimating FD by using the Fourier power spectrum of the image intensity surface; such a method gives satisfactory results but, since Fourier transformation computation is included, it is slower than the others. Sarkar and Chaudhuri [5] described an efficient approach, named Differential Box-Counting (DBC), that uses differences in computing N_r and gives satisfactory results over the whole range of FD. N_r in the DBC method is counted in a different manner from the other box-counting methods [6]. Consider that an image of MxM pixels has been partitioned into grids of sxs pixels and scaled down to r = s/M. If G is the total number of gray levels then G/s' = M/s. On each grid there is a column of boxes of size sxsxs'.
Assign numbers 1, 2, ..., n to the boxes as shown in figure 1. If the minimum gray level of the image in the (i,j)th grid falls in box number k, and the maximum gray level falls in box number l, then in the DBC approach

n_r(i,j) = l − k + 1    (2)

is the contribution of n_r from the grid (i,j). Taking contributions from all grids,

N_r = Σ_(i,j) n_r(i,j)    (3)

Then the FD can be estimated from the least squares linear fit of log(N_r) versus log(1/r), where N_r is counted for different values of r and s.

DBC Modification
Although the DBC method gives a very good estimate of FD, some simplifications in computation and improvements in efficiency are possible using the following modifications of the original method. If a set A ⊂ R³ is covered by just-touching boxes of side length (1/2)^n, equation (1) can be rewritten as

FD = lim_(n→∞) log(N_n) / log(2^n)    (4)
where N_n denotes the number of boxes of side length (1/2)^n which intersect the set A [6]. In our proposed algorithm, the division of the image into boxes of different lengths is processed in a new manner compared with other box counting variations [6] and the original DBC method [5]. Consider an image of size MxM pixels; we take M to be a power of 2 and take the range of light intensity to be integers from 0 to 255. All images are enclosed in a big box of size MxMx256. We consider the image divided into boxes of side length nxnxn' for n = 2, 4, 8, ..., 2^m and n' = 2, 4, 8, ..., 2^m' for each image subdivision. N_n is counted as

N_n = Σ_(i,j) n_n(i,j),  n_n(i,j) = int(Gray_max(i,j)/n') − int(Gray_min(i,j)/n') + 1    (5)

where int(..) is the integer part of a division. These changes make the implementation faster and simpler than the original DBC algorithm. The image file is read only once: in the first division of the image into boxes, the bitmap of MxM pixels need not be saved; when the image is read we save only two matrices of M/2xM/2, Gray_max and Gray_min (saving MxM/2). This first calculation of n_n using equation (5) corresponds to dividing the image into boxes of 2x2 pixels. For boxes of 4x4 pixels there will be M/4xM/4 elements in Gray_max and Gray_min, and each new element (i_new, j_new) is obtained by consulting only the four elements (i,j), (i+1,j), (i,j+1) and (i+1,j+1) of the Gray_max and Gray_min matrices. If the algorithm begins at i=j=0, in each new iteration the Gray_max and Gray_min matrix elements (i_new=i/2 and j_new=j/2), for the next division of the image, can be saved in the same space. Then using (4) we estimate FD from the mean of log(N_n)/log(2^n). It can easily be shown [5] that the computational complexity of other approaches, including the original DBC, is much higher than with the above suggested modifications.
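A sketch of the modified DBC estimator described above, assuming an MxM grayscale array with M a power of two and 256 gray levels. The function name and the use of NumPy are our own; the paper's in-place storage trick is replaced here by successive 2x2 max/min pyramids, which reuse the previous level's matrices with the same effect.

```python
import numpy as np

def fd_dbc_modified(image):
    """Modified differential box-counting estimate of fractal dimension.
    `image` is an MxM uint8 array with M a power of two."""
    g_max = image.astype(float)
    g_min = image.astype(float)
    M = image.shape[0]
    log_n, log_scale = [], []
    n = 2
    while n <= M // 2:
        # collapse 2x2 blocks: each level reuses the previous max/min level
        g_max = np.maximum.reduce([g_max[0::2, 0::2], g_max[0::2, 1::2],
                                   g_max[1::2, 0::2], g_max[1::2, 1::2]])
        g_min = np.minimum.reduce([g_min[0::2, 0::2], g_min[0::2, 1::2],
                                   g_min[1::2, 0::2], g_min[1::2, 1::2]])
        n_prime = max(256 * n // M, 1)        # box height so that G/n' = M/n
        nn = (g_max // n_prime).astype(int) - (g_min // n_prime).astype(int) + 1
        log_n.append(np.log(nn.sum()))        # log N_n over all grids
        log_scale.append(np.log(M / n))       # log(1/r) with r = n/M
        n *= 2
    # least-squares slope of log(N_n) against log(1/r) gives the FD
    slope, _ = np.polyfit(log_scale, log_n, 1)
    return slope
```

As a sanity check, a constant image yields nn = 1 in every grid, so N_n = (M/n)^2 and the fitted slope is exactly 2, the dimension of a flat surface.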
Experiments
The proposed DBC modifications have been used in several experiments. Our first goal was to examine the accuracy of the approach for FD estimation, so at first we used only images having known fractal dimensions. Test data for this first group of experiments came from synthetic textures (9 Brownian images and 9 Takagi surfaces generated on a 256×256 grid with 256 gray levels and FD varying from 2.1 to 2.9 in steps of 0.1; not reported here) [7]. For experiments with natural images we took Brodatz's textures (figure 2 and table 1). The fact that our modifications return accurate values on images with known dimension motivates questions on the possibility of using FD to identify variations in images. The remainder of our experiments investigate this possibility. Experiments on estimating changes in images are shown in figures 3 and 4. These figures represent fabrics with different patterns and kinds of defects. As can be seen, slight changes in the image modify the FD computed for each image.
Figure 1 - Determination of nr or nn.
Table 1 - FD of natural textures (image numbers correspond to Brodatz's book)

  image   DBC modif.   DBC [5]   Pentland [4]   Peleg et al. [1]   Keller et al. [3]
  D04     2.66         2.66      2.55           2.72               2.68
  D05     2.45         2.45      2.38           2.52               2.57
  D09     2.58         2.58      2.49           2.65               2.65
  D24     2.44         2.44      2.46           2.59               2.57
  D28     2.55         2.55      2.48           2.61               2.62
  D55     2.48         2.48      2.37           2.60               2.59
  D68     2.53         2.53      2.44           2.63               2.60
  D84     2.61         2.61      2.47           2.68               2.65
  D92     2.50         2.50      2.38           2.59               2.59
Figure 2 - Brodatz's natural textures (FD in table 1)
DF=2.60 (top) and 2.62 (bottom); DF=2.51 (top) and 2.53 (bottom); DF=2.59 (top) and 2.57 (bottom). Figure 3 - Usual textile imperfections on drill (left), cotton (centre) and carpet (right).
Figure 4 - Jeans without defects, FD=2.43 (top left); stained jeans, FD=2.41 (top centre); and non-uniform dye, FD=2.47 (top right). Silk without imperfections (bottom left), FD=2.18; with a single imperfection (bottom centre), FD=2.09; and with many imperfections (bottom right), FD=2.40.

Conclusions
The main goal of this paper is to present a simple approach to computing FD on images. Elementary experiments demonstrated that variations between an original image pattern and its reproductions can affect the respective FD. A statistical analysis, related to each specific class of images, should be carried out in order to assess the applicability of FD as a means of finding imperfections in practical settings. The encouraging conclusion is that this approach is faster and simpler than the usual ones, and it can readily be extended to 3-D images as well.

References
[1] S. Peleg, J. Naor, R. Hartley and D. Avnir, "Multiple resolution texture analysis and classification", IEEE Trans. Pattern Anal. Machine Intell., Vol. 6, 1984, pp. 518-523.
[2] J. Gagnepain and C. Roques-Carmes, "Fractal approach to two dimensional and three dimensional surface roughness", Wear, Vol. 109, 1986, pp. 119-126.
[3] J. Keller, R. Crownover and S. Chen, "Texture description and segmentation through fractal geometry", Computer Vision Graphics Image Processing, Vol. 45, 1989, pp. 150-160.
[4] A. P. Pentland, "Fractal based description of natural scenes", IEEE Trans. Pattern Anal. Machine Intell., Vol. 6, No. 6, 1984, pp. 661-674.
[5] N. Sarkar and B. B. Chaudhuri, "An efficient differential box-counting approach to compute fractal dimension of image", IEEE Trans. Syst. Man Cybern., Vol. 24, No. 1, 1994, pp. 115-120.
[6] M. F. Barnsley, Fractals Everywhere, Academic Press, 1988.
[7] Q. Huang, J. R. Lorch and R. C. Dubes, "Can the fractal dimension of images be measured?", Pattern Recognition, Vol. 27, No. 3, March 1994, pp. 339-350.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
An Identification Tool to Build Physical Models for Virtual Reality

Jean LOUCHET*†, Li JIANG‡

* ENSTA, Laboratoire d'Electronique et d'Informatique, 32 boulevard Victor, 75739 PARIS 15, France
† INRIA, projet SYNTIM, BP 153, Rocquencourt, 78153 LE CHESNAY, France
‡ LAFORIA, Université Pierre et Marie Curie, tour 46-00, 1 Place Jussieu, 75005 PARIS, France
e-mail: [email protected]
Keywords
Computer vision, motion analysis, motion modelling, image animation, artificial evolution, multimedia, virtual reality.

Virtual reality applications including visual and gestural man-machine interaction require the use of particle-based deformable and articulated models. However, relevant model-building methods are lacking to ensure realistic behaviours based on observations of real-world objects. To this end, our research aims at finding reliable physical model building mechanisms which use experimental kinematic data of real objects as input. Building a physics-based model of an object thus divides into two main steps:
• The first step is the capture of kinematic data. It may consist of extracting characteristic point trajectories from the images of a real scene, involving image sequence analysis techniques, or other ad-hoc experimental means [6].
• The second step consists of using these kinematic data in a specific physical model implementation scheme.
This paper focuses on the second step. We propose an original method to automatically identify physical models built using local masses and generalised springs.
1 The Physical Model.

The general principles of the physical models we developed are presented in [4]. Particles are described by their masses, positions and speeds. Motion results from forces applied by internal bonds or the environment (gravitation...). Four bond types are considered:
• unary bonds, between a particle and the medium (e.g. viscosity, gravitation...);
• binary bonds, between particle pairs;
• ternary ("flexion") bonds, between three particles;
• quaternary ("torsion") bonds, between four particles.
Each bond type may consist of two components:
• the first component is defined as the gradient of an energy potential which depends on the relative positions of the particles involved. Forces applied to particles derive from this potential. This allows an easy (if tedious) transposition to ternary and quaternary bonds of elementary mechanical conservation principles about resulting torques and forces [9].
• the second component generates damping forces, which depend on the particles' mutual positions and speeds.
2 An Evolutionary Identification Strategy.

The task of the identification algorithm is to find the bonds' parameters from given kinematic data. The basic idea is to consider identification as an optimisation problem. To this end, a cost function evaluates the quality of candidate physical models (i.e. object description files). This cost function measures a generalised distance between the trajectory predicted by the model to be evaluated and the real given trajectory. The problem consists in finding, among all possible models, the one with the lowest cost. We observed that conventional numerical optimisation techniques are unsuccessful on these cost functions, probably because the functions lack the desirable mathematical regularity properties. Even a stochastic technique like simulated annealing becomes extremely slow, and in practice cannot cope with objects containing more than about half a dozen particles. This is why we developed an "Evolutionary Strategy", a stochastic optimisation technique the principles of which are inspired (like
genetic algorithms) by biological evolution. It consists of creating a random initial population of models and letting it evolve naturally through the mechanisms of selection, crossover and mutation, under the control of the cost function values of the population's individuals. More generally, evolutionary strategies are known for their robustness and outstanding ability to optimise functions with multiple local minima. However, they are often slow, difficult to tune, and may benefit from problem-specific implementations. We found that a conventional evolutionary scheme succeeds in identifying small particle systems (i.e. objects with simple behaviours), but loses efficiency and precision with increasing particle numbers. Therefore we introduced some novel characteristics into the evolutionary algorithm. Our philosophy is to exploit the problem's specific semantics as much as possible, but also to make sure that no a priori, implicit information on the parameters is given to the algorithm. First, we improved the robustness of the identification algorithm by designing a cost function which calculates short-term (rather than long-term) trajectory differences: the cost function is the quadratic sum of differences between the real reference trajectory and the points extrapolated from the preceding time step. This has important consequences on the shape of the cost function. Second, we exploited a topological property: the position of a particle at a given time only depends on the recent history of its neighbourhood. This led us to split the cost function into the sum of each particle's contribution. We call these contributions, attached to individual particles, the "local cost functions". The local cost function of a particle is defined as the temporal sum of the prediction errors concerning its coordinates. These local cost functions play a key role in the identification algorithm.
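The per-particle decomposition of the short-term cost can be sketched as follows; `model_step` (a one-step predictor fed with the true state at time t) and the data layout are illustrative assumptions, not the authors' interface:

```python
def local_costs(model_step, reference):
    """Short-term prediction cost, split per particle ("local cost functions").

    `reference[t][i]` is the observed position (a coordinate tuple) of
    particle i at time t; `model_step(positions, t)` predicts all positions
    at t+1 from the true state at t.  Each particle accumulates the squared
    error on its own coordinates; the global cost is simply sum(costs).
    """
    n = len(reference[0])
    costs = [0.0] * n
    for t in range(len(reference) - 1):
        predicted = model_step(reference[t], t)
        for i in range(n):
            costs[i] += sum((p - r) ** 2
                            for p, r in zip(predicted[i], reference[t + 1][i]))
    return costs
```

Because every prediction restarts from the true state, a bad estimate in one region of the object cannot pollute the cost attributed to particles elsewhere.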
Indeed, let us remark that the position of a particle at time step (t + 1) only depends on its own position and speed and on its neighbours' positions at time step t (the neighbours of a given particle are defined as the particles which share a bond with it). Therefore, the local cost functions corresponding to the extremities of a bond are more relevant than the global cost function for evaluating the estimation quality of this bond. We use these local cost functions widely in the crossover and mutation processes of the evolutionary identification scheme. The guidance of the evolutionary mechanisms by local cost functions means that the algorithm converges on each region of the object independently of its convergence on other regions: the local cost functions are not influenced by remote regions as the global cost function would be. The fundamental consequence is that the number of generations (calculation steps) required for convergence no longer depends on the number of unknowns or the object's complexity. Third, we provided the algorithm with a self-adaptive behaviour [2]: the variance of mutations, which is an important internal parameter of the algorithm, is controlled by the input data. To this end, besides each (mechanical) parameter in an individual's code, we introduced an extra "mutability" parameter, which controls the standard deviation of the mutations which can be applied to the corresponding mechanical parameter. The mutability parameters have no direct effect on the behaviour or the cost of the individuals, but as part of the genetic code they are subject to the same evolutionary mechanisms (mutations, crossover) as the mechanical parameters. The mutabilities thus evolve and stay fitted to the algorithm's needs.
Thus, the balance of robustness and precision, which is often a difficult point in evolutionary algorithms, remains under the control of the input data and cost function values: this results in a much better precision of the parameter estimates at the end of the algorithm. An apparent side effect of this self-adaptivity is to double the representation size of an individual, but this has no real consequence on execution speed, as the calculation of cost function values, which is the dominant part, is unchanged.
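The self-adaptive mutation described above can be sketched as follows; the log-normal step-size update is a standard evolution-strategy choice [2], and all names and the value of `tau` are illustrative, not the authors' exact operator:

```python
import math
import random

def mutate(individual, tau=0.3):
    """Self-adaptive mutation: each mechanical parameter carries its own
    'mutability' (mutation standard deviation).  Mutabilities are mutated
    first (log-normally, so they stay positive), then applied to the
    parameters -- thus the step sizes themselves evolve with the population.
    `individual` is a (params, mutabilities) pair of equal-length lists.
    """
    params, sigmas = individual
    new_sigmas = [s * math.exp(tau * random.gauss(0.0, 1.0)) for s in sigmas]
    new_params = [p + random.gauss(0.0, s)
                  for p, s in zip(params, new_sigmas)]
    return new_params, new_sigmas
```

Selection acts only on the cost of the mechanical parameters, yet the mutabilities hitch-hike with successful individuals, which is what keeps them fitted to the search's current needs.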
3 Some application examples

3.1 Identification of the general linear model and noise resistance

In order to test the identification algorithm's performance, we chose an experimental protocol which consists of:
• building a catalogue of bonds;
• building an object by defining masses and positioning instantiations of the bonds between these masses;
• calculating a trajectory of the object, from arbitrary initial conditions.
Then, we 'forget' the object parameters' values and use the evolutionary strategy described above in order to recover these values from the given kinematic data. The curves below show typical convergence results, on an elastic object consisting of 15 masses, 10 different linear bond types (20 parameters) and 31 installed bonds. The identification algorithm uses 100 consecutive images and a population of 100 individuals. In order to test the algorithm's robustness, we added Gaussian noise with different standard deviations to the given kinematic data.
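The identification loop used in this protocol can be sketched as a minimal evolutionary strategy; this is illustrative only (the authors' algorithm adds crossover, local cost functions and self-adaptive mutabilities), and all names and tuning constants are our choices:

```python
import random

def evolve(cost, init, generations=200, pop_size=30, sigma=0.1, seed=0):
    """Minimal elitist evolutionary loop: keep the best half of the
    population each generation and refill with mutated copies of random
    survivors.  `cost` maps a parameter list to a scalar; `init` is a
    starting guess for the parameter vector.
    """
    rng = random.Random(seed)
    pop = [[g + rng.gauss(0.0, sigma) for g in init] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)                      # selection by cost
        parents = pop[:pop_size // 2]           # survivors kept unchanged
        pop = parents + [[g + rng.gauss(0.0, sigma)
                          for g in rng.choice(parents)]
                         for _ in range(pop_size - len(parents))]
    return min(pop, key=cost)
```

Keeping the parents unchanged (elitism) guarantees that the best cost found never worsens from one generation to the next.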
Log(precision of parameter estimation) as a function of the number of generations, with several noise levels on the trajectories. The thick line represents the algorithm's convergence without noise.

Tests show that convergence is very good after 500 to 1000 generations, independently of the number of parameters to be identified. In the typical convergence results shown above, the average parameter estimation error is lower than 0.01% for the three lowest noise levels shown.

3.2 Turbulent fluids

In the general framework presented above, the objects have a fixed structure of masses and bonds. However, particle-based models have long been used to simulate fluid objects like clouds or smoke. The authors of [8] use the Cordis-Anima approach [9] to model turbulent fluid flows, using point masses and conditional bonds between particles. Each of these bonds becomes active whenever the distance between its extremity particles goes under a given threshold. The authors obtained visually convincing simulations, using several particle and conditional bond classes. Our aim is again to examine how our identification algorithm can be used to induce the internal characteristics of such a turbulent fluid model from given kinematics. We implemented a similar viscous fluid model (see images below), using simultaneously several types of conservative bonds ("springs"), which depend on the relative positions:

  if (distance < threshold_s) then (force = f_s(x_2 - x_1))

and energy-dissipative bonds ("dampers"), which depend on the relative speeds:

  if (distance < threshold_a) then (force = f_a(ẋ_2 - ẋ_1))

where f_s and f_a are non-linear functions.
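A conditional bond of this kind can be evaluated, in one dimension and with linear f_s and f_a for brevity (the paper allows non-linear functions), roughly as follows; all names are illustrative:

```python
def bond_force(x1, x2, v1, v2, k_s, k_a, threshold_s, threshold_a):
    """Force exerted on particle 1 by a conditional spring + damper bond
    (1-D sketch).  Each component is active only while the inter-particle
    distance is under its own threshold, as in the turbulent fluid model.
    """
    distance = abs(x2 - x1)
    force = 0.0
    if distance < threshold_s:       # conservative ("spring") component
        force += k_s * (x2 - x1)
    if distance < threshold_a:       # dissipative ("damper") component
        force += k_a * (v2 - v1)
    return force
```

Because bonds switch on and off with distance, the set of active interactions changes from frame to frame, which is what lets a fixed catalogue of bond types represent a fluid.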
Two 2-dimensional images of a jet penetrating into a fluid (frames nos. 300 and 400 from a sequence). The main difference with the general model is that the number of bonds (and therefore the computational load of the cost function) is very high compared to standard flexible objects (about 80000 in the example above), even if not all
are activated at the same time due to the distance threshold. After 1000 generations, the algorithm yields very good estimates of the initial parameters:

  bond parameter                                               initial parameter   parameter's estimate
  viscosity coefficient 1 (between matrix particles)           2.5                 2.499
  viscosity coefficient 2 (non-linearity)                      1.25                1.248
  distance threshold for viscosity activation                  1.5                 1.500
  viscosity coefficient 1 (between matrix and jet particles)   4                   3.962
  viscosity coefficient 2 (non-linearity)                      2                   2.052
  distance threshold for viscosity activation                  2                   2.000
  elasticity coefficient                                       2                   2.000
  distance threshold for elastic bond activation               1.1                 1.100
3.3 Other object types and future extensions

The same identification procedure has been successfully tested on a cloth animation model [7] with a similar experimental protocol, and allowed a realistic reconstruction of a cloth image sequence. Here, the object is periodic and all bonds include a non-linearity factor through the introduction of an "elongation rate". The next step of our research project will consist of validating our approach using real-world kinematic data, coming from processing real images of articulated solid objects [5], fabrics [7] and turbulent fluid flows [8].

4 Conclusion

Particle-based physical models appear to be a promising general framework for building realistic and efficient models for simulation and animated image synthesis. They are being used increasingly to model smoke [10, 12], elastic or articulated bodies [1], fabrics [13]..., but they need to be associated with identification methods both to deserve the "physical model" qualification and to ensure their behavioural realism. The evolutionary parameter identification technique proposed in this paper has proven successful in reconstructing model internal parameters from their kinematic outputs, including non-linear, conditional, elastic and viscous bonds in periodic or non-periodic objects, and thus provides the particle-based modelling tool with the 'measuring instrument' which should always be associated with a physical model.

References

[1] W. W. Armstrong, M. W. Green, "The dynamics of articulated rigid bodies for purposes of animation", The Visual Computer, Vol. 1, pp. 231-240, 1985.
[2] T. Bäck, "Evolution Strategies: an alternative evolutionary algorithm", Artificial Evolution 95, Brest, September 1995, Springer 1996.
[3] D. E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine Learning", Addison-Wesley, 1989.
[4] J. Louchet, "An Evolutionary Algorithm for Physical Motion Analysis", British Machine Vision Conference, York, September 1994.
[5] J. Louchet, M. Boccara, "Detecting rotating regions in image sequences", Image'Com 96, Bordeaux, May 1996.
[6] J. Louchet, M. Boccara, D. Crochemore, X. Provot, "Building new tools for Synthetic Image Animation by using Evolutionary Techniques", Artificial Evolution 95, Brest, September 1995, Springer 1996.
[7] J. Louchet, X. Provot, D. Crochemore, "Evolutionary identification of cloth animation models", Eurographics Workshop on Animation and Simulation, Maastricht, September 1995.
[8] A. Luciani, A. Habibi, A. Vapillon, Y. Duroc, "A Physical Model of Turbulent Fluids", Eurographics Workshop on Animation and Simulation, pp. 16-29, Maastricht, September 1995.
[9] A. Luciani, S. Jimenez, J. L. Florens, C. Cadoz, O. Raoult, "Computational Physics: a Modeller Simulator for Animated Physical Objects", Proc. Eurographics Conference, Vienna, September 1991, Elsevier.
[10] W. T. Reeves, "Particle Systems - A Technique for Modelling a Class of Fuzzy Objects", Computer Graphics (Siggraph), Vol. 17, No. 3, pp. 359-376, July 1983.
[11] N. Szilas, C. Cadoz, "Physical Models That Learn", International Computer Music Conference, Tokyo, 1993.
[12] J. Stam, E. Fiume, "Turbulent wind fields for gaseous phenomena", ACM Computer Graphics (Siggraph 93), pp. 369-376, August 1993.
[13] D. Terzopoulos, J. Platt, A. Barr, K. Fleischer, "Elastically Deformable Models", Proc. Siggraph 87, Computer Graphics, Vol. 21, No. 4, pp. 205-214, 1987.
CUE-BASED CAMERA CALIBRATION AND ITS APPLICATION TO DIGITAL MOVING IMAGE PRODUCTION

Yuji NAKAZAWA, Takashi KOMATSU, and Takahiro SAITO

Department of Electrical Engineering, Kanagawa University, 3-27-1 Rokkakubashi, Kanagawa-ku, Yokohama, 221, JAPAN
Tel: +81-45-481-5661 Ext. 3119  Fax: +81-45-491-7915  Email: [email protected]

ABSTRACT
One of the keys to new-generation digital image production is to construct simple methods for estimating the camera's motion, position and orientation from a moving image sequence observed with a single TV camera. For that purpose, we present a method for camera calibration and estimation of focal length. The method utilizes four definite coplanar points, e.g. the four vertices of an A4-size paper, as a cue. Moreover, we apply the cue-based method to the digital image production task of making up a moving image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera. The cue-based method works well for the task.

1. INTRODUCTION - BACKGROUND AND MOTIVATION

Recently some research institutes have started studying digital production of a panoramic image sequence from an observed moving image sequence, construction of a virtual studio with 3-D CG technology and so on, with the intent to establish the concept and schema of the new-generation digital image production technology. Such an image production technology, utilizing information about the camera's motion, position, orientation and so on, integrates consecutive image frames to produce an enhanced image such as a high-resolution panorama, and/or makes up a moving image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera. The key to the new-generation digital image production is to develop simple methods for estimating the camera's motion, position and orientation from a real moving image sequence observed with a single TV camera [1].

In this paper, to render it feasible to perform such 3-D estimation of the camera's motion, position and orientation when we use a single handy TV camera whose camera parameters are not given in advance, we present a method for performing camera calibration along with accurate estimation of the focal length of the camera, by using four definite coplanar points, which usually correspond to four vertices of a certain quadrilateral plane object such as an A4-size paper, as a cue. The practical computational algorithms for the cue-based method of camera calibration are composed of simple linear algebraic operations and arithmetic operations, and hence they work well enough to provide accurate estimates of the camera's motion, position and orientation stably. Furthermore, in this paper, we apply the cue-based camera calibration method to the image production task of making up an image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera, according to the recovered estimates of the camera's motion, position and orientation.

2. CUE-BASED CAMERA CALIBRATION

In this paper, we assume the following situation: while moving the single TV camera by hand [2], we image a scene which includes not only the objects of interest but also four definite coplanar points P1 ~ P4, whose relative positions are known in advance and which usually correspond to four vertices of a certain quadrilateral plane object with known shape; these are used as a cue for camera calibration. We perform camera calibration, that is to say, determination of the camera's position and orientation at each image frame, from the 2-D spatial image coordinates of the four definite coplanar cue points, which are tracked temporally over consecutive image frames by our recently presented feature tracking algorithm [3]. Under such conditions, we perform camera calibration and estimate the focal length f of the camera at the same time.

2.1 Image Coordinate System
Here for each image frame we define the 3-D viewing coordinate system o'-x'y'z', which is associated with the 2-D image coordinate system O-XY as shown in figure 1. In figure 1, we represent the 3-D viewing coordinates and the 2-D image coordinates with (x' y' z') and (X Y) respectively. We represent the 3-D viewing coordinates of the four coplanar cue points P1 ~ P4 with { p_i' = (x_i' y_i' z_i')^t ; i = 1, 2, 3, 4 }, and we represent the 2-D image coordinates of the imaged coplanar cue points, perspectively projected onto the image plane, with { p_i = (X_i Y_i)^t ; i = 1, 2, 3, 4 }.

2.2 Camera Calibration

The problem of camera calibration is to recover the geometrical transformation of the 3-D world coordinates of an arbitrary point in the imaged scene into its corresponding 2-D image coordinates, from given multiple pairs of 3-D world coordinates and their corresponding 2-D image coordinates. The camera calibration problem is concisely formulated with the homogeneous coordinate systems. Given both the 4-D homogeneous world coordinates a = (x y z 1)^t of an arbitrary point in the imaged scene and its corresponding 3-D homogeneous image coordinates b = h·(X Y 1)^t, the foregoing transformation is represented as the linear transformation

  (x' y' z')^t = M·(x y z 1)^t = (m_1 m_2 m_3 m_4)·(x y z 1)^t = x·m_1 + y·m_2 + z·m_3 + m_4   (1)

  X = (x'/z')·f ,  Y = (y'/z')·f   (2)

where the focal length f is explicitly handled. Here the camera calibration problem is defined as the problem of recovering the 3×4 matrix M and the focal length f of equation 1 from given multiple pairs of homogeneous world coordinates and their corresponding homogeneous image coordinates. Equation 1 means that the 3-D viewing coordinates (x' y' z') are expressed as a linear combination of the three vectors { m_1 m_2 m_3 }, and hence we may regard the three vectors { m_1 m_2 m_3 } as the basis vectors of the 3-D world coordinate system o-xyz. On the other hand, the vector m_4 is the displacement vector shifting from the origin of the 3-D viewing coordinate system to that of the 3-D world coordinate system.

Here we imagine a plane quadrilateral whose four vertices are given by the four definite coplanar cue points, and we refer to this plane quadrilateral as the cue quadrilateral. As a common coordinate system to all image frames, we define the 3-D world coordinate system o-xyz whose x-y cross section contains the cue quadrilateral, that is to say, whose z-axis is normal to the cue quadrilateral. Moreover, without loss of generality, we put the origin of the 3-D world coordinate system o-xyz at one of the coplanar cue points, e.g. P1. In this case, we can represent the 3-D world coordinates of the four coplanar cue points P1 ~ P4 with

  p_1 = (x_1 y_1 z_1)^t = 0 ,  p_i = (x_i y_i z_i)^t = (x_i y_i 0)^t ; i = 2, 3, 4   (3)

Assuming that the focal length f of the camera has been accurately estimated in some way, which will be described in the next section, we can easily recover the 3×4 transformation matrix M of equation 1 from the four pairs of the 3-D world coordinates p_i = (x_i y_i 0)^t of each cue point P_i and its corresponding image coordinates p_i = (X_i Y_i)^t. Substituting the four coordinate pairs into equation 1, we reach the simultaneous equations

  (x_i' y_i' z_i')^t = N·(x_i y_i 1)^t ,  X_i = x_i'/z_i' ,  Y_i = y_i'/z_i' ; i = 1, 2, 3, 4   (4)

  N = ( n_11 n_12 n_14
        n_21 n_22 n_24
        n_31 n_32 n_34 )

where the focal length f is implicitly included in the expression of the 3×3 matrix N, and the matrix N is related to the matrix M as follows:

  ( m_11 m_12 m_14         ( n_11/f n_12/f n_14/f
    m_21 m_22 m_24 )  = k·   n_21/f n_22/f n_24/f    (5)
    m_31 m_32 m_34           n_31   n_32   n_34   )

The simultaneous equations given by equation 4 are linear with respect to the nine unknown variables { n_11, ..., n_34 }, and hence we can easily solve them. Their solution is expressed with a scale factor k. Moreover, given the focal length f of the camera, we can recover the column vectors { m_1 m_2 m_4 } of the matrix M by applying the relation of equation 5. With regard to the column vector m_3 of the matrix M, we should employ a vector which is normal to both of the two column vectors { m_1 m_2 }, e.g.

  m_3 = |m_1|·(m_1 × m_2) / |m_1 × m_2|   (6)

Thus we can recover the 3×4 transformation matrix M of equation 1.

2.3 Estimation of Focal Length

Once we recover the foregoing transformation matrix N of equation 4, we can estimate the relative depth z_i' of each coplanar cue point P_i as follows:

  z_i' = m_31·x_i + m_32·y_i + m_34 = n_31·x_i + n_32·y_i + n_34   (7)

Thus we get an estimate of the 3-D viewing coordinates p_i' = (x_i' y_i' z_i')^t of each coplanar cue point P_i as

  p_i' = z_i'·(X_i/f  Y_i/f  1)^t   (8)

The lengths of the four sides of the cue quadrilateral are assumed to be known in advance; furthermore, taking into consideration the fact that the ratio of the lengths of two sides arbitrarily chosen out of the four sides is invariant irrespective of the definition of the 3-D coordinate system, we get the relation

  |p_2' - p_1'|² / |P_2 - P_1|² = |p_4' - p_1'|² / |P_4 - P_1|²   (9)

Substituting equation 8 along with equation 7 into equation 9, we obtain a quadratic equation with respect to the focal length f. Its solution is given by

  f = sqrt( (r·C - A) / (B - r·D) )   (10)

where r is the known ratio of the squared side lengths appearing in equation 9, and A, B, C, D are scalar coefficients arising from the substitution.

2.4 Comparison with the Existing Camera Calibration Method

Most of the usual camera calibration methods do not employ any cues, and some of them involve the computation of an eigenvector with the minimum eigenvalue for a certain positive-definite matrix, which is often sensitive to noise [4]. On the other hand, the proposed cue-based camera calibration method requires only one solution of linear simultaneous equations with nine unknown variables and simple scalar arithmetic operations, and hence its computational algorithm works numerically stably. In addition, the proposed cue-based camera calibration method explicitly identifies the focal length f of the camera.
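The linear step of equation 4 amounts to fitting a plane-to-image homography from the four cue points. A sketch of that solve, assuming NumPy and fixing the scale by setting n_34 = 1 (our normalisation choice, valid when the cue plane does not pass through the camera origin; the function name is illustrative):

```python
import numpy as np

def estimate_N(world_xy, image_XY):
    """Solve the linear system of equation (4) for the 3x3 matrix N,
    up to its scale factor, from four coplanar cue points.

    Each correspondence (x, y) -> (X, Y) gives two linear equations via
    X = (n11 x + n12 y + n14) / (n31 x + n32 y + n34), and similarly for Y.
    """
    A, b = [], []
    for (x, y), (X, Y) in zip(world_xy, image_XY):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y]); b.append(X)
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y]); b.append(Y)
    n = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                        rcond=None)[0]
    return np.array([[n[0], n[1], n[2]],
                     [n[3], n[4], n[5]],
                     [n[6], n[7], 1.0]])
```

From the recovered N, the relative depths of equation 7 follow directly from its third row, after which the focal-length step of section 2.3 can be applied.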
3. DIGITAL MOVING IMAGE PRODUCTION

We have imaged the scene in our laboratory while manually moving an 8-mm handy TV camera for home use, and then we have applied the foregoing cue-based camera calibration method to the moving image sequence, each image frame of which is composed of 720 × 486 pixels. In the imaged scene we have put an A4-size paper on the floor of the laboratory, and we have used the four vertices of the A4-size paper as the four coplanar cue points. Moreover, we have tracked certain feature points designated in the scene with the existing standard feature tracking algorithm, which computes the position of a square feature window minimizing the sum of the squares of the intensity differences over the feature window from one image frame to the next [5]. Furthermore, we have performed the digital image production task of making up a moving image sequence from a synthetic 3-D CG image sequence of rotating and shifting bricks and the real moving image sequence of our laboratory, according to the recovered estimates of the camera's motion, position and orientation. Figure 2 shows an image frame chosen from the resultant compound moving image sequence. As shown in figure 2, we can hardly identify artificial distortions in the compound image sequence, which demonstrates that the cue-based camera calibration method works satisfactorily for the foregoing digital image production task.

4. CONCLUSIONS

In this paper, we have presented a method for performing camera calibration along with accurate estimation of the focal length of the camera by using four definite coplanar points as a cue. The practical computational algorithms for the cue-based method of camera calibration are composed of simple linear algebraic operations and arithmetic operations, and hence they work well enough to provide accurate estimates of the camera's motion, position and orientation stably. Moreover, we have applied the cue-based camera calibration method to the digital moving image production task of making up an image sequence from a synthetic 3-D CG image sequence and a real moving image sequence taken with a TV camera, according to the recovered estimates of the camera's motion, position and orientation. Experimental simulations demonstrate that the cue-based camera calibration method works satisfactorily for the digital moving image production task.
REFERENCES

[1] K. Deguchi: "Image of 3-D Space: Mathematical Geometry of Computer Vision," Shoukodo Press, Tokyo, Japan, 1991.
[2] H. C. Longuet-Higgins: "A Computer Algorithm for Reconstructing a Scene from Two Projections," Nature, vol. 293, pp. 133-135, 1981.
[3] Y. Nakazawa, et al.: "A Robust Object-Specified Active Contour Model for Tracking Line-Features and Its Practical Application," submitted to IEEE ICIP '96 (accepted).
[4] R. Horaud, et al.: "An Analytic Solution for the Perspective 4-Point Problem," Computer Vision, Graphics, and Image Processing, vol. 47, pp. 33-44, 1989.
[5] C. J. Poelman and T. Kanade: "A Paraperspective Factorization Method for Shape and Motion Recovery," Lecture Notes in Computer Science, vol. 801, pp. 97-108, 1994.
Figure 1 - Coordinate systems.

Figure 2 - Image frames chosen from the resultant compound moving image sequence.
Session X: SIGNAL PROCESSING II
A Novel Approach to Phoneme Recognition using Speech Image
M Ahmadi, N J Bailey, B S Hoyle
The University of Leeds, Department of Electronic & Electrical Eng., Leeds, LS2 9JT, England
Tel: +44 (113) 233 2016  Fax: +44 (113) 233 2032  E-mail: <[email protected]>

Abstract- In this paper a novel feature extraction technique based on the two-dimensional DCT (Discrete Cosine Transform) of the spectrogram is proposed. This is in contrast to conventional approaches based on one-dimensional analysis such as LPC, cepstral, or FFT features. In order to demonstrate the novel approach, two tasks, word recognition and phoneme recognition, were conducted. The word recognition task was carried out as a preliminary study, using a small database of 30 names spoken by 15 speakers. For the phoneme recognition task, a series of experiments was conducted on the voiced stops ('b', 'd', 'g') of the TIMIT [1] database, uttered by 630 speakers (male & female). The extracted data form the basis of input patterns for training two types of neural networks: a semi-dynamic network, the Time-Delay Neural Network (TDNN), and a static network, the Multilayer Perceptron (MLP). For the word recognition task, a recognition rate of 86 percent was achieved for 7 names using the TDNN. For the phoneme recognition task, the highest recognition rates of 77.5 and 72.4 percent were recorded for the TDNN and MLP respectively. The results for phoneme recognition contrast with the 72 percent quoted by Hwang et al. [2] for the same phonemes spoken by 40 females.

1. Introduction
Since the advent of neural networks there has been a growing interest in automatic speech recognition. Many reviews have been carried out on different approaches to training a neural network [3,4]. The ultimate aim of this challenging task is to develop a system for speaker- and text-independent speech recognition. However, the current status of research falls well short of a comprehensive solution to this problem. This paper aims to be a step forward in that direction.
The proposed idea has shown significant improvements in recognition accuracy as well as in convergence rate for cross-validation and training. Fig. 1 illustrates the overall system model. In the following section the input data are defined. Section 3 explains the processing and feature extraction algorithm. In section 4 the neural network structures are discussed, and the results of the different neural networks on the TIMIT database are quoted.
Fig. 1 The overall system: Input Speech → Pre-processing → Feature Extraction (image processing) → Neural Networks (TDNN & MLP) → Recognised Speech.
2. Data Collection
2.1. Word Database
The objective of this task was to recognize a number of names of personnel within our department. The names were recorded under room conditions with a noisy background, by means of a good tape recorder and a dynamic microphone, in order to produce samples such as may occur in a real application. Thirty names were recorded by male and female speakers. The recordings were then transferred to a Sun workstation in μ-law 8-bit .au format, sampled at 8 kHz, and edited to a length of 750 msec. Names of shorter length were padded to fit the fixed length. The data were then converted to 16-bit linear format at the same sampling rate, and finally divided into training and testing sets ready for processing. The recorded speech signals were not centered or time-aligned, so that they would correspond to real data.
2.2. Phoneme Database
The voiced stop ('b', 'd', 'g') data were extracted from natural continuous speech uttered by 630 speakers from 8 different regions of the TIMIT [1] database. Over 2000 utterances were selected for training the neural networks and about 1250 utterances in total were selected for cross-validation. The training and cross-validation (i.e. testing) sets contain the same number of male and female speakers.
3. Data Processing & Feature Extraction
Feature selection refers to the choice of certain attributes of an image that are required for recognition/classification purposes. A fundamental principle in digital image processing is the ability to represent the image in a space in which the attributes of the picture are not correlated. An orthogonal transform has two such distinct properties: I. it decorrelates the signal in the transform domain; II. it packs the most variance into the fewest transform coefficients. The DCT [5] is the best sub-optimal orthogonal transform in comparison with the KLT (Karhunen-Loeve Transform), which is referred to as the optimal transform. Fig. 2 illustrates the MSE of different orthogonal transforms [6] versus block size. Smaller blocks are chosen rather than the entire image for three major reasons. Firstly, to exploit the redundancy in a set of pixels. Secondly, processing a number of smaller blocks is computationally less intensive, which relaxes the real-time constraint for most practical purposes. Finally, any one pixel in a picture is likely to be closely related to the four pixels that surround it, and similarly each of these four pixels is likely to bear the same relation to its respective neighbours, but the original pixel is unlikely to be related to one a long distance away. Therefore, by splitting the image into a number of smaller blocks we hope to form groups of pixels that are statistically related, with a consequently high level of redundancy. The generated wide-band spectrogram was broken into a number of P×Q (P = Q = 8) pixel blocks, as shown in Fig. 3, where R and C are the dimensions of the spectrogram.
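The block-wise feature extraction described above can be sketched as follows: split the spectrogram into P×Q blocks, take the 2-D DCT of each block, and keep the DC (top-left) coefficient as the key feature. This is a minimal numpy sketch under the paper's stated parameters (P = Q = 8); the spectrogram itself is assumed given.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: C @ x computes the 1-D DCT of x."""
    C = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1)
               * np.arange(n)[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def block_dct_features(spectrogram, P=8, Q=8):
    """Split an RxC spectrogram into PxQ blocks, take the 2-D DCT of each
    block and keep the DC coefficient as the block's key feature."""
    Cp = dct_matrix(P)
    R, Ccols = spectrogram.shape
    feats = []
    for r in range(0, R - P + 1, P):
        for c in range(0, Ccols - Q + 1, Q):
            block = spectrogram[r:r + P, c:c + Q]
            coeffs = Cp @ block @ Cp.T    # separable 2-D DCT
            feats.append(coeffs[0, 0])    # DC (top-left) element
    return np.array(feats)
```

For the orthonormal DCT the DC coefficient of a P×P block equals P times the block mean, so a constant-valued 8×8 block yields a DC feature of 8 times that value.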
Fig. 2 Mean square error versus block size for different orthogonal transforms.
Fig. 3 Image segmentation.
Fig. 4 Processed P×Q block.

A 2D-DCT of each 8×8 block was calculated and the m key features were extracted, as shown in Fig. 4. In the case of word recognition, blocks of 16×16 were selected. As shown in Fig. 4, the frequency increases along the vertical and horizontal axes, starting at the dc element situated at the top left corner (the first element) and ending at the 64th element at the bottom right, with the highest frequency. Most of the information in each processed block is stored in the low frequency region. The dc component was selected as the key feature from each individual block and stored in a pattern file for training the neural networks. The overall system consists of three main sections, as shown in Fig. 5. In the pre-processing stage the analogue data are converted to 16-bit linear data. The second stage performs the image processing and key feature extraction, and finally in the last section the generated patterns are used to train and test the two neural networks.
4. Neural Network Structure and Results
The selected data form the basis of input patterns for training the neural networks. In this study a semi-dynamic neural network (Time-Delay Neural Network, TDNN) and a static network (Multilayer Perceptron, MLP) are trained for recognition purposes. These two networks were used in order to investigate whether the processed spectrogram needs a network adapted to the dynamic behaviour of the speech signal, or whether the extracted features are adequate for a simple static network.
4.1. Word recognition
A simple structure was used: a 16×14 input layer with three sliding windows, a first hidden layer of size 14×8 with five sliding windows, a second hidden layer of size 10×3, and finally N output nodes, where N is the number of names to be recognized. The network was trained for speaker-independent recognition of 3, 5, and 7 names. After training, the performance of the TDNN was tested with test patterns. The networks with 3 and 5 outputs showed 100% accuracy in word recognition. The network with 7 classes achieved a lower accuracy of 86% on the test data set. The result for the 7 names could be improved by introducing time alignment and centering the signal.
4.2. Phoneme recognition
The proposed procedure reduces the number of input nodes in the training patterns and at the same time provides more prominent features in the data set. For example, in this experiment the input features are reduced to 72 (8×9), compared to the 240 (16×15) reported by Waibel [1] for the same task. For a TDNN, the reduction in the number of input units translates to a smaller number of hidden nodes (reducing the total number of connections), which in turn results in shorter training time and a better convergence rate.
The TDNN used for this experiment has an input layer of 8×9 (72) nodes with three sliding windows (8×9/3), a first hidden layer of 6×7 (42) nodes with four sliding windows (6×7/4), a second hidden layer of 3×3 (9) nodes, and finally 3 outputs, i.e. 8×9/3 - 6×7/4 - 3×3 - 3. In the case of the MLP the same numbers of inputs and outputs were used, i.e. 72 and 3 respectively, but only one hidden layer of 20 nodes was used, in comparison with the two hidden layers of the TDNN. The full set of results is given in Table 1. The highest recognition rates of 77.5 and 72.4 percent were recorded for the TDNN and MLP respectively. These results contrast with the 72 percent quoted by Hwang et al. [2] for the same phonemes spoken by only 40 female speakers.
Fig. 5 The speech recognition system: analog speech signal → ADC (8 kHz samples, 8-bit mu-law) → convert to 16-bit linear → spectrogram → choose P, Q → divide into L segments of P×Q → 2D-DCT of each segment → take m features from each segment (zigzag scanning) → store as pattern file → classifier (TDNN) → recognised phoneme/word.

Table 1 Results on the TIMIT database (recognition rates, percent).

    NN Type | Training | Testing
    TDNN    | 95       | 77.5
    MLP     | 89       | 72.4
References
[1] Waibel A. H., Hanazawa T., Hinton G., Shikano K., Lang K., "Phoneme Recognition Using Time-Delay Neural Networks", IEEE Trans. on ASSP, Vol. 37, No. 3, March 1989.
[2] Hwang J., Li H., "Interactive Query Learning for Isolated Speech Recognition", Proc. of IEEE Neural Networks for Signal Processing II, Denmark, 31 Aug.-2 Sep. 1992, pp. 93-102.
[3] Lippmann R. P., "Review of Neural Networks for Speech Recognition", Neural Computation, Vol. 1, pp. 1-38, MIT Press, 1989.
[4] Rabiner L., Juang B. H., "Fundamentals of Speech Recognition", Prentice-Hall International Inc., 1993.
[5] Ahmed N., Natarajan T., Rao K. R., "Discrete Cosine Transform", IEEE Trans. on Computers, Jan. 1974.
[6] Rao K. R., Yip P., "Discrete Cosine Transform: Algorithms, Advantages, Applications", Academic Press Inc., 1990.
MODIFIED NLMS ALGORITHM FOR ACOUSTIC ECHO CANCELLATION
M. MEDVECKY
SLOVAK TECHNICAL UNIVERSITY, FACULTY OF ELECTRICAL ENGINEERING AND INFORMATION TECHNOLOGY, DEPARTMENT OF TELECOMMUNICATIONS, ILKOVICOVA 3, 812 19 BRATISLAVA, SLOVAKIA
Abstract
This paper presents a modified version of the normalised least-mean-square algorithm (M-NLMS). The M-NLMS algorithm was developed for efficient weight adaptation of the modified finite impulse response (M-FIR) filter, which is especially suitable for modelling systems with a very long, decaying and time-varying impulse response in finite-precision arithmetic, in applications such as the hardware realisation of an acoustic echo canceller. The derivation of the M-NLMS algorithm, an investigation of its parameters and simulation results are presented, together with a comparison of acoustic echo cancellers using an adaptive FIR filter adapted by the NLMS algorithm and an adaptive M-FIR filter adapted by the M-NLMS algorithm. The better performance of the acoustic echo canceller with the M-NLMS algorithm is demonstrated by simulation results.
1. Introduction
The main problem of acoustic echo cancellation is the precise identification of the acoustic impulse response. The actual structure of the echo path is usually modelled by an FIR filter. Such a filter has the advantages of guaranteed stability during adaptation and a unimodal mean square error (MSE) surface, and the principle of FIR filter response computation is similar to the acoustic echo origination process. The FIR filter may require up to thousands of adaptive weights to correctly identify the acoustic impulse response. Such a number of taps results in a large arithmetic complexity, which is a crucial problem for real-time implementations such as handsfree telephones or videoconferencing. The solution is a hardware realisation of the acoustic echo canceller, which enables a parallel processing implementation. Unfortunately, the hardware realisation brings a problem with computation precision, and arithmetic precision has a profound impact on realisation complexity. Hardware realisations of the acoustic echo canceller in VLSI or ASIC circuits use, for the sake of simpler realisation, fixed-point arithmetic, which yields poorer performance than floating-point arithmetic. Furthermore, the room impulse response has a very specific shape: a short delay followed by an exponentially decaying tail. Modelling such a characteristic by a transversal filter with fixed-point arithmetic leads, due to round-off error, to a gradual degradation of the filter weight accuracy towards the end of the filter [1]. This yields a poorer performance of the acoustic echo canceller. The problem can be solved by partitioning the impulse response in the time or frequency domain [2]; the filter weights may then have a different gain in each partition. Since the room impulse response changes, the partitions and gains should be changed as well.
To solve this problem, a modified structure of the FIR filter (M-FIR) and a modified version of the NLMS algorithm (M-NLMS) have been derived and are presented here.
2. Modified FIR filter
When we use an adaptive transversal filter to model an acoustic echo, the output y_k of the transversal filter is obtained as a sum of weighted contributions from the last N samples of the input signal x. It can be written as follows:

    y_k = Σ_{i=0}^{N-1} y_ik = Σ_{i=0}^{N-1} x_{k-i} w_ik        (1)

where w_ik are the weights of the adaptive filter at time k. If d_k is a sample of the desired signal at time k, then the error e_k is given as

    e_k = d_k - y_k = d_k - (y_0k + y_1k + ... + y_(N-1)k) = (d_k - y_0k) - y_1k - ... - y_(N-1)k        (2)
Applying the following substitutions

    d_0k = d_k        (3)
    e_ik = d_ik - y_ik        (4)
    d_(i+1)k = e_ik        (5)

we obtain a new topology of an adaptive filter. The block diagram of the Modified FIR filter is shown in Fig. 1.
Fig. 1. Modified FIR filter.

As can be seen, the values of the errors e_ik, as well as the values of the weights w_ik, decrease with increasing i. Since the values decrease, we can increase the computation accuracy by multiplying (bit shifting) the e_ik and w_ik when their values fall under specific levels (thresholds). The multiplying can be made adaptive. Choosing the multiplier coefficients from the range 2^n enables the multiplication to be realised very simply by rotation. The hardware realisation of such an adaptive acoustic echo canceller can have a computation complexity equal to that of the classical FIR filter. More details about the M-FIR filter implementation and internal data representations can be found in [3].
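The cascade topology of Equations (1)-(5) can be sketched as follows: each stage subtracts its partial product from the running desired signal, so the final stage's error equals the classical FIR error. This is a floating-point sketch for illustration; the point of the M-FIR filter, the per-stage fixed-point shifting, is omitted here.

```python
import numpy as np

def mfir_stage_errors(x, w, d, k):
    """One time step k of the M-FIR cascade (Equations 1-5)."""
    N = len(w)
    dk = d[k]                      # d_0k = d_k            (3)
    for i in range(N):
        y_ik = x[k - i] * w[i]     # partial output y_ik
        dk = dk - y_ik             # e_ik = d_ik - y_ik (4); d_(i+1)k = e_ik (5)
    return dk                      # final error e_(N-1)k

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
w = rng.standard_normal(8)
d = rng.standard_normal(32)
k = 20
e_cascade = mfir_stage_errors(x, w, d, k)
# Classical FIR error e_k = d_k - y_k (Equation 2) for comparison.
e_direct = d[k] - sum(x[k - i] * w[i] for i in range(8))
assert np.isclose(e_cascade, e_direct)
```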
3. Modified NLMS algorithm
Consider the situation where a real-valued signal and a filter with real-valued coefficients must be realised on a system with fixed-point arithmetic. In this case, scaling of data and coefficients to best fit the dynamic range of the system is required. In practical terms, scaling reduces to selecting a crest factor appropriate for the signal characteristics and the precision used in storage and arithmetic. The internal scaling can then be realised by normalisation to a value δ that is the signal power level rounded to the nearest 2^n value; the variable δ represents a basic shift in the internal number representation. Considering the above implementation, the convergence factor μ for the M-NLMS algorithm can be expressed as

    μ_k = α* / (γ* + x_k^T x_k)        (6)

where α* and γ* are the coefficients α and γ from the NLMS algorithm shifted by the value δ. The term x_k^T x_k in equation (6) can be obtained at time k by the iteration
    x_k^T x_k = x_{k-1}^T x_{k-1} - x_{k-N-1}^2 + x_k^2        (7)

and in our case, for the internally scaled (δ-shifted) samples x*,

    x*_k^T x*_k = x*_{k-1}^T x*_{k-1} - (x*_{k-N-1})^2 + (x*_k)^2        (8)
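For reference, the floating-point baseline that the M-NLMS algorithm builds on can be sketched as follows: the normalised step of Equation (6) with the sliding power update of Equation (7). The δ-scaled fixed-point shifts that distinguish M-NLMS are intentionally omitted; parameter names are illustrative.

```python
import numpy as np

def nlms(x, d, N, alpha=1.0, gamma=1e-6):
    """Baseline NLMS: w <- w + mu_k * e_k * x_k, with the input power
    x_k^T x_k maintained by a sliding recursion rather than recomputed."""
    w = np.zeros(N)
    power = 0.0
    e_hist = np.zeros(len(x))
    for k in range(len(x)):
        xk = np.array([x[k - i] if k - i >= 0 else 0.0 for i in range(N)])
        # sliding update of x_k^T x_k: add newest sample, drop the expired one
        power += x[k] ** 2
        if k - N >= 0:
            power -= x[k - N] ** 2
        e = d[k] - w @ xk
        mu = alpha / (gamma + power)   # convergence factor, Equation (6)
        w += mu * e * xk
        e_hist[k] = e
    return w, e_hist
```

In a noiseless system-identification run the weights converge to the true echo path, which is the behaviour the fixed-point M-NLMS variant is designed to preserve under limited precision.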
The new values of the filter weights are obtained as follows. Let us define

    w*_i(k+1) = (w_ik δ) / δ_wk + ((μ_k e_k x_{k-i}) δ) / δ_e        (9)

and

    w**_i(k+1) = (w*_i(k+1) δ_w(k+1)) / δ        (10)

where δ_wk is an additional shift of the weight w_i at time k compared to the basic shift δ, and δ_w(k+1) is an additional shift of the filter weight w_i at time k+1 compared to the basic shift δ. Here e_k is the final error (see Fig. 1) obtained at time k and δ_e is the additional shift of this error. The brackets in equations (9) and (10) define the sequence of arithmetic operations, which guarantees that each entire term can be retained without prematurely truncating its precision and without resorting to extended-width internal buses.
Next we define the following six conditions:

    C1 = 1 if |w**_i(k+1)| > W_Mw        (11a)
    C2 = 1 if |w**_i(k+1)| > W_Hw        (11b)
    C3 = 1 if δ_w(k+1) > 1        (11c)
    C4 = 1 if |w**_i(k+1)| < W_Lw        (11d)
    C5 = 1 if |w**_i(k+1)| <> 0        (11e)
    C6 = 1 if δ_w(k+1) < δ_Mw        (11f)
where δ_Mw is the maximum acceptable additional shift of the weight w_i. The threshold W_Mw is the maximum value of w_i. The thresholds W_Hw and W_Lw define high/low levels such that, when they are crossed, the weight w_i can be increased/decreased (shifted up/down) by the unit step δ_1. Now we define four logical functions:

    f1 = C1        (12a)
    f2 = C2 ∧ C3 ∧ ¬f1        (12b)
    f3 = C4 ∧ C5 ∧ C6 ∧ ¬f2        (12c)
    f4 = ¬f3        (12d)

If f1 = 1 (True), then

    δ_w(k+1) = 1        (13)
    w_i(k+1) = w*_i(k+1) / δ        (14)

else, if f2 = 1 (True), then

    δ_w(k+1) = δ_w(k+1) / δ_1        (15)
    w_i(k+1) = w**_i(k+1) / δ_1        (16)

if not, then if f3 = 1 (True)

    δ_w(k+1) = δ_w(k+1) δ_1        (17)
    w_i(k+1) = w**_i(k+1) δ_1        (18)

and if f4 = 1 (True), then

    w_i(k+1) = w**_i(k+1)        (19)

and δ_w(k+1) does not change. The value of δ_w can be computed from two one-bit (Boolean) variables δ_U and δ_D, which indicate the shift of δ_w in stage i relative to the shift in stage i-1. This approach saves storage space and is implemented in the M-FIR filter. The new values of δ_U and δ_D are set up as follows:

    δ_U,i = ¬f1 ∧ f3        (20)
    δ_D,i = ¬f1 ∧ f2        (21)
4. Implementation
The hardware realisation of the M-NLMS algorithm can decrease the computation complexity to that of the classical NLMS algorithm. The conditions (11) can be evaluated simply by comparators. The functions (12), (20) and (21) can be generated by logic elements (AND gates) or by look-up tables. When the thresholds and the unit step δ_1 are set as powers of two, the multiplications and divisions in equations (6)-(21) can be realised by multiplexer, demultiplexer or simply by bit shifting.
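The power-of-two multiply/divide mentioned above reduces to a bit shift; a minimal sketch (the 16-bit, 11-fractional-bit representation is the one used in the simulations below, the helper name is illustrative):

```python
def shift_mul(value, delta_exp):
    """Multiply or divide a fixed-point integer by 2**delta_exp using
    shifts, as a hardware bit-shift unit would; a negative exponent
    divides (arithmetic right shift)."""
    if delta_exp >= 0:
        return value << delta_exp
    return value >> -delta_exp

# 16-bit integer with 11 fractional bits: basic shift delta = 2**11 = 2048.
DELTA = 1 << 11
w_fixed = int(0.37 * DELTA)          # quantise a weight to this format
assert shift_mul(w_fixed, 1) == w_fixed * 2      # shift up by delta_1 = 2
assert shift_mul(w_fixed, -1) == w_fixed // 2    # shift down by delta_1 = 2
```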
5. Simulation results
To verify the better performance of the Modified NLMS algorithm for acoustic echo cancellation, the following computer simulations were carried out. In the simulations, the impulse response of a real teleconference room with a length of 4000 samples was to be suppressed. Two adaptive filters with 4000 coefficients were used: an FIR filter adapted by the NLMS algorithm and an M-FIR filter adapted by the M-NLMS algorithm. The filter parameters were the same. In both cases the real data were scaled and internally represented by 16-bit integers with 11 bits for the fractional part (the basic shift δ = 2048). The unit step for the M-NLMS algorithm was δ_1 = 2. The measuring signal was white Gaussian noise. A total of 50000 iterations was made for each experiment. The convergence characteristics of the NLMS and M-NLMS algorithms are shown in Fig. 2. As can be seen, while the NLMS algorithm reaches only 30 dB echo suppression, the M-NLMS algorithm overcomes the 40 dB level defined by the ITU-T.
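The echo-suppression levels quoted above can be measured, for instance, as echo return loss enhancement (ERLE); a minimal sketch (the paper does not specify its exact measurement formula, so this is an illustrative assumption):

```python
import numpy as np

def erle_db(d, e):
    """Echo return loss enhancement in dB: power of the microphone (echo)
    signal d over power of the residual error e after cancellation."""
    d = np.asarray(d)
    e = np.asarray(e)
    return 10.0 * np.log10(np.sum(d ** 2) / np.sum(e ** 2))

# A residual attenuated by a factor of 100 in amplitude corresponds to
# 40 dB, the ITU-T level mentioned above.
d = np.ones(1000)
e = 0.01 * d
```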
Fig. 2. Comparison of the convergence characteristics of echo cancellers adapted by the NLMS and M-NLMS algorithms.

The dependence of the level of acoustic echo cancellation on the initial value of the normalised adaptation coefficient α, for the NLMS and M-NLMS algorithms, is shown in Fig. 3. As can be seen, the higher level of acoustic echo cancellation is reached by the M-FIR filter adapted by the M-NLMS algorithm. Furthermore, while the NLMS algorithm reaches its maximum level of acoustic echo cancellation for higher values of the normalised convergence parameter, α ≈ 2, and its fastest convergence for α ≈ 1, the M-NLMS algorithm reaches both its maximum level of acoustic echo cancellation and its fastest convergence for the same value α ≈ 1. (Note: the parameter α can be in the range 0 < α < 2.) Therefore, the choice α = 1 can decrease the computation complexity of the M-NLMS algorithm.
Fig. 3. Dependence of the level of acoustic echo cancellation on the normalised adaptation coefficient α.
6. Summary
In this paper a new modified version of the NLMS algorithm (M-NLMS) is presented. It is shown that the M-NLMS adaptive algorithm can achieve better performance for acoustic echo cancellation than the NLMS algorithm with the same parameters. The implementation of the weight-shifting algorithm enables better exploitation of the dynamic range given by the number of bits used for data representation. The effect of adaptive weight shifting is similar to floating-point arithmetic, but its hardware implementation is much simpler. The hardware realisation achieves the same computation complexity as the classic FIR filter and NLMS algorithm.
References
[1] Treichler, J. R., Johnson, C. R., Larimore, M. G.: Theory and Design of Adaptive Filters. John Wiley & Sons, New York, 1987.
[2] Borrallo, J. M. P., Otero, M. G.: "On the Implementation of a Partitioned Block Frequency Domain Adaptive Filter (PBFDAF) for Long Acoustic Echo Cancellation". Signal Processing, Vol. 27, 1992, pp. 301-315.
[3] Medvecky, M.: "Improvement of Acoustic Echo Cancellation in Hands-free Telephones". In: 1st International Conference on Telecommunications Technologies TELEKOMUNIKACIE'95, Bratislava, 31.5-1.6.1995, Vol. 1, 1995, pp. 127-132. (in Slovak)
Matrix Polynomial Computations Using the Reconfigurable Systolic Torus
T. H. Kaskalis*  K. G. Margaritis
Department of Informatics, University of Macedonia
156 Egnatia str., 54006 Thessaloniki, Greece
E-mail: {kaskalis,kmarg}@macedonia.uom.gr
Abstract
A wide range of matrix functions, including matrix exponentials, inversions and square roots, can be transformed to matrix polynomials through Taylor series expansions. The efficient computation of such matrix polynomials is considered here, through the exploitation of their recursive nature. The Reconfigurable Systolic Torus is proposed for its ability to implement iterative equations of various forms. Moreover, a detailed example of the matrix exponential realization is presented, together with the scaling and squaring method. The general design concepts of the Reconfigurable Systolic Torus are discussed and the algorithmic steps needed for the implementation are presented. The Area and Time requirements, together with the accomplished utilization percentage, conclude the presentation.
1 Introduction
The solution of various types of equations, appearing in many mathematical models, dynamic probabilistic systems and in stochastic and control theory, often requires the calculation of distinct matrix functions [1, 2, 6, 13]. Such functions include matrix exponentials (e^A), matrix inversions (A^{-1}), matrix roots (A^{1/2}, A^{-1/2}) or functions of the form cos A, sin A, log A, etc. As a result, the transformation of these polynomials to iterative algorithms, and their consequent efficient implementation, becomes an important issue. The Reconfigurable Systolic Torus [7] is a structure designed to implement iterative equations of various forms and is, therefore, a good candidate for the realization of matrix polynomial computations.
2 Systolic Implementation Example: The Matrix Exponential
In order to present the distinct steps for the implementation of a particular matrix function, we will focus on the matrix exponential example. Given a matrix A, the exponential e^A can be formally defined by the following convergent power series:

    e^A = I + A + A^2/2! + A^3/3! + ...        (1)
The straightforward algorithmic approach for calculating the exponential of a matrix is the Taylor series approximation technique:

    e^A ≈ T_k(A) = Σ_{p=0}^{k} A^p / p!        (2)
However, such an algorithm is known to be unsatisfactory, since k is usually very large for a sufficiently small error tolerance. Furthermore, the round-off errors and the computing costs of the Taylor approximation increase as ||A|| increases [10]. We can surpass these difficulties by using the fundamental property:
    e^A = (e^{A/m})^m        (3)
Moreover, if we employ the scaling and squaring method, we choose m to be a power of two, such that:

    m = 2^l  and  ||A||/m = ||A||/2^l ≤ 1        (4)
This approach guarantees a satisfactory Taylor approximation. We use Equation 2 for the calculation of e^{A/m} and then e^A is formed by l successive squarings [2]. For a given error tolerance ε and magnitude ||A||, Table 1 summarizes the optimum (k, l) associated with [T_k(A/2^l)]^{2^l} [10]. According to the statements made above, the calculation of the exponential of a matrix A can be implemented following this algorithm:
*Supported by the Greek National Scholarship Foundation.
Table 1: The optimum (k, l) parameters for the Taylor series approximation.

    ||A||   ε=10^-3   ε=10^-6   ε=10^-9   ε=10^-12   ε=10^-15
    10^-2   1, 0      2, 1      3, 1      4, 1       5, 1
    10^-1   3, 0      4, 0      4, 2      4, 4       5, 4
    10^0    5, 1      7, 1      6, 3      8, 3       7, 5
    10^1    4, 5      6, 5      8, 5      7, 7       9, 7
    10^2    4, 8      5, 9      7, 9      9, 9       10, 10
    10^3    5, 11     7, 11     6, 13     8, 13      8, 14

1. Given a matrix A and an error tolerance ε, obtain the optimum values (k, l) from Table 1.
2. Calculate the matrix B = A/2^l.
3. Calculate T_k(B).
4. Calculate e^A ≈ [T_k(B)]^{2^l} through l successive squarings of T_k(B).
Steps 3 and 4 of the above algorithm carry the main computational burden of the whole problem. As a result, we should try to implement these operations efficiently. Step 3 of the algorithm is a matrix polynomial computation, which can be expressed by:
    X^(0) = I        (5)
    X^(p+1) = (1/(k-p)) B X^(p) + I ,  p = 0, 1, ..., k-1        (6)
In a more general form, the above mentioned expressions comprise the following iterative equation:
    X^(0) = I        (7)
    X^(p+1) = S^(p) X^(p) + Y ,  p = 0, 1, ..., k-1        (8)
where:

    S^(p) = (1/(k-p)) B        (9)
    Y = I  (constant)        (10)
Step 4 of the algorithm is a successive matrix squaring, given by the following recurrence:

    X^(0) = T_k(B)        (11)
    X^(p+1) = X^(p) X^(p) ,  p = 0, 1, ..., l-1        (12)

Obviously:

    X^(l) = [T_k(B)]^{2^l}        (13)
and the calculation of the matrix exponential is completed. The underlying computational structure for both the Taylor series evaluation and successive matrix powers can be a systolic network allowing for repeated matrix-matrix Inner Product Steps (IPS) [12]. Obviously, this systolic network should implement efficiently such operations, i.e. the input of each new computation should be the output of the previous computation and, possibly, some new matrix.
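The two recurrences, the Horner-style polynomial iteration of Equation 6 and the squaring iteration of Equation 12, can be sketched as a plain sequential program (what the systolic array computes, not the array itself):

```python
import numpy as np

def expm_taylor_squaring(A, k, l):
    """Scaling-and-squaring matrix exponential: the Horner recurrence
    for T_k(B) with B = A / 2**l, followed by l successive squarings."""
    n = A.shape[0]
    B = A / 2.0 ** l                        # step 2: scaling
    X = np.eye(n)                           # X^(0) = I              (5)
    for p in range(k):                      # X^(p+1) = BX/(k-p) + I (6)
        X = B @ X / (k - p) + np.eye(n)
    for _ in range(l):                      # X^(p+1) = X X          (12)
        X = X @ X
    return X                                # X^(l) ~ e^A            (13)
```

For ||A|| = 1 and ε = 10^-9, Table 1 gives (k, l) = (6, 3); with those parameters the result agrees with the exact exponential to well within that tolerance on a small diagonal test matrix.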
3 Reconfigurable Systolic Torus

The Reconfigurable Systolic Torus [7] is a structure designed to perform general iterative operations. The Area and Time requirements are kept low and flexibility is also retained. The design consists of iWarp-like [11] 4-input and 4-output Inner Product Step cells, which are interconnected in a reconfigurable manner. The term reconfigurable does not mean that the interconnection network changes during the computation phase, since that would violate the regularity requirement of the systolic structure [8, 9, 12]. Rather, the network can be configured in some predefined manner prior to the application of a recursive algorithm. In this design, 2 I/O channels of each IPS cell are allocated to the horizontal data stream and the other 2 I/O channels belong to the vertical data stream. One horizontal and one vertical I/O channel permanently follow the usual (neighbouring) systolic interconnection. Each of the other two channels can be in one of two predetermined configurations, presented below. These I/O channels serve the correct dissemination of the data items at the beginning of a new iteration. We will now examine the horizontal data stream, but the same statements also apply to the vertical data stream by simply interchanging the indices i, j.
Figure 1: The Reconfigurable Systolic Torus (n = 4): (a) static I/O channels with the vertical straight configuration; (b) the horizontal shuffle configuration for the 4 rows.

In the first case, which is called the "straight configuration", each cell (i, j) is connected in the normal systolic manner with its neighbours. That is, it accepts input from cell (i-1, j) and transmits output to cell (i+1, j). Data travelling through this channel are not processed or used at all and are transmitted as-is in the next time step. Considering the boundary IPS cells, cell (1, j) accepts input from outside and cell (n, j) discards its output. Finally, during the first IPS cycle of each new iteration, the data received in the previous time step through this channel are transmitted through the other (static) I/O channel. Next, we describe the second case of interconnection, called the "shuffle configuration". This interconnection serves the need to disseminate the newly produced x_ij^(p+1) elements to the correct IPS cells. As soon as cell (i, j) produces x_ij^(p+1), it outputs this value to cell (i', j), where
    i' = (i - j + 1) mod n + 1        (14)

and, accordingly, it accepts input from cell (i'', j), for which:

    (i'' - j + 1) mod n + 1 = i        (15)
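The shuffle mapping of Equations 14 and 15 can be checked directly in code: for every column j the map i → i' is a permutation of the rows, so each newly produced element reaches exactly one IPS cell.

```python
def shuffle_target(i, j, n):
    """Destination row i' of the shuffle channel for cell (i, j),
    per Equation 14 (1-based indices, as in the paper)."""
    return (i - j + 1) % n + 1

n = 4
for j in range(1, n + 1):
    targets = [shuffle_target(i, j, n) for i in range(1, n + 1)]
    # bijection: every row appears exactly once as a destination
    assert sorted(targets) == list(range(1, n + 1))

# Equation 15: the source cell i'' of cell (i, j) is the unique row whose
# shuffle target is i.
i, j = 2, 3
src = next(s for s in range(1, n + 1) if shuffle_target(s, j, n) == i)
assert shuffle_target(src, j, n) == i
```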
Figure 1 presents the implementation of Equation 8 for the n x n matrices X, S, Y and for n = 4, on a Reconfigurable Systolic Torus. The Y matrix is assumed preloaded in the IPS cells, where it remains throughout the whole operation. Moreover, X^(0) is also preloaded and then the structure begins functioning as if X^(0) had just been calculated. Figure 1.a represents the static I/O channels together with the vertical straight configuration. The dotted lines correspond to the horizontal configuration, which follows the shuffle principle and is presented in detail in Figure 1.b for the 4 rows of the systolic array. The static I/O channel is, obviously, the outer line, while the shuffle configuration is represented by the inner line. Note that the row cells follow the numbering of the x_ij elements they produce. In general, the Reconfigurable Systolic Torus performs the operations presented in Table 2, where the respective configurations are also listed. Equation 8 is solved if we employ the straight configuration for the vertical data stream and the shuffle configuration for the horizontal data stream (operation #1). After k iterations we reconfigure the vertical data stream to the shuffle configuration, load the null matrix Y = O and let the RST function for l recursions, in order to compute Equation 12 (operation #5). The Area and Time requirements of one of the operations presented in Table 2, with the preloading and unloading time intervals considered, are:

A = n² IPS cells    (16)

T = (k + 3)n time steps    (17)
and the utilization (efficiency) of the array is:

U = (array active computing time) / ((number of cells) x (number of time steps)) = 1 / (1 + 3/k)    (18)
Index  Equation Type                   Matrices Used    Horizontal      Vertical        Prior
                                                        Configuration   Configuration   Addition of Y
1      X^(p+1) = S^(p) X^(p) + Y       S, X, Y          shuffle         straight        -
2      X^(p+1) = S^(p) (X^(p) + Y)     S, X, Y          shuffle         straight        yes
3      X^(p+1) = X^(p) S^(p) + Y       S^T, X^T, Y^T    shuffle         straight        -
4      X^(p+1) = (X^(p) + Y) S^(p)     S^T, X^T, Y^T    shuffle         straight        yes
5      X^(p+1) = X^(p) X^(p) + Y       X^T, X, Y        shuffle         shuffle         -
6      X^(p+1) = X^(p) (X^(p) + Y)     X^T, X, Y        shuffle         shuffle         yes
7      Z^(p) = S^(p) X^(p) + Y         S, X, Y          straight        straight        -
Table 2: Configuration of the RST for the implementation of various operations.

while the respective values for the two operations considered in our problem are:

A = n²    (19)

T = (k + l + 4)n    (20)

U = 1 / (1 + 4/(k + l))    (21)

where the time interval for the reconfiguration of the vertical data stream is not considered.
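The Area, Time and utilization figures for the two chained operations can be checked numerically; the sketch below (rst_cost is a hypothetical helper of ours) computes A = n², T = (k + l + 4)n and U = 1/(1 + 4/(k + l)), ignoring the reconfiguration interval as the text does:

```python
def rst_cost(n, k, l):
    """Area (IPS cells), time (steps) and utilization of the RST for the
    two chained iterative operations; the vertical-stream
    reconfiguration interval is not counted."""
    area = n * n                         # A = n^2
    time = (k + l + 4) * n               # T = (k + l + 4) n
    util = 1.0 / (1.0 + 4.0 / (k + l))   # U = 1 / (1 + 4/(k+l))
    return area, time, util
```

For example, n = 4 with k = 8 iterations and l = 4 recursions gives 16 cells, 64 time steps and a utilization of (k + l)/(k + l + 4) = 0.75.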
4. Conclusions
In this paper we considered the efficient solution of matrix polynomial computations appearing in many practical problems. This solution depends on the calculation of iterative equations. These iterations are the most computationally expensive part of the overall algorithms, so their systolic implementation becomes important. The Reconfigurable Systolic Torus is used to implement the recursive equations appearing in such algorithms. The problem is solved in several consecutive iterative operations in a straightforward and regular manner, keeping the overall Area and Time requirements low and, therefore, providing high utilization. Further research includes the use of the Reconfigurable Systolic Torus for the implementation of several other types of matrix functions involving recursions in their calculation.
Proceedings IWISP '96; 4-7 November 1996; Manchester, U.K. B.G. Mertzios and P. Liatsis (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Real-Time Connected Component Labeling on One-Dimensional Array Processors based on Content-Addressable Memory: Optimization and Implementation
Eril Mozef, Serge Weber, Jamal Jaber, and Etienne Tisserand
Laboratoire d'Instrumentation Electronique de Nancy (L.I.E.N), University of Nancy I, BP 239, 54506 Vandoeuvre Cedex, FRANCE
Tel: (33) 83 91 20 71, Fax: (33) 83 91 23 91
Abstract
Connected component labeling is not easy to process due to its local and global features. These features make the labeling operation extremely time costly when a sequential architecture has to be used, because of its local operation principle. In order to reduce the processing time, labeling should be done in parallel using both local and global operations. Unfortunately, this solution is very expensive, particularly for two- or three-dimensional array processors. In order to find a trade-off between processing time and hardware cost, we propose an efficient parallel architecture dedicated to connected component labeling based on Content-Addressable Memory (CAM). For an n x n image, the optimized architecture merely requires n/2 - 1 PEs and n²/4 CAM modules through a 4-pixels grouping technique. The proposed algorithm, based on a divide-and-conquer technique, leads to a complexity of O(n log n) with a small multiplicative constant factor of the order of ½. The global communication is reconfigurable and ensured in O(log n) units of propagation time by a tree structure of switches. Hence, through this performance, this architecture reaches a quasi-optimal processor-time product in labeling. Moreover, the architecture permits sequential processing, perfectly adapted to labeling in one image scan from any interlaced-mode video signal. For a practical image of 512 x 512 pixels, the labeling time is estimated at the order of 204.8 μsec. Hence, the labeling can be performed in real time at video rate, i.e., 50 frames per second.
1. Introduction
Connected component labeling is amongst the most fundamental tasks in intermediate-level vision. This operation is not easy to process as it possesses local as well as global features. In other words, connectedness of image regions implies that labels can be propagated locally among adjacent pixels. However, the label assigned to a pixel may be the same as that of a pixel at a relatively distant location within the image. The significance of connected component labeling has incited a large amount of research work and led to numerous algorithms and architectures. Sequential machines, which generally obtain, for an n x n image, algorithm complexities of O(n⁴) [1], are not suitable for this purpose because of their local operation principle. In order to reduce processing time, labeling should be done in parallel using both local and global operations. Unfortunately, this solution is very expensive, particularly for two- or three-dimensional array processors. It follows that a processor-time trade-off should be found. Hence, an architecture categorized as a two-dimensional array processor with O(n²) PEs (Processing Elements), CLIP [2], using a local approach, yields an algorithm complexity of O(n²). Another architecture of the same type, Polymorphic-Torus [3], performing labeling based on a global approach, permits the reduction of the complexity to O(n) [4]. In spite of their high performance, particularly for neighborhood processing, the processor-time performance for labeling of these architectures is not efficient. Moreover, the data propagation in the interconnection network is not negligible and may be O(n) units of propagation time. This can limit the operation frequency and decrease the architecture performance. Yet another similar architecture, Meshes with Reconfigurable Buses [5], labeling based on a list-traversal technique, leads to a complexity of O(log n) [6].
Unfortunately in this case, for a complex boundary of a connected component containing O(n²) pixels, the data propagation in its global communication is O(n²) units of propagation time. Another type of architecture, this time categorized as a one-dimensional array processor with orthogonal-access memory, ACMAA [7], contains O(n) PEs and yields a complexity of O(n²). Although its communication diameter is efficient, its processor-time labeling performance is not. Much effort has been made to find a processor-time optimal labeling [1], [11]. A parallel algorithm for a given problem is said to be processor-time optimal if the product of the number of processors and the parallel execution time is equal to the sequential complexity of the problem. Labeling in optimal logarithmic time on an EREW PRAM is presented in [12], [13]. However, the large multiplicative constant factors in the time complexity of these algorithms limit their practical implementation. A processor-time optimal labeling is presented in [11], based on combining parallel and sequential algorithms. In order to find a processor-time trade-off, we propose another efficient parallel architecture based on Content-Addressable Memory. Categorized as a one-dimensional array processor, this architecture has O(n) PEs and leads to an algorithm complexity of O(n log n) with a small multiplicative constant factor of the order of ½. Through these performances, this architecture presents a quasi-optimal processor-time product in labeling. Moreover, the data propagation in global communication is ensured in O(log n) units of propagation time by a tree structure of switches. This architecture, which exploits a global approach and a divide-and-conquer technique, is very simple to realize. There are no complex pointer manipulations or data routing schemes compared to the above architectures. For an n x n image, the architecture merely requires n/2 - 1 PEs and n²/4 CAMs (Content-Addressable Memories) through a 4-pixels grouping technique.
Furthermore, the architecture permits sequential processing, perfectly adapted to labeling in one image scan from any interlaced-mode video signal. For a practical image of 512 x 512 pixels, the labeling time is estimated at the order of 204.8 μsec. Hence, the labeling can be performed in real time at video rate, i.e., 50 frames per second. Through these performances, the proposed architecture is well suited to applications in industrial vision, document analysis, etc.
Fig. 1. The organization of the proposed architecture.
Fig. 2. The module of memories.
Fig. 3. The Processing Element.
2. The Proposed Architecture

2.1. The Memory Modules
The memory modules consist of 2 planes denoted M1[i,j] (0 ≤ i,j ≤ n-1) and M2[g,h] (0 ≤ g,h ≤ n/2 - 1), where i = 2g and j = 2h (Fig. 1). The M1[i,j] plane consists of n² simple registers of 1 bit. It is used for storing an initial binary image. Through a 4-pixels grouping technique (see following paragraph), the M2[g,h] plane has merely n²/4 CAMs (Content-Addressable Memories). This plane consists of n/2 rows, each containing n/2 CAMs, and is used for storing either intermediate labels or definitive labels. To simplify the memory architecture and to speed up processing time, the pipeline principle is used. In this case, M1[i,j] and M2[g,h] operate in a left-shift circular FIFO mode. This mode allows the data to be transferred to the PEs. The CAM provides the PSMU (Parallel Search and Multiple Update) operation. This operation allows an update of CAM rows of any length in merely one clock cycle, or O(1). This module is similar to the MUCAM module (Multiple Update CAM) presented in [8]. The CAM module (Fig. 2) consists of one register and one identity comparator, both of 2 log(n/2) bits. A value of R2[g,h] is compared to a value from the address bus. If these values are identical, the content of the register is updated by a value from the data bus. This is why the CAM is called "content-addressable". The register R2 has 2 other inputs, a FIFO input and a row-major-order input corresponding to the row-major-order position of the CAM. In order to reduce the number of required CAMs, four adjacent registers in M1 are associated to a CAM. The CAM value depends on that of these registers and vice versa. This is what we define as a 4-pixels grouping technique. Using this technique the number of required CAMs is reduced by a factor of 4. This technique can be described as follows.
In the initialization phase, if at least one of the 4 adjacent pixels has a binary value 1, the corresponding CAM is initialized by its row-major order. This value is processed at the intermediate phase and can change. At the end of processing, each CAM obtains its definitive value, called a label. This label automatically belongs to each pixel possessing a binary value 1 among the 4 adjacent pixels. In order to get back the labels from M2[g,h], M1[i,j] is sequentially scanned; at each pixel having a binary value 1, the corresponding value in M2[g,h] is then multiplexed to the output bus (not shown in schematic).
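As an illustration of the 4-pixels grouping and the PSMU operation, a behavioural sketch (function names init_cam and psmu are our own, and labels are 1-based by assumption; the hardware performs the PSMU update in a single clock cycle, whereas the loop below merely models its effect):

```python
def init_cam(image):
    """4-pixels grouping: each CAM cell (g, h) covers the 2x2 pixel block
    at (2g, 2h) of the n x n binary image and is initialized with its
    row-major order (1-based here) if any covered pixel is 1, else 0."""
    n = len(image)
    m = n // 2
    cam = [[0] * m for _ in range(m)]
    for g in range(m):
        for h in range(m):
            if any(image[2 * g + di][2 * h + dj]
                   for di in (0, 1) for dj in (0, 1)):
                cam[g][h] = g * m + h + 1
    return cam

def psmu(cam, old, new):
    """Parallel Search and Multiple Update: rewrite every CAM cell whose
    content matches `old` with `new`. The hardware does this for CAM
    rows of any length in one clock cycle; the loop only models it."""
    for row in cam:
        for h in range(len(row)):
            if row[h] == old:
                row[h] = new
```

Grouping a 2x2 pixel block per CAM is what cuts the CAM count from n² to n²/4 while still giving every foreground pixel a label through its covering cell.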
2.2. The Processing Element (PE)
The architecture consists of n/2 - 1 PEs denoted PE_r (0 ≤ r ≤ n/2 - 2) (Fig. 1). The PEs operate in SIMD mode. Each PE possesses merely one principal function, the MinMax function. This function multiplexes two values of two adjacent CAMs in the first column, M2[0,h] and M2[0,h+1]; the larger value is multiplexed to the address bus and the smaller to the data bus (Fig. 3). These are then transferred to a global bus for the updating of 2 adjacent rows of CAMs, M2[*,h] and M2[*,h+1], or two adjacent regions, in merely one clock cycle, or O(1). We remark that a region consists of more than 2 adjacent rows of CAMs (see merging operation in section 3). In an active state, the PE is connected to the global bus; in an inactive state, it is disconnected. Activation of each PE depends on the merging operation: in the first merge, PE0, PE2, PE4, ..., PE(n/2-2) are activated whilst the other PEs are in an inactive state. In the second merge, PE1, PE5, PE9, ..., etc. are activated, in the third, PE3, PE11, PE19, ..., etc., and so forth.

2.3. The Communication Network
The communication network consists of a local bus of 2 log(n/2) bits and a global bus consisting of a data bus and an address bus, each of 2 log(n/2) bits. The local bus connects the CAMs to the PEs in the first column. However, the global bus simultaneously connects two adjacent rows or regions of CAMs to the PEs during the merging operation through a tree structure of switches. This structure is constituted of (n/4) - 1 switches, denoted SW_k (0 ≤ k ≤ n/4 - 2). Each SW_k is
constructed from unidirectional three-state buffers. The number of required switches is, in this case, optimal. This structure consists of log(n/2) - 1 stages. Thus it ensures the data transfer in O(log n) units of propagation time. The stages are activated one after the other; each switch in the active stage definitively connects 2 global buses.
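The PE's MinMax function and its activation schedule over the successive merges can be modelled as follows (a behavioural sketch; the closed-form activation rule r = 2^(m-1) - 1 stepping by 2^m is our own inference from the PE0, PE2, PE4, ... / PE1, PE5, PE9, ... / PE3, PE11, PE19, ... pattern described in the text):

```python
def pe_minmax(a, b):
    """MinMax function of a PE comparing two first-column CAM values.
    The larger label goes to the address bus (the value to be searched)
    and the smaller to the data bus (the replacement), so that after
    the PSMU update the smallest label survives in both regions."""
    return max(a, b), min(a, b)   # (address_bus, data_bus)

def active_pes(num_pes, merge):
    """Indices of the PEs active during the given (1-based) merge step:
    PE0, PE2, PE4, ... in the first merge, PE1, PE5, PE9, ... in the
    second, PE3, PE11, PE19, ... in the third, and so forth."""
    return list(range(2 ** (merge - 1) - 1, num_pes, 2 ** merge))
```

At each successive merge half as many PEs remain active, mirroring the stages of the switch tree that progressively join rows into larger regions.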
3. Connected Component Labeling Algorithm
Labeling consists in assigning a unique label to each connected component in the image whilst ensuring a different label for each distinct object [9]. In utilizing CAM, the algorithms in [8] and [10] yield a complexity of O(n²), whilst the algorithm in [14] and [15] described in this section leads to a complexity of O(n log n). This algorithm, based on a divide-and-conquer technique, is similar to that presented in [4]. Here, instead of two-directional merging, vertically and horizontally, our algorithm proceeds uniquely in the horizontal direction. After sequential loading, the initial binary image is supposed available in M1[i,j]. In this algorithm, 4-connectivity is assumed.
Initialization: First, each CAM in M2[g,h] is assigned its row-major order if at least one of its 4 inner associated adjacent pixels, i.e., M1[2g,2h], M1[2g+1,2h], M1[2g,2h+1] and M1[2g+1,2h+1] (where i = 2g and j = 2h), has a binary value 1. This can be done in O(1) time.
Row processing: 2 adjacent CAMs, M2[0,h] and M2[1,h], are then tested. If a positive non-zero value is detected in both and their 4 outer associated adjacent pixels, i.e., M1[1,2h], M1[2,2h], M1[1,2h+1] and M1[2,2h+1], are connected, the left value, M2[0,h], is propagated to the right, M2[1,h]. Otherwise, there is no operation. This operation is undertaken in parallel for each row, and is repeated at the following iterations by first shifting each row to the left. At the end of this stage, all objects in each row are labeled according to the smallest value. The complexity for this stage is O(n).
Merging: Here, 2 adjacent rows are simultaneously merged. This can be done as follows. Each active PE compares the 2 values of its adjacent CAMs in the first column, M2[0,h] and M2[0,h+1], in parallel. The smaller value is multiplexed to the data bus whilst the larger is multiplexed to the address bus. Both are transferred to the global bus in order to update all CAMs in the corresponding rows. This can be done in O(1) time (see PSMU operation in section 2.1). The 2 merged rows then become a region. The same operation is repeated at the following iterations by activating the following stage of the tree structure of switches, but this time merging 2 adjacent regions instead of 2 adjacent rows. The merging is reiterated until the last stage of the tree structure of switches. The total number of stages, corresponding to the total number of merges, reaches log(n/2) - 1. As each merge takes n/2 iterations, it follows that this phase takes O(n log n).
The algorithm requires n/2 iterations for row processing and (n/2)(log(n/2) - 1) iterations for merging, hence representing a complexity of O(n log n) with a multiplicative constant factor equal to ½. We remark that this complexity is independent of the shape of the object.
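The row-processing stage can be sketched behaviourally as follows (our own simplification: 4-connectivity between adjacent pixel groups is supplied as a boolean list, the circular left shifts of the hardware are replaced by an in-place sweep, and the smaller label is kept; in the hardware the propagated left value is already the smaller one because of the row-major initialization):

```python
def row_process(labels, connected):
    """Row-processing stage for one CAM row. labels[h] is the nonzero
    intermediate label of CAM h (0 = empty cell); connected[h] is True
    when the pixel groups of CAMs h and h+1 are 4-connected. Sweeping
    until stable leaves every object within the row carrying its
    smallest label. The hardware reaches the same fixed point in n/2
    iterations by circularly shifting the row, so that each comparison
    takes place at column 0."""
    m = len(labels)
    for _ in range(m):                      # enough sweeps to stabilize
        for h in range(m - 1):
            if labels[h] and labels[h + 1] and connected[h]:
                small = min(labels[h], labels[h + 1])
                labels[h] = labels[h + 1] = small
    return labels
```

After this stage, only inter-row label conflicts remain, which is exactly what the merging phase resolves with one PSMU update per comparison.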
4. Implementation
The labeling algorithm was tested by both software and hardware simulation before validation by FPGA implementation for a small image of 32 x 32 pixels [16]. We are currently in the process of implementing a VLSI prototype, using a hardware description language, VHDL, capable of labeling a 512 x 512 image, corresponding to 255 PEs and 256 x 256 CAMs of 16 bits. Using a 0.6 μm CMOS technology, a PE requires 986 transistors while a CAM and its 4 associated registers require 1180 transistors. For labeling a 512 x 512 image, the number of required transistors is rather cumbersome to implement as a single ASIC chip. However, the regularity of its row structure and the simplicity of its processing in one column make this architecture easy to slice into chips, with each chip containing r x 256 CAMs. For r = 16, we estimate the number of required transistors to be 5 million, instead of 4 times this amount in the non-optimized architecture presented in [14]. This number remains extremely high. This is due to the choice of a bit-parallel implementation. From a theoretical point of view, optimization may still be considered with the use of a bit-serial implementation. However, this will increase the multiplicative constant factor of the algorithm complexity as well as the processing time.
5. Discussion
For a practical binary image of 512 x 512 pixels, a 50 nsec clock cycle of M1 and a 100 nsec clock cycle of M2, the labeling time is estimated at the order of (512/2) x (log(512/2)) x 100 nsec = 204.8 μsec. Since the lapse of time between the first image and the following image is 1600 μsec, i.e., 25 lines x 64 μsec in the CCIR standard, the labeling can easily be executed in real time at a video rate of 50 frames per second. This can be done by starting the labeling just after loading a complete image, i.e., first frame and second frame in the case of interlaced mode. For a larger image, e.g., 2048 x 2048 pixels, labeling is terminated in 1024 μsec, thus real-time labeling is still possible. In Table 1, the performances of various architectures are compared. Our architecture is categorized as a one-dimensional array processor with O(n) PEs and yields a complexity of O(n log n). Compared to the other architectures, it is obvious that the proposed architecture provides an efficient trade-off among the algorithm complexity, the number of processors and the data propagation time. Through its efficient global communication and its PSMU operation, this architecture is well suited for connected component analysis, e.g., determination of area and perimeter [15], as well as for intermediate-level vision tasks, e.g., [17]. Moreover, labeling from any standard interlaced mode in one image scan is feasible by adding the possibility of configuring the switches in a linear structure (not in a tree structure). The labeling process will be done as follows.
- Labeling from an interlaced video signal: The images from lines L1,1 to L1,n/2 of the first frame and from lines L2,1 and L2,2 of the second frame are sequentially loaded into M1. During the loading of line L2,3 of the second frame, row R1 (which handles L1,1 and L2,1) and R2 (which handles L1,2 and L2,2) in M2 are merged to form the first labeled image region, denoted S1. During the loading of line L2,4, merging of R3 and S1 is undertaken to form the second region, S2. This process is iterated until line L2,n/2. The labeled image is obtained in M2 at the end of the second frame.
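The timing figures quoted in this section follow directly from the iteration count (n/2) x log2(n/2) and the M2 clock period; a small sketch reproducing them (labeling_time_us is a hypothetical helper of ours):

```python
import math

def labeling_time_us(n, m2_clock_ns=100):
    """Estimated labeling time in microseconds for an n x n image:
    n/2 row-processing iterations plus (n/2)(log2(n/2) - 1) merging
    iterations, i.e. (n/2) * log2(n/2) cycles of the M2 clock in all."""
    half = n // 2
    return half * math.log2(half) * m2_clock_ns / 1000.0
```

With a 100 nsec M2 cycle this gives 204.8 μsec for a 512 x 512 image and 1024 μsec for 2048 x 2048, matching the estimates above.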
- Labeling from a non-interlaced video signal: Lines L1, L2, L3, and L4 are sequentially loaded into M1. During the loading of line L5, row R1 (which handles L1 and L2) and R2 (which handles L3 and L4) in M2 are merged to form the first labeled image region, denoted S1. After the loading of line L6, and during the loading of line L7, merging of R3 and S1 is undertaken to form the second region, S2. This process is iterated until line Ln. The labeled image is obtained in M2 at the last line of the image. Both of these sequential algorithms take O(n²) with a multiplicative constant factor equal to 1. The use of CAM makes these sequential algorithms very simple to implement, as there are no complex manipulations of equivalence tables compared to most conventional sequential labeling algorithms.
Architecture                                         Processors   Algorithm     Data propagation time in
                                                                  complexity    communication network
1 processor (sequential approach)                    O(1)         O(n⁴)         O(1)
CLIP [2] (2-D array, local technique)                O(n²)        O(n²)         O(n)
CAAPP [10] (2-D array, global technique)             O(n²)        O(n²)         O(n)
Polymorphic-Torus [4] (2-D array, global technique)  O(n²)        O(n)          O(n)
Reconfigurable Mesh [6] (2-D array, list-traversal)  O(n²)        O(log n)      O(n²)
ACMAA [7] (1-D array, orthogonal)                    O(n)         O(n²)         O(1)
Proposed architecture (1-D array, non-orthogonal)    O(n)         O(n log n)    O(log n)

Table 1. Comparison of performances of various architectures.
6. Conclusion
A parallel architecture dedicated to connected component labeling and the evaluation of its VLSI implementation have been presented. The use of CAM has led to an efficient algorithm complexity of O(n log n) with a small multiplicative constant factor of ½. Optimization of the architecture using the 4-pixels grouping technique has led to significant advantages, namely a reduction in the number of required transistors and in processing time, real-time possibilities, and possibilities of sequential processing from any interlaced-mode video. Compared to other architectures, this architecture obtains the best performance in terms of an efficient trade-off among the algorithm complexity, the number of processors and the data propagation time. Future work will account for the simulation of the complete system as well as the reoptimization and the complete implementation of this architecture.
References
[1] H.M. Alnuweiri and V.K. Prasanna, "Parallel architectures and algorithms for image component labeling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 10, pp. 1014-1034, Oct. 1992.
[2] M.J. Duff and T.J. Fountain, Cellular Logic Image Processing, New York: Academic, 1986.
[3] H. Li and M. Maresca, "Polymorphic-Torus architecture for computer vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 3, pp. 233-243, March 1989.
[4] M. Maresca, H. Li, and M. Lavin, "Connected component labeling on Polymorphic-Torus architecture," IEEE Int. Conf. Computer Vision and Pattern Recognition, Ann Arbor, pp. 951-956, 1988.
[5] R. Miller and V.K. Prasanna-Kumar, "Meshes with reconfigurable buses," Proc. of the 5th MIT Conf. Advanced Research in VLSI, pp. 163-178, March 1988.
[6] S. Olariu, J.L. Schwing, and J. Zhang, "Fast component labelling and convex hull computation on reconfigurable meshes," Image and Vision Computing, vol. 11, no. 7, pp. 447-455, 1993.
[7] P.T. Balsara and M.J. Irwin, "Intermediate-level vision tasks on a memory array architecture," Machine Vision and Applications, vol. 6, pp. 50-65, 1993.
[8] Y-C. Shin, R. Sridhar, V. Demjanenko, P.W. Palumbo, and S.N. Srihari, "A special-purpose content addressable memory chip for real-time image processing," IEEE Journal of Solid-State Circuits, vol. 27, no. 5, pp. 737-744, May 1992.
[9] A. Rosenfeld and J.L. Pfaltz, "Sequential operations in digital picture processing," Journal of the Association for Computing Machinery, vol. 13, no. 4, pp. 471-494, 1966.
[10] C.C. Weems, S.P. Levitan, A.R. Hanson, E.M. Riseman, D.B. Shu, and J.G. Nash, "The image understanding architecture," International Journal of Computer Vision, vol. 2, pp. 251-282, 1989.
[11] H.M. Alnuweiri, "Optimal image computations on reduced processing parallel architectures," in Parallel Architectures and Algorithms for Image Understanding (V.K. Prasanna Kumar, Ed.), Academic Press, 1991, pp. 157-183.
[12] R.J. Anderson and G.L. Miller, "Deterministic parallel list ranking," in VLSI Algorithms and Architectures: Proc. 3rd Aegean Workshop Comput., June 1986, pp. 81-90.
[13] R. Cole and U. Vishkin, "The accelerated centroid decomposition technique for optimal parallel tree evaluation in logarithmic time," Algorithmica, vol. 3, pp. 329-346, 1987.
[14] E. Mozef, S. Weber, J. Jaber, and E. Tisserand, "Architecture dédiée à l'algorithme parallèle O(n log n) d'étiquetage de composantes connexes," (in French), 3èmes Journées Adéquation Algorithme Architecture, Toulouse, France, pp. 83-89, Jan. 1996.
[15] E. Mozef, S. Weber, J. Jaber, and E. Tisserand, "Parallel architecture dedicated to connected component analysis," IEEE 13th International Conference on Pattern Recognition, Vienna, Austria, August 1996.
[16] E. Mozef, S. Weber, J. Jaber, and G. Prieur, "Parallel architecture dedicated to image component labelling in O(n log n): FPGA implementation," SPIE International Symposium on Lasers, Optics, and Vision, Besançon, France, June 1996.
[17] E. Mozef, S. Weber, J. Jaber, and E. Tisserand, "Design of Linear Array Processors with Content-Addressable Memory for Intermediate Level Vision," ISCA 9th Inter. Conf. on Parallel and Distributed Computing Systems, Dijon, France, Sept. 1996.
A 2-D window processor for modular image processing applications and its VLSI implementation
by P. Tzionas, Ch. Mizas and A. Thanailakis
Laboratory of Electrical & Electronic Materials Technology, Department of Electrical and Computer Engineering, School of Engineering, Democritus University of Thrace, GREECE

ABSTRACT
This paper presents the design and VLSI implementation of a 2-D window processor that is capable of loading specific blocks (2-D windows) of the image space, depending on a set of external control signals, onto any type of parallel architecture. The 2-D window is capable of moving along both dimensions of the image space and of providing at its output the image data stored in it. Parallel architectures for image processing usually require a number of clock cycles in order to complete the specific tasks assigned to them by the host computer. The proposed processor takes advantage of this latency by preparing and loading the necessary data for the next operation of the parallel architecture while the architecture is still operating on the data of the previous operation. This is achieved through a predictor system, capable of predicting the position of the next block of image that the parallel architecture will require, based on the history of the previous window movements. The processor was implemented in a VLSI chip using a 1 μm CMOS Double Layer Metal (DLM) technology. The area of the chip, including the pads, is 6.102 mm x 4.845 mm = 29.56 mm² and the maximum frequency of operation, obtained after loaded simulations, was found to be 41.67 MHz.

1. INTRODUCTION
Parallel architectures proposed for image processing applications are, usually, capable of processing only parts of the image space at the same time, and processing of the overall image space is achieved by a sequence of parallel operations [1,2]. However, for many image processing applications, such as edge detection [3], region growing techniques [4], nearest neighbour applications [5], etc., it is sufficient to process only specific areas of interest in the image, rather than the whole of the image space.
Thus, the capability to direct (focus) processing to specific parts of the image space is of great importance, since all available processors are utilised in the regions of interest rather than in regions of non-interest, and the overall processing time is accelerated. This paper presents the design and VLSI implementation of a 2-D window processor that is capable of loading specific blocks (windows) of the image space, depending on a set of external control signals, onto any type of parallel architecture. The 2-D window is capable of moving along both dimensions of the image space and of providing at its output the image data stored in it. Thus, the proposed processor is able to focus on specific parts of the image space. Additionally, parallel architectures for image processing usually require a number of clock cycles in order to complete the specific tasks assigned to them by the host computer. The proposed processor takes advantage of this latency by preparing and loading the necessary data for the next operation of the parallel architecture while the architecture is still operating on the data of the previous operation. This is achieved through a predictor system, capable of predicting the position of the next block of image that the parallel architecture will require, based on the history of the previous window movements. Thus, data are loaded from memory on clock cycles during which the host was normally idle, and the overall processing speed is improved while bandwidth demands, mainly parallel memory input/output operations, are reduced. The proposed processor acts as a pre-processor that interfaces a parallel architecture and a host computer, by providing an efficient memory management scheme. The processor was implemented in a VLSI chip using a 1 μm CMOS Double Layer Metal (DLM) technology.
The area of the chip, including the pads, is 6.102 mm x 4.845 mm = 29.56 mm² and the maximum frequency of operation, obtained after loaded simulations, was found to be 41.67 MHz.

2. DESCRIPTION OF THE ARCHITECTURE
For the purposes of this paper it is assumed that the dimensions of the image space are 512 x 512 pixels and that the dimensions of the 2-D window are 8 x 8 pixels. Additionally, the proposed architecture is applied to 8-bit (grey scale) image data, but it can be easily adapted to operate on 24-bit data (colour images). It is also assumed that the overall image is stored in external RAM (8-bit words, 2^18 entries). Moreover, it should be noted that a relative co-ordinate scheme is used for the window movement. Thus, the next window position is determined with respect to the current window position, rather than using an equivalent absolute co-ordinate
scheme, fixed to a specific origin. The specification of relative window movements is expected to be more economical when focusing on specific parts of the image. More specifically, a Cartesian co-ordinate system (x,y) is defined for the image such that 0 ≤ x ≤ 511 and 0 ≤ y ≤ 511.
Figure 1: Block diagram of the processor
2.1 Description of the basic units

Two 10-bit registers X, Y store the desired number of pixels that the 2-D window has to move, in the x and y directions respectively, and the direction of that movement. The data stored in the input registers are fed to the WinMove unit. This unit calculates the new data address that corresponds to the new position of the window, determined by the external MoveX, MoveY, SelectX, SelectY signals. Assuming that the window reference point lies at co-ordinates (x, y) on the image space, which corresponds to memory address X_H, window movements along the axes are implemented according to the following equation:

X'_H = X_H ± α ± (β × 512)
(1)
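As an illustration, the relative-move address calculation of equation (1) can be sketched in software. This is a behavioural model only; the function and parameter names are illustrative, not taken from the chip, and the direction flags stand in for the external SelectX/SelectY signals:

```python
IMAGE_WIDTH = 512  # the image space is assumed to be 512 x 512 pixels

def move_window(xh, alpha, beta, move_left=False, move_up=False):
    """New memory address of the window reference point after a
    relative move of `alpha` pixels horizontally and `beta` pixels
    vertically (equation 1): X'_H = X_H +/- alpha +/- beta * 512."""
    xh += -alpha if move_left else alpha
    xh += -beta * IMAGE_WIDTH if move_up else beta * IMAGE_WIDTH
    return xh
```

For example, moving the window 3 pixels right and 2 lines down from address 0 yields 3 + 2 x 512 = 1027.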
where α is the desired number of pixels for a horizontal movement, β is the desired number of pixels for a vertical window movement and X'_H is the new memory address for the window reference point. The Window Movement Control system controls window movement and flags a Bad Move signal whenever the whole or part of the window lies outside the image space. The control system checks the relative position of the 2-D window with respect to the image space and consists of two 9-bit registers, two adder/subtractor circuits, four comparators and a control unit that implements the necessary conditions which decide whether the whole or part of the window lies outside the image space. The units described in the previous paragraphs calculate the memory address for the window reference point. The window scanning mechanism WinScan calculates the addresses for the rest of the pixels that belong to the window. It must be noted that pixels lying on the same window line (same y co-ordinate) differ by one memory position. Also, the memory address of the last pixel of a specific line of the window differs by 505 memory positions from the first pixel of the next line of the window. The WinScan unit consists of a parallel 2-to-1 multiplexer, an 18-bit register, two binary up-counters, an arithmetic unit and a control unit.
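The address sequence generated by WinScan follows directly from these two offsets (same-row pixels differ by 1; the last pixel of a row and the first pixel of the next row differ by 512 - 8 + 1 = 505). A minimal sketch, with a hypothetical helper name:

```python
def window_addresses(xh, win=8, width=512):
    """Generate the 64 memory addresses of an 8x8 window whose
    reference point (top-left pixel) is at address `xh`."""
    addrs = []
    addr = xh
    for row in range(win):
        for col in range(win):
            addrs.append(addr)
            addr += 1            # pixels on the same window line differ by 1
        addr += width - win      # skip ahead to the start of the next line
    return addrs
```

With xh = 0, the 8th address is 7 (end of the first line) and the 9th is 512 (start of the second line), a difference of 505 as stated above.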
2.2 Description of the Predictor System

In order to develop a fairly general prediction scheme, capable of operating in applications where successive examination of neighbouring blocks in the 2-D image space is required and, still, simple enough to be integrated in VLSI, the prediction principle proposed in this paper can be summarised as follows: "The block most likely to be required by the next operation of the parallel architecture is determined by the history of previous window movements." The predictor system keeps track of the number of previous window movements along the axes of the 2-D grid (history of window movements) and suggests, as a prediction, the next block in the direction that is most heavily represented in the history of window movements. Four counter circuits are used to implement the window history mechanism. These counter circuits determine which one of the 8 possible neighbouring windows (both direct and indirect neighbours are included, in accordance with nearest neighbour principles [6]) should be loaded in internal memory, by counting the number of window movements along the axes of the grid according to the values of the external control signals SelectX and SelectY. The counter outputs are compared to each other in pairs, using two parallel comparator circuits. A decision on which neighbouring window is to be loaded in internal memory is reached with respect to the outputs of the comparator circuits. The data of the predicted block are stored in memory internal to the predictor unit. The internal memory system, used for storing the predicted block of image data, consists of 8 identical RAM cells. Each RAM cell can store 8 23-bit words. Thus, the 8 RAM cells are capable of storing the data for a complete 2-D window. The memory addressing scheme used for the internal RAM cells is based on the Direct Mapping method [7], which is usually used for the efficient addressing of cache memory.
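A behavioural sketch of this counter-based history mechanism is given below. The class and method names are illustrative only: the hardware uses four counters and two pairwise comparator circuits, which are modelled here with a dictionary and sign comparisons:

```python
class MovePredictor:
    """Predict the next window move direction from the history of
    previous moves, using four counters (one per half-axis)."""
    def __init__(self):
        self.counts = {'+x': 0, '-x': 0, '+y': 0, '-y': 0}

    def record(self, dx, dy):
        """Count one window movement along each axis (SelectX/SelectY)."""
        if dx > 0: self.counts['+x'] += 1
        elif dx < 0: self.counts['-x'] += 1
        if dy > 0: self.counts['+y'] += 1
        elif dy < 0: self.counts['-y'] += 1

    def predict(self):
        """Compare the counters in pairs; the dominant half-axis on each
        axis selects one of the 8 neighbouring windows (a diagonal,
        i.e. indirect, neighbour when both axes have a dominant sign)."""
        px = (self.counts['+x'] > self.counts['-x']) - (self.counts['+x'] < self.counts['-x'])
        py = (self.counts['+y'] > self.counts['-y']) - (self.counts['+y'] < self.counts['-y'])
        return px, py
```

After three recorded moves that all go right but cancel out vertically, the predictor suggests the right-hand direct neighbour, (1, 0).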
According to this method, part of the memory address used for accessing the external memory is also used to access the internal memory. The 18-bit memory address is divided into two parts: the first part contains the 3 least significant bits of the address and is called the "index", and the second part, containing the remaining 15 bits of the memory address, is called the "tag". Each 23-bit memory word stored in the internal RAM consists of two parts: the 8-bit data plus the corresponding tag (8-bit pixel data + 15-bit address tag = 23 bits). When new data are requested by the parallel architecture, a check is performed to decide whether the prediction was successful. In the case of a successful prediction, data are loaded to the architecture from internal memory. Otherwise, a request is made to load data from external memory. The main advantage of the proposed prediction system is that internal memory has a considerably faster access time than external memory and uses a more efficient direct-access addressing method. Additionally, the width of the data bus for data transfer between the proposed pre-processor and the parallel architecture can be varied in order to obtain very high data transfer rates.
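The direct-mapped hit/miss check described above can be modelled in a few lines. The class below is a hypothetical software analogue of one internal RAM cell, splitting the 18-bit address into the 3-bit index and 15-bit tag as stated:

```python
INDEX_BITS = 3  # the 3 least significant bits of the 18-bit address

class DirectMappedCell:
    """One internal RAM cell: 8 words of (15-bit tag, 8-bit data)."""
    def __init__(self):
        self.lines = [None] * (1 << INDEX_BITS)

    def store(self, address, data):
        index = address & 0b111          # 3-bit index selects the word
        tag = address >> INDEX_BITS      # remaining 15 bits form the tag
        self.lines[index] = (tag, data)

    def load(self, address):
        index = address & 0b111
        tag = address >> INDEX_BITS
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:
            return entry[1]              # hit: the prediction was successful
        return None                      # miss: request data from external memory
```

Two addresses that share the same 3-bit index but differ in the tag (e.g. 42 and 50) map to the same line, so the stored tag is what distinguishes a successful prediction from a miss.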
2.3 VLSI Implementation

The proposed processor was implemented in a VLSI chip using a 1 µm CMOS Double Layer Metal (DLM) technology. The CADENCE CAD tools for VLSI design were used for the implementation of the chip and the Standard Cells design methodology was adopted, using the CMOS libraries provided by European Silicon Structures (ES2). The area of the chip, including the input/output pads, is 6.102 mm x 4.845 mm = 29.56 mm² and the maximum frequency of operation, obtained after loaded simulations, was found to be 41.67 MHz.
3. CONCLUSIONS

The design and VLSI implementation of a 2-D window processor, capable of loading specific blocks (2-D windows) of the image space, depending on a set of external control signals, onto any type of parallel architecture, was presented in this paper. The 2-D window is capable of focusing on specific parts of the image space and of providing at its output the image data stored in it. A predictor system, capable of predicting the position of the next block of image data that the parallel architecture will require, based on the history of the previous window movements, was also implemented and integrated in the proposed processor. The processor was implemented in a VLSI chip using a 1 µm CMOS Double Layer Metal (DLM) technology. The area of the chip, including the pads, is 6.102 mm x 4.845 mm = 29.56 mm² and the maximum frequency of operation, obtained after loaded simulations, was found to be 41.67 MHz.
REFERENCES
[1] K. Hwang (1987), 'Advanced parallel processing with supercomputer architectures', Proceedings of the IEEE, vol. 75, pp. 1348-1379.
[2] P. Tzionas, Ph. Tsalides and A. Thanailakis (1992), 'Design and VLSI implementation of a pattern classifier using Pseudo-2D Cellular Automata', IEE Proceedings-G, vol. 139, no. 6, pp. 661-668.
[3] A.K. Jain (1989), 'Fundamentals of digital image processing', Prentice Hall Information and System Sciences Series, ch. 9, New Jersey.
[4] A. Waks and O. Tretiak (1990), 'Robust detection of region boundaries in a sequence of images', Proceedings of the 10th International Conference on Pattern Recognition, New Jersey, USA, IEEE Computer Society Press, pp. 947-952.
[5] B.V. Dasarathy (ed.) (1991), 'Nearest Neighbor (NN) norms: NN Pattern Classification techniques', IEEE Computer Society Press.
[6] T.P. Yunck (1976), 'A technique to identify Nearest Neighbors', IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-6, no. 10, pp. 678-683.
[7] J.E. Smith and J.R. Goodman (1985), 'Instruction cache replacement policies and organizations', IEEE Transactions on Computers, vol. C-34, no. 3, pp. 234-241.
Due to unavoidable circumstances the next paper, which belongs in session H, is printed at the end of this book.
Rate conversion of compressed video for matching bandwidth constraints in ATM networks

Pedro Assunção* and Mohammad Ghanbari
Dep. of ESE, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, England
Tel: +44 1206 872448; fax: +44 1206 872900
e-mail: paaass@essex.ac.uk, ghan@essex.ac.uk
Abstract
A system capable of dynamically matching compressed video bit streams to ATM multiplex channels is needed, and no general solution to this requirement has been devised so far. Open-loop schemes, such as simple requantisation of the compressed bit stream or dropping of higher-frequency coefficients, are not suitable for this task, since these methods cause drift. In the present work we show that drift can lead to a significant drop in quality after only a small number of predicted frames. Hence we derive a drift-free rate converter, working entirely in the frequency domain and capable of meeting ATM transmission demands. The low complexity and low delay of our rate converter, combined with its drift-free property, make it a perfect system for dynamic matching of bandwidth constraints in ATM networks.
1 Introduction
The majority of the traffic in future broadband networks is expected to come from video and multimedia services [1]. It is also expected that most video services will be based on the MPEG2 standard [2], many of them using recorded bit streams. The characteristics of the forthcoming video services demand a new level of adaptation between video sources and transmission networks, or even between different networks. This level should be capable of dealing with compressed bit streams in order to match the network requirements to the coded video. When compressed video is recorded, the characteristics of the channel through which it will be transmitted are assumed to be known beforehand. Therefore a great lack of flexibility arises in the transmission of these streams when channels of diverse characteristics are used. If the same video programme is to be transmitted to several users through channels with different capacities, the service provider needs to keep several copies of that programme, each one encoded according to the corresponding channel characteristics. In ATM networks there is always a possibility of overflow in the switch buffers, which leads to cell loss. Compressed video is very vulnerable to any loss, thus any traffic flow control capable of reducing network congestion is highly desirable. Reactive congestion control methods presented in the literature rely on the assumption that a real-time encoder is available for responding to network demands [3, 4]. However, not all video bit streams come directly from real-time encoders; video servers and workstations, for instance, are examples where pre-encoded video is used and the flow control has to act dynamically on the compressed bit stream. Another environment where rate control of compressed video is needed is video multicasting, where either heterogeneous networks or links with different capacities are used for transmission of the same bit stream.
It is fair to expect that the quality of the video received by each end-user should depend only on the constraints of its own path. In order to achieve such a goal, one should provide means of rate control at the switching nodes [5]. This enables the nodes with limited resources to transcode the input bit stream into lower rates according to their output link constraints. This type of rate control is mainly static, since it depends more on the characteristics of the output link, which do not change during the connection, than on traffic congestion, which is due to buffer overflow in the switch. In this paper we show that it is only necessary to keep a single copy of the compressed video at its highest possible quality and still be able to cope with all possible transmission constraints. By transcoding the main bit stream into lower bit rates with a minimum delay, any bandwidth constraint can be met. The transcoder,

* This work has been supported by Instituto Politécnico de Leiria and Programa PRAXIS XXI - JNICT, Portugal
working as a rate converter, is capable of responding to network demand in the shortest possible time to prevent cell loss. In previous work we derived a low-delay post-processing system for transmission of coded video at lower bit rates [6]. In this paper we report on a modification that simplifies the transcoder further by avoiding the use of the Discrete Cosine Transform (DCT) and its inverse (IDCT). By extending the method developed in [7] for multiplexing multiple video bit streams into a single bit stream, we have developed a generic rate converter working entirely in the frequency domain.
2 Matching coded video to network demand

A generic rate converter should accept a standard compressed bit stream and produce another standard bit stream at a lower rate. Therefore syntax compatibility at both input and output bit streams is mandatory. Also, pictures decoded from the converted bit stream should have nearly the same quality as if they were originally encoded at the converted bit rate. Such a system should also introduce very low delay in order to be capable of dynamically matching the compressed bit streams to any constrained channel. Figure 1 depicts the generic environment where the rate converter is employed. In the figure we consider the video source to be any system transmitting video bit streams, whether on-line encoders, video servers or workstations. Furthermore, we do not make any specific assumption about the input and output channels to the rate converter (A, B). They may accept constant or variable bit rate video. The rate converter can even be part of the video source (no A channel), or can be placed anywhere along the transmission path.
Figure 1: Generic video communication system

The amount of bandwidth reduction of the input bit stream is dictated by an external signal 0 < S(t) < 1, which can be either a dynamic network feedback signal (e.g. reactive congestion control) or a static signal for a constant reduction of the input bit rate. The input and output rates are given by equations 1 and 2.
R_in(t) = R_c(t) + R_v(t)    (1)

R_out(t) = R_c(t) + S(t) · R_v(t)    (2)
where R_c(t) and R_v(t) are the non-reducible and variable parts of the input bit stream R_in(t) respectively, and R_out(t) is the total output rate. Since we do not consider new motion estimation for the rate conversion, R_c(t) comprises all the header information as well as that of the motion vectors, whereas R_v(t) comprises the bits used to code the transform coefficients.
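Equations (1)-(2) amount to scaling only the reducible part of the rate by the control signal. A one-line sketch (the function name is illustrative):

```python
def output_rate(r_c, r_v, s):
    """Converted output rate per equations (1)-(2):
    R_in = R_c + R_v and R_out = R_c + S * R_v.
    R_c (headers, motion vectors) is non-reducible; only the
    coefficient bits R_v are scaled by the control signal S."""
    return r_c + s * r_v
```

For instance, with R_c = 2 Mbit/s of headers and motion vectors, R_v = 6 Mbit/s of coefficients and S = 0.5, the output rate is 5 Mbit/s rather than half of the 8 Mbit/s input, because the non-reducible part passes through unchanged.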
3 Rate conversion of coded video
For bit streams generated by the standard codecs, simple rate conversion of compressed video into lower bit rates can be done by requantisation of the DCT coefficients or by just dropping some of them. However, this will lead to picture drifting distortions, because the decoded pictures are different from those used as predictions at the encoder [8]. Therefore the ideal rate converter would decode the bit stream into reconstructed pixels and re-encode them again. This inevitably increases the cost and delay of transcoding enormously. We propose the scheme depicted in figure 2 for rate conversion of coded video. The basic structure can be derived from a cascaded decoder-encoder, as we have demonstrated in previous work [6]. In this paper we have implemented the motion compensation in the DCT domain. This further simplifies the transcoder by not employing DCT and IDCT modules. In this rate converter the only processing delays are those introduced by VLC decoding and encoding, quantisation and motion compensation in the DCT domain. Since these delays are only dependent on the implementation of the respective functions, they can be limited to negligible values. Generally two buffers are needed in the feedback loop of the rate converter (buffer 1 and buffer 2 of fig. 2), simply because of the interpolated frames (B). Since B frames are not used as predictions, by accepting some temporary distortions on these frames one can reduce the cost of the system by using only one prediction buffer. Using only one buffer still prevents drift through predicted frames (P), which are the important ones, since intra frames (I) reset the error accumulation.
Figure 2: Rate converter for coded video
3.1 MC in DCT domain
In general, when a data block is predicted from a previous frame it does not match the block structure of that frame, unless the motion vectors are multiples of the block size. Therefore one block in the current frame can cover an area of 4 adjacent blocks in the previous frame. When operating in the pixel domain there is no need to keep track of the frame block structure. However, in the DCT domain it cannot be avoided, because of the inherent block structure of the DCT in most video coding algorithms. In order to reconstruct a predicted block in the DCT domain from the 4 DCT blocks that contribute to the prediction, the overlapping sub-blocks have to be extracted [7]. The extraction of these sub-blocks for reconstruction of the predicted block B_R can be done by applying equation 3 in the pixel domain,

B_R = Σ_{i=1}^{4} H_{i1} · B_i · H_{i2}    (3)
where B_i are the overlapping blocks in the previous frame, and H_{i1}, H_{i2} are fixed matrices dependent on the size of each sub-block. Also, since the DCT transform is distributive over matrix multiplication, the transform block T(B_R) can be obtained from equation 4. Note that, since the H_{i1}, H_{i2} matrices are fixed, their transforms can be pre-computed and stored in memory.

T(B_R) = Σ_{i=1}^{4} T(H_{i1}) · T(B_i) · T(H_{i2})    (4)
In case the block is aligned either horizontally or vertically, only 2 rather than 4 blocks are needed. Furthermore, for perfect alignment (i.e. both horizontal and vertical) equation 4 simplifies to T(B_R) = T(B_i). Since motion compensation in MPEG2 is performed with half-pixel accuracy, this method has to be applied to the shifted blocks too. The final block is obtained by averaging the two extracted blocks. Recent work has shown that very high computational savings can be achieved by applying motion compensation in the frequency domain [9].
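The distributivity argument behind equation 4 can be checked numerically. The sketch below, assuming an orthonormal 8x8 DCT-II and a purely vertical shift (horizontally aligned, so only two blocks contribute and H_{i2} = I), is illustrative rather than the paper's implementation:

```python
import numpy as np

N = 8  # DCT block size

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C; the block transform is T(B) = C B C^T."""
    k, i = np.mgrid[0:n, 0:n]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

def T(b):
    return C @ b @ C.T

def shift_matrices(s, n=N):
    """Fixed H matrices extracting the sub-blocks for a vertical shift
    of s rows: the bottom n-s rows of B1 and the top s rows of B2."""
    h1 = np.zeros((n, n))
    h2 = np.zeros((n, n))
    for r in range(n - s):
        h1[r, r + s] = 1.0           # rows s..n-1 of B1 -> top of B_R
    for r in range(n - s, n):
        h2[r, r - (n - s)] = 1.0     # rows 0..s-1 of B2 -> bottom of B_R
    return h1, h2

rng = np.random.default_rng(0)
B1, B2 = rng.random((N, N)), rng.random((N, N))
H1, H2 = shift_matrices(3)

BR = H1 @ B1 + H2 @ B2                   # equation 3 in the pixel domain
TBR = T(H1) @ T(B1) + T(H2) @ T(B2)      # equation 4 in the DCT domain
assert np.allclose(T(BR), TBR)           # the two computations agree
```

The identity holds because C is orthogonal, so T(H B) = C H B C^T = (C H C^T)(C B C^T) = T(H) T(B); as the text notes, T(H1) and T(H2) can be pre-computed since the H matrices are fixed.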
4 Performance
The main problem of rate conversion of compressed video is the drift in the decoded pictures. In fact, if there is any mismatch between the encoder and decoder loops, the quality of predicted pictures will decrease continuously. This is due to the error accumulation in the decoder. In order to evaluate the drift-free property of the rate converter presented in this work, we compare its output to those of a single-stage encoder and an open-loop requantiser, all three systems working at the same quantisation step size. One GOP with 30 frames of the MOBILE sequence was VBR encoded using a fixed quantiser step of Q1 = 8. Then the resultant bit stream was requantised and converted with the scheme of figure 2, with Q2 = 14, which reduces the bit rate by about 36%. Figure 3 shows the PSNR of the decoded video for the 3 cases: encoding the original frames using Q1 = Q2 = 14, open-loop requantisation and rate conversion, both using Q2 = 14. The effect of drift in open-loop requantisation is very obvious, whereas the difference between the rate converter and a single-stage encoder is roughly constant at about 1 dB along the whole sequence. Note that this is the worst case, since we are comparing a system which encodes the original video (single-stage encoder) with one which transcodes compressed video. Even so, the difference is not significant.
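For reference, the open-loop requantisation used as the baseline can be sketched per coefficient under a simplified uniform-quantiser model (not the exact MPEG2 quantiser, which also involves weighting matrices):

```python
def requantise(level, q1, q2):
    """Open-loop requantisation of one coefficient: reconstruct with the
    original step q1, then quantise with the coarser step q2.  Used alone
    this causes drift; the proposed converter adds DCT-domain motion
    compensation around it to cancel the encoder/decoder mismatch."""
    return round(level * q1 / q2)
```

Under this model a level of 10 coded with q1 = 8 becomes round(80 / 14) = 6 when requantised with q2 = 14; the rounding error of such steps is what accumulates through predicted frames in the open-loop case.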
Figure 3: Performance of the rate converter
5 Conclusions
We have described a rate conversion scheme for matching bandwidth constraints in ATM networks. By implementing the motion compensation operation in the DCT domain we have designed a new system working entirely in the frequency domain. We have shown that no drift is introduced with our rate conversion system and that the quality of the transcoded pictures is very close to that of pictures encoded from the original video. Furthermore, the delay introduced by the rate conversion process is low, which is a mandatory requirement for any system working in an ATM environment. Applications of such a system include reactive congestion control for pre-encoded video, video multicasting and video on demand.
References

[1] H. J. Stuttgen, "Network evolution and multimedia communication," IEEE Multimedia, vol. 2, pp. 42-59, Fall 1995.
[2] ISO/IEC 13818-2, "Generic coding of moving pictures and associated audio, Recommendation H.262," March 1994.
[3] Y. Omori, T. Suda, G. Lin, and Y. Kosugi, "Feedback-based congestion control for VBR video in ATM networks," in Sixth International Workshop on Packet Video, (Portland, Oregon, USA), September 1994.
[4] H. Kanakia, P. P. Mishra, and A. R. Reibman, "An adaptive congestion control scheme for real time packet video transport," IEEE/ACM Transactions on Networking, vol. 3, pp. 671-682, December 1995.
[5] P. Assuncao and M. Ghanbari, "Multi-casting of MPEG-2 video with multiple bandwidth constraints," in 7th International Workshop on Packet Video, (Brisbane, Australia), March 1996.
[6] P. Assuncao and M. Ghanbari, "Post-processing of MPEG2 coded video for transmission at lower bit rates," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, (Atlanta, USA), pp. 1999-2002, May 1996.
[7] S.-F. Chang and D. G. Messerschmitt, "Manipulation and compositing of MC-DCT compressed video," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 1-11, January 1995.
[8] D. G. Morrison, M. E. Nilsson, and M. Ghanbari, "Reduction of bit-rate of compressed video while in its coded form," in Sixth International Workshop on Packet Video, (Portland, Oregon, USA), pp. 392-406, September 1994.
[9] N. Merhav and V. Bhaskaran, "A fast algorithm for DCT-domain inverse motion compensation," in International Conference on Acoustics, Speech, and Signal Processing, (Atlanta, Georgia, USA), May 1996.
Index of authors Abdi, H., 415 Acheroy, M., 329 Aguilar, P.L., 133 Ahmadi, M., 679 Ait-Boudaoud, D., 223 Aitsab, O., 3 Alberola-Lopez, C., 195 Alcock, R.J., 637 Amoroux, T., 333 Ancona, N., 577 Antoine, J.-P., 53, 65 Arp, F., 219 Asada, M., 257 Asselin de Beauville, J.P., 419 Assunr P., 701 Baginyan, S.A., 105 Bailey, N.J., 679 Ballarin, V.L., 625 Ballosino, N., 205 Baofen, Z., 127 Bardos, A.J., 317 Bartkowiak, M., 27 Battle, J., 299 Bauer, S., 245 Belbachir, M.F., 511 Bercebel, G., 69 Besserer,B., 593 Bi, D., 419 Birecik, S., 353 Bischoff, R., 299 Bock, F., 215 Bojkovic, Z., 541 Bolsens, I., 521 Bormans, J., 521 Bortolozzi, F., 479 Bosse, E., 377 Bouattoura, D., 461 Boukir, S., 593 Boukouvalas, C., 611 Bouridane, A., 23, 633 Brunetta, S.G., 577 Buisson, O., 593 Butler, D., 15 Campell, D.C., 119 Camposs, C.F.J., 665 Canagarajah, N., 183 Carrion, R.G., 337 Casar-Corredera, J.R., 195 Casas, A., 337
Cavagnino, D., 205 Cazaguel, G., 11 Cernadas, E., 337 Cetiner, B., 187 Chang, K., 397 Chen, H., 227 Chen, S., 7 Chi, S.-Y., 295 Chitti, Y., 589 Chiuderi, A., 365 Choi, T.W., 439 Christopoulos, C.A., 557 Cieplinski, L., 443 Cirredu, M., 209 Conci, A., 665 Cooklev, T., 69 Cornelis, J. 427, 557 Cuaj, Z., 267 Cuhadar, A., 37 Cziho, A., 11
Dabrowski, A., 81 Danya, Y., 127 Davidson, T.N., 385 Davis, C.J., 573 De Natale, F.G.B., 209 Deprettere, E.F., 525 D'Haeyer, J.P.F., 661 Dickin, F.J., 647 Dill, J.C., 393 Djebbari, A., 511 Djebbari, AI., 511 Domanski, M., 27, 277 Donlagic, D., 267 Doulamis, A., 155, 561 Doulamis, N., 561 Downtown, A.C., 37 Draye, J.-P., 115 Drew, M.S., 607 Dueck, J., 607 Dugelay, J.-L., 569 Durrani, T.S., 119 Dyakowski, T., 647
Edwards, M.D., 43 Efstratiadis, S.N., 341 Engels, M., 521 Eranosian, A.V., 253 Erwan, L., 249
706 Facon, J., 479 Faez, K., 361 Fantini, J., 111 Farah, B.D., 629 Fazekas, K., 173, 545 Fettweis, G., 517 Finlayson, G.D., 607 Fraser, D.A., 585 Fukuda,K., 281 Funt, B.V., 607 Fyfe, C., 141 Gaillard, P., 461 Garcia-Campos, R., 299 Garcia-Garduna, V., 447 Garcia-Ugalde, F., 447 Gatica-Perez, D., 447 Gautama, S., 661 Gerken, P., 27 Ghanbari, M., 549, 701 Ginsto, D.D., 209 Girolami, M., 141 Gogan, P., 589 Gomez, L., 337 Gonzalez, M., 625 Gonzalez-Palenzuela, E.S., 641 Gotlib, A., 603 Goulermas, J.Y., 169 Goulermas, Y.P., 347 Grgic, M., 245 Guillaume, M., 333 Hamada, T., 455 Hamamoto, K., 325 Hancock, E., 653 Hara, S., 409 Haris, K., 341 Harrington, J., 633 Hayasaka, R., 231 Hayes, G.M., 123 He, Z., 7 Heidari, S., 389 Hekstra, G.J., 525 Hel-Or, H., 603 Heusdens, R., 525 Hosticka, B.J., 475 Hostika, B.J., 451 Hoyle, B.S.; 679 Huet, B., 653 Hussain, A., 119
Ibatici, K., 137 Ibrahim, M.K., 531 Ivanov, V.V., 89, 97 Jabar, J., 691 Jagadeesh Kumar, V., 469 Jaitly, R., 585 Jedrzejek, C., 271,443 Jin, Q., 377, 381 Jiang, J., 15 Jiang, L., 669 Jones, W.W., 393 Joyeux, L., 593 Jun, W., 147 Jurkovic, F., 267 Juuso, E., 137, 423 Kang, D.W., 235 Karamyan, H.L., 253 Karras, D.A., 191 Karkanis, S.A., 191 Kaskalis, T.H., 687 Kato, K., 161 Kawanaka, A., 281 Keren, D., 603 Kettaf, F.Z., 419 Kida, T., 489, 507 Kida, Y., 489 Kim, C.R., 439 Kim, H., 263 Kim, J.C., 439 Kinsner, W., 405 Kisel, I., 101 Kivikunnas, S., 137 Kollias, S., 155, 561 Kollias, S.D., 31 Komatsu, T., 581,673 Konotopskaya, E., 101 Kouboulis, F.N., 493 Kovalenko, V., 101 Kovesi, M., 173,545 Koziris, N., 433 Kroupnova, N., 599 Kurugollu, F., 353 Lado, M.J., 465 Lafruit, G., 521 Lakany, H.M., 123 Lam, C.L., 165 Lamure, M., 619 Langevin, F., 461
707 Langi, A., 405 Lecornu, L., 271 Lemanhieu,I., 329 Levy, J.B., 369 Li, C.C., 263 Li, W., 241 Li, Y., 227 Liatsis, P., 169, 347 Lihui, P., 127 Lin, X., 397 Lina, J.-M., 61 Louchet, J., 669 Lovanyi, I., 11 Lu, C.-Y., 19 Luo, Z.M., 377 Maglaveras, N., 341 Mann, R., 647 Margaritis, K.G., 687 Marino, F., 311 Markovic, M., 541 Martinez, P., 133 Mastonardi, G., 311 Matsuda, T., 409 Matsumoto, S., 455 Matsushita, Y., 231 Medvecky, M., 683 Mendez, A.J., 465 Mertens, M., 427 Mertzios, B.G., 191,347, 357, 493 Mikulec, J., 177 Milovanovic, D., 285 Min, W., 147 Mitzias, D., 357 Mizas, C., 695 Mizuno, K., 321 Mkrttchian, V.S., 253 Moler, E., 625 Monari, M., 525 Moon, K.-A., 295 Moreno, J., 133 Morgan, S., 23 Morinaga, N., 409 Morita, S., 199 Morris, D.T., 43 Morris, T., 303 Mouhoub, S., 619 Mozef, E., 691 Murtovaara, S., 423 Nakajima, M., 497 Nakazawa, Y., 673 Namazi, M., 361
Newell, Z., 303 Neyt, X., 329 Nicoloyannis, N., 619 Nikias, C.L., 389 Nishimura, T., 325 Nixon, M.S., 573 Oh, W.-G., 295 Ohta, T., 231 Ososkov, G.A., 105 Otake, T., 281 Ovsenik, L., 173,545 Paindavoine, M., 415 Panagiotidis, N.G., 31 Panas, S.M., 307 Papademitriou, R.C., 501 Papakonstantinou, G., 433 Pappas, C., 341 Parhi, K.K., 535 Park, J.-W., 295 Paul, J.S., 469 Pavisic, D., 115 Perez, A.M., 133 Pes, P., 209 Pessana, F., 625 Petrou, M., 611 Pham, D.T., 187, 637 Philips, W., 557 Pichler, O., 451,475 Planinsic, P., 267 Polidori, E., 569 Porter, R., 183 Potocnik, B., 151 Prabhakar, R., 241 Pyndiah, R., 3 Qiu, G., 7, 553 Qizhi, D., 147 Reddy, M.R.S., 469 Refregier, P., 333 Ricny, V., 177 Rigas, A.G., 483 Roche, S., 569 Rodriguez, P.G., 337 Rouvaen, J.M., 511 Roux, C., 11 Ruiz-Alzola, J., 195 Ryu, D.H., 439
708 Sahli, H., 427 Saito, T., 581,673 Sampson, D.G., 37, 549 Sandic, D., 285 Sangwine, S.J., 317, 615 Sankur, B., 353 Santos, S.G., 493 Sarantis, D., 657 Sawada, K., 257 Sawai, H., 321 Scarpetis, M.G., 493 Schempp, W., 73 Sezgin, M., 353 Shao, G.-F., 161 Shiina, T., 325 Shimazu, Y., 231 Silva, A., 133 Silva, E. da, 549 Skodras, A.N., 557 Solaiman, B., 3, 11 Soraghan, J.J., 119 Souto, M., 465 Spiliotis, I.M., 347 Strintzis, M.G., 289 Sutinen, R., 423 Swierczynski, R., 277 Tahoces, P.G., 465 Takahashi, H., 497 Tamaki, A., 161 Tanaka, M., 199 Teoh, C., 369 Teran, R., 115 Teuner, A., 451,475 Thanailakis, A., 695 Thornton, A.L., 615 Tisserand, E., 691 Tolias, Y.A., 307 Torres, S., 625 Tsanakas, P., 433 Tsapatsoulis, N., 155
Tsiodras, A., 561 Turan, J., 173,545 Tzionas, P., 695 Tzovaras, D., 289 Vandergheynst, P., 65 Vanhoof, B., 521 Vega-Cruz, P.I., 641 Venetsapoulos, A.N., 69 Villon, P., 461 Walter, H., 215 Wang, C.-Y., 535 Wang, M., 647 Watabe, K., 321 Watanabe, S., 581 Weber, S., 691 Weiss, M., 517 Wen, K.A., 19 Wickerhauser, M.V., 47 Wilde, M., 215 Wong, K.M., 377,381,385 Wu, J., 381 Wu, X., 227 Xu, C., 535 Xydeas, C.S., 657 Yang, F., 415 Yao, F.-H., 161 Yuasa, K., 321 Yuen, S.Y., 165 Zazula, D., 151 Zhao, J., 231 Zhijie, X., 127 Zovko-Cihlar, B., 245 Zrelov, P.V., 97