PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION

Indian Statistical Institute, Kolkata, India
2-4 January 2007

EDITOR
PINAKPANI PAL
Indian Statistical Institute, India

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
ADVANCES IN PATTERN RECOGNITION Proceedings of the Sixth International Conference on Advances in Pattern Recognition (ICAPR 2007) Copyright © 2007 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-270-553-2 ISBN-10 981-270-553-8
Printed in Singapore by B & JO Enterprise
PREFACE
The Electronics and Communication Sciences Unit (ECSU) of the Indian Statistical Institute is organizing the sixth International Conference on Advances in Pattern Recognition (ICAPR 2007) at the Indian Statistical Institute, Kolkata, from 2nd to 4th January, 2007. Since the advent of the knowledge-based computing paradigm, pattern recognition has become an active area of research involving scientists and engineers from different disciplines of the physical and earth sciences. A number of conferences are organized every year, which act as platforms to present and exchange ideas on different facets of pattern recognition. It is needless to mention that ICAPR has carved out a unique niche within this list of conferences on pattern recognition, particularly for its continued success in focusing on application-driven research. We are confident that the programme of this ICAPR will be as exciting as the previous ones.

You may be aware of the overwhelming response that we have received since the publication of the call for papers for ICAPR 2007 in February 2006. We received 123 papers from 32 different countries. Given the time constraints of a three-day conference, it was indeed difficult for us to select only a few of these high-quality technical contributions. I am thankful to the learned members of the programme committee, whose untiring effort helped me to ultimately select a total of 68 papers for oral presentation. The selected papers represent a number of important frontiers in pattern recognition, ranging from Biometrics, Document Analysis, and Image Registration & Transmission to traditional areas like Image Segmentation, Multimedia Object Retrieval, Shape Recognition, and Speech & Signal Analysis. I am happy that we shall see an excellent balance of theory- and application-focused research in the programme of ICAPR 2007.
Another important aspect of the programme will be a series of invited talks by renowned exponents in the field of pattern recognition and related areas. We look forward to listening to plenary speakers Prof. K. V. Mardia, Prof. T. Tan and Prof. E. J. Delp. I am confident that it will also be a rewarding experience for all of us to interact with our invited speakers Prof. I. Bloch and Prof. V. Lakshmanan.

Publication of a proceedings of this standard requires tremendous infrastructural support. I am fortunate to have a very active advisory committee who extended their support whenever we required it. The organizing committee is working hard to make the event a grand success. The editorial workload was huge, but Bibhas Chandra Dhara and Partha Pratim Mohanta made it easy for me through their hard work and dedication. I must acknowledge the staff of the ECSU for their untiring support to the conference secretariat. I am particularly thankful to N. C. Deb, D. K. Gayen and S. K. Shaw for their support of the technical work. The administrative responsibilities are being handled by S. K. Seal, S. Sarkar, D. Mitra, R. Chatterjee, D. Shaw and S. S. Das, supported by S. Deb. I am also thankful to the WebReview team of Katholieke Universiteit Leuven, ESAT/COSIC, for letting me use their WebSubmission and WebReview software, which made our job easier. Of course, the World Scientific editorial team lent a graceful touch to the printed format of this publication. I also acknowledge the help of Subhasis Kumar Pal in keeping our webserver problem free. We also thank our sponsors for their kind help and support.

I conclude with my heartfelt thanks to the contributors for submitting their papers, which they now stand ready to present before the august audience of ICAPR. I am sure that this collection of papers and their presentation will motivate us to explore further research and the advances made in pattern recognition. Thank you.
Pinakpani Pal Electronics and Communication Sciences Unit Indian Statistical Institute
INTERNATIONAL ADVISORY COMMITTEE
Chairman Sankar Kumar Pal, India
Members
Shun-ichi Amari, Japan
Gabriella Sanniti di Baja, Italy
Horst Bunke, Switzerland
Bulusu Lakshmana Deekshatulu, India
S.C. Dutta Roy, India
Vito Di Gesu, Italy
J.K. Ghosh, India
Anil K. Jain, USA
Nikola Kasabov, New Zealand
Rangachar Kasturi, USA
M. Kunt, Switzerland
C.T. Lin, Taiwan
M.G.K. Menon, India
A.P. Mitra, India
Heinrich Niemann, Germany
Witold Pedrycz, Canada
V.S. Ramamurthy, India
C.R. Rao, USA
Erkki Oja, Finland
Lipo Wang, Singapore
Jacek M. Zurada, USA
General Chair D. Dutta Majumder, ISI, Kolkata
Plenary Chair Nikhil Ranjan Pal, ISI, Kolkata
Tutorial Chair Bhabatosh Chanda, ISI, Kolkata
Organizing Committee
Arun Kumar De (Chairman), ISI, Kolkata
Partha Pratim Mohanta (Convener), ISI, Kolkata
B. D. Acharya, DST, New Delhi
Aditya Bagchi, ISI, Kolkata
Bhabatosh Chanda, ISI, Kolkata
Bidyut Baran Chaudhuri, ISI, Kolkata
Narayan Chandra Deb, ISI, Kolkata
Malay Kumar Kundu, ISI, Kolkata
Jharna Majumdar, ADE, Bangalore
Dipti Prasad Mukherjee, ISI, Kolkata
C. A. Murthy, ISI, Kolkata
Nikhil Ranjan Pal, ISI, Kolkata
Srimanta Pal, ISI, Kolkata
S. Rakshit, CARE, Bangalore
Kumar Sankar Ray, ISI, Kolkata
Bimal Roy, ISI, Kolkata
S. K. Sarkar, NPL, New Delhi
Swapan Kumar Seal, ISI, Kolkata
Bhabani Prasad Sinha, ISI, Kolkata
INTERNATIONAL PROGRAMME COMMITTEE
Abhik Mukherjee, BESU, Shibpur
Abraham Kandel, University of South Florida, Tampa
Amit Das, BESU, Shibpur
Amita Pal, ISI, Kolkata
Amitabha Mukerjee, IIT, Kanpur
Anup Basu, University of Alberta
Basabi Chakraborty, Iwate Prefectural University, Japan
Bhabatosh Chanda, ISI, Kolkata
Bhargab Bhattacharya, ISI, Kolkata
Bidyut Baran Chaudhuri, ISI, Kolkata
Bimal Roy, ISI, Kolkata
Brian C. Lovell, The University of Queensland, Australia
C. A. Murthy, ISI, Kolkata
C. V. Jawahar, IIIT, Hyderabad
Dipti Prasad Mukherjee, ISI, Kolkata
Hisao Ishibuchi, Osaka Prefecture University, Japan
Irina Perfiljeva, University of Ostrava, Czech Republic
Isabelle Bloch, ENST, France
Jayanta Mukhopadhyay, IIT, Kharagpur
Koczy T. Laszlo, Hungary
Kumar Shankar Ray, ISI, Kolkata
Lipo Wang, Nanyang Technological University, Singapore
Malay Kundu, ISI, Kolkata
Mrinal Mondal, University of Alberta, Canada
Nikhil Ranjan Pal, ISI, Kolkata
Niladri Chatterjee, IIT, Delhi
Okyay Kaynak, Bogazici University, Turkey
Olli Simula, Helsinki University of Technology, Finland
Oscar Castillo, Tijuana Institute of Technology, Mexico
Punam Saha, University of Pennsylvania, USA
Ryszard S. Choras, Institute of Telecommunications, Poland
Sanjoy Saha, JU, Kolkata
Sansanee Auephanwiriyakul, Chiang Mai University, Thailand
Scott Acton, University of Virginia, USA
Sid Ray, Monash University, Australia
Somnath Sengupta, IIT, Kharagpur
Soo-Young Lee, Korea Advanced Institute of Sc. & Technology, Korea
Subhashis Banerjee, IIT, Delhi
Subhasis Choudhury, IIT, Bombay
Sukhendu Das, IIT, Madras
Sung-Bae Cho, Yonsei University, Korea
Takeshi Furuhashi, Nagoya University, Japan
Visvanathan Ramesh, Siemens Corporate Research Inc., USA
Yutaka Hata, University of Hyogo, Japan
Pinakpani Pal (Chairman), ISI, Kolkata
ADDITIONAL REVIEWERS
Aditya Bagchi, ISI, Kolkata
Arijit Bishnu, IIT, Kharagpur
Arun K. De, ISI, Kolkata
Ashish Ghosh, ISI, Kolkata
Bibhas Chandra Dhara, Jadavpur University
Debrup Chakrabarty, CINVESTAV, IPN, Mexico
Durga Prasad Muni, ISI, Kolkata
Mandar Mitra, ISI, Kolkata
Nilanjan Ray, University of Alberta, Canada
Oscar Montiel, Tijuana Institute of Technology, Mexico
Patricia Melin, Tijuana Institute of Technology, Mexico
Roberto Sepulveda, Tijuana Institute of Technology, Mexico
Somitra Kumar Sanadhya, ISI, Kolkata
Srimanta Pal, ISI, Kolkata
Subhamay Maitra, ISI, Kolkata
Swapan Kumar Parui, ISI, Kolkata
Umapada Pal, ISI, Kolkata
Utpal Garain, ISI, Kolkata
SPONSORS
Adobe
RD INDIA SCIENCE LAB
Reserve Bank of India
CONTENTS
Preface
International Advisory Committee
Organizing Committee
International Programme Committee
Additional Reviewers
Sponsors
Part A
Plenary Lecture

Why Statistical Shape Analysis is Pivotal to the Modern Pattern Recognition?
Kanti V. Mardia

Part B
Invited Lectures

On the Interest of Spatial Relations and Fuzzy Representations for Ontology-Based Image Interpretation
Isabelle Bloch, Celine Hudelot and Jamal Atif

A Technique for Creating Probabilistic Spatio-Temporal Forecasts
V. Lakshmanan and Kiel Ortega
Part C
Biometrics

An Efficient Measure for Individuality Detection in Dynamic Biometric Applications
B. Chakraborty and Y. Manabe

Divide-and-Conquer Strategy Incorporated Fisher Linear Discriminant Analysis: An Efficient Approach for Face Recognition
S. Noushath, G. Hemantha Kumar, V. N. Manjunath Aradhya and P. Shivakumara

Ear Biometrics: A New Approach
Anupam Sana, Phalguni Gupta and Ruma Purkait

Face Detection using Skin Segmentation as Pre-Filter
Shobana L., Anil Kr. Yekkala and Sameen Eajaz

Face Recognition Using Symbolic KDA in the Framework of Symbolic Data Analysis
P. S. Hiremath and C. J. Prabhakar

Minutiae-Orientation Vector Based Fingerprint Matching
Li-min Yang, Jie Yang and Yong-liang Zhang

Recognition of Pose Varied Three-Dimensional Human Faces Using Structured Lighting Induced Phase Coding
Debesh Choudhury

Writer Recognition by Analyzing Word Level Features of Handwritten Documents
Prakash Tripathi, Bhabatosh Chanda and Bidyut Baran Chaudhuri

Part D
Clustering Algorithms
A New Symmetry Based Genetic Clustering Technique for Automatic Evolution of Clusters
Sriparna Saha and Sanghamitra Bandyopadhyay

A Non-Hierarchical Clustering Scheme for Visualization of High Dimensional Data
G. Chakraborty, B. Chakraborty and N. Ogata

An Attribute Partitioning Approach to Correlation Connected Clusters
Vijaya Kumar Kadappa and Atul Negi
Part E
Document Analysis

A Hybrid Scheme for Recognition of Handwritten Bangla Basic Characters Based on HMM and MLP Classifiers
U. Bhattacharya, S. K. Parui and B. Shaw

An Efficient Method for Graphics Segmentation from Document Images
S. Mandal, S. P. Chowdhury, A. K. Das and B. Chanda

Identification of Indian Languages in Romanized Form
Pratibha Yadav, Girish Mishra and P. K. Saxena

Online Bangla Handwriting Recognition System
K. Roy, N. Sharma, T. Pal and U. Pal

Oriya Off-Line Handwritten Character Recognition
U. Pal, N. Sharma and F. Kimura

Recognition of Handwritten Bangla Vowel Modifiers
S. K. Parui, U. Bhattacharya and S. K. Ghosh

Template-Free Word Spotting in Low-Quality Manuscripts
Huaigu Cao and Venu Govindaraju

Unconstrained Handwritten Digit Recognition: Experimentation on MNIST Database
V. N. Manjunath Aradhya, G. Hemantha Kumar and S. Noushath
Part F
Image Registration and Transmission

An Adaptive Background Model for Camshift Tracking with a Moving Camera
R. Stolkin, I. Florescu and G. Kamberov

Colour and Feature Based Multiple Object Tracking Under Heavy Occlusions
Pabboju Sateesh Kumar, Prithwijit Guha and Amitabha Mukerjee

DCT Properties as Handle for Image Compression and Cryptanalysis
Anil Kr. Yekkala, C. E. Veni Madhavan and Narendranath Udupa

Genetic Algorithm for Improvement in Detection of Hidden Data in Digital Images
Santi P. Maity, Prasanta K. Nandi and Malay K. Kundu

High Resolution Image Reconstruction from Multiple UAV Imagery
Jharna Majumdar, B. Vanathy and Lekshmi S.

Image Registration and Object Tracking via Affine Combination
Nilanjan Ray and Dipti Prasad Mukherjee

Progressive Transmission Scheme for Color Images Using BTC-PF Method
Bibhas Chandra Dhara and Bhabatosh Chanda

Registration Algorithm for Motion Blurred Images
K. V. Arya and P. Gupta
Part G
Image Segmentation

Aggregation Pheromone Density Based Change Detection in Remotely Sensed Images
Megha Kothari, Susmita Ghosh and Ashish Ghosh

Automatic Brain Tumor Segmentation Using Symmetry Analysis and Deformable Models
Hassan Khotanlou, Olivier Colliot and Isabelle Bloch

Edge Recognition in MMWave Images by Biorthogonal Wavelet Decomposition and Genetic Algorithm
C. Bhattacharya and V. P. Dutta

Extended Markov Random Fields for Predictive Image Segmentation
R. Stolkin, M. Hodgetts, A. Greig and J. Gilby

External Force Modeling of Snakes Using DWT for Texture Object Segmentation
Surya Prakash and Sukhendu Das

I-FISH: Increasing Detection Efficiency for Fluorescent Dot Counting in Cell Nuclei
Shishir Shah and Fatima Merchant

Intuitionistic Fuzzy C Means Clustering in Medical Image Segmentation
T. Chaira, A. K. Ray and O. Salvetti

Remote Sensing Image Classification: A Wavelet-Neuro-Fuzzy Approach
Saroj K. Meher, B. Uma Shankar and Ashish Ghosh
Part H
Multimedia Object Retrieval

An Efficient Cluster Based Image Retrieval Scheme Using Localized Texture Pattern
Saumen Mandal, Sanjoy Kumar Saha, Amit Kumar Das and Bhabatosh Chanda

Feature Selection Based on Human Perception of Image Similarity for Content Based Image Retrieval
P. Narayana Rao, Chakravarthy Bhagvati, R. S. Bapi, Arun K. Pujari and B. L. Deekshatulu

Identification of Team in Possession of Ball in a Soccer Video Using Static and Dynamic Segmentation
V. Pallavi, Jayanta Mukherjee, A. K. Majumdar and Shamik Sural

Image Retrieval Using Color, Texture and Wavelet Transform Moments
R. S. Choras

Integrating Linear Subspace Analysis and Iterative Graphcuts for Content-Based Video Retrieval
P. Deepti, R. Abhilash and Sukhendu Das

Organizing a Video Database Around Concept Maps
K. Shubham, L. Dey, R. Goyal, S. Gupta and S. Chaudhury

Statistical Bigrams: How Effective Are They in Text Retrieval?
Prasenjit Majumder, Mandar Mitra and Kalyankumar Datta
Part I
Pattern Recognition

Adaptive Nearest Neighbor Classifier
Anil K. Ghosh

Class-Specific Kernel Selection for Verification Problems
Ranjeeth Kumar and C. V. Jawahar

Confidence Estimation in Classification Decision: A Method for Detecting Unseen Patterns
Pandu R. Devarakota and Bruno Mirbach

ECG Pattern Classification Using Support Vector Machine
S. S. Mehta and N. S. Lingayat

Model Selection for Financial Distress Classification
Srinivas Mukkamala, Andrew H. Sung, Ram B. Basnet, Bernadette Ribeiro and Armando S. Vieira

Optimal Linear Combination for Two-Class Classifiers
O. Ramos Terrades, S. Tabbone and E. Valveny

Support Vector Machine Based Hierarchical Classifiers for Large Class Problems
Tejo Krishna Chalasani, Anoop M. Namboodiri and C. V. Jawahar

Unsupervised Approach for Structure Preserving Dimensionality Reduction
Amit Saxena and Megha Kothari
Part J
Shape Recognition

A Beta Mixture Model Based Approach to Text Extraction from Color Images
Anandarup Roy, Swapan Kumar Parui and Utpal Roy

A Canonical Shape-Representation for a Polygon
Sukhamay Kundu

A Framework for Fusion of 3D Appearance and 2D Shape Cues for Generic Object Recognition
Manisha Kalra and Sukhendu Das

Constructing Analyzable Models by Region Based Technique for Object Category Recognition
Yasunori Kamiya, Yoshikazu Yano and Shigeru Okuma

DRILL: Detection and Representation of Isothetic Loosely Connected Components without Labeling
P. Bhowmick, A. Biswas and B. B. Bhattacharya

Pattern Based Bootstrapping Method for Named Entity Recognition
Asif Ekbal and Sivaji Bandyopadhyay

SCOPE: Shape Complexity of Objects using Isothetic Polygonal Envelope
Arindam Biswas, Partha Bhowmick and Bhargab B. Bhattacharya

Segmental K-Means Algorithm Based Hidden Markov Model for Shape Recognition and its Applications
Tapan Kumar Bhowmik, Swapan Kumar Parui, Manika Kar and Utpal Roy
Part K
Speech and 1-D Signal Analysis

Automatic Continuous Speech Segmentation Using Level Crossing Rate
Nagesha and G. Hemantha Kumar

Automatic Gender Identification Through Speech Analysis
Anu Khosla and Devendra Kumar Yadav

Error-Driven Robust Particle Swarm Optimization for Fuzzy Rule Extraction and Structure Estimation
Sumitra Mukhopadhyay and Ajit K. Mandal

HMM Based POS Tagger and Rule-Based Chunker for Bengali
Sivaji Bandyopadhyay and Asif Ekbal

Non-Contemporary Robustness in Text-Dependent Speaker-Recognition Using Multi-Session Templates in a One-Pass Dynamic-Programming Framework
V. Ramasubramanian, V. Praveen Kumar and S. Thiyagarajan

Some Experiments on Music Classification
Debrup Chakraborty

Text Independent Identification of Regional Indian Accents in Spoken Hindi
Kamini Malhotra and Anu Khosla
Part L
Texture Analysis

An Efficient Approach for Texture Classification with Multi-Resolution Features by Combining Region and Edge Information Using a Modified CSNN
Lalit Gupta and Sukhendu Das

Upper Bound in Model Order Selection of MRF with Application in Texture Synthesis
Arnab Sinha and Sumana Gupta

Wavelet Features for Texture Classification and Their Use in Script Identification
P. S. Hiremath and Shivashankar S.

Author Index
PART A
Plenary Lecture
Why Statistical Shape Analysis is Pivotal to the Modern Pattern Recognition?
Kanti V. Mardia
Department of Statistics, University of Leeds, Leeds, West Yorkshire LS2 9JT, UK
E-mail: [email protected]
www.maths.leeds.ac.uk
There have been great strides in shape analysis in this decade. Pattern recognition, image analysis, and morphometrics have been the major contributors to this area, but now bioinformatics is driving the subject as well, and new challenges are emerging; the methods of pattern recognition are also evolving for bioinformatics. Shape analysis for labelled landmarks is now moving to the new challenges of unlabelled landmarks, motivated by these new applications. ICP, EM algorithms, etc. are well used in image analysis, but now Bayesian methods are coming into the arena. Dynamic Bayesian networks are another development. We will discuss the problems of averaging, image deformation, projective shape and Bayesian alignment. The aim of this talk is to convince scientists that statistical shape analysis is pivotal to modern pattern recognition.

Keywords: Bayesian analysis; Bioinformatics; Protein gel; Deformation; Average image; Discrimination; Penalized likelihood.
1. Introduction

We have reviewed the topic over the years, starting from two volumes 1,2 in 1993 and 1994. The subsequent reviews until 2001 include the papers 3-6. Since then the subject has grown, especially for shapes on manifolds; see, e.g., two recent workshops in the USA, at the American Institute of Mathematics in 2005 and the Institute for Mathematics and its Applications in 2006. Also our Leeds Annual Statistical Research (LASR) workshops (http://www.maths.leeds.ac.uk/Statistics/workshop) have been keeping abreast of the field, especially in relation to shapes and images. An excellent treatment of recent developments in shape analysis, including shape manifolds, can be found in the volume edited by Krim and Yezzi.7 A further stride has been due to new connections with bioinformatics, a field bursting with challenges. The field of shape analysis as covered until 1998 by Dryden and Mardia 8 has been dominated mainly by labelled shape analysis. New innovations are now emerging in unlabelled shape analysis. Perhaps Cross and Hancock 9 is one of the early statistical papers in the image area via the EM algorithm. Glasbey and Mardia 10 gave some different perspectives through penalized likelihood methods for images. A cross-over to bioinformatics can also be seen, for example, in Richmond et al. 11 A Bayesian hierarchical model for unlabelled shape is proposed in Green and Mardia, 12 which has
not been tried on images yet (only bioinformatics). Mardia et al. 13 have given a hybrid approach for image deformation and discrimination where some landmarks are labelled. One of the tools for deformation has been a form of the thin-plate spline (TPS), but many other radial functions can be used. Mardia et al. 14 have shown how the TPS gives advantages over various radial functions using Brodatz-type texture images. Thus, it is important to distinguish between labelled and unlabelled configurations, finite and 'infinite' numbers of points, outline or solid shape, linear or nonlinear transformations, parametric or nonparametric methods, and so on. We now describe some special topics.

2. Labelled Shape Analysis

Consider a configuration of points in R^m. For pattern recognition applications, generally m = 2 or 3. "Shape" deals with the residual structure of this configuration when certain transformations are filtered out. More specifically, the shape of a configuration consists of its equivalence class under a group of transformations. Important groups for machine vision are the similarity group, the affine group and the projective group. Here the group action describes the way in which an image is captured. For instance, if two different images of the same scene are obtained using a pinhole camera, the corresponding transformation between the two images is the composition of two central projections, which is a projective transformation. If the two central projections can be approximated by parallel projections, which is the case for remote views of the same planar scene, the projective transformation can be approximated by an affine transformation. Further, if these parallel projections are orthogonal projections on the plane of the camera, this affine transformation can be approximated by a similarity transformation. Therefore, the relationships between these shapes are as follows: if two configurations have the same similarity shape then they automatically have the same affine shape; if they have the same affine shape they will have the same projective shape. For example, two squares of different sizes have the same similarity, affine and projective shape, whereas a square and a rectangle have the same affine and projective shape but not the same similarity shape. On the other hand, a square and a kite have the same projective shape but not the same affine shape.

In statistics, the word "shape" often refers to similarity shape, where only the effects of translation, scale and rotation have been filtered out (see, for example, Dryden and Mardia 8). In recent years, substantial progress has been made in similarity shape analysis since appropriate shape spaces (e.g. Kendall's space) and shape coordinates (e.g. Bookstein coordinates) have become available. A simple example of Bookstein coordinates is for the shape of a triangle, where the shape coordinates are obtained after taking one of the vertices as the origin, rotating the triangle so that its base lies on the x-axis, and then rescaling the base to unit size. The motivation behind such coordinate systems is similar to that in directional statistics, where to analyze spherical data one requires a coordinate system such as longitude and latitude (see, for example, Mardia and Jupp 15). Similar types of coordinates are available for affine shape (Goodall and Mardia 16).
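The triangle construction just described can be coded directly. This is an illustrative sketch, not code from the paper; the function name and the choice of which two vertices form the baseline are my own.

```python
import numpy as np

def bookstein_coords(tri):
    """Shape coordinates of a triangle (3x2 array of vertices).

    Following the construction in the text: translate so the first
    vertex is at the origin, rotate so the baseline (vertex 1 -> 2)
    lies on the x-axis, and rescale the baseline to unit length.
    The standardized position of the third vertex is the shape coordinate.
    """
    tri = np.asarray(tri, dtype=float)
    base = tri[1] - tri[0]
    scale = np.linalg.norm(base)            # baseline length
    theta = np.arctan2(base[1], base[0])    # baseline angle
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])         # rotation by -theta
    z = (tri - tri[0]) @ R.T / scale        # translate, rotate, rescale
    return z[2]
```

Any two triangles related by translation, rotation and scaling return the same coordinates, which is exactly the filtering of the similarity group described above.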
For affine shape in 2-D, we can obtain shape coordinates by using three points that determine the direction and the origin of the axes, and the unit length between the points on each of these two axes. A convenient projective shape space, as well as an appropriate coordinate system for this shape space, has been put forward by Mardia and Patrangenaru, 17 where in 2-D the coordinate frame consists of the four points (0, 0), (0, 1), (1, 0) and (1, 1). This allows reconstruction of a three-dimensional image given multiple two-dimensional views of a scene. A "symmetrical" approach for projective shape space has been given in Kent and Mardia. 18

3. Unlabelled Shape Analysis and Bioinformatics

Various new challenging problems in shape matching have been appearing from different scientific areas, including bioinformatics and image analysis. In a class of problems in shape analysis, one assumes that the points in two or more configurations are labelled and that these configurations are to be matched after filtering out some transformation, usually a rigid or similarity transformation. Several new problems are appearing where the points of a configuration are either not labelled or the labelling is ambiguous, and in which some points do not appear in each of the configurations. An example of ambiguous labelling arises in understanding the secondary structure of proteins, where we are given not only the 3-dimensional molecular configuration but also the type of molecule (amino acid) at each point. A generic problem is to match two such configurations, where the matching has to be invariant under some transformation group. There are other related examples from image analysis, such as matching buildings when one has multiple 2-dimensional views of 3-dimensional objects (see, for example, Cross and Hancock 9); the problem here requires filtering out the projective transformations before matching. Other examples involve matching outlines or surfaces (see, for example, Chui and Rangarajan 19); here there is no labelling of points involved, and we are dealing with a continuous contour or surface rather than a finite number of points. Duta et al. 20 give a specific example of unlabelled matching of solid shapes. Green and Mardia 12 build a hierarchical Bayesian model for the point configurations and derive inferential procedures for its parameters. In particular, modelling hidden point locations as a Poisson process leads to a considerable simplification.
They discuss in particular the problem where only a linear or affine transformation has to be filtered out. They also provide an implementation of the resulting methodology by means of Markov chain Monte Carlo (MCMC) samplers. Under a broad parametric family of loss functions, an optimal Bayesian point estimate of the matching matrix has been constructed, which turns out to depend on a single parameter of the family. Also discussed there is a modification to the likelihood in their model to make use of partial label ('colour') information at the points. The principal innovations in this approach are (a) the fully model-based approach to alignment, (b) the model formulation allowing integrating out of the hidden point locations, (c) the prior specification for the rotation matrix, and (d) the MCMC algorithm. We now give some details together with an example.

3.1. Notation
Consider again two configurations of unlabelled landmarks in d dimensions, x_j, j = 1, ..., J and y_k, k = 1, ..., K, represented as matrices x (J x d) and y (K x d), where J is not necessarily the same as K. The objective is to find suitable subsets of each configuration and a suitable transformation such that the two subconfigurations become closely matched. One of the key parameters is the matching matrix of the configurations, denoted M (J x K), where M_jk indicates whether the points x_j and y_k in the two configurations are matched or not. That is, M_jk = 1 if x_j matches y_k, and 0 otherwise. Note that M is the adjacency matrix for the bipartite graph representing the matching, and that Σ_{j,k} M_jk = L, the number of matches. The transformation g, say, lies in a specified group G of transformations. Depending on the application, suitable choices for G include (a) translations, (b) rigid body transformations, (c) similarity transformations, (d) affine transformations and (e) projective transformations.

It is sometimes notationally helpful to add an extra column (k = 0) to M to yield M_0, where m_j0 = 1 − Σ_{k=1}^{K} m_jk = 1 if x_j is not matched to any y_k, and 0 otherwise. The matrix M (or equivalently M_0) is called a "hard" labelling because each element is either 0 or 1. It is also helpful to consider "soft" labellings given by a J x (K + 1) matrix M*, say, where 0 ≤ m*_jk ≤ 1 and Σ_{k=0}^{K} m*_jk = 1. There is now no constraint on Σ_j m*_jk. Note that M is symmetric in the roles of j and k, but M_0 and M* are not. Thus the overall matching objective can be restated as finding a matrix M and a transformation g such that x_j ≈ g(y_k) for j, k with m_jk = 1, as measured, e.g., by a sum of squares criterion. In computational geometry, the problem is termed the largest common point set (LCP) problem. We consider two related statistical models to tackle the problem.

3.2. Some statistical approaches
Model 1: Regression models. In this approach we condition on the landmark positions for one configuration y and on the matching matrix M (or equivalently M_0), and then model the distribution of the landmarks of the other configuration x. In the hard version of the model, the landmarks x_j, j = 1, ..., J, are taken to be conditionally independent with

x_j ~ N_d(g(y_k), σ² I_d)  when m_jk = 1 for some k,   (1)

and

x_j ~ N_d(g(y_0), σ_0² I_d)  when m_j0 = 1.

Here σ_0 ≫ σ. In the soft version of the model, each x_j independently follows the mixture

x_j ~ π_0 N_d(g(y_0), σ_0² I_d) + Σ_{k=1}^{K} π_k N_d(g(y_k), σ² I_d),
where Σ_{k=0}^{K} π_k = 1. The hard membership function M_0 does not appear in this formulation, but the posterior probabilities that x_j comes from the class for y_k form a soft membership matrix M*. This soft model has been used by several authors, including Cross and Hancock, 9 Luo and Hancock, 21 Walker, 22 Chui and Rangarajan, 19 and Kent et al. 23 In this case the EM algorithm can be used to compute the MLEs (at least locally), and it takes the form of a simple explicit iterative updating algorithm. In general it converges quite quickly in this context, though the solution depends heavily on the starting value for g (Kent et al. 23), as well as on the parameters σ and σ_0. Dryden et al. 24 have treated multiple configurations.

Model 2: Bayesian hierarchical model. The matching problem can also be given a formulation which is symmetric in the roles of j and k by introducing a set of unknown latent sites {μ_i, i = 1, ..., J} to represent a collection of "true" locations. In particular, given m_jk = 1 for some j and k, the key assumption in this approach is that there is a value of i such that

x_j ~ N(μ_i, σ² I),   g(y_k) ~ N(μ_i, σ² I).
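Returning for a moment to Model 1: the EM fit of its soft version, described above, alternates a membership (E) step and a transformation (M) step. The following is a minimal sketch under strong simplifying assumptions of my own, not from the paper: g is restricted to a pure translation, the mixture weights are uniform, and σ, σ_0 are held fixed.

```python
import numpy as np

def em_soft_match(x, y, sigma=1.0, sigma0=10.0, n_iter=50):
    """EM for the soft version of Model 1, with g a pure translation.

    E-step: posterior membership probabilities m*_jk (the soft matrix M*),
    with an extra k = 0 column for the diffuse "unmatched" class.
    M-step: weighted least-squares update of the translation.
    """
    J, d = x.shape
    t = np.zeros(d)                         # translation estimate for g
    for _ in range(n_iter):
        # E-step: N_d(x_j; y_k + t, sigma^2 I) up to constants shared by all k
        d2 = ((x[:, None, :] - (y[None, :, :] + t)) ** 2).sum(axis=2)
        dens = np.exp(-0.5 * d2 / sigma**2) / sigma**d
        dens0 = np.full((J, 1), 1.0 / sigma0**d)     # diffuse unmatched class
        m = np.hstack([dens0, dens])
        m /= m.sum(axis=1, keepdims=True)            # rows of M* sum to 1
        # M-step: translation = membership-weighted mean of x_j - y_k
        w = m[:, 1:]
        t = (w[:, :, None] * (x[:, None, :] - y[None, :, :])).sum((0, 1)) / w.sum()
    return t, m
```

As the text notes, the result depends heavily on the starting value of g (here t = 0), so in practice one would run this from several initializations.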
This latent-site approach has been implemented in a Bayesian framework by Green and Mardia, 12 with inference carried out by MCMC. In particular, the hidden points {μ_i} are generated by a Poisson process with some rate λ, and the proportion of matches is governed by a parameter p. When g comes from the rigid body group, g(y) = Ay + a, where A is a rotation matrix and a is a translation vector, the likelihood can be shown to take the form

likelihood ∝ ∏_{j,k: m_jk = 1} p φ({x_j − A y_k − a}/(σ√2)),   (2)

where φ(·) is the density of N(0, 1), after integrating out the {μ_i}. This construction defines a hard version of Model 2; it is also possible to define a soft version. We add suitable priors on A, a and the remaining parameters (see Green and Mardia 12).

Estimation for the hard versions of Models 1 and 2 is difficult to carry out analytically. Model 2 feels more natural for this situation, and an elegant and powerful MCMC algorithm, which avoids the need for reversible jump steps, has been developed by Green and Mardia. 12 The soft versions can be tackled by EM, though the choice of starting point is critical since the likelihood (or posterior) will generally be highly multimodal.

3.3. Example: Matching protein gels
The objective in this example is to match two electrophoretic gels automatically given two gel images (see Fig. 1). However, we will assume that the images have been preprocessed and we are given the locations of the centres of 35 proteins on each of the two gels. The correspondence between pairs of proteins, one protein from each gel, is unknown, so our aim is to match the two gels based on these sets of unlabelled points. We suppose that it is known that the transformation between the gels is affine. In this case, experts have already identified 10 points; see Horgan et al.25 Based on these 10 matches, the linear part of the transformation is estimated a priori (Dryden & Mardia,8 pp. 20-21, 292-296).
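The soft matching of Section 3.2 can be sketched in code. The following is a minimal illustration, not the authors' implementation: g is taken to be a pure translation, the noise level σ is fixed, mixture weights are uniform, the outlier class is omitted for brevity, and all data are synthetic.

```python
# Sketch of the soft version of Model 1 fitted by EM, under simplifying
# assumptions: g(y) = y + a (translation only), fixed sigma, no outlier
# class. The landmark data below are synthetic illustrations.
import numpy as np

def soft_match_em(x, y, sigma=0.5, n_iter=50):
    """Estimate the translation a and the soft membership matrix M*.

    x : (J, d) landmarks of one configuration
    y : (K, d) landmarks of the other configuration
    Returns (a_hat, M) with M[j, k] = posterior prob. that x_j matches y_k.
    """
    a = np.zeros(x.shape[1])
    for _ in range(n_iter):
        # E-step: posterior probabilities that x_j comes from the class of y_k
        d2 = ((x[:, None, :] - (y[None, :, :] + a)) ** 2).sum(-1)
        logw = -0.5 * d2 / sigma**2
        logw -= logw.max(axis=1, keepdims=True)   # numerical stability
        M = np.exp(logw)
        M /= M.sum(axis=1, keepdims=True)
        # M-step: weighted least-squares update of the translation
        a = (M[:, :, None] * (x[:, None, :] - y[None, :, :])).sum((0, 1)) / M.sum()
    return a, M

# Well-separated synthetic landmarks, matched in order up to a translation
rng = np.random.default_rng(0)
y = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.], [20., 0.], [0., 20.]])
true_a = np.array([2.0, -1.0])
x = y + true_a + rng.normal(scale=0.05, size=y.shape)
a_hat, M = soft_match_em(x, y)
```

Here `soft_match_em` and the data are hypothetical names for illustration; with well-separated points the posterior matrix M becomes essentially one-hot and the translation estimate converges in a few iterations.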
Here, we have only to make inference on the translation parameter and the unknown matching between certain of the proteins. Figure 2 gives the matches from this method. Note that all of the expert-identified matches, points 1 to 10 in each set, are declared to be matches with high probability in the Bayesian analysis.

4. Image Deformations

Deformation is a basic tool of image analysis (see, for example, Toga,26 Glasbey and Mardia,10 Hajnal et al.,27 and Singh et al.28), which maps a region S to a region T in R^d. Consider a landmark-based example. Let t_i, i = 1, ..., k be the configuration of landmarks in the "starting" or "source" image S, and x_i, i = 1, ..., k, be their homologues in a "target" image T. A useful deformation that takes the t_i's onto the x_i's and maps every point of S onto some point of T is the thin-plate spline (Bookstein29; Dryden and Mardia8). In the applied literature, e.g. radiology or evolutionary biology, this is often called a "model", though it is not actually serving that role: it is a prediction function. In Toga,26 for instance, none of the deformations introduced to relate pairs of neuroanatomical images are assigned standard errors, as they would necessarily be if they were estimates of some underlying model. One common method of fitting a deformation is to use a thin-plate spline for each coordinate of the deformation. It is well known that the thin-plate spline can also be given a stochastic interpretation as a predictor for a certain (intrinsic) stochastic process conditioned to match the observed values at the source landmarks. An advantage of the stochastic approach is that confidence limits on the predicted values can be provided. Mardia et al.30 reviewed the stochastic framework and associated prediction error in detail. There are two common strategies for fitting deformations given information at a set of landmarks. One involves minimizing a roughness penalty, e.g.
for a thin-plate spline, and the other involves prediction for a stochastic process, e.g. for a self-similar intrinsic random field. The stochastic approach allows parameter estimation and confidence limits for the predicted deformation. An application from a study of breast data, examining how the images deform as a function of the imaging procedure, is presented in Mardia et al.30
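The roughness-penalty route can be sketched concretely: the following minimal 2-D thin-plate spline interpolates target landmark positions exactly at the source landmarks. The landmark values are synthetic illustrations, not data from the paper.

```python
# Sketch of a 2-D thin-plate spline deformation fitted to landmark pairs.
# The spline solves the standard linear system [[K, P], [P^T, 0]] and
# interpolates the targets exactly at the source landmarks.
import numpy as np

def tps_fit(src, dst):
    """Fit a thin-plate spline mapping src -> dst; returns a warp function."""
    def U(r2):
        # U(r) = r^2 log r = (1/2) r^2 log r^2, with U(0) = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            out = 0.5 * r2 * np.log(r2)
        return np.nan_to_num(out)

    k = len(src)
    r2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = U(r2)
    P = np.hstack([np.ones((k, 1)), src])
    A = np.zeros((k + 3, k + 3))
    A[:k, :k], A[:k, k:], A[k:, :k] = K, P, P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    coef = np.linalg.solve(A, rhs)
    w, c = coef[:k], coef[k:]

    def warp(pts):
        r2p = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
        return U(r2p) @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ c

    return warp

src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
dst = src + np.array([[0.1, 0.], [0., 0.1], [-0.1, 0.], [0., -0.1], [0., 0.]])
warp = tps_fit(src, dst)
```

The side conditions P^T w = 0 built into the system make the non-affine part of the warp vanish at infinity; the affine part is carried by the coefficients c.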
Kanti V. Mardia 7
Fig. 1. Gel images (a) and (b).
Mardia et al. 14 address the problem of the distortion effect produced by different types of non-linear deformation strategies on textured images. The images are modelled by a Gaussian random field. They give various examples to illustrate that the model generates realistic images. They consider two types of deformations: a deterministic deformation and a landmark based deformation. The latter includes
Fig. 2. The 17 most probable matches in the gel data; + symbols signify x points, o symbols the y points, linearly transformed by premultiplication by the fixed affine transformation. The solid line for each of the 17 matches joins the matched points, and represents the inferred translation plus noise.
various radial basis type deformations, including the thin-plate spline based deformation. The effects of deformations are assessed through the Kullback-Leibler divergence measure. The measure is estimated by statistical sampling techniques. It is found empirically that this divergence measure is approximately lognormally distributed under various different deformations. Thus a coefficient of variation based on the log-divergence provides a natural criterion to compare different types of deformations. They find that the thin-plate spline deformation is almost optimal over the wider class of radial type deformations. A new method based on a compositional approach to multiscale image deformation is given in de Souza et al.31 Here a general framework is presented for the application of image deformation techniques, in which a smooth continuous mapping is modelled by a piecewise linear transformation, with separate components at a number of discrete scales. These components are combined by composition, which has several advantages over addition. The ideal transformation between two images is the best compromise between matching criteria and smoothness constraints, and has been estimated using both deterministic and stochastic analyses.
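The Kullback-Leibler comparison of deformations can be sketched as follows. This is a toy stand-in, not the setup of Mardia et al.14: the grid size, the exponential correlation function, and the two candidate deformations are all illustrative assumptions; the KL divergence is computed in closed form between the zero-mean Gaussian field models on the original and deformed pixel sites.

```python
# Sketch: score a deformation by the KL divergence between the Gaussian
# random-field image model on the original grid and on the deformed grid.
# Grid, correlation function and deformations are illustrative assumptions.
import numpy as np

def grf_cov(points, phi=1.0):
    """Covariance matrix of a GRF with exponential correlation exp(-d/phi)."""
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / phi)

def kl_gauss(S0, S1):
    """KL( N(0, S0) || N(0, S1) ) for zero-mean Gaussians."""
    n = S0.shape[0]
    S1inv = np.linalg.inv(S1)
    _, ld0 = np.linalg.slogdet(S0)
    _, ld1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1inv @ S0) - n + ld1 - ld0)

# Regular grid of pixel sites and two candidate deformations of it
g = np.stack(np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5)), -1).reshape(-1, 2)
small = g + 0.02 * np.sin(np.pi * g)   # mild deformation
large = g + 0.10 * np.sin(np.pi * g)   # stronger deformation

S = grf_cov(g)
kl_small = kl_gauss(S, grf_cov(small))
kl_large = kl_gauss(S, grf_cov(large))
```

As expected, the stronger deformation distorts the field model more, so its divergence from the undeformed model is larger.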
5. Penalized Image Averaging and Discrimination

The importance of statistically based objective discrimination methods is difficult to overstate; the need for such methods arises in many areas. We describe here the procedure of Mardia et al.13 Glasbey & Mardia10 provide a landmark-free method based on a penalized likelihood to discriminate. However, where landmark information is readily available it is judicious to make full use of it in any discrimination procedure. Mardia et al.13 expand upon Glasbey & Mardia's10 method by incorporating landmarks. The approach combines Glasbey and Mardia's method on the full image with Procrustes methods for landmark data; it is a fusion of landmark and standardized image information. The use of the extra landmark information improves the discrimination, giving a wider separation in the studentized difference between the within-group and between-group means. In addition, the use of landmarks significantly improves the computational speed compared with the landmark-free approach. The penalized likelihood is comprised of similarity and distortion parts. The likelihood measures the similarity between images after warping, and the penalty is a measure of the distortion of a warping. For the images they discriminate, the measures of similarity consist of normalized image information in the two- and three-dimensional settings respectively. We now give some details together with an example.
5.1. Image averaging

The basic strategy to obtain an average of a sample of images is as follows. We assume the images possess easily identifiable landmarks. The strategy is:
(1) Obtain the mean shape of landmarks using the Procrustes mean.
(2) Register the sample of images to the mean shape.
(3) Warp all the images to the mean shape using, say, a thin-plate spline.
(4) Interpolate the images (if necessary) to give a homologous coordinate system over all the sample images.
(5) Average the images over the homologous coordinate system.
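Step (1) of this strategy can be sketched as a generalised Procrustes mean removing translation, rotation and scale. This is a minimal illustration on synthetic triangle configurations, not the implementation used in the paper.

```python
# Sketch of a generalised Procrustes mean of landmark configurations:
# iteratively align each configuration to the current mean and re-average.
# The triangle configurations below are synthetic illustrations.
import numpy as np

def align(X, mu):
    """Ordinary Procrustes fit (translation, rotation, scale) of X onto mu."""
    Xc = X - X.mean(0)
    muc = mu - mu.mean(0)
    U, s, Vt = np.linalg.svd(Xc.T @ muc)
    R = U @ Vt                         # optimal rotation (may reflect if shapes allow)
    beta = s.sum() / (Xc ** 2).sum()   # optimal scale
    return beta * Xc @ R + mu.mean(0)

def procrustes_mean(configs, n_iter=20):
    mu = configs[0] - configs[0].mean(0)
    for _ in range(n_iter):
        aligned = np.stack([align(X, mu) for X in configs])
        mu = aligned.mean(0)
        mu = mu / np.linalg.norm(mu - mu.mean(0))   # fix size to avoid shrinkage
    return mu

# Synthetic sample: similarity-transformed copies of one triangle
base = np.array([[0., 0.], [1., 0.], [0.3, 0.8]])
rng = np.random.default_rng(1)
configs = []
for _ in range(5):
    th = rng.uniform(0, 2 * np.pi)
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    configs.append(rng.uniform(0.5, 2) * base @ R + rng.uniform(-5, 5, 2))
mu = procrustes_mean(configs)
# residual full Procrustes fit of the true base shape onto the estimated mean
resid = np.linalg.norm(align(base, mu) - mu)
```

Since all configurations here share exactly the same shape, the estimated mean recovers that shape up to similarity, and the residual is numerically zero.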
5.2. Image distance and discrimination
We now describe a general strategy for image discrimination based on penalized likelihood.

5.2.1. Image distance

Define a measure of distance between two images f_i and f_j, both in m dimensions, to be

P(f_i, f_j) = T(f_i, f_j) + λ L(X_i, X_j),   (3)

where T(f_i, f_j) is a standardised difference in texture/surface information, L(X_i, X_j) is a landmark-based distance between images and λ is a weighting parameter. Next, for a sample of n images f_1, ..., f_n, in m dimensions with corresponding landmarks X_1, ..., X_n, define an average image by minimising

F = Σ_{i=1}^{n} T(f_i, f̄) + λ Σ_{i=1}^{n} L(X_i, μ),   (4)

with respect to f̄ and μ. Here f̄ is the mean image and T and L are as defined in quantity (3). An algorithm to minimise this quantity is described in Section 5.3. The optimum average image is referred to as the perturbed Procrustes average when it is evaluated by an algorithmic method.

Discriminant analysis. Consider the case where we wish to allocate an individual image to one of two populations, say Π_A and Π_B. Suppose we are given samples of training data from the distinct populations: f_1, ..., f_m from Π_A and g_1, ..., g_n from Π_B. Firstly, we obtain the optimum image averages, f̄ and ḡ, via Section 5.3. Then we carry out discriminant analysis on the variables P(h, f̄) and P(h, ḡ), where P is given in (3) and h is an individual image. In order to use (3) we need to select an appropriate weighting parameter λ. We select the λ which attains the maximum separation between the training samples according to P(f_i, f̄), P(g_j, f̄), P(f_i, ḡ) and P(g_j, ḡ), i = 1, ..., m; j = 1, ..., n, that is, the distance between each image and each average image. We obtain the optimal choice of λ by selecting that which optimises the studentised difference between the within-group and between-group means of P. Here the within group consists of P(f_i, f̄) and P(g_j, ḡ) and the between group consists of P(f_i, ḡ) and P(g_j, f̄), i = 1, ..., m; j = 1, ..., n. Mardia et al.13 have given an explicit expression for the optimum λ.
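The allocation step based on the penalized distance (3) can be sketched as follows. This is a deliberately simplified stand-in: images are reduced to raw intensity vectors, T and L to squared Euclidean distances, and all data are synthetic; none of the names below come from the paper.

```python
# Sketch of allocation via the penalized image distance (3): a new image h
# is assigned to the population whose average image is closer. Images are
# simplified to intensity vectors; data and function names are illustrative.
import numpy as np

def P(f, g, Xf, Xg, lam):
    T = ((f - g) ** 2).sum()      # similarity (texture) part
    L = ((Xf - Xg) ** 2).sum()    # landmark distortion part
    return T + lam * L

def allocate(h, Xh, f_bar, Xf, g_bar, Xg, lam):
    """Allocate image h to population A or B by the smaller penalized distance."""
    return "A" if P(h, f_bar, Xh, Xf, lam) < P(h, g_bar, Xh, Xg, lam) else "B"

rng = np.random.default_rng(2)
f_bar, g_bar = np.zeros(16), np.ones(16)           # average images of Pi_A, Pi_B
Xf, Xg = np.zeros((4, 2)), np.full((4, 2), 0.5)    # average landmark shapes
h = f_bar + rng.normal(scale=0.1, size=16)         # new image resembling Pi_A
Xh = Xf + rng.normal(scale=0.05, size=(4, 2))
label = allocate(h, Xh, f_bar, Xf, g_bar, Xg, lam=1.0)
```

In practice λ would be chosen by the studentised-separation criterion described above rather than fixed by hand.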
Assuming that training data has been used to obtain the population mean images and the optimum λ has been derived, we can define a discriminant rule, based on image distances, to allocate a new image h in the classical way, e.g. Fisher's rule.

5.2.2. Specific image distances in the two-dimensional setting

Full Procrustes with texture information. We define a measure of distance between two images f_i(t) and f_j(t), via warping one to the other, to be

P_1(f_i, f_j) = Σ_t (f_i(Ψ_{X_j→X_i}(t)) − f_j(t))² + λ d_F(X_i, X_j)².   (5)

The measure of distance P_1 given in equation (5) is comprised of a similarity part and a distortion-of-shape part through the Procrustes distance. The parameter λ determines the relative weighting between similarity and distortion. Also, it is worth commenting that we could have alternatively used d_F in (5) for balance in the measure, but it is still valid to use d_F². Note that this measure does not detect size differences. However, a size measure could be added to the distance measure P_1; a suitable size term is the ratio of centroid sizes of the landmark configurations.

Bending energy with texture information. Here, an alternative approach to discrimination is given, where the bending energy matrix is used instead of the Procrustes distance. Define the second measure of distance between two images f_i(t) and f_j(t), via warping one to the other, to be
5.3. Perturbed Procrustes averages
The optimum average image can be arrived at by perturbation of the Procrustes mean shape via the following optimisation procedure:

Algorithm 1
1. Set λ = λ_j.
2. Obtain μ, the Procrustes mean shape of X_1, ..., X_n.
3. Obtain f̄ = (1/n) Σ_{i=1}^{n} Ψ_{μ→X_i}(f_i); evaluate F.
4. Perturb μ to give μ_pert.
5. Obtain f̄ = (1/n) Σ_{i=1}^{n} Ψ_{μ_pert→X_i}(f_i); evaluate F_new for μ_pert. If F_new < F, accept μ_pert, i.e. set μ = μ_pert.
6. Repeat steps 3, 4 and 5 until F cannot be reduced further.
7. Set λ = λ_{j+1}; repeat steps 2 to 6 until F cannot be reduced further.

Here Ψ_{μ→X_i}(f_i) denotes the ith image warped to the mean shape μ. The final f̄ is the average image. Note that in Algorithm 1, μ is only changed if the objective function improves. This algorithm has an MCMC-like flavour; however, unlike MCMC methods, in this case only improvements due to perturbation are accepted. In this algorithm various choices of distance could be used. However, the objective function F must penalise affine transformations in the perturbation, otherwise severe shearing occurs. This will not occur where the Procrustes distance is used. We propose Gaussian updates from a bivariate normal distribution on each landmark as the perturbation method. Mardia et al.13 have used a systematic grid search for the choice of λ.
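The accept-only-improvements loop of Algorithm 1 can be sketched with a stand-in objective. The quadratic surrogate below replaces the full image criterion (4) purely for illustration; everything here is an assumption, not the paper's implementation.

```python
# Sketch of the perturbation loop of Algorithm 1: Gaussian updates on the
# mean shape, accepted only when the objective F decreases (unlike MCMC,
# no worsening moves are accepted). F is a toy quadratic surrogate.
import numpy as np

def perturbed_optimum(F, mu0, scale=0.1, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, best = mu0.copy(), F(mu0)
    for _ in range(n_steps):
        # bivariate Gaussian update on each landmark
        mu_pert = mu + rng.normal(scale=scale, size=mu.shape)
        f_new = F(mu_pert)
        if f_new < best:               # accept only improvements
            mu, best = mu_pert, f_new
    return mu, best

target = np.array([[0., 0.], [1., 0.], [0.5, 1.]])
F = lambda mu: ((mu - target) ** 2).sum()   # surrogate objective, optimum known
mu0 = target + 1.0                           # start away from the optimum
mu_hat, F_hat = perturbed_optimum(F, mu0)
```

With the real criterion (4), each evaluation of F would involve re-warping and re-averaging the images, which is why the perturbation scheme is kept simple.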
P_2(f_i, f_j) = Σ_t (f_i(Ψ_{X_j→X_i}(t)) − f_j(t))² + λ vec(X_i)^T Block(B(X_i), B(X_j)) vec(X_i),   (6)

where B(X_j) is the bending energy matrix for the deformation with X_j as the source. The measure of distance P_2 given in equation (6) is comprised of a similarity part and a distortion part. Again the parameter λ determines the relative weighting between similarity and distortion. Note that P_2 does not account for any affine differences between images, the bending energy being affine invariant. Therefore, it is only appropriate where very little of the warp from one configuration to another can be accounted for by an affine transformation. As with P_1 given in quantity (5), P_2 in (6) does not detect size differences.
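The bending energy matrix appearing in (6) can be sketched for a 2-D thin-plate spline; its affine invariance, noted above, shows up as the matrix annihilating any affine function of the source landmarks. The landmarks below are synthetic illustrations.

```python
# Sketch of the 2-D thin-plate spline bending energy matrix B(X): the
# upper-left k x k block of the inverse of the standard TPS system matrix.
# B annihilates affine transforms of X, which is the affine invariance
# mentioned in the text. Landmarks are synthetic.
import numpy as np

def bending_energy_matrix(X):
    k = len(X)
    r2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        K = np.nan_to_num(r2 * np.log(r2)) / 2   # U(r) = r^2 log r, U(0) = 0
    P = np.hstack([np.ones((k, 1)), X])
    A = np.zeros((k + 3, k + 3))
    A[:k, :k], A[:k, k:], A[k:, :k] = K, P, P.T
    return np.linalg.inv(A)[:k, :k]              # upper-left block of A^{-1}

X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.4, 0.6]])
B = bending_energy_matrix(X)
# any affine transform of the source configuration lies in the null space of B
affine = X @ np.array([[2.0, 0.3], [-0.1, 1.5]]) + np.array([1.0, -2.0])
```

The identity B·P = 0 (with P = [1, X]) follows directly from the block structure of the inverse system matrix, which is why a purely affine warp contributes zero distortion in (6).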
5.4. Example: Fish

Here we consider photographic images, obtained under controlled conditions, of two species of fish: haddock and whiting. These are part of a data set of ten haddock and ten whiting images. Eight of each species are randomly selected as training data, leaving two from each species as test data. In total, sixteen corresponding landmarks are defined on each of the fish in this example (see Figures 3, 4).

Effect of perturbation on the average. The extent of the improvement in the perturbed average has been assessed using a criterion based on the distance P_1 from equation (5), and the perturbed average was found to give a smaller P_1.
Effect of perturbation on discrimination. Here we wish to assess whether the perturbed averages improve the discrimination procedure. In order to do this, Mardia et al.13 consider the studentised differences between P_1 within species and P_1 between species for λ = 48.1 million. This value attains the optimal separation for the Procrustes average images. Taking λ = 48.1 million is not the optimal value for discrimination using the perturbed average; however, parity enables comparison between the studentised differences. We find that the perturbed average gives a greater separation between species. Another example of reconstruction through a stochastic deformation template, related to degraded fish images, is given in de Souza et al.,32 which may be applicable to other fish images rather than photographs as here.
Fig. 3. Haddock 1.
Fig. 4. Whiting 1.
The details of the allocation for the training and test data are given in Mardia et al. 13 All the fish were correctly allocated.
6. Discussion

We have given here a personal view of why shape analysis is pivotal to pattern recognition in a very broad sense. The field is still young and we hope to see many new developments coming from interdisciplinary research. Mardia and Gilks33 have identified three themes for statistics in the 21st century. First, statistics should be viewed in the broadest way for scientific explanation or prediction of any phenomenon. Second, the future of statistics lies in a holistic approach to interdisciplinary research. Third, a change of attitude is required by statisticians - a paradigm shift - for the subject to go forward.

References
1. Mardia, K.V. and Kanji, G. (eds.) (1993). Statistics and Images: Vol. I. Carfax Publishing Co. Ltd., Abingdon, Oxfordshire.
2. Mardia, K.V. (ed.) (1994). Statistics and Images: Vol. II. Carfax Publishing Co. Ltd., Abingdon, Oxfordshire.
3. Dryden, I.L., Mardia, K.V. and Walder, A.N. (1997). Review of the use of context in statistical image analysis. Journal of Applied Statistics, 24, pp. 513-538.
4. Mardia, K.V. (1997). Bayesian image analysis. J. Theoretical Medicine, 1, pp. 63-77.
5. Glasbey, C.A. and Mardia, K.V. (1998). A review of image warping methods. J. Appl. Statist., 25, pp. 155-171.
6. Mardia, K.V. (2001). Shapes in images. In Pattern Recognition: From Classical to Modern Approaches, edited by Pal, S.K. and Pal, A., World Scientific, pp. 147-167.
7. Krim, H. and Yezzi, Jr., A. (eds.) (2006). Statistics and Analysis of Shapes. Birkhäuser, Boston.
8. Dryden, I.L. and Mardia, K.V. (1998). Statistical Shape Analysis. J. Wiley, Chichester.
9. Cross, A.D.J. and Hancock, E.R. (1998). Graph matching with a dual-step EM algorithm. IEEE Trans. Pattern Anal. Mach. Intell., 20, pp. 1236-1253.
10. Glasbey, C.A. and Mardia, K.V. (2001). A penalized likelihood approach to image warping (with discussion). Journal of the Royal Statistical Society, Series B, 63, pp. 465-514.
11. Richmond, N.J., Willett, P. and Clark, R.D. (2004).
Alignment of three-dimensional molecules using an image recognition algorithm. Journal of Molecular Graphics and Modelling, 23, pp. 199-209.
12. Green, P.J. and Mardia, K.V. (2006). Bayesian alignment using hierarchical models, with applications in protein bioinformatics. Biometrika, 93, pp. 235-254.
13. Mardia, K.V., McDonnell, P. and Linney, A.D. (2006). Penalised image averaging and discrimination with facial and fishery applications. Journal of Applied Statistics, 33, pp. 339-369.
14. Mardia, K.V., Angulo, J.M. and Goitia, A. (2006). Synthesis of image deformation strategies. Image and Vision Computing, 24, pp. 1-12.
15. Mardia, K.V. and Jupp, P.E. (2000). Directional Statistics, 2nd edition. J. Wiley, Chichester.
16. Goodall, C.R. and Mardia, K.V. (1993). Multivariate aspects of shape theory. Ann. Statist., 21, pp. 848-866.
17. Mardia, K.V. and Patrangenaru, V. (2005). Directions and projective shapes. Annals of Statistics, 33, pp. 1666-1699.
18. Kent, J.T. and Mardia, K.V. (2006). A new representation for projective shape. LASR 2006 Proceedings, edited by Barber, S., Baxter, P.D., Mardia, K.V. and Walls, R.E., Leeds University Press, pp. 75-78.
19. Chui, H. and Rangarajan, A. (2003). A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding, 89, pp. 114-141.
20. Duta, N., Jain, A.K. and Mardia, K.V. (2002). Matching of palm prints. Pattern Recognition Letters, 23, pp. 477-485.
21. Luo, B. and Hancock, E.R. (2001). Structural matching using the EM algorithm and singular value decomposition. IEEE Trans. PAMI, 23, pp. 1120-1136.
22. Walker, G. (2000). Robust, non-parametric and automatic methods for matching spatial point patterns. PhD thesis, University of Leeds.
23. Kent, J.T., Mardia, K.V. and Taylor, C.C. (2004). Matching problems for unlabelled configurations. In Bioinformatics, Images and Wavelets, Proceedings of LASR 2004, edited by Aykroyd, R.G., Barber, S. and Mardia, K.V., Leeds University Press, pp. 33-36.
24. Dryden, I.L., Hirst, J.D. and Melville, J.L. (2006).
Statistical analysis of unlabelled point sets: comparing molecules in chemoinformatics. Biometrics, to appear.
25. Horgan, G.W., Creasey, A. and Fenton, B. (1992). Superimposing two-dimensional gels to study genetic variation in malaria parasites. Electrophoresis, 13, pp. 871-875.
26. Toga, A.W. (1999). Brain Warping. Academic Press, San Diego.
27. Hajnal, J.V., Hill, D.L.G. and Hawkes, D.J. (eds.) (2001). Medical Image Registration. CRC Press, London.
28. Singh, A.J., Goldgof, D. and Terzopoulos, D. (eds.) (1998). Deformable Models in Medical Image Analysis. IEEE Computer Soc., Los Alamitos, California.
29. Bookstein, F.L. (1991). Morphometric Tools for Landmark Data: Geometry and Biology. Cambridge University Press, Cambridge.
30. Mardia, K.V., Bookstein, F.L., Kent, J.T. and Meyer, C.R. (2006b). Intrinsic random fields and image deformations. Journal of Mathematical Imaging and Vision, in press.
31. de Souza, K.M.A., Kent, J.T., Mardia, K.V. and Glasbey, C. (2007). A compositional approach to multiscale deformation. In preparation.
32. de Souza, K.M.A., Kent, J.T. and Mardia, K.V. (1999). Stochastic templates for aquaculture images and a parallel pattern detector. J. Roy. Statist. Soc. C, 48, pp. 211-227.
33. Mardia, K.V. and Gilks, W. (2005). Meeting the statistical needs of 21st-century science. Significance, 2, pp. 162-165.
PART B
Invited Lectures
On the Interest of Spatial Relations and Fuzzy Representations for Ontology-Based Image Interpretation
Isabelle Bloch, Céline Hudelot¹ and Jamal Atif²
Ecole Nationale Supérieure des Télécommunications - GET-Télécom Paris, CNRS UMR 5141 LTCI - Signal and Image Processing Department, 46 rue Barrault, 75013 Paris, France
E-mail: [email protected]
¹ Current address: Ecole Centrale Paris, Grande Voie des Vignes, 92 295 Châtenay-Malabry Cedex, France. E-mail: [email protected]
² Current address: Université des Antilles et de la Guyane, Guyane, France. E-mail: [email protected]
In this paper we highlight a few features of the semantic gap problem in image interpretation. We show that semantic image interpretation can be seen as a symbol grounding problem. In this context, ontologies provide a powerful framework to represent domain knowledge, concepts and their relations, and to reason about them. They are likely to be more and more developed for image interpretation. Many image interpretation systems rely strongly on descriptions of objects through characteristics such as shape, location and image intensities. However, spatial relations are very important too, and provide a structural description of the imaged phenomenon, which is often more stable and less prone to variability than pure object descriptions. We show that spatial relations can be integrated in domain ontologies. Because of the intrinsic vagueness we have to cope with at different levels (image objects, spatial relations, variability, questions to be answered, etc.), fuzzy representations are well adapted and provide a consistent formal framework to address this key issue, as well as the associated reasoning and decision making aspects. Our view is that ontology-based methods can be very useful for image interpretation if they are associated to operational models relating the ontology concepts to image information. In particular, we propose operational models of spatial relations, based on fuzzy representations.

Keywords: Image interpretation, semantic gap, ontology, spatial relations, fuzzy sets, fuzzy relations, brain imaging.
1. Introduction

The literature acknowledges several attempts towards formalization of some domains. For instance in medicine, noticeable efforts have led to the development of the NeuroNames Brain Hierarchy (http://braininfo.rprc.washington.edu/) and the Foundational Model of Anatomy (FMA) (http://sig.biostr.washington.edu/projects/fm/AboutFM.html) at the University of Washington, or NeurAnat (http://www.chups.jussieu.fr/ext/neuranat) in Paris at CHU La Pitié-Salpêtrière. Generic formalizations of spatial concepts were also developed and specified in different fields, for spatial reasoning in artificial intelligence, for Geographic Information Systems, etc. In a parallel domain, well-formalized theories for image processing and recognition appeared in the image and computer vision community. Noticeably, both types of developments still remain quite disjoint, and very few approaches try to use the abstract formalizations to guide image interpretation. The main reason is to be found in the so-called "semantic gap", expressing the difficulty of linking abstract concepts with image features. This problem is also related to the symbol grounding problem. In this paper we highlight a few features of the semantic gap problem in image interpretation. We show that semantic image interpretation can be seen as a symbol grounding problem in Section 2. In this context, ontologies provide a powerful framework to represent domain knowledge, concepts and their relations, and to reason about them. Therefore, they are likely to be more and more developed for image interpretation. We briefly explain the potential of ontologies towards this aim in Section 3. Many image interpretation systems rely strongly on descriptions of objects through characteristics such as shape, location and image intensities. However, spatial relations are very important too, as explained in Section 4, and provide a structural description of the imaged phenomenon, which is often more stable and less prone to variability than pure object descriptions. We show
that spatial relations can be integrated in domain ontologies. Because of the intrinsic vagueness we have to cope with at different levels (image objects, spatial relations, variability, questions to be answered, etc.), fuzzy representations are well adapted and provide a consistent formal framework to address this key issue, as well as the associated reasoning and decision making aspects. This question is addressed in Section 5. Our view is that ontology-based methods can be very useful for image interpretation if they are associated to operational models of spatial relations (and other concepts), in particular based on fuzzy representations. These operational models contribute to reducing the semantic gap. We provide some hints on this integration in Section 6. As a typical application where all these issues are raised, we illustrate our purpose with examples in brain image interpretation.

2. Semantic gap in image interpretation and symbol grounding

The symbol grounding problem was first introduced in artificial intelligence by Harnad,1 as an answer to the famous criticisms of artificial systems by Searle.2 It is defined1 through the fundamental question: how is symbol meaning to be grounded in something other than just more meaningless symbols? As underlined in the literature, symbol grounding is still an unsolved problem (see e.g.3). In the robotics community, this problem was addressed as the anchoring problem:4 a special form of symbol grounding needed in robotic systems that incorporate a symbolic component and a reasoning process. The anchoring process is defined as the problem of creating and maintaining the correspondence between symbols and sensor data that refer to the same physical object. In our case, artificial systems are not robotic systems but image interpretation systems. Like the former, they incorporate a symbolic component.
Some similarities between anchoring and pattern recognition have been underlined,5 in order to assess the potential of using ideas and techniques from anchoring to solve the pattern recognition problem and vice versa. Similarly, we argue that image interpretation could greatly benefit from such a correspondence. Indeed, the image interpretation problem can be defined as the automatic extraction of the meaning of an image. The image semantics cannot be
considered as being included explicitly in the image itself. It rather depends on prior knowledge of the domain and the context of the image. It is therefore necessary to ground the digital representation of an image (perceptual level) with the semantic interpretation that a user associates to it (linguistic level). In the image indexing and retrieval community, this problem is called the semantic gap problem, i.e. the lack of coincidence between the information that one can extract from the visual data and the interpretation of these data by a user in a given situation.6 Our view is that image interpretation can be seen as a symbol grounding problem, i.e. the dynamical process of associating image data to human interpretations by taking into account the influence of external factors such as the social environment (application domain, interpretation goal, ...) or the physical environment of the interpretation. Indeed, image interpretation is the process of finding semantic and symbolic interpretations of image content. This problem has the same nature as the physical grounding of linguistic symbols in visual information in the case of natural language processing systems.7,8 In our case, linguistic symbols are application domain concepts defined by their linguistic names and their definition. Example: in cerebral image interpretation, concepts can be: brain: part of the central nervous system located in the head; caudate nucleus: a deep gray nucleus of the telencephalon involved with control of voluntary movement; glioma: tumor of the central nervous system that arises from glial cells; ... Rather than being constrained by a grammar and a syntax as in a formal or natural language, the concepts are organized in a semantic knowledge base which describes their semantics and their hierarchical and structural dependencies. Example: the human brain is a structured scene and spatial relations are widely used in anatomical brain descriptions (e.g.
the left thalamus is to the left of the third ventricle and below the lateral ventricle). This structural component, in the form of spatial relations, plays a major role in image interpretation. This aspect is detailed in Section 4. Ontologies are useful to represent the semantic knowledge base. They entail some sort of the world view, i.e. a set of concepts, their definitions and their relational structure which can be used to describe and reason about a domain. This aspect is detailed
in Section 3. As underlined by Cangelosi,9 a symbol grounding mechanism, like language itself, has both an individual and a social component. The individual component, called physical symbol grounding, refers to the ability for a system to create an intrinsic link between perceptions and symbols. The social symbol grounding refers to the ability to communicate with other systems by the creation of a shared lexicon of perceptually-grounded symbols. It is strongly related to the research on human language origins and evolution, where external factors such as cultural and biological evolution are primordial.
Fig. 1. Physical and external symbol grounding for image interpretation.
In the case of image interpretation systems, these two components of symbol grounding are also essential and take the following form: on the one hand, the physical symbol grounding consists of the internal creation of the link between visual percepts (image level) and a known semantic model of the part of the real world which concerns the application domain (domain semantic level). On the other hand, in order to enable communication and interoperability with humans or other systems, this grounded interpretation must capture consensual information accepted by a group. As a consequence, a social external symbol grounding component arises for image interpretation. Moreover, image interpretation systems operate in a dynamic environment which is prone to changes and variations. The interpretation process is highly influenced by external factors such as the environmental context, the perception system
or the interpretation goal, and it has to adapt itself to these external factors. As a consequence, image interpretation is a distributed and adaptive process between physical symbol grounding and external symbol grounding, as shown in Figure 1.

3. Ontologies for image interpretation

In knowledge engineering, an ontology is defined as a formal, explicit specification of a shared conceptualization.10 An ontology encodes a partial view of the world, with respect to a given domain. It is composed of a set of concepts, their definitions and their relations, which can be used to describe and reason about a domain. Ontological modeling of knowledge and information is crucial in many real world applications, such as medicine for instance.11 Let us mention a few existing approaches involving jointly ontologies and images. By using ontologies, the physical symbol grounding consists in ontology grounding,12 i.e. the process of associating abstract concepts to concrete data in images. This approach is widely used in the image retrieval community to narrow the semantic gap. In,13 the author proposes to ground, in the image domain, a query vocabulary language used for content-based image retrieval using supervised machine learning techniques. A supervised photograph annotation system is described in,14 using an annotation ontology describing the structure of an annotation, irrespective of the application domain, and a second ontology, specific to the domain, which describes image contents. Another example concerns medical image annotation, in particular for breast cancer,15 and deals mainly with reasoning issues. But image information is not directly involved in these two systems. Other approaches propose to ground intermediate visual ontologies with low-level image descriptors,16-18 and are therefore closer to the image interpretation problem. In,19 the enrichment of the WordNet lexicon by mapping its concepts with visual-motor information is proposed.
As the main ontology language OWL is based on description logics, a usual way to implement the grounding between domain ontologies (or visual ontologies) and image features is the use of concrete domains, as shown in Figure 2. Description logics20 are a family of knowledge-based representation systems mainly characterized
by a set of constructors that enable building complex concepts and roles from atomic ones. A semantics is associated with concepts, roles and individuals using an interpretation I = (Δ^I, ·^I), where Δ^I is a non-empty set and ·^I is an interpretation function that maps a concept C to a subset C^I of Δ^I and a role R to a subset R^I of Δ^I × Δ^I. Concepts correspond to classes: a concept C represents a set of individuals (a subset of the interpretation domain). Roles are binary relations between objects. Table 1 describes the main constructors and a syntax for description logics.

Table 1. Description logics syntax and interpretation.

Constructor             | Syntax            | Example                              | Semantics
atomic concept          | A                 | Human                                | A^I ⊆ Δ^I
individual              | a                 | Lea                                  | a^I ∈ Δ^I
Top                     | ⊤                 | Thing                                | Δ^I
Bottom                  | ⊥                 | Nothing                              | ∅
atomic role             | R                 | has-age                              | R^I ⊆ Δ^I × Δ^I
conjunction             | C ⊓ D             | Human ⊓ Male                         | C^I ∩ D^I
disjunction             | C ⊔ D             | Male ⊔ Female                        | C^I ∪ D^I
negation                | ¬C                | ¬Human                               | Δ^I \ C^I
existential restriction | ∃R.C              | ∃has-child.Girl                      | {x ∈ Δ^I | ∃y ∈ Δ^I : (x,y) ∈ R^I ∧ y ∈ C^I}
universal restriction   | ∀R.C              | ∀has-child.Human                     | {x ∈ Δ^I | ∀y ∈ Δ^I : (x,y) ∈ R^I ⇒ y ∈ C^I}
value restriction       | ∃R.{a}            | ∃has-child.{Lea}                     | {x ∈ Δ^I | ∃y ∈ Δ^I : (x,y) ∈ R^I ∧ y = a^I}
number restriction      | (≥ n R), (≤ n R)  | (≥ 3 has-child), (≤ 1 has-mother)    | {x ∈ Δ^I | |{y | (x,y) ∈ R^I}| ≥ n} (resp. ≤ n)
subsumption             | C ⊑ D             | Man ⊑ Human                          | C^I ⊆ D^I
concept definition      | C ≡ D             | Father ≡ Man ⊓ ∃has-child.Human      | C^I = D^I
concept assertion       | a : C             | John : Man                           | a^I ∈ C^I
role assertion          | (a,b) : R         | (John, Helen) : has-child            | (a^I, b^I) ∈ R^I

Fig. 2. Importance of concrete domains in image interpretation: an abstract concept (e.g. Rose) is linked, through a role such as hasColor, to values in concrete domains computed from the image (e.g. Pink as RGB values, or shape descriptors).

Concrete domains are an expressive means of description logics to describe concrete properties of real-world objects, such as their size, their spatial extension or their color. They are of particular interest for image interpretation, as illustrated in Figure 2. Indeed, they allow performing anchoring for a particular application, hence reducing the semantic gap. This grounding approach using description logics and concrete domains has been used by several authors21,22 for the automation of semantic multimedia annotation.

4. Importance of spatial relations

Spatial relations between objects of a scene or an image are of prime importance, as highlighted in different domains such as perception, cognition, spatial reasoning, Geographic Information Systems and computer vision. In particular, the spatial arrangement of objects provides important information for recognition and interpretation tasks, especially when the objects are embedded in a complex environment, as in medical or remote sensing images.23,24 Human beings make extensive use of spatial relations in order to describe, detect and recognize objects: they allow solving ambiguities between objects having a similar appearance, and they are often more stable than characteristics of the objects themselves (this is typically the case for anatomical structures). Many authors have stressed the importance of topological relations, but distances and directional relative positions are also important, as well as more complex relations such as "between", "surround", "among", etc. Freeman25 distinguishes the following primitive relations: left of, right of, above, below, behind, in front of, near, far, inside, outside, surround. Kuipers24,26 considers topological relations (set relations, but also adjacency, which was not considered by Freeman) and metric relations (distances and directional relative position).

On the Interest of Spatial Relations and Fuzzy Representations for Ontology-Based Image Interpretation
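When objects are reduced to centroids, several of these primitive relations admit simple crisp definitions. The following sketch is our own illustration, with invented coordinates, using the image convention that y grows downwards:

```python
import math

# Crisp versions of a few primitive spatial relations between 2D centroids.
# Image convention (assumption): x grows to the right, y grows downwards.

def left_of(a, b):
    return a[0] < b[0]

def above(a, b):
    return a[1] < b[1]           # smaller row index = higher in the image

def near(a, b, threshold=10.0):  # crisp 'near' needs an arbitrary cut-off
    return math.dist(a, b) <= threshold

ventricle, nucleus = (40, 50), (52, 50)   # invented centroids
print(left_of(ventricle, nucleus))         # True
print(near(ventricle, nucleus))            # False: distance 12 > 10
```

The arbitrary threshold in `near` is exactly the kind of crisp cut-off that motivates the fuzzy representations discussed in Section 5.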
Isabelle Bloch, Céline Hudelot and Jamal Atif
Spatial reasoning can be defined as the domain of spatial knowledge representation, in particular of spatial relations between spatial entities, and of reasoning about these entities and relations (hence the importance of relations). This field has been largely developed in artificial intelligence, in particular using qualitative representations based on logical formalisms. In image interpretation and computer vision, it is much less developed and is mainly based on quantitative representations. In most domains, one has to be able to cope with qualitative knowledge, with imprecise and vague statements, with polysemy, etc. This calls for a common framework that is both general enough to cover large classes of problems and potential applications, and able to give rise to instantiations adapted to each particular application. Ontologies appear as an appropriate tool towards this aim. This shows the interest of associating ontologies and spatial relations for symbol grounding and image interpretation. Figure 3 illustrates a part of an ontology of spatial relations.27
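As a minimal sketch (our own encoding; the concept names follow Figure 3 as we read it), such a hierarchy can be stored as parent-to-children pairs and queried for subsumption:

```python
# Excerpt of the spatial relation hierarchy, encoded as parent -> children.
HIERARCHY = {
    "SpatialRelation": ["TopologicalRelation", "MetricRelation"],
    "TopologicalRelation": ["Adjacent"],
    "MetricRelation": ["DistanceRelation", "DirectionalRelation"],
    "DistanceRelation": ["CloseTo", "FarFrom"],
    "DirectionalRelation": ["BinaryDirectionalRelation", "TernaryDirectionalRelation"],
    "BinaryDirectionalRelation": ["RightOf", "LeftOf", "InFrontOf"],
}

def is_a(concept, ancestor):
    """True if 'concept' is a (transitive) descendant of 'ancestor'."""
    children = HIERARCHY.get(ancestor, [])
    return concept in children or any(is_a(concept, c) for c in children)

print(is_a("CloseTo", "MetricRelation"))   # True
print(is_a("Adjacent", "MetricRelation"))  # False
```

A real ontology would of course be expressed in OWL with a reasoner providing subsumption; this toy version only shows that the hierarchy itself is ordinary structured data.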
Fig. 3. Excerpt of the hierarchical organization of spatial relations in the ontology of Ref. 27: Spatial Relation subdivides into Topological Relations (e.g. Adjacent) and Metric Relations, the latter comprising Distance Relations (e.g. Close to, Far from) and Directional Relations (binary ones such as Right of, Left of, In front of, and ternary ones).

As mentioned in Ref. 28, several ontological frameworks for describing space and spatial relations have been developed recently. In spatial cognition and linguistics, the OntoSpace project (http://www.ontospace.uni-bremen.de/twiki/bin/view/Main/WebHome) aims at developing a cognitively based commonsense ontology for space. Some interesting works on spatial ontologies can also be found in Geographic Information Science29 or in medicine, concerning the formalization of anatomical knowledge.30-32 All these ontologies concentrate on the representation of spatial concepts according to the application domains. They do not provide an explicit and operational mathematical formalism for all the types of spatial concepts and spatial relations. For instance, in medicine, these ontologies are often restricted to concepts from mereology theory.31 They are therefore useful for qualitative and symbolic reasoning on topological relations, but there is still a gap to fill before using them for image interpretation. Example: internal brain structures are often described through their spatial relations, such as: the left caudate nucleus is inside the left hemisphere; it is close to the lateral ventricle; it is outside (left of) the left lateral ventricle; it is above the thalamus, etc. In case of pathologies, these relations are quite stable, but more flexibility should be allowed in their semantics.33 This example raises the problem of assigning semantics to these spatial relations according to the application domain: what do concepts such as "close to" or "left" mean when dealing with brain images? Should this meaning be adapted depending on the context (possible pathology, etc.)? These questions can be addressed by using fuzzy models.

5. Importance of fuzzy representations
Usually, vision and image processing make use of quantitative representations of spatial relations. In a purely quantitative framework, spatial relations are well defined for some classes of relations, but unfortunately not for intrinsically vague relations (such as directional ones). Moreover, they need precise knowledge of the objects and of the types of questions we want to answer. These two constraints can be relaxed in a semi-qualitative framework, using fuzzy sets. This allows dealing with imprecisely defined objects and with imprecise questions such as "are these two objects near each other?", and provides evaluations that may be imprecise too, which is useful for several applications where spatial reasoning under imprecision has to be considered. Note that this type of question also raises the question of polysemy, hence the need for semantics adapted to the domain. This is an important question to be solved in the symbol grounding and semantic gap problems.

Fuzzy set theory finds in spatial information processing a growing application domain. This may be explained not only by its ability to model the inherent imprecision of such information (as in image processing, vision, mobile robotics, etc.) together with expert knowledge, but also by the large and powerful toolbox it offers for dealing with spatial information under imprecision. This is particularly highlighted when spatial structures or objects are directly represented by fuzzy sets. If even less information is available, we may have to reason about space in a purely qualitative way, and the symbolic setting is then more appropriate. In artificial intelligence, mainly symbolic representations have been developed, and several works have addressed the question of qualitative spatial reasoning (see Ref. 34 for a survey). For instance, in the context of mereotopology, powerful representation and reasoning tools have been developed, but they are mainly concerned with topological and part-whole relations, not with metric ones. Limitations of purely qualitative spatial reasoning have already been stressed in Ref. 35, as well as the interest of adding a semi-quantitative extension to qualitative values (as done in fuzzy set theory for linguistic variables36,37) for deriving useful and practical conclusions (e.g. for recognition). Purely quantitative representations are limited in the case of imprecise statements and of knowledge expressed in linguistic terms. As another advantage of fuzzy representations, both quantitative and qualitative knowledge can be integrated, using a semi-quantitative (or semi-qualitative) interpretation of fuzzy sets. These representations can also cope with different levels of granularity of the information, from a purely symbolic level to a very precise quantitative one. As already mentioned in Ref. 25, this allows us to provide a computational representation and interpretation of imprecise spatial constraints, expressed in a linguistic way, possibly including quantitative knowledge.
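For instance, the linguistic value "close to" can be given semi-quantitative semantics through a decreasing membership function of the distance between two objects. A minimal sketch follows; the breakpoints d1 and d2 are invented here and would in practice be domain- and context-dependent:

```python
# Fuzzy semantics for the linguistic value 'close to' as a function of the
# distance d between two objects (breakpoints d1, d2 are assumptions).
def mu_close(d, d1=10.0, d2=40.0):
    """Degree in [0,1]: fully 'close' below d1, not at all beyond d2,
    linearly decreasing in between."""
    if d <= d1:
        return 1.0
    if d >= d2:
        return 0.0
    return (d2 - d) / (d2 - d1)

# The imprecise question 'are these two objects near each other?' now
# receives a degree rather than a yes/no answer:
print(mu_close(5))    # 1.0
print(mu_close(25))   # 0.5
print(mu_close(60))   # 0.0
```

Adapting the semantics to a context (e.g. a pathology enlarging plausible distances) amounts to moving the breakpoints, not to changing the symbolic concept.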
Therefore, the fuzzy set framework appears as a central one in this context. Several spatial relations have led to fuzzy modeling, as reviewed in Ref. 23. Spatial reasoning aspects often imply the combination of various types of information, in particular different spatial relations. Again, the fuzzy set framework is appropriate, since it offers a large variety of fusion operators38,39 allowing for the combination of heterogeneous information (such as spatial relations with different semantics) according to different fusion rules, and without any assumption on an underlying
metric on the information space. They also apply to various types of spatial knowledge representations (degree of satisfaction of a spatial relation, fuzzy representation of a spatial relation as a fuzzy interval, as a spatial fuzzy set, etc.). These operators can be classified according to their behavior, the possible control of this behavior according to the information to combine, their properties, and their specificities in terms of decision.40 For instance, if an object has to satisfy, at the same time, several spatial constraints expressed as relations to other objects, the degrees of satisfaction of these constraints will be combined in a conjunctive manner, using a t-norm. If the constraints provide disjunctive information, operators such as t-conorms are appropriate; this is the case, for example, for symmetrical anatomical structures that can be found in the left or right parts of the human body. Operators with variable behavior, such as some symmetrical sums, are interesting if the aim is a reinforcement of the dynamics between low and high degrees of satisfaction of the constraints. In particular, this facilitates the decision, since different situations will be better discriminated.

Let us come back to ontologies from the point of view of uncertain knowledge and imprecise information. A major weakness of usual ontological technologies is their inability to represent and to reason with uncertainty and imprecision. As a consequence, extending ontologies in order to cope with these aspects is a major challenge. This problem has recently been stressed in the literature, and several approaches have been proposed to deal with uncertainty and imprecision in ontology engineering tasks.41,42 The first approach is based on probabilistic extensions of the standard OWL ontology language (http://www.w3.org/TR/owl-features/) using Bayesian networks.43,44 The probabilistic approach proposes to first enhance the OWL language to allow additional probabilistic markups, and then to convert the probabilistic OWL ontology into the directed acyclic graph of a Bayesian network using translation rules. As the main ontology language OWL is based on description logics,20 another approach to deal with uncertainty and imprecision is to use fuzzy description logics.45-48 Fuzzy description logics can be classified according to the way fuzziness is introduced into the description logics formalism; a good review can be found in Ref. 49. In particular, a common way for description logics with concrete domains is to introduce fuzziness by using fuzzy predicates in concrete domains, as described in Ref. 50. Another approach is to introduce fuzziness directly in the concrete domains, which then become fuzzy concrete domains. This is particularly interesting for image interpretation. Example: using fuzzy representations of spatial relations in the image domain leads to a restricted search area for the caudate nucleus, based on the knowledge that it is to the right of and close to the lateral ventricles. This is illustrated in Figure 4.
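A minimal sketch of this fusion (the degrees below are invented for illustration): degrees of satisfaction of the two spatial constraints on candidate regions are combined conjunctively with the t-norm min, while a t-conorm such as max would model disjunctive constraints (e.g. structures found in either the left or the right part of the body):

```python
# Conjunctive / disjunctive fusion of spatial constraint degrees.
# The per-candidate degrees are invented for illustration.
right_of = [0.9, 0.8, 0.1, 0.0]   # degree of 'to the right of the ventricle'
close_to = [0.7, 0.2, 0.9, 0.8]   # degree of 'close to the ventricle'

t_norm   = [min(a, b) for a, b in zip(right_of, close_to)]  # conjunction
t_conorm = [max(a, b) for a, b in zip(right_of, close_to)]  # disjunction

print(t_norm)    # [0.7, 0.2, 0.1, 0.0] -> region peaks where both constraints hold
print(t_conorm)  # [0.9, 0.8, 0.9, 0.8]
```

With min as the t-norm, only candidates satisfying both "right of" and "close to" to a high degree keep a high fused degree, which is exactly the behavior needed to restrict the search area.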
Fig. 4. (a) The right ventricle is superimposed on one slice of the original image (MRI here). The search space of the object "caudate nucleus" corresponds to the conjunctive fusion of the spatial relations "to the right of the right ventricle" (b) and "close to the right ventricle" (c). The fusion result is shown in (d).

Example: Typically, brain image interpretation may have to cope with abnormalities such as tumors. Our system allows instantiating generic knowledge expressed in the ontology to adapt to the specific patient's case. The fuzzy representations provide an efficient way to represent inter-individual variability, which is a key point in such situations. They can be further revised or specified according to the visual features extracted from the image and matched with the symbolic representation. Using fuzzy representations, it is possible to deal with such cases, for instance by enlarging the areas where an object can be found, which amounts to relaxing the definition of the fuzzy relation.

In summary, fuzzy representations have several advantages:
• they allow representing the imprecision which is inherent to the definition of a concept; for instance, the concept "close to" is intrinsically vague and imprecise, and its semantics depends on the context in which objects are embedded, on the scale of the objects and on their environment;
• they allow managing imprecision related to the expert knowledge in the concerned domain;
• they constitute an adequate framework for knowledge representation and reasoning, reducing the semantic gap between symbolic concepts and numerical information.

6. Towards the integration of ontologies, spatial relations and fuzzy models
To conclude this presentation, we summarize ongoing developments carried out in our team towards the construction of a spatial relation ontology enhanced with fuzzy representations, and its use for image interpretation. This work aims at integrating all the important features underlined in this paper. A global scheme of our approach is provided in Figure 5. Our recent work addresses the important problems highlighted in this paper in several ways.27,52 We propose to reduce the semantic gap between numerical information contained in the image and higher-level concepts by enriching ontologies with a fuzzy formalism layer. More specifically, we introduce an ontology of spatial relations and propose to enrich it with fuzzy representations of these relations in the spatial (image) domain. The choice of spatial relations is motivated on the one hand by the importance of structural information in image interpretation, and on the other hand by the intrinsically ambiguous nature of most spatial relations. This ontology has been linked to the part of the FMA related to brain structures, as illustrated in Figure 5. As another contribution, this enriched ontology can support the reasoning process in order to recognize structures in images, in particular in medical imaging. Different types of reasoning then become possible: (i) a quite general reasoning may consist in classifying or filtering ontological concepts to answer some queries; (ii) in a more operational way, the ontology and the fuzzy representations can be used to deduce spatial reasoning operations in the images and to guide image interpretation tasks such
[Figure 5 schematic; recoverable elements: generic knowledge and knowledge of specific (pathological) cases feed a graph-based representation of the generic model G; a learning procedure (Step 1: learning spatial relations, adjacency, distance and orientation, of the generic model using healthy cases; Step 2: learning spatial relations for specific cases and deducing stable relations for each class of pathology); fuzzy modeling of spatial relations; generic model adaptation using knowledge of the specific case and the results of the learning procedure; a graph-based propagation process to update the graph and represent the tumor impact on the surrounding structures; enrichment of brain anatomy concepts with spatial relation ontology concepts.]
Fig. 5. Overview of our framework. Ontological engineering is used to represent the symbolic knowledge useful to interpret cerebral images. In particular, a spatial relation ontology is used to enrich the brain ontology with the description of the spatial structure of the brain. A graph-based representation of the brain, including learned fuzzy representations of spatial relations, is derived from the generic model and from an image database. This graph is used to guide the segmentation and the recognition of cerebral structures. This framework is also useful to deal with pathological cases by an adaptation of the knowledge and the reasoning process. The second scheme displays a part of an ontology of brain anatomy (excerpt of the FMA51) enhanced with our fuzzy spatial relations ontology. The concepts of the spatial relation ontology are prefixed by pi.
as localization of objects, segmentation, and recognition. An illustration is provided in Figure 6 for the recognition of internal brain structures.

Fig. 6. The right lateral ventricle corresponds to the spatial region R1 in the image. The domain ontology describes spatial relations between several grey nuclei and the lateral ventricles. These relations are exploited to identify each individual structure.

Another enrichment of the model consists of the representation of domain knowledge by graphs, which include fuzzy models of spatial relations, used to guide the recognition of individual structures in images.53 The inclusion of such structural models, as intermediate representation domains between symbols and images, deals with the physical symbol grounding problem and also contributes to reducing the semantic gap. However, pathological cases may deviate substantially from generic knowledge. We propose to adapt the knowledge representation to take into account the possible influence of pathologies on the spatial organization, based on learning procedures. We also adapt the reasoning process, based on graph-based propagation and updating. These features of our approach are detailed in Ref. 52. A result is illustrated in Figure 7.

Fig. 7. An axial slice of a 3D MRI, with segmented tumor and some anatomical structures.

The enriched ontology contributes to reducing the semantic gap and to answering some symbol grounding questions, which are difficult and still open problems in image interpretation. It provides tools both for knowledge acquisition and representation and for their operational use. It has an important potential in model-based recognition that deserves to be further explored, in particular for medical image interpretation. The framework described in this section focuses on spatial relations, but similar principles can be applied to other types of information that could be involved in image interpretation.
Acknowledgments: This work has been partly supported by grants from Region Ile-de-France, GET and ANR.

References
1. S. Harnad, Physica D 42, 335 (1990).
2. J. Searle, Behavioral and Brain Sciences 3, 417 (1980).
3. M. Taddeo and L. Floridi, Journal of Experimental and Theoretical Artificial Intelligence (2006).
4. S. Coradeschi and A. Saffiotti, Robotics and Autonomous Systems 43, 85 (2003), special issue on perceptual anchoring. Online at http://www.aass.oru.se/Agora/RAS02/.
5. I. Bloch and A. Saffiotti, Some similarities between anchoring and pattern recognition concepts, in AAAI Fall Symposium on Anchoring Symbols to Sensor Data in Single and Multiple Robots Systems, 2001.
6. A. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1349 (2000).
7. D. Roy, Computer Speech and Language 16 (2002).
8. P. Vogt, Artificial Intelligence 167, 206 (2005).
9. A. Cangelosi, Pragmatics and Cognition, special issue on Distributed Cognition (2006), in press.
10. T. R. Gruber, Towards Principles for the Design of Ontologies Used for Knowledge Sharing, in Formal Ontology in Conceptual Analysis and Knowledge Representation, eds. N. Guarino and R. Poli (Kluwer Academic Publishers, Deventer, The Netherlands, 1993).
11. P. Zweigenbaum, B. Bachimont, J. Bouaud, J. Charlet and J. Boisvieux, Meth Inform Med 34, p. 2 (1995).
12. A. Jakulin and D. Mladenic, Ontology grounding, in Conference on Data Mining and Data Warehouses, (Ljubljana, Slovenia, 2005).
13. C. Town, Machine Vision and Applications (2006).
14. A. Schreiber, B. Dubbeldam, J. Wielemaker and B. Wielinga, IEEE Intelligent Systems 16, 66 (2001).
15. B. Hu, S. Dasmahapatra, P. Lewis and N. Shadbolt,
Ontology-based medical image annotation with description logics, in 15th IEEE International Conference on Tools with Artificial Intelligence, 2003.
16. C. Hudelot, N. Maillot and M. Thonnat, Symbol grounding for semantic image interpretation: from image data to semantics, in Proceedings of the Workshop on Semantic Knowledge in Computer Vision, ICCV, (Beijing, China, 2005).
17. W. Z. Mao and D. A. Bell, Integrating visual ontologies and wavelets for image content retrieval, in DEXA Workshop, 1998.
18. V. Mezaris, I. Kompatsiaris and M. G. Strintzis, Eurasip Journal on Applied Signal Processing 2004, 886 (2004).
19. G. Guerra-Filho and Y. Aloimonos, Towards a sensorimotor wordnet: Closing the semantic gap, in Proceedings of the Third International Wordnet Conference, January 2006.
20. F. Baader, D. Calvanese, D. McGuinness, D. Nardi and P. Patel-Schneider, The Description Logic Handbook: Theory, Implementation and Applications (Cambridge University Press, 2003).
21. M. Aufaure and H. Hajji, Multimedia Information Systems, 38 (2002).
22. K. Petridis, D. Anastasopoulos, C. Saathoff, N. Timmermann, I. Kompatsiaris and S. Staab, Engineered Applications of Semantic Web Session (SWEA) at the 10th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems (KES 2006) (2006).
23. I. Bloch, Image and Vision Computing 23, 89 (2005).
24. B. J. Kuipers and T. S. Levitt, AI Magazine 9, 25 (1988).
25. J. Freeman, Computer Graphics and Image Processing 4, 156 (1975).
26. B. Kuipers, Cognitive Science 2, 129 (1978).
27. C. Hudelot, J. Atif and I. Bloch, Ontologie de relations spatiales floues pour l'interprétation d'images, in Rencontres francophones sur la Logique Floue et ses Applications, LFA 2006, (Toulouse, France, 2006).
28. J. Bateman and S. Farrar, Towards a generic foundation for spatial ontology, in International Conference on Formal Ontology in Information Systems (FOIS-2004), (Trento, Italy, 2004).
29. R. Casati, B. Smith and A. Varzi, Ontological Tools for Geographic Representation, in Formal Ontology in Information Systems, ed. N. Guarino (IOS Press, Amsterdam, 1998) pp. 77-85.
30. O. Dameron, Symbolic model of spatial relations in the human brain, in Mapping the Human Body: Spatial Reasoning at the Interface between Human Anatomy and Geographic Information Science, (University of Buffalo, USA, 2005).
31. M. Donnelly, T. Bittner and C. Rosse, Artificial Intelligence in Medicine 36, 1 (January 2006).
32. S. Schulz, U. Hahn and M. Romacker, Modeling anatomical spatial relations with description logics, in Annual Symposium of the American Medical Informatics Association: Converging Information, Technology, and Health Care (AMIA 2000), (Los Angeles, CA, 2000).
33. J. Atif, H. Khotanlou, E. Angelini, H. Duffau and I. Bloch, Segmentation of Internal Brain Structures in the Presence of a Tumor, in MICCAI, (Copenhagen, 2006).
34. L. Vieu, Spatial Representation and Reasoning in Artificial Intelligence, in Spatial and Temporal Reasoning, ed. O. Stock (Kluwer, 1997) pp. 5-41.
35. S. Dutta, International Journal of Approximate Reasoning 5, 307 (1991).
36. L. A. Zadeh, Information Sciences 8, 199 (1975).
37. D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications (Academic Press, New York, 1980).
38. D. Dubois and H. Prade, Information Sciences 36, 85 (1985).
39. D. Dubois, H. Prade and R. Yager, Merging Fuzzy Information, in Handbook of Fuzzy Sets Series, Approximate Reasoning and Information Systems, eds. J. Bezdek, D. Dubois and H. Prade (Kluwer, 1999).
40. I. Bloch, IEEE Transactions on Systems, Man, and Cybernetics 26, 52 (1996).
41. P. da Costa et al. (eds.), Proceedings of the ISWC Workshop on Uncertainty Reasoning for the Semantic Web, 2005.
42. E. Sanchez (ed.), Fuzzy Logic and the Semantic Web (Elsevier, 2006).
43. Z. Ding, Y. Peng and R. Pan, A Bayesian Approach to Uncertainty Modelling in OWL Ontology, in International Conference on Advances in Intelligent Systems - Theory and Applications (AISTA 2004), (Luxembourg-Kirchberg, Luxembourg, 2004).
44. Y. Yang and J. Calmet, OntoBayes: An ontology-driven uncertainty model, in International Conference on Intelligent Agents, Web Technology and Internet Commerce (IAWTIC'05), 2005.
45. S. Holldobler, T. Khang and H. Storr, Proceedings InTech/VJFuzzy 2002, 25 (2002).
46. Y. Li, B. Xu, J. Lu, D. Kang and P. Wang, A family of extended fuzzy description logics, in 29th Annual International Computer Software and Applications Conference (COMPSAC'05), (IEEE Computer Society, Los Alamitos, CA, USA, 2005).
47. G. Stoilos, G. Stamou and J. Pan, Handling imprecise knowledge with fuzzy description logic, in International Workshop on Description Logics (DL 06), (Lake District, UK, 2006).
48. U. Straccia, Description logics with fuzzy concrete domains, in 21st Conference on Uncertainty in Artificial Intelligence (UAI-05), eds. F. Bachus and T. Jaakkola (AUAI Press, Edinburgh, Scotland, 2005).
49. M. d'Aquin, J. Lieber and A. Napoli, Étude de quelques logiques de description floues et de formalismes apparentés, in Rencontres Francophones sur la Logique Floue et ses Applications, (Nantes, France, 2004).
50. U. Straccia, A fuzzy description logic for the semantic web, in Fuzzy Logic and the Semantic Web, ed. E. Sanchez, Capturing Intelligence (Elsevier, 2006) pp. 73-90.
51. C. Rosse and J. L. V. Mejino, Journal of Biomedical Informatics 36, 478 (2003).
52. J. Atif, C. Hudelot, I. Bloch and E. Angelini, From Generic Knowledge to Specific Reasoning for Medical Image Interpretation using Graph-based Representations, in International Joint Conference on Artificial Intelligence IJCAI'07, (Hyderabad, India, 2007).
53. O. Colliot, O. Camara and I. Bloch, Pattern Recognition 39, 1401 (2006).
A Technique for Creating Probabilistic Spatio-Temporal Forecasts
V. Lakshmanan
University of Oklahoma and National Severe Storms Laboratory
E-mail: lakshman@ou.edu
Kiel Ortega
School of Meteorology, University of Oklahoma
Keywords: Probability; Atmospheric Science
Probabilistic forecasts can capture uncertainty better and provide significant economic benefits because the users of the information can calibrate risk. For forecasts in earth-centered domains to be useful, the forecasts have to be clearly demarcated in space and time. We discuss the characteristics of a good probability forecast, reliability and sharpness, and describe well-understood techniques from the literature for generating good probability forecasts. We then describe our approach to the problem of creating good probabilistic forecasts when the entity to be forecast can move and morph. In this paper, we apply the technique to severe weather prediction by formulating the weather prediction problem as one of estimating the probability of an event at a particular spatial location within a given time window. The technique involves clustering Doppler-radar-derived fields, such as low-level shear and reflectivity, to form candidate regions. Assuming stationarity, the spatial probability distribution of the regions is estimated, conditioned on the level of organization within the regions, and combined with the probability that a candidate region becomes severe. For example, the probability that a candidate region becomes tornadic is estimated using a neural network with a sigmoid output node, trained on historical cases.
1. Motivation

A principled estimate of the probability that a threat will materialize can be more useful than a binary yes/no prediction, because a binary prediction hides from users the uncertainty inherent in the data and the predictive model, even though users will make decisions based on that prediction. A principled probabilistic prediction enables users of the information to calibrate their risk and can aid decision making beyond what simple binary approaches yield.1 Probabilistic forecasts can capture uncertainty better and provide significant economic benefits because the users of the information can calibrate risk. Techniques to create good probabilistic forecasts are well understood, but only in situations where the predictive model is a direct input-output relationship. If the threats in consideration move and change shape, as with short-term weather forecasts, the well-understood techniques cannot be used directly. For forecasts in earth-centered domains to be useful, the forecasts have to be clearly demarcated in space and time. This paper presents a data mining approach to the problem of creating principled probabilistic forecasts when the entity to be forecast can
move and change shape.

The rest of the paper is organized as follows. The characteristics of good probabilistic forecasts, and standard data mining approaches to creating such forecasts, are described in Section 2. The limitations of the standard data mining approaches in creating clearly demarcated forecasts in space and time are explained, and the first part of our predictive model (to create principled spatial forecasts) is presented, in Section 3. The second part of the model, to create principled temporal forecasts that can be tied to the spatial forecasts, is explained in Section 4. The way of tying together the two probabilities is explained in Section 5. Results of applying this approach to predicting vertically integrated liquid content are described in Section 6.

2. Probabilistic Forecasts

A good probability forecast has two characteristics (see Figure 1): (a) it is reliable: for example, of all the times that the threat is forecast as 30% likely to occur, the threat should actually occur 30% of the time; and (b) it is sharp, i.e. the probability distribution function of the forecast probabilities should not be clustered around
V. Lakshmanan and Kiel Ortega 27
Fig. 1. A good probability forecast needs to be both reliable (left) and sharp (right).
the a priori probability; instead, there should be many low- and high-probability events and relatively few mid-probability events.

In many cases, there are three probabilities of interest: (a) the probability that an event will occur, (b) the probability that the event will occur at a particular location, and (c) the probability that the event will occur at a particular location within a specified time window. When stated like this, most decision makers will aver that it is the third probability that is of interest to them,2 but that is not what they are commonly provided. Even though research studies implicitly place spatial and temporal bounds on their training sets, it has long been unclear how to explicitly form the spatial and temporal variability of the probability field. Thus, if probabilities are presented to decision makers, those probabilities are usually of the first type.

The probability that an event will occur is commonly estimated by forming a mapping between input features and the desired result on a set of historical cases.3 This mapping function may take the form of a neural network, a support vector machine or a decision tree. In the case of a neural network, the choice of a sigmoid function as the output node is sufficient to ensure that the output number is a probability, provided that the training set captures the a priori probabilities of the true input feature space.4 In the case of maximum margin methods such as support vector machines or bagged, boosted decision trees, scaling the output using a sigmoid function yields the desirable property that the output is a probability.5

3. Spatial probability forecast

Thus, techniques to estimate the probability of an event are well understood. The estimation of spatial probabilities when the threat under consideration is stationary can be performed using the standard formalisms, as done, for example, on soil variability.6 If the threat is stationary, then the value of each of the input fields at a pixel, or statistical or morphological values computed in the neighborhood of a pixel, may be used as the input features to a classifier. If the classifier is a neural network with a sigmoid output node or a Platt-scaled maximum margin technique, then the output of the classifier yields a spatial probability.

One approach to probabilistic forecasts when the threat areas are not stationary is to use kriging,7 a geospatial interpolation approach. Naturally, such an approach is limited to studies where the kriging resolution can be finer than the rate of transformation. This is a condition that exists in slow-moving systems such as diseases (studied by 8), but not in fast-moving systems such as thunderstorms.

For fast-moving threats, no principled approach to estimating probability fields exists. This is because the input features at a point in space and time affect the threat potential at a different point N minutes later. Thus, a simple input/output mapping is insufficient, because which location will be affected, and when it will be affected, needs to be known. Yet, in practical situations, the time and location that will be affected are not known with certainty. Consequently, it is necessary to assume a spatial probability distribution associated with the locations that will be affected at a given time instant. Once this second probability is introduced, prior work on principled probability estimates is no longer applicable. A new formulation is needed: a spatiotemporal framework to estimate the probability of occurrence of moving or spreading threats, accounting for the dynamics of the spatial probability field
28
A Technique for Creating Probabilistic Spatio-Temporal Forecasts
of area at risk over time. One can develop the spatiotemporal formulation by first building an ontology of threat precursors with identified features. Each of the features signals a probability distribution of threats in space and time. Threat probabilities can be combined from multiple features, and from the dynamics among these features, to estimate the probability of a threat occurring at a particular location within a specific time window. One factor that needs to be considered is that even if the spatial and temporal distribution of the locations that will be affected by a particular feature is estimated, whether the feature will lead to the threat still needs to be estimated. This, of course, is a problem that has been thoroughly addressed in the data mining literature. The formulation can be described through an analogy with a set of billiard balls (see Figure 2), asking: what is the future position of any ball on the table? Each grid point in the set of continuous fields is considered akin to a ball, and the driving force of a set of points, such as a storm cell boundary, is considered akin to the ring.
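The billiard-ball question posed above, where will a given ball be at a future time, can be sketched as a Monte Carlo computation: draw a motion vector from its historically estimated distribution, advect the grid point, and accumulate hit counts per destination cell. The Gaussian motion model and all names below are our own illustrative assumptions, not the authors' implementation:

```python
import random
from collections import Counter

def spatial_probability(x, y, motion_samples, t_steps, n_trials=10000):
    """Estimate the spatial probability field for one grid point ('ball')
    by repeatedly drawing a motion vector from its historical distribution
    and advecting the point t_steps ahead."""
    hits = Counter()
    for _ in range(n_trials):
        u, v = random.choice(motion_samples)  # one historical motion estimate
        hits[(round(x + u * t_steps), round(y + v * t_steps))] += 1
    return {cell: n / n_trials for cell, n in hits.items()}

# Historical motion estimates (u, v) in grid cells per time step; the
# spread of these samples is what produces the spread of the forecast.
random.seed(1)
history = [(2 + random.gauss(0, 0.3), 0 + random.gauss(0, 0.3))
           for _ in range(200)]
field = spatial_probability(x=10, y=10, motion_samples=history, t_steps=3)
assert abs(sum(field.values()) - 1.0) < 1e-9  # a proper probability field
```

The spread of the resulting field widens as the motion samples become more variable, which is exactly the behavior the stationarity argument below relies on.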
Fig. 2. The probabilistic formulation assumes rigid bodies operated upon by a driving force that permits individual degrees of motion.

We will make the simplifying assumption of space-time stationarity: the probability distribution of a particle in space is identical to the probability distribution of the particle in time (see Figure 3). Loosely, this assumption is similar to assuming that the probability distribution obtained by tossing 50 coins is the same as the probability distribution obtained by tossing the same coin 50 times. This assumption has to be made on faith in situations where one does not have 50 coins; we will never have 50 identical thunderstorms or hazardous events. By making the assumption of space-time stationarity, and from the motion estimated by the hybrid technique described in the next section, one can formulate the spatial probability distribution of a ball on the billiards table. If motion estimates of a moving potential threat are available, the variability of the motion estimates themselves can be used to gauge the probability distribution of the motion estimates (Figure 3). Under stationarity assumptions, the historical probability distribution can be used to estimate the future spatial probability distribution of the current billiard ball.

Fig. 3. By assuming that the probability distribution in space is identical to the probability distribution in time (stationarity), the estimated probability distribution in time from historical data can be used to create a spatial probability field. The figure shows how the data values of one component of the velocity vector in time (top) are used to estimate a probability distribution of that component (bottom left). This probability distribution, though estimated from the time field, is considered the spatial distribution in order to formulate the probabilistic location of a grid cell (bottom right) at time T into the future.

There is one further complication, however. The future expected position of a billiard ball depends on how tightly the balls are packed, or on how much control the forcing function exerts. If the balls are loosely packed, the movements of different balls are independent. If the balls are tightly packed, all the balls move together. The actual movement of the balls will be somewhere in between the two extremes. The actual balance depends on the problem at hand and will likely have to be estimated from the data. If X is the event of interest (such as the event that there will be a tornado at a particular location within the next 30 minutes), then simple Bayesian analysis yields:

P(X) = P(X|packed) P(packed) + P(X|notpacked) (1 - P(packed))    (1)

The first quantity, P(X|packed), the probability of lightning given that all the grid points associated with a thunderstorm move together, can be estimated quite readily using numerical integration of the probability distribution over the motion vectors that will yield a lightning strike at this location, based on where the threats are currently present. The third quantity, P(X|notpacked), the probability of lightning given that the grid points associated with a thunderstorm move independently (in other words, that the state transition from a cell to its neighbor is independent of the other cells in a thunderstorm), can be computed through probability analysis:

P(X|notpacked) = 1 - ((1 - P(x1)) (1 - P(x2)) ... (1 - P(xN)))    (2)
where P(x1), P(x2), etc. are the spatial probabilities that the individual grid points will end up impacting this location. The weighting factor P(packed) needs to be estimated from historical data. We anticipate that P(packed) will depend upon the type of event under consideration and needs to be estimated from the data; a single number may not work for all problems.

4. Estimating Movement

There are two broad methods of estimating movement: (a) optical flow methods, including cross-correlation and spectral methods, that minimize a displacement error within sub-grids of two frames of a sequence of images;9,10 (b) object tracking methods that identify objects in the frames of a sequence and correlate these objects across time.11 The object tracking methods provide the ability to track small-scale features and extract trends associated with those features, but miss large-scale transformations. In comparison, the optical flow methods yield more accurate estimates over large scales, but miss fine-scale movements and do not provide the ability to extract trends.12 Nevertheless, trends are very important in a
data-mining predictive approach. A measured value may bear no correlation with whether a threat will materialize, but sharp increases or decreases of some measurements are often reliable indicators of coming threats. Thus, even though optical flow methods can yield more accurate estimates of movement than object-tracking methods, optical flow methods cannot be directly applied to problem domains in which object-specific motion and trend information is critical.

One way to achieve the high accuracy of optical flow methods while retaining the trending capability of object-based tracking is to create a hybrid technique. It is possible to find objects by clustering the input features using a multi-scale hierarchical approach.12 This hybrid technique does not correlate objects across frames as in a typical object-tracking scenario. Instead, the objects in the current frame are correlated with the images in the previous frame, using the current shape of the object itself as the domain in which displacement error is minimized. To extract trends, the object domain can then simply be displaced by the temporally adjusted amount in previous frames, and the change in statistical properties of the object computed and used for prediction. Such a hybrid technique, as Figure 4 illustrates, neatly sidesteps the problems associated with splitting and merging that are commonly associated with object-tracking methods. Motion estimates from pairs of frames are smoothed across time using a Kalman filter.13
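The Kalman-filter smoothing mentioned in the last sentence can be sketched for a single motion component as follows (a minimal scalar sketch; the noise variances q and r are illustrative assumptions, not values from the paper):

```python
import random

def kalman_smooth(measurements, q=0.01, r=0.25):
    """Smooth a sequence of noisy scalar motion estimates with a
    constant-state Kalman filter. q: process variance (how fast the true
    motion may drift), r: measurement variance (estimate noise)."""
    x, p = measurements[0], 1.0       # initial state and its variance
    smoothed = [x]
    for z in measurements[1:]:
        p += q                         # predict: uncertainty grows
        k = p / (p + r)                # Kalman gain
        x += k * (z - x)               # update with the new measurement
        p *= (1 - k)
        smoothed.append(x)
    return smoothed

# Noisy frame-to-frame estimates of an eastward storm motion of ~10 m/s.
random.seed(2)
raw = [10 + random.gauss(0, 1.0) for _ in range(50)]
smooth = kalman_smooth(raw)
# The filtered series should hug the true value more tightly than the raw one.
raw_err = sum((z - 10) ** 2 for z in raw) / len(raw)
smooth_err = sum((z - 10) ** 2 for z in smooth) / len(smooth)
assert smooth_err < raw_err
```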
Fig. 4. A hybrid tracking technique of correlating objects in the current frame to pixels in the previous frame can yield trend information as well as the high accuracy associated with optical flow methods.
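The matching step of the hybrid technique, correlating an object's current support against the previous frame, can be sketched as a brute-force displacement search (a simplified illustration: the array layout, error norm and names here are our own, not the authors' implementation):

```python
def estimate_motion(prev, curr, mask, max_shift=3):
    """Find the (dy, dx) that best aligns the object's pixels in the
    current frame with the previous frame, minimizing mean absolute error
    over the object's own support (mask of (row, col) pixels)."""
    rows, cols = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0.0, 0
            for (r, c) in mask:
                pr, pc = r - dy, c - dx   # where the object came from
                if 0 <= pr < rows and 0 <= pc < cols:
                    err += abs(curr[r][c] - prev[pr][pc])
                    n += 1
            if n and err / n < best_err:
                best, best_err = (dy, dx), err / n
    return best

# A 'storm' block of reflectivity moving 1 row down, 2 columns right.
prev = [[0] * 10 for _ in range(10)]
for r in range(2, 5):
    for c in range(2, 5):
        prev[r][c] = 40
curr = [[0] * 10 for _ in range(10)]
for r in range(3, 6):
    for c in range(4, 7):
        curr[r][c] = 40
mask = [(r, c) for r in range(3, 6) for c in range(4, 7)]
assert estimate_motion(prev, curr, mask) == (1, 2)
```

Because the search domain is the object's own shape, splits and merges between frames never have to be resolved explicitly.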
Fig. 5. Top row: values of Vertically Integrated Liquid (VIL) in kg/m2 and reflectivity (Z) in dBZ at T = t0. Middle row: probabilistic forecasts of VIL ≥ 20, Z ≥ 0 and Z ≥ 30 at T = t0 + 30. Bottom row: actual values of VIL and Z at T = t0 + 30.
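The reliability and sharpness criteria defined in Section 2 can be computed from pairs of forecast probabilities and observed binary outcomes. A minimal sketch (the bin count and names are our own, not from the paper):

```python
import random

def reliability_sharpness(forecast_probs, outcomes, n_bins=10):
    """For each probability bin, return (mean forecast probability,
    observed event frequency, fraction of all forecasts in the bin).
    The first two columns form the reliability diagram; the third is
    the sharpness histogram."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecast_probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b].append((p, y))
    total = len(forecast_probs)
    rows = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        rows.append((mean_p, freq, len(members) / total))
    return rows

# A perfectly reliable forecaster: the forecast probability equals the
# long-run event frequency, so points fall near the diagonal.
random.seed(0)
probs = [random.random() for _ in range(50000)]
obs = [1.0 if random.random() < p else 0.0 for p in probs]
for mean_p, freq, frac in reliability_sharpness(probs, obs):
    assert abs(mean_p - freq) < 0.05
```

A sharp forecaster concentrates the third column near the 0 and 1 bins rather than around the climatological probability.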
5. Creating a spatio-temporal probability forecast

In the short term (under 60 minutes), extrapolation forecasts of storm location often provide reasonable results. Therefore, a data mining approach to probabilistic forecasts of severe weather hazards holds promise of being able to predict where already-occurring severe weather is likely to be. At the same time, research studies have shown that it is possible to predict, with a high degree of skill, which storms are likely to initiate lightning (one measure of storm severity) and where new storms are likely to form. What has been missing is a formal mechanism for combining these two disparate sets of studies. Our spatiotemporal formalism provides the framework to combine the two probabilities to yield, for example, a lightning probability forecast. The suggested approach is to apply the various components described above to train the system as follows:

• Cluster the input remotely sensed data into candidate regions.
• Using the hybrid motion estimation technique, associate current severe weather with threat signals that occurred N minutes ago.
• Train the associated threat signals against the severe weather activity using a neural network capable of providing predictive probabilities.
Fig. 6. Reliability and sharpness of Vertically Integrated Liquid (VIL in kg/m2) and radar reflectivity (Z in dBZ) forecasts for a Mar 8, 2002 case in the central United States.

• Use the spatiotemporal framework and the motion estimates derived from the hybrid technique to estimate spatial probabilities.
• Estimate the weighting factor P(packed) to create optimal (in terms of reliability and sharpness) probability forecasts.

The trained model and pattern recognition techniques can be applied in real time to routinely arriving satellite, radar and model data to predict up to N minutes into the future. The resulting probabilistic forecasts can be evaluated using three metrics: (a) reliability, (b) sharpness and (c) receiver operating characteristic (ROC) curves, created by setting risk thresholds that vary from 0 to 1 and determining the probability of detection and false alarm rate on a grid-point by grid-point basis.

6. Results and Conclusion

The technique of this paper was applied to predicting the spatial distribution of vertically integrated liquid (VIL;14) and radar reflectivity 30 minutes into the future. Examples of the original, forecast and actual images are shown in Figure 5. The reliability and sharpness diagrams, evaluated over a 6-hour period, are shown in Figure 6. The reflectivity forecasts were sharp, but not reliable: they were usually underestimates of the true probability. The VIL probability forecasts were both reliable and sharp, so a probabilistic forecast of VIL would be of high utility to users of weather information. The lead time of 30 minutes is longer than is possible using deterministic forecast techniques. Thus, the technique described in this paper works for severe weather situations (such as high VIL), but not for all weather (as highlighted by the poor performance on reflectivity). Future work will involve testing against more useful diagnostics of severe weather: the initiation of lightning and the potential for a tornado.

Acknowledgments
Funding for this research was provided under NOAA-OU Cooperative Agreement NA17RJ1227. The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of the National Severe Storms Laboratory (NSSL) or the U.S. Department of Commerce.

References
1. A. H. Murphy, The value of climatological categories and probabilistic forecasts in the cost-loss ratio situation, Monthly Weather Review, 803 (1977).
2. I. Adrianto, T. M. Smith, K. A. Scharfenberg and T. Trafalis, Evaluation of various algorithms and display concepts for weather forecasting, in 21st Int'l Conf. on Inter. Inf. Proc. Sys. (IIPS) for Meteor., Ocean., and Hydr. (Amer. Meteor. Soc., San Diego, CA, Jan. 2005).
3. M. Richard and R. P. Lippmann, Neural network classifiers estimate Bayesian a posteriori probabilities, Neural Computation 3, 461 (1991).
4. C. Bishop, Neural Networks for Pattern Recognition (Oxford, 1995).
5. J. Platt, Advances in Large Margin Classifiers (MIT Press, 1999), ch. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods.
6. J. Prevost and R. Popescu, Constitutive relations for soil materials, Electronic Journal of Geotechnical Engineering (1996).
7. M. Oliver and R. Webster, Kriging: a method of interpolation for geographical information systems, Int. J. Geographical Information Systems 4, 313 (1990).
8. P. Goovaerts, Geostatistical analysis of disease data: visualization and propagation of spatial uncertainty in cancer mortality risk using Poisson kriging and p-field simulation, Int. J. Health Geography 5 (2006).
9. J. Barron, D. Fleet and S. Beauchemin, Performance of optical flow techniques, Int'l J. Comp. Vis. 12, 43 (1994).
10. J. Tuttle and R. Gall, A single-radar technique for estimating the winds in tropical cyclones, Bull. Amer. Met. Soc. 80 (Apr. 1999).
11. M. Dixon, Automated storm identification, tracking and forecasting: a radar-based method, PhD thesis, University of Colorado and National Center for Atmospheric Research (1994).
12. V. Lakshmanan, R. Rabin and V. DeBrunner, Multiscale storm identification and forecast, J. Atm. Res., 367 (July 2003).
13. R. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME - J. Basic Engr., 35 (March 1960).
14. D. R. Greene and R. A. Clark, Vertically integrated liquid water - A new analysis tool, Mon. Wea. Rev. 100, 548 (1972).
PART C
Biometrics
An Efficient Measure for Individuality Detection in Dynamic Biometric Applications
B. Chakraborty
Faculty of Software and Information Science, Iwate Prefectural University, Japan 020-0193
E-mail: [email protected]

Y. Manabe
Graduate School of Software and Information Science, Iwate Prefectural University, Japan 020-0193
Individuality detection from dynamic information is needed to counter the threat of forgery in biometric authentication systems. In this work a new similarity measure has been proposed for discriminating genuine online handwriting from forgeries using dynamic time series signals. The simulation experiment shows that the measure is more effective than DP matching for measuring intraclass and interclass variation among several writers for writer verification by online handwriting.

Keywords: Dynamic Biometrics, Individuality Detection, Translation Error, Similarity Measure

1. Introduction

Biometric technologies are rapidly gaining importance with the growing need for information security. Current biometric technologies for personal identity verification are mostly based on static information such as face, fingerprint, iris, blood vessels etc. Due to the increasing threat of forgery it is becoming necessary to extend biometric technologies to include dynamic information like hand gestures, body movement or online handwriting. The main point of an efficient authentication system is to minimize the false acceptance rate as well as the false rejection rate. One of the key problems in biometrics is intraclass variation: hardly any two structures are completely identical even though they are of the same origin. To achieve successful identity detection, intraclass variation should be as small as possible while interclass variation should be as large as possible. In the case of dynamic biometric technologies such as writer verification by online handwriting, one generally has to deal with multivariable time series to include dynamical information. In the process of verification, we need to find the similarity or dissimilarity between two time series (signals) in order to discriminate genuine handwriting from forgeries. The similarity measures used for static feature values (expressed as multidimensional vectors) in classification/identification problems cannot be used directly for calculating similarities between two
multivariate time series signals. In this work a similarity measure for comparing two trajectories is proposed, which can be applied to individuality detection by distinguishing the online handwriting of different persons based on dynamic information. The proposed measure is based on the translation error defined by Wayland1 for detecting determinism in a time series. Simulation experiments have been carried out on the problem of writer identification from online handwriting, and the usefulness of the measure has been shown in comparison to the popular DP matching method for measuring intraclass and interclass variation among several writers. The next section presents a brief introduction to writer verification by online handwriting, followed by the proposed algorithm for measuring similarity between trajectories generated by pen-tip movement on a writing tablet. Section 3 contains the simulation experiments and results. Section 4 presents conclusions and future directions.
2. Dynamic Biometrics: Online Handwriting

Online signature verification as a means of person identification has been under intense investigation for a long time, and many research papers have been published in this area.2-4 Automatic writer authentication methods using online handwriting are broadly classified
into two categories: function based and parameter based.5 In parametric approaches, only the parameters abstracted from the signal are used. Though they are simpler, with high computation speed, the task of selecting the right parameters is difficult, and the dynamical information is lost during processing. In the functional approach the two complete signals are compared, which generally yields better results.6 For the past two decades the use of DTW (Dynamic Time Warping), based on a DP matching algorithm that finds the best matching path, in terms of the least global cost, between an input signal and a template, has become a major technique in signature verification.7,8 Though DTW is quite successful it has two main drawbacks: heavy computational load and warping of forgeries. Other popular recent methods are based on HMMs.9

In this paper we consider online handwriting as a multivariate time series signal generated by the pen movement on the writing pad; the variables are the x, y coordinates of the pen-tip movement, writing pressure, pen inclination with respect to the x and y coordinates, etc. The path of the hand movement in the pen-up position is also interpolated in the time series along with the movement in the pen-down position. A similarity measure between the generated time series signals corresponding to two samples of the same writing has been proposed and is explained in the following section.

2.1. Proposed Algorithm for Measuring Similarity

The time series signal generated by online handwriting is considered to originate from the dynamics of the hand movement. The trajectories of a piece of writing should be similar for the same writer but different for different writers, as handwriting is considered to depend on the individual. Here we propose a novel measure for evaluating the similarity of time series signals based on Translation Error. The Wayland test1 is a widely used method for detecting determinism in a time series; we have modified Wayland's algorithm in order to evaluate similarity between handwriting time series. The proposed measure is based on the following concepts.

(1) A multi-dimensional trajectory can be constructed by the delay coordinate embedding method from an online handwriting signal.
(2) The constructed trajectory reflects the individuality of the time series.
(3) Several multi-dimensional trajectories constructed from several handwriting time series of the same characters generated by a specific writer have almost the same dynamics, within a certain error.
(4) Thus, the similarity between time series signals can be evaluated by the translation error of the constructed trajectories.

A deterministic time series signal [s(t)] can be embedded as a sequence of time-delay coordinate vectors v_s(t), known as the experimental attractor, with an appropriate choice of embedding dimension m and delay time τ for reconstruction of the original dynamical system, as follows:

v_s(t + 1) = f(v_s(t))
v_s(t) = [s(t), s(t + τ), ..., s(t + (m - 1)τ)]

where f denotes the reconstructed system having one-to-one correspondence with the original system. Though the present work is not concerned with detecting determinism in time series, delay vector embedding is used here for extracting the local wave pattern of the time series, as shown in Fig. (1), which acts as a local feature.
Fig. 1. Delay Coordinate Embedding and Local Wave Form
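The delay-coordinate embedding just described can be sketched as follows (a minimal illustration; the function name is ours):

```python
def delay_embed(s, m=3, tau=1):
    """Embed a scalar time series s as delay vectors
    v(t) = (s(t), s(t + tau), ..., s(t + (m - 1) * tau))."""
    last = len(s) - (m - 1) * tau
    return [tuple(s[t + j * tau] for j in range(m)) for t in range(last)]

series = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
vectors = delay_embed(series, m=3, tau=1)
assert vectors[0] == (0.0, 0.5, 1.0)
assert len(vectors) == len(series) - 2   # (m - 1) * tau points are consumed
```

Each resulting vector captures one local wave pattern of the signal, which is exactly the local feature the text refers to.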
B. Chakraborty and Y. Manabe 37
Fig. 2. Translation in Multi-Dimensional Phase Space

We propose a measure for calculating the distance between the trajectories of the a-th sample and the b-th sample of the same piece of writing, from the translation of delay vectors in the reconstructed space, based on the translation error defined in the Wayland test, as follows:

(1) s_i(t) is defined as the time series signal generated from online handwriting and v_si(t) is the corresponding embedded vector for the i-th sample; v_sa(t) and v_sb(t) denote the delay vectors for the a-th and b-th samples respectively.
(2) A random vector v_sa(k) is chosen from v_sa(t). Let the nearest vector to v_sa(k) from v_sb(t) be v_sb(k'), where |k - k'| < T_th, T_th denoting a threshold value that ensures a small region for the k nearest neighbours (shown in Fig. (2)).
(3) For the vectors v_sa(k) and v_sb(k'), the transition in each orbit after one step is calculated as follows:

V_sa(k) = v_sa(k + 1) - v_sa(k),    (1)
V_sb(k') = v_sb(k' + 1) - v_sb(k').    (2)

(4) The translation error e_trans is calculated from V_sa(k) and V_sb(k') as

e_trans = (1/2) ( |V_sa(k) - V| / |V| + |V_sb(k') - V| / |V| )    (3)

where V denotes the average vector of V_sa(k) and V_sb(k').
(5) e_trans is calculated L times for different selections of the random vector v_sa(k), and the median of e_trans^i (i = 1, 2, ..., L) is calculated as

M(e_trans) = Median(e_trans^1, ..., e_trans^L).    (4)

The final translation error E_trans is calculated by taking the average over M repetitions of the above procedure, to suppress the statistical error generated by the random sampling in the previous step:

E_trans = (1/M) Σ_{j=1}^{M} M_j(e_trans)

where M_j(e_trans) is the median obtained in the j-th repetition. E_trans will be different if we interchange the a-th and b-th samples in step 2 of the above algorithm, i.e. if the random vector is chosen from the delay vectors of the b-th sample. Wayland demonstrated that the translation error tends to 0 if the time series is deterministic and tends to 1 if it is random.

2.2. Measure for Individuality Detection

The translation error E_trans defined above is used here to define a measure of variation between same and different writers' handwriting samples, based on:

(1) the intra-sample variation of the similarity measure for a single writer, and
(2) the distance between the intra-sample distribution and the inter-sample distribution over several writers.

For the calculation of (1), online handwriting samples of a particular piece of writing by a particular writer have to be taken N times. The translation error between the (N × N - N) ordered sample pairs has to be computed to obtain the distribution of individual variation for that writer. The coefficient of variation for the N samples is defined as CV = σ_Sim / μ_Sim, with

μ_Sim = (1/(N(N - 1))) Σ_{i=1}^{N(N-1)} Sim_i

σ_Sim² = (1/(N(N - 1))) Σ_{i=1}^{N(N-1)} (Sim_i - μ_Sim)²

where σ_Sim² and μ_Sim denote the variance and mean of the distribution of translation error for the N samples. For the calculation of (2), online handwriting samples of a particular piece of writing by several (K) writers have to be taken N times. The difference between the distribution of translation error for the same writer
(number of samples N(N - 1), group P) and the distribution of translation error between this writer and the other (K - 1) writers (number of samples NK, group Q) is calculated via the F-ratio as follows:
F_r = MS_inter / MS_intra

MS_inter = SS_inter / (2 - 1),    MS_intra = SS_intra / (N(N - 1) + NK - 2)

SS_inter = N(N - 1)(μ_P - μ_PQ)² + NK(μ_Q - μ_PQ)²

SS_intra = N(N - 1)σ²_P + NKσ²_Q
μ_P, μ_Q and μ_PQ represent the averages of group P, group Q, and groups P and Q pooled together as one group, while σ²_P and σ²_Q represent the variances of groups P and Q respectively. For correct identification of a writer, MS_inter should be as high as possible while MS_intra should be as low as possible, making F_r high for good authentication results.

3. Simulation Experiment and Results

3.1. Data Preparation

In order to evaluate the efficiency of the proposed similarity measure, a small-scale simulation experiment has been done with 5 writers (A, B, C, D, E); all of them wrote the word 'Software' in katakana, a Japanese alphabet system known to be difficult for capturing the individuality of handwriting. A sample is shown in Fig. (3). The writers used a WACOM "Intuos 3" tablet for writing, and each writer wrote 10 times in two sittings, producing 100 samples per writer. In this study we used only the x-y coordinates of the pen-tip movement. The tablet produces 200 points/sec, which is too many for computation.
Fig. 3. Handwriting character samples
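Steps (1)-(5) of the similarity measure in Section 2.1 can be sketched as follows (an illustrative reimplementation from the text, not the authors' code; the Euclidean norm and the parameter defaults here are our own choices):

```python
import math, random

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def translation_error(va, vb, t_th=5, n_draws=50):
    """Median-of-draws translation error between two delay-vector
    trajectories va and vb (steps 2-5 of the proposed algorithm)."""
    errs = []
    for _ in range(n_draws):
        k = random.randrange(len(va) - 1)
        # nearest vector in vb within the temporal window |k - k'| < t_th
        lo, hi = max(0, k - t_th + 1), min(len(vb) - 1, k + t_th)
        kp = min(range(lo, hi), key=lambda j: _norm(_sub(vb[j], va[k])))
        Va = _sub(va[k + 1], va[k])        # one-step transition in orbit a
        Vb = _sub(vb[kp + 1], vb[kp])      # one-step transition in orbit b
        Vbar = tuple((x + y) / 2 for x, y in zip(Va, Vb))
        nb = _norm(Vbar) or 1e-12
        errs.append(0.5 * (_norm(_sub(Va, Vbar)) / nb
                           + _norm(_sub(Vb, Vbar)) / nb))
    errs.sort()
    return errs[len(errs) // 2]            # median over the draws

# Two samples of the 'same writing' (one a slightly perturbed copy)
# should score much lower than a sample versus an unrelated trajectory.
random.seed(3)
base = [(math.sin(t / 5), math.cos(t / 5), math.sin(t / 4)) for t in range(100)]
near = [tuple(x + random.gauss(0, 0.01) for x in v) for v in base]
far = [tuple(random.random() for _ in range(3)) for _ in range(100)]
assert translation_error(base, near) < translation_error(base, far)
```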
3.2. Pre-processing

Preprocessing has been done to sample non-equidistant points, such that points in curved regions are taken into account more than points in relatively straight portions: if the angle between the line joining (x_{t-1}, y_{t-1}) and (x_t, y_t) and the line joining (x_{t-1}, y_{t-1}) and (x_{t+1}, y_{t+1}) is less than a certain threshold (Θ_th), the point (x_t, y_t) is dropped. Secondly, the time series values of the coordinates x(t) and y(t) are normalized to lie between 0 and 1.

3.3. Simulation Experiment

Translation error (TE) and DP matching distance values are calculated for 90 trajectory pairs corresponding to the 10 samples of each particular writer, and 400 trajectory pairs for a particular writer against the other writers. That is, group P has 90 samples and group Q has 400 samples. For the calculation of translation error (TE), the delay vectors of the trajectories are constructed with the simple assumption of embedding dimension m = 3 and delay τ = 1. The other parameters are set as follows: Θ_th = 1.0, L = 50 and M = 10 (in step 5 of the proposed algorithm).

3.4. Simulation Results
Table 1 presents the intra-writer variability values using translation error and DP matching as the measures. The ratio of the standard deviation to the average value of the distribution of both measurements is taken for calculating the index. It is seen that for both time series, translation error is the better measure for identity detection. Table 2 presents the average F_r of each writer in comparison with the rest of the writers. Here also translation error is found to be better than DP matching. Especially for writers C and D, the low values of the F-ratio using the DP measure for the x coordinate time series indicate that the intra-writer variation compared to the inter-writer variation is too high to discover any individuality, and thus the DP measure cannot be used for personal identification. F_r for the y(t) values seems better, but F_r based on the TE measure for both time series is far better than the values based on the DP measure. Thus translation error can be used to detect the individuality of writers C and D, as the F_r values corresponding to TE of writers C and
B. Chakraborty and Y. Manabe 39 D are quite high. In fact in our experiment writer C and D are novice in using writing tablet and their sample writings show greater variation from sample to sample compared to writer A, B and E. It seems t h a t in spite of variation in sample writing, T E is better measure t h a n D P to identify the particular writer.
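For reference, the translation-error computation used in the experiment above can be sketched as follows. This follows the Wayland et al. formulation (Ref. 1): the nearest neighbours of a delay vector are translated one step forward, and the normalized spread of the translation vectors is averaged. The neighbour count and reference-point selection below are illustrative choices, not the paper's step-5 parameters L and M.

```python
import numpy as np

def delay_vectors(s, m=3, tau=1):
    """Embed a 1-D series into m-dimensional delay vectors with delay tau
    (the paper uses m = 3 and tau = 1)."""
    n = len(s) - (m - 1) * tau
    return np.array([s[i:i + (m - 1) * tau + 1:tau] for i in range(n)])

def translation_error(s, m=3, tau=1, n_ref=10, k=4):
    """Wayland-style translation error: low values indicate determinism.
    For each reference point, take its k nearest delay vectors, translate
    all one step forward, and measure the spread of the translation
    vectors relative to their mean."""
    X = delay_vectors(np.asarray(s, float), m, tau)
    usable = len(X) - 1                  # each point needs a successor
    refs = np.linspace(0, usable - 1, n_ref).astype(int)
    errs = []
    for r in refs:
        d = np.linalg.norm(X[:usable] - X[r], axis=1)
        nbrs = np.argsort(d)[:k + 1]     # the point itself plus k neighbours
        v = X[nbrs + 1] - X[nbrs]        # one-step translation vectors
        vbar = v.mean(axis=0)
        errs.append(np.mean(np.sum((v - vbar) ** 2, axis=1)) / np.sum(vbar ** 2))
    return float(np.mean(errs))
```

A smooth deterministic trajectory (e.g. a sampled sine wave) yields a much lower translation error than white noise, which is exactly the property exploited for discriminating writers.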
4. Conclusion
In this work translation error (TE), a measure for detecting determinism in a time series, is used to define a distance measure, and an algorithm for calculating the distance between the trajectories of two online handwriting samples of the same piece of writing has been proposed. The effectiveness of the measure for possible use as a dynamic biometric technology for identity detection has been examined in comparison with a popular technique, the DP matching algorithm, on the writer verification problem with online handwriting.

Table 1. Intra writer variability for different writers

Writer      Time Series X (TE / DP)    Time Series Y (TE / DP)
Writer A    0.096 / 0.213              0.104 / 0.252
Writer B    0.152 / 0.232              0.139 / 0.301
Writer C    0.160 / 0.334              0.126 / 0.212
Writer D    0.118 / 0.223              0.102 / 0.193
Writer E    0.126 / 0.237              0.127 / 0.253
Average     0.130 / 0.248              0.120 / 0.242
In the simple experiment conducted here it is found that translation error based similarity or distance calculation between trajectories is better than the DP matching based measure. This measure can also be used for feature evaluation for multivariate time series. In our experiment the values in the tables indicate that x trajectories are better than y trajectories for identity detection. The measure can also be extended to multivariate time series for authentication problems, using all the time series information combined. At present we are conducting experiments with a larger number of writers and would like to apply this measure to the writer verification problem from online handwriting.

Table 2. F_ratio of one writer with rest of the writers

Writer      Time Series X (TE / DP)    Time Series Y (TE / DP)
Writer A    178.24 / 346.88            104.11 / 235.61
Writer B    185.85 / 210.11            144.34 / 141.84
Writer C    104.76 / 4.68              91.88 / 107.30
Writer D    167.06 / 0.14              142.75 / 39.49
Writer E    85.04 / 244.39             92.91 / 210.85
Average     212.84 / 90.79             120.72 / 141.30
References
1. R. Wayland et al., Physical Review Letters 70(5), 580 (1993).
2. V. S. Nalwa, Proc. of IEEE 85, 215 (1997).
3. A. Kholmatov and B. Yanikoglu, Pattern Recognition Letters 26(15), 2400 (2005).
4. http://iris.usc.edu/Vision-Notes/Bibliography/char1012.html
5. F. Leclerc and R. Plamondon, International Journal of Pattern Recognition and Artificial Intelligence 8(3), 643 (1994).
6. R. Plamondon and G. Lorette, Pattern Recognition 22, 107 (1989).
7. P. Zhao et al., IEICE Trans. Inf. & Syst. E79-D(5), 535 (1996).
8. H. Feng and C. C. Wah, Pattern Recognition Letters 24(16), 2943 (2003).
9. M. M. Shafie and H. R. Rabiee, Proc. of ICDAR 2003 (2003).
Divide-and-Conquer Strategy Incorporated Fisher Linear Discriminant Analysis: An Efficient Approach for Face Recognition
S. Noushath*, G. Hemantha Kumar and V. N. Manjunath Aradhya
Department of Studies in Computer Science, University of Mysore, Mysore-570006, INDIA
E-mail: [email protected]*

P. Shivakumara
Department of Computer Science, School of Computing, National University of Singapore, Singapore
E-mail: [email protected]
Fisher linear discriminant analysis (FLD) is one of the most popular feature extraction methods in pattern recognition; it obtains a set of so-called projection directions such that the ratio of the between-class and within-class scatter matrices reaches its maximum. In reality, however, the dimension of the patterns can be so high that the conventional way of obtaining Fisher projections makes the computation a tedious task. To alleviate this problem, divide-and-conquer strategy incorporated FLD (dcFLD) is presented in this paper with two objectives: one is to sufficiently utilize the contribution made by local parts of the whole image, and the other is to still follow the same simple mathematical formulation as FLD. In contrast to the traditional FLD method, which operates directly on the whole pattern represented as a vector, dcFLD first divides the whole pattern into a set of subpatterns and acquires a set of projection vectors for each partition to extract the corresponding local sub-features. These local sub-features are then conquered to obtain global features. Experimental results on several image databases comprising faces and objects reveal the feasibility and effectiveness of the proposed method.

Keywords: Divide-and-conquer Strategy; Fisher Linear Discriminant Analysis; Principal Component Analysis; Face Recognition; Object Recognition
1. Introduction

Principal Component Analysis (PCA)1 and Fisher Linear Discriminant analysis (FLD),2 known respectively as the eigenface and fisherface methods, are the two state-of-the-art subspace methods in face recognition. Using these techniques, a face image is efficiently represented as a feature vector of low dimensionality. The features in such a subspace provide more salient and richer information for recognition than the raw image. It is this success which has made face recognition (FR) based on PCA and FLD very active, although they have been investigated for decades. To further exploit the potential of the PCA and FLD methods, new techniques called 2DPCA5 and 2DLDA3,4 were respectively proposed. (Refs. 3 and 4 are two similar methods found in the literature, referred to respectively as 2DFLD and 2DLDA; in all our discussions, whenever we refer to Ref. 4 it also implies Ref. 3, and vice versa.) Although these methods proved to be efficient in terms of both computational time and
accuracy, a vital unresolved problem is that these methods require a huge feature matrix for representation. To alleviate this problem, the (2D)2LDA6 method was proposed, which gave the same or even higher accuracy than the 2DLDA method. Further, it was shown in Ref. 6 that 2DLDA essentially works in the row direction of images. In this way, alternative 2DLDA (A2DLDA)6 was also proposed, which works in the column direction of images. By simultaneously combining both row and column directions of images, 2-directional 2DLDA, i.e. (2D)2LDA, was proposed. Unlike the 2DLDA and A2DLDA methods, the DiaFLD7 method seeks optimal projection vectors along the diagonal direction of face images, embracing both row and column information at the same instant. Furthermore, 2DFLD and DiaFLD were combined in Ref. 7 to achieve efficiency in terms of both accuracy and storage requirements. In spite of the success achieved by the above-mentioned variations3,4,6,7 of the original FLD method, there are still some serious flaws that need
to be addressed. The main disadvantage of the FLD method is that, when the dimensionality of the given pattern is very large, extracting features directly from these large-dimensional patterns causes processing difficulties, such as the computational complexity of the large-scale scatter matrices constructed from the training set. Furthermore, because it utilizes only the global information of images, it is not effective under extreme facial expression, illumination and pose variations. Hence, in this paper, we make a successful attempt to overcome these problems by first partitioning a face image into several smaller sub-patterns (sub-images) and then applying a single FLD to each of them. It has been reported that changes due to lighting conditions and facial expressions emphasize some specific parts of the face more than others.8 Consequently, variations in illumination or expression will affect only some sub-images rather than the whole image, and therefore the local information of a face image may be better represented. Furthermore, the sub-pattern dividing process can also help to increase diversity, making the common and class-specific local features easier to identify.9 This means that the different contributions made by different parts of the image are more emphasized, which in turn helps to enhance robustness to both illumination and expression variation. These are the reasons which motivated us to adopt the sub-pattern dividing process to overcome the aforementioned drawbacks of the original FLD method. In the first step of this method, an original whole pattern denoted by a vector is partitioned into a set of equally sized sub-patterns in a non-overlapping way, and all sub-patterns sharing the same original feature components are collected from the training set to compose a corresponding sub-pattern training set.
In the second step, FLD is performed on each such sub-pattern training set to extract its features. A single global feature is then synthesized by concatenating the FLD-projected features of all sub-patterns. Finally, a nearest neighbor classifier is used for recognition. Experiments on different image databases compare the classification performance of dcFLD with other linear discrimination methods.

The rest of the paper is organized as follows: the algorithm is detailed in section 2. Experiments are carried out in section 3 to evaluate dcFLD and other subspace analysis methods using a wide range of image databases. Finally, conclusions are drawn in section 4.

2. Proposed dcFLD

There are three main steps in the proposed dcFLD algorithm: (1) partition face images, denoted by vectors, into sub-patterns, (2) perform FLD subpattern-by-subpattern to extract features from a set of large-dimensional patterns, and (3) classify an unknown image.

2.1. Image Partition
Suppose that there are N training samples A_k (k = 1, 2, ..., N), denoted by m-by-n matrices, which contain C classes, and the ith class C_i has n_i samples. Now, an original whole pattern denoted by a vector is partitioned into K d-dimensional subpatterns in a non-overlapping way and reshaped into a d-by-K matrix A_i = (A_i1, A_i2, ..., A_iK), with A_ij being the jth subpattern of A_i, for i = 1, ..., N and j = 1, ..., K. To form the jth training subpattern set TS_j, we collect the jth subpatterns of all A_i, i = 1, ..., N. In this way, K separate subpattern sets are formed.

2.2. Apply FLD on K subpatterns
Now, according to the second step, conventional FLD is applied to the jth subpattern set TS_j to seek the corresponding projection sub-vectors U_j = (u_j1, u_j2, ..., u_jt), selecting the t eigenvectors corresponding to the t largest eigenvalues obtained by maximizing the ratio of the determinants of the between-class and within-class scatter matrices of the projected samples. Analogous to the fisherface method,2 define the jth between-class and within-class sub scatter matrices, G_bj and G_wj, respectively, as follows:

G_bj = Σ_{i=1}^{C} n_i (Ā_ij − Ā_j)(Ā_ij − Ā_j)^T    (1)

G_wj = Σ_{i=1}^{C} Σ_{A_kj ∈ C_i} (A_kj − Ā_ij)(A_kj − Ā_ij)^T    (2)

Here Ā_j = (1/N) Σ_{i=1}^{N} A_ij, j = 1, 2, ..., K, are the subpattern means, Ā_ij = (1/n_i) Σ_{A_kj ∈ C_i} A_kj is the ith-class jth-subpattern mean, and A_kj is the jth subpattern of the kth sample belonging to the ith class. After obtaining all individual projection sub-vectors from the partitioned subpatterns, extract the corresponding sub-features Y_j from each subpattern of a given whole pattern Z = (Z_1, Z_2, ..., Z_K) using the following equation:

Y_j = U_j^T Z_j    (3)

Now synthesize them into a global feature as follows:

Y = (Y_1^T, ..., Y_K^T)^T = (Z_1^T U_1, ..., Z_K^T U_K)^T    (4)

It is interesting to note that when K = 1 and d = m x n, dcFLD reduces to the standard Fisherface method. Thus we can say that the Fisherface method is a special case of the proposed dcFLD method.

2.3. Classification

In order to classify an unknown face image I, the image is first vectorized and then partitioned into K sub-patterns (I_1, I_2, ..., I_K) in the same way as explained in section 2.1. Using the projection sub-vectors, the sub-features of the test sample I are extracted as follows:

F_j = U_j^T I_j,  j = 1, 2, ..., K    (5)

Since one classification result for the unknown sample is generated independently in each subpattern, there are in total K results from the K subpatterns. To combine the K classification results from all subpatterns of the face image I, a decision matrix D(I) = (d_ij)_{N x K} is constructed, where d_ij corresponds to the jth subpatterns of I and the ith person; d_ij is set to 1 if the computed identity of the unknown sample's jth subpattern and the ith person's identity are identical, and 0 otherwise. Consequently, the total confidence value that the test image I belongs to the ith person is defined as:

TC_i(I) = Σ_{j=1}^{K} d_ij    (6)

And the final identity of the test image I is determined as follows:

Identity(I) = arg max_{1 ≤ i ≤ N} TC_i(I)    (7)

2.4. Image Reconstruction

In the whole-pattern based approach, the feature vector and the eigenvectors can be used to reconstruct the image of a face. Similarly, in this sub-pattern based approach, a face image can be reconstructed in the following way:

A_j ≈ U_j F_vj + Ā_j,  j = 1, ..., K    (8)

where U_j indicates the projection vectors of the jth subpattern obtained through PCA, F_vj is the feature vector of the image we are seeking to reconstruct, and Ā_j is the jth sub-pattern mean.
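As a concrete illustration of steps (1) and (2) and Eqs. (1)-(4), the following sketch partitions vectorized images into K contiguous sub-patterns, builds the per-sub-pattern scatter matrices, and concatenates the projected sub-features into a global feature. This is a minimal reading of the method, assuming contiguous equal-length partitions; the function names are ours, not the authors'.

```python
import numpy as np

def dcfld_train(X, y, K, t):
    """Sketch of dcFLD training. X: (N, D) vectorized images, y: (N,) labels,
    K: number of sub-patterns (D must be divisible by K), t: projection
    vectors kept per sub-pattern. Returns K projection matrices of shape (d, t)."""
    N, D = X.shape
    d = D // K
    subs = X.reshape(N, K, d)           # N samples, K sub-patterns of length d
    projections = []
    for j in range(K):
        S = subs[:, j, :]               # training set TS_j for sub-pattern j
        mean_all = S.mean(axis=0)
        Gb = np.zeros((d, d))
        Gw = np.zeros((d, d))
        for c in np.unique(y):
            Sc = S[y == c]
            mc = Sc.mean(axis=0)
            Gb += len(Sc) * np.outer(mc - mean_all, mc - mean_all)  # Eq. (1)
            Gw += (Sc - mc).T @ (Sc - mc)                           # Eq. (2)
        # eigenvectors of pinv(Gw) @ Gb for the t largest eigenvalues
        evals, evecs = np.linalg.eig(np.linalg.pinv(Gw) @ Gb)
        order = np.argsort(-evals.real)[:t]
        projections.append(evecs.real[:, order])
    return projections

def dcfld_features(x, projections):
    """Concatenate per-sub-pattern projections into a global feature (Eqs. 3-4)."""
    K = len(projections)
    zs = x.reshape(K, -1)
    return np.concatenate([projections[j].T @ zs[j] for j in range(K)])
```

A nearest-neighbor vote over the K sub-pattern results, as in Eqs. (5)-(7), would then complete the classifier.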
3. Experiments

In this section, a series of experiments is presented to evaluate the performance of the proposed dcFLD by comparison with existing methods.1,2,4-7,10,11 All our experiments are carried out on a PC with a P4 3GHz CPU and 512 MB RAM under the Matlab 7 platform. For all the experiments, the nearest neighbor classifier is employed for classification. Unless stated otherwise, each experiment is repeated 25 times by varying the number of projection vectors t (t = 1, 2, ..., 20, 25, 30, 35, 40, 45). Since t has a considerable impact on classification performance as well as on the dimension of the subpattern (for the proposed dcFLD method), we choose the t which corresponds to the best classification result on the image set.

3.1. Image databases
The aforementioned algorithms are tested on several image databases comprising faces and objects. We carried out the experiments on two face databases, ORL12 and Yale,2 and also on an object database, COIL-20.13 The ORL database contains 400 images of 40 adults, 10 images per person, while the Yale database contains 165 images of 15 adults, 11 images per person. Images in the Yale database feature frontal-view faces with different facial expressions and illumination conditions. Besides these variations, images in the ORL database also vary in facial details (with or without glasses) and head pose. COIL-20 is a database of 1440 gray-scale images of 20 objects. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary object pose with respect to a fixed camera. Images of the objects were taken at pose intervals of 5 degrees, which corresponds to 72 images per object.
3.2. Results on the face databases

Table 1. Best recognition accuracy (%) for varying training sample number p and dimension of subpattern (d-K)

d-K       p=2        p=4        p=5        p=6        p=8
92-112    88.25(11)  94.50(11)  97.00(08)  99.00(06)  99.75(04)
112-92    87.00(12)  94.75(18)  96.00(08)  98.50(12)  100.0(06)
161-64    87.00(06)  94.75(13)  97.25(15)  99.00(09)  99.75(05)
322-32    87.00(10)  94.50(14)  96.50(09)  98.25(09)  100.0(06)
644-16    85.50(07)  94.25(18)  96.50(12)  98.00(09)  99.75(08)
2576-4    82.00(20)  92.75(19)  94.00(18)  95.75(08)  99.25(09)
We first conduct an experiment with the ORL database. As noted above, 40 people, with 10 images of each person of size 112 x 92, are used. Our preliminary experiments show that the classification performance of the proposed method is affected by the dimension of the subpattern (d). In order to determine the effect of the subpattern dimension on the available data, we check the classification performance by varying both the number of training samples and the dimension of the subpattern. For this, we randomly chose p images from each class to construct the training database, the remaining images being used as test images. To ensure sufficient training, a value of at least 2 is used for p. It can be ascertained from Table 1 that the recognition accuracy of the proposed method is greatly influenced by the size of the subpattern dimension (d). It is apparent from the table that the smaller d is (or the larger the number of subpatterns K), the better the recognition accuracy. Values in parentheses denote the number of projection vectors used to attain the best recognition accuracy. It is also observed that the recognition accuracy is comparatively better when d and K take the values 161 and 64, respectively. Hence in all our later experiments on ORL, we use these values for d and K. The next set of experiments on the ORL database is conducted by each time randomly selecting 5 images per person for training and the remaining 5 per person for testing. This experiment is independently carried out 40 times, and the averages of the results are tabulated in Table 2. Experiments on the Yale database are carried out by adopting a leave-one-out strategy, that is, leaving out one image per person each time for testing, with all of the remaining images used for training. Each image is manually cropped and resized to
235 x 195 pixels in our experiment. This experiment is repeated 11 times by leaving out a different image per person every time. Results depicted in Table 2 are the average of the 11 runs. Here the values of d and K are empirically fixed to 235 and 195, respectively, to yield optimal recognition accuracy. In Table 2, μ is defined as follows:

μ = (number of selected eigenvectors / number of all the eigenvectors) x 100
Here all the eigenvectors are sorted in descending order of their corresponding eigenvalues, and the selected eigenvectors are those associated with the largest eigenvalues. It can be seen from Table 2 that on ORL, dcFLD achieves a better performance improvement over FLD and the other methods. Moreover, the experiments on the Yale database clearly exhibit the efficiency of the proposed method under varied facial expressions and lighting configurations. On the Yale database, dcFLD achieves up to 4-10% performance improvement over PCA, up to 4-9% improvement over the Fisherface method, and up to 1-4% improvement over the A2DLDA method. In contrast to the 2DLDA and (2D)2LDA methods, a significant improvement in accuracy is achieved (up to 13%). Thus not only does dcFLD remain stable under ordinary conditions (for ORL), it also exhibits efficiency and high robustness when there are wide variations in both lighting configurations and facial expressions (for Yale). Finally, to further exhibit the essence of this subpattern based approach over the conventional whole-pattern based approach, we conduct a simple reconstruction test. Taking one image from the ORL as
Table 2. Accuracy comparison of various approaches.

Database  μ      PCA    FLD    2DLDA  A2DLDA  2D2LDA  2DPCA  DiaFLD  DiaFLD+2DFLD  dcFLD
ORL       10.0   90.43  93.73  93.80  93.67   93.33   93.95  93.80   93.73         94.63
ORL       12.5   92.46  94.00  94.73  94.88   92.71   93.78  94.66   92.48         94.70
ORL       20.0   93.61  93.93  92.86  93.88   93.88   93.00  94.00   93.00         94.15
Yale      6.66   83.47  88.42  88.42  93.38   90.08   86.77  89.25   92.56         93.38
Yale      10.00  86.77  88.42  85.12  92.56   91.73   87.60  89.25   92.56         92.56
Yale      13.33  85.95  86.77  85.12  92.56   88.42   87.60  88.42   92.56         95.04
Yale      16.66  87.60  88.42  85.12  92.56   85.12   87.60  87.60   90.90         93.38
Yale      20.00  87.60  86.77  84.29  92.56   80.16   87.60  87.60   89.25         93.38
Fig. 1. Five reconstructed images by whole-pattern based approach (top row) and sub-pattern based approach (bottom row).
an example, we can determine its five reconstructed images for varying numbers of projection vectors t (t = 10, 20, 30, 40, 50). These images are shown in Fig. 1. It is quite evident that the sub-pattern dividing approach yields higher quality images than the whole-pattern based approach when using a similar number of principal components. Note that our objective is to demonstrate the effectiveness of the subpattern based approach over the whole-pattern based approach; hence in both approaches we considered the PCA features for reconstruction. It is also well known that PCA has better image reconstruction accuracy than the FLD method.

3.3. Results on the object database
Inspired by the conviction that successful methods developed for FR1,5 should be extendable to object recognition, as in Refs. 10 and 11 respectively, in this section we verify the applicability of the dcFLD method for objects by considering the COIL-20 database. This database contains gray-level images of size 128 x 128 pixels. In this experiment, we empirically fixed the values of both d and K to 128. For a comparative analysis of the various approaches, we consider the first 36 views of each object for training and the remaining views for testing, so the size of both the training and the testing database is 720. Table 3 gives a comparison of nine methods on top recognition accuracy, the corresponding dimension of the feature vector/matrices, and running time costs. It reveals that the top recognition accuracy of the proposed dcFLD method is higher than that of the other methods. These results certify the pertinence of the dcFLD method for object databases apart from face images. The only demerit of the proposed method is that it consumes more time than other contemporary methods such as 2DPCA, 2DLDA, etc. This is because the proposed method involves the extra work of subpattern set formation and then obtaining the corresponding projection vectors and features. But, as present-day computers have ample processor speed, this has no practical influence on the classification performance of the proposed method.

4. Conclusions and future work

In this paper, a novel subspace analysis method called dcFLD is proposed for efficient and robust face/object recognition. The proposed method utilizes the separately extracted local information from each sub-pattern set and thereby possesses robustness and better recognition performance. Indeed, the practicality of dcFLD is well evidenced in the experimental results for the Yale face database, where
Table 3. Comparison of different methods on COIL-20 database.

Methods        Top Recognition Accuracy  Running Time (s)  Dimension
PCA10          93.89                     157.91            35
FLD            91.25                     177.29            40
2DPCA11        94.30                     30.45             128 x 5
2DLDA          93.05                     33.69             128 x 7
A2DLDA         88.88                     29.94             9 x 128
2D2LDA         94.72                     59.69             11 x 11
DiaFLD         93.75                     29.91             128 x 6
DiaFLD+2DFLD   94.10                     61.64             9 x 9
dcFLD          95.77                     129.52            19 x 128
dcFLD improves recognition accuracy by a minimum of 3-4% over the other discrimination methods. Furthermore, the results on the COIL-20 object database also show that the proposed method is feasible and effective. We also believe that this method is equally effective in scenarios where a person's face is occluded by sunglasses, a scarf, etc., which is an interesting issue for future work. Nevertheless, there are still some aspects of the dcFLD method that deserve further study. Is the proposed approach the best way of subdividing the full pattern? In addition, dcFLD needs more coefficients for image representation than FLD. Is there any way to reduce the number of coefficients while keeping the same accuracy? Finally, it is still unclear how to choose the optimal number of subpatterns to obtain the best recognition accuracy. These are crucial issues which give scope for future work.

References
1. M. Turk and A. Pentland, Journal of Cognitive Neuroscience 3, 71 (1991).
2. P. Belhumeur, J. Hespanha and D. Kriegman, IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 711 (1997).
3. H. Xiong, M. Swamy and M. Ahmed, Pattern Recognition 38, 1121 (2005).
4. M. Li and B. Yuan, Pattern Recognition Letters 26, 527 (2005).
5. J. Yang, D. Zhang, A. Frangi and J. Yang, IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 131 (2004).
6. S. Noushath, G. H. Kumar and P. Shivakumara, Pattern Recognition 39, 1396 (2006).
7. S. Noushath, G. Kumar and P. Shivakumara, Neurocomputing 69, 1711 (2006).
8. A. M. Martinez, IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 748 (2002).
9. X. Tan, S. Chen, Z. Zhou and F. Zhang, IEEE Transactions on Neural Networks 16, 875 (2005).
10. H. Murase and S. Nayar, International Journal of Computer Vision 14, 5 (1995).
11. P. Nagabhushan, D. Guru and B. Shekar, Pattern Recognition 39, 721 (2006).
12. www.uk.research.att.com/facedatabase.html
13. www1.cs.columbia.edu/CAVE/research/softlib/coil-20.html
Ear Biometrics: A New Approach

Anupam Sana and Phalguni Gupta
Indian Institute of Technology Kanpur, Kanpur (U.P.), India-208016
E-mail: {sanupam,pg}@iitk.ac.in

Ruma Purkait
Department of Anthropology, Saugor University, Saugor-470003, India
E-mail: [email protected]
Abstract. The paper presents an efficient ear biometrics system for human recognition based on the discrete Haar wavelet transform. In the proposed approach the ear is detected from a raw image using a template matching technique. The Haar wavelet transform is used to decompose the detected image and compute the wavelet coefficient matrices, which are clustered into its feature template. A decision is made by matching one test image with n trained images using a Hamming distance approach. The system has been implemented and tested on two image databases pertaining to 600 individuals from IITK and 350 individuals from Saugor University, India. The accuracy of the system is more than 96%.

Keywords: Haar wavelet; Wavelet Coefficient Matrices; Ear detection; Template matching; Hamming distance.
1. Introduction

Biometrics is the automated method of identifying or verifying the identity of an individual on the basis of physiological and behavioral characteristics. The ear, which is easily detectable in profile images, can be advocated as a recognition tool. The ear has a few advantages over facial recognition technology. It is more consistent than the face as far as variability due to expression, orientation of the face and the effect of aging, especially in the cartilaginous part, is concerned. Its location on the side of the head makes detection easier. Data collection is convenient in comparison with more invasive technologies like iris, retina, fingerprint, etc. The possibility of the ear as a tool for human recognition was first recognized by the French criminologist Alphonse Bertillon.1 Nearly a century later, Alfred Iannarelli2 devised a non-automatic ear recognition system in which more than ten thousand ears were studied, and no two ears were found to be exactly alike. Burge and Burger3,4 proposed an ear biometrics approach based on building a neighborhood graph from Voronoi diagrams of the detected ear edges. Its main disadvantage is the detection of erroneous curve segments, i.e. the system may not be able to differentiate real ear edges from non-edge curves. Choras5 also used an approach for feature extraction based on contour
detection, but the method suffers from the same disadvantage of erroneous curve detection. Hurley et al.6 proposed an approach based on a force field transformation to determine the energy lines, wells and channels of the ear. Although it was tested on a small data set, the results are quite promising. Victor et al.7 presented an approach based on Principal Component Analysis (PCA) for face and ear recognition; the face based system gave a better performance than the ear. In a similar experiment by Chang et al.,8 no significant difference between the performance of ear and face was found. Moreno et al.9 analysed the ear using neural classifiers and macro-features extracted by compression networks, achieving better recognition results (without considering rejection thresholds) using only the compression network. Chen and Bhanu10 worked on 3D ear recognition using a local surface descriptor to represent the ear in 3D; the system performance is evaluated on a real range-image database of 52 subjects. This paper proposes a novel approach for feature extraction from the ear image. The wavelet is an emerging technique for image analysis, and the Haar wavelet is simple and reliable in this field. So in this paper the discrete Haar wavelet transform is applied to ear images and wavelet coefficients are extracted. Section 2 discusses the proposed approach. The experimental results are presented in Section 3. Conclusions are given in the last section.
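The Hamming-distance decision described in the abstract (matching one test template against n trained templates) can be sketched as follows. The binarized feature templates and the function names are our assumptions for illustration; the clustering of wavelet coefficients into template bits is not shown here.

```python
import numpy as np

def hamming_distance(t1, t2):
    """Fraction of disagreeing bits between two binary feature templates."""
    t1 = np.asarray(t1, dtype=bool)
    t2 = np.asarray(t2, dtype=bool)
    return np.count_nonzero(t1 ^ t2) / t1.size

def best_match(test_template, trained_templates):
    """Match one test template against n trained templates; return the
    index of the closest template and its distance."""
    dists = [hamming_distance(test_template, t) for t in trained_templates]
    i = int(np.argmin(dists))
    return i, dists[i]
```

A distance of 0 means the templates agree on every bit; recognition accepts the closest trained template, optionally subject to a threshold.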
2. The Proposed Approach

This section discusses a new approach for feature extraction in ear biometrics. Ear images generally have a large background, so the ear must first be detected, which is done by template matching; the detected image is then scaled to a constant size. From these ear images, features are extracted as Haar wavelet coefficients from the wavelet-decomposed image, and a feature template is stored for each training image. A test feature template, extracted in the above-mentioned way, is matched against the large set of training templates, giving the best set of matches based on individual matching scores. The present approach is implemented in four major steps. Step 1 is image acquisition, where the ear image is captured by a camera in the laboratory environment. Step 2 is image preprocessing, where the RGB image is converted into a gray-scale image, the ear is detected and the scale is normalized. Step 3 is feature extraction, where unique features of the ear are extracted and stored as a trained template. In Step 4, matching is performed to get the best matches.

2.1. Image Acquisition

Ear images are collected at two centers, Indian Institute of Technology Kanpur (IITK) and Saugor University. The images are captured in the laboratory environment, where moisture and humidity are normal and illumination is controlled. One bright light source is used to illuminate the whole ear, reducing the shading of ear edges that generally appears in the image. A stand is provided for the subject to rest the chin on, which reduces rotational effects in the image. At IITK, a CCD camera (JAI CV-M90 3-CCD RGB color camera) at a distance of 28 cm has been used to take the images of the ear. Using the ITI camera configurator, three images of each ear have been acquired for six hundred subjects. Another set of images has been taken of three hundred and fifty subjects at the Anthropology department laboratory of Saugor University. These have been captured using a digital camera (Kodak EasyShare CX7330, 3.2 megapixel) at a distance of 24 to 28 cm from the subject. For each of these subjects, three images of the ear are taken. Out of the three images so captured, two have been used for training and one for testing purposes.
Fig. 1. Image database: (a) IITK database, (b) Saugor University database.
2.2. Image Preprocessing
The raw image may not be suitable for feature extraction due to its large background, so some preprocessing is required. The important steps involved are grayscale conversion, ear detection and scale normalization.

2.2.1. Grayscale Conversion

RGB images are converted into grayscale using the following formula:

I_g = 0.2989 * I(R) + 0.5870 * I(G) + 0.1140 * I(B)    (1)

where I(R), I(G) and I(B) are the red, green and blue channel values of the RGB image, and I_g is an intensity image with integer values ranging from a minimum of zero to a maximum of 255.

2.2.2. Ear Detection

Ear detection is implemented using a simple template matching technique. First, a set of images is manually cropped to obtain ear images of different sizes. These images are decomposed to level 2 using the Haar wavelet, and the decomposed images are stored as trained templates. The input raw image is also decomposed to level 2 using the same technique. Thereafter, each template is retrieved from the
trained set and matched with the same-sized overlapping blocks of the decomposed input image. Thus, for each trained template, the best matched block in the input image is traced. Among those blocks the overall best matched block is chosen, and the corresponding region is extracted from the original image (Fig. 2).

Fig. 2. (a) Raw ear image, (b) ear traced, (c) detected ear.

2.2.3. Scale Normalization

The cropped ear images may be of varying size, so their feature sets may also vary. Hence the images are normalized to a constant size by resizing. If two images are of different sizes, e.g. one of size (x' × y') and the other of size (x'' × y''), both are mapped to a constant size:

I(x', y') = I(x, y)    (2)

I(x'', y'') = I(x, y)    (3)

2.3. Wavelet Transform and Feature Extraction

This section discusses the wavelet transform and introduces the method for extracting features from the wavelet coefficients.11 Every wavelet is defined by one scaling function and one wavelet function. The Haar scaling function φ(x) has the value one on the interval from 0 to 1 and zero elsewhere; φ(x) is formulated as follows:

φ(x) = 1 if 0 ≤ x < 1, and 0 otherwise.    (4)

Fig. 3. Haar scaling function φ(x).

The Haar wavelet function Ψ(x) is a step function: it has the value 1 for x greater than or equal to 0 and less than 1/2, the value −1 for x greater than or equal to 1/2 and less than 1, and the value 0 on any other interval. Ψ(x) is discontinuous at x = 0, 1/2 and 1, and is defined by

Ψ(x) = 1 for x ∈ [0, 1/2), −1 for x ∈ [1/2, 1), and 0 otherwise.    (5)

Fig. 4. The Haar wavelet Ψ(x).
A standard decomposition of an image (a two-dimensional signal) is easily done by first performing a one-dimensional transform on each row, followed by a one-dimensional transform of each column. Thus the input ear image is decomposed (Fig. 5) into approximation (CA), vertical (CV), horizontal (CH) and diagonal (CD) coefficients using the wavelet transform, and the approximation coefficient (CA) is further decomposed into four coefficients. This sequence of steps is repeated to obtain the coefficients of a four-level wavelet transform. After analyzing the coefficients at all four levels, it is found that the coefficients from level 1 to level 4 are almost the same. To reduce redundancy among the coefficients of the different levels, only one level is chosen: all the diagonal, vertical and horizontal coefficients of the fourth level are retained, which reduces space complexity and discards the redundant information. The feature vector comprises the coefficients [CD4 CV4 CH4]. This coefficient matrix, which represents the unique ear
pattern, is binarised by comparing the sign (negative or positive) of its elements. The binary feature template of the approximation coefficient matrix from the level-two wavelet transform is shown in Fig. 6.

Anupam Sana, Phalguni Gupta and Ruma Purkait

Fig. 5. Wavelet decomposition levels: the first-level Haar wavelet transform yields CA1, CH1, CV1 and CD1, and the second level decomposes CA1 into CA2, CH2, CV2 and CD2.
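The feature-extraction pipeline of Sec. 2.3 — repeated Haar decomposition, keeping the detail coefficients of the last level, then binarising by sign — can be sketched as follows. The single-level Haar step is hand-rolled here; the helper names (`haar2d`, `ear_feature_template`) are ours, and the non-negative → 1 binarisation convention is an assumption:

```python
import numpy as np

def haar2d(img):
    """One level of the standard 2-D Haar transform: returns (CA, CH, CV, CD).
    Rows first, then columns, using pairwise averages and differences."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # row low-pass
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # row high-pass
    ca = (a[0::2, :] + a[1::2, :]) / 2.0      # approximation
    ch = (a[0::2, :] - a[1::2, :]) / 2.0      # horizontal detail
    cv = (d[0::2, :] + d[1::2, :]) / 2.0      # vertical detail
    cd = (d[0::2, :] - d[1::2, :]) / 2.0      # diagonal detail
    return ca, ch, cv, cd

def ear_feature_template(img, levels=4):
    """Decompose to `levels` levels, keep [CD, CV, CH] of the last level,
    and binarise by sign as described in the text."""
    ca = np.asarray(img, dtype=float)
    for _ in range(levels):
        ca, ch, cv, cd = haar2d(ca)
    feats = np.concatenate([cd.ravel(), cv.ravel(), ch.ravel()])
    return (feats >= 0).astype(np.uint8)   # 1 for non-negative, 0 for negative
```

Image sides must be divisible by 2^levels for this sketch; e.g. a 16×16 crop decomposed to level 2 yields three 4×4 detail blocks and a 48-bit binary template.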
Fig. 6. Wavelet transform of an ear into level two.

2.4. Matching

The testing binary template (S) is matched against the trained templates (T) of the database using the Hamming distance. The Hamming distance (HD) between two templates of size n × m is calculated using the equation

HD = (1 / (n × m)) Σ_{i,j} T_{ij} ⊕ S_{ij}    (6)

Here the templates T and S are XOR-ed element-wise and HD, the matching score between the training and testing templates, is computed. Therefore, for each trained template there is a matching score, and the best matches can be chosen. The matching scores of all the templates are statistically analyzed in the next section.

3. Experimental Results

The experiment is conducted on two different databases, IITK and Saugor University. The IITK image database has images belonging to 600 individuals, with three images acquired per person (600 × 3), while the database collected at Saugor University consists of 350 individuals (350 × 3). In Fig. 7 the false acceptance rate (FAR) and false rejection rate (FRR) are plotted at different threshold values (between 0 and 1). From the two curves in this figure it is found that the system gives an equal error rate (EER) of 3.4% and 2.4% at thresholds 0.303 and 0.323 for the IITK and Saugor University image databases, respectively. In Fig. 8 the accuracy is plotted at different threshold values; at the EER threshold the accuracy is 96.6% and 97.6%, respectively, and the system gives a maximum accuracy of 97.8% and 98.2% at threshold values of 0.29 and 0.28 for the IITK and Saugor University image databases, respectively. The Receiver Operating Characteristic (ROC) is shown in Fig. 9. From the two curves in this figure it is observed that the system gives more than 95% genuine acceptance at an FAR lower than 1%.

Fig. 7. FAR and FRR curves for the IITK database and the Saugor University database.
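The Hamming-distance matching of Eq. (6) reduces to an element-wise XOR followed by normalisation; a minimal sketch (the `matches` helper and its example threshold are ours, loosely based on the EER thresholds reported in Sec. 3):

```python
import numpy as np

def hamming_distance(T, S):
    """Normalised Hamming distance between two equal-size binary templates (Eq. 6)."""
    T = np.asarray(T, dtype=np.uint8)
    S = np.asarray(S, dtype=np.uint8)
    assert T.shape == S.shape
    return np.count_nonzero(T ^ S) / T.size

def matches(trained, test, threshold=0.303):
    """Accept the test template if its best score against the trained
    templates falls below the operating threshold."""
    return min(hamming_distance(T, test) for T in trained) < threshold
```

A matching pair of templates gives HD close to 0; two independent random binary templates give HD close to 0.5.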
4. Conclusion

This paper has proposed a new approach to ear biometrics that has certain advantages over those reported earlier. As two training images are sufficient for database preparation, the time taken to test against a testing image is significantly reduced. The small size of the ear pattern template makes it easy to handle. For these reasons the approach is applicable at both large and small scale. Our approach achieved more than 96% accuracy on the two databases, indicating a fairly reliable system.
Ear Biometrics: A New Approach

Fig. 8. Accuracy curves for the IITK database and the Saugor University database.

Fig. 9. Receiver Operating Characteristic curves for the IITK database and the Saugor University database.

5. Acknowledgement

The study is supported by the Ministry of Communication and Information Technology and the University Grants Commission, Govt. of India. The authors acknowledge Ms. Hunny Mehrotra for preparing the manuscript and for assistance in conducting the experiments, and Mr. Pradeep Nayak for assistance in conducting the experiments.

References
1. A. Bertillon, La photographie judiciaire, avec un appendice sur la classification et l'identification anthropométriques (Gauthier-Villars, Paris, 1890).
2. A. Iannarelli, Ear identification, in Proceedings of the International Workshop on Frontiers in Handwriting Recognition (Paramont Publishing Company, Fremont, California, 1989).
3. M. Burge and W. Burger, Personal identification based on pea, in Proceedings of the 21st Workshop of the Austrian Association for Pattern Recognition, 1997.
4. M. Burge and W. Burger, Ear biometrics in computer vision, in Proceedings of the 15th International Conference on Pattern Recognition, 2000.
5. M. Choras, Ear biometrics based on geometrical feature extraction, in Electronic Letters on Computer Vision and Image Analysis, 2005.
6. D. J. Hurley, M. S. Nixon and J. N. Carter, A new force field transform for ear and face recognition, in Proceedings of the IEEE International Conference on Image Processing, 2000.
7. B. Victor, K. Bowyer and S. Sarkar, An evaluation of face and ear biometrics, in Proc. Intl. Conf. Pattern Recognition, 2002.
8. K. Chang, K. W. Bowyer, S. Sarkar and B. Victor, Comparison and combination of ear and face images in appearance-based biometrics, in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003.
9. B. Moreno, A. Sanchez and J. Velez, On the use of outer ear images for personal identification in security applications, in Proceedings of the IEEE 33rd Annual International Carnahan Conference on Security Technology, 1999.
10. H. Chen and B. Bhanu, Contour matching for 3D ear recognition, in Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION-05), 2005.
11. S. Pittner and S. Kamarthi, Feature extraction from wavelet coefficients for pattern recognition tasks, in IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999.
Face Detection using Skin Segmentation as Pre-Filter
Shobana L., Anil Kr. Yekkala and Sameen Eajaz Philips Electronics India Ltd. E-mail: {shobana.lakshminarasimhan, anil.yekkala, sameen.eajaz
Face Detection has been a topic of research for several years due to its vast range of applications, varying from security surveillance to photo organization. Face Detection algorithms are also used by Face Recognition algorithms for locating faces in an image before performing the recognition. In the available literature on Face Detection, the algorithm provided by Viola and Jones is very robust in terms of detection rate, but its speed is not very satisfactory. In this paper, we propose a pre-filtering step using skin segmentation and some minor modifications to the Viola Jones algorithm. The proposed pre-filtering step and modifications improve the speed of the algorithm by approximately 2.8 times. They also improve the precision of the algorithm by reducing the number of false detections.

Keywords: Face Detection; Viola Jones; Skin Segmentation; Pre-Filter
1. Introduction

Several methods have been proposed in the literature for performing Face Detection. The available methods can be broadly categorized under two approaches. The first approach1,4,9,12 is based on conducting an exhaustive search of the entire image using windows of varying sizes and finding the face regions using features such as Haar cascades, eigen features and others. The second approach2,3 is based on skin segmentation using color properties and simple texture measures, then finding connected regions and, finally, filtering out the non-face regions using face profiles and facial features. Even though the first approach, based on exhaustive search, provides a very robust algorithm in terms of detection, its speed is not very satisfactory; for example, the Viola Jones approach takes almost 2-3 seconds to detect a face in an image of size 1024x768 on a Pentium processor. Though the second approach, based on skin segmentation, provides very satisfactory results in terms of speed, it fails to provide a robust algorithm, since it fails for images with complex backgrounds. Also, its false detection rate is quite high, since skin-colored objects are often detected as faces. In this paper, we present an improvement to the Viola Jones based Face Detection algorithm that improves both its speed and its precision by reducing the number of false detections, without affecting the recall rate. These improvements are achieved by using skin segmentation as a pre-filter and with some modifications to the approach followed by Viola Jones. The following sub-sections provide a brief
overview of the skin segmentation techniques in the available literature on Face Detection, as well as an overview of the Viola Jones Face Detection algorithm.

1.1. Skin Segmentation based Face Detection

Based on several studies6,13,14 conducted in the past, it has been found that in color images the skin regions can be detected based on their color properties. It has been observed that in the RGB color model the proportions of red, green and blue lie within prescribed limits for skin regions. Similarly, in the YCbCr model the chrominance components Cb and Cr lie within prescribed limits for skin regions. Simple texture measures are used to improve the skin segmentation performance. Skin segmentation based Face Detection is done in three steps: 1. skin segmentation based on the RGB color properties of the skin; 2. finding the connected regions; 3. filtering out the non-face regions. The advantages of skin segmentation based Face Detection are its simplicity and that it is not affected by the geometric variations of the face. Its main disadvantage is its sensitivity to the illumination conditions.

1.2. Viola Jones Method1
The Viola Jones approach is based on conducting an exhaustive search of the image using varying size windows, starting from a base window of size 24x24, gradually increasing the window size by a fixed factor of 1.25 (or any value ranging from 1.2 to 1.3).
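The window-size schedule just described (a 24×24 base window grown by a fixed factor of 1.25) can be generated as follows; a sketch, with the function name ours:

```python
def window_sizes(img_w, img_h, base=24, factor=1.25):
    """All square search-window sizes from the 24x24 base up to the image
    bounds, each 1.25x the previous, as in the Viola Jones search."""
    sizes = []
    s = base
    while s <= min(img_w, img_h):
        sizes.append(s)
        s = int(round(s * factor))
    return sizes
```

Each size in the list is then slid across the image; the number of sizes grows only logarithmically with the image dimensions because of the geometric scaling.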
Fig. 2. Integral sum: rectangles A, B, C and D with reference points 1-4 at the corners of D.
Each window region is classified as a face or non-face region based on simple features. The features used are reminiscent of the Haar basis functions proposed in Ref. 11. Three kinds of features are used, namely two-rectangular features, three-rectangular features and four-rectangular features. In two-rectangular features, the difference between the sums of pixel intensities of the two rectangular regions is considered. In three-rectangular features, the sum of the pixel intensities within the outside rectangles is subtracted from that of the center rectangle. Finally, in four-rectangular features, the difference between the sums of the pixel intensities of the diagonal pairs of rectangles is considered. Figure 1 shows the set of rectangular features used.
Fig. 1. Rectangular features.
In order to make the computation of rectangular features faster, an intermediate representation of the image called the integral image is used. The integral image at any point (x, y) is the sum of the pixels above and to the left of (x, y):

ii(x, y) = Σ_{j=0}^{x} Σ_{k=0}^{y} I(j, k)

Using the integral image, the sum of the pixel intensities within any rectangular region in the image can
be computed with just three sum operations. For example, in Figure 2 the sum of the pixel intensities of rectangle D can be computed with four array references: the value of the integral image at location 1 is the sum of the pixel intensities in rectangle A, the value at location 2 is A + B, at location 3 it is A + C, and at location 4 it is A + B + C + D. The intensity sum within D can therefore be computed as 4 + 1 − (2 + 3). The Viola Jones face detector uses a boosting cascade of rectangular features. The cascade contains several stages, trained in such a way that they progressively eliminate the non-face candidates. The initial stages in the cascade are simple, so that they can be evaluated quickly and can reject the maximum number of non-face candidates; the later stages are comparatively complex. Each stage consists of a set of features. Each input window is evaluated on the first stage of the cascade by computing the set of features belonging to that stage. If the total feature sum of the stage is below a pre-defined threshold, the window is discarded; otherwise it passes to the next stage. The same procedure is repeated at each stage. Since most of the windows in an image do not contain facial features, most windows are discarded in the initial stages, reducing the number of computations. As the Viola Jones face detector is insensitive to small variations in the scale and position of the windows, a large number of detections occur around a face region. To combine these into a single region, a grouping algorithm is applied, which groups all the overlapping rectangles that lie within an offset of 25% of their size into a single rectangle. A region is detected as a face only when a minimum of three face rectangles are obtained around that face region. This confidence check is used to reduce the number of false detections.
For a non-face region, the probability of getting three or more overlapped rectangles is relatively low.

2. Proposed Approach for Face Detection

We present an improvement to the Viola Jones algorithm by using skin segmentation as a pre-filter and by modifying the search strategy and the merge and group method. With the proposed modifications, the algorithm is performed in three steps. Figure 3 gives an overview of the proposed algorithm.
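The pre-filter combines two ingredients described in this paper: a per-pixel skin rule on normalised red and green proportions (Sec. 2.1), and the integral-image trick of Sec. 1.2 applied to the binary skin mask so that the skin-pixel count of any window costs a handful of array references. A sketch, with the threshold values as we read them from the partly garbled rule in Sec. 2.1 and all function names ours:

```python
import numpy as np

def is_skin(R, G, B):
    """Per-pixel skin test on normalised red/green proportions.
    Thresholds reconstructed from the rule in Sec. 2.1 (an assumption)."""
    total = int(R) + int(G) + int(B)   # int() avoids uint8 overflow
    if total == 0:
        return 0
    r, g = R / total, G / total
    return 1 if (0.34 < r < 0.58) and (0.22 < g < 0.35) else 0

def skin_integral(rgb):
    """Binary skin mask Iss of an HxWx3 image, stored as a zero-padded
    integral image so any window's skin count costs three additions."""
    h, w, _ = rgb.shape
    mask = np.array([[is_skin(*rgb[y, x]) for x in range(w)] for y in range(h)])
    ii = np.zeros((h + 1, w + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(mask, axis=0), axis=1)
    return ii

def skin_count(ii, x, y, w, h):
    """Skin pixels inside the w x h window with top-left corner (x, y):
    four array references, as in Fig. 2 (4 + 1 - 2 - 3)."""
    return ii[y + h, x + w] + ii[y, x] - ii[y, x + w] - ii[y + h, x]
```

A candidate window would then be kept when `skin_count(...)` exceeds the 60% threshold given in Sec. 2.2.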
2.1. Skin Segmentation

In the proposed method, skin segmentation is used as the pre-filtering step to reduce the number of windows to be searched. Skin segmentation is performed by detecting skin pixels using color properties in the RGB color space. For each pixel, the algorithm checks whether the proportions of red and green are within prescribed limits. The method can be described by the following rule:

if ((r > 0.34) and (r < 0.58) and (g > 0.22) and (g < 0.35)) then Iss = 1, else Iss = 0,

where r = R/(R+G+B) and g = G/(R+G+B).

A pixel with skin segmentation output Iss = 1 is a skin pixel, and one with Iss = 0 is a non-skin pixel. The skin segmentation is designed in such a way that rejecting a skin region is considered costlier than accepting a non-skin region. Once the skin segmentation is performed, the output binary image Iss is represented as an integral image, similar to the intensity sum:

ii_s(x, y) = Σ_{j=0}^{x} Σ_{k=0}^{y} Iss(j, k)

This representation allows the number of skin pixels in any rectangular region to be computed with just three sum operations. Since skin segmentation is based on color properties, this pre-filtering step can be used only for color images.

Fig. 3. Overview of the proposed algorithm: finding skin regions, followed by feature-based filtering.

2.2. Searching For Face Regions

Unlike the Viola Jones algorithm, the proposed approach does not perform an exhaustive search to locate the face regions. Instead, it finds probable face regions based on the number of skin pixels present in the region. If the number of skin pixels in a region is above a pre-defined threshold, the region is considered a candidate face region; based on experimental results, this threshold was found to be 60%. The Haar features are then computed for each candidate face region, using the same approach as the Viola Jones detector, to check whether the region is a face region or not. Unlike the Viola Jones approach, the Face Detection starts with the largest possible window size, depending on the image size, and gradually scales down to a window size of 24x24. The advantage of starting from the largest window size is that if a bigger window is detected as a face region, the smaller windows within that face region lying at an offset of more than 25% of the bigger window size can be discarded. Note that the regions lying within an offset of 25% are not discarded, because these windows are required for the Merge and Group algorithm.

2.3. Merge and Group

The merge and group technique is similar to that followed in the Viola Jones Face Detection. Once the merging is performed according to the Viola Jones method, the algorithm checks for overlapping face regions with a percentage of overlap greater than 50%. If such overlapping regions exist, the region with the lower average feature sum is eliminated. This results in improved precision.

3. Results

A comparative study of the performance, in terms of speed, of the proposed method with respect to the standard Viola Jones method is shown in Table 1. The study was conducted on a set of 452 images obtained from the Caltech image database. The comparison is based on the percentage of windows processed in each stage using the proposed method and
that using the Viola Jones method.

Table 1. Percentage of regions processed in each stage.

Stage   No. of      % of windows searched   % of windows searched
        features    (Viola Jones method)    (proposed method)
1       9           100                     36.738
2       16          68.15636                25.92015
3       27          37.44727                14.18722
4       32          23.80884                8.981527
5       52          12.71177                4.884061
6       53          5.70148                 2.211218
7       62          2.563128                1.001229
8       72          0.885991                0.341391
9       83          0.49522                 0.19646
10      91          0.300651                0.122056
11      99          0.200682                0.083506
12      115         0.117771                0.049768
13      127         0.083492                0.037018
14      135         0.05391                 0.024601
15      136         0.036424                0.017609
16      137         0.026771                0.013751
17      155         0.021732                0.011666
18      159         0.015506                0.00891
19      169         0.012858                0.007694
20      181         0.010599                0.00662
21      196         0.008351                0.005504
22      197         0.007104                0.004849
23      199         0.006146                0.004308
24      200         0.005452                0.003919
25      211         0.004961                0.003656

The comparison clearly shows that in each stage the number of features computed using the proposed method is approximately 2.8 times smaller than with the standard Viola Jones method; likewise, the average number of features computed using the standard Viola Jones method is 2.8 times the average computed using the proposed method. Hence it can be concluded that the proposed method results in an improvement in speed by a factor of 2.8. This has been further verified by the performance analysis: Table 2 shows that the proposed method has hardly affected the recall rate but has improved the precision rate.

Table 2. Detection rate.

            Viola Jones method    Proposed method
Recall      0.97                  0.96
Precision   0.86                  0.95

The Precision and Recall are computed as follows:

Recall = (No. of faces detected × 100) / (No. of faces detected + No. of faces missed)

Precision = (No. of correct detections × 100) / (No. of correct detections + No. of wrong detections)

The results of detection can be seen in Figure 4.

4. Conclusions

In this paper we have proposed modifications to the Viola Jones approach to Face Detection. The modifications include the addition of a pre-filtering step using skin segmentation, a modification of the search algorithm that starts the search from the largest window and excludes smaller windows from the search if they lie within a detected face region, and a modification of the merging algorithm. With the proposed modifications the speed of the algorithm has improved by a factor of 2.8 and the precision rate has also improved by 10%, without much effect on the recall rate. Hence it can be concluded that using skin segmentation as a pre-filter for detecting faces in color images and starting the search from a larger window size are useful extensions to the Viola Jones method. The proposed approach of using skin segmentation as a pre-filter can easily be extended to any face detection algorithm that uses sliding-window techniques, such as face detection based on eigen features.
References
1. P. Viola and M. J. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, 57(2), 137-154 (2004).
Fig. 4. Face Detection results.

2. Pedro Fonseca and Jan Nesvadba, "Face Detection in the Compressed Domain", International Conference on Image Processing (2004).
3. R. L. Hsu, M. Abdel-Mottaleb and A. K. Jain, "Face Detection in Color Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, May 2002.
4. Diedrik Marius, Sumita Pennathur and Klint Rose, "Face Detection using color thresholding and Eigenimage template matching".
5. Raphael Feraud, Olivier J. Bernier, Jean-Emmanuel Viallet and Michel Collobert, "A Fast and Accurate Face Detector Based on Neural Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 1 (Jan 2001).
6. Jean-Christophe Terrillon, Mahdad N. Shirazi, Hideo Fukamachi and Shigeru Akamatsu, "Comparative Performance of Different Skin Chrominance Models and Chrominance Spaces for the Automatic Detection of Human Faces in Color Images".
7. K. Sung and T. Poggio, "Example-based learning for view-based face detection", IEEE Pattern Analysis and Machine Intelligence (1998).
8. H. Rowley, S. Baluja and T. Kanade, "Neural network-based face detection", IEEE Pattern Analysis and Machine Intelligence (1998).
9. H. Schneiderman and T. Kanade, "A statistical method for 3D object detection applied to faces and cars", International Conference on Computer Vision (2000).
10. D. Roth, M. Yang and N. Ahuja, "A SNoW-based face detector", Advances in Neural Information Processing Systems (2000).
11. C. Papageorgiou, M. Oren and T. Poggio, "A general framework for object detection", International Conference on Computer Vision (1998).
12. M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, 71-86 (1991).
13. Son Lam Phung, Abdesselam Bouzerdoum and Douglas Chai, "Skin Segmentation using Color Pixel Classification: Analysis and Comparison", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1 (Jan 2005).
14. Filipe Tomaz, Tiago Candeias and Hamid Shahbazkia, "Improved Automatic Skin Detection in Color Images", Proc. VIIth Digital Image Computing: Techniques and Applications, 10-12 Dec 2003.
Face Recognition using Symbolic KDA in the Framework of Symbolic Data Analysis
P. S. Hiremath
Department of Computer Science, Gulbarga University, Gulbarga-585106, Karnataka, India
E-mail: [email protected]
C. J. Prabhakar
Department of Computer Science, Kuvempu University, Shankarghatta-577451, Karnataka, India
E-mail: [email protected]
In this paper we present a symbolic factor analysis method, symbolic kernel discriminant analysis (symbolic KDA), for face recognition in the framework of symbolic data analysis. Classical factor analysis methods (specifically classical KDA) extract features that are single valued in nature to represent face images. Such single valued variables may not be able to capture the variation of each feature across all the images of the same subject, which leads to loss of information. The symbolic KDA algorithm extracts the most discriminating non-linear interval-type features, which optimally discriminate among the classes represented in the training set. The proposed method has been successfully tested for face recognition using two databases, the ORL and Yale face databases. The effectiveness of the proposed method is shown in terms of comparative performance against popular classical factor analysis methods such as the eigenface and fisherface methods. Experimental results show that symbolic KDA outperforms the classical factor analysis methods.

Keywords: Symbolic Data Analysis; Face Recognition; Interval Type Features; Symbolic Factor Analysis Methods.
1. Introduction

Of the appearance based face recognition methods,4,5,12,16,26,29 those utilizing LDA techniques10,11,28,35 have shown promising results. However, statistical learning methods, including the LDA based ones, often suffer from the so-called small sample size (SSS) problem encountered in high dimensional pattern recognition tasks, where the number of training samples available for each subject is smaller than the dimensionality of the sample space. Therefore numerous modified versions of LDA were proposed, and these have shown promising results.3,34,6,30,20,23,33 There are two ways to address the problem. One option is to apply linear algebra techniques to solve the numerical problem of inverting the singular within-class scatter matrix. For example, Tian et al. utilize the pseudo-inverse to complete this task. Also, some researchers15,34 recommended the addition of a small perturbation to the within-class scatter matrix so that it becomes non-singular. However, the above methods are typically computationally expensive, since the scatter matrices are very large. The second option is a subspace approach, such as the one followed in the development of the Fisherfaces method,3 where PCA is first used as a preprocessing step to remove the null space of the within-class scatter matrix, and LDA is then performed in the lower dimensional PCA subspace. However, it has been shown that the discarded null space may contain significant discriminatory information;19 to prevent this from happening, solutions without a separate PCA step, called direct LDA methods, have been proposed recently.6,30,23 Although successful in many cases, linear methods fail to deliver good performance when face patterns are subject to large variations in viewpoint, which results in a highly non-convex and complex distribution. The limited success of these methods should be attributed to their linear nature. As a result, it is reasonable to assume that a better solution to this non-linear problem could be achieved using non-linear methods, such as the so-called kernel machine techniques.17,25 Among them, kernel principal component analysis (KPCA)27 and kernel Fisher discriminant analysis (KFD)24 have aroused considerable interest in the fields of pattern recognition and machine learning. KPCA was originally developed by Scholkopf et al. in 1998, while KFD was first
proposed by Mika et al. in 1999.24 Subsequent research saw the development of a series of KFD algorithms.2,24,31,32,35 The defining characteristic of KFD based algorithms is that they directly use the pixel intensity values in a face image as the features on which to base the recognition decision. The pixel intensities used as features are represented by single valued variables. However, in many situations the same face is captured in different orientations, lighting, expressions and backgrounds, which leads to image variations; the pixel intensities change because of these variations. The use of single valued variables may not be able to capture the variation of feature values across the images of the same subject. In such a case, we need to consider symbolic data analysis (SDA),1,7,8,9,18 in which interval-valued data are analyzed. In this paper, a new appearance based method is proposed in the framework of Symbolic Data Analysis (SDA),1,9 namely symbolic KDA for face recognition, which is a generalization of classical KDA to symbolic objects. In the first step, we represent the face images as symbolic objects (symbolic faces) of interval-type variables. The representation of face images as symbolic faces accounts for image variations of human faces under different lighting conditions, orientations and facial expressions. It also drastically reduces the dimension of the image space without losing a significant amount of information. Each symbolic face summarizes the variation of feature values through the different images of the same subject. In the second step, we apply the symbolic KDA algorithm to extract interval-type non-linear discriminating features.
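The first step — representing each group of images of a subject by interval-valued pixel bounds — amounts to taking per-pixel minima and maxima over the images of a sub face class, with the interval centers then feeding the later kernel computations. A hypothetical numpy sketch (function names are ours; details of the construction appear in Sec. 2):

```python
import numpy as np

def symbolic_face(images):
    """Given the r images of one sub face class (each flattened to length p),
    return the interval-valued symbolic face as a (p, 2) array of
    [lower, upper] pixel bounds, as in Eq. (1) of Sec. 2."""
    X = np.stack([np.ravel(im) for im in images])          # shape (r, p)
    return np.stack([X.min(axis=0), X.max(axis=0)], axis=1)

def interval_centers(sym_face):
    """Midpoints of the intervals, (lower + upper) / 2, used later to
    build the kernel matrix (Eq. (4) of Sec. 3)."""
    return sym_face.mean(axis=1)
```

For r = 3 images per sub face class, this reduces 3p single-valued pixels to p intervals, which is the dimension reduction described above.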
According to this algorithm, In the first phase, we applied kernel function to symbolic faces, as a result a pattern in the original input space is mapped into a potentially much higher dimensional feature vector in the feature space, and then perform in the feature space to choose subspace dimension carefully. In the second phase, Symbolic KDA is applied to obtain interval type non-linear discriminating features, which are robust to variations due to illumination, orientation and facial expression. Finally, minimum distance classifier with symbolic dissimilarity measure 1 is employed for classification. Proposed method has been successfully tested using two standard databases ORL and Yale face database. The remainder of this paper is organized as fol-
57
lows: In section 2, the idea of constructing the symbolic faces is given. Symbolic KDA is developed in section 3. In section 4, the experiments are performed on the ORL and Yale face database whereby the proposed algorithm is evaluated and compared to other methods. Finally, a conclusion and discussion are offered in section 5. 2. Construction of Symbolic Faces Consider the face images T i , r 2 , . . . ,T„, each of size N x M from a face image database. Let ft = {Ti, T2, • •., r „ } be the collection of n face images of the database, which are first order objects. Each object T; e ft,/ = 1 , . . . , n, is described by a feature vector (Yi,...,Yp), of length P = NM, where each component Yj,j = l , . . . , p , is a single valued variable representing the intensity values of the face image Ti. An image set is a collection of face images of m different subjects; each subject has same number of images but with different orientations, expressions and illuminations. There are m number of second order objects(face classes) denoted by E = { c i , . . . , c m } , each consisting of different individual images T; e ft, of a subject. We have assumed that images belonging to a face class are arranged from right side view to left side view. The view range of each face class is partitioned into q sub face classes and each sub face class contains r number of images. The feature vector of kth sub face class ck of ith face class Cj, where k = 1,2,..., q, is described by a vector of p interval variables Yi,...,Yp, and is of length p = NM. The interval variable Yj of kth sub face class ck of ith face class is described as
$$Y_j(c_i^k) = [\underline{x}_{ij}^k, \overline{x}_{ij}^k], \qquad (1)$$

where $\underline{x}_{ij}^k$ and $\overline{x}_{ij}^k$ are the minimum and maximum intensity values, respectively, among the $j$th pixels of all the images of sub face class $c_i^k$. This interval incorporates information on the variability of the $j$th feature inside the $k$th sub face class $c_i^k$. We denote

$$X_i^k = (Y_1(c_i^k), \ldots, Y_p(c_i^k)), \quad i = 1, \ldots, m, \; k = 1, \ldots, q. \qquad (2)$$

The vector $X_i^k$ of interval variables is recorded for each $k$th sub face class $c_i^k$ of the $i$th face class. This vector is called a symbolic face and is represented as

$$X(c_i^k) = (\alpha_{i1}^k, \ldots, \alpha_{ip}^k), \qquad (3)$$
Face Recognition using Symbolic KDA in the Framework of Symbolic Data
where $\alpha_{ij}^k = Y_j(c_i^k) = [\underline{x}_{ij}^k, \overline{x}_{ij}^k]$, $j = 1, \ldots, p$, $k = 1, \ldots, q$ and $i = 1, \ldots, m$. We represent the $qm$ symbolic faces by a matrix $X$ of size $p \times qm$, consisting of the column vectors $X_i^k$, $i = 1, \ldots, m$ and $k = 1, \ldots, q$.

3. Acquiring a Non-Linear Subspace using the Symbolic KDA Method

Let us consider the matrix $X$ containing the $qm$ symbolic faces pertaining to the given set $\Omega$ of images belonging to $m$ face classes. The centers $x_{ij}^{kc} \in \Re$ of the intervals $\alpha_{ij}^k = Y_j(c_i^k) = [\underline{x}_{ij}^k, \overline{x}_{ij}^k]$ are given by

$$x_{ij}^{kc} = \frac{\underline{x}_{ij}^k + \overline{x}_{ij}^k}{2}, \qquad (4)$$

where $j = 1, \ldots, p$, $k = 1, \ldots, q$ and $i = 1, \ldots, m$. The $p \times qm$ data matrix $X^c$ contains the centers $x_{ij}^{kc} \in \Re$ of the intervals for the $qm$ symbolic faces. The $p$-dimensional vectors $X_i^{kc} = (x_{i1}^{kc}, \ldots, x_{ip}^{kc})$, $\underline{X}_i^k = (\underline{x}_{i1}^k, \ldots, \underline{x}_{ip}^k)$ and $\overline{X}_i^k = (\overline{x}_{i1}^k, \ldots, \overline{x}_{ip}^k)$ represent the centers, lower bounds and upper bounds of the $qm$ symbolic faces, respectively.

Let $\Phi : \Re^p \rightarrow F$ be a nonlinear mapping between the input space and the feature space; the nonlinear mapping $\Phi$ usually defines a kernel function. Let $K \in \Re^{qm \times qm}$ be the kernel matrix defined by means of dot products in the feature space:

$$K_{ij} = (\Phi(X_i) \cdot \Phi(X_j)). \qquad (5)$$

In general, the Fisher criterion function in the feature space $F$ can be defined as

$$J(V) = \frac{V^T S_b^{\Phi} V}{V^T S_w^{\Phi} V}, \qquad (6)$$

where $V$ is a discriminant vector, and $S_b^{\Phi}$ and $S_w^{\Phi}$ are the between-class scatter matrix and the within-class scatter matrix, respectively. The between-class and within-class scatter matrices in the feature space $F$ are defined as

$$S_b^{\Phi} = \frac{1}{m} \sum_{i=1}^{m} (m_i^{\Phi} - m^{\Phi})(m_i^{\Phi} - m^{\Phi})^T, \qquad (7)$$

$$S_w^{\Phi} = \frac{1}{qm} \sum_{i=1}^{m} \sum_{k=1}^{q_i} (\Phi(X_i^{kc}) - m_i^{\Phi})(\Phi(X_i^{kc}) - m_i^{\Phi})^T, \qquad (8)$$

where $X_i^{kc}$ denotes the $k$th symbolic face of the $i$th face class, $q_i$ is the number of training symbolic faces in face class $i$, $m_i^{\Phi}$ is the mean of the mapped symbolic faces in face class $i$, and $m^{\Phi}$ is the mean across all $qm$ mapped symbolic faces. From the above definitions, we have $S_t^{\Phi} = S_b^{\Phi} + S_w^{\Phi}$. The discriminant vectors with respect to the Fisher criterion are the eigenvectors of the generalized eigenvalue equation $S_b^{\Phi} V = \lambda S_t^{\Phi} V$. According to the theory of reproducing kernels, $V$ is an expansion of all symbolic faces in the feature space, i.e., there exist coefficients $b_L$, $L = 1, \ldots, qm$, such that

$$V = \sum_{L=1}^{qm} b_L \Phi(X_L) = HA, \qquad (9)$$

where $H = [\Phi(X_1^{1c}), \ldots, \Phi(X_1^{qc}), \ldots, \Phi(X_m^{1c}), \ldots, \Phi(X_m^{qc})]$ and $A = (b_1, \ldots, b_{qm})^T$. Substituting equation (9) into equation (6), we obtain

$$J(A) = \frac{A^T K W K A}{A^T K K A}, \qquad (10)$$

where $K$ is the kernel matrix and $W = \mathrm{diag}(W_1, \ldots, W_m)$, with $W_i$ a $q_i \times q_i$ matrix whose elements are $\frac{1}{q_i}$. From the definition of $W$, it is easy to verify that $W$ is a $qm \times qm$ block diagonal matrix. In practice, it is often necessary to find $s$ discriminant vectors, denoted $a_1, \ldots, a_s$, to extract features. Let $V = [a_1, \ldots, a_s]$. The matrix $V$ should satisfy the condition

$$V = \arg\max \frac{|V^T S_b V|}{|V^T S_t V|}, \qquad (11)$$

where $S_b = KWK$ and $S_t = KK$. Since each symbolic face $X_i^k$ is located between the lower bound symbolic face $\underline{X}_i^k$ and the upper bound symbolic face $\overline{X}_i^k$, it is possible to find the most discriminating non-linear interval-type features $[\underline{B}_l^k, \overline{B}_l^k]$. The lower bound features of each symbolic face $X_i^k$ are given by

$$\underline{B}_l^k = V_l^T \Phi(\underline{X}_i^k), \quad l = 1, \ldots, s, \qquad (12)$$

where $\Phi(\underline{X}_i^k) = [(\Phi(X_1^{1c}) \cdot \Phi(\underline{X}_i^k)), \ldots, (\Phi(X_m^{qc}) \cdot \Phi(\underline{X}_i^k))]$. Similarly, the upper bound features of each symbolic face $X_i^k$ are given by

$$\overline{B}_l^k = V_l^T \Phi(\overline{X}_i^k), \quad l = 1, \ldots, s. \qquad (13)$$

Let $C_{test} = [\Gamma_1, \ldots, \Gamma_l]$ be the test face class, containing face images of the same subject with different expressions, lighting conditions and orientations. The test
P. S. Hiremath and C. J. Prabhakar
symbolic face $X_{test}$ is constructed for the test face class $C_{test}$ as explained in Section 2. The lower bound test symbolic face of $X_{test}$ is described as $\underline{X}_{test} = (\underline{x}_1^{test}, \ldots, \underline{x}_p^{test})$. Similarly, the upper bound test symbolic face is described as $\overline{X}_{test} = (\overline{x}_1^{test}, \ldots, \overline{x}_p^{test})$. The interval-type features $[\underline{B}^{test}, \overline{B}^{test}]$ of the test symbolic face $X_{test}$ are computed as

$$\underline{B}_l^{test} = V_l^T \Phi(\underline{X}_{test}), \qquad (14)$$

$$\overline{B}_l^{test} = V_l^T \Phi(\overline{X}_{test}), \qquad (15)$$

where $l = 1, \ldots, s$.

4. Experimental Results

The face recognition system using the symbolic KDA method identifies a face by finding the nearest face image for a given unknown face image; a minimum distance classifier with Minkowski's symbolic dissimilarity measure, proposed by De Carvalho and Diday [1], is employed for classification. The proposed symbolic KDA method with polynomial kernels is tested on the face images of the ORL and Yale databases. The effectiveness of the proposed method is shown in terms of comparative performance against five popular face recognition methods; in particular, we compare our algorithm with eigenfaces,29 fisherfaces,3 symbolic PCA13 and symbolic KPCA.14 The experimentation is done on a system with a 2.5 GHz Pentium CPU.

4.1. Experiments using the ORL database

We assess the feasibility and performance of the proposed symbolic KDA on the face recognition task using the ORL database. The ORL face database is composed of 400 images, with ten different images for each of 40 distinct subjects. All 400 images from the ORL database are used to evaluate the face recognition performance of the proposed method. We have manually arranged the face images of the same subject from right side view to left side view. Six images are randomly chosen from the ten available for each subject for training, while the remaining images are used to construct the test symbolic face for each trial. Table 1 presents the experimental results for each method on the ORL database. The results show that the proposed method with a polynomial kernel of degree three outperforms the classical factor analysis methods.

Table 1. Comparison of classification performance using the ORL database.

Method          Training time (sec)   Feature dimension   Recognition rate (%)
Fisherfaces             98                    86                  92.80
Eigenfaces             102                   189                  87.65
Symbolic PCA            38                    71                  94.85
Symbolic KPCA           87                   109                  89.15
Symbolic LDA            85                    34                  96.00
Symbolic KDA            19                    28                  98.50
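The interval construction of Section 2 (Eqs. (1)-(4)) amounts to taking per-pixel minima and maxima over each sub face class. The following is a minimal NumPy sketch, not the authors' implementation; the helper name `symbolic_faces` and the toy data are assumptions introduced for illustration.

```python
import numpy as np

def symbolic_faces(face_class, q):
    """Split one face class (array of images ordered from right view to
    left view) into q sub face classes and build an interval-valued
    symbolic face [lower, upper] per pixel for each (cf. Eqs. (1)-(3))."""
    n, h, w = face_class.shape
    r = n // q                        # r images per sub face class
    faces = []
    for k in range(q):
        sub = face_class[k * r:(k + 1) * r].reshape(r, -1)
        lower = sub.min(axis=0)       # per-pixel minimum intensity
        upper = sub.max(axis=0)       # per-pixel maximum intensity
        faces.append((lower, upper))
    return faces

# toy example: 6 images of 4x4 pixels, q = 2 sub face classes
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(6, 4, 4)).astype(float)
sf = symbolic_faces(imgs, q=2)
centers = [(lo + up) / 2 for lo, up in sf]   # interval centers, Eq. (4)
```

The interval centers computed in the last line are exactly the vectors $X_i^{kc}$ that enter the kernel matrix of Section 3.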
4.2. Experiments on the Yale Face database

The experiments are conducted on the Yale Face database to evaluate the performance of symbolic KDA on the face recognition problem. The Yale Face database consists of a total of 165 images obtained from 15 different people, with 11 images per person. In our experiments, 9 images are randomly chosen from each class for training, while the remaining two images are used to construct the test symbolic face for each trial. The recognition rates, training times and optimal subspace dimensions are listed in Table 2. From Table 2, we note that the symbolic KDA method with a polynomial kernel of degree three, using a smaller number of features, outperforms the classical factor analysis methods with a larger number of features.

Table 2. Comparison of classification performance using the Yale Face database.

Method          Training time (sec)   Feature dimension   Recognition rate (%)
Fisherfaces             59                    23                  89.95
Eigenfaces              85                   110                  82.04
Symbolic PCA            35                    41                  91.15
Symbolic KPCA           43                    32                  92.00
Symbolic LDA            98                    56                  94.55
Symbolic KDA            12                    15                  98.15
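The optimization of Eqs. (10)-(11) reduces to a generalized eigenvalue problem in the expansion coefficients $A$. Below is a hedged sketch with a polynomial kernel; the function names, the small ridge term `reg` added for numerical stability, and the toy data are assumptions for illustration, not part of the paper.

```python
import numpy as np
from scipy.linalg import eigh

def poly_kernel(X, Y, degree=3):
    # polynomial kernel (x . y + 1)^d; degree three performed best above
    return (X @ Y.T + 1.0) ** degree

def kda_directions(Xc, labels, degree=3, s=2, reg=1e-6):
    """Maximize (A^T K W K A)/(A^T K K A), cf. Eqs. (10)-(11).
    Xc: one row per symbolic face (e.g. the interval centers)."""
    n = Xc.shape[0]
    K = poly_kernel(Xc, Xc, degree)
    W = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        W[np.ix_(idx, idx)] = 1.0 / len(idx)     # block of 1/q_i entries
    Sb, St = K @ W @ K, K @ K
    # generalized symmetric eigenproblem Sb a = lambda (St + reg I) a
    vals, vecs = eigh(Sb, St + reg * np.eye(n))
    A = vecs[:, ::-1][:, :s]                     # top-s discriminant coeffs
    return K, A

# toy data: 6 "symbolic face centers" from 2 classes in 5 dimensions
rng = np.random.default_rng(1)
Xc = np.vstack([rng.normal(0, 1, (3, 5)), rng.normal(3, 1, (3, 5))])
labels = np.array([0, 0, 0, 1, 1, 1])
K, A = kda_directions(Xc, labels)
features = K @ A    # projections of the training symbolic faces
```

Projecting the kernel vectors of the lower and upper bound faces through `A` would give the interval features of Eqs. (12)-(13).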
5. Conclusion

In this paper, we introduce a novel symbolic KDA method for face recognition. The symbolic data representation of face images as symbolic faces, using interval variables, yields desirable facial features that cope with the variations due to illumination, orientation and facial expression changes. The feasibility of
the symbolic KDA has been tested successfully on frontal face images of the ORL and Yale databases. Experimental results show that the symbolic KDA method with a polynomial kernel of degree three leads to a superior recognition rate as compared to classical factor analysis methods. The proposed symbolic KDA outperforms symbolic PCA, symbolic LDA and symbolic KPCA under variable lighting conditions, orientations and expressions. The proposed symbolic KDA has many advantages compared to classical factor analysis methods. The drawback of classical factor analysis methods is that, in order to recognize a face seen from a particular pose and under a particular illumination, the face must have been previously seen under the same conditions. Symbolic KDA overcomes this limitation by representing the faces by interval-type features, so that even faces previously seen in different poses, orientations and illuminations are recognized. Another important merit is that we can use more than one probe image, with the inherent variability of a face, for face recognition. Therefore, symbolic KDA improves the recognition accuracy as compared to classical factor analysis methods at reduced computational cost. This is clearly evident from the experimental results. Further, symbolic KDA yields significantly better results than the other symbolic factor analysis methods.

References
1. Bock, H. H., Diday, E. (Eds.), 2000. Analysis of Symbolic Data. Springer Verlag.
2. Baudat, G., Anouar, F., 2000. Generalized Discriminant Analysis using a Kernel Approach. Neural Computation 12(10), 2385-2404.
3. Belhumeur, P., Hespanha, J., Kriegman, D., 1997. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on PAMI 19(7), 711-720.
4. Brunelli, R., Poggio, T., 1993. Face Recognition: Features versus Templates. IEEE Trans. Pattern Analysis and Machine Intelligence 15, 1042-1052.
5. Chellappa, R., Wilson, C., Sirohey, S., 1995. Human and machine recognition of faces: A survey. Proc. IEEE 83(5), 705-740.
6. Chen, Liao, Ko, Lin, Yu, 2000. A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition 33, 1713-1726.
7. Choukria, Diday, Cazes, 1995. Extension of the principal component analysis to interval data. Presented at NTTS'95: New Techniques and Technologies for Statistics, Bonn.
8. Choukria, Diday, Cazes, 1998. Vertices Principal Component Analysis with an Improved Factorial Representation. In: A. Rizzi, M. Vichi, H. Bock (eds.), Advances in Data Science and Classification, pp. 397-402, Springer Verlag.
9. Diday, E., 1993. An Introduction to symbolic data analysis. Tutorial at IV Conf. IFCS.
10. Etemad, K., Chellappa, R., 1997. Discriminant Analysis for Recognition of Human Face Images. J. Optical Soc. Am. 14, 1724-1733.
11. Fisher, R. A., 1938. The statistical utilization of multiple measurements. Ann. Eugenics 8, 376-386.
12. Grudin, M. A., 2000. On internal representations in face recognition systems. Pattern Recognition 33(7), 1161-1177.
13. Hiremath, P. S., Prabhakar, C. J., 2005. Face Recognition Technique using Symbolic PCA Method. Proc. Int. Conf. on Pattern Recognition and Machine Intelligence (PReMI'05), Kolkata, 266-271, Springer Verlag.
14. Hiremath, P. S., Prabhakar, C. J., 2005. Face Recognition Technique using Symbolic Kernel PCA Method. Proc. Int. Conf. on Cognition and Recognition (COGREC'05), Mysore, 801-805, Allied Publishers.
15. Hong, Yang, 1991. Optimal discriminant plane for a small number of samples and design method of classifier on the plane. Pattern Recognition 24(4), 317-324.
16. Kirby, M., Sirovich, L., 1990. Applications of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Pattern Anal. Machine Intell. 12(1), 103-108.
17. Kernel Machines, http://www.kernel-machines.org, 2000.
18. Lauro, Verde, Palumbo, 1997. Analysis of symbolic data. Bock and Diday (Eds), Springer Verlag.
19. Liu, Cheng, Yang, 1992. A generalized optimal set of discriminant vectors. Pattern Recognition 25, 731-739.
20. Liu, Wechsler, 2002. Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. Image Process. 11(4), 467-476.
21. Liu, Wechsler, 2000. Robust coding schemes for indexing and retrieval from large face databases. IEEE Trans. on Image Processing 9, 132-137.
22. Liu, Cheng, Yang, 1993. Algebraic feature extraction for image recognition based on an optimal discriminant criterion. Pattern Recognition 26, 903-911.
23. Lu, Plataniotis, Venetsanopoulos, 2003. Face Recognition using LDA-based algorithms. IEEE Trans. Neural Networks 14(1), 195-200.
24. Mika, S., Ratsch, G., Scholkopf, B., Muller, K.-R., 1999. Fisher Discriminant Analysis with Kernels. Proc. IEEE Int. Workshop Neural Networks for Signal Processing, 41-48.
25. Muller, K.-R., Mika, S., Ratsch, G., Tsuda, K., Scholkopf, B., 2001. An Introduction to Kernel-Based Learning Algorithms. IEEE Trans. Neural Networks 12, 181-201.
26. Pentland, A., Moghaddam, B., Starner, T., 1994. View-based and modular Eigenfaces for Face Recognition. Proc. Computer Vision and Pattern Recognition, 84-91.
27. Scholkopf, B., Smola, A., Muller, K.-R., 1998. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation 10, 1299-1319.
28. Swets, D., Weng, J., 1996. Using discriminant eigenfeatures for image retrieval. IEEE Transactions on PAMI 18, 831-836.
29. Turk, M., Pentland, A., 1991. Eigenfaces for Recognition. J. Cognitive Neuroscience 3, 71-86.
30. Yu, H., Yang, J., 2001. A Direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recognition 34(7), 2067-2070.
31. Yang, M. H., 2002. Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition using Kernel Methods. Proc. Fifth IEEE Int'l Conf. Automatic Face and Gesture Recognition, 215-220.
32. Yang, M. H., Ahuja, N., Kriegman, D., 2000. Face Recognition Using Kernel Eigenfaces. Proc. IEEE Int'l Conf. Image Processing.
33. Ye, J., Li, Q., 2004. LDA/QR: An efficient and effective dimension reduction algorithm and its theoretical foundation. Pattern Recognition 27(9), 1209-1230.
34. Zhao, W., Chellappa, R., Phillips, P. J., 1999. Subspace linear discriminant analysis for face recognition. Technical Report CS-TR-4009, University of Maryland.
35. Zhao, W., Chellappa, R., Phillips, P. J., Rosenfeld, A., 2003. Face Recognition: A Literature Survey. ACM Comput. Surveys 35(4), 399-458.
Minutiae-Orientation Vector Based Fingerprint Matching
Li-min Yang*, Jie Yang and Yong-liang Zhang
Institute of Image Processing and Pattern Recognition, Shanghai JiaoTong University (SJTU), Shanghai, 200240, P.R. China
E-mail: [email protected]*
Fingerprint matching is an important problem in fingerprint identification. In this paper, we propose a novel minutiae-orientation vector (MOV) for fingerprint matching. The MOV combines both orientation information and neighborhood minutiae information. A set of reference point pairs is identified based on MOV; alignment is done coordinately and directionally based on these reference point pairs, and a matching score is then computed. The experimental results show that the proposed matching scheme is very competitive compared with the algorithms that participated in FVC2004 DB2_A. Keywords: Fingerprint alignment; Fingerprint matching; Minutiae-orientation vector.
1. Introduction

Fingerprints have been used for identifying individuals since ancient times, and fingerprint identification is the most widely used biometric authentication method at present. Many fingerprint matching methods have been reported in the literature; among them, minutiae-based methods are the most popular. A fingerprint can be represented by a set of minutiae, including ridge endings and bifurcations. Each minutia can be described by its spatial location together with its direction and minutia type. A novel fingerprint feature named the adjacent feature vector (AFV) was defined in [1]. An AFV consists of four adjacent relative orientations and six ridge counts of a minutia, and an AFV-based fuzzy scoring strategy was introduced to evaluate similarity levels between matched minutiae pairs. [2] proposed a triangular matching method to cope with deformation and validated the matching by dynamic time warping. [3] used an orientation-based minutia descriptor for minutiae matching. [4] integrated point patterns and texture patterns by first aligning two fingerprints using the point pattern and then matching their texture features. [5] defined a novel feature vector for each fingerprint minutia based on the global orientation field; these features are used to identify corresponding minutiae between two fingerprint impressions by computing the Euclidean distance between vectors. [6] proposed a minutiae matching technique which uses both local and global structural information. [7] developed a method to evenly align two sets of minutiae based on multiple reference minutiae (MRM). The MRM are distributed in different regions of the fingerprint, and the two sets of minutiae are globally and evenly aligned; thus, a pair of corresponding minutiae far away from the reference minutiae that remains unpaired in an SRM-based method may be paired in the MRM-based method.

In this paper, we propose a novel minutiae-orientation vector (MOV) for each minutia, which combines both neighborhood minutiae and surrounding orientation information. Based on MOV, one main-reference point pair and a set of associate-reference point pairs are identified. Alignment is performed according to these reference point pairs, and the matching score is computed thereafter. Experiments are conducted on the public domain collection of fingerprint images, DB2_A of FVC2004. Compared with earlier works like [1], the strength of the proposed scheme lies in the fact that the MOV structure combines both neighborhood minutiae and surrounding orientation information. The MOV has a limited number of dimensions, which means shorter computational time. Furthermore, the multi-reference point pairs mechanism is more robust to deformations than a one-reference point pair method.

The rest of this paper is organized as follows. Section 2 describes the proposed minutiae-orientation vector. Our fingerprint matching method is presented in Sec. 3. The experimental results are reported in Sec. 4. Section 5 concludes this paper.
2. Definition of the Novel Minutiae-Orientation Vector (MOV)

In general, a minutia point $M$ from a fingerprint minutiae set can be described by a feature vector given by

$$F = (x, y, \omega), \qquad (1)$$

where $(x, y)$ is its coordinate and $\omega$ is its orientation. Generally, the orientation of a minutia is in the range $[-\pi/2, \pi/2]$. Considering that a difference of $30^\circ$ may result in a difference of $150^\circ$ due to the effect of rotation of the fingerprint image on the directions of the ridges, the orientation difference of two minutiae orientations $\omega_1$ and $\omega_2$ can be calculated as follows:5

$$d(\omega_1, \omega_2) = \begin{cases} \omega_1 - \omega_2 + \pi & \text{if } -\pi < (\omega_1 - \omega_2) < -\pi/2, \\ \omega_1 - \omega_2 & \text{if } -\pi/2 \le (\omega_1 - \omega_2) \le \pi/2, \\ \omega_1 - \omega_2 - \pi & \text{if } \pi/2 < (\omega_1 - \omega_2) < \pi. \end{cases} \qquad (2)$$

Let $M(x, y, \omega)$ denote an arbitrary minutia from a fingerprint image. We define our novel minutiae-orientation vector (MOV) as follows.

Firstly, draw a circle $C$ around $M$ with center $M(x, y)$ and radius $6T$ ($T$ is the local average ridge distance, empirically determined). Let $\theta_1 = \omega$, $\theta_2 = \omega + \pi/4$, $\theta_3 = \omega + \pi/2$ and $\theta_4 = \omega + 3\pi/4$. We plot four lines $l_1, l_2, l_3$ and $l_4$ along the angles $\theta_1, \theta_2, \theta_3$ and $\theta_4$ with respect to the $X$ axis through the minutia point $M$. Label the eight intersection points of circle $C$ with lines $l_1, l_2, l_3$ and $l_4$ as $C_1^M, C_2^M, \ldots, C_8^M$. $C_i^M$ has coordinate $(x_{C_i^M}, y_{C_i^M})$ and orientation $\omega_{C_i^M}$, where $i = 1, 2, \ldots, 8$ (illustrated in Fig. 1).

Secondly, search around $M$ to find 8 selective minutiae $M^1, M^2, \ldots, M^8$. $M^i$ has coordinate $(x^i, y^i)$ and orientation $\omega^i$, where $i = 1, 2, \ldots, 8$. It is worth noticing that each selective minutia $M^i$ should satisfy the following two conditions (illustrated in Fig. 1): $\omega_{MM^i} \in f(i)$ and

$$D_{MM^i} = \min_{k} D_{MM^k},$$

where $k = 1, 2, \ldots, N$, $f(i) = [\omega + (i-1) \times 45^\circ, \; \omega + i \times 45^\circ)$, $\omega_{MM^i}$ denotes the direction from $M$ to minutia $M^i$, $D_{MM^i}$ denotes the distance between $M$ and minutia $M^i$, $i = 1, 2, \ldots, 8$, and $N$ is the number of minutiae on the fingerprint.

The proposed novel minutiae-orientation vector (MOV) at minutia point $M$ can be defined as

$$MOV(M) = \{d(\omega, \omega_{C_i^M}), \; d(\omega, \omega^i), \; d(\omega, \omega_{MM^i}), \; D_{MM^i}\}_{i=1,2,\ldots,8}, \qquad (3)$$

where $\omega_{MC_i^M} \in (-\pi/2, \pi/2)$ is the direction of $M$ to $C_i^M$. This vector $MOV(M)$ of the minutia $M$ is invariant to rotation and translation of the fingerprint image. As we know, if two minutiae from two images of the same finger are corresponding minutiae, they will have similar MOVs.

Fig. 1. Illustration of the minutiae-orientation vector (MOV).

3. Fingerprint Matching Based on MOV

A minutia can be described by its position, direction and type (ridge ending or ridge bifurcation); only minutia position and minutia direction are used in this paper. In order to align the two point sets before calculating the matching score, we need to identify a set of reference point pairs.

3.1. Reference Point Pairs Identification

Suppose $T_k$ and $Q_l$ (where $k = 1, 2, \ldots, K$, $l = 1, 2, \ldots, L$; $K$ and $L$ are the numbers of minutiae detected on $T$ and $Q$, respectively) are two minutiae from the
template fingerprint and query fingerprint, respectively. The similarity level between $T_k$ and $Q_l$ is calculated as

$$S(T_k, Q_l) = \begin{cases} 1 - \dfrac{\mathcal{E}(T_k, Q_l)}{Thr} & \text{if } \mathcal{E}(T_k, Q_l) < Thr, \\ 0 & \text{otherwise}, \end{cases} \qquad (4)$$

where $\mathcal{E}(T_k, Q_l) = |MOV(T_k) - MOV(Q_l)|$ is the Euclidean distance between the MOVs of $T_k$ and $Q_l$, and $Thr$ is the threshold. The similarity level $S(T_k, Q_l)$, $0 \le S(T_k, Q_l) \le 1$, describes a matching certainty level of two MOVs instead of simply matched or not: $S(T_k, Q_l) = 1$ implies a perfect match, while $S(T_k, Q_l) = 0$ implies a total mismatch. Note that not all dimensions of the MOV necessarily exist; when calculating the similarity level between $T_k$ and $Q_l$, only the dimensions existing in both minutiae are employed. The best matched minutiae pair $(A, B)$ is obtained by maximizing the similarity level:

$$S(A, B) = \max_{k,l} S(T_k, Q_l). \qquad (5)$$
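The similarity level of Eq. (4) can be written directly in code. A minimal sketch follows; the threshold value and the NaN convention for marking missing MOV dimensions are assumptions introduced for illustration.

```python
import numpy as np

def similarity(mov_t, mov_q, thr=1.0):
    """Similarity level of Eq. (4): 1 - E/Thr if the Euclidean distance E
    between the two MOVs is below the threshold Thr, else 0. NaN entries
    mark dimensions that do not exist in one of the MOVs; only dimensions
    present in both vectors contribute to the distance."""
    valid = ~(np.isnan(mov_t) | np.isnan(mov_q))
    e = np.linalg.norm(mov_t[valid] - mov_q[valid])
    return 1.0 - e / thr if e < thr else 0.0

mov_a = np.array([0.1, 0.2, np.nan, 0.4])   # third dimension missing
mov_b = np.array([0.1, 0.2, 0.3, 0.4])
s = similarity(mov_a, mov_b)   # identical on the shared dimensions -> 1.0
```

Maximizing this similarity over all pairs, as in Eq. (5), yields the main-reference point pair.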
Minutiae pair $(A, B)$ is identified as the main-reference point pair, while the surrounding intersection point pairs of $A$ and $B$, $(C_i^A, C_i^B)$, $i = 1, 2, \ldots, 8$, are identified as associate-reference point pairs.

3.2. Fingerprint Alignment

Aligning two minutiae sets $T$ and $Q$ means rotating and shifting one of them so that the minutiae in $T$ approximately overlap their corresponding counterparts in $Q$. After obtaining the main-reference point pair $(A(x_a, y_a, \omega_a), B(x_b, y_b, \omega_b))$ and the associate-reference point pairs $(C_i^A(x_i^a, y_i^a, \omega_i^a), C_i^B(x_i^b, y_i^b, \omega_i^b))$, $i = 1, 2, \ldots, 8$, the rotation parameter $\Delta\omega$ between the template minutiae set $T$ and the query minutiae set $Q$ is computed by

$$\Delta\omega = \alpha(\omega_b - \omega_a) + \sum_i \beta_i (\omega_i^b - \omega_i^a), \qquad (6)$$

where $\alpha$ and $\beta_i$ are weights, and $\alpha + \sum \beta_i = 1$, $i = 1, 2, \ldots, 8$. Rotate $C_i^B$ by angle $\Delta\omega$ with respect to reference point $B$, according to the following formula:

$$\begin{pmatrix} X_i^b \\ Y_i^b \end{pmatrix} = \begin{pmatrix} \cos\Delta\omega & \sin\Delta\omega \\ \sin\Delta\omega & -\cos\Delta\omega \end{pmatrix} \begin{pmatrix} x_i^b - x_b \\ y_i^b - y_b \end{pmatrix}, \qquad (7)$$

where $(X_i^b, Y_i^b)$ is the coordinate of the rotated $C_i^B$. Then the translation vector $(\Delta x, \Delta y)^T$ between $T$ and $Q$ can be computed by

$$\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \gamma \begin{pmatrix} x_a - x_b \\ y_a - y_b \end{pmatrix} + \sum_i \eta_i \begin{pmatrix} x_i^a - X_i^b \\ y_i^a - Y_i^b \end{pmatrix}, \qquad (8)$$

where $\gamma$ and $\eta_i$ are weights, and $\gamma + \sum \eta_i = 1$, $i = 1, 2, \ldots, 8$. After obtaining the translation and rotation parameters $(\Delta x, \Delta y)$ and $\Delta\omega$, we translate and rotate all the minutiae of the query set $Q$, according to the following formula:

$$\begin{pmatrix} x_i^t \\ y_i^t \\ \omega_i^t \end{pmatrix} = \begin{pmatrix} \cos\Delta\omega & \sin\Delta\omega & 0 \\ \sin\Delta\omega & -\cos\Delta\omega & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ \omega_i \end{pmatrix} + \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta\omega \end{pmatrix}, \qquad (9)$$

where $(x_i, y_i, \omega_i)^T$ represents a minutia of the query minutiae set and $(x_i^t, y_i^t, \omega_i^t)^T$ represents the corresponding aligned minutia. Let $Q'$ denote the new minutiae set of the query fingerprint after transformation with the estimated translation and rotation parameters.

3.3. Matching Score Computation

For the transformed minutiae set $Q'$, we re-compute the MOV of each minutia. By calculating and comparing the similarity levels of the minutiae's MOVs, we find the corresponding minutiae pairs between the transformed minutiae set $\{Q_l'\}_{l=1,2,\ldots,L}$ and the originally extracted minutiae set $\{T_k\}_{k=1,2,\ldots,K}$. Let $\{T_{C_n}, Q'_{C_n}\}_{n=1,2,\ldots,N'}$, $N' \le \min(K, L)$, denote the corresponding minutiae pairs; then the final matching score $M_{score}$ between the query and template fingerprints can be calculated according to the following equation:

$$M_{score} = \sum_{n=1}^{N'} S(T_{C_n}, Q'_{C_n}), \qquad (10)$$

where the similarity level $S(T_{C_n}, Q'_{C_n})$ is computed according to Eq. (4).
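Given the re-computed MOVs of the aligned query set $Q'$, the score of Eq. (10) sums similarity levels over corresponding pairs. The paper does not spell out the pairing procedure; the greedy one-to-one assignment below is an assumption introduced for illustration, as are the toy MOVs.

```python
import numpy as np

def match_score(movs_t, movs_q, thr=1.0):
    """Matching score of Eq. (10): sum of similarity levels over greedily
    selected corresponding minutiae pairs (each minutia used at most once)."""
    def sim(a, b):                       # similarity level, Eq. (4)
        e = np.linalg.norm(a - b)
        return 1.0 - e / thr if e < thr else 0.0

    # all candidate pairs, best similarity first
    pairs = sorted(((sim(t, q), k, l)
                    for k, t in enumerate(movs_t)
                    for l, q in enumerate(movs_q)),
                   reverse=True)
    used_t, used_q, score = set(), set(), 0.0
    for s, k, l in pairs:
        if s > 0 and k not in used_t and l not in used_q:
            used_t.add(k)
            used_q.add(l)
            score += s
    return score

# toy MOVs: two template minutiae, two aligned query minutiae
t = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
q = [np.array([0.0, 0.1]), np.array([2.0, 2.0])]
total = match_score(t, q)
```

A higher score indicates more, and closer, corresponding minutiae pairs between the two impressions.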
4. Experimental Results

The experiments reported in this paper have been conducted on the public domain collection of fingerprint images, DB2_A of FVC2004.8 The fingerprints of DB2_A were acquired through the optical sensor "U.are.U 4000" by Digital Persona; each fingerprint has a size of 328x364 pixels at 500 dpi. This database contains 800 fingerprints captured from 100 different fingers, eight impressions per finger. According to the FVC rules,8 each fingerprint is matched against the remaining fingerprints of the same finger in genuine matches; hence, the total number of genuine tests is (8x7x100)/2 = 2800. In impostor matches, the first fingerprint of each finger is matched against the first fingerprint of the remaining fingers; hence, the total number of false acceptance tests is (100x99)/2 = 4950.

We tested the relationship between FMR (False Match Rate) and FNMR (False Non-Match Rate) on DB2_A. The Receiver Operating Characteristic (ROC) curve obtained by the proposed algorithm is illustrated in Fig. 2. In Table 1, we compare the performance of the proposed scheme with the results of the two best algorithms, "P039" and "P071", on DB2_A of FVC2004. According to the ranking rule in terms of EER in FVC2004, the proposed scheme is in second place on DB2_A.

Fig. 2. ROC curve on FVC2004 DB2_A obtained by the proposed algorithm.

Table 1. Comparison of the proposed algorithm with P039 and P071 on DB2_A.

Algorithm       EER (%)   FMR100 (%)   FMR1000 (%)   Average Match Time (s)
P039             1.58        2.18         5.79              0.83
Our Algorithm    2.04        2.23         3.30              0.45
P071             2.59        3.14         4.96              0.57

5. Conclusions

In this paper, we propose an MOV-based matching scheme. For each minutia, a minutiae-orientation vector (MOV) is constructed; it is rotation and translation invariant. Based on MOV, a set of reference point pairs between the template and query fingerprint minutiae sets is identified, alignment is performed according to these reference point pairs, and fingerprint matching based on the proposed MOVs is then carried out. Experimental results on the public domain collection of fingerprint images, DB2_A of FVC2004, show that fingerprint matching based on the proposed MOV achieves good performance.

References
1. Xifeng Tong, Jianhua Huang, Xianglong Tang, Daming Shi, Fingerprint minutiae matching using the adjacent feature vector, Pattern Recognition Letters 26 (2005) 1337-1345.
2. Z. M. Kovacs-Vajna, A fingerprint verification system based on triangular matching and dynamic time warping, IEEE Trans. Pattern Anal. Mach. Intell. 22 (11) (2000) 1266-1276.
3. M. Tico, P. Kuosmanen, Fingerprint matching using an orientation-based minutia descriptor, IEEE Trans. Pattern Anal. Mach. Intell. 25 (8) (2003) 1009-1014.
4. A. Ross, A. K. Jain, J. Reisman, A hybrid fingerprint matcher, Pattern Recognition 36 (7) (2003) 1661-1673.
5. Jin Qi, Suzhen Yang, Yangsheng Wang, Fingerprint matching combining the global orientation field with minutia, Pattern Recognition Letters 26 (2005) 2424-2430.
6. X. Jiang, W. Y. Yau, Fingerprint minutiae matching based on the local and global structures, in: Proc. Internat. Conf. Pattern Recognition (ICPR 2000), Barcelona, Spain, vol. 2, pp. 1042-1045.
7. En Zhu, Jianping Yin, Guomin Zhang, Fingerprint matching based on global alignment of multiple reference minutiae, Pattern Recognition 38 (2005) 1685-1694.
8. Biometric Systems Lab., Pattern Recognition and Image Processing Lab., Biometric Test Center, [Online] Available: http://bias.csr.unibo.it/fvc2004/
Recognition of Pose Varied Three-Dimensional Human Faces Using Structured Lighting Induced Phase Coding
Debesh Choudhury
Instruments Research and Development Establishment, Photonics Division, Raipur Road, Dehradun 248008, Uttaranchal, India
E-mail: [email protected]
Face recognition has received a lot of attention in the recent past. Most of the reported techniques are based on the use and analysis of the two-dimensional (2D) information available in 2D facial images. Since human faces are originally three-dimensional (3D) objects, association of 3D sensed information can make face recognition more robust and accurate. This paper reports a method for recognition of faces by utilizing phase codes obtained from structured light patterns projected onto the faces. The phase differences associated with the distorted projected patterns are detected, and the computed phase maps are utilized to synthesize complex signature functions whose spatial frequency distributions are directly proportional to the computed phase maps and hence to the original 3D face shape. The synthesized signature functions of the test faces are compared with that of the target face by digital cross-correlation, and analyses of the cross-correlation intensities (squared modulus) complete the recognition process. Preliminary experimental results are presented for faces with wide variations of pose (out-of-plane head rotations). Keywords: Biometrics; face recognition; 3D object recognition.
1. Motivation

Among all areas of biometric authentication, human face recognition has perhaps received the most research attention.^{1,2} Many techniques exist in the literature for face recognition utilizing feature analysis, the eigenface approach, neural processing, global two-dimensional (2D) matching of 2D images, etc.^{3,4} All these techniques utilize or analyze information obtained from 2D facial images. Since a human face is originally a 3D object, direct association of 3D shape information would make face recognition more accurate and robust.

2. Introduction

Three-dimensional face recognition is a relatively recent trend, although face recognition based on 3D modeling of the 2D facial image has been well addressed. Face recognition research, in general, is experiencing a paradigm shift towards 3D face recognition that utilizes directly (or indirectly) sensed 3D shape information.^2 A recent literature review^5 points to a variety of methods that utilized range images (and/or stereo images) and applied several techniques, such as segmentation, feature analysis, principal component analysis, the Hausdorff distance method, the iterative closest point approach, etc., for the purpose of 3D face recognition. Face recognition in the 3D regime has several advantages. 3D face recognition overcomes the limitations due to pose (viewpoint or orientation) and lighting variations. It may also solve the problem of discriminating a live face from fake faces,^6 because faking a 3D face would be more difficult than faking a 2D image. However, different 3D recognition schemes may require different forms of 3D signatures. Moreover, standard 3D face databases are yet to be freely available. Nevertheless, the superior performance of 3D face recognition techniques over image-based ones calls for more studies in this area.

In what follows we consider a technique for recognition of 3D faces using directly sensed 3D shape induced phase coding.^7 A structured light pattern (SLP) is projected onto the faces. The 3D depth variations in the faces induce proportional distortions in the projected patterns. The SLP projected face images are captured using a CCD camera. The phase differences associated with the distortions of the SLP projections are extracted using a Fourier-fringe analysis method.^8 The extracted phase values characterize the 3D shape of the faces. The detected phase maps are utilized as shape codes to synthesize 2D spatial harmonic functions. The encoded spatial functions can be used as unique signatures for the face objects. The synthesized signature functions of the test faces are compared with that of the target face by digital cross-correlation (CrossCorr). The CrossCorr intensities (squared modulus) are analyzed. The signal-to-noise ratio (SNR) of the CrossCorr results gives a quantitative measure for recognition or rejection of a face class.

Fig. 1. Schematic diagram of the proposed face recognition system along with the algorithm of the phase coded cross-correlation.

3. Theory

Figure 1 represents the schematic of the proposed face recognition system. The configuration consists of a white-light projector connected to a computer. An SLP from the computer, say a sinusoidal grating, is projected on the human face object positioned in front of a uniform background screen. A CCD camera connected to the same computer captures the projected patterns with and without the face object. The captured images are processed to compute the associated phase difference using Fourier-fringe analysis.^8 In short, the first order Fourier spectra of the distorted and the undistorted SLP images are
selected by spatial filtering and their inverse Fourier transforms are computed. Now, the associated phase difference is extracted by computing the arctangent of the product of the inverse Fourier transform of the first order spectrum of the distorted SLP and the conjugate of that of the undistorted SLP.^8 This computed phase map characterizes the 3D shape of the face, and the phase values are, in general, wrapped, i.e., the phase values may have 2π ambiguity due to large depth variations in the face objects. The wrapped phase maps are used for synthesizing the signature functions by spatial coding. The phase coding algorithm and the associated computational process are described in the lower dashed block of Fig. 1. If φ_t(x,y) and φ_o(x,y) represent the computed phase maps corresponding to the SLP projected target face image and a general test object face image respectively, we define the signature functions s_t(x,y) and s_o(x,y) of the target and the test object faces respectively as

$$s_t(x,y) = A\exp\{i2\pi\beta\phi_t(x,y)f(x,y)\}\,, \qquad (1)$$

$$s_o(x,y) = A\exp\{i2\pi\beta\phi_o(x,y)f(x,y)\}\,, \qquad (2)$$
where (x,y) represent the rectangular spatial coordinates, f(x,y) is a function of the coordinates, i = √(-1), β is a positive factor (fraction or whole number) and A is a constant. The function f(x,y) is used for spatial coding; it can be any suitable function of the coordinates, linear or powered. The signature functions s_t(x,y) and s_o(x,y) represent the 3D shape information of the face objects spatially encoded in terms of spatial modulations. The spatial frequency distributions of the signatures are proportional to the computed phase maps corresponding to the face objects. The signature functions of the faces contain high spatial frequency information; hence we prefer to compare them using the frequency domain correlation pattern recognition method^9 instead of the well-known image domain techniques such as principal component analysis. The cross-correlation operation between the signature functions of the target face and the test face may be expressed as

$$c(x,y) = \iint S_t(u,v)\,S_o^{*}(u,v)\,N(u,v)\exp[-i2\pi(ux+vy)]\,du\,dv \qquad (3)$$

where (u,v) are the spatial frequency coordinates corresponding to the spatial coordinates (x,y), S_t(u,v)
and S_o(u,v) represent the Fourier transforms of the signature functions s_t(x,y) and s_o(x,y) respectively, * signifies the complex conjugate, and N(u,v) is a notch type spatial filter that blocks the zero order components of S_t(u,v) and passes the higher orders, satisfying the relation

$$N(u,v) = 1 - \mathrm{circ}\!\left(\frac{\sqrt{u^2+v^2}}{\beta\,|\phi_t|_{\min}}\right). \qquad (4)$$

The minimum value of |φ_t| is used because the spatial filter distribution pertaining to the computed phase map corresponding to the target face falls within that range. The β factor may be used to control the sensitivity of the spatial coding, and may be suitably selected after some test experiments. The Fourier transform S_t(u,v) may explicitly be written as^{10}
$$S_t(u,v) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} s_t(x,y)\exp\{-i2\pi(ux+vy)\}\,dx\,dy = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} A\exp\big[i2\pi\{\beta\phi_t(x,y)f(x,y)-ux-vy\}\big]\,dx\,dy\,. \qquad (5)$$

Referring to the well-known method of stationary phase,^{11} it can be noticed that the main contribution to the integral of equation (5) comes from the points where the phase

$$\psi(x,y) = \beta\phi_t(x,y)f(x,y) - ux - vy \qquad (6)$$

is stationary, i.e.,

$$\frac{\partial\psi}{\partial x} = \frac{\partial\psi}{\partial y} = 0\,. \qquad (7)$$

Now, partially differentiating equation (6) with respect to x and y and applying the method of stationary phase,^{11} we have

$$u = \beta\left[\phi_t(x,y)\frac{\partial f(x,y)}{\partial x} + f(x,y)\frac{\partial\phi_t(x,y)}{\partial x}\right], \qquad (8)$$

$$v = \beta\left[\phi_t(x,y)\frac{\partial f(x,y)}{\partial y} + f(x,y)\frac{\partial\phi_t(x,y)}{\partial y}\right]. \qquad (9)$$

The face signature points (x,y) that give a dominant contribution to the spectrum distribution at (u,v) are governed by equations (8) and (9). Similarly, the spectrum distribution S_o(u,v) of the signature function s_o(x,y) due to the general test face can also be expressed. If φ_o ≠ φ_t, the spatial frequency spectrum distributions of the general test face signature and the target face signature are different, and therefore they are uncorrelated. On the other hand, if φ_o ≈ φ_t, the spectrum distributions of the test face signature and the target face signature overlap in the spatial frequency domain and they are correlated.

3.1. Personalized spatial coding

It is interesting to point out the possibility of assigning different spatial coding functions to different persons. For example, the coding functions f_1(x,y), f_2(x,y), f_3(x,y) may be used to synthesize the signature functions of persons 1, 2, 3 respectively. The signature function of the same face (say person 1) will not be the same for a different coding function [say with f_2(x,y)], because according to equations (8) and (9) the spatial frequency distribution of a face signature function is dependent on the coding function f(x,y). Person-specific coding functions render the spatial coding more stringent and help to reinforce secure face recognition in an authentication scenario.

4. Results and discussions

The feasibility is tested experimentally with the faces of 20 persons. We put f(x,y) = x - y in equations (1) and (2). Movies of SLP projected face sequences are captured with variations in pose (i.e., out-of-plane head rotations) in both left-right and up-down directions (about 90 degrees) using a commercial digital camera (Sony Cybershot).
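The processing chain of Sec. 3 (Fourier-fringe phase extraction, spatial coding per equations (1) and (2), notch-filtered cross-correlation per equations (3) and (4), and the SNR measure defined later in this section) can be sketched in NumPy. This is an illustrative reconstruction, not the author's code: the carrier location, filter half-width, notch radius and the synthetic Gaussian phase maps are made-up assumptions.

```python
import numpy as np

def wrapped_phase_map(distorted, undistorted, carrier_col, halfwidth=8):
    """Fourier-fringe analysis: keep the first-order spectrum of each SLP
    image, inverse-transform, and take the phase of the product of one
    field with the conjugate of the other (wrapped into (-pi, pi])."""
    def first_order(img):
        F = np.fft.fft2(img)
        mask = np.zeros(F.shape)
        mask[:, carrier_col - halfwidth:carrier_col + halfwidth] = 1.0
        return np.fft.ifft2(F * mask)
    return np.angle(first_order(distorted) * np.conj(first_order(undistorted)))

def signature(phase, f, beta=0.05, A=1.0):
    """s(x,y) = A exp{i 2 pi beta phi(x,y) f(x,y)}, eqs. (1)-(2)."""
    return A * np.exp(1j * 2 * np.pi * beta * phase * f)

def notch(shape, radius):
    """N(u,v) = 1 - circ(sqrt(u^2+v^2)/radius): block the zero order,
    pass the higher orders (eq. 4); radius here is an assumed constant."""
    u = np.fft.fftfreq(shape[0]) * shape[0]
    v = np.fft.fftfreq(shape[1]) * shape[1]
    U, V = np.meshgrid(u, v, indexing="ij")
    return (np.hypot(U, V) > radius).astype(float)

def crosscorr_intensity(s_t, s_o, radius=3.0):
    """|c(x,y)|^2 with c the inverse FT of S_t S_o* N (eq. 3), shifted so
    that a zero-displacement peak sits at the array centre."""
    prod = np.fft.fft2(s_t) * np.conj(np.fft.fft2(s_o))
    return np.abs(np.fft.fftshift(np.fft.ifft2(prod * notch(s_t.shape, radius)))) ** 2

def snr(intensity, exclude=21):
    """Peak intensity near the centre over the mean noise elsewhere,
    excluding an exclude x exclude central window (Sec. 4 definition)."""
    r0, c0 = intensity.shape[0] // 2, intensity.shape[1] // 2
    h = exclude // 2
    peak = intensity[r0 - h:r0 + h + 1, c0 - h:c0 + h + 1].max()
    mask = np.ones(intensity.shape, dtype=bool)
    mask[r0 - h:r0 + h + 1, c0 - h:c0 + h + 1] = False
    return peak / intensity[mask].mean()

# Demo on synthetic Gaussian "faces" standing in for measured phase maps.
y, x = np.indices((128, 128))
f = x - y                                        # coding function f(x,y) = x - y
phi_1 = 2.0 * np.exp(-((x - 64)**2 + (y - 64)**2) / (2 * 15.0**2))
phi_2 = 2.0 * np.exp(-((x - 30)**2 + (y - 95)**2) / (2 * 25.0**2))
s_1, s_2 = signature(phi_1, f), signature(phi_2, f)
snr_same = snr(crosscorr_intensity(s_1, s_1))    # sharp central peak
snr_diff = snr(crosscorr_intensity(s_1, s_2))    # noise only
```

On these stand-ins the same-class SNR comes out far above the cross-class SNR, mirroring the separation (over 100 versus under 20) reported for the real experiments; the absolute values depend entirely on the synthetic inputs.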
Debesh Choudhury
Fig. 2. Experimental results: (a) SLP projected 1st face; (b) same with rotated head; (c) SLP projected 2nd face; (d) phase map of (a); (e) phase map of (b); (f) phase map of (c); (g) signature of (a); (h) signature of (b); (i) signature of (c); (j) CrossCorr of (g) and (h); (k) CrossCorr of (h) and (i); (l) CrossCorr of (i) and (g).
The extracted frames are processed using the proposed spatial phase coding algorithm (Fig. 1). The SLP projected face images are of 128×128 pixel size. The value of β is kept at 0.05 throughout. Experimental results with two face samples are shown in Fig. 2. The SLP projected face images are shown in Fig. 2(a) and Fig. 2(b) (faces of the same person with different poses, i.e., faces of the same class), while Fig. 2(c) shows that of a different person. The phase maps corresponding to the depth-induced distortions in Fig. 2(a), Fig. 2(b) and Fig. 2(c) are computed and shown in Fig. 2(d), Fig. 2(e) and Fig. 2(f) respectively. These phase maps are utilized to synthesize the signature functions. The real parts of the signature functions of the face images of Fig. 2(a), Fig. 2(b) and Fig. 2(c) are shown in Fig. 2(g), Fig. 2(h) and
Fig. 2(i) respectively. The normalized intensities of the CrossCorr function of Fig. 2(g) and Fig. 2(h), of Fig. 2(h) and Fig. 2(i), and of Fig. 2(i) and Fig. 2(g) are shown respectively in Fig. 2(j), Fig. 2(k) and Fig. 2(l). The high peak in Fig. 2(j) exemplifies correct recognition between face signatures of pose varied faces of the same class, whereas Fig. 2(k) and Fig. 2(l) contain no well-defined peak but only noise, which clearly demonstrates discrimination of faces of the false class. We present some more results to show the effects of pose variation on the cross-correlations. Figure 3 shows some sample results for two face objects. The SLP projected pose varied faces of a person are shown in Fig. 3(a) - Fig. 3(f). The SLP projected frontal face image of the same person is shown in Fig. 3(g), whereas Fig. 3(h) shows the same for another person. The CrossCorr between the signature functions of the faces of Fig. 3(a) - Fig. 3(f) and that of the face of Fig. 3(g) are respectively shown in Fig. 3(i) - Fig. 3(n), which contain sharp correlation peaks. These sharp peaks are evidence of correct recognition with pose varied faces of the same class. Figures 3(o) - 3(t) show the CrossCorr between the signature functions of the faces of Fig. 3(a) - Fig. 3(f) and that of the face of Fig. 3(h). The CrossCorr in Fig. 3(o) - Fig. 3(t) contain no well-defined peak but only noise, which signifies a mismatch for faces of a different class. Therefore, the proposed spatially coded signature functions of the 3D faces can be utilized to recognize the true class faces and to reject the false class faces. We also show an example of personalized spatial coding by using two different spatial coding functions f(x,y), the results of which are shown in Fig. 4. The SLP projected face images are shown in Fig. 4(a), Fig. 4(b) and Fig. 4(c), all of which belong to the same class, i.e., they are faces of the same person.
The signature functions corresponding to the faces of Fig. 4(a) and Fig. 4(b) are shown in Fig. 4(d) and Fig. 4(e) respectively with a coding function f(x,y) = x - y. Figure 4(f) shows the signature function corresponding to the face image of Fig. 4(c) but with a coding function f(x,y) = 0.5x - y. The normalized intensities of the CrossCorr function of Fig. 4(d) and Fig. 4(e), of Fig. 4(e) and Fig. 4(f), and of Fig. 4(f) and Fig. 4(d) are shown respectively in Fig. 4(g), Fig. 4(h) and Fig. 4(i). In these CrossCorr results, only Fig. 4(g) shows a sharp peak, but the other
CrossCorr results show only noise, i.e., the face signatures of the same person synthesized with different coding functions do not correlate and match. Therefore, the signature functions can be made different by using different spatial coding functions. This gives a choice of personalized coding that can render recognition more secure.

Fig. 3. More experimental results with pose variations: (a) - (f) SLP projected pose varied faces of a person; (g) SLP projected frontal face of the same person; (h) SLP projected frontal face of another person; (i) - (n) CrossCorr of signatures of (a) - (f) with that of (g); (o) - (t) CrossCorr of signatures of (a) - (f) with that of (h).

Fig. 4. Experimental results with different coding functions: (a) SLP projected 1st face; (b) same with rotated head; (c) almost same as (a); (d) signature of (a); (e) signature of (b); (f) signature of (c) with a different coding function; (g) CrossCorr of (d) and (e); (h) CrossCorr of (e) and (f); (i) CrossCorr of (f) and (d).

Since a correct recognition in our technique is evidenced by a high correlation peak around the centre, we can define the SNR as the ratio of the maximum correlation peak intensity around the centre to the mean noise in a rectangular area (128×128 pixels) around the centre, excluding a 21×21 pixel area at the centre where the correct peak is situated. The SNR of CrossCorr is computed for the pose variant face signatures. The frontal face signature is cross-correlated with the pose varied face signatures of the same person (true class) for head rotations in both left-right and up-down directions. The frontal face signature of a second person (false class) is also cross-correlated with the pose varied face signatures of the first person. Plots of the computed SNR values versus the pose varied (left-right) face signature numbers for two human subjects are shown in Fig. 5. The SNR of CrossCorr between faces of the same person shows high values (more than 100), whereas the SNR of CrossCorr for different persons' faces is low (less than 20). Therefore, it is possible to recognize pose varied faces of the same class (person) and reject pose varied faces of a different class (person) by analyzing the SNR of CrossCorr. Similar plots can be obtained with pose variation in the up-down direction. The rate of false recognition is nil in our feasibility experiments using face objects of 20 persons with wide variations of out-of-plane head rotation (about 90 degrees) in both the left-right direction (approx. 25 poses) and the up-down direction (approx. 25 poses).

Fig. 5. SNR versus pose varied face signature plot.

We have utilized the correlation pattern recognition^9 method for matching, although other popular techniques, such as principal component analysis (PCA) or the FisherFaces method, might have been tried. While PCA and FisherFaces work in the image domain, correlation pattern recognition works in the frequency domain and offers advantages such as shift-invariance and the ability to accommodate in-class image variability.^{12} Also, the spatial coding in our proposed method creates rather high spatial frequencies in the coded signatures, which calls for a spatial-frequency-sensitive matching technique; the frequency domain correlation pattern recognition method satisfies that requirement. In-plane rotation and scale are not considered in the present study; they can be handled using mature techniques based on wavelet and circular transforms.^{13,14} Although the results presented here are based on test experiments carried out on a very limited database of our own, because standard 3D face databases are yet to be freely available, the feasibility of the proposed 3D face recognition algorithm is successfully demonstrated. Preliminary tests with wide pose variations show the promise of the proposed technique. It is worth mentioning that explicit reconstruction of the 3D shape, which is a tedious job, is not required. However, the system has to be calibrated like any other structured lighting based system.^{15} More rigorous testing is required under changed conditions, such as expression variations, with and without glasses, beards, intentional disguise, etc. Since the 3D shape of a face may change with age, because the flesh and tissues may change drastically, the 3D signature database must be updated regularly for authentication purposes.

5. Conclusion

A technique for recognition of pose varied three-dimensional faces is presented. It utilizes the three-dimensional shape cues of the faces obtained by structured light pattern projection induced spatial phase coding. Experimental feasibility is tested using pattern projected face images with wide variations in pose captured by a commercial digital camera. The test experiments bring out the novelty with excellent results of true class face recognition and false class face rejection.

Acknowledgments

All the people who agreed to expose their faces for the experiments deserve sincere thanks and gratitude. Help from Mr. S. K. Chaukiyal is acknowledged. The author is indebted to Dr. A. K. Gupta and the Director, IRDE, for permission to present this work.
Recognition of Pose Varied Three-Dimensional Human Faces Using Structured Lighting Induced Phase Coding
References
1. S. Z. Li and A. K. Jain, eds., Handbook of Face Recognition (Springer, 2005).
2. Reports on Face Recognition Grand Challenge initiatives, http://www.frvt.org/FRGC.
3. W. Zhao, R. Chellappa, A. Rosenfeld and J. Phillips, ACM Comput. Surv. 35, 399 (2003).
4. S. K. Zhou and R. Chellappa, J. Opt. Soc. Am. A 22, 217 (2005).
5. K. W. Bowyer, K. Chang and P. Flynn, Comput. Vis. Imag. Underst. 101, 1 (2006).
6. J. Li, Y. Wang, T. Tan and A. K. Jain, in Biometric Technology for Human Identification, A. K. Jain and N. K. Ratha, eds., Proc. SPIE 5404, 296 (2004).
7. D. Choudhury and M. Takeda, Opt. Lett. 27, 1466 (2002).
8. M. Takeda and K. Mutoh, Appl. Opt. 22, 3977 (1983).
9. B. V. K. Vijaya Kumar, A. Mahalanobis and R. D. Juday, Correlation Pattern Recognition (Cambridge University Press, New York, 2006).
10. D. Choudhury and M. Takeda, in Optical Information Systems, B. Javidi and D. Psaltis, eds., Proc. SPIE 5202, 168 (2003).
11. M. Born and E. Wolf, Principles of Optics (Pergamon, 1989).
12. K. Venkataramani, S. Qidwai and B. V. K. Vijaya Kumar, IEEE Trans. Systems Man Cybernetics: Part C 35, 411 (2005).
13. Y. Sheng and D. Roberge, Appl. Opt. 38, 5541 (1999).
14. S. Roy, H. H. Arsenault and D. Lefebvre, Opt. Eng. 42, 813 (2003).
15. R. Legarda-Saenz, T. Bothe and W. P. Juptner, Opt. Eng. 43, 464 (2004).
Writer Recognition by Analyzing Word Level Features of Handwritten Documents
Prakash Tripathi and Bhabatosh Chanda
Electronics and Communication Sciences Unit
Indian Statistical Institute, Kolkata 700108, India
E-mail: [email protected]

Bidyut Baran Chaudhuri
Computer Vision and Pattern Recognition Unit
Indian Statistical Institute, Kolkata 700108, India
Writer recognition based on handwriting is important from the security as well as the judicial point of view. In this paper, based on static off-line data, we propose a novel writer recognition methodology using word level micro-features and a simple pattern recognition technique. The result is reasonably good and encouraging, and shows the usefulness of the computational features proposed here. Keywords: writer recognition, handwritten document, word bounding box, word level feature, K-nearest neighbour.
1. Introduction

Analysis of handwritten documents is taken up by the research community to deal with two kinds of problems: (i) what is written, and (ii) who has written the concerned document. In this work we deal with the second problem, which has immense application potential in security systems and the judicial framework. For example, a sample of handwriting may be considered a biometric of the writer,^2 so it may be used as a means of authentication. Analysis of handwriting from the viewpoint of determining the writer has great bearing on the criminal justice system. Writer individuality rests on the hypothesis that each individual has components of handwriting that are distinct from those of another individual. Handwriting has long been considered individual property, as evidenced by the importance of signatures in documents. However, this hypothesis has not been subjected to rigorous scrutiny with the accompanying experiments, testing, and peer review. Each writer can be characterized by his own handwriting, by the reproduction of details and unconscious practices. This is why in certain cases handwriting samples may be treated as a biometric, like fingerprints. The problem of writer identification arises frequently in the court of justice, where the judge has to come to a conclusion about the authenticity of a document (e.g. a will, bill or receipt). The need also arises in banks for signature verification, or in
some institutes that analyze manuscripts of ancient authors and are interested in the genesis of the texts. In order to come to a conclusion about the identity of an unknown writer, two tasks may be considered:

(1) The writer identification task concerns the retrieval of handwritten samples from a database using the sample under study as a graphical query. It provides a subset of relevant candidate documents, on which the expert(s) will concentrate.
(2) The writer verification task must come to a conclusion about two samples of handwriting and determine whether they are written by the same writer or not.

When dealing with large databases, the writer identification task can also be viewed as a filtering step prior to the verification task. Analysis of handwriting by computer has a long history,^3 and one of its first applications was in signature verification.^4 However, that was a very constrained search, and templates of handwritten word(s), which are usually distinctive from person to person, are used for verification. Identifying and/or verifying a writer based on handwriting style (not limited to a constrained set of words) is a relatively new area, with pioneering work done by Srihari.^5 The basic tasks are (i) extraction of style or forensic dependent features^6 from documents, and (ii) analysis of the features with the help of some pattern classification/recognition algorithms.^{7,8} For example, clustering and HMM based methodologies are proposed by Bensefia et al.^{9,10} These are mainly off-line methods based on static data. On-line methods are primarily based on dynamic features like pen pressure, direction of strokes, etc.^{1,11} Some approaches adopt both static and dynamic features.^{12}

The objective of this work is to develop a novel writer recognition system for handwritten documents using static features relevant to writing style. Writing style consists of character shape (e.g., loop formation, curvature, corners, etc.), word formation (e.g., inter-character gap, aspect ratio, inter-word gap, etc.) and line formats (e.g., skew, straightness, inter-line spacing, etc.). In particular, in this paper we have explored the capabilities of word level features and employed a simple pattern recognition methodology for writer recognition. The experiment is carried out on documents written in an Indian language named Bangla. The remainder of this paper is organized as follows. Section 2 describes the data acquisition strategy. The proposed methodology is described in section 3. Experimental results and discussion are given in section 4, followed by concluding remarks in section 5.

2. Data Acquisition

For such an experiment, data acquisition is very critical for drawing any meaningful conclusion. Thus, our objective was to obtain a set of samples that would capture variations in handwriting among and within writers. This means we need handwriting samples from multiple writers, as well as multiple samples from each writer. The handwriting samples of the sample population should have the following properties.

(1) They should be sufficient in number to exhibit normal writing habits and to portray the consistency with which particular habits are executed.
(2) For comparison purposes, they should have similarity in texts, in writing circumstances, and in writing purposes.
(3) The text content should be meaningful and it should have continuity.

Several factors may influence handwriting style, e.g., gender, age, ethnicity, handedness, the learning system of handwriting, the subject matter (content), writing protocol (written from memory, dictated, or copied out), writing instrument (pen and paper), changes in the handwriting of an individual over time, etc.^5 We tried to ensure that the document content captures as many features as possible. Only some of these factors were considered in the experimental design; the other factors will have to be part of a different study. However, the same experimental methodology can be used to determine the influence of the factors not considered.

2.1. Source Documents
One of the two almost similar source documents in Bangla script, which were to be copied by each writer, is shown in Fig. 1. They are concise, containing 158 and 120 words respectively, and are composed of the most frequently occurring words in the Bangla language. A table of some such words and their frequencies, obtained from a moderately large Bangla corpus, is shown in Fig. 2. In addition, the source documents also contain punctuation, distinctive letters and a general document structure that allowed extracting macro-document attributes such as word and line spacing, line skew, etc. Each participant (writer) was required to copy one of the source documents four times in his/her most natural handwriting, using plain, unlined sheets and a ballpoint pen. The repetition was to determine, for each writer, the variation of handwriting from one occasion to the next. We collected documents from 30 writers, resulting in a total of 120 documents. Each handwritten document was scanned and converted into a two-tone image at a resolution of 300 dpi. An example of such an image is shown in Fig. 3.
Fig. 1. A portion of the original document used to collect handwriting.
3. Proposed methodology

A digital binary image I is defined over a discrete domain Z² obtained by sampling the continuous domain along two orthogonal directions, such that the value of I(i,j) is either 0 or 1. In this section we describe the preprocessing, feature extraction and pattern recognition techniques adopted for the development of the intended writer recognition system.

3.1. Preprocessing
Preprocessing comprises the following basic steps.

(1) Noise removal: Stray black pixels (positive noise) of I are removed by binary morphological opening with a 3×3 structuring element (SE). Let the result be I.
(2) Estimation of line count: Connected components of I are labeled. Statistics of the run-lengths of 0-pixels within and between the words are obtained. A rectangular SE is designed based on these statistics, and the image is closed with this SE to form word-blobs (see Fig. 4). Let the result be I_c. Then I_c is scanned vertically to estimate the line count and line spacing.^{13}
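Steps (1) and (2) map onto standard binary morphology. The SciPy sketch below is a hypothetical rendering, not the paper's code: the 1×9 closing element stands in for the SE that the paper derives from run-length statistics, and images are boolean arrays with True for ink.

```python
import numpy as np
from scipy import ndimage

def remove_specks(img):
    """Step 1: binary opening with a 3x3 structuring element removes
    stray ink pixels (positive noise) while preserving solid strokes."""
    return ndimage.binary_opening(img, structure=np.ones((3, 3)))

def word_blobs(img, gap_cols=9):
    """Step 2: closing with a wide horizontal structuring element fuses
    the characters of a word into one blob; gap_cols would be chosen
    from the run-length statistics of the 0-pixel gaps."""
    return ndimage.binary_closing(img, structure=np.ones((1, gap_cols)))

def count_lines(blobs):
    """Vertical scan: rows containing any blob pixel form the text
    lines; count the runs of such rows."""
    occupied = blobs.any(axis=1).astype(int)
    return int(np.sum(np.diff(occupied, prepend=0) == 1))
```

On a synthetic page the closing fuses the inter-character gaps of a line into single word-blobs, and a vertical scan of the blob image yields the line count.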
Fig. 2. Some most frequently used Bangla words and their frequencies obtained from a corpus.
Fig. 4. Illustrates word blobs obtained by closing.
(3) Extraction of word bounding box: Based on the line count and line spacing, as well as a vertical scan of each word-blob, we determine the word bounding box. An example of the word bounding boxes obtained by the system is shown
Fig. 3. An example of a handwritten document.
in Fig. 5. The result of this step is not 100% accurate; however, it is not very critical for the final results.^{13}
(4) Thinning: I is thinned for the extraction of structural features. Let the thinned image be I_t.^{14}
(5) Medial axis transform and skeletonization: I also undergoes a medial axis transform, from which local maxima are extracted. Let the results be I_m and I_s, respectively.

Fig. 5. Illustrates word bounding boxes obtained by the system. Note the errors that occurred even though noise was cleaned in step 1.

3.2. Feature extraction

We distinguish between two types of features: conventional features and computational features. Conventional features are the handwriting attributes that are commonly used by the forensic document examination community. These features are obtained from the handwriting by the naked eye or under a microscope. Computational features are those having known software/hardware techniques of extraction. Computational features can be divided into macro- and micro-features, depending on whether they pertain globally to the entire handwritten sample (e.g., uniformity in line spacing and straightness of lines) or are extracted locally (e.g., inter-character gaps and loop formation). Macro-features can be extracted at the document level (entire handwritten manuscript) or at least at the paragraph level, while micro-features may be extracted at line, word, and character levels. As mentioned earlier, in this work we confine ourselves to word level features, which we consider of two types: (1) box level features (e.g., spread of stroke, depth of stroke, aspect ratio and density) that are computed over the entire word bounding box, and (2) cell level features (e.g., isolated points, corners, crossings, branches, end points, pixel density) that are computed over non-overlapping divisions of the word bounding box.

3.3. Box level features

Let the k-th bounding box of a word be represented by W_k with two diagonal points (r1, c1) and (r2, c2) in the image. The box level features computed from the preprocessed images are as follows.

(1) Thickness of stroke T_s: This roughly refers to the variable thickness of stroke caused by the speed of writing and is computed as

$$T_s = \frac{\sum_{(r,c)\in W_k} I(r,c)}{\sum_{(r,c)\in W_k} I_t(r,c)}$$

(2) Width of stroke W_s: This may be another way of computing the thickness of stroke due to the speed of writing and is computed as

$$W_s = 1 - \frac{\sum_{(r,c)\in W_k} I_m(r,c)}{\sum_{(r,c)\in W_k} I(r,c)}$$

(3) Spread of stroke S_s: This again refers roughly to the thickness of stroke due to the speed of writing and is computed as

$$S_s = 1 - \frac{\sum_{(r,c)\in W_k} I_s(r,c)}{\sum_{(r,c)\in W_k} I_m(r,c)}$$

(4) Aspect ratio A: This gives an idea of the elongation of the word due to the speed and style of writing and is computed as

$$A = (r_2 - r_1)/(c_2 - c_1)$$

Fig. 6. Division of bounding box into cells.

3.4. Cell level features

Each word bounding box is divided into m × n cells. An example of such a division into 3 × 4 cells is shown in Fig. 6. On each of the cells six features are computed.

(1) Number of isolated points: Isolated points are those pixels which have no neighbours. This is computed from the thinned image I_t.
(2) Number of end points: End points are those pixels which have only one neighbour. This is computed from I_t.
(3) Number of cross points: Cross points are those pixels where two strokes cross or have an 'X' junction. This is computed based on the connectivity number^{14} using I_t.
(4) Number of branch points: Branch points are those pixels which have a 'T' or 'Y' junction. This is computed based on the connectivity number using I_t.
(5) Pixel density: This is the ratio of the number of black pixels to the cell size. This is computed using the image I.
(6) Number of corners: This is related to the smoothness of the strokes. This is computed using the image I based on the facet model.^{15}

Thus, a total of N = 6mn + 4 features per word is computed, which are used for writer recognition. It may be noted that since the features reflect different kinds of geometrical and structural attributes, we do not employ any transformation (e.g., PCA) for feature reduction.

3.5. Recognition

The training set is a set of document images for which the writers' identities are known. It contains all the features computed for all the words of all the writers. Writer information is attached to the feature vector of every word in the training set. Each word of the training documents is represented by a point in the N-dimensional feature space. For testing, a table of all the writers, each with an individual counter, is created; the counters are initialized to zero. For a test document, every word is used to recognize the writer using the K-nearest neighbour algorithm, and the counters of the corresponding K writers are incremented. After examining all the words of the test document, the writer having the maximum count is taken as the writer of the document. The K-nearest neighbour approach is simple and gives consistently good results.

4. Experimental Results and Discussion

The data set consists of 4 copies from each of 30 different writers. For each writer, out of the four documents, three were considered as the training data set and one as test
data. Words were extracted automatically from every training document image. For each word, 76 word level features (4 box level and 72 cell level features considering m = 3 and n = 4) were extracted. So, N = 76. The Euclidean distance is employed to measure similarity and K is taken to be 3. Using a leave-one-out strategy, all of the 120 documents are used as test set, and out of these 120 documents the writers are correctly recognized in 114 cases, giving a recognition rate of 95%. It may be noted that if the value of K is chosen as 1 or 2, the recognition rate goes down to 67.2% and 70%, respectively. This implies that to make the system invariant to intra-writer variation, a high value of K should be taken. Here K = 3 is a reasonable choice, as we have three handwritten documents of each writer in the training set. The recognition score for the words for randomly selected writers is shown in Table 1.
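The recognition procedure described above — per-word voting with a K-nearest-neighbour search in the feature space — can be condensed into a short sketch. This is an illustration only; the function name and array shapes are assumptions, not the authors' implementation.

```python
import numpy as np
from collections import Counter

def recognize_writer(test_words, train_words, train_writers, k=3):
    """Identify the writer of a document from its word feature vectors.

    test_words    : (T, N) array, one N-dimensional feature vector per word
    train_words   : (M, N) array of word features from all training documents
    train_writers : length-M sequence of writer labels, one per training word

    Each test word votes for the writers of its k nearest training words
    (Euclidean distance); the writer with the most votes overall wins."""
    votes = Counter()
    for w in test_words:
        d = np.linalg.norm(train_words - w, axis=1)  # distances to all words
        for idx in np.argsort(d)[:k]:                # k nearest neighbours
            votes[train_writers[idx]] += 1
    return votes.most_common(1)[0][0]
```

Because every word casts k votes, a few badly written words cannot flip the document-level decision — which is the robustness the per-writer counters provide.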
5. Conclusion

Here we have presented a novel algorithm for writer recognition based on handwritten documents. In this work only micro-features at word level are used. A simple K-nearest neighbour algorithm with the Euclidean distance measure has been used and the result obtained is very encouraging.

Table 1. Recognition score for the words for some writers (ts: Test, tr: Training).

ts\tr    1    2    3    4    5    6    7    8
  1    539  100   65   48   62   29   39   71
  2     24  502   12  113  115   86   65  127
  3     97   46  505   53   40   42   51   92
  4     14   77   20  687  104   85  128  104
  5     27  107   19  126  958   54  143   54
  6     12   60    3   85   44  345   62   66
  7     18   70   26  134  181   76  591   83
  8     30  105   43   73   56   94   82  575
References
1. K. Yu, Y. Wang and T. Tan, Writer identification using dynamic features, in Proc. Intl. Conf. on Bioinformatics and its Applications (ICBA'04), (Florida, USA, 2004).
2. A. K. Jain, R. Bolle and S. Pankanti (eds), Biometrics: Personal Identification in Networked Society (Kluwer Academic, Boston, 1999).
3. E. C. Greanias, P. F. Meagher, R. J. Norman and P. Essinger, IBM J. Res. Develop. 7, 14 (1963).
Writer Recognition by Analyzing Word Level Features of Handwritten Documents
4. K. K. Ho, H. Schroder and G. Leedham, Codebooks for signature verification and handwriting recognition, in Proc. of the Australian and New Zealand Intelligent Information Systems Conference, (Australia, 2006). 5. S. N. Srihari, S.-H. Cha, H. Arora and S. Lee, J. Forensic Science 47 (2002). 6. M. Tapiador and J. A. Sigenza, Writer identification method based on forensic knowledge, in Proc. Intl. Conf. on Bioinformatics and its Applications (ICBA'04), (Florida, USA, 2004). 7. K. Steinke, Pattern Recognition 14, 357 (1981). 8. E. N. Zois and V. Anastassapoulous, Pattern Recognition 33, 101 (2000). 9. A. Bensefia, T. Paquet and L. Heutte, Pattern Recognition Letters 26, 2080 (2005).
10. A. Bensefia, T. Paquet and L. Heutte, Electronic Letters on Computer Vision and Image Analysis 5, 72 (2005). 11. K. P. Zimmerman and M. J. Varady, Pattern Recognition 18, 63 (1985). 12. W. Jin, Y. Wang and T. Tan, Text-independent writer identification based on fusion of dynamic and static features, in Proc. Intl. Workshop on Biometric Recognition Systems, (Beijing, China, 2005). 13. A. K. Das, A. Gupta and B. Chanda, Image Processing and Communication 3, 85 (1997). 14. B. Chanda and D. D. Majumdar, Digital Image Processing and Analysis (Prentice-Hall of India, New Delhi, 2000). 15. R. M. Haralick and L. G. Shapiro, Computer and Robot Vision (Addison-Wesley, Reading, 1992).
PART D: Clustering Algorithms
A New Symmetry Based Genetic Clustering Technique for Automatic Evolution of Clusters
Sriparna Saha and Sanghamitra Bandyopadhyay
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Email: {sriparna_r, sanghami}@isical.ac.in
In this article, a new symmetry based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper clustering of any data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. A newly proposed index, Sym-index, is used as a measure of the validity of the clusters and is optimized by the genetic algorithm. Sym-index uses a new point symmetry based distance. The algorithm is therefore able to detect both convex and non-convex clusters that possess the symmetry property. Kd-tree based nearest neighbor search is used to reduce the complexity of finding the closest symmetric point. The effectiveness of the proposed method is demonstrated for several artificial data sets and one real-life data set. Keywords: Clustering, Genetic Algorithms, Point Symmetry Based Distance, Real encoding
1. Introduction
Clustering is a fundamental problem in data mining with innumerable applications spanning many fields.1 In order to mathematically identify clusters in a data set, it is usually necessary to first define a measure of similarity or proximity which will establish a rule for assigning patterns to the domain of a particular cluster centroid. One of the basic features of shapes and objects is symmetry. Su and Chou have proposed a point symmetry (PS) distance based similarity measure.2 This work was extended in3 in order to overcome some of the limitations existing in.2 It has further been shown in4 that the PS distance proposed in3 also has some serious drawbacks, and a new PS distance (dps) is defined in4 in order to remove these drawbacks. For reducing the complexity of the point symmetry distance computation, a Kd-tree based data structure is used. K-means is a widely used clustering algorithm that was used in2 in conjunction with the earlier PS based distance. However, K-means has three major limitations: it requires the a priori specification of the number of clusters (K), it often gets stuck at suboptimal solutions depending on the initial configuration, and it can detect only hyper-spherical clusters. The real challenge in this situation is to be able to automatically evolve a proper value of K as well as to provide the appropriate clustering of a data set. There exists a Genetic Algorithm (GA) based clustering technique, GCUK-clustering,5 which is able to automatically evolve the appropriate clustering for hyperspherical data sets. However for clusters with
other than hyperspherical shapes, this algorithm is likely to fail, as it uses, like K-means, the Euclidean distances of the points from the respective cluster centroids for computing the fitness value. In this article a variable string length GA based clustering technique is proposed. Here assignment of points to different clusters is done based on a point symmetry based distance rather than the Euclidean distance. This enables the proposed algorithm to automatically evolve the appropriate clustering of all types of clusters, both convex and non-convex, which have some symmetrical structure. The chromosome encodes the centres of a number of clusters, whose value may vary. A new cluster validity index named Sym-index is utilized for computing the fitness of the chromosomes. The effectiveness of the proposed genetic clustering technique for evolving the appropriate partitioning of a dataset is demonstrated on four artificial and one real-life data sets having different characteristics.

2. Proposed Algorithm (VGAPS)

In this section the new clustering algorithm is described in detail. It includes determination of the number of clusters as well as the appropriate clustering of the data set. This genetic clustering technique is subsequently referred to as variable length genetic clustering technique with point symmetry based distance (VGAPS). In GAs, the parameters of the search space are encoded in the form of strings (called chromosomes). A collection of such strings is called a population. Initially a random population is created, which represents different points in the search space. An objective/fitness function is associated with each string that represents the degree of goodness of the solution encoded in the string. Based on the principle of survival of the fittest, a few of the strings are selected and each is assigned a number of copies that go into the mating pool. Biologically inspired operators like crossover and mutation are applied on these strings to yield a new population. The process of selection, crossover, and mutation continues for a fixed number of generations or till a termination condition is satisfied.

2.1. String representation
The chromosomes in VGAPS are made up of real numbers (representing the coordinates of the centres) as well as the don't care symbol '#'. Note that real encoding of the chromosome is adopted since it is more natural to the problem under consideration. The value of K is assumed to lie in the range [K_min, K_max], where K_min is chosen equal to 2 unless specified otherwise. The length of a string is taken to be K_max, where each individual gene position represents either an actual center or a don't care symbol.

2.2. Population initialization
For each string i in the population (i = 1, ..., P, where P is the size of the population), a random number K_i in the range [K_min, K_max] is generated. This string is assumed to encode the centres of K_i clusters. For initializing these centres, K_i points are chosen randomly from the data set. These points are distributed randomly in the chromosome. Let us consider the following example. Example: Let K_min = 2 and K_max = 10. Let the random number K_i be equal to 4 for chromosome i. Then this chromosome will encode the centers of 4 clusters. Let the 4 cluster centers (4 randomly chosen points from the data set) be (10.0, 5.0), (20.4, 13.2), (15.8, 2.9), (22.7, 17.7). On random distribution of these centers in the chromosome, it may look like # (20.4, 13.2) # # (15.8, 2.9) # (10.0, 5.0) (22.7, 17.7) # #.
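The initialization step can be sketched as follows. This is a minimal illustration; the function name and the representation of centres as coordinate tuples are assumptions, not the authors' code.

```python
import random

def init_population(data, P, k_min, k_max):
    """Create an initial population of P variable-length strings.

    Each string has k_max gene slots; K_i of them hold cluster centres
    chosen randomly from the data set, the remaining slots hold the
    don't-care symbol '#'."""
    population = []
    for _ in range(P):
        k_i = random.randint(k_min, k_max)        # number of encoded clusters
        chrom = ['#'] * k_max
        centres = random.sample(data, k_i)        # K_i distinct data points
        slots = random.sample(range(k_max), k_i)  # scatter them over the string
        for pos, centre in zip(slots, centres):
            chrom[pos] = centre
        population.append(chrom)
    return population
```

Because every string has the same physical length k_max, crossover and mutation can treat each slot uniformly while the number of active centres still varies from string to string.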
2.3. Fitness computation
This is composed of two steps. First, points are assigned to different clusters using the newly developed point symmetry based distance dps. Next, the cluster validity index, Sym-index, is computed and used as a measure of the fitness of the chromosome.

2.3.1. Newly developed point symmetry based distance, dps

The proposed point symmetry based distance dps(x, c) associated with a point x with respect to a center c is defined as follows. The symmetrical (reflected) point of x with respect to a particular centre c is 2c − x. Let us denote this by x*. Let the first and second unique nearest neighbors of x* be at Euclidean distances d1 and d2 respectively. Then

dps(x, c) = ((d1 + d2)/2) × de(x, c)    (1)

where de(x, c) is the Euclidean distance between the point x and c. Some important observations about the proposed point symmetry distance dps(x, c) are as follows: (1) Instead of computing the Euclidean distance between the original reflected point x* = 2c − x and its first nearest neighbor as in2 and,3 here the average distance between x* and its first and second unique nearest neighbors is taken. Consequently the term (d1 + d2)/2 will never be equal to 0, and the effect of de(x, c), the Euclidean distance, will always be considered. (2) Considering both d1 and d2 in the computation of dps makes the PS-distance more robust and noise resistant. From an intuitive point of view, if both d1 and d2 of x with respect to c are small, then the likelihood that x is symmetrical with respect to c increases. This is not the case when only the first nearest neighbor is considered, which could mislead the method in noisy situations. It is evident that the symmetrical distance computation is very time consuming: computation of dps(x_i, c) is of complexity O(n). In order to compute the nearest neighbor distance of the reflected point of a particular data point with respect to a cluster center efficiently, we have used Kd-tree based nearest neighbor search.
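Equation 1 can be sketched directly. A brute-force nearest-neighbour search is used here for clarity; the paper uses a Kd-tree, described next, for the same query. For simplicity the sketch does not exclude the query point itself, which the "unique" neighbours of the original definition may require.

```python
import numpy as np

def d_ps(x, c, data):
    """Point-symmetry distance of Eq. (1): reflect x through centre c,
    find the two nearest neighbours of the reflected point among the
    data, and weight their mean distance by the Euclidean distance
    d_e(x, c)."""
    x = np.asarray(x, float)
    c = np.asarray(c, float)
    data = np.asarray(data, float)
    x_star = 2 * c - x                             # reflected point
    dists = np.linalg.norm(data - x_star, axis=1)  # brute-force NN search
    d1, d2 = np.sort(dists)[:2]                    # first and second neighbours
    return (d1 + d2) / 2.0 * np.linalg.norm(x - c)
```

If the data contains an exact mirror image of x about c, then d1 = 0 but d2 > 0, so dps stays positive and the Euclidean term de(x, c) is never lost — observation (1) above.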
Kd-tree Based Nearest Neighbor Computation: A K-dimensional tree, or Kd-tree, is a space-partitioning data structure for organizing points in a K-dimensional space. A Kd-tree uses only splitting planes that are perpendicular to one of the coordinate axes. ANN (Approximate Nearest Neighbor) is a library written in C++6 which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions. The ANN library implements a number of different data structures, based on Kd-trees and box-decomposition trees, and employs a couple of different search strategies; the Kd-tree data structure has been used in this article. Here ANN is used to find d1 and d2 in Equation 1 efficiently.
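The kind of exact search ANN provides can be illustrated in a few lines. This toy version (build by cycling through the splitting axes, search with splitting-plane pruning) is only a sketch of the data structure's idea, not the ANN library's implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build(points, depth=0):
    """Recursively build a Kd-tree: at each level split on the median
    along one coordinate axis, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (exact search)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(point, target) < dist(best, target):
        best = point
    axis = depth % len(target)
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, depth + 1, best)
    # descend the far side only if the splitting plane is closer than best
    if abs(target[axis] - point[axis]) < dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```

The pruning test is what gives the expected O(log n) query time on well-distributed data, versus the O(n) brute-force scan noted above for dps.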
2.3.2. Assignment of points

Each point x_i, 1 ≤ i ≤ n, is assigned to cluster k iff dps(x_i, c_k) < dps(x_i, c_j), j = 1, ..., K, j ≠ k, and (dps(x_i, c_k)/de(x_i, c_k)) < θ. For (dps(x_i, c_k)/de(x_i, c_k)) > θ, point x_i is assigned to some cluster m iff de(x_i, c_m) < de(x_i, c_j), j = 1, 2, ..., K, j ≠ m. In other words, a point x is assigned to that cluster with respect to whose center its PS-distance is the minimum, provided this value is less than some threshold θ. Otherwise, assignment is done based on the minimum Euclidean distance criterion, as normally used in5 or the K-means algorithm. The value of θ is kept equal to the maximum nearest neighbor distance among all the points in the data set. It is to be noted that if a point is indeed symmetric with respect to some cluster centre then the symmetrical distance computed in the above way will be small, and can be bounded as follows. Let d_NN^max be the maximum nearest neighbor distance in the data set. That is

d_NN^max = max_{i=1,...,N} d_NN(x_i)    (2)

where d_NN(x_i) is the nearest neighbor distance of x_i. Assuming that x* lies within the data space, it may be noted that

d1 ≤ d_NN^max    (3)

Ideally, a point x is exactly symmetrical with respect to some c if d1 = 0. However, considering the uncertainty of the location of a point as the sphere of radius d_NN^max around x, we have kept the threshold θ equal to d_NN^max. Thus the computation of θ is automatic and does not require user intervention. After the assignments are done, the cluster centres encoded in the chromosome are replaced by the mean points of the respective clusters.

2.3.3. Fitness calculation

The fitness of a chromosome is computed using the newly developed Sym-index. Let the K cluster centres be denoted by c_i, 1 ≤ i ≤ K, and let n_i be the number of points in cluster i. Then the Sym index is defined as

Sym(K) = (1/K) × (1/E_K) × D_K    (4)

where K is the number of clusters. Here,

E_K = sum_{i=1}^{K} E_i    (5)

such that

E_i = sum_{j=1}^{n_i} d*ps(x_j, c_i)    (6)

and

D_K = max_{i,j=1,...,K} ||c_i − c_j||    (7)

D_K is the maximum Euclidean distance between two cluster centers among all centres. d*ps(x_j, c_i) is computed by Equation 1 with some constraint: the first two nearest neighbors of x*_j = 2c_i − x_j are now searched only among the points which are already in cluster i, i.e., the first and second nearest neighbors of the reflected point x*_j of the point x_j with respect to c_i should belong to the i-th cluster. The objective is to maximize this index in order to obtain the actual number of clusters and to achieve proper clustering. The fitness function for chromosome j is defined as 1/Sym_j, where Sym_j is the Sym index computed for the chromosome. Note that minimization of the fitness value ensures maximization of the Sym index. Explanation: As formulated in Equation 4, Sym is a composition of three factors: 1/K, 1/E_K and D_K. The first factor increases as K decreases; as Sym needs to be maximized for optimal clustering, it will prefer to decrease the value of
K. The second factor is the within-cluster total symmetrical distance. For clusters which have good symmetrical structure, the E_K value is small. This, in turn, indicates that the formation of more clusters, which are symmetrical in shape, would be encouraged. Finally the third factor, D_K, measuring the maximum separation between a pair of clusters, increases with the value of K. Note that the value of D_K is bounded above by the maximum separation between a pair of points in the data set. As these three factors are complementary in nature, they are expected to compete and balance each other critically for determining the proper partitioning.

2.4. Genetic Operations
The following genetic operations are performed on the population of strings for a number of generations. Selection: Conventional proportional selection is applied on the population of strings. Here, a string receives a number of copies that is proportional to its fitness in the population. Crossover: During crossover each cluster centre is considered to be an indivisible gene. Single point crossover, applied stochastically with probability μc, is explained below with an example. Example: Suppose crossover occurs between the following two strings: # (20.4, 13.2) # # (15.8, 2.9) | # (10.0, 5.0) (22.7, 17.7) # # and (13.2, 15.6) # # # (5.3, 13.7) | # (10.5, 16.2) (7.9, 15.3) # (18.3, 14.5). Let the crossover position be 5, as shown above. Then the offspring are # (20.4, 13.2) # # (15.8, 2.9) | # (10.5, 16.2) (7.9, 15.3) # (18.3, 14.5) and (13.2, 15.6) # # # (5.3, 13.7) | # (10.0, 5.0) (22.7, 17.7) # #. Mutation: Mutation is applied on each chromosome with probability μm. Mutation is of three types. (1) Each valid position (i.e., which is not '#') in a chromosome is mutated with probability μm in the following way. A number δ in the range [0, 1] is generated with uniform distribution. If the value at that position is v, then after mutation it becomes v × (1 ± 2δ) if v ≠ 0; otherwise, for v = 0, it becomes ±2δ. The '+' or '−' sign occurs with equal probability. (2) One randomly generated valid position is removed and replaced by '#'. (3) One
randomly chosen invalid position is replaced by a randomly chosen point from the data set. Any one of the above mentioned types of mutation is applied randomly on a particular chromosome if it is selected for mutation.

2.5. Termination Criterion

In this article the processes of fitness computation, selection, crossover, and mutation are executed for a maximum number of generations. The best string having the lowest fitness (i.e., largest Sym index value) seen up to the last generation provides the solution to the clustering problem. We have implemented elitism at each generation by preserving the best string seen up to that generation in a location outside the population. Thus on termination, this location contains the centres of the final clusters.

3. Implementation Results

The experimental results showing the effectiveness of the VGAPS algorithm are provided for five artificial and one real-life data sets. The description of the data sets is given in Table 1. Data_6_2 and Data_4_3 are used in7 while the other data sets can be obtained on request to the authors. The Cancer data set is obtained from [www.ics.uci.edu/~mlearn/MLRepository.html]. Each pattern of the Cancer dataset has nine features. There are two categories in the data: malignant and benign. The two classes are known to be linearly separable. There are a total of 683 data points in the data set. VGAPS is implemented with the following parameters (determined after some experimentation): μc = 0.8, μm = 0.02. The population size P is taken to be equal to 100. K_min and K_max are set equal to 2 and √n respectively, where n is the total number of data points in the particular data set. VGAPS is executed for a total of 30 generations. Note that it is shown in8 that if exhaustive enumeration is used to solve a clustering problem with n points and K clusters, then one requires to evaluate (1/K!) sum_{j=1}^{K} (−1)^{K−j} C(K, j) j^n partitions. For a data set of size 50 with 2 clusters this value is 2^49 − 1 (i.e., of the order of 10^14). If the number of clusters is not specified a priori, then the search space becomes even larger and the utility of GAs is all the more evident. For
all the data sets, as is evident from Table 1, VGAPS is able to find the appropriate number of clusters and the proper partitioning. Figures 1, 2, 3, 4 and 5 show the final clustered results obtained after application of VGAPS on Sym_5_2, Sym_3_2, Ring_3_2, Data_6_2 and Data_4_3. For the Cancer data set it is not possible to show the clustered result visually. The obtained cluster centres are (3.013453, 1.266816, 1.378924, 1.304933, 2.056054, 1.293722, 2.080717, 1.215247, 1.105381) and (7.130802, 6.696203, 6.670886, 5.700422, 5.451477, 7.780591, 6.012658, 5.983122, 2.540084) respectively, while the actual cluster centres are (2.9640, 1.3063, 1.4144, 1.3468, 2.1081, 1.3468, 2.0833, 1.2613, 1.0653) and (7.1883, 6.5774, 6.5607, 5.5858, 5.3264, 7.6276, 5.9749, 5.8577, 2.6025) respectively. Table 1 also shows the performance of GCUK-clustering5 optimizing the Davies-Bouldin index (DB-index) for all the data sets. As is evident, GCUK-clustering is able to detect the proper number of clusters as well as the proper clustering for Data_6_2 and Data_4_3, but it fails for Sym_5_2, Sym_3_2 and Ring_3_2. Figures 6, 7 and 8 show the clustering results obtained by GCUK-clustering on Sym_5_2, Sym_3_2 and Ring_3_2 respectively. GCUK-clustering obtained, incorrectly, 6, 2 and 5 clusters for these three data sets respectively. Clustering results on Data_6_2 and Data_4_3 obtained by GCUK-clustering are the same as those of VGAPS and are therefore omitted. To compare the performance of VGAPS with that of GCUK-clustering5 for the real-life data set, the Minkowski Score (MS)9 is calculated after application of both the algorithms. MS is a measure of the quality of a solution given the true clustering. For MS, the optimum score is 0, with lower scores being "better". For the Cancer dataset, the MS score is 0.3233 for VGAPS and 0.37 for GCUK-clustering.
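Ref. 9 defines MS through the pairwise co-membership matrices T (true clustering) and S (computed solution) as MS = ||T − S||/||T||. The sketch below computes it by pair counting; it is a hypothetical helper for illustration, not the authors' code.

```python
import math
from itertools import combinations

def minkowski_score(true_labels, pred_labels):
    """MS = ||T - S|| / ||T||, with T and S the pairwise co-membership
    matrices of the true and computed clusterings; 0 is a perfect match,
    lower is better."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        t = true_labels[i] == true_labels[j]   # together in the true clustering?
        s = pred_labels[i] == pred_labels[j]   # together in the solution?
        if t and s:
            n11 += 1
        elif t:
            n10 += 1                           # split by the solution
        elif s:
            n01 += 1                           # wrongly merged by the solution
    return math.sqrt((n01 + n10) / (n11 + n10))
```

Because only co-membership of pairs is compared, the score is independent of how the cluster labels themselves are named, which is why it can compare VGAPS and GCUK-clustering outputs directly against the known classes.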
From the above results it is evident that VGAPS is not only able to find the proper cluster number, but it also provides significantly better clustering (both visually, as in Figures 1-5, and also with respect to the MS scores). Moreover, we have also conducted the statistical test ANOVA, and found that the difference in the mean MS values over ten runs obtained by VGAPS and GCUK-clustering is statistically significant. For Cancer, the difference in mean MS obtained by the two algorithms over ten runs is -4.67E-02, which is statistically significant (significance value
is 0.00).

4. Conclusion

In this paper a genetic algorithm based clustering technique, VGAPS clustering, is proposed which assigns the data points to different clusters based on the point symmetry based distance and can automatically evolve the appropriate clustering of a data set. A newly developed symmetry based cluster validity index named Sym index is utilized for computing the fitness of the chromosomes. The proposed algorithm has the capability of detecting both convex and non-convex clusters that possess the symmetry property. The effectiveness of the clustering technique is demonstrated for several artificial and real life data sets of varying complexities. The experimental results show that VGAPS is able to detect the proper number of clusters as well as the proper clustering from a data set having any type of clusters, irrespective of their geometrical shape and overlapping nature, as long as they possess the characteristic of symmetry.

Table 1. Results obtained with the different data sets using VGAPS and GCUK. Here, #pts, #dim, #AC and #OC denote respectively the number of points in the data set, the number of dimensions, the actual number of clusters, and the obtained number of clusters.
Name       #pts  #dim  #AC  #OC (VGAPS)  #OC (GCUK)
Sym_5_2     850    2    5        5            6
Sym_3_2     600    2    3        3            2
Ring_3_2    350    2    3        3            5
Data_6_2    300    2    6        6            6
Data_4_3    400    3    4        4            4
Cancer      683    9    2        2            2
Fig. 1. Clustered Sym_5_2 using VGAPS where 5 clusters are detected
Fig. 2. Clustered Sym_3_2 using VGAPS where 3 clusters are detected
Fig. 6. Clustered Sym_5_2 using GCUK-clustering where 6 clusters are detected
Fig. 3. Clustered Ring_3_2 using VGAPS where 3 clusters are detected
Fig. 7. Clustered Sym_3_2 using GCUK-clustering where 2 clusters are detected
Fig. 4. Clustered Data_6_2 using VGAPS where 6 clusters are detected
Fig. 8. Clustered Ring_3_2 using GCUK-clustering where 5 clusters are detected
Fig. 5. Clustered Data_4_3 using VGAPS where 4 clusters are detected

References
1. B. S. Everitt, S. Landau and M. Leese, Cluster Analysis (London: Arnold, 2001).
2. M.-C. Su and C.-H. Chou, IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 674 (2001).
3. C. H. Chou, M. C. Su and E. Lai, Symmetry as a new measure for cluster validity, in 2nd WSEAS Int. Conf. on Scientific Computation and Soft Computing, 2002, pp. 209-213.
4. S. Bandyopadhyay and S. Saha, Pattern Recognition (revised).
5. S. Bandyopadhyay and U. Maulik, Pattern Recognition 35, 1197 (2002).
Sriparna Saha and Sanghamitra Bandyopadhyay 6. D. M. Mount and S. Arya, Ann: A library for approximate nearest neighbor searching (2005), http://www.cs.umd.edu/~mount/ANN. 7. U. Maulik and S. Bandyopadhyay, Pattern Recognition 33, 1455 (2000). 8. M. de Berg, M. V. Kreveld, M. Overmars and
O. Schwarzkopf, Cluster Analysis for Application (Academic Press, 1973). 9. A. Ben-Hur and I. Guyon, Detecting stable clusters using principal component analysis in methods in molecular biology (Humana press, 2003).
A Non-Hierarchical Clustering Scheme for Visualization of High Dimensional Data
G. Chakraborty and B. Chakraborty
Faculty of Software and Information Science, Iwate Prefectural University, Japan 020-0193
E-mail: {goutam, basabi}@soft.iwate-pu.ac.jp

N. Ogata
Graduate School of Software and Information Science, Iwate Prefectural University, Japan 020-0193
Clustering algorithms with data visualization capability are needed for discovering structure in multidimensional data. Self Organizing Maps (SOM) are widely used for visualization of multidimensional data. Though SOMs are simple to implement, they need heavy computation as the dimensionality increases. In this work a simple non-hierarchical clustering scheme is proposed for clustering and visualization of high dimensional data in two dimensions. Simple simulation experiments show that the algorithm is effective in clustering and visualization compared to SOM, while taking much less time than SOM. Keywords: Non-hierarchical clustering, Data visualization, Self Organizing Map, Multidimensional Scaling
1. Introduction

Clustering algorithms are needed for data mining tasks in the process of understanding and discovering the natural structure and grouping in a data set.1-3 Clustering also has a wide range of applications in other areas like data compression, information retrieval or pattern recognition/classification. Data visualization techniques provide further assistance in this process by visual representation of the data. In the case of high dimensional data, understanding the structure is difficult, as humans cannot visualize more than three dimensions. Most clustering algorithms do not work efficiently for high dimensional data due to the existence of noisy and irrelevant attributes. High dimensional data sets are often sparse in some dimensions and show clustering tendency in some subspaces. Subspace clustering4,5 is an extension of traditional clustering to find clusters in proper subspaces of high dimensional data. Multidimensional Scaling (MDS) is another approach6 to cluster high dimensional data into a lower dimensional space in data visualization for exploring similarities in data. Much of the work in cluster visualization has been done for hierarchical clustering. For non-hierarchical clustering, Self Organizing Maps (SOM) are popular. SOMs, invented by Kohonen,7 reduce the data dimensions by producing a map in 1 or 2 dimensions and display similarities in the data through the use of self organizing neural networks. SOMs are simple to construct and easy to understand, but the major problem with SOMs is that they are computationally expensive. As the data dimension increases, dimension reduction visualization techniques become more important, but unfortunately the computation time also increases for SOM. In this work a novel non-hierarchical clustering scheme is proposed to lower the computational load, with a data visualization effect comparable to SOM. Our technique is simple to implement and can be implemented for any dimension. The proposed algorithm is presented in the next section, followed by simulation experiments and results in the following section. The final section contains conclusions and directions for improvement.

2. Proposed Clustering Scheme

The main idea of the proposed scheme is as follows. Let us consider a point in two dimensions corresponding to each high dimensional point in the data set. Then any two points in the two dimensions should be moved closer or farther apart (Fig. 1) depending on the similarity or dissimilarity in the original high dimensional space. This process is initiated for any two random points and iterated a large number of times. Eventually the clusters formed in two dimensions produce an image of the clusters present
in the data in the original high dimensional space (Fig. 2). The actual algorithm for computation is as follows.
Fig. 1. Moving two dimensional points according to similarity in original space.
Fig. 2. Start and final configuration of the two dimensional space.
(1) Let us consider D to be the data set containing n m-dimensional data points (X_1, X_2, ..., X_n) to be clustered, where X_i = (x_i1, x_i2, ..., x_im).

(2) Normalization of the data: At the first step, the data points are normalized to (Y_1, Y_2, ..., Y_n) as follows:

y_ij = (x_ij − min_j) / (max_j − min_j)

where y_ij is the normalized value of the j-th component (dimension) of the i-th data point, and min_j = min(x_1j, x_2j, ..., x_nj) and max_j = max(x_1j, x_2j, ..., x_nj) represent the minimum and maximum values of the j-th attribute (dimension) of the data respectively.

(3) Two dimensional data generation: n random two dimensional data points are generated corresponding to the data points in the original m dimensions (left portion of Fig. 2).

(4) Similarity calculation: Any two data points Y_k and Y_l in m dimensions are selected and their similarity is calculated by the Manhattan distance as

M_kl = |Y_k − Y_l|

As the value of the Manhattan distance depends on the number of dimensions, scaling has been done to limit the values to lie between 0 and 0.5 by the following equation:

R_kl = M_kl / (m + M_kl)

R_kl represents the degree of relatedness of the data points Y_k and Y_l.

(5) Moving the data points in two dimensions: Now, according to the value of R_kl, the data points in two dimensions corresponding to Y_k and Y_l are moved. The actual movement value is calculated as

Movement value = a × (1/R_kl) + b

where the parameters a and b can be calculated from the following equations:

α1 = a × (1/0.01) + b    (2)
α2 = a × (1/0.5) + b    (3)

α1 and α2 represent the movement of points in two dimensions for high similarity and low similarity of the original points respectively. These two parameters are to be set by users from the value 0.01 to 0.5. The movement of two dimensional points according to the degree of relatedness of the original points is shown in Fig. 1 and Fig. 3.

(6) The above procedure (step 4 and step 5) is repeated a large number of times until the data are clustered in two dimensions, as shown in Fig. 2.

3. Simulation Experiments and Results

Simple simulation experiments have been done for checking the efficiency of the proposed clustering scheme compared to SOM. Two sets of data are used in the experiment.
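Steps (1)-(6) of the scheme above can be condensed into a sketch. The step size and the rule for choosing between attraction and repulsion are assumptions (the text specifies only the movement magnitude), and a and b are those obtained by the user from Eqs. (2)-(3).

```python
import math
import random

def cluster_2d(data, a, b, iters=100_000, step=0.01):
    """Map n normalized m-dimensional points to 2-D by repeated pairwise
    moves.  Related pairs (small R_kl) are pulled together and unrelated
    pairs pushed apart; the attract/repel threshold and the step scale
    are assumptions made for this sketch."""
    n, m = len(data), len(data[0])
    pos = [[random.random(), random.random()] for _ in range(n)]
    for _ in range(iters):
        k, l = random.sample(range(n), 2)           # pick a random pair
        man = sum(abs(u - v) for u, v in zip(data[k], data[l]))
        r = man / (m + man)                         # relatedness, in [0, 0.5]
        move = a / max(r, 0.01) + b                 # movement value, Eqs. (2)-(3)
        dx = pos[l][0] - pos[k][0]
        dy = pos[l][1] - pos[k][1]
        d = math.hypot(dx, dy) or 1e-9              # avoid division by zero
        sign = 1.0 if r < 0.25 else -1.0            # attract related pairs (assumed)
        pos[k][0] += sign * move * step * dx / d
        pos[k][1] += sign * move * step * dy / d
        pos[l][0] -= sign * move * step * dx / d
        pos[l][1] -= sign * move * step * dy / d
    return pos
```

For example, choosing α1 = 0.5 and α2 = 0.01 in Eqs. (2)-(3) gives a = 0.005 and b = 0, so highly related pairs move fifty times farther per iteration than unrelated ones.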
A Non-Hierarchical Clustering Scheme for Visualization of High Dimensional Data
Fig. 5 shows the result of SOM on the vowel data and Fig. 6 shows the result of the proposed clustering scheme. It is seen that in two dimensions there are ten discrete clusters corresponding to the ten vowel classes. Though the results are comparable to the output of SOM, which also clearly shows ten clusters, the time taken for clustering by the proposed scheme is far less than that of SOM.
Fig. 3. Movement value according to degree of relatedness.
3.1. Vowel data set

This data set contains three-dimensional vowel data (three formant frequencies) of 10 vowels uttered several times by different speakers, a total of 300 data points.8 The three-dimensional view of the data set is shown in Fig. 4. The proposed algorithm and SOM are both simulated with this data. For SOM the number of learning epochs is 10,000; for the proposed algorithm, 100,000 iterations are used. Though the number of iterations is larger for the proposed algorithm than for SOM, the total time taken is less, as one iteration of the proposed algorithm takes less time than one iteration of SOM. The total time taken by SOM is around 2700 sec, while for the proposed algorithm it is nearly 100 sec.
Fig. 5. Result of Vowel data by SOM.
Fig. 4. Vowel data set.

Fig. 6. Result of Vowel data for the Proposed Algorithm.

3.2. Fisher Iris Data
The proposed algorithm has also been simulated with the Fisher Iris data.9 This is a four-dimensional data set containing three classes of Iris plant (Iris Setosa, Iris Versicolour, Iris Virginica), each with 50 sample points, a total of 150 data points. As the data set is four dimensional, it is impossible to visualize in four dimensions.

G. Chakraborty, B. Chakraborty and N. Ogata

Simulation experiments for clustering by SOM and by the proposed scheme have been done for 10,000 and 100,000 iterations respectively, as before. Fig. 7 and Fig. 8 represent the results of SOM and of the proposed scheme. In this case also the results show that the clustering result of the proposed algorithm is similar to that obtained by SOM, but the time taken by the proposed clustering scheme is less than that taken by SOM.
Fig. 7. Result of the Iris data by SOM.
Fig. 8. Result of Iris data by Proposed Algorithm.
4. Conclusion

In this work a non-hierarchical clustering scheme has been proposed for visualization of multidimensional data in two dimensions. The clusters generated in two dimensions retain the properties of the data, i.e. the similarity/dissimilarity between the data points in the original dimension. The algorithm is simple and less time consuming than the more widely used Self-Organizing Map for non-hierarchical clustering. The clustering results obtained by SOM and by the proposed algorithm are almost similar. In the present work we have done simulations with 3- and 4-dimensional data sets, but the algorithm is applicable to data sets of larger dimension. The computational load of the proposed algorithm grows slowly with dimension compared to that of SOM, since the algorithm uses the Manhattan distance for similarity measurement. With the use of the Euclidean distance the computational load increases, but the clustering result does not differ much; so we preferred the Manhattan distance as the similarity measure. In this simple experiment, in both cases, it is found that the resulting clusters clearly show the data structure, like SOM, but take much less time than SOM for execution. Further experiments with data of higher dimension have to be done to justify the efficiency of the algorithm. Moreover, we have only experimented with data sets which have clear clusters; we also need to experiment with data sets with overlapping clusters to see the effect of the proposed algorithm. In our simulations we decided at the start to iterate the process 100,000 times; we need to set a proper stopping criterion depending on some cluster validity measure. We are currently working on this problem and hope to report on it in the near future.

References
1. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data (Prentice-Hall, Englewood Cliffs, NJ, 1988).
2. L. P. Wang and X. J. Fu, Data Mining with Computational Intelligence (Springer, Berlin, 2005).
3. V. S. Tseng and C. P. Kao, 'Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method', IEEE/ACM Transactions on Computational Biology and Bioinformatics 2, 355-365, (2005).
4. L. Parsons, E. Haque and H. Liu, ACM SIGKDD Explorations Newsletter 6(1), 90, (2004).
5. X. J. Fu and L. P. Wang, 'Data dimensionality reduction with application to simplifying RBF network structure and improving classification performance', IEEE Trans. Systems, Man, and Cybernetics, Part B 33, 399-409, (2003).
6. J. Kruskal and M. Wish, Multidimensional Scaling (Sage Publications, London, 1978).
7. T. Kohonen, Self-Organizing Maps (Springer-Verlag, New York, 1997).
8. UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/databases/).
9. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, NY, 1981).
10. J. Lee and D. Lee, 'An improved cluster labeling method for support vector clustering', IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 461-464, (2005).
An Attribute Partitioning Approach to Correlation Connected Clusters
Vijaya Kumar Kadappa†,* and Atul Negi*
† Department of Computer Applications, Vasavi College of Engineering, Hyderabad 500031, India. Email: [email protected], kadappakumar@gmail.com
* Department of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India. Email: [email protected], [email protected]
Correlation connected clusters are groups of objects formed on the basis of correlations in local subsets of the data. The method 'Computing Clusters of Correlation Connected objects' (4C)1 uses DBSCAN and Principal Component Analysis (PCA) to find such objects. In this paper, we present a novel approach that attempts to find correlation connected clusters using an attribute partitioning approach to PCA. We prove that our novel approach is computationally superior to the 4C method.

Keywords: Clustering, DBSCAN, PCA, SubXPCA, correlation connected clusters, attribute partitioning
1. Introduction

Cluster analysis is widely used for data exploration in data mining. A recently proposed clustering technique, 'Computing Clusters of Correlation Connected objects' (the 4C technique), is based on DBSCAN2 and Principal Component Analysis (PCA). The 4C method has proved useful on various data sets, for example in molecular biology and time sequences. 4C finds correlations which are local to subsets of the data rather than global, since the dependency between features can differ for different subgroups of the data set. The 4C method was shown to be superior to DBSCAN, CLIQUE, etc.1 However, 4C is not suitable for high dimensional data due to the inability of DBSCAN to handle such data. PCA is a crucial aspect of the 4C method: it is used to find correlations in the neighbourhood of a core object and plays a vital role in finding correlated clusters. A clustering approach based on subsets of attributes was previously presented by Friedman and Meulman.3 Following this line of thinking, SubXPCA4 was proposed to improve PCA. SubXPCA finds local subsets of attributes and combines them globally as PCA does. More importantly, SubXPCA4 was found to be computationally superior to PCA for high dimensional data. In this paper we attempt to improve the 4C method based upon our insight into the advantages of the attribute partitioning approach to PCA (i.e. SubXPCA) for high dimensional data, and we prove the computational superiority of our approach over 4C.
The organization of the paper is as follows. In section 2, we review the salient aspects of 4C,1 and a review of SubXPCA is presented in section 3. We propose 'an Attribute Partitioning approach to Correlation Connected Clusters' (AP-4C) and prove the time efficiency of AP-4C over 4C in section 4. We conclude with some comments in section 5.

2. Computing Clusters of Correlation Connected objects (4C)

We review the 4C method briefly here; a detailed discussion may be found in Böhm et al.'s work.1 Consider X = {X_1, ..., X_N}, the set of N patterns, each of dimensionality d.

4C algorithm: We use the definitions of core object, correlation dimension, direct correlation-reachability, N_ε^C(X_i), etc., as defined by Böhm et al.,1 and restate them here.

Correlation dimension: Let S ⊆ X, λ ≤ d, V = {λ_1, ..., λ_d} the eigenvalues of S in descending order, and δ ∈ ℜ (δ ≈ 0). S forms a λ-dimensional linear correlation set w.r.t. δ if at least d − λ eigenvalues of S are close to zero. Let S ⊆ X be a linear correlation set w.r.t. δ ∈ ℜ. The number of eigenvalues with λ_i > δ is called the correlation dimension of S.

Direct correlation-reachability (DirReach(P2, P1)): Let ε, δ ∈ ℜ and μ, λ ∈ ℕ. A point P1 ∈ D is direct correlation-reachable from a point P2 ∈ D w.r.t. ε, μ, δ and λ if P2 is a correlation core object, the correlation dimension
of the ε-neighbourhood of P1, N_ε(P1), is at most λ, and P1 ∈ N_ε^{C_P2}(P2). C_P2 is the correlation similarity matrix, N_ε^{C_P}(P) = {X_i ∈ D | max{dist_P(P, X_i), dist_{X_i}(X_i, P)} ≤ ε}, where dist_P(P, X_i) = √((P − X_i) C_P (P − X_i)^T).

Input: X, pattern (object) set; ε, neighbourhood radius; μ, minimum number of points in the ε-neighbourhood; λ, upper bound for the correlation dimension; δ, threshold to select the correlation dimension.
Output: Every object is assigned a cluster-id or marked as noise.
Assumption: Each pattern (object) in X is marked as unclassified initially.
Method: For each unclassified pattern (object) X_i ∈ X, repeat steps 1-3.
1. Test whether X_i is a correlation core object as follows:
1.1. Compute the ε-neighbourhood of X_i, N_ε(X_i).
1.2. If the number of elements in N_ε(X_i) ≥ μ, then
1.2.1. Compute the covariance matrix of the patterns in N_ε(X_i), i.e. M_{X_i}.
1.2.2. If the correlation dimension of N_ε(X_i) ≤ λ, then
1.2.2.1. Compute the correlation similarity matrix M_{X_i} and N_ε^{M_{X_i}}(X_i).
1.2.2.2. Test if the number of patterns in N_ε^{M_{X_i}}(X_i) ≥ μ.
2. If X_i is a correlation core object, then expand a new cluster as follows:
2.1. Generate a new cluster-id.
2.2. Insert all X_j ∈ N_ε^{M_{X_i}}(X_i) into a queue Q.
2.3. While Q is not empty, repeat steps 2.3.1-2.3.4.
2.3.1. E = first object in Q.
2.3.2. Compute R = {X_j ∈ X | DirReach(E, X_j)}, where DirReach(E, X_j) indicates that X_j is direct correlation-reachable from the correlation core object E.
2.3.3. For each X_j ∈ R, repeat steps 2.3.3.1-2.3.3.2.
2.3.3.1. If X_j is unclassified or noise, then assign the current cluster-id to X_j.
2.3.3.2. If X_j is unclassified, then insert X_j into the queue Q.
2.3.4. Remove E from the queue Q.
3. If X_i is not a correlation core object, then mark X_i as noise.
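The core-object test of steps 1.1-1.2.2 can be sketched as follows. This is a hedged sketch, not the authors' implementation: the function names are ours, and a plain Euclidean ε-neighbourhood is used in place of the full correlation-similarity machinery (step 1.2.2.1 onward).

```python
import numpy as np

def eps_neighbourhood(X, i, eps):
    """Indices of points within Euclidean distance eps of X[i] (step 1.1)."""
    d = np.linalg.norm(X - X[i], axis=1)
    return np.where(d <= eps)[0]

def correlation_dimension(S, delta):
    """Number of eigenvalues of the covariance matrix of S exceeding delta,
    i.e. the number of 'strong' directions of the local correlation."""
    C = np.cov(S, rowvar=False)
    evals = np.linalg.eigvalsh(C)
    return int(np.sum(evals > delta))

def is_correlation_core(X, i, eps, mu, lam, delta):
    """Core-object test: enough neighbours, and locally at most lam
    strong directions (steps 1.1-1.2.2)."""
    nbrs = eps_neighbourhood(X, i, eps)
    if len(nbrs) < mu:
        return False
    return correlation_dimension(X[nbrs], delta) <= lam
```

For points lying near a line in 3-D, the local covariance has one dominant eigenvalue, so the correlation dimension is 1 and the points pass the test with λ = 1.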
The 4C method was found to be very useful for finding correlations in local subsets of data, e.g. in microbiology and e-commerce. 4C makes use of PCA to find correlations in the data set, but PCA is not suitable for high dimensional data; hence 4C consumes a large amount of time on such data sets. To counter this problem, we make use of SubXPCA, a computationally more efficient variation of PCA, which is described in the following section.

3. Cross-sub-pattern Correlation based Principal Component Analysis (SubXPCA)

We review SubXPCA briefly here; a detailed discussion may be found in the work of Negi and Kadappa.4

SubXPCA Algorithm: In the following steps, the indices range as 1 ≤ i ≤ N; 1 ≤ j ≤ k; 1 ≤ p ≤ d_i; 1 ≤ s ≤ kr.
1. Apply classical PCA to each sub-pattern set as follows:5
1.1. Partition each datum X_i suitably into k (≥ 2) equally-sized sub-patterns, each of size d_i (= d/k); SP_j is the set of jth sub-patterns of X_i, ∀i.
1.2. For every SP_j, repeat steps 1.2.1 to 1.2.4.
1.2.1. Compute the covariance matrix (C_j)_{d_i×d_i}.
1.2.2. Compute the eigenvalues (λ_p^j) and corresponding eigenvectors (e_p^j).
1.2.3. Select the r (≤ d_i) eigenvectors corresponding to the r largest eigenvalues obtained in step 1.2.2. Let E_j be the set of r eigenvectors (column vectors) selected in this step.
1.2.4. Extract r local features (PCs) from SP_j by projecting SP_j onto E_j. Let Y_j be the reduced data in this step, given by

(Y_j)_{N×r} = (SP_j)_{N×d_i} (E_j)_{d_i×r}

1.3. Collate the Y_j, ∀j, as shown below. Let Z denote the combined data:

Z_i = (y_1(i,1), y_1(i,2), ..., y_1(i,r), ..., y_k(i,1), ..., y_k(i,r))

where Z_i is the ith row of Z, which corresponds to X_i, and (y_j(i,1), y_j(i,2), ..., y_j(i,r)) is the ith row of Y_j.
2. Apply classical PCA on Z obtained in step 1.3.
2.1. Compute the final covariance matrix (C^F)_{kr×kr} for the data Z.
2.2. Compute the eigenvalues (λ_s^F) and corresponding eigenvectors (e_s^F).
2.3. Select the r^F (≤ kr) eigenvectors corresponding to the r^F largest eigenvalues obtained in step 2.2. Let E^F be the set of r^F eigenvectors selected in this step.
2.4. Extract r^F global features (PCs) by projecting Z (obtained in step 1.3) onto E^F. Let Z^F be the data obtained after projection, given by

(Z^F)_{N×r^F} = (Z)_{N×kr} (E^F)_{kr×r^F}

2.5. Z^F is the final reduced pattern data (corresponding to the original data X), which can be used for subsequent classification, recognition, etc.

SubXPCA was proved to be computationally more efficient than PCA, and successful classification results on 5 data sets from the UCI Machine Learning repository were also presented by Negi and Kadappa.4

Vijaya Kumar Kadappa and Atul Negi

4. An Attribute Partitioning approach to Correlation Connected Clusters (AP-4C)

In this section we present our clustering method, AP-4C, which is based on DBSCAN and SubXPCA.

4.1. AP-4C Algorithm

To find the correlation dimension of the ε-neighbourhood of a core object (see step 1 of sec. 2), the 4C method uses classical PCA, which is computationally expensive. To ease this problem, the AP-4C method uses SubXPCA to find the eigenvalues and eigenvectors used to compute the correlation dimension. The algorithm is as follows.

Input: X, pattern (object) set; ε, neighbourhood radius; μ, minimum number of points in the ε-neighbourhood; λ, upper bound for the correlation dimension; δ, threshold to select the correlation dimension.
Output: Every object is assigned a cluster-id or marked as noise.
Assumption: Each pattern (object) in X is marked as unclassified initially.
Method: For each unclassified pattern (object) X_i ∈ X, repeat steps 1-3.
1. Test whether X_i is a correlation core object as follows:
1.1. Compute the ε-neighbourhood of X_i, N_ε(X_i).
1.2. If the number of elements in N_ε(X_i) ≥ μ, then
1.2.1. Find the correlation dimension of N_ε(X_i) using SubXPCA4 as follows:
1.2.1.1. Partition each pattern in N_ε(X_i) into k sub-patterns, find the sub-covariance matrices, project the sub-patterns onto them, and find the final covariance matrix (C_{X_i}) of the same neighbourhood, as given in steps 1.2.1 and 2.1 of sec. 3.
1.2.1.2. Compute the eigenvalues and eigenvectors of C_{X_i} obtained in step 1.2.1.1.
1.2.1.3. Count the number of eigenvalues greater than δ. If the count ≤ λ, then
1.2.1.3.1. Compute the correlation similarity matrix C_{X_i} and N_ε^{C_{X_i}}(X_i).
1.2.1.3.2. Test if the number of patterns in N_ε^{C_{X_i}}(X_i) ≥ μ.
Steps 2 and 3 are the same as in the 4C method (see sec. 2), except that M is replaced with C; hence for brevity we do not reproduce them here. To give a better conceptual comprehension of our method, we summarize it in figure 1.

4.2. Time complexities of 4C and AP-4C
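Steps 1-2 of SubXPCA can be sketched as follows. This is a minimal NumPy sketch under our own naming, not the authors' implementation; it applies PCA per sub-pattern set, collates the local features, and applies a final PCA on the collated data.

```python
import numpy as np

def pca_project(Y, r):
    """Project (mean-centred) data Y onto its top-r principal axes."""
    C = np.cov(Y, rowvar=False)
    evals, evecs = np.linalg.eigh(C)       # eigh returns ascending order
    E = evecs[:, ::-1][:, :r]              # top-r eigenvectors as columns
    return (Y - Y.mean(axis=0)) @ E

def subxpca(X, k, r, rF):
    """Sketch of SubXPCA: local PCA on each of k equal-sized sub-patterns
    (r PCs each, step 1), then a global PCA (rF PCs) on the collated
    local features (step 2)."""
    N, d = X.shape
    assert d % k == 0, "d must be divisible by k"
    di = d // k
    # step 1: r local features from each sub-pattern set SP_j, collated into Z
    Z = np.hstack([pca_project(X[:, j * di:(j + 1) * di], r) for j in range(k)])
    # step 2: rF global features Z^F from the N x (k*r) collated data
    return pca_project(Z, rF)
```

Each covariance matrix here is only d_i × d_i (plus one kr × kr final matrix), which is the source of the cost advantage analysed next.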
In PCA techniques where the covariance matrix is computed explicitly, a large amount of time is spent in computing the covariance matrix, and an insignificant amount of time on other tasks such as finding eigenvalues. Hence, we focus on the time complexity of the covariance matrices of PCA and SubXPCA as used in the 4C and AP-4C methods respectively. Consider X = {X_1, ..., X_N}, the set of N patterns of size d.

Time complexity to calculate the covariance matrix (C) in PCA, T_C: it involves the computation of d(d + 1)/2 parameters, a total of N d(d + 1)/2 operations, so T_C = O(N d²).

The time complexity to calculate the k sub-covariance matrices (C_j) in SubXPCA, T_{F1}, is given by

T_{F1} = O(k N d_i²)    (1)

where k is the number of sub-patterns per pattern and O(N d_i²) is the time complexity of calculating one sub-pattern covariance matrix C_j. The time complexity to calculate the final covariance matrix (C^F) in SubXPCA, T_{F2}, is given by

T_{F2} = O(N k² r²)    (2)
Fig. 1. The flow chart of the proposed AP-4C algorithm.

where r ≤ d_i and d_i is the sub-pattern size. From eqs. (1) and (2), the time complexity to calculate all covariance matrices of SubXPCA, T_F, is given by

T_F = O(k N d_i²) + O(N k² r²)    (3)
The time complexity of the 4C method, T_{4C}, is discussed in Ref. 1 and is reproduced here:

T_{4C} = O(N T_C + N d³) = O(N² d² + N d³)    (4)
Along the same lines, the time complexity of the AP-4C method, T_{AP}, is given by

T_{AP} = O(N T_F + N k d_i³) = O(N(k N d_i² + N k² r²) + N k d_i³)    (5)

where d_i is the sub-pattern size and k is the number of sub-patterns per pattern (see sec. 3).
Theorem 4.1. T_F < T_C, ∀r < d_i√((k−1)/k), where k (2 ≤ k ≤ d/2) is the number of sub-patterns per pattern, r is the number of chosen projection vectors (eigenvectors) per SP_j and d_i is the sub-pattern size (see step 1 in sec. 3).

Proof. We wish to show that T_F < T_C. From eq. (3),

T_F = k N d_i² + N(kr)²
    = (1/k) N d² + (k²/d²)[N d²] r²    (since d_i = d/k)
    = (1/k) N d² + (r²/d_i²)[N d²]    (since d_i = d/k)
    = (1/k + r²/d_i²)[N d²]    (6)

Clearly, T_F < N d² if (1/k + r²/d_i²) < 1, i.e. if r²/d_i² < (1 − 1/k), i.e. if r < d_i√((k−1)/k). Hence T_F < T_C. □

Lemma 4.1. T_F ≈ (1/k) T_C.
Proof of Lemma. Since 1 ≤ r ≤ d_i (from step 1.2.3 in sec. 3), to prove T_F ≈ (1/k)T_C we have to minimize (r²/d_i²) in eq. (6). The ratio (r²/d_i²) reaches its minimum possible value as r tends to its minimum possible value and d_i tends to its maximum possible value, i.e. as r → 1 and d_i → (d/2) (since r ≥ 1 and by Theorem 4.1). Since d_i = (d/k), d_i → d/2 implies k → 2. Hence (r²/d_i²) reaches its minimum possible value as r → 1 and k → 2, and the Lemma follows. □

By Lemma 4.1, T_F ≈ (1/k)T_C holds for small values of k and r. However, in practice, r may not be chosen as 1 (i.e. the smallest possible value), especially when k is small, since the classification rate may be reduced by too few projection vectors (r). Hence some tradeoff between r and k is required to achieve a good classification rate together with time efficiency.
Lemma 4.2. T_{AP} < T_{4C}.
Proof of Lemma. From Theorem 4.1,

T_F < T_C    (7)

Consider the second term of eq. (5), i.e. N k d_i³. Since d_i = d/k,

N k d_i³ = (1/k²) N d³ < N d³ (since k ≥ 2)    (8)

Therefore, from eqs. (4), (5), (7) and (8) the Lemma follows. □

4.3. Why is AP-4C more efficient than 4C?

AP-4C uses SubXPCA to compute the eigenvalues which are used to find the correlation dimension of the ε-neighbourhood of a core object (see step 1.2.1 of sec. 4.1), whereas 4C uses classical PCA for the same purpose. In PCA methods where the covariance matrix is computed explicitly, most of the time is consumed by the computation of the covariance matrix alone. In contrast to PCA (where a single large covariance matrix C is computed), SubXPCA computes k (k ≥ 2) smaller sub-pattern covariance matrices (C_j), one for each sub-pattern set SP_j, and one final covariance matrix (C^F). By Theorem 4.1, T_F < T_C, ∀r < d_i√((k−1)/k), where r is the number of chosen projection vectors per SP_j, d_i is the sub-pattern size and k is the number of sub-patterns per pattern. The upper bound for r (i.e. d_i√((k−1)/k)) is reasonably large, and in practice we choose only the first few salient features (i.e. r is small in general); therefore the computation of the final covariance matrix C^F becomes trivial. Finally, SubXPCA is nearly k times faster than PCA (see Lemma 4.1). The concept of partitioning is the basic reason for the lower time complexity of SubXPCA. Since we use SubXPCA in AP-4C for finding the correlation dimension of the ε-neighbourhood, AP-4C is thus faster than 4C, which uses PCA; this is proved in Lemma 4.2. It was also found that the classification results of SubXPCA and PCA were essentially the same.4
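The relative covariance-computation cost of eq. (6), T_F/T_C ≈ 1/k + r²/d_i², is easy to check numerically. The values of d, k and r below are illustrative choices, not taken from the paper.

```python
import math

def relative_cost(d, k, r):
    """T_F / T_C = 1/k + r^2/d_i^2 from eq. (6), with d_i = d/k."""
    di = d / k
    return 1.0 / k + (r * r) / (di * di)

d = 1000                                   # pattern dimensionality (illustrative)
for k in (2, 4, 10):
    di = d / k
    r_max = di * math.sqrt((k - 1) / k)    # upper bound on r from Theorem 4.1
    print(f"k={k}: ratio(r=5)={relative_cost(d, k, 5):.4f}, r_max={r_max:.1f}")
```

For k = 10 and r = 5 the ratio is 0.1 + 25/100² = 0.1025, i.e. very close to the 1/k of Lemma 4.1, and any r below the Theorem 4.1 bound keeps the ratio below 1.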
5. Conclusions and Future work

In this paper, we have proposed a new and efficient variant of the 4C method, AP-4C, suitable for high dimensional data. Theoretical proofs reveal that AP-4C is more efficient than 4C. 4C becomes a special case of AP-4C if (i) the number of sub-patterns per pattern in SubXPCA is taken as 1 and (ii) step 2 of SubXPCA
is omitted. AP-4C may be used extensively in high dimensional data mining applications. In the near future, we will attempt to improve our method by using ideas from 'clustering objects on subsets of attributes',3 where the relevant attribute subsets for each individual cluster can be different and partially (or completely) overlap with those of other clusters.

References
1. C. Böhm, K. Kailing, P. Kröger and A. Zimek, Computing clusters of correlation connected objects, in Proc. of SIGMOD, ACM, (Paris, France, 2004).
2. A. K. Pujari, Data Mining Techniques (Universities Press, India, 2002).
3. J. Friedman and J. Meulman, J. of the Royal Statistical Society 66, 815 (2004).
4. A. Negi and V. K. Kadappa, An experimental study of sub-pattern based principal component analysis and cross-sub-pattern correlation based principal component analysis, in Proc. Image and Vision Computing International Conference, (Univ. of Otago, New Zealand, 2005).
5. S. Chen and Y. Zhu, Pattern Recognition 37, 1081 (2004).
PART E
Document Analysis
A Hybrid Scheme for Recognition of Handwritten Bangla Basic Characters Based on HMM and MLP Classifiers
U. Bhattacharya*, S. K. Parui and B. Shaw
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata-700108, INDIA
E-mail: ujjwal@isical.ac.in*
This paper presents a hybrid approach to recognition of handwritten basic characters of Bangla, the official script of Bangladesh and the second most popular script of India. This is a 50-class recognition problem and the proposed recognition approach consists of two stages. In the first stage, a shape feature vector computed from two-directional view-based strokes of an input character image is used by a hidden Markov model (HMM). This HMM defines its states in a data-driven or adaptive approach. The statistical distribution of the shapes of strokes present in the available training database is modelled as a mixture distribution and each component is a state of the present HMM. The confusion matrix of the training set provided by the HMM classifier of the first stage is analyzed and eight smaller groups of Bangla basic characters are identified within which misclassifications are significant. The second stage of the present recognition approach implements a combination of three multilayer perceptron (MLP) classifiers within each of the above groups of characters. Representations of a character image at multiple resolutions based on a wavelet transform are used as inputs to these MLPs. This two-stage recognition scheme has been trained and tested on a recently developed large database of representative samples of handwritten Bangla basic characters and obtained 93.19% and 90.42% average recognition accuracies on its training and test sets respectively.

Keywords: Handwritten character recognition; Bangla character recognition; stroke feature; wavelet based feature; HMM; MLP.
1. Introduction

Although significant development has already been achieved in recognition of handwriting in the scripts of developed nations, not much work has been reported on Indic scripts. However, in the recent past, significant research progress has been achieved towards recognition of printed characters of Indian scripts including Bangla.1 Unfortunately, the methodology for printed character recognition cannot be extended directly to recognition of handwritten characters. The development of a handwritten character recognition engine for any script is always a challenging problem, mainly because of the enormous variability of handwriting styles. Many diverse algorithms/schemes for handwritten character recognition2,3 exist and each of these has its own merits and demerits. Possibly the most important aspect of a handwriting recognition scheme is the selection of an appropriate feature set which is reasonably invariant with respect to shape variations caused by various writing styles. A large number of feature extraction methods are available in the literature.4 India is a multilingual country with 18 constitutional languages and 10 different scripts. In the
literature, there exist a few studies on recognition of handwritten characters of different Indian scripts. These include Ref. 5 for Devanagari, Ref. 6 for Telugu, Ref. 7 for Tamil and Ref. 8 for Oriya, among others. Bangla is the second most popular Indian script. It is also the script of two other Indian languages, viz., Assamese and Manipuri. Moreover, Bangla is the official language and script of Bangladesh, a neighbour of India. Several off-line recognition strategies for handwritten Bangla characters can be found in Refs. 9-11. A few other works dealing with off-line recognition of handwritten Bangla numerals include Refs. 12,13. Many of the existing works on handwritten Bangla character recognition are based on small databases collected in laboratory environments. On the other hand, it is now well established that a scheme for recognition of handwriting must be trained and tested on a reasonably large number of samples. In the present work, a recently developed large and representative database14 of handwritten Bangla basic characters has been considered.

In the present article, a novel hybrid scheme for recognition of Bangla basic characters is proposed. In the first stage, a shape vector representing all of certain directional view-based strokes of an input character is fed to our HMM classifier. The set of posterior probabilities provided by the HMM determines the smaller group of the input character among N such groups of confusing character shapes. In the second stage, three MLP classifiers for the particular group independently recognize the character using its representations at three fine-to-coarse resolutions based on a wavelet transform. Final recognition is obtained by applying the sum rule15 for combination of the output vectors of the three MLPs of the particular group.

An HMM is capable of making use of both the statistical and structural information present in handwritten shapes.16 A useful property of the present HMM is that its states are determined in a data-driven or adaptive approach. The shapes of the strokes present in the training set of our image database of handwritten Bangla characters are studied and their statistical distribution is modelled as a multivariate mixture distribution. Each component in the mixture is a state of the HMM. This model is robust in the sense that it is independent of several aspects of the input character sample such as its thickness, size, etc. In the proposed approach, the above HMM is used to simplify the original fifty-class recognition problem into several smaller-class problems. Wavelets have been studied thoroughly during the last decade17 and their applicability to various image processing problems is being explored. In the present work, the Daubechies18 wavelet transform has been used to obtain multiresolution representations of an input character image. Distinct MLPs are employed to recognize the character at each of these resolution levels. Final recognition results are obtained by combining the above ensemble of MLPs. Such a multiresolution recognition approach was studied before for the numeral recognition problem12 and was observed to be robust with respect to moderate noise, discontinuity and small changes in rotation.

2. Stage I of the Recognition Scheme

The first stage of the proposed recognition scheme consists of a preprocessing module, extraction of directional view based strokes from the character image, computation of stroke features and design of a classifier based on HMM.
2.1. Preprocessing
For cleaning possible noise in an input character image, it is first binarized by Otsu's thresholding technique19 followed by smoothing using a median filter of window size 5. No size normalization is done at the image level since it is taken care of during feature extraction. A sample image from the present database and the effect of smoothing on the strokes extracted by the subsequent module are shown in Figs. 1(a)-1(l).
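The Otsu binarization step can be sketched as follows. This is a minimal NumPy version assuming dark ink on a light background; the median-filtering step is omitted, and the function names are ours.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the grey level that maximizes the
    between-class variance of the histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 (background) probability
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean
    mu_t = mu[-1]                           # global mean grey level
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # 0/0 at the histogram extremes
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Foreground = pixels at or below the Otsu threshold
    (assuming dark ink on a light background)."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)
```

On a bimodal image the chosen threshold separates the two grey-level modes, giving the binary image that the stroke-extraction module consumes.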
Fig. 1. (a) Image of a sample character; (b) image in (a) after binarization; (c) vertical strokes obtained from (b); (d) vertical strokes obtained after pruning (c); (e) horizontal strokes obtained from (b); (f) horizontal strokes obtained after pruning (e); (g) image of the sample character after smoothing; (h) image of the smooth character sample after binarization; (i) vertical strokes obtained from (h); (j) vertical strokes obtained after pruning (i); (k) horizontal strokes obtained from (h); (l) horizontal strokes obtained after pruning (k).
2.2. Stroke Extraction
Horizontal and vertical strokes present in the input character shape are obtained in the form of digital curves, each of which is one pixel thick and in which all the pixels except the terminals have exactly two 8-neighbours. A two-directional view based approach is used for extraction of these strokes. In this approach, a binarized image E consisting of only the vertical strokes present in the input character shape is obtained by removing all the object pixels of the binarized character image other than those whose right or east 4-neighbour is a background pixel, that is, where the pen movement is upward or downward. In other words, the object pixels of the input sample image that are visible from the east form the vertical strokes, as shown in Fig. 2(a). Similarly, another binarized image S consisting of only the horizontal strokes is generated. The object pixels of the binarized character image whose bottom or south 4-neighbour is a background pixel, that is, where the pen movement is side-wise, form the horizontal strokes, as shown in Fig. 2(b). Thus, here,
connected components of the images E and S are respectively the vertical and horizontal strokes. Vertical strokes consisting of fewer pixels than 20% of the height, and horizontal strokes consisting of fewer pixels than 20% of the width, of the input image are ignored during further processing. Such vertical and horizontal strokes for the character shape in Fig. 1(a) are shown in Figs. 1(j) and 1(l) respectively.
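The east/south visibility rule described above can be sketched as follows. This is a hedged NumPy sketch under our own naming; the pruning of short strokes and the extraction of connected components are omitted.

```python
import numpy as np

def view_based_strokes(binary):
    """Compute the two-directional-view stroke images: an object pixel
    (value 1) belongs to the vertical-stroke image E if its east
    4-neighbour is background, and to the horizontal-stroke image S
    if its south 4-neighbour is background."""
    img = np.asarray(binary, dtype=np.uint8)
    east = np.zeros_like(img)
    east[:, :-1] = 1 - img[:, 1:]      # 1 where the east neighbour is background
    east[:, -1] = 1                    # rightmost column is visible from the east
    south = np.zeros_like(img)
    south[:-1, :] = 1 - img[1:, :]     # 1 where the south neighbour is background
    south[-1, :] = 1                   # bottom row is visible from the south
    E = img & east                     # object pixels visible from the east
    S = img & south                    # object pixels visible from the south
    return E, S
```

For a solid block of object pixels, E keeps only the block's rightmost column and S only its bottom row, matching the "visible from the east/south" description.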
Fig. 2. Strokes in a character sample: (a) dark pixels indicate the E image; (b) dark pixels indicate the S image.
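The two-directional view described above can be sketched in a few lines: an object pixel belongs to the E (vertical-stroke) image if its east neighbour is background, and to the S (horizontal-stroke) image if its south neighbour is background, with image borders counted as background. This is an illustrative sketch, not the authors' code; the array convention (1 = object pixel) is an assumption.

```python
import numpy as np

def vertical_strokes(img):
    """E image: object pixels visible from the east (right neighbour is
    background; pixels on the right border are taken as visible)."""
    right = np.pad(img, ((0, 0), (0, 1)))[:, 1:]   # east neighbour, border = 0
    return (img == 1) & (right == 0)

def horizontal_strokes(img):
    """S image: object pixels visible from the south (bottom neighbour is
    background; pixels on the bottom border are taken as visible)."""
    below = np.pad(img, ((0, 1), (0, 0)))[1:, :]   # south neighbour, border = 0
    return (img == 1) & (below == 0)
```

Connected components of the two resulting boolean images then give the individual vertical and horizontal strokes, which can be pruned by the 20% length thresholds mentioned above.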
2.3. Computation of Feature Components
On each stroke, six points Pi, i = 0, ..., 5 are obtained such that the curve lengths P(i-1)Pi, i = 1, ..., 5 are equal, with P0 and P5 being respectively the lowest and highest pixels for a vertical curve or the leftmost and rightmost pixels for a horizontal curve. The feature components corresponding to each stroke of an input character are {a1, a2, ..., a5, X, Y, L}, where ai is the angle between the line P(i-1)Pi and the positive x-axis, (X, Y) is the centroid of the stroke and L is its length. Together these represent the shape, size and position of the stroke with respect to the character image. The quantities ai are invariant under scaling, and the remaining three quantities X, Y and L are normalized with respect to the character height. For an input character, all the strokes are arranged from left to right to obtain the observation sequence O = (O1, ..., OT), where T is the number of strokes and Oi = {ai1, ai2, ..., ai5, Xi, Yi, Li} corresponds to the i-th stroke in this arrangement.
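The per-stroke feature computation can be sketched as follows: resample six equally spaced points along the digital curve, take the five segment angles, and append the normalized centroid and length. This is an illustrative sketch under the assumption that the stroke is given as an ordered list of (x, y) pixel coordinates; it is not the authors' exact implementation.

```python
import numpy as np

def stroke_features(points, char_height):
    """Feature vector {a1..a5, X, Y, L} for one thinned stroke.

    `points` is the ordered (x, y) pixel trace of the stroke; X, Y and L
    are normalized by the character height, as in the text."""
    pts = np.asarray(points, dtype=float)
    # cumulative arc length along the digital curve
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    L = s[-1]
    # six points P0..P5 at equal curve-length spacing
    targets = np.linspace(0.0, L, 6)
    P = np.stack([np.interp(targets, s, pts[:, 0]),
                  np.interp(targets, s, pts[:, 1])], axis=1)
    # angles of P(i-1)P(i) with the positive x-axis
    d = np.diff(P, axis=0)
    alphas = np.arctan2(d[:, 1], d[:, 0])
    cx, cy = pts.mean(axis=0)
    return np.concatenate([alphas,
                           [cx / char_height, cy / char_height, L / char_height]])
```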
2.4. HMM Based Classifier
The HMM designed for the present classification task is non-homogeneous, and the classifier used in the first stage of the proposed recognition scheme consists of 50 such HMMs, denoted lambda_j, j = 1, 2, ..., 50, one for each of the 50 underlying classes.
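The first-stage decision rule, evaluating each class HMM's likelihood by the forward algorithm and taking the most likely class, can be sketched as below. For simplicity this sketch uses a homogeneous HMM in the log domain (the paper's model is non-homogeneous, which would make the transition matrix depend on t); the emission log-probabilities are assumed precomputed from the Gaussian states.

```python
import numpy as np

def log_forward(log_pi, log_A, log_b):
    """log P(O | model) by the forward algorithm.

    log_pi: (N,) initial-state log-probs; log_A: (N, N) transition log-probs;
    log_b: (T, N) emission log-probs of the T stroke feature vectors."""
    alpha = log_pi + log_b[0]
    for t in range(1, log_b.shape[0]):
        # stable log-sum-exp recursion over the previous states
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_b[t]
    m = alpha.max()
    return float(m + np.log(np.exp(alpha - m).sum()))

def classify(per_class_models, log_b_per_class):
    """Assign the input to the class whose HMM gives the largest likelihood."""
    scores = [log_forward(pi, A, lb)
              for (pi, A), lb in zip(per_class_models, log_b_per_class)]
    return int(np.argmax(scores))
```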
For an input pattern with observation sequence O, the probability values P(O | lambda_j) are computed for all j = 1, 2, ..., 50 using the well known forward and backward algorithms.20 Finally, the pattern is assigned to the class c in {1, 2, ..., 50} such that c = arg max over 1 <= j <= 50 of P(O | lambda_j). Computational details are omitted due to space constraints.

It has been assumed above that the feature values {a1, a2, ..., a5, X, Y, L} corresponding to the strokes obtained from training samples of each class follow a mixture of K 8-dimensional Gaussian distributions. The unknown parameters of this mixture distribution for different choices of K are obtained using the well known EM algorithm.21 The optimum value of K for each class is determined using the Bayesian information criterion (BIC).22 The mean vectors of these K distributions are called shape primitives for the corresponding character class, and they form the state space of the associated HMM. Thus the states of an HMM are not fixed a priori but are constructed adaptively from the training samples of the respective class. The parameters of the underlying HMMs (such as the initial and transition probability matrices) are then computed on the basis of these states.

3. Final Classification

An analysis of the recognition performance of the above HMM based classifier on both the training and test sets shows that a significant percentage of misclassifications occurred within several smaller groups of character classes. This pattern of misclassifications is shown in Table 1. In the second stage, a fresh classification is performed within each of these sub-groups, which significantly improves the overall recognition accuracy of the first stage. In this context it is clear that if an input sample is misclassified in the first stage into a group other than its own (as shown in Table 1), the second stage cannot correctly classify it. A sample misclassified in the first stage as another character of its own group, however, gets a second chance of being properly classified in the second stage. Brief details of the final classification scheme are given below.
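The EM-plus-BIC selection of K described in Section 2.4 can be sketched in one dimension (purely for illustration; the paper fits 8-dimensional mixtures). The deterministic quantile initialization is an assumption made here to keep the toy example reproducible.

```python
import numpy as np

def fit_gmm_1d(x, K, iters=100):
    """Plain EM for a 1-D Gaussian mixture; returns means, variances,
    weights and the final log-likelihood."""
    n = x.size
    mu = np.quantile(x, np.linspace(0.1, 0.9, K))   # deterministic init
    var = np.full(K, x.var() + 1e-6)
    w = np.full(K, 1.0 / K)

    def loglik_matrix():
        return (-0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
                + np.log(w))

    for _ in range(iters):
        logp = loglik_matrix()                      # E-step: responsibilities
        shift = logp.max(axis=1, keepdims=True)
        r = np.exp(logp - shift)
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0) + 1e-12                  # M-step: update parameters
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    logp = loglik_matrix()
    shift = logp.max(axis=1)
    ll = float((shift + np.log(np.exp(logp - shift[:, None]).sum(axis=1))).sum())
    return mu, var, w, ll

def bic(ll, K, n):
    p = 3 * K - 1          # K means + K variances + (K - 1) free weights
    return -2.0 * ll + p * np.log(n)
```

The optimum K is the one minimizing the BIC; the fitted means then play the role of the shape primitives described above.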
A similar scheme 23 was used before for recognition of hand printed Bangla numerals.
A Hybrid Scheme for Recognition of Handwritten Bangla Basic Characters Based on HMM and MLP Classifiers
Table 1. Groups of confusing character shapes provided by the HMM classifier. Misclassification percentages are given as Training / Test. (The Bangla character shapes of each group are not reproducible here.)

Group No.   Samples (Training / Test)   Within group (%)   Between groups (%)   Total (%)
1           600 / 400                   22.50 / 24.00      3.83 / 4.50          26.33 / 28.50
2           1200 / 400                  12.00 / 12.25      3.92 / 4.25          15.92 / 16.50
3           1200 / 707                  20.50 / 21.22      5.17 / 5.94          25.67 / 27.16
4           1500 / 1000                 10.27 / 11.90      5.60 / 7.50          15.87 / 19.40
5           1800 / 1200                 30.17 / 32.83      3.67 / 3.92          33.84 / 36.75
6           3000 / 1974                 33.00 / 37.13      6.77 / 9.52          39.77 / 46.65
7           2700 / 1800                 15.85 / 17.06      4.96 / 6.78          20.81 / 23.84
8           3000 / 2000                 27.23 / 29.30      6.23 / 7.65          33.46 / 36.95
Total       15000 / 9481                23.05 / 25.67      5.37 / 6.98          28.42 / 32.65

3.1. Multi-resolution Representation of a Character Image
The bounding box (the minimum rectangle enclosing the shape) of a character image is first normalized to the size 64 x 64. The wavelet decomposition algorithm24 is applied to this normalized image recursively twice to obtain a 16 x 16 smooth approximation of the original image as the final decomposition; a 32 x 32 smooth approximation is also obtained at the intermediate stage. Images of a sample character at these multiple resolutions are shown in Fig. 3. The images at the three fine-to-coarse resolutions are gray-valued and, as before, Otsu's thresholding technique is used for their binarization.
Fig. 3. Image of a character sample at different resolutions: (a) the original image; (b) the size normalized (64 x 64) image; (c) the wavelet transformed image at resolution 32 x 32; (d) the wavelet transformed image at resolution 16 x 16.
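The two recursive decompositions can be sketched by keeping only the smooth (LL) band of a Haar decomposition at each level, which amounts to 2 x 2 block averaging. The plain-average normalization here is an assumption (true Haar analysis scales the smooth band by 2; the difference is immaterial for subsequent thresholding); the paper's chosen wavelet is not stated in this excerpt.

```python
import numpy as np

def haar_smooth(img):
    """One decomposition level, keeping only the smooth (LL) band:
    2x2 block averages, halving each dimension."""
    h, w = img.shape
    assert h % 2 == 0 and w % 2 == 0
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

# 64x64 -> 32x32 -> 16x16, as in the two recursive decompositions above
img64 = np.random.rand(64, 64)
img32 = haar_smooth(img64)
img16 = haar_smooth(img32)
```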
3.2. Classification at Multiple Resolutions
For an input character, images at each of the above three fine-to-coarse resolutions are fed to the input
layers of three different MLP networks. The optimal size of the hidden layer in each case has been determined through near-exhaustive simulation runs. The size of the output layer equals the number of classes involved in the recognition task, and this number varies over the groups of Table 1. These MLPs are trained using the well known backpropagation algorithm. For the same sample of any group, the output values of the three MLP classifiers usually differ, and this is why combining their outputs yields better recognition accuracy.

3.3. Combination of MLP Classifiers at Multiple Resolutions
Combining multiple classifiers to obtain better recognition performance than any single classifier is now very popular in the OCR community. There exist various approaches for combining the outputs of multiple classifiers; these have recently been studied in Ref. 25 and include the product rule, sum rule, median rule, majority voting, etc. We observed that the product rule provides the worst results, worse than any of the individual classifiers, while the sum and median rules perform well and the majority voting rule reasonably well. The recognition results reported in the next section have been obtained by combining the three MLP classifiers using the sum rule.
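The sum rule described above simply adds the per-class output scores of all classifiers and picks the class with the largest total. A minimal sketch, with hypothetical output vectors for the three resolution-specific MLPs:

```python
import numpy as np

def combine_sum_rule(prob_lists):
    """Sum-rule fusion: add the per-class scores of all classifiers
    and return the index of the winning class."""
    return int(np.argmax(np.sum(prob_lists, axis=0)))

# hypothetical output vectors of the 64x64, 32x32 and 16x16 MLPs
p_64 = np.array([0.60, 0.30, 0.10])
p_32 = np.array([0.20, 0.55, 0.25])
p_16 = np.array([0.15, 0.60, 0.25])
label = combine_sum_rule([p_64, p_32, p_16])   # class 1 wins by sum
```

Note how the sum rule lets two moderately confident classifiers outvote one strongly confident but disagreeing classifier, which is one intuition for its robustness over the product rule.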
U. Bhattacharya, S. K. Parui and B. Shaw
4. Experimental Results
Most existing off-line recognition studies on handwritten characters of Indian scripts are based on databases collected either in a laboratory environment or from small groups of the concerned population. Recently, however, a large and representative database of handwritten Bangla basic characters was developed, and the simulation results of the proposed recognition approach have been obtained on the training and test samples of this database.
Fig. 4. Samples from the present database of handwritten Bangla characters; three samples in each category are shown.
4.1. Database of Handwritten Bangla Basic Characters
4.2. Recognition Results of the First Stage
In the first stage of the present hybrid recognition scheme, based on an HMM classifier, horizontal and vertical strokes have been extracted as described in Section 2.2. The parameters of the HMM corresponding to a character class are estimated using shape vectors computed from its set of strokes as described in Section 2.3. For example, the minimum value of K was 7, obtained for the 42nd character of the Bangla alphabet, while the maximum was 18, obtained for the 24th character. The first stage of the present scheme alone provided only 71.58% and 67.35% average recognition accuracies on the training and test sets of the present database respectively. However, as seen from Table 1, 23.05% and 25.67% of the misclassifications occurred within 8 smaller groups consisting of 2 to 10 different character classes each.
Samples of the present database were collected by distributing several standard application forms among different groups of the population of the state of West Bengal in India; the purpose was not disclosed. Subjects were asked to fill up these forms in Bangla, putting one character per box. Such handwritten samples were collected over a span of more than two years. Sample characters of the present database are stored as grayscale TIFF images at 300 dpi resolution, with 1 byte per pixel. A few sample images from the database are shown in Fig. 4. The present database consists of 24481 images of isolated Bangla basic characters, explicitly divided into training and test sets. The training set consists of 15000 samples with 300 samples per category, and there are 9481 samples in the test set.
4.3. Recognition Results of the Second Stage
Combination of the three MLP classifiers within each of the above smaller groups of Bangla characters provided 98.48% and 97.21% average recognition accuracies for the training and test samples respectively. Individual recognition accuracies of the second stage for each of the groups are shown in Table 2. Thus, at the end of the second stage, the overall final recognition accuracies are respectively 93.19% and 90.42% on the training and test sets of the present database of handwritten Bangla basic characters.

5. Conclusions

In the present work we considered a hybrid approach to recognition of handwritten Bangla basic characters in which both HMM and MLP classifiers have been used. Shape features are used by the HMM classifier of the first stage, while pixel images at multiple resolutions are used by the MLP classifiers of the second stage. The approach of the second stage was studied before, providing acceptable recognition accuracies in smaller problems such as recognition of
Table 2. Recognition performance of the 2nd stage of the proposed scheme. Entries are given as Training / Test. (The Bangla character shapes belonging to each group are not reproducible here.)

Group No.   Samples classified into the group during the 1st stage   Correct recognition (%) within the group by the 2nd stage
1           577 / 382                                                100 / 98.95
2           1153 / 383                                               99.57 / 98.17
3           1138 / 665                                               99.29 / 98.35
4           1416 / 925                                               99.08 / 97.95
5           1734 / 1153                                              99.13 / 98.01
6           2797 / 1786                                              97.57 / 96.75
7           2566 / 1678                                              98.17 / 97.26
8           2813 / 1847                                              97.87 / 95.78
Total       14194 / 8819                                             98.48 / 97.21
handwritten Bangla numerals. However, for the present larger-class problem, the performance of that scheme alone is not equally satisfactory. This is the reason for our choice of shape features with an HMM in the first stage and the MLP-based multi-resolution recognition
approach in the latter stage.

References
1. B. B. Chaudhuri and U. Pal, Pattern Recognition, 31, 531-549 (1998).
2. R. Plamondon and S. N. Srihari, IEEE Trans. on Patt. Anal. and Mach. Intell., 22, 63-84 (2000).
3. N. Arica and F. Yarman-Vural, IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 31, 216-233 (2001).
4. O. D. Trier, A. K. Jain and T. Taxt, Pattern Recognition, 29, 641-662 (1996).
5. K. R. Ramakrishnan, S. H. Srinivasan and S. Bhagavathy, Proc. of the 5th ICDAR, 414-417 (1999).
6. M. B. Sukhaswami, P. Seetharamulu and A. K. Pujari, Int. J. Neural Syst., 6, 317-357 (1995).
7. R. M. Suresh and L. Ganesan, Proc. of Sixth ICCIMA'05, 286-291 (2005).
8. S. Mohanti, IJPRAI, 12, 1007-1015 (1998).
9. U. Bhattacharya, S. K. Parui, M. Sridhar and F. Kimura, CD Proc. IICAI, 1357-1376 (2005).
10. T. K. Bhowmik, U. Bhattacharya and S. K. Parui, Proc. ICONIP, 814-819 (2004).
11. F. R. Rahman, R. Rahman and M. C. Fairhurst, Pattern Recognition, 35, 997-1006 (2002).
12. U. Bhattacharya and B. B. Chaudhuri, Proc. of ICDAR, Seoul, 322-326 (2005).
13. U. Bhattacharya, T. K. Das, A. Datta, S. K. Parui and B. B. Chaudhuri, Int. J. of Pattern Recognition and Artificial Intelligence, 16, 845-864 (2002).
14. www.isical.ac.in/~ujjwal/download/database.html, "Off-line Handwritten Character Database".
15. J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, IEEE Trans. on Patt. Anal. and Mach. Intell., 20, 226-239 (1998).
16. H. Park and S. Lee, Pattern Recognition, 29, 231-244 (1996).
17. A. Graps, IEEE Computational Science and Engineering, 2(2) (1995).
18. I. Daubechies, IEEE Trans. on Information Theory, 36, 961-1005 (1990).
19. N. Otsu, IEEE Trans. Systems, Man, and Cybernetics, 9, 377-393 (1979).
20. L. R. Rabiner, Proc. of the IEEE, 77(2), 257-285 (1989).
21. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, 2nd Ed., 1990.
22. J. Bernardo and A. Smith, Bayesian Theory, John Wiley & Sons, 1994.
23. U. Bhattacharya and B. B. Chaudhuri, Proc. of the 7th Int. Conf. on Document Analysis and Recognition, vol. I, 16-20 (2003).
24. S. G. Mallat, IEEE Trans. on Pattern Anal. and Machine Intell., 11(7), 674-693 (1989).
25. L. I. Kuncheva, IEEE Trans. on Patt. Anal. and Mach. Intell., 24, 281-286 (2002).
An Efficient Method for Graphics Segmentation from Document Images
S. Mandal, S. P. Chowdhury and A. K. Das
CST Department, Bengal Engineering and Science University, Howrah - 711 103, India
E-mail: {sekhar,shyama,amit}@cs.bees.ac.in
B. Chanda
Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata - 700 108, India
E-mail: [email protected]
Major constituents of any document are text, graphics and half-tones. While half-tone can be characterised by its inherent intensity variation, text and graphics share common characteristics and differ mainly in spatial distribution. The success of document image analysis systems depends on the proper segmentation of text and graphics, as text is further subdivided into other classes like heading, table and math-zones. Segmentation of graphics is essential for better OCR performance and for vectorization in computer vision applications. Graphics segmentation from text is particularly difficult for graphics made of small components (dashed or dotted lines etc.) which share many features with text. Here we propose a very efficient technique for segmenting all sorts of graphics from document pages. Keywords: Document Image Analysis (DIA), Graphics Segmentation
1. Introduction

Due to its immense potential for commercial applications, research in Document Image Analysis (DIA) supports a rapidly growing industry including OCR, vectorization of engineering drawings and vision systems. Commercial document analysis systems are now available for storing business forms, performing OCR on typewritten/handwritten text, and compressing engineering drawings. Graphics detection is one of the first application areas of document processing systems. However, there is still no efficient method for detecting all types of graphics appearing in frequently used real-life documents. Here we focus mainly on segmentation of graphics from a document page that is already half-tone segmented and may not even be fully skew corrected. This paper is organised as follows. Section 2 describes past research. The proposed method is detailed in Section 3. The concluding section (Section 4) contains experimental results and remarks.

2. Past Work

Graphics segmentation has been attempted and reported by many researchers.1-11 Quite a few works are in the domain of text-graphics separation, and in many cases text
strings are separated out, thus indirectly segmenting graphics as the left-overs. Texture based identification is proposed in Ref. 4, exploiting the supposition that the texture of regular text differs from that of graphics, using Gabor filters. We have come across many references where graphics are identified in engineering drawings for the purpose of vectorization.10-13 However, engineering drawings are special cases of document images containing a predominant graphics portion. In a nutshell, we are aware of three different approaches for separating graphics from text:

(1) Directional morphological filtering. The technique locates all linear shapes, which are considered to be text, effectively leaving the other objects as graphics. This works well for simple maps14 but may have inherent problems in dealing with more complex scenes.

(2) Extraction of lines and arcs. Relying on a transform15 or on vectorization,16 many have tried to isolate the graphical objects from text. This approach works well for engineering drawings.
(3) Connected component analysis. A set of rules, usually based on the spatial characteristics of the components (text and graphics), is used to analyse connected components and filter them out. Such algorithms can handle complex scenarios and can be tuned to deal with increasingly complex documents. One of the best examples of this class is the work done by Fletcher and Kasturi.1

Here we elaborate the well known and frequently referred endeavour by Fletcher and Kasturi.1 This is a deliberate choice, as our approach in many ways resembles their approach based on connected component analysis. They separate text strings from mixed graphics, thereby indirectly segmenting the graphics part. We, on the other hand, separate graphics from a document containing both text and graphics using simple spatial properties. The major steps of Fletcher and Kasturi's algorithm are as follows:

(1) Connected component generation: done by a component labeling process which also computes the maximum and minimum co-ordinates of the bounding rectangles.
(2) Filtering using area/ratio: used to separate out large graphical components. The filtering threshold is determined from the most populated area and the average area of the connected components. The working set for the next step is thus reduced to text strings and small graphical components.
(3) Co-linear component grouping: the Hough transform is applied to the centroids of the rectilinear bounding boxes of the components, consolidating groups of co-linear components.
(4) Logical grouping: co-linear components are grouped into words using information like the position of each component and inter-character and inter-word gap thresholds.
(5) Text string separation: the words along a text line are segmented from the rest of the image.

The algorithm is robust to changes in font size(a) and style. It is by and large skew independent but sensitive to touching and fragmented characters.
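Steps 1 and 2 above can be sketched as follows: an 8-connected component labelling pass that records bounding boxes, followed by an area filter. This is an illustrative stand-in, not Fletcher and Kasturi's code; the fixed `area_threshold` parameter is an assumption (the original derives it from the area histogram).

```python
import numpy as np
from collections import deque

def label_components(img):
    """8-connected component labelling by BFS; returns the label map and
    the bounding box (y0, x0, y1, x1) of each component."""
    h, w = img.shape
    labels = np.zeros((h, w), dtype=int)
    boxes, cur = [], 0
    for sy in range(h):
        for sx in range(w):
            if img[sy, sx] and not labels[sy, sx]:
                cur += 1
                labels[sy, sx] = cur
                q = deque([(sy, sx)])
                y0 = y1 = sy
                x0 = x1 = sx
                while q:
                    y, x = q.popleft()
                    y0, y1 = min(y0, y), max(y1, y)
                    x0, x1 = min(x0, x), max(x1, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny, nx] and not labels[ny, nx]):
                                labels[ny, nx] = cur
                                q.append((ny, nx))
                boxes.append((y0, x0, y1, x1))
    return labels, boxes

def large_graphics(boxes, area_threshold):
    """Area filter: indices of components whose bounding-box area
    exceeds the threshold (large graphical components)."""
    return [i for i, (y0, x0, y1, x1) in enumerate(boxes)
            if (y1 - y0 + 1) * (x1 - x0 + 1) > area_threshold]
```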
(a) It is assumed that the maximum font size in the page is less than 5 times the minimum font size used in the same page.
The inherent connectivity assumption weakens for images of degraded or worn-out documents with a large number of touching and fragmented characters. As the aspect ratios of graphics and text vary widely, the dependence on the area/ratio filter is unwieldy. The algorithm is also computationally expensive, like any other Hough transform based approach. Next we present our technique for separating graphics, which is based on similar ideas but is computationally cheaper and more effective in segmenting line art made of dotted or dashed lines or very short line segments, irrespective of their orientation.

3. Proposed Method

We start with the gray image of the document, and half-tones are removed using a texture based technique.17 Next, the image is converted to binary by the well known technique proposed by Otsu,18 so graphics segmentation is carried out on binary images. Graphics segmentation using connected component analysis yields good results if the lines and arcs forming a graphical component are all connected; a pair of examples is shown in Fig. 1. Note that the graphics (a single big connected component) is totally removed in both cases.

Fig. 1. Segmentation using connected component analysis. The first row shows two pages with graphics made of connected components and the second row shows the result after graphics removal.

However, graphics made of dotted (or dashed) lines or short line segments are difficult to detect, as the sizes of the individual components are similar to those of text characters. An individual connected component then does not signify anything by itself; only a sequence (or group) of such components together represents graphics. The presence of such graphics is shown in Fig. 2. Special care is taken in our approach to segment out graphics made of dotted and dashed lines in any orientation.

Fig. 2. Graphics made of small components: (a) dotted lines along with big connected components; (b) graphics with dotted and dashed lines.

Our approach to detecting graphics made of small components is based on grouping small disconnected components that supposedly form one big (spatial) component. Starting from any particular point, small components are grouped by observing (1) adjacency, (2) overlapping, (3) pseudo-continuity and (4) stroke width of nearby components. Text characters possess similar properties, but there are subtle differences. For example, two characters are adjacent to each other mostly in the horizontal direction, and their extended bounding boxes overlap, again, in the horizontal direction. As the orientation of the document is known, and text line detection in a skew-free document is also a known problem, there should be no confusion in identifying two nearby characters belonging to two adjacent text lines. Thus, grouping of lines made of dots or dashes and small arcs is possible, as elaborated next.

We start with the characteristics of graphics made of dashed or dotted lines:

(1) The number of foreground-to-background transitions is 1 in the vertical or horizontal direction.
(2) The ratio of height to width is within the range 0.5 to 2.
(3) The ratio of foreground to background pixels within the bounding box encompassing a component is more than 0.5. This rule applies to dots only.
(4) Two components are treated as adjacent if their extended (5 times) bounding boxes have spatial overlap.

Applying the above rules we can group small components forming part of a graphics object. However, we introduce a further check to rule out the possibility of grouping the small components found in normal text: a group of small components is considered valid only if its count exceeds 5. This guards against grouping 5 consecutive dots (the dots of i and j) together. (Consider the word "divisibility", which has got 5 i's.)

For grouping solid short arc segments we use the following rules:

(1) The number of foreground-to-background transitions is 1 in the vertical or horizontal direction.
(2) The ratio of the pen widths of the two components is within the range 0.5 to 2.0.
(3) The extended skeletons of the components have spatial overlap.

At this juncture the terms pen width and extended skeleton need to be explained. Pen width is a measure of the stroke width of the arcs forming a graphics object. It is computed by drawing four lines (with 0, 45, 90 and 135 degree slopes) from each point of the skeleton to the object boundary and averaging the minimum of those radial lines:

Pw = average over all skeleton points of min(l_0, l_90, (l_45 + l_135)/2)

Note that the minimum is taken over only 3 radial lengths, since the average of the lines with slopes 45 and 135 degrees is used. The pen width is used to verify that adjacent components belong to a group only if the pen width varies within a permitted range. For text the pen width variation is limited, and the same is true for graphics made up of short lines and arcs, where abrupt changes are unlikely.

Extension of the skeleton is done by dividing the skeleton of the component into four parts along the principal axis, giving three control points (points of division) on the skeleton. We make copies of the lower half and upper half of the skeleton, take their mirror images, and join them to the lower and upper end points, trying to maintain the slope. This is shown in Fig. 3.

Fig. 3. Expansion of the skeleton taking the mirrored and flipped lower and upper halves of the skeleton: (a) original component; (b) its skeleton; (c) lower part of the skeleton; (d) upper part of the skeleton; (e) extended skeleton.

This effectively extends the skeleton, roughly maintaining the original slope at both ends; in one sense it is a pseudo window creation strategy in which the components, in their extended form, come closer to each other. Thereafter the adjacency conditions are checked to form a group. The adjacency conditions need to be fine tuned to accommodate adjacent components at bends, crossings and corners. Without such a measure, a single curved graphics component may be segmented as multiple ones, as the components at corners or bends would be the missing links. To accommodate them we accept a number of vertical or horizontal transitions greater than 1, provided the pen width remains within the range 0.5 to 2.0. The results of graphics separation are shown in Fig. 4 for a number of cases.

Fig. 4. Result of graphics segmentation. Original images in the left column and segmented images in the right column.
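The pen width formula and two of the grouping checks above can be sketched as below. The sketch assumes the four radial lengths per skeleton point have already been measured, and that bounding boxes are given as (y0, x0, y1, x1); both conventions are assumptions made here for illustration.

```python
import numpy as np

def pen_width(radials):
    """Pw = average over skeleton points of min(l0, l90, (l45 + l135) / 2).

    `radials` holds, per skeleton point, the distances to the object
    boundary along the 0, 45, 90 and 135 degree directions."""
    r = np.asarray(radials, dtype=float)
    per_point = np.minimum(np.minimum(r[:, 0], r[:, 2]),
                           (r[:, 1] + r[:, 3]) / 2.0)
    return float(per_point.mean())

def adjacent(box_a, box_b, factor=5):
    """Dash/dot adjacency test: the bounding boxes, grown `factor` times
    about their centres, must overlap spatially."""
    def grow(b):
        y0, x0, y1, x1 = b
        cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
        hy = factor * (y1 - y0 + 1) / 2.0
        hx = factor * (x1 - x0 + 1) / 2.0
        return cy - hy, cx - hx, cy + hy, cx + hx
    ay0, ax0, ay1, ax1 = grow(box_a)
    by0, bx0, by1, bx1 = grow(box_b)
    return ay0 <= by1 and by0 <= ay1 and ax0 <= bx1 and bx0 <= ax1

def same_stroke_width(pw_a, pw_b):
    """Pen-width compatibility: the ratio must lie within [0.5, 2.0]."""
    return 0.5 <= pw_a / pw_b <= 2.0
```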
4. Experimental Results

We have carried out experiments using around 200 samples taken from the UW-I and UW-II databases as well as our own collection of scanned images from a variety of sources, e.g., books, reports, magazines and articles. A summary of the results is presented in Table 1 for graphics zones whose ground-truth information is available in the UW-I and UW-II databases; the ground-truth information for our own collection was prepared manually. All experiments were carried out on a P4 based high end PC, all the programs are written in C, and the average segmentation time, excluding half-tone removal, is around 3.4 seconds.
Table 1. Segmentation performance (in %)

                         Actual
                         BG     SG     OC
Classified  BG           98      0      2
            SG            0     92     10
            OC            2      8     88

(BG = Big Graphics, SG = Small Graphics, OC = Other Components.)
The table shows a near perfect result for graphics with big connected components. The result for small graphics is also very impressive; however, we fail to cluster some of them as they are dispersed within the graphical portion. It may be noted that this segmentation algorithm is fully automatic and the parameters used work satisfactorily for a wide variety of fonts and styles.
References
1. L. Fletcher and R. Kasturi, IEEE Trans. on Pattern Analysis and Machine Intelligence, 10(6), 910 (1988).
2. O. T. Akindele and A. Belaid, Page segmentation by segment tracing, in ICDAR'93, Tsukuba, Japan, 1993.
3. K. C. Fan, C. H. Liu and Y. K. Wang, Pattern Recognition Letters, 15, 1201 (1994).
4. A. K. Jain and S. Bhattacharjee, Machine Vision and Applications, 5, 169 (1992).
5. A. K. Jain and B. Yu, Page segmentation using document model, in Proc. ICDAR'97, Ulm, Germany, August 1997.
6. A. K. Das and B. Chanda, Segmentation of text and graphics from document image: a morphological approach, in Int. Conf. on Computational Linguistics, Speech and Document Processing (ICCLSDP'98), Calcutta, India, Feb. 18-20, 1998.
7. T. Pavlidis and J. Zhou, Computer Vision, Graphics and Image Processing, 54, 484 (1992).
8. F. M. Wahl, K. Y. Wong and R. G. Casey, CGIP, 20, 375 (1982).
9. W. Liu and D. Dori, Computer Vision and Image Understanding, 70(3), 420 (1998).
10. C.-C. Han and K.-C. Fan, Pattern Recognition, 27(2), 261 (1994).
11. T. Pavlidis, CVGIP, 35, 111 (1986).
12. J. Song, F. Su, C. Tai, J. Chen and S. Cai, Line net global vectorization: an algorithm and its performance evaluation, in Proc. CVPR, Nov. 2000.
13. J. Chiang, S. Tue and Y. Leu, Pattern Recognition, 12, 1541 (1998).
14. H. Luo and R. Kasturi, Improved directional morphological operations for separation of characters from maps/graphics, in K. Tombre and A. K. Chhabra (eds.), Graphics Recognition - Algorithms and Systems, LNCS 1389, Springer-Verlag, 1998, pp. 35-47.
15. A. Kacem, A. Belaid and M. B. Ahmed, IJDAR, 4(2), 97 (2001).
16. D. Dori and L. Wenyin, Vector based segmentation of text connected to graphics in engineering drawings, in P. Perner, P. Wang and A. Rosenfeld (eds.), Advances in Structural and Syntactical Pattern Recognition, LNCS 1121, Springer-Verlag, August 1996, pp. 322-331.
17. A. K. Das and B. Chanda, Extraction of half-tones from document images: a morphological approach, in Proc. Int. Conf. on Advances in Computing, Calicut, India, Apr 6-8, 1998.
18. N. Otsu, IEEE Trans. SMC, 9(1), 62 (1979).
Identification of Indian Languages in Romanized Form
Pratibha Yadav*, Girish Mishra and P. K. Saxena
Scientific Analysis Group, Defence Research and Development Organization, Ministry of Defence, Metcalfe House, Delhi-110054, India
E-mail: [email protected]*
This paper deals with identification of romanized plaintexts of five Indian languages - Hindi, Bengali, Manipuri, Urdu and Kashmiri. A Fuzzy Pattern Recognition technique has been adopted for identification. Suitable features/characteristics are extracted from training samples of each of the five languages and represented through fuzzy sets. Prototypes in the form of fuzzy sets are constructed for each language. Identification is based on computing the dissimilarity of a sample with the prototypes of each language, using a dissimilarity measure extracted through fuzzy relational matrices. The proposed identifier is independent of any dictionary of these languages and can even identify plaintext without word break-ups. The identification can be used for automatic segregation of plaintexts of these languages while analysing intercepted multiplexed interleaved Speech/Data/Fax communication on RF channels, in a computer network or on the Internet. Keywords: Fuzzy sets; Fuzzy relations; Linguistic characteristics; Fuzzy Pattern Recognition (FPR); Fuzzy distance measures
1. Introduction

In digital communications, be it terrestrial or satellite based, many channels are used to carry multiplexed Speech/Data/Fax, with suitable modulation and following the required protocols. In networks such as the Internet, where the TCP/IP protocol is followed, communications also take place in the form of packets of Speech/Data/Fax. Since English has long been a language of common use, most text communication has been in English. With the emerging need for information flow and free exchange of ideas among various communities, application software is being developed for languages other than English, and regional languages are emerging as a viable medium for both written and spoken communication. When it comes to secure communication, languages also provide a natural barrier apart from security through encryption. Thus, while monitoring or intercepting such plain traffic, the protocol followed helps in segregating text from voice and fax, yet the problem remains of segregating text messages into different regional languages without expert domain knowledge. If the communication is protected through encryption, the problem becomes more complex, since one needs decryption first before text/language identification. It is this problem that needs to be addressed. In this paper a solution has been proposed towards
this issue using a fuzzy pattern recognition approach. When using regional languages for text communication, the most common way is to romanize such texts using the 26 Roman alphabets and apply the existing computer and communication tools, which are based on English. This romanization can be done either following certain standards or in some non-standard, natural way based on phonetics. Of these two, the second is more common and natural but involves more vagueness and uncertainty; that is the reason fuzzy logic was found suitable for addressing the identification of various romanized regional languages. Fuzzy logic, introduced by Zadeh in 1965,1 provides a handy tool to deal with uncertainty (vagueness).8 Fuzzy pattern recognition2-5 has been one of the main application-oriented research directions pursued by many researchers.2 Most of the work on language identification9-11 has been based on dictionaries.7 For the first time, fuzzy pattern recognition based techniques were applied to the identification of three European languages, namely English, German and French, even when the word break-up was not known.6 In this paper, the problem of language identification for non-standard romanized plaintexts of five Indian languages - Hindi, Bengali, Manipuri, Urdu and Kashmiri - has been tackled using Fuzzy Pattern Recognition (FPR) when the texts are continuous
Pratibha Yadav, Girish Mishra and P. K. Saxena
(without word break-up) and no dictionary is available. The problem is quite challenging, as all five languages are phonetically quite similar and, moreover, the romanization is non-standard. A set of 12 feature fuzzy sets has been selected for classification, and a classification algorithm based on fuzzy dissimilarity has been designed, as described in the following sections.
2. Features for Classification

For the problem of identification of these five Indian languages, which are phonetic in nature, the linguistic characteristics of the languages are exploited. These characteristics are based on the occurrences of various alphabets and their affinity to combine with other alphabets. After a thorough and careful study of these languages, a set of fuzzy features has been selected, based on the following linguistic characteristics:

1. Frequencies of alphabets
2. Variety of left contact letters of an alphabet
3. Variety of right contact letters of an alphabet
4. Variety of two-sided contact letters of an alphabet
5. Frequencies of doublets
6. Occurrences of the highest digraph starting with a letter
7. Occurrences of the highest digraph ending with a letter
8. High-medium-low-very low frequency categorization
9. Frequency of the alphabet with which the highest digraph starting with a letter is formed
10. Frequency of the alphabet with which the highest digraph ending with a letter is formed
11. Frequency of the alphabet with which the specified alphabet makes the most frequent reversal
12. Frequency of the alphabet with which the specified alphabet makes the least frequent reversal

Fuzzy sets corresponding to each of these characteristics have been constructed with the set of 26 alphabets as the basic (universal) set, by defining characteristic values in the interval [0,1]. The characteristic values of the fuzzy set μ1 corresponding to the first characteristic are obtained by dividing the frequency of each alphabet by the maximum alphabet frequency. The characteristic values of the fuzzy sets μ2, μ3 and μ4 corresponding to characteristics 2, 3 and 4 respectively are obtained by normalizing the entries by 23, as the maximum number of different letters contacting a given letter does not exceed 23. For the construction of the fuzzy sets μ5, μ6 and μ7 corresponding to characteristics 5, 6 and 7, the corresponding scores are taken out of a text length of 10. For the fuzzy set categorizing very high, high, medium, low and very low frequency letters, the characteristic value μ8 is taken as 1.0 if μ1(x) > 0.7, 0.9 if 0.5 < μ1(x) ≤ 0.7, and 0.7, 0.5 or 0.1 for the medium, low and very low frequency categories respectively.
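The normalization steps above can be sketched in Python. This is a minimal illustrative sketch, not the authors' code: the function names are my own, and since the paper gives only the top two μ8 thresholds, the lower ones (0.3 and 0.1) are assumptions.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # the 26-letter universal set

def mu1_letter_frequencies(text):
    """Characteristic 1: letter frequency divided by the maximum letter frequency."""
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    peak = max(counts.values()) if counts else 1
    return {c: counts.get(c, 0) / peak for c in ALPHABET}

def mu2_left_contact_variety(text):
    """Characteristic 2: number of distinct left-contact letters of each
    alphabet, normalized by 23 (the maximum contact variety per the paper)."""
    letters = [c for c in text.lower() if c in ALPHABET]
    left = {c: set() for c in ALPHABET}
    for prev, cur in zip(letters, letters[1:]):
        left[cur].add(prev)
    return {c: len(left[c]) / 23.0 for c in ALPHABET}

def mu8_frequency_category(mu1):
    """Characteristic 8: five-level frequency categorization of the mu1 grades.
    Only the thresholds 0.7 and 0.5 appear in the paper; 0.3 and 0.1 are
    assumed here purely for illustration."""
    def grade(v):
        if v > 0.7:
            return 1.0   # very high
        if v > 0.5:
            return 0.9   # high
        if v > 0.3:
            return 0.7   # medium (assumed threshold)
        if v > 0.1:
            return 0.5   # low (assumed threshold)
        return 0.1       # very low
    return {c: grade(v) for c, v in mu1.items()}
```

Each function maps a raw text to one fuzzy set over the 26-letter universal set, with membership grades in [0, 1] as described above.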
Finally, for the construction of the fuzzy sets μ9, μ10, μ11 and μ12 corresponding to characteristics 9, 10, 11 and 12, normalization is done by dividing the values by the maximum frequency, to bring the membership grades into the interval [0,1]. Thus, for each of the five languages considered, such fuzzy sets are constructed from a large number of texts of each language (each text of length 400 characters). Finally, for each language, standard feature fuzzy sets (prototypes) are constructed by taking averages. Thus five sets of prototypes, say μiH, μiB, μiM, μiU and μiK (i = 1, ..., 12), are extracted for the languages Hindi, Bengali, Manipuri, Urdu and Kashmiri respectively. After the construction of prototypes for each language, the next problem is to develop a classification criterion so that a given unknown text can be identified and assigned to one of these five classes.

3. Classification Criteria

For classification of patterns, one has to use some distance or similarity measure to decide the closeness of the unknown pattern to the various prototypes. There are various distance measures,5 like the Hamming distance, the Euclidean distance and Minkowski's distance, which can be used for comparing two feature fuzzy sets μ and ν. All of these distance measures were tested, but the classification score was not very satisfactory. Hence, in this paper, a new dissimilarity measure defined in Refs. 6 and 11 has been used. It is defined through the following fuzzy relation μR between μ and ν.
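The prototype construction by averaging can be sketched as follows; this is a hypothetical helper, not the authors' implementation:

```python
def build_prototype(feature_sets):
    """Average per-text feature fuzzy sets (each a dict letter -> grade in [0, 1])
    into a single prototype fuzzy set for one language."""
    n = len(feature_sets)
    letters = feature_sets[0].keys()
    return {c: sum(fs[c] for fs in feature_sets) / n for c in letters}
```

One such call per characteristic per language yields the 5 x 12 prototype fuzzy sets described above.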
Identification of Indian Languages in Romanized Form
μR(xi, xj) = e^(−k |μ(xi) − ν(xj)|^l)    (1)

Here k and l are parameters, which can be fixed empirically or through experimentation. A matrix R with entries coming from fuzzy relations of the form (1) is called a fuzzy relational matrix, denoted by μR. The dissimilarity between the fuzzy sets μ and ν is then defined using the fuzzy relational matrix μR as

df(μ, ν) = α (26 − Trace μR) + β Σi Σj |μR(xi, xj) − μR(xj, xi)|    (2)
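Equations (1) and (2) translate directly into code. The sketch below uses my own names, with the parameter values reported later in the paper (k = 1.0, l = 2.0, α = 2.0, β = 0.25) as defaults:

```python
import math
import string

LETTERS = string.ascii_lowercase  # the 26-letter universal set

def fuzzy_relational_matrix(mu, nu, k=1.0, l=2.0):
    """Eq. (1): mu_R(x_i, x_j) = exp(-k * |mu(x_i) - nu(x_j)| ** l)."""
    return [[math.exp(-k * abs(mu[a] - nu[b]) ** l) for b in LETTERS]
            for a in LETTERS]

def d_f(mu, nu, alpha=2.0, beta=0.25, k=1.0, l=2.0):
    """Eq. (2): alpha * (26 - Trace(mu_R)) + beta * sum_ij of the
    asymmetry |mu_R(x_i, x_j) - mu_R(x_j, x_i)|."""
    R = fuzzy_relational_matrix(mu, nu, k, l)
    n = len(R)
    trace = sum(R[i][i] for i in range(n))
    asym = sum(abs(R[i][j] - R[j][i]) for i in range(n) for j in range(n))
    return alpha * (n - trace) + beta * asym
```

Note that d_f(μ, μ) = 0: for identical fuzzy sets the diagonal of μR is all ones (trace 26) and the matrix is symmetric, so both terms vanish; the measure grows as the two fuzzy sets diverge.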
Table 1. Values of the Dissimilarity Measures with Prototypes of Five Languages.

Texts     DH     DB     DM     DU     DK
Text 1    03.85  05.91  05.63  04.21  05.23
Text 2    03.86  05.83  04.83  04.06  04.52
Text 3    03.79  05.99  05.12  04.00  04.78
Text 4    03.75  06.52  05.34  04.11  04.98
Text 5    03.62  06.00  05.01  03.77  04.92
Text 6    04.76  02.83  04.68  04.81  05.33
Text 7    04.04  03.67  04.75  04.23  04.86
Text 8    04.67  02.88  05.33  05.12  05.51
Text 9    04.57  02.91  05.11  04.86  05.21
Text 10   05.43  02.73  05.57  05.69  06.00
Text 11   04.45  05.25  02.92  04.70  04.71
Text 12   04.90  06.05  03.43  05.43  04.82
Text 13   04.55  05.52  02.99  04.88  04.51
Text 14   04.49  05.77  02.96  04.64  04.68
Text 15   04.47  05.42  03.49  04.98  04.77
Text 16   04.55  05.83  05.14  03.83  05.27
Text 17   04.43  06.47  05.24  03.79  05.44
Text 18   04.18  05.92  04.84  03.14  05.12
Text 19   04.41  06.02  05.37  03.34  05.15
Text 20   04.14  05.91  05.16  03.34  05.15
Text 21   04.57  05.80  04.87  04.90  03.35
Text 22   04.07  05.94  04.72  04.63  02.91
Text 23   04.63  06.21  04.99  05.19  03.28
Text 24   04.64  06.44  05.03  05.07  05.23
Text 25   04.70  06.11  04.90  05.01  03.35
After much experimentation and learning, the parameters k and l were chosen as 1.0 and 2.0 respectively, and the values of α and β were fixed at 2.0 and 0.25 respectively. For any given unknown text, the 12 feature fuzzy sets μ1, ..., μ12 are calculated. To calculate the association of this unknown pattern with a class, say Hindi, the following process is followed. For each i, μi is compared with μiH and the dissimilarity measure df is calculated using (1) and (2) as

diH = df(μi, μiH)    (3)

The final dissimilarity value DH of the unknown text with the Hindi language is calculated as

DH = Σ(i=1 to 12) wi diH    (4)

where the wi are weights, chosen according to the significance of the characteristics. In our case the weights were selected as

w1 = 0.140, w2 = 0.135, w3 = 0.135, w4 = 0.135, w5 = 0.059, w6 = 0.100,
w7 = 0.100, w8 = 0.056, w9 = 0.035, w10 = 0.035, w11 = 0.035, w12 = 0.035.

Similarly, the other dissimilarity measures DB, DM, DU and DK (with Bengali, Manipuri, Urdu and Kashmiri respectively) are computed. The unknown text is classified to the class with which the value of the dissimilarity measure is minimum.

4. Results

The algorithm has been tested on a number of texts from each of the five languages. The text length has been taken as 400 characters (chars). The program has been tested on a PC and takes a few seconds of CPU time. Dissimilarity values of twenty-five test samples with each of the five classes are shown in Table 1. Of these, texts 1 to 5 are Hindi, texts 6 to 10 are Bengali, texts 11 to 15 are Manipuri, texts 16 to 20 are Urdu and texts 21 to 25 are Kashmiri. Table 2 gives a summary of a very large number of experiments and tests, reflecting an overall success rate of the identifier of almost 100% (depicted through the bar chart in Fig. 1). After trying the algorithm with text length 400, efforts were made to find the optimal length required for a good success rate. For this purpose the algorithm was also tried with text lengths of 200 and 150 chars. The summaries of results for text lengths 200 and 150 chars are shown in Tables 3 and 4 respectively (depicted through the bar charts in Figs. 2 and 4). Even in these cases the success rate achieved is very good (88% to 98%).
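The decision rule of Eq. (4) followed by a minimum search can be sketched as below. The dissimilarity function is passed in as a parameter (for instance, an implementation of df from Eq. (2)); the names are illustrative, not the authors' code:

```python
# Weights w_1 .. w_12 as reported in the paper.
WEIGHTS = [0.140, 0.135, 0.135, 0.135, 0.059, 0.100,
           0.100, 0.056, 0.035, 0.035, 0.035, 0.035]

def classify(text_features, prototypes, dissim, weights=WEIGHTS):
    """text_features: the 12 feature fuzzy sets mu_1..mu_12 of the unknown text.
    prototypes: dict language -> list of its 12 prototype fuzzy sets.
    dissim: a dissimilarity function such as d_f from Eq. (2).
    Returns (language with minimal weighted dissimilarity, all D_L values)."""
    D = {lang: sum(w * dissim(mu, proto)
                   for w, mu, proto in zip(weights, text_features, protos))
         for lang, protos in prototypes.items()}
    return min(D, key=D.get), D
```

The weights sum to 1.0, so each D_L is a weighted average of the 12 per-characteristic dissimilarities, and the text is assigned to the language whose prototypes it resembles most.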
Fig. 1. % Success for Language Identification (for texts with 400 chars).

Fig. 2. % Success for Language Identification (for texts with 200 chars).

Fig. 3. Comparison of success (%) for the different text lengths considered.

Fig. 4. % Success for Language Identification (for texts with 150 chars).

5. Conclusions

A five-class classification problem has been addressed using FPR. The advantage of the approach is that the prototypes of each of the five languages are constructed once, after which any given unknown text can be identified as belonging to one of these five languages. The method is independent of any dictionary and processes texts even when the word break-up is not known. The identifier developed works with a very high success rate (above 99%) for plaintexts of these five languages at a text length of 400 chars. The success rate is above 95% for a text length of 200 chars and slightly less (above 88%) for a text length of 150 chars. These results are depicted through the bar charts in Figs. 1-4.
References
1. L. A. Zadeh; Fuzzy Sets, Information and Control 8, pp. 338-353, 1965.
2. J. C. Bezdek and S. K. Pal; Fuzzy Models for Pattern Recognition, IEEE Press, 1992.
3. J. C. Bezdek; Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York, 1981.
Table 2. Success rate of Language Identification (Text length = 400 chars)

Language   #Texts  Hindi  Bengali  Manipuri  Urdu  Kashmiri  Success Rate (%)
Hindi      8139    8102   10       0         27    0         99.54
Bengali    7502    22     7457     2         21    0         99.40
Manipuri   7584    0      13       7570      1     0         99.82
Urdu       976     1      0        0         975   0         99.90
Kashmiri   396     0      0        1         0     395       99.75
Table 3. Success rate of Language Identification (Text length = 200 chars)

Language   #Texts  Hindi  Bengali  Manipuri  Urdu  Kashmiri  Success Rate (%)
Hindi      16278   15449  108      26        769   20        94.91
Bengali    15004   398    14402    32        183   1         95.99
Manipuri   15167   3      107      15049     11    0         99.22
Urdu       1952    25     42       12        1872  0         95.99
Kashmiri   791     8      0        0         0     783       98.99
Table 4. Success rate of Language Identification (Text length = 150 chars)

Language   #Texts  Hindi  Bengali  Manipuri  Urdu  Kashmiri  Success Rate (%)
Hindi      21704   19279  288      105       2183  95        88.83
Bengali    20006   733    18715    97        496   9         93.55
Manipuri   20223   49     197      19906     78    0         98.43
Urdu       2602    74     63       35        2429  0         93.35
Kashmiri   1054    10     5        1         0     1038      98.48
4. H. J. Zimmermann; Fuzzy Set Theory and its Applications, 4th Ed., Kluwer Academic Publishers, Boston/Dordrecht/London, 2001.
5. G. J. Klir and B. Yuan; Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall, 1997.
6. P. K. Saxena and Uma Gupta; Fuzzy Language Identifier, Proceedings of the National Seminar on Cryptology (NSCR), Delhi, pp. D-11 to D-20, 1998.
7. Kenneth R. Beesley; Language Identifier: A Computer Program for Automatic Natural-Language Identification of On-line Text, Language of Crossroads: Proceedings of the 29th Annual Conference of the American Translators Association, pp. 47-54, 12-16 Oct, 1988.
8. P. K. Saxena, P. Yadav and K. Sarvjeet; Fuzzy Sets in Cryptology, Proceedings INFOSEC 1994, Bangalore, pp. 13 to 33, 29-30 Oct, 1994.
9. Navneet Gaba, Sarvjeet Kaur and P. K. Saxena; Identification of Encryption Schemes for Romanized Indian Languages, Proceedings of ICAPR 2003, pp. 164 to 168, 2003.
10. N. Verma, S. S. Khan and Shrikant; Statistical Feature Extraction to Discriminate Various Languages: Plain and Crypt, Proceedings of the National Conference on Information Security (NCIS), pp. 1 to 10, Jan, 2003.
11. P. Yadav and P. K. Saxena; Identification of Regional Languages - A Fuzzy Theoretic Approach, Proceedings of NCIS, New Delhi, pp. 11 to 19, Jan, 2003.
Online Bangla Handwriting Recognition System
K. Roy
Dept. of Computer Science, West Bengal University of Technology, BF 142, Saltlake, Kolkata-700 064, India

N. Sharma, T. Pal and U. Pal
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata-700 108, India
Handwriting recognition is a difficult task because of the variability in the writing styles of different individuals. This paper presents a scheme for online handwriting recognition of Bangla script. Online handwriting recognition refers to the problem of interpreting handwriting input captured as a stream of pen positions using a digitizer or other pen-position sensor. The sequential and dynamical information obtained from the pen movements on the writing pad is used as features in our proposed scheme. These features are then fed to a quadratic classifier for recognition. We tested our system on 2500 Bangla numeral data and 12500 Bangla character data and obtained 98.42% accuracy on numeral data and 91.13% accuracy on character data from the proposed system. Keywords: Online Recognition, Indian Script, Bangla, Modified quadratic discriminant function
1. Introduction Data entry using pen-based devices is gaining popularity in recent times. This is because machines are getting smaller and keyboards are becoming more difficult to use on these smaller devices. Also, data entry for scripts with a large alphabet is difficult using a keyboard. Moreover, there is an attempt to mimic the pen-and-paper metaphor by automatic processing of online characters. However, the wide variation of human writing styles makes online handwriting recognition a challenging pattern recognition problem. Work on online character recognition started gaining momentum about forty years ago. Numerous approaches have been proposed in the literature [1-5], and the existing approaches can be grouped into three classes, namely: (i) structural analysis methods, where each character is classified by its stroke structures; (ii) statistical approaches, where various features extracted from character strokes are matched against a set of templates using statistical tools; and (iii) motor function models that explicitly use trajectory information, where the time evolution of the pen coordinates plays an important role. Many techniques are available for online recognition of English, Arabic, Japanese and Chinese characters [1-4, 6-9], but there are only a few pieces of work [10-13] on Indian characters, although India is a multi-lingual and multi-script country. Connell et al. [11] presented a preliminary study on online Devnagari character recognition. Joshi et al. [12] also proposed a work on Devnagari online character recognition. Later, Joshi et al. [13] proposed an elastic matching based scheme for online recognition of Tamil characters. Although there is some work on online recognition of the Devnagari and Tamil scripts, online recognition work on other Indian languages is scarce. In this paper we propose a system for the online recognition of Bangla characters. Recognition of Indian characters is very difficult compared to English because of the shape variability of the characters as well as the larger number of character classes. See Figure 1, where samples of four Bangla characters are shown, to get an idea of handwriting variability. There are twelve scripts in India, and in most of these scripts the number of alphabets (basic and compound characters) is more than 250, which makes keyboard design and subsequent data entry a difficult job. Hence, online recognition of such scripts has a commercial demand. Although a number of studies [14-16] have been done on offline recognition of a few printed Indian scripts like Devnagari, Bangla, Gurumukhi, Oriya, etc. with
commercial level accuracy, but to the best of our knowledge no system is commercially available for online recognition of any Indian script. In this paper we propose a scheme for online Bangla handwritten character recognition; the scheme is robust against stroke connections as well as shape variation, while maintaining reasonable robustness against stroke-order variations. A quadratic classifier is used here for recognition. The rest of the paper is organized as follows. In Section 2 we discuss the Bangla language and data collection. The feature extraction process is presented in Section 3. Section 4 details the classifier used for recognition. The experimental results are discussed in Section 5. Finally, the conclusion of the paper is given in Section 6.
Fig. 1. Examples of some Bangla online characters. The first three columns show samples of handwritten characters and the last column shows samples of a numeral.
2. Bangla Language and Online Data Collection Bangla, the second most popular language in India and the fifth most popular language in the world, is an ancient Indo-Aryan language. About 200 million people in the eastern part of the Indian subcontinent speak this language. The Bangla script alphabet is used in texts of the Bangla, Assamese and Manipuri languages. Also, Bangla is the national language of Bangladesh. The alphabet of the modern Bangla script consists of 11 vowels and 40 consonants. These characters are called basic characters. The writing style in Bangla is from left to right, and the concept of upper/lower case is absent in this script. Most of the characters of Bangla have a horizontal line (Matra) at the upper part. From a statistical analysis we notice that the probability that a Bangla word will have a horizontal line is 0.994 [14]. In Bangla script, a vowel following a consonant takes a modified shape. Depending on the vowel, its modified shape is placed at the left, right, both left and right, or bottom of the consonant. These modified shapes are called modified characters. A consonant or a vowel following a consonant sometimes takes a compound orthographic shape, which we call a compound character. Compound characters can be combinations of two consonants, as well as of a consonant and a vowel. Compounding of three or four characters also exists in Bangla. There are about 280 compound characters in Bangla [15]. In this paper we consider the recognition of Bangla basic characters. To get an idea of Bangla basic characters and their variability in handwriting, a set of handwritten Bangla basic characters is shown in Figure 2. The main difficulties of Bangla character recognition are shape similarity, stroke size and the order variation of the different strokes. By a stroke we mean the set of points obtained between a pen-down and a pen-up. From the statistical analysis of our dataset, we found that the minimum (maximum) number of strokes used to write a Bangla character is 1 (4). The average number of strokes per character is 2.2. We have also seen that some characters are mostly written with a single stroke, whereas one character is written by almost all writers with 4 strokes, and several characters are always written with 2 strokes. Online recognition of these characters poses several problems such as stroke-number, stroke-connection and shape variations. Most characters are composed of multiple strokes. Another difficult problem involves stroke-order variations. The stroke sequence used to write a character is not the same for all writers. During handwriting, some people draw the upper stroke of a character before its lower stroke, whereas to write the same character, others draw the lower stroke before the upper one. These stroke-order variations complicate the development of an online recognition system.
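Since a stroke is defined here as the set of points captured between a pen-down and a pen-up, segmenting the digitizer stream into strokes can be sketched as follows. The event format is an assumption for illustration, not a description of the authors' capture device:

```python
def split_strokes(events):
    """events: iterable of (x, y, pen_down) digitizer samples, where
    pen_down is True while the pen touches the writing pad.
    Returns a list of strokes, each a list of (x, y) points."""
    strokes, current = [], []
    for x, y, pen_down in events:
        if pen_down:
            current.append((x, y))
        elif current:
            strokes.append(current)   # pen lifted: close the current stroke
            current = []
    if current:                       # stream ended with the pen still down
        strokes.append(current)
    return strokes
```

The stroke count len(split_strokes(events)) per character then corresponds to the statistic quoted above (1 to 4 strokes for Bangla basic characters).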
Fig. 2. Samples of handwritten Bangla basic characters.